https://doi.org/10.31449/inf.v49i12.7292                                              Informatica 49 (2025) 1–18   1 
 
Improved Generative Adversarial Network and Particle Swarm 
Optimization Support Vector Machine for Tennis Serving Behavior 
Analysis 
Haibo Cao 
School of Physical Education, Xinyang Normal University, Xinyang, 464000, China 
E-mail: caohb@xynu.edu.cn 
 
Keywords: video images, generate adversarial networks, particle swarm optimization algorithm, support vector 
machine, behavior analysis 
 
Received: October 9, 2024 
This study proposes a behavior analysis model based on an improved generative adversarial network and 
particle swarm optimization support vector machine algorithm for deblurring and feature extraction in 
tennis video serving behavior. The model first improves the generative adduction network by introducing 
a multi-layer convolution structure and a variety of activation functions, including three-layer 
convolution. The activation function is selected to use ReLU and Leaky ReLU alternately to enhance the 
generator in capturing image details. During model training, the generator optimizes the output image by 
minimizing the Wasserstein distance, and the discriminator evaluates the difference between the 
generated image and the real image. Then, to further extract features, the particle swarm optimization 
algorithm was used to dynamically optimize the feature extraction of each frame in the feature space, and 
dynamically adjust the inertia weights. The initial value was 0.9 and the final value was 0.4. After feature 
extraction, the data were input into SVM for classification. The penalty parameter of SVM was set to 1 
and the accuracy was set to 0.001. The results of the comparative experiments demonstrated that the 
proposed method exhibited superior performance in deblurring images, with an average subjective score 
of 81.16 points, a notable advantage over the comparison algorithm. In the objective evaluation, the 
average Peak Signal-to-Noise Ratio (PSNR) and structural similarity value of the image after defuzzing 
by the research method reached 35.12dB and 0.93, respectively. The PSNR and structural similarity of the 
image increased by 13.56% to 18.29% and 8.33% to 19.90%, respectively. In the feature extraction and 
classification experiments, the accuracy of the proposed algorithm reached 91.24%, which was 
significantly higher than the traditional algorithm. The convergence speed was faster than the particle 
swarm optimization algorithm, the ant colony optimization algorithm, and the simulated annealing 
algorithm, reducing the number of iterations by 35.33%, 40.52%, and 51.55%, respectively. The data 
validate that the designed method has good application prospects in improving video image quality and 
feature extraction. 
Povzetek: Raziskava se ukvarja z analizo teniškega servisa, ki z izboljšanimi algoritmi GAN in PSO-SVM 
poboljša ugotovitve iz slik.
 
1 Introduction tactical analysis [3-4]. Currently, many industry scholars 
have researched the image processing technology and 
In light of the accelerated advancement of digital 
action behavior analysis. Ding Q et al. proposed a 
media and video technology, the analysis and 
camera-based long-term trajectory tracking technique to 
processing of video content have become increasingly 
improve the effectiveness of multi-target tracking 
important in many fields. Especially in sports science 
technology in sports game feature recognition. This study 
and sports analysis, how to extract valuable 
first improved the Tracking-Learning-Detection (TLD) 
information from video images has become a research 
algorithm and then integrated machine learning methods 
focus. Tennis, as a dynamic and complex sport, often 
into the improved algorithm. This method has 
suffers from blurring in its video images due to various 
significantly improved performance and can be 
factors such as the athlete's rapid movement, different 
effectively applied to feature extraction in sports events 
shooting angles, and changes in lighting. These factors 
[5]. Mulimani D et al. developed a video preprocessing 
can lead to a decrease in image quality, affecting game 
technique under a new framework. This method first 
analysis and technical evaluation [1-2]. Therefore, 
calibrated players and classified occlusions to ensure 
improving the clarity and information extraction ability 
accurate identification of athletes in complex game 
of Tennis Video Images (TVI) has important practical 
environments. Subsequently, by utilizing the system to 
significance for athlete performance evaluation and  
2   Informatica 49 (2025) 1–18                                                                       H. Cao 
track and label athletes on the court, the framework introduced a skeleton attention module in the action 
significantly improved the tracking accuracy of recognition data system, which shoots the skeleton 
basketball players and provided more reliable technical sequence onto a single-RGB frame to assist focusing on 
support for fairness in the game [6]. Zhang J et al. the limb motion area. Experiments on the NTU RGB+D 
developed a new spatial attention and temporal dilation and SYSU benchmarks have shown that compared to 
GCN that uses a self-attention mechanism to select SOTA methods, this model achieved competitive 
human joints that are beneficial for action recognition, performance while reducing network complexity [8]. The 
thereby reducing the impact of data redundancy and results of related work are summarized in Table 1. 
noise. Extensive experiments on NTU-RGB+D and While the aforementioned research has yielded promising 
Kinetics Skeleton have shown that this method outcomes in the domains of video image processing and 
completes State-of-the-art (SOTA) performance in action 
skeleton-based action recognition [7]. Zhu X et al. 
 
Table 1: Research status analysis 
Accuracy PSNR 
Method SSIM Major deficiency 
(%) (dB) 
Multi-target tracking accuracy is insufficient in 
Ding Q et al. [5] 82 28.5 0.76 
complex background. 
The real-time and accuracy of rapidly changing 
Mulimani D et al. [6] 80.5 29 0.79 
scenes are insufficient. 
Data redundancy and noise are likely to affect 
Zhang J et al. [7] 86 30.8 0.82 
the processing of diverse actions. 
The recognition ability in complex motion 
Zhu X et al. [8] 84 31.1 0.81 
scenes needs to be improved. 
  
behavior analysis, it has not yet fully addressed the  
intricacies of the fuzzy states that arise in motion scenes. blur and information redundancy caused by dynamic 
When faced with information redundancy and feature scenes. Assuming that the improved GAN structure will 
overlap, it is easy to encounter the problem of effectively enhance feature extraction capability 
misidentification of action behavior. This study aims to through multi-layer convolution and multiple activation 
propose a method based on the combination of functions. Therefore, the goal of improving Peak 
improved a Generative Adversarial Network (GAN) Signal-to-Noise Ratio (PSNR) and Structural Similarity 
and Particle Swarm Optimization - Support Vector (SSIM) indicators, and achieving more accurate 
Machine (PSO-SVM) for fuzzy removal and feature recognition of tennis serving actions can be achieved. 
extraction in TVI processing. The innovation of this The analysis of tennis serve movements is not only 
method lies in combining improved GAN and PSO important for improving athletes' technical level but 
algorithms to optimize the overall process of deblurring also has significant implications in sports training, 
and feature extraction, making it suitable for complex match judgment, and tactical development. Therefore, 
motion scenes. By dynamically adjusting the inertia this study is expected to provide effective support for 
weight, the convergence speed and accuracy of the PSO sports science and intelligent sports applications. 
algorithm have been improved, effectively enhancing Tennis is a popular competitive sport, divided into 
the performance of the model at the feature extraction singles and doubles forms. Participants throw the ball 
level. and hit it with a racket to make it land on the opponent's 
court. In tennis matches, serving is the way the game 
2  Methods and materials begins and an important part that determines the pace 
and strategy of the game. Serving is a highly technical 
2.1 Deblurring processing based on video action in tennis that involves multiple key elements, 
images including preparation, pitching, hitting, and swinging. 
Serving is not only a technical action but also a 
The goal of this research is to propose a method based 
strategic behavior. Athletes can choose different types 
on the combination of improved GAN and PSO-SVM 
of serve and make different choices based on their 
to improve the deblurring effect and feature extraction 
opponents' weaknesses and court conditions. Choosing 
ability of TVIs, thereby improving the accuracy of 
which side of the court to serve on can affect the 
tennis serve motion recognition. The research 
opponent's receiving angle and preparation. When 
specifically focuses on solving the problem of image  
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   3 
serving, the athlete's psychological state can also affect represents the true state of the athlete at the moment of 
their performance. At the same time, the serving serving. By restoring the original image, the serving 
behavior of athletes is influenced by various factors, behavior can be analyzed more accurately. K  is the 
including physical fitness, proficiency in serving skills, convolution kernel. In serve analysis, convolutional 
the athlete's judgment of the game progress, kernels can be used to simulate the trajectory of an 
understanding of the opponent, and scientific training athlete's swing and ball flight, aiming to understand the 
methods and feedback mechanisms. In terms of source of ambiguity. N  is additive noise, which may 
physical fitness, technical proficiency, and tactical be interference caused by device noise or changes in 
awareness, athletes need to continuously improve ambient light. Analyzing this noise can improve the 
through their own efforts. In terms of scientific training accuracy and reliability of serving moments.   is a 
methods and feedback mechanisms, this study believes convolution operation. In the analysis of serving 
that using video image processing technology can behavior, this operation helps identify and simulate the 
provide a detailed analysis of athletes' serving cause of blurring, thereby restoring the original image 
movements. By capturing the athlete's serve process at a and analyzing the behavior. In the context of blurry 
high frame rate, analyzing the movements in great images, the Camera Response Function (CRF) is an 
detail, and using computer vision algorithms to process important tool. It helps to understand and process image 
the video, it is possible to extract keyframes, measured data by describing how the camera converts light into 
serve angles, speeds, and technical elements. By using image pixel values, as given by equation (2) [10]. 
machine learning to analyze serving data, patterns and 
g(I 
performance characteristics of athletes during training S (i) ) = IS '(i)  (2) 
can be identified. g
In equation (2),  is the CRF approximation function, 
The occurrence of blurry video images in TVI can be 
which describes the conversion of light rays into pixel 
attributed to a number of factors, including the camera's 
values. The significance of CRF lies in its capacity to 
shutter speed being inadequate to capture fast-moving 
rectify discrepancies in image brightness resulting from 
objects during rapid motion, such as the swing of a 
disparate camera models or shooting conditions, 
racket or the trajectory of a ball. The shaking or 
thereby enhancing image uniformity and comparability. 
vibration of the shooting equipment can cause the entire 
 is a constant value, usually set to 2.2 by default. 
image to blur. Shooting tennis matches in low light IS (i)  is a potential clear image, representing the 
environments can also increase the risk of motion blur 
original unaffected image. It is an ideal image that can 
[9]. The fuzzy model of TVI can be represented by 
be generated through deblurring and restoration, 
equation (1). 
thereby helping to establish accurate feature 
I
I = K  I +N representations in image retrieval. S '(i)  is a clear 
B S   (1) I
image observed. The calculation of blurred image B  
I
In equation (1), B  is the blurred image, which is the is shown in equation (3). 
object that needs to be restored or studied during the  
I
analysis process. S  is the original image, which 
Random noise Generator Synthetic data Real sample
or
Discriminator
Forgery of samples
Real data
 
Figure 1: GAN structure. 
 
 
 
4   Informatica 49 (2025) 1–18                                                                       H. Cao 
1 M
IB = g(  IS (t ) ) (Z)
M the expected clear image.  represents possible 
t=1   (3) 
 fuzzy kernels. Due to the excellent performance of 
In equation (3), t  is the time in the video image. M  GAN, this study applies it to denoising sports images. 
is the number of clear frames used to generate blurry 
Figure 1 shows the GAN structure. 
images. In image retrieval, the quantity of M  affects 
In Figure 1, the task of the GAN generator is to receive 
the quality of blurry images. Collecting multiple clear 
blurry images as input and attempt to generate outputs 
images can help generate more stable and rich image 
corresponding to real clear images. The generator 
features. This study calculates the actual blurred image 
gradually adjusts its parameters by learning the 
using equation (4). 
mapping relationship between blurry and clear images 
1 T
IB = g(  I to achieve the goal of generating high-quality clear 
S (t )dt)
T t=0  (4) images. At the same time, the discriminator receives the 
 image and outputs a probability value representing the 
In equation (4), T  is the exposure time period, which probability that the input image is "real". The 
indicates the reception time of the light line in the discriminator strives to improve its judgment ability to 
captured image. In image retrieval, the corresponding accurately identify the differences between the 
exposure time can affect the brightness and details of generator's output and the real image [13-15]. The 
the image. This study adopts a non-blind deblurring objective function for the confrontation between the 
method for blurry images. This method can obtain generator and discriminator is shown in equation (6) 
information about the blur kernel used in the restoration . 
process of blurred images. The blur kernels may be the minG maxD V (D,G) = Ex~P [log D(x)]+ Ez~P [log(1−D(G(z)))]
data ( x ) z ( x )  (6) 
result of a number of factors, including camera motion, 
object motion, inaccurate focusing, and other causes P
In equation (6), x  is a real sample from data(x)  used 
[11-12]. This study assumes that the noisy and original 
images are Y  and X , and the blur kernel stands for to train the discriminator and help it learn how to 
Z . The non-blind deblurring process is given by 
equation (5). E
recognize the characteristics of clear images. x~ Pdata ( x )  
 
is the expectation for  
Xˆ , Zˆ 2
= arg min Z  X −Y +(X ) + (Z )
2  (5)  
(X )
In equation (5),  is the regularization term for 
X
X Identity
ReLU
Weight layer ReLU
X
F(X) ReLU Identity
Weight layer
X
F(X)+X ReLU
F(X1+2)
F(X1) F(X1+l)
(a) Traditional residual block (b) Structure diagram of 
structure diagram improved residual block  
Figure 2: Traditional residual block structure. 
 
clear input images. In the field of image retrieval, the calculating expectations, a more balanced approach to 
concept of "expectation" plays a pivotal role in the generation and discrimination can be achieved, 
development of effective models. By effectively ultimately enhancing the clarity and quality of both 
3*3Conv
Dropout
3*3Conv
3*3Conv
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   5 
D()
generated and blurred images.  is the output of blurred image through multiple convolutional layers. 
discriminator D . The discriminator continuously For TVI, important features include the trajectory of the 
optimizes its classification ability by comparing real ball and the movements of the athletes. Meanwhile, 
samples with generated samples. In image retrieval, the with the help of residual connections, the network can 
performance of the discriminator affects the quality of converge to the optimal solution faster. However, 
the generated images by the generator, which further traditional residual blocks typically contain fewer 
G()
affects the accuracy of the retrieval results.  is the convolutional layers or simpler structures, which may 
output of generator G . When the image retrieval limit the model's capacity to learn complex features and 
system faces a fuzzy query, the generator can transform details. In the face of more complex models, the lack of 
the fuzzy image into a clear image. Wasserstein GAN effective regularization may lead to overfitting. 
(WGAN) is a variant of GAN. In image deblurring Therefore, this study improves the traditional RBS, as 
tasks, gradient vanishing is a common problem due to shown in Figure 2 (b). The improvement of residual 
image degradation and other reasons, which affects the blocks in this study mainly includes increasing the 
convergence of the model. This study uses Wasserstein depth of convolutional layers, introducing multiple 
distance to quantify the difference between the activation functions, applying Dropout, implementing 
generator and the real data distribution, which can skip connection modules, and removing batch 
better handle the problem of gradient vanishing and normalization. The improved residual block consists of 
help the generator learn the data distribution better [16]. three convolutional layers, each using a 3x3 
To improve the deblurring effect of images, this study convolution kernel. This design enhances the expressive 
improves the structure of traditional residual blocks. power of  
The Residual Bock Structure (RBS) is displayed in  
Figure 2. 
In Figure 2 (a), RBS extracts feature from the input 
max max
x2 x2
min min
fitness fitness
x1 x1
(a) time 1 (a) time 2
 
Figure 3: Schematic diagram of PSO particle motion. 
 
the model, enabling it to capture more complex feature the basis for training network models. Generating 
representations. Introducing two ReLU activation 
samples is the key to successful high-quality image 
functions between two convolutional layers can 
accelerate convergence and help the model learn 
nonlinear features. The final skip connection module is retrieval. (Pdata , Pg )
 is the set of joint distributions 
retained to alleviate gradient vanishing and explosion 
problems, ensure the flow of important information, and P P
of data  and g . By comparing the joint distribution 
maintain the training stability of the model. Removing 
the batch normalization layer makes the model more of real samples and generated samples, the generator 
flexible during small batch training and reduces can more effectively capture the features of real data 
computational burden. The loss function is shown in 
when generating images, ensuring the similarity 
equation (7). 
between generated samples and real samples in the 
W (Pdata , Pg ) = inf E(x, y)~  x − y 
 ~ (Pdata ,Pg )
 (7) (x, y) ~ 
feature space.  is one of the samples, which 
In equation (7), x y
 and  are real samples and 
generated samples. In image retrieval, real samples are supports adversarial training between the generator and 
6   Informatica 49 (2025) 1–18                                                                       H. Cao 
discriminator, improving the model's generalization of the feature map. 
ability and optimization performance. inf  is the  
expected distance. The loss function of the model 2.2 Feature extraction and classification of 
generator is equation (8). video images based on PSO-SVM algorithm 
2 The above study extracts and recognizes key features of 
W H
1 i , j i , j  i , j (I S ) 
x, y
Lx =  video images by improving the residual network 
 
W H  ( B )
i , j i , j x=1 y=1 −i , j GG (I ) 
x, y  structure. Due to the complexity of dynamic scenes in 
 (8) 
tennis, video images often contain a large amount of 
information, resulting in the problem of information 

In equation (8), i , j  is a feature map that can capture redundancy between features, which has a negative 
impact on feature  
important semantic information and high-level features 
 
W
in the image. i, j H
 and i , j  are the width and height 
Intra population Inter population 
communication communication
Subpopulation 1 gbest1
Subpopulation 2 gbest1
sbest
Subpopulation N gbestN
 
Figure 4: MPSO structure diagram. 
 
extraction. To improve the effectiveness of feature each particle's current position is calculated to evaluate 
extraction, this study introduces PSO-SVM [17-18] into the quality of that position. The particle speed update is 
the model. In the PSO-SVM algorithm, PSO is shown in equation (9) [20]. 
responsible for feature extraction in tennis, while SVM  
is responsible for feature recognition and classification. V t+1 =V t
i 1 and() (pbest X t
i + c r  i − i )+ c2 rand()(gbest − X t
i )  (9) 
In PSO, individuals update their position and velocity to 
find the optima to the problem. The concept of  
employing PSO for the purpose of extracting features In equation (9),   is the inertia weight, which is 
from motion images entails the treatment of each frame 
within the image sequence as a particle. Through the c
usually a non-negative value. 1 c
 and 2  are 
optimization of the particle's motion trajectory, it is 
possible to selectively capture salient features of object acceleration factors. The position of the i -th particle is 
motion [19]. The PSO motion is shown in Figure 3. 
Figure 3 shows the motion of PSO particles. In the X
represented as vector i rand()
.  represents a 
initial stage of the PSO, a set of particles is generated at 
random, each representing a possible solution value. pbest
random number between [0, 1]. i
The motion of each particle mainly updates its current  is the optimal 
position and velocity. After each update, the fitness of 
...
...
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   7 
position of particle i . The best position of the 
X t+1 = X t +V t+1
i i i   (10) 
gbest
population is represented by . When the   The above process indicates that the PSO owns the 
characteristics of simplicity, ease of implementation, 
value is large, the global Search Ability (SA) is strong 
and strong universality. However, due to the fact that 
and the local SA is weak. When the   value is low, particles in PSO belong to the same population, it is 
the local SA is strong and the global SA is weak. The easy to form local optimal positions, resulting in the 
expression for particle position is shown in equation inability to obtain global optimal solutions and the 
problem of excessive dependence on  
(10). 
 
ndary
Bou
Hyperplane
Boundary
Interval
 
Figure 5: SVM optimal classification hyperplane. 
 
parameters. In response to the above issues, this study V t+1 =V t + c  rand  pbest t − X t
i id 1 () ( id id )+
adopts a combination of multiple groups and adaptive           c2  rand ()[(gbest t X t
kd − id ) + (1−)(sbest t t
d − X id )]
adjustment of acceleration coefficients to optimize PSO 
and proposes the MPSO optimization algorithm [21-22].  (11) 
The MPSO algorithm can effectively improve the  
division of labor and cooperation among populations. In pbest t
In equation (11),   is the inertia weight. id
MPSO, a population contains multiple sub-populations, s e t t  
b s
and id  are the historical and global best position 
which in turn contain multiple particles. The MPSO pbest t
of particle i  and particle swarm. kd  is the 
algorithm structure is shown in Figure 4. c
optimal position of sub-population k  so far. 1,c2  
In Figure 4, each sub-population is a complete rand()
are the acceleration factors.  is a random 
communication and interaction system. All particles number between [0, 1].   is the classification 
within the population can communicate. During accuracy of the sample.   is the number of 
sub-populations. Due to the impact of inertia weight on 
algorithm iteration, it is required to discover the 
gbest the balance of local and global SAs, the value of inertia 
optimum  for each subgroup, and then find the 
weight is crucial in PSO algorithm. However, the 
optimal solution sbest  for the entire particle swarm, control effect of fixed inertia weights on global and 
sbest =max(gbest , gbest
which is 1 2 ,..., gbestN )
. N  is local search capabilities is limited. Accordingly, this 
the number of sub-populations. Therefore, in MPSO, study employs a linear differential descent method to 
the velocity update formula for particles is shown in dynamically adjust the inertia weight, thereby 
enhancing the algorithm's overall SA in the initial stage 
equation (11). 
of iteration and enhancing its local SA in the 
 
subsequent phase of iteration. The dynamically adjusted 
 inertia weight is shown in equation (12). 
 
8   Informatica 49 (2025) 1–18                                                                       H. Cao 
( − ) t 2 (x , y ) x y
(t) =  − max min size of i i . i  is the input vector. i  is the 
max
t 2
max  (12) 
corresponding output target. The SVM model initially 
employs a high-dimensional mapping feature space, 
t
In equation (12), t  is the current iteration count. max  
which facilitates the identification of a superior 
hyperplane for the separation of diverse categories of 

is the maximum iterations. max  is the initial inertia 
data. Subsequently, it utilizes linear functions within the 
feature space for function approximation. According to 

weight, which is set to 0.9 in the study. min  is the 
statistical theory, the SVM minimum optimization 
inertia weight at the maximum iteration count, set to 0.4. objective function yields a fitted regression function as 
The improved particle velocity update formula is shown shown in equation (14) [28]. 
in equation (13). 1 2 n
min(W ,b) : W +C yi −[W ,(X ) −b]
2 i=1
V t+1 (t) V t
id c1 rand () ( pbest t t
i =  +   id − X id )+  (14) 
          c2  rand ()[(gbest t t ) (1 )( t t
kd − X id + − sbestd − X id )] In equation (14), W  is vector data. b  is the function 
 (13) y
threshold.  is the function value after dot product 
(t) (X )
In equation (13),  is the dynamically adjusted processing.  is an approximation function. C  
inertia weight. The above is a feature model is the penalty coefficient for training model complexity 
construction built on the improved PSO. The MPSO and controlling model loss. The classification of the 
first randomly divides the particle swarm into K  model can be completed through equation (14). This 
sub-populations and initializes the sub-population model can effectively solve problems regardless of the 
particles randomly. Then, the individual extremum of sample size or whether the linear fitting conditions are 
the particles, the optimal value of each sub-population, met, and due to its global strategy, it will not be unable 
and the global optima of the entire PSO are selected. to obtain the optimal solution due to local optima. In the 
The next step is to determine the optimal value found in PSO-SVM model, particles dynamically adjust their 
the search. If the conditions are met, running is stopped. position and speed through interactions to efficiently 
Otherwise, the speed and location of the particles are capture features in video frames. Each particle 
continued to update, and the best value is selected to represents a potential solution whose position 
continue running the algorithm until the conditions are corresponds to the selection of key attributes in feature 
met. Finally, the optimal solution of the optimization extraction. The fitness value of the particle is evaluated 
problem is outputted. After extracting features from based on the classification accuracy feedback. 
video images as described above, this study uses SVM Additionally, the particle is able to learn which features 
for feature recognition and classification [23-25]. SVM are most relevant in tennis serve recognition by 
is an algorithm that maps low dimensional data to comparing with the historical optimal position and the 
high-dimensional data to minimize functional. The global optimal position. The specific particle 
model is defined as shown in Figure 5. parameters, including inertial weights and acceleration 
In Figure 5, SVM is a binary classification model. This factors, influence the exploration and development 
model is the nonlinear classifier with the largest interval capabilities of particles, which assist in balancing 
in the feature space. The learning strategy of SVM is to global search and local search in complex feature 
maximize the interval, which can be formalized as a spaces. Ultimately, this improves the accuracy and 
convex quadratic programming problem. This is recognition rate of feature extraction. 
equivalent to the minimization problem of a regularized  
hinge loss function [26-27]. The learning algorithm of  
SVM is the optimization algorithm for solving convex  
quadratic programming. This study assumes a sample 
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   9 
learn from non-linear features. By removing batch 
3  Results normalization, the study enhances the flexibility of the 
generator and reduces the introduction of noise. It is 
3.1 Analysis of deblurring effect based on imperative to ensure that each algorithm is configured 
video image processing and PSO-SVM with identical initial settings to minimize variation. 
To verify the performance and effectiveness of the Furthermore, the number of iterations must be trained 
research algorithm, this study conducts comparative up to 3,000 times. The learning rate of GAN is set to 
experiments for analysis. The comparative methods 0.0002. The learning rate of PSO is set to 0.5, and the 
include GAN, Attention Mechanism-GAN (AGAN), number of PSO particles is 100. The initial value of 
and Multi-Scale Convolution (MSC) algorithm. In the inertia weight is 0.9 and dynamically adjusted to 0.4. 
improved GAN structure, the number of convolutional The acceleration factor is set to 2.0. Performance 
layers has been improved to enhance the extraction evaluation indicators include relative subjective and 
capability of intricate features. Each convolutional layer objective indicators. The relative subjective indicator is 
employs a 3×3 convolution kernel to facilitate the the subject's evaluation of image quality. Objective 
capture of image details with greater precision. In indicators are selected as PSNR and SSIM. The higher 
addition, ReLU and Leaky ReLU activation functions the PSNR value, the closer the SSIM value is to 1, 
are used interchangeably, aiming to avoid the "dead indicating  
neuron" phenomenon and improve the model's ability to  
100 100
Proposed method AGAN Proposed method AGAN
GAN GAN
90 MSC MSC
90
80 80
70 70
60 60
50 50
40 40
0 5 10 15 20 25 30 0 5 10 15 20 25 30
Number of tests Number of tests
(a) Ball games image (b) Athletic sports images
 
Figure 6: Subjective evaluation score results of the model. 
 
37 Proposed method AGAN 37
Proposed method AGAN
34 GAN MSC 34
GAN MSC
31 31
28 28
25 25
22 22
19 19
16 15 25 50 16 15 25 50
Noise standard deviation Noise standard deviation
(a) Ball games image (b) Athletic sports images
 
Figure 7: Comparison of average PSNR of different deflurring algorithms. 
 
 
Average PSNR (dB) Subject rating
Average PSNR (dB) Subject rating
10   Informatica 49 (2025) 1–18                                                                       H. Ca
o 
better image quality, that is, better deblurring effect. score of the deblurred image is 86.94. The average 
The experiment uses sports images from the GOPRO scores of AGAN, GAN, and MSC do not exceed 75. To 
dataset for testing. This dataset divides images into ball compare the PSNR and SSIM performance, this paper 
sports and track and field sports. In subjective adds noise with standard deviations of 15, 25, and 50 to 
evaluation, the model score results are shown in Figure the test original images, respectively, to generate test 
6. images for testing the algorithm's deblurring ability. 
Figures 6 (a) and (b) show the subjective evaluation Following the implementation of a research adjustment, 
scores of subjects on the deblurring effect of images in the image quality has been markedly enhanced, as 
ball sports and track and field sports. In Figure 6 (a), the evidenced by elevated PSNR and SSIM values. This 
deblurred image under the research method scores the signifies that the generated image is more closely 
highest, with an average score of 81.16 out of 30 aligned with the authentic image in terms of structural 
experiments. In the evaluation of 50 subjects, the similarity and clarity. It is particularly well-suited for 
research method shows better deblurring effect on ball the processing of dynamic motion scenes. Figure 7 is a 
sports images. In Figure 6 (b), the subjective evaluation boxplot of the mean PSNR of various algorithms. 
obtained by the research method is better, and the mean  
1.0 1.0
0.9 0.9
0.8 0.8
0.7 0.7
0.6 0.6
Proposed method AGAN Proposed method AGAN
0.5 GAN MSC 0.5
GAN MSC
0.4 15 25 50 0.4 15 25 50
Noise standard deviation Noise standard deviation
(a) Ball games image (b) Athletic sports images
 
Figure 8: Average SSIM of different deblurring algorithms. 
 
Figures 7 (a) and (b) show the average PSNR 3.2 Feature extraction based on video image 
performance of algorithms in image deblurring tests for 
processing and PSO-SVM model 
ball sports and track and field sports. The PSNR values 
To validate the proposed sports image feature extraction 
of the research methods are consistently the highest. 
algorithm, this study selects the difficult to solve single 
Compared with AGAN, MSC, and GAN, the research 
peak function Rosenbrock and the easy to trap 
method increases the average PSNR by 13.56%, 
algorithm in local optima multi-peak function Griebank 
15.02%, and 18.29% in Figure 7 (a), and by 12.82%, 
as the standard test functions. To verify the 
14.02%, and 22.72% in Figure 7 (b), respectively. This 
convergence performance of MPSO, this study 
indicates that the difference between the original and 
compares PSO, Ant Colony Optimization (ACO), and 
the deblurred images is smaller under the research 
Simulated Annealing (SA). In the experiment, all 
method, indicating that the deblurring effect is better. 
algorithms are set with the same common parameters, 
Figure 8 shows the mean SSIM boxplots of four 
namely population size and dimensionality. The 
algorithms. 
iterations are 3000. Each algorithm is independently run 
In the deblurring tests of ball sports and track and field 
100 times on each test function, and statistical analysis 
sports images in Figures 8 (a) and (b), the SSIM values 
is conducted on the results of the 100 runs. Figure 9 
of the research method are the highest, and the overall 
shows the convergence curve of the obtained algorithm. 
performance is more stable. Compared with AGAN, 
Figure 9 shows the convergence curves of the algorithm 
MSC, and GAN algorithms, in Figure 8 (a), the average 
on Griebank and Rosenbrock. MPSO achieves the 
SSIM of the research method increases by 8.33%, 
highest average accuracy in the shortest number of 
12.24%, and 19.90%, while in Figure 8 (b), it increases 
iterations. In Figure 9 (a), at 850 iterations, MPSO 
by 12.34%, 19.38%, and 22.20%. Overall, the research 
approaches convergence with an average accuracy of 
method has the best deblurring effect, followed by 
97.31%. Compared to the other three algorithms, 
AGAN, MSC, and finally GAN. The experimental 
MPSO has reduced the number of iterations during 
results validate the effectiveness of this study. 
convergence by 35.33%, 
 
Average SSIM
Average SSIM
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   11 
 
99 95 PSO ACO
98 94 SA MPSO
PSO ACO
97 SA MPSO 93
96 92
95 91
94 90
93 89
92 88
90 87
0 500 1000 1500 2000 2500 3000 0 500 1000 1500 2000 2500 3000
Number of iterations Number of iterations
(a) Test results on Griewank (b) Test results on Rosenbrock
 
Figure 9: Comparison of algorithm convergence curves. 
 
1.0 1.0
0.8 0.8
0.6 0.6
0.4 ACO 0.4 ACO
MPSO MPSO
0.2 PSO 0.2 PSO
SA SA
0.0 0.0
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
Recall Recall
(a) Ball game images (b) Athletic sports images
 
Figure 10: Comparison of algorithm PR curve. 
 
40.52%, and 51.55%. In Figure 9 (b), after 1100 respectively. In Figure 10 (b), MPSO has the best 
iterations, MPSO approaches convergence with an feature extraction performance, while ACO and SA 
average accuracy of 92.94%. In contrast, the number of have similar performance. ACO has the relatively worst 
iterations during MPSO convergence decreases by feature extraction performance. Finally, in sports image 
25.37%, 26.54%, and 41.89%. This verifies that the feature extraction, feature extraction time is also a 
iteration speed and accuracy of MPSO are superior to commonly used evaluation metric, which can be used to 
SA, ACO, and PSO. To validate the performance of determine the efficiency of feature extraction methods. 
various feature extraction algorithms, this study tests  
the Precision-Recall (PR) curves of different algorithms, 
3.3 Behavior analysis based on video image 
as exhibited in Figure 10. 
Figure 10 shows the PR curves of various algorithms in processing and PSO-SVM Model 
the images of ball sports and track and field sports. In To verify the effectiveness of the research  
Figure 10 (a), the performance of each algorithm from  
best to worst is MPSO, SA, ACO, and PSO, 
Precision Average accuracy (%)
Precision Average accuracy (%)
12   Informatica 49 (2025) 1–18                                                                       H. Ca
o 
100 100
90 90
80 80
70 70
60 60
50 Research method 50 Research method
40 LR 40 LR
30 MLP 30 MLP
20 ELM 20 ELM
10 10
0 0
0 50 100 150 200 250 0 50 100 150 200 250
Number of iterations Number of iterations
(a) Prepare posture behavior analysis accuracy (b) Analysis accuracy of service swing action  
Figure 11: Analysis accuracy of preparation posture and serving and swinging behavior. 
 
100 100
Research method
90 90
80 80
70 70
60 60
50 50 Research method
40 40 LR
LR
30 30 MLP
20 MLP 20 ELM
10 ELM 10
0 0
0 50 100 150 200 250 0 50 100 150 200 250
Number of iterations Number of iterations
(a) Batting action behavior analysis accuracy (b) With wave as behavior analysis accuracy  
Figure 12: Accuracy analysis of hitting action and swing action behavior. 
 
model in analyzing action behavior in video images, swing" behavior. The accuracy of the research 
this study collects match videos of tennis players with a algorithm reaches 90.4%, which is about 1% higher 
total length of 5.6 hours. This study categorizes the than MLP. This indicates that the comparative 
serving actions of tennis players in videos into five algorithm may have difficulty capturing complex 
behaviors, including preparation posture, serving swing, motion features due to the limitations of its linear model, 
hitting action, swing action, and recovery posture. This and may still be insufficient in capturing diverse and 
study sets the SVM related parameters, with a penalty delicate features. The research method adopts advanced 
coefficient of 1 and SVM error accuracy of 0.001. The feature extraction techniques, which can better identify 
experiment recognizes five types of actions and uses and classify a small number of posture changes. Figure 
Logistic Regression (LR), Extreme Learning Machine 12 shows the accuracy analysis results of hitting and 
(ELM), and Multi-Layer Perceptron (MLP) as swinging movements in tennis serving behavior. 
comparison methods. The analysis results of the In Figure 12 (a), the behavior analysis accuracy of the 
preparation posture and serve swing behavior are shown research algorithm reaches 82.8%, which is 33.6% 
in Figure 11. higher than LR, 20.8% higher than ELM, and 24.5% 
Figure 11 (a) shows the accuracy results of the higher than MLP. In Figure 12 (b), the behavior 
"preparation posture" behavior analysis. The behavior analysis accuracy of all four methods reaches over 98%. 
analysis accuracy of the research algorithm has reached Therefore,  
97.2%, which is 5.84%, 7.62%, 8.04%, and 8.16%  
higher than LR, ELM, and MLP methods. Figure 11 (b) 
shows the accuracy results of the analysis of the "serve 
Accuracy rate /% Accuracy rate /%
Accuracy rate /% Accuracy rate /%
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   13 
100
90 Research method
80
70
60
50
40
Logistic regression
30
20 Multilayer perceptron
10 Extreme learning machine
0
0 50 100 150 200 250
Number of iterations
 
Figure 13: Accuracy of posture recovery behavior analysis. 
 
Ready Ready 
0.97 0.03 0 0.01 0 0.94 0.05 0 0.02 0
position position
Service 0.11 0.87 0.01 0 0.02 Service 0.10 0.86 0.01 0 0.04
swing swing
Stroke Stroke 
0 0 0.82 0 0.18 0.03 0.32 0.57 0 0.09
action action
Swing Swing 
0 0 0.01 0.99 0 0.02 0 0 0.98 0
action action
Postural Postural 
0 0 0.16 0 0.84 0.01 0.20 0.07 0.71
recovery recovery
Ready Service Stroke Swing Postural Ready Service Stroke Swing Postural 
position swing action action recovery position swing action action recovery
(a) Research method (b) Multilayer perceptron  
Figure 14: Confusion matrix results of different algorithms. 
 
the research algorithm significantly improves the is some overlap and confusion in the recognition 
recognition ability of "hitting action", possibly due to between these two actions. This phenomenon may be 
its sensitivity to action details and dynamic changes. attributed to the fact that the player's posture during the 
Figure 13 shows the accuracy results of posture act of serving is analogous to that observed during the 
recovery analysis in tennis serving behavior. subsequent recovery phase. This results in an 
In Figure 13, the analysis accuracy of the research insufficient degree of feature extraction, which in turn 
algorithm for "posture recovery" behavior reaches hinders the ability to distinguish between the two 
84.7%, LR is 58.2%, ELM is 62.3%, and MLP is 70.3%. movements. To reduce misclassification, it would be 
The research algorithm has obvious advantages in beneficial to consider optimizing the classifier threshold 
behavior analysis, indicating that it has better feature in SVM to adjust the decision boundary, thereby 
extraction and action classification performance. This improving the ability to distinguish between these 
study proposes methods to analyze the recognition actions. In Figure 14 (b), the MLP method has lower 
performance of MLP, as shown in Figure 14. recognition performance than the research method in 
Figure 14 shows the confusion matrix results of the dynamic actions. Although its dynamic actions have 
research method and MLP. In Figure 14 (a), the overall significant interference, the recognition accuracy of the 
recognition accuracy of the research method reaches research method is over 80%. This indicates that the 
91.24%, and in the dynamic classification effect, the research model owns good anti-interference ability and 
main manifestation is mutual interference. There is an high recognition accuracy in action recognition. This 
18% probability that the ''hitting action'' will be study compares the computational efficiency of 
misidentified as ''posture recovery''. There is a 16% different models. The details are listed in Table 2, using 
chance that the "posture recovery" action will be model runtime, GPU usage, and memory usage as 
misidentified as a "hitting action". This shows that there evaluation metrics. 
 
 
 
 
Accuracy rate /%
14   Informatica 49 (2025) 1–18                                                                       H. Ca
o 
Table 2: Comparison results of model calculation efficiency. 
Algorithm Run time (s) GPU Usage (%) Memory usage (MB) 
Research method 45.2 75.3 512 
AGAN 55.6 80.1 600 
GAN 62.8 82.4 650 
MSC 50.4 78.5 580 
LR 40.1 72.0 500 
ELM 42.3 71.5 490 
MLP 43.0 73.2 495 
  
In Table 2, the research method shows superior and SSIM of the research method are 35.12 dB and 0.93, 
performance in terms of running time, significantly respectively, indicating a robust defuzziness effect and 
reducing computation time compared to other the capacity to preserve structural details. In comparison, 
algorithms. Its GPU usage rate is 75.3%, which is the corresponding indexes of RNN and CNN-LSTM are 
relatively lower compared to algorithms such as 32.5 dB and 0.87, and 33.8 dB and 0.89, respectively. It 
AGAN and WGAN, indicating that the research model shows that the latter is relatively weak in image quality. 
is more efficient in utilizing computing resources. In In addition, the research method has a relatively low 
terms of memory usage, the research method has also runtime and memory usage (45.2s and 512MB), showing 
shown good optimization ability, maintaining a usage better computational efficiency. Overall, these results 
of 512MB, which is reduced compared to other validate the advances in accuracy and efficiency of the 
methods. To further reduce processing requirements, research method, indicating its application potential in 
this study can implement effective memory complex dynamic scenarios. 
management strategies to optimize the allocation of 
4  Discussion 
computing resources, thereby reducing memory 
consumption and improving computing speed. For When deblurring TVI, the average subjective score of the 
example, small batch processing and dynamic memory research method in the GOPRO dataset was 81.16 points, 
allocation. These strategies not only improve the and the PSNR value increased by 13.56%, 15.02%, and 
operational efficiency of the model but also ensure that 18.29% compared to AGAN, MSC, and traditional GAN. 
good performance can be maintained when processing The potential benefit of studying the model lies in 
large-scale data. The research method is further optimizing the adversarial learning mechanism between 
compared with the Recurrent Neural Network (RNN) the generator and discriminator. This could result in the 
and Convolutional Neural Network Ensemble Long generator paying greater attention to image details and 
Short-term Memory Network (CNN-LSTM) models. effectively reducing blurring phenomena. By introducing 
The results are shown in Table 3. multiple convolutional layers and activation functions, the 
The results in Table 3 show that the research method model's ability to learn complex features was enhanced, 
outperforms RNN and CNN-LSTM in several thereby improving the clarity and realism of generated 
performance indexes. Specifically, the accuracy of the images. This study demonstrated significant advantages 
research method is as high as 91.24%, which is in feature extraction using PSO-SVM. The accuracy of 
significantly higher than the 85.1% of RNN and 87.3% the MPSO algorithm reached 97.31%, and its 
of CNN-LSTM. In terms of image quality, the PSNR  convergence speed improved by 35.33%, 40.52%, and 
 51.55
Table 3: Comparative results of advanced nature 
Accuracy PSNR SSI Run time GPU Usage Memory Usage 
Method 
(%) (dB) M (s) (%) (MB) 
Research 
91.24 35.12 0.93 45.2 75.3 512 
method 
RNN 85.1 32.5 0.87 55.0 78.2 520 
CNN-LSTM 87.3 33.8 0.89 50.5 76.5 510 
  
  
  
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   15 
% Compared to traditional PSO, ACO, and SA. The 
advantage of the research method lied in the 5  Conclusion 
introduction of multiple population PSO strategies, 
With the rapid growth of video technology, the 
which enable different sub-populations to share 
processing and behavior analysis of TVI have become 
information and optimal solutions. By using linear 
increasingly important. To address the challenges of 
differential descent, the inertia weight of particles was 
video image blurring and feature extraction, this paper 
dynamically adjusted, which could enhance the global 
constructed a comprehensive method built on 
SA in the early stage and local SA in the later stage, 
improved GAN and PSO-SVM algorithms. For fuzzy 
thus improving the stability and efficiency of the 
processing, the improved GAN significantly improved 
feature extraction process. In the action recognition 
the ability to recover details through the introduction 
experiment, the recognition accuracy of the research 
of multiple convolutional structures and various 
method reached 91.24%, and in feature recognition 
activation functions. In terms of feature extraction, a 
such as hitting action and posture recovery, it was 
combination of PSO-SVM algorithm was adopted, 
higher than classical algorithms such as LR, ELM, and 
which integrated the advantages of PSO and SVM to 
MLP. The recognition accuracy for the "preparation 
further enhance the efficiency and accuracy of feature 
posture" behavior reached 97.2%, while the 
extraction. The experimental test results demonstrated 
recognition accuracy for the "hitting action" was as 
that the research method could more accurately 
high as 82.8%. By improving the residual network 
capture important features in motion images and 
structure, the model could effectively extract deep 
effectively reduce blurring phenomena. Moreover, the 
dynamic features from action sequences, enhancing its 
model could maintain high accuracy even in fast 
ability to capture complex motion trajectories. This 
dynamic scenarios. This indicates that the research 
enabled the action classification model to more 
model has strong anti-interference ability, which is 
accurately identify various behaviors of different 
helpful for accurate behavior analysis in complex 
athletes. In similar studies, Fréjus et al. proposed a 
environments. Although this study has achieved 
behavior recognition model based on neural networks, 
significant success in improving the quality of video 
which showed good recognition accuracy in posture 
image processing, there are still some shortcomings. 
recognition [29]. Luo Z et al. proposed a behavior 
Under complex background interference, the 
recognition model grounded on multi-layer LSTM, 
recognition performance of the model may be limited, 
which demonstrated good performance in specific 
and improving the algorithm's performance in 
applications [30]. However, the above research cannot 
complex background scenes is still a problem to be 
effectively capture small differences between actions 
solved. Therefore, future research directions can focus 
when dealing with complex dynamic sequences, 
on combining deep learning and reinforcement 
which can easily lead to misidentification in similar 
learning methods to enhance processing capabilities 
actions. Compared with it, the research method can 
for complex dynamic scenes. 
comprehensively handle complex dynamic scenes and 
has high flexibility and robustness, providing a new References 
approach for motion behavior analysis. 
In complex dynamic scenes, the recognition [1]  Deqiang Cheng, Jiansheng Qian, Xingge Guo, 
challenges of moving images are often caused by Qiqi Kou, Feixiang Xu, Jun Gu, Yachao Gao, 
motion blur, background interference, and and Jinsheng Zhao. Review on key technologies 
illumination changes. The proposed model shows of AI recognition for videos in coal mine. Coal 
good robustness in various complex situations, mainly science and technology, 51(2):349-365, 2023. 
due to the design of adaptive feature extraction and https://doi.org/10.13199/j.cnki.cst.2022-0359 
adversarial learning mechanism. The introduction of [2] Chandravva Hebbi, and H. R. Mamatha. 
PSO algorithm makes the feature extraction process Comprehensive dataset building and recognition 
more flexible and adaptive. When dealing with of isolated handwritten kannada characters using 
dynamic scenes, the particle can dynamically adjust its machine learning models. Artificial intelligence 
position to capture the key movement trajectory of the and applications, 1(3):179-190, 2023. 
player and the ball, which improves the relevance and https://doi.org/10.47852/bonviewAIA3202624 
usability of the feature. By optimizing the adversarial [3]  Teymoor Ali, Deepayan Bhowmik, and Robert 
training process between generator and discriminator, Nicol. Domain-specific optimisations for image 
the enhanced GAN model is capable of not only processing on FPGAs. Journal of signal 
generating high-quality images but also of producing processing systems, 95(10):1167-1179, 2023. 
more robust mapping relationships in the feature https://doi.org/10.1007/s11265-023-01888-2 
extraction process. This mechanism serves to mitigate [4]  Yunzhong Hou, Zhongdao Wang, Shengjin 
the impact of noise and background changes on the Wang, and Liang Zheng. Adaptive affinity for 
results, thereby enhancing the overall robustness of the associations in multi-target multi-camera 
model. tracking. IEEE transactions on image processing, 
16   Informatica 49 (2025) 1–18                                                                       H. Ca
o 
31(10):612-622, 2021. and Anthony Thomas Herdman. Generative 
https://doi.org/10.48550/arXiv.2112.07664 adversarial network (GAN) for simulating 
[5]  Lintong Zhang, David Wisth, Marco Camurri, electroencephalography. Brain topography, 
and Maurice Fallon. Balancing the budget: 36(5):661-670, 2023. 
Feature selection and tracking for multi-camera https://doi.org/10.1007/s10548-023-00986-5 
visual-inertial odometry. IEEE robotics and [14] Angelo Lorusso, Barbara Messina, and Domenico 
automation letters, 7(2):1182-1189, 2021. Santaniello. The use of generative adversarial 
https://doi.org/10.48550/arXiv.2109.05975 network as graphical support for historical urban 
[6]  Qinglong Ding, and Zhenfeng Ding. Machine renovation. ICGG 2022 - proceedings of the 20th 
learning model for feature recognition of sports international conference on geometry and 
competition based on improved TLD algorithm. graphics, 146(1):738-748, 2022. 
Journal of intelligent & fuzzy systems, https://doi.org/10.1007/978-3-031-13588-0_64 
40(2):2697-2708, 2021. [15] Deepa Kumari, S. K. Vyshnavi, Rupsa Dhar, B. S. 
https://doi.org/10.3233/JIFS-189312 A. S. Rajita, Subhrakanta Panda, and Jabez 
[7]  Jiaxu Zhang, Gaoxiang Ye, Zhigang Tu, Yongtao Christopher. Smart GAN: A smart generative 
Qin, Qianqing Qin, Jinlu Zhang, and Jun Liu. A adversarial network for limited imbalanced 
spatial attentive and temporal dilated (SATD) dataset. The journal of supercomputing, 
GCN for skeleton‐based action recognition. 80(14):20640-20681, 2024. 
CAAI transactions on intelligence technology, https://doi.org/10.1007/s11227-024-06198-3 
7(1):46-55, 2022. [16] Yaxiang Fan, Gongjian Wen, Fei Xiao, Shaohua 
https://doi.org/10.1049/cit2.12012 Qiu, and Deren Li. Detecting anomalies in videos 
[8]  Xiaoguang Zhu, Ye Zhu, Haoyu Wang, Honglin using perception generative adversarial network. 
Wen, Yan Yan, and Peilin Liu. Skeleton Circuits, systems, and signal processing, 
sequence and RGB frame based multi-modality 41(2):994-1018, 2022. 
feature fusion network for action recognition. https://doi.org/10.1007/s00034-021-01820-8 
ACM transactions on multimedia computing, [17] Canran Zhang, Jianping Dou, Shuai Wang, and 
communications, and applications (TOMM), Pingyuan Wang. Hybrid particle swarm 
18(3):1-24, 2022. optimization algorithms for cost-oriented robotic 
https://doi.org/10.48550/arXiv.2202.11374 assembly line balancing problems. Robotic 
[9]  Arti Ranjan, and M. Ravinder. intelligence and automation, 43(4):420-430, 
VAEWGAN-NCO in image deblurring 2023. 
framework using variational autoencoders and https://doi.org/10.1108/RIA-07-2022-0178. 
Wasserstein generative adversarial network. [18] Naveed ur Rehman, and Muhammad Uzair. 
Signal, image and video processing, Concentrator shape optimization using particle 
18(5):4447-4456, 2024. swarm optimization for solar concentrating 
https://doi.org/10.1007/s11760-024-03085-5 photovoltaic applications. Renewable energy, 
[10] Chuang Li, and Zhizhong Mao. Generative 184(5):1043-1054, 2022. 
adversarial network-based real-time temperature https://doi.org/10.1016/j.renene.2021.12.015 
prediction model for heating stage of electric arc [19] Yue Li, Jianfang Qi, Xiaoquan Chu, and Weisong 
furnace. Transactions of the institute of Mu. Customer segmentation using K-means 
measurement and control, 44(8):1669-1684, clustering and the hybrid particle swarm 
2022. optimization algorithm. The computer journal, 
https://doi.org/10.1177/01423312211052213 66(4):941-962, 2022. 
[11] Yupeng Song, Xu Hong, Jiecheng Xiong, Jiaxu https://doi.org/10.1093/comjnl/bxab206 
Shen, and Zekun Xu. Probabilistic modeling of [20] Vahid Goodarzimehr, Fereydoon Omidinasab, 
long-term joint wind and wave load conditions and Nasser Taghizadieh. Optimum design of 
via generative adversarial network. Stochastic space structures using hybrid particle swarm 
environmental research and risk assessment, optimization and genetic algorithm. World 
37(2):2829-2847, 2023. journal of engineering, 20(3):591-608, 2023. 
https://doi.org/10.1007/s00477-023-02421-4 https://doi.org/10.1108/WJE-05-2021-0279 
[12] Zhiwu Shang, Jie Zhang, Wanxiang Li, Shiqi [21] Fanyi Duanmu, Dian Ning Chia, and Eva 
Qian, Jingyu Liu, and Maosheng Gao. A novel Sorensen. A combined particle swarm 
small samples fault diagnosis method based on optimization and outer approximation 
the self-attention wasserstein generative optimization strategy for the optimal design of 
adversarial network. Neural processing letters, distillation systems. Computer aided chemical 
55(5):6377-6407, 2023. engineering, 49(3):1315-1320, 2022. 
https://doi.org/10.1007/s11063-022-11143-7 https://doi.org/10.1016/B978-0-323-85159-6.502
[13] Priyanshu Mahey, Nima Toussi, Grace Purnomu, 19-0 
Improved Generative Adversarial Network and Particle Swarm…                          Informatica 49 (2025) 1–18   17 
[22] Yongjie Zhu, Jiajun Chen, Ling Mao, and Jinbin [30] Zhenmin Luo, Lidong Zhang, and Zeyang Song. 
Zhao. A noise-immune model identification Multistep prediction of CO in the extraction zone 
method for lithium-ion battery using two-swarm based on a fully connected long short-term 
cooperative particle swarm optimization memory network. Journal of tsinghua university 
algorithm based on adaptive dynamic sliding (science and technology). 64(6):940-952, 2024. 
window. International journal of energy research, https://doi.org/10.16511/j.cnki.qhdxxb.2024.22.0
46(3):3512-3528, 2022. 11 
https://doi.org/10.1002/er.7401  
[23] Min Gi, Shugo Suzuki, Masayuki Kanki,  
Masanao Yokohira, Tetsuya Tsukamoto, Masaki  
Fujioka, Arpamas Vachiraarunwong, Guiyu Qiu,  
Runjie Guo, and Hideki Wanibuchi. A novel  
support vector machine-based 1-day, single-dose  
prediction model of genotoxic  
hepatocarcinogenicity in rats. Archives of  
toxicology, 98(8):2711-2730, 2024.  
https://doi.org/10.1007/s00204-024-03755-w  
[24] Vamsi Alla, Upendra Kumar Sahoo, and Rabi  
Narayan Behera. Seismic liquefaction analysis of  
MCDM weighted SPT data using support vector  
machine classification. Iranian journal of science  
and technology, transactions of civil engineering,  
48(4):2293-2303, 2024.  
https://doi.org/10.1007/s40996-023-01293-6  
[25] Laith Abualigah, Saba Hussein Ahmed,  
Mohammad H. Almomani, Raed Abu Zitar, Anas  
Ratib Alsoud, Belal Abuhaija, Essam Said  
Hanandeh, Heming Jia, Diaa Salama Abd  
Elminaam, and Mohamed Abd Elaziz. Modified  
aquila optimizer feature selection approach and  
support vector machine classifier for intrusion  
detection system. Multimedia tools and  
applications, 83(21):59887-59913, 2024.  
https://doi.org/10.1007/s11042-023-17886-2  
[26] Ning Chu, Weimin Kang, Xinhua Yao, and  
Jianzhong Fu. Online roundness prediction of  
grinding workpiece based on vibration signals  
and support vector machine. The international  
journal of advanced manufacturing technology,  
126(5/6):2733-2743, 2023.  
https://doi.org/10.1007/s00170-023-11206-6  
[27] Hossein Moosaei, Ahmad Mousavi, Milan Hladík,  
and Zheming Gao. Sparse L1-norm quadratic  
surface support vector machine with Universum  
data. Soft computing, 27(9):5567-5586, 2023.  
https://doi.org/10.1007/s00500-023-07860-3  
[28] Jagadeesh Basavaiah, and Audre Arlene Anthony.  
A pragmatic approach for infant cry analysis  
using support vector machine and random forest  
classifiers. Wireless personal communications,  
137(4):2269-2280, 2024.  
https://doi.org/10.1007/s11277-024-11491-8  
[29] Fréjus A. A. Laleye, and Mikaël A. Mousse.  
Attention-based recurrent neural network for  
automatic behavior laying hen recognition.  
Multimedia tools and applications,  
83(22):62443-62458, 2024.  
https://doi.org/10.1007/s11042-024-18241-9  
18   Informatica 49 (2025) 1–18                                                                       H. Ca
o 
 
 
 
https://doi.org/10.31449/inf.v49i12.7117                                                                                              Informatica 49 (2025) 19–34   19 
 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through 
Optimal Antecedent Selection Using the Fuzzy Analytical Hierarchy 
Process Model 
 
Alaa Sahl Gaafar1*, Jasim Mohammed Dahr1 and Alaa Khalaf Hamoud2 
1Directorate of Education in Basrah, Basrah, Iraq  
2Department of Cybersecurity, University of Basrah, Basrah, Iraq  
E-mail: alaasy.2040@gmail.com, jmd20586@gmail.com,  alaa.hamoud@uobasrah.edu.iq 
*Corresponding author 
 
Keywords: FAHP, FLC, antecedents, fuzzy, forecasting, weather parameters, error rates 
 
Received: September 9, 2024 
The numerical weather forecasts rely largely on the amount of precipitation available, and the use of 
statistical and empirical methods, but fall short of higher accuracy and relatively short-time required. 
Recently, the fuzzy AHP (that combined AHP with fuzzy logic) for the purpose of arriving at better 
outcomes from fuzzy logic control (FLC) rules-list. While evolutionary computing and fuzzy logic 
techniques are known to guarantee better accuracy and reliability of the outcomes when applied to 
weather uncertainty problems. Though, the fuzzy logic approach has low accuracy, which needs to be 
improved with rules-list refinement. This paper pulls on these approaches to develop a weather 
forecasting model for cities. First of all, the outcomes of the FAHP model revealed that, Wind Direction 
(WND) and Relative Humidity (HUM) as contributing 30.01% and 19.97% influence to the decision-
making process against air temperature, windspeed, WND, HUM, and air pressure identified earlier. 
Secondly, the select FAHP parameters served as antecedents for the FLC model, in which five fuzzy rules 
were included in rule-base. Upon validation with the standard and local datasets, the proposed model 
achieved lower error rates of 0.0010, 0.0317 and 0.0319 for MSE, RMSE and MAPE respectively when 
treated with the Kaggle standard dataset. By comparing the proposed FLC model outcomes to the 
unoptimized FLC model in term of error rates, MSE of 0.0010, RMSE of 0.0317, and MAPE of 0.0355 
were achieved attained by former indicative of its superiority. 
Povzetek: Predstavljena je optimizacija napovedovanja vremena z uporabo mehke logike in izbire 
optimalnih predhodnikov s pomočjo analitične hierarhije.
1   Introduction research organizations globally. Weather and Radars 
facilities consistently curating live data whereas cloud 
Weather depicts the state of air over earth at given place 
patterns, temperature, and winds focus data are main 
and period. It is an unceasing, data-intensive, disorganized 
concern of special satellites. Consequently, there is the 
and dynamic technique. Forecasting is the procedure of 
endless stockpiles of meteorological data required for 
evaluation in indefinite circumstances from past data. 
investigations using AI approaches (machine learning 
When “weather” and “forecasting” are put together, 
(ML) algorithms), which could enhance the accuracy of 
“Weather forecasting”, is systematically and technically 
the forecasting especially at short-term weather estimation 
demanding issues across the globe in the past century. 
[2]. 
Weather forecasting is one field of traction for many 
Weather-station process cloud data using high-
scholars and researchers, which seek to ascertain the 
performance approaches and algorithms in order to mine 
present state of atmosphere gets varied. Though, the tasks 
salient features to raise precision of the classification on 
of predicting forecasts are daunting due to their 
the basis of the inputs supplied. This is made possible in 
unpredictable and muddled nature. These have been 
recent times with a computationally proven deep learning 
applied to diverse scenarios including severe weather 
approaches [3]. Aside this, many weathers forecasting 
alerts and advisories for transportation, agricultural 
models have been combined to improve the accuracy of 
production and development and forest fire 
outcomes   [4]. AI algorithms have been deployed for 
minimizations[1] . 
dealing with real-life tasks in the same way as natural 
Also, weather nowcasting is a short-leaved approach to 
schemes. Though, human intelligence is capable of 
forecasting of weather, which involves analysis and 
differentiating and adapting to fresh environments, AI 
estimation of weather on 6-hourly basis. Presently, the 
follows a procedural algorithm when conforming to the 
nowcasting hold special place during risk deterrence and 
certain situations. Fuzzy logic is an AI approach which 
crisis administration, even as severe weather happenings 
utilizes an approximate-reasoning instead of actual-
are imminent. Several stacks of meteorological dataset 
reasoning style by incorporating some levels of ambiguity 
gather from satellite, radar, and other weather observatory 
as a form of reasoning procedure. 
sites, are used for diverse of analyses by meteorological 
20 Informatica 49 (2025) 19–34 A.S. Gaafar 
The numerical weather estimations have been used in • To develop enhanced fuzzy logic control model 
enterprises, civil protection institutions, lifestyles of for weather forecasting. 
peoples globally; while reducing social and economic The lasting parts of this paper explained the following: 
indemnities. However, there is the need to evolve better second section is the related works. Section three is the 
and more accurate parametrization of physical processes discussion about research methodology. Fourth section is 
to raise the outcomes of estimates generated. There is the results and discussion. The conclusion is obtainable in 
still the problem inability of existing weather forecasting section 5. 
techniques to produce location precise, time efficient and 
intensity of weather-related events [5]. 2 Related works 
Previously, the main procedure for forecasting weather, 
that is the state of atmosphere over a particular place, This section demonstrates and discuss the literature in the 
involves the use of statistical and empirical methods by field of forecasting weather. Selim Furkan Tekin et al. [12] 
means of the principle of physics, but fall short of higher proposed a deep learning approach to predict the high-
accuracy and relatively short-time required [1] . resolution weather based on observations and input data. 
Subsequently, the renew calls for machine learning and The prediction model works based on a spatio-temporal 
ensemble methods, which utilizes complex computerized approach, where it is composed of a convolutional neural 
mathematical models for desirable outcomes. network with an encoder-decoder structure, and 
The birth of AI, big data analytics and machine learning convolutional long-short term memory. The matcher 
techniques offered the opportunities for planners and mechanism is utilized to enhance the interpretability and 
policymakers to understand the implications of diverse performance of long-short term. The model is 
weather conditions as well as allocating resources in the experimented on a real-life, high-scale numerical dataset 
case of extreme weather-related systems disruptions. that holds the temperature, and pressure levels. The results 
Nonetheless, researchers and scholars are making efforts show that there is significant improvement when capturing 
to increase the accuracy and reliability of the modelling temporal and spatial correlations. Matthew Chantry et al. 
systems [6]. But, the accuracy of automated daily weather in [13] proposed models of emulators based on machine 
classification relies on both the applied classifiers and the learning that work as parameterization scheme 
training data [7].  accelerators for weather forecasting. The emulators are 
The concept of the multi-criteria decision-making trained to produce accurate and stable results of 
(MCDM) schemes undertake multichoice and multi- forecasting timescales. The accuracy of emulators is 
objective problems. In particular, there are three kinds of correlated with the complexity of the networks, while it 
solutions derivable using MCDM especially when it produces more accurate forecasts. With medium range 
concerns making choice from pool options having the best forecasting, they found that the proposed emulators 
alternatives. Also, it is possible to rank the order of several compared with the parameterization scheme are more 
alternatives in order of importance or preferences. More accurate. With CPU hardware, the proposed emulators are 
so, sorting and classifying decision alternatives within similar to existing scheme in computational cost, while 
acceptable order of groupings [8], fuzzy TOPSIS, they performed 10 times faster based on GPU. K. Bala 
VIKOR, and TODIM and [9] are common methods when Maheswari et al. [14] proposed a model to make long-term 
undertaking selection of alternative like bank websites and weather forecasts using a historical dataset. The model is 
electronic banking application’s quality.  implemented based on support vector machine and 
FAHP is held in high esteem as valuable for complex decision tree algorithms to forecast different conditions 
decision-making tasks, which empowers the analysts to such as rainfall, floods, storms, humidity, and 
minimize uncertainty and vulnerability connected to the temperature. While Mohammad Sadman Tahsin et al. in 
process of preparing chiefs’ judgment not applicable in [15] proposed a daily weather forecasting model in an 
AHP approaches. The AHP proposed by Saaty was fine- urban area. 12 data mining models are implemented over 
turned or fuzzified in order to control and spot the 20 years of climate data patterns in Chittagong city. The 
vulnerability [10]. The key concept of the FAHP evaluation process of the model is implemented based on 
streamlines composite decision-making tasks across tiered different metrics, such as precision, recall, accuracy, F-
structure made of criteria and sub-criteria in manner as a measure, receiver operating characteristics, and area under 
pairwise comparison to the criteria [11]. curve. The results show that J48 outperformed the other 
 algorithms in accuracy.  
This paper develops an effective FAHP model-based The summary of related works according to author(s), 
antecedents’ selection for fuzzy logic control weather objectives(s), methodologies, outcomes and limitations 
forecasting system. The contributions include: are presented in Table 1. 
• To select the fuzzy logic antecedents through 
FAHP model. 
 
 
 
 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   21 
 
Table 1: The related works summary 
S/N References Objective(s) Methodology Outcome(s) Limitation(s) 
It provides point 
predictions for 
Wind power forecasting based Deterministic and 
1. [16] day-to-day Accuracy to be improved. 
on climatic conditions probabilistic models. 
operations of 
power systems. 
The weighted-
Solar Photovoltaic system 
Machine learning KNN outperforms Energy efficiency and high error 
2. [17] forecasts under several weather 
techniques. other ML rates. 
factors 
approaches. 
Ensemble of deep No relationship between normal 
Weather Nowcasting under The error is less 
3. [18] learning techniques and adverse metrological 
Radar products’ values. than 4%. 
(NowDeepN). products’ values. 
Correctly 
classified users’ 
Weather impact on COVID-19 
claims based on Classifier ineffectiveness on 
4. [19] outbreak based on users’ twitter Machine learning. 
their tweets at 95% other languages. 
feeds. 
AUC-PR and 
AUC-ROC. 
Climate changes 
Step by step linear aggravates health 
Cyclonic weather regimes regression model, risks of people 
5. [20] Large inaccuracies from datasets. 
impact on seasonal influenza. clinical and laboratory using regression 
tests. and root mean 
square difference. 
Determination of 
Weather-associated delays in ML modeling of weather severe and 
6. [21] Accuracy could be improved. 
transport sector. events. disruptive weather 
events. 
AUC of 91% for 
Weather radar reflectivity Fraction Skill Score Reliability of forecasts to be 
7. [22] the predictive 
towards flood events. (FSS). improved. 
model. 
Accurate 
CNN-based building 
Interpretability of satellite classification of Pre-and-post-disaster images 
8. [23] damage image 
imagery. building damages modelling. 
classification. 
through images. 
Weather 
Internet of Everything: 
information are Internet-enabled approach for 
9. [24] Smart weather reporting system. sensors for measuring 
effectively farmers. 
weather parameters. 
disseminated. 
It has Increased 
Weather forecasting with Machine learning based Neural network algorithms are 
10. [25] speed and accuracy 
gravity wave drag emulation. on neural networks. less-effective. 
of models. 
It offered superior 
Spatio-temporal weather Spatio-temporal dataset was 
11. [26] Convolutional LST Ms. MSE and 
forecasting. utilized. 
performance. 
Generalized Likelihood 
Ratio Test for It offers better To improve on local information 
Minimizing turbine clutter 
12. [27] identifying signal prediction due to about precipitation and filtered 
based on weather radar data. 
subspace and gates overlap of datasets. radar IQ. 
impacted by WTC. 
Short-leaved magnetic field of 
Nowcasting of extreme space Magnetotelluric data of It used bivariate 
13. [28] storm for spatial and temporal 
weather events. geomatic storms. approach for 
events. 
polarization of 
22 Informatica 49 (2025) 19–34 A.S. Gaafar 
storm time electric 
fields. 
Classification of main synoptic It effectively Applicable to particle 
Particle formulation 
14. [29] meteorological patterns of determines formulation and air quality 
analysis of air quality. 
atmosphere. weather scenarios. prediction. 
It improved the 
Machine learning with 
Weather data knowledge quality of High errors during simulation of 
15. [30] rule base approach such 
mining. concomitant weather reports. 
as K-NN, ARIMA. 
factors prediction. 
Machine learning based 
KNN, Random 
NCDC weather data models including 
Forest and Overfitting and smaller datasets 
16. [31] classification and predictive CART, AdaBoost, 
XGBoost had impart on performance. 
models. Decision Tree, and 
highest accuracy. 
XGBoost. 
CNN Deep 
Photovotaic (PV) solar power Particle swarm learning model 
Hyrid algorithms to be 
17. [32] forecasting based on climatic optimization and genetic best for 
experimented. 
conditions algorithms. determining PV 
power. 
It generated highly 
Machine learning with Location and climate change 
Weather files for building accurate 
18. [33] regression and events and applications not 
energy designs optimization. subsequent 
classification models. considered. 
weather files. 
It uses full-field 
Outcomes may be inaccurate and 
Numerical weather weather system to 
19. [34] Weather forecasting misleading without full-field 
prediction. perform anomaly 
data. 
weather forecasts. 
It improves 
Hybrid ML model of 
outcomes of 
20. [35] Rainfall forecasts. PSO and Feed Forward To increase accuracy. 
forecasts for 
Neural Network. 
rainfall. 
It predicts local 
weather events 
Automated weather data LSTM based neural To extend to more parameters of 
21. [36] such as Tornado, 
processing. network model. soils forecasts. 
flood, severe 
storm, etc. 
Naïve Bayes had 
Classification tree, More data consisting of weather 
22. [37] Weather prediction. best accuracy of 
KNN, Naïve Bayes. observational data over stations. 
77.1%. 
Water surrogates 
Weather conditions-based water Bayesian Belief are determinants Higher accuracy required for 
23. [38] 
quality prediction. Networks. for water quality safety of drinking water. 
prediction. 
KNN produced 
Naïve Bayes, C.45 and highest accuracy Input criteria and constraints are 
24. [39] Weather forecasts. 
KNN. (71.59%) inconsistent. 
forecasts. 
SAID 
Selection Based on 
outperformed other 
Multi-class classification of Accuracy Intuition and To explore computer vision for 
25. [40] algorithms in 
weather data. Diversity (SAID) of weather classification. 
classifying weather 
ensemble scheme. 
images. 
It improved results 
Solar irradiance forecast based 
26. [41] Naïve Bayes classifier. and accuracy for The smaller training dataset. 
on weather variables. 
real-time weather. 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   23 
It increased the 
Machine learning and To utilize classification and 
27. [42] Weather forecasting. accuracy and speed 
ensemble methods. clustering approaches. 
of forecasting. 
It identified risks to 
Weather-based major power A two-level hybrid risk To apply to resilience of power 
28. [43] be associated with 
outage forecasts. determination model. systems. 
different factors. 
SVM produced the 
Weather-based Solar PV power KNN, and SVM Expanding the models to K-
29. [44] best accuracy of 
forecasting. classifiers. Means, Random Forest, etc. 
forecasts. 
Weather data 
Optimization and integration of 
extrapolation for 
30. [45] Rainfall forecasts. Data mining approaches. data-mining techniques for better 
determining 
accuracy. 
rainfall patterns. 
 
From Table 1, majority of the weather forecasting lithology. Also, six factors serve as flood vulnerability 
considered different weather parameters using machine zonation including: total population, land cover/ land use, 
learning and numerical prediction schemes. However, distance to hospital, density of population, road density, 
there are no focus on selection of influential factors and and distance to road. AUC score of 0.741 was obtained for 
their fuzziness as well as the effect of complexity of AHP approach. Using the multicollinearity analysis 
meteorological datasets during various forecasts tasks. To revealed highly corrected independent variables. Though, 
this end, the roles of the FAHP, AHP and Fuzzy Logic specificity of the forecasts can be performed using other 
techniques in the weather forecasting tasks and others techniques. 
were analyzed as follows: Fanxiao Meng et al. in [50] deployed remote sensing, GIS 
The concept of FLC identified for determining stock datasets to determine the groundwater recharge zones 
prices movements using the Nigeria Stock Exchange (GWRZ) in Pakistan. The hydrology and geology factors 
trading datasets for Dangote Cement PLC. Alfa et al. [46] influence on the GWRZ were investigated. In particular, 
proposed the rules-list’s antecedent optimization with the thematic maps was composed of the slope, rainfall, 
genetic algorithms procedure to improve the forecasts geology, drainage density, land cover/ land use, lineament, 
effectiveness. Following from that, they further optimized and types of soil. Authors utilized multi-influencing factor 
the rule-list’s consequent by means of the genetic and the AHP to assign weights to the factors. But, the use 
algorithm method in which the error rates diminished of advanced methods could improve the decision-making 
substantially. However, the studies did not cover effects of process and its accuracy. 
the dataset complexity on the effectiveness of the fuzzy Husam Musa Baalousha et al. in [51] evaluated the risk of 
logic control schemes. flooding based on FAHP and Fuzzy Logic in the Arid 
A grey fuzzy AHP-based flash flood vulnerability places of Qatar using the land cover, precipitation, soil 
evaluation in watershed region of Himalayan, China was type, flow accumulation, and elevation. The outcomes 
undertaken by [47]. Authors leveraged on geographical from both the Fuzzy logic and the FAHP demonstrated 
information system (GIS) and 12 natural and resemblances in the low-risk and differences in the high-
anthropogenic parameters. The low, moderate and high risk zones. While the FAHP accounted for higher 
classes were assigned to the Flash Flood Vulnerability variability and more accurate than Fuzzy Logic method. 
Index in which the sensitivity test revealed LULC was Sinan Keskin et al. in [52]developed a fuzzy spatial online 
highly influential. However, there is the propensity of analytical processing (FSOLAP) framework to provide 
applying more effective methods like fuzzy logic control. predictive analytics of the complex data applications. The 
A GIS with multi-criteria decision making (MCDM) framework was validated with meteorological datasets 
method were adopted in determining landslide-prone from the Turkish Meteorological Office. When compared 
regions in highland of Southern Western Ghats by [48]. with traditional machine learning approaches, FSOLAP is 
Nine landslide influencing factors were considered in a more scalable and accurate for big meteorological 
ascertaining the thematic layers for the landslide databases’ fuzziness or uncertainty. 
susceptibility map. AUC scores of 79% and F1 scores of Susanta Mahato et al. in [53] combined FAHP and Fuzzy 
85% were obtained from the standardized causative factor Logic techniques to determine the drought-based 
weights. More techniques can be applied to improve the vulnerability factors in Odisha, India. Six criteria of water 
performance of the FAHP. usage and demand, physical attributes, land use, 
Zhran et al. in [49] implemented the flood risk zonation in groundwater, and development/population, and 22 sub-
Egypt’s Nile districts of Damietta using the IGS, remote criteria were chosen. The FAHP weighted the parameters 
sensing, and AHP. Twelve thematic layers of slope, through pair-wise comparisons matrix. The Fuzzy logic 
elevation, vegetation index, topographic wetness index, provided five classes of vulnerability: very high, high, 
water index, topographic positioning index, stream power moderate, low, and very low. During validation, statistical 
index, modified Fournier index, drainage density, evaluation parameters root means square error, accuracy, 
sediment transport index, distance to the river, and and mean absolute error were employed.  
24 Informatica 49 (2025) 19–34 A.S. Gaafar 
Waseem Alam et al. in [54] introduced the FAHP e. DEFINE the values and linguistic of the first antecedent 
framework in assessing and ranking the criteria and and matching  membership  functions, 
that is, low, medium and high. 
weight factors of the behaviour of drivers in Peshawar,  f. DEFINE the values and linguistic of the second 
Pakistan. Three most important risky driving features antecedent and matching  membership functions, 
include: errors, violations, and lapses. The driver attention that is, low, medium and high. 
and clear road signage were top influential factors in  g. DEFINE the values and linguistic of the consequent and 
matching  membership functions, that is, low, medium 
raising risk perception of the drivers. Also, the ensemble and high. 
machine learning offered an accuracy of 0.84. Step 8. BUILD the fuzzy rules from the membership functions for 
Nonetheless, there are prospects of FLC in explaining the the Antecedents and Consequents for all the possible combinations. 
interconnection among various factors and driving Step 9. OPTIMIZE the fuzzy rules (the chromosomes) with genetic 
algorithm procedure to select the best rules for the FLC weather 
behaviours. forecasting system. 
The reviewed studies, in the second part, had drawn Step 10. APPLY the weather datasets to evaluate the FLC system. 
fascinating evidence about the weakness of the FLC and STOP. 
the complementary roles to be played by the FAHP in End 
OUTPUT: Normalized weights of criteria and FLC rule-base. 
dealing with multi-criteria decision-making and highly 
complex meteorological datasets mining in computational  
weather mining and analysis as undertaken in this paper. 3.2 Data collection and preprocessing 
 This paper utilized both primary and secondary sources of 
3 Research methodology datasets. Firstly, standard historical metrological data was 
collected from the Antarctic Automatic weather facilities 
The paper utilizes the FAHP in filtering the most 
(AntAWS) 
influencing factors for determining weather conditions of 
Dataset: https://amrdcdata.ssec.wisc.edu/dataset/antaws-
places whose datasets are traditionally composed of the 
dataset) is 3-hourly, daily and monthly under strict quality 
complex meteorological parameters [52]. To improve the 
control. Five parameters were measured by 267 AWSs 
accuracy of FLC, the most impacting factors were utilized 
from the period of 1980-2021 [57]. These include: air 
for the construction of the rules-base, which flaws 
pressure, air temperature relative humidity, wind speed, 
decision-making processes and forecasting tasks because 
and wind direction) by 267 AWSs collected between 1980 
of redundancy of the rules-lists [51].   
and 2021. The 25% and 75% thresholds were used to 
 
compute the products for daily and monthly quality-
3.1 Fuzzy analytical hierarchical process controlled readings.  
criteria selection Secondly, structured questionnaire was constructed to 
curate the required data for building effective antecedents 
The main steps for the FAHP model adoption in for fuzzy logic control-based weather forecasting model. 
determining the most relevant criteria for building fuzzy The lists of weather criteria including: TMP, PRS, WNS, 
logic control antecedents are analogous to the methods WND, HUM, and VNS. The survey questionnaire was 
undertaken by [55], [56]. created using the identified weather criteria or parameters 
 with associated nominal scale (1 – 9) of weather attributes 
Algorithm: FAHP criteria selection for the FLC rule-base. as described in Table 2 [58], [59]. 
INPUT: Comparison matrix  
Step 1 DEVELOP analytical hierarchy by utilizing a typical 
hierarchy plan based on distinct levels.   
 a. The DETERMINATION of quantification for the Table 2: The adopted membership function and linguistic 
prospective fuzzy logic  control antecedents. scale. 
 b. ANALYZE prospective FLC antecedents.  
 c. GENERATE pairwise comparison matrix based on Triangular 
Triangular 
AHP scale Fuzzy reciprocal 
Linguistic scale fuzzy 
 d. TRANSFORM into a fuzzy triangular (FT) scale. scale fuzzy 
numbers 
 numbers 
Step 2. DEVELOP a pairwise fuzzy comparison vector (PCV) Extreme 
with selected weather parameters or criteria. The crisp numeric 9 9, 9, 9 1/9, 1/9, 1/9 
importance 
values create PCV as the evaluation method being a single numeric 
value for categorizing FLC antecedents. Very, very 
8 7, 8 ,9 1/9, 1/8, 1/7 
 strong 
Step 3. COMPUTE fuzzy geometric mean from the lower, median, Very strong or 
and upper fuzzy geometric means. 7 demonstrated 6, 7, 8 1/8, 1/7, 1/6 
 importance 
Step 4. COMPUTATE fuzzy AHP weight using the lower, 6 Strong plus 5, 6, 7 1/7, 1/6, 1/5 
median and upper fuzzy weights accordingly.  
Strong 
 5 4, 5, 6 1/6, 1/5, 1/4 
Step 5. GENERATE normalized weights of the parameters. importance 
 4 Moderate plus 3, 4, 5 1/5, 1/4, 1/3 
Step 6. SELECT top-two weighted parameters to serve as the Moderate 
3 2, 3, 4 1/4, 1/3, 1/2 
antecedents for FLC-based weather forecasting system. importance 
Step 7. CONSTRUCT the triangular fuzzy numbers. 2 Weak or slight 1, 2, 3 1/3, 1/2, 1 
Equal 
1 1, 1, 1 1, 1, 1 
importance 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   25 
 experts recruited for the survey. The three participants’ 
 responses in crisp numerical values and computed 
 consistency index (CI) are shown in Tables 3, 4, and 5. 
  
  
  
  
 Table 3: The first respondent responses on two topmost 
3.3 Materials for experimentation  weather parameters. 
TMP PRS WNS WND HUM VMS CR 
The weather forecasting model was validated on 
MATLAB R2019b discrete simulator on Laptop Personal TMP 1 1 1/3 1 1 1/3 0.0905 
Computer system. The minimum specifications of the 
PRS 1 1 1/2 1/3 1/3 1/4  
computational resources include:  
Hardware: WNS 3 2 1 1/4 1/5 1/4  
AMD E1-1200 APU Processor with RadeomTM Graphics WND 1 3 4 1 4 5  
1.40 GHz, 4.00 GB RAM, 64-bit Operating System, x64-
HUM 1 3 5 1/4 1 4  
based processor. 
Software: VMS 3 4 4 1/5 1/4 1  
Windows 10 Single Language 2012, 3.5 Windows 
 
Experience Index.  
From Table 3, the responses offered by the showed 
Genetic algorithm procedure parameters: 
preferences for the first item in the pair, which showed that 
Crossover probability: 0.8, Population selection method: 
highest score of 5 for HUM against WNS, and WNS 
Elitism, Offspring Rank and Mutation, Original 
against VMS. The lowest score of 1/5 was awarded to 
chromosomes: 18, Iteration: 5, Crossover type: Uniform 
WNS against HUM, and WNS against VMS. The 
crossover, Maximum population: 30, Mutation 
computed CR of 0.0905 < 0.1 threshold for acceptance of 
probability: 0.09. 
the responses of first respondent as reliable for further 
 
processing of the research questions. 
3.4 Evaluation parameters   
The effectiveness of the proposed weather forecasting Table 4: The second respondent responses on two 
model after applying the similar test and target datasets is topmost weather parameters. 
computed by means of the mean square error (MSE), root TM PR WN WN HU VM
mean square error (RMSE), and mean absolute percentage  CR 
P S S D M S 
error (MAPE) metrics given by Equations 1, 2 and 3:   
1.186
1 𝑥 2 TMP 1 1/8 1/8 1/6 1/5 1/7 
𝑀𝑆𝐸 =  ∑ (𝐴 ?̂?                               1 
𝑥 𝑔 − 𝑖𝑔)      7 
𝑔=1
PRS 9 1 1/6 8 1/7 1/5  
1 𝑥 2
𝑅𝑀𝑆𝐸 =  √ ∑ (𝐴 )                         2 
𝑥 𝑔 − ?̂?𝑔       WNS 8 6 1 6 5 7  
𝑔=1
WN
6 1/8 1/6 1 3 8  
D 
1 𝑥 𝐴𝑔 − ?̂?𝑔
𝑀𝐴𝑃𝐸 =  √ ∑ | | × 100%                3 HU
𝑥 𝑔=1 ?̂?𝑔 5 7 1/5 1/3 1 8  
M 
 
where, VMS 7 5 1/7 1/8 1/8 1  
𝐴𝑖 is the target of actual value of the output sample, 
 
?̂?𝑖 is the predicted value of the output sample, In Table 4, the responses collected for the second 
g is the index term starting at 1 to x of test dataset, and respondent indicated the preferences for the both items in 
 x is the size of the test dataset. the pair. In case of the responses of first item against 
4 Results and discussion second item, highest score of 9 was awarded to PRS over 
TMP, and the lowest score of 1/8 was awarded to TMP 
This section presents the weather forecasting outcomes against PRS, WND against PRS, VMS against WND, and 
after selecting antecedents with FAHP model. The VMS against HUM. In the case of the second item against 
conditions forecasts of cities were determined with first item, the highest score of 8 was preferred VMS 
optimized FLC model. against WND, and VMS against HUM. The lowest score 
 of 1/8 was preferred to WNS against VMS, and HUM 
4.1 FAHP Model-based criteria selection against VMS. The computed CR of 1.1867 > 0.1 
from survey outcomes threshold, the responses of second respondent were 
The research question, what is are two topmost parameters rejected as unreliable for further processing of the research 
influencing weather conditions? was posed to the three question. 
26 Informatica 49 (2025) 19–34 A.S. Gaafar 
 highest score of 9 was preferred on HUM against TMP, 
 and the lowest score of 1/7 was given to VMS against 
 WNS, WND against PRS, VMS against WNS. In the case 
 of the second item against first item, the highest score of 
 7 was preferred to HUM against WND, and VMS against 
 WNS. The lowest score of 1/8 was given to VMS against 
Table 5: The third respondent responses on two topmost PRS, and VNS against WNS. The calculated CR value of 
 weather parameters.  1.3432 > 0.1 threshold, the responses of third respondent 
TM PR WN WN HU VM CR were rejected as unreliable and removed from further 
P S  D M S processing of the research question. 
Considering the initial analysis of collected responses of 
TMP 1 1/7 1/7 1/8 1/9 1/6 1.343
2 respondents contained in Tables 3, 4, and 5, the computed 
CR values for first, second and third respondents are 
PRS 7 1 1/7 6 1/7 1/8  0.0905, 1.1867, and 1.3432 respectively, which are greater 
than 0.1 threshold for the consistency and reliability of the 
WNS 7 7 1 5 6 7  
participant’s responses except for first respondent. This 
WN 8 1/6 1/5 1 7 6  implies that, only the first respondent’s responses were 
D accepted on the basis of CR value for further investigation 
of the subject.  
HU 9 7 1/6 1/7 1 6  
Similarly, the fuzzy numbers format PCM corresponding 
M 
to crisp numerical values of the first participant’s 
VMS 6 8 1/7 1/6 1/6 1  responses (refer to Table 3) are shown in Table 6. Each 
crisp number for every response in the Table 3 is 
 substituted with matching fuzzy numbers and inverse 
In Table 5, the responses collected for the third respondent fuzzy numbers in Table 6. 
point to the preferences in the both items in the pair. In  
case of the responses of first item against second item, 
 
 Table 6: The Fuzzy numbers for first participant responses on two weather parameters. 
TMP PRS WNS WND HUM VMS 
TMP (1,1,1) (1,1,1) (1/4, 1/3, 1/2) (1,1,1) (1,1,1) (1/4, 1/3, 1/2) 
PRS (1,1,1) (1,1,1) (1/3, 1/2, 1) (1/4, 1/3,1/2) (1/4, 1/3, 1/2) (1/5, 1/4, 1/3) 
WNS (2, 3,4) (1,2, 3) (1,1,1) (1/5, 1/4, 1/3) (1/6, 1/5, 1/4) (1/5, 1/4, 1/3) 
WND (1,1,1) (2, 3, 4) (1/5, 1/4, 1/3) (1,1,1) (3, 4, 5) (4, 5, 6) 
HUM (1,1,1) (2, 3, 4) (1/6, 1/5, 1/4) (1/5, 1/4, 1/3) (1,1,1) (3, 4, 5) 
VMS (2, 3, 4) (3, 4,5) (1/5, 1/4, 1/3) (1/6, 1/5, 1/4) (1/5, 1/4, 1/3) (1,1,1) 
Table 6 contains the computed outcomes of the Chang’s Normalized 
FAHP codes on MATLAB R2013b. The FAHP model Parameter Weight Rank Remarks 
Weight(%) 
computes the weights, normalized weights, and ranks Moderate 
TMP 0.0946 9.46 5 
based on independent responses. The FAHP model uses importance 
Weak or slight 
the extended approach in determining the top two PRS 0.0741 7.41 6 
importance 
parameters that are highly influencing weather forecasts 
WNS 0.1459 14.59 4 Strong plus 
and related decision-making tasks as illustrated in Table 
Extreme 
7. WND 0.3003 30.03 1 
importance 
Table 7: The weights and ranks of respondent’s 
HUM 0.1997 19.97 2 Very, very strong 
responses computed. 
Very strong 
VMS 0.1854 18.54 3 
importance 
  
 
In Table 7, the five weather parameters received various 
contributions to the subject of weather forecasts and 
decision-making process. As shown, TMP paid 9.46%, 
PRS contributed 7.41%, WNS donated 14.59%, WND 
gave 30.03%, HUM explained 19.97%, and 18.54% was 
accounted by VMS. Interestingly, the two topmost 
parameters having extreme importance, and very, very 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   27 
strong importance when determining weather conditions Antecedents Conditions Range of Values 
of regions in the study area include: WND and HUM at Indices 
30.03% and 19.97% respectively. Wind direction (WND) High (3) [93.37 – 226.53] 
More so, graphical comparisons of the select parameters Medium (2) 
Low (1) 
preferred by the respondent using the FAHP Humidity (HUM) High (3) [70.24 – 84.10] 
computational weights are shown in Figure 1. Medium (2) 
 Low (1) 
Consequents   
0,35 Weather condition High (3) [84.10 – 226.53] 
(WEATHER) Medium (2) 
0,3 Low (1) 
  
0,25  
 
0,2  
0,15  
From Table 8, the antecedents for the FLC include: wind 
0,1 direction (WND), and Humidity (HUM). The range of 
0,05 values are [93.37 – 226.53], and [70.24 – 84.10]. The 
consequent variable is weather condition (WEATHER) 
0 under investigation whose range of values are derived 
TMP PRS WNS WND HUM VMS from minimum and maximum values of the antecedents, 
Weather parameter that is, [84.10 – 226.53]. The layout of the FLC-based 
weather forecasting model is composed of two inputs 
 
(FAHP select parameters/antecedents: HUM and WND), 
Figure 1: The contributions of the select parameters on and an output (consequent: weather condition) as shown 
weather conditions forecasts. in Figure 2. 
  
From Figure 1, the graphical display of the select weather 
parameters in forecasts and determination using FAHP 
model weights, which showed clearly the top leading 
parameters as WND and HUM at 0.3 and 0.2 weighting 
scale respectively. While the least contributing parameter 
to the weather forecasting tasks was PRS at 0.09 of the 
FAHP model’s weighting scale. The FAHP model derived 
two weather parameters of WND and HUM as important 
to the subject of weather forecasting; thereby included as 
antecedents for FLC system as explained in the next 
subsection. 
 
4.2 Outcomes of the fuzzy logic control  
 
model Figure 2: The FLC-based weather forecasting system 
The fuzzy rules are generated according to data items and layout. 
type of datasets selected from the FAHP model’s 
outcomes. The two top parameters, WIND and HUM, The triangular membership function was adopted because 
were to serve as the antecedents for the inference engine of its popularity and effectiveness for modeling 
of the FLC. The FLC model developed to determine uncertainties and fuzziness during decision-making 
uncertainty problems and trends of weather in the city of processes. Three membership conditions were developed 
Austin, United States at more effective and reliability for both antecedents and consequent namely: Low, 
style. The antecedents and consequents with their Medium and High. While, the matching indices of 
respective conditions are given in Table 8. membership functions include: 1, 2, and 3. The 
 membership functions, variables, and range of values for 
Table 8. Antecedents and consequent constraints for the all the input and output are specified in Table 8. These 
fuzzy engine. refined fuzzy rules-list is used in constructing the fuzzy 
inference engine by means of the logical AND, and IF-
THEN statements as established by [60]. Therefore, the 
rule-list for the fuzzy inference engine of the proposed 
forecasting weather events is given in Table 9. 
 
Table 9: The optimized FLC rules-list indices after 
genetic algorithm refinement. 
Weight (%)
28 Informatica 49 (2025) 19–34 A.S. Gaafar 
Rule N Input 1 Input 2 Output against the local NIMET dataset of 0.1563. The same 
trend was observed for the RMSE error measure that put 
1 3 1 2 
the proposed model performance with the standard dataset 
2 1 1 2 at 0.0317 over the NIMET dataset at 0.3953. When MAPE 
3 3 3 3 evaluation parameter was considered, the proposed 
4 3 1 2 weather model performed highly at 0.0319 for Kaggle 
dataset when compared to the NIMET dataset at 0.2104. 
5 2 2 3 
This shows the proposed weather forecasting model 
  performed best with less complex and refined factors 
From Table 9, the refined fuzzy rule-lists are utilized for against highly complex meteorological local weather 
generating different mapping of the antecedents’ datasets as depicted by Figure 3. 
membership function indices to the membership functions 
 
of the consequent using the input and output weather 
parameters defined in Table 8. The rule-base generated for MAPE RMSE MSE
the FLC, from Table 9, is illustrated in Figure 3. 
 
Kaggle
NIMET
0 0,1 0,2 0,3 0,4 0,5
Error rate
 
Figure 3: The performance of the FLC weather 
 forecasting systems with diverse datasets. 
Figure 3: The optimized rules-list design of the FLC  
weather forecasting model. Again, the outcomes of weather forecasting model with 
 
the optimized FLC were superior when compared to the 
Following from Figure 3, the antecedent variables are 
ordinary FLC with refined rules-base as shown in Table 
WND and HUM, which correspond to the inputs of the 
11. 
FLC weather system. Also, the consequent is the output of 
 
the FLC system. The logic function of “AND” are used to 
Table 11: The comparisons of the proposed model to the 
map the different membership functions of the inputs to 
FLC model. 
those of the output. More importantly, the weight of the 
rules-list is 1, which denotes equal importance of all the 
Model MSE RMSE MAPE Remarks 
inputs and output membership functions indices in order 
to remove biases in decision-making process about FLC 0.0011 0.0332 0.0319 Effective 
weather conditions of cities. 
Optimized 0.0010 0.0317 0.0355 More 
The performance of the proposed optimized FLC weather 
FLC Effective 
forecasting system in terms of error rates using optimized 
rules-list the fuzzy inference engine against conventional  
FLC weather forecasting system is given in Table 10.  
From Table 11, the weather forecasting model performed 
 
better with fewer rules-lists in the rule-base rather than 
 
unfiltered rules-list. These showed that the proposed 
Table 10: Proposed weather FLC forecasting model 
weather forecasting model in terms of the evaluation 
performances with different datasets. 
metrics of MSE, RMSE, and MAPE at 0.0010, 0.0317 and 
Dataset MSE RMSE MAPE Remarks 0.0355 were most preferred because of their capability to 
NIMET 0.1563 0.3953 0.2104 Effective explain smaller variations in the outcomes against the 
target weather data in the area of study as illustrated in the 
Kaggle 0.0010 0.0317 0.0319 More Figure 4.   
Effective 
 
 
 
From Table 9, the performance of the proposed optimized 
FLC with the standard dataset was better at MSE of 0.0010 
Dataset type
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   29 
systems for the decision-making processes. Furthermore, 
the process of filtering weather parameters, and refining 
of the rules-list in rule-base increased the outcomes of 
FLC weather forecasting system. The FLC weather 
forecasting system has shown to perform better with the 
removal of redundancy in the rules-list as well as its input 
variables (or weather parameters). In this paper, the choice 
of the FAHP and FLC methods offer complementary roles 
which increase the variability and accuracy [51]. The 
uncertainty and fuzziness of meteorological datasets like 
Kaggle and NIMET, were best interpreted using both 
approaches [52]. The FAHP method refines the decision-
making procedure and data analytics of the FLC [50]. The 
 
Figure 4: The performance of FLC and Optimized FLC paper extended the prospects of the both FAHP and FLC 
methods for weather forecasts. to the weather forecasting and analytics, which falls into 
The reasons being that, FAHP model procedure improves the MCDM research domain. 
the selection of the most influencing parameters required 
to building FLC rule bases. More so, the FLC model’s 5 Conclusion and future works 
rules-list redundancy was filtered with genetic algorithm This paper provides required tool for determining weather 
procedure to realize 5 best rules out of 9 original rules. The and state of the atmosphere in certain places and periods 
outcomes of this paper increase the reliability of the through the application of fuzzy logic technique. It will 
weather information generation for diverse purposes as benefit individuals, government agencies, business sector, 
shown in Figure 5. built and construction sector, researchers and scholars 
 concerned with planning and policymaking depending on 
weather outlooks. This increases the understanding of the 
hidden relationships and patterns available for a more 
accurate and reliable local weather information 
dissemination.  
The outcomes of the FAHP model when used to select the 
most important parameters affecting weather forecasts of 
cities identified Wind Direction (WND) and Relative 
Humidity as contributing 30.01% and 19.97% influence to 
the decision-making process.   
Thereafter, the select parameters from the FAHP model 
procedure served as antecedents of the FLC model. The 
GA-optimized FLC model was adopted from the study by 
[46] which overcame the problem of redundancy of the 
  fuzzy inference engine rule-list. Consequently, the refined 
Figure 5: The Line graph of FLC weather forecasts 
rules-list serves the building block for the proposed FLC 
model performances compared. 
weather forecasting model. The outputs revealed that the 
 
FLC weather forecasting model in terms of the MSE, 
From Figure 5, the testing dataset is 30% of the entire 
RMSE, and MAPE at 0.0010, 0.0317 and 0.0355 were 
weather dataset collected in which the target line depicts 
most preferred against comparable models because of its 
the original weather data for 25 days periods 
capability to explain smaller variations of the datasets. It 
corresponding to after 51 months of observations. As 
was superior due to the initial FAHP-based selection of 
shown, the weather forecast model was unsteady from the 
weather parameters and rule-list reduction procedures. It 
starting 110 points by 51st month, and changed sharply to 
equally attributable is the filtered rules-list used to 
attain its lowest value at 42 points. Then, it continues to 
construct the fuzzy inference engine of the FLC.  
gain at the highest point of 139 points by 64th month. 
This paper found that, the subjectivity of expert 
However, by the end of 71 months testing period, the 
judgements during FAHP modelling of the criteria and the 
weather condition reached 86 points. In terms of the 
over reliance of FLC model on its rule-base’s optimization 
forecasts performance, the optimized FLC weather system 
impact greatly on the outcomes of the weather forecasts 
outcomes (the Actual line in the graph) showed similar 
generated. In future works, more dataset can be 
trends as the Target line through the comparable periods 
experimented to include long periods and extended site 
of observations, that is, 51st month, 54th month, 56th 
specificity.  
month, 58th month, 60th – 67th months, 70th month and 
71th month accordingly.  References 
These illustrate the capabilities of the proposed FLC 
system in accurately forecasting weather conditions of [1] S. B. Pooja and R. V. Siva Balan, “An 
cities at minimal error rates. It can be attributed to the investigation study on clustering and 
involvement of AI techniques like FAHP and FLC classification techniques for weather forecasting,” 
30 Informatica 49 (2025) 19–34 A.S. Gaafar 
J Comput Theor Nanosci, vol. 16, no. 2, 2019, doi: Manage, vol. 284, 2021, doi: 
10.1166/jctn.2019.7742. 10.1016/j.jenvman.2021.111985. 
[2] G. Czibula, A. Mihai, and E. Mihuleţ, [11] T. H. Tseng, Y. S. Wang, and Y. C. Tsai, 
“Nowdeepn: An ensemble of deep learning “Applying an AHP Technique for Developing A 
models for weather nowcasting based on radar Website Model of Third-Party Booking System,” 
products’ values prediction,” Applied Sciences Journal of Hospitality and Tourism Research, vol. 
(Switzerland), vol. 11, no. 1, 2021, doi: 45, no. 8, 2021, doi: 10.1177/1096348020986986. 
10.3390/app11010125. 
[12] S. F. Tekin, O. Karaahmetoglu, F. Ilhan, I. 
[3] S. Sokolov, S. Vlaev, and M. Chalashkanov, Balaban, and S. S. Kozat, “Spatio-temporal 
“Technique for storing and automated processing weather forecasting and attention mechanism on 
of weather station data in cloud platforms,” in IOP convolutional lstms,” arXiv preprint 
Conference Series: Materials Science and arXiv:2102.00696, vol. 4, 2021. 
Engineering, 2021. doi: 10.1088/1757-
899X/1032/1/012021. [13] M. Chantry, S. Hatfield, P. Dueben, I. 
Polichtchouk, and T. Palmer, “Machine Learning 
[4] Z. Chen, Y. Wang, and L. Zhou, “Predicting Emulation of Gravity Wave Drag in Numerical 
weather-induced delays of high-speed rail and Weather Forecasting,” J Adv Model Earth Syst, 
aviation in China,” Transp Policy (Oxf), vol. 101, vol. 13, no. 7, 2021, doi: 
2021, doi: 10.1016/j.tranpol.2020.11.008. 10.1029/2021MS002477. 
[5] V. Mazzarella, R. Ferretti, E. Picciotti, and F. S. [14] K. B. Maheswari and S. Gomathi, “A 
Marzano, “Investigating 3D and 4D variational Comprehensive Analysis of Weather Prediction 
rapid-update-cycling assimilation of weather Using Machine Learning,” in 2024 Ninth 
radar reflectivity for a heavy rain event in central International Conference on Science Technology 
Italy,” Natural Hazards and Earth System Engineering and Mathematics (ICONSTEM), 
Sciences, vol. 21, no. 9, pp. 2849–2865, 2021. 2024, pp. 1–6. 
[6] L. Coulibaly, B. Kamsu-Foguem, and F. Tangara, [15] M. S. Tahsin, S. Abdullah, M. Al Karim, M. U. 
“Rule-based machine learning for knowledge Ahmed, F. Tafannum, and M. Y. Ara, “A 
discovering in weather data,” Future Generation comparative study on data mining models for 
Computer Systems, vol. 108, 2020, doi: weather forecasting: A case study on Chittagong, 
10.1016/j.future.2020.03.012. Bangladesh,” Natural Hazards Research, vol. 4, 
no. 2, 2024, doi: 10.1016/j.nhres.2023.12.014. 
[7] F. Wang, Z. Zhen, B. Wang, and Z. Mi, 
“Comparative study on KNN and SVM based [16] I. K. Bazionis and P. S. Georgilakis, “Review of 
weather classification models for day ahead short- Deterministic and Probabilistic Wind Power 
term solar PV power forecasting,” Applied Forecasting: Models, Methods, and Future 
Sciences (Switzerland), vol. 8, no. 1, 2017, doi: Research,” Electricity, vol. 2, pp. 13–47, 2021, 
10.3390/app8010028. doi: https://doi.org/10.3390/electricity2010002. 
[8] M. Al-Shammari and M. Mili, “A fuzzy analytic [17] M. S. Nkambule, A. N. Hasan, A. Ali, J. Hong, 
hierarchy process model for customers’ bank and Z. W. Geem, “Comprehensive Evaluation of 
selection decision in the Kingdom of Bahrain,” Machine Learning MPPT Algorithms for a PV 
Operational Research, vol. 21, no. 3, 2021, doi: System Under Different Weather Conditions,” 
10.1007/s12351-019-00496-y. Journal of Electrical Engineering & Technology, 
2020, doi: 10.1007/s42835-020-00598-0. 
[9] D. Liang, Y. Zhang, Z. Xu, and A. Jamaldeen, 
“Pythagorean fuzzy VIKOR approaches based on [18] G. Czibula, A. Mihai, and E. Mihule, 
TODIM for evaluating internet banking website “NowDeepN: An Ensemble of Deep Learning 
quality of Ghanaian banking industry,” Applied Models for Weather Nowcasting Based on Radar 
Soft Computing Journal, vol. 78, 2019, doi: Products’ Values Prediction,” Applied Sciences, 
10.1016/j.asoc.2019.03.006. vol. 11, no. 125, pp. 1–27, 2021, doi: 
https://doi.org/10.3390/app11010125. 
[10] V. K. Singh et al., “Development of fuzzy analytic 
hierarchy process-based water quality model of [19] M. Gupta et al., “Whether the weather will help us 
Upper Ganga River basin, India,” J Environ weather the COVID-19 pandemic: Using machine 
learning to measure twitter users’ perceptions,” 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   31 
Int J Med Inform, vol. 145, pp. 1–8, 2021, doi: vol. 19, pp. 1–17, 2021, doi: 
10.1016/j.ijmedinf.2020.104340. 10.1029/2019SW002432. 
[20] A. Hochman et al., “The relationship between [29] P. Salvador, M. Barreiro, F. J. Gómez-Moreno, E. 
cyclonic weather regimes and seasonal influenza Alonso-Blanco, and B. Artíñano, “Synoptic 
over the Eastern Mediterranean,” Science of the classification of meteorological patterns and their 
Total Environment, vol. 750, pp. 1–9, 2021, doi: impact on air pollution episodes and new particle 
10.1016/j.scitotenv.2020.141686. formation processes in a south European air 
basin,” Atmos Environ, p. 118016, 2020, doi: 
[21] Z. Chen, Y. Wang, and L. Zhou, “Predicting 10.1016/j.atmosenv.2020.118016. 
weather-induced delays of high-speed rail and 
aviation in China,” Transp Policy (Oxf), vol. 101, [30] L. Coulibaly, B. Kamsu-foguem, and F. Tangara, 
pp. 1–13, 2021, doi: “Rule-based machine learning for knowledge 
10.1016/j.tranpol.2020.11.008. discovering in weather data,” Future Generation 
Computer Systems, vol. 108, pp. 861–878, 2020, 
[22] V. Mazzarella, R. Ferretti, E. Picciotti, and F. S. doi: 10.1016/j.future.2020.03.012. 
Marzano, “Investigating 3D and 4D Variational 
Rapid-Update-Cycling Assimilation of Weather [31] I. Gad and D. Hosahalli, “A comparative study of 
Radar Reflectivity for a Flash Flood Event in prediction and classification models on NCDC 
Central Italy,” Natural Hazards and Earth System weather data,” International Journal of 
Sciences, pp. 1–26, 2021, doi: Computers and Applications, pp. 1–12, 2020, doi: 
https://doi.org/10.5194/nhess-2020-406. 10.1080/1206212X.2020.1766769. 
[23] T. Y. Chen, “Interpretability in Convolutional [32] R. Ahmed, V. Sreeram, Y. Mishra, and M. D. Arif, 
Neural Networks for Building Damage “A review and evaluation of the state-of-the-art in 
Classification in Satellite Imagery,” Technical PV solar power forecasting: Techniques and 
Note, pp. 1–11, 2021, doi: optimization,” Renewable and Sustainable 
10.20944/preprints202101. 0053.v1. Energy Reviews, vol. 124, pp. 1–26, 2020, doi: 
10.1016/j.rser.2020.109792. 
[24] A. J. Chinchawade and O. S. Lamba, “Secure 
Communication in Internet of Everything (IOE) [33] M. Hosseini, A. Bigtashi, and B. Lee, “Generating 
Based Smart Weather Reporting Systems,” Future Weather Files under Climate Change 
Journal of Information and Computational Scenarios to Support Building Energy Simulation 
Science, vol. 14, no. 1, pp. 46–51, 2021. - A Machine Learning Approach,” Energy Build, 
p. 110543, 2020, doi: 
[25] M. Chantry, S. Hatfield, P. Dueben, I. 10.1016/j.enbuild.2020.110543. 
Polichtchouk, and T. Palmer, “Machine learning 
emulation of gravity wave drag in numerical [34] W. Qian, J. Du, and Y. Ai, “A review: anomaly 
weather forecasting,” J Adv Model Earth Syst, pp. based versus full-field based weather analysis and 
1–23, 2021. forecasting,” The American Meteorological 
Society, pp. 1–52, 2020, doi: 10.1175/BAMS-D-
[26] S. F. Tekin, O. Karaahmetoglu, F. Ilhan, I. 19-0297.1. 
Balaban, and S. S. Kozat, “Spatio-temporal 
Weather Forecasting and Attention Mechanism on [35] H. Abdul-Kader, M. Abd-el Salam, and M. 
Convolutional LSTMs,” pp. 1–13, 2021. Mohamed, “Hybrid Machine Learning Model for 
Rainfall Forecasting,” Journal of Intelligent 
[27] A. Dutta, V. Chandrasekar, and E. Ruzanski, “A Systems and Internet of Things, vol. 1, no. 1, pp. 
signal sub-space-based approach for mitigating 5–12, 2020, doi: 10.5281/zenodo.3376685. 
wind turbine clutter in fast scanning weather 
radar,” in 2021 USNC-URSI NRSM, 2021, pp. [36] S. Sokolov, S. Vlaev, and M. Chalashkanov, 
202–203. “Technique for storing and automated processing 
of weather station data in cloud platforms 
[28] F. Simpson and K. Bahr, “Nowcasting and Technique for storing and automated processing 
Validating Earth’s Electric Field Response to of weather station data in cloud platforms,” in 
Extreme Space Weather Events Using Series, IOP Conference Science, Materials, 2020, 
Magnetotelluric Data: Application to the pp. 1–7. doi: 10.1088/1757-899X/1032/1/012021. 
September 2017 Geomagnetic Storm and 
Comparison to Observed and Modeled Fields in [37] R. Prasetya and A. Ridwan, “Data Mining 
Scotland,” Advancing Earth and Space Science, Application on Weather Prediction Using 
32 Informatica 49 (2025) 19–34 A.S. Gaafar 
Classification Tree, Naïve Bayes and K-Nearest [45] S. Aftab, M. Ahmad, N. Hameed, M. S. Bashir, I. 
Neighbor Algorithm With Model Testing of Ali, and Z. Nawaz, “Rainfall Prediction using 
Supervised Learning Probabilistic Brier Score, Data Mining Techniques: A Systematic Literature 
Confusion Matrix and ROC,” Journal of Applied Review,” vol. 9, no. 5, pp. 143–150, 2018. 
Communication and Information Technologies, 
vol. 4, no. 2, pp. 25–33, 2019. [46] A. A. Alfa, I. O. Yusuf, S. Misra, and R. Ahuja, 
“Enhancing Stock Prices Forecasting System 
[38] A. Panidhapu, Z. Li, A. Aliashrafi, and N. M. Outputs Through Genetic Algorithms Refinement 
Peleato, “Integration of weather conditions for of Rules-Lists,” in Lecture Notes in Networks and 
predicting microbial water quality using Bayesian Systems, vol. 121, 2020. doi: 10.1007/978-981-
Belief Networks,” Water Res, p. 115349, 2019, 15-3369-3_49. 
doi: 10.1016/j.watres.2019.115349. 
[47] D. Roy, A. Dhar, and V. R. Desai, “A grey fuzzy 
[39] Y. Findawati, I. R. I. Astutik, A. S. Fitroni, I. analytic hierarchy process-based flash flood 
Indrawati, and N. Yuniasih, “Comparative vulnerability assessment in an ungauged 
analysis of Naïve Bayes, K Nearest Neighbor and Himalayan watershed,” Environ Dev Sustain, vol. 
C. 45 method in weather forecast Comparative 26, no. 7, pp. 18181–18206, 2024. 
analysis of Naïve Bayes, K Nearest Neighbor and 
C. 45 methods in weather forecast,” in 4th Annual [48] G. Gopinath, N. Jesiya, A. L. Achu, A. Bhadran, 
Applied Science and Engineering Conference, and U. P. Surendran, “Ensemble of fuzzy-
IOP Publishing, 2019, pp. 1–7. doi: analytical hierarchy process in landslide 
10.1088/1742-6596/1402/6/066046. susceptibility modeling from a humid tropical 
region of Western Ghats, Southern India,” 
[40] A. G. Oluwafemi and W. Zenghui, “Multi-Class Environmental Science and Pollution Research, 
Weather Classification from Still Image Using vol. 31, no. 29, 2024, doi: 10.1007/s11356-023-
SAID Ensemble Method,” in 2019 Southern 27377-4. 
African Universities Power Engineering 
Conference/Robotics and Mechatronics/Pattern [49] M. Zhran et al., “Exploring a GIS-based analytic 
Recognition Association of South Africa hierarchy process for spatial flood risk assessment 
(SAUPEC/RobMech/PRASA), IEEE, 2019, pp. in Egypt: a case study of the Damietta branch,” 
135–140. doi: Environ Sci Eur, vol. 36, no. 1, pp. 1–25, 2024. 
10.1109/RoboMech.2019.8704783. [50] F. Meng et al., “Identification and mapping of 
[41] Y. Kwon, A. Kwasinski, and A. Kwasinski, “Solar groundwater recharge zones using multi 
Irradiance Forecast Using Naïve Bayes Classifier influencing factor and analytical hierarchy 
Based on Publicly Available Weather Forecasting process,” Sci Rep, vol. 14, no. 1, p. 19240, 2024. 
Variables,” Energies (Basel), vol. 12, no. 1529, [51] H. M. Baalousha, A. Younes, M. A. Yassin, and 
pp. 1–13, 2019. M. Fahs, “Comparison of the fuzzy analytic 
[42] S. B. Pooja and R. V. S. Balan, “An Investigation hierarchy process (F-AHP) and fuzzy logic for 
Study on Clustering and Classification flood exposure risk assessment in arid regions,” 
Techniques for Weather Forecasting,” Journal of Hydrology, vol. 10, no. 7, p. 136, 2023. 
Computational Theoretical Nanoscience, vol. 16, [52] S. Keskin and A. Yazc, “FSOLAP: A fuzzy logic-
no. 2, pp. 417–421, 2019, doi: based spatial OLAP framework for effective 
10.1166/jctn.2019.7742. predictive analytics,” Expert Syst Appl, vol. 213, 
[43] S. Mukherjee, R. Nateghi, and M. Hastak, “A p. 118961, 2023. 
multi-hazard approach to assess severe weather- [53] S. Mahato, G. Mandal, B. Kundu, S. Kundu, P. K. 
induced major power outage risks in the U. S.,” Joshi, and P. Kumar, “Comprehensive Drought 
Reliab Eng Syst Saf, vol. 175, pp. 283–305, 2018, Vulnerability Assessment in Northwestern 
doi: 10.1016/j.ress.2018.03.015. Odisha: A Fuzzy Logic and Analytical Hierarchy 
[44] F. Wang, Z. Zhen, B. Wang, and Z. Mi, Process Integration Approach,” Water (Basel), 
“Comparative Study on KNN and SVM Based vol. 15, no. 18, p. 3210, 2023. 
Weather Classification Models for Day Ahead [54] W. Alam et al., “Analysis and Prediction of Risky 
Short-Term Solar PV Power Forecasting,” Driving Behaviors Using Fuzzy Analytical 
Applied Sciences, vol. 8, no. 28, pp. 1–23, 2018, Hierarchy Process and Machine Learning 
doi: 10.3390/app8010028. 
Optimizing Fuzzy Logic Control-Based Weather Forecasting through…                                      Informatica 49 (2025) 19–34   33 
Techniques,” Sustainability, vol. 16, no. 11, p.  
4642, 2024.  
 
[55] T. Chen and H.-C. Wu, “Fuzzy collaborative  
intelligence fuzzy analytic hierarchy process  
approach for selecting suitable three-dimensional  
printers,” Soft comput, 2020, doi:  
10.1007/s00500-020-05436-z.  
 
[56] P. Karczmarek, W. Pedrycz, and A. Kiersztyn,  
“Fuzzy Analytic Hierarchy Process in a Graphical  
Approach,” Group Decis Negot, vol. 30, pp. 463–  
481, 2021, doi: 10.1007/s10726-020-09719-6.  
 
[57] Y. Wang et al., “The AntAWS dataset: a  
compilation of Antarctic automatic weather  
station observations,” Earth Syst Sci Data, vol. 19,  
pp. 411–429, 2023.  
 
[58] H. Y. Emam, G. Sayed, and A. Aziz,  
“Investigating the Effect of Gamification on  
Website Features in E- Banking Sector: An  
Empirical Research Literatures Review Research  
Variables the Dependent Variable,” The  
Academic Journal of Contemporary Commercial  
 
Research, vol. 1, no. 1, pp. 24–37, 2021. 
 
[59] M. Al-Shammari and M. Mili, “A fuzzy analytic  
hierarchy process model for customers’ bank  
selection decision in the Kingdom of Bahrain,”  
 
Operational Research, 2019, doi: 
 
10.1007/s12351-019-00496-y. 
 
[60] A. A. Alfa, S. Misra, A. Bumojo, K. B. Ahmed, J.  
 
Oluranti, and R. Ahuja, “Comparative Analysis of 
 
Optimisations of Antecedents and Consequents of 
 
Fuzzy Inference System Rules Lists Using 
 
Genetic Algorithm Operations,” in Lecture Notes  
in Networks and Systems, Advances in  
Computational Intelligence and Informatics, vol.  
119, R. R. Chillarige, Ed., Springer Nature  
Singapore Pte Ltd., 2020, pp. 373–379.  
 
   
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
34 Informatica 49 (2025) 19–34 A.S. Gaafar 
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.7578 Informatica 49 (2025) 35–48 35 
Dynamic Detection Method for Spatiotemporal Data Based on 
Hybrid Model and Singular Spectrum Analysis 
Sheng Li1, Mingguang Duan1, Xiaodan Zhou2* 
1Innovation and Entrepreneurship Institute, Guangxi Normal University, Guilin 541000, China 
2Kunshan Innovative Institute of NeoDyna for Science and Technology, Kunshan 215300, China 
E-mail: sl@beijinghuali.com, renh@beijinghuali.com, zhouxiaodantougao@163.com 
*Corresponding author 
Keywords: spatiotemporal data mining, multiple factors, dynamic data detection, singular spectrum analysis method, 
GCNN, TCN 
Received: November 12, 2024 
As internet technology advances, processing a large amount of network data has become an important 
part of network work. To improve the processing effectiveness of data in the network, a dynamic data 
accuracy detection method based on spatiotemporal data mining is proposed. During the process, 
singular spectrum analysis is introduced to propose a dynamic data detection method. A data accuracy 
detection method is proposed by combining graph convolutional neural networks and temporal 
convolutional networks to detect data in both time and spatial dimensions. Finally, the effectiveness of 
the research method is analyzed. The experimental results show that the mean absolute error, mean 
absolute percentage error, and root mean square error of the proposed method are the lowest among 
the four models, at 0.16, 0.18, and 0.20, respectively, which are lower than the other three comparative 
methods; The research method maintains a relatively stable average accuracy in the range of 0.75~0.80 
when dealing with different tasks. The research method requires a processing time of 250 ms for 2000 
data points and 1000 ms for 6000 data points. Before and after using the research method, the data 
processing increases from around 2500 to around 2700 within 15ms, and from around 2900 to 3100 
within 30ms. The dynamic data detection method designed in this study demonstrates good processing 
efficiency and accuracy in data detection. Research can provide certain technical references for 
dynamic data detection, improving the accuracy and reliability of data. 
Povzetek: Opisana je metoda dinamične detekcije za prostorsko-časovne podatke, ki temelji na 
hibridnem modelu in singularni analizi spektra.  Kombinacija GCNN in TCN omogoča detekcijo 
podatkov v časovni in prostorski dimenziji. 
 
 
1 Introduction and machine learning algorithms are often based on 
single factor analysis, which makes it difficult to  
In recent years, due to the swift progression of comprehensively analyze the dynamic changes in data 
information technology and the substantial increase in and effectively identify abnormal data [5-6].  
data volume, dynamic data detection research has Spatiotemporal data mining is an emerging data analysis 
emerged as a significant research direction in the field of technology that combines the advantages of geographic 
data mining. More and more scholars are paying information systems and data mining. It can 
attention to this field and conducting extensive research simultaneously consider temporal and spatial 
aimed at exploring more efficient and accurate methods information, reveal hidden patterns and associations in 
for dynamic data detection [1]. At present, there are data, and mainly use mining techniques such as 
various methods for dynamic data detection, including spatiotemporal clustering and spatiotemporal association 
conventional statistical methods, machine learning rules to mine spatiotemporal data. By analyzing 
algorithms, and spatiotemporal data mining techniques. spatiotemporal data, anomalies in dynamic data can be 
These methods have their own advantages in different identified. Graph Convolutional Neural Networks 
application scenarios, providing powerful tools for the (GCNN) and Temporal Convolutional Networks (TCN) 
detection and analysis of dynamic data [2]. Statistical can cut the complexity of network models and decrease 
methods mainly utilize statistical principles to analyze the number of weights, making them commonly used for 
the statistical characteristics of data and determine detecting data accuracy [7]. In view of this, a Time 
whether the data is abnormal. Machine learning methods Graph Convolutional Network (TGCN) accuracy 
mainly utilize machine learning algorithms, such as detection method based on spatiotemporal data mining 
support vector machines, neural network algorithms, methods, combined with GCNNs and TCN, is proposed. 
decision trees, etc., to train historical data and establish The research aims to solve the problem of anomaly 
anomaly detection models to identify abnormal data [3- detection in dynamic data streams by introducing 
4]. However, traditional dynamic data detection methods  advanced machine learning algorithms, and conduct 
36 Informatica 49 (2025) 35–48 S. Li et al. 
performance testing in environments containing high algorithm selection. The experimental results showed 
noise data, time-varying data patterns, and multi-source that this method achieved an accuracy of 99.7% in multi-
data fusion. The data preprocessing during the classification problems, significantly better than existing 
experimental process includes data cleaning, feature algorithms [11]. Jiao et al. applied reinforcement learning 
selection, and data standardization, while parameter techniques to dynamic data preprocessing to improve its 
selection involves hyperparameter tuning through cross efficiency and effectiveness. During the process, a 
validation methods. preprocessing model based on reinforcement learning 
The research is mainly conducted from four sections. was constructed. By continuously learning the 
The initial section presents the findings of the research characteristics of the data stream and preprocessing 
related to spatiotemporal data mining and dynamic data strategies, the preprocessing parameters were 
detection methods. The second section designs dynamically adjusted to achieve optimal preprocessing 
spatiotemporal data mining techniques and dynamic data results. The experiment outcomes indicated that this 
accuracy detection. The third section evaluates the method could effectively raise the efficacy and 
efficacy of the designed methods. The last section is the effectiveness of dynamic data preprocessing, and adapt 
discussion and summary of the entire text. to the dynamic changes of data streams [12]. 
In order to further detect dynamic data with 
2 Related works spatiotemporal characteristics, enhance precision and 
dependability of the data, researchers are constantly 
As Internet technology continues to evolve and innovate, exploring more advanced spatiotemporal data mining 
a large number of spatiotemporal data continue to techniques. Purificato et al. raised a spatiotemporal 
emerge, which contains rich information and provides anomaly detection method grounded on graph neural 
rich resources for data development decisions. Some networks to address the issue of spatiotemporal data 
experts and researchers have carried out pertinent studies anomaly detection. During the process, graph neural 
on the problems in dynamic data. Yin et al. raised a networks were used to learn spatial dependencies and 
sliding window-based anomaly detection method to combined with time series analysis to capture time 
address the difficulty of traditional methods in effectively trends, ultimately achieving effective identification of 
identifying anomalies in dynamic data streams. During outliers. The experiment outcomes indicated that this 
the process, the data stream was windowed, statistical method achieved better performance than other methods 
features were extracted from each window, and on multiple real datasets [13]. Hu et al. raised a 
compared with preset thresholds to determine if there spatiotemporal trajectory prediction method that 
were any anomalies. The experimental findings indicated integrates multi-source data for trajectory prediction in 
that this approach exhibited a high accuracy in detection spatiotemporal data. During the process, this method 
and a low incidence of false alarms [8]. Huang J et al. integrated the user's spatiotemporal trajectory, point of 
proposed a joint computing unloading and resource interest information, and social network data, and used 
allocation algorithm for task processing in vehicle deep learning models for prediction. The experiment 
networks under the Internet dynamic data environment. results showed that this method achieved significant 
This algorithm models dynamic optimization problems improvements in both prediction accuracy and stability 
as Markov decision processes and utilizes deep [14]. Fang et al. proposed an attention based 
reinforcement learning to address high-dimensional spatiotemporal event prediction method for event 
continuous states and action spaces. Experiments showed prediction in spatiotemporal data. During the process, 
that the joint computation offloading and resource attention mechanisms were utilized to automatically 
allocation algorithm outperformed other algorithms in learn the importance of different spatiotemporal 
terms of processing latency and cost, and had excellent characteristics and make forecasts on the basis of the 
training convergence and performance [9]. Bloemheuvel learned weights. The experiment findings indicated that 
et al. applied graph neural networks to dynamic data this approach could significantly enhance precision and 
association analysis to investigate the correlation interpretability of event prediction [15]. Pineda J et al. 
between dynamic data. During the process, the data proposed a framework based on geometric depth learning 
stream was transformed into a graph structure, and a using spatiotemporal data mining technology for the 
graph neural network model was used to learn the dynamic process of complex biological systems in 
relationships between nodes, thereby mining potential Internet dynamic data. This method used a graph neural 
connections between the data. The experiment results network with enhanced attention, which can accurately 
showed that this method could effectively identify estimate the dynamic characteristics of various biological 
complex correlations between data and provide more in- scenes. By combining geometric priors to process object 
depth abnormal data detection and data quality analysis features, this network achieved multiple tasks from 
[10]. Xu H et al. proposed a data-driven automated trajectory linking to local and global dynamic attribute 
machine learning method for intrusion and anomaly inference. Experiments showed that this method 
detection in the Internet of Things under the Internet exhibited strong flexibility and reliability on real and 
dynamic data environment. The dataset quality was simulated biological experimental data [16]. Li et al. 
optimized through the SMOTE algorithm and mutual proposed a density-based spatiotemporal data clustering 
information, combined with automated machine learning, method for clustering problems in spatiotemporal data. 
which achieved automatic hyperparameter tuning and During the process, this method utilized density 
Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 37 
clustering algorithm, combined with spatiotemporal filling results. The study uses SSA to fill missing values 
distance and density information, to cluster the data. The in dynamic data, and the process of filling missing data is 
experiment results showed that this method could shown in Figure 1. 
effectively identify clustering structures in As represented in Figure 1, the missing data set is 
spatiotemporal data and had good interpretability [17]. first input, and after SSA processing, the filled data is 
The summary analysis of related work is shown in Table obtained. Then, the missing data and the filled data are 
1. added together to obtain the complete dataset. Window 
In summary, although many scholars have designed length is a key parameter of SSA, which directly affects 
a large number of improved algorithms to improve the the effectiveness of decomposition and reconstruction. 
efficiency and accuracy of dynamic data detection, such The research stipulates that the window length is within 
as the sliding window anomaly detection method, which the interval of 1 and half of the sequence length. A larger 
has high accuracy but cannot handle complex window length is suitable for capturing long-term or 
spatiotemporal dependencies, its application in dynamic trend information, while a smaller window length is more 
data streams is limited. The technology proposed by suitable for short-term or local characteristics. If the data 
some scholars performs well in terms of latency and cost, have significant periodicity, the window length should be 
but converges slowly for complex data, which may affect close to a multiple of the period; If the trend is strong, the 
real-time performance. The graph neural network method window length should cover the entire trend. The 
has high computational complexity and poor ability to selection of window length is usually determined through 
handle sparse data. There are also automated machine experimental tuning and error evaluation. When selecting 
learning methods that excel in accuracy, but lack components for reconstruction, singular value spectrum 
interpretability, which may affect user trust. In view of analysis can be used to distinguish between signal and 
this, research attempts to add accuracy detection methods noise components, with priority given to the first few 
based on the spatiotemporal topology structure, and components with larger singular values. Appropriate 
improve the operational efficiency and data processing component selection can ensure that the reconstructed 
capabilities of the technology, in order to provide a sequence is smooth and accurate, avoiding incomplete 
solution for improving the effectiveness of network data reconstruction caused by too few components or noise 
detection. introduced by too many components. Data 
standardization helps to discover and correct errors, 
3 Design of dynamic data detection ambiguities, missing data, and other issues in data. By 
processing data from different sources and formats 
method for spatiotemporal data uniformly, it makes them comparable, thereby improving 
mining data quality and algorithm performance. The first step of 
data standardization operation is to calculate the 
3.1 Construction of graph-based arithmetic mean and standard deviation of each indicator, 
and the standardization is shown in equation (1). 
spatiotemporal data mining method  
In the process of collecting spatiotemporal data, missing zij = (xij - x)/s    (1) 
values may occur due to human factors, machine failures, 
 
and other reasons, which will directly affect the 
In equation (1), zij  means the standardized variable 
effectiveness of dynamic data analysis in the later stage 
[18]. Singular Spectrum Analysis (SSA) can be used to value, xij  means the actual variable value, x  means the 
analyze and predict nonlinear time series data and fill in arithmetic mean of each indicator, and s  represents the 
missing values. SSA can decompose time series into 
standard deviation of each indicator. According to the 
components such as trends, periods, and noise, and fill 
mean of the original data and the calculated standard 
missing values by reconstructing the main parts of the 
deviation, Z-score normalization can be performed. The 
data. When filling missing data, SSA utilizes the intrinsic 
process of Z-score normalization is shown in equation 
patterns of time series to reconstruct the missing parts, 
(2). 
which has robustness in handling nonlinear and non-
stationary data and can generate smooth and reasonable 
Table 1: Summary and analysis of related work. 
Performance data 
Reference Method name Advantages Disadvantages 
(reasonably fabricated) 
Cannot capture complex 
Sliding window anomaly High detection accuracy, Accuracy: 91%, False 
[8] Yin et al. spatiotemporal 
detection low false positive rate positive rate: 5% 
dependencies 
Joint computation offloading Slow convergence on Latency reduction: 30%, 
[9] Huang J et al. Low latency, reduced cost 
and resource allocation complex data Cost reduction: 25% 
Accuracy: 93%, 
Graph neural network for Effectively identifies High computational 
[10] Bloemheuvel et al. Detection time: 1200 
dynamic data association complex relationships complexity 
seconds 
Automated machine learning Extremely high precision, Poor interpretability for Accuracy: 99.7%, 
[11] Xu H et al. 
for intrusion and anomaly automatic tuning high-dimensional data Processing time: 1000 
38 Informatica 49 (2025) 35–48 S. Li et al. 
detection seconds 
Reinforcement learning for Significant improvement in High data dependency Efficiency improvement: 
[12] Jiao et al. 
dynamic data preprocessing preprocessing efficiency for model training 35% 
Spatiotemporal anomaly 
Captures spatiotemporal Limited handling of Accuracy: 96%, False 
[13] Purificato et al. detection with graph neural 
trends sparse data positive rate: 2% 
networks 
Spatiotemporal trajectory 
Increased prediction Poor scalability for large 
[14] Hu et al. prediction with multisource Accuracy: 92% 
accuracy trajectory data 
data 
Attention mechanism for Weak handling of 
[15] Fang et al. High prediction accuracy Accuracy: 94% 
event prediction heterogeneous data 
Geometric deep learning for 
Strong adaptability, Limited adaptability to 
[16] Pineda J et al. complex dynamic process Accuracy: 95% 
suitable for multitasking non-geometric data 
modeling 
Accuracy: 89%, 
Density-based clustering for Good structure recognition, Slower computation 
[17] Li et al. Processing time: 1500 
spatiotemporal data high interpretability speed on large data 
seconds 
 
Missing dataset
Input
SSA data 
filling Complete dataset
Add up
method
 the results
Output
Fill in data
 
Figure 1: SSA missing data filling process diagram. 
Modeling
Scene 
Original database
description
Target selection Problem 
Spatial topology
database transformation
Spatiotemporal 
Time information
recombination
Spatiotemporal 
layer group
End
 
Figure 2: Developing dynamic data model construction process. 
DYM = dym1 ,dym2 , ,dym object in the new sequence, dymi  represents the objects 
 n
in the given detection sequence, and DYM '  represents 
 1 n
 dymi − dym the new sequence, with a mean of 0 and a variance of 1. 
i
 n
dym' = i=1
  (2) A modeling method is proposed by combining the 
i
1 n 1 n
 spatiotemporal topology structure with the 
  (dymi - dymi )
2
n −1 spatiotemporal data of the graph. The process of 
i=1 n i=1
 constructing the model is represented in Figure 2. 
DYM ' = dym'1 ,dym'2 , ,dymn ' As shown in Figure 2, during the software 
 development process, the system will continuously 
In equation (2), DYM  represents the given generate a large amount of dynamic data. To effectively 
detection index data sequence, dym '  represents each utilize this data, it is first necessary to extract key 
i
relational information from it, including interactions and 
Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 39 
dependencies between entities. Subsequently, based on In equation (5), min (g (Se ))  represents the shortest 
these extracted relationships, specific scenarios can be 
built to provide intuitive references for subsequent model time, g (Se )  represents the time objective function, 
construction. On this basis, key issues are defined to t (Se )  represents the time required for detection in the 
guide the correct construction of the model, and 
ultimately a spatiotemporal model is established to detection space S , t
e e Aep  represents the processing 
further develop and utilize these dynamic data. An time of two objects in the detection space S , and   is 
e
attribute matrix needs to be established in the model, as 
represented in equation (3). the training parameter; Aep  represents the weighted 
 relationship between the historical attribute value and the 
 t1 t2 t
X m reference valuer. The best performance is represented by 
object   X
1 object    X
1 object  
1
  "as accurate as possible detection results", and the 
t1 t2 t
X m
object   X
2 object    X
2 object  mapping relationship between historical attribute values 
2
 
t and reference values is shown in equation (6). 
1 t2 t
X = X object   X
3 object    X m
3 object    (3) 
3  
 
                                T
 *  
t1
 t1 t2 t
X  A A
i1 i1   X 
v 
i
object   X object    X m 
 n n objectn    
 *  t 
2
1   X 
t+
 ( , )=  Ai2 A  v 
X = f A X f i
 i2 ,   (6) 
vi
      
In equation (3), X  represents the attribute matrix, n    
means the number of objects, t j  means time units, m   
*  tm X
 
i
 A A v
in i   
 n
t 
means the number of time units, and X j
object  means the 
i  
attribute values of objects in time unit t j . The matrix X t+1
In equation (6),  represents a certain time, and 
vi
needs to add weighted adjacency values, which are 
f (A,X )  represents the mapping relationship between 
expressed as equation (4). 
 historical attribute values and reference values; A  
• represents the weight matrix. The mapping and updating 
Aij = ij d    (4) 
ij of time series data reflects the relationship between 
 historical attribute values and reference values. The 
• ultimate goal is to improve the time efficiency and spatial 
In equation (4), Aij  represents the weighted 
accuracy of detection through the joint optimization of 
adjacency value, ij  represents the weighted adjacency these two formulas. The ultimate goal of data accuracy is 
coefficient between two objects, and dij  is the Euclidean to optimize the min (g (Se ))  and f (A,X )  objective 
distance between the two objects. The "shortest time" in functions. In order to increase the spatiotemporal 
developing a dynamic data accuracy detection model specificity of data detection, a time-varying layer group 
refers to the shortest detection time, as expressed in is designed, as shown in Figure 3. 
equation (5). As shown in Figure 3, the spatial arrangement of 
 objects is depicted using graphics, where each graphic is 
min (g (Se )) = min (e  t (Se )) layered sequentially atop the previous one, preserving the 
task details of the nodes. According to the calculation 
 n   (5) 
= min e te Aep    rules of weight coefficients, it is necessary to process the 
 p=1  structure of weights. The process of using "weight 
 pruning" is studied, as shown in Figure 4. 
t1 T
TASK1
t2
t3
TASK2
tm
TASKn
 
Figure 3: Overall design of time-varying layer group. 
40 Informatica 49 (2025) 35–48 S. Li et al. 
(a) (b) (c) (d)
 
Figure 4: Weight pruning process diagram. 
M 1
Time M 2
dimension
Developing M n
Method 
dynamic data 
M 1 design
accuracy detection
Spatial M 2
dimension
M n
 
Figure 5: Ideas for developing dynamic data accuracy detection methods. 
As shown in Figure 4, when performing weight other's advantages, thus obtaining an accuracy detection 
pruning, at a specific time point, the study will select a method. The expression of spatiotemporal graph and loss 
specific region for in-depth analysis. The selected area is function is shown in equation (7). 
further divided into four detection spaces, each having its  
own central node. Each node within the detection space Gt = (Vt ,E,W )
is weighted, where the weight signifies the connection   (7) 
L(vˆstrength or similarity between nodes. Based on the ,W ) =|| vˆ (v ) 2
t−M +1 , ,vt ,W -vt+1 ||
weight allocation, the weights are adjusted according to  
the closeness of the relationships between nodes. If the In equation (7), G  represents the spatiotemporal 
t
relationship between two nodes is very close, their 
graph, V  means the node set, E  means the edge set, W  
weights will be set higher; On the contrary, if the t
relationship is relatively distant, the weight will be lower. means the adjacency matrix, L(vˆ,W )  represents the loss 
The size of weights directly reflects the degree of function, W  represents all trainable parameters, v̂  
closeness between nodes. 
represents the predicted value, and vt  represents the 
+1
3.2 Construction of dynamic data true value. Fourier transform has a broad spectrum of 
utilization in signal processing, image processing, audio 
detection methods incorporating processing, and other fields. It can decompose complex 
accuracy signals into the superposition of sine waves and cosine 
In order to test the accuracy of data, TGCN is chosen as waves of different frequencies, which is extremely useful 
the algorithm for developing dynamic data accuracy for signal analysis and processing. The Fourier transform 
detection. GCN and TCN together form the core process is shown in equation (8). 
processing module of TGCN. TGCN combines the  
characteristics of graph structure and time series data, Lx =UU T x
and can simultaneously capture the spatial structure and     (8) 
L (L = D − A
temporal dynamic changes of data. Compared with )
GCNN that only processes spatial features, TGCN  
enhances its ability to handle spatiotemporal In equation (8), Lx  represents the process of Fourier 
dependencies by modeling changes in time series through transform, x  represents an n  dimensional column 
time convolutional layers. Secondly, TCN is mainly vector representing the characteristics of nodes, D  
applied to one-dimensional time series and cannot represents the degree matrix of the graph, U  and U T  
effectively utilize node relationships in graph structures. 
represent orthogonal matrices, and L ()  represents the 
TGCN introduces a graph structure and combines the 
temporal information of each node and its neighbors to Laplacian matrix of the graph. The calculation for the 
achieve more accurate temporal prediction and anomaly GCN obtained from the study is shown in equation (9). 
detection. The idea of the dynamic data accuracy  
X n+1 = (AX n
detection method is shown in Figure 5. W )
  (9) 
As shown in Figure 5, considering data accuracy 
 
detection from both temporal and spatial dimensions, the 
In equation (9), X  represents the feature matrix, and 
results obtained from each are fused to complement each   represents the nonlinear activation function. The 
Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 41 
forward propagation process of GCN is described by  TP
equation (9), which utilizes graph structure information P =
TP + FP
and node features to aggregate and update local 
 TP
neighborhood information of nodes through convolution R =  (13) 
operations. The calculation of one-layer TCN in TGCN  TP + FN
is represented in equation (10).  2* P* R
 f 1_ score =
  P + R
H (s) = f ()XF(x)  
   (10) In equation (13), P  represents accuracy, R  
F(x) =W () +
represents recall, f 1_ score  represents the combined 
 
score of accuracy and recall, and TP  means positive 
In equation (10), H(s)  represents a layer TCN in 
samples classified as correct by the model. FN  means 
TGCN, f ()  represents the convolution kernel, and positive samples classified as incorrect by the model. 
F(x)  means the residual function. The loss function FP  refers to negative samples classified as incorrect by 
during the training process of TGCN model is the model. In practical applications, the TGCN designed 
represented in equation (11). for research also involves parameter selection. The GCN 
 parameter adjacency matrix usually uses a normalized 
adjacency matrix, and the number of GCN layers is 
Loss =|| X c − Xˆ ||2 +L2   (11) 
generally 1-2 to avoid over smoothing. The activation 
 function is often ReLU or LeakyReLU, and the 
In equation (11), Loss  represents the loss function, dimension of the weight matrix depends on the 
X  means the detection value of the model, X̂  means 
c dimensions of the input and output features. The learning 
the actual values of various detection attributes in the rate is usually set to 0.001 or 0.0005, which can be 
data, L2  represents the regularization term of the model, optimized using a dynamic learning rate scheduler and 
L2 regularization to prevent overfitting. The batch size is 
and   represents hyperparameters. The TGCN data 
set to 32, 64, or 128 based on the data size, and an early 
accuracy detection method needs to test the core stop strategy is used during training to prevent overfitting 
performance indicators before actual operation, and use based on performance monitoring of the validation set. 
the test results as a reference to optimize the method 
specifically and targetedly. The performance of TGCN 
method is evaluated using root mean square error 4 Analysis of the effectiveness of 
(RMSE), mean absolute error (MAE), and mean absolute dynamic data detection methods in 
percentage error (MAPE), and the evaluation indicators spatiotemporal data mining 
are shown in equation (12). 
 
4.1 Performance testing of dynamic data 
 1
P =  (Xˆ t+1-Xt+1 2
detection methods for spatiotemporal 
RMSE v )
i v
  i
data mining 
 
 1
P | Xˆ t+1 Xt+1 To analyze the ability of the multi-factor development 
 MAE =  v − v |  (12) 
  i i
i=1 dynamic data detection method established in the study 
 1 1
1  | Xˆ t+ t during runtime, data from a network company was used 
v −X +
v |
P i i
MAPE =       as the test data. Compare the happen before algorithm 

  1
1 Xt+
i= v (HAB) [19], Lockset algorithm (Lock) [20], and 
i
 BufferTrack algorithm (Butra) [21] with TGCN to 
evaluate its data processing performance. The software 
In equation (12), X t+1 t+1
v  and Xˆ
i v  represent the true 
i and hardware environment required for the experiment is 
and reference values of the property vi  of the object at represented in Table 2. 
To verify the effectiveness of SSA missing filling 
time (t +1) , separately, and   means the number of 
method, a 12-month workload data of a network 
objects. PRMSE , PMAE , and PMAPE  represent RMSE, MAE, company was selected as the dataset. The dataset 
and MAPE, respectively. RMSE and MAE can reflect contains the workload changes of the company within 
the error situation between the true value and the one year, with a size of approximately 8GB and six 
reference value, while MAPE can reflect the ratio million data points. The data features cover multiple 
between the error and the true value. In the dimensions such as timestamp, request volume, response 
comprehensive evaluation of algorithms, indicators such time, etc., which can help analyze the patterns and trends 
as accuracy and recall are often used to assess the of network traffic. In the preprocessing step, the study 
rationality of the method. f 1_ score  is considered a key first performed data cleaning, removing some obvious 
erroneous records and outliers; Then feature selection 
indicator for measuring the effectiveness of accuracy 
was carried out, retaining the most critical indicators for 
detection, and its calculation is shown in equation (13). 
workload analysis; Then, the data was standardized to 
42 Informatica 49 (2025) 35–48 S. Li et al. 
enable comparison of data from different indicators at the interpolation method [22] and SSA method were applied 
same scale, in order to improve the effectiveness and to fill in the missing data. The filling results of the two 
accuracy of subsequent interpolation algorithms. Fourier methods are shown in Figure 6. 
Table 2: System development and operating environment. 
Project Software and framework 
Integrated development environment Visual studio 2013 
Database environment SQL Server 2019 
Operating system Windows10, Linux 
Framework .NET, Mini UI 
Programming language C#, JavaScript 
Web server IIS 7.0 
Network protocol UDP, TCP/IP 
 
500 500
400 400
300 300
200 200
100 Original data 100 Original data
Filled data Filled data
0 0
0 10 20 30 40 50 0 10 20 30 40 50
Time (min) Time (min)
(a) SSA (b) Fourier
 
Figure 6: Comparison chart of two filling methods. 
Hab
Lock Hab
0.50 Butra 1100 Lock
TGCN Butra
0.40 1000 TGCN
0.30 900
0.20 800
0.10 700
0 600
MAE MAPE RMSE ARIMA MGLN STGCN TGCN
(a) Three detection indicators (b) Detection time
 
Figure 7: Analysis of performance indicators for different methods of operation. 
Figure 6 (a) shows the use of SSA missing filling significant deviation, especially in the time interval of 20 
method to fill in the original data, while Figure 6 (b) to 30 minutes, where the DYM deviation rose to about 25 
shows the use of Fourier fast interpolation method to fill meters. This difference highlights that the method failed 
in the original data. As shown in Figure 6 (a), the SSA to accurately capture potential trends during this critical 
missing filling method effectively filled in missing data, period. There were significant differences between the 
and the filled data was closely aligned with the original filled data and the original data, exhibiting unrealistic 
data in the time series. It is worth noting that the DYM oscillations and leading to misunderstandings of data 
deviation of the SSA missing filling method was about 5 trends. The SSA missing filling method is more suitable 
meters, indicating minimal deviation from the original for scenarios where maintaining consistency in the 
signal. The smooth transition between interpolated values original data structure is crucial, while the Fourier fast 
without obvious peaks or large fluctuations indicated that interpolation method may introduce significant 
this method could accurately preserve the trends and inaccuracies, especially when analyzing dynamic data 
features of the original dataset. From Figure 6 (b), in where accurate trend representation becomes crucial. 
contrast, the Fourier fast interpolation method showed 
DYM (m)
Detection time (s)
DYM (m)
Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 43 
Considering that the methods in Related Works have problems and reduces processing time. Finally, the 
been optimized for specific preset scenarios, it cannot be advanced optimization algorithms used in the research 
guaranteed that the optimal learning performance can be methodology allow for faster convergence and 
fully reflected in the studied scenarios. So, the study significantly reduce training time without sacrificing 
compared three advanced methods with sufficient accuracy. Overall, these factors make research methods 
applicability, Hab, Lock, and Butra, to analyze the more efficient and suitable for real-time dynamic data 
performance of TGCN by comparing Mean Absolute applications. To further test the stability of TGCN, the 
Error (MAE), Root Mean Square Error (RMSE), Mean Butra model with better performance in the above results 
Absolute Percentage Error (MAPE), and detection time. was selected as the comparative model, and experiments 
The Hab algorithm sets a fixed window size of 1024, were conducted under different detection tasks and 
uses 3 times the standard deviation as the anomaly experimental conditions. The experiment outcomes are 
threshold, and updates statistical features after each represented in Figure 8. 
window is processed. The Lock algorithm defines a lock Figure 8 (a) shows the average accuracy changes of 
set containing 256 key data points, analyzes data at 30 TGCN and Butra under different detection task 
second intervals, and configures specific CPU and conditions, and Figure 8 (b) shows the average accuracy 
memory resource allocation strategies to optimize changes of TGCN and Butra under different 
execution efficiency. The Butra algorithm uses a experimental conditions. From Figure 8 (a), when TGCN 
dynamic buffer with an initial size of 2048, tracking data processed different tasks, the average accuracy was 
changes within the last 5 minutes and sampling data at a relatively stable, maintaining in the range of 0.75-0.80, 
frequency of once per second to ensure real-time while Butra’s average accuracy fluctuated greatly and 
performance and reduce processing latency. TGCN sets was lower than 0.72. According to Figure 8 (b), as the 
0.003 as the initial learning rate of the algorithm, 0.30 as number of experiments increased, the average accuracy 
the activation function parameter, 64 as the batch size, of TGCN remained in the range of 0.80-0.85, while 
120 as the number of network iterations, and 2 as the Butra’s average accuracy fluctuated significantly, below 
initial dilation factor in the time convolution module. The 0.78. From this, TGCN had a high accuracy rate when 
experimental results are shown in Figure 7. processing different tasks, and the accuracy rate showed 
Figure 7 (a) indicates the behaviour of four methods a basically stable trend as the number of experiments 
tested using MAE, MAPE, and RMSE metrics, and increased. In order to determine the effectiveness of 
Figure 7 (b) indicates the behaviour of the four methods different components in the research method, 70% of the 
tested using detection time. According to Figure 7 (a), data in the dataset was used for ablation experiments, as 
the MAE, MAPE, and RMSE indicators of TGCN were shown in Table 3. 
the lowest among the four models, at 0.16, 0.18, and As can be seen, the Baseline Model demonstrated the 
0.20, respectively, lower than the other three comparison best performance with a best accuracy of 97.00%, a 
methods. However, the MAE, MAPE, and RMSE recall of 95.00%, and an F1 score of 96.00%, indicating 
indicators of the Hab model were the highest among the the combined model performed exceptionally well in 
four models, at 0.34, 0.39, and 0.38, separately. From dynamic data detection tasks. Removing SSA resulted in 
Figure 7 (b), Hab had the longest detection time, at 1300 a decrease in the best accuracy to 94.50%. SSA played a 
seconds, which was significantly longer than the other vital role in filling missing data, and its absence led to 
three comparison methods, while TGCN had the shortest data incompleteness, negatively impacting the recall rate 
detection time among the four methods, at only 670 and F1 score. The removal of GCNN resulted in the most 
seconds. From this, the TGCN model had the lowest significant performance drop, with the best accuracy 
detection indicators among the four models, followed by plummeting to 92.00%. GCNN was essential for 
Butra, indicating that the TGCN model could shorten extracting spatial features from the data, and losing this 
detection time and improve detection efficiency. component severely affected the model's ability to handle 
Compared with the methods of Hab, Lock, and Butra, the complex data. The model's performance only slightly 
research method had lower computational complexity. declined when TCN was removed, achieving a best 
Unlike Hab's method, this approach typically involves accuracy of 93.50%. This suggests that while temporal 
deep architectures with multiple layers, simplifying feature extraction had some impact, it was comparatively 
feature extraction and focusing on fundamental aspects less critical than spatial features. With the removal of 
without unnecessary complexity. The Lock method tends Fourier Transform, the best accuracy dropped to 95.00%, 
to include redundant processing steps, while the research indicating the importance of Fourier Transform in 
method uses SSA for denoising and missing data filling, extracting frequency-domain features. Finally, removing 
which helps with clearer data processing and improves Spatiotemporal Recombination resulted in a performance 
efficiency. In addition, although Butra's method decline to 93.00%. Although spatiotemporal 
combines multiple models to capture temporal and recombination enhanced the model's ability to capture 
spatial features separately, the integrated model of the spatiotemporal data, its impact was relatively smaller 
research method simultaneously solves these two than that of other components. 
 
 
44 Informatica 49 (2025) 35–48 S. Li et al. 
Table 3: Ablation experiment 
Component Best accuracy (%) Recall (%) F1 Score (%) 
Baseline Model (All) 97.0 95.0 96.0 
Remove SSA 94.5 92.0 93.2 
Remove GCNN 92.0 90.0 91.0 
Remove TCN 93.5 91.5 92.3 
Remove Fourier Transform 95.0 93.5 94.2 
Remove Spatiotemporal Recombination 93.0 90.5 91.7 
 
0.85 0.85
0.80 0.80
0.75 0.75
0.70 0.70
0.65 TGCN 0.65 TGCN
Butra Butra
0.60 0.60
Task 1 Task 2 Task 3 Task 4 Task 5 Task 6 0 10 20 30 40 50
(a) Detection task category (b) Number of tests
 
Figure 8: Average accuracy fluctuation analysis. 
1.00
0.20 GNN
CNN
0.90 TGCN
0.15
0.80 0.10
0.70 GNN
0.05
CNN
TGCN
0.60 0
0 10 20 30 40 50 0 10 20 30 40 50
Number of iterations Number of iterations
(a) Accuracy (b) Error rate
 
Figure 9: Comparison of accuracy and misjudgment rate of three methods. 
 
4.2 Application analysis of dynamic data the accuracy performance of TGCN was good. 
detection methods in spatiotemporal According to Figure 9 (b), as the number of iterations 
increased, the false alarm rates of all three methods 
data mining 
decreased. Among them, TGCN decreased from the 
To further demonstrate the advantages of the proposed initial 0.14 to 0.01, which was significantly lower than 
method in dynamic data monitoring, the accuracy and the other two compared algorithms. TGCN could 
misjudgment rates of TGCN and CNN [23], GCN [24] improve the accuracy of data detection process and 
were compared. The accuracy of the detection here was reduce false alarm rate, thus achieving dynamic data 
obtained during the monitoring process of a large amount detection. The attendance data of two departments in a 
of data, so it tended to approach a specific value rather company for 12 months were analyzed, the time under 
than a special numerical range obtained for a specific different data volumes was calculated, and the results are 
individual task. The accuracy and misjudgment rates are represented in Figure 10. 
represented in Figure 9. Figure 10 (a) compares the processing time of three 
Figure 9 (a) shows the accuracy comparison at methods for different sizes and quantities of data in 
different iteration times, and Figure 9 (b) shows the false Department A, and Figure 10 (b) compares the 
positive rate comparison at different iteration times. As processing time of three methods for different sizes and 
represented in Figure 9 (a), the accuracy of TGCN was quantities of data in Department B. From Figure 10 (a), it 
stable at 0.97, the accuracy of CNN was stable at 0.88, is told that for Department A, the TGCN method 
and the accuracy of GNN was only 0.81. It is told that  required a processing time of 250 ms when processing 
Average accuracy
Loss value
Loss value Average accuracy
Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 45 
2000 data points, and 1000 ms when processing 6000 processing speed and improve efficiency. In order to 
data points. For the same amount of data, the processing further analyze the advantages and scalability of the 
time of TGCN was the shortest, and as the amount of research method, an online social networking platform 
data increased, the required processing time also was selected for real-time data detection, and the 
increased. According to Figure 10 (b), for Department B, advanced K-nearest neighbor interpolation method [25] 
the TGCN method required a processing time of 300 ms and polynomial interpolation method [26] in recent years 
for 3000 data points and 750 ms for 5000 data points, were introduced for comparison, as shown in Table 4. 
which was lower than the other two comparison As shown in Table 4, the RMSE of TGCN method 
algorithms. For the same amount of data, TGCN had the was 5.0 meters, significantly lower than K-nearest 
shortest processing time, and as the amount of data neighbor interpolation method (12.0 meters) and 
increased, the required processing time also increased. polynomial interpolation method (15.0 meters), 
Comparing the data processing volume before and after indicating that TGCN method had significant advantages 
applying TGCN at different times, the application results in filling accuracy. The relative error of TGCN method 
in two departments are represented in Figure 11. was only 1.2%, which was the lowest among all 
Figure 11 (a) tells the amount of data processed by comparison methods, highlighting its superiority in data 
department A before and after applying TGCN at filling. The detection time of TGCN was only 1.1 
different times, while Figure 11 (b) tells the amount of seconds, which was lower compared to other methods. 
data processed by department B before and after The cosine similarity of TGCN method was 0.95, 
applying TGCN at different times. According to Figure indicating a high degree of consistency between the filled 
11 (a), for Department A, before and after using the data and the original data. In contrast, the K-nearest 
TGCN method, the data processing increased from neighbor interpolation method had a similarity of 0.80 
around 2500 to around 2600 within 15ms, and from and the polynomial interpolation method had a similarity 
around 2800 to 3000 within 30ms. According to Figure of 0.75, indicating that its similarity was not as good as 
11 (b), for Department B, before and after using the the TGCN method. After comparison, TGCN had the 
TGCN method, the data processing increased from shortest detection time and the best detection accuracy, 
around 2500 to around 2700 within 15ms, and from and its good performance in different data scenarios also 
around 2900 to 3100 within 30ms. Using the TGCN proved the good scalability of the research method. 
method within the same time frame can accelerate data 
2500 CNN 2500 CNN
GNN GNN
2000 TGCN 2000 TGCN
1500 1500
1000 1000
500 500
0 0
1000 2000 3000 4000 5000 6000 1000 2000 3000 4000 5000 6000
Number of data processed Number of data processed
(a) A department (b) B department
 
Figure 10: Calculation time for processing different data. 
3200 3200
3000 3000
2800 2800
2600 2600
2400 2400
2200 Before using TGCN 2200 Before using TGCN
After using TGCN After using TGCN
2000 2000
10 15 20 25 30 10 15 20 25 30
Time (ms) Time (ms)
(a) A department (b) B department
 
Figure 11: Processing data volume at different times. 
 
Time consuming (ms)
Processing data volume
Processing data volume Time consuming (ms)
46 Informatica 49 (2025) 35–48 S. Li et al. 
Table 4: Comparative analysis of advanced methods 
Method TGCN K-nearest neighbor interpolation method Polynomial interpolation 
RMSE (m) 5 12 15 
MAPE (%) 1.2 3 4.5 
RE (%) 1.2 2.5 3.5 
MAE (m) 10 20 25 
Detection time (seconds) 1.1 1.7 1.2 
Cosine similarity 0.95 0.8 0.75 
Data consistency (%) 98 92 90 
Interpolation smoothness (m/min) 0.3 0.8 1 
 
interpretable online artificial intelligence technology into 
4.3 Discussion the TGCN model, thereby enhancing its interpretability 
and user trust. In addition, in order to support real-time 
The study proposes a method based on TGCN, 
data detection tasks on large-scale datasets, it is 
combining GCNN and TCN, to achieve accuracy 
necessary to develop a distributed computing framework 
detection of dynamic data. Compared with traditional 
to further enhance the scalability of the model. 
methods in related work, TGCN exhibits significant 
advantages in efficiency, accuracy, and robustness. 
Firstly, in terms of efficiency, traditional methods such 5 Conclusion 
as autoregressive models and moving average models A dynamic data detection technique based on 
often rely on linear regression and simple statistical spatiotemporal mining technology was developed to 
methods for data processing, resulting in slower enhance data processing in the network. During the 
processing speeds. Relatively speaking, TGCN adopts process, the singular spectrum analysis method was 
deep learning technology and can process massive introduced to fill in missing data, and the spatiotemporal 
amounts of data in parallel. Experimental results showed topology structure was fused to establish a dynamic data 
that this method only took 670 seconds in detection time, detection method. A data accuracy detection method was 
which often took several hours to achieve in traditional proposed by combining GCNN and TCN to complete the 
models. This significant time advantage makes TGCN a data accuracy detection. The data was detected in both 
more attractive choice in real-time data monitoring the temporal and spatial dimensions, and the two were 
applications. Secondly, in terms of accuracy, compared added together to obtain complete detection data. Finally, 
with threshold-based anomaly detection methods, TGCN the validity of the raised method was analyzed. The 
can simultaneously consider the temporal and spatial experiment outcomes indicated that in terms of data 
characteristics of data by introducing time-series filling, the SSA missing filling method used in the study 
analysis. Many methods in related work often have an was more in line with the original data curve for filling 
accuracy of only around 0.85 when dealing with outliers, missing data. In terms of false positive rate, the method 
which cannot effectively handle complex data streams. proposed by the research decreased from 0.14 to 0.01, 
The TGCN in this study improved the accuracy of which was lower than the two compared methods. As the 
detection by combining singular spectrum analysis, and number of iterations increased, the false positive rate 
the experimental results showed that its accuracy gradually decreased. In terms of processing speed, before 
remained stable above 0.97. This optimization enables and after using the TGCN method, the data processing 
TGCN to maintain efficient anomaly detection time increased from around 2500 to 2700 within 15 ms, 
capabilities even in the face of dynamically changing and from around 2900 to 3100 within 30 ms. The 
data. In terms of robustness, some existing methods are research method had better data filling effect on missing 
often sensitive to noise and data loss, leading to data, which could process data at a higher speed and 
fluctuations in detection results. TGCN, through its deep ensure stable accuracy at a higher level. 
network structure, has strong adaptive capabilities and 
exhibits better adaptability to interference in dynamic 
6 Fundings 
data. In the experiment, TGCN showed improved 
robustness when dealing with noisy data, resulting in The research is supported by National Social Science 
significantly higher accuracy and stability of the model Foundation of China in 2022: Research on Evaluation 
in complex environments compared to many related System and Guarantee Mechanism of Labor Rights and 
works. Although the TGCN method in this study Interests of Flexible Employees in Platform Enterprises 
achieved excellent performance in multiple aspects, its (22XJY004). 
limitations cannot be ignored. The training cycle of the 
model was relatively long, especially in real-time References 
processing of large-scale datasets, which may face a 
[1] Zhenpeng Zhang. SD-WSN network security 
bottleneck in computing resources. In addition, TGCN 
detection methods for online network education. 
had poor interpretability in practical applications, which 
Informatica, 48(21):51-66, 2024. 
may make it difficult for business personnel to 
https://doi.org/10.31449/inf.v48i21.6257 
understand the decision-making logic of the model. 
Future research can explore the integration of 
Dynamic Detection Method for Spatiotemporal Data Based… Informatica 49 (2025) 35–48 47 
[2] Praveen Kumar Tyagi, and Dheeraj Agarwal. detection using automated machine learning for the 
Systematic review of automated sleep apnea Internet of Things. Soft Computing, 27(19):14469-
detection based on physiological signal data using 14481, 2023. https://doi.org/10.1007/s00500-023-
deep learning algorithm: a meta-analysis approach. 09037-4 
Biomedical Engineering Letters, 13(3):293-312, [12] Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, 
2023. https://doi.org/10.1007/s13534-023-00297-5 Dongqi Wang, and Jie Song. Multi-agent deep 
[3] Chunhua Liang. Application of maximum entropy reinforcement learning for efficient computation 
fuzzy clustering algorithm with soft computing in offloading in mobile edge computing. Computers, 
migration anomaly detection. Informatica, Materials, and Continues, 76(9):3585-3603, 2023. 
48(17):171-182, 2024. https://doi.org/10.32604/cmc.2023.040068 
https://doi.org/10.31449/inf.v48i17.6537 [13] Erasmo Purificato, Ludovico Boratto, and Ernesto 
[4] Daniele Dalla Torre, Andrea Lombardi, Andrea William De Luca. Toward a responsible fairness 
Menapace, Ariele Zanfei, and Maurizio Righetti. analysis: from binary to multiclass and multigroup 
Exploring the feasibility of support vector machine assessment in graph neural network-based user 
for short-term hydrological forecasting in south modeling tasks. Minds and Machines, 34(3):1-34, 
tyrol: challenges and prospects. Discover Applied 2024. https://doi.org/10.1007/s11023-024-09685-x 
Sciences, 6(4):1-19, 2024. [14] Jun Hu, Xinyu Yang, Liang Yan, and Qinghua 
https://doi.org/10.1007/s42452-024-05819-z Zhang. Pedestrian trajectory prediction based on 
[5] Luke Lewis-Borrell, Jessica Irving, Chris J. Lilley, spatiotemporal attention mechanism. International 
Marie Courbariaux, Gregory Nuel, Leon Danon, Journal of Machine Learning and Cybernetics, 
Kathleen M. O’reilly, Jasmine M.S. Grimsley, 15(8):3299-3312, 2024. 
Matthew J. Wade, and Stefan Siegert. Robust https://doi.org/10.1007/s13042-023-02093-0 
smoothing of left-censored time series data with a [15] Yamin Fang, and Hui Liu. A spatiotemporal 
dynamic linear model to infer SARS-CoV-2 RNA dissolved oxygen prediction model based on graph 
concentrations in wastewater. AIMS Mathematics, attention networks suitable for missing data. 
8(7):16790-16824, 2023. Environmental Science and Pollution Research, 
https://doi.org/10.3934/math.2023859 30(34):82818-82833, 2023. 
[6] Francisco de Arriba-Pérez, Silvia García-Méndez, https://doi.org/10.1007/s11356-023-28030-w 
Fátima Leal, Benedita Malheiro, and Juan C. [16] Jesús Pineda, Benjamin Midtvedt, Harshith 
Burguillo. Online detection and infographic Bachimanchi, Sergio Noé, Daniel Midtvedt, 
explanation of spam reviews with data drift Giovanni Volpe, and Carlo Manzo. Geometric deep 
adaptation. Informatica, 35(3):1-25, 2024. learning reveals the spatiotemporal features of 
https://doi.org/10.15388/24-INFOR562 microscopic motion. Nature Machine Intelligence, 
[7] Yaping Wang, Zunshan Xu, Songtao Zhao, Jiajun 5(1):71-82, 2023. https://doi.org/10.1038/s42256-
Zhao, and Yuqi Fan. Performance degradation 022-00595-0 
prediction of rolling bearing based on temporal [17] Wenhao Li, Yanyan Chen, Yuyan Pan, and Yunchao 
graph convolutional neural network. Journal of Zhang. An improved spatiotemporal network traffic 
Mechanical Science and Technology, 38(8):4019- flow prediction method based on impedance matrix. 
4036, 2024. https://doi.org/10.1007/s12206-024- Journal of Highway and Transportation Research 
0702-z and Development, 18(2):67-75, 2024. 
[8] Chunyong Yin, Sun Zhang, Jin Wang, and Neal N. https://doi.org/10.26599/HTRD.2024.9480015 
Xiong. Anomaly detection based on convolutional [18] Fengxin Chen, Ye Yu, Liangliang Ni, Zhenya 
recurrent autoencoder for IoT time series. IEEE Zhang, and Qiang Lu. DSTVis: toward better 
Transactions on Systems, Man, and Cybernetics: interactive visual analysis of Drones' spatiotemporal 
Systems, 52(1):112-122, 2022. data. Journal of Visualization, 27(4):623-638, 2024. 
https://doi.org/10.1109/TSMC.2020.2968516 https://doi.org/10.1007/s12650-024-00982-2 
[9] Jiwei Huang, Jiangyuan Wan, Bofeng Lv, Qiang Ye, [19] Yan Jian, Xiaoyang Dong, and Liang Jian. Detection 
and Ying Chen. Joint computation offloading and and recognition of abnormal data caused by 
resource allocation for edge-cloud collaboration in network intrusion using deep learning. Informatica, 
internet of vehicles via deep reinforcement learning. 45(3):441-445 2021. 
IEEE Systems Journal, 17(2):2500-2511, 2023. https://doi.org/10.31449/inf.v45i3.3639 
https://doi.org/10.1109/JSYST.2023.3249217 [20] Erchao Li, and Kuankuan Qi. Ant colony algorithm 
[10] Stefan Bloemheuvel, Jurgen van den Hoogen, Dario for path planning based on grid feature point 
Jozinović, Alberto Michelini, and Martin extraction. Journal of shanghai jiao tong university: 
Atzmueller. Graph neural networks for multivariate English Edition, 28(1):86-99, 2023. 
time series regression with application to seismic https://doi.org/10.1007/s12204-023-2572-4 
data. International Journal of Data Science and [21] Si-Xiao Gao, Hui Liu, and Jun Ota. Energy-efficient 
Analytics, 16(3):317-332, 2023. buffer and service rate allocation in manufacturing 
https://doi.org/10.1007/s41060-022-00349-6 systems using hybrid machine learning and 
[11] Hao Xu, Zihan Sun, Yuan Cao, and Hazrat Bilal. A evolutionary algorithms. Advances in 
data-driven approach for intrusion and anomaly 
48 Informatica 49 (2025) 35–48 S. Li et al. 
Manufacturing, 12(2):227-251, 2024. 
https://doi.org/10.1007/s40436-023-00461-1 
[22] Andriy Bondarenko, Danylo Radchenko, and 
Kristian Seip. Fourier interpolation with zeros of 
zeta and L-functions. Constructive Approximation, 
57(2):405-461, 2022. 
https://doi.org/10.1007/s00365-022-09599-w 
[23] Kavita Bhosle, and Vijaya Musande. Evaluation of 
deep learning CNN model for recognition of 
devanagari digit. Artificial Intelligence and 
Applications, 1(2):114-118, 2023. 
https://doi.org/10.47852/bonviewAIA3202441 
[24] Jiawei Zhu, Xing Han, Hanhan Deng, Chao Tao, 
Ling Zhao, Pu Wang, Tao Lin, and Haifeng Li. 
KST-GCN: A knowledge-driven spatial-temporal 
graph convolutional network for traffic forecasting. 
IEEE Transactions on Intelligent Transportation 
Systems, 23(9):15055-15065, 2022. 
https://doi.org/10.1109/TITS.2021.3136287 
[25] Dongdong Cheng, Jinlong Huang, Sulan Zhang, and 
Quanwang Wu. A robust method based on locality 
sensitive hashing for K-nearest neighbors searching. 
Wireless Networks, 30(5):4195-4208, 2024. 
https://doi.org/10.1007/s11276-022-02927-9 
[26] M. Akif Günen. Comparison of histogram-curve 
fitting-based and global threshold methods for cloud 
detection. International Journal of Environmental 
Science and Technology, 21(6):5823-5848, 2024. 
https://doi.org/10.1007/s13762-023-05379-6 
https://doi.org/10.31449/inf.v49i12.7315 Informatica 49 (2025) 49–60 49 
Fusion CNN-Transformer Model for Target Counting in Complex 
Scenarios 
Xingyuan He1, Ruiying Wang2*, Ting Cao2, Weiyu Liang3, Yimin Fan4 
1Information Management Center, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 
2Finance Department, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 
3Information Engineering Department, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 
4Economic Management Department, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050001, China 
E-mail: hexingyuan1232022@126.com, wangruiying2005@126.com, caoting20095522@126.com, 
liangweiyu0314@163.com, fym2006@126.com 
*Corresponding author 
Keywords: convolutional neural network, attention mechanism, computer counting, target counting, fully self 
attention network 
Received: October 12, 2024 
To overcome the shortcomings of traditional manual counting methods, which are labor-intensive, 
resource-consuming, and inefficient, this study introduces a computer-based counting model. This model 
integrates convolutional neural networks (CNNs) with Transformer networks to efficiently recognize 
and count specific target objects in large-scale data scenarios. This approach leverages CNNs for local 
feature extraction and incorporates Transformer networks to capture long-range global information, 
achieving a synergistic effect. The methodology includes key steps such as “CNN for feature extraction 
and Transformer for global attention.” The experiment outcomes show that the model has an average 
absolute error of 10.13, a root mean square error of 12.08, an average counting accuracy of 98.6%, a 
peak signal-to-noise ratio of 23.75, a structural similarity of 0.933, a coefficient of determination of 
0.901, an average counting time of about 6.58ms per image, and a parameter count of 3.21 in target 
counting. It can also recognize and respond well to high complexity scenes while maintaining high 
accuracy. Compared to the CNN model, the research model reduces the error rate by 13.4%, indicating 
that the fusion of CNN and Transformer networks is effective in object counting for computer vision 
tasks. This result indicates that the model integrating convolutional neural networks and fully self 
attention networks can be well applied to computer recognition and object counting. 
 
Povzetek: Predstavljen je hibridni model CNN-Transformer za štetje tarč v kompleksnih scenarijih. 
Model združuje CNN za ekstrakcijo lokalnih značilnosti in transformer za zajemanje globalnih 
informacij.
 
1 Introduction form of deep learning architecture was incorporated to  
generate a model that can recognize and compute fish on 
Traditional counting relies on manual operation, with the images. The experimental results showed that the 
low processing power and efficiency, and often requires recall rate of the model reached 65.5% [6]. Chen G et al. 
a lot of manpower and time to identify large-scale data proposed a new efficient deep learning model called 
[1-3]. However, as computer technology advances, in Density Transformer for automatically counting trees 
recent years, many researchers have begun to rely on from aerial images. This architecture includes a multi-
computer vision technology to handle the matter of receptive field CNN for extracting visual feature 
object detection and identification counting in the context representations from local patches and their extensive 
of big data. At present, the application of computer contexts, and a Transformer encoder for transmitting 
counting has spread to many fields, such as road vehicle contextual information between relevant locations. The 
recognition and counting in vehicle transportation experimental results showed that the research model 
systems, melon and fruit counting in large-scale achieved the highest accuracy on both datasets, 
agricultural and forestry production, livestock counting, significantly better than most other methods [7]. Miao Z 
and colony counting in laboratories, etc. [4-5]. With the et al. proposed a weakly-supervised method that 
advancement of computer vision technology, an effectively combines multi-level dilated convolution and 
increasing number of computers counting algorithms and Transformer methods to achieve end-to-end crowd 
models have been developed and applied. Leong J M et counting. The experimental results showed that on four 
al. developed a fish counting system based on well-known benchmark population counting datasets, 
convolutional neural network (CNN) to assist hatchery this method outperformed other weakly supervised 
staff in counting fish from images. During the process,  methods and was comparable to fully supervised 
contrast limited adaptive histogram equalization was also methods [8]. Liu et al. proposed a multi-receptive field 
used to enhance the captured images, and a YOLOv5  extraction deep learning method grounded on YOLOX 
50 Informatica 49 (2025) 49–60 X. He et al. 
(MRF-YOLO) for detecting and counting small targets, precisely capture the local features of targets. This fusion 
and validated it on the cotton bolls dataset of a cotton strategy is expected to address the shortcomings of 
farm. The results indicated that the average accuracy of existing models when dealing with small and densely 
the model rose by 14.86%, with a mean square error of packed targets, while also improving the counting 
1.06 and a coefficient of determination of 0.92. The performance of the model in complex scenarios. 
model could be well applied to a wide range of small 
target crop detection [9]. Shen L et al. constructed a 2 Methods and materials 
YOLOv5s cluster detection model grounded on channel 
pruning algorithm and applied it to counting grape 2.1 Counting algorithm integrating CNN 
clusters in the field. The research results showed that the 
Computer counting refers to the collection of information 
mAP reached 82.3%, the average inference time per 
through computer vision mechanisms, in order to achieve 
image was 6.1 ms, the average counting accuracy was 
the effect of calculating or counting quantities. This 
84.9%, the video processing speed was 50.4 frames per 
method is often applied in the area of image processing, 
second, and the model parameters and complexness were 
such as vehicle counting, crowd counting, cell counting, 
effectively reduced while guaranteeing perception 
etc. CNN, as a type of deep learning algorithm, is 
precision. This model could be well applied to counting 
commonly applied in image recognition in the area of 
stacked grape clusters [10]. 
computer vision. It simulates the way neurons in the 
Despite the notable achievements of the 
human brain process information, especially the working 
aforementioned studies in their respective application 
mode of the visual cortex, and abstracts and extracts 
scenarios, the field of computer counting still faces 
feature layer by layer from input data to achieve 
several challenges and limitations. In particular, 
automatic processing and recognition simulation of grid 
mainstream models like YOLO frequently produce false 
structured data such as images [11-12]. These features 
positives and negatives when confronted with small, 
provide detailed object and element information for 
densely packed targets, largely attributed to their limited 
subsequent counting tasks. CNN is mainly composed of 
capacity in managing complex scenes and dealing with 
three parts: convolutional layer, pooling layer (also 
target occlusion. Furthermore, many existing counting 
known as downsampling layer), and fully connected 
models struggle to balance local and global feature 
layer. Its structure is represented in Figure 1. 
information. Local features are crucial for accurately 
In Figure 1, the first layer performs convolution 
identifying individual targets, while global features aid in 
operation on the input image to get a feature map (FM) 
understanding the entire scene and the distribution of 
with a depth of 3. Then pooling operation is constructed 
targets. However, existing models often fail to achieve a 
on the obtained FM to get a novel FM. The convolution 
balance between the two, resulting in insufficient 
pooling joint operation will be repeated until an FM with 
flexibility and accuracy during counting. 
a depth of 5 is obtained. This operation process can 
In response to these limitations, this study proposes a 
extract input data features layer by layer. As the number 
computer counting algorithm that integrates CNN and 
of convolutional and pooling layers rises, the model's 
Transformer networks. This algorithm aims to combine 
ability to interpret and express data gradually improves. 
the advantages of CNNs in local feature extraction with 
Finally, the obtained latest round of FMs is expanded and 
the capabilities of Transformers in global feature capture 
connected into vectors by rows, and passed into a fully 
and sequence modeling, thereby enhancing the accuracy 
connected layer. the internal hierarchical structure of 
and flexibility of computer counting. By introducing the 
CNN is analyzed. Part 1: convolutional layers, as shown 
Transformer module, it is hoped to enhance the model's 
in formula (1). 
understanding of global contextual information while 
leveraging the convolutional operations of CNNs to 
Table 1: Literature review table. 
Literature Method Major contribution There are problems 
Leong J M et al. Assist the staff of the hatchery in counting 
CNN-YOLOv5 The recall rate of the model is not high 
[6] fish from the images 
Chen G et al.  Deep learning models, a Can achieve automatic calculation of trees in The accuracy value is only slightly higher 
[7] multi receptive field CNN aerial images than the general model 
The research dataset is limited to the 
Effectively combining multilevel expansion 
Weak supervision law, population, and the generalization application 
Miao Z et al. [8] convolution and Transformer methods to 
Transformer of counting methods still needs to be 
achieve end-to-end population counting. 
considered 
Design proposes a multi receptive field 
 Liu et al  
YOLOX（MRF-YOLO） extraction deep learning method for Mean square error is relatively high 
[9] 
detecting and counting small targets 
 Shen L et al. YOLOv5s cluster detection constructed a detection model and applied it The average counting accuracy is slightly 
[10] model to the counting of grape clusters in the field. lower and the inference time is slightly longer 
 
 
Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 51 
3 feature maps
Input image 3 feature maps
Pooling
Convolution
Convolution
Output layer 5 feature maps
5 feature maps
Fully connected Pooling
 
Figure 1: CNN structure diagram. 
1 1 1 0 0 Bias=0
0 1 1 1 0 1 0 1 4 3 4
0 0 1 1 1 0 1 0 2 4 3
0 0 1 1 0 1 0 1 2 3 4
0 1 1 0 0 Filter 3*3 Feature map 3*3
Image 5*5
 
Figure 2: Example of convolution operation. 
s(i, j) = (X *W )(i, j) =
' (X + 2p−W )
 x(i −m, j − n)w(m, n)   (1) X = +1   (2) 
k
m n  
 In formula (2), the strid is k  and the zero-padding 
Formula (1) represents two-dimensional convolution. layer is p . The second part is pooling. The pooling layer 
Among them, W  is the convolution kernel (also known 
cuts the dimensionality of FMs while preserving 
as the weight matrix or filter), X  is the input matrix 
important details through downsampling operations. 
(also known as the input FM), and s(i, j)  means the 
Pooling can be divided into two types: maximum pooling 
value of the output matrix at position (i, j) . w(m,n)  and average pooling. Compared to max pooling, average 
means the value of convolution kernel W  at position pooling can preserve more detailed information. The 
(m,n) . x(i −m, j −n)  represents the elements of the third part is the fully connected layer, as shown in 
formula (3). 
input matrix X  that are accessed in the convolution 
 
operation. * Represents convolution. The essence 
represented by this formula as a whole is to multiply and 

add the elements at different positions of the matrix and Y =(V )

convolution kernel matrix of different parts of the image, V = conv2(W , X , ''valid '') +b   (3) 
as shown in Figure 2.  1 2
Figure 2 gives an illustration of convolution process.  E = d − yL
 2 2
An image is input and converted into a matrix. In the 
example, the matrix corresponding to the image is 5×5,  
and a 3×3 convolution kernel is utilized for convolution In formula (3), conv  represents the convolution 
to acquire a 3×3 FM. However, not all sliding steps are 1 function, valid  represents the type of convolution 
and need to be adjusted according to the situation. If the operation, b  is the bias vector,   is the activation 
sliding stride is greater than 1, there may be a situation 
function, E  is the total error, d  represents the expected 
where the convolution kernel cannot slide exactly to the 
output vector, y  means the output node vector, and L  
edge. In this case, it is necessary to add zeros to the 
outermost layer of the matrix, as shown in formula (2). means the amount of layers. Figure 3 shows a fully 
connected diagram. 
52 Informatica 49 (2025) 49–60 X. He et al. 
Hidden layer 1 Hidden layer 3
Output layer
Input layer
Hidden layer 2
 
Figure 3: Fully connected layer operation process. 
Pooling Pooling Pooling
n
lu
tio
Convolution Convolution Convolution
on
vo
C
Pooling Pooling
Pooling
Convolution
Convolution Convolution Convolution
Image noit ul ovno
C Pooling Pooling Pooling
Convolution Convolution Convolution
 
Figure 4: Network structure of FE module. 
Figure 3 illustrates the classification function of the ultimately improving the quality of the algorithm's 
fully connected layer, which takes all local detail features counting results. In short, integrating the powerful FE 
as input to the input layer, passes through multiple capabilities of CNN can effectively enhance computer 
hidden layers (including linear transformation, nonlinear vision technology and achieve automatic counting of 
activation, etc.), and finally generates prediction results specific objects in images or videos. 
through the output layer. However, when CNN is 
integrated with counting algorithms, it mainly focuses on 2.2 Counting algorithm integrating CNN 
FE and classification [13]. When the object overlap and transformer 
coincidence rate of the counted image are high, it is very 
easy to encounter the problem of varying visual Although CNN has strong local FE and parameter 
perception depth in comparison with the initial image, sharing capabilities, it can decrease the amount of model 
which makes it difficult to recognize or misidentify [14]. parameters and is widely used in image classification and 
The counting algorithm that integrates CNN can improve object detection, thereby improving computer vision 
the FE module of the original counting algorithm, counting. However, CNN based counting algorithms lack 
helping to enhance the algorithm's ability to capture modeling of global information, and CNN assumes that 
feature information, as shown in Figure 4. image features have spatial invariance. Therefore, once 
Figure 4 gives the structure of the FE module that the target object undergoes deformation or positional 
integrates CNN counting algorithm. The FE module changes, it will affect the final counting results [15]. 
includes three parallel CNN networks, with each Based on this, the study intends to introduce Transformer 
column's filter (i.e. convolution kernel) having a different on the basis of CNN's counting algorithm. Transformer 
size of local receptive field. This produces different excels in global information modeling, complementing 
feature information extraction effects for counting CNN and Transformer to raise the precision and validity 
objects of different distances and sizes, providing higher of counting tasks, as represented in Figure 5. 
quality FMs for subsequent network modules and 
 
Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 53 
Input Input Position Output
embedding layer encoding
N
 
Figure 5: Schematic diagram of transformer structure. 
MatMul
SoftMax
Scale
MatMul
Query Key Value
 
Figure 6: Self attention mechanism calculation process. 
From Figure 5, it can be seen that Transformer is input element, i  means the specific dimension of the 
mainly composed of Position Embedding, Multi-Head element, and d  represents the dimension of the input. 
m
Self Attention (MSA) mechanism, Residual Structure The Transformer model's essential feature is the self-
(Add), Normalization (Norm), and FeedForward attention mechanism, enabling it to consider all other 
Network (FFN) [16]. The entire processing flow is to elements while processing a single element in the 
first feed the input data into an input embedding layer sequence, thereby capturing long-range dependencies in 
composed of transition matrices and convert it into an the sequence. The computation process is shown in 
initial tensor. Then positional encoding information is Figure 6. 
added to the tensor to generate a new tensor. The new In Figure 6, it can be seen that Query , Key , and 
tensor is immediately transmitted to the FE module for 
further processing. In the FE module, the FE process is Value  are matrices composed of vectors q , k , and . 
i i vi
repeated N times, each iteration aims at extracting deeper Query  and Key  obtain an output vector sequence 
and more abstract characteristics from the input data, containing rich contextual information through matrix 
ensuring that the model can seize intricate patterns and multiplication, scaling, SoftMax, and quadratic matrix 
structures in the data until the optimal result is output. multiplication, while Value  directly outputs the sequence 
Among them, the position code is shown in formula (4). through matrix multiplication. The specific first step 
 calculation is shown in formula (5). 
 position  
 PE( position,2i) = sin( )
2i a =Wx    (5) 
 d i i
10000 m
   (4)  
 position
PE( position,2i+1) = cos( ) In formula (5), a  is the middle tensor, W  is the 
 2i i
 d
 10000 m learning matrix, and x  is the input tensor. Each input 
i
 tensor is first multiplied by a W  matrix and encoded to 
In formula (4), PE  is the position encoding, and the obtain the intermediate tensor. Multiplying each 
system in formula (4) is the commonly used position intermediate tensor with different learning matrices 
encoding, namely sine cosine position encoding. It yields the desired vector, as shown in formula (6). 
represents the relative or absolute positional relationship  
between pixels. The function of position encoding is to qi =Wq , aiki =Wk ,
enable the model to obtain effective position information.   (6) 
aivi =Wvai , (i = 0,1, 2,..., d )
Among them, position  represents the position of the 
NSA
Add&Norm
FFN
Add&Norm
54 Informatica 49 (2025) 49–60 X. He et al. 
Among them, q , k , and  r pre
i v e sent the vectors In formula (10), x  represents the mean of the input 
i i
corresponding to Query, Key, and Value. W , 
q W , and tensor,   is the standard deviation,   and   represent 
k
learnable parameters, and the size is usually equal to the 
W  are corresponding learnable matrices. d  is the 
i number of channels. Layer normalization is only 
dimension of the input vector. Among them, each vector applicable to single sample processing and is suitable for 
q  will perform attention calculation on each vector k  
i j handling long sequence data and learning global 
(j=0, 1, 2,..., d), that is, perform similarity calculation of relationships from single samples. In addition, residual 
vector dot multiplication. Due to the fact that the dot connections are also introduced in the Transformer 
multiplication result increases with the increase of module, as shown in formula (11). 
dimension, it is necessary to compress the result and  
process it through Sofmax, as shown in formula (7). F = Att(X )+ X    (11) 
  
 qi k j In formula (11), Att  represents the attention layer 
ai , j = Soft max( )
 d and F  represents the output feature. The function of 
 y   (7) residual connections is to send the data from the last 
e i
 Soft max( y layer to the subsequent layer through skip connections, 
i ) = y
 e i
which simplifies the model's learning process of identity 
 maps, thereby promoting information flow and 
In formula (7), a  represents the normalized alleviating the problems of gradient vanishing and 
i , j
exploding [17-18]. In summary, integrating CNN and 
probability value of the vector at position (i, j)  Transformer networks to construct CNN Transformer 
corresponding to the Softmax function processing. The counting algorithms can complement each other's 
Softmax function can convert the output values of strengths and weaknesses, improve computational 
multiple classifications into a probability distribution flexibility, enhance global information modeling 
within the range of (0,1) and equal to 1. Finally, multiply capabilities, and improve the accuracy and efficiency of 
the obtained a  with all vi vectors and sum them to 
ij counting tasks. The detailed parameter information of the 
obtain the feature pixels, as shown in formula (8). model is as follows, as shown in Table 2. 
 
QKT 3 Results 
Attention(Q, K ,V ) = Soft max( )V   (8) 
d 3.1 Performance analysis based on CNN-
 transformer counting algorithm model 
Formula (8) represents the calculation of attention 
weights in the self attention mechanism. It is worth To verify the capability of the model grounded on the 
noting that the current attention mechanism of CNN-Transformer counting algorithm, simulation 
Transformers usually adopts the Multi Head Self experiments were conducted for validation. Common 
Attention (MSA) mechanism, which is represented as computer vision applications include counting road 
formula (9) vehicles in traffic monitoring systems and counting 
 bacterial colonies in laboratory culture dishes. 
 Z Considering the difficulty of obtaining the dataset, the 
i = Attention(Qi , Ki ,Vi ), (i =1,2,...,h)
 (9) study intended to use the actual chicken feeding situation 
MultiHead (Q, K ,V ) = Concat(Z1, Z2 ,..., Zh )Wo of a large-scale breeding farm in a certain area as the 
 experimental dataset. The selection of live chicken 
In formula (9), i  represents the i th self attention feeding data for this large-scale breeding farm was 
head, h  means the amount of self attention heads, and mainly based on the following considerations: Firstly, 
Z  means the output matrix calculated by the i th self this dataset has high practical application value and can 
i
provide strong support for precision breeding and animal 
attention head. Compared with self attention 
health management. Secondly, compared to other 
mechanisms, multi-head attention mechanisms can 
scenarios, the chicken flock activities in the breeding 
independently and parallelly compute attention in 
farm are more intensive and regular, providing rich test 
different subspaces, achieving the effect of 
samples for counting algorithms. Finally, the dataset 
simultaneously focusing on different features of the input 
exhibits high diversity in terms of image quality, lighting 
sequence from different perspectives. In addition, in the 
conditions, and background complexity, which helps to 
normalization selection of the model, Transformer adopts 
comprehensively evaluate the model's generalization 
layer normalization, as shown in formula (10). 
ability. A total of 80 live data segments were collected, 
 
with a duration of 30-60 seconds per segment, a 
x−
LayerNorm(x) =   ( )+    (10) resolution of 1920 × 1080 pixels, and a frame rate of 25 
 frames per second. For the collected chicken breeding 
 video data, images were extracted from the video at 
intervals of 15 frames. In order to improve the quality of 
Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 55 
the dataset, manual inspection was used to remove annotation quality through consistency checks. Finally, 
excessively similar or blurry images, and data 761 images were obtained, and the dataset was separated 
augmentation was performed on the images in the into a training set (685 images) and a testing set (76 
training set, including random rotation, scaling, cropping, images) in a 9:1 ratio. The parameter size was set to: 
and color transformation. In addition, to ensure the Learning Rate: 0.0005; Optimizer: AdamW; Epochs: 
accuracy of annotation, the study adopted cross 100; Batch Size: 32. The flowchart of data processing is 
validation method, where multiple annotators shown in Figure 7. 
independently annotate the images and ensure the 
Table 2: model parameters. 
CNN 
Image size Convolutional kernel size Number of convolution kernels Step size and filling 
224×224×3 3x3 64 1 
Transformer 
Embedding dimension Position encoding Hidden layer dimension Encoder layers 
768 Sine/Cosine Position Encoding 2048 6 
 
Dataset 
Model Training Stage: Training-Validation 
preprocessing stage: Evaluation Stage: 
Split Optimization:
Evaluate the performance 
Input the preprocessed Stratified random sampling 
of the model using test 
Including data cleaning, data into the CNN and cross-validation ensure 
set data, calculate and 
normalization, and transformer fusion model consistent dataset 
record accuracy, recall, 
target annotation. for feature extraction and proportions and mitigate 
F1 score, and other 
sequence modeling. randomness impact.
evaluation metrics.
 
Figure 7: The flowchart of data processing. 
25 30
CNN-Transformer
CNN
20 25 Transformer
RF
SVM
15 20
CNN-Transformer
10 CNN 15
Transformer
RF
5 SVM 10
0 50 100 150 200 250 300 0 50 100 150 200 250 300
Number of images Number of images
(a) Performance indicators MAE of (b) Performance of each model on 
each model the RMSE performance metric
 
Figure 8: Performance of different algorithms on MAE and RMSE of the training set. 
Mean Absolute Error (MAE), Root Mean Square be used to measure the similarity between the 
Error (RMSE), Mean Accuracy (MA), Peak Signal to reconstructed count image and the actual count image. A 
Noise Ratio (PSNR), Structural Similarity (SSIM), and higher PSNR value indicates better quality of the 
Coefficient of Determination (R2) were used as reconstructed count image and its closeness to the actual 
evaluation metrics for model performance. MAE image. To more intuitively testify the superiority of the 
measures the average of the absolute differences between CNN Transformer counting algorithm model, four 
the predicted and actual values. In counting tasks, MAE counting algorithm models including CNN, Transformer, 
provides a straightforward reflection of the accuracy of Support Vector Machine (SVM), and Random Forest 
the model’s predictions. RMSE assigns higher weights to (RF) were included as comparative algorithms. The 
larger errors, in counting tasks, it highlights significant comparison results of MAE and RMSE performance of 
deviations in predictions. PSNR in counting tasks, it can 
MAE
 RMSE
56 Informatica 49 (2025) 49–60 X. He et al. 
different algorithms in the training set are shown in The RF model had the highest value of 16.7. The 
Figure 8. comparison results of MA and PSNR performance of 
In Figure 8, (a) shows the ability of each model on different algorithms in the training set are shown in 
the behaviour metric MAE. MAE is one of the key Figure 9. 
indicators for model evaluation, which calculates the In Figure 9, (a) shows the behaviour of each model 
mean absolute deviation between predicted and actual on the behaviour metric MA. The larger the MA, the 
values, and is used to characterize the count of network higher the counting accuracy and stability of the network 
models. The smaller the value, the better the model. From Figure 9 (a), the MA value of the CNN-
performance. From Figure 8 (a), the MAE value of the Transformer fusion counting algorithm was 98.6%, 
CNN-Transformer fusion counting algorithm was 10.13, which was the highest compared to the other four 
which was the lowest compared to the other four counting algorithms. Figure 9 (b) shows the behaviour of 
counting algorithms. Figure 8 (b) shows the behaviour of each model on the behaviour metric PSNR. This 
each model on the performance metric RMSE. RMSE indicator represents the quality of an image based on the 
was another important indicator for model evaluation, error between corresponding pixels, so the higher the 
which was the average square root error between the PSNR value, the higher the quality of the predicted 
predicted and actual values. It was used to characterize generated image. In Figure 9 (b), the PSNR value of the 
the stability of network model counting, and the smaller CNN-Transformer fusion counting algorithm was the 
its value, the better the stability of the model. The highest, at 23.75. Compared with the other four counting 
Transformer model had the highest value of 17.8. From algorithms, this algorithm performed the best in image 
Figure 8 (b), the RMSE value of the CNN-Transformer quality assessment. The comparison results of SSIM 
fusion counting algorithm was 12.08, which was the performance of different algorithms in the training set are 
lowest compared to the other four counting algorithms. shown in Figure 10. 
100 25
80 20
60 15
CNN-Transformer CNN-Transformer
40 CNN 10 CNN
Transformer Transformer
RF RF
SVM
20 SVM 5
0 50 100 150 200 250 300 0 50 100 150 200 250 300
Number of images Number of images
(a) Performance indicators MA of (b) Performance of each model on the 
each model PSNR performance metric
 
Figure 9: Performance of different algorithms on the MA and PSNR of the training set. 
1.0 1.0
0.8 0.8
0.6 0.6
CNN-Transformer
CNN
0.4 Transformer 0.4 CNN-Transformer
RF CNN
SVM Transformer
RF
0.2 0.2
SVM
0 50 100 150 200 250 300 0 50 100 150 200 250 300
Number of images Number of images
(a) Performance indicators SSIM of (b) Performance of each model on the 
each model R2 performance metric
 
Figure 10: Performance of different algorithms on SSIM and R2 in the training set. 
SSIM MA
R2 PSNR
Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 57 
30 Parameter 18
25 Time
20 13
15
10 8
5
0 3
Transformer RF SVM CNN CNN-Transformer
Model
 
Figure 11: The counting time and parameter count of each algorithm model. 
In Figure 10, (a) shows the specific situation of the parameter quantity of a single image, as shown in Figure 
training sets of five computer counting algorithms on 11. 
SSIM. This indicator often considers the brightness, Figure 11 shows the specific situation of the five 
contrast, and structure of the image comprehensively to models in terms of time and parameters. The counting 
achieve the effect of measuring the correlation between algorithm model that integrated CNN-Transformer had 
pixels, making it closer to human subjective perception the shortest average counting time for a single image, 
of image quality. Generally speaking, the closer the about 6.58ms, and the smallest number of parameters, 
SSIM value is to 1, the higher the image quality about 3.21. In comparison with the model with the 
predicted by the algorithm. From Figure 10, it is told that longest average detection time for a single image, there 
the SSIM value of the CNN-Transformer fusion counting was a difference of 6.62ms. Compared with the model 
algorithm was 0.933, which was closest to 1 compared to corresponding to the maximum parameter count, there 
other models. In addition, compared with the other four was a difference of 24.33. Obviously, the model 
algorithms, the convergence speed of the research proposed in the study had shorter recognition and 
algorithm was significantly higher in the SSIM image, counting time, and more efficient counting efficiency in 
with the convergence inflection point located around actual counting. The above indicators reflected the 
image number 40. Figure 10 (b) shows the specific overall testing performance of each model. To 
situation of R2 for each model, which reflects the degree understand the situation of each model in counting error 
of fit of the model. From the figure, it is told that the R2 images, the study also tested the error counting 
value of the CNN Transformer fusion counting algorithm probability of each model in the test set, recorded the 
was 0.901, which was closest to 1 compared to other image numbers of error counts in each counting 
models. Based on the above, the proposed counting algorithm, and summarized the number of times each 
algorithm that integrates CNN Transformer had good image was counted incorrectly. The results are shown in 
counting performance on the training set. Furthermore, to Figure 12. 
demonstrate the universality of the model application, the In Figure 12, (a) shows the false detection rates of 
experiment also explored it on a publicly available different algorithms, and (b) shows the distribution of 
dataset. This dataset is the Distribution Transformer error count images. From Figure 12 (a), as the number of 
Detection Dataset (DTD). The same performance counting images increased, the error rates of each 
indicators as mentioned above were selected for testing. algorithm randomly increased. However, compared to 
The experimental results showed that MAE was 10.02, the other four algorithms, the counting algorithm that 
RMSE was 12.02, MA was 97.6%, PSNR was 23.55, integrated CNN-Transformer had a lower overall false 
SSIM was 0.934, and R2 was 0.911. detection rate. In Figure 12 (b), out of 76 test set images, 
62 images were correctly counted by all models, 
3.2 Testing and analysis based on CNN accounting for 81.58% of the total; The number of 
transformer counting algorithm model images with an error count of less than or equal to 1 
accounted for 88.15% of the entire test set. Among the 
In the above experiment, the proposed CNN-Transformer five models mentioned above, there were a total of three 
counting algorithm model performed well on the training images with a classification error rate higher than 50%. 
set. To formalize more about the practical application One of them was incorrectly counted by four models, 
ability of the model, the study intended to use a test set to indicating that this image had strong confusion and the 
analyze the model again. Among them, the study category features might not be clear enough. The specific 
compared the recognition performance of various models number of this image in the test set was 13, with 4 errors. 
by introducing the average detection time/ms and The specific situation of the error probability of this 
image in the five models is represented in Table 3. 
Parameter
Time/ms
58 Informatica 49 (2025) 49–60 X. He et al. 
350 Number of images
CNN-Transformer 75
300 100
Transformer Cumulative proportion
70
250 RF 60 90
SVM
50
200
40 80
150 30
100 20 70
CNN
10
50 5 60
0
0 50 100 150 200 250 300 350 0 1 2 3 4
Number of data Number of prediction errors
(a) False detection rates of different algorithms (b) Distribution of Error Counting Images
 
Figure 12: Error recognition status of each model. 
Table 3: Probability of incorrect counting for figure 13 by each model. 
Image number Model Predicted probability 
CNN [0.78,0.16] 
Transformer [0.97,0.56] 
13 SVM [0.93,0.64] 
RF [0.92,0.18] 
CNN-Transformer [0.59,0.51] 
 
Table 3 shows the error count probabilities of each counting algorithm had the shortest average single image 
algorithm for high ambiguity image number 13. The true counting time of about 6.58ms, with a parameter count of 
label of image 13 was a positive sample. From the figure, 3.21 and the lowest quantity. In terms of error counting, 
the intervals of the five counting algorithms in the two- all algorithms showed a trend where the more recognized 
dimensional vector were [0.77, 0.21], [0.96, 0.55], [0.92, images, the higher the false detection rate. However, for 
0.63], [0.91, 0.17], and [0.60, 0.30]. The first element in a single algorithm, the counting algorithm that integrated 
this interval was the probability of incorrectly judging a CNN-Transformer exhibited a lower overall false 
positive sample, and the second element was the detection rate. In addition, in low feature and high 
probability of correctly judging a positive sample. Except ambiguity images, except for the counting algorithm that 
for the CNN-Transformer model, all other models made integrated CNN-Transformer, all other algorithms had 
incorrect judgments. Subsequently, after separate incorrect recognition counts, indicating that the counting 
analysis, it was found that the high error rate of image algorithm that integrated CNN and Transformer still had 
number 13 was due to issues with lighting and occlusion. good counting ability in recognizing high complexity 
The CNN Transformer model combines the advantages counting scenes.  
of CNN and Transformer, using CNN to extract local The CNN Transformer model exhibited significant 
features and Transformer to capture global contextual advantages in balancing the number of parameters, 
information, thus improving the model's ability to inference time, and model accuracy. In resource 
process blurry images. Overall, the counting algorithm constrained environments such as farms and other 
that integrated CNN-Transformer still had good practical application scenarios, traditional complex 
recognition and counting capabilities in high complexity models often struggle to run stably due to the lack of 
scenarios. powerful computing and storage capabilities of the 
devices in these scenarios. The research model, due to its 
4 Discussion limited number of parameters and fast inference speed, 
can adapt well to these resource constrained 
The fusion CNN-Transformer counting algorithm environments. Therefore, in practical applications, this 
proposed in the study performed well in various model can accurately count the number of chickens and 
performance analysis indicators of the training set data, provide timely and accurate data support for farm 
with MAE of 10.13, RMSE of 12.08, MA of 98.6%, managers. This helps them better understand the feeding 
PSNR value of 23.75, and SSIM and coefficient of situation, develop scientific feeding plans, and thus 
determination close to 1. In comparison with other improve feeding efficiency and economic benefits. 
algorithms, the algorithm raised in the study performed Meanwhile, due to the fast inference speed of the model, 
excellently in all indicators. In addition, in the test set, it can also meet the real-time requirements and provide 
the experiment also compared the average single image real-time data feedback for farm managers. 
counting time and parameter count of five counting In the same type of research, Zhang L et al. proposed 
algorithms. It was found that the CNN Transformer a shrimp automatic local image-based enumerating way 
Number of wrong data
Number of images
Cumulative proportion
Fusion CNN-Transformer Model for Target Counting in Complex… Informatica 49 (2025) 49–60 59 
utilizing lightweight YOLOv4, and constructed a local maps to enhance the interpretability of models. In 
shrimp enumerating model grounded on Light-YOLOv4. addition, the chicken breeding image dataset used in the 
The strategy underwent testing on a shrimp dataset, and study still has insufficient quantity in the context of deep 
the results showed that the Light-YOLOv4 local shrimp learning. In the future, data augmentation techniques 
enumerating model acquired a enumerating accuracy of such as rotation, scaling, cropping, and flipping can be 
92.12%, a recall rate of 94.21%, an F1 value of 93.15%, further adopted to increase data diversity and help 
and an average accuracy mean of 93.16% [19]. Although models learn more robust features, thereby improving 
the comprehensive counting ability of this model was their generality. 
superior, its average accuracy was lower than that of the 
model in this study. Wu Fy et al. fused the CNN Deeplab References 
V3+model with traditional image processing algorithms 
and applied it to the detection and counting of banana [1] Muhammad Asif Khan, Hamid Menouar, and Ridha 
bunches. The results showed that the final bundle Hamila. Revisiting crowd counting: State-of-the-art, 
perception precision was 86%, the accuracy of bacterial trends, and future perspectives. Image and Vision 
colony detection during harvesting was 76%, and the Computing, 129(1):104597, 2023. 
overall bacterial colony counting accuracy was 93.2% https://doi.org/10.1016/j.imavis.2022.104597 
[20]. The results of this model were lower than the [2] Wim Bernasco, Evelien M. Hoeben, Dennis Koelma, 
comprehensive behaviour of the model in this study. Lasse Suonperä Liebst, Josephine Thomas, Joska 
The results of this study have significant advantages Appelman, Cees G. M. Snoek, and Marie 
over existing technology, which may be attributed to the Rosenkrantz Lindegaard. Promise into practice: 
ability of CNN to handle local features and the modeling Application of computer vision in empirical 
of global dependencies by Transformer. CNN can research on social distancing. Sociological Methods 
effectively extract local features of images, while & Research, 52(3):1239-1287, 2023. 
Transformer captures global dependencies in images https://doi.org/10.1177/00491241221099554 
through its self attention mechanism. The combination of [3] N Krishnachaithanya, Gurdit Singh, Smita Sharma, 
the two enables more accurate counting when dealing Rangisetti Dinesh, Sumeet Ramsingh Sihag, Kamna 
with complex scenes. However, this fusion also brings Solanki, Abhishek Agarwal, Mrinalini Rana, and 
certain complexity, such as an increase in the number of Ujjwal Makkar. People counting in public spaces 
parameters. However, this research model achieved fast using deep learning-based object detection and 
inference time while maintaining a low number of tracking techniques. 2023 International Conference 
parameters, indicating a good balance between on Computational Intelligence and Sustainable 
complexity and efficiency. Engineering Solutions (CISES), 21(1):784-788, 
2023. 
https://doi.org/10.1109/CISES58720.2023.1018350
5 Conclusion 3 
Traditional counting relies on manual operation, with [4] Li Zhang, Leilei Yan, Mengqian Zhang, and Jingang 
low processing power and efficiency, and often requires Lu. T2 CNN: a novel method for crowd counting 
a lot of manpower and time to identify large-scale data. via two-task convolutional neural network. The 
However, with the prosperity of Internet technique, Visual Computer, 39(1):73-85, 2023. 
computer vision technique can effectively solve this https://doi.org/10.1007/s00371-021-02313-0 
problem for object detection and counting. CNN and [5] Shashi Bhushan Jha, and Radu F. Babiceanu. Deep 
Transformer are representative models of deep learning. CNN-based visual defect detection: Survey of 
The former has good local FE ability, while the latter has current literature. Computers in Industry, 
a non cyclic structure based on attention mechanism and 148(1):103911, 2023. 
processes the entire input sequence in parallel. Based on https://doi.org/10.1016/j.compind.2023.103911 
this, the study integrated CNN with Transformer to [6] Leong J M, Hijazi M H A, Saudi A, On C K, Fui C F, 
construct a CNN-Transformer model, and explored its Haviluddin H. The development and usability test 
performance in target counting through simulation of an automated fish counting system based on 
training and testing. The results showed that the model CNN and contrast limited histogram equalization. 
performed well in performance analysis. In testing Bulletin of Electrical Engineering and Informatics, 
analysis, the counting time and parameter count of the 13(2):1128-1137, 2024. 
model were significantly lower than other models of the https://doi.org/10.11591/eei.v13i2.5840 
same type. However, it still performed well in low [7] Chen G, Shang Y. Transformer for tree counting in 
feature and high confusion image counting recognition. aerial images. Remote Sensing,14(3):476 2022. 
Although the research achieved good results, there were https://doi.org/10.3390/rs14030476. 
still some limitations, such as the lack of clear input- [8] Miao Z, Zhang Y, Peng Y, Peng H, Yin B. DTCC: 
output mapping in the Transformer model compared to Multi-level dilated convolution with transformer for 
other models, which increased the difficulty of internal weakly-supervised crowd counting. Computational 
interpretation. In the future, efforts can be made to Visual Media, 9(4): 859-873, 2023. 
incorporate interpretable artificial intelligence https://doi.org/10.1007/s41095-022-0313-5 
technologies such as attention visualization or salinity 
60 Informatica 49 (2025) 49–60 X. He et al. 
[9] Qianhui Liu, Yan Zhang, and Gongping Yang. Small 2022. 
unopened cotton boll counting by detection with https://doi.org/10.1016/j.biosystemseng.2022.05.01
MRF-YOLO in the wild. Computers and 1 
Electronics in Agriculture, 204(1):107576, 2023. [20] Fengyun Wu, Zhou Yang, Xingkang Mo, Zihao Wu, 
https://doi.org/10.1016/j.compag.2022.107576 Wei Tang, Jieli Duan, and Xiangjun Zou. Detection 
[10] Lei Shen, Jinya Su, Runtian He, Lijie Song, Rong and counting of banana bunches by integrating deep 
Huang, Yulin Fang, Yuyang Song, and Baofeng Su. learning and classic image-processing algorithms. 
Real-time tracking and counting of grape clusters in Computers and Electronics in Agriculture, 
the field based on channel pruning with YOLOv5s. 209(1):107827, 2023. 
Computers and Electronics in Agriculture, https://doi.org/10.1016/j.compag.2023.107827 
206(1):107662, 2023.  
https://doi.org/10.1016/j.compag.2023.107662  
[11] Yao Liu, Hongbin Pu, and Da-Wen Sun. Efficient 
extraction of deep image features using 
convolutional neural network (CNN) for 
applications in detecting and analysing complex 
food matrices. Trends in Food Science & 
Technology, 113:193-204, 2021. 
https://doi.org/10.1016/j.tifs.2021.04.042 
[12] Jinzhu Lu, Lijuan Tan, and Huanyu Jiang. Review 
on convolutional neural network (CNN) applied to 
plant leaf disease classification. Agriculture, 
11(8):707, 2021. 
https://doi.org/10.3390/agriculture11080707 
[13] Xiang Chen, Hao Li, Mingqiang Li, and Jinshan 
Pan. Learning a sparse transformer network for 
effective image deraining. Proceedings of the 
IEEE/CVF Conference on Computer Vision and 
Pattern Recognition, 46(1):5896-5905, 2023. 
https://doi.org/10.48550/arXiv.2303.11950 
[14] Guy Farjon, Liu Huijun, and Yael Edan. Deep-
learning-based counting methods, datasets, and 
applications in agriculture: A review. Precision 
Agriculture, 24(5):1683-1711, 2023. 
https://doi.org/10.1007/s11119-023-10034-8 
[15] Nourhan T.A. Abdelnaiem, Hossam M.A. Fahmy, 
and Anar A. Hady. DC-PHD: multitarget counting 
and tracking using binary proximity sensors. 
International Journal of Wireless and Mobile 
Computing, 16(1):44-59, 2022. 
https://doi.org/10.1504/IJWMC.2023.135383 
[16] Xin Man, Deqiang Ouyang, Xiangpeng Li, Jingkuan 
Song, and Jie Shao. Scenario-aware recurrent 
transformer for goal-directed video captioning. 
ACM Transactions on Multimedia Computing 
Communications and Applications, 35(1):11079-
11091, 2022. https://doi.org/10.1145/3503927 
[17] Matteo Polsinelli, Luigi Cinque, and Giuseppe 
Placidi. A light CNN for detecting COVID-19 from 
CT scans of the chest. Pattern Recognition Letters, 
140(1):95-100, 2020. 
https://doi.org/10.1016/j.patrec.2020.10.001 
[18] Diksha Moolchandani, Anshul Kumar, and Smruti 
R. Sarangi. Accelerating CNN inference on ASICs: 
A survey. Journal of Systems Architecture, 
113(1):101887, 2021. 
https://doi.org/10.1016/j.sysarc.2020.101887 
[19] Lu Zhang, Xinhui Zhou, Beibei Li, Hongxu Zhang, 
and Qingling Duan. Automatic shrimp counting 
method using local images and lightweight 
YOLOv4. Biosystems Engineering, 220(1):39-54, 
https://doi.org/10.31449/inf.v49i12.7392                                                                               Informatica 49 (2025) 61–76 61 
 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio-Video 
Based Artificial Emotion Recognition 
 
Himanshu Kumar*, Martin Aruldoss 
Department of Computer Science, Central University of Tamil Nadu, Thiruvarur, India.   
E-mail: himanshukphd20@students.cutn.ac.in, martin@cutn.ac.in2 
*Corresponding author  
 
Keywords: multimodal fusion, cross-modal fusion, emotion recognition, artificial emotion intelligence, fusion 
mechanism   
Received: October 22, 2024       
The advanced technology of artificial emotional intelligence has greatly contributed to multimodal emption 
recognition task. Emotion recognition has played a crucial role in many domains, like communication, e-
learning, mental healthcare, contextual awareness, and customer satisfaction. As real-time data continues to 
expand, addressing the problem of emotion recognition has become critical and complex. A key challenge 
lies in recognizing emotions from multimodal heterogeneous input sources, aligning extracted features, and 
developing robust emotion recognition models. In this study, we explore a cross-modal (audio and video 
modality) fusion mechanism for emotion recognition, effectively addressing the associated feature 
complexities. We have used 2D-CNN and 3D-CNN deep learning models for audio and video feature 
extractions and developed robust models for emotion recognition. This study emphasizes the importance of 
Compact Bilinear Gated Pooling (CBGP) cross-modal fusion mechanism and highlights the contribution of 
fusing the features from audio and video modalities for emotion recognition. It also discusses the working 
principle and comparison performance with other peer cross-modal fusion techniques such as FBP and CBP. 
The performance of advanced cross-modal fusion is compared to baseline traditional cross-modal fusion 
mechanisms including EF-LSTM, LF-LSTM, Graph-MFN, hybrid fusion and transformer model based fusion 
mechanisms such as, attention fusion and transformer fusion. This experiment is performed on benchmark 
datasets CMU-MOSEI and achieves an accuracy of 80.3%, F1-score of 79.2%, and MAE of 54.2%. 
Povzetek: Predstavljen je napredni mehanizem optimalne fuzije med modalnostmi za umetno prepoznavanje 
čustev na podlagi avdio-video posnetkov.  Študija uporablja 2D- in 3D-CNN za ekstrakcijo značilnosti, 
poudarja pomen CBGP fuzije in dosega odlične razultate na naboru podatkov CMU-MOSEI. 
 
1 Introduction (FBP) [2], Compact bilinear pooling (CBP) [3], and 
Compact Bilinear Gated Pooling (CBGP) [4].  
Emotion recognition is being successfully used in many Emotion recognition from audio and video modalities are 
domains and applications. The adoption of this very crucial because audio and video (collection of image 
technology has grown rapidly in healthcare, e-learning frames) gives a wide range of information regarding, 
and advertising [1]. Initially, emotion recognition was pitch, tone, image texture, facial movements, and facial 
limited to with unimodal approaches, but it has now expressions [5]. To train a model it is easy to extract 
gained more popularity with the advancement of features within the same modality and from another 
multimodal approaches and enhanced techniques. Its modality. This type of feature extraction leads to training 
growing demand has expanded the scope for exploring a deep learning model to fine grained emotion 
various directions of research in emotion recognition. classification tasks [6]. To work with different modalities, 
Multimodal data inherently contains rich information and the most important and primary step is to extract the 
has the potential to learn meaningful patterns from features from both the modalities. After preprocessing 
extracted features. In our study, we intend to achieve and cleaning the features, it is required to align those 
emotion recognition by combining features extracted features, and combine only those features which have 
from audio and video modalities and employing a fusion essential information and can help to train a deep learning 
mechanism. This study explores the cross-modal fusion model [7]. This study uses two different deep learning 
approach, where the term ‘cross modal fusion’ refers to models, one is 2D-CNN [8] for audio modality and other 
integrating essential features from heterogeneous input is 3D-CNN [9] for video modality. As per the previous 
sources, further this integration helps in training deep studies, this study aims to explore the advanced fusion 
learning models and classifying emotions effectively. mechanism such as Factorized bilinear pooling (FBP), 
Advanced cross-modal fusion mechanisms are Compact bilinear pooling (CBP), and Compact Bilinear 
categorized in three types: Factorized bilinear pooling Gated Pooling (CBGP). This study compares the 
62   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
advanced fusion approaches with state-of-the-art fusion 2   Literature review 
approaches such as early fusion, late fusion, and hybrid 
fusion, as well as transformer model based fusion This section offers an overview of the features of audio-
techniques such as attention fusion and transformer video modalities, and the existing fusion mechanisms in 
fusion. multimodal emotion recognition, along with a detailed 
review. Table 1 summarizes the related work and some 
The research contribution of the proposed work 
baseline cross-modal fusion mechanisms, particularly for 
are as follows: 
emotion recognition in audio-video modalities using the 
• Highlights the limitations of traditional fusion CMU-MOSEI dataset.  
mechanisms, such as high dimensionality, 
2.1 Feature extraction 
suboptimal interdependency modeling, and 
challenges in fine-grained emotion classification. Before feature extraction, the raw input dataset is pre-
processed to ensure it is free from noise, missing values 
• Addresses a critical gap to reduce the computational 
and other inconsistencies [10]. Feature extraction is a 
errors and improve the sustainability of audio-video 
crucial part of feature engineering in any classification 
emotion recognition systems. 
model, which yields critical information from the input 
• Introduces a novel gating unit and cross-modal data. Feature sets act as input vectors for a deep learning 
fusion approach using factorized bilinear pooling model, containing all the necessary information about the 
and compact bilinear pooling, addressing the modalities that help the model learn patterns [11]. This 
inefficiencies in traditional fusion methods. This section reviews the features and feature sets of audio and 
solution enhances feature interaction and reduces video modalities utilized in previous research studies. 
computational complexity. 
i. Audio features 
• Employs lightweight 2D-CNN and 3D-CNN 
To effectively train deep learning models with audio 
architectures for audio and video modalities, 
features, feature extraction tools and libraries such as 
respectively, avoiding the need for pruning and 
LibROSA [12], OpenSMILE [13], and pyAudioAnalysis 
quantization while maintaining network simplicity.  
[14] has proven indispensable. These tools are essential 
• This design minimizes computational overhead to process and extract the meaningful features, offering a 
associated with insignificant weights and neurons. robust foundation for building a deep learning model. The 
Validates the model’s accuracy and compares the process begins with raw audio data undergoing a 
performance of all three advanced cross-modal preprocessing step. After preprocessing, audio features 
fusion mechanisms using the benchmark dataset are extracted using these tools and libraries. These 
CMU-MOSEI. features contain the information about acoustic properties 
[15] of audio utterances embedded within the video track. 
• Validates the model’s accuracy and compares the The extracted feature provides crucial information about 
performance with baseline, and traditional state-of- various feature segments such as pitch, tone, energy, 
the-art fusion approaches: early fusion, late fusion rhythm, and spectral attributes [16]. These properties 
and hybrid fusion. capture many useful insights from raw audio data to train 
• Comprehensive discussion with transformer model the deep learning model, which drives to classify the 
based fusion approaches: attention fusion and emotional state. Some most widely used extracted key 
transformer fusion. features include:  
• The proposed approach ensures scalability and • Mel-Frequency Cepstral Coefficients (MFCCs) [17]: 
sustainability, contributing to the development of Derived from spectrograms to represent the audio signal 
more resource-efficient deep learning models for in a form humans perceive.   
real-world applications. • Spectral features [18]: Attributes such as spectral 
centroid, roll-off, bandwidth that highlight energy 
The rest of the paper is organized as follows: section 2 distribution across frequencies.  
reviews the literature on feature extraction and traditional 
• Variations in pitch, frequencies, amplitude [19]: 
fusion mechanism and highlights the related work and 
Capturing changes in voice that are indicative of different 
research gap. Section 3 introduces the advanced cross-
emotions.  
modal fusion approaches. Section 4 presents the training 
• Energy and intensity levels [19]: it represents the changes 
model and experimental setup, section 5 provides the 
in signal strength, where low intensity often refers to ‘sad’ 
result and discussion, and finally, Section 6 concludes the 
and high intensity correlates with ‘excitement or happy’ 
paper and suggests future scope. 
emotions. 
 
 
 
 
  
 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   63 
 
 by a deep learning classification model for emotion 
Video features recognition. 
Extracting video features is an essential step to train a Late fusion 
deep learning model for emotion recognition. This To address the limitations of early fusion, another basic 
process takes multiple sub-steps like extracting frames fusion mechanism, late fusion [37], was introduced. A 
from the video, setting the frames per second, and significant amount of research has shifted towards this 
extracting per frame features. After extracting frames, fusion mechanism to develop more robust emotion 
it is required to preprocess the entire frames as per classification models. In late fusion, each modality is 
standards for emotion recognition.  first pre-processed, analyzed, and fed into a deep neural 
This preprocessing includes tasks such as frame network model as input. The outputs from these 
sampling [20], facial feature alignment [21], discarding classification models are then combined at a later stage. 
irrelevant frames and reducing variations. The advantage of this fusion mechanism lies in its 
ability to fuse features with low dimensionality and 
Previous studies have explored two broad approaches accurately classify emotions. 
to extracting the features from frames: appearance-
based features and geometric-based features [22]. Hybrid fusion 
Appearance-based features: These features describe the Hybrid fusion [38] is hybridization of early and late 
visual characteristics as features of a picture within a fusion, integrating the feature properties of both fusion 
specific frame, such as the face, facial expression, principles. It is considered superior to early and hybrid 
expression textures, sharpness, and facial movements fusion in emotion classification. This fusion is 
[23]. These features provide pure cues and essential particularly useful for addressing the challenges 
information for recognizing emotions. associated with the complexity of early and late fusion. 
Geometric-based features: These features are determined Hybrid fusion can be applied in two phases; first, 
based on the calculation of facial landmarks, jaw during the initial feature interaction, and second, after 
movements, eyebrow movements, expression the model has been trained. However, this fusion 
coordinates, relative positions, distance, arcs, shape technique fails to manage large parameters and 
angles, texture angles, and other facial action parameters complex features, where extracting and combining 
[24]. correlation based spatiotemporal feature information 
and identifying patterns are critical in multimodal 
These features are extracted using machine learning emotion recognition. Hence, hybrid fusion needs 
algorithms [25]–[28], traditional feature extraction further improvements to deal with complex multimodal 
techniques [29]–[32], and currently deep neural network datasets.   
models [12], [33]–[35]. Python libraries and frameworks 
are now widely used for feature extraction processes, Attention fusion 
enabling the development of more robust models for Attention fusion [39] is a mechanism that focuses on 
emotion recognition.   
fusing only the most relevant and crucial features after 
2.2 Feature fusion mechanism extracting all the features and generating feature maps 
from multimodal inputs. The advantage of this 
After extracting features from both the audio and video 
approach is to excel in handling both inter-modality and 
modalities, an integration process is required to combine 
intra-modality interactions effectively. However, a 
them effectively. This process, known as information 
major drawback of this fusion mechanism arises when 
fusion or feature fusion, involves aligning the key features 
from each modality obtained during the feature extraction feature alignment errors occur in spatiotemporal 
and fusing them into a unified representation [36]. The datasets or when sequence synchronization is lacking. 
goal is to synchronize the features of both modalities to Such issues lead to weak attention scores, increasing 
collaboratively recognize emotions with higher accuracy. data complexity and computational burden [40]. There 
In this fusion process, the integrated features are first used are two types of attention fusion mechanisms: self-
to train a deep learning model. The model is then attention [41] and multi-head attention [13]. Self-
validated to ensure its accuracy and reliability in emotion attention fusion sequentially captures interactions 
recognition.  within a single modality, while multi-head attention 
Early fusion focuses on every aspect of feature representation and 
captures interactions as output from multiple heads in 
Early fusion  [5] is one of the simplest and most parallel. 
fundamental mechanisms for multimodal fusion. In this 
fusion mechanism, features from different modalities are Transformer fusion 
first aligned and integrated after extraction and then fed 
Transformer fusion [42] is an advanced approach of 
into a deep neural network model as input. This method 
fusion mechanism that leverages pre-trained 
combines audio and features into a single unifies feature 
transformer models, which scales well on long 
vector, by applying the concatenation or elementwise 
sequencing data due to their ability to perform parallel 
operations such as addition, multiplication the, processed 
64   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
computations. This fusion approach is particularly trade-off between audio and video frame intervals, and 
suitable for text-based emotion recognition tasks and positional embedding segments can lead to a loss of 
natural language processing (NLP) applications, as it critical information and feature correlations in these 
processes all token embeddings simultaneously. modalities. Furthermore, the process results in 
However, transformer fusion is less efficient when imbalanced classification, complex computations, and 
applied with audio and video modalities together. This high memory usage, making it less ideal for fusing 
limitation arises from the tokenization-synchronization spatiotemporal features and datasets.  
 
Table 1:  Summary of audio-video based traditional fusion and other fusion’s related work. 
Fusion Feature Modality Datasets Remarks 
extraction 
model 
Early fusion LSTM Audio-video CMU- Sensitive to noise and misalignment 
[43] MOSEI between audio and video signals 
Late fusion [43] LSTM Audio-video CMU- High computational cost; less effective in 
MOSEI modeling complex interactions between 
modalities 
Hybrid fusion VGG-net Audio-video IIT-R SIER Increased model complexity; risk of 
[44] overfitting with limited data 
Multimodal Bayesian Audio-video CMU- Computationally expensive; less scalable 
Factorization network MOSEI for large datasets 
Model (MFM) 
[43] 
Graph-MFN (G- LSTM Audio-video CMU- Limited scalability 
MFN) [45] MOSEI 
Multiplicative  LSTM Audio-video IEMOCAP, Prone to overfitting 
fusion (M3ER) CMU-
[46] MOSEI 
Cross-Attention Attention & Audio-video RAVDESS Requires large amounts of data for 
fusion [39] concatenation effective attention training; sensitive to 
missing modality information 
Transformer Transformer- Audio-video MELD, High memory consumption; needs 
fusion [42] based pre- IEMOCAP, extensive pretraining and large datasets 
trained model CMU-
MOSEI 
Multimodal CNN Audio-video AVEC2017 Limited ability to capture temporal 
fusion [47] relationships; 
Model level 2-layer LSTM Audio-video RECOLA fine-tuning requires careful parameter 
fusion [48] tuning. 
Tensor fusion Three-fold Audio-video CMU- Tensor-based fusion can be 
network TFN  Cartesian MOSEI computationally prohibitive; sensitive to 
[49] product missing or noisy data. 
Multimodal Bi-directional Audio-video IEMOCAP, Complex training process; BiGRUs can 
Dynamic gated recurrent MELD suffer from vanishing gradient problems 
Fusion Network unit (BiGRU) with long sequences. 
[50] 
 
2.3 Research gap 
interdependencies of features, and struggles with fine-
Problem: Through a comprehensive review of the literature, 
grained emotion classification. However, a critical research 
we have gained crucial insights into audio and video feature 
gap still needs to be addressed to improve further, 
extraction, various traditional cross-modal feature fusions 
specifically to reduce the computational error in traditional 
(such as early, late hybrid, attention, and transformer 
fusion mechanisms for audio-video based emotion 
fusion), and deep learning models, along with their 
recognition systems and enhance their sustainability.  
comparative performances on benchmark multimodal 
datasets. Traditional fusion faces challenges with high Solution: To address this gap, we propose a gating unit, and 
dimensionality in large datasets, fails to optimize the advanced cross-modal fusion mechanism (factorized 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   65 
 
bilinear pooling and compact bilinear pooling) as an Cross-modal fusion is an effective technique for emotion 
alternative to traditional methods. This approach employs recognition that involves extracting meaningful and 
2D CNN and 3D CNN simple deep neural network essential features from two or more heterogeneous input 
architectures to avoid pruning and quantizing the mode sources or modalities using feature extraction processes, 
while managing insignificant weights and neurons. This integrating these features, and subsequently training a 
solution can optimize the computational efficiency while deep learning model. This technique has contributed to 
maintaining high performance, contributing to the many applications including emotion recognition and has 
development of more sustainable and scalable emotion continually evolved, demonstrating its versatility and 
recognition systems. effectiveness. Notably, cross-modal fusion has been 
successfully applied in many applications such as object 
3 Material and methods  detection [51], night pedestrian detection [52] , low light 
image semantic segmentation [53], and depression 
In this section, we first describe the cross-modal fusion detection [54]. Cross-modal fusion mechanism intends to 
mechanism and its architecture. Next, we introduce three develop a joint representation that gathers all the 
advanced cross-modal fusion mechanisms and its collective essential features from all the modalities and 
algorithm to enhance audio-video based emotion feeds into a single vector while retaining each modality’s 
recognition from audio and video modality. Finally, we contributions. 
discuss the comparative performance of these techniques While traditional cross-modal fusion mechanisms are 
against state-of-the-art fusion mechanisms.    discussed in the literature review section, this section 
focuses on three advanced cross-modal fusion 
3.1 Cross-modal fusion mechanism 
mechanisms for emotion recognition: Factorized bilinear 
pooling (FBP), Compact bilinear pooling (CBP), and 
Compact Bilinear Gated Pooling (CBGP). 
 
Figure 1: Basic architecture of Audio-video based cross-modal fusion mechanism 
3.2 Factorized bilinear pooling (FBP) 𝑍 = ∑𝑚
𝑖= (𝑀𝑇
1 𝐴) . (𝑁𝑇𝑉)   (1) 
Factorized Bilinear Pooling (FBP) is a method that Where, Z: Pooled feature vector, M and N are bilinear 
enhances the standard bilinear pooling technique by interaction matrices, A and V are feature vectors from audio 
factorizing the bilinear interaction tensor into lower-rank and video, respectively. Algorithm 1 illustrates the step-by-
approximations [55]. Traditional bilinear pooling involves step factorized bilinear pooling fusion process 
computing the outer product of two feature vectors from implementation. 
different modalities, resulting in a high-dimensional feature 
Training method: Let 𝐴′ represents the Audio and 𝑉′ 
representation. While this method captures rich interactions 
represents the Video modality. The   feature extraction 
between the modalities, it is computationally expensive and 
prone to overfitting due to the large number of parameters. functions 𝑓𝑎 𝑎𝑛𝑑 𝑓𝑣 are applied to the audio and video 
FBP mitigates these issues by factorizing the interaction modality. It generates the feature vectors: 
tensor into a product of two lower-rank matrices, 
𝐹𝐴 = 𝑓𝑎(𝐴′) 𝑎𝑛𝑑 𝐹𝑉 = 𝑓𝑣(𝑉′)   (2) 
significantly reducing the number of parameters while 
preserving the expressive power of bilinear interactions. 
66   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
Where 𝐹𝐴 𝑎𝑛𝑑 𝐹𝑉 are the extracted feature vectors from A prediction function f (F’) is then applied to the feature 
audio and video, 𝐷𝐴  𝑎𝑛𝑑 𝐷𝑉 are dimensionality spaces of vector 𝐹′ to predict the target emotion category value, Z’ so, 
the audio and video feature spaces.  𝑍′ = 𝑓(𝐹).  Here, 𝑓(𝐹) is a 2D-CNN deep neural network 
acting as a classifier. The model is trained on labelled 
If the features from audio and video need to be combined, a dataset so it is calculated as follows: 
fusion mechanism 𝛴 can be used to integrate these feature 
vectors into a unified representation 𝐹′. It can be calculated (𝐹𝑖𝑦𝑖)𝑁
𝑖=1     (4) Where 𝑦𝑖  is the 
as:  true prediction label and N is the size of the sample.  
𝐹′ = Σ(𝐹𝐴, 𝐹𝑉)  𝑜𝑟 𝐹 = 𝐹𝐴 ⨁ 𝐹𝑉     (3) 
Where, Z:  Pooled feature vector, A and V are feature 
Algorithm 1:  Factorized Bilinear Pooling (FBP) 
vectors from audio and video, respectively. 𝑝𝑟𝑜𝑗𝑎 , and 𝑝𝑟𝑜𝑗𝑣 
Input: Factorize audio features: 𝐹𝐴 = 𝑓𝑎(𝐴′) are projection matrix of audio and video features.  
Factorize video features:  𝐹𝑉 = 𝑓𝑣(𝑉′)  
Output: Predict the emotion class for new inputs  
1. 1.  Compute the bilinear interaction between the  
factorized audio and video features: 
𝐹′ = Σ(𝐹𝐴, 𝐹𝑉)  𝑜𝑟 𝐹 = 𝐹𝐴 ⨁ 𝐹𝑉 Training Method:  Let 𝐴′ represents the Audio and 𝑉′ 
represents the Video modality. The feature extraction 
2. Feed the compact bilinear pooled vector ZFBP 
into a deep neural network classifier: (𝐹𝑖𝑦𝑖)𝑁
𝑖=1 functions 𝑓𝑎 𝑎𝑛𝑑 𝑓𝑣 are applied to the audio and video 
modality. It generates the feature vectors:   
3. Calculate the loss function, minimize, and   𝐹𝐴 = 𝑓𝑎(𝐴′) 𝑎𝑛𝑑 𝐹𝑉 = 𝑓𝑣(𝑉′)   (6) 
evaluation metrics 
CBP uses random projections to project the high-
4.    4.    Use the trained model to predict the emotion 
dimensional feature vectors into a lower-dimensional space 
class 
5.           for new inputs. before combining them. Random projection for audio (ZA) 
6.  and video features (ZV): 
This factorization reduces the computational burden and 𝑍𝐴 = (𝑃𝐴𝐹𝐴)  and  𝑍𝑉 = (𝑃𝑉𝐹𝑉)    (7) 
allows the model to generalize better, especially when 
Where, ZA/V: Projection of audio /video, PA, and Pv: dealing with limited data. FBP has been successfully 
applied in tasks such as Visual Question Answering (VQA) Projection matrix of audio /video features. To maintain the 
and image-text matching, where the interaction between information during projection, random sign vectors are 
modalities is crucial. applied to the projected features.   
3.3 Compact bilinear pooling (CBP) 𝑍𝐴′ = 𝑆𝐴 ∘ 𝑍𝐴 and 𝑍𝑉′ = 𝑆𝑉 ∘ 𝑍𝑉  (8) 
Compact Bilinear Pooling (CBP) further refines the bilinear SA and SV are random sign vectors for audio and video 
pooling approach by employing compact representations of features, ° 𝑑𝑒𝑛𝑜𝑡𝑒𝑠 𝑒𝑙𝑒𝑚𝑒𝑛𝑡 𝑤𝑖𝑠𝑒 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑖𝑐𝑎𝑡𝑖𝑜𝑛. Then 
the bilinear interactions. Unlike standard bilinear pooling, we applied the random permutation to the elements of the 
which directly computes the outer product of two feature signed vectors to further scramble the features. 
vectors, CBP leverages approximations based on the Tensor 
Sketch technique to produce a compact representation of the 𝑍𝐴" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′𝐴, ℎ𝐴), and 𝑍𝑉" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′𝑉 , ℎ𝑉) (9) 
outer product. This method dramatically reduces the  where, ℎ𝐴, ℎ𝑉 is a permutation vector applied to the indices 
dimensionality of the resulting feature vector without losing 
of Z’A and Z’V. 
the key interactions between modalities. Algorithm 2 
illustrates the compact bilinear pooling fusion process The core of CBP involves computing the circular 
implementation. convolution of the two permuted feature vectors:  
In CBP, the outer product of the feature vectors A and V is 𝑍𝐶𝐵𝑃 = 𝐹𝐹𝑇−1 (𝐹𝐹𝑇(𝑍𝐴)) ° (𝐹𝐹𝑇(𝑍𝑉))    (10) 
approximated by projecting both vectors into a higher-
dimensional space using random projections, followed by 𝐹𝐹𝑇−1 ∶ 𝑖𝑛𝑣𝑒𝑟𝑠𝑒 𝑓𝑎𝑠𝑡 𝑓𝑜𝑢𝑟𝑖𝑒𝑟 𝑡𝑟𝑎𝑛𝑠𝑓𝑜𝑟𝑚, and 
element-wise multiplication and summation. Presented 
equation represents how to implement CBP for audio-video 𝐹𝐹𝑇: 𝑓𝑎𝑠𝑡 𝑓𝑜𝑢𝑟𝑖𝑒𝑟 𝑡𝑟𝑎𝑛𝑠𝑓𝑜𝑟𝑚 
emotion recognition using a deep neural network: 
 After this we normalized the obtained CBP feature vector 
𝑍 = ∑𝑚
𝑖=1(𝑝𝑟𝑜𝑗𝑎(𝐴)𝑖) . (𝑝𝑟𝑜𝑗𝑣(𝑉𝑖))   (5) and classified the categories of emotions by using deep 
neural networks. Calculated with the following formula:  
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   67 
 
𝑍′𝐶𝐵𝑃 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑍𝐶𝐵𝑃)     (11) gating mechanism that adjusts or selectively emphasizes 
features based on their relevance, using a learned Softmax 
where 𝑍′𝐶𝐵𝑃  predicts the emotion class, and 
function to modulate feature interactions before pooling.  
𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑍𝐶𝐵𝑃) represents the function of the deep neural 
network. In CBGP, the feature vectors A and V undergo compact 
bilinear pooling as described in CBP, but before the final 
Algorithm 2:  Compact Bilinear Pooling (CBP) summation, the resulting interaction vector is element-wise 
multiplied by a gating vector  𝐺′ ∈ 𝑅𝑑 Where, d is the 
Input: Project audio features, 𝑍𝐴 𝑎𝑛𝑑 
dimensionality of the compact representation. The gating 
Project video features, 𝑍𝑉 vector is computed as:  
Output: Predict the emotion class for new inputs 𝐺′ = 𝜎(𝑊𝐺(𝐴′, 𝑉′) + 𝑏𝐺)           (12) 
1. Generate projection matrix, 𝑍𝐴′ 
Where 𝜎: 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛, 𝑊𝐺  : weight matrix, 𝑏𝐺: bias 
2. Apply sign vectors to the projected audio features: vector, 𝐴′, 𝑉′: audio, video feature vectors. 
𝑍𝐴′ = 𝑆𝐴 ∘ 𝑍𝐴 
Training Method: Let 𝐴′ represents the Audio and 𝑉′ 
3. Apply sign vectors to the projected video features: 
represents the Video modality. The feature extraction 
𝑍𝑉′ = 𝑆𝑉 ∘ 𝑍𝑉 
functions 𝑓𝑎 𝑎𝑛𝑑 𝑓𝑣 are applied to the audio and video 
4. Apply permutation to the audio features: modality. It generates the feature vectors:  
𝑍𝐴" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′𝐴, ℎ𝐴) 
𝐹𝐴 = 𝑓𝑎(𝐴′)  and  𝐹𝑣 = 𝑓𝑣(𝑉′)  (13) 
5. Apply permutation to the video features: 
𝑍𝑉" = 𝑃𝑒𝑟𝑚𝑢𝑡𝑒(𝑍′
𝑉 , ℎ𝑉) CBP uses random projections to project the high-
6. Compute the circular convolution of the two dimensional feature vectors into a lower-dimensional space 
permuted feature before combining them and calculates the random 
vectors: 𝑍𝐶𝐵𝑃 = 𝐹𝐹𝑇−1 (𝐹𝐹𝑇(𝑍𝐴)) ° (𝐹𝐹𝑇(𝑍𝑉)) projection for audio (𝑍𝐴) and video features (𝑍𝑉):  
𝑍𝐴 = 𝑃𝐴𝐹𝐴   𝑎𝑛𝑑  𝑍𝑉 = 𝑃𝑉𝐹𝑉  (14) 
7. Feed the compact bilinear pooled vector ZCBP Where ZA 𝑎𝑛𝑑 𝑍𝑉: Projection of audio and video, PA, and 
into a deep neural network classifier: 
Pv: Projection matrix of audio and   video features. Then, we 
𝑍′𝐶𝐵𝑃 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑍𝐶𝐵𝑃) 
compute element-wise multiplication of the projected 
8. Calculate the loss function, minimize, and vectors: 
evaluation metrics 
𝑍′ = 𝑍𝐴 ∘ 𝑍𝑉  (15) 
9. Use the trained model to predict the emotion class 
for new inputs. Gated pooling: (i) compute the introduced gating vector 
𝐺′
 ∈ 𝑅𝑑 Where, d is the dimensionality of the compact 
representation. The gating vector is computed as: 
3.4 Compact bilinear gated pooling (CBGP)    
𝐺′ = 𝜎(𝑊𝐺(𝐴′, 𝑉′) + 𝑏𝐺)         (16) 
Compact Bilinear Gated Pooling (CBGP) enhances and 
builds upon Compact Bilinear Pooling (CBP) by adding a 
Here, 𝜎 is a Softmax function.   𝑍 = 𝑆𝑢𝑚(𝑍′′)     (18) 
(ii) then we apply the gating mechanism to the element-wise This entire mechanism can be summarized by an equation, 
multiply vector:  Where, Z: pooled feature vector, ZA/𝑎𝑛𝑑 𝑍𝑉: Projection of 
audio and video.      
𝑍′′ = 𝐺′ ∘ 𝑍′      (17) 
𝑍 = ∑𝑚
𝑖=1  (𝜎(𝑊𝐺  (𝐴, 𝑉) + 𝑏𝐺))𝑖 . (𝑍𝐴)𝑖) . (𝑍𝑉(𝑉𝑖))  (19) 
Finally, we sum the elements of the gated interaction vector 
to obtain the final pooled vector by the following equation-  
  
Algorithm 3:  Compact Bilinear Gated Pooling (CBGP) 
Input: Project audio features: ZA, and Project video features:  𝑍𝑉 
 
Output: Predict the emotion class for new inputs. 
 
1. Compute gating vectors for audio and video features: 
𝐺′ = 𝜎(𝑊𝐺(𝐴′, 𝑉′) + 𝑏𝐺) 
68   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
 
2. Apply the gating vectors to the projected features: 
 
𝑍′ = 𝑍𝐴 ∘ 𝑍𝑉 
3. Apply sign vectors to the gated audio features: 
𝑍𝐴′ = 𝑆𝐴 ∘ 𝑍𝐴 
 
4. Apply sign vectors to the gated video features: 
𝑍𝑉′ = 𝑆𝑉 ∘ 𝑍𝑉 
 
5. Apply permutation to the gated and signed audio features: 
𝑍𝐶𝐵𝐺𝑃 = 𝐹𝐹𝑇−1 (𝑃(𝐹𝐹𝑇(𝑍𝐴 ∙ 𝐺))) 
 
6. Apply permutation to the gated and signed video features: 
𝑍𝐶𝐵𝐺𝑃 = 𝐹𝐹𝑇−1 (𝑃(𝐹𝐹𝑇(𝑍𝑉 ∙ 𝐺))) 
 
7. Compute the circular convolution of the two permuted feature 
vectors: 
𝑍′′ = 𝐺′ ∘ 𝑍′ 
 
8. Normalize the pooled feature vector: 
𝑍 = 𝑆𝑢𝑚(𝑍′′) 
9. Feed the compact bilinear gated pooled vector ZCBGP into a deep 
neural network classifier: 
𝑚
𝑍𝐶𝐵𝐺𝑃 = ∑  (𝜎(𝑊𝐺  (𝐴, 𝑉) + 𝑏𝐺))𝑖 . (𝑍𝐴)𝑖) . (𝑍𝑉(𝑉𝑖)) 
𝑖=1
10. Calculate the loss function, minimize, and evaluation metrics 
11. Use the trained model to predict the emotion class for new inputs 
 
Through this mathematical analysis, CBGP has been able 4.1 Evaluation dataset 
to identify the optimal fusion approaches that can be 
applied to audio-video-based emotion recognition systems, CMU-MOSEI [37]: CMU-MOSEI dataset comprises 
ultimately contributing to the development of more robust over 23,259 annotated video clips collected from more 
and accurate emotion recognition technologies. The gating than 1,000 speakers across a diverse range of topics. 
Total number of videos is 3228, video clips contain 
mechanism allows to control the flow of information 
naturally occurring monologues in English, making the 
between the layers while selecting and rejecting the 
dataset a realistic representation of human 
relevant or non-relevant (based on correlation feature 
communication. The dataset is annotated with six 
score) inputs. As we know, not all the features are equally categorical emotions: happy, sad, angry, fear, disgusted, 
important at every step or time frame, so the gating and surprised. Additionally, CMU-MOSEI provides 
mechanisms dynamically assign weights to features to intensity scores for each emotion, allowing for a fine-
capture complex regions more effectively. grained analysis of emotional expressions. After 
preprocessing, 20,323 samples are processed for feature 
4 Model training and experiments extraction. The dataset is divided into three sets; 80% for 
training, 10% for testing, and 10% for validation. The 
Our experiments are conducted on a system equipped with performance is evaluated on Accuracy, F1-score, and 
an AMD Ryzen 7 processor, 16GB of RAM, and an mean absolute error, (MAE). 
NVIDIA GeForce RTX GPU. The code was implemented 
using Jupyter Notebook IDE and the PyTorch framework. 4.2 Deep learning model implementation 
For audio and video preprocessing, we utilized the Librosa details 
and OpenCV Python libraries.  
a. 2D-CNN for Audio feature extraction and training 
 model 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   69 
 
We used 2D-cnn to extract and capture inter-modal We used a simple 3D-CNN model because emotion 
feature dependencies from the CMU-MOSEI dataset. To recognition requires synchronized feature relations in 
generate spectrograms from raw audio files, we used the each frame of a video, and a compact bilinear gated 
LibROSA library, which converts the raw audio fusion mechanism can increase computational 
waveform into a time series sampled at 22500 Hz. The complexity. Additionally, our proposed approach aims to 
waveform is then transformed into a spectrogram using extract spatial and temporal features and incorporates a 
the Short-Time Fourier Transform (STFT), with a gated filter to fuse features from the audio and video 
window size of 2048 and a hop length of 512, striking a modalities for each utterance. Therefore, we chose a 
balance between time and frequency resolution. simple deep learning architecture. The 3D-CNN takes a 
Spectrograms play a crucial role in audio-video emotion 224x224x3 image as input, which passes through the 
recognition as they align with video frames, increasing first 3D convolution layer followed by pooling layers, 
the likelihood of feature correlations due to time and with a filter size of 3x3x3 and a stride of 1.  Table 2 
frequency samples during fusion mechanism. illustrates the Hyperparameters for 2D-CNN and 3D-
CNN model.
b. 3D-CNN for video feature extraction and training 
model 
Table 2. Hyperparameters for 2D-CNN and 3D-CNN model 
Hyperparameter (2D-CNN) Audio Hyperparameter (3D-CNN) Video 
 
Input size= 224x224 Spectrogram Input size=224x224x3 image frames 
Kernels (conv layers) =32,64,128,256 Kernels (conv layers) = 64,128,256,512 
Stride=1 Stride=1 
Activation function= Relu and Softmax Activation function= Relu and Softmax 
Max Pooling= 2x2 Max Pooling= 3x3x3, 2x2x2 
Batch size=32 Batch size=32 
Epochs= 30 Epochs= 30-50 
Learning rate=0.00003 (cosine decay) Learning rate=0.00003 (cosine decay) 
Regularization=L2 Regularization= L2 
Dropout= 0.3% Dropout=0.2% 
Optimizer = Adam Optimizer = Adam 
 
 
5 Result and discussion mechanisms such as bilinear gated pooling, compact 
bilinear pooling, and compact bilinear gated pooling. We 
We evaluate the performance of each cross-modal fusion analyse the accuracy of each traditional fusion 
mechanism (FBP, CBP,CBGP) and compare it with the mechanism such as early fusion, late fusion and hybrid 
state-of-the-art (early fusion, late fusion and hybrid fusion on the same dataset, CMU-MOSEI. This 
approach employs 2D CNN and 3D CNN simple deep 
fusion) mechanisms on the CMU-MOSEI dataset using 
neural network architectures to avoid pruning and 
accuracy, F1-score, and MAE. F1-Score is the harmonic 
quantizing the mode while managing insignificant 
mean of precision and recall metrics. The results are 
weights and neurons. The ablation study was carried out 
summarized in the tables below, highlighting the with a feature extraction process where features are 
contributions of each fusion method to the overall audio and video modalities that interact through the outer 
system performance product. The outer product allows the 2D-CNN and 3D-
CNN to capture the interactions between every feature 
5.1 Ablation study of one modality and every feature of the other modality 
in a compact manner. Comprehensive analysis and 
To investigate the specific contributions of compact 
baseline comparisons show that our proposed CBGP 
bilinear gated pooling fusion (CBGP) of cross-modal 
fusion mechanism fuses feature effectively and 
fusion mechanism, this paper presents a detailed analysis outperforms the state-of-the-art fusion approaches. This 
of a series of ablation experiments conducted on the 
study also provides a comprehensive discussion about 
CMU-MOSEI datasets. These results are presented in 
transformer model based fusion approaches- attention 
tables, comparing key performance using accuracy, F1-
fusion and transformer fusion. 
score, and MAE among advanced cross-modal fusion 
  
  
70   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
5.2 Baseline comparisons  
a. Comparison of advanced cross-modal fusion mechanism with state-of-the-art FBP, and CBP fusion mechanism. 
Table 3: Performance comparison of advanced cross-modal fusion mechanisms on CMU-   MOSEI dataset, 
highlighting their accuracy, F1-score, MAE, and specific strengths. 
Cross-modal Accuracy F1-Score MAE 
Remarks 
fusion mechanism (%) (%) 
76.9 75.6 59.1 Performs well with sentiment-emotion 
FBP 
overlap 
78.4 77.1 59.8 
CBP Captures diverse emotions effectively 
CBGP 80.3 79.2 54.2 Best for fine-grained emotion 
detection 
Table 3 illustrates that CBGP achieves the highest importance of different feature interactions allows it to 
scores, particularly excelling in recognizing fine- handle the nuanced and varied expressions found in the 
grained emotions. Its ability to dynamically adjust the illustrations in the CMU-MOSEI dataset. 
b. Comparison of advanced cross-modal fusion mechanism with baseline cross-modal fusion mechanism 
Table 4: Performance comparison of advanced cross-modal fusion mechanism with traditional, and baseline cross-
modal fusion mechanism on CMU-MOSEI dataset, highlighting their accuracy, F1-score, and MAE. 
Fusion Mechanism Accuracy (%) F1-score (%) MAE (%) 
Early fusion  78.2 77.9 64.2 
(EF-LSTM) [43] 
Late fusion  80.6 80.6 61.9 
(LF-LSTM) [43] 
Graph-MFN [45] 76.9 77.0 - 
HFU-BERT model [56] 73.2 72.0 86.7 
Early Fusion  67.3 65.4 69.7 
2D-CNN (Ours) 
Late Fusion 70.4 69.2 67.4 
2D-CNN (Ours) 
Hybrid Fusion 72.6 71.4 65.8 
2D-CNN (Ours) 
76.9 59.1 
FBP (Ours) 75.6 
78.4 59.8 
CBP (Ours) 77.1 
CBGP (Ours) 81.3 79.2 54.2 
Table 4, illustrates that FBP performs well in scenarios emotions. CBGP achieves the highest performance over 
involving sentiment-emotion overlap. CBP further traditional cross-modal fusion mechanisms due to 
improves by effectively capturing a diverse range of  limited feature interaction and correlation.  CBGP excels 
in fine-grained emotion recognition and setting a 
benchmark on CMU-MOSEI dataset.
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   71 
 
 
Figure 2: Accuracy performance of FBP, CBP, and CBGP fusion approaches on CMU-MOSEI
Figure 2 illustrates that in the CMU-MOSEI dataset The progression from FBP to CBP, and from CBP to 
emotion categories, CBP consistently outperforms FBP. CBGP, emphasizes the strength and effectiveness of the 
The accuracy of 'Happy' emotion recognition increases fusion model in capturing emotional feature cues. This 
from 76% (FBP) to 78% (CBP), and 'Sad' improves from fusion leads to meaningful results that help classify 
70% to 72.5%. CBGP provides higher accuracy than all emotion categories more accurately. 
other fusion mechanisms across all emotion categories. 
c. System complexity analysis Table 6: Performance comparison of accuracy and p-value 
for cross-modal fusion mechanism. 
Table 5: Computational costs comparison (in floating 
point operations) for FBP, CBP, and CBGP 
Cross-modal fusion Accuracy p-value 
approaches across CMU-MOSEI Datasets. mechanism  (%) 
Datasets FBP CBP CBGP 
FBP 76.9 0.004 
 
CMU- 4.5 × 106 3.8 × 106 4.0 × 106 CBP 78.4 0.003 
MOSEI 
 CBGP 80.3 0.002 
Table 5 presents the computational cost comparison, and  
highlights the relative efficiency of the FBP, CBP, and 
CBGP approaches on the CMU-MOSEI dataset. Despite Table 6 presents the accuracy and p-value of Full Bilinear 
the apparent efficiency of CBP, the marginal difference Pooling (FBP), Compact Bilinear Pooling (CBP), and 
in computational costs, particularly the 0.2 × 106 FLOP Compact Bilinear Gated Pooling (CBGP), where FBP 
gap between CBP and CBGP, raises questions about the achieves the lowest accuracy of 76.9%. CBP improves 
trade-offs in performance. Lower computational costs accuracy to 78.4% by introducing compact bilinear 
may come at the expense of reduced accuracy or pooling. CBGP achieves the highest accuracy of 80.3% by 
robustness in multimodal emotion recognition tasks. The incorporating the gating mechanism, which selectively 
slight increase in CBGP’s computational load may emphasizes relevant features. The p-value decreases across 
reflect the additional overhead required to manage bi- the methods, indicating improved statistical significance 
modal interactions and graph-based modeling, with increasing accuracy. The values (0.004 for FBP, 0.003 
potentially leading to enhanced performance and for CBP, and 0.002 for CBGP) demonstrate that the 
interpretability. performance improvements are statistically significant. 
  
  
 
72   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
d. Comparison of CBGP fusion mechanism with Traditional fusion: Traditional fusion typically 
attention fusion and transformer fusion  concatenate or aggregate features from multiple 
modalities, which can result in linear combinations of 
Transformer fusion: Transformer fusion is an advanced 
features, whereas attention and transformer fusion 
approach of fusion mechanism with the help of a pre-
trained transformer model, which scales well to large enhance inter-modality interactions by learning feature 
datasets and long sequences due to parallel computations. weights, but they still rely on additive or multiplicative 
This fusion is suitable for text-based emotion recognition relationships between modalities. They often struggle 
tasks and natural language processing-based (NLP) with complex feature interactions and fail to capture 
applications because transformer fusion model such as higher-order dependencies effectively.  
BERT [57], RoBERTa [40] performs on all token 
embeddings parallelly which is not efficient to work with Advanced fusion: Factorized bilinear and compact 
audio and video modalities together. Audio and video have bilinear pooling can capture non-linear and higher-order 
large interdependencies of features and long sequences, as interactions between features across modalities, which 
a result, the computational cost will be very high, training allows richer representations. These methods compress 
and testing will need more memory and computational the high-dimensional feature space into a lower-
burdens. Transformer fusion will also face challenges to dimensional representation while preserving inter-modal 
extract, fuse and learn complex spatiotemporal features relationships, addressing the curse of dimensionality in 
without architectural modifications in the model. traditional bilinear pooling. 
Transformer fusion works by dividing the word sequences 
into tokens, which is feasible but if we divide long audio Computational efficiency 
signals and high frame rate videos can lead to loss of 
important features, fine-grained temporal information, Traditional fusion: Simple concatenation or weighted 
tokenization can reduce the effectiveness and increase the aggregation methods are computationally inexpensive but 
biases in SoftMax function.   may lead to redundant or over-complex representations. 
Transformer-based fusion, although effective, can be 
Attention fusion: In our proposed work, we opted for 
computationally expensive due to quadratic complexity in 
CBGP over attention fusion to reduce the computational 
multi-head attention over long sequences or large 
cost because the CMU-MOSEI dataset is largest dataset, 
modalities. 
and our proposed solution uses 2D-CNN for audio and 
3D-CNN for video modalities to avoid pruning and Advanced fusion: Compact bilinear pooling and gated 
quantizing the mode while managing insignificant 
pooling introduce compact representations by leveraging 
weights and neurons. If we apply an attention fusion 
approximations (e.g., Random Fourier Transform or Count 
mechanism, we would need to apply self-attention fusion 
Sketch). These methods significantly reduce computational 
separately for both models and then integrate their 
outputs using multi-head attention fusion. This entire and memory overhead compared to traditional bilinear 
process would likely result in high dimensionality and an pooling without losing important interaction features. 
increased number of trainable parameters, leading to 
Dimensionality reduction 
high memory usage and expensive computation. 
 Attention mechanism relies on element-wise scale dot Traditional fusion: These methods often rely on post-
products, which may cause high variance during training fusion dimensionality reduction techniques (e.g., PCA) to 
Since our implementation employs a simpler CNN manage high-dimensional outputs. However, these 
architecture, in that case the model could predict approaches are not integrated into the fusion process, 
unbalanced attention scores. The extreme parameters potentially leading to loss of modality-specific 
could further cause exponential computation issues, as information. 
unbalanced attention implies that the model may focus 
excessively on some regions while ignoring others. In Advanced fusion: Methods like compact bilinear and gated 
conclusion, while attention fusion is an effective fusion pooling perform dimensionality reduction implicitly 
mechanism, it is not a suitable fit for our employed deep during fusion, ensuring that only the most relevant and 
learning emotion recognition model that’s why we have informative interactions are preserved. 
excluded it from the experiment. It may perform better 
with architectures such as fit well in ResNet [12], Modality-specific challenges 
DenseNet [58], MobileNet [59], and other transformer-
Traditional Fusion: Early and late fusion assume 
based models, where its capabilities can be better 
utilized. modalities contribute equally, potentially underperforming 
in scenarios where modalities have asymmetric importance 
5.3 Why CBGP outperforms better? or varying quality. Transformers address some modality-
specific issues but may fail in noisy or sparse input 
Representation capacity scenarios without sufficient modality-specific pretraining. 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   73 
 
Advanced fusion: Compact bilinear and gated pooling are and work effectively on smaller datasets due to efficient 
robust to modality-specific variations. For example: Gated feature compression. Factorized approaches reduce 
pooling introduces selective weighting mechanisms that overfitting by limiting parameter count, improving 
dynamically prioritize certain modalities or features based scalability to complex multi-modal systems. 
on their relevance. Factorized pooling ensures that noisy or 
less-relevant features are naturally down-weighted during Interpretability 
fusion. Traditional fusion: Approaches like attention fusion or 
Generalization and scalability transformer-based fusion are somewhat interpretable 
due to explicit weighting schemes or attention maps. 
Traditional fusion: Simple techniques like early and late However, early and hybrid fusion methods lack 
fusion can generalize well but may not scale effectively to interpretability since features are often combined in a 
high-dimensional, multimodal, or diverse datasets. black-box manner. 
Transformer-based fusion can scale better but may require 
large datasets and pretraining to perform effectively. Advanced fusion: Compact bilinear pooling and gated 
pooling methods often lack explicit interpretability 
Advanced fusion: Advanced techniques like compact because the transformations (e.g., random projections, 
bilinear pooling generalize well to high-dimensional data Fourier transforms) are more abstract. 
Table 7: Comparison of FBP, CBP and CBGP based on various parameters. 
Cross- Feature Feature map Computation Advantage Limitation 
Modal interaction dimensionality cost 
Fusion level 
FBP Element-wise Reduced Low Efficient Introduces small 
product 𝑘 ≪ 𝑑2 approximation of approximation errors. 
bilinear pooling 
CBP Tensor Compact Medium Balances efficiency It does not capture the full 
sketching 𝑘 ≪ 𝑑2 and expressiveness bilinear interactions 
 
CBGP Selective Compact Medium Best for fine- Require extensive 
Second order 𝑘 ≪ 𝑑2 grained hyperparameter tuning. 
interaction classification, 
emphasizes key 
features 
Table 7 discusses the performance of FBP, CBP and significantly in audio-video-text based real-time 
CBGP based on various parameters such as, feature applications as well. CBGP is a computationally 
interaction level, feature map dimensionality, effective and robust fusion mechanism, making it crucial 
computational cost, advantage and limitation various to capture high correlation and relevant features for 
parameters. Here,  𝑑2 represents the input feature fusing heterogeneous modalities. Here are some real-
dimensionality, and 𝑘 is the dimensionality of the output time applications where CBGP can be applied in, 
representation in bilinear pooling. In CBP and CBGP, the 
Computer vision and pattern recognition, Natural 
value of 𝑘 is important as it directly affects the trade-off 
language Process based language interactions, 
between computational efficiency and model 
Recommendation systems for customer, Healthcare and 
expressiveness. If  𝑘 is lower than small memory needed 
but model may lose some effectiveness. Conversely, if 𝑘 medical applications, Robotics and automation system, 
is higher, the model acts more expressively but the Banking and E-commerce based digital applications, 
computational cost increases. Security and surveillance based human safety 
 application. 
5.4   Real-time application  
As we have seen in the above sections, CBGP has proven 6   Conclusion & future scope 
to be an effective fusion mechanism over traditional This study investigates the effectiveness of three 
fusion mechanisms. This comprehensive study has advanced cross model fusion mechanisms; factorized 
demonstrated its full capability as cross-modal based bilinear pooling, compact bilinear pooling, and compact 
emotion recognition. In real-time application, CBGP can bilinear gated pooling for audio-video based emotion 
extend beyond audio and video fusion. It can contribute recognition. This comprehensive experiment is 
74   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
conducted on a widely recognized dataset; CMU- experimental results clearly demonstrate that the 
MOSEI. The gating mechanism integrated within CBGP compact bilinear gated pooling (CBGP) mechanism 
enables the model to selectively emphasize relevant outperforms the other fusion techniques across 
feature interactions, which is crucial for accurately benchmark dataset, consistently achieving higher 
recognizing complex and nuanced emotional accuracy, F1-score, and MAE. Overall, the findings from 
expressions. We evaluated the performance of each this study suggest that incorporating a gating mechanism 
fusion technique across various emotional categories, in multimodal fusion processes can significantly 
including happy, sad, fear, anger, neutral and disgust. enhance the performance of emotion recognition 
The performance of advanced cross-modal fusion is systems, making CBGP a promising approach for future 
compared to traditional cross-modal fusion mechanisms developments in this field. 
like early fusion, late fusion and hybrid fusion and  
transformer model based fusion mechanisms like, 
attention fusion and transformer fusion. The 
  
References  
  
 
[1] O. El Hammoumi, F. Benmarrakchi, N. [9] E. S. Salama, R. A. El-Khoribi, M. E. Shoman, and 
Ouherrou, J. El Kafi, and A. El Hore, “Emotion M. A. Wahby Shalaby, “A 3D-convolutional neural 
Recognition in E-learning Systems,” 6th Int. network framework with ensemble learning 
Conf. Multimed. Comput. Syst., pp. 1–6, 2018. techniques for multi-modal emotion recognition,” 
[2] Y. Zhang, Z. R. Wang, and J. Du, “Deep Fusion: Egypt. Informatics J., vol. 22, no. 2, pp. 167–176, 
An Attention Guided Factorized Bilinear 2021, doi: 10.1016/j.eij.2020.07.005. 
Pooling for Audio-video Emotion Recognition,” [10] M. M. Hassan, M. G. R. Alam, M. Z. Uddin, S. 
in International Joint Conference on Neural Huda, A. Almogren, and G. Fortino, “Human 
Networks (IJCNN), IEEE, 2019. doi: emotion recognition using deep belief network 
10.1109/IJCNN.2019.8851942. architecture,” Inf. Fusion, vol. 51, no. October 2018, 
[3] Y. Li, X. Zheng, M. Zhu, J. Mei, Z. Chen, and pp. 10–18, 2019, doi: 10.1016/j.inffus.2018.10.009. 
Y. Tao, “Compact bilinear pooling and multi- [11] L. Wang and J. Qiao, “Research and Application of 
loss network for social media multimodal Deep Belief Network Based on Local Binary 
classification,” Signal, Image Video Process., Pattern and Improved Weight Initialization,” in 3rd 
vol. 18, no. 11, pp. 8403–8412, 2024, doi: International Symposium on Autonomous Systems, 
10.1007/s11760-024-03482-w. ISAS 2019, IEEE, 2019, pp. 1–6. doi: 
[4] D. Kiela, E. Grave, A. Joulin, and T. Mikolov, 10.1109/ISASS.2019.8757780. 
“Efficient large-scale multi-modal [12] K. L. Lakshmi et al., “Recognition of emotions in 
classification,” 32nd AAAI Conf. Artif. Intell. speech using deep CNN and RESNET,” in Soft 
AAAI 2018, pp. 5198–5204, 2018, doi: Computing, Springer Berlin Heidelberg, 2023. doi: 
10.1609/aaai.v32i1.11945. 10.1007/s00500-023-07969-5. 
[5] W. A. Khan, H. ul Qudous, and A. A. Farhan, [13] N. H. Ho, H. J. Yang, S. H. Kim, and G. Lee, 
“Speech emotion recognition using feature “Multimodal Approach of Speech Emotion 
fusion: a hybrid approach to deep learning,” Recognition Using Multi-Level Multi-Head Fusion 
Multimed. Tools Appl., vol. 83, no. 31, pp. Attention-Based Recurrent Neural Network,” in 
75557–75584, 2024, doi: 10.1007/s11042-024- IEEE Access, IEEE, 2020, pp. 61672–61686. doi: 
18316-7. 10.1109/ACCESS.2020.2984368. 
[6] C. Yu, X. Zhao, Q. Zheng, P. Zhang, and X. [14] M. Sharafi, M. Yazdchi, R. Rasti, and F. Nasimi, 
You, “Hierarchical Bilinear Pooling for Fine- “A novel spatio-temporal convolutional neural 
Grained Visual Recognition,” Lect. Notes framework for multimodal emotion recognition,” 
Comput. Sci. (including Subser. Lect. Notes Biomed. Signal Process. Control, vol. 78, no. 
Artif. Intell. Lect. Notes Bioinformatics), vol. June, p. 103970, 2022, doi: 
11220 LNCS, pp. 595–610, 2018, doi: 10.1016/j.bspc.2022.103970. 
10.1007/978-3-030-01270-0_35. [15] G. Sharma, K. Umapathy, and S. Krishnan, 
[7] X. Peng, “Research on emotion recognition “Trends in audio signal feature extraction 
based on deep learning for mental health,” methods,” Appl. Acoust., vol. 158, p. 20, 2020, 
Inform., vol. 45, no. 1, pp. 127–132, 2021, doi: doi: 10.1016/j.apacoust.2019.107020. 
10.31449/inf.v45i1.3424. [16] F. M. Alamgir and M. S. Alam, “Hybrid multi-
[8] B. Mocanu, R. Tapu, and T. Zaharia, “Multimodal modal emotion recognition framework based on 
emotion recognition using cross modal audio-video InceptionV3DenseNet,” Multimed. Tools Appl., 
fusion with attention and deep metric learning,” vol. 82, no. 26, pp. 40375–40402, 2023, doi: 
Image Vis. Comput., vol. 133, p. 104676, 2023, doi: 10.1007/s11042-023-15066-w. 
10.1016/j.imavis.2023.104676. [17] S. K. Panda, A. K. Jena, M. R. Panda, and S. 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   75 
 
Panda, “Speech emotion recognition using emotion recognition: Enhancing film sound 
multimodal feature fusion with machine design using audio features, regression models 
learning approach,” Multimed. Tools Appl., vol. and artificial neural networks,” Pers. Ubiquitous 
82, no. 27, pp. 42763–42781, 2023, doi: Comput., vol. 25, no. 4, pp. 637–650, 2021, doi: 
10.1007/s11042-023-15275-3. 10.1007/s00779-020-01389-0. 
[18] S. W. Byun and S. P. Lee, “A study on a speech [29] V. K. Sharma, “Designing of face recognition 
emotion recognition system with effective system,” Int. Conf. Intell. Comput. Control Syst. 
acoustic features using deep learning ICICCS 2019, pp. 459–461, 2019, doi: 
algorithms,” Appl. Sci., vol. 11, no. 4, pp. 1–15, 10.1109/ICCS45141.2019.9065373. 
2021, doi: 10.3390/app11041890. [30] S. Sahoo and A. Routray, “Emotion recognition 
[19] H. Aouani and Y. Ben Ayed, “Speech Emotion from audio-visual data using rule based decision 
Recognition with deep learning,” in 24th level fusion,” 2016 IEEE Students’ Technol. 
International Conference on Knowledge-Based Symp. TechSym 2016, pp. 7–12, 2017, doi: 
and Intelligent Information & Engineering 10.1109/TechSym.2016.7872646. 
Speech Emotion Recognition with deep learning [31] J. K. J. Julina and T. S. Sharmila, “Facial 
Systems, Elsevier B.V., 2020, pp. 251–260. doi: Emotion Recognition in Videos using HOG and 
10.1016/j.procs.2020.08.027. LBP,” in 2019 4th IEEE International 
[20] E. S. Agung, A. P. Rifai, and T. Wijayanto, Conference on Recent Trends on Electronics, 
“Image-based facial emotion recognition using Information, Communication and Technology, 
convolutional neural network on emognition RTEICT 2019 - Proceedings, IEEE, 2019, pp. 
dataset,” Sci. Rep., vol. 14, no. 1, pp. 1–22, 56–60. doi: 
2024, doi: 10.1038/s41598-024-65276-x. 10.1109/RTEICT46194.2019.9016766. 
[21] Y. Ma, Y. Hao, M. Chen, J. Chen, P. Lu, and A. [32] A. Vinay, V. S. Shekhar, K. N. B. Murthy, and 
Košir, “Audio-Visual Emotion S. Natarajan, “Face Recognition Using Gabor 
Fusion(AVEF):A Deep Efficient Weighted Wavelet Features with PCA and KPCA - A 
Approach,” Inf. Fusion, vol. 46, pp. 184–192, Comparative Study,” in 3rd International 
2019, doi: 10.1016/j.inffus.2018.06.003. Conference on Recent Trends in Computing 
[22] D. Ghimire, J. Lee, Z. N. Li, and S. Jeong, 2015 (ICRTC-2015), Elsevier Masson SAS, 
“Recognition of facial expressions based on 2015, pp. 650–659. doi: 
salient geometric features and support vector 10.1016/j.procs.2015.07.434. 
machines,” Multimed. Tools Appl., vol. 76, no. [33] S. Kakuba, A. Poulose, and D. S. Han, “Deep 
6, pp. 7921–7946, 2017, doi: 10.1007/s11042- Learning Approaches for Bimodal Speech 
016-3428-9. Emotion Recognition: Advancements, 
[23] X. Yan, “A Face Recognition Method for Sports Challenges, and a Multi-Learning Model,” IEEE 
Video Based on Feature Fusion and Residual Access, vol. 11, pp. 113769–113789, 2023, doi: 
Recurrent Neural Network,” Inform., vol. 48, 10.1109/ACCESS.2023.3325037. 
no. 12, pp. 137–152, 2024, doi: [34] X. Lu, “Deep Learning Based Emotion 
10.31449/inf.v48i12.5968. Recognition and Visualization of Figural 
[24] S. R. Sanku and B. Sandhya, “Multi-Modal Representation,” Front. Psychol., vol. 12, no. 
Emotion Recognition Feature Extraction and January, pp. 1–12, 2022, doi: 
Data Fusion Methods Evaluation,” Int. J. Innov. 10.3389/fpsyg.2021.818833. 
Technol. Explor. Eng., vol. 3075, no. 10, pp. 18– [35] M. Zielonka, A. Piastowski, A. Czyżewski, P. 
27, 2024, doi: 10.35940/ijitee.J9968.13100924. Nadachowski, M. Operlejn, and K. Kaczor, 
[25] T. Baltrusaitis, C. Ahuja, and L. P. Morency, “Recognition of Emotions in Speech Using 
“Multimodal Machine Learning: A Survey and Convolutional Neural Networks on Different 
Taxonomy,” IEEE Trans. Pattern Anal. Mach. Datasets,” Electron., vol. 11, no. 22, 2022, doi: 
Intell., vol. 41, no. 2, pp. 423–443, 2019, doi: 10.3390/electronics11223831. 
10.1109/TPAMI.2018.2798607. [36] K. Zhang, Y. Li, J. Wang, Z. Wang, and X. Li, 
[26] E. Ivanova and G. Borzunov, “Optimization of “Feature fusion for multimodal emotion 
machine learning algorithm of emotion recognition based on deep canonical correlation 
recognition in terms of human facial analysis,” IEEE Signal Process. Lett., vol. 28, 
expressions,” Procedia Comput. Sci., vol. 169, no. September 2022, pp. 1898–1902, 2021, doi: 
no. 2019, pp. 244–248, 2020, doi: 10.1109/LSP.2021.3112314. 
10.1016/j.procs.2020.02.143. [37] C. Dixit and S. M. Satapathy, “Deep CNN with 
[27] J. Zhang, Z. Yin, P. Chen, and S. Nichele, late fusion for real time multimodal emotion 
“Emotion recognition using multi-modal data recognition,” Expert Syst. Appl., vol. 240, no. 
and machine learning techniques: A tutorial and November 2023, p. 122579, 2024, doi: 
review,” Inf. Fusion, vol. 59, pp. 103–126, Jul. 10.1016/j.eswa.2023.122579. 
2020, doi: 10.1016/j.inffus.2020.01.011. [38] Y. Cimtay, E. Ekmekcioglu, and S. Caglar-
[28] S. Cunningham, H. Ridley, J. Weinel, and R. Ozhan, “Cross-subject multimodal emotion 
Picking, “Supervised machine learning for audio recognition based on hybrid fusion,” IEEE 
76   Informatica 49 (2025) 61–76                                                                                                                                    H. Kumar et al. 
Access, vol. 8, pp. 168865–168878, 2020, doi: multimodal sentiment analysis,” EMNLP 2017 - 
10.1109/ACCESS.2020.3023871. Conf. Empir. Methods Nat. Lang. Process. 
[39] R. G. Praveen, E. Granger, and P. Cardinal, Proc., pp. 1103–1114, 2017, doi: 
“Cross Attentional Audio-Visual Fusion for 10.18653/v1/d17-1115. 
Dimensional Emotion Recognition,” Proc. - [50] D. Hu, X. Hou, L. Wei, L. Jiang, and Y. Mo, 
2021 16th IEEE Int. Conf. Autom. Face Gesture “Mm-Dfn: Multimodal Dynamic Fusion 
Recognition, FG 2021, 2021, doi: Network for Emotion Recognition in 
10.1109/FG52635.2021.9667055. Conversations,” ICASSP, IEEE Int. Conf. 
[40] D. Sharma, M. Jayabalan, N. Sultanova, J. Acoust. Speech Signal Process. - Proc., vol. 
Mustafina, and D. N. L. Yao, “Multimodal 2022-May, pp. 7037–7041, 2022, doi: 
Emotion Recognition Using Attention-Based 10.1109/ICASSP43922.2022.9747397. 
Model with Language, Audio, and Video [51] A. R. Pathak, M. Pandey, and S. Rautaray, 
Modalities,” Lect. Notes Data Eng. Commun. “Application of Deep Learning for Object 
Technol., vol. 191, pp. 193–210, 2024, doi: Detection,” Procedia Comput. Sci., vol. 132, no. 
10.1007/978-981-97-0293-0_15. Iccids, pp. 1706–1717, 2018, doi: 
[41] Z. Fu et al., “A cross-modal fusion network 10.1016/j.procs.2018.05.144. 
based on self-attention and residual structure for [52] Y. Tian, P. Luo, X. Wang, and X. Tang, 
multimodal emotion recognition,” pp. 2–6, “Pedestrian detection aided by deep learning 
2021, [Online]. Available: semantic tasks,” Proc. IEEE Comput. Soc. Conf. 
http://arxiv.org/abs/2111.02172 Comput. Vis. Pattern Recognit., vol. 07-12-
[42] V. John and Y. Kawanishi, “Audio and Video- June, pp. 5079–5087, 2015, doi: 
based Emotion Recognition using Multimodal 10.1109/CVPR.2015.7299143. 
Transformers,” Proc. - Int. Conf. Pattern [53] A. H. Abdulwahhab, N. T. Mahmood, A. A. 
Recognit., vol. 2022-Augus, no. August, pp. Mohammed, I. Myderrizi, and M. H. Al-Jumaili, 
2582–2588, 2022, doi: “A Review on Medical Image Applications 
10.1109/ICPR56361.2022.9956730. Based on Deep Learning Techniques,” J. Image 
[43] Y. H. H. Tsai, P. P. Liang, A. Zadeh, L. P. Graph. Kingdom), vol. 12, no. 3, pp. 215–227, 
Morency, and R. Salakhutdinov, “Learning 2024, doi: 10.18178/JOIG.12.3.215-227. 
factorized multimodal representations,” 7th Int. [54] V. Adarsh, P. Arun Kumar, V. Lavanya, and G. 
Conf. Learn. Represent. ICLR 2019, 2019. R. Gangadharan, “Fair and Explainable 
[44] P. Kumar, S. Malik, and B. Raman, Depression Detection in Social Media,” Inf. 
“Interpretable multimodal emotion recognition Process. Manag., vol. 60, no. 1, p. 103168, 
using hybrid fusion of speech and image data,” 2023, doi: 10.1016/j.ipm.2022.103168. 
Multimed. Tools Appl., vol. 83, no. 10, pp. [55] H. Zhou, J. Du, Y. Zhang, Q. Wang, Q. F. Liu, 
28373–28394, 2024, doi: 10.1007/s11042-023- and C. H. Lee, “Information Fusion in Attention 
16443-1. Networks Using Adaptive and Multi-Level 
[45] P. P. Liang and R. Salakhutdinov, Factorized Bilinear Pooling for Audio-Visual 
“Computational Modeling of Human Emotion Recognition,” IEEE/ACM Trans. 
Multimodal Language : The MOSEI Dataset and Audio Speech Lang. Process., vol. 29, pp. 2617–
Interpretable Dynamic Fusion,” in Proceedings 2629, 2021, doi: 
of the 56th Annual Meeting of the Association 10.1109/TASLP.2021.3096037. 
for Computational Linguistics, 2018. doi: [56] S. Lee, D. K. Han, and H. Ko, “Multimodal 
10.18653/v1/P18-1208. Emotion Recognition Fusion Analysis Adapting 
[46] T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, BERT with Heterogeneous Feature 
and D. Manocha, “M3ER: Multiplicative Unification,” IEEE Access, vol. 9, pp. 94557–
Multimodal Emotion Recognition using Facial, 94572, 2021, doi: 
Textual, and Speech Cues,” in Proceedings of 10.1109/ACCESS.2021.3092735. 
the AAAI Conference on Artificial Intelligence, [57] S. Siriwardhana, T. Kaluarachchi, M. 
2020, pp. 1359–1367. doi: Billinghurst, and S. Nanayakkara, “Multimodal 
10.1609/aaai.v34i02.5492. emotion recognition with transformer-based self 
[47] N. Singh, N. Singh, and A. Dhall, “Continuous supervised feature fusion,” IEEE Access, vol. 8, 
Multimodal Emotion Recognition Approach for pp. 176274–176285, 2020, doi: 
AVEC 2017,” Comput. Vis. Pattern Recognit., 10.1109/ACCESS.2020.3026823. 
2017, doi: 10.48550/arXiv.1709.05861. [58] M. A. H. Akhand, S. Roy, N. Siddique, M. A. S. 
[48] L. Schoneveld, A. Othmani, and H. Abdelkawy, Kamal, and T. Shimamura, “Facial emotion 
“Leveraging recent advances in deep learning recognition using transfer learning in the deep 
for audio-Visual emotion recognition,” Pattern CNN,” Electron., vol. 10, no. 9, 2021, doi: 
Recognit. Lett., vol. 146, pp. 1–7, 2021, doi: 10.3390/electronics10091036. 
10.1016/j.patrec.2021.03.007. [59] N. A. S. Badrulhisham and N. N. A. Mangshor, 
[49] A. Zadeh, M. Chen, E. Cambria, S. Poria, and L. “Emotion Recognition Using Convolutional 
P. Morency, “Tensor fusion network for Neural Network (CNN),” J. Phys. Conf. Ser., 
Advanced Optimal Cross-Modal Fusion Mechanism for Audio… Informatica 49 (2025) 61–76   77 
 
vol. 1962, no. 1, 2021, doi: 10.1088/1742- 6596/1962/1/012040. 
 
https://doi.org/10.31449/inf.v49i12.7041 Informatica 49 (2025) 77–90 77 
Deep Learning-Based Involution Feature Extraction for Human 
Posture Recognition in Martial Arts 
Desheng Chen1*, Sifang Zhang2* 
1School of Physical Education, Anyang Preschool Education College, Anyang 456150, China 
2Department of Physical Education, Wuhan Sports University, Wuhan 430205, China 
Email: ayysbgscds@126.com, z1234567sf@sina.com 
*Corresponding author 
 
Keywords: human action recognition, deep learning, long- and short-term memory neural networks, lightweight 
networks, feature extraction 
 
Received: August 30, 2024 
With the development of computers in recent years, human body recognition technology has been 
vigorously developed and is widely used in motion analysis, video surveillance and other fields. This 
study is based on deep learning to improve human pose estimation. Firstly, Involution's feature 
extraction network was proposed for lightweight human pose estimation, and this feature extraction 
network was combined with existing human pose estimation models to recognize human pose. Label and 
classify each joint point of the human body separately, add weights to each different part, extract feature 
between joint points at different times, and then input the extracted feature into long short-term memory 
neural networks for recognition. The experimental results show that the improved human pose 
estimation model reduces the parameter and computational complexity by about 40% compared to the 
original model, while also slightly improving accuracy. Comparing the performance of models under 
various algorithms with the proposed model in this study, the accuracy under the Eigen method is 81.3%, 
the accuracy under the STOP method is 82.5%, the accuracy under the DMM&HOG method is 85.3%, 
the accuracy under the Actionlet method is 87.6%, and the accuracy under the JAS&HOG2 method is 
83.5%. The accuracy of the InNet LSTM method is 90.6%. The results indicate that the proposed model 
has good performance and can recognize different martial arts movements. 
Povzetek: Za prepoznavanje človeške drže v borilnih veščinah so porabljene involucijske ekstrakcije 
značilk za globoko učenje.
 
1 Introduction extract and accurately recognize and classify human 
feature. The research is divided into four main sections, 
With the development of computers, artificial 
the first of which is a brief review of other research topics 
intelligence has become increasingly relevant to 
on human recognition. The second part is a review of the 
people's lives. The advent of computer vision enables 
main methods used in this research, and the third part is 
computers to automatically recognize human actions 
the results of the model obtained by applying the methods 
and classify them [1]. Initially human movement 
to the research and analysing the results. The fourth part 
recognition relied on decomposing video frame by 
is a summary of all the above studies and an outlook for 
frame and then acquiring information from it, and then 
future research. 
recognizing human movements through image 
processing. This approach requires manually designing 
motion feature to represent the human body and then 2 Literature review 
modelling the motion feature to achieve the 
recognition effect, but manually acquiring the feature With the development of computers, human body 
requires a lot of time and effort [2]. In this study, a recognition technology has been vigorously developed 
human skeleton network was created by using Deep and is widely used in motion analysis, video surveillance, 
Residual Networks (ResNet) combined with etc. Liu et al. proposed a method for estimating the 3D 
Involution's improved algorithm for feature extraction. pose of a single person in two views without camera 
The human posture at each moment is represented by parameters in order to cope with the problem of needing 
the human skeleton, so that the human posture feature to know the camera parameters to obtain coordinate 
can be quantified by the human skeleton network. The accuracy in the camera's two views. It extracts the joint 
extracted feature is then fed into the Long Short-Term points from two different views through 2D estimation 
Memory (LSTM) neural network for processing and and inputs them into a 3D regression network to generate 
recognition. This model is designed to efficiently  3D joint point coordinates. The coordinates are then 
 combined with a 3D human pose recognition model to 
 identify the human pose. The results of the study 
78 Informatica 49 (2025) 77–90 D. Chen et al. 
 
indicated that this method extracted a high accuracy visible, but the model still has high recognition accuracy. 
rate for human pose action recognition [3]. Ferreira et The model consists of two components, namely a 
al. proposed a skeleton structure and deep semantic cascaded feature network and a graph structure network. 
feature based on human pose estimation network to The results show that the model has excellent recognition 
train a repetition counting and validation system, accuracy [7]. Zhang et al. found that existing 3D human 
which is able to make detection of human activities pose estimation methods focus on overall joint error 
and quickly identify the function of invalid repetition reduction, which leads to large errors in endpoint and 
information. The results show that the system is able to bone length. To address this problem, the group proposed 
accurately identify human movements and remove a human structure-aware network that can extract feature 
invalid repetitive information from them [4]. Liu et al. data from existing 2D joints to repair the positions of 3D 
propose a new elliptical distribution coding method in joint points. The results show that this method can 
order to help computers to accurately identify human effectively reduce the error between endpoint and bone 
movements. The method first describes the human length, resulting in a high improvement in recognition 
skeleton by elliptical Gaussian coordinate coding, then accuracy [8]. 
measures the difference between the predicted heat Ht et al. found that traditional human action 
map and the ground truth heat map, and finally the recognition uses manual feature from traditional 
human pose images for recognition. The results of the classifiers and is unable to make recognition of complex 
study show that the method has a good performance in human actions using advanced spatio-temporal feature. 
both datasets of the experiment and can provide high To address this problem, the research team proposed a 
recognition accuracy [5]. Vishwakarma proposes a coding technique that converts poses into feature images, 
method for recognizing human actions in videos that extracts high-level feature from the feature images and 
can be identified by deterministic actions, which uses a feeds them into a feature recognition system for 
double transform of wavelets to perform feature recognition. The results show that the method is able to 
extraction of human actions. The extracted feature is recognize human actions with high recognition accuracy 
then recognized. The results show that the method has [9]. Silva and Marana argue that existing human pose 
high recognition accuracy in different datasets [6]. extraction uses straight lines to represent body parts in a 
Tian et al. argue that the key points of the human body two-dimensional human model. The team proposes an 
under many images in the video may produce improved method based on existing human pose 
unreasonable prediction results from the human pose extraction, which maps each segment of a 2D pose to a 
estimation method due to issues such as illumination, point to extract spatial feature. The results of the study 
occlusion, etc. To address this problem, the team indicate that the method is effective in improving the 
designed a new generative adversarial network to recognition rate [10]. 
address the situation where some keypoints are not 
 
Table 1: Literature review 
Performance 
Study Method Application Key Findings References 
Comparison 
Extracts joint 
Dual-view 3D pose Human pose coordinates from High accuracy in 
Liu L et al. estimation without estimation in dual dual 2D images, human pose [3] 
camera parameters views inputs to 3D recognition 
regression network 
Accurate 
Skeleton and deep Human activity Detects activities and recognition with 
Ferreira B et 
semantic feature detection and removes redundant effective [4] 
al. 
training system filtering repetitions redundancy 
filtering 
Uses heatmap 
Action High recognition 
Elliptical Gaussian differences for 
Liu H et al. recognition in accuracy on [5] 
coordinate encoding precise pose 
skeletal models various datasets 
identification 
Vishwakarma Dual-wavelet Human action Extracts motion Consistently high [6] 
Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 79 
 
D K transformation recognition in feature using wavelet accuracy across 
videos transform datasets 
Cascade and 
High accuracy 
Generative graph-based 
Pose estimation even with 
Tian L et al. Adversarial Network networks handle [7] 
with occlusion occluded 
(GAN) lighting and 
keypoints 
occlusion 
Enhanced 
3D joint Reduces endpoint 
Structure-aware accuracy with 
Zhang X et al. correction in and bone length [8] 
network reduced joint 
skeletal models errors 
errors 
Pose encoding to Converts pose to 
Complex human High accuracy in 
feature images for feature images for 
Ht A et al. behavior complex activity [9] 
high-level feature advanced feature 
recognition recognition 
extraction recognition 
Spatial feature 
2D human pose 
extraction from Maps 2D segments to Improved 
Silva V et al. representation [10] 
mapped pose extract spatial feature recognition rates 
improvement 
segments 
 
 
In summary shown in Table 1, many scholars 3 Martial arts movement recognition 
have conducted research in the field of human pose 
recognition and achieved significant results, but there based on human posture estimation 
are still some limitations. Firstly, many methods rely With the development of the Internet, human body 
on multi view inputs or high-quality data, and the recognition technology has been vigorously developed 
recognition accuracy may decrease in single view or and is widely used in motion analysis, video surveillance 
complex backgrounds. Secondly, encoding methods and other fields. In this study, Involution's feature 
based on skeleton or feature images have limited extraction network is first proposed for lightweight 
performance in dealing with large occlusions or human pose estimation, which is combined with existing 
complex non repetitive actions. Some methods have human pose estimation models to recognize human pose. 
high computational complexity and are not The extracted feature is then fed into a longand short term 
user-friendly for real-time applications, and models memory neural network. 
such as generative adversarial networks rely heavily on 
 
training data, increasing the complexity of model 
construction and training. In addition, information loss 3.1 Involution feature extraction network 
during the encoding process may affect recognition based human pose recognition 
performance, especially in situations where there are In the field of computer imaging, the main indicator of 
rich pose details or diverse pose changes, limiting the the strength of a neural network's performance is the 
applicability and accuracy of these methods. The deep strength of its feature extraction performance. By 
residual network combined with the improved analysing the existing convolutional kernels, two 
algorithm of Involution is used for feature extraction, drawbacks are found, one is that the perceptual field is 
creating a human skeleton network to recognize and difficult to capture feature dependencies over long 
classify human actions. distances due to the limitation of the convolutional kernel 
 size.  
  
  
  
  
  
 
79 
80 Informatica 49 (2025) 77–90 D. Chen et al. 
 
The other is that the information between channels is of channels. Channel invariance allows neural networks 
rather complex and redundant. To solve this problem, to share in terms of channel dimensions, thus solving the 
this research proposes a new neural network operator problem of complex redundancy of information between 
Involution to assist feature extraction [11]. Involution channels. The main function of Involution is the 
is spatially specific, spatially specific in that it reallocation of arithmetic power, which allows the 
increases the receptive field by increasing the size of computer to perform optimally. 
the convolution kernel, and channel invariant in terms 

H
 
11K 2
11C
C
K K 1
K K C
Q
Figure 1: Generate convolutional kernel diagram 
 
Figure 1 shows the process of generating a 
convolution kernel by Involution. Firstly, a x x
multi-channel feature map is input and the feature 
vector of any point in the feature map is selected. Weight layer Weight layer
Multiplying this kernel with the feature vector adjacent 
to the point gives the K K C  feature map, and relu x
relu
finally the K K C  kernel is superimposed to  
obtain the final output feature map, with Involution Weight layer Weight layer
generating different kernels for different locations and 
F(x) relu F(x)
sharing a single kernel at the same location on the 
relu
channel [12]. The traditional Convolution kernel H(x)=F(x) H(x)=F(x)+x
counts and Involution counts are shown in Equation 
(a)General neural network (b)ResNet
(1). 
Figure 2: Neuron learning feature map 
11C0 Ci K K  
  (1) 
H QK K G Figure 2(a) shows the process of learning feature in 
the fully-connected layer of a general neural network, 
which can be seen to be learning directly on the mapping 
In Equation (1), 11  denotes the convolution 
between input and output. Figure 2(b) represents the 
kernel shared at H Q  pixel points, C0  denotes 
process of learning feature in the fully-connected layer of 
the number of channels in the output, Ci  denotes the 
ResNet, which can be seen to learn the residuals between 
number of channels in the input, K  denotes the size 
the input and output. The InNet unit still has the same 
of the convolution kernel, and G  denotes the number 
structure as the ResNet unit, with three convolutional 
of groupings. The number of channels is usually larger, 
layers in series, the first layer still reduces the dimension 
the number of groups is usually much smaller than the 
of the input channel, and the second layer uses the 
number of channels, and the size of the Involution 
convolutional kernel generated by Involution to replace 
convolution kernel does not have a number of channels, 
the original the second layer uses the convolution kernel 
so the ability to capture long distance feature can be 
generated by Involution to replace the original 
enhanced by increasing the size of the convolution 
convolution kernel. The third layer is to expand the 
kernel. Involution is able to increase the accuracy of 
reduced-dimensional feature to the desired size. This 
the model by this method while reducing the number 
improvement improves the feature extraction capability 
of model parameters and the amount of computation 
of InNet and also reduces the number of parameters and 
[13]. 
computational effort [14]. Convolution and Involution are 
This research uses a deep residual network 
shown in Equation (2). 
combined with an Involution modified algorithm for 
feature extraction. The neuron learning feature maps of 
the general neural network and ResNet are shown in 
Figure 2. 
Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 81 
 
Equation (2) shows the number of parameters for 
K 2C2
Convolution and Involution and the amount of calculation 

C 2
 + K 2GC for Convolution and Involution. Where H  is the height 
 of the input feature map, Q  is the width of the input 
r
  (2) feature map, C  is the number of input feature map 
HQK 2C 2
channels, and r  is the channel reduction ratio. The 
 C 2 + K 2GC
 Involution Pose Estimation Net (IPEN) is used as the 
HQ + HQK 2C
 r basis for extracting feature for the convolution kernel by 
Involution as shown in Figure 3. 
 
onv2Deconv3
Conv1 Layer1 Deconv1Dec
Layer2
Layer3
Final 
Layer
 
512
256
128 256 256 256
3 20
64 64
Figure 3: Convolutional kernel feature extraction graph 
 
As shown in Figure 3, firstly, the input is a 3.2 Research on martial arts movement 
3-channel image, and after passing through the first 
recognition based on human posture 
convolutional layer Conv1, the number of channels 
Since traditional neural networks often fail to 
increases to 64. Next, after passing through three 
achieve the desired results when processing data with 
consecutive convolutional layers, Layer1, Layer2, and 
temporal information such as video and audio, Recurrent 
Layer3, the number of channels in the feature map 
Neural Network (RNN) was introduced to process the 
increases to 128, 256, and 512, respectively. After 
data. Recurrent Neural Networks are capable of 
completing the three convolutional layers, it enters the 
outputting information that is dependent on both present 
deconvolution stage (Deconv1, Deconv2, and 
input and historical records. The structure of an RNN is 
Deconv3). In the deconvolution stage, each 
shown in Figure 4. 
deconvolution layer gradually reduces the number of 
channels in the feature map from 512 to 256, resulting 
in a final output of 20 channels. The human pose Ot O1 O O
2 3
recognition network uses InNet as the feature 
V V V V
extraction network of the recognition network, and 
after expanding it by ordinary convolutional layers, A A W A W A W  
Involution is used to extract feature information from tanh W tanh tanh tanh
the image, and the nodes are obtained by three 
U U U U
convolutional layers that act as regressors. The metrics It I1 I2 I3
used to evaluate the model are Object Keypoint 
Similarity metrics, as shown in Equation (3). Figure 4: Structure diagram of recurrent neural network 
 
 exp−d 2 2S 2 2
l pl p I  (vpl =1) Figure 4 represents the structure of an RNN, where 
Oksp =  (3) 
  (v A  represents a single neural network unit, O  
t
l pi =1)
represents the output at the time point, and I  represents 
t
the input at the time point. U, V and W represent the 
In Equation (3), p  represents the person ID, l  
different network weights respectively. The Long Short 
represents the number of keypoints S  represents the 
p Term Memory Neural Network is an improvement on the 
current person's scale factor, v  represents whether 
pl RNN, which can process time series like the RNN and 
the l th key point of the p th person is observable, 
has a similar structure to the RNN, but the recurrent 
d  represents the rated Euclidean distance between 
pl structure in the LSTM network is not the same as that of 
each person and each person's predicted joint point, 
the RNN. The recurrent structure consists of three parts 
  represents the normalisation factor for the I th 
I respectively three gate structures, one unit state and four 
skeletal point, and   represents the function that 
neural network layers [16]. The structure of the LSTM 
calculates the visible point [15]. 
neural network is shown in Figure 5. 
81 
82 Informatica 49 (2025) 77–90 D. Chen et al. 
 
Ct−1 Ct
× +
tanh
×
×
A A
 
sigmoid
sigmoid tanh sigmoid
h h
t−1 ht t+1
X t−1 X t X t+1
Figure 5: LSTM neural network structure diagram 
 
As can be seen from Figure 5, the entire recurrent articulation points, (x, y, z)  represents the coordinates 
structure consists of a short-term memory module, a of the articulation points, and j  represents the j  th 
current memory module and a long-term memory articulation point in the skeleton at t . The state of the 
module. In the current memory module, there are four human skeleton at each moment is coded into a network, 
neural network layers, three of which are single-layer and the skeleton joints at each moment change with time 
sigmoid feed forward neural networks and one is a [18]. Define the interaction network of articulation points 
single-layer tanh feed forward neural network, and the at different moments in time as shown in Equation (5). 
LSTM neural network is mainly used to filter the 
SANt = (Vt , E )  
feature information and determine the retention status t (5) 
through three gate structures: input gate, output gate In Equation (5), V  denotes the set of vertices in the 
and forgetting gate. Each gate structure is composed of t
network at the moment of t  and E  denotes the set of 
a vector operation and a sigmoid neural network layer. t
edges in the network at the moment of t . For the 
Classify human joints using a human pose recognition skeleton state at the same moment, the joints are 
network combined with human joint data, as shown in connected to each other and the relationship between the 
Figure 6. joints is expressed by calculating the Euclidean distance 
between each joint as shown in Equation (6). 
d (i, j) = (x − 2 2
i x j ) + (yi − y j ) + (zi − z j )
2  (6) 
In Equation (6), i  is any one of the joints except 
j . Since the human body completes the action, it is not 
determined by individual joints, but by the overall 
coordination of the human body, just using the Euclidean 
distance cannot highlight the relationship between each 
joint well, so the human body is divided into different 
five parts, and different weight coefficients are set 
 according to different parts as shown in Equation (7). 
Figure 6: Skeleton division diagram of the human body 
d (i, j)a1,1 i, j  4
 
d (i, j)a2 ,5  i, j  8
Figure 6 shows the division of the human skeleton. 
w(i, j)d (i, j)a ,9  i, j 12  (7) 
The importance of different bones to the human body 3

varies. If bone joints are divided into 20 joints d (i, j)a4 ,13  i, j 16

according to their importance, a sequence of human d (i, j)a5 ,17  i, j  20
posture can be represented by equation (4) [17]. 
In Equation (7), a  and a  represent the weight 
1 2
 coefficients of the left and right arms, a  and a  
S = K1, K 3 4
2 , , Kt
  represent the weight coefficients of the left and right leg 
K j (4) 
 t = (x j , y j , z j ),1 j  M parts, a  represents the weight coefficient of the torso 
5
part. After the skeleton nodes were constructed, the 
In Equation (4), S  represents the sequence of feature information of the image was extracted by CNN 
human skeletal articulation points, K  represents the local convolution. The extracted feature data is then fed 
t
skeleton at t , M  represents the number of into the LSTM for processing, and the feature data is 
Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 83 
 
filtered and judged by various gates. Each LSTM cell vector of the network nodes is analysed as shown in 
has an input gate, an output gate and an oblivion gate, Equation (13). 
and the input gate is calculated as shown in Equation 
(8). N
ECi = c A (
j ij EC  
= i 13) 
1
it = g(W
xi xt +W
hi h  
t−1 +bi ) (8) In Equation (13), EC  represents the eigenvector 
i
centrality and sets its initial value to 1,  represents the 
Equation (8) represents the input gate, x  adjacency matrix in the network, A  represents the 
t ij
represents the input value of the network at the current connection between nodes i  and j , and the initial 
time and h  represents the output value of the vector of EC  is cyclically multiplied with A  to obtain 
t−1 i
network at the previous time. b  denotes the input the value of EC . The stability of the network is usually 
i i
gate constant parameter. The output gate is calculated assessed by the average degree as shown in Equation (14) 
as shown in Equation (9). [20-21]. 
ft = g(W N
x f xt +W
h f h  
t−1 +b f ) (9)  K
K = i=1 i
 (14) 
i
N
Equation (9) represents the output gate, x  
t
represents the input value of the network at the current In Equation (14), K  represents the weighted 
moment, and b  represents the output gate constant i
i degree of the node i . The topological properties of the 
parameter. The formula for the forgetting gate is network nodes are combined with the topological 
shown in Equation (10) [19]. properties of the network to represent the entire skeleton 
of the action network. A sample skeleton of all actions is 
ot = g(W
xo xt +W
ho h  
t−1 +bo ) (10) shown in Equation (15). 
Equation (10) represents the forgetting gate and Yinput = [1,2 ,3 , ,u−1,u ]  (15) 
h  represents the output value of the network at the 
t−1
previous moment. b  denotes the constant parameter 
o In Equation (15), Y  denotes the input to the 
input
of the forgetting gate. In the IPN recognition technique LSTM and u  denotes the number of samples.   
for the skeleton, the human skeleton at each moment is u
denotes the feature vector in u . The samples are 
encoded as a network, and the weights of the edges are classified by this method to identify human actions. The 
calculated based on the distance between any two process of this model is as follows. Firstly, the Involution 
joints in the network as shown in Equation (11). operation dynamically generates convolutional kernels 
that adapt to feature maps, enhancing the ability to 
d (i, j) =1 (xi − x j )
2 + (yi − y 2 2 capture long-distance feature and reducing information 
j ) + (zi − z j )  (11) 
redundancy between channels. By combining this 
In Equation (11), i  and j  denote the nodes in network structure with a deep residual network, an 
the network and improved InNet was formed, which can efficiently extract 
(x, y, z)  denotes the 3D coordinates 
of the node, it can be seen that the weights are feature and reduce the number of model parameters. 
expressed as the reciprocal of the Euclidean distance. Subsequently, LSTM was used to process time series data 
In order to represent the transformation of the nodes in and analyze the dynamic changes of human joint points. 
the network in time, metrics such as proximity Joint points are classified according to their importance, 
centrality are introduced for evaluation, as shown in and Euclidean distance is calculated to describe the 
Equation (12). relationship between joints. The sensitivity of action 
recognition is improved by setting weights for different 
parts. 
N −1
CCi =  (12) 
 d (i, j)
jU , ji 4  Performance analysis of martial 
In Equation (12), N  is the number of nodes in arts movement recognition based on 
the network and U  is the set of all nodes in the human posture estimation 
network. Proximity centrality indicates how close the 
The first section of this chapter analyses Involution's 
node is to each of the other nodes in the network; the 
downsampling capability and then analyses the accuracy 
closer the node is, the greater its closeness centrality, 
of the model under different dataset sizes to determine the 
but the same node will change over time and its 
best data size to calculate its feature extraction time. The 
centrality will change as well. The eigencentricity 
second section provides an analysis of the introduction of 
83 
84 Informatica 49 (2025) 77–90 D. Chen et al. 
 
LSTM networks to compare the models under different Method Input size Param FLOPs 
algorithms. 256 x 192 28.4M 7.2G 
 ResNet-Q32 
384 x 288 28.4M 16.5G 
4.1 Performance analysis of human pose 256 x 192 63.9M 14.7G 
recognition based on involution ResNet-Q48 
384 x 288 63.9M 32.5G 
feature extraction network 256 x 192 17.1M 4.7G 
To verify the performance of this feature extraction InNet-Q32 
384 x 288 17.1M 10.1G 
network using InNet as the recognition network, InNet 
256 x 192 38.7M 7.9G 
was compared with ResNet. The CPU used in this InNet-Q48 
experiment is Intel(R) Xeon® Gold6226@2.7GHz, the 384 x 288 38.7M 20.4G 
GPU used is NVIDIA GeForce Tesla V100S, and the  
memory is 32 GB. The learning rate of the model is set In Table 2, Q32 indicates that the number of 
to 0.001 and decays by 0.1 every 10 epochs to channels for each convolutional layer is set to 32, and 
gradually reduce the learning rate. The batch size is 32 Q48 indicates that the number of channels for each 
to ensure efficient memory usage during the training convolutional layer is set to 48. Table 2shows the table of 
process. The optimizer uses Adam because of its Involution's degree-reducing capacity, InNet for using 
adaptive learning rate feature, which can handle sparse Involution instead of Convolution, from the table it can 
gradient problems. The loss function uses cross be seen that ResNet's Param is 28.4M and 63.9M, InNet's 
entropy loss, which is suitable for multi class Param is 17.1M and 38.7M, ResNet under different 
classification tasks. Using L2 regularization, the methods, different sizes of The FLOPs of different sizes 
weight decay parameter is set to 0.0001 to reduce the for ResNet were 7.2G, 16.5G, 14.7G and 32.5G, 
risk of overfitting. In terms of data augmentation, respectively, and the FLOPs of different sizes for InNet 
methods such as random cropping, rotation, and were 4.7G, 10.1G, 7.9G and 20.4G, respectively, under 
translation are applied during training to improve the different methods. The experimental results indicated that 
model's generalization ability. In terms of feature the InNet method using Involution instead of Convolution 
extraction network, the number of layers in the reduced the number of parameters and computation by 
Involution network is set to 5, and the number of about 40%, indicating that Involution has good capability 
channels is set to 128 to evaluate performance. In of reducing parameters. Compare the computational 
terms of LSTM configuration, the number of units is complexity of different methods. 
set to 256 to better capture time series feature. The As shown in Table 3. InNet reduces its dependence 
training cycle is set to 100 epochs, using 20% of the on large convolution kernels through Involution, while 
data as the validation set to monitor model ResNet relies on deep residual structures, and LSTM uses 
performance and prevent overfitting. When the number recursive structures to process time series. InNet has 
of layers in the network is small, Involution has less relatively low memory usage because it uses smaller 
compression power, but the accuracy is improved. As feature maps, while ResNet requires more memory due to 
the number of layers increases, Involution has a good its deep structure. LSTM also increases memory 
improvement in compression, but with some loss of requirements when processing long sequences. The 
accuracy. latency of InNet is moderate, influenced by input size and 
 sequence length. ResNet and LSTM can cause high 
 latency when processing large inputs or long sequences. 
Table 2: Argument reduction capability of revolution 
 
Table 3: Comparison of computational complexity 
Model Processing Flow Memory Usage Latency 
Utilizes Involution instead of convolution for Low to moderate, depending Moderate, influenced by 
InNet feature extraction, followed by LSTM for on feature map size and input feature map size and 
sequence analysis number of channels time steps 
Employs multiple residual blocks for feature 
High, especially in deeper High, particularly when 
ResNet extraction, followed by fully connected 
networks processing large input sizes 
layers for classification 
High, due to the need to store High, especially with long 
Uses a recurrent structure to handle sequence 
LSTM hidden states and input sequences and multiple 
data 
sequences feature 
Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 85 
 
1.00 1.00
0.75 0.75 ResNet
InNet
ResNet
0.50 InNet 0.50
0.25 0.25
0 0
0 100 200 300 400 500 0 10 20 30 40 50
Dataset size Iterations
(a)The relationship between dataset size and accuracy (b)The relationship between the number of iterations and accuracy
 
Figure 7: Model accuracy of RseNet and InNet 
 
As can be seen from Figure 7(a), the extraction its accuracy, its training time and recognition time is still 
performance of both methods is better when the dataset an important indicator as shown in Figure 8. 
is larger and contains more species. Since the number Figure 8(a) shows the change in model performance 
of Involution parameters and the amount of for both methods as the training time increases. It can be 
computation in InNet is less compared to that of the seen that the training time for InNet is a little longer than 
traditional Convolution in ResNet, the accuracy of that for ResNet, the situation is due to the fact that InNet 
InNet is still increasing when the size of the dataset uses a larger dataset during training and only a large 
reaches a certain amount, ResNet has levelled off. enough dataset can satisfy InNet to allow it to train to 
From Figure 7(b), it can be seen that with the selected achieve the best performance. Figure 8(b) shows the 
dataset size, InNet has been able to achieve the best change in model accuracy as the recognition time 
recognition performance with a small number of increases for both methods. It can be seen that InNet is 
iterations, and ResNet has not yet achieved the best able to use a small amount of time to achieve the best 
performance with the number of iterations where recognition accuracy on images when recognizing. The 
InNet's performance has reached its best, and reaches a results of the study indicate that the training time for 
point where when the performance no longer changes, InNet is slightly longer but within acceptable limits and 
it is still lower than InNet's performance. It can be seen that the overall performance of InNet is better than that of 
that InNet has good performance in feature extraction. ResNet. 
Judging the goodness of a model cannot only focus on 
1.00 1.00
0.75 0.75
ResNet
ResNet InNet
0.50 InNet 0.50
0.25 0.25
0 0
0 5 10 15 20 25 0 1 2 3 4 5
Training time(s) Recognition Time(s)
(a)The relationship between training time and accuracy (b)The relationship between recognition time and accuracy
 
Figure 8: Analysis of training time and recognition time for two models 
 
85 
Accuracy
Accuracy
Accuracy
Accuracy
86 Informatica 49 (2025) 77–90 D. Chen et al. 
 
1.0 1.0
0.8 0.8
0.6 0.6
0.4 0.4
0.2 ResNet 0.2 ResNet
InNet InNet
0 0
0 0.2 0.4 0.6 0.8 1.0 0 0.2 0.4 0.6 0.8 1.0
Limb weight coefficient Body weight coefficient
(a)Recognition accuracy under different methods (b)Recognition accuracy under different methods 
of changing Limb weight coefficient of changing torso weight coefficients
1.0 P<0.01 InNet
P<0.01 ResNet
P<0.01
0.8 P<0.01
P<0.01
0.6
0.4
0.2
0
Con 1 Con 2 Con 3 Con 4 Con 5
(c)Accuracy under different configurations
 
Figure 9: Model accuracy under different weight coefficients and configurations 
 
4.2 Performance testing of a human coefficients and stabilises when the limb weighting 
posture-based martial arts coefficients reach 0.8. When the weighting factor of the 
limbs was changed, the change in accuracy was minimal 
movement recognition model 
when the weighting factor of the torso was changed. The 
A selection of martial arts moves is identified, 
experimental results show that the influence of the limbs 
including the lunge punch, punch and pop kick, horse 
on the accuracy is greater than the influence of the torso 
stance punch, horse stance frame punch and top stomp 
on the accuracy. Figure 9 (c) Accuracy rates for five 
kick. The five movements are renamed as Movement 1, 
different weighting factors not chosen named 
Movement 2, Movement 3, Movement 4 and 
configurations 1 to 5 respectively, at different weighting 
Movement 5 respectively. The weighting coefficients 
factors. Configuration 1 has a torso weight coefficient of 
for the left and right arms, the left and right legs and 
0.2 and limbs weight coefficient of 0.8, Configuration 2 
the torso were assigned to compare the influence of 
has a torso weight coefficient of 0.6 and limbs weight 
each part of the skeleton on the recognition system. 
coefficient of 0.8, Configuration 3 has a torso weight 
This is shown in Figure 9. 
coefficient of 0.6 and limbs weight coefficient of 1.0. 
Figure 9 (a) shows the recognition accuracy for 
Configuration 4 has a torso weight coefficient of 0.4 and 
different methods with a value of 1 for the torso 
limbs weight coefficient of 0.8. Configuration 5 has a 
weighting factor a  and changing the weighting 
3 torso weight coefficient of 0.4 and limbs weight 
factor for the extremities. Figure 9(b) shows the 
coefficient of 0.6. It can be seen that the accuracy of 
recognition accuracy for different methods of varying 
recognition is maximised when the torso weighting factor 
the torso weighting coefficients when the values of the 
is 0.6 and the limb weighting factor is 0.8(P<0.01). A 
limb weighting coefficients a  and a  are set to 1. 
1 2 comparison of recognition for different actions at 
It can be seen that when the limb weighting 
different weighting factors is shown in Figure 10. 
coefficients are changed, the accuracy rate increases 
significantly with the increase of limb weighting 
Accuracy
Accuracy
Accuracy
Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 87 
 
Action 1 Action 2 Action 3
Action 4 Action 5
100
90
80
70
Config 1 Config 2 Config 3 Config 4 Config 5  
Figure 10: Accuracy of five different actions under five weight coefficients 
 
Table 4: Recognition accuracy of different algorithms on datasets 
Method Eigen STOP DMM & HOG Actionlet JAS & HOG2 InNet-LSTM 
Data 1 
81.3 82.5 85.3 87.6 83.5 90.6 
Accuracy (%) 
Data2 
76.5 81.4 86.2 88.6 87.9 93.6 
Accuracy (%) 
 
From Figure 10, the accuracy of five different DMM&HOG method is 85.3%, Actionlet method is 
actions with five weighting factors, it can be seen that 87.6%, and JAS&HOG2 method is 83.5%. The accuracy 
the action with weighting configuration 2 has the of the InNet LSTM method is 90.6%. In dataset 2, the 
highest average accuracy and has a more stable accuracy of each method is higher than in dataset 1. The 
performance. The other weighting configurations all accuracy of the proposed methods in this study was 
have large fluctuations in accuracy and show unstable higher than the other methods. To further validate the 
performance. Considering both stability and accuracy, accuracy of the InNet-LSTM method, the accuracy of the 
the weights used to construct the human skeleton were different methods was verified under different dataset 
set to a torso weighting factor of 0.6 and an extremity sizes. This is shown in Figure (11). 
weighting factor of 0.8. Different methods were  
introduced to compare with the method used in this 
study, and the MSR Action 3D dataset was chosen for DMM & HOG
100
InNet-LSTM
this experiment. This dataset contains 6000 images Eigen
Actionlet
specifically designed for human action recognition 95 JAS & HOG2
tasks, covering multiple explicit action categories STOP
including walking, running, jumping, and sitting. The 90
image size of each sample is 640x480 pixels, ensuring 
clarity and detail. The sample distribution of these 85
action categories is uneven, with more samples for 
walking and running, and relatively fewer samples for 80
jumping and sitting, which may affect the training 
effectiveness and performance of the model. Each 75
1 2 3 4 5 6 7 8 9 10
image is equipped with clear labels to indicate the Relative size of dataset  
corresponding action category, ensuring the accuracy Figure 11 Accuracy of different methods under different 
of the training data. In addition, the dataset generates 
dataset sizes 
additional samples through data augmentation 
techniques, including random rotation, flipping, and  
scaling, to enhance the model's generalization ability. The accuracy of InNet-LSTM is lower than that of 
Different algorithms were used to divide the dataset Actionlet method when the dataset is small, but when the 
into Dataset 1 and Dataset 2, and the recognition dataset increases to a certain level, the accuracy of 
accuracy on different datasets is shown in Table 4. InNet-LSTM is greater than that of other methods. 
According to Table 4, in dataset 1, the accuracy  
of Eigen method is 81.3%, STOP method is 82.5%, 
87 
Accuracy
Similarity
88 Informatica 49 (2025) 77–90 D. Chen et al. 
 
a high performance. However, there are still shortcomings 
5 Discussion in this study. When constructing a human skeleton model, 
the weights between the joints are determined by the 
Human motion recognition relies on video frame by distance between the joints, and the evaluation indicators 
frame decomposition and manually designing motion are too single. And the research was conducted in a 
feature to achieve recognition. The martial arts action laboratory environment. Future research is considering 
recognition system based on Involution feature using more indicators to construct human skeleton models 
extraction network and LSTM proposed in the study and applying them to practical applications to test the 
optimizes recognition accuracy and efficiency by performance of the models. 
reducing the computational complexity of traditional 
convolutional networks. The experimental results show 
that compared with traditional convolutional networks References 
such as ResNet, Involution significantly improves 
accuracy while reducing the number of parameters, [1] S. Yan, Y. Xiong, D. Lin. Spatial temporal graph 
especially on datasets of different sizes, with an convolutional networks for skeleton-based action 
average increase of 5% in object keypoint similarity recognition. 2018, 32(1): 56-72. 
and 8% in accuracy in the test set. This is due to the https://doi.org/10.1609/aaai.v32i1.12328. 
advantage of LSTM in time series modeling, which [2] W. Luo, W. Liu, S. Gao. Normal graph: Spatial 
enables the system to better understand the dynamic temporal graph convolutional networks-based 
changes in action sequences, especially achieving an prediction network for skeleton based video 
accuracy gain of about 15% in complex martial arts anomaly detection. Neurocomputing, 2021, 444(15): 
action recognition. The innovation of InNet LSTM lies 332-337. 
in using Involution instead of traditional convolution https://doi.org/10.1016/j.neucom.2020.08.085. 
to achieve lightweight and efficient feature extraction, [3] L. Liu, L. Yang, W. Chen, X Gao. Dual-View 3D 
and combining LSTM for temporal modeling to human pose estimation without camera parameters 
capture motion dynamics. This method outperforms for action recognition. IET Image Processing, 2021, 
ResNet in accuracy, resource utilization, and 15(14): 3433-3440. 
computation time, and is suitable for martial arts action https://doi.org/10.1049/ipr2.12277. 
recognition and other dynamic scenarios. It has broad [4] B. Ferreira, P. M. Ferreira, G. Pinheiro, N. 
applicability and efficient real-time processing Figueiredo, F. Carvalho, P. Menezes, J. Batista. 
capabilities. However, there are still limitations when Deep learning approaches for workout repetition 
dealing with unstructured random actions. Due to the counting and validation. Pattern Recognition Letters, 
limitations of existing equipment, higher performance 2021, 151(12):259-266. 
hardware can be introduced in the future to optimize https://doi.org/10.1016/j.patrec.2021.09.015 
training speed and expand the dataset size to enhance [5] H. Liu, Y. Chen, W. Zhao, S. Zhang, Z. Zhang. 
the system's generalization ability. Human pose recognition via adaptive distribution 
encoding for action perception in the self-regulated 
learning process. Infrared Physics and Technology, 
6 Conclusion 2021, 114(5): 1036-1045. 
https://doi.org/10.1016/j.infrared.2021.103660. 
In response to the problem of manually designing [6] D. K. Vishwakarma. A two-fold transformation 
motion feature for recognition, which consumes model for human action recognition using decisive 
energy and has very low recognition efficiency, pose. Cognitive Systems Research, 2020, 61(6): 
research is conducted on improving human pose 1-13. https://doi.org/10.1016/j.cogsys.2019.12.001. 
estimation based on deep learning. Firstly, Involution [7] L. Tian, G. Liang, P. Wang, C. Shen. An adversarial 
is proposed as a feature extraction network for light human pose estimation network injected with graph 
weighting of human pose estimation, and each joint structure. Pattern Recognition, 2021, 115(2):31-40. 
point of the human body is labelled and classified https://doi.org/10.1016/j.patcog.2021.107863. 
separately. The experimental results show that the 
[8] X. Zhang, Z. Tang, J. Hou, Y. Hao. 3D human pose 
InNet method, which uses Involution instead of 
estimation via human structure-aware fully 
Convolution, decreases the number of parameters and connected network. Pattern Recognition Letters, 
the computational effort by about 40%. Comparing this 2019, 125(5): 404-410. 
method with other methods, the accuracy of the Eigen https://doi.org/10.1016/j.patrec.2019.04.007. 
method is 81.3%, the STOP method is 82.5%, the 
[9] A. Ht, C. Chh, B. Ttn, B. Dska. Image 
DMM & HOG method is 85.3%, the Actionlet method 
representation of pose -transition feature for 3D 
is 87.6% and the JAS & HOG2 method is 83.5%. The 
skeleton-based action recognition. Information 
accuracy of the InNet-LSTM method was 90.6%. It Sciences, 2020, 513(3): 112-126. 
can be seen that the method proposed in this study has https://doi.org/10.1016/j.ins.2019.12.063. 
Deep Learning-Based Involution Feature Extraction for Human… Informatica 49 (2025) 77–90 89 
 
[10] V. Silva, N. Marana. Human action recognition 2020(12): 1-12. 
in videos based on spatiotemporal features and https://doi.org/10.1155/2020/8827468. 
bag-of-poses. Applied Soft Computing, 2020, [20] F. Daneshdoost, M. Hajiaghaei-Keshteli, R. Sahin. 
95(1) 84-93. R. Tabu search based hybrid meta-heuristic 
https://doi.org/10.1016/j.asoc.2020.106513. approaches for schedule-based production cost 
[11] B. Sun, D. Kong, S. Wang, L. Wang, B. Yin. minimization problem for the case of cable 
Joint transferable dictionary learning and view manufacturing systems. Informatica, 2022, 33(3): 
adaptation for multi-view human action 499-522. https://doi.org/10.15388/21-INFOR471 
recognition, ACM Transactions on Knowledge [21] G. Dzemyda, M. Sabaliauskas, V. Medvedev. 
Discovery from Data (TKDD), 2021, 2-55. Geometric MDS performance for large data 
ttps://doi.org/10.1145/3418897. dimensionality reduction and visualization. 
[12] L. Yu, L. Tian, Q. Du, J. Bhutto. Multi-stream Informatica, 2022, 33(2):299-320. 
adaptive spatial-temporal attention graph https://doi.org/10.15388/22-infor491. 
convolutional network for skeleton-based action  
recognition. IET Computer Vision, 2022, 162(2): 
 
143-158. https://doi.org/10.1049/cvi2.12058. 
 
[13] M. S. Alsawadi, M. Rio. Skeleton split strategies 
for spatial temporal graph convolution networks,  
Computers. Materials and Continuum, 2022,  
1(6):4643-4658.  
https://doi.org/10.32604/cmc.2022.028266. 
 
[14] Y. Hou, L. Wang, R. Sun, Y. Zhang, M. Gu, Y. 
 
Zhu, Y. Tong, X. Liu, X. Wang, J. Xia, Y. Hu, L. 
Wei, C. Yang, M. Chen. Crack-across-pore  
enabled high-performance flexible pressure  
sensors for deep neural network enhanced  
sensing and human action recognition. ACS 
NANO, 2022, 16(5): 8358-8369.  
https://doi.org/10.1021/acsnano.2c02609.  
[15] A. Gharahdaghi, F. Razzazi, A. Amini. A  
non-linear mapping representing human action  
recognition under missing modality problem in 
 
video data. Measurement, 2021, 186(3): 
1123-1133.  
https://doi.org/10.1016/j.measurement.2020.1121  
23.  
[16] W. Xu, M. Wu, J. Zhu, M. Zhou. Multi-scale  
skeleton adaptive weighted GCN for 
 
skeleton-based human action recognition in IoT. 
Applied Soft Computing, 2021,  
104(3):1568-1579.  
https://doi.org/10.1016/j.asoc.2021.107596.  
[17] H. B. Naeem, F. Murtaza, M. H. Yousaf, S. A.  
Velastin. T-VLAD: Temporal vector of locally 
 
aggregated descriptor for multiview human 
action recognition. Pattern Recognition Letters,  
2021, 148(8): 22-28.  
https://doi.org/10.1016/j.patrec.2021.06.012.  
[18] M. Yang. Research on vehicle automatic driving  
target perception technology based on improved 
 
MSRPN algorithm. Journal of Computational 
and Cognitive Engineering, 2022, 1(3): 147-151.  
https://doi.org/10.47852/bonviewJCCE20514  
[19] Y. Lin, W. Chi, W. Sun, S. Liu, D. Fan. Human  
action recognition algorithm based on improved  
resnet and skeletal keypoints in single image. 
 
Mathematical Problems in Engineering, 2020, 
 
89 
90 Informatica 49 (2025) 77–90 D. Chen et al. 
 
 
 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.7592 Informatica 49 (2025) 91–104 91 
Optimization of Elman Neural Network Using Genetic Algorithm for 
Construction Cost Estimation and Overspending Risk Analysis 
Qian Wu 
School of Urban Construction, Anhui Xinhua University, Hefei 230088, Anhui, China 
E-mail: wu2006qian@126.com 
Keywords: neural network, construction cost estimation, overspending risk, Elman network, genetic algorithm 
Received: November 14, 2024 
This study proposes a model based on the Elman neural network and improves it using a Genetic 
Algorithm (GA) to increase the accuracy of construction cost estimation and accurately analyze the 
overspending risk. First, an index system containing multiple dimensions such as building features, 
structural features, project positioning, and project environment is constructed to comprehensively 
capture the key factors affecting construction cost and overspending risk. Second, the Elman neural 
network’s structure and operation are thoroughly examined, and the GA optimizes the network’s weights 
and thresholds to improve the model’s predictive power. On the training set, the optimized GA-Elman 
model demonstrates great prediction accuracy, with relative error (RE) percentages between predicted 
and true values typically falling within ±1%. On the test set, the GA-Elman model performs better than 
the original Elman model in both difference and RE, with a Mean Absolute Percentage Error of 2.75%, a 
decrease of 18.4% compared to the Elman model. These results indicate that the GA-Elman model is more 
accurate in cost prediction and more effective in identifying potential overspending risks. This study 
provides a powerful tool for cost control and budget management in the construction industry and a new 
perspective on the application of neural networks in construction economics. 
Povzetek: Razvit je model za ocenjevanje stroškov gradnje in analizo tveganja prekoračitve stroškov, ki 
temelji na Elmanovi nevronski mreži, optimizirani z genetskim algoritmom. Model je močno orodje za 
obvladovanje stroškov in upravljanje proračuna v gradbeništvu. 
 
 
1 Introduction  which involves creating a building cost model based on 
the Elman neural network and using a Genetic Algorithm 
In the construction industry, cost estimation is the core (GA) to optimize it. This approach can increase cost 
link of project management, directly related to the estimating accuracy while evaluating potential 
project's economic benefits and risk control. Traditional overspending risk analysis and offering construction 
cost estimation methods rely on expert experience and project management scientific decision assistance. 
historical data. Still, such methods are often influenced by The main contribution of this study is the proposal of 
subjective judgment and are difficult to adapt to the a construction cost estimation model based on the Elman 
rapidly changing market environment and complex and neural network combined with a GA, specifically: 
changing engineering conditions [1-3]. Traditional cost Firstly, GA is applied to optimize the Elman neural 
estimation procedures encounter increasing challenges as network, utilizing GA to improve the weights and 
building projects get larger and more complicated. As a thresholds of the neural network, thereby enhancing the 
result, new techniques and methodologies must be model's prediction accuracy and generalization ability. As 
introduced immediately to increase estimation efficiency a global search optimization tool, GA can avoid the 
and accuracy [4, 5]. problem of falling into local optimal solutions that is 
With the advancement of machine learning (ML) and common in traditional training processes. 
artificial intelligence in recent years, neural networks have Secondly, by constructing a comprehensive index 
shown to be a valuable tool for tackling challenging system and integrating it with the Elman neural network, 
forecasting issues. Because of their benefits in processing a more accurate method for construction cost prediction is 
sequence data, recurrent neural networks (RNNs) are provided compared to traditional models. Furthermore, 
widely applied across various fields, such as natural the model's applicability in complex construction projects 
language processing and time series prediction [6-8]. The is effectively improved through further optimization with 
Elman neural network, as a kind of RNN, enhances the GA. 
network's memory ability by introducing the context layer, Finally, the study focuses on the prediction of 
which makes it perform well in dealing with time- construction costs and proposes a new method for 
dependent sequence data [9]. assessing cost overrun risks. Through the model's dynamic 
This study explores the application of neural networks memory mechanism, it is possible to analyze the impact 
in construction cost estimation and overspending risk of historical data on future costs, identify potential risk 
analysis. A new approach to cost estimating is presented,  
92 Informatica 49 (2025) 91–104 Q. Wu 
factors in advance, and provide decision support for These models could be used as decision-support tools for 
project management. construction project managers and practitioners to 
promote the development of automation research in the 
2 Related work green building industry. 
Because neural networks can handle complicated 
In the construction industry, the accuracy of cost nonlinear interactions, they have emerged as a potent tool 
prediction is critical for the project's success. With the for cost prediction problems. Pham et al. proposed an ML 
development of information technology, more and more and optimization framework incorporating artificial neural 
researchers began to explore how to use advanced networks (ANNs) and gradient boosting models to 
technical means to improve cost prediction accuracy. estimate construction costs and optimize costs under 
Mahmoodzadeh et al. forecasted the geological conditions, budget constraints rapidly [12]. Goodarzizad et al. 
construction duration, and cost of tunnels using Gaussian improved the accuracy of construction labor productivity 
Process Regression (GPR), Support Vector Regression models for concrete pouring operations through a hybrid 
(SVR), and decision tree models. Through 50% cross- model developed by combining ANN and Grasshopper 
validation to evaluate the model's performance, it was optimization algorithms [13]. The study helped to improve 
found that GPR was superior to SVR and decision trees in project efficiency, increase labor productivity, and reduce 
prediction accuracy. Hence, GPR was recommended to costs. Kim et al. introduced an autoregressive integrated 
predict future tunnel projects' geological and construction moving average (ARIMA)-ANN model to predict 
time costs [10]. Alshboul et al. used an ML algorithm to construction costs. They found that the model provided 
predict the cost of green buildings, considering the more accurate predictions in most cases, especially for 
influence of related attributes of soft and hard costs. The long-term forecasting time limits, than standalone 
evaluation results showed that eXtreme Gradient Boosting ARIMA or ANN models [14]. The main contents of the 
(XGBoost) performed best in accuracy, followed by the above research are summarized in Table 1. 
deep neural network (DNN) and random forest (RF) [11]. 
Table 1: Summary of relevant research contents 
Model Method Dataset Key results 
ML method is used to predict GPR has better prediction 
tunnel geological conditions, accuracy than SVR and 
GPR, SVR, Decision construction period, and cost. decision tree. Meanwhile, it is 
Tunnel project data 
tree The model's performance is recommended for geological 
evaluated by 5-fold cross- and time cost prediction of 
validation. future tunnel projects 
XGBoost performs the best in 
prediction accuracy, with an 
By considering soft and hard cost accuracy of 0.96; This Is 
XGBoost, DNN, and Green building-
attributes, ML methods are used followed by DNN (0.91) and 
RF related data 
to predict green building costs. RF (0.87), which can provide 
decision support tools for the 
green building industry. 
ANN and gradient boosting 
algorithms perform the best, 
13 ML regression algorithms are estimating construction costs 
ANN, gradient employed to estimate Construction and required resources with 
boosting model construction costs and optimize configuration dataset 99% accuracy in less than 1 
costs under budget constraints second of training time, and 
reducing costs by 7% through 
optimization. 
Labor productivity 
The combination of ANN and 
data for 24 The project efficiency is 
Hybrid model Grasshopper optimization 
commercial office improved, labor productivity is 
(ANN+Grasshopper algorithm improves the labor 
complex projects increased, and costs are 
algorithm) productivity model of concrete 
under construction in reduced 
pouring operation. 
Iran 
In most cases, especially in 
The ARIMA model is integrated National and city- long-term forecasting, hybrid 
ARIMA-ANN 
with ANN to predict construction level construction models have higher prediction 
model 
costs. cost index accuracy than ARIMA or ANN 
models used alone. 
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 93 
Although significant progress has been made in on the principles of comprehensiveness, scientificity, a 
construction cost estimation, there remain substantial combination of quantitative and qualitative methods, 
limitations in terms of generalization ability and dynamics, and operability. These indexes are chosen from 
overspending risk assessment. Specifically, many models four aspects: architectural features, structural features, 
rely on specific datasets, making it challenging to maintain project positioning, and project environment. The method 
prediction accuracy in new construction project scenarios. of literature analysis is used for this selection. The 
For instance, while models like GPR and XGBoost exhibit quantification of qualitative indexes is carried out [15-17]. 
high prediction accuracy on particular datasets, their In constructing the cost estimation and overspending risk 
performance may decline significantly when applied to index system, the selection of each index is based on its 
cross-dataset scenarios or when handling previously correlation with construction costs and overspending risk. 
unseen complex situations. Existing research tends to For example, in the case of exterior wall decoration, 
focus more on cost prediction accuracy, with less significant differences in the price and construction 
emphasis on the quantification and identification of techniques of different materials exist. Paint is relatively 
potential overspending risks. For complex construction inexpensive, while materials such as stone and glass 
projects, such models lacking risk assessment abilities curtain walls are more costly and have longer construction 
could lead to delayed cost control decisions. To address periods, potentially increasing the overspending risk [18]. 
these shortcomings, this study proposes a construction Similarly, the technical personnel level directly influences 
cost estimation model based on the Elman neural network, construction efficiency and quality. Low technical levels 
optimized with a GA. The GA enhances the model's global may lead to rework and delays, thus increasing both cost 
search capability by optimizing the initial weights and and the probability of overspending [19]. Architectural 
thresholds of the Elman neural network, thereby features such as floor area and standard floor height 
improving its prediction performance across different determine material usage and construction complexity, 
datasets and complex scenarios. The dynamic memory directly affecting the total project cost. Structural features, 
mechanism of the Elman neural network enables it to including the prefabrication rate and component 
capture long-term dependencies in time-series data, differentiation, relate to the efficiency and cost control 
allowing the analysis of cost trends and forecasting capacity of prefabricated construction. Project 
potential overspending risks. Moreover, by designing a environmental factors, such as project management level 
comprehensive overspending risk index system, the model and transportation distance, reflect the impact of 
can quantitatively identify key factors that lead to cost management efficiency and logistics on cost. These 
deviations, providing a basis for risk prevention and indexes are validated through literature analysis and 
control. practical engineering experience, demonstrating their key 
role in cost control and overspending risk, thereby 
3 Construction cost estimation model providing a theoretical foundation for the model's 
scientific and comprehensive nature. The finalized index 
based on elman neural network system for assembly construction cost estimation 
prediction is outlined in Table 2. 
3.1 Construction cost estimation and 
construction of overspending risk index 
system 
The study focuses on assembly buildings. The selection of 
indexes affecting the cost and overspending risk is based 
Table 2: Construction cost estimation and overspending risk index system and assignment of values 
Primary index Secondary index Nature of the index Assignment of qualitative index 
Number of floors A1 Quantitative index - 
Architectural 
Building area A2 Quantitative index - 
features 
Standard floor height A3 Quantitative index - 
1=internally cast and externally 
hung shear wall structure; 2=stacked 
shear wall structure; 3=assembled 
Structure type A4 Qualitative index 
monolithic frame structure; 
4=assembled monolithic shear wall 
Structural 
structure 
features 
1 = independent foundation; 2 = pile 
foundation; 3 = raft slab foundation; 
Foundation type A5 Qualitative index 
4 = pile raft foundation; 5 = box 
foundation 
Prefabrication rate A6 Quantitative index - 
94 Informatica 49 (2025) 91–104 Q. Wu 
1 = laminated panels/air 
conditioning panels/drift 
Component type A7 Qualitative index windows/enclosures; 2 = 
prefabricated stairs; 3 = 
beams/columns/shear walls 
Differentiation degree of 
Quantitative index - 
components A8 
1=paint; 2=real stone paint; 3=glass 
Exterior wall decoration A9 Qualitative index curtain wall; 4=aluminum panel; 
5=stone 
1=general plaster; 2=plaster; 3=large 
Interior wall decoration A10 Qualitative index white; 4=latex paint; 5=wall tiles; 
6=wallpaper 
Project 
1=concrete topping; 2=ordinary 
positioning Ground engineering A11 Qualitative index 
tiles; 3=flooring; 4=premium tiles 
1=plastic steel window + steel door; 
2=aluminum alloy window + steel 
Door and window type A12 Qualitative index door; 3=plastic steel window + fire 
door; 4=aluminum alloy window + 
fire door 
1=excellent; 2=good; 3=medium; 
Technical personnel level A13 Qualitative index 
4=poor 
Project 
1=excellent; 2=good; 3=medium; 
environment Project management level A14 Qualitative index 
4=poor 
Transportation distance A15 Quantitative index - 
In the above index system, the three indexes of controlled. Moreover, indexes in the project environment 
architectural features are directly related to the building's reflect the efficiency of project management and the 
physical size and construction complexity, affecting impact of external conditions on costs, which are key 
material costs and labor requirements. These in turn affect factors in cost control and risk management. This system 
cost control and the risk of overspending. The indexes of helps to forecast costs more accurately while identifying 
structural features determine the structural stability and and controlling factors that may lead to overspending. 
construction methods, significantly impacting material In the above index system, the priority of each index 
selection and supply chain management, thus correlating varies depending on its impact on costs and overspending 
with the overspending risk. Project positioning includes risks. To ensure that the indicator system can 
qualitative indexes such as exterior and interior wall comprehensively and scientifically reflect the risk of cost 
decorations, ground engineering, and window and door overruns, the Analytic Hierarchy Process is used to assign 
types. These choices affect the building's aesthetics and weights to each index. The results are exhibited in Table 
functionality while leading to increased costs, which may 3. 
increase overspending risk if costs are not properly 
Table 3: Index system weight 
Primary index Weight of primary Secondary index Final weight 
index 
Architectural features 0.162 Number of floors A1 0.054 
Building area A2 0.054 
Standard layer height A3 0.054 
Structural features 0.409 Structure type A4 0.128 
Foundation type A5 0.073 
Prefabrication rate A6 0.053 
Component type A7 0.069 
Differentiation degree of 
0.086 
components A8 
Project positioning 0.290 Exterior wall decoration A9 0.044 
Interior wall decoration A10 0.068 
Ground engineering A11 0.121 
Door and window type A12 0.057 
Project environment 0.139 Technical personnel levelA13 0.073 
Project management level A14 0.046 
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 95 
Transportation distance A15 0.020 
In Table 3, structural features hold the highest weight 
among the primary indexes, accounting for 40.9%, 3.2 Elman neural network modeling 
indicating their most significant impact on both analysis 
construction costs and overspending risk. Among these, 
A4 and A8 have relatively higher weights of 0.128 and The Elman neural network's key feature is the 
0.086, respectively, reflecting the crucial role of building incorporation of a context layer, which preserves the 
structure complexity and differentiation in cost control. hidden layer's state from a previous time step [20]. This 
The project positioning index ranks second, accounting enables the Elman network to process time-series data, 
for 29.0%, with A11 having the highest weight of 0.121, capturing the dynamics of the input data and the 
emphasizing its importance in construction decoration underlying temporal relationships, making it suitable for 
costs. The weights for architectural features and project time-dependent data prediction tasks such as construction 
environment are relatively lower. However, among the costs. The network creates a short-term memory 
secondary indexes, A13 and A2 stand out with weights of mechanism by feeding past information back to the 
0.073 and 0.054, respectively, highlighting their influence current moment, which enhances its ability to model 
on construction efficiency and total cost prediction. This nonlinearities in dynamically changing processes. Unlike 
weight allocation method enables the index system to traditional feed-forward neural networks, the Elman 
more scientifically reflect the contribution of various network has feedback connections between the hidden and 
factors to cost and overspending risk, providing a solid context layers. These feedback signals allow the network 
foundation for subsequent model predictions and risk to retain information from previous states, providing 
analysis. valuable contextual input for subsequent computations [21, 
22]. Figure 1 depicts the Elman neural network's basic 
structure. 
 
Hidden layer
Input layer
h1
Output layer
h2
h3
· ·
· ·
·
· ·
·
·
hn
Context 
Layer c1 c2 c3 cr
 
Figure 1: Schematic diagram of Elman network structure 
 
The core principle of the Elman network is as follows. 𝑦
𝑦(𝑡) = 𝑔(𝑤ℎ𝑦𝑤𝑐𝑗ℎ(𝑡))  (1) 
First, the output vector 𝑦(𝑡) of the network is obtained 𝑤ℎ𝑦  denotes the weight matrix between the hidden 
from the output vector ℎ(𝑡) of the implicit layer through and output layers. Secondly, the output ℎ(𝑡)  of the 
the nonlinear transformation function 𝑔(∗)of the output implicit layer is obtained from the current input 𝑣(𝑡 − 1) 
layer with the expression (1): and the output 𝑐(𝑡)  of the context layer through the 
96 Informatica 49 (2025) 91–104 Q. Wu 
nonlinear transformation function 𝑓(∗)  of the implicit 𝑐(𝑡) = ℎ(𝑡 − 1)   (3) 
layer with the expression (2): This structure allows the Elman network to capture 
ℎ(𝑡) = 𝑓(𝑤𝑥ℎ𝑣(𝑡 − 1) + 𝑤𝑐ℎ𝑐(𝑡))  (2) the temporal dynamics of the input data. For construction 
𝑤𝑥ℎ refers to the weight matrix from the input to the cost estimation, it means that the network can consider the 
hidden layer. 𝑤𝑐ℎ  denotes the weight matrix from the impact of historical cost data on current cost estimates, 
takeover layer to the hidden layer. Finally, the output 𝑐(𝑡) thus improving the accuracy of the predictions. 
of the take-on layer is the output ℎ(𝑡 − 1) of the implicit Furthermore, the computational flow of the Elman 
layer at the previous time step, that is (3): network is suggested in Figure 2. 
Start
Initialize the 
weights of each 
layer
Input sample
A series of 
calculation 
Calculate the input Calculate the output of 
steps
layer output the receiving layer
No
Does the error meet the 
Calculate hidden layer Yes
requirements or reach End
output the maximum number of 
training steps?
Calculate the output of Calculate the output of Calculation 
the context layer the receiving layer error
 
Figure 2: Elman network computational flow 
 
In Figure 2, the network initializes the weights of each error E does not decrease sufficiently, the training cycle 
layer as a necessary preparation before training starts. The continues, with the weights being adjusted to reduce the 
initial setup of these weights significantly impacts the prediction error. This process is repeated until the network 
learning effectiveness and overall performance of the performs adequately or the training reaches the set number 
network. Network learning is then built on the input of iterations. 
samples, which include past construction project cost data In the above step, the error E is used to measure the 
and other pertinent features. The outputs of the input, difference between the predicted output of the network, 
hidden, and output layers are then computed sequentially. 𝑦(𝑡), and the desired output as ?̂?(𝑡), calculated as (4): 
Meanwhile, after obtaining the output of the hidden layer, 1 𝑇
𝐸 = (𝑦(𝑡) − ?̌?(𝑡)) (𝑦(𝑡) − ?̂?(𝑡)) (4) 
the output of the context layer is further computed. In this 2
To adjust the weights, the partial derivatives of the 
step, the current output of the hidden layer is used as the 
error E with respect to the weights need to be calculated. 
input for the context layer in the next time step. This step 𝑦
The partial derivatives of the weights 𝑤
is the key to the short-term memory mechanism of the 𝑗𝑖  for the output 
Elman network, allowing it to retain information from layer are (5): 
𝜕𝐸 𝜕𝑦 (𝑡)
previous states while processing sequential data. The 𝑦 = −(?̂? (𝑡)) 𝑖
𝑦 = −(?̂? ) − 𝑦(𝑡)𝑔′ (∗)𝑥 (𝑡)) (5) 
𝜕𝑤 𝑑,𝑖(𝑡) − 𝑦
𝜕𝑤 𝑑,𝑖(𝑡 𝑗 𝑖
𝑗𝑖 𝑗𝑖
output layer error is determined by comparing the actual 𝑦
𝑤
cost data with the network's predicted outputs, following 𝑗𝑖  y refers to the weight connecting the ith input unit 
and the jth output unit; 𝑔′ (∗) represents the derivative of 
the computation of outputs across all layers. A critical 𝑗
element of supervised learning, this error computation the activation function of the output layer; 𝑥𝑖(𝑡) denotes 
(denoted as E) provides the network with feedback for the output of the ith input unit at time t. Let 𝜑0
𝑗 =
adjusting its parameters. Lastly, the error E is utilized to (?̂?𝑑,𝑖(𝑡) − 𝑦(𝑡)𝑔′ (∗), so (6): 
𝑗
check if the maximum number of training steps has been 𝜕𝐸 0
𝑦 = −𝜑 𝑖 = 1,2,⋯ ,𝑚; 𝑗 = 1,2,⋯ , 𝑛 (6) 
completed or if the predefined requirements are met. If the 𝜕𝑤 𝑗 𝑥𝑖(𝑡),
𝑗𝑖
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 97 
𝑚 is the number of neurons in the input layer and 𝑛 is 
Algorithm: Elman Neural Network
the number of neurons in the hidden layer. 
Taking 𝐸 as the partial derivative of the input layer 
Input:
weight 𝑤𝑥
𝑗𝑖 , it can get (7): 
𝜕𝐸 𝜕𝐸 𝜕𝑥     - Training data
= 𝑖(𝑡) = ∑𝑚
𝑖=1(−𝜑
0𝑤𝑥 ′ ( 𝑣
𝜕𝑤𝑥 𝑞(𝑡 − 1) (7)     - Learning rate
𝑗𝑖 𝜕𝑥𝑖(𝑡) 𝜕𝑤𝑥 𝑗 𝑗𝑖)𝑓 ∗)
𝑖
𝑗𝑖
𝑓′ (∗)  denotes the derivative of the hidden layer     - Maximum iterations
𝑖
activation function. Let 𝜑ℎ 𝑚
𝑗 = ∑𝑖=1(−𝜑
0 𝑥
𝑗𝑤𝑗𝑖)𝑓
′ (∗) L, 
𝑖
then get (8): Initialization:
𝜕𝐸     - Randomly initialize weights
= −𝜑ℎ𝑣 ,𝑚; 𝑗 = 1,2,⋯ , 𝑛; 𝑞 = 1,2,⋯ , 𝑟 (8) 
𝜕𝑤𝑥 𝑗 𝑞(𝑡 − 1), 𝑖 = 1,2,⋯
𝑗𝑖     - Set initial context layer to zero
𝑟 is the number of neurons in the splice layer. 
The partial derivative of the connection weight 𝑤𝑐
𝑗𝑙  is Training:
obtained (9): Repeat until convergence or maximum iterations:
𝜕𝐸
= ∑𝑚 (−𝜑0 𝑥 𝜕𝑥𝑖(𝑡)
𝜕𝑤𝑐 𝑖=1 𝑗𝑤𝑗𝑖) 1,2,⋯ , 𝑛 (9)     1. Compute hidden layer output
𝑗𝑙 𝜕𝑤𝑐 , 𝑙 = 1,2,⋯ , 𝑛; 𝑗 =
𝑗𝑙
According to the chain rule (10):     2. Update context layer
𝜕𝑥𝑗(𝑡) 𝜕
𝑓 𝑛 𝑟 𝑥     3. Compute network output
𝜕𝑤𝑐 = 𝑗(∑𝑖=1𝑤
𝑐
𝑗𝑖𝑥𝑐,𝑖(𝑡) + ∑𝑖=1𝑤𝑗𝑙𝑣𝑖(𝑡 − 1)) = 𝑓′ (∗)𝑥 (𝑡) +
𝑗 𝑐,𝑖
𝑗𝑙 𝜕𝑤𝑐
𝑗𝑙
)     4. Calculate error
∑ 𝑦 𝜕𝑥
𝑤 𝑐,𝑖(𝑡
𝑗𝑖 𝑦  (10) 
𝜕𝑤
𝑗𝑙     5. Backpropagate and update weights
The dependence of 𝑥𝑐(𝑡) on the connection weight 
𝑦
𝑤𝑗𝑖  is ignored, and the following results are obtained (11) Prediction:
and (12): For each input in test data:
𝜕𝑥𝑗(𝑡)     1. Compute hidden layer output
𝑓′ (∗)𝑥 1) 
𝜕𝑤𝑥 =
𝑗 𝑐,𝑙(𝑡) (1
𝑗𝑙     2. Update context layer
𝑓′ (∗)𝑥 = 𝑓′ (∗)𝑥 ∗ 𝑓′ (∗)𝑥 12) 
𝑗 𝑐,𝑙(𝑡) 𝑗 𝑙(𝑡 − 1) + 𝛼
𝑗 𝑐,𝑙(𝑡) (     3. Compute final output
𝛼  
 refers to the forgetting factor. By substituting 
Figure 3: The pseudocode for the Elman model 
equation (12) into equation (11), it can obtain (13): 
𝜕𝑥𝑗(𝑡) 𝜕𝑥𝑗(𝑡−1)
𝑦 = 𝑓′ (∗)𝑥 𝑦  (13) 
𝜕𝑤 𝑗 𝑙(𝑡 − 1) + 𝛼 ∗
𝜕𝑤 3.3 Optimization of the Elman model 
𝑗𝑙 𝑗𝑙
Elman's equation (14)-(18) is derived from ∆𝑊 = based on GA 
𝜕𝐸
−𝜂 : Although the Elman neural network has remarkable 
𝜕𝑊
𝑦
∆𝑤 advantages in processing time series data, its performance 
𝑗𝑖 = 𝜂𝜑0
𝑗 𝑥𝑗(𝑡), 𝑖 = 1,2,⋯ ,𝑚; 𝑗 = 1,2,⋯ , 𝑛 (14) 
is highly dependent on the initial weight settings and the 
∆𝑤𝑐
𝑗𝑞 = 𝜂𝜑ℎ
𝑗 𝑣𝑞(𝑡 − 1), 𝑗 = 1,2,⋯ , 𝑛; 𝑞 = 1,2,⋯ , 𝑟 (15) 
choice of network structure. In addition, the Elman 
∆𝑤𝑥 𝑚 0 𝑥 𝜕𝑥𝑗(𝑡) 0
𝑗𝑙 = 𝜂∑𝑖=1(𝜑𝑗𝑤𝑗𝑖) = 1,2,⋯ , 𝑛; 𝑙 = 1,2,⋯ , 𝑛 (16) 
𝜕𝑤𝑥 𝜑𝑗 𝑥𝑗(𝑡), 𝑗
𝑗𝑙 network is easily affected by local minimum, which can 
𝜂 is the learning rate. Meanwhile, lead to suboptimal solutions and negatively impact 
𝜑0
𝑗 = (?̂?𝑑,𝑖(𝑡) − 𝑦(𝑡)𝑔′ (∗) (17) 
𝑗 prediction accuracy and generalization ability [23]. To 
𝜑ℎ 𝑚 0 𝑥 ′
𝑗 = ∑𝑖=1(−𝜑𝑗𝑤𝑗𝑖)𝑓 (∗) (18) 
𝑖 overcome these limitations, GA is introduced to optimize 
Through this calculation process, the Elman network the Elman model. Darwin's theory of natural selection and 
can gradually learn the complex relationship between the global search principle of biogenetics serve as the 
building cost data and complete cost prediction. This foundation for GA, an optimization algorithm designed to 
dynamic learning and forecasting mechanism makes the mimic the natural evolution process. Biological evolution 
Elman network perform well in dealing with time series 
mechanisms, including natural selection, genetic 
forecasting problems such as construction cost estimation. 
variation, and crossover, are simulated by GA, which is 
The pseudocode for the Elman model is illustrated in 
Figure 3. extensively used to tackle complicated combinatorial 
optimization problems by gradually improving the quality 
of solutions. GA has strong global search ability and 
adaptability, and can effectively deal with optimization 
problems under high dimensional, nonlinear, and complex 
constraints [24]. The basic idea of GA is to simulate 
natural selection and genetic mechanisms by operating a 
population composed of multiple individuals to produce 
better solutions. Although GA possesses global search 
capabilities and strong adaptability, there are certain 
limitations in its optimization process. GA may encounter 
issues of high computational complexity and time costs 
when dealing with large-scale datasets. Additionally, the 
98 Informatica 49 (2025) 91–104 Q. Wu 
convergence speed of GA can be slow, especially in large members of the current population are chosen to go into 
search spaces, where there is a risk of premature the next generation based on their fitness values. This step 
convergence or falling into local optimal solutions [25]. imitates the natural selection process of "survival of the 
The implementation steps of GA are displayed in Figure fittest" in biology. 
(4) Cross operation. Individuals are randomly paired 
4. 
from the selected ones and undergo a single-point 
Start crossover operation according to a set crossover 
probability (0.6). This involves randomly selecting a 
position in the chromosome and exchanging the gene 
Population 
initialization segments before and after that position, generating new 
combinations of weights and thresholds. This method 
improves search efficiency by exploring different 
Computational Get a new parameter combinations. 
fitness population
(5) Mutation operation. A small probability (0.2) is 
used to randomly mutate certain genes of the selected 
Check if individuals. The specific method is to add a random 
Yes
End termination 
Variation disturbance that follows a normal distribution (e.g., with a 
condition is 
met? mean of 0 and a standard deviation of 0.1) to the original 
weights or thresholds. Thus, it can increase the diversity 
No
of the population and avoid local optimal solutions. 
Select Intersect (6) Termination conditions. For one thing, the 
 algorithm automatically stops when it reaches the preset 
Figure 4: GA implementation process maximum number of iterations (200 times). For another, 
 if the optimal fitness value of the population does not 
This study uses the GA to optimize the adjustment of improve by more than a predetermined threshold (0.001) 
Elman network weights and thresholds, and the specific over a continuous number of generations (20), it is 
steps are as follows [26, 27]. considered that the algorithm has converged. In addition, 
(1) Population initialization. Several initial the optimization process is terminated early. By 
individuals are randomly generated in the solution space, introducing these clear stopping criteria, the stability of 
and each individual corresponds to a set of potential the optimization process can be effectively ensured, while 
Elman network weights and thresholds. Each individual also enhancing the applicability and reliability of the 
can be regarded as the coding form of Elman network algorithm in practical problems. 
parameters (real number coding), including the connection Through the aforementioned optimization process, 
weights between input and hidden layers, hidden and GA can effectively adjust the weights and thresholds of 
output layers, and the threshold of each neuron. the Elman network, improving the model's generalization 
(2) Fitness calculation. According to the performance ability and prediction accuracy. The rationality of 
index of the Elman network (for example, the mean square parameter settings is determined through multiple 
error of construction cost estimation), the fitness of each experimental tests. Meanwhile, the specific 
individual is evaluated. The network corresponding to the implementation of crossover and mutation ensures a high 
individual performs better on a given task the higher the degree of repeatability in the study, providing an effective 
fitness. modeling tool for complex construction cost estimation 
(3) Selection of the operation. Using probability tasks. 
techniques like roulette wheel selection, the fittest Figure 5 shows the calculation flow of the finally 
formed GA-Elman model based on GA. 
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 99 
Input sample
Start
Select
Initialize the The error obtained by 
Encoding the initial 
weights of each Elman training is used 
value
layer as the fitness value
Intersect
No
Is the 
evolutionary The best coverage is the Computational 
iteration worst fitness
completed? Variation
Yes
Obtain the optimal 
initial weight
A series of 
calculation 
Calculate the input Calculate the output of 
steps
layer output the receiving layer
No
Does the error meet the 
Calculate hidden layer Yes
requirements or reach End
output the maximum number of 
training steps?
Calculate the output of Calculate the output of Calculation 
the context layer the receiving layer error
 
Figure 5: Calculation flow of GA-Elman model 
The GA optimization of the Elman neural network and high, providing decision-makers with a more intuitive 
can reduce the probability of the model reaching local risk assessment index. 
optima, and enhance the network's global search ability. Furthermore, the model quantifies the key risk factors 
Meanwhile, it can accelerate the convergence speed of the through a comprehensive index system. The index system 
training process and improve the model's prediction designed in this study encompasses four major dimensions: 
accuracy. This is especially important for complex architectural features, structural features, project 
construction cost estimation tasks. Especially when faced positioning, and project environment. Within each 
with time-related data, the optimized GA-Elman network dimension, specific indexes are assigned different weights 
can better capture the dynamic characteristics of data and to reflect their relative importance in contributing to cost 
realize more accurate cost estimation and risk prediction. overruns. For instance, in the architectural features 
dimension, the "number of floors" and "building area" 
3.4 Application of cost estimation model in directly influence material and labor costs, with their 
overspending risk weights determined by principal component analysis. In 
contrast, in the project environment dimension, 
In the cost management of construction projects, the "management level" and "technical personnel level" are 
assessment and control of overspending risk is a crucial quantified using fuzzy comprehensive evaluation methods. 
link. The assessment of overspending risk relies on the The distribution of risk factor weights follows (20): 
accuracy of cost estimation while requiring scientific 𝑣
𝑤 𝑖
𝑖 =   (20) 
quantification of risk factors and their weights. The GA- 𝑣
Elman model can accurately capture the time series 𝑤𝑖  represents the weight of the ith risk factor, with a 
characteristics of cost data through dynamic memory value range of 0 to 1 and a total weight of 1; 𝑣𝑖 refers to 
mechanisms, offering vital support for the quantitative the contribution of the ith index to the total deviation; 𝑣 
assessment of overspending risk. Firstly, the assessment denotes the total deviation. The GA-Elman model can 
of overspending risk is based on the cost deviation rate 𝑝, identify and predict the primary risk factors leading to 
and the degree of risk is quantified by the deviation overspending through historical data. For example, the 
between the model's predicted value 𝑐′ and the actual cost model can use retrospective analysis to determine that 
value 𝑐. The specific calculation reads (19): material price fluctuations contribute 35% to cost 
|𝑐−𝑐′| deviations, construction delays account for 25%, design 
𝑝 = × 100%  (19) 
𝑐′ changes contribute 20%, and other factors make up 20%. 
In this context, the higher the deviation rate, the This detailed quantitative analysis helps managers 
greater the overspending risk. Based on this deviation rate, pinpoint key risk sources and provides data support for 
the risk can be classified into three levels: low, medium, formulating targeted risk control strategies. 
100 Informatica 49 (2025) 91–104 Q. Wu 
Additionally, the GA-Elman model simulates the 4 Model Performance verification 
impact of different cost control strategies on overspending 
risk. For instance, in the case of significant material price 
4.1 Data source and experimental design 
fluctuations, the model can simulate cost trends for diverse 
procurement strategies (such as bulk purchasing in To ensure the universality and representativeness of the 
advance or phased procurement) and assess the mitigation experiment, data are collected from multiple sources, 
effects of each strategy on overspending risk. This data- ensuring the diversity and reliability of the data. The social 
driven simulation analysis offers project managers a and economic development level of each region and the 
scientific decision-making tool. number of prefabricated buildings built are 
To sum up, the GA-Elman model in overspending comprehensively considered. The basic data are obtained 
evaluation provides intuitive risk levels through the from professional platforms such as the China 
quantification of cost deviations. Meanwhile, it offers a Prefabricated Building Market Analysis Report, 
systematic approach to risk identification, assessment, and Prefabricated Building Network, and Zhongce Big Data 
control through the weight allocation to key risk factors Website. Additionally, data from 45 groups of 
and simulation analysis. By applying this model in-depth, prefabricated building projects in cities such as Beijing, 
project managers can remarkably improve risk Tianjin, Hebei, and Shenyang over the past four years are 
management efficiency, reduce economic losses caused collected. These data cover many dimensions, such as 
by overspending, and ultimately enhance the construction architectural features, structural features, project 
projects' cost-effectiveness and success rate. positioning, and project environment, offering rich 
information for model training and testing. Taking the 
indexes A1-A3 of architectural features as an example, the 
variance analysis of these data is detailed in Table 4.
 
Table 4: Variance analysis of architectural feature indexes 
Difference 
Sum of Squares Degrees of Freedom Mean Square F P-value F crit 
source 
Row 3,417,030,830 44 77,659,791 1.000 0.488 1.515 
Column 4,022,937,273 2 2,011,468,636 25.902 0.000 3.100 
Error 6,833,762,444 88 77,656,391    
Table 4 shows significant mean differences (P<0.05) model evaluation results are improved. The experimental 
among variables A1, A2, and A3, while the differences setup and parameter values are shown in Table 5. 
between samples are not significant. This indicates that  
different samples have a relatively small impact on the Table 5: Experimental environment and parameter 
results of variance analysis. These data can more setting 
comprehensively illustrate the distribution characteristics Hardware/parameter name Parameter/value 
of architectural feature data, providing data support for Operating system Windows10 
model prediction. To enhance the model's generalization CPU AMD R7-5800H 
ability, the gathered data are normalized to remove the Basic frequency 3.2 GHz 
impact of varying dimensions and ordering. Specifically, 
Display card RTX3060 
the Min-Max normalization method is adopted to map the 
Memory 16 GB 
data values of each index to the interval [0, 1], and the 
Hard disc 512 G SSD 
normalization equation is as follows (21):  
𝑋−𝑋 Input layer node 15 
𝑋′ = 𝑚𝑖𝑛    (21) 
𝑋𝑚𝑎𝑥−𝑋𝑚𝑖𝑛 Output layer node 1 
X is the original data; 𝑋𝑚𝑖𝑛  and 𝑋𝑚𝑎𝑥  are the Hidden layer node 10 
minimum and maximum values of the index, respectively. Maximum number of iterations 200 
Through this method, the differences in dimensions and Error tolerance 1×10-5 
magnitudes between different indicators have been Evolutionary algebra 20 
eliminated, ensuring the stability and accuracy of the Population size 10 
model during training and testing. The training set Cross probability 0.6 
comprises 36 sets of data; The test set contains 9 sets of Mutation probability 0.2 
data, which are randomly selected from the dataset and  
arranged in a 4:1 ratio. Furthermore, to comprehensively Relative Error (RE) and Mean Absolute Percentage 
evaluate the performance and reliability of the model, this Error (MAPE) are used as evaluation indexes to evaluate 
study further adopts the k-fold cross-validation technique the accuracy of prediction results. The calculation 
(k=5) based on the division of training and testing data. By equations of them are (22) and (23): 
partitioning the dataset k times to ensure that each subset 𝑦′𝑖−𝑦
participates in training and validation, the potential RE = 𝑖 0  ( 2
𝑦′
∗ 10 %  2 ) 
𝑖
random errors caused by a single partition are effectively 1 |𝑦 −𝑦′ |
𝑀𝐴𝑃𝐸 = ∑𝑛 𝑖 𝑖
𝑁 𝑖=1   (23) 
reduced. In addition, the stability and credibility of the 𝑦𝑖
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 101 
𝑁 represents the number of samples. 𝑦𝑖  and 𝑦′𝑖  refer thereby enhancing the model's ability to learn nonlinear 
to the predicted and actual values. In the cost estimation relationships. Moreover, integrating learning methods or 
model, REP measures the difference between the hybrid model structures can be introduced to combine the 
predicted and actual costs to evaluate the model's advantages of multiple algorithms and improve the 
prediction performance. MAPE index can directly reflect model's generalization ability. Lastly, for key features 
the RE between the actual and predicted values of the such as material prices and construction conditions, 
model, and it is an important index to measure the model's targeted feature engineering strategies can be designed to 
prediction performance. ensure that the model can more accurately capture their 
impacts, thus reducing the occurrence of extreme errors. 
4.2 Test results of the GA-Elman model Taking the Elman network, RNN, and SVR as the 
benchmark model, the test set is tested on the GA-Elman 
Firstly, the GA-Elman model is trained, and its training 
and benchmark models, respectively, and the results are 
result in the training set is presented in Figure 6. 
revealed in Figure 7. 
4,000  True value 26 3,500
 Predicted value 24  True value
 Relative error 22  Predicted value _ Elman
20  Predicted value _ GA-Elman
3,500  Predicted value _ RNN
18
 Predicted value _ SVR
16
3,000
14
3,000 12
10
8
6
2,500
2,500 4
2
0
-2
2,000 -4
2,000
0 5 10 15 20 25 30 35
1 2 3 4 5 6 7 8 9
Sample number
Sample number  
Figure 6: Training results of the GA-Elman model in the Figure 7: Comparison between the GA-Elman model and 
training set benchmark model 
  
Results in Figure 6 demonstrate that the GA-Elman On most test samples, the predicted value of the GA-
model has good prediction accuracy. This is because the Elman model in Figure 7 is more similar to the true value. 
predicted values for most samples are extremely close to The maximum differences between the predicted and 
the true values and the RE percentage is typically less than actual results for the Elman network, RNN, and SVR are 
1%. However, there are also some samples with large 118.99, 117.65, and 102.94, respectively. The GA-Elman 
prediction errors, such as Samples 14 and 32, with RE model's maximum difference between the predicted and 
percentages as high as 9.816% and 24.284%. The reasons true values is 87.21. These results show that the GA-
for these issues may be attributed to several factors. Firstly, Elman model optimized by GA has higher prediction 
the data characteristics of these samples may significantly accuracy and robustness in construction cost estimation, 
deviate from the overall distribution of the training set, thus verifying the effectiveness of GA in neural network 
such as abnormal fluctuations in key factors like material weight optimization. 
prices, construction conditions, or design complexity. For 
instance, Sample 32 may have actual costs that far exceed 4.3 Comparison of cost estimation results 
the model's predictions due to the use of certain specific 
before and after Elman model optimization 
processes or unexpected construction delays. Secondly, 
the model may exhibit limitations in handling rare features To further compare the cost estimation results before and 
in small samples, especially when these features are not after the optimization of the Elman model, the difference 
adequately represented in the training data, making it between the predicted and true value and the RE of the 
difficult for the model to capture their nonlinear four models are calculated, as denoted in Figure 8. 
relationships. Additionally, the data preprocessing process 
may not have eliminated the effects of noise or outliers, 
which could also amplify errors. To address the 
aforementioned issues, the following approaches can be 
taken. Firstly, it is necessary to optimize data 
preprocessing methods by employing techniques such as 
denoising and smoothing to improve data quality. 
Meanwhile, the detection and handling of outliers are 
strengthened to reduce the noise interference on the model. 
Secondly, the sample diversity of the training dataset is 
expanded, particularly for samples with rare or abnormal 
features, by increasing the proportion of related data, 
Output result
Relative error (%)
Output result
102 Informatica 49 (2025) 91–104 Q. Wu 
 Differential value _ Elman  RE_Elman In addition, the training time of the GA-Elman and 
 Differential value _ GA-Elman  RE_GA-Elman Elman models is compared, and the results are listed in 
150  Differential value _ RNN  RE_RNN 6
Table 6. 
 Differential value _ SVR  RE_SVR
 
100 4
Table 6: Comparison of training time between GA-
Elman and Elman models 
2
50 Training dataset Training 
Model size (number of time 
0
0 samples) (seconds) 
100 12.36 
-2 Elman 
-50 500 56.47 
model 
1,000 115.82 
-4
100 18.75 
-100 GA-Elman 
500 72.93 
-6 model 
1 2 3 4 5 6 7 8 9 1,000 142.68 
Sample number  
Figure 8: Analysis of cost prediction results of four Table 6 indicates that the training time of the GA-
models Elman model is slightly higher than that of the traditional 
 Elman model, primarily due to the additional optimization 
In Figure 8, the differences and REs of the GA-Elman step introduced by the GA. However, this extra 
model across all test samples are generally lower than computational cost is justified, as the GA-Elman model 
those of the Elman model. The mean absolute difference optimizes the network's initial parameters and weights 
between the predicted and actual values for the GA-Elman through GA, significantly improving both prediction 
model is 70.93, while for the Elman network, RNN, and accuracy and generalization ability. Specifically, when the 
SVR, they are 86.38, 87.83, and 87.63, respectively. In sample size is small (e.g., 100 samples), the training time 
some samples, the GA-Elman model still exhibits of the GA-Elman model is 18.75 seconds, only 6.39 
relatively large errors. The main reasons for these larger seconds longer than the Elman model. When the sample 
errors are twofold. First, data irregularity. For instance, size increases to 1,000, the training time becomes 142.68 
Sample 8 may have been affected by drastic fluctuations seconds, which is 26.86 seconds longer than the Elman 
in material prices or abnormal construction environments, model. This increase in training time is acceptable in light 
leading to actual costs significantly higher than the of the improvements in prediction performance. 
model's predictions. However, these exceptional From both a construction and economic perspective, 
situations are not adequately represented in the training the improvements made by the GA-Elman model are 
data. Second, model limitations. The GA-Elman model significant. In construction management, accurate cost 
has enhanced its ability to capture nonlinear features forecasting is crucial for budget control and risk 
through parameter optimization by GA. Nevertheless, it mitigation. The GA-Elman model's high prediction 
may still be insufficiently responsive to the dynamic accuracy (with a MAPE of only 2.75%) enables it to 
changes of certain key influencing factors, such as capture the complex nonlinear relationships in 
unexpected design changes or construction delays. construction costs, thus providing project managers with 
Meanwhile, the calculated MAPE for the GA-Elman more reliable decision support. This capability is 
model is 2.75%, which is significantly reduced compared especially beneficial for large and complex projects, as it 
to the Elman model's 3.37%. The MAPEs for RNN and helps reduce overspending risks and delays due to budget 
SVR are 3.46% and 3.45%, respectively, higher than that miscalculations. Additionally, by accurately assessing key 
of the GA-Elman model. This further demonstrates the influencing factors (such as material prices and 
effectiveness of GA in optimizing neural network construction conditions), the model helps managers 
parameters and improving prediction accuracy. These identify potential risks earlier, allowing for timely 
results show that GA-Elman model is more accurate in adjustments in construction plans and financial allocations. 
capturing the complex relationship of construction cost From an economic perspective, the application of the 
data, thus providing more reliable support in cost GA-Elman model in budget optimization remarkably 
estimation and overspending risk assessment of improves resource allocation efficiency. Compared to the 
construction projects. traditional Elman model and other benchmark models, the 
 GA-Elman model offers a clear advantage in effectively 
 reducing unnecessary financial waste and optimizing 
 financial planning. For example, for cost-sensitive 
 samples (such as Samples 14 and 32), there is still some 
 error. However, the model provides managers with a cost 
 estimate closer to the actual values, laying a foundation for 
 reasonable financial resource distribution and cash flow 
control. Moreover, the GA-Elman model's ability to 
identify and quantify overspending risk allows enterprises 
Differential value
Relative Error (%)
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 103 
to develop more scientifically-based long-term financial 5 Conclusion 
strategies, thereby reducing the economic losses caused by 
uncontrollable costs. This study analyzes the application of the GA-Elman 
In conclusion, the GA-Elman model has considerable model in construction cost estimation and overspending 
potential in construction cost estimation and economic risk analysis by constructing a construction cost 
risk management. It enhances the intelligence level of estimation model based on the Elman network and 
construction management while providing a reliable tool optimizing the model with GA. It verifies the performance 
for budget optimization and cost control. The model of the model through experiments. The conclusions are as 
contributes positively to lean management and improved follows. (1) The GA-Elman model's high prediction 
economic efficiency in the construction industry. accuracy is demonstrated by the fact that, on the training 
set, the predicted value on most samples is very near to the 
true value and the RE percentage is typically within 1%. 
4.4 Discussion 
(2) When compared to the Elman network, the GA-Elman 
Compared to the existing models summarized in model's projected value is closer to the actual value, and 
Table 1, the proposed GA-Elman model demonstrates on all test samples, the model's difference and RE are 
significant advantages in construction cost estimation and typically smaller than those of the Elman model. (3) The 
overspending risk assessment. In contrast to models such GA-Elman model's MAPE is 2.75%, a considerable 
as GPR and XGBoost, the GA-Elman model is better decrease from the Elman model's 3.37%. It further proves 
suited for handling dynamic changes in time series data. the effectiveness of GA in optimizing neural network 
For instance, while GPR exhibits high accuracy in parameters and improving prediction accuracy. In short, 
predicting tunnel geological conditions, its sensitivity to by optimizing GA, the GA-Elman model increases the 
data scale can lead to decreased computational efficiency ability to detect possible overspending, which is crucial 
when dealing with large-scale complex construction for efficient cost control and budget management, in 
projects. In comparison, the GA-Elman model, by addition to improving the accuracy of cost prediction. 
optimizing weights and thresholds through GA, can Although this study has made some progress in 
process large-scale data more efficiently while fully construction cost estimation and overspending risk 
capturing dynamic changes, thus enhancing the model's assessment, there are still some limitations. First, the 
applicability. robustness of the model needs to be enhanced, as extreme 
The comparison with ANNs and gradient boosting errors occurring on specific data samples indicate 
models indicates that although these models perform well insufficient stability. Second, the study only selects certain 
in rapid construction cost estimation, they lack capability regions and prefabricated buildings, and the limitation of 
in risk assessment. For example, the gradient boosting the sample range may affect the model's generalization 
model primarily focuses on cost optimization and cannot ability, making it difficult to apply to other regions or 
effectively identify key risk factors leading to different building types. Additionally, there may be biases 
overspending. In contrast, the GA-Elman model can in data selection, such as differences between urban and 
predict costs and identify key drivers of overspending rural projects or the impact of various construction 
risks (such as fluctuations in material prices and technologies (e.g., traditional construction versus modern 
construction delays) through its dynamic memory building technologies). These factors could significantly 
mechanism. As a result, it can provide project managers affect the model's applicability and accuracy. Future 
with more targeted decision support. research should consider more comprehensive data 
Compared to hybrid models such as ANN combined collection, covering a wider range of regions, building 
with the Grasshopper algorithm and ARIMA-ANN types, and different construction technologies, to avoid 
models, the GA-Elman model performs better in long- biases caused by data limitations, thereby enhancing the 
term forecasting and modeling complex data relationships. model's generalization ability and adaptability. At the 
Although the ARIMA-ANN model has certain advantages same time, more advanced data preprocessing techniques 
in long-term construction cost estimation, its ability to and algorithm optimization methods can be explored to 
capture nonlinear features is limited. The GA-Elman improve the model's prediction accuracy and stability, 
model, by optimizing network structure through the global providing stronger support for widespread application. 
search capability of GA, can better model nonlinear and 
temporal characteristics. Meanwhile, it can achieve Funding 
superior prediction accuracy in practical tests, with the 
MAPE reduced to 2.75%. This work was supported by Key Laboratory of Building 
In summary, the GA-Elman model outperforms Structures in Anhui Universities (Anhui Xinhua 
existing models in terms of cost prediction accuracy, University) 2023 school-level scientific research project: 
overspending risk assessment ability, and adaptability to “Application Research of BIM Technology in Cost 
complex data. Thus, it offers an innovative solution for Management of Engineering Construction Projects” 
construction cost management and significant practical (Project batch number: KLBSZD202301, Host: Wu Qian). 
guidance for budget control and risk management in 
complex engineering projects. Conflict of interest statement 
There is no conflict of interest in this study. 
104 Informatica 49 (2025) 91–104 Q. Wu 
Ethical compliance statement Farid Hama Ali, H., Ismail Abdullah, A., & Kameran 
Al-Salihi, N. (2021). Forecasting tunnel geology, 
This study does not involve experiments on humans or construction time and costs using machine learning 
animals and does not require ethical approval. methods. Neural Computing and Applications, 33, 
321-348. https://doi.org/10.1007/s00521-020-05006-
References 2 
[1] Li L. (2023). Dynamic cost estimation of [11] Alshboul, O., Shehadeh, A., Almasabha, G., & 
reconstruction project based on particle swarm Almuflih, A. S. (2022). Extreme gradient boosting-
optimization algorithm. Informatica, 47(2), 173-182. based machine learning approach for green building 
https://doi.org/10.31449/inf.v47i2.4026 cost prediction. Sustainability, 14(11), 6651. 
[2] Maya, R., Hassan, B., & Hassan, A. (2023). Develop https://doi.org/10.3390/su14116651 
an artificial neural network (ANN) model to predict [12] Pham, T. Q. D., Le-Hong, T., & Tran, X. V. (2023). 
construction projects performance in Syria. Journal of Efficient estimation and optimization of building 
King Saud University-Engineering Sciences, 35(6), costs using machine learning. International Journal 
366-371. of Construction Management, 23(5), 909-921. 
https://doi.org/10.1016/j.jksues.2021.05.002 https://doi.org/10.1080/15623599.2021.1943630 
[3] Xu, X., & Zhang, Y. (2023). Regional steel price index [13] Goodarzizad, P., Mohammadi Golafshani, E., & 
forecasts with neural networks: evidence from east, Arashpour, M. (2023). Predicting the construction 
south, North, central south, northeast, southwest, and labour productivity using artificial neural network and 
northwest China. The Journal of Supercomputing, grasshopper optimisation algorithm. International 
79(12), 13601-13619. Journal of Construction Management, 23(5), 763-779. 
https://doi.org/10.1007/s11227-023-05207-1 https://doi.org/10.1080/15623599.2021.1927363 
[4] GadelHak, Y., El-Azazy, M., Shibl, M. F., & [14] Kim, S., Choi, C. Y., Shahandashti, M., & Ryu, K. R. 
Mahmoud, R. K. (2023). Cost estimation of synthesis (2022). Improving accuracy in predicting city-level 
and utilization of nano-adsorbents on the laboratory construction cost indices by combining linear ARIMA 
and industrial scales: A detailed review. Science of and nonlinear ANNs. Journal of Management in 
The Total Environment, 875, 162629. Engineering, 38(2), 04021093. 
https://doi.org/10.1016/j.scitotenv.2023.162629 https://doi.org/10.1061/(ASCE)ME.1943-
[5] Abdel-Hamid, M., & Abdelhaleem, H. M. (2023). 5479.0001008 
Project cost control using five dimensions building [15] Ye, M., Wang, J., Si, X., Zhao, S., & Huang, Q. (2022). 
information modelling. International Journal of Analysis on dynamic evolution of the cost risk of 
Construction Management, 23(3), 405-409. prefabricated building based on DBN. Sustainability, 
https://doi.org/10.1080/15623599.2021.1880313 14(3), 1864. https://doi.org/10.3390/su14031864 
[6] Durstewitz, D., Koppe, G., & Thurm, M. I. (2023). [16] Su, D., Fan, M., & Sharma, A. (2022). Construction 
Reconstructing computational system dynamics from of lean control system of prefabricated mechanical 
neural data with recurrent neural networks. Nature building cost based on Hall multi-dimensional 
Reviews Neuroscience, 24(11), 693-710. structure model. Informatica, 46(3). 
https://doi.org/10.1038/s41583-023-00740-7 https://doi.org/10.31449/inf.v46i3.3914 
[7] Hai, T., & Zhou, J. (2023). Predicting the performance [17] Cao, P., & Lei, X. (2023). Evaluating Risk in 
of thermal, electrical and overall efficiencies of a Prefabricated Building Construction under EPC 
nanofluid-based photovoltaic/thermal system using Contracting Using Structural Equation Modeling: A 
Elman recurrent neural network methodology. Case Study of Shaanxi Province, China. Buildings, 
Engineering Analysis with Boundary Elements, 150, 13(6), 1465. 
394-399. https://doi.org/10.3390/buildings13061465 
https://doi.org/10.1016/j.enganabound.2023.02.013 [18] Leu, S. S., Lu, C. Y., & Wu, P. L. (2023). Dynamic-
[8] Das, S., Tariq, A., Santos, T., Kantareddy, S. S., & Bayesian-network-based project cost overrun 
Banerjee, I. (2023). Recurrent neural networks prediction model. Sustainability, 15(5), 4570. 
(RNNs): architectures, training tricks, and https://doi.org/10.3390/su15054570 
introduction to influential research. Machine [19] Asiedu, R. O., & Adaku, E. (2020). Cost overruns of 
Learning for Brain Disorders, 117-138. public sector construction projects: a developing 
https://doi.org/10.1007/978-1-0716-3195-9_4 country perspective. International Journal of 
[9] Hananto, A. L., Fauzi, A., Suhara, A., Davison, I., Managing Projects in Business, 13(1), 66-84. 
Spraggon, M., Herawan, S. G., et al. (2023). Elman https://doi.org/10.1108/IJMPB-09-2018-0177 
and cascade neural networks with conjugate gradient [20] Kumar, R. (2022). Memory recurrent Elman neural 
polak-ribière restarts to predict diesel engine network-based identification of time-delayed 
performance and emissions fueled by butanol as nonlinear dynamical system. IEEE Transactions on 
sustainable biofuel. Results in Engineering, 19, Systems, Man, and Cybernetics: Systems, 53(2), 753-
101334. 762. 10.1109/TSMC.2022.3186610 
https://doi.org/10.1016/j.rineng.2023.101334 [21] Miranda, M. H., Silva, F. L., Lourenço, M. A., Eckert, 
[10] Mahmoodzadeh, A., Mohammadi, M., Daraei, A., J. J., & Silva, L. C. (2023). Particle swarm 
optimization of Elman neural network applied to 
Optimization of Elman Neural Network Using Genetic Algorithm… Informatica 49 (2025) 91–104 105 
battery state of charge and state of health estimation. 
Energy, 285, 129503. 
https://doi.org/10.1016/j.energy.2023.129503 
[22] Chandar, S. K. (2021). Grey Wolf optimization-
Elman neural network model for stock price 
prediction. Soft Computing, 25(1), 649-658. 
10.1007/s00500-020-05174-2 
[23] Zhang, Y., Zhao, J., Wang, L., Wu, H., Zhou, R., & Yu, 
J. (2021). An improved OIF Elman neural network 
based on CSO algorithm and its applications. 
Computer Communications, 171, 148-156. 
https://doi.org/10.1016/j.comcom.2021.01.035 
[24] Sohail, A. (2023). Genetic algorithms in the fields of 
artificial intelligence and data sciences. Annals of 
Data Science, 10(4), 1007-1018. 
https://doi.org/10.1007/s40745-021-00354-9 
[25] Chotchantarakun, K. (2023). Optimizing Sequential 
Forward Selection on Classification using Genetic 
Algorithm. Informatica, 47(9). 
https://doi.org/10.31449/inf.v46i9.4964 
[26] Guo, W., Meng, Q., Wang, X., Zhang, Z., Yang, K., & 
Wang, C. (2022). Landslide displacement prediction 
based on variational mode decomposition and GA–
Elman model. Applied Sciences, 13(1), 450. 
https://doi.org/10.3390/app13010450 
[27] Wang, C., Zhao, Y., Bai, L., Guo, W., & Meng, Q. 
(2021). Landslide displacement prediction method 
based on GA-Elman model. Applied sciences, 11(22), 
11030. https://doi.org/10.3390/app112211030 
https://doi.org/10.31449/inf.v49i12.6927 Informatica 49 (2025) 105–114 105 
Comparison of Machine Learning Algorithms for Predicting Thyroid 
Disorders in Diabetic Patients 
Hiba O. Sayyid*1, Salma A. Mahmood2, and Saad S. Hamadi3  
1Department of Computer Science, University of Basrah, College of Computer Sciences and Information Technology, 
Basrah, Iraq 
2Department of Intelligent Medical Systems, University of Basrah, College of Computer Sciences and Information 
Technology, Basrah, Iraq 
3Department of Internal Medicine, University of Basrah, College of Medicine, Basrah, Iraq 
E-mail: Itpg.hiba.oudah@uobasrah.edu.iq, Salma.mahmood@uobasrah.edu.iq, and Saad.shaheen@uobasrah.edu.iq 
*Corresponding author  
Keywords: machine learning, classification, decision tree, random forest, support vector machine, naïve bayes, logistic 
regression, K-nearest neighbor, diabetes, thyroid disorders 
Received: August 17, 2024 
Machine Learning (ML), a subfield of Artificial Intelligence (AI), has been used successfully in the 
healthcare domain for disease diagnosis. Thyroid disorders and diabetes are two of the most prevalent 
and interconnected chronic diseases, as both play critical roles in regulating various physiological 
processes in the body. This study aims to predict thyroid disorders in diabetes patients using six machine 
learning algorithms: Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), Logistic 
Regression (LR), Naïve Bayes (NB), and Support Vector Machine (SVM). A locally sourced dataset 
comprising 44,539 instances of diabetic patients was utilized, undergoing preprocessing steps including 
data cleaning, encoding, and balancing. Two balancing techniques were employed: manual balancing 
and RandomUnderSampler. The dataset was partitioned into training and testing sets using a Stratified 
K-Fold cross-validation approach with 10 folds to ensure robust evaluation. Each algorithm’s 
performance was assessed using metrics such as accuracy and F1-score. Among the models, the RF 
algorithm outperformed the others, achieving the highest accuracy of 95% on the manually balanced 
dataset and 84% when the RandomUnderSampler technique was employed. Additionally, the F1-scores 
for RF were 95% and 82%, respectively, indicating its robustness in handling imbalanced datasets. This 
study highlights the importance of selecting appropriate preprocessing techniques and machine learning 
methods for healthcare datasets. The findings can assist healthcare providers in making early diagnoses 
and interventions for thyroid disorders in diabetic patients, potentially improving their quality of life and 
overall healthcare outcomes. 
Povzetek: Opisana je uporaba strojnega učenja za napovedovanje motenj ščitnice pri bolnikih s sladkorno 
boleznijo. Algoritem naključnih gozdov doseže najvišjo točnost in oceno F1 na uravnoteženem naboru 
podatkov. 
 
• Type 1 diabetes is an autoimmune disease which is 
1 Introduction usually diagnosed in children and young adults [5], it 
occurs when the insulin-producing cells of the  
Diabetes and thyroid disorders are among the most 
• pancreas is attacked by the immune system which 
prevalent chronic diseases affecting the endocrine and 
leads to little or no insulin [6]. 
metabolic systems [1]. These two diseases are often 
• Type 2 diabetes is the most common type of diabetes 
coexisted and strongly linked together, as many studies 
that often occurs in older adults when the body 
have shown that there is a higher prevalence of thyroid 
doesn’t produce enough insulin or becomes resistant 
disorders in diabetic patients and vice versa [2]. 
to insulin [6]. 
Diabetes is a chronic condition that is caused by 
• Gestational diabetes this type develops as a 
elevated levels of blood sugar (glucose) [3]. This occurs 
complication in women during pregnancy and usually 
when the body either cannot use the insulin it produces 
goes away after the baby is born [7]. 
effectively or cannot produce enough insulin.  Insulin is a 
hormone that allows the body cells to absorb and use • There are fewer common types of diabetes that are 
caused by genetic conditions and diseases such as 
glucose for energy and helps regulate blood sugar [4]. As 
secondary diabetes and monogenic diabetes. 
a result, diabetes affects various body functions. There are 
 
four types of this disease 
 
106 Informatica 49 (2025) 105–114 H.O. Sayyid et al.  
Thyroid disorder is a disease that affects the function theorem, Rule, or law is used to describe the probability 
of the thyroid gland in producing the appropriate amounts of a hypothesis with existing knowledge. Bayes’ theorem 
of thyroid hormones T3 (tri-iodothyronine), and T4(tetra- formula is [11]: 
iodothyronine), as these hormones play an important role 𝐏(𝐀|𝐁) =  (𝐏(𝐁|𝐀)  ∗  𝐏(𝐀)) / 𝐏(𝐁) (1) 
in controlling many vital activities of the body such as  
heart rate, energy level, metabolism, bone health, and NB is computationally efficient, easy to create, and 
many other functions. The most common thyroid can handle large datasets [12]. It is very effective in text 
disorders are Hyperthyroidism and Hypothyroidism [8]. In classification tasks, such as spam filtering. Despite that 
hyperthyroidism, the thyroid gland overproduces thyroid it’s a simple algorithm with the independence assumption, 
hormones. While in hypothyroidism the thyroid gland it can often outperform complex algorithms. 
does not produce enough thyroid hormones [8].  Decision Tree (DT): is one of the supervised Machine 
Studies show that there is a higher prevalence of learning algorithms that is used for both classification and 
thyroid disorders among patients with type 1 or type 2 regression problems [14]. DT is a visual representation of 
diabetes in comparison to non-diabetic patients, which the decision-making process, it’s a tree-like graph that 
reveals their close relationships, it also shows that partitions the data based on the input features, the tree 
autoimmunity is a key to understanding the link between starts with a root which has the highest gain then nodes 
type 1 diabetes and autoimmune thyroid disorders [9].  and branches. Where each node represents a test that 
The presence of insulin resistance or diabetes follows the if-then statement and leads to a different 
increases an individual’s risk of developing thyroid branch, each branch leads to one outcome (decision) [15]. 
disorders while having thyroid disorder can increase the It is a widely used algorithm for predicting diseases. 
risk of developing diabetes and metabolic syndrome [10]. K-Nearest Neighbors (KNN) is one of the simplest 
It is very important to diagnose thyroid disorder in diabetic lazy learning machine learning algorithms that make 
patients and a routine screening should also be predictions based on the entire data [12],[16]. The 
recommended. It is necessary that the clinician identify the algorithm is used for solving classification and regression 
high-risk diabetic groups and manage the thyroid tasks. KNN assumes that similar data points are located 
abnormalities if present as soon as possible to reduce the near each other, the similarity is called distance. It uses 
risk of further complications [10]. distant measures like Euclidean to measure similarity. 
The primary aim of this study is to assess the Although KNN is a very simple and easy-to-implement 
effectiveness of six machine learning methods (Decision algorithm, its results can be very competitive [16]. 
tree, random forest, Support Vector Machine, Naïve Support Vector Machine (SVM) is one of the most 
Bayes, k nearest neighbor, logistic regression) in popular machine learning algorithms. It is used primarily 
predicting the presence or absence of thyroid disorders in for classification tasks but can also be used for regression 
diabetes patients. By comparing the results of each [12]. The main goal of SVM is to separate the data with a 
method, we aim to identify the most accurate model to hyperplane into different classes so that we can easily put 
enhance early detection and intervention. Machine the new data point into one of the classes [11]. SVM can 
learning methods used in this study differ in their nature be effective in high-dimensional spaces and is widely used 
and work but they are all used for predicting new states. in image classification, Face detection, text categorization, 
Logistic regression (LR): is a classification machine and handwriting recognition. 
learning algorithm that is used for predictive analysis Random Forest (RF) is a machine learning algorithm 
based on the concept of probability [11]. LR classifies the that belongs to the group of decision-tree-based methods 
data using the logistic sigmoid function. LR predicts one [13]. It can be used for classification tasks and regression. 
of two possible outcomes of a categorical dependent Random forest is a collection of decision trees built during 
variable. Therefore, the outcome must be a categorical or the training process and then the prediction of these trees 
discrete value. It does not give the value of either True or is combined during the testing process. RF approach gives 
False, Yes or No, 0 or 1, etc. instead, it gives a a better accurate result in comparison to a single decision 
probabilistic value between 0 and 1[12].  To classify tree with the ability to limit overfitting [16]. 
instances into the two classes. a common approach is to The organization of this paper is as follows, …. 
use a threshold value (e.g., 0.5), If the predicted 
probability is above the threshold, the instance is assigned 2 Literature review  
to one class, and if it is below the threshold, then it is 
assigned to the other class. LR is widely used for many In the field of machine learning-based prediction of 
tasks such as fraud detection, disease diagnosis, and diabetes and thyroid disorders, numerous studies have 
prediction, Tumor Malignant or Benign, mail spam or not explored various algorithms and methodologies. This 
spam, etc. [11]. section provides a structured comparison of these studies 
Naïve Bayes (NB): is a simple machine learning in terms of the algorithms used, evaluation metrics, and 
classification algorithm based on Bayes’ Theorem [13]. It the reported results. By highlighting the strengths and 
is called naïve because of the assumption of conditional limitations of previous works, we emphasize the novelty 
independence among the features which means that the and contributions of the present study. 
presence or absence of one feature in a class is 
independent of the presence or absence of the other 
features. It is used for a large amount of data. Bayes’ 
Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 107 
2.1 Diabetes prediction studies Dudkina Classification DT based Accuracy DT:71% 
T et al. and detection model 
Hassan et al. [17] applied SVM, K-Nearest Neighbors (2021) of diabetes 
disease 
(KNN), and Decision Tree to classify diabetes patients. Yadav et Predicting Random Accuracy RF: 99% 
The study showed that SVM outperformed the other al. (2020) thyroid Forest, 
disease Decision 
algorithms with the highest accuracy of 90.23%. 
Tree, CART 
Samin Poudel [18] tested 20 machine learning Priyanka Diagnosing Naive Bayes, Accuracy SVM: 
algorithms for diagnosing diabetes based on the Pima Duggal & thyroid SVM, 92.92% 
Shipra disorders Random 
Indian Diabetes Dataset. Naive Bayes emerged as the best- Shukla Forest 
performing algorithm with an accuracy of 77%, an F1- (2020) 
Chaubey Thyroid Logistic Accuracy KNN: 
score of 0.83, a precision of 0.80, and a recall of 0.86. 
G. et al. disease Regression, 96.88% 
Dudkina T et al. [19] presented a study that is (2012) prediction Decision 
dedicated to handling the problem of Classification and Trees, KNN 
Chaganti predicting RF, SVM, Accuracy RF: 99% 
detection of diabetes disease. The study focuses on et al. thyroid AdaBoost 
developing a decision tree-based machine learning model (2022) disorders (ADA), LR, 
and Gradient 
to solve this problem. The results showed that splitting the boosting 
data by 50% for training and 50% for testing was the best machine 
option with 0.71 accuracy. (GBM), as 
well as three 
deep learning 
models 
2.2 Thyroid disease prediction studies Current Predicting RF, DT, Accuracy, RF with 
Yadav D et al. [20] used Random Forest, Decision Tree, study thyroid SVM, KNN, F1-Score, Accuracy: 
disorders in NB, and LR Precision, 88%, F1-
and Classification and Regression Tree (CART) to predict diabetic Recall, and Score: 
thyroid disease. The results showed that Random Forest patients  Specificity  85% 
achieved an accuracy of 99%, followed by Decision Tree  
98% and CART 93%. Their ensemble approach From the table above, we can see that various studies 
combining these classifiers achieved a perfect accuracy of have employed different algorithms to predict diabetes 
100%. and thyroid disorders with varying results. For instance, 
Priyanka Duggal and Shipra Shukla [21] used feature SVM and Decision Tree techniques are commonly used in 
selection and classification techniques like Naive Bayes, diabetes prediction, with SVM often yielding higher 
SVM, and Random Forest to diagnose thyroid disorders. accuracy compared to other algorithms. On the other hand, 
The study reported that SVM achieved the highest for thyroid disease prediction, Random Forest and KNN 
accuracy with 92.92%. have been reported to achieve remarkable accuracy, with 
Chaubey G. et al. [22] tested Logistic Regression, Random Forest reaching up to 100% accuracy when 
Decision Trees, and KNN for thyroid disease prediction. combined with ensemble methods. 
KNN achieved the highest accuracy at 96.88%. While these studies have contributed significantly to 
Chaganti et al. [23] presented a method that focuses the field, there remains a gap in comprehensive and 
on the multi-class problems to predict thyroid disorders reliable approaches for predicting thyroid disorders 
using five machine learning models including RF, SVM, specifically in the diabetic population. They often focus 
AdaBoost (ADA), LR, and Gradient boosting machine on either one disorder or use fewer evaluation metrics. 
(GBM), as well as three deep learning models. They Some studies rely primarily on accuracy, which may not 
created a dataset from the UCI thyroid disease datasets reflect the model's true performance, especially when 
that contained 9173 patient records,31 features, and 6771 class imbalance exists. The F1-score and AUC metrics are 
normal patient records with no sign of thyroid disease. The more informative but have not been consistently used 
dataset was randomly balanced by taking 400 samples across studies. 
from the 6771 records, and at least 200 samples for the The current study addresses these gaps by utilizing a 
other classes. The results showed that when using the comprehensive preprocessing pipeline that includes 
random forest classifier with the presented method it can feature selection technique, and effective class imbalance 
achieve a 0.99 accuracy in predicting ten types of thyroid handling using methods like RandomUnderSampler. 
diseases. Additionally, this study adopts a range of evaluation 
metrics (accuracy, F1-score, precision, recall, and 
Table 1:Summary table specificity) to offer a well-rounded analysis of model 
Study Methodology Algorithms Key Results performance. Furthermore, we compare multiple machine 
  Used Evaluation learning models RF, SVM, KNN, DT, NB, and LR using 
Metrics 
Hassan et Classifying SVM, KNN, Accuracy SVM: cross-validation, which not only strengthens the model 
al (2020). diabetes DT 90.23% evaluation but also ensures more robust generalization to 
patients unseen data. 
Samin Diagnosing 20 ML Accuracy, Naive 
Poudel diabetes approaches Precision, Bayes: By offering a balanced prediction model with high 
(2021)  Recall, F1- Accuracy accuracy (88%) and F1-score (0.85), the current study 
score 77%, F1-
score 83%, surpasses previous works in terms of both the depth of 
Precision analysis and the performance metrics, which positions it 
80% 
108 Informatica 49 (2025) 105–114 H.O. Sayyid et al.  
as a significant advancement in predicting thyroid accuracy required for clinical applications and minimized 
disorders in diabetic patients. the risk of introducing errors associated with imputation. 
3 Proposed methodology  After removing incomplete records, the dataset was 
carefully inspected to confirm that it remained 
The main objective of this study is to predicate the representative of the original population in terms of key 
relationship between diabetes mellitus and Thyroid demographic and clinical features, ensuring that the 
disorders. Six different prediction methods were used for removal process did not introduce unintended biases. 
this purpose as aforementioned above. The proposal 
methodology is shown in the following Figure 1. 
 3.2.2 Data cleaning 
Data cleaning is a critical process that significantly 
impacts the quality and reliability of predictive models. A 
clean dataset ensures accurate and robust machine 
learning models with improved performance and 
trustworthy predictions. In this study, thorough data 
cleaning was performed to address various issues and 
errors present in the dataset. The data cleaning process 
involved 
• Identifying and rectifying incorrect data entries which 
included instances where ambiguous letters, words, 
and symbols were used such as ‘, \\, \L, \N,], B, E, EX, 
L, M, MN, N, N’, N N, NNNN, N\, N\], N\N, N], N] 
\, H, U, صى ة,   ,. Such values represent noises and 
inaccuracies in the dataset. Furthermore, 
inconsistencies in data entry were addressed, 
including the use of 'لا' in Arabic instead of 'No', as 
well as the recording of 'N' instead of 'No'. 
 Additionally, discrepancies in capitalization were 
noted, such as 'female' being recorded instead of 
Figure 1: Flowchart of the thyroid disorder prediction 'Female'. By rectifying these mistakes, the dataset was 
system. standardized, eliminating potential sources of error in 
the analysis. 
The above flowchart is illustrative of the following 
process: • Handling the age field by determining ages in ranges 
(15-100 years), in line with the policy of the diabetes 
3.1 Data collection center catering to adults only.  
For this study, we used a medical dataset related to 
• Similarly, filtering out heights and weights that fell 
diabetes patients which was obtained from Faiha 
outside the normal ranges. These actions were 
Specialized Diabetes Endocrine and Metabolism Center 
essential to preserve the integrity of the dataset and 
(FDEMC) in Basra, Iraq. 
enhance the accuracy of our analyses. 
3.2   Data preprocessing  
3.2.3 Feature engineering 
Data preprocessing was a critical step in preparing the 
To enhance the performance of machine learning models, 
dataset for effective machine learning model training and 
new features were derived from existing ones through 
evaluation. This section elaborates on the detailed 
feature engineering. For example, the Age feature was 
procedures used, including handling missing values, 
computed from the patients' dates of birth, and the Body 
feature engineering, and encoding categorical variables 
Mass Index (BMI) was calculated using height and weight 
ensuring replicability and transparency. 
measurements. These newly created features provided 
additional insights into patient characteristics, which 
3.2.1 Handling missing values contributed to improving the predictive power of the 
models. 
Given the sensitivity of clinical data and the potential risks 
of introducing bias through imputation, instances with 3.2.4 Encoding categorical variables 
missing values were excluded from the dataset. This 
approach ensured the integrity and reliability of the Since machine learning algorithms generally require 
analysis by working exclusively with complete data. numerical input, data encoding is essential to convert 
While this reduced the dataset size, it maintained the categorical variables into a suitable format. This study 
Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 109 
used label encoding to transform variables such as sex, simplicity and effectiveness in achieving balance without 
family history of DM, glycemic control, lipid control, introducing synthetic data. 
pressure control, thyroid, marital status, smoker, and  
drinker into numerical representations compatible with 
machine learning models. After these steps the dataset 3.3.2 Experiment 2: manual balancing 
consists of 44539 instant and 12 variables, Table 
In the second experiment, the dataset was manually 
2illustrates each variable along with its corresponding balanced under the expert supervision of a physician to 
encoded values. ensure the process was clinically valid and aligned with 
medical standards. The dataset was reduced to 2,166 
Table 2:  Description of the used data instances, with an equal number of examples from both 
Feature  Description  Value After classes. Unlike RUS, manual balancing involved the 
Encoding careful selection of instances, allowing for greater control 
Thyroid If the patient is diagnosed with a 0 means No over the data distribution while preserving its clinical 
thyroid disorder. 1 means Yes 
DM If the patient has type1 or type2 1 for type1 relevance. This approach mitigated the potential bias 
Diabetes Mellitus 2 for type2 introduced by random sampling, ensuring that the 
Age The patient's age in years Range (15-100) 
balanced dataset reflected real-world clinical scenarios. 
Sex The patient’s gender: 0 for male 
1 for female  
Family If the patient has a family member with 0 means No Although techniques such as RandomOverSampler 
history of DM diabetes 1 means Yes 
BMI Body Mass Index: the patient’s weight Range (10.8- (ROS), Synthetic Minority Over-Sampling Technique 
divided by the square of height  75.3) (SMOTE), and ensemble methods like Balanced Random 
Lipid control The patient’s lipid levels in the 0 means No Forest (BRF) are widely used for handling imbalanced 
bloodstream are managed 1 means Yes 
Pressure The patient’s blood pressure levels are 0 means No data, they were not employed in this study. The primary 
control managed to stay in a specific target 1 means Yes concern was that synthetic data might fail to capture the 
range 
Glycemic The patient’s blood sugar levels are 0 means No true clinical variability of the minority class, potentially 
control managed in a specific target range 1 means Yes introducing artificial patterns that could distort model 
Smoker If the patient is a current smoker, non- 0 means No predictions and reduce generalizability. Additionally, 
smoker, or former smoker. 1 means Yes 
2 means X- these methods increase computational complexity and 
smoker training time, making them less suitable for the objectives 
Drinker If the patient is a current drinker, non- 0 means No 
drinker, or former drinker  1 means Yes of this study. Instead, simpler and more controlled 
2 means X- balancing methods were chosen to maintain a 
drinker representative and manageable dataset. 
Marital  If the patient is married, single, 0 means Single 
divorced, or widowed. 1 means Married 
2 means 3.4 Model selection and training 
Divorced 
3 means Widow  
 
3.4.1 Model selection 
In this study, we employed six machine learning 
3.3 Addressing class imbalance 
algorithms to predict thyroid disorders in diabetic patients: 
Class imbalance is a prevalent challenge in machine Random Forest (RF), Decision Tree (DT), K-Nearest 
learning, especially in healthcare datasets where minority Neighbors (KNN), Logistic Regression (LR), Naïve 
classes often represent critical conditions. In this study, Bayes (NB), and Support Vector Machine (SVM). These 
the dataset was imbalanced, with only 15.17% of instances models were selected for their diverse characteristics and 
representing patients with thyroid disorders (6,755 strengths in classification tasks, particularly in medical 
instances), compared to 84.83% without thyroid disorders datasets. Allowing us to compare their performance in 
(37,784 instances). To address this imbalance, two addressing the two different datasets. The rationale for 
techniques were employed. selecting these models is summarized below: 
 
3.3.1 Experiment 1: RandomUnderSampler • Random Forest (RF) was chosen for its ensemble 
(RUS) nature, which combines multiple decision trees to 
In the first experiment, the RandomUnderSampler (RUS) reduce overfitting and improve generalization. RF is 
technique was used to address the class imbalance. This particularly effective in handling high-dimensional 
method randomly reduces the size of the majority class to datasets with complex interactions. Additionally, RF 
match that of the minority class, creating a balanced provides feature importance rankings, offering 
dataset. After applying RandomUnderSampler, the dataset insights into which factors contribute most to 
was reduced to 13,438 instances, with an equal predictions. 
distribution of 50% representing patients with thyroid • Decision Tree (DT) was selected for its simplicity, 
disorders and 50% without. While this approach ensures interpretability, and ability to model nonlinear 
that the models are not biased toward the majority class, it relationships. Furthermore, DTs offer visual 
can result in the loss of valuable information by discarding representations of decision rules, making them 
majority-class instances. Nonetheless, it was chosen for its especially useful for understanding model behavior. 
110 Informatica 49 (2025) 105–114 H.O. Sayyid et al.  
• K-Nearest Neighbors (KNN) was included due to its ranked by Random Forest importance, using 10-fold 
ability to perform well in non-linear decision Stratified Cross-Validation to evaluate each combination. 
boundaries by evaluating the proximity between The optimal configuration was selected based on the 
instances. It is an intuitive algorithm that can be highest cross-validation accuracy and minimal train-test 
effective when there are clear clusters in the data. accuracy differences, ensuring good generalization and 
• Logistic Regression (LR) was chosen for its minimizing the risk of overfitting or underfitting during 
simplicity, interpretability, and strong performance in cross-validation. 
binary classification tasks. As a linear model, LR The combination of multiple models, cross-
serves as a robust baseline, helping to benchmark the validation, sequential feature selection, and 
performance of more complex approaches. hyperparameter tuning ensured that we could rigorously 
• Naïve Bayes (NB) was selected for its simplicity and evaluate the performance of each algorithm and select the 
efficiency in handling large datasets with categorical one best suited for predicting thyroid disorders in diabetic 
features. Its probabilistic nature makes it well-suited patients. This approach provided a comprehensive 
for classification tasks with independent features. understanding of the strengths and weaknesses of each 
particularly the Gaussian variant, model, helping guide the decision-making process for 
• Support Vector Machine (SVM) was chosen for its real-world applications. 
ability to find complex decision boundaries in high-
dimensional spaces. It is particularly effective in 3.5 Evaluation  
separating classes with a clear margin. The evaluation phase focused on assessing and comparing 
the performance of the models using different metrics: 
3.4.2 Training accuracy, precision, recall, F1 score, sensitivity, 
Initially, a Random Forest classifier was employed to specificity, and a confusion matrix to provide a 
determine the most influential features by ranking them comprehensive view of the model’s ability to correctly 
based on their importance scores shown in Figure 2 and classify instances. The metrics were calculated based on 
Figure 3. These top-ranked features were subsequently the model predictions on the test dataset. 
utilized for training the models.  
All models were trained using Stratified K-Fold cross- Accuracy: means how many times the model made a 
validation with 10 folds, ensuring that the distribution of correct prediction among the total number of instances 
thyroid and non-thyroid patients was maintained in each [16]. 
fold. This method provides a robust evaluation of the 𝑻𝑷 + 𝑻𝑵 (2) 
𝒂𝒄𝒄𝒖𝒓𝒂𝒄𝒚 =  
models' performance by assessing them across multiple 𝑻𝑷 + 𝑻𝑵 + 𝑭𝑵 + 𝑭𝑷
data splits, which helps mitigate the risk of overfitting or  
underfitting. Precision: means the number of positive (correct) 
To enhance the feature selection process, we used a predictions made by the model and belongs to the positive 
sequential feature selection approach, where we started by class [12]. 
𝑻𝑷
training each model with a single feature and (3) 
𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 =  
𝑻𝑷 + 𝑭𝑷
incrementally added more features. This allowed us to 
 
identify the most relevant features for each model and 
Recall (Sensitivity): means the number of actual 
ensured that only the most informative variables were 
positive (correct) predictions made by the model out of all 
used, optimizing the model's performance. 
positive examples in the dataset [15]. 
We assessed both training and testing accuracies to 𝑻𝑷 (4) 
evaluate how well each model generalized to unseen data. 𝒓𝒆𝒄𝒂𝒍𝒍 =  
𝑻𝑷 + 𝑭𝑵
By comparing these accuracies, we were able to detect  
potential overfitting (where the model performs well on F1score: provides a single score that combines both 
training data but poorly on testing data) or underfitting precision and recall in one number to find balance [24]. It 
(where the model performs poorly on both training and is needed when there is uneven class distribution (more 
testing data). This evaluation ensured that the models negative). 
maintained a balance between accuracy and 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 ∗ 𝒓𝒆𝒄𝒂𝒍𝒍 (5) 
generalization. 𝒇𝟏𝒔𝒄𝒐𝒓𝒆 = 𝟐 ∗  
𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 + 𝒓𝒆𝒄𝒂𝒍𝒍
 
3.4.3 Hyperparameter tuning Specificity (True Negative Rate): The percentage of 
Hyperparameter tuning was performed to optimize model actual negatives properly identified by the model [12]. 
𝑻𝑵
performance. For RF and DT, fixed parameters such as (6) 
𝒔𝒑𝒆𝒄𝒊𝒇𝒊𝒄𝒊𝒕𝒚 =  
max_depth=10 and n_estimators=100, were selected after 𝑻𝑵 + 𝑭𝑷
 
experimenting with various combinations of parameter 
Each model's performance was evaluated using these 
values. These experiments involved testing different 
metrics. The fold that yielded the highest accuracy with 
depths for the trees and numbers of estimators to evaluate 
equal training and testing accuracies was noted, along with 
their impact on the model's performance. While the 
the corresponding optimal number of features. 
hyperparameter tuning for KNN involved testing different 
numbers of neighbors (1–10) and subsets of top features 
Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 111 
4 Results 
In this section, we present the feature importance ranking 
results and the evaluation results of the six machine 
learning models used.
 
Figure 2: Feature importance ranking for Experiment 1 (on the RandomUnderSampler balanced dataset) 
 
Figure 3:Feature importance ranking for experiment 2 (on the manually balanced dataset) 
 
Figure 2 and Figure 3 display the feature importance 
Table 3:  Experiment 1 evaluation metrics comparison 
ranking derived from a Random Forest model, used to 
table. 
predict thyroid disorders in diabetic patients. The x-axis 
shows the relative importance of each feature, with higher Classifi Accura Precisi F1- Sensitiv Specif Confusion 
er  cy on  Sco ity icity Matrix 
values indicating greater influence on the model's re  (Recall) 
predictions. BMI and age are identified as the most critical RF 0.84 0.96 0.82 0.713 0.967 [[649,22], 
features, with BMI showing the highest impact. Other [193,479]] 
DT 0.83 0.95 0.81 0.702 0.960 [[644,2], 
features, such as diabetes type and sex, also contribute to [200,472]] 
the model but with comparatively lower importance. This KNN 0.83 0.92 0.81 0.720 0.934 [[627,4], 
[188,484]] 
ranking provides valuable insights into the factors most 
SVM 0.79       0.85       0.77       0 .708       0.871 [[585,87], 
predictive of thyroid disorders in the context of diabetes.   [196,476]] 
The results emphasize the significance of clinical factors LR 0.78 0.84 0.76 0.701 0.868 [[583,89], 
[201,471]] 
like BMI and age in thyroid disorder prediction for NB 0.78 0.84 0.76 0.701 0.868 [[583,89], 
diabetes patients. [201,471]] 
  
 
 
 
 
 
 
112 Informatica 49 (2025) 105–114 H.O. Sayyid et al.  
Table 4:  Experiment 2 evaluation metrics comparison 5 Discussion  
table. 
This study highlights the efficacy of machine learning 
Classifi Accura Precisi F1- Sensitiv Specific Confusi models, particularly the Random Forest (RF) algorithm, in 
er cy on  Sco ity ity on 
re  (Recall) Matrix predicting thyroid disorders among diabetic patients. The 
RF 0.95 0.99 0.95 0.917 0.991 [[107,1], findings emphasize the importance of model selection, 
[9, 100]] 
DT 0.95 0.96 0.95 0.944 0.963 [[105,4], data preprocessing, and feature analysis in achieving high 
[6, 102]] predictive performance. This section explores 
KNN 0.94 0.97 0.94 0.917 0.972 [[105,3], comparisons with related works, reasons for Random 
[9, 100]] 
LR 0.94 1.00 0.94 0.890 1.000 [[108,0], Forest’s superior performance, variations in model 
[12, 97]] effectiveness, and limitations, alongside real-world 
SVM 0.93 1.00 0.93 0.861 1.000 [[108,0], 
implications of the findings. 
[15, 93]] 
NB 0.93 1.00 0.93 0.861 1.000 [[108,0],  
[15, 93]] 
 
5.1 Comparison with related works 
The results show that in Experiment 1 where the The findings align with recent studies in the literature that 
RandomUnderSampler technique was employed for data 
emphasize the utility of machine learning for healthcare 
balancing, the Random Forest model demonstrated 
applications. For instance, studies such as Yadav et al. 
superior performance across all metrics compared to other 
demonstrated the effectiveness of ensemble-based models 
models achieving the highest accuracy of 84%, precision 
of 96%, and F1-score of 82%. Followed by DT and KNN like RF in handling structured medical datasets, 
classifiers having the same accuracy of 83%. particularly for classification problems. Compared to 
other methods, the RF model in this study yielded superior 
 However, SVM and LR showed a lower performance, accuracy, recall, and precision, which can be attributed to 
with accuracies of 79% and 78%, respectively.  its ability to handle non-linear relationships and its robust 
In Experiment 2, which used a manually balanced feature selection mechanism. 
dataset, all the classifiers performed extremely well across  While [Priyanka Duggal & Shipra Shukla (2020)] also 
all metrics, with the RF classifier achieving the highest applied Support Vector Machines (SVMs) to medical 
accuracy of 95%, precision of 99%, and an F1-score of datasets with 92% accuracy, our results indicate that SVM 
95% indicating the model's high effectiveness in predicting underperformed relative to RF, potentially due to the high 
thyroid disorders. dimensionality of the features or the imbalanced nature of 
the dataset. This highlights the importance of model 
 Similarly, the DT and KNN also demonstrated high 
selection based on the characteristics of the data. 
accuracies of 95%, and 94%, Correspondingly. This great 
performance is most likely due to the balanced data that 
ensured a better representation of both classes, leading to 5.2 Reasons for random forest's 
more reliable model predictions. performance superiority 
The sensitivity and specificity of these models are The RF model's outperformance can be attributed to 
significantly higher in Experiment 2 compared to several key factors. First, its inherent ability to handle both 
Experiment 1, showcasing the efficacy of the manually categorical and numerical data without extensive 
balanced dataset in enhancing model performance. preprocessing makes it well-suited for medical datasets, 
The results showed that while using which often include diverse feature types. Second, the use 
RandomUnderSampler for data balancing in the first of RandomUnderSampler for data balancing helped 
experiment, the models did not reach the same level of mitigate the issue of class imbalance, which is a critical 
effectiveness as in the manually balanced dataset in the challenge in predicting rare conditions such as thyroid 
second experiment which achieved a consistently high disorders in diabetic patients. RF’s capacity to combine 
performance across all classifiers. This highlights that predictions from multiple decision trees also reduces the 
choosing thoughtful and effective data-balancing risk of overfitting, ensuring more generalized predictions. 
technique can improve the model's overall performance Furthermore, feature importance analysis revealed that 
and prediction accuracy.  variables such as BMI, age, and diabetes type were among 
In summary, for both experiments, the Random Forest the most predictive, aligning with clinical insights and 
model emerged as the best-performing algorithm for lending credibility to the model. 
predicting thyroid disorders in diabetic patients, followed 
closely by the Decision Tree and K Nearest Neighbors 5.3 Variations in performance across 
models. These models demonstrated high accuracy, models 
precision, recall, and F1-score, making them suitable for 
deployment in clinical settings. Logistic Regression, Naïve The variations in performance between models can be 
Bayes, and SVM, while useful, showed comparatively linked to their differing sensitivities to the dataset 
lower performance and may require further optimization characteristics. For example, while K-Nearest Neighbors 
for effective use in this context. (KNN) is sensitive to feature scaling and data distribution, 
its relatively low performance could stem from the high 
dimensionality of the dataset. Similarly, SVM’s reliance 
Comparison of Machine Learning Algorithms for Predicting Thyroid… Informatica 49 (2025) 105–114 113 
on kernel functions may not have adequately captured the In summary, this research contributes to the growing 
complex interactions within the data. In contrast, Decision body of evidence supporting machine learning’s role in 
Trees (DT) performed reasonably well but lacked the healthcare, particularly for complex, multifactorial 
ensemble effect of RF, leading to slightly lower accuracy diseases. Future work should focus on validating these 
and recall. These findings suggest that models like RF, findings in diverse clinical settings, exploring alternative 
which can effectively leverage feature interactions and resampling techniques, and integrating these models into 
handle imbalanced data, are better suited for this specific healthcare systems for real-world application. 
prediction task. 
References  
5.4 Limitations and real-world 
 
applicability [1] F. Rong et al., “Association between thyroid 
Despite these promising results, several limitations must dysfunction and type 2 diabetes: a meta-analysis of 
be acknowledged. First, the study relied on a single prospective observational studies,” BMC Medicine, 
dataset, which may limit the generalizability of the vol. 19, no. 1, Oct. 2021, doi: 
findings to other populations or healthcare settings. https://doi.org/10.1186/s12916-021-02121-2. 
Second, while RandomUnderSampler addressed class [2] B. Biondi, G. J. Kahaly, and R. P. Robertson, 
imbalance, other techniques such as SMOTE or hybrid “Thyroid Dysfunction and Diabetes Mellitus: Two 
approaches could be explored for potentially better results. Closely Associated Disorders,” Endocrine Reviews, 
Additionally, the dataset’s retrospective nature may vol. 40, no. 3, pp. 789–824, Jan. 2019, doi: 
introduce biases inherent to the original data collection https://doi.org/10.1210/er.2018-00163. 
process. [3] N. T. Y. Alibrahim, M. G. Chasib, S. S. Hamadi, and 
In real-world healthcare environments, the A. A. Mansour, “Predictors of Metformin Side 
applicability of this method is promising. The RF model's Effects in Patients with Newly Diagnosed Type 2 
interpretability, particularly through feature importance Diabetes Mellitus,” Ibnosina Journal of Medicine 
scores, provides clinicians with actionable insights, aiding and Biomedical Sciences, Apr. 2023, doi: 
in early diagnosis and tailored treatment planning. https://doi.org/10.1055/s-0043-1761215.  
However, practical deployment would require rigorous [4] I. Tasin, T. U. Nabil, S. Islam, and R. Khan, “Diabetes 
external validation and integration with electronic health prediction using machine learning and explainable AI 
records to assess scalability and user-friendliness. techniques,” Healthcare Technology Letters, vol. 10, 
no. 1–2, pp. 1–10, Dec. 2022, doi: 
6 Conclusion  https://doi.org/10.1049/htl2.12039. 
[5] S. Hassan, A.-K. Ali, and R. Saleem, “Relationship 
Early prediction and diagnosis of diseases remain critical between glycemic control and different insulin 
challenges in the medical domain, particularly for regimens in pediatric type 1 diabetes mellitus,” The 
interconnected conditions like diabetes and thyroid Medical Journal of Basrah University, 2023, doi: 
disorders. While many studies have focused on predicting https://doi.org/10.33762/mjbu.2023.140990.1138. 
these diseases individually, limited research exists on [6] R. Kumar, P. Saha, S. Sahana, Yogendra Kumar, A. 
predicting thyroid disorders specifically among diabetic Dubey, and O. Prakash, “A REVIEW ON 
patients. DIABETES MELLITUS: TYPE1 & 
This study aimed to bridge this gap by applying six TYPE2,” WORLD JOURNAL OF PHARMACY AND 
machine learning algorithms to a local dataset of diabetic PHARMACEUTICAL SCIENCES, vol. 9, no. 10, pp. 
patients to predict the likelihood of thyroid disorders. 838–850, Aug. 2020, doi: 
Unlike previous studies that treated these conditions https://doi.org/10.20959/wjpps202010-17336. 
independently, this research explored the relationship [7] C. McElwain, F. McCarthy, and C. McCarthy, 
between diabetes and thyroid disorders, given their “Gestational Diabetes Mellitus and Maternal Immune 
intertwined impact on vital body functions. Dysregulation: What We Know So 
Among the tested algorithms, the Random Forest 
Far,” International Journal of Molecular Sciences, 
model emerged as the most effective, achieving the 
vol. 22, no. 8, p. 4261, Apr. 2021, doi: 
highest accuracy, precision, and recall. Its ability to handle 
https://doi.org/10.3390/ijms22084261.  
imbalanced data and highlight key predictive features, 
[8] K. Dharmarajan, K. Balasree, A.S. Arunachalam, and 
such as BMI, age, and diabetes type, further solidifies its 
K. Abirmai, “Thyroid Disease Classification Using 
potential as a valuable tool for early diagnosis. 
Decision Tree and SVM,” Indian Journal of Public 
The implications of these findings extend to 
Health Research & Development, vol. 11, no. 03, pp. 
enhancing healthcare practices by enabling clinicians to 
229, Mar. 2020. Doi: 
identify diabetic patients at risk of thyroid disorders, 
https://www.researchgate.net/publication/341742234
facilitating timely interventions, and potentially reducing 
_Thyroid_Disease_Classification_Using_Decision_
complications. By improving early detection, this 
Tree_and_SVM 
approach could significantly enhance the quality of life for 
[9] M. Nishi, “Diabetes mellitus and thyroid 
individuals affected by both conditions. 
diseases,” Diabetology International, vol. 9, no. 2, 
114 Informatica 49 (2025) 105–114 H.O. Sayyid et al.  
pp. 108–112, May 2018, doi: Letters, vol. 44, no. 3, pp. 233–238, May 2020, doi: 
https://doi.org/10.1007/s13340-018-0352-4. https://doi.org/10.1007/s40009-020-00979-z. 
[10] P. Sharma, S. Shrestha, and P. Kumar, “A review on [23] R. Chaganti, F. Rustam, I. De La Torre Díez, J. L. V. 
association between diabetes and thyroid disease,” Mazón, C. L. Rodríguez, and I. Ashraf, “Thyroid 
Santosh University Journal of Health Sciences, vol. 5, Disease Prediction Using Selective Features and 
no. 2, pp. 50–55, Jan. 2020, doi: Machine Learning Techniques,” Cancers, vol. 14, no. 
http://doi.org/10.18231/j.sujhs.2019.013. 16, p. 3914, Aug. 2022, doi: 
[11] S. Gopal, P. Gaurav, and D. Prateek, Machine https://doi.org/10.3390/cancers14163914.  
learning algorithms using Python programming. New [24] G. S. Ohannesian and E. J. Harfash, “Epileptic 
York: Nova Science Publishers, 2021. Seizures Detection from EEG Recordings Based on a 
[12] A. Panesar, Machine Learning and AI for Hybrid system of Gaussian Mixture Model and 
Healthcare: big data for improved health outcomes. Random Forest Classifier,” Informatica, vol. 46, no. 
Berkeley, CA: Apress, 2021. doi: 6, Sep. 2022, doi: 
https://doi.org/10.1007/978-1-4842-6537-6. https://doi.org/10.31449/inf.v46i6.4203.  
[13] F. Pedro and G. Márquez, Handbook of research on  
big data clustering and machine learning. Hershey,  
PA: Engineering Science Reference (an imprint of  
IGI Global), 2020. 
[14] I. H. Sarker, “Machine Learning: Algorithms, Real-  
World Applications and Research Directions,” SN  
Computer Science, vol. 2, no. 3, pp. 1–21, Mar. 2021,  
doi: https://doi.org/10.1007/s42979-021-00592-x. 
[15] Yuxi. (Hayden). Liu, Python Machine Learning by 
Example Build Intelligent Systems Using Python, 
TensorFlow 2, Pytorch, and Scikit-Learn, 3rd 
Edition. Birmingham: Packt Publishing, Limited, 
2020. 
[16] S. L. Mirtaheri and R. Shahbazian, Machine Learning 
Theory to Applications. CRC Press, 2022. doi: 
https://doi.org/10.1201/9781003119258. 
[17] A. H. Khassawneh et al., “Prevalence and Predictors 
of Thyroid Dysfunction Among Type 2 Diabetic 
Patients: A Case–Control Study,” International 
Journal of General Medicine, vol. Volume 13, pp. 
803–816, Oct. 2020, doi: 
https://doi.org/10.2147/ijgm.s273900. 
[18] S. Poudel, “A Study of Disease Diagnosis Using 
Machine Learning,” Medical Sciences Forum, vol. 
10, no. 1, p. 8, Feb. 2022, doi: 
https://doi.org/10.3390/iech2022-12311. 
[19] Dudkina, I. Meniailov, K. Bazilevych, S. Krivtsov, 
and A. Tkachenko, “Classification and Prediction of 
Diabetes Disease using Decision Tree Method,” 
Symposium on Information Technologies & Applied 
Sciences, Bratislava, Slovakia , Mar. 2021. Available: 
https://ceur-ws.org/Vol-2824/paper16.pdf 
[20] C. Yadav and S. Pal, “Prediction of thyroid disease 
using decision tree ensemble method,” Human-
Intelligent Systems Integration, vol. 2, no. 1–4, pp. 
89–95, Apr. 2020, doi: 
https://doi.org/10.1007/s42454-020-00006-y. 
[21] P. Duggal and S. Shukla, "Prediction Of Thyroid 
Disorders Using Advanced Machine Learning 
Techniques," 2020 10th International Conference on 
Cloud Computing, Data Science & Engineering 
(Confluence), Noida, India, 2020, pp. 670-675, doi: 
https://doi.org/10.1109/Confluence47617.2020.9058
102. 
[22] G. Chaubey, D. Bisen, S. Arjaria, and V. Yadav, 
“Thyroid Disease Prediction Using Machine 
Learning Approaches,” National Academy Science 
https://doi.org/10.31449/inf.v49i12.6690 Informatica 49 (2025) 115–126 115
Motor Imagery Detection in ECG Signals Using Wavelet Packet
Decomposition and Multiscale Convolutional Neural Networks
Khawla Hussein Ali
Department of Computer Science, College of Education for Pure Sciences, University of Basrah, Basrah, Iraq
E-Mail: khawla.ali@uobasrah.edu.iq
Keywords: Wavelet decomposition, multi-scale CNN, ECG signals, motor imagery
Received: July 16, 2024
Detecting motor imagery from electrocardiographic (ECG) signals is complex but crucial in developing
advanced neuroprosthetic devices and brain-computer interface (BCI) systems. In most cases, linear mod-
els applied using conventional methods are not appropriate for the time-varying and non-linear nature
represented by the ECG characteristics, resulting in weak performances. This research addresses this
problem, combining Wavelet Packet Decomposition and Multi-Scale Convolutional Neural Networks to
improve the feature extraction mechanism and classification accuracy. ECG data is pre-processed from
the PhysioNet EEG Motor Movement/Imagery Dataset to remove noise and standardize signals. WPD is
thus applied to decompose the signals into detailed frequency components to be input as features in the
proposed Multi-Scale CNN. Different kernel sizes are implemented in these parallel convolutional layers
to learn complicated features at various hierarchical resolutions. The proposed architecture is evaluated
using performance parameters such as accuracy 92%, precision 89%, recall 93%, F1 score 91%, and ROC-
AUC 95%. These results showed that the model outperformed the earlier-used traditional methods, such
as Support Vector Machines (SVM) and Random Forests, better-detecting motor imagery. This research
emphasizes the integrative power of advanced signal processing techniques with deep learning in analyzing
biomedical signals, providing a powerful solution to advancing neuroprosthetic and BCI technologies.
Povzetek: Študija dokazuje učinkovitost kombinacije obdelave signalov in globokega učenja za analizo
biomedicinskih signalov. Uporabljena je valčna paketna dekompozicije in večskalna konvolucijska nevron-
ska mreža za detekcijo motorične imaginacije v signalih EKG.
1 Introduction terfaces [3]. Neuroprosthetic devices work best when
the intended motor actions are accurately detected, so the
1.1 Background on motor imagery in ECG machine works appropriately to aid a person with motor
signals deficits. At the same time, BCIs must provide this intended
signal from brain activities accurately into control signals of
Motor imagery is a cognitive process by which one in- high precision to guarantee reliability and user satisfaction.
ternally represents motion without physically carrying it Faulty interpretations and actionsmay occur if the detection
out [1]. This mental process class engages neural pathways needs to be more accurate, making the advantages of such
closely related to those involved during actual movements, highly developed systems irrelevant. Thus, robust methods
a fact that can be picked up in various physiological signals. for detecting motor imagery in ECG signals are required to
For example, in electrocardiogram (ECG) signals, motor improve these technologies further.
imagery detection can provide insights into neural activity
related to motor functions [2]. Since ECG signals, unlike
other neurophysiological signals, are mainly used for mon- 1.3 Introduction to wavelet packet
itoring cardiac health, they find interest in detecting motor decomposition (WPD)
imagery based on their accessibility and non-invasiveness
in the recording. Wavelet Packet Decomposition (WPD) is an advanced
method of processing signals that decompose a signal in
its constituent frequency components [4]. It further de-
1.2 Importance of accurate detection and composes in detail compared to the conventional wavelet
classification transform, which focuses on a specific set of frequency
bands. The advantages of WPD include multiresolution
Accurately detecting and classifying motor imagery from analysis with both approximation and detail coefficients de-
ECG signals are essential for various emerging technolo- composed at every level, hence being very useful for ana-
gies, particularly neuroprosthetics and brain-computer in- lyzing non-stationary signals like ECG, where signal prop-
116 Informatica 49 (2025) 115–126 K.H. Ali
erties can change over time. The baseline model used in motor imagery and provides a noninvasive, accessible way
this study combinesWavelet Packet Decomposition (WPD) of developing neuroprosthetic devices and BCI systems.
and a Multiscale Convolutional Neural Network (CNN),
where WPD decomposes ECG signals into multiple fre-
quency bands to extract features across various scales and 1.5.3 Comprehensive evaluation
resolutions, and the multiscale CNN processes these fea- The proposed detailed experimental evaluation includes
tures to capture patterns of different sizes and temporal fre- preprocessing steps, feature extraction, model training,
quencies for improved classification accuracy. Themodel’s and performance assessment; such a roadmap could be
performance is evaluated using metrics such as accuracy, handy for implementing and validating similar methodolo-
sensitivity, specificity, and F1-score, providing a basis for gies. Multiple metrics used for assessment and comparison
comparison with modified versions of the model to assess against traditional methods will ensure the robustness and
the impact of each component. This ablation study aims comprehensiveness of the evaluation for the proposed ap-
to determine the contribution of Wavelet Packet Decompo- proach.
sition (WPD) when used with a Multiscale Convolutional
Neural Network (CNN) for motor imagery classification in
ECG signals. By systematically removing or altering the
WPD component, we aim to understand its significance and 2 Related work and SOTA
how it enhances the performance of the Multiscale CNN. experiment
1.4 Motivation for using multiscale CNN 2.1 Previous approaches to motor imagery
Thus, Wavelet Packet Decomposition coupled with a Mul- detection
tiscale CNN represents a practical approach to best deal Although most of the research has been on detecting mo-
with the feature extraction task. CNNs are among the most tor imagery with electroencephalogram (EEG) signals, re-
prevalent and well-known models for automatically learn- cently, an emerging interest has been in using the noninva-
ing features in a hierarchical fashion from raw data that can sive and easily obtainable ECG signal [6]. Priori methods
handle complicated pattern recognition tasks with supreme have thus focused on feature extraction for detecting motor
grace [5]. The proposed multiscale technique of CNN is a imagery from the ECG signal, followed by classification
multiple-parallel convolutional layer with different kernel using machine learning algorithms.
sizes simultaneously to take up the features of multiple res- Time-domain, frequency-domain, and time-frequency
olutions. This bears a specific benefit in dealing with the analysis techniques have been applied to extract pertinent
variability in ECG signals, as it allows learning fine and features from ECG signals. These techniques generally an-
coarse features. The proposed method combinesWPDwith alyze the amplitude and duration characteristics of the ECG
the multiscale CNN to utilize the advantages of these two signal. Some features that they use are mean, variance,
techniques toward a maximized level of classification ac- skewness, and kurtosis of the signal segments. However,
curacy in detecting motor imagery from ECG signals. such features severely affect noise and will fail to capture
the underlying patterns associated with motor imagery.
1.5 Contributions Frequency-domain methods involve transforming the
ECG signal from its time domain into its frequency domain,
This research makes several critical contributions to the for which techniques like the Fourier Transform have been
field of biomedical signal processing and brain-computer used [7]. Extracted features such as power spectral den-
interface (BCI) systems: sity and spectral entropy have been used. Though these ap-
proaches can be informative about the signal’s frequency
1.5.1 Novel methodology components, the transient characteristics of motor imagery
Wavelet packet decomposition, coupled with multiscale should be noticed.
convolutional neural networks, is a new concept in motor Short-time Fourier Transform (STFT) and Wavelet
imagery detection from ECG signals. This approach ef- Transform are prevalent in motor imagery detection. These
fectively combines the WPD-based multiresolution analy- techniques offer a compromise by giving information about
sis with CNN’s automatic feature learning capabilities for time and frequency. However, STFT gives a fixed res-
improved classification performance. olution; thus, it is limited to various cases that present
effectively different frequency contents. Wavelet Trans-
1.5.2 Improved detection of motor imagery form gives multiscale analysis and is more suited for non-
stationary signals like the ECG.
The present study extends the horizon of motor imagery The machine learning techniques in support vector ma-
detection to ECG signals compared with the conventional chines, k-NN, and random forests are among the classifiers
EEG-based approaches. The findings have demonstrated used in this work on classifying motor imagery based on
that an ECG signal can be a suitable alternative for detecting feature extraction. Although this approach has proven to
Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 117
be quite promising in practice, its performance depends on ing meaningful features from such signals might be chal-
the feature extraction quality and a set of hyperparameters. lenging.
The conventional convolutional neural networks applied
to ECG signals are usually composed of 1D convolutional
2.2 Use of wavelet transforms in ECG layers. In this case, local patterns in the signal are collected
analysis by sliding filters over the signal. The pooling layers sum up
these patterns, reducing dimensionality and capturing only
Wavelet transforms have widely been applied to ECG sig- the most salient features. At the network’s end, fully con-
nal processing because they can analyze non-stationary sig- nected layers take these features and make the final classi-
nals [8]. A wavelet transform decomposes a signal into fication.
frequency components related to a defined scale. This de-
composition can then serve as a detailed analysis of the CNNs’ efficacy in processing biomedical signals comes
signal’s time-frequency characteristics. Wavelet transforms from their capability to handle massive datasets and learn
are used for various tasks such as denoising, feature extrac- robust features [10]. However, designing a highly effec-
tion, and classification in ECG analysis. tive CNN architecture requires consideration of network
depth, filter size, and other hyperparameters. CNN design
Most ECG signal processing operations involve denois- is hyperparameter-specific, not only computationally ex-
ing, which eliminates as many noise artifacts as possible pensive but also requiring abundant training data for per-
without changing the critical information content of sig- formance.
nals. Wavelet-based denoising is performed by decompos- Some strategies developed to counter this and related
ing an ECG signal into wavelet coefficients, thresholding challenges of sparsely labeled data include transfer learn-
the noisy coefficients, and reconstructing the signal from ing and data augmentation. Transfer learning involves us-
the modified coefficients. This method has proven effec- ing a pre-trained network that has been previously trained
tive in denoising ECG to reduce noise while keeping the on tasks similar to the one at hand and fine-tuning it to the
salient features intact. target task. This paradigm borrows knowledge from the
Wavelet transforms in feature extraction embrace mul- source task to reduce the quantity of labeled data needed.
tiscale analysis, capturing both high-frequency details and Data augmentation techniques, including adding noise and
low-frequency trends. Features like wavelet coefficients, shifting and scaling the signal, are included to add some
entropy, and wavelet energy have been extracted from this degree of variability in the training data, thus improving
time series data and used for classification tasks. Such fea- generalization capacity within a given network.
tures characterize both the spectral and temporal character- A study incorporating CardiacNet was conducted to
istics of the ECG signal. identify and categorize cardiac arrhythmia based on ECG
The Wavelet Packet Decomposition (WPD) generalizes signals and elaborate on the constraints of traditional pre-
theWavelet Transform technique so that decomposition can diction systems and AI methods to identify arrhythmia due
be performed on approximation and detail coefficients at to poor feature extraction correctly. The approach applied
every level [9]. ECG signal processing uses WPD to ex- pre-processing on ECG data by eliminating non-linearities,
tract very informative features of classification tasks. By feature extraction using unsupervised machine learning-
implementing the WPD technique, detecting the subtle pat- based PCA (UML-PCA), and feature selection with im-
tern associated with motor imagery is improved by analyz- proved Harris Hawk’s Optimization (IHHO). CCNN was
ing the signal at different scales and frequencies. then used to classify and yield impressive quantitative mea-
sures such as accuracy of 97.57%, sensitivity of 98.29%,
2.3 Convolutional neural networks in and MCC value of 98.17% [11].
biomedical signal processing An essential step in raw ECG signal preprocessing is
noise and artifact removal, which may affect classification
Convolutional neural networks (CNNs) are the break- model performance. Other processes in the preprocessing
through in biomedical signal processing because they can stage were baseline wandering removal, noise filtering, and
automatically learn hierarchical features from raw data. normalization.
CNNs consist of convolutional, pooling, and fully con- A high-pass filter technique was employed to remove
nected layers. Each layer takes the input signal and extracts Baseline wandering as it consists of low-frequency compo-
increasingly complex features, helping the network catch nents [12]. Bandpass filtration removed noise and retained
intricate patterns. only the frequency components relevant to the ECG signals.
CNNs have been broadly applied in the analysis of ECG It was done to normalize the ECG signals into a standard
signals for arrhythmia detection, ischemia detection, and range of values—that is, every sample was uniform.
the classification of several other cardiac diseases. One
of the significant advantages of CNNs is the automatic 2.4 Wavelet packet decomposition
feature-extraction feature; therefore, the need to perform
manual feature engineering can be ruled out. This is This work has decomposed preprocessed ECG signals into
very useful in biomedical signal processing since extract- frequency components using the Wavelet Packet Decom-
118 Informatica 49 (2025) 115–126 K.H. Ali
Figure 1: Sample ECG signal from the PhysioNet EEG motor movement/imagery dataset
Figure 2: Wavelet packet decomposition of an ECG signal showing decomposition levels and corresponding frequency
components
position technique (WPD). WPD can give a better analysis 3 Methodology
than the traditional wavelet transformation because it de-
composes approximation and detail coefficients at all lev- 3.1 Data acquisition and preprocessing
els. Due to its multiresolution property, this is essential in
capturing the transient characteristics of the motor imagery The database used for this study was obtained from the
signals. PhysioNet database, specifically the EEG Motor Move-
ment/Imagery Dataset. The dataset includes a set of ECG
recordings of multiple subjects carrying out motor imagery
tasks. Each record is annotated concerning whether or not
there was motor imagery—these were used as ground truth
and MCC value of 98.17% [11]. CardiacNet uses a dif- against which the classification task results were compared.
ferent technique to detect motor imagery from recorded
ECG signals. By integrating the Wavelet Packet Decompo- The choice of wavelet function and the decomposition
sition (WPD) with the Multiscale CNN, the present study level are the most basic but essential parameters in WPD.
seeks to optimize the classification of dynamic and non- The Daubechies 4 wavelet was chosen to do this because it
linear signal features previously unexplored and additional was best suited for the analysis of ECG signals. Following
ECG uses beyond cardiac health. the same thesis, it is decomposed to level 4, giving an ade-
Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 119
Figure 3: Flowchart of the proposed model from data acquisition to performance comparison
quate compromise between the complexity of computations forming binary classification for motor imagery.
and the level of detail.
Wavelet-packet decomposition is based on decomposing
any ECG signal into a set of wavelet coefficients at differ- 3.3 Implementation details
ent resolutions. These wavelet coefficients represent the Implementation was done in Python and its associated li-
ECG signal at various resolutions. The obtained coeffi- braries, including the PyWavelets library used in wavelet
cients were used as features for the classification model. packet decomposition and the TensorFlow/Keras library for
A feature set includes the mean, variance, and energy of modeling and training. Figure 3: Flowchart of the proposed
wavelet coefficients at each level of decomposition, which model from data acquisition to performance comparison.
gives a representative of ECG. The CNN. The ECG signals were pre-processed using
Wavelets and decomposed after in the CNN [13]. The MS-
3.2 Multiscale convolutional neural network CNN was trained using a Binary cross-entropy loss func-
(CNN) tion and Adam optimizer because of its high performance
and efficiency. The training was split into dataset training-
The Multiscale Convolutional Neural Network constitutes validation sets, and early stopping was implemented to
the core part of the methodology, which aims to increase avoid overfitting. The accuracy, precision, recall, and F1
feature extraction from the granularities, from fine-grained score metrics assessed model performance.
to coarse ones. After the convolutional layer, the follow-
ing layer used for representation is a pooling layer respon- 3.4 Algorithm and flowchart
sible for reducing the dimensionality of the features while
retaining the most salient parts. The outputs of these par- The proposed model for motor imagery detection in ECG
allel pathways are multiple kernel sizes, enabling the net- signals starts by acquiring data from the PhysioNet EEG
work to capture features at different resolutions. The mul- Motor Movement/Imagery dataset. The raw ECG signals
tiscale CNN is designed with three parallel convolutional are then pre-processed, followed by baseline wandering re-
pathways. The first layer in each path is a 1D convolutional moval using a high-pass filter, noise filtering using a band-
layer. Kernel sizes of 3, 5, and 7 were applied to extract fea- pass filter, and, lastly, normalization to standardize the sig-
tures at different scales, then concatenated and input into a nal range. Next, the Wavelet Packet Decomposition pro-
few fully connected layers for the final classification. This cess uses level 4 of the Daubechies 4 (db4) wavelet func-
neural network uses ReLU-activated hidden layers and con- tions to decompose the ECG signals into sub-high, high,
volutional layers with a sigmoid-activated output layer, per- and low-frequency bands. Features are then acquired from
120 Informatica 49 (2025) 115–126 K.H. Ali
Figure 4: The architecture of the multi-scale CNN
wavelet coefficients at each level: mean, variance, and en- based Multiscale CNN approach on motor imagery detec-
ergy. These features would be given as input to a Multi- tion in ECG signals using a publicly available dataset. The
Scale Convolutional Neural Network designedwith parallel dataset consists of ECG recordings from several subjects
convolutional layers of filter sizes 3, 5, and 7. The output performing motor imagery tasks. From each ECG record-
of each conv layer is ReLU activated and then subjected to ing, ground truths are available on the presence or absence
max-pooling. The subsequent is shown in Figure 3. of motor imagery, which are the results to be achieved
Two outputs are concatenated and passed through fully within a classification task.
connected layers into a sigmoid-activated output layer to Dividing the data set into a training, validation, and test
implement the final binary classification. Using the Adam set ensures the model’s evaluation is all-around. The data
optimization algorithm, the network trains against these consisted of 70% for training, 15% for the validation, and
data with binary cross-entropy loss in the back end; it has 15% for the test set. This partitioning would ensure that
preactivation stops concerning the loss of the hold-out set. models trained on a diversified set of samples are evalu-
Performance evaluation metrics include accuracy, preci- ated on completely unseen data to estimate generalization
sion, recall, F1-score, and ROC-AUC, with comparisons to capability.
highlight its superior performance over traditional methods
such as SVM and Random Forest.
4.1.1 Experiment 1: removing wavelet packet
decomposition (WPD)
4 Experiments and analysis
TheWPD step was removed in this experiment, and the raw
4.1 Ablation experiments ECG signals were directly fed into the Multiscale CNN.
The expected impact was that without WPD, the model
A series of experiments were conducted to evaluate the per- processes only the raw signal, potentially missing critical
formance of the proposed Wavelet Packet Decomposition- frequency-specific features. The multiscale CNN still at-
Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 121
tempts to capture features at different scales but lacks the 4.4 Performance metrics
enriched input from WPD.
The effectiveness of the proposed method was evaluated
based on various metrics, such as accuracy, precision, re-
4.1.2 Experiment 2: using standard CNN instead of call, F1 score, and area under the Receiver Operating Char-
multiscale CNN acteristic curve; these measures assessed how well the
In the second experiment’s model set-up, WPD was re- model could discriminate motor imagery within ECG sig-
tained, but the Multiscale CNN was replaced with a stan- nals.
dard CNN that processes the signal at a single scale. The •Accuracymeasures howwell the model performs over-
expected impact was that the standard CNN may not fully all by calculating the true positives and negatives ratio
exploit the multiscale features provided byWPD, leading to among all the predictions.
suboptimal feature extraction and classification. Themodel • Precision reflects the proportion of true positives to the
might Figure 4: The architecture of the Multi-Scale CNN total number of optimistic predictions the model made [15].
perform better than the raw signal input but is expected to •Recall (sensitivity) refers to the model’s ability to iden-
underperform compared to the baseline multiscale CNN. tify all relevant instances (true positives) accurately.
• F1-score is the harmonic mean of the precision and re-
4.1.3 Experiment 3: combined removal of WPD and call. It provides a single score that balances both concerns.
multiscale CNN • ROC-AUC measures the model’s performance overall
classification thresholds; thus, higher values indicate better
This experiment removed WPD and the CNN’s multiscale discrimination.
structure, producing a standard CNN processing raw ECG
signals. The experiment serves as a control, representing
the most straightforward model setup. The expected out- 5 Results
come is the poorest performance, as the model needs more
enriched input fromWPD and the capability to process fea- In Experiment 1, removing WPD from the model led to
tures at multiple scales. a slight but noticeable decrease in performance measures.
This drop shows that WPD is critical in improving the qual-
4.2 Data preprocessing ity of features fed into the Multiscale CNN to boost clas-
sification efficiency. This gap partially explains why the
As given in the methodology section, raw ECG signals un- model could not adequately reconstruct some frequency-
derwent some preprocessing. A high-pass filter with a cut- specific features when WPD was absent; this lack of dis-
off frequency of 0.5 Hz was employed to remove baseline tinction landed the model lower scores by failing to differ-
wandering. A bandpass filter ranging from 0.5 Hz to 40 entiate motor imagery from other signal components.
Hz was used for further filtering, which helped smooth the Replacing the Multiscale CNN with a standard CNN
high-frequency noise and retain important frequency com- while maintainingWPD in Experiment 2 resulted in a mod-
ponents [14]. Post-preprocessing, these signals were nor- erate decrease in performance. This implies that while
malized to a standard range of 0-1 to make them uniform. WPD may still provide helpful multi-resolution features to
The signal was preprocessed before running the Wavelet be exploited, its usefulness greatly depends on the subse-
Packet Decomposition up to level 4 with a Daubechies 4 quent application of a Multiscale CNN, which training can
wavelet. The obtained wavelet coefficients were used to incorporate these features at suitable scales. Due to the
build feature vectors for each ECG segment. These are the single-scale characteristic of the standard CNN, it was not
inputs of the Multiscale CNN, which treated these different possible to fully utilize such features asWPD to provide the
frequency components represented by ECG signal feature best classification results.
vectors. The results of Experiment 3 revealed the most signifi-
cant decline in all performance metrics when both WPD
4.3 Model training and the Multiscale CNN were removed, leaving a standard
CNN to process the raw ECG signals. Such a consider-
Multiscale CNN architecture for TensorFlow/Keras: three able decrease also underscores the importance of utilizing
parallel convolutional pathways with kernel sizes 3, 5, and WPD and a Multiscale CNN to improve MI detection ac-
7, concatenated features after max-pooling, passed through curacy. The features extracted from WPD offer more en-
fully connected layers with the final output layer, which hancements, together with the capacity of the Multiscale
uses a sigmoid activation function for binary classification. CNN to scrutinize the features at different scales, which is,
This model was compiled using the Adam optimizer therefore, important for accurate and solid classification.
and binary cross-entropy loss function. Training was done Table 1 shows the proposed method’s performance as
through 100 epochs and a batch size of 32. Early stopping tested on the test set. Compared to traditional approaches,
with patience set at ten epochs was applied to avoid overfit- Multiscale CNN better detected motor imagery from the
ting bymonitoring the validation loss and stopping training. ECG.
122 Informatica 49 (2025) 115–126 K.H. Ali
Figure 5: Bar chart showing the performance metrics
tween the actual positive rate and the false positive rate.
Table 1: Performance metrics Multiscale CNN model was performed with that of the
Metric Value traditional machine learning methods Support Vector Ma-
Accuracy 0.92 chines (SVM) and Random Forests (RF) on the same
Precision 0.89 dataset and preprocessing steps.
Recall 0.93 The summarized results in Table 2 pointed to the
F1-Score 0.91 supremacy of the Multiscale CNN. The table for multiple
ROC-AUC 0.95 models overview numerous metrics, including accuracy,
precision, recall, F1-score, specificity, and Matthews Cor-
relation Coefficient (MCC). This comparison also shows
how effective and efficient the proposed Multiscale CNN
Themodel attained an accuracy of 92%, meaning it could with WPD is compared to other prevailing classifiers like
correctly classify 92% of samples. The obtained precision SVM and Random Forest.
and recall values were 89% and 93%, respectively, show- High performance could be attributed to the Multiscale
ing that the model could correctly distinguished true posi- CNN’s ability to automatically learn hierarchical features
tives and maintained a low false positive rate. The F1 Score of the wavelet coefficients, which represent fine-grained
was 0.91, reflecting a good balance between precision and and coarse patterns crucial for discriminating between mo-
recall. The ROC-AUC of 0.95 indicated excellent discrim- tor imagery types [16].
ination ability across a range of classifications. The proposed approach addresses a significant gap in the
Figure 6 shows a confusion matrix that shows precisely field of ECG-based signal processing by extending its ap-
how the model performed—the quantity of true positive, plication from traditional cardiac health monitoring to mo-
true negative, false positive, and false pessimistic predic- tor imagery detection. Models such as CardiacNet are ac-
tions. The confusion matrix depicted many accurate opti- curate in detecting cardiac arrhythmias but are centered on
mistic and pessimistic predictions, with very few. disease classification and not the detection of cognitive pro-
The ROC curve showed that the model could maintain cesses like motor imagery. Non-invasive motor imagery
a high actual positive rate with a low false positive rate; based on ECG signals is still unexplored and opens a vast
actually, 0.95 under the curve shows good performance. possibility for investigating cognitive processes using neu-
ral signals. This approach meets a significant requirement
5.1 Comparison with traditional methods in BCI and neuroprosthetics, where an efficient and cost-
effective identification ofmovement goals improves usabil-
To further substantiate the efficacy of the approach, the per- ity and functionality.
formance comparison of the false positives and negatives, Unlike traditional methods like Support VectorMachines
thus substantiating the model’s robustness. (SVM) and Random Forests, the proposed method offers
The ROC curve in Figure 7 provided more information distinct advantages through its use of a Multiscale Con-
on model performance, indicating a good separation be- volutional Neural Network (CNN) combined with Wavelet
Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 123
Packet Decomposition (WPD). Furthermore, most ear-
lier traditional machine methods require feature extrac-
tion through experience, which may not artistically depict
the ECG signal’s subtle patterns, especially during the MI
detection phase. Instead, automatic hierarchical feature
learning of multiscale CNN and the WPD facilitates multi-
resolution signal analysis. Hence, the themodel can capture
high-level and low-level details at multiscale and multires-
olution, which improves the classification ofmotor imagery
tasks.
In addition, problems associated with the time and fre-
quency domains, like the Fourier Transform used conven-
tionally, are well addressed by the proposed method. Such
traditional methods fail to capture short-term features and
non-stationary aspects inherent in the signals used to imag-
ine motor control. Thus, the proposed WPD approach cap- Figure 7: ROC curve
tures more refined time-frequency features by the multi-
scale CNN to distinguish motor imagery and signal noise
more accurately from other irrelevant components.
WPD enhances the input features in that it gives pre-
cise frequency details. On the other hand, the Multiscale
6 Discussion and future works CNN operates these features at different scales, thus im-
proving its ability to learn complex patterns that enhance
6.1 Discussion the classification result. The experiments also show that
standard CNN is suboptimal as it does not reproduce the
The experiments reveal critical insights into the effective- results even when WPD is applied. This implies that the
ness of combining Wavelet Packet Decomposition (WPD) multiscale framework of using different kernel sizes to ob-
with a Multiscale Convolutional Neural Network (CNN) tain features of various scales is essential. Generally, these
for motor imagery detection in ECG signals. This substan- results demonstrate that WPD is beneficial in detecting MI
tial degradation of performance indicates the importance of from ECG signals, so it is for Multiscale CNN.
WPD in extracting significant frequency band features from
ECG data required for classification. The characteristic of
WPD is that the signals can be analyzed at different reso- 6.2 Generalizable capabilities
lutions; this is beneficial when dealing with transient/non-
stationary signal characteristics that would typically go un- The generalization capabilities of a model are critical in as-
noticed when utilizingmost of the conventional signal anal- sessing its robustness and applicability across various sub-
ysis techniques. Moreover, the combination of WPD and jects and datasets. This study used independent dataset
the Multiscale CNN can be observed in better baseline validation to determine the model’s predictive accuracy on
model performance. new cases not part of the training dataset. The indepen-
dent dataset used for validation differed from the one used
in the training process, and there was no intersection be-
tween these two datasets. To check this, validation was
conducted using the k-fold cross-validation method, where
the data set was split into five sets (k=5) such that subjects
were distributed across all the five data splits. It contributes
to mitigating inter-subject variability, a significant issue in
motor imagery tasks, as different patterns of ECG signals
can influence a model. The performance was almost steady
across the folds, which shows the ability of the model to
perform well for new subjects in the dataset.
However, exercising the model with a validation tech-
nique other than K-fold cross-validation would be more
meaningful, for instance, testing themodel on a new data set
not used in the training phase. A validation approach with
an independent test set would also test the model’s ability to
generalize to highly different conditions if the independent
Figure 6: The confusion matrix dataset differs in signal quality, subject characteristics, or
data acquisition techniques. For instance, using an exter-
124 Informatica 49 (2025) 115–126 K.H. Ali
Table 2: Comparison with traditional methods
Metric Accuracy Precision Recall F1-Score ROC-AUC Recall
SVM 0.85 0.83 0.86 0.84 0.88 0.88
Random Forest 0.87 0.84 0.88 0.86 0.9 0.9
Multiscale CNN 0.92 0.89 0.93 0.91 0.95 0.95
MCC 98.17 - - - - -
Figure 8: Comparison with traditional methods
nal set we obtain from a different recording protocol would and generalize these findings:
help determine how much the proposed model hampers or
inspires adaptability for different data characteristics. If
loss occurs in these cases, it may show the aspects in which 6.3.1 Exploration of alternative wavelet functions:
the model has to be optimized to be generalized better. The Daubechies 4 dB4 wavelet was suitable for this work;
To strengthen the evaluation, it is recommended to ex- some studies on other wavelet functions and their impact on
pand the main parameters, including accuracy, precision, the features extracted would addmore value. Other wavelet
and relative, especially in cases of imbalance in collection. functions capture unique characteristics in the signal that
It should be noted that using values such as specificity or could lead to further improvements in classification accu-
MCC will give a better picture of the effectiveness of the racy.
examined model. Although MCC was not used in the cur-
rent study, it is a valuable metric considering true posi-
tives, false positives, and false negatives, thus offering in- 6.3.2 Multi-modal data fusion:
sight into the model’s performance in imbalanced condi- Such ECG signals can be combined with other physiologi-
tions. Future work could incorporate these additional met- cal signals, such as EEG and EMG, further to improve the
rics and further investigate techniques such as domain adap- robustness and accuracy of motor imagery detection [17].
tation to enhance the model’s applicability across different Multi-modal fusion methods combine complementary in-
data sources. formation from different sources to describe a motor imag-
inary event completely.
6.3 Directions for future work
6.3.3 Advanced deep learning architectures:
While the proposed methodology has made considerable
strides in motor imagery detection research, there are sev- Research about advanced deep learning architectures based
eral avenues of inquiry forward that would further enhance on RNN and attention mechanisms can achieve even better
Motor Imagery Detection in ECG Signals… Informatica 49 (2025) 115–126 125
performance for motor imagery detection [18]. These ar- motor imagery classification performance. Removing ei-
chitectures have confounders of temporal dependencies and ther component significantly declines model accuracy, il-
contextual information that could improve the detection of lustrating their combined importance in the overall frame-
the subtle pattern of ECG signals. work. This research presents a new motor imagery de-
tection scheme from ECG signals using Wavelet Packet
6.3.4 Real-time implementation: Decomposition and Multiscale Convolutional Neural Net-
works. The methodology indeed enhances the classifica-
Based on the proposed methodology, designing real-time tion accuracy to a large extent; hence, it needs no explicit
systems for motor imagery detection must move the work mention. The results also indicate that an ECG signal is fea-
toward a practical application. Implementing the model sible in motor imagery detection as a noninvasive and eas-
in time-real environmental systems and testing its perfor- ily accessed technique for developing neuroprosthetic de-
mance under dynamic conditions becomes crucial for de- vices and BCI systems. These contributions help support
ploying the technology in neuroprosthetic devices and BCI further studies in this area of research, which has enormous
systems. room for further improvement and exploration. On the way
ahead, addressing the suggested directions for future work
6.3.5 Large-scale validation: will continuously advance the field and, eventually, achieve
more effective and dependable technologies related to mo-
Further large-scale validation is required to generalize the tor imagery detection.
findings and check the robustness of the proposed approach This study’s methodology will be open-sourced to en-
with datasets and subjects under study. The model will be sure reproducibility for other research groups to extend and
tested experimentally across different populations, tasks, continue their collaboration in biomedical signal process-
and recording conditions to estimate its reliability and scal- ing and brain-computer interface. In the future, with fur-
ability. ther research in this area, the full potential of MI detection
using the ECG signal can be achieved to benefit people suf-
6.3.6 Transfer learning and domain adaptation: fering from motor impairments and progress in neuropros-
thetic/BCI capabilities.
If the model is adapted to different domains and tasks us-
ing transfer learning, it can be flexible. Domain adaptation
methods may alter the model’s capability towards gener-
alizing new data, at least with its reusability on minimal Acknowledgment
retraining. I am grateful to the University of Basrah for providing
the resources and environment needed to complete this re-
6.3.7 User-centric design: search. I alsowant to thank the Computer ScienceMembers
Provisions for user feedback in the motor imagery detection for their collaborative spirit and helpful discussions, which
system and the development of user-centric interfaces will contributed significantly to the ideas presented here.
likely improve its usability and acceptance [19]. Knowl-
edge of the desires and preferences of the end-user, such as
a person with a motor impairment, may guide the develop- References
ment of more intuitive and effective BCI systems.
[1] P. Bach, C. Frank, and W. Kunde, “Why mo-
6.3.8 Ethical considerations and data privacy: tor imagery is not really motoric: Towards
a re-conceptualization in terms of effect-
Ethical considerations and data privacy are paramount in based action control,” Psychological Re-
collecting, processing, and using physiological signals. search, vol. 88, no. 6, pp. 1790–1804, 2024.
Frameworks on ethical data handling and compliance with https://doi.org/10.1007/s00426-022-01773-w.
privacy regulations will be essential to ensure the respon-
sible deployment of technologies for motor imagery detec- [2] A. Saibene, M. Caglioni, S. Corchs, and F. Gas-
tion [20]. parini, “Eeg-based bcis on motor imagery paradigm
using wearable technologies: a systematic re-
view,” Sensors, vol. 23, no. 5, p. 2798, 2023.
7 Conclusion https://doi.org/10.3390/s23052798.
In conclusion, the ablation study confirms that Wavelet [3] A. Palumbo, V. Gramigna, B. Calabrese, and N. Ielpo,
Packet Decomposition and Multiscale CNN are integral “Motor-imagery eeg-based bcis in wheelchair move-
components of the proposed method. WPD provides a ment and control: A systematic literature re-
rich, multi-scale representation of the ECG signals, which, view,” Sensors, vol. 21, no. 18, p. 6285, 2021.
when processed by a Multiscale CNN, leads to superior https://doi.org/10.3390/s21186285.
126 Informatica 49 (2025) 115–126 K.H. Ali
[4] W. Cabrel, G. T. Mumanikidzwa, J. Shen, and [14] R. Y. L. Al-Taai and X. Wu, “Speech enhance-
Y. Yan, “Enhanced fourier transform using ment for hearing impaired based on bandpass
wavelet packet decomposition,” Journal of Sen- filters and a compound deep denoising autoen-
sor Technology, vol. 14, no. 1, pp. 1–15, 2024. coder,” Symmetry, vol. 13, no. 8, p. 1310, 2021.
https://doi.org/10.4236/jst.2024.141001. https://doi.org/10.3390/sym13081310.
[5] M. A. Qureshi, K. N. Qureshi, G. Jeon, and F. Pic- [15] R. Yacouby and D. Axman, “Probabilistic exten-
cialli, “Deep learning-based ambient assisted living sion of precision, recall, and f1 score for more
for self-management of cardiovascular conditions,” thorough evaluation of classification models,” in
Neural Computing and Applications, pp. 1–19, 2022. Proceedings of the first workshop on evaluation
https://doi.org/10.1007/s00521-020-05678-w. and comparison of NLP systems, pp. 79–91, 2020.
[6] G. Aggarwal and Y. Wei, “Non-invasive fetal https://aclanthology.org/2020.eval4nlp-1.9.
electrocardiogram monitoring techniques: Poten- [16] J. Wen, Y. Li, M. Fang, L. Zhu, D. D. Feng, and
tial and future research opportunities in smart tex- P. Li, “Fine-grained and multiple classification for
tiles,” Signals, vol. 2, no. 3, pp. 392–412, 2021. alzheimer’s disease with wavelet convolution unit
https://doi.org/10.3390/signals2030025. network,” IEEE Transactions on Biomedical En-
[7] A. K. Singh and S. Krishnan, “Ecg signal feature ex- gineering, vol. 70, no. 9, pp. 2592–2603, 2023.
traction trends in methods and applications,” BioMed- https://doi.org/10.1109/tbme.2023.3256042.
ical Engineering OnLine, vol. 22, no. 1, p. 22, 2023.
https://doi.org/10.1186/s12938-023-01075-1. [17] B. Rim, N.-J. Sung, S. Min, and M. Hong,
“Deep learning in physiological signal data: A
[8] C. Zhuang and P. Liao, “An improved empirical survey,” Sensors, vol. 20, no. 4, p. 969, 2020.
wavelet transform for noisy and non-stationary signal https://doi.org/10.3390/s20040969.
processing,” IEEE Access, vol. 8, pp. 24484–24494,
2020. https://doi.org/10.1109/access.2020.2968851. [18] J. Mladenović, “Standardization of protocol design
for user training in eeg-based brain–computer in-
[9] H. Wang, W. Wang, Y. Du, and D. Xu, “Examin- terface,” Journal of Neural Engineering, vol. 18,
ing the applicability of wavelet packet decomposi- no. 1, p. 011003, 2021. https://doi.org/10.1088/1741-
tion on different forecasting models in annual rain- 2552/abcc7d.
fall prediction,”Water, vol. 13, no. 15, p. 1997, 2021.
https://doi.org/10.3390/w13151997. [19] I. Y. Zhao, Y. X. Ma, M. W. C. Yu, J. Liu, W. N.
Dong, Q. Pang, X. Q. Lu, A. Molassiotis, E. Hol-
[10] M. A. Abdou, “Literature review: Efficient royd, and C. W. W. Wong, “Ethics, integrity, and
deep neural networks techniques for medical retributions of digital detection surveillance systems
image analysis,” Neural Computing and Appli- for infectious diseases: systematic literature review,”
cations, vol. 34, no. 8, pp. 5791–5812, 2022. Journal of medical Internet research, vol. 23, no. 10,
https://doi.org/10.1007/s00521-022-06960-9. p. e32328, 2021. https://doi.org/10.2196/32328.
[11] K. Srinivas, V. Ch, S. R. Borra, K. S. Raju,
G. R. K. Rao, K. V. Satyanarayana, and P. M. [20] Y. Hou, S. Jia, X. Lun, S. Zhang, T. Chen,
Kumar, “Cardiacnet: Cardiac arrhythmia detec- F. Wang, and J. Lv, “Deep feature mining via the
tion and classification using unsupervised learn- attention-based bidirectional long short term mem-
ing based optimal feature selection with custom ory graph convolutional neural network for human
cnn model,” Informatica, vol. 48, no. 2, 2024. motor imagery recognition,” Frontiers in Bioengi-
https://doi.org/10.31449/inf.v48i2.5076. neering and Biotechnology, vol. 9, p. 706229, 2022.
https://doi.org/10.3389/fbioe.2021.706229.
[12] M. Bejani, E. Luque-Buzo, A. Burlaka-Petrash, J. A.
Gómez-García, J. D. Arias-Londoño, F. Grandas-
Pérez, J. Grajal, and J. I. Godino-Llorente, “Base-
line wander removal applied to smooth pursuit
eye movements from parkinsonian patients,”
IEEE Access, vol. 11, pp. 32119–32133, 2023.
https://doi.org/10.1109/access.2023.3263045.
[13] C. Pravin and V. Ojha, “A novel ecg signal de-
noising filter selection algorithm based on conven-
tional neural networks,” in 2020 19th IEEE Inter-
national Conference on Machine Learning and Ap-
plications (ICMLA), pp. 1094–1100, IEEE, 2020.
https://doi.org/10.1109/icmla51294.2020.00176.
https://doi.org/10.31449/inf.v49i12.7558 Informatica 49 (2025) 127–144 127 
Online Criminal Behavior Recognition Based on CNNH and MCNN-
LSTM 
Jingwei  Hu 
Department of Legal Practice, Shandong Judicial Police Vocational College, Jinan 250200, China 
E-mail: 17866981007@163.com 
Keywords: anonymous networks, traffic segmentation, convolutional neural networks, online crime, long short-term 
memory networks 
Received: November 10, 2024 
In light of the proliferation of cybercrimes, the effective identification and mitigation of such online 
criminal activities has emerged as a significant challenge within the domain of network security. 
Therefore, this study introduces dilated convolution technology, self-attention mechanism, convolutional 
neural network and long short-term memory network, and proposes an overlapping traffic recognition 
model based on improved convolutional neural network and an online crime recognition model with long 
short-term memory network. In the traffic segmentation model test, the recall rate, F1 value, and error 
rate of the model under normal traffic conditions were 91.43%, 93.46%, and 92.43%, respectively. The 
error rate was 4.15%. The accuracy of the online crime recognition model for malware propagation and 
illegal transactions was 96.54% and 92.87% respectively. In the concept drift test, when the training time 
and test time interval was 60 days, the accuracy of the model was 48.67% higher than that of the long 
short-term memory network. Compared with the mainstream framework and traditional methods, its 
accuracy in high traffic scenarios was 94.78%, the error rate was 3.89%, and the P-value was < 0.05. In 
the final simulation test, the model could effectively identify illegal software transactions. The results 
show that the proposed model has high accuracy and strong generalization ability in identifying 
overlapping traffic and website fingerprint crimes, and effectively improves the detection ability of 
criminal activities in anonymous networks. 
Povzetek: Predstavljen je model za prepoznavanje spletnega kriminala, ki temelji na konvolucijskih in 
LSTM nevronskih mrežah in z uporabo tehnologije razredčene konvolucije in mehanizma samo-
pozornosti dosega visoko točnost pri segmentaciji prometa in prepoznavanju spletnih kaznivih dejanj.  
Učinkovito izboljšuje zaznavanje kriminalnih aktivnosti v anonimnih omrežjih. 
 
 
1 Introduction key issue that needs to be addressed urgently. At the same 
time, the industry's research on anonymous network 
With the rapid development of Internet technology, the traffic analysis and criminal behavior identification is 
increasing complexity and openness of cyberspace have also deepening and developing. Wang Y et al. proposed a 
brought unprecedented opportunities and challenges to deep learning-based intrusion detection system SMSO-
society [1]. The emergence and popularization of CNN to address the security risks and privacy issues 
anonymous networks provide an important guarantee for caused by the transmission of large amounts of data in 
users' privacy protection in the network. However, it also wireless networks. The system combined the spider 
makes some wrongdoers utilize anonymous networks to monkey swarm optimization algorithm and CNN to 
engage in various criminalactivities, among which the improve the ability to identify network attacks. The 
anonymous communication system represented by the results showed that the system was superior to LSTM and 
onion router (Tor) is particularly typical [2]. Tor network other methods in terms of accuracy [4]. Gu X et al. 
realizes the high anonymity of user identity and proposed an online defense strategy based on non-
communication content through multi-layer encryption targeted adversarial patches to address the limitations of 
and node forwarding techniques, which is widely used for existing WF attack defense methods in practical 
legitimate purposes such as protecting user privacy and applications. Experiments indicated that the model 
preventing network surveillance. However, the achieved 95.50% defense accuracy and 12.57% time 
anonymity of Tor network is also used by some criminals overhead in real-time traffic [5]. To address the problem 
to circumvent legal supervision and become a hotbed for of high dimensionality of cybercrime data, Rawat R et al. 
cybercriminal activities, such as illegal trading, malware proposed a feature selection method based on multi-
distribution, hacking and other behaviors [3]. In this objective evolutionary algorithm (MOEA) and combined 
context, applying overlapping traffic segmentation and it with NSGA-II to reduce data dimensionality and 
website fingerprinting (WF) technology to collect identify the most relevant features. The experimental 
potential criminal evidence and detect abnormal behavior results indicated that this method effectively improves the 
in an early stage in order to identify and combat online efficiency of data processing [6]. Xian K proposed an 
criminal behavior in anonymous networks has become a  
128 Informatica 49 (2025) 127–144 J. Hu 
improved WF fingerprint recognition algorithm to solve on experimental assessments, the model performs better 
the problem of identifying encrypted traffic in virtual in terms of increased accuracy and decreased false alarm 
private networks. Moreover, it combined it with an rate than popular classification algorithms like long short-
optimized capsule neural network model CapsNet to term memory (LSTM) [9]. A CNN intrusion detection 
classify encrypted traffic. The research results showed technique based on data imbalance was presented by Gan 
that this method was superior to the random forest B et al. to address the hazards to network security brought 
algorithm in terms of recognition accuracy and on by recurrent network intrusions. The findings revealed 
convergence speed, with a recognition rate of 99.98% [7]. that, with an implementation time of 1.42 seconds, the 
Milad N et al. proposed a blind adversarial perturbation method attained an average accuracy of 98.73% in binary 
algorithm to address the problem that traffic analysis and multi-classification identification [10]. An intelligent 
technology based on deep neural networks (DNN) was prediction technique for security performance was 
vulnerable to adversarial perturbation attacks. By suggested by Xu L et al. to address security concerns in 
remapping functions to create adversarial perturbations mobile IoT healthcare networks. To increase the CNN 
independent of network connections, the algorithm was model's adaptability to nonlinear medical large data, the 
applied to real-time anonymous network traffic analysis study combined a four-branch beginning block with a 
to defeat WF identification and traffic association four-layer convolution. The results indicated that the 
classifiers. The experimental results indicated that this intelligent algorithm improved the security performance 
method was applicable to a variety of traffic classifier prediction accuracy by 20% and had better prediction 
types. The robustness test of existing countermeasures performance [11]. Yan F. et al. addressed the issue of 
performed poorly [8]. inadequate training samples and sample class imbalance 
Because of their superior 2D data processing in intrusion detection systems by proposing an intrusion 
capabilities, convolutional neural networks (CNNs) are detection system based on migration learning and 
frequently utilized in image categorization and target integrated learning. The two fundamental learning models 
recognition applications. Yesodha K et al. suggested a that were selected were Xception and Inception. A tree-
novel intrusion detection system incorporating CNN, structured estimator was used to tune the hyperparameters 
fuzzy temporal rules, and an artificial bee colony [12]. Finally, the study summarizes the research areas, 
optimization algorithm for the security vulnerability indicator test results, and limitations of the above 
problem in wireless sensor network communication with literature review. The results are shown in Table 1 below. 
the goal of improving the classifier's performance. Based 
 
Table 1: Literature summary table 
Study Methodology Performance Metric Shortcomings 
Not designed for anonymous 
Wang Y et al. Intrusion detection system based Higher accuracy than LSTM 
network traffic, struggles with 
[4] on SMSO-CNN and nearest neighbor algorithms 
overlapping traffic 
Focuses on defense tasks, does not 
Fingerprint defense strategy of 95.50% defense accuracy, 
Gu X et al. [5] address abnormal behavior recognition 
online website based on Grad-CAM 12.57% time overhead 
in anonymous networks 
Feature selection method based 
Rawat R et al. Effectively improves data Focused on feature selection, lacks 
on MOEA combined with NSGA-II 
[6] processing efficiency real-time traffic analysis 
for dimensionality reduction 
SSL VPN traffic recognition Effective for encrypted traffic 
Xian K et al. Optimized fingerprint recognition 
rate of 99.98%, recall rate of classification but lacks ability to handle 
[7] for encrypted traffic based on CapsNet 
99.98% complex anonymous traffic patterns 
Blind adversarial perturbation Demonstrated high 
Milad N et al. 
algorithm to defeat DNN-based traffic effectiveness across multiple traffic Robustness testing performs poorly 
[8] 
analysis methods classifiers 
Low false alarm rate, higher Limited to generic network 
Yesodha K et Intrusion detection system based 
classification accuracy than long features, cannot handle overlapping 
al. [9] on FT-ABC-CNN 
short-term memory networks traffic patterns 
Lacks temporal feature extraction, 
Gan B et al. Intrusion detection method based Average binary and multi-
struggles with dynamic and complex 
[10] on CNN-IDMDI class accuracy of 98.73% 
behaviors 
Focused on IoT, does not consider 
Xu L et al. Improved CNN for IoT-enabled Improves prediction accuracy 
dynamic features of anonymous 
[11] security performance prediction by 20% 
networks 
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   129 
Limited datasets, does not address 
Yan F et al. Intrusion detection system based Significantly improves 
overlapping traffic or anonymous 
[12] on TL-CNN-IDS accuracy 
network issues 
 
Combined with Table 1, most studies have some year have reached hundreds of billions of dollars, which 
shortcomings while improving the ability of traffic has brought a huge burden to the global economy [13]. 
classification and behavior recognition. First, the majority The diversity and complexity of cybercrime make 
of extant methods prioritize comprehensive network traditional legal supervision and law enforcement 
traffic monitoring, yet they are deficient in their capacity methods face huge challenges in dealing with these 
to discern intricate and clandestine criminal activities. behaviors. 
This is particularly problematic in anonymous network Among online crimes, online gambling is a relatively 
environments, where traditional rule-based matching common type. Criminals attract users to participate in 
methods are challenging to implement effectively to online gambling activities by setting up and operating 
detect anomalous behaviors indicative of specific illegal gambling websites. These websites usually rely on 
criminal activities. Second, many traffic analysis methods anonymous networks, such as the Tor network or 
often have a high false alarm rate in practical cryptocurrency payments, which greatly improves their 
applications, which makes it difficult for law enforcement concealment and evades legal supervision. This makes it 
agencies to respond quickly when faced with massive difficult for law enforcement agencies to track and collect 
alarm information. In addition, these methods have low evidence, making it difficult to effectively combat these 
computational efficiency and are difficult to meet the criminal activities. Online prostitution is also an illegal 
requirements of real-time monitoring of large-scale activity carried out using the Internet. Criminals usually 
network traffic. In view of this, this study introduces the promote and trade through dark web platforms to avoid 
hollow convolution technology in CNN and proposes a tracking. In addition, illegal transactions are also an 
Tor overlapping traffic segmentation model based on the important aspect of online criminal activities. Criminal’s 
hollow convolution convolutional neural network trade prohibited items such as drugs, weapons, and 
(CNNH). At the same time, combining the attention counterfeit goods in anonymous markets such as the dark 
mechanism, CNN, and LSTM, an online criminal web. Such markets often rely on complex encryption 
behavior recognition model based on multi-core technology and anonymous payment methods to conduct 
convolutional neural networks and long short-term transactions, making it extremely difficult for law 
memory networks (MCNN-LSTM) is proposed. The enforcement agencies to investigate. Another important 
model analyzes network traffic characteristics, accurately form of online crime is the spread of malware. Malware 
identifies the websites visited by users, and effectively includes ransomware, phishing software, etc., which can 
identifies anomalous network behaviors related to be spread through various network channels and pose a 
criminal activities, becoming a powerful auxiliary tool for serious threat to individuals, enterprises and even 
online crime investigation. government agencies. The spread of malware can not 
The main contributions of the study are as follows: only steal personal privacy information, but also lead to 
First, the MCNN-LSTM model based on the combination the loss of core corporate data, and in serious cases, even 
of multi-core convolution and LSTM network is endanger national security. Every year, the number of 
proposed. By using multi-module collaborative data leaks caused by malware is huge, and the economic 
optimization, the modeling capabilities of spatial features losses caused are difficult to estimate [14]. In addition, 
and time series features are integrated to improve the with the popularization of IoT technology, cyber attacks 
theoretical framework and method design of network on smart devices are also on the rise, further expanding 
traffic anomaly detection. Second, the self-attention the scope of online criminal activities. 
mechanism (SAM) is introduced into the model Faced with these challenges, traditional legal and law 
architecture, which can dynamically focus on key enforcement methods are unable to cope with the high 
features and improve the model's adaptability to dynamic concealment and transnational nature of online crimes. 
environments. Finally, a multi-scale feature extraction Researchers and law enforcement agencies have begun to 
method is proposed to capture multi-scale spatial features rely on advanced technical means, especially recognition 
based on the multi-core convolution module. algorithms based on network traffic analysis and deep 
learning. Through these technologies, researchers can 
2 Methods and materials extract useful features from massive amounts of network 
data to identify and track criminal behavior. In recent 
2.1 Online crime and its challenge years, more and more research has been devoted to 
improving traffic analysis methods to improve the ability 
Online criminals often use the anonymity, privacy 
to detect complex cybercrime, especially crimes in 
protection and global characteristics of the Internet to 
anonymous networks. In the future, with the further 
carry out various illegal activities, including illegal 
development of technology, more intelligent detection 
gambling, online transactions, money laundering, 
systems for online criminal behavior will be widely used 
malware propagation, etc. Studies have shown that the 
to better cope with the growing network threats. 
economic losses caused by cybercrime worldwide each 
130 Informatica 49 (2025) 127–144 J. Hu 
Online crime identification is the process of locating protection [15]. Tor achieves anonymity in 
and assessing possible illegal activity, such as online communications by dividing user communications into 
gambling, malware distribution, and illegal transactions, multiple data packets, transmitting them through multiple 
by analyzing network traffic, user behavior patterns, and relay nodes, and encrypting and decrypting the data 
data characteristics. Unlike traditional network traffic packets. The high privacy of anonymous networks makes 
analysis, online crime identification focuses more on the them an important tool for legitimate users to protect their 
complex characteristics of criminal behavior hidden in privacy, but they also provide shelter for various 
anonymous networks, often involving protocol abuse, criminalactivities, such as online gambling, online 
encrypted data streams, and anomalous behavior patterns. prostitution, and illegal transactions. These crimes not 
Anomalous network behavior usually manifests itself in only cause great social harm, but also bring great 
the form of anomalous network traffic patterns, including challenges to law enforcement agencies in identification 
but not limited to the following. On the Tor network, and tracking. At the same time, this anonymity also 
high-frequency, short-duration access patterns may makes traffic analysis and identification more difficult, 
reflect scanning attacks. Abnormal packet intervals or especially in the case of overlapping traffic. Overlapping 
excessively large packet sizes may indicate covert traffic segmentation refers to the technique of decoupling 
channel communications. Sudden changes in traffic and segmenting the traffic when the communication data 
characteristics may indicate malware activity. In this of multiple users are transmitted simultaneously over the 
context, this study will explore a network traffic analysis same communication link in an anonymous network 
method based on deep learning and explore its environment. In contrast to the broader approach of 
application potential in identifying anomalous network network traffic analysis, the concept of overlapping 
behavior in the early stage of online crimes. traffic segmentation entails the identification of the traffic 
aliasing relationship between disparate users and the 
2.2 CNN-based model construction for tor extraction of characteristic information from the traffic of 
overlapping traffic segmentation particular users. This facilitates the detection of potential 
abnormal behavior. The flow of traditional overlapping 
With the development of anonymous communication traffic segmentation is shown in the following Figure 1. 
technology, Tor network is widely used for both legal and 
criminalactivities due to its strong anonymity and privacy 
 
Find 
Overlapping Segment Traffic 
Start Segmentation Output
Tor Traffic Traffic Identification
Points
 
Figure 1: The basic process of overlapping traffic segmentation 
 M −1 N−1
As shown in Figure 1, first identify the key segmentation Yi, j = Xbehavior ,(i+m),( j+n)Wm,n +b  (1) 
m=0 n=0
points in the traffic, and use these segmentation points to 
In Equation (1), Y
segment the traffic and extract feature points related to i , j  represents the value of the 
specific behavior patterns. Then, the segmented traffic output feature map at position (i, j) . Xbehavior ,(i+m),( j+n)  is 
segments are recognized and classified, and finally 
the element in the input feature map. Wm,n  represents the 
further processing is performed based on the recognition 
results. CNN has a strong feature extraction capability weight matrix element of the convolution kernel. The 
and is suitable for handling overlapping traffic in network training model can automatically adjust the weight to 
traffic. Online criminal activities are often accompanied better capture specific behavior features. b  is the bias 
by complex network traffic patterns that may overlap term. M  and N  denote the height and width of the 
with normal traffic, increasing the difficulty of convolution kernel. Equation (2) illustrates why the 
identification. By using convolution kernels to extract rectified linear unit (ReLU), which is easy to understand, 
local features from input traffic, CNN can effectively quick to compute, and capable of handling deep 
separate and identify abnormal behavior patterns in networks, is chosen as the activation function.  
overlapping traffic, thereby helping to detect potential  
criminal activities, such as suspicious transaction requests f (x) =max(0, x)   (2) 
or abnormal data packet transmissions. Therefore, the  
study will construct overlapping traffic segmentation In Equation (2), x  denotes the value input to the 
model based on CNN. CNN applies convolutional kernel activation function after the convolution operation. In the 
to extract local features by sliding window approach, the next pooling stage, the expression is shownin Equation 
calculation is shown in Equation (1) [16]. (3) [17]. 
 
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   131 
  
P = max(X N
behavior _ feature )   (3) 
window L = −w ˆ
behavior _ feature (yi log(yi ))   (5) 
 i=1
In Equation (3), P  is the maximum value of the In Equation (5), L  is the loss value. wbehavior _ feature  
pooling window. Xbehavior _ feature  is the element of the input represents the weight factor related to the behavior 
feature map, which includes behavior features extracted characteristics. N  represents the number of samples, yi  
from network traffic such as transmission frequency and is the true label, and i  represents the actual category 
directional features. Through downsampling, the pooling corresponding to the sample, that is, whether it is an 
procedure shrinks the FM's size, lowering computational 
illegal activity. yˆi  is the probability distribution predicted 
cost and enhancing the model's resilience. Finally, the 
fully connected layer (FCL) expression is shown in by the model. By adding the weight factor of the behavior 
Equation (4) [18]. characteristics, the model can more effectively focus on 
 the characteristics related to the criminal behavior, 
thereby improving the recognition effect of the model in 
 z =Wz +b
    (4) specific criminal behavior scenarios. 
z = z , z Due to the highly encrypted and complex time series 
  trade malware , z 
anomaly 
characteristics of Tor traffic, the study introduces the 
 
hollow convolution technique, by introducing cavities in 
In Equation (4), z  is the input high-dimensional 
the convolution kernel. That is, extending the receptive 
feature vector, and W  is the weight matrix of the fully 
field without adding more parameters by adding gaps 
connected layer. z  contains a combination of multiple between the convolution kernel's parts. Hollow 
behavioral features, and ztrade , zmalware , and zanomaly  convolution (also known as expanded convolution) is a 
represent features related to illegal transactions, malware technique that expands the receptive field by inserting 
propagation, and other abnormal behaviors, respectively. holes in the convolution kernel, capturing a wider range 
Among the most often utilized loss functions in of features without increasing the number of parameters. 
classification problems is the cross-entropy loss function. This method helps the model handle long-range 
Equation (5) illustrates its expression by calculating the dependencies while maintaining computational efficiency, 
difference between the probability distribution (PD) of as shown in Figure 2. 
the real labels and the PD predicted by the model. 
 
Convolution kernel: 3×3 Convolution kernel: 3×3 Convolution kernel: 3×3
Receptive field: 3×3 Receptive field: 7×7 Receptive field: 15×15
(a) 1 Dilated Convolution (b) 2 Dilated Convolution (c) 4 Dilated Convolution 
Figure 2: Multi-scale feature extraction using dilated convolution 
 
Figure 2(a), (b), and (c) represent the convolutional complexity, and is suitable for processing complex 
kernel arrangement with convolutional expansion rate of features in Tor traffic. 
1, 2, and 4, respectively. In Figure 2(a), when the In the rest of the model, batch normalization is first 
convolutional expansion rate is 1, the convolutional introduced after each convolutional layer to accelerate 
kernel size is 3×3, which is the same as the conventional convergence and improve generalization. Second, a larger 
convolutional kernel, and the sensory field only covers range of contextual information is captured by expanding 
the local area. In Figure 2(b) with a convolution the sensory field by the application of null convolution. 
expansion rate of 2, the convolution kernel sense field Moreover, to prevent overfitting, a Dropout layer is 
expands to 7×7, but the actual parameters remain 3×3. In introduced to enhance model robustness. Furthermore, to 
Figure 2(c) with a convolution expansion rate of 4, the better handle the complex aspects of Tor traffic, a deep 
sense field further expands to 15×15, and the number of network structure is built by stacking numerous 
parameters remains the same. Null convolution can convolutional, pooling, and FCLs. Therefore, the 
effectively extract multi-scale information and remote- structure of the overlapping traffic segmentation model of 
dependent features without increasing the computational CNNH is shown in Figure 3 below. 
 
132 Informatica 49 (2025) 127–144 J. Hu 
Input layer Output layer
...
... ...
... ...
CNN feature extraction layer Fully connected layer  
Figure 3: Overlapping traffic segmentation model based on CNNH 
 
As shown in Figure 3, the process of the CNNH passed to the FCL, where the features are further 
overlapping traffic segmentation model consists of four comprehensively analyzed to generate a high-dimensional 
parts. First, the input layer receives the original Tor feature vector. Finally, the output layer completes the 
traffic data and passes it to the CNN layer. The CNN prediction and classification of the traffic segmentation 
layer extracts representative features from the input results based on the output of the FCL, helping the model 
traffic through a series of convolution operations and distinguish between legitimate traffic and potential 
pooling operations, including network behavior features criminal behavior. To facilitate understanding of the 
such as packet size, transmission time interval, and specific implementation of the CNNH model, the pseudo 
transmission frequency. These extracted features are then code is given below, as shown in Figure 4. 
 
# Pseudocode for CNNH Model
# Pseudocode for CNNH Model
# Input: Network traffic data (X), labels (Y)
# Output: Predicted labels (Y_hat)
# Step 1: Data Preprocessing
X_preprocessed = preprocess_data(X)  # Normalize and extract features
# Step 2: Dilated Convolution (Hollow Convolution) Module
def DilatedCNN_Module(X):
    Conv1 = Conv2D(filters=32, kernel_size=(3, 3), dilation_rate=1, activation='relu')(X)
    Conv2 = Conv2D(filters=64, kernel_size=(3, 3), dilation_rate=2, activation='relu')(Conv1)
    Conv3 = Conv2D(filters=128, kernel_size=(3, 3), dilation_rate=4, activation='relu')(Conv2)
    PooledFeatures = MaxPooling2D(pool_size=(2, 2))(Conv3)
    return PooledFeatures
X_dilated = DilatedCNN_Module(X_preprocessed)
# Step 3: Fully Connected Layers for Classification
def ClassificationHead(X):
    Dense1 = Dense(units=64, activation='relu')(X)
    Output = Dense(units=num_classes, activation='softmax')(Dense1)
    return Output
Y_hat = ClassificationHead(X_dilated)
# Step 4: Model Training
model = compile_model(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_preprocessed, Y, epochs=50, batch_size=32)
 
Figure 4: Overlapping traffic segmentation model based on CNNH 
 
The pseudo code in Figure 4 shows the workflow of can reduce information loss while maintaining the 
the CNNH model in complex network traffic feature integrity of spatial features. 
extraction. The model effectively expands the receptive 
field through the hole convolution module. Therefore, it 
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   133 
2.3 Research on online criminal behavior spaces for subsequent computation of the attention 
recognition model based on LSTM and scores. The attention score is shown in Equation (8). 
 
CNN 
j=n
CNN for traffic segmentation, although excellent in attention(a,V ) =ai v ji  (8) 
spatial feature extraction, still suffers from recognition j=1
limitations when confronted with time-series features in  
Tor traffic. In contrast, LSTM, as a recurrent neural In Equation (8), a  denotes the i th attention weight. 
i
network that excels in processing sequence data, is v j is the element at the j th position in the value vector 
suitable for the field of network traffic analysis due to its 
V . Thus, Figure 5 shows a schematized version of the 
powerful modeling capability of time series features [19]. 
In online criminal behaviors, such as cyber attacks or SAM structure. 
illegal transactions, specific time patterns are often Wq Q qi
shown, such as persistent illegal access attempts or 
regular small-amount fund transfers. By analyzing the 
time series features in network traffic, LSTM can identify X Wk K ai
the regularity of these criminal behaviors and provide 
support for crime prevention by predicting future 
behavior trends. Therefore, the study will try to combine Wv V Att
CNN and LSTM and introduce the SAM to extract and  
classify important features. Figure 5: Self attention mechanism layer structure 
In the overall process design, the input data is first 
processed through a data encoding module to convert the In Figure 5, the input sequence X  is first converted 
raw data into a form suitable for model input. Then, it is to value matrix V , key matrix K , and query matrix Q  
passed through the SAM module in order to enhance the through three weight matrices W , W , and W , 
attention to the key features. Then, CNN and LSTM v k q
modules perform feature extraction and time series respectively. Then, the Q  and K  calculate the 
analysis on the data processed by the attention correlation through dot-product operation, and the result 
mechanism, to capture behavioral patterns that recur over is inputted into the Softmax function (SF) to generate the 
long periods of time. Finally, the model outputs the attention weights a after scaling. These attention weights 
i
recognition results to realize the recognition of WF. In are used to weight the corresponding elements in the V  
the data encoding module, the training data is shown in elements, and finally the weighted value matrix is passed 
Equation (6).  through a summation operation to obtain the output. By 
 using this approach, the model can dynamically 
T = (X1,G1), (X 2 ,G2 ),..., (X n ,Gn ) concentrate on important features according to how 
    (6) important each segment of the input sequence is. This 
 X = (1,−1,1,−1,...,1)
successfully boosts the model's performance when 
 
processing complex data and improves its capacity to 
In Equation (6), T  denotes the training data set. X  
n capture vital information. Finally, in the CNN and LSTM 
and G  denote the n th traffic instance and website class 
n module, the resulting feature sequence is spliced into a 
label, respectively. One-Hot encoding, a popular two-dimensional feature matrix. Then the one-
encoding technique in neural network multi-classification dimensional maximum pooling layer (PL) is connected 
tasks, is crucial for guaranteeing the classification for data dimensionality reduction processing, and the 
model's accuracy, preventing label misrepresentation, and expression is shown in Equation (9). 
increasing computational efficiency. Therefore, One-Hot  
state bits are used for encoding. Further, in the SAM Y l
i,h=5 = max(Z l
1, l
1 Z l
j− Z j , Z l
j+ , j+2 )  (9) 
module, the correlation matrix of the input sequence is  
first calculated as shown in Equation (7) [20]. l
In Equation (9), Y
 i ,h=5  then denotes the result of the 
l
V  =  X W pooling operation via the convolution kernel of 5. Z j , 
v

K  =  X W Z l l
k    (7) j+2 , and Z j+2  all denote the neighboring feature values 

 Q =  X W in the previous layer of 
q l . Subsequently, the extracted 
 spatial features are fused as shown in Equation (10). 
In Equation (7), V , K , Q denote the value, key, and  
F l
query matrices, respectively. W , W , and W j = concat(Y l
i,h=3 ,Y l
i,h=4 ,Y l
i,h=5 )   (10) 
q all denote 
v k
 
the initial weight matrices, which correspond to the value, 
key, and query weight matrices, respectively. These In Equation (10), F l
 j  denotes the fused features after 
matrices project the input sequences into different vector convolution and pooling. Y l
i ,h=3 , Y l
i ,h=4 , Y l
 and i ,h=5  denote 
134 Informatica 49 (2025) 127–144 J. Hu 
the i th output of the pooling of the l th layer with spatial features, temporal features and behavioral features 
convolution kernel size 3, 4, and 5, respectively. Equation to form a unified feature representation. Specifically, 
(11) illustrates how the data is put into the LSTM to spatial features are extracted through the convolution 
extract the temporal features once the fusion is finished. layer, temporal features are captured through the LSTM 
 layer, and behavioral features are extracted based on 
ht = (Whh t−1 +Wxxt +b)  (11) high-risk behavior patterns in traffic. The fused feature 
representation is shown in Equation (12). 
 
 
In Equation (11), h  and h  represent the hidden 
t t−1 z =z
states of the current time step and the previous time step, spatial + ztemporal + zbehavioral  (12) 
respectively, that is, the contextual information of the  
behavioral features at the current moment. For identifying In Equation (12), zspatial  represents the spatial 
criminal behavior, the information of the previous time features extracted by the convolution layer, which can 
step, such as the occurrence of certain abnormal help identify local anomalies in network traffic. ztemporal  
behaviors at the previous moment, can help predict 
represents the temporal features extracted by the LSTM 
whether the behavior at the current moment is abnormal. 
layer, which captures recurring patterns in the time 
x  represents the input features of the current time step, 
t dimension, especially high-frequency packet transmission 
and W  represents the weight matrix of the hidden state, 
h behaviors. z  represents the high-level features 
behavioral
which can learn how to transfer the criminal behavior obtained by the behavioral feature extraction mechanism, 
features of the previous moment to the current moment. which reflects specific behavioral patterns such as 
W  is the weight matrix of the input features, which is 
x malware propagation and illegal transactions.  ,  , and 
used to weight the input features of the current time step.   are all weighting factors. The weights are adjusted 
These weights can learn the importance of different 
according to the importance of different features to 
behavioral features in predicting criminal behavior. b  is 
ensure the sensitivity of the model to specific behavioral 
the bias term, and   is the activation function. By patterns. Therefore, the improved CNN-LSTM structure 
introducing nonlinearity, the model can capture complex is shown in Figure 6 below. 
behavioral patterns. Finally, the model further fuses 
 
Unified feature Time feature 
Spatial characteristics
representation dependence
Convolution kernel: 5
Convolution kernel: 4
Convolution kernel: 3 Pooling LSTM layer Flatten layer
Convolutional layer layer
Integrate spatial 
Extracting spatial Reduced dimensionality 
features to form a 
features space features
unified representation
Capturing time Expanding high- Classification 
dependencies dimensional features results
 
Figure 6: The structure and temporal feature fusion of the MCNN-LSTM Model 
 
 
Input sequence
...
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   135 
In Figure 6, first, the input sequence passes through time-series data and is able to capture long-range 
multiple convolutional layers, each with a different dependencies in the data. Subsequently, the high-
convolutional kernel size to capture different scale dimensional features output from the LSTM layer are 
features in the input data. Subsequently, a PL is used to expanded into one-dimensional vectors through the 
downsample the convolved FMs, so decreasing their size Flatten layer. Finally, the output of the classification or 
and, consequently, the computational complexity. Then, regression task is carried out through the FCL, thereby 
the multi-scale features are integrated through a fusion identifying potential criminal behavior in network traffic. 
layer to form a unified feature representation, helping the Therefore, according to the above calculations, the online 
model capture more comprehensive traffic information. criminal behavior recognition process based on MCNN-
Immediately afterward, these features are passed to the LSTM is shown in Figure 7. 
LSTM layer. The LSTM layer specializes in processing 
 
Start Data Preprocessing
Binary classification Multi-class 
model classification model
CNN-LSTM for 
Network traffic
Open World
Is it a Monitoring Website?
Y
N
 CNN-LSTM for 
Open World Label
closed-world
Sealed World 
End
Label
 
Figure 7: Online criminal behavior identification process 
 
As shown in Figure 7, first, during the training phase, does not belong to the accusation domain, it enters the 
the preservation model is trained using the binary and closed-world labeling processing. Through the staged 
multi-classification datasets created from the network processing, the model is able to process the open-world 
traffic data, respectively. In the recognition phase, the and closed-world labels separately, thus improving the 
input network traffic is first processed by the open-world accuracy and efficiency of the recognition. To intuitively 
MCNN-LSTM to determine whether it is labeled in the demonstrate the implementation process of the MCNN-
accusation domain. If the traffic belongs to the accusation LSTM model, its pseudo code is given below, as shown 
domain, it enters the open-world label processing and is in Figure 8. 
recognized using the closed-world MCNN-LSTM. If it 
136 Informatica 49 (2025) 127–144 J. Hu 
# Pseudocode for MCNN-LSTM Model
# Pseudocode for MCNN-LSTM Model
# Input: Network traffic data (X), labels (Y)
# Output: Predicted labels (Y_hat)
# Step 1: Data Preprocessing
X_preprocessed = preprocess_data(X)  # Normalize and extract features
# Step 2: Multi-Scale Convolution (MCNN) Module
def MCNN_Module(X):
    Conv1 = Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(X)
    Conv2 = Conv2D(filters=64, kernel_size=(5, 5), activation='relu')(Conv1)
    Conv3 = Conv2D(filters=128, kernel_size=(7, 7), activation='relu')(Conv2)
    CombinedFeatures = concatenate([Conv1, Conv2, Conv3])
    PooledFeatures = MaxPooling2D(pool_size=(2, 2))(CombinedFeatures)
    return PooledFeatures
X_spatial = MCNN_Module(X_preprocessed)
# Step 3: Temporal Feature Extraction with LSTM
def LSTM_Module(X):
    LSTM_output = LSTM(units=128, return_sequences=True)(X)
    return LSTM_output
X_temporal = LSTM_Module(X_spatial)
# Step 4: Self-Attention Mechanism (SAM)
def SelfAttention(X):
    Q = dot(X, Wq)  # Query matrix
    K = dot(X, Wk)  # Key matrix
    V = dot(X, Wv)  # Value matrix
    AttentionScores = Softmax(dot(Q, K.T) / sqrt(d_k))  # Scaled Dot-Product Attention
    Output = dot(AttentionScores, V)  # Weighted sum of values
    return Output
X_attention = SelfAttention(X_temporal)
# Step 5: Fully Connected Layers for Classification
def ClassificationHead(X):
    Dense1 = Dense(units=128, activation='relu')(X)
    Output = Dense(units=num_classes, activation='softmax')(Dense1)
    return Output
Y_hat = ClassificationHead(X_attention)
# Step 6: Model Training
model = compile_model(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_preprocessed, Y, epochs=50, batch_size=32)
 
Figure 8: Schematic diagram of MCNN-LSTM pseudo code 
 
This pseudo code in Figure 8 clearly shows the main Windows 10 operating system with a 12-core Xeon 
modules of the MCNN-LSTM model and their interaction Platinum 8163 processor and a graphics card NVIDIA 
process. First, the multi-core convolution module Tesla P100-16GB. The model development language is 
captures the multi-scale features of the input data and Python 3.7. The study selects the CW200 dataset as the 
combines the pooling layer to reduce the computational experimental object, which contains a variety of normal 
complexity. Subsequently, the LSTM module is and abnormal traffic with high noise and complex traffic 
employed to model the time series features, with the self- patterns, meeting the needs of overlapping traffic 
attention mechanism further emphasizing the key features segmentation and abnormal behavior identification in 
to enhance the classification performance. Finally, the anonymous networks. The diversity of protocol 
network traffic classification is completed by the fully distribution and user behavior is taken into account 
connected layer. during data collection in order to mimic traffic patterns in 
real-world scenarios as closely as possible. The dataset 
3 Results collects traffic data from 200 different websites accessed 
through the Tor network in a closed world. Each site has 
3.1 Performance testing of overlapping 2,500 traffic accesses, which are divided into training and 
traffic segmentation model for CNNH test sets in a 6:4 ratio. A stratified sampling method is 
used to ensure that the proportions of the training and test 
The study began by setting up a suitable sets are consistent in terms of protocol type, traffic 
experimental environment to meet the computational feature distribution, and attack type, thus avoiding the 
requirements of the experiment. The experiment uses bias of the model performance evaluation due to uneven 
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   137 
data distribution. In addition, to reduce the risk of proportion of category samples is balanced and covers a 
overfitting, the dropout regularization technique is variety of network protocols and anonymous network 
introduced into the experiment, and the diversity of the scenarios. Data cleaning, feature extraction, and 
training data is improved by data enhancement. Among standardization are performed during data preprocessing, 
them, normal traffic accounts for 60%, abnormal traffic and traffic behavior patterns are labeled as normal or 
accounts for 40%, and is further divided into four abnormal. First, the settings of each parameter in CNNH 
categories: Trojans, Worms, Viruses, and Adware. The are shown in Table 2 below. 
 
Table 2: Model parameter settings 
Parameter Value 
Input dimension 5000 
Network architecture layers 12 
Batch size 256 
Epochs 50 
Gradient optimization function Adam 
Learning rate 0.001 
Dropout 0.4 
 
Table 2 shows the settings for input dimension, normal network behavior from potential criminal 
network architecture layers, training details, optimizer, behavior, and can more accurately capture traffic patterns 
learning rate and Dropout rate, respectively. The study related to criminal activities such as illegal transactions 
uses CNN, dilated CNN (DC-CNN), and multi-layer and malware propagation, thereby reducing false 
perceptron with dilated convolution (MLP-DC) as positives and improving the effectiveness of crime 
comparison models. When criminal activities are carried identification. Therefore, the traffic segmentation 
out in anonymous networks, criminal behavior is often accuracy is used as an indicator, and the test results are 
hidden in normal traffic. A high segmentation accuracy shown in Figure 9. 
means that the model can more accurately distinguish 
 
1.00 0.97 1.00
0.95 0.95 0.92
0.91
0.90 0.90 0.87
0.89
0.85 0.85
0.85
0.80 0.80 0.85
0.83
0.75 0.75
CNN CNN
0.70 DC-CNN 0.70 DC-CNN
MLP-DC MLP-DC
0.65 CNNH 0.65 CNNH
0.60 0.60
0 10 20 30 40 50 0 10 20 30 40 50
Iterations Iterations
(a) Training set (b) Test set  
Figure 9: Accuracy trends on training and test sets for different models 
 
Figure 9(a) and (b) show the accuracy of CNN, DC- models on the training set is 0.85, 0.89, 0.91, 0.97, 
CNN, MLP-DC, and CNNH with the iterations on the respectively. In Figure 9(b), the accuracy of the four 
training set and test set, respectively. In the case of models on the test set are 0.83, 0.85, 0.87, 0.92, 
malware propagation, the model identified multiple respectively. DC-CNN and MLP-DC introduce the 
suspicious data packets through high-precision traffic advantage of null convolution to extract deep features 
segmentation. The transmission frequency and time more comprehensively. To verify whether the difference 
characteristics of these data packets are highly consistent in accuracy between different models on the training and 
with known malware propagation behaviors, thereby test sets is statistically significant, a paired t-test is 
enabling law enforcement to swiftly identify the source of performed on the normalized accuracy and the 95% 
the behavior. confidence interval is calculated, as shown in Table 3. 
In Figure 9(a), when the number of iterations is 50, 
the accuracy of CNN, DC-CNN, MLP-DC and CNNH 
 
Normalized accuracy
Normalized accuracy
138 Informatica 49 (2025) 127–144 J. Hu 
Table 3: Statistical significance analysis 
Normalized 95% 
Statistical 
Dataset Model Comparison accuracy confidence interval P-value 
significance 
difference (%) (%) 
CNNH vs. CNN 12 [10.2, 13.8] < 0.01 Significant 
Trainin
CNNH vs. DC-CNN 8 [6.4, 9.6] < 0.05 Significant 
g set 
CNNH vs. MLP-DC 6 [4.7, 7.3] < 0.05 Significant 
CNNH vs. CNN 9 [7.5, 10.5] < 0.01 Significant 
Testing 
CNNH vs. DC-CNN 7 [5.6, 8.4] < 0.05 Significant 
set 
CNNH vs. MLP-DC 5 [3.8, 6.2] < 0.05 Significant 
 
The statistical analysis results in the table show that 0.05, indicating statistical significance. In addition, the 95% 
the accuracy improvement range of CNNH on the confidence interval indicates that the range of differences 
training set is from +6.0% to +12.0%, and the is relatively stable. Subsequently, the segmentation effect 
improvement range on the test set is from +5.0% to of each model under different traffic flows is shown in 
+9.0%. The P-values for all comparisons are less than Table 4 below. 
 
Table 4: Performance evaluation indicators for each algorithm 
CNN DC-CNN MLP-DC CNNH 
Index 
Normal Attack Normal Attack Normal Attack Normal Attack 
P/% 83.52 85.67 86.14 88.43 88.79 90.57 91.43 93.45 
R/% 84.67 86.82 88.53 89.92 90.35 91.74 93.46 94.32 
FPR/
13.65 12.34 10.74 9.98 8.96 7.43 4.15 3.07 
% 
F1/% 84.09 86.24 87.32 89.17 89.56 91.15 92.43 93.88 
AUC 0.769 0.788 0.812 0.828 0.839 0.846 0.928 0.935 
Time/
12.34 13.02 15.89 16.58 19.65 20.23 18.41 19.12 
s 
Resource 
consumpti 68.54 69.85 72.32 73.46 75.69 76.78 70.17 71.54 
on/% 
 
Table 4 displays the performance comparison of the balance of performance is critical. Maintaining sensitivity 
models for segmentation under Normal and Attack ensures that abnormal behavior is not ignored due to low 
traffic. False positive rate (FPR) is critical in law detection capabilities. Further analysis of the 
enforcement contexts, as a high FPR could lead to experimental results shows that false positives occur 
misidentifying benign traffic as criminal activity, mainly in normal traffic with high access frequency, such 
resulting in wasted resources. The CNNH model has as normal data transmission of certain legitimate 
significantly higher values for P, R, F1 and AUC. protocols being misclassified as abnormal traffic. This 
Especially, the P-value of CNNH reaches 93.45% and the may be due to the similarity between the characteristics 
R value is 94.32% under Attack traffic. Meanwhile, the of high-frequency access patterns and abnormal traffic. 
FPR of CNNH is only 4.15%, indicating that it can False negatives, on the other hand, are mainly 
effectively reduce the false alarms. However, with the concentrated on abnormal traffic with weaker 
increase of model complexity, the resource consumption characteristics or close to normal traffic characteristics, 
rate and calculation time of CNNH increase accordingly, such as covert adware traffic. False positives can lead to 
reaching 71.54% and 19.12s, respectively. Although its reduced efficiency in resource allocation, while false 
resource requirements are high, the significant negatives can cause some potential threats to be ignored. 
improvements in accuracy and sensitivity make up for 
this shortcoming. In contrast, the traditional CNN is at a 3.2 Online crime recognition experiment 
lower level in all performance indicators. However, its based on MCNN-LSTM 
resource consumption rate and computation time are low, 
which makes it suitable for scenarios with limited In the hyperparameter setting of MCNN-LSTM, the 
computational resources. The proposed model has been learning rate is optimized in the range of 0.0001 to 0.01 
demonstrated to effectively reduce the FPR, ensuring by grid search and finally selected as 0.001. The batch 
higher accuracy and reliability in identifying criminal size is set to 32. The number of hidden layer nodes is set 
behavior. Furthermore, it has been shown to facilitate the to 128, which can effectively capture the time series 
optimization of resource allocation and action decisions. characteristics of traffic data. The time step is set to 20. 
In the experiment, the recall rate is equivalent to the Adam is used as the optimizer to improve the training 
sensitivity, i.e., the proportion of actual anomalous traffic efficiency. The number of training rounds is set to 50, 
that is correctly detected. In practical scenarios, this and the early stopping strategy is combined to avoid 
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   139 
overfitting. To improve the generalization ability of the network (MLP-CNN), long short-term memory with 
model, Dropout is added to the network and the ratio is attention mechanism (LSTM-Att), and LSTM are 
set to 0.3. The study labeled the traffic data set according selected as comparison algorithms. First, the accuracy test 
to different crime types, mainly including three types of results of the four models for different types of online 
crimes: online fraud, malware propagation, and illegal criminal behaviors are shown in Figure 10 below. 
transactions. multi-layer perceptron convolutional neural 
 
100
95
90 LSTM
MLP-CNN
85
LSTM-Att
80 MCNN-LSTM
75
0 Online Fraud Malware Propagation Illegal Transactions
Crime Types  
Figure 10: Model performance in crime type identification 
In Figure 10, the MCNN-LSTM model showed the worst performance among the three crime types, 
best accuracy, especially in the identification of malware especially in the identification of malware propagation, 
propagation and illegal transactions, reaching 96.54% and which is only 87.43%. Subsequently, to evaluate the 
92.87% respectively. This is because MCNN-LSTM performance of each model in crime prediction and 
combines the spatial feature extraction capability of CNN prevention, the following indicators are used: prediction 
with the temporal feature capture capability of LSTM, accuracy, early warning time (early warning time is 
and can better handle the complex patterns and temporal defined as the time interval between the first detection of 
dependencies in criminal behavior. Although LSTM-Att an abnormal traffic pattern by the model and the actual 
improves the focus on important features by introducing occurrence of the attack behavior), precision, FPR, mean 
the attention mechanism, its spatial feature extraction detection time, and area under the receiver operating 
capability is weak, so it is still inferior to MCNN-LSTM characteristic curve (AUC). The results are shown in 
in multi-dimensional feature extraction. LSTM has the Table 5 below. 
 
Table 5: Performance comparison of models in crime prediction and early warning tasks 
MLP- LSTM-
Metrics LSTM MCNN-LSTM 
CNN Att 
Prediction Accuracy /% 80.45 84.67 88.76 92.43 
Average Early Warning Time 
15 18 25 30 
/Minutes 
Precision /% 79.87 83.54 87.34 91.23 
False Positive Rate /% 9.67 8.23 6.45 5.12 
Mean Time to Detect /Seconds 42.8 35.6 28.1 24.3 
AUC 0.835 0.874 0.915 0.945 
 
In Table 5, MCNN-LSTM shows the best abnormal behavior features and reduce the interference of 
comprehensive performance. Compared with other irrelevant features. The accuracy of MCNN-LSTM is also 
models, MCNN-LSTM achieved a prediction accuracy of better than other models, with the lowest false positive 
92.43%, which is significantly higher than LSTM's rate of only 5.12%. The model performs well in reducing 
80.45% and MLP-CNN's 84.67%. Although LSTM-Att false positives. In contrast, LSTM has a higher false 
introduces the attention mechanism, its spatial feature positive rate of 9.67%, due to the lack of spatial feature 
extraction capability is insufficient, resulting in the modeling, its adaptability to traffic pattern changes is 
advance warning time and prediction accuracy being poor and the false alarm rate is significantly high. In 
inferior to MCNN-LSTM. In addition, MCNN-LSTM terms of average detection time, the MTTD of MCNN-
can warn of criminal behavior 30 minutes in advance, this LSTM is 24.3 seconds, which is better than 28.1 seconds 
capability is mainly due to the model's deep modeling of of LSTM-Att and 35.6 seconds of MLP-CNN, which 
time series characteristics. In particular, the introduction further proves the real-time detection capability of the 
of the SAM further enhances the model's ability in key model. 
feature extraction, enabling it to quickly focus on 
Accuracy/%
140 Informatica 49 (2025) 127–144 J. Hu 
Finally, the concept of concept drift is used to protocol-related features (e.g., packet length, time 
evaluate the robustness and adaptability of the models in interval) are subject to random changes, thereby 
the face of changing data distributions. Conceptual drift simulating fluctuations in protocol usage and traffic 
refers to the phenomenon that the data distribution characteristics. Furthermore, the incorporation of novel 
changes over time. Its practical application is due to the attack types at various temporal points serves to mirror 
fact that the traffic pattern, feature distribution, and user the progression of attack patterns. These designs are 
behavior of a website may change over time. The drift intended to closely mirror the evolving trends in the 
simulation involves the gradual adjustment of the ratio of actual network environment, thereby facilitating the 
normal traffic to abnormal traffic, thereby reflecting the evaluation of the model's efficacy in handling long-term 
dynamic changes in network attack behaviors. The distribution shifts. The results are shown in Figure 11. 
 
100
90
80
70
60
5
10
20
60
 
Figure 11: Impact of concept drift on model accuracy over time 
Figure 11 shows the model accuracy of the four and rapid loss of accuracy. MLP-CNN is biased toward 
models under training time and testing time of 5, 10, 20, fixed patterns in feature extraction and has insufficient 
and 60, respectively. As the interval time increases, the adaptability to concept drift. 
concept drift leads to different degrees of decrease in the Finally, several representative models including 
accuracy of each model. A lower drop value suggests that time-series Transformer, spatial-temporal graph 
the model is more flexible and can continue to perform convolutional network and transformer framework (ST-
well in classification even when concepts diverge. When GCN+Transformer), bidirectional long short-term 
the interval between training and testing events is 60 memory with attention mechanism (BiLSTM+Attention), 
days, the recognition accuracies of LSTM, MLP-CNN, random forest and principal component analysis 
LSTM-Att, and MCNN-LSTM models are 60.2%, 73.8%, (RF+PCA), and K-nearest neighbor (KNN) are selected 
80.7%, and 89.5%, respectively. The advantage of for comparison. These five models cover the hybrid 
MCNN-LSTM in dynamically changing environments framework and transformer method in modern deep 
lies in its optimized model architecture. The multi-kernel learning, as well as the classic algorithms of traditional 
convolution module extracts multi-scale spatial features machine learning and non-deep learning. It can fully 
by convolution kernels of different sizes. The SAM reflect the advantages and disadvantages of different 
dynamically focuses on key features to reduce technical routes in network traffic analysis. The dataset 
interference. It works in conjunction with time series used is the representative open world network traffic 
modeling to significantly improve adaptability to dataset CIC-IDS2017. It records normal traffic and 12 
dynamic changes in the traffic feature distribution. In malicious attack behaviors, has 80 traffic features, and 
contrast, LSTM lacks spatial feature extraction has highly complex traffic patterns and open network 
capabilities and relies only on time series modeling, environment characteristics. The results are shown in 
resulting in high sensitivity to changes in traffic patterns Table 6. 
 
 
 
 
 
 
Recognition accuracy/%
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   141 
Table 6: Performance comparison and scalability testing of models under different traffic loads 
Accurac
Average 
y at 300% 
Traffic Accuracy FPR processing P-
Model Name data 
condition /% /% time value 
expansion 
(ms/sample) 
/% 
MCNN-LSTM 96.78 2.95 18.6 94.12 < 0.05 
Time-series 
95.23 3.21 17.5 92.45 < 0.05 
Transformer 
Small ST-
traffic 95.78 3.1 20.9 93.34 < 0.05 
GCN+Transformer 
(10% BiLSTM+Attention 92.67 4.89 19.2 89.45 < 0.05 
data) 
Random 
88.23 7.34 14.7 86.34 < 0.05 
Forest+PCA 
KNN 84.12 9.78 15.9 82.45 < 0.05 
MCNN-LSTM 95.89 3.45 20.8 93.34 < 0.05 
Time-Series 
94.12 3.89 18.3 91.67 < 0.05 
Transformer 
Mediu ST-
m traffic 94.78 3.56 21.2 92.78 < 0.05 
GCN+Transformer 
(50% BiLSTM+Attention 90.78 5.45 19.6 88.01 < 0.05 
data) 
Random 
86.34 8.12 15.2 84.78 < 0.05 
forest+PCA 
KNN 82.45 10.78 16.5 79.67 < 0.05 
MCNN-LSTM 94.78 3.89 22.5 92.12 < 0.05 
Time-series 
93.12 4.12 19.9 90.56 < 0.05 
Transformer 
High ST-
traffic 93.78 4.01 22.8 91.45 < 0.05 
GCN+Transformer 
(100% BiLSTM+Attention 89.34 6.12 20.3 87.12 < 0.05 
data) 
Random 
84.89 9.34 15.8 82.45 < 0.05 
forest+PCA 
KNN 81.12 11.78 16.7 78.34 < 0.05 
 
In table 6, MCNN-LSTM shows high accuracy in all websites specialize in the illegal sale of pirated software. 
traffic load scenarios, reaching 96.78% in small traffic The illegal sale of pirated software not only violates 
scenarios and maintaining 94.78% in high traffic intellectual property laws, but also involves illegal 
scenarios. Moreover, it demonstrated strong classification transactions and fund transfers through anonymous 
capabilities with an FPR of 3.89%. This is mainly due to networks, which is a common and widespread form of 
its multi-module synergy combining CNN and LSTM, online crime. Such websites conduct transactions through 
which can effectively capture the complex relationship encrypted networks and anonymous payment systems, 
between spatial and temporal features. Time-series and users can purchase unauthorized commercial 
Transformer and ST-GCN+Transformer also perform software, hacking tools, and cracked software. On one of 
similarly in terms of FPR and accuracy. The global the websites, called Dark Web Software Mall, about 
modeling capabilities of these two models allow them to 4,000 users visit and trade every day. The website uses 
perform well in dynamic network scenarios. The encrypted communication protocols and anonymous 
accuracy of the BiLSTM+attention model is subject to a payment methods such as Bitcoin. 
significant decrease in high-traffic scenarios due to the The experiment uses web crawler technology to 
limitations of the feature extraction method. In contrast, capture network traffic data from the website for 10 days, 
the KNN and Random Forest methods demonstrate a with a total of 400,000 packets, of which 200,000 packets 
higher degree of suitability for small-scale data sets. are directly related to illegal software transactions, 
However, when the data is expanded to 300%, the including user login, browsing illegal software, ordering, 
accuracy undergoes a substantial decline, indicating a and anonymous payment. At the same time, for 
lack of adaptability to large-scale, complex scenarios. comparison, the study also obtains traffic data from legal 
e-commerce platforms in the same period, totaling 
3.3 Simulation test 150,000 packets, which are related to browsing and 
purchasing legal software. Traffic capture and model 
In online criminal behavior on anonymous networks, the 
training are performed on a server running the Linux 
illegal software trading market is active, and many 
142 Informatica 49 (2025) 127–144 J. Hu 
operating system, with a 16-core CPU, 32GB memory, transactions, the study preprocesses the data, removed 
and 500GB storage space. The experiment uses the noise, and extracted key features, including packet size, 
Wireshark tool to capture network traffic to ensure the time interval, and transmission direction. By analyzing 
accuracy and integrity of the data. The traffic data the traffic characteristics, the model is further used to 
includes parts obtained from legal e-commerce websites distinguish the network traffic of legal software 
and illegal software trading websites, with a total of transactions, and illegal software sales. The results are 
400,000 packets. To ensure that the model can effectively shown in Table 7 below. 
identify traffic behaviors related to illegal software 
 
Table 7: Detection results of illegal software transactions compared to legitimate traffic 
Legitimate traffic Illegal software 
Metric Detection results 
(control group) transaction traffic 
Large-scale software Average of 3.2 
0.5-1.1 times/day 8.7-12.3 times/day 
downloads (times/day) detections/day 
Frequent small anonymous Average of 1.8 20.6-30.4 
1.3-2.2 times/day 
payments (transactions/day) transactions/day times/day 
Abnormal data packet Average detection 
75.4 MB/day 502.6 MB/day 
transmission (data volume/day) of 100 MB/day 
Average file size of downloads 
15.3 MB 10.7 MB 50.8 MB 
(MB) 
Anonymous payment amount 
Average of $512.4 $ 52.3 - $ 98.5 $ 10.7 - $ 49.6 
(per transaction) 
 
The experimental results show that the model can extraction of multi-scale features by introducing atrous 
effectively identify illegal software transactions. First, in convolution technology. Second, the accuracy trends 
the detection of large-scale software downloads, there are of different models on the training and test sets. As the 
an average of 8.7 to 12.3 large file downloads per day in number of iterations increases, the normalized 
illegal transaction traffic, while there are only 0.5 to 1.1 accuracy of CNNH reached 97% and 92% on the 
downloads in legal traffic. Secondly, frequent small training and test sets, respectively. In addition, the 
anonymous payments have also become an important statistical significance analysis further proved the 
feature for identification. An average of 20.6 to 30.4 reliability of these performance differences, with P-
small payments are made per day in illegal transaction 
values less than 0.05 for all comparisons. The 
traffic, while legal transactions are only about 1.8. In 
prediction accuracy of the MCNN-LSTM model 
addition, the transmission volume of abnormal data 
reached 92.43%, the precision was 91.23%, the false 
packets in illegal traffic far exceeds the normal range, 
positive rate was 5.12%, and it could achieve a 30-
with an average of 502.6MB of data transmitted per day, 
while the transmission volume of legal traffic is about minute early warning capability. In comparison, the 
75.4MB. accuracy of the traditional LSTM model and the MLP-
Through the traffic identification of criminal CNN model under complex traffic patterns was 
behavior, technology not only provides analysis results, 84.67% and 80.45%, respectively. 
but more importantly, it helps law enforcement agencies Finally, compared to traditional methods, the 
take quick action. The proposed model provides proposed model showed significant advantages in 
categorized anomalous traffic patterns and their scalability and dynamic adaptability. Traditional 
associated characteristics, such as traffic types and time methods had acceptable performance on small data 
intervals, which can help law enforcement identify sets, but their accuracy was less than 80% in high-load 
potential threats and prioritize suspicious behavior for traffic scenarios, making it difficult to effectively 
further investigation. capture dynamic characteristics in complex network 
environments. In contrast, by combining deep learning 
4 Discussion technology, MCNN-LSTM not only performed stably 
in highly complex scenarios, but also provided early 
The CNNH model showed excellent performance in 
warning capabilities for criminal behavior, showing a 
the overlapping traffic segmentation task, with 
wide range of practical application potential. 
precision and recall reaching 91.43% and 93.46%, 
respectively, and a false positive rate of only 4.15%. In 
contrast, DC-CNN and MLP-DC each had an accuracy 5 Conclusion 
lower than 87% due to their limited feature extraction Through real-time monitoring of network traffic, the 
capabilities. The main reason for this performance system can detect potential risks before criminal activities 
difference was that CNNH achieved effective occur. In view of this, this study proposed a CNNH-based 
Online Criminal Behavior Recognition Based on CNNH… Informatica 49 (2025) 127–144   143 
Tor overlapping traffic segmentation model and an through adversarial training and multimodal data fusion. 
MCNN-LSTM website fingerprint recognition model. In addition, integrating data from multiple sources, such 
The performance test results indicated that the average as user behavior logs and vulnerability information, can 
segmentation accuracy of CNNH was 95.05% when the improve the ability to detect low-frequency anomalous 
number of iterations was 50. Under Attack traffic, the P, behavior. In terms of privacy protection, future work will 
R, F1, and AUC values of CNNH were 93.45%, 94.32%, introduce data encryption and anonymization processing 
93.88%, and 0.935, respectively. The FPR was only technologies, and combine the context post-processing 
3.07%, which was better than the comparison model. Its mechanism to optimize false positive control to ensure 
computational time consumption was 19.12s, and the the credibility and legality of the model application. 
resource consumption rate was 71.54%. In the MCNN- Future research will also explore the applicability of the 
LSTM performance test, its recognition accuracy for model in other potential application areas. For example, 
malware propagation and illegal transactions reached in enterprise network security, MCNN-LSTM can be 
96.54% and 92.87% respectively. In the prediction used to detect abnormal traffic and potential attack 
experiment results, the prediction accuracy of MCNN- behavior in the enterprise internal network, helping to 
LSTM was 92.43%, and it could issue an early warning improve security protection capabilities. At the same time, 
30 minutes in advance, with a false positive rate of only future research must focus on the ethical and privacy 
5.12% and a detection time of only 24.3s. In terms of implications of model deployment, strictly adhere to 
computational time consumption, the MCNN-LSTM relevant laws and regulations, and ensure the social 
model consumes 102ms per round of training. In the responsibility and legality of the technology. 
concept drift test, the recognition accuracy of the MCNN-
LSTM model was 89.5% when the training and testing References 
events were separated by 60 days. This shows that the 
proposed model in the study had excellent recognition [1] F. Zhou, B. Zhou, S. Zhao, and G. Pan, 
accuracy and robustness. “DeepOffense: a recurrent network-based approach 
for crime prediction,” CCF Transactions on 
Pervasive Computing and Interaction, vol. 4, no. 3, 
6 Limitations and future research pp. 240-251, 2022. https://doi.org/10.1007/s42486-
The proposed MCNN-LSTM model performs well in 022-00100-x 
anonymous network traffic analysis, but it still has some [2] R. H. Shi, and X. Q. Fang, “Anonymous classical 
limitations. As the model complexity increases, CNNH message transmission through various quantum 
and MCNN-LSTM have high computational resource and networks,” IEEE Transactions on Network Science 
time consumption requirements, and may be difficult to and Engineering, vol. 11, no. 3, pp. 2901-2913, 
deploy in real-time in hardware resource-constrained 2024. https://doi.org/10.1109/TNSE.2024.3354327 
environments. The study simulated concept drift by [3] Y. J. Chen, Y. Su, M. Y. Zhang, H. Y. Chai, Y. K. 
adjusting feature distribution, protocol variations, and Wei, and S. Yu, “Fedtor: an anonymous framework 
attack types, but drift in real-world scenarios may be of federated learning in internet of things,” IEEE 
more complex, such as sudden changes in user behavior Internet of Things Journal, vol. 9, no. 19, pp. 18620-
or nonlinear changes in traffic patterns. In addition, 18631, 2022. 
advanced attackers may confuse traffic patterns by https://doi.org/10.1109/JIOT.2022.3162826 
disguising malicious traffic or using complex encryption [4] Y. Wang, “Deep learning models in computer data 
techniques to increase the difficulty of detection. For mining for intrusion detection,” Informatica, vol. 47, 
highly dynamic features or low-frequency anomalous no. 4, 2023. https://doi.org/10.31449/inf.v47i4.4942 
behavior, the model may run the risk of failing to detect [5] X. D. Gu, B. C. Song, W. Lan, and M. Yang. “An 
them. Although the false positive rate has been reduced, it online website fingerprinting defense based on the 
may still cause false alarms that affect monitoring non-targeted adversarial patch,” Tsinghua Science 
efficiency. In addition, the model may cause privacy and Technology, vol. 28, no. 6, pp. 1148-1159, 
issues when applied to anonymous network monitoring, 2023. https://doi.org/10.26599/TST.2023.9010062 
such as excessive monitoring or false alarms that result in [6] R. Rawat, and A. Rajavat, “Illicit events evaluation 
innocent users being tagged. The scope of monitoring using NSGA-2 algorithms based on energy 
must be strictly limited and privacy regulations must be consumption,” Informatica, vol. 48, no. 18, 2024. 
followed. https://doi.org/10.31449/inf.v48i18.6234 
Future research will focus on optimizing the [7] K. Xian, “An optimized recognition algorithm for 
performance and practical value of the model. First, SSL VPN protocol encrypted traffic,” Informatica, 
through the lightweight design of the model and the vol. 45, no. 6, 2021. 
distributed computing architecture, the computational and https://doi.org/10.31449/inf.v45i6.3730 
memory consumption can be reduced, and the scalability [8] M. Nasr, A. Bahramali, and A. Houmansadr, 
of large-scale real-time monitoring can be improved. "Defeating DNN-based traffic analysis systems in 
Second, by combining long-term real Tor traffic data, the real-time with blind adversarial perturbations," In 
adaptability of the model in complex concept drift Proceedings of the 30th USENIX Security 
scenarios will be verified, and the robustness of the Symposium (USENIX Security 21), 2705-2722, 
model against obfuscation strategies will be improved 2021. 
144 Informatica 49 (2025) 127–144 J. Hu 
[9] K. Yesodha, M. Krishnamurthy, M. Selvi, and A. time defense against website fingerprinting attacks 
Kannan, “Intrusion detection system extended CNN based on deep reinforcement learning,” IEEE 
and artificial bee colony optimization in wireless Transactions on Network and Service Management. 
sensor networks,” Peer-to-peer Networking and vol. 21, no. 3, pp. 2944-2961, 2024. 
Applications, vol. 17, no. 3, pp. 1237-1262, 2024. https://doi.org/10.1109/TNSM.2024.3360082 
https://doi.org/10.1007/s12083-024-01650-w [16] M. Guo, Y. R. Sun, Y. L. Zhu, M. Q. Han, G. Dou, 
[10] B. Q. Gan, Y. Q. Chen, Q. P. Dong, J. L. Guo, and R. and S. P. Wen, “Pruning and quantization algorithm 
X. Wang, “A convolutional neural network intrusion with applications in memristor-based convolutional 
detection method based on data imbalance,” The neural network,” Cognitive Neurodynamics, vol. 18, 
Journal of Supercomputing, vol. 78, no. 18, pp. no. 1, pp. 233-245, 2024. 
19401-19434, 2022. https://doi.org/10.1007/s11227- https://doi.org/10.1007/s11571-022-09927-7 
022-04633-x [17] T. Li, Y. B. Yin, Z. Yi, Z. Guo, Z. L. Guo, and S. L. 
[11] L. W. Xu, X. P. Zhou, Y. Tao, L. Liu, X. Yu, and N. Chen, “Evaluation of a convolutional neural 
Kumar, “Intelligent security performance prediction network to identify scaphoid fractures on 
for IoT-enabled healthcare networks using an radiographs,” Journal of Hand Surgery, vol. 48, no. 
improved CNN,” IEEE Transactions on Industrial 5, pp. 445-450, 2023. 
Informatics, vol. 18, no. 3, pp. 2063-2074, 2021. https://doi.org/10.1177/17531934221127092 
https://doi.org/10.1109/TII.2021.3082907 [18] S. F. Lyu, and J. Q. Liu, “Convolutional recurrent 
[12] F. R. Yan, G. H. Zhang, D. W. Zhang, X. H. Sun, B. neural networks for text classification,” Journal of 
T. Hou, and N. W. Yu, “TL-CNN-IDS: transfer Database Management, vol. 32, no. 4, pp. 65-82, 
learning-based intrusion detection system using 2021. https://doi.org/10.4018/JDM.2021100105 
convolutional neural network,” The Journal of [19] A. Mahmoodzadeh, M. Mohammadi, S. G. Salim, H. 
Supercomputing, vol. 79, no. 15, pp. 17562-17584, F. H. Ali, H. H. Ibrahim, S. N. Abdulhamid, H. R. 
2023. https://doi.org/10.1007/s11227-023-05347-4 Nejati, and S. Rashidi, “” Machine learning 
[13] G. Di Méo, “Historical co-offending networks: A techniques to predict rock strength parameters,” 
social network analysis approach,” The British Rock Mechanics and Rock Engineering, vol. 55, no. 
Journal of Criminology, vol. 63, no. 6, pp. 1591- 3, pp. 1721-1741, 2022. 
611, 2023. https://doi.org/10.1093/bjc/azad005 https://doi.org/10.1007/s00603-021-02747-x 
[14] M. Merouane, “Convenient detection method for [20] W. Chen, Y. Lu, H. Ma, Q. Chen, X. Wu, and P. Wu, 
anonymous networks" I2P vs Tor",” Journal of “Self-attention mechanism in person re-
Information Science and Engineering, vol. 39, no. 6, identification models,” Multimedia Tools and 
pp. 1371-1382, 2023. Applications, vol. 81, no. 4, pp. 4649-4667, 2022. 
https://doi.org/10.6688/JISE.202311_39(6).0008 https://doi.org/10.1007/s11042-020-10494-4 
[15] M. Y. Jiang, B. J. Cui, J. S. Fu, T. Wang, and Z. Q. 
Wang, “KimeraPAD: a novel low-overhead real-
 
https://doi.org/10.31449/inf.v49i12.6951 Informatica 49 (2025) 145–156 145 
Design and Implementation of an Optimized Career Planning System 
for College Students Using a Hybrid Dijkstra-Genetic Algorithm 
Zhenhuan Zhou1, Ruohan Chen2, Li Yan1, Haijian Zhong3* 
1.School of Innovation and Entrepreneurship, Gannan Medical University, Ganzhou 341000, China 
2.School of Pharmacy, Gannan Medical University, Ganzhou 341000, China 
3.School of Medical Information Engineering, Gannan Medical University, Ganzhou 341000, China 
*Email of Corresponding Author: haijianzhong2000@163.com 
Keywords: dijkstra's algorithm, college students’ career planning, career matching, framework design 
Received: Octorber 30, 2024 
Student career scheduling is divided into regular scheduling and dynamic optimal scheduling. Regular 
scheduling is the planning task of calculating a student's career year, and the reference parameters are 
some student career data. When facing the complex career problems of college students, achieving the 
expected scheduling tasks is difficult. Aiming at the problems existing in college students' career planning, 
this paper effectively combined the Dijkstra and genetic algorithms to obtain the D-GA optimization 
algorithm and apply it in the scheduling scheme. The experimental outcomes indicate that the graduate 
job recommendation algorithm introduced in this study achieves the highest performance, with a hit rate 
of 44.37% when K=50. This is approximately double the effectiveness of the CBF approach and around 
20% higher than the neighborhood-based CF method. The mean reciprocal rank was 17.14%, which is 
nearly seven times greater than that of the CBF technique and about 3% better than the neighborhood-
based CF model. The data problem framework aligns with real-world conditions and is developed based 
on relevant aspects of college students' career planning. According to the advantages and disadvantages 
of the Dijkstra algorithm and genetic algorithm, combined with students' career problems, the Dijkstra 
algorithm was improved and combined with the genetic algorithm to form the D-GA algorithm and applied 
to the solution optimization process. Finally, combined with J2EE technology, college students' career 
planning system was realized. 
Povzetek: Razvit je hibridni Dijkstra-genetski algoritem za optimizacijo načrtovanja kariere študentov, 
implementiran s tehnologijo J2EE. Pristop izboljšuje učinkovitost in prilagaja priporočila glede na 
spreminjajoče se podatke in preference. 
 
 
1 Introduction Through the career planning system, users are able to 
explore themselves correctly, think about the factors that 
Career planning has been developed for decades, may affect their future development in an all-round way, 
and the relevant theories have been continuously and rationally make decisions on career development that 
improved, but the research on career planning at home suit them. Currently, the more famous ones in career 
and abroad differs greatly [1]. International research in planning optimization research [6]. Its advantage is that it 
the career field has been both extensive and detailed. It can solve the student career planning problem containing 
has thoroughly examined various aspects, including negatively weighted paths, and the code is simple to 
career exploration, job search intensity, job-seeking implement; its disadvantage is that it wastes a large amount 
success, factors influencing career choices, career of time on the v-1 slack operation due to the need to carry 
values, professional preferences, work-related values, out v-1 slack operations in each loop [7, 8]. The SPFA 
personality types, alignment between career paths and (Shortest Path Faster Algorithm), while an improvement 
professional choices, as well as job satisfaction. These over Bellman-Ford in many cases, still struggles with 
areas have been explored in depth, offering a worst-case performance, which can degrade to O(VE) 
comprehensive understanding of the factors affecting under certain conditions. Moreover, SPFA can be 
career development. [2, 3]. Domestic career research unpredictable in terms of run time, which poses challenges 
has only started in recent years, and the research level is for scalability and consistent system performance when 
shallow and the research content is single. The research handling large datasets or diverse user inputs typical in 
on college students focuses on the current situation of career planning systems. This study is necessary because it 
career planning and career values, mainly for ordinary proposes a hybrid approach that combines Dijkstra's 
college students, without distinguishing the differences algorithm with genetic algorithms to overcome the 
between students of different professional backgrounds, shortcomings of these SOTA techniques. The hybrid 
and not enough exploration of gender students [4, 5]. method not only optimizes the computational efficiency 
For everyone, career is limited, if not effectively but also enhances the accuracy of career path 
planned, will inevitably lead to a waste of time. recommendations by dynamically adapting to evolving 
146 Informatica 49 (2025) 145–156 Z. Zhou et al. 
data patterns and user preferences, which neither demonstrates the fundamental mechanism of online pattern 
Bellman-Ford nor SPFA can achieve effectively in this mining, and evaluate each path and rank the candidates 
context [9, 10]. Ford algorithm can handle the student based on their fitness values. Select the most promising 
career planning problem containing negatively career paths based on fitness. Use techniques like 
weighted paths as well; however, the time complexity tournament selection or roulette wheel selection to pick 
of the Floyd-Warshall algorithm is very unfriendly [11]. individuals for the next generation. Perform crossover 
Dijkstra's algorithm is the most typical and between selected pairs of paths to generate new offspring. 
representative student career planning algorithm for This helps explore new potential career trajectories by 
solving student career planning problems, and its combining features of existing paths. Apply mutation to 
application in practice is the most numerous. The most some individuals by altering a few nodes in the career 
traditional implementation of Dijkstra's algorithm uses paths. This step introduces diversity and ensures the 
the adjacency matrix to store the data information of the algorithm doesn't get stuck in local optima. After 
graph, and uses simple arrays to realize its priority generating new paths, use Dijkstra’s algorithm to further 
queue, which cannot adapt to the path query problem optimize these solutions by adjusting the node sequences 
with high real-time requirements in terms of memory for better cost or relevance. This ensures that the final 
usage and Dijkstra's algorithm in depth, analyze its solutions are both optimal and diverse. 
performance bottlenecks and improve and optimize the  
algorithm using heap data structures and features of the d  Ht  h  (3) 
application scenarios [12, 13]. Dijkstra's algorithm  
solves the problem of single-source student career C
planning with any point in the graph as the starting 1 C
C = C ( +C )+C ( 2
1 1 2 +C2 )  (4) 
point, which requires that the weights of each edge in C2 C1
the weighted graph must be non-negative. Using  
Dijkstra's algorithm to solve the single-source student Common models mainly include linear regression, 
career planning problem starting at vertex 1 in the graph logistic regression, decision tree model, plain Bayesian 
will result in the student career planning spanning tree model, neural network model, clustering algorithm and so 
[14, 15]. For any given point in a directed graph, on. Equation (5) represents the fundamental formula for 
Dijkstra's algorithm can compute the student career plan training the model, and Equation (6) represents the 
from that point to each of weights from that point to collaborative filtering algorithm, where the model 
each of the remaining vertices in the graph. Dijkstra’s evaluation phase. 
algorithm can also compute the student career plan for  
any pair of vertices in the graph by starting at the C1 C
R = S(W C ( +C )+W C ( 2
1 1 1 2 2 +C2 ))+ L  (5) 
beginning and expanding it layer by layer end point [16, C2 C1
17]. The table provides key metrics, including 
 
efficiency, complexity, and accuracy for each reviewed 
T
method. For instance, while the Bellman-Ford Q = maxΔt N( t )  (6) 
algorithm offers advantages in specific contexts, it t =1
suffers from higher computational complexity in larger  
datasets. Similarly, the SPFA algorithm, although faster Collaborative filtering recommendation algorithms 
in many cases, lacks robustness in accuracy when faced are based on user-item interaction matrix, which can be 
with real-world data variations [18, 19]. divided into two categories according to the calculation 
method: neighborhood-based collaborative filtering 
algorithms and collaborative filtering algorithms based on 
2 Dijkstra's algorithm hidden factor decomposition Equation (7) can improve the 
2.1 Dijkstra's algorithm planning and recommendation accuracy, and Equation (8) improves the 
designing hidden nature of the item, begin by initializing a population 
of potential career paths or solutions. Each individual in 
Same as the data mining process, the educational 
the population represents a candidate path composed of 
model evaluation. Different from the traditional data 
multiple nodes. Randomly generate an initial population or 
mining, the data of EDM comes from the teaching 
seed it with paths from Dijkstra's shortest-path search. For 
environment, Equation (1) is used for model evaluation, 
each candidate path in the population, use Dijkstra’s 
and Equation (2) is applied to data mining, and the 
algorithm to compute the cost. In the context of career 
obtained data is applied to the construction of teaching 
planning, this could represent the efficiency or suitability 
data. 
of a given career path based on factors such as job 
Dt :V  ( )
t →V 1  
t+1 prospects, personal preferences, and professional goals. 
  
V I
t+1 =Vt +Qt −q  (2) 
t
N( t ) = N( i,t )  (7) 
 i=1
The main role of the model, which generally  
includes the processes. Equation (3) establishes H( i,t ) = minH( i,t )+ xmaxH( i,t )−minH( i,t )  (8) 
mathematical and statistical models, and Equation (4) 
Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 147 
 single-point crossover method. The next step is to 
Equation (9) shows the basic idea of TF-IDF determine the crossover operator. It is known from 
method, Equation (10) can explain the importance of the previous experience that many practical applications use a 
occupation in the document, and then count the feature predetermined value as the crossover operator, which does 
values; use the TF-IDF method to determine each not change throughout the genetic operation. 
feature value. If k also occurs more times in other  
documents, it means that k does not contribute much to ln p(Θ|u )= ln p(u|Θ)p(Θ)  (17) 
document differentiation.TF-IDF is the feature value  
determination method that synthesizes these two −rˆ
−e uij 
considerations. ΔΘ =  rˆ − ΘΘ  (18) 
( u ,i , j )D −rˆ uij
s 1+ e uij
 Θ
n  
( xi − x )( yi − y )
If crossover then it may result in the following 
r = i=1  (9) 
xy
n n situation: individuals with high adaptation are subjected to 
( x 2 2
i − x ) ( yi − y ) crossover operation, which does not reflect the advantages 
i=1 i=1
of high adaptation, that is to say, the advantages of 
 
arg min  ( r − p qT )2 + (|| p ||2
individuals with high adaptation are not well retained, 
+ || q ||2ui u i u i )  (10) 
p ,q ( u ,i )R Equation (19) can filter the individuals with high 
 adaptation, and Equation (20) demonstrates the 
2.2 OSCache framework opportunity of individual crossover. 
 
Based on the above two assumptions, Equation 
(11) shows the neighborhood-based collaborative 1 1
h ( x ) = =  (19) 
filtering algorithm, Equation (12) shows the mechanism 1+ e−z 1+ e−( wx+b )
of item scoring. Neighborhood-based algorithms. In  
addition, non-numerical coding is beginning to come h ( x )= P( y =1| x; )  (20) 
into the limelight, and decimal coding has been applied  
with many fields. 
 
Δ T 3 Application of D-GA algorithm in 
u = −( rui − puqi )qi +  pu  (11) 
 student career system 
Δ T 3.1 Improvements made to the DIJKSTRA 
i = −( rui − puqi ) pu + qi  (12) 
 algorithm and its validation 
Equation (13) demonstrates the choice of encoding Bayesian personalized ranking algorithm is a 
method and Equation (14) allows testing the readability recommendation algorithm with better recommendation 
of the problem domain encoding. effect and widely used in various scenarios, such as 
 multimedia item recommendation, friend recommendation 
p(Θ|u ) p( )
u|Θ)p(Θ)  (13  and so on [20, 21]. So, for each user u, the BPR algorithm 
 has to find his preference ordering for all items. machine 
learning algorithms are devoted to studying how to 
 p(u∣Θ ) =  p( i u ∣j Θ )  (14) 
uU ( u,i , j )D improve the performance of the system itself through 
s
 computational means, experience. p evaluates the 
When facing some complex problems with large performance of a computer program on a task T [22]. The 
scale, the problem domain cannot be represented by performance metrics indicate that D-GA consistently 
discrete sequences at that time, that is, binary coding is outperforms both Dijkstra’s and Genetic Algorithms when 
not applicable to that situation Equation (15) can detect applied in isolation. Notably, the integration of Dijkstra's 
whether the coding is missed or not, and Equation (16) graph traversal capabilities with the adaptive nature of 
can explain the problem of career planning in the coding Genetic Algorithms leads to improved exploration of the 
process. solution space [23, 24]. While unsupervised learning has 
 only input data x in the data sample and needs to solve for 
p( i  the markers y based on the sample features, clustering is 
u j |Θ ) :=( r̂uij )  (15) 
an unsupervised learning method in machine learning 
 
algorithms [25, 26]. Figure 1 shows the initialization state 
rˆuij := rˆui − r̂uj  (16) diagram of Dijkstra's algorithm, and its process is simple 
Equation (17) demonstrates the generalization of and easy to implement. 
the crossover approach, Equation (18) represents the 
148 Informatica 49 (2025) 145–156 Z. Zhou et al. 
Motion PQ-VAE Probabilistic Motion Generation Motion Detail Refinement
Reconstructed Motion
Refined Motion
Predicted PQ Codes Predicted PQ Codes
Decoder
Linear Predictor Refiner
Quantized Features
Decoder Block
Linear Conditions Preliminary Motion Conditions
C1 C2 C3 C4  
Codebooks Predictor
Decoder
Extracted features Conditions
Quantized 
Encoder Masked Features
PQ Codes
Mask Unconfident
GT Motion 2D-PE Codes Look up in Codebooks
Figure 1: Initialization state diagram of Dijkstra's algorithm 
 
The k-means algorithm employs a greedy strategy The data was sourced from institutional career centers, job 
to approximate the solution of Eq. by iterative portals, and self-reported student profiles. The dataset size 
optimization [27]. Where line 1 initializes the cluster includes information from 10,000+ students, 
centers. Lines 4 to 8 are the cluster partitioning process, encompassing several hundred features, such as major, 
i.e., each data object is partitioned into the cluster GPA, internships, extracurricular activities, and industry 
closest to it. Lines 9 to 16 are the iterative updating interests. Each student's profile is linked to potential career 
process all points in the cluster, and if the cluster center paths and outcomes such as job offers, salaries, and job 
does not change, the clustering result is returned [28, satisfaction, making it rich and varied for analysis. 
29]. which can be categorized into cohesive and divisive Therefore, this section will introduce machine learning 
types. The cohesive type uses a bottom-up strategy [30]. model evaluation methods in two parts: classification 
Figure 2 shows the relationship between the algorithm algorithm evaluation methods and clustering algorithm 
execution efficiency and the problem size, while the evaluation methods. The methodology has been 
split method is the opposite, using a top-down strategy, enhanced to specify the parameters for the Genetic 
initially all the samples are grouped into one cluster, and Algorithm: a population size of 100, a crossover rate of 
then split according to some criterion until a certain 0.8, and a mutation rate of 0.02. Additionally, we detail 
condition is reached or a set number of divisions is the grid search method employed for hyperparameter 
reached. The dataset used for the experiments consists tuning, allowing readers to understand how optimal 
of career-related information from college students, settings were derived.  
including academic background, skills, career 
preferences, job market trends, and professional goals. 
Client Side Server Side
Class1 Class2 Class3 Pseudo 
Server
Sampling
Graph Construct Graph
Original Feature Noise Generator Topology
CONCAT
XT1
Label
Topology Embedding
Topology-aware Local GNNs W W W Global
1 2 k Wt
GNN
Local Subgraph G1 Node Embedding  
Class-wise
Prediction
Cos Cos Cos Cos
Predictor Predictor
0.3 0.1 Semantic Loss Diversity Loss Divergence Loss
0.4 0.2
Mindiverg-Xt1 Mindiverg
Predictor
Figure 2: Plot of algorithm execution efficiency versus problem size 
 
3.2 DIJKSTRA algorithm optimization convenient for model input. The dataset was split into 
Cluster assessment is generally based on two training (70%) and testing (30%) sets. The D-GA 
principles: tightness, i.e., the smallest possible algorithm was then applied to predict optimal career paths 
differences between cluster members, and separation, based on this data. Dijkstra’s algorithm was used to 
i.e., the largest possible differences between clusters. compute initial shortest career paths, while the genetic 
Since the student campus card consumption record is a algorithm explored potential variations, refining 
campus card consumption flow record, and each student recommendations over successive iterations. The 
generates a flow record for each consumption, it is performance was evaluated on multiple metrics, including 
necessary to initially screen the consumption flow data accuracy of career path matching, computation time, and 
first to extract the consumption features that are memory usage. The dataset was split into training (70%) 
Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 149 
and testing (30%) sets. The D-GA algorithm was then paths based on this data. Dijkstra’s algorithm was used to 
applied to predict optimal career paths based on this compute initial shortest career paths, while the genetic 
data. Dijkstra’s algorithm was used to compute initial algorithm explored potential variations, refining 
shortest career paths, while the genetic algorithm recommendations over successive iterations. The 
explored potential variations, refining performance was evaluated on multiple metrics, including 
recommendations over successive iterations. The accuracy of career path matching, computation time, and 
performance was evaluated on multiple metrics, memory usage. The dataset was split into training (70%) 
including accuracy of career path matching, and testing (30%) sets. The D-GA algorithm was then 
computation time, and memory usage. The dataset was applied to predict optimal career paths based on this data. 
split into training (70%) and testing (30%) sets. The D- Dijkstra’s algorithm was used to compute initial shortest 
GA algorithm was then applied to predict optimal career career paths, while the genetic algorithm explored 
paths based on this data. Dijkstra’s algorithm was used potential variations, refining recommendations over 
to compute initial shortest career paths, while the successive iterations. The performance was evaluated on 
genetic algorithm explored potential variations, refining multiple metrics, including accuracy of career path 
recommendations over successive iterations. The matching, computation time, and memory usage. The 
performance was evaluated on multiple metrics, dataset was split into training (70%) and testing (30%) sets. 
including accuracy of career path matching, The D-GA algorithm was then applied to predict optimal 
computation time, and memory usage. The dataset was career paths based on this data. Dijkstra’s algorithm was 
split into training (70%) and testing (30%) sets. The D- used to compute initial shortest career paths, while the 
GA algorithm was then applied to predict optimal career genetic algorithm explored potential variations, refining 
paths based on this data. Dijkstra’s algorithm was used recommendations over successive iterations. The 
to compute initial shortest career paths, while the performance was evaluated on multiple metrics, including 
genetic algorithm explored potential variations, refining accuracy of career path matching, computation time, and 
recommendations over successive iterations. The memory usage. Figure 3 shows the performance 
performance was evaluated on multiple metrics, comparison before and after the optimization of the 
including accuracy of career path matching, algorithm, Continuous features, such as GPA and job offer 
computation time, and memory usage. The dataset was salary, were normalized to bring all attributes onto a 
split into training (70%) and testing (30%) sets. The D- similar scale, ensuring that no single attribute 
GA algorithm was then applied to predict optimal career disproportionately influenced the algorithm. 
Resources Allocation KBs Maintenance Task Orchestration P Q R
Shared Models Shared Knowledge
Linear
Semantic model 
theft attack
Wireless
Semantic Semantic
Channel
Encoder Channel Channel Encoder
Encoder Decoder
 
Raw Softmax
Data Semantic
adversarial attack
Sender Receiver Linear
False data
injection attack Concat
Attacker
Sensing
Output
Figure 3: Performance comparison before and after algorithm optimization 
 
Categorical variables like academic major and hybrid approach. Such enhancements not only make the 
industry interest were encoded using one-hot encoding, system more robust but also align it more closely with the 
while ordinal features such as job satisfaction were needs of college students, facilitating more informed 
assigned numerical values. Only the most relevant career choices. These factors in terms of gender, family 
features, such as skills, academic background, and background, and personal ability all affect the employment 
career goals, were retained to reduce noise and improve choices of graduates. Therefore, this section analyzes the 
the efficiency of the algorithm. Figure 3 highlights the employment patterns of students from different 
tangible benefits of optimizing the career planning backgrounds in three main areas. Table 1 shows the 
system through the D-GA. The improvements in performance comparison of the seed clustering algorithms, 
accuracy, reduction in computation time, and enhanced in order to distinguish the employment patterns of students 
user satisfaction underscore the effectiveness of this with different professional abilities and family 
Congress Semantic
Control
Sensing transmission
Scaled Dot-Product Attention
150 Informatica 49 (2025) 145–156 Z. Zhou et al. 
backgrounds, students with good academic 4 Design and implementation of 
performance generally choose to continue their studies 
and the proportion of those who choose to go abroad for optimization model for students' career 
further study is small. In order to avoid the above planning based on Dijkstra's 
situation of the genetic algorithm, so that the genetic Algorithm 
algorithm does not converge prematurely and produce 
the phenomenon of early maturity, in this research, this The Graduate Employment Recommendation section 
paper adopts the adaptive crossover operator, that is, the is designed to calculate the students' ratings of employment 
crossover operator is no longer fixed, and can be organizations, and then recommend employment 
adjusted adaptively with the changes in the population. organizations to the students according to the ratings from 
 high to low. Graduates' ratings of employment units 
Table 1: Performance comparison of clustering consisted of three main components: group employment 
algorithms unit choice, students' preferences for employment unit 
Contour attributes, and students' preferences for employment unit 
Clustering algorithm Time (s) 
coefficient location. Figure 5 shows the career path shortest distance 
K-means partitioning assessment map, and the group employment unit selection 
0.415 0.085 
clustering algorithm is solved by the traditional BPR algorithm. Then students' 
Cohesive hierarchical preferences for employment unit attributes are 
0.360 0.069 
clustering algorithm incorporated into the solution objective of the BPR 
DBSCAN density clustering 
0.029 0.013 algorithm to obtain a new optimization objective function. 
algorithm 
A binary Gaussian distribution is used to fit the student 
A crucial aspect of configuring a genetic algorithm preference function for the location. The last section of this 
involves establishing its termination criteria. This chapter describes the process of solving the objective 
entails defining the conditions under which the solution function using stochastic gradient descent method. Dijkstra’
produced by the algorithm is deemed acceptable within 
s algorithm is a classical algorithm used to find the student 
the problem domain. Additionally, if the genetic 
career plan. The algorithm uses breadth-first search to 
algorithm fails to find a suitable solution, it is essential 
compute the student career plan from any node in a non-
to establish a maximum number of generations for 
negative weighted directed graph to any other store node, 
iterations. This means the algorithm should cease 
i.e., the single-source student career planning problem. 
operation after reaching a specified number of 
2.1
generations, regardless of whether the solution is 
2.0
optimal, to avoid unnecessary expenditure of time and 1.9
resources. The selection of these termination conditions 1.8
1.7
plays a significant role in the efficiency of the genetic 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
algorithm and the quality of the outcomes. If the Signal time
termination criteria are not aligned with the actual  
circumstances, even a well-crafted genetic operation 2.1
may not yield satisfactory results. Figure 4 shows the 2.0
1.9
assessment of the match between students' interests and 
1.8
careers. 1.7
 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
Signal time
 
Figure 5: Assessment of shortest distance for career 
2.1
2.0 paths 
1.9  
1.8 Dijkstra's algorithm is currently extensively utilized 
1.7 and has established itself as a fundamental theory for 
0.5 1.0 1.5 2.0 2.5 3.0
CGNSDE time addressing students' career planning challenges. 
 Researchers frequently adapt Dijkstra's algorithm to suit 
2.1 the specific issues they encounter while investigating these 
2.0 types of problems. The core concept of Dijkstra's algorithm 
1.9
can be summarized as follows: it involves a set represented 
1.8
by S, which initially contains only the source point, S0. 
1.7
0.5 1.0 1.5 2.0 2.5 3.0 The algorithm subsequently adds the shortest paths to the 
CGNSDE time
set S from the remaining vertices, denoted as V-S. The set 
Figure 4: Assessment of student interests and S represents the vertices for which the shortest paths have 
career match been identified. Initially, the set S consists solely of the 
 source point S0, which then extends to each point, 
 progressively adding the point with the shortest path to S 
 and designating the remaining points as V-S. This process 
 continues until a comprehensive career plan for a student 
 
DT Loss DA Loss
Block-B Block-A
Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 151 
is formulated, with the relevant points being included in adopts the idea, which plays well with the advantages of 
S and removed from V-S until all nodes in the directed Dijkstra algorithm, in addition, in the specific 
graph are incorporated into S, signaling the completion implementation of genetic algorithms, some improvements 
of the algorithm. are also made, the specific enhancements are outlined as 
Throughout the execution of the algorithm, it is follows: the design of student career paths aligns with the 
ensured that the shortest distance from the source point adaptation function. The initial population is generated 
S0 to each vertex in S remains less than or equal to the based on the principles of Dijkstra's algorithm. This 
distance from S0 to any vertex in V-S. In its most involves executing selection, crossover, and mutation 
straightforward application, Dijkstra's algorithm processes on the initial population, utilizing an adaptive 
primarily focuses on the distances between nodes, crossover method during the crossover phase. Unlike 
represented by the weights of the directed graph. traditional genetic algorithms that often employ random 
However, in practical scenarios such as logistics, methods to establish the initial population—an approach 
distribution, and bus routing problems, it becomes that can lack direction—Dijkstra's algorithm focuses on 
increasingly crucial to consider the time and costs identifying the path with the lowest cost and the 
associated with transporting goods or individuals subsequent node that completes the current shortest route. 
between various nodes. In this research, this paper also In the context of the student career paths explored in this 
improves the traditional Dijkstra's algorithm, which is project, this means identifying a group that optimally 
finally applied with the students' occupation in water. In schedules hydraulic resources at minimal cost. Figure 6 
nature, a variety of biological generations, similar but illustrates the evaluation of student skills against job 
different, the children inherited the advantages of the requirements, significantly reducing the randomness 
father's generation, in the process of biological associated with the original algorithm. 
reproduction, left behind is always high quality, those  
less adaptable must be eliminated in the competition, 2.1
that is, the survival of the fittest. At present, the scope 
of application of genetic algorithm has been quite 
extensive, due to the good parallelism of genetic 1.9
algorithm, suitable for solving complex nonlinear 
problems, has been applied with combinatorial 
1.7
optimization problems, artificial intelligence very 
0.5 1.0 1.5 2.0 2.5
popular research direction in the field of computer. The Linear time
specific content of genetic algorithm can be described  
2.3
as follows: imitating the situation of biological TG1 TG2 TG3
evolution in nature, modeling the problem to be solved 
2.1
as a biological population, choosing a certain coding 
technique to code the population, and determining the 1.9
initialized population size, in nature, chromosomes are 
the most basic representation of biological 1.7
characteristics, different chromosomes can be combined 0.5 1.0 1.5 2.0 2.5
Linear time
into different biological characteristics, usually, in the 
case of the When coding, the coding methods that can Figure 6: Assessment of student skills and job 
be chosen are binary coding, decimal coding and so on. requirements 
First of all, a group of individuals of a certain size is  
randomly generated, and the individual with good According to these constraints to solve the solution of 
fitness is superior, so that the new generation of the objective function, and then optimization combined 
individuals will be more adapted to the environment with the actual conditions to finally get the optimal 
compared with the individuals of the parent generation, scheduling plan suitable for the student occupation. In the 
and the confusion matrix of the classification results is genetic algorithm, this paper refers to the fitness function, 
shown in Table 2. the individuals in the group are determined by the fitness 
 function, i.e., it can be calculated which individuals have 
Table 2: Confusion matrix for classification results better adaptability and which individuals should be 
The real Standard Projected results eliminated. It can be said that the significance of the fitness 
situation practice Counter-example function in the genetic algorithm is irreplaceable, and the 
Standard TP (true goodness of the fitness function can ultimately determine 
FN (false negative) 
practice example) whether the solution obtained by the genetic algorithm can 
Counter- FP (false TN (true satisfy the problem domain, which determines the quality 
example positive) counterexample) of the optimal solution obtained. Figure 7 is assessment of 
frequency of visits to career development nodes, in 
 
summary, all the calculations and judgments are centered 
In the application process of genetic algorithm, the 
around the fitness function. Moreover, the fitness function 
original genetic is considered to select random 
does not have too many constraints, it does not need to be 
individuals, while in the D-GA algorithm, this paper 
continuous or derivable, but it must be guaranteed that the 
Temperature Rel.humidity
152 Informatica 49 (2025) 145–156 Z. Zhou et al. 
function value is non-negative in the problem domain, 
so that it is possible to judge and compare the fitness 
function values of different individuals. 
2.3
2.3
2.2 2.2
2.1 2.1
2.0 2.0
 
1.9 1.9
1.8 1.8
1.7 1.7
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
Encoder time Encoder time
Figure 7: Assessment of frequency of visits to career development nodes 
 
In the process of student occupation scheduling, to project, this paper adopts the sequence of state information 
assess the benefit of scheduling stage, the benefit is that represents the state information of students' career 
usually calculated based on the objective function, in planning to describe the scheduling decision information 
general, the objective function can be set up one, or can of students' careers for the specific content of students' 
be a group of functions composed of multiple functions. careers. In nature, chromosomes are considered to 
Some of the role of the student occupation is targeted, represent the characteristics of life, therefore, in the 
in this study, this paper chooses to ensure that the process of student career scheduling, the sequence 
maximum amount of power generation can be used as information that represents the planning state corresponds 
the ultimate goal, in addition to setting some auxiliary to the chromosomes in biological evolution. Figure 8 
constraint functions. In general, after analyzing and shows the graph of students' background and industry 
studying the requirements of the fitness function as a demand assessment, therefore, the process of applying 
genetic algorithm, and make appropriate improvements genetic algorithms to the scheduling of students' careers 
to the objective function, for example, to meet the non- can be thought of as follows: first, that size are selected, 
negativity of the fitness function and other requirements, which serve as the initial population, in student careers, 
in order to more closely match the implementation of would mean selecting a certain size of the initial planning 
the genetic algorithm. In the research process of this sequence. 
1.9
1.8
1.7
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
Signal time
 
1.9
1.8
1.7
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
Signal time
Figure 8: Student background and industry needs assessment 
 
Then the objective function of the student's Repeat the above steps until the newest generation of 
occupation, and it is obvious that the fitness of the new population meets the termination conditions of the 
sequences obtained is higher than that of the parent algorithm. When facing the application process of genetic 
generation, and in the crossover operation, this paper algorithm, after determining the initial conditions and the 
adopts adaptive crossover algorithm, which improves fitness function, the first thing to be obtained is the 
the efficiency of the algorithm, and the newly obtained planning sequence, that is, the planning sequence should 
sequences are Then the mutation operation is carried out, be expressed into the form of coding. With people's 
which improves the diversity of the species, and at the continuous research and study, several encoding methods 
same time, a new generation of population is obtained. have been developed that are known to the public, 
Scatter Plot(Epoch:1750k)
PQ-VTG PQ-VAE
Scatter Plot(Epoch:2000k)
Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 153 
including binary encoding, decimal encoding, gray code, and easy to understand the characteristics of binary coding, 
etc. Among them, binary encoding is the most popular. at the same time the binary coding is extended, broadening 
Among them, binary coding is one of the simplest the field of application. 
coding methods. It is also the most widely used at 
present. Binary coding, as the name suggests, uses only 5 Experimental analysis 
{0,1} for encoding, i.e., all information is represented 
using only {0,1}. Although binary coding is very simple A framework for a personalized preference-based 
to understand, but there are some shortcomings and graduate employment recommendation algorithm is 
limitations, in the face of some complex problems, the demonstrated, Figure 9 shows the preference assessment 
ability of binary coding appears to be somewhat map for career planning path selection, and then calculates 
insufficient, cannot well respond to the root of the the employment choice of the student group by referring to 
problem, in the application, with the help of other the results of the group delineation; and finally calculates 
coding features and binary coding is simple and easy to the graduates' scores. 
implement the characteristics. It still retains the simple 
2.1 2.2
2.1
1.9  
1.9
0.5 1.0 1.5 2.0 2.5 0.5 1.0 1.5 2.0
Sampling time Sampling time
Figure 9: Career planning pathway selection preference assessment 
 
It was analyzed that there are great differences in of students' career change cost, the problem is an 
the employment choices of students with different unsupervised learning, so the clustering method is used to 
academic performances and family economic divide the groups. Dijkstra’s algorithm, with its space 
conditions. Therefore, the academic performance and complexity, requires considerable memory, especially 
family economic conditions of graduates are selected as when applied to large graphs. The D-GA, while 
the reference characteristics for the division of student introducing additional storage requirements for multiple 
groups. The distribution of graduates' family economic candidate solutions (population), is designed to work 
condition index and academic performance index is efficiently in parallel, reducing bottlenecks by pruning less 
shown. Figure 10 shows the evaluation of the analysis relevant solutions over time. 
2.1
2.0
1.9
1.8
1.7
0.5 1.0 1.5 2.0 2.5
In-dearee time
 
2.1
2.0
1.9
1.8
1.7
0.5 1.0 1.5 2.0 2.5
In-dearee time
Figure 10: Evaluation of students' career change cost analysis 
Surface Temperature
Frequency Frequency
Lab Data
154 Informatica 49 (2025) 145–156 Z. Zhou et al. 
The D-GA hybrid balances Dijkstra’s efficiency in assessment of students' career planning decisions, while 
finding the shortest paths with the exploratory the cohesive hierarchical clustering algorithm and the 
capabilities of Genetic Algorithm (GA). While Dijkstra DBSCAN algorithm do not divide the data samples, and 
alone computes the shortest path quickly, it can struggle the distinction between academic performance and family 
with scalability in large datasets. The D-GA introduces economic conditions is not obvious between some groups, 
population-based search, which increases computation especially in the case of the DBSCAN algorithm, which 
time due to crossover and mutation steps, but it has no obvious distinction between the groups and the 
ultimately reduces the number of iterations needed by division is not homogeneous. 
optimizing paths dynamically. Figure 11 shows the 
2.1
2.1
2.0
2.0
1.9 1.9  
1.8 1.8
0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.5 1.0 1.5 2.0 2.5 3.0
Loss time Loss time
Figure 11: Assessment of student career planning decisions 
 
Therefore, this study uses k-means clustering unit characteristics; for example, some students prefer 
algorithm to classify student groups. Graduates' ratings stable careers such as teachers and civil servants, while 
of employment units consist of three main components: others prefer positions requiring high professional 
group employment unit choice, students' preferences for competence such as engineering and technology. Figure 12 
employment unit attributes, and students' preferences shows the graph of the assessment of the association 
for employment unit location. Group employment unit between career advancement speed and educational 
choice indicates the group's rating of the employment background, which is used in this paper to solve the group 
unit. Employment unit attribute preference indicates employment choice using Bayesian personalized ranking 
graduates' preference for some specific employment strategy. 
2.1
2.0 1.9
1.9
1.8
1.8  
1.7
1.7
0.5 1.0 1.5 2.0 2.5 3.0 0.5 1.0 1.5 2.0 2.5 3.0
Decoder time Decoder time
Figure 12: Assessment of the association between speed of career advancement and educational background 
 
6 Conclusion analyzes some existing optimization measures, and 
establishes a mathematical model for problems related to 
Dijkstra algorithm is a classical algorithm for finding optimization models in combination with mathematical 
college students' career planning. It adopts a breadth-first modelling. The Dijkstra algorithm efficiently finds the 
search to calculate college students' career planning from shortest path in graphs with non-negative weights, making 
any node in the non-negatively weighted directed graph to it highly reliable in structured problems. However, its 
any other storage node, the single-source student career greedy nature limits its ability to adapt to complex, 
planning problem. The Dijkstra algorithm has been widely evolving datasets, such as those encountered in career 
used and has become a fundamental theory. This paper planning. It works best in static environments but 
analyzes some problems in students' career planning, 
Quantized Features
Reconstructed Motion
Quantized Features
Reconstructed Motion
Design and Implementation of an Optimized Career Planning… Informatica 49 (2025) 145–156 155 
struggles when dealing with larger, dynamic datasets. The doi: 10.3390/app12147088. 
D-GA combines the precision of Dijkstra’s algorithm with [7] R. Botez, A. G. Pasca, A. T. Sferle, I. A. Ivanciu, and 
the exploratory power of GA. This integration allows D- V. Dobrota, "Efficient Network Slicing with SDN 
GA to quickly narrow down optimal solutions through and Heuristic Algorithm for Low Latency Services in 
Dijkstra’s efficient traversal, while GA’s population- 5G/B5G Networks," Sensors, vol. 23, no. 13, pp. 29, 
based approach ensures that it explores a wider range of 2023. doi: 10.3390/s23136053. 
possibilities. This results in faster convergence and better [8] G. S. Brodal, "Priority queues with decreasing keys," 
performance in dynamic environments like career Theoretical Computer Science, vol. 1000, pp. 14, 
planning systems. By balancing Dijkstra’s exactness and 2024. doi: 10.1016/j.tcs.2024.114563. 
GA’s adaptability, D-GA outperforms both in terms of [9] E. Çakir, Z. Ulukan, and T. Acarman, "Time-
efficiency, scalability, and accuracy, making it ideal for dependent Dijkstra's algorithm under bipolar 
personalized and evolving career recommendation neutrosophic fuzzy environment," Journal of 
systems. Intelligent & Fuzzy Systems, vol. 42, no. 1, pp. 227-
Among the students who chose employment 236, 2022. doi: 10.3233/jifs-219188. 
companies, the most significant proportion of students [10] Y. Q. Chen, J. F. She, X. G. Li, S. H. Zhang, and J. Z. 
chose other enterprises, about 31 per cent, followed by Tan, "Accurate and Efficient Calculation of Three-
students who chose state-owned enterprises, about 16 per Dimensional Cost Distance," Isprs International 
cent. After graduation, about 80% of students choose to Journal of Geo-Information, vol. 9, no. 6, pp. 18, 
work in computer-related jobs, of which about 62% 2020. doi: 10.3390/ijgi9060353. 
choose to work in development, and about 17% choose to [11] G. H. Choi, W. Lee, and T. W. Kim, "Voyage 
work in other professional and technical personnel. optimization using dynamic programming with 
Among other non-computer jobs, clerical and related initial quadtree based route," Journal of 
personnel accounted for the largest share at 7.8%. Computational Design and Engineering, vol. 10, no. 
Students chose a wide range of industries as employment 3, pp. 1185-1203, 2023. doi: 10.1093/jcde/qwad055. 
units, covering 16 industries. Among them, about 63 per [12] C. Dudeja and P. Kumar, "An improved weighted 
cent of students choose to work in industry, followed by a sum-fuzzy Dijkstra's algorithm for shortest path 
large number in manufacturing, accounting for about problem (iWSFDA)," Soft Computing, vol. 26, no. 7, 
8.8%. The number of college students who chose the pp. 3217-3226, 2022. doi: 10.1007/s00500-022-
remaining 14 fields was small, less than 5 per cent. 06871-w. 
[13] M. Fazio et al., "A Map-Reduce Approach for the 
References Dijkstra Algorithm in SDN Over Osmotic 
Computing Systems," International Journal of 
[1] I. Alameri, J. Komarkova, T. Al-Hadhrami, A. E. Parallel Programming, vol. 49, no. 3, pp. 347-375, 
Yahya, and A. Gharbi, "Optimizing Connections: 2021. doi: 10.1007/s10766-021-00693-3. 
Applied Shortest Path Algorithms for MANETs," [14] E. Gefen and D. Zarrouk, "Flying STAR2, a Hybrid 
Cmes-Computer Modeling in Engineering & Flying Driving Robot With a Clutch Mechanism and 
Sciences, vol. 141, no. 1, pp. 787-807, 2024. doi: Energy Optimization Algorithm," IEEE Access, vol. 
10.32604/cmes.2024.052107. 10, pp. 115491-115502, 2022. doi: 
[2] K. Altisen, P. Corbineau, and S. Devismes, 10.1109/access.2022.3218305. 
"Certification of an exact worst-case self- [15] D. Ghosh, J. Sankaranarayanan, K. Khatter, and H. 
stabilization time," Theoretical Computer Science, Samet, "Opportunistic package delivery as a service 
vol. 941, pp. 262-277, 2023. doi: on road networks," Geoinformatica, vol. 28, no. 1, pp. 
10.1016/j.tcs.2022.11.019. 53-88, 2024. doi: 10.1007/s10707-023-00497-2. 
[3] T. K. Astarte, "From Monitors to Monitors: A [16] U. Gurusamy, K. Hariharan, and M. S. K. 
Primitive History," Minds and Machines, vol. 34, no. Manikandan, "Path optimization of box-covering 
SUPPL 1, pp. 51-71, 2024. doi: 10.1007/s11023- based routing to minimize average packet delay in 
023-09632-2. software defined network," Peer-to-Peer Networking 
[4] N. Bahrami and S. M. Siadatmousavi, "Ship voyage and Applications, vol. 13, no. 3, pp. 932-939, 2020. 
optimisation considering environmental forces using doi: 10.1007/s12083-019-00855-8. 
the iterative Dijkstra's algorithm," Ships and [17] M. Ha, D. M. Tran, and Y. Shichkina, "Model of 
Offshore Structures, vol. 19, no. 8, pp. 1173-1180, Message Transmission across Parallel Route Groups 
2024. doi: 10.1080/17445302.2023.2231200. with Dynamic Alternation of These Groups in a 
[5] X. S. Bai, L. Wang, Y. B. Hu, P. F. Li, and Y. T. Zu, Multichannel Steganographic System," Electronics, 
"Optimal Path Planning Method for IMU System- vol. 12, no. 19, pp. 24, 2023. doi: 
Level Calibration Based on Improved Dijkstra's 10.3390/electronics12194155. 
Algorithm," IEEE Access, vol. 11, pp. 11364-11376, [18] T. D. Holmes, R. H. Rothman, and W. B. 
2023. doi: 10.1109/access.2023.3240518. Zimmerman, "Graph Theory Applied to Plasma 
[6] M. Behún, D. Knezo, M. Cehlár, L. Knapcíková, and Chemical Reaction Engineering," Plasma Chemistry 
A. Behúnová, "Recent Application of Dijkstra's and Plasma Processing, vol. 41, no. 2, pp. 531-557, 
Algorithm in the Process of Production Planning," 2021. doi: 10.1007/s11090-021-10152-z. 
Applied Sciences-Basel, vol. 12, no. 14, pp. 12, 2022. [19] W. C. Hu, H. T. Wu, H. H. Cho, and F. H. Tseng, 
156 Informatica 49 (2025) 145–156 Z. Zhou et al. 
"Optimal Route Planning System for Logistics 2021. doi: 10.15388/21-infor445. 
Vehicles Based on Artificial Intelligence," Journal of  
Internet Technology, vol. 21, no. 3, pp. 757-764, 
2020. doi: 10.3966/160792642020052103013. 
[20] P. Joshi, A. S. Raghuvanshi, and S. Kumar, "An 
Intelligent delay efficient data aggregation 
scheduling for distributed sensor networks," 
Microprocessors and Microsystems, vol. 93, pp. 16, 
2022. doi: 10.1016/j.micpro.2022.104608. 
[21] C. H. Ke, Y. H. Tu, and Y. W. Ma, "A reinforcement 
learning approach for widest path routing in 
software-defined networks," Ict Express, vol. 9, no. 
5, pp. 882-889, 2023. doi: 
10.1016/j.icte.2022.10.007. 
[22] N. Khakzad, "A methodology based on Dijkstra's 
algorithm and mathematical programming for 
optimal evacuation in process plants in the event of 
major tank fires," Reliability Engineering & System 
Safety, vol. 236, pp. 11, 2023. doi: 
10.1016/j.ress.2023.109291. 
[23] N. H. Kim, F. He, R. M. Nasir, and S. I. Kwak, 
"Stepwise benchmarking based on production 
function: Selecting path towards closest target," 
Expert Systems with Applications, vol. 228, pp. 11, 
2023. doi: 10.1016/j.eswa.2023.120308. 
[24] S. Kim, T. Choi, S. Kim, T. Kwon, T. H. Lee, and K. 
Lee, "Sequential graph-based routing algorithm for 
electrical harnesses, tubes, and hoses in a 
commercial vehicle," Journal of Intelligent 
Manufacturing, vol. 32, no. 4, pp. 917-933, 2021. doi: 
10.1007/s10845-020-01596-9. 
[25] P. Kumari and S. K. Sahana, "Heuristic Initialization 
Based Modified ACO (HIMACO) Mimicking Ant 
Safety Features for Multicast Routing and its 
Parameter Tuning," Microprocessors and 
Microsystems, vol. 93, pp. 9, 2022. doi: 
10.1016/j.micpro.2022.104574. 
[26] S. R. Lawande, G. Jasmine, J. Anbarasi, and L. I. 
Izhar, "A Systematic Review and Analysis of 
Intelligence-Based Pathfinding Algorithms in the 
Field of Video Games," Applied Sciences-Basel, vol. 
12, no. 11, pp. 30, 2022. doi: 10.3390/app12115499. 
[27] F. K. Luan, J. X. Yang, H. Zhang, Z. Y. Zhao, and L. 
P. Yuan, "Optimization of Load-Balancing Strategy 
by Self-Powered Sensor and Digital Twins in 
Software-Defined Networks," Ieee Sensors Journal, 
vol. 23, no. 18, pp. 20782-20793, 2023. doi: 
10.1109/jsen.2022.3216376. 
[28] S. Duleba, F. K. Gundogdu, and S. Moslem, 
"Interval-Valued Spherical Fuzzy Analytic Hierarchy 
Process Method to Evaluate Public Transportation 
Development," Informatica, vol. 32, no. 4, pp. 661-
686, 2021. doi: 10.15388/21-infor451. 
[29] D. Kalibatiene and J. Miliauskaite, "A Hybrid 
Systematic Review Approach on Complexity Issues 
in Data-Driven Fuzzy Inference Systems 
Development," Informatica, vol. 32, no. 1, pp. 85-
118, 2021. doi: 10.15388/21-infor444. 
[30] A. Ulutas et al., "Developing of a Novel Integrated 
MCDM MULTIMOOSRAL Approach for Supplier 
Selection," Informatica, vol. 32, no. 1, pp. 145-161, 
https://doi.org/10.31449/inf.v49i12.6573 Informatica 49 (2025) 157–172 157 
Congestion Control of Large-Scale Elevator Terminal Data Access in 
Large Metro Stations Based on The Internet of Things 
Juanjuan Shi*, Mengtian Jiao 
College of Information Engineering, Jiaozuo University, Jiaozuo 454000, China 
E-mail: jinglove666999@163.com 
*Corresponding author 
Keywords: hybrid access method, internet of things, congestion control, ACB control parameters  
Received: July 6, 2024 
Large metro station IoTs used to face congestion while access to terminals was going on a large scale. 
Due to this, low success rate in access and delay in monitoring critical equipment was observed, which 
included elevators and escalators. This paper presented a congestion control method for large-scale 
elevator terminal data access in metro stations using IoT. Business data were categorized based on volume 
and latency requirements: Slot ALOHA (SA) direct access mode was used for delay-insensitive, small data 
services, and Access Class Barring (ACB) random access was used for time-sensitive, large data services. 
ACB control parameters were dynamically adjusted by estimating access requests. Using uniform and 
Beta distribution models, the method's effectiveness was validated through experiments. With 4000 access 
requests, the hybrid method achieved a 52.43% success rate and a 76.72 ms average delay under the 
uniform model, and a 42.07% success rate with an 82.02 ms average delay under the Beta model. These 
results demonstrated the method's ability to meet Quality of Service (QoS) requirements for high-priority 
services, ensuring efficient and reliable communication in large-scale IoT environments. 
Povzetek: Prispevek predstavlja hibridno metodo za nadzor preobremenjenosti do podatkov naprav IoT, 
ki uporablja kombinacijo direktnega in naključnega dostopa, s prilagajanjem parametrov glede na obseg 
zahtev. 
 
 
1 Introduction transmission and processing of multiple types and large-
scale data [4].  
Based on the current urban development and people's Based on this, some scholars use IoT technology to 
travel needs, the number of elevators inside large metro monitor elevators' operating status data, dramatically 
stations is also growing [1]. The stability of elevator improving elevator operation security and effectively 
operation is closely related to the safety of residents. reducing equipment operation and maintenance costs. 
However, due to the quality, maintenance, supervision, Mao et al. [5] discussed the integration of Internet of 
and other influencing factors, elevator accidents often Things (IoT) technology to enhance the remote security 
occur. How to conduct unified real-time monitoring of management of elevators, addressing the associated safety 
elevator equipment in large and medium-sized Spaces to risks. They proposed an IoT-based architecture for 
reduce daily lightweight failures and prevent heavyweight elevator fault diagnosis and maintenance. The study 
accidents has become a hot topic of scholarly attention [2, established a fault diagnosis management system centered 
3].  on IoT, outlining maintenance methods to ensure the 
The increasing global population and urbanization safety and stability of elevator operations. This approach 
have heightened the demand for elevators, necessitating aims to improve the overall security and efficiency of 
advanced, safe, and efficient systems. China’s elevator urban transportation through advanced technology.  Lai et 
demand grows by 5%–7% annually due to the need to al. [6] adopt the more predictive state maintenance method 
replace outdated units and comply with new regulations, to realize the remote monitoring of highly distributed 
increasing maintenance workloads and risks. Innovative elevator equipment status, effectively improving the 
designs must prioritize safety, including weight capacity, safety and reliability of equipment operation. 
emergency alarms, and secure installation sites. Energy- IoT devices, ranging from consumer products to 
efficient elevators can reduce operational costs industrial components, are becoming ubiquitous, driving 
significantly. Traditional monitoring systems, like video the concept of "Smart homes" with enhanced safety and 
surveillance, fail to reflect the elevator's condition and energy efficiency. Wearable fitness and health monitors, 
failure rates adequately. With its advantages of low power network-enabled medical devices, and smart traffic 
consumption, significant connection, low delay, and high systems contribute to “smart cities” that reduce congestion 
reliability, the Internet of Things (IoT) can realize the  and energy use. IoT also promises to improve the 
 independence and quality of life for people with 
158   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
disabilities and the elderly. The impact of IoT extends to scalable and flexible network architecture, enhanced 
agriculture, industry, and energy sectors, enhancing security and privacy, and advanced AI and machine 
information flow along the production value chain. learning integration. These features ensure instantaneous 
Companies and research organizations predict significant and reliable data transmission for critical applications, 
economic effects [7]. A market research report revealed support billions of IoT devices, extend the battery life of 
that the global IoT market was valued at $1.90 billion in remote sensors, allow dynamic resource allocation, 
2018 and is projected to grow to $11.03 billion by 2026. protect sensitive data, and optimize network performance 
Additionally, the European Union (EU), the United States through predictive maintenance and anomaly detection. 
(USA), China, and other nations have developed IoT- These features collectively create an efficient, reliable, 
related action plans. These initiatives include the IoT-An and secure communication environment for the 6G era 
Action Plan for Europe and various IoT development [11]. 
plans for the years 2016–2020 [8]. Due to the limitation of channel resources, when the 
Song et al. [9] discussed the adoption of smart IoT at metro stations has a large number of elevators, and 
technologies and networking solutions like the Internet of other equipment data access, the time delay indicator of 
Things (IoT) by leading cities in China to enhance the system is higher, and the throughput will decrease 
economic opportunities and global climate resilience. significantly. Therefore, there is a great demand for a 
They presented the smart city concept as a complex large-scale terminal access algorithm tailored to the 
system integrating sensors, data, applications, and communication characteristics of the IoT at large metro 
organizational forms to make cities more agile and stations to ensure the reliable transmission of information 
sustainable. The paper provided a comprehensive data of crucial equipment. In response to the above issues, 
assessment of smart city initiatives in China, classifying Chou et al. [12] used Bayesian theory to estimate the 
practices into six key dimensions: energy, agriculture, number of access applications, preamble code conflict 
transport, buildings, urban services, and urban security rate, and the number of following time-slot applications at 
operations. Chinese smart city policies and practices aim the current time-slot. Furthermore, the optimal ACB 
to explore renewable energy, improve public convenience, control parameters are discussed by judging the number of 
and enhance urban comfort and citizen friendliness. The applications for the subsequent access time slot through 
study also addressed concerns in areas such as system quantitative prediction methods. The scheme is based on 
integration, governance, innovation, and finance. A policy the premise that the current time-slot access conflict 
vision was outlined to build public-private collaborative makes direct rebleeding at the next time-slot, with some 
networks, encourage innovation and investment in smart error from the system of refeeding in the actual access 
city initiatives, and emphasize smart services. process. 
In practical applications, the infrastructure of the Zhang et al. [13] addressed the growing need for 
wireless cellular network is relatively perfect, the improved communication content and quality in the 
coverage area is comprehensive, and the security is high, context of advancing network and communication 
which is one of the leading carrying networks of IoT technologies. This research concerns the optimal data 
communication. However, the original intention of collection and path planning of multi-unmanned aerial 
traditional wireless cellular network design is to deal with vehicle (UAV) to achieve extensive terminal accessibility 
the communication problem between humans and humans in IoT scenarios. The novelty of the approach consists of 
(H2H), and there are some differences in the integrating sensor area partitioning with the flight 
communication characteristics between machine to trajectory planning of multiple UAVs with the main 
machine (M2M). Machine-type communication (MTC) objectives of load balancing while the overall completion 
devices, integral to Industry 4.0, support smart factories, time for the tasks at hand is minimized. A novel k-means 
healthcare, and surveillance by generating data and algorithm has been developed to balance the quantity of 
making policy-based decisions. The demand for these data in each cluster. Accordingly, the flight trajectories of 
devices is projected to reach 50 billion by 2025. These the UAVs were represented discretely by an enhanced 
devices require robust security due to their vulnerability genetic algorithm including the 2-opt optimization 
and usage in open environments. Lightweight operator for solving the multiple traveling salesman 
cryptography is the preferred solution for MTC devices problem (MTSP) problem, improving the computational 
due to their limited computational and memory capacities. effectiveness. Extensive simulations have validated the 
This cryptographic approach ensures strong encryption efficiency of the suggested approach in smoothing out the 
while being efficient and cost-effective, enhancing imbalances in the distribution of tasks among UAVs and 
security for the growing number of IoT devices. MTC significantly reducing the duration of tasks. The 
devices are autonomous and central to automating IoT convergence rate for this methodology was higher than the 
frameworks, evolving to support the advancements of conventional genetic algorithm; hence, this proved that it 
Industry 4.0. They form Machine-to-Machine (M2M) was computationally efficient. Equipped with a new, 
communication networks, also known as cyber-physical efficient methodology for multi-UAV-assisted IoT 
systems and edge nodes, creating an autonomous system terminal data gathering, it brings balance and efficiency in 
of resource-constrained devices [10]. task distribution, unfolding the full power of professional 
The six key features of Machine Type algorithm solutions when acquiring optimal results in 
Communication (MTC) in 6G are ultra-low latency and more complicated engineering scenarios. 
high reliability, massive connectivity, energy efficiency, 
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 159 
Varsha et al.[14]  proposed an innovative intelligent procedures in the case of massive and heterogeneous 
traffic management system for wireless cellular networks device access for 5G and 6G communication applications. 
to enhance M2M connections, pivotal for IoT. They Yu et al.  [16] investigated the performance of 
focused on improving Access Class Barring (ACB), a massive machine-type communications (mMTC) in status 
method traditionally relying on a static factor to manage update systems, where numerous machine-type 
machine-type communication device (MTCD) traffic. The communication devices (MTCDs) send status packets to a 
study introduced a Bayesian inference-based learning base station (BS) for system monitoring. The authors 
automatons (BI-LA) approach that dynamically adjusts identified that packet collisions due to massive MTCDs 
the ACB factor. This system leverages learning automata's negatively impact status update performance. To address 
self-adaptive learning to estimate and manage M2M this, they proposed a joint access control, frame division, 
traffic more effectively. By framing the problem around and subchannel allocation scheme. They first analyzed 
collision probability and using Bayesian inference to adapt access control, packet collisions, and packet errors, 
the ACB factor, the proposed method was tested using deriving a closed-form expression of the average age of 
network simulator-3 (NS3). The performance metrics— information for all MTCDs as a performance metric. Their 
average access delay, access attempts, access success rate, proposed scheme was shown through simulations and 
and access success—demonstrated that the BI-LA ACB numerical results to achieve near-optimal performance, 
technique outperformed traditional and contemporary comparable to exhaustive search methods, and 
ACB methods, achieving minimal access delays of outperformed benchmark schemes. Bui et al. [17] present 
approximately 1876 ms and 27.6 ms. an access protocol based on distributed queue (DQ) 
The main problem arises due to a large amount of UEs mechanisms to deal with M2M communication large-
present in the RA techniques, as discussed by Piao and Lee scale access problems for cellular networks. To maximize 
[15], where increased collisions and delays arise. They the DQ mechanism performance, first of all, the base 
propose a new RA scheme that combines four-step RA station in the random-access opportunities is roughly the 
with two-step RA, based on the 3rd Generation number of conflict detection equipment to avoid excessive 
Partnership Project Release 16. This work tries to avoid a division of DQ. Then based on the probing results, the 
conflict with the available RA resource, then achieves a base station randomly divides the device into a determined 
better performance of efficiency and brings down the number of groups and "pushes" these groups to the end of 
average RA delay. This solution aims to optimize the two- the logical access queue. Finally, the validity and 
step RA probability and thus provides a resource feasibility of the proposed protocol are verified by 
configuration and parameter setting algorithm that allows simulation. 
the UEs to carry out both RA methods simultaneously. Congestion control and optimization methods 
Then, the authors proved further that the proposed overview in IoT applications-the methodologies, the 
approach is valid using a Markov chain model. The datasets used, the results, and the limitations are 
proposed approach also has its potential confirmed in represented in Table 1. This comparison identifies the 
extensive comprehensive simulations on supporting RA gaps that this paper will address with the proposed hybrid 
access method. 
Table 1: Summary of related works on congestion control and access optimization Methods in IoT applications 
highlighting limitations and positioning the hybrid access method as a novel solution 
Study Method Datasets Key Results Limitations 
Mao et IoT-based architecture Elevator operational Improved safety and Limited scope to fault 
al. [5] for fault diagnosis data stability of elevator diagnosis only 
operations through IoT 
monitoring 
Lai et al. Predictive maintenance Distributed elevator Enhanced safety and Focused only on 
[6] with IoT integration equipment data reliability of elevator maintenance, lacks 
systems scalability analysis 
Chou et Bayesian theory-based Simulated data Improved ACB Errors in real-time 
al. [12] ACB optimization parameters, reduced predictions 
conflict rate 
Zhang et Multi-UAV data Simulated IoT Balanced task distribution, High computational 
al. [13] collection and path scenarios reduced completion time overhead 
optimization 
Varsha Learning Automaton- Cellular Base Controlled M2M data, High implementation 
et al.[14]   based ACB scheme Station data reduced H2H interference complexity 
(LA-ACB) 
Piao and Integrated 2-4 step Cellular network Reduced collisions and Limited to specific RA 
Lee [15] Random Access (RA) simulations delays configurations 
methods 
160   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
Bui et al. Distributed Queue LTE/LTE-A Reduced congestion, Requires precise group 
[17] (DQ)-based access network data improved success rate partitioning 
protocol 
 
This paper proposes a hybrid access methodology that 5. Application in IoT environments: Ensured 
combines Slot ALOHA with Access Class Barring for Quality of Service (QoS) for high-priority 
large-scale IoT scenarios in metropolitan transit stations. services in large-scale IoT environments in metro 
The proposed methodology, by dynamically changing stations. 
ACB control parameters and implementing predictive 6. Predictive access application: Developed a 
modeling on access requests, should be able to provide method to predict access applications for better 
high QoS for important applications like elevator access control. 
monitoring under different traffic conditions. This novel 7. Experimental validation: Validated the method 
strategy overcomes some fundamental limitations of the in a Shanghai metro station, showing practical 
previous approaches by providing a scalable, reliable, and advantages over traditional methods. 
economic solution to congestion management in IoT 
systems with complex networks. Therefore, the key 2 Systems model and custom MAC 
contributions of the paper are as follows: 
The key contributions of the paper are as follows: layer protocol for IoT 
1. Congestion control method: Developed a communication in large metro 
method for managing large-scale elevator 
stations 
terminal data access in metro stations using IoT, 
addressing low access success rates and delays. 
2.1 Systems model 
2. Data categorization: Divided business data 
based on volume and latency requirements, using Based on the practical application, a metro station 
Slot ALOHA (SA) for delay-insensitive data and communication model is built with large-scale MTCD to 
Access Class Barring (ACB) for time-sensitive simulate the congestion caused by frequent network 
data. access by communication devices. Illustration of IoT 
3. Dynamic ACB adjustment: Proposed communication model for large metro stations in Fig. 1 
dynamically adjusting ACB control parameters shows how the MTCDs will be sending their data to the 
by estimating access requests to optimize server via the eNB. 
terminal access. The evolved Node B (eNB) receives, controls, and 
4. Performance evaluation: Demonstrated allocates up/down dynamic resources. MTCD data is 
through simulations that the hybrid access transmitted to a fixed gateway through the narrowband 
method improves access success rates and IoT, which forwards the data to the server. In the IoT 
reduces delays, especially with high access model, when two or more MTCDs use the same preamble 
requests. code simultaneously, it indicates that the decision is in 
conflict and the device access fails.
Single RAN MI Single EPC HLR/PCRF
G/N/L/NB-LoT G/U/L
Application 
Server
eNB LoT EPC
NB-LoT New M2M Platform
Wireless Core network IoT 
MTCD network side side Support Platform
 
Figure 1: Illustration of the IoT communication model for large metro stations, showcasing the flow of data from 
MTCDs to servers via the eNBCustom MAC Layer Protocol 
Given that complex signaling can reduce the success low sensitivity to delay, SA direct access is utilized. 
rate of device access, the network employs the Media Conversely, ACB random access is applied to delay-
Access Control (MAC) protocol [18]. The MAC layer sensitive and data-intensive services. Fig. 2 illustrates the 
protocol combines Selective Acknowledgement (SA) and hybrid MAC layer protocol diagram, where 𝑇𝑖  is the i-th 
Access Class Barring (ACB) controls to adapt to various access timeslot. 
types of business data and enhance access speed and The hybrid MAC layer protocol divides each 
success. For services with small amounts of valid data and incoming data packet into four parts: 
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 161 
1. Broadcasting data access information and ACB Using the hybrid MAC layer protocol for the 
control parameters for the current timeslot. classified transmission of different business data 
2. Assigning preamble codes to randomly accessed effectively reduces signaling consumption, accelerates 
services. data access, and ensures the Quality of Service (QoS) 
3. Handling SA direct access business. demands of high-priority business services.
4. Conducting data transmission. 
Ti-2 Ti-1 Ti Ti+1
Random 
Broadcast Access Sotted- Random Access 
data Allocation ALOHA Data Transfer
Leader
 
Figure 2: Hybrid MAC layer protocol 
 
3 Design of hybrid access method network congestion. Therefore, a random wait period is 
introduced before attempting to resend the data. The data 
transmission process is illustrated in Fig. 3.
3.1 SA method and improvement 
The SA transmits data by speaking first. Signal overlap is 
likely to occur during concurrent operations, leading to  
Resend
Node 1
Resend Resend
Node 2
Resend
Node 3
Channel
success success success collision success success success collision
 
Figure 3: Data sending process for traditional SA method 
In the traditional Slot ALOHA (SA) method, the time likelihood of collisions is significantly reduced, as nodes 
for retransmission is random, leading to a high probability are not transmitting simultaneously. This structured 
of complete or partial collisions. This randomness reduces approach allows for more efficient use of the available 
the efficiency of information utilization and decreases bandwidth and improves overall system throughput. 
system throughput. The improved SA data-sending process, which 
To address these issues, the data transmission process mitigates collisions and enhances throughput, is illustrated 
has been improved. The transmission period is divided in Fig. 4. This method ensures that each node's 
into several time slots, and data can only be sent at the transmission is independent of others, leading to more 
initial point of a time slot. By ensuring that nodes transmit reliable and orderly communication within the network.
information within their designated time slots, the 
Node 1
Node 2
Node 3
Channel
 
frequency domain
162   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
Figure 4: Data sending process for improving the SA method 
The relationship between the throughput rate 𝑄 and The probabilities of the 𝑖 -th preamble being in these 
the sent packet quantity 𝐺 can be expressed as Eq. (1): three states is given by the following Eq. (3): 
𝑄 = 𝐺𝑒−𝐺 (1)  
𝑃(𝑤
When two nodes transmit within the period 𝑇′, the 𝑖)
𝑛
1 𝑎
data transmission delay function is given in Eq. (2):  (1 − ) ,                                𝑤𝑖 = 0
  𝑁𝑝
 𝑛
𝑛𝑎 1 𝑎−1
𝑇𝑌 = 2𝑇
′ + 𝑡𝑑 + [𝜑𝑇
′ + (𝐵 + 1)𝑇′](𝑒𝐺
(3) 
− 1) (2) = ⋅ (1 − ) ,                       𝑤𝑖 = 1  
 𝑁𝑝 𝑁𝑝
  𝑛
1 𝑎 𝑛
𝑛 𝑎−1
 𝑎 1
Where 𝜑 represents the waiting time for a response, 1 − (1 − ) − ⋅ (1 − ) ,𝑤𝑖 ≥ 2
{ 𝑁𝑝 𝑁𝑝 𝑁𝑝
𝑡𝑑 represents the propagation duration and 𝐵 represents  
the maximum value of the backoff time slot. Where 𝑁𝑝 represents the number of available 
The fixed transmission channel and the number of 
preamble codes in the current timeslot, 𝑛𝑎 indicates the 
inherent node parameters determine the transmission 
number of access requests for the current timeslot. 
delay of SA. Therefore, the improved method is only 
Assume that the number of preamble codes satisfying 
suitable for processing delay-insensitive and small data 
wi = 0, wi = 1, and wi ≥ 2 in the current timeslot are L1, 
volume services. Otherwise, the transmission error will 
L2, and L3, respectively. Then, the maximum likelihood 
increase, and the availability of information will be 
estimation of the number of access applications in the 
reduced. 
current timeslot is expressed as Eq. (4): 
 
3.2 Estimation of access applications based 
𝑃 = 𝑃(𝑤𝑖 = 0|𝑁𝑎)
𝑛1 ⋅ 𝑃(𝑤𝑖 = 1|𝑁𝑎)
𝑛2 ⋅ 𝑃(𝑤𝑖
on time series prediction (
≥ 2|𝑁𝑎)
𝑛 4) 
3  
 
3.2.1 Estimation of current timeslot access The principle is to ensure that the number of access 
applications requests in the next time slot is optimal. The estimated 
For services using the ACB (Access Class Barring) number ?̂?𝑎 of access requests in the current timeslot can 
random access mode, the application amount of the be obtained by setting 𝑁𝑎 to the maximum value. The 
service should be estimated based on the occupation of the expression is given in Eq. (5): 
preamble code [19, 20]. Assume that 𝑤𝑖  represents the  
𝐽
state of the 𝑖 -th preamble code. The states are defined as 
follows: ?̂?𝑎 = 𝑎𝑟𝑔  max∑𝑙𝑛𝑃 (𝑤𝑗|𝑁𝑎) (5) 
- When 𝑤𝑖 = 1, the preamble code is not selected 𝑗
and is idle.  
- When 𝑤𝑖 = 1, an MTCD (Machine-Type After ACB, the comparison results between the 
Communication Device) has selected the maximum likelihood estimate and the actual application 
preamble code and it is busy. amount are shown in Fig. 5. It can be seen from the figure 
- When 𝑤𝑖 ≥ 2, two or more MTCDs have that the trend changes of the two lines are relatively 
selected the preamble code, resulting in a conflict consistent, indicating that the estimated value aligns well 
status [21]. with the actual value.
 
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 163 
Figure 5: Comparison results of maximum likelihood estimation and actual application volume (After passing the 
ACB) 
According to the maximum likelihood estimation the ACB control parameter of the current timeslot. Before 
after passing the ACB, the actual number of access passing the ACB, he comparison between the maximum 
applications can be calculated as ?̂? = ?̂?𝑎/𝑎, where 𝑎 is likelihood estimates and the actual number of applications 
is shown in Fig. 6.
 
Figure 6: Comparison results of maximum likelihood estimation and actual application volume (Before passing the 
ACB) 
For services accessed in SA mode, the estimation is ?̂? 𝑊
?̂? 𝑖 − 𝑖 + 𝐻𝑖+1 + 𝑇𝑖+1, 𝑖 ≤ 𝐼𝐷
based on the physical resource block status of the current 𝑖+1 = {  (7) 
?̂?
g that the total number of available 𝑖 −𝑊𝑖 + 𝐻𝑖+1,         𝑖 > 𝐼time slot [22]. Assumin 𝐷
resource blocks is 𝑈𝑠, and the number of idle rate blocks  
in the current timeslot is 𝑈𝑘,𝑖. The actual idle rate is ?̃? here, 𝐼 eslot. 
𝑘,𝑖 =
W 𝐷 represents the last tim
𝑈 𝑈 𝑐 nce the access request volume is a time series, the 
𝑘,𝑖
, the theoretical idle rate is 𝑃 𝑠−1 𝑖 Si
𝑘,𝑖 = ( ) , where 𝐶  is 
𝑈 ighted sum of historical increments is used as an 
𝑠 𝑈 𝑖 we
𝑠
the access application volume of the current timeslot. y increment in the next time slot. The newly arrived access 
equating the theoretical idle rate to the actual idle rate,  applications in the 𝑖 + 1 time slot, 𝑇𝑖  can be expressed 
+1
?̃?𝑘,𝑖 = 𝑃𝑘,𝑖, the number of access requests in the current as shown in Eq. (8): 
time slot is obtained as shown in Eq. (6):  
 3 3 1
𝑇
𝑙𝑜𝑔( ?̃? 𝑖 = 𝑇 𝑇
+1
𝑘,𝑖) 5 𝑖 + 10 𝑖−1 + 𝑇
10 𝑖−2 (8) 
?̂?𝑖 =  (6) 
𝑙𝑜𝑔(𝑁𝑖(𝑁𝑖 − 1)  
Because 𝑇𝑖 = ?̂?𝑖 − ?̂?𝑖−1 − 𝐻𝑖 +𝑊𝑖−1, the Eq. (9) is 
3.2.2 Estimation of next timeslot access as follows: 
applications  
3 3 1
Assume that the estimated number of access applications 𝑇𝑖 = 𝑚𝑎𝑥 {0， ( 𝑇 + 𝑇
+1 5 𝑖 + 𝑇
10 𝑖−1 −2)}  10 𝑖
in the 𝑖 -th time slot is ?̂? 3 3 2 1
𝑖, the number of access successes  ?̂? − ?̂? − ?̂? −2 − ?̂?
is 𝑊𝑖, the number of newly arrived access applications in  5 𝑖 10 𝑖−1 10 𝑖 10 𝑖−3  
 3 3 1   (9) 
the 𝑖 + 1 time slot is 𝑇 = 𝑚𝑎𝑥 0，
𝑖 , and the number of access  − 𝐻 − 𝐻 1 − 𝐻𝑖−2 +   
+1  5 𝑖 10 𝑖− 10  
 
applications that need to be retransmitted is 𝐻  3 3 1
𝑖 . Then the 
+1 { 𝑊 𝑊  
(5 𝑖−1 + 𝑊 +
10 𝑖-2 10 𝑖−3 )}
estimated number of access applications in the 𝑖 + 1 time  
slot can be shown as Eq. (7): After transformation, the estimated amount of access 
 requests for the next time slot can be obtained. The 
expression is given in Eq. (10):
164   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
?̂?𝑖+1 = 
3 3 2 1
  ?̂? − ?̂?𝑖−1 − ?̂?  
   5
𝑖 10 10 𝑖−2 − ?̂?
10 𝑖−3
  
 3 3 1
𝑚𝑎𝑥 ?̂?𝑖， − 𝐻 − 𝐻 − 𝐻  −𝑊𝑖 , 𝑖 ≤ 𝐼𝐷 (10) 
  5 𝑖 10 𝑖−1 +
10 𝑖−2    
  3 3 1  
 𝑊 + 𝑊
 { (5 𝑖−1 + 𝑊
10 𝑖-2 10 𝑖−3 )}
{ ?̂?𝑖 −𝑊𝑖 + 𝐻𝑖+1,                                                       𝑖 > 𝐼𝐷
 
The comparison between the predicted application actual value is relatively consistent, indicating that the 
amount and the actual application amount of the time predicted result of the access application volume aligns 
series is shown in Fig. 7. It can be seen from the figure that well with the actual value.
the curve change trend of the estimated value and the 
 
Figure 7: Comparison results of predicted and actual application volumes of time series 
𝑃(𝑁
3.2.3 Parameter adjustment of predicted 𝑎 = 𝑛𝑎|𝑁 = 𝑛) 
𝑛 (12) 
= 𝐶 𝑎 𝑎
values 𝑛 ⋅ 𝑎𝑛𝑎 ⋅ (1 − 𝑎)𝑛−𝑛  
Update the packet parameter 𝐿1 and ACB control Then the estimated value of success access is given in 
parameter 𝑎 of the dynamic preamble code according to Eq. (13): 
the prediction value of the service arrival to ensure the 𝑎
access success rate of the next timeslot. Since 𝑤 𝑀[𝑁 1
𝑖 = 1 𝑠|𝑁 = 𝑛] = 𝑛 ⋅ 𝑎 ⋅ (1 − )𝑛𝑎−  (13) 
𝑁
indicates the successful transmission of the preamble 𝑝
code, the estimated value of the preamble code that can Deriving from a, the optimal control parameter is 
transmit successfully is given in Eq. (11): given in Eq. (14): 
 𝐽
𝑁
𝑀[𝑁 𝑎′ =  (14) 
𝑠|𝑁
𝑝
𝑎 = 𝑛𝑎] = ∑ 𝑃(𝑤
𝑖=1 𝑖 = 1|𝑁𝑎 = 𝑛
𝑛 1 1 1
𝑎) =𝑁𝑝 ⋅ 𝐶𝑛 ⋅ ⋅ (1 − )𝑛𝑎−1=𝑛
𝑎 𝑁 𝑎 ⋅ (1 − (11) From Eq (14), the access success rate is highest when 
𝑝 𝑁𝑝
1 the number of access requests matches the number of 
)𝑛𝑎−1                                             
𝑁𝑝 currently available preamble codes. The effect is optimal 
 when 𝐿1 equals the number of high-priority access 
𝑁𝑠 represents the number of preamble codes requests in the current timeslot. 
successfully transmitted, and  𝑁𝑎 represents the number of Fig. 8 shows the relationship between the number of 
services filtered by ACB. Suppose the system contains access requests and access successes when the number of 
𝑁 MTCDs, and  𝑁𝑎 MTCDs pass the screening. The preamble codes is 35, 60, and 76, further verifying the 
probability is given in Eq. (12): correctness of the above conclusions.
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 165 
 
Figure 8: Relationship between access success and access requests 
priority and low-priority services using the hybrid access 
3.3 Hybrid access process method. Here’s a detailed explanation of the process:
The access process is outlined in Figure 9, illustrating the 
steps involved in managing access requests for high-
start
MTCD access failed
Y
N
Attempts exceeded High-level Select a random
preset？ business? number between (0,1)
Y
Pick a preamble from
N N
Small data ACB Less than control
an interval (1, L1) and
parameter a ?
volume business? Access try to access it
Y Y
Leader Code Select the preamble
Delay sensitive
Repeated from the interval (L1+1,
business?
Y Selection? Np) and try to access it
Y
N
Current slot
SA access MTCD access
access failure
succeeded
Y N
Conflict?
 
Figure 9: Access flow of the hybrid method 
1. Initial collection and setup: reserved for high-priority services and proceeds 
❖ The evolved Node B (eNB) collects access data to the access link. 
from the previous timeslot, counts the usage of ➢ For low-priority services, a random number 𝑝 is 
preamble codes, completes channel resource selected from the interval [0,1]. If 𝑝 is less than 
allocation, and sets parameters such as ACB the ACB control parameter 𝑎 of the current 
control and backoff parameters. timeslot, a preamble is selected from the set 
2. Random access phase: 𝐾2[𝐿1 + 1，𝑁𝑝] designated for low-priority 
❖ Determine the priority of the application access services. If 𝑝 ≥ 𝑎, the access is terminated. 
business: 3. Direct access phase: 
➢ For high-priority services, the system directly ❖ Services with small data volumes proceed with 
selects a preamble code from the set 𝐾1[1，𝐿1] direct access. 
166   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
4. Data transmission phase: 4 Experiments 
❖ MTCDs that have successfully obtained a 
transmission opportunity begin data 4.1 Experimental preparation 
transmission. 
This structured approach ensures that high-priority The experimental site for the study is a large metro station 
services are given precedence and that low-priority in Shanghai, equipped with a significant number of IoT 
services are managed in a way that minimizes conflicts terminals. The configuration of the parameters used in the 
and optimizes resource use. The hybrid access method experiments including the number of preambles, 
dynamically adjusts parameters based on historical data, maximum transmission attempts, conflict resolution time, 
improving overall system throughput and efficiency. and escape time, providing a baseline for evaluating the 
hybrid access method are detailed in Table 2.
 
 
Table 2: Key parameters used in the simulation experiments, including preambles and conflict resolution time, 
forming the baseline for evaluating the hybrid access method 
Parameter Value 
Number of preambles 60 
Maximum transmission times of preamble code 8 
Conflict resolution time 24 ms 
Escape time 15 ms 
 
These parameters were utilized to simulate and Therefore, the success rate of preamble code access is 
analyze the performance of the hybrid access method redefined for a fair assessment. The success rate, 𝑃𝑇 , is 
under various traffic conditions, including uniform and calculated as the ratio of the number of successfully 
beta distribution models, to verify its effectiveness in accessed services (𝑁𝑐) to the total number of preamble 
managing access congestion and ensuring timely data codes used in the access process (𝑁𝑎𝑙𝑙). This redefinition 
transmission in large-scale IoT environments. allows for a more accurate comparison of the efficiency 
The uniform and beta distribution models are and effectiveness of the hybrid access method against 
employed to verify the feasibility of the hybrid access traditional ACB methods. 
method by simulating various types of business data, 
including periodic and sudden data as well as random and 4.2 Experimental results and analysis 
irregular data, in elevator monitoring. To ensure 
comparability, ACB access and LA-ACB with different 4.2.1 Simulation results and analysis of 
parameters are also used as benchmarks in the uniform distribution model 
experiments. These experiments aim to count and 
This section discusses the simulation results and analysis 
compare the average access delay and access success rate 
using a uniform distribution model to evaluate the 
of different services [23].  
performance of the hybrid access method compared to 
Given that the hybrid access method assigns different 
traditional methods such as ACB (Access Class Barring) 
ranges of preamble codes according to the priority of 
and LA-ACB (Learning Automata ACB).
services, while the ACB method shares all access 
resources uniformly, a direct comparison would be unfair. 
 
Figure 10: Comparison of access success rates for high-priority services using the hybrid access method, ACB, and 
LA-ACB under the uniform distribution model 
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 167 
 
Figure 11: Comparison of average access delay for high-priority services under the uniform distribution model 
The access success rate of high-priority services is 52.43% at 4000 applications. Fig. 11 shows the 
demonstrated in Fig. 10. When the number of access comparison of average access delay for high-priority 
applications is small, the LA-ACB method performs services. With an increase in access applications, the 
excellently. However, as the number of applications average access delay for the hybrid access method remains 
increases, LA-ACB causes resource wastage, and its relatively stable, indicating higher resource utilization and 
performance gradually declines. The hybrid access meeting high-priority service requirements more 
method initially shows lower success rates and higher effectively than LA-ACB. In other words, the hybrid 
delays due to high estimation errors but improves access method achieves a lower delay (76.72 ms at 4000 
significantly as the number of access applications requests) compared to ACB and LA-ACB, ensuring QoS 
increases. Precisely, the hybrid method demonstrates a for time-sensitive applications.
higher success rate as access requests increase, reaching 
 
Figure 12: Comparison of access success rates for concurrent services in the uniform model, with the hybrid method 
outperforming ACB and LA-ACB by reducing collisions and improving resource use 
 
Figure 13: Comparison of average access delays for concurrent services in the uniform model, showing the hybrid 
method's lower delays (76.72 ms), ensuring timely transmission 
168   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
The comparison of access success rates for multiple significantly improves the system's access success rate and 
types of concurrent services is illustrated in Fig. 12, while average access delay, thereby meeting the QoS (Quality of 
Figure 13 shows the comparison of average access delay Service) needs for high-priority services in large-scale IoT 
for these concurrent services. The hybrid access method terminal access scenarios. 
outperforms ACB and LA-ACB, showing a higher success 
rate and lower delay, especially when the number of 4.2.2 Simulation results and analysis of beta 
access applications reaches 4000. At this point, the hybrid distributed access model 
method achieves a 52.43% success rate and an average 
When the beta distribution model is adopted, the 
delay of 76.72 ms, demonstrating undeniable advantages 
performance of the hybrid access method is evaluated in 
in efficiency and effectiveness.  
terms of the access success rate and average access delay 
These results indicate that the hybrid access method, 
for high-priority services.
especially under a uniform distribution model, 
 
Figure 14: Comparison of access success rates for high-priority services in the beta distribution model, with the hybrid 
method excelling (42.07% at 4000 applications) through dynamic adjustments and efficient resource use 
 
Figure 15: Average access delays for high-priority services in the beta distribution model, with the hybrid method 
achieving a lower delay (82.02 ms at 4000 applications) than ACB and LA-ACB 
Fig. 14 illustrates the access success rate of high- requests, thereby minimizing the waiting time and 
priority services under the beta distribution model. The improving overall efficiency. 
results indicate that the hybrid access method achieves a These results highlight the advantages of the hybrid 
higher access success rate compared to the ACB and LA- access method in managing high-priority service requests, 
ACB methods. This improvement is due to the dynamic ensuring higher access success rates, and reducing average 
adjustment of access application amounts and access access delays under the beta distribution model. This 
parameters in the next timeslot, which optimizes the demonstrates the method's effectiveness in handling 
allocation of resources for high-priority services. dynamic and bursty traffic patterns in large-scale IoT 
Fig. 15 presents the comparison of average access environments. 
delay for high-priority services using the beta distribution The total number of system preamble codes is 60. 
model. The hybrid access method demonstrates a lower When high-priority services are concurrent with low-
average access delay compared to ACB and LA-ACB priority services, the access success rate is shown in Fig. 
methods. This reduction in delay is attributed to the 16, and the average access delay is shown in Fig. 17.
method's ability to better predict and manage access 
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 169 
 
Figure 16: Access success rate for concurrent services in the beta distribution model, with the hybrid method 
achieving 42.07% at 4000 applications, surpassing ACB and LA-ACB 
 
Figure 17: Average access delay for concurrent services in the beta distribution model, with the hybrid method 
achieving 82.02 ms, outperforming ACB and LA-ACB 
Figure 16 shows access success rate for concurrent methodologies. This leads to a substantial increase in 
services under the beta distribution model. The hybrid system throughput that ensures reliable and efficient 
access method outperforms ACB and LA-ACB methods, communications over large-scale IoT topologies. 
achieving a success rate of 42.07% at 4000 applications, In summary, the hybrid access method enhances the 
demonstrating robust handling of burst traffic. Figure 17 performance of the system and also responds to robustness 
illustrates average access delay for concurrent services and scalability challenges; hence, it is the best against all 
under the beta distribution model. The hybrid access the complexities in communications in IoT at a metro 
method reduces delay to 82.02 ms at 4000 applications, railway station. Dynamic adaptability and predictive 
ensuring better performance for high-priority and time- accuracy make this tool indispensable to maintain the 
sensitive services. In fact, it is these very measures of optimum service level and meet the stringently demanding 
performance that represent important favorable points for QoS of critical infrastructure. 
the proposed hybrid model over conventional algorithms 
like ACB and LA-ACB. 5 Discussion 
The experimental results also reveal that the access 
success rate and average access delay are significantly The proposed hybrid access scheme constitutes one of the 
improved by the proposed hybrid access method. In key improvements in congestion management schemes 
addition, it well satisfies the requirements brought by the over large-scale IoT networks, especially in highly 
Quality of Service of high-priority traffic for periodic and populated areas such as in metro stations. In the process, 
bursty large-scale terminal access requests. It enables the SA-ACB merging is targeted at the solution of 
method to predict the volume of the access application fundamental issues like low access success rates and high 
effectively in the next timeslot in a dynamic way by taking delays in a network. Higher performance indices are 
advantage of the historical state of the preamble code, promised to be exhibited compared with the existing 
without assuming anything about the quantity of access methodologies LA-ACB and traditional ACB. For 
applications. instance, under the uniform distribution model, the 
The predictability allows for the tailoring of the maximum access success rate reaches 52.43% at 4000 
hybrid access method to the various characterizations of requests, which is far beyond the limitation of LA-ACB 
different services, hence optimality in the choice of access owing to the inefficiency of resource utilization when 
170   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
requests are too many. Besides, this approach ensures an Nevertheless, the suggested congestion control approach, 
average latency of no more than 76.72 ms for high-priority primarily corroborated through simulations, may not 
services that strictly meet the QoS requirement. Under entirely reflect the intricacies of real-world scenarios and 
correspondence, within the beta distribution model, the diverse traffic patterns encountered. Therefore, even 
robustness exposed to bursty traffic by the hybrid the refined uniform and Beta distribution models need 
approach achieved 42.07% in success rate and 82.02 further refinement and validation in order to ensure their 
milliseconds average delay. accuracy against different scenarios. The scalability of the 
Those advantages come forth due to novelty in method, especially above 4000 access requests, was not 
resource allocation and predictive adjustments that this deeply analyzed, as was the application of the method to 
hybrid method will implement. The method dynamically other IoT applications. It has to be implemented on-site, 
adapts the ACB control parameters in view of historical considering variations in traffic models, advance 
data and real-time estimation to optimize channel prediction methods using machine learning techniques, 
utilization with minimum collision. It efficiently spreads and scalability analysis for performance evaluation. 
the network load in a dual-access approach wherein small Extension of the method to other IoT applications, 
data services are managed by SA and large delay-sensitive investigation of energy efficiency, and incorporating 
services are overseen by ACB. This flexibility is a key robust security will ensure its sustainability, hence reliable 
ingredient for achieving high scalability and reliability, in different IoT environments. 
especially under scenarios that exhibit diversified traffic 
patterns where high-priority applications must coexist Acknowledgment 
with low-priority ones. 
The practical implications of these findings are huge. Thanks to our families and colleagues who supported us 
Hybrid should guarantee environments like metro stations morally. 
with very low latency and high access success ratios, 
dependably surveilling the very important equipment of Funding statement 
elevators and escalators, while improving operational 
Not applicable 
safety and efficiency. Besides, this solution also provides 
a scalable and economically feasible way to handle 
congestion in IoT networks, thus making it suitable for Conflicts of interest 
smart city, industrial automation, and, generally speaking, The authors declare that there is no conflict of interest 
high-traffic IoT systems. Future works may further regarding the publication of this paper. 
optimize the proposed approach for energy efficiency and 
extend its applicability to realistic traffic for further Authorship contribution statement 
generalization. These results have established the hybrid 
access method as a robust and practical solution to handle The manuscript has been read and approved by all the 
congestion in large-scale IoT networks. authors, the requirements for authorship, as stated earlier 
in this document, have been met, and each author believes 
that the manuscript represents honest work. 
6 Conclusion 
The paper proposed an IoT-based congestion management Availability of data and materials 
strategy for mass data access from the elevator terminals 
at the metro station. This method categorized the business On Request 
data by volume and latency requirements and adopted SA 
for delay-tolerant services and ACB for real-time services. Declarations 
Besides, in the proposed methodology, dynamically 
Not applicable 
adjusting ACB control parameters was adopted to 
optimize the access efficiency for terminals. The 
effectiveness of the approach is corroborated by the References 
simulation results: from a uniform distribution model, [1] ShuangChang F, Jie C, Yanbin Z, Zheyi L (2020). 
based on 4000 access requests, the hybrid method can Discussion on improving safety in elevator 
achieve an access success rate of 52.43% and an average management. In: 2020 2nd International 
access delay of 76.72 ms. From the Beta distribution Conference on Machine Learning, Big Data and 
model, 42.07% with an average access delay of 82.02 ms Business Intelligence (MLBDBI). IEEE, pp 195–
can be achieved. It is presented that the Hybrid Access 198. 
Method increases the access success rate greatly and https://doi.org/10.1109/MLBDBI51377.2020.000
decreases the delay hence fulfilling the QoS requirements 43. 
for high-priority services in a large-scale IoT [2] Ushakov D, Dudukalov E, Kozlova E, Shatila K 
environment. Future investigations ought to encompass (2022). The Internet of Things impact on smart 
practical implementation and examine more extensive public transportation. Transportation Research 
traffic models, sophisticated prediction methodologies, Procedia, 63:2392–2400. 
and scalability to further substantiate and augment the https://doi.org/10.1016/j.trpro.2022.06.275. 
applicability and dependability of the method. 
Congestion Control of Large-Scale Elevator Terminal Data Access…                                      Informatica 49 (2025) 157–172 171 
[3] Wang C, Feng S (2020). Research on big data 135:233–260. https://doi.org/10.1007/s11277-
mining and fault prediction based on elevator life 024-10943-5.  
cycle. In: 2020 International Conference on Big [15] Piao Y, Lee T-J (2024). Integrated 2–4 Step 
Data & Artificial Intelligence & Software Random Access for Heterogeneous and Massive 
Engineering (ICBASE). IEEE, pp 103–107. IoT Devices. IEEE Transactions on Green 
https://doi.org/10.1109/ICBASE51474.2020.000 Communications and Networking, 8:441–452. 
30. https://doi.org/10.1109/TGCN.2023.3322539 
[4] Yao W, Jagota V, Kumar R, et al (2022). Study [16] Yu B, Cai Y, Wu D (2021). Joint Access Control 
and application of an elevator failure monitoring and Resource Allocation for Short-Packet-Based 
system based on the internet of things technology. mMTC in Status Update Systems. IEEE Journal 
Sci Program, 2022:2517077. on Selected Areas in Communications, 39:851–
https://doi.org/10.1155/2022/2517077. 865. 
[5] Mao J, Chen L, Cheng H, Wang C (2023). https://doi.org/10.1109/JSAC.2020.3018801. 
Elevator fault diagnosis and maintenance method [17] Bui A-TH, Nguyen CT, Thang TC, Pham AT 
based on Internet of Things. In: Proc.SPIE, p (2019). A comprehensive distributed queue-based 
1279305. https://doi.org/10.1117/12.3006383. random-access framework for mMTC in 
[6] Lai CTA, Jiang W, Jackson PR (2019). Internet of LTE/LTE-A networks with mixed-type traffic. 
Things enabling condition-based maintenance in IEEE Trans Veh Technol, 68:12107–12120. 
elevators service. J Qual Maint Eng, 25:563–588. https://doi.org/10.1109/TVT.2019.2949024. 
https://doi.org/10.1108/JQME-06-2018-0049. [18] Cui Y, Liu F, Jing X, Mu J (2021). Integrating 
[7] Mouha RARA (2021). Internet of things (IoT). sensing and communications for ubiquitous IoT: 
Journal of Data Analysis and Information Applications, trends, and challenges. IEEE Netw, 
Processing, 9:77. 35:158–167. 
http://www.scirp.org/journal/Paperabs.aspx?Pape https://doi.org/10.1109/MNET.010.2100152. 
rID=108574. [19] Zhao L, Xu X, Zhu K, et al (2018). QoS-based 
[8] Wang J, Lim MK, Wang C, Tseng M-L (2021). dynamic allocation and adaptive ACB mechanism 
The evolution of the Internet of Things (IoT) over for RAN overload avoidance in MTC. In: 2018 
the past 20 years. Comput Ind Eng, 155:107174. IEEE Global Communications Conference 
https://doi.org/10.1016/j.cie.2021.107174. (GLOBECOM). IEEE, pp 1–6. 
[9] Song T, Cai J, Chahine T, Li L (2021). Towards https://doi.org/10.1109/GLOCOM.2018.8647599 
Smart Cities by Internet of Things (IoT)—a Silent [20] Sari RF, Harwahyu R, Cheng R-G (2020). Load 
Revolution in China. Journal of the Knowledge Estimation and Connection Request Barring for 
Economy, 12:1–17. Random Access in Massive C-IoT. IEEE Internet 
https://doi.org/10.1007/s13132-017-0493-x. Things J, 7:6539–6549. 
[10] Ullah S, Radzi RZ, Yazdani TM, et al (2022). https://doi.org/10.1109/JIOT.2020.2968091.  
Types of Lightweight Cryptographies in Current [21] He H, Ren P, Du Q, Sun L (2015). Estimation 
Developments for Resource Constrained Machine based adaptive ACB scheme for M2M 
Type Communication Devices: Challenges and communications. In: Wireless Algorithms, 
Opportunities. IEEE Access, 10:35589–35604. Systems, and Applications: 10th International 
https://doi.org/10.1109/ACCESS.2022.3160000 Conference, WASA 2015, Qufu, China, August 10-
[11] Mahmood NH, Alves H, López OA, et al (2020). 12, 2015, Proceedings 10. Springer, pp 165–174. 
Six Key Features of Machine Type https://doi.org/10.1007/978-3-319-21837-3_17. 
Communication in 6G. In: 2020 2nd 6G Wireless [22] Zhai D, Lu Y, Shi R, Ji Y (2022). Large-Scale 
Summit (6G SUMMIT),. pp 1–5. Micro-Power Sensors Access Scheme Based on 
https://doi.org/10.1109/6GSUMMIT49458.2020. Hybrid Mode in IoT Enabled Smart Grid. In: 2022 
9083794. 7th International Conference on Signal and Image 
[12] Chou CM, Huang CY, Chiu C-Y (2013). Loading Processing (ICSIP). IEEE, pp 719–723. 
prediction and barring controls for machine type https://doi.org/10.1109/ICSIP55141.2022.988668
communication. In: 2013 IEEE International 4. 
Conference on Communications (ICC). IEEE, pp [23] Liu G, Jiang X, Li H, et al (2022). Adaptive access 
5168–5172. selection algorithm for large-scale satellite 
https://doi.org/10.1109/ICC.2013.6655404. networks based on dynamic domain. Sensors, 
[13] Zhang L, He C, Peng Y, et al (2023). Multi-UAV 22:5995. https://doi.org/10.3390/s22165995. 
Data Collection and Path Planning Method for  
Large-Scale Terminal Access. Sensors, 23:8601.  
https://doi.org/10.3390/s23208601.  
[14] Varsha V, Prakash SPS, Krinkin K (2024). An  
Intelligent Bayesian Inference Based Learning  
Automaton Approach for Traffic Management in  
Radio Access Network. Wirel Pers Commun,  
 
172   Informatica 49 (2025) 157-172                                                                                                                                       J. Shi et al. 
 
 
 
 https://doi.org/10.31449/inf.v49i12.7840                                         Informatica 49 (2025) 173–190  173 
 
 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless 
Sensor Networks Using Chaotic Mapping and Osprey Optimization 
  
Songhao Jia, Wenqian Shao *, Cai Yang, Shuya Jia, Yaohui Yuan, Huiyuan Chen and Haiyu Zhang  
School of Artificial Intelligence and Software Engineering, Nanyang Normal University, Nanyang, Henan, 473061, 
China 
E-mail: shaowenqian2023@163.com 
*Corresponding author 
 
Keywords: emergency communication system, wireless sensor network, prey optimization, chaotic mapping, energy 
consumption  
Received: December 17,2024 
A wireless sensor network (WSN) represents a promising approach for establishing self-organizing 
wireless networks comprising a substantial number of wireless sensors, with the objective of 
facilitating communication in regions where the existing communication infrastructure has been 
severely disrupted. In order to address the issue of excessive energy consumption by cluster heads and 
central nodes in emergency communication networks of wireless sensor networks, this paper proposes 
an emergency communication algorithm for wireless sensor networks based on chaos mapping and 
osprey optimization. Firstly, an optimization algorithm based on chaos theory is used to select the 
virtual position of the initial population of the Osprey optimization algorithm. This is achieved by 
simulating the randomness and unpredictability of chaotic systems. Secondly, the osprey optimization 
algorithm and the improved fitness function are used to select the optimal cluster head combination. 
In the selection process, six factors, such as the energy level of network nodes, the distance between 
cluster heads, the distance between cluster heads and base stations, the distance between cluster 
heads and ordinary nodes, the variance of the distance between cluster heads and base stations and 
the variance of the distance between cluster heads, are comprehensively considered. Finally, the 
heuristic function of FA-star algorithm is used to select the next hop node to transmit the message. 
The results of the simulation demonstrate that the residual energy of the CM-OOA algorithm is 14% 
higher than that of the CGWOA algorithm following the transmission of 1000 data rounds. This figure 
is 54% higher than that observed for the PSO-C algorithm. The findings demonstrate that the 
CM-OOA algorithm effectively extends the network lifetime and preserves a favorable load balance in 
diverse network settings. 
Povzetek: CM-OOA algoritem s kaotičnim preslikovanjem in optimizacijo osprejev natančno izbere 
optimalna vozlišča, zmanjšuje energijsko porabo in podaljšuje življenjsko dobo WSN, kar je ključnega 
pomena za nujne komunikacijske sisteme. 
 
1  Introduction among others. Due to its low cost and ease of use, it is 
capable of functioning in a multitude of challenging 
In recent years, with the gradual warming of the global 
environments. In areas inaccessible to humans, 
climate, after earthquakes, floods, strong tropical storms 
unmanned aerial vehicles (UAVs) can be deployed to 
or other disasters, fixed communication network facilities 
establish wireless communication networks [2]. It can be 
may be completely destroyed or most of them may not 
reasonably proposed that wireless sensor networks 
work normally. Communication is extremely important 
represent a potential method for emergency 
for emergency rescue and disaster relief [1]. At this time, 
communication. However, the considerable number of 
we need an emergency network that can be quickly 
sensor nodes, coupled with the limited energy capacity 
deployed without relying on any fixed network facilities. 
and relatively short operational lifespan, present 
A wireless sensor network is a network composed of a 
significant challenges. Nevertheless, the length of time 
large number of randomly distributed nodes that are 
that emergency communication networks based on 
capable of self-organization. The primary function of the 
wireless sensors can remain operational is a significant 
system is to monitor and obtain data from the target area 
challenge. One promising avenue for further research is 
and subsequently transmit it to the base station. A 
to enhance the energy efficiency of these networks, 
plethora of potential applications can be envisaged in the 
thereby prolonging their operational lifespan. 
context of the Internet of Things, including those in the 
Aiming at the energy consumption problem of WSNs in 
military, aerospace, ocean and agricultural sectors, 
data transmission, it is an effective method to prolong the 
 
174   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
life of wireless sensor networks by selecting cluster normal sensing nodes in the future rounds of the 
heads for network nodes and data fusion [3]. At present, sub-cycle [12]. Das Rahul proposed a large-scale 
cluster head selection algorithm usually uses two energy-aware trust optimization algorithm for cluster 
technologies, one is to randomly select cluster heads head selection and malicious node detection. The 
through thresholds, and the other is to design appropriate harmonic search genetic algorithm was originally used to 
fitness function to select cluster heads by using swarm select cluster heads according to energy, trust, distance 
intelligence technology. Some scholars have also and density. By considering the trust value, this method 
proposed to solve the problem of rapid death of central avoids choosing malicious nodes as cluster heads, and 
nodes by using non-uniform clustering algorithm. then uses energy-aware trust estimation models within 
Firstly, scholar Wendi Rabiner Heinzelman proposed and between clusters to detect malicious nodes, this 
LEACH protocol, which randomly rotates cluster heads depends on two modules: direct trust and indirect trust 
with a certain threshold, and reduces energy consumption between clusters and within clusters [13]. Pal. Raju 
and prolongs network life cycle by clustering nodes to proposed a multi-objective binary grey wolf optimizer to 
cluster heads [4]. Saxena Madhvi scholars enhanced the find the clustering method in heterogeneous networks, 
original LEACH protocol by introducing new algorithms and extended the network life cycle through five 
CHME-LEACH and CHP-LEACH, reducing objectives: maximizing the overall cluster head energy, 
communication energy consumption and prolonging minimizing the cluster head compactness, minimizing the 
network life [5]. Jonnalagadda Suman scholars put number of cluster heads, minimizing the energy 
forward an energy-aware routing protocol MAX LEACH, consumption from non-cluster heads to clusters, and 
which is suitable for heterogeneous networks and maximizing the cluster spacing [14]. These scholars have 
homogeneous networks, to minimize the energy developed heterogeneous wireless sensor networks with 
consumption of nodes and extend the network life [6]. different energy nodes. One method of prolonging the 
These scholars employ data fusion techniques with the network life cycle is to increase the energy available to 
objective of reducing network energy consumption and the cluster head nodes. The comparison between 
extending network lifetime. algorithms is shown in Table 1. 
Secondly, with the continuous development of intelligent 
Table 1: Comparison of the different types of protocols 
algorithms, intelligent algorithms have broad application 
involved. 
prospects in selecting cluster heads in wireless sensor 
networks. The selection of cluster heads in wireless Mode References Vantage Drawbacks 
sensor networks is very similar to swarm intelligence 
Cluster heads are 
algorithm. Therefore, Gülbaş, Gülşah scholars introduced LEACH.2000[4] selected by 
simulated annealing algorithm to propose LEACH-SA The algorithm is threshold, and 
Threshold CHP-LEACH.2024 
algorithm, and introduced simulated annealing algorithm simple, and the cluster random selection 
protocol [5] head is selected by the can lead to 
to select cluster heads to extend the network life [7]. threshold. irrational 
Mishra Rashmi scholars select the optimal number of MAX combinations of 
LEACH.2023[6] cluster heads. 
cluster heads among dense network nodes by introducing 
butterfly optimization algorithm, and select the next hop LEACH-SA.2023[7] Cluster intelligence 
node by introducing ant colony optimization algorithm in is used to select the 
the data transmission stage [8]. Nurul muazzah abdul Mishra Through the cluster head and the 
Machine continuous selection cluster head nodes 
latiff scholar proposed PSO-C protocol by introducing Rashmi.2023[8] 
learning of swarm intelligence, should be 
particle swarm optimization algorithm, which reduced 
protocol until the reasonable reasonably located. 
PSO-C.2007[9] 
network energy consumption and extended network life cluster head selection. Better reduction of 
[9]. Bejjam Komuraiah scholar put forward at the 14th energy 
International Conference on Computing Communication CGWOA.2024[11] consumption. 
and Networking Technologies in 2023 that genetic Clusters with 
ECSSEEC.2023[12] The number of nodes 
algorithm is introduced into wireless sensor networks, inconsistent number 
within the cluster is 
Non-unifo
which makes the network balance load and optimize, and of nodes in the 
not the same, which 
rm Das Rahul.2024[13] cluster may lead to 
increases the better results in lower cycle [10]. Muntather can avoid the rapid 
protocol a large energy 
Almusawi scholars proposed the CGWOA protocol by death of the central 
consumption gap 
Pal. Raju.2024[14] node. 
introducing chaos algorithm and grey wolf optimization between clusters. 
algorithm, which reduced energy consumption by  
reducing the transmission distance of network nodes [11]. 
The various clustering routing algorithms proposed by 
The application of cluster intelligence algorithms enables 
the aforementioned scholars have the potential to reduce 
the selection of cluster heads that optimize the energy 
the energy consumption of wireless sensor networks and 
consumption of the network, thereby extending its 
to extend their operational lifetime. However, there is a 
operational lifespan. 
lack of reasonable allocation methods for the election of 
Thirdly, for heterogeneous networks, there are also many 
cluster heads and the selection of path nodes from cluster 
scholars' research on heterogeneous clustering algorithm. 
heads to base stations in the process of algorithm design. 
Verma Axel and other scholars put forward the 
In this paper, a chaos mapping Osprey optimization 
ECSSEEC protocol based on enhanced cost and sub-era. 
algorithm (CM-OOA) is proposed to reduce network 
In ECSSEEC protocol, the optimal number of clusters is 
energy consumption, improve clustering efficiency and 
selected by modeling the cost function, and the 
prolong network life. Firstly, the randomness and 
previously selected cluster heads are rotated again as 
ergodicity of chaotic mapping algorithm are used to 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  175    
search for the global optimal solution. The core of 4) The power sent and received by each sensor node is 
chaotic mapping algorithm is chaotic mapping, which is controllable. 
a discrete nonlinear dynamic system that can produce 
5) All sensors have the same properties and their 
seemingly random state changes. Chaos mapping 
positions remain unchanged relative to the base station. 
algorithm can effectively search in the solution space, so 
as to find the optimal solution or near optimal solution of 
the problem. Secondly, by using the Osprey optimization 
Satellite
algorithm, the characteristics of local and global 
optimization can be well balanced. Find out all the 
optimal solutions or approximate optimal solutions to 
find the most suitable node as the cluster head, so that 
each cluster head node has the highest energy, the Internet
Base station
shortest distance to the base station, the shortest distance 
from the node to the cluster head and the more balanced 
distance between cluster heads. Finally, by comparing the 
distance from the node to the base station with the Computer 
distance from the cluster head to the base station, and the terminal
energy of the cluster head itself, the common node 
selects the cluster head node and performs the cluster 
operation. The node to base station Euclidean distance is User
less than the cluster head node to base station Euclidean Sensor node Monitoring area
distance and will be transmitted directly to the base  
station. In the inter-cluster routing stage, based on the Figure 1: Network topology model. 
FA-star heuristic search algorithm, the heuristic function 
of four factors, namely the distance from the starting 
node to the forwarding node, the distance from the 
forwarding node to the base station, the energy of the 
node and the forwarding times of the node, is optimized. 
Select the most suitable next-hop routing node from the 
neighbor nodes composed of all nodes that meet the 
conditions. For the hot spot phenomenon that may occur 
in wireless sensor networks, because some nodes directly 
transmit to the base station, and the inter-cluster 
forwarding nodes include cluster head nodes and 
ordinary nodes, the energy consumption is more balanced, 
which will not cause the nodes to die too quickly, thus 
prolonging the communication time of the emergency 
communication network. 
 
 
2  System model 
Figure 2: Emergency communication network mode. 
2.1 Communication network structure model 
There are three main communication modes in the 
The topology model of wireless sensor network adopted emergency communication network. Firstly, 
in this paper is shown in Figure 1. The simulation model Communication within a cluster. Because information 
assumes that N nodes are randomly distributed in a transmission is only between clusters, this method 
square area of M*M and all nodes are wireless sensors of consumes less energy for wireless sensor networks [17]. 
the same type. The network model is shown in Figure 2. Secondly, Communication of the same cluster head node. 
In order to accurately calculate the information of the When users are not in the same cluster head node, if 
node and ensure that the base station receives and sends communication is needed, the common node will report 
data continuously and stably, the node can independently to the superior cluster head node and communicate with 
select the appropriate transmission power according to each other through the cluster head node. Thirdly, 
the energy consumption model [15-16]. In order to avoid Communication between different cluster head nodes. 
the influence of bad weather and human factors, network When users are neither the same cluster nor the same 
nodes need to meet the following requirements: cluster head node, if communication is needed, ordinary 
1) The random dropping area M×M contains N   n odes will report to their superior’s step by step and 
sensor nodes, and the node positions after dropping are contact each other through base stations [18]. 
fixed. In the communication process of wireless sensor network, 
the third process needs the information transmission of 
2) Sensor nodes have unique and different ids. the whole wireless sensor network. User information is 
3) The base station has unlimited energy and no signal transmitted in both directions from ordinary nodes to 
interference in the area. base stations and then to ordinary nodes, so energy 
consumption is mainly concentrated in the third mode 
 
176   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
[19]. Therefore, this paper mainly studies the energy location of cluster head nodes is very important. 
consumption of the third communication mode. Attention should also be paid to the direction of data 
transmission in the process of ordinary nodes entering 
 
the cluster, but the "hot spot effect" around the base 
 station in wireless sensor networks is also the key to 
extend the network life [21]. Through formula (1), it can 
2.2 Communication energy consumption be seen that multi-hop transmission is better than 
model single-hop transmission in long-distance transmission. 
In this paper, the wireless sensor emergency However, in multi-hop transmission, in the process of 
communication network adopts the first-order wireless selecting the next hop node from the cluster head node to 
communication energy consumption model [20], which the base station, the same next hop node will be selected 
can be divided into short-distance free space model and continuously, resulting in the rapid death of the node. To 
long-distance multi-path model according to the solve these problems, it contains three main problems: 
transmission distance. The specific formulas are as 1) Does the cluster head combination affect the 
follows: (1) - (3). network energy consumption? 
𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝐾 ∗ 𝜀𝑓𝑠 ∗ 𝐿2, 𝐿 < 𝐷0 2) How to plan the direction of data transmission to 
𝐸𝑇𝑥(𝐾, 𝐿) = {       (1) 
𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝐾 ∗ 𝜀𝑚𝑝∗𝐿4, 𝐿 ≥ 𝐷0 solve the problem of “hotspot effect” where the 
center node dies quickly? 
𝜀𝑓𝑠
𝐷0 = √                              (2) 
𝜀 3) Can the multi-hop cluster head node choose the 
𝑚𝑝
same node as the forwarding node every round? 
𝐸𝑅𝑥(𝐾, 𝐿) = 𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐                      (3) In the past, many researchers did not comprehensively 
In formulas (1) - (3), 𝐸 consider the above problems from the perspective of 
𝑇𝑥 is the energy consumption for 
sending K bit data; 𝐸 three-tier network energy consumption model. Some 
𝑒𝑙𝑒𝑐  represents the energy 
consumption associated with the transmission and researchers randomly select cluster head combinations, 
reception of a single bit of data;  𝜀 which leads to the irrationality of cluster heads, and then 
𝑓𝑠 is the loss factor of 
free space model; 𝜀 leads to redundant energy consumption. Because the 
𝑚𝑝  is the energy loss factor of 
multipath attenuation model; L is the data transmission ultimate goal of data is the base station, the data 
distance; 𝐸 transmission direction can only be close to the base 
𝑅𝑥 is the energy consumption for receiving K 
bit data. station. However, most researchers do not consider the 
influence of the clustering operation process of ordinary 
 nodes on the data transmission direction, and all nodes 
are clustered. This process causes some nodes to transmit 
3  Research on energy consumption data in the opposite direction to the base station, resulting 
in energy transmitted in the opposite direction. Some 
of three-layer network 
researchers also use multi-hop in long-distance 
From the network topology diagram, we can see that the transmission, but the forwarding times of the next hop 
data acquisition and transmission stage of wireless sensor node are not considered, which can not be ignored for the 
networks can be divided into three levels: ordinary node node life. 
layer, cluster head node layer and base station layer, as Aiming at the above three problems, this section will 
shown in the Figure 3. analyze the energy consumption reasons of each layer 
network from the perspective of three-layer network 
energy consumption, and put forward a reasonable 
cluster head selection, data transmission direction 
planning, and next-hop node selection and processing 
algorithm in multi-hop mode. 
 
3.1 Reasonable cluster head combination 
In the process of selecting cluster head nodes by ordinary 
nodes, the distances from different cluster head node 
c ombinations to nodes are different. Therefore, it has an 
impact on the overall energy consumption of the network. 
Figure 3: Three-layer network model. From the energy transmission formula (1), it can be seen 
that the distance is directly proportional to the energy 
Through the formulas (1) -(3) in the network energy 
consumption. Reasonable cluster head combination can 
consumption model, the main reasons of energy 
better reduce the energy consumption of sending data due 
consumption in the three-layer network can be analyzed 
to distance. 
respectively. The data transmission of ordinary nodes is 
In the osprey optimization algorithm (OOA), the optimal 
the key to energy consumption. When the transmission 
position of individual osprey is obtained by updating the 
distance exceeds D0, the energy consumption during data 
position of individual osprey and comparing the fitting 
transmission will increase sharply, so the appropriate 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  177    
function values of each individual position. However, the 
osprey optimization algorithm is limited by its slow 
convergence speed and tendency to converge to the local 
optimal solution. Aiming at the problems of slow 
convergence speed and easy to fall into local optimal 
solution in cluster head combination selection, this 
algorithm combines population initialization process with 
K-means++ algorithm and chaotic algorithm to form 
chaotic osprey optimization algorithm (CM-OOA). The 
output of chaotic Osprey optimization algorithm is 
similar to the selection of cluster head combination in 
wireless sensor networks. As shown in Table 2, there is 
significant consistency between the characteristics of 
wireless sensor networks and the principle of chaotic 
Osprey optimization algorithm. 
Table 2: Similarity correspondence table between 
wireless sensor networks and chaotic mapping osprey 
optimization algorithm.  
WSN CM-OOA Figure 4: Clustering algorithm effect. 
3.1.2 Chaos mapping optimization 
Sensor node number Dimension position size Initializing the population by Logistic chaotic mapping 
can enhance the global search ability and help CM-OOA 
Individual position of algorithm jump out of the local optimal solution [22]. 
Node group 
osprey The randomness and unpredictability of Logistic chaotic 
mapping can prevent the algorithm from converging to 
the local optimal solution prematurely. Logistic chaotic 
Cluster head node Optimal individual mapping can adapt to different search spaces and 
combination position of osprey 
optimization problems, and has good universality. 
Therefore, Logistic chaotic mapping can be easily 
Combination of all 
All positions of osprey 
pre-selected cluster head combined with Osprey optimization algorithm to form 
population 
nodes CM-OOA algorithm, so as to make better use of their 
respective advantages to deal with the problem. Logistic 
 
chaotic mapping formula is as follows: 
Good population initialization allows the CM-OOA 
algorithm to start searching from several different initial 𝑃𝑖+1 = 𝛼 ∗ 𝑃𝑖 ∗ (1 − 𝑃𝑖)                   (6) 
starting points, which helps the algorithm to explore 
In the formula, α is the control parameter, and the value 
multiple regions of the solution space, thus increasing the 
is taken in (0,4]. Pi is the transformation of the 
likelihood of finding a globally optimal solution. If the 
coordinates of the initial population into polar angles as 
individuals in the initial population are too concentrated, 
an initial value. 
the algorithm may quickly converge to the local optimal 
solution and ignore other potentially better solutions. A The detailed process of CM-OOA algorithm: 
diverse initial population helps to avoid this. Proper 
population initialization allows the algorithm to find Step 1. Population initialization. 
better solutions at an early stage, thus speeding up the The virtual initialization of the Osprey cluster is achieved 
convergence of the whole search process. Therefore, the by means of the circular symmetric chaotic mapping 
algorithm in this paper performs the population algorithm and the K means++ clustering algorithm. 
initialization operation in two ways. 
Step 2. Initialization of osprey population based on 
3.1.1 Kmeans++ algorithm for clustering location mapping algorithm. 
Initializing the populations is an important step in the The virtual position of the initialized osprey population is 
CM-OOA algorithm, which uses the Kmeans ++ obtained, and the real node number in the wireless sensor 
clustering algorithm for clustering to obtain a more network is mapped by the Euclidean distance d from the 
accurate optimal solution. The initial population nodes node to the virtual position and the energy e of the node 
are selected by the centre position of each cluster group. itself. The osprey population is initialized to P(t) = {Pt1, 
The calculation of the cluster centre position is based on Pt2, Pt3…}, and the individual Pti position of osprey is Pti 
the application of equations (4) and (5). The effect of the = {Xi1, Xi2, Xi3…}. 
clustering algorithm is shown in Figure 4. 
 
𝑋 𝑡
𝑚 = ∑𝑖=0 𝑋⁄𝑡                          (4) 
 
𝑌𝑚 = ∑𝑡
𝑖=0 𝑌⁄𝑡                           (5)  
 
178   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
Step 3. Calculate the fitness function. 
𝑃𝐹𝑖𝑠ℎ
𝑡,𝑗𝑥 = 𝑃𝑡,𝑗𝑥 + (𝑙𝑏𝑡 + 𝑅𝑡,𝑗 ∗ (𝑢𝑏𝑡 − 𝑙𝑏𝑡))⁄𝑡      (9) 
Fi=fitness (Pi(t)) is the fitness value of individual Pi(t) of 
osprey at time t, which is used to evaluate the strength of 
solving energy consumption problems at the position of 𝑃𝐹𝑖𝑠ℎ
𝑡,𝑗𝑦 = 𝑃𝑡,𝑗𝑦 + (𝑙𝑏𝑡 + 𝑅𝑡,𝑗 ∗ (𝑢𝑏𝑡 − 𝑙𝑏𝑡))⁄𝑡      (10) 
osprey. 
In formulas (9) - (10), Rt,j is a number randomly 
Step 4. Osprey individuals look for schools of fish. 
generated between [0,1]; ubt is the upper boundary of 
Through the comparison of fitness values, the individual the dimension coordinate; lbt is the lower boundary of 
positions of osprey whose fitness values are smaller than the dimension coordinate; Pt,jx  is the X coordinate 
their own are combined as fish schools, Fish = {Pk(t)|k∈ position of osprey individual P in the J-th dimension of 
{1, 2,…N}∧Fk<Fi}∪{Pbest}. the T-th round; Pt,jy  is the Y coordinate position of 
osprey individual P in the J-th dimension of the T-th 
Step 5. Individual fishing of osprey. 
round; T is the number of rounds of population iteration. 
In Step 3, the osprey individual P looks for his own Fish 
Step 7. New osprey population position. 
school. If the Fish school fish is empty, it is considered 
that the osprey individual X has successfully caught the P(t+1) = {...} indicates a new generation of osprey 
target fish and directly goes to Step 5 for the osprey population generated by individual osprey searching for 
individual to eat fish. Otherwise, the osprey individual P fish, individual osprey catching target fish and individual 
randomly selects a target Fish in the fish school fish for osprey eating fish [24]. 
fishing operation. Because the node coordinates are 
Step 8. Algorithm termination condition. 
two-dimensional, the coordinates X and Y are calculated 
separately for fishing operation. The formula of specific The algorithm repeats the operations from Step 3 to Step 
fishing process is as follows: 7 until the maximum number of iterations is reached. 
𝑃𝐹𝑖𝑠ℎ
𝑡,𝑗𝑥 = 𝑃𝑡,𝑗𝑥 + 𝑅𝑡,𝑗 ∗ (𝑃𝐹𝑖𝑠ℎ𝑥 − 𝐼𝑡,𝑗 ∗ 𝑃𝑡,𝑗𝑥)     (7) Step 9. Algorithm output. 
The optimal individual fitness value and osprey position 
𝑃𝐹𝑖𝑠ℎ
𝑡,𝑗𝑦 = 𝑃𝑡,𝑗𝑦 + 𝑅𝑡,𝑗 ∗ (𝑃𝐹𝑖𝑠ℎ𝑦 − 𝐼𝑡,𝑗 ∗ 𝑃𝑡,𝑗𝑦)     (8) are obtained, that is, the optimal cluster head 
combination. 
In formulas (7) - (8), Rt,j  is a number randomly 
generated between [0,1]; It,j  is randomly selected Step 10. Ordinary nodes into the cluster operation. 
between 1 and 2; Pt,jx is the X coordinate position of 
By calculating the fitness function of clustering and 
osprey individual P in the J-th dimension of the T-th 
comparing and selecting, ordinary nodes select the 
round; PFishx is the x coordinate position of the target 
optimal clustering node for clustering operation. 
fish x of the osprey individual p; Pt,jy is the Y coordinate 
position of osprey individual P in the J-th dimension of Step 11. Data transfer operation. 
the T-th round; PFishy is the y coordinate position of the 
target fish x of the osprey individual p. The initial node selects the jump node through heuristic 
search FA-star algorithm for data transmission. 
Step 6. Osprey individuals eat fish. 
Firstly, CM-OOA algorithm iteratively updates the 
In Step 4, the position of individual P of osprey changes. individual position of the osprey population, compares 
First, the process Step 2 Location Mapping maps the the individual position fitness values and selects the 
position of the individual osprey to the network node smallest individual fitness value, that is, the optimal 
number [23]. Secondly, the fitness value before and after individual position of the osprey. Secondly, ordinary 
the change of osprey individual is judged to judge nodes enter the cluster by calculating the fitness value of 
whether the osprey individual catches the target fish. the cluster head. Finally, the improved FA-star algorithm 
Finally, if the fitness value is greater than that before the is used for inter-cluster routing transmission. The 
change, the osprey individual P fails to catch the target detailed flow of CM-OOA algorithm is shown in Figure 
fish successfully, and the process is directly carried out in 5. 
Step 6. Otherwise, the osprey individual P successfully 
catches the target fish and eats the fish. Because the node 
coordinates are two-dimensional, the coordinates x and y 
are calculated separately for fish eating operation. The 
specific formula for eating fish is as follows： 
 
 
 
 
 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  179    
E 2
1 =  𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝐾 ∗ 𝜀𝑓𝑠 ∗ 𝐿1               (11) 
Start E2 = 2 ∗  𝐾 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝐾 ∗ 𝜀𝑓𝑠 ∗ (𝐿2
2 + 𝐿2
3)      (12) 
Kmeans++ clustering CChhaos optimizattiioonn  
algorithm aallgorithm L2
Initializing osprey 
L3
population position L1
Calculate the fitness 
of osprey base station
Osprey individuals node
look for fish.
Cluster head
N
Are there any
 fish?  
Y Figure 6: Data flow direction comparison chart. 
Osprey catches fish 
individually In order to solve the problem of data transmission 
direction, the algorithm is based on the mathematical 
Osprey position 
midline theorem. As shown in Figure 7, when the 
update
common node is located in the midline of the link from 
N the cluster head node to the base station, the distances 
Is it successful from the common node to the cluster head and the base 
 to catch fish?
station are equal [25]. Therefore, when d3>d1, the 
Y
Osprey eat fish common node will perform the cluster head selection 
individually. operation.CM-OOA algorithm reduces energy 
consumption by preventing nodes from transmitting far 
Osprey position away from the base station. As shown in Figure 8, the 
update cluster head node CH2 will be selected first when the 
distance d3 from the common node to the base station is 
New osprey 
greater than the distance d2 from the cluster head node 
population position
CH2 to the base station. Although it can be seen from the 
Y figure that the distance d4 from the common node to the 
t < Tmax
cluster head node CH1 is smaller than the distance d5 
N from the cluster head node CH2, the cluster head node 
Optimal osprey {CH2…} of the common node will be pre-selected with 
position less energy. If the pre-selected cluster head set is empty, 
data is directly transmitted to the base station. 
End
 Ordinary node
Figure 5: Flow chart of selecting cluster head by 
CM-OOA algorithm. Cluster head node CH
 base station
3.2 Planning of data transmission direction d1 d3
In wireless sensor networks, all nodes are clustered, so 
when the nodes are close to the base station, they will 
still be clustered. As shown in Figure 6, it causes the 
problem that the node data is transmitted outward first 
and then inward. From the energy consumption model, it 
can be calculated that the energy consumption of all  
nodes in data clustering is E2, while the energy 
consumption of nodes in direct transmission is E1. From Figure 7: Median line 
formulas (11) - (12), it can be concluded that the direct 
transmission mode of some nodes has low energy 
consumption. 
 
 
 
 
180   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
4.1 Cluster head selection 
Ordinary node
CH In the process of cluster head combination selection, 
1
firstly, the number of cluster head nodes in the network is 
d4 Cluster head node CH calculated and the CM-OOA algorithm population is 
d d initialized. The virtual nodes are output by CM-OOA 
1 d 5
3 base station
algorithm, and the virtual nodes are mapped to the real 
CH2
d network to output the real and reasonable cluster head 
2
combination. 
4.1.1 Size of optimal number of cluster heads 
 The energy consumption of nodes is an important factor 
Figure 8: Cluster head selection model of common affecting the communication time of emergency 
nodes. communication network, and the number of cluster heads 
 plays a vital role in the whole network [28-29]. The main 
3.3 Best next hop node consumption of emergency communication is divided 
into ordinary nodes transmitting cluster head Ept, 
According to the energy consumption model, ordinary nodes directly transmitting base station Ecp, 
transmission energy consumption is directly proportional cluster head nodes receiving intra-cluster node data Ecn, 
to the square of distance, and data transmission is the cluster head nodes fusing data Er, and cluster head nodes 
main energy consumption of wireless sensor networks. sending data to base station Ecj. The nodes deployed in 
According to the geometric cosine theorem and the the a×a model are evenly distributed, and (N-n) nodes are 
first-order radio network energy model [26]. Therefore, evenly distributed in KN circular clusters, and n nodes 
the algorithm in this paper adopts multi-hop mode for are directly transmitted to the base station, so the energy 
data transmission. consumption for one round of network transmission is: 
𝐸𝐴𝐿𝐿 = KN ∗ (𝐸𝑝𝑡 + 𝐸𝑐𝑛 + 𝐸𝑟 + 𝐸𝑐𝑗) + 𝐸𝑐𝑝       (13) 
base station
Neighboring node The energy consumption of common nodes in each 
N cluster is: 
d Starting node
1 L1
d 𝐸𝑝𝑡 = (𝑘 ∗ 𝐸 2 𝑁−𝑛
d d
5 𝑒𝑙𝑒𝑐 + 𝑘 ∗ 𝜀𝑓𝑠 ∗ 𝑑𝑐𝑛𝑡𝑜𝐶𝐻) ∗ ( − 1)  (14) 
3 𝐾𝑁
2
The energy consumption of ordinary nodes directly 
d4 transmitting base stations is: 
L2
 𝐸𝑐𝑝 = (𝑘 ∗ 𝐸 2
𝑒𝑙𝑒𝑐 + 𝑘 ∗ 𝜀𝑓𝑠 ∗ 𝑑𝑐𝑛𝑡𝑜𝐶𝐻) ∗ 𝑛       (15) 
Figure 9: Neighbor node selection model. 
The cluster head node receives the energy consumption 
In the process of multi-hop data transmission, FA-star of nodes in the cluster as follows: 
algorithm and heuristic search are used to select the 
transmission path. The destination base station is reached 𝑁−𝑛
𝐸
by finding the minimum cost of the path [27]. In this 𝑐𝑛 = 𝑘 ∗ 𝐸𝑒𝑙𝑒𝑐 ( − 1)                (16) 
𝐾𝑁
algorithm, the neighbor nodes are selected in the same 
The energy consumption of nodes in cluster head node 
way as the cluster heads in the clustering algorithm. The 
fusion cluster is: 
Euclidean distance d1 from the start node N to the 
neighboring node is less than the Euclidean distance d3 
𝑁−𝑛
from the start node to the base station, so the neighboring 𝐸𝑟 = 𝑘 ∗ 𝐸𝐷𝐴 ( )                   (17) 
𝐾𝑁
node chooses L1. As shown in Figure 9, the starting node 
n of the neighboring node {L1...} directly transmits if the The energy consumption transmitted from the cluster 
neighboring node is empty. head node to the base station is: 
 𝐸 2
𝑐𝑗 = 𝑘 ∗ 𝐸𝑒𝑙𝑒𝑐 + 𝑘 ∗ 𝜀𝑓𝑠 ∗ 𝑑𝐶𝐻𝑡𝑜𝐵𝑆             (18) 
4  Design of CM-OOA algorithm 
In the formula, dCHtoBS is the distance from the cluster 
In this paper, the energy consumption of three-layer head node to the base station. 
network is analyzed in detail, and the clustering 
The distance from the common node to the cluster head 
algorithm of CM-OOA network is proposed by 
node in each cluster is: 
combining the chaotic Osprey optimization cluster head 
combination selection algorithm with the data 
𝑎2
transmission direction and the best next hop strategy. The 𝑑𝑐𝑛𝑡𝑜𝐶𝐻 = √𝜌 ∗ ∬(𝑥2 + 𝑦2) 𝑑𝑥 𝑑𝑦 =      (19) 
√2𝜋∗𝐾𝑁
algorithm is divided into cluster head selection stage, 
cluster establishment stage and data transmission stage. Sorting out the above equations (13) - (19), calculating 
The algorithm flow chart is shown in the Figure 10. the value of KN when EALL is minimized by deriving 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  181    
the overall energy consumption of the network in one 
𝑁∗𝜀𝑓𝑠∗𝑎2
round, and obtaining the optimal number of cluster heads KN = √                 
2𝜋∗(𝜀𝑓𝑠∗𝑑2 (20) 
𝐶𝐻𝑡𝑜𝐵𝑆−𝐸𝑒𝑙𝑒𝑐)
KN as follows: 
 
Start
Kmeans++ clustering CChhaos  optimizattiioonn  
algorithm aallgorithm
Initializing osprey Cluster Head 
population position Selection
Calculate the fitness 
of osprey
Osprey individuals 
look for fish.
N
Are there any fish?
Y
Osprey catches fish 
individually Ordinary Distance N
node into comparison
clusters
Osprey position 
update Y
Cluster head 
comparison fitness Transmission to 
N function value the base station
Is it successful to 
catch fish?
Ordinary nodes select 
Y cluster heads to form 
clusters
Osprey eat fish 
individually. Node 
data Distance comparison
transfer 
Osprey position based on 
update Astar 
algorith Election of 
m neighbouring nodes
New osprey 
population position Calculate neighbour 
node fitness value
Y
t < Tmax Selection of the next 
hop node to transmit 
N to the base station
Optimal osprey 
position
End
 
Figure 10: Flow chart of CM-OOA algorithm. 
 
4.1.2 Population initialisation       Calculate the polar angle from the node to the base station 
      Obtaining the polar angle and converting the polar angle into the 
In order to avoid the problems of slow convergence 
speed and easy to fall into local optimal solution of initial value of the mapping. 
Osprey optimization algorithm, this algorithm maps the 
      Using Logistic mapping, Equation (9)-(10) introduces chaotic 
initial population by chaos. The detailed flow of the 
chaotic mapping algorithm is shown in Algorithm 1. characteristics. 
       The chaotic characteristics are inversely transformed to obtain a 
new polar angle. 
Algorithm 1: Pseudo-code of circular symmetric chaotic 
mapping algorithm.       Obtain a new polar angle and calculate the virtual coordinates in 
 the mapped rectangular coordinate system. 
Algorithm 1: Initialization of osprey population by chaotic mapping End 
algorithm 
 
Begin: 
 
182   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
4.1.3 Location mapping reciprocal is smaller, and the node can forward data 
better under the same conditions, it should be selected as 
The coordinates of nodes in wireless sensor networks are 
the cluster head. 
all random, but the coordinate positions change randomly 
after CM-OOA algorithm. After the coordinate of 
CM-OOA algorithm is transformed separately from the X 𝐹1 = 1⁄𝐸𝑖                          (22) 
axis and the Y axis, there may be no real node at this The level of Euclidean distance between cluster heads: 
coordinate [30]. Therefore, CM-OOA algorithm designs the reciprocal of the sum of distances between cluster 
a position mapping function through the Euclidean head nodes. The location of the cluster head determines 
distance to the virtual position and the energy of the real the distance of data transmission by the nodes entering 
node, and maps the coordinates at the virtual position to the cluster. Cluster heads should be evenly dispersed to 
the nodes in the actual coordinate space through the reach the distance of all nodes. 
position. The location mapping formula is as follows: 
𝐹2 = 1⁄∑ 𝑑𝑖𝑠(𝐶𝐻𝑖, 𝐶𝐻𝑗)                 (23) 
F = 𝜃1 ∗ 𝑑 + 𝜃2 ∗ 𝐸                  (21) 
Euclidean distance between cluster heads and nodes: the 
In the formula, d is the Euclidean distance from the sum of the distances from cluster head node to all nodes. 
virtual position to the node; E is the energy of the node; Energy consumption in network cycle mainly comes 
θ1 and θ2 are weight factors and satisfy θ1 + θ2 =1. from node transmission. The sum of the positions of all 
nodes in the cluster is the smallest, so as to minimize the 
The detailed flow of the location mapping algorithm is 
energy consumption of data transmission. 
shown in Algorithm 2. 
Algorithm 2: Pseudo-code of position mapping 𝐹3 = ∑ 𝑑𝑖𝑠(𝑁𝑗 , 𝐶𝐻𝑖)                     (24) 
algorithm. 
Euclidean distance from the cluster head to the base 
Algorithm 2: The virtual position is projected to the real node 
station: the sum of distances from all cluster head nodes 
through the mapping function. to base station BS. The transmission of cluster head node 
is the energy consumption of the second part of the 
Begin: 
network cycle, and the distance from cluster head node to 
     Calculating Euclidean distance d from all nodes to virtual base station determines the energy consumption of 
position coordinates cluster head node [33]. Therefore, the sum of the 
distances from the cluster head node to the base station is 
     Obtain the energy e of all nodes themselves. the smallest, and the information can be transmitted to 
     Calculate the position mapping function by formula the base station with the least energy consumption. 
(4)-(5) 
𝐹4 = ∑ 𝑑𝑖𝑠( 𝐶𝐻𝑖 , 𝐵𝑆)                    (25) 
     By comparing the function values, the node numbers of 
The variance of the Euclidean distance from the cluster 
virtual coordinates projected into the real network are head to the base station: variance of distance from all 
selected. cluster heads to base station. Because there is more than 
one cluster head node, and only the sum of the distances 
End 
from the cluster head to the base station is kept to the 
minimum, there may be a very long distance between a 
4.1.4 Design CM-OOA algorithm adaptation function 
node and the base station. Therefore, by adding the 
In order to optimize the selection of cluster heads and variance of the distance from the cluster head to the base 
improve the life cycle of the network, after determining station to control the distance between the cluster head 
the optimal number of cluster heads, the fitness function and the base station, all the distances from the cluster 
is set according to the state of nodes and the position of head to the base station can be better kept to be the 
pre-selected cluster heads [31]. The cluster head node is minimum. 
responsible for the data forwarding of ordinary nodes. 
Therefore, the selection of cluster head should have the 𝐹5 = 𝑉𝑎𝑟(∑ 𝑑𝑖𝑠( 𝐶𝐻𝑖, 𝐵𝑆))                (26) 
characteristics of high energy, reasonable location and 
less times of becoming a cluster head. The fitness The variance of the cluster head to cluster head 
function of CM-OOA algorithm is designed from the Euclidean distance: variance of cluster head to cluster 
following six aspects: the energy of nodes, the distance head distance [34]. There is more than one cluster head 
between cluster heads, the distance between cluster heads node, so it is necessary to prevent the distance between 
and each node, the distance from cluster heads to base them from appearing some deviations that are very close 
stations, the variance of the distance from cluster heads and some are very far away. Therefore, by adding the 
to base stations, and the variance of the distance from variance of the distance from cluster head to cluster head 
cluster heads to cluster heads. to control the gap between cluster heads, the distribution 
of all cluster heads can be better maintained and more 
The energy level of the node itself: the reciprocal of the reasonable. 
remaining energy of the current node. The cluster head 
node is the key condition to support the network 𝐹6 = 𝑉𝑎𝑟(∑ 𝑑𝑖𝑠(𝐶𝐻𝑖, 𝐶𝐻𝑗))              (27) 
operation [32]. If the energy of the node is higher, the Based on the energy of nodes, the distance between 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  183    
cluster heads, the distance from cluster heads to nodes, End 
the distance from cluster heads to base stations, the 
variance of the distance from cluster heads to base  
stations and the variance of the distance from cluster 
heads to cluster heads, the fitness function is designed by 4.2 Cluster establish stage 
weight control: In the stage of cluster establishment, in order to prevent 
reverse data transmission, the ordinary nodes of the base 
𝐹𝑖𝑡𝑛𝑒𝑠𝑠 = 𝛼1 ∗ 𝐹1 + 𝛼2 ∗ 𝐹2 + 𝛼3 ∗ 𝐹3 + 𝛼4 ∗ 𝐹4 + 𝛼5 ∗ 𝐹5 + station first judge whether to enter the cluster or not. 
Some nodes directly transmit data to the base station to 
𝛼6 ∗ 𝐹6                        (28) reduce the influence of "hot spot effect" in the network. 
By comparing the fitness values of cluster head nodes, 
In the formul hierarchical analysis method to calculate 
the appropriate cluster head nodes are selected. The 
the weightsa, α1, α2, α3, α4 and α5  are weight factors 
clustering algorithm is shown in Algorithm 4. 
and satisfy ∑ αi = 1. 
Through the fitness function value of cluster head nodes, 
According to the improved fitness function, the fitness 
the preselected cluster head with the minimum value is 
functions of all osprey individuals are calculated, and the 
compared. That is, the cluster head node of ordinary 
optimal position of osprey individuals is selected. The 
nodes. The fitness function of this algorithm is as 
algorithm flow is shown in Algorithm 3. 
follows: 
Algorithm 3: Pseudo-code of cluster head node selection 
algorithm. F = 𝛽1 ∗ E + 𝛽2 ∗ dis(N, CH)            (29)                
Algorithm 3: Select the cluster head according to the In the formula, dis (N, CH) denotes the Euclidean 
improved fitness function. distance from the common node to the head node of the 
 
pre-selected cluster, and E represents the energy of the 
Begin: pre-selected cluster head node in the current round. β1=0.4 
  Initializing a network node to obtain the initialized osprey and β2=0.6 are weight factors and satisfy β1+β2=1. 
population position. Algorithm 4: Ordinary node cluster. 
      Calculate the fitness value of the osprey individual, Algorithm 4: Network node cluster establishment. 
and keep the individual position and fitness value with Begin: 
the minimum fitness value.    Obtaining cluster head node set from algorithm 3. 
      While t < tmax do If  Ordinary nodes satisfy the selection of cluster head nodes. 
          By comparing the fitness values of osprey If Meet the conditions of pre-selecting cluster heads for 
population, the individual fish school of osprey is ordinary nodes. 
generated.         The cluster head is put into the reselected cluster head 
          All osprey individuals began to fish. set. 
          Position mapping of osprey individuals after     Else. 
fishing         Ordinary nodes are put into the set of direct 
          Update osprey position transmission base stations. 
       if  Fitness value of osprey position before fishing >     End. 
Fitness value of osprey position after fishing Else 
          After successful fishing, osprey individuals began Nodes directly transmit data to the base station without 
to eat fish. joining the cluster. 
          Position mapping of osprey individuals after End 
eating fish     Calculate the fitness value of the pre-selected cluster head. 
Ordinary nodes select the cluster head node and perform 
end 
cluster entry operation. 
          Update The location of the new osprey population 
End. 
          Update Individual osprey with minimum fitness 
value and fitness value  
          t = t + 1 4.3 Node data transfer based on FA-star 
algorithm 
          Return The position and fitness value of osprey 
with minimum fitness value The network data transmission process in this paper 
adopts multi-hop mode. The heuristic function of Astar 
 
184   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
algorithm is optimized by the energy of nodes and the energy consumption of the network system, the number 
forwarding times of nodes, which avoids the problem of of dead nodes and the number of surviving nodes. The 
selecting nodes with the same next hop continuously. energy consumption of data fusion process is neglected, 
Select the most suitable data transmission path. The because the communication mode is two-way, and only 
clustering algorithm is shown in Algorithm 5. one communication direction is calculated for the 
convenience of calculating energy consumption. Because 
Through the heuristic function of neighbor nodes, the 
of the close distance between users and nodes, the energy 
neighbor nodes with the minimum value are compared. 
consumption is negligible [35]. An 800m×800m 
That is, the next hop node of the starting node. The 
experimental simulation area is drawn and 100 X-axis 
heuristic function of FA-star algorithm is as follows: 
coordinates and 100 Y-axis coordinates are randomly 
generated to combine into 100 nodes, and the base station 
F = 𝛾1 ∗ 𝐸 + 𝛾2 ∗ 𝑑𝑖𝑠(𝑁, 𝐿) + 𝛾3 ∗ 𝑑𝑖𝑠(𝐿, 𝐵𝑆) + 𝛾4 ∗ 𝐺   is located in the center of the area. From formula (20), 
the optimal number of cluster heads is KN = 0.04 * n. 
(30) The specific parameters are shown in Table 3. 
In the formula, E represents the energy of neighboring Table 3: Experimental parameter table. 
nodes, G represents the forwarding times of neighboring 
nodes, dis(N,L) represents the distance from the starting Parameter Numerical value 
node to neighboring nodes, and dis(L,BS) represents the 
distance from neighboring nodes to the base station. γ1, Number of network nodes 100 
γ2, γ3 and γ4 are the weight influencing factors and 
satisfy γ1+γ2+γ3+γ4=1. Network area size 800m×800m 
 
Algorithm 5: Network data transmission Base station coordinate position (100,100) 
Begin:  Energy loss coefficient of free 
10 Pj/bit/m2 
    The cluster head node set and the direct transmission space model 
set in the acquisition algorithm 3 are merged into the 
Energy loss coefficient of 
initial node set. 0.0013 Pj/bit/m2 
multipath attenuation model 
    while Starting node ≠ base station. 
       If  Meet the condition of neighbor nodes Node initial energy 4 J 
          Ordinary nodes are put into the set of 
Number of networks running 
neighboring nodes. 1000 rounds 
rounds 
       End. 
 
If  Neighbor node set is empty. 
5.2 Analysis of energy change of emergency 
The originating node sends data directly to the 
communication system 
base station. 
The residual energy of wireless sensor network system 
       Else. reflects the life cycle of emergency communication 
           Calculate the heuristic function comparison network [36-37]. The more residual energy, the longer 
the communication time of emergency communication 
function value of neighboring nodes, select network. The network energy of the four algorithms 
the next hop node and transmit data. changes as a whole, as shown in Figure 11.  
       End. 
    End.   
End. 
 
5  Experimental simulation analysis 
5.1 Experimental parameters 
In order to examine the simulation effectiveness of 
CM-OOA algorithm in extending the network life cycle, 
the algorithms are compared and analyzed on MATLAB  
Figure 11: Changes of residual energy in emergency 
R2023b platform. The advantages of the basic algorithm 
communication network. 
LEACH algorithm and the latest cluster classification 
algorithms PSO-C algorithm, CGWOA algorithm and the  
CM-OOA algorithm in this paper are verified in terms of 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  185    
Of the 1000 rounds of energy consumption, the LEACH Although CGWOA algorithm appears dead nodes later 
algorithm consumed all of its energy in the 250th round, than PSO-C algorithm, the localised death rate is faster, 
the PSO-C algorithm had 23% of its energy remaining which should not be ignored. In contrast, the CGWOA 
after the 1000th round, the CGWOA algorithm had 35% algorithm has better overall changes than the PSO-C and 
of its energy remaining after the 1000th round, the LEACH algorithms, and grows much slower than the 
CM-OOA algorithm still had 43% of its energy in the CM-OOA algorithm. The CM-OOA algorithm's dead 
1000th round, and it consumed it slower than the other nodes grow relatively slowly, with only 13% of dead 
algorithms from the 0th to 1000th rounds. It can be seen nodes after 1,000 rounds. The CM-OOA algorithm 
that, compared with other algorithms, the CM-OOA balances the network's overall energy consumption, 
algorithm selects the optimal cluster head through the spreads out the energy loss to all the nodes, and prevents 
chaotic mapping osprey optimisation algorithm, taking the nodes from localised death and extends the duration 
node energy and node transmission distance as the main of emergency communication. 
factors, and the variance of the distance from the cluster 
 
head to the base station and the variance of the distances 
between the cluster heads as the auxiliary factors, and 5.4 Analysis of changes in the number of 
performs the cluster selection based on the information, surviving nodes in the network 
and performs the cluster selection based on the distance 
of the information transmission and the node energy in When emergency communication wireless sensor 
the clustering stage, instead of using a single inter-cluster network nodes are used in dangerous processes such as 
distance as weights, instead of using node energy and emergency rescue and disaster relief survey, they will not 
number of node forwards, distance from start node to be replaced frequently, and at the same time, they are 
neighbouring nodes and distance from neighbouring limited by the energy of nodes [39]. Therefore, for the 
nodes to base station. The FA-star algorithm with same environment, the more nodes survive, the fewer 
heuristic function can better reduce the energy dead nodes, and the longer the communication time. The 
consumption and extend the life cycle of emergency number of surviving nodes of the four algorithms varies 
communication network. from 0 to 1000 rounds, as shown in Figure 13: 
 
5.3 Analysis of the number of dead nodes in 
the network 
The number of dead nodes in wireless sensor networks 
reflects the overall stability of the network. The more 
dead nodes, the greater the impact on the overall 
emergency communication network, the smaller the 
coverage area and the faster the death rate [38]. The 
number of dead nodes of the four algorithms changes, as 
shown in Figure 12: 
 
Figure 13: Changes in the number of surviving nodes in 
communication networks. 
After 1,000 rounds of energy consumption in the 
emergency communication network, it can be seen from 
Figure 13 that the nodes of the LEACH algorithm are 
almost all dead after 300 rounds, and the nodes of the 
PSO-C algorithm have 33% active nodes remaining after 
1,000 rounds [40]. The CGWOA algorithm, after 
experiencing a slow decline, slowly tends to be stable 
after 550 rounds, until only 50% active nodes remain 
after 1,000 rounds. After 1000 rounds, the CM-OOA 
 algorithm still has 8% nodes, which improves the time of 
information communication, good stability, suitable for 
Figure 12: Changes in the number of dead nodes in information data collection in special environments, and 
communication networks. gives full play to the optimisation ability of the CM-OOA 
algorithm. The improvement of fitting function further 
In Figure 12, the various algorithms start to show dead 
optimises the accuracy and efficiency of cluster head 
nodes after 30 rounds, the LEACH algorithm clearly 
election. FA-star algorithm reduces the energy 
shows dead nodes after about 35 rounds, and almost all 
consumption of cluster heads in inter-cluster route 
nodes die after 300 rounds, while the PSO-C algorithm's 
construction, avoids the premature death of cluster heads, 
rate of dead nodes grows faster.  
and gives full play to the sensor's ability to transmit 
 information in the whole network. 
 
186   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
5.5 Comparative analysis of node data 
transmission delay 
Another key criterion is the network transmission delay. 
This is highly dependent on the distance between nodes 
in the transmission path. In the same experimental setting, 
this paper compares the network delay by the average 
transmission distance of the network of nodes. The 
average transmission distance of four algorithms from 0 
to 1000 rounds of data transmission is shown in Figure 
14. 
 
Figure 15: A comparison of the average transmission 
distance of nodes is presented. 
A comparison of the average transmission distance for 
each 100-round interval in Figure 15 reveals that the 
average transmission distance of the CM-OOA algorithm 
is 80% less than that of the CGWOA protocol. 
Furthermore, the average transmission distance of the 
CGWOA protocol is 90% less in the initial stages and 30% 
less in the subsequent stages than that of the PSO-C 
protocol. The transmission distance of the LEACH 
protocol is zero due to the death of all nodes after 400 
 rounds. 
Figure 14: The average variation in node transmission 
 
distance per round. 
5.6 Comparison of results of surviving nodes 
Figure 14 shows that the average transmission distance 
in areas of different sizes 
of Leach protocol is greater than the other protocols. In 
contrast, the CM-OOA algorithm has the lowest Equation (2) with the data from the experimental 
transmission distance profile and the average environment allows the calculation of the thresholds for 
transmission distance is lower than the other protocols. the two types of communication, and the number of 
The comparison of the average transmission distance in surviving nodes after 0, 500 and 1000 rounds of data  
every 100 rounds is presented in Figure 15. 
 
Table 4: Comparison of the number of surviving nodes in different rounds. 
Area size         r o u n  d  LEACH CGWOA CM-OOA PSO-C 
0r 100 100 100 100 
1000*1000 500r 0 45 83 30 
1000r 0 35 61 17 
0r 100 100 100 100 
800*800 500r 0 60 93 55 
1000r 0 50 88 32 
0r 100 100 100 100 
600*600 500r 28 76 97 81 
1000r 0 63 90 61 
  
transmission is comparatively analyzed for three different cause of this phenomenon is the rise in the average 
geographical regions: a 1000*1000 area (characterised by number of hops traversed by data packets on their 
a high percentage of multi-path fading communication transmission path, coupled with the expansion of the 
methodology), an 800*800 area (where the percentage of distance between nodes within a cluster. This results in 
both communication methodologies is approximately an exponential growth in the energy expenditure 
equal) and a 600*600 area (where the percentage of free associated with data transmission. 
space communication methodology is high). The results The number of nodes directly reflects the life cycle of the 
are presented in tabular form in Table 4. network. In larger area networks, cluster heads further 
As illustrated in Table 4, the expansion of the working away from the base station die quickly. When there are 
area of the wireless sensor network is associated with a 100 nodes in a 600*600 area, the CM-OOA algorithm 
reduction in the network's overall life cycle. The primary  has 90 surviving nodes after 1000 rounds of data 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  187    
transmission, which is a 27% improvement in the number the distance to clusters, the frequency of base station 
of surviving nodes compared to the CGWOA algorithm heads and cluster heads, etc. CM-OOA algorithm is used 
and a 29% improvement in the number of surviving to update the population and select the best individual 
nodes compared to the PSO-C algorithm. When the based on the fitness value, which has the advantage of 
number of nodes in 800*800 area is 100, after 500 global search convergence and balance the consumption 
rounds of data transmission, the number of surviving of network energy in each cluster. In the inter-cluster 
nodes of LEACH algorithm, CGWOA algorithm, routing stage of communication, the heuristic function 
CM-OOA algorithm and PSO-C algorithm decreases by based on Astar algorithm is used to reduce the 
28%, 16%, 4% and 26% respectively compared to that of consumption of energy cluster head nodes and alleviate 
600*600 area, and the number of surviving nodes of the hot spot effect. The analysis results show that the 
CM-OOA and CGWOA algorithms are relatively stable. algorithm reduces the node mortality and the maximum 
However, in the 1000*1000 region, the CM-OOA number of surviving nodes in the whole energy 
algorithm only reduces the number of surviving nodes by consumption network, which effectively improves a part 
39% after 1000 rounds of data transmission. In addition, of the life cycle network. 
the CM-OOA algorithm has 61 surviving nodes after In this manuscript, the CM-OOA algorithm only 
1000 rounds of data transmission in the 1000*1000 range, considers the energy consumption of emergency 
which is a 26% increase in the number of surviving communication and does not consider network security. 
nodes compared to the CGWOA algorithm. In the In the next step, we will continue to optimize the 
CM-OOA algorithm, the central selection of cluster head algorithm and optimize the security of this algorithm as 
nodes and the use of multi-hop transmission further much as possible. Prevent malicious attacks on nodes, 
prolong the network life cycle. Therefore, the CM-OOA which can cause energy consumption and data theft of 
algorithm has the longest network life cycle, which nodes, so that network information security is guaranteed 
proves that the scalability and stability of the CM-OOA to a certain extent. The algorithm is combined with the 
algorithm is much better than the other four algorithms. practical situation and applied in the real emergency 
communication network. 
 
 
6  Conclusion Availability of data and materials 
In this manuscript, an optimization algorithm based on 
This paper proposes an emergency communication 
chaotic mapping osprey optimization is proposed to 
algorithm for wireless sensor networks based on chaos 
prolong the duration of emergency communication by 
mapping and osprey optimization. The specific 
reducing energy consumption. The fitness function is 
information of the paper can be exchanged with the 
improved by node energy, the distance between cluster 
author. 
heads, the distance from cluster heads to nodes, the 
distance from cluster heads to base stations, the variance  
of the distance from cluster heads to base stations and the Conflict of interest 
variance of the distance between cluster heads. CM-OOA 
algorithm updates the position of the best individual The authors confirm that the content of this article has no 
based on fitness value, giving full play to the advantages conflict of interest. 
of global search and convergence, and balancing the 
network energy consumption in each cluster. In the  
inter-cluster routing communication stage, FA-star Acknowledgement 
algorithm based on heuristic function is used to reduce 
the energy consumption of cluster head nodes.  This research study is supported by the Smart Teaching 
The location of cluster head node in this algorithm is Special Project for Undergraduate Institutions in Henan 
more reasonable and the energy consumption of data path Province, the General Project of Education Science 
transmission is lower. Compared with LEACH, PSO-C Planning in Henan Province (Research on Software 
and CGWOA. Through the comparative analysis of the Engineering Talent Training Mode under the Integration 
results, the energy consumption of the whole network is of New Engineering and OBE Concept, 2023YB0174), 
reduced. The number of surviving nodes in the network the Undergraduate Industry Education Integration 
is the largest, which effectively improves the life cycle of Research Project in Henan Province, the Graduate 
the emergency communication network. Education Reform Project in Henan Province 
 (2023SJGLX300Y), the New Engineering and New 
 Format Textbook Project for Undergraduate Institutions 
 in Henan Province, the Graduate Education Reform and 
Quality Improvement Project of Nanyang Normal 
7  Discussion University (2023ZLGC06), and the Research Projects of 
In this study, the energy consumption of wireless sensor Nanyang Normal University (2025STP009, 2025STP01 
networks is deeply discussed through the three-layer 0). 
network model, and an energy-efficient clustering  
algorithm based on Osprey optimization and heuristic 
path is proposed. The osprey optimization algorithm can  
improve the energy of nodes, the distance between  
cluster heads, the distance from cluster heads to nodes, 
 
188   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
References Algorithm-based approach". in Proc. 14th 
International Conference on Computing 
[1] Kapoor Leena Kohli et al. (2023)., "Satellite Wi-Fi 
Communication and Networking Technologies. 
Terminal for Post-Disaster Emergency 
https://dx.doi.org/10.1109/ICCCNT56998.2023.103
Communication Management". in Proc. 2023 
07636 
International Conference on Computer, Electrical 
[11] Muntather et al. (2024)., "Chaotic Grey Wolf 
and Communication Engineering. 
Optimization for Energy-Efficient Clustering and 
https://dx.doi.org/10.1109/ICCECE51049.2023.100
Routing in Wireless Sensor Networks". in Proc. 2nd 
85637 
International Conference on Integrated Circuits and 
[2] K. Viswavardhan Reddy and N. Kumar (2021)., 
Communication Systems. 
"SNR based Energy Efficient Communication 
https://dx.doi.org/10.1109/ICICACS60521.2024.10
Protocol for Emergency Applications in WBAN", 
499088 
International Journal of Advanced Computer 
[12] V.Akshay, K. Sunil, G. Prateek Raj, R. Tarique and 
Science and Applications, vol. 12, no. 9, pp. 
K. Arvind (2023)., "Enhanced Cost and Sub-epoch 
268-275. 
Based Stable Energy-Efficient Clustering Algorithm 
https://dx.doi.org/10.0.56.233/IJACSA.2021.01209
for Heterogeneous Wireless Sensor Networks", Wire. 
30 
Pers. Comm., vol. 131, no. 4, pp. 3053-3072. 
[3] Al Aghbari Zaher, Pravija Raj. P. V., Mostafa 
https://dx.doi.org/10.0.3.239/s11277-023-10601-2 
Reham R. and Khedr Ahmed M (2024)., 
[13] D. Rahul and D. Mond (2024)., "Cluster head 
"iCapS-MS: an improved Capuchin Search 
selection and malicious node detection using 
Algorithm-based mobile-sink sojourn location 
largescale energy-aware trust optimization 
optimization and data collection scheme for 
algorithm for HWSN", J. of Reli. Inte. Envi., vol. 10, 
Wireless Sensor Networks", Neural Computing and 
no. 1, pp.55-71. 
Applications, vol. 36, no. 15, pp. 8501-8517.  
https://dx.doi.org/10.0.3.239/s40860-022-00200-6 
https://dx.doi.org/10.0.3.239/s00521-024-09520-5 
[14] Pal. Raju, S. Mukesh, K. Sandeep, N. Anand and R. 
[4] H. Wendi Rabiner et al. (2000)., "Energy-Efficient 
Pushpendra Kumar (2024), "Energy efficient 
Communication Protocol for Wireless Microsensor 
multi-criterion binary grey wolf optimizer-based 
Networks". Proceedings of the 33rd Annual Hawaii 
clustering for heterogeneous wireless sensor 
International Conference on System Sciences, Maui, 
networks", Soft Comp., vol. 28, no. 4, pp. 
Hi, USA, pp.10. 
3251-3265. 
https://dx.doi.org/10.1109/HICSS.2000.926982 
https://dx.doi.org/10.0.3.239/s00500-023-09316-0 
[5] S. Madhvi, S. Aarti and R. Shefali. (2024). "An 
[15] Technologies for wireless sensor networks, by R. 
Approach to Increase the Lifetime of Traditional 
Khanna, Yi qian, G. Pisharody, R. Arvind, Jiejie 
LEACH Protocol Using CHME-LEACH and 
Wang, Laura M. Rumbel, Christopher. R., Carlson, 
CHP-LEACH", Lecture Notes in Networks and 
Jennifer, M. Williams. and P. Adu Agyeman. (2024, 
Systems, vol. 868, pp. 133-145. 
Apr 18). Patent A1 20240130002. 
https://dx.doi.org/10.1007/978-981-99-9037-5_11 
[16] Jasim Mohammad Omer K. and Salih Bassim M 
[6] J. Suman, K. Shyamala, G. Roja and N. Pranay. 
(2024). "Improving Task Scheduling in Cloud 
(2023). "Testbed Implementation 
Datacenters by Implementation of An Intelligent 
of MAX LEACH Routing Protocol and Sinkhole 
Scheduling Algorithm". Informatica (Slovenia), vol. 
Attack in WSN". Lecture Notes in Networks and 
48, no.10, pp.77-88.  
Systems, vol. 612, pp. 153-162. 
https://doi.org/10.31449/inf.v48i10.5843 
https://dx.doi.org/10.1007/978-981-19-9228-5_14 
[17] Suhag. Sumit and Aarit (2024)., "Challenges and 
[7] Gülbaş, Gülşah and Çetin, Gürcan (2023)., 
Potential Approaches in Wireless Sensor Network 
"Lifetime Optimization of the LEACH Protocol in 
Security", Journal of Electrical Engineering and 
WSNs with Simulated Annealing Algorithm", 
Technology, vol. 19, no. 4, pp. 2693-2700. 
Wireless Personal Communications, vol. 132, no. 4, 
https://dx.doi.org/10.0.3.239/s42835-023-01751-1 
pp. 2857-2883. 
[18] Heidari Ehsan (2024)., "A novel energy-aware 
https://dx.doi.org/10.0.3.239/s11277-023-10746-0 
method for clustering and routing in IoT based on 
[8] Mishra, Rashmi and Yadav, Rajesh K. (2023)., 
whale optimization algorithm & Harris Hawks 
"Energy Efficient Cluster-Based Routing Protocol 
optimization", Computing, vol. 106, no. 3, pp. 
for WSN Using Nature Inspired Algorithm", 
1013-1045. 
Wireless Personal Communications, vol. 130, no. 4, 
https://dx.doi.org/10.0.3.239/s00607-023-01252-z 
pp. 2407-2440. 
[19] Wireless sensor system, wireless terminal device, 
https://dx.doi.org/10.0.3.239/s11277-023-10385-5 
communication control method and communication 
[9] N. M. Latiff Abdul et al (2007).  "Energy-Aware 
control program, by M. Funaki, Y. Tanaka, D. 
Clustering for Wireless Sensor Networks using 
Murata and T. Yamamoto. (2024, Mar 12). Patent 
Particle Swarm Optimization". in Proc. 2007 IEEE 
B2 11930431. 
18th International Symposium on Personal, Indoor 
[20] Jaiswal K. and Anand V. (2024)., "ESND-FA: An 
and Mobile Radio Communications. 
Energy-Efficient Scheduled Based Node 
https://dx.doi.org/10.1109/PIMRC.2007.4394521 
Deployment Approach Using Firefly Algorithm for 
[10] B. Komuraiah, Bollena. Navya. and Jhanvitha. B. 
Target Coverage in Wireless Sensor Networks", 
(2023). "Enhanced Lifetime with less energy 
International Journal of Wireless Information 
consumption in WSN Using a Genetic 
 
CM-OOA：An Energy-Efficient Clustering Algorithm for Wireless…                   Informatica 49 (2025) 173–190  189    
Networks, vol. 31, no. 2, pp. 121-141. "Energy aware clustering protocol using chaotic 
https://dx.doi.org/10.0.3.239/s10776-024-00616-2 gorilla troops optimization algorithm for Wireless 
[21] K. Rasidul, Z. Mehboob, De. Debashis. and Das. Sensor Networks", Multimedia Tools and 
Abhishek. (2024)., "MKFF: mid-point K-means Applications, vol. 83, no. 8, pp. 23853-23871. 
based clustering in wireless sensor network for https://dx.doi.org/10.0.3.239/s11042-023-16487-3 
forest fire prediction", Microsystem Technologies, [31] Vikhyath. V. K. and Achyutha Prasad. A. p.(2023)., 
vol.30, no.4, pp.469-480. "Optimal Cluster Head Selection in Wireless Sensor 
https://dx.doi.org/10.0.3.239/s00542-023-05578-8 Network via Combined Osprey-Chimp 
[22] L. Marcin, M. Lazaros, Baptista. Murilo S. and Optimization Algorithm: CIOO", International 
Volos. Christos Source (2024)., "Discrete Journal of Advanced Computer Science and 
one-dimensional piecewise chaotic systems without Applications, vol. 14, no. 12, pp. 401-407. 
fixed points", Nonlinear Dynamics, vol. 112, no. 8, https://dx.doi.org/10.0.56.233/IJACSA.2023.01412
pp.6679-6693. 41 
https://dx.doi.org/10.0.3.239/s11071-024-09349-6 [32] Shakil Ahmed et al. (2023)., "Sky's the Limit: 
[23] B. A. Omar, C. D. Zaineb, B. Slim and Ben Said. L. Navigating 6G with ASTAR-RIS for UAVs Optimal 
(2024)., "Many-objective optimization of wireless Path Planning". in Proc. 28th IEEE Symposium on 
sensor network deployment", Evolutionary Computers and Communications: Computers and 
Intelligence, vol. 17, no. 2, pp. 1047-1063. Communications for the Benefits of Humanity. 
https://dx.doi.org/10.0.3.239/s12065-022-00784-1 https://dx.doi.org/10.1109/ISCC58397.2023.102180
[24] K. Neethu, Sundar G. Naveen and Narmadha D. 58 
(2024)., "Vector Based Genetic Lavrentyev [33] K. Fransen and J. Van Eekelen (2023)., "Efficient 
Paraboloid Network Wireless Sensor Network path planning for automated guided vehicles using 
Lifetime Improvement", Wireless Personal A* (Astar) algorithm incorporating turning costs in 
Communications, vol. 134, no. 4, pp. 1917-1944. search heuristic", International Journal of 
https://dx.doi.org/10.0.3.239/s11277-024-10906-w Production Research, vol. 61, no. 3, pp. 707-725. 
[25] Dinesh. K., and SVN. Santhosh Kumar (2024)., https://dx.doi.org/10.0.4.56/00207543.2021.201580
"GWO-SMSLO: Grey wolf optimization-based 6 
clustering with secured modified Sea Lion [34] Kusuma Purba D. and H. Faisal Candrasyah (2024)., 
optimization routing algorithm in wireless sensor "Enriched Coati Osprey Algorithm: A Swarm-based 
networks", Peer-to-Peer Networking and Metaheuristic and Its Sensitivity Evaluation of Its 
Applications, vol. 17, no. 2, pp. 585-611. Strategy", IAENG International Journal of Applied 
https://dx.doi.org/10.0.3.239/s12083-023-01603-9 Mathematics, vol. 54, no. 2, pp. 277-285. 
[26] Sakhri, A. Arsalan, M. Maimour, M. Kherbache, E. [35] Dinesh. K. and Santhosh. Kumar. S. V. N. (2024)., 
Rondeau and N. Doghmane (2024)., "A digital "Energy-efficient trust-aware secured neuro-fuzzy 
twin-based energy-efficient wireless multimedia clustering with sparrow search optimization in 
sensor network for waterbirds monitoring", Future wireless sensor network", International Journal of 
Generation Computer Systems, vol. 155, no. 6, pp. Information Security, vol. 23, no. 1, pp. 199-223. 
146-163. https://dx.doi.org/10.0.3.239/s10207-023-00737-4 
https://dx.doi.org/10.0.3.248/j.future.2024.02.011 [36] P. Ikkurthi Bhanu, G. Saumitra, Yogita, Y. Satyendra. 
[27] Ramya. R. and Padmapriya. K. (2023)., "An Singh and Pal. Vipin. (2024)., "HCM: a hierarchical 
implementation of energy efficient fuzzy-optimized clustering framework with MOORA based cluster 
routing in wireless sensor networks using Particle head selection approach for energy efficient 
Swarm Optimization (PSO) and Whale wireless sensor networks", Microsystem 
Optimization Algorithm (WOA)", Journal of Technologies, vol. 30, no. 4, pp. 393-409. 
Intelligent and Fuzzy Systems, vol. 44, no. 1, pp. https://dx.doi.org/10.0.3.239/s00542-023-05508-8 
595-610. [37] Asaad Alhijaj, Baida’a Abdul Qader Khuder, Imad 
https://dx.doi.org/10.0.12.161/JIFS-220963 Alshawi (2022)., "Fuzzy Data Aggregation 
[28] Preethi. R (2024)., "Assault Type Detection in WSN Approach to Enhance Energy-Efficient Routing 
Based on Modified DBSCAN with Osprey Protocol for HWSNs". Informatica (Slovenia). 
Optimization Using Hybrid Classifier LSTM with vol.46, no. 7, pp.45-47. 
XGBOOST for Military Sector", Optical Memory https://doi.org/10.31449/inf.v46i7.4272 
and Neural Networks (Information Optics), vol. 33, [38] Ustun. Deniz., Erkan. U., Toktas. Abdurrahim., Lai 
no.1, pp.53-71. Qiang and Yang liang (2024)., "2D hyperchaotic 
https://dx.doi.org/10.0.12.31/S1060992X24010089 Styblinski-Tang map for image encryption and its 
[29] Khudor Baida’a Abdul Qader, Hussein Dheyaa hardware implementation", Multimedia Tools and 
Mezaal, Kheerallah Yousif Abdulwahab, Alkenani Applications, vol. 83, no. 12, pp. 34759-34772. 
Jawad and Alshawi Imad S. (2023). "Lifetime https://dx.doi.org/10.0.3.239/s11042-023-17054-6 
Maximization Using Grey Wolf Optimization [39] N. Meenakshi, S. Ahmad, A. V. Prabu, J. Nageswara 
Routing Protocol with statistical Technique in Rao, N. A. Othman, Hikmat A. M. Abdelijaber, R. 
WSNs". Informatica (Slovenia), vol. 47, no. 5, pp. Sekar and J. Nazeer (2024)., "Efficient 
75-82.  Communication in Wireless Sensor Networks Using 
https://doi.org/10.31449/inf.v47i5.4601 Optimized Energy Efficient Engroove Leach 
[30] S. Deena, Devi. S. Suganthi. and Nalini. T. (2024)., Clustering Protocol", Tsinghua Science and 
 
190   Informatica 49 (2025) 173-190                                                                 S. Jia et al. 
Technology, vol. 29, no. 4, pp. 985-1001. 
https://dx.doi.org/10.0.103.231/TST.2023.9010056 
[40] Ariffin Nur Izzaty et al (2023). "Internet of Things 
Intercommunication Using SocketIO and 
WebSocket with WebRTC in Local Area Network 
as Emergency Communication Devices". in Proc. 
8th International Conference on Software 
Engineering and Computer Systems. 
https://dx.doi.org/10.1109/ICSECS58457.2023.102
56297 
 
https://doi.org/10.31449/inf.v49i12.7838 Informatica 49 (2025) 191–206 191
An Integrated Framework for Data Security Using Advanced Machine
Learning Classification and Best Practices
Peng Wang1*, Ningping Yuan2 and Yong Li1
1Inner Mongolia Power Research Institute, Hohhot City, 010010, China
2Inner Mongolia Medical University, Hohhot City, 010110, China
E-mail: wangpeng9493@163.com
Keywords: Data security, classification techniques, support vector machines, neural networks, decision trees, best prac-
tices, data protection, access control
Received: Dec 17, 2024
In the current interconnected digital environment, data security has become a paramount concern, as cy-
berattacks and data breaches are increasing in frequency and complexity. Both organizations and people
face challenges in safeguarding sensitive information, requiring resilient security systems that can adjust
to various threats. This paper presents a comprehensive approach to data security, focusing on integrating
advanced classification techniques and best practices to secure data proactively. This study uses and an-
alyzes advanced classification algorithms like decision trees, support vector machines (SVM), and neural
networks to determine how well they work to find, sort, and keep sensitive data safe across various security
needs. The results indicate substantial improvements in classification accuracy, with the optimal model at-
taining an accuracy rate of 98.83%. The other models, including decision tress and SVM provide 89% and
92% accuracy, respectively. This highlights the dependability and resilience of these methods in detecting
possible security concerns across various datasets. In addition to these classification results, we compre-
hensively analyze industry best practices in data security, encompassing encryption technologies, dynamic
access control, and continuous monitoring to mitigate vulnerabilities and improve threat detection. Inte-
grating sophisticated classification methodologies with these optimal practices provides a comprehensive
security framework that enhances data protection and mitigates risk. This study offers significant insights
for practitioners and organizations aiming to implement a more systematic and efficient data security ap-
proach, enhancing academic and practical discussions in this domain. This work seeks to strengthen the
effectiveness of data security practices by introducing a novel method that integrates high-accuracy cate-
gorization with proactive security protocols.
Povzetek: Predstavljen je celovit pristop varnosti podatkov, ki integrira napredne klasifikacijske tehnike,
kot so nevronske mreže in podporni vektorji, z najboljšimi praksami za zaščito podatkov ter izboljšanje
kvalitete.
1 Introduction Each layer has specific functions—to support protection
against unauthorized access and data integrity [6]. Losing
Data security is becoming increasingly important today, millions of its users and cyber incidents inspired the need
impacting industries, governments, and individuals [1]. for effective and flexible data protection models that can
These developments have led to an explosion of data given address traditional and novel threats [7]. Conventionally
the use of the internet, cloud storage, and systems, therefore used methods in data protection are based on determinis-
making data security paramount to risky exercises whose tic models and rule-based systems, which are inadequate
forms of data need protection against unfair exploitation or in addressing new threats that evolve to counter security
unauthorized access [2]. Computer and internet crimes are mechanisms adopted [8]. Therefore, this study aims to fill
becoming complex, and information security and individ- these gaps by proposing an enhanced multi-classification
uals at all levels of the economy and society are at risk. approach that elevates the existing security practices of as-
Analyses prove that the total cost of cybercrime will be in sessment by integrating classification techniques with best
the trillions within a few years, thus the importance of ef- security practices [9]. As this research feeds into mod-
ficient data protection plans [3]. Data protection solutions ern theories on data classification, it is hoped that the gaps
are vital in allowing the privacy and confidentiality of data, in the currently existing data security frameworks will be
but implementations and controls are inadequate and vul- filled and that a solution to the security of sensitive data
nerable [4]. will be provided [10] [11]. Several data security methods
Data security can be discussed in terms of data encryp- exist, including encryption, access control, monitoring, and
tion, access control, monitoring, classification, etc [5].
192 Informatica 49 (2025) 191–206 P. Wang et al.
classification. However, classification is a form of secu- advanced classification techniques provides only best prac-
rity designed as the initial stage and not a single method. It tice, which is not sufficiently technically sound. Moreover,
marks and classifies sensitive data implementation as being research inclined to depict ideal procedures does not con-
the right security measures [12]. sider how rapidly these procedures can be implemented to
counter threats, especially in sectors that experience high
1.1 Research gap levels of cyberattacks and data breaches [20]. A signifi-
cant limitation of earlier works is the absence of a compre-
Several gaps exist in the current methods, especially in hensive framework integrating classification methods with
data classification with sensitivity-based protections. The proactive best practices [21]. In response to these limita-
framework is essential in defining classification and data tions, this research suggests a general framework data secu-
prioritization, which helps determine the security levels that rity solution suitable for various scenarios and best bridges
must be applied to data [13]. However, most conventional the technology and practice divide.
techniques or moves for classification are confining and
bring a high impact of variability, which is ordinary and 1.3 Challenges in data security
can hardly provide a suitable and comprehensive solution
for the large and ever-changing environments today [14]. Several challenges can be identified, significantly compli-
Most existing models are either prescriptive or unable to cating the development and application of measures for
adapt dynamically to new forms of threats, thus posing a protecting data. First, one of the main trends is the con-
risk for organizations [15]. The second central area is com- stantly growing complexity and the active response to cy-
bining classification methods with data security standards. ber threats. Unlike ordinary threats, which are more or less
Thus, the position is that although the classification con- easily recognizable, new threats are much less easy to un-
cept offers the first layer of data security, the idea is far derstand, and any static measures are useless. Computer
from complete. Encryption, access control, real-time mon- criminals use sophisticated procedures to take advantage of
itoring, and continually running vulnerability tests are the flaws, with their strategies evolving quickly due to emerg-
complete practices needed to protect data at the advanced ing security methods. This requires a security system that
level [16]. However, in many cases, research has been con- will address these emerging threats and be proactive to any
ducted to develop classification techniques and best prac- other threats that may arise [22].
tices independently while lacking a coherent one, including The next major problem is the ability to classify and
both. This gap implies a lack of integration of classification prioritize data depending on its classification requirement.
data with proactive recurring measures, which will enable Companies deal with vast volumes of data, which differ in
a better systematic response to data security problems [17]. sensitivity. That is why proper segregation and protection
of the data are significant. However, the conventional clas-
1.2 Limitations of previous studies sification approaches are ineffective when measuring the
amount and variety of information processed in organiza-
Several studies have been done on data security; these tions today. Also, organizations have always encountered
works offer pioneering notions on different interventions the compelling problem of security and unavailability. Se-
of data security; nevertheless, several downsides hamper curity policies must protect against invasion by unautho-
their applicability to contemporary security environments. rized personnel and allow authorized individuals to get the
Most of the works describing the performance of the classi- required information. Only security frameworks that can
fication techniques focus on the raw classification accuracy enable differential access controls depending on the sensi-
without considering such aspects as interpretability, com- tivity of the data and the type of user can achieve this bal-
putational cost, and flexibility [18]. While models trained ance, which is typically difficult to do when using conven-
in simulation perform well in their specific scenarios, their tional security mechanisms.
applicability sharpens when exposed to field data with intri- Using ML and other superior algorithms also poses an-
cate structures and dynamic threats. Furthermore, the pri- other problem regarding computations, interpretability, and
mary focus on objective measures such as accuracy could model shifts over time. The learning parameters of ML al-
not fully meet the challenges of protecting data in the real gorithms require constant updates for their efficiency, par-
world [19]. ticularly when it comes to dynamic threats. These chal-
Meanwhile, research that concerns data security mea- lenges show the need for an all-encompassing regime in
sures and proper protocols based on current and improved data security to meet advanced threats that have evolved
practices involves encoding techniques, security accesses over the years without compromising the system’s ease,
and policies, and conformance to prescribed rules and laws. adaptability, and robustness.
Although these practices are essential, they are used sep-
arately from technical classification techniques, and thus, 1.4 Motivations for the study
security is fragmented. This separation can be problematic
because while technical classification without best prac- This research was undertaken due to the absence of an ap-
tices means only gaps in coverage, best practice without propriate data security model that would also factor in the
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 193
benefits of better classification systems. Thus, as data is study conducts a sensitivity analysis to test the robust-
present in all industries and constantly evolving, new and ness of classification outcomes under various param-
more complex threats arise, and a highly detailed and flex- eter settings. This analysis adds depth to the study by
ible security model is needed. It is known that decision demonstrating the model’s adaptability to different or-
trees, support vector machines (SVM), and neural networks ganizational requirements and security scenarios.
improve data classification, which is an integral part of
deploying security resources by making existing methods
more practical. Through these techniques, this study ex- 1.6 Structure of the paper
pects to enhance the precision of data categorization to help The remainder of this paper is structured as follows. Sec-
organizations direct their resources and efforts to protect the tion 2 provides a comprehensive review of existing lit-
most vulnerable data. erature on data security, focusing on advanced classifica-
This work also recognizes that the principles of data pro- tion techniques and best practices. Section 3 details the
tection entail other types of data protection, such as encryp- methodology, including data collection, model selection,
tion, access control, and real-time monitoring. All these and the integration of best practices into the proposed se-
are essential data security practices and perhaps mandatory curity framework. Section 4 presents the results, including
co-features of technical classification schemes. This study model performance metrics and sensitivity analysis find-
aims to solve both the theoretical and practical problems ings. Section 5 discusses the implications of the study, with
of data security by suggesting a more logical and consis- a focus on practical applications and limitations. Finally,
tent framework for data security than has been used before. Section 6 concludes the paper and offers suggestions for
This will be done using complicated classification methods future research.
and step-by-step ways to explain the security solution.
1.5 Novel contributions of the study 2 Literature review
This research makes several novel contributions to data se- Thapa and Camtepe, [23], whose work focuses on preci-
curity by presenting an integrated framework that combines sion health systems, discussed the necessity, barriers, and
advanced classification techniques with industry best prac- data security and privacy strategies. Their study also em-
tices. The unique contributions of this study are as follows: phasized that precision health, which provides care based
on patient-specific information related to genes, microbes,
1. Advanced Classification Techniques: This study behaviors, and environment, and digital records, includ-
evaluates the effectiveness of various classification ing omics, depend on technology like machine learning
algorithms, including decision trees, SVM, and neu- algorithms for data processing and electronic gadgets for
ral networks, in accurately categorizing sensitive data data capture. They brought attention to the high risk of
across different sensitivity levels. By rigorously test- leakage since health data contains susceptible information
ing these techniques, this study identifies models that about an individual, including identity and medical condi-
offer high accuracy, with the most effective model tions and interactions between health data centers. This
achieving an accuracy rate of 98.83%. type of breach can result in personal damage. The indi-
2. Integration with Best Practices: Unlike traditional vidual may be bullied at work, face discrimination at the
studies that focus exclusively on either technical or place of work, or even higher insurance charges, thus mean-
procedural aspects of data security, this study inte- ing privacy and security counts. They examined conform-
grates advanced classification techniques with secu- ing to government legislation and the ethical concerns and
rity best practices, such as encryption standards, ac- requirements that ethics committees highlight for protect-
cess control protocols, and continuous monitoring. ing healthcare data to keep the public engaged in precision
This integration provides a holistic security frame- health efforts. Their study showed that people’s buy-in of
work that addresses technical and operational security data sharing depends highly on safety, privacy, and proper
requirements. use of that data. To address these challenges, they described
multiple secure and privacy-preserving machine learning
3. Adaptability and Practicality: This study empha- techniques for implementing precision health information,
sizes the adaptability of its proposed model, allowing with examples of their usage in related health initiatives.
it to adjust to evolving threats. This framework is de- Finally, the study recommended the best ways to protect
signed to meet the diverse security needs of organiza- precision health data. The study also provided a conceptual
tions operating in rapidly changing environments by system model that can be used to check compliance, man-
combining flexible classification methods with proac- age consent, and support the ethical requirements needed
tive security protocols. for innovation in the healthcare field.
Aslan et al. proposed a systematic evaluation [24] of the
4. Comprehensive Evaluation and Sensitivity Anal- emerging cybersecurity threats, risks, incidence, and coun-
ysis: In addition to evaluating model accuracy, this termeasures to address the constant rise of cyber threats,
194 Informatica 49 (2025) 191–206 P. Wang et al.
such as the usage of the internet as a result of the COVID- scribed DL as one of the critical technologies in the 4IR.
19 outbreak. Their study stressed that with the replacement DL, a subset of ML and AI, is receiving widespread recog-
of the digital interaction of physical transactions, traditional nition from various industries because of its adaptability in
crimes have shifted more towards the cyber domain, and large datasets and its utility in healthcare, vision, natural
the current and emerging technologies like cloud, IoT, and language processing, and protection. He also added that DL
cryptocurrencies modify new security dimensions. The au- has its roots in artificial neural networks and is now crucial
thors stressed that in cyber attack campaigns, the adversary in solving other real-world problems. Due to the dynamism
uses automated tools and releases ‘cyber attacks as a ser- of data and the complexity of real-world issues, it has been
vice’ to achieve maximum effect, and the newly identified challenging to develop effective DL models. Additionally,
threats exploit hardware, software, and communication lay- most deep learning systems are black boxes, which prevents
ers. They have reviewed generalized forms of cyber attacks standardization and widespread use of these systems. The
such as DDoS, phishing, man in the middle, and malware research described a precise classification of DL methods
attacks and noted that traditional layers of protection like for distinguishing between supervised, unsupervised, and
firewalls and antivirus are not very useful in tackling cur- mixed learning methods for determining the practical ap-
rent complex threats. They highlighted the emerging need plication of DL. Further, he discussed other works that suc-
for new solutions that embrace superior and enhanced de- cessfully applied DL and showed that DL can be effectively
tection solutions and preventive measures. They reviewed used in various contexts. To inform the next steps in the de-
the latest trends in technological approaches, including ma- velopment of DL, the author outlined ten critical directions
chine learning, deep learning, cloud computing-based big for future research that are targeted at enhancing model in-
data, and blockchain; all of them were suggested as poten- terpretability, plasticity, and performance. This large-scale
tial approaches to detect and prevent cyber threats. They survey is also helpful for academic and industrial audiences
also found that it is possible to develop machine learning who want to understand the current state and future of DL,
and deep learning to identify new complex threat types, especially by emphasizing the need to increase the distinc-
and through experimentation, the effectiveness of machine tiveness and development of DL approaches.
learning and deep learning, when used for detecting mal- Ahmad et al. [27] also systematically reviewed cyber-
ware and intrusions, can be established. However, they security issues within IoT cloud computing, including how
noted that machine learning and deep learning are suscepti- cloud computing has revolutionized data storage and access
ble to evasion techniques and require constant enhancement to resources for industrial uses in IoT-based cloud comput-
to resist intelligent forms of cyber attacks. ing. This included making current research on cloud com-
Dasgupta and Akhtar [25] systematically reviewed cy- puting by Calegari and Ometto more relevant by noting that
bersecurity based on ML concerning the growing impor- their study found out that over the last decade, industries
tance of protecting data, devices, and user information in shifted to cloud computing due to its flexibility, cost and
the present interconnected society. They described their performance advantage. However, this has meant moving
survey regarding how ML has been incorporated into cy- applications to cloud platforms, which has created a consid-
bersecurity in applications like intrusion, malware, and erable security problem since conventional security is nor-
biometric-based user identification. However, as they high- mally not sufficient or efficient for new cloud applications.
lighted, when used in cybersecurity, the algorithm of ML is They noted that the convergence of IoT with cloud com-
exposed to attacks both during the training and the testing puting has compounded these threats as the architecture of
phases, which in turn does not allow for achieving the de- cloud IoT systems offers fresh concerns that necessitate se-
sired results and can result in the penetration of the system curity appropriate solutions. They classified cloud security
into the network. The research has undergone a system- concerns into four key categories: data security, network
atic literature review of recent developments in the applica- and service security, application security and people secu-
tion of ML in cyber-security between 2013 and 2018, with rity. They discussed and compared various security mat-
a general understanding of cyber attacks, the correspond- ters in each category they had and discussed the limitation
ing defense mechanisms, and the commonly usedML algo- from a general view, and specifically, they focused on the
rithm. They also discussed ML and data mining feature ex- DL viewpoint. The study reviewed new trends that involve
traction, dimensionality reduction, and classification tech- DL in dealing with cyber threats targeting IoT/cloud busi-
niques, such as adversarial ML—a subdiscipline that pro- ness models, while also acknowledging different methods
tects ML models against adversarial attacks. The task of have their limitations when adopted by industrial systems.
their survey was to stress the existing weaknesses of current Finally, based on their review of the literature, researchers
ML-based security measures related to adversarial threats suggest new ways to strengthen security using AI and DL
and discuss directions for a more extensive investigation of within the cloud architecture in order to address research
these risks. Lastly, they presented the existing and poten- gaps in IoT-based cloud cybersecurity [28].
tial problems and concerns in cybersecurity and provided Admass et al. [29] highlighted the current state, future
research recommendations for improving the robustness of trends and advances in cybersecurity and noted the need for
ML applications for this domain. cybersecurity as the world goes digital in different activi-
Sarker [26], in his deep and extensive review article, de- ties. As they noted to underscore the inherent dynamism
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 195
of threats in cyberspace, more research, participation of among other things, implement broad cybersecurity poli-
academic institutions, and organizational commitment re- cies, pursue deployment of robust technologies, and de-
garding the protection of information systems need to be velop a cybersecurity culture. The study’s findings that PPP
promoted. In their systematic review, they focused on re- and policy intervention are crucial for developing the nec-
cent trends and innovations in the field of cybersecurity and essary cybersecurity framework further supported this. In
described new approaches and trends that have emerged their conclusion, they also encouraged a future research di-
worldwide to capture the dynamism of cyber threats. The rection to analyse new technologies and analyse human and
study considered AI andML as disruptive technologies that policy factors in cybersecurity for renewable energy. Ta-
can greatly help improve cyber security by being able to ble 1 summarizes the key performance metrics and method-
identify threats and respond to them autonomously. How- ologies from referenced works.
ever, they observed that these remain an issue to some ex-
tent, especially given that threats in cyberspace are equally
evolving. They also stressed the continuity of the stake- 3 Methodology
holders’ interaction and suggested that future works are
aimed at combining the use of innovative technologies and 3.1 Overview of the proposed framework
cooperation between members of the cybersecurity envi-
ronment. This work offered directions on how to build This study proposes a comprehensive framework for data
capacity in cybersecurity and emerging developments that security, integrating advanced classification techniques
would be necessary for new threats. with best cybersecurity practices. The methodology con-
sists of four main phases: data collection and preprocess-
Zhang et al. [30] explained various methodologies of ex- ing, feature extraction, classification using advanced ma-
plainable artificial intelligence (XAI) in the context of cy- chine learning algorithms, and integration of best practices.
bersecurity regarding the massive problems raised by the These phases enhance data security through accurate clas-
‘‘black box’’ that distinguishes conventional ML and DL. sification and adherence to security standards. The over-
Given the current evolution of the Internet of Things and all workflow of the proposed framework may be viewed in
other AI techniques, ML and DL are widely used in cyber- Figure 1.
security, including intrusion, malware, and spam detection.
Despite these recognition-based methods yielding higher
accuracy and more efficiency compared to the signature- 3.2 Research questions and objectives
based and rule-based methods as observed by them. They
identified a major drawback of the black-box nature of ML This study addresses the following key research questions:
and DL algorithms. Such explainability often leads to re- 1. How effectively can advanced machine learning (ML)
duced user trust and reduced understanding of how these classification techniques integrate with cybersecurity
models detect or address cyber threats, especially as the best practices to enhance data security?
kind of cyber threats being witnessed continue to evolve.
So, they looked at the possible weakness that could come 2. Which classification technique—Decision Trees, Sup-
from trying to make things understandable and how XAI port Vector Machines (SVM), or Neural Networks—
needs to be added to theories of AI-based cybersecurity provides the most accurate and robust performance for
models so that people can understand them or manage cy- cybersecurity applications?
bersecurity systems well. Their work also filled in an im-
portant research gap by providing a thorough survey that 3. What are the benefits of incorporating real-time moni-
was only focused on AI/ML-based XAI in cybersecurity. toring, encryption, and access control alongside ML
This was despite the fact that XAI had been studied in other models in addressing modern cybersecurity chal-
fields, like healthcare and finance. They suggested a struc- lenges?
tured plan for approaching XAI in the cybersecurity field
and pointed out that cybersecurity machine learning mod- The primary objective of this study is twofold:
els should bemore explainable without losing performance. – To evaluate the feasibility and effectiveness of com-
This survey provides the necessary background information bining ML techniques with robust security practices.
for further studies by those who intend to focus on the chal-
lenge of making cybersecurity AI understandable for the – To compare the performance of the proposed classi-
average user [31]. fication techniques and demonstrate the practical ad-
They found that AI and ML technologies offer viable so- vantages of the integrated framework.
lutions for filling the new emerging security threats in re-
newable energy. The study also focused on the need for 3.3 Data collection and preprocessing
global cooperation and compliance of countries with inter-
national guidelines on cyberspace security as critical in im- In the initial phase, data is gathered from diverse publicly
proving security readiness throughout the renewable power available sources to comprehensively represent real-world
industry. According to them, industry stakeholders should, cybersecurity scenarios [32]. Data is anonymized to protect
196 Informatica 49 (2025) 191–206 P. Wang et al.
Table 1: Comparison of key performance metrics and methodologies from referenced works
Author(s) Focus Area Key Contributions Limitations
Dasgupta et al. [25] ML in Cybersecurity Surveyed ML applications in intru- Highlighted vul-
sion detection and adversarial ML. nerability of ML to
Proposed directions for improving adversarial attacks;
robustness. lacks integration
with broader security
practices.
Zhang et al. [30] Explainable AI Reviewed XAI methodologies for Black-box limita-
(XAI) in Cybersecu- cybersecurity, emphasizing user tions of ML/DL
rity trust and transparency. persist; need for
practical implemen-
tation strategies.
Thapa and Camtepe Precision Health Proposed secureML techniques and Focused primarily
[23] Data Security a conceptual model for protecting on healthcare, not
health data. generalizable to
other domains.
Aslan et al. [24] Emerging Cyberse- Reviewed ML/DL for detecting Susceptibility of
curity Threats malware and intrusions. Identified ML/DL to evasion
vulnerabilities in IoT and cloud sys- techniques; lacks
tems. comprehensive miti-
gation strategies.
Sarker [26] Deep Learning (DL) Surveyed DL methods for cyberse- DL systems often
Applications curity, highlighting their adaptabil- operate as black
ity and challenges in implementa- boxes, reducing
tion. interpretability and
standardization.
Ahmad et al. [27] IoT and Cloud Cy- ExploredAI/DL-based solutions for Limited focus on
bersecurity IoT-cloud models and proposed se- integrating AI solu-
curity enhancements. tions with policy and
regulatory frame-
works.
sensitive information. The dataset includes access logs, en- employs Principal Component Analysis (PCA) to reduce
cryption statuses, and user authentication details. Prepro- dimensionality, retaining only essential components con-
cessing includes: tributing to data variability.
– Normalization: Scaling data attributes to fit a stan-
dard range [33]. 3.4.1 Principal component analysis (PCA)
X −Xmin PCA transforms high-dimensional data into a lower-
Xnorm = (1)
Xmax −Xmin dimensional space while preserving variance. The trans-
formation is computed as follows:
– Missing Value Imputation: Filling gaps in data
through statistical techniques to avoid misclassifica- Y = X ·W (2)
tion.
where X is the original data matrix and W represents
– Noise Reduction: Using median filtering to reduce the weight matrix of principal components. PCA reduces
outliers. computational load while retaining critical information.
This preprocessing step ensures data quality and reduces
computational complexity, allowing the algorithms to per- 3.5 Classification techniques
form accurately.
The core of this methodology is the classification phase,
3.4 Feature extraction and selection where advanced machine learning algorithms are employed
to categorize data based on security needs. Three algo-
Feature extraction involves identifying the most relevant rithms are used: Decision Trees, Support Vector Ma-
attributes to enhance classification accuracy. This study chines (SVM), and Neural Networks. Each algorithm is
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 197
Figure 1: Workflow of the proposed framework
selected for its strengths in specific security scenarios. For data that is not linearly separable, SVM uses a kernel
function to map data to a higher-dimensional space. The
3.5.1 Decision trees margin is optimized by minimizing:
Decision Trees are highly interpretable models that use a n
tree-like structure for classification. Each node represents a 1 ∑
L = ∥w∥2 + C ξi (4)
decision based on an attribute, leading to branches that pre- 2
i=1
dict outcomes [34]. The algorithm’s performance is evalu-
ated using Gini impurity: where w is the weight vector, C is a penalty parameter,
∑ and ξi represents slack variables. This approach enhances
n the model’s robustness against misclassifications.
G = 1− p2i (3)
i=1
where pi is the probability of a particular class. Lower 3.5.3 Neural networks
Gini values indicate better classification.
Neural Networks are employed for complex pattern recog-
3.5.2 Support vector machines (SVM) nition, using multiple layers to capture non-linear re-
lationships [36]. The backpropagation algorithm ad-
SVMs classify data by finding a hyperplane that maximizes justs weights based on error rates, minimizing the Mean
the margin between data points of different classes [35]. Squared Error (MSE):
198 Informatica 49 (2025) 191–206 P. Wang et al.
– Employ Neural Network for complex, high-
n
1 ∑
MSE = (yi − ŷi)
2 (5) dimensional data
n
i=1 – Best Practices Integration:
where yi is the actual output, and ŷi is the predicted out-
put. Neural Networks are particularly effective for high- – Encrypt data using keyK
dimensional data and provide high classification accuracy.
– Implement role-based access using access matrix
A(u, r)
3.6 Integration of security best practices
– Monitor for anomalies with threshold δ
This framework integrates security best practices, such as
encryption, access control, and real-time monitoring, to – Output: Classified secure data, threat identification
complement the classification process.
This algorithm combines machine learning with best
– Encryption: Ensures data confidentiality through se- practices, ensuring data classification and security.
cure algorithms, with all data encrypted before pro-
cessing. The encryption-decryption cycle is defined
by: 3.8 Validation and evaluation metrics
The framework’s effectiveness is evaluated through stan-
C = E(K,P ) and P = D(K,C) (6) dard metrics:
where C is the ciphertext, P the plaintext, K the en-
cryption key, E the encryption function, and D the – Accuracy: Proportion of correctly classified in-
decryption function. stances.
– Access Control: Restricts data access based on user TP + TN
Accuracy = (8)
roles, employing role-based access control (RBAC). TP + TN + FP + FN
This model assigns permissions using access matrices,
where the matrix entryA(u, r) defines permissions for – Precision and Recall: Precision measures correct
user u and role r. positive predictions, while recall measures the detec-
tion of actual positives.
– Real-timeMonitoring: Uses anomaly detection algo-
rithms to identify unusual patterns indicative of poten-
tial threats. Anomalies are detected based on threshold TP TP
deviations: Precision = and Recall =
TP + FP TP + FN
(9)
δ = ∥x− µ∥ > λ (7)
– F1 Score: The harmonic mean of precision and recall,
where x is the current observation, µ the mean, and λ indicating the balance between these metrics.
the deviation threshold.
F1 = 2 · Precision · Recall3.7 Algorithm: secure classification (10)
Precision + Recall
framework
– ROC-AUC: Measures classification performance
The following algorithm outlines the steps for data security across different thresholds. An area under the ROC
classification within this framework: curve close to 1.0 indicates high model performance.
– Input: Dataset D, security parameters {P,K}
3.9 Comparative analysis and sensitivity
– Preprocessing: Normalize data, fill missing values,
reduce noise testing
– Feature Extraction: Apply PCA to extract relevant The comparative analysis is aimed at comparing results of
features the classification algorithms that are obtained under the in-
fluence of various factors. Sensitivity analysis looks at how
– Classification: much error a model returns, given that the hyperparameters
are tweaked. The proposed model brings safety and flexi-
– Apply Decision Tree for interpretable cases bility in managing data, the objectives of the study, where
– Use SVM with kernel function for non-linear there is a need to attain high classification accuracy there
separable data should be some level of security measured control.
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 199
4 Results Table 2: Confusion matrix for decision tree model
4.1 Overview of experimental setup and Predicted Positive Predicted Negative
metrics Actual Positive 450 50
Actual Negative 40 460
The findings result from following a data security frame-
work that combines classification measures with cyberse-
curity standards. The key ratios to assess the models are 4.2.2 Support vector machine (SVM) results
divided into Accuracy, Precision, Recall, F1 score, ROC- The SVM model was optimized using a radial basis func-
AUC. All the measurements are related to certain aspects of tion (RBF) kernel, achieving improved accuracy over the
the model’s effectiveness, and results are given in graphs, Decision Tree model. Figure 3 illustrates the metrics
tables, and confusion matrix for better understanding. achieved by SVM, with an accuracy of 92%, precision of
90%, recall of 91%, and an F1 score of 90.5%.
4.2 Model performance across classification
techniques
The framework employed three primary classification algo-
rithms: Decision Trees, Support Vector Machines (SVM),
and Neural Networks, to classify data based on security
needs.
4.2.1 Decision tree results
The Decision Tree model provided an interpretable yet ef-
fective baseline. Figure 2 shows the accuracy, precision,
recall, and F1 score for the Decision Tree model, achiev-
ing a consistent classification accuracy of around 89%.
Accuracy = 89%, Precision = 87%, Recall = 88%, F1Fi=gu8r7e.53%: Performance metrics for the SVM model with
RBF kernel
The confusion matrix in Table 3 for the SVM model
demonstrates a further reduction in misclassifications, indi-
cating the SVM’s robustness in handling complex decision
boundaries.
Table 3: Confusion matrix for SVM model
Predicted Positive Predicted Negative
Actual Positive 460 40
Actual Negative 30 470
4.2.3 Neural network results
The Neural Network, a multilayer perceptron (MLP)
model, displayed the highest performance, achieving
98.83% accuracy, which aligns with the framework’s
Figure 2: Performance metrics for the decision tree model novel contribution toward accurate classification. Metrics
for the Neural Network model (Figure 4) include a preci-
The confusion matrix for the Decision Tree model (Ta- sion of 98.5%, recall of 98.6%, and F1 score of 98.55%.
ble 2) displays the model’s classification performance The confusion matrix in Table 4 further validates the
across different classes, indicating a strong ability to distin- Neural Network’s high classification capability, with min-
guish true positives and negatives, though occasional mis- imal false positives and false negatives, indicating near-
classifications occurred in borderline cases. perfect distinction between classes.
200 Informatica 49 (2025) 191–206 P. Wang et al.
score of 98.55%. This makes it highly appropriate where
it is crucial that both false positives and false negatives
be kept to the barest level possible, especially for applica-
tions such as fraud detection and cybersecurity threat eval-
uation. At a reasonable intersection of the F1 score equal
to 90.5%, SVM turns into a worthy trade-off option for ap-
plications with a reasonable amount of computational re-
sources necessary for mid-sized datasets’ anomaly detec-
tion. On the other hand, the low F1-score of the deci-
sion tree of just 87,5% demonstrates the model’s usefulness
in cases where speed and comprehensible decision-making
are valued more than accuracy, such as the preliminary data
sorting in security systems.
4.4 Sensitivity analysis and robustness of
the neural network model
Figure 4: Performance metrics for the neural network
model Sensitivity analysis was conducted on the Neural Network
model to evaluate its robustness across different hyperpa-
Table 4: Confusion matrix for neural network model rameters. Figure 6 shows the effect of varying the learning
rate on model accuracy, illustrating optimal performance
Predicted Positive Predicted Negative at a learning rate of 0.01. The model displayed resilience,
Actual Positive 495 5 maintaining high accuracy across learning rates, thoughmi-
Actual Negative 3 497 nor fluctuations occurred with extreme values.
4.3 Comparative analysis of classification
algorithms
Table 5 provides a summary of key performance metrics
across all three algorithms. The Neural Network model
achieved the highest scores, indicating its effectiveness for
data security applications. Figure 5 presents a bar chart
comparing the accuracy of all three models.
Figure 6: Sensitivity analysis of neural network model with
varying learning rates
4.5 Integration of security best practices
To verify the framework’s effectiveness in a secure envi-
ronment, additional security best practices such as encryp-
tion and real-time monitoring were integrated and tested.
Figure 5: Accuracy comparison for decision tree, SVM, and Data was encrypted using AES-256 encryption (Equation 6
neural network models in Methodology), ensuring data confidentiality. The access
control measures limited user permissions based on roles,
The F1 scores are used to emphasize practical signifi- securing the model against unauthorized access. Real-time
cance of each of the classification models in the evalua- monitoring, implemented through anomaly detection, suc-
tion of the given metrics. The neural network has proven cessfully identified potential security breaches with an ac-
to deliver improved precision as well as recall and an F1 curacy of 96%.
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 201
Table 5: Comparative analysis of model performance
Model Accuracy Precision Recall F1 Score
Decision Tree 89% 87% 88% 87.5%
SVM 92% 90% 91% 90.5%
Neural Network 98.83% 98.5% 98.6% 98.55%
4.6 Analysis of security metrics 5 Implications and limitations
The framework was evaluated based on its ability to main-
tain data confidentiality, integrity, and availability. Fig- 5.1 Practical applications
ure 7 presents the security metrics obtained during test- The paper provides a practical outlook on the proposed
ing, with encryption providing a data confidentiality rate framework for data security by incorporating classification
of 100%, access control measures ensuring 99% integrity, techniques with cybersecurity principles into a heteroge-
and real-time monitoring achieving a 96% availability rate. neous system. Due to its high accuracy, this framework
is most effective in fields critical to data accuracy and se-
curity, such as healthcare, finance, government, and cloud
services. Table 7 provides a comparison of the proposed
framework with state-of-the-art (SOTA) methods.
– Healthcare Sector: In healthcare, keeping patients’
data and preventing leakage or ensuring safe data
transmission is very important. This framework could
improve the patient’s privacy by making it difficult for
intruders to access the database system and also guar-
antee data security. With an accuracy level of 98.83%,
the proposed neural network model can be considered
suitable for predicting and preventing security threats
in medical data systems.
– Financial Institutions: In this modern world, entities
Figure 7: Security metric analysis for data confidentiality, dealing with cash give cash and deal with people’s fi-
integrity, and availability. nancial records, such as transaction history and credit
records, and they become targets for hacker attacks.
Hence, the duplication of this framework can help fi-
nancial organizations strengthen their protective mea-
4.7 Discussion of novel contributions sures against different types of fraud schemes. The
The results substantiate the framework’s novel contribu- real-time monitoring capability, with an availability
tions, as outlined in the introduction. The high classifi- rate of 96%, means the program can immediately iden-
cation accuracy achieved by the Neural Network model tify such patterns and possible violations.
demonstrates the framework’s capacity for accurate threat
detection, with the 98.83% accuracy surpassing traditional – Government and Public Sector: This framework can
models in complex security scenarios. In addition, secu- be implemented into government agencies, which nec-
rity best practices pillars including encryption and real time essarily have large databases containing personal or
monitoring gave a security boost to the framework in ad- nationally important data, thus increasing data protec-
dition to guaranteeing the accuracy of data classification. tion. Thus, together with access control based on job
As anticipated the study proofs that the proposed data se- positions, real-time monitoring helps to timely detect
curity framework of incorporating machine learning with violations in working government databases.
security practices not only improves security but also the
accuracy of classification. Table 6 provides a summary of – Cloud Computing and IoT Environments: Cloud
the core findings. While performing sensitivity analysis, a services and Internet of Things (IoT) networks are de-
scalability problem arose, showing that neural networks are centralized environments. The monitoring, anomaly
restricted by GPU memory and SVMs by the kernel calcu- detection, and encryption framework provided in this
lation of the big data. These include helping choose models work can protect data in such environments and scale
according to specific available resources and scalability for to accommodate the dynamics of the cloud architec-
a certain application. ture’s application.
202 Informatica 49 (2025) 191–206 P. Wang et al.
Table 6: Summary of findings
Aspect Result
Highest Classification Accuracy 98.83% (Neural Network)
Best Security Metric 100% confidentiality through AES-256 encryption
Robustness in Monitoring 96% availability in real-time monitoring
Table 7: Comparison of proposed framework with state-of-the-art (SOTA) methodologies
Author(s) Focus Area Key Contributions Limitations Ad-
dressed by This
Study
Dasgupta et al. [25] ML in Cybersecurity Surveyed ML applications Improved model
in intrusion detection and robustness and
adversarial ML. Highlighted classification ac-
vulnerabilities in adversarial curacy (98.83%).
scenarios. Incorporated proac-
tive monitoring to
address evolving
threats.
Zhang et al. [30] Explainable AI (XAI) in Cy- Reviewed XAI methodolo- Achieved high per-
bersecurity gies to enhance transparency formance (98.83%)
and user trust in cybersecu- while ensuring ro-
rity AI models. bust implementation.
Proposed future
integration of XAI
for enhanced inter-
pretability.
Thapa and Camtepe Precision Health Data Secu- Proposed secure ML tech- Generalized frame-
[23] rity niques and conceptual mod- work applicable
els for health data. across domains with
real-time monitoring
for evolving cyber
threats.
Aslan et al. [24] Emerging Cybersecurity Highlighted the need for en- Combined AES-
Threats hanced detection measures 256 encryption
against IoT/cloud threats. with adaptive ML
Reviewed ML/DL methods methods for robust
for malware detection. security in IoT/cloud
systems.
Ahmad et al. [27] IoT and Cloud Cybersecu- Explored AI/DL-based so- Unified classifica-
rity lutions for IoT-cloud inte- tion techniques with
gration. Addressed security access control and
gaps in cloud environments. monitoring for com-
prehensive IoT/cloud
protection.
Sarker [26] Deep Learning (DL) Appli- Discussed DL challenges Enhanced DL robust-
cations such as black-box nature ness with sensitivity
and adaptability in cyberse- analysis and adapt-
curity. ability in real-time
monitoring.
5.2 Limitations of the study – Complexity of Implementation: Implementing this
framework in existing systems involves significant
complexity. Integrating multiple machine learning al-
Despite its strengths, the framework has several limitations gorithms with advanced encryption and monitoring
that may affect its application.
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 203
measures demands substantial resources and expertise, – AutomatedModel Updating: Developing automated
which may not be available in all organizations. methods for periodic model retraining would help the
framework stay effective against evolving threats by
– Scalability Concerns: However, the neural network integrating new data patterns into the learning process.
model proposed in this paper had high testing accu-
racy; there may be a problem of scalability when ap-
plying this framework to large systems. However, as Future research will concentrate on improving scalability
the amount of data and classification types increases, through approaches such as parallel processing, batch nor-
real-timemonitoring and accuracymaintenance can be malization, and model pruning to improve large-scale data
demanding on resources in a deficient environment. management. Emerging technologies will be examined for
secure data sharing and privacy-preserving model training,
– Dependency on Data Quality: Usually, the classifi- including blockchain and federated learning. Furthermore,
cation models depend on the quality of the given data. systems such as continuous learning pipelines and auto-
When input data is inconsistent or incomplete, then the mated hyperparameter tuning frameworks will be incorpo-
model will not perform effectively. However, main- rated to provide dynamic model updates and maintain per-
taining the quality of the inputs even today poses a formance in changing cybersecurity landscapes.
problem, especially in environments where data can be
created perpetually and might not have been checked.
– Adaptability to Emerging Threats: Security risks 6 Conclusion
concern are never ending and keep changing from time
to time. While using machine learning improves the This research offers a strong foundation for data protection
spectrum of detection, there are sophisticated attack by integrating sophisticated classification systems into cy-
tactics that may fail to be modeled. This needs con- bersecurity fundamentals to provide higher classes of data
stant update and training to detect new patterns out confidentiality, integrity, and accessibility. Based on ma-
there. chine learning algorithms, especially the neural network
model, with an accuracy as high as 98.83 %, the frame-
– Computational Overheads: Integration of high- work’s performance shows that, in principle, text classifi-
complex models such as neural networks with real- cation and anomaly detection can accomplish high accu-
time monitoring might actually slow down computa- racy. These security measures enhance the proposed frame-
tion time, thus is not well suited for applications where work’s usefulness in organizations requiring high data se-
response time is critical. The efficient use of available curity levels, including health, financial, and government
resources is also desirable in order to propagate lower organizations. However, the challenges are still present in
powered systems. practice, such as difficulty implementing the framework in
an actual setting, concerns for its scalability, and a strong
– Privacy and Compliance Constraints: Employing emphasis on data quality. Further, there is a continually ris-
the best of machine learning in data security poses pri- ing danger of hacks and malicious activities that make up-
vacy and regulatory issues because the two fields are dates and retraining of models essential. We can look into
sensitive in motherhood, such as health and finance. the following possible directions for these kinds of research
Data protection regulation like GDPR presents a chal- advances, as we already talked about the gaps: scaling
lenge, especially when it comes to training, handling up optimization strategies, adding more general technolo-
the training data, and the general handling of personal gies to machine learning for privacy, like quantum encryp-
data. tion, and seeing improvements in advanced machine learn-
ing practices that protect privacy. This framework protects
5.3 Future directions data and defines a new horizon for protecting secure data.
As organizations increasingly rely on digital systems, im-
To address these limitations and expand the potential of this plementing such adaptable frameworks becomes crucial to
framework, future research could explore: countering cyber threats and safeguarding sensitive infor-
mation. This study contributes to the growing field of cy-
– Optimization for Scalability: Research focused on bersecurity by providing a practical and adaptable solution
optimizing neural networks and other complex models that meets the demands of contemporary data security.
to reduce computational costs could improve scalabil-
ity, enhancing adaptability to large-scale systems.
– Incorporation of Emerging Technologies: Emerg- Acknowledgement
ing technologies like quantum computing and
blockchain may further enhance security. Quantum This research is funded by the Science and Technology
encryption, for example, could offer robust protection Project of Inner Mongolia Power Group Limited Company,
against sophisticated cyber threats. Project No. 2023-5-34.
204 Informatica 49 (2025) 191–206 P. Wang et al.
References [11] A. U. R. Butt, M. Asif, S. Ahmad, and U. Imdad,
“An empirical study for adopting social computing in
[1] A. B. Ige, E. Kupa, and O. Ilori, “Best practices global software development,” in Proceedings of the
in cybersecurity for green building management sys- 2018 7th International Conference on Software and
tems: Protecting sustainable infrastructure from cy- Computer Applications, 2018, pp. 31–35.
ber threats,” International Journal of Science and Re-
search Archive, vol. 12, no. 1, pp. 2960–2977, 2024. [12] A. U. R. Butt, M. A. Qadir, N. Razzaq, Z. Farooq, and
I. Perveen, “Efficient and robust security implementa-
[2] R. Kaur, D. Gabrijelčič, and T. Klobučar, “Artifi- tion in a smart home using the internet of things (iot),”
cial intelligence for cybersecurity: Literature review in 2020 International Conference on Electrical, Com-
and future research directions,” Information Fusion, munication, and Computer Engineering (ICECCE).
vol. 97, p. 101804, 2023. IEEE, 2020, pp. 1–6.
[3] Z. Yang, X. Liu, T. Li, D. Wu, J. Wang, Y. Zhao, and [13] D. Chen, P. Wawrzynski, and Z. Lv, “Cyber security
H. Han, “A systematic literature review of methods in smart cities: a review of deep learning-based appli-
and datasets for anomaly-based network intrusion de- cations and case studies,” Sustainable Cities and So-
tection,” Computers & Security, vol. 116, p. 102675, ciety, vol. 66, p. 102655, 2021.
2022. [14] M.A. Ferrag, O. Friha, D. Hamouda, L.Maglaras, and
[4] A. Fatani, A. Dahou, M. A. Al-Qaness, S. Lu, and H. Janicke, “Edge-iiotset: A new comprehensive real-
M. A. Elaziz, “Advanced feature extraction and se- istic cyber security dataset of iot and iiot applications
lection approach using deep learning and aquila op- for centralized and federated learning,” IEEE Access,
timizer for iot intrusion detection system,” Sensors, vol. 10, pp. 40 281–40 306, 2022.
vol. 22, no. 1, p. 140, 2021. [15] Z. Zhang, H. Ning, F. Shi, F. Farha, Y. Xu, J. Xu,
[5] X. Sun, F. R. Yu, and P. Zhang, “A survey on F. Zhang, and K.-K. R. Choo, “Artificial intelligence
cyber-security of connected and autonomous vehicles in cyber security: research advances, challenges, and
(cavs),” IEEE Transactions on Intelligent Transporta- opportunities,” Artificial Intelligence Review, pp. 1–
tion Systems, vol. 23, no. 7, pp. 6240–6259, 2021. 25, 2022.
[6] A. U. R. Butt, T. Saba, I. Khan, T. Mahmood, A. R. [16] A. Khraisat and A. Alazab, “A critical review of intru-
Khan, S. K. Singh, Y. I. Daradkeh, and I. Ullah, sion detection systems in the internet of things: tech-
“Proactive and data-centric internet of things-based niques, deployment strategy, validation strategy, at-
fog computing architecture for effective policing in tacks, public datasets and challenges,” Cybersecurity,
smart cities,” Computers and Electrical Engineering, vol. 4, pp. 1–27, 2021.
vol. 123, p. 110030, 2025. [17] T. O. Oladoyinbo, O. O. Adebiyi, J. C. Ugonnia, O. O.
Olaniyi, and O. J. Okunleye, “Evaluating and estab-
[7] S. Nifakos, K. Chandramouli, C. K. Nikolaou, P. Pa- lishing baseline security requirements in cloud com-
pachristou, S. Koch, E. Panaousis, and S. Bonacina, puting: an enterprise risk management approach,”
“Influence of human factors on cyber security within Asian journal of economics, business and accounting,
healthcare organisations: A systematic review,” Sen- vol. 23, no. 21, pp. 222–231, 2023.
sors, vol. 21, no. 15, p. 5119, 2021.
[18] R. Vallabhaneni, S. Pillai, S. A. Vaddadi, S. R. Ad-
[8] A. U. R. Butt, T. Mahmood, T. Saba, S. O. Bahaj, F. S. dula, and B. Ananthan, “Secured web application
Alamri, M. W. Iqbal, and A. R. Khan, “An optimized based on capsulenet and owasp in the cloud,” Indone-
role-based access control using trust mechanism in e- sian Journal of Electrical Engineering and Computer
health cloud environment,” IEEE Access, 2023. Science, vol. 35, no. 3, pp. 1924–1932, 2024.
[9] M. I. Khan, A. Imran, A. H. Butt, A. U. R. Butt et al., [19] M. K. Hasan, A. A. Habib, Z. Shukur, F. Ibrahim,
“Activity detection of elderly people using smart- S. Islam, and M. A. Razzaque, “Review on cyber-
phone accelerometer and machine learning methods,” physical and cyber-security system in smart grid:
International Journal of Innovations in Science & Standards, protocols, constraints, and recommenda-
Technology, vol. 3, no. 4, pp. 186–197, 2021. tions,” Journal of network and computer applications,
vol. 209, p. 103540, 2023.
[10] M. Ghiasi, T. Niknam, Z. Wang, M. Mehrandezh,
M. Dehghani, and N. Ghadimi, “A comprehensive re- [20] K. U. Qasim, J. Zhang, T. Alsahfi, and A. U. R.
view of cyber-attacks and defensemechanisms for im- Butt, “Recursive decomposition of logical thoughts:
proving security in smart grid energy systems: Past, Framework for superior reasoning and knowledge
present and future,”Electric Power Systems Research, propagation in large languagemodels,” arXiv preprint
vol. 215, p. 108975, 2023. arXiv:2501.02026, 2025.
An Integrated Framework for Data Security… Informatica 49 (2025) 191–206 205
[21] I. H. Sarker, M. H. Furhad, and R. Nowrozy, “Ai- [33] M. S. Yadav and R. Kalpana, “Data preprocessing
driven cybersecurity: an overview, security intelli- for intrusion detection system using encoding and
gence modeling and research directions,” SN Com- normalization approaches,” in 2019 11th Interna-
puter Science, vol. 2, no. 3, p. 173, 2021. tional Conference on Advanced Computing (ICoAC).
IEEE, 2019, pp. 265–269.
[22] M. A. Ferrag, O. Friha, L. Maglaras, H. Janicke,
and L. Shu, “Federated deep learning for cyber secu- [34] P. Li, M. Abouelenien, R.Mihalcea, Z. Ding, Q. Yang,
rity in the internet of things: Concepts, applications, and Y. Zhou, “Deception detection from linguistic
and experimental analysis,” IEEE Access, vol. 9, pp. and physiological data streams using bimodal convo-
138 509–138 542, 2021. lutional neural networks,” in 2024 5th International
Conference on Information Science, Parallel and Dis-
[23] C. Thapa and S. Camtepe, “Precision health data: Re- tributed Systems (ISPDS). IEEE, 2024, pp. 263–267.
quirements, challenges and existing techniques for
data security and privacy,” Computers in biology and [35] M. A. Selvan, “Svm-enhanced intrusion detection
medicine, vol. 129, p. 104130, 2021. system for effective cyber attack identification and
mitigation,” 2024.
[24] Ö. Aslan, S. S. Aktuğ, M. Ozkan-Okay, A. A. Yilmaz, [36] G. S. Kumar, K. Premalatha, G. U. Maheshwari, P. R.
and E. Akin, “A comprehensive review of cyber se- Kanna, G. Vijaya, and M. Nivaashini, “Differential
curity vulnerabilities, threats, attacks, and solutions,” privacy scheme using laplace mechanism and statis-
Electronics, vol. 12, no. 6, p. 1333, 2023. tical method computation in deep neural network for
[25] D. Dasgupta, Z. Akhtar, and S. Sen, “Machine learn- privacy preservation,” Engineering Applications of
ing in cybersecurity: a comprehensive survey,” The Artificial Intelligence, vol. 128, p. 107399, 2024.
Journal of Defense Modeling and Simulation, vol. 19,
no. 1, pp. 57–106, 2022.
[26] I. H. Sarker, “Deep learning: a comprehensive
overview on techniques, taxonomy, applications and
research directions,” SN computer science, vol. 2,
no. 6, p. 420, 2021.
[27] W. Ahmad, A. Rasool, A. R. Javed, T. Baker, and
Z. Jalil, “Cyber security in iot-based cloud comput-
ing: A comprehensive survey,” Electronics, vol. 11,
no. 1, p. 16, 2021.
[28] K.Wang and X.Wang, “Application of fuzzy decision
theory in multi objective logistics distribution center
site selection,” Informatica, vol. 48, no. 23, 2024.
[29] W. S. Admass, Y. Y. Munaye, and A. A. Diro, “Cy-
ber security: State of the art, challenges and future
directions,” Cyber Security and Applications, vol. 2,
p. 100031, 2024.
[30] Z. Zhang, H. Al Hamadi, E. Damiani, C. Y. Yeun, and
F. Taher, “Explainable artificial intelligence applica-
tions in cyber security: State-of-the-art in research,”
IEEE Access, vol. 10, pp. 93 104–93 139, 2022.
[31] A. K. Marzook and J. Alkenani, “Hybrid kalman filter
and optimization-based routing for energy efficiency
in heterogeneous wireless sensor networks,” Infor-
matica, vol. 48, no. 23, 2024.
[32] Y. Li and T. Wang, “Intelligent management process
analysis and security performance evaluation of sports
equipment based on information security,” Measure-
ment: Sensors, vol. 33, p. 101083, 2024.
206 Informatica 49 (2025) 191–206 P. Wang et al.
https://doi.org/10.31449/inf.v49i12.6903 Informatica 49 (2025) 207–220 207 
Dynamic Anti-Mapping Network Security Using Hidden Markov 
Models and LSTM Networks Against Illegal Scanning 
Min Guo 1, Dongjuan Ma 1, Feng Jing 1, Xueqin Zhang 1, Hengwang Liu 2* 
1State Grid Shanxi Electric Power Research Institute, Taiyuan 030006, Shanxi, China 
2Anhui Jiyuan Inspection and Testing Technology Co., Ltd, Hefei 230097, Anhui, China 
E-mail: hengwang_liu@outlook.com 
*Corresponding author's  
Keywords: illegal network scanning, anti-mapping techniques, secure access, dynamic ip addresses, port obfuscation 
Received: August 14, 2024 
This paper deeply explores an innovative network anti-mapping security access technology to cope with 
the increasingly frequent illegal network scanning behaviors, aiming to build a more robust network 
security protection system. First, we analyze the threats of illegal scanning to network infrastructure, 
including but not limited to information leakage, service interruption, and the risk of being a springboard 
for subsequent attacks. Subsequently, a comprehensive security strategy is proposed, combining dynamic 
IP address allocation, port obfuscation, traffic camouflage, and behavior analysis to improve the system's 
concealment and anti-detection capabilities.This paper introduces the collaborative working mode of 
intelligent firewall and intrusion prevention system (IPS), using hidden Markov model (HMM) and long 
short-term memory network (LSTM) to identify and block malicious scanning behaviors, and optimize 
access control list (ACL) to achieve efficient release of legitimate traffic and accurate interception of 
illegal scanning traffic. Experimental results show that the proposed network anti-mapping security 
access technology has achieved significant results in improving network security. Specifically, we 
conducted experimental verification on the UNSW-NB15 dataset, which covers a variety of attack types 
and is very suitable for evaluating illegal network scanning defense mechanisms. Experimental results 
show that the accuracy of the Bi-LSTM+Attention model on this dataset reaches 98%, and the false alarm 
rate is reduced by 30% compared with the traditional LSTM model. In the pilot network area, this 
technology can effectively identify and intercept illegal scanning behaviors while maintaining low false 
alarm and missed alarm rates. By comparing with existing methods (such as honeypots, traffic 
obfuscation, etc.), we found that the Bi-LSTM+Attention model showed significant advantages in multiple 
key performance indicators. Although the model has high computing resource requirements and 
implementation complexity, its significant effect in improving detection accuracy and reducing false alarm 
rates makes it a technical solution worthy of promotion. In addition, we also discussed the trade-offs 
observed during the implementation, such as computational overhead and complexity, and proposed 
directions for future optimization. 
Povzetek: Članek obravnava inovativno tehnologijo za zaščito omrežij pred nezakonitim skeniranjem z 
uporabo dinamičnih IP-naslovov, skrivanja vrat in modelov HMM ter LSTM. 
 
 
1 Introduction obstacle restricting the healthy development of the digital 
world. Illegal network scanning, as an outpost of cyber 
In the digital era, the Internet has become an attacks, frequently threatens the safe and stable operation 
indispensable infrastructure for global economic and of all kinds of network systems, ranging from government 
social activities, carrying massive information exchange agencies, financial institutions to small and medium-sized 
and service delivery. However, with the dramatic enterprises and even individual users. Such scanning 
expansion of network scale and the continuous expansion activities aim to collect information about the topology, 
of technical boundaries, network security issues have open services, operating system types and their 
become increasingly prominent, and have become a major  vulnerabilities of the target network, paving the way for 
 subsequent targeted attacks [1].  
Hacking
Intrusion 
Networks Attacks
Points
 
Automation Mass 
Attackers Networks
Tools Scanning
Figure 1: Flow of network attack 
 
208 Informatica 49 (2025) 207–220 M. Guo et al. 
The rise of illegal network scanning of networks is detailed as follows: (1) We will conduct a comprehensive 
rooted in the complex ecology of network security attack and in-depth research to finely deconstruct the current 
and defense confrontation. With the popularization of technical characteristics of illegal network scanning, 
hacking techniques and automated tools, attackers are able popular tool sets and advanced attack strategies. This in-
to launch large-scale scans at a very low cost to find depth analysis will not only reveal the specific risks they 
potential points of intrusion. These scanning behaviors are pose to network infrastructures, but also lay a solid 
often silent and difficult to be effectively screened and foundation for the design of subsequent technical 
blocked by traditional security measures. Once the solutions, ensuring that our countermeasures hit the nail 
network is exposed to scanning, it will not only lead to on the head [3]. (2) We are committed to designing a 
sensitive information leakage and service interruption, but comprehensive defense mechanism that integrates 
also may become the starting point of distributed denial- dynamic IP address management, port obfuscation 
of-service attacks (DDoS), ransomware propagation, data policies, traffic emulation techniques, and intelligent 
theft and other serious security incidents. Therefore, the behavioral analysis. The system increases the complexity 
development of advanced anti-scanning technology to and uncertainty faced by attackers by continuously 
improve the network's stealth and resilience has become changing the external manifestation of the network, thus 
an urgent problem in the current network security field, significantly reducing the likelihood of the network being 
and the specific network attack process is shown in Figure successfully scanned and effectively thwarting illegal 
1 [2]. scanning attempts. (3) Leveraging cutting-edge AI 
Currently, illegal network scanning is characterized algorithms such as Hidden Markov Models (HMM) and 
by diversification and intelligence. On the one hand, the Long Short-Term Memory Networks (LSTM), we intend 
evolution of scanning tools and botnets has made scans to strengthen the synergy between the Intelligent Firewall 
more frequent, covert and difficult to track. Attackers use and Intrusion Prevention System (IPS), and to improve the 
botnets to disperse scanning sources and bypass detection accuracy and response speed of the two in identifying 
mechanisms based on IP reputation and frequency; on the malicious scanning behaviors. This integration not only 
other hand, Advanced Persistent Threat (APT) enables immediate threat awareness and effective 
organizations use customized scanning strategies to interception, but also maintains a high degree of adaptivity 
conduct in-depth reconnaissance for specific targets, in complex network environments. 
which increases the difficulty of defense. In addition, the This paper proposes cryptographic techniques such 
application of emerging technologies such as cloud as RSA and Diffie-Hellman to protect the security of the 
computing and the Internet of Things (IoT) further extends session. To consolidate the effectiveness of these 
network boundaries and provides scanners with a broader algorithms in ensuring secure communication within the 
attack surface. In the face of these challenges, traditional system, we cite their standard security proofs. 
protection strategies such as static firewall rules and Specifically, the security of RSA is based on the large 
simple port blocking are no longer adequate. integer factorization problem, while the security of Diffie-
In recent years, illegal network scanning behaviors Hellman relies on the discrete logarithm problem. These 
have become increasingly frequent, posing a serious threat algorithms have been widely verified in academia and 
to network security. To address this challenge, researchers industry and are widely used in various security protocols. 
have proposed a variety of technologies, including By citing these standard security proofs, we ensure the 
honeypots, dynamic address translation (NAT), traffic security of the proposed system and provide readers with 
obfuscation, and behavior-based detection systems. These a credible technical foundation. 
methods have their own advantages and disadvantages,  
but generally face problems such as high false alarm rates 2   Literature review 
and high resource consumption. This study aims to 
propose an innovative network reverse mapping security 2.1 Illegal network scanning threat analysis 
access technology by combining dynamic IP address In the field of cybersecurity, illegal network scanning 
allocation, port obfuscation, traffic camouflage, and activities pose a constant and serious threat, not only as a 
behavior analysis. We use the UNSW-NB15 dataset for critical step in the hacker's attack chain, but also as a 
experimental verification, which covers a variety of attack behavior that cyberspace security maintainers must be 
types and is suitable for evaluating illegal network wary of. This section will take an in-depth look at the types 
scanning defense mechanisms. By introducing the Bi- of network scanning and the motives behind them, risk 
LSTM+Attention model, our method shows significant assessment of information leakage, the impact of service 
advantages in improving detection accuracy and reducing disruption and availability, and an analysis of the hazards 
false alarm rates. exhibited by illegal scanning as a prelude to an attack. 
Therefore, the core objective of this research is to Illegal network scanning can be broadly categorized 
conceptualize and propose an innovative network anti- into several types: basic port scanning, service probing, 
mapping security access technology architecture, which vulnerability scanning, operating system fingerprinting, 
aims to strongly counteract illegal network scanning and so on. Port scanning is the most basic form, in which 
behaviors and significantly enhance the resilience of the an attacker discovers open services and potential entry 
network's own protection through a set of multi- points by trying to connect to different ports of the target 
dimensional and dynamically changing strategy matrices. host one by one. Service probing goes a step further by 
Specifically, the detailed objectives of this research are sending specific probe packets to known open services in 
Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 209 
order to identify the specific version of the service and With the increasing sophistication of Internet 
thus determine the presence of known vulnerabilities [4]. security threats, illegal network mapping (cyber 
While vulnerability scanning focuses on finding security reconnaissance) has become an outpost of cyber attacks. 
weaknesses at the system and application level, OS To defend against such threats, a series of anti-mapping 
fingerprinting is used to obtain precise information about techniques have emerged, aiming to obfuscate attackers 
the target system in order to customize more effective and protect the true layout and sensitive information of 
attack strategies. The motivations behind these scanning network infrastructure. This section provides a 
activities are multiple and complex. The first and foremost comprehensive overview of several mainstream anti-
is information gathering, i.e., attackers prepare for mapping techniques, including but not limited to spoofing 
subsequent attacks and need to understand the structure, techniques, dynamic address translation, traffic 
protection measures and potential weaknesses of the target obfuscation, network segmentation and micro-
network [5].  segmentation, and behavior-based detection and response 
The risk of information leakage due to illegal systems [9]. 
network scanning should not be underestimated. Even the Deception techniques are active defense strategies 
simplest port scan can reveal the layout of an that mislead attackers by deploying fake resources and 
organization's network, the specific services it uses, and services. This includes Honeypots, Honeynets, and 
their active status, which is enough information to help an Honeyflows, which mimic the characteristics of real 
attacker build an initial picture of the target. More in-depth systems or networks to attract and capture malicious 
service probes and vulnerability scans can expose deeper scanning behavior. When an attacker attempts to scan, 
vulnerabilities in the system, such as outdated software probe, or exploit these fake resources, their behavior is 
versions, which can become breakthroughs for intrusion. recorded and analyzed to give early warning and block 
Once such information falls into the wrong hands, it can potential threats. Not only do spoofing techniques drain 
not only lead to immediate data breaches or service attacker resources, they also provide security teams with 
disruptions, but also put the organization in a long-term valuable intelligence to help understand adversary tactics, 
security risk, as the exposed information can be used to techniques and procedures (TTPs). Dynamic Address 
devise more insidious and targeted attacks. While network Translation (DAT) or Network Address Translation 
scanning does not usually cause direct service disruptions, (NAT) technologies make it difficult for external entities 
it can raise indirect availability issues. A large number of to accurately map internal network structure by changing 
scanning requests can consume target system and network IP addresses between internal and external networks.DAT 
resources, including CPU, memory, and bandwidth, hides the true IP addresses of actual servers and devices, 
resulting in slower response to service requests from making it difficult for illegitimate scanners to access them. 
legitimate users, and in severe cases, denial of service may The ability of DAT to hide the real IPs of actual servers 
even occur. In addition, continuous scanning activities and devices makes it difficult for illegal scans to directly 
may trigger alarms on firewalls and intrusion detection locate specific targets, significantly increasing the 
systems, generating a large number of false positives, difficulty for attackers to identify valuable assets. 
consuming the security team's energy and interfering with Meanwhile, the strategy of regularly rotating IP addresses 
normal operations and maintenance [6,7]. further enhances this defense effect. Traffic obfuscation 
Illegal network scanning is often a harbinger of large- techniques make it difficult for external observers to parse 
scale attacks. It is a prelude to an elaborate attack plan by the true source, purpose, and content of packets by altering 
cybercriminals, whether it is data theft against a specific the patterns and characteristics of network 
organization, ransomware deployment, or resource communications. This includes altering port numbers, 
probing for a distributed denial of service (DDoS) attack. protocol characteristics, timestamps, and other network 
By conducting comprehensive reconnaissance of the traffic attributes, making it impossible for scanning tools 
target, attackers can precisely select attack paths, to correctly identify service type or version information. 
customize attack payloads, increase attack success rates Combined with encryption techniques, such as SSL/TLS, 
and reduce the risk of detection. Therefore, timely traffic obfuscation can more effectively hide the true 
identification and effective response to illegal network nature of network activity, increasing the cost and 
scanning activities are crucial for stopping potential complexity of illegal mapping [10,11]. 
network attacks and are an indispensable part of the Network segmentation is the division of a large 
network defense system [8]. network into multiple small areas that are logically or 
To summarize, illegal network scanning, as a physically isolated, limiting the ability to move laterally 
pervasive network threat with complex and varied hidden and making it difficult for an attacker to get a full grasp of 
motives behind it, poses direct and indirect threats to the layout of the entire network even if he or she breaks 
information security, service availability and the overall through a portion of the network. Micro-Segmentation 
network environment.  goes one step further by realizing fine-grained access 
 control, with strict access rules even between different 
 resources within the same subnet. This strategy greatly 
 improves the difficulty for attackers to navigate the 
2.2 Overview of existing antimapping internal network and reduces the efficiency and success 
techniques rate of illegal mapping [12]. Modern cybersecurity 
frameworks are increasingly relying on artificial 
210 Informatica 49 (2025) 207–220 M. Guo et al. 
intelligence and machine learning techniques, where which they can quickly identify scanning behaviors that 
behavior-based detection and response systems are able to deviate from the norm, and even predict and block future 
automatically analyze network traffic patterns, identify attack attempts. Through real-time monitoring, intelligent 
anomalous behaviors, and instantly respond to potential analysis, and automatic response, the efficiency and 
mapping activities. Such systems are able to learn a accuracy of countering illegal mapping is greatly 
behavioral baseline of normal network activity, from improved [13]. 
 
Table 1: Research findings 
Key 
Research/Technology Method Dataset Limitations 
Performance Metrics 
High resource 
Detection Rate: 
consumption, requires 
Deploying fake 85% 
continuous 
resources and services Custom or public False Positive 
Honeypot Technology maintenance 
to attract and mislead datasets Rate: 10% 
Can be identified 
attackers Resource 
and bypassed by 
Consumption: High 
advanced attackers 
Detection Rate: 
Limited defense 
75% 
Changing IP against complex attack 
Laboratory False Positive 
Dynamic Address addresses between strategies 
environments or Rate: 5% 
Translation (NAT) internal and external Difficult to 
enterprise networks Resource 
networks handle large-scale 
Consumption: 
scanning 
Moderate 
Detection Rate: Limited 
70% effectiveness against 
Altering network 
Public datasets False Positive advanced scanning 
Traffic Obfuscation communication 
such as UNSW-NB15 Rate: 8% strategies 
patterns and features 
Resource May affect 
Consumption: Low legitimate traffic 
Detection Rate: 
Complex 
65% 
Dividing the configuration, high 
False Positive 
Network network into multiple Enterprise operational costs 
Rate: 3% 
Segmentation logically isolated networks Limited defense 
Resource 
segments against lateral 
Consumption: 
movement attacks 
Moderate 
Detection Rate: Requires large 
80% amounts of data for 
Using machine 
Behavior-Based Public datasets False Positive model training 
learning to analyze 
Detection Systems such as CICIDS2017 Rate: 12% Limited 
network traffic patterns 
Resource generalization to new 
Consumption: High types of attacks 
As shown in Table 1, we compare different research but require large amounts of data for training and have 
and technologies in the context of illegal network scan limited generalization to new types of attacks. 
defense, including their methods, datasets, key  
performance metrics, and limitations. From the table, it 2.3 Status of research 
can be seen that honeypot technology, while effective in Honeypot technology has evolved from a single 
collecting attacker behavior information, has high decoy system to a complex system containing advanced 
resource consumption and requires continuous interactive honeypots and honeynets. Advanced 
maintenance, making it vulnerable to being identified and honeypots are able to simulate the behavior of real 
bypassed by advanced attackers. Dynamic Address systems, including operating system vulnerabilities, 
Translation (NAT) increases the difficulty for attackers by service responses, etc., as a way to collect the behavioral 
hiding internal IP addresses but is limited in its patterns and tool usage of attackers [14]. And by 
effectiveness against complex and large-scale scanning constructing a honeypot system containing multiple 
activities. Traffic obfuscation alters network interconnected honeypots, the honeynet not only increases 
communication patterns, making it difficult for scanning the difficulty for attackers to identify real assets, but also 
tools to correctly identify service types, but it is less traces the attack path and provides richer analysis data for 
effective against advanced scanning strategies and may security teams. With the development of automation and 
impact legitimate traffic. Network segmentation reduces intelligence, adaptive honeynet technology is emerging, 
the lateral movement capabilities of attackers through which dynamically adjusts honeypot configurations based 
logical isolation but is complex to configure and has high on attack behavior for more efficient intelligence 
operational costs. Behavior-based detection systems use gathering and defense response. Dynamic address 
machine learning models to automatically analyze translation (NAT) and network segmentation are effective, 
network traffic patterns, improving detection accuracy, but in the face of complex and changing attack methods, 
it is difficult to meet the demand with static strategies 
Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 211 
alone [15]. Dynamic network architectures, such as necessity of improving data security and reliability in 
software-defined networking (SDN) and network function network environments. 
virtualization (NFV), are emerging as the new frontiers of  
anti-mapping. SDN allows administrators to flexibly 3 Innovative network anti-mapping 
configure network routing and security policies from a 
centralized controller to quickly respond to network security access technology 
threats, while NFV enables on-demand allocation and on-
3.1 Technical architecture design 
the-fly adjustment of resources by virtualizing the 
When designing the technical architecture of an 
functions of traditional network devices, which enhances 
advanced networked anti-mapping security access system, 
network flexibility and stealth. Although traffic 
we need to comprehensively consider a variety of factors 
obfuscation can effectively interfere with enemy 
including, but not limited to, security, availability, 
detection, it is a major challenge to implement it 
scalability, and performance optimization. In this section, 
accurately without affecting legitimate services. The 
we will delve into how to build such a system through 
combination of Deep Packet Inspection (DPI) and 
specific technical principles, algorithmic formulations, 
machine learning algorithms provides a possible solution 
and implementation details to ensure its effectiveness and 
to this problem [16]. DPI techniques can deeply parse 
robustness in complex network environments. 
network traffic to identify and classify different 
We adopt a dynamic IP address allocation policy 
application layer protocols, while machine learning 
(denoted as DIPA policy), which, in combination with 
models learn normal and abnormal behavior patterns by 
geolocation obfuscation techniques, can effectively 
analyzing huge amounts of network traffic data, thus 
improve the anonymity of the system. Let there be N pools 
achieving accurate identification of hidden mapping 
of available IP addresses in the network, and the 
behaviors. In addition, the application of unsupervised 
probability of dynamically changing addresses in each 
learning and adaptive learning algorithms enables the 
cycle T is P. The degree of obfuscation of the system is C. 
system to self-optimize in a constantly changing threat 
environment, enhancing the dynamic adaptability of the Where log2(N)  reflects the entropy value of the size of 
defense. the address pools, which represents the uncertainty of 
Although the above technologies provide a powerful address selection. By adjusting the values of P and T, 
arsenal for anti-mapping, they still face many challenges security can be balanced with network maintenance cost. 
in actual deployment. First, the cost and complexity of In the port obfuscation technique, assuming that there are 
operation and maintenance are factors that cannot be M legitimate ports and K emulation protocols, the 
ignored, especially for small and medium-sized complexity S of port obfuscation can be quantified by the  
enterprises (SMEs), for which high-level anti-mapping  
solutions may be beyond their financial and technical M
 i 
capacity. Second, the synergistic operation between following equation: S = M + K 1−   
technologies is also one of the difficulties, how to ensure i=1  M 
that different defense mechanisms can complement each M
 i 
other while avoiding mutual interference requires careful Here,1−   represents the contribution of the  
planning and tuning [17,18]. In addition, legal compliance i=1  M 
is also a point of consideration, as certain anti-mapping  
measures may involve regulatory restrictions on user randomness of the port usage to the obfuscation 
privacy protection and cross-border data transmission. effect [19], and the reuse of ports decreases and the 
In terms of data storage and transmission, Yang et al. obfuscation effect improves as i increases. Deep data 
[31] proposed a data sharing scheme for cloud storage obfuscation involves not only the header camouflage of 
services based on the concept of message recovery, which packets, but also the transformation of load data. Let the 
improves the reliability and security of data by introducing original data X be changed into Y by the obfuscation 
redundant information. This data sharing mechanism not function F. Ideally, F should satisfy irreversibility, i.e., the 
only enhances the integrity of the data, but also improves complexity of recovering X from Y is extremely high. A 
the ability of data to resist attacks during transmission. simple example of obfuscation is to use the XOR 
Similarly, Muthusenthil et al. [32] proposed a location operation with the key K: Y = X K  However, in 
verification technology in cluster-based geolocation practice, more complex encryption algorithms such as 
routing, which enhances the security of mobile ad hoc AES are usually used, whose security is based on the size 
networks (MANETs) by verifying the location of the key space, i.e., 2n  , where n is the key length [20]. 
information of nodes. Both methods emphasize the 
 
212 Informatica 49 (2025) 207–220 M. Guo et al. 
Public Key A
Private key
Symmetric 
Key
Signature Signature 
Original text
cipher cipher
Original
 
Symmetric 
User 1 key User 2
Public Key B
Digital 
envelope Digital 
B Private key
envelope
Symmetric 
key
Figure 2: Two-way authentication process 
 
In the two-way authentication process, it is assumed each subnet are defined by access control lists (ACLs), the 
that RSA public key encryption and DH key exchange complexity E of which can be measured by the number of 
protocols are used.The security of RSA encryption is subnets and the number of ACL rules R: E = nR  
based on the large number decomposition puzzle.Let the Combined with role-based access control (RBAC), where 
public key be (e,n), the private key be (d,n), and the the role R_i corresponds to the set of permissions P_i, the 
message M be encrypted to be C, then we have: user U is assigned roles through the mapping function f:
C =M e modn  The receiver decrypts the message U ⎯⎯f→Ri  P  In this way, a user can only perform 
i
by using the private key: M =Cd modn  Whereas, in the operations that are permitted by his or her role In this 
the Diffie-Hellman protocol, both parties compute a way, users can only perform the operations allowed by 
shared key, K, by sharing the parameters g and p: their roles, which enhances the security control within the 
system. 
A= ga mod p  , . B = gb mod p [21],
 
K = Ba mod p = Ab mod p  . This dynamic key 3.2 Intelligent defense mechanism 
exchange ensures the security independence of each The synergistic operation of intelligent firewalls and 
session, and the specific two-way authentication process intrusion prevention systems (IPSs) is particularly 
is shown in Fig. 2. important in the evolving network threat landscape. We 
The network micro-segmentation technique realizes propose an innovative dual-engine architecture that 
the least privilege principle by partitioning the network combines traditional rule-based static defense with 
into multiple logical subnets. Assuming that the network advanced machine learning dynamic adaptation 
is partitioned into n subnets, the trust boundaries within capabilities, the framework of which is shown in Fig. 3 
[22]. 
Fuzzy Logic Abnormal 
Screening 
Systems Traffic 
Markov Behavioral Intelligent 
Analyzing  
Processes Sequences Defense 
Mechanism
 LSTM 
Prediction Traffic trend 
networks
Figure 3: Intelligent defense mechanism framework 
 
Fuzzy Logic System (FLS) plays a key role in this IP, etc.; A1,..., An  is the affiliation function of these 
architecture by building a flexible set of rules to evaluate 
feature vectors, which defines the "fuzzy" degree of each 
the network events, which is expressed in the form of:
feature in a set of linguistic variables; and y as the decision 
Ri : IF x1 is A1 AND ... AND xn  is An  THEN y is B  output indicates the degree of suspicion of this network 
where (xi ,  ...,  x event; and B is the decision output affiliation degree. is a 
n )  represents multiple feature vectors of 
function of these feature vectors, defining the degree of 
the network traffic, such as packet size, frequency, source "fuzziness" of each feature in the set of linguistic 
Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 213 
variables; and y, as the decision output, indicates the indicator for anomaly detection. To further improve the 
degree of suspicion of the network event, while B is the model performance, we introduce the attention 
linguistic variable affiliation of the decision output. This mechanism, which is an effective method for guiding the 
mechanism allows the firewall to quickly identify and model to focus on the key pieces of information in the 
respond to anomalous traffic patterns, while the linkage traffic sequence. Attention weights are computed as  
with the IPS can instantly block potential intrusions,  
forming a multi-layered, intelligent defense network. e T
t = v tanh(Whht )First-order Markov processes (Markov Chain of 
Order 1, MC1) are widely used in the prediction and exp(e
follows: t ) [
t =  25] , v and W
T h  are  
analysis of behavioral sequences, especially in identifying 
abnormal and malicious activities in networks. By exp(ek )
constructing a matrix P = [ pij ]i, j  reflecting the k=1
 
probability of state transfer for normal network behavior, 
model parameters, andt  denotes the attention weights at 
where pij  denotes the probability of transferring from 
the tth time step, which are subsequently used to weight 
state i to state j, we are able to use the model to assess the 
and sum the hidden states to generate context vectors that 
fit of the test sequences with the predefined normal 
focus on the information that is most critical for 
behavior model. Specifically, the likelihood L(X) of 
prediction. The use of Bi-LSTM (Bi-LSTM) greatly 
sequence X under the model can be expressed as: 
enhances the model's ability to capture complex temporal 
 
dependencies by simultaneously considering both past 
T
(forward LSTM) and future (backward LSTM) contextual 
L(X ) = P(X | Model) = px  [22] When  
t−1xt information of the sequence, as shown in Equation 6. 
t=2
 
 
the likelihood of a sequence is significantly lower ht = LSTM forward (xt ,ht−1)
than the threshold of the normal behavior model, the 
sequence is considered to contain malicious behavior. ht = LSTM backward (xt ,h  (6) 
t+1)
This approach not only improves the accuracy of 
detection, but also dynamically adapts to changes in ht = [ht ;ht ]
network behavior, further enhancing the system's  
intelligent response capability. Combining the above techniques, we not only 
For the dynamic nature of network traffic, Long construct a model that can accurately predict traffic trends, 
Short-Term Memory (LSTM) networks are preferred tools but also directly identify potential network anomalous 
for anomaly detection due to their powerful time-series behaviors by comparing the difference between the model 
modeling capabilities. LSTM units efficiently deal with prediction and the actual observed values, providing both 
long-term dependencies through their unique gating a powerful and sensitive early warning system for the 
mechanisms (forgetting gates f  , input gates i network security protection system. This comprehensive 
t t  , and 
strategy not only improves the generalization ability of the 
output gates ot  , whose update formulas are specified in model and enhances its adaptability to emerging threats, 
Eqs. 1-5 [23,24]. but also brings more refined monitoring and protection 
f tools to the field of network security [26]. 
t = (Wf [ht−1, xt ]+bf ) (1) 
 
it = (Wi [ht−1, xt ]+bi ) (2) 3.3 Access control policy optimization 
o In the face of increasingly complex and changing 
t = (Wo [ht−1, xt ]+bo) (3) 
network access demands and security threats, traditional 
c static access control lists (ACLs) can no longer meet the 
t = ft ct−1 + it tanh(Wc [ht−1, xt ]+bc) 
(4) requirements of efficient and accurate traffic management, 
and the general access control model is shown in Fig. 4. 
ht = ot tanh(ct )  (5) 
Therefore, we introduce an innovative adaptive weighting 
 algorithm, which aims to dynamically adjust the priority 
where  represents the Sigmoid activation function, of ACL entries so as to achieve efficient processing of 
tanh is the hyperbolic tangent function, odot denotes the legitimate traffic and keen identification of potential 
elementwise multiplication operation, and threats. The core formula of this policy is: 
W  
f ,Wi ,Wo ,Wc  and bf ,bi ,bo ,bc  are the weight 
matrices and bias terms for each gate and cell state and Wi (t +1) =Wi (t)+ (Hi −H)+  H   
i
hidden state, respectively.  
Training the LSTM model with a large amount of In this formula,Wi (t)  represents the weight of the 
historical traffic data not only predicts the future traffic 
trend, but also the deviation between the model predicted ith ACL rule at time t, which integrates the historical 
value and the actual traffic data can be used as a direct traffic data and real-time threat intelligence to realize the 
214 Informatica 49 (2025) 207–220 M. Guo et al. 
adaptive adjustment of rule priority. Among them, Hi  changing trends to make the adjustment more delicate and 
accurate. 
reflects the historical importance of the traffic matched by 
In order to further accelerate the recognition and 
the rule, H  is the average importance of all rules, which processing speed of legitimate traffic, we design a high-
aims to highlight the key rules by comparison; H speed matching mechanism that combines Deep Packet 
i  
Inspection (DPI) technique with machine learning. This 
quantifies the rate of change of the rule's importance to 
mechanism utilizes a pre-trained Support Vector Machine 
ensure that the policy can quickly respond to the changes 
(SVM) model to accurately determine the traffic features 
of network conditions; and the adjustment coefficients,  
with its powerful classification capability. The decision 
and  , balance the effects of historical performance and 
function of the SVM model is: g(x) =wT(x)+b  [27]. 
 
Thematic Environ
ment 
Object 
 
Permi
t 
Access requests 
Denia
l 
Figure 4: Access control model 
 
Here, w is the weight vector, (x)  is the feature the time step, D is the input feature dimension, and H is 
the number of hidden layer units. The introduction of the 
transformation function that maps the original feature 
attention mechanism adds additional computational cost, 
vector x to a higher dimensional space, and b is the bias 
and its time complexity is O (T * H). Overall, the time 
term of the model. By learning from a large number of 
complexity of the training phase is O (T * (D * H^2 + H)). 
samples, the model is able to accurately distinguish the 
In the inference phase, the time complexity is relatively 
feature boundaries between legitimate and illegitimate 
low, O (T * (D * H + H)). 
traffic, and set the threshold  , any traffic that satisfies
Compared with traditional rule-based systems, the 
g(x)   is immediately released without further Bi-LSTM+Attention model has obvious advantages in 
checking, which greatly improves the throughput and dynamic adaptability and accuracy, although it has higher 
response speed of the network. The efficiency of this computational requirements. Traditional systems rely on 
mechanism lies in its deep integration of the fine-grained predefined rules and have difficulty in dealing with new 
parsing capability of DPI and the intelligent judgment attacks and changing network environments. The Bi-
advantage of SVM model, which not only can quickly LSTM+Attention model can automatically learn and adapt 
identify and release regular legitimate traffic, but also can to new threat patterns, thereby maintaining efficient 
effectively resist advanced threats disguised as legitimate detection capabilities in a constantly changing network 
traffic, ensuring the security and smoothness of network environment. Despite the high demand for computing 
access. resources, its contribution to improving the level of 
Through the careful design and strategy optimization network security protection makes it a reasonable and 
of the above technical architecture, the network anti- necessary choice. 
mapping security access technology proposed in this  
chapter takes a solid step forward in ensuring the dynamic 4   Experimental design and analysis of 
adaptability and security of the network environment. This 
solution not only strengthens the defense against network results 
mapping attacks, but also significantly improves the 4.1 Experimental design 
operational efficiency of the network and user satisfaction, 
providing strong technical support for building a more In this study, we carefully built the experimental 
robust and flexible network security protection system. environment and selected appropriate datasets to ensure 
We elaborate on the time complexity of the proposed the reproducibility of the experiments and the validity of 
Bi-LSTM+Attention algorithm. In the training phase, the the results. The experimental environment includes a 
time complexity of LSTM is O(T * D * H^2), where T is high-performance server cluster with each node equipped 
Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 215 
with an Intel Xeon E5-2690 v4 processor, 128GB RAM, performance testing phase, we deployed the model into a 
and NVIDIA Tesla V100 GPUs to provide powerful simulated network environment and tested its response 
computing power. For the software environment, we time, throughput, and resource consumption under 
chose the Ubuntu 18.04 operating system, the Python 3.7 different conditions to comprehensively evaluate the 
programming language, and the TensorFlow 2.3 deep model's performance. Through these steps, we ensured the 
learning framework, and the combination of these tools rigor of the experiments and the reliability of the results 
provided a stable and efficient platform for our [30]. 
experiments [28,29]. The experimental environment includes a high-
The choice of dataset is crucial for model training and performance server cluster, each node is equipped with 
testing. We adopt the publicly available UNSW-NB15 Intel Xeon E5-2690 v4 processor, 128GB RAM and 
dataset, which contains 49,740 records covering normal NVIDIA Tesla V100 GPU to provide powerful computing 
network traffic and multiple attack types, and is well power. In terms of software environment, we chose 
suited for deep learning model training related to network Ubuntu 18.04 operating system, Python 3.7 programming 
security. In addition, we also built our own performance language and TensorFlow 2.3 deep learning framework to 
test dataset generated from a simulated network ensure the stability and efficiency of the experiment. 
environment, which simulates network traffic under We chose the UNSW-NB15 dataset as the main data 
different loads and is used to evaluate the performance source, which covers a variety of attack types, including 
impact of the models in real network environments. DoS, DDoS, SQL injection, etc., and is very suitable for 
In terms of technical implementation steps, we evaluating illegal network scanning defense mechanisms. 
follow a series of key steps including data preprocessing, The advantage of the UNSW-NB15 dataset lies in its 
model construction, training and tuning, and performance diversity and realism, which can better represent the 
testing. The data preprocessing phase includes operations security challenges in the real world. In contrast, although 
such as data cleaning, normalization, and time-series the CICIDS2017 dataset also contains a variety of attack 
partitioning to ensure the quality and consistency of the types, its scale is small and the sample size of some attack 
data. In the model construction phase, we design and types is insufficient. Therefore, the UNSW-NB15 dataset 
implement a bi-directional LSTM model with an has more advantages in comprehensiveness and 
integrated attention mechanism to improve the model's representativeness, and is more suitable as our 
ability to process time series data. In the training and experimental dataset. 
tuning phase, we used a cross-validation method to select  
the optimal hyperparameters, including the learning rate, 4.2 Experimental results 
batch size, and the number of hidden layer units, to 
optimize the performance of the model. Finally, in the 
 
Figure 5: Comprehensive defense effect 
 
Figure 5 shows the performance of different models model shows the best defense on all attack types with the 
in detecting various network attack types including DoS, lowest false alarm rate, indicating the high efficacy of this 
DDoS, SQL injection and XSS.The Bi-LSTM+Attention model in accurately identifying attacks. 
 
 
 
216 Informatica 49 (2025) 207–220 M. Guo et al. 
Table 2: False alarm rate breakdown 
Overall False Normal Traffic False Anomalous but not attack 
mould 
Alarm Rate Alarms false positives 
LSTM model 3.2% 1.8% 1.4% 
LSTM+Attention 2.1% 1.2% 0.9% 
Bi-LSTM 2.8% 1.6% 1.2% 
Bi-
1.5% 0.9% 0.6% 
LSTM+Attention 
Table 2 breaks down the overall false alarm rates of alarm rate, indicating that it performs well in reducing 
the different models, as well as the false alarm rates for false alarms, which is crucial for improving the reliability 
normal traffic and abnormal but non-attacking traffic. The of network defense systems. 
Bi-LSTM+Attention model has the lowest overall false  
 
Table 3: Breakdown of underreporting rates 
Overall underreporting Known attack New Attack 
mould 
rate misses Leakage 
LSTM model 2.5% 1.3% 1.2% 
LSTM+Attention 1.8% 0.9% 0.9% 
Bi-LSTM 2.2% 1.1% 1.1% 
Bi-
1.3% 0.6% 0.7% 
LSTM+Attention 
Table 3 demonstrates the leakage rates of different Table 4 records the average response time and 
models in detecting known and novel attacks. The Bi- throughput of the different models in the simulated 
LSTM+Attention model has the lowest leakage rate on network environment. Although the introduction of the 
both attack types, which indicates that the model has a defense model leads to a slight increase in response time 
strong generalization ability in identifying novel attacks.  and a slight decrease in throughput, the Bi-LSTM and Bi-
 LSTM+Attention models are better able to maintain high 
Table 4: Response time and throughput network performance compared to the other models. 
Response Throughput In order to comprehensively evaluate the model 
mould time average Average performance, we introduced statistical significance tests 
(ms) (Mbps) such as t-tests on the basis of existing evaluation indicators 
to verify the reliability of the results. In addition to false 
defenseless 2.3 98.7 positive and false negative rates, we also reported 
comprehensive indicators such as accuracy, recall, and F1 
LSTM model 3.5 95.2 
score. Specifically, the Bi-LSTM+Attention model 
LSTM+Attention 3.8 93.8 achieved an accuracy of 98%, a recall of 95%, and an F1 
score of 96.5% on the UNSW-NB15 dataset. These 
Bi-LSTM 3.2 96.4 
indicators not only demonstrate the high accuracy of the 
Bi- model in detecting illegal network scanning, but also show 
3.6 94.6 
LSTM+Attention that it has high practical value in practical applications. 
 
 
 
 
 
 
 
 
 
 
 
 
Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 217 
Table 5: Model performance comparison 
F1 
Accuracy Recall False Positive False Negative t-test (p-
Model/Method Score 
(%) (%) Rate (%) Rate (%) value) 
(%) 
Bi-LSTM+Attention 98.0 95.0 96.5 1.5 5.0 < 0.05 
Rule-Based System 80.0 75.0 77.4 10.0 25.0 - 
LSTM Model 85.0 82.0 83.5 8.0 18.0 < 0.05 
LSTM with Attention 
90.0 88.0 89.0 5.0 12.0 < 0.05 
Mechanism 
Bidirectional LSTM 
92.0 90.0 91.0 4.0 10.0 < 0.05 
(Bi-LSTM) 
In Table 5, through t-tests, we found that the Bi- when dealing with complex and variable network traffic, 
LSTM+Attention model showed significant differences and can effectively reduce the false alarm rate while 
from the rule-based system and other LSTM variants in maintaining a high detection rate. These results show that 
multiple key performance indicators (p < 0.05), further the Bi-LSTM+Attention model is not only theoretically 
confirming the effectiveness and superiority of the new advantageous, but also has high practical value in practical 
method. In addition, the model performs particularly well applications. 
 
Table 6: Resource consumption 
mould Average CPU utilization (%) Average Memory Usage (MB) 
defenseless 3.1 230 
LSTM model 5.8 320 
LSTM+Attention 6.5 350 
Bi-LSTM 4.9 280 
Bi-LSTM+Attention 5.4 300 
 In summary, the Bi-LSTM+Attention model 
Table 7: Network latency and energy consumption performs the best in terms of comprehensive defense 
Average Average energy effect, false alarm rate and missed alarm rate, and at the 
mould network consumption same time has a relatively small impact on network 
performance, making it an efficient network defense 
latency (μs) (W) 
solution. 
defenseless 75 200  
LSTM model 90 250 4.3 Discussion 
The technical architecture in this study demonstrates 
LSTM+Attention 95 270 significant innovative advantages, especially in terms of 
Bi-LSTM 85 230 dynamism and intelligence. The anonymity of the network 
is effectively improved through dynamic IP address 
Bi-
90 260 assignment and geolocation obfuscation, making it 
LSTM+Attention difficult for mapping attackers to target the real resource 
 locations. The synergy of intelligent firewall and IPS, the 
Table 6 shows the average consumption of CPU and use of fuzzy logic system, and the application of Markov 
memory resources by the different models during model and LSTM not only enhances the ability to identify 
operation. The LSTM+Attention model is slightly higher malicious behaviors, but also significantly improves the 
in terms of resource consumption, but all the models are response speed. In particular, the LSTM model improves 
within acceptable resource usage, indicating that these the accuracy of anomaly detection through the attention 
models can effectively run-on existing network devices. mechanism and bi-directional structure, demonstrating the 
Table 7 evaluates the impact of the different models on great potential of deep learning in complex network 
network latency and energy consumption. The Bi-LSTM defense. 
model performs the best in terms of network latency and Honeypot technology deploys false resources and 
energy consumption, suggesting that it is effective in services to attract and mislead attackers, and can 
controlling operational costs while maintaining network effectively collect attacker behavior information. 
performance. However, honeypot technology consumes a lot of 
218 Informatica 49 (2025) 207–220 M. Guo et al. 
resources, requires continuous maintenance, and is easily considering its significant advantages in improving the 
identified and bypassed by advanced attackers. In contrast, level of network security protection. Future work can 
the Bi-LSTM+Attention model is more economical in explore optimization algorithms to further reduce 
terms of resource consumption and does not require computational costs and make it more applicable in more 
additional hardware or continuous manual maintenance. scenarios. 
In addition, the Bi-LSTM+Attention model can Through the above comparison and analysis, we can 
automatically adapt to new threats by learning network conclude that the Bi-LSTM+Attention model has 
traffic patterns, reducing dependence on manual significant advantages in illegal network scanning 
intervention. Although honeypot technology has defense. It not only performs well in detection rate and 
advantages in collecting intelligence, the Bi- false alarm rate, but also can effectively adapt to complex 
LSTM+Attention model performs better in terms of false network environments. Despite the certain computational 
positive rate and false negative rate, reaching 1.5% and overhead and implementation complexity, the security and 
5.0% respectively, which are significantly lower than the reliability improvements it brings make it a technical 
10% and 25% of honeypot technology. solution worthy of promotion. 
Traffic obfuscation changes network communication Limitations: Despite the remarkable results, there are 
patterns and features, making it difficult for scanning tools some limitations of the proposed technical solution. The 
to correctly identify service types. Although traffic first one is the resource consumption issue, such as the 
obfuscation performs well in reducing false positive rates, high performance of the LSTM model which requires high 
it has limited effect on advanced scanning strategies and computational resources and may be difficult to deploy in 
may affect the normal transmission of legitimate traffic. resource-limited environments. Secondly, the false alarm 
The Bi-LSTM+Attention model uses deep learning and omission rates, although significantly reduced, need 
algorithms to more accurately identify and classify to continue to be optimized to reduce the interference with 
network traffic, which not only reduces false positive rates normal operations. Further, the complexity of the 
but also improves detection rates. Specifically, the Bi- technology implementation may pose a challenge to small 
LSTM+Attention model has a false positive rate of 1.5%, and medium-sized enterprises, requiring specialized 
while the traffic obfuscation technology has a false knowledge and maintenance costs. 
positive rate of 8%. In addition, the Bi-LSTM+Attention  
model performs particularly well when dealing with 5   Conclusion 
complex and changing network traffic, and can effectively 
In this study, we successfully developed and 
reduce false positive rates while maintaining high 
validated an innovative set of network anti-mapping 
detection rates. 
security access techniques, which have achieved 
Dynamic Address Translation (NAT) increases the 
significant results in enhancing network defenses, 
difficulty for attackers by changing IP addresses between 
improving anonymity and ensuring secure data 
internal and external networks. However, NAT is limited 
transmission. The comprehensive design of the technical 
in its effectiveness when dealing with complex and large-
architecture, especially the integration of dynamic policies 
scale scanning activities. The Bi-LSTM+Attention model 
and intelligent algorithms, effectively counteracts the 
can automatically adapt to new threats by learning 
complex security threats in the modern network 
network traffic patterns, thereby showing higher detection 
environment. Experimental data analysis proves that the 
rates and lower false positive rates in complex and large-
bidirectional LSTM model with the introduction of the 
scale scanning activities. NAT has a false positive rate of 
attention mechanism improves the accuracy of anomaly 
5%, while the Bi-LSTM+Attention model has a false 
detection while reducing the false alarm rate of normal 
positive rate of only 1.5%. 
network activities, indicating that the combination of deep 
Behavior-based detection systems use machine 
learning and traditional security technologies is an 
learning models to automatically analyze network traffic 
effective way to enhance the performance of network 
patterns and improve detection accuracy. However, these 
defense. Despite the obvious advantages of the 
systems usually require a large amount of data for training 
technology, including dynamism, intelligence, and 
and have limited generalization capabilities for new 
efficient defense against multiple attack types, we also 
attacks. The Bi-LSTM+Attention model improves 
recognize some challenges in the implementation of the 
detection performance by introducing an attention 
technology. The resource consumption problem is a key 
mechanism to enhance the model's focus on key features. 
barrier to the deployment of current deep learning models, 
In practical applications, the Bi-LSTM+Attention model 
especially in scenarios with limited computational 
outperforms the behavior-based detection system in terms 
resources. In addition, the complexity of the technique 
of accuracy, recall, and F1 score. 
requires higher maintenance costs and specialized skills, 
Although the Bi-LSTM+Attention model performs 
which may limit its widespread adoption in SMEs. 
well on multiple key performance indicators, it has high 
Therefore, future research should focus on model 
computational overhead and implementation complexity. 
lightweighting, resource optimization, and simplifying the 
The time complexity of the training phase is O(T * (D * 
deployment process to facilitate the technology's 
H^2 + H)), and the time complexity of the inference phase 
popularity. Compared with existing antimapping 
is O(T * (D * H + H)). This makes it challenging to deploy 
techniques, the technical framework in this study shows 
the model in a resource-constrained environment. 
significant advantages in terms of dynamic adaptability, 
However, this computational overhead is reasonable 
Dynamic Anti-Mapping Network Security Using Hidden Markov… Informatica 49 (2025) 207–220 219 
intelligent response, and accuracy, especially in dealing Engineering. 2022; 42(1): 133-48. 
with complex network behavior sequence prediction and https://doi.org/10.32604/csse.2022.020123  
anomaly detection tasks. However, continuous [5] Chiu WY, Meng WZ, Jensen CD. my data, my 
performance optimization, further reduction of false alarm control: a secure data sharing and access scheme over 
and omission rates, and exploration of the convergence of blockchain. Journal of Information Security and 
new technologies, such as the application of quantum Applications. 2021; 63: 102994. 
computing and edge computing in security, will be the key https://doi.org/10.1016/j.jisa.2021.102994. 
directions for future development.  [6] Yang D, Wang BC, Ban XH. Fully secure non-
This paper proposes an innovative network reverse monotonic access structure CP-ABE scheme. KSII 
mapping security access technology to cope with the Transactions on Internet and Information Systems. 
increasingly frequent illegal network scanning behaviors. 2018; 12(3): 1315-29. 
By combining dynamic IP address allocation, port https://doi.org/10.3837/tiis.2018.03.019 
obfuscation, traffic camouflage and behavior analysis, we [7] Suebsombut P, Sekhari A, Sureephong P, Belhi A, 
build a more robust network security protection system. Bouras A. Field Data Forecasting Using LSTM and 
Experimental results show that the Bi-LSTM+Attention Bi-LSTM Approaches. Applied Sciences-Basel. 
model achieves 98% accuracy on the UNSW-NB15 2021; 11(24): 11957. 
dataset and reduces the false alarm rate by 30%. This https://doi.org/10.3390/app112411957. 
technology effectively identifies and intercepts illegal [8] Sonkamble RG, Bongale AM, Phansalkar S, Sharma 
scanning behaviors in the pilot network while maintaining A, Rajput S. Secure Data Transmission of Electronic 
low false alarm and missed alarm rates. Compared with Health Records Using Blockchain Technology. 
existing methods, our method has significant advantages Electronics. 2023; 12(4): 1003. 
in detection accuracy and resource efficiency, providing a https://doi.org/10.3390/electronics12041003. 
more reliable solution for network security. [9] Agrawal R, Singhal S, Sharma A. Blockchain and 
This paper discusses the challenges that small and fog computing model for secure data access control 
medium-sized enterprises (SMEs) face when adopting mechanisms for distributed data storage and 
these technologies, including limited computing resources authentication using hybrid encryption algorithm. 
and deployment complexity. To alleviate these challenges, Cluster Computing. 2024; 27(1), 1–15. 
we recommend using model compression techniques, such https://doi.org/10.1007/s10586-023-04120-9 
as pruning and quantization, to simplify the deployment [10] Sureshkumar T, Lingaraj M, Anand B, Premkumar 
process and reduce computing resource requirements. In T. Non-dominated sorting particle swarm 
addition, SMEs should consider leveraging off-the-shelf optimization (NSPSO) and network security policy 
solutions from cloud service providers to reduce initial enforcement for Policy Space Analysis. International 
investment costs. At the same time, potential regulatory Journal of Communication Systems. 2018; 31(10): 
issues, such as the impact of GDPR on network traffic e3576. https://doi.org/10.1002/dac.3576. 
monitoring, can help enterprises ensure compliance. With [11] Khan I, Ghani A, Saqlain SM, Ashraf MU, Alzahrani 
these measures, SMEs can implement and manage A, Kim D. Secure Medical Data Against 
cybersecurity solutions more effectively. Unauthorized Access Using Decoy Technology in 
 Distributed Edge Computing Networks. IEEE 
Funding Access. 2023; 11: 144560-73. 
https://doi.org/10.1109/ACCESS.2023.3344168 
This study was supported by State Grid Shanxi 
[12] Pinto S, Machado P, Oliveira D, Cerdeira D, Gomes 
Electric Power Company Science and Technology Project 
T. Self-secured devices: high performance and 
Research (No.52053023001U).  
secure I/O access in TrustZone-based systems. 
 
Journal of Systems Architecture. 2021; 119: 102238. 
References https://doi.org/10.1016/j.sysarc.2021.102238 
[1] Adi K, Hamza L, Pene L. Automatic security policy [13] Yang J, Chen YH, Du SY, Chen BD, Principe JC. 
enforcement in computer systems. Computers & IA-LSTM: Interaction-Aware LSTM for Pedestrian 
Security. 2018; 73: 156-71. Trajectory Prediction. IEEE Transactions on 
https://doi.org/10.1016/j.cose.2017.10.012 Cybernetics. 2024; 57(4): 3904-3917, 
[2] Paananen H, Lapke M, Siponen M. State of the art in https://doi.org/10.1109/TCYB.2024.3359237. 
information security policy development. Computers [14] Meng YF, Huang ZQ, Shen GH, Ke CB. A security 
& Security. 2020; 88: 101615. policy model transformation and verification 
https://doi.org/10.1016/j.cose.2019.101615 approach for software defined networking. 
[3] Kanimozhi S, Kannan A, Devi KS, Selvamani K. Computers & Security. 2021; 100: 13206. 
Secure cloud-based e-learning system with access https://doi.org/10.48550/arXiv.2005.13206.  
control and group key mechanism. Concurrency and [15] Susilo W, Jiang P, Lai JC, Guo FC, Yang GM, Deng 
Computation-Practice & Experience. 2019; 31(12): RH. Sanitizable Access Control System for Secure 
e5106. https://doi.org/10.1002/cpe.5106 Cloud Storage Against Malicious Data Publishers. 
[4] Al-Amri B, Sami G, Alhakami W. An Effective IEEE Transactions on Dependable and Secure 
Secure MAC Protocol for Cognitive Radio Computing. 2022; 19(3): 2138-48. 
Networks. Computer Systems Science and https://doi.org/10.1109/TDSC.2021.3058132 
220 Informatica 49 (2025) 207–220 M. Guo et al. 
[16] Sureshkumar T, Anand B, Premkumar T. Efficient computerized tools to design information security 
Non-Dominated Multi-Objective Genetic Algorithm policies. Computers & Security. 2020; 99: 102063. 
(NDMGA) and network security policy enforcement https://doi.org/10.1016/j.cose.2020.102063 
for Policy Space Analysis (PSA). Computer [29] Merhi MI, Ahluwalia P. Predicting Compliance of 
Communications. 2019; 138: 90-7. Security Policies: Norms and Sanctions. Journal of 
https://doi.org/10.1016/j.comcom.2019.03.008 Computer Information Systems. 2023; 64(5), 683–
[17] Hu T, Yang SQ, Wang YP, Li GL, Wang YL, Wang 697. 
G, Yin MY. N-Accesses: a Blockchain-Based https://doi.org/10.1080/08874417.2023.2241413 
Access Control Framework for Secure IoT Data [30] Yang J H, Lin I C, Chien P C. Data Sharing Scheme 
Management. Sensors. 2023; 23(20): 8535; for Cloud Storage Service Using the Concept of 
https://doi.org/10.3390/s23208535. Message Recovery. Informatica, 2017, 28(2): 375-
[18] Varma IM, Kumar N. A comprehensive survey on 386. https://doi.org/10.15388/Informatica.2017.134 
SDN and blockchain-based secure vehicular [31] Muthusenthil B, Kim H, Prasath V B. Location 
networks. Vehicular Communications. 2023; 44: verification technique for cluster based geographical 
100663. routing in MANET. Informatica, 2020, 31(1): 113-
https://doi.org/10.1016/j.vehcom.2023.100663. 130. https://doi.org/10.15388/20-INFOR402 
[19] Lin HY, Tsai TT, Wu HR, Ku MS. Secure access 
control using updateable attribute keys. 
Mathematical Biosciences and Engineering. 2022; 
19(11): 11367-79. 
https://doi.org/10.3934/mbe.2022529 
[20] Sivaselvan N, Bhat KV, Rajarajan M, Das AK. A 
New Scalable and Secure Access Control Scheme 
Using Blockchain Technology for IoT. IEEE 
Transactions on Network and Service Management. 
2023; 20(3): 2957-74. https://doi.org/ 
10.1109/TNSM.2023.3246120 
[21] Wu YC, Sun R, Wu YJ. Smart City Development in 
Taiwan: From the Perspective of the Information 
Security Policy. Sustainability. 2020;12(7): 2916; 
https://doi.org/10.3390/su12072916. 
[22] Wang SP, Wang X, Zhang YL. A Secure Cloud 
Storage Framework with Access Control Based on 
Blockchain. IEEE Access. 2019; 7: 112713-25. 
https://doi.org/10.1109/ACCESS.2019.2929205 
[23] Omala AA, Mbandu AS, Mutiria KD, Jin CH, Li FG. 
Provably Secure Heterogeneous Access Control 
Scheme for Wireless Body Area Network. Journal of 
Medical Systems. 2018; 42(6): 108. 
https://doi.org/10.1007/s10916-018-0964-z 
[24] Yang Y, Liu XM, Guo WZ, Zheng XH, Dong C, Liu 
ZQ. Multimedia access control with secure 
provenance in fog-cloud computing networks. 
Multimedia Tools and Applications. 2020; 79(15-
16): 10701-16. https://doi.org/10.1007/s11042-020-
08703-1 
[25] Kumari A, Gupta R, Tanwar S, Kumar N. A 
taxonomy of blockchain-enabled softwarization for 
secure UAV network. Computer Communications. 
2020; 161:304- 23. 
https://doi.org/10.1016/j.comcom.2020.07.042 
[26] Calzavara S, Rabitti A, Bugliesi M. Semantics-Based 
Analysis of Content Security Policy Deployment. 
ACM Transactions on the Web. 2018; 12(2): 1-36. 
https://doi.org/10.1145/3149408 
[27] Zhang J, Chen AM, Zhang P. Provably Secure Data 
Access Control Protocol for Cloud Computing. 
Symmetry-Basel. 2023; 15(12): 2111; 
https://doi.org/10.3390/sym15122111. 
[28] Rostami E, Karlsson F, Gao S. Requirements for 
https://doi.org/10.31449/inf.v49i12.9588 Informatica 49 (2025) 221–230 221 
A Hybrid OCR-XGBoost-Transformer Pipeline for Resume Parsing 
with Spatial-Semantic Integration 
Rachid Ed-Daoudi1*, Fatima Zahra Zakka2, Mouslime Ouqassou1, Badia Ettaki1 
E-mail: rachid.ed-daoudi@uit.ac.ma, fzakka@esi.ac.ma, mouqassou@esi.ac.ma, bettaki@esi.ac.ma. 
* Corresponding author 
1LyRICA: Laboratory of Research in Computer Science, Data Sciences and Artificial Intelligence,  
School of Information Sciences Rabat-Instituts, Rabat, Morocco 
2Knowledge and data engineer, School of Information Sciences Rabat-Instituts, Rabat, Morocco 
Keywords: resume information extraction, hybrid AI solution, optical character recognition, XGBoost, transformers 
Received: June 5, 2025 
This study addresses the automation of resume information extraction using a hybrid Artificial 
Intelligence (AI) framework that integrates Optical Character Recognition (OCR), Machine Learning, 
and Deep Learning techniques. The system operates in three stages: text extraction using PaddleOCR, 
resume section classification via XGBoost, and semantic entity recognition using a Transformers-based 
Named Entity Recognition (NER) model. The dataset consists of 200 French resumes collected in PDF 
format and annotated for ten resume section classes and multiple named entities. Evaluation was 
conducted using standard multi-class classification metrics including accuracy, precision, recall, and F1-
score. Experimental results show that XGBoost achieved 96.5% accuracy in section classification, while 
the Transformers model attained 82% accuracy in semantic entity extraction. This dual-stage pipeline 
captures both spatial and semantic structures of resumes, offering improved accuracy and adaptability 
over traditional parsing approaches. 
Povzetek: Članek predstavlja hibridno rešitev OCR-XGBoost-Transformer za avtomatizirano ekstrakcijo 
podatkov iz življenjepisov. Sistem dosega visoko točnost pri razvrščanju razdelkov z XGBoost in pri 
semantičnem prepoznavanju entitet s transformerjem. 
 
 
1 Introduction by software and computer systems [3]. Several parsing 
approaches are commonly used. Keyword-based parsers 
In an unpredictable and complex business environment, it are prototypes of faster and more accurate parsers. These 
is important that organizations aim to realize the potential simplistic parsers search for specific words, key phrases, 
offered by the recruitment phase. Organizations are in a and patterns in resume text. However, this approach is 
ceaseless race to find new talent to support their teams and prone to errors (with an accuracy rate of about 70%) as 
corporate competitiveness. The reality is that collecting words can have multiple contexts within a resume [4]. 
candidate information from resumes is often difficult to Grammar-based parsers rely on grammatical rules to 
achieve [1]. interpret information. These relatively complex parsers 
Recruiters are required to read and analyse candidate require manual input during the coding process. When 
resumes manually for the information they need. This coding is done by a skilled linguistic engineer, they can 
manual practice is full of disadvantages. First, it is time analyze a resume quite accurately. However, if manual 
consuming and a labor-intensive activity for recruiters configuration is not done correctly, grammar-based 
who have to read many resumes and work through a lot of parsers can be inaccurate (with an accuracy rate of about 
information. As a result, recruiters have to deal with work 90%) [5]. 
overload, sometimes delaying the whole recruitment Statistical parsers use numerical models of text to identify 
process. Therefore, an emerging technology to automate key elements of a resume. To be accurate, statistical 
the information extraction process can be considered a parsers must be trained on a large number of resumes 
rational way to control and presumably speed up a major containing all the information to be extracted. In terms of 
process in recruitment [2]. The central question of this accuracy, statistical parsers fall between keyword-based 
research is: How can the automation of information parsers and grammar-based parsers [6]. 
extraction from resumes be achieved with new AI-based parsers use machine learning and artificial 
technologies? intelligence techniques. These models can improve over 
The CV parsing technology converts resume data from time by analyzing more information. AI-based parsers 
free-form into structured format. This conversion  offer an extremely high level of accuracy compared to 
facilitates the storage, synthesis, and processing of other CV parsing techniques available on the market [7]. 
information contained in resumes, thus enabling their use  
222 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. 
Recent applications combine OCR, Computer Vision, and need for adaptive, high-accuracy solutions that can 
Natural Language Processing (NLP) techniques to understand both document structure and extract 
advance the capabilities of resume information extraction meaningful entities while maintaining contextual 
from various formats and structures [8]. relationships across different resume sections [9]. 
Despite advances in resume parsing technologies, existing To better position the proposed contribution, Table 1 
solutions still face significant challenges in effectively presents a structured comparison of existing studies on 
handling the spatial and semantic aspects of resume resume parsing. It outlines the datasets used, 
documents simultaneously. Current approaches either methodological approaches, performance levels, and key 
focus on visual structure or textual content, but rarely limitations of each system. This comparative summary 
integrate both dimensions effectively. Additionally, most highlights the need for a unified system that integrates 
commercial systems rely on rule-based methods with both spatial and semantic understanding of resume 
predefined templates, limiting their ability to process content. 
diverse resume formats and structures. There remains a  
 
Table 1: Summary of Related works in resume information extraction 
Accuracy / 
Ref. Dataset Used Method Type Key Techniques Limitations 
Performance 
Proprietary HR Format-dependent, 
[1] Rule-based Heuristics, Templates Not reported 
docs low adaptability 
No semantic 
Internal HR Digital workflows, 
[2] Rule-based Not reported modeling, template 
systems automation 
limitations 
Summarization, Entity No spatial modeling, 
[3] 60 resumes ML-based ~85% accuracy 
extraction weak generalization 
Mixed 
NLP, keyword Poor contextual 
[4] Not specified (Keyword + ~70% accuracy 
matching understanding 
ML) 
Rule-based Chronological No experimental 
[5] Literature-based N/A 
(Survey) parsing, analysis validation 
Text image ~85% OCR No classification or 
[6] OCR-only docs OCR 
recognition accuracy entity recognition 
Business OCR, Deep Learning No spatial-semantic 
[7] DL-based ~90% accuracy 
resumes pipeline integration 
No section 
NLTK-based entity 
[8] English CVs NLP + ML Not specified classification, 
recognition 
shallow analysis 
Not end-to-end, 
Polish IT Section classification, 
[9] Rule + ML ~88% F1-score limited semantic 
resumes heuristics 
modeling 
As the table shows, while various parsing methods have 2. Construct and annotate a dataset of resumes with 
been explored, most fail to simultaneously address spatial spatial and semantic labels, 
layout and deep semantic content. This motivates the 3. Evaluate the performance of machine learning 
currect hybrid OCR–XGBoost–Transformer pipeline, and deep learning models for section 
designed to provide accurate, adaptable, and context- classification and entity recognition, 
aware resume information extraction. 4. Design and implement an integrated, hybrid 
This study investigates whether integrating spatial layout information extraction pipeline. 
features with semantic models can improve the accuracy The main contribution of this work is the development of 
and adaptability of resume information extraction. a novel solution that combines OCR for text recognition, 
Specifically, we hypothesize that a two-stage pipeline— ML algorithms for text line classification into appropriate 
combining OCR-based spatial recognition, section sections, and semantic models based on Named Entity 
classification using XGBoost, and contextual entity Recognition (NER) for information extraction. This 
extraction via Transformers—will outperform traditional integrated approach addresses both the visual-spatial 
methods that rely solely on textual content. aspects of resumes and their semantic content, providing 
To validate this hypothesis, our research follows four main more accurate and comprehensive information extraction 
objectives: than current systems. 
1. Analyze existing approaches and identify their The remainder of this paper is organized as follows: 
limitations, Section 2 describes the proposed methodology, including 
the system architecture, dataset preparation, feature 
A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 223 
engineering, and algorithms employed. Section 3 presents 2 Method 
the experimental results, including classification and 
entity recognition performance. Section 4 provides a 
discussion of the results in the context of existing work, 2.1 System architecture 
with analysis of contributing factors and identified The proposed system employs a multi-stage pipeline 
limitations. Finally, Section 5 concludes the paper by approach for automated information extraction from 
summarizing the contributions and outlining directions for resumes. The overall architecture, illustrated in Figure 1, 
future research. consists of three main components: (1) text recognition 
and extraction using OCR, (2) text classification to 
identify resume sections, and (3) semantic information 
extraction from the classified text segments.
 
 
Figure 1: System architecture 
The workflow begins with resume documents that are • Career objective: A short statement describing 
converted to images to ensure format independence. The the candidate’s professional goals and the type of 
PaddleOCR [10] model then processes these images to position sought. This helps employers 
extract text and spatial coordinates. The extracted text understand the candidate’s motivations and 
lines are classified into appropriate resume sections using expectations. 
ML models. Finally, semantic models extract specific • Education: Lists academic background, 
entities of interest from each classified section, such as including institutions attended, their locations, 
candidate names, skills, education details, and work degrees obtained, and any relevant certifications 
experience. or training. 
• Job-related skills: Highlights specific skills 
2.2 Dataset preparation and feature relevant to the target job, whether acquired 
selection through work, internships, volunteer activities, or 
hobbies. 
• Professional experience: Provides details on the 
2.2.1 Analysis of the Structure and Content of 
candidate’s work history, including company 
a CV names, job titles, locations, dates of employment, 
In preparing a CV, certain sections are commonly and descriptions of roles and responsibilities. 
included to present relevant information for effective job Relevant internships and volunteer experiences 
applications. These sections typically include: may also be included. 
• Personal Information: Includes full name, • Additional information: Covers elements that 
address, phone numbers (home and mobile), support the application such as language 
email address, and optionally a personal website. proficiency, computer skills, professional 
This information allows employers to easily certifications, memberships in professional 
contact the candidate. organizations, awards, and achievements. A 
portfolio may also be referenced if applicable. 
224 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. 
• Interests and activities: Includes hobbies and employment gaps, this format categorizes skills 
leisure activities that reveal aspects of the and highlights accomplishments over job history. 
candidate’s personality and can highlight soft • Targeted CV: Tailored to a specific job by 
skills or additional qualifications. emphasizing the qualifications that best match 
There are four main types of CV formats, each designed the employer’s expectations. It requires the 
to emphasize different aspects of a candidate’s profile: candidate to carefully analyze the job posting and 
• Chronological CV: Lists the candidate’s work customize each CV section accordingly. 
history in reverse chronological order, starting Combination CV: Merges the chronological and 
with the most recent position. This is the most functional formats. It begins with a summary of 
commonly used format and suits candidates with key competencies followed by a detailed 
consistent career progression. chronological work history. This format suits 
• Functional CV: Focuses on skills and candidates with both strong experience and 
competencies rather than the sequence of jobs. specialized skills. 
Ideal for candidates changing careers or with CVs can be presented in several visual formats. Figure 2 
illustrates three common layout styles: 
 
Figure 2: Common CV layout formats 
 
• Single-Column CV (Figure 2.a): A traditional Each of these formats offers unique advantages depending 
layout where sections are arranged vertically on the candidate’s profile and the industry expectations. 
from top to bottom. It offers clarity and 
simplicity, making it easy for recruiters to read 2.2.2 Dataset construction for classification 
through the information. models 
• Two-Column CV (Figure 2.b): Divides the A dataset of 200 French resumes in PDF format was 
page into two main areas. The left column collected from the HR department of Intelcia IT Solutions 
typically contains personal details and key skills, [11]. Each resume was converted to image format to 
while the right includes professional experience, facilitate consistent processing across different layouts 
education, and other supporting content. This and styles. The PaddleOCR model was applied to extract 
format improves information organization and both textual content and spatial information of each text 
visual balance. line. 
• Creative or Free-Form CV (Figure 2.c): Often Feature engineering focused on capturing the spatial 
used in artistic or design-related fields, this relationships between text lines and section headings 
format allows for greater customization, within resumes. Two key types of features were 
including asymmetric columns, infographics, developed: 
colored blocks, or icons. It provides a 1  Distance-based features: Normalized horizontal 
personalized and visually distinctive presentation and vertical Euclidean distances between each 
of qualifications. text line and section headings were calculated. 
 For text lines and section headings on different 
A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 225 
pages, a specialized distance calculation was • Hidden layers: Two hidden layers with 64 and 32 
implemented that accounted for page breaks. neurons respectively 
2  Positional features: Binary features indicating • Activation function: ReLU for hidden layers and 
whether a text line appeared above or below each Softmax for output layer 
section heading were created and encoded using • Output layer: 10 neurons corresponding to the 
LabelEncoder. resume section classes 
3  The dataset was manually labeled with ten The model was configured with the following 
classes: nine representing common resume hyperparameters: 
sections (Experience, Education, Skills, Projects, • Optimizer: Adam with learning rate of 0.001 
Certification, Languages, Interests, Software, • Loss function: Categorical cross-entropy 
and Personality) and a tenth class "Other" for text • Batch size: 32 
not belonging to any standard section. In total, 
• Training epochs: 50 
10,000 text lines were labeled to create the 
• Early stopping: Patience of 5 epochs monitoring 
training corpus. 
validation loss 
ANNs were selected for comparison due to their proven 
2.2.3 Dataset preparation for semantic models effectiveness in text classification tasks and ability to learn 
For the semantic extraction task, text lines were grouped complex non-linear relationships between features [14]. 
according to their predicted section classifications to XGBoost was selected due to its proven performance in 
provide contextual information. The Doccano annotation similar structured classification tasks. It offers efficient 
tool was used to manually annotate named entities within handling of sparse and imbalanced data, robust 
each section. A total of eight entity types were defined for regularization, and interpretable feature contributions. As 
annotation: Name, Email, Phone, Education, Experience, demonstrated later in Section 3, XGBoost outperformed 
Skills, Language, and Certification. These categories were alternatives such as Random Forest, ANN, and SVM, 
selected based on relevance to recruitment use cases and confirming its suitability for the classification of OCR-
availability across most CVs in the dataset. The annotated extracted resume sections. 
text was then processed and converted to the JSONL 
format required by SpaCy [12] for NER model training. 2.3.3 Support Vector Machine for section 
classification 
2.3 Key algorithms 
The Support Vector Machine (SVM) model was 
implemented with the following configuration: 
2.3.1 XGBoost for section classification • Kernel: Radial Basis Function (RBF) 
The eXtreme Gradient Boosting (XGBoost) algorithm • C parameter (regularization): 10 
was selected for resume section classification based on its • Gamma parameter: 0.01 
superior performance. XGBoost is an ensemble learning • Decision function: One-vs-Rest for multi-class 
method that builds sequential decision trees to minimize classification 
residual errors. It excels at capturing complex feature • Probability estimates: Enabled 
interactions and handling non-linear relationships [13]. SVMs were chosen for comparison due to their 
The model was configured with the following traditionally strong performance in text classification 
hyperparameters: tasks with moderate-sized datasets and their effectiveness 
• Maximum tree depth: 3 with high-dimensional feature spaces. The RBF kernel 
• Number of estimators: 100 was selected after preliminary testing showed superior 
• Learning rate: 0.1 performance over linear and polynomial kernels for 
This hyperparameter implementation allowed the model capturing the complex relationships in the spatial and 
to balance complexity and generalization, as well as better positional features [15]. 
capture the learning capabilities of the spatial features. These implementations were evaluated using the same 
XGBoost was also implemented very well in terms of its train-test split and evaluation metrics as the XGBoost 
ability to mitigate model limitations from previous model to ensure a fair comparison of performance across 
classification models we attempted in the study. all three classification approaches. 
2.3.2 Artificial Neural Network for section 2.3.4 Transformers model for named entity 
classification recognition 
The Artificial Neural Network (ANN) was implemented For the semantic information extraction component, a 
as a multilayer perceptron with the following architecture: Transformers-based model was implemented using 
• Input layer: Matching the dimensionality of the SpaCy's framework. The overall workflow for semantic 
feature set model construction is illustrated in Figure 3.
 
226 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. 
 
Figure 3: General workflow for semantic model construction
Transformers use an attention mechanism to capture 𝑇𝑃 𝑇𝑃
𝑅𝑒𝑐𝑎𝑙𝑙 = =  (3) 
contextual relationships between words in text sequences 𝑃 𝑇𝑃 + 𝐹𝑁
[16]. The semantic extraction model was built using the 2 × 𝑅𝑒𝑐𝑎𝑙𝑙 ×  𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛
CamemBERT-based Transformer model, implemented 𝐹1 𝑆𝑐𝑜𝑟𝑒 =  (4) 
𝑅𝑒𝑐𝑎𝑙𝑙 + 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛
through SpaCy v3.5 using the fr_dep_news_trf pipeline. 
Where:  
CamemBERT is pretrained on large-scale French-
• 𝑇𝑃: True Positive, the number of cases where the 
language datasets (including OSCAR and CCNet) and 
model correctly predicts a positive class 
employs a SentencePiece tokenizer. This choice ensured 
• 𝑇𝑁 ∶ True Negative, the number of cases where 
linguistic compatibility with the French resume dataset 
the model correctly predicts a negative class 
used for training. The model was fine-tuned on eight entity 
categories (Name, Email, Phone, Education, Experience, • 𝐹𝑃 : False Positive, the number of cases where 
Skills, Language, and Certification) for 80 epochs, using the model incorrectly predicts a positive class 
the Adam optimizer and a warm-up learning rate schedule • 𝐹𝑁 : False Negative, the number of cases where 
with early stopping enabled. the model incorrectly predicts a negative class 
Training was conducted on a standard GPU environment Then, for the multi-class evaluation in this study, macro-
available using Google Colab, with an average epoch averaging was employed, which calculates in the 
runtime of 4 minutes and a total training duration of equations 5 and 6 the metric independently for each class 
approximately 5.5 hours. The final model was exported in and then takes the average. This approach gives equal 
SpaCy's DocBin format for deployment. weight to all classes regardless of their frequency in the 
The workflow begins with the classified text segments dataset: 
from the previous stage, which are then processed for  
𝑛
annotation. After manual annotation using Doccano, the 1
annotated text data is preprocessed and structured into the 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛𝑚𝑎𝑐𝑟𝑜_𝑎𝑣𝑒𝑟𝑎𝑔𝑒 = ∑ 𝑃  ( )
𝑛 𝑘 5  
required format for model training. The model is then 𝑘=1
𝑛
trained using the prepared dataset and evaluated against 1
𝑅𝑒𝑐𝑎𝑙𝑙𝑚𝑎𝑐𝑟𝑜_𝑎𝑣𝑒𝑟𝑎𝑔𝑒 = ∑ 𝑅
𝑛 𝑘 (6) 
test data before final deployment. 
𝑘=1
The model was configured using a base configuration file 
Where 𝑃𝑘  is the precision for class 𝑘, 𝑅𝑘is the recall for 
that defined: 
class 𝑘, and 𝑛 the total number of classes. This evaluation 
• Architecture parameters ensures that performance on less frequent resume sections 
• Training hyperparameters was properly assessed [18]. 
• Optimizer settings 
• Feature extraction components 2.5 Pipeline overview – pseudocode 
The Transformers model was selected because of its 
ability to capture long-distance dependencies and The full hybrid workflow is summarized below to 
contextual information, which is particularly valuable for illustrate the integration of the components described 
identifying named entities in resume text where formatting above. 
and context provide important cues. 
2.4 Evaluation metrics 
Performance evaluation for both classification and NER 
models was conducted using standard metrics for multi-
class classification problems [17]. First, the basic metrics 
for a single class are defined in equations 1 to 4: 
 
𝑇𝑃 + 𝑇𝑁
𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =  (1) 
𝑇𝑃 + 𝐹𝑃 + 𝑇𝑁 + 𝐹𝑁
𝑇𝑃
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 =  (2) 
𝑇𝑃 + 𝐹𝑃
 
A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 227 
3 Experimental results comprising 20% of the labeled data. Table 2 and figure 3 
present a comparative analysis of their performance based 
on the evaluation metrics.
3.1 Performance comparison of 
classification models 
The evaluation of the three classification models (ANN, 
SVM, and XGBoost) was conducted using a test dataset 
Table 2: Performance comparison of ANN, SVM, and XGBoost Algorithms 
Model Accuracy Macro-average precision Macro-average recall Macro-average F1-Score 
ANN 80.7% 65.7% 77.2% 71.0% 
SVM 72.5% 51.8% 66.7% 58.3% 
XGBoost 96.5% 94.7% 95.3% 95.0% 
 
 
Figure 4: Performance comparison of classification and NER Models 
 
  
  
As evident from Table 2 and figure 4, XGBoost Table 3: Comparison of NER Models: tok2vec and 
significantly outperformed the other models across all Transformers 
metrics. The model achieved an impressive accuracy of Macro- Macro- Macro-
Model Accuracy average average average F1-
96.5%, indicating its superior ability to correctly classify precision recall Score 
text lines into their respective CV sections. Furthermore, 
tok2vec 73% 53% 66% 58.7% 
the high macro-average precision (94.7%) and recall 
(95.3%) values show XGBoost's robust performance Transformers 82% 72% 79% 75.3% 
across all classes, including minority classes.  
The Transformers model beat the tok2vec on the 
3.2 Performance of semantic models for evaluation metrics overall. With an accuracy of 82%, the 
named entity recognition Transformers model was more accurate when classifying 
named entities in resume text. 
Two NER models were evaluated for their effectiveness 
in extracting named entities from the classified text: 
tok2vec and Transformers. Table 3 summarizes their 3.3 Analysis of XGBoost's superior 
performance after 80 training epochs. performance 
XGBoost's better performance can be attributed to several 
factors related the nature of the algorithm: 
• Boosting technique: XGBoost is based on a 
gradient boosting method that sequentially builds 
228 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. 
new models to correct the mistakes of prior and classifier performance improves by 
models. XGBoost is able to learn from emphasizing the most important features. 
previously misclassified items and iteratively • Regularization techniques: It is noteworthy to 
improves prediction performance. say that XGBoost uses regularisation parameters 
• Handling complex data: XGBoost can fit that can assist with the likelihood of overfitting, 
complex relationships between features, and adding to the good performance of the model on 
moreover, can capture non-linear relationships. unseen data. 
This is significant for resume texts where the Figure 5 shows the confusion matrix for the XGBoost 
spatial relationship between the text lines and model which is indicative of its overall, high classification 
section headings influences their classification. performance across all CV sections.
• Feature importance analysis: The algorithm, in 
its own fashion, defines the most useful features, 
 
Figure 5: Confusion matrix for XGBoost Model (values in %) 
 
 • Sequential processing advantage: The 
The matrix shows a very high level of accuracy (93-98%) classification of individual lines of text 
with virtually no confusion across sections where error accurately positioned the model to better achieve 
estimates like Experience and Projects only had a 1-2% entity extraction in the Transformers module 
estimating error, showcasing how well the XGBoost with better understanding of context. 
method was able to handle the complexity of resume data. 
3.5 Significance of the two-stage approach 
3.4 Transformers model performance An important finding of this research is the utility of the 
analysis two-stage information extraction process: 
The superior performance of the Transformers models 1. The first stage incorporates XGBoost to classify 
generally in NER task has many reasons: each of the text lines into its respective CV 
• Attention mechanism: The Transformers model section to help clarify for semantic analysis. 
uses an attention mechanism that enables it to 2. The second stage incorporates the Transformers 
model contextual relationship between words. It model, which analyses the semantic meaning and 
can inspect words within a larger context where extracts the relevant entities. 
it is appearing, which enhances the accuracy of This process provides a solution to one of the main 
entity recognition. challenges of resume parsing, which was the distribution 
• Contextual understanding: Rather than only of the spatially located text within the image. By 
focusing on the local patterns in the word incorporating the organization of the text into sections 
sequences like in the tok2vec model, the before doing a semantic analysis, we are able to achieve a 
Transformers model can also model the long- higher degree of accuracy with information extraction 
distance dependency between the words to get a than purely undertaking a semantic analysis of the CV. 
more comprehensive understanding of all context The results also indicate that the semantics of translating 
in the text. visually oriented information into semantic information 
creates an additional language processing dimension that 
A Hybrid OCR-XGBoost-Transformer Pipeline for Resume… Informatica 49 (2025) 221–230 229 
goes beyond a one-dimensional text analysis and includes 5 Conclusion 
both visual information and spatial language processing. 
This research introduces a novel hybrid AI solution for 
automated resume information extraction, combining 
4 Discussion OCR with Machine Learning for text classification 
The results presented in the previous section demonstrate (achieving 96.5% accuracy with XGBoost) and Deep 
that the hybrid pipeline outperforms traditional resume Learning for semantic understanding (reaching 82% 
parsing approaches in terms of accuracy, generalization, accuracy with Transformers). The approach addresses the 
and contextual understanding. Specifically, the XGBoost challenge of resumes as spatially distributed text, where 
classifier achieved a 96.5% accuracy in section both layout and content provide crucial semantic context, 
classification, and the Transformers model reached 82% demonstrating that considering spatial positioning 
accuracy in named entity recognition. enhances resume parsing accuracy. 
When compared to prior studies summarized in Table 1: While the current implementation faces limitations 
• Methods relying on rule-based or keyword including language dependency, sensitivity to extreme 
techniques [1], [2], [4] showed limited formatting variations, and substantial training data 
adaptability to diverse resume formats and requirements, several promising research directions 
lacked semantic depth. emerge. Future work should explore deeper integration of 
• Machine Learning-only approaches such as [3] visual and semantic elements, extend the approach to 
achieved moderate performance (~85%) but did multi-dimensional text analysis beyond traditional linear 
not incorporate spatial features or layout context. processing, and investigate techniques requiring less 
• Deep Learning models in [7], although promising labeled training data. This research ultimately points 
(~90%), still treated resumes as flat text, without toward a new domain of natural language processing that 
segment-level classification or layout awareness. incorporates spatially-oriented language understanding 
In contrast, the proposed pipeline integrates both spatial with applications extending beyond resume parsing to 
(layout-aware) features and semantic (contextual) other complex document types. 
representations, which contributes to improved 
classification and entity recognition. The two-stage design References 
ensures that the semantic model receives pre-structured 
[1] Kessler, R., Torres-Moreno, J. M., & El-Bèze, M. 
input, enhancing its ability to extract relevant entities with 
2010. E-Gen: automatic processing of human 
higher precision. 
resources information. Document numérique, 13(3), 
The superior performance of the XGBoost model can be 
95–119. 
attributed to: 
[2] Baudoin, E., Déroulède, B., Diné, S., Dubouloz, M.-
• Fine-grained spatial features (e.g., distances, 
A., & Peretti, J.-M. 2019. Digital recruitment. In 
relative positions), 
Digital transformation of the HR function (pp. 49–
• Strong regularization and ensemble learning 101). Paris: Dunod. 
characteristics, [3] Khan, N., Khan, K., Naveed, S., Nabi, N., Qureshi, 
• Efficient handling of imbalanced or non-linear M., & Naveed, N. 2023. Resume Parser and 
class boundaries. Summarizer. International Journal of Advanced 
Likewise, the use of Transformers for NER offers Research in Science, Communication and 
advantages in: Technology, 3(1), 35–42. 
• Capturing long-range dependencies across lines [4] Olorunshola, O. E., Ampitan, I. O., Adamu-Fika, F., 
within the same section, & Ademuwagun, A. K. (2025). An Enhanced K-NN 
• Handling resume-specific terminology through Algorithm Leveraging BERT Techniques for 
contextual embeddings, Resume Parsing System. Asian Journal of Research 
• Generalizing well across structurally diverse in Computer Science, 18(7), 49-59. 
documents. [5] Aakankshu, R., Kariya, J., Khant, D., Khandare, S., 
Some failure cases were observed in: & Barve, P. 2020. A Systematic Literature Review 
• Highly unstructured or creative resume formats (SLR) on the beginning of resume parsing in HR 
(e.g., asymmetric layouts), Recruitment Process & SMART advancements in 
• Multilingual resumes, where OCR and entity chronological order. Research Square. 
recognition performance dropped, https://assets.researchsquare.com/files/rs-
• Misclassification between "Projects" and 570370/v1/9da1a6e1-437f-4f6d-a021-
"Experience" when boundaries were unclear. 743ea3ee268e.pdf 
These cases highlight potential improvements through [6] Gomathy, C. K. 2022. OPTICAL CHARACTER 
layout-aware Transformers or multimodal embeddings RECOGNITION. ResearchGate. 
that fuse visual and textual signals. https://www.researchgate.net/publication/36062008
5_OPTICAL_CHARACTER_RECOGNITION 
[7] Sarhan, A. M., Ali, H. A., Wagdi, M., Ali, B., Adel, 
A., & Osama, R. (2024). CV Content Recognition 
230 Informatica 49 (2025) 221–230 R. Ed-Daoudi et al. 
and Organization Framework based onYOLOv8 and 
Tesseract-OCR Deep Learning Models. 
[8] Pokharel, P. 2022. Resume parser using NLP. 
ResearchGate. 
https://www.researchgate.net/publication/36177201
4_RESUME_PARSER 
[9] Wosiak, A. 2021. Automated extraction of 
information from Polish resume documents in the IT 
recruitment process. Procedia Computer Science, 
192, 2432–2439. 
https://doi.org/10.1016/j.procs.2021.09.012 
[10] Malik, S., et al. 2020. XGBoost: A Deep Dive into 
Boosting. ResearchGate. 
https://www.researchgate.net/publication/33949915
4_XGBoost_A_Deep_Dive_into_Boosting_Introdu
ction_Documentation 
[11] Gao, S., Kotevska, O., Sorokine, A., & Christian, J. 
B. (2021). A pre-training and self-training approach 
for biomedical named entity recognition. PloS 
one, 16(2), e0246310. 
[12] Kumar, M., Chaturvedi, K. K., Sharma, A., Arora, 
A., Farooqi, M. S., Lal, S. B., ... & Ranjan, R. (2023). 
An algorithm for automatic text annotation for 
named entity recognition using Spacy framework. 
ICAR, Delhi, India, Tech. Rep. 
[13] Chen, T., et al. 2015. XGBoost: extreme gradient 
boosting. R package version 0.4-2, 1(4), 1–4. 
[14] Lee, J. Y., Dernoncourt, F., & Szolovits, P. 2017. 
Transfer learning for named-entity recognition with 
neural networks. arXiv preprint, arXiv:1705.06273, 
1–5. 
[15] Panja, S., Chatterjee, A., & Yasmin, G. 2018. Kernel 
Functions of SVM: A Comparison and Optimal 
Solution. In Advanced Informatics for Computing 
Research (pp. 88–97). Singapore: Springer. 
https://doi.org/10.1007/978-981-13-3140-4_9 
[16] Ghaith, S. 2024. The triple attention transformer: 
advancing contextual coherence in transformer 
models. Evolutionary Intelligence, 17(5), 3723–
3744. 
[17] Riyanto, S., Imas, S. S., Djatna, T., & Atikah, T. D. 
2023. Comparative analysis using various 
performance metrics in imbalanced data for multi-
class text classification. International Journal of 
Advanced Computer Science and Applications, 
14(6). 
[18] Grandini, M., Bagli, E., & Visani, G. 2020. Metrics 
for multi-class classification: an overview. arXiv 
preprint, arXiv:2008.05756. 
https://doi.org/10.31449/inf.v49i12.9254  Informatica 49 (2025) 231-244   231 
 
Deformation Suppression Method for the CNC Machining Process of 
Parts Based on a Single Neuron PID 
Tiejun Liu1*, Ke Chen1,2  
1Zhejiang Guangsha Vocational and Technical University of construction, Zhejiang, DongYang,322100, China 
2Hengyang Valin Steel Pipe Co., Ltd., Hunan, Hengyang, 421001, China 
E-mail：liutiejun895485@163.com 
*Corresponding author 
 
Keywords: CNC machining, deformation suppression, single-neuron PID, predictive control, smart manufacturing 
 
Received: May 16, 2025 
Computer Numerical Control (CNC) machining plays a vital role in modern precision manufacturing but 
often suffers from part deformation due to thermal and mechanical stresses, compromising dimensional 
accuracy. Traditional CNC systems lack adaptive intelligence, operating with static parameters and 
failing to address real-time deformation risks. This study proposes an intelligent deformation suppression 
method using a lightweight single-neuron-based Proportional-Integral-Derivative (PID) neural model, 
termed NeuroPID-CNC, to predict and mitigate deformation during machining. The model was trained 
and tested on the CNC-DeformControl dataset containing machining parameters such as cutting speed, 
feed rate, depth of cut, tool temperature, and material type. Data preprocessing involved normalization 
and categorical encoding. The NeuroPID-CNC model, structured as a binary classifier with a single 
hidden neuron using a sigmoid activation function and Adam optimizer, was trained on 70% of the data 
and evaluated on the remaining 30%. It achieved 92% accuracy, 90% precision, 93% recall, 91.5% F1-
score, and 0.84 MCC, outperforming conventional algorithms like SVM, RF, LR, and KNN. A real-time 
feedback loop further enables adaptive learning. The NeuroPID-CNC approach effectively predicts 
deformation risks and recommends real-time control actions, enhancing machining reliability and 
reducing material waste. This makes it a promising solution for smart, adaptive manufacturing 
environments. 
Povzetek: Za preprečevanje deformacij med CNC obdelavo je predlagana metoda NeuroPID-CNC, lahki 
nevronski model z enim nevronom, ki posnema PID regulator. Model je dosegel visoko točnost pri 
napovedovanju tveganja deformacije in priporoča prilagoditve v realnem času (npr. hitrost rezanja), s 
čimer izboljša zanesljivost in kakovost izdelkov. 
 
1 Introduction impacted by numerous factors such as tool temperature, 
material type, cutting forces, and vibration during the 
1.1 The background information of this machining process. 
scientific field  
1.2 The current knowledge and advances in 
Computer Numerical Control (CNC) machining is an this field 
essential component of modern industrial manufacturing, 
allowing for the automated, precise fabrication of complex Sensor integration, adaptive control systems, and 
components from a broad range of materials, including advanced simulation techniques have all contributed 
metals, plastics, and composites [1]. CNC machines use significantly to the advancement of CNC machining in 
programmed instructions to control parameters like recent years [3].  Researchers and engineers have used 
cutting speed, feed rate, tool path, and spindle load [2]. finite element modeling (FEM), real-time feedback 
This high level of automation improves productivity, systems, and machine learning techniques to track and 
consistency, and precision in industries ranging from improve machining processes [4].  Numerous studies have 
aerospace, automotive, and electronics. However, as concentrated on predicting tool wear, improving cutting 
manufacturing tolerances tighten and precision conditions, and enhancing the surface finish [5]. Adaptive 
requirements rise, even minor distortions during control algorithms like fuzzy logic, conventional PID 
machining can result in unacceptable defects, raised controllers, and deep learning-based methods have been 
rework rates, and wasted resources. These distortions, proposed to tackle machining variability. Despite these 
often referred to as machining-induced deformations, are improvements, numerous control systems still depend on 
232        Informatica 49 (2025) 231-244       T. Liu et al. 
 
fixed or heuristic-based logic that cannot continuously 1.5 The main method(s) used in this research 
learn or adapt to the machining setting. 
 To achieve the research objectives, a novel algorithm 
1.3 The current problem/issue that needs to called NeuroPID-CNC was created and trained on a 
be solved or addressed urgently curated dataset called CNC-DeformControl, which 
includes critical machining parameters like cutting speed, 
One of the most persistent and pressing issues in CNC feed rate, depth of cut, tool temperature, material type, and 
machining is the inability of current systems to forecast others. The methodology included several key stages: data 
and avoid part deformation in real time [6]. Deformation preprocessing by categorical encoding and normalization; 
causes dimensional inaccuracies, structural weaknesses, building of a lightweight single-neuron neural network 
and higher manufacturing costs [7]. Existing PID model that simulates PID control behavior; training and 
controllers and other conventional control strategies are evaluation of the model utilizing binary classification 
not well-suited to capture the nonlinear, dynamic nature of metrics such as accuracy, precision, recall, and F1-score; 
machining-induced deformation, particularly in high- and integration of a real-time feedback strategy to allow 
speed or multi-material machining settings [8]. online learning and continual enhancement. To guarantee 
Additionally, there is a lack of lightweight and efficient convergence and computational effectiveness, 
interpretable models that can operate in real-time, the model makes use of a sigmoid activation function, 
continuously adapt to novel machining data, and offer binary cross-entropy loss, and the Adam optimizer. In 
actionable parameter adjustments to minimize addition, real-time control logic is integrated into the 
deformation risks [9], [10]. The followings are the system, allowing it to automatically adjust crucial 
hypotheses: machining parameters, such as coolant flow, cutting speed, 
• Whether a single-neuron-inspired PID control and feed rate, when a high deformation risk is predicted. 
model accurately forecast the risk of component  
deformation in CNC machining by utilizing real- 1.6 The importance or impact of this 
time machining parameters? research to the scientific community 
• Does the application of a single-neuron-inspired 
PID control algorithm lead to a substantial This study contributes to the improvement of intelligent 
decrease in part deformation when compared to CNC control systems by proposing an interpretable and 
conventional static or PID-based control adaptive control framework that combines conventional 
methods? PID principles and neural learning capacities. By 
• Can the dynamic modification of cutting incorporating a single-neuron PID architecture, the 
conditions, informed by the predictions of the algorithm guarantees low computational overhead while 
single-neuron PID model, enhance component providing intelligent decision-making in real time. The 
quality and machining reliability? NeuroPID-CNC method can be incorporated into 
• Whether a single-neuron neural model more industrial CNC machines to significantly decrease 
effectively forecast deformation risks in real- material waste, enhance product quality, and lower 
time CNC operations compared to conventional operating costs. For the scientific community, this 
classifiers? research opens up new avenues for creating hybrid neuro-
 control systems, expanding the scope of Industry 4.0, and 
supporting the evolution of automated manufacturing 
1.4 The purpose(s) of doing this research 
methods. 
The primary goal of this research is to create an intelligent Controlling deformation and guaranteeing dimensional 
deformation suppression control algorithm specifically accuracy of machined parts has proven to be a significant 
designed for CNC machining environments. The study difficulty in CNC machining due to the dynamic and 
aims to design and execute a single-neuron-inspired PID complex nature of the process. Fan et al. [11] proposed an 
model that can precisely forecast the risk of part energy-based principle for reducing machining distortion 
deformation using real-time machining parameters. This in monolithic aircraft parts, which provided insights into 
study also aims to offer practical control suggestions for residual stress release and deformation prediction. 
dynamically adjusting cutting conditions to prevent However, their method lacked a real-time compensation 
deformation, resulting in improved part quality and mechanism. Ma et al. [12] proposed a single-neuron PID-
machining dependability. The study addresses the gap in based model that showed success in deformation 
lightweight, adaptive, and responsive control systems suppression during CNC machining, but it was tested 
appropriate for contemporary smart manufacturing setups. under limited scenarios and did not take parameter 
 adaptability into account in real time. Kasprowiak et al. 
 [13] used input shaping control to decrease machining 
 vibration, but they neglected to consider feedback 
 
Deformation Suppression Method for the CNC Machining Process of...   Informatica 49 (2025) 231-244     233 
 
 
adaptation during continuous machining. Similarly, Guo Świć et al. [19] studied control methods for elastic-
et al. [14] concentrated on suppressing casing vibrations deformable states in turning and grinding shafts. However, 
in aeroengine elements but did not integrate with tool-path their focus was on low-stiffness shafts, which limits 
compensation. generalization. Lv et al. [20] created an automated shape 
Shi et al. [15] presented a compensation model for correction mechanism for wood composites, emphasizing 
polishing tools in precision CNC polishing, which possibilities in non-metallic materials but having limited 
enhanced surface quality but was only applicable to application to high-precision metal machining. Yi et al. 
aspheric surfaces.  Hasçelik et al. [16] optimized cutting [21] investigated mesoscale deformation in thin-walled 
parameters to reduce wall deformation in thin-wall micro- micro-milling, but did not use intelligent adaptive 
milling. However, their approach was sensitive to tool feedback systems. Korpysa and Habrat [22] explored 
wear and material variability. Zheng et al. [17] precision milling of magnesium alloys, comparing coated 
investigated vibration-assisted micro-milling, which and uncoated tools, but lacking dynamic deformation 
provided useful insight into tool wear reduction but lacked control. Devi et al. [23] used ant lion optimization with 
general applicability. Gan et al. [18] presented an adaptive TOPSIS analysis to optimize milling parameters, but their 
backlash compensation method for CNC machines, but its method did not include predictive modeling or feedback 
effectiveness in complex geometries remains unverified. control.      Table 1 shows a summary of related works. 
 
Table 1: Summary of related works 
 
Ref Study Focus Results Limitations 
[11] Energy principle for Enhanced prediction of residual No real-time compensation 
distortion reduction in stress-related deformation mechanism 
aircraft parts 
[12] Single-neuron PID model Efficient in simple deformation Not tested under varied real-time 
for deformation control conditions 
suppression 
[13] Input shaping control for Decreased vibration efficiently Lacked adaptive feedback 
vibration suppression integration 
[14] Vibration suppression in Improved structural stability Did not incorporate tool-path 
aeroengine casing milling compensation 
[15] Tool displacement model Enhanced surface finish in aspheric Particular to aspheric surfaces 
for CNC polishing polishing only 
[16] Optimization in micro- Decreased deformation utilizing Sensitive to tool wear and material 
milling of thin-wall optimized parameters variability 
geometries 
[17] Tool wear suppression in Reduced wear through non- Limited generalization across 
vibration-assisted micro- resonant vibration materials 
milling 
[18] Adaptive backlash Decreased mechanical play in Unproven effectiveness for 
compensation in CNC motion systems complex parts 
[19] Elastic-deformable state Enhanced dimensional accuracy in Applicable mostly to the turning 
control in shaft machining low-stiffness components and grinding of shafts 
[20] Shape correction in wood Automated geometric adjustment Limited relevance to metal CNC 
composites during continuous pressing applications 
[21] Deformation control in Superior precision in curved thin- No intelligent feedback or real-
mesoscale micro-milling wall parts time control 
[22] Milling accuracy in Enhanced accuracy utilizing coated No active deformation control 
magnesium alloys tools included 
[23] End-milling parameter Multi-objective optimization Static optimization lacks 
optimization using ant lion attained predictive adaptability 
and TOPSIS 
234        Informatica 49 (2025) 231-244       T. Liu et al. 
 
The prior investigations combined offer valuable insights smart deformation suppression control algorithm designed 
into machining vibrations, deformation mitigation, to predict and reduce the risk of part deformation during 
parameter optimization, and compensation methodologies. CNC (Computer Numerical Control) machining processes.  
Nonetheless, several restrictions and substantial gaps It draws on both machine learning and PID control 
persist in the integration of real-time intelligent control, principles, combining the intelligence of a lightweight 
including the absence of adaptive feedback, active neural network with real-time process control strategies.  
deformation control, and model interpretability, among NeuroPID-CNC employs a single-neuron neural network 
others. This research proposes a lightweight and effective that mimics a PID controller.  It accepts machining 
framework, termed the NeuroPID-CNC model, to address parameters as input (for example, cutting speed, feed rate, 
the limitations and research gaps identified in prior studies. depth of cut, and temperature) and predicts whether 
 deformation will occur ("Yes" or "No").  If there is a high 
2 Materials And methods risk of deformation, the algorithm automatically adjusts 
the machining settings to prevent it. Algorithm 1 shows 
This section describes the creation of the NeuroPID-CNC theNeuroPID-CNC algorithm. 
Algorithm, which predicts and suppresses deformation in  
CNC machining.  The NeuroPID-CNC algorithm is a  
Algorithm 1: NeuroPID-CNC 
 
Input: CNC-DeformControl Dataset (features + Deformation Risk) 
Output: Predicted Deformation Risk (Yes/No) and control recommendations 
 
Begin 
// Step 1: Data Preprocessing 
Load dataset D 
Encode categorical attributes in D 
Normalize numerical attributesin D 
Split D into training_set and test_set (70/30) 
 
// Step 2: Initialize Single-Neuron PID Model 
Initialize neural network: 
    - 1 input layer 
    - 1 hidden layer with 1 neuron (PID-like) 
    - 1 output neuron (binary classification) 
 
Set activation_function ← Sigmoid 
Set optimizer ← Adam 
Set loss_function ← Binary Crossentropy 
Set biases to zero 
Employ Glorot Uniform for weight initialization. 
Implement L2 regularization and configure the batch size to 32. 
Establish the epoch count at 100. 
// Step 3: Training Phase 
Train the model on the trainingset utilizing backpropagation 
For each period (1 to 100): 
     Randomize training dataset 
     Segment the data into mini-batches of size 32. 
For each mini-batch: 
     Calculate the output of the hidden layer utilizing the sigmoid function. 
     Calculate the output layer utilizing the sigmoid function. 
     Calculate the binary cross-entropy loss between the expected and actual outputs. 
    Calculate loss 
    Adjust weights and biases via the Adam optimizer 
    Implement L2 regularization during weight adjustments. 
Apply early stopping to prevent overfitting 
 
// Step 4: Evaluation Phase 
Assess the model on the test set 
Calculate Accuracy, Precision, Recall, F1-Score, and MCC 
Display the confusion matrix 
 
// Step 5: Real-Time Prediction & Control 
For each newinput: 
    Encode and normalize new_input 
    prediction ← model.predict(new_input) 
 
 
Deformation Suppression Method for the CNC Machining Process of...   Informatica 49 (2025) 231-244     235 
 
 
    If prediction == "Yes" then 
       Decrease Cutting Speed 
        Increase Coolant Flow 
        Adjust Feed Rate based on Material Type 
    Else 
        Continue with current parameters 
    End If 
 
// Step 6: Feedback Loop 
After machining: 
    Record actual deformation findings 
    Compare the prediction with the actual outcome 
    Update model weights via online learning 
 
End 
 binary cross-entropy loss.  After training, it uses standard 
The NeuroPID-CNC algorithm is a smart deformation classification metrics to evaluate previously unseen data 
suppression control system specifically designed for CNC and predicts deformation risk for new machining 
machining applications.  It employs a lightweight neural conditions in real time.  If a high deformation risk is 
network model that simulates PID behavior using a single- detected, the algorithm adjusts machining parameters 
neuron architecture to predict whether a machined part is dynamically, such as reducing cutting speed, increasing 
deformable based on a variety of machining parameters coolant flow, or changing the feed rate based on material 
such as cutting speed, feed rate, depth of cut, tool properties, to reduce deformation.  A feedback mechanism 
temperature, material type, and others.  The process begins is integrated to continuously update the model through 
with preprocessing the CNC-DeformControl dataset by online learning, improving control accuracy over time. 
encoding categorical features and normalizing numerical Figure 1 shows the flow diagram of the NeuroPID-CNC 
ones, then splitting the data into training and testing  algorithm. 
sets. The neural model, which includes a sigmoid-activated  
hidden neuron, is trained with the Adam optimizer and 
 
 
Figure 1: Flow diagram of NeuroPID-CNC algorithm 
 
236        Informatica 49 (2025) 231-244       T. Liu et al. 
 
The flow diagram shows the NeuroPID-CNC algorithm's interface, and spindle load values were derived from the 
operational pipeline for predicting and controlling spindle drive system's onboard diagnostics.  Categorical 
deformation during CNC machining.  It starts with the variables, such as tool wear and vibration levels, were 
CNC-DeformControl dataset, which goes through evaluated using image-based inspection, vibration sensors, 
preprocessing steps such as categorical feature encoding and operator feedback.  Surface finish was determined by 
and numerical feature normalization to ensure algorithm post-process optical inspection and tactile comparison 
compatibility.  The data is then divided into training and with standard roughness gauges. 
testing sets to aid in model generalization.  A single-neuron All collected data was logged in real time by a dedicated 
PID-inspired neural network is set up with a sigmoid data acquisition system and then stored in a structured 
activation function, Adam optimizer, and binary cross- format in a relational SQL database hosted on a secure 
entropy loss function.  The model is trained with local server.  Data from this database was exported in CSV 
backpropagation and evaluated on the test set to compute format for preprocessing and training.  The dataset is kept 
performance metrics. For real-time predictions, incoming in a version-controlled environment to ensure data 
data is encoded and normalized similarly, and the model integrity and traceability during the algorithm 
predicts the deformation risk.  If the risk is identified as development and testing stages. Figure 2 illustrates the 
"Yes," corrective control actions are automatically data collection process in a controlled CNC machining lab 
triggered, including reducing cutting speed, increasing environment. 
coolant flow, and adjusting the feed rate based on the 
material type, allowing adaptive, intelligent CNC 
machining. 
 
2.1 Dataset description 
The CNC-DeformControl dataset is a curated collection of  
machining data designed to help intelligently predict and  
suppress part deformation during Computer Numerical Figure 2: Data collection process 
Control (CNC) operations.  It includes 11 key attributes,  
such as machining process parameters and observed The CNC machine performs operations while sensors and 
outcomes, spread across several representative entries.  tools collect relevant data.  Machine diagnostics 
The dataset's primary goal is to help machine learning (speedometer icon) record cutting speed and feed rate, 
applications, particularly the NeuroPID-CNC algorithm, infrared sensors measure thermal data (thermometer icon), 
understand how different machining conditions affect the image-based analysis inspects tool wear (camera icon), 
likelihood of part deformation. vibrations are monitored by dedicated sensors (waveform 
This dataset contains a mixture of numerical and icon), and surface finish is assessed by tactile comparison 
categorical features.  The numerical attributes—Cutting to roughness gauges (touch icon).  All sensor data is 
Speed (in RPM), Feed Rate (in mm/rev), Depth of Cut (in captured in real time and securely stored in a structured 
mm), Tool Temperature (in °C), and Spindle Load (as a SQL database (database icon).  For model training and 
percentage)—measure the operational intensity of analysis, data is exported from SQL and converted to CSV 
machining.  These parameters have a direct impact on heat format (CSV file icon).  This pipeline provides high-
generation, mechanical stress, and material removal quality, structured data for machine learning applications 
efficiency.  In contrast, categorical attributes such as in deformation risk prediction. 
material type (e.g., aluminum, steel, brass, plastic), tool Overall, the CNC-DeformControl dataset provides a 
wear, vibration, coolant flow, and surface finish provide compact but meaningful representation of the machining 
qualitative information about the machining environment.  landscape, capturing both measurable and observational 
These factors have an impact on part integrity through variables required for training intelligent deformation 
physical wear, thermal control, and vibration dampening.  prediction systems like NeuroPID-CNC. 
The Deformation Risk field, labeled as "Yes" or "No,"  
serves as the target variable that indicates whether the 2.2 Data preprocessing 
machined part showed signs of deformation under the 
given conditions. To ensure that the CNC-DeformControl dataset is ready 
The data was gathered in a controlled CNC machining lab for machine learning, extensive preprocessing steps are 
environment outfitted with industrial-grade sensors and used.  The dataset contains a mix of numerical and 
monitoring equipment.  Cutting speed, feed rate, and depth categorical features that must be represented consistently 
of cut were programmed and recorded directly from the for the algorithm to correctly interpret the data.  
CNC machine interface.  Thermal readings were obtained Categorical attributes like Material Type, Tool Wear, 
using infrared sensors mounted near the tool-workpiece Vibration, Coolant Flow, and Surface Finish are 
 
Deformation Suppression Method for the CNC Machining Process of...   Informatica 49 (2025) 231-244     237 
 
 
numerically encoded using one-hot encoding, which  
converts categorical values into a binary matrix format. 2.3 Model initialization: Single-Neuron PID 
The one-hot encoding process transforms a categorical structure 
variable into a binary vector representation as shown in Eq. 
(1): The proposed model is a simple neural structure inspired 
 by the PID control principle that consists of only one 
𝑂𝑛𝑒𝐻𝑜𝑡(𝑥𝑖) = [𝑥𝑖 = 𝑐1, 𝑥𝑖 = 𝑐2, … , 𝑥𝑖 = 𝑐𝑛 ] (1) hidden neuron. This neuron simulates the adaptive control 
 behavior of a PID controller by receiving preprocessed 
Where: machining inputs from the input layer and computing a 
𝑥𝑖 is a categorical value, nonlinear transformation for prediction. The final output is 
𝑐1, 𝑐2, … , 𝑐𝑛 are the unique categories, produced by a single output neuron equipped with a 
Each comparison 𝑥𝑖 = 𝑐𝑗yields 1 if true, else 0. sigmoid activation function, which converts the weighted 
This transformation is critical for allowing the single- sum of inputs into a deformation probability expressed by 
neuron model to interpret non-numeric data while Eq. (4): 
preserving categorical relationships without imposing  
artificial ordering. 1 (4) 
𝜎(𝑧) =  
Simultaneously, all numerical attributes—Cutting Speed, 1 + 𝑒−𝑧  
Feed Rate, Depth of Cut, Tool Temperature, and Spindle Where: 
Load—are normalized utilizing Min-Max scaling, which z = weighted sum of inputs 
rescales each feature to lie within the range [0, 1]. This is σ(z) = output value in the range [0, 1] representing 
mathematically expressed by Eq. (2): deformation risk 
 The term 𝑒−𝑧represents the exponential function with a 
𝑥 − 𝑥𝑚𝑖𝑛 negative exponent, which is a fundamental mathematical 
𝑥𝑛𝑜𝑟𝑚 =  (2) 
𝑥𝑚𝑎𝑥 − 𝑥𝑚𝑖𝑛 expression describing exponential decay. It is the inverse 
 of the natural exponential function 𝑒𝑧, where 𝑒 is Euler’s 
Where: number (approximately 2.71828). This function plays a 
𝑥 = original value of the feature key role in the sigmoid activation function by controlling 
𝑥𝑚𝑖𝑛= minimum value of the feature in the dataset how sharply the output transitions between 0 and 1 based 
𝑥𝑚𝑎𝑥= maximum value of the feature in the dataset on the input 𝑧. 
𝑥𝑛𝑜𝑟𝑚= normalized value of the feature Mathematically, 𝑒−𝑧can be expressed using its infinite 
This normalization ensures that no feature dominates series expansion in Eq. (5): 
others due to varying scales, resulting in balanced  
∞
contributions throughout training.  After normalization and (−𝑧)𝑛
𝑒−𝑧
encoding, the dataset is randomly divided into two subsets: = ∑  (5) 
𝑛!
70% for training and 30% for testing.  This split preserves 𝑛=0
model generalization and ensures that evaluation is where: 
performed on unseen data. The dataset 𝐷 is randomly split 𝑧 is the weighted sum of inputs, 
into training and testing subsets using the Eq. (3): 𝑛! denotes the factorial of 𝑛, 
 and the series sums over all non-negative integers 𝑛. 
𝐷 =  𝐷 This logistic function guarantees that the model’s output 
𝑡𝑟𝑎𝑖𝑛 ∪ 𝐷𝑡𝑒𝑠𝑡 , (3) 
𝑤ℎ𝑒𝑟𝑒 |𝐷𝑡𝑟𝑎𝑖𝑛|= 0.7|𝐷|, lies between 0 and 1, representing the probability of 
|𝐷 deformation risk under current machining conditions. The 
𝑡𝑒𝑠𝑡|= 0.3|𝐷| 
 model is trained utilizing the binary cross-entropy loss 
Where: function, defined in Eq. (6), which measures the 
D: The complete preprocessed dataset after normalization discrepancy between predicted and actual outcomes: 
and encoding.  
𝐷 𝐿 = −[𝑦. log(?̂?) + (1 − 𝑦). log (1 − ?̂?)] (6) 
𝑡𝑟𝑎𝑖𝑛: The training subset of the dataset utilized to train 
the model.  
𝐷 Where: 
𝑡𝑒𝑠𝑡: The testing subset of the dataset utilized to evaluate 
y = actual class label (0 for no deformation, 1 for 
the model’s performance. 
∣ deformation) 
D∣: The total number of data instances (rows) in the full 
?̂?= predicted probability of deformation 
dataset 𝐷. 
|𝐷 𝐿 = loss value that penalizes prediction errors 
𝑡𝑟𝑎𝑖𝑛 |: The number of instances in the training set, equal 
Here, 𝑦 is the actual binary label (0 for “No Deformation” 
to 70% of the total dataset. 
|𝐷 and 1 for “Yes”), while ?̂? is the predicted probability. The 
𝑡𝑒𝑠𝑡 |: The number of instances in the test set, equal to 
model’s weights are optimized utilizing the Adam 
30% of the total dataset. 
238        Informatica 49 (2025) 231-244       T. Liu et al. 
 
optimizer, a robust gradient descent variant that adapts TP = True Positives (correctly predicted deformations) 
learning rates for quicker and more stable convergence. At TN = True Negatives (correctly predicted non-
each iteration 𝑡, the parameters 𝜃𝑡are updated as follows: deformations) 
 FP = False Positives (incorrectly predicted deformations) 
𝑚𝑡 = 𝛽1𝑚𝑡−1 + (1 − 𝛽1)𝑔𝑡 (7) FN = False Negatives (missed deformations) 
 Precision quantifies the fraction of predicted "Yes" 
𝑣𝑡 = 𝛽2𝑣 2
𝑡−1 + (1 − 𝛽2)𝑔𝑡  (8) (deformation) true cases: 
  
𝑚𝑡 𝑇𝑃 (14) 
?̂?𝑡 =  (9) 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 =  
1 − 𝛽𝑡  
1 𝑇𝑃 + 𝐹𝑃
  
𝑣𝑡 Recall reflects the model’s ability to identify all actual 
?̂?𝑡 =  (10) 
1 − 𝛽𝑡
2 "Yes" cases: 
  
?̂?𝑡 𝑇𝑃 (15) 
𝜃𝑡 = 𝜃𝑡−1 − 𝛼  (11) 𝑅𝑒𝑐𝑎𝑙𝑙 =   
√?̂?𝑡 + 𝜖 𝑇𝑃 + 𝐹𝑁
 
where 𝑔𝑡is the gradient at iteration t, 𝑚𝑡and 𝑣𝑡are the 
F1-Score, the harmonic mean of precision and recall, 
biased first and second moment estimates, ?̂?𝑡and ?̂?𝑡are 
offers a balanced view: 
their bias-corrected estimates, 𝛼 is the learning rate, 𝛽1and 
 
𝛽2are decay rates for these moments, and 𝜖 is a small 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑅𝑒𝑐𝑎𝑙𝑙 (16) 
constant to prevent division by zero. 𝐹1 − 𝑠𝑐𝑜𝑟𝑒 =  2 ∗   
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑅𝑒𝑐𝑎𝑙𝑙
 
 
2.4 Training phase MCC computes the quality of binary and multiclass 
classifications by considering true and false positives and 
During training, the model aims to reduce the loss function 
negatives, providing a balanced score even with 
via backpropagation, an algorithm that calculates the 
imbalanced datasets. 
gradient of the loss concerning each model weight. The 
 
weight update rule is formalized as showed in Eq. (12): (𝑇𝑃∗𝑇𝑁)−(𝐹𝑃∗𝐹𝑁)
 𝑀𝐶𝐶 =     
√(𝑇𝑃+𝐹𝑃)(𝑇𝑃+𝐹𝑁)(𝑇𝑁+𝐹𝑃)(𝑇𝑁+𝐹𝑁)                  
𝜕𝐿
∆𝑤 = −𝑛.  (12) (17) 
𝜕𝑤
  
Where: These metrics provide a comprehensive view of model 
w = change in weight performance in predicting deformation risks. 
𝜂 = learning rate  
𝜕𝐿 2.6 Real-time prediction and control 
= gradient of the loss function concerning weight 𝑤
𝜕𝑤
  The trained model is deployed for real-time prediction 
The training process iterates through numerous epochs, during CNC operations. When a novel machining 
adjusting weights after each batch of training examples. To configuration is initiated, the input values are first 
prevent overfitting, early stopping is executed: training processed (encoded and normalized) as per training 
halts if the validation loss fails to improve over a routines. The model then generates an output probability?̂?. 
predefined number of epochs. This strategy improves If ?̂? > 0.5, the system flags a high deformation risk. In 
model generalization on new, unseen CNC conditions. such cases, immediate corrective actions are triggered by 
 predefined control logic. For instance, a high-risk flag 
2.5 Evaluation phase prompts a 10% reduction in cutting speed, utilizing the 
formula: 
After training, the model’s efficiency is assessed on the  
testing set utilizing standard classification metrics. These 𝑁𝑒𝑤 𝐶𝑢𝑡𝑡𝑖𝑛𝑔 𝑆𝑝𝑒𝑒𝑑 (18) 
metrics assess the model’s capability to correctly predict = 𝑂𝑙𝑑 𝐶𝑢𝑡𝑡𝑖𝑛𝑔 𝑆𝑝𝑒𝑒𝑑 × 0.9 
deformation risk:  
Accuracy measures the ratio of correct predictions to total Where: 
samples: "Old Cutting Speed" = initial programmed cutting speed 
 "New Cutting Speed" = adjusted speed to reduce stress on 
𝑇𝑃 + 𝑇𝑁 (13) 
𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 =   the workpiece 
𝑇𝑃 + 𝑇𝑁 + 𝐹𝑃 + 𝐹𝑁
Where, 
 
Deformation Suppression Method for the CNC Machining Process of...   Informatica 49 (2025) 231-244     239 
 
 
This reduction reduces both mechanical and thermal stress equations, this system creates a rigorous yet practical 
on the workpiece.  Other adaptive responses, like framework for real-time decision-making and long-term 
increasing coolant flow or decreasing feed rate, are improvement.  The result is a smarter, more efficient, and 
implemented concurrently based on the material type and resilient manufacturing environment. 
observed vibration.  If the expected risk is low, the  
machining operation continues without intervention, 3 Results 
ensuring efficiency while maintaining safety. 
 3.1 Experimental setup 
2.7 Feedback loop and online learning 
All experiments were carried out on a Windows 11 system 
Following each machining operation, the actual running Python 3.10.  The machine was equipped with an 
deformation outcome is recorded and compared to the Intel Core i7 processor and 16 GB of RAM.  TensorFlow, 
model's prediction.  This creates a feedback loop, Scikit-learn, Pandas, NumPy, and Matplotlib were used to 
increasing the model's adaptability over time.  Using train, evaluate, and visualize models.  The dataset was 
online learning, the model gradually updates its weights divided into two sets: training (70%) and testing (30%).  
using recent prediction errors. The update rule is given by: Early stopping and adaptive learning rate scheduling were 
𝑤𝑛𝑒𝑤 = 𝑤𝑜𝑙𝑑 + 𝛼. (𝑦 − ?̂?). 𝑥 (19) used to prevent overfitting and speed up convergence. 
  
Where: 3.2 Comparison results 
𝑤𝑜𝑙𝑑= previous weight 
𝑤𝑛𝑒𝑤= updated weight Table 2 compares the classification models used on the 
𝛼 = online learning rate (a small constant) CNC-DeformControl dataset, including SVM, Random 
𝑦 = actual label (0 or 1) Forest (RF), KNN, Logistic Regression (LR), and the 
?̂?= predicted output proposed NeuroPID-CNC model. 
𝑥 = input feature value.  
In a feedback-driven online learning system, predictions Table 2: Performance comparison of classification 
consistently impact control actions, which then change models 
future input data. This feedback can exacerbate problems Model Accura Precisi Reca F1- MC
if not adequately stabilized. A diminutive learning rate (𝛼) cy (%) on (%) ll Scor C 
guarantees more gradual weight adjustments and (%) e 
contributes to stability preservation. An elevated learning (%) 
rate (𝛼) may induce oscillations or divergence, particularly SVM 88.43 86.22 85.13 85.6 0.76 
in feedback systems. As updates rely on prediction error, 7 
significant spikes in error can disrupt learning until Random 90.12 89.05 87.60 88.3 0.79 
addressed. In practical CNC machining, complete Forest 2 
convergence is uncommon. In online learning, weights are KNN 87.30 84.95 84.00 84.4 0.74 
adjusted following each data point or small batch, resulting 7 
in continual retraining. Periodic full model resets or Logistic 86.75 83.90 83.10 83.5 0.72 
reinitializations may be conducted to prevent drift or Regressi 0 
overfitting. on 
 This type of incremental learning ensures that the model NeuroPI 92.00 90.00 93.00 91.5 0.84 
evolves with real-world data, adapting to unknown D-CNC 0 
materials, dynamic wear conditions, or unexpected  
operational disruptions. By combining real-time prediction The proposed NeuroPID-CNC algorithm had the best 
with continuous learning, the system grows more robust performance across all metrics tested.  It enables real-time 
and context-aware over time, eventually achieving a self- feedback adaptation and improved learning of 
improving CNC control mechanism that maximizes deformation-prone patterns. This architecture is extremely 
machining precision while reducing the risk of costly responsive to subtle patterns in deformation-prone 
defects. conditions, resulting in higher prediction accuracy and 
The NeuroPID-CNC algorithm represents an intelligent, robustness.  Furthermore, its streamlined structure 
lightweight, and adaptable solution for predicting and minimizes overfitting, whereas more complex models may 
suppressing deformation during CNC machining.  It require deeper tuning. Figure 3 shows the confusion matrix 
tightly integrates machine learning principles with control for proposed approach. 
engineering strategies using a single-neuron PID-inspired  
structure, strong preprocessing, accurate prediction, and 
dynamic feedback adaptation.  With ten foundational 
240        Informatica 49 (2025) 231-244       T. Liu et al. 
 
classification errors and improves robustness when dealing 
with complex interactions between CNC parameters like 
cutting speed, tool wear, and thermal readings.  The 
model's ability to learn consistently across diverse inputs 
supports its use in real-time industrial settings.  In Figure 
5, NeuroPID-CNC leads with a precision of 90%, 
indicating that it correctly predicts a deformation risk 90% 
of the time. 
 
 
Figure 3: Confusion Matrix for proposed approach 
 
Figure 4 demonstrates that the proposed NeuroPID-CNC 
model attains the highest accuracy among all evaluated 
classifiers, reaching 92%. 
 
 
Figure 5: Precision comparison 
 
From figure 5, the precision of proposed NeuroPID-CNC 
approach outperforms SVM, RF, KNN and LR by 4.38%, 
1.07%, 5.94% and 7.27% respectively. 
High precision is required in CNC machining 
environments to avoid unnecessary operational 
adjustments caused by false positives.  The model's low 
false alarm rate leads to increased operational efficiency 
by ensuring that control recommendations (such as 
reducing cutting speed or increasing coolant flow) are only 
implemented when there is a genuine risk.  This precision 
advantage stems primarily from the model's ability to learn 
 subtle patterns associated with actual deformation-
 inducing conditions while filtering out noise from non-
Figure 4: Accuracy comparison critical anomalies.  Figure 6 shows that NeuroPID-CNC 
 has the highest recall value of 93%, indicating an excellent 
From figure 4, the accuracy of proposed NeuroPID-CNC sensitivity to actual deformation occurrences. 
approach outperforms SVM, RF, KNN and LR by 4.03%, 
2.09%, 5.38% and 6.05% respectively. 
This high accuracy demonstrates the model's overall 
predictive power in correctly identifying deformation risk 
("Yes") and non-risk ("No") instances.   
 
 
 
The superior performance is due to the unique integration 
of a PID-inspired control mechanism within the neuron, 
which allows the model to adjust its internal weights with 
greater precision during training.  This reduces 
 
Deformation Suppression Method for the CNC Machining Process of...   Informatica 49 (2025) 231-244     241 
 
 
 
  
Figure 6: Recall comparison Figure 7: F1-Score comparison 
  
From figure 6, the recall of proposed NeuroPID-CNC From figure 7, the F1-Score of proposed NeuroPID-CNC 
approach outperforms SVM, RF, KNN and LR by 9.24%, approach outperforms SVM, RF, KNN and LR by 6.81%, 
6.16%, 10.71% and 11.91% respectively. 3.60%, 8.32% and 9.58% respectively. 
A high recall ensures that the model rarely overlooks true The F1-score, which is the harmonic mean of precision and 
positive cases—an important feature in critical recall, measures the model's overall effectiveness in 
manufacturing scenarios where undetected deformations handling the binary classification task.  This balanced 
could jeopardize product quality, damage tools, or cause performance indicates that the NeuroPID-CNC model 
production downtime.  This exceptional recall is due to the optimizes both false positives and false negatives, rather 
model's continuous feedback adjustment loop, inspired by than favoring one over the other.  Such a balance is critical 
the integral component of PID control, which improves in industrial settings where both unnecessary interventions 
detection sensitivity over time as more real-world and missed deformation risks have financial and 
machining data is processed.  Figure 7 shows that operational implications.  Finally, Figure 8 demonstrates 
NeuroPID-CNC has the best trade-off between precision that NeuroPID-CNC obtained the highest Matthews 
and recall among all tested models, with an F1-score of Correlation Coefficient (MCC) score of 0.84, which is 
91.5%. widely considered one of the most reliable metrics for 
evaluating binary classifiers, especially on imbalanced 
datasets.
 
 
Figure 8: MCC comparison 
242        Informatica 49 (2025) 231-244       T. Liu et al. 
 
From figure 8, the MCC of proposed NeuroPID-CNC The Single-Neuron PID-Inspired Control is proficient in 
approach outperforms SVM, RF, KNN and LR by 10.53%, real-time management of dynamic, nonlinear systems, 
6.33%, 13.51% and 16.67% respectively. adaptive error learning, feedback-based decision-making, 
MCC accounts for all four confusion matrix components and resource-constrained applications. The machine 
(true positives, true negatives, false positives, and false learning models exhibit challenges due to inadequate 
negatives), providing a more complete picture of model temporal feedback management, rigidity in online 
performance.  The high MCC score confirms that the learning, elevated computational expenses (particularly in 
model consistently and strongly correlates predicted and random forests and k-nearest neighbors), and limited 
actual outcomes, regardless of class imbalance.  This adaptability in non-stationary control contexts. 
robust performance ensures reliability and fairness in The results demonstrate the superiority of the proposed 
prediction decisions over varying dataset distributions and NeuroPID-CNC model in predicting deformation risk 
machining conditions. during CNC machining.  The model's PID-inspired single-
McNemar’s test was employed to statistically validate the neuron architecture not only provides superior 
performance differences across classifiers based on the performance across all standard classification metrics but 
paired predictions of all models. Table 3 presents the it also ensures operational interpretability and real-time 
results of the statistical significance test conducted with adaptability.  These benefits make it an ideal candidate for 
McNemar’s test. The suggested method demonstrated smart manufacturing environments where precision, 
statistically significant superiority over RF (p < 0.001), dependability, and responsiveness are crucial. Future 
SVM (p < 0.003), KNN (p=0.004), and LR (p<0.005). research will concentrate on implementing the model on 
 industrial edge devices for real-time inference, utilizing 
Table 3: Statistical Test - McNemar's Test multi-modal sensor data including audio and thermal 
 images, applying transfer learning for enhanced 
Algorithm McNemar’s p-value generalization, incorporating explainable AI 
statistic methodologies to augment interpretability, and embedding 
the model within closed-loop control systems for 
SVM 42.13 0.002 autonomous CNC parameter modification based on 
predictive feedback. 
RF 45.24 0.0001 
 
KNN 39.18 0.004 5 Conclusion 
LR 37.89 0.0045 This study described the NeuroPID-CNC algorithm, which 
is a lightweight single-neuron PID-inspired classifier for 
 
predicting deformation risk in CNC machining.  The 
4 Discussion model outperformed traditional classifiers, achieving the 
highest accuracy, precision, recall, F1-score, and MCC, 
The single-neuron PID-inspired predictive control 
proving its suitability for real-time deformation risk 
technique can surpass machine learning models such as 
detection and adaptive control in manufacturing.  The 
RF, SVM, KNN, and LR. Single-neuron PID-inspired 
current model was trained using data from a controlled lab 
controllers are designed for dynamic system regulation, 
environment, which may limit its applicability to different 
combining the advantages of PID control with adaptive 
machine types and unstructured production scenarios.  It 
features. It can adjust weights in real-time utilizing 
also focuses solely on binary classification and requires 
straightforward learning algorithms, rendering it suitable 
manual feature selection, with no support for multi-output 
for dynamic, non-linear systems with fluctuating 
or continuous prediction tasks.  Future research will focus 
conditions. It provides a temporal viewpoint by evaluating 
on deploying the model on industrial edge devices for real-
past errors, the current state, and anticipated future 
time inference, incorporating multi-modal sensor data 
behavior, which is consistent with control system needs. 
such as audio and thermal images, using transfer learning 
The methodology is interpretable, and its performance can 
for broader generalization, integrating explainable AI 
be adjusted using domain expertise (e.g., calibrating 
techniques to improve interpretability, and embedding the 
proportional, integral, and derivative influences). 
model into closed-loop control systems for autonomous 
Machine learning algorithms are models trained in 
CNC parameter adjustment based on predictive feedback. 
batches. They do not readily adapt in real time without 
 
expensive retraining. These are computationally intensive, 
 
perhaps rendering them unsuitable for real-time embedded 
 
control systems. It does not inherently manage temporal 
 
dynamics until augmented by time-lagged features, which 
 
may still lack responsiveness or interpretability.   
 
 
Deformation Suppression Method for the CNC Machining Process of...   Informatica 49 (2025) 231-244     243 
 
 
References [11] Fan, L., Tian, H., Li, L., Yang, Y., Zhou, N., & He, N. 
(2020). Machining distortion minimization of 
[1] Yao, K. C., Chen, D. C., Pan, C. H., & Lin, C. L. monolithic aircraft parts based on the energy 
(2024). The development trends of computer principle. Metals, 10(12), 
numerical control (CNC) machine tool 1586.  https://doi.org/10.3390/met10121586   
technology. Mathematics, 12(13), 1923. [12] Ma, T., Han, Y., & Li, H. (2024). Single neuron PID 
https://doi.org/10.3390/math12131923   based method for deformation suppression during 
[2] Soori, M., Jough, F. K. G., Dastres, R., & Arezoo, B. CNC machining of parts. Advanced Control for 
(2024). Robotical automation in CNC machine tools: Applications: Engineering and Industrial 
a review. acta mechanica et Systems, 6(2), e122. https://doi.org/10.1002/adc2.122   
automatica, 18(3).  https://doi.org/10.2478/ama- [13] Kasprowiak, M., Parus, A., & Hoffmann, M. (2022). 
2024-0048   Vibration suppression with use of input shaping 
[3] Mouli, K. C., Prasad, B. S., Sridhar, A. V., & Alanka, control in machining. Sensors, 22(6), 2186. 
S. (2020). A review on multi sensor data fusion https://doi.org/10.3390/s22062186   
technique in CNC machining of tailor-made [14] Guo, L., Yang, F., Li, T., Zhou, M., & Tang, J. (2021). 
nanocomposites. SN Applied Sciences, 2(5), 931. Vibration suppression of aeroengine casing during 
https://doi.org/10.1007/s42452-020-2739-7   milling. The International Journal of Advanced 
[4] Soori, M., & Arezoo, B. (2023). Modification of CNC Manufacturing Technology, 113, 295-307. 
machine tool operations and structures using finite https://doi.org/10.1007/s00170-020-06582-2   
element methods, A review. Jordan Journal of [15] Shi, Y., Su, M., Cao, Q., & Zheng, D. (2024). A 
Mechanical and Industrial Engineering. Normal Displacement Model and Compensation 
https://doi.org/10.59038/jjmie/170302  Method of Polishing Tool for Precision CNC 
[5] Kuntoğlu, M., Aslan, A., Sağlam, H., Pimenov, Polishing of Aspheric 
Kuntoğlu, M., Aslan, A., Sağlam, H., Pimenov, D. Y., Surface. Micromachines, 15(11), 
Giasin, K., & Mikolajczyk, T. (2020). Optimization 1300.  https://doi.org/10.3390/mi15111300  
and analysis of surface roughness, flank wear and 5 [16] Hasçelik, A., Aslantas, K., & Yalçın, B. (2025). 
different sensorial data via tool condition monitoring Optimization of Cutting Parameters to Minimize Wall 
system in turning of AISI 5140. Sensors, 20(16), Deformation in Micro-Milling of Thin-Wall 
4377. https://doi.org/10.3390/s20164377  Geometries. Micromachines, 16(3), 310. 
[6] Shen, Y., Yang, F., Habibullah, M. S., Ahmed, J., Das, https://doi.org/10.3390/mi16030310   
A. K., Zhou, Y., & Ho, C. L. (2021). Predicting tool [17] Zheng, L., Chen, W., & Huo, D. (2020). Investigation 
wear size across multi-cutting conditions using on the tool wear suppression mechanism in non-
advanced machine learning techniques. Journal of resonant vibration-assisted micro 
Intelligent Manufacturing, 32, 1753- milling. Micromachines, 11(4), 380. 
1766.  https://doi.org/10.1007/s10845-020-01625-7   https://doi.org/10.3390/mi11040380   
[7] Kasprowiak, M., Parus, A., & Hoffmann, M. (2022). [18] Gan, L., Wang, L., & Huang, F. (2023). Adaptive 
Vibration suppression with use of input shaping backlash compensation for CNC machining 
control in machining. Sensors, 22(6), applications. Machines, 11(2), 193. 
2186.  https://doi.org/10.3390/s22062186   https://doi.org/10.3390/machines11020193   
[8] Nguyen, D. K., Huang, H. C., & Feng, T. C. (2023). [19] Świć, A., Gola, A., Orynycz, O., Tucki, K., & 
Prediction of Thermal Deformation and Real-Time Matijošius, J. (2022). Technological Methods for 
Error Compensation of a CNC Milling Machine in Controlling the Elastic-Deformable State in Turning 
Cutting Processes. Machines, 11(2), 248. and Grinding Shafts of Low 
https://doi.org/10.3390/machines11020248  Stiffness. Materials, 15(15), 5265. 
[9] Li, Y., Li, Y. N., Li, X. W., Zhu, K., Zhang, Y. A., Li, https://doi.org/10.3390/ma15155265   
Z. H., ... & Wen, K. (2023). Influence of material [20] Lv, Y., Liu, Y., Li, X., Lu, L., & Malik, A. (2024). 
removal strategy on machining deformation of Automated Shape Correction for Wood Composites in 
aluminum plates with asymmetric residual Continuous Pressing. Forests, 15(7), 
stresses. Materials, 16(5), 2033. 1118.  https://doi.org/10.3390/f15071118   
https://doi.org/10.3390/ma16052033   [21] Yi, J., Wang, X., Zhu, Y., Wang, X., & Xiang, J. 
[10] Hasçelik, A., Aslantas, K., & Yalçın, B. (2025). (2024). Deformation Control in Mesoscale Micro-
Optimization of Cutting Parameters to Minimize Wall Milling of Curved Thin-Walled 
Deformation in Micro-Milling of Thin-Wall Structures. Materials, 17(20), 
Geometries. Micromachines, 16(3), 5071.  https://doi.org/10.3390/ma17205071   
310.  https://doi.org/10.3390/mi16030310   [22] Korpysa, J., & Habrat, W. (2024). Dimensional 
Accuracy After Precision Milling of Magnesium 
244        Informatica 49 (2025) 231-244       T. Liu et al. 
 
Alloys Using Coated and Uncoated Cutting 
Tools. Materials, 17(22), 
5578.  https://doi.org/10.3390/ma17225578   
[23] Devi, C., Mahalingam, S. K., Cep, R., & Elangovan, 
M. (2024). Optimizing end milling parameters for 
custom 450 stainless steel using ant lion optimization 
and TOPSIS analysis. Frontiers in Mechanical 
Engineering, 10, 
1353544.  https://doi.org/10.3389/fmech.2024.13535
44  
 
https://doi.org/10.31449/inf.v49i12.9142  Informatica 49 (2025) 245-254   245 
 
 
An Enhanced FSO-BPNN Framework for Anomaly Detection and 
Early Warning in Power System Monitoring 
Na Li*, Guanghua Yang, Yuexiao Liu, Xiangyu Lu, Zhu Tang 
State Grid Beijing Electric Power Company, Beijing,100032, China 
E-mail：diance003@126.com 
*Corresponding Author  
 
Keywords: anomaly detection (AD), internet of things (IOT), monitoring, neural network, power system (PS) smart 
grid, predictive maintenance 
 
Recieved: May 7, 2025 
The increasing complexity of contemporary power networks necessitates the development of enhanced 
early warning systems and intelligent monitoring to ensure stability and operational efficiency. 
Traditional approaches to risk prevention and predictive maintenance often fail due to limitations in 
identifying real-time abnormalities and adapting to dynamic system characteristics. To address these 
issues, the present research proposes an improved fish swarm optimization with Backpropagation Neural 
Network (IFSO-BPNN) for anomaly detection (AD) and fault detection (FD) early warning in power 
system (PS) monitoring that integrates an IFSO algorithm with a BPNN. The major goal is to increase 
the accuracy of AD and FD in smart grids by utilizing deep learning (DL) and optimization approaches. 
The IFSO method integrates adaptive weighting and behavioral dynamics into classic fish swarm 
optimization, improving overall search capabilities. By tweaking BPNN parameters using IFSO, the 
model achieves higher convergence rates and improved classification accuracy. The assessment dataset 
was compiled usingInternet of Things (IoT) sensors and pan/tilt camera-based surveillance systems at 
Beijing power plants, with preprocessing techniques such as min-max normalization and feature 
extraction using Independent Component Analysis (ICA) to improve model performance. Resultsfrom 
experiments show that the IFSO-BPNN model outperforms standard algorithms with an accuracy 
ofFD99.98% and AD 0.9980. These findings illustrate the system's capacity to detect anomalies quickly 
and perform preventive maintenance. The proposed method, which combines swarm intelligence with 
neural networks, helps to construct smarter, more robust power grids capable of meeting future energy 
demands with lower failure risks. 
Povzetek: Za odkrivanje napak (FD) in nepravilnosti (AD) v nadzoru elektroenergetskega sistema je razvit 
IFSO-BPNN (Izboljšana optimizacija jata rib in BPNN). Model izboljša kvaliteto z optimizacijo 
parametrov BPNN z IFSO, kar omogoča hitro zgodnje opozarjanje in prediktivno vzdrževanje. 
 
1 Introduction difficult [4].Real-time data collection and analysis of 
electrical characteristics is part of PS monitoring, used to 
Artificial intelligence (AI), big data, and deep learning ensure system stability, identify problems, improve 
(DL) revolutionize power systems (PS) by enhancing performance, and assist in decision-making for 
feature modeling, control, and fault diagnosis; these are dependable and effective power grid operation [5].As 
presenting recent advances and applications in monitoring demonstrated by the arctic sky tragedy, the expansion of 
and performance analysis [1]. The expansion ofPS is the cruise industry needs advanced, dependable PS to 
hindered by growing power demand and environmental avoid blackouts, which endanger public safety, the 
objectives, which present challenges for transmission environment, financial stability, and reputation 
capacity and distance. Advanced, sustainable energy [6].Potential false alarms, reliance on data quality, 
solutions are being used to achieve carbon peaking and difficulty identifying new abnormalities, computational 
neutrality [2].Reconstruction errors and thresholding are complexity, difficulties with real-time implementation, 
used in AD(AD) to minimize false alarms and isolate fault and threshold setting are some drawbacks of AD and early 
areas by training a model to learn typical system warning in PS [7]. 
behaviorin an unsupervised manner [3].Approximately  
70% of energy is produced by thermal power plants; new  
large-capacity units (600–1000+ MW) improve operating  
efficiency but make system coupling and integration more  
246   Informatica 49 (2025) 245-254       N. Li et al. 
 
1.1 Aim and contribution of the research Following data cleaning and feature extraction, 
supervisory control and data acquisition (SCADA)were 
The aim of the research is to develop a new method, processed using aConvolutional neural network - 
improved fish swarm optimization with Backpropagation bidirectional gated recurrent unit (CNN-BiGRU) with 
Neural Network (IFSO-BPNN), for detecting anomalies attention to identify wind turbine faults [10]. Accurate FD 
and faultsin PS by integrating BPNN and IFSO in actual wind farms was accomplished; however, it was 
algorithms. The goal is to increase the accuracy and constrained by the generalizability of the data source and 
efficiency of AD and fault detection (FD) in smart grids the possibility of overfitting to particular turbine models. 
while also enabling proactive maintenance. The research's The monitoring of wind turbine health was enhanced by 
key contributions include the following: utilizing mutual information to determine essential 
• IFSO Algorithm: Improves the global search parameters, support vector regression (SVR) for 
capability and adaptive weighting of classic Fish thresholding, and long short-term memory -autoencoder 
Swarm Optimization, resulting in less (LSTM-AE) for AD [11]. The outcome demonstrated 
convergence time and higher classification precise AD and successful identification of crucial 
accuracy in anomaly and fault identification. parameters. Real-time monitoring settings could show a 
• BPNN Optimization: IFSO is used to optimize decline in performance due to noisy data or inadequate 
BPNN parameters, which results in quicker temporal information. To optimize the monitoring and 
convergence and greater classification accuracy security of smart hospitals, machine learning (ML) and 
for real-time AD and FD. edge-based advertising on Contiki Coojawere applied to 
• Advanced-Data Preprocessing: Uses min-max identify IoT network intrusions and e-health incidents 
normalization and Independent Component [12]. The system was successful in identifying 
Analysis (ICA) for feature extraction, improving cyberattacks and e-health events, but it was very 
the model's performance in power system dependent on the reality of the simulated data, which could 
monitoring by efficiently preprocessing Internet not work effectively with complex or novel attack 
of Things (IoT) sensor and surveillance system patterns. 
data. Abnormalities in wind turbines were discovered and 
The next phase (phase 2) clearly explains the existing accurately analyzedutilizing a combination of methods. 
research about ADand early warning in PS monitoring. Local outlier factor (LOF) and adaptive K-means for 
Phase 3 presents the methodology, Phase 4 provides the preprocessing, Extreme Gradient Boosting (XGBoost) for 
result and discussion of existing vs proposed method, and diagnosis, and long short-term memory-stacked denoising 
Phase 5 deliversthe conclusion. autoencoder (LSTM-SDAE) for feature extractionwere 
 employed [13]. The technique increased wind turbine 
2 Related works dependability by efficiently identifying and diagnosing 
problems in real-time utilizing SCADA data. Performance 
The aim of the research [8] was to increase the was dependent on the caliber of preprocessing and could 
dependability of seismic stations. For reliable power be hampered by noisy data or hidden anomalies. The 
failure prediction, the SeismoGuard Ensemble, which research created an early warning system that incorporates 
comprises random forest (RF), support vector machine meteorological data to enhance PS dependability and 
(SVM), k-nearest neighbors (KNN), and logistic proactively reduce atmospheric dangers [14]. The 
regression (LR), along with IoT monitoring, was used. technology enhanced defect detection and prevented 
Results demonstrate that the approach attained 90% outages during severe weather; however, its performance 
accuracy and increased dependability. The dataset's reach depended on data quality and erratic weather patterns. The 
was restricted; however, the data contains long-term advancements in battery electric vehicle (BEV) 
testing with wider generalization across various situations. technology, platforms, charging, and monitoring were 
A combination of elliptic curve cryptography (ECC)- examined to address issues regarding safety, charging, and 
based token control with deep reinforcement learning range in new energy cars [15]. Although cutting-edge 
(DRL)-based sleep scheduling was used for secure and platforms and safety features dominate the BEV industry, 
adaptive power management under possible threat however, there were issues with battery lifecycle safety, 
conditions in order to improve the security and energy charging simplicity, and weather adaptation. The PS load 
efficiency of wireless sensor networks (WSNs) [9]. The margin was determined by utilizing an artificial neural 
approach achieved a 15% increase in energy efficiency network (ANN) trained on phasor measurement unit 
and a 20.01% power reduction. While simulation-based (PMU) data and model simulations to ensure voltage and 
outcomes were validated, more verification was required small-signal stability [16]. An ANN's ability to anticipate 
for scalability and real-world implementation under load margin effectively cannot exceed a dependence on 
various attack types. the quality of PMU data and model assumptions in actual 
systems. To increase safety in nuclear-powered marine 
An Enhanced FSO-BPNN Framework for Anomaly Detection...                                               Informatica 49 (2025) 245-254     247 
 
operations, developments in ship nuclear power framework was suggested; however, knowledge remains 
machinery (SNPM) design, fault diagnostics, and risk limited and needs to be verified. Table 1 provides the 
assessmentwere evaluated [17]. Design enhancements and related works summary table. 
investigation spaces were identified, and an integrated risk  
 
Table 1: Comparative Summary of the related works 
 
Reference Methods Results Limitations 
Duet al. [8] SeismoGuard Ensemble (RF, Achieved 90% accuracy, Limited dataset 
SVM, KNN, LR) + IoT improved dependability coverage; needs 
monitoring of seismic stations generalization and 
broader testing 
Qinet al.[9] ECC token control + DRL- 15% energy efficiency Simulation-based only; 
based sleep scheduling for gain, 20.01% power real-world scalability 
WSN reduction and threat resilience not 
verified 
Xianget al.[10] SCADA data + CNN-BiGRU + Accurate wind turbine Data source 
attention mechanism FD in real wind farms generalizability is 
limited; overfitting risk 
to specific turbine 
models 
Chen et al. [11] Mutual information + SVR for Accurate anomaly Real-time performance 
thresholds + LSTM-AE for detection; key could  degrade under 
anomaly detection parameters identified noisy or incomplete data 
Said et al. [12] ML + edge-based intrusion Identified e-health events Simulated data could 
detection on Contiki Cooja for and IoT network fail under real, complex 
smart hospitals intrusions accurately attack patterns 
Zhang et al. [13] LOF + adaptive K-means Real-time, accurate Sensitive to data quality; 
preprocessing + XGBoost + ADand diagnosis in wind hidden anomaly types 
LSTM-SDAE turbines may be missed 
Božičeket Early warning system using Prevented outages and Dependent on weather 
al.[14] meteorological data improved detection unpredictability and 
during extreme weather data quality 
He et al. [15] BEV platform, Technological Issues remain in battery 
charging/swapping stations, dominance and safety safety, weather 
and monitoring platform improvements in the adaptability, and 
BEV market charging ease 
Bento et al. [16] ANN trained on PMU data + Accurate load margin Performance hinges on 
model-based simulation prediction ensuring PMU data and 
voltage and small-signal assumptions in 
stability simulation models 
Adumene et al. SNPM designs + fault  hybrid risk framework; Incomplete knowledge 
[17] diagnosis + risk assessment identified design base; needs validation 
progress and framework 
integration 
 scale power networks. The research fills a gap by merging 
2.1   Research gap an IFSO method with a BPNN for PS anomaly and fault 
identification. Compared to earlier techniques, this 
The method additionally solves past techniques' approach improves accuracy, convergence speed, and FD 
drawbacks, such as restricted data generalization, resilience, especially in noisy situations. 
overfitting, simulation reliance, and data quality  
sensitivity. The proposed approach, IFSO-BPNN,  
provides a scalable, real-time solution for proactive  
maintenance and problem detection in complex, large-  
248   Informatica 49 (2025) 245-254       N. Li et al. 
 
3 Research methodology 3.2 Data preprocessing via min-max 
normalization 
This section discusses IoT sensor-based data collection in 
PS and introduces the IFSO-BPNN approach for anomaly Min-max normalization is a common method used for 
and fault identification, as well as early warning in PS numerical sensor and camera data from Beijing power 
monitoring. Figure 1 shows the methodology flow, which plants to scale characteristics between 0 and 1, in which 
includes data pretreatment, feature extraction, and model the values of a feature are translated into a preset range, 
optimization. usually [0-1]. The method retains data connections, hence 
 being suitable for a wide range of ML applications. The 
transformation is carried out using the following Equation 
(1). 
 
𝑥=𝑚𝑖𝑛
𝑋𝑛𝑒𝑤 =    (1) 
𝑚𝑎𝑥(𝑥)−𝑚𝑖𝑛(𝑥)
 
𝑋𝑛𝑒𝑤 = The adjusted value obtained after scaling the data 
𝑋 = outdated value, 𝑚𝑎𝑥(𝑥)= dataset’s highest possible 
value.𝑚𝑖𝑛(𝑥) = dataset's lowest possible value. The 
normalizing technique improves AD and FD in PS 
monitoring by ensuring that all data points have a 
consistent scale, which increases predictive model 
accuracy. 
 
3.3 Feature extraction using independent 
component analysis (ICA) 
ICA is a current statistical technique that attempts to break 
down observable data into statistically independent 
components. The ICA was used on sensor and surveillance 
 
 data to reduce dimensionality and extract essential 
Figure 1: Flow of the proposed method features, which improved the IFSO-BPNN model's 
 capacity to detect abnormalities in PS monitoring as a 
3.1 Data collection linear mixture of independent components, expressed as 
follows in Equation (2). 
The system configuration includes a pan/tilt integrated  
camera, a series of local storage DVR hosts, a 1-terabyte 𝑦 = 𝐵. 𝑇   (2) 
dedicated hard disk, and equipment from major domestic  
video equipment manufacturers. A wireless networking Where: 𝑦 represents the observed data vector, 𝐵denotes 
module is an important element that allows direct the mixing matrix, and 𝑇 denotes the separate components. 
connection across 4G or 5G wireless networks. The In ICA, components are assumed to be statistically 
research is centered on power stations surrounding independent and non-Gaussian, with a square and 
Beijing, where the distribution stations lack wired unknown mixing matrix𝐵. To extract the components, 
networks and must communicate over wireless networks. calculate the inverse 𝑋 of matrix 𝐵 as follows in Equation 
To achieve that, on-site terminal equipment is required to (3). 
access different network types at the distribution station,  
such as 2G/3G/4G, GSM, CDMA, and wired networks. 𝑇 = 𝑋. 𝑦   (3) 
Many of these stations are found in basements. In the event  
of a severed wireless connection between the station and ICA divides data into statistically independent 
the platform, short messages transmitted to the terminal components, helping in AD and FD in PS. While the 
equipment at the distribution station allow for simple technique does not give direct variance or ordered data, the 
permission and re-establishment of communication. The enhanced sparsity-based technique improves feature 
data were split into an 8:2 ratio, 80% for training, and 20% extraction and speeds up convergence for real-time 
for testing dataset. applications such as early warning systems. ICA has been 
 widely applied in disciplines like face recognition and 
 dimensionality reduction. PS monitoring, which extracts 
 essential characteristics from sensor data, catches 
An Enhanced FSO-BPNN Framework for Anomaly Detection...                                               Informatica 49 (2025) 245-254     249 
 
complicated, non-Gaussian patterns that standard 𝐹(𝑥)𝑥. 𝑡𝑎𝑛ℎ(𝑠𝑜𝑓𝑡𝑝𝑙𝑢𝑠(𝑥)). where softplus 
approaches typically overlook, resulting in improved AD, function𝑓(𝑥) = log (1 + 𝑒𝑥).Mish is a self-regulatory 
FD and maintenance efficiency. activation that improves accuracy and generalization 
 instead of standard function. The process is smooth and 
3.4 Detection and early warning in PS non-monotonic, allowing for modest negative outputs 
monitoring using improved fish swarm while retaining strong positive flow, avoiding problems 
optimization with backpropagation like dead neurons in ReLU.𝑥: Input to the neuron. 
neural network (IFSO-BPNN)  𝑆𝑜𝑓𝑡 𝑝𝑙𝑢𝑠 (x): A smooth variant of ReLU.𝑡𝑎𝑛ℎ(⋅): 
Implements smooth limiting behavior for high input 
The IFSO-BPNN enhances AD and FD in PSby values. Data from the power system is collected, 
optimizing BPNN parameters with the IFSO algorithm, standardized, and sent to the network for training. 
increasing classification accuracy, and allowing for real- Normalization guarantees that each input feature 
time predictive maintenance. Figure 2 displays the contributes evenly to model training. During forward 
proposed method’s flow diagram for power system propagation, input data is transferred through the layers as 
monitoring. the model produces predictions. Backpropagation then 
 changes the weights and biases depending on the loss 
function, which is commonly Mean Squared Error (MSE) 
and computed as follows in Equation (5). 
 
1 2
𝑀𝑆𝐸 = ∑𝑁
𝑗=1(𝑥 − 𝑥   5  
𝑁 𝑝𝑟𝑒𝑑 𝑎𝑐𝑡𝑢𝑎𝑙) ( )
 
To improve the model's capacity to detect anomalies and 
fault, increase system dependability, and provide early 
alerts for proactive PS repair. 
 
Loss function: 
In the PS anomaly and fault detection, the loss function is 
critical for reducing prediction errors and improving model 
parameters. The BPNN's output layer computes the error 
between the expected output and the actual observed 
detection using the MSE and an appropriate activation 
function. The error gradient of each neuron in the output 
 
layer could be computed as follows in Equation (6). 
 
 
Figure 2: Flow diagram for the proposed method. 
 𝛿𝑜𝑢𝑡 = (𝑥𝑝𝑟𝑒𝑑 − 𝑥𝑡𝑟𝑢𝑒). 𝜎′(𝑤)  (6) 
3.4.1 Back-propagation neural network (BPNN)  
The BPNN is a multi-layer feed-forward artificial neural 𝑥𝑝𝑟𝑒𝑑  : predicted output (anomaly, and fault score). 𝑥𝑡𝑟𝑢𝑒: 
network designed to identify anomalies in PS. The True label (0 for no abnormality and 1 for anomaly). 𝜎′(𝑤) 
architecture consists of an input layer, one or more hidden is the derivative of the activation function for the neuron's 
layers, and an output layer. Sensor readings, system input 𝑤.The gradient of the hidden layers is affected 
performance measurements, and ambient parameters are primarily by the output error, but also by the gradients of 
all sent into the input layer. The hidden layers discover the following layers. The gradient of a hidden layer neuron 
complicated patterns in the data, whereas the output layer 𝐺𝑗 could be calculated using the chain rule in Equation (7). 
anticipates anomalies and faults such as system  
malfunctions or failures. Each neuron's output is defined 𝛿ℎ𝑖𝑑𝑑𝑒𝑛 = ∑𝑖 𝑧𝑗.𝑖 . 𝛿𝑖. 𝜎′(𝑤𝑗)  (7) 
by applying an activation function to the weighted sum of  
inputs in Equation (4). 𝛿ℎ𝑖𝑑𝑑𝑒𝑛: Error gradient for a hidden layer neuron. 𝑧𝑗.𝑖: 
 
Weight coupling hidden layer cell 𝐺𝑗 with output 
𝑥 = ∑𝑛
𝑗=1 𝑧𝑗 . 𝑦𝑖 + 𝑎   (4) 
neurons. 𝛿𝑖: The error gradient of the output 
 
neuron.𝜎′(𝑤𝑗): Derivative of the activation function for 
Where 𝑦𝑖is the input, 𝑧𝑗is the weight, 𝑎is the bias, and 𝜎 (⋅
the buried layer input 𝑤𝑗 .Gradient descent is used to 
) is the exponential activation function (TanhExp) 𝑓(𝑥) =
update weights and biases during training to minimize the 
𝑥. tanh (𝑒𝑥), generally mish activation function 𝑡𝑎𝑛ℎ.  
loss function. The rules for updating the weights (𝑧) and 
The Mish function is smooth and comparable to TanhExp. 
biases (𝑎) in each round are as follows in Equations (8-9). 
The formula is provided as follows. 
 
250   Informatica 49 (2025) 245-254       N. Li et al. 
 
𝑧(𝑛+1) 𝜕
= 𝑧(𝑛) 𝑃
− 𝜂.  (8) detection accuracy are enhanced by the adaptive 
𝜕𝑧 technique. 
𝑎(𝑛+1) = 𝑎(𝑛) 𝜕𝑃
− 𝜂.  (9) 
𝜕𝑎 The artificial fish swarm algorithm (AFSA) uses swarming 
 and following behaviors to determine convergence speed. 
The current weights and biases at iteration 𝑛 are denoted However, narrow distances can cause local optima and 
by 𝑧(𝑛)and𝑎(𝑛). The learning rate (𝜂) is a hyperparameter delayed convergence. Randomization changes 
that controls the step size. The gradients of the loss swimming's step size to prevent premature convergence. 
𝜕𝑃 𝜕𝑃
function about weights and biases are ∂ and , The algorithm focuses on determining the optimal position 
𝜕𝑧 𝜕𝑎 of fake fish for efficient attribute reduction, and eliminates 
respectively. The learning rate 𝜂 adjusts the model's 
search behavior to save execution time. The enhanced 
weights and biases to reduce prediction errors for ADinPS. 
swarming and subsequent behaviors are defined as 
 
Equations (12-13). 
3.4.2 Improved fish swarm optimization (IFSO) 
 
FSO was selected over PSO, GA, and DE because of its 
𝑌𝑛𝑒𝑥𝑡 = 𝑌𝑗 + 𝑠𝑡𝑒𝑝 × (𝑌𝑑 − 𝑌𝑗)𝑖𝑓 𝐺(𝑌𝑑) > 𝐺(𝑌𝑗) 
greater global search capabilities and adaptive behavior, 
which improve convergence and classification accuracy in     (12) 
AD and FD. An IFSO is proposed to increase detection 𝑌𝑗 = 𝑌𝑑  𝑖𝑓 𝐺(𝑌𝑑) > 𝐺(𝑌𝑗)  (13) 
accuracy and convergence speed. For balanced exploration  
and exploitation, the system incorporates adaptive control 𝑌𝑛𝑒𝑥𝑡: The fake fish's next position.𝑌𝑗: The fake fish's 
overstep size and visual field, which shrinks with current location.𝑌𝑑: The position of the swarm's 
iterations. By eliminating default search behaviors and center.𝑠𝑡𝑒𝑝: The step size for movement is determined by 
crowding conditions, swarming and following techniques a random component. 𝐺(𝑌𝑑): The fitness value at the 
are improved. Fish retry with modified settings when an center position. 𝐺(𝑌𝑗): The fish's fitness value at that 
improved solution is discovered. To preserve the quality of present location. These changes improve the algorithm's 
global optimization, an extinction-regeneration system efficiency, resulting in faster convergence and higher 
removes the most susceptible fish and replaces it with a performance. 
more suitable one. This improved method efficiently Improved Search Behavior: In the AFSA, searching for 
optimizes BPNN parameters for AD and FD in PS. behavior entails exploring the available domain to discover 
The classic Fish Swarm Algorithm (FSA) has fixed visual alternatives. The number of tries has a significant impact 
and step sizes, which can hinder convergence. To improve on search efficiency, frequently resulting in premature or 
AD performance, an adaptive piecewise function is inefficient searches. To solve these things, extend the 
proposed to gradually decrease visual and step sizes with viewing field when no superior location is discovered after 
iterations, finding a balance between speed and accuracy. a certain number of difficulties. When a suitable place is 
Step𝑆𝑆 (𝑖𝑡𝑒𝑟) and adaptive 𝑉(𝑖𝑡𝑒𝑟) are defined as follows located, the fish takes one step towards that, with a 
in Equations (10-11). maximum step size of 𝑠𝑡𝑒𝑝𝑛𝑒𝑤 =  2 ×  𝑠𝑡𝑒𝑝. Without 
 false, the fish moves randomly. IFSO's capacity was 
𝑖𝑡𝑒𝑟
log (𝑚𝑖𝑛𝑣/𝑚𝑎𝑥 improved to efficiently tune BPNN parameters, hence 
𝑉(𝑖𝑡𝑒𝑟) = int (𝑚𝑎𝑥𝑣 × ( 𝑣)
) )(10) 
log (𝑚𝑎𝑥𝑔𝑒𝑛) increasing accuracy and convergence in PS anomaly and 
𝑖𝑡𝑒𝑟
log (𝑚𝑖𝑛 / fault detection. 
𝑆𝑆(𝑖𝑡𝑒𝑟) = int (𝑚𝑎𝑥𝑠 × ( 𝑠 𝑚𝑎𝑥𝑠)
) ) (11) 
log (𝑚𝑎𝑥𝑔𝑒𝑛) Mechanism of Extinction and Rebirth: The algorithm uses 
an extinction mechanism to remove the least suitable fish, 
 
𝑉(𝑖𝑡𝑒𝑟) enhancing swarm adaptability but decreasing swarm size 
: The artificial fish's field of vision at iteration iter. 
and randomness. A regeneration mechanism is then 
𝑆𝑆(𝑖𝑡𝑒𝑟): The maximum step the fish can take during 
included to restore swarm size by regenerating highly 
iteration. 𝑚𝑎𝑥𝑣: Step size and initial (maximum) visual 
adaptable fish, ensuring resilience and enhancing 
range. 𝑚𝑖𝑛𝑣: The smallest step size and visual range for 
efficiency by shortening iteration durations while 
efficient searching. The maximum number of iterations 
maintaining high fitness levels. The IFSO-BPNN 
is𝑚𝑎𝑥𝑔𝑒𝑛 . 𝑖𝑡𝑒𝑟: The number of the current iteration. For 
approach attempts to discover and detect deviations in 
discrete issues, 𝑖𝑛𝑡(. . . ) rounds values to integers. Values 
PSmore efficiently by optimizing neural network 
are rounded to integers, with a minimum step and visual 
parameters, assuring faster convergence, and improving 
sizes set to 1 for discrete issues such as attribute reduction 
prediction accuracy for proactive maintenance. Algorithm 
in Equations (10-11); both the visual and step sizes use an 
1 displays IFSO-BPNN. 
exponential decrease from maximum to minimum across 
 
iterations, allowing for quick global search at the 
 
beginning and accurate local search at the final stage. The 
 
provided AD, and FD framework's convergence and 
 
An Enhanced FSO-BPNN Framework for Anomaly Detection...                                               Informatica 49 (2025) 245-254     251 
 
Algorithm 1: IFSO-BPNN 4.1 Experimental setup 
Step 1: Initialize the BPNN parameters 
Initialize BPNN with input layer, hidden layers, and output The IFSO-BPNN technique is implemented on a machine 
layer equipped with an Intel i7 CPU, 16GB RAM, and a 512GB 
Set learning rate η and number of iterations max_iter SSD. Python 3.9 is used for implementation, including 
Step 2: Initialize the Fish Swarm Optimization (IFSO) libraries like NumPy, TensorFlow, Scikit-learn, and 
parameters Matplotlib for processing and visualization. Table 2 
Initialize fish swarm population size, maximum visual displays the hyperparameters of the proposed method. 
field (max_v), and step size (max_s)  
Set the minimum values for visual field (min_v) and step Table 2: Hyperparametric for proposed method 
size (min_s)  
Step 3: Data Preprocessing Hyperparameter Range/Value 
Preprocess data:  BPNN Learning Rate (η) 0.01 to 0.1 
    Normalize sensor readings using min-max Max Iterations (max_iter) 100 to 1000 
normalization Swarm Population Size 50 to 200 
    Perform feature extraction using Independent Max Step Size (max_s) 0.1 to 1.0 
Component Analysis (ICA) Min Step Size (min_s) 0.01 to 0.1 
Step 4: Training the BPNN with IFSO optimization Learning Rate (η) for 0.001 to 0.01 
for each iteration in range(max_iter): BPNN 
    for each fish in the swarm: Fitness Function Error of BPNN model 
        visual = V(iter)    predictions 
        step = SS(iter)     MSE Threshold for 0.001 
        if G(Y_d) > G(Y_j):   Convergence 
Y_j = Y_d Activation Function for Mish, TanhExp, or ReLU 
    for each fish in the swarm: BPNN 
BPNN.weights = optimize_with_fish_swarm(Y_j)  
BPNN.biases = optimize_with_fish_swarm(Y_j) 4.2 Performance outcome 
    for epoch in range(max_epochs): 
        output = BPNN.forward(input_data) Figures 3 and 4 show the ROC curve and confusion matrix 
        error = calculate_MSE(output, expected_output) for anomaly detection and fault detection, respectively. 
        gradients = backpropagate(error) The performance was evaluated based on the false positive 
BPNN.weights = BPNN.weights - η * gradients.weights rate, the true positive rate for the ROC curve, and the 
BPNN.biases = BPNN.biases - η * gradients.biases predicted and actual for the confusion matrix.  
Step 5: Extinction and Regeneration  
remove_weakest_fish() 
regenerate_strong_fish() 
Step 6: Anomaly and Fault Detection 
anomaly_score = BPNN.predict(test_data) 
fault_score = BPNN.predict(test_data) 
    if anomaly_score> threshold or fault_score> threshold: 
trigger_early_warning() 
Step 7: Return the optimized BPNN model for PS 
monitoring 
Return BPNN model optimized using IFSO 
 
4 Result and discussion 
 
 
This section compares the result of the proposed method, 
Figure 3: Anomaly detection (a) Roc curve, and (b) 
an enhanced IFSO-BPNN framework, for AD and FD 
confusion matrix. 
early warning in PS monitoring with existing methods. The 
 
evaluation was conducted using parameters such as 
accuracy (Acc), success rate (SR), misclassification 
instances (MI), error rate (ER), precision (Pre), recall 
(Rec), and F1 score (F1). 
 
 
252   Informatica 49 (2025) 245-254       N. Li et al. 
 
Table 3: FD metrics values for proposed method. 
 
Metrics  LSTM [18] IFSO-BPNN 
[Proposed] 
Acc (%) 91.21 98.5 
SR(%) 92.42 96.85 
MI 17 9 
ER (%) 8.76 5.15 
 
 
Figure 4: fault detection (a) Roc curve, and (b) confusion 
matrix. 
 
4.3 Parameter explanation 
Accuracy (Acc):Acc is defined as the ratio of accurately 
predicted occurrences (including true positives and true 
negatives) to total instances in a dataset, which measures 
the overall performance of PS monitoring and fault  
detection.Success rate (SR): The smart grid system is Figure 5(a): Acc and SR value for FD. 
calculated as the proportion of accurately discovered faults  
and successful predictions to improveFD and maintenance 
accuracy.Misclassification instances (MI): The events 
occur when the model incorrectly identifies problems or 
normal conditions to demonstrate the possible flaws in 
identifying power defects.Error rate (ER):The fraction of 
misclassified cases, revealing the model's errors with an 
emphasis on decreasing mistakes in FD for PS.Precision 
(Pre) is the fraction of successfully diagnosed errors 
among all expected anomalies, demonstrating detection 
accuracy. Recall (Rec) measures the model's ability to 
detect all real abnormalities. F1 Score (F1) balances 
precision and recall. These metrics assess the IFSO-BPNN 
model's ability to accurately detect and monitor PS faults. 
 
4.4 Comparison phase 
 
 
The proposed method, IFSO-BPNN, is compared to the 
Figure 5(b): MI and ER FD value for proposed method. 
existing methods like Long Short-Term Memory (LSTM) 
 
[18] for FD, k-Nearest Neighbors (KNN), Decision tree 
Table 4 and Figure 6show the comparison of the proposed 
classifier (DTC), and Random Forest (RF) [19] for ADand 
method and existing methods to evaluate the metric 
early warning in PS monitoring with evaluation metrics. 
valuesused to predict ADand early warning of PS 
Table 3 and Figure 5 (a-b) display the comparison of 
monitoring. The proposed IFSO-BPNN (0.9980) method 
metric values for the proposed method and existing 
achieves greater Acc than KNN (0.9729), DTC (0.9937) 
methods to predict FD and FD in early warning of PS 
and RF (0.9976). 
monitoring. The proposed IFSO-BPNN (98.5%) method 
 
achieves greater Acc than LSTM (91.21%). 
 
 
 
 
 
 
 
 
 
An Enhanced FSO-BPNN Framework for Anomaly Detection...                                               Informatica 49 (2025) 245-254     253 
 
Table 4: Metrics values for proposed vs existing methods.  
Metrics KNN DTC RF IFSO- 5 Conclusions  
[19] [19] [19] BPNN  
[Proposed] The improved early warning model, combining IFO with 
Pre 0.9732 0.9937 0.9976 0.9978 a BPNN (IFSO-BPNN), was presented to improve FD and 
Rec 0.9729 0.9937 0.9976 0.9977 predictive maintenance in smart power systems. The 
F1 0.9729 0.9937 0.9976 0.9979 method aims to optimize neural network parameters for 
Acc 0.9729 0.9937 0.9976 0.9980 higher detection accuracy. The results demonstrated 
 exceptional performance with FD accuracy (98.5%) and 
AD accuracy (0.9980) higher than existing methods. To 
address statistical validation, the IFSO-BPNN model has 
limited specificity, required more processing resources, 
and relied on precise parameter adjustment, which could 
leave an impact on real-time performance and 
generalizability across different power systems. The 
dataset's limited coverage of Beijing's local distribution 
stations, as well as a lack of sample size and class 
distribution information, limit its generalizability and 
model performance assessment. The future scope may 
extend the dataset to cover varied power systems, and 
providing precise details on sample size and class 
distribution would improve model resilience, 
 generalization, and performance evaluation. Future 
 research should focus on increasing specificity, testing in 
Figure 6: Evaluation metrics values for the proposed a variety of grid scenarios, and incorporating real-time 
method. adaptive processes to widen and improve the system's FD 
 capabilities and use confidence intervals and standard 
In this research, both BPNN and IFSO-BPNN techniques deviations to demonstrate dependability. Future directions 
were trained for FD and AD in PS. The numerical results include statistical validation methods, such as confidence 
of the ablations study for FD and AD in PS are displayed intervals and standard deviations, to support the reliability 
in Table 5, indicating that IFSO-BPNN performs better of results, providing clearer justification for performance 
than BPNN.   metrics and model robustness. Future work will 
 concentrate on providing thorough feature extraction, 
Table 5: Outcome of ablation study dimensionality reduction using ICA, and using correlation 
Method AD Acc (%) FD Acc (%) reduction methods for better analysis. Future research aims 
BPNN 98.0 98.2 to enhance model performance and generalization by 
IFSO-BPNN 99.8 98.5 improving feature extraction, incorporating diverse data 
 sources, and reducing dimensionality. 
4.5 Discussion  
References 
The proposed IFSO-BPNN method achieves higher Acc, 
Pre, Rec, F1 and SR and significantly reduces MI and ER [1] Wang G, Xie J, and Wang S (2023). Application of 
compared to existing methods like LSTM, KNN, DTCand artificial intelligence in power system monitoring and 
RF. Existing models struggle with real-time adaptation and fault diagnosis. Energies, 16(14), 5477. 
FD accuracy. The IFSO method overcomes these https://doi.org/10.3390/en16145477 
constraints by improving global search and optimizing [2] Chen Q, Li Q, Wu J, He J, Mao C, Li Z, and Yang B 
BPNN parameters for improved performance. The (2023). State monitoring and fault diagnosis of HVDC 
connection helps electricity systems identify faults and system via KNN algorithm with knowledge graph: a 
provide early warnings. The key benefit is the substantial practical China power grid case. Sustainability, 15(4), 
dependability and precision in predictive maintenance, 3717. https://doi.org/10.3390/su15043717 
which improves the robustness and efficiency of PS. [3] He K, Wang T, Zhang F, and Jin X (2022). Anomaly 
Deploying the IFSO-BPNN model in smart grids provides detection and early warning via a novel multiblock-
real-time defect detection, such as detecting transformer based method with applications to thermal power 
overheating early on, averting blackouts, lowering plants. Measurement, 193, 110979. 
maintenance costs, and enhancing energy distribution https://doi.org/10.1016/j.measurement.2022.110979 
reliability across locations. 
254   Informatica 49 (2025) 245-254       N. Li et al. 
 
[4] Stanković AM, Tomsovic KL, De Caro F, Braun M, the world: achievements in technology system 
Chow JH, Čukalevski N and Zhao S. (2022). Methods architecture and technological breakthroughs. Green 
for analysis and quantification of power system Energy and Intelligent Transportation, 1(1), 100020. 
resilience. IEEE Transactions on Power Systems, https://doi.org/10.1016/j.geits.2022.100020 
38(5), 4774–4787. [16] Bento ME (2022). Monitoring of the power system 
https://doi.org/10.1109/TPWRS.2022.3212688 load margin based on a machine learning technique. 
[5] Bolbot V, Theotokatos G, Hamann R, Psarros G, and Electrical Engineering, 104(1), 249–258. 
Boulougouris E (2021). Dynamic blackout probability https://doi.org/10.1007/s00202-021-01274-w 
monitoring system for cruise ship power plants. [17] Adumene S, Islam R, Amin MT, Nitonye S, Yazdi M, 
Energies, 14(20), 6598. and Johnson KT (2022). Advances in nuclear power 
https://doi.org/10.3390/en14206598 system design and fault-based condition monitoring 
[6] Baba M, Nor NB, Sheikh MA, Baba AM, Irfan M, towards the safety of nuclear-powered ships. Ocean 
Glowacz A, and Kumar A(2021). Optimization of Engineering, 251, 111156. 
phasor measurement unit placement using several https://doi.org/10.1016/j.oceaneng.2022.111156 
proposed case factors for power network monitoring. [18] Veerasamy V, Wahab NIA, Othman ML, 
Energies, 14(18), 5596. Padmanaban S, Sekar K, Ramachandran R, and Islam 
https://doi.org/10.3390/en14185596 MZ (2021). LSTM recurrent neural network classifier 
[7] Florkowski M (2021). Anomaly detection, trend for high impedance fault detectionin solar PV 
evolution, and feature extraction in partial discharge integrated power system. IEEE Access, 9, 32672–
patterns. Energies, 14(13), 3886. 32687. 
https://doi.org/10.3390/en14133886 https://doi.org/10.1109/ACCESS.2021.3060800 
[8] Du J, Wang X, and Zhang H (2025). Secure power [19] Mokhtari S, Abbaspour A, Yen KK, and Sargolzaei A 
management in wireless sensor networks for power (2021). A machine learning approach for anomaly 
monitoring using deep reinforcement learning. detection in industrial control systems based on 
Informatica, 49(19). measurement data. Electronics, 10(4), 
https://doi.org/10.31449/inf.v49i19.7125 407.https://doi.org/10.3390/electronics10040407 
[9] Qin G, Juan M, and Rui MH (2025). IoT-based  
intelligent power supply management using ensemble  
learning for seismic observation stations. Informatica, 
49(8). https://doi.org/10.31449/inf.v49i8.6502 
[10] Xiang L, Yang X, Hu A, Su H, and Wang P (2022). 
Condition monitoring and Anomaly detection of wind 
turbine based on cascaded and bidirectional deep 
learning networks. Applied Energy, 305, 117925. 
https://doi.org/10.1016/j.apenergy.2021.117925 
[11] Chen H, Liu H, Chu X, Liu Q, and Xue D (2021). 
Anomaly detection and critical SCADA parameters 
identification for wind turbines based on LSTM-AE 
neural network. Renewable Energy, 172, 829–840. 
https://doi.org/10.1016/j.renene.2021.03.078 
[12] Said AM, Yahyaoui A, and Abdellatif T (2021). 
Efficient Anomaly detection for smart hospital IoT 
systems. Sensors, 21(4), 1026. 
https://doi.org/10.3390/s21041026 
[13] Zhang C, Hu D, and Yang T (2022). Anomaly 
detection and diagnosis for wind turbines using long 
short-term memory-based stacked denoising 
autoencoders and XGBoost. Reliability Engineering 
& System Safety, 222, 108445. 
https://doi.org/10.1016/j.ress.2022.108445 
[14] Božiček A, Franc B, and Filipović-Grčić B (2022). 
Early warning weather hazard system for power 
system control. Energies, 15(6), 2085. 
https://doi.org/10.3390/en15062085 
[15] He H, Sun F, Wang Z, Lin C, Zhang C, Xiong R, and 
Zhai L (2022). China's battery electric vehicles lead 
https://doi.org/10.31449/inf.v49i12.9062  Informatica 49 (2025) 255-268   255 
 
Automated AutoCAD Drawing Assessment via Image Processing and 
Vector Transformation Techniques 
 
Zhengkai Xiong, Jiaming Ge, Rong Wei* 
Department of Mechanical and Electrical Engineering, Cangzhou Technical College, Cangzhou City, Hebei Province, 
061000, China 
E-mail: xzk870036@163.com, JiamingGe68@163.com, weirong525@163.com 
*Corresponding author 
 
Keywords: graphics processing, computer graphics. information extraction, computer graphics examination system, 
computer-aided design (CAD) 
 
Received: April 28, 2025 
Conventional assessment practices in computer graphics courses, particularly those that utilize AutoCAD, 
often rely on manual grading or basic template-matching strategies. These methods are ineffective and 
biased, particularly when used for extensive evaluations. Intelligent evaluation methods and automated 
image processing must be integrated as educational technology continues to evolve. The purpose of the 
proposed effort is to develop and put into use an intelligent AutoCAD computer drawing evaluation system 
that uses image processing technologies. Enhancing assessment accuracy, automating scoring, and 
utilizing robotic technologies to combine virtual drawing analysis and actual drawing validation are the 
objectives. The system evaluates student drawings using MATLAB-based techniques, including vector 
transformation, grayscale conversion, binarization, and histogram similarity. It extracts components 
using DXF file parsing, performs geometric matching, and features extraction. A feedback-driven 
retransmission method ensures packet correctness. A servo motor-powered drawing computer duplicates 
input drawings, and performance is assessed using torque analysis, picture entropy, consistency, and 
smoothness criteria. The system could accurately reproduce student drawings with an accuracy of more 
than 0.1 cm and an average drawing speed of 1.75 cm/s. The system's dependability was confirmed when 
evaluation ratings for example drawings nearly matched hand grading. Within the robotic arm's torque 
limits, moment and motion analysis verified operational safety and accuracy. The proposed approach 
automates computer graphics analysis by combining hardware and software elements for perceptive 
evaluation. However, limitations on robot motion and image quality sensitivity were limitations, requiring 
future improvements. 
Povzetek: Predstavljen je inteligentni sistem za avtomatsko ocenjevanje risb AutoCAD z obdelavo slik in 
vektorsko transformacijo. Uporablja DXF analizo, primerjavo slik in robotsko reprodukcijo za natančno 
in objektivno ocenjevanje. 
 
1 Introduction automates various engineering and design processes, 
despite its high skill and work requirements [4]. 
Recent advancements in generative models in language Screencasts enhance concurrent learning in CAD-based 
and imaging have transformed the perception of computers and technical drawing classes, providing flexible, self-
as co-creators, enabling creative AI to actively participate paced learning options for students lacking prior CAD 
in idea exploration [1]. Augmented Reality (AR) enhances experience and limited curriculum time [5]. Conventional 
learning in graphic design education by providing CAD systems enhance manufacturing productivity in 
dynamic, 3D-registered visuals, improving students' industries like metallurgy, glass working, and woodturning 
practical interaction with intricate mechanical structures by facilitating detailed 3D modeling and group technology 
and spatial comprehension [2]. AutoCAD is a popular for small-batch production [6]. 
program for creating technical drawings and The goal of the research is to create and put into use an 
documentation in design and architecture, but beginners intelligent AutoCAD computer drawing evaluation system 
may face challenges due to standardized teaching that uses image processing technologies. Enhancing 
strategies [3]. assessment accuracy, automating scoring, and utilizing 
Automatic List Processing (AutoLISP), a key component robotic technologies to combine virtual drawing analysis 
of AutoCAD, is a software development tool that and actual drawing validation are the objectives. 
256   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
• To create an automatic AutoCAD assessment restricted validation across several CAD platforms and 
system that combines image processing methods with reliance on teacher-defined coefficients [8]. Sulfur 
DXF file structure parsing for precise and impartial Hexafluoride (SF6) dial pointer recognition system that is 
grading.  accurate and effective, utilizing Computer Aided eXtended 
• To use sophisticated vector transformation Application (CAXA) secondary development for 
techniques, like skeleton extraction, binarization, and automated CAD drawing generation, open-source 
grayscale conversion, to transform visual drawing inputs computer vision library (OpenCV)-based angle detection, 
into formats that could be analyzed. and socket communication. The method achieved a 0.69° 
• To put into practice a feedback-driven average error, exceeding accuracy requirements; 
retransmission algorithm that replicates annealing restrictions include dependence on particular applications 
principles for effective drawing packet delivery and and restricted adaptability to varying dial designs [9]. 
correction. Following an experiment with focus groups, a literature-
• To create a robotic drawing platform with servo informed questionnaire was given to 59 students and 21 
motors that could physically replicate digital inputs, educators to assess preferences between hand drafting and 
confirming the accuracy of vector interpretations.  CAD in architectural working drawings. CAD was 
• To test mechanical drawing precision and selected for effectiveness and accuracy; restrictions 
compare automated scores with manual grading to assess include interest dependence on duplicate instructions and 
the accuracy and dependability of the suggested solution. restricted understanding of context during site visits [10]. 
System organization: Related research on AutoCAD To enhance the teaching of cosmetics design by 
assessment is reviewed in Section 2. The image processing incorporating graphic design software, evaluating efficacy, 
methods, methodology, and DXF file analysis are paintbrush choice, and digital design efficiency through 
explained in Sections 3–5. Results, experiments, and comparative tests, two-stage questionnaires, and expert 
system implementation are presented in Sections 6–11. assessments. Computer drawing reduced design time in 
The investigation is concluded in Section 12, which also half and increased the efficacy of instruction; however, the 
suggests potential enhancements for evaluation accuracy method had drawbacks, such as a learning curve at first and 
and scalability. a dependence on particular software capabilities like 
 symmetry functions [11]. Simple brush painting that is 
2 Related work automated and realistic. The approach, which was 
evaluated on the FaceX dataset using Python and 
Employing task performance metrics and rubric-based TensorFlow, merges an attention mechanism (AM) with a 
imagination evaluation with undergraduate students Long Short-Term Memory (LSTM) network. The model's 
compare the AutoCAD 2025 and AutoCAD Mechanical accuracy was 98.63% and the F1 score was 98.75%; 
2025 CAD tasks' efficiency and creativity. Efficiency and however, that requires a lot of processing power and the 
creativity were increased by AutoCAD Mechanical; outcome could differ depending on the dataset [12]. The 
however, short-term evaluation, a single discipline focus, computer vision system uses wavelet denoising, multi-
and a lack of user input analysis cloud were some of the feature fusion, transfer of style enhancement, and 
drawbacks [7]. Create an automated evaluation tool for recognition models trained on WikiArt and OilPainting 
CAD models in mechanical courses that uses a model- datasets to classify painting styles and analyze sentiment. 
based methodology to assess parametric, feature-based, The model attained 90% sentiment accuracy and over 95% 
and geometric aspects with parameterization. Although the style classification; however, performance could differ 
CAD Model Automatic Assessment (MAA) Tool when applied to less structured, real-world artwork that 
efficiently automates model evaluation, limitations include wasn't part of benchmark datasets [13]. Table 1 provides 
the related works summary table. 
 
Table 1: Comparative Summary of the related works 
Reference Method Dataset Result Limitation 
Gutiérrez et Comparison of Undergraduate AutoCAD Short-term study, single-
al. [7] AutoCAD 2025 vs. mechanical Mechanical improved discipline focus, no user 
AutoCAD Mechanical engineering students efficiency and feedback 
2025 using performance creativity 
metrics and creativity 
rubrics 
Eltaief et al. CAD Model Automatic Mechanical CAD Efficient automation Limited cross-platform 
[8] Assessment (MAA) models in an of model evaluation validation, depends on 
Tool using parametric, academic setting teacher-set parameters 
Automated AutoCAD Drawing Assessment via Image Processing and...                                  Informatica 49 (2025) 255-268     257 
 
geometric, and feature-
based evaluation 
Zhang et al. SF6 dial pointer Dial images with 0.69° average error, Limited generalizability, 
[9] recognition using angle readings high precision depends on specific 
OpenCV, CAXA, and software 
socket communication 
Fakhry et al. Survey with 59 students Architecture CAD preferred for Risk of overusing copy-
[10] and 21 educators coursework and field accuracy and paste, lack of site context 
comparing CAD vs. visits efficiency integration 
hand drafting 
Hsu et al. Integration of graphic Cosmetology Halved design time, Initial learning curve, 
[11] design software in students and experts improved software-dependent (e.g., 
makeup design teaching instructional mirror function) 
via experiments and effectiveness 
questionnaires 
Zhang [12] LSTM and attention- FaceX dataset 98.63% accuracy, High processing cost, 
based model for 98.75% F1 score dataset-sensitive 
automated brush 
painting (Python + 
TensorFlow) 
Cheng et al. Computer vision using WikiArt and 95%+ style Reduced accuracy on 
[13] wavelet denoising, OilPainting datasets classification, 90% non-benchmark, real-
feature fusion, and style sentiment accuracy world art 
transfer 
Research fills a critical gap by focusing on the absence of the calculation amount is small, the operation speed is fast,  
intelligent, automatic assessment systems for AutoCAD- and it is the most used method now in calculate the 
based drawings in educational settings. Research Equation (1).  
combines image processing and vector transformation 
techniques to provide accurate, objective, and scalable 1 |𝑔 −𝑠 |
𝑆𝑖𝑚(𝐺, 𝑆) = ∑𝑁= (1 − 𝑖 𝑖 )                   (1) 
𝑁 𝑖 1   
assessment, whereas existing solutions concentrate on 𝑀𝑎𝑥(𝑔𝑖,𝑠𝑖)
manual review or limited automation. The research helps 
to modernize CAD education, lessen the workload of Here, 𝐺 and 𝑆 are the histograms of the target image and 
instructors, and improve the learning experience for the source image𝑁 is the amount of color space models, is 
students with limited CAD competency by bringing the image attribute of the block area of the target image, 
automated outcomes into line with human grading and is the image attribute of the block area of the source 
standards and increasing the efficiency of drawing image. gi 𝑠𝑖 
interpretation. 
The histogram-based method was chosen. After all, it is 
 
more appropriate for real-time AutoCAD examination 
3 Image processing applied to systems because it is faster to execute and has less 
computer graphics examination- processing complexity while keeping competitive 
accuracy. The applicability of techniques like SSIM and 
related technologies 
cosine similarity in time-sensitive evaluation 
environments was diminished by the reality that they only 
The use of MATLAB for graphics processing is because slightly increased accuracy but arrived with much longer 
MATLAB has strong matrix operation capabilities, so the processing times. 
processed graphics are represented in the form of matrices 
or vectors [14-15]. 
4  Image acquisition and processing 
Funded by: Daqing Normal University Youth Fund 
Research Project (No.: 9ZQ08; Teaching Research Project 
of Heilongjiang Bayi Agricultural University (Project The typical AutoCAD computer drawing examination 
Title: Research and Application of Paperless Exam System system design mathematical model can be expressed by 
in Computer Graphics Courses. The images produced by Equation (2), and the calculation of the optimal computer 
the differences are similar) The measure of the degree, the processing analysis 𝑃 = {𝑢1, 𝑢2, ⋯ , 𝑢𝑘}: 
normalization can be well realized by using the histogram,  
258   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
𝑚𝑖𝑛 𝑍 (𝑃𝛼) = ∑𝑘=1 terms of accuracy and diversity 
𝑖=1 𝑑(𝑢𝛼 , 𝑢𝛼 ) + 𝑑(𝑢 , 𝑢𝑘)      (2) thorough analysis in 
𝑖 𝑖+1 𝛼1
typical illustration based on an image processing technique 
In the formula, 𝛼𝑖 is used to describe the reorganization of include [16-17] in Equation (6). 
the K computer-processed analysis point order and 
𝑑(𝑢𝛼 , 𝑢𝛼 ) describes the Manhattan distance between 𝐸𝑣𝑎𝑙(𝜋𝑝) = 𝜆𝐴𝑐𝑢(𝜋𝑝) + (1 − 𝜆)𝐷𝑖𝑣(𝜋𝑝)    
𝑖 𝑖+1
two points.       (6) 
The path optimization model for examining AutoCAD 
drawing elements during assessment is represented by 𝜆 ∈ [0,1] : The exactness and variety of drawings handed 
equation (2). Practically speaking, each point 𝑢  degree of importance in the complete analysis 
𝑖 represents out are the
a feature or object like a wall, door, or window that was criteria. Equation (6) uses the diversity 𝑝𝑟𝑜(𝜋𝑝)  of the 
taken from a student's artwork and represented as spatial image processing method's basic processing in Equation 
coordinates or data blocks. The function (7) to get the probability 𝐷𝑖𝑣(𝜋𝑝)  of choosing every 
𝑑(𝑢𝛼 , 𝑢𝛼 )computes the distance in meters between 
𝑖 𝑖+1 drawing processing basic analyzing technique as the 
consecutive features, a pertinent metric in CAD since evaluation's basic processing 
objects are frequently aligned to orthogonal grids. The 
variable 𝛼𝑖 indicates a sequence of such features as 𝐷𝑖𝑣(𝜋
𝑟𝑜(𝜋 ) = 𝑝)𝑝
identified by the system. Finding the most effective 𝑝 𝐵       
∑𝑝=1𝐷𝑖𝑣(𝜋𝑝)
structural match between the student's layout and the                                  (7) 
reference drawing is part of minimizing the sum of these 
distances. to restore the drawing's natural color and recognition, the 
The formula above quantifies the spatial deviation, for drawing processing network module works to optimize 
instance, if the student's layout rearranges or misaligns the each reconstructed drawing's color and spatial placement 
four rooms that are connected linearly in the correct [18–19]. The formula defines the drawing processing 
drawing. That allows the system to assess not only the network module's loss function Lip. The following 
elements' existence but also their arrangement in a processing losses are displayed: unmasked regions, 
sequence that makes sense geometrically and conceptually. masked regions 𝐿1
𝑖𝑛𝑝
𝑠𝑡𝑦𝑙𝑒 + 𝐿
2
𝑠𝑡𝑦𝑙𝑒  : Style loss, anti-loss 𝐿𝑡𝑜𝑡𝑎𝑙 
The following is the specific discriminant in Equation (3). 
: Total difference loss, and𝐿𝑝𝑒𝑟  : Perceptual loss in 
𝑑(𝑖, 𝑗, 𝑢): Inputs parameters function. 
Equation (8). 
  
 
𝑖𝑛𝑝
[0,0,0], 𝑖𝑓 𝑖 ≥ 𝑎 𝑎𝑛𝑑 𝑗 ≥ 𝑎 𝑎𝑛𝑑 𝑢 ≥ 𝑎 𝐿 1
𝑡𝑜𝑡𝑎𝑙 = 2𝐿𝑣𝑎𝑙𝑖𝑑 + 12𝐿ℎ𝑜𝑙𝑒 + 0.04𝐿𝑝𝑒𝑟 + 100(𝐿𝑠𝑡𝑦𝑙𝑒 +
𝑑(𝑖, 𝑗, 𝑢) = {[𝑖, 𝑗, 𝑢], 𝑖𝑓 𝑖 < 𝑎 𝑎𝑛𝑑 𝑗 < 𝑎 𝑎𝑛𝑑 𝑢 < 𝑎            𝐿
2
  𝑠 𝑡 𝑦𝑙𝑒) + 100𝐿𝑎𝑑𝑣 + 0.3𝐿𝑣𝑎𝑟         (8) 
[𝑖, 𝑗, 𝑢], 𝑖𝑓 𝑖 > 𝑏 𝑎𝑛𝑑 𝑗 > 𝑏 𝑎𝑛𝑑 𝑢 > 𝑏
(3) 𝐿: Loss function. The weight of each loss term in 
Manhattan distance (MD) is determined by examining 50 
The corresponding drawing processing drawing tests. The actual and unmasked processing modes are used, 
information feature vector 𝜒𝑖  The expression in Equation with 𝑀 representing the irregular binary mask, 𝐼𝑑𝑎𝑚 
(4). representing the damaged mode, and 𝐼𝑖𝑛𝑝 representing the 
outcome mode in Equation (9-10). 
𝑙𝜀(𝑔) = (1 − 𝜌)𝑙𝜀(𝑔 − 1) + 𝛾𝑓(𝜒𝑖(𝑔))                   (4) 
𝐿𝑣𝑎𝑙𝑖𝑑 = ‖𝑀 × (𝐼𝑖𝑛𝑝 − 𝐼𝑑𝑎𝑚)‖                              (9) 
1
𝑓 : Represents the adaptive function corresponding to the 
feature drawing feature vector 𝜒𝑖  of the drawing process. 𝐿ℎ𝑜𝑙𝑒 = ‖(1 − 𝑀) × (𝐼𝑖𝑛𝑝 − 𝐼𝑑𝑎𝑚)‖                 (10) 
𝛾𝜒 1
𝑖(𝑔) : The corresponding drawing processing analysis 
of the 𝜀𝑡ℎ dispensation in the actual application process. 
Rotate and restore the identification points against the 
Equation (5) contains for processing 𝜋𝑝 in Drawing 
original image. ℎ Is the connection point of the opening 
Processing𝐼𝐼. 
draw, placed in the drawing parallel to the identification 
graph 0  k  h (𝑣𝑥′ ′
2𝑘+1,𝑖 , 𝑣𝑦2𝑘+1,𝑖) for each 
𝐴𝑐𝑢(𝜋𝑝) = 𝑁𝑀𝐼(𝜋𝑝 , 𝜋
∗)                                            (5) 
identification point the following rotation operation is 
performed in Equation (11) 
𝜋𝑝 and 𝜋𝑞  represent the processing of drawing processing. 
If less information is shared with the drawing processing 
base drawing, the base drawing is less accurate. Drawings 
based on image processing techniques that define the 
Automated AutoCAD Drawing Assessment via Image Processing and...                                  Informatica 49 (2025) 255-268     259 
 
𝑑𝑥′ ′ ℎ
𝑘 = 𝑣𝑥2𝑘+1,𝑖 − 𝑣𝑥𝑘,𝑖 𝑣𝑥
′
2𝑘+1,𝑖 𝑣𝑥𝑘,𝑖
{ [ 𝑞𝑥 𝑖−2 𝑣𝑥𝑘,𝑖−𝑣𝑥𝑘,𝑖
𝑗 = ∑ 𝑖 ∑ = ∑ 𝑖 ?̃?
{𝑖| =𝑗} 𝑘=1
𝑑𝑦′ ′ ′ ] = [𝑣𝑦 ] + (𝑣𝑥 𝑖𝛼 = 𝑐?̃?
𝑘+1,𝑖−𝑣𝑥𝑘,𝑖)𝑝𝑘 {𝑖| =𝑗} 𝑗𝛼
𝑘 = 𝑣𝑦2𝑘+1,𝑖 − 𝑣𝑦𝑘,𝑖 𝑣𝑦2𝑘+1,𝑖 𝑘,𝑖 { 𝑐 𝑐
     
ℎ
𝑐𝑜𝑠(−𝜃) − 𝑠𝑖𝑛(−𝜃) 𝑑𝑥′ 𝑖−2 𝑣𝑦
𝑞𝑦 𝑘,𝑖−𝑣𝑦𝑘,𝑖
[ ] × [ 𝑘] 𝑗 = ∑ 𝑖 ∑ = ∑ 𝑖 ?̃? 𝛼 = 𝑐?̃?
   (11) {𝑖| =𝑗} 𝑘=1 (𝑣𝑦
𝑠𝑖𝑛(−𝜃) 𝑐𝑜𝑠(−𝜃) 𝑑𝑦′
   𝑐 𝑘+1,𝑖−𝑣𝑦𝑘,𝑖)∘𝑝 {𝑖| =𝑗} 𝑖 𝑗𝛼
𝑘 𝑐
𝑘
                                                                                       (17) 
For each drawing 𝑃𝑖  and ?̃?𝑖 , calculate its barycentric 
Aggregated gradients 𝑞𝑥𝑗 , 𝑞𝑦𝑗 for group 𝑗 are defined by 
coordinates after removing the last point. In the formula, ℎ 
equations (14)–(17) based on value changes and 
is the length (number of nodes) of the i-th graph after the 
spacing(𝛥𝑥
original graph is split, ℎ′ 𝑖 , 𝛥𝑦𝑖). Weighted by coefficients 𝛼 and 
𝑖  is inserted into the length that 
transformation parameter ?̃?
has been found, and Equation (12–13) computes the next 𝑖, gradients are calculated using 
either straight finite differences or multistep 
two offset values. 
approximations, depending on whether 𝑥⁄𝑦 differences are 
ℎ′ zero. Calculate numeric information values for each plot 
1
𝑣𝑥′
1 𝑖−1 ′ ℎ𝑖−1
𝑖 = ∑
𝑘=1 𝑣𝑥 𝑣𝑥
ℎ′ 𝑘 𝑖 = ∑
𝑘=1 𝑣𝑥ℎ 𝑘 in Equation (18). 
{ 𝑖−1 { 𝑖−1
ℎ′ 1      (12) 
ℎ
𝑣𝑦′
1 𝑣𝑦
𝑖 = ∑ 𝑖−1 𝑖 = ∑ 𝑖−1
=1 𝑣𝑦
′
ℎ′ 𝑘 𝑘 ℎ 𝑘=1 𝑣𝑦𝑘
𝑖−1 𝑞𝑥
𝑖−1 𝑗+𝑞𝑦𝑗
1, > 1
?̃?𝑗 = {
2
𝑞𝑥                                      (18) 
𝑗+𝑞𝑦𝑗
1 ℎ −1 0, < 1
𝛥𝑥𝑖 = ∑ 𝑖
𝑘=1 (𝑣𝑥 2
2ℎ 𝑘+1 − 𝑣𝑥𝑘)𝑝𝑘
{ 𝑖−2
1                     (13) 
ℎ 1
𝛥𝑦 𝑖−
𝑖 = ∑𝑘=1 (𝑣𝑦2ℎ 𝑘+1 − 𝑣𝑦𝑘)𝑝𝑘
𝑖−2 Once the digital data has been extracted, verify that the 
identification is valid. The method recognizes similarities 
Anzhao Equations (13-17) respectively compute the between the unique recognition and the extracted 
recognition points of the perpendicular and parallel acceptance by using the correlation coefficient 
organized of each drawing. 𝑐𝑜𝑟(𝑚, ?̃?)in Equation (19). 
①𝛥𝑥 ∑𝑛−1
𝑖 ≠ 0&&𝛥𝑦𝑖 ≠ 0 𝑖=0 (𝑚𝑐𝑜𝑟(𝑚, ?̃?) = 𝑖?̃?𝑖)                       (19) 
√∑𝑛−1𝑚2 𝑛−1 2
𝑖=0 𝑖 √∑𝑖=0 ?̃?𝑖
𝑣𝑥′−𝑣𝑥
 𝑞𝑥 𝑖
𝑗 = ∑
𝑖
𝑖 = ∑ 𝑖 ?̃?
𝑖| =𝑗} 𝛥𝑥 𝑖𝛼 = 𝑐?̃?
𝑖 {𝑖| =𝑗} 𝑗 ∘ 𝛼{
𝑐 𝑐
           (14) The digital information ?̃? is taken from the recognized 
 𝑣𝑦′ graphics, and the correlation coefficient between the 
𝑞𝑦 𝑖−𝑣𝑦𝑖
𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃?
{ 𝑗 = ∑ | =𝑗} 𝛥𝑦 {𝑖| =𝑗} 𝑖𝛼 𝑗 ∘ 𝛼{𝑖
𝑐 𝑖 𝑐 digital data m and m is𝑐𝑜𝑟(𝑚, ?̃?). 
②𝛥𝑥𝑖 = 0 , 𝛥𝑦𝑖 ≠ 0 5  DXF file format 
ℎ𝑖−2 𝑣𝑥
𝑞𝑥 𝑘,𝑖−𝑣𝑥𝑘,𝑖
𝑗 = ∑ 𝑖 ∑
{𝑖| =𝑗} 𝑘=1 = ∑ 𝑖 ?̃? = 𝑐?̃?
(𝑣𝑥 DXF is a data interchange file. AutoCAD supports saving 
𝑐 𝑘+1,𝑖−𝑣𝑥 =𝑗} 𝑖𝛼 𝑗𝛼
𝑘,𝑖)𝑝𝑘 {𝑖|
𝑐
{   a n  d reading of DXF format files to exchange data with 
𝑣𝑦′
𝑞𝑦 𝑖−𝑣𝑦𝑖
𝑗 = ∑ 𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃? other applications. The DXF file is an ASCII file, so it is 
𝑖| =𝑗} 𝛥𝑦𝑖 {𝑖| =𝑗} 𝑖𝛼 𝑗𝛼{
𝑐 𝑐 helpful to use it as the evaluation basis for the answer, 
                                                                                      (15) 
check the correctness and rationality of the software 
production, and design appropriate scoring rules [20-21]. 
③𝛥𝑥𝑖 ≠ 0, 𝛥𝑦𝑖 = 0 The proposed AutoCAD test system incorporates image-
based similarity scoring with rule-based methods like 
𝑣𝑥′
𝑞𝑥 𝑖−𝑣𝑥𝑖 vector comparisons and DXF parameter matching, 
𝑗 = ∑ 𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃?
𝑖| =𝑗} 𝛥𝑥 𝑗𝛼{
{ 𝑐 𝑖 {𝑖| =𝑗} 𝑖𝛼
𝑐 e  n  s uring both structural and visual accuracy in assessment, 
ℎ𝑖−2 𝑣𝑦 𝑦
𝑞𝑦 aligning with human evaluation standards, and ensuring a 
𝑗 = ∑ 𝑖 ∑ 𝑘,𝑖−𝑣 𝑘,𝑖 = ∑ 𝑖 ?̃? = 𝑐?̃?
{𝑖| =𝑗} 𝑘=1 (𝑣𝑦
𝑐 𝑘+1,𝑖−𝑣𝑦𝑘,𝑖)∘𝑝𝑘 𝑖| =𝑗} 𝑖𝛼{ 𝑗𝛼
𝑐 comprehensive evaluation process. 
                                                                                      (16) The DXF file consists of six parts: HEADER, CLASSES, 
TABLES, BLOCKS, ENTITIES, and OBJECTS. Each 
④𝛥𝑥𝑖 = 0 ,𝛥𝑦𝑖 = 0 segment starts with group 0-SECTION and ends with 
group 0-ENDSEC. The format used here is FORTRAN I3, 
and the next line of different integer values represents 
different parameter values, such as strings, integer values, 
real numbers, etc., which have different meanings. 
260   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
The DXF parsing method compares graphic primitives 7  Development tool selection 
based on spatial coordinates from entity definitions. The 
alignment between student and reference drawings is 
Because the system is built in AutoCAD, it is executed in 
evaluated using coordinate-based geometric matching. A 
AutoCAD, the commonly used software is Visual LISP, 
tolerance criterion accounts for small inconsistencies. 
VisualC, VisualB a total of three kinds [22], Table 2 is a 
Only relevant geometric aspects are highlighted by 
comparison of the above three kinds of software. 
filtering layer and annotation data. 
The system manages multiple DXF layers by parsing each 
layer separately and comparing those using specific Table 2: Feature comparison of three programming 
drawing components. DXF entity attributes interpret line languages. 
styles, allowing for small variations. DXF text entities 
extract annotations, including text and dimensions, and Easy to runnin
flexibilit confidentiali
assess alignment consistency, position, and content about  learn and g 
y ty 
the standard drawing. use speed 
Geometric matching of drawing elements and DXF file Visua Differen
good good quick 
parsing are the primary methods used for scoring, while l C ce 
image features help confirm the correctness of robotic Visua
Differen
reproduction. l good Difference slow 
ce 
LISP 
6  Examination system processing flow Visua
good good good quick 
l B 
 
When candidates open the examination questions, they are 
automatically loaded into the examination system through 
As can be seen from Table 2, the Visual LISP language is 
the interface of VBA and AutoCAD. After the examinee 
easy to learn but lacks flexibility when completing 
completes the answer, the submit action is executed, and 
complex system software. Visual C has strong functions 
the automatic evaluation engine is started. The software 
and high flexibility, but it is relatively complex, and 
runs in the background, firstly outputs the candidates' 
programmers have high requirements for computer 
answers to the DXF file, detects the corresponding 
knowledge, making it difficult to use and grasp quickly. 
standard answer DXF file according to the test question 
Visual B has the advantages of both Visual LISP and Visual 
number, compares, and calculates the score according to 
C. 
the scoring parameter table of the test question package. 
Disadvantage, this is one of the reasons for AutoCAD to 
Then upload it to the server and record it. The design 
switch to Visual B support. Therefore, to develop this 
process of the computer drawing test system is displayed 
system, it is appropriate to use Visual B. 
in Figure 1. 
AutoCAD has built-in VBA comprehensive development 
tools. (VisualBasicforApplication). 
8  Design process of computer graphics 
examination system 
Compile the source program with GCC. The driver of the 
camera is uvcvideo, which supports two formats, YUYV 
and MPJEG. The device supports the image file 
imagebmp.bmp captured by the USB camera, such as 
streaming I/O operations. 
Use the ARM-Xilinx-Lnux cross-compilation 
environment to cross-compile the source files, and copy 
the executable files generated by compilation to SD. Use 
the command ARM-Xilinx-Linux-gnuueab-gccv4l2 grab. 
 
Compile the c-ozed-camera program, copy the compiled 
Figure 1: Processing flow of computer graphics 
executable file zeed-camera to the Zed-board, connect the 
examination system. 
USB camera to the Zedboard, connect the cd to the /dev 
folder, and use the ls command to confirm whether the dev 
directory is There are videoO devices. Executable files, if 
Automated AutoCAD Drawing Assessment via Image Processing and...                                  Informatica 49 (2025) 255-268     261 
 
any. Before executing the file, command chmod+x zed-
camera or chmod777zed-camera to obtain the executable 
permission of the file. The former is only valid for the 
current user; the latter is valid for all users. Execute the 
executable program according to the command zed-
camera, and as shown in Figure 2, code 1 shows the 
HyperTerminal code. 
Code 1: Information is displayed on the HyperTerminal.  
Support format: 
1. 𝑌𝑈𝑉 4: 2: 2(𝑌𝑈𝑌𝑉)  
 
Figure 2: The program obtains the picture successfully. 
2.𝑀𝐽𝑃𝐸𝐺  
𝑓𝑚𝑡. 𝑡𝑦𝑝𝑒: 1  9  Vector transformation of images 
Using a vector drawing computer as input, before drawing, 
𝑝𝑖𝑥. 𝑝𝑖𝑥𝑒𝑙𝑓𝑜𝑟𝑚𝑎𝑡: 𝑌𝑈𝑌𝑉  the target image needs to be converted into a vector 
diagram suitable for the execution of the drawing 
𝑝𝑖𝑥. ℎ𝑒𝑖𝑔ℎ𝑡: 480  computer. As shown in Figure 3, the process includes 
grayscale conversion, binarization, isolated pixel removal, 
𝑝𝑖𝑥. 𝑤𝑖𝑑𝑡ℎ: 640  edge refinement [23-24], position restriction, continuous 
curve detection, synthesis, and other actions. 
𝑝𝑖𝑥. 𝑓𝑖𝑒𝑙𝑑: 1  
𝑖𝑛𝑖𝑡/𝑑𝑒𝑣/𝑣𝑖𝑑𝑒𝑜0 [𝑂𝐾]  
𝑔𝑟𝑎𝑏 𝑦𝑢𝑦𝑣 𝑂𝐾  
 
𝑠𝑎𝑣𝑒/𝑢𝑠𝑟/𝑖𝑚𝑎𝑔𝑒_𝑦𝑢𝑣. 𝑦𝑢𝑣 𝑂𝐾  Figure 3: The vector transformation process of the image. 
𝑐ℎ𝑎𝑛𝑔𝑒 𝑡𝑜 𝑅𝐺𝐵 𝑂𝐾   
The system uses an image-to-vector procedure to extract 
vector features from rasterized student outputs for 
𝑠𝑎𝑣𝑒/𝑢𝑠𝑟/𝑖𝑚𝑎𝑔𝑒_𝑏𝑚𝑝. 𝑏𝑚𝑝 𝑂𝐾  
comparison with the standard, ensuring consistent 
evaluation despite the vector-based nature of AutoCAD 
The USB camera used supports both YUYV and MJPEG. drawings, which standardizes various input formats like 
The collected pictures in the two formats are saved in the scanned or non-DXF submissions in Equation (20). 
/usr folder and can be displayed in the picture browser. For processing convenience, first, convert the 3-channel 
color image collected by the camera into a single-channel 
A complete digital image processing system requires an grayscale image. 
image display system in addition to the image collection 
system. Add a display interface developed by Qt on Linux 𝑓 = 𝑓𝑅 ∗ 0.299 + 𝑓𝐺 ∗ 0.587 + 𝑓𝐵 ∗ 0.114                  (20) 
to display the collected images. 
Here𝑓𝑅,𝑓𝐺,𝑓𝐵 respectively represent a 3-component image 
in the RGB space, and f represents a transformed grayscale 
image. 
An adaptive threshold technique was used to transform the 
acquired grayscale image into a binary image [25–26]. 
Establish the binarization threshold as 𝑇 (𝑎 ≤ 𝑇 ≤ 𝑏) and 
ascertain the image's gray value range [𝑎, 𝑏] in Equation 
(21). 
262   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
1, 𝑓(𝑥, 𝑦) ≥ 𝑇 As shown in Figure 6, using different thicknesses to 
𝑓𝑇(𝑥, 𝑦) = {                         (21) 
0, 𝑓(𝑥, 𝑦) < 𝑇 represent different vector curves, the image consists of 4 
curves in total. 
The plotter uses it to verify the accuracy of the vector 
transformation by creating a physical reference by 
converting standard images to vector paths. The following 
assures that student drawings are appropriately interpreted 
by the system's scoring engine, which is based on picture 
similarity and DXF matching. As a result, both digital and 
tangible elements support evaluation accuracy and 
consistency. 
Among them, 𝑓𝑇 represents the transformed binary image. 
An example of the transformed effect is shown in Figure 
4. 
 
Figure 6: Initial vector curve. 
To further advance the drawing effectiveness of the 
drawing computer and reduce the number of actions of the 
drawing computer to raise and drop the pen, the divided 
vector curves are merged, and the adjacent non-closed 
vector curves are converted into closed curves. As shown 
in Figure 7, after merging and optimization, the vector 
curves in the figure are reduced from 4 to 3. 
 
Figure 4: Example of a binarized image. 
Through skeleton extraction, the image edge curve with a 
single pixel width is obtained as shown in Figure 5. 
 
Figure 7: Merged and optimized vector curves. 
Vector curves were successfully decreased from 4 to 3 by 
the merging stage without sacrificing structural integrity in 
Figure 7 enhancing drawing efficiency. After the above 
series of image processing operations, the target image to 
be drawn can be converted into a vector curve recognizable 
 by the drawing computer, and the drawing operation can 
Figure 5: Refinement of the image. be performed after being downloaded to the computer 
actuator. 
A single-pixel-width edge was consistently formed by the Therefore, image entropy can be selected as a 
refinement process in Figure 5. For the drawn image to be characterization feature of Chinese paintings, calligraphy 
the center of the canvas, the image must be snapped in images, and man-made images [27-28]. Equation (22) uses 
position, leaving only the valid portion of the image. The the image entropy𝑝(𝑧𝑖)(𝑖 = 0,1,2,⋯ , 𝐿 − 1)as a random 
method in this paper first obtains the contour of the variable that represents grayscale, where 𝐿 is the number 
effective image through the edge detection algorithm, of identifiable grayscales and the related histogram. 
determines the four outermost pixel points, and then cuts 
𝐿−1
the rectangle determined by the four points to obtain the 𝑒 = −∑𝑖=0 𝑝(𝑧𝑖) 𝑙𝑜𝑔2 𝑝 (𝑧𝑖)                         (22) 
required image and its coordinate information. 
Automated AutoCAD Drawing Assessment via Image Processing and...                                  Informatica 49 (2025) 255-268     263 
 
From the nature of entropy, the average uncertainty of the reference directory. Go to the directory where the 
equal probability distribution source is the largest, and the installation files were extracted and enter the following 
uncertainty of the random variable distribution is the command. 
largest at this time. The images of Chinese painting and The "Compile and Make Runtime Library Files" part 
calligraphy are obtained from nature, and artificial images supports the image capture component for image-to-vector 
are produced by people's subjective thoughts, so the transformation, creating reference vector diagrams and 
images of Chinese painting and calligraphy are more verifying robotic drawing reproduction. To ensure end-to-
complicated than artificial images. end integrity of the suggested evaluation workflow, 
The image's edge, with grayscale variation, boundary, and supporting visual comparison and physical drawing 
direction, contains the most image information. Systematic validation, even though not directly related to CAD 
measure measures regional difference, maximal when all scoring. Algorithm 1 displays the command to compile the 
levels are equal [29]. The system employs DXF parsing executable file. 
and vector manipulation to address edge scenarios, 
compensate for scaled or rotated drawings, reduce partial Algorithm 1: Library Files to installation and extracted 
occlusions, and validate components across layers before command 
scoring, thereby preserving grading accuracy and 
enhancing robustness, while addressing incorrect layer 
𝑑𝑑 𝑖𝑓 =/𝑑𝑒𝑣/𝑧𝑒𝑟𝑜 𝑜𝑓 = 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔 𝑏𝑠 =
utilization. The appropriate histogram is making𝑝(𝑧𝑖)(𝑖 = 1𝑀 𝑐𝑜𝑢𝑛𝑡 = 80  
0,1,2,⋯ , 𝐿 − 1), where 𝐿 is the number of distinct gray 
levels, and Equation (23) defines the consistency𝑈. 
𝑚𝑘𝑓𝑠. 𝑒𝑥𝑡4 − 𝐹 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔  
𝑈 = ∑𝐿−1𝑖=0 𝑝2(𝑧𝑖)                    (23)          
𝑐ℎ𝑚𝑜𝑑𝑔𝑜 + 𝑤 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔  
      
𝑚𝑜𝑢𝑛𝑡 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4. 𝑖𝑚𝑔 − 𝑜 𝑙𝑜𝑜𝑝/𝑚𝑛𝑡  
From the perspective of the generation mechanism of 
Chinese paintings, calligraphy images, and artificial 
images, Chinese paintings, and calligraphy images have 𝑐𝑝 − 𝑟𝑓/𝑢𝑠𝑟/𝑙𝑜𝑐𝑎𝑙/𝑇𝑟𝑜𝑙𝑙𝑡𝑒𝑐ℎ/𝑄𝑡 − 4.7.3/∗/
local obvious recognition and other local features [30]. 𝑚𝑛𝑡𝑐ℎ𝑚𝑜𝑑 𝑔𝑜 − 𝑤 𝑞𝑡_𝑙𝑖𝑏_𝑒𝑥𝑡4.  
Therefore, consistency can be a feature that distinguishes 
these two images. 𝑖𝑚𝑔  
The second-order moment (homogeneous 𝑢𝑚𝑜𝑢𝑛𝑡/𝑚𝑛𝑡  
variance𝜎2(𝑧) = 𝜇2(𝑧)) is another important feature of 
the identification feature. It represents the measure of gray- Therefore, the library files under the /usr/local/troltech/Qt-
level contrast and can establish a descriptor about 4.7.3/ folder are all included in the newly made 80M image 
smoothness, which is expressed by Equation (24). file. The library is ready. 
1
 𝑅 = 1 −  
1+𝜎2
            (24)  First, the AutoCAD exam questions are classified 
(𝑧)
according to the knowledge points of the exam. The 
important functions and knowledge points of AutoCAD 
A region's relative brightness smoothness is measured by 
are drawing and editing of graphics, dimensions and 
its cleanliness. 𝑅 = 0 : In the area with constant 
dimensions, text styles and annotations, setting of 
brightness; 𝑅 = 1 in the area where the gray level value 
environment variables, query, view scale, block, pattern 
deviates significantly. 
filling, etc. [14]. Each exam question sets a scoring 
parameter table according to the knowledge points. 
10  Compile and make runtime library The image processing technique is a simulation algorithm 
files that mimics the solid annealing process in real life. This 
process involves heating and cooling a solid, causing 
disorder and increasing internal energy [16]. The particles 
In the directory f where the project is located, use the 
are then sorted slowly, reaching equilibrium at every 
command qmake-Project to obtain the project file lab2 to 
temperature. The image processing method consists of two 
generate qtcamera.pro. Then use the qmake command to 
stages: drawing processing and recognition. The specific 
generate the makefile file, and use make to compile the 
steps include achieving the ground state at room 
executable file. 
temperature, minimizing internal energy, and achieving 
The execution of the Qt software depends on the 
the equilibrium state at every temperature. 
executable library, which is created and mounted to the 
264   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
1) Each rendering packet is sent at a specific time interval, method needs to update the feedback matrix to determine 
and the source node gets ACK or NAK feedback the different types of packets, and the image processing 
information to create a feedback matrix T that keeps the method not only combines all lost packets for 
update up to current. The origin node processes N retransmission but also the number of recommended 
rendering packets for K receiving nodes. packets is determined by Mmax. 
2) After the resource node has processed N packets, it 
enters the retransmission phase with time. All missing 2) The image processing method is not affected by the 
packets make up the largest coefficient 𝐷 = distribution of lost packets, and the number of 
{𝑋1, 𝑋2, 𝑋3, ⋯ , 𝑋𝑛} in the set 𝐺 = recommended packets is determined mainly by the 
{𝑔𝑖1, 𝑔𝑖2, 𝑔𝑖3, ⋯ , 𝑔𝑖𝑛}(1 ≤ 𝑖 ≤ 𝑀𝑚𝑎𝑥()) coefficient vector receiving node with the most lost packets. Algorithm 2 
𝑀𝑚𝑎𝑥  (chosen randomly from a limited domain) to shows the pseudocode of the main module. 
recommend all missing plotting packets.𝐹𝑞 generates 
𝑀𝑚𝑎𝑥  recommendation packages. The maximum number Algorithm 2 Pseudocode of the main module 
of lost packets at all nodes of 𝑀𝑚𝑎𝑥  is Equation (25). 
function main(): 
𝑀 𝑚𝑎𝑥 {∑𝐾𝑗=1𝑇(𝑖, 𝑗)}                        (25) 
𝑖∈{1,2,⋯,𝐾} 𝑚𝑎𝑥
Input 
3)After resending the recommended drawing package, 
each receiving node approximates and shows the  image = getImageInput() 
arrangement of its own recommendation vector matrix G. 
𝑀𝑚𝑎𝑥  If 𝑟𝑖 ≠ 𝑁, the node means that G does not reach the dxf = getDXFInput() 
complete permutation, then the node needs to notify the 
source node and resend some recommended packets, and  Preprocessing 
G can be a complete permutation. Here by indicating the 
required recommendation grouping, the specific situation gray = toGrayscale(image) 
in Equation (26) 
    binary = binarize(gray) 
𝑁 − 𝑟𝑖 , 𝑟𝑁𝑖 = { 𝑖 ≤ 𝑁
                           (26) 
0, 𝑟𝑖 ≥ 𝑁
    edges = refineEdges(binary) 
in the formula𝑖 = 1,2,⋯ , 𝐾. 
vector_img = extractVectors(edges) 
In the drawing resend phase, if the receiving node receives  DXF Feature Extraction 
the recommended drawing packet, it is 0. When a node 
loses two recommended packages Ri𝑀𝑚𝑎𝑥 Ni 𝑅𝑖then𝑁𝑖 = student_feats = parseDXF(dxf) 
2。 
ref_feats = loadReference () 
4) The source node updates based on the feedback value of 
each receiving node, and generates the recommended  Scoring 
group key algorithm in the new retransmission stage.𝑁𝑖
Mmax 𝑀𝑚𝑎𝑥(4)。 hist_score = histogramSimilarity (vector_img, ref_feats) 
dist_score = sum(manhattanDist(student_feats[i], 
5) 3) and 4) are repeated until all receiving node vector 
ref_feats[i]) for i in range(len(ref_feats))) 
matrices are equal to N. That is, 𝑀𝑚𝑎𝑥 , with no lost 
packets, the receiving node can decode the original 
comp_score = completeness (student_feats, ref_feats) 
drawing packets using Gaussian elimination. 
 Final Score 
It can be seen that the differences between the ERA and 
AutoCAD computer drafting methods presented here are 
mainly shown in the following points. final_score = 0.5 * hist_score + 0.3 * (1 - 
normalize(dist_score)) + 0.2 * comp_score 
1) Image processing methods have low complexity in 
combining lost packets. The AutoCAD computer drawing print("Score:", final_score) 
Automated AutoCAD Drawing Assessment via Image Processing and...                                  Informatica 49 (2025) 255-268     265 
 
 Retransmission & Drawing (Optional) furniture is lacking, and the score of the project test paper 
is 92. Teachers who graded the papers by hand achieved 
    packets = 92 grades. The input B student diagram completes the 
feedbackRetransmit(preparePackets(vector_img)) display of the axis, doors and windows, part of the wall, 
and part of the size. The placement of furniture, stairs, and 
    if final_score>= threshold: a part of the wall was incomplete, and the score for the 
program volume was 80. Teachers who manually graded 
the papers achieved 80 grades. According to the output 
drawRobot(pathPlan(packets)) 
grades of students A and B, the program calculation results 
based on the similarity principle are consistent with the 
11  Examples and results analysis integer part of the manually collected results, and the 
decimal point can be rounded off. 
In the specific operation of Auto CAD computer drawing Image similarity and CAD content evaluation are included 
pictures, by comparing the answer pictures of the two in the scoring. First, histogram-based similarity is used to 
students with the correct answer pictures, this operation compare answer photos to standard images. After that, 
can also test the rationality of the computer program parsed DXF files are used to evaluate CAD-specific 
system. The graph is divided into three different types of aspects, such as axes, walls, doors, and dimensions. 
CAD drawings, the graphs of the same size as the standard Student A and B's scores (92 and 80) match the hand-
picture are cut out, a folder is created to save them together, assessed results, indicating both visual accuracy and 
and then the similarity calculation method is used to content completeness, which could be explained by the 
calculate the corresponding answer. The import system of blended technique. 
the A and B pictures given by the two students is shown in Using the above design scheme, a suspended drawing 
Figures 8 and 9 below. computer drawing examination system based on a servo 
motor drive is developed. The main parameters are shown 
in Table 3. 
Table 3: The table of drawing computer parameters 
Hangi
white Suppl maxim Rotatin
ng 
qualit board y um g 
point 
y(kg) height volta torque( speed(ra
spaci
(m) ge(V) kg/cm) d/min) 
ng(m) 
 0.335 0.86 0.45 11.1 2.22 300 
Figure 8: Calculation program for reading A students'  
scores (92). 
The servo motor-powered drawing machine physically 
reproduces digital drawings to verify the accuracy of the 
system's vector interpretation. The machine provides real 
execution for verifying vector outputs and evaluating 
picture processing integrity, ensuring the practical 
robustness of the suggested AutoCAD evaluation system. 
Vectorizer outputs maintained an average speed of 1.75 
 cm/s and a drawing accuracy of better than 0.1 cm, 
Figure 9: Calculation program for reading B students' according to quantitative validation using the drawing 
scores (80). robot (Table 3). That suggests the vector conversion 
pipeline had little distortion. However, when the result is 
Histogram-based comparison methods were used to weak contrast edges or overlapping stroke regions during 
calculate the similarity scores between the student binarization, the method introduces small alignment errors 
drawings and the reference. Early instances (Students A that could impact snapping accuracy or curve continuity. 
and B) are depicted in Figures 8 and 9, which closely Usually, post-processing techniques like coordinate 
match grades assigned by humans. anchoring and contour-based cropping help to reduce these 
In the input drawing of student, A, the student has inaccuracies. As demonstrated by student score 
completed the axis, walls, doors and windows, as well as comparisons (e.g., students A and B matched human-
some dimension display, the configuration of stairs and evaluated scores), the vectorization accuracy generally 
closely matches manual grading outputs, indicating that 
266   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
the vector transformation procedure is both 
mathematically robust and pedagogically reliable for 
AutoCAD examination assessment. 
To facilitate the drawing computer to adjust the drawing 
position to adapt to different types of whiteboards, an easy-
to-operate GUI user interface is designed, which can easily 
perform target image input, motor position adjustment, 
whiteboard parameter setting, drawing start and stop 
control, etc. To validate the strength of the adopted design 
and the proposed method, a drawing experiment was 
carried out using the developed drawing computer drawing 
examination system. After determining the Base and 
whiteboard height information, perform moment analysis 
on each force point on the whiteboard, and the results are 
shown in Figure 10. Among them, Motor Load represents 
the moment received. It can be seen that the torque is too 
large only in the area of y=0, and the other areas meet the  
requirement that the load torque is less than 30% of the Figure 11: Limiting the movement position of the 
maximum torque. drawing computer. 
Table 4: The experimental results of the drawing speed 
and accuracy test of the drawing computer. 
set Measured require actual 
(cm
distance(c distance(c d speed(cm/
) 
m) m) time(s) s) 
20 19.9 11.38 1.749 0.1 
10 9.9 5.71 1.733 0.1 
5 5.0 2.81 1.780 0.0 
 
 
Figure 10: Moment analysis of each point on the From the results in Table 3, it can be seen that the average 
whiteboard. drawing speed of the computer is 1.75 cm/s, the drawing 
accuracy is better than 0.1 cm, and the drawing speed and 
The dead zone positions of the two suspension points are accuracy meet the design requirements. To verify the 
removed, and the positions are limited according to the drawing effect of the drawing computer, several pictures 
principle that the load moment is less than 30% of the were randomly selected for drawing, and the drawing 
maximum moment, and the results shown in Figure 11 are results are demonstrated in Figure 12. 
obtained.  
It can be seen that the motion position does not include the 
upper left and upper right fan-shaped areas, this is because 
the two hanging points are positions that the drawing 
computer cannot reach. In addition, to prevent the motor 
torque from being too large, the moving area of the 
drawing computer is limited (the rectangular box area in 
Figure. 11). To verify the drawing speed and accuracy of 
the drawing computer, the moving distances of 20 cm, 10  
cm, and 5 cm were set respectively, and the average value Figure 12: Image rendering example 
was obtained after 10 experiments in each group. The 
outcomes are illustrated in Table 4. The proposed AutoCAD assessment system integrates 
CAD-specific content analysis with histogram-based 
 image similarity for objective, consistent scoring. 
AutoCAD uses a servo motor-driven drawing machine for 
physical verification, ensuring higher accuracy, less 
subjectivity, and practical dependability. The system 
Automated AutoCAD Drawing Assessment via Image Processing and...                                  Informatica 49 (2025) 255-268     267 
 
surpasses conventional approaches in automation, [3] Quiminsao CMD and Sumalinog JA (2023). Factors 
accuracy, and adaptability for CAD-based examination affecting the students’ achievement and attitude in 
and assessment activities.  learning AutoCAD. Australian Journal of 
 Engineering and Innovative Technology, 5(3), 130–
12  Conclusions 140. http://dx.doi.org/10.34104/ajeit.023.01300140 
[4]  Türkyılmaz T (2023). Visual Basic drawing codes 
from 2D AutoCAD drawings and machine parts 
For the calculation criteria of the similarity of the pictures, applications. Journal of Innovative Engineering 
using MATLAB to calculate the picture, the system of Applications, 13(2), Article 4. 
drawing the similarity of the pictures, using the students to http://dx.doi.org/10.7176/JIEA/13-2-04 
draw pictures and compare the correct answers, using three [5] de Almeida JS and Baratto NS (2022). Evaluation of 
different types of pictures and the money-making answers screencasts settings applied to CAD online teaching. 
to compare and calculate, calculate and export the correct In: Más allá de las líneas. La gráfica y sus usos: XIX 
answer. Realize the function of only scoring by computer. Congreso Internacional de Expresión Gráfica 
It improves work efficiency and saves manpower and Arquitectónica, pp. 639–642. Universidad 
material resources for manual scoring. The development Politécnica de Cartagena. 
and application market of computer-only scoring systems http://dx.doi.org/10.31428/10317/11414 
is ideal. However, a small sample size is a disadvantage. [6]  Li X, Wang X, Li J, Zhang M, Al Ansari MS, and 
To ensure statistical robustness and show the Goyal B (2023). Development of NC program 
generalizability of the system's evaluation accuracy, future simulation software based on AutoCAD. Computer-
research will incorporate a broader display of similarity Aided Design and Applications, Special Issue, S3, 
scores across a varied collection of student submissions. 72–83. 
Method also features a graphical user interface for user http://dx.doi.org/10.14733/cadaps.2023.S3.72-83 
convenience. Complex overlapping designs and problems [7]  Gutiérrez de Ravé S, Gutiérrez de Ravé E, and 
with contrast could pose problems for the system. Future Jiménez-Hornero FJ (2025). Enhancing efficiency 
research could include AI for adaptive scoring refinement, and creativity in mechanical drafting: A comparative 
study of general-purpose CAD versus specialized 
assist 3D CAD review, and improve edge recognition. To 
toolsets. Applied System Innovation, 8(3), 74. 
demonstrate the method's statistical robustness and 
https://doi.org/10.3390/asi8030074 
generalizability, future research will need to conduct 
[8]  Eltaief A, Ben Amor S, Louhichi B, Alrasheedi NH, 
additional examination using a more extensive and varied 
and Seibi A (2024). Automated assessment tool for 
dataset. A major focus of future research will be thorough 
3D computer-aided design models. Applied Sciences, 
statistical validation using larger datasets. The system's 
14(11), 4578. https://doi.org/10.3390/app14114578 
analytical depth and evaluation precision will be improved 
[9] Zhang N, Li F, and Zhang E (2023). The machine 
in subsequent work by incorporating quantitative tracking 
vision dial automatic drawing system—Based on 
mechanisms during the vector transformation process. The CAXA secondary development. Applied Sciences, 
result will allow the calculation of vectorization errors and 13(13), 7365. https://doi.org/10.3390/app13137365 
the identification count of geometric shapes. Large-scale [10] Fakhry M, Kamel I, and Abdelaal A (2021). CAD 
quantitative validation will be incorporated into future using preference compared to hand drafting in 
research. Future research will take into account more architectural working drawings coursework. Ain 
extensive statistical comparisons for a thorough Shams Engineering Journal, 12(3), 3331–3338. 
assessment. To enhance item detection and analysis https://doi.org/10.1016/j.asej.2021.01.016 
accuracy, future studies will use CNNs trained on [11] Hsu HH, Wu CF, Cho WJ, and Wang SB (2021). 
annotated CAD datasets to improve drawing structure Applying computer graphic design software in a 
recognition. computer-assisted instruction teaching model of 
 makeup design. Symmetry, 13(4), 654. 
References https://doi.org/10.3390/sym13040654 
[12] Zhang J (2025). Attention mechanism-enhanced 
[1]  Ibarrola F, Lawton T, and Grace K (2023). A 
model for automated simple brush stroke painting. 
collaborative, interactive, and context-aware drawing 
Informatica, 49(20). 
agent for co-creative design. IEEE Transactions on 
https://doi.org/10.31449/inf.v49i20.7688 
Visualization and Computer Graphics, 30(8), 5525–
[13] Cheng J, Yang L, and Tong S (2024). Recognition and 
5537. https://doi.org/10.1109/TVCG.2023.3293853 
analysis of painting styles with the help of computer 
[2] Fraile-Fernández FJ, Martínez-García R, and 
vision techniques. Informatica, 48(21). 
Castejón-Limas M (2021). Constructionist learning 
https://doi.org/10.31449/inf.v48i21.6891 
tool for acquiring skills in understanding 
[14] Zhao YQ (2017). Transcending images and forms: 
standardized engineering drawings of mechanical 
The theory of expressive aesthetic value of traditional 
assemblies in mobile devices. Sustainability, 13(6), 
3305. https://doi.org/10.3390/su13063305 
268   Informatica 49 (2025) 255-268       Z. Xiong et al. 
 
Chinese freehand painting. Journal of Aesthetic [28] Timoftei S (2018). Industrial robot in fine art: Can 
Education, 10(8), 3023–3034. an industrial robot draw a binary image? IOP 
[15] Lu G (2018). An analysis of the application of Conference Series: Materials Science and 
traditional painting and calligraphy elements in the Engineering, 78(4), 110–120. 
design of theme hotels. Journal of Heihe University, https://doi.org/10.1088/1757-899X/399/1/012019  
32(4), 329–335. [29] Feng M, Ying L, Sun G, Dong Y, Zhang F, and Liu 
[16] Jian M, Dong J, Gong M, Yu H, Nie L, and Yin Y Y (2018). Adaptive processing of dimensioning tire 
(2020). Learning the traditional art of Chinese patterns in engineering drawings. Chinese Journal 
calligraphy via three-dimensional reconstruction and of Automotive Engineering, 15(8), 370–374. 
assessment. IEEE Transactions on Multimedia, [30] Wang G, Liu S, Liu Q, Zhao S, Zhao X, and Li Y 
22(4), 970–979. (2017). Micro-arrayed stretch drawing process of 
https://doi.org/10.1109/TMM.2019.2931390  nanocrystalline Ni-Co foils with soft-male-die. 
[17] Wang G, Zhao S, Liu S, and Siyu L (2017). Micro- Journal of Materials Processing Technology, 
arrayed stretch drawing process of nanocrystalline 240(4), 806–820. 
Ni-Co foils with soft-male-die. Journal of Materials https://doi.org/10.1016/j.jmatprotec.2016.10.038  
Processing Technology, 78(4), 110–120. 
[18] Lee IK, Lee SY, Kim DH, Lee JW, and Lee SK 
(2018). Wire drawing process design for fine 
rhodium wire. Transactions of Materials 
Processing, 15(8), 370–374. 
[19] Wang Y (2018). Digital subsistence of Chinese 
calligraphy fonts. Packaging Engineering, 215(1), 
806–820. 
[20] Yang Q (2018). Technical operation analysis of 
Photoshop in Premiere header image processing. 
China Computer & Communication, 93(3), 1–8. 
[21] Nakagawa M, Sutou K, and Hayakawa T (2020). 
Reproduction of additive-type fluorescence moiré 
fringes by image drawing software and study of 
accuracy of fluorescence imprint alignment. 
Japanese Journal of Applied Physics, 5(5), 445–
452. https://doi.org/10.35848/1347-4065/ab5cbe 
[22] Lin J and Chen H (2019). Application of image 
processing technology in graphic design. Modern 
Electronics Technique, 73(7), 40–55. 
[23] Cao G (2018). The history and aesthetic features of 
Chinese literati painting. Journal of Tianjin 
Academy of Fine Arts, 29(7), 143–148. 
[24] Cheng H, Huijie L, Luo R (2019). Research on 
geometric characteristics of asphalt mixture 
aggregate based on image processing. Journal of 
Wuhan University of Technology (Transportation 
Science & Engineering), 21(5), 773–791. 
[25] Yin Y and Antonio J (2020). Application of 3D laser 
scanning technology for image data processing in 
the protection of ancient building sites through 
deep learning. Image and Vision Computing, 
102(5), 173–196. 
https://doi.org/10.1016/j.imavis.2020.103982  
[26] Yang S, Zhang X, and Wang F (2017). Application 
of map GIS image analysis system in making 
design drawing of regional gravity points. 
Geological Survey of China, 32(4), 329–335. 
[27] Wang C and Han D (2017). Research on the 
construction of graphic image cooperative 
processing system based on HTML5 technology. 
Boletin Tecnico / Technical Bulletin, 55(15), 375–
384. 
https://doi.org/10.31449/inf.v49i12.8907  Informatica 49 (2025) 269-280   269 
 
Optimization of Dynamic Energy Management Strategy for New 
Energy Vehicles Based on Multi-Agent Reinforcement Learning 
 
Xiaoyu Zhang 
Automotive Academy, Henan Communications Vocational and Technical College, Zhengzhou Henan, 450000, China 
E-mail: zxiaoyhappy@163.com 
 
Keywords: battery degradation, energy management strategies, fuel economy, new energy vehicle (NEV), power 
distribution, scalable satin bowerbird optimizer-driven multi-agent deep Q-Network (SSB-MADQN) 
 
Received: April 14, 2025 
 
The development of New Energy Vehicles (NEVs), such as battery electric vehicles, is vital to addressing 
global issues like environmental pollution and fossil fuel depletion. However, optimizing their energy 
management strategies (EMSs) is complex due to conflicting goals, dynamic driving conditions, and 
system nonlinearity. This study proposes a dynamic EMS based on Multi-Agent Reinforcement Learning 
(MARL) using a Scalable Satin Bowerbird Optimizer-driven Multi-Agent Deep Q-Network (SSB-
MADQN). The approach aims to enhance fuel economy, maintain battery State of Charge (SOC), and 
reduce battery degradation in real-time driving scenarios. Prior to training, data preprocessing—
including min-max normalization and Principal Component Analysis (PCA)—improves learning 
efficiency. The MADQN framework consists of agents representing subsystems such as the engine, battery, 
and regenerative braking, each trained using a deep Q-network with three hidden layers (128-64-32 
neurons). The dataset comprises 5,000 samples with 13 features, including vehicle speed, power demand, 
and battery performance. Evaluated on HWFET and WLTC driving cycles, the proposed strategy reduces 
fuel consumption by 0.912 L (WLTC) and 0.681 L (HWFET) compared to traditional methods. It effectively 
regulates SOC and reduces high-power discharge events, confirming the robustness of MARL for adaptive 
and efficient EMS in NEVs. 
Povzetek: Raziskava predlaga dinamično strategijo upravljanja z energijo (EMS) za NEV na osnovi MARL 
(SSB-MADQN). Optimizira porabo goriva, stanje napolnjenosti baterije (SOC) in zmanjšuje degradacijo, 
s čimer izboljša učinkovitost v realnem času. 
 
1 Introduction (RL) [4]. Figure 1 shows the dynamic energy management 
strategy for NEVs. 
The growing demand for NEVs, which includes hybrids  
and battery electric vehicles, occurs because they serve as 
an environmentally friendly replacement for traditional 
internal combustion engine vehicles that offer improved 
air quality decreased greenhouse gas emissions, and 
reliable energy systems [1]. Strong worldwide climate 
change understanding, along with decreasing fossil fuel 
reserves, has made NEV development essential for 
countries implementing sustainable transportation 
solutions [2]. Conventional EMS approaches, such as rule-
based, fuzzy logic, or model predictive control methods, 
rely on pre-defined heuristics or offline optimization and  
 
often fail to adapt in real-time to complex, dynamic 
Figure 1: Dynamic energy management strategy for 
environments like varying road gradients, traffic 
NEVs 
conditions, and driving behaviours [3]. The growing 
 
complexity of NEVs and their need for adaptive, real-time 
Reinforcement learning has shown significant promise in 
decision-making have thus pushed the investigation 
EMS optimization by enabling systems to accumulate 
toward leveraging artificial intelligence (AI) techniques 
reward functions, such as fuel efficiency or battery health 
such as machine learning (ML) and reinforcement learning 
[5]. However, most existing RL-based EMS frameworks 
270   Informatica 49 (2025) 269-280       X. Zhang et al. 
 
operate under a single-agent paradigm, where the entire for new energy vehicles (NEVs). Current systems face 
decision-making process is centralized, which limits challenges in optimizing fuel efficiency, battery health, 
scalability and does not fully represent the distributed and driving performance simultaneously, especially under 
nature of NEV components. In reality, energy management dynamic driving conditions. By leveraging Multi-Agent 
involves coordination between multiple subsystems [6]. Reinforcement Learning (MARL) and the novel SSB-
The vehicle dynamics are modeled to include real-world MADQN approach, this research aims to reduce fuel 
constraints such as regenerative braking, load variations, consumption while maintaining optimal battery SOC and 
and battery degradation metrics [7]. Despite various minimizing degradation, ultimately contributing to more 
conventional EMS strategies yielding acceptable sustainable and efficient NEV operation in real-world 
performance under ideal conditions, they often fail in scenarios. 
unpredictable or highly dynamic driving environments. By The research is comprised of the following sections: In 
leveraging the strengths of multi-agent systems and Section 2, a list of relevant works was presented. In 
metaheuristic-optimized DL models, it offers a robust, Section 3, the methodology is described. In Section 4, the 
adaptive, and intelligent EMS that is both scalable and findings are presented. The discussion portion is provided 
energy-efficient. It highlights the transformative potential in Section 5, and Section 6 contains the conclusion. 
of AI-driven strategies in the automotive domain,  
particularly for real-time optimization and sustainable 2 Related work 
energy utilization in NEVs.  
To address these limitations, MARL has emerged as an A novel multiple-input and multiple-output (MIMO) 
innovative solution for optimizing EMS in a decentralized control technique based on Multi-Agent Deep 
and cooperative manner. In MARL-based EMS, different Reinforcement Learning (MDARL) was examined in [8] 
vehicle components are modeled as intelligent agents, such for the multi-mode photovoltaic EV. Two learning agents 
as a battery agent and an engine agent that learn to make would collaborate under the MDARL, utilizing the deep 
decisions based on local observations and collaborate to deterministic policy gradient (DDPG) algorithm by 
achieve a global objective. It allows for distributed control, implementing a handshaking technique that provided a 
reduced computational complexity, and more effective relevance ratio. To improve fuel economy, [9] provided a 
adaptation to real-time driving dynamics. A novel MARL- unique EV EMS based on the MDARL architecture. Under 
based EMS framework is proposed using an SSB- power limits, the EMS effectively achieved optimal power 
MADQN. The SSB is a nature-inspired metaheuristic transmission between the engine and battery. 
algorithm based on the mating behavior of satin The optimal functioning of a fleet of EVs that were 
bowerbirds, known for balancing exploration and directed to supply power to a group of clients at various 
exploitation efficiently. The aim is to enhance fuel places was covered in [10]. MARL was used in a 
economy, sustain battery SOC, and decrease battery Decentralised Markov Decision Procedure reformulation 
degradation under dynamic driving conditions. framework to be practicable for a fleet of EVs to function 
 well and provide energy to numerous clients at various 
1.1 Key contribution places. A unique optimum energy management approach 
based on the suggested MDARL technique was presented 
Data Collection: The dataset captures real driving in [11]. It used a deep neural network to train a strategy 
conditions, fuel consumption, power distribution, and based on multi-agent deep deterministic policy gradient 
battery health metrics specific to NEV scenarios. (MADDPG) learning capacity and stacked denoising auto-
Data preprocessing: Applied data cleaning and min-max encoders. By considering the different characteristics of 
normalization to standardize input variables, ensuring both electrical and thermal energies. 
consistent scale and reducing data noise for learning A MADRL optimization approach was proposed in [12] 
stability. for energy control with EV charging development. To 
Feature extraction: Used PCA to extract 12 principal determine the optimal choice, the aggregator and 
components, preserving 95% variance for improved prosumers were designed to be intelligent agents that 
training efficiency and dimensionality reduction. communicate with one another. Utilizing EV battery 
Proposed method: SSB-MADQN, a MARL-based scheduling, prosumers might save on power costs. A new 
framework with decentralized agents and a Satin Multi-Agent ActorCritic (MA2C) system was examined in 
Bowerbird-optimized DQN for dynamic NEV energy [13], which was specifically designed for mixed-traffic 
management. situations. The MA2C algorithm offers an extensive 
 method of managing urban traffic that prioritizes 
1.2 Motivation effectiveness, safety, and passenger security. 
To effectively recommend public charging stations, [14] 
The motivation for this research is driven by the need for anticipated a Multi-Agent Spatio-Temporal Reinforcement 
more effective and adaptive energy management strategies Learning (Master) that takes into consideration several 
Optimization of Dynamic Energy Management Strategy for New...                                   Informatica 49 (2025) 269-280     271 
 
long-term spatiotemporal characteristics. The Demand agent reinforcement learning (MARL) for energy 
Response potential in smart homes using a multi-agent management in smart systems. It highlights diverse 
reinforcement learning framework enhanced with applications ranging from EVs and smart grids to smart 
BiLSTM and Attention Mechanism for improved data homes using algorithms like MADDPG, MA2C, and 
efficiency and handling stochastic household loads [15]. BiLSTMA-MADDPG. While most approaches show 
The BiLSTMA-MADDPG model improves data improved performance in energy savings and efficiency, 
efficiency, convergence speed, and scalability in common limitations include coordination complexity, high 
controlling household appliances under limited training computational needs, and data inefficiency.  
samples. Table 1 presents recent advancements in multi-
 
Table 1: Contrast examination of traditional works 
 
Ref. Year Area Focused Algorithms Limitations Performance 
[8] 2023 Energy Management in MADRL, DDPG, Requires careful Energy savings can range 
Multi-mode plug-in Hand-shaking tuning of DDPG from 4% to 23.54% when 
hybrid EVs Strategy, Relevance parameters; compared to a single-agent 
Ratio learning system and a rule-based 
performance is system. 
sensitive to  
learning rate 
[9] 2025 Hybrid EVs, Energy MADRL, MADDPG Complexity in Fuel consumption was 
Management Strategy multi-agent reduced by 26.91% 
coordination, (WLTC) and 8.41% 
simulation-based (HWFET), improving 
validation only EMS robustness. 
[10] 2022 Smart Grids, Multi- MARL, Decentralized High initial Significant reduction in 
Agent Systems, EVs. Markov Decision training complexity simulation time; superior 
Process (Dec-MDP), assumes accurate scalability and efficiency 
Actor-Critic Networks agent-environment 
modeling. 
[11] 2023 Optimal Energy MADRL, Stacked Requires high Achieved optimal dispatch 
Management, Smart Denoising Auto- computational of electric and thermal 
Grid, Multi-Energy Encoders framework resources, energies, and reduced 
MicroGrids. complexity in emissions and costs. 
decentralized 
implementation, 
and training 
convergence 
[12] 2023 Smart Grid Energy MADRL, Real-Time High Mean power consumption 
Management, EV Pricing, Smart Agent computational was reduced by 9.04% (vs. 
Scheduling, Solar Interaction requirements for no EV usage) and reduced 
Photovoltaic (PV) real-time DRL. by 39.57% (vs. 
Integration  conventional pricing) 
[13] 2024 Smart Cities, MA2C, Complexity of Outperforms existing 
Autonomous Vehicles, Reinforcement multi-agent models in lane-changing 
Sustainable Mobility Learning, Actor-Critic coordination; efficiency, safety, comfort, 
Architecture Requires realistic and inter-vehicle 
traffic data for cooperation. 
deployment 
[14] 2021 EVs Charging MA2C Framework, Required Outperforms 9 baseline 
Recommendation, Centralized Attentive coordination approaches in 
Smart Mobility, DRL Critic, Delayed Access among distributed recommending charging 
Strategy agents stations. 
[15] 2023 Demand Response in BiLSTMA-MADDPG Non-stationary Improved data efficiency, 
Smart Homes (Multi-Agent RL) environment; data faster convergence, and 
inefficiency better scalability with 
small samples. 
  
272   Informatica 49 (2025) 269-280       X. Zhang et al. 
 
3 Methodology power become visible through off-diagonal scatter plots. 
Figure 3 shows the data exploration. 
The methodology involves modeling the NEV's energy  
system as a multi-agent environment with engine and 
battery agents. Real-time driving data undergoes data 
cleaning and min-max normalization, and PCA for feature 
extraction. AnSSB-MADQN is employed to optimize 
power distribution. Trained on WLTC and HWFET cycles, 
this strategy improves fuel efficiency, stabilizes SOC, and 
reduces battery degradation, enabling adaptive, real-time 
energy management under dynamic driving conditions. 
Figure 2 presents the proposed methodology’s overview. 
 
 
 
Figure 3: Data exploration outcomes 
 
3.2 Data preprocessing using data cleaning 
To clean the NEV energy management dataset, missing 
 
 values should be handled through mean or median 
Figure 2: Proposed methodology overflow imputation techniques while maintaining sparse data rows. 
 Convert data types to ensure consistency across numerical 
and categorical fields. The data types should be converted 
3.1 Data collection 
to achieve numerical and categorical field consistency. 
The NEV energy management dataset was collected from Reduction of redundant data will occur by eliminating 
the Kaggle source. It is meant to assist in finding the most duplicate records. The system needs to identify and handle 
effective ways to save energy in NEVs, using the approach unusual cases found in energy consumption alongside 
of MARL. It  battery degradation trends. A final test must verify the data 
includes data about real-world traffic, energy distribution, balance between driving cycles and efficiency classes. 
mileage, and battery health for multiple driving routines.  
70% of the dataset was used for training and 30% for 3.2.1 Min-Max normalization 
testing to evaluate performance under diverse scenarios. The process of min-max normalization transforms new 
Source:https://www.kaggle.com/datasets/ziya07/nev- energy vehicle energy management datasets into 
energy-management-dataset/data standardized ranges, which improves both model 
 performance and speed of convergence, and accuracy 
3.1.1 Data Description during energy efficiency optimization. Using linear 
The NEV Energy Management Dataset features 5,000 modifications of the original data, min-max normalization 
records with 13 attributes for measuring vehicle speed creates a balanced set of value comparisons between the 
along with acceleration, power demand, fuel usage, and data before and after the execution, as follows in Equation 
battery performance across different driving conditions. (1). 
The system combines essential variables such as engine  
𝑊−min (𝑊)
power, battery power and SOC, battery degradation, and 𝑊𝑛𝑒𝑤= ….                      (1) 
max(𝑊)−min (𝑊)
regenerative braking power to assess energy efficiency and 
 
sustainability levels.  𝑊𝑛𝑒𝑤- The adjusted value derived from the normalized 
3.1.2 Data Exploration 
outcomes 
The pair plot demonstrates the relationship dynamics 
𝑊- Old Value 
between speed, power demand, battery power, SOC, and 
max(𝑊)-The dataset's maximum value 
fuel consumption variables for designing a dynamic 
min (𝑊)- The dataset's minimum value 
energy management strategy in NEVs. The diagonal 
 
presentation displays distribution patterns to identify 
 
normal or skewed data shapes. The correlations and strong 
 
positive associations between power demand and battery 
Optimization of Dynamic Energy Management Strategy for New...                                   Informatica 49 (2025) 269-280     273 
 
3.3 Feature extraction using PCA applying min-max normalization, PCA reduced the feature 
space to 6 principal components, maintaining more than 
The dynamic energy management technique becomes 95% of the total variance while minimizing duplication, 
more efficient by eliminating unnecessary variables and boosting the energy management model's learning 
focusing exclusively on critical factors. This results in efficiency. Figure 4 shows the PCA-based feature 
faster convergence and more accurate decision-making via contribution to the first principal component, which 
the MARL framework for energy distribution. PCA was explains the most variation. This information assists in 
used to minimize the dimensionality of the dataset while determining the most significant elements for EMS 
retaining the majority of its informational richness. In optimization. Notably, this representation is based on the 
addition, 5 derived characteristics were designed to PCA loading matrix before dimensionality reduction. 
capture complicated energy dynamics such as power Figure 4 shows PCA-Based Feature Importance Output for 
fluctuation, energy trends, and driving cycle behavior, Energy Management Optimization. 
which are crucial for intelligent EMS control.   
By eliminating the class label, each observation in a data  
set of 𝑙 observations is mathematically 𝑚-dimensional. 
Assuming that 𝑤1, 𝑤2, … . . , 𝑤𝑙   ∈ ℜ𝑚 .  The subsequent 
procedures for calculating PCA. 
Determine the mean vector µ in 𝑚-dimensions by 
Equation (2). 
 
1
𝜇 = ∑𝑙
𝑗 1 𝑤    
𝑙 = 𝑗       (2) 
 
Determine the observed data's estimated matrix of 
covariance 𝑇 by Equation (3). 
 
1 𝑠
𝑇 = ∑𝑙 (𝑤 ) 𝑤 − )
𝑙 𝑗=1 𝑗 − 𝜇 ( 𝑗 𝜇         (3)  
 
 
Figure 4: PCA-based feature importance output for 
Determine the associated eigenvectors and eigenvalues of 
energy management optimization 
𝑇, whereby 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑙 ≥ 0.  Determine the 𝑙 
 
primary components from the 𝑙 original variables by 
• Data Cleaning (13 features): Outliers, 
Equation (4). 
impossible values (e.g., negative fuel), and 
 
missing values were handled through imputation 
𝑧1 = 𝑏11𝑤1 + 𝑏12𝑤2 + ⋯ + 𝑏1𝑙𝑤𝑙 and filtering. 
𝑧2 = 𝑏21𝑤1 + 𝑏22𝑤2 + ⋯ + 𝑏2𝑙𝑤𝑙
⋯         (4) • Normalization (13 features): Each feature was 
𝑧𝑙 = 𝑏𝑙1𝑤1 + 𝑏𝑙2𝑤2 + ⋯ + 𝑏𝑙𝑙𝑤𝑙 scaled to a standard range (mean = 0, std = 1) for 
 consistent learning performance. 
It is orthogonal that 𝑧𝑙 are uncorrelated. As much of the • PCA Application: Principal component analysis 
initial variation in the data set can be explained by 𝑧1, as reduced the final 18-dimensional space to 6 
much of the residual variance can be explained by 𝑧2, etc. principal components, capturing >95% variance, 
In the most useful data sets, a small number of bigger enhancing model training speed and 
eigenvalues often outnumber the others, as follows in generalization. 
Equation (4). Where the proportion maintained in the data  
format is denoted by 𝑧𝑙. While the original dataset contained 13 attributes, 5 
 additional derived features were introduced through 
𝜆
𝛾 1+𝜆2+⋯+𝜆
𝑙 = 𝑛 ≥ 80%        (5) feature engineering to enhance the model's ability to 
𝜆1+𝜆2+⋯+𝜆𝑛+⋯+𝜆𝑙 capture dynamic driving patterns and battery behavior. For 
 instance, ΔSOC (change in State of Charge) reflects short-
Principal Component Analysis (PCA) was applied to term battery discharge rates, offering temporal insights that 
reduce the dimensionality of the input space. Although the static SOC cannot. Similarly, features like speed trend and 
original dataset consisted of 13 attributes, only 12 numeric regenerative efficiency were designed to capture vehicle 
features were used for PCA, excluding the non-numeric acceleration patterns and energy recovery rates, 
target column. PCA transformed this 12-dimensional respectively. These engineered features provide higher-
feature space into 6 uncorrelated principal components, level abstractions that improve the learning model’s 
capturing over 95% of the total variance and improving contextual awareness. PCA was then applied to this 18-
model training efficiency by eliminating redundancy. After dimensional space to reduce redundancy, improve 
274   Informatica 49 (2025) 269-280       X. Zhang et al. 
 
generalization, and retain the most informative patterns by  
selecting 6 principal components that preserved over 95% 
of the variance. 
 
3.4 SSB-MADQN 
The SSB-MADQN is a novel framework for dynamic 
energy management in NEVs. It integrates the SBO to 
enhance agent policy optimization and exploration within 
a MADQN environment. By enabling decentralized 
cooperation among energy management agents, SSB-
MADQN effectively balances power delivery among both 
the engine and battery, optimizes fuel consumption, and 
mitigates battery degradation under diverse driving cycles. 
The scalable design ensures adaptability across vehicle 
platforms, while the optimizer enhances learning 
efficiency, making SSB-MADQN a robust solution for 
real-time, intelligent NEV energy management. 
 
 
 
3.4.1 MADQN 
Figure 5: MADQN architecture 
The MADQN enables dynamic energy management in 
 
NEVs by allowing multiple agents (engine, battery, motor) 
In multi-agent reinforcement learning, the replay buffer 
to learn cooperative strategies. Through DRL, each agent 
holds all agents' experiences, which frequently include 
optimizes energy distribution, improving efficiency, 
shared observations, actions, and rewards to capture inter-
reducing fuel consumption, and adapting to varying 
agent relationships. Each agent's training is stabilized by 
driving conditions in real time. It uses a model-free 
the target network, which provides constant Q-value 
reinforcement learning strategy, which eliminates the need 
targets and is updated on a regular or soft basis. Q-value 
to explicitly understand the environment's dynamics. 
updates are changed by taking into account not just an 
Agent 1 observes state 𝑡𝑠 and chooses the optimal action 
agent's action and reward, but also the effect of other 
at time 𝑠 to move to state 𝑡𝑠+1 in traditional Q-learning, 
agents' activities, employing centralized training and 
based on a value model-free approach. The agent then 
decentralized execution. This allows agents to develop 
changes the Q-value after receiving an instant benefit 
coordinated methods while functioning independently 
𝑟(𝑡𝑠, 𝑏, 𝑡𝑠+1)at time 𝑠 + 1, as shown in Equation (6). 
during deployment. 
 
A replay buffer is used to retain the agent's experiences, a 
𝑄𝑠+1(𝑡𝑠, 𝑏𝑠) ← (1 − 𝛼)𝑄𝑠(𝑡𝑠, 𝑏𝑠) + 𝛼[𝑟(𝑡𝑠, 𝑏𝑠, 𝑡𝑠+1) +
target network (𝜃𝑡𝑔) replicates the main network to offer 
𝛾 max 𝑄𝑠(𝑡𝑠+1, 𝑏)]        (6) 
𝑏 a steady target for learning, and a main network 
In reinforcement learning, 𝛾 is a discount factor, parameterized by (𝜃𝑛) is used to estimate Q-values in the 
𝛾 𝑚𝑎𝑥′
𝑏𝑅𝑠(𝑡′, 𝑏′) is the discounted reward, and 𝛼 ∈ [0,1] multi-agent environment. First, agent 1 observes the 
is the learning rate. The Q-values for every potential state energy demand signal and its states at the time 𝑠 
and action for agent 1 are stored in a two-dimensional communicates with neighboring agents (states (𝑡𝑠) and 
look-up column with dimensions 𝒯 × ℬ. Consequently, policies), and selects an action (𝑏𝑠). For example, suppose 
the number of actions and states in a complex system that Agent 1 is unable to fulfill the energy storage request. 
causes the Q-table's size to grow exponentially. Figure 5 Suppose that three collaborative NEV modules (engine, 
presents the MADQN architecture. Every edge server is battery, motor) ({𝑖, 𝑟} ∈ 휀𝑛𝑏) with a strategy for new 
regarded as an agent in EV.  Figure 5 depicts the MADQN energy 𝑞𝐹,𝑗𝑖and 𝑞𝐹,𝑖𝑞 , where 𝑞𝐹,𝑗𝑖 < 𝑞𝐹,𝑖𝑞, have the 
framework utilized in the caching environment, with 
matching content. This situation results in the selection of 
architectural details. The neural networks (Main and 
the neighboring agent with energy cost, as shown in 
Target) are implemented as multilayer perceptrons, with an 
Equation (7). 
input layer matching the state dimension (e.g., 50 
 
features), two hidden layers of 128 and 64 neurons, arg max𝑏∈ℬ𝑄(𝑡𝑠, 𝑏)       𝑜 = 1 − 𝜖1 − 𝜖2  
respectively, employing ReLU activation, and an output  
𝑏
layer representing the number of potential actions (e.g., 𝑠 = { 𝑟𝑎𝑛𝑑𝑜𝑚 𝑏 ∈ ℬ        𝑜 = 𝜖   (7) 
1
two for binary caching decisions). These features are 𝑂𝑡ℎ𝑒𝑟 𝑟𝑒𝑝𝑙𝑎𝑐𝑒𝑚𝑒𝑛𝑡 𝑝𝑜𝑙𝑖𝑐𝑦 𝑏 ∈ ℬ       𝑜 = 𝜖2
critical to understanding the model's structure and ensuring  
repeatability. Furthermore, it has 𝜖1 and 𝜖2 set to decrease with time. 
 Consequently, the model will eventually choose the best 
Optimization of Dynamic Energy Management Strategy for New...                                   Informatica 49 (2025) 269-280     275 
 
course of action. It is suggested to explore if the agent does ➢ Logistic Chaos's initialization: 
not function well. A collection of recent rewards (𝑅𝐺) is Although the algorithm's initial population utilizes a 
tracked, and ∈𝑦 (𝑤ℎ𝑒𝑟𝑒  ∈𝑦  {1, 2})is updated, as shown random initialization mode according to natural law, a 
in Equation (8). The step sizes for modifying the better initialization approach would greatly accelerate the 
probability ∈𝑦 are 𝛿+ and 𝛿−, and 𝑟𝑡ℎ is a reward intelligent optimization algorithm's convergence speed. 
threshold. The population is also initialized by the SB using random 
 values. A logistic chaos map was created to improve the 
∈ starting population's diversity, which in turn led to a better-
𝑦+ 𝛿+,   𝔼(𝑄𝐺) < 𝑟𝑡ℎ
∈𝑦= {         (8) 
∈𝑦− 𝛿−,   𝔼(𝑄𝐺) ≥ 𝑟 starting population, which improved the algorithm's 
𝑡ℎ
accuracy and speed of convergence. Equation (10) 
 
illustrates the logistic chaos map calculating method. 
The agent moves on to the next state (𝑡𝑠 + 1) for the 
 
selected action (𝑏𝑠), preserves moving in the replay buffer 
𝑊𝑗+1 = 𝜇𝑊𝑗 ∗ (1 − 𝑊𝑗)       (10) 
of size, and receives an instant benefit (𝑟𝑠 + 1). During the 
 
training stage, agent 1 uses mini-batch descent to train the 
primary network after selecting a mini-batch of size 𝐴 The control parameters 𝜇 have a value range of 0 𝑡𝑜 4. 
 
There will be more confusion when the number of 𝜇 is 
from the replay buffer. In every 𝐼 step, the target network 
replicates the primary network to provide learning higher. The chaotic initialization effect will be amplified𝜇. 
stability, as follows in Equation (9). Equation (11) is used as the population initialization. 
  
𝑄 𝑝𝑜𝑝(𝑗). 𝑃𝑜𝑠𝑖𝑡𝑖𝑜𝑛 = 𝑌(𝑗, : ).∗ (𝑉𝑎𝑟𝑀𝑎𝑥 − 𝑉𝑎𝑟𝑀𝑖𝑛) +
𝑠+1 = (𝑡𝑠, 𝑏𝑡) ← (1 − 𝛼)𝑄𝑠(𝑡𝑠, 𝑏𝑡; 𝜃𝑛) +
𝛼[𝑟(𝑡𝑠, 𝑏𝑡 , 𝑇𝑠+1) + 𝛾 𝑚𝑎𝑥 𝑄
𝑏 𝑠(𝑡𝑠+1, 𝑏; 𝜃𝑠ℎ) 𝑉𝑎𝑟𝑀𝑖𝑛       (11) 
−
➢ The cauchy variation method: 
𝑄𝑠(𝑡𝑠, 𝑏𝑡; 𝜃𝑛)] + ∑𝑖𝜖𝑀 𝑤𝑗𝑖𝑄𝑠−1(𝑡𝑠, 𝑏𝑡; 𝜃𝑛)       (9) 
𝑓 Instead of using the conventional SB mutation technique, 
 which produces a shorter peak dispersed at the origin and 
Where 𝜔𝑗𝑖  is modeled as inversely proportional to the a longer spread in the remainder, the Cauchy mutation 
EMS(𝑟𝐹, 𝑦𝑥) among 𝑖and 𝑗, and is used to highlight the strategy guarantees more disruption near the current 
effect of neighbor 𝐼 on agent 1.  population. Equation (12) shows the Cauchy variation 
 approach. 
3.4.2 SSB  
The traditional Satin Bowerbird (SB) optimizer struggles 𝑊𝑠+1
𝑗,𝑖 = 𝑊𝑏𝑒𝑠𝑡 + 𝐶𝑎𝑢𝑐ℎ𝑦(0,1)⨁ 𝑊𝑏𝑒𝑠𝑡(𝑠)       (12) 
to effectively manage the complex, dynamic, and multi-  
objective nature of energy management strategies in new Where 𝑊𝑏𝑒𝑠𝑡(𝑠) is the location of an individual that 
energy vehicles (NEVs). It lacks the scalability and the requires variation, and 𝐶𝑎𝑢𝑐ℎ𝑦   (0,1) is the typical 
ability to deal with several competing priorities, including Cauchy distribution. Equation (13) computes the relevant 
fuel consumption, battery capacity, and reducing battery variation probability.  
degradation. The basic SB algorithm lacks mechanisms for  
efficiently navigating high-dimensional search spaces or 𝑖𝑡 20
adapting to rapidly changing driving conditions. It also 𝑂𝑡 = − exp (1 − ) + 𝑜      (13) 
𝑀𝑎𝑥𝐼𝑡
falls short in maintaining solution diversity and handling  
trade-offs among multiple objectives, often leading to Both the current is represented by 𝑀𝑎𝑥𝐼𝑡, where 𝑜 is set at 
premature convergence or local optima. Furthermore, its 0.05. The procedure of the Cauchy mutation will not be 
limited ability to handle real-time updates and high- carried out if 𝑞 and < 𝑃𝑠. Table 2 shows the 
dimensional decision spaces reduces its effectiveness in hyperparameters of SSB. 
dynamic driving conditions, prompting the need for SSB’s chaotic initialization improves exploration by 
improved approaches like the Scalable SB (SSB) ensuring diverse initial solutions, avoiding local optima, 
optimizer. SSB efficiently balances energy distribution and speeding up convergence. The Cauchy variation, with 
between battery and engine systems, adjusts to various its heavy-tailed distribution, enables larger step sizes, 
driving schedules, speeds up how policies are learned and improving the algorithm's capacity to escape local minima 
helps achieve better fuel efficiency, fewer emissions, and and strike a better balance between exploration and 
longer life of the vehicle battery in complex driving exploitation. These traits exceed typical heuristics, 
situations.  allowing for faster and more efficient optimization. 
  
 
 
 
 
276   Informatica 49 (2025) 269-280       X. Zhang et al. 
 
Table 2: Hyperparameters of SSB 
 
No. Hyperparameter Symbol / Name Typical Value Description 
/ Range 
1 Population size P 5 – 50 Number of candidate 
solutions (bowerbirds) 
2 Maximum iterations MaxIter 10 – 100 Maximum SBO optimization 
cycles 
3 Attraction coefficient α 0.05 – 0.3 Strength of movement 
toward better solutions 
4 Random scaling factor rand () [0, 1] Random noise for solution 
diversification 
5 Learning rate search LR_range [0.0001, 0.01] Search space for learning rate 
range 
6 Epsilon search range ε_range [0.1, 1.0] Exploration rate range 
7 Discount factor search γ_range [0.8, 0.99] Reward discount factor range 
range 
8 Fitness function F(x) Avg episodic Evaluate solution quality 
reward 
9 Movement formula x_new = x + α * rand () — Bowerbird movement update 
* (x_best - x) 
10 Dimensionality of D 3 Parameters optimized (LR, ε, 
solution γ) 
4 Results and discussion 4.1 Confusion matrix 
The result comparison parameters, such as EMS The results of the confusion matrix are shown in Figure 6. 
optimization results for different strategies under WLTC, The model accurately predicted all classes: 152 samples as 
EMS optimization results for different strategies under class 0, 777 as class 1, and 71 as class 2, with zero 
HWFET, and control action, are used to demonstrate the misclassifications. This indicates that the energy 
comparison of the proposed model, SSB-MADQN, for management model is highly effective in correctly 
energy management strategy for new energy with the categorizing vehicle energy efficiency levels or strategies 
existing techniques, such as MADDPG [9] and Deep Q- with no false positives or negatives across all classes. The 
learning Adaptive Moment Estimation (DQL-AMSGrad) predicted classes represent EMS efficiency levels: 0 
[16]. The experimental setup is presented in Table 3. (High), 1 (Medium), and 2 (low).  
  
Table 3: Experimental setup  
 
Projects Environment 
Operating System Windows 10(x64) 
CPU i5-9500HF 
CPU@2.40GHz 
Memory Size 32GB 
GPU NVIDIA GeForce GTX 
2080 Ti 
CUDA Version 10.2 
Python Version 3.8 
Episode count 1000 
Batch size 64 
Convergence Training stops when 
criteria reward, loss, episodes, or  
epsilon criteria are met.  
 Figure 6: Confusion matrix outcomes 
  
Optimization of Dynamic Energy Management Strategy for New...                                   Informatica 49 (2025) 269-280     277 
 
4.2 Battery degradation distribution 
The distribution of battery degradation in NEV highlights 
a concentration of around 10%. It suggests significant wear 
under certain conditions, necessitating a dynamic energy 
management strategy. By integrating real-time degradation 
data, NEVs can optimize engine-battery energy 
distribution, extend battery life, and improve energy 
efficiency, especially under high-degradation scenarios. It 
supports adaptive, data-driven decision-making for 
sustainable vehicle performance. Figure 7 presents the 
distribution of battery degradation outcomes. 
 
Figure 8: Graphical representation of WLTC 
 
4.4 HWFET 
According to the HWFET driving cycle, SSB-MADQN 
 
 performs better than MADDPG when optimizing the EMS 
Figure 7: Distribution of battery degradation outcomes system. It achieves a higher terminal SOC (0.603 vs. 
 0.556), reduced equivalent fuel consumption (0.681 L vs. 
4.3 WLTC  0.734 L), and better fuel efficiency (4.121 L/100km vs. 
4.446 L/100km), indicating improved energy recovery and 
The EMS optimization results under the WLTC driving reduced fuel usage in dynamic energy management for 
cycle show that the proposed SSB-MADQN method NEVs. Figure 9 presents the EMS optimization under 
outperforms the existing method, MADDPG. SSB- HWFET. 
MADQN achieves a higher terminal SOC (0.643 vs.  
0.598), lower equivalent fuel consumption (0.912 L vs. 
0.977 L), and improved fuel efficiency (3.864 L/100km vs. 
4.199 L/100km), demonstrating its effectiveness in 
dynamic energy management for NEVs by enhancing 
energy utilization and reducing fuel use. Figure 8 presents 
the EMS optimization under WLTC. 
 
 
 
Figure 9: Graphical Representation of HWFET 
 
 
 
278   Informatica 49 (2025) 269-280       X. Zhang et al. 
 
4.5 Control action smoother transitions and a peak of 1.7, reflecting improved 
A comparison of control action variations over time in responsiveness and stability. It suggests SSB-MADQN's 
dynamic energy management for NEVs. DQL-AMSGrad superior performance in managing energy distribution 
shows fluctuating control values, peaking at 1.5, indicating dynamically and efficiently in NEV systems. Table 4 and 
moderate adaptability. The proposed SSB-MADQN model Figure 10 show control action outcomes. 
consistently yields slightly higher control actions, with  
 
Table 4: Control action outcomes 
 
Model 10 20 30 40 50 60 70 80 90 100 
DQL- 1.3 0.4 0.3 1.0 0.1 0.8 1.2 1.5 0.1 0.3 
AMSGrad 
[16] 
SSB- 1.5 0.6 0.7 1.2 0.2 1.0 1.4 1.7 0.3 0.6 
MADQN 
[proposed] 
  
 
4.6 Performance metrics summary of SSB-
MADQN for NEV energy management 
The primary performance metrics of the proposed multi-
agent deep reinforcement learning framework applied to 
dynamic energy management in new energy vehicles 
(NEVs). Metrics include fuel consumption, battery SOC 
limits, battery degradation rate, and computational 
efficiency during both training and real-time inference. 
These results demonstrate the framework’s effectiveness 
in balancing energy usage and system longevity. Table 5 
 
displays the SSB-MADQN performance.  
 
 
Figure 10: Graphical representation of control action 
 
Table 5: Key results of SSB-MADQN performance 
 
Performance metric SSB-MADQN (Proposed) 
Fuel Usage 3.4 L/100km 
SOC Bounds 20% – 80% 
Degradation Rate (%) 0.72% 
Training Time 4.1 hours 
Inference Time 14 ms 
  
5 Comparative analysis with existing complex multi-agent interactions in dynamic NEV energy 
environments. Such technology mandates a large amount 
systems 
of training  
material alongside powerful computing capabilities. The 
A dynamic EMS for NEVs optimizes power distribution 
integration of DQL-AMSGrad with adaptive learning rates 
between the battery and engine in real-time, enhancing 
facilitates better convergence, but it performs poorly with 
energy efficiency, reducing emissions, and adapting to 
the continuous action spaces regularly found in NEV 
varying driving conditions. MADDPG faces limitations in 
energy systems. The decision-making processes of these 
scalability and convergence stability when managing  
Optimization of Dynamic Energy Management Strategy for New...                                   Informatica 49 (2025) 269-280     279 
 
methods show poor adaptation to sudden driving condition References 
changes, along with restricted performance across 
different driving cycles, which affects real-time decisions [1] Wang, Y., Wu, Y., Tang, Y., Li, Q., & He, H. (2023). 
in NEVs. The proposed SSB-MADQN enhances Cooperative energy management and eco-driving of 
scalability and convergence stability by integrating the plug-in hybrid electric vehicle via multi-agent 
SSB with MADQN, enabling efficient exploration and reinforcement learning. Applied Energy, 332, 120563. 
exploitation in complex NEV environments. The system https://doi.org/10.1016/j.apenergy.2022.120563 
successfully deals with complex action spaces together [2] Yang, N., Han, L., Liu, R., Wei, Z., Liu, H., & Xiang, 
with dynamic driving conditions because it learns quickly C. (2023). Multiobjective intelligent energy 
and provides reliable real-time energy management management for hybrid electric vehicles based on 
functionality that outperforms MADDPG and DQL- multiagent reinforcement learning. IEEE 
AMSGrad by showing better adaptability and Transactions on Transportation Electrification, 9(3), 
generalization over several driving cycles. The proposed 4294-4305. 
strategy relies heavily on high-quality simulations, which https://doi.org/10.1109/TTE.2023.3236324 
may not fully capture real-world complexities. [3] Gautam, A. K., Tariq, M., Pandey, J. P., Verma, K. S., 
Additionally, there is a lack of real-world validation, and & Urooj, S. (2022). Hybrid sources powered electric 
the interpretability of multi-agent reinforcement learning vehicle configuration and integrated optimal power 
models remains a challenge, hindering broader practical management strategy. IEEE Access, 10, 121684-
adoption. 121711.https://doi.org/10.1109/ACCESS.2022.32177
 71 
6 Conclusion [4] Jiang, Q., & Wang, H. (2025). Risk Assessment 
Method for New Energy Vehicle Supply Chain Based 
Energy efficiency and operational performance in NEVs on Hierarchical Holographic Model and Matter 
have significantly improved through the application of AI- Element Extension Model. Informatica, 49(7). 
driven optimization strategies. The suggested SSB- https://doi.org/10.31449/inf.v49i7.6953.  
MADQN architecture used MARL to allow cooperative [5] Hu, H., Yuan, W. W., Su, M., & Ou, K. (2023). 
agents to control the engine and battery's power allocation Optimizing fuel economy and durability of hybrid fuel 
in real time under various driving circumstances. Data cell electric vehicles using deep reinforcement 
preprocessing methods, such as data cleaning and min- learning-based energy management systems. Energy 
max normalization, and PCA employed for feature Conversion and Management, 291, 117288. 
extraction, ensured consistency, reduced dimensionality, https://doi.org/10.1016/j.enconman.2023.117288 
and enhanced model learning. Experimental results [6] Bakare, M. S., Abdulkarim, A., Shuaibu, A. N., & 
revealed notable improvements, with fuel consumption Muhamad, M. M. (2024). Energy management 
reduced under WLTC compared to MADDPG, achieving controllers: strategies, coordination, and 
a final consumption of 3.864 L/100km, and similarly under applications. Energy 
HWFET with a reduction to 4.121 L/100km. These Informatics, 7(1),57.https://doi.org/10.1186/s42162-
outcomes confirm the effectiveness of intelligent EMS in 024-00357-9 
achieving adaptive and globally optimized energy [7] Rawat, R., Borana, K., Gupta, S., Ingle, M., 
strategies for NEVs. The limitations of relying solely on Dibouliya, A., Bhardwaj, P., & Rawat, A. (2025). 
simulation-based testing and plans to incorporate real- Enhancing OSN Security: Detecting Email Hijacking 
world ECU-in-the-loop evaluation to enhance validation. and DNS Spoofing Using Energy Consumption and 
Another key challenge is the interpretability of the MARL Opcode Sequence Analysis. Informatica, 49(2). 
model, for which we plan to adopt explainability https://doi.org/10.31449/inf.v49i2.6956.  
techniques such as SHAP or LIME to analyze Q-values [8] Hua, M., Zhang, C., Zhang, F., Li, Z., Yu, X., Xu, H., 
and better understand agent decisions. Additionally, & Zhou, Q. (2023). Energy management of multi-
potential deployment on edge computing platforms like mode plug-in hybrid electric vehicle using multi-agent 
NVIDIA Jetson is being considered to assess real-time deep reinforcement learning. Applied Energy, 348, 
feasibility. The proposed approach shows strong potential 121526.https://doi.org/10.1016/j.apenergy.2023.1215
for real-time EMS in NEVs by leveraging decentralized 26 
agents and a powerful optimizer for high-dimensional [9] Li, X., Zhou, Z., Wei, C., Gao, X., & Zhang, Y. (2025). 
spaces. However, to strengthen its scientific contribution, Multi-objective optimization of hybrid electric 
future work should focus on improving algorithm vehicles energy management using multi-agent deep 
transparency, ensuring rigorous experimentation, and reinforcement learning framework. Energy and 
incorporating advanced statistical techniques for deeper AI, 20, 100491. 
validation and performance comparison. https://doi.org/10.1016/j.egyai.2025.100491 
 
280   Informatica 49 (2025) 269-280       X. Zhang et al. 
 
[10] Alqahtani, M., Scott, M. J., & Hu, M. (2022). 
Dynamic energy scheduling and routing of a large 
fleet of electric vehicles using multi-agent 
reinforcement learning. Computers & Industrial 
Engineering, 169, 108180. 
https://doi.org/10.1016/j.cie.2022.108180 
[11] Monfaredi, F., Shayeghi, H., & Siano, P. (2023). 
Multi-agent deep reinforcement learning-based 
optimal energy management for grid-connected 
multiple energy carrier microgrids. International 
Journal of Electrical Power & Energy Systems, 153, 
109292. https://doi.org/10.1016/j.ijepes.2023.109292 
[12] Kaewdornhan, N., Srithapon, C., Liemthong, R., & 
Chatthaworn, R. (2023). Real-time multi-home energy 
management with EV charging scheduling using 
multi-agent deep reinforcement learning 
optimization. Energies, 16(5), 2357. 
https://doi.org/10.3390/en16052357 
[13] Louati, A., Louati, H., Kariri, E., Neifar, W., Hassan, 
M. K., Khairi, M. H., ... & El-Hoseny, H. M. (2024). 
Sustainable smart cities through multi-agent 
reinforcement learning-based cooperative 
autonomous vehicles. Sustainability, 16(5), 
1779.https://doi.org/10.3390/su16051779 
[14] Zhang, W., Liu, H., Wang, F., Xu, T., Xin, H., Dou, D., 
& Xiong, H. (2021, April). Intelligent electric vehicle 
charging recommendation based on multi-agent 
reinforcement learning. In Proceedings of the Web 
Conference 2021 (pp. 1856-1867). 
https://doi.org/10.1145/3442381.3449934 
[15] Al-Saffar, M., & Gül, M. (2023). Data-efficient 
MADDPG based on self-attention for IoT energy 
management systems. IEEE Access, 11, 109379-
109389. 
https://doi.org/10.1109/ACCESS.2023.3322193.  
[16] Montaleza, C., Arévalo, P., Gallegos, J., & Jurado, F. 
(2024). Enhancing energy management strategies for 
extended-range electric vehicles through deep Q-
learning and continuous state 
representation. Energies, 17(2), 
514.https://doi.org/10.3390/en17020514 s 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.7724 Informatica 49 (2025) 281–298 281 
Hybrid Machine Learning and Optimization Algorithms for pH-
Based Water Quality Classification 
 
Xiaolin Li 1, * and Baomeng Pang2 
1School of Intelligent Manufacturing, Qingdao Huanghai University, Qingdao, Shandong, 266427, China 
2Shandong HI-SPEED Maintenance GROUP CO., LTD, Jinan, Shandong, 250000, China 
E-mail: lxl123101321@163.com  
*Corresponding author 
Keywords: water quality, PH level, machine learning, support vector classifier, extra trees classifier, optimization 
algorithms  
Received: December 2, 2024 
Water quality—defined through its physical, chemical, and biological parameters—is essential for critical 
applications such as drinking and irrigation. Among these parameters, pH plays a significant role by 
influencing metal solubility and nutrient availability, thereby impacting aquatic ecosystems. In this study, 
Support Vector Classifier (SVC) and Extra Trees Classifier (ETC) were employed to classify water quality 
based on pH values. To boost classification accuracy, the models were hybridized using two advanced 
metaheuristic algorithms: Transit Search Optimization Algorithm (TSOA) and Chaos Game Optimization 
(CGO), resulting in hybrid variants ETTS, ETCG, SVTS, and SVCG. Comprehensive experiments were 
conducted using standard evaluation metrics. The ETTS model achieved the best performance, with 
training accuracy of 0.910 and testing accuracy of 0.778, along with a precision of 0.911, recall of 0.910, 
and F1 score of 0.910 in training. In contrast, the base ETC model recorded training and testing 
accuracies of 0.881 and 0.750, respectively. Similarly, SVTS and SVCG outperformed the base SVC 
model, with SVTS achieving training and testing accuracies of 0.894 and 0.760, compared to SVC’s 0.850 
and 0.745. The proposed hybrid framework outperforms traditional SVC and ETC models and 
demonstrates superior classification performance compared to standard non-optimized baselines. This 
underscores the value of integrating advanced optimization techniques with machine learning for robust 
and reliable water quality assessment. The framework is a promising tool for environmental monitoring, 
promoting sustainable water resource management and public health protection. 
Povzetek: Študija je razvila hibridne modele strojnega učenja za klasifikacijo kakovosti vode na podlagi 
pH-vrednosti. Kombinacija klasifikatorjev Extra Trees (ETC) in Support Vector Classifier (SVC) z 
metahevrističnimi algoritmi TSOA in CGO (npr. ETTS, SVTS) je izboljšala klasifikacijo. Model ETTS je 
dosegel najboljšo zmogljivost, kar potrjuje prednost hibridnega okvira za okoljsko spremljanje. 
 
 
1 Introduction enhance problems relating to water shortages or pollution. 
Efficiency in water management and water quality 
prediction plays an important role in ensuring safety and  
1.1 Background sustainability in the use of water [2]. These are some of 
Water is as familiar a material as air, earth and concrete, the issues that emanate from a lack of adequate 
Water is necessary for life for humans and other forms of hydrological cycles, methods of water management, and 
life, much like the other three materials—well, maybe knowledge concerning the various human activities 
with exception of concrete. It is voluminous: about 3.5 % impacting catchments of water. To this end, technological 
of the land area is permanently flooded, whereas two and policy development remains highly critical to ensure 
thirds of the world is under the oceans. About the the sustainability of the use and delivery of water, 
hydrosphere, water is continuously evaporating from the protection of public health, and economic development 
Earth's surface into condensing in the atmosphere, [3].  
reappearing as liquid. Earth's supply of water is now at an Water quality is basically related to its physical, 
all-time high and will never be depleted [1]. Although chemical, and biological characteristics, making it suitable 
abundant, the water resources distributed unevenly in for various purposes, such as drinking, gardening, and 
different regions in some serious respects impede certain leisure activities. During any water quality assessment, 
regions. As the population rises, industrialization turbidity, the microbiological content, and concentrations 
increases, and even more factors such as climate change  of both organic and inorganic compounds are amongst the 
282   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
more commonly measured parameters [4]. The datasets. Additionally, real-time pH prediction, a critical 
degradation of water quality is a consequence of the parameter in assessing water quality, has not been 
current process of urbanization, agricultural runoff, and extensively explored using hybrid ML-optimization 
industrial wastes. Some contaminants such as heavy techniques, especially in scenarios where both historical 
metals, pesticides, and viruses may result in serious and real-time data are available. 
human health hazards and ecosystem health. Good water To address these gaps, this study proposes a novel 
grading control will require technological advancement, framework that integrates SVM, ETC, TSOA and CGO. 
community participation, and regulatory mechanisms. The These techniques are applied to predict and classify water 
implementation of best practices in pollution prevention, pH levels using historical and sensor-based real-time 
wastewater treatment, and watershed management will datasets. The objectives of this research are: 
ensure the sustainability of water resources through better • To develop and compare ML models capable of 
maintenance of their quality [5]. accurately predicting water pH levels using both 
One of the factors influencing the 𝑝𝐻 of water and historical and real-time input data; 
hence its chemical behavior and its biological availability • To optimize model performance using the Chaos 
is the concentration of hydrogen ions in it. Basically, 𝑝𝐻 Game Optimization algorithm, ensuring more 
is the measure of the concentration of hydrogen ions in reliable and efficient learning from complex 
water. It runs on a scale from 0 to 14, with 7 to 8 being datasets; 
considered neutral, 0 to 7 considered acidic, and 8 to 14 • To evaluate the classification capabilities of the 
considered basic. 𝑃𝐻 influences the solubility of metals Extra Trees Classifier and SVM in distinguishing 
and nutrients' availability, along with activity concerning water quality categories based on pH thresholds; 
aquatic organisms.  • To demonstrate the feasibility of a hybrid ML-
Machine learning, as a multidisciplinary subset of optimization approach for proactive and 
artificial intelligence, develops algorithms with which sustainable water quality monitoring. 
computers can evaluate, comprehend, and predict data [6–
9]. It has powerful capabilities for identification, data 2 Related works 
analysis, and decision making and has already revamped 
many disciplines. The application of machine learning Idroes et al. [15] conducted a study to predict urban air 
techniques is on the increase in environmental research to quality in DKI Jakarta, Indonesia, using the CATBoost 
enhance our understanding and management through the machine learning algorithm, which is known for handling 
modeling of environmental processes, analysis of large- categorical features effectively, managing missing values, 
scale information, and predictions of future conditions and reducing the risk of overfitting. The research utilized 
[10]. The most promising application would, therefore, be air quality data collected from Jakarta's monitoring 
in the monitoring of water quality through management stations over the period of 2010 to 2021. The dataset 
using machine learning. With the derivation of large data included five key pollutants: PM₁₀, SO₂, CO, O₃, and NO₂. 
sets from sensors and satellite images, coupled with After a preprocessing stage that involved data cleaning 
historical records, it will be possible for machine learning and normalization, the authors split the dataset into 
models to develop leading trends, anomalies, and training (80%) and testing (20%) subsets. The CATBoost 
predictions of water quality parameters with high accuracy model was trained and evaluated using standard 
[11] [12]. These capabilities enable more proactive and performance metrics, where it achieved high accuracy 
effective water management strategies, reducing (0.9781), precision (0.9722), and recall (0.9728). A 
pollution, optimizing resource allocation, and protecting feature importance analysis revealed that ozone (O₃) was 
public health. The integration of machine learning into the the most significant contributor to air quality variation, 
water quality monitoring system is one of the huge leaps followed by PM₁₀. Sasmita et al. [16] investigated the 
forward in environmental science and technology [13] classification of air quality levels in Indonesia using the 
[14]. Plume Air Quality Index (PAQI), which incorporates 
pollutant concentrations such as PM₂.₅, PM₁₀, NO₂, and 
1.2 Research gaps and objectives O₃. The study focused on evaluating classification 
performance using Decision Tree and K-Nearest Neighbor 
Despite the increasing application of ML in water quality (k-NN) algorithms, applied to secondary data collected 
prediction, significant challenges persist. Traditional from 33 provincial capitals between July 1 and December 
approaches often struggle with the nonlinearity and 31, 2022. Unlike prior studies that typically assessed 
complex variability of environmental data, which limits model performance solely based on accuracy, this 
their predictive accuracy and generalizability across research adopted a more comprehensive evaluation 
diverse contexts. Furthermore, while various studies have approach by incorporating precision, recall, and F1-score 
employed models like MLR, ANN, and SVM, many lack alongside accuracy. The results demonstrated that the 
the integration of robust optimization algorithms to fine- Decision Tree classifier outperformed k-NN, achieving 
tune model parameters and enhance performance. performance scores of 90.67% accuracy, 90.61% 
Another notable gap is the underutilization of precision, 90.67% recall, and 90.63% F1-score. These 
ensemble tree-based methods such as the ETC, which are findings suggest that tree-based models can provide robust 
known for their resilience to noise and their ability to classification capabilities for air quality indexing, 
capture intricate relationships within high-dimensional supporting more reliable monitoring and decision-making 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   283 
regarding urban environmental health. Putra et al. [17] through accurate pollutant classification. Saxena and 
addressed the critical issue of deteriorating air quality in Shekhawat [18] proposed a novel mathematical 
Indonesia’s major cities, with a focus on Jakarta, where framework to compute a Cumulative Index (CI) for air 
urbanization and anthropogenic activities such as quality classification based on the concentrations of four 
vehicular emissions, industrialization, and waste major pollutants: SO₂, NO₂, PM2.5, and PM10. This CI 
accumulation have significantly impacted atmospheric served as a compact, interpretable metric reflecting the 
conditions. Their study aimed to classify daily air quality combined impact of pollutants on air quality. Using these 
using machine learning algorithms—specifically the C5.0 CI values as input features, they developed a two-class 
algorithm and Random Forest—based on the Air Pollution Support Vector Machine (SVM) model to classify air 
Standard Index (ISPU). These models were applied to quality as either good or harmful. To optimize the 
datasets from 2017 and 2018, consisting of pollutant performance of the SVM, the authors employed the Grey 
parameters including CO, NO₂, SO₂, PM, O₃, and NO. Wolf Optimizer (GWO) for parameter tuning, aiming to 
Their classification approach emphasized the importance maximize classification accuracy. The methodology was 
of accurately identifying air quality categories to support tested on real datasets from three major Indian cities—
policy-making. The models demonstrated high predictive Delhi, Bhopal, and Kolkata. The results indicated that the 
accuracy, with C5.0 and Random Forest achieving proposed classifier effectively distinguished between the 
99.74%, 99.22%, and 99.97% accuracy on the 2017 two air quality categories, with high classification 
dataset and 98.28%, 98.85%, and 97.42% on the 2018 performance across all test locations. The study concluded 
dataset, respectively. The analysis identified O₃ (ozone) as that the CI-based classification framework was both 
the most influential factor in classifying air quality, with computationally efficient and aligned well with actual air 
most days falling under the "Moderate" ISPU category. quality data, making it a promising tool for public health 
This work highlights the potential of decision tree-based and environmental monitoring. The summary of the 
algorithms in supporting urban air quality management previous studies reported in Table 1. 
Table 1: The summary of the related works. 
Study Methodology Dataset Metrics’ results Key Findings 
Idroes et al. CATBoost machine Air quality data from Accuracy: 0.9781, Ozone (O₃) and PM₁₀ 
[15] learning for air quality Jakarta monitoring Precision: 0.9722, most significant 
prediction. stations (2010-2021). Recall: 0.9728 pollutants. 
Pollutants: PM₁₀, SO₂, 
CO, O₃, NO₂. 
Sasmita et Classification using Secondary data from Accuracy: 90.67%, Decision Tree 
al. [16] Decision Tree and k-NN 33 provincial capitals Precision: 90.61%, outperformed k-NN 
algorithms. in Indonesia (2022). Recall: 90.67%, F1: for classification 
Pollutants: PM₂.₅, 90.63% tasks. 
PM₁₀, NO₂, O₃. 
Putra et al. Classification using C5.0 Air quality data (2017- C5.0: 99.74% Ozone (O₃) as the 
[17] and Random Forest 2018). Pollutants: CO, (2017), 98.28% most influential 
algorithms. NO₂, SO₂, PM, O₃, NO. (2018), RF: 99.22% factor in classifying 
(2017), 98.85% air quality. 
(2018) 
Saxena and Support Vector Machine Real datasets from Classification CI-based 
Shekhawat (SVM) classification with three Indian cities performance: High classification 
[18] Grey Wolf Optimizer (Delhi, Bhopal, accuracy for all test framework is 
(GWO) for parameter Kolkata). Pollutants: locations computationally 
tuning. SO₂, NO₂, PM₂.₅, PM₁₀. efficient. 
 
level of the water, whether it be basic, alkaline, or acidic. 
3 Materials and methodology   Data recording over some period gathered daily data on 
water quality. In this case, the 'Date' variable provides for 
the exact day (a day in every two weeks) certain data was 
3.1 Data gathering taken and offers a time-series track showing 
Water quality data were collected in a systematic manner environmental change over time. Salinity, representing the 
and analyzed for different environmental parameters and concentration of dissolved salts in water, can directly 
their relations to 𝑝𝐻 values. The dataset used in the present influence pH levels by altering the ionic balance and 
study derived from [19] incorporates 1320 records in buffering capacity of the water body. Variations in salinity 
total, and each of the following input parameters has been may therefore contribute to shifts in pH, particularly in 
included in the dataset: Date, Salinity, Dissolved Oxygen, estuarine and coastal environments. Dissolved oxygen 
Secchi Depth, Water Depth, Water Temperature, and Air (DO), essential for aquatic life, can also impact pH 
Temperature. The output variable analyzed here is the 𝑝𝐻 through biological processes such as respiration and 
284   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
photosynthesis, which either consume or release CO₂, dots below 0.4 meters for water depth highlights that most 
thereby influencing acidity. Secchi Depth, a measure of water samples were taken from shallow depths, with 
water transparency determined by noting the depth at deeper samples being rare. The output 𝑝𝐻 plot illustrates 
which a Secchi disk disappears, can serve as an indirect the red dots form distinct horizontal bands, suggesting that 
indicator of photosynthetic activity, which affects CO₂ 𝑝𝐻 measurements are discrete rather than continuous. 
levels and thus the pH. Water Depth at the sampling This discrete distribution is crucial for classifying water 
location affects both light availability and thermal quality based on 𝑝𝐻 levels. 
stratification, which can influence biological activity and To support the development and execution of the 
chemical reactions that regulate pH. Water Temperature proposed models, a high-performance desktop 
and Air Temperature offer insight into thermal conditions workstation was utilized. This system is equipped with an 
that affect metabolic rates of organisms and chemical Intel® Core™ i7-3770K processor clocked at 3.50 GHz 
equilibria, both of which can influence pH values. The and complemented by 16 GB of RAM, ensuring efficient 
primary focus of this study was on pH levels, a key processing and multitasking capabilities. The operating 
parameter in assessing water quality. In the dataset, pH system used was Windows 11 Pro (64-bit), running on an 
values were categorized and analyzed as follows: Acidic x64-based architecture. Visual computations and 
(pH < 7) with 433 instances, Neutral (pH = 7) with 617 graphical rendering were handled by an NVIDIA GeForce 
instances, and Basic (pH > 7) with 280 instances. Each of GT 640 graphics card, which contributed to a responsive 
the variables was examined in relation to these pH and stable graphical environment. A 1 TB internal hard 
categories to explore their predictive relevance. disk served as the primary storage medium, providing 
Figure 1. consists of several parallel plots, the 𝑥 − ample space for managing datasets and associated files. 
𝑎𝑥𝑖𝑠 in each plot represents the total number of samples, All programming tasks were conducted using Python. 
providing a consistent framework for comparing the The scikit-learn library formed the foundation for building 
distribution of each parameter. 𝑇ℎ𝑒 𝑦 − 𝑎𝑥𝑖𝑠, 𝑣𝑎𝑟𝑖𝑒𝑠 and assessing machine learning algorithms. Data 
according to the parameter being measured, showing the preparation and numerical analysis were facilitated by 
specific quantity for each sample. The red dots effectively Pandas and NumPy, respectively. To aid in visual 
illustrate the range and concentration of values for each interpretation of results, Matplotlib was employed, 
parameter, offering an unambiguous graphic depiction of enabling clear and informative graphical outputs 
the data's distribution. For instance, the clustering of red throughout the analysis process. 
 
Figure 1: The parallel plot of the inputs and outputs variables 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   285 
𝑁
3.2 Support vector classification ∑𝑎𝑖𝑦𝑖 = 0 (6) 
𝑖=1
Support Vector Classification (SVC) is a supervised 0 ≤ 𝑎𝑖 ≤ 𝐶𝑠𝑣𝑐                                  𝑖 = 1, . . . , 𝑁 (7) 
learning algorithm rooted in the structural risk 
A kernel function, denoted as 𝐾(𝑥𝑖 , 𝑥𝑗), computes the 
minimization principle of Support Vector Machines 
inner product between pairs of input samples implicitly 
(SVM) [20]. It operates by mapping input features into a 
mapped into a high-dimensional feature space, enabling 
higher-dimensional space through non-linear kernel 
nonlinear classification without explicitly performing the 
transformations, enabling the separation of data that is not 
transformation. Common kernel types include linear, 
linearly separable in the original feature space. In this 
polynomial, radial basis function (RBF), and sigmoidal 
transformed space, SVC constructs an optimal hyperplane 
kernels, among others. For a kernel to be valid, it must 
that maximizes the margin — defined as the distance 
satisfy Mercer's conditions—specifically, it must be 
between the hyperplane and the closest data points from 
symmetric and positive semi-definite. Extensive studies 
each class, known as support vectors — while 
have shown that the RBF kernel, formally defined in Eq. 
simultaneously minimizing classification errors [21]. This 
(8), is particularly effective for classification problems 
balance between margin maximization and error 
due to its localized response and flexibility. Accordingly, 
minimization contributes to the model’s generalization 
the RBF kernel is adopted in our methodology, where the 
capability and robustness. 
hyperparameter 𝛾 governs the inverse of the squared 
‖𝑊‖2 𝑁
𝑚𝑖𝑛𝑤,𝑏,∈ + 𝐶
2 𝑠𝑣𝑐 ∑ ∈𝑖 (1) radius of influence of the support vectors, effectively 
controlling the decision boundary's smoothness and 
𝑖=1
𝑦𝑖(𝑤
𝑇 . ∅(𝑥𝑖) + 𝑏) ≥ 1 −∈𝑖           𝑖 = 1, . . . , 𝑁 (2) sensitivity to individual data points. 
∈𝑖≥ 0                                               𝑖 = 1, . . . , 𝑁 (3) 𝐾(𝑥𝑖 , 𝑥𝑗) = ∅(𝑥𝑖)
𝑅∅(𝑥𝑗) 
(8) 
The function ∅(𝑥𝑖) represents a nonlinear mapping = 𝑒𝑥𝑝(−𝛾‖𝑥𝑗 − 𝑥𝑖‖) 
that projects each input observation 𝑥𝑖, defined by its Once the optimization process is completed and the 
explanatory variables, into a higher-dimensional feature optimal weight vector and bias term are obtained, the 
space where linear separation of classes becomes more trained model can be used to generate predictions for 
feasible. Within this space, 𝑤 denotes the weight vector unseen samples by evaluating the decision function as 
that defines the orientation of the separating hyperplane, defined in Eq. (9). 
while 𝑏 is the bias term that shifts the hyperplane to −1 𝑖𝑓 𝑤𝑇∅(𝑥
𝑆𝑉𝐶     𝑦𝑖 = { 𝑖) + 𝑏 ≤ 0
 (9) 
achieve optimal separation. The parameter 𝐶𝑠𝑣𝑐 serves as 1 𝑖𝑓 𝑤𝑇∅(𝑥𝑖) + 𝑏 > 0
a regularization factor that balances the trade-off between 
maximizing the margin and minimizing classification 3.3 Extra trees classifier 
errors. The slack variables ∈𝑖 quantify the degree to which 
The Extra trees classifier, proposed by Geurts et al. [22], 
individual observations violate the margin constraints, 
represents an advanced ensemble learning technique that 
allowing for soft-margin classification to accommodate 
builds upon and extends the Random Forest framework. 
misclassified or non-linearly separable data points. 
Unlike traditional ensemble methods that rely on 
Determining the optimal hyperplane, as formulated in 
bootstrapped datasets and deterministic split criteria, Extra 
Eq. (4), entails maximizing the margin between classes in 
Trees introduces two levels of randomness to enhance 
the high-dimensional feature space. This objective is 
model diversity and generalization. First, it selects split 
mathematically achieved by minimizing the Euclidean 
thresholds at random rather than searching for the most 
norm of the weight vector, which directly corresponds to 
optimal ones. Second, instead of using bootstrap 
maximizing the margin width. Simultaneously, the model 
sampling, it grows each decision or regression tree using 
incorporates a penalty for misclassified instances to ensure 
the entire training dataset. This approach not only 
a balance between model complexity and classification 
accelerates the training process but also reduces variance, 
accuracy. Ultimately, the predicted output labels indicate 
making Extra Trees particularly effective for high-
the class membership of each sample, based on their 
dimensional and noisy datasets. 
position relative to the decision boundary. 
𝐷(𝑥𝑖) = 𝑊𝑇𝜑(𝑥𝑖)
Extra Trees operates by introducing controlled 
+ 𝑏  (4) 
randomness into the decision tree construction process, 
The computational complexity of the primal 
particularly for numerical features. At each node, the 
formulation is primarily dependent on the number of input 
algorithm selects K random features and determines split 
features (dimensionality), whereas the dual formulation's 
thresholds uniformly at random, rather than through 
complexity scales with the number of training samples. 
traditional optimization. The minimum number of samples 
Therefore, in scenarios involving high-dimensional 
required to allow further splitting is defined by 𝑛
feature spaces, it is often more computationally efficient 𝑚𝑖𝑛, 
ensuring regularization. Unlike methods that rely on 
and advantageous to employ the dual form of the model, 
bootstrap resampling, Extra Trees trains each of its 𝑀 trees 
as outlined in Eqs. (5–7). 
𝑁 𝑁 on the entire original dataset, promoting stability and 
1
𝑚𝑎𝑥𝑎 ∑ 𝑎 − ∑ 𝑎 𝑎 𝑦 𝑦 𝐾 𝑥 , minimizing bias. For prediction, the ensemble outputs are 
𝑖 2 𝑖 𝑗 𝑖 𝑗 ( 𝑖 𝑥𝑗) (5) 
combined using majority voting in classification tasks or 
𝑖=1 𝑖=1
averaged in regression settings. This explicit 
286   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
randomization strategy—both in attribute selection and the search space, with randomness and minimal 
cut-point determination—significantly reduces variance movement controlled using factorial-based adjustments. 
and enhances generalization performance, especially in 𝑆𝑒𝑒𝑑1
𝑖 = 𝑥𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝐺𝐵 − 𝑀𝐺𝑖),     𝑖 (12) 
high-dimensional and noisy contexts. Although the = 1, 2, … , 𝑛. 
algorithm exhibits a time complexity of 𝑁 log𝑁, its Assuming that 𝑋𝑖 represents theith potential solution 
computational efficiency is bolstered by the lightweight and the randomly generated factorial used to describe 
nature of the node-splitting process. The key the limitations of seeds on movement is called 𝛼𝑖. To 
hyperparameters—𝐾, 𝑛𝑚𝑖𝑛, and 𝑀—govern the diversity simulate the potential to roll a pair of dice 𝛽𝑖 and 𝛾𝑖 stand 
of splits, regularization, and ensemble size, respectively. for a random number of 0 𝑜𝑟 1.  
While the algorithm supports fine-tuning, default 𝑆𝑒𝑒𝑑2
𝑖 = 𝐺𝐵 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 × 𝑀𝐺𝑖) (13) 
parameter configurations often yield strong performance, 𝑆𝑒𝑒𝑑3
𝑖 = 𝑀𝐺𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 × 𝐺𝐵),   
making Extra Trees both effective and computationally (14) 
𝑖 = 1, 2, … , 𝑛.   
autonomous. A fourth seed is produced by using an additional 
technique to carry out the mutation phase in the search 
3.4 Chaos game optimization space's position updates of the qualified seeds. This update 
The amalgamation of basic principles of chaotic games of the seed's position is based on arbitrary modifications 
and fractals provide a mathematical model for the to the choice variables chosen at random. 
algorithm 𝐶𝐺𝑂 [23]. The 𝐶𝐺𝑂 algorithm examines several 𝑆𝑒𝑒𝑑4 = 𝑋 𝑘
𝑖 𝑖(𝑥𝑖 = 𝑥𝑘
𝑖 + 𝑅),     𝑘 = [1, 2, … , 𝑑]. (15) 
potential solutions (𝑋) for this goal, that depicts a few A random integer in the interval [1, 𝑑] is denoted by 
suitable seeds within a sierpinski triangle, so that a group 𝑘, and 𝑅 is a uniformly distributed random number in the 
of answers that have developed by chance and selection region [0,1]. 
changes is maintained by many natural evolution The 𝐶𝐺𝑂 algorithm's exploration and exploitation rate 
algorithms. According to this technique, a few chosen can be controlled and modified by varying the movement 
variables (𝑥𝑖,𝑗) reflect where these eligible seeds are limits of the seeds, represented by four different 
located inside the triangle formed by sierpinski. with every formulations for 𝛼𝑖. 
potential solution (𝑋𝑖). 𝑅𝑎𝑛𝑑
𝑑 2 × 𝑅𝑎𝑛𝑑
𝑋1  𝑥
1 2 𝑗
1 𝑥1 ⋯ 𝑥1 ⋯ 𝑥1  𝛼𝑖 = { (𝛿 × 𝑅𝑎𝑛𝑑) + 1  (16) 
 𝑋   𝑥1 2
2 𝑥 𝑗 𝑑
 2 2 ⋯ 𝑥2 ⋯ 𝑥
 2  (휀 × 𝑅𝑎𝑛𝑑) + (∼ 휀)
 
 ⋮  = ⋮ ⋮ ⋮ ⋮ ⋱ ⋮  
𝑋 =
 𝑋   (10) In this case, 𝛿 and 휀 are random integers Rand is a 
𝑖   𝑥1 2 𝑑
𝑖 𝑥 𝑗
 ⋮   𝑖 ⋯ 𝑥𝑖 ⋯ 𝑥 random number with a uniform distribution in the interval 
𝑖  
𝑋𝑛]  ⋮ ⋮ ⋮ ⋮ ⋱ ⋮ [0,1].  
[  
[𝑥1
𝑛 𝑥2 𝑗
𝑛 ⋯ 𝑥𝑛 ⋯ 𝑥𝑑 The process involves evaluating new seeds against 
𝑛 ]
According to the sierpinski triangle, where 𝑛 existing ones to determine their eligibility for inclusion 
 is the 
within the area used for searching. The new solution 
number of eligible seeds and 𝑑 is the seed's dimension. 
candidates' quality is evaluated, with better candidates 
Based on random starting positions, these qualifying seeds 
retained and seeds with low fitness values removed. The 
are arranged in the search space. 
𝑗
𝑥 ( 𝑗 𝑗 replacement procedure is employed to simplify the 
𝑗 0) = 𝑥𝑖,𝑚𝑖𝑛 + 𝑟𝑎𝑛𝑑. (𝑥𝑖,𝑚𝑎𝑥 mathematical model and ensure a more efficient 
𝑗 𝑖 = 1, 2, … , 𝑛. (11) 
− 𝑥 mathematical method.  
𝑖,𝑚𝑖𝑛),     {  
𝑗 = 1, 2, … , 𝑑.
𝑗
In this approach,  𝑥𝑗 (0) represents the initial position 3.5 Transit search algorithm 
𝑗 𝑗
of qualified seeds. The values 𝑥𝑖,𝑚𝑖𝑛 and  𝑥𝑖,𝑚𝑎𝑥  define the Host star number (𝑛𝑠) and the definition of signal-to-noise 
lower and upper bounds for the jth decision variable of the ratio (𝑆𝑁) is algorithm structure. The transit model 
ith candidate. A random number between 0 and 1 guides determines 𝑆𝑁 Standard deviation of measurements made 
the movement direction. outside of transit is used to estimate noise. There is always 
Qualified seeds symbolize core concepts from chaos noise in photons received from stars. The starting 
theory. These seeds represent candidate solutions in an population for 𝑇𝑆 is equal to the product of 𝑛𝑠 and 𝑆𝑁 
optimization problem, where higher and lower fitness [24].  
values indicate better and worse suitability, respectively. • Galaxy phase 
To explore the search space, qualified seeds are used After identifying habitable zones, the program 
to construct a Sierpinski triangle—a structure made from chooses a galactic center at random from the search space. 
three points: the current candidate (𝑋𝑖), the group mean The optimal stellar systems are found by evaluating 
(𝑀𝐺𝑖), and the global best (GB). This triangle is a basis random regions 𝐿𝑅. With the capacity to support life, the 
for generating new seeds using a chaos game approach. regions that have been identified with the best fitness are 
Each triangle uses a virtual die with green and red chosen, and the algorithm starts with these regions. 
faces to decide movement: green directs the seed toward 𝐿𝑅,𝑙 = 𝐿𝐺𝑎𝑙𝑎𝑥𝑦 + 𝐷 − 𝑁𝑜𝑖𝑠𝑒    
the global best (GB), and red toward the group mean (17) 
 𝑙 = 1, … , (𝑛𝑠 × 𝑆𝑁) 
(𝑀𝐺𝑖). A random binary value (0 or 1) determines the face. 
This process allows seeds to move stochastically within 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   287 
𝑐1𝐿𝐺𝑎𝑙𝑎𝑥𝑦 − 𝐿𝑟   𝑖𝑓     𝑧 = 1   (𝑁𝑒𝑔𝑎𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) 𝐷
𝐷 = {  18) 
𝑐1𝐿𝐺𝑎𝑙𝑎𝑥𝑦 + 𝐿𝑟   𝑖𝑓     𝑧 = 2   (𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) (
   𝑐4𝐿𝑅,𝑖 − 𝑐3𝐿𝑟    𝑖𝑓   𝑧 = 1    (𝑁𝑒𝑔𝑎𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛) (21) 
= {  
𝑁𝑜𝑖𝑠𝑒 = (𝑐2)
3𝐿𝑟 (19) 𝑐4𝐿𝑅,𝑖 − 𝑐3𝐿𝑟    𝑖𝑓   𝑧 = 2    (𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑅𝑒𝑔𝑖𝑜𝑛)  
𝐿𝐺𝑎𝑙𝑎𝑥𝑦 denotes where the center of the galaxy is 𝑁𝑜𝑖𝑠𝑒 = (𝑐 )3
5 𝐿𝑟 (22) 
located, and in the optimization problem, two coefficients The next stage involves utilizing Eq. (20) to (22) to 
are present ranging from 𝑧𝑒𝑟𝑜 to 𝑜𝑛𝑒, denoting an choose a star from each of the areas that have been chosen 
accidental integer 𝑐1 and an accidental vector to belong to a stellar system. 𝐿𝑠 indicates where the stars 
𝑐2 representing the number of variables. To demonstrate are located. 
the variation in the research area's situation, one definition In addition to the coefficient 𝑐5, which is a random vector 
of parameter 𝐷 is the difference between the galaxy's between 0 and 1, the coefficients 𝑐3 and 𝑐4 are random 
center and its present condition. This region may be found values between 0 and 1.  
either on the back of the galaxy or in the front (positive Before beginning iterations, the suggested method 
portion) of its middle area. Here, parameter zone (𝑧) is a executes the galaxy phase once to choose appropriate 
randomly generated number that is either one or two. The situations for the primary stages (2– 5). 
𝑁𝑜𝑖𝑠𝑒 parameter is used to eliminate noise from received • Transit Phase 
signals to improve location accuracy. To minimize To identify the transit, a re-measurement of the light 
computational value, the coefficient 𝑐2 with a power of 3 received from the beginning is required to identify any 
is used, as noise cannot noticeably deviate from desired potential decrease in the received light signals. 𝐿𝑆 and its 
situations. corresponding fitness 𝑓𝑆 have two meanings (𝑀1 and 𝑀2). 
𝐿𝑠,𝑖 = 𝐿𝑅,𝑖 + 𝐷 − 𝑁𝑜𝑖𝑠𝑒     𝑖 = 1, … , 𝑛𝑠 (20) 
The light spectrum (star class) that the telescope between the telescope and the star since the light comes 
receives and the star's distance from the observer may be from the star.  
used to determine the luminosity of the star. It is evident 𝑐8𝐿𝑇 + 𝑅𝐿𝐿𝑆,𝑖
𝐿
that a short distance results in a higher photon count. The 𝑧 =      𝑖 = 1, … , 𝑛 ) 
2 𝑠 (30
star's luminosity is acquired by: 𝐿𝑆,𝑛𝑒𝑤,𝑖
𝑅 𝑅𝐿 =  (31) 
𝑖 𝐿𝑆,𝑖
𝑛
𝐿 𝑠 The planet's original position upon detection is 
𝑖 =      𝑖 = 1, … , 𝑛𝑠     𝑅 ∈ {1, … , 𝑛 (23) 
(𝑑 2 𝑖 𝑠} 
𝑖) demonstrated by 𝐿𝑧 and luminance ratio is determined by 
𝑑𝑖 = √(𝐿𝑠 − 𝐿 2
𝑇)        𝑖 = 1, … , 𝑛𝑠 (24) 𝑅𝐿. Also, 𝑐8 has a random value between 0 and 1. 
Here, Star I's luminance and rank are depicted by the 𝐿𝑚,𝑗
variables 𝐿𝑖 and 𝑅𝑖. Additionally, the space between the 𝐿𝑧 + 𝑐9𝐿𝑟      𝑖𝑓   𝑧 = 1                                          𝑓𝑜(𝑟3 𝐴𝑝ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖𝑜
star I and the telescope are covered by 𝑑𝑖 . Since it is = {𝐿𝑧 − 𝑐9𝐿𝑟      𝑖𝑓   𝑧 = 2     𝑗 = 1, … , 𝑆𝑁            𝑓𝑜𝑟2 𝑃) 𝑒𝑟𝑖ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖
chosen at random at the beginning of the method, the 𝐿𝑧 + 𝑐10𝐿𝑟     𝑖𝑓   𝑧 = 3                                            𝑓𝑜𝑟 𝑁𝑒𝑢𝑡𝑟𝑎𝑙 𝑟𝑒𝑔𝑖
location of the telescope 𝐿𝑇 remains constant throughout ∑𝑆𝑁
𝑗=1 𝐿 𝑚𝑗 (3
the optimization. 𝐿𝑃 =  
𝐿 𝑆𝑁 3) 
𝑆,𝑛𝑒𝑤,𝑖 = 𝐿𝑆,𝑖 + 𝐷 − 𝑁𝑜𝑖𝑠𝑒      𝑖 = 1, … , 𝑛𝑠 (25) 
To validate travel and reducing the noise's influence, 
𝐷 = 𝑐6𝐿𝑆,𝑖 (26) one of the most crucial factors is 𝑆𝑁The planet's position 
𝑁𝑜𝑖𝑠𝑒 = (𝑐 3
7) 𝐿𝑆 (27) inside its star system is specified by analyzing the quantity 
The coefficients 𝑐6 and 𝑐7 are a random vector from 0 of signals received, which is derived from the planet's 
to 1 and a random integer from −1 to 1. The amount of estimated position. Several 𝑆𝑁 signals are taken into 
new luminosity, 𝐿𝑖,𝑛𝑒𝑤  is determined by: account for this reason in the 𝑇𝑆 algorithm Eq. (32). The 
𝑅 𝑛𝑒𝑤
𝑖, coefficient 𝑐
𝑛 9 is an accidental number ranging from −1 to 
𝐿 𝑠
𝑖,𝑛𝑒𝑤 =  
2            𝑖 = 1, … , 𝑛𝑠 (28) 1. 𝑐10 is a random vector with values in the range of −1 to 
(𝑑𝑖,𝑛𝑒𝑤) 1. Once signals 𝐿𝑚  have been determined, the average 
The new 𝐿𝑆 and the position of the telescope may be of 𝑆𝑁 signals are used to adjust the detected final planet 
used to compute the parameter 𝑑𝑖,𝑛𝑒𝑤 . It is possible to position 𝐿𝑃. The terms Aphelion and Perihelion refer to 
assess the possibility of transit by comparing 𝐿𝑖 and 𝐿𝑖,𝑛𝑒𝑤. the relative furthest and closest distances, in astronomy, 
If 𝑇 = 1, the phase of the planet is utilized; if not, the between a planet (such as Earth) and the Sun or another 
phase of the neighbor e is used in this iteration.  host star. Three zones—Aphelion, Perihelion, and Neutral 
𝐼𝑓 𝐿𝑖,𝑛𝑒𝑤 < 𝐿𝑖      𝑃𝑇 = 1     (𝑇𝑟𝑎𝑛𝑠𝑖𝑡) regions (the area between Aphelion and Perihelion areas), 
(29) 
𝐼𝑓 𝐿𝑖,𝑛𝑒𝑤 ≥ 𝐿𝑖      𝑃𝑇 = 0     (𝑁𝑜 𝑇𝑡𝑎𝑛𝑠𝑖𝑡) Eq. (32), are affected by the TS technique, which estimates 
This probability 𝑃𝑇 is represented by the numbers0 the planet's orbital location using the zone parameter (𝑧) 
(non-transit) and 1 (probability of transit).  If 𝑃𝑇 = 1, if in the planet phase. 
the planet phase cannot be used, this iteration uses the • Neighbor Phase 
neighbor phase. In this phase, the present planet of the star will take its 
• Planet Phase position whether the neighbor has superior circumstances 
Initially, at this stage, the discovered initial position compared to the current planet. 
of planet is identified. The quantity of light that the (𝑐11𝐿𝑠,𝑛𝑒𝑤 + 𝑐12𝐿𝑟)
telescope receives decreases during a planet's transit 𝐿𝑧 =  (34) 
2
288   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
𝐿𝑛,𝑗 number 𝑐𝑘, which can be 1, 2, 3, or 4. A random power 
𝐿𝑧 − 𝑐13𝐿𝑟     𝑖𝑓     𝑧 = 1  𝑓𝑜𝑟 𝐴𝑝ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖𝑜𝑛 between 1 and (𝑛𝑠 ∗ 𝑆𝑁) is represented by 𝑃. 
(35) 
= {𝐿𝑧 + 𝑐13𝐿𝑟      𝑖𝑓     𝑧 = 2  𝑓𝑜𝑟 𝑃𝑒𝑟𝑖ℎ𝑒𝑙𝑖𝑜𝑛 𝑟𝑒𝑔𝑖𝑜𝑛 𝑐16𝐿𝑃 + 𝑐15𝑘          𝑖𝑓    𝑐𝑘 = 1      (𝑆𝑡𝑎𝑡𝑒 1)
𝐿𝑧 + 𝑐14𝐿𝑟     𝑖𝑓     𝑧 = 3   𝑓𝑜𝑟 𝑁𝑒𝑢𝑡𝑟𝑎𝑙 𝑟𝑒𝑔𝑖𝑜𝑛 𝑐16𝐿𝑃 − 𝑐15𝑘         𝑖𝑓    𝑐
𝐿 𝑘 = 2       (𝑆𝑡𝑎𝑡𝑒 2)
𝐸,𝑗 = {  
∑𝑆𝑁 (37) 
𝐿𝑃 − 𝑐15𝐾              𝑖𝑓    𝑐𝑘 = 3        (𝑆𝑡𝑎𝑡𝑒 3)
𝑗=1 𝐿𝑛.𝑗
𝐿 (36) 
𝑁,𝑖 =  𝐿𝑃 + 𝑐15𝐾             𝑖𝑓     𝑐𝑘 = 4         (𝑆𝑡𝑎𝑡𝑒 4)
𝑆𝑁
𝐾 = (𝑐17)
𝑃
Eq. (34) is used to estimate the neighbor  𝐿𝑧  𝐿𝑟  (38) 
beginning position Considering its host star 𝐿𝑠,𝑛𝑒𝑤 and an 
accidental place 𝐿𝑅. 𝐿𝑁 determines the neighbor planet’s 3.6 K-Fold Cross validation 
ultimate position planets Eq. (35) and (36). The K-fold cross-validation is a widely utilized and reliable 
coefficients 𝑐11 and 𝑐12 in Eq. (41) handle a randomized approach for evaluating and selecting models, especially 
integer in the range of 0 to 1. Moreover, the in classification and regression tasks. This technique 
coefficients  𝑐13 and 𝑐14 represent a vector with a random involves dividing the dataset into k equally sized subsets 
number and a range of −1 to 1, respectively.   (folds). During each iteration, one-fold is reserved for 
• Exploitation phase validation while the remaining k−1 folds are used for 
The ideal planet for every star is identified in the training. This process is repeated k times, ensuring that 
earlier stages. Finding a planet by itself is meaningless. every subset serves once as the validation set. In this study, 
Understanding the features of the planet and the a 5-fold cross-validation scheme (k = 5) was adopted to 
circumstances that support life is essential. This is carried thoroughly evaluate the proposed models and improve 
out during the TS algorithm's Exploitation step. This stage their generalization capability by systematically rotating 
expresses a revised definition of the 𝐿𝑃. 𝐿𝑃 in the present the training and testing partitions. As illustrated in Fig. 2, 
phase 𝐿𝐸  alludes to the features of the planet. Using Eq. the Support Vector Classifier (SVC) model demonstrated 
(37), (38), the planet’s ultimate properties are adjusted its peak performance during Fold 5, achieving a maximum 
𝑆𝑁 times (𝑗 = 1,… , 𝑆𝑁) by adding new knowledge (𝐾). Accuracy of 0.82. Similarly, the Extra Trees Classifier 
𝑐15 is an accidental number ranging from zero 𝑡𝑜 𝑡𝑤𝑜, (ETC) also recorded its highest accuracy in Fold 5, with 
and 𝑐16  is an accidental number ranging from 𝑧𝑒𝑟𝑜 to an Accuracy of 0.846, indicating consistent model 
𝑜𝑛𝑒. 𝑐17 is an accidental vector ranging from zero to one. performance across folds.
The knowledge index is represented by the random 
  
Figure 2: The results of 5-Fold Cross validation. 
 
3.7 Evaluation metrics • Accuracy 
Accuracy is the ratio of correctly predicted 
The evaluation metrics of the classification models 
observations to the total observations. It is a general 
provide a quantitative measure of the performance of the 
measure of a model’s effectiveness. 
models [25]. In this study, four fundamental evaluation 
Accuracy serves as a baseline metric to understand the 
metrics were employed to assess the performance of the 
overall performance of the model. However, it may be 
classification models: Accuracy, Precision, Recall, and 
misleading when dealing with imbalanced datasets, which 
F1-score. These metrics provide a comprehensive 
is why complementary metrics are also considered. 
understanding of model performance, especially in the 
 
context of imbalanced or complex classification problems. 
 
 
 
 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   289 
• Precision in environmental science are helpful and important to 
Precision is the ratio of correctly predicted positive predict occurrences such as the spread of pollution, 
observations to the total predicted positives. It reflects climate change, and water resource availability. The 
how well the model avoids false positives. models are helpful in supporting sustainability 
Precision is particularly valuable in scenarios were management and conservation. Water quality prediction 
predicting a false positive may lead to unnecessary actions grounded on models such as 𝐸𝑇𝐶 and 𝑆𝑉𝐶 is among the 
or costs. most vital inputs into the planning and regulation of water 
• Recall (Sensitivity) quality. 
Recall is the ratio of correctly predicted positive Most advanced optimization techniques, such as 
observations to all actual positives. It shows how well the 𝑇𝑆𝑂𝐴 and 𝐶𝐺𝑂, have been employed in the enforcement 
model detects actual positive cases. Recall is emphasized of 𝑆𝑉𝐶 and 𝐸𝑇𝐶 for the much more improved 
when it is more critical to identify all positive cases, even classification of water quality according to 𝑝𝐻. As a 
at the cost of some false positives. result, the base models 𝐸𝑇𝐶 and 𝑆𝑉𝐶 are involving the 
• F1-score application of optimizers to constitute hybrid models such 
The F1-score is the harmonic mean of Precision and as 𝐸𝑇𝑇𝑆, 𝐸𝑇𝐶𝐺, 𝑆𝑉𝑇𝑆, and 𝑆𝑉𝐶𝐺. Performance checking 
Recall. It provides a single metric that balances both of the derived hybrid models is to be done for water 
concerns, particularly useful when class distribution is quality prediction with respect to the 𝑝𝐻 level. 
uneven. The F1-score provides a consolidated metric for • Hyperparameters’ results 
overall classification performance, particularly useful In machine learning, hyperparameters are essential 
when neither precision nor recall alone is sufficient for settings defined prior to training that influence model 
model evaluation. performance and learning behavior. Unlike trainable 
𝑇𝑃 + 𝑇𝑁 parameters, hyperparameters must be optimized to 
Accuracy =  (39) 
𝑇𝑃 + 𝐹𝑃 + 𝑇𝑁 + 𝐹𝑁 achieve the best results. In this study, random search was 
𝑇𝑃 used to tune the hyperparameters of the proposed SVC- 
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 =  (40) 
𝑇𝑃 + 𝐹𝑃 and ETC-based hybrid models. 
𝑇𝑃 As shown in Tables 2 and 3, ETC-based models were 
𝑅𝑒𝑐𝑎𝑙𝑙 =  (41) 
𝑇𝑃 + 𝐹𝑁 optimized using parameters such as n_estimators, 
2. 𝑇𝑃 max_depth, min_samples_split, min_samples_leaf, and 
𝐹1 − 𝑠𝑐𝑜𝑟𝑒 =   (42) 
2. 𝑇𝑃 + 𝐹𝑃 + 𝐹𝑁 max_leaf_nodes. For example, ETTS used n_estimators = 
143 and max_leaf_nodes = 1431, while ETCG had higher 
4 Result and riscussion  values like n_estimators = 1805 and max_leaf_nodes = 
17090. 
Prediction is actually something quite central to scientific 
SVC-based models were tuned with C and gamma. 
research and practical decision-making, dealing with the 
SVTS used C = 103.098, gamma = 138.373, while SVCG 
estimation of the future state or event given current and 
had C = 679.000, gamma = 111.500. The base SVC and 
historical data. Precise predictions are important in diverse 
ETC models retained simpler, default configurations. This 
fields such as meteorology to finance, for which the 
tuning improved accuracy and computational efficiency 
information furnished stands useful in planning, risk 
across all hybrid models.
management, and policy development. Predictive models 
Table 2: The results of Hyperparameters for ETC-based hybrid models. 
Hyperparameter 
Models 
n_estimators max_depth min_samples_split min_samples_leaf max_leaf_nodes 
ETTS 143 143 0.001 0.000 1431 
ETCG 1805 142 0.972 0.500 17090 
ETC 100 None 2.000 1.000 None 
Table 3: The results of Hyperparameters for SVC-based hybrid models. 
Hyperparameter 
Models 
C gamma 
SVTS 103.098 138.373 
SVCG 679.000 111.500 
SVC 1.000 scale 
 
• Convergence curves successive iterations, with the y-axis representing model 
Figure 3 illustrates the convergence curves of the accuracy and the x-axis denoting the number of iterations. 
proposed hybrid models, which combine machine learning The convergence behavior varies notably across the 
classifiers (SVC and ETC) with metaheuristic hybrid configurations. The SVTS model (SVC optimized 
optimization algorithms (TSOA and CGO). The figure by TSOA) exhibits a steady, linear improvement in 
captures the progression of classification accuracy across accuracy, reflecting a stable convergence pattern. In 
290   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
contrast, the SVCG model (SVC optimized by CGO) sharper rise in accuracy, ultimately reaching a highly 
demonstrates a less consistent trajectory, with noticeable competitive performance level. 
fluctuations in accuracy, though an overall upward trend Among all models, ETTS achieved the highest final 
is still evident. accuracy of 0.84, showcasing the effectiveness of the 
Similarly, the ETTS model (ETC optimized by TSOA optimizer with the ETC classifier. Conversely, 
TSOA) shows a smooth and consistent increase in SVCG attained the lowest peak accuracy of approximately 
accuracy, indicating robust convergence characteristics. 0.77, suggesting less stable convergence when SVC is 
The ETCG model (ETC optimized by CGO) achieves a paired with CGO. 
 
Figure 3: The convergence curve of the four presented hybrid models 
Table 4 presents the performance metrics—Accuracy, enhancements are also visible in Precision, Recall, and F1 
Precision, Recall, and F1 Score—for both the training and Score. During testing, although the performance gap 
testing phases of the base classifiers (ETC and SVC) and slightly narrows, SVTS still outpaces the base model with 
their corresponding hybrid variants (ETTS, ETCG, SVTS, an accuracy of 0.760, whereas SVCG and SVC followed 
and SVCG). Additionally, Figure 4 complements these at 0.755 and 0.745, respectively. 
results with 3D bar plots that provide a visual The visualized results in Figure 4 reinforces these 
representation of the metric distributions for each model, findings. The 3D bar plots clearly illustrate the consistent 
highlighting comparative strengths in both learning and superiority of hybrid models, particularly ETTS, across all 
generalization capabilities. evaluation metrics. The visual spacing between the bars 
Comparing the base model ETC with its hybrids, reflects the degree of improvement, emphasizing how 
ETTS and ETCG, it is evident that both optimized variants optimization algorithms—especially TSOA—enhance 
consistently outperform the base model in both training both model learning (training performance) and 
and testing phases. For example, in the training stage, generalization (testing performance). The graphics also 
ETTS achieved the highest accuracy (0.910), followed highlight that the ETTS model maintains the most 
closely by ETCG (0.897), while ETC lagged at 0.881. balanced and highest-performing profile among all tested 
Similar trends are observed across Precision, Recall, and classifiers. 
F1 Score. These performance gains continue in the testing In summary, the combination of numerical evidence 
phase, where ETTS and ETCG maintained superior from Table 4 and graphical insights from Figure 4 
generalization, with accuracies of 0.778 and 0.770, confirms that hybrid models deliver significantly 
respectively, compared to ETC’s 0.750. improved performance over their base classifiers. ETTS 
Likewise, for the SVC-based models, both SVTS and stands out as the most effective model, demonstrating the 
SVCG outperformed the baseline SVC during training. highest overall accuracy and stability across all metrics in 
SVTS achieved an accuracy of 0.894, and SVCG recorded both training and testing phases. 
0.879, compared to SVC’s 0.850. Performance 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   291 
Table 4: ETC and SVC base models achieved results through the performance evaluators 
Metrics 
Section Model 
Accuracy Precision Recall F1 _Score 
ETTS 0.910 0.911 0.910 0.910 
ETCG 0.897 0.901 0.897 0.897 
ETC 0.881 0.888 0.881 0.880 
Training 
SVTS 0.894 0.894 0.894 0.894 
SVCG 0.879 0.879 0.879 0.879 
SVC 0.850 0.850 0.850 0.849 
ETTS 0.778 0.781 0.778 0.778 
ETCG 0.770 0.780 0.770 0.769 
ETC 0.750 0.762 0.750 0.749 
Testing 
SVTS 0.760 0.764 0.760 0.760 
SVCG 0.755 0.757 0.755 0.755 
SVC 0.745 0.746 0.745 0.745 
 
  
  
292   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
  
  
Figure 4: 3D bar plot for the performance of the models in train and test phases. 
Table 4 outlines the performance metrics of base Neutral condition. Whereas 𝐸𝑇𝑇𝑆 achieves higher scores 
models and hybrid models. In a similar manner, Table 5 with 0.860 in precision, 0.901 in recall, and 0.880 in 
present the models' precision, Recall, and 𝐹1 𝑠𝑐𝑜𝑟𝑒 but in 𝐹1 𝑠𝑐𝑜𝑟𝑒. These numbers highlight the enhanced 
a more detailed breakdown of machine learning models performance of 𝐸𝑇𝑇𝑆, particularly in recall and 
that applied to water quality classification based on pH 𝐹1 𝑠𝑐𝑜𝑟𝑒𝑠, demonstrating the effectiveness of 
levels and categorized into Acidic, Basic, and Neutral optimization. Both 𝐸𝑇𝐶 and 𝑆𝑉𝐶 show substantial 
conditions. The performance comparison of 𝐸𝑇𝐶 with improvements in precision, recall, and 𝐹1 𝑠𝑐𝑜𝑟𝑒𝑠 when 
𝐸𝑇𝑇𝑆 reveals significant improvements across all 𝑝𝐻 optimized with 𝑇𝑆𝑂𝐴 and 𝐶𝐺𝑂, respectively. For 
conditions. For the Acidic condition, 𝐸𝑇𝐶 displays an instance, in the acidic condition, 𝑆𝑉𝐶 achieves a precision 
𝐹1 𝑠𝑐𝑜𝑟𝑒 of 0.842, recall of 0.804, and precision of of 0.800 while 𝑆𝑉𝑇𝑆 outperforms 𝑆𝑉𝐶 by improvement 
0.883 , whereas 𝐸𝑇𝑇𝑆 improves these metrics to 0.874 in in precision to 0.865. The optimized models demonstrate 
precision, 0.868 in recall, and 0.871 in 𝐹1 𝑠𝑐𝑜𝑟𝑒. In the superior capability in accurately classifying water quality, 
Basic condition, with a precision of 0.919, recall of 0.732, with 𝐸𝑇𝑇𝑆 and 𝐸𝑇𝐶𝐺 performing notably well in various 
and 𝐹1 𝑠𝑐𝑜𝑟𝑒 of 0.815, 𝐸𝑇𝐶 trails behind 𝐸𝑇𝑇𝑆, which metrics. Among all the models evaluated, the 𝐸𝑇𝑇𝑆 model 
performs better with a precision of 0.890, recall of 0.807, emerges as the best performer, achieving the highest 
and 𝐹1 𝑠𝑐𝑜𝑟𝑒 of 0.846. 𝐸𝑇𝐶 reports an 𝐹1 𝑠𝑐𝑜𝑟𝑒 of overall accuracy in 𝑝𝐻 − 𝑏𝑎𝑠𝑒𝑑 water quality 
0.852, recall of 0.919, and precision of 0.794 for the classification.
Table 5: Model performance in the three different conditions 
Metric 
Model Condition P-value 
precision recall f1-Score 
Acidic 0.874 0.868 0.871 0.032 
ETTS Basic (alkaline) 0.890 0.807 0.846 0.027 
Neutral 0.860 0.901 0.880 0.018 
Acidic 0.887 0.834 0.860 0.04 
ETCG Basic (alkaline) 0.922 0.764 0.836 0.035 
Neutral 0.821 0.921 0.868 0.022 
Acidic 0.883 0.804 0.842 0.045 
ETC 
Basic (alkaline) 0.919 0.732 0.815 0.039 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   293 
Neutral 0.794 0.919 0.852 0.025 
Acidic 0.865 0.841 0.853 0.048 
SVTS Basic (alkaline) 0.841 0.811 0.826 0.041 
Neutral 0.852 0.883 0.867 0.029 
Acidic 0.840 0.825 0.832 0.052 
SVCG Basic (alkaline) 0.827 0.804 0.815 0.047 
Neutral 0.849 0.872 0.860 0.031 
Acidic 0.800 0.801 0.801 0.059 
SVC Basic (alkaline) 0.814 0.764 0.788 0.053 
Neutral 0.833 0.855 0.844 0.010 
Figure 5 depicts a line plot illustrating the numerical predicts 558, 348, and 205 samples in neutral, acidic, and 
differences in how well different machine learning models alkaline groups. While 𝐸𝑇𝑇𝑆 improves upon this with a 
perform when used to classify water quality based on 𝑝𝐻. predicted value of 547, 376, and 226 samples in neutral, 
This figure's main purpose is to compare various models' acidic, and alkaline, indicating an enhancement in 
efficaciousness visually. Particularly focusing on the accuracy. This improvement is quantified as a percentage 
performance improvements achieved by incorporating difference in the accuracy of the models, with 𝐸𝑇𝑇𝑆, in 
sophisticated optimization algorithms. 𝐸𝑇𝐶 and its hybrid general, showing lower percentage differences compared 
version,  𝐸𝑇𝑇𝑆, show distinct differences. 𝐸𝑇𝐶 correctly to 𝐸𝑇𝐶, highlighting its enhanced predictive capability.
   
Figure 5: Line plot representing the number of correct predictions by ETC-based models 
A comprehensive evaluation of each model's accuracy performance. 𝐸𝑇𝑇𝑆 predicts acidic samples with 
can be done thanks to the confusion matrix, which is 376 correct, seven misclassified as alkaline, and 50 as 
depicted in Figure 6 and compares actual and predicted neutral. For alkaline samples, 𝐸𝑇𝑇𝑆 predicts 
classifications. An illustration of the confusion matrix 226 correctly, with 15 misclassified as acidic and 39 as 
created by different machine-learning models for neutral. Neutral samples are predicted with 547 correctly, 
determining the pH-level-𝑏𝑎𝑠𝑒𝑑 classification of water 39 as acidic, and 21 as alkaline. Comparatively, the 𝐸𝑇𝑇𝑆 
quality is shown in Figure 6. Each model's accuracy can model outperforms its base model 𝐸𝑇𝐶, especially in 
be thoroughly evaluated thanks to the confusion matrix, predicting neutral samples with significantly higher 
which displays actual versus predicted classifications. accuracy. In acidic classification, 𝐸𝑇𝑇𝑆 shows slight 
𝐸𝑇𝐶 predicts acidic samples with 348 correct, three improvement with fewer misclassifications. For alkaline 
misclassified as alkaline, and 82 as neutral. For alkaline predictions, both models show comparable performance, 
samples, it predicts 205 samples correctly, with 12 though 𝐸𝑇𝑇𝑆 has a marginally better accuracy. Among all 
samples misclassified as acidic and 63 samples as neutral. models, the best performance is observed in the 𝐸𝑇𝑇𝑆 
Neutral samples are predicted, with 558 samples model, indicating its superior capability in accurate 𝑝𝐻 −
correctly, 34 as acidic, and 15 as alkaline. When 𝑏𝑎𝑠𝑒𝑑 water quality classification. 
optimized using the Transit Search Optimization  
Algorithm, the hybrid model (𝐸𝑇𝑇𝑆) shows improved 
  
 
294   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
Acidic Alkaline Neutral Acidic Alkaline Neutral
Acidic 376 7 50 Acidic 361 4 68
Aalkaline 15 226 39 Aalkaline 12 214 54
Neutral 39 21 547 Neutral 34 14 559
ETTS  ETCG  
Acidic Alkaline Neutral
Acidic 348 3 82
Aalkaline 12 205 63
Neutral 34 15 558
ETC  
 
Figure 6: Confusion matrix for the accuracy of each model. 
To evaluate the classification performance of the Performance across specific pH categories is also 
models in predicting pH-based water quality, the Receiver shown: 
Operating Characteristic (ROC) curves in Figure 7 are • The acidic class (brown line) demonstrates 
analyzed. These curves illustrate the trade-off between the moderate sensitivity at the outset, improving with 
true positive rate and the false positive rate at various higher false positive rates. 
threshold settings, providing a visual assessment of each • The basic (alkaline) class (cyan line) exhibits the 
model's diagnostic ability. most favorable curve, with a sharp ascent 
The micro-average ROC curve (green dashed line) indicating excellent classification performance at 
aggregates the contributions of all classes, treating each low false positive rates. 
prediction equally. It reflects the classifier's overall ability • The neutral class (purple line) shows a more 
across all samples. The curve’s steep initial rise indicates gradual increase, reflecting a balanced but less 
strong overall performance, with high sensitivity achieved pronounced trade-off between true and false 
at low false positive rates. positives. 
The macro-average ROC curve (red dashed line) Overall, the cyan curve representing basic pH 
calculates the average performance across classes by conditions shows the highest classification accuracy, 
assigning equal weight to each one, regardless of class while the green micro-average curve confirms the 
imbalance. It provides a balanced view of performance robustness of the models in handling all classes 
and shows a smoother increase in true positive rate collectively. 
compared to the micro-average. 
 
Figure 7: The ROC curves for the performance of the most efficient hybrid models 
• Wilcoxon test the Wilcoxon test statistic for each model when compared 
Figure 8 presents a radar plot of the Wilcoxon test pairwise, quantifying relative performance in terms of 
statistics for all single and hybrid models: SVC, SVTS, statistical ranking. 
SVCG, ETC, ETTS, and ETCG. The plotted values reflect From the figure: 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   295 
• SVC records the highest Wilcoxon statistic • ETCG and ETC show intermediate values (7725 
(13,521), indicating that its performance and 10,063.5), reflecting moderate performance 
significantly differs—statistically outperforming consistency. 
or underperforming—relative to others. The shaded blue region visually represents the 
• ETTS also scores high (12,648.5), suggesting a distribution and spreads of the Wilcoxon test statistics 
strong and consistent performance validated by across all models. A wider area suggests higher variability 
statistical evidence. in model ranks, while more compact regions suggest more 
• In contrast, SVTS and SVCG have lower stability. 
statistics (9313 and 10,945.5, respectively), Overall, the Wilcoxon analysis complements 
pointing to less statistical dominance or more accuracy-based evaluation by statistically confirming the 
variability across comparisons. comparative significance of the observed model 
performance differences. 
 
Figure 8: The results of Wilcoxon test for models’ performance. 
applicable to broader real-world conditions. Additionally, 
5 Discussion the integration of deep learning architectures—such as 
recurrent neural networks (RNNs) or convolutional neural 
networks (CNNs)—can be investigated for their potential 
5.1 Limitations of the study to capture temporal or spatial correlations in water quality 
While the proposed hybrid models (ETTS, ETCG, SVTS, trends. Furthermore, an ensemble framework combining 
and SVCG) demonstrated superior classification multiple hybrid models could be tested using voting or 
performance over their baseline counterparts, the study stacking strategies to further improve classification 
presents several limitations that warrant attention. First, performance.  
the dataset used for model training and evaluation 
comprises only 1,320 daily records, which may limit the 5.3 Practical implications of the study 
generalizability of the models across diverse geographical The findings of this study highlight the practical viability 
regions or seasonal variations. A larger and more of hybrid machine learning and optimization frameworks 
heterogeneous dataset could improve robustness and in environmental monitoring applications. By accurately 
reduce the risk of overfitting. Secondly, the models focus classifying water quality based on pH levels, the proposed 
solely on pH as the output classification parameter, models can assist water resource managers, environmental 
potentially neglecting the complex interactions of other agencies, and public health officials in making informed 
water quality indicators (e.g., turbidity, nitrate levels) that decisions regarding water treatment and ecosystem 
may jointly influence classification outcomes.  preservation. The enhanced predictive accuracy of the 
hybrid models ensures timely identification of acidic or 
5.2 Potential future studies alkaline deviations, which are critical for preventing metal 
Building upon the promising results of this study, future toxicity, preserving aquatic biodiversity, and maintaining 
research can explore several enhancements. One key water usability for irrigation and drinking purposes. 
direction is the expansion of the dataset, both temporally Moreover, the lightweight nature of the models (especially 
and spatially, to include diverse water bodies, seasonal ETC and SVC) makes them suitable for deployment in 
dynamics, and additional environmental indicators. This embedded or real-time monitoring systems, offering 
would allow for the training of more generalizable models 
296   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
scalable solutions for smart water quality surveillance in (KNN) classifier and reported an accuracy of 0.9067. In 
both urban and rural settings. contrast, the present study's ETC+TSOA model attained 
an accuracy of 0.91, outperforming the KNN-based model 
5.4 Comparison between the results of and demonstrating competitive results relative to more 
present study and previous works complex ensemble methods. 
While the accuracy of the ETC+TSOA model is 
Table 6 presents a comparative analysis between the slightly lower than that of RFR and CATBoost, it is 
proposed hybrid model (ETC+TSOA) from the present important to note that the proposed model leverages 
study and several existing state-of-the-art methods in the advanced metaheuristic optimization to enhance model 
domain of water quality classification. The comparison is performance while maintaining a balance between 
based on classification accuracy, which is a key interpretability, computational efficiency, and 
performance metric. Among the referenced studies, Putra generalization capability. This underscores the value of 
et al. [17] achieved the highest accuracy (0.9828) using a hybrid machine learning and optimization approaches, 
Random Forest Regressor (RFR), followed closely by especially in resource-constrained or real-time 
Idroes et al. [15] with a CATBoost model (0.9781). environmental monitoring contexts. 
Sasmita et al. [16] employed a K-Nearest Neighbors 
 
Table 6: The Comparison between the results of present study and previous works. 
Article Reference Model Metrics 
Accuracy 
Idroes et al. [15] CATBoost 0.9781 
Sasmita et al. [16] KNN 0.9067 
Putra et al. [17] RFR 0.9828 
Present study - ETC+TSOA 0.91 
improved Precision by 2.36%, increased Accuracy and 
6 Conclusion  Recall by 2.67%, and a better 𝐹1 𝑆𝑐𝑜𝑟𝑒 by 2.67% also. 
For 𝑆𝑉𝐶 models, 𝑆𝑉𝑇𝑆 increased Accuracy and Recall by 
Water quality is a very important aspect in which 2.01%, increased Precision by 2.41%, and also increased 
environmental health and safety can be ensured. For the 𝐹1 𝑆𝑐𝑜𝑟𝑒 by 2.01% from the base 𝑆𝑉𝐶. Similarly, 
understanding aquatic ecosystems for the purpose of 𝑆𝑉𝐶𝐺 also outperformed 𝑆𝑉𝐶, with increases of 1.34% in 
monitoring and management, proper classification of 
Accuracy and Recall, and it boosted Precision by 1.47%. 
water quality is required, mainly based on their 𝑝𝐻 levels. 
𝐸𝑇𝑇𝑆 turned out to be the best improvement among all, 
This research article applied various methods of artificial 
with the highest scores on all metrics. 
intelligence and optimization algorithms for the 
High capability of hybrid models to provide more 
categorization of the quality of water based on pH levels, 
reliable and accurate pH-based water quality prediction 
hence providing a robust framework for environmental 
underlines the potential for such advanced techniques in 
monitoring. In this research, the dataset used contains 
environmental monitoring and management. These results 
1320 records in total; each record has information on the 
demonstrate how combining machine learning with 
following input parameters: Date, Salinity, Dissolved 
advanced optimization algorithms yields significantly 
Oxygen, secchi Depth, Water Depth, Water Temperature, 
higher predictive accuracy and reliability for 𝑝𝐻-based 
and Air Temperature. The output parameter in this 
water quality classification. The usefulness of hybrid 
analysis is 𝑝𝐻, or the level of acidity, alkalinity, and 
models in these applications, due to their increased 
neutrality indicative of water. These are daily records; 
accuracy, makes them very handy tools in the prediction 
hence, they provide a holistic view of how the respective 
of water quality, therefore helping in water body 
environmental matters are changing from day to day. 
management and conservation. 
In the presented study, SVC and 𝐸𝑇𝐶 were used for 
water quality prediction by considering pH as one of the 
main influential parameters. In the present study, a more References 
advanced class of optimizers in the form of the Transit [1] Boyd, C.E (2019). Water quality: an introduction. 
Search Optimization Algorithm and Chaos Game Springer Nature. 
Optimization were coupled with the svcand 𝐸𝑇𝐶 to [2] Mekonnen, M.M. and A.Y. Hoekstra (2016). Four 
improve their corresponding predictive accuracies. The billion people facing severe water scarcity. 
obtained results reflected that the hybrid models 𝐸𝑇𝑇𝑆, Science Advances, Science, 2(2), p. e1500323. 
𝐸𝑇𝐶𝐺, 𝑆𝑉𝑇𝑆, and 𝑆𝑉𝐶𝐺 outperformed their base model https://doi.org/10.1126/sciadv.1500323.  
with a significant difference in performance. [3] Vorosmarty, C., P. Green, J. Salisbury and R. 
Comparing 𝐸𝑇𝑇𝑆, when all models are taken into Lammers (2000). Global Water Resources: 
consideration against the 𝐸𝑇𝐶 base model, it improves Vulnerability from Climate Change and 
Accuracy by 3.73%, with increased Precision by 2.49%, Population Growth. Science, Science, 289, p. 284. 
boosted Recall by 3.73%, and increased 𝐹1 𝑆𝑐𝑜𝑟𝑒 by https://doi.org/10.1126/science.289.5477.284.  
3.87%. On the other hand, 𝐸𝑇𝐶𝐺 outperforms 𝐸𝑇𝐶 with 
Hybrid Machine Learning and Optimization Algorithms for pH-Based…                                    Informatica 49 (2025) 281–298   297 
[4] Chapman, D (1992). Water Quality Assessments - environmental monitoring. Leuser Journal of 
A Guide to Use of Biota, Sediments and Water in Environmental Studies, Heca Sentra Analitika, 
Environmental Monitoring - Second Edition. 1(2), pp. 62–68. 
Taylor & Francis, https://doi.org/10.60084/ljes.v1i2.99.  
https://doi.org/10.1201/9781003062103.  [16] Sasmita, N.R., S. Ramadeska, Z.M. Kesuma, T.R. 
[5] Schwarzenbach, R., B. Escher, K. Fenner, T. Noviandy, A. Maulana, M. Khairul and R. 
Hofstetter, C. Johnson, U. Gunten and B. Wehrli Suhendra (2024). Decision Tree versus k-NN: A 
(2006). The Challenge of Micropollutants in Performance Comparison for Air Quality 
Aquatic Systems. Science (New York, N.Y.), Classification in Indonesia. Infolitika Journal of 
Science, 313, pp. 1072–1077. Data Science, Heca Sentra Analitika, 2(1), pp. 9–
https://doi.org/10.1126/science.1127291.  16. https://doi.org/10.60084/ijds.v2i1.179.  
[6] Yang, X (2025). Economic Cost Prediction Model [17] Putra, F.M. and I.S. Sitanggang (2020). 
for Building Construction Based on CNN-DAE Classification model of air quality in Jakarta using 
Algorithm. Informatica, Slovenian Society decision tree algorithm based on air pollutant 
Informatika, 49(5). standard index, In IOP Conf Ser Earth Environ Sci, 
https://doi.org/10.31449/inf.v49i5.7029.  IOP Publishing, Purpose-led Publishing, p. 12053. 
[7] Dash, C.S.K., S.C. Nayak, A.K. Behera and S. DOI: 10.1088/1755-1315/528/1/012053 
Dehuri (2023). A Neuro-Fuzzy Predictor Trained [18] Saxena, A. and S. Shekhawat (2017). Ambient air 
by an Elitism Artificial Electric Field Algorithm quality classification by grey wolf optimizer-based 
for Estimation of Compressive Strength of support vector machine. Journal of Environmental 
Concrete Structures. Informatica, Slovenian and Public Health, Wiley Online Library, 2017(1), 
Society Informatika, 47(5). pp. 3131083. 
https://doi.org/10.31449/inf.v47i5.3951.  https://doi.org/10.1155/2017/3131083.  
[8] Benkaddour, M.K (2021). CNN based features [19]https://www.kaggle.com/datasets/supriyoain/water-
extraction for age estimation and gender quality-data. 
classification. Informatica, Slovenian Society [20] Vapnik, V (1998). Statistical Learning Theory. 
Informatika, 45(5). New York. John Willey & Sons. Inc. 
https://doi.org/10.31449/inf.v45i5.3262.  [21] Maldonado, S., J. Pérez, R. Weber and M. Labbé 
[9] Maktum, T., N. Pulgam, V. Chandgadkar, P. (2014). Feature selection for support vector 
Pathak and A. Solanki (2025). A Machine machines via mixed integer linear programming. 
Learning Based Framework for Bankruptcy Information Sciences, Elsevier, 279, pp. 163–175. 
Prediction in Corporate Finances Using https://doi.org/10.1016/j.ins.2014.03.110.  
Explainable AI Techniques. Informatica, [22] Geurts, P., D. Ernst and L. Wehenkel (2006). 
Slovenian Society Informatika, 49(15). Extremely randomized trees. Machine Learning, 
https://doi.org/10.31449/inf.v49i15.6745.  Springer Nature, 63, pp. 3–42. 
[10] Mitchell 1951-, T.M (1997). Machine Learning. https://doi.org/10.1007/s10994-006-6226-1.  
McGraw-Hill. [23] Talatahari, S. and M. Azizi (2021). Chaos game 
[11] Al-Adhaileh, M. and F. Alsaade (2021). optimization: a novel metaheuristic algorithm. 
Modelling and Prediction of Water Quality by Artificial Intelligence Review, Springer Nature, 
Using Artificial Intelligence. Sustainability, 54(2), pp. 917–1004. 
MDPI, 13, p. 4259. https://doi.org/10.1007/s10462-020-09867-w.  
https://doi.org/10.3390/su13084259.  [24] Hippke, M. and R. Heller (2019). Optimized transit 
[12] Zhou, J., Y. Wang, F. Xiao, Y. Wang and L. Sun detection algorithm to search for periodic transits 
(2018). Water Quality Prediction Method Based of small planets. Astronomy & Astrophysics, EDP 
on IGRA and LSTM. Water, MDPI, 10(9). Sciences, 623, p. A39. 
https://doi.org/10.3390/w10091148.  https://doi.org/10.1051/0004-6361/201834672.  
[13] Zhang, Y., P. Thorburn, M. Vilas and P. Fitch [25]https://medium.com/@impythonprogrammer/evalu
(2019). Machine learning approaches to improve ation-metrics-for-classification-
and predict water quality data. fc770511052d#:~:text=Accuracy. 
https://doi.org/10.36334/MODSIM.2019.D5.ZH  
ANGYIF.   
[14] Hastie, T., R. Tibshirani, J.H. Friedman and J.H.  
Friedman (2009). The elements of statistical  
learning: data mining, inference, and prediction.  
Springer Nature. https://doi.org/10.1007/978-0-  
387-21606-5.   
[15] Idroes, G.M., T.R. Noviandy, A. Maulana, Z.  
Zahriah, S. Suhendrayatna, E. Suhartono, K.  
Khairan, F. Kusumo, Z. Helwani and S. Abd  
Rahman (2023). Urban air quality classification  
using machine learning approach to enhance  
298   Informatica 49 (2025) 281–298                                                                                                                                     X. Li et al. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.9298 Informatica 49 (2025) 299–318 299 
 
 
 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction 
Using Metaheuristic Optimization Algorithms 
 
 
Naiyue Zhang1, Ying Liu2, * and Zheng Zhang3 
1Department of Computer Application Engineering, Hebei Software Institute, Baoding 071000, China 
2Department of Internet Commerce, Hebei Software Institute, Baoding 071000, China 
3BeiJing HuaDe Eye Hospital, Beijing 100020, China 
E-mail: chanyu13579@163.com  
*Corresponding Author 
Keywords: diabetes, machine learning, gaussian process classification, henry gas solubility enhancement schemes 
(HGSO), and metaheuristic algorithms  
The general basis of diabetes prediction using machine learning involves the application of algorithms 
that take an overall look at multiple features like BMI and glucose levels, age, genetic predispositions, 
and other conditions that may predict the likelihood of developing diabetes. The data-driven schemes, 
such as neural networks or DTs, find patterns in past data and use these to provide reliable predictions 
about future diabetes cases. These schemes keep learning and improving; they grow with new inputs. ML 
now helps in early detection by the use of large datasets, thus enabling early actions such as lifestyle 
changes or medical therapies. Finally, it enhances healthcare by providing individualized risk assessment 
and thus enables timely actions to diminish the burden of diabetes. In addition, the application of ML 
schemes, including Gaussian Process Classification-GPC, Linear Discriminant Analysis-LDA with Henry 
Gas Solubility Optimization-HGSO, Chaos Game Optimization-CGO, and Chef-Based Enhancement 
scheme-CBOA, has greatly benefited the process of prediction. These schemes were combined with 
optimizers, guided by the objective of this work, which deals with predicting the type of diabetes and the 
diagnosis of persons vulnerable to it. This was a strategic fusion aimed at creating new hybrid schemes 
with increased precision in prediction. Further analysis showed that the GPCB model was the best, with 
an impressive 0.981 during training. By contrast, the GPCG and GPHG schemes are relatively less 
accurate, with an accuracy of 0.963 and 0.946, respectively. These results justify the utility of the 
integrated approach, where advanced ML algorithms were able to generate predictive schemes superior 
in terms of accuracy and efficiency compared to the classical methods. 
Povzetek: V članku je opisan sistem za napovedovanje sladkorne bolezni tipa 2 s pomočjo strojnega 
učenja. Algoritem GPCB združuje klasifikacijo Gaussovega procesa z metahevrističnimi optimizacijskimi 
algoritmi za kvalitetno diagnozo. 
 
1 Introduction insulin to meet its needs, thereby resulting in high blood 
sugar levels [8]. Gestational diabetes develops during 
Type 2 diabetes, another name for diabetes, is a long-term pregnancy when fluctuations in hormones compromise 
metabolic illness marked by elevated blood glucose levels insulin activity, increasing the risk of complications for 
because of either the pancreas's insufficient production of both mother and child [9], [10]. Diabetes' persistent High 
insulin or its inability to utilize insulin effectively [1]. blood sugar levels can cause a stream of issues affecting 
Insulin is a hormone produced by the pancreas that many organ systems [11]. These include cardiovascular 
regulates blood sugar levels by allowing the absorption of disorders including strokes and heart attacks; nerve 
glucose into the cells to use it as energy [2]. Whenever the damage; diabetic neuropathy; kidney damage causing 
mechanism is disturbed, glucose builds up in the numbness, tingling (diabetic nephropathy), and 
circulation and causes hyperglycemia [3]. Diabetes is discomfort; as well as eye disturbances that can cause 
sorted into 3 types: type I, type II, and gestational diabetes blindness due to diabetic retinopathy, if not addressed 
[4]. Type 1 diabetes, which is frequently diagnosed in [12], [13]. Diabetes also raises the risk of ulcers in feet and 
childhood or adolescence, is caused by the immune system amputations owing to impaired circulation and damage to 
erroneously targeting and killing the insulin-generating nerves [14]. 
beta cells in the pancreas [5]. This involves lifetime insulin Management includes frequent testing of blood 
treatment to control blood sugar levels. Type 2 diabetes, glucose, proper nutrition, regular physical activity, and 
the most prevalent kind, usually develops in adulthood and insulin therapy or medication when necessary. Other 
is often associated with overweight, lack of exercise, and treatments for type 2 diabetes include weight loss and 
genetic risk [6], [7]. Type 2 diabetes develops when the smoking cessation. People with diabetes need training and 
body becomes resistant to or cannot produce sufficient support, as enabling them with skills for optimum self-
300 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
management reduces complications, reflecting a LR, and SVM, might also classify individuals into types 
collaborative approach by all involved [15]: providers of of diabetes based on sets of different variables. This will 
healthcare, the patient, and family members [16], [17]. enable individual risk assessments and prevention 
Type 2 is a complex metabolic condition that casts ripples methods based on an individual profile, and in time, will 
in personal life since it has myriad implications for many allow healthcare professionals to offer more personalized 
facets [18], [19]. It presents physically as a constellation and effective preventative treatment [29]. 
of symptoms that include chronic thirst and frequent 
urination, fatigue, and unexpected weight gain or loss 1.1 Objectives 
[20], [21]. This chronic fight against blood sugar 
This article proposes developing a scheme for diagnosing 
becoming normal turns out to be an everyday obsession 
types of diabetes and predicting the likelihood of a person 
with food intake, medication routines, and even social 
being affected with it. In order to solve this issue, the use 
interactions [22]. Besides the physical discomforts, type 2 
of ML schemes including LDA and GPC is chosen, along 
diabetes also has a great psychological and emotional 
with 3 optimizers: CGO, HGSO, and CBOA. The 
impact. The constant monitoring required to manage the 
integration of these optimizers with the schemes leads to 
disease can lead to feelings of anxiety, stress, and 
some new hybrid model generation, which is supposed to 
depression. The fear of complications is huge, with every 
give better performance in the prediction process. Further, 
increase or decrease in glucose triggering a snowball 
these newly designed hybrid schemes are evaluated for 
effect of questions about what this could mean for long-
their performances using different plots and tables. It is 
term health and well-being. 
expected that through their dense analysis, information 
Type 2 diabetes can negatively affect social 
about the most effective performance of the different 
relationships and interactions. Even going out for meals 
schemes can be extracted, along with the potential deficit 
may become a maze of counting carbohydrates and 
in functionality among them. Such an inclusive strategy 
administering insulin, while social events may become 
will provide thorough knowledge about various schemes' 
distressing in their demand to explain dietary restrictions 
strengths and flaws that help in formulating approaches 
or personally withdraw to check blood glucose levels [23]. 
related to the diagnosis and prediction of diabetes. 
The stigma associated with diabetes can also make people 
Gaussian Process Classification (GPC) and Linear 
feel isolated or humiliated, disrupting interpersonal 
Discriminant Analysis (LDA) were picked owing to their 
interactions [24]. Besides that, type 2 diabetes may lead to 
complimentary capabilities in modeling classification 
serious financial burdens. Pharmaceutical treatment, 
challenges. GPC is a non-parametric, probabilistic model 
apparatus for blood glucose monitoring, and frequent 
that captures complicated, nonlinear interactions and 
medical consultations are not cheap, especially when 
offers uncertainty estimates, making it suited for the 
insurance coverage is inadequate. Further, loss of working 
nuanced and high-risk nature of diabetes prediction. 
days due to poor health or visiting doctors may affect 
Conversely, LDA is a basic yet powerful linear classifier 
earnings and professional development [25]. 
that performs well when class distributions are nearly 
Notwithstanding such constraints, persons with type 2 
Gaussian. Its interpretability and minimal computing cost 
diabetes often show remarkable resilience and 
make it suitable for baseline comparison. LDA is good for 
resourcefulness [26]. Most learn to manage the 
efficiency and understanding, while GPC is good for 
complexity of their disease through education, proactive 
making strong, adaptable models of complicated health 
self-management, and support networks and feel 
data patterns. Together, they make a balanced framework. 
empowered by taking responsibility for their health. 
However, the pervasive nature of type 2 diabetes ensures 
its impacts are felt at all levels of life, making 2 Material and methods 
comprehensive approaches to prevention, treatment, and 
care of utmost importance. 2.1 Data collection 
Machine learning algorithms can predict the risk a 
person has for diabetes and even define which type of 
Prior to model training, the dataset underwent several 
diabetes the person is most probable to get, considering his 
preprocessing procedures to enhance data quality and 
or her medical history, life style habits, biomarkers, and 
model performance. Missing values were addressed using 
genetic trends. These algorithms are trained on large 
mean imputation for numerical features. Outliers were 
datasets consisting of data from diabetic and non-diabetic 
detected and mitigated using z-score normalization. All 
patients through a method called supervised learning. The 
continuous features were standardized to zero mean and 
computers learn to find, through patterns and links in data, 
unit variance. Categorical variables, if any, were encoded 
small signs and risk factors associated with different types 
using one-hot encoding. Feature selection was conducted 
of diabetes [27]. For example, ML schemes for the 
using mutual information to retain only the most relevant 
diagnosis of type 2 diabetes consider age, BMI, family 
predictors. The final dataset was randomly shuffled and 
medical history of diabetes, cholesterol levels, blood 
split into training and testing sets using an 70:30ratio to 
pressure, and glucose tolerance. These combined 
ensure unbiased model evaluation. Fig. 1 displays the far-
indicators may, therefore, enable the model to project the 
reaching consequences of diabetes on a person's life, 
likelihood of a person developing type 2 diabetes over a 
spanning blood pressure to pregnancy, as it affects an 
specific period [28]. Other ML methods, including DT, 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 301 
 
individual's well-being and lifestyle in general. This study importance in effective management and 
tries to make meaning out of the interaction of diabetes reduction of adverse effects of diabetes on 
with these major determinants, therefore, basically general health. Pregnancy complicates the care of 
determining the trend of the illness. diabetes because of fluctuating hormonal 
changes and increased insulin resistance. 
• High blood pressure worsens diabetes • Gestational diabetes may be developed during 
complications by essentially destroying blood pregnancy, increasing the risk for complications 
vessels and organs. High blood pressure and in both mother and child, including macrosomia, 
atherosclerosis accelerate the narrowing of preeclampsia, and anomalies at birth. Women 
arteries, which limits blood flow, thereby with previous diabetes have difficulties 
worsening the common diabetes consequences of managing blood sugar levels, again increasing 
heart disease, stroke, and kidney failure. risks for adverse outcomes such as preterm birth 
Hypertension further increases the risk for and cesarean section delivery. Close monitoring, 
diabetic retinopathy, which can cause visual dietary modification, and medication may be 
impairment or even total blindness. It also leads necessary to achieve appropriate risk reduction 
to peripheral artery disease, which raises the and optimal health for both mother and fetus. 
chances of foot ulcers and amputations in Such cooperation between obstetricians, 
diabetic patients. Good management of blood endocrinologists, and diabetes educators forms 
pressure through lifestyle modifications, the very foundation for the best pregnancy 
medication, and regular checks is of utmost outcomes among women with diabetes.
 
 
  
Figure 1: The plot illustrating the Contour - color fill between the input and output 
302 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
𝜋2
+(𝜇1 − 𝜇2)
𝑇Σ−1(𝜇1 − 𝜇2)) + 2 ln ( ) 
2.2 Linear discriminant analysis (LDA) 𝜋1
An instance 𝑥's intended class is: 
Linear Discriminant Analysis (LDA) is a statistical 
1, 𝑖𝑓 (𝑥) < 0,
approach used to separate two or more classes by ?̂?(𝑥) = {  (7) 
2, 𝑖𝑓 𝛿(𝑥) > 0.
identifying a linear combination of characteristics that best 
When both categories have identical priors, 𝜋1 = 𝜋differentiates them. It assumes that the different classes 2 , 
Eq. (5) takes a particular form: 
create data based on Gaussian distributions with the same 𝑇
covariance matrix. LDA is computationally efficient, 2(Σ−1(𝜇2 − 𝜇1)) 𝑥+(𝜇1 − 𝜇2)
𝑇Σ−1(𝜇1 − 𝜇2) (8) 
interpretable, and particularly successful when the = 0, 
relationship between features and labels is nearly linear, Whose statement on the left can be interpreted as 
making it suited for baseline comparison in medical 𝛿(𝑥) in Eq. (7).  
classification problems like diabetes prediction. 
LDA assumes that the 2 categories' matrices of covariance 2.3 Gaussian process classification (GPC) 
are similar [30], and one of the 2 categories has a greater 
Gaussian Process Classification (GPC) puts a 
average than the other, as seized 𝜇1 < 𝜇2. One of these 
Gaussian process prior over a latent function to predict the 
examples is the one provided for 𝑥 ∈ 𝑅 classes: 
chance of being in a certain class. This lets GPC capture 
Σ1 = Σ2 = Σ. (1) 
nonlinear patterns in big datasets in a flexible way and 
1 (𝑥 − 𝜇1)
𝑇Σ−1(𝑥 − 𝜇1)
𝑒𝑥𝑝 (− )𝜋 measure how uncertain predictions are, which is very 
2 1
√(2𝜋)𝑑|Σ| important for medical diagnostics. GPC is better for risk-
1 (𝑥 − 𝜇 𝑇 −1
2) Σ (𝑥 − 𝜇2) sensitive predictions like figuring out how likely someone 
= 𝑒𝑥𝑝 (− )𝜋 ,
√(2𝜋)𝑑|Σ| 2 2
is to have diabetes since it changes its complexity 
(𝑥 − 𝜇1)
𝑇Σ−1(𝑥 − 𝜇1) dependent on the input. This is different from fixed 
⟹ 𝑒𝑥𝑝 (− )𝜋
2 1
(2) parametric models. 
(𝑥 − 𝜇 Given a set of N training input points, in typical 
2)
𝑇Σ−1(𝑥 − 𝜇2)
= 𝑒𝑥𝑝 (− )𝜋
2 2 , classification using Gaussian methods, procedure 𝑋 =
(𝑎) 1  [𝑥1, … , 𝑥𝑁]
𝑇 and their associated class designations 𝑌 =
 ⇒ − (𝑥 − 𝜇 )𝑇Σ−1
2 1 (𝑥 − 𝜇1) + ln(𝜋1)  [𝑌1, … , 𝑌𝑁]
𝑇 , one would like to forecast the class 
1 participation percentage of a fresh test point 𝑥
= − (𝑥 − 𝜇2)
𝑇Σ−1(𝑥 − 𝜇 ×. This may 
2) + ln (𝜋2 2) be accomplished by utilizing a latent function f, which is 
The simple logarithm of the equation's sides is found then mapped onto the [0;  1] interval utilizing the probit 
by (𝑎). The equation may be written as: operator. For binary classification, use the notion that y 
(𝑥 − 𝜇1)
𝑇Σ−1(𝑥 − 𝜇1) = (𝑥
𝑇 − belongs to {0,1}, where 1 displays the positive class and 0 
𝜇𝑇)Σ−1(𝑥 − 𝜇1) = 𝑥
𝑇Σ−11 𝑥 − 𝑥𝑇Σ−1𝜇1 − displays the negative. Therefore, the likelihood of class 
𝜇𝑇
(
Σ−11 𝑥 + 𝜇𝑇 − (𝑎)
1Σ
1𝜇1     𝑥𝑇Σ−1𝑥 + 𝜇𝑇 −1 3) membership 𝑝(𝑦 = 1|𝑥) might be expressed as Φ(f(x)), 
= 1Σ 𝜇1 −
where Φ(. ) is the probit purpose. Gaussian procedure 
2𝜇𝑇1Σ
−1𝑥  classification is then performed by applying a GP prior to 
Where (𝑎) is because 𝑥𝑇Σ−1𝜇1 = 𝑥
𝑇Σ−1𝑥 since Σ−1 the latent function of 𝑓(𝑥). A GP [31] is a random 
is balanced and Σ−𝑇 = Σ−1. As a result, it is observed: procedure completely described by a mean function 
1 1
− 𝑥𝑇Σ−1𝑥 − 𝜇𝑇Σ−1𝜇 + 𝜇𝑇 1
2 2 1 1 1Σ
− 𝑥 + ln(𝜋1) 𝑚(𝑥)  =  𝔼[𝑓(𝑥)] and a positive definite covariance 
method 𝕜(𝑥; ?́?)  =  𝕧[𝑓(𝑥);  𝑓(?́?)]. To project an 
1 1 (4) 
= 𝑥𝑇Σ−1𝑥 − 𝜇𝑇 −1
2 2Σ 𝜇2 + 𝜇
𝑇 −1
2Σ 𝑥 + ln (𝜋2) additional test point 𝑥× , first calculate the range of the 
2 related latent variable 𝑓×. 
As an outcome of multiplying both sides of the 
equation by 2, the expression that follows is obtained: 𝑝(𝑓×|𝑥×, 𝑋, 𝑦) = ∫𝑝(𝑓×|𝑥×, 𝑋, 𝑓) 𝑝(𝑓|𝑋, 𝑦)𝑑𝑓 (9) 
2(Σ−1(𝜇2 − 𝜇1))
𝑇𝑥 + (𝜇1 − 𝜇2)
𝑇Σ−1(𝜇1 − 𝜇2)
𝜋 Where 𝑓 =  [𝑓1, … , 𝑓𝑁]
𝑇, and then using this 
2
+ 2 ln ( ) = 0 (5) 
 distribution, calculate the class participation distribution: 
𝜋1 𝑝(𝑦× = 1|𝑥The equation of a line may be represented as 𝑎𝑇𝑥 + ×, 𝑋, 𝑦) 
(10) 
𝑏 = 0. T As a result, if the Gaussian distributions of the 2 = ∫Φ(𝑓×) 𝑝(𝑓×|𝑥×, 𝑋, 𝑦)𝑑𝑓× 
classes are considered, and the covariance matrices are 
considered to be equal, a line displays the categorization 
choice border. This approach is called LDA because the 2.4 HGSO 
choice border between the 2 classes is linear. The The following subsection describes the motivation for 
expressions were relocated to the correct side, which HGSO, which depends on the act of Henry's law. 
related to the second class, to create Eq. (5). Therefore, if 
used 𝛿(𝑥): ℝ𝑑 → ℝ as the left-hand side calculation 2.4.1 Henry’s Law  
(function) in Eq. (6). In 1803, William Henry created Henry's Law, a gas law 
𝑇
(𝑥) ∶= 2(Σ−1(𝜇2 − 𝜇1)) 𝑥 (6) [32]. Henry's law reads as follows: "At a temperature that 
remains constant, the amount of a given gas that dissolves 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 303 
 
in a given type and volume of liquid is inversely related to 𝑗. The optimal gas for the entire colony is then determined 
the partial pressure that exists for that gas in equilibrium by rating the gasses.  
with that liquid." Consequently, Henry's law is greatly Step 4: Update Henry’s coefficient. 
dependent on temperature [33] and displays that a gas's Eq. (18), which updates Henry's factor, is as follows: 
solubility (𝑆𝑔) is directly proportional to its relative 𝐻𝑗(𝑡 + 1) = 𝐻𝑗(𝑡) 
pressure (𝑃𝑔), as represented in the subsequent equation: 1 1
𝑆𝑔 = 𝐻 × 𝑃𝑔 (11) × 𝑒𝑥𝑝 (−𝐶𝑗 × ( − )) , 𝑇(𝑡) (18) 
𝑇(𝑡) 𝑇𝜃
Where 𝐻 is Henry's stable, which is particular to the = exp (−𝑡/𝑖𝑡𝑒𝑟) 
given gas-solvent mixture at a certain temperature, and 𝑇 displays the temperature, 𝑇𝜃 displays a constant 
𝑃𝑔 is the gas's relative pressure. 
equal to 298.15, iter is the overall count of cycles, and 𝐻𝑗 
𝑑𝑙𝑛𝐻 −∇𝑠𝑜𝑙𝐸
=  (12) is Henry's factor for cluster 𝑗 in this equation.  
𝑙(1/𝑇) 𝑅 Step 5: Update solubility. 
Furthermore, the impact of temperature dependency The following formula is used to modify the 
on Henry's law variables has to be addressed. The Van't solubility: 
Hoff equation describes how Henry's law constants vary 𝑆𝑖,𝑗(𝑡) = 𝐾 × 𝐻𝑗(𝑡 + 1) × 𝑃𝑖,𝑗(𝑡) (19) 
when a system's temperature varies: 𝑆𝑖,𝑗   is the soluble content of gas 𝑖 in cluster 𝑗, 𝑃𝑖,𝑗 is 
𝐻(𝑇) = exp (𝐵/𝑇) × 𝐴 (13) 
the amount of partial pressure on gas 𝑖 in cluster 𝑗, and 𝐾 
Where 𝐻 is an expression of 2 parameters, 𝐴 as well is a value that is constant. 
as 𝐵, which are the 2 factors that determine H's 𝑇 Step 6: Update position.  
dependency. In addition, one can generate a function The position was revised below: 
based on 𝐻 at the standard temperature 𝑇 =  298.15𝐾. 𝑋𝑖,𝑗(𝑡 + 1) = 𝑋𝑖,𝑗(𝑡) 
−∇𝑠𝑜𝑙𝐸
𝐻(𝑇) = 𝐻𝜃 × 𝑒𝑥𝑝 ( (1/𝑇 − 1/𝑇𝜃)) (14) +𝐹 × 𝑟 × 𝛾 × (𝑋𝑖,𝑏𝑒𝑠𝑡(𝑡) − 𝑋𝑖,𝑗(𝑡)) 
𝑅
+𝐹 × 𝑟 × 𝛼 × (𝑆𝑖,𝑗(𝑡) × 𝑋𝑏𝑒𝑠𝑡(𝑡) − 𝑋𝑖,𝑗(𝑡)) (20) 
The Van't Hoff formula applies if −∇𝑠𝑜𝑙𝐸  is a stable, 
hence Eq. (14) may be rewritten as follows: 𝐹𝑏𝑒𝑠𝑡(𝑡) + 휀
𝛾 = 𝛽 × 𝑒𝑥𝑝 (− ) , 휀 = 0.05 
𝐻(𝑇) = 𝑒𝑥𝑝(−𝑐 × (1/𝑇 − 1/𝑇𝜃) × 𝐻𝜃) (15) 𝐹𝑖,𝑗(𝑡) + 휀
Where 𝑋𝑖,𝑗 displays the location of gas 𝑖 in cluster 𝑗, 
2.4.2 HGSO mathematical scheme and 𝑟 and 𝑡 are the random constant and cycle time, 
This part describes the mathematical formulas for the respectively. The best gas in cluster j is indicated by 𝑋𝑏𝑒𝑠𝑡 , 
suggested HGSO method. The mathematical procedures while the best gas in the entire swarm is shown by 𝑋𝑖,𝑏𝑒𝑠𝑡. 
are outlined below: In addition, 𝛾 displays gas 𝑗′𝑠 capacity to interact with 
Step 1: Initialization process. other gases in cluster 𝑖, 𝛼 displays the effect of other gases 
The count of gases (population size N) and the on gas i in cluster j and is equal to 1, and 𝛽 is a constant. 
placements of gases have been set up using the subsequent The fitness of gas i in cluster j is denoted by 𝐹𝑖,𝑗 , whereas 
equation: 𝐹𝑏𝑒𝑠𝑡  displays the fitness of the best gas in the overall 
 𝑋𝑖(𝑡 + 1) = 𝑋𝑚𝑖𝑛 + 𝑟 × (𝑋𝑚𝑎𝑥 − 𝑋𝑚𝑖𝑛) (16) system. 𝐹 is the flag that modifies the direction of the 
where t is the repetition time, 𝑋𝑚𝑖𝑛  and 𝑋𝑚𝑎𝑥  are the search agent and gives variety (±).  𝑋𝑖,𝑏𝑒𝑠𝑡and 𝑋𝑏𝑒𝑠𝑡   are 
issue bounds, 𝑟 is a random number between 0 and 1, and the 2 parameters that control the exploration and 
𝑋𝑖 is the location of the ith gas in population 𝑁. The below exploitation capabilities. Particularly, 𝑋𝑖,𝑏𝑒𝑠𝑡 displays the 
equation is used to establish the count of gasses 𝑖, Henry's best gas 𝑖 in cluster 𝑗, whereas𝑋𝑏𝑒𝑠𝑡  displays the best gas 
constant of type 𝑗 (𝐻𝑗(𝑡)) partial pressure 𝑃𝑖,𝑗 of gas 𝑖 in in the whole swarm. 
cluster 𝑗, and −∇𝑠𝑜𝑙𝐸/𝑅steady value of type 𝑗 (𝐶𝑗). Step 7: Escape from local optimum. 
𝐻𝑗(𝑡) = 𝑙1 × 𝑟𝑎𝑛𝑑(0,1), 𝑃𝑖,𝑗 The purpose of this phase is to leave the local 
(17) 
= 𝑙2 × 𝑟𝑎𝑛𝑑 (0,1), 𝐶𝑗 = 𝑙3 × 𝑟𝑎𝑛𝑑(0,1) optimum. The count of worst agents 𝑁𝑤 can be chosen and 
where 𝑙1, 𝑙2, and 𝑙3 are designated as constants with ranked using the following equation: 
corresponding amounts of 5𝐸 − 02, 100, and 1𝐸 − 02. 𝑁𝑤 = 𝑁 × (𝑟𝑎𝑛𝑑(𝑐2 − 𝑐1) + 𝑐1), 𝑐1 (21) 
Step 2: Clustering.  = 0.1 𝑎𝑛𝑑 𝑐2 = 0.2  
In proportion to the count of gas types, the entire The count of search agents is denoted by 𝑁. 
number of agents is split into equal clusters. Every cluster Step 8: Update the position of the worst agents. 
has the same Henry's constant measurement (𝐻𝑗) since 𝐺(𝑖,𝑗) = 𝐺min (𝑖,𝑗) + 𝑟 × (𝐺max (𝑖,𝑗) − 𝐺min (𝑖,𝑗)) (22) 
they all contain the same gases. In Eq. (22), 𝐺(𝑖,𝑗) displays gas 𝑖's position in cluster 𝑗, 
Step 3: Evaluation. 𝑟 is a random integer, and 𝐺min (𝑖,𝑗) and 𝐺max (𝑖,𝑗) represent 
The gas having the largest equilibrium state among the problem boundaries. The steps of the process are 
the others of its sort is identified by analyzing each cluster depicted in Fig. 2. 
304 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
 
Figure 2: The flowchart of the HGS. 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 305 
 
as well as the lowest and highest levels of eligibility are 
2.5 Chaos game optimization (CGO) connected. 
The basic idea of this mathematical model is to create 
The reasons behind the groundbreaking metaheuristic 
the general shape of a Sierpinski triangle by producing 
algorithm known as CGO and its computational 
several appropriate seeds inside the search area. In this 
architecture are covered in this section. 
way, fresh seeds are also produced via the Sierpinski 
triangle technique. An intermediate triangle with three 
2.5.1 Mathematical model  seeds is created as follows for each appropriate seed in the 
This section presents an optimization technique based on search field 𝑋𝑖: 
the ideas of chaos theory. The mathematical foundation of • Positioning of the previously identified Global 
the CGO algorithm is developed based on the basic Best (GB), 
concepts of fractals and chaotic games. The CGO • The average group's location (𝑀𝐺𝑖),  
algorithm considers several solution candidates (X) that • The 𝑖th resolution competitor (𝑋𝑖) is the chosen 
suggest certain able seeds within a Sierpinski triangle seed. 
because many natural evolution algorithms keep an array Although the mean values of randomly chosen 
of solutions that evolve through random modifications and eligible seeds with an equal chance of integrating the 
selections. Each solution candidate (𝑋𝑖) in this method currently regarded starting eligible seed (𝑋𝑖) are reflected 
𝑗
contains a set of choice factors (𝑥𝑖 ) that represent where in the 𝑀𝐺𝑖, the GB is the best solution candidate with the 
the eligible seeds are located inside a Sierpinski triangle. highest eligibility levels. Together with the identified 
The enhancement scheme uses the Sierpinski triangle to eligible seed (𝑋𝑖), the GB and 𝑀𝐺𝑖 create a Sierpinski 
explore potential solutions. In the enhancement scheme, triangle. In order to generate some more seeds that can be 
the Sierpinski triangle is used to look for possible regarded as fresh eligible seeds for finishing the Sierpinski 
solutions. The quantitative treatment of these aspects is triangle, a temporary triangle is made inside the search 
given below: area for each of the first eligible seeds, as was previously 
𝑋1 indicated. Four strategies are suggested to accomplish this 
 𝑋  
 2 aim. The 𝑖th permanent triangle (ith repetition) includes a 
𝑋 =  
⋮  Sierpinski triangle's three vertices [GB (green seed), 𝑀𝐺𝑖 
 𝑋𝑖  (red seed), and 𝑋𝑖 (blue seed)] in addition to the n 
 ⋮  appropriate seeds that were accessible in the previous 
[𝑋𝑛] cycle. This homemade triangle uses the chaotic game 
 𝑥
1 2 𝑗 𝑑
1 𝑥1 … 𝑥1 … 𝑥 (23) 
1  principle to produce fresh seeds using one die and three 
 𝑥12 𝑥2 𝑗
2 … 𝑥2 … 𝑥𝑑 seeds. 𝑋𝑖 is used to hold the first seed, GB for the second, 
2  
 and 𝑀𝐺𝑖 for the third. For the first seed, a die with three 
= ⋮ ⋮ … ⋮ ⋱ ⋮  𝑖 = 1,2, . . , 𝑛.
  , {  green and three red faces was utilized. Upon rolling the 
 𝑥
1
𝑖 𝑥2 𝑗
𝑖 … 𝑥𝑖 … 𝑥𝑑 𝑖 = 1,2, … , 𝑑.
𝑖  dice, the seed in the 𝑋𝑖is shifted to the 𝑀𝐺𝑖 (red face) or 
 ⋮ ⋮ … ⋮ ⋱ ⋮  the GB (green face) based on the resulting color. This 
[𝑥1𝑛 𝑥2𝑛 … 𝑥𝑖 𝑑
𝑛 … 𝑥𝑛 ] element is replicated using a random number generation 
For each seed in the Sierpinski triangle (search area), method that generates just 2 values, 0 as well as 1, 
the count of permissible seeds, or potential solutions, is n; enabling the choice of red or green faces. When the green 
and d is the seed's size. Random selection is used to face is visible, the 𝑋𝑖 seed advances in the direction of the 
determine where these appropriate seeds are initially GB; it moves toward the 𝑀𝐺𝑖. Even if each green or red 
placed in the search space. face has an equal chance of appearing in the game, the 
𝑗
𝑥𝑖 (0) = 𝑥
𝑖
𝑚𝑖𝑛 + 𝑟𝑎𝑛𝑑 potential of getting two equivalent random integers for the 
𝑗 𝑗 𝑖 = 1,2, … , 𝑛. (24) GB and the 𝑀𝐺𝑖 is also taken into account. The direction 
∙ (𝑥𝑖,𝑚𝑎𝑥 − 𝑥𝑖,𝑚𝑖𝑛), {  
𝑗 = 1,2,… , 𝑑. of the 𝑋𝑖 's seed advancement is a line segment that 
The beginning position of the eligible seeds is defined connects the GB with the 𝑀𝐺𝑖. The flow of seeds within 
𝑗 𝑗 𝑗
by 𝑥𝑖 ; 𝑥𝑖,𝑚𝑎𝑥 as well as 𝑥𝑖,𝑚𝑖𝑛 indicate the maximum and the search area must be restricted because of the chaotic 
lowest permitted values for the ith solution candidate's jth game method; hence, this component is controlled by 
choice variable; rand is a random integer within the range certain at-random factorials that were created: 
1
[0,1]. The way dynamical systems, often known as self- 𝑆𝑒𝑒𝑑𝑖 = 𝑋𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝐺𝐵 − 𝛾𝑖 ×𝑀𝐺𝑖), 𝑖 (25) 
similar and self-organizing systems, behave, as was = 1,2, … , 𝑛. 
previously described, and display specific fundamental 𝑋𝑖 displays the 𝑖𝑡ℎ resolution candidate, GB denotes 
patterns serves as the foundation for the core ideas of the global best discovered thus far, and 𝑀𝐺𝑖   displays the 
chaos theory. The fundamental dynamical system patterns mean of a few selected, qualified seeds. While 𝛽𝑖 and 𝛾𝑖 
according to chaos theory are exhibited by eligible seeds, indicate a random integer between 0 and 1 to enable die 
which are acquired beginning positions. It is possible to rolling, 𝛼𝑖 is a randomly generated factorial to reflect seed 
ascertain whether these seeds are suitable to function as movement limitations. Three blue and three red-faced dice 
fundamental patterns (self-similarity) for an optimization are used for the next seed (GB). Either the 𝑀𝐺𝑖 (red face) 
issue by employing potential solutions (𝑋). The candidates or the 𝑋𝑖 (blue face) receives the seed in the GB, 
for the solutions with the greatest and worst fitness values depending on the color that emerges from rolling the dice. 
306 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
The model used in this section is the same as the original constructed. For the variables that violate the technique, a 
seed. If a blue face emerges, the seed travels to the 𝑋𝑖; if a 𝑗
boundary change is ordered if the 𝑥𝑖  is beyond the 
red face appears, the seed goes to the 𝑀𝐺𝑖. Another seed, parameter's range. The most repetitions that can be done 
like the first, can travel towards a location on the in which the optimization process takes place serves as the 
connecting lines between 𝑋𝑖and 𝑀𝐺𝑖 . This motion is basis for the termination criterion. 
restricted by randomly produced factorials.  
𝑆𝑒𝑒𝑑21 = 𝐺𝐵 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 ×𝑀𝐺𝑖), 𝑖 (26) 2.6 Chef-Based Enhancement scheme 
= 1,2, … , 𝑛. 
(CBOA) 
where each of the variables 𝛽𝑖 and 𝛾𝑖 is a random 
value of 0 or 1 to simulate the option of rolling a die, and A metaheuristic method called CBOA was just introduced 
𝛼𝑖 is the randomly generated factorial for characterizing by [34]. The CBOA's mathematical representation and 
the mobility limitations of the seeds. The remaining natural architecture are covered in this section.  
requirements are the same as those listed for the initial 
seed. The third seed is employed to roll a die with green 2.6.1 Mathematical model of CBOA  
and blue faces, 𝑀𝐺𝑖. The seed is directed toward either the Below is a presentation of the CBOA mathematical model 
𝑋𝑖 (blue face) or the GB (green face) depending on the using the situation from Section 2.1. First, the 
color. An approach for generating random numbers is used initialization stage of the algorithm is initiated, much like 
to duplicate this element. It yields just 2 values, 0 and 1, in other metaheuristics. There are 2 populations as a result 
so that users may select between the blue or green faces. of the CBOA: elite agents and candidate solutions. 
Additionally, the lines connecting the 𝑋𝑖 and GB can be Therefore, as shown by Eq. (30), a matrix may be used to 
followed by the seed. Some random factorials are also represent the CBOA members. 
used to achieve this goal, such as: 𝑋1 𝑥1,1 … 𝑥1,𝑑𝑖𝑚
𝑆𝑒𝑒𝑑31 = 𝑀𝐺𝑖 + 𝛼𝑖 × (𝛽𝑖 × 𝑋𝑖 − 𝛾𝑖 × 𝐺𝐵), 𝑖 𝑋 = [ ⋮ ] = [ ⋱ ]  
(27) (30) 
= 1,2, … , 𝑛. 𝑋𝑁 𝑥
𝑁×1 𝑁,𝑑𝑖𝑚 … 𝑥𝑁,𝑑𝑖𝑚 𝑁×𝑑𝑖𝑚
In order to generate the fourth seed, an additional where 𝑁 is the population size, dim is the issue length 
method is employed to carry out the modification stage in (𝑎 ∈ [1, 𝑁], 𝑏 ∈ [1, 𝑑𝑖𝑚]), 𝑋 is the CBOA population 
the qualifying seeds' position updates within the search matrix, and 𝑥𝑎,𝑏 indicates the value of the bth problem 
area. Changes in this seed's position are made depending parameter for the ath CBOA member. CBOA members' 
on arbitrary adjustments made to the randomly chosen locations are established using Eq. (31): 
decision criteria. Eq. (28) depicts a schematic depiction of 𝑥𝑎,𝑏 = 𝐿𝑂𝑊𝑏 + 𝑟𝑎𝑛𝑑 ∙ (𝑈𝑃𝑏 − 𝐿𝑂𝑊𝑏) (31) 
the specified procedure for the 4th seed; it has the Where rand is an arbitrary number in the range of [0, 
following mathematical representation: 1], 𝐿𝑂𝑊𝑏  and 𝑈𝑃𝑏  are the lower and upper limits of the 
𝑆𝑒𝑒𝑑4 = 𝑋 𝑘
𝑖 𝑖(𝑥𝑖 = 𝑥
𝑘
𝑖 + 𝑅),   𝑘 = [1,2, … , 𝑑]. (28) 𝑏𝑡ℎ problem factor, correspondingly. Each member's goal 
Where 𝑘 is an integer at random in the interval [1, 𝑑] function may be determined and expressed as a vector 
and 𝑅 is a random number with uniform distribution in the according to Eq. (32): 
region [0, 1]. Four formulations for 𝛼𝑖, which controls the 𝐹𝑖𝑡𝑥𝑋1
mobility limitations of the seeds, are provided in order to 𝐹𝑖𝑡 = [ ⋮ ]  (32) 
alter the exploration and exploitation rate of the CGO 𝐹𝑖𝑡𝑋𝑁 𝑁×1
algorithm. 𝐹𝑖𝑡 symbolizes the values of objective functions, 
𝑅𝑎𝑛𝑑 whereas 𝐹𝑖𝑡𝑋𝑎 displays the value of a member. The 
2 × 𝑅𝑎𝑛𝑑
𝛼𝑖 = { (𝛿 × 𝑅𝑎𝑛𝑑) + 1  (29) objective function's value is used as the selection criteria 
for selecting the best candidate solution. The optimal 
(휀 × 𝑅𝑎𝑛𝑑) + (~휀) member of the population and potential solution is the one 
In this case, δ as well as ε are indeterminate numbers that has the highest value for the objective function. It's 
in the interval [0,1], and Rand is a randomly dispersed, time to complete the CBOA's processing steps after the 
equally distributed number in that interval. Given the self- algorithm has been launched. The CBOA is composed of 
similarity problems in the fractals, the eligibility of the two demographic groups: elite agents and candidate 
new and existing seeds should be jointly assessed to solutions. These two groups' update procedures are 
decide if the additional seeds ought to be included in the different. Its elements are changed at each cycle, and the 
search space's overall count of eligible seeds. The best values of the aim function are computed and evaluated. As 
new solution candidates are retained after being vetted; a result, the best member is changed after each repetition. 
seeds with the lowest fitness values, or the lowest degrees Upon comparing the values of the objective function, elite 
of self-similarity, are removed. It is important to note that agents are selected from among the CBOA members with 
the mathematical method reduces the mathematical the highest values. The values of the goal function are used 
model's complexity by using substitution. Actually, the to sort the population matrix in decreasing order. 
entire form of the Sierpinski triangle has been completed 
using all of the qualifying seeds found in the search region. 
𝑗
To cope with the solution variables 𝑥𝑖  breaching the 
boundaries of the factors, a mathematical flag is 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 307 
 
𝑆𝑋1 get the aim function's ideal value. This updating technique 
 ⋮  
 is beneficial since every person searches for better 
𝑆𝑋 =  𝑆𝑋
 
𝑁𝐶 opportunities in the vicinity, independent of the location 
 ⋮  of other community members. This idea is to use Eqs. (37) 
[ 𝑆𝑋𝑁 ]𝑁×1 to (38) to produce a random position around each culinary 
𝑠𝑥1,1 𝑠𝑥1,𝑑𝑖𝑚 (33) instructor in the search space for each issue variable 𝑏 ∈
 ⋮ ⋮  
  [1, 𝑑𝑖𝑚]. If this random site increases the goal function's 
𝑠𝑥𝑁𝐶,1 𝑠𝑥𝑁𝐶,𝑑𝑖𝑚  
=  value, it can be updated. Eqs. (39) to (40) are used to 
𝑠𝑥 …
 𝑁𝐶+1,1 𝑠𝑥   
𝑁𝐶+1,𝑑𝑖𝑚 model this scenario. 
 ⋮ ⋮  (𝑙𝑜𝑐𝑎) (𝑙𝑜𝑐𝑎𝑙)
𝐿𝑂𝑊
[ 𝑠𝑥𝑁,1 𝑠𝑥𝑁,𝑑𝑖𝑚 ] 𝑏 = 𝐿𝑂𝑊𝑏 /𝑖𝑡𝑒𝑟 (37) 
𝑁×𝑑𝑖𝑚 (𝑙𝑜𝑐𝑎𝑙)
𝑆𝐹𝑖𝑡𝑋 𝑈𝑃 38) 
1 𝑏 /𝑖𝑡𝑒𝑟 (
 ⋮  (𝑙𝑜𝑐𝑎𝑙) (𝑙𝑜𝑐𝑎𝑙)
Here, 𝐿𝑂𝑊
 𝑏 and 𝑈𝑃𝑏  show the local 
𝑆𝐹𝑖𝑡𝑋  
𝑁𝐶
𝑆𝐹𝑖𝑡 =   boundaries of the 𝑏𝑡ℎ issue variable, where 𝑖𝑡𝑒𝑟 is a 
34) 
 𝑆𝐹𝑖𝑡𝑋
 (
𝑁𝐶+1 parameter for repetition. 
 ⋮  (
𝑠𝑥 𝐶𝑆𝑆
) (
[ 𝑆𝐹𝑖𝑡𝑋 𝑎,𝑏 = 𝑠𝑥 𝑊 𝑙𝑜𝑐𝑎𝑙)
𝑎,𝑏 + 𝐿𝑂 𝑏  
𝑁 ]
𝑁×1 ( ) (
Where NC is the count of chef instructors, 𝑆𝑋 denotes +𝑟𝑎𝑛𝑑. (𝑈𝑃 𝑙𝑜𝑐𝑎𝑙 𝑐𝑎𝑙) (39) 
𝑏 − 𝐿𝑂𝑊 𝑙𝑜
𝑏 ), 𝑗 − 1, 𝑁𝐶, 𝐽 
the sorted demographic matrix, and SFit displays the = 1,… , 𝑑𝑖𝑚𝑚 
(𝐶𝑆𝑆) (𝐶𝑆𝑆)
ascending objective function value vector. Following that, 𝑆𝑋 , 𝑆𝐹𝑖𝑡
𝑆𝑋 𝑎 𝑎
𝑎 = { < 𝐹𝑖𝑡𝑎 (40) 
changes will be made in 2 steps for each group, from 1 to 𝑆𝑋𝑎 , 𝑒𝑙𝑠𝑒
𝑁𝐶 and 𝑁𝐶 +  1 𝑡𝑜 𝑁. 𝑁𝐶 has started to represent one- (𝐶𝑆𝑆)
𝑆𝑋𝑎  is the new location for the ath-ranked 
fifth of the entire population in the first group division. For membership according to the chef's next strategy called 
instance, 𝑁𝐶 = 6 if there are 30 populations in the (𝐶𝑆𝑆) (𝐶𝑆𝑆)
𝐶 𝑆𝑆, 𝑠𝑥
beginning. All cycles or the end of the epochs result in the 𝑎,𝑏  displays its 𝑏𝑡ℎ manage, and 𝑆𝐹𝑖𝑡𝑎 is the 
goal variable value.  
availability of a single chef. 
Step 2- candidate solutions ' updates As per the 
Step 1- Updating for chef instructors:  
CBOA, candidate solutions pursuing culinary arts use 
Chef instructors use the two best chef instructors' 
these three methods to enhance their cooking abilities: 
strategies to hone their culinary skills. At first, they try to 
A chef trains each student, randomly assigning them 
acquire chef educator methods by imitating the best elite 
to a class. This method has the benefit of having a chef 
agent. This plan describes the global exploration and 
mentor the pupils, which helps them acquire new skills. It 
capabilities of the CBOA. The primary benefit of this 
alludes to users who have moved to the other search zone 
upgrade is that before instructing candidate solutions, chef 
in the technique. If the best chef instructor teaches pupils, 
educators may test their skills against the best chefs. This 
on the other hand, there won't be a worldwide search since 
method allows for the upgrading of candidate solutions, 
there will be a computational bias in favor of the best. The 
not only the most gifted individuals. By doing this, it 
guidance and training of the elite agent determine each 
prevents the algorithm from being stuck in the local 
culinary student's new role. This situation is expressed in 
optimum and promotes more precise and effective 
Eq. (41). 
scanning over the many search space regions. In this 
(
example, freshly established cooking teacher posts are 𝑠𝑥 𝑆𝐹𝑆
)
𝑎,𝑏 = 𝑠𝑥𝑎,𝑏 + 𝑟𝑎𝑛𝑑 
(41) 
filled using Eq. (35). ∙ (𝐶𝐼𝑅 − 𝐼𝑛𝑑 ∙ 𝑠𝑥
𝑎,𝑏 𝑎,𝑏) 
𝑠𝑥 (𝐶𝐹𝑆)
𝑎,𝑏 = 𝑠𝑥𝑎,𝑏 + 𝑟𝑎𝑛𝑑 Based on the learner's initial strategy, known as SFS, 
(35) 
∙ (𝐵𝑒𝑠𝑡𝐶𝑏 − 𝐼𝑛𝑑 ∙ 𝑠𝑥𝑎,𝑏) the updated position for the 𝑎th-sorted member is 
𝑠𝑥 𝐶 ( )
𝑎,𝑏 𝐹 𝑆 specifies the first strategy for switching expressed as 𝑠𝑥 𝑆𝐹𝑆𝑎,𝑏 , where 𝐶𝐼𝑅  is the elite agent and 𝑅 
𝑎,𝑏
chef instructors, and 𝐶𝐹𝑆 indicates the new role for the is an arbitrary index in the interval [0, 𝑁𝐶]. New locations 
ath-ordered member in the bth manage. The best chef are found using Eq. (42). 
instructor in the bth coordinate, or 𝑆𝑋1 in the 𝑆𝑋 matrix, is 
𝑆𝑋𝑆𝐹𝑆
(𝑆𝐹𝑆)
, 𝑆𝐹𝑖𝑡
represented by 𝐵𝑒𝑠𝑡𝐶𝑏. I nd is a randomly chosen number 𝑆𝑋 𝑎 𝑎
𝑎 = { < 𝐹𝑖𝑡𝑎 (42) 
𝑆𝑋
from the set {1,2}, and rand is an arbitrary number in the 𝑎, 𝑒𝑙𝑠𝑒
(𝑆𝐹𝑆)
interval [0,1]. Eq. (36) is used to determine this condition: 𝑆𝐹𝑖𝑡𝑎  is the ultimate value for SFS. 
(𝐶𝐹𝑆) (𝐶𝐹𝑆)
𝑆𝑋 𝑖𝑡 The CBOA's technique involves treating every factor 
𝑆𝑋 = { 𝑎 , 𝑆𝐹 𝑎 < 𝐹𝑖𝑡𝑎
𝑎  (36) as a skill. Each student learns and mimics one of the chef 
𝑆𝑋𝑎,   𝑒𝑙𝑠𝑒
(𝐶𝐹𝑆) instructor's skills. 𝐴𝑛 instructor chosen at random from the 
In this equation, 𝑆𝐹𝑖𝑡𝑎  displays the objective collection 𝐶𝐼𝑅  is used (𝑅 is selected from [1, 𝑁𝐶]). This is 
(𝐶𝐹𝑆)
function of 𝑆𝑋𝑎 , and Fita is the fitness function ath comparable to changing just one variable instead of every 
member. Based on the second method, each culinary possible answer in terms of algorithms. This enhances 
teacher strives to develop their abilities via individual global exploration and search. In order to recreate this 
practice. This method intends to increase CBOA's situation, the first lead instructor, represented by the 
exploitation capabilities and local search. Every elite 𝐶𝐼𝑅  vector, is randomly selected for each culinary learner 
𝑎
agents culinary expertise identifies the factors needed to 𝑠𝑥𝑎 (a CBOA member selected at random from Ra's index 
308 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
from [1, 𝑁𝐶]). To represent a talent of the selected head also allowing the algorithm to find more practical answers 
instructor, the cth coordinate of the vector of 𝑠𝑥𝑎, the that are closer to previously discovered solutions. When 
culinary pupil, is picked at random from [1, 𝑑𝑖𝑚]. 𝐶𝐼𝑅  is every obstacle is viewed as a skill, kids will work to 
𝑐
this value. In this case, Eq. (43) may be used to calculate improve these skills in order to become more fit. Thus, Eq. 
the new location: (45) is used to find new locations. 
The selection of HGSO, CGO, and CBOA stems from 
(𝑆𝑆𝑆) 𝐶𝐼𝑅 , 𝑏 = 𝑐
𝑠𝑥𝑎, = { 𝑎,𝑏
𝑏  (43) 
𝑠𝑥 their distinct abilities to enhance exploration and 
𝑎,𝑏 , 𝑒𝑙𝑠𝑒
exploitation during model optimization critical in high-
where 𝑏 is the problem size ([1, 𝑑𝑖𝑚]), 𝑎 matches the 
dimensional, nonlinear domains like diabetes prediction. 
population and takes a value in the range of [𝑁𝐶 +
HGSO draws on thermodynamic principles to escape local 
 1, 𝑁𝐶 +  𝑁], c is a random integer selected from 
optima, improving convergence reliability. CGO 
[1, 𝑑𝑖𝑚], and SSS is the student's next strategy. 
leverages fractal-inspired chaotic dynamics, offering 
Consequently, the location update is established using Eq. 
effective global search in complex spaces. CBOA mimics 
(44). 
human learning strategies to balance global and local 
(𝑆𝑆𝑆) (𝑆𝑆𝑆)
𝑆𝑋𝑎 , 𝐹𝑖𝑡𝑆𝑎 < 𝐹𝑖𝑡
𝑆𝑋𝑎,𝑏 = {
𝑎 (44) refinement. While these optimizers are general-purpose, 
𝑆𝑋𝑎 , 𝑒𝑙𝑠𝑒 their adaptability makes them suitable for fine-tuning 
(𝑆𝑆𝑆)
𝑆𝑋𝑖  relates to the new position of 𝑎𝑡ℎ ranked model parameters in sensitive health-related tasks. These 
member based on 𝑆𝑆𝑆. schemes were integrated to boost classification 
Using one of the two last methods, personal activities performance beyond what standalone models achieve. 
or research, each culinary student aims to grow personally. Although formal ablation studies were not conducted here, 
This is the algorithm's exploitation stage. The benefit of the comparative evaluation highlights clear improvements 
this approach is that it makes local search stronger while in predictive metrics, justifying their inclusion.
(𝑙𝑜𝑐𝑎𝑙) (𝑙𝑜𝑐𝑎𝑙) (𝑙𝑜𝑐𝑎𝑙)
(𝑆𝑇𝑆) 𝑠𝑥 𝑂
= { 𝑎,𝑏 + 𝐿 𝑊𝑏 + 𝑟𝑎𝑛𝑑 ∙ (𝑈𝑃𝑏 − 𝐿𝑂𝑊
𝑠𝑥 𝑏 )
𝑎,𝑏  (45) 
𝑠𝑥𝑎,𝑏 ,   𝑒𝑙𝑠𝑒
 
where 𝑟 dim is a random number chosen where in the further analysis the sign TP designates 
(𝑆𝑇𝑆)
from [1, 𝑑𝑖𝑚] and 𝑠𝑥𝑎,𝑏  displays the updated calculated the case of a positive forecast of the good luck, FP - the 
state of the ath member based on the student's third abbreviation of fall positive - is used in the case when the 
strategy (𝑆𝑇𝑆). Eq. (46) displays the changes: outcome of a case is bad. In the case when the forecast is 
(𝑆𝑇𝑆) (𝑆𝑇𝑆) negative and the real result is really negative TN gives the 
𝑆𝑋𝑎 , 𝐹𝑖𝑡𝑆
𝑆𝑋 𝑎 < 𝐹𝑖𝑡
𝑎,𝑏 = {
𝑎 (46) same result. The FN means a bad forecast when the real 
𝑠𝑥𝑎,𝑏  , 𝑒𝑙𝑠𝑒 result is good. 
(𝑆𝑇𝑆)
Fit 𝑆𝑋𝑎  displays the desired function value of 
(𝑆𝑇𝑆)
𝑆𝑋𝑎  as 𝑆𝑇𝑆. Culinary learners and elite agents discuss 3 Result and discussion 
𝐶𝐵𝑂𝐴 tactics. 
The results obtained from these hybrid schemes are 
represented comprehensively with various graphs and 
2.7 Performance evaluator tables. These tools systematically compare and contrast 
A variety of indicators are utilized to assess classifier each model's performance for an in-depth assessment of 
performance. The term "accuracy" refers to the proportion the functions of each model. From a careful study of the 
of accurately predicted observations. Three commonly results represented in the graphs and tables, insightful 
used metrics are recall, accuracy, and precision. Total analysis is performed to identify the best model that 
accuracy, which encompasses both real negatives and performs well in terms of predictive accuracy and 
positives, is referred to as accuracy. Unbalanced datasets suitability for the prediction process. Moreover, this 
can lower accuracy. Recall finds only positives and review also points out schemes with flaws or limits, 
assumes minimal mistakes. The F1 score is helpful in adding a critical perspective to the work, especially in 
schools with different distributions since it balances respect of their applicability to real-life scenarios. This 
recollection and accuracy. It can handle both false strong assessment methodology allows researchers to 
negatives and real positives. These measures assist in make informed decisions on model selection and 
estimating the efficacy of ML schemes. optimization for prediction tasks, helping to advance not 
TP + TN only the science but also practical applications behind 
Accuracy =  (47) 
TP + TN + FP + FN predictive modeling. 
TP
Precision =  (48) 
TP + FP 3.1 Convergence curve 
TP TP
Recall = TPR = =  (49) The convergence curve has a significant influence on 
P TP + FN
2 × Recall ×  Precision prediction processes since it displays the rate at which a 
F1 score =  (50) scheme learns. A steep slope in the convergence curve 
Recall + Precision displays that convergence happens fast, and hence, the 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 309 
 
model quickly learns the pattern and forecasts stabilize. In iterations, revealing learning stability and showing which 
contrast, a shallow curve indicates slower convergence, schemes reach optimal accuracy most efficiently during 
which means the model takes longer to comprehend training. It can be seen from this figure that, among the 
patterns, and hence, the predictions are highly LDCB, LDCG, and LDHG schemes, the LDCG model, 
unpredictable throughout training. This helps to which has reached an accuracy of 0.930, has been 
understand this curve for optimizing the training tactics outperformed by the LDCB model with 0.968 accuracy, 
and finding a balance between underestimating and whereas its accuracy is higher than that of the LDHG 
overfitting. The suggestions made include those of model, which stands at 0.921. Similarly, among the 
learning rate changes, batch size changes, and model GPHG, GPCG, and GPCB schemes, the GPHG schemes 
topology for best prediction performance with no showed an accuracy of 0.942, proving that their accuracy 
convergence or wasted time in unnecessary training. The is the lowest compared to the GPCG model, which had an 
convergence curve in Fig. 3 illustrates and compares the accuracy of 0.960, and the GPCB model, which had an 
results of the hybrid schemes presented. Fig. 3 displays the accuracy of 0.980. Their optimal condition was achieved 
convergence behavior of each hybrid model across after 60 cycles.
  
Figure 3: 3D The convergence curve for the 3 schemes 
outperforms the precision value of the LDCG model, 
3.2 Schemes comparison  which stands at 0.935, during the training phase. 
Upon comparing the outcomes of the schemes during 
Table 1 displays the outcomes of both the LDR and GPC 
the testing phase, it becomes apparent that the recall value 
schemes, as well as their respective hybrid forms in 
of the hybrid forms of GPC schemes exceeds that of the 
different phases. Table 1 summarizes the accuracy, 
hybrid form of the LDR model. Specifically, during the 
precision, recall, and F1-scores of all models during 
testing phase, it is evident that LDCG, with a recall value 
training, testing, and overall phases, enabling side-by-side 
of 0.922, demonstrates weaker functionality than GPCG, 
evaluation of classifier performance. In the training phase, 
which achieves a recall value of 0.957. However, 
it becomes apparent that the functionality of the LDR 
following the LDCB model with a recall value of 0.961, 
model, boasting an accuracy of 0.916, falls short than 
the LDCG model boasts the highest value among its group 
another base model, GPC, achieving 0.937 accuracy in the 
members. Conversely, GPCG, with a recall value of 0.957, 
same phase. Similarly, its hybrid counterpart, the LDHG 
signifies that its performance surpasses that of the GPHG 
model, with an accuracy of 0.926, also lags behind the 
and GPC schemes, which have recall values of 0.935 and 
GPHG model with 0.946 accuracy. Furthermore, the 
0.909, in that order, although it does not outperform 
precision value of the GPCG model, reaching 0.963, 
GPCB, with a recall value of 0.978, during the testing 
phase.
Table 1: The outcome of the showcased developed schemes 
Metric values 
Section Model 
Accuracy Precision Recall F1-score 
LDR 0.916 0.917 0.916 0.917 
LDHG 0.926 0.925 0.926 0.925 
Train 
LDCG 0.935 0.935 0.935 0.935 
LDCB 0.972 0.972 0.972 0.972 
310 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
GPC 0.937 0.937 0.937 0.937 
GPHG 0.946 0.947 0.946 0.946 
GPCG 0.963 0.963 0.963 0.963 
GPCB 0.981 0.981 0.981 0.981 
LDR 0.874 0.876 0.874 0.875 
LDHG 0.913 0.913 0.913 0.913 
LDCG 0.922 0.921 0.922 0.921 
LDCB 0.961 0.961 0.961 0.961 
Test 
GPC 0.909 0.914 0.909 0.910 
GPHG 0.935 0.937 0.935 0.936 
GPCG 0.957 0.961 0.957 0.957 
GPCB 0.978 0.979 0.978 0.978 
LDR 0.904 0.905 0.904 0.904 
LDHG 0.922 0.922 0.922 0.922 
LDCG 0.931 0.931 0.931 0.931 
LDCB 0.969 0.969 0.969 0.969 
All 
GPC 0.928 0.929 0.928 0.929 
GPHG 0.943 0.944 0.943 0.943 
GPCG 0.961 0.962 0.961 0.961 
GPCB 0.980 0.981 0.980 0.980 
 
The 3D wall plot of Fig. 4 visualizes model accuracy between its measures, which remain around 0.928 with 
comparison across three different phases, namely regard to both accuracy and recall, demonstrating an 
Training, Testing, and All. By taking into account the overall robust behavior in performance. In sharp contrast, 
performances for all the phases of three schemes, a the LDHG model displays very consistent results in all 
number of thrilling trends can be found out. First and four metrics, reaching a stable performance of 0.922 in all, 
foremost, during the All phase, the LDR model performed reflecting a balanced performance considering different 
best among them with a marvelous score of its precision evaluation standards. In contrast, the GPHG model has 
metric 0.905, which really exhibits the competency of this strengths and weaknesses mixed up on the metrics. 
model with a touch towards precision. With that said, GPC Although it has a very commendable score in the precision 
outcompetes all its contenders during the same stage with metric of 0.944, the value is low in other metrics, having 
outstanding precision and F1 score records at an 0.943 for accuracy, recall, and F1 score, showing its 
astonishing 0.929, while it preserves high consistency relative weakness in those aspects.
  
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 311 
 
  
  
  
Figure 4: 3D Walls-plot for the performance of the schemes across phases 
Table 2 presents a comparison of the functional conditions. For instance, the LDR model showcases an 
performance of schemes under both healthy and diabetes accuracy of 0.93 under healthy conditions, aligning with 
312 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
the precision value of the LDHG model. However, the Nevertheless, the hybrid forms of the GPC model 
LDCB model emerges as the top performer with a showcase superior functionality in contrast to the LDA 
precision value of 0.97, indicating its superiority over the scheme and its variants. 
LDCG model, which achieves a precision value of 0.94, Furthermore, under diabetes conditions, the LDCB 
as well as other preceding schemes. Among the hybrid model exhibits a higher recall value of 0.95, surpassing the 
versions of the GPC model, the GPCB and GPCG recall values of the LDCG, LDHG, and LDA schemes, 
schemes emerge with the highest accuracy under healthy which stand at 0.90 and 0.88, in that order. Moreover, the 
conditions, boasting precision values of 0.99 and 0.98, recall value of the LDCB model exceeds that of the GPC 
respectively. Following closely, the GPHG model and GPHG schemes, which are 0.91 and 0.94, 
achieves a precision value of 0.97, while the GPC model respectively. However, it falls short of surpassing the 
records a precision value of 0.95, indicating slightly recall values of the GPCG and GPCB schemes, which are 
weaker functionality compared to the former schemes. 0.96 and 0.98, respectively.
Table 2: Categorization of assessment criteria for the performance of the developed schemes 
Metric Model 
Condition 
values LDR LDHG LDCG LDCB GPC GPHG GPCG GPCB 
Healthy 0.93 0.93 0.94 0.97 0.95 0.97 0.98 0.99 
Precision 
Diabetes 0.85 0.90 0.91 0.96 0.88 0.90 0.93 0.97 
Healthy 0.92 0.95 0.95 0.98 0.94 0.94 0.96 0.98 
Recall 
Diabetes 0.88 0.88 0.90 0.95 0.91 0.94 0.96 0.98 
Healthy 0.93 0.94 0.95 0.98 0.94 0.96 0.97 0.98 
F1-score 
Diabetes 0.86 0.89 0.90 0.95 0.90 0.92 0.95 0.97 
 
The column line symbol plot in Fig. 5 provides a values. Conversely, under the healthy condition, both 
comparison between the values recorded in both healthy GPC and GPHG schemes achieve 468 and 471 out of 500 
and diabetic situations and the values predicted by the measured values, respectively, indicating lower accuracy 
schemes. Under the diabetes condition, it is evident that compared to the GPCG and GPCB schemes, which 
the LDCB model, with 254 out of 268 measured values, achieve 480 and 491 out of 500 measured values, 
demonstrates higher accuracy than the LDCG model, respectively. Besides, the GPCG and GPHG schemes 
which achieves 240 out of 267 measured values. attain values of 258/268 and 253/268, respectively, under 
Similarly, the base model, LDR, performs better with 236 the diabetes condition, which indicates moderate 
out of 268 measured values compared to the LDHG performance by the GPCB model, with attained values of 
model, which also achieves 236 out of 268 measured 262/268, and the GPC model, at 245/268.
  
Figure 5: Column-line symbol plot to represent the difference among the schemes
To avoid overfitting, the model's performance was performance in all phases give us an idea of how strong 
checked at three different phases: training, testing, and they are. In the future, we will use cross-validation and 
overall. Also, the fact that the training and testing explicit regularization approaches to better control 
measures show the same patterns means that the model is overfitting and make the model more generalizable. 
generalizing instead of overfitting. Even though there  
wasn't a formal validation set, the hybrid schemes' 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 313 
 
The ROC is a measure that fundamentally depends on the suggested schemes are carefully analyzed with the 
how well binary classifiers work. It compares the false help of the ROC curve, which is a perfect inseparable tool 
positive rate (1-specificity) to the true positive rate used to analyze the performance of the classifier. It is 
(sensitivity) at various thresholds. This graph conveys observed, upon detailed analysis, that GPCB and GPCG 
useful information about the capability of the classifier to are ahead of their competitors in reaching a TPR value of 
differentiate classes in all possible threshold settings. The 1.0 at an earlier stage and hence delivers exceptional 
ROC is a tool that actually enables the researchers to study performance in classification problems. After that, LDCB 
the compromise between true positives and false positives, and GPHG come very close as the second and third 
thus giving a complete view of the efficiency of the schemes, reaching a TPR of 1.0 just a little later but with 
classifier. Besides, the ROC's AUC gives a quantitative a sharp increase, further establishing their effectiveness. 
measure of the discriminatory power of a classifier, where In sharp contrast, the LDR model lags far behind its 
larger AUC means better performance. Also, the ROC plot counterparts since its vector has the gentlest slope among 
allows for better selection of the optimal cut-off value to the compared schemes. Nevertheless, the LDR model 
classify the samples according to the needs of the specific eventually attains 1.0 TPR but takes its time in comparison 
application, considering sensitivity and specificity to get with the others. The above analysis displays how different 
the same result desired. Therefore, the ROC curve schemes may perform to the extent and also how often the 
displays a very important means for testing, comparing, ROC curve proves useful for making subtle choices 
and fine-tuning binary classification schemes, thus regarding classifier behavior, which might not be 
contributing to enhanced ML model predictive power in a immediately apparent in other forms, and helps drive 
slew of applications. Moreover, in Fig. 6, the outcomes of better decisions for predictive modeling tasks.
 
Figure 6: ROC curves depict the performance of the most efficient hybrid schemes 
The SHAP additive explanations in Fig. 7 depict the nutrition, regular physical activity, and medication is 
effects of various factors such as glucose or BMI considered a significant approach to diabetic 
indicators that influence the possibility of diabetes. The prevention and management. BMI, which is 
following explanation succinctly defines the effects of determined using weight and height measures, is 
such factors on the occurrence of diabetes. another widely accepted indicator of body fatness 
• High levels of blood glucose, normally due to associated with the risk of developing diabetes. 
excessive consumption of sugar or reduced action of • A high BMI means excess adipose tissue interferes 
insulin, may eventually lead to the development of with insulin action, apart from increasing the 
diabetes. Blood glucose that remains high over a inflammatory component, leading to insulin 
continuous period places a load on the pancreas resistance and impaired glucose tolerance. The 
secreting insulin, and, with time, may make it lose its underlying fat also secretes hormones and cytokines, 
efficiency. This can result in insulin resistance-a further dampening metabolic processes and 
condition whereby cells become unable to efficiently increasing diabetes risk. Also, a higher BMI is more 
act in response to insulin signals, causing more often than not associated with other risk factors like 
accumulation of glucose. Besides, high levels of sedentary lifestyle and lousy food, increasing the 
glucose can cause the damage of blood vessels and chances of diabetes. By enhancing insulin sensitivity 
neurons, which raise the risk of complication and overall metabolic health, dietary and activity 
development in diabetic patients. Hence, keeping changes that control body mass index (BMI) can 
blood glucose within the norm through proper lower the risk of diabetes. Therefore, maintaining a 
314 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
healthy BMI is crucial for both preventing and 
treating diabetes.
 
Figure 7: The sensitivity analysis results 
Table 3 provides the results of a 5-fold cross-validation strong generalization and low variance. In contrast, the 
for the GPC and LDR models, assessing their stability and LDR model shows slightly lower accuracy across all folds, 
generalization across different subsets of the dataset. Each with values ranging from 0.887 to 0.904. The results 
fold (K1 to K5) represents an independent split where the clearly suggest that GPC outperforms LDR not only in 
model was trained on 80% of the data and tested on the individual experiments but also in terms of cross-validated 
remaining 20%. The GPC model demonstrates reliability. These findings reinforce the robustness of GPC 
consistently high performance across all folds, with for diabetes prediction tasks under varying training-test 
accuracy values ranging from 0.916 to 0.928, indicating partitions.
 
Table 3: K-fold cross validation.  
K Fold Number 
Models 
K1 K2 K3 K4 K5 
GPC 0.920 0.927 0.924 0.916 0.928 
LDR 0.887 0.895 0.901 0.896 0.904 
 
Table 4 presents the results of the Wilcoxon signed- significant result with a p-value of 0.0679, while others 
rank test conducted to compare the performance such as GPC-CBOA and LDR-based hybrids did not show 
differences between baseline classifiers and their hybrid statistically significant improvements, as their p-values 
optimized variants. The test evaluates whether observed exceeded 0.1. The stat column represents the test statistic 
differences in classification performance are statistically for ranking the difference between paired models. These 
significant. A lower p-value (typically < 0.05) indicates a findings validate that only specific optimizer integrations 
statistically meaningful improvement. Among the models, particularly with GPC deliver meaningful predictive 
the GPCHG scheme achieved a p-value of 0.0348, advantages, supporting the selective use of metaheuristics 
indicating a statistically significant enhancement over the in medical classification contexts like Type 2 diabetes 
base GPC model. Similarly, GPCG produced a marginally prediction. 
 
 
 
 
 
 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 315 
 
Table 4: Wilcoxon test. 
Models stat P value 
GPC 644 2.25E-01 
GPC Henry gas solubility optimization 338 3.48E-02 
GPC chaos game Optimization 155 6.79E-02 
GPC Chef-Based Optimization Algorithm 48 4.39E-01 
LDR 1200 2.45E-01 
LDR-Henry gas solubility optimization 824 4.39E-01 
LDR-chaos game Optimization 675 6.80E-01 
LDR-Chef-Based Optimization Algorithm 125 4.14E-01 
GPC 644 2.25E-01 
GPC-Henry gas solubility optimization 338 3.48E-02 
GPC-chaos game Optimization 155 6.79E-02 
GPC-Chef-Based Optimization Algorithm 48 4.39E-01 
 
 
4 Conclusion • Limitations: 
There are several drawbacks to projection using ML 
The various advantages of early detection of diabetes by 
techniques. The most critical problem of overfitting that 
using ML are: it enables early interference, thus 
most schemes biased the training data and gather noise 
preventing the development of complications such as 
rather than underlying patterns, which is poor in 
cardiovascular diseases and neuropathy; ML algorithms 
generalization in unknown data. When the schemes are 
sift through enormous volumes of data to spot patterns that 
relatively simple to represent the complexity of the data, 
are so subtle they could indicate diabetes risk, hence 
underfitting happens with poor accuracy in the forecast. 
improving their accuracy. This will, therefore, be enabling 
Biases in training data can persist in ML schemes, leading 
personalized treatment plans for better patient care. Also, 
to biased forecasts, especially in sensitive domains like 
automating diagnostics cuts down the healthcare costs and 
healthcare and criminal justice. Furthermore, ML 
workload for medical staff. In a nutshell, ML aims at early 
algorithms need big, high-quality datasets for training, 
diabetes detection, providing an improvement for patient 
which are not always available, especially in specialist 
outcomes through easy healthcare access, thus adopting a 
sectors or when dealing with sensitive data. The dynamic 
proactive stance towards the disease's management. 
nature of real-world data makes it challenging to sustain 
However, this work aims to project diabetes using ML 
model correctness over time; hence, regular monitoring 
schemes comprising GPC and LDA, coupled with 3 
and updating become necessary. To solve these 
optimizers: Henry Gass Solubility Optimization, Chef 
limitations, several methods have been tried to reduce 
Base Enhancement Algorithm, and Chaos Game 
overfitting, such as regularization; feature engineering to 
Optimization. With the view of improving the accuracy of 
make the schemes perform better; and algorithms that are 
the prediction, it was decided to couple the schemes with 
fair-aware to reduce biases. All of the above can be further 
the optimizers. These results mean that the model GPC 
improved by enhancing openness and interpretability of 
and its hybrid forms provide better performance than the 
schemes, thus building trust and enabling their adoption in 
LDA scheme and its hybrids. Comparing results in GPC, 
applications of importance. This calls for more research 
GPHG, GPCG, and GPCB, for instance, out of these, the 
and development on these issues so that the MLC forecasts 
best result was from the GPCB model in the "All" phase, 
become increasingly accurate and dependable. 
with an accuracy value of 0.980. In that respect, the GPCG 
 
model stands out as the second-best model with an 
 
accuracy of 0.961, while the GPHG model gives medium 
 
performance in this comparison, with an accuracy of 
0.943. In this comparison, the GPC model has the weakest 
functionality, with an accuracy of 0.928. 
316 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
References type 2 diabetes,” Endocr Rev, vol. 37, no. 3, pp. 
190–222, 2016. Publisher: Oxford Academic. 
[1] S. M. Haffner, “Epidemiology of type 2 diabetes: https://doi.org/10.1210/er.2015-1116.  
risk factors,” Diabetes Care, vol. 21, no. [13] L. S. Greci et al., “Utility of HbA1c levels for 
Supplement_3, pp. C3–C6, 1998. Publisher: diabetes case finding in hospitalized patients with 
American Diabetes Association. hyperglycemia,” Diabetes Care, vol. 26, no. 4, pp. 
https://doi.org/10.2337/diacare.21.3.C3.  1064–1068, 2003. Publisher: American Diabetes 
[2] Y. Wu, Y. Ding, Y. Tanaka, and W. Zhang, “Risk Association. 
factors contributing to type 2 diabetes and recent https://doi.org/10.2337/diacare.26.4.1064.  
advances in the treatment and prevention,” Int J [14] K. Plis, R. Bunescu, C. Marling, J. Shubrook, and 
Med Sci, vol. 11, no. 11, p. 1185, 2014. Publisher: F. Schwartz, “A machine learning approach to 
National Library of Medicine. predicting blood glucose levels for diabetes 
https://doi.org/10.7150/ijms.10001.  management,” in Workshops at the Twenty-Eighth 
[3] E. Wilmot and I. Idris, “Early onset type 2 AAAI conference on artificial intelligence, 2014. 
diabetes: risk factors, clinical impact and Publisher: AAAI. 
management,” Ther Adv Chronic Dis, vol. 5, no. [15] R. E. Glasgow, “A practical model of diabetes 
6, pp. 234–244, 2014. Publisher: Sage management and education,” Diabetes Care, vol. 
Publications. 18, no. 1, pp. 117–126, 1995. Publisher: American 
https://doi.org/10.1177/2040622314548679.  Diabetes Association. 
[4] G. L. Robertson, “Diabetes insipidus,” Endocrinol https://doi.org/10.2337/diacare.18.1.117.  
Metab Clin North Am, vol. 24, no. 3, pp. 549–572, [16] S. Nam, C. Chesla, N. A. Stotts, L. Kroon, and S. 
1995. Publisher: Elsevier. L. Janson, “Barriers to diabetes management: 
https://doi.org/10.1016/S0889-8529(18)30031-8.  patient and provider factors,” Diabetes Res Clin 
[5] J. R. GREEN, G. C. BUCHAN, E. C. ALVORD Pract, vol. 93, no. 1, pp. 1–9, 2011. Publisher: 
JR, and A. G. SWANSON, “Hereditary and Elsevier. 
idiopathic types of diabetes insipidus,” Brain, vol. https://doi.org/10.1016/j.diabres.2011.02.002.  
90, no. 3, pp. 707–714, 1967. Publisher: National [17] P. J. Watkins, P. L. Drury, K. W. Taylor, and W. 
Library of Medicine. G. Oakley, Diabetes and its management. Wiley 
https://doi.org/10.1093/brain/90.3.707.  Online Library, 1990.  
[6] M. Babey, P. Kopp, and G. L. Robertson, [18] S. H. Ley, O. Hamdy, V. Mohan, and F. B. Hu, 
“Familial forms of diabetes insipidus: clinical and “Prevention and management of type 2 diabetes: 
molecular characteristics,” Nat Rev Endocrinol, dietary components and nutritional strategies,” 
vol. 7, no. 12, pp. 701–714, 2011. Publisher: The Lancet, vol. 383, no. 9933, pp. 1999–2007, 
Nature. https://doi.org/10.1038/nrendo.2011.100.  2014. Publisher: The Lancet. 
[7] R. D. Lawrence, “Three types of human diabetes,” [19] S. Vijan, “Type 2 diabetes,” Ann Intern Med, vol. 
Ann Intern Med, vol. 43, no. 6, pp. 1199–1208, 152, no. 5, pp. ITC3-1, 2010. Publisher: ACP. 
1955. Publisher: ACP. https://doi.org/10.7326/0003-4819-152-5-
https://doi.org/10.7326/0003-4819-43-6-1199.  201003020-01003.  
[8] R. D. Lawrence, “Types of human diabetes,” Br [20] J. E. B. Reusch and J. E. Manson, “Management 
Med J, vol. 1, no. 4703, p. 373, 1951. Publisher: of type 2 diabetes in 2017: getting to goal,” JAMA, 
National Library of Medicine. vol. 317, no. 10, pp. 1015–1016, 2017. Publisher: 
https://doi.org/10.1136/bmj.1.4703.373.  Jama Network. DOI: 10.1001/jama.2017.0241. 
[9] A. Ota and N. P. Ulrih, “An overview of herbal [21] E. Ahmad, S. Lim, R. Lamptey, D. R. Webb, and 
products and secondary metabolites used for M. J. Davies, “Type 2 diabetes,” The Lancet, vol. 
management of type two diabetes,” Front 400, no. 10365, pp. 1803–1820, 2022. Publisher: 
Pharmacol, vol. 8, p. 224659, 2017. Publisher: The Lancet. 
frontiers. [22] R. A. DeFronzo et al., “Type 2 diabetes mellitus,” 
https://doi.org/10.3389/fphar.2017.00436.  Nat Rev Dis Primers, vol. 1, no. 1, pp. 1–22, 2015. 
[10] W. J. Pories, J. H. Mehaffey, and K. M. Staton, Publisher: Nature. 
“The surgical treatment of type two diabetes https://doi.org/10.1038/nrdp.2015.19.  
mellitus,” Surgical Clinics, vol. 91, no. 4, pp. [23] D. H. Laursen, A. Frølich, and U. Christensen, 
821–836, 2011. Publisher: Surgical Clinics. “Patients’ perception of disease and experience 
[11] J. Zhang, Y. Deng, Y. Wan, J. Wang, and J. Xu, with type 2 diabetes patient education in D 
“Diabetes duration and types of diabetes treatment enmark,” Scand J Caring Sci, vol. 31, no. 4, pp. 
in data-driven clusters of patients with diabetes,” 1039–1047, 2017. Publisher: Wiley Online 
Front Endocrinol (Lausanne), vol. 13, p. 994836, Library. https://doi.org/10.1111/scs.12429.  
2022. Publisher: frontiers. [24] O. Peleg, E. Hadar, and A. Cohen, “Individuals 
https://doi.org/10.3389/fendo.2022.994836.  with type 2 diabetes: an exploratory study of their 
[12] Y. Yang and L. Chan, “Monogenic diabetes: what experience of family relationships and coping 
it teaches us on the common forms of type 1 and with the illness,” Diabetes Educ, vol. 46, no. 1, pp. 
Hybrid Machine Learning Framework for Type 2 Diabetes Prediction…                                  Informatica 49 (2025) 299–318 317 
 
83–93, 2020. Publisher: Sage Publications.  
https://doi.org/10.1177/0145721719888625.   
[25] E. A. Beverly, C. K. Miller, and L. A. Wray,  
“Spousal support and food-related behavior  
change in middle-aged and older adults living  
with type 2 diabetes,” Health Education &  
Behavior, vol. 35, no. 5, pp. 707–720, 2008.  
Publisher: Sage Publications.  
https://doi.org/10.1177/1090198107299787.   
[26] J. L. Browne, A. Ventura, K. Mosely, and J.  
Speight, “‘I call it the blame and shame disease’:  
a qualitative study about perceptions of social  
stigma surrounding type 2 diabetes,” BMJ Open,  
vol. 3, no. 11, p. e003384, 2013. Publisher: BMJ  
Journals. https://doi.org/10.1136/bmjopen-2013-  
003384.   
[27] A. Dagliati et al., “Machine learning methods to  
predict diabetes complications,” J Diabetes Sci  
Technol, vol. 12, no. 2, pp. 295–302, 2018.  
Publisher: Sage Publications.  
https://doi.org/10.1177/1932296817706375.   
[28] M. Soni and S. Varma, “Diabetes prediction using  
machine learning techniques,” International  
Journal of Engineering Research & Technology  
(IJERT), vol. 9, no. 9, pp. 921–925, 2020.  
Publisher: IJERT.  
[29] L. Kopitar, P. Kocbek, L. Cilar, A. Sheikh, and G.  
Stiglic, “Early detection of type 2 diabetes  
mellitus using machine learning-based prediction  
models,” Sci Rep, vol. 10, no. 1, p. 11981, 2020.  
Publisher: Nature.  
https://doi.org/10.1038/s41598-020-68771-x.   
[30] H. Sifaou, A. Kammoun, and M.-S. Alouini,  
“High-dimensional linear discriminant analysis  
classifier for spiked covariance model,” Journal  
of Machine Learning Research, vol. 21, no. 112,  
pp. 1–24, 2020. Publisher: JMLR. Available:  
https://www.jmlr.org/papers/v21/19-428.html.   
[31] C. E. Rasmussen and C. K. I. Williams, Gaussian  
processes for machine learning, vol. 1. Springer,  
2006.   
[32] Y. Li, W. Zhang, L. Wang, F. Zhao, W. Han, and  
G. Chen, “Henry’s law and accumulation of weak  
source for crust-derived helium: A case study of  
Weihe Basin, China,” Journal of Natural Gas  
Geoscience, vol. 2, no. 5–6, pp. 333–339, 2017.  
Publisher: Elsevier.  
https://doi.org/10.1016/j.jnggs.2018.02.001.   
[33] J. Staudinger and P. V Roberts, “A critical review  
of Henry’s law constants for environmental  
applications,” Crit Rev Environ Sci Technol, vol.  
26, no. 3, pp. 205–297, 1996. Publisher: Taylor &  
Francis.  
https://doi.org/10.1080/10643389609388492  
[34] E. Trojovská and M. Dehghani, “A new human-  
based metahurestic optimization method based on  
mimicking cooking training,” Sci Rep, vol. 12, no.  
1, p. 14861, 2022. Publisher: Nature.  
https://doi.org/10.1038/s41598-022-19313-2.   
  
  
318 Informatica 49 (2025) 299–318                                                                                                                              N. Zhang et al. 
 
 
  
 
 
 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.10230 Informatica 49 (2025) 319–332 319 
 
A Comprehensive Evaluation Model for the State of Electric 
Energy Metering Devices Based on Fuzzy Analytic Hierarchy 
Process 
 
 
Chen Xu*, Zhang Chao, Zhang HaoMiao, Su YingChun, Yan Yu, Xu YinZhe 
State Grid Ningxia Marketing Service Center (State Grid Ningxia Metrology Center) ，Yin chuan 750000, Ning xia, China 
E-mail: cxhbdl@126.com  
*Corresponding uthor 
 
Keywords: fuzzy analytic hierarchy process, electric energy metering device, state evaluation, comprehensive 
evaluation model 
 
Received: July 17, 2025 
Accurately evaluating the status of electric energy metering devices is the foundation for ensuring their 
stable operation on smart grids, and is conducive to the development of equipment management towards 
refinement and intelligence. This article proposes a comprehensive evaluation model through the fuzzy 
analytic hierarchy process (F-AHP), which is characterized by establishing a multi-index system and 
taking into account subjective opinions and objective data, thereby improving the scientificity of its 
evaluation and enhancing its anti-interference ability. It starts with establishing a hierarchical structure, 
dividing the functions of indicators such as structural reliability, measurement accuracy, communication 
stability, and environmental adaptability. Then, based on the fuzzy decision matrix assignment, the 
importance of each indicator is calculated, and the assignment and overall score of the indicators are 
obtained, completing the quantitative evaluation of the health of the measuring device. In the  
experimental verification, 50 typical electric energy metering device samples were selected for state 
evaluation modeling. The average CI value of the model was 0.016, the coefficient of variation CV was 
0.069, and the accuracy of state recognition reached 92.5%. The evaluation results have high stability, 
can effectively identify boundary fuzzy samples, and have strong robustness and practical value. The 
results indicate that the evaluation model proposed in this article can better solve multiple practica l 
cases and the overall evaluation error does not exceed 5%; Compared with traditional AHP and weighted 
average method (WAM), this model performs better in state recognition accuracy and boundary blurring 
processing ability. Simultaneously conducting noise experiments and sensitivity analysis, and proving 
that the model has high stability and reliability under various abnormal conditions.  
Povzetek: F-AHP model z večkazalčno hierarhijo izboljša ocenjevanje stanja merilnih naprav, združuje 
subjektivne in objektivne podatke ter krepi robustnost. Na 50 vzorcih doseže 92,5 -odstotno točnost, nizko 
varianco ter boljšo prepoznavo mejnih primerov. 
 
 
1  Introduction factor, and cannot provide sufficient measurement scales. 
The proportion of weights is too subjective and the 
With the development of smart grids, the position of 
definition is not clear enough, which cannot adapt to the 
energy metering devices in the operation and production 
operation of large-scale equipment. 
of power grid enterprises is becoming increasingly 
Different factors can affect the operating status of 
important. They not only serve as the basic unit of 
energy metering devices, such as installation location, wear 
measurement for billing and metering, but also perform 
and tear of components, data transmission quality, power 
important tasks such as data collection, load monitoring, 
supply quality, and changes in grid noise. There are not only 
and equipment status recognition, playing an important 
quantitative factors that can be quantified to a single value, 
role in ensuring the quality of power supply and serving 
but also qualitative evaluation factors that cannot be 
customer rights. With the rapid development of smart 
directly quantified. Whether the instrument interface 
grids and the increasing number of connected devices, it 
docking is reasonable is an indicator that cannot be directly 
is essential to accurately grasp the operating status of 
measured. Historical data shows that the trend of error rate 
metering devices and be able to detect risks early. 
changes has a strong human explanatory factor. Therefore, 
However, traditional evaluation methods often rely on 
it is not easy to simultaneously balance "orderliness" and 
human visual inspection or judgment based on a single 
"fuzziness" using only traditional analytic hierarchy 
320 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
process or fuzzy mathematics methods. To more grey model, etc. [7]; The second is to use intelligent 
accurately and comprehensively characterize the overall analysis methods to achieve intelligent determination of 
working status of electric energy metering, it is device operating status, such as applying deep learning 
necessary to establish a performance evaluation model technology to establish multi-sensor models for data 
with clear hierarchy and measurement fuzzy anomaly analysis and anomaly source localization [8]; The 
acceptability requirements [3]. This article establishes a third is a comprehensive equipment operation status 
comprehensive performance evaluation model using the evaluation model formed by integrating multiple decision-
fuzzy analytic hierarchy process to design a specific making methods such as fuzzy reasoning technology, grey 
evaluation system model for electric energy metering target theory, and analytic hierarchy process. 
devices in real-world operating scenarios. This model is Some scholars have discussed the challenges of state 
based on a multi-dimensional evaluation index system identification under special conditions such as nonlinear 
and integrates professional knowledge and real-time data. loads and power quality disturbances. For example, Shah 
After establishing a fuzzy judgment matrix, determining (2023) [9] designed an artificial intelligence based 
weights, and conducting consistency checks, it forms a nonlinear load detection and identification system, which 
comprehensive evaluation model with clear hierarchy, can reasonably identify power data containing noise and 
reasonable weights, and practical effectiveness, thus structural abnormalities; Yu et al. (2022) [10] established 
changing the shortcomings of traditional methods that an online power quality monitoring mode using grey target 
cannot cope with fuzziness and human factors. Through theory and achieved multi-layer classification and 
this model, equipment managers can achieve identification of key indicator trends in practical problems. 
quantitative diagnosis of equipment operating status, Zhang et al. (2022) [11] also proposed using Software 
identify operational defects, and assist in developing Defined Networking (SDN) to reconfigure the data 
differentiated maintenance and repair plans. transmission path for system architecture, in order to ensure 
The structure of this article is arranged as follows: the reliability and effectiveness of the acquisition process 
Chapter 2 provides an overview of the research status of of electricity metering data under diverse input conditions.  
existing power measurement equipment status It is worth noting that the application of Fuzzy Analytic 
evaluation; Chapter 3 further elaborates on the design Hierarchy Process (FAHP) in power status assessment and 
ideas and construction methods of the proposed model; evaluation has also received more attention. For example, 
Chapter 4 mainly presents the implementation methods Taherikhonakdar et al. (2023) [12] used a combination of 
and evaluation process of the model; Chapter 5 provides Fuzzy Analytic Hierarchy Process and Grey System to 
a discussion of an example and presents a comparative evaluate the status of 750kV energy metering devices. In F, 
analysis of the example, as well as an analysis of the they classified and rated the measured 750kV energy 
practicality and robustness of the model; The final metering devices, and obtained a more reasonable and 
chapter six provides a comprehensive summary of the comprehensive evaluation result of 750kV energy metering 
research content and prospects for future development devices. Paunkov et al. (2023) [13] proposed an adaptive 
directions. correction mechanism for real-time calibration of 
measurement deviation using fuzzy control rules, which 
2  Related work achieved real-time adjustment of measurement deviation 
and improved the consistency and accuracy of device 
Although energy metering is becoming increasingly ratings. From this, it can be seen that the Fuzzy Analytic 
important in intelligent power grids, there are still many Hierarchy Process (FAHP) has the advantages of fuzzy 
challenges in identifying and optimizing measurement analytic hierarchy process due to its consideration of the 
deviations in energy meters. The various complex and fuzzy modeling process between multiple factors and the 
ever-changing environments in which electric energy allocation of weights for multiple factors. It is a powerful 
meters are used make errors in electric energy metering means to achieve "accurate comprehensive use" state 
devices not only caused by external electromagnetic evaluation. 
interference, nonlinear loads, harmonics, but also by 
The research on transfer learning and generative 
aging of the equipment itself and constraints on design 
models has also expanded the scope of multi characteristic 
accuracy [4]. Especially in situations where different 
analysis for device state assessment. Alrobaie et al. (2023) 
types of instruments are shared, voltage fluctuations are 
[14] utilized a balanced comprehensive evaluation method 
large, and large amounts of data are transmitted, 
for power quality issues based on CVAE-TS, which 
traditional methods are no longer able to meet the 
considers the effectiveness and wide applicability of the 
requirements of power network operation efficiency and 
evaluation method; Qu et al. (2024) [15] used an improved 
accuracy. Therefore, researchers hope to find new online 
XGBoost to construct an evaluation model for power 
inspection methods and self diagnostic models to use 
system stability transfer degree, which has good scalability 
digital technology to track the evolution process of 
in multi scenario analysis applications. This has laid a 
monitoring errors [5, 6]. 
theoretical foundation for the subsequent construction of an 
In recent years, the focus of discussions on abnormal adaptive state evaluation mode suitable for power grid 
energy metering has mainly been on anomaly detection measurement devices. 
schemes based on feature extraction and modeling, such 
Based on the existing research results, it can be found 
as equipment operation status classification detection 
that the main technologies at present have made certain 
and prediction based on gradient boosting tree (GBDT), 
progress, such as error detection and data processing, but 
A Comprehensive Evaluation Model for the State of Electric Energy…                       Informatica 49 (2025) 319–332 321 
 
there are still many areas that urgently need consistency analysis, and state level grading standards, it 
improvement. One is that the current indicator system effectively solves the problems of current models in 
lacks strict hierarchical relationships and adaptation structural design, weight allocation, and result 
rules, which limits the performance that can be used in interpretation, and can provide reference for later 
complex situations. The other is that although expert maintenance plan formulation and maintenance 
evaluations have a certain degree of reliability and arrangement optimization. 
flexibility, cognitive biases or subjective uncertainties Table 1 compares the performance of existing 
may still occur in some situations, and fuzzy representative state evaluation methods in terms of data 
mathematical methods need to be introduced to establish type, evaluation path, accuracy, and robustness. It can be 
quantitative evaluation models. The third issue is that seen that the state evaluation model based on fuzzy analytic 
most of the models cannot clearly provide the level hierarchy process (F-AHP) proposed in this article is 
classification and visual effects of the results, which superior to traditional models in terms of accuracy and 
affects the effectiveness of the output [16]. In response applicability, especially in supporting hierarchical output 
to these shortcomings, this article establishes a state and fuzzy boundary recognition, which provides theoretical 
evaluation model based on fuzzy analytic hierarchy support for the subsequent construction of intelligent 
process to establish a hierarchical structure, fuzzy weight metering device management and control mechanisms. 
reconstruction model, and evaluation model that is easy 
to understand. Based on fuzzy judgment matrix, 
 
Table 1: Comparison between existing methods and the model proposed in this paper  
 
Method Technical Robustnes
Sample Type Evaluation Metrics 
Name Approach s 
GBDT Smart meter Gradient Boosting 
Single error metric Moderate 
Model [7] time-series data Decision Tree 
Grey 
Target Power quality Grey decision Multi-feature trend 
Strong 
Theory monitoring data model analysis 
[10] 
SDN Prediction 
SDN monitoring Communication metrics 
Prediction optimization + Fair 
and control data focused 
Model [11] graph structure 
FAHP + 
750kV high-
Grey Multi-layer weight 
voltage Four state dimensions Moderate 
System fusion 
equipment 
[12] 
Three-phase 
F-AHP (Fuzzy 
Proposed meters, terminal Four-layer metrics + 
Analytic Hierarchy Strong 
Method devices (50 graded evaluation 
Process) 
samples) 
This article mainly emphasizes several key issues in applying on a large scale, the lack of a unified level 
the current state evaluation of electric energy metering judgment logic and hierarchical strategy for evaluation can 
devices. easily lead to monitoring delays and failure to identify risks 
Most existing models use a fixed weight in a timely manner. 
superposition method, which has not formed an effective In response to the above issues, improving the 
hierarchical structure and failed to reflect the relative scientific construction, fuzzy adaptability, and hierarchical 
importance between indicators. In reality, there are establishment of state assessment models has become the 
significant differences in the equipment level of each core content of current research. Therefore, this article will 
device, and the same evaluation may not highlight focus on the exposition of these two issues as the main line 
individual issues, which reduces the specificity of the to carry out the research in the following text. 
evaluation. Can the fuzzy analytic hierarchy process balance clear 
Existing research has shown that there is no structure and fuzzy information processing to enhance the 
emphasis on the processing and application of fuzzy scientific evaluation of the state of electric energy metering 
information. In the actual evaluation process, many devices? 
subjective and fuzzy factors, such as "connection How to build a multi-level classification system with 
standards" and "operational stability", have not been discriminative power, so that the evaluation model can 
considered in the system design, resulting in fixed adapt to the equipment management needs in different 
thinking in the evaluation results and insufficient application scenarios? 
response to complex and changing real states. This article proposes a comprehensive state evaluation 
In terms of evaluating output expression, there is a model based on fuzzy analytic hierarchy process to address 
lack of hierarchical expression, making it difficult to the above issues. Its main innovations lie in the following 
achieve refined management. When promoting and aspects. 
322 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
Build a multidimensional indicator system that is the comprehensive status of the electric energy metering 
covers key elements such as structure, error, and device, while the criterion layer includes four key attributes: 
communication, and allocate weights through FAHP to structural reliability, measurement accuracy, 
enhance the hierarchical and explanatory power of the communication stability, and adaptability to operating 
model. environments, The indicator layer is further refined into 
Introduce fuzzy judgment matrix and consistency more than ten quantifiable or determinable specific 
check mechanism to enhance the ability to accommodate indicators (such as error drift rate, wiring standardization, 
subjective evaluation information and solve the signal packet loss rate, etc.). There are significant attribute 
instability problem in traditional AHP applications. differences and cognitive ambiguity among various 
A systematic evaluation workflow was developed indicators, making it suitable to use triangular fuzzy 
and examples were used to verify the effectiveness of the numbers to construct a judgment matrix and calculate 
model in identifying weak links and assisting precise relative weights and comprehensive scores. 
management. Experimental results also showed that this Compared with traditional single weighted sum 
model has advantages in stability and adaptability methods, FAHP has the following advantages: firstly, it 
compared to traditional models, and is easier to promote. allows experts to use fuzzy language (such as "slightly 
higher" or "significantly stronger") to evaluate when 
constructing the judgment matrix, and improves the 
3  Design of model construction flexibility and closeness of judgment through fuzzy number 
methods transformation; Secondly, FAHP introduces the maximum 
In the comprehensive evaluation model proposed in this membership degree and fuzzy consistency check 
article, the selection of fuzzy analytic hierarchy process mechanism in the weight calculation process, which can 
as the core method is based on its advantages in dealing effectively reduce the impact of subjective errors on the 
with complex and multi-level indicator systems, which evaluation structure, thereby improving the consistency and 
combine structural clarity and fuzzy adaptability. robustness of the evaluation model. 
Although the traditional Analytic Hierarchy Process The difference between FAHP and evaluation models 
(AHP) has good structural modeling capabilities and is such as weighted average, entropy weight, and TOPSIS is 
suitable for multi factor evaluation problems, it often that FAHP can more clearly, accurately, reasonably, and 
exhibits limitations such as strong subjectivity and poor intuitively handle such problems when multiple indicators 
consistency of judgment matrices when facing practical coexist and subjective and objective factors are intertwined. 
problems such as fuzzy expert cognition and unclear When the equipment conditions become increasingly 
boundaries between indicators. The fuzzy analytic diverse and complex, and there is a certain degree of 
hierarchy process, by introducing fuzzy numbers and ambiguity in expert evaluations, the advantages and 
fuzzy judgment matrices, not only retains the superiority of this method in weight setting and result 
hierarchical logical structure of AHP, but also description will be fully reflected. In addition, this method 
significantly enhances the model's ability to does not require too much historical data and complex 
accommodate fuzzy information, improving the stability optimization algorithms to support, so it can be well applied 
and practicality of comprehensive evaluation. to online monitoring systems or distributed management 
This model divides the comprehensive status of systems, which greatly improves its computing speed and 
electric energy metering devices into three levels: target applicability.
level, criterion level, and indicator level. The target layer 
 
Comprehensive status of 
Target 
electric energy metering 
Measurement Communication Adaptability to operating 
Structural reliability Criterion 
accuracy stability environment 
layerlayer 
rosion Installation 
Shell Sealing Cor Wiring measurement Error standard Calibration Signal packet Communicati anti-
frequency loss rate on delay interference 
integrit performance degree of standardizati correctness error drift deviation 
terminals on ability 
↑ Indicator layer (can be simplified or expanded 
according to actual needs) 
Figure 1：Structure diagram of comprehensive state evaluation model for electric energy metering devices based on fuzzy 
analytic hierarchy process (F-AHP) 
A Comprehensive Evaluation Model for the State of Electric Energy…                       Informatica 49 (2025) 319–332 323 
 
 
As shown in Figure 1, this article uses the Fuzzy The indicator layer specifically refers to observable 
Analytic Hierarchy Process (F-AHP) to construct a indicators. Set indicators such as "external shell integrity", 
multi-level hierarchical structure consisting of the "joint corrosion condition", and "fixed fastening" based on 
objective layer, criterion layer, and indicator layer. It the "structural reliability" index of external structural 
integrates fuzzy judgment matrix, weight extraction, and integrity to measure the true degree of object damage; Set 
consistency check to achieve comprehensive evaluation indicators such as "error bounce rate", "standard error 
of multi-source indicator information. degree", and "regular calibration frequency" based on the 
"measurement accuracy" indicator to measure the 
3.1  Construction of state indicator system measurement accuracy and precision of the instrument on 
for electric energy metering devices electrical data; Set indicators such as "communication 
stability" to measure the integrity and timeliness of 
This article uses the Analytic Hierarchy Process (AHP) 
communication information between equipment and central 
to evaluate the overall situation of power metering 
stations, including "communication delay degree", "data 
devices, and constructs a clear logical hierarchical model 
loss ratio", and "noise immunity"; The environmental 
to analyze the overall state of power metering devices 
tolerance index is composed of indicators such as 
during operation. Based on this, three different modules 
"adaptability to usage environment", "temperature range of 
are formed, including the target layer, criterion layer, 
working environment", "humidity range of working 
and indicator layer. Its core function is to transform the 
environment", "anti-interference degree of electromagnetic 
fuzzy status of power metering device operation and 
environment", and "outdoor protection category". 
management into a hierarchical system with a systematic 
These indicators together constitute the feature vector 
structure, and thus become a comparable and computable 
for evaluating the input of the model. Let the indicator 
data system. 
vector of the i-th measuring device be: 
At the target layer, the overall operational 
performance of the measuring device is defined as the X i=[xi1,xi2  ,xin]
（1） 
evaluation criterion and is the ultimate object of the 
Among them, xij represents the observation or rating 
model. This layer contains four types of primary 
attribute values, including structural reliability, value of the i-th device on the jth indicator, and n is the total 
measurement accuracy, communication stability, and number of indicators. To eliminate the influence of 
environmental adaptability. These four types of attribute dimensionality, all indicators will be normalized in the 
values are the structural performance, metrological future. 
performance, communication performance, and Unlike traditional evaluation methods that simply 
environmental adaptability of the measuring device. weight and add various indicators, this paper establishes a 
They cover the main functions of the physical judgment matrix based on a fuzzy hierarchical structure for 
performance, metrological performance, communication weight extraction, and introduces fuzzy language variables 
performance, and environmental performance of the to quantitatively express qualitative indicators, thereby 
measuring device, and are all key elements for enhancing the model's ability to handle "subjective 
evaluating the operational quality of the measuring fuzziness" and "cross indicator correlation".
device. 
Table 2：State index system of electric energy metering devices 
Dimension 
Metric Name Metric Type Reference Range 
Category 
Structural Intact / Minor Damage / 
Enclosure Integrity Qualitative 
Reliability Severe Damage 
Terminal Corrosion Structural 
Qualitative None / Mild / Severe 
Level Reliability 
Structural 
Installation Stability Qualitative Firm / Loose / Detached 
Reliability 
Measurement 
Error Drift Rate Quantitative 0% ~ 2% 
Accuracy 
Measurement 
Standard Deviation Quantitative 0 ~ 0.05 
Accuracy 
Communication 
Packet Loss Rate Quantitative 0% ~ 5% 
Stability 
Communication Communication 
Quantitative 0ms ~ 300ms 
Latency Stability 
Communication 
Noise Immunity Qualitative Weak / Moderate / Strong 
Stability 
Temperature Environmental 
Quantitative -25℃ ~ +60℃ 
Adaptability Suitability 
Protection Rating Environmental 
Qualitative IP20 / IP54 / IP65 etc. 
(IP) Suitability 
324 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
In the entire model system, the selection of weights for each level, and deblurring is applied to convert 
indicators follows the principle of "comprehensive the fuzzy numbers into clear weight values, ultimately 
coverage, quantifiability, and distinguishability", forming a standardized weight vector. This process ensures 
striving to ensure the evaluation accuracy and that the weight allocation of each indicator to the overall 
discriminative ability of the model while considering the state evaluation results is interpretable. To adapt to 
feasibility of the project. practical application scenarios, the model also introduces a 
hierarchical synthesis mechanism, which weights and 
3.2  Principles and applications of fuzzy summarizes the evaluation values of each sub indicator to 
obtain the comprehensive score of each device's state. At 
analytic hierarchy process 
the same time, to avoid extreme value interference, a 
This article uses the Fuzzy Analytic Hierarchy Process normalization processing function is set up within the 
(FAHP) to solve the multi factor and multi-level system to standardize the mapping of the original scores, 
uncertainty problems encountered in the overall state 
thereby making different devices comparable. 
evaluation process of power metering equipment. 
Compared with the traditional Analytic Hierarchy 
Process (AHP), FAHP based on fuzzy mathematical 3.3  Hierarchical structure and weight 
theory can better adapt to the fuzziness and subjectivity calculation of evaluation model 
of expert judgment, improving the scientific and robust To achieve a systematic evaluation of the operating status 
overall evaluation. of electric energy metering devices, this paper constructs a 
The core idea of FAHP is mainly to model the three-level fuzzy analytic hierarchy process model. The 
evaluation level range, construct a fuzzy judgment model structure consists of a target layer, a criterion layer, 
matrix, calculate fuzzy weight vectors, and perform and an indicator layer from top to bottom, with clear 
hierarchical summarization. This method uses the three- hierarchical logic and comprehensive evaluation 
(l dimensions. It can effectively cover multiple key aspects 
ij,mij,uij) sided fuzzy number  to score experts, involved in device operation, such as performance, 
indicating the weight relationship and objectivity environment, maintenance, and faults. 
between different indicators, thereby reducing subjective The target layer is set as the "comprehensive state level 
misjudgments caused by human operation. Specifically, of electric energy metering devices", representing the 
it represents the lowest judgment value, and if it is the overall goal that needs to be judged ultimately; The criteria 
most likely judgment value, it is the highest judgment layer includes four dimensions: "structural 
value. To achieve fuzzy quantification of subjective reliability,""metrological accuracy,""communication 
judgments, this article adopts a nine-level fuzzy stability,"and "en vironmental adaptability,Evaluation 
language scale, and its corresponding relationship with dimensions are constructed from the perspectives of 
triangular fuzzy numbers is shown in Table 3. equipment stability, external environmental resilience, 
 implementation of operation and maintenance systems, and 
Table 3：Mapping table of fuzzy language and fault susceptibility; The indicator layer is refined into 
triangular fuzzy numbers several observable sub indicators, such as measurement 
accuracy, voltage load response, resistance to temperature 
Triangular Fuzzy 
Fuzzy Term and humidity fluctuations, calibration frequency, fault 
Number (l, m, u) 
repair cycle, etc., to ensure that the evaluation of each 
Equally Important (1, 1, 1) 
dimension has practical operability and measurement basis. 
Slightly More 
(1, 2, 3) 
Important In the stage of determining model weights, the Fuzzy 
Moderately Analytic Hierarchy Process (FAHP) is used for weight 
(2, 3, 4) 
Important calculation. Firstly, organize multiple experts in power 
Clearly More equipment operation and maintenance, as well as 
(4, 5, 6) 
Important measurement technicians, to conduct pairwise comparisons 
Strongly Important (6, 7, 8) around the elements of the criterion layer and indicator 
Extremely Important (8, 9, 10) layer, and construct a fuzzy judgment matrix. The relative 
 importance range of each comparison result is expressed as 
Among them, (l, m, u) respectively represent the (lij,mij,u
lower limit, median, and upper limit of the uncertain a triangular fuzzy number of ij) , effectively 
interval for judgment. Experts use this as a basis for quantifying the fuzziness in subjective judgments. 
language evaluation when constructing a fuzzy judgment Subsequently, the weight calculation and consistency check 
matrix, and further use it for weight calculation and are completed through the following steps: 
consistency testing. ①Fuzzy synthesis weight calculation: using the fuzzy 
In the matrix construction stage, the relative weights arithmetic mean method to perform fuzzy synthesis 
between each criterion are compared using fuzzy calculation on each judgment matrix, obtaining the fuzzy 
pairwise comparisons to generate a fuzzy judgment weight vector of each layer element; 
matrix, and fuzzy consistency checks are used to ensure ②De fuzzification processing: Convert triangular fuzzy 
that the judgment logic is reasonable. Subsequently, the numbers into corresponding clear weight values. The 
commonly used methods are "maximum membership 
fuzzy synthesis algorithm is used to calculate the fuzzy 
A Comprehensive Evaluation Model for the State of Electric Energy…                       Informatica 49 (2025) 319–332 325 
 
Indicator data Constructing a 
Consistency Consist
collection and Fuzzy Judgment 
check ency 
preprocessing Matrix 
passed? 
Generate a Fuzzy Weight 
comprehensive comprehensive calculation and yes 
status evaluation and normalization 
Reconstruct no 
 
 
Figure 2：Model implementation and evaluation flowchart
 
 4  Model implementation and 
degree method" or "center average method". In this study, 
the latter is chosen to improve computational efficiency; evaluation process 
③Normalization adjustment: Scale each weight so This study constructed a hierarchical comprehensive 
that the sum of each weight is 1, to ensure comparability evaluation model from the data collection layer to the 
and accuracy of the model's weighting calculation. processing layer, and from the evaluation layer to the 
④Consistency test: Use the CR ratio to determine warning layer. According to its order, it can be divided into: 
whether the consistency of each judgment matrix is good. firstly, using a predetermined set of state indicators to 
If cr<0.1, it is considered that the consistency of each collect standardized basic data; Then, construct a fuzzy 
judgment matrix is excellent and the calculation result is decision matrix and verify its consistency to ensure that the 
acceptable. weights of each indicator value in each layer are reasonable; 
Finally, weighting the elements at different levels in After the consistency verification is completed, the fuzzy 
the system and using them as weighting vectors to analytic hierarchy process (F-AHP) is used for multi-level 
participate in the fuzzy evaluation system in the data correlation to quantify the membership degree of 
following text is beneficial for the classification of various electrical measurement and metering equipment 
power measurement and control equipment status. It not states and determine the operating level of the equipment. 
only enhances the scientificity and practicality of the Fully considering the actual situation of the power grid, it 
model, but also improves the state analysis and decision- has high applicability and openness. The workflow is 
making performance of the measurement and control shown in Figure 2.
equipment. 
xij - min( x j )
x′
4.1  Indicator data acquisition and ij =
max( x j ) - min( x j )standardization processing （2） 
The first step in model implementation is to obtain the For negative indicators (the smaller the value, the 
raw indicator data of the energy metering device. The better the state), use the reverse standardization formula: 
selected state indicators in this article cover five max( x j ) - x
dimensions: structural reliability, measurement accuracy, ′ ij
xij =
communication stability, and environmental adaptability. max( x j ) - min( x j )
The relevant data mainly comes from multiple channels （3） 
such as on-site inspection records of enterprises, online Among them, xij represents the original value of the 
，
monitoring systems, device self diagnosis modules, and jth indicator in the i-th object, and xij  is its standardized 
historical maintenance archives, ensuring the value. This standardization process can unify all indicator 
comprehensiveness and representativeness of data data into the [0,1] interval, avoiding interference from 
sampling. numerical dimensions in the calculation of model weights, 
Due to differences in the measurement units and ensuring the fairness and scientificity of the evaluation 
numerical ranges of each indicator, direct use for system, and laying a data foundation for the subsequent 
evaluation may result in weight shift and result distortion. construction of fuzzy judgment matrices and weight 
Therefore, it is necessary to standardize the original data. analysis. 
The standardization method is divided into two 
categories based on indicator attributes: for positive  
indicators (the larger the value, the better the state), the 
range standardization method is used: 
326 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
4.2  Construction of fuzzy judgment also further enhances the credibility of the total weights, 
matrix and consistency test and provides a scientific basis for our later fuzzy analytic 
hierarchy process (F-AHP). 
On the basis of standardized indicator data, in order to 
achieve the importance ranking of factors between 
different evaluation levels, it is necessary to construct a 4.3  Calculation of comprehensive 
fuzzy judgment matrix and conduct consistency checks. evaluation value and classification of status 
Its core lies in introducing subjective judgment through levels 
expert scoring method, while combining fuzzy 
After completing the weight calculation and indicator 
mathematics to handle ambiguity and uncertainty, to 
standardization processing, the most important model 
enhance the adaptability and practical operability of the 
evaluation task to be executed next is the allocation of 
model. 
comprehensive evaluation values and state levels. By using 
The basic steps for constructing a fuzzy judgment the fuzzy analytic hierarchy process (F-AHP), qualitative 
matrix are as follows: Firstly, based on the hierarchical evaluation is transformed into quantitative evaluation to 
structure model, the importance of each indicator in the accurately reflect the status of energy metering equipment.  
same layer is compared pairwise, and a judgment matrix 
Based on the constructed weight vector w=(w1, w2,..., 
is established by referring to the 1-9 nine level scaling 
wn) and the indicator membership matrix R, use fuzzy 
method 
operations to comprehensively evaluate and calculate the 
A=(aij ) nn comprehensive membership vector B. 
（4） 
B=W R=(b
Among them, 𝑎𝑖𝑗represents the importance of the 1,b2 ,...,bm)
（7） 
i-th indicator relative to the j-th indicator. In the fuzzy Among them, B is the overall membership degree of 
analytic hierarchy process (F-AHP), the elements of the each state level, W is the weight vector, and the 
judgment matrix are represented in the form of triangular membership matrix R is an n × m dimensional matrix, 
fuzzy numbers, namely ?̃?𝑖𝑗 = representing the membership values of each evaluation 
indicator at different state levels, reflecting the degree of 
(𝑙𝑖𝑗, 𝑚𝑖𝑗, 𝑢𝑖𝑗),representing the lowest possible value, fuzziness of the equipment belonging to the four categories 
the most reliable value, and the highest possible value, of "excellent, good, medium, and poor" on each indicator. 
respectively, reflecting the expert's judgment of the The construction method is usually based on expert scoring 
importance of the i-th indicator relative to the j-th or fuzzy quantification rules, mapping each original 
indicator under uncertain conditions. For example, indicator value to the [0,1] interval through a membership 
experts believe that "slightly important" can be function to form a membership vector. For example, a lower 
represented as a fuzzy number (2, 3, 4), while "extremely communication packet loss rate can correspond to a higher 
important" can be represented as (8, 9, 9). When ?̃?𝑖𝑗=(l, membership degree in the "excellent" state, while in the 
m, u), its reciprocal can be expressed as (1/u, 1/m, 1/l), "poor" state, the membership degree is close to 0. After 
satisfying a fuzzy symmetry relationship where they are vertically arranging the fuzzy membership vectors of all 
reciprocal to each other. After completing the indicators, a complete membership matrix R is formed. m 
preliminary judgment matrix, calculate the eigenvectors is the number of state levels, and the sample is calculated. 
and normalize them using the following method to obtain bk represents the membership degree of the sample at the 
the preliminary weights of each indicator kth state level, and the higher the value, the closer the 
sample is to the kth state level. The weighted sum of 
n
a1/n
 membership degrees that can ultimately be used to 
j=1 ij
w comprehensively evaluate the value is calculated as follows.  
i =
n n
 a1/n
m
i=1 j=1 ij
（5） S = bk vk
To ensure the consistency of the judgment results, it k =1 （8） 
is necessary to perform consistency checks on the 
Among them, vk  is the membership value 
judgment matrix. The specific process includes: 
corresponding to the state level, which is generally assigned 
calculating the maximum eigenvalue λmax , consistency 
based on the state level, such as excellent, To achieve 
index CI, and consistency ratio CR, where: 
quantitative grading of equipment operating status, this 
λ - CI article maps the comprehensive score S to four status levels, 
CI = max n
 ,CR = 
n -1 RI defined as follows: Excellent (Level I) =4 points, Good 
（6） 
(Level II) =3 points, Fair (Level III)=2 points, Poor (Level 
Among them, RI is a random consistency index, IV)=1 point. The scoring criteria for each level are shown 
which is obtained based on its parameter size n and can in Table 2. This assignment scheme adopts linear 
be directly consulted. If CR<0.10, it indicates that its equidistant scores to reflect the balance of level differences, 
matrix has met the consistency check requirements; On facilitating weighted operations and membership analysis. 
the contrary, it is necessary to adjust the original At the same time, it has scalability and can be adjusted to a 
assignment and perform another operation. This not only percentage system or non-linear weight structure according 
ensures the systematic rigor of the model structure, but to business needs.
 
A Comprehensive Evaluation Model for the State of Electric Energy…                       Informatica 49 (2025) 319–332 327 
 
Table 4: Comprehensive evaluation values and status classification standards for electric energy metering devices 
Comprehensive Score 
Status Level Status Description 
Range 
3.5–4.0 Excellent Good condition, stable operation 
2.5–3.4 Good Slight fluctuations, basically normal 
1.5–2.4 Fair Operational fluctuations, attention needed 
1.0–1.4 Poor Abnormal condition, maintenance required 
The grading criteria in Table 4 refer to the principle This is used to test and verify the adaptability and stability 
of linear distribution and set the scoring interval of the model. Taking into account both existing and new 
boundaries based on expert experience and opinions. equipment types for the selected samples, the voltage level 
Due to the final score S ∈ [1,4] and a total interval length involves urban-rural differences, meeting the 
of 3, it is divided into three complete intervals and one comprehensive and rigorous requirements of the overall 
compensated low interval (1.0-1.4) using the equidistant evaluation process. It should be noted that although the data 
method, aiming to improve the recognition sensitivity of obtained this time has real-time and practical relevance, it 
"poor" level devices. This design facilitates the is highly likely that some indicator data may be incomplete 
implementation of a hierarchical response mechanism due to human inspection errors or system failures, and some 
and also has good scalability. samples may have subjective descriptions or abnormal 
missing items. All sample data comes from the enterprise's 
own measurement equipment operation and maintenance 
5  Analysis of experimental results 
management system. The data has been anonymized and 
This article proposes a model analysis and evaluation only retains information related to the device's operating 
based on the fuzzy hierarchy process for the state status, without involving user privacy. Each indicator data 
evaluation of electric energy metering devices. The includes quantitative values (such as error drift rate, 
experimental data is based on real-time data from the communication packet loss rate) and qualitative scores 
power distribution network and includes various (such as protection level, installation tightness). The 
operating modes and environmental conditions. By qualitative items are consistently scored by two operation 
analyzing and comparing the effects of different weights and maintenance experts and mapped to a three-level rating 
and classification choices on the model, it is proven that value. There are a small number of missing fields in the data, 
the model method proposed in this article can distinguish which will be filled in using industry standard empirical 
equipment states and has a better ability to classify values or adjacent device means. All raw data undergo 
equipment. Finally, the experimental results of each interval normalization before being input into the model to 
stage were analyzed and discussed, and the applicability eliminate the influence of dimensionality and ensure that all 
and stability were explored. indicators have a unified dimension between [0,1] before 
participating in fuzzy synthesis operations. 
5.1  Experimental data sources and case 
selection 5.2  Display of model evaluation results 
The case data of this study is selected from the historical After constructing the Fuzzy Analytic Hierarchy Process 
archives of the power metering equipment management (F-AHP) model, this article conducted a comprehensive 
system, covering various forms such as metering state rating and grading of the 50 selected samples of 
equipment forms, three-phase smart meters, electric energy metering devices. According to the 
comprehensive substations, and power quality normalized scores of various indicators multiplied by their 
monitoring terminals. It is scattered in the supply and weights, the comprehensive evaluation value of each object 
distribution grids of urban and rural areas, presenting is calculated, and based on the preset membership function, 
significant differences in external environment and load its status is divided into four levels: "excellent, good, 
changes. The original data includes eight main indicators medium, and poor". From the overall evaluation results, 
including equipment reliability, counting accuracy, most of the electric energy metering devices are in the 
connection consistency, communication performance, "good" or "medium" level range, indicating that the 
working environment, and failure rate, as well as various operating status of the metering devices in the current 
secondary indicators. The data has strong system is generally controllable. However, some samples 
representativeness and completeness, and is suitable for have problems such as unstable communication, poor 
the design and evaluation of Fuzzy Analytic Hierarchy environmental adaptability, and decreased metering 
Process (F-AHP) in this article. In order to ensure the accuracy, which need to be brought to the attention of the 
universality and effectiveness of the case selection, the operation and maintenance department. 
research team selected 50 typical samples for modeling 
analysis. The selection principles mainly include 5.3  Comparative analysis with traditional 
completeness, comprehensive coverage of relevant 
indicator types, and typicality, which fully reflect the evaluation methods 
real differences in different installation positions, In order to comprehensively verify the effectiveness of the 
working conditions, and types of measuring equipment. proposed F-AHP model in the state evaluation of electric 
328 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
energy metering devices, we selected the widely used compared and analyzed the comprehensive performance of 
traditional Analytic Hierarchy Process (AHP) and the three methods. The experimental sample consists of 10 
Simple Weighted Average Method (WAM) as control representative sets of electric energy metering devices, and 
objects, and classified the same batch of sample data into the data is sourced from on-site monitoring records in actual 
state levels under a unified indicator system. We also operating environments.
 
Table 5: Comprehensive performance comparison of different methods 
Fuzzy Boundary 
Method Average CI Average CV State Classification 
Sample Recognition 
Type Value ↓ Value ↓ Accuracy ↑ 
Ability 
F-AHP 0.016 0.069 92.5% High 
AHP 0.082 0.125 78.0% Medium 
WAM – 0.109 81.3% Low 
Note: CI is a consistency evaluation index for hierarchical structure weight fusion is the key to improving 
judgment matrices in AHP methods and is not applicable the overall evaluation quality. 
to methods such as WAM that do not have pairwise 
comparison structures. Therefore, this item is empty. 5.4  Model stability and robustness 
As shown in Table 5, the F-AHP model outperforms verification 
AHP and WAM in key indicators such as grade 
In order to further evaluate the applicability and stability of 
discrimination accuracy, consistency ratio (CI), and 
the proposed F-AHP model in the actual state evaluation of 
evaluation stability (measured by coefficient of variation 
electric energy metering devices, this study empirically 
(CV)). Specifically, the average CI of the F-AHP model 
verifies the stability and robustness of the model from three 
is 0.016, which is much lower than the traditional AHP's 
dimensions: input disturbance response, consistency 
0.082, indicating that it has better consistency in the 
fluctuation amplitude, and extreme value adaptability. By 
multi-level weight processing process; In terms of CV, 
introducing perturbation factors and boundary condition 
the average value of F-AHP is 0.069, indicating that it 
perturbations on the original dataset, and comparing the 
has the smallest fluctuation in ratings among different 
fluctuation of results under different evaluation models, the 
samples and has stronger evaluation robustness. At the 
performance reliability of the F-AHP model in complex 
same time, the F-AHP model performs particularly well 
application scenarios is revealed. 
in handling state fuzzy boundary samples. It uses 
triangular fuzzy numbers to construct a judgment matrix, Firstly, in the input disturbance test, we randomly 
which reflects subjective judgment uncertainty while perturbed the indicator data of 10 sets of electric energy 
enhancing the model's ability to identify critical state metering device samples with amplitudes of ± 5% and ± 
devices, avoiding the problems of "fuzzy concentration" 10%, respectively, and observed whether the 
and "level distortion" in traditional methods. The so- comprehensive evaluation score of the model and its 
called 'fuzzy boundary samples' refer to samples whose corresponding level deviated. The results show that when 
comprehensive rating results are close to the critical the disturbance amplitude is less than 10%, more than 80% 
values of two state levels (such as 2.49 or 3.51). In actual of the sample levels remain unchanged in the F-AHP model, 
equipment status assessment, this type of sample and the change in the comprehensive score is controlled 
judgment is the most sensitive and susceptible to weight within 0.06 (as shown in Figure 5), indicating that the 
disturbances or changes in individual indicators. In this model has good input robustness. 
article, it is defined that when the score S of a sample Secondly, in the consistency ratio volatility test, we 
falls within the range of 0.1 above or below a certain conducted 500 Monte Carlo random perturbation 
level boundary (such as S ∈ [2.4, 2.6]), it is considered a experiments on the constructed fuzzy judgment matrix and 
fuzzy boundary sample. We will calculate whether recorded the consistency ratio CI values obtained from each 
different models experience "state level jumps" (such as calculation. The statistical results show that the CI value 
result changes under ± 10% perturbations) on this type fluctuation range of the F-AHP model is concentrated 
of sample, and judge their boundary recognition ability between [0.011, 0.021], with a standard deviation of 0.0026, 
based on this. The F-AHP model only showed a skip which is much lower than the fluctuation standard deviation 
level in 1 out of 10 boundary samples, outperforming of the AHP model of 0.0093 (see Table 6), indicating that 
traditional AHP (3 cases) and WAM (4 cases), indicating F-AHP can maintain stable consistency control ability 
its strong boundary control ability. under complex weight combinations. 
The experimental results show that the F-AHP Thirdly, in the extreme boundary sample test, we 
model balances accuracy, stability, and interpretability selected 5 groups of samples located near the boundary of 
in state evaluation tasks with multiple indicators, levels, the level division and observed the trend of their final state 
and fuzzy information, demonstrating significant determination under the condition of weight perturbation 
comprehensive advantages and having good practical range of ± 15%. The F-AHP model can effectively buffer 
application prospects. Compared with traditional boundary samples through weight processing in the form of 
methods, its innovation in fuzzy logic modeling and fuzzy numbers, with only one group of samples 
experiencing a level transition (from "level II" to "level I"), 
A Comprehensive Evaluation Model for the State of Electric Energy…                       Informatica 49 (2025) 319–332 329 
 
while in traditional AHP models, there are three groups evaluation scores under different disturbance amplitudes 
experiencing a level change under the same conditions. clearly reflects the stability of its score curve at various 
This further demonstrates the robust control capability disturbance levels.
of the F-AHP model in boundary fuzzy regions. As 
shown in Figure 3, the variation trend of F-AHP model 
1
0,9
0,8
0,7
0,6
0,5
Original Score Disturbance +5% Disturbance −5% Disturbance +10% Disturbance −10%
S1 S2 S3 S4 S4
 
Figure 3: Evaluation score fluctuation curve of F-AHP model under different disturbance amplitudes 
 
Table 6：Comparison of stability indicators between F-AHP and AHP models under different testing dimensions 
Test Dimension Indicator F-AHP Model AHP Model 
Mean score fluctuation 
Input disturbance stability 0.037 0.089 
rate 
Consistency ratio fluctuation CI standard deviation 0.0026 0.0093 
Extreme sample rank jump 
Transition frequency 10% (1/10) 30% (3/10) 
rate 
Robust boundary control 
Fuzzy buffering effect Strong Weak 
ability 
From the above experimental results, it can be seen metering devices to solve the state evaluation problem of 
that the F-AHP model exhibits better stability and power equipment. 
robustness than traditional methods in dealing with input 
disturbances, consistency changes, and boundary 6.1  Scope and limitations analysis of the 
disturbances. This is mainly due to the introduction of 
model 
triangular fuzzy numbers and weight fuzzy fusion 
strategy in the construction of fuzzy judgment matrix in This study proposes and implements a method for 
the model, effectively alleviating the excessive evaluating the overall state of electric energy measuring 
sensitivity of subjective weighting to the final result. At instruments using the Fuzzy Analytic Hierarchy Process (F-
the same time, the introduction of a hierarchical structure AHP), which is widely flexible and can be used for 
ensures the coordination and response balance between evaluating the overall state of different power measurement 
different dimensions in a complex indicator system, tools. Especially when there are complex data sources and 
enabling the entire model system to maintain good vague or subjective information between measurement tool 
evaluation reliability and systematicity when facing indicators, it can effectively quantify fuzzy information, 
multi-source heterogeneous and uncertain data inputs in making the state evaluation results highly professional and 
actual power application scenarios. practical; The multi-level hierarchical structure and 
automatic weight adjustment function have played an 
important role in the inspection and evaluation of newly put 
6  Discussion and expansion into operation devices, normal operation monitoring, and 
The F-AHP model constructed in this study handling of aging and failure exit equipment. However, the 
demonstrates the stability of evaluation results and the application of the model is still influenced by the rationality 
ability to distinguish important information in a complex of the evaluation index system design and the credibility of 
and diverse information environment, which is much expert evaluation data, because establishing a fuzzy 
higher than traditional empirical methods and fuzzy judgment matrix relies on the experience of experts. If there 
analytic hierarchy process (F-AHP). Based on is a significant difference in their level of understanding, it 
experimental opinions in different situations, the will affect the fairness of the model output results; When 
accuracy and adaptability of this model are good, and it the actual application has special working conditions or 
has strong scalability and practical value, providing an newly added types, extremely low data volume, and 
intelligent evaluation method for electric energy extremely poor regularity, it may be limited by the model's 
330 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
generalization ability, and it may be necessary to adjust three common types of measuring instruments, namely user 
the indicator weights or evaluation levels according to side smart meters, station side multifunctional meters, and 
the actual situation. To further ensure the universality of enterprise side measuring systems.
the model, expansion experiments were conducted on 
0,09
Measurement system for industrial and mining enterprises
0,62
0,07
Multi functional energy meter for station use
0,74
0,05
Household smart meter
0,82
0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9
standard deviation Sample mean score
Figure 4: Applicability performance of the model in different energy metering devices 
 
Figure 4 shows the mean state scores generated by some weights, resulting in the model being unstable. 
the model in different device types, and the standard Therefore, when evaluating systems such as power 
deviation range is indicated by error bars to reflect the measurement instruments that contain multiple sources of 
score fluctuations between different samples. errors and unknown states, a cautious fuzzy analytic 
hierarchy process (F-AHP) is preferred to increase the 
6.2  Discussion on parameter sensitivity of model's tolerance for external factors. Thirdly, a change in 
the upper limit of the set consistency ratio (CR) threshold 
fuzzy analytic hierarchy process 
can indirectly lead to a change in the final conclusion. The 
In the comprehensive state evaluation model, the main general default value is set to 0.1 to draw a conclusion, but 
influencing factors that need to be considered are that the in the process of complex system evaluation, if the 
F-AHP results are greatly affected by the setting of a consistency requirements are artificially relaxed, it may 
series of important variables, especially the design of lead to internal conflicts, causing the weight system to 
attribute functions for the fuzzy decision matrix, the deviate from the initial judgment conditions and weakening 
selection of upper and lower boundary points for the explanatory power of the model. evaluating systems, 
triangular fuzziness, and the synthesis method of weights. controlling the strictness and number of indicators required 
These variables not only directly affect the ranking of for consistency testing is an important means to ensure the 
each important indicator, but also affect the stability and practicality of the model. 
discrimination of the final comprehensive score results. 
Therefore, exploring the sensitivity of these variables in 
depth can enhance the interpretability and adaptability of 6.3  Model's potential for promotion in 
the model. When establishing a fuzzy decision matrix, a smart grids 
triangular fuzzy representation is usually used, and the The fuzzy analytic hierarchy process proposed by our 
selection of fuzzy boundaries carries a considerable research institute as a feature for evaluating the overall 
degree of subjective color. Even if different experts' operation status of electric energy metering devices has 
evaluation values for indicators fall within the same good universality, scalability, and intelligent integration 
rating range, their corresponding triangle numbers may capabilities, and can be easily promoted to the smart grid 
have slight changes, and this change will be amplified in framework system. On the one hand, the overall analysis 
models with high levels and sensitive interactions. model constructed using this method includes multiple 
Therefore, it is necessary to plan a reasonable mapping indicators such as accurate and stable measurement, 
system between fuzzy words and three-dimensional adaptability to power quality, communication capability, 
numbers to accurately represent the meaning of experts' and adaptability to the working environment, which meet 
ratings. Secondly, synthesizing fuzzy weights can also the concept of power grid management of equipment 
have a significant impact on the final result. The lifecycle. The fuzzy theory principle is used to deal with the 
commonly used weighted average method and the impact of information uncertainty between various 
maximum minimum method have different strengths in indicators, and comprehensive stability analysis can be 
reflecting extremes. In the process of experimental carried out under diverse heterogeneous data, improving the 
verification, if a comprehensive method that is easily ability of power enterprises to identify equipment operation 
scored full marks by extreme situations is used, it may risks under operating conditions. On the other hand, it has 
lead to an increase in global rating due to high scores of good interface scalability and data compatibility, making it 
A Comprehensive Evaluation Model for the State of Electric Energy…                       Informatica 49 (2025) 319–332 331 
 
easy to collaborate with information management metering devices has become a key issue in ensuring 
systems, online monitoring systems, and data centers. measurement accuracy and improving energy 
Whether it is a lightweight installation deployed at the management level. In response to the problems of 
edge or an application deployed in the control room of insufficient resolution and poor adaptability of existing 
the dispatch center for centralized processing, its state recognition modes, this paper designs and 
parameters can be adjusted according to functional needs implements a comprehensive evaluation model for the 
to adapt to various usage scenarios. It can also be state of electric energy metering devices based on fuzzy 
associated with distribution automation and connected to analytic hierarchy process (F-AHP), which considers 
the Internet of Things of electrical equipment to meet multiple indicators. This model is based on expert 
various business scenarios. At the same time, as experience and on-site data, constructing an indicator 
intelligent operation and maintenance are gradually 
system, fuzzy quantification through membership 
promoted, this model also has the possibility of 
functions, and combining hierarchical structure and 
integrating with artificial intelligence technologies such 
weight allocation to achieve comprehensive integration 
as machine learning, anomaly detection, fault prediction, 
of multiple factors influences, outputting quantitative 
etc. By analyzing the results of previous ratings, it is 
scores and state level results. 
possible to build a self-learning system that completes 
In experimental verification, the model outperforms 
the transition from "static evaluation" to "dynamic 
warning", thereby providing the ability to support a traditional weighted scoring and threshold methods in 
comprehensive state management process of perception, terms of recognition accuracy, scoring stability, and anti-
intelligent decision-making, and cyclic control. To interference ability, demonstrating good practicality and 
upgrade from static evaluation to dynamic intelligent potential for promotion. Especially in the processing of 
operation and maintenance, this model can be integrated fuzzy boundary samples, the model exhibits stronger 
with AI modules to construct an intelligent monitoring robustness. However, this study still has certain 
system. For example, the F-AHP score result can be used limitations. For example, the weight judgment process 
as a "health label" for device operation, which can be relies heavily on expert experience, which may cause 
used to train lightweight classifiers (such as SVM, fluctuations due to subjective biases; At the same time, 
XGBoost) to achieve fast prediction of new device states; the sample size for verification is relatively limited and 
At the same time, time series anomaly detection has not fully covered diverse device scenarios. Future 
algorithms such as LSTM-AE and Isolation Forest can research can be further expanded from the following 
be combined to dynamically monitor the trend of device aspects: firstly, combining large-scale operation logs 
status score changes, achieving intelligent early warning with automatic data collection to further enhance the 
of phenomena such as score mutations and boundary objectivity and adaptability of scoring; The second is to 
fluctuations. This model structure can also be embedded explore a data-driven dynamic weight adjustment 
in IoT platforms, collecting real-time data through edge mechanism to weaken expert dependence; The third is to 
gateways and quickly scoring it as input for the operation integrate F-AHP with machine learning models to 
feedback indicators of digital twin systems, providing construct a state recognition framework with self-
high timeliness decision support for scheduling systems. 
learning capabilities, achieving the transition from static 
It is worth noting that the F-AHP method will face evaluation to real-time intelligent monitoring. Overall, 
the problem of increased computational complexity in the state assessment model constructed in this article 
the construction of its judgment matrix and fuzzy weight 
provides a feasible path for the intelligent management of 
calculation process when facing large-scale device 
energy metering devices and lays a methodological 
clusters or significantly increased indicator dimensions 
foundation for the high reliability operation of future 
(such as n>30). In theory, the computational complexity 
smart grid measurement and control equipment. 
of its judgment process at each layer is about O (n ²), and 
as the number of indicator layers or expert groups 
expands, the computational time and difficulty of Funding 
consistency testing significantly increase. Therefore, in 
practical deployment, a distributed computing strategy 
The research is supported by: Science and technology 
can be adopted to modularize weight calculation and 
projects of State Grid Ningxia Electric Power Co., Ltd. 
process it in parallel; At the same time, an expert scoring 
template library is constructed through historical data to 
achieve automated filling of the judgment matrix. This References 
model is also suitable for encapsulation into operation [1] Wang Z, Han X, Wang Y. High-precision substation 
and maintenance platforms in a microservices manner, electric energy metering device based on an adaptive 
with good resource scheduling and load control algorithm[J]. Journal of Physics: Conference 
capabilities in large-scale device management scenarios, 
Series,2024,2782(1): https://dol:10.1088/1742-
ensuring efficient and stable output of evaluation results. 
6596/2782/1/012053. 
[2] Alvarez-Valera H H, Maurice A, Ravat F, et al. 
7  Conclusion Energy Measurement System forData Lake: An 
With the continuous development of smart grids, how Initial Approach[C]//Asian Conference on Intelligent 
to efficiently identify the operating status of energy 
332 Informatica 49 (2025) 319–332                                                                                 C. Xu et al. 
 
Information and Database Systems.Springer, Sustainable Energy, 
Singapore, 2024.https://dol:10.1007/978-981-97- 2023.https://dol:10.1002/ep.14072. 
4982-9_2. [13] Paunkov N , Popov R , Parushev A ,et 
[3] Zhang F ,Guo J ,Yuan F , et al.Research on al.MEASUREMENT OF ELECTRICAL POWER 
Intelligent Verification System of High Voltage AND ENERGY WITH A SMART 
Electric Energy Metering Device Based on Power METER[J].INTED2023 Proceedings, 
Cloud[J].Electronics,2023,12(11):https://dol:10.3 2023.https://dol:10.21125/inted.2023.1300. 
390/electronics12112493. [14] Alrobaie A S , Krarti M .Measurement and 
[4] Zheng K , Zhou M , Zhang Y .Status evaluation of Verification Building Energy Prediction (MVBEP): 
electric energy metering device (EEMD) based on An interpretable data-driven model development and 
artificial intelligence technology[J].Proceedings analysis framework[J].Energy and Buildings, 
of SPIE, 2022, 2023.https://dol:10.1016/j.enbuild.2023.113321. 
12500(000):7.https://dol:10.1117/12.2662643. [15] Qu Y , Wang J , Cheng X ,et al.Migratable Power 
[5] Natarajan N , Vasudevan M , Dineshkumar S K ,et System Transient Stability AssessmentMethod Based 
al.Comparison of Analytic Hierarchy Process on Improved XGBoost[J].Energy Engineering, 2024, 
(AHP) and Fuzzy Analytic Hierarchy Process (f- 121(7):1847-
AHP) for the Sustainability Assessment of a Water 1863https://dol:10.32604/ee.2024.048300. 
Supply Project[J].Journal of The Institution of [16] Zhao J , Huang S , Cai Q ,et al.Research on 
Engineers (India), Series A. Civil, architectural, Distributed Renewable Energy Power Measurement 
environmental and agricultural engineering, and Operation Control Based on Cloud-Edge 
2022.https://dol:10.1007/s40030-022-00665-x. Collaboration[J].EAI Endorsed Transactions on the 
[6] Li Y , Dong Y , Pan J ,et al.Black box fault Energy Web, 2024, 
diagnosis method of power metering device based 11(1).https://dol:10.4108/ew.5520. 
on digital filtering and EMD [17] Prastowo H , Pitana T , Artana K B ,et 
method[J].Proceedings of SPIE, 2023, al.Development of Measurement Tool for Energy 
12788(000):8.https://dol:10.1117/12.3004699. Efficiency Operational Indicator (EEOI)[J].IOP 
[7] Yang W , Wang G , Wu W ,et al.Analysis of factors Publishing Ltd, 2024.https://dol:10.1088/1755-
influencing electric energy measurement in power 1315/1423/1/012026. 
systems[J].AIP Advances, 2024, [18] Huang T , Zhiwu W U , Zhan W ,et al.ELECTRIC 
14(2):6.https://dol:10.1063/5.0190948. ENERGY METERING ERROR EVALUATION 
[8] Yamamoto T , Tokura Y .Energy exchange and METHOD BASED ON DEEP 
fluctuations between a dissipative qubit and a LEARNING[J].Scalable Computing: Practice & 
monitor under continuous measurement and Experience, 2024, 
feedback[J].  25(3).https://dol:10.12694/scpe.v25i3.2789. 
2024.https://dol:10.21468/SciPostPhysCore.8.1.0 [19] Wang C .A method for identifying and evaluating 
16. energy meter data based on big data analysis 
[9] Shah A M ,Yuan R ,Zheng S , et al.Modeling of technology[J].International Journal of Information 
Nonlinear Load Electric Energy Measurement and and Communication Technology, 2023, 
Evaluation System Based on Artificial Intelligence 23(4):22.https://dol:10.1504/IJICT.2023.134852. 
Algorithm[J].Recent Advances in Electrical & [20] Yue Z , Yao X , Chen Y .A New Fuzzy Analytic 
Electronic Engineering,2023,16(2):94- Hierarchy Process Method forSoftware 
102..https://dol:10.2174/23520965156662205181 Trustworthiness Measurement[C]//International 
21454 Conference on AI Logic and Applications.Springer, 
[10] Gerber D L , Ghatpande O A , Nazir M ,et Singapore, 2022.https://dol:10.1007/978-981-19-
al.Energy and power quality measurement for 7510-3_18. 
electrical distribution in AC and DC microgrid 
 
buildings[J].Applied Energy, 2022, 
308.https://dol:10.1016/j.apenergy.2021.118308. 
[11] Jintao C ,Binruo Z ,Fang Z , et al.Prediction 
Method of Electric Energy Metering Device Based 
on Software-Defined Networking[J].International 
Journal of Information Security and Privacy 
(IJISP),2022,16(2):1-
20.https://dol:10.4018/IJISP.308316. 
[12] Taherikhonakdar Z , Fazlollahtabar H .Proposing 
the method for Operating System energy 
consumption measurement with energy efficiency 
approach[J].Environmental Progress & 
https://doi.org/10.31449/inf.v49i12.8933 Informatica 49 (2025) 333–344 333 
 
 
 
Models And Methods of Analysing Infrastructure Performance in 
Cloud Environments Based on Process Optimisation Methods 
 
Pavlo Kudrynskyi1, *, Oleksandr Zvenihorodskyi2, Yaroslav Bai2 
1Department of Computer Science, State University of Information and Communication Technologies 
Kyiv, 03110, Ukraine 
2Department of Artificial Intelligence, State University of Information and Communication Technologies 
Kyiv, 03110, Ukraine 
E-mail: pavlokudrynskyi@ukr.net, o.zvenihorodskyi@outlook.com, yar-bai@hotmail.com 
*Corresponding author 
Keywords: performance evaluation, workflow improvement, neural network technologies, resource management, 
dynamic workloads, cloud services 
Received: April 16, 2025  
The study aimed to develop models and methods for analysing infrastructure performance in cloud 
environments that consider the complexity and dynamism of modern IT systems. The development of 
adaptive resource management models capable of responding to changing loads in real time was 
emphasised. New methods of process optimisation were developed, including the use of artificial neural 
networks for load forecasting and dynamic resource allocation. Solutions for efficient management of 
computing and storage capacities were modelled and simulated. The use of adaptive models based on 
neural network technologies has increased the accuracy of load forecasting by up to 95% and reduced 
costs by 20% through the automation of resource management. Practical experiments conducted in the 
Amazon Web Services (AWS) and Microsoft Azure environments confirmed the effectiveness of the 
approaches under various load conditions. These results help to improve the stability of cloud services, 
reducing the risk of overload, downtime and data loss. The proposed models are universal and can be 
applied in various industries, including the financial sector, e-commerce and healthcare, which allows 
them to effectively solve the problems faced by modern information systems. The findings of the study 
highlight the importance of integrating artificial intelligence into performance management, which 
ensures the flexibility and scalability of cloud environments. This creates new opportunities to optimise 
processes, improve service quality and reduce operating costs, creating the basis for further research and 
development in the field of cloud computing. 
Povzetek: Študija razvija adaptivne modele in metode za analizo delovanja infrastrukture v oblaku, ki 
temeljijo na globokem učenju (nevronske mreže) za dinamično upravljanje virov. To je omogočilo boljše 
napovedi obremenitve in zmanjšanje stroškov v okoljih AWS in Azure, kar povečuje stabilnost in 
učinkovitost storitev. 
 
1 Introduction management. For example, M. Abdullah and M. 
Mohamed Surputheen [2] noted that static models often  
In the modern world, cloud computing has become an used for performance analysis do not consider the 
important element of IT infrastructure for enterprises and dynamic nature of loads inherent in modern cloud 
organisations of varying scales. Cloud computing enables infrastructures. These approaches do not facilitate optimal 
efficient use of resources, reduces infrastructure costs and resource allocation, particularly during fluctuations in user 
provides flexibility in working with data. However, the activity or when processing large datasets. The authors 
growing popularity of cloud services poses new advocate for the implementation of adaptive models, 
challenges, particularly in managing their performance, although their analysis remains largely conceptual. 
efficiency and security. One of the main challenges is to Similarly, H. Alrammah et al. [3], who addressed the 
ensure the high performance of cloud infrastructures under limited scalability of cloud platforms within static 
variable loads, as well as to optimise the cost of computing resource management models. The authors noted that such 
resources [1]. Consequently, investigating novel models approaches do not consider unpredictable changes in the 
and methodologies for analysing the performance of cloud load that often occur due to peak user activity. They 
environments is both a significant and pressing endeavour. proposed the use of adaptive algorithms, but their research 
According to numerous studies, existing approaches to is mostly limited to basic simulations, without a detailed 
assessing performance in cloud environments have analysis of performance in real-world conditions. 
significant limitations that affect the efficiency of resource  
334   Informatica 49 (2025) 333–344                                                                                                                       P. Kudrynskyi et al. 
 
 
 
The study by A. Tiwari and S. Yadaw [4] also confirms In this context, it is also worth noting the importance 
that static resource management approaches do not of adaptive systems in effective resource management in 
provide adequate efficiency in dynamic cloud cloud environments. Adaptive approaches can be used to 
environments. The authors analysed in detail the dynamically respond to changes in load, ensuring efficient 
shortcomings of such methods and noted that they are use of available resources and minimising their excessive 
particularly inefficient during peak loads. A. Tiwari and S. consumption. The implementation of such systems not 
Yadaw emphasise the importance of implementing only increases the stability and reliability of cloud services 
adaptive technologies that can predict load changes and but also contributes to economic efficiency, as it makes it 
adapt resources accordingly. Although their study is possible to reduce infrastructure costs without losing 
mainly focused on analysing existing approaches, it lays service quality [10]. This approach is especially important 
the theoretical foundation for the integration of smart in today's environment, when organisations face large 
systems. volumes of data and demands on the speed of information 
R. Anayat [5] explored the role of machine learning in processing, as well as a high level of flexibility and 
enhancing the performance management of cloud scalability in their systems. 
infrastructures. The author noted that the basic algorithms In general, based on the aforementioned 
that are often used do not consider the complexity and considerations, this study aims to develop novel models 
variability of the real-world conditions in which cloud and methods for analysing the performance of cloud 
platforms operate. R. Anayat recommends the use of deep infrastructures that combine adaptive and intelligent 
neural network models that can provide more accurate approaches. These models should operate efficiently 
forecasting and adaptation of resources, but the study amidst constant load changes, ensuring high performance, 
remains mostly theoretical and does not offer detailed resilience, and cost-effectiveness under varying 
practical implementations.  operational conditions. This will not only improve system 
Despite advancements in cloud computing research, performance but also expand opportunities for the use of 
there is a dearth of comprehensive approaches that cloud technologies in various industries, such as financial 
integrate various technologies for optimising and services, healthcare, and e-commerce. 
managing resources under real-world workloads. The 
problem of adaptive management of cloud infrastructures 2 Materials and methods 
that can effectively respond to changing conditions 
remains unresolved in many scientific papers. Therefore, The research is based on two major cloud platforms: 
it is imperative to explore the potential of advanced Amazon Web Services (AWS) and Microsoft Azure. 
optimisation methods, including neural network Modelling and simulations were conducted on these 
technologies, to achieve high-performance management platforms to study the effectiveness of different 
efficiency in cloud environments. approaches to performance optimisation.  
Previous studies show that most existing models are The study was conducted on equipment located in 
unable to effectively account for load variability. Standard AWS and Microsoft Azure data centres. Each server had 
optimisation algorithms may prove ineffective under the resources ranging from 2 to 16 processor cores and 8 to 64 
high dynamism and scalability of cloud systems [6,7]. GB of RAM, which provided the necessary capacity for 
Studies such as those by O.B. Johnson et al. [8] confirm conducting load tests and performance monitoring. 
that without the use of adaptive management methods, it Apache JMeter and Stress-ng tools were used to simulate 
is impossible to ensure stability and efficiency in the the load on the servers, which was used to simulate various 
operation of cloud infrastructures. Thus, there is a need to load scenarios in cloud environments. The performance of 
develop methods that can adjust resources in real-time and the systems was monitored using Amazon CloudWatch 
consider multifaceted changes through the integration of and Azure Monitor monitoring interfaces, which provide 
artificial intelligence. detailed information on resource usage. For the statistical 
A. Talha et al. [9] also discussed approaches to using analysis of the data obtained, the R environment was used 
machine learning for load forecasting and automatic to process and visualise the results, as well as the SPSS 
resource scaling in cloud platforms. However, contrary to software package to perform significance tests and 
their work, which focused on basic machine learning compare the results between different server 
methods, the study focuses on deep neural networks, configurations and cloud platforms. 
which allow for more accurate forecasting and adaptation Resource allocation adaptation models were 
of resources under highly dynamic loads. developed using recurrent neural networks and Long 
Machine learning methods, in particular neural Short-Term Memory networks. These models specialise in 
networks, have great potential to solve this problem, as processing time series of data, such as central processing 
they allow modelling complex relationships between unit (CPU) utilisation, memory, disc operations and 
various system parameters and predicting future load. network traffic. The developed models were integrated 
However, to date, there is very little research combining into a real-time dynamic resource scaling system. By 
these methods with cloud technologies. This study aims to predicting load peaks, the system adapted, adding or 
fill this gap and develop new approaches for integrating releasing resources as needed.  
machine learning into cloud infrastructure optimisation The sample for this study was formed based on the 
processes. characteristics of typical cloud environments that are 
Models And Methods of Analysing Infrastructure Performance in…                                         Informatica 49 (2025) 333–344 335 
 
 
 
widely used in real organisations to ensure reliability, the performance of different types of virtual machines was 
scalability and efficient resource management. These compared. The study determined that with optimal load 
environments were chosen to replicate a variety of balancing, even low-tier servers can achieve performance 
industry-standard cloud setups, with differing compute similar to high-end machines at significantly lower costs. 
and storage needs that represent actual cloud service The resilience of cloud platforms to failures and high 
deployments, in order to guarantee the models' loads was evaluated. To do this, error injection methods 
applicability. For AWS, three types of instances were (for example, Chaos Monkey) and load simulation using 
selected: standard, storage, and compute-intensive, which Kubernetes Stress Test Tools were employed. Chaos 
meet different performance and load requirements. For Monkey was applied by randomly terminating instances to 
Microsoft Azure, similar configurations were chosen to simulate system failures and assess recovery capabilities. 
provide a comparison between the two most popular cloud Kubernetes Stress Test Tools was employed to simulate 
platforms. Examples from both AWS and Azure were high traffic conditions, testing the platform's ability to 
chosen to provide a clear and equitable comparison, handle resource scaling and maintain stability under heavy 
encompassing a variety of workloads, such as content loads. The main criteria were the percentage of data loss 
delivery networks, high-performance computing and the average recovery time after a failure. The 
applications, and transactional databases. The choice of evaluation demonstrated that platforms with automatic 
configurations was based on real-world use cases, such as scaling and redundancy mechanisms provide high 
web application hosting, big data processing, and file resilience even in critical conditions. 
storage. The next stage included resource management and 
The study also determined the amount of data cost-effectiveness analysis. Dynamic scaling algorithms 
processed and the level of traffic, which ranged from reduced the cost of renting cloud resources by 25% and 
moderate (constant load on the servers) to highly dynamic reduced server downtime. This section compared the 
(with sharp traffic spikes at certain times). This diversity effectiveness of static and adaptive management by 
was used to evaluate the ability of the platforms to adapt evaluating key performance indicators such as system 
to changing conditions and ensure high performance under uptime, resource utilisation, and cost efficiency. It showed 
different loads. For each server configuration, several load a significant reduction in costs and improvement in 
scenarios that varied depending on the type and degree of performance when using the adaptive approach.  
user activity were created. These scenarios were created to At the final stage, the infrastructure performance was 
mimic the behaviour of real-world applications under optimised using multi-criteria algorithms, such as genetic 
various operating situations in addition to testing the algorithms and the particle swarm method. Simulation 
scalability of the system. They ranged from a stable load platforms (CloudSim, iFogSim) were used to test the 
(where the servers operate at an average level of developed models. They simulated cloud environments 
performance) to a highly dynamic load (where the load and evaluated resource allocation strategies under various 
increases sharply at certain times). load conditions. The main criteria were to reduce query 
The study was conducted in a real-world environment processing time and increase overall performance, 
where each platform used its typical performance considering energy consumption. The platforms were 
monitoring tools. Amazon CloudWatch was used for compared using static and adaptive resource management 
AWS and Azure. Azure Monitor was used for Azure, methods. The results showed that the optimisation 
which allowed for accurate monitoring of service improved performance by 18-22%. 
performance, including CPU Utilisation, Network This approach identified the most effective resource 
Throughput, Memory Usage and Disk I/O. The study was management strategies that automatically optimise their 
conducted on servers located in geographically dispersed use under high loads, minimising infrastructure costs and 
data centres, which was used to examine the performance ensuring stable system operation under changing 
of the platforms in different locations and physical conditions. In addition, adaptive management algorithms 
distances between servers. have reduced operating costs for computing power 
The performance of the cloud infrastructure was without losing data processing efficiency. 
assessed. The main criteria were system response time 
(ms), throughput (requests/sec), and resource utilisation 3 Results 
(CPU, memory, and disk space). Log files of real cloud 
platforms (AWS, Azure, Google Cloud) and synthetic 3.1 Comparative performance analysis of 
tests (for example, Apache Bench) were used. The results 
AWS and Microsoft Azure cloud platforms 
showed that performance significantly decreases at peak 
loads, which requires dynamic resource management. and development of resource allocation 
The efficiency of resource use, which was determined adaptation models 
by power consumption (W/request) and the efficiency of 
servicing requests per unit of equipment, was analysed in As part of the research, models for real-time adaptation of 
the study. This helps in understanding whether the cloud resource allocation based on intelligent algorithms, such 
infrastructure is over-provisioned or underutilised, leading as recurrent neural networks and Long Short-Term 
to potential cost savings or performance issues. Profilers Memory networks, were developed. These models 
such as Cloud Harmony and Prometheus were used, and specialised in processing time-series data, such as CPU, 
336   Informatica 49 (2025) 333–344                                                                                                                       P. Kudrynskyi et al. 
 
 
 
memory, disc operations, and network traffic. The When it comes to memory usage, both platforms can 
development process involved several key steps. First, the provide stable performance under a steady load, but 
data was prepared: it was normalised, cleaned of Azure's memory usage is less efficient when the demand 
anomalies and segmented to ensure the quality of training. for resources is variable. In the case of sudden load peaks, 
The models were trained with an emphasis on analysing static management on Azure does not efficiently limit 
long-term dependencies in time series, which was used to memory usage, which can cause overloading of certain 
identify hidden patterns in load changes. Model instances and degradation of overall system performance. 
optimisation included the use of genetic algorithms and Instead, AWS demonstrates better results in terms of 
particle swarming to tune hyperparameters and find memory allocation among instances. Based on the data 
optimal resource configurations that minimise response obtained, memory usage on the AWS platform is 6-8% 
time and energy consumption. Clustering algorithms were more efficient than Azure in static resource management. 
also used to group servers and resources based on This demonstrates AWS's superior ability to maintain load 
similarity in load, which contributed to their more efficient balance without critical overloads on certain nodes, even 
use. with static resource allocation. 
When using static management methods, which Network Throughput is a critical factor for the 
involve a fixed allocation of resources without the ability performance of cloud platforms, especially when there are 
to dynamically scale them in real-time, it is important to large volumes of data transfer between services [13,14]. 
assess how each platform handles loads under conditions With static resource management, AWS demonstrates 
of stable and variable demand. Static resource better results in providing stable and high-performance 
management does not allow for adaptation to load network performance. With more optimised data paths 
fluctuations, which can lead to inefficient use of and better geographical distribution of its data centres, 
computing power, memory, and other resources [11,12]. AWS can provide more stable and faster data transfer, 
However, to compare the performance of AWS and even at peak loads. Microsoft Azure in static management 
Microsoft Azure platforms in static resource management, conditions shows slightly lower network throughput in the 
four main indicators should be considered: CPU case of high volumes of data transfer between instances. 
Utilisation, Memory Usage, Network Throughput and The difference in throughput is 10-12% in favour of AWS, 
Disk I/O. which is the result of less efficient load balancing in the 
As shown in Table 1, both AWS and Microsoft Azure network on the Azure platform. 
deliver stable CPU utilisation results when running static Disk I/O is an important parameter for cloud 
resource management. However, when there are platforms, as it determines the speed of reading and 
significant load peaks, AWS is usually more efficient in writing data to the disc. Both platforms provide high 
managing CPU resources, as its default algorithms performance when using disk resources in static mode. 
provide more efficient load balancing between instances. However, with large volumes of disk operations, it turns 
In Microsoft Azure, the CPU utilisation situation may be out that AWS can better cope with high disk loads due to 
less optimised, as it does not have the same flexibility to more optimised caching and storage methods. Microsoft 
scale instances in real-time, which leads to the overloading Azure, although it demonstrates good results in terms of 
of certain instances while others remain underutilised. Disk I/O, has certain limitations under static management 
 at high loads. Tests have shown that the efficiency of using 
Table 1: Comparison of AWS and Azure performance by disk resources on Azure at a stable load is 7-9% worse 
key metrics in static management than on AWS, which is the result of a less optimised 
CPU Network organisation of the disk subsystem under static 
Memor Disk 
Platform Utilisatio Throughpu
y Usage I/O management. 
n t The static resource management on both platforms 
50 shows certain limitations in the face of variable 
AWS 95% 89% 95 MB/s MB/
workloads. While both platforms perform similarly under 
s 
steady resource demand, AWS delivers better 
48 
Microsof
92% 90% 92 MB/s MB/ performance under dynamic workloads by making more 
t Azure 
s efficient use of its compute, memory, network, and disk 
 resources. These differences can be associated with their 
Table 1 shows that in terms of static resource storage options, network architecture, and scaling and 
management, the performance of both platforms is similar, resource allocation strategies. Better dynamic scaling and 
but AWS demonstrates better performance in most key load balancing algorithms enable AWS to effectively 
metrics. CPU utilisation rates indicate a high load on the distribute resources in real-time based on varying demand, 
processors of these systems. This means that most of the which is why it performs better than Azure. AWS's 
computing resources are used to process requests, which extensive worldwide network of data centres and well-
may indicate that the system is operating efficiently but designed storage solutions further improve its capacity to 
also indicates that delays or performance degradation may manage peak loads and large data volumes without 
occur if the load is increased further.  experiencing performance issues. Azure's static resource 
management methodology, on the other hand, lacks real-
time adaptability and results in less effective resource 
Models And Methods of Analysing Infrastructure Performance in…                                         Informatica 49 (2025) 333–344 337 
 
 
 
allocation, particularly during periods of changing Microsoft Static 83% 2100 W/hour 
demand, which causes instances to be underutilised or Azure 
overloaded. Because of this, AWS offers greater Source: compiled by the authors. 
flexibility, faster resource adjustments, and better overall  
performance during dynamic workloads, whereas Azure A comparison of adaptive and static resource 
functions well in stable environments but has trouble management shows significant advantages of adaptive 
handling variations in peak demand. methods: 
AWS scores demonstrated significant improvement 1. Uptime without downtime. Adaptive 
over Microsoft Azure in such areas as CPU Utilisation and management ensures 98% (AWS) and 97% (Azure) 
Network Throughput, which improves the platform's uptime, which is 10-15% higher than static management. 
scalability under highly dynamic workloads by 15%. This 2. Power consumption. Adaptive management can 
allows the AWS platform to handle variable workloads reduce energy costs by up to 1500 W/h for AWS and 1700 
faster and more efficiently, reducing latency and W/h for Azure, which is 5-6% less than static methods. 
improving overall performance. Adaptive management significantly improves the 
At the same time, Microsoft Azure performs better efficiency and stability of cloud platforms by predicting 
under stable workloads, particularly in the Memory Usage load and automatically scaling [17]. The study results 
aspect, demonstrating a 10% improvement. This suggests showed that adaptive resource management based on deep 
that Azure is more efficient when resource demand is learning methods significantly improves server efficiency 
fixed, making it more attractive to organisations that have by reducing power consumption and reducing downtime. 
a stable infrastructure load. This is achieved by accurately predicting the load and 
automatically scaling resources in response to changes in 
3.2  Assessing the effectiveness of adaptive the load. Compared to static management methods, 
adaptive technologies can reduce downtime by 10-15%. 
resource management 
This means that cloud services operate more stably, even 
Further experiments were aimed at evaluating the in cases of high or variable loads, providing uninterrupted 
impact of adaptive resource management on the overall access to resources for users. 
performance of cloud platforms, comparing adaptive and This is especially relevant for cloud infrastructures that 
static resource management. For this purpose, two main often face high dynamic loads, such as large volumes of 
scenarios were applied, where one used traditional static traffic, spikes in user activity, or sudden changes in 
management and the other adaptive management based on computing resource requirements. Static methods based 
deep learning methods. on fixed capacity reservations cannot effectively respond 
Table 2 presents a comparative analysis of cloud to such changes, which often leads to the overuse of 
platforms employing adaptive versus static resource resources at times of low load or system overload at high 
management, evaluated across two key metrics: uptime loads. At the same time, adaptive technologies that use 
and power consumption. Uptime, defined as the deep learning can adjust resources in real time, 
percentage of operational time without system anticipating changes in load and adjusting them 
interruptions, demonstrates a marked advantage in accordingly to ensure optimal system performance. 
adaptive management systems. In the case of adaptive Thus, the results demonstrate that the implementation 
management, the platform automatically scales in of adaptive technologies is critical to optimise the 
response to changes in load, which reduces the risk of performance of cloud infrastructures, particularly in 
downtime and ensures high stability, which explains the conditions of high load dynamics. This reduces costs, 
high score for this parameter. Static control, on the other minimises downtime and ensures more stable and efficient 
hand, does not respond to changes in load, which increases operation of cloud services, which is important for 
the probability of overloads and, consequently, downtime. businesses that depend on uninterrupted access to 
Energy consumption shows the percentage of costs for computing power. 
using cloud resources. Adaptive management can use 
resources efficiently, scaling them depending on the load, 3.3 Resistance to load changes and error 
which reduces costs [15,16]. Static management, which injection testing 
does not adapt resources to changes, leads to higher costs 
because resources are used less efficiently. To assess the resilience of cloud platforms, testing 
 was conducted that included sudden changes in load, such 
Table 2: Performance results of cloud platforms with as traffic spikes and processing large amounts of data in a 
adaptive and static resource management short period. The results showed that adaptive resource 
Platform Type of Operating time Energy management provides significantly better platform 
control without consumption resilience to outages and changes in load. For instance, for 
downtime AWS with adaptive management, the percentage of data 
AWS Adaptive 98% 1500 W/hour loss was 0.5%, the average recovery time after a failure 
AWS Static 85% 2000 W/hour was 3 minutes, and the performance degradation during 
Microsoft Adaptive 97% 1700 W/hour peak loads was only 8%. In comparison, AWS with static 
Azure 
management showed 3.2% data loss, 12 minutes of 
338   Informatica 49 (2025) 333–344                                                                                                                       P. Kudrynskyi et al. 
 
 
 
recovery time, and an 18% performance degradation. For 3.4 Reduction of infrastructure costs 
Microsoft Azure with adaptive management, the 
percentage of data loss was 0.7%, the average recovery Adaptive resource management in cloud 
time was 4 minutes, and the performance was 7%. In infrastructures has proven to have significant cost-saving 
contrast, Azure with static management had 4.1% data benefits. Resource efficiency avoids situations when 
loss, 15 minutes of recovery time, and a 22% performance servers are running at low load or overloaded, which is 
degradation. Thus, adaptive resource management allows within normal parameters for static management methods. 
for better fault tolerance and high performance during Real-time optimisation of resource allocation minimises 
peak loads, while static management demonstrates the amount of unused computing capacity, thus reducing 
significantly worse results in terms of data loss, recovery the direct costs of renting or operating them. 
time, and performance. In the tests, both platforms In addition, resilience to changes in load provides 
demonstrated the ability to effectively handle these load flexible scaling that allows platforms to effectively handle 
changes, but AWS performed significantly better in terms peak loads without having to maintain excessive resource 
of rapid recovery and resource adaptation. At high peak reserves [18,19]. This is particularly relevant for 
loads, AWS proved to be more efficient in load balancing, businesses with irregular or seasonal operations, where 
which reduced response times and avoided delays in adaptive management can reduce the need for long-term 
request execution. This ensured high availability of leases or additional capacity, reducing costs by up to 20% 
services, even with significant load fluctuations. compared to static approaches. Thus, efficiency and 
Compared to Microsoft Azure, AWS has shown greater resilience to change not only reduce operating costs but 
flexibility in scaling resources, which has enabled faster also increase the cost-effectiveness of cloud infrastructure 
response to sudden changes in traffic and loads, increasing while ensuring stability and quality of service. 
the overall resilience of the platform.  Table 3 shows a comparison of infrastructure costs for 
There are a number of reasons why AWS and Azure static and adaptive resource management methods on 
function differently, including variations in their designs, AWS and Microsoft Azure. 
approaches to resource management, and load balancing  
systems. AWS's superior load-balancing algorithms and Table 3: Reduced infrastructure costs when using 
capacity to effectively divide workloads among numerous adaptive management 
instances allow it to scale resources with greater Reduction of 
flexibility, particularly during periods of high peak load. Type of Infrastructure costs with 
Platform 
This guarantees faster response times and fewer execution control costs (%) adaptive 
management 
delays for requests. Azure, on the other hand, struggles 
with resource allocation during dynamic load variations, Adaptive 20% 20% 
leading to instances that are either underutilised or AWS 
overcrowded, even if it performs well under constant load Static 25% - 
levels. Additionally, AWS gains from a more strategically 
placed data centre network, which improves network Adaptive 18% 22% 
throughput and overall performance during periods of Microsoft 
Azure 
high traffic. Azure's performance, on the other hand, is Static 23% - 
typically more reliable but less effective at managing 
abrupt surges in traffic. Additionally, AWS's predictive Source: compiled by the authors. 
resource management and improved machine learning  
model integration allow for quicker adaptability to shifting For AWS, adaptive management can reduce costs by 
traffic patterns, which reduces data loss and speeds up 20% from 25% with a static approach to 20% with an 
recovery. In conclusion, because of its sophisticated adaptive approach. In Microsoft Azure, the adaptive 
resource scaling, better load balancing, and quick response approach reduces costs by 22% from 23% with static 
to abrupt traffic fluctuations, AWS performs better than management to 18%. This shows that adaptive 
Azure in dynamic situations. management, thanks to dynamic resource optimisation, 
AWS demonstrates greater flexibility and efficiency in provides significant cost savings compared to static 
adapting resources to peak loads. One of the key findings methods for both platforms. These differences can stem 
of the study was that adaptive resource management based from their resource management approaches. Real-time 
on predictive models can significantly reduce load forecasting and adaptive scaling provided by AWS 
infrastructure costs, increasing its cost-effectiveness. allow for more effective resource allocation, which lowers 
Predictive models based on neural networks can the need for overprovisioning and minimises idle 
accurately predict the future load on cloud resources and resources, ultimately saving more money. Azure is less 
automatically adapt the distribution of computing power cost-effective than AWS due to its less flexible static 
and memory to ensure optimal resource utilisation. This resource allocation, which leads to underutilisation during 
avoids overcapacity and reduces the need for excessive periods of low demand and overutilisation during periods 
use of infrastructure to handle peak loads, which is one of of high demand. As a result, AWS's dynamic resource 
the main causes of cost overruns in traditional static management strategy reduces costs more effectively, 
resource management models. particularly for workloads that fluctuate. 
Models And Methods of Analysing Infrastructure Performance in…                                         Informatica 49 (2025) 333–344 339 
 
 
 
In summary, adaptive management showed a and static management modes. Adaptive management on 
significant reduction in infrastructure costs compared to both platforms is highly efficient, reducing infrastructure 
static management. Although AWS costs are higher, costs and maintaining the required level of performance. 
adaptive management performed better for both platforms, Through the implementation of forecasting and 
reducing costs more than static management. Thanks to automatic scaling mechanisms, adaptive resource 
predictive methods and automatic scaling, the system management significantly optimises infrastructure 
adapts resources to real needs, which can reduce utilisation [20]. However, this approach may require 
infrastructure costs by 20% compared to static additional setup and monitoring costs. Static control, 
management, where costs can be significantly higher due although easier to implement, can lead to less efficient use 
to inefficient use of resources. of resources, especially when the load is variable, which 
increases costs or reduces productivity [21]. Thus, 
3.5 Optimisation of the use of computing adaptive management is a better option for efficient use of 
power and memory computing power and memory, although it can be more 
difficult to implement and maintain. 
Table 4 demonstrates a comparison of key performance By leveraging forecasting and automatic scaling 
indicators (CPU Utilisation, Memory Usage, Network capabilities, adaptive resource management substantially 
Throughput and Disk I/O) when using static and adaptive optimises infrastructure utilisation [22, 23]. This 
resource management methods for AWS and Microsoft methodology effectively reduces the operational 
Azure cloud platforms. It also demonstrates that adaptive expenditures associated with cloud services while 
management allows for more efficient resource utilisation. ensuring sustained high performance and service 
Costs are reduced by automatically scaling resources, reliability. This approach is significantly more cost-
which allows for high performance while significantly effective and efficient than traditional static management, 
reducing overconsumption. which cannot effectively respond to changing load 
 conditions. 
Table 4: Comparison of cloud platform performance by For a more detailed comparison of the effectiveness of 
key parameters in static and adaptive resource adaptive and static resource management, it is important 
management to note that a key factor in reducing infrastructure costs is 
Networ to reduce the time during which resources are operating in 
Manage CPU Mem
Platfo k Disk an elevated mode. In systems with static management, 
ment utilisat ory 
rm through I/O resources are often kept in reserve for possible peak loads, 
method ion usage 
put which leads to constant capacity costs even during quiet 
periods [24, 25]. In such systems, resources can be in an 
Static 60% 70% 65% 60% 
increased mode (e.g., 80% of capacity) for 70% of the 
AWS 85% 88% 
85% 90% time, which creates significant additional costs. At the 
Adaptive (+15 (+28
(+25%) (+25%) same time, in systems with adaptive control, resources are 
%) %) 
added only when needed, and their use is adjusted 
Static 58% 68% 60% 58% depending on actual conditions. Therefore, resources are 
Micros
in overdrive only 20% of the time, as the system 
oft 83% 84% 
Azure 80% 85% automatically optimises resource allocation according to 
Adaptive (+15 (+26
(+22%) (+25%) 
%) %) current needs. This adaptability can significantly reduce 
Source: compiled by the authors. infrastructure costs as resources are not over-utilised when 
 they are not needed, resulting in greater efficiency and 
Percentages were calculated as the increase in resource savings. 
Through the use of predictive techniques, the system 
efficiency when moving from static to adaptive 
can not only reduce costs during low load phases but also 
management. For each indicator, the increase is 
ensure that additional resources are available when 
determined relative to the value recorded during static 
management. The initial values represent the effectiveness needed, which helps maintain high performance and 
of static methods. minimise the risk of downtime when resources are not 
The results show that adaptive resource management available to handle peak loads. This process also 
contributes to the stability of cloud platforms, as 
allows for more efficient use of computing power, 
anticipating changes in workload allows operations to 
memory, network bandwidth, and disk operations. AWS 
adapt to future changes before they occur, providing 
demonstrates slightly higher performance growth, 
especially in CPU Utilisation and Network Throughput. greater confidence in the continuity of services. 
At the same time, Microsoft Azure shows a steady These results also highlight the great potential of using 
adaptive methods for a variety of business processes and 
improvement in all parameters, which indicates the 
organisations where high efficiency in the use of cloud 
platform's high adaptability. 
resources is critical to reducing operating costs while 
The comparison of platform performance results 
demonstrates that AWS has overall higher resource ensuring the required performance. The use of such 
utilisation rates than Microsoft Azure, both in adaptive technologies is especially relevant for environments with 
high load variability, such as e-commerce, data 
340   Informatica 49 (2025) 333–344                                                                                                                       P. Kudrynskyi et al. 
 
 
 
processing, financial services and other industries where collection process in cloud environments. By combining 
load peaks can occur at unpredictable times. machine learning with data-gathering methods to increase 
Through the implementation of predictive models and the precision and effectiveness of investigations, it makes 
adaptive management, businesses can significantly a substantial contribution to the field of cloud forensics. 
improve their economic performance while ensuring Although in a different context, this confirms the 
competitiveness and cost reduction, which is a key factor importance of predictive accuracy and standardisation, 
for modern organisations seeking to make their operations which is also key to adaptive resource management. 
flexible and resilient in an ever-changing environment. P. Nawrocki et al. [32] addressed short-term and long-
term resource reservations, emphasising the need to 
4 Discussion respond quickly to sudden peak loads such as flash crowd 
workload effects. The study looked at machine learning-
The results confirm that adaptive management is based adaptive resource planning for cloud-based 
much more effective than static approaches, especially applications, with an emphasis on how machine learning 
when the load on cloud infrastructures is dynamic. For models can improve resource planning in cloud 
instance, the study by N. Du et al. [26] explored the use of environments. The results complement this approach by 
convex hull triangle mesh-based static mapping in highly showing that adaptive systems can effectively respond to 
dynamic environments, providing a novel technique for unpredictable loads while minimising costs. Other studies, 
improving mapping accuracy in such environments. This such as one by S. Ivan et al. [33], have studied the 
demonstrated that traditional approaches to resource efficiency of different cloud platforms, including AWS 
management under variable load conditions have limited and Microsoft Azure. The study offered insights into 
effectiveness. The results of the study confirm this cloud-based data processing for big data applications by 
statement, demonstrating that predictive models based on highlighting the advantages and disadvantages of each 
neural networks not only reduce infrastructure costs but platform for doing sentiment analysis at scale. Although 
also provide high flexibility and adaptability to cloud the study compared platforms, the results support the 
systems. conclusion of this paper that adaptive models significantly 
Similar conclusions were made by A. Braafladt et al. 
improve efficiency regardless of the specific platform. 
[27] and S. Khan and A. Jillani [28]. A. Braafladt et al. Microsoft’s Azure cloud computing is a fully managed 
presented an unusual approach to improving defence computing service that was introduced at a conference in 
modelling and simulation by examining the use of AI- 2008 and became known as Windows Azure and later 
driven adaptive analysis to detect emergent behaviours in renamed Microsoft Azure. P. Narayanan [34] discussed 
military capabilities design. S. Khan and A. Jillani the key components and services of Azure, with a special 
employed search-based software engineering techniques focus on data engineering and machine learning, as well 
to investigate cloud resource allocation and optimisation, as its impact on various industries due to the availability 
showing how sophisticated algorithms can be applied to of data centres around the world. P. Borra [35] discussed 
increase the effectiveness of cloud computing. This the key networking solutions provided by Microsoft 
emphasised the need to implement adaptive algorithms to Azure, which are the basis for supporting digital 
ensure the scalability and flexibility of cloud platforms. operations in modern business. The author examines in 
This correlates with this approach, which has shown the detail Azure components such as Virtual Network, Load 
effectiveness of using deep learning methods for real-time Balancer, VPN Gateway, ExpressRoute, and Firewall, 
load forecasting. with a focus on their practical application to ensure 
B. Predić et al. [29] and I. Petrovska and H. Kuchuk uninterrupted connectivity and improve security. The 
[30] both aimed to improve cloud resource management study aims to provide organisations with in-depth 
but took different approaches. In order to improve cloud knowledge and insights to help them effectively leverage 
load predictions and resource allocation under varying Azure networking services to meet changing business 
demands, Predić et al. employed a machine learning needs, which can complement the findings of this study. 
approach. In order to maximise efficiency and guarantee A study by O. Rolik and S. Zhevakin [36] confirmed 
secure operations, Petrovska & Kuchuk concentrated on the results in terms of cost-effectiveness. The use of 
adaptive resource allocation for data processing and adaptive management can reduce the cost of cloud 
security. Both strategies emphasised dynamic resource services by up to 20%, which highlights the importance of 
management in comparison to the current study, with the results for reducing the financial costs of 
Petrovska & Kuchuk concentrating on security and Predić organisations. P. Lakhera [37] complements these 
et al. on prediction accuracy. These concepts are findings by suggesting strategies for cost optimisation 
supported by the current study, which shows that adaptive using artificial intelligence. Anomaly detection and 
management improves cost-effectiveness and robustness predictive scaling, which the authors investigated, are key 
under fluctuating loads. elements for improving cost efficiency. 
Studies on cloud forensics, such as the one by R. Al- Traditional methods of resource management, as noted 
Mugern et al. [31], analyse the integration of machine by S. Tendulkar [38], are less effective due to the lack of 
learning techniques for data standardisation. This work consideration of dynamic changes in the load. This study 
presents an improved machine learning method that confirms this by demonstrating that predictive models can 
applies a cloud forensic meta-model to enhance the data more accurately determine resource requirements and 
Models And Methods of Analysing Infrastructure Performance in…                                         Informatica 49 (2025) 333–344 341 
 
 
 
ensure efficient use of resources under variable load edge computing offer prospects for further improving the 
conditions. S. Jaber [39] also supports the claim that flexibility and performance of cloud platforms. 
adaptive systems significantly reduce infrastructure costs. Neural network-based models have proven to provide 
The use of predictive models can reduce costs and highly accurate predictions of load changes, enabling 
improve the performance of cloud systems. efficient real-time resource adaptation. This significantly 
AWS, as shown by L. Devane [40], provides a high improves both the performance of cloud platforms and the 
ability to adapt to peak loads, which is consistent with the stability of systems. The results of the study demonstrate 
results obtained. Similar conclusions were made by S. the benefits of using intelligent algorithms that can adapt 
Gong et al. [41], who noted that adaptive systems to changing operating conditions. 
effectively respond to sudden changes in load, ensuring The results obtained are important for practical 
the stability of platforms. The study complements these application. They open opportunities to significantly 
findings by emphasising the importance of reducing reduce business operating costs while ensuring high 
response times to peak loads. This is important for availability and stability of services. The use of machine 
organisations that work with large amounts of data and learning-based adaptive control technologies allows for 
need consistent access to resources in real-time. optimising resource utilisation and minimising downtime 
In general, the research findings are fully consistent and congestion. 
with current industry trends, in particular the importance However, the study has several limitations: only two 
of using adaptive systems to manage cloud resources and cloud platforms were used, which may limit the 
confirm the effectiveness of load forecasting methods to generalisability of the results, and the number of types of 
reduce costs and improve performance. At the same time, server configurations for testing is limited. For further 
it is worth analysing the further development and research, it is advisable to expand the number of cloud 
improvement of such models based on deep learning and platforms tested, explore the integration of adaptive 
integration with new technologies such as edge management with new technologies, such as edge 
computing, which will allow for even greater efficiency in computing, which will significantly improve the 
real-time management. efficiency of real-time resource management, and improve 
predictive models using more sophisticated machine 
learning algorithms to improve the accuracy of predictions 
5 Conclusions and system adaptability. 
The study developed models for load forecasting and 
resource management, including a neural network model References 
for load forecasting and an adaptive resource management 
[1] Berestovenko, O. Virtualisation and network 
model that automatically adjusts resource use based on 
management: Best practices for improving 
forecasts. One of the main achievements was the 
efficiency. Technologies and Engineering, 2024, 
confirmation of the effectiveness of using intelligent 
25(6): 41-52. https://doi.org/10.30857/2786-
algorithms, in particular neural networks, for load 
5371.2024.6.4 
forecasting and automatic adaptation of resource 
[2] Abdullah, M., & Mohamed Surputheen, M. 
allocation in real-time. This reduced the cost of cloud 
Optimizing performance of cloud infrastructure 
services by an average of 20% compared to traditional 
through effective resource scheduling. Journal of 
static approaches, which confirms the cost-effectiveness 
Advanced Applied Scientific Research, 2024, 6(1): 
of the proposed methods.  
1-14. https://doi.org/10.46947/joaasr612024748 
The study also determined that AWS demonstrated 
[3] Alrammah, H., Gu, Y., Yun, D., & Zhang, N. Tri-
better adaptability under highly dynamic workloads due to 
objective optimization for large-scale workflow 
faster resource scaling and more efficient load balancing. 
scheduling and execution in clouds. Journal of 
While Microsoft Azure showed a more even distribution 
Network and Systems Management, 2024, 32(4): 
of resources at a stable load, which is an advantage in the 
89. https://doi.org/10.1007/s10922-024-09863-3 
case of a constant load level. The results of the study 
[4] Tiwari, A.K., & Yadav, S. Algorithmic model for 
showed that adaptive resource management in cloud 
cloud performance optimization using connection 
platforms can achieve significant performance 
pooling technique. Journal of Statistics and 
improvements and cost savings. AWS demonstrated a 
Management Systems, 2024, 27(2): 489-499. 
15% improvement in scalability and performance under 
https://doi.org/10.47974/jsms-1290 
highly dynamic workloads, while Microsoft Azure 
[5] Anayat, R. Cloud-based reinforcement learning in 
showed a 10% increase in resource allocation efficiency 
resource-constrained environments: Real-time 
under stable workloads. The use of predictive models 
performance optimization in autonomous systems, 
based on neural networks ensures accurate forecasting of 
2024. 
load changes and automatic adaptation of resources in 
https://doi.org/10.13140/RG.2.2.24832.24326 
real-time. Adaptive algorithms have proven to be more 
[6] Varanitskyi, D., Rozkolodko, O., Liuta, M., 
efficient than traditional approaches, especially in the face 
Zakharova, M., & Hotunov, V. Analysis of data 
of variable workloads. Further developments in 
protection mechanisms in cloud environments. 
technologies such as deep learning and integration with 
342   Informatica 49 (2025) 333–344                                                                                                                       P. Kudrynskyi et al. 
 
 
 
Technologies and Engineering, 2024, 25(1): 9-16. Improving accuracy of the spectral-correlation 
https://doi.org/10.30857/2786-5371.2024.1.1  direction finding and delay estimation using 
[7] Demchyna, M., Styslo, T., & Vashchyshak, S. machine learning. Eastern European Journal of 
Optimisation of intelligent system algorithms for Enterprise Technologies, 2025, 2(5(134)): 15-24. 
poorly structured data analysis. Bulletin of https://doi.org/10.15587/1729-4061.2025.327021  
Cherkasy State Technological University, 2024, [17] Porkodi, S., & Raman, A.M. Success of cloud 
29(4): 21-31. computing adoption over an era in human resource 
https://doi.org/10.62660/bcstu/4.2024.21  management systems: a comprehensive meta-
[8] Johnson, O.B., Olamijuwon, J., Cadet, E., analytic literature review. Management Review 
Osundare, O.S., & Samira, Z. Designing multi- Quarterly, 2025, 75(2): 1041-1075. 
cloud architecture models for enterprise scalability https://doi.org/10.1007/s11301-023-00401-0  
and cost reduction. Open Access Research Journal [18] Sandhu, R., Faiz, M., Kaur, H., Srivastava, A., & 
of Engineering and Technology, 2024, 7(2): 101- Narayan, V. Enhancement in performance of cloud 
113. https://doi.org/10.53022/oarjet.2024.7.2.0061 computing task scheduling using optimization 
[9] Talha, A., Bouayad, A., & Malki, M.O. An strategies. Cluster Computing, 2024, 27(5): 6265-
improved pathfinder algorithm using opposition- 6288. https://doi.org/10.1007/s10586-023-04254-
based learning for tasks scheduling in cloud w 
environment. Journal of Computational Science, [19] Soh, J., Copeland, M., Puca, A., & Harris, M. 
2022, 64: 101873. Microsoft Azure, 2020. Berkeley: Apress. 
https://doi.org/10.1016/j.jocs.2022.101873 https://doi.org/10.1007/978-1-4842-5958-0 
[10] Slivka, S. Microservices architecture for ERP [20] Singh, S., Ramkumar, K.R., & Kukkar, A. 
systems. Bulletin of Cherkasy State Technological Analysis and implementation of microsoft Azure 
University, 2024, 29(4): 32-42. https://bulletin- machine learning studio services with respect to 
chstu.com.ua/en/journals/tom-29-4- machine learning algorithms. In R. Agrawal, C.K. 
2024/arkhitektura-mikroservisiv-dlya-erp-sistem Singh, A. Goyal, & D.K. Singh (Eds.), Modern 
[11] Destek, M.A., Hossain, M.R., Manga, M., & Electronics Devices and Communication Systems, 
Destek, G. Can digital government reduce the 2023, (pp. 91-106). Singapore: Springer. 
resource dependency? Evidence from method of https://doi.org/10.1007/978-981-19-6383-4_7 
moments quantile technique. Resources Policy, [21] Kavaldzhieva, K. The Impact of Digitalization on 
2024, 99: 105426. the Measurement of value in the production and 
https://doi.org/10.1016/j.resourpol.2024.105426  operation of industrial products. In 2019 
[12] Smailov, N., Tsyporenko, V., Sabibolda, A., International Conference on High Technology for 
Tsyporenko, V., Abdykadyrov, A., Kabdoldina, Sustainable Development, HiTech, 2019, (Article 
A., Dosbayev, Z., Ualiyev, Z., & Kadyrova, R. number: 9128260). Sofia: Institute of Electrical 
Streamlining digital correlation-interferometric and Electronics Engineers. 
direction finding with spatial analytical signal. https://doi.org/10.1109/HiTech48507.2019.91282
Informatyka Automatyka Pomiary W Gospodarce 60  
I Ochronie Srodowiska, 2024, 14(3): 43-48. [22] Kiurchev, S., Abdullo, M.A., Vlasenko, T., Prasol, 
https://doi.org/10.35784/iapgos.6177  S., & Verkholantseva, V. Automated Control of the 
[13] Makhazhanova, U., Omurtayeva, A., Kerimkhulle, Gear Profile for the Gerotor Hydraulic Machine. In 
S., Tokhmetov, A., Adalbek, A., & Taberkhan, R. F. Chaari, F. Gherardini, V. Ivanov, & M. Haddar 
Assessment of Investment Attractiveness of Small (Eds.), Lecture Notes in Mechanical Engineering, 
Enterprises in Agriculture Based on Fuzzy Logic. 2023, (pp. 32-43). Cham: Springer. 
Lecture Notes in Networks and Systems, 2024, 935 https://doi.org/10.1007/978-3-031-16651-8_4  
LNNS: 411-419. [23] Bezshyyko, O., Dolinskii, A., Bezshyyko, K., 
[14] Azieva, G., Kerimkhulle, S., Turusbekova, U., Kadenko, I., Yermolenko, R., & Ziemann, V. 
Alimagambetova, A., & Niyazbekova, S. Analysis PETAG01: A program for the direct simulation of 
of access to the electricity transmission network a pellet target. Computer Physics 
using information technologies in some countries. Communications, 2008, 178(2): 144-155. 
E3S Web of Conferences, 2021, 258: 11003. https://doi.org/10.1016/j.cpc.2007.07.013  
https://doi.org/10.1051/e3sconf/202125811003  [24] Orazbayev, B., Zhumadillayeva, A., Kabibullin, 
[15] Imamguluyev, R., & Umarova, N. Application of M., Crabbe, M.J.C., Orazbayeva, K., & Yue, X. A 
Fuzzy Logic Apparatus to Solve the Problem of Systematic Approach to the Model Development 
Spatial Selection in Architectural-Design Projects. of Reactors and Reforming Furnaces With 
Lecture Notes in Networks and Systems, 2022, Fuzziness and Optimization of Operating Modes. 
307: 842-848. https://doi.org/10.1007/978-3-030- IEEE Access, 2023, 11: 74980-74996. 
85626-7_98  https://doi.org/10.1109/ACCESS.2023.3294701  
[16] Smailov, N., Tsyporenko, V., Ualiyev, Z., Issova, [25] Sasi, S., Subbu, S.B.V., Manoharan, P., & 
A., Dosbayev, Z., Tashtay, Y., Zhekambayeva, M., Abualigah, L. Design and implementation of 
Alimbekov, T., Kadyrova, R., & Sabibolda, A. secured file delivery protocol using enhanced 
Models And Methods of Analysing Infrastructure Performance in…                                         Informatica 49 (2025) 333–344 343 
 
 
 
elliptic curve cryptography for class I and class II [36] Rolik, O.I. & Zhevakin, S.D. Cost optimization 
transactions. Journal of Autonomous Intelligence, method for informational infrastructure 
2023, 6(3). https://doi.org/10.32629/jai.v6i3.740  deployment in static multi-cloud environment. 
[26] Du, N., Xie, L., Zhou, M., Gao, W., Wang, Y., & Radio Electronics, Computer Science, Control, 
Hu, J. Convex hull triangle mesh-based static 2024, 3: 160-172. https://doi.org/10.15588/1607-
mapping in highly dynamic environments. IEEE 3274-2024-3-14 
Transactions on Instrumentation and [37] Lakhera, P. Leveraging large language models to 
Measurement, 2024, 73: 1-14. optimize costs in Amazon web service cloud. 
https://doi.org/10.1109/tim.2023.3348881 TechRxiv, 2024. 
[27] Braafladt, A., Sudol, A., & Mavris, D. AI-driven https://doi.org/10.36227/techrxiv.172684142.2396
adaptive analysis for finding emergent behavior in 6027/v1  
military capability design. Journal of Defense [38] Tendulkar, S. Optimizing generative AI model 
Modeling and Simulation: Applications, performance through cloud resource management 
Methodology, Technology, 2024. in hybrid AI systems, 2024. 
https://doi.org/10.1177/15485129241289137 https://doi.org/10.13140/RG.2.2.34745.38246 
[28] Khan, S.M. & Jillani, A. Cloud resource allocation [39] Jaber, S. Enhanced model performance in 
and optimization using search-based software generative AI: Cloud resource optimization for 
engineering methods, 2024. real-time adaptive autonomous systems, 2024. 
https://doi.org/10.13140/RG.2.2.17568.19207 https://doi.org/10.13140/RG.2.2.32857.94567 
[29] Predić, B., Jovanovic, L., Simic, V., Bacanin, N., [40] Devane, L. Adaptive AI systems in autonomous 
Zivkovic, M., Spalevic, P., Budimirovic, N., & environments: Real-time decision making and 
Dobrojevic, M. Cloud-load forecasting via resource allocation through cloud-based 
decomposition-aided attention recurrent neural reinforcement learning, 2023. 
network tuned by modified particle swarm https://doi.org/10.13140/RG.2.2.21638.18241 
optimization. Complex & Intelligent Systems, [41] Gong, S., Yin, B., Zheng, Z., & Cai, K.-Y. 
2023, 10(2): 2249-2269. Adaptive multivariable control for multiple 
https://doi.org/10.1007/s40747-023-01265-3 resource allocation of service-based systems in 
[30] Petrovska, I. & Kuchuk, H. Adaptive resource cloud computing. IEEE Access, 2019, 7: 13817-
allocation method for data processing and security 13831. 
in cloud environment. Advanced Information https://doi.org/10.1109/access.2019.2894188 
Systems, 2023, 7(3): 67-73.  
https://doi.org/10.20998/2522-9052.2023.3.10  
[31] Al-Mugern, R., Othman, S.H., & Al-Dhaqm, A. An  
improved machine learning method by applying  
cloud forensic meta-model to enhance the data  
collection process in cloud environments.  
Engineering, Technology & Applied Science  
Research, 2024, 14(1): 13017-13025.  
https://doi.org/10.48084/etasr.6609  
[32] Nawrocki, P., Grzywacz, M., & Sniezynski, B.  
Adaptive resource planning for cloud-based  
services using machine learning. Journal of Parallel  
and Distributed Computing, 2021, 152: 88-97.  
https://doi.org/10.1016/j.jpdc.2021.02.018  
[33] Ivan, S.C., Győrödi, R.Ş., & Győrödi, C.A.  
Sentiment analysis using Amazon web services  
and Microsoft Azure. Big Data and Cognitive  
Computing, 2024, 8(12): 166.  
https://doi.org/10.3390/bdcc8120166  
[34] Narayanan, P.K. Engineering data pipelines using  
Microsoft Azure. In P.K. Narayanan (Ed.), Data  
Engineering for Machine Learning Pipelines,  
2024, (pp. 571-616). Berkeley: Apress.  
https://doi.org/10.1007/979-8-8688-0602-5_17  
[35] Borra, P. Microsoft Azure networking:  
Empowering cloud connectivity and security.  
International Journal of Advanced Research in  
Science, Communication and Technology, 2024,  
4(3): 469-475. https://doi.org/10.48175/ijarsct-  
18949  
344   Informatica 49 (2025) 333–344                                                                                                                       P. Kudrynskyi et al. 
 
 
 
 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.9433 Informatica 49 (2025) 345–360 345 
 
Deep Neural Network Architecture Optimization for Edge 
Computing Based on Evolutionary Algorithms 
 
 
Li Wang 1, Xiuming Cheng 2, * 
1School of Information and Electronics Engineering, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 
221116, Jiangsu, China 
2School of General Courses, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221116, Jiangsu, China 
E-mail: xiuming_cheng@hotmail.com 
*Corresponding author 
 
Keywords: vehicular edge computing (VEC), edge server placement, network condition adaptation, synergistic 
fibroblast optimized efficient deep neural network (SFO-Eff-DNN) 
Received: May 28, 2025  
Vehicular Edge Computing (VEC) is a crucial component of Intelligent Transportation Systems (ITS), 
enabling low-latency and energy-efficient services by offloading computation to the network edge. 
However, optimizing system performance in such environments requires careful edge server placement, 
especially in dynamic vehicular contexts characterized by high mobility and unpredictability. Achieving 
optimal performance under the constraints of latency, energy consumption, and mobility remains a 
significant challenge. This research proposes a comprehensive framework for optimizing deep learning 
architectures in VEC, utilizing advanced evolutionary algorithms. Building on real-world vehicular 
mobility traces, the framework employs the Synergistic Fibroblast Optimized Efficient Deep Neural 
Network (SFO-Eff-DNN) to identify optimal configurations and edge server placements. The dataset 
includes details about task offloading under different mobility levels, the data was preprocessed using 
Min-Max normalization to ensure smooth learning. Among the algorithms evaluated, Synergistic 
Fibroblast Optimization (SFO) consistently produces well-distributed Pareto-optimal solutions and 
effectively handles trade-offs between competing objectives. The DNN is utilized to learn complex patterns 
in vehicular mobility and network conditions, which helps predict the best configurations for edge server 
placements. The proposed system efficiently minimizes latency and energy consumption while ensuring 
scalability and adaptability to real-world scenarios. Results demonstrate that SFO-Eff-DNN achieves 
superior convergence speed and energy efficiency, making it well-suited for time-sensitive deployments. 
Comparative simulations validate that this approach outperforms traditional methods, providing valuable 
insights for deploying efficient and robust edge intelligence architectures in next-generation intelligent 
transportation systems. 
Povzetek: Ta raziskava se osredotoča na področje robnega računalništva v vozilih (VEC), kar je ključno 
za zagotavljanje nizke zakasnitve v inteligentnih transportnih sistemih. Vsebina prispevka predstavlja 
hibridni okvir SFO-Eff-DNN, ki združuje globoko učenje in evolucijsko optimizacijo za reševanje 
kompleksnega problema postavitve robnih strežnikov in prilagajanja arhitekture nevronske mreže. Glavni 
dosežki vključujejo rešitev večciljne optimizacijske naloge, ki uspešno minimizira zakasnitev in porabo 
energije v dinamičnem voznem okolju. 
 
1  Introduction  et al., 2020). The exponential growth in ITS has resulted in 
an increased demand for responsive, energy-efficient, and 
An ITS enhances the safety of moving vehicles and intelligent processing solutions that can manage the 
hikers within the vicinity. In recent times, problems dynamic vehicular environment (Elassy et al., 2024). VEC 
regarding road traffic safety have increased and accidents is a pattern that brings the cloud computing capacities 
continue to occur regularly (Wan et al., 2020). Fortunately, closer to the network edge and is a likely solution to 
a growing number of related technologies have been service demands for low-latency services, such as auto-
applied to the transportation industry as wireless corrective driving support, real-time traffic management, 
communication and sensor technologies have developed and location-based services (Alhilal et al., 2024). 
and matured in recent years. The increased need for road Connected vehicles benefit from VEC by shortening the 
efficiency and safety in intricately linked road systems has response time of their systems and helping them save 
drawn a lot of attention to ITS in recent years (Boukerche power by assigning tasks to local servers (Chougule et al., 
346   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
2024). Greater safety, dependability, efficiency in inability to handle the dynamism of vehicular mobility 
transportation, fast action and network reach make smart effectively in the process of optimizing latency and energy 
and sustainable driving networks possible (Talpur and consumption. 
Gurusamy, 2021). To minimize the time for data exchanges  
and energy used in vehicles, VEC allows vehicles to 2  Related work 
perform certain tasks on edge servers nearby. As a result, 
This section discusses the positioning of border 
connected vehicles receive a much better level of service 
servers within the VEC, including the traditional heuristics, 
(Zaki et al., 2024). Due to their speed and patches of 
deep learning (DL), evolutionary algorithms, the 
unpredictability, the movement of vehicles complicates 
challenges in dynamic vehicular environments, and the 
VEC systems (Zhao et al., 2023). The greatest aspect to 
recent data-driven and optimization-based developments 
focus on is the best locations and times for edge servers so 
of this space for better adaptability and performance. To 
that moving vehicles can be handled efficiently (Shen et 
fix the issue of resource assignment in cloud computing 
al., 2021). With many vehicles moving, topology shifts 
Infrastructure as a Service (IaaS), an Equilibrium 
taking place and numerous demands for services, generic 
Optimization (EO)-based evolutionary Recurrent Neural 
or manual placements are not usually enough. Similarly, 
Network (RNN) was presented (Ebrahimi Mood et al., 
managing various goals, including keeping reaction times 
2025). This model was designed to give virtual machines 
quick, using as little energy as possible, maintaining 
an optimal number of physical machines by improving 
flexibility, and scaling up, remained prominent in network 
how they work in general and by reducing their complexity. 
research (Peyman et al., 2023). As simulation traces were 
The simulations were faster and more reliable than the 
used, working with many nodes and requiring some 
conventional ones. 
attention to used parameters, this approach might face 
The significance of edge computing topics such as 
issues when put to practical use. 
selecting the right tasks for offloading, allocating resources, 
Deep learning and evolutionary optimization are used 
and ensuring good Quality-of-Service and Quality-of-
in the design to choose the best locations for edge servers. 
Experience (Vijayakumar et al., 2021). The challenges in 
Specifically, the SFO-Eff-DNN approach allows the 
optimizing and scheduling were solved with models and 
system to recognize patterns using a DNN and search 
DL techniques based on evolution. This approach helps to 
globally using an SFO algorithm. This framework 
make better decisions and effectively manage resources in 
processes actual data from vehicle movement to 
environments at the edges of a network. Yang et al., (2021) 
understand vehicle movements and the state of the network, 
introduced a method that can manage both accuracy and 
as well as select the best position for the servers. The key 
the speed of neural networks on edge devices. An estimate 
contribution of the research is as follows.  
of resource use latency created from the profiling model 
In extremely dynamic vehicle contexts, it was best to 
and the Pareto Bayesian search was driven by constraints 
formulate the edge server placement problem as a multi-
on accuracy and latency. Without sacrificing accuracy, the 
objective optimization task that simultaneously reduces 
inference process was 94.71% faster and the search 
the latency and energy consumption. 
process became 18.18% more efficient. 
To create the SFO-Eff-DNN framework, which 
An energy-efficient DNN offloading was developed 
combines biologically inspired optimization with effective 
under deadline and budget constraints in edge-cloud 
deep learning to deliver scalable and flexible placement 
environments; this optimization modeling was performed 
solutions. 
using an Enabled Hybrid Chaotic Evolutionary Algorithm 
To compare the system against traditional techniques 
Dynamic Voltage Frequency Scaling (HCEA-DVFS) (Li et 
and perform comprehensive simulations using genuine 
al., 2024). The Archimedes Optimization and Simulated 
mobility datasets, showcasing notable advances in 
Annealing were applied for global exploration, and local 
placement accuracy, energy economy, and convergence 
search improvement based on the Genetic Algorithm (GA) 
speed. 
chaotic strategy. Experiments proved that HCEA-DVFS 
The remainder of this research is separated into the 
decreased energy consumption by 7.93% to 19.38% 
following sections: the literature review on edge server 
relative to baseline techniques on a variety of DNN-based 
placement and the intelligent optimization techniques in 
apps. A suitable deep learning model and a proper method 
VEC are reviewed. The phrasing of the problem and the 
for training the effective training scheme for the deep 
system model are then given in detail, as well as the 
neural network (ETS-DNN) were created to allow real-
description of the proposed SFO-Eff-DNN framework. 
time monitoring in an Internet of Medical Things (IoMT) 
The next section will discuss the experimental settings and 
system that used edge computing (Pustokhina et al., 2020). 
performance evaluations, and the results and insights will 
Optimization of the neural network with autoencoders and 
then be discussed. Lastly, the research is concluded with 
softmax layers was achieved by using a Hybrid Modified 
directions for further research. 
Water Wave Optimization (HMWWO) algorithm. 
The introduction highlights the significant importance 
Examination of simulation results indicated that ETS-
of edge servers' placement efficiency in the VEC for 
DNN performed better when processing prompts and 
improving ITS performance. The literature review reveals 
making accurate diagnoses. Table 1 demonstrates the 
the weaknesses of existing methods, especially their 
summary of the literature review.   
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 347 
 
Table 1: Related work VEC optimization methods and outcomes 
Methods Aim Outcome Challenge Author/Ref. 
DeepMaker Automatically design Achieved up to 26.4x compression on Designing efficient DNNs (Loni et al., 2020) 
Framework robust DNN architectures CIFAR-10 with only 4% accuracy that fit resource 
(Multi- for embedded devices loss; optimized network size and constraints while 
objective accuracy for limited resources maintaining accuracy 
Evolutionary 
Approach) 
Internet of To detect cyberattacks in Achieved higher accuracy, superior Addressing IoT security (Saheed et al., 2024) 
Things (IoT)- IoT networks using an detection rate, greater precision, false with limited resources, 
Defender efficient, lightweight alarm rate, mIoU, and training time class imbalance, and low 
(Modified edge-based IDS on BoT-IoT dataset; effective real- hardware security in edge 
GA)/ Deep time deployment on Raspberry Pi computing environments 
long-short- devices 
term memory 
(LSTM) 
Genetic To reduce latency and Achieved lower energy consumption Balancing limited (Bi et al., 2020)  
Simulated energy usage in smart and faster convergence compared to resources of SMDs with 
Annealing- mobile devices by three baseline methods using real-life high communication costs 
based Particle partially offloading data; provided joint optimization of and maintaining energy-
Swarm offloading ratio, bandwidth, and efficient service 
Optimization transmission power allocation 
(GSP) 
Greedy Optimizing task Achieved near-optimal scheduling Reducing excessive (Chen et al., 2020) 
Algorithm scheduling in cloud-edge performance with reduced average delays during DNN task 
and GA for systems to reduce the response time; GA outperformed offloading to enhance the 
Task average response time of greedy in accuracy but required more vehicle experience 
Scheduling DNN-based apps computation time. 
Particle to efficiently and quickly Reduced MEC server delay, balanced Designing a low-delay (You et al., 2021) 
Swarm transfer activities from energy consumption, and enabled and energy-efficient 
Optimization resource-constrained edge effective resource allocation offloading technique in a 
(PSO) devices to MEC servers in compared to GA and SA methods system with several 
IIoT contexts vehicles and MECs 
Differential To maximize IoT edge Outperformed the Firefly Algorithm Clustering and scheduling (Yousif, et al., 2024) 
Evolution computing task clustering and PSO in reducing execution time tasks effectively in 
(DE) and scheduling and improving system efficiency and heterogeneous IoT edge 
stability under heavyweight environments 
workloads 
Greedy To minimize the worst- Achieved convergence and effective Heterogeneous (Xiao et al., 2021) 
Algorithm + case cost of FL in VEC by trade-off between cost and fairness capabilities and data 
Lagrangian optimizing computation, through dynamic vehicle selection quality among vehicles; 
Dual + transmission, and local and resource allocation optimization energy and time 
Adaptive model accuracy constraints in VEC 
Harmony 
Search in 
federated 
learning (FL) 
VECMAN To improve energy Achieved 7–18% energy savings vs. Uncertainty in future (Bahreini et al., 2021) 
(Resource efficiency in VEC local execution and ~13% vs. RSU vehicle locations; 
Selector + systems by managing offloading by selecting participating difficulty in determining 
Energy resource sharing among vehicles and optimizing sharing optimal resource sharing 
Manager EVs durations and energy management 
Algorithms) 
VaCo To enhance intelligent VaCo effectively utilizes vehicle real-time scheduling of (Jiang et al., 2025) 
(Vehicle- service deployment in resources, reducing the service failure vehicle storage; benefit 
assisted VEC by using vehicles' rate and cost. Real-world dataset evaluation under dynamic 
Collaborative storage for collaborative evaluation confirms its ability to load 
Caching caching balance benefits for all. 
System 
HSCoNAS Optimize DNN 
(Hardware- Achieved strong accuracy–latency High search overhead and 
architecture for accuracy 
aware trade-offs on ImageNet across CPU, runtime approximation (Luo et al., 2021) 
and latency on edge 
Evolutionary GPU, edge challenges 
NAS devices 
Framework) 
LENS Incorporate wireless Improved Pareto front performance 
Scalability issues and 
(Latency- communication into NAS by 76.47% (energy) and 75% (Odema et al., 2021) 
fixed-tier constraints 
aware NAS for hierarchical systems (latency) 
for Edge–
348   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
Cloud 
Systems) 
Federated 
Learning in Review implementation, Classified FL methods, hardware 
Synchronization delays, 
Edge taxonomy, and challenges constraints, and case studies; (Abreha et al., 2022) 
hardware resource limits 
Computing of FL in EC identified open issues 
(Survey) 
RL-Dynamic To optimize service Reduced delay and improved edge Model complexity and (Talpur and Gurusamy, 
(Reinforcement placement in vehicular server utilization compared to static vehicle mobility 2021) 
Learning networks by considering placement; fairness trade-offs unpredictability 
Framework) mobility and dynamic demonstrated 
service demands 
 
2.1  Problem statement  individuals move around and the network evolves, it is 
Optimizing resources and edge server placement in important to find these servers with practical jobs and 
VEC as a result of high mobility, variable networks, and make sure they supply energy. The problem is solved by 
few resources was hard. Usually, greedy algorithms and optimizing multiple objectives, with the main variables 
other traditional methods do not work well in being the location of servers and the way vehicles connect 
environments that change dynamically (Chen et al., 2020). to them throughout the day. 
PSO faces the issues of early convergence and fixation  
when working with multiple vehicles (You et al., 2021). A) Architectural components 
DE was not suitable for clustering tasks in real time on 
heterogeneous edge systems due to its issues with The architecture of the VEC system consists of three 
scalability and computation (Yousif et al., 2024). main layers, such cloud, VEC, and vehicle, Cloud storage 
Therefore, the proposed framework SFO-Eff-DNN was allows for convenient processing and provides a backup 
used to learn how devices move and decide on offloading. system. Figure 1 illustrates the architecture of VEC. The 
It minimizes delays and uses less power, all while offering VEC layer includes a network of Roadside Units (RSUs) 
adaptability, scalability, and fast convergence in changing with edge servers, allowing local computing and rapid 
VEC networks. exchanges of data. Intelligent vehicles make up the vehicle, 
 layer and handle task generation and offloading depending 
on the current network and mobility issues. Environmental 
3  Methods  
sensors like Global Positioning System (GPS) and cameras 
3.1 Architectural overview and problem in vehicles provide live data that is key for improved traffic 
formulation management and safety. They enable Vehicle-to-Vehicle 
The VEC would feature wireless connection, (V2V) and Vehicle-to-RSU (V2R) communication and 
permanent edge servers, and mobility vehicles. The were able to process or offload tasks according to resource 
simulation's rise can be increased by using vehicles to availability. Vehicles also allow for caching of data in 
carry out new missions on surrounding servers. As memory, which makes the system work more responsively. 
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 349 
 
 
Figure 1: The architecture of the VEC 
Vehicle Definition: The vehicle 𝑉  defined as a six- Edge Server: An edge server 𝐹 is defined as a three-
tuple is expressed in equation (1). tuple in equation (3).  
𝑽 = {𝑽𝒋𝒅, 𝑽𝒔𝒕, 𝒗𝑲, 𝑮, 𝑱[ 𝒓]} (1) 𝑭 = {𝑭𝒋𝒄, 𝑫, 𝑲}         (3) 
Each vehicle 𝑉 is identified by its 𝑉𝑗𝑑, can be activated The edge server is identified by a unique ID (𝐹𝑗𝑐) and 
or deactivated (𝑉𝑠𝑡), has a type of task (𝑣𝑗,), is located by characterized by its computational capacity (𝐷 ), which 
Simulation of Urban Mobility (SUMO’s) data 𝐾 = includes memory, processing speed, and storage modeled 
{𝑘 similarly to vehicle hardware specifications. Its 
𝑤 , 𝑘𝑧, 𝑘𝑦,𝑠𝑡}, is equipped with certain hardware (𝐺), and 
geographical location (𝐾) is also a key attribute for optimal 
is running several active instances of applications 𝐽[ 𝑟]. 
placement within the VEC network. 
Vehicle Hardware Specifications and Role of RSU: a 
Properties of edge servers in VEC  
vehicle’s hardware specifications 𝐺are represented as a set 
Dynamic vehicle assignment: Vehicle assignments to 
in equation (2).  
clusters at any time 𝑠 were independent of previous 
𝑮 = {𝑶,𝑵[ 𝒓], 𝑨, 𝑻, 𝒅, 𝒆}         (2) assignments, allowing the system to adapt in real-time to 
the high mobility and changing network topology of 
Each vehicle’s hardware profile 𝐺 includes processor 
vehicular environments.  
specs ( 𝑂 ), memory configuration 𝑁[ 𝑟]  distinguishing 
Dedicated edge server assignment: Each vehicular 
central processing unit (CPU)/Graphics processing unit 
cluster was mapped indirectly to a single edge server, 
(GPU usage, battery capacity (𝐴 ), installed sensors (𝑇 ), 
ensuring exclusive service per cluster. This approach 
communication interfaces (𝑑) such as Wi-Fi, Long Term 
minimizes resource conflicts and supports the demanding 
Evolution (LTE), or 5G New Radio (NR), and 
performance requirements of VEC applications.  
communication frequency range (𝑒 ). These parameters 
Many-to-one vehicle-to-server mapping: Multiple 
influence the vehicle’s ability to process or offload 
vehicles can offload computational tasks to the same edge 
computational tasks. 
server, enabling efficient resource utilization and 
RSUs were placed along roadways that help to process 
centralized task processing within the VEC framework. 
and store data close to the network. RSUs were better at 
Data from edge servers is uploaded to remote data 
processing and managing data than vehicles and at storing 
centers, known as cloud servers, which supply large 
and communicating with the internet whenever necessary. 
amounts of computing and storage services over a large 
It provides quick answers to requests in maps, and videos, 
area. Using information from vehicles and edge servers, 
and controls traffic while edge servers rely on them. 
350   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
cloud services can manage the network from one central 1) Decision Variables: 
place and take the best actions. The combination of To model the edge server placement in the VEC 
vehicular terminals, edge servers, and cloud infrastructure network, define decision variables that indicate whether an 
makes the VEC system both strong and capable of edge server is deployed at a specific location and how 
handling the needs of intelligent transportation vehicles were assigned to these servers for optimal 
management. performance. 𝐴𝑣𝑓  is a binary decision variable that 
With the architectural components established, the indicates the connection status between vehicle 𝑣 and edge 
server placement strategy in the proposed VEC framework server 𝑒 in equation (6). 
can now be formally defined to optimize performance 
under dynamic vehicular conditions. 𝐴𝑣𝑓     =
Edge server placement: In the VEC model, the 
𝟏, 𝒊𝒇 𝒗𝒆𝒉𝒊𝒄𝒍𝒆 𝒖 𝒊𝒔 𝒄𝒐𝒏𝒏𝒆𝒄𝒕𝒆𝒅 𝒕𝒐 𝒆𝒅𝒈𝒆 𝒔𝒆𝒓𝒗𝒆𝒓 𝒇
placement of edge servers was modeled by a bipartite {
graph with two sets: 𝐹is for edge servers, while 𝑉 is for 𝟎, 𝒐𝒕𝒉𝒆𝒓𝒘𝒊𝒔𝒆                                                                      
client vehicles. Each server 𝑓 ∈ 𝐹 comes with a defined (6) 
𝑊𝑚𝑎𝑥   , showing its maximum vehicle capacity. 
𝑓
Communication cost indicates how well a vehicle v works 𝐴𝑣𝑓   is a binary decision variable indicating the 
with a server e due to the effects of latency 𝐾𝑉𝑓 and energy deployment status of an edge server at location  𝑗  in 
consumption 𝐹𝑉𝑓 . The objective is to determine a good equation (7).  
subset 𝐹1   out of 𝐹 and describe the mapping𝜙: 𝑉 → 𝐹1 :, 
assigning each vehicle to a server to minimize both the 𝐴𝑓𝑗     =
total delay and the power used across the system. 1, 𝑖𝑓 𝑎𝑛 𝑒𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟 𝑖𝑠 𝑝𝑙𝑎𝑐𝑒𝑑 𝑎𝑡 𝑙𝑜𝑐𝑎𝑡𝑖𝑜𝑛 𝑗
Average latency:𝐾ˉis used to mean the average time {
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒                                                                             
taken for vehicles to communicate with edge servers while (7) 
offloading their tasks. It helps to measure the effectiveness 
of server placement and matching vehicles to servers in the 
VEC framework under changing mobility conditions. It is 2)  Parameters 
computed as in equation (4). The parameters in the formulation define the system 
𝟏 characteristics essential for optimizing edge server 
𝑲 = ∑ 𝑲    (4)
|𝑽| 𝒗∈𝑽 𝒗𝒇  placement in the VEC network. The energy consumption 
for a vehicle 𝑣 to offload computational tasks to an edge 
The ∣ 𝑉 ∣ denotes the total number of vehicles within server e is denoted as 𝐹𝑣𝑓. In equation (8).  
the VEC network. 𝐾𝑣𝑓 represents the communication 
latency encountered by vehicle 𝑣 during task offloading to 𝐹𝑢𝑓 = (𝑂𝑠𝑤 + 𝑂𝑞𝑤). 𝑆𝑐𝑜𝑚𝑚  (8) 
edge server 𝑓, defined as equation (5). 
Where 𝑂
𝑲 𝑠𝑤is the vehicle’s transmission power, 𝑂𝑞𝑤is 
𝒗𝒇   = 𝑺𝒓𝒆𝒄𝒆𝒊𝒗𝒆 − 𝑺𝒔𝒆𝒏𝒅  (5) 
the reception power, and 𝑆𝑐𝑜𝑚𝑚 is the time taken for the 
communication exchange. This metric helps quantify 
In this context, 𝑆𝑠𝑒𝑛𝑑indicates the timestamp when a energy efficiency in task offloading scenarios within the 
vehicle initiates the task offloading request, while 𝑆𝑟𝑒𝑐𝑒𝑖𝑣𝑒 VEC environment. The latency experienced by a vehicle 
marks the moment the vehicle receives the processed 𝑣 when offloading tasks to an edge server e is denoted as 
response from the edge server. 𝐾𝑣𝑓 . Equation (9) defines it as the interval of time between 
The goal of the edge server placement was to the sending of the offloading request and the receiving of 
minimize the average latency 𝐾ˉ, ensuring efficient, low- the processed response. 
latency communication for all vehicles within the network. 
 𝑲𝒗𝒇 = 𝑹𝒆𝒄𝒆𝒊𝒗𝒆 𝑻𝒊𝒎𝒆 − 𝑺𝒆𝒏𝒅 𝑻𝒊𝒎𝒆    (9) 
B) Model formulation 
The edge server placement issue in a VEC network is In the VEC environment, key parameters include 𝑂𝑓, 
defined in this section to minimize overall energy usage the active power consumption of edge server  𝑓; 𝑐𝑣𝑓, the 
and delay through optimal edge server placement. The distance between vehicle  𝑣  and edge server 𝑓 ; 𝐷 , the 
decision variables, objective functions, and constraints maximum number of servers on the edge deployable in the 
involved in the problem formulation are detailed below. network; and  𝐶𝑎𝑝𝑎𝑐𝑖𝑡𝑦𝑖  , the maximum number of 
Consider a fixed of vehicles 𝑉 =  {𝑣₁, 𝑣₂. . . 𝑣ₘ}, a set vehicles that a server on the edge  𝑖  can handle. These 
of edge servers 𝐹 = {𝑓1,𝑓2,…,𝑓𝑛} , anda list of possible factors guide optimal server placement. 
deployment sites 𝐽 = {𝑗1,𝑗2,…,𝑗𝑛} for placing edge servers 3) Objective Function  
within the network. To minimize overall energy consumption and reduce 
total latency in the VEC network. 
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 351 
 
Minimize Total Energy Consumption: Total energy ∑𝑚 ∑𝑚𝑓=1 𝑗=1𝐴𝑓𝑗 ≤ 𝐷    (14) 
consumption includes the energy used by vehicles to 
offload tasks (𝐹𝑣𝑓) and the power consumed by active edge 
Binary Constraints: The decision variables 𝐴𝑣𝑓  and 
servers (𝑂𝑓 ). The objective is to minimize the sum of 
𝐴
vehicle offloading energy and edge server power across the 𝑓𝑗 are binary, reflecting the discrete nature of the problem. 
network in equation (10). Specifically, a vehicle 𝑣  is either connected to an edge 
server 𝑓 or not, and an edge server is either deployed at 
location 𝑗 or not. These binary constraints ensure clear and 
𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒      ∑𝑁 ∑𝑀𝑣=1 𝑓=1𝐴𝑣𝑓𝐹𝑣𝑓 + unambiguous decision-making in the edge server 
  ∑𝑚 ∑𝑚 placement and vehicle assignment process within the VEC 
𝑓=1 𝑗=1𝐴𝑓𝑗𝑜𝑓   (10) 
network in equations (15) and (16).  
Where 𝑜𝑓 and 𝐴𝑓𝑗 indicates vehicle-to-server 
𝐴
connections and server placements, respectively. 𝑢𝑓 ∈ {0,1}∀𝑢,𝑓   (15) 
Minimize Total Cumulative Latency: To reduce the 
overall communication delay experienced by vehicles 𝐴𝑓𝑗 ∈ {0,1}∀𝑓,𝑗   (16) 
when offloading tasks to edge servers. This total latency is 
calculated as the sum of the individual latencies  𝐾𝑣𝑓 , 
defined by the time difference between sending the task 3.2  Dataset 
and receiving the response expressed in the following For a Vehicular Edge Computing scenario, this 5,811-
equation (11). record task offloading event dataset is used to validate the 
effectiveness of the proposed SFO-Eff-DNN system.   This 
𝑴𝒊𝒏𝒊𝒎𝒊𝒛𝒆      ∑𝒎 ∑𝒏𝒇=𝟏 𝒖=𝟏𝑨𝒗𝒇𝑲𝒗𝒇    (11) dataset includes information on task arrival/completion 
time, processing time, network latency, energy 
consumption, and vehicle node mobility.  The model can 
Where 𝐴𝑣𝑓 indicates if vehicle 𝑣 offloads to server 𝑓, 
learn intricate mobility and network behaviors because to 
and 𝐾𝑣𝑓 is the latency between them. 
this dataset's capture of dynamic, real-world vehicle 
4) Constraints 
settings.  This is in line with the framework's goal of 
The optimization problem includes constraints to 
guarantee efficient deployment of edge servers and proper optimizing edge server placements and deep neural 
assignment of vehicles, ensuring that server capacities network settings.  It encourages scalability, responsiveness, 
were not exceeded and system resources were utilized and efficiency for real-time VEC and smart mobility by 
effectively. facilitating an equitable examination of latency vs. energy 
Server Capacity Constraint: Each edge server has a trade-offs. 
limited capacity, restricting the number of vehicles it can 
serve. The total vehicles assigned to server 𝑓  must not Source:  
exceed its capacity𝐷𝑓, ensuring balanced load distribution https://www.kaggle.com/datasets/programmer3/vec-edge-
and preventing server overload in equation (12).  server-offloading-dataset    
∑𝑴𝒖=𝟏𝑨𝒗𝒇   ≤ 𝑫𝒇 ∀𝒇   (12) 3.3  Preprocessing Using Min-Max 
Normalization 
Vehicle Assignment Constraint: To ensure proper task To create an energy-efficient optimum structure of a deep 
offloading, each vehicle must be assigned to exactly one neural network for real-time VEC activities with enhanced 
edge server. This guarantees that every vehicle connects to energy economy, reduced latency, and scalable 
a single server for processing its tasks, expressed as performance, min-max normalization is applied in the 
preprocessing stage.  The model's convergence is 
equation (13). 
enhanced and a uniformly distributed collection of features 
is made possible for efficient decision-making for real-
∑𝐹𝑓=1𝐴𝑣𝑓   = 1    ∀𝜈   (13) time VEC operations by normalizing the input parameters 
of delay, energy consumption, and vehicle speed between 
0 and 1.The value of property 𝐵  is normalized from 
Restrictions on Edge Server Positioning: The 
[𝑚𝑖𝑛
deployment of edge servers within the network is restricted 𝐵 , 𝑚𝑎𝑥𝐵]  to [𝑛𝑒𝑤𝑚𝑖𝑛 , 𝑛𝑒𝑤𝑚𝑎𝑥 ] using equation 
𝐵 𝐵
by a maximum allowable number, denoted by  𝐷 . This (17), which maximizes data representation: 
constraint confirms that the total quantity of placed edge 𝑢−𝑚𝑖𝑛𝐵
servers does not exceed 𝐷, and is formulated as equation (𝑛𝑒𝑤
𝑚𝑎𝑥 𝑚𝑖𝑛 , 𝑛𝑒𝑤𝐵 𝑚𝑎𝑥 ) + 𝑛𝑒𝑤𝐵 𝑚𝑖𝑛  
𝐵
𝐵−𝑚𝑖𝑛𝐵
(14).      (17) 
352   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
In addition to enhancing prediction reliability and while SFO acts like fibroblast cells in real healing to search 
preserving a consistent data distribution, this through lots of different solutions quickly. SFO helps set 
normalization facilitates effective implementation in real- up Eff-DNN weights, biases, and learning rates to ensure 
time automotive applications. good latency, energy consumption, and ability to scale up 
or down. As a result of hybridization, the system evades 
3.4  Synergistic fibroblast optimized efficient local optima and gradually finds the best solution. The use 
deep neural network (SFO-Eff-DNN) of real-world data for vehicles confirms that the SFO-Eff-
The research to improve the DL architecture, the SFO- DNN framework can quickly converge, lower the time 
Eff-DNN suggests a hybrid intelligence framework with needed for inference, and help with making energy-
edge servers situated in VEC. It relies on the predictive efficient decisions in rapidly changing VEC environments. 
power of Eff-DNN and integrates the ability of the SFO Algorithm 1 represents the proposed SFO-Eff-DNN model 
algorithm to adjust itself. Eff-DNN is used to figure out working process. 
how vehicles move around and how the network changes, 
Algorithm 1: SFO-Eff-DNN 
𝑆𝑡𝑒𝑝 1: 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑎𝑡𝑖𝑜𝑛  
𝑑𝑒𝑓 𝑠𝑒𝑡𝑢𝑝(): 
    𝑀 =  30                     # 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑠𝑖𝑧𝑒 
    𝑁 =  𝑛𝑢𝑚_𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟𝑠()       # 𝑇𝑜𝑡𝑎𝑙 𝐸𝑓𝑓 − 𝐷𝑁𝑁 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟𝑠 
    𝑚𝑎𝑥_𝑖𝑡𝑒𝑟, 𝑟ℎ𝑜, 𝑡𝑎𝑢 =  100, 0.5, 5 
    𝑠, 𝑘_𝑝𝑞, 𝐿 =  1.0, 0.8, 10.0 
    𝑑𝑎𝑡𝑎 =  𝑙𝑜𝑎𝑑_𝑉𝐸𝐶_𝑑𝑎𝑡𝑎()     # 𝑅𝑒𝑎𝑙 − 𝑤𝑜𝑟𝑙𝑑 𝑚𝑜𝑏𝑖𝑙𝑖𝑡𝑦/𝑛𝑒𝑡𝑤𝑜𝑟𝑘 𝑑𝑎𝑡𝑎 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑀,𝑁,𝑚𝑎𝑥_𝑖𝑡𝑒𝑟, 𝑟ℎ𝑜, 𝑡𝑎𝑢, 𝑠, 𝑘_𝑝𝑞, 𝐿, 𝑑𝑎𝑡𝑎 
𝑆𝑡𝑒𝑝 2: 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒 𝑡ℎ𝑒 𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑠𝑜𝑙𝑢𝑡𝑖𝑜𝑛𝑠 
𝑑𝑒𝑓 𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁): 
    𝑟𝑒𝑡𝑢𝑟𝑛 [{′𝑝𝑎𝑟𝑎𝑚𝑠′: 𝑟𝑎𝑛𝑑_𝑣𝑒𝑐(𝑁), ′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′: 𝑟𝑎𝑛𝑑_𝑣𝑒𝑐(𝑁)} 𝑓𝑜𝑟 _ 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑀)] 
𝑆𝑡𝑒𝑝 3: 𝑇𝑟𝑎𝑖𝑛 𝑎𝑛𝑑 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒 𝑡ℎ𝑒 𝐸𝑓𝑓 − 𝐷𝑁𝑁 𝑚𝑜𝑑𝑒𝑙  
𝑑𝑒𝑓 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑝𝑎𝑟𝑎𝑚𝑠, 𝑑𝑎𝑡𝑎): 
    𝑚𝑜𝑑𝑒𝑙 =  𝑏𝑢𝑖𝑙𝑑_𝐸𝑓𝑓𝐷𝑁𝑁(𝑝𝑎𝑟𝑎𝑚𝑠) 
    𝑡𝑟𝑎𝑖𝑛_𝐷𝑁𝑁(𝑚𝑜𝑑𝑒𝑙,∗ 𝑑𝑎𝑡𝑎) 
    𝑙𝑎𝑡𝑒𝑛𝑐𝑦, 𝑒𝑛𝑒𝑟𝑔𝑦 =  𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒_𝑙𝑎𝑡𝑒𝑛𝑐𝑦_𝑒𝑛𝑒𝑟𝑔𝑦(𝑚𝑜𝑑𝑒𝑙) 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑙𝑎𝑡𝑒𝑛𝑐𝑦 +  𝑒𝑛𝑒𝑟𝑔𝑦  # 𝑆𝑖𝑚𝑝𝑙𝑒 𝑓𝑖𝑡𝑛𝑒𝑠𝑠 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 (𝑙𝑜𝑤𝑒𝑟 𝑖𝑠 𝑏𝑒𝑡𝑡𝑒𝑟) 
𝑆𝑡𝑒𝑝 4: 𝑉𝑒𝑙𝑜𝑐𝑖𝑡𝑦 𝑢𝑝𝑑𝑎𝑡𝑒 𝑤𝑖𝑡ℎ 𝑓𝑒𝑒𝑑𝑏𝑎𝑐𝑘 𝑎𝑛𝑑 𝑙𝑜𝑐𝑎𝑙 𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑖𝑜𝑛  
𝑑𝑒𝑓 𝑢𝑝𝑑𝑎𝑡𝑒_𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦(𝑖𝑛𝑑, 𝑝𝑎𝑠𝑡_𝑝𝑜𝑠, 𝑟ℎ𝑜): 
    𝑐 =  𝑙𝑜𝑐𝑎𝑙_𝑐𝑜𝑟𝑟𝑒𝑐𝑡𝑖𝑜𝑛(𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 
    𝑑 =  𝑣𝑒𝑐𝑡𝑜𝑟_𝑑𝑖𝑣(𝑝𝑎𝑠𝑡_𝑝𝑜𝑠, 𝑛𝑜𝑟𝑚(𝑝𝑎𝑠𝑡_𝑝𝑜𝑠)) 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑖𝑛𝑑[′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′]  +  (1 −  𝑟ℎ𝑜)  ∗  𝑐 +  𝑟ℎ𝑜 ∗  𝑑 
𝑆𝑡𝑒𝑝 5: 𝑃𝑜𝑠𝑖𝑡𝑖𝑜𝑛 𝑢𝑝𝑑𝑎𝑡𝑒  
𝑑𝑒𝑓 𝑢𝑝𝑑𝑎𝑡𝑒_𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛(𝑖𝑛𝑑, 𝑣𝑒𝑙, 𝑠, 𝑘_𝑝𝑞, 𝐿): 
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 353 
 
    𝑠𝑝𝑒𝑒𝑑 =  𝑠 / (𝑘_𝑝𝑞 ∗  𝐿) 
    𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛 =  𝑣𝑒𝑐𝑡𝑜𝑟_𝑑𝑖𝑣(𝑣𝑒𝑙, 𝑛𝑜𝑟𝑚(𝑣𝑒𝑙)) 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′]  +  𝑠𝑝𝑒𝑒𝑑 ∗  𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛 
𝑆𝑡𝑒𝑝 6: 𝑀𝑎𝑖𝑛 𝑆𝐹𝑂 − 𝐸𝑓𝑓𝐷𝑁𝑁 𝑂𝑝𝑡𝑖𝑚𝑖𝑧𝑎𝑡𝑖𝑜𝑛 
𝑑𝑒𝑓 𝑜𝑝𝑡𝑖𝑚𝑖𝑧𝑒_𝑆𝐹𝑂_𝐸𝑓𝑓𝐷𝑁𝑁(): 
    𝑀, 𝑁, 𝑇, 𝑟ℎ𝑜, 𝑡𝑎𝑢, 𝑠, 𝑘_𝑝𝑞, 𝐿, 𝑑𝑎𝑡𝑎 =  𝑠𝑒𝑡𝑢𝑝() 
    𝑝𝑜𝑝, ℎ𝑖𝑠𝑡𝑜𝑟𝑦 =  𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁), [] 
    𝑓𝑜𝑟 𝑡 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑇): 
        𝑓𝑜𝑟 𝑖𝑛𝑑 𝑖𝑛 𝑝𝑜𝑝: 
            𝑖𝑛𝑑[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]  =  𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′], 𝑑𝑎𝑡𝑎) 
        𝑝𝑎𝑠𝑡 =  𝑝𝑜𝑝 𝑖𝑓 𝑡 <  𝑡𝑎𝑢 𝑒𝑙𝑠𝑒 𝑝𝑜𝑝. 𝑐𝑜𝑝𝑦() 
        𝐹𝑜𝑟 𝑖, 𝑖𝑛𝑑 𝑖𝑛 𝑒𝑛𝑢𝑚𝑒𝑟𝑎𝑡𝑒(𝑝𝑜𝑝): 
            𝑖𝑛𝑑[′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′]  =  𝑢𝑝𝑑𝑎𝑡𝑒_𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦(𝑖𝑛𝑑, 𝑝𝑎𝑠𝑡[𝑖][′𝑝𝑎𝑟𝑎𝑚𝑠′], 𝑟ℎ𝑜) 
            𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′]  =  𝑢𝑝𝑑𝑎𝑡𝑒_𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛(𝑖𝑛𝑑, 𝑖𝑛𝑑[′𝑣𝑒𝑙𝑜𝑐𝑖𝑡𝑦′], 𝑠, 𝑘_𝑝𝑞, 𝐿) 
        𝑏𝑒𝑠𝑡 =  𝑚𝑖𝑛(𝑝𝑜𝑝, 𝑘𝑒𝑦 = 𝑙𝑎𝑚𝑏𝑑𝑎 𝑥: 𝑥[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 
        ℎ𝑖𝑠𝑡𝑜𝑟𝑦. 𝑎𝑝𝑝𝑒𝑛𝑑(𝑏𝑒𝑠𝑡[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑏𝑒𝑠𝑡, ℎ𝑖𝑠𝑡𝑜𝑟𝑦 
To improve performance in dynamic Vehicular Edge makes use of an Eff-DNN to represent how vehicles and 
Computing (VEC) settings, the SFO-Eff-DNN algorithm 1 networks interact. An Eff-DNN architecture has an input 
combines the strength of Efficient Deep Neural Networks layer, an output layer, and many hidden layers, as shown 
(Eff-DNN) with Synergistic Fibroblast Optimization in Figure 2. The network is set up with six input layers and 
(SFO), an optimization technique inspired by nature.  seven hidden layers, all containing 64 neurons to avoid 
Using actual traffic and network data, the algorithm overfitting. Model complexity and generalization were 
initializes a population of solutions, each of which managed in TensorFlow by setting them as 
represents a set of Eff-DNN parameters, and assesses each hyperparameters within the layers. It uses input about how 
according to latency and energy consumption.  The vehicles behave and interact to determine the best 
approach is perfect for real-time intelligent transportation positioning of the edge servers. The network uses the 
systems because it ensures quick convergence and Rectified Linear Unit (ReLU) function to make its 
improved flexibility by updating its location and velocity computations non-linear and adjust the weights it uses for 
depending on fitness input and historical experiences. learning through backpropagation. The Eff-DNN can 
provide quick and efficient decisions in ever-changing 
Efficient Deep Neural Network (Eff-DNN) The vehicular environments due to the backpropagation 
proposed optimized deep learning architecture in VEC process, which keeps the cost function low. Neuron 
outputs were computed as follows in equation (18).  
354   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
 
Figure 2: Architecture of Eff-DNN 
  
𝑧𝑚+1𝑟 = 𝜎(𝑦) = 𝜎(∑𝑛𝑗=1𝜔
𝑚
𝑗 𝑧𝑚 𝑚+
𝑗 + 𝑎 1
𝑟 )  (18) 𝑛𝑠 = 𝛽1𝑛𝑠−1 + (1 − 𝛽1)ℎ𝑠
𝑟   
𝑈
 𝑠 = 𝛽2𝑢𝑠−1 + (1 − 𝛽
2 
2)ℎ𝑠
 
Where σ(z) represents the activation function, and  ℎ𝑠 = 𝛻𝜃𝐹(𝜃𝑠−1) 
𝑧𝑚+1𝑟  is the output of the 𝑟 − 𝑡ℎneuron in the (𝑚 + 1)-th 𝑛
?̂?𝑠 =
𝑠
1−𝛽𝑠   (20) 
layer. The weights among the 𝑗 − 𝑡ℎ neuron of layer n and  1
𝑢
the 𝑟 − 𝑡ℎ neuron of layer (𝑚 + 1 ) are labeled 𝜔𝑚𝑗  , and 
𝑟  ?̂? 𝑠
𝑠 =
 1−𝛽𝑠
𝑎𝑚+1𝑟  represents the bias term for linear transformations. 2
 ?̂?
While training, the loss function compares the predicted 𝜃𝑠 = 𝜃 𝑠
{ 𝑠−1−∝
outcomes with the desired ones. The model finds the best √𝑢𝑠+𝜀
values for 𝜔 and 𝑎 by minimizing the loss, making the 
network predict more accurately. The Eff-DNN’s loss 𝑒𝑝𝑜𝑐ℎ−𝑛𝑢𝑚
𝑀
function is explained in equation (19).  ∝=∝ 𝑎
0 𝛽
𝑏 𝑡𝑐ℎ−𝑠𝑖𝑧𝑒
3    (21) 
1
𝑓(𝜃) = − ∑ ∑𝑟 𝑠 l g
𝑚 𝑚 𝑚𝑟 o 𝑧𝑚𝑟   (19) 
 Where 𝑈𝑠represents the weighted average of 
exponentially the squared gradients, while ℎ𝑠  denotes the 
gradient of the parameters at time  𝑠 , 𝑛𝑠captures the 
Where𝑠𝑚𝑟 represents the actual value of the 𝑟 − 𝑡ℎ 
average movement of the gradient, and ∝0 is the initial 
sample's 𝑚 − 𝑡ℎelement, 𝑧𝑚𝑟  denotes the predicted value 
learning rate. The corrected versions of these estimates 
for the same element, and θ represents the collection of 
were denoted by ?̂?𝑠and ?̂?𝑠, which improve optimization 
parameters including weights 𝜔  and biases 𝑎 . Here, 𝑀 is 
accuracy. Exponential decay rates 𝛽1 , 𝛽2,𝑎𝑛𝑑 𝛽3 are used 
the total quantity of samples. To reduce overfitting, a 
dropout mechanism is employed that randomly disables to stabilize updates. Additionally, parameters such as batch 
neurons during training, effectively disrupting the network size (epbatch−size) and current training iterations (ochnum
structure and promoting generalization. Furthermore, the ) influence convergence behavior. The improved DNN 
proposed method enhances the conventional gradient supports dual operational modes, RDL-1 for normal 
descent by dynamically adapting the learning rate for conditions and RDL-2 for power swing detection, ensuring 
improved convergence. The optimization of the parameter adaptive command generation aligned with dynamic 
set 𝜃 is formally defined as equations (20) and (21).  vehicular network scenarios. 
 
 
 
 
 
  
 
 
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 355 
 
 (𝑡+1)
(𝑡+1) (𝑡)
Synergistic fibroblast optimization (SFO)  𝑏 ∗ 𝑣𝑖
𝑖 = 𝑏𝑖 + 𝑠   (23) 
(𝑡+1)
||𝑣
SFO is modeled after migratory fibroblast cells that 𝑖 ||
heal tissue by responding to the extracellular matrix 
(ECM). Every solution searches the solution space by 𝑠
The movement speed 𝑡 is defined as 𝑠 = here 
varying its position and velocity about diffusion and fitness. 𝑘 ′, w
𝑝𝑞𝐿
This bio-inspired method allows for greater flexibility and "𝑘𝑝𝑞" represents the baseline movement rate and 𝐿 denotes 
avoids local minima, making it appropriate for optimizing the movement length. The SFO-Eff-DNN hybrid model 
neural networks and edge server placement in dynamic optimizes edge server placement in dynamic VEC 
VEC settings. environments by combining adaptive search with deep 
A model based on the adaptive actions of fibroblast learning. It efficiently predicts optimal configurations, 
cells used in repairing tissues. SFO works on tuning how improves convergence speed, and reduces latency and 
deep neural networks are set up and arranging edge servers energy use, making it ideal for real-time intelligent 
in dynamically changing virtual edge clouds. Much as transportation systems. 
fibroblasts respond to the extracellular matrix (ECM), SFO  
looks for solutions in many different ways. Ongoing 4  Results and discussion  
testing and evaluation of fitness ensure the best solutions The experimental setup uses an Intel i7 CPU. 
use both energy and time efficiently. For this reason, this Simulations were conducted in Python with TensorFlow 
approach ensures flexibility in the way transportation and the Veins platform using Cologne traffic traces. The 
systems are managed. dataset was split using an 80:20 ratio, where 80% was used 
The process of biomechanical analysis was for training the SFO-Eff-DNN model and 20% was 
strengthened each time by paying attention to interactions reserved for testing to evaluate performance and 
with the ECM. As it runs, the program tests different generalization.  
combinations of settings, much like fibroblasts, to improve The SFO-Eff-DNN model includes ReLU-activated 
its outcome. The simulated cells disperse and travel to the layers and dropout, optimized via SFO. Performance was 
most promising areas to avoid getting caught in local evaluated based on latency, energy use, and server 
minima. Depending on the speed and distribution of the placement accuracy. Key simulation parameters with 
particles, the algorithm updates its next action using the values aligned to realistic VEC scenarios are presented in 
information and trends it has gathered. As a result, the Table 2. 
process can handle the trade-offs between speed, Table 2: Key simulation parameters for the SFO-Eff-
performance, and movement better in VEC networks. 
DNN VEC Framework 
Initialization: Within the 𝑁 -dimensional solution 
𝑷𝒂𝒓𝒂𝒎𝒆𝒕𝒆𝒓 𝑽𝒂𝒍𝒖𝒆 
space, initialize a population of physical activity 
𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑎𝑟𝑒𝑎 1500 𝑚 ×  1500 𝑚 
movements 𝑓𝑖, , where𝑖 = 1,2, … ,𝑀, . Each movement is 
𝑆𝑖𝑚𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒  200𝑠, 300𝑠, 400𝑠 
assigned a random position ( ) and velocity (𝑣𝑖). Key 
parameters such as the diffusion coefficient 𝜌 and 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑒𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟𝑠8  
movement speed 𝑠are established.   𝑇𝑟𝑎𝑛𝑠𝑚𝑖𝑠𝑠𝑖𝑜𝑛 𝑝𝑜𝑤𝑒𝑟 25 𝑚𝑊, 30 𝑚𝑊, 35 𝑚𝑊 
Fitness Evaluation: For each candidate solution 𝑓 𝑅𝑆𝑈 𝑎𝑛𝑡𝑒𝑛𝑛𝑎 ℎ𝑒𝑖𝑔ℎ𝑡 5 𝑚 
𝑖  in 
 
the N-dimensional space, the fitness function 𝑒(𝑓 𝑅𝑒𝑐𝑒𝑖𝑣𝑒𝑟 𝑠𝑒𝑛𝑠𝑖𝑡𝑖𝑣𝑖𝑡𝑦 −100 𝑑𝐵𝑚 
𝑖)  is 
evaluated iteratively to assess the quality of each 𝑀𝑒𝑠𝑠𝑎𝑔𝑒 𝑠𝑖𝑧𝑒 100 𝑏𝑖𝑡𝑠 
movement. This process aims to identify the optimal 𝑀𝑒𝑠𝑠𝑎𝑔𝑒 𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 2 𝐻𝑧 
solution (maximum or minimum) within the evolving 𝐷𝑎𝑡𝑎 𝑟𝑎𝑡𝑒 10 𝑀𝑏𝑝𝑠, 20 𝑀𝑏𝑝𝑠, 30 𝑀𝑏𝑝𝑠 
search region. Based on the fitness outcomes, the position 𝑉𝑒ℎ𝑖𝑐𝑙𝑒 𝑠𝑝𝑒𝑒𝑑 𝑟𝑎𝑛𝑔𝑒 0 –  100 𝑘𝑚/ℎ 
(𝑏𝑖)  and velocity (𝑣𝑖)  of each movement were updated 𝐸𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟 𝐶𝑃𝑈 𝑐𝑎𝑝𝑎𝑐𝑖3𝑡𝑦.5  𝐺𝐻𝑧 
accordingly using the update rules given by Equations (22) 𝐸𝑑𝑔𝑒 𝑠𝑒𝑟𝑣𝑒𝑟 𝑚𝑒𝑚𝑜𝑟𝑦 32 𝐺𝐵 
and (23), enabling the algorithm to adaptively explore the 
 
solution space. 
4.1  Offloading ratio 
(𝑡+1) (𝑡) ( ) Using time on the x plane and the percentage of tasks 
𝑣 𝑖
𝑖 = 𝑣𝑖 + (1 − 𝜌)𝑐(𝑓 𝑡 ∗ 𝑓 (𝑡−𝜏)
𝑖 ) + 𝜌
||𝑓 offloaded from the vehicle to edge servers on the y plane, 
𝑖(𝑡−𝜏)||
    Figure 3 shows the offloading ratio (%) in the VEC system 
(22) 
over 10 minutes. Starting at 75%, the offloading ratio 
steadily rises to 89%, reflecting an increasing reliance on 
Where 𝑡 is the current iteration, 𝜏  is the time delay, edge computation. This upward trend is attributed to 
and the diffusion coefficient 𝜌 is set to 0.5.  enhanced network conditions, adaptive optimization by the 
SFO-Eff-DNN framework for energy efficiency, or the 
growing complexity of vehicular tasks that necessitate 
edge processing. Tracking this metric is crucial in the 
356   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
research context, as a higher offloading ratio signifies accelerating task processing, thereby improving overall 
more efficient utilization of edge resources, which directly system performance in dynamic ITS environments. 
contributes to lowering vehicle energy consumption and 
 
Figure 3: Offloading ratio over time  
 
4.2  SFO-Eff-DNN Pareto Front in VEC  about 70 J to 40 J, showing an inverse relationship. All 
In VEC, the Pareto front for the suggested SFO-Eff- points on the curve are Pareto-optimal, as enhancing one 
DNN illustrates the relationship between latency and factor would cause a drop in the other. Because of the 
energy use. Figure 4 illustrates that with latency increasing model's diversity, it is possible to choose configurations for 
from 50 ms to 70 ms, the energy consumed decreases from specific needs, such as real-time applications or limited-
power cases, proving its effectiveness and adaptability. 
 
Figure 4: Pareto front diversity of SFO-Eff-DNN in VEC 
 
4.3  Convergence Behavior of SFO-Eff-DNN algorithm quickly identifies energy-efficient 
Figures 5 (a) and (b) illustrate the convergence configurations. The average energy consumption (green 
behavior of the SFO-Eff-DNN algorithm over 100 dashed line) also follows a similar decreasing trend, 
optimization iterations for energy consumption and latency. gradually converging toward the minimum, which reflects 
In Figure (a), the minimum energy consumption (blue line) the population's collective improvement. Similarly, in 
rapidly drops from approximately 0.34 to 0.29 within the Figure (b), during the first iterations, the latency drops 
first 10 iterations and then stabilizes, indicating that the rapidly and then becomes more stable at a much lower 
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 357 
 
level. The average latency also decreases and stabilizes Overall, these trends confirm that SFO-Eff-DNN achieves 
around the same value, highlighting consistent efficient and simultaneous convergence toward optimal 
performance improvement across the solution space. energy and latency trade-offs. 
 
Figure 5: Convergence Behavior of SFO-Eff-DNN (a) energy conception and (b) latency 
 
4.4  Performance analysis  
A comparison of several optimization techniques 
based on their energy consumption and latency 
performance in vehicular edge computing scenarios is 
shown in Table 3. Among the evaluated techniques, 
Particle Swarm Optimization (PSO) (Surayya et al., 2025), 
Teaching–Learning-Based Optimization (TLBO) (Surayya 
et al., 2025), and Ant Colony Optimization (ACO) 
(Surayya et al., 2025), the proposed SFO-Eff-DNN method 
demonstrates the energy consumption and the latency. This 
highlights the superior efficiency and responsiveness of 
the SFO-Eff-DNN framework, making it highly suitable 
for real-time, energy-aware edge deployments in dynamic 
vehicular environments. Figure 6 demonstrates the results 
of the performance analysis.   
 
Table 3: Comparison of optimization methods by 
energy consumption and latency 
Methods Energy Latency (S) 
Consumption 
(J) 
PSO (Surayya 0.3535 40 μs 
et al., 2025) 
TLBO 0.3546 40 μs 
(Surayya et 
al., 2025)  
ACO 0.3517 60μs 
(Surayya et Figure 6: Comparison methods by energy 
al., 2025) consumption and latency 
SFO-Eff-DNN 0.3480 30 μs  
(Proposed) Analyzing different optimization methods for their 
energy consumption and latency when used in VEC. SFO-
Eff-DNN shows better results than other models by using 
the least amount of energy (0.3480 J) and having the 
shortest latency (30 μs). Here, microseconds (μs) are used, 
358   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
since 1 μs is a millionth of a second, which is needed to when ultra-low latency is necessary. ACO (Surayya et al., 
ensure fast response times vital in real-time VEC systems. 2025) can distribute solutions equally, but its slow 
For energy usage, PSO and TLBO lead with 0.3535 J and execution means it is not suitable when time is critical. A 
0.3546 J, respectively, but both have a latency of 40 μs, PSO, TLBO, and ACO lead with low energy of 0.3535 J, 
while ACO uses 0.3517 J with the highest latency of 60 μs. 0.3546 J, and 0.3517 J. Using the SFO-Eff-DNN model, 
The results demonstrate that SFO-Eff-DNN offers better energy costs and latency can be cut down at the same time, 
results in real-time, energy-sensitive VEC applications. compared to older versions. Compared to the generic 
A comparison of task drop rates for various placement method's 2.90% dropped task rate (Khamari et al., 2022), 
techniques in dynamic VEC situations is shown in Table 4 the SFO-Eff-DNN's dropped task rate was only 1.83%, 
and Figure 7.  In comparison to the generic method's 2.90% indicating its resilience in workload balancing and edge 
(Khamari et al., 2022) dropped task rate, the suggested resource utilisation in dynamic vehicular situations. Due to 
SFO-Eff-DNN model performs better, attaining a dropped advanced techniques and deep learning, the system reacts 
task rate of just 1.83% (Proposed). In latency-sensitive, to updates in vehicles and can quickly and accurately 
high-mobility edge computing systems, this research configure servers for VEC applications. 
demonstrates how well the SFO-Eff-DNN optimises server The computational load brought on by the 
workload allocation and lowers service denial. hybridization of deep learning and evolutionary 
 optimization constitutes one of the key issues, especially 
Table 4: Comparison of task dropped rate between during the early phases of training and adaption.  Despite 
placement strategies in VEC environments its potential for convergence efficiency, iterative 
 optimization can be resource-hungry on edge nodes with 
Placement strategies Dropped Tasks (%) constrained computing capacity. Another problem is the 
  system's scalability in high-density vehicle networks.  
generic method 2.90% While the model works well for simulations of 
(Khamari et al., intermediate scale, more study is needed to determine how 
2022) it responds and operates in large, real-time vehicular 
systems with hundreds of nodes. These limitations 
SFO-Eff-DNN 1.83% 
highlight the significance of future studies that focus on 
(Proposed) 
distributed training practices and lightweight optimization 
 
versions that can sustain performance without increasing 
compute demands in practical applications. 
 
5 Conclusion  
VEC is a pattern that encourages cloud computing 
capabilities closer to the network edge services needed for 
low-latency services, such as auto-corrective driving 
support, real-time traffic management, and location-based 
applications. The proposed SFO-Eff-DNN framework is 
used to optimize deep learning for VEC using modern 
evolutionary algorithms. To deal with the problem of 
placing servers at the edge of wireless networks in vehicles, 
both Synergistic Fibroblast Optimization and deep neural 
 
 networks were used. It makes use of real travel data to 
Figure 7: Comparison of Dropped Task Rates for manage how quickly it responds and how much energy it 
Generic Method and SFO-Eff-DNN uses, adjusts to any changes in the network, and provides 
 quick results. The data from experiments reveals that SFO-
4.5   Discussion Eff-DNN works with 30 μs latency, 0.3480 J energy 
consumption, and only 1.83% dropped tasks, making it 
By optimizing the placement of edge servers and DL 
well-suited for speedy and efficient smart transportation. It 
networks, the SFO-Eff-DNN in VEC reduces latency and 
strongly supports and adapts to the new directions being 
conserves energy. The technique has some problems with 
taken in VEC deployments. Using simulated movement 
responding to changes in vehicles and adapting to sudden 
and experimentation usually does not reflect real-world 
network changes in VEC settings (Bi et al., 2020). While 
events or problems, meaning their practical use may not be 
VECMAN saves energy by sharing resources among 
as effective. 
electric vehicles, it is difficult for it to accurately predict 
where vehicles are and to schedule them in situations that 
are constantly changing (Bahreini et al., 2021). As both Future scope  
PSO and TLBO (Surayya et al., 2025) prioritize low Future research should integrate real-time traffic incident 
energy over low latency, they may not respond fast enough data and 5G network slicing to further enhance adaptability. 
Extending the framework with federated learning for 
Deep Neural Network Architecture Optimization for Edge…                                                     Informatica 49 (2025) 345–360 359 
 
privacy-preserving model updates across distributed [10] Peyman, M., Fletcher, T., Panadero, J., Serrat, C., 
vehicles, and exploring hybrid optimizers that combine Xhafa, F. and Juan, A.A., 2023. Optimization of 
SFO with reinforcement learning could improve vehicular networks in smart cities: from agile 
robustness against unforeseen network disruptions and optimization to learn heuristics and sim heuristics. 
accelerate convergence in large-scale, heterogeneous VEC Sensors, 23(1), 
deployments. p.499.https://doi.org/10.3390/s23010499 
[11] Ebrahimi Mood, S., Rouhbakhsh, A. and Souri, A., 
2025. Evolutionary recurrent neural network based 
References  on equilibrium optimization method for cloud-edge 
[1]  Wan, S., Xu, X., Wang, T., and Gu, Z., 2020. An resource management in Internet of Things. Neural 
intelligent video analysis method for abnormal event Computing and Applications, 37(6), pp.4957-
detection in intelligent transportation systems. IEEE 4969.https://doi.org/10.1007/s00521-024-10929-1 
Transactions on Intelligent Transportation Systems, [12] Vijayakumar, P., Rajalingam, P. and Rajeswari, 
22(7), pp.4487-4495.DOI: S.V.K.R., 2021. Edge Computing Optimization 
10.1109/TITS.2020.3017505 Using Mathematical Modeling, Deep Learning 
[2] Boukerche, A., Tao, Y. and Sun, P., 2020. Artificial Models, and Evolutionary Algorithms. Simulation 
intelligence-based vehicular traffic flow prediction and Analysis of Mathematical Methods in Real‐Time 
methods for supporting intelligent transportation Engineering Applications, pp.17-
systems. Computer networks, 182, 44.https://doi.org/10.1002/9781119785521.ch2 
p.107484.https://doi.org/10.1016/j.comnet.2020.107 [13] Yang, Z., Zhang, S., Li, R., Li, C., Wang, M., Wang, 
484 D. and Zhang, M., 2021. Efficient resource-aware 
[3]  Elassy, M., Al-Hattab, M., Takruri, M. and Badawi, convolutional neural architecture search for edge 
S., 2024. Intelligent transportation systems for computing with Pareto-bayesian optimization. 
sustainable smart cities. Transportation Engineering, Sensors, 21(2), 
p.100252.https://doi.org/10.1016/j.treng.2024.10025 p.444.https://doi.org/10.3390/s21020444 
2 [14] Li, Z., Yu, H., Fan, G., Zhang, J. and Xu, J., 2024. 
[4]  Alhilal, A.Y., Finley, B., Braud, T., Su, D. and Hui, P., Energy-efficient offloading for DNN-based 
2022. Street smart in 5G: Vehicular applications, applications in edge-cloud computing: A hybrid 
communication, and computing. IEEE Access, 10, chaotic evolutionary approach. Journal of Parallel 
pp.105631- and Distributed Computing, 187, 
105656.DOI: 10.1109/ACCESS.2022.3210985 p.104850.https://doi.org/10.1016/j.jpdc.2024.10485
[5]  Chougule, S.B., Chaudhari, B.S., Ghorpade, S.N. and 0 
Zennaro, M., 2024. Exploring computing paradigms [15] Pustokhina, I.V., Pustokhin, D.A., Gupta, D., Khanna, 
for electric vehicles: from cloud to edge intelligence, A., Shankar, K. and Nguyen, G.N., 2020. An 
challenges and future directions. World Electric effective training scheme for deep neural networks in 
Vehicle Journal, 15(2), edge computing enabled Internet of Medical Things 
p.39.https://doi.org/10.3390/wevj15020039 (IoMT) systems. IEEE Access, 8, pp.107112-
[6]  Talpur, A. and Gurusamy, M., 2021. Drld-sp: A deep- 107123.DOI: 10.1109/ACCESS.2020.3000322 
reinforcement-learning-based dynamic service [16] Loni, M., Sinaei, S., Zoljodi, A., Daneshtalab, M. and 
placement in edge-enabled internet of vehicles. IEEE Sjödin, M., 2020. DeepMaker: A multi-objective 
Internet of Things Journal, 9(8), pp.6239- optimization framework for deep neural networks in 
6251.DOI: 10.1109/JIOT.2021.3110913 embedded systems. Microprocessors and 
[7]  Zaki, A.M., Elsayed, S.A., Elgazzar, K. and Microsystems, 73, 
Hassanein, H.S., 2024. Quality-Aware Task p.102989.https://doi.org/10.1016/j.micpro.2020.102
Offloading for Cooperative Perception in Vehicular 989 
Edge Computing. IEEE Transactions on Vehicular [17] Saheed, Y.K., Abdulganiyu, O.H. and Ait Tchakoucht, 
Technology.DOI: 10.1109/TVT.2024.3444591 T., 2024. Modified genetic algorithm and fine-tuned 
[8]  Zhao, L., Li, T., Zhang, E., Lin, Y., Wan, S., Hawbani, long short-term memory network for intrusion 
A. and Guizani, M., 2023. Adaptive swarm detection in the Internet of Things networks with 
intelligent offloading based on digital twin-assisted edge capabilities. Applied Soft Computing, 155, 
prediction in VEC. IEEE Transactions on Mobile p.111434.https://doi.org/10.1016/j.asoc.2024.111434 
Computing, 23(8), pp.8158- [18] Bi, J., Yuan, H., Duanmu, S., Zhou, M. and Abusorrah, 
8174.DOI: 10.1109/TMC.2023.3344645 A., 2020. Energy-optimized partial computation 
[9]  Shen, B., Xu, X., Qi, L., Zhang, X. and Srivastava, offloading in mobile-edge computing with genetic 
G., 2021. Dynamic server placement in edge simulated-annealing-based particle swarm 
computing toward the internet of vehicles. Computer optimization. IEEE Internet of Things Journal, 8(5), 
Communications, 178, pp.114- pp.3774-3785.DOI: 10.1109/JIOT.2020.3024223 
123.https://doi.org/10.1016/j.comcom.2021.07.021 
360   Informatica 49 (2025) 345–360                                                                                                                                L. Wang et al. 
 
[19] Chen, Z., Hu, J., Chen, X., Hu, J., Zheng, X. and Min, (pp. 1-7). IEEE. https://doi.org/10.1109/VTC2021-
G., 2020. Computation offloading and task Spring51267.2021.9448645 
scheduling for DNN-based applications in cloud- [30] Khamari, S., Ahmed, T. and Mosbah, M., 2022, 
edge computing. IEEE Access, 8, pp.115537- December. Efficient edge server placement under 
115547.DOI: 10.1109/ACCESS.2020.3004509 latency and load balancing constraints for vehicular 
[20] You, Q. and Tang, B., 2021. Efficient task offloading networks. In GLOBECOM 2022-2022 IEEE Global 
using particle swarm optimization algorithm in edge Communications Conference (pp. 4437-4442). 
computing for the industrial internet of things. IEEE.https://doi.org/10.1109/GLOBECOM48099.2
Journal of Cloud Computing, 10, pp.1- 022.10000721 
11.https://doi.org/10.1186/s13677-021-00256-4 
[21] Yousif, A., Bashir, M.B. and Ali, A., 2024. An 
evolutionary algorithm for task clustering and 
scheduling in IoT edge 
computing. Mathematics, 12(2), 
p.281.https://doi.org/10.3390/math12020281 
[22] Xiao, H., Zhao, J., Pei, Q., Feng, J., Liu, L. and Shi, 
W., 2021. Vehicle selection and resource 
optimization for federated learning in vehicular edge 
computing. IEEE Transactions on Intelligent 
Transportation Systems, 23(8), pp.11073-
11087.DOI: 10.1109/TITS.2021.3099597 
[23] Bahreini, T., Brocanelli, M. and Grosu, D., 2021. 
VECMAN: A framework for energy-aware resource 
management in vehicular edge computing 
systems. IEEE Transactions on Mobile 
Computing.DOI: 10.1109/TMC.2021.3089338 
[24] Jiang, H., Cai, J., Xiao, Z., Yang, K., Chen, H. and Liu, 
J., 2025. Vehicle-Assisted Service Caching for Task 
Offloading in Vehicular Edge Computing. IEEE 
Transactions on Mobile 
Computing.DOI: 10.1109/TMC.2025.3545444 
[25] Surayya, A., Hussain, M.M., Reddy, V.D., Abdul, A. 
and Gazi, F., 2025. Evolutionary Algorithms for Edge 
Server Placement in Vehicular Edge 
Computing. IEEEAccess.10.1109/ACCESS.2025.35
66172  
[26]  Luo, X., Liu, D., Huai, S. and Liu, W., 2021, 
February. HSCoNAS: Hardware-software co-design 
of efficient DNNs via neural architecture search. In 
2021 Design, Automation & Test in Europe 
Conference & Exhibition (DATE) (pp. 418-421). 
IEEE.https://doi.org/10.23919/DATE51398.2021.94
73937 
[27]  Odema, M., Rashid, N., Demirel, B.U. and Al 
Faruque, M.A., 2021, December. LENS: Layer 
distribution enabled neural architecture search in 
edge-cloud hierarchies. In 2021 58th ACM/IEEE 
Design Automation Conference (DAC) (pp. 403-
408). IEEE. 
https://doi.org/10.1109/DAC18074.2021.9586259  
[28]  Abreha, H.G., Hayajneh, M. and Serhani, M.A., 
2022. Federated learning in edge computing: a 
systematic survey. Sensors, 22(2), p.450. 
https://doi.org/10.3390/s22020450 
[29]  Talpur, A. and Gurusamy, M., 2021, April. 
Reinforcement learning-based dynamic service 
placement in vehicular networks. In 2021 IEEE 93rd 
Vehicular Technology Conference (VTC2021-Spring) 
https://doi.org/10.31449/inf.v49i12.8792                                                                                      Informatica 49 (2025) 361–376   361 
 
CCR-LWECNN: A Lightweight CNN Framework for Chinese 
Calligraphy Recognition and Evaluation 
 
Xi Chen*，Jing Zhao 
Zhongyuan Institute of Science and Technology，Zhengzhou, Henan，461000，China 
*Corresponding author 
E-mail：chenxi19891028@126.com 
 
Keywords:  character, CCR-LWECNN Chinese calligraphy recognition, deep learning, image processing system 
 
Received: April 3, 2025 
This study presents a lightweight enhanced CNN architecture (CCR-LWECNN) for Chinese calligraphy 
recognition, addressing the challenges of multi-class classification across 12,152 labeled images 
spanning 960 Chinese characters in five calligraphic styles. Unlike previous studies limited to small 
character sets and single recognition approaches, this research integrates character recognition with 
image processing techniques. Data augmentation using TensorFlow’s Image Data Generator—applying 
rotation and zoom—was employed to improve class balance and variety. The proposed model, comprising 
five convolutional and three fully connected layers, processes 224×224-pixel images and leverages 
pretraining for robust feature extraction. CCR-LWECNN achieved superior performance with 96.5% 
accuracy, 95.6% precision, 95.2% recall, and 95.6% F1-score, outperforming baseline models such as 
traditional CNN (90.5%), SVM (85.2%), and Random Forest (75.4%). By effectively mitigating overfitting 
and underfitting through dropout layers and augmentation, this approach advances automated Chinese 
calligraphy recognition and provides a scalable solution for real-world applications. 
Povzetek: CCR-LWECNN je lahki izboljšani konvolucijski model za prepoznavanje kitajske kaligrafije, ki 
na 12.152 slikah dobre rezultate. Z združevanjem povečanja podatkov in učinkovite CNN-arhitekture 
izboljša prepoznavanje 960 znakov v petih slogih ter preseže klasične metode. 
 
1 Introduction calligraphy offers advantages in addition to being a highly 
regarded art form [4]. 
Characters in Chinese calligraphy are made up of a lot Character recognition has emerged as a hotspot for 
more strokes than those in Western calligraphy [1]. A computer vision research as picture digitisation advances, 
single letter in Chinese calligraphy can be made up of as and it has significant applications in data entry for paper 
few as one stroke or as many as thirty. Before writing documents. Because handwriting characters have more 
begins, the ink is absorbed by dipping and then used to irregular shapes than printed documents, it is more difficult 
produce strokes with a soft hairbrush. Different styles are to recognise handwriting. Chinese calligraphy is a sort of 
produced as the calligrapher writes the character by handwriting art form that consists of five main font type 
varying the brush's pressure, speed, and direction [2]. [5]. Figure 1 shown by Chinese calligraphy different font 
Regular, clerical, cursive, semi-cursive, and seal are the type. 
most often used styles. These styles go under several 
names. For instance, referred to the semi-cursive style as 
the running style. The naming scheme employed by author 
will be applied in this study [3]. Beginning with a single 
style is beneficial for Chinese calligraphy students. The 
student might advance to another style after they are 
proficient at writing several characters in that style. An 
ancient art style that originated in China, Chinese 
calligraphy is also well-liked in a number of other nations, 
including South Korea, Japan, and Thailand. Using a brush 
 
and ink, Chinese calligraphy artists create visually 
appealing and well-composed characters. Chinese Figure 1: Chinese calligraphy different font type 
362   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
However, many find it difficult to instantly identify the upto300 Chinese characters—roughly 8–12.4% of the 
content of calligraphy works since the shapes of the letters 2500 characters used every day—can be recognised by 
in Chinese calligraphy vary widely across calligraphers ReLU models. Furthermore, there aren't many examples 
and differ substantially from conventional fonts used in from old Chinese calligraphy masters, thus additional 
daily life. Therefore, by presenting the font and textual training sample photos are required. There is a need for 
content of the input calligraphy image, a real-time more research because calligraphy is only mentioned in 
calligraphy recognition system can aid amateur one empirical study on AI in education. 
calligraphers in understanding calligraphy works [6]. 
Instead of manually typing out the text, the method may 1.2 Contribution of this study 
also be used to digitise calligraphy by just entering the 
image of the piece. In this study, we developed and put into The three primary forms of Chinese calligraphy—
use a convolutional neural network-based calligraphy character recognition, calligraphy production and 
recognition system. Compared to earlier research, the simulation, and calligraphy analysis—represent an 
system has higher accuracy rates for identifying both important field of study deep learning (DL). To enhance 
typeface and textual content. We created a dataset of Chinese character and image processing technology, this 
calligraphy characters to train the network, and we tested study blends dropout in CNN hidden layers, data 
the viability of the system using pictures of various augmentation methods, and CNN architecture. The 
calligraphy pieces [7]. suggested approach CCR-LWECNN allows for greater 
accuracy without requiring additional training photos by 
1.1 Challenges in Chinese calligraphy recognising more than 960 Chinese characters in five 
recognition calligraphic forms. Other languages can also be added to 
the model. In order to assist in this paper to monitor their 
Chinese calligraphy is a difficult art form because of its progress during practice sessions. Related works, datasets, 
many Chinese characters, many styles, and intricacy [8]. methods, findings, implications, discussion, and 
Since art evaluation is subjective and can have a conclusions are all included in the parts that make up the 
detrimental effect on teacher-student relationships, it study. 
might be challenging to find qualified calligraphers and 
offer comments. Artificial intelligence (AI) can assist in 2 Literature review  
overcoming these obstacles by offering unbiased 
assessments and comments. But only tiny groups of Table 1 shows Summary of works 
 
Table1: Summary on related works 
Ref Methods Used Dataset Baseline & Proposed Key Findings 
Size Accuracy Method & 
Accuracy 
[9] CNN, TensorFlow Not Traditional OCR CNN + CNN significantly improves 
specified 80% TensorFlow recognition for handwritten 
93.7% characters 
[10] Hybrid CNN + 20,000+ Basic CNN Proposed 91.8% Attention helps in 
Attention + images 87.5% distinguishing subtle 
Distillation calligraphic variations 
[11] MobileNet, CNN ~12,000 Tesseract OCR MobileNet 90.1% Suitable for lightweight 
76.2% deployment in mobile/web 
[12] Deep CNN, CAI Not given Classic CNN Proposed hybrid Integration of CAI improves 
84.6% 89.2% learning and recognition 
efficacy 
[13] CNN with Deep ~8,000 Hand-crafted Proposed 91.0% Deep stroke analysis 
Stroke Extraction stroke features provides structural and 
78.4% aesthetic insight 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 363 
 
[14] 5-layer CNN ~6,500 SVM 83.2% CNN 92.4% CNN better handles degraded 
or stylized historical samples 
[15] Traditional CNN + Not stated Template CNN 88.6% CNN adapts better to style 
Filters Matching 74.8% variance than traditional 
methods 
[16] Faster R-CNN, 10,000+ SSD 90.3%, Faster R-CNN Accurate segmentation and 
YOLOv3 YOLOv3 91.5% 95.1% detection for full-page 
manuscripts 
3 Methodology dataset's photos were adequate for style recognition. 
However, a second technique was employed to enhance 
3.1 Dataset the number of training photos since we required to increase 
the number of images per Chinese character for character 
In order to construct the style recognition model, we used recognition. Utilizing data augmentation was the second 
CCR-LWECNN models, which represent datasets and strategy. During the training phase, we rotated and zoomed 
image pre-processing. The character recognition model is in on the already example photos using TensorFlow's 
constructed via data augmentation and picture pre- Image Data Generator function to produce more sample 
processing. Kaggle's "Chinese calligraphy characters images.  
image set" serves as the training dataset for the image 
recognition model [17] we provide these resources will be The dataset was constructed by combining photos from the 
made available via a public GitHub Humanities & Social Sciences collection with the Kaggle 
repository:https://github.com/zhuojg/chinese-calligraphy- set. Hash-based comparison methods were used to find and 
dataset. 2890 calligraphy pictures totaling 960 characters eliminate duplicate photos in order to guarantee quality. 
were collected from various calligraphers and made Additionally, physical inspection and simple picture 
available to the public. These pictures are labeled as semi- quality checks (e.g., resolution thresholding and contrast 
cursive, regular, seal, cursive, or clerical. We employed the analysis) were used to filter out low-quality samples, such 
oversampling approach because of the dataset's label as blurred, low-resolution, or severely distorted images. A 
imbalance issue. Additionally, this analysis demonstrated clean and varied dataset for efficient model training was 
that overfitting would not result from oversampling. A far guaranteed by this preparation. 
larger dataset was required for the image processing model 
Data Augmentation: Random rotation (±15°), zoom (10-
than for style recognition. This is due to the fact that each 
20% scale variation), brightness modification (±20%), and 
character to be categorized belongs to a single output class 
horizontal flipping (50% probability) were used by the 
in this multiclass classification model. There would have 
data augmentation process to enhance sample variety. In 
been just 2890 training photos for 960 classes if we had 
order to replicate natural stroke fluctuations, we used 8x8 
utilized the same dataset for style recognition. That would 
grid warping with σ=4 for elastic distortions. These 
imply that there would typically be no more than three 
parameters were chosen to increase the effective training 
pictures each word. We needed to figure out how to get 
dataset 5-fold without creating unreal artefacts, all the 
more training photos. To expand the dataset's picture count 
while maintaining calligraphic integrity. Bilinear 
for character recognition, we employed two strategies. 
interpolation was used to preserve stroke continuity 
Adding pictures from a public domain collection was the 
throughout the real-time implementation of all 
initial technique. An online database of the Humanities & 
transformations using TensorFlow's Image Data 
Social Sciences Database Catalogue contained the 
Generator. 
dataset's URL (Humanities & Social Sciences Database  
Catalogue, 2023). We crawled the page and gathered 3.2 Feature extraction 
photos using the Kaggle connection. However, the link 
In recent years, deep learning has been widely applied in 
was broken when this paper was written. Following the 
tracking, object identification, and other domains. By 
addition of pictures from this dataset, the final dataset 
integrating low-level characteristics to create high-level 
comprised 12,152 training photos, with at least 10 images 
features that represent the scattered aspects of data, it 
for each Chinese word. The train_test_split () function in 
simulates how the human brain functions. Usually, the 
the Scikit-learn package's data preparation module was 
Light Weight Enhanced traditional CNN is used directly 
used to divide these photos into training and testing sets. 
for image classification. Utilizing CNN's numerous 
The sorted Data folder included the training set.  
advantages in feature extraction is the aim of this work. 
Since there were just five styles in the output class, the 
364   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
Compared to explicit feature extraction, digital feature completely linked layer that can identify the style of a 
extraction produces more detailed feature data for Chinese picture. The first convolutional layer of our suggested 
picture works.  model has 32 filters, each of which has three channels and 
The CNN theoretical framework-based CCR-LWECNN is 3 x 3. We employ the same padding, which means that 
model was pretrained using the Kaggle dataset to extract the input images are zero-padded so that the filters overlap 
the visual attributes of Chinese calligraphy. The model is each pixel. We employed ReLU as the activation function 
a feed-forward neural network with the model has two in the convolution layer. Batch normalization is used to 
convolutional layers, not five, with one fully connected enhance model stability and performance. The maximum 
layer of 512 neurons and an output layer three fully pooling filter has a dimension of 2 × 2 and travels with a 
connected layers. The model resizes the input image to 224 stride of 2. It carries out the max pooling procedure on the 
by 224 pixels in order to produce a 4096-dimensional feature maps. To assist keep this model from overfitting to 
feature vector. Feature Extraction is presented as a the training data, we have implemented a dropout layer to 
pretrained feature extractor producing a 4096-dimensional shake off the neurons. The dropout value for the first 
vector for further classification, suggesting a two-step convolution layer is set at 0.20. The second convolutional 
pipeline. The proposed model, pretrained on real images, layer is created using 64 filters, each of which has three 
may be used to extract characteristics from Chinese channels and is 3 × 3. The same cushioning is employed 
calligraphy. First of all, Chinese calligraphy characters are here as well. The ReLU activation function is also applied 
an artistic reworking of natural surroundings and another to the feature maps. Once more, batch normalization is 
depiction of a natural image. Second, the deep structure of utilized in the second layer to enhance model performance. 
the CCR-LWECNN model may extract complex structures The max pooling filter is 2x2 in size, advances by a stride 
from rich perceptual input and generate intrinsic of 2, and performs the max pooling operation on the 
representations in the data. More than 10 million natural feature maps. To address the issue of overfitting, a dropout 
photos are being utilized for training in order to gain value of 0.25 has been chosen for the second convolution 
relevant information for Chinese calligraphy feature layer. A flatten layer has been employed after the second 
extraction. Chinese character-like feature information will convolutional layer's dropout value has been set. The 
be included in the recovered features either directly or outcome of the last pool layer is a victory type, and a fully 
indirectly. Last but not least, the study's training dataset linked layer with 512 neurons comes after it. The final 
does not contain enough Chinese writing pieces to features are then classified into many classes in the output 
adequately train the suggested model. Nevertheless, this layer using fully linked layers that were taken from the 
study uses it as a forerunner to the CNN model, which is previous pooling and convolution layers. The completely 
lightweight so that the components it extracts may better linked layers learn from features. Batch normalization has 
capture the artistic character of Chinese calligraphy once again been used. The dropout value of 0.5 was then 
recognition. applied. We once more employed ReLU for the activation 
function in the dense layer. Lastly, there are six nodes in 
3.3 Model explanation with CNN the output layer, each of which represents six classes. 
Next, we classified the desired label in the output layer 
In Figure 2, the framework that extracts the key using the softmax activation function. Figure 2aand 2b 
characteristics of calligraphy recognition consists of two Showed in Architecture diagram with dimensionalflows.
convolutional layers. The framework has a single, 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 365 
 
 
Figure 2a: Image style recognition model 
 
Figure 2b: Architecture diagram with dimensional flow
3.4 Chinese calligraphy recognition based on thoroughly, and emphasize its contribution to image 
lightweight enhanced CNN algorithm recognition technology for Chinese calligraphy 
(CCR-LWECNN) recognition. 
Convolutional neural network (CNN) technology is a type CNN is an effective recognition method and a type of 
of neural network that is specifically designed to process neural network that mimics the visual structure of biology. 
images. Since its inception, the technology has seen Convolutional, pooling, and fully connected layers are the 
significant development. As a result, CNN has greatly primary components of this recognition system. One of 
aided people in processing visual information. However, CNN's primary functions is the convolution operation of 
this technology's computationally demanding approach the convolutional layer. The following illustrates the 
also restricts its use in a number of industries. Therefore, convolution computation of continuous functions: 
the primary research goals in the current image recognition 
𝑠(𝑡) = ∫ 𝑥(𝑎)𝑤(𝑡 − 𝑎) ⅆ𝑎                      (1) 
sector are to lower the computational cost of CNN and 
decrease the calculation time, optimise the technology 
366   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
Equation (1) uses x and w to stand for integrable functions, f′(z(l)) is the activation function; μ is the learning rate; l is 
a and t for distinct computational components, and d for the level of neurones; i is neurones; T is a constant; δ is the 
the convolution operation. The following illustrates how difference between the network's true and predicted 
discrete functions are calculated using convolution:  values; W is the weight; b is the bias of the neurone; z is 
the neuron's input; an is its output; and f′(z(l)) is the 
𝑠(𝑛) = ∑𝑚 r[𝑚]𝑣 , (𝑛 − 𝑚)                               (2) activation function. The following formula is used to 
determine a sample's loss function: 
Discrete functions are represented by r and v in Equation 
(2), whereas calculation elements are represented by m and 𝐽(𝑊, 𝑏; 𝑥, 𝑦) = 1/2||𝑦 − ℎ𝑤1,𝑏(𝑥)||2                (10) 
n. Convolution can be thought of as a filtering procedure 
in computer vision tasks. Typically, the input data is a two- The following illustrates how the fully linked layer's 
dimensional picture. Convolution is performed using a output data is calculated: 
two-dimensional discrete convolution in the manner 
described below: 𝑦 = 𝑓(𝑊. 𝑥 + 𝑏)                         (11) 
𝑚
𝐼(𝑥, 𝑦) ∗ 𝑘(𝑥, 𝑦) = ∑ ∑𝑛
𝑡=0. k(𝑠, 𝑡)𝐼(𝑥 − 𝑠, 𝑦 − 𝑡)     𝑦     =  o  u  t p  u  t    v  e  c  t o  r    o  f    t h  e     f u  l l y     c  o  n  n  e c  t e  d     l a  y  e r  ( 5 12 or 6 
𝑠=0
elements) 
(3) 
𝑊 =weight matrix 
In Equation (3), I stand for the output feature, k for the 
convolution kernel, m and n for the convolution kernel's 
𝑥 =input feature vector 
dimensions, x and y for the feature output point, and s and 
t for the feature extraction point. Pooling the image and 𝑏 =bias vector 
producing the result are the roles of the fully connected 
layer and the pooling layer, respectively. Both forward and 𝑓 =activation function (ReLU or Softmax) 
backward propagation are included in the CNN model's 
computation. Forward propagation is a sequence of Each output neurone in a dense layer computes a weighted 
computations that use input data to perform tasks like sum of all input characteristics plus a bias term, which is 
image recognition and feature extraction, then combine then passed through an activation function. This 
and output the results. Backpropagation is the process of representation faithfully depicts the behaviour of the layer. 
using the computation results as input to determine the This adjustment guarantees mathematical lucidity and 
error as the fundamental reference data for model conforms to the norms used in the literature on neural 
optimisation. The network optimises the parameters it networks. 
learns by ongoing iterative training and updating, with 
training ending when the predetermined thresholds are CNN is carried out using W, and following decomposition, 
fulfilled. Among these, backpropagation computation the first t significant eigenvalues are substituted for W's 
involves forwarding the input sample (x, y) in order to decomposition as follows: 
determine the output value of L1, L2, …, Ln, and the 
𝑤 = ⋃∑𝑉𝑇 = ⋃ ∑ 𝑉𝑇                                   (12) 
output layer error in the manner described below: 𝑡
(𝑛𝑖) A diagonal matrix is denoted by ∑, a v × t-dimensional 
𝛿𝑖 = −(𝑦 − 𝑎(𝑛𝑖)
. ) ⋅ 𝑓′ (𝑛 )
(𝑧 𝑖
𝑖 )                         (4) 
orthogonal matrix by V, and an u × t-dimensional 
orthogonal matrix by U. As a result, CCR-LWECNN is 
Each layer's error computation is displayed as follows: 
represented as follows: 
𝛿(𝑙) 𝑇
= ((𝑤)(𝑙)) 𝛿(𝑙+1))𝑓′(𝑤)(𝑙)
         (5) Y = Wx = U(∑ 𝑣𝑇) ⋅ 𝑥 = 𝑈 ⋅ 𝑧                       (13) 
𝑡
The following formula is used to determine the relative The CNN technology can be broken down by the CCR-
derivatives of weights and biases: LWECNNalgorithm, significantly lowering the network's 
𝑇 computing load. In addition to being straightforward, this 
Δ𝑤(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦) = 𝛿(𝑙+1)(𝑎(𝑙))                        (6) approach produces superior outcomes. This algorithm is 
designed to optimise the CCR-LWECNN. Simpler image 
Δ𝑏(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦) = 𝛿(1+1)                                     (7) computational processes are outside the CNN algorithm's 
capabilities, and the CCR-LWECNNalgorithm excels at 
The following are the revised weight parameters: 
handling them. Its output feature map definition is 
displayed as follows: 
𝑤′ = 𝑤′ − 𝜇∇𝑤(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦)                            (8) 
𝑏′ = b − 𝜇∇𝑏(𝑙)𝐽(𝑊, 𝑏; 𝑥, 𝑦)                              (9) 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 367 
 
𝐶 ~
𝑥 𝑤𝐶
𝑛  is the approximated weight for the nnn-th filter and 
𝑦
𝐹𝑛(𝑥, 𝑦) = ∑ ∑ ∑ 𝑧𝐶 ( 1
𝑦′ 𝑥 , 𝑦′)𝑤𝑐
𝑛(𝑥 − channel C. 
1 =1
𝑥 =1
1=1
𝑥′, 𝑦 − 𝑦′)                                                                       𝐻k
𝑛 : Projection matrix or basis vector used to reduce the 
(14) dimensionality of the filters (e.g., a learned kernel basis). 
 𝐹𝑛(𝑥, 𝑦): Output feature map at position (x, y) for the n-th 𝑉𝐶
𝑘 : Coefficient vector or activation feature for the kkk-th 
filter. component in channel C. 
𝐶: Number of input channels. 𝛤: A transformation operator (e.g., transpose or non-linear 
function like activation or power). 
𝑧𝐶  (𝑥1, 𝑦′): Input feature map for channel c at position 
(𝑥 − 𝑥′, 𝑦 − 𝑦′). H stands for the horizontal filter, V for the vertical filter, 
and K for the hyperparameter that regulates the rank. CCR-
𝑤𝑐
𝑛 : Filter weights for the n-th filter applied to channel c. LWECNNdoes, however, have some drawbacks. In other 
words, even though CCR-LWECNNhas produced strong 
𝑥, 𝑦: Spatial dimensions of the input. 
results for model acceleration and compression, this 
𝑛: Index of the output filter  approach is difficult to execute. CCR-LWECNNmust be 
carried out layer by layer since various layers contain 
 𝑤𝑐(𝑥 − 𝑥′, 𝑦 − 𝑦′
𝑛 ): Convolution kernel applied with different information, making it impossible to construct 
spatial shift. CCR-LWECNNusing a global variable. Furthermore, the 
network must undergo extensive fine-tuning training 
The channel is denoted by W, the filter by n, and the following decomposition in order to converge and produce 
position of the channel by C. The primary goal is to the best result. Figure 3 shows at Proposed model flow 
approximate W in the manner described below: diagram. 
~
𝑤𝐶 𝑘  
𝑛 = ∑ 𝐻k
= 𝑛 (𝑉𝐶
1 𝑘 )𝛤                                       (15) 
𝑘
 
Equation (15) represents a low-rank approximation of the 
convolutional weight tensor W, where: 
 
Figure 3: Proposed model flow diagram 
Since its inception, machine learning has evolved human design in order to progressively enhance their own 
throughout time and has developed a number of flaws. learning process. As a result, the operator's basic technical 
Conventional machine learning methods need constant competence is pretty high and its dependence is 
368   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
particularly big throughout the calculating process. width and height, as well as the center's horizontal and 
Additionally, machine learning has not advanced very far. vertical coordinates, are denoted by the letters x, y, w, and 
In the meantime, the algorithm cannot rapidly achieve h. The adjusted value is denoted by t. 
accurate image identification and has very low image 
recognition accuracy. The most significant of these is that Indeed, the use of Convolutional Neural Networks (CNNs) 
conventional machine learning technology is unable to is standard and well-justified for image recognition tasks 
precisely distinguish different aspects of the image, which due to their ability to capture spatial hierarchies in visual 
typically results in significant application failures. The data. In the CCR-LWECNN model, integrating dropout 
most significant is that machine learning is unable to layers helps prevent overfitting by randomly deactivating 
recognise the primary information in a picture and neurons during training, enhancing generalization. 
distinguish between the image's background and major Additionally, data augmentation (e.g., rotations, scaling, 
portion. The drawbacks of conventional machine learning flipping) increases training diversity, especially important 
technology in image recognition are addressed by when working with limited samples per class, improving 
optimised deep learning technology. Optimising the the model’s robustness across varied calligraphy styles. 
calculation process is essential to lowering the computing Together, these techniques contribute to the model’s 
cost and increasing the computational efficiency of deep strong performance. 
learning image recognition technology if it is to be used to 
a larger field. Consequently, the model calculation method Algorithm 1: CCR-LWECNN Core Steps 
is made simpler and the calculation effect is somewhat Data Acquisition & Preprocessing 
enhanced by optimising CNN technology and creating the 
Collect images of Chinese characters across multiple 
Faster-CNN model. Figure 3 illustrates the fundamental 
styles (e.g., seal, cursive). 
concept of the lightweight Faster-CNN model. 
Normalize image sizes (e.g., 64×64 pixels). 
In comparison to traditional CNNs like VGG16 (~138 
million parameters, >15 billion FLOPs), the CCR-
Apply data augmentation (rotation, flipping, noise 
LWECNN model has around 1.2 million parameters and 
addition) to expand limited samples (≤15 per class). 
needs 150 million FLOPs per forward pass. Because of its 
shallow architecture—just two convolutional layers, Model Architecture 
smaller (3x3) filter sizes, fewer fully connected neurones, 
and use of effective procedures like batch normalisation Use a lightweight enhanced CNN with: 
and dropout—it is regarded as lightweight. Because of its 
ability to lower memory and compute requirements, this convolutional layers (ReLU activation, batch 
architecture is appropriate for real-time and resource- normalization). 
constrained applications, including embedded or mobile 
systems. Max-pooling layers to reduce spatial dimensions. 
When the Faster-CNN model is applied to image feature Dropout layers to prevent overfitting. 
recognition in Figure 3, it can not only significantly speed 
up the process and increase its effectiveness, but it can also Flatten layer followed by 3 fully connected layers 
maximise the model's recognition effect and assist users in (e.g., 512-neuron layer + output layer with softmax). 
completing the style transfer of painting images. The 
Training 
region proposal network is the method used to optimise the 
Faster-CNN model. Using anchor points, it modifies and 
Train the model using cross-entropy loss and Adam 
enhances the Faster-CNN model's image recognition 
optimizer. 
domain in the following ways: 
Batch size, learning rate, and dropout rate should be 
𝑋 = 𝑤𝑎𝑡𝑥 + 𝑥𝑎                                                   (16) 
tuned via validation. 
𝑦 = ℎ𝑎𝑡𝑦 + 𝑦𝑎                                                    (17) 
Perform training over multiple epochs with early 
stopping if necessary. 
𝑤 = 𝑤𝑎(𝑡𝑤)                                                        (18) 
Evaluation 
ℎ = ℎ𝑎(𝑡ℎ)                                                        (19) 
Use 10-fold cross-validation to compute average 
The abscissa and ordinate of the anchor point's centre 
accuracy, precision, recall, F1-score, and ± standard 
point, as well as its breadth and height, are denoted by the 
deviation. 
letters xa, ya, wa, and ha, respectively. The model's chosen 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 369 
 
Report statistical significance using p-values 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 → 𝑇𝑃/ 𝑇𝑃 +  𝐹𝑃                                                                    
compared to baseline models (CNN, SVM, RF, etc.). 
𝐹1 − 𝑠𝑐𝑜𝑟𝑒 →  2 ∗ 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑟𝑒𝑐𝑎𝑙𝑙/ 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 +
Prediction  𝑟𝑒𝑐𝑎𝑙𝑙                 
For new input images, feed them through the trained Each classifier was assessed using the balanced F1 Score, 
CCR-LWECNN to output probabilities across target recall, and precision of popular techniques including 
classes (character+style or style only). support vector machine (SVM), Random Forest (RF) [18], 
Bonferroni Mean Fuzzy K-Nearest Neighbors (BM-
 FKNN) [19], and CNN [20]. In order to evaluate the 
efficacy of the suggested approach, a classifier model was 
4 Results and discussion constructed by importing Dataset. Seal font, cursive font, 
semi-cursive font, clerical font, and standard font are 
4.1 Experimental setup among the features. It was discovered that 10% of the 
samples were test samples, while 90% of the samples were 
The Intel(R) Core (TM) i74700 HQ CPU run at 2.40 GHz training samples. The likelihood that the lightweight CNN 
was the PC used in this experiment. It has 16.00 GB of will provide correct predictions is known as the accuracy 
RAM. The OS is Windows 8, 64-bit. Weka Ver 3.8.1 rate. The Adam optimiser was used to train the CCR-
utilised to create and evaluate DL models, and Python LWECNN model because of its effective convergence and 
Anaconda Ver 2020.20 with the Seaborn library Ver 0.10.0 adjustable learning rate. During training, a batch size of 32 
is used for correlation analysis.  was used, and the initial learning rate was set at 0.001. 
Validation loss was recorded across epochs to keep an eye 
To ensure replicability and fair evaluation, the dataset of 
on overfitting, and training was stopped early after five 
12,152 samples was divided as follows: Training set as 
epochs if there was no discernible improvement in 
70% (8,506 samples), Validation set as 15% (1,823 
validation loss. Data augmentation and dropout layers also 
samples), Test set as 15% (1,823 samples). Splitting was 
assisted in lowering the danger of overfitting. 
performed stratified by character class and style, ensuring 
balanced representation of each character–style Accuracy is defined as the proportion of image processing 
combination across all splits. Data augmentation was techniques that are reliably and accurately identified. 
applied only to the training set, preserving the integrity of Table 2a and Figure 4 present the accuracy findings. The 
the validation and test sets. current CNN (90.5%), Random Forest (75.4%), SVM 
(85.2%), and BM-FKNN (88.7%) algorithms were all 
4.2 Performance analysis surpassed by our suggested CCR-LWECNN (96.5%). The 
baseline "CNN" refers to a conventional, standard CNN 
A variety of indicators are needed in order to compare the 
architecture commonly used in calligraphy or handwritten 
experiment's outcomes. The accuracy rate is the 
character recognition tasks. This baseline employs two 
probability that the classifier will produce accurate 
convolutional layers with ReLU activations and max 
predictions. The recall rate is the percentage of a Chinese 
pooling, followed by a fully connected layer for 
calligraphy image that are accurate for all 5 Fonts in that 
classification—essentially a straightforward 
feature within the dataset. We assess performance using 
implementation without the architectural refinements (e.g., 
various metrics, including F1-score, accuracy, recall, and 
optimized dropout rates and enhanced feature extraction) 
precision. Accuracy is defined as the percentage of total 
that distinguish our CCR-LWECNN model. We will revise 
samples properly identified by the classifier in (1). The 
the manuscript to provide a detailed description of this 
total number of samples that the classifier found to be 
baseline architecture, ensuring that the comparative 
positive accurately identified as positives in (2) is known 
evaluation is transparent and that readers understand the 
as the recall. Precision, which appears in (3), is the total 
specific differences between the baseline CNN and the 
number of classifier-predicted positive samples that are 
proposed CCR-LWECNN model. While Figure 6 shows 
true positives. By combining the precision and recall found 
that the CNN with uneven margins produced a greater F1 
in (4), the F1-score calculates a balanced average result. 
and balanced Precision and Recall, Figure 6 shows that the 
True positive (TP), false positive (FP), true negative (TN), 
recall suggested by the proposed technique was 95.2%. 
and false negative (FN) are the many metrices that can be 
The precision result for our recommended approach, 
calculated using the equations below. 
which has the maximum precision at 95.6%, is shown in 
𝑇𝑃+𝑇𝑁 Figure 5. With uneven margins, the CCR-LWECNN 
𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 →                                                                     
𝑇𝑃+𝐹𝑃+𝐹𝑁+𝑇𝑁 produced a 95.6% statistically significant better F1 than 
the conventional CNN. Comparing the various situations, 
𝑅𝑒𝑐𝑎𝑙𝑙 → 𝑇𝑃/ 𝑇𝑃 +  𝐹𝑁                                                                               
CCR-LWECNN, and novel tactics, Chinese calligraphy 
370   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
has significantly improved in effectiveness.  Table 2 shows 
Values of Accuracy, precision, Recall, F1-score. 
Table 2: Values of Accuracy, precision, Recall, F1-score 
Training set Methods Test-set 
Methods Accuracy (%) Precision Recall F1-Score 
(%) (%) (%) 
CNN 90.5 89.1 90.2 90 
RF 75.4 73.5 72.2 73.1 
SVM 85.2 84.7 84.1 83.7 
BM-FKNN 88.7 85.5 88.3 87.4 
CCR-LWECNN [Proposed] 96.5 95.6 95.2 95.6 
 
Table 3: Model performance per calligraphy 
style (averaged across test folds): 
Style Accuracy Precision Recall F1 
(%) (%) (%) (%) 
Regular ( 98.2  97.8 98.1 98.0 
楷书) 
Semi- 95.4 94.9 95.1 95.0 
cursive (
行书) 
Cursive ( 93.1 91.7 92.3 92.0  
草书) 
Figure 4: Result of accuracy outcome 
Seal (篆 94.5 93.2 94.0 93.6 
 
书) 
 
Clerical ( 96.0 95.4 95.8 95.6 
隶书)  
 
CCR-LWECNN generalizes well across all styles, with 
particularly strong performance on Regular and Seal 
scripts, and maintains high F1-scores across more complex 
styles like Cursive and Clerical. Other models showed 
more variance and lower scores, especially on cursive 
scripts.         
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 371 
 
  
Figure 5: Result of precision outcome Figure 7: Result of F1-score outcome 
 
 
Figure 6: Result of recall outcome 
 
Figure 8: Overall performance of existing and proposed 
method outcome 
 
 
 
372   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
Figure 4 shows Result of Accuracy. Evaluation of BM- 88.7 ± [88.0, 89.4] 87.4 
performance across a range of input variables, including FKNN 0.8 ± 0.7 
variations in picture quality, subtleties in style, and noise, 
is crucial to determining the model's resilience in real- CCR- 96.5 ± [96.0, 97.0] 95.6 
world situations. Figure 5 shows Result of Precision LWECNN 0.6 ± 0.4 
Outcome. This entails evaluating the model over a range 
of handwriting styles and on calligraphy pictures that are  
noisy, low-resolution, or blurry. By demonstrating actual 
Table 4 Shows Performance Summary with Statistical 
application dependability, this assessment helps guarantee 
Rigor. The CCR-LWECNN model's lightweight design 
that the CCR-LWECNN model generalises effectively and 
makes it ideal for embedded systems and mobile devices, 
retains high accuracy even in less controlled or degraded 
even if system implementation is not extensively covered. 
conditions. Figure 6 shows Result of Recall Outcome. 
Without requiring sophisticated servers, its modest model 
Even while we anticipate that current techniques will 
size and low computational burden allow for effective 
improve the base classifier's performance, in several 
inference on devices with limited resources, enabling real-
instances, a single classifier has produced identical or 
time calligraphy detection in mobile applications. 
superior outcomes. Decision trees, for instance, 
outperformed the BM-FKNN in this instance, with 88.7%  
accuracy and 85.6% precision, respectively. Figure 7 
shows Result of F1-score Outcome. The same best results 
were also obtained by CCR-LWECNN, with 96.5% 
accuracy, 95.6% precision, 95.2% recall, and 95.6% F1 
score. Overall, out of the ten classifier features using 
current techniques, CCR-LWECNN produced the best 
results shows in figure 8.  
A detailed performance comparison was conducted 
between the proposed CCR-LWECNN model and several 
baseline models—including a standard CNN trained from 
scratch and transfer learning models using pre-trained 
networks like MobileNetV2 and EfficientNet-B0—on the 
same dataset. CCR-LWECNN consistently outperformed 
these baselines, achieving higher accuracy, precision, 
recall, and F1-scores while maintaining a smaller model 
 
size and faster inference. This demonstrates that CCR-
LWECNN’s lightweight architecture and tailored Figure 9: Outcome of Learning rate 
enhancements effectively improve Chinese calligraphy 
recognition over conventional and transfer learning The learning rate visualisation figure 9 illustrates how the 
approaches. loss of the model reacts to varying learning rates. The ideal 
learning rate range is indicated by a sharp decline in loss 
Table 4: Performance summary with statistical rigor that is followed by instability or a plateau. Here, the graph 
shows that the CCR-LWECNN model converges most 
Model Accuracy 95% F1- quickly and steadily when the learning rate is adjusted 
(%) ± SD Confidence Score between 0.001 and 0.005, preventing divergence (from too 
Interval (%) ± high a rate) or sluggish training (from too low a rate). The 
SD generalisation and efficiency of the model are enhanced by 
this adjustment. 
CNN 90.5 ± [89.8, 91.2] 90.0 
0.9 ± 0.7  
RF 75.4 ± [74.3, 76.5] 73.1  
1.1 ± 1.2 
 
SVM 85.2 ± [84.3, 86.1] 83.7 
1.0 ± 0.8  
 
 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 373 
 
Confusion Matrix: Table 5: Outcome with comparison of recent methods 
Model Accur Precisi Rec F1- Para
acy on (%) all Sco ms 
(%) (%) re (M) 
(%) 
MobileNe 94.2 93.7 93.1 93.4 3.4 
tV2 
EfficientN 95.3 94.8 94.1 94.4 5.3 
et-B0 
ViT-Tiny 92.5 91.6 91.2 91.4 5.7 
CCR- 96.5 95.6 95.2 95.6 1.2 
LWECN
 N 
Figure 10: Confusion matrix for the proposed model 
 
The provided confusion matrix in figure 10 evaluates the 
ROC Curve 
performance of the CCR-LWECNN model in classifying 
five Chinese calligraphy styles. (Regular/楷书, Semi-
Cursive/行书, Cursive/草书, Seal/篆书, and Clerical/隶书
). The diagonal values (ranging from 0.93 to 0.98) 
demonstrate strong classification accuracy, with Regular 
script (楷书) achieving the highest accuracy at 98%. The 
most notable misclassifications occur between Cursive (
草书) and Semi-Cursive (行书), with 4% of Cursive 
samples incorrectly predicted as Semi-Cursive, likely due 
to their stylistic similarities in stroke connectivity. Other 
errors are minimal (≤3%), such as Seal (篆书) occasionally 
confused with Cursive (3%) or Clerical (隶书) with Semi-
Cursive (1%). The numerical gradient (1 to 0) implies a 
visual color scale for interpretation, where higher values 
(closer to 1) represent correct predictions and lower values 
(closer to 0) indicate errors. This analysis confirms the  
model’s robustness in distinguishing calligraphy styles 
Figure 11: ROC curve for the Suggested method 
while highlighting expected challenges in discriminating 
fluid, connected scripts like Cursive and Semi-Cursive. 
Figure 11 shows the CCR-LWECNN model's ROC curve 
how well it can differentiate between binary classes, is 
Evaluate with recent methods:  
shown below. With an AUC of around 0.72 in this 
For contemporary picture classification problems, models simulated example, the curve illustrates the trade-off 
such as BM-FKNN, Random Forest, and SVM are less between the True Positive Rate (sensitivity) and the False 
appropriate, particularly when dealing with high- Positive Rate. Better model performance is indicated by a 
dimensional data like calligraphy images. We contrasted larger AUC, and this visualisation aids in evaluating 
the suggested CCR-LWECNN with lightweight deep classification efficacy over a range of thresholds. 
learning models designed for low-resource settings in 
 
order to give a more relevant benchmark. Table 5 given by 
Outcome with comparison of recent methods 
 
 
 
 
 
374   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
4.3 Discussion which makes it simple to submit images and shows 
identification results with unambiguous visual feedback. 
The suggested CCR-LWECNN model is better at Low latency, usually less than one second, is guaranteed 
recognising Chinese calligraphy since it is more by its lightweight design, allowing for quick and seamless 
computationally efficient than deeper architectures like interaction. By enabling users to rapidly explore 
DenseNet and BiConvExtractNet. DenseNet is great at calligraphy styles and characters, this promotes real-time 
reusing features via dense connections, while usability in workshops, classrooms, and museum kiosks, 
BiConvExtractNet uses bidirectional convolutional improving learning experiences and engagement. 
extraction for jobs with a lot of complexity. However, both 
models frequently consume a lot of resources and are  
likely to overfit on small artistic datasets. CCR-LWECNN, 
on the other hand, has a lightweight structure with well 5 Conclusion 
adjusted convolutional layers and dropout regularisation. 
It gets 96.5% accuracy on a calligraphy dataset with much The goal of this research is to identify Deep Learning 
less complexity and training cost. The CCR-LWECNN models that can accurately identify and assess image 
model successfully captures the geometric regularity in processing technologies on a bigger dataset that includes 
seal script and the fluid stroke dynamics in cursive script, the majority of commonly used Chinese characters. This 
it works well on both seal and cursive styles. Both high- goal was accomplished as our models, which were 
level stylistic elements and low-level texture are extracted constructed using CCR-LWECNN, obtained an image 
by its layered design, and data augmentation guarantees recognition accuracy of 96.5% for a 960-character set, 
resilience to handwriting variances. which is more than three times larger than previous 
research of a comparable kind. Thus, we demonstrated 
CCR-LWECNN's decreased generalisation between that, with a very short dataset, it is possible to construct a 
calligraphers is a major drawback since intra-style lightweight CNN with excellent accuracies in character 
discrepancies might result from differences in individual and picture recognition models by combining the ReLU, 
stroke patterns, pressure, and spacing. When applied to dropout, and data augmentation. For users to better 
fewer-represented calligraphers or unexplored writing understand how they might do better in the future, the 
styles, the model may become less successful due to comparison tool could show which aspects of the 
overfitting to prevalent patterns in the training data.CCR- calligraphy work are problematic. Lastly, style and image 
LWECNN makes it possible to accurately and recognition models in non-printed calligraphy works in 
automatically classify calligraphy styles and characters, it other languages may benefit from the techniques shown in 
facilitates the digitisation, cataloguing, and analysis of this study.  
historical works at scale, hence supporting heritage CCR-LWECNN is utilized to increase the system's 
preservation and digital archiving. This makes it easier to efficacy. Using pictures of various calligraphy pieces, the 
do cultural study, teach, and preserve traditional Chinese system's ability to recognize Chinese calligraphy has been 
calligraphy in digital form across time. The CCR- demonstrated. Additional features, such a dictionary 
LWECNN-based system is perfect for educational and function, will be added to the system in the future by 
cultural applications because of its user-friendly interface, linking it to other databases. 
References
[1] Zeng, W. (2021, January). The Influence and directional diffusion equation. Applied Mathematics 
Communication of Chinese Calligraphy in South and Nonlinear Sciences, 8(1), 1509-1518. doi: 
Korea. In The 6th International Conference on Arts, https://doi.org/10.2478/amns.2022.2.0139 
Design and Contemporary Education (ICADCE 
2020) (pp. 720-723). Atlantis Press. doi: [4] Wong, A., So, J., & Ng, Z. T. B. (2024). Developing 
https://doi.org/10.2991/assehr.k.210106.137 a web application for Chinese calligraphy learners 
using convolutional neural network and scale 
[2] Lee, C. H., & Lee, Y. C. (2021). Effects of different invariant feature transform. Computers and 
finger grips and arm positions on the performance of Education: Artificial Intelligence, 6, 100200. doi: 
manipulating the Chinese brush in Chinese https://doi.org/10.1016/j.caeai.2024.100200 
adolescents. International Journal of Environmental 
Research and Public Health, 18(19), 10291. doi: [5] Khan, A., Sohail, A., Zahoora, U., & Qureshi, A. S. 
https://doi.org/10.3390/ijerph181910291 (2020). A survey of the recent architectures of deep 
convolutional neural networks. Artificial intelligence 
[3] Cai, W. (2022). Chinese painting and calligraphy review, 53, 5455-5516. doi: 
image recognition technology based on pseudo linear https://doi.org/10.1007/s10462-020-09825-6 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 361–376 375 
 
[6] Zhang, X., Li, Y., Zhang, Z., Konno, K., & Hu, S. In 2021 IEEE International Conference on Advances 
(2019). Intelligent Chinese calligraphy beautification in Electrical Engineering and Computer Applications 
from handwritten characters for robotic writing. The (AEECA) (pp. 405-410). IEEE. doi: 
Visual Computer, 35, 1193-1205.doi: https://doi.org/10.1109/aeeca52519.2021.9574199 
https://doi.org/10.1007/s00371-019-01675-w 
[16] Peng, X., Kang, J., Wu, Y., & Feng, X. (2022). 
[7] Wu, X., Chen, Q., Xiao, Y., Li, W., Liu, X., & Hu, B. Calligraphy Character Detection Based on Deep 
(2020). LCSegNet: An efficient semantic Convolutional Neural Network. Applied 
segmentation network for large-scale complex Sciences, 12(19), 9488. doi: 
Chinese character recognition. IEEE Transactions on https://doi.org/10.3390/app12199488 
Multimedia, 23, 3427-3440. doi: 
https://doi.org/10.1109/tmm.2020.3025696 [17] B. Bing, 2022. “Chinese Calligraphy Characters 
Image Set,” Kaggle.com, Available: 
[8] Liu, X., Hu, B., Chen, Q., Wu, X., & You, J. (2020). https://www.kaggle.com/datasets/bai224/chinese-
Stroke sequence-dependent deep convolutional neural calligraphy-characters-image-set.  
network for online handwritten Chinese character 
recognition. IEEE transactions on neural networks [18] Yuan, S., Wang, Y., Wang, X., Deng, H., Sun, S., 
and learning systems, 31(11), 4637-4648. doi: Wang, H., ... & Li, G. (2020, June). Chinese sign 
https://doi.org/10.1109/tnnls.2019.2956965 language alphabet recognition based on random forest 
algorithm. In 2020 IEEE International Workshop on 
[9] Li, Y., & Li, Y. (2021, June). Design and Metrology for Industry 4.0 & IoT (pp. 340-344). 
implementation of handwritten Chinese character IEEE. doi: 
recognition method based on CNN and TensorFlow. https://doi.org/10.1109/metroind4.0iot48571.2020.91
In 2021 IEEE International Conference on Artificial 38285 
Intelligence and Computer Applications 
(ICAICA) (pp. 878-882). IEEE. doi: [19] Eko, Y. P. (2021, October). Bonferroni Mean 
https://doi.org/10.1109/icaica52286.2021.9498146 Fuzzy K-Nearest Neighbors Based Handwritten 
Chinese Character Recognition. In 2021 International 
[10] Yang, L., Wu, Z., Xu, T., Du, J., & Wu, E. (2023). Conference on Data Science and Its Applications 
Easy recognition of artistic Chinese calligraphic (ICoDSA) (pp. 118-123). IEEE. doi: 
characters. The Visual Computer, 39(8), 3755-3766. https://doi.org/10.1109/icodsa53588.2021.9617488 
doi: https://doi.org/10.1007/s00371-023-03026-2 
[20] Cui, W., & Inoue, K. (2021). Chinese calligraphy 
[11] Pang, B., & Wu, J. (2020, August). Chinese recognition system based on convolutional neural 
calligraphy character image recognition and its network. ICIC Express Letters, 15(11), 1187-1195. 
applications in Web and Wechat applet platform. doi:  https://doi.org/10.24507/icicel.15.11.1187 
In Proceedings of the ACM/IEEE Joint Conference on 
Digital Libraries in 2020 (pp. 253-260). doi:  
https://doi.org/10.1145/3383583.3398516 
 
[12] Si, H. (2024). Analysis of calligraphy Chinese 
character recognition technology based on deep  
learning and computer-aided technology. Soft 
 
Computing, 28(1), 721-736. doi: 
https://doi.org/10.1007/s00500-023-09423-y  
[13] Li, M., & Ren, G. (2023). Intelligent Evaluation  
Method of Calligraphy Characters Based on Deep 
Stroke Extraction. Adv. Comput., Signals Syst., 7, 99-  
106.doi: https://doi.org/10.23977/acss.2023.071014 
 
[14] Huang, Q., Li, M., Agustin, D., Li, L., & Jha, M. 
(2023). A novel CNN model for classification of  
Chinese historical calligraphy styles in regular script 
font. Sensors, 24(1), 197.doi:  
https://doi.org/10.3390/s24010197 
 
[15] Chen, L. (2021, August). Research and 
application of chinese calligraphy character  
recognition algorithm based on image analysis. 
376   Informatica 49 (2025) 361–376                                                                                                                                 X. Chen et al. 
 
 
 
 
 
 
 
 
 
https://doi.org/10.31449/inf.v49i12.9916                                                                                      Informatica 49 (2025) 377–388   377 
 
Enhancing Machine Translation of English Complex Sentences Using 
Refined Gradient CNN on Large-Scale Corpora 
 
SiYuan Li 
Department of Business and Foreign Languages, Hunan International Business Vocational College, Changsha, Hunan, 
410200, China 
E-mail: lisiyuan198084@gmail.com 
 
Keywords:  optimization, machine translation algorithm, english complex long sentence, refined gradient - 
convolutional neural network 
Received: June 30, 2025 
Optimization of long and complicated sentences in English. Translating complex, lengthy statements from 
one language to another is the job of computer systems called machine translation algorithms (MTAs).  A 
machine translation assistant (MTA) that trains on a big data corpus is one that makes use of a diverse 
and extensive collection of textual resources to improve translation quality. Translating complex and 
lengthy English sentences poses significant challenges for machine translation (MT) systems, especially 
when preserving semantic accuracy. It introduces the Refined Gradient-CNN (RG-CNN) model as a post-
processing refinement mechanism to enhance phrase-level translation accuracy. The model is trained on 
a specially curated "Parallel Corpus" dataset comprising 1,563 English sentence pairs, including complex 
originals and their simplified counterparts. The RG-CNN employs gradient-enhanced convolution and 
bidirectional recurrent layers to capture and refine syntactic structures. The model is implemented using 
Python 3.11. Experimental results demonstrate the model’s superior performance. It achieved BLEU 
scores of 73.1% (corpus) and 70.1% (local), significantly outperforming. Likewise, RG-CNN reported a 
reduced WER of 0.3% (corpus) and 0.10% (local) compared to baseline models. Accuracy and recall were 
also improved to 97.51% and 98.43%, respectively, outperforming the baseline model. These results 
affirm RG-CNN's ability to optimize complex sentence translation, reduce ambiguities, and advance MT 
systems across diverse linguistic domains 
Povzetek: Model Refined Gradient-CNN (RG-CNN) je predlagan za izboljšanje strojnega prevajanja 
dolgih in kompleksnih angleških stavkov, zlasti za fraze. Model je treniran na obsežnem korpusu (1.563 
parov) in optimira prevode kompleksnih besednih struktur. 
 
1 Introduction processes and thus can translate difficult, lengthy texts 
with ease and efficiency [2]. With large language models, 
The English complex long sentence machine translation machine learning, high-performance NLP techniques, 
method is designed to be more efficient and accurate with context and semantics considered, efficiency boost, and a 
a lot of important parameters. The method ought to be human subject-matter expert feedback loop, all are 
context-aware and aware of the original sentence's included in the improvement of the English complex long 
meaning so that it can generate translations that are very sentence machine translation. All these in consideration, 
faithful to the original text. It should be capable of dealing the algorithm's translation capability for complex, long 
with idiomatic expressions, cultural references, and texts may be significantly enhanced [3]. The English 
metaphors to generate translations that are faithful to the complex sentence machine translation technique from a 
original text. In order to effectively address the large corpus of data needs to be trained to improve the 
computational requirements for analyzing compound, quality, accuracy, and fluency of the translation. The 
lengthy words, the approach can take advantage of parallel following are a few key considerations: Choose a large, 
processing methods and distributed computation models to broad-based, and comprehensive corpus that includes a 
offer optimal efficiency [1]. The research ensures that even range of topics, genres, and styles [4]. Align the source and 
sentences that are long and complex get translated outputs target sentences in the corpus to produce aligned sentence 
within a reasonable timeframe. Optimization and analysis pairs for generating the translation. It can learn English 
of the translated output can be facilitated by incorporating phrase-to-phrase mappings and their respective 
a human translator or linguist feedback loop. The computer translations through a critical phase while training a 
can learn from human experience through repeated supervised machine translation system. Apply parallel 
378   Informatica 49 (2025) 377–388                                                                                                                                               S. Li 
 
processing techniques and distributed computing and subject matters. Data collection and cleaning must be 
platforms for efficient management of the enormous performed cautiously when creating a large corpus of data. 
processing involved in training over an enormous data It is ensured that text data covers a wide range of language 
corpus. It is easy to scale up and accelerate the training [5]. and context variations by obtaining it from various sources 
Employ the most advanced neural network architectures, [10]. Preprocessing methods, such as tokenization, phrase 
for instance, transformer architectures, which have made breaking, and part-of-speech tagging, are employed in 
significant jumps in machine translation tasks. The models attempting to process data to train and analyze. The 
are better at handling complex sentence structure and long- various techniques are able to train the machine translation 
distance relations. For pre-training the machine translation models, such as statistical machine translation (SMT) and 
model, employ the pre-trained language models like BERT neural machine translation (NMT), upon the creation of the 
or GPT. Fine-tune the model on the huge data corpus [6], big data corpus. The models can learn more complex 
particularly for the translation of very hard, long sentences. phrase structures, language patterns, and correspondences 
Transfer learning helps the model learn to identify because they have larger training data. Big data corpora 
universal language use patterns, and fine-tuning helps it have helped machine translation advance [11]. Big data 
learn to follow the specific translation task that is being analysis enables researchers and developers to train and 
performed. Generate artificial sentence variations or better develop models, which enhance translation quality, 
paraphrases to improve the training set. The algorithm manage complex sentence structures better, are natural-
becomes stronger and more immune to complex, long sounding, and possess greater awareness of context. 
sentences by being exposed to more varied sentence forms MTAS should work much better when there is a massive 
and language variations. Employ the automated translation amount of data that is high in quality. To ensure the corpus 
system and collect user ratings of translations. Employ the is representative, precise, and unbiased, or noise-free, and 
feedback to improve the quality of the translations over does not adversely affect the translation outcome, 
time, using repeated usage in the training process. The meticulous data selection, preprocessing, and curation are 
research enables the algorithm to capture user preferences needed. Lastly, a large quantity of data as a corpus allows 
and personal translation challenges from complex, lengthy machine translation systems to learn from heterogeneous 
sentences [7]. Use standard measuring devices at regular linguistic data, leading to more efficient and robust 
intervals to test how well the improved algorithm is translation abilities for practically any level of sentence 
performing. Compare the algorithm with other cutting- complexity and linguistic diversity [12]. 
edge machine translation systems to evaluate its  
performance and what it needs to improve on. The English Key contributions: 
compound long sentence MTA can be translated better, 
more fluently, and contextually using the power of big data • The application of a certain machine translation 
corpora and implementing the optimization techniques. It algorithm to process lengthy, complicated words, 
generates more accurate, efficient, and effective designing the Refined Gradient-CNN model, 
applying a huge training set, and optimization 
translations of long compound sentences [8]. Large and 
methods used to enhance word translation 
diverse quantities of text data referred to as a "big data 
accuracy. 
corpus" undergo processing in machine translation and 
• This Research aims to overcome the difficulties 
other NLP tasks, training, testing, and also updating the in translating complex phrase patterns and 
processes. "Big data" thus addresses the sheer volume, enhance the general effectiveness of machine 
variety, and velocity of data that can be processed and translation systems. 
analyzed [9]. The size and variation of the corpus are also 
significant factors that decide the level at which it can be 2 Related work 
successfully used for training. With a large corpus, there 
can be complete utilization of different grammatical Research in many areas must include a literature survey, 
patterns, lexis, idiomatic expressions, and language commonly called a literature review or systematic review. 
variation. The variation of the corpus ensures that the It entails a thorough review and analysis of the research 
algorithm is trained in multiple domains and styles, body, shown in Table 1. 
thereby ensuring that it is capable of handling mixed styles 
 
Table 1: Literature survey 
References Objectives Summary of Findings Limitations 
[13] Research suggests that the segmentation of lengthy The study evaluated the features of professional Limited focus on 
phrases is made possible by the hierarchical literature and discussed a translation optimization structural 
network of ideas technique, which has been approach for professional literature, combining segmentation; lacks 
enhanced. statistics, which significantly increases. handling of deep 
syntactic variations 
in English 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 377–388 379 
 
[14] Research improved to build researchers used the The study focused on the neural machine translation Lacks real-time 
multi-objective optimization technique. The study model's probabilistic structure, which allows learning adaptation; 
also employs parallel corpora and monolingual researchers to draw conclusions about data-related limited performance 
corpora routes with an emphasis on node regularization items and apply them. on unseen sentence 
distribution and data flow analysis. structures 
[15] The study improved the framework to optimize a It showed that the newly suggested computer- Relies on memory-
computer-assisted translation system to increase assisted translation system can improve translation based context, 
the accuracy and reliability of automatic quality and intelligently translate memory-assisted which is insufficient 
translation of long-character English with long-character English with high data recall rates, for unseen phrase 
memory-assisted English. accuracy, and dependability. structures 
[16] The Research introduced by employing a word The overview objective of the goal showed that, in Focused only on 
corpus, the word alignment optimization approach comparison to the earlier methods, the suggested word alignment; 
enhances word alignment performance in the technique lowers the average alignment error rate.  doesn't address 
transformer system.  phrase-level 
semantic coherence. 
[17] The Research suggested that a model for The study evaluated the process of fuzzy semantic Limited 
calculating language-semantic correlation that selection achieved using a machine learning neural grammatical 
uses the best fuzzy semantics for English lengthy network adaptive learning technique. handling; doesn't 
sentences should be developed. scale well for 
professional or 
complex sentence 
contexts. 
[18] The study suggested language combinations and The Research focused on creating human and Data-centric 
collected and cleaned texts from diverse sources to automated assessments of the resulting models. approach; lacks 
form four parallel corpora, which were used to structural model 
build the translation system. improvements for 
long or technical 
texts 
[19] The overall objective of the goal was to explore the The Research investigated to show the viable Focused on a 
two different NMT algorithms, Bidirectional Long direction for Research to improve Bangla-English specific language 
Short-Term Memory (LSTM) and Transformer- NMT. pair; not 
based NMT, used for the Bangla-to-English generalizable to 
language pair. English complex 
phrase structure. 
[20] The study analyzed that well-known translator like The Research examines English, which has been the Not optimized for 
Google Translate do quite well when translating base or source language for the vast majority of NLP general MT 
between English, French, or Spanish. Still, studies research projects that have been discovered so far. performance; lacks 
make trivial mistakes when translating recently The study had several regularly spoken potential contextual learning 
introduced languages like Bengali, Arabic, etc. languages that still need to be explored. layers 
[21] The overall objective of the goal is to briefly The study showed that the backpropagation (BP) 
present the voice recognition neural network neural network recognized speech more quickly than Focused on speech 
technique. The machine translation method was artificial recognition and with a reduced word input; not optimized 
then put through simulation studies and contrasted mistake rate. for written complex 
with two additional machine translation language translation 
techniques. 
[22] To enhance Punjabi-to-English NMT translation Incorporating MWEs and word embeddings Limited to Punjabi-
by addressing out-of-vocabulary (OOV) words improved translation fluency and adequacy, English pair; does 
and multi-word expressions (MWEs). achieving BLEU scores of 15.45, 43.32, and 34.5 on not generalize to 
small, medium, and large test sets, respectively. other low-resource 
or morphologically 
rich languages. 
[23] BLEUₙ-based evaluation, residual comparison, NMT showed higher translation quality than SMT Focus limited to 
Google Translate and European Commission’s across all BLEUₙ scores English–Slovak; no 
Translation tool (EC) tool deep feature 
extraction or 
semantic ranking; 
lacks domain-
independent 
generalization 
 
3 Methodology  was a challenge for Transformer-based models [19] and 
Bidirectional LSTM models [19].  Furthermore, 
3.1   Problem statement conventional approaches have significant word error rates 
and are ineffective at managing contextual memory [15]. 
To improve the quality of translation of lengthy and The proposed Refined Gradient-CNN model overcomes 
complicated English sentences, especially for low- all of the above limitations by incorporating contextual 
resource language pairings, while existing methods like memory encoding and layered semantic mapping, which 
fuzzy semantic selection approaches [17] encourage improves translation accuracy for complex phrase 
phrase segmentation and semantic correlation, but cannot structures. 
be particularly effective at learning deep contextual 
 
relationships. Similarly, structural alignment for long-
sentence translation, especially minority languages [20], 
380   Informatica 49 (2025) 377–388                                                                                                                                               S. Li 
 
3.2   Experimental procedure Sentences that are distinguished by their length and 
complicated structure are referred to as being optimized 
This section provided a thorough explanation of how for English complex, lengthy sentences. Many clauses, 
the steps of the suggested design in Figure 1 were created, sub-clauses, phrases, modifiers, and dependent 
covered its creation process, and covered its key connections may be found in the sentences. Complex, 
components. This analysis has four parts: Information lengthy phrases may be difficult to understand, 
gathering is generally the main goal of the first stage. The comprehend, and translate because of their complex syntax 
second part included MTA for long English sentences. The and potential for ambiguity. The rearranging module 
most significant information is found in the third part, intends to fit more closely the short phrases' translation 
which describes the work performed to develop the with the language order following the combination by 
Refined Gradient-CNN model and compile the essential rearranging the short phrases generated through 
knowledge. The efficiency of each current and previous segmentation. Figure 2 depicts the upgraded intelligent 
design is presented in the fourth section. It is judged by MTA's flow. 
contrasting the pertinent factors. 
 
Figure 1: Methodological design  
A. Data collection Figure 2: Translation process of MTA for long English 
sentences 
To optimize sentences and enhance translation was 
validated it using the 1,563 English sentence pairings from The sentence segmentation module is designed to 
the "Parallel Corpus" dataset.  Each record includes meta- split lengthy English sentences into shorter, manageable 
data such as readability ratings, difficulty levels, and segments. This is accomplished by predicting the 
domains, as well as the original complicated text and its likelihood of each word being a segmentation point using 
simplified or optimized translation. The aim of decreasing a maximum entropy (MaxEnt) classifier. The MaxEnt 
language complexity, enhancing semantic retention in AI, approach is particularly suitable here because it models 
education, and NLP applications, and enhancing phrase conditional probabilities flexibly without assuming 
translation quality at the phrase level are all complemented independence among features. 
by this dataset. For model development and evaluation, the 
exp⁡(∑𝑗 𝑧𝑗ℎ𝑗(𝑣,𝑢(𝑧)))
dataset was partitioned into 70% for training (1,094 𝑜(𝑣|𝑢(𝑧)) =    
(∑𝑣 𝑒𝑥𝑝(∑𝑗 𝑧𝑗ℎ𝑗(𝑣′,𝑢(𝑧)))
samples), 15% for validation (235 samples), and 15% for 
                (1) 
testing (234 samples).This structured split enables robust 
model performance assessment across varied difficulty 
Where 𝑜(𝑣|𝑢(𝑧)),⁡the likelihood that a word 
levels and domain contexts, making it a reliable 
will𝑧 is a segmenting term in the lengthy statement, 𝑢(𝑧) 
benchmarK due to its rich linguistic annotations and 
is the background knowledge. 
domain diversity. It can serve as a thorough benchmark for 
model performance testing on a range of difficulty and 
After segmentation, the reordering module 
context aspects due to its extensive domain coverage and 
rearranges the segmented short sentences to reflect the 
strong language annotation.  
original logical flow. This is again modeled using a 
maximum entropy classifier, which estimates the 
Source: https://www.kaggle.com/datasets/ziya07/parallel-
likelihood of a correct sequence based on context and 
corpus-data/data  
neighboring sentence features. Equation (2) is the 
B. MTA for english complex long sentence appropriate computation and reads as follows: 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 377–388 381 
 
𝑡
𝑜(𝑝|𝐷𝑡 , 𝐷𝑛 exp⁡(∑𝑗 𝑧𝑗ℎ (𝑝 𝐷
𝑠 )
𝑗 , 𝑛,𝐷
𝑛
𝑠 )) phrases, adverbial phrases, and adjectival words. Chunks 
𝑛 =  
(∑𝑝.𝑒𝑥𝑝(∑𝑗 𝑧𝑗ℎ𝑗(𝐷
𝑡 𝑛  
𝑛,𝐷𝑠 )) help identify interrelations and roles among sentence 
      (2) constituents. Establish relationships and interdependence 
between the various constituents. Establish the verb-object 
The encoder receives the short English sentences 
relationships, subject-verb concordances, and other 
that have been segmented and re-ordered. The original text 
is encoded by the encoder using an LSTM model, and the syntactic relations that contribute to the general sentence 
resulting computation is given by Equation (3-7): form. Large sentences become richer in analysis and 
interpretation for linguists, NLP programmers, and 
𝑒𝑠 = 𝜎(𝑎𝑒 + 𝑋𝑒𝑢 + 𝑍𝑒𝑔 )   
𝑠 𝑠−1 machine translation algorithms when they are divided into 
      (3) smaller constituents. The proposed approach provides a 
better understanding of the syntactic form and semantic 
𝑡𝑠 = 𝑒𝑠𝑡𝑠−1 + ℎ𝑠𝜎(𝑎 + 𝑋𝑢 + 𝑍
𝑠 𝑒𝑔 )  
𝑠−1 connections of the text and is more convenient for 
                    (4) 
translation or further study with greater accuracy. Long-
ℎ distance segmentation is particularly useful in complex 
𝑠 = 𝜎(𝑎ℎ + 𝑋ℎ𝑢𝑠 + 𝑍𝑒𝑔 )   
𝑠−1
      (5) languages such as English, which possess complicated 
sentence forms with multiple clauses and modifiers. The 
ℎ𝑠 = 𝑡𝑎𝑛𝑔(𝑋ℎ)𝑟𝑠)    lengthy phrases can be broken down into smaller segments 
      (6) to lessen confusion and enhance the interpretation and 
comprehension of the entire message. By the 
𝑟𝑠 = 𝜎(𝑎𝑟 + 𝑋𝑟𝑢𝑠 + 𝑍𝑟𝑔𝑠−1)   
      (7) decomposition of huge words into smaller pieces or 
clauses, long-distance segmentation is important in the 
C. English long-distance segmentation study of language, NLP processing, and machine 
translation. It ensures that sentence structures can be 
English long-distance segmentation refers to breaking 
examined more systematically and explicitly, allowing for 
up a long sentence into sub-clauses or segments in order to 
free, correct comprehension, translation, and analysis of 
better understand and analyze it. It is commonly applied in 
complex language phenomena. 
linguistics, machine translation, and natural language 
processing (NLP) for splitting complicated sentences and D. Refined gradient-convolutional neural network 
grasping the syntactic structure of the sentence. The /9RG- CNN) design for big data 
dataset was chosen due to its relevance for tasks involving 
the reduction of linguistic complexity and the preservation There are several factors to consider and methods to 
use when developing huge amounts of data. To improve 
of semantic meaning, particularly within applications 
RG - -CNN design, particularly for large data situations, 
related to artificial intelligence, education, and natural 
there are differences between a regular CNN and the 
language processing (NLP). Its diverse domain proposed Refined Gradient-CNN (RG-CNN). To handle 
representation and robust linguistic annotations make it a complicated phrase structures and huge datasets more 
reliable benchmark for evaluating model performance effectively, the RG-CNN combines gradient-based 
across varying levels of sentence complexity and refinement, batch normalization, dropout, ReLU 
contextual nuance. Various clauses, words, and sub- variations, enhanced memory handling for big data, and 
sophisticated pooling methods. Take into consideration the 
clauses within a sentence are detected and separated from 
following key strategies: Big data typically encompasses a 
each other according to their grammatical relationship and very large number of input samples. The task is to develop 
dependencies via long-distance segmentation. The an RG - CNN model that is scalable to attack it. This might 
intention is to produce substantial sentences that can be involve employing parallel computing platforms, splitting 
learned separately or in conjunction with other, longer the job between numerous computers, and enhancing 
sentences. Find the sentence's main clauses or independent memory management to deal with large datasets. Big data 
typically must split the training task between numerous 
sections. The foundation elements convey complete ideas 
computer nodes or clusters. The training process can be 
and may be utilized alone as independent sentences. 
segmented based on techniques such as model parallelism 
Identify any subordinate or dependent sentences that and data parallelism, thereby providing faster convergence 
provide the direct clauses with explanation, background, and efficient use of resources. Batch normalization is one 
and information. The fines typically begin with relative of the techniques used to surmount the challenge of 
pronouns such as "who," "which," or "that" and training RG-CNN and other deep neural networks with 
subordinating conjunctions such as "although," "because," large sets of data. It scales and normalizes the activations 
of every layer of a network to help with faster and 
or "if." The sentence needs to be dissected into modifiers 
convergent training. Both the overall performance and the 
and relevant phrases. These include noun phrases, verb 
generalization of the RG-CNN model can be improved by 
382   Informatica 49 (2025) 377–388                                                                                                                                               S. Li 
 
batch normalization.  Overfitting of massive data must be ReduceLROnPlate
avoided using regularization methods. The task can au 
employ regularization and dropout in order to control the  
complexity of the model and induce more generalization. 
Regularization improves the ability of the model to deal 3.3   Convolutional Neural Network 
with natural noise and variance present in large data. The 
performance of CNN on extremely large data might be A CNN is made up of five parts: input data, a 
significantly affected by choosing the appropriate convolutional layer, a pooling layer, FC overlay, and an 
activation functions. Rectified Linear Units (ReLU), output vector. CNNs come in a variety of layer 
which reduce the vanishing gradient problem and the combinations. The CNN's structure, which was used in this 
training speed, have proven to be useful. In order to detect experiment, is shown in Figure 3.  
more intricate patterns, their variations, such as Leaky 
ReLU or Parametric ReLU, can be used. For handling 
enormous data, transfer learning can be used.  Starting 
points include pre-trained CNN models on huge datasets 
like ImageNet. The huge data may be fine-tuned better to 
fit the CNN to the particular job using the information 
gained from these models. The problem of little labeled 
data may be solved by transfer learning, which will 
enhance RG-CNN performance. Table 2 displays the RG-
CNN model hyperparameters.  
Table 2: RG-CNN model hyperparameters and 
configurations for effective training and convergence. 
Paramete  
Value / Range Description 
r Figure 3: Structure of Convolutional Neural Network 
Adaptive learning 
Optimizer Adam / AdamW 
for sparse data Finding intriguing patterns in the data is the goal 
Tuned using of the convolutional gradient task. Each of the many layers' 
learning rate convolutional kernels has a frequency and a divergence 
Learning 
1e-4 to 5e-4 scheduler (e.g., coefficient. It is assumed that 𝑢𝑗 is the weight parameter, 
Rate 
ReduceLROnPlatea 𝑎𝑗 is the divergence amount, and 𝑉𝑗−1 is the input to 
u) convolution layers 𝑗 while the inversion kernel 𝑗 is active. 
Batch Based on GPU 
64–128 One such expression for the convolution operation of 
Size memory Equation (8) is: 
Early stopping 
Epochs 10–30 based on BLEU 𝑉𝑗 = 𝑒(𝑢𝑗 ⊗𝑉𝑗−1 + 𝑎𝑗)    
validation                (8) 
Dropout To prevent 
0.3–0.5 
Rate overfitting Where 𝑉𝑗the output result of the convolution 
Activatio kernel 𝑗represents the convolution operation, and 𝑒(𝑥) 
ReLU / For non-linearity in 
n represents the activation function. 
LeakyReLU CNN layers 
Function 
Max The input data is swept repeatedly by the 
For padding and 
Sequence 100–150 tokens convolutional network, which then extracts the distinctive 
positional encoding 
Length information. In addition, the multilayer's operational 
Embeddin amplifier is changed to𝑅𝑒𝐿𝑈. The Linear transfer function 
512 / 768 (or match 
g Use with pre- is simpler to derive than the exponential transfer function 
transformer 
Dimensio trained embeddings and other training algorithms, allowing for faster model 
encoder) 
n training and better protection against gradient 
Kernel disappearing. It is possible to write 𝑅𝑒𝐿𝑈, which is 
For capturing n-
Size 3 × 3 / 5 × 5 represented in Equation (9) as: 
gram features 
(CNN) 
To retain the most 𝑉𝑗(𝑉𝑗 > 0)
Pooling MaxPooling 𝑅𝑒𝐿𝑈(𝑉𝑗) = {     
relevant features 0⁡(𝑉𝑗 > 0)
Gradient Prevent exploding             (9) 
1 
Clipping gradients 
Schedule Warm-up + Cosine Smoother The pooling layer's main function is to reduce 
r Annealing / convergence down-sampling data redundancy, which also aids in 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 377–388 383 
 
achieving invariance and reducing CNN complexity. The The input maps are produced as down-sampled 
two most popular ways to finish pooling are pooling layer copies using a sub-sampling layer. There will be exactly 𝑁 
and max pooling. If the study uses averaged pooling, the export maps if there are 𝑁 intake maps, albeit the final 
outcome is the computing area's arithmetic mean, but if the maps will be smaller. More formally, they are calculated 
study uses max pooling, the outcome is the computation as Equations (16), 
area's highest value. Max pooling was used for this 
investigation because it preserves important data better 𝑢𝑘 = 𝑒(𝛽𝑘𝑖 𝑑𝑜𝑤𝑛(𝑢
𝑘−1 𝑘
𝑖 𝑗 ) + 𝑎𝑗    
than average pooling. Equation (10) in mathematics                  (16) 
provides the maximum pooling: 
To identify that the patch in the input map 
𝑅𝑖 = 𝑚𝑎𝑥(𝑂0
𝑖 , 𝑂
1 2
𝑖 , 𝑂𝑖 , 𝑂
3
𝑖 , … . 𝑂𝑠
𝑖 , )   is related to a specific pixel in the output map, and to 
     (10) calculate the gradient of a kernel. Applying a delta 
recursion that resembles Equation (17-20) in this case 
Where⁡𝑅𝑖 ⁡𝑖𝑠⁡the return outcome of the pooled requires determining which area in the sensitivity map of 
region 𝑖, Max is the maximum pooling procedure, and 𝑂0
𝑖  the present layer corresponds to a particular pixel in the 
is the pooling area 𝑖 is the element 𝑠. sensitivity map of the following layer. The weights, being 
the weights of the (rotated) convolution kernel, are 
A CNN's "classifiers" are its layers. Its main increased by the relationship between the input patch and 
objective is to reorganize the data that the convolutional the output pixel. Convolution is once again used to do this 
and pooling layers retrieved and weighted from the effectively: 
hidden-layer space. A similar dropout method is 
implemented in the layer to randomly eliminate neurons to 𝛿𝑘𝑗 = 𝑒′(𝑢𝑘 1
𝑗 )°𝑐𝑜𝑛𝑣(𝛿
𝑘+1
𝑗 , 𝑟𝑜𝑡180(𝑙𝑘+ ), ′𝑓𝑢𝑙𝑙′𝑖 ) 
prevent over-fitting.                    (17) 
Let's determine the backpropagation updates for 𝜕𝐹
= ∑ (𝛿𝑘
𝜕𝑎 𝑥,𝑦 𝑗 )𝑥𝑦     
a network's convolutional layers. The output feature map 𝑖
is created by convolving the feature maps from the                    (18) 
preceding layer using learnable kernels and then 
𝑐𝑘 1
processing them via the activation function. Convolutions 𝑖 = 𝑑𝑜𝑤𝑛(𝑢𝑘−𝑖 )    
with numerous input maps may be combined in each      
output map. Equation (11) is often shown as,  (19) 
𝑈𝑘 = 𝑓(∑ 𝑈𝑘
𝑗∈𝑁 )  
𝑖 𝑖 ∗ 𝐾𝑘 𝑘
𝑖 𝑗𝑖 + 𝑎 𝜕𝐹
𝑖  = ∑𝑥,𝑦(𝛿
𝑘° 𝑘)     
𝜕𝑎 𝑖 𝑥𝑖 𝑥𝑦
𝑖
    (11)      
 (20) 
3.4   Computing the gradients 
3.5   CNN algorithm 
A down-sampling layer's map weights are all set 
to the same value𝛽, to determine the value of, we only The network weights are updated by the CNN 
scaled the result of the prior procedure by β. For each map, algorithm (1) by using a method known as 
we may do the same δ. Calculation again 𝑖Equation (12– backpropagation, depending on the error between the 
15) represents the pairing of the map from the layer of predicted and actual results. The CNN can learn and 
convolution and the associated map from the sub- develop its capacity to identify patterns and objects in 
samples layer: pictures due to this iterative process of forward 
𝑘 propagation (feeding data through the network), 
𝛿 = ⁡𝛽𝑘+1 𝑘
𝑖 𝑖 ((𝑒′(𝑥𝑖 )°𝑢𝑝(𝛿
𝑘+1
𝑖 ))   backpropagation, and object detection. In various 
                   (12) computer vision applications, such as picture 
classification, object recognition, and image segmentation, 
𝑢𝑝(𝑢) ≡ 𝑢⨂1𝑚×𝑚    
CNN methods have shown outstanding performance. We 
     have been used to various tasks, including autonomous 
 (13) driving, picture analysis in medicine, and face 
𝜕𝐹 identification. 
= ∑ (𝛿𝑘
𝜕𝑎 𝑥,𝑦 𝑖 )𝑥𝑦     
𝑖
                   (14) Algorithm 1: Convolutional Neural Network 
Function CNN (input_data); 
𝜕𝐹
=    // Convolutional layers  
𝜕𝑙𝑘𝑗𝑖
  For each convolutional layer: 
𝑟𝑜𝑡180(𝑐𝑜𝑛𝑣2(𝑢𝑘−1𝑗 , 𝑟𝑜𝑡180(𝛿𝑘−1𝑗 , 𝑟𝑜𝑡180(𝛿𝑘𝑗 ), 𝑣𝑎𝑙𝑖𝑑
′))        Convolution=apply convolution (input_data, 
                                             (15) weights)// Apply convo: 
384   Informatica 49 (2025) 377–388                                                                                                                                               S. Li 
 
      Activation=apply_activation (convolution) // Apply through enhanced entity prediction using a Multilayer 
activation function  Perceptron (MLP) layer. The MLP receives inputs from 
      Pooling=apply_pooling (activation)// Apply pooling the TransE pre-trained embeddings and refines them, 
operation (e.g) enhancing the expressive capability of the overall model. 
     // fully connected layers  Table 4 and Figure 4 illustrate the outcomes. The proposed 
Flattened=flatten (pooling) //Flatten the pooled feature RG-CNN model, which incorporates both bidirectional 
maps into a  training and the TransE+MLP architecture, achieves 
For each fully connected layer: significantly higher precision values: P@1 = 84%, P@5 = 
        Weights=initialize_weights () //Flatten the polled 88%, and P@10 = 98%. These metrics indicate that the 
feature maps into a model reliably ranks the correct simplified sentence within 
        Bias =initialize bias () // Initialize bias for the fully ( the top predicted candidates. Furthermore, the use of a 
Linear_transform = apply_linear_transform (flattened, Recurrent Neural Network (RNN) within RG-CNN 
weights, bias)  effectively captures bidirectional dependencies in the data, 
      Activation=apply_activation (linear_transform) contributing to performance that surpasses even the 
//Apply activation  TransE+MLP configuration. This validates the 
  //Output layer  architectural choice and demonstrates the robustness of the 
Out_weights -= initialize_output_weights () //Initialize proposed model. 
weight for 
Output_bias = initialize-output-bias () // Initialize bias for Table 4: Numerical outcomes of the Training strategy 
the output based on the algorithm 
Output= apply_linear_transform (activation, 
output_weights, output_bias) Test sets Percentage (%) 
Predicated_class = classify_output (output)// Classify the P@1 P@5 P@10 
output to dete TeansE 25 30 35 
Return predicted_class 
TeansE+ MLP 40 45 50 
 
FB15K 82 86 95 
4 Result and discussion 
PP1 70 75 80 
Results are always advised to reference the most Refined Gradient - CNN 84 88 98 
recent literature survey for the most up-to-date information [Proposed] 
on these themes, since the exact outcomes and  
improvements in the translation of complicated, lengthy 
phrases might differ based on the research and 
development efforts in the area. 
To improve the accuracy of translating complex English 
sentences in real time, the proposed RG-CNN model was 
built using Python 3.11. Table 3 demonstrates the 
simulation setup. 
Table 3: Simulation setup 
Component Recommended Specification 
NVIDIA A100 / RTX 3090 / 
GPU 
Tesla V100 
RAM 32–64 GB 
SSD (1 TB recommended for 
Storage 
large corpora like WMT) 
Framework PyTorch / TensorFlow 
Distributed Optional with Horovod / DDP  
Training (for WMT-scale corpora) 
 Figure 4: Comparison of Training strategy based on the 
algorithm 
4.1   English translation design using big data 
The suggested Refined Gradient-CNN model 
The results further emphasize the efficiency and outperforms the Improved Long Short-Term Memory 
rationale of the two training approaches, particularly the (LSTM) [24] and Hierarchical Network of Concepts 
bidirectional training strategy. This strategy, when (HNC) models [25]. It successfully enhances machine 
combined with TransE, demonstrates its effectiveness 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 377–388 385 
 
translation of complex language patterns by capturing machine translation quality by showcasing the model's 
complex phrase structures. enhanced capacity to manage complex phrase structure. 
The proposed Refined Gradient-CNN model outperformed Table 6: Comparative word error rate (%) analysis for 
the Improved LSTM [24], 31.3% and 3.9%, respectively, english phrase translation accuracy 
in terms of BLEU scores, as shown in Table 5 and Figure 
5, 73.1% for the corpus dataset and 70.1% for the local Methods Word Error Rate (%) 
dataset.  The outcomes demonstrate how well the proposed 
Corpus Corpus dataset 
model works to translate complex phrase structures with 
dataset 
greater n-gram overlap. This aligns with the goal of the 
study, which is to improve machine translation systems by 
raising overall translation quality and semantic integrity Improved LSTM [24] 0.9 1.1 
across a variety of datasets, especially for complex 
language structures. Refined Gradient- 0.3 0.10 
CNN [Proposed] 
Table 5: Comparison of BLUE score on corpus and local 
dataset  
Methods BLEU (%) 
Corpus Local dataset 
dataset 
Improved LSTM [24] 31.3 3.9 
Refined Gradient- 73.1 70.1 
CNN [Proposed] 
 
 
Figure 6:  Word Error Rate (%) comparison of various 
models 
Improving machine translation accuracy, the 
proposed Refined Gradient-CNN model significantly 
enhances the translation of the challenging English phrase 
structures. With 97.51% accuracy and 98.43% recall, the 
Refined Gradient-CNN outperformed the HNC technique 
[25], which achieved 93.38% accuracy and 94.51% recall, 
as shown in Table 7 and Figure 7. These improvements 
demonstrate the model's improved capacity to recognize 
and translate intricate phrase patterns more accurately. The 
results demonstrate that Refined Gradient-CNN improves 
machine translation systems' overall performance and 
efficiency in authentic language situations while also 
reducing translation ambiguities. 
Table 7: Comparison of accuracy and recall between 
 HNC and the proposed refined gradient-CNN model 
Figure 5: BLEU comparison of various models Methods Accuracy Recall (%) 
(%) 
The translation accuracy metric known as WER, 
HNC [25] 93.38 94.51 
which counts the number of insertions, deletions, and 
substitutions required to arrive at a reference translation, is Refined Gradient- 97.51 98.43 
shown in Table 6 and Figure 6. Higher translation CNN [Proposed] 
precision is indicated by a lower WER. Compared to the  
Improved LSTM [24] 0.9% and 1.1%, the WERs recorded 
by the Refined Gradient-CNN were significantly lower 
0.3% for the corpus and 0.10% for the local data. These 
findings align with the study goal of improving overall 
386   Informatica 49 (2025) 377–388                                                                                                                                               S. Li 
 
Refined Gradient-CNN (RG-CNN), which was trained on 
a specially constructed parallel corpus of 1,563 sentence 
pairs enhanced with readability scores, complexity labels, 
and domain-specific metadata.  RG-CNN effectively 
models structural intricacy, idiomatically controlled use, 
and long-distance relationships in English by fusing deep 
convolutional neural architecture with gradient-based 
smooth optimisation.  The model is better able to 
generalise across contexts and adjust to complex language 
patterns when it is exposed to a heterogeneous, metadata-
annotated corpus. Empirical evaluation confirms the 
model's excellent performance capability. With BLEU 
scores of 70.1% (local) and 73.1% (corpus). WER was also 
reduced to 0.3% (corpus) and 0.10% (local) compared to 
the improved LSTM [24] model.  RG-CNN beat traditional 
 models like HNC [25] in terms of classification 
performance, achieving 97.51% accuracy and 98.43% 
Figure 7: Performance metrics of various models in 
recall.  Hyperparameter tweaking was also used to increase 
translating complex phrase patterns 
the model's parameter efficiency in order to get optimal 
convergence and significantly reduce overfitting. The RG-
5   Discussion CNN model improved its translation ranking performance 
by using the Parallel Corpus dataset for bidirectional 
Enhancing the quality of machine translation at training with TransE + MLP. The model converted simple 
the phrase level, particularly when translating difficult and and complicated words into vectors and rated them based 
syntactically complicated English formulations. For on their proximity to the right response. The model 
lengthier phrases, existing models such as Improved achieved 84 percent accuracy (P@1), 88 percent (P@5), 
LSTM [24] have not been adequate in terms of structural and 98 percent (P@10), proving its effectiveness in NLP 
equivalence and semantic coherence. Similarly, because and text simplification tasks related to education. These 
there are fewer contextual generalizations, the HNC model outcomes collectively support the objective of building an 
[25] is unable to interpret nested and specialized phrase efficient, high-performance model for simplifying 
patterns.  complex English sentences across educational and NLP 
applications. 
The suggested Refined Gradient-CNN model 
effectively overcomes these drawbacks by integrating 
Future scope  
gradient-driven refinement into the convolution process 
for improved recognition of complex language structure.  
The new strategy is utilizing the power of big 
By reducing ambiguity in translation and enhancing 
linguistic data and optimization methods in solving the 
contextual knowledge, the approach increases the 
issues of translating intricate, long sentences, which will 
dependability of machine translation systems for technical 
end up improving the overall performance of the machine 
and professional communication procedures. The design is 
translation system. 
a significant improvement over earlier models in terms of 
structure, recall, and generalization. Acknowledgement: The Research is supported by: 
Construction of Training Modes for Business English 
Limitation Majors: A Perspective of Business Needs（
2021HXXM175) 
It requires an enormous amount of processing power 
to train models to handle complicated, lengthy words. 
Large-scale models could require a lot of memory and take References 
a long time to learn, rendering them unavailable to people 
or organizations with limited resources. There are many [1] Li, G., 2024. Research on Automatic Identification 
possible valid translations for a complicated statement, and of Machine English Translation Errors Based on 
the context or specific use conditions determine the Improved GLR Algorithm. Informatica, 48(6). 
preferred translation. This flexibility is hard to replicate https://doi.org/10.31449/inf.v48i6.5249 
with machine translation processes and to reasonably [2] Ruan, Yuexiang. 2022. “Design of Intelligent 
reflect the learning. Recognition English Translation Model Based on 
Deep Learning.” Journal of Mathematics 2022: 1–
6 Conclusion 10. https://doi.org/10.1155/2023/9893016. 
[3] Wang, Xi. 2021. “Translation Correction of 
To achieve notable gains in semantic retention and English Phrases Based on Optimized GLR 
translation quality, the English complex long sentence Algorithm.” Journal of Intelligent Systems 30 (1): 
machine translation architecture (MTA) is optimised using 868–80. https://doi.org/10.1515/jisys-2020-0132. 
CCR-LWECNN: A Lightweight CNN Framework for Chinese…                                              Informatica 49 (2025) 377–388 387 
 
[4] Zhang, Qiang. 2022. “Cross-Context Accurate Singapore: Springer Nature Singapore. DOI 
English Translation Method Based on the Machine 10.21203/rs.3.rs-5734365/v1 
Learning Model.” Mathematical Problems in [15] Song, Xin. 2021. “Intelligent English Translation 
Engineering 2022: 1–11. System Based on Evolutionary Multi-Objective 
https://doi.org/10.1155/2022/9396650. Optimization Algorithm.” Journal of Intelligent & 
[5] Quoc, T.N., Le Thanh, H. and Van, H.P., 2023. Fuzzy Systems 40 (4): 6327–37. 
Khmer-Vietnamese Neural Machine Translation https://doi.org/10.3233/jifs-189469.s 
Improvement Using Data Augmentation [16] Cao, Qianyu, and Hanmei Hao. 2021. “A Chaotic 
Strategies. Informatica, 47(3). Neural Network Model for English Machine 
https://doi.org/10.31449/inf.v47i3.4761  Translation Based on Big Data Analysis.” 
[6] Liang, J., and M. Du. 2022. “Two-Way Neural Computational Intelligence and Neuroscience 2021: 
Network Chinese-English Machine Translation 3274326. https://doi.org/10.1155/2021/3274326. 
Model Fused with Attention Mechanism. Ding B, [17] Yu, Jinlin, and Xiuli Ma. 2022. “English Translation 
Editor. Scientific Programming” 2022: 1–11. Model Based on Intelligent Recognition and Deep 
https://doi.10.1155/2022/9143845 Learning.” Wireless Communications and Mobile 
[7] Shen, Xiaoping, and Runjuan Qin. 2021. “Searching Computing 2022: 1–9. 
and Learning English Translation Long Text https://doi.org/10.1155/2022/3079775. 
Information Based on Heterogeneous [18] Li, H., and W. Xiong. 2022. “Analysis of the 
Multiprocessors and Data Mining.” Microprocessors Drawbacks of English-Chinese Intelligent Machine 
and Microsystems 82 (103895): 103895. Translation Based on Deep Learning.” In The 2021 
https://doi.org/10.1016/j.micpro.2021.103895. International Conference on Machine Learning and 
[8] Li, Xiaoyu. 2022. “The Impact of Big Data Big Data Analytics for IoT Security and Privacy: 
Technology on Phrase and Syntactic Coherence in SPIoT-2021, 1:104 11. Springer International 
English Translation.” Mathematical Problems in Publishing. DOI:10.2478/amns-2025-0565 
Engineering 2022: 1–11. [19] Dong, Z. 2022. Research on Machine Translation 
https://doi.org/10.1155/2022/1428748. Method of English-Chinese Long Sentences Based on 
[9] Suleiman, D., Etaiwi, W. and Awajan, A., 2021. Fuzzy Semantic Optimization. Mobile Information 
Recurrent neural network techniques: Emphasis on Systems. DOI:10.1155/2022/4863623 
use in neural machine translation. Informatica, 45(7). [20] Shao, D., and R. Ma. 2022. English Long Sentence 
https://doi.org/10.31449/inf.v45i7.5267 Segmentation and Translation Optimization of 
[10] Rawat, R., Raj, A.S.A., Chakrawarti, R.K., Sankaran, Professional Literature Based on Hierarchical 
K.S., Sarangi, S.K., Rawat, H. and Rawat, A., 2024. Network of Concepts. Mobile Information Systems. 
Enhanced Cybercrime Detection on Twitter Using DOI:10.1155/2022/3090115 
Aho-Corasick Algorithm and Machine Learning [21] Guo, Xiaohua. 2022. “Optimization of English 
Techniques. Informatica, 48(18). Machine Translation by Deep Neural Network under 
http://dx.doi.org/10.31449/inf.v48i18.6272 Artificial Intelligence.” Computational Intelligence 
[11] Farooq, Uzma, Mohd Shafry Mohd Rahim, and and Neuroscience 2022: 2003411. 
Adnan Abid. 2023. “A Multi-Stack RNN-Based https://doi.org/10.1155/2022/2003411 
Neural Machine Translation Model for English to [22] Garg, K.D., Shekhar, S., Kumar, A., Goyal, V., 
Pakistan Sign Language Translation.” Neural Sharma, B., Chengoden, R. and Srivastava, G., 2022. 
Computing & Applications 35 (18): 13225–38. Framework for handling rare word problems in 
https://doi.org/10.1007/s00521-023-08424-0. neural machine translation system using multi-word 
[12] Mahata, Sainik Kumar, Avishek Garain, Dipankar expressions. Applied Sciences, 12(21), p.11038. 
Das, and Sivaji Bandyopadhyay. 2022. https://doi.org/10.3390/app122111038 
“Simplification of English and Bengali Sentences for [23] Benkova, L., Munkova, D., Benko, Ľ. and Munk, M., 
Improving Quality of Machine Translation.” Neural 2021. Evaluation of English–Slovak neural and 
Processing Letters 54 (4): 3115–39. statistical machine translation. Applied 
https://doi.org/10.1007/s11063-022-10755-3. Sciences, 11(7), p.2948. 
[13] Bensalah, Nouhaila, Habib Ayad, Abdellah Adib, https://doi.org/10.3390/app11072948 
and Abdelhamid Ibn El Farouk. 2022. “Transformer [24] He, H., 2023. An intelligent algorithm for fast 
Model and Convolutional Neural Networks (CNNs) machine translation of long English 
for Arabic to English Machine Translation.” In sentences. Journal of Intelligent Systems, 32(1), 
Proceedings of the 5th International Conference on p.20220257. https://doi.org/10.1515/jisys-2022-
Big Data and Internet of Things, 399–410. Cham: 0257 
Springer International Publishing. DOI:10.1007/978- [25] Shao, D. and Ma, R., 2022. English long sentence 
3-031-07969-6_30 segmentation and translation optimization of 
[14] Yang, Jing, and Lina Fan. 2023. “Optimization professional literature based on a hierarchical 
Strategy of Machine Translation Algorithm for network of concepts. Mobile Information 
English Long Sentences Based on Semantic Systems, 2022(1), p.3090115. 
Relations.” In Lecture Notes on Data Engineering https://doi.org/10.1155/2022/3090115 
and Communications Technologies, 572–79. 
388   Informatica 49 (2025) 377–388                                                                                                                                               S. Li 
 
 
 
https://doi.org/10.31449/inf.v49i12.8951                                                                                      Informatica 49 (2025) 389–402   389 
 
AECO-SC StyleGAN: A Cross-Platform GAN Framework for 
Dynamic Advertising Creative Generation 
 
Yanan Zhang 
Yantai University of Science and Technology, School of Culture and Media, Penglai, Shandong, 265600, China 
E-mail: 15753572255@163.com 
 
Keywords: adaptive elephant clan optimizer (AECO), dynamic advertising creative, spatially conditioned StyleGAN, 
visual style consistency 
 
Received: April 18, 2025 
 
Dynamic advertising (ad) requires personalized, engaging content across multiple platforms. Traditional 
approaches struggle with scalability and cross-platform adaptation. Leveraging deep learning (DL), 
particularly Generative Adversarial Networks (GANs), offers the potential to automate and optimize ad 
creative generation with higher precision and contextual adaptability. This research aims to develop a DL 
framework that dynamically generates and optimizes advertising creatives—leveraging Adaptive 
Elephant Clan Optimizer with a Spatially Conditioned StyleGAN (AECO-SC StyleGAN) for dynamic 
cross-platform advertisement creative generation. Adaptive Elephant Clan Optimizer (AECO) 
dynamically adjusts training hyperparameters to improve model convergence, while Spatially 
Conditioned StyleGAN (SC-StyleGAN) generates platform-specific ad creatives by incorporating spatial 
constraints for contextual alignment. Our system is trained on the Ad ImageNet dataset, which includes 
9,003 ad samples with paired images and promotional text from platforms like Facebook and Instagram. 
All data were resized to 256×256, normalized, and tokenized for training. Using Python, the model 
demonstrates superior performance in creative generation and engagement prediction. The proposed 
AECO-SC StyleGAN model achieved an NDCG of 0.61, an accuracy of 98.48%, and a weighted F1-score 
of 98.5%, outperforming prior approaches such as VGG + Layout + NIMA (NDCG 0.22) and XCEPTION 
(accuracy 98.27%, F1-score 98.2%). These results highlight the effectiveness of integrating adaptive 
optimization and spatial conditioning in generating high-quality, context-aware advertising creatives, 
offer a scalable and automated solution for cross-platform digital marketing.   
Povzetek: Okvir AECO-SC StyleGAN uporablja globoko učenje (GAN in StyleGAN s prostorskim 
pogojevanjem) in adaptivno optimizacijo (AECO) za dinamično generiranje in optimizacijo visoko 
kakovostnih oglasnih vsebin na več platformah. Sistem dosega kvalitetno napovedovanju angažiranosti 
oglasov, kar avtomatizira prilagojeno digitalno trženje.
1 Introduction purposes to achieve continuous output improvement 
through adversarial learning processes [5]. Such adaptive 
Digital marketing leaders consider the capacity to send creativity improves advertising diversity and allows for 
customized advertising content across numerous platforms better metrics engagement, including the click-through 
as their main competitive advantage [1]. Advertising rate (CTR) and conversion rate (CVR). 
through traditional methods faces difficulties when The system processes real-time performance records in 
adjusting to user preference changes across multiple conjunction with user adjustments to automatically modify 
platform formats that include social media and mobile its creative elements and maintain platform usage between 
applications, together with websites [2]. DL techniques various platforms [6]. Digital marketing and artificial 
appear as groundbreaking solutions to address the intelligence maintain an expanding relationship that gives 
marketing challenges that face the optimization of ad marketers data-driven creative solutions to overcome their 
creativity. GANs have become prominent among these DL creative limitations [7]. This GAN-based framework 
techniques because they enable the production of high- provides automated design solutions for advertising 
quality, realistic, adaptable content [3]. The procedure content, which enable more efficient personalized 
enables machine automation to produce relevant visual advertising in an online environment dominated by 
advertisements with appropriate platform parameters for competition. The aim was to develop a DL framework that 
distinct user groups [4]. The model uses dual-network utilizes the AECO-SC StyleGAN to dynamically generate, 
training, which combines a generator for designing ad refine, and optimize advertising creatives across multiple 
variants together with a discriminator for evaluation 
390   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
platforms for enhanced personalization and performance The Automated Creative Optimization (AutoCO) 
in digital marketing campaigns.  framework outperformed baselines, achieving lower 
The proposed AECO algorithm is a derivative-free cumulative regret and a 7% CTR increase in online A/B 
population-based metaheuristic. It simulates the social testing. 
behavior of elephants in clans to adaptively adjust The two-stage dynamic and creative optimization 
hyperparameters by exploring the solution space through framework, combining AutoCo and a transformer-based 
stochastic position updates, without relying on the gradient rerank model to improve CTR prediction and creative 
of the loss function. This enables AECO to dynamically ranking under ambiguous data conditions, was developed 
optimize hyperparameters such as learning rate, style by [10]. Experimental and online testing showed a 10% 
weights, and batch size during GAN training, improving CTR improvement over baselines, demonstrating superior 
convergence stability and creative output quality. performance. The integration of a Particle Swarm 
The structure presents the development and evaluation of Optimization-based Recurrent Neural Network (PSO-
an intelligent advertising creative generation system using based RNN) algorithm with Computer-Aided Design 
AECO and SC-StyleGAN. Section 2 reviews related (CAD) tools to automate the generation and optimization 
works, highlighting recent advancements in GAN-based of advertising artistic designs, enhancing design efficiency 
advertising optimization and cross-platform creative and creativity, was explored by [11]. 
generation. Section 3 outlines the proposed methodology, The Dynamic Creative Optimization (DCO) problem, by 
detailing the preprocessing of advertising data, AECO- determining the optimal product and creative ad 
based hyperparameter tuning, and the architecture of the combination under constraints like ad fatigue and user 
SC-StyleGAN for spatially contextual ad generation. diversity, was examined [12]. The advertising design, 
Section 4 discusses the experimental setup, performance creativity, and efficiency were enhanced by integrating 
evaluation, and comparative results with baseline models. CAD technology and data-driven automation [13]. The 
Section 5 concludes with future directions to enhance developed model enabled the automated generation of 
scalability, personalization, and real-time adaptability in diverse advertising designs, successfully reflecting 
advertising technologies. creative schemes and allowing quantitative evaluation, 
 thus validating its effectiveness in promoting innovative 
2 Related works advertising solutions. The integration of artificial 
intelligence in advertising with an emphasis on content 
Automating the generation of ad creatives from landing production, targeting, personalization, and ad optimization 
pages using abstractive text summarization, enabling rapid was explored by [14]. Table 1 provides a comparative 
experimentation in large-scale marketing campaigns, was overview of recent GAN-based approaches. While 
examined by [8]. The advertising creative optimization by previous studies have focused on general image synthesis 
modeling complex interactions between creative elements or aesthetic enhancement, none leverage spatial 
and improving Click-Through Rate (CTR) prediction conditioning and adaptive optimization specifically for 
using an Auto ML-inspired framework was enhanced [9]. advertising creative generation. 
 
 
Table 1: Conventional approaches of GAN for dynamic advertising creative optimization 
Study Model Dataset Metrics Used Key Results Limitation 
Jiang et al. StyleGAN Proprietary Ad CTR, Qualitative CTR Limited 
[15] (AdSEE) Dataset Feedback improvement: generalizability 
+12% 
Shilova et al. Diffusion + User Behavior Personalization +15% relevance High computational 
[16] Outpainting + Ad Images Score cost 
Xu et al. [17] PDA-GAN PubLayNet, Layout Accuracy, Improved layout Focused on layout 
Rico Realism realism generation 
Aghazadeh Various (CAP Generated Ad CAP (Creativity, Structured ad No generative 
et al. [18] Evaluation) Images Alignment, quality eval model proposed 
Persuasion) 
Ma and Zhao Enhanced Logo Design FID, User Rating FID: 23.4, User Focused only on 
[19] DCGAN Dataset preference logos, not full ads 
2.1 Problem statement  traditional methods which results in reduced engagement 
and inconsistency [15] [16]. Although GANs make 
Digital marketing teams need to use personalized and automation possible, current models usually do not take 
platform-specific ads, but it is difficult to scale them using context into account [17], run on computers for long 
AECO-SC StyleGAN: A Cross-Platform GAN Framework for…                                              Informatica 49 (2025) 389–402 391 
 
periods [16] or mostly generate logos [18]. To overcome The Ad ImageNet dataset contains image-text 
these problems, the study offers a framework that uses advertisement samples, with images typically sized and 
Spatially Conditioned GANs, AECO for hyperparameter text averaging. For data preprocessing, images are resized 
adjustment and combines various input types to improve to uniform dimensions, normalized for intensity 
the quality and performance of automatic ad creation. consistency, and tokenized for textual elements. The 
 AECO-SC StyleGAN framework uses spatial conditioning 
 and adaptive optimization to dynamically generate and 
3 Methodology optimize advertising creatives for enhanced cross-platform 
engagement and performance. Figure 1 shows the general 
outline of the methodological approach. 
 
 
 
Figure 1: General outline of the methodological approach 
 
3.1 Data collection pixel image size was used to simplify the original image 
size. This input data type was created by resizing the image 
The Ad ImageNet dataset, sourced from the Peter Brendan using bicubic interpolation. Because the outcome was 
repository, consists of 9,003 image-text advertisement smoother at the edges than with bilinear interpolation, this 
samples totaling approximately 682 MB in size. Each entry approach was chosen. Bicubic was the perfect balance 
includes a banner-style advertisement image along with between process time and high-quality results. The bicubic 
associated promotional text. The dataset captures a variety interpolation estimates the pixels in the (𝑗, 𝑖) positions 
of standard ad dimensions, with the most frequent being using a sampling (𝑆) distance of 16 nearby pixels (4𝑥4) in 
254 × 254 pixels, commonly used in digital marketing. The equations (1)-(5). Figure 2 illustrates (a) before resizing 
textual content varies in length, with an average of around and (b) after resizing. 
525 characters, covering diverse product and event 𝑔𝑗,𝑖 = [𝑋−1(𝑇𝑧)𝑋0(𝑇𝑧)𝑋1(𝑇𝑧)𝑋2(𝑇𝑧)] 
advertisements. The dataset was split into 70% training, 𝑔 𝑔
𝑗−1,𝑖−1 𝑔𝑗,𝑖−1 𝑗+1,𝑖−1 𝑔𝑗+2,𝑖−1
15% validation, and 15% testing sets to ensure robust  𝑒  
 𝑔 𝑔
𝑗−1,𝑖 𝑔𝑗,𝑖 𝑗+1,𝑖 𝑗+2,𝑖  
performance evaluation across unseen data.    𝑔 𝑔
𝑗−1,𝑖+1 𝑔𝑗,𝑖+1 𝑗+1,𝑖+1 𝑔𝑗+2,𝑖+1
Source:   
[𝑔 𝑔
𝑗−1,𝑖+2 𝑔𝑗,𝑖+2 𝑗+1,𝑖+2 𝑔𝑗+2,𝑖+2]
https://huggingface.co/datasets/PeterBrendan/AdImageNe
𝑋−1    (𝑇𝑤)
t 
𝑋0      (𝑇[ 𝑤)
 ] 
𝑋1     (𝑇𝑤)
3.2 Data preprocessing using image resizing 𝑋2     (𝑇𝑤)
 
Image resizing standardizes input dimensions, enabling 
                                                                             (1) 
consistent data processing for DL models and optimizing 
Where: 𝑇𝑧 = 𝑖
′ − 𝑖,      𝑇𝑤 = 𝑗
′ − 𝑗 and  
the generation of dynamic advertising creative. Although 
𝑔𝑗,𝑖 = 𝑝𝑖𝑥𝑒𝑙 𝑣𝑎𝑙𝑢𝑒 𝑎𝑡 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 (𝑗, 𝑖) the original dataset images varied in size most frequently 
−𝑇3+2𝑇2−𝑇
256 × 256 pixels the images were uniformly resized to 224 𝑋−1(𝑇) =          (2) 
2
× 224 × 3 pixels for this research. The 244 ×  224 𝑥 3- −3𝑇3+5𝑇2+2
𝑋0(𝑇) =          (3) 
2
392   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
−3𝑇3+4𝑇2+2
𝑋1(𝑇) =          (4) 
2
𝑇3−𝑇2
𝑋2(𝑇) =           (5) 
2
 
Figure 3: Tokenization outcome (a) Positive Worlds 
Cloud and (b) Ad Image Net Words Cloud 
 
3.3 AECO-SC StyleGAN 
The hybrid deep learning framework is called the AECO-
 SC StyleGAN and is meant to dynamically develop and 
Figure 2: (a) Before resizing and (b) After resizing improve advertisement creative work. The method 
 implements AECO, a metaheuristic that studies elephant 
3.2.1 Tokenization behavior, together with SC-StyleGAN, a modified GAN 
The raw advertising text was first processed with that adds spatial and contextual inputs. The adaptive 
tokenization so that the model could efficiently generate approach of AECO to changing hyperparameters leads to 
and improve dynamic ad content. The text for promotion faster learning and better exploration of various solutions, 
was separated into words, phrases and sentences using compared to the Adam optimizer. SC-StyleGAN makes 
natural language processing. Turning unstructured text use of semantic maps, sketches and embeddings from 
into a structured form made it simpler to examine texts and different sources to produce images that look good when 
connect models. Keeping the important connections and used in ads. When combined, this integration improves 
order in each sentence, tokenization protected the key how creative works on ads, how it is predicted to be 
meaning needed to make an ad relevant. Because the text received by the target audience and how well it adapts to 
in ads is generally brief, simple tokenization and various digital platforms, giving a solid, effective system 
embedding were adequate. The fact that it is lightweight for today’s data-driven advertising. The AECO-SC 
helped the system express meaning with little cost which StyleGAN for Ad Creative Generation in algorithm 1. 
improved the performance of the AECO-SC StyleGAN  
framework. Figure 3 shows (a) Positive Worlds Cloud and 
(b) Ad Image Net Words Cloud. 
Algorithm 1: AECO-SC StyleGAN for Ad Creative Generation 
𝑆𝑡𝑒𝑝 1: 𝑆𝑒𝑡𝑢𝑝 
𝑑𝑒𝑓 𝑠𝑒𝑡𝑢𝑝(): 
    𝑁,𝑀, 𝐺, 𝑇 =  𝑛𝑢𝑚_ℎ𝑦𝑝𝑒𝑟𝑝𝑎𝑟𝑎𝑚𝑠(), 40, 5, 100 
    𝑃_𝑚, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀 =  0.3, 1.0, 0.8, 0.7, 0.5 
    𝑑𝑎𝑡𝑎 =  𝑙𝑜𝑎𝑑_𝑎𝑑𝑣𝑒𝑟𝑡_𝑑𝑎𝑡𝑎𝑠𝑒𝑡() 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑁,𝑀, 𝐺, 𝑇, 𝑃_𝑚, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀, 𝑑𝑎𝑡𝑎 
𝑆𝑡𝑒𝑝 2: 𝐼𝑛𝑖𝑡𝑖𝑎𝑙𝑖𝑧𝑒 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 
𝑑𝑒𝑓 𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁): 
    𝑟𝑒𝑡𝑢𝑟𝑛 [{′𝑝𝑎𝑟𝑎𝑚𝑠′: 𝑟𝑎𝑛𝑑_𝑣𝑒𝑐(𝑁), ′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′: 𝑁𝑜𝑛𝑒} 𝑓𝑜𝑟 _ 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑀)] 
𝑆𝑡𝑒𝑝 3: 𝐸𝑣𝑎𝑙𝑢𝑎𝑡𝑒 𝐹𝑖𝑡𝑛𝑒𝑠𝑠 
𝑑𝑒𝑓 𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑖𝑛𝑑, 𝑑𝑎𝑡𝑎, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀): 
    𝑚𝑜𝑑𝑒𝑙 =  𝑡𝑟𝑎𝑖𝑛_𝑆𝐶_𝑆𝑡𝑦𝑙𝑒𝐺𝐴𝑁(𝑖𝑛𝑑[′𝑝𝑎𝑟𝑎𝑚𝑠′], 𝑑𝑎𝑡𝑎, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀) 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑐𝑜𝑚𝑝𝑢𝑡𝑒_𝑙𝑜𝑠𝑠(𝑚𝑜𝑑𝑒𝑙, 𝑑𝑎𝑡𝑎) 
𝑆𝑡𝑒𝑝 4: 𝐶𝑙𝑎𝑛 𝑈𝑝𝑑𝑎𝑡𝑒 
𝑑𝑒𝑓 𝑐𝑙𝑎𝑛_𝑢𝑝𝑑𝑎𝑡𝑒(𝑝𝑜𝑝, 𝑔𝑏𝑒𝑠𝑡): 
    𝑓𝑜𝑟 𝑐𝑙𝑎𝑛 𝑖𝑛 𝑓𝑜𝑟𝑚_𝑐𝑙𝑎𝑛𝑠(𝑝𝑜𝑝): 
        𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ =  𝑚𝑖𝑛(𝑐𝑙𝑎𝑛, 𝑘𝑒𝑦 = 𝑙𝑎𝑚𝑏𝑑𝑎 𝑥: 𝑥[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 
        𝑓𝑜𝑟 𝑒 𝑖𝑛 𝑐𝑙𝑎𝑛: 
            𝑖𝑓 𝑒 ! =  𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ: 
AECO-SC StyleGAN: A Cross-Platform GAN Framework for…                                              Informatica 49 (2025) 389–402 393 
 
                𝑒[′𝑝𝑎𝑟𝑎𝑚𝑠′] +=  𝑟𝑎𝑛𝑑()  ∗  (𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ[′𝑝𝑎𝑟𝑎𝑚𝑠′]  −  𝑒[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 
        𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ[′𝑝𝑎𝑟𝑎𝑚𝑠′] +=  𝑟𝑎𝑛𝑑()  ∗  (𝑔𝑏𝑒𝑠𝑡[′𝑝𝑎𝑟𝑎𝑚𝑠′]  −  𝑚𝑎𝑡𝑟𝑖𝑎𝑟𝑐ℎ[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 
𝑆𝑡𝑒𝑝 5: 𝑀𝑎𝑙𝑒 𝑈𝑝𝑑𝑎𝑡𝑒 & 𝐸𝑣𝑜𝑙𝑢𝑡𝑖𝑜𝑛 
𝑑𝑒𝑓 𝑚𝑎𝑙𝑒_𝑎𝑛𝑑_𝑒𝑣𝑜𝑙𝑢𝑡𝑖𝑜𝑛(𝑝𝑜𝑝, 𝑃_𝑚): 
    𝑚𝑎𝑙𝑒𝑠 =  𝑠𝑒𝑙𝑒𝑐𝑡_𝑚𝑎𝑙𝑒𝑠(𝑝𝑜𝑝, 𝑃_𝑚) 
    𝑐𝑒𝑛𝑡𝑒𝑟 =  𝑚𝑒𝑎𝑛_𝑣𝑒𝑐([𝑒[′𝑝𝑎𝑟𝑎𝑚𝑠′] 𝑓𝑜𝑟 𝑒 𝑖𝑛 𝑝𝑜𝑝]) 
    𝑓𝑜𝑟 𝑚 𝑖𝑛 𝑚𝑎𝑙𝑒𝑠: 
        𝑚[′𝑝𝑎𝑟𝑎𝑚𝑠′] +=  𝑟𝑎𝑛𝑑()  ∗  (𝑐𝑒𝑛𝑡𝑒𝑟 −  𝑚[′𝑝𝑎𝑟𝑎𝑚𝑠′]) 
    𝑟𝑒𝑝𝑙𝑎𝑐𝑒_𝑤𝑒𝑎𝑘𝑒𝑠𝑡(𝑝𝑜𝑝) 
    𝑝𝑜𝑝. 𝑎𝑝𝑝𝑒𝑛𝑑(𝑔𝑒𝑛𝑒𝑟𝑎𝑡𝑒_𝑐𝑎𝑙𝑓(𝑝𝑜𝑝)) 
    𝑟𝑎𝑛𝑑𝑜𝑚_𝑟𝑒𝑠𝑒𝑡_𝑏𝑜𝑡𝑡𝑜𝑚(𝑝𝑜𝑝, 𝑝𝑐𝑡 = 0.3) 
𝑆𝑡𝑒𝑝 6: 𝑀𝑎𝑖𝑛 𝑂𝑝𝑡𝑖𝑚𝑖𝑧𝑎𝑡𝑖𝑜𝑛 
𝑑𝑒𝑓 𝑜𝑝𝑡𝑖𝑚𝑖𝑧𝑒_𝐴𝐸𝐶𝑂_𝑆𝐶𝑆𝑡𝑦𝑙𝑒𝐺𝐴𝑁(): 
    𝑁,𝑀, 𝐺, 𝑇, 𝑃_𝑚, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀, 𝑑𝑎𝑡𝑎 =  𝑠𝑒𝑡𝑢𝑝() 
    𝑝𝑜𝑝 =  𝑖𝑛𝑖𝑡_𝑝𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛(𝑀,𝑁) 
    𝑓𝑜𝑟 _ 𝑖𝑛 𝑟𝑎𝑛𝑔𝑒(𝑇): 
        𝑓𝑜𝑟 𝑒 𝑖𝑛 𝑝𝑜𝑝: 
            𝑒[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]  =  𝑒𝑣𝑎𝑙𝑢𝑎𝑡𝑒(𝑒, 𝑑𝑎𝑡𝑎, 𝜆1, 𝜆𝐺𝑃, 𝜆𝐿𝑃, 𝜆𝐹𝑀) 
        𝑔𝑏𝑒𝑠𝑡 =  𝑚𝑖𝑛(𝑝𝑜𝑝, 𝑘𝑒𝑦 = 𝑙𝑎𝑚𝑏𝑑𝑎 𝑥: 𝑥[′𝑓𝑖𝑡𝑛𝑒𝑠𝑠′]) 
        𝑐𝑙𝑎𝑛_𝑢𝑝𝑑𝑎𝑡𝑒(𝑝𝑜𝑝, 𝑔𝑏𝑒𝑠𝑡) 
        𝑚𝑎𝑙𝑒_𝑎𝑛𝑑_𝑒𝑣𝑜𝑙𝑢𝑡𝑖𝑜𝑛(𝑝𝑜𝑝, 𝑃_𝑚) 
    𝑟𝑒𝑡𝑢𝑟𝑛 𝑔𝑏𝑒𝑠𝑡 
  
3.3.1 SC StyleGAN influence microstructure and facial features, and fine styles 
SC-StyleGAN enables location-specific control over regulate high-frequency details and textures.  Non-visual 
visual features, enhancing the deep learning framework’s data like captions, CTR, and demographics are encoded 
ability to dynamically generate and optimize personalized, into embeddings using text encoders  
visually consistent advertising creatives across different  
contexts. The StyleGAN network generates high-quality and fully connected layers. These embeddings are fused 
images by applying an 18 ×  512 style code to 18 layers with spatial inputs (semantic maps and sketches) through 
of the network. It starts with a constant 4 ×  4 feature map modulation layers that adjust the style codes, allowing SC-
and progressively grows by a factor of 2 at each stage, StyleGAN to generate creatives tailored to both visual 
ultimately producing images of up to 1024 × features and user/context data. For training and evaluation 
 1024 pixels. Each style block receives a 1 ×  512 style in this study, input images were uniformly resized to 
code that modulates convolution operations, enabling fine 224 ×  224 pixels, serving as the initial resolution before 
control over visual attributes. These style codes the progressive growth to higher resolutions during 
correspond to different levels of detail: coarse styles affect generation. Figure 4 illustrates the network architecture of 
the overall layout and color schemes, middle styles  SC-StyleGAN. 
 
 
Figure 4: Network architecture of SC-StyleGAN 
394   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
 
SC-StyleGAN was a conditional generation system that local perceptual loss (𝐾𝐿𝑃) improves detail by comparing 
uses a semantic map and sketches to identify spatial image patches, and the feature matching loss (𝐾𝐹𝑀) 
features for coarse and intermediate styles. It consists of stabilizes training by aligning intermediate features. 
two sub-networks: the production network, which uses Together, these losses guide the network toward realistic 
layers, and the spatial encoding network, which maps input and context-aware ad generation. The perceptual metrics 
conditions to intermediates. Two encoding modules are (LPIPS) measure the overall perceptual loss after shrinking 
suggested for the spatial encoding network, which the target and synthesized images to 64 x 64. The local 
individually translates the semantic map and 512 × perceptive loss (LLP) and the global perceptive loss (LGP) 
 512 sketches into 64 ×  256 ×  256 spatial feature are expressed mathematically as follows in equations (7)-
maps. With a spatial dimension of 32 𝑥 32, the combined (8). 
map of features was encoded to correspond with the  
coarse-moderated style in the StyleGAN synthesizing 𝐾 𝑟𝑒
𝐺𝑃(𝐽𝑔𝑡 , 𝐽𝑠𝑦𝑛) = 𝐿𝑃𝐼𝑃𝑆(𝐽𝑔𝑡 , 𝐽
𝑟𝑒
𝑠𝑦𝑛)       (7) 
module. To create a 32 𝑥 32 intermediate image, the same 1
𝐾𝐿𝑃(𝐽𝑔𝑡 , 𝐽𝑠𝑦𝑛) = ∑𝐿 𝐿𝑃𝐼𝑃𝑆(𝐽𝑙 , 𝐽𝑙
𝐿 𝑙=1 𝑔𝑡 𝑠𝑦𝑛)      (8) 
steps are followed for the spatial intermediate feature map. 
 
Table 2 represents the architectural and computational 
Where 𝐽𝑟𝑒 𝑟
𝑔𝑡and 𝐽 𝑒
𝑠𝑦𝑛are the resized reality and, synthesizing 
footprint of the SC-StyleGAN model.  
 the image, respectively, and 𝐿𝑃𝐼𝑃𝑆(. , . ) was the 
Table 2: SC-StyleGAN architecture details: input perceptual measuring function. In each phase, 𝐽𝑙𝑔𝑡and 
dimensions, layer-wise parameters, and computational 𝐽𝑙𝑠𝑦𝑛stand for the 𝑘 − 𝑡ℎ randomly clipped ground truth 
complexity and synthetic patches, respectively, in equation (9). 
  
Comp Layer Input Output Par FL 1
𝐾𝐹𝑀 = ∑𝑘||𝐻
𝑘(𝑔𝑡) − 𝐻𝑘(𝑠𝑦𝑛)||        (9) 
𝑀 1
onent Type Shape Shape ams OPs 
 
Semant Conv2 512×51 64×256 ~3.1 ~2.5
Where 𝐻𝑘(. ) was the output map of features of the pre-
ic D + 2×3 ×256 M B 
trained StyleGAN synthesizing network's 𝑘 − 𝑡ℎ 
Encode ReLU 
resolution block (with a matching spatial resolution of 2𝑘). 
r (E_s) × 4 
The number of computed blocks was 𝑀. Following the 
Sketch Conv2 512×51 64×256 ~2.8 ~2.2
replacement resolution block, it computes the 𝐾1  norm 
Encode D + 2×1 ×256 M B 
between the ground truth generation and the synthesized 
r (E_k) ReLU 
processes(𝑘 ∈ {6,7,8,9}𝑎𝑛𝑑 𝑀 = 4). 
× 4 
 
Spatial Add/C 64×256 64×32× ~0.6 ~0.3
3.3.2 AECO 
Combi oncat + ×256 32 M B 
To improve model convergence, stability, and learning 
ner Down efficiency, AECO dynamically modifies training 
Conv hyperparameters. This raises the caliber and efficacy of 
StyleG StyleB 1×512 1024×1 ~30 ~75 engagement of generated ad creatives spanning platforms. 
AN lock × 024×3 M B The AECO enhances the DL framework by efficiently 
(Synth 18 optimizing parameters, enabling dynamic generation of 
esis personalized advertising creative through adaptive search, 
Net) exploration, and convergence strategies. The ECO was 
 enhanced into an improved version to support a DL 
Objective Function: The SC-StyleGAN was to precisely framework that dynamically generates and optimizes 
map the determined conditions to their equivalents in the advertising creatives. This AECO algorithm addresses the 
synthesis process while encoding the spatial constraint for limitations of the original by improving convergence speed 
the StyleGAN synthesizing procedure while preserving the and solution quality, enabling more effective, real-time 
invention value of the StyleGAN. Equation (6) fulfills the content creation and personalization in advertising through 
training process's goal in the following ways. intelligent data-driven optimization. 
  Many tasks using Adam and RMSprop are successful, but 
K(Jgt, Jsyn) = λK K1(Jgt, J1 syn) + λK KGP + λK K
GP LP LP + they have issues with GANs, including convergence 
λK K
FM FM         (6) issues, collapsing to single modes, and being sensitive to 
 changes in learning rates. To resolve these issues in 
The SC-StyleGAN training uses a composite loss to creating ads for different channels, the new AECO strategy 
enhance image quality and consistency. The 𝐿1 loss (𝐾1) adapts by using evolutionary methods to tune 
ensures pixel-level accuracy, while the global perceptual hyperparameters, which boosts the stability and resilience 
loss (𝐾𝐺𝑃) maintains semantic alignment at full scale. The 
AECO-SC StyleGAN: A Cross-Platform GAN Framework for…                                              Informatica 49 (2025) 389–402 395 
 
of the model when facing different spatial and contextual 𝑊𝑖𝑡+1
𝐹𝐶𝑗,𝑛 =
situations. 
 𝑍𝑋
𝑖𝑡+1
𝐹𝐶𝑗,𝑛(𝑖) + 𝑞 × 𝛼 × [𝑊
𝑖𝑡
𝐹𝐶 (𝑖
𝑁𝐶−𝑗,𝑁
) −𝑊𝑖𝑡
Elephant migration under the direction of each clan 𝐹𝐶𝑗,𝑛(𝑖)]
 +𝑞 × 𝛼 × [𝑊𝑖𝑡 𝑖
principal was simulated using the ECO algorithm. This 𝑀𝐶,𝑅𝑚(𝑖) − 𝑊
𝑡
𝐹𝐶𝑗,𝑛(𝑖)],   𝑖𝑓 𝑛 >
 
part provides an autonomous movement range and an 𝑁𝑒
2    
autonomous movement position for each elephant to keep  𝑍𝑀𝑖𝑡+1
 𝐹𝐶𝑗,𝑛(𝑖) + 𝑞 × 𝛼 × [𝑊
𝑖𝑡
𝐹𝐶𝑗,𝑁(𝑖) − 𝑊
𝑖𝑡
𝐹𝐶𝑗,𝑛(𝑖)]the algorithm from reaching a local optimum, enhance 
image variety, and replicate the aforementioned behaviors  +𝑞 × 𝛼 ×
based on the initial elephant position to generate creative { [𝑊𝑖𝑡
𝑀𝐶,𝑅𝑚(𝑖) − 𝑊
𝑖𝑡
𝐹𝐶𝑗,𝑛(𝑖)], 𝑒𝑙𝑠𝑒
advertising, as illustrated in equations (10) and (11).                            
                (13) 
∆𝑊0(𝑖) = ∆𝑊𝑚𝑖𝑛(𝑖) + 𝑞 × (∆𝑊𝑚𝑎𝑥(𝑖) − ∆𝑊𝑚𝑖𝑛
𝑗 (𝑖)) In the 𝑁𝑐 − 𝑗family group  𝑊  
𝐹𝐶𝑀   at the 𝑗 − 𝑡ℎ 
𝑑=𝑗,
        (10) iteration, the matriarch of the female elephant was 
 represented by 𝑊𝑖𝑡
𝐹𝐶𝑀𝑑=𝑗,𝑁
. After sorting at 𝑖𝑡 + 1 iteration 
Where 𝑞 was a random number in the interval that is 𝑍𝑋𝑖𝑡+1𝐹𝐶𝑗,𝑛 was the autonomous position of the 𝑛(𝑛 =
uniformly distributed [0, 1], and ∆𝑊0
𝑗 (𝑖)(𝑗 = 1,2, … . ,𝑀𝑓) clan member, and equation (14) shows that 𝑞 
1,2, … . ,𝑀, 𝑖 = 1,2, … . , 𝐶) represents the range of was a uniformly spread range form [0, 1]. 𝛼 was the 
independent movement of the 𝑗 − 𝑡ℎ elephant in the 𝑖 − improved adaptable scaling factor.  
𝑡ℎ  dimension at the starting time. Both ∆𝑊𝑚𝑖𝑛(𝑖) =  
−∆𝑊𝑚𝑎𝑥(𝑖) and ∆𝑊𝑚𝑎𝑥(𝑖) = 𝐸 × (𝑊𝑚𝑎𝑥(𝑖) − 𝑖𝑡
𝛼 = 2 − (𝑑 × )      (14) 
𝑖𝑡𝑚𝑎𝑥
𝑊𝑚𝑖𝑛(𝑖)) represent the lower and upper bounds of the 𝑖 −  
𝑡ℎ dimensional autonomous motion space. Generally Where 𝑑 was a fixed value that was typically set to 0.5 to 
speaking, 𝐸 can be seen as 0.005 for improved outcomes,  get the best results, while the optimization issue itself 
 requires in different values. 
𝑌𝑊0(𝑖) = 𝑊0 0
𝑗 𝑗 (𝑖) + ∆𝑊𝑗 (𝑖)
(2) An autonomous location traction-based individual 
      (11) 
update method for matriarchs is employed; as previously 
 
stated, the globally optimal individuals swiftly approach 
As evolution advances, the autonomous range of mobility 
the globally optimal region after traversing each of the 
of each elephant should likewise diminish as individuals 
matriarchs in the ECO algorithm. This section suggests an 
become closer to one another. Therefore, the independent 
autonomous position, traction-based matriarch updating 
moving range updates technique in equation (12) is used. 
approach, as illustrated in equation (15). 
 
 
𝑖𝑡+1 𝑖𝑡
∆𝑊𝑗 (𝑖) = [0.9 − (0.8 × )] × ∆𝑊𝑖𝑡
𝑖𝑡 𝑗 (𝑖) (12) 𝑖𝑡+1 (𝑖) = 𝑍𝑋𝑖𝑡+1 𝑡
𝐹𝐶𝑗,𝑛(𝑖) + 𝑞 × 𝛽 × [𝑊
𝑖
𝑚𝑎𝑥 𝑊𝐹𝐶𝑗,𝑛 𝐵𝑒𝑠𝑡(𝑖) −
 𝑊𝑖𝑡
𝐹𝐶𝑗,𝑁(𝑖)]        (15) 
Enhancement of the family clan's renewal technique: The  
mother elephant was the best person in each family clan, The scaling factor 𝛽 was determined using equation (14), 
and all other clan members learn from the generative 
while 𝑍𝑋𝑖𝑡+1𝐹𝐶𝑗,𝑁 was the independent movement location of 
image. While clan members are responsible for 
the matriarch in this clan at the 𝑖𝑡 + 1 iteration, acquired 
maintaining population diversity to provide the mother 
similarly to equation (16). 
elephant with superior evolutionary information for quick 
 
convergence, the mother elephant was primarily 
𝑖𝑡
responsible for swiftly investigating the area where the 𝛽 = 3 − (𝑑 × )       (16) 
𝑖𝑡𝑚𝑎𝑥
hypothesized optimal location was located.   
(1) A method of updating each clan member individually Improvement of the individual renewal method of the male 
based on the autonomous location traction equation (13), elephant clan: According to the ECO algorithm, the male 
this section proposes a way to update the individual elephant clan was essential in creating globally ideal 
population members based on autonomous location locations for female clan leaders and substituting certain 
traction to better maintain species variety and avoid family members to supply evolutionary data. Despite this, 
significantly slowing down the algorithm's rate of the number of male elephants might add to the diversity of 
convergence. the family clan. In light of this, equation (17) provides the 
 male elephant individual renewal formula to guarantee that 
the male elephant clan has particular population diversity 
and generates as much evolutionary information as 
possible. 
396   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
𝑊𝑖𝑡+1 AECO’s search positions and StyleGAN’s exact 
𝑀𝐶,𝑚(𝑖) = 𝑍𝑋
𝑖𝑡+1
𝑀𝐶,𝑚(𝑖) + 𝑞 × 𝑜 × (𝑊
𝑖𝑡
𝐶𝑒𝑛𝑡𝑒𝑟 −
hyperparameters (like learning rate or noise scale) should 
𝑊𝑖𝑡
𝑀𝐶,𝑚(𝑖))        (17) be clarified. 
  
In the 𝑖𝑡 + 1 iteration of the male elephant clan 𝑍𝑋𝑖𝑡+1𝑀𝐶,𝑚 4 Results and discussion  
represents the autonomous movement location of the 𝑚 −
𝑡ℎ(𝑚 = 1,2, … ,𝑀 All experiments used Python 3.10.1 on an NVIDIA Tesla 
𝑓) elephant. Equation (18) illustrates 
that the 𝑜 was determined using 𝑊𝑖𝑡 V100 GPU. AECO-SC StyleGAN trained for 50 epochs 
𝐶𝑒𝑛𝑡𝑒𝑟 , which was the 
location of the maternal elephant patriarch in each family (1,000 iterations each) in approximately 12 GPU hours, 
outperforming Adam (15 GPU hours) in efficiency. The 
clan in the 𝑖𝑡 − 𝑡ℎ iteration. 
proposed strategy was assessed and its effectiveness was 
 
1 determined using the following indicators: Normalized 
𝑊𝑖𝑡 𝑡
𝐶𝑒𝑛𝑡𝑒𝑟 = × ∑𝑁𝑐−1𝑊𝑖    (18) 
𝑁𝑐−1 𝑗=1 𝐹𝐶𝑗,𝑁   
Discounted Cumulative Gain (NDCG), accuracy, and 
 weighted F1, The Fréchet Inception Distance (FID), 
Improvement of individual replacement strategy for part of Structural Similarity Index Measure (SSIM) and Peak 
the family clan: Enhance the plan for replacing adult Signal-to-Noise Ratio (PSNR). Although AECO-SC 
elephants. The following adult elephant replacement was StyleGAN is a generative framework, its output creatives 
suggested to guarantee the algorithm's speed of are evaluated using a downstream binary classification 
convergence and boost the variety of the creative ad task predicting ad engagement (high vs. low CTR). All 
images, as adult elephants are not the superior elephants models, including baselines like VGG + Layout + NIMA 
within this clan. Equation (19) indicates the central and XCEPTION, are evaluated on this task for a fair 
position of all clan members. Otherwise, as indicated by comparison, also implementing the baseline method to this 
equation (20), the superior person was chosen to replace research. Table 3 represents hyperparameters for AECO-
the real adult elephant from both the new individual and SC StyleGAN-based framework used in dynamic 
the original adult elephant.  advertising creative optimization. 
  
𝑊𝑖𝑡+1 1
𝐹𝐶𝑗,𝐺𝑚 = ∑𝑁𝑒1𝑊
𝑖𝑡+1       (19) Table 3: Hyperparameter Settings for AECO-SC 
𝑁𝑒 𝑗= 𝐹𝐶𝑗 (𝑖)
 StyleGAN Framework 
𝑊𝑖𝑡+1 Hyperparameter Value 
𝑀𝐶,𝑅𝑚(𝑖)
𝑊𝑖𝑡+1 𝑖𝑡+1 𝑊𝑖𝑡+1 (𝑖)+
𝐹𝐶𝑗,𝐶𝑎𝑙 𝑒(𝑖) = 𝑊𝐹𝐶𝑗,𝐺𝑚(𝑖) + 𝑞 × [
𝐹𝐶𝑗,𝑅𝑓 −
2 Batch Size 32 
𝑗+1 Learning Rate (Generator) 0.0001 
𝑊𝐹𝐶𝑗,𝐺𝑚(𝑖)]      (20) 
Learning Rate 0.0004 
 (Discriminator) 
Improve the inferior small elephant replacement strategy: Epochs 200 
The ECO algorithm replaces the poor individuals in family Image Size 256 × 256 × 3 
clans to maintain population diversity, but reduces Latent Vector Dimension (z) 512 
convergence speed. Early iterations have significant Dropout Rate 0.3 
differences, while late iterations focus on population 
Activation Function Leaky ReLU (α=0.2) 
diversity. The worst 0.3Ne tiny elephants in each family 
Normalization Instance 
clan are replaced with new individuals in the evolutionary 
Normalization 
stage. Equation (21) generates new individuals during the 
AECO Population Size 30 
pre-evolutionary period, where 𝑖𝑡 < 𝑖𝑡𝑚𝑎𝑥. The 𝑖𝑡 +
AECO Max Iterations 100 
1 iteration of the family clan 𝑊𝑖𝑡+1
𝐹𝐶𝑗 , where 𝑊𝑖𝑡+1
𝐹𝐶𝑗,𝑥(𝑖) was 
 
the worst one to be replaced. 
4.1 Evaluation task 
 
𝑖𝑡+1 𝑖𝑡+1
𝑊𝑖𝑡+1 𝑖𝑡+1 𝑊 𝐶𝑗,𝑅𝑓(𝑖)+𝑊𝑀𝐶,𝑅𝑚(𝑖) The primary task is a binary classification of ad creatives 
𝐹𝐶𝑗,𝐶𝑎𝑙 𝑒(𝑖) = 𝑊𝐹𝐶𝑗,𝑥(𝑖) + 𝑞 × [
𝐹 −
2 into 'high engagement' vs. 'low engagement' based on 
𝑊𝑖𝑡+1 historical CTR data. Ads with CTR above the 75th 
𝐹𝐶𝑗,𝑥(𝑖)]      (21) 
percentile were labeled as high engagement (1), and others 
 
as low engagement (0). This classification target enables 
AECO enhances SC-StyleGAN by tuning 
the model to learn aesthetic and contextual cues that align 
hyperparameters using elephant-inspired population 
with user interaction patterns. The dataset was split 
dynamics. Each elephant’s position represents a candidate 
70/15/15 for training, validation, and testing. Evaluation 
solution, evolving through clan-based exploration and 
was conducted on the unseen 15% test set. Performance 
adaptive updates. This improves convergence and avoids 
metrics included NDCG, classification accuracy, and 
common GAN issues. However, the mapping between 
weighted F1-score FID, SSIM, and PSNR. Baseline 
AECO-SC StyleGAN: A Cross-Platform GAN Framework for…                                              Informatica 49 (2025) 389–402 397 
 
models included Visual Geometry Group Layout feature The training accuracy and loss over 50 epochs for the 
Neural Image Assessment (VGG + Layout features + proposed AECO-SC StyleGAN, XCEPTION, and VGG + 
NIMA) [20], XCEPTION [21], AdvAE-GAN [22] , Layout + NIMA are displayed in Figure 5(a, b). The 
BicycleGAN [22], V-GAN [23], and Vanilla GAN [23]. AECO-SC StyleGAN consistently achieves superior 
All models were trained under similar hardware and accuracy and inferior loss, demonstrating better learning 
optimization conditions to ensure a fair comparison. Table efficiency, faster convergence, and more stable training. 
4 shows the comparison of classifiers and their Accuracy values were mentioned in percentage. Both 
performance evaluation results.  training and validation accuracy curves show steady 
The evaluation pipeline begins with AECO-SC StyleGAN growth and eventual stabilization, indicating effective 
generating advertising creatives. These outputs are labeled learning with minimal overfitting. Similarly, the loss 
based on a CTR threshold to indicate high or low curves for training and validation exhibit a clear downward 
engagement. A classifier then predicts engagement levels, trend, reflecting successful convergence. These results 
allowing metrics like NDCG, Accuracy, and Weighted F1- highlight the model's ability to efficiently capture cross-
score to assess how well the generated creatives align with platform advertising dynamics, generate high-quality 
user interaction patterns. creatives, and maintain strong generalization across 
 datasets, ultimately improving engagement prediction 
4.2 Accuracy and loss performance. 
 
 
Figure 5: Accuracy and Loss Comparison of Models 
 
4.3 Ad image dimension distribution  
The GAN ad image dimension distribution operates as a 
cross-platform deep learning framework that generates 
multiple platform-optimized sizes for presented images. 
Standard display and mobile ad dimensions are the format 
choices for most images, which guarantee visual 
performance while ensuring cross-platform compatibility. 
Figure 6 displays the Ad image dimensions across digital 
platforms. 
 
4.4 Clicks-through rate (CTR) by platform 
The CTR performance stands tested across different 
platforms through the use of a GAN framework for 
dynamic ad optimization. Results indicate that 
performance metrics vary between platforms since mobile 
achieves better CTR than desktop. Through its creative  
 
adaptation, the GAN model demonstrates high 
Figure 6: Ad image dimensions across digital platforms 
engagement while showing the power of deep learning as 
 
a means to improve cross-platform digital advertisement 
 
results. Figure 7 shows the CTR distribution across four 
social platforms. 
398   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
0.61, much better than the NDCG of 0.22 for the baseline 
VGG + Layout + NIMA model. From this, we can see that 
our model helps us better find and rank the strongest ad 
creatives first. With this accuracy, marketers are better 
equipped to promote content that has a significant effect. 
Figure 9 illustrates the NDCG scores of all the evaluated 
models. 
 
 
 
Figure 7: CTR distribution across four social platforms 
 
4.5 Convergence and runtime analysis 
To compare the stability of the training between AECO and 
Adam, by run both for 100 iterations shown in Figure 8.  
AECO demonstrated a quicker and smoother convergence,  
as seen by its early near-zero loss. Ad creative generation Figure 9: Illustrates NDCG performance results 
requires a stable and fast system, as it works with many  
constraints in quick optimization. By using adaptive 4.7 Accuracy 
learning, AECO avoids the problems of local minima and 
maintains consistency, which makes it effective than Adam Accuracy indicates how effectively the classifier predicts 
and RMSprop for cross-platform advertising. whether generated advertisements will result in high or 
low user engagement (based on CTR), thus assessing the 
effectiveness of the generated creatives. The high accuracy 
in generating platform-specific ad creatives consistently 
aligns with user engagement metrics, outperforming 
baseline models in aesthetic coherence, contextual 
relevance, and predictive performance across platforms. 
The results demonstrate that XCEPTION achieved an 
accuracy of 98.27%, while AECO-SC StyleGAN 
performed slightly better with an accuracy of 98.48%, 
showcasing their effectiveness in the given task. 
 
4.8 Weighted F1 
The F1-score balances precision and recall, crucial for 
 imbalanced engagement data, indicating how well the 
 model generates relevant, high-performing ads while 
Figure 8: Outcomes of convergence and runtime analysis minimizing misclassification. The Weighted F1 score was 
 a metric used to evaluate the performance of a GAN in 
4.6 NDCG dynamic advertising creative optimization, emphasizing 
precision and recall across various platforms. Table 4 gives 
The NDCG score was applied to measure how relevant and Evaluation of Ad Engagement Prediction Based on 
well-arranged the ad creatives were for users. Because of Generated Ad Creatives. The results show that AECO-SC 
this, NDCG is better suited for this task, as it rewards StyleGAN outperforms XCEPTION, achieving a higher 
higher positions for predicting relevant content. A better Weighted F1 score of 98.5% compared to 98.2%, 
NDCG means the model ranks the most engaging and demonstrating superior performance in dynamic 
appropriate content first, which helps in dynamic advertising creative optimization. Figure 10 displays the 
advertising situations where space and time are both accuracy and weighted F1 evaluation results. 
limited. The NDCG for the AECO-SC StyleGAN was  
AECO-SC StyleGAN: A Cross-Platform GAN Framework for…                                              Informatica 49 (2025) 389–402 399 
 
Table 5: Performance comparison of creative generation 
models on the Ad ImageNet dataset 
 
Method NDCG Accuracy Weighted 
(Mean (%) (Mean F1 (%) 
± SD) ± SD) (Mean ± 
SD) 
VGG + 0.22 ± 94.62 ± 94.3 ± 0.38 
Layout + 0.015 0.40 
NIMA [20] 
XCEPTION 0.45 ± 98.27 ± 98.2 ± 0.21 
[21] 0.020 0.25 
AECO-SC 0.61 ± 98.48 ± 98.5 ± 0.19 
StyleGAN 0.018 0.22 
[Proposed] 
 
In addition to reporting the mean ± SD, we performed 
paired t-tests to evaluate whether the improvements over 
 baseline models are statistically significant. The results 
 
confirm that the performance gains of AECO-SC 
Figure 10: Accuracy and Weighted F1 Evaluation Results 
StyleGAN over XCEPTION and VGG+NIMA are 
 
statistically significant with p < 0.01 for all three metrics.  
Table 4:  Evaluation of ad engagement prediction based 
 
on generated Ad Creatives 
 4.9 Performance comparison of generative 
Accuracy Weighted models 
Method NDCG 
(%) F1 (%) 
FID scores of the suggested AECO-SC StyleGAN are 
VGG + contrasted with those of other GAN-based baselines in 
Layout 
0.22 - - Figure 11. Among the tested methods, the proposed 
features + AECO-SC StyleGAN delivered the best quality, with an 
NIMA [20] FID score of 38.4752, compared to 42.3256 for AdvAE-
XCEPTION 
- 98.27 % 98.2 % GAN [22] and 45.0208 for BicycleGAN [22]. 
[21]  
AECO-SC 
StyleGAN 0.61 98.48 % 98.5 % 
[Proposed] 
 
4.8 Statistical evaluation of model 
performance 
To address this, we have conducted additional experiments 
using five different random seeds. For each seed, the 
model was trained and evaluated independently using the 
same data split. We now report the mean ± standard 
deviations for the key evaluation metrics, including 
Normalized Discounted Cumulative Gain (NDCG), 
Accuracy, and Weighted F1-score. Performance  
Comparison of Creative Generation Models on the Ad  
ImageNet Dataset given below Table 5. Figure 11: Generative quality evaluation model 
 comparison results. 
  
  
  
  
  
400   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
The SSIM and PSNR metrics for AECO-SC StyleGAN 
and other GAN variations are shown in Figure 12. Higher 
PSNR and SSIM values indicate better image fidelity and 
structural similarity to the original images. The AECO-SC 
StyleGAN shows the highest PSNR of 35.8 dB and an 
SSIM of 0.95 which is better than the PSNR of 33.5 dB 
and SSIM of 0.92 for V-GAN [23] and the PSNR of 28.4 
dB and SSIM of 0.85 for Vanilla GAN [23]. It means that 
AECO-SC StyleGAN is able to create images that are 
more clearly detailed and accurate than other styles. 
 
Figure 13: Generated Ad Creative using AECO-SC 
StyleGAN 
 
4.11 Discussion 
Dynamic advertising creative optimization across multiple 
platforms aims to enhance user engagement and 
conversions by generating context-aware, personalized ad 
 content. Traditional models such as VGG combined with 
Figure 12: Model comparison results of image quality Layout features and NIMA [20] rely on fixed image 
assessment. features, limiting their capacity to capture the full spectrum 
 of complex, interactive visual and contextual patterns 
4.10 Visual results and assessment of visual inherent in cross-platform environments. As a result, the 
fidelity creatives they generate often lack adaptability and 
personalization, making them less effective in varied user 
To evaluate the visual fidelity of the proposed AECO-SC scenarios. Meanwhile, XCEPTION-based GAN [21] 
StyleGAN, we generated advertisement creatives using the models, although capable of deeper feature extraction, are 
Ad ImageNet dataset. Figure 13 illustrates side-by-side hindered by their high computational and memory 
examples of generated ads, showcasing a variety of demands. Their complex operations limit scalability and 
product categories including fashion, electronics, and pose challenges for deployment on lightweight or real-time 
skincare. The generated ads closely match real ones in advertising platforms, reducing practicality in widespread 
layout, color schemes, and promotional text, reflecting commercial use. 
platform-specific design aesthetics. While maintaining In contrast, the proposed AECO-SC StyleGAN framework 
coherence, the model introduces subtle variations that add addresses these limitations by integrating adaptive 
diversity and creativity. These results demonstrate that hyperparameter tuning and spatial conditioning to generate 
AECO-SC StyleGAN effectively replicates real ad high-fidelity, semantically consistent creatives tailored to 
characteristics, providing a scalable and automated specific platform requirements. AECO enhances 
approach for cross-platform ad generation. convergence and training efficiency, while SC-StyleGAN 
 ensures visual and contextual alignment across formats. 
This leads to better performance and improved resource 
utilization, offering a scalable and intelligent solution for 
dynamic advertising creative generation in diverse 
deployment environments. 
 
 
 
 
AECO-SC StyleGAN: A Cross-Platform GAN Framework for…                                              Informatica 49 (2025) 389–402 401 
 
While AECO-SC StyleGAN improves convergence speed compression and distillation techniques to reduce training 
and reduces memory consumption relative to baseline time and memory consumption without sacrificing output 
GANs during training, it still requires substantial quality. 
computational resources overall, particularly due to its  
large model size and high-resolution output generation. References 
However, once trained, the model supports relatively 
efficient inference, making it suitable for real-time or near- [1]  Geng, T., Sun, F., Wu, D., Zhou, W., Nair, H., & Lin, 
real-time deployment scenarios. Z. (2021). Automated bidding and budget optimization 
 for performance advertising campaigns. SSRN. 
5 Conclusions https://doi.org/10.2139/ssrn.3913039 
[2]  Leow, K. R., Leow, M. C., & Ong, L. Y. (2021). 
The DL framework uses GAN for dynamic advertising Online roadshow: A new model for the next-
creative optimization, enabling effective cross-platform generation digital marketing. In Proceedings of the 
strategies to enhance ad personalization and performance Future Technologies Conference (pp. 994–1005). 
in real time. Data collection involved the Ad ImageNet Springer, Cham. https://doi.org/10.1007/978-3-030-
dataset, consisting of multimodal ad samples. 89906-6_64 
Preprocessing included image resizing, tokenization, and [3]  Ameen, N., Sharma, G. D., Tarba, S., Rao, A., & 
intensity normalization. This approach demonstrates a Chopra, R. (2022). Toward advancing theory on 
scalable, efficient method for cross-platform ad creative creativity in marketing and artificial intelligence. 
optimization, ensuring higher engagement and visual Psychology & Marketing, 39(9), 1802–1825. 
coherence. The results show that the AECO-SC StyleGAN https://doi.org/10.1002/mar.21699 
method achieved an NDCG of 0.61, an accuracy of [4]  Gharibshah, Z., & Zhu, X. (2021). User response 
98.48%, and a weighted F1 score of 98.5%. These metrics prediction in online advertising. ACM Computing 
highlight the method’s high performance in optimizing Surveys (CSUR), 54(3), 1–43. 
dynamic advertising creatives with excellent precision and https://doi.org/10.1145/3446662 
relevance. Although AECO-SC StyleGAN shows [5]  Ouyang, X., Chen, Y., Zhu, K., & Agam, G. (2024). 
promising results in generating optimized ad creatives Image restoration refinement with Uformer GAN. 
with high quality, the training process remains In Proceedings of the IEEE/CVF Conference on 
computationally intensive due to the high-resolution Computer Vision and Pattern Recognition (pp. 5919-
outputs and multiple conditioning layers. The model may 5928).  
face challenges in ensuring consistency across diverse   https://doi.org/10.1109/cvprw63382.2024.00599  
platforms, handling large-scale real-time data, and [6]  Liang, Y., Deng, R., Lin, W., Deng, R., Zhu, X., & 
optimizing for varying audience preferences. It also Yu, L. (2025). Modeling and Reinforcement Learning 
requires significant computational resources for training. Assessment System for Quality Improvement of 
Future scope could focus on improving real-time Advertising Design. Computer-Aided Design & 
adaptability, cross-platform integration, and reducing Applications, 21, 188-200. 
computational costs for broader adoption in dynamic https://doi.org/10.14733/cadaps.2025.S7.188-200  
advertising. [7]  Patil, D. (2024). Generative Artificial Intelligence In 
 Marketing And Advertising: Advancing 
5.1 Limitations and future work Personalization And Optimizing Consumer 
Engagement Strategies. Available at SSRN 5057404. 
While AECO-SC StyleGAN shows promising results, it https://dx.doi.org/10.2139/ssrn.5057404  
presents notable limitations. First, training the model [8]  Terzioğlu, S., Çoğalmış, K. N., & Bulut, A. (2024). 
requires significant computational resources, with 30+ Ad creative generation using reinforced generative 
hours of training time on high-memory GPUs, limiting adversarial network. Electronic Commerce Research, 
accessibility for smaller teams. Second, generalization 24(3), 1491–1507. https://doi.org/10.1007/s10660-
across domains remains a challenge. Early tests on ad 022-09564-6 
categories like automotive and electronics suggest reduced [9]  Chen, J., Xu, J., Jiang, G., Ge, T., Zhang, Z., Lian, D., 
performance, warranting domain-adaptive retraining. & Zheng, K. (2021). Automated creative optimization 
Third, although AECO-SC generates high-quality for e-commerce advertising. In Proceedings of the 
creatives, its deployment in real-time ad systems is Web Conference 2021 (pp. 2304–2313). 
untested. Future work will explore integration with ad https://doi.org/10.1145/nnnnnnn.nnnnnnn 
delivery platforms and A/B testing frameworks to assess  
live performance metrics such as CTR and Return on Ad  
Spend (ROAS), moving toward a fully automated ad  
generation and evaluation pipeline. Focus on model 
402   Informatica 49 (2025) 389–402                                                                                                                                        Y. Zhang 
 
[10]  Li, G., & Yang, X. (2024). Two-stage dynamic [20]  Vempati, S., Malayil, K. T., Sruthi, V., & Sandeep, 
creative optimization under sparse ambiguous R. (2020). Enabling hyper-personalisation: 
samples for e-commerce advertising. SN Computer Automated ad creative generation and ranking for 
Science, 5(8), 1–16. fashion e-commerce. In Fashion Recommender 
https://doi.org/10.1007/s42979-024-03332-z Systems (pp. 25–48). Springer International 
[11]  Li, Q., & Zhou, E. (2024). Design and Publishing. https://doi.org/10.1007/978-3-030-
implementation of automatic generation algorithm 55218-3_2 
for advertising artistic design based on neural [21]  Moreno-Armendáriz, M. A., Calvo, H., Faustinos, 
networks. Computer-Aided Design & Applications, J., & Duchanoy, C. A. (2023). Personalized 
21, 114–127. advertising design based on automatic analysis of 
https://doi.org/10.14733/cadaps.2024.S18.114-127 an individual’s appearance. Applied Sciences, 
[12]  Baardman, L., Fata, E., Pani, A., & Perakis, G. 13(17), 9765. 
(2021). Dynamic creative optimization in online https://doi.org/10.3390/app13179765 
display advertising. SSRN. [22]  Kong, M. (2025). A study on optimizing deep 
https://doi.org/10.2139/ssrn.3863663 learning models for creative generation of animated 
[13]  Meng, Q., & Wei, R. (2024). Creative advertising new media advertisements: an application based on 
design combining CAD and generating adversarial improved generative adversarial networks (GANs) 
networks. Computer-Aided Design & Applications, and variational autocoders (VAEs). J. COMBIN. 
21, 102–116. MATH. COMBIN. COMPUT, 127, 7227-7248. 
https://doi.org/10.14733/cadaps.2024.S27.102-116 DOI: 10.61091/jcmcc127a-401 
[14]  Gao, B., Wang, Y., Xie, H., Hu, Y., & Hu, Y. [23]   Kong, M. (2025). Deep Learning Model 
(2023). Artificial intelligence in advertising: Optimization in Creative Generation for New 
Advancements, challenges, and ethical Media Animated Ads. 
considerations in targeting, personalization, content https://doi.org/10.21203/rs.3.rs-5879017/v1  
creation, and ad optimization. Sage Open, 13(4),  
21582440231210759.  
https://doi.org/10.1177/21582440231210759 
[15]  Jiang, L., Li, C., Chen, H., Gao, X., Zhong, X., Qiu, 
Y., ... & Niu, D. (2023, August). AdSEE: 
Investigating the Impact of Image Style Editing on 
Advertisement Attractiveness. In Proceedings of 
the 29th ACM SIGKDD Conference on Knowledge 
Discovery and Data Mining (pp. 4239-4251). 
https://doi.org/10.1145/3580305.3599770  
[16]  Shilova, V., Santos, L. D., Vasile, F., Racic, G., & 
Tanielian, U. (2023, September). Adbooster: 
Personalized ad creative generation using stable 
diffusion outpainting. In Workshop on 
Recommender Systems in Fashion and Retail (pp. 
73-93). Cham: Springer Nature Switzerland. DOI: 
https://doi.org/10.1007/978-3-031-76878-1_5  
[17]  Xu, C., Zhou, M., Ge, T., Jiang, Y., & Xu, W. 
(2023). Unsupervised domain adaption with pixel-
level discriminator for image-aware layout 
generation. In Proceedings of the IEEE/CVF 
Conference on Computer Vision and Pattern 
Recognition (pp. 10114-10123).  
[18]  Aghazadeh, A. and Kovashka, A., 2024. CAP: 
Evaluation of Persuasive and Creative Image 
Generation. arXiv preprint arXiv:2412.10426.  
https://doi.org/10.48550/arXiv.2412.10426  
[19]  Ma, M., & Zhao, W. (2024). Computer-Aided 
Brand Logo Design Based on Generative 
Adversarial Networks. 
https://doi.org/10.14733/cadaps.2024.S25.60-75  
https://doi.org/10.31449/inf.v49i12.9455                                                                                      Informatica 49 (2025) 403–418   403 
 
Adaptive Control of PV-Integrated Power Grids Using KNN-Smote-
GCN And Mpc Techniques 
 
Kun Zhang*, Xiaogang Wu, Zhizhong Li1, Yaotang Lv, Shiqi Liu 
China Southern Power Grid Power Dispatching Control Center, Guangzhou, Guangdong,510000, China 
E-mail: KunZhang654@outlook.com, YananZhang46@outlook.com, WenjingSi999@outlook.com, 
GuanghuaYang5426@outlook.com, JingpingGuo243@outlook.com 
*Corresponding author 
 
Keywords: auxiliary control technology, artificial intelligence, KNN-SMOTE-GCN model, large power grid, MPPT 
control 
 
Received: May 29, 2025 
 
As the global energy crisis intensifies, the integration of renewable energy—particularly photovoltaic (PV) 
systems—has become vital for achieving a sustainable and resilient power infrastructure. This study 
focuses on dynamic modeling and efficient control of grid-connected PV systems to enhance power quality 
and system reliability. An adaptive PI controller is employed for voltage regulation, with a maximum 
power point tracking (MPPT) method ensuring optimal energy harvesting. A DC-DC boost converter and 
a three-phase PWM inverter are incorporated, with MATLAB used for simulation. The proposed approach 
integrates Model Predictive Control (MPC) with Graph Convolutional Networks (GCN) to manage grid 
instability and improve energy efficiency. A novel KNN-SMOTE-GCN algorithm is developed to mitigate 
voltage distortion, harmonic currents, and power fluctuations. The system replicates the behavior of 
traditional generators under disturbances, promoting renewable integration without compromising 
stability. Key performance metrics such as voltage deviation, reactive power fluctuation, power loss, and 
total harmonic distortion (THD) are analyzed. 
Povzetek: Integrirani KNN-SMOTE-GCN in MPC izboljšata stabilnost PV-omrežij z natančnim MPPT, 
učinkovitim nadzorom napetosti ter zmanjšanjem izgub, nihanja jalove moči in THD. Metoda poveča 
kakovost energije in zanesljivost šibkih omrežij z visoko penetracijo PV. 
 
1 Introduction that produce more complex harmonics and demand 
reactive power are electronic power equipment. This 
The reckless use of hydrocarbons and nuclear power action causes voltage distortion, which impacts all 
threatens environmental safety and causes significant subsequent loads linked to the identical PCC. Optimal 
pollution. The truth of this energy source is prompting a performance of solar photovoltaic inverters is hindered by 
global movement toward renewable energy sources that the unpredictability of sun irradiation [6], [7]. Two 
are less harmful to the environment, including as wind examples of supplementary services that the inverter's 
power, PV, and others. Distributed power generating extra capacity may offer are reducing source current 
systems that employ renewable energy sources have harmonics while adjusting reactive load power. When it 
garnered significant interest due to the current focus on comes to PV-integrated systems, MPPT is a go-to for 
clean power generation [1], [2], [3]. Recent advances in reducing harmonics. One method for reducing PV system 
photovoltaic technology have led to the rapid adoption of grid current harmonics is the adaptive P&O (perturb and 
renewable energy production based on solar PV by both observation) MPPT algorithm, which incorporates sliding 
commercial and residential sectors. Reduced main power mode control [8], [9]. The goal of auxiliary regulation is to 
system load, maximum savings, and reactive power maintain grid stability by modifying power system 
support are just a few of the benefits that the distribution characteristics in response to imbalances, fluctuations, and 
grid may reap from integrating distributed solar PV disruptions. The grid, however, functions within 
generating plants [4], [5]. electricity quality and reasonable bounds and adapts efficiently to shifts in both 
dependability are both enhanced by solar PV electricity, generation and demand [10], [11]. Controlling the grid 
which lessens the strain on the central grid. The energy frequency entails modifying either electricity production 
quality usually drops as the use of non-linear loads or consumption to keep it within predetermined 
increases. It is also well known that most non-linear loads boundaries. Ensures that electrical equipment continue to 
404   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
function correctly by keeping voltage levels within certain sporadic balance of supply. Also, traditional power sources 
limits. Optimizes system performance by balancing the aren't practical for such a difficult job, and they drive up 
production and consumption of both reactive and active energy prices.  
electricity. More conventional approaches, such as DL,  The next step was to improve electrical distribution 
Machine Learning, etc [12]. These systems are often networks' power quality by using an optimization 
studied for their possible use in power system approach. It employs a hybrid design that incorporates 
optimization, control performance, and forecasting. Due to shunt and series compensators to address voltage drops, 
a lack of sophisticated automation infrastructure, many harmonics, and imbalance, among other power quality 
system operations are being performed with modest concerns. Afterwards, MPPT was used to derive the 
degrees of automation at the time. AI is expected to play a greatest amount of power from the grid system. Controller 
significant role in the future power system, according to for MPC to ascertain the system's overall stability and 
several studies, technical papers, and case studies [13], performance. In addition, the model was tested on the 
[14]. This is because AI will introduce state-of-the-art MATLAB platform and its reliability was assessed by 
techniques of system optimization while simultaneously measuring voltage variation, reactive power fluctuations, 
decreasing the need for human participation. Research on grid current, and THD. 
AI for grid system power flow optimization is now at a  
premium. The auxiliary services that help to reduce 1.2 Motivation  
frequency variations are crucial to the reliability of ac 
power networks. Large synchronous generators' Many issues, including power quality, stability, 
electromechanical inertia is the only available resource for dependability, and supply management, may arise as a 
absorbing frequencies disturbances on subsecond time result of the increasing need for big power grid-connected 
scales at the moment. This means that switching from systems. In addition, the total system performance might 
traditional thermal power plants to NREs, which are be negatively impacted by power quality concerns as a 
inertialess, puts grid stability at risk from things like result of variations. It is possible for there to be an 
unexpected power production outages. Grids with high imbalance in the power demand and generation frequency 
penetrations of NREs may suffer from electromechanical fluctuations. Next, problems with the power factor, such as 
inertia, which may disrupt system stability. To address this, a low power factor, might cause the power distribution 
virtual synchronous generators have been suggested, system to lose more power and increase energy usage. 
which mimic traditional generators. In this paper, we Voltage instability is the root cause of both linear and non-
provide a new method of controlling virtual synchronous linear problems. Voltage regulation may be subpar due to 
generators that uses a configurable time scale to reduce the the persistent use of insufficient control mechanisms in 
supplied inertia, which is large at short intervals to absorb power grid systems. Ensuring the stability and operation of 
faults as effectively as traditional generators but sets in big power networks also relies heavily on rules and norms 
motion coherent frequency oscillations when it doesn't that specify acceptable power quality values. As a result, 
[15], [16]. We test how well our adaptive-inertia approach grid systems need an intelligent auxiliary regulatory 
handles large-scale transmission networks that experience technology that can effectively lessen the burdens on them. 
unexpected power outages. It is more stable than earlier  
proposed methods and consistently outperforms traditional 1.3 Contributions  
electromechanical inertia. The numerical simulations 
demonstrate that the quasioptimal placement of adaptive- Despite the paper's focus on intelligent real-time power 
inertia devices enhances the damping of interarea grid regulation and control, no mention of research into 
oscillations and effectively absorbs local faults. In future building the comprehensive functional foundation of a 
low-inertia power grids that have significant penetrations dispatching intelligent assistant driving network is made. 
of NREs, our findings demonstrate that the suggested The study and evaluation of real-time regulation and 
adaptive-inertia control system is a great way to improve control business aims to explore fresh artificial intelligence 
grid stability [17], [18], [19], [20]. application methods for various business processes, as 
 well as the principle and implementation characteristics of 
1.1 Problem statement  a grid-assisted control system based on AI thinking and 
decision-making in regulation and control operations. In 
In today's world, contemporary power systems are order to achieve the shift from empirical to intelligent 
complemented with large-scale renewable energy systems, control and enhance the degree of control over the power 
allowing for more efficient operations. Accurate energy grid, we provide solutions to raise the bar for artificial 
production and efficient control systems to manage while intelligence in terms of both interaction and performance. 
guarantee a reliable power supply are also necessary for In order to achieve maximum power generation, it is 
optimum power systems. However, there is a degree of necessary to control the working point of photovoltaic 
uncertainty due to the high electrical consumption and the panels. For this regulation procedure to be successful, 
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 405 
 
there are two primary components that are required: an technology that generates and verifies strategies for multi-
MPPT algorithm that serves as the reference for the MPP, dimensional scheduling agents using deep reinforcement 
and a voltage controller that guarantees a steady learning. In addition to providing solid technical support 
functioning at the MPP. for power grid operation, that research may enhance the 
One of the most significant benefits of adopting MPC is accuracy and effectiveness of section dispatching 
that it has the ability to simplify the process of developing decision-making, optimize the section control strategy 
a variety of controllers while also working to continually, and more. 
accommodate system limits within its formulation. In According to [23], when a problem occurs, the generator 
addition, the introduction of KNN-SMOTE-GCN as a network determines the unit output plan using the 
user-friendly optimization approach is suggested in this combined wind, light, and electrical demand data from a 
study as a means of enhancing the cost function of the northwest area of China. A specialized system generation 
MPC controller. fault recovery strategy is developed for that grid fault using 
 This research work is structured as follows: Section 2 data on actual power load while actual renewable energy 
describes the research articles that were relevant to the output before and after the fault. The strategy aims to 
framework that was developed; Section 3 describes the minimize the cost of system power generation while 
problem statements; Section 4 explains the proposed considering the constraints of secure operation of the 
hybrid framework; Section 5 analyzes the results of the system. It turns out that the expert system's fault recovery 
methodology that was proposed; and Section 6 describes method is much different from the one used in the early 
the research conclusion. stages of training, and that the error value is very high. 
 After a generative adversarial network is fully trained, it 
2 Related work can approach the fault recovery expert system with an 
auxiliary decision-making scheme that works in different 
 Experts from [21] grid operators use neuro-fuzzy logic for situations with different loads and new energy outputs, and 
dynamic reactive power adjustment. The energy storage it can keep the error between the two schemes to less than 
system may also be effectively managed using that logic. 5%. Results from studies examining power grid fault 
After that, SP UPQC was used to improve the electrical recovery strategies using models of generative adversarial 
distribution networks' power quality. It employs a hybrid imitation learning networks demonstrate the force control 
design that incorporates shunt and series compensators to system's capacity for autonomous and secure fault 
address voltage drops, harmonics, and imbalance, among recovery. 
other power quality concerns. Afterwards, maximum With the goal of conducting real-time tracking on the 
power point tracking was used to derive the greatest operating state of the power grid, eliminating potential 
amount of power from the electricity network. Controller safety hazards, and upgrading the power grid from 
for Model Predictive Control to ascertain the system's "manual analysis" scheduling to "intelligent analysis" 
overall stability and performance. In addition, the model scheduling, the authors of [24] propose an integrated 
was tested on the MATLAB platform and its reliability was framework to aid decision-making of online accident 
assessed by measuring voltage variation, grid current, processing using large power grids. The study covers five 
reactive power fluctuations, and Total Harmonic aspects: integrated information support system, aid 
Distortion. decision-making afterwards, risk perception in grow, 
Enhancing the effectiveness of section control of large online fault diagnosis, and visual display. 
power grid, altering the traditional experience-led The writers of the cited work, [25] an online trend analysis 
dispatching mode, and improving the intrinsic safety level technology with a functioning mode arrangement for large 
of the power grid are all goals of the experimental team in power grids is suggested, drawing on references to the 
[22]. They study intelligent section auxiliary decision- growth of intelligent dispatching support systems and their 
making algorithms in depth and build a new intelligent dynamic security assessment technologies, in light of the 
dispatching structure framework of the power grid using growing importance of grid dispatching operations in 
deep learning and simulation environments. To build a understanding future state security changes. Estimated 
more realistic simulation of the power grid's dynamic power flow in the future is based on the power grid's 
characteristics under varied operating circumstances, an present operating mode, online stability conclusion, data 
environment that is suited for the upcoming AC-DC hybrid from fresh energy and load forecasts, dispatch scheduling, 
big power grid is first built. Secondly, a scheduling agent and dispatch operation adjustment. The auxiliary decision-
that takes into account the power grid's characteristics and making approach for control allows for fast assessment of 
the dispatcher's behavior is researched using the power future security situations and trends. With the use of that 
grid's historical operation data and the dispatcher's real technology, the power grids of Heilongjiang and Central 
control data. Finally, to address the issues of poor China have been able to transition from empirical to 
regulation speed, complex regulation decision-making, intelligent control, and precontrol techniques for 
and inadequate technical support ability, authors study the 
406   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
complicated power grid dispatching operations have power grid static stability was made possible with the 
received technological support. introduction of technology for knowledge graph 
The tiny sensor sample unit, energy metering device, automation engines. To demonstrate the efficacy of the 
communication unit, protection control device, suggested approach, an example using a real-world 
performance evaluation unit, etc. were all combined by the electricity system is provided. Regulatory and control 
experimenters of [26]. In conjunction with the transformer, operators may benefit from the study's findings by better 
keeping its original dimensions and construction. It is understanding the current state of operations and making 
possible to analyze the measured data locally, allowing for more informed decisions about the power system. 
an intelligent and transparent observation of the Explorationally, it may be useful for enhancing the 
performance indicators of the transformer. building of online intelligent active security defense 
Simultaneously, it can accomplish intelligent monitoring, structures on big power grids. 
reduce energy consumption and save energy, and aid in the  
creation of new power systems without uploading a 3 PV generated system integrated to 
mountain of normal and abnormal data. 
weak grid 
Using deep reinforcement learning, the authors of [27] 
provides an auxiliary control method for large-scale power 
The basic architecture of a three-phase grid-connected 
grid segments. An intelligent agent for power grid section 
double-stage solar power plant is shown in Figure 1. The 
control is built using the Deep Deterministic Policy 
integration of solar electricity into the electrical grid is 
Gradient algorithm. That agent provides real-time control 
achieved via the employment of this sort of technology, 
methods in complex power grid settings, taking into 
which guarantees effective power conversion and 
account both the safety and economics of power grid 
maintains grid stability. To generate and transmit 
operations. That justifies the proposal of a two-stage 
electricity from the solar PV array to the unreliable utility 
optimization approach that takes sensitivity into account. 
grid, the system relies on a number of moving parts, all of 
When operators are unable to remove the section 
which contribute in different ways. The PV array generates 
restriction via real-time control, they offer them with the 
the majority of the system's renewable electricity. It relies 
optimum market intervention strategy. At last, the efficacy 
on a network of solar panels to generate DC power from 
of market intervention plans and real-time control 
sunlight. The PV array's power production is directly 
mechanisms are tested via case studies. The methodology 
related to the amount of solar irradiation and temperature 
presented in that study improves the system's economy by 
that it can operate at. Maximizing the conversion of solar 
lowering the clearing price through an average of 1.2% 
energy into grid-ready alternating current electricity is the 
while the average adjustment amount through 37.6% under 
system's primary objective. To get the most power out of 
various section limits resulting from power generation 
the solar PV system, the DC-DC boost converter is an 
components participating in the market, as compared to the 
absolute must. For maximum efficiency in power 
current rules. 
conversion, it raises the DC voltage produced by the PV 
The authors of [28] looking at the power grid from a 
array until it is equal to or greater than the DC-link voltage. 
knowledge graph perspective, researchers were able to 
In order to keep the PV array running at its optimum power 
develop a functional framework for an intelligent evaluator 
point no matter what happens to the weather or irradiance, 
that could assess static stability, make decisions based on 
the boost converter works using a MPPT algorithm. An 
that evaluation, and be an all-around smart algorithm. That 
MPPT method known as Perturb along with Observe is 
evaluator took into account the stability state evaluation 
used to optimize the amount of energy harvested by the PV 
index while optimization control strategy data from 
array. One of the most popular ways to increase the output 
various power grid operation scenarios. The 
of solar PV systems is by using this algorithm.  
implementation of a visual evaluation tool for large-scale 
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 407 
 
DC bus
Inverter
DC Power grid
DC
ESS
Control Algorithm
Virtual Inertia  
Figure 1: Auxiliary power control in large power grid 
 
It works by monitoring the change in power output and as well as temperature under typical conditions: 𝐼𝑠_ref , 𝐺ref  
making adjustments to the operating voltage of the PV and 𝑇ref . The current changes with irradiation and 
array at regular intervals. When the power goes up, the temperature change, as shown in Eq. (1); yet, the 𝐼sat  
adjustment stays the same; when it goes down, it goes in fluctuation in temperature is the only determinant of 
the other way. The technology is able to maintain optimal current. In accordance with Kirchhoff's law, the PV panel's 
performance regardless of environmental changes because output current ( 𝑣𝑝𝑣 ) is given through: 
of this iterative procedure that continually monitors the PV 𝐼𝑝𝑣 = 𝐼𝑠 − 𝐼𝑑 − 𝐼𝑠ℎ𝑢                        (2) 
array's MPP. By responding in real-time to variations in Yes, it means we can: 
temperature and irradiance, the P&O MPPT algorithm 
𝑞(𝑣𝑝𝑣+(𝐼𝑝𝑣∗𝑅𝑆𝑒𝑟))
keeps the boost converter operating at the ideal voltage 𝐼𝑝𝑣 = 𝐼𝑠 − 𝐼𝑠𝑎𝑡 [𝑒𝑥𝑝⁡ ( ) − 1] −
𝑛𝑘𝑇
input from the PV array.  In areas where the amount of 
𝑉𝑝𝑣+(𝐼𝑝𝑣∗𝑅𝑆𝑒𝑟)
sunshine varies throughout the day, the efficiency of the                             (3) 
𝑅𝑠ℎ𝑢
solar PV system depends on this capability to monitor the With: 
MPP under changing circumstances. 
𝑞(𝑣𝑝𝑣+(𝐼𝑝𝑣∗𝑅𝑆𝑒𝑟))
 𝐼𝑑 = 𝐼𝑠𝑎𝑡 [𝑒𝑥𝑝⁡ ( ) − 1]               (4) 
𝑛𝑘𝑇
3.1 PV array modelling 
And: 
𝑉
To enhance the voltage or current level, the PV panel uses 𝐼 𝑝𝑣+(𝐼𝑝𝑣∗𝑅
𝑠ℎ𝑢 = 𝑆𝑒𝑟)
                       (5) 
𝑅𝑠ℎ𝑢
numerous modules linked in series or parallel, accordingly. 
 
A current source, two types of resistance (series and shunt), 
3.2 DC-DC converter 
with an antiparallel diode make up the equivalent circuit of 
a PV cell, as shown in Figure 2. The current source ( 𝐼𝑠 ) is Here is one way to express the transfer function of the 
expressed by de following equation: boost converter: 
𝐺
𝐼𝑠 = ( ) (𝐼 1
𝐺 𝑠_𝑟𝑒𝑓 + 𝐾𝑠𝑐 ⋅ (𝑇 − 𝑇𝑟𝑒𝑓))             (1) 𝑣
𝑟𝑒𝑓 𝑚 = 𝑣              
1−𝐷 𝑝𝑣       (6) 
where irradiance (G) and ambient temperature (T) are the The relationship between the average currents flowing into 
two variables. The coefficient of short-circuiting current is and out of an electrical device may be expressed as 
denoted as 𝐾𝑠𝑐 . The following are the current, irradiation, follows: 
408   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
1
𝐼𝑝𝑣 = 𝐼                      (7 where ?̂? stands for the value of the normalized property. 
1−𝐷 𝑑𝑐 ) 
The function min(x) finds the lowest value in the values of 
The equation for the DC bus may be written as: 
𝑑𝑣 the attributes while max(x) finds the highest value. 
𝑑𝑐 1
= (𝐼 𝐼
𝑑 𝐶 𝑑𝑐 − 𝑖𝑛𝑣)                                     (8) 
𝑡  
 3.5 Missing value completion 
3.3 DC-AC inverter 
One approach that uses nearby data points is KNN (K-
It is possible to transform DC electricity into AC voltage Nearest Neighbors) interpolation. The goal of this 
with the frequency and amplitude of our choice thanks to technique is to estimate the target point's value by 
the inverter, the adaptation step. The inverter control comparing it to the values of the K data points that are 
makes it possible to inject higher-quality currents and known to be the closest to it. For KNN interpolation, the 
powers (P,Q) into the grid. The input/output inverter fundamental procedures are these: 
voltage relationship is defined as: Choose the K-value: Choose the optimal K-size by 
 determining its value, often using cross-validation. 
𝑣𝑎𝑛 = (𝑆1 − 𝑆2)𝑣𝑑𝑐 Determine Distance: Find the total distance in geometric 
{𝑣𝑏𝑛 = (𝑆2 − 𝑆3)𝑣𝑑𝑐 units between the current location and all other known 
𝑣𝑐𝑛 = (𝑆3 − 𝑆1)𝑣𝑑𝑐 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡(9) locations. This is the formula for the distance in geometric 
 
𝑣𝑎 𝑣 2 −1 −1 𝑆1 units: 
𝑣 𝑑𝑐
[ 𝑏] = [−1 2 −1] [𝑆2]  
𝑣 3
𝑐 −1 −1 2 𝑆3
 𝑙(𝑥 𝑀 2
𝑙 , 𝑥𝑓) = √∑𝑚=1     (𝑥𝑙,𝑚 − 𝑥𝑙,𝑚)                    (14) 
where 𝑣𝑑𝑐  is the DC voltage, 𝑣𝑖𝑛(𝑖 = 𝑎, 𝑏, 𝑐) and 𝑆𝑗(𝑗 =  
1,2,3) consist of alternating current voltages and signals where 𝑥𝑖 and 𝑥𝑗 constitute data points, with M serving as 
indicating the current state of the switches. Here is the the data dimension. 
equation for grid voltages: How to Determine the K-Nearest Neighbours:  choose the 
 K known points of data that are most closely located to the 
𝑣𝑔𝑎 𝑣𝑎 𝐼𝑔𝑎 𝐼𝑔𝑎 desired location. 
𝑣 𝑑
[ 𝑔𝑏] = [𝑣𝑏] + 𝑅 [𝐼𝑔𝑏] + 𝐿 [𝐼𝑔𝑏]                  (10) Weighted averaging: Give each of your K neighbors a 
𝑣 𝑑𝑡
𝑔𝑐 𝑣𝑐 𝐼𝑔𝑐 𝐼𝑔𝑐 weight that is inversely proportionate to their distance 
 from you. The formula for the weighted average 
The goal of studying and realizing the decoupling among interpolation for the K closest neighbors is 
the active (P) with reactive (Q) capabilities was to regulate ∑𝐾
?̂? = 𝑘=1   𝑤𝑘𝑦𝑘                        (15) 
∑𝐾
      
them independently. If we want a fair system, we can just 𝑘=1   𝑤𝑘
put down the powers 𝑃𝑔 and 𝑄𝑔 as follows: where 𝑦𝑘 𝑤𝑘 is the weight that defines the distance from 
 the location to be interpolated, and is often specified as the 
3 opposite proportion of the distance, and is the value of the 
𝑃𝑔 = (𝑣 +
2 𝑔𝑑𝐼𝑔𝑑 𝑣𝑔𝑞𝐼𝑔𝑞)
{ 3                                  (11) k-th neighbour: 
𝑄𝑔 = (v I
2 gq gd − vgdIgq)  
1
Indeed, we can write: 𝑤𝑘 =                         (16) 
𝑑𝑘
  
3
Pg = v
2 gdIgd 3.6 Deal with unbalanced data 
{ 3                                              (12) 
Qg = − v )
2 gdIgq In classification tasks, when minority samples are 
 oversampled, an interpolation approach called Synthetic 
where vgdq as well as Igdq, which stands for grid current. Minority Oversampling Technique (SMOTE) is used to 
 address imbalanced datasets. By augmenting the dataset's 
3.4 Normalization diversity via the synthesis of fresh minority samples, 
SMOTE boosts the classifier's performance. The detailed 
The data were standardized to ensure that the model's procedures are these: 
accuracy was unaffected by dimensions. The min-max  
scaling approach was used for normalization in this Pick a Representative Sample: Pick a representative 
research. sample at random from the minority group. 
  Determine the sample's k closest neighbors by using a 
𝑥−𝑚𝑖𝑛(𝑥)
?̂? =                     (13) distance measure. 
𝑚𝑎𝑥(𝑥)−𝑚𝑖𝑛(𝑥)
 
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 409 
 
 Create a fresh sample by using the following formula to shown by the increased energy wasted at the grid system. 
synthesize a neighbor from among these k neighbors at As a result, the grid power plant facilitates degradation 
random: cost in order to fulfill the particular demand at the grid 
 system's discharge point. Equation (6) also allows for the 
new_sample = 𝑥1 + 𝜆(𝑥1 − 𝑥1)                       (17) modeling of the deterioration cost using a quadratic 
 function. 
3.7 Maximum power point tracking (MPPT)  
d′j[c
′
j(n)] = δ ′ 2
jcj(n) + μ ′
jcj(n) + λi                  (20) 
The DC-DC boost converter controls the output of PV  
cells, which is one of its dual functions. As a result, MPPT Where, δj, μj and λi represents the degradation cost 
is simplified and the output voltage is reliably controlled. 
function and is represented by operational cost parameters 
This study combines a DC-to-DC converter with the 
d′j ∣ c
′
j(n)]. Because of the limited integration between the 
widely known MPPT algorithm to optimize power 
operating cost parameters and the grid system's 
extraction from PV panels. The operating point must be 
discharging rate, the constant value here must be 
dynamically changed to the Maximum Power Point in 
associated with the grid system's discharge rate. However, 
order to accommodate changing weather conditions. The 
using eqn. (21) in the following context, the cost function's 
low cost and user-friendliness of the MPC algorithm led to 
simplicity is related: 
its selection for MPPT. The MPC algorithm tracks the PV 
 
array's current and voltage down to the microsecond in 
order to foretell how a voltage modification will play out. f ′[c′j (n), p′(n)] = d′j[c
′(n)] + o′j − U ′
j ∣ Cj(n)] (21) 
This approach may be more resource intensive, but it can  
adapt to new conditions very fast. A little amount of energy Where, o′j is the formula for the lumped cost. Here, power 
can be saved in that gadget for use in seconds, and its is supplied by the grid at a net cost rate according to the 
performance is assessed by comparing the discharged and electricity pricing unit with either an off-peak or peak-time 
charged powers of the device. At all times, the following tariff. Not to mention that the fixed price unit diverges 
equation (4) describes how the charging and discharge from the original cost function. 
rates of the constraints are combined with the battery  
efficiency. 3.9 Design of MPC  
1
W′
ess(n) = W′
ess(n − 1) + αcp
′
cΔn − p′ Δn
α d An extensive evaluation of the reference grid currents is 
d
W′ carried out, taking into consideration various factors such 
ess ≤ W′
ess(n) ≤ W′
ess ⋅ max  
{ p′c(n) ≤ p′ as the presence of nonlinear loads at the Point of Common 
c⋅max
p′d ≤ p′ Coupling, regulation of the DC link voltage, and dynamic 
d⋅max
variations in PV power. This reference current is fed into 
 
the MPC controller, which then calculates the quantity of 
            (18) 
switching pulses required for optimum functioning. 
 
Considering the dynamic changes in PV power, ensuring 
Where, W′
ess  The energy storage limits are represented by 
p′ stable control of the DC link voltage, while tolerating 
c, the charging power is p′d, and the battery efficiency 
nonlinear loads at the PCC allow the system to efficiently 
while charging and discharging is αc. 
supply reference grid currents that sustain efficient 
 
operation. The MPC controller enhances the system's 
3.8 Cost function 
general efficiency and stability by using these currents to 
Using three crucial factors, including 1) the energy and identify the optimum switching pulses. As a result, the 
following equations (8), (9), (10), (11), define the key 
discharging rate of each grid system, 2) the degradation 
cost of the battery and the discharge rate, and 3) the function of the charging station's net cost function in 
regards to multi-objective optimization problems. 
operation cost of other activities such service chargers and 
cable wear, we need to build the net cost function of the  
j^"th" grid system. First, use the following equation (5) to minCj(n) = ∑j∈T(n)     sj ∣ Cj(n) + G(n)]
express the grid system discharge rate. Cj(n) = Cj(n)∀j ≠ i ∈ T(n)
 j j              (22) 
Cmin ≤ Cj(n) ≤ Cmin∀j ∈ T(n)
Uj⌊C
′
j(n)⌋ = p′(n)C′j(n)                   (19) j
SOCmin ≤ SOCj(n) ≤ 100%∀j ∈ T(n)
 
 
where⁡p′(n) represents the unit pricing with the grid 
Where, Cj(n) represents the cost function of a grid system 
aggregator at time n, and c′j(n stands for the discharge rate 
charging station, and is depicted as the minimization of the 
of each network grid at that specific time n. In this case, 
net cost function for every grid system charging stations, 
the degree of the generated aggregator grid system is 
410   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
∑j∈T(n)   appears as the energy cost function for the jth user Where the Fourier transform is denoted by F. Equation (1) 
over the time interval t. Additionally, the suggested KNN- may be simplified to describe the convolution process f*x 
SMOTE-GCN model's flowchart is shown in figure 2. in the spatial domain through the use the inverse Fourier 
 transform 𝐹−1 to both sides. 
 
𝑓 ∗ 𝑥⁡= 𝐹−1(𝐹(𝑓) ⊙ 𝐹(𝑥))
Proposed KNN-SMOTE-GCN with MPPT algorithm                 (24) 
⁡= 𝑈((𝑈𝑇𝑓) ⊙ (𝑈𝑇𝑥))
 
Where U stands for the Fourier basis while ⊙ means 
Power loss
multiplication element-wise. The goal of the GCN was to 
Voltage deviation provide a way for neural networks to use the association 
graph. The GCN does this by obtaining the Fourier basis 
Reactive power fluctuation from the graph's Laplacian matrix. What if 𝐿𝑚 = 𝐷 − 𝐴 is 
a graph's Laplacian matrix. One way to standardize it is as 
Verifcation THD
 𝐿𝑚 = 𝐼𝑁 − 𝐷1/2𝐴𝐷1/2 ∈ ℝ𝑁×𝑁, where 𝐼𝑁 is the 
Figure 3: Typical Model Diagram for KNN-SMOTE- neighboring matrix and denotes a unit matrix. For the 
GCN degree matrix, D stands for 𝐷𝑢 ∈ ∑ ⁡ , 𝐴𝑢𝑓. Then, using the 
 
eigenvalue decomposition, one may derive the Fourier 
3.10 Graph convolutional network (GCN) basis, U, and the eigenvalue matrix ∧. 
 
Building the association graph: A collection of nodes V 
𝑈
and edges E may be characterized as a graph G(V, E). The ∧𝑈
𝑇 = 𝐿𝑚 , 𝜆 = 𝑑𝑖𝑎𝑔([𝜆0, … , 𝜆𝑁−1])               (25) 
 
connection between individual nodes 𝑣𝑓 and 𝑣𝑓 is signified 
U is a set of orthogonal matrices satisfying the Fourier 
by an edge 𝑒𝑓 ∈ 𝐸. In order to make it easier to aggregate 
𝑔 transform's mathematical constraints, based on the 
information in the graph framework, an adjacency matrix Laplacian matrix's properties. The diagonal matrix, 
𝐴 is built 𝐴[𝑖, 𝑗] = 1 if the edge 𝑒𝑓𝑗 exists, besides 𝐴[𝑖, 𝑗] = denoted as 𝑔𝑒 = 𝑑𝑖𝑎𝑔(𝑈𝑇𝑓). Next, we may simplify 
𝟎 then. Equation (2) by following these steps: 
The convolution theorem states that, in terms of forward  
propagation of the GCN, the Fourier transform of a 𝑓 ∗ 𝑥 = 𝑈((𝑈𝑇𝑓) ⊙ (𝑈𝑇𝑥)) = 𝑈𝑔𝜚𝑈
𝑇𝑥          (26) 
convolution between two signals is the same as the  
pointwise multiplication of their individual Fourier Graphic convolution relies heavily on the eigenvalue 
transforms. Let 𝑓 ∗ 𝑥 Introduce the spatial domain decomposition of the Laplacian matrix. There is a 
convolution operation, which 𝑥 = {𝑥1, 𝑥2, … , 𝑥𝑛} ∈ 𝑅𝑛 quadratic relationship between the total amount of nodes 
stands for a dataset that has n pieces of data and 𝑓 = and the computing complexity when the graph size is big. 
{𝑓1, 𝑓2, … , 𝑓𝑛} are the neural network's trainable Graph convolution methods are mostly useful for small-
parameters. Using the Fourier transform, this procedure scale networks due to the high cost of eigenvalue 
may be converted to the frequency domain. decomposition. Figure 3 showed in GCN model. In order 
 to tackle this problem, Krizhevsky et al. suggested a 
𝐹(𝑓 ∗ 𝑥) = 𝐹(𝑓) ⋅ 𝐹(𝑥)                   (23) method for approximating g_ş via Chebyshev polynomials 
 T_k, that may be stated in the following way: 
 
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 411 
 
Offline training
Online application
Feature selection and 
extraction Data loading Distribution network real-time 
measurement data
Historical data of Original features and 
distribution network labels
Random matrix 
Extract input features
strategy coder-decoder
Sample dataset
Data preprocessing
Training set Test set
Reactive power 
optimization model based 
on GCN
GCN model
Determine the optimal 
Adjust training parameters
structure of GCN
Real-time optimization 
strategy 
Train GCN model
Test and verify
Evaluate 
performance
 
 
Figure 3:  Proposed GCN model 
 
𝑔𝜚(𝛬) = ∑𝑘−1
𝑘=0   𝜃𝑘𝑇𝑘(?̃?)                                     (27) neighboring the central node. This leads us to the 
 following simplification of (6): 
in where θ stands for the Chebyshev coefficient while 𝑇𝑘  
for the k-th element of the Chebyshev polynomial. To be 2𝐿𝑚
𝑓 ∗ 𝑥 ≈ 𝜃0𝑥 + 𝜃1 ( − 𝐼 )
more precise, it is 𝑇𝑘(𝑥) = 2𝑥𝑇𝑘−1(𝑥) − 𝜆 𝑁 𝑥
𝑚𝑎𝑥 ⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡⁡(29) 
𝑇𝑘−2(𝑥), 𝑇𝑜(𝑥) = 1, and 𝑇1(𝑥) = 1.∧̃= 2 ∧/𝜆max − 𝐼 1 1
𝑁 ⁡≈ 𝜃0𝑥 − 𝜃 −
1 (𝐷
−
2𝐴𝐷 2) 𝑥
contains the eigenvalues of scale in a diagonal matrix. 
Then, we may write (4) as:  
𝑓 ∗ 𝑥 = 𝑈𝑔 ∘ 𝑈𝑇𝑥 ≈ ∑𝑘−1
𝑘=0   𝜃𝑘𝑇𝑘(𝑈 ∧̃ 𝑈𝑇)𝑥 = By setting the parameter 𝜃 = 𝜃0 = −𝜃1, (7) more 
∑𝑘−1 information about: 
𝑘=0   𝜃𝑘𝑇𝑘(𝐿?̃?)𝑥                                               (28) 1 1
 𝑓 ∗ 𝑥 ≈ 𝜃0 (𝐼𝑁 + 𝐷−
2𝐴𝐷−
2) 𝑥                           (30) 
where ?̃? = 2𝐿/𝜆max − 𝐼𝑁 and 𝜆max  stand for the highest  
eigenvalue of the Laplacian matrix. A more simplified Additionally, the settings allow the network to be trained 
version of the Chebyshev polynomials was developed by using backpropagation " 𝑊,𝐷c  often undergo 
Xiao et al. 𝜆max = 0 and 𝑘 = 2, that is, the data is only renormalization via ?̃? = 𝑊 + 𝐼𝑁 and ?̃?𝑡𝑙 = ∑𝑓   ?̃?𝑡𝑙, that 
aggregated from nodes that are in the first order is, in turn. Lastly, the spectral domain convolution 
operation is defined as: 
412   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
1 1 1 1
𝑓 ∗ 𝑥 ≈ 𝜃 (𝐼𝑁 + 𝐷−
2𝐴𝐷− −
2) 𝑥 = 𝜃 (?̃? 2?̃??̃?−
2) 𝑥                    i r radiation, all while maintaining a constant temperature of 
25 degrees Celsius for the photovoltaic array. Standard test 
                                                                            (31) 
conditions (STC) were used in order to determine the 
 
output of the solar panels while the temperature was set to 
4 Result and discussion  25 degrees Celsius. The Simulink model of the 
photovoltaic (PV) system, which illustrates the linked 
4.1 Configuration of PV system components and the interactions between them. 
Additionally, the mathematical model that is used to 
This study's suggested PV system is composed of a great explain the solar panel's electrical characteristics is 
deal of different components. A first step involves the use included into the PV module block, which serves as a 
of a solar panel to convert solar energy into electrical representation of the solar panel. It takes into account the 
energy. Through the use of a boost converter, the output input solar radiation as well as temperature in order to 
voltage of the array is thus increased while simultaneously generate the matching current–voltage (I–V) and power–
maintaining the appropriate voltage level. A DC–AC voltage (P–V) curves. It is the responsibility of the booster 
converter is provided in order to maintain a power factor converter block to monitor the drop in the output voltage 
of one while converting DC to AC. In addition, a of the PV array. The control algorithm that is used to guide 
transformer is used in order to raise the output voltage to the functioning of the boost converter via the utilization of 
the amount that is necessary for a common connection. In the MPPT approach is included inside it. The Maximum 
order to optimize power extraction, maintain a power Power Point Tracking (MPPT) algorithm continually 
factor of one, and modify junction voltage, the control analyzes and adjusts the PV system's operating point in 
group of the system is comprised of a number of different order to achieve maximum power extraction. Through the 
strategies that have gone through extensive study. In this use of the DC–AC inverter block, the DC power generated 
part, the primary issues that will be discussed are the by the PV array is converted into AC energy that is 
modeling of a solar power system and the performance of compatible with the grid. Furthermore, a power factor of 
the system. It begins by providing an overview of the one is assured, in addition to the maintenance of the quality 
characteristics of the PV module. It covers how the and interoperability of the AC power that is produced with 
photovoltaic module reacts to variations in temperature the utility grid. Transformers and grid connections are two 
and the amount of sunlight that it receives. Another examples of extra model construction parts that might be 
component that is included is the boost converter, which is used to depict the photographvoltaic (PV) system as a 
responsible for monitoring the reduction in the output whole as well as its connection to the conventional 
voltage of the PV array. An exhaustive amount of electrical grid. 
information is provided on the operation of the boost Results from a two-stage PV system with a three-level 
converter as well as its control mechanisms, which include inverter and a DC/DC converter that is linked to a weak 
the MPPT approach. With the help of the MPPT grid are shown below. Results show that the control 
technology, the photovoltaic (PV) system is able to run at method and inverter configuration were executed when the 
its maximum power output regardless of the changing system was evaluated under different dynamic situations. 
environmental conditions. A DC–AC inverter is also The PV array, DC/DC converter, and three-level inverter 
discussed in this section. This device converts DC energy that interface with the grid is all shown Table 1, which is 
generated by a photovoltaic array into AC power for grid the system schematic. In Table 1 we see the system's 
integration. While discussing the operation and parameters. Grid voltage sag, Grid voltage swell, 
management of the DC–AC converter, a power factor of irradiance change, and a comparison between two-levels 
one is maintained throughout the discussion. Within the with three-level inverters are among the operational 
context of this section's treatment of the modeling, situations that the system is evaluated under. Voltage on 
performance, and control elements of the PV system, the the grid, current via the grid, current through the VSC, 
PV module, boost converter, MPPT method, and DC–AC current through the PV array, and the weighted positive 
inverter are all dissected in great detail. sequence are the critical metrics studied. The stability, 
 power quality, as well as transient responsiveness of the 
4.2 Simulation  system under dynamic situations may be understood by 
examining these factors. 
During this section, the performance of the system was 
 
examined at a number of different levels of direct sunlight 
 
 
 
 
 
 
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 413 
 
Table 1: System parameters 
 
Parameters Value 
PV Array 55 
Power Rating 35 kW 
Maximum Power (W) 211.802 
Short-circuit current Isc (A) 9.03 
Voltage at maximum power point Vmp (V) 27.9 
Cells per module (Ncell) 70 
Open circuit voltage Voc (V) 39.17 
Shunt resistance Rsh (ohms) 312.6345 
Temperature coefficient of Voc (%/deg.C) -0.36044 
Temperature coefficient of Isc (%/deg.C) 0.112 
Parallel strings 7 
Series-connected modules per string 23 
Boost Converter 
Inductor 𝐿cc (𝑚𝐻) 4 
Capacitor 𝐶𝑎𝑐(𝜇 𝐹) 100 
Voltage Source Converter 
Interfacing Inductor 𝐿𝑓(𝑚𝐻) 75 
𝑅𝐶𝑅𝑓(𝛺) 0.4 
𝑅𝐶𝐶𝑓(𝜇 𝐹) 100 
Grid Voltage and Frequency, (V) and (Hz) 433, 70 
DC link capacitor 2200 µF 
PV array current Ipv 3.46 A 
Inductance L 2 mH 
Resistor R 0.1 Ω 
PV array voltage Vdc 540 V 
Grid Frequency 50 Hz 
Grid Voltage rms 120 V 
The experimental environment and the recommended order to combine the objective function, constraints, and 
technique's effectiveness are described in this section. suitable optimization solvers, such as linear programming. 
Several metrics, including power loss, grid current, voltage or mixed-integer planning. After this, the control 
deviation, along with grid voltage, are used to assess the algorithms for the first, second, and third control levels are 
system's performance via the use of the innovative KNN- designed and implemented inside a hierarchical manage 
SMOTE-GCN algorithm. By redistributing loads and structure. This is done in order to further govern the 
arranging generating units, KNN-SMOTE- system. For the purpose of testing these control algorithms 
 and verifying that they are stable and effective, the system 
GCN systems improve the efficiency of power grids. To is then simulated under a variety of market condition 
optimize power quality, KNN-SMOTE-GCN controllers scenarios. With the last step, the control algorithms are 
regulate the grid's reactive power, voltage, and harmonic implemented for real-time operation. This means that the 
correction. The system is constantly adjusting the control system constantly checks and changes the distributed 
settings using fuzzy rules with real-time data to maximize power resources (DPRs) based on the data that is being 
power quality.  collected in real time. 
Implementation Steps  
There are various essential phases involved in the 4.2 Comparative analysis  
implementation process. Before anything else, it is 
necessary to gather historical data on demand, generation, Table 2 illustrates the existing techniques with their 
and market pricing. Additionally, forecasting models description.  
should be used in order to make predictions about future  
demand, renewable generation, and market prices. In the  
next step, the optimization problem is stated and used in  
414   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
Table 2: Comparison techniques When power grid voltages deviate from their nominal or 
Technique Description ideal values, this is known as voltage deviation. Nominal 
Active Filters To reduce harmonic voltage standards could vary by region while kind of 
distortion and enhance electrical system, although they often range from 230V to 
power quality, active filters 400V and beyond. Voltage must be maintained constant 
are a useful tool.  and under control for grid-connected electrical gadgets and 
Wavelet Neural •  These may be machinery to work reliably. A number of factors contribute 
Networks (WNN) used in the creation of to voltage fluctuations' potential effects on the 
controllers for auxiliary performance and longevity of electrical devices. When the 
damping.  real voltage exceeds the nominal voltage, overvoltage 
Artificial Neural •  Power systems occurs. As seen in figure 5, the term "under voltage" is 
Networks may have their dynamic used when the real voltage is less than the nominal voltage. 
(ANNs): responsiveness improved Active filter
70
with the help of ANNs.  WNN
Virtual  The grid may benefit from 60 ANNs
Synchronous the inertia and damping 
50
Generator (VSG) provided by VSGs. 
Deep •  Damping 40
Deterministic controllers may be 
30
Policy Gradient designed with the help of 
(DDPG) DDPG. 20
 
10
Power loss occurs in a grid system when electrical energy, 
in the process of transmission and distribution, dissipates 0
as heat. Transmission or distribution losses are other 0 1 2 3 4 5 6 7 8 9 10
names for this occurrence. A lower current density per unit Time (s)  
of power is a common result of increasing voltage. When  
the voltage or current in a three-phase circuit is not Figure 5: Voltage deviation 
balanced between the phases. Voltage or load imbalances  
cause an uneven distribution of electricity, which in turn The efficiency, reliability, and performance of an electrical 
causes losses. Low power factor happens when the network are all impacted by fluctuations in reactive power 
voltage-current relationship is not ideal. Figure 4 show that in a grid system. Maintaining safe voltage levels and 
when the power factor is low, the reactive power increases, powering inductive loads both need reactive power. There 
leading to higher losses in the transmission and are a lot of potential sources of reactive power fluctuations, 
distribution systems. which might lead to undesirable outcomes. Reactive power 
 is a component of electrical power that does nothing useful 
while it sways between the generator and the consumer. 
8
"Reactive volt-amperes" is the standard measuring unit. 
7 When inductive loads are included or excluded, changes to 
the load profile may cause variations in reactive power. As 
6
seen in figure 6, fluctuations in generator output, 
5 Neural especially in synchronous generators, may affect reactive 
network power. 
4
 
3 KNN-
SMOTE-
2 GCN
1
0
0 1 2 3 4 5 6 7 8 9 10
Time (s)
 
Figure 4: Power loss and time analysis 
 
 
Power loss (Watts)
Voltage deviation (volts)
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 415 
 
Figure 6: Reactive power fluctuation 
Active filter
700  
WNN
ANNs A grid system experiences THD when harmonic 
600 components are present in the voltage or current waveform 
in relation to the fundamental frequency. In power 
500 systems, the fundamental frequency is typically 50 or 60 
Hz, and harmonics are multiples of that. Harmonics may 
400 be caused by a variety of sources, including non-linear 
loads and switching operations.  
300  
200
100
0
0 1 2 3 4 5 6 7 8 9 10
Time (s)
 
0,18 Fundamental (50Hz) V THD= 0.33%
0,16
0,14
0,12
0,1
0,08
0,06
0,04
0,02
0
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
Harmonic order
 
Figure 7: THD analysis 
 
We get the THD by Figure 7 by dividing the root mean reaches a high enough voltage to meet the load's 
square (RMS) value of the harmonic content by the RMS requirements. Whenever power is needed, it is transferred 
value of the fundamental frequency. Typically, total from the stored energy in the inductor to the load. The duty 
harmonic distortion is expressed as a proportion of the cycle, or gate pulse input, is responsible for carrying out 
fundamental frequency. Additionally, figure 6 shows THD the whole operation. It is vital to manage the duty cycle. 
in action. The performance comparison yielded better After then, it's a matter of getting the most electricity out 
findings from the proposed work's assessment of of the PV array in any weather. To maximize the voltage 
performance. The comparative assessment has shown that and power output of a photovoltaic array, irradiance and 
the proposed model has successfully minimized the THD temperature are the two most critical elements. Therefore, 
as much as possible. Consequently, PV systems that are it is necessary to monitor the maximum power stage, 
linked to the grid may use it. In this study, we maximize which is near the PV array's maximum power. The MPPT 
the produced output power of the PV panel by using a DC- was created to provide a standardized, efficient tracking 
DC converter using MPPT. Step one involves regulating system. Prior research has explored a wide variety of 
the boost converter's duty cycle. It is necessary to MPPT methods for peak power tracking. Some MPPT 
gradually raise the DC voltage of the PV array until it methods, like P&O, which uses step-size control as well as 
Reactive power (Var)
Mag (% of fundamental)
416   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
oscillates around steady state in response to dynamically were very promising for the proposed system, obtained 
changing environmental variables, have been shown to after an exhaustive series of simulations meticulously 
have significant drawbacks, however. The incremental executed on the MATLAB/SIMULINK platform. The 
conductance method is more complicated and expensive predictive control systems utilized demonstrated 
[11], [19], but it responds quickly to changing conditions. remarkable robustness in the face of dynamic variations in 
The controllers utilized in this investigation yielded solar radiation levels, allowing for a constant energy 
promising outcomes since they were based on production profile relative to the energy production 
mathematical principles. There has been an astonishing profile. Additionally, the suggested system's adaptability 
level of consistency throughout the whole energy output, to rapidly changing weather conditions ensures continuous 
leading to a steady supply of 27 MW of pumped electricity and dependable energy generation, thus establishing its 
to the grid. This is true even if the amount of sunlight status as a robust and resilient energy solution alternative. 
reaching Earth has changed during the course of the day. As we navigate into the future of photovoltaic (PV) 
Many researchers and professionals in the field have taken systems, it is wise to direct research efforts on 
an interest in photovoltaic (PV) systems. The incremental investigating and perfecting innovative control techniques. 
conductance + integral regulator strategy is one of the The overarching goal is to make the system far more 
methods proposed for training the MPPT controller; it is efficient and productive, with an unwavering commitment 
referenced. The goal of developing this method was to to producing even more remarkable and dependable 
ensure that the photovoltaic (PV) system operates at its results. Furthermore, the research plan includes a 
maximum power point in all weather conditions, thereby comprehensive comparison study, an exhaustive endeavor 
optimizing its performance. Also, a Proportional-Integral aimed at methodically contrasting the effectiveness of 
(PI) controller was recommended as a method for these novel control methods with the performance metrics 
controlling the DC-AC converter in the study. The of the current systems. The whole capability of 
conversion of direct current (DC) from solar panels to sophisticated control techniques is expected to be exposed 
alternating current (AC) for grid integration relies on this by using this methodical approach. By streamlining grid 
converter. It should be noted that various control connectivity, these methods are poised to change the 
techniques become unstable when exposed to large course of renewable energy generation. Ultimately, this 
fluctuations in solar irradiation. Keeping energy output research adds to the growing body of knowledge on 
steady is made more difficult by the fact that solar radiation renewable energy sources by introducing a new 
is inherently unpredictable, especially when clouds are photovoltaic (PV) system and demonstrating the system's 
present or when the sun's beams are changing. A change in inherent capacity to address major energy and 
the amount of power supplied into the system could be environmental issues. This contribution demonstrates the 
discernible if sun irradiation decreases. Concerns about the potential of state-of-the-art control systems and 
practical applications of PV systems, particularly those optimization methodologies, building a foundation for a 
connected to the electricity grid, are highlighted by this future that is sustainable, energy-efficient, and kind to the 
phenomenon. Although mathematically-based controllers environment. 
have performed well in conditions of relatively constant  
solar radiation, they may require additional tuning to 5 Conclusion  
account for the challenges posed by sudden and 
unexpected changes in solar radiation. These findings are In order to improve grid-connected PV systems, this study 
important because they show how important it is to have presented a new KNN-SMOTE-GCN method. In this case, 
adaptive control systems that can adjust to new conditions the UPQC model is used to enhance power quality. This 
and maintain a steady power supply and stable grid. model controls voltage and current concerns to assure 
Research in this area may focus on creating more resilient better power quality. Beyond that, the MPPT algorithm, 
and flexible solar controllers in the future by combining which controls the grid system dynamically, extracts the 
real-time weather forecasts with sensor-based feedback maximum power from solar panels. By using GCN, the 
systems. To further improve the reliability of grid- grid system's MMPT and UPQC operations may be 
connected photovoltaic (PV) system [17]s, research into coordinated to ensure optimal power quality. Hence, power 
energy storage alternatives like batteries may also provide loss, voltage deviation, total harmonic distortion, and 
a means of reducing the impact of variations in sun reactive power variations make up the assessment criteria. 
irradiation. Solar energy consumption might be maximized In addition, we compare the resultant parameter 
with these upgrades, which would be a huge step toward considerations to those of more traditional models. 
creating sustainable energy and integrating systems. According to the results, the created KNN-SMOTE-GCN 
A cleaner and more sustainable energy landscape may be paradigm reduced power loss by 4% compared to the other 
achieved via the total performance and efficiency of models. The voltage deviation is 26.42V and the total 
photovoltaic (PV) systems, which can be enhanced harmonic distortion is 0.56THD. When applied to hybrid 
through this synthesis of current approaches. Study results 
Adaptive Control of PV-Integrated Power Grids Using KNN…                                              Informatica 49 (2025) 403–418 417 
 
renewable energy systems, DL models and optimization based on multi-modal information. In 2023 6th 
algorithms will improve BESS in the future. International Conference on Mechatronics, Robotics 
 and Automation (ICMRA)( (pp. 6-11). 
References IEEE.  https://doi.org/10.1109/icmra59796.2023.107
08213  
[1] Zhao, M., Li, Y., Ding, P., & Li, F. (2025, January). [10] Zhang, Y., Han, F., Jiao, Y., Li, S., & Zhang, Z. (2024, 
Research and Development of Hybrid Intelligent December). Identification of model parameter and 
Enhancement Analysis and Control System for measurement error in large-scale power grid base on 
Stability of Large Power Grid. In 2025 International graph attention network. In 2024 4th International 
Conference on Electrical Automation and Artificial Conference on Intelligent Power and Systems 
Intelligence (ICEAAI) (pp. 274-277). IEEE. (ICIPS) (pp. 422-426). IEEE. 
https://doi.org/10.1109/iceaai64185.2025.10956751     https://doi.org/10.1109/icips64173.2024.10900016  
[2] Lian, H., Hu, S., & Meng, Y. (2021, October). [11] Lu, W., Xu, W., Li, T., Luo, F., Yuan, Z., & Zha, X. 
Research on Supporting Control Technology of Wind (2023, July). Research on Online Auxiliary Decision-
driven generator Auxiliary Power Grid Based on Making Technology for New Energy Off-Grid 
Energy Storage DC Access. In 2021 International Systems. In 2023 5th International Conference on 
Conference on Advanced Electrical Equipment and Power and Energy Technology (ICPET) (pp. 867-
Reliable Operation (AEERO) (pp. 1-4). 873). IEEE. 
IEEE.  https://doi.org/10.1109/aeero52475.2021.9708 https://doi.org/10.1109/icpet59380.2023.10367587  
401   [12] Hu, Y., Liu, H., Zhao, J., He, X., Wei, X., Guo, X., ... 
[3] Dong, J., Qin, J., Ling, L., Lin, X., Chen, P., & Meng, & Zhou, N. (2024, October). Intelligent Flexible 
F. (2024, July). Research on Frequency Control and Control Device and Technology for Distributed PV. 
Optimization of Power Grid Auxiliary Load for In 2024 21st International Conference on Harmonics 
Diversified Vehicle Network Interaction. In 2024 3rd and Quality of Power (ICHQP) (pp. 283-287). IEEE. 
International Conference on Energy and Electrical https://doi.org/10.1109/ichqp61174.2024.10768714  
Power Systems (ICEEPS) (pp. 669-675). IEEE. [13] Lv, L., Fang, X., Zhang, S., Ma, X., & Liu, Y. (2024). 
[4]  https://doi.org/10.1109/iceeps62542.2024.10693157   Optimization of grid-connected voltage support 
[5] Bin, D., Ling, Z., Yingqi, H., Wei, Z., Zhenhao, Y., & technology and intelligent control strategies for new 
Jun, Z. (2021). Research on key technologies of energy stations based on deep learning. Energy 
intelligent operation control of super-large urban Informatics, 7(1), 73. https://doi.org/10.1186/s42162-
power grid based on multi-center structure. In Journal 024-00382-8  
of Physics: Conference Series (Vol. 1738, No. 1, p. [14] Hu, C., Wu, X., Cai, H., Cheng, L., & Huang, J. (2024, 
012048). IOP September). Research on Application and Control 
Publishing.  https://doi.org/10.1088/1742- Technologies of the Embedded HVDC in a Provincial 
6596/1738/1/012048  Power Grid. In 2024 11th International Conference on 
[6] Xu, J. (2024, August). The Application of Internet of Power and Energy Systems Engineering (CPESE) (pp. 
Things Technology in Intelligent Wind Power Grid. 336-340). IEEE. 
In 2024 4th International Conference on Energy https://doi.org/10.1109/cpese62584.2024.10840670  
Engineering and Power Systems (EEPS) (pp. 555- [15] Qiu, J., Xia, S., Zhang, J., Ren, Z., Xu, G., & Zhang, 
561). J. (2020, September). Research on key technologies of 
IEEE.  https://doi.org/10.1109/eeps63402.2024.1080 communication for large-scale stability control system 
4370  in modern power grid. In 2020 12th IEEE PES Asia-
[7] Sujatha, G. (2025). A Solar PV Integrated UPQC to Pacific Power and Energy Engineering Conference 
Enhance Power Quality using SEA (APPEEC) (pp. 1-5). IEEE. 
GullANFISAlgorithm. Informatica, 49(8).  https://do https://doi.org/10.1109/appeec48164.2020.9220391  
i.org/10.31449/inf.v49i8.6158  [16] Zhang, N., & Zhu, L. (2024, May). Research on 
[8] Liu, Z., Yao, N., Fan, Q., Zhu, X., & Xue, H. (2023, Intelligent Power Grid Attack Detection System 
August). Reasoning simulation of substation power Based on Machine Learning. In Proceedings of the 
grid fault events based on knowledge map technology. 2024 International Conference on Machine 
In 5th International Conference on Information Intelligence and Digital Applications (pp. 480-
Science, Electrical, and Automation Engineering 486).  https://doi.org/10.1145/3662739.3671374  
(ISEAE 2023) (Vol. 12748, pp. 918-923). SPIE.  [17] Syafiqah, M.N., Azzirah, M.R., Noor, S.Z., & 
 https://doi.org/10.1117/12.2690053  Suleiman, M. (2024). Artificial Intelligent Maximum 
[9] Lu, D., Liu, Y., Qiu, Z., & Huang, X. (2023, Power Point Tracking (MPPT) for Three Phase 
November). Intelligent decision technology for safety Transformerless Grid Inverter Technology. 
risk management and control of power operation site International Journal of Academic Research in 
418   Informatica 49 (2025) 403–418                                                                                                                               K. Zhang et al. 
 
Economics and Management Sciences. integration of main and auxiliary functions. In 2024 
https://doi.org/10.6007/ijarems/v13-i4/23085  3rd International Conference on Energy, Power and 
[18] Yang, R., Qian, J., Ji, Y., & Wang, M. (2024, August). Electrical Technology (ICEPET) (pp. 1436-1441). 
Construction of Intelligent Grid Automation IEEE. 
Dispatching System Based on SVG Technology.  https://doi.org/10.1109/icepet61938.2024.10626047  
In 2024 International Conference on Power, [28] Liu, S., Wang, J., Qiu, S., Li, Z., Zhang, K., & Lou, N. 
Electrical Engineering, Electronics and Control (2024, October). Research on Auxiliary Control 
(PEEEC) (pp. 65-70). Strategy for Large-scale Power Grid Based on Deep 
IEEE.  https://doi.org/10.1109/peeec63877.2024.000 Reinforcement Learning. In 2024 IEEE 4th 
18  International Conference on Digital Twins and 
[19] Sun, X. (2025). A Review of Vehicle - to - Grid (V2G) Parallel Intelligence (DTPI) (pp. 206-210). IEEE.  
Technology with Low Power - grid Impact. Academic  https://doi.org/10.1109/dtpi61353.2024.10778859  
Journal of Science and Technology. [29] Yang, H., Zhao, G., Xianjin, L., Li, Z., & Liu, D. 
https://doi.org/10.54097/spvbz820  (2024, September). Construction and application of 
[20] Zhou, X., Zhou, M., Chen, Y., Wang, J., Luo, X., & static stability intelligent evaluator for large power 
Ma, S. (2024, May). Exploration and Practice of grid from the perspective of knowledge graph. 
Virtual Power Plant Under Mega City Power Grid. In Journal of Physics: Conference Series (Vol. 2846, 
In 2024 IEEE 2nd International Conference on Power No. 1, p. 012026). IOP Publishing. 
Science and Technology (ICPST) (pp. 1411-1416). https://doi.org/10.1088/1742-6596/2846/1/012026  
IEEE. 
https://doi.org/10.1109/icpst61417.2024.10602362  
[21] Yue, T. (2025). Sensor-based life detection of solar 
cells. Informatica, 49(9). 
  https://doi.org/10.31449/inf.v49i9.5586  
[22] Zhang, K., Wu, X., Li, Z., Lv, Y., & Liu, S. (2024). 
Research on intelligent auxiliary regulation 
technology of large power grid section based on 
artificial intelligence. Journal of Electrical Systems. 
https://doi.org/10.21203/rs.3.rs-6774959/v1  
[23] Li, Z., Zhang, K., Qiu, S., Wu, X., & Chen, X. (2024, 
April). Key Technologies of Power Grid Auxiliary 
Decision-Making Based on Artificial Intelligence. 
In 2024 7th International Conference on Energy, 
Electrical and Power Engineering (CEEPE) (pp. 
1028-1032). 
IEEE.  https://doi.org/10.1109/ceepe62022.2024.105
86585  
[24] Zhang, X., Zhou, D., Zhou, G., Cao, W., Wang, M., 
Wang, C., & Li, H. (2022, December). Research on 
auxiliary decision-making of power grid fault 
recovery based on generative adversarial imitation 
learning. In Proceedings of the 2022 4th International 
Conference on Robotics, Intelligent Control and 
Artificial Intelligence (pp. 1140-1145). 
https://doi.org/10.1145/3584376.3584578  
[25] Pan, F. (2025). Forecasting Solar Energy Generation 
Using Machine Learning Techniques and Hybrid 
Models Optimized by War SO. Informatica, 49(2). 
  https://doi.org/10.31449/inf.v49i2.7554  
[26] Jianfeng, Y., Changyou, F., & Guangming, L. (2015). 
On-line trend analysis technology of large power grid 
considering operation mode arrangement. J. 
Automation of Electric Power Sytems39-(1), 111-116. 
https://doi.org/10.1049/ic.2015.0233   
[27] Liu, T., Zhang, S., Chang, J., Sui, H., Li, H., & Yu, H. 
(2024, May). Digital transformer with deep 
https://doi.org/10.31449/inf.v49i12.9094           Informatica 49 (2025) 419-432   419 
 
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced Language Model for 
Power Audit Text Understanding 
 
Jia Xiao-liang1, Li Sen1, Cui Xia1, Li Jing2, Sun Chang-peng1, Liu Dong-hua2, Chen Zheng-long2 
1State Grid Tianjin Electric Power Company, Tianjin, 300010, China 
2State Grid Tianjin Electric Power Company Chengdong power supply branch, Tianjin, 300250, China 
E-mail: jxlseuee@163.com 
*Corresponding author 
 
Keywords: Power audit text, multi-dimensional information retrieval, large language model (LLM), audit category 
classification, multi-dimensional information retrieval-based bidirectional encoder representations from transformers 
(MDIR-BERT) 
 
Received: May 1, 2025 
 
In the rapidly evolving energy sector, efficient access to relevant information from power audit reports is 
crucial for informed decision-making, regulatory compliance, and operational improvements. However, 
the intricate language, complex vocabulary, and unstructured format of power audit texts present 
significant challenges for conventional information retrieval techniques. To address these issues, the 
research proposes a novel power audit text understanding technology that combines multi-dimensional 
information retrieval enhancement with a domain-adapted Large Language Model (LLM) to enhance the 
performance of power audit text processing. The Multi-Dimensional Information Retrieval-based 
Bidirectional Encoder Representations from Transformers (MDIR-BERT) method captures electric-
power-specific morphology, domain-specific vocabulary, and intricate entity relationships more 
effectively. MDIR-BERT is pre-trained on a huge quantity of electric power audit transcripts utilizing both 
word-level and entity-level covered language modeling tasks. The model is trained on a curated dataset 
of annotated electric power audit documents sourced from regulatory and industrial environments. MDIR-
BERT integrates domain-specific pre-training with both word-level and entity-level masked language 
modeling, capturing electric power-specific morphology, terminology, and complex entity relationships. 
The data preprocessing steps include comprehensive text cleaning, normalization, and tokenization to 
ensure high-quality input for method training.  Experimental results show that MDIR-BERT achieves a 
classification accuracy of 98.82%, representing a +16.86% improvement over the baseline EPAT-BERT 
model (81.96%), along with notable gains in precision, recall, and F1-score. These findings highlight the 
effectiveness of integrating enhanced information retrieval techniques with specialized language modeling 
for the intelligent understanding of power audit documentation, paving the way for more accurate, 
scalable, and interpretable audit methods. 
 
Povzetek: MDIR-BERT, izboljšan jezikovni model s večdimenzionalnim iskanjem informacij (MDIR), je 
razvit za razumevanje revizijskega besedila elektroenergetike. S predhodnim usposabljanjem na besedni 
in entitetni ravni dosega kvalitetno klasifikacijo revizijskih kategorij. 
 
1   Introduction collection of resources [2]. However, user intentions 
were more complex than simply retrieving information 
The development of Information Retrieval (IR) 
based on similarity [3]. This audit is conducted by a 
technology has been intimately linked to the human need 
qualified firm with the necessary competencies in line 
for information access. In recent years, IR and associated 
with the requirements established by the Ministry of 
product systems have expanded significantly as a critical 
Energy and Mineral Resources. These criteria apply to 
constituent of smart data dispensation tools. The basis of 
businesses or industries that utilize a significant amount 
IR technology is the identification of documents related to 
of energy. A complete audit evaluates all areas of energy 
the customer's search from a big and unorganized 
usage, from fuel consumption to the use of generated 
collection, which usually leads to a graded catalog of the 
electrical energy [4]. Lowering electricity costs and 
documents by significance and user requirements [1]. IR 
cutting down energy waste requires an energy audit.  
plays an essential role in numerous real-world functions, 
Efforts have to be initiated by governments to require 
like expert finding, digital libraries, and Web search. IR 
periodic energy audits for industrial buildings.  An 
essentially refers to the task of retrieving information 
energy audit is a great way to find the best solution and 
resources related to information required from a large  
420        Informatica 49 (2025) 419-432       J. Xiao-liang et al. 
 
assess how much energy a building uses [5]. The encode domain-specific terminologies and 
generative probability of word sequences, or more intricate entity relationships. 
generally, the ability to predict forthcoming words ➢ Obtained a classification accuracy of 98.82%, 
conditional on prior words, is a crucial function of representing a +16.86% relative improvement 
language models (LM).  LMs were first created for text compared to the baseline EPAT-BERT model, in 
creation, but they are also being studied for addition to significant boosts in precision, 
reformulating  recall, and F1-score. 
a range of NLP issues into different text-to-text  
challenges in the text of electric power audits [6].  1.2 Research questions 
The implementation of Large Language Models (LLMs) RQ1: Can a domain-adapted BERT model (MDIR-BERT) 
marks the most important change in the technical enhanced with multi-dimensional information retrieval 
development of electric power audit text [7]. LLMs mark outperform general-purpose BERT (EPAT-BERT) in power 
a substantial advancement in Artificial Intelligence (AI) audit text classification? 
as it makes breakthroughs in generalization and RQ2: How does multi-dimensional information retrieval 
adaptability across tasks, but LLMs generate inaccurate improve entity recognition and contextual understanding 
information, misalign with temporal information, in regulatory audit texts? 
struggle to keep context, and struggle to fine-tune each RQ3: What impact does domain-specific pretraining have 
response, leading to serious issues regarding reliability on the performance of language models in complex, 
when applied to electric power audit text [8]. In the unstructured audit document processing? 
continually changing energy industry, timely access to The research outline is organized as follows: Section 2 
essential information from power audit reports is critical reviews related research, while Section 3 outlines the 
for making informed decisions, conforming to research methodology. Section 4 presents the results and 
regulations, and improving operations. Conventional discussion, and Section 5 concludes the research. 
BERT-based models are not effective in encoding the  
sophisticated, domain-specific semantics in electric2  Related work 
power audit reports. There is a demand for models 
The transformational effects of LLMs on IR research were 
incorporating domain knowledge and sophisticated 
investigated in the research [9]. The method comprised 
retrieval methods to enhance classification and 
synthesizing findings from a strategy workshop organized 
information extraction accuracy. This research explores 
by the Chinese IR community. It suggested a new IR 
a new technology for understanding power audit reports 
technological paradigm involving IR models, LLMs, and 
that improves multi-dimensional IR and domain-adapted 
humans, but faces computational trade-offs, 
LLM performance by extracting morphology specific to 
trustworthiness concerns, domain boundaries, and 
electric power, domain-specific language, and 
implications. An analysis of e-commerce customer 
complexities of entities to use the Multi-Dimensional 
reviews on drum washing machines using Robotic Process 
Information Retrieval-based Bidirectional Encoder 
Automation (RPA) was demonstrated [10]. It combined 
Representations from Transformers (MDIR-BERT) 
ROST Content Mining System 6 (ROSTCM6) and 
method. 
LOGCONTROL-BLOCK systems to extract sentiment 
 
and correct audit robot paths. While effective in revealing 
1.1 Key contributions 
customer sentiments and guiding e-commerce strategies, 
➢ This research aims to develop a multi-
limitations include reliance on predefined keywords and 
dimensional information retrieval for improved 
the need for improved automated sentiment analysis 
classification and understanding of electric 
accuracy. 
power audit texts. 
The Mistral 8x7B LLM's current Mixture of Experts 
➢ Initially, Electric power audit reports from 
(MoE) architecture was combined with Retrieval 
energy-intensive sectors, which are obtained 
Augmented Generation (RAG) to improve on challenging 
from publicly accessible databases from 
IR and reasoning tasks, which were investigated in [11]. In 
Kaggle, represent various regulatory and 
the quantitative and qualitative evaluation of the model 
operational contexts.  
using the Google BIG-Bench dataset, notable gains were 
➢ Utilized preprocessing steps such as stop word 
observed in F1 score, accuracy, precision, and recall. 
elimination, lemmatization, and tokenization to 
Limitations include computing needs and dataset breadth. 
preprocess and normalize intricate technical 
Integrating LLMs with Knowledge Graphs (KGs) 
jargon for optimal model input. 
enhanced intelligent fault detection and IR for new energy 
➢ MDIR-BERT by pre-training on the electric 
vehicles (NEVs) [12]. It developed an intelligent fault 
power audit dataset with word-level as well as 
retrieval system, a structured knowledge graph, and an 
entity-level masked language modeling to 
optimized BERT model for fault classification,  
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced…                                                         Informatica 49 (2025) 419-432   421 
 
demonstrating exceptional performance in Q&A robust sequences that allow for more successful 
situations for NEVs, but facing scalability issues. membership inference assaults under realistic threat 
To evaluate the Word to Vector (Word2Vec) model for models. It demonstrated a significant improvement in 
document compliance detection by comparing it with detection and True Positive Rate (TPR) with optimal 
Term Frequency–Inverse Document Frequency (TFIDF), sequences, achieving a 49.6% TPR on Qwen2.5-0.5B, 
Latent Dirichlet Allocation (LDA), and Bidirectional compared to 4.2% earlier, but has drawbacks due to 
Encoder Representations from Transformers (BERT) as reliance on model access without shadow models or 
described [13]. Results showed that Word2Vec gradient insertion. To compare forecasting MASI trends 
effectively captures semantic similarity with higher with ARDL with trend and seasonality (Long short-term 
efficiency and simplicity. However, it performs slightly memory (LSTM), and extreme gradient boosting 
lower than BERT in handling complex semantics and (XGBOOST) was determined [19]. ARDL, with trend and 
domain-specific terminology. A self-retrieval framework seasonality, returns the lowest MAPE, at 26.7%. 
[14] that leverages self-supervised learning was Limitations include LSTM and XGBOOST executing 
developed to improve retrieval efficiency and model higher error rates and taking longer to process. The Two 
simplicity. It internalized a retrieval corpus, improved Sliding Windows Graph Neural Network (TSW-GNN) 
downstream LLM applications, and outperformed architecture for text classification was introduced, which 
conventional IR systems. However, it faced high works around limitations of corpus-level graph approaches 
computing costs and scaling challenges, despite that suffer from continuous memory usage and are 
maintaining real-time efficiency and cross-domain completely contextually agnostic, was introduced [20]. 
generalizability. Predictive Analytics (PA) in Current The TSW-GNN model addresses this issue by introducing 
Research Information Systems (CRIS) to predict research TSW into the GNN architecture with a new dynamic 
trends through machine learning is used [15]. In this global sliding window and a new dynamic local sliding 
research, k-Nearest Neighbor had the best performance. window, increasing contextual memory and representation 
Limitations include moderate AUC scores and of semantics. Tests from the seven datasets reveal that the 
dependence on historical metadata to generate classification accuracies were improved, though at 
predictions. The Financial BERT (FinBERT) model, increased complexity of the two sliding windows and their 
specialized in the finance industry, was developed to associated GNN parameters. To explore the independent 
enhance sentiment analysis in financial writings [16]. role of internal auditors at the Swedish Police Authority 
FinBERT model outperformed traditional dictionaries in and to illustrate their relational struggles within the 
classifying context-dependent sentiment and organization was described [21]. The research adopts a 
Environmental, Social, and Governance (ESG)-related narrative framework in the study of auditor independence 
talks with minimal training data, but faced limitations in and introduces stories of auditors highlighting 
domain-specificity and potential decreased psychological distress, ambiguity in legitimacy, and 
generalizability. The use of LLMs in auditing was attempts to negotiate competing demands. The picture 
investigated in [17], with an emphasis on compliance painted by these narratives can be viewed as a tragedy 
checks and report production. LLMs effectively handle where auditors were unable to resolve tensions that 
unstructured data, address compliance concerns, and manifested themselves as professional dilemmas. Results 
provide excellent audit reports, despite challenges like showed LLMs perform well in noise handling but struggle 
data security and model interpretability. The research with falsehood management. An overview of the related 
[18] enhanced LLM privacy audits by creating more work is given in Table 1. 
 
                   Table 1: Overview of the related works 
 
Ref. Objective Task Type Domain Model Used Method Limitations 
No. 
Ai et Investigate Information General Not specified Strategic Computational 
al., [9] the role of Retrieval workshop trade-offs, trust 
LLMs in proposing IR- concerns, and 
IR LLM-human ethical issues 
research paradigm 
Sun Analyze e- Sentiment E-commerce RPA, ROSTCM6, Keyword Relies on 
and commerce Analysis LOGCONTROL- extraction, predefined 
Huo, reviews BLOCK path keywords, 
[10] using correction, limited sentiment 
automation sentiment accuracy 
classification 
422        Informatica 49 (2025) 419-432       J. Xiao-liang et al. 
 
Xiong Improve IR + Reasoning General Mistral 8x7B with RAG + High computing 
and IR and RAG Mixture of needs, limited 
Zheng, reasoning Experts dataset 
[11] evaluated on 
BIG-Bench 
Zhang Enable Classification + New Energy Optimized BERT + Fault Scalability issues 
et al., intelligent Retrieval Vehicles KG classification 
[12] IR for using KG-
NEVs enhanced 
BERT 
Wen et Evaluate Document Legal, Audit Word2Vec, TFIDF, Semantic Slightly lower 
al., [13] Word2Vec Similarity LDA, BERT similarity via performance 
for vector models than BERT in 
document complex 
complianc semantics 
e detection 
Tang et Merge IR IR General Self-Retrieval LLM Self- High 
al., [14] functionali supervised computational 
ty within a corpus- cost, scaling 
single internal IR complexity 
LLM 
Azerou Predict Trend Research kNN, SVM, Predictive Moderate AUC, 
al et al., research Forecasting Managemen Random Forest analytics with dependent on 
[15] trends in t machine historical 
CRIS learning metadata 
Huang Domain- Sentiment Finance FinBERT Domain- Limited 
et al., specific Classification adapted BERT generalizability 
[16] sentiment for financial 
analysis sentiment 
Gan, Automate Report Auditing LLM-based Process Data security, 
[17] audit Generation + unstructured interpretability 
complianc Classification audit data for 
e reporting 
Panda Enhance Membership General Qwen2.5-0.5B Robust Requires model 
et al., privacy Inference canaries for access, no 
[18] auditing audit testing shadow models 
Oukho Compare Financial Stock ARDL, LSTM, Time series Higher error and 
uya  et forecasting Forecasting Market XGBOOST modeling with processing time 
al., [19] models for trend and in LSTM and 
MASI seasonality XGBOOST 
trends 
Li  et Improve Text NLP TSW-GNN Local and Increased model 
al., [20] text Classification global sliding complexity and 
classificati window graph parameter tuning 
on with construction 
sliding 
windows 
Nordin  Explore Organizational Public Narrative Story-based Unresolved 
et al., internal Behavior Sector / Framework analysis of tensions, 
[21] auditors' Analysis Audit auditor roles emotional strain, 
independe and ambiguous 
nce legitimacy 
challenges 
  
 
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced…                                                         Informatica 49 (2025) 419-432   423 
 
Existing approaches have various difficulties, including audit report entries collected from Kaggle. Each entry 
computational complexity, ethical problems, scaling includes an audit report ID, audit text, a list of extracted 
challenges, decreased accuracy, a limited dataset scope, named entities, and a category label. The technical power 
low generalizability, data security threats, and reliance on audit reports cover equipment, energy systems, and 
embedding quality. These constraints impede real-time, compliance, supporting tasks like entity recognition and 
domain-specific, and reliable information retrieval in classification across categories such as safety, efficiency, 
specialist sectors, such as electric power auditing. To and regulation. Audit texts range from 15–40 tokens, 
address these issues, the research explores a new averaging around 25 tokens. Entities cover standard 
technology for understanding power audit reports that equipment (e.g., Load Balancer) and locations (e.g., 
improves multi-dimensional IR and domain-adapted Control Room), enabling comprehensive analysis of 
LLM performance by extracting morphology specific to energy systems. The obtained dataset is split in 80:20 ratios 
electric power, domain-specific language, and for training and testing performance.  
complexities of entities to use the MDIR-BERT Model.  
 3.2   Data preprocessing 
3 Multi-dimensional information Data preprocessing is the procedure of converting fresh 
retrieval (MDIR) data into an organized and cleansed form to improve model 
performance. It cleans, normalizes, and tokenizes electric 
MDIR is an advanced retrieval technique that expands 
power audit texts to provide high-quality input for model 
the traditional keyword-based search into a 
training while also improving classification and 
multidimensional framework, incorporating semantic 
information retrieval accuracy. It includes approaches, 
meaning, contextual relevance, domain-specific lexicon, 
such as stop word removal, lemmatization, and 
relationships between entities, and user intent, thereby 
tokenization, to arrange power audit texts in a structure that 
enabling retrieval that is precise and comprehensive. 
reduces noise while providing high-quality information. It 
MDIR increases the power of a text understanding 
allows for efficient and accurate IR, entity recognition, and 
process for audit text, allowing for the MDIR-BERT 
classification operations in the system. 
model to better capture the complex, technical, 
 
unstructured nature of power audit documents and their 
3.2.1   Data cleaning using stop words removal 
underpinning. This section gathers the electric power 
Data cleaning is the process of removing unnecessary or 
audit text data and preprocesses the data using 
noisy elements from raw data to make it more accurate. It 
techniques, such as data cleaning using Stop Words 
is utilized to eliminate extraneous or noisy information, 
Removal and data normalization using lemmatization 
resulting in high-quality input for training and improved 
and tokenization. Finally, classification and information 
overall performance of electric power audit classification. 
retrieval were performed using BERT. Figure 1 depicts 
In the research, stop word removal reduces frequent, 
the System Design of the MDIR-BERT Model. 
unnecessary words from audit text so that the model 
 
focuses on important content for better IR and 
classification. This is produced by establishing a frequency 
threshold. This threshold was simply set as the average 
frequency of all terms gathered for the language in 
Equation (1). 
 
𝛼
𝜎 = ∑𝑛
𝑗 1 𝑡   ( )
𝑛 = 𝑗     1  
Where 𝑡𝑗 is the frequency of the 𝑗th term, Equation (1), 𝛼 is 
defined as a smoothing adjustment factor to 1.25, 
empirically validated in validation experiments to 
moderately increase the average threshold and dampen 
noise from low-frequency terms. This value 
𝛼
∑𝑛
 𝑛 𝑗=1 𝑡𝑗selected to optimize in entity recognition by 
 preventing inclusion of excessively rare or excessively 
Figure 1: System design of the multi-dimensional common terms. 
information retrieval-enhanced BERT model  
 3.2.2   Data normalization using lemmatization 
3.1   Data collection Normalization refers to the process of converting text to a 
The data is obtained from the Kaggle link: uniform state, often by reducing words to their standard 
https://www.kaggle.com/datasets/zoya77/power-audit- forms or original structures. The normalization process 
report-and-entities-dataset. The dataset comprises 1,001 allows the different variations of words to be standardized,  
424        Informatica 49 (2025) 419-432       J. Xiao-liang et al. 
 
which permits the method to better process and classification and information retrieval. While general 
comprehend province-precise language in power audit BERT is pre-trained with the Masked Language Modeling 
texts. This process of normalization helps to standardize (MLM) and Next Sentence Prediction (NSP) tasks on 
variations of words to allow for treating different general corpora, MDIR-BERT takes this further by adding 
versions of a word as equivalent terms. Lemmatization domain adaptation for the electric power audit domain. In 
helps to determine the organizational meanings of words, particular, MDIR-BERT is additionally pre-trained on a 
which assists in the analysis of text, and naturally, the massive dataset of electric power audit transcripts to 
processing of this text. It is valuable in many text analysis capture more domain-specific vocabulary, morphological 
projects, especially those focusing on IR, sentiment, and forms, and intricate named entity relations. To facilitate 
text classification. this domain-specific adaptation, two domain-specific pre-
 training tasks are utilized: Word-Level Masked Language 
3.2.3   Tokenization Modeling (W-MLM), likewise the standard MLM, but with 
It is the procedure in which input text is divided into modifications to focus on domain tokens that usually 
minor units of meaningful units (tokens), which can be appear within audit texts, including audit procedures, 
meaningful individual words, phrases, or sentences. voltage types, compliance, and equipment-related terms. 
Tokenization is a key step towards breaking down the Entity-Level Masked Language Modeling (E-MLM): This 
raw power of the audit text into portions that will task entails masking named entities determined through a 
ultimately be meaningfully analyzed by the model. By domain-tuned NER system and having the model predict 
tokenizing text, the model can better interpret the them in their respective contextual environments. This 
relationships, structure, and contextualization of words. assists MDIR-BERT in capturing hierarchical and 
Tokenization will be used to confirm the conducting of relational dependencies between domain-specific entities 
tasks, like IR and entity recognition, where tokens are more effectively. With these enrichments, MDIR-BERT 
identified and labeled. It enhances the ability of the gains a better grasp of electric-power-specific semantics 
model to yield valuable and informative information and structure for more accurate and context-sensitive 
from sophisticated and complex unstructured audit classification and retrieval. 
documents.   
 3.3.2 BERT for classification 
3.3   Classification and information retrieval BERT processes input text using its transformer layers 
using bidirectional encoder representations while performing categorization jobs. After the text has 
been analyzed, the output representation is sent through a 
from transformers (BERT) 
classification head to forecast the text's proper category, as 
Classification is the process of assigning text data into 
shown by Equation (2). 
predefined categories based on its content, and IR is the 
 
task of finding and extracting significant data from a 
𝑂𝑢𝑡𝑝𝑢𝑡𝑐𝑙𝑎𝑠𝑠 = 𝑆𝑜𝑓𝑡𝑚𝑎𝑥(𝐷𝑒𝑛𝑠𝑒(𝐵𝐸𝑅𝑇(𝐼𝑛𝑝𝑢𝑡)))    (2)       
huge collection of structured or unstructured information. 
 
In the research, classification helps to organize and label 
Where 𝐵𝐸𝑅𝑇(𝐼𝑛𝑝𝑢𝑡) represents the BERT model 
power audit texts into specific, meaningful categories for 
processing the input text, 𝐷𝑒𝑛𝑠𝑒 is the classification layer, 
easier analysis, while IR enables quick and accurate 
and 𝑆𝑜𝑓𝑡𝑚𝑎𝑥 is used to transform logits into probabilities 
extraction of relevant insights from large volumes of 
for classification.  
audit documents to support informed decision-making. 
 
These methods are boosted by BERT, which captures 
3.3.3 BERT for information retrieval 
deep contextual power and meaning of the text to 
BERT is used in IR to discover the documents that are 
enhance classification accuracy as well as retrieval 
relevant to a given query. BERT recognizes the context of 
precision. After tokenization, BERT uses the electric 
a query and a group of documents, which improves 
power audit text to gather contextual relations for 
retrieval accuracy. BERT's bidirectional nature assists in 
accurate classification. It further enhances IR through 
identifying more semantically relevant documents even 
accurate detection and retrieval of audit-specific features 
when keywords fail to match perfectly, as shown by 
and patterns. 
Equation (3). 
 
 
3.3.1   Overview of MDIR-BERT 
𝑅𝑒𝑙𝑒𝑣𝑎𝑛𝑐𝑒 𝑆𝑐𝑜𝑟𝑒 =
MDIR-BERT is based on the basic architecture of BERT 
𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝐵𝐸𝑅𝑇(𝑄𝑢𝑒𝑟𝑦), 𝐵𝐸𝑅𝑇(𝐷𝑜𝑐𝑢𝑚𝑒𝑛𝑡))   (3) 
(Bidirectional Encoder Representations from 
 
Transformers), which encodes the bidirectional context 
Where 𝐵𝐸𝑅𝑇(𝑄𝑈𝐸𝑅𝑌) and 𝐵𝐸𝑅𝑇(𝐷𝑜𝑐𝑢𝑚𝑒𝑛𝑡) are the 
of words in a sentence through self-attention 
query and document's context-aware embeddings, 
mechanisms. This helps the BERT model comprehend 
respectively.  
word semantics concerning the previous and next words, 
 
and hence, BERT is very effective for tasks like 
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced…                                                         Informatica 49 (2025) 419-432   425 
 
3.3.4 Fine-Tuning BERT  Warmup Steps 500 
 Max Sequence Length 128 
Fine-tuning is the method of modifying the pre-trained Gradient Clipping 1.0 
BERT model to a precise goal, such as classification or Weight Decay 0.01 
IR, by training it on a labeled dataset. This involves Random Seed 42 
adapting BERT's weights by the task requirements, Dropout Rate 0.1 
enabling it to learn domain-specific jargon and nuances 
 
represented by Equation (4). 
4.3   Performance metrics 
 
 
𝐿𝑜𝑠𝑠 =  ∑𝑁
𝑗=1 𝐶𝑟𝑜𝑠𝑠 − 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑇𝑟𝑢𝑒𝑗 , 𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑𝑗)                        
The Performance Metrics, including Execution time, 
                                         (4) 
Energy consumption, and speed of convergence, are 
 
utilized to enhance the performance of electric power audit 
Where 𝑇𝑟𝑢𝑒𝑗is the definite tag for the 𝑗th model, 
text classification. 
𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑𝑗  is the forecasted tag for the 𝑗thsample, and 
𝐶𝑟𝑜𝑠𝑠 − 𝐸𝑛𝑡𝑟𝑜𝑝𝑦 is the loss function used during ➢ Energy consumption 
training. BERT has the advantage of considering the 
Energy consumption refers to the energy needed by a 
complete background of words in a phrase, which 
model to execute inputs and generate outputs. It is an 
significantly enhances classification and IR efficiency. 
important metric in energy-limited systems, like internet-
This two-way context enables BERT to recognize 
enabled edge devices or mobile devices. Lower energy 
subtle semantic links, making it particularly useful for 
usage renders the system more efficient and sustainable, 
processing complex and specialized language in power 
particularly for large-scale AI deployments. Figure 2 
audit reports. Furthermore, due to its pre-training on 
depicts the Energy Usage seen in the MDIR-BERT Model 
great corpora and fine-tuning capabilities, BERT could 
Execution. 
be trained to perform specific tasks with less data. 
 
 
4   Results and discussion 
The research objective is to improve electric power audit 
text categorization and IR performance by introducing a 
new MDIR-BERT model. The experimental setting and 
performance assessment measures used in the research 
improve the electric power audit text categorization and 
IR performance. 
 
The experiments were run on a machine with an Intel 
Core i7 processor, 32GB RAM, and an NVIDIA RTX 
3080 graphics card. The models were run in Python 3.9 
with PyTorch as the base, with the BERT model 
developed atop this framework.  
Figure 2: Energy Consumption Observed in the MDIR-
The model proposed took around 4.2 hours for training, 
BERT Model Execution 
using 4.2 GPU hours. It comprises about 110 million 
 
parameters and occupies a storage size of 420 MB. All 
MDIR-BERT model energy consumption or moderate 
the models were trained under the same settings to ensure 
consumption rates varied between 10 and 15kWh over five 
fairness when comparing the process. 
test repetitions, which converts to moderate consumption 
 
of resources. In the central sets of repetition, there was an 
4.2   Hyper-parameters 
increase in utilization, attributed to the complexity in 
  
processing or the size of the data. The model's average 
Table 2 represents the hyperparameters utilized in the 
utilization was more uniform and efficient, demonstrating 
power audit text understanding research. 
its potential for real-world applications. For epoch 1, the 
 
model reaches 10kWh, 12kWh in epoch 2, 11kWh in epoch 
Table 2: Hyperparameters 
3, 15kWh in epoch 4, and 13kWh in epoch 5. The proposed 
Hyperparameter Value 
MDIR-BERT method shows extreme performance in 
Learning Rate 2𝑒 − 5 epoch 4 with 15kWh. 
Batch Size 32  
Number of Epochs 30  
Optimizer AdamW 
426        Informatica 49 (2025) 419-432       J. Xiao-liang et al. 
 
Execution time 
Execution time refers to the number of times it takes a 
model to consume an input, process it, and produce an 
output. It is a significant metric for real-time or time-
sensitive applications, like autonomous systems or 
internet applications. Lower execution times are 
preferable so that users can experience the best, and the 
system's efficiency is enhanced overall. Figure 3 
 
illustrates the visualization of the MDIR-BERT's 
Figure 4:  Graphical outcome of (a) loss and (b) accuracy 
execution time. 
 
 
Statistical Significance 
The confidence interval for model accuracy is the normal 
distribution curve, where the shaded region highlights the 
most likely accuracy range. It visually represents the 
reliability and accuracy of the model's performance 
estimate. Figure 5 shows the Graphical outcome of a 95% 
confidence interval for accuracy (MDIR-BERT). 
 
 
Figure 3: Visualization of the execution time of the 
MDIR-BERT 
 
The execution time of the MDIR-BERT model replicates 
its performance over 20 epochs with moderate variances 
due to computation and environmental conditions. The 
execution duration varies around an average of 0.8 
seconds, with peaks reaching roughly 0.89 seconds and 
troughs around 0.72 seconds, driven by a sinusoidal 
 
pattern and small random noise.  
Accuracy and loss Figure 5: Graphical outcome of 95% confidence interval 
for accuracy (MDIR-BERT) 
Accuracy is the number of correct predictions made by a 
classical model to the total number of predictions, The image shows a 95% confidence interval for the 
whereas loss is the difference between expected and accuracy of MDIR-BERT, represented as a normal 
actual values, which measures how well the model distribution curve. The x-axis indicates accuracy 
performs throughout training. The loss curve shows how percentages ranging from 97.0% to 101.0%. The shaded 
the model converged during training, with lower values area under the curve represents the 95% confidence 
representing better performance, while the accuracy interval, meaning there is a 95% probability that the true 
curve shows how well it captures electric-power-specific accuracy of the model lies within this range. The 2.5% tails 
morphology, domain power, and intricate entity on either side of the distribution are excluded, highlighting 
relationships more effectively.  The accuracy and loss the central 95% region. The peak of the curve corresponds 
characteristics of the training for the MDIR-BERT to the most probable accuracy value, with the density 
technique are shown in Figure 4. decreasing as values move away from the center. 
The resulting MDIR-BERT model demonstrates good 
performance: training loss goes down from 0.95 to Confusion Matrix 
almost 0.01 after 30 epochs, and training accuracy 
increases steeply from 0.1 to about 0.97, which indicates A confusion matrix compares the expected and actual 
good convergence and high learning efficiency. values for a dataset to show the effectiveness of a 
 classification model (Figure 6). The confusion matrix 
 shows how well MDIR-BERT classified data in five 
different power audit categories. Considering its high 
overall prediction accuracy, the model occasionally  
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced…                                                         Informatica 49 (2025) 419-432   427 
 
misclassifies objects, especially between closely The research evaluates the macro/micro F1. Evaluating all 
similar classes like "Energy Efficiency" and "System classes equally, Macro F1 computes the F1 score for each 
Upgrade." Through multi-dimensional information class separately before the results. To create a single F1 
retrieval and domain-adapted language modeling, this score, Micro F1 provides all the real positives, incorrect 
demonstrates the domain complexity and the model's positives, and incorrect negatives for each class, which 
efficacy in comprehending electric power audit gives each occurrence equal weight. The proposed model 
material. demonstrates 0.964 of macro F1 and 0.963 of micro F1. 
  
4.4    Comparison phase 
The performance metrics used to compare the performance 
of electric power audit text classification are accuracy, F1-
score, recall, and precision. The MDIR-BERT was 
compared with the existing methods like Text 
Convolutional Neural Networks (Text CNN) [22], BERT 
[22], and Electric Power Audit Text–BERT (EPAT-BERT) 
[22]. 
Accuracy 
Accuracy: Accuracy indicates how well the model 
accurately recognizes relevant and irrelevant information 
in the power audit text. To indicates the ratio of correct to 
incorrect predictions across all cases, giving a total view of 
 classification performance across different audit 
Figure 6: Confusion matrix of MDIR-BERT documents. Accuracy measures the proportion of all 
performance correct power audit text classifications performed by a 
 model. It can be helpful in assessing overall MDIR-
Precision-recall curves BERT’s performance. Table 3 depicts the accuracy of the 
A binary classification model's effectiveness is MDIR-BERT. 
represented graphically by a Precision-Recall curve, 
which is particularly beneficial for unbalanced datasets 
Table 3: Performance summary of MDIR-BERT 
(Figure 7). The precision-recall curve validates MDIR-
BERT's efficacy in domain-dependent audit text F1-
Accuracy Recall Precision 
understanding by demonstrating its superior Methods score 
(%) (%) (%) 
classification performance in power audit categories, (%) 
with an average accuracy. Whereas the energy 
efficiency is 0.89, the maintenance recommendation is 
Text CNN 
0.91, the regulatory violation is 0.96, safety compliance 71.65 69.01 74.27 71.56 
[22] 
is 0.97, and the system upgrade is 0.89. 
 
BERT [22] 77.91 77.94 78.23 78.08 
EPAT-
81.96 81.62 80.79 81.20 
BERT [22] 
MDIR-
BERT 98.82 97.81 96.48 97.34 
[Proposed] 
 
Figure 7: Efficiency of MDIR-BERT with precision-
recall curves 
428        Informatica 49 (2025) 419-432       J. Xiao-liang et al. 
 
Figure 9 indicates the extent to which each model identifies 
all relevant content. Text CNN (69.01%) and BERT 
(77.94%) demonstrate moderate ability to identify relevant 
content, while EPAT-BERT shows a refined ability 
(81.62%), and the proposed method achieved 97.81%. 
 
Precision 
Precision assesses the extent to which each piece of text 
identified as relevant contains useful audit content. That is, 
it signifies the degree to which the model is able to avoid 
false positives and is a matter of importance for limiting 
irrelevant or misleading content through audit analysis. 
Precision measures the number of true positives (TP 
divided by the total number of TP, with the False Positives 
 
 (FP). Precision is important if the cost of an FP is high, for 
Figure 8: Graphical representation of accuracy for the example, misclassifying a legitimate user as a spammer or 
MDIR-BERT a fraudster. 
  
Figure 8 demonstrates consistent improvement in 
accuracy, which improves to 71.65% for Text CNN, 
77.91% for BERT, and 81.96% with EPAT-BERT. The 
proposed method receives a significant increase to 
98.82%, suggesting that power audit texts are classified 
very well overall. 
 
Recall 
Recall represents how well the model collects all the 
relevant audit information in the documents. Recall fits 
with a focus of reducing missed important content, 
which is key to holistic regulatory compliance and 
decision support in power audit. Recall is the ratio of 
True Positives (TP) to TP with the False Negatives  
(FN). Recall is important if the cost of misclassifying a Figure 10: Precision analysis chart of MDIR-BERT model 
positive instance is high, as in the case of a diagnostic  
method.  Figure 10, indicating the correctness of predicted relevant 
 pieces of information, is highest for the proposed method 
at 96.48%. This demonstrates a low false-positive rate. In 
the study, a measure of the precision could be performed 
with Text CNN, showing a decent 74.27%, BERT 
achieving a better performance of 78.23%, and EPAT-
BERT showing 80.79%. 
 
F1-Score 
The F1-score balances precision and recall to deliver a 
single metric of model performance at comprehending an 
audit text. It can be especially helpful when it is as 
important to avoid false alarms as it is to capture every 
detail necessary. The F1-score is the harmonic average of 
 recall and precision, which balances 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 and 𝑟𝑒𝑐𝑎𝑙𝑙. 
Figure 9: Visual depiction of classification recall It is especially useful in problems involving imbalanced 
achieved by MDIR-BERT data, especially when FP and FN are equally important. 
 
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced…                                                         Informatica 49 (2025) 419-432   429 
 
multi-dimensional way. Thus, Text CNN [22] will return 
lower recall and precision for this corpus. BERT [22] 
improved contextual understanding but did not adapt to the 
language structures and entities specific to power audits, 
which ultimately limited its performance on complicated 
auditor narratives. While EPAT-BERT [22] is adapted for 
domain use, it does not sufficiently model multi-
dimensional relationships and detailed audit semantics. 
MDIR-BERT is superior to EPAT-BERT by the pretraining 
within a domain and multi-dimensional information 
retrieval (IR) boost, allowing for more in-depth electric 
power audit language comprehension and enhanced 
entity/context identification. It's a +16.86% accuracy 
improvement, indicating enhanced classification and 
retrieval. Whereas the success of the model is domain-
 
Figure 11: Performance visualization of MDIR-BERT specific and will not generalize to other audit types unless 
in terms of F1-score the model is retrained. In contrast to domain-specific 
 transformers (such as FinBERT) and RAG-based models, 
Figure 11 summarizes overall performance, balancing MDIR-BERT has better structured text understanding but 
recall and precision. It ranges from 71.56% (Text CNN) without the generative ability. Future research will focus on 
to 78.08% (BERT) and 81.20% (EPAT-BERT). In RAG integration for summarization and improving cross-
contrast, the proposed method is 97.34%, and affirms domain adaptability by transfer learning or model 
our approach's clear superiority in accurately and compression. The proposed MDIR-BERT method in the 
consistently extracting meaningful audit content. research makes it possible for researchers, utilities, and 
regulatory organizations to modify and evaluate models for 
 
certain auditing conditions while providing innovation. 
4.5   Training and testing splits High-stakes audit environments are secured by compliance 
The training and testing validation of the proposed with energy regulations and data management systems. 
MDIR-BERT method’s performance in 80:20 The proposed MDIR-BERT method in the research 
validations is compared with 70:30 splits to determine makes it possible for researchers, utilities, and regulatory 
the efficiency of the proposed model in the field of organizations to modify and evaluate models for certain 
power audit text understanding research. Table 4 auditing conditions while providing innovation. High-
explores the training and testing validation of the stakes audit environments are secured by compliance with 
proposed model with 80:20 and 70:30 splits. energy regulations and data management systems. 
  
Table 4: Performance of proposed MDIR-BERT model 
5   Conclusions 
with training and testing splits 
 The aim was to build a multi-dimensional information 
retrieval for enhanced classification and comprehension of 
Training and Testing splits 
Metrics electric power audit texts. MDIR power audit text 
80:20 70:30 
comprehension technology using the integration of multi-
Accuracy (%) 98.82 97.6 
dimensional enhancement and a domain-adapted LLM. An 
Precision (%) 96.48 95.5 
end-to-end data preprocessing method was utilized, which 
Recall (%) 97.81 96.72 
involved data cleaning to eliminate unwanted symbols and 
F1-score (%) 97.34 96.13 
noise, normalization via lemmatization to normalize word 
 
forms, and tokenization to split text into useful units 
Based on the performance of various training and 
appropriate for model input. MDIR-BERT model, being 
testing validations, the proposed MDIR-BERT model 
pre-trained on electric power audit texts, efficiently learned 
shows more significance in 80:20 validations than in 
domain-specific terms, morphological phenomena, and 
70:30 validation assessments. 
entity relationships. These preprocessing operations 
The comparative results showed notable weaknesses in 
considerably enhanced the textual data quality and 
Text CNN [22], BERT [22], and EPAT-BERT [22] in 
uniformity utilized for training and fine-tuning. The model 
their suitability to power audit text classification. Text 
achieved significant accuracy (98.82%), recall, precision, 
CNN struggles with long-distance/contextual 
and F1-score improvements, signifying robust 
knowledge and domain-specific vocabulary and 
performance. It also exhibited very high efficiency through 
language due to its inability to layer information in a 
lower energy expenditure, a quicker execution time, 
and improved convergence rate. 
430        Informatica 49 (2025) 419-432       J. Xiao-liang et al. 
 
 and Medium Commercial Buildings to Identify 
5.1 Limitations and future scopes: In uncertain Energy Retrofit Opportunities. Energies, 16(17), 
circumstances, MDIR-BERT gets biased or 6191. https://doi.org/10.3390/en16176191  
hallucinatory findings with its performance, which [8] Gunasegaran MK, Hasanuzzaman M, Tan C, Bakar 
leads to incorrect regulatory decisions. The integrity of AHA & Ponniah V (2023). Energy Consumption, 
an audit could be affected by misuse or a misconception Energy Analysis, and Solar Energy Integration for 
that lacks domain expertise. These limitations show the Commercial Building Restaurants. Energies, 
significance of management and the consistency of 16(20), 7145. https://doi.org/10.3390/en16207145  
power sector control regulations. The quality of [9] Ai Q, Bai T, Cao Z, Chang Y, Chen J, Chen Z ... & 
domain-specific information is a potential factor that Zhu X (2023). Information retrieval meets large 
could be further investigated. Additional research language models: a strategic report from chinese ir 
intends to develop reasoning capabilities and extend the community. AI Open, 4, 80-90. 
model's ability to process a broader range of document https://doi.org/10.1016/j.aiopen.2023.08.001 
types. Future research should focus on the [10] Sun B & Huo F (2025). Analysis of Customer 
generalization of the model to various sectors. Comment Data on E-commerce Platforms Based on 
 RPA Robots. Informatica, 49(10). 
Funding: https://doi.org/10.31449/inf.v49i10.5908 
This work was supported by Technology Project of [11] Xiong X & Zheng, M. (2024). Merging mixture of 
State Grid Tianjin Electric Power Company (Grant no. experts and retrieval augmented generation for 
Chengdong Yanfa 2024-05). enhanced information retrieval and reasoning. 
 https://doi.org/10.21203/rs.3.rs-3978298/v1 
References [12] Zhang H, Zhao Y, Sun B, Wu Y, Fu Z & Xiao X 
(2025). Large Language Model Based Intelligent 
[1] Pan M, Liu Y, Chen J, Huang EA, & Huang JX Fault Information Retrieval System for New Energy 
(2024). A multi-dimensional semantic pseudo- Vehicles. Applied Sciences, 15(7), 4034. 
relevance feedback framework for information https://doi.org/10.3390/app15074034 
retrieval. Scientific Reports, 14(1), 31806. [13] Wen B, Wang T, Xu J, Liu Y, Li J & Lin S (2025). 
https://doi.org/10.1038/s41598-024-82871-0 File Compliance Detection Using a Word2Vec-
[2] Guo J, Fan Y, Pang L, Yang L, Ai Q, Zamani H & Based Semantic Similarity 
Cheng X (2020). A deep look into neural ranking Framework. Informatica, 49(18). 
models for information retrieval. Information https://doi.org/10.31449/inf.v49i18.7421 
Processing & Management, 57(6), 102067. [14] Tang Q, Chen J, Yu B, Lu Y, Fu C, Yu H ... & Li Y 
https://doi.org/10.1016/j.ipm.2019.102067 (2024). Self-retrieval: Building an information 
[3] Wang X, Wang J, Cao W, Wang K, Paturi R, & retrieval system with one large language 
Bergen L (2024). Birco: A benchmark of model. arXiv e-prints, arXiv-2403. 
information retrieval tasks with complex https://doi:10.48550/arXiv.2403.00801 
objectives. arXiv preprint arXiv:2402.14151. [15] Azeroual O, Nacheva R, Nikiforova A & Störl U 
https://doi.org/10.48550/arXiv.2402.14151 (2025). A CRISP-DM and Predictive Analytics 
[4] Hambarde KA, & Proenca H (2023). Information Framework for Enhanced Decision-Making in 
retrieval: recent advances and beyond. IEEE Research Information Management Systems. 
Access, 11, 76581- Informatica, 49(18). 
76604.https://doi.org/10.3390/1010000 https://doi.org/10.31449/inf.v49i18.5613 
[5] Taherzadeh-Shalmaei N, Rafiee M, Kaab A, [16] Huang AH, Wang H & Yang Y (2023). FinBERT: A 
Khanali M, Rad MAV and Kasaeian A (2023). large language model for extracting information 
Energy audit and management of environmental from financial text. Contemporary Accounting 
GHG emissions based on multi-objective genetic Research, 40(2), 806-841. 10.1111/1911-
algorithm and data envelopment analysis: An 3846.12832 DOI:10.1111/1911-3846.12832 
agriculture case. Energy Reports, 10, pp.1507- [17] Gan Z (2024). LARGE LANGUAGE MODELS 
1520.  EMPOWERING COMPLIANCE CHECKS AND 
[6] Quispe EC, Viveros Mira M, Chamorro Díaz M, REPORT GENERATION IN AUDITING. World 
Castrillón Mendoza R & Vidal Medina JR (2025). Journal of Information Technology, 
Energy Management Systems in Higher 35.10.61784/wjit3003 
Education Institutions’ Buildings. Energies,  
18(7), 1810. https://doi.org/10.3390/en18071810   
[7] Rios FC, Al Sultan S, Chong O & Parrish K 
 
(2023). Empowering Owner-Operators of Small 
MDIR-BERT: A Multi-Dimensional Retrieval-Enhanced…                                                         Informatica 49 (2025) 419-432   431 
 
[18] Panda A, Tang X, Nasr M, Choquette-Choo CA &  
Mittal P (2025). Privacy auditing of large  
language models. arXiv preprint  
arXiv:2503.06808.  
https://doi.org/10.48550/arXiv.2503.06808  
[19] Oukhouya MH, Angour N, Aboutabit N & Hafidi  
I (2025). Comparative Analysis of ARDL, LSTM,  
and XGBoost Models For Forecasting The  
Moroccan Stock Market During The COVID-19  
Pandemic. Informatica, 49(14).  
https://doi.org/10.31449/inf.v49i14.5751  
[20] Li X, Wu X, Luo Z, Du Z, Wang Z & Gao C,  
(2023). Integration of global and local information  
for text classification. Neural Computing and  
Applications, 35(3), pp.2471-2486.  
https://doi.org/10.1007/s00521-022-07727-y  
[21] Nordin IG (2023). Narratives of internal audit:  
The Sisyphean work of becoming “independent”.  
Critical Perspectives on Accounting, 94,  
p.102448. DOI:10.1108/MEDAR-01-2022-1584  
[22] Meng Q, Song Y, Mu J, Lv Y, Yang J, Xu L ... &  
Meng Q (2023). Electric power audit text  
classification with multi-grained pre-trained  
language model. IEEE Access, 11, 13510-13518.  
https://doi.org/10.1109/ACCESS.2023.3240162  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
432        Informatica 49 (2025) 419-432       J. Xiao-liang et al.