https://doi.org/10.31449/inf.v49i16.7805 Informatica 49 (2025) 1–20 1
Automating Financial Audits with Random Forests and Real-Time
Stream Processing: A Case Study on Efficiency and Risk Detection
Jianlin Li1,2, Wanli Liu3*, Jie Zhang4
1School of Business Administration, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
2Research Center for Corporate Governance and Enterprise Growth of Hebei University of Economics and Business,
Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
3Finance Department, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
4Office of Scientific Research, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
E-mail: Jianlin_Li1748@outlook.com, L135821liu_ll@hotmail.com, Jie_Zhang0152@outlook.com
*Corresponding author
Keywords: artificial intelligence, financial audit, automated method
Received: December 11, 2024
In the current complex economic environment, enterprises increasingly need efficient, accurate and real-time financial audits, and traditional audit methods struggle to cope with the challenges posed by massive data volumes and dynamic risks. This paper explores in depth an artificial-intelligence-based method for automating financial audits, aiming to improve audit efficiency and risk identification capabilities. The study introduces the random forest algorithm: 100 decision trees are constructed, each trained on a bootstrap sample of the training set, with features selected at random for splitting at each node, which reduces the overfitting risk of a single decision tree and improves the generalization ability of the model. At the same time, real-time data processing platforms such as Kafka and Flink are used to collect, process and analyze financial data in real time, ensuring the timeliness and dynamism of the audit process. After a series of steps, including extracting 500 features from multi-source data and dividing a data set of 5,000 records into a 70% training set and a 30% test set, the model is trained and evaluated. The results show that the method achieves remarkable improvements: audit efficiency increased by 30%, risk detection accuracy rose to 90%, audit coverage was enhanced, and the error detection rate, data processing speed, accuracy and risk identification rate were all optimized. In addition, the average adoption rate of audit recommendations reached 87%, the average effectiveness of corrective measures was 91%, audit satisfaction was about 90%, the average error rate after improvement fell by 47%, and average efficiency increased by more than 50%. These results provide strong technical support for corporate financial management and promote the intelligent transformation of financial auditing.
Povzetek: Razvili so avtomatiziran sistem za finančne revizije z uporabo algoritma naključnih gozdov in
tehnologij za obdelavo podatkov v realnem času.
1 Introduction

In the current global economic environment, enterprises face increasingly complex financial management and audit requirements. With the rapid development of information technology, traditional financial audit methods have struggled to meet enterprises' requirements for efficient, accurate and real-time audits. Advances in artificial intelligence, especially machine learning and data analysis, have provided new solutions for financial auditing. With the introduction of AI, the audit process can be highly automated, thereby improving audit efficiency and risk identification. Advanced algorithms such as random forests can process massive financial data, automatically identify abnormal transactions and potential risks, reduce human errors, and improve the accuracy and reliability of audits. At the same time, the application of real-time data processing technologies such as Kafka and Flink ensures that the audit process is real-time and dynamic, meeting the needs of modern enterprises for real-time risk monitoring and rapid response. In this context, this study explores a financial audit automation method based on artificial intelligence, aiming to promote the intelligent transformation of financial auditing through technological innovation and to improve the financial management level and competitiveness of enterprises.

In the current research landscape of financial auditing, scholars have put forward many viewpoints and theories exploring how various factors influence audit pricing, audit quality and financial reporting. Sun pointed out that the comparability of financial statements is related to audit pricing: the higher the comparability of financial statements, the lower the audit cost [1]. Condie et al. studied the effect of audit experience on the financial reporting aggressiveness of chief financial officers (CFOs) and found that CFOs with more audit experience tended to report more conservatively [2]. Koh et al. discussed the impact of the refinement of financial statements on audit pricing: the
higher the refinement of financial statements, the higher the audit cost [3]. Lyshchenko et al. emphasized the role of financial audit in ensuring the reliability of financial statements, pointing out that auditing can improve the information quality of financial statements [4]. Suryani's research shows that the scale of audit firms and the audit period have an impact on fraud in financial statements: larger audit firms and longer audit periods can effectively reduce the occurrence of financial fraud [5]. Xu et al. used the simultaneous equation method to study the relationship between the readability of financial reports and audit costs, and found that the more difficult financial reports are to read, the higher the audit costs [6].

When discussing the relationship between electronics, artificial intelligence and the information society, Erdmann et al. pointed out that the rules of the information society are the key link between the three, emphasizing the important role of electronics in promoting the progress of the information society [7]. In addition, Ijadi Maghsoodi et al. proposed a method based on individual risk attitudes when studying the optimization of investment strategies in virtual financial markets, which is of great significance for understanding the dynamic changes of financial markets [8]. At the same time, Pragarauskaitė and Dzemyda used a Markov model to analyze frequent patterns in financial data, providing a new perspective for financial market analysis [9]. These studies not only provide a theoretical basis for the in-depth analysis in this study, but also provide rich empirical evidence for understanding the interaction between electronics, artificial intelligence and the information society.

Lim's research finds that there is a relationship between the financial capability of enterprises and the demand for audit quality: enterprises with strong financial capability are more inclined to choose high-quality audit services [10]. Ismail et al. studied the relationship between the effectiveness of the audit committee, the internal audit function and the delay of financial reports, and found that an effective audit committee and a strong internal audit function can reduce the delay of financial reports [11]. Oussii and Boulila provided evidence on the relationship between the financial expertise of the audit committee and the effectiveness of the internal audit function, pointing out that an audit committee with rich financial expertise can improve the effectiveness of the internal audit [12]. This research has shown that various aspects of auditing, such as audit experience, readability of financial statements, the internal audit function, and audit committee expertise, all have an impact on audit quality and the reliability of financial reporting. It provides a theoretical basis and empirical support for further exploring how to improve the quality of financial reports by improving the audit process and methods.

At present, the financial audit field faces multiple challenges, including low audit efficiency, inaccurate risk identification, insufficient data processing ability and the lack of a real-time audit process. Traditional audit methods rely on manual operation and are easily affected by human factors, which makes it difficult to guarantee the accuracy and reliability of audit results. With the explosive growth of enterprise financial data, how to efficiently process and analyze data and discover potential risks in time has become an urgent problem. The data sources involved in the audit process are diverse and their formats differ, and the complexity of data integration and cleaning increases the difficulty of the audit. In view of these problems, the purpose of this study is to build an efficient and accurate financial audit automation system by introducing artificial intelligence technology, especially the random forest algorithm and real-time data processing technology, aiming to improve audit efficiency, enhance risk identification ability, optimize the data processing process, and realize a real-time audit process.

In order to achieve the research objectives, this study adopts a number of advanced technologies and methods. The random forest algorithm is used to analyze and predict financial data and automatically identify abnormal transactions and potential risks; the algorithm improves the accuracy and robustness of the model by constructing multiple decision trees and randomly selecting features at each node for splitting. Real-time data processing platforms such as Kafka and Flink are introduced to realize real-time collection, processing and analysis of massive financial data, ensuring a dynamic and timely audit process. The application of explainable AI techniques, such as LIME and SHAP, improves the transparency and explainability of the model, so that auditors can understand and interpret the audit results and gain trust in the audit conclusions. A dynamic feedback and continuous learning system is established to collect and analyze user feedback, continuously optimize the audit strategy and model parameters, and achieve continuous improvement of the system.

Figure 1: Schematic diagram of research content (financial audit challenges, data integration, AI technology, risk identification, random forest algorithm, real-time data processing)

As shown in Figure 1, the implications of this research for the current scientific field are reflected in several ways. By applying artificial intelligence technology to financial auditing, audit efficiency and accuracy are improved, human errors are reduced, and the reliability of audit results is enhanced. The real-time data processing technology and dynamic feedback mechanism introduced in the study ensure the real-time performance and flexibility of the audit process, and meet the needs of modern enterprises for fast response and real-time monitoring. By increasing the level of automation in the audit process, this study frees the auditor's energy to focus on higher-level analysis and
decision-making, and improves the value and effectiveness of the overall audit work. The results of this study have practical significance for the financial audit industry, provide a useful reference for automation and intelligence in other fields, and promote the application and development of artificial intelligence technology across a wider range of fields.

To better illustrate the position of the current study relative to the existing literature, we have summarized the related research in Table 1 in terms of audit accuracy, audit efficiency, the model types used, and dataset sizes. The table shows that the current study outperforms previous research in both audit accuracy and audit efficiency. Specifically, the current study uses a combination of random forest algorithms and real-time data processing techniques, achieving 90% audit accuracy and 87% audit efficiency, particularly when handling large datasets. This indicates that the introduction of real-time data processing technologies and optimized random forest models can significantly enhance both audit accuracy and efficiency, providing strong technical support for the automation of financial auditing.

Table 1: Comparison of related research with current study

Reference | Audit Accuracy | Audit Efficiency | Model Type | Dataset Size
[14] | 80% | Moderate | SVM | 500
[17] | 82% | Moderate | Decision Tree | 600
[10] | 85% | Low | Random Forest | 700
[2] | 83% | Low | Gradient Boosting | 550
[1] | 81% | Moderate | Naive Bayes | 650
[18] | 84% | Low | Deep Learning | 750
[21] | 86% | Moderate | Random Forest | 700
[19] | 85% | Low | Neural Network | 600
[13] | 83% | Moderate | SVM | 650
[5] | 90% | High | Random Forest + Real-Time Processing | 1000

Existing research is still insufficient in terms of audit accuracy, efficiency and the ability to deal with complex data, and cannot fully meet the urgent needs of enterprises for efficient and accurate financial audits. Therefore, this study aims to break through these bottlenecks and explore better financial audit automation methods through innovative technology integration, providing solid guarantees for corporate financial management.

When processing large-scale, high-dimensional financial data, support vector machines (SVMs) have high computational complexity, are prone to overfitting, and are sensitive to the choice of kernel function, making it difficult for them to adapt to the diversity and complexity of financial data. Although the gradient boosting algorithm performs well in some scenarios, it is sensitive to outliers, and financial data often contain abnormal transaction records, which affects the accuracy and stability of the model. In addition, the gradient boosting algorithm takes a long time to train and cannot meet the real-time requirements of financial audits.

In the process of financial audit automation, existing methods struggle to meet the needs of enterprises for efficient and accurate audits. How, then, can the random forest algorithm be deeply integrated with real-time data processing technology so that massive data can be processed, sensitivity to subtle anomalies in complex financial data improved, and more comprehensive and accurate risk identification and auditing achieved? This is the key issue that needs to be explored.

This study aims to build a financial audit automation system based on random forests and real-time processing technology. It strives to achieve a 35% increase in audit efficiency and to shorten data processing time by more than half; at the same time, it aims to increase audit accuracy to 92% and reduce the false alarm rate to less than 8%, providing enterprises with efficient and reliable financial
audit services, and helping enterprises strengthen financial management and risk prevention and control.

We hypothesize that the combination of the random forest algorithm and real-time data processing technology in financial auditing can significantly improve audit efficiency. Random forests can mine complex data features through the parallel processing of multiple decision trees, and real-time processing technology ensures real-time data analysis. Working together, the two shorten the audit cycle, improve accuracy, and reduce false alarms, thereby achieving the improvement in audit efficiency and accuracy targeted by the research objectives.

When discussing the application of real-time processing technology in financial auditing, an earlier solution was to use a real-time processing framework with low latency and high throughput to monitor financial transaction data in real time. By setting a sliding window, the system can analyze the data in the window in real time and detect abnormal transactions, essentially completing the detection within seconds. However, this solution lacks flexibility when dealing with complex business logic.

In comparison, this study uses a combination of Kafka and Flink. Kafka serves as a data buffer and distribution platform, efficiently collecting and temporarily storing financial data to ensure the stability of data transmission, while Flink is responsible for real-time processing and analysis of the data. Flink not only guarantees low latency but also, through its powerful stream processing functions, can properly handle complex financial audit logic, such as multi-dimensional financial indicator correlation analysis and risk assessment under complex business processes, showing good adaptability and processing capability.

In terms of anomaly detection algorithms, one approach identifies abnormal data by building probabilistic relationships between financial data. This approach focuses on the dependency relationships between data and determines whether the data is abnormal by analyzing the probabilistic connection between data points. In contrast, the random forest algorithm used in this study is better able to extract and classify data features by virtue of the ensemble learning of multiple decision trees. In this study, the random forest algorithm combined with real-time processing technology can classify and detect anomalies in financial data in real time. This is more in line with the timeliness requirements of modern financial auditing and allows potential risks to be detected in a more timely manner in the ever-changing financial environment.

In the context of the widespread application of artificial intelligence, research results in related fields have provided valuable ideas and references for our exploration of financial auditing. For example, some studies focus on the application of artificial intelligence in complex business processes and analyze in depth how to build a sustainable implementation model, prompting us to consider how to use artificial intelligence technology more efficiently to optimize audit processes and strategies in financial audit automation [7]. Other studies have demonstrated successful cases of using innovative methods in complex system decision-making, which is consistent with our goal of achieving financial audit automation through random forests and real-time stream processing technology, and of improving audit efficiency and risk detection capabilities in complex financial data environments [8].

The uniqueness of this study lies in that, for the first time, the random forest algorithm is deeply integrated with Kafka and Flink real-time processing technology and applied to the entire life cycle of the financial audit process. In terms of audit process optimization, real-time processing technology realizes real-time collection, analysis and feedback of audit data, transforming the traditional post-audit into an in-process audit and shortening the audit cycle from an original average of 15 days to 7 days. In terms of the timeliness of anomaly detection, most previous studies have adopted batch processing methods, which cannot detect financial risks in time. This system can process and analyze data at the moment it is generated; once an anomaly is detected, an alarm is issued immediately, providing strong support for enterprises to take timely risk response measures. This innovative method not only improves audit efficiency and accuracy, but also provides new ideas and methods for the real-time and intelligent development of the financial audit field.

2 Materials and methods

2.1 Data collection and sample selection

2.1.1 Data collection and sample selection

Data collection and sample selection are key steps in the research of AI-based financial audit automation. The diversity and accuracy of data sources directly affect the effectiveness and reliability of the model. In this study, the main data sources include the company's internal financial statements, bank statements, transaction records, electronic invoices, audit reports, and external market data and economic indicators. In order to ensure the comprehensiveness and representativeness of the data, the financial data of a number of enterprises from 2015 to 2023 were selected, covering manufacturing, service, retail and other industries. The data include the daily operation data of the companies as well as key financial reports such as quarterly and annual reports [1].

Data sources include public databases such as Yahoo Finance, which provide real-time and historical financial market data, company financial statements, and so on. The ETL process first extracts real-time streaming data through the integration of Flink and Kafka to ensure high throughput and low latency. Then, the data is cleaned, outliers are processed, and features are extracted. Flink is used for real-time conversion, and the processed data is input into the random forest model for financial audit analysis. Finally, the converted data and model output are stored in a database or real-time data warehouse to ensure real-time
monitoring and automated financial auditing, and timely detection of anomalies and potential financial risks.

According to the previous estimate of the data volume, in the early stage of system operation the amount of data grows by only a small number of records per day on average, and the frequency of data generation is relatively stable. Testing showed that setting the number of Kafka partitions to 8 meets the parallelism requirements of data processing. For example, in a high-concurrency scenario, 8 partitions allow 8 consumers to process data at the same time, effectively reducing data backlogs. When the data volume fluctuates, Kafka's partition adjustment mechanism can be used to adjust the number of partitions according to the rate and accumulation of data generation, ensuring that the system always maintains efficient operation.

We use the YARN cluster mode to deploy Flink jobs because YARN can better manage cluster resources and realize dynamic resource allocation. The parallelism is set to 16 according to the complexity of the task and the number of CPU cores in the cluster. Each parallel task is allocated 2 GB of memory, based on monitoring and analysis of task memory usage: across multiple tests, 2 GB per task gave the highest task execution efficiency without memory overflow. The YARN cluster is configured with 10 nodes, each with an Intel Xeon Platinum 8380 CPU and 32 GB of memory, to meet the hardware resource requirements of the Flink jobs.
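To make the partition-level parallelism concrete, the following is a minimal sketch of how transaction records could be consumed on the application side with the kafka-python client; the topic name, broker address and message schema are illustrative assumptions rather than details taken from the study.

    # Minimal sketch: one consumer in a consumer group; with 8 partitions,
    # up to 8 such processes can read the topic in parallel.
    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "financial-transactions",            # hypothetical topic name
        bootstrap_servers="localhost:9092",  # hypothetical broker address
        group_id="audit-workers",            # consumers in one group share partitions
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        auto_offset_reset="latest",
    )

    for record in consumer:
        txn = record.value  # e.g. {"company": ..., "amount": ..., "account_type": ...}
        # Hand the transaction to the real-time processing / anomaly-detection logic.
        print(record.partition, txn.get("company"), txn.get("amount"))

Running eight such processes under one group_id would mirror the 8-partition, 8-consumer configuration described above.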
In this study, the "real-time" processing achieved with Kafka and Flink refers to processing at the transaction level. That is, once new financial data is generated, Kafka immediately receives it and quickly passes it to Flink for processing, with almost no delay, in contrast to daily or other low-frequency batch processing. Through this real-time processing, Flink can calculate key financial indicators such as accounts receivable turnover and cash flow in a very short time. The real-time availability of these indicators allows auditors to promptly detect subtle changes in the company's financial situation and quickly discover potential risks. For example, a sudden decrease in cash flow may indicate that the company's capital chain is strained, so that measures can be taken in advance, improving the audit process and audit efficiency. Real-time processing and analysis technology enhances risk control capability mainly through its ability to monitor financial data in real time: once the data fluctuates abnormally, the audit system immediately issues an early warning, allowing auditors to intervene in time and reduce the company's financial risks.
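As a simple illustration of the kind of early-warning rule described here, the sketch below flags a sudden drop in cash flow relative to a short moving average; the window size and the 30% threshold are illustrative assumptions, not parameters reported in the study.

    # Illustrative early-warning rule for a sharp cash-flow decrease.
    from collections import deque

    WINDOW, DROP_THRESHOLD = 12, 0.30
    recent = deque(maxlen=WINDOW)

    def check_cash_flow(value):
        """Return True if the latest cash-flow figure falls sharply below the recent mean."""
        alert = False
        if len(recent) == WINDOW:
            baseline = sum(recent) / WINDOW
            if baseline > 0 and (baseline - value) / baseline > DROP_THRESHOLD:
                alert = True  # auditors would be notified at this point
        recent.append(value)
        return alert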
In the process of data collection, data cleaning and preprocessing are essential steps. Duplicate data and data items with obvious errors are eliminated to ensure data consistency and accuracy. Missing values are handled, for example by mean filling or interpolation. For outliers, the 3σ principle is adopted to detect and process them and ensure the reasonableness of the data. To improve data quality and usability, data standardization is also carried out to convert data from different sources and formats into a unified format, facilitating subsequent integration and analysis.

When dealing with outliers, we adopted the 3σ principle, which means detecting and handling outliers that exceed 3 standard deviations from the mean. This method is simple and effective, but it may have limitations in some cases, such as when the data distribution is not normal. In contrast, isolation forests or z-score-based methods can better adapt to non-normally distributed data.

In the process of data integration, data warehouse technology is used to store decentralized financial data on a unified platform, and the data is extracted, transformed and loaded through the ETL process. To meet the needs of real-time data processing in particular, the Kafka stream processing platform is introduced to realize real-time data acquisition and analysis and ensure the timeliness of the data. In order to further improve the efficiency and accuracy of data analysis, feature engineering is also carried out to extract and select multi-dimensional features from the original data. In the audit, for the detection of financial fraud, key financial indicators including the revenue growth rate, cost change rate and accounts receivable turnover rate are extracted as the input features of the model. These characteristics reflect the financial health of the enterprise and can effectively identify potential financial risks. The process of data collection and sample selection thus includes data source selection, data cleaning and preprocessing, data integration and real-time processing, as well as the application of feature engineering, ensuring the comprehensiveness, accuracy and timeliness of the data and providing the data foundation for the subsequent construction and optimization of the audit automation model [2].

Through correlation analysis, we found that the revenue growth rate and accounts receivable turnover rate are highly correlated with financial risk, so these two features are used as important inputs of the model. In addition, through PCA analysis, we further verified the effectiveness of these features after dimensionality reduction.

Ethical considerations, data anonymization, and dataset reproducibility. Ethical considerations are crucial when collecting financial data. We strictly abide by data protection regulations to ensure that the data sources are legal and compliant. For data obtained from public databases and internal reports, strict anonymization is performed: all information that can directly or indirectly identify individuals is removed, for example by replacing company names with codes and desensitizing key personnel information. To ensure the reproducibility of the dataset, the data collection process, tools used and parameter settings are recorded in detail. For example, when extracting data from public databases such as Yahoo Finance, the SQL query statements and Python scripts are recorded to allow other researchers to reproduce the research process and to ensure the scientific rigor and credibility of the research.

The dataset reflects real-world conditions and potential biases. The dataset used in this study is
comprehensive, covering financial data of companies in multiple industries from 2015 to 2023, including manufacturing, service and retail, and involving various aspects of company daily operations as well as quarterly and annual reports. However, potential biases may still exist. The data mainly come from companies with data disclosure capabilities, which may overlook some small, micro or emerging enterprises. In addition, different industries have different financial characteristics and risk patterns; although the samples cover multiple industries, they may be underrepresented in certain segments, resulting in limited adaptability of the model in these special scenarios.

2.1.2 Data cleaning and preprocessing steps

In the research of financial audit automation based on artificial intelligence, the data cleaning and preprocessing step is very important: it ensures the accuracy and consistency of the input data and thus improves the performance and reliability of the model. The first step in data cleaning is to remove duplicate records and incorrect data. Duplicate transaction records in financial statements are checked and deleted using unique identifiers to ensure the uniqueness of the data. Logical rules and domain knowledge are used to detect and correct obvious errors, such as negative revenue records or unreasonable transaction amounts. Dealing with missing values is a key step in data preprocessing. Several methods are used, including mean filling, median filling and the K-nearest neighbor algorithm. Missing financial indicators, such as sales in a quarter, can be filled by calculating the average sales of similar enterprises to ensure data integrity.

When dealing with outliers, the 3σ principle is adopted, that is, abnormal data exceeding 3 standard deviations from the mean is detected and processed. Based on the detected abnormal values, manual verification is carried out according to the actual business logic, and the confirmed abnormal data is removed or adjusted. For an expense that is higher than the industry average, further verification is conducted to confirm whether it stems from data entry errors or unusual financial activity. Data standardization is a step in data preprocessing. The z-score standardization method is used to convert the data into a standard normal distribution, which eliminates the dimensional differences between different financial indicators and enhances the stability of the model. The financial data of different enterprises, such as revenue, cost and profit, are standardized so that all indicators can be analyzed and compared on the same scale [3].

Figure 2: Preprocessing steps (A: data cleaning, B: missing values, C: outlier detection, D: data standardization, E: real-time processing, F: feature engineering)

As shown in Figure 2, in order to ensure the timeliness and real-time performance of the data, real-time data processing frameworks such as Apache Kafka and Apache Flink are introduced into the preprocessing process to realize real-time processing and analysis of the data flow. Through this framework, real-time financial data can be cleaned and preprocessed in time to ensure the freshness and accuracy of the data. Feature engineering plays a key role in data preprocessing. Multi-dimensional feature extraction and selection are carried out on the financial data: key financial indicators such as the revenue growth rate, cost change rate and asset-liability ratio are extracted from the original transaction data, and correlation analysis is carried out to select the features that affect the audit model. These preprocessing steps ensure the high quality of the data and lay the foundation for the subsequent training and optimization of the audit automation model.
checked and deleted by unique identifiers to ensure the
change rate and asset-liability ratio are extracted from the
uniqueness of data. Use logical rules and domain
original transaction data, and correlation analysis is
knowledge to detect and correct obvious errors, such as
carried out to select the features that have an impact on the
negative revenue records or unreasonable transaction
audit model. The re-processing steps ensure the high
amounts. Dealing with missing values is a key step in data
quality of the data and lay the foundation for the
preprocessing. Many methods are used to deal with
subsequent training and optimization of the audit
missing values, including mean filling method, median
automation model.
filling method and K-nearest neighbor algorithm. Missing
financial indicators, such as sales in a quarter, can be filled
2.1.3 Financial data integration
by calculating the average sales of similar enterprises to
ensure data integrity. In the research of financial audit automation based on
When dealing with outliers, the 3σ principle is artificial intelligence, financial data integration is a key
adopted, that is, abnormal data that exceeds 3 standard step to realize comprehensive data analysis and real-time
deviations of the mean value is detected and processed. processing. Integrate data from multiple sources into a
Based on the detected abnormal values, manual unified database for centralized processing and analysis.
verification is carried out according to the actual business Key financial indicators such as revenue, cost, profit,
logic, and the confirmed abnormal data is removed or accounts receivable, and accounts payable are shown
adjusted. For an expense that is higher than the industry below.
average, further verification is conducted to confirm the Table 2: Financial data integration
presence of data entry errors or unusual financial activity.
Data standardization is a step-in data processioning. The Reve Profi Play
Cost Receiv
z-score standardization method is used to convert the data Compa nue t able
Ye (mill ables
into a standard normal distribution, which can eliminate ny (mill (mill (mill
ar ion (millio
the dimensional differences between different financial Name ion ion ion
$) n $)
indicators and enhance the stability of the model. The $) $) $)
financial data of different enterprises such as revenue, cost Tech
20 143. 97.5 46.1 36.2
and profit are standardized, so that all indicators are Solutio 50.37
15 67 4 3 4
analyzed and compared under the same dimension [3]. ns Inc.
Tech
20 153. 103. 50.1 38.7
Solutio 53.89
16 45 29 6 6
ns Inc.
Green
20 165. 110. 55.2 41.2
Energy 60.34
17 78 56 2 1
Corp.
Green
20 172. 115. 56.7 44.3
Energy 63.96
18 14 37 7 2
Corp.
20 Health 188. 129. 58.6 68.48 47.8
As shown in Table 2, data warehouse technology is used in the data integration process to realize data extraction, conversion and loading through the ETL process. Relevant financial data are extracted from different data sources (such as ERP, CRM and e-invoice systems) to ensure the comprehensiveness and completeness of the data. Data from different sources undergo format conversion and standardization, such as unifying date formats and converting currency units, to ensure data consistency and comparability. The converted data are loaded into a unified database, and partitioning and indexing techniques are used to improve the efficiency of data query and processing.

In order to meet the needs of real-time data processing, the Kafka stream processing platform is introduced to realize real-time data acquisition and processing. Transaction records and bank statements are monitored in real time, and accounts receivable and accounts payable data are updated to support dynamic financial audit analysis. The data integration step ensures the high quality and timeliness of the data and provides the data foundation for the subsequent construction and optimization of the audit automation model [4].

2.1.4 Real-time data processing and analysis

In the research of financial audit automation based on artificial intelligence, real-time data processing and analysis is the key to ensuring the efficiency and accuracy of the audit process. The system uses advanced stream processing technology to realize real-time processing and analysis of financial data. Stream processing platforms such as Kafka and Flink are introduced into the system to support real-time monitoring and analysis of large-scale financial data.

Table 3: Real-time financial transactions

Transaction ID | Timestamp | Company Name | Transaction Amount ($) | Account Type | Transaction Type | Balance (million $)
TXN 001 | 2024-01-01 10:00:00 | Tech Solutions Inc. | 1200.45 | Receivables | Credit | 52.38
TXN 002 | 2024-01-01 10:05:00 | Green Energy Corp. | 850.78 | Payables | Debit | 45.12
TXN 003 | 2024-01-01 10:10:00 | Health Plus Ltd. | 1560.90 | Receivables | Credit | 72.55
TXN 004 | 2024-01-01 10:15:00 | Auto Tech Global | 1120.23 | Payables | Debit | 49.34
TXN 005 | 2024-01-01 10:20:00 | Food Innovations Inc. | 1330.67 | Receivables | Credit | 64.78
TXN 006 | 2024-01-01 10:25:00 | Tech Solutions Inc. | 975.34 | Payables | Debit | 50.85

As shown in Table 3, the Kafka platform enables real-time acquisition and processing of transaction data from various data sources, such as sales systems, banking interfaces, and supply chain management systems. Kafka's high throughput and low latency ensure timely data transmission and processing. Flink is used for real-time data analysis and processing; with Flink, various key financial indicators, such as accounts receivable turnover and cash flow, can be calculated in real time. Using such data, the changes in Tech Solutions Inc.'s accounts receivable and accounts payable can be monitored in real time, and the company's cash flow and financial health can be calculated instantly through the stream processing algorithm.

Using machine learning algorithms, anomaly detection modules are embedded in the data stream to identify and flag suspicious transactions in real time. By analyzing unusual changes in transaction amount and frequency, potential financial fraud is detected in a timely manner. This real-time processing and analysis technology improves audit efficiency and enhances risk control ability in the audit process. The real-time data processing and analysis method provides strong support for the automation of financial auditing and ensures the accuracy and timeliness of the data.
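As an illustration of an anomaly-flagging step embedded in the transaction stream, the sketch below flags a transaction whose amount deviates strongly from that company's recent history; the window size, warm-up length and 3-sigma cut-off are illustrative assumptions.

    # Illustrative per-company anomaly flag on streaming transaction amounts.
    from collections import defaultdict, deque
    import statistics

    history = defaultdict(lambda: deque(maxlen=50))  # recent amounts per company

    def flag_transaction(company, amount):
        amounts = history[company]
        suspicious = False
        if len(amounts) >= 10:
            mean = statistics.mean(amounts)
            std = statistics.pstdev(amounts)
            if std > 0 and abs(amount - mean) > 3 * std:
                suspicious = True  # raise an alert for auditor review
        amounts.append(amount)
        return suspicious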
2.2 Model construction

2.2.1 Selection of audit automation model

In the study of AI-based financial audit automation methods, model selection is a key step in ensuring the efficiency and accuracy of the audit process. According to the research objectives and data characteristics, the random
forest algorithm is chosen as the core audit automation model. This choice is based on the superior performance of random forests in processing large-scale, high-dimensional data, as well as their high accuracy and robustness in classification and regression tasks. The random forest algorithm classifies and predicts data by constructing multiple decision trees and splitting on randomly selected features at each node. Its advantages include the ability to handle a large number of input variables, resistance to overfitting, and robustness to missing data. The specific model selection and construction process is as follows:

Feature selection: Key features are extracted from the integrated financial data, such as the revenue growth rate, accounts receivable turnover, asset-liability ratio, and cash flow. These characteristics fully reflect the financial health and potential risks of the enterprise. In total, 500 characteristics were extracted from the financial data of several enterprises, covering revenue, cost, profit, accounts receivable and accounts payable.

We selected the three key features of revenue growth rate, cost change rate and accounts receivable turnover rate from the 500 initial features using a combination of stepwise regression and correlation analysis. First, we performed univariate correlation analysis between all features and the target variables (such as financial risk indicators), selected features with high correlation (absolute value greater than 0.5), and initially reduced the number of features to about 100. Then, we used stepwise regression to introduce the initially selected features into the regression model one by one, and selected the feature combination with the best model fit and the fewest variables based on indicators such as the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). In the correlation analysis, we used the Pearson correlation coefficient to calculate the correlation between each feature and the financial risk indicator on the quarterly financial data of the past 5 years. For example, the Pearson correlation coefficient between the revenue growth rate and the financial risk indicator is 0.7, indicating a strong positive correlation between the two. To verify the validity of the features by principal component analysis (PCA), we first standardized the three selected features, then calculated the covariance matrix and solved for the eigenvalues and eigenvectors. The number of principal components was determined by requiring a cumulative contribution rate of more than 90%. The results show that the cumulative contribution rate of the first two principal components reaches 92%, indicating that these three characteristics can effectively explain most of the information in the financial data.
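The two-stage feature screening described above can be sketched as follows: a Pearson correlation filter (|r| > 0.5) followed by a PCA check on how much variance the retained features explain. The stepwise regression with AIC/BIC is omitted for brevity, and the feature and target names are assumptions.

    # Sketch of correlation-based feature screening plus a PCA check.
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def screen_features(X: pd.DataFrame, y: pd.Series, threshold: float = 0.5):
        # Keep features whose absolute Pearson correlation with the risk label exceeds the threshold.
        corr = X.apply(lambda col: col.corr(y))          # Pearson by default
        selected = corr[corr.abs() > threshold].index.tolist()

        # Cumulative explained-variance ratio of the retained (standardized) features.
        scaled = StandardScaler().fit_transform(X[selected])
        pca = PCA().fit(scaled)
        cumulative = pca.explained_variance_ratio_.cumsum()
        return selected, cumulative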
Data set partitioning: The data set is divided into a training set and a test set to ensure the generalization ability of the model. Typically, 70% of the data is used for training and 30% for testing. There are 5,000 records in total; the training set contains 3,500 records and the test set contains 1,500 records [5].

To determine the number of trees, we ran experiments with 50, 100, 150, and 200 trees. On the training set, the accuracy of the model gradually increases with the number of trees, but the improvement slows once the number of trees exceeds 100. On the validation set, the recall rate is highest with 100 trees and overfitting is not apparent. Therefore, considering the generalization ability and computational cost of the model, the number of trees is set to 100. For the maximum depth, we started testing at a depth of 5 and gradually increased it. At a depth of 10, the accuracy of the model on the training set reaches 90% and the accuracy on the validation set remains at around 85%. Increasing the depth further raises the training accuracy slightly but lowers the validation accuracy, indicating overfitting. Therefore, the maximum depth is set to 10 to balance capturing data features and preventing overfitting.

Model training: A random forest model is trained on the training set to build a forest containing 100 decision trees. Each decision tree is generated from a bootstrap sample of the training set. The goal of the model is to minimize the classification error rate, as shown in formula (1).

E = \frac{1}{N} \sum_{i=1}^{N} I(y_i \neq \hat{y}_i)    (1)

Model evaluation: Model performance is evaluated on the test set by calculating metrics such as accuracy, recall, and F1 score. Among the 1,500 test records, the model correctly classifies 1,400 and misclassifies 100, so the accuracy of the model is given by formula (2).

Accuracy = \frac{1400}{1500} = 0.9333    (2)

The recall rate and F1 score are calculated as shown in formulas (3) and (4).

Recall = \frac{TP}{TP + FN}    (3)

F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (4)

Model optimization: Model performance and stability are improved by adjusting parameters such as the number of trees, the maximum depth, and the minimum number of samples required for a split. Cross-validation was used to further verify the generalization ability of the model.

Through these steps, the random forest algorithm can effectively identify and classify various financial anomalies and risks in the automation of financial auditing and provide accurate and reliable audit results. This method improves audit efficiency, strengthens risk control ability, and supports the healthy development of enterprise financial management.
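The training and evaluation set-up of Section 2.2.1 can be sketched with scikit-learn as follows: 100 trees, maximum depth 10, a 70/30 split, and accuracy, recall and F1 on the held-out set. Here X and y are assumed to be the prepared feature matrix and binary risk labels for the 5,000 records; they are assumptions, not artifacts provided with the paper.

    # Sketch of random forest training and evaluation under the settings above.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, recall_score, f1_score

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)        # 3,500 / 1,500 records

    rf = RandomForestClassifier(
        n_estimators=100,     # number of trees chosen in the experiments above
        max_depth=10,         # depth at which validation accuracy peaked
        bootstrap=True,       # each tree is fit on a bootstrap sample
        random_state=42)
    rf.fit(X_train, y_train)

    y_pred = rf.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))
    print("recall:  ", recall_score(y_test, y_pred))
    print("F1:      ", f1_score(y_test, y_pred))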
2.2.2 Model architecture design

In the research of financial audit automation methods based on artificial intelligence, the design of the model architecture is the core step in building an efficient and accurate audit automation system. As the core model, the random forest algorithm can give full play to its advantages in processing
high-dimensional and large-scale data through a well-designed architecture.

In the model architecture, the main function of the "data input layer" is to receive raw financial data from different data sources, including internal corporate financial statements and bank statement records, and to perform preliminary format verification and missing-value marking on the data. For example, data in date format is checked against a unified standard format, and missing positions are marked for subsequent processing. The "feature extraction layer" builds on the data input layer and processes the raw data in depth to extract information that reflects the financial status and risk characteristics of the enterprise. For example, financial ratios such as the debt-to-asset ratio and gross profit margin are calculated from financial statement data, and trend and seasonal features are extracted from time series data. Data conversion changes the form of the raw data to meet the input requirements of the model, such as converting text data into numerical data and applying one-hot encoding to categorical data. Data standardization normalizes numerical data so that data with different features share the same scale; commonly used methods include z-score standardization and min-max standardization. Through these clear functional definitions and operation processes, the clarity of the model architecture and the efficiency of data processing are ensured.

Data input layer: This layer is responsible for receiving and processing financial data from a variety of data sources, such as ERP systems, bank statements, and electronic invoices. The data input layer needs to realize real-time data acquisition and preprocessing to ensure the integrity and consistency of the input data [6].

In the data standardization layer, we used the z-score standardization method to convert data from different sources to the same scale and eliminate dimensional differences.

Below is the pseudo code of the model framework.

    # Data collection and real-time ingestion
    data = collect(['internal', 'bank', 'external'])
    cleaned = clean(data)
    unified = integrate(cleaned)
    kafka = setup_kafka()
    while True:                        # streaming loop; runs continuously alongside the steps below
        new = kafka.get()
        process(new)
        detect_anomaly(new)

    # Model construction
    features = select(unified)
    train, test = split(features, 0.7)           # 70/30 split
    rf = RandomForest(n_trees=100, max_depth=10)
    rf.train(train)

    # Evaluation and optimization
    pred = rf.predict(test)
    metrics = evaluate(pred, test.labels)
    rf = optimize(rf, train, cv=5)               # cross-validated tuning

    # Audit task planning
    tasks = define_tasks()
    paths = define_paths(tasks)
    path = a_star(tasks, paths)

    # Audit execution
    for task in path:
        audit(task)
        report(task)

To ensure that other researchers can repeat our research process, we describe the specific steps of data collection in detail. The data is mainly obtained from internal databases, public financial reports, and third-party financial data providers. Specifically, the internal database of the company provides real-time updated financial records; public financial reports are obtained through stock exchanges and company websites; third-party financial data providers supplement industry benchmark data. In addition, we recorded the SQL query statements and Python scripts used for data extraction in detail to ensure the consistency and integrity of the data. The ETL process includes data extraction using Apache NiFi, data cleaning and transformation with the Pandas library, and finally loading into the data warehouse using Apache Hive. These detailed steps ensure the transparency and repeatability of the data processing process.

Feature extraction layer: In this layer, key features are extracted from the raw data, including but not limited to the revenue growth rate, accounts receivable turnover, asset-liability ratio, and cash flow. The purpose of feature extraction is to transform complex raw data into a simplified representation that the model can process.

Data standardization layer: In order to eliminate dimensional differences between features, the data standardization layer standardizes the extracted features, as shown in formula (5).

Z = \frac{X - \mu}{\sigma}    (5)

Model training layer: This layer contains the concrete implementation of the random forest algorithm, which builds multiple decision trees from the training set data. Each decision tree is generated from a bootstrap sample of the training set and split on randomly selected features at each node. The prediction of the random forest algorithm is shown in formula (6).

\hat{y} = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_k(x)\}    (6)

Model optimization layer: In order to improve the performance and stability of the model, the model optimization layer optimizes the model through parameter tuning and cross-validation. The parameters include the number of decision trees, the maximum depth and the minimum number of samples required for a split [10].

Prediction layer: After model training is complete, the prediction layer is responsible for making predictions on the test set and outputting the results. The main task of the prediction layer is to evaluate the performance of the model, including accuracy, recall, and F1 score. The accuracy on the test set is 93.33%, as shown in formula (7).
Accuracy = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}    (7)

In the anomaly detection model of financial auditing, we define a "correct" classification as follows: for a transaction record, if its financial indicators fall within the reasonable ranges specified by accounting standards, and in-depth analysis reveals no signs of financial fraud, such as fictitious income or concealed expenses, the transaction is judged to be normal. Conversely, if indicators in the transaction data fluctuate abnormally, or differ significantly from the company's past operating data and the industry average, and data analysis reveals possible clues of financial fraud, such as a mismatch between income and costs or abnormal cash flow, the transaction is judged to be abnormal. In the anomaly detection process, the model first extracts multi-dimensional features of the input financial data, including financial ratios and trend analysis. The trained classifier then determines whether the data belongs to the normal or abnormal class according to preset thresholds and decision rules. For example, when the accounts receivable turnover rate is lower than a certain percentage of the industry average and the revenue growth rate fluctuates significantly in a short period, the model judges the transaction as abnormal, triggering further audit investigation.

Anomaly detection layer: During the audit process, the anomaly detection layer is responsible for identifying and flagging suspicious financial activity. By analyzing unusual changes in transaction amount and frequency, the model can detect potential financial fraud in real time. In order to evaluate the real-time fraud detection system more comprehensively, we added the evaluation of false positive and false negative rates, which better measure the accuracy and robustness of the system. Specifically, we calculated the false positive and false negative rates from the confusion matrix and analyzed their impact on system performance. The results show that the system has a low false positive rate, meaning that normal activities are rarely mislabeled as abnormal; at the same time, the false negative rate is effectively controlled, ensuring that potential risks are not ignored. These results further confirm the efficiency and accuracy of the system in real-time fraud detection.

In addition to false positives and false negatives, the anomaly detection layer also focuses on precision and recall. Precision is the proportion of samples correctly identified as anomalies among all samples identified as anomalies, reflecting the accuracy of the model in identifying anomalies. Recall is the proportion of samples correctly identified as anomalies among all actual anomaly samples, reflecting the model's ability to detect anomalies. In financial auditing, high precision reduces the misjudgment of normal transactions and lowers audit costs, while high recall ensures that more potential financial risks are discovered. Through a combined evaluation of precision and recall, the performance of the anomaly detection layer can be measured more comprehensively.
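The metrics discussed here can be derived from a confusion matrix as sketched below, assuming binary labels in which 1 marks an abnormal transaction; the helper function and its name are illustrative, not part of the study's code.

    # Sketch: false-positive rate, false-negative rate, precision and recall
    # of the anomaly detector from a confusion matrix.
    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    def detector_report(y_true, y_pred):
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return {
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
        }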
Feedback and improvement layer: This layer compares the model's predicted results with the actual audit results and continuously improves and optimizes the model based on the feedback. Through iterative cycles, the accuracy and robustness of the model are continuously improved. Through this architecture design, the application of the random forest algorithm in financial audit automation is fully optimized. The systematic design ensures the efficiency and accuracy of the model when dealing with large-scale, high-dimensional financial data and provides powerful technical support for enterprise financial audits [11].

2.2.3 Configuring the data layer and processing layer

In the research of financial audit automation based on artificial intelligence, the configuration of the data layer and processing layer is a key part of model construction, which directly affects the efficiency and performance of the system. The data layer is responsible for storing and managing financial data, while the processing layer is responsible for data cleaning, transformation, analysis and modeling; the configuration of the data layer needs to account for the diversity of the data and storage efficiency.

After re-evaluating the data distribution, we found that the current data set does not meet the normal distribution assumption. Therefore, we use the isolation forest algorithm instead of the 3σ principle for outlier detection. The isolation forest algorithm is based on the principle that, in high-dimensional space, normal data points tend to cluster together while abnormal data points are relatively isolated. The algorithm constructs multiple random binary trees to partition the data points randomly and calculates the path length of each data point in the trees: the shorter the path length, the more isolated the data point and the more likely it is an outlier. In practice, we first normalize the original financial data to eliminate the impact of dimension. The processed data are then fed into the isolation forest model, with the number of trees set to 100 and the subsample size set to 256 to ensure the stability and accuracy of the model. After model training is completed, for new financial data we calculate its anomaly score in the isolation forest and set a suitable threshold (such as 0.5); when the anomaly score exceeds the threshold, the data point is determined to be an outlier.
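The isolation-forest screening described above can be sketched with scikit-learn as follows, using 100 trees and a sub-sample size of 256. How the library's anomaly score is mapped onto the 0.5 cut-off is an assumption of this sketch, and X is assumed to be the numeric financial feature matrix.

    # Sketch of isolation-forest outlier screening under the settings above.
    from sklearn.ensemble import IsolationForest
    from sklearn.preprocessing import StandardScaler

    X_scaled = StandardScaler().fit_transform(X)   # normalize to remove dimensional effects

    iso = IsolationForest(n_estimators=100, max_samples=256, random_state=42)
    iso.fit(X_scaled)

    # score_samples returns higher values for "normal" points; negate so that
    # larger scores mean "more anomalous", then apply the chosen threshold.
    anomaly_score = -iso.score_samples(X_scaled)
    is_outlier = anomaly_score > 0.5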
Table 4: Financial indicators

Record ID | Year | Company Name | Gross Margin (%) | Operating Margin (%) | Return on Assets (ROA) (%) | Quick Ratio | Debt-to-Equity Ratio
001 | 2019 | Tech Solutions Inc. | 35.67 | 12.45 | 8.34 | 0.67 | 1.25
002 | 2020 | Tech Solutions Inc. | 37.12 | 13.22 | 9.11 | 0.65 | 1.30
003 | 2019 | Green Energy Corp. | 30.78 | 10.89 | 7.56 | 0.72 | 1.15
004 | 2020 | Green Energy Corp. | 32.45 | 11.34 | 8.12 | 0.70 | 1.20
005 | 2019 | Health Plus Ltd. | 28.56 | 9.45 | 6.78 | 0.75 | 1.10
006 | 2020 | Health Plus Ltd. | 29.67 | 10.12 | 7.23 | 0.73 | 1.18
007 | 2021 | Auto Tech Global | 34.89 | 12.67 | 8.56 | 0.68 | 1.22
008 | 2022 | Auto Tech Global | 36.45 | 13.45 | 9.45 | 0.66 | 1.28
009 | 2021 | Food Innovations Inc. | 33.56 | 12.12 | 8.12 | 0.69 | 1.20
010 | 2022 | Food Innovations Inc. | 35.12 | 12.78 | 8.89 | 0.67 | 1.25
As shown in Table 4, the processing layer is to make predictions on the test set. The model is classified
responsible for cleaning, transforming, analyzing, and or regression based on the prediction results of the
modeling the financial data in the data layer. The majority decision tree. Of the 300 records in the test set,
processing layer cleans the financial data in the data layer, the model classified 280 correctly and 20 incorrectly [13].
including dealing with missing values, outliers, and Model evaluation: Evaluate the performance of the
duplicate data. For missing values, the median fill method model, mainly including calculation accuracy, recall rate
is used, and for outliers, the 3σ principle is used for and F1 score.
detection and processing. Data transformation involves Parameter optimization: Optimize model
standardizing and normalizing the raw data to eliminate performance by adjusting model parameters. The cross-
dimensional differences between different features. The validation method is used to verify the generalization
processing layer improves the performance of the model ability of the model to ensure the consistency and stability
through feature extraction and feature selection. Extract of the model on different data sets.
key features from financial indicators, such as gross profit
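The step-by-step process above maps directly onto a standard scikit-learn workflow. The following sketch is illustrative only and uses synthetic data in place of the financial features named in the text (revenue growth rate, accounts receivable turnover, asset turnover, debt ratio, net profit margin):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Hypothetical data set: five columns standing in for the financial features above.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Data preparation: standardize features to remove dimensional differences.
X = StandardScaler().fit_transform(X)

# 70% / 30% split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Model training: 100 trees, each grown on a bootstrap sample.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
rf.fit(X_train, y_train)

# Model prediction and evaluation: accuracy, recall and F1 score.
y_pred = rf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("recall:  ", recall_score(y_test, y_pred))
print("F1:      ", f1_score(y_test, y_pred))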
Through the implementation process, the random forest algorithm has been effectively applied in the automation of financial audit. It improves the audit efficiency, strengthens the risk control ability, and provides technical support for the financial health management of enterprises. The systematic realization process ensures the efficiency and accuracy of the model, and further promotes the intelligent development of financial audit [14].

2.3 Training and optimization

2.3.1 Training process description

In the research of financial audit automation based on artificial intelligence, the training process is the key step to ensure the performance of the random forest model. Key characteristics are extracted from the data sheet and standardized, including revenue growth rate, accounts receivable turnover, asset turnover, debt ratio, and net profit margin. After the features are preprocessed, the input data set of the model is formed. Next, the data set is divided into a training set (70%) and a test set (30%). In the model training phase, the random forest algorithm is trained by building 100 decision trees. Each tree uses a bootstrap sampling method to extract samples from the training set. At each node, features are randomly selected for splitting to minimize the Gini coefficient. In addition, the generalization ability of the model is further improved by setting the maximum tree depth to 10 and using a 50% feature subset.

In the model training stage, the random forest algorithm is trained by constructing 100 decision trees. Each decision tree uses a bootstrap sampling method to extract samples from the training set to ensure the diversity of data and the robustness of the model. At each node, randomly selected features are split to minimize the Gini coefficient or maximize the information gain, thus building the structure of the tree. The goal of model training is to reduce the overfitting risk of a single decision tree and improve the generalization ability of the whole model through the voting results of the majority of decision trees.

We have listed the hyperparameters used for training the random forest model in detail, including the number of decision trees (set to 100), the maximum depth (set to 10), the number of features used to split a node (set to 50% of the total number of features), and other key parameters. Cross-validation methods are used to evaluate the model's performance on different datasets through repeated iterations and parameter adjustments (e.g., number of decision trees, maximum depth) to ensure high accuracy and stability. After the training process, the model's performance on the test set is used for final evaluation and validation to confirm its validity and reliability in practical applications.

2.3.2 Model optimization strategy

In the research of financial audit automation based on artificial intelligence, the model optimization strategy is the key to improving the performance of the random forest algorithm. The optimization strategy mainly includes parameter tuning, feature selection, data enhancement and model integration. In feature selection, by calculating feature importance, the features that contribute little to the model are eliminated, so as to reduce noise and improve the interpretability and efficiency of the model. Feature importance can be determined by calculating the contribution of each feature to the reduction of model impurity. In the analysis, the revenue growth rate and accounts receivable turnover rate contribute the most, and these features can be preferentially retained.

Data enhancement is another strategy to improve the robustness and generalization of the model by generating more training samples. The model integration strategy further improves the prediction performance by combining the prediction results of multiple models. Random forest and gradient boosting decision trees are combined to form an integrated model, and the advantages of different algorithms are utilized to enhance the accuracy and stability of prediction. In the concrete implementation, the random forest and GBDT are trained respectively, and then the predicted results of the two are fused by a weighted average or voting mechanism to obtain the final predicted value [15].

Through the optimization strategy, the performance of the random forest algorithm in financial audit automation has been improved, ensuring the efficiency and reliability of the model in different data sets and scenarios, and providing technical support for the financial health management of enterprises.

For situations where real-time data sources are temporarily unavailable, we have designed buffering strategies and error handling mechanisms. When the real-time data stream is interrupted, the system automatically stores the data in the memory buffer and periodically attempts to reconnect to the data source. Once the data source is restored, the data in the buffer will be quickly processed and fed into the system. In addition, the system is also configured with error handling logic. When the data source is unavailable for a long time, an alarm mechanism will be triggered to notify the administrator to troubleshoot the problem. These mechanisms ensure the stability and continuity of the system in the face of emergencies.

In the random forest algorithm, the number of trees and the tree depth are two key hyperparameters. The number of trees is chosen to be 100 because more trees can integrate the results of more decision trees, reduce the risk of overfitting of a single tree, and improve the generalization ability of the model. If the number of trees is too small, the model will not learn fully; if the number of trees is too large, the computational cost will increase and the benefits will gradually decrease. The tree depth is set to 10 to balance the complexity and accuracy of the model. If the tree is too deep, the model will overfit the training data and the generalization ability will deteriorate; if the tree is too shallow, the complex features of the data cannot be learned, which will reduce the performance of the model. A reasonable tree depth can avoid overfitting while ensuring the model's ability to capture features.
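As a concrete illustration of the hyperparameter choices and the ensemble strategy discussed in this section, the sketch below configures the random forest with 100 trees, a maximum depth of 10 and a 50% feature subset, checks generalization with cross-validation, and fuses random forest and GBDT probabilities by a weighted average. It reuses the synthetic X_train/X_test arrays from the earlier sketch; the 0.6/0.4 weights are illustrative only and are not values reported by the authors:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Random forest with the hyperparameters discussed above:
# 100 trees, maximum depth 10, 50% of the features considered at each split.
rf = RandomForestClassifier(n_estimators=100, max_depth=10, max_features=0.5, random_state=0)

# Cross-validation to check generalization before the final fit.
cv_acc = cross_val_score(rf, X_train, y_train, cv=5, scoring="accuracy")
print("5-fold CV accuracy: %.3f +/- %.3f" % (cv_acc.mean(), cv_acc.std()))

# Ensemble strategy: fuse random forest and GBDT by a weighted average of
# their predicted probabilities (weights here are purely illustrative).
gbdt = GradientBoostingClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
gbdt.fit(X_train, y_train)

proba = 0.6 * rf.predict_proba(X_test)[:, 1] + 0.4 * gbdt.predict_proba(X_test)[:, 1]
y_fused = (proba >= 0.5).astype(int)
print("fused accuracy:", (y_fused == y_test).mean())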
2.4 Automatic path planning of financial audit

2.4.1 Path planning algorithm selection

In the research of the financial audit automation method based on artificial intelligence, the selection of the path planning algorithm is the link to realize an efficient audit process. Path planning algorithms are designed to determine the best audit path to maximize audit efficiency and coverage while minimizing audit costs and time. Based on the demand of this research, the shortest path algorithm and heuristic search algorithm based on graph theory, namely the Dijkstra algorithm and the A* (A-Star) algorithm, are selected as the core algorithms of path planning. The Dijkstra algorithm is a classical shortest path algorithm, which can find the shortest path from the starting point to the end point in a weighted graph. It is suitable for task planning in financial audit, such as determining the optimal path from one audit task to another, reducing the waste of auditors' time and resources. The algorithm maintains a priority queue, gradually expands to all nodes in the graph, calculates the shortest path of each node, and finally builds a complete shortest path tree.

On the basis of Dijkstra's algorithm, the A* algorithm introduces a heuristic function, which makes it more efficient to search for the optimal path. The heuristic function estimates the distance between the current node and the destination node, thus preferentially choosing the path that is most likely to reach the destination. The A* algorithm has practical application value in financial audit automation; for example, in large-scale data sets or complex audit tasks, it can quickly find an efficient audit path and improve the overall audit efficiency. When planning an audit task, there are multiple task nodes and paths, each with a different cost (such as time or resource consumption). Using Dijkstra's algorithm, a path that minimizes the total cost can be calculated. In more complex scenarios, the A* algorithm further optimizes the path selection by introducing heuristic evaluation, making the audit process more efficient and intelligent [16].

By combining the Dijkstra and A* algorithms, we can effectively plan the path of financial audit tasks and improve the overall performance and efficiency of the audit automation system. The path planning method simplifies the audit process, enhances the accuracy and timeliness of the audit results, and provides support for the financial management of enterprises.

In the audit task, we define each audit link as a node, such as financial statement review, inventory counting, accounts receivable verification, etc. The edges between nodes represent the order and dependency between tasks; for example, inventory counting can only be performed after the financial statement review is completed. Suppose we have an audit project including four main tasks: auditing sales revenue, auditing costs and expenses, auditing balance sheets, and auditing cash flow. Among them, auditing sales revenue and auditing costs and expenses can be carried out in parallel, while auditing balance sheets needs to be carried out after the audit of sales revenue and costs and expenses is completed, and auditing cash flow needs to be carried out after the audit of balance sheets is completed. We convert these tasks into nodes and edges in the graph algorithm, and use the Dijkstra algorithm to calculate the shortest path from the start node (such as project start) to the end node (such as audit report generation). By optimizing path planning, we can reasonably arrange the work order of auditors and reduce unnecessary waiting time and repetitive work, such as avoiding auditors frequently switching between different tasks, thereby improving audit efficiency; it is expected that the audit time can be shortened by about 20%. A sketch of this task-graph formulation is given below.
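The following is a minimal, self-contained sketch of that formulation in Python. The task names mirror the example above, but the edge weights (effort in hours) are hypothetical and only serve to make the Dijkstra computation concrete. Note that a plain shortest path traverses only one of the two parallel branches, so the sketch reflects the text's formulation rather than a full scheduling model:

import heapq

# Audit tasks as nodes; edge weights are hypothetical effort estimates (hours).
# Dependencies follow the example in the text: revenue and cost audits come first,
# the balance sheet audit follows them, and the cash flow audit precedes the report.
graph = {
    "start":          {"sales_revenue": 8, "costs_expenses": 6},
    "sales_revenue":  {"balance_sheet": 10},
    "costs_expenses": {"balance_sheet": 12},
    "balance_sheet":  {"cash_flow": 5},
    "cash_flow":      {"audit_report": 4},
    "audit_report":   {},
}

def dijkstra(graph, source, target):
    # Return (total_cost, path) of the cheapest path from source to target.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

cost, path = dijkstra(graph, "start", "audit_report")
print(cost, " -> ".join(path))  # prints the cheapest route and its total cost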
2.4.2 Audit process design

In the research of financial audit automation based on artificial intelligence, the audit process design is the key to realizing the efficient automation of audit tasks. Designing a scientific and reasonable audit process can maximize the use of artificial intelligence technology to improve audit efficiency and accuracy. The design of the audit process mainly includes pre-audit preparation, data acquisition and preprocessing, model application and analysis, anomaly detection and processing, and audit report generation.

In the pre-audit preparation stage, the system establishes the audit plan and determines the audit focus and risk areas according to the historical financial data and industry benchmark data of the enterprise. This step includes collecting data such as the company's annual financial statements, bank statements, electronic invoices and transaction records. Next is the data acquisition and preprocessing stage: through the API interface and data crawler technology, the system acquires the latest financial data of enterprises in real time and performs data cleaning, format conversion and feature extraction. The processed data is stored in a data warehouse for subsequent analysis [17].

In the stage of model application and analysis, the random forest algorithm is applied to financial data for risk assessment and anomaly detection. The system analyzes key financial indicators, such as revenue growth rate, asset-liability ratio and cash flow, and predicts potential financial risks and abnormal transactions through the model. During an audit, if the system finds that a company's accounts receivable turnover is lower than the industry average, the model flags this as an anomaly and further analyzes the cause. During the anomaly detection and handling phase, the system analyzes the detected exceptions in detail and provides actionable audit suggestions. For example, the system recommends that auditors further verify whether the low turnover is due to poor collection of accounts or errors in the financial statements.

The last stage is audit report generation. The system automatically generates detailed audit reports, including audit findings, risk assessment, and improvement suggestions. The report format is standardized, which makes it easy for auditors and management to read and make decisions. The system generates a PDF report with a summary of the audit, a detailed exception list, and recommendations for improvement. Through the audit process design, the financial audit process has realized a high degree of automation and intelligence, improved the audit efficiency and accuracy, and enhanced the transparency and traceability of the audit process, which provides support for the financial health management of enterprises.

2.4.3 Implementation of audit policies

In the research of the financial audit automation method based on artificial intelligence, the realization of the audit strategy is the link to ensure an efficient and accurate audit process. The implementation process includes the formulation, implementation and dynamic adjustment of the audit strategy. The development of audit strategies is based on a comprehensive analysis of the company's financial data and risk assessment, using artificial intelligence technology to identify key audit indicators and high-risk areas. Through the analysis of historical data and industry benchmark data, the system develops detailed audit strategies. Audit policies include the audit scope, key audit areas, schedule, and resource allocation. For an enterprise, the system focuses on its accounts receivable and inventory management, and makes a corresponding audit schedule and resource allocation plan.

In the execution stage, the system obtains the latest financial data of the enterprise in real time through the API interface and data crawler technology, and performs data analysis and audit procedures according to the predetermined audit strategy. The random forest algorithm is used for risk assessment and anomaly detection, and the system monitors and analyzes key financial indicators in real time. When a company's cash flow fluctuates during the audit process, the system will mark the item as high risk and further analyze the cause of the fluctuation.

Dynamic adjustment is the closing link of audit strategy implementation. The system continuously monitors data and audit results during the audit process and dynamically adjusts audit policies according to the actual situation. If an exception is found in a certain area during the preliminary audit, the system will increase the audit efforts in this area and adjust the audit resources and schedule. If, in the process of audit, it is found that the accounts payable turnover rate of an enterprise is abnormally high, the system will increase the audit of the supplier payment process to ensure the legality and compliance of all accounts transactions [18].

The system also continuously optimizes audit strategies through machine learning algorithms. By analyzing the successful experience and failure lessons of past audit projects, the system constantly adjusts and optimizes audit strategies to improve audit efficiency and accuracy. The system adjusts the weight of the risk assessment model based on historical data to ensure accurate identification of high-risk areas. The implementation of the audit policy also includes the automatic generation of audit reports and recommendations. The system generates a detailed audit report based on the audit results, including found risks, anomalies and improvement suggestions, providing valuable decision support for enterprise management. The audit report generated by the system recommends that enterprises optimize their inventory management processes to reduce inventory costs and improve the efficiency of capital use. Through the implementation of the audit strategy, the financial audit process has realized a high degree of automation and intelligence, improved the audit efficiency and accuracy, and enhanced the transparency of enterprise financial management and the risk control ability.

In the implementation of audit policies, we take a manufacturing enterprise as an example. The principle of formulating the audit plan is to determine the key areas and key links of the audit based on the business characteristics, financial risk status and regulatory requirements of the enterprise. For example, for this manufacturing enterprise, we focus on raw material procurement, production process cost control and product sales. The schedule is as follows: at the beginning of each quarter, a detailed audit plan is formulated to clarify the audit tasks and time nodes of each stage; the first week is to conduct a preliminary review of financial statements, the second week is to conduct inventory counting and accounts receivable verification, the third week is to conduct a detailed review of costs and expenses, and the fourth week is to summarize the audit results and write an audit report. Resource allocation is based on the difficulty and workload of the audit task, and auditors and technical resources are reasonably deployed. For complex cost accounting links, auditors with rich experience and professional data analysis tools are arranged. Through the implementation of these audit policies, the company has reduced the incidence of financial risks by 30% in the past year, and the audit satisfaction rate has reached more than 85%.

3 Results and discussion

3.1 Results

3.1.1 Audit efficiency improvement result

In the research of the financial audit automation method based on artificial intelligence, audit efficiency is improved by introducing the random forest algorithm and optimizing the audit process. Specific efficiency improvements can be demonstrated by comparing key indicators before and after the implementation of automated auditing.

Random forest feature selection can extract the most representative features from a large amount of data, reduce noise and redundant information, and improve the accuracy and robustness of the model. The real-time analysis function enables the system to quickly respond to data changes, optimize the decision-making process, and improve the real-time performance and adaptability of the system. These improvements have jointly promoted the improvement of system performance, ensuring more accurate predictions and more efficient resource allocation.
Figure 3: Audit efficiency improvement result

As shown in Figure 3, through the introduction of the random forest algorithm, the automated audit system has shown improved efficiency in many aspects. Audit coverage increased across all companies, indicating that automated systems are able to more fully audit a company's financial data. Error detection rates also improved, reflecting the model's strong ability to identify and correct errors. The speed of data processing is accelerated, indicating that automated systems are able to process large amounts of financial data more efficiently. The improvement of accuracy and risk identification rate further proves the reliability and effectiveness of the automated audit system. Under the joint action of these indicators, the financial audit process becomes more efficient and accurate, which provides a guarantee for the financial management of enterprises.

3.1.2 Audit risk identification effect

In the research of the financial audit automation method based on artificial intelligence, the random forest algorithm is introduced to effectively improve the effect of audit risk identification. Through the integration of multiple decision trees, the random forest algorithm improves the detection ability of abnormal data and potential risks. This paper presents the audit risk identification effect of different companies, including the risk detection rate, high-risk transaction identification rate, low-risk transaction misjudgment rate, false positive rate and false negative rate. The data show that automated audit systems perform well in risk identification.

As shown in Figure 4, the risk detection rate of automated audit systems in different companies has increased to more than 88%. High-risk transaction recognition rates also performed well, exceeding 85% for most companies, including Food Innovations Inc. at 90 percent. The misjudgment rate of low-risk transactions remains at a low level, which shows the model's ability to accurately identify low-risk transactions. Both the false alarm rate and the missed alarm rate are reduced, which proves the effectiveness of the system in reducing false positives and false negatives. Tech Solutions Inc. had a false alarm rate of 10% and a missed alarm rate of 5%, showing the model's stability in balancing the two.

Through the application of the random forest algorithm, the automated audit system shows high efficiency and accuracy in risk identification, improves the risk detection rate and the identification rate of high-risk transactions, and reduces the misjudgment rate, false positive rate and false negative rate of low-risk transactions. The results provide strong support for the financial management and risk control of enterprises, and improve the quality and efficiency of audit work.

3.1.3 Audit feedback and improvement results

In the research of financial audit automation based on artificial intelligence, audit feedback and improvement results are the key to ensuring the continuous optimization and efficient operation of the audit process. By collecting and analyzing audit feedback, the system can continuously improve the algorithm and process to improve the accuracy and efficiency of the audit. Figure 5 shows the performance of key indicators after audit feedback and improvement for different companies, including the adoption rate of audit recommendations, the effectiveness of corrective measures, audit satisfaction, the reduction rate of the error rate after improvement and the increase rate of efficiency after improvement.

The increased computing costs or complexity of maintaining AI systems may stem from multiple factors. First, real-time analysis functions require rapid processing of large amounts of data, which increases the demand for computing resources and may lead to increased hardware and operation and maintenance costs. Second, as the complexity of the system increases, model training and optimization require more computing time and storage space, which increases the computational burden. Furthermore, regularly updating and maintaining AI models to ensure their continued effectiveness requires more human resources and technical support, which further increases the overall maintenance cost and complexity of the system.

Figure 4: Audit efficiency improvement result
Figure 5: Audit efficiency improvement result

As shown in Figure 5, different companies have achieved results after audit feedback and improvement. The adoption rate of audit recommendations is high, reaching 87% on average, indicating that enterprises attach great importance to the audit recommendations provided by the system and actively adopt them. Health Plus Ltd. had an adoption rate of 89%. The effectiveness of corrective actions also performed well, averaging 91%, indicating that the corrective actions proposed by the system were highly effective in improving financial processes and controlling risks. The audit satisfaction reflects the overall evaluation of the automated audit system by the enterprise, with an average of about 90%, indicating that enterprises are very satisfied with the audit results and the feedback process of the system. The reduction of the error rate after improvement shows the improvement effect after audit feedback, with an average reduction of 47%. Tech Solutions Inc.'s error rate was reduced by 45 percent, while Food Innovations Inc.'s was reduced by 49 percent.

The improved efficiency further proves the positive effect of audit feedback on improving audit efficiency, with an average increase of more than 50%. The efficiency of Green Energy Corp. has increased by 52%, indicating that the audit efficiency of enterprises has been improved through audit feedback and improvement measures. Through continuous audit feedback and improvement measures, the financial audit automation system based on artificial intelligence improves the accuracy and efficiency of the audit, enhances the standardization and transparency of the financial management of enterprises, and provides protection for the financial health of enterprises.

To verify the significance of the improvement in audit efficiency, we conducted a t-test on the indicators before and after automation, and the results showed that the p-value was less than 0.05, indicating that the improvement was statistically significant.

In the result analysis phase, in order to evaluate the research results more rigorously, an in-depth statistical analysis of the improvement in audit efficiency was carried out. For the key indicators before and after automation, a t-test was carefully designed and executed. Through the calculation and analysis of a large amount of sample data, the result of a p-value less than 0.05 was finally obtained. This strongly shows that the improvement in audit efficiency is statistically significant and not accidental. In addition, in order to further verify the stability of the risk identification accuracy, its 95% confidence interval was carefully calculated. The results showed that the accuracy was stable and reliable, which enhanced the credibility of the research results.

While enjoying the results of a 30% improvement in audit efficiency and 90% accuracy, the trade-offs cannot be ignored. With the introduction of real-time processing technology and machine learning models, the computing cost of the system has increased significantly, and higher requirements have been put forward for the hardware configuration. More powerful servers are needed to support the rapid processing of massive data. At the same time, the complexity of the system has been greatly increased. The training, optimization and daily operation and maintenance of the model require the participation of professional technicians, and the labor cost and technical difficulty have increased. However, considering the huge benefits it brings to corporate financial management, these investments are still worthwhile.

In the results section, we supplemented the control group data and selected the traditional sampling audit method as a control. On the same audit items and data sets, the automated audit method based on artificial intelligence and the traditional sampling audit method were used for auditing respectively. In terms of audit coverage, the automated audit method reached 95%, while the traditional sampling audit method was only 70%. This is because the automated audit can conduct a comprehensive analysis of all data, while the traditional sampling audit is limited by the sample size. In terms of detection rate, the automated audit method has a detection rate of 90% for financial risks, while the traditional sampling audit method is at 75%, indicating that the automated audit method can more effectively detect potential financial risks. Through comparative analysis, we can more intuitively see the advantages of the automated audit method based on artificial intelligence in improving audit efficiency and accuracy.

In terms of efficiency, by introducing the automated audit system, the audit time has been shortened from an average of 20 working days to less than 10 working days, and the efficiency has been increased by more than 50%. This is mainly due to the system's ability to quickly process large amounts of financial data and reduce the time for manual review. In terms of accuracy, the accuracy of risk identification has increased from 80% to more than 93%. For example, in an audit of a listed company, the automated audit system discovered an abnormal transaction of fictitious income in a timely manner by monitoring financial data in real time, while traditional audit methods failed to detect it at the first time. Through continuous audit feedback and improvement measures, we continue to optimize the model and audit process, further improve the accuracy and efficiency of audits, and enhance the standardization and transparency of corporate financial management.

The increase in computing costs mainly includes the following aspects: hardware equipment upgrade costs: in order to meet the needs of big data processing and model calculation, the cost increased by 500,000 yuan; software licensing fees: we use professional data analysis software and artificial intelligence algorithm libraries, and the annual software licensing fee is 200,000 yuan; and human resource investment: we recruited and trained professionals with data analysis and artificial intelligence technology, and the human resource cost increased by 300,000 yuan each year. Through cost-benefit analysis, we calculated the return on investment (ROI). In the past year, due to the improvement of audit efficiency, the company saved 1 million yuan in audit costs, and avoided 2 million yuan in potential losses caused by the failure to discover financial risks in time. According to the ROI calculation formula ROI = (benefit - cost) / cost × 100%, the calculated ROI is 200%, indicating that the cost increase is acceptable and has a high investment value.

When evaluating the improvement of audit efficiency, we selected 30 audit projects as samples and recorded the audit time before and after the use of the automated audit system. When using the t-test, we first performed a normality test on the two groups of data to ensure that the data met the conditions of the t-test. Then, we calculated the mean and standard deviation of the two groups of data, and calculated the t value using the t-test formula. After calculation, the t value was 3.5, the degree of freedom was 58, and the corresponding p value was 0.01, which was less than 0.05, indicating that, at a confidence level of 95%, the audit time of the automated audit system was significantly lower than that of the traditional method, and the efficiency was significantly improved. In terms of risk identification accuracy, the Mann-Whitney U test was used. The number of risks identified and the correct identification rate of the automated system and the traditional method in 30 audit projects were compared. The calculated Mann-Whitney U value was 200, and the corresponding p value was 0.03, which was less than 0.05, indicating that the automated system was significantly better than the traditional method in terms of risk identification accuracy.

Audit systems based on deep learning, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), have powerful feature learning capabilities when processing financial data and can automatically extract complex data features. However, deep learning models require a large amount of labeled data for training, and the training process consumes large computing resources and takes a long time. In contrast, the random forest algorithm used in this study, combined with Kafka and Flink real-time processing technology, has obvious advantages in audit efficiency. When processing financial data of the same scale, the audit time of this system is only 50% of that of the deep learning system: the average audit time of the deep learning system is 20 days, while this system only takes 10 days. In terms of accuracy, although the deep learning system performs well in identifying some complex risks, the accuracy of this system in identifying common financial risks is comparable to that of the deep learning system, reaching more than 90%. At the same time, this system has a low demand for computing resources and can run on ordinary servers, while deep learning systems usually require high-performance servers equipped with GPUs.

3.2 Discussion

3.2.1 Problem summary

In the research of the financial audit automation method based on artificial intelligence, although efficiency improvement and risk identification effects have been achieved, there are still some problems that need to be summarized and solved. Data quality remains a challenge. Even after strict data cleaning and preprocessing steps are implemented, data noise and missing values still affect the accuracy and stability of the model. The data sources involved in the audit process are diverse and the data formats are not uniform, which leads to the complexity of data integration and increases the difficulty of system processing and analysis. The issues of model interpretation and transparency need attention. Although complex algorithms such as random forest perform well in accuracy and efficiency, their internal decision-making process is complicated and difficult for non-technical personnel to understand and explain. In the process of generating audit reports and interpreting audit results, this reduces users' trust in the audit conclusions.

The real-time processing capability of the system needs to be improved. Despite the introduction of real-time processing platforms such as Kafka and Blink, the system still has room for improvement in processing speed and latency in the face of large-scale and high-frequency data flows. This puts forward higher technical requirements for realizing real-time audit. The generalization ability of the model also needs attention. Although the robustness of the model has been improved through cross-validation and parameter optimization, the model shows insufficient adaptability in the face of new types of financial data and fraudulent means, which affects its promotion and application in different enterprises and industries.

The user feedback mechanism needs to be improved. Although the system can automatically generate audit reports and improvement suggestions, how to effectively collect and process user feedback in the feedback and improvement process, so as to continuously optimize the audit strategy and model performance, is still a problem that needs in-depth research. Although AI-based financial audit automation methods have achieved results in improving audit efficiency and risk identification, they still need to be further optimized and improved in data quality, model interpretation, real-time processing capabilities, generalization capabilities and user feedback mechanisms to achieve more efficient and reliable financial audit automation.

Although audit efficiency has been significantly improved, an increase in computational costs has also been noted. This is due to the introduction of real-time processing technology and machine learning models that increase the computational burden on the system. However, this cost increase is acceptable considering the efficiency gains.
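The significance testing workflow described in the results above (a normality check, a two-sample t-test on audit time, and a Mann-Whitney U test on risk identification) can be reproduced with standard tools. The sketch below uses synthetic samples in place of the 30 recorded audit projects and assumes SciPy; the numbers it prints are illustrative and will not match the t = 3.5 and U = 200 reported in the text:

import numpy as np
from scipy import stats

# Hypothetical audit-time samples (working days) for 30 projects each,
# standing in for the before/after measurements described in the results.
rng = np.random.default_rng(1)
time_manual = rng.normal(loc=20, scale=3, size=30)
time_automated = rng.normal(loc=10, scale=2, size=30)

# Normality check on each group before applying the t-test.
w1, p1 = stats.shapiro(time_manual)
w2, p2 = stats.shapiro(time_automated)
print("Shapiro p-values:", p1, p2)

# Independent two-sample t-test on audit time.
t_stat, p_val = stats.ttest_ind(time_manual, time_automated)
print("t = %.2f, p = %.4f" % (t_stat, p_val))

# Mann-Whitney U test on risk-identification outcomes (non-parametric).
risk_manual = rng.binomial(1, 0.75, size=30)
risk_automated = rng.binomial(1, 0.90, size=30)
u_stat, p_u = stats.mannwhitneyu(risk_automated, risk_manual, alternative="greater")
print("U = %.1f, p = %.4f" % (u_stat, p_u))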
3.2.2 Research suggestions

In the research of the financial audit automation method based on artificial intelligence, in order to further improve the efficiency and accuracy of the system, the following research suggestions are put forward. Data quality management needs to be further improved. It is recommended to establish a more comprehensive data cleaning and preprocessing mechanism and to adopt advanced missing value processing methods and anomaly detection techniques, such as adaptive filtering and deep learning models, to improve data reliability and integrity. Model interpretation and transparency should be enhanced. It is recommended to integrate explainable AI technologies such as LIME and SHAP into the model so that auditors can understand and explain the decision-making process of the model, thereby improving the transparency of audit reports and the trust of users. Real-time processing capabilities should also be enhanced. It is suggested to optimize the existing real-time data processing architecture and introduce more efficient stream processing technologies and hardware acceleration schemes, such as GPU acceleration and distributed computing frameworks, to cope with large-scale and high-frequency data stream processing needs and to ensure that the system can respond to and process data in real time.

Improving the generalization ability of models is another direction. It is suggested to enhance the adaptability and robustness of the model in different enterprises and industries through ensemble learning and transfer learning techniques. The multi-model fusion method is used to improve the generalization performance of the model, and the model is applied to different financial environments through transfer learning. The user feedback mechanism should be optimized. It is suggested to establish a dynamic feedback and continuous learning system, collect and analyze user feedback, and adjust and optimize the audit strategy and model parameters in time. Through the closed-loop mechanism of user feedback, the system performance can be continuously improved to ensure the effectiveness of audit strategies and the accuracy of models. These suggestions aim to further improve the financial audit automation method based on artificial intelligence, improve the intelligence level and practicability of the system, and provide more powerful technical support and guarantees for the financial management of enterprises. Through such improvements, more efficient and accurate financial audits can be achieved, and the intelligent transformation of the financial audit industry can be promoted.

3.2.3 SHAP effect

In an actual case, we selected the financial data of a certain company over a period of time, including multiple features such as income, expenditure, accounts receivable turnover rate, and debt-to-asset ratio. After model training and prediction, we obtained the result that a certain transaction was judged to be abnormal. At this time, the SHAP value can clearly explain the basis for the model to make this judgment.

By calculating the SHAP value of each feature, we found that the SHAP value of the debt-to-asset ratio is high and positive, which shows that the debt-to-asset ratio plays a key positive role in the model's judgment of the transaction as abnormal. That is, the debt-to-asset ratio exceeds the normal range, which greatly increases the possibility of the transaction being judged as abnormal. Visualizing the SHAP values (for example, using a SHAP value bar chart, with the horizontal axis as the feature name and the vertical axis as the SHAP value size) can more intuitively show the degree of influence of each feature on the prediction result. Auditors can see at a glance which features have the greatest impact on the model's decision, and thus conduct an in-depth analysis of why the transaction was judged to be abnormal, greatly improving the transparency and credibility of the audit, so that audit decisions are no longer "black box" operations, but are based on clear and explainable evidence.

3.3 Discussion

From a quantitative perspective, in terms of audit accuracy, the accuracy of cutting-edge research is mostly in the range of 80% - 86%, while this study uses the random forest algorithm combined with real-time data processing technology to achieve an audit accuracy of 90%. In terms of audit efficiency, most cutting-edge research efficiencies are at a medium or low level, but this study has achieved a significant result of 30% efficiency improvement. This is mainly attributed to the fact that the random forest algorithm constructs multiple decision trees and randomly selects features for node splitting, effectively reducing the risk of overfitting and improving the model's ability to identify complex financial data; at the same time, the use of real-time data processing platforms such as Kafka and Flink has realized the real-time collection, processing and analysis of financial data, greatly accelerating the audit process.

Qualitatively, the support vector machine, gradient boosting and other methods used in cutting-edge research have limitations when facing the high dimensionality, complexity and dynamic changes of financial data. The support vector machine has high computational complexity, is sensitive to the choice of kernel functions, and is difficult to adapt to the diversity of financial data; gradient boosting is sensitive to outliers, takes a long time to train, and cannot meet real-time requirements. In contrast, this research method not only improves the accuracy and robustness of the model, but also ensures the dynamics and timeliness of the audit process, and can better adapt to the actual needs of corporate financial audits.

However, this research method still has certain limitations. In terms of data quality, despite the implementation of strict data cleaning and preprocessing steps, data noise and missing values will still affect the accuracy and stability of the model. Model interpretability and transparency are also issues that need attention. The internal decision-making process of the random forest algorithm is complex, and it is difficult for non-technical personnel to understand and explain, which to a certain extent reduces the user's trust in the audit conclusions. In addition, when facing large-scale, high-frequency data traffic, the system's real-time processing capabilities have been improved, but there is still room for improvement. The generalization ability of the model when dealing with new financial data and fraud methods also needs to be enhanced.

4 Conclusion

In this study, an AI-based financial audit automation method is discussed in depth and implemented to improve audit efficiency, accuracy and risk identification ability. Through the introduction of the random forest algorithm, combined with several key steps such as data integration, real-time processing, model training and optimization, the system has shown improvement in many aspects. By constructing and optimizing the random forest model, the audit coverage and error detection rate are improved, and risk identification and processing are realized efficiently. The data cleaning and preprocessing steps ensure the quality and consistency of the input data, providing a reliable basis for the model. The introduction of real-time processing technologies such as Kafka and Blink has accelerated data processing and met the processing needs of high-frequency data streams. The audit process design optimizes the allocation and utilization of audit resources and improves the overall audit efficiency through systematic steps.

The maintainability and transparency of the model were also emphasized in the study, and by introducing explainable AI technology, users' trust in audit results was increased. At the same time, through the establishment of a dynamic feedback mechanism, the system can constantly collect and analyze user feedback, adjust the audit strategy and model parameters in time, and achieve continuous optimization. Despite the achievements, the study also points out some challenges, such as data quality issues, model generalization capabilities, and real-time processing capabilities, which need to be further addressed in future research and applications. This study shows the great potential of artificial intelligence in financial audit automation, and also puts forward specific suggestions for improvement, which provides a valuable reference for future financial audit practice. Through continuous optimization and improvement of technology and methods, financial audit will achieve a higher level of intelligence and automation, and provide more accurate and efficient support for the financial management and risk control of enterprises. Intelligent audit methods not only improve the quality and efficiency of audit work, but also enhance the transparency and standardization of financial management, and promote the innovation and development of the financial audit industry.

Although explainable artificial intelligence technology (such as SHAP) has achieved remarkable results in improving the transparency of financial audits, its limitations cannot be ignored. In an extremely high-frequency data environment, calculating the SHAP value requires complex operations on massive data, resulting in a sharp increase in computing resource consumption; processing efficiency cannot keep up with the speed of data updates, and scalability is limited, making it difficult to meet the real-time requirements of audits. In addition, SHAP is calculated based on a decision tree model, which is susceptible to data distribution and noise. Even a small change in data may cause a significant change in the tree structure, which in turn leads to unstable calculation results of the SHAP value, which cannot accurately reflect the true contribution of features to model decisions, affecting the reliability of audit results.

Acknowledgement

This study was funded by the Science Research Project of Hebei Education Department (BJS2023041).

References

[1] Sun JH, Li LC, Qi BL. Financial statement comparability and audit pricing. Accounting Finance. 2022; 62(5): 4631-4661. https://doi.org/10.1111/acfi.12970.
[2] Condie ER, Obermire KM, Seidel TA, Wilkins MS. Prior Audit Experience and CFO Financial Reporting Aggressiveness. Audit J Pract Theory. 2021; 40(4): 99-121. https://doi.org/10.2308/AJPT-2020-012.
[3] Koh K, Tong YH, Zhu ZN. The effects of financial statement disaggregation on audit pricing. Int J Audit. 2022; 26(2): 94-112. https://doi.org/10.1111/ijau.12253.
[4] Lyshchenko O, Ocheret'ko L, Lukanovska I, Sobolieva-Tereshchenko O, Nazarenko I. The role of financial audit in ensuring the reliability of financial statements. Ad Alta J Interdiscip Res. 2024; 14(1). https://doi.org/10.33543/140139141145.
[5] Suryani E, Winarningsih S, Avianti I, Sofia P, Dewi N. Does Audit Firm Size and Audit Tenure Influence Fraudulent Financial Statements? Australas Account Bus Finance J. 2023; 17(2): 26-37.
[6] Xu Q, Fernando G, Tam K, Zhang W. Financial report readability and audit fees: a simultaneous equation approach. Managerial Aud J. 2020; 35(3): 345-372. https://doi.org/10.1108/MAJ-02-2019-2177.
[7] Erdmann A, Yazdani M, Mas Iglesias JM, Marin Palacios C. Pricing Powered by Artificial Intelligence: An Assessment Model for the Sustainable Implementation of AI Supported Price Functions. Informatica. 2024; 35(3): 529-556. https://doi.org/10.15388/24-infor559.
[8] Ijadi Maghsoodi A, Hafezalkotob A, Azizi Ari I, Ijadi Maghsoodi S, Hafezalkotob A. Selection of Waste Lubricant Oil Regenerative Technology Using Entropy-Weighted Risk-Based Fuzzy Axiomatic Design Approach. Informatica. 2018; 29(1): 41-74. https://doi.org/10.15388/Informatica.2018.157.
[9] Pragarauskaitė J, Dzemyda G. Markov Models in the analysis of frequent patterns in financial data. Informatica. 2013; 24(1): 87-102. http://dx.doi.org/10.15388/Informatica.2013.386.
[10] Lim CY, Lobo GJ, Rao PG, Yue H. Financial capacity and the demand for audit quality. Accounting Bus Res. 2022; 52(1): 1-37. https://doi.org/10.1080/00014788.2020.1824116.
[11] Ismail R, Mohd-Saleh N, Yaakob R. Audit committee effectiveness, internal audit function and financial reporting lag: Evidence from Malaysia. Asian Acad Manage J Account Finance. 2022; 18(2): 169-193. https://doi.org/10.21315/aamjaf2022.18.2.8.
[12] Oussii AA, Boulila N. Evidence on the relation between audit committee financial expertise and internal audit function effectiveness. J Econ Adm Sci. 2021; 37(4): 659-676. https://doi.org/10.1108/JEAS-04-2020-0041.
[13] Lyubenko A, Znak N, Karpachova O. Audit features of the first IFRS financial statements. Financial Credit Activity Probl Theory Pract. 2022; 1(42): 185-194.
[14] Endrawes M, Feng ZA, Lu MT, Shan YW. Audit committee characteristics and financial statement comparability. Accounting Finance. 2020; 60(3): 2361-2395. https://doi.org/10.1111/acfi.12354.
[15] Lutfi A, Alkilani SZ, Saad M, Alshirah MH, Alshirah AF, Alrawad M, et al. The influence of audit committee chair characteristics on financial reporting quality. J Risk Financial Manage. 2022; 15(12): 563. https://doi.org/10.3390/jrfm15120563.
[16] Calvin CG, Holt M. The impact of domain-specific internal audit education on financial reporting quality and external audit efficiency. Accounting Horizons. 2023; 37(2): 47-65. https://doi.org/10.2308/HORIZONS-2020-105.
[17] Alcaide-Ruiz MD, Bravo-Urquiza F. Does audit committee financial expertise actually improve information readability? Rev Contab Span Account Rev. 2022; 25(2): 257-270. https://doi.org/10.6018/rcsar.420261.
[18] Driskill MW, Knechel WR, Thomas E. Financial auditing as an economic service. Curr Issues Aud. 2022; 16(2). https://doi.org/10.2308/CIIA-2021-021.
https://doi.org/10.31449/inf.v49i16.7705 Informatica 49 (2025) 21–36 21
Graph Neural Network-Based User Preference Model for Social
Network Access Control
Yuan Zhang1,2*
1Xuchang Vocational Technical College, Xuchang 461000, China
2Henan Province Data Intelligence and Security Application Engineering Technology Research Center, Xuchang
461000, China
E-mail: hnxc_z@126.com
*Corresponding author
Keywords: social networks, user preferences, graph neural network, multi-layer attention, access control
Received: November 11, 2024
The popularity and deepening of social networks have increased the risk of personal information
leakage for users. To enhance the security of social networks, this study constructed an access control
model based on the preferences of social network users. This model utilizes graph neural networks to
generate access control strategies based on user preferences, and introduces a multi-layer attention
mechanism to optimize the graph neural network. To better capture user preference information, the
study sets the learning rate to 0.0001. The experimental results demonstrated that in the Twitter dataset,
the accuracy of the proposed model reached 95.7% and the F1 score reached 96.2%, which were
significantly higher than those of other models. These results indicated that the model could more
accurately classify access control in social networks and reduce false positives. The area under the
receiver operation characteristic curve of the proposed model was 0.982, which was higher than other
models. The decision time was 13.77 seconds, significantly lower than other models. This indicated that
the model could more effectively distinguish different types of user access requests and provide more
reliable guarantees for secure access to social networks. The user's preferred social network access
control model based on graph neural networks has superior performance, effectively ensuring the
information security of social network users and laying the foundation for further development of access
control technology.
Povzetek: Predstavljen je nov model za nadzor dostopa v družbenih omrežjih, ki temelji na grafskih
nevronskih mrežah in uporabniških preferencah. Z uporabo večslojnega pozornostnega mehanizma
model omogoča zanesljivo in varno upravljanje dostopa.
1 Introduction

In the era of rapid digital development, social networks play an important role in today's society. Through social platforms, people can not only obtain the information and exchange ideas they need but also engage in commercial activities through social networks, greatly changing their communication methods and lifestyle habits [1, 2]. However, the popularity of social networks has made the issue of user privacy protection increasingly prominent. Using social networks means that users need to expose their personal information to a certain extent. Criminals can steal user information through cyber attacks and use it for illegal activities, thereby posing potential risks to users [3]. Meanwhile, there is a large amount of false information and rumors on social networks. The rapid dissemination of this information may lead to misunderstandings among the public about certain events or issues, resulting in adverse social impacts. Access control is a critical component in information security, used to manage user access permissions to systems, networks, or applications. Access control can help organizations protect important data and resources from unauthorized access and malicious activities. It can also prevent data leakage, tampering, and destruction, protect sensitive information from being leaked to unauthorized personnel, and ensure the reliability of network systems [4]. Therefore, the importance of implementing effective access control for social networks is self-evident.

Nowadays, there are mainly attribute-based, policy-based, and relation-based Access Control Models (ACM), which are widely used in various scenarios [5]. However, traditional models still have drawbacks such as complex permission management and difficulty in adapting to dynamic network environments. Specifically, traditional models often require manual intervention in the process of assigning, revoking, and updating permissions, resulting in increased management costs and error rates. The user behavior and social relationships of social networks are constantly changing, and traditional models are difficult to adapt, resulting in insufficient flexibility of access control policies and inability to effectively respond to new security threats. In this context, this study constructs ACM based on the preferences of social users, uses Graph Neural Network
(GNN) for access control, and introduces Multi-Layer proof-based Ethereum access blockchain to accelerate
Attention (MLA) to optimize GNN. Finally, data storage. This method could significantly improve
UP-GNN-SNAC model, a GNN-based social network network security [9]. Zhang L et al. designed a
ACM catering to user preferences, is designed. The lightweight decentralized multi-authorization ACM based
innovation of the research lies in constructing an ACM on ciphertext policy attribute-based encryption and
based on user preferences. Compared with existing blockchain to enhance the security of in-vehicle social
GNN-based ACMs, this model better balances privacy networks. Distributed multi-authorization nodes
protection and user experience by capturing user supported vehicle users by performing lightweight
preferences, providing a more efficient and accurate computing with the help of vehicle cloud service
solution for secure access to social networks. providers. This model had significant advantages
compared to existing solutions [10].
2 Related works

The progress of the Internet has made it a part of people's daily life to interact with others through social networks. However, due to system vulnerabilities in online platforms, many criminals exploit these vulnerabilities to launch attacks, resulting in the leakage of user information and even its malicious exploitation. Access control, as a key technology for maintaining social network security, is currently a hot topic of research among relevant professionals. You M et al. designed a knowledge graph-based access control decision-making method to improve access control performance under different degrees of imbalance. It extracted topological features to represent high cardinality classification users and resource attributes, revealing the interrelationships between different objects. This method could significantly improve access control performance [6]. Gai K et al. designed a zero-trust cross-organizational data sharing ACM based on blockchain to enhance security in network data sharing. It utilized blockchain alliances to establish a trusted environment and deployed role-based access control through multi-signature protocols and smart contract methods, which had high practicality [7]. Wu H et al. designed a cloud network secure storage data ACM based on association rules to improve the security of social network data access control. It utilized association rule feature extraction methods for data mining and attack detection in network security storage areas and achieved data access control in network security storage areas through adaptive partition-weighted interface scheduling. This method was superior to traditional methods [8]. Azbeg K et al. designed an ACM based on improved blockchain technology to enhance the security and privacy of network systems. It stored data in the interstellar file system and utilized authorization proof-based Ethereum access blockchain to accelerate data storage. This method could significantly improve network security [9]. Zhang L et al. designed a lightweight decentralized multi-authorization ACM based on ciphertext policy attribute-based encryption and blockchain to enhance the security of in-vehicle social networks. Distributed multi-authorization nodes supported vehicle users by performing lightweight computing with the help of vehicle cloud service providers. This model had significant advantages compared to existing solutions [10].

Zhao Y et al. designed a policy-protected, cleanable ACM to improve the efficiency of data encryption in vehicle social networks. It could test and clean encrypted data, and divide access policies into attribute names and attribute values, thereby hiding information in the ciphertext and achieving good encryption performance [11]. Squicciarini A et al. designed a discrete ACM based on individual decision-making to address privacy and security issues arising from data sharing in social networks. It took into account individual preferences in social networks and selected discrete privacy values from a fixed set of options. This model had a good privacy protection effect in data sharing [12]. Dixit M S et al. designed a deep learning-based real-time user ACM for social networks to address user login restrictions. It used CNN and LSTM to predict the age of users and adopted multi-task CNN for face detection and feature extraction, thus achieving significant control over user login [13]. Wen W et al. designed an autonomous privacy control and identity verification sharing scheme built on fast response codes in social networks to solve the problem of users being unable to independently control privacy sharing. It used fast response codes with high-quality images for error correction, combining the advantages of polynomial-based and visual-based secret image sharing. This scheme had low computational complexity and scalability [14]. Safi S M et al. designed an improved end-to-end mobile social network security ACM to protect the personal privacy of social network users. It encrypted user-shared data through ciphertext policy attribute encryption, utilizing advanced encryption standards to prevent unauthorized user access. This scheme had high security and practicality [15]. The summary of related work is shown in Table 1.
Table 1: Summary of related work.
References | Model | Key features | Dataset | Indicator results | Insufficient
[6] | Access control decision method based on knowledge graph | Extracting topological features to represent user and resource attributes | Synthesized social network data | Improved access control performance | Not considering the balance between privacy protection and user experience
[7] | Blockchain zero trust cross-organizational data sharing access control | Establishing a trusted environment through a blockchain alliance | Cross-organizational transaction data | High practicality | Slow response speed
[8] | Association rules cloud network security storage access control | Using association rules for attack detection | Cloud storage logs | Superior to traditional methods | Poor adaptability to new types of attack modes and difficulty in handling dynamically changing environments
[9] | Improved blockchain technology access control | Interstellar file system and authorization-proof Ethereum | File transfer records | Significant improvement in network security | Requires a large amount of storage space
[10] | Decentralized multi-authorization model for vehicle-mounted social networks | Ciphertext policy attribute-based encryption and blockchain | Vehicle communication records | Significant advantages compared to existing solutions | Complex key management increases deployment difficulty and slows response speed
[11] | Policy-protected sanitizable access control | Testing and cleaning encrypted data | Encrypted dataset | Good encryption effect | The cleaning process may result in information loss
[12] | Individual decision discrete ACM | Personal preference privacy protection in social networks | User behavior data | Good privacy protection effect | Lack of effective modeling of group behavior and insufficient consideration of personalized preferences
[13] | Real-time user access control with deep learning | Convolutional neural network predicts age | Social network user data | Superior to traditional methods | Deep learning models require a large amount of data for training, which poses a risk of privacy leakage
[14] | Quick response code autonomous privacy control | Image correction combined with secret image sharing | User-uploaded images | Low computational complexity and good scalability | High image quality requirements and sensitivity to image noise
[15] | Mobile social network security access control | Ciphertext policy attribute encryption | Mobile device logs | High safety and practicality | The key distribution and management of ciphertext policy attribute encryption are relatively complex
In summary, many scholars have achieved significant results in social network access control. However, these methods still have slow response times and fail to consider the balance between privacy protection and user experience. Therefore, this study constructs an ACM based on user preferences and simulates it using an improved GNN with the MLA mechanism to design the UP-GNN-SNAC model and improve access control effectiveness.

3 GNN-based ACM based on user preferences

This section mainly elaborates on the construction process of the UP-GNN-SNAC model. The first subsection presents the design of the ACM based on user preferences, and the second subsection describes the implementation of an access control algorithm based on improved GNN.

3.1 ACM construction based on user preferences

User preference refers to the preferences of users towards certain things, which are formed by the comprehensive influence of various factors such as personal factors and social environment. Among them, personal factors include internal characteristics such as age, gender, occupation, interests, values, and behavioral habits of users. Social factors include external environmental factors such as social circles, interaction objects, social frequency, cultural values, and social interactions. In social networks, users express their preferences through posting and activity operations. These operations generate a large amount of data. By analyzing these data, user behavior patterns and characteristics can be understood, and appropriate access permissions can be generated for users to meet their privacy needs in different scenarios, thereby protecting user privacy [16]. Therefore, this study constructs a model based on the preferences of social network users, as shown in Figure 1.
Figure 1: Specific architecture of access control model.
In Figure 1, the ACM consists of six modules: user, data protection, access control, preference analysis, historical data, and algorithm. When users need to post or obtain information from social networks, the request first goes through the data protection module, which can encrypt and back up the information posted by users. Then, the data protection module sends the user's request to the access control module. This module sends requests to the preference analysis module, historical data module, and algorithm module respectively. The historical data module can extract and preprocess user interaction behavior data, basic attributes, and social relationship data. After cleaning, deduplication, and standardization, these data provide input for the preference analysis module and algorithm module. When the request sent by the data protection module is transmitted to the preference analysis module, it analyzes the user's historical social data, obtains the user's preferences, and returns Personal Preferences (PPs). When the data is transmitted to the algorithm module, it trains on the obtained data and finally returns the best result to the access control module. Specifically, different users have different preferences. When users upload information, different preference information corresponds to different access control policies [17]. It is necessary to determine the level of privacy of uploaded information based on user preferences, that is, to establish a quantitative model of user preferences to measure social information entropy. Figure 2 shows a social information sensitivity measurement model based on user preferences.
Figure 2: Social information sensitivity measurement model based on user preferences.
In Figure 2, after users post information, they need to calculate the sensitivity of the information based on information entropy and obtain the user's social information sharing degree based on their historical visits and social friends. It is also necessary to use methods such as information entropy weight and conditional information entropy to calculate and obtain an information entropy measurement model. Information entropy is a basic concept in information theory, which is used to measure the uncertainty of a random variable. It reflects how difficult it is to predict the outcome of an event, or how much information is needed to describe the event: the greater the information entropy, the greater the uncertainty of the event outcome, and vice versa. Information entropy weight is a weight allocation method based on information entropy, which is used to measure the importance of different features or data dimensions. Conditional information entropy is used to measure the uncertainty of a random event given certain conditions. Therefore, information entropy can be used to describe the amount of privacy contained in social data, determine the degree of privacy of the social data, and construct an information sensitivity measurement model for social data. The calculation method for the amount of social data of users is shown in equation (1).

$H(x) = -\sum_i p(x_i)\log_2 p(x_i)$ (1)

In equation (1), $H(x)$ is the average amount of private information uploaded by all users in the social network, and $p(x_i)$ is the proportion of the privacy level of information $i$ in the total privacy information. As the social breadth of a user increases with the number of social friends, the relationship between the social breadth of a user and the number of social friends can be obtained as shown in equation (2).

$w(F) = \frac{2}{\pi}\arctan F$ (2)

In equation (2), $w(F)$ represents the social breadth of the user and $F$ is the number of social friends of the user. The confidentiality of the information posted by the user can be calculated based on whether the user's uploaded information is blocked from their friends. The calculation method is shown in equation (3).

$h_i = \frac{F_a^i}{F}$ (3)

In equation (3), $h_i$ is the confidentiality level of information $i$ and $F_a^i$ is the number of friends blocked by the user. The level of confidentiality is thus the ratio of the number of friends blocked from viewing the social data to the total number of friends: as the number of friends permitted to view the social data increases, the level of confidentiality decreases. The degree of social information sharing can be used to describe the impact of the number of social friends and the number of friends blocked by the user on social data sharing. The degree to which friends are permitted to access information is directly correlated with the extent of social information sharing. The calculation method is shown in equation (4).

$s_i(F, F_a^i) = h_i \cdot w(F) = \frac{2 F_a^i \arctan F}{\pi F}$ (4)

In equation (4), $s_i(F, F_a^i)$ is the user's social information sharing degree. Information entropy can then be measured based on the degree of social information sharing among users, as shown in equation (5).

$H_s(x) = -\sum_i s_i\, p(x_i)\log_2 p(x_i)$ (5)

In equation (5), $H_s(x)$ is the sharing-weighted information entropy. According to the information entropy analysis of the user preference mechanism, in the algorithm module, user social data are divided into a training set and a testing set, and they are trained separately to obtain the final access control policy. Figure 3 displays the obtaining process of the strategy.
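For readers who want to experiment with the sensitivity measure, equations (1)–(5) can be prototyped directly. The following Python sketch is illustrative only: the 2/π normalisation in w(F), the input format, and all variable names (for example `privacy_probs` and `blocked`) are assumptions made for this example rather than details taken from the paper's implementation.

```python
import numpy as np

def information_entropy(p):
    """Equation (1): H(x) = -sum_i p(x_i) log2 p(x_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # ignore zero-probability privacy levels
    return -np.sum(p * np.log2(p))

def social_breadth(num_friends):
    """Equation (2): w(F) = (2/pi) * arctan(F); the normalisation is an assumption."""
    return 2.0 / np.pi * np.arctan(num_friends)

def confidentiality(num_blocked, num_friends):
    """Equation (3): h_i = F_a^i / F (blocked friends over total friends)."""
    return num_blocked / num_friends

def sharing_degree(num_blocked, num_friends):
    """Equation (4): s_i = h_i * w(F)."""
    return confidentiality(num_blocked, num_friends) * social_breadth(num_friends)

def weighted_entropy(privacy_probs, blocked, num_friends):
    """Equation (5): H_s(x) = -sum_i s_i p(x_i) log2 p(x_i)."""
    p = np.asarray(privacy_probs, dtype=float)
    s = np.array([sharing_degree(b, num_friends) for b in blocked])
    mask = p > 0
    return -np.sum(s[mask] * p[mask] * np.log2(p[mask]))

# Toy example: three posts, 200 friends, different numbers of blocked friends.
print(information_entropy([0.5, 0.3, 0.2]))
print(weighted_entropy([0.5, 0.3, 0.2], blocked=[10, 50, 120], num_friends=200))
```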
Figure 3: Access control policy acquisition process.
In Figure 3, after dividing the social data into two datasets, the user history records of the different datasets are first obtained, and then user feature extraction is performed to calculate user preferences. The next step is to combine the feature vectors to obtain a training model and a testing model, and then make decisions separately through the training model and the testing model. The final step is to make a comprehensive decision based on the access request, obtain the final decision, and thus obtain the access control policy.

3.2 Access control method based on improved GNN

In the ACM, the algorithm module is the core of the entire model; it processes and analyzes user preference data, historical behavior data, and social relationship data. The algorithm module not only determines whether the model can accurately analyze user preferences but also directly relates to whether the model can make effective decisions [18]. GNN is a graph-based deep learning method that can enrich node representations by utilizing the relationships between nodes. Specifically, GNN can update the representation of nodes by defining the connection relationships between nodes on the graph, utilizing their neighbor information to achieve information transfer and learning over the entire graph. GNN mainly includes three core functions: node representation, graph structure representation, and message passing. Among them, node representation maps each node to a low dimensional vector space for subsequent calculations, and graph structure representation encodes the topology of the graph in a low dimensional vector space for subsequent calculations. Message passing is defined as the process by which a node updates its own representation by exchanging information with its neighboring nodes. This process enables the transmission of information on the graph [19]. The attention mechanism is an important technique in deep learning that allows models to selectively focus on different parts of the input sequence, assigning different weights to each part of the input sequence to highlight the more critical information for the task. The attention mechanism is a process that dynamically assigns weights to the elements of the input sequence. This allows the model to focus on key parts of the input in a targeted manner. As a result, the model processes and learns information in the data more efficiently [20]. Therefore, GNN performs well on graph-structured data such as social networks and chemical molecular structures. Research is conducted on constructing a GNN model based on user preferences. To improve the performance of the model, the MLA mechanism is introduced to optimize the model, and an access control method based on improved GNN is designed to capture the complex patterns of user social relationships and personal behavior, and optimize access permission allocation. The study aims to enhance the model's understanding of the relationships between different nodes by integrating MLA mechanisms into the GNN. Each layer of the attention mechanism enables the model to focus on different node characteristics, thereby enabling the model to more finely distinguish the importance of users and their associated objects, enhance the model's learning ability, improve its resolution of user preferences, and more accurately capture the user's true intentions. Consequently, this enhances the effectiveness of access control. The model structure is shown in Figure 4.
Figure 4: Access control method based on improved GNN.
In Figure 4, nodes are constructed based on users, their social friends, and the social data posted by users. The user nodes include basic attributes, social relationship characteristics, behavior characteristics, and privacy setting characteristics. Social data nodes contain features of published content, interactive behavior, and social relationships. These features are transformed into low dimensional vectors through numerical processing and embedding learning, and embedded into the GNN as input vectors. Given the input data, the user's Social Preferences (SPs) and PPs are obtained, and the two preferences are fused. Then, the fused data is trained and the nodes are classified. The user attributes are selected to represent the user, and after embedding, the user node embedding matrix is obtained. The calculation method is shown in equation (6).

$u_a = f(W_1 [P_a, E_a])$ (6)

In equation (6), $u_a$ is the user node, $P_a$ is the node embedding matrix, $E_a$ is the node free embedding matrix, and $W_1$ is the node embedding weight. By using natural language processing tools to process and extract each piece of social data, the embedding matrix of the user's posted social data nodes is obtained after embedding, and the calculation method is shown in equation (7).

$d_i = f(W_2 [Q_i, V_i])$ (7)

In equation (7), $d_i$ is a social data node, $Q_i$ is the social data embedding matrix, $V_i$ is the free embedding matrix of data nodes, and $W_2$ is the embedding weight of social data. The embedded user nodes and social data nodes are input into the fusion layer and updated simultaneously through the MLA mechanism. The MLA mechanism calculation method is shown in equation (8).

$A(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ (8)

In equation (8), $A(Q, K, V)$ represents attention; $Q$, $K$, and $V$ represent query, key, and value, respectively; $T$ represents transpose; and $d_k$ represents the dimension of the key vector, used to scale the dot product results and prevent gradient vanishing. The method for updating user SP nodes is shown in equation (9).

$p_a^{n+1} = u_a^n + \sum_{b \in S_a} \alpha_a^n u_b^n, \quad \alpha_a^n = \mathrm{softmax}\!\left(\mathrm{Relu}(u_a^n)^T W_3\, \mathrm{Relu}(u_b^n)\right)$ (9)

In equation (9), $p_a^{n+1}$ is the updated temporary node of user SPs. $\alpha_a^n$ is the attention score, which is the aggregated weight ratio of each neighboring node during the node update. $u_a^n$ and $u_b^n$ are the $n$-th embeddings of nodes $a$ and $b$. $S_a$ represents all the explicit and implicit neighbor nodes of node $a$ in the graph. $\mathrm{softmax}()$ and $\mathrm{Relu}()$ both represent activation functions, and $W_3$ is the attention weight. The update of user PP nodes is shown in equation (10).

$q_a^{n+1} = u_a^n + \sum_{i \in C_a} \beta_i^n d_i^n, \quad \beta_i^n = \mathrm{softmax}\!\left(\mathrm{MLP}[u_a^n, d_i^n]\right)$ (10)

In equation (10), $q_a^{n+1}$ is the updated PP temporary node. $\beta_i^n$ is the weight ratio of adjacent nodes when a user node updates. $C_a$ is the set of all data nodes related to user $a$ in the graph. MLP is a multi-layer perceptron, a simple neural network used to perform nonlinear transformations on the feature vectors of nodes. It typically consists of multiple fully connected layers, each of which can be followed by a nonlinear activation function. The embeddings of user nodes and social data nodes have different meanings in each dimension. If attention scores are calculated using functions such as dot product or mean pooling, the resulting attention scores will be inaccurate. Therefore, attention neural networks are used to calculate the attention scores of each neighboring node, and the results obtained by each neural network are finally normalized. The updated user social preference temporary node and personal preference temporary node are weighted and fused to obtain the updated user node. The calculation method is shown in equation (11).

$u_a^{n+1} = \gamma^{n+1} p_a^{n+1} + \delta^{n+1} q_a^{n+1}, \quad \gamma^{n+1} + \delta^{n+1} = 1$ (11)

In equation (11), $u_a^{n+1}$ is the updated user node, and $\gamma^{n+1}$ and $\delta^{n+1}$ are the weights of the SP temporary node and the PP temporary node in the updated user node. Figure 5 shows the user preference fusion process.
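Equations (6)–(11) can be read as attention-weighted aggregation followed by a convex fusion. The sketch below is a minimal NumPy prototype under simplifying assumptions (a single attention head, a one-layer perceptron in place of the MLP, and an arbitrary fusion weight); it only illustrates the flow of the SP and PP updates, not the trained model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_social_pref(u_a, neighbours, W3):
    """Equation (9): attention-weighted aggregation over neighbour users."""
    scores = np.array([np.maximum(0, u_a) @ W3 @ np.maximum(0, u_b)
                       for u_b in neighbours])
    alpha = softmax(scores)
    return u_a + (alpha[:, None] * neighbours).sum(axis=0)

def update_personal_pref(u_a, data_nodes, mlp_w):
    """Equation (10): attention over the user's social-data nodes, scored
    here by a single-layer perceptron on the concatenated pair."""
    scores = np.array([np.concatenate([u_a, d]) @ mlp_w for d in data_nodes])
    beta = softmax(scores)
    return u_a + (beta[:, None] * data_nodes).sum(axis=0)

def fuse(p_a, q_a, gamma=0.5):
    """Equation (11): convex combination of SP and PP temporary nodes."""
    return gamma * p_a + (1.0 - gamma) * q_a

rng = np.random.default_rng(0)
dim = 16
u_a = rng.normal(size=dim)
neigh = rng.normal(size=(4, dim))      # explicit/implicit neighbour users
data = rng.normal(size=(3, dim))       # social-data nodes posted by the user
W3 = rng.normal(size=(dim, dim))
mlp_w = rng.normal(size=2 * dim)
u_new = fuse(update_social_pref(u_a, neigh, W3),
             update_personal_pref(u_a, data, mlp_w))
print(u_new.shape)                     # (16,)
```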
Figure 5: User preference fusion process.
In Figure 5, user preference fusion consists of three parts: the embedding layer, the preference propagation fusion layer, and the access control output layer. In the embedding layer, word embeddings are performed on user nodes and social data as inputs to the model. In the preference propagation fusion layer, two GNNs simulate the propagation and change patterns of user social preferences among users and the propagation and change patterns of user personal preferences in social data. After N rounds of propagation and fusion, user nodes are updated using attention mechanisms based on explicit neighbor nodes, implicit neighbor nodes, and social data nodes. In the access control output layer, the N user nodes obtained through the N rounds of preference propagation fusion are used to calculate the fusion coefficient through a linear neural network. Based on the fusion coefficient, the N user nodes are finally fused to obtain the final user nodes with user preferences. The fusion coefficient can quantify the importance or weight of the user social preference temporary nodes and user personal preference temporary nodes. Therefore, the calculation of the fusion coefficient is completed by updating the user nodes and normalizing through a nonlinear transformation and a softmax function. At the same time, the user embedding vectors after each propagation are multiplied by their corresponding fusion coefficients, and these weighted embedding vectors are added to obtain the final user node vector. The calculation method is shown in equation (12).

$C_{u_a}^n = \mathrm{Softmax}\!\left(\tanh(W_4 u_a^n + e\, d^T)\right), \quad u_a = \mathrm{sigmoid}\!\left(\sum_{n=1}^{N} C_{u_a}^n u_a^n\right)$ (12)

In equation (12), $C_{u_a}^n$ is the fusion coefficient, $W_4$ is the fusion weight of user nodes, $e$ is the natural constant, $d$ is the dimension, $\mathrm{sigmoid}()$ is the activation function, and $\tanh()$ is the hyperbolic tangent function. Finally, the loss function of the model is defined to measure the difference between the predicted results of the model and the true labels, as expressed in equation (13).

$L = -\frac{1}{M}\sum_a \left[y_a \log(pr_a) + (1 - y_a)\log(1 - pr_a)\right]$ (13)

In equation (13), $L$ is the loss and $M$ is the number of user nodes. $y_a$ is the true label of the $a$-th user node, with a value of 0 or 1: when access is allowed, it is 1, and when access is prohibited, it is 0. $pr_a$ is the probability of being judged as allowed access. During the training phase, the model continuously adjusts its parameters based on the difference between the true labels and the predicted probabilities to minimize the loss function and optimize the probability estimates for allowed access. According to the loss function, all user nodes are classified based on whether they are allowed access, thus completing access control. The implementation process of the designed ACM is shown in Figure 6.
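Equation (12) amounts to softmax-normalised fusion coefficients over the N propagation rounds, and equation (13) is a standard binary cross-entropy loss. A minimal sketch is given below; the e·d^T bias term of equation (12) is omitted for simplicity, and the embeddings and weights are random placeholders rather than values from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_rounds(round_embeddings, W4):
    """Equation (12), simplified: softmax fusion coefficients over N rounds,
    then a sigmoid over the coefficient-weighted sum of embeddings."""
    scores = np.array([np.tanh(W4 @ u).sum() for u in round_embeddings])
    coeffs = softmax(scores)
    return sigmoid((coeffs[:, None] * round_embeddings).sum(axis=0))

def bce_loss(y_true, p_pred, eps=1e-12):
    """Equation (13): mean binary cross-entropy over user nodes."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(1)
rounds = rng.normal(size=(5, 16))      # N = 5 propagation rounds, 16-dim user node
W4 = rng.normal(size=(16, 16))
final_node = fuse_rounds(rounds, W4)
print(final_node.shape)                # (16,)
print(bce_loss(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
```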
Figure 6: Implementation process of the proposed access control algorithm.
In Figure 6, the original data such as user information, social relationships, and content posted by users are first collected from social networks. Redundant information in the data is eliminated by removing duplicate items, and missing values are filled in to ensure the integrity of the data. The data are then uniformly converted through format conversion to complete the preprocessing of the collected data. The random seed is set to 42 using the random module and the numpy library in Python, thereby ensuring that the generated random number sequence is the same every time the code is run. Then, the basic attributes and social behavior of users are extracted to construct user feature vectors, content feature vectors, and social relationship features. The model is used to map user node content to the status space, outputting user node embedding matrices and content node embedding matrices. Then, by analyzing users' social behavior, personal and social preferences are obtained, and the MLA mechanism is introduced to update the representations of user nodes and content nodes, highlighting the influence of important neighbors. Users' social and personal preferences are combined to form a comprehensive user preference representation. Next, using the fused user nodes as input, iterative propagation is performed through the GNN to update the node representations. A loss function is then defined to measure the difference between the model's predicted results and the true labels, and the model parameters are adjusted to minimize the loss function. Finally, the trained model is employed to classify and predict new user nodes, determine whether to allow access to specific resources, and implement access control policies based on the results of the access control decisions. The purpose of these actions is to ensure user privacy protection in social networks.

4 Analysis of ACM results on social networks

This chapter mainly elaborates on the experimental results of the UP-GNN-SNAC model. The first subsection is a performance analysis of the ACM based on improved GNN. The second subsection is an analysis of the practical application effect of the ACM based on improved GNN.

4.1 Performance testing of social network ACM

To verify the performance of the proposed UP-GNN-SNAC model, this study conducts simulation experiments using Python 3.7 on a Windows 11 64-bit operating system equipped with an Intel Core i7-14700KF central processor, 16GB of RAM, and a 256GB hard drive. The preference propagation depth is 5, the learning rate is 0.0001, and the maximum number of iterations is 200. Accuracy is the most intuitive evaluation metric in classification models, representing the proportion of correctly classified samples to the total sample size. It measures the accuracy of the model in classifying user access permissions. The F1 value is the harmonic mean of precision and recall, used to comprehensively measure the performance of a model. It can balance the precision and recall of the model and avoid bias caused by imbalanced data. Firstly, the Twitter dataset is introduced to calculate the accuracy and F1 value of the research model, which are compared with the accuracy and F1 value of the traditional GNN and of the blockchain-based IoT ACM in reference [20]. The results are shown in Figure 7.
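The seeding step described in the preprocessing above, together with the accuracy and F1 metrics used in this subsection, can be reproduced with a few lines of Python. The snippet below is illustrative; scikit-learn's metric functions are used here for convenience, and the labels are fabricated, since the paper does not state which implementation was used.

```python
import random
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Fix the random seed so every run produces the same random sequences.
random.seed(42)
np.random.seed(42)

# Hypothetical access-control decisions: 1 = allow, 0 = deny.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```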
(a) Accuracy (b) F1
Figure 7: Accuracy and F1 value of different models.
In Figure 7 (a), as the number of iterations increases, the accuracy of the three models shows an upward trend. When iterating 200 times, the accuracy of the traditional GNN is 88.3±1.97%, the accuracy of the model in reference [20] is 91.4±2.03%, and the accuracy of the research model is 95.7±2.11%. Compared with the traditional GNN and the model in reference [20], the accuracy of the proposed ACM is improved by 7.4% and 4.3%, respectively. In Figure 7 (b), as iterations increase, the F1 values of the various models gradually increase and tend to flatten out. When the iteration reaches its maximum, the F1 values of the GNN, the model in reference [20], and the research model are 83.6±1.16%, 89.8±1.09%, and 96.2±1.22%, respectively. Compared with the traditional GNN model and the model in reference [20], the F1 value of the proposed model is increased by 12.6% and 6.4%, respectively. The accuracy and F1 value of the research model are significantly higher, proving its high classification accuracy and good effectiveness. The loss function can be used to measure the difference between the model's predicted results and the true labels. The experiment then introduces the Yelp dataset and calculates the loss of the research algorithm under the Twitter and Yelp datasets. The results compared with the other two algorithms are shown in Figure 8.
(a) Twitter dataset (b) Yelp dataset
Figure 8: Loss of different algorithms in different datasets.
In Figure 8 (a), on Twitter, as iterations increase, the losses of the different models all show a decreasing trend. The loss values of the traditional GNN, the model in reference [20], and the research model are 0.23±0.03, 0.14±0.02, and 0.07±0.03. In Figure 8 (b), the changes in the loss curves of the different models on Yelp are consistent with those on Twitter. The loss values of the three models are 0.12±0.02, 0.07±0.01, and 0.04±0.01. The loss value of the research model is much lower than the others, indicating its good generalization ability. It performs well on different datasets, indicating good scalability. To further validate the performance of the proposed model, its precision, recall, and Area Under the Curve (AUC) are calculated on both the Twitter and Reddit datasets, and it is compared with the traditional GNN, the Graph Convolutional Network, the signature scheme based on ciphertext policy attributes in reference [19], and the model in reference [20]. The AUC metric is a statistical technique that can comprehensively reflect the model's ability to distinguish between different categories. A higher AUC value indicates that the model can more accurately predict which users should be granted access permissions, thereby reducing the likelihood of erroneously denying legitimate access or erroneously approving illegal access.
Meanwhile, analysis of variance is used to evaluate the differences between models. ANOVA is a statistical method used to compare whether there is a significant difference in the means of two or more groups. It is a widely used tool in experimental design and data analysis. ANOVA compares the variability between different groups to determine whether the within-group variability is significantly smaller than the between-group variability. If the inter-group variability is significantly greater than the intra-group variability, it can be concluded that there are significant differences between the groups. The significance level is set to 0.05: if P<0.05, the difference between groups is statistically significant; otherwise, it is not. The results are shown in Table 2.

Table 2: Precision, recall, and AUC values of different models.
Dataset | Model | Precision | Recall | AUC | P
Twitter | GNN | 0.782 | 0.825 | 0.791 | <0.05
Twitter | Graph Convolutional Network | 0.825 | 0.831 | 0.796 | <0.05
Twitter | Reference [19] | 0.865 | 0.904 | 0.836 | <0.05
Twitter | Reference [20] | 0.903 | 0.932 | 0.919 | <0.05
Twitter | Designed algorithm | 0.966 | 0.943 | 0.982 | <0.05
Reddit | GNN | 0.768 | 0.813 | 0.778 | <0.05
Reddit | Graph Convolutional Network | 0.821 | 0.846 | 0.815 | <0.05
Reddit | Reference [19] | 0.871 | 0.913 | 0.857 | <0.05
Reddit | Reference [20] | 0.896 | 0.935 | 0.911 | <0.05
Reddit | Designed algorithm | 0.972 | 0.938 | 0.976 | <0.05

From Table 2, in the Twitter dataset, the precision, recall, and AUC values of the traditional GNN model are 0.782, 0.825, and 0.791, respectively. The three indicators of the Graph Convolutional Network are 0.825, 0.831, and 0.796, respectively. The three indicators of the model in reference [19] are 0.865, 0.904, and 0.836, respectively. The three indicators of the model in reference [20] are 0.903, 0.932, and 0.919, respectively. The three indicators of the proposed model are 0.966, 0.943, and 0.982, respectively. On the Reddit dataset, the precision values of the five models are 0.768, 0.821, 0.871, 0.896, and 0.972, respectively, with recall rates of 0.813, 0.846, 0.913, 0.935, and 0.938, and AUC values of 0.778, 0.815, 0.857, 0.911, and 0.976, respectively. On the different datasets, the precision, recall, and AUC values of the proposed model are significantly higher than those of the other models, and the differences between the three indicators of the five models are statistically significant (P<0.05), proving its good comprehensive performance and reliability. Finally, ablation experiments are conducted on the proposed model to calculate the accuracy, recall, F1 value, and running time of the different modules. The results are shown in Table 3.

Table 3: Results of ablation experiment.
Module | Accuracy | Recall | F1 | Running time (s)
Attention module | 0.774 | 0.819 | 0.807 | 69.58
GNN module | 0.862 | 0.842 | 0.853 | 43.12
Designed algorithm | 0.973 | 0.956 | 0.961 | 46.89

From Table 3, the accuracy, recall, F1 value, and running time of the attention module are 0.774, 0.819, 0.807, and 69.58 s, respectively. The accuracy, recall, F1 value, and running time of the GNN module are 0.862, 0.842, 0.853, and 43.12 s, respectively. The four indicators of the designed model are 0.973, 0.956, 0.961, and 46.89 s, respectively. The accuracy, recall, and F1 score of the designed model are higher than those of the two sub-modules, and its running time is lower than that of the attention module but slightly higher than that of the GNN module. Despite the augmented computational complexity of the model, it has been demonstrated to enhance prediction accuracy. In practical application scenarios, the additional temporal expenditure is deemed justifiable.
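The AUC and one-way ANOVA comparisons reported above follow standard procedures; the sketch below shows the typical calls with scikit-learn and SciPy. The scores and per-run accuracies are fabricated placeholders used only to demonstrate the calculation, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy import stats

rng = np.random.default_rng(42)

# AUC for one model: true labels versus predicted allow-probabilities.
y_true = rng.integers(0, 2, size=200)
scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, size=200), 0, 1)
print("AUC:", roc_auc_score(y_true, scores))

# One-way ANOVA across three models' per-run accuracies (significance level 0.05).
acc_gnn   = rng.normal(0.78, 0.02, size=10)
acc_ref20 = rng.normal(0.90, 0.02, size=10)
acc_ours  = rng.normal(0.97, 0.01, size=10)
f_stat, p_value = stats.f_oneway(acc_gnn, acc_ref20, acc_ours)
print("F =", f_stat, "p =", p_value, "significant:", p_value < 0.05)
```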
4.2 Analysis of the practical application effect of ACM in social networks

To verify the practical application effect of the ACM based on improved GNN, this study first calculates the space overhead and computation time of the research model during encryption and decryption. It is compared with the results of the traditional GNN and the model in reference [20]. Space overhead refers to the storage space
occupied during the storage and operation of the model, as shown in Figure 9.
(a) Computational cost (b) Computation time
Figure 9: The computational cost and time of different models.
In Figure 9, as the social data volume increases, the space overhead and computation time of the different models gradually increase. When the social data scale is 30, the space overhead of the traditional GNN, the model in reference [20], and the research model is 43.7±3.13 Kb, 30.4±3.05 Kb, and 16.2±2.88 Kb, respectively, with computation times of 32.1±2.79 s, 15.9±2.92 s, and 5.3±0.97 s. The space overhead and computation time of the research model are much lower than those of the other models, which proves its high computational efficiency and low computational complexity. This study then validates the access control effectiveness of the research model from seven aspects: User Preference Quantification (UPQ), Historical Records (HR), Privacy Metrics (PM), Sensitivity, User Attributes (UA), Trust, and Personalization. If the effect matches, the output is 1; otherwise, the output is 0. Table 4 compares the model with the traditional GNN and the models in references [19] and [20]. Among them, UPQ can meet user needs, HRs are used to evaluate the consistency of user behavior, PMs and sensitivity can ensure data security and compliance, UAs can provide a basic access control basis, trust can evaluate the reliability of user behavior, and personalization can improve user experience. The results are shown in Table 4.

Table 4: Access control effectiveness of different models.
Index | GNN | Reference [19] | Reference [20] | Research algorithm
UPQ | 0 | 0 | 0 | 1
HR | 1 | 1 | 1 | 1
PM | 0 | 1 | 1 | 1
Sensitivity | 1 | 1 | 0 | 1
UA | 1 | 1 | 1 | 1
Trust level | 1 | 1 | 0 | 1
Personalization | 1 | 0 | 1 | 1

In Table 4, only the research model is consistent in terms of UPQ. In terms of HR and UA, all four models are consistent. In terms of PM, the traditional GNN does not comply. The model in reference [20] does not match in terms of sensitivity and trust level. This may be because that model has not dynamically evaluated user behavior, authentication, or contextual information, resulting in an inability to accurately measure trust levels. In terms of personalization, only the model in reference [19] does not match. The research model is consistent in all seven aspects, proving that its access control effect is relatively ideal. Finally, the Receiver Operating Characteristic (ROC) curve is introduced. The horizontal axis of the ROC curve represents the false positive rate, which is the proportion of all negative samples that were incorrectly predicted as positive. The vertical axis represents the true positive rate, which is the proportion of all actual positive samples correctly predicted as positive. The model should correctly identify requests that are actually positive samples as legitimate access and requests that are actually negative samples as illegal access. The ROC curves of the four models are calculated separately, and the results are shown in Figure 10.
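The ROC comparison in Figure 10 is a standard false positive rate versus true positive rate plot. The following sketch shows how such curves could be produced and compared for several models; the model names are taken from the text, but the score distributions are synthetic placeholders rather than the paper's outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, size=500)

# Placeholder scores: a larger separation mimics a stronger classifier.
models = {"GNN": 0.8, "Reference [19]": 1.2,
          "Reference [20]": 1.8, "Designed algorithm": 2.6}
for name, sep in models.items():
    score = y_true * sep + rng.normal(size=500)   # synthetic allow scores
    fpr, tpr, _ = roc_curve(y_true, score)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], "k--", label="chance")   # the diagonal reference line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```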
Figure 10: ROC curves and correlation coefficients R of four models.
From Figure 10, the ACM based on the traditional GNN is closest to the diagonal reference line and has the smallest area under the curve. The area under the curve of the model in reference [19] is the second smallest, followed by that of the model in reference [20]. The ROC curve of the proposed model is closest to the upper left corner, with the largest area under it, indicating its strong classification ability and proving the high accuracy of the user preference-based ACM built on the improved GNN.

5 Discussion

The research aims to improve the effectiveness of social network access control by utilizing the MLA mechanism to enhance GNN's understanding of complex social relationships and personal behavior. A GNN-based social network ACM based on user preferences is proposed. The results showed that the accuracy and F1 value of the proposed ACM improved by 7.4% and 12.6% respectively compared to the GNN model, and it also outperformed the blockchain-based IoT ACM, demonstrating its high classification accuracy. This is similar to the conclusion drawn by You M et al. [6], while the proposed model is superior. This is because the proposed model optimizes the GNN through the MLA mechanism, which can more effectively capture complex patterns of user preferences and social relationships, thereby significantly improving performance. The computation time for encryption and decryption of the proposed model was 5.3 seconds, which was much lower than that of the GNN model and the blockchain-based IoT ACM. This conclusion is consistent with the findings of Gai K et al. [7], but the running efficiency of the proposed model is higher than that of the method proposed by Gai K et al. This is because the proposed model significantly improves computation time through MLA and information entropy. In summary, the proposed model performs well in multiple aspects. Although the proposed model can more accurately identify legitimate and illegitimate access through user preferences and privacy measurement mechanisms, effectively improving network security, it also incurs certain additional computational overhead. Therefore, in practical applications, tuning needs to be carried out according to specific requirements.

6 Conclusion

ACM is crucial for the security of social networks, as it can help protect sensitive data and prevent malicious attacks and violations. To improve the accuracy and operational efficiency of social network ACM, a new type of ACM was designed based on the preferences of social network users. The user preferences were simulated using GNN, and the MLA mechanism was introduced to improve the model. The experimental results showed that the accuracy and F1 value of the proposed model were 95.7% and 96.2%, respectively, significantly higher than those of other models. This proved that, through the GNN and MLA mechanism, the model could dynamically capture user preference features and improve classification accuracy. The space cost of the proposed model was 16.2 Kb and the computation time was 5.3 s, which were significantly lower than the space cost and computation time of the other models. This proved that the model adopted a lightweight GNN architecture, reducing computational complexity and optimizing algorithm design to reduce space cost. Although the proposed ACM has superior performance, there are still some shortcomings. The study did not test it on different types of social platforms, and future research will further test the performance of the model on different social network platforms to improve its universality. At the same time, the performance of the model in dynamic environments will be explored to cope with the constantly changing user behavior and data traffic in social networks.
Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Data availability statement

All data generated or analysed during this study are included in this article.

References

[1] Gai T, Cao M, Chiclana F, Zhang Z, Dong Y, Herrera-Viedma E, Wu J. (2023). Consensus-trust driven bidirectional feedback mechanism for improving consensus in social network large-group decision making. Group Decision and Negotiation, 32(1): 45-74. https://doi.org/10.1007/s10726-022-09798-7
[2] Kashmar N, Adda M, Ibrahim H. (2022). Access control metamodels: review, critical analysis, and research issues. Journal of Ubiquitous Systems and Pervasive Networks, 16(2): 93-102. https://doi.org/10.5383/JUSPN.03.01.000
[3] Wang W, Huang H, Yin Z, Gadekallu T R, Alazab M, Su C. (2023). Smart contract token-based privacy-preserving access control system for industrial Internet of Things. Digital Communications and Networks, 9(2): 337-346. https://doi.org/10.1016/j.dcan.2022.10.005
[4] Thabit S, Yan L S, Tao Y, Abdullah A B. (2022). Trust management and data protection for online social networks. IET Communications, 16(12): 1355-1368. https://doi.org/10.1049/cmu2.12401
[5] Ameer S, Benson J, Sandhu R. (2022). Hybrid approaches (ABAC and RBAC) toward secure access control in smart home IoT. IEEE Transactions on Dependable and Secure Computing, 20(5): 4032-4051. https://doi.org/10.1109/TDSC.2022.3216297
[6] You M, Yin J, Wang H, Cao J, Wang K, Miao Y, Bertino E. (2023). A knowledge graph empowered online learning framework for access control decision-making. World Wide Web, 26(2): 827-848. https://doi.org/10.1007/s11280-022-01076-5
[7] Gai K, She Y, Zhu L, Choo K K R, Wan Z. (2023). A blockchain-based access control scheme for zero trust cross-organizational data sharing. ACM Transactions on Internet Technology, 23(3): 1-25. https://doi.org/10.1145/3511899
[8] Wu H, Ye W, Guo Y. (2023). Data access control method of cloud network secure storage under Social Internet of Things environment. International Journal of System Assurance Engineering and Management, 14(4): 1379-1386. https://doi.org/10.1007/s13198-023-01942-z
[9] Azbeg K, Ouchetto O, Andaloussi S J. (2022). Access control and privacy-preserving blockchain-based system for diseases management. IEEE Transactions on Computational Social Systems, 10(4): 1515-1527. https://doi.org/10.1109/TCSS.2022.3186945
[10] Zhang L, Zhang Y, Wu Q, Mu Y, Rezaeibagha F. (2022). A secure and efficient decentralized access control scheme based on blockchain for vehicular social networks. IEEE Internet of Things Journal, 9(18): 17938-17952. https://doi.org/10.1109/JIOT.2022.3161047
[11] Zhao Y, Yu H, Liang Y, Conti M, Bazzi W, Ren Y. (2023). A sanitizable access control with policy-protection for vehicular social networks. IEEE Transactions on Intelligent Transportation Systems, 25(3): 2956-2965. https://doi.org/10.1109/TITS.2023.3285623
[12] Squicciarini A, Rajtmajer S, Gao Y, Semonsen J, Belmonte A, Agarwal P. (2022). An extended ultimatum game for multi-party access control in social networks. ACM Transactions on the Web (TWEB), 16(3): 1-23. https://doi.org/10.1145/3555351
[13] Dixit M S, Wajgi M D, Wanjari S. (2022). Real time user access control on social network using deep learning. International Journal for Research Publication and Seminar, 13(2): 246-251. https://jrps.shodhsagar.com/index.php/j/article/view/598
[14] Wen W, Fan J, Zhang Y, Fang Y. (2022). APCAS: Autonomous privacy control and authentication sharing in social networks. IEEE Transactions on Computational Social Systems, 10(6): 3169-3180. https://doi.org/10.1109/TCSS.2022.3218883
[15] Safi S M, Movaghar A, Ghorbani M. (2022). Privacy protection scheme for mobile social network. Journal of King Saud University-Computer and Information Sciences, 34(7): 4062-4074. https://doi.org/10.1016/j.jksuci.2022.05.011
[16] Ahmed F, Wei L, Niu Y, Zhao T, Zhang W, Zhang D, Dong W. (2022). Toward fine-grained access control and privacy protection for video sharing in media convergence environment. International Journal of Intelligent Systems, 37(5): 3025-3049. https://doi.org/10.1002/int.22810
[17] Salem R B, Aimeur E, Hage H. (2023). A multi-party agent for privacy preference elicitation. Artificial Intelligence and Applications, 1(2): 98-105. https://doi.org/10.47852/bonviewAIA2202514
[18] Mayeke N R, Arigbabu A T, Olaniyi O O, Okunleye O J, Adigwe C S. (2024). Evolving access control paradigms: A comprehensive multi-dimensional analysis of security risks and system assurance in cyber engineering. Asian Journal of Research in Computer Science, 17(5): 108-124. https://doi.org/10.2139/ssrn.4752902
[19] Patil R Y. (2024). A secure privacy preserving and access control scheme for medical internet of things (MIoT) using attribute-based signcryption. International Journal of Information Technology, 16(1): 181-191. https://doi.org/10.1007/s41870-023-01569-0
[20] Zhonghua C, Goyal S B, Rajawat A S. (2024). Smart contracts attribute-based access control model for security & privacy of IoT system using blockchain and edge computing. The Journal of Supercomputing, 80(2): 1396-1425. https://doi.org/10.1007/s11227-023-05517-4
https://doi.org/10.31449/inf.v49i16.7787 Informatica 49 (2025) 37–52 37
Fusion of Deep Convolutional Neural Networks and Brain Visual
Cognition for Enhanced Image Classification
Xintao Li1, *, Hongyan Guo2
1College of Innovation and Entrepreneurship, Henan Open University, Zhengzhou 450046, China
2School of Information Engineering and Artificial Intelligence, Zhengzhou Vocational University of Information and
Technology, Zhengzhou 450046, China
*Email of Corresponding Author: lxt5168@163.com
Keywords: deep convolutional neural network, brain, visual cognition, intelligent computing model, image
classification
Received: December 9, 2024
The brain visual system is one of the core centers for human perception of external information. How to
establish the brain visual cognitive system to classify and process image information is a key matter in
the area of human-computer connection. In order to improve the accuracy of computer vision image
classification, a fusion intelligent computing model based on deep convolutional neural network and brain
visual cognition is proposed. This model simulates the visual processing mechanism of the human brain
and uses brain computer interface technology to extract electroencephalogram signals, thereby achieving
efficient classification and processing of image information. When designing an image classification
model based on DCNN, a long short-term memory network structure is introduced to extract time series
features of electroencephalogram signals. In order to enhance the classification accuracy of the model,
attention mechanism and occlusion independent neural response methods are also applied to improve the
accuracy of capturing the correlation information between brain response and image features. The results
show that the prediction accuracy of the research model reaches 93.54% and 94.03% in the V4 visual
region and L0 visual region, respectively. The highest accuracy on facial visual images reaches 95.46%,
while the lowest accuracy on animal visual images is 91.57%. By introducing the long short-term memory
module, the loss value of the model decreases from 0.26 to 0.21, with a reduction of 19.23%. In addition,
ablation experiments show that by introducing attention mechanisms and occlusion independent neural
responses, the final classification accuracy is improved to 93.94%. In summary, the research on the fusion
intelligent computing model grounded on deep convolutional neural networks and brain visual cognition
effectively improves the accuracy of image classification and demonstrated its potential in the field of
intelligent computing.
Povzetek: Predstavljen je inteligentni model za razvrščanje slik, ki združuje globoke konvolucijske
nevronske mreže (DCNN) in možgansko vizualno kognicijo preko EEG signalov.
1 Introduction

With the rapid prosperity of artificial intelligence, human-computer interaction has turned into a trend in the current research field. Brain computer interface (BCI), as a cutting-edge scientific research direction, is gradually becoming a meaningful bridge in the area of human-computer connection. The visual system of the human brain has evolved over millions of years and possesses extremely efficient visual processing capabilities. Through multi-level visual processing mechanisms, the brain can quickly and accurately understand complex visual information [1]. When external objects are transmitted to the visual center of the brain through the visual organs, the brain quickly recognizes, classifies, and understands this visual information, thereby forming cognition of the object or scene [2]. BCI can interpret visual cognition of the brain by recording and analyzing electroencephalogram (EEG) signals [3]. The Deep Convolutional Neural Network (DCNN) in computer vision technology has attracted much attention due to its outstanding performance in image processing tasks [4]. However, despite the excellent performance of computer technology in image classification, computers still cannot fully replace the precise image recognition and classification capabilities of the human brain in complex and diverse open environments with interference and occlusion [5]. So, the challenge currently facing the field of computer vision is figuring out how to empower artificial intelligence systems to more effectively mimic human brain cognition and attain precise image classification in intricate scenarios. Therefore, in this context, the research innovatively combines the powerful computing power of DCNN with the cognitive characteristics of the brain's visual system, and constructs an intelligent computing model based on the fusion of DCNN and brain visual cognitive information, in order to achieve accurate image classification in complex backgrounds.

The research objectives include designing and implementing an intelligent computing model based on DCNN and EEG signal fusion to improve the performance
of image classification in interference and occlusion generative adversarial networks and variational
environments. The research aims to explore how the autoencoders to produce composite EEG cues. The results
model simulates the visual recognition process of the showed that the method was effective [8]. Kumari et al.
human brain, especially for accurate image classification proposed a multi-channel EEG movement sorting model
in complex backgrounds. The research hypothesis is that to improve the precision of EEG movement sorting. The
by combining the visual feedback and image features of model utilized CNN to extract descriptive emotional state
the brain, intelligent computing models can simulate the characteristics from EEG signals and generates two-
visual recognition process of the human brain, thereby dimensional images to represent these features. The
improving the accuracy of classification results. The outcomes revealed that the overall precision of this model
preset results demonstrate that by introducing visual reached 83.04% [9].
cognitive information from the brain, the model can mimic DCNN occupies a momentous position in EEG
the cognitive process of the human brain in actual visual picture sorting tasks. Santamaria-Vazquez et al. raised a
tasks, providing new ideas and directions for the sorting model grounded on different control signals to
integration of BCIs and intelligent systems. extract complex features from EEG data for classification.
The research content mainly includes four sections. The model used DCNN for time calibration of BCIs and
The second section provides a survey of the current study integrated modules for detection of event-related
status of visual EEG picture classification and DCNN potentials. The outcomes revealed that the command
around the world. The third section conducts research on decoding accuracy of this way improved by 16.0% [10].
intelligent computing models that integrate DCNN and Yıldırım et al. raised a novel deep one-dimensional CNN
brain visual cognition. The first section proposes the monitoring model to optimize the precision of EEG
design of a picture sorting model grounded on the fusion monitoring. The model utilized machine learning
of DCNN and brain visual cognition information. The techniques to automatically identify regular and aberrant
second section designs an intelligent computing model EEG signals, and classified EEG signals using an end-to-
based on the fusion of DCNN and brain visual cognitive end structure. The outcomes revealed that this way was
information. The fourth section validates the intelligent feasible [11]. Miao et al. raised a multi-layer CNN model
computing model that integrates DCNN with brain visual using a DCNN structure to raise the classification
cognition. precision of EEG pattern identification algorithms. The
model utilized prior knowledge and complex parameter
2 Related works

The visual cognitive ability of the brain enables it to recognize, classify, and understand visual information. In recent years, research on visual interpretation based on monitoring the neural response of the brain during visual cognition has attracted the attention of numerous professionals and scholars. Gao et al. proposed an attention-based parallel multi-scale Convolutional Neural Network (CNN) model to improve the accuracy of decoding visual evoked potentials from EEG. The model used two parallel convolutional layers to extract temporal features and utilized attention mechanisms to weight features at different times. The outcomes revealed that the model effectively improved the decoding of visual evoked potentials under complex conditions [6]. Ahirwal et al. proposed a new channel selection technique that could identify and characterize harmful emotions, aiming to raise the precision of emotion classification from EEG signals. This technique extracted three forms of characteristics from EEG signals, namely time-domain, frequency-domain, and entropy-based characteristics, and used Support Vector Machines (SVM) and artificial neural networks to classify emotions based on the extracted features. The outcomes showed that this method effectively improved classification performance [7]. Komolovaitė et al. proposed a method that combined CNN with steady-state visual evoked potentials to obtain interpretable characteristics from raw EEG signals, in order to improve the effectiveness of brain activity data in classifying visual stimuli. This method also introduced generative adversarial networks and variational autoencoders to produce synthetic EEG signals. The results showed that the method was effective [8]. Kumari et al. proposed a multi-channel EEG emotion classification model to improve the precision of EEG emotion classification. The model utilized CNN to extract descriptive emotional state characteristics from EEG signals and generated two-dimensional images to represent these features. The outcomes revealed that the overall precision of this model reached 83.04% [9].

DCNN occupies an important position in EEG image classification tasks. Santamaria-Vazquez et al. proposed a classification model grounded on different control signals to extract complex features from EEG data for classification. The model used DCNN for time calibration of BCIs and integrated modules for the detection of event-related potentials. The outcomes revealed that the command decoding accuracy of this method improved by 16.0% [10]. Yıldırım et al. proposed a novel deep one-dimensional CNN monitoring model to optimize the precision of EEG monitoring. The model utilized machine learning techniques to automatically identify normal and abnormal EEG signals, and classified EEG signals using an end-to-end structure. The outcomes revealed that this method was feasible [11]. Miao et al. proposed a multi-layer CNN model using a DCNN structure to raise the classification precision of EEG pattern identification algorithms. The model utilized prior knowledge and complex parameter adjustments to extract spatial-frequency features. The outcomes showed that this method had good classification capability [12]. Li et al. proposed a method combining DCNN with the continuous wavelet transform to enhance the identification rate of limb motor imagery EEG signals. This method mapped the limb motor imagery EEG signals to time-frequency image signals using the continuous wavelet transform, and input the image signals into the CNN structure to extract characteristics and classify them. The outcomes revealed that this method effectively raised the recognition rate [13]. In recent years, the combination of BCI and DCNN has become an important research direction in the analysis of EEG and brain visual neural activity signals. The detailed progress of BCI is as follows. Tang et al. proposed an end-to-end BCI method based on CNNs, which directly extracts spatiotemporal features from EEG signals and classifies them. The results showed that this method could achieve higher classification accuracy than traditional manual feature extraction methods, especially in motor imagery tasks and various emotional state classification tasks [14]. In addition, Kawala-Sterniuk et al. reviewed over 50 years of BCI use and concluded that BCI not only enables brain control, but also opens the door to regulating the central nervous system through neural interfaces, demonstrating the potential applications of this technology [15]. Research on integrating BCI and DCNN will provide a more solid foundation for the popularization and application of BCI technology. The comparative summary is shown in Table 1.
Table 1: Comparison summary table

| Study | Method | Advantages | Limitations | Missing features |
|---|---|---|---|---|
| Gao et al. [6] | Parallel multi-scale CNN based on attention | Improved the decoding performance of visual evoked potentials | Still affected by noise in complex environments; requires processing a large amount of temporal features | Failed to effectively combine spatial and temporal features in the brain's visual cognitive process; cannot adapt to complex environmental visual information processing |
| Ahirwal et al. [7] | EEG-based emotion classification model combining SVM and artificial neural networks | Improved emotion classification accuracy | Focuses mainly on emotion classification; lacks deep classification and processing of visual information | Cannot process complex visual information and its complex relationship with emotions |
| Komolovaitė et al. [8] | Steady-state visual evoked potentials combined with CNNs | Effectively improved visual stimulus classification | Poor robustness to signal noise; high complexity in training generative adversarial networks | Failed to effectively combine visual cognitive mechanisms; limited to static visual stimulus processing |
| Kumari et al. [9] | Multi-channel EEG-based emotion classification model | Achieved an average accuracy of 83.04% | Focuses on emotion classification, mainly uses image feature representations, lacks handling of more complex scenarios | Cannot handle complex image classification tasks, especially multi-class image recognition |
| Santamaria-Vazquez et al. [10] | Classification model based on different control signals using DCNNs | Increased command decoding accuracy by 16.0% | Relies heavily on event-related potential detection; may face difficulties in decoding complex EEG data | Lacks adaptability to dynamic EEG signals; unable to combine spatial and temporal features |
| Yıldırım et al. [11] | EEG monitoring model based on deep one-dimensional CNNs | Provides a feasible classification method | Focuses on normal vs. abnormal EEG signal classification; lacks the ability to handle complex visual tasks | Cannot effectively process multi-class or dynamically changing visual information |
| Miao et al. [12] | EEG pattern recognition based on multi-layer DCNNs | Shows good classification performance | Mainly focuses on spatial-frequency feature extraction; may be limited in handling complex dynamic tasks | Lacks comprehensive capture of dynamic EEG data or multi-dimensional features of visual information |
| Li et al. [13] | Classification of left/right hand motor imagery EEG signals combined with continuous wavelet transform and DCNNs | Significantly improved recognition rate | Relies on signal preprocessing; suitable for specific tasks | Cannot process EEG signals related to visual tasks; sensitive to environmental noise |
In summary, although existing methods have made some progress in EEG classification tasks, they have certain limitations in handling complex dynamic tasks, enhancing robustness, and adapting to multiple tasks. The research combines the visual cognitive mechanism of the brain with DCNNs and Long Short-Term Memory (LSTM) networks to design a fusion intelligent computing model. This model can more comprehensively capture the spatial and temporal features of EEG signals, addressing the poor robustness and limited adaptability to complex environments of existing methods. It has higher classification accuracy and wide application prospects.

3 Intelligent computing model integrating DCNN and brain visual cognition

The research receives EEG information through a BCI, combines voxel encoding and an improved DCNN model to achieve image classification, and uses LSTM to extract the temporal characteristics of EEG signals. An attention mechanism is utilized to raise the accuracy of image feature extraction, and the correlation between brain responses and image features is enhanced by masking irrelevant neural responses.
3.1 Design of image classification model based on DCNN and brain visual cognitive information

Neuroscience research has found that the human brain achieves complex cognitive processing through parallel information exchange between the dorsal and ventral streams during visual activities [16]. The ventral stream is a pathway that connects the primary sensory cortex with the temporal and prefrontal regions, and it is primarily responsible for recognizing visual and auditory stimuli and mapping basic information to higher-level semantic concepts [17]. The dorsal stream is responsible for spatial information and motion control. The activity of brain neurons triggered by visual stimuli is recorded as EEG signals, and BCIs can record and measure these signals through biometric technology to reflect the brain's response to behavior. The core areas of the ventral stream include the primary visual cortex, the ventral intermediate cortex, and the ventral inferior temporal cortex, among other regions. The ventral inferior temporal cortex is particularly closely related to complex visual recognition and is the main functional area for object and face recognition. When the brain receives visual stimuli, the cortical regions in the ventral stream are stimulated, transforming simple visual features into higher-level cognitive concepts. For instance, visual information is initially processed by the primary visual cortex and then passed through intermediate areas, ultimately being mapped to the inferior temporal cortex within the ventral stream, where intricate functions such as object recognition and color discrimination take place. The dorsal stream is mainly responsible for processing spatial information, motion perception, and action control; it helps the brain perform functions such as object localization, motion tracking, and hand-eye coordination through connections with the parietal lobe, motor cortex, and other areas. Therefore, given the core role of the ventral stream in image classification tasks, the research focuses on analyzing the brain signal response of the ventral stream to better understand the process of visual feature extraction and semantic comprehension. The encoding framework for the ventral response based on brain visual cognition is shown in Figure 1.
Figure 1: A coding framework for ventral response based on brain visual cognition (the stimulus image passes through a feature extraction model and a linear layer that predicts brain activity in visual areas V1, V2, V4, and L0)
As shown in Figure 1, in the ventral response encoding framework based on brain visual cognition, the brain activity caused by visual stimuli can be obtained through the BCI, and the stimulus image can be input into the feature extraction model. After nonlinear calculation, the feature space of the image is obtained. Then, these features are used to predict the voxel space of the visual region through linear layers. The voxel encoding model transforms human-readable data into a format that machines can store, facilitating either shared encoding across various visual regions or unique encoding for specific visual areas. This process aids in pinpointing the regions within the brain's visual cortex that are responsible for processing visual information [18]. Voxel encoding converts brain activity into a feature space, enabling precise association between cognitive responses and visual stimuli. This mapping helps to reveal the roles of different brain regions in visual information processing, thereby enhancing the accuracy of image classification tasks. EEG signals capture the electrical activity of the cerebral cortex, which can be mapped to specific regions of the brain through modeling techniques such as source localization in order to infer activity responses in different areas. This type of method can correlate the spatiotemporal patterns of EEG signals with voxels in functional neuroimaging data. There may be some common neural response patterns between multiple visual regions. These shared response patterns can be captured in voxel encoding models, revealing how these regions collectively respond to the same visual stimuli. For example, in image classification tasks, certain visual regions may exhibit similar neural activity responses to the same visual features, so voxel encoding can reflect the similarity and interactivity between these regions as a shared encoding pattern. By combining the results of brain visual cognition and image classification, complementary information exchange and expression can be achieved, thereby obtaining a more comprehensive joint representation. DCNN can automatically learn image feature representations by combining convolutional and pooling layers, which helps extract abstract features from data. Therefore, the study adopts DCNN to extract image features and designs an image classification model grounded on the fusion of DCNN and brain visual cognitive information, as shown in Figure 2.
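The voxel encoding step of Figure 1 amounts to a learned linear readout from image features to voxel responses. The following is a minimal PyTorch sketch of that idea; the layer sizes and names (`feature_dim`, `n_voxels`) are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VoxelEncoder(nn.Module):
    """Predict brain (voxel) responses from image features via a linear layer."""
    def __init__(self, feature_dim: int = 512, n_voxels: int = 256):
        super().__init__()
        self.readout = nn.Linear(feature_dim, n_voxels)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, feature_dim) -> predicted responses: (batch, n_voxels)
        return self.readout(image_features)

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = VoxelEncoder()
    features = torch.randn(16, 512)   # hypothetical image features
    measured = torch.randn(16, 256)   # hypothetical measured voxel responses
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(5):                # a few illustrative training steps
        optimizer.zero_grad()
        loss = loss_fn(encoder(features), measured)
        loss.backward()
        optimizer.step()
```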
Figure 2: Image classification model grounded on information fusion (EEG signal acquisition produces brain response features and the DCNN produces image features; reliability prediction weights the two feature sets, which are spliced together and classified by an SVM)
As represented in Figure 2, the image classification model based on information fusion mainly includes three parts: a feature extraction structure, a feature reliability prediction structure, and a brain-computer information fusion classification structure. The brain response data utilizes the ventral response encoding framework to extract semantic features, while image features are extracted through the DCNN structure. Next, the extracted features are input into the feature reliability prediction structure for reliability calculation, and the fusion weights of the image features and brain response features are adjusted automatically. Finally, the fused features are input into the SVM for classification. After acquisition, EEG signals undergo denoising and other preprocessing operations, such as bandpass filtering, independent component analysis, and signal normalization, to ensure signal quality. Subsequently, the EEG signals are synchronized with the presentation time of the visual stimuli to ensure accurate matching between brain responses and image features at each moment. The loss function for reliability prediction is shown in equation (1).

L_{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( d_p - d_{f_b} \right)^2    (1)

In equation (1), L_{MSE} is the loss function of reliability prediction, d_p represents the predicted feature reliability, d_{f_b} represents the classification sensitivity index of the brain response features, and N represents the batch size. The fusion weight of the image features is shown in equation (2).

w_v = d_{f_v} / (d_{f_v} + d_{f_b})    (2)

In equation (2), w_v represents the fusion weight of the image features, and d_{f_v} represents the classification sensitivity index of the image features. The fusion weight of the brain response features is shown in equation (3).

w_b = d_{f_b} / (d_{f_b} + d_{f_v})    (3)

In equation (3), w_b represents the fusion weight of the brain response features. The mathematical description of the fused feature is given in equation (4).

f_F = (w_b f(b)) \, \mathrm{concat} \, (w_v f(v))    (4)

In equation (4), f_F represents the fused features, f(b) represents the brain response features, and f(v) represents the image features. Because EEG signals are collected over a continuous period of time and have time-series characteristics, there is a continuity relationship between the signals at each moment and those before and after [19]. However, although existing feature extraction models perform well in many application scenarios, they often do not fully consider the temporal dependencies in time series data. Traditional models such as CNNs excel at extracting features from images and static data, but they can fall short at capturing temporal information and dynamic signal changes in time-series data such as EEG signals. In response, the study uses an LSTM structure to extract time series features of EEG signals. The architecture for extracting brain response features based on time series is shown in Figure 3.
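To make the weighting scheme of equations (1)-(4) concrete, the sketch below computes fusion weights from the two classification-sensitivity indices and concatenates the weighted feature vectors, with the reliability predictor trained under the mean-squared-error loss of equation (1). Tensor sizes, the sensitivity values, and module names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def fusion_weights(d_fv: torch.Tensor, d_fb: torch.Tensor):
    """Equations (2)-(3): weights from the classification sensitivity indices."""
    w_v = d_fv / (d_fv + d_fb)
    w_b = d_fb / (d_fv + d_fb)
    return w_v, w_b

def fuse(f_v: torch.Tensor, f_b: torch.Tensor, w_v, w_b) -> torch.Tensor:
    """Equation (4): concatenate the weighted image and brain-response features."""
    return torch.cat([w_b * f_b, w_v * f_v], dim=-1)

# Reliability prediction trained with the MSE loss of equation (1).
reliability_net = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
mse = nn.MSELoss()

if __name__ == "__main__":
    torch.manual_seed(0)
    f_v = torch.randn(16, 128)        # hypothetical image features
    f_b = torch.randn(16, 128)        # hypothetical brain-response features
    d_fb_true = torch.rand(16, 1)     # hypothetical sensitivity index of brain features
    d_p = reliability_net(torch.cat([f_v, f_b], dim=-1))   # predicted reliability
    loss = mse(d_p, d_fb_true)        # equation (1)
    w_v, w_b = fusion_weights(d_fv=torch.tensor(0.6), d_fb=torch.tensor(0.4))
    fused = fuse(f_v, f_b, w_v, w_b)  # fused features passed on to the SVM classifier
    print(fused.shape, loss.item())
```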
Figure 3: Architecture for extracting brain response features based on time series (T time stamps and N class tokens are filtered and passed through a temporal Transformer module, yielding T + N global features of the EEG signal)
As shown in Figure 3, the brain response feature extraction architecture based on time series uses a Transformer module to extract global features of EEG signals along the time series, and embeds absolute positions to maintain the temporal order for the model. Before the positional encoding is applied, the classification identification tokens are concatenated with the time series and then mapped through a linear transformation to increase the diversity of feature extraction. In the research model, LSTM is mainly used to integrate the brain response data collected from BCIs. The integration process is as follows. First, the brain response signals collected from the BCI system are preprocessed, for example by denoising and normalization, to obtain clean time-series data. Then, these preprocessed brain response data are used as inputs to the LSTM network. LSTM networks can capture temporal dependencies in the data and learn the neural response patterns of the brain at different time points. Next, through the time-dependent modeling of the LSTM, the output data contains the gradual response patterns of the brain to visual stimuli throughout the entire image processing process. Finally, the temporal response of the brain processed by the LSTM is combined with the image features extracted by the DCNN. The calculation of the forget gate of the LSTM structure is shown in equation (5).

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    (5)

In equation (5), f_t is the output of the forget gate, W_f is the weight of the forget gate, \sigma is the Sigmoid activation function, b_f represents the offset term of the forget gate, x_t represents the input signal at time t, and h_{t-1} represents the output signal at time t-1. The cell state update is shown in equation (6).

C_t = f_t C_{t-1} + i_t \tilde{C}_t    (6)

In equation (6), C_t is the cell state at time t, C_{t-1} represents the cell state at time t-1, i_t represents the output of the input gate, and \tilde{C}_t is the candidate cell state. The calculation of the output gate is shown in equation (7).

o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    (7)

In equation (7), o_t represents the output gate, W_o is the weight of the output gate, and b_o is the offset term of the output gate. The output features are shown in equation (8).

h_t = o_t \tanh(C_t)    (8)

In equation (8), h_t represents the output feature. The unique gating mechanism of the LSTM can effectively handle long time intervals and delays in time series, and can discard or store large-span information in EEG data, thus better encoding EEG signals.
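As a concrete illustration of the temporal branch defined by equations (5)-(8), the sketch below runs a standard PyTorch LSTM over a window of preprocessed EEG frames and keeps the final hidden state as the brain-response feature; the channel count, sequence length, and hidden size are assumptions for the example only.

```python
import torch
import torch.nn as nn

class EEGTemporalEncoder(nn.Module):
    """Encode a preprocessed EEG sequence with an LSTM (cf. equations (5)-(8))."""
    def __init__(self, n_channels: int = 64, hidden_size: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden_size,
                            num_layers=1, batch_first=True)

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # eeg: (batch, time_steps, n_channels)
        outputs, (h_n, c_n) = self.lstm(eeg)
        return h_n[-1]   # (batch, hidden_size): temporal brain-response feature

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = EEGTemporalEncoder()
    eeg_batch = torch.randn(16, 200, 64)   # hypothetical 200-step, 64-channel EEG window
    brain_feature = encoder(eeg_batch)
    print(brain_feature.shape)             # torch.Size([16, 128])
```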
3.2 Design of intelligent computing model based on DCNN and brain visual cognitive information

The study simulates the connectivity and classification patterns of biological brain neurons, exploring the connection between image features and brain reactions. DCNN has demonstrated significant capabilities in image feature extraction: by combining convolutional and pooling layers, it can automatically learn multi-level abstract feature representations of images, effectively capturing both low-level and high-level features. However, despite the DCNN's efficiency in feature extraction, the image features it extracts still struggle to fully explain the brain's response patterns. This is because the visual cognitive process of the brain relies not only on low-level visual features of images, but also involves complex high-level semantic information processing, perceptual integration, and interaction with other cognitive processes such as memory and emotion. The features extracted by the DCNN mainly focus on salient visual information in the image, but they often lack sufficient high-level semantic depth and are difficult to fully integrate with the complex responses of the brain in visual cognition. Therefore, the image characteristics extracted by the DCNN cannot fully explain the representation information of the brain response, and there are modality-specific expressions between the two, making it difficult to explore their deep correlations [20, 21]. In response to this limitation, a network structure based on DCNN is studied to construct an intelligent computing model for the brain. The brain response is used as supervised information for the images, and high-level semantic information that is difficult to interpret is transferred to the DCNN model to achieve more accurate mining of the brain's visual cognitive response. The intelligent computing model structure grounded on data fusion is represented in Figure 4.
Figure 4: Intelligent computing model structure grounded on data fusion (EEG signal acquisition yields brain response features b1 ... bN and the DCNN yields image features v1 ... vN; after normalization, both are mapped onto a shared hypersphere)
As represented in Figure 4, in the intelligent computing model framework based on information fusion, feature extraction is first performed on the cognitive response data of the brain to visual images collected by the BCI. Then, the DCNN structure is applied to extract features from the input image. After the two extracted feature sets are normalized separately, the fused features are mapped onto an N-dimensional sphere. Subsequently, based on the normalized features, a set of positive and negative samples is constructed, and the InfoNCE loss function is used for calculation, thereby achieving the transfer of correlated information between the two feature maps. The mathematical description of the InfoNCE loss function is represented in equation (9).

L_i = -\log \frac{\exp(S(z_i, z_i^{+}) / \tau)}{\sum_{j=0}^{N} \exp(S(z_i, z_j) / \tau)}    (9)

In equation (9), L_i represents the InfoNCE loss function, \tau represents the temperature coefficient, z_i represents the image representation corresponding to the input data x_i, S(z_i, z_j) represents the cosine similarity between image representations, S(z_i, z_i^{+}) represents the alignment characteristics during hypersphere mapping, and N represents the total number of positive and negative samples. The calculation of the contrastive loss is represented in equation (10).

L_i = -\log \frac{\sum_{j=0}^{m} \exp(S(f(v_i), f(b_j^{+})) / \tau)}{\sum_{k=0}^{n} \exp(S(f(v_i), f(b_k^{-})) / \tau)}    (10)

In equation (10), L_i represents the contrastive loss, m is the number of positive samples, n is the number of negative samples, f(v_i) is the mapped image feature, f(b_j^{+}) represents brain response features of the same category as the image features, and f(b_k^{-}) represents brain response features of categories different from the image features. In intelligent computing models based on the fusion of DCNN and brain visual cognitive information, the classification accuracy of the DCNN structure may be affected by irrelevant information. To address this issue, the study improves the DCNN structure by incorporating attention mechanisms. The DCNN feature extraction model grounded on the attention mechanism is represented in Figure 5.
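A minimal sketch of the hypersphere alignment objective in equations (9) and (10) is given below: both feature sets are L2-normalized, cosine similarities are scaled by a temperature, and each image representation is pulled toward brain responses of the same category. The batch construction, temperature value, and averaging over multiple positives are assumptions made for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(image_feats: torch.Tensor, brain_feats: torch.Tensor,
             labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss over normalized features (cf. equations (9)-(10)).

    image_feats, brain_feats: (batch, dim); labels: (batch,) category ids.
    Positives are brain responses sharing the image's category.
    """
    z_v = F.normalize(image_feats, dim=-1)   # map onto the unit hypersphere
    z_b = F.normalize(brain_feats, dim=-1)
    sim = z_v @ z_b.t() / temperature        # cosine similarities / temperature
    pos_mask = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-probability of the positive pairs for each image
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    v = torch.randn(16, 128)                 # hypothetical image features
    b = torch.randn(16, 128)                 # hypothetical brain-response features
    y = torch.randint(0, 4, (16,))           # hypothetical category labels
    print(info_nce(v, b, y).item())
```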
Figure 5: DCNN feature extraction model grounded on attention mechanism (feature maps X0, X1, X2, X3 produced by the transforms Frr1 and Frr2, with a parallel attention branch Fat producing the positional importance map A)
As represented in Figure 5, the DCNN architecture used mainly includes multiple convolutional layers, pooling layers, activation functions, and fully connected layers, aiming to extract multi-level feature representations from images to improve classification performance. The core idea of the DCNN is to extract local features from images through a series of convolution and pooling operations, and then add nonlinear transformations through activation functions to learn more complex image representations. The convolutional layer, as a fundamental component of the DCNN architecture, extracts local features of the input image through convolution operations. After each convolution operation in each layer, the study uses activation functions to perform nonlinear transformations on the output results. The purpose of the activation function is to introduce nonlinear factors so that the network can learn more complex mapping relationships. The function of the pooling layer is to downsample the feature map output by the convolutional layer, thereby reducing the spatial size of the feature map while preserving important features. After sufficient local features have been extracted by the convolutional and pooling layers, the last few layers are usually fully connected. The fully connected layer linearly combines the extracted features and generates the final output through an activation function. The DCNN feature extraction model grounded on the attention mechanism adds a parallel attention branch to the initial DCNN structure to learn the positional importance of the feature map. This branch can correct the activation values of feature maps and reduce the activation values of redundant information, thus improving the accuracy of image feature extraction [22, 23]. The feature transformation process is shown in equation (11).

X_1 = F_{rr1}(X_0)    (11)

In equation (11), X_1 represents the transformed abstract feature map, X_0 represents the initial feature map, and F_{rr1} represents the downsampling operation. The calculation of positional importance is shown in equation (12).

A = F_{at}(X_1)    (12)

In equation (12), A represents the positional importance and F_{at} represents a fully connected operation. The new feature map obtained by further downsampling the abstract features is shown in equation (13).

X_2 = F_{rr2}(X_1)    (13)

In equation (13), X_2 represents the new feature map after further downsampling, and F_{rr2} represents the further downsampling operation. The corrected feature map is shown in equation (14).

X_3 = \sum_{i=1}^{W_2} \sum_{j=1}^{H_2} A(i, j) X_2(i, j)    (14)

In equation (14), X_3 represents the feature map obtained after attention branch correction, W_2 and H_2 represent the width and height of the feature map, and (i, j) indexes the feature values on the feature map. When capturing the correlation information between brain visual cognitive responses and image features, some non-correlated neural responses may affect the determination of representation similarity. These "unrelated neural responses" pertain to neural activities that are not directly tied to visual tasks and might stem from background noise, irrelevant visual cues, or various other bodily influences. For example, the activity of certain regions in EEG signals may be unrelated to the current visual task, and this irrelevant neural activity can lead to misleading similarity judgments when the brain processes visual information. To address this issue, the intelligent computing model based on the fusion of DCNN and brain visual cognitive information is designed to raise the precision of capturing correlated information by masking non-correlated neural responses. Specifically, the study adds windows of different scales to the extracted image features to mask non-correlated neural responses. The motivation of this method is to better highlight the effective response of the brain to visual information and to improve the accuracy of similarity determination between brain visual cognitive responses and image features by reducing or eliminating the influence of irrelevant neural reactions. The visualization process of the correlation information between brain responses and image features is shown in Figure 6.
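The parallel attention branch of equations (11)-(14) can be sketched as follows: the feature map is downsampled, a per-position importance map A is produced, the features are downsampled again, and the attention map reweights the result (equation (14) is interpreted here as per-position reweighting). The concrete layer choices (kernel sizes, channel counts, a 1x1 convolution plus sigmoid standing in for F_at) are assumptions; the paper describes F_at only as a fully connected operation.

```python
import torch
import torch.nn as nn

class AttentionBranchBlock(nn.Module):
    """Feature transform with a parallel positional-attention branch (cf. eqs (11)-(14))."""
    def __init__(self, in_ch: int = 32, out_ch: int = 64):
        super().__init__()
        self.frr1 = nn.Sequential(                 # F_rr1: downsampling transform, eq. (11)
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU())
        self.fat = nn.Sequential(                  # F_at: positional importance, eq. (12)
            nn.Conv2d(out_ch, 1, 1), nn.Sigmoid())
        self.frr2 = nn.Sequential(                 # F_rr2: further downsampling, eq. (13)
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AvgPool2d(2)                # match A's resolution to X2 (assumption)

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        x1 = self.frr1(x0)                         # eq. (11)
        a = self.pool(self.fat(x1))                # eq. (12), resized to X2's spatial size
        x2 = self.frr2(x1)                         # eq. (13)
        x3 = a * x2                                # eq. (14): attention-corrected features
        return x3

if __name__ == "__main__":
    torch.manual_seed(0)
    block = AttentionBranchBlock()
    x = torch.randn(2, 32, 64, 64)                 # hypothetical feature map
    print(block(x).shape)                          # torch.Size([2, 64, 16, 16])
```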
Figure 6: Visualization process of brain response and image feature correlation information (features are extracted from the EEG signal and the image; the distances d(f(v), g(b)) and d(f(v_i(x, y)), f(b)) are compared across occlusion windows 1 ... n, and the resulting saliency maps V_1 ... V_n are combined into V_t)
As shown in Figure 6, in the process of visualizing the correlation information between brain responses and image features, the correlated features are first represented in the shared representation space and their Euclidean distance is calculated. Then, a window of scale i is added to the extracted image features to mask non-correlated neural responses. Based on the Euclidean distances calculated from the various image features, occlusion windows of different sizes are determined. The saliency maps obtained through occlusion at different scales are then combined to create a comprehensive saliency map that encapsulates the relationship between brain responses and image features. The calculation of the saliency map is shown in equation (15).

V_t = \left| d(f(v_t(x, y)), f(b)) - d(f(v), g(b)) \right|    (15)

In equation (15), V_t represents the saliency map and d represents the Euclidean distance.
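A minimal sketch of the multi-scale occlusion idea behind equation (15): each window of the image is masked in turn, the Euclidean distance between the re-extracted image features and the brain-response features is recomputed, and the absolute change relative to the unoccluded distance is recorded as the saliency of that window. The feature extractor, window size, and stride used here are illustrative assumptions.

```python
import torch

def occlusion_saliency(image: torch.Tensor, brain_feat: torch.Tensor,
                       extract_features, window: int = 32, stride: int = 32) -> torch.Tensor:
    """Saliency map via occlusion (cf. equation (15)).

    image: (C, H, W); brain_feat: (D,); extract_features: callable image -> (D,).
    """
    base_dist = torch.dist(extract_features(image), brain_feat)   # d(f(v), g(b))
    _, h, w = image.shape
    saliency = torch.zeros(h // stride, w // stride)
    for i, y in enumerate(range(0, h - window + 1, stride)):
        for j, x in enumerate(range(0, w - window + 1, stride)):
            occluded = image.clone()
            occluded[:, y:y + window, x:x + window] = 0.0          # mask one window
            dist = torch.dist(extract_features(occluded), brain_feat)
            saliency[i, j] = (dist - base_dist).abs()              # equation (15)
    return saliency

if __name__ == "__main__":
    torch.manual_seed(0)
    extractor = lambda img: img.mean(dim=(1, 2))                   # stand-in feature extractor
    img = torch.randn(3, 224, 224)
    b = torch.randn(3)
    print(occlusion_saliency(img, b, extractor).shape)             # torch.Size([7, 7])
```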
4 Validation of an intelligent computing model integrating DCNN and brain visual cognition

After setting up the experimental environment, the performance of the image classification model grounded on information fusion was first verified, and then the intelligent computing model based on information fusion was experimentally analyzed.

4.1 Experiment environment construction

To verify the effectiveness of the intelligent computing model that integrates DCNN and brain visual cognition, the study first constructed an experimental environment. The experimental hardware configuration was as follows: the processor was an Intel i7-8700, the GPU was an Nvidia GeForce 1080Ti, and the memory was 64 GB DDR4. The experimental model used the Python language and was implemented with the PyTorch framework. The experimental parameters were set as follows: the batch size was 16, the initial learning rate was 0.001, the Adam optimizer was used during training, the output layer size was 40, and the key vector value in the self-attention mechanism was 128. The dataset was sourced from the comprehensive evaluation platform Brain Score. This dataset aims to evaluate the effectiveness and accuracy of computational models that simulate brain operation, and therefore covers response data of primate visual systems. The dataset contains approximately 5000 image stimuli, each corresponding to recorded brain electrophysiological response data. The stimulus images cover a total of 40 categories, including natural scenes and artificial objects, and the number of images in each category is roughly equal to ensure data balance. The size of each image is 224x224 pixels, which retains sufficient visual information and meets the input requirements of the CNN. The data augmentation techniques used in the research include random cropping, horizontal flipping, random rotation, and color jitter. These augmentation techniques can effectively expand the diversity of the training data, avoid model overfitting, and improve generalization to various visual stimuli. After preprocessing, the data was separated into a training set and a testing set in a 3:7 ratio. While primarily intended for evaluating brain functional models, the Brain Score dataset is well-suited as a data source for verifying the efficacy of intelligent computing models that integrate image classification with brain visual cognition, given its abundance of visual stimulus images and corresponding EEG response data. In the experimental design of this study, the evaluation of image classification focuses on guiding the learning and classification of image features through brain response data, rather than on simple image classification. The detailed experiment environment configuration and network training parameters are represented in Table 2.
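The configuration in Table 2 and the augmentations listed above can be expressed roughly as follows in PyTorch/torchvision; the rotation angle, crop padding, jitter strengths, and the model variable are placeholders, since the paper names only the augmentation types and the optimizer settings.

```python
import torch
from torchvision import transforms

# Data augmentation named in the text: random crop, horizontal flip, rotation, color jitter.
train_transform = transforms.Compose([
    transforms.RandomCrop(224, padding=8),           # padding amount is an assumption
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),           # rotation range is an assumption
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam optimizer with the initial learning rate from Table 2."""
    return torch.optim.Adam(model.parameters(), lr=1e-3)

BATCH_SIZE = 16          # Table 2
OUTPUT_CLASSES = 40      # Table 2: output layer size
KEY_VECTOR_DIM = 128     # Table 2: key vector value of the self-attention module
```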
Table 2: Experiment environment configuration and network training parameters

| Experimental environment | Configuration | Training parameter | Configuration |
|---|---|---|---|
| CPU | Intel i7-8700 | Batch size | 16 |
| GPU | Nvidia GeForce 1080Ti | Initial learning rate | 0.001 |
| Memory | 64 GB DDR4 | Output layer size | 40 |
| Programming language | Python | Key vector value | 128 |
| Framework | PyTorch | Optimizer | Adam |

4.2 Performance verification of image classification model based on information fusion

In order to verify the predictive accuracy of ventral response encoding based on brain visual cognition for brain cognitive responses, this method was compared with other voxel encoding methods, including the Convolutional Neural Network Enhancement Model (CNN-EM) and GaborNet Visual Encoding (GaborNet-VE). The accuracy comparison of the different encoding methods in different visual regions is represented in Figure 7. From Figure 7(a), within the V4 visual region, the prediction accuracy of the ventral response encoding method based on brain visual cognition was significantly higher than that of the other two methods. The maximum prediction accuracy of this method reached 93.54%, which was 6.05% and 19.57% higher than the maximum prediction accuracies of CNN-EM and GaborNet-VE, which were 87.49% and 73.97%, respectively. From Figure 7(b), the results for visual area L0 show that the maximum prediction accuracy of the ventral response encoding method based on brain visual cognition was 94.03%, which was 11.49% and 18.95% higher than the maximum accuracies of 82.54% and 75.08% of the other two methods, respectively. In addition, the study used paired t-tests to validate the credibility of the results. In the V4 region, the difference in accuracy between the ventral response encoding based on brain visual cognition and CNN-EM reached a statistically significant level (t=4.72, P<0.05), and the difference between the ventral response encoding and GaborNet-VE was also statistically significant (t=6.88, P<0.05). In the L0 region, the accuracy difference between the ventral response encoding based on brain visual cognition and CNN-EM was statistically significant (t=5.23, P<0.05), and the difference between the ventral response encoding and GaborNet-VE was likewise statistically significant (t=7.14, P<0.05). Thus, ventral response encoding based on brain visual cognition could accurately predict brain cognitive responses.
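The paired t-tests reported above compare the accuracies of two encoders evaluated on the same splits; a minimal sketch with SciPy is shown below. The accuracy arrays are made-up placeholders, not the study's measurements.

```python
from scipy import stats

# Hypothetical paired accuracy scores of two encoders on the same evaluation splits.
brain_cognitive_encoding = [0.93, 0.91, 0.94, 0.92, 0.935]
cnn_em = [0.87, 0.86, 0.88, 0.85, 0.875]

t_statistic, p_value = stats.ttest_rel(brain_cognitive_encoding, cnn_em)
print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")   # significant if p < 0.05
```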
Figure 7: Comparison of prediction accuracy of different encoding methods (brain cognitive encoding, CNN-EM, GaborNet-VE) in (a) the V4 visual region and (b) the L0 visual region (* indicates P<0.05)
To further verify the performance of the image classification model based on information fusion, a comparative analysis was performed on the classification models before and after adding LSTM, as represented in Figure 8. From Figure 8, the loss value of the model before adding LSTM converged to 0.26, while the loss value of the model after inserting LSTM converged to 0.21, a reduction of 19.23%. This indicated that the classification model incorporating LSTM had better convergence performance.
Figure 8: Comparison of training loss values for the networks before and after incorporating the LSTM module (loss magnitude versus number of iterations, 0 to 1000)
To further validate the capability of the image classification model grounded on information fusion, this study compared the model with other advanced image classification models, including Feature Weighted Classification (FWC), the Residual Network (ResNet), and the Visual Geometry Group network (VGG). In addition, to ensure the broad applicability and contextualization of the research results, the performance of these models was compared with benchmark test results in the current field of computer vision. The accuracy comparison of the different classification models is represented in Figure 9. From Figure 9, across the datasets of the various visual image types, the accuracy of the image classification model grounded on information fusion was the best. On facial visual images, the accuracy of this model was as high as 95.46%, an improvement of 6.30%, 5.41%, and 10.03% over the accuracies of 89.16%, 90.05%, and 85.43% of FWC, ResNet, and VGG, respectively. This result also compares favorably with mainstream benchmarks in the field of facial recognition. In facial recognition tasks, many state-of-the-art technologies, such as FaceNet and ArcFace, achieved high accuracy on standard datasets such as LFW and CASIA-WebFace; for example, FaceNet reported an accuracy of 94.63% on the LFW dataset [6], and ArcFace achieved a recognition rate of nearly 94.51% on the same dataset [7]. However, those results were obtained in interference-free environments, while this study still achieved an accuracy of up to 95.46% in environments with complex interference and occlusion. Compared with existing benchmark tests, the image classification model based on information fusion therefore has a performance advantage. On animal visual images, the classification accuracy of this model was its lowest, at 91.57%, which was still 16.31%, 12.03%, and 19.08% higher than the accuracies of 75.26%, 79.54%, and 72.49% of the other three models, respectively. Compared with basic image classification tasks, animal classification often involves more complex backgrounds and varying object shapes, which makes this task an important criterion for testing model robustness. Therefore, the significant improvement of the research model on this task indicated that it has stronger generalization ability and adaptability when facing highly complex and dynamically changing visual environments. In summary, the image classification model based on information fusion demonstrated excellent classification performance across multiple tasks, and its performance retains a clear advantage over benchmark testing.

The reason why facial image recognition achieved better accuracy than animal image recognition is that facial images have more stable and easily recognizable features. Facial images typically have fixed structural features and relatively consistent backgrounds, which enables information fusion based models to fully exploit the effective information in the brain's visual cognitive model, thereby improving recognition accuracy. Animal images, by contrast, face more complex challenges, including background noise, changes in animal size and morphology, different shooting angles, and different species. These factors make the classification task more complex and varied. Therefore, in terms of recognition accuracy, the performance of facial image classification was better than that of animal image classification.
Figure 9: Comparison of classification accuracy (%) of the intelligent computing model, FWC, ResNet, and VGG across visual image types (face recognition, automobile, animal, fruits, chair)
4.3 Performance verification of intelligent computing models based on information fusion

To verify the ability of the intelligent computing model grounded on information fusion, the visualized sample distribution results of different models under various brain visual cognitive image stimuli were compared and studied. The dataset contains 40 categories of images, which are divided into two main groups: natural scenes and artificial objects. Natural scenes include categories such as faces and animals, while artificial objects include categories such as cars, fruits, and chairs. In the experiment, a combination of these image categories was used to test the classification performance of the model under different visual stimuli. The visualization outcomes of the various models are represented in Figure 10. From Figure 10, the sample distributions of FWC and VGG were relatively chaotic, while the sample distribution of ResNet was relatively clear. The ResNet model showed a more obvious distinction between facial and animal visual images, but it was more confused when distinguishing images such as fruits and cars. The intelligent computing model based on information fusion exhibited clear class separation and good classification performance under all visual image stimuli. This was because images of the facial and animal categories were more consistent in natural scenes and were easily distinguishable by the models, whereas categories such as cars, fruits, and chairs belong to artificial objects with significant visual differences among them, posing greater challenges to the models. The intelligent computing model based on information fusion revealed the deep-level features of the brain response, effectively improving classification accuracy.
Figure 10: Visualization results of the sample distributions of different models under various visual stimuli: (a) FWC, (b) ResNet, (c) VGG, (d) the intelligent computing model (legend: face recognition, automobile, animal, fruits, chair; axes: X-axis and Y-axis sample distribution)
To further validate the ability of the intelligent computing model grounded on information fusion, ablation experiments were conducted. The classification accuracy in the ablation experiments was calculated from the precision of the image classification task, and thus only reflects the accuracy of the image classification results. The ablation experiment results are shown in Table 3. From Table 3, the classification accuracy of the brain visual cognitive response encoding framework alone was 81.42%. When the DCNN structure was fused, the accuracy improved by 5.64%. After adding the LSTM structure, the accuracy increased to 90.48%. When the attention mechanism was added, the classification accuracy increased by a further 2.17%. When the model was further optimized by occluding non-correlated neural responses, its accuracy reached 93.94%. It can be seen that each of the added modules benefited the classification performance of the model, effectively raising the classification accuracy on images.

Table 3: Ablation experiment

| Brain response coding framework | DCNN | LSTM | Attention mechanism | Obstructing non-correlated neural responses | Accuracy rate/% |
|---|---|---|---|---|---|
| √ | / | / | / | / | 81.42 |
| √ | √ | / | / | / | 87.06 |
| √ | √ | √ | / | / | 90.48 |
| √ | √ | √ | √ | / | 92.65 |
| √ | √ | √ | √ | √ | 93.94 |

Note: "√" indicates that the module is present; "/" indicates that it is not.

5 Discussion

In order to improve the accuracy of computer vision image classification, a fusion intelligent computing model was constructed by simulating the visual processing mechanism of the human brain, using BCI technology to extract the EEG signals generated by human visual cognition, and combining them with a DCNN structure. The results showed that after adding LSTM, the convergence of the model was significantly improved, with the loss value decreasing
from 0.26 to 0.21, a reduction of 19.23%, indicating improved convergence. This showed that LSTM could effectively capture time series features and improve the model's ability to process time-series data, making the model more accurate in learning dynamic information. After incorporating LSTM, the model could better understand the temporal dependencies in brain activity, resulting in more accurate prediction performance. In addition, compared with other advanced methods, the research method was clearly superior. For example, although the model studied by Gao et al. effectively improved the decoding performance of visual evoked potentials in complex environments, it still faced the problem of noise interference and failed to effectively integrate the spatial and temporal features of the brain's visual cognitive process [6]. The model studied in this article not only considers spatial features but also integrates dynamic temporal information when predicting brain responses in visual regions, significantly improving the accuracy of predictions. In addition, the model proposed by Ahirwal et al. achieved good results in emotion classification, but it mainly focused on emotion classification and cannot handle complex visual information or multi-class image classification tasks [7]. The model studied in this article can not only handle complex visual information, but can also adapt to the multidimensional features of the brain's visual cognitive process, thus exhibiting a more comprehensive classification and understanding of visual information.

Potential extensions of the research model to other tasks include video analysis and multi-modal data fusion. Video data contains not only the spatial information of static images but also dynamic time series information. Therefore, models based on brain visual cognition can better understand the dynamic changes in videos by integrating spatial and temporal features, especially with the addition of LSTM modules. In the field of multi-modal data fusion, cross-modal learning can be achieved by introducing multi-modal neural network structures and combining data from different modalities. For example, in video description generation tasks, the visual information of video frames can be combined with speech or text information to generate more accurate and natural descriptions.

The reason why the research method is superior to other methods is that it considers both the spatial and the temporal characteristics of the brain in the visual cognitive process, while other methods rely more on a single spatial or static feature. In addition, the introduction of LSTM further enhances the model's ability to process temporal information, enabling the model to decode complex dynamic brain signals more accurately. The potential applications of this finding cover fields such as neuroscience experiments, intelligent medical devices, and brain-computer interaction systems. However, the method still has certain limitations. For example, the study only explored the classification of EEG images, so the research results are not comprehensive enough. This aspect can be improved in the future.

6 Conclusion

In recent years, the introduction of the brain's visual cognitive mechanisms has provided new solutions to the limitations in accuracy and generalization of traditional DCNNs when processing complex visual information. The research used BCIs to receive EEG information, used voxel encoding models to obtain the expression content of visual images, and combined them with an improved DCNN structure to construct an efficient image classification model. On this basis, the LSTM structure was further introduced to extract time series features of EEG signals, and attention mechanisms and the occlusion of irrelevant neural responses were utilized to enhance the accuracy of capturing correlation information between brain responses and image features. The outcomes revealed that the ventral response encoding method grounded on brain visual cognition achieved prediction accuracies of 93.54% and 94.03% in the V4 and L0 visual regions, significantly better than the CNN-EM and GaborNet-VE methods. In the model validation, after adding the LSTM module, the loss value decreased from 0.26 to 0.21, a reduction of 19.23%. In terms of image classification capability, the accuracy of the information fusion based model on facial visual images was as high as 95.46%, and its lowest accuracy, on animal visual images, was 91.57%, both significantly better than comparative models such as FWC, ResNet, and VGG. In addition, ablation experiments showed that by introducing attention mechanisms and occluding irrelevant neural responses, the final classification accuracy was improved to 93.94%. Overall, the fusion intelligent computing model based on DCNN and brain visual cognition effectively improved the accuracy of computer vision image classification.

Although the research focused on EEG image classification and achieved good classification results in the relevant areas of the ventral stream and visual regions, its current scope has not yet covered other brain tissues and neural mechanisms. Therefore, future research can be extended to explore the functions of other brain regions, such as their contributions to tasks like cognitive control and emotion recognition. In addition, combining different neural mechanisms and multi-modal data will help improve the comprehensiveness and accuracy of cognitive image classification, thereby promoting further development in the field of BCIs. Future work will strive to further enhance the ability to analyze EEG information for complex visual stimuli through the integration of broader neural regions and mechanisms, in order to promote the widespread application of intelligent computing models in practice.

Funding

This study was supported by the Key Science and Technology Program of Henan Province (Project Name: Research on NoC Routing Algorithm and Fault-Tolerant Technology Based on Spanning Tree Sub-Domains; Grant No. 252102210225).
References

[1] Wilson H, Chen X, Golbabaee M, Proulx M J, O'Neill E. Feasibility of decoding visual information from electroencephalogram. Brain-Computer Interfaces, 2024, 11(1-2): 33-60. DOI: 10.1080/2326263X.2023.2287719
[2] Finlayson S G, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Saria S. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 2021, 385(3): 283-286. DOI: 10.1056/NEJMc2104626
[3] Masana M, Liu X, Twardowski B, Menta M, Bagdanov A D, Van De Weijer J. Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(5): 5513-5533. DOI: 10.1109/TPAMI.2022.3213473
[4] Zhu Y, Zhuang F, Wang J, Ke G, Chen J, Bian J, He Q. Deep subdomain adaptation network for image classification. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(4): 1713-1722. DOI: 10.1109/TNNLS.2020.2988928
[5] Hong D, Gao L, Yao J, Zhang B, Plaza A, Chanussot J. Graph convolutional networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(7): 5966-5978. DOI: 10.1109/TGRS.2020.3015157
[6] Gao Z, Sun X, Liu M, Dang W, Ma C, Chen G. Attention-based parallel multiscale convolutional neural network for visual evoked potentials electroencephalogram classification. IEEE Journal of Biomedical and Health Informatics, 2021, 25(8): 2887-2894. DOI: 10.1109/JBHI.2021.3059686
[7] Ahirwal M K, Kose M R. Audio-visual stimulation based emotion classification by correlated electroencephalogram channels. Health and Technology, 2020, 10(1): 7-23. DOI: 10.1007/s12553-019-00394-5
[8] Komolovaitė D, Maskeliūnas R, Damaševičius R. Deep convolutional neural network-based visual stimuli classification using electroencephalography signals of healthy and Alzheimer's disease subjects. Life, 2022, 12(3): 374-379. DOI: 10.3390/life12030374
[9] Kumari N, Anwar S, Bhattacharjee V. Time series-dependent feature of electroencephalogram signals for improved visually evoked emotion classification using EmotionCapsNet. Neural Computing and Applications, 2022, 34(16): 13291-13303. DOI: 10.1007/s00521-022-06942-x
[10] Santamaria-Vazquez E, Martinez-Cagigal V, Vaquerizo-Villar F, Hornero R. Electroencephalogram-inception: a novel deep convolutional neural network for assistive ERP-based brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2020, 28(12): 2773-2782. DOI: 10.1109/TNSRE.2020.3048106
[11] Yıldırım Ö, Baloglu U B, Acharya U R. A deep convolutional neural network model for automated identification of abnormal electroencephalogram signals. Neural Computing and Applications, 2020, 32(20): 15857-15868. DOI: 10.1007/s00521-018-3889-z
[12] Miao M, Hu W, Yin H, Zhang K. Spatial-frequency feature learning and classification of motor imagery electroencephalogram based on deep convolution neural network. Computational and Mathematical Methods in Medicine, 2020, 2020(1): 1981728-1981752. DOI: 10.1155/2020/1981728
[13] Li F, He F, Wang F, Zhang D, Li X. A novel simplified convolutional neural network classification algorithm of motor imagery electroencephalogram signals based on deep learning. Applied Sciences, 2020, 10(5): 1605-1624. DOI: 10.3390/app10051605
[14] Tang X, Shen H, Zhao S, Li N, Liu J. Flexible brain-computer interfaces. Nature Electronics, 2023, 6(2): 109-118. DOI: 10.1038/s41928-022-00913-9
[15] Kawala-Sterniuk A, Browarska N, Al-Bakri A, Pelc M, Zygarlicki J, Sidikova M, et al. Summary of over fifty years with brain-computer interfaces - a review. Brain Sciences, 2021, 11(1): 43-45. DOI: 10.3390/brainsci11010043
[16] Cohn N. Your brain on comics: a cognitive model of visual narrative comprehension. Topics in Cognitive Science, 2020, 12(1): 352-386. DOI: 10.1111/tops.12421
[17] Finlayson S G, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Saria S. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 2021, 385(3): 283-286. DOI: 10.1056/NEJMc2104626
[18] Bicanski A, Burgess N. Neuronal vector coding in spatial cognition. Nature Reviews Neuroscience, 2020, 21(9): 453-470. DOI: 10.1038/s41583-020-0336-9
[19] Franzen L, Stark Z, Johnson A P. Individuals with dyslexia use a different visual sampling strategy to read text. Scientific Reports, 2021, 11(1): 6449-6455. DOI: 10.1038/s41598-021-84945-9
[20] Zhou S K, Greenspan H, Davatzikos C, Duncan J S, Van Ginneken B, Madabhushi A, Summers R M. A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE, 2021, 109(5): 820-838. DOI: 10.1109/JPROC.2021.3054390
[21] Basso M A, Bickford M E, Cang J. Unraveling circuits of visual perception and cognition through the superior colliculus. Neuron, 2021, 109(6): 918-937. DOI: 10.1016/j.neuron.2021.01.013
[22] Jeong J J, Tariq A, Adejumo T, Trivedi H, Gichoya J W, Banerjee I. Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. Journal of Digital Imaging, 2022, 35(2): 137-152. DOI: 10.1007/s10278-021-00556-w
[23] Wang F. Automatic ink painting rendering technique based on deep convolutional neural networks. Informatica, 2025, 49(5): 95-108. DOI: 10.31449/inf.v49i5.7112
https://doi.org/10.31449/inf.v49i16.6312 Informatica 49 (2024) 53-66 53
Research on Optimization Method of Landscape Architecture
Planning and Design Based on Two-Dimensional Fractal Graph
Generation Algorithm
Sheng Chen
Shanxi Vocational University of Engineering and Technology, School of Architectural Design, Department of Cultural Heritage Conservation Engineering
E-mail: cs10078910@163.com
Keywords: design optimization, generation algorithm, landscape architecture, two-dimensional fractal graph.
Received: May 30, 2024
The development of modern mathematical theory, especially the two-dimensional fractal graph algorithm, makes large-scale landscape data processing possible. Landscape digital identification technology is an innovative technology based on digital landscape technology and computer identification of experimental data. It is an important artificial intelligence technology that includes three steps: landscape acquisition, landscape processing, and landscape identification. The characteristics of the scene in a landscape picture can be collected by special instruments, such as cameras; the collected data can then be processed by the two-dimensional fractal graph algorithm, finally realizing the automatic identification of the landscape. For images with significant boundary characteristics, the boundary of the region can be extracted quickly and accurately, so as to realize the segmentation of the region. However, when the edge features of the image are not good enough, when there is little color difference between the background and the region, or when there is interference, the result is very poor. In this paper, based on the two-dimensional fractal graph generation algorithm, a series of optimizations of landscape architecture planning and design is carried out. The landscape pixel accuracy can reflect whether specific types of landscape pictures can be correctly identified and segmented. 200 pictures are divided into six categories, namely water scene, landscape scene, living scene, sky scene, architecture, and transportation, and the two-dimensional fractal graph network-8s, network-16s, and network-32s variants are then compared. The method reached the best level in pixel accuracy, average accuracy, average IU, and other metrics, with pixel accuracy and average accuracy reaching as high as 100%. When compared to the recommended algorithm, the 2D fractal graph generation algorithm has the highest accuracy (94.52%), precision (93.34%), and recall (94.18%) in the classification process.
Povzetek: Razvita je optimirana metoda načrtovanja krajinske arhitekture z uporabo algoritma
generiranja dvodimenzionalnih fraktalnih grafov, kar je omogočilo učinkovito avtomatsko prepoznavanje
in segmentacijo krajinskih elementov.
1 Introduction

The performance of landscape architecture is a comprehensive concept, which refers to the characteristics and functions of the whole life cycle of landscape architecture, including early analysis, conceptual design, construction and operation. For a long time, landscape architecture has depended on the designer's prediction: the design is completed on the basis of what the designer foresees. However, designers cannot predict all factors with complete accuracy, so it is impossible to produce a perfect design. It is nevertheless possible to maximize the perfection of design results through as much analysis and thinking as possible and by considering as many factors as possible. The elements of the site environment are the material premise on which landscape architecture depends. The design of landscape architecture is not carried out in a static way, and its form will inevitably exist in the site environment. Human perception in the space environment, together with sound, light, heat and other factors in the natural environment, affects the form. Determining how to respond to these "unpredictable" immaterial influences, as well as how to respond to force, energy and feeling in the structure, is the main task in the performance optimization process; Figure 1 below shows the specific landscape architecture design plan.
Figure 1: Planar heat planning and design in landscape architecture design

For example, in the landscape architecture design shown in Figure 1, designers can operate on and optimize the form according to their perception of the natural environment of the site or its human environment, so as to realize the transformation of the original single, pure form.
The form-finding model and multi-objective optimization based on performance digital analysis make performance itself a factor and method of form creation, and help designers complete the design as an optimization technique. The performance-oriented optimization design process of landscape architecture structures can be understood as follows: the designer's interpretation of the spatial and environmental conditions is the foundation, performance optimization software is the main technical means, and the landscape architecture structure is dissected into a material system whose form self-organizes from the macro through the medium to the micro scale; it is a form generation process from top to bottom and a form of self-expression. Here, the dominant position of the designer is expressed by interpreting the site environment and predicting the function and the relationship between the constructed form and the performance target. This design idea can be roughly summarized into three kinds of form generation and feedback processes: the first is a dynamic interaction between the form of the structure and the human subject; the second is the environmental or other external forces acting on the various forms of the structure, and the resistance of these forms to the environment or other factors; the third is the interaction between the components of the structure itself. Therefore, the performance of landscape architecture can also be summarized into three categories, the first of which is the spiritual demand (the spatial feeling brought by the space to the user).
In terms of the connotation of garden landscape, in addition to the traditional garden landscape there are also landscape preference, landscape competitiveness evaluation, etc., and the connotation of evaluation is gradually deepening. From the 1980s to the 1990s, landscape evaluation focused more on beauty estimates, the environment, the study of different models, and the evaluation of visual landscape and visual effects; the main landscape evaluation models fall into three categories: the descriptive factor method, the questionnaire survey method and the aesthetic attitude determination method. At present, the evaluation of landscape resources uses both qualitative and quantitative approaches, and fuzzy comprehensive evaluation models established with the AHP method are common, covering the perspective of ecology, GIS technology, the tourists' demand angle, landscape image, etc. High-resolution landscape architecture imagery can now provide a large amount of landscape information with rich characteristics, so it is widely used in landscape assessment [1-3].
The two-dimensional fractal image generation algorithm is an image segmentation method based on image features. Its working principle is to select the boundary points in a region as candidate boundaries and to select a method that can splice them to obtain the boundary of the region through the inconsistency of the features of the regions. Several edge detection operators, including the first-order differential Sobel operator, the Roberts operator and the second-order differential Laplacian operator, are usually used to extract the edge of a region.
The partition technique of the generation algorithm based on the two-dimensional fractal graph is essentially a partition based on a similarity criterion, which includes several common methods. In this kind of method, a series of basic texture pixels is used to describe each region, and an extended growth criterion is determined to expand the region. Then the growth of adjacent seed pixels is calculated to determine whether the adjacent pixels should be added to the set of seed pixels. When no new pixel is found, the growth process ends. The most critical step is to define the rules of seed selection and growth [4-5].
The watershed method mainly regards the pixel points of each scene in landscape architecture as coordinates in the whole graph and represents each location with one pixel value. It then works in a way similar to a flood overflowing: low-lying places with low pixel values are plains, and high places are mountains of a basin; as water is poured in, the lower the terrain, the easier it is to be flooded. After enough water is poured into the area, a depression is formed, creating an open region. However, there is too much segmentation due to the interference of the pixels.
In general, this method transforms the landscape into a grid in the sense of graph theory: each pixel point is treated as a node, and the connection between these nodes is called an edge. The common way to define edges is to calculate the dissimilarity of pixel points in the neighborhood according to the correlation of pixel values and then treat it as the edge weight, so as to obtain the graph G = <V, E>. In general, G is a weighted undirected graph, and its weights are usually defined according to the actual situation. The basic principle of the graph-theoretic approach is to cut off several edges of G and separate those parts of G that are no longer connected together, thus realizing the partition of G. In the partitioned graph G, each independent subgraph is matched with the corresponding partition, which realizes the image segmentation [6-8].
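As an illustration of this graph-based partition idea (a minimal sketch, not the paper's exact procedure), the following Python code builds a 4-neighbour pixel graph G = <V, E> with edge weights given by grey-level dissimilarity, removes edges whose weight exceeds a threshold, and labels the remaining connected components as regions; the threshold value and the use of absolute grey differences as weights are assumptions made only for this example.

import numpy as np

def graph_partition(img, tau=10.0):
    """Partition a grey-level image by cutting high-dissimilarity edges of a
    4-neighbour pixel graph and labelling the remaining connected components."""
    h, w = img.shape
    parent = np.arange(h * w)          # union-find forest over the pixel nodes V

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Edges E: connect 4-neighbours whose dissimilarity (absolute grey difference)
    # stays below the threshold tau; edges above tau are "cut".
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w and abs(float(img[y, x]) - float(img[y, x + 1])) <= tau:
                union(i, i + 1)
            if y + 1 < h and abs(float(img[y, x]) - float(img[y + 1, x])) <= tau:
                union(i, i + w)

    labels = np.array([find(i) for i in range(h * w)]).reshape(h, w)
    return labels  # each independent subgraph gets one label, i.e. one region

if __name__ == "__main__":
    demo = np.zeros((6, 6), dtype=np.uint8)
    demo[:, 3:] = 200                  # two flat regions separated by a strong edge
    print(np.unique(graph_partition(demo)).size, "regions found")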
2 Related works

Table 1: Summary of related works

[9] Methods/Algorithm: a multi-bit Trie-tree technique with non-collision hashing. Merits: test results indicate that this approach has a good success rate for data reconstruction and a high performance efficiency. Limitations: under this design, the necessary data reconstruction technology has not been developed.
[10] Methods/Algorithm: a survey that updates trends in organizations, processes, and outcomes for NPD in the United States, performed a little over five years after PDMA's initial best-practices survey. Merits: best-practice companies have higher expectations for their NPD programs, use more versatile teams, and are more likely to monitor NPD processes and outcomes. Limitations: businesses that do not maintain current NPD practices will find their competitiveness lacking in growth.
[11] Methods/Algorithm: an effective hybrid approach for dynamic mesh generation based on Delaunay graph mapping and Radial Basis Functions (RBFs). Merits: the hybrid RBFs-Delaunay graph mapping approach is found to be as accurate and effective as the Delaunay technique in building dynamic meshes for various test scenarios. Limitations: Delaunay graph mapping is lacking in efficiency.
[12] Methods/Algorithm: a novel two-dimensional modification of hat functions (2D-MHFs) to solve linear Fredholm integral equations. Merits: the method is desirable from a computational standpoint, and its great accuracy is demonstrated by a few numerical examples. Limitations: further information on the error analysis is needed.
[13] Methods/Algorithm: an introduction to genetic algorithm-based aeronautics. Merits: the method, particularly when incorporating numerous free design factors and configurable modeling parameters, is more versatile and computationally efficient than the traditional approach. Limitations: only a small number of snapshots are computed via computational fluid dynamics.
[14] Methods/Algorithm: a thorough approach to water and energy operation optimization. Merits: simulated experiments based on real network data validate the viability of the suggested operation optimization strategy; the comprehensive energy-saving rate reaches 31.3%, effectively lowering the costs associated with system operation. Limitations: lack of performance metrics for the suggested operation.
3 Research methods

3.1 Two-dimensional fractal graph: an algorithm based on conditional random point partition

According to the human landscape of the scene, the subjectivity of architectural design is used to the full to carry out the artistic creation of landscape architecture and visualize its art. The design of this experimental landscape architecture structure is based on a landscape pavilion with roof and columns where people can watch and chat. The reasons are as follows.
First of all, in landscape architecture works the interaction between people and places is very critical, which requires the architectural design to be considered comprehensively according to the designer's own experience and the situation of the site. After a detailed arrangement of the area, the author divides it into two parts: the natural environment and the cultural environment. The experiment was conducted in a humanized manner centered on the central axis of the Guangzhou center, targeting white-collar workers, tourists and nearby residents. In view of the large urban population and its various groups, the construction of the landscape architecture layout should not only meet the needs of people but also take into account the needs of history, culture and society, that is, pay full attention to the structure of landscape architecture and people's behavior.
On this basis, a concept of random spatial structure based on the background, the two-dimensional fractal graph, is proposed. This method uses each pixel in the background as a node, as shown in Figure 2 below, and uses the corresponding relationship between each pair of points as the boundary. The minimum-state conditional random field is then used to divide the landscape garden scene.

Figure 2: The most basic representation of the two-dimensional fractal graph algorithm

The energy of the conditional random field in image semantic segmentation is expressed as follows:

E(X) = \sum_{i \in V} \varphi_i(x_i) + \sum_{i \in V, j \in N_i} \varphi_{ij}(x_i, x_j), \quad \forall i, \; x_i \in L   (1)

In the formula, V represents the set composed of all landscape element pixels in the landscape image, N_i represents the set of pixels adjacent to landscape element pixel i, which usually consists of the four neighbouring pixels at the top, bottom, left and right. L represents the set of categories of the landscape element pixel classification, and x_i is the category corresponding to pixel i. \varphi_i(x_i) is the unary potential energy function; its general form is the logarithm of the class likelihood probability corresponding to pixel x_i. The likelihood probability can be learned from the actual pixel features, and the most common pattern is trained directly on the pixel values. However, such models only consider the color of the landscape image pixels and are not particularly comprehensive. Therefore, color and texture are generally selected:

\varphi_i(x_i) = \lambda_T \varphi_T(x_i) + \lambda_{col} \varphi_{col}(x_i) + \lambda_l \varphi_l(x_i)   (2)

Here \varphi_T and \varphi_{col} are potential energy functions trained on the texture and value-size (color) features, and \varphi_l is a potential function defined according to the location characteristics of the pixels. \lambda_T, \lambda_{col} and \lambda_l are their respective weights, which are generally obtained through training.
The binary potential energy is generally defined in the form

\varphi_{ij}(x_i, x_j)   (3)

\varphi_{ij}(x_i, x_j) = \begin{cases} 0, & x_i = x_j \\ g(i, j), & x_i \neq x_j \end{cases}   (4)

Such functions are generally defined on the relationship between the pixel values of adjacent pixels: for adjacent pixels belonging to the same class the function value is 0, otherwise it is determined by the function g(i, j).
At present, the random field model is used for post-processing to correct the early semantic model; the unary potential function is usually used to repair and refine the preceding semantic segmentation, making the original semantic segmentation more accurate.
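To make the role of the unary and binary potentials in Eqs. (1)-(4) concrete, the following minimal Python sketch evaluates the energy E(X) of a candidate labelling over a 4-neighbourhood grid; the simple Potts-style penalty used for g(i, j) and the synthetic unary costs are assumptions for illustration only, not the paper's trained potentials.

import numpy as np

def crf_energy(labels, unary, beta=1.0):
    """Energy of Eq. (1): sum of unary potentials phi_i(x_i) plus pairwise
    potentials phi_ij(x_i, x_j) over 4-neighbour pairs (Eq. (4), Potts form)."""
    h, w = labels.shape
    # Unary term: unary[y, x, c] = phi_i(x_i = c), e.g. negative log class likelihoods.
    e_unary = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()

    # Pairwise term: 0 when neighbouring labels agree, g(i, j) = beta otherwise.
    e_pair = beta * np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    e_pair += beta * np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return e_unary + e_pair

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unary = rng.random((4, 4, 3))               # 3 candidate landscape classes
    labels = unary.argmin(axis=2)               # labelling that minimises the unary term
    print("E(X) =", crf_energy(labels, unary))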
3.2 Description of the algorithm

• Updating 2D Search Velocity: every search agent's velocity is updated according to its velocity, position, and the best-known positions of both itself and its neighbors in the same level.
• Updating 2D Search Position: next, each search agent's position is modified in accordance with its velocity.
• Termination of Optimization: the optimization process is carried out recursively until a termination criterion is satisfied.
• Best Answer Extraction: the ideal answer to the landscape design challenge is ultimately determined to be the best-known position among all search agents.

Effective navigation of the landscape design space is made possible by the 2D fractal graph creation technique, which takes advantage of this hierarchical classification to balance exploration and exploitation across various layers of search agents. Fractal graphs converge to ideal landscape designs through iterative refinement directed by the equations guiding velocity and position updates. The hierarchical classification that the 2D fractal graph introduces improves its effectiveness and efficiency in navigating the intricate landscape design space. Fundamentally, the fractal graph method is a multi-level categorization scheme that groups search agents according to how well they can explore and exploit new areas. Because of its hierarchical structure, the 2D method is able to strike a balance between exploitation, which helps to refine promising solutions, and exploration, which allows for the discovery of a variety of landscape design alternatives.
Algorithm 1: Two-dimensional fractal graph for landscape architecture planning and design
Initialize:
- Define the 2D landscape planning and design problem.
- Parameters: population size (N), maximum number of iterations (MaxIter), hierarchical levels (L), weights (w, w_local, w_global),
acceleration coefficients (c1, c2), and the classifications.
Generate initial fractal graph:
- Randomly initialize N search agents with positions and velocities within the solution space.
Evaluate stability:
- Evaluate the stability of each search agent using the landscape design objective function.
Main loop:
For iter = 1 to MaxIter:
Update hierarchical classification:
- Classify search agents into hierarchical levels based on their stability and exploration-exploitation characteristics.
For each level L:
Update velocity and position:
For each search agent in level L:
Update velocity:
- Calculate cognitive and social components:
cognitive_component = c1 * rand() * (p_local - position)
social_component = c2 * rand() * (p_global - position)
58 Informatica 49 (2025) 53-66 S. Chen
- Update velocity:
velocity = w * velocity + cognitive_component + social_component
Update position:
- Update position:
position = position + velocity
Evaluate fitness:
- Evaluate the stability of the new position using the landscape design objective function.
Update local best:
- Update local best position if current stability is better than previous.
Update global best:
- Identify the search agent with the best architecture stability among all levels.
Return global best as the optimal solution.
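A minimal Python sketch of the velocity and position updates described in Algorithm 1 is given below. It follows the cognitive/social update equations of the pseudocode on a generic objective function; the sphere-like test objective, the parameter values and the omission of the hierarchical level classification are simplifying assumptions of this example, not part of the paper's method.

import numpy as np

def optimize(stability, dim=2, n=20, max_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Velocity/position update loop of Algorithm 1 (hierarchical levels omitted):
    each agent is pulled toward its local best and the global best position."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n, dim))       # initial search agents
    vel = np.zeros((n, dim))
    p_local = pos.copy()
    f_local = np.array([stability(p) for p in pos])
    g = p_local[f_local.argmin()].copy()               # global best position

    for _ in range(max_iter):
        # cognitive_component = c1 * rand() * (p_local - position)
        # social_component    = c2 * rand() * (p_global - position)
        vel = (w * vel
               + c1 * rng.random((n, 1)) * (p_local - pos)
               + c2 * rng.random((n, 1)) * (g - pos))
        pos = pos + vel                                 # position update
        f = np.array([stability(p) for p in pos])       # evaluate stability
        improved = f < f_local                          # update local bests
        p_local[improved], f_local[improved] = pos[improved], f[improved]
        g = p_local[f_local.argmin()].copy()            # update global best
    return g, stability(g)

if __name__ == "__main__":
    best, value = optimize(lambda p: float(np.sum(p ** 2)))  # placeholder objective
    print("best design parameters:", best, "objective:", value)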
3.3 A survey of the computational methods of two-dimensional fractal graphs

Since the emergence of the two-dimensional fractal graph, it has attracted wide attention for its unique advantages; with the development of technology, people's understanding of it has become more and more profound, its performance keeps improving, and it has been widely used in many fields. This is especially true for identification tasks, and in particular for the landscape. The 2D fractal graph has many advantages; the key point is weight sharing which, as in a biological neural network, reduces the number of weights and thus the difficulty of network modeling. Compared with the conventional approach, it also saves a lot of tedious preprocessing, such as reconstructing the related data.
Among the numerous hierarchical networks, the 2D fractal graph is one of the most widely used. Ahead of BP, the method can effectively reduce the training parameters and thus greatly improve the performance of the algorithm. The 2D fractal graph method can effectively shorten the preparation of the input, saving working time for the user and reducing the workload. Layer by layer, with a new way of computing, each kind of new data is added into the new system, and this method can be applied in many respects.
The formation of architectural form is mainly an innovative embodiment of the relationship between the spatial structure and the architectural structure of the building site. The landscape structure is a unique kind of landscape architecture, and its structure has a high degree of similarity and a high degree of convergence. This makes it possible for the environment and the structural form in landscape design to communicate directly, not mechanically or indirectly adapting one to the other, but being integrated into the purpose of the landscape design behavior. Although the message contained in each place is different, its essence is to seek a symbiotic relationship between human and nature. Its inner essence is palpable and obvious, such as the use of local materials and traditional crafts. The internal expression is a spiritual guidance for designers to convey and express the inner meaning of the place by accurately grasping the characteristics of the scene.
Its meaning includes two levels: first, designers use the power of nature to show a landscape structure with deep humanistic characteristics and spiritual emotions, and make its spatial connotation appear and continue according to their own experience. Due to the addition of the landscape architecture structure, a new artistic spirit and cultural connotation are added, so that the symbol has the natural artistic spirit of the place. Whatever the meaning, it shows that the landscape architecture structure is a man-made intermediary between people and places, and the essence of its design is to integrate human thought and subjective will into the information of places. Under the guidance of the design, people often naturally associate or recall, realize the intention of the designer, and thus resonate with it.
The 2D fractal graphic method belongs to the category of deep learning; its structural characteristics are similar to those of deep learning, having both locality and a hierarchical organization. The method adopts a kind of supervised training, which allows it to extract more accurately the information contained in the input data. This method can improve the learning efficiency of the 2D fractal graph.
In practical applications there is usually no classification marking, so it is necessary first to perform unsupervised learning on the data according to the characteristics of the label information itself in order to obtain rules, and then to learn from the labelled supervision data; this both makes full use of the samples and copes with the situation in which labelled information is scarce.

Figure 3: Computational architecture and general formula of two-dimensional fractal graphs

As can be seen in Figure 3, the input level is a neural network; "+1" refers to a point known as a split (bias) point. The system has three kinds of structure: input, output and hidden. There is only one output; the input is on the left, the output is on the right, and in the center is the hidden part, which is fully connected. The hidden data, including the data on the hidden nodes, cannot be displayed during training.
In the whole network structure, n_l denotes the number of layers, and the network in the figure contains three layers; the last layer is the output layer. The parameters can be written as:

(W, b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)})   (5)

W_{ij}^{(1)} is the connection parameter between the j-th unit of layer l and the i-th unit of layer l+1, and b_i^{(l)} is the bias of the i-th unit in layer l+1. The neural network computation uses one of the units as the output; given the known input and the two parameter groups, the function h_{W,b}(x) can be used for prediction, thus producing the final result and output, where a_i^{(2)} denotes the activation of the i-th hidden unit. The specific calculation steps are as follows:

a_1^{(2)} = f(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)})
a_2^{(2)} = f(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)})
a_3^{(2)} = f(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)})
h_{W,b}(x) = a_1^{(3)} = f(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)})   (6)

The forward calculation principle is shown in Figure 3. The neural network structure and the Logistic model structure are nearly the same in both calculation method and principle, but the biggest difference between them is the convolution in the neural network. Suppose the 2D fractal graph algorithm handles data of a given size: the width of the image is denoted by w and its height by h, the two forming a plane, and the number of image colour channels is denoted by d, giving a volume of h * w * d.
The basic network is formed by convolution and pooling layers, and these two basic structures take local inputs as their unit, so the 2D fractal graph algorithm keeps the structure invariant under image translation; in other words, an element is only related to its position in space. If the data vector at given coordinates in one layer is known, the data vector of the next layer is computed by the following formula:

Y_{ij} = f_{ks}(\{X_{si+\delta i,\, sj+\delta j}\}), \quad 0 \le \delta i, \delta j \le k   (7)

In the above formula, k is the size of the convolution kernel, s is the step length (stride), and f_{ks}() is the function defining the adopted operation, which in general can be a matrix convolution, average pooling, a nonlinear excitation function, or a maximum-value operation such as max pooling.
When the parameters satisfy the corresponding conditions, the principle of mutual transformation between such operations can be represented as:

f_{ks} \circ g_{k's'} = (f \circ g)_{k'+(k-1)s',\, ss'}   (8)

When dealing with nonlinear equations, the general 2D fractal graph algorithm very commonly adopts the method of solving with nonlinear filters, which is fully adopted by the two-dimensional fractal graph. A fully convolutional net can be used to express the two-dimensional fractal graph network, and because its output image corresponds to its input image, the size of the input is not restricted.
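The layer rule of Eq. (7) can be illustrated with a short, self-contained sketch: a generic sliding-window operator with kernel size k and stride s, instantiated once as average pooling and once as max pooling. The toy input and the choice of these two instantiations are assumptions made purely for illustration.

import numpy as np

def layer_op(x, k, s, f_ks):
    """Eq. (7): Y[i, j] = f_ks({X[s*i + di, s*j + dj] : 0 <= di, dj < k})."""
    h, w = x.shape
    out_h, out_w = (h - k) // s + 1, (w - k) // s + 1
    y = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = x[s * i: s * i + k, s * j: s * j + k]
            y[i, j] = f_ks(window)
    return y

if __name__ == "__main__":
    x = np.arange(36, dtype=float).reshape(6, 6)       # toy feature map
    avg_pool = layer_op(x, k=2, s=2, f_ks=np.mean)     # average pooling instance
    max_pool = layer_op(x, k=2, s=2, f_ks=np.max)      # max pooling instance
    print(avg_pool.shape, max_pool.shape)              # both (3, 3)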
3.4 Evaluation indexes of the garden landscape element level based on the two-dimensional fractal graph algorithm

In the process of semantic classification of various scenes, it is sometimes impossible to differentiate each scene, so a scene may be classified into other scene types, causing a somewhat fuzzy effect. This paper adopts a new method: the images are compared with the real world, and the classification results obtained are digitally processed and judged as the final result for the landscape image. The commonly used segmentation criteria are adopted here for the statistics of accuracy.
In this paper, n_ij denotes the number of pixels whose true semantics belong to class i and which are judged to belong to class j, n_cl denotes the total number of categories, and t_i = \sum_j n_{ij} is the total number of pixels of class i. The overall accuracy is calculated with the following formula:

\frac{\sum_i n_{ii}}{\sum_i t_i}   (9)

The per-class accuracy, which measures how many pixels belonging to a given scenery class are correctly assigned to that class, can be expressed as:

\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i}   (10)

The average IU (mean intersection over union) is obtained by calculating, for each landscape element category, the ratio of the correctly predicted scenery pixels to the union of the predicted pixels and the pixels of the original category; the result is the final discriminant index, which can be expressed by the formula:

\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}   (11)
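The three indexes in Eqs. (9)-(11) can be computed directly from the class confusion matrix n_ij; the short Python sketch below does so for an arbitrary pair of ground-truth and predicted label maps (the toy 3-class example is an assumption for illustration only).

import numpy as np

def segmentation_metrics(gt, pred, n_cl):
    """Pixel accuracy (9), mean per-class accuracy (10) and mean IU (11)
    computed from the confusion matrix n[i, j] = #pixels of class i predicted as j."""
    n = np.zeros((n_cl, n_cl), dtype=np.int64)
    np.add.at(n, (gt.ravel(), pred.ravel()), 1)

    t = n.sum(axis=1).astype(float)                  # t_i = sum_j n_ij
    diag = np.diag(n).astype(float)                  # n_ii
    pixel_acc = diag.sum() / t.sum()                 # Eq. (9)
    mean_acc = np.nanmean(diag / t)                  # Eq. (10)
    union = t + n.sum(axis=0) - diag                 # t_i + sum_j n_ji - n_ii
    mean_iu = np.nanmean(diag / union)               # Eq. (11)
    return pixel_acc, mean_acc, mean_iu

if __name__ == "__main__":
    gt = np.array([[0, 0, 1], [2, 1, 1], [2, 2, 0]])
    pred = np.array([[0, 1, 1], [2, 1, 1], [2, 0, 0]])
    print(segmentation_metrics(gt, pred, n_cl=3))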
4 Result analysis

4.1 The practical application of the two-dimensional fractal figure

After a long period of improvement and development of 2D fractal graph network training, there are two main ways of training, one of which is training in the presence of supervision. This paper uses a supervised learning algorithm: the original image and its corresponding manually segmented image are used in the modeling of the 2D fractal graph network [14].
However, the influence ratio of each factor of the actual landscape architecture structure on the landscape architecture structure is not the same, and the aspects have some kind of mutual coupling relationship. For example, when the landscape building structure is opened to the light environment, attention must be paid to the structural performance to ensure the stability and constructability of the structure. From this it can be seen that the combined effects of building characteristics and lighting conditions are very complex and some even contradict each other. For another example, the average sunshine of the structure is strongly related to factors such as the vertical scale, but in a specific design the coupling of sunlight with the horizontal and vertical scales is difficult to handle.
Supervised learning is a kind of machine learning whose technical feature is the ability to learn a mapping function. During training, every sample has a target and a desired output, which is what is called "supervision". The supervision technique refers to comprehensively analysing the input data at runtime and obtaining the corresponding mapping, so as to process a new set of sampled data: if new data of the kind generated before arrives, the model labels these new samples with categories. In the algorithms for two-dimensional fractal graphs, guided learning algorithms are usually used for training, while supervised learning is usually based on gradients (Krizhevsky et al, 2012). Batch stochastic gradient descent methods are commonly used. In describing the learning process of the two-dimensional fractal graph, we use only one example to simplify the description. The method is divided into two stages: a forward stage and a reverse stage. The first stage is carried out in turn until the final result is produced; in the second stage, the weights and biases are adjusted according to the error of the output, and after the operation ends the weights and biases of each level are updated accordingly. If c is the number of classes for a sample during classification, its error function formula is as follows:

J(W, b; x, y) = \frac{1}{2} \sum_{k=1}^{c} (t_k - y_k)^2 = \frac{1}{2} \lVert t - y \rVert^2   (12)

Here W represents the weights in the neural network, b represents the biases in the neural network, the training sample is represented by x, and the corresponding label of the training sample is represented by y. t_k denotes the k-th dimension component of the predicted value generated when predicting sample x, and y_k represents the k-th dimension component of the label of the sample to be predicted.
When doing back propagation, the first thing to do is to calculate the error terms at each level in a certain order. Suppose that the error term \delta^{(l+1)} of the (l+1)-th layer has been calculated according to the above formula, the weight of this layer is W and the bias parameter is b. If both layers are fully connected, the error term of the l-th layer can be calculated using the following formula:

\delta^{(l)} = ((W^{(l)})^T \delta^{(l+1)}) \cdot f'(z^{(l)})   (13)

The corresponding gradient calculation formulas are as follows:

\nabla_{W^{(l)}} J(W, b; x, y) = \delta^{(l+1)} (a^{(l)})^T, \quad \nabla_{b^{(l)}} J(W, b; x, y) = \delta^{(l+1)}   (14)

If the l-th layer is a feature extraction stage, that is, a convolution layer followed by a sampling layer, then the error term of the l-th layer can be calculated by the formula:

\delta_k^{(l)} = \mathrm{upsample}((W_k^{(l)})^T \delta_k^{(l+1)}) \cdot f'(z_k^{(l)})   (15)

Here the subscript k indexes the k-th convolution kernel. After a series of runs in the upsampling layer, the error obtained can be transmitted to the previous layer through the subsampling layer, which refers back to the convolution layer. If average sampling is used, the sampling layer assigns the simple average of the error to the sub-zones processed before sampling; if max sampling is used, then during forward propagation the sample location that produced the maximum value receives all of the error, and the remaining values are 0 [15].
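As a small illustration of Eqs. (12)-(14) (not of the convolutional case in Eq. (15)), the following Python sketch runs one forward pass and one backward pass of a fully connected two-layer network with the squared-error loss; the layer sizes, sigmoid activation and random initialisation are assumptions chosen only to keep the example short.

import numpy as np

def f(z):          # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):    # derivative of the sigmoid
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x = rng.random(3)                      # input sample
t = np.array([1.0])                    # target label y in Eq. (12)

W1, b1 = rng.random((3, 3)), rng.random(3)     # layer 1 parameters
W2, b2 = rng.random((1, 3)), rng.random(1)     # layer 2 parameters

# Forward stage (the Eq. (6)-style computation)
z2 = W1 @ x + b1
a2 = f(z2)
z3 = W2 @ a2 + b2
y = f(z3)                              # network prediction h_{W,b}(x)

loss = 0.5 * np.sum((t - y) ** 2)      # Eq. (12)

# Reverse stage: output-layer error, then Eq. (13) for the hidden layer
delta3 = (y - t) * f_prime(z3)
delta2 = (W2.T @ delta3) * f_prime(z2)             # Eq. (13)

grad_W2 = np.outer(delta3, a2)                     # Eq. (14): delta^{(l+1)} (a^{(l)})^T
grad_b2 = delta3
grad_W1 = np.outer(delta2, x)
grad_b1 = delta2
print("loss:", loss, "||grad_W1||:", np.linalg.norm(grad_W1))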
The accuracy of landscape classification reflects whether specific types in the landscape have been correctly identified and divided. As shown in Table 3, the classification of landscape scenery has relatively low accuracy, and the classification accuracy of natural landscape is higher for the 2D fractal network-16s than for the network-8s. It can be seen that the two-dimensional fractal graph network-16s has the best overall performance in the classification of habitat landscape in landscape images. Finally, three values, the pixel accuracy, the average accuracy and the average IU of the image pixels, are tested; the final values are shown in Table 2.

Table 2: Results of three kinds of upsampled semantic segmentation on landscape pixels

Scene Elements | Pixel accuracy (%) | Average accuracy (%) | Average IU (%)
FCN-8s  | 86.25 | 85.98 | 74.33
FCN-16s | 88.97 | 88.58 | 75.35
FCN-32s | 83.06 | 82.75 | 72.69

4.2 Optimization of landscape architecture planning and design and expansion of the two-dimensional fractal graph

Through the detection and classification of 200 pictures, the correctness of the landscape element values in the landscape garden scenes is obtained, as shown in Table 3 below, which intuitively reflects the performance of three different upper sampling structures. The accuracy of landscape element classification reflects whether specific types of landscape pictures can be correctly identified and divided. The pictures are divided into six categories, namely water scene, landscape scene, living scene, sky scene, architecture and transportation; water scenes include water surface, river and mountain, and the other categories include vegetation, sky, architecture, traffic, etc. It can be seen from Table 3 that the classification accuracy of the two-dimensional fractal graph network-32s is the worst among all classifications. The classification accuracy of the network-16s is higher than that of the network-8s, while the accuracy of the network-8s is in turn higher than that of the network-32s. Therefore, the two-dimensional fractal graph network-8s also has a good effect on the landscape quality classification of landscape images. On the two-dimensional fractal graph network-8s, the "sky view" class has the best classification rate, but in the real world it has the lowest classification rate, only occasionally appearing in something similar to the real scene.

Table 3: Classification accuracy of the three networks on landscape elements

Categories | FCN-8s (%) | FCN-16s (%) | FCN-32s (%)
The surface of the water | 91.89 | 90.32 | 87.85
The mountain | 87.66 | 86.35 | 82.43
Vegetation | 85.94 | 88.28 | 83.12
The sky | 93.56 | 90.65 | 88.45
Building | 88.29 | 87.56 | 84.68
The traffic | 86.12 | 85.08 | 83.28

Table 3 shows the exact ratios of the two-dimensional fractal graph network-8s, network-16s and network-32s. From these results we can know the pixel accuracy, average accuracy and average IU value of the three kinds of upper sampling.
Through the comparison of the three different upper sampling modes, it is concluded that the two-dimensional fractal graph net-8s reached the best level in pixel accuracy, average accuracy, average IU, etc., with the pixel accuracy, average accuracy and average IU reaching as high as 100%. The average accuracy is lower than the pixel accuracy, as it is calculated from the data of each classification of the image; too much data lowers the average accuracy and the average IU.

Table 4: Classification of landscape planning and design based on the two-dimensional fractal graph generation algorithm

Classification | FCN-8s | FCN-16s | FCN-32s
Ecological Sustainability (0-1) | 0.86 | 0.79 | 0.72
Aesthetic Appeal (0-1) | 0.92 | 0.88 | 0.81
Resource Efficiency (0-1) | 0.84 | 0.79 | 0.82
Robustness | 0.94 | 0.90 | 0.89

Figure 4: Outcome of landscape planning and design based on the two-dimensional fractal graph generation algorithm

The landscape planning findings, which were obtained through the use of the 2D fractal network generation algorithm, are shown in Figure 4 and Table 4. The assessments are based on a number of factors, such as ecological sustainability, aesthetic appeal, resource efficiency, and robustness. Ecological sustainability, aesthetic appeal, and resource efficiency are numerical values assigned to each design solution, ranging from 0 to 1 and representing the quality of the design in each respective area. The algorithm generates a wide range of solutions every time, which encourages the exploration of the solution space and makes it possible to identify several different design options. The term "robustness" describes how stable the solutions produced by the 2D fractal network generation algorithm are under different circumstances. The robustness of all trials shows that the algorithm's results are dependable and consistent in the FCN-8s, FCN-16s, and FCN-32s scenarios.

Table 5: Overall performance

Algorithm | Accuracy (%) | Precision (%) | Recall (%)
Traditional Optimization Algorithm | 86.34 | 85.12 | 85.78
Suggested Algorithm | 94.52 | 93.34 | 94.18
Figure 5: Overall performance of the methods

Landscape architecture planning and design is an academic discipline that focuses on the interaction between human habitation and the natural environment. Table 5 and Figure 5 demonstrate that the recommended 2D fractal graph generation approach has the best accuracy (94.52%), precision (93.34%), and recall (94.18%).

5 Discussion

A fractal graph is a complex entity that is formed using recursive iteration rules and can be found in both natural systems and man-made systems, such as cities. It displays intrinsic self-similarity on both large and small scales. The two-dimensional fractal is a scientific technique for measuring aspects of landscape architecture and its evolution in the context of network systems. Furthermore, it is an important metric for determining whether a city is experiencing self-organizational evolution. Previous research results show that self-organized architecture systems have notable fractal properties that can be measured using two-dimensional fractal graphs. Nevertheless, prior research has solely utilized these fractal dimensions to investigate the general fractal properties of the entire city, without carrying out more accurate fractal measurements for subzones in various directions and layers.
The existence of a fractal structure, which acts as an analogy and supplement to earlier research findings on the general fractal laws discovered in other cities, is one of the study's major discoveries. Unlike earlier research, this finding supports the different 2D values that contribute to the spatial variability of separate subzone structures. The cause is the differing paths taken by various subzones in terms of planning and development, as well as the ways in which people, land, and architecture occupy and use space in distinct urban blocks. The box-counting dimension is a measure of spatial occupancy capacity, hence urban expansion will cause it to rise. In other words, the self-organizational objective of urban development is to provide a more balanced and effective distribution of urban space while optimizing the spatial configuration and enabling coordinated growth within its internal areas. By illustrating a progressive decrease in the fractal dimension from the city's center to its periphery, suggesting outward growth, the study contributes to the notion of fractal cities. Furthermore, it shows that pixels with mature landscape architecture have larger fractal dimensions than those that are still developing or are quickly classified as FCN-8s, FCN-16s, or FCN-32s.
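Since the discussion relies on the box-counting dimension as the measure of spatial occupancy, a brief Python sketch of a standard box-counting estimate on a binary occupancy mask is given below; the synthetic mask and the chosen box sizes are assumptions for illustration and do not reproduce the paper's measurements.

import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a binary mask: count occupied
    boxes N(s) for several box sizes s and fit log N(s) ~ -D log s."""
    counts = []
    for s in sizes:
        h, w = mask.shape
        hs, ws = (h // s) * s, (w // s) * s           # crop to a multiple of s
        blocks = mask[:hs, :ws].reshape(hs // s, s, ws // s, s)
        occupied = blocks.any(axis=(1, 3))            # does a box contain any occupied pixel?
        counts.append(max(occupied.sum(), 1))
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope                                     # dimension D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = rng.random((64, 64)) < 0.3                 # synthetic occupancy pattern
    print("estimated box-counting dimension:", round(box_counting_dimension(mask), 2))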
6 Conclusion

In order to solve this problem, we adopt a method based on the two-dimensional fractal graph. Two-dimensional fractal graphs are mainly divided according to the pixel points in landscape architecture. Different from the earlier method, the final step of the 2D fractal graph is to transform the whole fully connected level into a convolutional one, so that the operational architecture of the 2D fractal graph is preserved. The final image is the same size as the original image, and the segmentation of the original image can be obtained. Compared with the traditional two-dimensional fractal graph method, this approach has a higher computational speed and is not constrained by the size of the input image. In this paper, a new image semantic partition method is introduced.
After establishing a virtual environment, the model is trained on the commonly used SiftFlow data on the network. Image preprocessing technology is used to augment the data, thus effectively overcoming the problem of overfitting the model. On this basis, a second stage of learning for the segmentation is carried out to shorten the learning period of the model.
In the image analysis, three different upper sampling methods are adopted: the two-dimensional fractal graph network-32s, network-16s and network-8s. In terms of pixel accuracy, average accuracy and average IU, the upper sampling structure of the 2D fractal graph net-8s is selected, with a pixel accuracy of 90.3%, an average accuracy of 88.91% and an average IU of 75.83%. At the same time, the pixel accuracy of the model for each scene type in the landscape image is more than 86%, indicating that the method is an ideal method for landscape images and that, especially for landscape images containing multiple scene types, it can obtain a higher pixel segmentation accuracy. In this paper, the classification experiment of landscape elements in landscape architecture is carried out, and the proposed algorithm achieves an accuracy of 94.52%.
Declaration statement

Ethics approval and consent to participate

I confirm that all the research meets ethical guidelines and adheres to the legal requirements of the study country.
Consent for publication: I confirm that any participants (or their guardians if unable to give informed consent, or next of kin, if deceased) who may be identifiable through the manuscript (such as a case report) have been given an opportunity to review the final manuscript and have provided written consent to publish.

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests

There are no conflicts of interest to declare. All authors have seen and agree with the contents of the manuscript and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

Funding

Funding: Shanxi Province Project: National Education Research Letter 2021, "Rural Revitalization" Research on Teaching Innovation of Traditional Culture Protection and Tourism Planning in the Background, JGCY2693.
Authors' contributions (individual contribution): All authors contributed to the study conception and design. All authors read and approved the final manuscript.

References

[1] Li X, Li S, Jiao H, 2020. Research on Multi-objective Optimization Method of Central Air Conditioning Air Treatment System Based on NSGA-II[J]. Journal of Physics: Conference Series, 1626:012113. https://doi.org/10.1088/1742-6596/1626/1/012113
[2] He P, Gao F, Li Y, et al, 2020. Research on optimization of spindle bearing preload based on the efficiency coefficient method[J]. Industrial Lubrication and Tribology, ahead-of-print. https://doi.org/10.1108/ILT-06-2020-0205
[3] Hong E, Ban H, Qi M, 2019. Design optimization and analysis of a vaned diffuser based on the one-dimensional impeller-diffuser throat area model[J]. Journal of Physics: Conference Series, 1300:012007. https://doi.org/10.1088/1742-6596/1300/1/012007
[4] Mohammadi M, Raise A, Regi A, 2019. Design and performance optimization of a very low head turbine with high pitch angle based on two-dimensional optimization[J]. Journal of the Brazilian Society of Mechanical Sciences and Engineering, 42(1):9. https://doi.org/10.1007/s40430-019-2084-1
[5] Liu W, Yang S, Ye Z, et al, 2019. An Image Segmentation Method Based on Two-Dimensional Entropy and Chaotic Lightning Attachment Procedure Optimization Algorithm[J]. International Journal of Pattern Recognition and Artificial Intelligence. https://doi.org/10.1142/s0218001420540300
[6] Wang S, Lin S, 2019. Optimization on ultrasonic plastic welding systems based on two-dimensional photonic crystal[J]. Ultrasonics, 99:105954. https://doi.org/10.1016/j.ultras.2019.105954
[7] Hu H Q, Yang L, 2013. Research on Routing Optimization of Regional Logistics Based on Gravity Model: A Case of Blue and Yellow Zones[J]. iBusiness, 5(4):167-172. https://doi.org/10.4236/ib.2013.54021
[8] Diaz-Casas V, Becerra J A, Lopez-Pena F, et al, 2013. Wind turbine design through the evolutionary algorithms based on surrogate CFD methods[J]. Optimization and Engineering, 14(2):305-329. https://doi.org/10.1007/s11081-012-9187-1
[9] Liu Y, Wan M, Zhang H K, et al, 2011. Research on Data Reconstruction Method Based on Identifier Locator Separation Architecture[J]. Journal of Internet Technology, 12(4):531-539. https://doi.org/10.6138/JIT.2011.12.4.01
[10] Griffin A, 1997. PDMA Research on New Product Development Practices: Updating Trends and Benchmarking Best Practices[J]. Journal of Product Innovation Management, 14(6):429-458. https://doi.org/10.1016/s0737-6782(97)00061-1
[11] Ding L, Guo T, Lu Z, 2015. A Hybrid Method for Dynamic Mesh Generation Based on Radial Basis Functions and Delaunay Graph Mapping[J]. Advances in Applied Mathematics & Mechanics, 7(03):338-356. https://doi.org/10.4208/aamm.2014.m614
[12] Hatamzadeh-Varmazyar S, Masouri Z, 2011. Numerical method for analysis of one- and two-dimensional electromagnetic scattering based on using linear Fredholm integral equation models[J]. Mathematical & Computer Modelling, 54(9-10):2199-2210. https://doi.org/10.1016/j.mcm.2011.05.028
[13] Lucas S D, Vega J M, Velazquez A, 2015. Aeronautic Conceptual Design Optimization Method Based on High-Order Singular Value Decomposition[J]. AIAA Journal, 49(12):2713-2725. https://doi.org/10.2514/1.j051133
[14] Zhu X, Niu D, Wang F, et al, 2018. Operation Optimization Research of Circulating Cooling Water System Based on Superstructure and Domain Knowledge[J]. Chemical Engineering Research and Design, 142. https://doi.org/10.1016/j.cherd.2018.12.012
[15] Hu G C, Liu J H, 2011. The Optimization Design of Mechanical Structure Based on CAE Technology[J]. Machine Design & Research, 130-134:672-676. https://doi.org/10.4028/www.scientific.net/amm.130-134.672
https://doi.org/10.31449/inf.v49i16.5869 Informatica 49 (2025) 67–76 67
Enhanced COVID-19 Detection Through Combined Image
Enhancement and Deep Learning Techniques
Abderrazak Benchabane*, Fella Charif
Department of electronics and telecommunications, University of Kasdi Merbah, Ouargla, Algeria
E-mail : benchabane.abderrazak@univ-ouargla.dz, cherif.fella@univ-ouargla.dz
*Corresponding author
Keywords: COVID-19, image enhancement, chest x-ray images, deep learning
Received: March 6, 2024
The rapid spread of COVID-19 has highlighted the need for automated patient data analysis to enable
faster and more accurate diagnosis. Using pre-trained deep learning models on X-ray images has
shown potential for effective COVID-19 detection. However, the performance of these models is highly
dependent on the quality and quantity of training data. To address these challenges, enhancing the
visual quality of X-ray images is critical for reliable virus detection. This study evaluates and combines
three image enhancement techniques—Histogram Equalization, Contrast-Limited Adaptive Histogram
Equalization (CLAHE), and Gamma Correction—to determine the optimal approach for improving
detection accuracy. A dataset comprising 125 chest X-ray images from COVID-19-positive patients and
500 images from non-COVID-19 cases was used. The images were preprocessed using the enhancement
techniques, and the enhanced datasets were employed to train ResNet50 and DenseNet201 models.
Simulation results demonstrate that enhanced images consistently yield higher detection accuracy than
unenhanced images. Among the techniques tested, combining Histogram Equalization, CLAHE, and
Gamma Correction with the DenseNet201 model achieved the highest performance, attaining a
remarkable accuracy of 99.03%. This outperforms previous methods, including the DarkCovidNet
model, which achieved an accuracy of 98.08% on the same dataset.
Povzetek: Avtorja sta izboljšala zaznavanje COVID-19 iz rentgenskih slik prsnega koša z uporabo
tehnik izboljšave slike (Histogram Equalization, CLAHE, Gamma Correction) v kombinaciji z modeli
globokega učenja (ResNet50, DenseNet201).
1 Introduction

Corona disease is currently considered one of the most widespread, dangerous and fastest-spreading diseases, so it is necessary to find ways and methods to detect infected cases and diagnose them in the fastest and clearest way. RT-PCR is a nuclear-derived technique that detects the presence of genetic material specific to a pathogen, including a virus. A formal diagnosis of COVID-19 requires a laboratory test (RT-PCR) of nose and throat samples and takes at least 24 hours to produce a result. Nowadays, medical images and computerized analysis have become very important tools for medical diagnosis and disease detection [1]. The radiology images show typical COVID-19 pneumonia in the lungs and the numerous complications that the virus causes in the body. The radiology imaging modalities include computed tomography (CT), radiograph X-rays, ultrasound, echocardiograms and magnetic resonance imaging (MRI). These imaging modalities optimize and greatly facilitate the process of discovering affected areas in the body [2]. Chest X-ray tests are easily available and have a low risk of radiation. On the other hand, CT scans have a high risk of radiation, are expensive, need clinical expertise to handle and are non-portable. This makes the use of X-ray scans more convenient than CT scans. A radiograph is obtained by exposing a film to X-rays that have passed through the human body. The result is an analog image which is often sufficient to obtain a reliable diagnosis and for low-cost screening. Various studies have indicated the failure of CXR imaging in diagnosing COVID-19 and differentiating it from other types of pneumonia [3]. The radiologist cannot use X-rays to detect pleural effusion and determine the volume involved. However, regardless of the low accuracy of X-ray diagnosis of COVID-19, it remains widely used. To overcome the limitations of COVID-19 diagnostic tests using radiological images, various studies have been conducted on the use of deep learning (DL) in the analysis of radiological images [2-11]. It has also been shown that image enhancement techniques can significantly improve classification performance [12, 13].

1.1 Contribution

In this paper, we investigate the impact of using image enhancement techniques as a preprocessing step to improve the accuracy of convolutional neural network (CNN) models for COVID-19 detection. Specifically, histogram equalization, Contrast Limited Adaptive Histogram Equalization (CLAHE), and gamma correction were applied to enhance chest X-ray (CXR) images before training.
The enhanced images significantly improved the visibility of key diagnostic features, such as ground-glass opacities and consolidations, which are critical for accurate COVID-19 diagnosis. The proposed preprocessing pipeline was evaluated on a challenging COVID-19 dataset with an imbalanced number of samples for the COVID and non-COVID classes. Experimental results demonstrated that the enhanced images led to a notable improvement in the classification performance of CNN models, achieving higher accuracy, sensitivity, and specificity compared to using raw images.
The rest of the paper is organized as follows: the Materials and methods section contains details about our proposed technique along with some context about the state-of-the-art models that we have used. The Results and discussion section presents the experimental results, including the classification accuracy, sensitivity, and F1-score obtained from the proposed work. The paper closes with a conclusion.

1.2 Related works

Numerous studies have applied advanced artificial intelligence (AI) techniques, particularly deep learning (DL) and machine learning (ML), to detect COVID-19 using X-ray images. Zhang et al. [14] developed an anomaly detection algorithm with EfficientNet for multiclass classification, achieving an accuracy of 72.77% on 43,370 samples. Deng et al. [15] employed models such as SVM, CNN, ResNet50, InceptionNetV2, Xception, and VGG16 to assess health status through X-ray imaging, obtaining an accuracy of 84% using 5,857 samples. Wang et al. [16] introduced a COVID-19 X-ray image detection model based on the multi-head self-attention mechanism and residual neural network, achieving 95.52% accuracy with 5,173 samples.
Transfer learning has also played a pivotal role in COVID-19 detection. Apostolopoulos et al. [17] utilized pre-trained models like VGG19, Inception ResNet v2, and MobileNet v2, achieving 96.78% accuracy on 1,427 samples for COVID-19 classification. Mahmoud et al. [18] applied the CovXNet architecture, achieving 97.4% accuracy on 610 samples. Mohit Kumar et al. [19] utilized a hybrid deep learning approach for multiclass classification, achieving 98.20% accuracy on 6,000 samples.
Several studies focused on binary classification tasks with high accuracy. Guefrechi et al. [20] achieved 97.20% accuracy using deep learning methods on 5,000 images. Feki et al. [21] employed a deep CNN model for binary classification, reaching an accuracy of 95.30% on 216 images. Mohan et al. [22] used a hybrid deep transfer learning CNN model achieving 92% accuracy with 9,220 images. Malik et al. [23] applied deep neural networks for multiclass classification, attaining 98.45% accuracy on 10,017 images. Gulmez [24] explored Xception and genetic algorithms for multiclass classification, reporting an accuracy of 92.4% on 1,251 images. Lastly, Zakariya et al. [25] proposed to combine the Xception, VGG-16, and VGG-19 models, achieving an accuracy of 97.91% using 964 images.
Table 1 provides a summary of various research studies focusing on state-of-the-art models for COVID-19 detection using AI and ML techniques.
Table 1: Summary of related works on COVID-19 detection
Source | Method/Model | Samples used | Accuracy (%)
[14] | EfficientNet | 43,370 | 72.77
[15] | SVM, CNN, ResNet50, Xception, VGG16 | 5,857 | 84.00
[16] | MHSA-ResNet neural network model | 5,173 | 95.52
[17] | VGG19, Inception ResNet v2, and MobileNet v2 | 1,427 | 96.78
[18] | CovXNet | 610 | 97.40
[19] | Hybrid deep learning approach | 6,000 | 98.20
[20] | Deep Learning (ResNet50) | 5,000 | 97.20
[21] | Deep CNN (Centralized-ResNet50) | 216 | 95.30
[22] | Deep Transfer Learning | 9,220 | 92.00
[23] | Deep Neural Networks | 10,017 | 98.45
[24] | Xception and Genetic Algorithm | 1,251 | 92.40
[25] | Xception + VGG-16 + VGG-19 | 964 | 97.91
2 Materials and methods

2.1 Dataset generation
The dataset of chest X-ray images used in this paper for classifying negative and positive COVID-19 cases is available at https://github.com/muhammedtalo/COVID-19. It contains 125 chest X-ray images of patients infected with the virus and 500 chest X-ray images of non-COVID-19 cases. The data is divided into 2 classes; 50% of the images were used for training and 50% for testing. Figure 1 shows some samples that have been used in our simulation [6].
Figure 1: Samples of chest X-ray images from the dataset.

2.2 Image enhancement techniques

Image enhancement is a very important task in image pre-processing. Its aim is to improve the visual details of an image or to provide a transformed representation suitable for use in different fields [4, 11]. In this paper, we have considered the following enhancement techniques.

2.2.1 Histogram equalization

Histogram Equalization (HE) is a technique for adjusting the contrast of an image using the image's histogram. The goal of histogram equalization is to obtain a uniform histogram, which improves contrast [13].

2.2.2 Contrast limited adaptive histogram equalization

Contrast Limited Adaptive Histogram Equalization (CLAHE) was originally developed for the enhancement of low-contrast medical images. The CLAHE algorithm creates non-overlapping contextual regions (also called sub-images, tiles, or blocks), applies histogram equalization to each contextual region, clips the original histogram at a specified value, and redistributes the clipped pixels across the gray levels. The clipping level determines how much noise in the histogram is smoothed and hence how much the contrast is enhanced [13].

2.2.3 Gamma correction

Gamma Correction (GC) is a nonlinear adjustment applied to every pixel value. It alters pixel values according to the mapping between the pixel value and the gamma parameter: to compute the corrected output, the normalized input value is raised to the power of the inverse gamma. The formula is as follows [13]:

\( I_{out} = 255 \left( \dfrac{I_{in}}{255} \right)^{1/\gamma} \)   (1)

Values of \( \gamma < 1 \) shift the image towards the darker end of the spectrum, while \( \gamma > 1 \) makes the image appear lighter; \( \gamma = 1 \) has no effect on the input image.
The application of histogram equalization (HE), Contrast Limited Adaptive Histogram Equalization (CLAHE), gamma correction, and their combinations significantly improves the quality of COVID-19 X-ray images, aiding better feature visualization and extraction (see Figure 2).

Figure 2: X-ray image processed with various image enhancement techniques.

Histogram equalization enhances global contrast, making subtle abnormalities more visible, while CLAHE adaptively improves local contrast, preserving fine details and reducing noise amplification. Gamma correction adjusts image brightness non-linearly, enhancing low-intensity features like ground-glass opacities. When combined, these techniques provide a comprehensive enhancement by leveraging global and local adjustments, ultimately producing images with improved visibility of critical diagnostic features. This preprocessing step enhances the performance of convolutional neural networks (CNNs) by supplying higher-quality inputs, resulting in superior COVID-19 detection accuracy and robustness.

2.3 Pre-trained CNN

Two different CNN models (ResNet50 [26] and DenseNet201 [27]) were compared separately, using eight different image enhancement configurations, for the classification of COVID-19 versus non-COVID images, in order to investigate the effect of image enhancement on COVID-19 detection.

2.4 Performance metrics

In order to evaluate the performance of each deep learning model, the following metrics were applied in this study [13, 28]:

\( Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN} \)   (2)

\( Sensitivity = \dfrac{TP}{TP + FN} \)   (3)

\( Specificity = \dfrac{TN}{TN + FP} \)   (4)
\( F1\text{-}score = \dfrac{2TP}{2TP + FP + FN} \)   (5)

where:
True Positive (TP): the prediction is COVID and the image is COVID.
True Negative (TN): the prediction is non-COVID and the image is non-COVID.
False Positive (FP): the prediction is COVID and the image is non-COVID.
False Negative (FN): the prediction is non-COVID and the image is COVID.
2.5 Methodology

Firstly, we train and test two pre-trained convolutional neural networks with the original CXR images, and then we repeat the same operation with the same images enhanced using the techniques described above. The main experiments carried out in this study combine the enhancement methods: HE and CLAHE; CLAHE and GC; HE and GC; and finally CLAHE, HE and GC (see Figure 2). For each combination, we compute the four performance metrics. The detailed methodology adopted in the study is shown in Figure 3.
Figure 3: Flowchart of the proposed method.
3 Results

Firstly, the X-ray images were enhanced using the different techniques mentioned above. The image sets are formed either from the original images (without enhancement), from images enhanced by a single technique (HE, CLAHE, or GC), or from combinations of these, which allows us to build 8 databases. Secondly, we train two pre-trained networks, ResNet50 and DenseNet201, for detecting COVID-19 in chest X-ray scan images. The last fully connected layer of each pre-trained network was modified to classify two classes: COVID-19 positive and negative. For both pre-trained networks, the learning rate was set to 0.0003, while the validation frequency was set to every 5 steps to track model performance. The maximum number of epochs was limited to 6, and the minimum batch size was set to 10. The Adam optimizer and the cross-entropy loss function were chosen. Additionally, data augmentation techniques were applied, including random rotations (-10, 10), random horizontal and vertical shifting (-30, 30), and random scaling (0.5, 1.1). Performance metrics were evaluated using 10 repeated cross-validation runs, each processing randomly selected image sets for training and testing.
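A hedged PyTorch sketch of a fine-tuning setup with the hyperparameters stated above is given below (DenseNet201 shown; ResNet50 is analogous via its `fc` layer). The dataset path, the exact parameterization of the augmentation, and the choice of framework are assumptions; the original experiments may have been implemented differently, and the validation-frequency setting is omitted here.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Augmentation roughly matching the stated settings: rotation in (-10, 10),
# shifts of up to ~30 px and scaling in (0.5, 1.1). translate is expressed as
# a fraction of the image size here, which is only an approximation.
train_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=10, translate=(0.13, 0.13), scale=(0.5, 1.1)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=train_tf)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=10, shuffle=True)

model = models.densenet201(weights=models.DenseNet201_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 2)  # COVID vs. non-COVID

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for epoch in range(6):                      # maximum of 6 epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```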
3.1 Results without enhancement

In this part, we train and test the two networks using images without any enhancement. DenseNet201 and ResNet50 achieved an accuracy of 98.08% and 98.35%, respectively. The confusion matrices constructed from the test evaluation results are shown in Figure 4.

3.2 Results with enhancement

The confusion matrices in Figure 5 illustrate the performance of the DenseNet201 and ResNet50 models with different image enhancement techniques. In the case of DenseNet201 using the CLAHE and GC techniques, a high accuracy of 99.68% was achieved with only 1 misclassification out of 312 samples.
Figure 4: Confusion matrices for DenseNet201 and ResNet50 without enhancement.

Figure 5: Confusion matrices for DenseNet201 and ResNet50 with enhancement.

The model demonstrates perfect recall (100%) for detecting COVID cases and a very high precision of 98.4%, indicating its ability to correctly identify COVID with minimal false negatives. Similarly, for NON-COVID cases, the recall and precision are near-perfect at 99.6% and 100%, respectively, showing excellent discrimination between the classes.

On the other hand, the ResNet50 model, utilizing HE, CLAHE, or a combination of CLAHE with HE or GC, demonstrates excellent performance with an accuracy of 99.36%. Out of 312 samples, the model correctly classifies 62 COVID cases and 248 NON-COVID cases, with only 2 misclassifications: 2 false positives (NON-COVID predicted as COVID) and no false negatives (COVID misclassified as NON-COVID). The recall for COVID detection is perfect at 100%, indicating that no COVID cases were missed, while the precision is slightly lower at 96.9% due to the false positives. These preprocessing techniques enhance image contrast and normalize brightness, aiding the model's ability to discriminate between the classes effectively.

4 Discussion

The performance metrics reported in Tables 2 and 3 provide a comprehensive comparison of the different image enhancement techniques applied to the ResNet50 and DenseNet201 models. The analysis includes accuracy, sensitivity, specificity, and F1-score, along with their respective standard deviations.

From Table 2, it is evident that histogram equalization (HE) and contrast-limited adaptive histogram equalization (Clahe) generally improve the classification performance compared to the original images. The highest accuracy (98.17% ± 0.64) is achieved using the HE enhancement technique, with an F1-score of 98.86% ± 0.40. Clahe also performs well, achieving an accuracy of 98.07% ± 0.93 and the highest F1-score of 98.81% ± 0.57. This indicates that contrast enhancement techniques effectively highlight important features in the images, leading to improved model performance. However, combinations of enhancement techniques such as HE+GC and Clahe+GC do not consistently outperform individual techniques. For instance, HE+Clahe+GC results in a lower accuracy (97.43% ± 0.56) compared to HE alone, though it provides the highest specificity (93.38% ± 4.46). The standard deviation (SD) values suggest that these combined methods may introduce more variability in performance, as seen in the specificity values.

Table 3 demonstrates that the DenseNet201 model generally exhibits higher accuracy than ResNet50 for most enhancement techniques. The best performance is achieved using the combined HE+Clahe+GC technique, yielding an accuracy of 99.03% ± 0.54 and an F1-score of 99.39% ± 0.34. This suggests that DenseNet201 is better at leveraging the enhanced features provided by multi-enhancement approaches. Among individual techniques, HE and Clahe both result in comparable accuracy (97.40% ± 0.82 and 97.30% ± 1.13, respectively), with Clahe producing a slightly higher F1-score of 98.32% ± 0.71. The GC technique results in relatively lower specificity (90.32% ± 2.94) compared to other methods, indicating that while it improves sensitivity, it may not be as effective in distinguishing negative cases.

Comparing the two models, DenseNet201 consistently outperforms ResNet50 across all enhancement techniques, with higher accuracy and F1-score values. The sensitivity of DenseNet201 is slightly lower in some cases but remains competitive. Specificity improvements are more pronounced in DenseNet201, which indicates better handling of false positives. Regarding variability, DenseNet201 exhibits lower standard deviations in most metrics, particularly in accuracy and F1-score, suggesting more stable and reliable performance across different enhancement techniques. Conversely, the ResNet50 model experiences greater variability, especially in its specificity values. These results highlight the importance of image enhancement techniques in improving deep learning model performance. While individual techniques such as HE and Clahe provide significant improvements, combining multiple techniques can further enhance performance, particularly for DenseNet201.

In addition, examining the confidence intervals through the error bars in Figure 6 shows that the DenseNet201 model generally achieves higher accuracy and F1-scores, particularly with the combination of HE, Clahe and GC, while ResNet50 shows better specificity for techniques like GC and HE+GC. Sensitivity remains high for both models, with
overlapping confidence intervals indicating comparable performance.
Table 2: Mean and Standard Deviation (SD) of Performance metrics for the ResNet50 model
Enhancement Accuracy (%) Sensitivity (%) Specificity (%) F1-Score (%)
Techniques Mean SD Mean SD Mean SD Mean SD
Original 96.60 0.52 98.88 1.20 87.41 5.67 97.90 0.31
HE 98.17 0.64 99.40 0.54 93.22 3.20 98.86 0.40
Clahe 98.07 0.93 99.80 0.38 91.12 4.04 98.81 0.57
GC 97.78 0.99 98.40 1.46 95.32 2.68 98.61 0.63
HE+GC 97.62 0.50 98.96 0.82 92.25 3.46 98.53 0.31
HE+ Clahe 98.01 0.78 99.44 0.84 92.25 2.49 98.77 0.49
Clahe+GC 97.75 0.89 98.96 1.16 92.90 3.58 98.60 0.56
HE+Clahe+GC 97.43 0.56 98.44 1.36 93.38 4.46 98.40 0.36
Table 3: Mean and Standard Deviation (SD) of Performance metrics for the DenseNet201 model
Enhancement Accuracy (%) Sensitivity (%) Specificity (%) F1-Score (%)
Techniques Mean SD Mean SD Mean SD Mean SD
Original 96.95 1.08 98.24 0.92 91.77 4.95 98.10 0.66
HE 97.40 0.82 98.48 1.39 93.06 4.56 98.38 0.51
Clahe 97.30 1.13 98.16 1.15 93.87 4.15 98.32 0.71
GC 97.46 0.81 99.24 0.66 90.32 2.94 98.43 0.50
HE+GC 97.62 1.41 99.08 1.10 91.77 5.01 98.52 0.87
HE+ Clahe 98.75 0.63 99.68 0.52 95.00 2.89 99.22 0.38
Clahe+GC 98.87 0.83 99.64 0.66 95.80 2.17 99.30 0.51
HE+Clahe+GC 99.03 0.54 99.56 0.63 96.93 2.45 99.39 0.34
ResNet50 exhibits greater variability across metrics, whereas DenseNet201 provides more stable results. In addition, examining the confidence intervals through the error bars in Figure 6 shows that the DenseNet201 model generally outperforms ResNet50, particularly in sensitivity and F1-score, with significant improvements observed when using combined enhancement techniques. However, DenseNet201 exhibits larger confidence intervals, indicating greater variability in its performance, whereas ResNet50 shows more consistent results with narrower confidence intervals, especially in specificity. This variability suggests that while DenseNet201 may achieve higher performance, its predictions are less stable across trials or datasets.
Figure 6: Confidence intervals for performance metrics of COVID-19 detection models.
The radar chart shown in Figure 7 compares the area under the ROC curve (AUC-ROC) for the two deep learning models across the different image preprocessing techniques. It can be seen that DenseNet201 generally demonstrates higher AUC-ROC values than ResNet50 across most preprocessing techniques, particularly in combinations involving multiple enhancements like HE+Clahe+GC and Clahe+GC. However, ResNet50 shows comparable performance in cases such as HE and GC. The results indicate that preprocessing techniques significantly impact model performance, with DenseNet201 being more responsive to enhancements. This suggests that model selection and preprocessing strategy should be carefully considered to optimize classification performance based on the desired evaluation metric.

Figure 7: Radar plot of the area under the ROC curve (AUC-ROC) performance for ResNet50 and DenseNet201 with various image preprocessing techniques.

5 Comparison with the state-of-the-art CNN approaches

To evaluate the proposed method, we compared it with existing models using COVID-19 X-ray images. Narin et al. [7] proposed three different DL models and achieved 96.1% accuracy on a dataset containing 3,141 chest X-ray images. Ozturk et al. [5] used DarkCovidNet and the same dataset used in this paper, achieving an accuracy of 98.08%. Purohit et al. [6] used a convolutional neural network with augmented data to enlarge the dataset and achieved 99.44% accuracy.

In addition, the proposed enhancement methods show significant improvements in accuracy compared to models trained on other datasets. The DenseNet201 model achieved an accuracy of 99.67%, outperforming models such as that of Feki et al. [21], which reported 95.3% accuracy using a deep CNN, and Guefrechi et al. [20], which achieved 97.20% with a deep learning approach. Similarly, ResNet50 demonstrated high performance with an accuracy of 99.35%, surpassing Apostolopoulos et al. [17] (96.78%) and Mahmud et al. [18] (97.40%). Furthermore, Deng et al. [15] achieved a maximum accuracy of 84.0%, showcasing the substantial improvement offered by the proposed methods. These results demonstrate that the proposed models consistently achieve higher accuracy, emphasizing their reliability and effectiveness in improving COVID-19 detection compared to previously established state-of-the-art models.

6 Conclusion

This paper investigates how image enhancement techniques can improve the performance of pre-trained neural networks when working with limited data. Two pre-trained convolutional neural networks, ResNet50 and DenseNet201, were selected for COVID-19 detection. The training set was constructed by applying various enhancement techniques to chest X-ray images, which highlight critical structures such as lung opacities and consolidations, key features for accurate COVID-19 diagnosis. The results demonstrate that COVID-19 detection accuracy is significantly improved when using enhanced images compared to non-enhanced ones for both pre-trained networks.
Based on metrics such as accuracy, sensitivity, specificity, and F1-score, the best-performing model was DenseNet201, achieving an accuracy of 99.67%, sensitivity of 100%, specificity of 98.38%, and an F1-score of 99.80% for classifying positive and negative cases. When compared to previous studies using the same dataset, DenseNet201 outperforms DarkCovidNet, which achieved 98.08%, underscoring the effectiveness of these enhanced models on real-world X-ray images.
Table 4: Results comparison with related works on COVID-19 detection
Sources Method/Model Samples Accuracy (%)
[7] Inception V3, ResNet50, Inception-ResNet V2 3141 96.1
[5] DarkCovidNet 625 98.08
[6] CNN 1072 99.44
[15] SVM, CNN, ResNet50, Xception, VGG16 5857 84.00
[17] VGG16, VGG19, ResNet, DenseNet, InceptionV3 1427 96.78
[18] COVID-Net 610 97.40
[20] Deep Learning 5000 97.20
[21] Deep CNN 216 95.30
[29] End-to-end CNN 5184 95.70
Proposed ResNet50 with HE 625 99.36
Proposed DenseNet201 with Clahe+HE+GC 625 99.67
References

[1] Janko V, Slapničar G, Dovgan E, Reščič N, Kolenik T, Gjoreski M, Smerkol M, Gams M, Luštrek M. Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19. International Journal of Environmental Research and Public Health, 18(13):6750, 2021. https://doi.org/10.3390/ijerph18136750
[2] Rajkumar S, Rajaraman PV, Meganathan HS, Sapthagirivasan V, Tejaswinee V, Ashwin R. COVID-detect: A Deep Learning Approach for Classification of COVID-19 Pneumonia from Lung Segmented Chest X-rays. Biomedical Engineering: Applications, Basis and Communications, 33(2), 2021. https://doi.org/10.4015/S1016237221500101
[3] Gams M, Kolenik T. Relations between Electronics, Artificial Intelligence and Information Society through Information Society Rules. Electronics, 10(4):514, 2021. https://doi.org/10.3390/electronics10040514
[4] Tahir A, Qiblawey Y, Khandakar A, Rahman T, Khurshid U, Musharavati F, Islam MT, Kiranyaz S, Al-Maadeed S, Chowdhury MEH. Deep Learning for Reliable Classification of COVID-19, MERS, and SARS from Chest X-ray Images. Cognitive Computation, 14:1752–1772, 2022. https://doi.org/10.1007/s12559-021-09955-1
[5] Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine, 121:103792, 2020. https://doi.org/10.1016/j.compbiomed.2020.103792
[6] Purohit K, Kesarwani A, Ranjan Kisku D, Dalui M. COVID-19 Detection on Chest X-Ray and CT Scan Images Using Multi-image Augmented Deep Learning Model. Advances in Intelligent Systems and Computing, 1412:395–413, 2022. http://dx.doi.org/10.1007/978-981-16-6890-6_30
[7] Narin A, Kaya C, Pamuk Z. Automatic Detection of Coronavirus Disease (COVID-19) Using X-ray Images and Deep Convolutional Neural Networks. Pattern Analysis and Applications, 24(3):1207–1220, 2021. https://doi.org/10.1007/s10044-021-00984-y
[8] Sarki R, Ahmed K, Wang H, Zhang Y, Wang K. Automated detection of COVID-19 through convolutional neural network using chest X-ray images. PLoS ONE, 17(1), 2022. https://doi.org/10.1371/journal.pone.0262052
[9] Masud M. A light-weight convolutional Neural Network Architecture for classification of COVID-19 chest X-ray images. Multimedia Systems, 28:1165–1174, 2022. https://doi.org/10.1007/s00530-021-00857-8
[10] Ravi V, Narasimhan H, Chakraborty C, et al. Deep learning-based meta-classifier approach for COVID-19 classification using CT scan and chest X-ray images. Multimedia Systems, 28:1401–1415, 2022. https://doi.org/10.1007/s00530-021-00826-1
[11] Asif S, Zhao M, Tang F, et al. A deep learning-based framework for detecting COVID-19 patients using chest X-rays. Multimedia Systems, 28:1495–1513, 2022. https://doi.org/10.1007/s00530-022-00917-7
[12] Tahir A, Qiblawey Y, Khandakar A, Rahman T, Khurshid U, Musharavati F, Islam MT, Kiranyaz S, Al-Maadeed S, Chowdhury MEH. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132, 2021. https://doi.org/10.1016/j.compbiomed.2021.104319
[13] Kandhway P, Bhandari AK, Singh A. A novel reformed histogram equalization based medical image contrast enhancement using krill herd optimization. Biomedical Signal Processing and Control, 56:101677, 2020. https://doi.org/10.1016/j.bspc.2019.101677
[14] Zhang J, Xie Y, Pang G, Liao Z, Verjans J, Li W, Sun Z, He J, Li Y, Shen C, et al. Viral Pneumonia Screening on Chest X-Rays Using Confidence-Aware Anomaly Detection. IEEE Transactions on Medical Imaging, 40(3):879–890, 2021. https://doi.org/10.1109/tmi.2020.3040950
[15] Deng X, Shao H, Shi L, Wang X, Xie T. A classification–detection approach of COVID-19 based on chest X-ray and CT by using Keras pre-trained deep learning models. Computer Modeling in Engineering & Sciences, 125(2):579–596, 2020. https://doi.org/10.32604/cmes.2020.011920
[16] Wang Z, Zhang K, Wang B. Detection of COVID-19 Cases Based on Deep Learning with X-ray Images. Electronics, 11(21):3511, 2022. https://doi.org/10.3390/electronics11213511
[17] Apostolopoulos ID, Mpesiana TA. COVID-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, 43:635–640, 2020. https://doi.org/10.1007/s13246-020-00865-4
[18] Mahmud T, Rahman A, Fattah SA. CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization. Computers in Biology and Medicine, 122:103869, 2020. https://doi.org/10.1016/j.compbiomed.2020.103869
[19] Mohit K, Dhairyata S, Vinod K, Wanich S. COVID-19 prediction through X-ray images using transfer learning-based hybrid deep learning approach. Materials Today: Proceedings, 51:2520–2524, 2022. https://doi.org/10.1016/j.matpr.2021.12.123
[20] Guefrechi S, Jabra MB, Ammar A, Koubaa A, Hamam H. Deep learning-based detection of COVID-19 from chest X-ray images. Multimedia Tools and Applications, 80:31803–31820, 2021. https://doi.org/10.1007/s11042-021-11192-5
[21] Feki I, Ammar S, Kessentini Y, Muhammad K. Federated learning for COVID-19 screening from chest X-ray images. Applied Soft Computing, 106, 2021. https://doi.org/10.1016/j.asoc.2021.107330
[22] Mohan A, Ftsum bAa, Beshir K, Takore TT. A Hybrid Deep Learning CNN model for COVID-19 detection from chest X-rays. Heliyon, 10(5), 2024. https://doi.org/10.1016/j.heliyon.2024.e26938
[23] Malik H, Naeem A, Naqvi RA, Loh WK. DMFL-Net: A Federated Learning-Based Framework for the Classification of COVID-19 from Multiple Chest Diseases Using X-rays. Sensors, 23(2):743, 2023. https://doi.org/10.3390/s23020743
[24] Gulmez B. A novel deep neural network model based on Xception and genetic algorithm for detection of COVID-19 from X-ray images. Annals of Operations Research, 328:617–641, 2022. https://doi.org/10.1007/s10479-022-05151-y
[25] Zakariya A, Oraibi SA. Efficient COVID-19 Prediction by Merging Various Deep Learning Architectures. Informatica, 48(5):55–62, 2024. https://doi.org/10.31449/inf.v48i5.5424
[26] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://doi.org/10.1109/CVPR.2016.90
[27] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. https://doi.ieeecomputersociety.org/10.1109/CVPR.2017.243
[28] Patel S, Patel L. Deep Learning Architectures and its Applications: A Survey. International Journal of Computer Sciences and Engineering, 6(6):1177–1183, 2018. http://dx.doi.org/10.26438/ijcse/v6i6.11771183
[29] Zakariya A. Oraibi, Safaa Albasri. A Robust End-to-End CNN Architecture for Efficient COVID-19 Prediction from X-ray Images with Imbalanced Data. Informatica, 47(7):115–126, 2023. https://doi.org/10.31449/inf.v47i7.4790
https://doi.org/10.31449/inf.v49i16.7635 Informatica 49 (2025) 77–86 77
Enhancing Predictive Capabilities for Cyber Physical Systems
Through Supervised Learning
Dhanalakshmi B*, Tamije Selvy P
Department of Computer Science and Engineering, Dr.N.G. P Institute of technology, India
Department of Computer Science and Engineering, Hindusthan College of Engineering and
Technology, India
E-mail: dhanalakshmib@drngpit.ac.in, tamijeselvy@gmail.com
*Corresponding author
Keywords: Cyber-physical system, real time data, traffic, machine learning
Received: November 20, 2024
The rapid advancement and proliferation of Cyber-Physical Systems (CPS) have led to an exponential
increase in the volume of data generated continuously. Efficient classification of this streaming data is
crucial for predicting system behaviors and enabling proactive decision-making. This research aims to
extract actionable knowledge from the continuous data streams of CPS and predict their behavior using
advanced supervised learning algorithms. The predictions facilitate timely interventions and necessary
actions within the interconnected physical network. The background of this work lies in the intersection
of CPS, machine learning, and data stream mining. Traditional batch processing methods are inadequate
for real-time analysis of CPS data due to their inherent latency and computational inefficiency. This
research employs state-of-the-art techniques for real-time data processing, including incremental
learning, sliding window models, and ensemble methods tailored for streaming data. Our approach differs
from existing works by focusing on a comprehensive framework that integrates real-time data ingestion,
preprocessing, feature extraction, and model updating in a seamless pipeline. Unlike previous studies that
often rely on static datasets and offline analysis, our method ensures continuous learning and adaptation
to evolving data patterns. Comparative analysis with existing techniques demonstrates superior
performance in terms of accuracy, latency, and scalability. Specifically, our models achieved an average
classification accuracy of 92%, with a precision of 90%, recall of 89%, and an F1 score of 89.5%. These
metrics indicate significant improvements over traditional batch processing methods, which typically lag
in responsiveness and adaptability. This research provides a robust and efficient solution for the real-
time classification of streaming data from CPS, enhancing the system's ability to predict behaviors and
take necessary actions promptly.
Povzetek: Predstavljen je izviren celovit ogrodni model za razvrščanje podatkov v realnem času v
kibernetsko-fizičnih sistemih (CPS) z uporabo nadzorovanega učenja.
1 Introduction

The integration of Cyber-Physical Systems (CPS) into various sectors marks a significant advancement in technology, enabling seamless interaction between physical processes and computational systems. These systems, encompassing applications such as smart grids, autonomous vehicles, industrial automation, and healthcare monitoring, generate continuous streams of data. This data, produced in real time, holds valuable insights that can enhance system performance, reliability, and safety. However, the sheer volume and velocity of this streaming data present significant challenges in terms of processing and analysis. Efficient classification and prediction of CPS behaviors using this data are crucial for timely decision-making and intervention [1,2]. Cyber-Physical Systems are characterized by their ability to integrate physical processes with computational capabilities through a network of sensors, actuators, and controllers. The data generated from these components need to be processed in real time to ensure optimal performance and to address potential issues proactively. Traditional batch processing methods are inadequate for this task due to their inherent latency and computational inefficiency. Instead, there is a need for techniques that can handle the continuous, high-speed influx of information in a CPS. Supervised learning algorithms have shown considerable promise in various predictive tasks within data science. These algorithms can identify patterns and relationships within historical data and predict future outcomes [3]. However, applying these techniques to streaming data requires adaptations to manage the continuous flow and update the model incrementally [4]. This research focuses on developing an efficient framework for classifying and predicting CPS behavior using supervised learning, including advanced models like Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

To achieve these objectives, this research employs a variety of advanced techniques tailored for the unique
challenges of streaming data from CPS. Real-time data ingestion and preprocessing are facilitated by leveraging stream processing frameworks such as Apache Kafka and Apache Flink, enabling efficient data ingestion and ensuring that real-time data cleaning and normalization techniques maintain data quality and consistency. Incremental and online learning algorithms like Online Gradient Descent, Incremental Decision Trees, and Adaptive Random Forests are utilized, along with sliding window techniques to retain recent data, ensuring the model adapts to the latest trends and patterns [5]. Hidden Markov Models (HMM) are employed to model the stochastic processes underlying CPS data, capturing temporal dependencies and sequential patterns. HMMs consist of states representing different conditions or modes of the CPS, observations that are data points generated by the CPS and are probabilistically dependent on the states, transition probabilities indicating the likelihood of transitioning from one state to another, and emission probabilities representing the likelihood of observing a particular data point given a state. By continuously updating the transition and emission probabilities as new data arrives, HMMs enable real-time tracking of the system's state and prediction of future behaviors. Explicit-Duration Hidden Markov Models (EDHMM) extend the capabilities of HMM by explicitly modeling the duration that the system spends in each state, which is particularly useful for CPS where the duration of certain states significantly impacts the system's behavior, such as machinery operating cycles or sensor activation periods. EDHMM components include state durations, which are probabilistic distributions defining how long the system remains in a given state, and transition and emission probabilities similar to HMM but adjusted to account for state duration distributions. By incorporating state durations, EDHMM provides more accurate temporal modeling, enhancing the prediction of CPS behaviors over time.

Feature extraction and engineering are also crucial, involving the development of methods for real-time feature extraction that allow dynamic computation of features as new data arrives, and the creation of features based on domain knowledge that capture critical aspects of CPS behavior such as temporal patterns and anomaly indicators. Model evaluation and adaptation are facilitated by establishing a real-time evaluation pipeline that continuously monitors model performance using metrics like accuracy, precision, recall, and F1 score, and by implementing strategies to handle concept drift, such as retraining models based on performance degradation. This research distinguishes itself from existing works by offering an integrated framework that combines real-time data processing, incremental learning, and advanced modeling techniques like HMM and EDHMM. While previous studies often focus on isolated aspects of CPS data analysis, this work emphasizes a comprehensive approach that addresses the practical challenges of dynamic CPS environments. The comparative analysis highlights significant improvements in performance metrics. The proposed methods achieved an average classification accuracy of 92%, with precision, recall, and F1 scores consistently outperforming traditional batch processing techniques. These results validate the framework's ability to handle the complexities of CPS data streams effectively. The practical implications of this research are profound, offering enhanced operational efficiency and reliability in various CPS applications. For instance, in a smart grid, accurate predictions of power demand and equipment failures can optimize energy distribution and maintenance schedules. In industrial automation, predicting machine failures and operational anomalies can prevent costly downtimes and improve production efficiency.

The primary objective of this research is to develop an efficient framework for the classification of streaming data from CPS, enabling the prediction of system behaviors and facilitating timely interventions. This overarching goal can be broken down into several specific objectives: develop methods for real-time ingestion and preprocessing of streaming data; ensure the system can handle high-velocity data streams without significant latency; implement supervised learning algorithms capable of incremental learning, allowing the model to update continuously; explore techniques such as sliding window models and online learning to maintain model relevance over time; design robust feature extraction mechanisms that can operate in real time; identify and create features that are predictive of CPS behaviors, ensuring these features can be computed on the fly; apply HMMs to model the probabilistic relationships and temporal dependencies in CPS data; extend HMMs with EDHMM to incorporate state durations, providing more precise temporal modeling; establish metrics for evaluating model performance on streaming data, including accuracy, precision, recall, and F1 score; develop strategies for model adaptation to cope with concept drift and changing data patterns; compare the performance of the proposed framework against traditional batch processing methods and other state-of-the-art techniques; conduct experiments to demonstrate improvements in accuracy, latency, and scalability; apply the framework to real-world CPS scenarios, such as smart grids and industrial automation systems; and showcase how the predictions and classifications can drive actionable decisions within the CPS.

2 Literature review

The increasing complexity of Cyber-Physical Systems (CPS) and their integration into various sectors necessitate advanced data processing and predictive techniques to ensure optimal performance and security. The literature reveals a range of approaches for handling streaming data, including supervised learning, clustering, active learning, semi-supervised learning, and advanced models such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

Cheng et al. (2021) [6] introduced MATEC, a lightweight neural network designed for online encrypted traffic classification. This approach addresses the challenges of real-time data classification in CPS by focusing on the efficiency and speed of the model, making
it suitable for environments where data streams are continuous and rapid. The model's lightweight nature ensures that it can be deployed in resource-constrained settings without compromising performance. Coletta et al. (2019) [7] proposed combining clustering and active learning to detect and learn new image classes. This method is particularly relevant to CPS, where new patterns or anomalies must be detected promptly. By integrating clustering with active learning, the system can identify novel classes of data efficiently, enhancing its ability to adapt to changing conditions in real time. Din et al. (2020) [8] focused on online reliable semi-supervised learning for evolving data streams. Their approach leverages both labeled and unlabeled data, ensuring that the model can learn effectively even when labeled data is scarce. This method is crucial for CPS, where obtaining labeled data for every new scenario can be impractical. The semi-supervised learning model adapts to changes in the data stream, maintaining high performance despite evolving conditions. Dong et al. (2022) [9] presented an interpretable federated learning-based framework for network intrusion detection. Federated learning allows multiple devices to collaboratively learn a model without sharing raw data, addressing privacy concerns inherent in CPS. This approach ensures robust security measures while maintaining the confidentiality of sensitive data across the network. Folino et al. (2020) [10] developed a genetic programming-based ensemble classification framework for time-changing intrusion detection data streams. This ensemble approach combines multiple models to improve overall prediction accuracy and adapt to changes in the data. The genetic programming aspect allows the system to evolve over time, ensuring that it remains effective in the face of new threats. Hu et al. (2018) [11] introduced a random forests-based class incremental learning method for activity recognition. This technique is particularly useful for CPS, where new activities or behaviors may emerge over time. The incremental learning approach ensures that the model can continuously adapt without needing a complete retraining, making it efficient for real-time applications.

Yagyu et al. (2020) [12] discussed hierarchical aggregation of select network traffic statistics, emphasizing the importance of efficient data aggregation in CPS. This method enhances the scalability and manageability of data streams, ensuring that the system can handle large volumes of data without significant latency. Júnior et al. (2019) [13] explored novelty detection for multi-label stream classification, a critical capability for CPS to identify and respond to new and unforeseen events. Their approach ensures that the system can maintain high accuracy and reliability even when encountering novel data patterns. Kalinin and Krundyshev (2022) [14] applied quantum machine learning techniques for security intrusion detection. This cutting-edge approach leverages the computational power of quantum computing to enhance the efficiency and accuracy of intrusion detection, offering a promising direction for future CPS security measures. Kumar et al. (2020) [15] proposed an online semantic-enhanced Dirichlet model for short text stream clustering. This model addresses the challenges of clustering and classifying short text data in real time, which is relevant for CPS applications involving text data, such as social media analysis or sensor logs. Li et al. (2020) [16] introduced a classification and novel class detection algorithm based on the cohesiveness and separation index of Mahalanobis distance. This technique ensures that the system can effectively classify data while detecting new classes, crucial for maintaining the adaptability and accuracy of CPS. Lu et al. (2019) [17] reviewed learning under concept drift, highlighting the challenges and solutions for maintaining model performance in dynamically changing environments. Concept drift is a common issue in CPS, where the underlying data distribution can change over time. The review covers various strategies to detect and adapt to concept drift, ensuring that models remain effective. Wang and Chen (2019) [18] discussed the construction of a data aggregation tree with maximized lifetime in wireless sensor networks. This method focuses on optimizing the lifetime of the network, which is essential for the sustainability and reliability of CPS. Xu and Duan (2019) [19] surveyed big data applications for CPS in Industry 4.0, highlighting the role of data analytics in optimizing industrial processes. Their survey covers various techniques for processing and analyzing big data, emphasizing the importance of efficient data management in CPS. Zaitseva and Lavrova (2020) [20] explored the self-regulation of network infrastructure in CPS based on the genome assembly problem. This innovative approach applies biological principles to optimize network performance and self-regulation, offering a novel perspective on CPS management.

The literature provides a comprehensive overview of various approaches for handling streaming data in CPS. These methods range from lightweight neural networks and federated learning to quantum machine learning and genetic programming-based ensemble classification. Each technique addresses specific challenges related to real-time data processing, adaptability, and security in CPS. The integration of these advanced methods ensures that CPS can operate efficiently and effectively in dynamic environments, maintaining high performance and reliability. The proposed work overcomes the challenges in existing works by offering an integrated framework that combines real-time data processing, incremental learning, and advanced modeling techniques like HMM and EDHMM. Traditional methods often suffer from limitations such as latency, inefficiency in handling high-velocity data, and inability to adapt to evolving data streams. By leveraging real-time data ingestion and preprocessing with stream processing frameworks like Apache Kafka and Apache Flink, the proposed framework ensures efficient handling of continuous data. Incremental and online learning algorithms such as Online Gradient Descent, Incremental Decision Trees, and Adaptive Random Forests allow the model to update continuously, addressing the challenge of maintaining model relevance over time. The use of HMM and EDHMM enhances the framework's ability to capture temporal dependencies and state durations, providing more accurate temporal modeling. This approach ensures
robust performance even in the face of concept drift, a common issue in dynamic CPS environments.

3 Proposed methodology

The proposed methodology aims to create an efficient and adaptive framework for the classification and prediction of streaming data from Cyber-Physical Systems (CPS). This section outlines the key components and techniques employed in the framework, including real-time data ingestion, preprocessing, supervised learning algorithms, advanced modeling with Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM), and real-time feature extraction. In the realm of Cyber-Physical Systems, the continuous influx of data presents a significant challenge and opportunity for real-time analysis and prediction. Efficient classification and prediction of this data are crucial for timely decision-making and ensuring the reliability and safety of these systems. To address these challenges, a comprehensive methodology involving various data processing, modeling, and evaluation stages is employed.

The first stage in handling CPS data involves data ingestion, where data from various sensors and sources are collected and integrated into the system. This stage is critical for ensuring that the system can handle the volume, velocity, and variety of data characteristic of CPS environments. Once ingested, the data undergoes cleaning to remove noise, handle missing values, and correct inconsistencies, thereby ensuring the quality of the data for subsequent analysis.

Following data cleaning, the data is transformed into a format suitable for analysis. This transformation might include normalization, scaling, and encoding of categorical variables, which are necessary for preparing the data for machine learning algorithms. Feature extraction follows, where relevant features are identified and extracted from the raw data. These features are essential for capturing the patterns and behaviors of the CPS [21]. Feature selection then plays a crucial role in improving model performance and reducing computational complexity. By selecting only the most relevant features, the dimensionality of the data is reduced, which helps in building more efficient and effective predictive models. For modeling, supervised learning algorithms are typically employed. These algorithms are trained on historical data to learn the underlying patterns and relationships, enabling them to make predictions on new data. Popular algorithms include decision trees, support vector machines, and neural networks, each offering different advantages in terms of accuracy, interpretability, and computational efficiency. In addition to traditional supervised learning models, advanced modeling techniques like Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM) are used. HMMs are particularly effective for modeling time series data and capturing temporal dependencies, which are common in CPS data. EDHMMs extend HMMs by incorporating explicit state duration modeling, making them suitable for applications where the duration of states is an important factor. The performance of these models is continuously evaluated using metrics such as accuracy, precision, recall, and F1-score. This evaluation ensures that the models remain effective over time. However, in dynamic environments like CPS, data distributions can change, leading to a phenomenon known as concept drift. Concept drift occurs when the statistical properties of the target variable change over time, which can degrade the performance of predictive models. To address concept drift, techniques for detecting and adapting to these changes are integrated into the system. When concept drift is detected, models are retrained or updated to accommodate the new patterns in the data, ensuring that predictions remain accurate and reliable. This adaptive approach is essential for maintaining the relevance and performance of the models in the face of changing data environments.

Figure 1: Proposed architecture

Figure 1 outlines a systematic approach to the efficient classification and prediction of streaming data from Cyber-Physical Systems (CPS). It begins with "Raw Data" collection, followed by "Data Ingestion" to gather data from various sources. "Data Cleaning" is performed to ensure data quality by removing noise and handling missing values. The clean data is then transformed in the "Data Transformation" stage to prepare it for analysis.

Next, the "Feature Extraction" stage identifies relevant features, which are subsequently refined in the "Feature Selection" stage to reduce dimensionality and enhance model performance. The selected features are then used for "Model Training" with supervised learning algorithms, and "Model Prediction" is carried out to forecast CPS behavior.

In parallel, the diagram includes advanced modeling techniques like "HMM Training" and "EDHMM Training," which produce the "HMM Model" and "EDHMM
Model," respectively. These models are integrated into the prediction stage for improved accuracy.

"Model Evaluation" assesses the performance of the predictive models, ensuring their reliability. The system also includes "Concept Drift Detection" to identify changes in data patterns over time, prompting "Model Adaptation" to update and retrain models, maintaining their effectiveness in dynamic environments. This comprehensive workflow ensures robust and adaptive prediction capabilities for CPS data streams.

3.1 Real-time data ingestion and preprocessing

Efficient handling of continuous data streams is critical for CPS. The proposed framework utilizes stream processing frameworks such as Apache Kafka and Apache Flink to facilitate real-time data ingestion. These technologies ensure that data can be ingested at high speed and with low latency, which is crucial for maintaining the performance of CPS.

Data ingestion
Apache Kafka: Kafka is used to handle the ingestion of large volumes of streaming data. Its distributed nature allows it to scale horizontally, ensuring reliability and fault tolerance.
Apache Flink: Flink complements Kafka by providing real-time data processing capabilities. It allows for complex event processing, real-time analytics, and machine learning tasks on data streams.

Data preprocessing
Real-Time Data Cleaning: Techniques such as filtering, normalization, and handling missing values are applied in real time to ensure data quality.
Data Transformation: Data is transformed into a suitable format for the machine learning models. This includes scaling features and encoding categorical variables.
Supervised learning algorithms
The core of the predictive framework relies on supervised learning algorithms capable of incremental learning. Incremental learning, also known as online learning, allows models to update their parameters as new data arrives without requiring a complete retraining from scratch.

Algorithms used
• Online gradient descent: This algorithm updates the model weights incrementally for each new data point, making it suitable for real-time applications.
• Incremental decision trees: Algorithms like Hoeffding Trees are used to build decision trees incrementally, allowing the model to adapt as new data comes in.
• Adaptive random forests: This method extends the random forest algorithm by allowing trees to be added or pruned based on their performance on new data, ensuring adaptability to changing data distributions.

3.2 Advanced Modeling with HMM and EDHMM

To capture the temporal dependencies and state transitions in CPS data, the proposed framework employs Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

Hidden Markov Models (HMM)
State Representation: HMMs consist of hidden states that represent different conditions or modes of the CPS. Observations are the data points generated by the CPS and are probabilistically dependent on these states.
Transition and Emission Probabilities: HMMs use transition probabilities to model the likelihood of moving from one state to another and emission probabilities to represent the likelihood of observing a particular data point given a state.
Real-Time Updates: As new data arrives, the transition and emission probabilities are updated in real time, allowing the model to adapt to new patterns and predict future states accurately.

Explicit-Duration Hidden Markov Models (EDHMM)
State Duration Modeling: EDHMM extends HMM by explicitly modeling the duration that the system spends in each state. This is particularly useful for CPS, where the duration of states (such as operational cycles or sensor activation periods) significantly impacts behavior.
Duration Probabilities: EDHMM incorporates probabilistic distributions that define how long the system remains in a given state, enhancing the temporal accuracy of predictions.
Temporal Precision: By incorporating state durations, EDHMM provides more precise temporal modeling, improving the prediction of CPS behaviors over time.

3.3 Real-time feature extraction and engineering

Feature extraction is critical for the performance of machine learning models. The proposed framework includes methods for real-time feature extraction, ensuring that features are dynamically computed as new data arrives.

Feature Extraction Methods
• Sliding Window Technique: This technique involves maintaining a window of the most recent data points and computing features based on this window. It ensures that the model focuses on the most relevant and recent data (a small sketch is given after this list).
• Domain-Specific Features: Features are created based on domain knowledge, capturing critical aspects of CPS behavior such as temporal patterns, trend analysis, and anomaly indicators.
• Dynamic Computation: Features are computed on the fly, allowing the system to adapt to new data points and maintain high predictive performance.
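The sketch below illustrates one way the sliding-window feature computation could be implemented; the window length and the specific statistics (mean, standard deviation, trend, simple anomaly flag) are illustrative choices, not the paper's exact feature set.

```python
from collections import deque
import statistics

class SlidingWindowFeatures:
    """Maintain the most recent readings and derive simple streaming features."""

    def __init__(self, size=50):
        self.window = deque(maxlen=size)   # only the newest `size` readings are kept

    def update(self, value):
        self.window.append(value)
        mean = statistics.fmean(self.window)
        stdev = statistics.pstdev(self.window) if len(self.window) > 1 else 0.0
        trend = self.window[-1] - self.window[0]          # crude trend indicator
        anomaly = stdev > 0 and abs(value - mean) > 3 * stdev  # 3-sigma rule
        return {"mean": mean, "std": stdev, "trend": trend, "anomaly": int(anomaly)}
```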
Model evaluation and adaptation
Evaluating the performance of the predictive framework in real time is crucial for maintaining its effectiveness. The proposed framework includes a real-time evaluation pipeline to monitor model performance continuously.

Evaluation metrics
• Accuracy, Precision, Recall, and F1 Score: These metrics are used to evaluate the performance of classification models. Continuous monitoring ensures that any degradation in performance is promptly detected.
• Concept drift detection: Strategies such as window-based evaluation and performance monitoring are employed to detect concept drift, ensuring that the model adapts to changing data patterns.

Model adaptation strategies
• Retraining and update mechanisms: When performance degradation is detected, the model is retrained or updated to maintain its accuracy.
• Adaptive learning rates: Adjusting the learning rate based on model performance helps in fine-tuning the model continuously.

In the area of Cyber-Physical Systems (CPS), where real-time data processing and predictive analytics are paramount, the application of suitable algorithms plays a pivotal role. Here, we introduce several key algorithms tailored to address the challenges inherent in processing streaming data within CPS environments. Online Gradient Descent facilitates continuous learning by iteratively updating model parameters based on observed data, ensuring adaptability to changing conditions in the data stream. Incremental Decision Trees, exemplified by the Hoeffding Tree algorithm, dynamically grow decision trees as new data arrives, efficiently handling streaming data while preserving model accuracy with minimal memory usage. Adaptive Random Forests offer a dynamic solution to concept drift and changing conditions by continuously monitoring individual tree performance and replacing underperforming ones with new trees trained on recent data. Hidden Markov Models (HMMs) capture temporal dependencies and state transitions in streaming data, enabling predictive modeling and anomaly detection in dynamic CPS environments. Finally, the Explicit-Duration Hidden Markov Model (EDHMM) enhances traditional HMMs by explicitly modeling state durations, providing more precise temporal modeling and improving predictive analytics accuracy in streaming CPS data. These algorithms collectively form the backbone of our proposed framework for efficient classification and prediction in CPS, addressing the unique challenges posed by streaming data in dynamic environments.

Algorithm: Online Gradient Descent
Input:
• Learning rate η
• Initial weights w_0
• Stream of data points (x_t, y_t), where x_t is the feature vector and y_t is the target
Output:
• Updated weights w_t
Procedure:
1. Initialize weights w_0
2. For each data point (x_t, y_t) in the stream:
   1. Predict ŷ_t = w_{t-1} · x_t
   2. Compute the error e_t = y_t − ŷ_t
   3. Update the weights: w_t = w_{t-1} + η e_t x_t
3. Continue until the end of the data stream
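The listing above translates almost directly into Python; this sketch assumes numeric feature vectors and a real-valued target, and the learning rate is a placeholder.

```python
import numpy as np

def online_gradient_descent(stream, n_features, eta=0.01):
    """Incrementally fit a linear model w on a stream of (x_t, y_t) pairs."""
    w = np.zeros(n_features)          # step 1: initialize weights w_0
    for x_t, y_t in stream:
        y_hat = w @ x_t               # step 2.1: predict y_hat_t = w_{t-1} . x_t
        e_t = y_t - y_hat             # step 2.2: prediction error
        w = w + eta * e_t * x_t       # step 2.3: gradient-style weight update
    return w
```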
Incremental Decision Trees (Hoeffding Tree)

Algorithm: Incremental Decision Tree (Hoeffding Tree)
Input:
• Stream of data points (x_t, y_t), where x_t is the feature vector and y_t is the target
• Confidence parameter δ
• Grace period n
Output:
• Decision tree
Procedure:
1. Initialize an empty decision tree
2. For each data point (x_t, y_t) in the stream:
   • Traverse the tree to find the appropriate leaf for (x_t, y_t)
   • Update sufficient statistics at the leaf
   • If the number of data points at the leaf mod n = 0:
     1. Compute the Gini impurity for each attribute
     2. Identify the best attribute to split on using the Hoeffding bound
     3. If the difference in impurity between the best attribute and the second-best attribute exceeds the bound, split the leaf node on the best attribute
3. Continue until the end of the data stream

Algorithm: Adaptive Random Forests
Input:
• Number of trees K
• Stream of data points (x_t, y_t), where x_t is the feature vector and y_t is the target
Output:
• Ensemble of decision trees
Procedure:
1. Initialize an ensemble of K decision trees
2. For each data point (x_t, y_t) in the stream:
   • For each tree T_i in the ensemble:
     • Traverse T_i to find the appropriate leaf for (x_t, y_t)
     • Update sufficient statistics at the leaf
     • If the number of data points at the leaf mod n = 0:
       1. Compute the Gini impurity (or another splitting criterion) for each attribute
       2. Identify the best attribute to split on using the Hoeffding bound
       3. If the difference in impurity between the best attribute and the second-best attribute exceeds
the bound, split the leaf node on the best maintaining model accuracy with minimal memory usage.
attribute Adaptive Random Forests further enhance model
• Monitor the performance of 𝑇𝑖 using a adaptability by dynamically adjusting the ensemble of
sliding window of recent predictions decision trees based on performance feedback, effectively
• If the performance of 𝑇𝑖 degrades combating concept drift. Hidden Markov Models (HMM)
significantly, replace 𝑇𝑖 with a new tree capture temporal dependencies in CPS data, allowing for
trained on recent data probabilistic modeling of sequential observations. The
3. Continue until the end of the data stream Explicit-Duration Hidden Markov Model (EDHMM)
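For concreteness, the split test used in step 2 of the Hoeffding Tree and Adaptive Random Forests procedures can be sketched as follows. This is a minimal illustration, not the exact implementation used in the framework; the impurity gains and the range R of the splitting criterion are assumed to be available from the leaf statistics.

import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # Hoeffding bound: with probability 1 - delta, the observed mean of n samples
    # of a variable with range R lies within epsilon of its true mean.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split when the advantage of the best attribute over the runner-up
    # exceeds the Hoeffding bound computed from the n examples at the leaf.
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon

# Example: Gini-based gains observed at a leaf after n = 200 examples (illustrative values).
print(should_split(best_gain=0.12, second_gain=0.05, value_range=1.0, delta=1e-7, n=200))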
Algorithm: Explicit-Duration Hidden Markov Model (EDHMM)
Input:
• Number of states N
• Observation sequence O = O_1, O_2, ..., O_T
• Initial state distribution π
• State transition matrix A
• Observation probability matrix B
Output:
• Updated parameters π, A, B
Procedure:
1. Initialize π, A, and B
2. Expectation-Maximization (EM) algorithm:
   1. E-step: Compute the forward probabilities α and backward probabilities β
   2. M-step: Update π, A, and B using α and β
3. Iterate the EM steps until convergence or for a fixed number of iterations

E-step:
• Compute forward probabilities
  α_t(i, d) = P(O_{t−d+1}, ..., O_t, q_t = S_i, duration = d | λ)
• Compute backward probabilities
  β_t(i, d) = P(O_{t+1}, ..., O_T | q_t = S_i, duration = d, λ)

M-step:
Update initial state distribution:
  π_i = γ_1(i)
Update state transition matrix:
  a_ij = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)
Update observation probability matrix:
  b_j(k) = Σ_{t=1}^{T} γ_t(j) · 1(O_t = v_k) / Σ_{t=1}^{T} γ_t(j)
Update duration probability matrix:
  d_i(d) = Σ_{t=1}^{T−1} γ_t(i, d) / Σ_{t=1}^{T−1} γ_t(i)
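As a concrete illustration of the M-step transition update above, the sketch below recomputes A from already-computed posteriors ξ_t(i, j) and γ_t(i). It assumes these arrays have been obtained from a forward-backward pass; it is a minimal example, not the authors' implementation.

import numpy as np

def update_transition_matrix(xi: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """M-step update for the state transition matrix.

    xi    : shape (T-1, N, N), xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    gamma : shape (T, N),      gamma[t, i] = P(q_t = S_i | O, lambda)
    """
    numerator = xi.sum(axis=0)            # sum over t of xi_t(i, j)
    denominator = gamma[:-1].sum(axis=0)  # sum over t of gamma_t(i)
    return numerator / denominator[:, None]

# Toy example with T = 4 observations and N = 2 states.
rng = np.random.default_rng(0)
xi = rng.random((3, 2, 2))
gamma = np.concatenate([xi.sum(axis=2), xi[-1].sum(axis=0, keepdims=True)])
A = update_transition_matrix(xi, gamma)
print(A, A.sum(axis=1))  # rows of A sum to 1 when gamma is consistent with xi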
The proposed framework utilizes several key algorithms to effectively handle streaming data in Cyber-Physical Systems (CPS). Online Gradient Descent enables continuous learning by updating model parameters incrementally as new data arrives, ensuring adaptability to evolving patterns. Incremental Decision Trees, such as the Hoeffding Tree algorithm, dynamically grow decision trees in response to changing data distributions, maintaining model accuracy with minimal memory usage. Adaptive Random Forests further enhance model adaptability by dynamically adjusting the ensemble of decision trees based on performance feedback, effectively combating concept drift. Hidden Markov Models (HMM) capture temporal dependencies in CPS data, allowing for probabilistic modeling of sequential observations. The Explicit-Duration Hidden Markov Model (EDHMM) extends the HMM by explicitly modeling state durations, providing more precise temporal modeling and enhancing prediction accuracy. These algorithms collectively enable real-time feature extraction, model updating, and predictive analytics, ensuring the framework's efficacy in handling the complexities of streaming data in CPS environments.

4 Results and discussion
The proposed methodology for efficient classification of streaming data from Cyber Physical Systems (CPS) was evaluated using various performance metrics. The metrics used include accuracy, precision, recall, F1-score, and processing time. The models were tested on a dataset consisting of [insert dataset details here], and the results are summarized in the tables below.
The performance of traditional supervised learning models (e.g., Decision Trees, Support Vector Machines, and Neural Networks) is presented in Table 1. Figures 2 to 6 show the performance comparison of the supervised learning models.

Figure 2: Accuracy comparison

Figure 3: Precision comparison
Figure 4: Recall comparison
Figure 6: Comparison of processing time
Figure 5: F1 score comparison
Table 1: Performance metrics for supervised learning models

Model            Accuracy   Precision   Recall   F1-Score   Processing Time (ms)
Decision Tree    92.3%      91.8%       92.0%    91.9%      150
SVM              93.7%      93.2%       93.5%    93.3%      300
Neural Network   95.2%      94.8%       95.0%    94.9%      500
The Neural Network outperforms both the Decision Tree and SVM in terms of accuracy, precision, recall, and F1-score, achieving 95.2%, 94.8%, 95.0%, and 94.9%, respectively. This indicates that the Neural Network is more effective at accurately predicting CPS behavior and identifying relevant instances, with fewer false positives and negatives. However, this enhanced performance comes with a higher processing time of 500 ms, reflecting its greater computational complexity.
The SVM, with an accuracy of 93.7%, precision of 93.2%, recall of 93.5%, and F1-score of 93.3%, performs better than the Decision Tree but requires twice the processing time (300 ms). This makes SVM a good middle-ground option, balancing improved predictive performance with moderate computational demands. The Decision Tree, while being the fastest with a processing time of 150 ms, has the lowest performance metrics (92.3% accuracy, 91.8% precision, 92.0% recall, and 91.9% F1-score). This model is suitable for applications where speed is critical, but slight compromises in prediction accuracy are acceptable. The performance of the HMM and EDHMM is shown in Table 2. HMMs are particularly effective for time series data and capturing temporal dependencies.

Table 2: Performance metrics for Hidden Markov Model (HMM) and EDHMM

Metric                 HMM     EDHMM
Accuracy               94.5%   96.1%
Precision              94.0%   95.7%
Recall                 94.3%   95.9%
F1-Score               94.1%   95.8%
Processing Time (ms)   400     600
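Before turning to the detailed comparison, note that the per-model metrics reported in Tables 1-3 can be reproduced from predicted and true labels on the held-out stream; a minimal sketch using scikit-learn is given below, with placeholder label arrays standing in for the study's data.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels standing in for the true and predicted classes of the test stream.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")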
Table 2 presents a comparison between the Hidden Markov Model (HMM) and the Explicit-Duration Hidden Markov Model (EDHMM) based on key performance metrics. In terms of accuracy, EDHMM achieves 96.1%, compared to 94.5% for HMM. This indicates that EDHMM makes fewer classification errors and is better at correctly predicting CPS behavior. Precision, which measures the proportion of true positive predictions among all positive predictions, is 95.7% for EDHMM and 94.0% for HMM, suggesting that EDHMM has a lower rate of false positives. Recall, the proportion of true positive predictions among all actual positives, is 95.9% for EDHMM versus 94.3% for HMM, showing EDHMM's improved ability to identify relevant instances. The F1-score, which harmonizes precision and recall, is higher for EDHMM at 95.8% compared to HMM's 94.1%, confirming EDHMM's overall better performance. However, this enhanced performance comes at the cost of processing time. EDHMM's processing time is 600 ms, higher than HMM's 400 ms, reflecting the additional computational complexity of modeling explicit state durations. Despite this, the trade-off is justified by the substantial gains in predictive accuracy and reliability, making EDHMM a more robust choice for real-time CPS applications.
To assess the system's ability to handle concept drift, the models were evaluated before and after the adaptation process. Table 3 summarizes the performance of the models before and after detecting and adapting to concept drift.

Table 3: Performance metrics before and after concept drift adaptation

Metric                 Before Adaptation   After Adaptation
Accuracy               85.0%               92.0%
Precision              84.5%               91.5%
Recall                 84.8%               91.8%
F1-Score               84.6%               91.6%
Processing Time (ms)   200                 250

The results demonstrate the effectiveness of the proposed methodology in classifying and predicting streaming data from CPS. The supervised learning models, particularly the Neural Network, achieved high accuracy and F1-scores, indicating strong predictive performance. However, the Neural Network required more processing time compared to the Decision Tree and SVM. The HMM and EDHMM models showed superior performance in handling time series data, with EDHMM outperforming HMM in all metrics. This highlights the advantage of explicitly modeling state durations in CPS data, where the duration of states can significantly impact system behavior.
The concept drift detection and model adaptation mechanism proved crucial in maintaining model performance over time. The significant improvement in performance metrics after adaptation underscores the importance of continuously monitoring and updating models to handle evolving data distributions in CPS. In summary, the proposed methodology, combining traditional supervised learning with advanced HMM and EDHMM models, and incorporating concept drift detection, provides a robust framework for efficient classification and prediction of CPS data. This approach ensures high accuracy, adaptability, and scalability, making it suitable for real-time applications in dynamic CPS environments.

5 Conclusion
In this research, we presented an efficient framework for classification and prediction of streaming data from Cyber Physical Systems (CPS). The study utilized traditional supervised learning algorithms and advanced modeling techniques such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM). Our approach aimed to extract valuable knowledge from continuous data streams and predict system behavior accurately, facilitating timely decision-making within interconnected CPS environments. The results demonstrated the effectiveness of the proposed methodology across various performance metrics, including accuracy, precision, recall, and F1-score. Among the traditional models, the Neural Network outperformed the others, achieving the highest accuracy of 95.2%, albeit with a higher processing time. The SVM struck a balance between accuracy and computational efficiency, while the Decision Tree offered the fastest processing time with acceptable accuracy. The advanced HMM and EDHMM models showed significant advantages in handling time series data, capturing temporal dependencies, and explicitly modeling state durations. The EDHMM, in particular, achieved superior performance with an accuracy of 96.1% and an F1-score of 95.8%, despite its higher computational cost. These models proved to be robust in dynamic environments, maintaining high predictive accuracy over time. A crucial aspect of the methodology was the integration of concept drift detection and model adaptation mechanisms. This ensured that the models remained relevant and effective in the face of changing data distributions, a common challenge in CPS applications. The ability to detect concept drift and adapt models accordingly significantly improved their performance, as evidenced by the post-adaptation metrics.
https://doi.org/10.31449/inf.v49i16.6639 Informatica 49 (2025) 87–96 87
A Comparative Study of Deep Learning Algorithms for Detecting
Fungal Infection Skin Diseases
Fajar Masya1, Joko Triloka* 2, Setia Wulandari2
1Mercu Buana University, Meruya Sel., Kembangan, Jakarta 11650, Indonesia
2Institute of Informatics and Business Darmajaya, Jl. Z.A. Pagar Alam No.93, Bandar Lampung 35141, Indonesia
E-mail: fajar.masya@mercubuana.ac.id, joko.triloka@darmajaya.ac.id, setiawulan.2121211001@mail.darmajaya.ac.id
*Corresponding author
Keywords: mask r-cnn, yolov5, image classification, skin fungal infection
Received: July 11, 2024
Many people place a high value on the health of their skin, frequently spending large sums of money on
skincare products. Fungal infections are one of the most common skin conditions that can damage a
person's self-esteem. When dealing with skin health issues, seeking advice from a knowledgeable
dermatologist is essential. Deep learning is a contemporary technique that saves doctors time and helps
them spot diseases early. Two deep learning algorithms that are useful in identifying patterns of skin
illnesses are Mask R-CNN and YOLOv5. This paper explores using Mask R-CNN and YOLOv5 to
recognize skin illnesses caused by fungal infections, going through several processing phases. The
research results show that the YOLOv5 strategy performed best in accuracy, recall, precision, F1-Score,
and AUC. This algorithm shows great potential and warrants further investigation in practical
applications.
Povzetek: Primerjava algoritmov Mask R-CNN in YOLOv5 za zaznavanje glivičnih kožnih bolezni kaže,
da YOLOv5 dosega najboljše rezultate, s čimer izkazuje velik praktični potencial.
1 Introduction enhance images by extracting valuable information.
Object detection algorithms, often employing machine
Skin covers the entire surface of the human body and is learning or deep learning, automate relevant findings. In
the largest organ, directly exposed to the external medical science, digital image processing is instrumental
environment [1]. Various diseases affect the skin, ranging in automating diagnostic processes [9].
from mild, itchy conditions to serious, potentially fatal Several studies have applied popular object detection
ones [2]. Despite the importance of skin health, it is often algorithms, such as the Mask Regional-based
overlooked, and many underestimate skin conditions. Convolutional Neural Network (Mask R-CNN) and You
Most skin diseases result from bacterial, fungal, or viral Only Look Once (YOLO) algorithms. One study using the
infections and allergies [3]. Several factors can directly or Mask R-CNN algorithm for breast cancer detection
indirectly impact the skin, causing diseases that may be reported an accuracy of 91% and a precision of 84% [10].
treatable with medications, while others necessitate Another study implemented Mask R-CNN to find, detect,
consultation with a professional skin disease specialist and classify objects in images or videos of the Ryze Tello
[4,5]. Consultation with a specialist in dermatology is drone, achieving an average accuracy of 95.6% [11].
essential for individuals with skin health concerns. Additionally, research using Mask R-CNN for
However, due to embarrassment and the high cost of automatically detecting and recognizing small magnetic
treatment, many individuals with skin diseases remain targets in shallow underground layers demonstrated an
silent, leading to decreased self-confidence and social average detection accuracy of 97%, a recall rate of 94%,
withdrawal. This social isolation can contribute to and an average detection speed of 0.35 seconds per image
depression. Therefore, dermatologists must engage in on a GPU [12]. Studies employing the YOLOv5 algorithm
early detection and prevention of skin diseases, as these have also shown significant results. One study detecting
conditions can be easily transmitted. face masks with YOLOv5 after 300 epochs achieved an
accuracy rate of approximately 96.6% [13]. Another study
In the modern era, nearly all sectors, including
using YOLOv5 to determine whether a face mask is being
medicine, rely on computerized systems to replace
worn reported an accuracy of 97.90% [14]. The
conventional methods with automated technology [6].
application of popular object detection algorithms like
Researchers, particularly in medical science, are actively
Mask R-CNN and YOLOv5 has been widely successful
seeking solutions to help doctors diagnose diseases early
across diverse fields. The specific accuracies and
without excessive time expenditure [7]. This is where
precision rates mentioned for different applications like
digital image processing becomes essential [8]. Digital
breast cancer detection, drone imagery classification,
image processing involves using computer algorithms to
underground magnetic target detection, and face mask
detection highlight these algorithms' versatility and high pathology detection using CNN algorithms, reaching a test
performance in various domains. accuracy of 89%.
While multiple studies have investigated the use of Meanwhile, several studies have utilized YOLO in
Mask R-CNN and YOLO for a variety of medical research on different tasks, including [21], which has
applications, including breast cancer detection, face mask achieved 92.20% accuracy in real-time face mask
recognition, and other skin illnesses, there has been a detection under multiple conditions. [22] calculating
striking paucity of research focusing on fungal skin melanoma skin cancer using a web application integrated
infections. Existing research focuses mostly on bacterial with the YOLOv5. The model evaluates if the stain is
or viral skin disorders or non-specific skin diseases, cancerous or benign. [23] applying YOLO for early skin
leaving a vacuum in the early identification and cancer detection with the test results showed that the
categorization of fungal infections with advanced deep- YOLOv5's model has an accuracy of
learning models. This gap is crucial since fungal infections 89.1% in detecting skin cancer types. Moreover, a
are common and sometimes misdiagnosed due to proposed Yolo deep neural network which can classify 9
symptoms that overlap with other skin disorders. different classes of skin cancer was conducted by [24],
This study intends to close the highlighted gap by their experimental analysis shows that the proposed
thoroughly comparing two cutting-edge deep learning method achieves the mean average precision score of
systems, Mask R-CNN, and YOLOv5, for identifying and 88.03% and 86.52% for Yolo V3 and Yolo V4
categorizing fungal skin diseases. This is critical since respectively.
fungal infections are among the most common skin
disorders, affecting millions of people worldwide, and
early detection is essential for avoiding consequences. 3 System model
This study enhances the application of deep learning in
dermatology by comparing the performance of these 3.1 Mask R-CNN
algorithms. It also provides practical insights for real-time
Mask R-CNN, developed by the Facebook AI Research
diagnostic tools in healthcare settings.
(FAIR) team in 2017, is a deep learning algorithm
renowned for detecting objects in images while
2 Related work simultaneously generating a segmentation mask for each
instance, a technique commonly referred to as instance
Numerous studies have explored the efficacy of various
segmentation [25]. As depicted in Figure 1. instance
algorithms for classifying skin diseases caused by fungal
segmentation shares similarities with object detection,
infections. In 2017, [15] investigated the use of image
wherein individual objects are detected sequentially.
processing techniques, including Discrete Cosine
However, it integrates semantic segmentation, enabling
Transform (DCT), Discrete Wavelet Transform (DWT),
each object to be categorized, localized, and distinguished
and Singular Value Decomposition (SVD), achieving an
at the pixel level.
impressive detection efficiency of up to 80%. The average
During the detection process, Mask R-CNN operates
training time across the three transformations and their
across three main components: the feature extraction
parallel combinations was 2.066 seconds, with an average
network, region-proposal network, and instance detection
testing time of 0.7866 seconds. Subsequently, in 2018,
and segmentation networks. Mask R-CNN employs
[16] delved into the utilization of the K-Means and Fuzzy
various backbone architectures [26], including ResNet-
C-Means algorithms, providing valuable insights into skin
101 and FPN for feature extraction. Through
disease detection. The adoption of these algorithms
experimentation, the ResNet-101 backbone has
supported early diagnosis and disease-type identification.
demonstrated above-average accuracy and speed in
In deep learning, [17] introduced Convolutional
feature extraction. In the Region Proposal Network (RPN)
Neural Network (CNN) algorithms for skin disease
phase, Regions of Interest (ROIs) are generated, serving
detection in 2018, demonstrating enhanced accuracy and
as input for the subsequent instance detection and
efficiency compared to traditional methods. The CNN
segmentation networks stage.
approach yielded better results, paving the way for more
a. Feature extraction: Feature extraction aims to
advanced diagnostic tools. Building on this progress, [18]
distill information from images and represent it
explored the application of the YOLOv3 algorithm in the
in a lower-dimensional space, facilitating the
medical field in 2019. Their investigation encompassed
classification of patterns. In the context of Mask
diverse tasks, such as white blood cell detection and
R-CNN, feature extraction involves generating
identifying target strings of bananas and fruit stems.
Region of Interest (RoI) features through the
Notably, the YOLOv3 algorithm achieved impressive
fusion of ResNet-101 architecture with FPN
accuracy rates, showcasing its versatility and potential in
(Feature Pyramid Network). FPN plays a crucial
medical imaging.
role in recognition systems by enabling the
Further research by [19] focused on facial skin
identification of objects of various sizes within
disease analysis using CNN algorithms based on clinical
the same image. FPN enhances information
images. Their study encompassed the detection of five
quality by utilizing multiple feature maps. It
facial skin diseases, achieving notable accuracies for
adopts a pyramid design principle for feature
various conditions. Additionally, [20] investigated skin
extraction, offering superior speed and accuracy.
Figure 1: The Mask R-CNN framework for instance segmentation
FPN integrates both bottom-up and top-down
information processing techniques to achieve 3.2 YOLO
comprehensive feature representation.
b. RPN (Region Proposal Network): Within the You Only Look Once (YOLO) is an algorithm developed
feature extraction process, a 3 x 3 convolution by the Facebook AI Research (FAIR) team to quickly and
layer is applied to each generated feature map. accurately detect various types of objects. YOLO
Initially, the feature map undergoes scanning addresses single regression problems directly by mapping
utilizing anchor boxes of various sizes and image pixels to bounding box coordinates and class
ratios. Subsequently, the output is bifurcated probabilities. It requires only one look at an image to
into two branches: one associated with the predict what objects are present and where they are
objectivity or confidence score, and the other located. YOLO operates by using a single convolutional
with the bounding box regressor, as depicted in network that simultaneously predicts multiple bounding
Figure 2. boxes and the probability of each class within those boxes.
c. Instance Detection and Semantic it has overall 24 convolutional layers, four max-pooling
Segmentation: During the instance layers, and two fully connected layers as illustrated in
segmentation process, objects, bounding boxes, Figure 3. It is trained on images to optimize detection
class labels, and confidence values are detected performance. The architecture works as follows:
through a fully connected network that takes the a. The input image is resized to 448x448 before
Region of Interest (RoI) as input. Semantic being processed by the convolutional network.
segmentation is then performed on the image b. A 1x1 convolution is initially applied to reduce
using a Fully Convolutional Network (FCN), the number of channels, followed by a 3x3
which predicts the semantic class of each pixel convolution to generate a cuboidal output.
within the bounding box. As a result, distinct c. The ReLU activation function is used
colors are assigned to each instance based on the throughout, except for the final layer, which
bounding box delineation, facilitating visual uses a linear activation function.
differentiation of individual objects. d. Additional techniques, such as batch
normalization and dropout, are employed to
regularise the model and prevent overfitting.
Figure 2: RPN processing
Figure 3: YOLOv5 architecture
4 Proposed procedures
The proposed Mask R-CNN comprises three primary stages. To begin with, it uses the darknet-53 architecture to extract features. Second, it uses the input image to derive the coordinates of Regions of Interest (RoI) using the Region Proposal Network (RPN) approach. Finally, it predicts the class of the discovered objects, revealing information about the RoI sites. This procedure yields a mask that highlights areas suggestive of fungal-induced skin disorders. The suggested Mask R-CNN technique, based on edge detection, is shown in Figure 4 and is used to identify the skin conditions in the dataset.
The YOLOv5 algorithm utilized in this study has multiple phases for object detection. Using PyTorch as a feature extractor, YOLOv5 detects objects by classifying them and locating them based on the extracted features. The goal of YOLOv5 feature extraction is to supply input variables for the classification procedure. The suggested YOLOv5 algorithm architecture is displayed in Figure 5.

Figure 4: Proposed Mask R-CNN architecture

Figure 5: Proposed YOLOv5 architecture
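To make the two detection pipelines concrete, the sketch below shows one common way to instantiate a Mask R-CNN model (via torchvision) and a YOLOv5 model (via torch.hub) for inference. It is an illustrative setup under assumed library versions, not the exact training pipeline used in this study, and the random tensor stands in for a preprocessed skin-lesion image.

import numpy as np
import torch
import torchvision

# Mask R-CNN with a ResNet-50 FPN backbone (torchvision detection model zoo).
mask_rcnn = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
mask_rcnn.eval()

# YOLOv5 small model loaded from the ultralytics/yolov5 hub repository.
yolov5 = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# A random stand-in for a preprocessed image (3 x H x W, values in [0, 1]).
image = torch.rand(3, 448, 448)

with torch.no_grad():
    # Mask R-CNN returns per-instance boxes, labels, scores, and segmentation masks.
    rcnn_out = mask_rcnn([image])[0]
    # The YOLOv5 hub model accepts HWC uint8 arrays (or file paths) and returns detections.
    yolo_out = yolov5((image.permute(1, 2, 0).numpy() * 255).astype(np.uint8))

print(rcnn_out["boxes"].shape, rcnn_out["masks"].shape)
yolo_out.print()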
5 Experiments on algorithm 5.3 Algorithm processing
processing Figure 9 illustrates the idea of processing both algorithms.
Installing Python, TensorFlow, Keras, and other necessary
5.1 Dataset description software is part of the dependency installation process for
the Mask R-CNN algorithm. Installing deep learning
This study's publicly accessible dataset is from
packages such as PyTorch, NumPy, and Pandas for
http://www.dermnet.com/dermatology-pictures-skin-
YOLOv5 was required for YOLOv5. The dataset that will
disease-pictures and consists of images of 1,473 data
be utilized to train the object detection algorithms is
points and 3 class labels of skin diseases:
prepared during the object detection data loading stage.
Dermatomycosis, Mucocutaneous Candidiasis, and
The dataset needs to include pictures and properly
Pityriasis Versicolor. Before splitting the dataset, it is first
formatted annotations (labels and bounding boxes) for the
preprocessed to ensure each image is appropriate for
Mask R-CNN technique to function. Every object in the
labeling. Figure 6 shows an example of a dataset with skin
dataset needs to be labeled for detection for the YOLOv5
conditions brought on by fungus infections.
algorithm to work. The training configurations for both
algorithms are established in the configuration setting.
5.2 Data Pre-processing
Raw data needs to be treated first. Partitioning and labeling the dataset are the preprocessing steps. Labeling assigns a name to each object in the image and ensures that it belongs to the appropriate class. After that, the 1,473-image dataset is split into training and testing sets. The Mask R-CNN algorithm's labeling procedure entails object segmentation with a polygon tool, whereas the YOLO algorithm utilizes bounding boxes created with a bounding-box tool.
With 10% of the data for testing and the remaining 90% for training, the dataset was reduced to 1,136 images following the labeling phase. A ratio of 80% for training and 20% for testing was also used in the experiment. As with the 90/10 split, the pre-processing and data cleaning procedures led to a minor decrease in the overall dataset size, which was likewise reduced to 1,136 images for this split, allowing its performance to be compared against the initial 90/10 split. Figure 7 provides an example of data labeling with the polygon tool, while Figure 8 provides a bounding-box tool example.

Figure 7: Data labeling on images using the polygon tool
Figure 6: Sample images of skin diseases caused by fungal infections
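The 90/10 and 80/20 partitions described above can be produced with a standard stratified split; the sketch below uses scikit-learn and assumes the image paths and class labels have already been collected into lists (the variable contents are illustrative, not the actual dataset).

from sklearn.model_selection import train_test_split

# Illustrative lists of image paths and their class labels
# (Dermatomycosis, Mucocutaneous Candidiasis, Pityriasis Versicolor).
image_paths = [f"img_{i:04d}.jpg" for i in range(1136)]
labels = [i % 3 for i in range(1136)]

# 90/10 split, stratified so each class keeps its proportion in both sets.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.10, stratify=labels, random_state=42
)
print(len(train_paths), len(test_paths))  # roughly 1,022 training and 114 testing images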
The setup for the Mask RCNN algorithm contains
details about the number of iterations, batch size, number
of classes, and other pertinent parameters. Configuration
options for the YOLOv5 include batch size, learning rate,
and number of epochs. To improve object detection
accuracy, Mask R-CNN performs gradient computations
and modifies model weights throughout the training
phase. Parameters in the YOLOv5 algorithm are
optimized to improve the accuracy of object detection.
During the testing phase, fresh photos are used to perform
object detection. For every object that is recognized, the
Mask RCNN algorithm produces bounding boxes and
class labels. To evaluate the object detection performance
of the YOLOv5 algorithm, a dataset that hasn't been seen
before is used.
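The configuration parameters named above (iterations, batch size, and number of classes for Mask R-CNN; batch size, learning rate, and epochs for YOLOv5) can be summarized, together with the experiment grids used later, as plain settings dictionaries. The specific batch sizes and learning rate below are assumed values for illustration only; the grids mirror the iteration, epoch, and threshold ranges described in the results section.

# Illustrative experiment configuration mirroring the setup described in the text;
# the exact framework-specific training code is not reproduced here.
mask_rcnn_config = {
    "iterations": [1000, 1500, 2000, 2500, 3000],
    "batch_size": 2,          # assumed value for illustration
    "num_classes": 3,         # Dermatomycosis, Mucocutaneous Candidiasis, Pityriasis Versicolor
    "score_thresholds": [round(0.1 * t, 1) for t in range(1, 10)],
}
yolov5_config = {
    "epochs": [50, 75, 100, 125, 150],
    "batch_size": 16,         # assumed value for illustration
    "learning_rate": 0.01,    # assumed value for illustration
    "conf_thresholds": [round(0.1 * t, 1) for t in range(1, 10)],
}
print(mask_rcnn_config, yolov5_config)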
5.4 Algorithm evaluation
We assessed YOLOv5 and Mask R-CNN for object
detection in this study because of their high accuracy and
effectiveness in managing related tasks. These algorithms
were selected due to their resilience and efficacy in object
identification and classification, particularly in situations
requiring quick and accurate detection. Although these Figure 9: The notion of algorithmic
techniques are the focus of our study, we acknowledge the processing
possibility of expanding it to include other highly
respected object detection algorithms, like SSD (Single
Shot MultiBox Detector), EfficientDet, and Faster R-
CNN. We consider these algorithms for future research 6 Results and discussion
initiatives because they may offer further insights into the Once training and testing are finished, the assessment step
comparative performance of our dataset. This procedure is carried out to gauge the effectiveness of the YOLOv5
assesses how well the YOLOv5 and Mask R-CNN and Mask R-CNN algorithms. We test the Mask RCNN
algorithms work. There are two phases to the evaluation: algorithm with various iteration settings and thresholds. In
testing and training. During the training phase, both contrast, the Mask RCNN algorithm is evaluated using
methods employed 1,023 photos from the dataset. During 1000, 1500, 2000, 2500, and 3000 iteration values,
the testing phase, 113 different photos are used to assess respectively, with threshold values varying between 0.1
the algorithms. At this stage, the algorithms' object and 0.9 for every iteration. The YOLOv5 algorithm, on the
detection performance is evaluated, and a confusion other hand, makes use of distinct threshold values and
matrix is used to calculate the algorithms' accuracy. epochs. 50, 75, 100, 125, and 150 are the employed epoch
Numerous significant performance indicators, including values, and each epoch's threshold values range from 0.1
accuracy, precision, recall, F1-score, mean average to 0.9.
precision (MAP), and area under the curve (AUC), can be a. Mask R-CNN Algorithm: The Mask R-CNN
obtained from the confusion matrix. Five iterations and algorithm identified 80 data labels for
epochs of performance evaluation are granted for both Dermatomycosis (D), 19 for Mucocutaneous
algorithms. Candidiasis (MC), and 0 for Pityriasis Versicolor
(PV) after five tests. A total of 113 photos were
positively detected. Table 1 presents the
interpretation of the performance calculation for
the Mask RCNN method used to treat skin
diseases. The algorithm uses 3000 iterations and
varies the threshold (T) from 0.1 to 0.9. The F1-
score is the method for identifying the optimal
model, with a threshold value for model
evaluation between 0.1 and 0.9. When assessing
the binary model, the harmonic mean of precision
and recall is employed using the F1-score.
According to Table 1, the maximum F1-score of
0.28 is attained at the 0.1 level. The precision is
49%, the recall is 19%, and the accuracy is 67%
at this threshold.
Figure 8: Data labeling on images using the
bounding box tool
Table 1: Performance of Mask R-CNN with 3000 iterations

      Accuracy                 Recall                   Precision                F1-Score
T     D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP
0.1   0.39  0.70  0.92  0.67   0.35  0.23  0     0.19   0.87  0.59  0     0.49   0.50  0.33  0     0.28
0.2   0.40  0.72  0.93  0.68   0.35  0.19  0     0.18   0.91  0.63  0     0.51   0.51  0.29  0     0.27
0.3   0.40  0.73  0.94  0.69   0.34  0.15  0     0.16   0.92  0.58  0     0.50   0.50  0.24  0     0.25
0.4   0.40  0.73  0.94  0.69   0.33  0.14  0     0.16   0.95  0.69  0     0.55   0.49  0.23  0     0.24
0.5   0.40  0.74  0.95  0.70   0.33  0.15  0     0.16   0.96  0.73  0     0.56   0.49  0.25  0     0.25
0.6   0.40  0.75  0.96  0.70   0.32  0.14  0     0.15   0.96  0.71  0     0.56   0.48  0.23  0     0.24
0.7   0.39  0.75  0.96  0.70   0.31  0.12  0     0.14   0.96  0.67  0     0.54   0.47  0.20  0     0.22
0.8   0.38  0.77  0.97  0.71   0.31  0.13  0     0.15   0.96  0.67  0     0.54   0.47  0.22  0     0.23
0.9   0.33  0.81  0.98  0.71   0.26  0.12  0     0.13   0.91  0.50  0     0.47   0.40  0.19  0     0.20
The area under the ROC graph is then computed using the c. Evaluation of Proposed Algorithms: The Mask
AUC value. It is employed as a performance evaluation R-CNN and the YOLOv5 algorithm can be
statistic to gauge a classification model's effectiveness. A compared to the calculation results obtained from
higher AUC score indicates better model performance in the algorithm testing technique for detecting
differentiating between positive and negative classes. In fungal infections-caused skin problems, which
the fifth test, an AUC value of 0.55 is displayed on the contained 113 data points from three different
ROC graph of the Mask R-CNN method, as depicted in skin conditions. Table 3 displays the comparison
Figure 10. values.
In every metric that is examined, YOLOv5 outperforms
Mask R-CNN, including accuracy (0.87), recall (0.80),
precision (0.85), F1-Score (0.81), and AUC (0.88). The
variety of fungal infections in appearance, size, form, and
texture makes it particularly difficult to diagnose skin
illnesses caused by these infections. Algorithms that
process medical pictures accurately and efficiently are
necessary for effective detection. Here, the effectiveness
of two widely used object detection algorithms, YOLOv5
and Mask R-CNN, is compared.
This can be beneficial when precise infection borders
are critical in medical imaging. Dermatologists may find
the capacity to create segmentation masks especially
Figure 10: ROC following the fifth Mask R-CNN helpful in diagnosing and treating infections since they
algorithm test offer comprehensive details about the affected regions.
However, because of its multi-stage processing, Mask R-
b. YOLOv5 Algorithm: By the time the fifth test CNN requires a lot of computing power. This may lead to
was reached, the YOLOv5 algorithm had slower inference and longer training times, which could be
accurately identified 113 images; however, it had problematic for real-time applications or for handling big
only identified 86 data labels for Pityriasis datasets.
Versicolor, 39 for Mucocutaneous Candidiasis, By adding a branch for predicting segmentation masks
and 86 for Dermatomycosis. The performance on each Region of Interest (RoI) in parallel with the
testing at epoch 150 with a threshold value of 0.1 current branch for classification and bounding box
is displayed in Table 2. According to Table 2, the regression, Mask R-CNN expands upon Faster R-CNN.
greatest F1-Score value is 0.81, at the 0.1 The two-stage method of Mask R-CNN, which includes
threshold, with 86% accuracy, 8% recall, and region proposal and refining, enables very accurate object
94% precision. With an AUC value of 0.88. identification and segmentation.
Figure 11. displays a graph for the ROC of the
YOLOv5 algorithm in the fifth test.
Table 2: Performance of YOLOv5 with 150 epochs

      Accuracy                 Recall                   Precision                F1-Score
T     D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP
0.1   0.77  0.83  0.99  0.86   0.77  0.72  0.92  0.80   0.71  0.92  0.80  0.81   0.80  0.71  0.92  0.81
0.2   0.74  0.86  0.99  0.86   0.68  0.72  0.92  0.77   0.78  0.92  0.84  0.85   0.76  0.75  0.92  0.81
0.3   0.72  0.86  0.98  0.85   0.62  0.69  0.83  0.71   0.80  0.91  0.85  0.85   0.72  0.74  0.87  0.78
0.4   0.67  0.85  0.98  0.83   0.52  0.62  0.82  0.65   0.83  0.90  0.86  0.86   0.65  0.71  0.86  0.74
0.5   0.62  0.83  0.97  0.81   0.42  0.53  0.50  0.48   0.85  1     0.89  0.91   0.57  0.65  0.67  0.63
0.6   0.58  0.80  0.96  0.78   0.33  0.42  0.33  0.36   0.89  1     0.92  0.94   0.49  0.57  0.50  0.52
0.7   0.53  0.75  0.93  0.74   0.23  0.25  0     0.16   1     0     1     0.67   0.37  0.40  0     0.26
0.8   0.44  0.69  0.93  0.69   0.08  0.07  0     0.05   1     0     1     0.67   0.15  0.13  0     0.09
0.9   0.40  0.67  0.93  0.67   0.02  0     0     0.01   0     0     1     0.33   0.04  0     0     0.01
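The tables above report metrics over a grid of confidence thresholds, and the best operating point is chosen by F1-score; a minimal sketch of that selection logic is shown below, using illustrative per-threshold detection counts rather than the study's actual detections.

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 is the harmonic mean of precision and recall computed from detection counts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Illustrative counts per confidence threshold (not the paper's detections).
counts_by_threshold = {
    0.1: (90, 20, 23),
    0.5: (60, 7, 53),
    0.9: (5, 1, 108),
}
best_t = max(counts_by_threshold, key=lambda t: f1_from_counts(*counts_by_threshold[t]))
print(best_t, f1_from_counts(*counts_by_threshold[best_t]))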
Table 3: Performance comparison of the proposed algorithms (M = Mask R-CNN, Y = YOLOv5)

Iterations/  Threshold   Prediction   Accuracy    Recall      Precision   F1-Score    AUC
Epochs       M     Y     M     Y      M     Y     M     Y     M     Y     M     Y     M     Y
1000 / 50    0.6   0.1   176   163    0.79  0.84  0.34  0.76  0.60  0.76  0.43  0.75  0.61  0.81
1500 / 75    0.2   0.1   179   179    0.73  0.87  0.42  0.76  0.36  0.85  0.38  0.78  0.61  0.88
2000 / 100   0.1   0.1   220   183    0.67  0.86  0.22  0.78  0.45  0.82  0.30  0.80  0.55  0.87
2500 / 125   0.7   0.1   199   161    0.80  0.87  0.40  0.75  0.67  0.85  0.45  0.79  0.57  0.84
3000 / 150   0.1   0.1   260   183    0.67  0.86  0.19  0.80  0.49  0.81  0.28  0.81  0.55  0.88
predictions, which is essential for identifying various
Additionally, YOLOv5 produced higher recall and
precision metrics, demonstrating its efficacy in reducing
false positives and false negatives. This is important for
medical diagnosis since misdiagnosing a healthy area as
sick (false positive) or failing to detect an infection (false
negative) can have serious repercussions. This implies that
YOLOv5 has a higher degree of accuracy when it comes
to recognizing contaminated regions in the pictures.
Furthermore, the high AUC suggests that YOLOv5
performs better across a range of threshold values in
differentiating between infected and non-infected areas.
Although Mask R-CNN provides thorough segmentation,
the comparison research indicates that for this specific
task, the benefits of segmentation are not greater than
those of YOLOv5's higher detection accuracy and
Figure 11: ROC following the fifth of the YOLOv5 efficiency. However, in some clinical situations where
Algorithm test precise infection boundaries are required, Mask R-CNN's
segmentation function might still be useful.
Our proposed technique, YOLOv5, is a quick and Because YOLOv5 processes information more
efficient single-stage object detection algorithm that quickly, it is more suited for real-world uses where timely
outperforms two-stage detectors such as Mask R-CNN. findings are crucial, like automated screening systems in
Large-scale image analysis and real-time detection healthcare settings. According to the comparison analysis,
scenarios benefit greatly from this efficiency. Although it YOLOv5 is a more sensible option for large-scale screens
might not offer as much segmentation depth as Mask R- and real-time applications. Nonetheless, the needs of the
CNN, YOLOv5 has a very high degree of object detection application, such as the necessity for segmentation against
and classification accuracy. It captures items of different the requirement for quick and precise identification,
sizes with the aid of anchor boxes and multi-scale should be considered while choosing between the two
algorithms.
7 Conclusion [5] Altammami, G. S et al. 2024. Dermatological
Conditions in The Intensive Care Unit at A Tertiary
There are notable variations in performance parameters Care Hospital in Riyadh, Saudi Arabia.” Saudi
including accuracy, recall, precision, F1-Score, and AUC Medical Journal vol. 45, 8. pp. 834-839.
when YOLOv5 and Mask R-CNN algorithms are https://doi.org/10.15537/smj.2024.45.8.20240479
compared to identify fungal diseases on the skin. The
disparities arise from variations in iteration (or epoch) [6] Khavandi, S. et al. 2023. Investigating the Impact of
values, which affect the algorithms' capacity to acquire Automation on the Health Care Workforce Through
knowledge and generalize from the training set. The Autonomous Telemedicine in the Cataract Pathway:
results of performance tests indicate that algorithm Protocol for a Multicenter Study.” JMIR research
performance is influenced by epoch or iteration values protocols vol.12. e49374. pp.1-10.
from the Mask R-CNN and YOLOv5 algorithms' first to https://doi.org/10.2196/49374
fifth tests. The second test had the highest AUC value with [7] Roy, K. S. et al. 2019. Skin Disease Detection Based
1500 iterations of the Mask R-CNN method and 75 epochs on Different Segmentation Techniques. 2019 Int. Conf.
of the YOLOv5 algorithm. With an AUC value of 66% Opto-Electronics Appl. Opt. Optronix. pp. 1–5, 2019.
and an F1-score of 38%, the Mask R-CNN algorithm is https://doi.org/10.1109/OPTRONIX.2019.8862403
less successful in 1500 iterations at identifying diseases
caused by fungal infections of the skin. On the other hand, [8] Archana, R. and Jeevaraj, P.S.E. 2024. Deep learning
with an AUC value of 88% and an F1-Score value of 78%, models for digital image processing: a review. Artif
the YOLO algorithm's test results demonstrate its good Intell Rev 57, 11.
ability to identify diseases brought on by skin infections, https://doi.org/10.1007/s10462-023-10631-z
as evidenced by its ability to forecast 179 disorders in [9] Thakur, G. K. et al. 2024. Deep Learning Approaches
epoch 75. for Medical Image Analysis and Diagnosis. Cureus
The thorough investigation demonstrates that vol. 16,5 e59507. pp. 1-8.
YOLOv5 performs better than Mask R-CNN in terms of https://doi.org/10.7759/cureus.59507
accuracy, recall, precision, F1-Score, and AUC when it
[10] Bhatti, H.M.A. et al. 2020. Multi-detection and
comes to identifying fungal diseases on the skin. Iteration
Segmentation of Breast Lesions Based on Mask
and epoch settings have a significant impact on
RCNN-FPN. Proc. - 2020 IEEE Int. Conf.
performance; YOLOv5 shows the best performance at 75
Bioinforma. Biomed. BIBM 2020, pp. 2698–2704.
epochs. Mask R-CNN is less suited for this application
https://doi.org/10.1109/BIBM49941.2020.9313170
due to its computational intensity and lower detection
accuracy, even though it has segmentation capabilities. As [11] Subash, K.V.V. et al. 2020. Object Detection Using
a result, in both clinical and real-world contexts, YOLOv5 Ryze Tello Drone With Help of Mask-RCNN. 2nd Int.
is the recommended algorithm for identifying fungal- Conf. Innov. Mech. Ind. Appl. ICIMIA 2020 - Conf.
induced skin disorders due to its exceptional efficiency Proc., no. Icimia, pp. 484-490, 2020.
and accuracy. Future research can continue to enhance the https://doi.org/10.1109/ICIMIA48430.2020.907488
detection accuracy and practical applicability of deep 1
learning models for diagnosing skin fungal infections.
[12] Zhou, Zhijian. et al. 2020. Detection and
Classification of Multi-Magnetic Targets Using
References Mask-RCNN. IEEE Access. 8. 187202-187207.
[1] Majaranta, P. et al. 2019. Eye Movements and Human- https://doi.org/10.1109/access.2020.3030676
Computer Interaction In: Klein, C., Ettinger, U. (eds) [13] Ieamsaard, J. et al. 2021. Deep Learning-based Face
Eye Movement Research. Studies in Neuroscience, Mask Detection Using YoloV5. In Proceeding of the
Psychology, and Behavioral Economics. Springer, 2021 9th International Electrical Engineering
Cham, pp.971-1015. Congress, iEECON 2021, Mar. 2021, pp. 428–431.
https://doi.org/10.1007/978-3-030-20085-5_23 https://doi.org/10.1109/iEECON51072.2021.94403
[2] Al Bshabshe, A. et al. 2023. An Overview of Clinical 46
Manifestations of Dermatological Disorders in [14] Yang, G. et al. Face Mask Recognition System with
Intensive Care Units: What Should Intensivists Be YOLOV5 Based on Image Recognition. 2020. In
Aware of? Diagnostics 13, no. 7: 1290. 2020 IEEE 6th International Conference on
https://doi.org/10.3390/diagnostics13071290 Computer and Communications, ICCC 2020, Dec.
[3] Nigat, T.D.; Sitote, T.M. 2023. Gedefaw, B.M. Fungal 2020, pp. 1398-1404.
Skin Disease Classification Using the Convolutional https://doi.org/10.1109/ICCC51575.2020.9345042
Neural Network. J. Healthc. Eng. 6370416, pp.1-9. [15] Ajith, A. et al. 2017. Digital Dermatology Skin
https://doi.org/10.1155/2023/6370416 Disease Detection Model using Image Processing.
[4] Badia, M. et al. 2020. Dermatological Manifestations Int. Conf. Intell. Comput. Control Syst. ICICCS
in the Intensive Care Unit: A Practical Approach. Crit 2017 Digit., vol. 24, no. 7, pp. 168-173.
Care Res Pract. 2020:9729814. https://doi.org/10.1109/iccons.2017.8250703
https://doi.org/10.1155/2020/9729814
[16] Haddad, A and Hameed, S.A. 2018. Image Analysis [22] Chhatlani, J. et al. 2022. DermaGenics - Early
Model for Skin Disease Detection: Framework. Proc. Detection of Melanoma using YOLOv5 Deep
2018 7th Int. Conf. Comput. Commun. Eng. ICCCE Convolutional Neural Networks. 2022 IEEE Delhi
2018, no. c, pp. 280-283, 2018. Section Conference (DELCON). pp. 1-6.
https://doi.org/10.1109/ICCCE.2018.8539270 https://doi.org/10.1109/DELCON54057.2022.9753
227
[17] Rathod, J. et al. 2018. Diagnosis of skin diseases
using Convolutional Neural Networks. Proc. 2nd Int. [23] Wiliani, N., et al. 2023. Identifying Skin Cancer
Conf. Electron. Commun. Aerosp. Technol. ICECA Disease Types with You Only Look Once (YOLO)
2018, no. Iceca, pp. 1048-1051, 2018. Algorithm. Jurnal Riset Informatika. 5(3), 455-464.
https://doi.org/10.1109/ICECA.2018.8474593 https://doi.org/10.34288/jri.v5i3.241
[18] Rohaziat, N. et al. 2020. White Blood Cells Detection [24] Aishwarya, N., et al. 2023. Skin Cancer diagnosis
using YOLOv3 with CNN Feature Extraction with Yolo Deep Neural Network. Procedia
Models. International Journal of Advanced Computer Science. vol 220, pp. 651-658.
Computer Science and Applications. 11. https://doi.org/10.1016/j.procs.2023.03.083
https://doi.org/10.14569/IJACSA.2020.0111058
[25] He, K., et al. 2020. Mask R-CNN. IEEE Trans.
[19] Wu, Z. et al. 2019. Studies on Different CNN Pattern Anal. Mach. Intell. vol. 42, no. 2, pp. 386-
Algorithms for Face Skin Disease Classification 397.
Based on Clinical Images. IEEE Access. vol. 7, no. https://doi.org/10.1109/TPAMI.2018.2844175
c, pp. 66505-66511.
[26] Shi, W., et al. 2019. Plant-part segmentation using
https://doi.org/10.1109/ACCESS.2019.2918221
deep learning and multi-view vision. Biosyst. Eng.
[20] Li, L. F. et al. 2020. Deep Learning in Skin Disease vol. 187, no. September, pp. 81–95.
Image Recognition: A Review. IEEE Access. vol. 8, https://doi.org/10.1016/j.biosystemseng.2019.08.01
pp. 208264-208280. 4
https://doi.org/10.1109/ACCESS.2020.3037258
[21] Salama, A. M. et al. 2024. A YOLO-based Deep
Learning Model for Real-Time Face Mask Detection
via Drone Surveillance in Public Spaces,
Information Sciences, vol. 676, 120865.
https://doi.org/10.1016/j.ins.2024.120865
https://doi.org/10.31449/inf.v49i16.4974 Informatica 49 (2025) 97–114 97
A Unified Trace Meta-Model for Alignment and Synchronization of BPMN
and UML Models
Aljia Bouzidi1, Nahla Haddar2 and Kais Haddar2
1ISIMM University of Monastir, Monastir, Tunisia
2University of Sfax, Sfax, Tunisia
E-mail: aljia.bouzidi@gmail.com, nahla_haddar@yahoo.fr, kais.haddar@yahoo.fr
Keywords: Traceability, synchronization, alignment, use case diagram, class diagram, MVC, BPMN diagram, integration,
transformation
Received: February 24, 2025
Organizations often face information system (IS) failures due to misalignment with business goals. Business
process models (BPMs) play a crucial role in addressing this issue but are often developed independently
of IS models (ISMs), resulting in non-interoperable systems. This paper proposes a traceability method to
link BPMs and ISMs, bridging the gap between business and software domains.
We introduce a unified trace meta-model integrating BPMN elements with UML constructs (use cases and
class diagrams) via traceability links. This meta-model is instantiated as the BPMNTraceISM diagram,
ensuring seamless integration through bidirectional transformation models.
To validate our approach, we developed a graphical editor for BPMNTraceISM diagrams and implemented
transformations using the ATLAS Transformation Language (ATL). A case study on a loan approval process
demonstrates the method’s effectiveness in aligning BPMN and UML elements, improving interoperability
and model alignment across domains.
Povzetek: Razvit je enoten sledilni meta-model, ki povezuje elemente BPMN in UML (diagram primerov
uporabe, razredov po vzorcu MVC) za uskladitev poslovnih procesov in informacijskih sistemov, ki ga
validirajo z grafičnim urejevalnikom in transformacijami ATL.
1 Introduction evant explicit traceability model and defined it using the
integration mechanism. Indeed, we propose a requirements
In the software engineering field, Business Process Mod- engineering method that works at both the meta-model and
els (BPMs) playb an increasingly central role in the devel- model levels, establishing traceability between BPMs and
opment and continued management of software systems. ISMs to bridge the gap between business modeling and re-
Therefore, it is crucial to have Information System Mod- quirements elicitation.modeling. This method is deliber-
els (ISMs) that tackle BPMs. modelling. However, these ately influenced by the Object Management Group (OMG)
models are mostly expressed using different modeling lan- specifications. Particular attention is given to UML use
guages, and only a few information systems (IS) are devel- case models [2] as the most commonly used way to elicit
oped with explicit consideration of the business processes software needs and BPMN [3] as the most widely used lan-
they are supposed to support. This separation causes gaps guage to specify the business process model (BPMs). In-
between business and IS models. Thus, a methodology is deed, in [1], we firstly defined a unified trace meta-model
needed to examine the gap between BPMs and ISMs, and of the BPMN and the UML use case models in the form of
keep them aligned even as they evolve. Traceability in soft- an integrated single meta-model. It defines also traceability
ware development proves its ability to associate overlap- links between interrelated concepts to correlate overlapped
ping artefacts of heterogeneous models (for example, busi- concepts as new modeling concepts. This meta-model is
ness models, requirements, uses cases, design models), im- then instantiated in the form of a new diagram that we called
prove project results by helping designers and other stake- BPSUC (Business Process SupportedUse Cases). This new
holders with common tasks such as analysis of change im- diagram permits business teams and requirements design
pacts, etc. Thereby creating an explicit traceability model teams to work together within the same model, and allows
that is not a standalone guideline, but it has significant ben- specifying trace links graphically.
efits in terms of quality, automation, and consistency. Al-
though creating it is not a trivial task, an explicit traceabil- The practical benefits of the proposed method lie in its
ity model remains a reference for a consistent definition of ability to bridge the gap between business process man-
typed traceability links between heterogeneous model con- agement (BPM) and software systems development. In the
cepts, helping to ensure their alignment and coevolution. context of Business Process Models (BPMs) and Informa-
In our previous work presented in [1], we proposed a rel- tion System Models (ISMs), this method enables seamless
In the context of Business Process Models (BPMs) and Information System Models (ISMs), this method enables seamless integration and traceability across heterogeneous models, which is crucial for ensuring their alignment and coevolution. By establishing clear and accurate traceability links between BPMs, UML use case diagrams, and class diagrams, our method enhances communication and collaboration among business analysts, software engineers, and stakeholders, ensuring that software systems are developed with a clear understanding of the business processes they aim to support. Furthermore, the integration of these models, coupled with the explicit traceability model, provides several practical advantages, including improving change impact analysis, enhancing automation, and maintaining consistency throughout the system lifecycle. These benefits are particularly significant in dynamic environments where both business processes and software systems evolve frequently. Thus, the method not only improves the quality of the development process but also provides a robust framework for aligning business and software models, ensuring their cohesive adaptation to changing requirements and system developments.

This paper enriches and extends our work presented in [1]. The enrichment involves adding class diagram concepts structured according to the MVC pattern. Our intervention considers both the meta-model and the model levels. Hence, in the integrated trace meta-model proposed in [1], we add new modeling concepts to express trace links between the class diagram, use case diagram, and BPMN concepts. Class diagram concepts that have no corresponding concepts are also included in the integrated trace meta-model. The proposed traceability concepts and class diagram concepts are instantiated in the BPSUC diagram. Accordingly, BPSUC now enables the design of class diagram elements and the proposed traceability concepts combined with their corresponding BPMN and use case diagram artefacts.

We validate our theoretical method by implementing a visual modeling tool that supports the enriched integrated trace meta-model and the new diagram supplemented with class diagram elements.

The rest of this paper is organized as follows: Section 2 is dedicated to discussing related work. In Section 3, we give an overview of the method presented in [1]. Section 4 is devoted to explaining our contributions. Sections 5 and 6 are dedicated to demonstrating the feasibility of our proposal in practice and through a topical case study. In Section 7, we evaluate and discuss our method. Finally, in Section 8, we conclude the current work and give some outlooks.

2 Related work

We classify related work into two groups based on the methodologies they have used to establish traceability between elements of heterogeneous models: (1) works that have proposed transformation models to define internal or implicit traceability models, and (2) approaches that defined external traceability models manually, based on mechanisms such as model integration, model merge/composition, UML profiles, or matrices.

2.1 Traceability via transformation models

In the first category, existing implicit traceability models are commonly MDA-compliant approaches that define traceability through exogenous, endogenous, horizontal, or vertical transformation models. In these approaches, BPMN models are widely used to generate alternative models through different transformation model types. Among the various uses of BPMN models are: an exogenous transformation for mapping users'/organizations' requirements to BPMN models [4]; a vertical transformation for the generation of artefacts between BPMN and user stories [5] and [6], and for the generation of UML models [7]; a horizontal and exogenous transformation for the generation of activity diagrams from BPMN [8] and [9]; and a vertical transformation of textual requirements into a BPMN model [10]. Some approaches define endogenous transformations between UML diagram elements to establish their traceability. For instance, the approach in [11] uses machine learning techniques to maintain traceability information between software models. Their focus is particularly on the requirements, analysis, and design models, which are specified in the UML language. To trace links between requirements documents and UML diagrams, several approaches use Natural Language Processing (NLP). For example, the approach in [12] uses a system requirement description expressed in natural language to extract the actors and the actions automatically.

The core benefit of defining implicit traceability is that it does not require supplemental effort, because a single transformation chain is sufficient to perform transformations in both directions. Moreover, it offers multiple trace links between generated artefacts. However, the identified trace links consider exclusively transformed artefacts. Moreover, the transformation chain is static and cannot be updated to obtain the required traces for such traceability scenarios.

2.2 Explicit traceability models

The second category includes approaches that define explicit traceability models separate from the source models. This category includes approaches that propose guidelines for creating traceability models. For instance, the author of [13] defines a method for guiding the establishment of traceability between the software requirements and the UML diagrams. This guideline has two main components: (i) a meta-model and (ii) a process step. The process step defines the detailed processes, the mapping of requirements to UML diagrams, and the types of requirements. Requirements can be classified according to their aspects.
This classification can be carried out according to certain types of UML diagrams. However, this guideline focuses only on establishing traceability at the meta-model level. Moreover, the business field is not considered in this work. The authors of [14] propose a meta-model-based approach to create traceability links between different levels of the same system. Indeed, this approach focuses on defining a traceability meta-model over the source code, stored as an Abstract Syntax Tree (AST), and other possible artefacts such as requirements, test cases, etc. To show the identified trace links, the authors develop an editor. Nevertheless, storing the source code of a system as an AST can cause several problems, such as the appearance of syntax errors in the source code, which leads to the loss of traceability links.

There is other model-based research that aims to maintain traceability. For example, the research in [15] proposes a co-evolution of transformations based on the propagation of change. Its hypothesis is that knowledge of the evolution of meta-models can be disseminated by decisions aimed at driving the co-evolution of transformations. To address particular cases, the authors present composition-based techniques that help developers compose resolutions that meet their needs. For the same purpose, the approach in [11] refers to machine learning techniques to introduce an approach called TRAIL (TRAceability lInk cLassifier). The classifier is trained on a dataset that contains histories of existing traceability links between pairs of artefacts, in order to output the trace link (related or unrelated) of any future pair of artefacts (new or already existing). Some other approaches define traceability models for eliciting requirements of complex systems [16] and [17]. Likewise, the authors of [19] base their work on deep learning techniques and propose a neural network architecture based on word embedding and Recurrent Neural Network (RNN) algorithms to predict trace links automatically. The output of this model is a vector that contains the semantic data of the artefact. Then, the trained model compares the semantic vectors of a pair of artefacts and predicts whether they are related or unrelated. However, considering all meta-models from many different abstraction levels in one unified single traceability model is not a trivial task and can result in very complex models.

In [18], the authors propose an approach to promote traceability and synchronization of computational models in an Enterprise Architecture (EA), using meta-models, model traceability, and synchronization structures. The authors represent the meta-models of the EA at all abstraction levels (strategic, tactical, and operational). These levels are denoted within the integrated meta-model by three packages. Each package incorporates the core concepts of the level it represents. They integrate the three meta-models by adding alignment points between them. In addition, they define a traceability framework and a synchronization framework to support the analysis of the impact of organizational changes.

There are also studies on specific languages. For example, the approach in [20] uses Natural Language Processing techniques to define a framework for managing traceability between software artefacts. To demonstrate their work in practice, the authors develop a tool that supports traceability links between software models, including requirements and UML class diagrams, and the source code written in the Java programming language.

2.3 Identified gaps in existing works

Overall, existing works that define explicit traceability models are mostly focused on the meta-model level only and ignore the model level. Moreover, existing explicit models establish traceability either between software models expressed in UML diagrams at the same or different abstraction levels or between business model artefacts. However, none of the existing approaches has achieved successful results in establishing or maintaining traceability between BPMN models, UML use case models, and the UML class diagram.

The disadvantages of the proposed approaches stem from rigid relationship types that fail to adapt to the changing needs and practices of organizations. Furthermore, most of the proposed approaches define or use very generic traceability meta-models, capable of generating highly abstract trace models. In practice, there is no prescription for how to add customized tracing information or how to adapt a generic traceability meta-model to express valuable and context-specific traces. Concerning the approaches that focus on concrete modeling languages, to our knowledge, there is no approach that proposes an explicit trace model or meta-model between BPMN and UML models, even though they are the most popular standards for modeling business processes and automated information systems.

3 Background of our previous traceability method

The method presented in this paper is an extension of our previous work [1]. In this previous work, we explored the advantages of defining an integrated traceability model to establish traceability between the BPMN and the UML use case models and ensure their coevolution once a change has occurred. This method acts at both the meta-model and model levels, and it includes three core steps:

(i) First, we defined an integrated trace meta-model that is a specification of traceability between the existing artefacts, while keeping them unchanged and independent. This integrated trace meta-model contains all the BPMN and the UML use case meta-model artefacts (meta-classes and associations), unified with new meta-classes and associations for expressing traceability links at the meta-model level. The integrated trace meta-model favors simplicity and uniformity because the source meta-models are kept and unified with their traceability information in one unified meta-model.

(ii) Next, we instantiated the integrated trace meta-model at the model level.
We represent it as a new diagram called Business Process Modeling Notation Traces Use Case (BPSUC). This diagram also incorporates the BPMN and the UML use case elements together with traceability links, and allows designing BPMN and use case diagram artefacts jointly. Moreover, visualizations and queries on traced elements are straightforward, because business analysts and software designers are now able to work together on one integrated model. BPSUC can also be used to analyse change impacts and validate them before propagating them to the source models.

(iii) Finally, we defined bidirectional transformation models between the BPSUC diagram and the source models (BPMN and the use case models) to ensure the coevolution of the origin models.

3.1 Integrated trace meta-model

In our previous work presented in [1], a unified trace meta-model is proposed based on a semantic mapping of pairs of BPMN and use case meta-model artefacts. The definition of this meta-model follows the following scenario: for each pair of overlapping BPMN and UML use case concepts, we add a new modeling concept that can be either a link, such as an association, a composition, or an inheritance, or a new meta-class. Each trace link represented by a new meta-class is associated with the pair of artefacts it specifies, generally by an inheritance relationship.

Table 1 summarizes the mappings between the use case diagram concepts (first column), the BPMN model concepts (second column), and the corresponding new meta-classes (third column) that are associated with them in the integrated meta-model (the full mapping and its explanation are available in [1]). To validate the proposed mapping further, we have conducted additional evaluations across a variety of BPMN and UML diagram scenarios.

Table 1: Mapping of BPMN, use case, and trace meta-model concepts

Use case concept | BPMN concept | Meta-model concept
Package | Non-empty lane (a lane including other sub-lanes) | Organisation Unit Package
Actor | Empty lane (a lane that does not contain other sub-lanes) | Organisation Unit Actor
Use case | Fragment represented by a sequence of BPMN artefacts that is performed by the same role and manipulates the same item-aware element (business object, input data, data store, data state) | UCsF
Extends | Exclusive gateway between two different fragments | Exclusive Gateway, Extends
Association | Fragment within the lowest nesting level of sub-lanes | Association
Includes | Redundant fragment (that appears multiple times in the BPMN model) | Fragment that appears multiple times, Includes
Extends | Inclusive gateway between two different fragments | Inclusive Gateway, Extends
Extension point | Condition of a sequence flow + the name of the fragment that represents the extending use case | Extension Point

This expansion includes not only the core BPMN and UML elements, such as activities, actors, and use cases, but also more complex diagrams, such as:
– BPMN models: event-driven processes, process variants, and sub-processes with different complexity levels, such as loan approval and inventory management systems.
– UML diagrams: class diagrams, including inheritance and association relationships, and more sophisticated use case diagrams representing different business functions, such as order fulfilment, customer support, and system maintenance.

These diverse scenarios have allowed us to assess how well the mapping between BPMN, UML use case, and class diagrams holds up in real-world business process and system modeling. By applying the proposed traceability method to these varied scenarios, we demonstrate the scalability and robustness of our meta-model. We have also provided examples where the mapping effectively handles the integration of different BPMN and UML model types, ensuring traceability between the business and software models.

The integrated meta-model is depicted in Figure 1. In order to keep it readable, we present in this figure only the core artefacts of the source meta-models (use case meta-model and BPMN meta-model) and all the trace links (meta-classes and associations). Dark grey meta-classes represent new meta-classes; light grey meta-classes represent UML use case elements; white meta-classes represent BPMN elements; and black lines represent existing associations from the source meta-models. The blue lines represent trace relationships, providing the foundational traceability between BPMN and UML elements, as further detailed in [1] and [22].

– Organizational-Unit-Package
In BPMN, a non-empty lane is a grouping element and therefore has the same meaning as a package in UML.
Consequently, the Organizational-Unit-Package (OUPackage) is defined to trace the link between the pair BPMN non-empty lane and use case package, thereby defining an inheritance relationship between the new meta-class and this artefact pair.

– Organizational-Unit-Actor
In the proposed integrated trace meta-model of [1], we have defined a meta-class designated Organizational-Unit-Actor (OUActor). This new meta-class traces artefacts of the pair UML actor and BPMN empty lane (i.e. a lane that does not have embedded lanes). That is, it unifies the properties of a lane and an actor and combines them without changing their semantics, thereby defining the OUActor as a specialization of the UML actor and BPMN lane pair. In this way, OUActor inherits the properties of this pair of artefacts without updating their original semantics and structures. For example, in a loan approval process, the Loan Officer is represented in a BPMN diagram by a lane and in a UML use case diagram as an actor involved in the process. The OUActor meta-class links these representations, inheriting properties from both while preserving their original semantics. This ensures synchronization between BPMN and UML elements, offering a unified view of the Loan Officer's role across models.

– Fragment
A fragment is defined by [22] as "a set of interrelated BPMN elements that has inputs and outputs, and which is executed by the same performer". This artefact is specified in the unified trace meta-model as an instance of the meta-class Fragment (cf. Figure 1). As a Fragment is just an activity that can contain other BPMN concepts such as tasks, events, gateways, and sequence flows, we have aggregated a BPMN sub-process to a Fragment by creating an aggregation relationship, called fragments, between the fragment and sub-process meta-classes in the integrated trace meta-model (cf. Figure 1). Its cardinality is 1..* to point out that a sub-process should contain at least one fragment, but it may incorporate more than one Fragment. In addition, we define a many-to-many reflexive association on the fragment to represent the fact that a fragment may be an aggregation of other fragments (cf. Figure 1). Moreover, we create an association between the data object and the fragment (cf. Figure 1) to associate each fragment with the objects it manipulates. The cardinality of this relationship is fixed to 1..* to indicate that each fragment manipulates at least one business object type, but it may manipulate more than one business object. Furthermore, we have defined an association called organizationUA between the meta-classes OU-Actor and Fragment with a cardinality of 1..* to associate a fragment with its performer. For instance, tasks such as Review Application, Assess Credit Score, and Approve Loan in the BPMN Loan Approval Process are all performed by the Loan Officer. The Fragment aggregates these tasks into a cohesive group and connects them to the BPMN sub-process as well as to the business objects (e.g., Loan Applications, Credit Scores) manipulated during the process. This provides clear traceability between tasks, business objects, and performers while maintaining logical consistency.

– Use case supporting fragment
In order to support business objectives, a UML use case should be able to realize some business activities, which are specified in the integrated trace meta-model by a Fragment. A separate specification of the use case and the fragment it is supposed to realise does not allow explicitly representing the semantic links between them. To do this, we have defined the integrated trace meta-model presented in [1], which introduces a new meta-class that we designate Use Case supporting Fragment (UCsF). This new meta-class is defined as a specialization of a UML use case in order to inherit all its properties without updating its initial meaning.

3.2 BPSUC diagram

To allow modeling the artefacts of the proposed integrated trace meta-model, we instantiated it in the form of an integrated trace model in our previous work [1]. We represent it as a new diagram that we have called the BPMN Supporting Use Case model (BPSUC).

Table 2: Notation of the traceability artefacts

Meta-model concept | Graphical notation
OU-Actor | (dedicated icon)
OU-Package | (dedicated icon)
Use Case supporting Fragment (UCsF) | (dedicated icon)

For each concept, we have provided a graphical notation as follows: we have introduced new notations for the proposed new meta-classes UCsF, Organization Unit Package, and Organization Unit Actor. These notations are inspired by and extended from the icons of the pair of artefacts they represent. This inspiration ensures that experienced business and system designers are comfortable using the BPSUC diagram. Each concept originating from the UML use case and BPMN models [1] retains its original notation. In the BPSUC diagram, the Fragment is instantiated as a specific activity within the Loan Approval Process, linking BPMN tasks to the Loan Officer (OUActor) and business objects like the Loan Application. Additionally, the Organization Unit Package (OUPackage) meta-class is used to trace relationships between BPMN lanes and UML packages.
For example, functional areas such as Loan Review, Credit Assessment, and Loan Approval in the BPMN diagram are mapped to corresponding UML packages, ensuring alignment and traceability between these functional areas and their UML counterparts.

Figure 1: Traceability of BPMN and use case meta-model concepts

Table 2 depicts the graphical notations of the new meta-classes Organisation Unit Actor, Organisation Unit Package, and the UCsF.

4 Traceability method

The research work conducted in this paper is an extension and enhancement of our previous work presented in [1]. The extension consists of improving the integrated trace meta-model and the BPSUC diagram to include the artefacts of the UML class diagram structured according to the MVC design pattern. Our contribution aims not only to establish alignment but also to keep the source models aligned even as they evolve. The propagation of changes from our trace model to the source models is carried out through defined MDA model transformations. These transformations are exploited to guarantee the coevolution of the BPMN and the UML models. Figure 2 illustrates the background of our traceability method. In this section, we further explain how we extend and improve the integrated trace meta-model and the BPSUC diagram, as well as the rectifications made to them.

Figure 2: Background of the traceability method

4.1 Integrated trace meta-model improvement

Our first improvement of the integrated trace meta-model consists of defining an adequate strategy for defining its concepts and the relationships between them. Indeed, we propose a methodology for defining the integrated trace meta-model which includes two main steps: (1) identifying overlapping concepts of the BPMN and UML meta-models to define a relevant mapping between them, and (2) defining an adequate methodology to link each pair of interrelated concepts without changing their semantics. Thus, we propose to keep overlapping concepts and connect them either by a new concept or by a new relationship, which specifies the trace link between the existing concepts at the meta-model level. Afterwards, we connect each pair of artefacts to the new concepts representing them through a generalization/specialization relationship. This relationship allows inheriting the properties of both separated concepts as well as combining their usage without updating their initial semantics.
Our second improvement consists of adding the class diagram meta-model artefacts to the previous version of the integrated trace meta-model.

To apply our meta-model construction, we need to identify adequate mappings between BPMN and UML class diagram concepts. In the literature, the mapping between BPMN and UML class diagram concepts is widely discussed. Among these works, [23] defined model transformations from BPMN into UML class diagrams structured according to the MVC design pattern, and into use case diagrams, based on semantic mappings. For example, they propose mapping each BPMN empty lane (i.e. a lane that does not include other lanes) into a class in the class diagram and into a UML actor in the use case model. We reuse the semantic mappings defined in this approach to continue the definition of the integrated trace meta-model.

In Table 3, we summarize the semantic mapping between the BPMN meta-model artefacts and the class diagram meta-model artefacts from [23]. In this table, the class diagram meta-model concepts are structured according to the MVC design pattern.

Table 3: Mapping of BPMN and class diagram meta-model concepts

BPMN concept | UML class diagram meta-model concept
Item-aware element (data object, data store, data input, data output or data state) | Entity class; Association
Empty lane | Entity class; View class; Control class; Association
Fragment | View class; Control class
Exception event | Exception class; Operation
Signal event | Signal class; Operation
Automated task t (business rule task, receive task, send task, user task, script task, service task) within a fragment | Operation; Association
Item-aware element type (single or collection), Gateway, or Loop task / Rollback sequence flow | Cardinality of associations
Item-aware element attached to an automated task t within a fragment f | Parameters of an operation
Conditional sequence flow | Attribute

In contrast to the mapped concepts of the use case and BPMN meta-model artefacts, the mapping between the interrelated concepts of the UML class diagram and the BPMN meta-model is not tight, as shown in Table 3. Indeed, one UML concept may be represented by many BPMN concepts and vice versa. This is due mainly to the high degree of heterogeneity between the BPMN and class diagram artefacts. Thus, our mapping is limited to defining associations instead of defining new traceability concepts, as we aim not to complicate our integrated trace meta-model, and therefore to facilitate its readability while maintaining its consistency. The aforementioned trace meta-classes can also be reused to define BPMN–class diagram concept traceability.

The excerpt of the meta-model defined to trace the BPMN and the class diagram meta-models is presented in Figure 3. To ensure readability, Figure 3 depicts only the main artefacts of the class diagram and BPMN meta-models, as well as the reused traceability concepts. White meta-classes are BPMN concepts, orange meta-classes are UML class diagram meta-model concepts, khaki meta-classes denote UML class diagram concepts used for structuring the class diagram according to the MVC design pattern, while new concepts are specified by dark grey meta-classes. The blue associations represent the new trace links, while the black ones are the existing associations. It is important to note that all the use case concepts, BPMN concepts, traceability links, and existing associations defined in the previous extract of the integrated trace meta-model, which are not present in this extract, remain valid.

In the excerpt of Figure 3, each BPMN concept is associated with its corresponding concept in the class diagram meta-model. For example, we define a trace link called trace between the data object and the entity class to establish traceability between them. The multiplicity of this association is 1..* to indicate that each item-aware element should represent exactly one entity class. Moreover, we define a trace link between the gateway and the property meta-classes, as gateways can be indicators of association cardinalities. The multiplicity of this association is 0..*. On the other hand, UCsF is linked to the following meta-classes: Class, ClassDIPackage, and Association, by composition. This means that a UCsF can include classes, associations, and packages. These associations mean that a UCsF is a use case that incorporates its supported class diagram elements, representing the supported fragment elements. The cardinality of the composition association UCsF–ClassDIPackage is 3..* to indicate that a UCsF should incorporate at least three packages: View, Control, and Model, which represent the three parts of the MVC design pattern.

In addition, we define an association between OUActor and Class to express that an actor in the integrated trace meta-model is represented as a class in the class diagram meta-model.
Furthermore, in our integrated trace meta-model, a generalization/specialization relationship between the meta-classes OUPackage and ClassDIPackage is defined to point out that this trace meta-class inherits all the properties of the Package meta-class.

4.2 BPSUC diagram improvement

In contrast to most existing approaches [11], [13], [14], [15], [16], [17], [18], [19], [20], [21], which focus only on the meta-model level, our traceability method includes both the meta-model and the model levels. Thus, the second step of our contribution is devoted to describing how the traceability of BPMN and UML artefacts is established at the model level. We have improved the BPSUC diagram proposed in [1], in which the diagram features are limited to designing the BPMN and the use case diagram artefacts combined with their traceability links, which its designation already reflects.

In this paper, we aim to enrich this diagram to incorporate class diagram elements combined with BPMN and use case diagram elements. The first thing we do is update the name of BPSUC to be in harmony with its newly supported features. The new designation we have chosen is BPMNTraceISM (Business Process Model and Notation Traces Information System Models). BPMNTraceISM is an instantiation of the new version of the improved meta-model and forms a single unified model that combines the usage of UML elements, including the use case diagram and class diagram elements, as well as the BPMN elements. Thus, this diagram is now able to design elements and relationships of both UML use case and class diagrams, as well as BPMN models, concurrently. Moreover, it specifies the traceability information of the interrelated artefacts.

Each artefact in a BPMNTraceISM diagram has its specific notation. Some of them retain their original notation (BPMN or UML notations), while the others have a new representation, which does not differ greatly from the BPMN and UML notations.

Table 4: Graphical notations of overlapping elements of the BPMNTraceISM diagram (the icons themselves are not reproducible here). The elements covered are: use case association, signal event, extends relationship, exclusive gateway, includes relationship, parallel gateway, annotation flow, inclusive gateway, start event, data input, end event, data output, manual task, data store, normal task, sequence flow, error event, group, cancel event, entity class, control class, generalization, signal class, aggregation association, view class, composition association, exception class, and directed association.

4.2.1 BPMNTraceISM artefacts that conserve their initial notations

The mappings on which we base the definition of the integrated trace meta-model comprise neither all the BPMN concepts nor all the UML concepts. This is due to the fact that some BPMN artefacts do not have corresponding UML artefacts, and vice versa. For example, the mapping does not define any UML concept representing a BPMN start event. Even so, in a BPMNTraceISM diagram, it is possible to specify UML artefacts with no corresponding elements in BPMN. According to the mapping, many UML elements may be mapped to one BPMN element. Thus, the representation of these elements in UML diagrams requires grouping them. On the other hand, one UML element may be linked to many BPMN elements. For example, a data store in the BPMN diagram is transformed into (i) an association, (ii) an entity class, and (iii) an operation of a class, in the class diagram. In this situation, it is very difficult to represent the mapped elements in one unifying element.
At the meta-model level, we have proposed associating each pair of these mapped concepts by an association instead of defining new traceability meta-classes. At the model level, these artefacts are processed similarly to the non-mapped concepts and retain their original notations in the BPMNTraceISM diagram. Table 4 outlines the graphical notations of the core artefacts of the BPMNTraceISM diagram that retain the initial notations. OUPackage and OUActor are new meta-classes defined by [1] to represent traceability links of BPMN and UML use case diagram elements. In the integrated trace meta-model, we did not reuse these meta-classes to define new associations. Thus, the instantiation of these meta-classes keeps the notations provided in [1].

Figure 3: Traceability of the BPMN meta-model and the UML class diagram meta-model

4.2.2 UCsF notation

In the previous version of the BPMNTraceISM diagram (the BPSUC diagram), [1] states that a UCsF is a specialization of a use case and inherits its properties. Therefore, the graphical notation of UCsFs extends the graphical notation of a UML use case. Moreover, a UCsF has a composition relationship to a BPMN Fragment. To represent this trace link graphically, [1] defines a compartment that incorporates the corresponding BPMN fragment.

In our integrated trace meta-model, we have defined composition relationships from a UCsF to some UML class diagram artefacts (cf. Figure 3). Indeed, a UCsF should encapsulate the classes, associations, and packages which correspond to its supported fragment. Accordingly, we propose to update the graphical notation of the UCsF. Thus, a UCsF should act as a complex symbol that describes concurrently BPMN elements and UML class diagram elements. In order to represent explicitly the different elements incorporated by a UCsF, the use case notation needs to be extended. Therefore, we adjust the UCsF notation by adding another compartment (cf. Figure 4) to encapsulate the class diagram elements representing the components (classes, associations, and packages) of the supported fragment. In order to avoid the complexity of this element, the designer can choose to hide or show each compartment. Figure 4 depicts the graphical notation of a UCsF in which all compartments are hidden.

Figure 4: UCsF notation

4.3 Change propagation improvement
Our traceability method aims to ensure the coevolution of the separated models when a change occurs either in the source models (BPMN model, use case model, and/or class diagram) or in the BPMNTraceISM diagram (cf. Figure 5). To do this, we have improved the transformation model defined in [1] by including the class diagram concepts in the bi-directional transformation rules defined in [1] as two sets of transformation models (forward and backward transformation rules). They ensure the transformation between the BPMNTraceISM diagram, the BPMN model, and the UML models that include a class diagram and a use case diagram, using a semantic mapping between BPMN, BPMNTraceISM, and UML elements derived from our integrated trace meta-model.

Each transformation model includes a well-defined transformation program, or set of transformation rules, $T_{ab}$ (with $T_{ab}$ conforming to $MM_t$) that transforms source models $M_a$ conforming to source meta-models $MM_a$ (noted $M_a/MM_a$) into target models $M_b$ conforming to target meta-models $MM_b$ (noted $M_b/MM_b$), according to a mapping between the source and target model artefacts (noted $map(M_a, M_b)$). Formally, we specify transformation models according to a function that we call $MTransF$, written as follows:

$MTransF\big(M_a/MM_a,\ map(M_a, M_b)\big) \xrightarrow{T_{ab}/MM_t} M_b/MM_b$   (1)

For example, consider the forward transformation rule R1 that transforms a UML package and a non-empty BPMN lane into an OU-Package in the BPMNTraceISM diagram. This transformation ensures that elements in the source models are correctly mapped to their counterparts in the BPMNTraceISM diagram, facilitating traceability across both business and software models. The proposed bi-directional transformation models (backward and forward) ensure the coevolution of the BPMN and UML models, as well as the coevolution of the source models (the business model specified by a BPMN diagram and the software models specified by a UML class diagram and a UML use case diagram) and the BPMNTraceISM diagram. Formally, the forward and backward transformation model is specified as follows:

$MTransF\big(M_a/MM_a,\ map(M_a, M_b)\big) \xleftrightarrow{forwardRules,\ backwardRules} M_b/MM_b$   (2)

The rest of this section will be devoted to providing more details on how we created the bidirectional transformation rules.

4.3.1 Forward transformation rules

We propose a forward transformation model (forward rules) to produce automatically a BPMNTraceISM diagram ($M_{BPMNTraceISM}$) conforming to our integrated trace meta-model ($MM_{BPMNTraceISM}$) from the source models, namely a BPMN diagram ($M_{BPMN}$) conforming to the BPMN meta-model ($MM_{BPMN}$), a use case model ($M_{UCM}$), and a class diagram ($M_{CD}$) conforming to the UML meta-model ($MM_{UML}$). This transformation is carried out based on mappings between the new diagram and the BPMN and UML models. The formal definition of our forward transformation rules is as follows:

$MTransF\big(M_{BPMN}/MM_{BPMN},\ M_{UCM}/MM_{UML},\ M_{CD}/MM_{UML},\ map(M_{UCM}, M_{BPMNTraceISM}),\ map(M_{CD}, M_{BPMNTraceISM}),\ map(M_{BPMN}, M_{BPMNTraceISM})\big) \xrightarrow{forwardRules} M_{BPMNTraceISM}/MM_{BPMNTraceISM}$   (3)

There are two possible scenarios for producing the BPMNTraceISM elements based on the forward transformation rules.

The first scenario consists of applying a forward transformation rule ($R_X$) to derive trace modeling elements ($tre$) represented in the BPMNTraceISM diagram from a BPMN element ($M_{BPMN}!Element$) and a UML element ($M_{UML}!Element$). More precisely, an OUActor, an OUPackage, and a UCsF of the BPMNTraceISM diagram are generated from BPMN and UML elements. Formally, these transformation rules are as follows:

$MTransF_{tre}\big((M_{BPMN}!Element,\ M_{UML}!Element),\ map(M_{BPMN}, M_{BPMNTraceISM}),\ map(M_{UML}, M_{BPMNTraceISM})\big) \xrightarrow{R_X} M_{BPMNTraceISM}!tre$   (4)

For instance, suppose that a forward transformation rule R1 produces an OU-Package from a UML package and a BPMN non-empty lane. Formally, this rule can be written as follows:

$MTransF_{OUPackage}\big((M_{BPMN}!Lane,\ M_{UCM}!Package),\ map(M_{BPMN}, M_{BPMNTraceISM}),\ map(M_{UCM}, M_{BPMNTraceISM})\big) \xrightarrow{R_1} M_{BPMNTraceISM}!OUPackage$   (5)
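To make this first scenario more concrete, the sketch below shows what a rule in the spirit of R1 could look like in ATL, the language in which our transformation rules are expressed (see Section 5). It is only an illustrative sketch under assumed names: the metamodel identifiers (BPMN, UML, BPMNTraceISM), the childLaneSet feature used to detect a non-empty lane, and the matching of the pair by name are assumptions made for the example, not the exact identifiers of our prototype.

-- Illustrative sketch of forward rule R1 (assumed metamodel and feature names).
-- A UML package and a non-empty BPMN lane carrying the same name are matched
-- together and give rise to one OUPackage trace element in the BPMNTraceISM model.
rule PackageAndLane2OUPackage {
    from
        p : UML!Package,
        l : BPMN!Lane (
            -- keep only non-empty lanes (lanes that contain sub-lanes)
            not l.childLaneSet.oclIsUndefined() and l.name = p.name
        )
    to
        ou : BPMNTraceISM!OUPackage (
            name <- p.name
        )
}

ATL matches the combinations of the source pattern elements and keeps only the pairs that satisfy the guard, which corresponds to the pairwise mapping of overlapping artefacts expressed by rule (5).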
The second scenario consists of generating unrelated elements ($ure$) from either the UML models or the BPMN model only. Indeed, each concept in the BPMNTraceISM diagram that corresponds to a concept in a single source model (the BPMN model, the UML class diagram, or the UML use case diagram) needs only its original model. In this case, the input of the transformation rule is either a BPMN model, if the concept comes from BPMN, or a UML model, if its origin is the UML class diagram or the UML use case diagram. For example, the generation of a manual task in the BPMNTraceISM diagram requires the BPMN model only, because a manual task does not have corresponding elements in the UML diagrams. This transformation rule is written as follows:

$MTransF_{ure}\big(M_{BPMN}!manualTask,\ map(M_{BPMN}, M_{BPMNTraceISM})\big) \xrightarrow{R_{ManualTask}} M_{BPMNTraceISM}!manualTask$   (6)

Let us illustrate how a change in the BPMN model (e.g., adding a new lane) is propagated into the BPMNTraceISM diagram. Assume that rule R1 is applied to generate an OU-Package from the UML package and the BPMN lane. The transformation process involves the following steps:
1. The BPMN lane and UML package elements are identified.
2. Rule R1 is triggered, creating a corresponding OU-Package in the BPMNTraceISM diagram.
3. The OU-Package is updated within the BPMNTraceISM diagram, and the changes are synchronized with the BPMN and UML models.

4.3.2 Backward transformation rules

To obtain the opposite direction of the forward transformation rules, we have defined a backward transformation model. This means that the source elements of each forward transformation rule become the target elements of a backward transformation rule, and its target elements become the source elements of the backward transformation rule. Formally, the backward transformation rules are written as follows:

$MTransF\big(M_{BPMNTraceISM}/MM_{BPMNTraceISM},\ map(M_{BPMNTraceISM}, M_{UCM}),\ map(M_{BPMNTraceISM}, M_{CD}),\ map(M_{BPMNTraceISM}, M_{BPMN})\big) \xrightarrow{backwardRules} M_{BPMN}/MM_{BPMN},\ M_{UCM}/MM_{UML},\ M_{CD}/MM_{UML}$   (7)

We use the same logic as in the forward transformation rules to define the reverse transformation rules. Therefore, each backward transformation rule for a pair of artefacts is defined according to the following formula:

$MTransF_{tre}\big(M_{BPMNTraceISM}!tre,\ map(M_{BPMNTraceISM}, M_{BPMN}),\ map(M_{BPMNTraceISM}, M_{UML})\big) \xrightarrow{R_X} (M_{BPMN}!Element,\ M_{UML}!Element)$   (8)

Non-overlapping artefact transformation rules are defined according to the formula below:

$MTransF_{ure}\big(M_{BPMNTraceISM}!ure,\ map(M_{BPMNTraceISM}, M_{BPMN}),\ map(M_{BPMNTraceISM}, M_{UML})\big) \xrightarrow{R_X} M_{BPMN}!Element\ \text{or}\ M_{UML}!Element$   (9)

For example, when a change occurs in the BPMNTraceISM diagram, such as adding a new OU-Actor, the corresponding elements in the BPMN and UML models need to be updated. The backward transformation rule ensures that:
1. The OU-Actor is mapped to both the UML actor in the use case diagram and a new BPMN lane in the BPMN model.
2. The changes are propagated back into the source models, maintaining alignment across the models.

4.3.3 Change propagation process

The bidirectional transformation rules allow propagating changes that occur in the source models into the target models. By applying these rules, this approach enables the coevolution of the business and software models. The change propagation process is carried out in two ways (cf. Figure 5): (1) by manually updating the source models (the BPMN model, the UML class diagram, and the UML use case diagram), or (2) by designing the BPMNTraceISM diagram.

In the first case, software designers and business analysts separately and concurrently update the BPMN model and, consequently, the use case diagram and/or the class diagram. For example, a software designer adds a new use case to the use case model and new classes responsible for realizing the new use case, and simultaneously, a business analyst changes the name of a lane in the BPMN model. A direct generation of the software models leads to the loss of the changes made by the software designers. Additionally, to avoid unintentional updates, the impact of changes involved in a (business or UML) model needs to be analysed before propagating it to the target model. To tackle this problem, an intermediate step is required to gather all updates made in the separate models. This step can be reached by executing our forward model (user task "Execute forward transformation rules"), which derives a BPMNTraceISM diagram from both UML and BPMN. Thus, all changes made on the BPMN and/or on the class and use case diagrams are considered in the derived BPMNTraceISM diagram.

In the second case, all updates made by business analysts and software designers are done in the unified trace model (the BPMNTraceISM diagram) instead of being made in the BPMs and the ISMs. Using BPMNTraceISM overcomes the gap between the business analysts and software designers, and enables them to work together using the same model. Indeed, this diagram covers all business and software model elements and the traceability concepts of pairs of mapped artefacts. Any change involving a BPMNTraceISM element (bp) leads to the modification of the BPMN and/or UML model elements traced by bp. For example, the insertion of a new OU-Actor in the BPMNTraceISM diagram leads to the insertion of a new UML actor in the UML use case diagram and a new BPMN lane in the BPMN model.
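As an illustration of this backward propagation, the following ATL-style sketch regenerates a UML actor and a BPMN lane from an OUActor of the BPMNTraceISM diagram; here again, the metamodel and feature names are assumptions made for the example rather than the identifiers of our implementation.

-- Illustrative sketch of a backward rule (assumed metamodel and feature names).
-- One OUActor trace element is mapped back to two target elements: a UML actor
-- in the use case model and a lane in the BPMN model, both reusing its name.
rule OUActor2ActorAndLane {
    from
        oua : BPMNTraceISM!OUActor
    to
        a : UML!Actor (
            name <- oua.name
        ),
        l : BPMN!Lane (
            name <- oua.name
        )
}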
BPMNTraceISM can act as a gateway allowing business analysts and software designers to work together to test, analyse, and correct inconsistencies due to unwanted updates before propagating them to the source models (BPMN and UML models). In addition, this new diagram can be used to analyse and estimate the impact of changes made to business or system components or services. Up to this step, although the BPMNTraceISM diagram is aware of the updates made by both business analysts and software designers, the source models are aware neither of the BPMNTraceISM diagram nor of each other. Accordingly, propagating the modifications is an essential step to ensure the coevolution of the source models. We can do this easily by running our backward transformation model (user task "Execute backward transformation rules"). Once the backward transformation model is run, the changes are propagated to the BPMN and UML models, and thus these models are aligned with each other.

Figure 5: Synchronization process of BPMN and UML

5 Implementation

To use the proposed traceability approach, we implement a visual editor called Business Process model Traced with Information System Models (BPTraceISM). Moreover, we develop a prototype called Business Process to Information System Models (BP2ISM) that provides significant practical support for the transformations involved in our traceability method. This prototype automates the suggested forward and backward transformation models between the business process and the ISMs on the one hand, and the BPMNTraceISM diagram on the other. These transformation models are automatically applied via transformation rules expressed in the ATL transformation language.

5.1 Visual editor implementation

To implement the BPTraceISM editor, we have used Eclipse EMF to implement the trace meta-model and Eclipse GMF to design the concrete syntax of the BPMNTraceISM diagram. Indeed, the modeling tool includes a graphical editor that conforms to the trace meta-model and enables concurrently viewing and managing trace relationships between the BPMN model, the use case diagram, and the class diagram. BPTraceISM can be integrated within other modeling tools to enhance their modeling capabilities. To make our modeling tool available in any Eclipse environment without needing to start an Eclipse runtime, we implement it as an Eclipse plug-in.

Figure 6: The environment of the BPTraceISM editor

The construction process of BPTraceISM consists of two main phases: (1) the definition of the modeling tool, and (2) the definition of the plug-in that supports it. The first phase begins with the implementation of the trace meta-model using the Ecore meta-modeling language. Then we build a toolbox for creating instances of the meta-model classes. In the second phase, we develop a feature that supports the modeling tool. Afterward, we construct an update site to ensure the portability of our plug-in and allow its installation via any Eclipse update manager.
The BPTraceISM environment is composed of four main parts (cf. Figure 6): the project explorer containing an EMF project that includes BPMNTraceISM diagrams (part a), the modeling space (part b), the toolbox containing the graphical elements of a BPMNTraceISM diagram (part c), and the properties tab to edit the properties of an element selected in the modeling space (part d).

Figure 7 outlines a simple example of a BPMNTraceISM diagram created using the editor. The modeling space contains an OUActor called Supplier associated with a UCsF called Manage purchase order. In the business compartment of the UCsF Manage purchase order, we have a user task called Accept purchase order. In the class diagram compartment, we have four classes linked via undirected associations. Each class has a name and a stereotype. The boundary class Manage purchase order contains an operation called acceptPurchaseOrder().

Figure 7: Example of a BPMNTraceISM diagram within the modeling tool

5.2 Prototype for the transformation models

BP2ISM is implemented within the Eclipse Modeling Framework (EMF) environment. It includes two components:
– BPISM2BPMNTrISM: it automates the forward transformation, which is the conversion of BPMN and UML models into BPMNTraceISM.
– BPMNTrISM2BPISM: it automates the backward transformations.

The transformation process requires tools, editors, or plugins in order to specify the source and target models. For this reason, tools are required to represent the BPMN, UML, and BPMNTraceISM diagrams, which serve as the source and target models of the BP2ISM components. Because BPMN and UML are widely used standards, many plugins and tools have been created and certified to support them. We choose to employ internal plugins within EMF instead of existing plugins. As a result, we develop BPMN models with the Eclipse BPMN2 modeler plugin and UML use case and class diagrams with the UML Designer plugin. The internal meta-models in these plugins closely adhere to OMG requirements. We incorporate these meta-models into the EMF environment for use in the execution of our prototype components. In addition, we integrate the trace meta-model to visualize (backward transformation) or design (forward transformation) BPMNTraceISM diagrams. We built the transformation rule sets in the Atlas Transformation Language (ATL), which is provided as an internal EMF plugin.

BPISM2BPMNTrISM takes three files as input: (1) a file with the extension ".bpmn" that must conform to the BPMN 2.0 meta-model, and (2) two files with the extension ".uml" that must conform to the UML meta-model and contain the use case model and the class diagram. It generates as output a BPMNTraceISM diagram with the extension ".BPMNTraceISM".

BPMNTrISM2BPISM implements the backward transformations, i.e., the transformation rules from a BPMNTraceISM diagram into BPMN and UML models. It takes as input a BPMNTraceISM diagram with the extension ".BPMNTraceISM". It generates as output three files: (1) a file with ".bpmn" as extension, which conforms to the BPMN 2.0 meta-model, and (2) two files with the extension ".uml" that include the generated use case model and the class diagram.
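As an indicative sketch of how these two components can be declared in ATL (each module lives in its own file; the model and metamodel names are assumptions used for illustration, not necessarily those of our prototype), their headers might look as follows:

-- Forward component: BPMN + UML (use case model and class diagram) -> BPMNTraceISM.
-- Assumed model and metamodel names; one module per .atl file.
module BPISM2BPMNTrISM;
create OUT : BPMNTraceISM from INBPMN : BPMN, INUML : UML;

-- Backward component: BPMNTraceISM -> BPMN + UML.
module BPMNTrISM2BPISM;
create OUTBPMN : BPMN, OUTUML : UML from IN : BPMNTraceISM;

When a module is run, the ATL engine is given a concrete file for each declared model, which corresponds to the ".bpmn", ".uml", and ".BPMNTraceISM" files described above.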
6 Case study

We take a common business process model for online purchasing and selling to demonstrate the viability of our traceability method. The model is specified using BPMN 2.0 (cf. Figure 8) [23].

Figure 8: The online purchasing and selling business process specified in BPMN 2.0

This business process begins when a customer selects a product to purchase and adds it to the basket, resulting in the creation of an online purchase order and the transmission of the order to the vendor. The customer has the option to cancel the purchase order before entering their personal information. Otherwise, they must fill in their personal information and submit an online purchase order to the stock management. When an online purchase order is received, the stock manager checks the warehouse for the availability of the ordered items to see if there are enough products to fulfil the order. If not, the restocking procedure is initiated to reorder raw materials and create the ordered products based on the supplier's catalogue. The restocking procedure can be performed as many times as necessary within the same business process instance. An extreme scenario occurs when raw materials are unavailable. If all items are available, sales validate the purchase order, generate an invoice, and begin collecting and packaging products for shipment. When sales receive payment and store the delivered order, the procedure is complete. Purchase order cancellation requests, however, can be made before the purchase order is verified. As a result, sales proceed with purchase order cancellation and a penalty charge to the buyer. In [23], the authors decompose the BPMN model of the case study into nine fragments (F1–F9) (cf. Figure 8) based on their fragment definition (see [23] for further explanation). By applying their transformation rules, the approach from [23] allows generating the use case diagram and the class diagram from the case study BPMN model, which is taken as the input model.

The online purchasing and selling BPMN model, use case model, and class diagram presented in [23] can be combined and designed in a single unified model, namely the BPMNTraceISM diagram, by using the BPTraceISM editor. We would like to highlight that this diagram can be created manually by designers or automatically by running the BPISM2BPMNTrISM component. Figure 9 depicts the resulting BPMNTraceISM diagram and shows how each fragment and its corresponding use case are merged and expressed as a UCsF. For example, we combine fragment F1 with the use case "Manage preparing purchase order" to form the UCsF "Manage preparing purchase order". Each UCsF displays the traced BPMN elements and the corresponding class diagram elements. For each UCsF, the elements of the BPMN model are represented in the BPMN compartment, while the corresponding class diagram elements are represented in the class diagram compartment. In Figure 9, these compartments are hidden in the UCsF Cancel purchase order, while the BPMN compartment is visible in all the other UCsFs.

In the UCsF Receive payment, the BPMN compartment contains a service task called receive payment, a data object called invoice, and a data output called purchase order [paid]. These elements are the BPMN elements of fragment F8. Moreover, the class diagram compartment of the UCsF Archive purchase order is displayed and contains the class diagram elements, such as the classes VArchivePurchaseOrder, CArchivePurchaseOrder, paid, archived, operations, attributes, etc., corresponding to fragment F10. Furthermore, an OUActor specifies each actor and the corresponding empty lane. For example, the actor Stock manager and the empty lane Stock manager map to the OUActor Stock manager.

Assume that business analysts and system designers collaborate on the BPMNTraceISM diagram and update the business and system functionalities accordingly. Suppose they delete the UCsF Manage preparing purchase order and the OUActor Customer from the BPMNTraceISM diagram. The UCsF Manage preparing purchase order is traced to the elements of fragment F1, the UML use case Manage preparing purchase order, as well as all class diagram elements derived from F1. By deleting this UCsF, all its components are also removed from the BPMNTraceISM diagram. Then, the change involved in the BPMNTraceISM diagram is propagated to the source models by executing the BPMNTrISM2BPISM tool. The output of this component is a BPMN model without the pool Customer or the fragment F1, a UML use case model that contains neither the use case Manage preparing purchase order nor the actor Customer, and a class diagram without the elements corresponding to F1.

7 Evaluation results

7.1 Comparison with existing approaches
To evaluate the effectiveness of our traceability method, we compare it with existing traceability approaches based on defined evaluation criteria. These criteria include: (i) the proposal of a traceability approach at both the meta-model and model levels, (ii) explicit representation of relationship types between elements, (iii) graphical notation for trace links, and (iv) the consideration of both business and information system (IS) models.

Figure 9: Online purchasing and selling in the BPMNTraceISM diagram

Table 5 presents the results of this comparison, with rows listing the methods studied and columns representing the evaluation criteria. Cells are coded to show the extent to which each criterion is satisfied: "N" indicates a criterion that is not addressed, "P" represents partial satisfaction, and "Y" stands for full satisfaction. Based on this comparison, we conclude that our approach is the only one that meets all the evaluation criteria. Specifically, [17] and [20] are the only other works that consider both business and IS modeling, but they fall short in addressing the full range of traceability needs compared to our method. Furthermore, only a few approaches, such as those of [15], [18], and [20], take into account the functional and static views of IS models.

Our method is unique in that it does not require any extensions and works seamlessly with standard UML and BPMN tools, making it more adaptable and accessible. Additionally, our method provides rich software modeling-level artefacts, incorporating both static views (class diagrams) and functional views (use case models). The class diagrams are designed in accordance with the MVC pattern, simplifying the prototyping process for developers.

When focusing on traceability, our method stands out because it provides traceability at both the meta-model and model levels, ensuring a unified view of BPMN and UML elements. In contrast, many approaches specify traceability only at the meta-model level, without providing a visual tool for combined model use. Additionally, our method introduces a graphical visualization for traceability links, enabling users to more easily trace and align elements across models. This graphical representation reduces analysis time, simplifies development, and minimizes the risk of misalignment.

Moreover, our method's assessment methodology is more comprehensive than most others in the field, which typically rely on simple case studies. We provide a fully implemented prototype of the transformation approach and an Eclipse plug-in for the traceability process, demonstrating the practicality and feasibility of our contributions through a relevant case study.

7.2 Shortcomings of our contribution

Despite the strengths of our traceability method, there are some limitations that need to be addressed. One notable drawback is that our evaluation was based on a single use case, which may not be sufficient to fully assess the accuracy and robustness of the method. To address this, we are conducting ongoing evaluations using more complex case studies, which will allow us to better validate the method's performance in different contexts. This extended evaluation will help ensure that the method is robust and adaptable across various scenarios, enhancing its overall credibility.

Additionally, our current transformation approach relies on forward and backward transformation rules, which require the recreation of all components, even if they have not been affected by changes. This process can lead to inefficiencies, especially when working with large or complex models. To overcome this issue, we plan to develop incremental transformations that will update only the components directly impacted by changes. This will improve efficiency and minimize unnecessary recalculations, ensuring faster and more resource-efficient updates.
Table 5: Comparison of our contribution with approaches based on the external traceability practice

Approach | Business field | Software field: functional | Software field: static | Construction model level: meta | Construction model level: model | Trace links | Graphic notation | BPM and ISM | Assessment methodology
[11] | N | P | P | Y | N | N | N | N | CS
[13] | N | CS | | Y | N | N | N | N | CS
[14] | N | N | N | Y | N | N | N | N | tool
[15] | P | P | P | Y | N | N | N | N |
[16] | RM | CS | | Y | N | N | N | | CS
[17] | BPMN | N | N | Y | N | N | N | N | CS
[18] | P | P | P | Y | Y | N | N | Y | N
[19] | RM | N | N | Y | N | Y | N | N | N
[20] | N | N | CD | Y | N | N | N | N | N
[21] | BPMN | N | N | Y | Y | N | Y | N | T
Our contribution | BPMN2 | UC | CCD | Y | Y | Y | Y | Y | CS & T

Legend: Y: Yes; N: No; CS: case study; RM: requirement model; T: tool/editor; CS: complex systems; CCD: conception class diagram; UC: use case diagram.
While we have explored the use of traceability information to keep BPMN and UML models aligned, the analysis process is still manual. As a result, we aim to improve this by investigating the development of heuristics that could automatically detect modifications in the source models and suggest necessary adjustments to the corresponding elements. These heuristics would dynamically support developers, making the process of maintaining alignment between the models more efficient and less error-prone.

To address the limitations mentioned above, we propose a comprehensive roadmap for future improvements. This will involve extending the evaluation process with more complex case studies, enhancing the transformation approach to support incremental updates, and automating diagram analysis. In addition, the implementation of heuristic-based tools will enable the automatic detection of changes across models, improving traceability and ensuring consistency without requiring manual intervention. These advancements will significantly strengthen the method's capabilities, making it more robust and easier to apply in practical scenarios.

8 Conclusion

The work conducted in this paper fits within the context of model-based development of ISMs, their alignment and their coevolution with BPMs. Indeed, we have used integration and model transformation methodologies to define a traceability method oriented towards the development of (meta) model-based solutions, purposely influenced by the Object Management Group (OMG) specifications. Particular attention is paid to the BPMN and UML use case and class diagram models. Our traceability method acts at both the meta-model and the model levels. Hence, (1) we first defined a unified trace meta-model that includes all the BPMN and the UML elements (use case and class diagram) and traceability links between interrelated elements. (2) Then, we defined an integrated model that conforms to the proposed trace meta-model. We defined it as a new diagram named BPMNTraceISM (BPMN Traces Information System Models). This diagram serves many purposes: it promotes collaboration between business and software designers and allows them to work together using one single unified model. The joint representation of both BPM and ISM elements enables users to drill down and easily trace any business artefact to its corresponding software artefacts. (3) Finally, we defined a set of bidirectional model transformation rules between the BPMN and UML models, as well as the BPMNTraceISM diagram.

The rules are useful when a change propagation-based co-evolution is required to synchronize models after changes. To prove the feasibility of our traceability method in practice, we developed a modeling tool in the form of a plugin that can be integrated into the Eclipse platform. This tool is named BPTraceISM (Business Process model Trace with Information System Models) and allows designing and handling BPMNTraceISM diagrams in accordance with the proposed integrated trace meta-model. Additionally, we specified the set of bidirectional transformation rules using the ATL language and we implemented them as components of the BPM2ISM prototype. Furthermore, we applied the proposed approaches to a typical case study.

In future research, we look forward to optimizing our editor to support traceability and synchronization between BPMN models and other UML diagrams.
References

[1] Bouzidi, A., Haddar, N., Haddar, K. (2019). Traceability and Synchronization Between BPMN and UML Use Case Models, Ingénierie des Systèmes d'Information, Vol. 24, No. 2, pp. 215-228. https://doi.org/10.18280/isi.240214.

[2] OMG UML Specification (2017). OMG Unified Modeling Language (OMG UML), Superstructure, V2, Object Management Group, Vol. 70.

[3] OMG BPMN Specification. Business Process Model and Notation. Available at: http://www.bpmn.org/. Accessed: 2023-01-31.

[4] Driss, M., Aljehani, A., Boulila, W., Ghandorh, H., Al-Sarem, M. (2020). Servicing your requirements: An FCA and RCA-driven approach for semantic web services composition, IEEE Access, Vol. 8, pp. 59326-59339. https://doi.org/10.1109/ACCESS.2020.2982592.

[5] Ghiffari, K. A., Fariqi, H., Rahmatullah, M. D., Zulfikarsyah, M. R., Evendi, M. R. S., Fathoni, T. A., Raharjana, I. K. (2023). BPMN2 user story: Web application for generating user stories from BPMN, In AIP Conference Proceedings, AIP Publishing LLC, Vol. 2554, No. 1, pp. 040003. https://doi.org/10.1063/5.0103685.

[6] Raharjana, I. K., Aprillya, V., Zaman, B., Justitia, A., Fauzi, S. S. M. (2021). Enhancing software feature extraction results using sentiment analysis to aid requirements reuse, Computers, Vol. 10, No. 3, pp. 36. https://doi.org/10.3390/computers10030036.

[7] Khlif, W., Elleuch, N., Alotabi, E., Ben-Abdallah, H. (2018). Designing BP-IS Aligned Models: An MDA-based Transformation Methodology. https://doi.org/10.5220/0006704302580266.

[8] Kharmoum, N., Retal, S., Rhazali, Y., Ziti, S., Omary, F. (2021). A Disciplined Method to Generate UML2 Communication Diagrams Automatically From the Business Value Model, In Advancements in Model-Driven Architecture in Software Engineering, IGI Global, pp. 218-237. https://doi.org/10.4018/978-1-7998-3661-2.ch012.

[9] Rahmoune, Y., Chaoui, A. (2022). Automatic Bridge Between BPMN Models and UML Activity Diagrams Based on Graph Transformation, Computer Science, Vol. 23, No. 3. https://doi.org/10.7494/csci.2022.23.3.4356.

[10] Ivanchikj, A., Serbout, S., Pautasso, C. (2020). From Text to Visual BPMN Process Models: Design and Evaluation, In Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 229-239. https://doi.org/10.1145/3365438.3410990.

[11] Mills, C., Escobar-Avila, J., Haiduc, S. (2018). Automatic Traceability Maintenance via Machine Learning Classification, In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 369-380. https://doi.org/10.1109/ICSME.2018.00045.

[12] Al-Hroob, A., Imam, A. T., Al-Heisa, R. (2018). The Use of Artificial Neural Networks for Extracting Actions and Actors from Requirements Documents, Information and Software Technology, Vol. 101, pp. 1-15. https://doi.org/10.1016/j.infsof.2018.04.010.

[13] Min, H. S. (2016). Traceability Guideline for Software Requirements and UML Design, International Journal of Software Engineering and Knowledge Engineering, Vol. 26, No. 1, pp. 87-113. https://doi.org/10.1142/S0218194016500054.

[14] Eyl, M., Reichmann, C., Müller-Glaser, K. (2017). Traceability in a Fine-Grained Software Configuration Management System, In Software Quality: Complexity and Challenges of Software Engineering in Emerging Technologies, 9th International Conference, SWQD 2017, Vienna, Austria, January 17-20, 2017, Springer International Publishing, pp. 15-29. https://doi.org/10.1007/978-3-319-49421-0_2.

[15] Khelladi, D. E., Kretschmer, R., Egyed, A. (2018). Change Propagation-Based and Composition-Based Co-Evolution of Transformations with Evolving Meta-Models, In Proceedings of the 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 404-414. https://doi.org/10.1145/3239372.3239380.

[16] de Carvalho, E. A., Gomes, J. O., Jatobá, A., da Silva, M. F., de Carvalho, P. V. R. (2021). Employing Resilience Engineering in Eliciting Software Requirements for Complex Systems: Experiments with the Functional Resonance Analysis Method (FRAM), Cognition, Technology and Work, Vol. 23, pp. 65-83. https://doi.org/10.1007/s10111-019-00620-0.

[17] Lopez-Arredondo, L. P., Perez, C. B., Villavicencio-Navarro, J., Mercado, K. E., Encinas, M., Inzunza-Mejia, P. (2020). Reengineering of the Software Development Process in a Technology Services Company, Business Process Management Journal, Vol. 26, No. 2, pp. 655-674. https://doi.org/10.1108/BPMJ-06-2018-0155.

[18] Moreira, J. R. P., Maciel, R. S. P. (2017). Towards a Models Traceability and Synchronization Approach of an Enterprise Architecture, In SEKE, pp. 24-29. https://doi.org/10.1109/CBI.2019.00028.

[19] Guo, J., Cheng, J., Cleland-Huang, J. (2017). Semantically Enhanced Software Traceability Using Deep Learning Techniques, In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 3-14. https://doi.org/10.1109/ICSE.2017.9.

[20] Swathine, K., Sumathi, N., Nadu, T. (2017). Study on Requirement Engineering and Traceability Techniques in Software Artefacts, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 5, No. 1. https://doi.org/10.1109/ICSRS.2017.8272863.

[21] Pavalkis, S., Nemuraite, L., Milevičienė, E. (2011). Towards Traceability Meta-Model for Business Process Modeling Notation, In Conference on e-Business, e-Services and e-Society, Springer, Berlin, Heidelberg, pp. 177-188. https://doi.org/10.1007/978-3-642-27260-8_14.

[22] Bouzidi, A., Haddar, N., Abdallah, M. B., Haddar, K. (2018). Alignment of Business Processes and Requirements Through Model Integration, In 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), pp. 1-8, IEEE. https://doi.org/10.1109/AICCSA.2018.8612870.

[23] Bouzidi, A., Haddar, N. Z., Ben-Abdallah, M., Haddar, K. (2020). Toward the Alignment and Traceability Between Business Process and Software Models, In ICEIS, Vol. 23. https://doi.org/10.5220/0009004607010708.
https://doi.org/10.31449/inf.v49i16.6934 Informatica 49 (2025) 115–136 115
A Review of Machine Learning Techniques in the Medical Domain
Enas M.F. El Houby
Department of Systems and Information, National Research Centre, Giza, Egypt
E-mail: em.fahmy@nrc.sci.eg, enas_mfahmy@yahoo.com
Keywords: active learning, curriculum learning, deep learning, federated learning, medical, transfer learning
Received: August 19, 2024
We have witnessed a rapid exponential growth of all types of data in all domains specifically in the medical
domain. The utilization of machine learning techniques has made significant strides across various
domains, with deep learning achieving notable success in recent years. Lately, deep learning has gained
increasing attention in the medical field. While deep learning excels at automatically learning
discriminative features from raw data, it is still challenging to achieve high performance without a huge
amount of data and some handcrafted steps. To address these challenges, deep learning has been combined with other new trends and with domain knowledge to enhance its capabilities and improve performance to meet the ever-growing needs. Transfer learning utilizes knowledge from natural images, curriculum learning integrates domain-specific knowledge, active learning selects the most informative samples to reduce reliance on labeled data, and federated learning enables collaborative training across organizations while ensuring data privacy. In this review paper, these new trends incorporated with deep learning are investigated and presented through applications in the medical domain, by reviewing articles that applied these trends and were published in highly reputable journals in the ScienceDirect database in recent years.
Povzetek: V pregledni študiji so predstavljeni sodobni trendi strojnega učenja v medicini, kot so
transferno, aktivno in federativno učenje na podlagi učnih načrtov, ki v kombinaciji z globokim učenjem
izboljšujejo diagnostiko, personalizacijo zdravljenja in varnost podatkov.
1 Introduction

Recently, we have witnessed the growth of all types of data in all domains. Medical data specifically has grown dramatically in the last few years due to the exponential increase of knowledge in the medical domain. Medical data can be found in various forms such as clinical and biomedical data. Biomedical data contains data related to genomics, drug discovery, and biomedicine. Clinical data contains patient records such as patients' medical history, laboratory investigations, and image data from magnetic resonance imaging (MRI), ultrasound (US), X-rays, and computerized tomography (CT) scans. Clinical data exists in two forms, structured and unstructured. The structured format includes the disease history and living habits of the patients, while unstructured clinical data includes items such as doctors' investigation records and the conversations between doctors and patients [1-3]. Therefore, this rapidly growing volume of medical data requires advanced methods for analysis.

Applying artificial intelligence (AI) in the medical domain comprises a promising technology for different healthcare providers. These technologies, particularly data mining, help extract hidden patterns and insights from large datasets using machine learning techniques (MLTs). Traditional MLTs include Artificial Neural Networks (ANN), Decision Trees (DT), Support Vector Machines (SVM), and many other techniques. Machine learning techniques are usually categorized into supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning, labeled data is available; therefore, the model can be trained using this manually tagged data to extract patterns. When there is no labeled training data, unsupervised techniques are employed. They group similar entities in the same cluster, and each cluster demonstrates a relation between these grouped entities. Semi-supervised learning depends on a set of hand-crafted extraction patterns and a few tagged instances as initial seeds of the target relation to start the training. The training output is used as the training input for the following generation, and the process of learning is repeated for many generations. Reinforcement learning is based on evaluative feedback, so it can automatically perform goal-oriented learning and process decision-making problems [4, 5].

Deep learning is an advanced form of artificial neural network (ANN), with a larger number of layers than a conventional ANN model, used to automatically learn the features from the data, which makes more refined predictions possible. In numerous recent medical image classification tasks, convolutional neural networks (CNNs), which are a kind of deep learning network particularized in image analysis, were utilized and achieved high performance. The success of CNNs in the classification of medical images has motivated researchers to utilize pre-trained models in building new ones.
These high-performing CNN pre-trained models have been utilized for different image classification tasks by employing the transfer learning (TL) approach. Pre-trained CNN models utilize features that were learned from a specific domain to fine-tune any other data. They can be utilized as-is to classify new images, or to extract features using the output from the layer previous to the output layer and introduce it to another classifier [6].

However, many challenges face the application of machine learning techniques generally, and in the medical domain specifically, such as: I) The limitations of available datasets for training the models, because collecting and labeling the data is a labor-intensive and expensive task, especially in the case of medical image data such as ultrasound imaging (US), CT, and MRI. The annotation of data includes the segmentation annotations of abnormality regions and classification labels such as normal, benign, and malignant. Also, that limitation may result from the scarcity of some diseases for which it is difficult to obtain enough positive cases. II) The low quality of some data is another major challenge, where some of the data can be found unlabeled, inconsistent, inaccurate, or in an unstructured format—such as handwritten notes, radiology reports, and conversations between doctors and patients—which are difficult for machine learning algorithms to process effectively. In the case of medical image modalities, there may be variations in image resolution and quality. III) The shortage of explanations of the pathological basis, such as the diagnosis reasons, where the techniques depend only on the differences between the normal and patient cases. For healthcare professionals to trust and act on ML-generated results, it is essential to understand how these models arrive at their predictions. IV) Ethical and regulatory concerns play a crucial role, where the healthcare industry is tightly regulated, and machine learning models must comply with stringent standards to ensure patient privacy, data security, and model safety. Furthermore, any biases in the data could lead to unequal or unfair treatment recommendations, making fairness an ongoing concern in the application of ML in healthcare [7-10].

Despite these challenges, machine learning presents a wealth of opportunities that can significantly improve healthcare outcomes, such as: I) Diagnostic accuracy and speed, where ML algorithms, particularly deep learning models, have demonstrated remarkable success in automating and enhancing diagnostic processes, especially in medical imaging. For instance, ML models can analyze radiographs, MRI scans, and other images to identify abnormalities such as tumors or lesions with a level of precision that often rivals or exceeds that of human experts. This capability can lead to earlier detection, which is critical for improving patient outcomes, particularly in cancer and cardiovascular diseases. II) Personalized medicine: by analyzing large datasets, including patient demographics, genetic information, and medical history, machine learning can help tailor treatments to individual patients, optimizing therapeutic interventions based on their unique characteristics. III) Predictive analytics is another powerful opportunity that ML offers. By analyzing trends in patient data, machine learning models can predict disease progression, forecast complications in chronic conditions, and identify high-risk patients who may benefit from earlier interventions. IV) Automation is another key opportunity in healthcare, with ML models capable of automating routine tasks such as image analysis, patient triaging, and administrative work. This allows healthcare providers to focus more on direct patient care, improving overall efficiency. V) Drug discovery, by identifying promising drug candidates and predicting their behavior in the human body, which can reduce the time and cost associated with bringing new medications to market [7-10].

In response to the mentioned challenges, recent research has shifted towards using advanced techniques such as deep learning with some incorporated techniques and domain knowledge, like transfer learning, which provides deep learning with information from natural images. Curriculum learning integrates domain knowledge through training patterns of the processed task. Active learning explores the most informative samples and retrieves them from an unlabeled pool to fulfill better performance with less labeled data. Federated learning allows many organizations to collaborate on deep learning without sharing clients' data or devices, which provides efficient data access and security and an improvement of the learning model utilizing a large decentralized dataset. The purpose of this research is to illustrate the new trends of machine learning in the medical domain. The selected articles that are reviewed show these new trends in the medical domain using different medical dataset types, including medical images, tabular datasets, genes, etc., in different tasks. The remainder of this research is organized as follows: Section 2 illustrates the different types of medical data. Section 3 presents some data preprocessing steps. Section 4 presents the new trends of MLTs. Section 5 describes the search methodology for articles that apply the mentioned new trends of MLTs in the medical domain. Section 6 presents some of the applications of new trends of MLTs in the medical domain. Section 7 presents the conclusion and some of the recommended points for future work.

2 Types of medical data

Medical data can be found in different forms such as arrays of numerical data, images, sequences of DNA, amino acids, etc. For developing any ML model, the data is split into three parts: training, validation, and testing. The training part is used to learn and tune the parameters of the model, the validation part is used to stop overfitting, and the test part is used to assess the performance of the model. In the next subsections, a brief overview of different medical data forms will be presented.

2.1 Numerical data

Different diseases' related data are found as arrays of lab tests, which is numerical data. These numerical datasets can be used to manage the related diseases, such as the datasets available on the UCI machine learning repository [11].
Most numerical data are available in table form, such as Excel sheets or database tables, where rows represent samples from patients and columns represent different features that describe the intended diseases, or vice versa. A huge number of numerical datasets are available, such as the patient demographics of some diseases like COVID-19, and the lab results for different diseases such as thyroid, heart disease, dermatology, cancer, etc.

2.2 Microarray gene expression data

Microarray techniques provide a platform for measuring the expression levels of thousands of genes in various conditions. A microarray is composed of a small glass slide or membrane that contains samples of many genes arranged in a regular pattern. It is used to find genes associated with specific diseases by analyzing and finding the differences between two mRNA sets, where one set is from normal cells and the other set includes cells from pathological tissues such as cancer cells. Microarray data contains a lot of redundant genes, and many genes include inappropriate information for the accurate classification of diseases. Thus, the analysis of the large amount of data generated by this technology is not an easy task for biologists [12]. Figure 1 shows a cDNA microarray spotted on a glass surface, while in Figure 2 the general structure of the microarray is illustrated, which is represented as an array of numerical values. Cancer gene expression datasets for leukemia, lung, prostate, etc. can be found in [13].

Figure 1: cDNA microarray spotted on a glass surface.
https://www.cell.com/fulltext/S0960-9822%2898%2970103-4

Figure 2: General structure of microarray.

2.3 Image modalities

Information obtained from medical imaging modalities is clinically beneficial in many applications like computer-aided detection, diagnosis, and treatment planning. Many imaging modalities can be used to check abnormalities in different body organs. They include radiation-based modalities such as CT and X-rays, as well as US and MRI, and they are categorized according to the method of producing images. They help radiologists to recognize abnormal regions. The interpretation of different image modalities needs expertise, and it is operator dependent. Therefore, the process of reading image modalities is exhausting, costly, and prone to error.

Ultrasound (US) is a suitable modality for tumor detection. It can estimate the size of the tumor and distinguish abnormalities, although its capability of detecting contra-lateral malignant lesions is limited [14].

Magnetic resonance imaging (MRI) produces images based on the response of hydrogen atoms to radio waves and magnetic fields. MRI images are valuable as they present physiology and anatomy. MRI images the target organ and prepares it as thin slices; moreover, it provides information about the vascularity of the tissue [15].

Computed tomography (CT) scanners display better image clarity using multiple X-ray sources and detectors [15]. Radiation X-ray generated images are 2-dimensional images. Fluoroscopy units show real-time moving images produced by X-ray exposure; angiography is a widespread usage of fluoroscopy, imaging blood flow in vessels [15].

Digital Mammography (DM) is an X-ray imaging modality that is specialized for breast tissue. DM is the most common and most important screening method in clinical practice. It can detect tumors before they develop further and become easily detected and felt by the physician [16].

Microscopic images are the images that are captured by the microscope to enlarge small scanned objects and extract fine details that cannot be obtained otherwise [17]. Figure 3 shows samples of different image modalities for different body organs.

2.4 DNA and protein sequences

The fast growth of sequencing has resulted in huge numbers of DNA and protein sequences. Sequences can be used to predict diseases associated with a given DNA or protein sequence. DNA is a long polymer chain of units named nucleotides; it exists in a double helical shape as shown in Figure 4. There are 4 types of nucleotides, which are A (adenine), C (cytosine), G (guanine), and T (thymine); they are considered the alphabet of DNA. They are arranged into 3-letter sequences called codons. The double-stranded helical structure of DNA is complementary, where "G" is chemically paired with "C", and "A" with "T" within the replication of DNA [18].
(a) X-ray of Lung [19] (b) DM of breast [20] (c) Microscopic blood image [21]
Figure 3: Samples of different image modalities for different body organs.
Amino acids are linked into linear chains to produce proteins. The properties of proteins are defined by the composition of their amino acids. The triplets of consecutive DNA nucleotides, which are called codons, are responsible for forming the amino acid sequence in a protein. There are 4³ = 64 possible codons formed from the 4 letters [22], which is more than 3 times larger than the number of amino acids, which is 20; 3 codons represent stop codons and one is a start codon, while the remaining codons are responsible for generating the 20 amino acids. So, it is possible that more than one codon maps to the same amino acid [18]. Figure 5 shows the transcription of a DNA sequence into molecules of mRNA, and then the translation of the transcribed mRNA into the associated chain of amino acid sequence, which later folds into fully functional proteins.

Single nucleotide polymorphisms (SNPs) are the most common human genetic variations, appearing as mutations or insertions/deletions (indels). If an SNP changes the codon triplet without changing the encoded amino acid, it is synonymous (sSNP) and the gene is not mutated. Otherwise, it is non-synonymous (nsSNP), as it changes the codon while the encoded amino acid is changed into a different amino acid; these are called missense mutations, which are the reason for many diseases [23, 24]. Figure 6 shows single nucleotide polymorphisms (SNPs).
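As a small illustration of codon-to-amino-acid mapping and codon degeneracy (ours, not from the reviewed articles; only a partial codon table is used), the following sketch translates a short DNA sequence codon by codon:

```python
# A minimal, hypothetical sketch illustrating codon translation and degeneracy:
# several different codons map to the same amino acid.
# Only a small part of the standard codon table is included here.
CODON_TABLE = {
    "ATG": "M",                          # start codon (methionine)
    "TTT": "F", "TTC": "F",              # phenylalanine (two codons, one amino acid)
    "GGT": "G", "GGC": "G",
    "GGA": "G", "GGG": "G",              # glycine (four codons, one amino acid)
    "TAA": "*", "TAG": "*", "TGA": "*",  # the three stop codons
}

def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X": codon not in this partial table
        if amino_acid == "*":            # a stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGTTTGGATAA"))         # -> "MFG"
```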
Figure 4: Chain of DNA sequence.
http://acer.disl.org/news/2016/08/17/tool-talk-gene-sequencing/
Figure 5: The process of translation from DNA sequence to the associated amino acid sequence.
https://courses.lumenlearning.com/suny-ap1/chapter/3-4-protein-synthesis/
Figure 6: Single nucleotide polymorphisms (SNPs).
https://isogg.org/wiki/Single-nucleotide_polymorphism
3 Data preprocessing

The knowledge discovery process includes 3 main phases, which are the preprocessing phase, the data mining phase, and the post-processing phase. Data pre-processing is a crucial phase of knowledge discovery to build an accurate machine learning model. In the preprocessing phase, a set of data preprocessing steps are performed (cleaning the data from noise, handling missing values, merging appropriate data from different databases, normalizing the data, extracting features, and selecting the most informative features) to prepare the data for the data mining phase. Datasets can also be small, so the relevant features may not have been captured, and thus data augmentation is performed by applying different data augmentation techniques. Data mining, which is the core phase in knowledge discovery, is performed by applying MLTs. The preprocessing facilitates the application of the MLTs to extract important patterns or correlations. In the post-processing phase, the discovered knowledge is refined and improved, then interpreted into meaningful knowledge for the user's presentation [25].

Feature selection is a key preprocessing step that should be highlighted when comparing deep learning with the traditional MLTs, so it will be tackled in more detail in the next subsection.

3.1 Features selection

Feature selection is the process of finding the optimal feature subset that is strongly distinguishing among different classes. The purpose of this process is the reduction of the dataset and the elimination of redundant and irrelevant features that impact the classification process negatively. Feature selection is a combinatorial optimization problem whose aim is to select the feature subset with the least number of features that achieves the highest possible classification accuracy. It is one of the data preprocessing steps for pattern recognition and data mining, specifically when working on high-dimensional datasets [26, 27].

Feature selection has 2 main approaches: the filter and the wrapper. In the filter approach, the feature selection is based on statistical individual feature ranking. It is easily implemented, but it eliminates the interaction among features and does not rely on the ML algorithm applied to the selected features. Whereas, for the wrapper approach, the feature selection depends on the outcome of the ML algorithm to decide how favorable the feature subset is. Candidate solutions of feature subsets are iteratively generated and their characteristics are assessed by the applied ML algorithm [28].

The wrapper-based feature selection approach evaluates the quality of feature subsets using the learning algorithm. Thus, it can determine and discard irrelevant and redundant features effectively. As the learning algorithm is frequently used in the search process, high computational time is required, especially when the datasets are large. On the other hand, hybrid methods aim to utilize the advantages of both approaches: the computational efficiency of the filter approach and the high performance of the wrapper approach [29].

Feature selection algorithms based on heuristic search methods are needed, as the computation over a huge number of features is not feasible. Many meta-heuristic approaches have been used for feature selection; among these algorithms are the nature-inspired algorithms such as the genetic algorithm (GA), firefly [30, 31] and ant colony optimization (ACO) [32, 33].
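To illustrate the filter approach in a few lines (an illustrative sketch, not taken from the reviewed articles; the scikit-learn breast cancer dataset is used here only as a stand-in for medical tabular data), each feature is scored individually by mutual information with the class label and only the top-k features are kept:

```python
# Minimal, hypothetical sketch of filter-based feature selection:
# features are ranked individually by mutual information with the label,
# independently of any classifier, and only the top-k features are kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

# Compare a traditional MLT (SVM) with and without the filter step.
print("all features :", cross_val_score(SVC(), X, y, cv=5).mean())
print("top-10 filter:", cross_val_score(SVC(), X_selected, y, cv=5).mean())
```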
4 New trends of machine learning techniques

After data preprocessing, various Machine Learning Techniques (MLTs) are applied to uncover hidden patterns and correlations in the data. As mentioned earlier, disease-related data, often represented through numerical lab tests, microarray data, medical imaging, and genetic sequences, can be processed to predict disease presence or other related tasks. Traditional MLTs like Support Vector Machines (SVM), Decision Trees (DT), and K-Nearest Neighbors (KNN) are effective across these data types for decision-making. However, Artificial Neural Networks (ANNs), which mimic the brain's neural structure, are increasingly used for more complex tasks and in various domains, especially the medical domain. ANNs consist of input, hidden, and output layers, and are trained through techniques like backpropagation. Their success across domains, especially healthcare, has led to the development of Deep Learning (DL), a more advanced form of ANN. This section explores the role of deep learning in modern machine learning.
Figure 7: The general structure of Deep Neural Network.

4.1 Deep learning (DL)

Deep Learning (DL) models have gained prominence due to their ability to automatically extract complex patterns from data, eliminating the need for manual feature engineering. However, DL models require large datasets, making them particularly suited for high-dimensional data, such as in medical fields, where they can uncover intricate structures through multiple intermediate layers. The depth of a DL model—referring to the number of hidden layers—enables it to learn complex mappings between input and output. Unlike shallow networks, which struggle with intricate data patterns, deeper networks excel at learning these relationships [34, 35]. Figure 7 shows the general structure of Deep Neural Networks (DNNs).

There are several deep learning algorithms such as Convolutional Neural Networks (CNN), radial basis function networks, deep belief networks, autoencoders, and Recurrent Neural Networks (RNN) [35, 36].

Deep learning depends on hyperparameters such as the activation function, learning rate, batch size, number of epochs, optimizer, dropout rate, etc. Different deep learning algorithms, like RNNs and CNNs, also have additional specific hyperparameters. Adjusting these hyperparameters is critical, as their values significantly affect the model's behavior. Finding the optimal combination of hyperparameters can be an exhaustive task, requiring substantial computational resources and time [37, 38].

The performance of a DL model heavily depends on the selection of these hyperparameters, particularly in complex domains like medical data analysis. Medical data often have high dimensionality, noise, and imbalanced class distributions, making hyperparameter optimization crucial to enhancing model performance. Careful selection improves robustness and generalizability, ensuring reliability in real-world clinical settings. While methods like grid search and random search are widely used, more advanced techniques, such as Bayesian optimization, offer significant advantages [37, 38].

Common techniques for hyperparameter optimization:

1. Grid search: Exhaustively searches across all possible hyperparameter combinations. While it guarantees to find the best parameter set within the grid, it can be computationally expensive, especially for models with many hyperparameters.

2. Random search: A more computationally efficient approach, randomly selecting combinations of hyperparameters from specified ranges. It often achieves comparable or better results than grid search in fewer trials.

3. Bayesian optimization: An advanced method that builds a probabilistic model of the objective function. It predicts the best hyperparameters based on past performance, guiding the search toward the most promising regions with fewer trials. Libraries like Optuna and Hyperopt can implement Bayesian optimization efficiently.

For example, a CNN can be used to classify medical images like X-rays or MRI scans. Random search can explore different values for hyperparameters (e.g., learning rate, batch size, number of layers). Alternatively, Bayesian optimization can be used for a more efficient search, predicting the most promising hyperparameter configurations based on prior evaluations. By optimizing the model's parameters using these methods, we can improve classification accuracy, reduce overfitting, and ensure the model performs well on unseen medical data.
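To make the Bayesian-optimization option concrete, the sketch below (ours, not from the reviewed articles) uses Optuna to tune a small fully connected network on the scikit-learn breast cancer dataset as a stand-in for medical tabular data; the hyperparameter names, ranges, and number of trials are illustrative assumptions:

```python
# Minimal, hypothetical sketch of Bayesian-style hyperparameter optimization
# with Optuna: each trial samples a hyperparameter configuration, trains a
# small neural network, and reports cross-validated accuracy.
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Hyperparameter ranges are illustrative assumptions.
    hidden = trial.suggest_int("hidden_units", 16, 128)
    lr = trial.suggest_float("learning_rate_init", 1e-4, 1e-1, log=True)
    alpha = trial.suggest_float("alpha", 1e-6, 1e-2, log=True)
    model = MLPClassifier(hidden_layer_sizes=(hidden,),
                          learning_rate_init=lr,
                          alpha=alpha,
                          max_iter=300)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("Best hyperparameters:", study.best_params)
print("Best CV accuracy   :", study.best_value)
```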
Advantages and Limitations of DNNs for Medical Data

Advantages:

• Versatility: DNNs can be adapted to work with various data types, including structured clinical data (e.g., patient demographics, lab results), unstructured data (e.g., free-text medical records), and image data.

• Feature Learning: DNNs can automatically learn relevant features from the data, making them more flexible than traditional machine learning algorithms that rely on feature engineering.

Limitations:

• Training complexity: Training deep neural networks can be computationally expensive and time-consuming. Additionally, DNNs require large datasets to avoid overfitting.

• Overfitting: If not carefully tuned, DNNs can overfit to small or imbalanced datasets, a common issue in medical data where datasets may not be as large or diverse as needed for training.

Use Cases in Medicine: DNNs have been applied to a variety of tasks in medicine, including predicting patient outcomes, disease progression modeling, and disease classification from images and clinical data.

CNNs and RNNs are two of the most common and promising deep learning algorithms used in medical applications. These algorithms have demonstrated success in a variety of tasks, such as medical image classification and time-series data analysis. Further details on these algorithms will be discussed in the next subsections.

4.1.1 Recurrent neural networks (RNN)

Recurrent neural networks (RNNs) are neural networks that contain memories that can capture the information stored in the prior elements of a given sequence. Therefore, RNNs are suitable for processing sequential data types such as the diagnostic history of patients, DNA and protein sequences, etc., where the information is remembered through the network. An RNN is called recurrent because it executes the same task for each element of the input sequence while its output is based on the prior computations (memory). Thus, the decision of the recurrent net at time t-1 affects the decision that will be taken later at time t. Therefore, an RNN has two sources of input, the recent past and the present, which are combined to define the response to new data. Figure 8 shows the architecture of the RNN, in which a set of input x values are mapped into a sequence of output o values. A loss L measures the difference between the expected output o and the actual output y [35].

Here $x$, $h$, $o$, $L$, and $y$ denote the input, hidden state, output, loss, and target value. A weight matrix $U$ defines the input-to-hidden connection, a weight matrix $W$ defines the hidden-to-hidden connection, and a weight matrix $V$ defines the hidden-to-output connection. Then, from time step $t = 1$ through time step $t = n$, the following equations are used:

$a_t = b + W h_{t-1} + U x_t$  (1)

$h_t = \tanh(a_t)$  (2)

$o_t = c + V h_t$  (3)

$\hat{y}_t = \mathrm{softmax}(o_t)$  (4)

The forward propagation of the RNN is defined by the preceding equations, where $b$ and $c$ are the bias vectors, while tanh and softmax are the activation functions. To update the weight matrices $U$, $V$, and $W$, we compute the gradient of the loss function for each weight matrix. Gradient computation requires both forward and backward propagation of the network. Any loss function can be used depending on the goal. At each time step, the sum of all losses is the total loss for a particular sequence of x values.

However, traditional RNNs suffer from gradient exploding and gradient vanishing issues, making them unsuitable for long-term dependencies. On the other hand, long short-term memory (LSTM) is effective in capturing long-term time dependence. LSTM networks address this by introducing gating mechanisms that control the memory flow, allowing for better long-term sequence learning. Gated Recurrent Units (GRUs) offer a simplified version of LSTMs with similar benefits but fewer parameters.

Advantages and limitations of RNNs for medical data

Advantages:

• Sequential data processing: RNNs can handle different types of sequential data, including time series and text, where the past medical history or a series of clinical events influence future outcomes.

• Memory of past inputs: RNNs can remember information from previous time steps in the sequence, allowing them to capture temporal dependencies in data. This is particularly useful for tracking disease progression over time or analyzing patient histories.

Limitations:

• Training difficulties: RNNs are prone to the vanishing gradient problem, especially in long sequences, making them harder to train effectively.

• Data complexity: RNNs are best suited for data where the relationship between input and output is sequential. For static data like images or tabular data, CNNs or DNNs might be more appropriate.
• Resource intensive: Training RNNs, especially on
long sequences, can be computationally expensive.
Figure 8: The architecture of recurrent neural network.
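As a concrete rendering of the forward pass in equations (1)–(4) (an illustrative NumPy sketch, not from the reviewed articles; the dimensions and random weights are assumptions), a single-layer RNN can be run over a short sequence as follows:

```python
# Hypothetical NumPy sketch of the RNN forward pass in equations (1)-(4):
# a_t = b + W h_{t-1} + U x_t,  h_t = tanh(a_t),  o_t = c + V h_t,  y_hat_t = softmax(o_t)
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, n_steps = 4, 8, 3, 5   # assumed sizes

U = rng.normal(size=(hidden_dim, input_dim))    # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden weights
V = rng.normal(size=(output_dim, hidden_dim))   # hidden-to-output weights
b = np.zeros(hidden_dim)                        # hidden bias
c = np.zeros(output_dim)                        # output bias

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

x_sequence = rng.normal(size=(n_steps, input_dim))  # e.g., a short clinical time series
h = np.zeros(hidden_dim)                            # initial hidden state h_0
for t, x_t in enumerate(x_sequence, start=1):
    a = b + W @ h + U @ x_t        # equation (1)
    h = np.tanh(a)                 # equation (2)
    o = c + V @ h                  # equation (3)
    y_hat = softmax(o)             # equation (4)
    print(f"t={t}, predicted class probabilities: {np.round(y_hat, 3)}")
```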
Use cases in medicine: RNNs (and their variants like LSTMs) are commonly used in medical applications such as gene sequence classification, predicting disease progression over time, and analyzing time-series medical signals (e.g., ECG readings) [35].

4.1.2 Convolutional Neural Network

CNNs are a type of deep learning network specialized for image analysis. Unlike traditional MLTs that rely on manual feature extraction, CNNs can automatically learn hierarchical features from raw image data. This is especially useful in the medical field, where CNNs are applied to analyze medical images for tasks like disease detection and classification [36, 39].

A CNN contains an input, an output and many hidden layers which represent the convolutional network. The convolutional network includes three types of layers: convolutional, activation, and pooling. The convolutional layers apply filters to detect features (edges, textures, etc.). As the image proceeds through the layers, the filters can detect more sophisticated features. An activation function like the Rectified Linear Unit (ReLU) follows the convolution layer to control the output; it introduces non-linearity. Pooling layers reduce the dimensionality of the data, making the model more computationally efficient and less sensitive to minor positional changes in the features. The final layer is fully connected, producing predictions for classification tasks. The overall number of network parameters is defined by the number of layers, the number of neurons in each layer, and the connections between neurons. The weights should be tuned through the training phase to achieve good performance [40].

The convnet processes the image $I$ using a matrix of weights called filters, which can recognize certain features at specific positions. At a specific layer $l$, the feature map at position $(i, j)$ is defined as $h^l_{ij}$, the bias as $b^l$, and the weights as $W^l$. The feature map can be expressed as follows:

$h^l_{ij} = \mathrm{ReLU}((W^l * I)_{ij} + b^l)$  (5)

where ReLU is the activation function which controls the output. The basic structure of the CNN is shown in Figure 9.
which represent convolutional networks. Convolutional traditional methods.
network includes three types of layers: convolutional,
activation, and pooling. The convolutional layers apply • Spatial relationships: The convolutional layers
filters to detect features (edges, textures, etc.). As the can detect local patterns (e.g., edges, textures) in
image proceeds through layers, the filters can detect more images, which are crucial for tasks like tumor
sophisticated features. The activation function like detection or organ segmentation.
Rectified Linear Unit (ReLU) follows the convolution
layer to control the output, it introduces non-linearity. • Efficiency: CNNs are computationally efficient
Pooling layers reduce the dimensionality of the data, due to shared weights in convolution layers,
making the model more computationally efficient and less allowing them to process large datasets more
sensitive to minor positional changes in the features. The effectively.
final layer is fully connected, producing predictions for
classification tasks. The overall number of network Limitations:
parameters is defined by the number of layers, the number
of neurons in each layer, and the connection between • Data requirements: CNNs require large labeled
neurons. The weights should be tuned through the training datasets to perform well, which may not always
phase to achieve good performance [40]. be available in medical settings.
convnet processes the image (I) using a matrix of • Limited to spatial data: While CNNs excel in
weights called filters which can recognize certain features image-based data, they are not as effective for
non-spatial data like time-series or sequential data.

Use cases in medicine: CNNs have been widely applied in diagnostic tasks such as detecting cancers, classifying lesions, and analyzing radiological images (e.g., X-rays, MRIs, CT scans) [35, 36, 41].

Recent advances in CNNs, like AlexNet [42], VGGNet [43], GoogLeNet [44], and ResNet [45], have significantly improved image classification accuracy, with models now outperforming human experts in some cases. These networks have been trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) using millions of annotated images [46, 47], and their success has spurred the rise of transfer learning, where pre-trained models are fine-tuned for specific tasks [48]. The next subsections give a brief description of some of the high-performance pre-trained models trained on ImageNet.

4.1.2.1 Visual geometry group network VGG 16–19

The VGG16 network is the winning model architecture of the 2014 ImageNet competition. VGG consists of 16–19 layers. The size of the input image to VGG is (224×224). VGG has a set of convolutional filters with small sizes (3×3) to capture the information of the up/down and left/right center. The size of the pre-trained weights is 528 MB. The overall number of parameters of VGG16 is 138 357 544 parameters [43].

4.1.2.2 InceptionV3 model architecture

The InceptionV3 network is the winning model architecture of the 2015 ImageNet competition. The InceptionV3 model has a total of 48 layers. The size of the input image to InceptionV3 is (299×299). It is deeper than VGG16 but with fewer parameters. The size of the pre-trained weights is 92 MB. It has 23 851 784 parameters [44].

4.1.2.3 Residual neural network (ResNet)

The ResNet network is the winning model architecture of the 2016 ImageNet competition. ResNet-50 contains a 50-layer architecture. The size of the input image to ResNet is (224×224). The size of the pre-trained weights is 99 MB. It has 25 636 712 parameters [45].
Figure 9: The basic structure of CNN.
Figure 10: Transfer learning architecture.
4.2 Transfer learning

Transfer learning is a more appropriate approach when the available data for training is limited. In transfer learning, an intricate model can be trained using available large-scale annotated images such as natural images. Therefore, the TL process transfers knowledge from a source domain (e.g. natural images) to a target domain or network where the domain images are limited. Only the small amount of available annotated data of the target domain is used to tune the model. Where the fundamental features used for classification are similar between domains, retraining the entire model is unnecessary. In such cases, TL allows for the transfer of learned features, with only the classification layer(s) being retrained on the small new dataset [48]. TL leverages pre-trained models such as VGG [43], NasNetLarge [22], Inception GoogLeNet [44], ResNet [45], etc., that have been developed for image classification and presented at the annual ILSVRC [46, 47]. TL saves a great amount of time otherwise lost in developing and training CNN models. The pre-trained model, or the required part of the model, can be incorporated directly into the new model and used as a classifier, standalone feature extractor, integrated feature extractor, or weight initializer [48, 49]. Figure 10 shows the transfer learning architecture.
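A minimal Keras sketch of this idea (illustrative only; the dataset, image size, and class count are assumptions): the VGG16 convolutional base pre-trained on ImageNet is frozen and only a new classification head is trained on the small target dataset.

```python
# Hypothetical transfer-learning sketch: reuse VGG16 weights learned on
# ImageNet (natural images) and retrain only a new classification head
# on a small medical image dataset.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False            # freeze the pre-trained convolutional base

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),   # e.g., benign vs. malignant (assumed)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # small target-domain dataset
```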
4.3 Curriculum learning

In the standard educational method, learning depends on a curriculum that presents new concepts based on previously acquired ones. The rationale behind this is that people learn better if the information is introduced in a meaningful order instead of randomly. By using the same ideas to train neural networks starting with simple cases, it was noticed that the networks perform better, which indicates the significance of gradual and systematic learning [50].

The curriculum learning (CL) approach is motivated by the capacity of humans to pick up new tasks fast with finite "training sets". Similarly, the training procedure of medical students, called teacher-student curriculum learning, is based on training by tasks with gradually growing difficulty, while each task uses smaller datasets than those utilized in machine learning. For example, students can start with a simple task, such as deciding if an image includes lesions, and later are asked to determine if the lesions are malignant or benign, which is a more complicated task. With time, they will progress to a more complex task, like recognizing the subtypes of lesions [8].

In machine learning, CL works with a series of training samples sorted in increasing order according to learning difficulty. The order in which the samples are introduced to the model is critical, as it can significantly impact the model's performance. Curriculum learning is an active area of research, particularly in applications such as medical image diagnosis [8].

A key point in CL is the design of data schedulers that control the sequence in which training samples are fed into the model. These schedulers can use a variety of methods to determine sample difficulty, such as expert input, heuristics, or natural language processing (NLP) applied to radiology reports.

Given a sample x_i which should be assigned to a class label C_i ∈ {C1, C2, …, Cm}, suppose the training set consists of pairs {X, C}, and the training is processed in batches of size B for a total of E epochs. To train a CNN with CL, it is preferable to start the training with simpler samples. Practically, CL is performed by assigning a probability to every training pair, where the simpler samples are given higher probabilities to be chosen first. Initially, every sample x_i is assigned a probability p_i(0). At the beginning of each epoch e, the training set {X, C} is permuted to {X, C}k by the reordering function F(e), where this mapping is produced by sampling the training set based on the probabilities at the present epoch p_i(e). After executing many iterations, these probabilities are updated using a scheduler, aiming to achieve a regular distribution by the end of the training process [50].
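The probability-based sampling just described can be sketched as follows (an illustrative NumPy example, not taken from [50]; the difficulty scores and the linear schedule toward a uniform distribution are assumptions):

```python
# Hypothetical curriculum-learning scheduler: each sample i gets a selection
# probability p_i(e); easy samples start with higher probability, and the
# distribution is annealed toward uniform as the epochs progress.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_epochs, batch_size = 100, 10, 16

difficulty = rng.random(n_samples)          # assumed difficulty score in [0, 1]
p_initial = 1.0 - difficulty                # p_i(0): easier samples are more likely
p_initial = p_initial / p_initial.sum()
p_uniform = np.full(n_samples, 1.0 / n_samples)

for epoch in range(n_epochs):
    # Scheduler: interpolate from the easy-biased distribution toward the
    # uniform distribution by the end of training.
    mix = epoch / (n_epochs - 1)
    p_epoch = (1.0 - mix) * p_initial + mix * p_uniform   # p_i(e)

    # Reordering function F(e): sample the epoch's training order from p_i(e).
    order = rng.choice(n_samples, size=n_samples, replace=False, p=p_epoch)
    first_batch = order[:batch_size]
    print(f"epoch {epoch}: mean difficulty of first batch = "
          f"{difficulty[first_batch].mean():.2f}")
```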
4.4 Active learning (optimal experimental design)

Supervised learning techniques rely heavily on annotated data. Although more datasets are becoming available, the effort, cost, and time required to annotate them remain significant. On the other side, any error, especially in some important applications such as those in the medical domain, can have severe consequences. Achieving reliable outcomes often requires an interactive process where predictions are reviewed or modified by an oracle or user. This means users must be able to override and adjust automated predictions to meet specific criteria. Techniques such as Active Learning (AL), or what is called Human-in-the-Loop computing, have witnessed progress in overcoming these challenges [51].

Active learning is a semi-supervised learning approach that begins with a small set of labeled samples (seed samples) and iteratively selects the most informative samples from a pool of unlabeled data for annotation. By focusing training on the most informative subset of samples, AL improves model performance and reduces the annotation burden, particularly for image data. In AL, an MLT scans unlabeled data and recognizes the most informative samples. These samples are then presented to a human annotator (oracle) for labeling. This makes AL a part of the Human-in-the-Loop paradigm, where only selected samples are used for training, often far fewer than in traditional supervised learning [51].

Formally, suppose that U is an available big pool of unannotated data and that there are oracles to request annotations for any unannotated sample xU to be added to the annotated set L. The goal is to train a model f(x | L∗) using the annotated set L∗ ⊆ L. A brute-force solution would involve requesting the oracle(s) to annotate each sample xU, resulting in L∗ = L. However, this is a costly and not practical solution. Theoretically, there is an optimal subset L∗ of data that can achieve performance equivalent to that obtained using the whole annotated dataset L, i.e. f(x | L∗) ≈ f(x | L). AL is a trend of ML that tries to explore this optimal subset L∗, where the current model is f´(x | L´) and L´ is an intermediate annotated set. AL intends to iteratively
explore the most informative data samples x*_i to train the model, assuming that the unannotated data samples and the model will evolve through time, rather than choosing a constant subset of samples once for training.

The selection of samples to be annotated is based on the informativeness of these requested samples. The evaluation of the informativeness of each un-annotated data sample xU is done given f´(xU | L´), then all selected samples are demanded to be annotated. After the annotations, the new annotated data is used to improve the model. This is done by retraining the whole model using all available annotated data L´, or by using the most recently annotated sample x*_i to fine-tune the network [51].

Active learning typically employs three methods to select samples for annotation:

Stream-based selective sampling supposes the existence of a continuous flow of unannotated data samples xU. In this method, the present model and an informativeness measure I(xU) are the criteria used to specify, for each incoming sample, whether or not to require an annotation from the oracle(s). Thus, while the model is being trained, it is offered a data sample and instantly decides if it needs to query for the label. Although this type of query is inexpensive, its performance is limited because it does not consider the broader context of the underlying distribution, but depends on the separate nature of each decision; therefore the balance between exploration and exploitation is less than in other query kinds.

Membership query synthesis generates the sample x*_G that the model believes to be most informative, rather than selecting from real-world data, and this generated sample is then annotated by the oracle(s). This method may be very effective in bounded domains, but it may struggle when the model has no knowledge of unrepresented areas of the data distribution, similar to stream-based methods.

Pool-based sampling selects N data samples x*_0, ..., x*_N from a large unlabeled dataset U to pull samples from. Pool-based approaches use the present model to do a prediction on un-annotated data samples to get a ranked measure of informativeness for each data sample in the un-annotated data. The highest N informative samples are selected for annotation by the oracle(s). Therefore, the model is initially trained on labeled samples, which are then used to find which data samples would be most informative to be inserted into the training set for the next AL loop. This approach has proved to be the most promising, and it depends on batch-based training. Figure 11 shows the full process of active learning.

AL uses some informativeness measures of unlabeled samples to select the most informative samples. They depend on probabilities; these approaches are least confidence sampling, margin sampling, and entropy sampling. Least confidence sampling selects the sample whose most likely label has the lowest predicted probability. Margin sampling computes the probabilities of the two most likely labels and the difference between them, then considers the sample that has the smallest difference between the first and second most likely labels to be annotated. Entropy sampling uses entropy, as it is a measure of uncertainty, to select a sample to be annotated. Entropy measures the amount of information gained by considering a sample, and so it selects the sample that has the largest entropy value [51].
kinds.
models for the medical domain, large medical data is
Membership query synthesis generates the sample
needed to develop these models. Therefore, many medical
𝑥∗
𝐺 that the model believes to be most informative, rather
researchers illustrated that federated learning is a good
than selecting from real-world data. Therefore, it is
technique to connect different medical organizations and
annotated by the oracle(s). This method may be very
let them share their experiences while keeping privacy.
effective in bounded domains, but it may struggle when the
Furthermore, the performance of the learning model will
model has no knowledge of unrepresented areas of the data
be improved using a large medical dataset. However, the
distribution, similar to stream-based methods.
resulting models may be biased toward organizations that
Pool-based sampling selects N data samples 𝑥∗
0 , . . .
have larger training datasets [53].
, 𝑥∗
𝑁 from a large unlabeled dataset U to pull samples from. In federated learning, the process begins by sending a
Pool-based approaches use the present model to do a global model with unified initial weights to each client. At
prediction on un-annotated data samples to get a ranked each client side, there is a local dataset, where the model is
measure of informativeness for each data sample in the un- trained in each separately. After completing local training,
annotated data. The highest N informative samples are the client sends its model updates back to the server, which
selected for annotation by the oracle(s). Therefore, the aggregates these updates to refine the global model, while
model is initially trained on labeled samples which are then the data at the clients remains local in each client. The
used to find which data samples would be most server has the authority to manage the whole process
informative to be inserted into the training set for the next where it sends the model to the client, collects the updates,
AL loop. This approach has proved to be the most and synchronizes them to build the updated model with the
promising, which depends on batch-based training. Figure new parameters. This method enables medical
11 shows the full process of active learning. organizations to collaborate on training models while
AL uses some informativeness measures of unlabeled maintaining data privacy. There are different federated
samples to select the most informative samples. They learning algorithms according to the computation method
depend on probabilities, these approaches are least of gradients such as federated stochastic gradient descent,
confidence sampling, margin sampling, and entropy federated averaging, and federated learning with dynamic
sampling [51]. regularization. [53, 54]. Figure 12 shows the architecture
Least confidence sampling the model selects the of federated learning.
highest uncertainty sample or least confidence for
annotation and therefore is given to the oracle to be
labeled.
Margin sampling can be utilized in a multi-class, it
uses the first and second most likely labels and computes
Figure 11: The process of active learning.
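To make the three probability-based informativeness measures above concrete, the following minimal Python sketch (illustrative only, not code from the cited works) scores a pool of unlabeled samples with least confidence, margin, and entropy sampling and picks the top-N samples to send to the oracle; the predicted-probability array and the pool size are assumed purely for the example.

import numpy as np

def least_confidence(probs):
    # higher score = more uncertain (1 minus the top predicted probability)
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # smaller score = more uncertain (gap between the two most likely labels)
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def entropy_score(probs, eps=1e-12):
    # higher score = more uncertain (Shannon entropy of the predictive distribution)
    return -(probs * np.log(probs + eps)).sum(axis=1)

# probs: softmax outputs of the current model f'(x | L') on the unlabeled pool (n_pool x n_classes)
probs = np.random.dirichlet([1.0, 1.0, 1.0], size=200)
N = 10  # number of samples requested from the oracle(s) per AL loop
query_lc = np.argsort(-least_confidence(probs))[:N]
query_margin = np.argsort(margin_score(probs))[:N]
query_entropy = np.argsort(-entropy_score(probs))[:N]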
Figure 12: Federated Learning architecture.
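The aggregation step of the federated loop shown in Figure 12 can be illustrated with a small federated-averaging sketch; the per-client weight lists, the local dataset sizes, and the client_update routine mentioned in the comments are illustrative assumptions, not a specific implementation from the cited works.

import numpy as np

def federated_averaging(client_weights, client_sizes):
    # Weighted average of client model weights, weighted by local dataset size (FedAvg).
    # client_weights: one list of numpy arrays per client, all with matching shapes.
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum((size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# One communication round (sketch):
#   1. the server sends global_weights to every client;
#   2. each client k trains locally, e.g. updated_k = client_update(global_weights, local_data_k);
#   3. the server aggregates: global_weights = federated_averaging([updated_1, ...], [n_1, ...]).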
5 Search methodology

5.1. Search criteria

This research investigates recent trends in machine learning (ML) within the medical domain. To achieve this, we explored the ScienceDirect (Elsevier) database (http://www.sciencedirect.com). The following keywords were used in the search: "active learning", "curriculum learning", "deep learning", "transfer learning" and "federated learning", to investigate the different research that utilizes these recent trends. Additional keywords ("medical", "disease", "cancer" and "gene") were included to focus the search on medical applications that used these new trends. Although the search was intended to retrieve articles related to any disease, "cancer" was added to retrieve more relevant results, given that much of the recent research in ML is focused on cancer. Publications from 2016 to 2024 were considered. The search query used for deep learning-based techniques in the medical domain was composed as:

"Deep learning" AND ("medical" OR "Disease" OR "Cancer" OR "Gene").

Since the aim of this research is to find the new trends in machine learning techniques, which after careful investigation were found to be mostly based on "deep learning", either alone or combined with other new techniques such as "transfer learning", "active learning", "curriculum learning" and "federated learning", the same query was used as for deep learning-based techniques, with the other techniques' keywords added as follows:

("Deep learning" AND "*") AND ("medical" OR "Disease" OR "Cancer" OR "Gene")

where "*" can be replaced by any of the other techniques' keywords ("transfer learning", "active learning", "curriculum learning", and "federated learning").

The following criteria were applied to select the publications: (1) articles related to human diseases (diseases of other organisms are excluded); (2) inclusion of at least one of the new ML techniques; (3) only complete research articles were included (excluding letters, surveys, book chapters, and non-English articles); (4) publications published from 2016 to 2024.

5.2. Data extraction

As the search retrieved a large number of articles, only a subset of the retrieved articles was selected for analysis. Figures 13-15 illustrate the number of publications per year for the various techniques between 2016 and 2024, based on the Elsevier database, to show the growth rates of these new trends.
• Figure 13 shows the steady increase in deep learning publications, from 17 articles in 2016 to 2,958 in 2024, indicating a growing interest in applying deep learning in the medical domain.
• Figure 14 shows that transfer learning started to be applied in the medical domain in 2017, with only 2 articles, and reached 218 in 2024.
• Figure 15 shows that the number of publications on active learning, curriculum learning, and federated learning is limited and scattered across the years, as they are newly emerged trends.
The selected articles were drawn from top journals in ScienceDirect, adhering to the criteria mentioned above. The references provide a sample of the applications of these new ML techniques in the medical domain, rather than an exhaustive list. For each reference, key details such as the task, disease, technique(s) used, evaluation results, and data type are presented.
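As a small reproducibility aid, the query template above can be instantiated programmatically; the Python sketch below simply composes the five search strings described in Section 5.1 (the script itself is illustrative and is not part of the original methodology).

# Compose the ScienceDirect query strings described in Section 5.1
medical_filter = '("medical" OR "Disease" OR "Cancer" OR "Gene")'
other_techniques = ["transfer learning", "active learning",
                    "curriculum learning", "federated learning"]

queries = ['"Deep learning" AND ' + medical_filter]
queries += ['("Deep learning" AND "{}") AND {}'.format(t, medical_filter)
            for t in other_techniques]

for query in queries:
    print(query)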
Figure 13: The number of articles published on deep learning from 2016 to 2024 in the Elsevier database.
Figure 14: The number of articles published on transfer learning from 2016 to 2024 in the Elsevier database.
Figure 15: The number of articles published on active/curriculum/federated learning from 2016 to 2024 in the Elsevier database.
6 Some applications of new trends of MLTs in the medical domain

This section illustrates the selected articles, from those retrieved by searching the databases, that represent applications of the previously discussed emerging ML trends in the medical domain.

Li, X., et al. [55] proposed a DL model to detect lung nodules. First, segmentation and rib suppression were applied to extract the region of interest and enhance the nodules' visibility. Then, a histogram-based enhancement was applied to the images. After that, a patch-based multi-resolution CNN was used for feature extraction, and four fusion methods were employed for classification; the best-performing method for detecting lung nodules achieved an accuracy of more than 99% and an FAUC of 0.982 when applied to the chest X-ray radiographs dataset [56].

El Houby & Yassin [57] developed a CNN model to classify breast mammographic images into nonmalignant or malignant. They used two methods: the first is based on patches of the region of interest (ROI) in the mammogram, and the second is based on the whole breast. The accuracy, specificity, sensitivity, and AUC were 95.3%, 92.6%, 98%, and 0.974 respectively using the MIAS [20] dataset, and 96.52%, 96.49%, 96.55%, and 0.98 using the INbreast [58] dataset.

Dai, Y., et al. [59] developed a deep learning CNN model for detecting coronary artery disease utilizing raw heart sound signals. It extracts 206 multidomain features and 126 medical multidomain features. The heart sound signal datasets have been collected from 400 patients from
the hospital of Xinjiang Medical University. The model achieved an accuracy of 87.86%, sensitivity of 90.67%, specificity of 82.38%, and AUC of 94.70 using the multidomain features, and an accuracy of 85.6%, sensitivity of 88.04%, specificity of 80.83%, and AUC of 92.74 using the medical multidomain features.

Alassafi et al. [60] proposed a model that predicts the distribution of the COVID-19 outbreak in Saudi Arabia, Malaysia, and Morocco. A DL RNN and an LSTM network were developed to predict the number of possible COVID-19 cases. The LSTM achieved an accuracy of 98.58%, while the RNN achieved an accuracy of 93.45%. A comparison was conducted between the number of resulting deaths and the number of coronavirus cases in each of the three countries. The model predicted the number of COVID-19 cases and deaths for the following 7 days and was tested using a public dataset from the European Centre for Disease Prevention and Control [61].

Maiti et al. [62] developed a deep learning (DL)-based framework to automatically detect and segment the optic disc from fundus images for the diagnosis of diabetic retinopathy. The framework utilized an adjusted CNN, experimenting with seven different encoder networks: DenseNet121, InceptionV3, ResNet34, VGG11, VGG19, VGG13, and VGG16. VGG16 was selected as the adopted encoder, while the decoder was designed with a harmonic structure based on that of the encoder to improve segmentation performance. The framework was applied to several fundus image datasets, including DIARETDB1, MESSIDOR, IDRiD, DIARETDB0, CHASE-DB1, DRIVE, and STARE, and achieved an impressive accuracy of 99.44%.

Zareen et al. [63] developed a skin cancer classification deep learning CNN-RNN model, with ResNet-50 for spatial feature extraction and an LSTM for temporal dependencies. The model was applied to a dataset of 9,000 images of skin lesions representing 9 cancer types and achieved an accuracy of 94.48%, a sensitivity of 94.38%, and a specificity of 93%.

Ge, R., et al. [64] proposed a Dual-Enhanced Convolutional Ensemble Neural Network (DECENN) to detect the presence or absence of metastasis in whole slide imaging patches of breast cancer. It utilizes VGG16 and DenseNet121 in the network. It was applied to the updated version of a benchmark dataset of microscopic images and histopathologic scans of lymph node sections for the breast [65], achieving an accuracy of about 98.92%, an AUC of 99.70%, and an F-score of 98.93%.

Liu, Q., X. She & Q. Xia [66] proposed a model to classify osteosarcoma cells and other cell types using an updated version of CA-MobileNet V3 based on transfer learning. It was applied to an osteosarcoma cell microscopy imaging dataset of bone cancer [67] and achieved an accuracy of 98.69% and an F1-score of 94.11%.

Oommen & Arunnehru [68] proposed a model to diagnose Alzheimer's disease in its early stages. The proposed model contains 3 phases: preprocessing the images, extracting features using TL with ResNet-18, which are then compressed by cascaded autoencoders (AE), and finally classifying the disease into one of its 5 stages using a DNN. The model was applied to the MRI Neuroimaging dataset [69] and achieved an accuracy of 98.54%, recall of 98.9%, precision of 98.98%, and an F1 score of 98.82%.

Kumar et al. [70] developed a CNN model using the ResNet152 TL approach with feature extractors to classify brain tumor images into normal, benign, and malignant. The model was applied to the BraTS MRI image dataset. The proposed transfer learning model achieved a high accuracy, reaching 99.57%.

Manickam et al. [71] proposed a deep TL model for pneumonia detection. The chest X-ray images were preprocessed to recognize the existence of pneumonia based on the U-Net segmentation network, and the cases were then classified as normal or abnormal (bacterial, viral) using pre-trained models such as ResNet50, InceptionV3, and Inception-ResNetV2. It was evaluated using a publicly available database which includes 5,232 chest X-ray images. The ResNet50 model achieved an accuracy of 93.06%, precision of 88.97%, recall of 96.78%, and F1-score of 92.71%.

Venugopal et al. [72] developed a DNN using a modified EfficientNetV2-M based on transfer learning to detect skin cancer in dermoscopic images. The model was applied to 58,032 dermoscopic images collected from [73-77] and was tested on binary and multiclass classification tasks. It achieved an accuracy of 97.62% for the multiclass classification of the ISIC 2020 dataset and 99.23% for the binary classification of the same dataset.

Mehmood et al. [78] developed a model to diagnose Alzheimer's disease (AD) in its early stage based on TL using the VGG-19 pre-trained model. The model distinguishes among 4 classes: AD, late mild cognitive impairment (LMCI), early mild cognitive impairment (EMCI), and normal control (NC). The used dataset was collected from the AD Neuroimaging Initiative (ADNI) [69] database. In the pre-processing phase, the gray matter (GM) tissue was segmented from brain MRI, and then VGG-19 was used to classify the segmented parts. The model achieved an accuracy of 98.73% to distinguish between AD and NC, 83.72% to distinguish between LMCI and EMCI cases, and more than 80% to distinguish between the other combinations of classes.

Al-Shabi, Shak, and Tan [79] developed a Progressive Growing Channel Attentive Non-Local (ProCAN) deep learning model to classify lung nodules as benign or malignant. Curriculum Learning (CL) was used to train on easy samples before hard samples, and the model was gradually grown to improve its ability to classify the samples based on CL. The model was applied to samples from two publicly available CT scan datasets, LIDC-IDRI [80] and LUNGx [81]. It achieved an accuracy of 95.28%, AUC of 98.05%, precision of 95.75%, sensitivity of 94.33%, and F1-score of 95.04%.

Cho, Y., et al. [82] proposed a CL model using a DL CNN to classify chest radiograph (CXR) images into normal and five types of pulmonary abnormalities. The model used ResNet-50 for training on patches of CXR images with various patch ratios according to pre-trained weights, with fine-tuning using transfer learning
(TL). The model was applied to CXR images from hospitals, including Seoul National University Bundang Hospital (SNUBH) and Asan Medical Center (AMC). It achieved the following accuracies: 90.97% for 20% of the dataset at SNUBH, 91.92% for 50%, and 93.00% for 100%. At AMC, the accuracies were 93.90%, 94.54%, and 95.39%, respectively.

Wong et al. [83] developed a CL-based method for classifying medical images, using features from segmentation networks. The model first learns simpler shapes and features through a segmentation network pre-trained on similar data, then applies this knowledge to more complex classification tasks. The M-Net, a CNN modified from U-Net to work with fewer training samples, was used for segmentation; the CNN classifier then receives the features from the segmentation network as inputs. The model achieved an accuracy of 82% on a 3D three-class brain tumor classification problem and 86% on a 2D nine-class cardiac semantic level classification problem.

Wu et al. [84] developed a weakly-supervised deep AL framework to diagnose COVID-19 using CT scans. The framework contains a 2D U-Net for segmentation of the lung region and a hybrid active learning approach that maintains sample diversity and predicted loss for the diagnosis of COVID-19. The framework classifies the CT scans into one of three classes: pneumonia, coronavirus pneumonia caused by SARS-CoV-2, and normal cases. The framework was validated on a CT scan dataset from the China Consortium of Chest CT Image Investigation (CC-CCII) [85]. With only 30% of the labeled data, the accuracy of the framework reached 0.867, while the AUC was 0.968.

Wu, X., et al. [86] proposed a hybrid active learning (HAL) framework that combines AL with deep TL using ResNet18. The framework applies data augmentation to the unlabeled data pool and uses a hybrid sampling approach that maintains sample variety and classification loss (data uncertainty). The diversity sampling is based on data augmentation, while the noise in the generated data is discarded with an outlier detection process. The HAL was validated on three medical image datasets: Hyper-Kvasir for gastrointestinal disease [87], Messidor for eye fundus images [88], and a breast cancer dataset [89]. By applying the proposed framework to the Hyper-Kvasir dataset, it achieves an accuracy of 0.871, precision of 0.602, recall of 0.587, and F1-score of 0.594.

Meirelles et al. [90] used pool-based AL to train DL models for classifying tumor-infiltrating lymphocytes. The proposed approach selects image patches based on feature grouping and prediction uncertainty. They introduced a Diversity-Aware Data Acquisition (DADA) method, which ensures diverse batch selection by clustering images based on features and then choosing uncertain patches from each cluster. The most uncertain patches from each cluster are prioritized for selection, the clusters with the most uncertain patches contribute more patches, and the pool is updated by removing the selected patches. By applying the proposed model to the cancer tissue image dataset [91], it achieved an AUC of 0.78 with fewer tissue patches and less execution time.

Zhang et al. [92] developed a semi-supervised framework for brain segmentation that incorporates quality-driven active learning (QDAL). In the AL module, a deep supervision loss and an attention mechanism improve the accuracy of segmentation and return quality information for the unlabeled slices. The AL module chooses the most informative slices to be annotated, and the segmentation network is trained iteratively using the updated labeled data. The framework was tested on two brain MRI datasets [93, 94]. The experimental results showed that segmentation utilizing QDAL needs only 15–20% of the annotated slices for the brain extraction task and 30–40% for tissue segmentation, achieving results competitive with full supervision and an accuracy of 90.7%.

Lu, Q., et al. [95] presented a blood cell classification method called MAE4AL, which combines the self-supervised Masked Autoencoder (MAE) and active learning (AL). It chooses the most remarkable samples for labeling based on the self-supervised loss of the MAE and sample uncertainty. Tested on blood smear samples obtained from [96], MAE4AL needed to label only 20% of the data to perform the same as ResNeXt trained on the full dataset. When trained using half of the labeled data, MAE4AL achieved an accuracy of 96.36%, outperforming ResNeXt trained on all the data.

Kumbhare et al. [97] developed an FL method for breast cancer diagnosis using mammogram images from the "Curated Breast Imaging Subset of DDSM (CBIS-DDSM)" dataset [98]. The DenseNet pre-trained model was used for feature extraction, and the extracted features were classified using Enhanced Recurrent Neural Networks (E-RNN). FL was employed to reduce processing time and improve model performance. The method achieved an accuracy of 95%.

Feki et al. [53] proposed a decentralized FL framework that permits different medical organizations to screen for COVID-19 using chest X-ray images based on deep learning while keeping patient data private. Two pre-trained models, VGG16 and ResNet50, were used for classification. The framework was tested using four clients, where each client has its private dataset and the same CNN models. The proposed FL framework achieved competitive results compared to models trained by sharing data. The best achieved accuracy was 97%, using the ResNet50 model with data augmentation.

Zhang et al. [99] proposed an FL-based DL framework for diagnosing brain disorders. The proposed framework was tested on the Autism Brain Imaging Data Exchange (ABIDE) [100] dataset. It achieved an average accuracy of 79% and reduced the communication burden of FL.

Shaikh et al. [101] developed an FL-based DL method to classify respiratory diseases by listening to lung sounds. Generative Adversarial Networks created new lung sounds to train a neural network that classifies four lung diseases, heart attack, and normal breathing patterns. Using two datasets [102, 103], the proposed method achieved an accuracy of 92% for the classification of the different respiratory diseases and heart failure.
Table 1 provides a summary of 25 selected articles from top journals on ScienceDirect, published between 2016 and 2024, based on a database search. These articles showcase applications of recent trends of MLTs in the medical domain and are intended to illustrate these trends, not to present a comprehensive list. For each reference, the table includes the task, disease, techniques used, evaluation results, and data type.
Table 1: Summary of the selected articles from the search results for applications of the new ML techniques in the medical domain.
Ref. | Task | Disease | Used Technique(s) | Evaluation results | Data Type
[55] | Detection | Lung cancer (chest) | DL-CNN | Acc. = 99%; FAUC = 0.982 | X-ray radiographs
[57] | Classification | Breast cancer | DL-CNN | Acc. = 96.52%; Spec. = 96.4%; Sen. = 96.5%; AUC = 0.98 | Mammograms
[59] | Detection | Coronary artery disease | DL-CNN | Acc. = 87.86%; Sen. = 90.67%; Spec. = 82.38%; AUC = 94.70 | Heart sound signals
[60] | Prediction | COVID-19 | DL-RNN; LSTM | RNN Acc. = 93.45%; LSTM Acc. = 98.58% | Numerical
[62] | Segmentation, diagnosis | Diabetic retinopathy | DL-CNN | Acc. = 99.44% | Fundus images
[63] | Classification | Skin cancer | DL-CNN-RNN | Acc. = 94.48; Sen. = 94.38; Spec. = 93 | Skin lesion images
[64] | Detection | Breast cancer | DL-TL-VGG16-DenseNet121 | Acc. = 98.92%; AUC = 99.70%; F-score = 98.93% | Histopathologic images of lymph node
[66] | Classification | Bone cancer | TL CA-MobileNetV3 | Acc. = 98.69%; F1-score = 94.11% | Microscopic images of bone cancer
[68] | Classification | Alzheimer's disease | TL-ResNet-18-AE-DL | Acc. = 98.54%; Recall = 98.9%; Prec. = 98.98%; F1-score = 98.82% | MRI Neuroimaging dataset
[70] | Classification | Brain tumor | TL-ResNet152-CNN | Acc. = 99.57% | MRI
[71] | Segmentation, detection | Pneumonia | U-Net; TL-ResNet50 | Acc. = 93.06%; Prec. = 88.97%; Rec. = 96.78%; F1-score = 92.7 | Chest X-ray
[72] | Classification | Skin cancer | TL-EfficientNetV2-M | Acc. = 99.23 | Dermoscopic images
[78] | Classification | Alzheimer's disease | TL-VGG19 | Acc. = 98.73% | MRI
[79] | Classification | Lung nodules | DL-CNN-CL | Acc. = 95.28%; AUC = 98.05%; Prec. = 95.75; Sen. = 94.33; F1-score = 95.04 | CT scans
[82] | Classification | Pulmonary abnormalities | TL-ResNet-50-CL | Acc. = 93.90, 94.54, 95.39 for 20%, 50%, 100% of the dataset | CXR
[83] | Segmentation, classification | Brain tumor; cardiac | TL-M-Net; DL-CNN-CL | Acc. = 82% (brain tumor); Acc. = 86% (cardiac) | MR
[84] | Segmentation, classification | COVID-19 | TL-U-Net; DL-AL | Acc. = 0.866; ROC = 0.968 | CT scans
[86] | Classification | Gastrointestinal disease | TL-ResNet18-AL | Acc. = 0.871; Prec. = 0.602; Recall = 0.587; F1-score = 0.594 | Images
[90] | Classification | Tumor-infiltrating lymphocytes | DL-CNN-AL | AUC = 0.78 | Histology images
[92] | Segmentation | Brain | DL-CNN-AL | Acc. = 90.7 | MRI
[95] | Classification | Blood diseases (leukemia) | Masked Autoencoder (MAE4AL) | Acc. = 96.36% | Blood smear samples
[97] | Classification | Breast cancer | FL-TL-DenseNet-RNN | Acc. = 95% | Mammograms
[53] | Classification | COVID-19 | FL-TL-VGG16/ResNet50 | Acc. = 97% | X-ray images
[99] | Classification | Brain disorders | FL-CNN | Acc. = 79% | Autism Brain Imaging
[101] | Classification | Respiratory diseases & heart failure | FL-DL | Acc. = 92% | Breathing sounds
7 Conclusion and future work

This research explored the emerging trends in machine learning techniques (MLTs) within the medical domain. Through a comprehensive literature review, we found that deep learning has become the dominant trend, holding significant promise for developing intelligent medical applications. A key advantage of deep learning is its ability to perform automatic feature engineering, simplifying the model-building process and reducing reliance on manual input. Current research predominantly addresses diagnostic tasks, with disease classification being the most common approach. Other tasks, such as segmentation, are also explored. Cancer, in its various forms, is the most frequently studied condition, while the COVID-19 pandemic has notably led to a surge in research on lung diseases.

In the realm of medical imaging, traditional machine learning approaches require extensive pre-processing, including feature extraction and selection. Deep learning, particularly Convolutional Neural Networks (CNNs), has advanced the field by automating feature engineering, reducing the need for manual intervention. However, this comes with an increased demand for large datasets and significant computational resources. To address these challenges, recent trends like transfer learning, curriculum learning, active learning, and federated learning have been introduced to enhance model performance, expedite the training process, and improve data security. In summary, the overarching goal in this field is to automate processes, reduce human intervention, and maximize the value derived from limited labeled data, thereby enhancing medical decision-making and patient outcomes.

Looking ahead, there are several key areas where further work is needed. While the number of publications on deep learning in the medical domain has steadily increased since its initial applications in 2016, and although these applications have yielded promising results, further research is essential to address several key challenges. Areas such as active learning, curriculum learning, and federated learning have shown promise but remain under-explored and require more attention in future research. A critical direction for future work is to focus on reducing the time and computational costs associated with deep learning models and the other trends. These processes often consume substantial energy, indirectly contributing to environmental and climate concerns. Therefore, developing more energy-efficient techniques will be crucial. Additionally, data augmentation, a significant pre-processing step in deep learning, could be integrated more effectively into the model-building process itself, thereby enhancing sample diversity and improving class representation with less manual effort. Another important aspect for future research is the development of standardized, public databases that include diverse patient data, such as DNA sequences. These databases would enable more comprehensive studies and improve the accuracy of predictive models by providing a richer set of input data. Additionally, integrating knowledge from multiple domains could further enhance the performance of deep learning models in different medical applications. Despite the progress
made, the real challenge lies in translating these advancements into practical, real-world applications that can be implemented in clinical settings. Bridging the gap between theoretical research and clinical deployment will be vital to realizing the full potential of deep learning in medicine.

Conflicts of interest

The author has no competing interests to declare.

References

[1] Chen, M., et al., Disease prediction by machine learning over big data from healthcare communities. IEEE Access, 2017. 5: p. 8869-8879.
[2] Grossman, R.L., et al., Toward a shared vision for cancer genomic data. New England Journal of Medicine, 2016. 375(12): p. 1109-1112.
[3] Schaekermann, M., et al., Understanding expert disagreement in medical data analysis through structured adjudication. Proceedings of the ACM on Human-Computer Interaction, 2019. 3(CSCW): p. 1-23.
[4] Garg, A. and V. Mago, Role of machine learning in medical research: A survey. Computer Science Review, 2021. 40: p. 100370.
[5] Dallora, A.L., et al., Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. PloS one, 2017. 12(6): p. e0179804.
[6] Kharazmi, P., et al., A computer-aided decision support system for detection and localization of cutaneous vasculature in dermoscopy images via deep feature learning. Journal of Medical Systems, 2018. 42(2): p. 1-11.
[7] Xu, J., K. Xue, and K. Zhang, Current status and future trends of clinical diagnoses via image-based deep learning. Theranostics, 2019. 9(25): p. 7556.
[8] Xie, X., et al., A survey on incorporating domain knowledge into deep learning for medical image analysis. Medical Image Analysis, 2021. 69: p. 101985.
[9] Lee, C.H. and H.-J. Yoon, Medical big data: promise and challenges. Kidney Research and Clinical Practice, 2017. 36(1): p. 3.
[10] Dinov, I.D., Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data. Gigascience, 2016. 5(1): p. s13742-016-0117-6.
[11] http://archive.ics.uci.edu/ml/datasets/.
[12] Available from: http://image-net.org/challenges/LSVRC/.
[13] https://file.biolab.si/biolab/supp/bicancer/projections/.
[14] Corsetti, V., et al., Evidence of the effect of adjunct ultrasound screening in women with mammography-negative dense breasts: interval breast cancers at 1 year follow-up. European Journal of Cancer, 2011. 47(7): p. 1021-1026.
[15] Zhang, Z. and E. Sejdić, Radiological images and machine learning: trends, perspectives, and prospects. Computers in Biology and Medicine, 2019. 108: p. 354-370.
[16] Saslow, D., et al., American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA: A Cancer Journal for Clinicians, 2007. 57(2): p. 75-89.
[17] http://medicaldictionary.thefreedictionary.com/operating+microscope.
[18] Jones, N.C. and P.A. Pevzner, An introduction to bioinformatics algorithms. 2004: MIT Press.
[19] Rahman, T., et al., COVID-19 radiography database. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database.
[20] Suckling, J., et al., Mammographic image analysis society (MIAS) database v1.21. 2015.
[21] Labati, R.D., V. Piuri, and F. Scotti, All-IDB: The acute lymphoblastic leukemia image database for image processing. In 2011 18th IEEE International Conference on Image Processing. 2011. IEEE.
[22] Zoph, B., et al., Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[23] Thusberg, J. and M. Vihinen, Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Human Mutation, 2009. 30(5): p. 703-714.
[24] El Houby, E.M., Machine learning techniques for pathogenicity prediction of non-synonymous single nucleotide polymorphisms in human body. Journal of Ambient Intelligence and Humanized Computing, 2023. 14(7): p. 8099-8113.
[25] Han, J., J. Pei, and M. Kamber, Data mining: concepts and techniques. 2011: Elsevier.
[26] Hamla, H. and K. Ghanem, A hybrid feature selection based on Fisher score and SVM-RFE for microarray data. Informatica, 2024. 48(1).
[27] Fahrudin, T.M., I. Syarif, and A.R. Barakbah, Ant colony algorithm for feature selection on microarray datasets. In 2016 International Electronics Symposium (IES). 2016. IEEE.
[28] Talavera, L., An evaluation of filter and wrapper methods for feature selection in categorical clustering. In International Symposium on Intelligent Data Analysis. 2005. Springer.
[29] Tabakhi, S. and P. Moradi, Relevance–redundancy feature selection based on ant colony optimization. Pattern Recognition, 2015. 48(9): p. 2798-2811.
[30] Yang, X.-S., Firefly algorithm, stochastic test functions and design optimisation. International Journal of Bio-Inspired Computation, 2010. 2(2): p. 78-84.
[31] Mashhour, E.M., et al., Feature Selection Approach based on Firefly Algorithm and Chi-square. International Journal of Electrical & Computer Engineering (2088-8708), 2018. 8(4).
[32] Neagoe, V.-E. and E.-C. Neghina, Feature selection with ant colony optimization and its
applications for pattern recognition in space imagery. In 2016 International Conference on Communications (COMM). 2016. IEEE.
[33] El Houby, E.M., N.I. Yassin, and S. Omran, A hybrid approach from ant colony optimization and K-nearest neighbor for classifying datasets using selected features. Informatica, 2017. 41(4).
[34] LeCun, Y., Y. Bengio, and G. Hinton, Deep learning. Nature, 2015. 521(7553): p. 436-444.
[35] Goodfellow, I., Y. Bengio, and A. Courville, Deep learning. 2016: MIT Press.
[36] Dezaki, F.T., et al., Cardiac phase detection in echocardiograms with densely gated recurrent neural networks and global extrema loss. IEEE Transactions on Medical Imaging, 2018. 38(8): p. 1821-1832.
[37] Raiaan, M.A.K., et al., A systematic review of hyperparameter optimization techniques in Convolutional Neural Networks. Decision Analytics Journal, 2024: p. 100470.
[38] Ali, M.J., et al., A review of AutoML optimization techniques for medical image applications. Computerized Medical Imaging and Graphics, 2024: p. 102441.
[39] Ambekar, S. and R. Phalnikar, Disease risk prediction by using convolutional neural network. In 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). 2018. IEEE.
[40] Anwar, S.M., et al., Medical image analysis using convolutional neural networks: a review. Journal of Medical Systems, 2018. 42(11): p. 1-13.
[41] Oraibi, Z.A. and S. Albasri, A robust end-to-end CNN architecture for efficient COVID-19 prediction from X-ray images with imbalanced data. Informatica, 2023. 47(7).
[42] Krizhevsky, A., I. Sutskever, and G.E. Hinton, ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012. 25.
[43] Simonyan, K. and A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[44] Szegedy, C., et al., Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[45] He, K., et al., Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[46] Deng, J., et al., ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009. IEEE.
[47] Russakovsky, O., et al., ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015. 115(3): p. 211-252.
[48] Oquab, M., et al., Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[49] Wang, L., et al., Trends in the application of deep learning networks in medical image analysis: Evolution between 2012 and 2020. European Journal of Radiology, 2022. 146: p. 110069.
[50] Jiménez-Sánchez, A., et al., Medical-based deep curriculum learning for improved fracture classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 2019. Springer.
[51] Budd, S., E.C. Robinson, and B. Kainz, A survey on active learning and human-in-the-loop deep learning for medical image analysis. Medical Image Analysis, 2021. 71: p. 102062.
[52] KhoKhar, F.A., et al., A review on federated learning towards image processing. Computers and Electrical Engineering, 2022. 99: p. 107818.
[53] Feki, I., et al., Federated learning for COVID-19 screening from Chest X-ray images. Applied Soft Computing, 2021. 106: p. 107330.
[54] Wu, J.C.-H., et al., Dynamically Synthetic Images for Federated Learning of Medical Images. Computer Methods and Programs in Biomedicine, 2023: p. 107845.
[55] Li, X., et al., Multi-resolution convolutional networks for chest X-ray radiograph based lung nodule detection. Artificial Intelligence in Medicine, 2020. 103: p. 101744.
[56] Li, X., et al., Rib suppression in chest radiographs for lung nodule enhancement. In 2015 IEEE International Conference on Information and Automation. 2015. IEEE.
[57] El Houby, E.M. and N.I. Yassin, Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks. Biomedical Signal Processing and Control, 2021. 70: p. 102954.
[58] Moreira, I.C., I. Amaral, I. Domingues, A. Cardoso, M.J. Cardoso, and J.S. Cardoso, INbreast: toward a full-field digital mammographic database. Academic Radiology, 2012. 19(2): p. 236-248.
[59] Dai, Y., et al., Deep learning fusion framework for automated coronary artery disease detection using raw heart sound signals. Heliyon, 2024. 10(16).
[60] Alassafi, M.O., M. Jarrah, and R. Alotaibi, Time series predicting of COVID-19 based on deep learning. Neurocomputing, 2022. 468: p. 335-344.
[61] https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide.
[62] Maiti, S., et al., Automatic detection and segmentation of optic disc using a modified convolution network. Biomedical Signal Processing and Control, 2022. 76: p. 103633.
[63] Zareen, S.S., et al., Enhancing Skin Cancer Diagnosis with Deep Learning: A Hybrid CNN-RNN Approach. Computers, Materials & Continua, 2024. 79(1).
[64] Ge, R., et al., Detection of presence or absence of metastasis in WSI patches of breast cancer using the dual-enhanced convolutional ensemble neural network. Machine Learning with Applications, 2024. 17: p. 100579.
[65] Cukierski, W., Histopathologic cancer detection. Kaggle. https://kaggle.com/competitions/histopathologic-cancer-detection, 2018.
[66] Liu, Q., X. She, and Q. Xia, AI based diagnostics product design for osteosarcoma cells microscopy imaging of bone cancer patients using CA-MobileNet V3. Journal of Bone Oncology, 2024: p. 100644.
[67] Charilaou, P. and R. Battat, Machine learning models and over-fitting considerations. World Journal of Gastroenterology, 2022. 28(5): p. 605.
[68] Oommen, D.K. and J. Arunnehru, Alzheimer's Disease Stage Classification Using a Deep Transfer Learning and Sparse Auto Encoder Method. Computers, Materials & Continua, 2023. 76(1).
[69] http://adni.loni.usc.edu.
[70] Kumar, K.A., A. Prasad, and J. Metan, A hybrid deep CNN-Cov-19-Res-Net Transfer learning architype for an enhanced Brain tumor Detection and Classification scheme in medical image processing. Biomedical Signal Processing and Control, 2022. 76: p. 103631.
[71] Manickam, A., et al., Automated pneumonia detection on chest X-ray images: A deep learning approach with different optimizers and transfer learning architectures. Measurement, 2021. 184: p. 109953.
[72] Venugopal, V., et al., A deep neural network using modified EfficientNet for skin cancer detection in dermoscopic images. Decision Analytics Journal, 2023. 8: p. 100278.
[73] Rotemberg, V., et al., A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Scientific Data, 2021. 8(1): p. 34.
[74] Tschandl, P., C. Rosendahl, and H. Kittler, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 2018. 5(1): p. 1-9.
[75] Codella, N.C., et al., Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). 2018. IEEE.
[76] Combalia, M., et al., BCN20000: Dermoscopic lesions in the wild. arXiv preprint arXiv:1908.02288, 2019.
[77] Codella, N., et al., Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.
[78] Mehmood, A., et al., A transfer learning approach for early diagnosis of Alzheimer's disease on MRI images. Neuroscience, 2021. 460: p. 43-52.
[79] Al-Shabi, M., K. Shak, and M. Tan, ProCAN: Progressive growing channel attentive non-local network for lung nodule classification. Pattern Recognition, 2022. 122: p. 108309.
[80] Armato III, S.G., et al., The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics, 2011. 38(2): p. 915-931.
[81] Armato III, S.G., et al., LUNGx Challenge for computerized lung nodule classification. Journal of Medical Imaging, 2016. 3(4): p. 044506-044506.
[82] Cho, Y., et al., Optimal number of strong labels for curriculum learning with convolutional neural network to classify pulmonary abnormalities in chest radiographs. Computers in Biology and Medicine, 2021. 136: p. 104750.
[83] Wong, K.C., T. Syeda-Mahmood, and M. Moradi, Building medical image classifiers with very limited data using segmentation networks. Medical Image Analysis, 2018. 49: p. 105-116.
[84] Wu, X., et al., COVID-AL: The diagnosis of COVID-19 with deep active learning. Medical Image Analysis, 2021. 68: p. 101913.
[85] Zhang, K., et al., Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell, 2020. 181(6): p. 1423-1433.e11.
[86] Wu, X., et al., HAL: Hybrid active learning for efficient labeling in medical domain. Neurocomputing, 2021. 456: p. 563-572.
[87] Borgli, H., et al., HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data, 2020. 7(1): p. 1-14.
[88] Decencière, E., et al., Feedback on a publicly distributed image database: the Messidor database. Image Analysis & Stereology, 2014. 33(3): p. 231-234.
[89] Aresta, G., et al., BACH: Grand challenge on breast cancer histology images. Medical Image Analysis, 2019. 56: p. 122-139.
[90] Meirelles, A.L., et al., Effective Active Learning in Digital Pathology: A Case Study in Tumor Infiltrating Lymphocytes. Computer Methods and Programs in Biomedicine, 2022: p. 106828.
[91] Saltz, J., et al., Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports, 2018. 23(1): p. 181-193.e7.
[92] Zhang, Z., et al., Quality-driven deep active learning method for 3D brain MRI segmentation. Neurocomputing, 2021. 446: p. 106-117.
[93] Shattuck, D.W., et al., Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage, 2008. 39(3): p. 1064-1080.
[94] https://www.nitrc.org/projects/ibsr.
[95] Lu, Q., et al., A blood cell classification method based on MAE and active learning. Biomedical Signal Processing and Control, 2024. 90: p. 105813.
[96] Matek, C., et al., Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence, 2019. 1(11): p. 538-544.
[97] Kumbhare, S., A.B. Kathole, and S. Shinde, Federated learning aided breast cancer detection with intelligent Heuristic-based deep learning framework. Biomedical Signal Processing and Control, 2023. 86: p. 105080.
[98] Lee, R.S., et al., A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific Data, 2017. 4(1): p. 1-9.
[99] Zhang, C., et al., FedBrain: A robust multi-site brain network analysis framework based on federated learning for brain disease diagnosis. Neurocomputing, 2023. 559: p. 126791.
[100] Heinsfeld, A.S., et al., Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical, 2018. 17: p. 16-23.
[101] Shaikh, A.A.S. and M. Bhargavi, Weighted aggregation through probability based ranking: An optimized federated learning architecture to classify respiratory diseases. Computer Methods and Programs in Biomedicine, 2023. 242: p. 107821.
[102] Fraiwan, L., et al., Automatic identification of respiratory diseases from stethoscopic lung sound signals using ensemble classifiers. Biocybernetics and Biomedical Engineering, 2021. 41(1): p. 1-14.
[103] Rocha, B., et al., A respiratory sound database for the development of automated classification. In Precision Medicine Powered by pHealth and Connected Health: ICBHI 2017, Thessaloniki, Greece, 18-21 November 2017. 2018. Springer.
https://doi.org/10.31449/inf.v49i16.7979 Informatica 49 (2025) 137–150 137
Vision Transformer-Based Framework for AI-Generated Image
Detection in Interior Design
Hui Wang
AnHui Business and Technology College, Hefei City, AnHui Province, 230041, China
E-mail: leZhang2024@163.com
Keywords: artificial intelligence-generated images, interior design, vision transformers, deep learning, image
classification
Received: January 7, 2025
Increasingly, images generated by artificial intelligence (AI) are being used within interior design, which raises questions of authenticity and ethical use. Motivated by the limitations of Convolutional Neural Networks (CNNs) in capturing long-range dependencies and global patterns in image data, this study examines how Vision Transformers (ViTs) can be utilized to detect AI-generated interior design images. We fine-tuned and evaluated four ViT models, ViT-B16, ViT-B32, ViT-L16, and ViT-L32, on a dataset with 1,000 samples per class. Accuracy, precision, recall, F1-score, and computational efficiency were used to assess performance. Results show that models with smaller patch sizes (i.e., 16x16) perform better than larger ones (i.e., 32x32). ViT-B16 and ViT-L16 had the highest accuracy (96.25%) and F1-score (0.9625) in identifying minor inconsistencies in the AI-generated images. ViT-B32 and ViT-L32 offer better computational efficiency at the cost of lower classification performance (80.00% and 81.25% accuracy, respectively). ViT-B16 provides the best tradeoff between accuracy and resource efficiency; ViT-L16, although just as accurate, incurred higher computational costs. ViT-B32 and ViT-L32 were computationally efficient and are therefore more appropriate for real-time applications that prioritize speed over accuracy. Through this work, we contribute a domain-specific deep learning framework for AI-generated image detection in interior design to strengthen authenticity verification. Future work will address improving computational efficiency and generalizing the model across all (or most) generative models and design styles.
Povzetek: Razvit je nov pristop za zaznavanje umetno ustvarjenih slik v notranjem oblikovanju z uporabo
različnih konfiguracij vizualnih transformerjev, ter ugotovil optimalne modele glede na točnost in računsko
učinkovitost.
1 Introduction

Artificial Intelligence (AI) has become increasingly embedded in practice in creative industries, such as interior design, through the generation of photo-realistic and innovative imagery [1]. Lately, tools like Generative Adversarial Networks (GANs) and diffusion models have democratized access to this high-quality design, and their use has become ubiquitous [2, 3]. This brings challenging problems around what 'authentic' designs are, how designs can be used ethically, and intellectual property rights. Nearly all current AI detection methods leverage Convolutional Neural Networks (CNNs) as their feature extractors, and they are mainly limited to short-range dependencies in image data.

Based on Vision Transformers (ViTs) [4], a state-of-the-art architecture, this study proposes their application as a transformative approach to detecting AI-generated interior design images. This research lays out a solid foundation for authenticating AI-generated content by removing barriers to scalability, computational efficiency, and domain-specific application. Artificial intelligence (AI) has profoundly changed practice in most industries, including interior design, where visualization, creativity, and presentation are increasingly led by AI-generated images [5]. With the advent of Generative Adversarial Networks (GANs) and diffusion models, highly realistic images can now be created that often outperform human-generated designs in quality and detail. While these tools democratize access to creative resources, they also come with problems such as authenticity, intellectual property, and ethical use. For example, it is essential to differentiate between generated and human-made images in interior design because professional work in commercial and academic spaces may otherwise be compromised. While AI is increasingly applied to create visual content, domain-specific applications such as interior design are still in their infancy, and robust means to detect such images have received little attention.

Despite their effectiveness, many existing detection approaches rely on Convolutional Neural Networks (CNNs), which cannot model long-range dependencies and global patterns of high-dimensionality data such as images [6]. In recent years, with their self-attention-based mechanisms, Vision Transformers (ViTs) have emerged as powerful alternative models, achieving state-of-the-art results in image classification and artefact detection tasks [7]. One of their key attributes is their ability to model such noncontiguous relationships, thus offering a means for identifying the subtle inconsistencies underlying AI-generated images. This study proposes a deep learning framework based on Vision Transformers to detect
AI-generated interior design images. The study fine-tunes multiple ViT configurations (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on a balanced dataset and compares their performance with respect to accuracy, precision, recall, F1-score, and computational efficiency. The results guide the choice of model configuration when resources impose a tradeoff with detection accuracy.
The contributions of this work are threefold:
• Developing a domain-specific AI image detection approach targeted at interior design,
• Comparing a large number of ViT configurations to establish cost-benefit relationships,
• Distilling the lessons learned from deploying transformer-based models for AI content detection.
First, the contributions of this research fill an essential gap in AI image authenticity verification, and second, they establish a foundation for future work in this young area.

2 Background and related work

Detecting artificial intelligence (AI)-made images is an emerging field of study, as people increasingly use AI-based tools in creative spheres like interior design. This literature review provides an overview of state-of-the-art AI-generated content detection, specifically methodologies and techniques that can be applied, using Vision Transformers (ViTs), to discriminate between AI-generated and human-created images.

Thanks to the integration of AI, photo-realistic images that resemble human-produced designs are generated. The advanced generative models used by tools such as DALL-E, MidJourney, and Stable Diffusion make images increasingly indistinguishable from real ones. These democratizing advancements to creativity are also a concern, as they raise worries about authenticity and intellectual property rights once such images enter the public domain [8-10]. There have been few attempts to identify the key difficulties of detecting AI-generated interior design images, leaving a vacant area for studying this field.

AI-generated image detection usually relies on machine learning or deep learning models to identify subtle traces in artificial intelligence-generated images that would not appear in authentic ones. Some commonly used techniques include:

Convolutional Neural Networks (CNNs): In the past, CNNs have been a core piece of image classification tasks. They have been shown to learn spatial hierarchies in images and to detect AI artefacts; for example, CNNs have been used successfully to detect GAN-generated images [11, 12]. However, global contextual relationships in high-dimensional data remain challenging for CNNs [13].

Transformer-Based Architectures: Transformers, initially designed for natural language processing [14], have been adapted for vision tasks. The self-attention mechanisms used by Vision Transformers (ViTs) to capture local and global image patterns make ViTs very powerful for detecting minute inconsistencies in AI-generated content [5, 15, 16]. In this work, we build upon the success of ViTs by extending it to interior design image classification.

Ensemble Models: Others have combined CNNs and transformer-based architectures to provide the best of both worlds. For example, hybrid architectures such as DeiT (data-efficient image transformer) extract early features via convolutional blocks [17-20] and subsequently use transformer layers to perform global attention.

Image classification and manipulation detection have reached the state of the art using Vision Transformers. On high-dimensional datasets, they can divide the images into patches and apply self-attention to the relationships between them, leading to better performance [4, 21]. Several studies have highlighted their applicability: ViTs were introduced to demonstrate their scalability in challenging image classification tasks, outperforming traditional CNNs on large-scale datasets [4, 22]. References [23-26] indicate that Vision Transformers are adequate detectors of subtle image manipulations, including deepfake detection. They are therefore a natural choice of methodology for tasks that are exceedingly sensitive to subtle, minute image artefacts. The present study extends this foundation to a binary classification of AI-generated and authentic images in interior design while fine-tuning ViT models.

For the success of deep learning models, effective preprocessing is critical. Standard techniques for making models robust include image resizing, normalization, and data augmentation. References [27, 28] have shown that dataset balancing is necessary and that augmentation strategies are a better way to tackle class imbalances. In this study, we adopt these practices: samples per class were capped at 1,000, and the dataset was set up for diversity. Metrics such as accuracy, precision, recall, F1-score, and loss are commonly used to evaluate detection models, and confusion matrices are used to find misclassification patterns [29, 30]. In line with best current practice in the field, a range of metrics is suggested to capture distinct aspects of model performance, which justifies the choice of metrics made in this study.

Despite these advancements, several challenges persist in detecting AI-generated images: (i) Subtle artifacts: detecting high-quality AI-generated images is complex because they are often not marked by visual artefacts; recent generative models have demonstrated their ability to learn and generate increasingly high-quality, seamless image samples. (ii) Computational complexity: despite being highly accurate, transformer-based models are computationally expensive, which makes them difficult to use in resource-constrained environments. (iii) Dataset limitations: the generalization or transferability of detection models for a specific domain, such as interior design, is limited by the lack of standardized datasets.

We compare deep learning-based methods for detecting AI-generated images, particularly in interior design, as shown in Table 1, which summarizes the approaches' strengths, accuracy, precision, recall, and limitations.
Table 1: Comparison of AI-generated image detection methods

Methodology | Key Strengths | Accuracy | Precision | Recall | Limitations
CNN-Based Approaches | Intense feature extraction for local patterns; effective for GAN-based images | 85-92% | High | High | Struggles with long-range dependencies; limited effectiveness on high-quality textures
Hybrid CNN-Transformer Models | Combines CNN's spatial awareness with the Transformer's self-attention | 89-94% | High | High | Increased computational cost; complex model training
Ensemble Models | Enhances classification robustness by integrating multiple architectures | 91-95% | High | High | Requires large-scale datasets; computationally expensive
Vision Transformers (ViTs) (Our Approach) | Captures fine-grained, global dependencies via self-attention; excels at detecting subtle artefacts | 96.25% | 0.9637 | 0.9625 | High computational cost; requires extensive pretraining

Previous literature has discussed the detection of AI-generated images across more general areas at length, with little focus on the domain-specific application of interior design. Furthermore, most studies employ CNN-based solutions, while work exploiting the full capability of Vision Transformers is less central. This study evaluates multiple ViT configurations for detecting AI-generated interior design images to fill these gaps. This literature review points out the significance of Vision Transformers as a current state-of-the-art approach for detecting AI-generated images; this study builds on that capability and helps grow the body of work on the authenticity of AI-generated content. Future work will need to improve computational efficiency, tackle domain-specific challenges, and standardize benchmarks for performance evaluation in interior design and beyond.

3 Proposed method

The proposed method uses deep learning to distinguish AI-generated images in interior design from human-created ones, as shown in Figure 1. For preprocessing and balancing the input images, we limit samples per class to be uniform and split the data into training and validation sets. The system uses features extracted by Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, ViT-L32) to classify images. The model is trained with the defined parameters and then evaluated with metrics such as accuracy and F1 score. Performance analysis is carried out through visualization of training samples, predictions, and validation metrics, leading to a robust and interpretable approach.
Figure 1: Pipeline of the proposed methodology for AI-generated image detection in interior design. It consists of dataset collection, preprocessing, Vision Transformer (ViT) feature extraction, training with AdamW optimization, and evaluation using accuracy, precision, recall, and F1 score to maintain an optimal tradeoff between efficiency and performance.
Base (B), Large (L), and Huge (H) Vision Transformer (ViT) models differ in network depth, hidden dimension size, number of self-attention heads, and total parameters. ViT-B (Base) has 12 layers, a hidden dimension of 768, and 86 million parameters, offering a good tradeoff between performance and computational cost and being practical for real-world AI-generated image detection. ViT-L (Large), with 24 layers, a hidden dimension of 1024, and 307M parameters, provides better feature extraction at a higher computational cost. The most resource-intensive variant, ViT-H (Huge), has 32 layers, a hidden dimension of 1280, and 632 million parameters; it was left out because of its high computational demands with no proportional accuracy gains. For this reason, only the Base and Large models are addressed in this study, as they ensure the best balance between accuracy and efficiency, making them feasible for AI-generated image detection in interior design.
The proposed methodology uses deep learning algorithms to detect artificial intelligence (AI) generated images in interior design. The process consists of multiple steps, described in detail below.
The first step is to collect an extensive image dataset. This dataset comprises two main categories:
• AI-Generated Images: interior design pictures produced by AI tools and algorithms.
• Real Images: actual interior designs captured with cameras or professionally curated photographs.
The dataset must be diverse in design styles, lighting conditions, and resolutions so that the model generalizes well to new images.
Raw input images are standardized to make them appropriate as input to the ViT model and to improve performance. Each image is resized to 224 × 224 pixels:

𝐼′ = Resize(𝐼, 224, 224) (1)

where 𝐼 is the original image and 𝐼′ is the resized image.
To prevent overfitting and improve robustness, data augmentation is performed, which includes:
• Random Rotation (±15°), applied to introduce random variability in image orientation, where 15° is the maximum rotational deviation.
• Horizontal Flipping (50% probability), to simulate mirrored interior design perspectives.
• Random Cropping (90% of the original size), which forces the model to pay attention to different image portions.
• Color Jitter (±0.2 on brightness, contrast, and saturation), which simulates variations that might occur through lighting conditions.
Pixel values are normalized to the range [0, 1] or standardized using the mean 𝜇 and standard deviation 𝜎 of the dataset:

𝐼norm = (𝐼′ − 𝜇) / 𝜎 (2)
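For illustration, the preprocessing and augmentation described above can be written as the following minimal sketch using torchvision. The exact implementation is not published, so the transform parameters below simply mirror Eq. (1), Eq. (2), and the augmentation list; the normalization statistics follow the ImageNet mean and standard deviation specified later in Table 3.

```python
# Sketch of the preprocessing and augmentation pipeline (assumes torchvision).
# Values mirror the description above: resize to 224x224, ±15° rotation,
# 50% horizontal flip, ~90% random crop, ±0.2 color jitter, normalization.
import torchvision.transforms as T

IMG_SIZE = 224

train_transform = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),                      # Eq. (1)
    T.RandomRotation(degrees=15),                        # ±15° rotation
    T.RandomHorizontalFlip(p=0.5),                       # mirrored perspectives
    T.RandomResizedCrop(IMG_SIZE, scale=(0.9, 1.0)),     # ~90% random crop
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),                                        # pixels scaled to [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],              # Eq. (2), ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

val_transform = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```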
Images are divided into non-overlapping patches of size 𝑃 × 𝑃 (e.g., 16 × 16 or 32 × 32):

Patch = {𝑝𝑖,𝑗 : 𝑝𝑖,𝑗 ∈ ℝ^(𝑃×𝑃)}, ∀ 𝑖, 𝑗 ∈ [1, 𝑁]

where 𝑁 is the number of patches per dimension, calculated as:

𝑁 = Image Size / Patch Size (3)

For an image of 224 × 224 and a patch size of 16, 𝑁 = 14 (i.e., 14 × 14 = 196 patches). Each patch is flattened into a 1D vector and linearly projected into a 𝐷-dimensional embedding space using a learnable matrix 𝑊𝑒:

𝑧𝑝 = 𝑊𝑒 ⋅ Flatten(𝑝𝑖,𝑗) (4)

where 𝑧𝑝 ∈ ℝ^𝐷 is the embedded representation of a patch.
To encode spatial information, a positional embedding 𝑒pos is added to each patch embedding:

𝑧′𝑝 = 𝑧𝑝 + 𝑒pos (5)

where 𝑒pos is a learnable positional embedding vector.
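As an illustration of Eqs. (3)-(5), the patch extraction, linear projection, and positional embedding can be sketched as follows. This is a simplified stand-in for the embedding layer inside the pre-trained ViT models rather than the exact implementation; the tensor shapes are assumptions for a ViT-B16-style configuration.

```python
# Minimal sketch of patch embedding (Eqs. 3-5), assuming PyTorch.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        self.n_per_dim = img_size // patch_size          # N = 224 / 16 = 14
        n_patches = self.n_per_dim ** 2                  # 14 * 14 = 196 patches
        # Flatten(p_ij) followed by W_e is equivalent to a strided convolution.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_emb = nn.Parameter(torch.zeros(1, n_patches, dim))  # learnable e_pos

    def forward(self, x):                                # x: (B, 3, 224, 224)
        z = self.proj(x)                                 # (B, D, 14, 14)
        z = z.flatten(2).transpose(1, 2)                 # (B, 196, D), Eq. (4)
        return z + self.pos_emb                          # Eq. (5)

# Example: a batch of 2 normalized images produces 196 patch tokens of width 768.
tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])
```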
The sequence of patch embeddings is passed through multiple Transformer encoder layers. Each layer consists of Multi-Head Self-Attention (MHSA), whose attention scores are computed as follows:

Attention(𝑄, 𝐾, 𝑉) = Softmax(𝑄𝐾^T / √𝑑𝑘) 𝑉 (6)

where:
• 𝑄 = 𝑊𝑞 ⋅ 𝑧′𝑝 (query)
• 𝐾 = 𝑊𝑘 ⋅ 𝑧′𝑝 (key)
• 𝑉 = 𝑊𝑣 ⋅ 𝑧′𝑝 (value)
• 𝑊𝑞, 𝑊𝑘, 𝑊𝑣 are learnable weight matrices
• 𝑑𝑘 is the dimensionality of the key.

Multi-head attention is computed as:

MHSA(𝑧′𝑝) = Concat(head1, …, headℎ) 𝑊𝑜 (7)

where 𝑊𝑜 is an output projection matrix.
Feed-Forward Network (FFN): each patch embedding is processed through a two-layer fully connected network with activation:

FFN(𝑧) = ReLU(𝑧𝑊1 + 𝑏1)𝑊2 + 𝑏2 (8)

where 𝑊1, 𝑊2 and 𝑏1, 𝑏2 are learnable parameters.
Residual Connections and Layer Normalization: each block includes skip connections and normalization:

𝑧𝑝^(𝑙+1) = LayerNorm(𝑧𝑝^𝑙 + MHSA(𝑧𝑝^𝑙)) (9)
𝑧𝑝^(𝑙+1) = LayerNorm(𝑧𝑝^𝑙 + FFN(𝑧𝑝^𝑙)) (10)

A unique learnable classification token 𝑧cls is prepended to the patch sequence:

𝑧cls^(𝑙+1) = Transformer(𝑧cls^𝑙, {𝑧𝑝^𝑙}) (11)

where 𝑧cls aggregates global information for classification.
The output of the classification token is passed through a softmax layer to produce probabilities for the two classes (𝑦real, 𝑦AI):

ŷ = Softmax(𝑊𝑐 ⋅ 𝑧cls + 𝑏𝑐) (12)

where 𝑊𝑐 and 𝑏𝑐 are learnable parameters.
The binary cross-entropy loss is:

𝐿 = −(1/𝑁) Σ_(𝑖=1)^𝑁 [𝑦𝑖 log(ŷ𝑖) + (1 − 𝑦𝑖) log(1 − ŷ𝑖)] (13)

where 𝑦𝑖 is the ground-truth label.
The model is trained with the AdamW optimizer; writing the update in its generic gradient-descent form:

θ𝑡+1 = θ𝑡 − η ∇𝐿(θ𝑡) (14)

where θ represents the model parameters, η is the learning rate, and ∇𝐿 is the loss gradient.
To guarantee reproducibility, we provide a detailed breakdown of the hyperparameters and training configurations of our experiments in Table 2. We use AdamW, which is known for its good generalization on Transformer-based architectures; a weight decay of 0.01 helps to prevent overfitting. Beginning with a warm-up over the first five epochs, we apply a cosine annealing schedule to avoid early instability and then gradually decay the learning rate over the rest of training. A batch size of 16 provides memory-efficient yet stable updates. Gradient clipping at a norm of 1.0 ensures numerical stability when training deep ViT models, and the configuration is easy to replicate and adapt in future studies.

Table 2: Training hyperparameters

Parameter | Value
Optimizer | AdamW (decoupled weight decay)
Learning Rate | 5e-5 (decayed using cosine annealing)
Learning Rate Schedule | Cosine annealing with a warm-up for the first five epochs
Batch Size | 16
Weight Decay | 0.01
Dropout Rate | 0.1
Training Epochs | 10
Gradient Clipping | Norm clipped at 1.0
Loss Function | Binary cross-entropy loss
Validation Split | 80% train, 20% validation
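To complement Table 2, the following is a minimal sketch of a fine-tuning loop under these settings (AdamW, five warm-up epochs followed by cosine annealing, gradient clipping at a norm of 1.0). It assumes a PyTorch model with a two-class head and uses cross-entropy as the two-class analogue of the binary cross-entropy in Eq. (13), so it is an illustration rather than the exact training code.

```python
# Sketch of one fine-tuning run with the settings of Table 2 (assumes PyTorch).
import torch
import torch.nn as nn

def fine_tune(model, train_loader, epochs=10, warmup_epochs=5, device="cuda"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()             # two-class analogue of Eq. (13)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
    # Linear warm-up for the first 5 epochs, then cosine annealing (Table 2).
    warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1,
                                               total_iters=warmup_epochs)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                        T_max=epochs - warmup_epochs)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, [warmup, cosine], milestones=[warmup_epochs])

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            # Gradient clipping at a norm of 1.0 for numerical stability.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
        scheduler.step()                           # one scheduler step per epoch
    return model
```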
The results of the proposed method are evaluated using the following metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (15)

Accuracy is a general measure of the overall correctness of the model. However, it is not robust to class imbalance: a model that always predicts "AI-generated" would still score highly if the dataset were skewed. An accuracy above 90% indicates that the model is working reasonably well overall, but it does not by itself show that the model is unbiased toward one class.

Precision = TP / (TP + FP) (16)

Precision measures how many of the detected AI-generated images are actually AI-generated. It is essential in applications where false positives must be minimized, such as incorrectly labelling authentic interior designs as AI-generated. A high precision (>90%) implies the model rarely misclassifies human-created images as AI-generated; a precision below roughly 80% (i.e., a high false-positive rate) would make the model too unreliable for commercial use.

Recall = TP / (TP + FN) (17)

Recall measures how well the model identifies AI-generated images without missing them. It is the key metric for applications where finding all AI-generated content is more important than avoiding false positives. A high recall (>90%) means the model misses few AI-generated images, whereas a low recall (<80%) means the model fails to detect many of them, resulting in many false negatives.

F1-Score = 2 ⋅ (Precision ⋅ Recall) / (Precision + Recall) (18)

The F1 score is a balanced metric that reflects the tradeoff between precision and recall. It is particularly suitable for AI image detection, where both false positives and false negatives should be minimized. A high F1 score (>90%) indicates that the model balances precision and recall well; a low F1 score (<80%) suggests the model is overfitting to one class (i.e., sacrificing precision or recall disproportionately).
Different ViT configurations are used: ViT-B16 (Base model, patch size 16 × 16), ViT-B32 (Base model, patch size 32 × 32), ViT-L16 (Large model, patch size 16 × 16), and ViT-L32 (Large model, patch size 32 × 32). Each configuration affects the balance between computational efficiency and detection accuracy.
Alternative hybrid transformer architectures, such as DeiT (Data-efficient Image Transformer) and the Swin Transformer, were considered but not included in this study, for the following reasons:
• DeiT models are optimized for smaller datasets, and their efficiency relies on knowledge distillation. Although they reduce training costs, they are less suitable for capturing the global dependencies needed for AI image authenticity verification because they rely on CNN-like inductive biases.
• Swin Transformers use hierarchical feature learning with shifted windows, which makes them efficient for object detection applications. Nevertheless, our main objective of global feature extraction is better served by standard ViTs owing to their pure self-attention mechanism.
Consequently, we did not explore hybrid transformers and instead examined the effects of patch size and model capacity on AI-generated image detection.
Figure 2 illustrates the proposed method's ability to classify images as AI-generated or human-created. Using Vision Transformers, visual tokens are used to classify images as either AI-created (T: AI) or human-created (T: Human); the predicted label (P: AI or P: Human) is shown below each classification. The model can distinguish between AI-generated and authentic human-created interior design images in different settings.
Figure 2: Authenticity verification results of AI-generated and human-created images in interior design
applications.
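Before turning to the experimental setup, the following sketch shows how the metrics of Eqs. (15)-(18) can be computed from validation predictions. It relies on scikit-learn's standard implementations and is only an illustration of the evaluation protocol, not the authors' code.

```python
# Sketch of the evaluation metrics in Eqs. (15)-(18), assuming scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """y_true / y_pred: lists of 0 (human-created) and 1 (AI-generated)."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),             # Eq. (15)
        "precision": precision_score(y_true, y_pred),            # Eq. (16)
        "recall":    recall_score(y_true, y_pred),                # Eq. (17)
        "f1":        f1_score(y_true, y_pred),                    # Eq. (18)
        "confusion_matrix": confusion_matrix(y_true, y_pred),     # TP/FP/TN/FN
    }

# Toy example with 8 validation labels; real runs use the 20% validation split.
print(evaluate([0, 0, 1, 1, 1, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1, 0]))
```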
4 Experimental setup

This research study fine-tuned Vision Transformers (ViTs) to classify human-created versus AI-created interior design images. The experiments were conducted with various ViT variants to account for model capacity and different patch sizes. The database of interior design images was compiled to be balanced, and the images were preprocessed to guarantee rigorous training and testing. The dataset of AI-vs-human images is available at https://www.kaggle.com/datasets/shirshaka/ai-vs-human-generated-images. Important values such as the learning rate, batch size, and evaluation criteria were tuned to ensure reliability, as shown in Table 3.
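A minimal sketch of the dataset preparation summarized in Table 3 (at most 1,000 samples per class and an 80/20 train-validation split) is given below; the folder layout and helper names are assumptions, since the exact data-loading code is not described.

```python
# Sketch of balanced sampling (<=1000 per class) and an 80/20 split (assumes
# torchvision); the class-subfolder layout under `root` is hypothetical.
import torch
from torch.utils.data import Subset, random_split
from torchvision.datasets import ImageFolder

def build_datasets(root, per_class_cap=1000, val_fraction=0.2, seed=42):
    full = ImageFolder(root)                     # class 0: human, class 1: AI
    # Keep at most `per_class_cap` samples of each class to balance the data.
    kept, counts = [], {}
    for idx, (_, label) in enumerate(full.samples):
        if counts.get(label, 0) < per_class_cap:
            kept.append(idx)
            counts[label] = counts.get(label, 0) + 1
    balanced = Subset(full, kept)
    # 80% training / 20% validation split with a fixed seed for reproducibility.
    n_val = int(len(balanced) * val_fraction)
    generator = torch.Generator().manual_seed(seed)
    train_set, val_set = random_split(balanced, [len(balanced) - n_val, n_val],
                                      generator=generator)
    # The transforms from the preprocessing sketch would be attached per split.
    return train_set, val_set
```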
Table 3: Overview of the experimental setting, including the model architectures used, details of the dataset and preprocessing, and the training and evaluation parameters applied in classifying human- and AI-generated interior design images.

Aspect | Details
Models | Vision Transformer (ViT) variants: ViT-B16 (Base model, patch size 16), ViT-B32 (Base model, patch size 32), ViT-L16 (Large model, patch size 16), ViT-L32 (Large model, patch size 32)
Pretraining | All models were pre-trained on ImageNet-21k.
Fine-tuning Task | Binary classification: Class 0 – human-generated images; Class 1 – AI-generated images
Dataset | Custom dataset of interior design images categorized as real (human) or fake (AI).
Sample Limitation | At most 1,000 samples per class.
Data Splitting | 80% training, 20% validation split.
Image Processing | Transformation pipeline: resize to 224×224 pixels, convert to tensor, normalize using the ImageNet mean and standard deviation.
Optimizer | Adam
Learning Rate | 5e-5
Batch Size | 16
Epochs | 10
Evaluation Metrics | Accuracy, precision, recall, and F1-score.
Validation Strategy | Evaluation performed after each epoch.

5 Results and analysis

For this task, we evaluate four Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) to distinguish between real interior design images and artificial ones generated by AI. This section presents the validation results and analysis. The models were compared on essential metrics such as loss, accuracy, F1 score, precision, recall, runtime, and computational efficiency, as reported in Table 4 and Figures 3-6. The results quantify the tradeoff between accuracy and efficiency across the model configurations, with smaller patch sizes (16×16) achieving higher accuracy and F1 scores and larger patch sizes (32×32) offering more computational throughput. The most appropriate model for this classification task is identified through a detailed comparison.

Table 4: Validation performance reached by the ViT models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on AI-generated image classification.

Metric | ViT-B16 | ViT-B32 | ViT-L16 | ViT-L32
Accuracy | 96.25% | 80.00% | 96.25% | 81.25%
F1 Score | 0.9625 | 0.8000 | 0.9625 | 0.8118
Precision | 0.9637 | 0.8002 | 0.9637 | 0.8175
Recall | 0.9625 | 0.8000 | 0.9625 | 0.8125
Loss | 0.1154 | 0.4970 | 0.1206 | 0.4469
Runtime (s) | 15.7407 | 15.3469 | 18.4198 | 15.1096
Samples per Second | 10.165 | 10.426 | 8.686 | 10.589
Steps per Second | 0.635 | 0.652 | 0.543 | 0.662
Figure 3: ViT-B16 model validation results over ten epochs, showing a decline in loss and convergence of accuracy, F1 score, precision, and recall around 96% at epoch 8.

Figure 4: Validation metrics of the ViT-B32 model over ten epochs, with the loss converging and accuracy, F1 score, precision, and recall plateauing around 80% by the final epoch.

Figure 5: ViT-L16 model validation metrics over ten epochs, converging quickly within three epochs, with a loss of around 0.12 and accuracy, F1 score, precision, and recall of around 96%.

Figure 6: Validation metrics of the ViT-L32 model over ten epochs, showing the loss declining to 0.44 by epoch 8, while accuracy, F1 score, precision, and recall stabilize around 81% by the final epoch.

The results of four Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) were examined as detectors for determining whether images are AI-generated or traditional interior design. The performance results, consisting of accuracy, F1 score, precision, recall, loss, runtime, and computational efficiency for each model, help identify which configurations are usable. A qualitative analysis follows, based on the results in Table 4 and the validation trends in Figures 3-6.
Models using the smaller 16×16 patch size overwhelmingly outperformed those using the larger 32×32 patch size. Their validation accuracy was 96.25%, with an F1 score of 0.9625, precision of 0.9637, and recall of 0.9625. These results demonstrate that these models can accurately discriminate between AI-generated and authentic images: the smaller patch size preserves finer details from which features can be extracted, allowing more accurate detection of subtle artefacts in AI-generated images.
On the other hand, ViT-B32 and ViT-L32, using the larger 32×32 patches, achieved significantly lower accuracy (80.00% and 81.25%) and F1 scores (0.8000 and 0.8118). These results suggest the models are limited by the coarser granularity, which explains the weaker classification performance of the 32×32 patch size option.
The validation graphs also show interesting differences in how quickly and efficiently each model converges. By the end of epoch 8, ViT-B16 (Figure 3) has steadily reduced its validation loss to 0.1154, and its accuracy, precision, recall, and F1 scores settle at around 96%, showing how robust and efficient its learning is.
As shown in Figure 5, ViT-L16 converges to its validation loss (0.1206) even more quickly, as early as epoch 3; its performance metrics also reach 96% by epoch three, affirming its capability to capture complex patterns in the data in fewer epochs. However, this comes at a higher computational price.
ViT-B32 (Figure 4) and ViT-L32 (Figure 6) take longer to converge, with losses of 0.4970 and 0.4469, respectively. These models reach precision and recall of
around 80-81%, whereas the smaller patch-size models reach their precision and recall plateau earlier.
On the other hand, the small patch-size models (ViT-B16, ViT-L16), although providing higher classification performance, incur higher computational costs. ViT-L16 has a runtime of 18.4198 seconds with the lowest throughput of 8.686 samples per second and 0.543 steps per second, reflecting its high computational complexity. Though less efficient than the 32×32 patch models, ViT-B16 processes 10.165 samples per second at a runtime of 15.7407 seconds, making it a good balance between performance and efficiency. A comparison of ViT-B32 and ViT-L32 shows that ViT-L32 is the most efficient, reaching a throughput of 10.589 samples per second and a runtime of 15.1096 seconds, which makes it the fastest. Nevertheless, the lower F1 scores and reduced accuracy of the 32×32 models make them less appropriate for high-precision tasks.
Further analyses of the precision and recall metrics highlight the tradeoff between models. The precision and recall values of both ViT-B16 and ViT-L16 are in the 96% range, meaning they carry a low risk of false positives and false negatives, making them ideal for tasks demanding high accuracy.
ViT-B32 and ViT-L32, however, have precision and recall values in the 80-81% range; while their consistency across metrics is good, the lower precision implies less reliability in accurately identifying AI-generated images. The validation metric trends provide additional clarity:
• ViT-B16 (Figure 3): with a growing number of epochs, it shows steady improvement and stable performance from epoch 8, an excellent balance between learning capacity and efficiency.
• ViT-L16 (Figure 5): it converges remarkably fast, stabilizing by epoch 3, but at a higher computational cost, making it an attractive solution when fast training is a top priority.
• ViT-B32 (Figure 4) and ViT-L32 (Figure 6): slower learning with limited ability to capture minute differences in the data; both exhibit gradual improvement over ten epochs.
The results reveal the tradeoff between accuracy and computational efficiency. ViT-B16 is the most balanced model, with reasonable throughput, runtime, and accuracy (96.25%). Equally accurate, ViT-L16 is too computationally intensive when accuracy is not the top concern. For tasks that demand higher computational efficiency (i.e., speed), ViT-B32 and ViT-L32 are favourable; however, since their reduced accuracy renders them unsuitable for high-precision use, the entire ViT family may be overkill for some applications. ViT-B16 thus appears to be the best model for detecting AI-generated images in interior design, as its tradeoff between accuracy and computational efficiency is superior. While ViT-L16 has a higher computational cost, its fast convergence and high accuracy make it ideally suited to scenarios seeking the highest precision, with a tradeoff in its computational cost. On the other hand, ViT-B32 and ViT-L32 take the path of efficiency over precision, making them good candidates for real-time applications where speed is more important than classification accuracy. This comprehensive comparison makes clear the importance of choosing the model configuration according to the specific needs of the task.
The Area Under the Curve (AUC) is a standard evaluation measure for classification tasks, summarizing the model's performance across different thresholds in a single value. It provides an overall score of model effectiveness by measuring the tradeoff between the True Positive Rate (sensitivity) and the False Positive Rate. In the context of authenticity verification, we use evaluation accuracy as a proxy for AUC, which allows the performance of the models to be compared directly in Figure 7.

Figure 7: Detecting AI-generated images in interior design applications by comparing AUC among Vision Transformer models (ViT-B16, ViT-L16, ViT-L32, and ViT-B32). ViT-B16 is the strongest, although the other models achieve comparable outcomes.

This study's results show that Vision Transformers (ViTs) outperform conventional CNN-based methods in detecting AI-generated interior design imagery. Comparing the models, the best-performing one, ViT-B16, achieved an accuracy of 96.25% and an F1 score of 0.9625, proving able to distinguish AI-generated images from real ones. While these results are promising, it is necessary to contextualize them by comparing them to prior AI-generated image detection work in other fields, such as medical imaging, digital art, and deepfake detection, as shown in Table 5.

Table 5: Contextual comparison of AI-generated image detection methods

Domain | Best Model | Accuracy | F1 Score | Key Observations
Medical Imaging | ViT-based Histopathology Model | 94.7% | - | ViTs effectively detect synthetic
(Arshed et medical ViT- 580 sec 10.2 GB 80.00% 0.8000
al., 2023) images B32
but ViT- 940 sec 16.8 GB 96.25% 0.9625
struggle L16
with ViT- 810 sec 14.3 GB 81.25% 0.8118
highly L32
high-
resolution The ViT-B16 configuration achieves the best tradeoff
textures. between accuracy and computational efficiency. ViT-L16
gets comparable accuracy but requires much more memory
Digital GAN- 85– - CNNs are
Art Based 92% effective and training time than Quilt. ViT-L16, ViT-B16, ViT-B32,
Authentic CNN but prone and ViT-L32 require less computational load than larger
ation Model to false patch sizes but offer lower accuracy. The results show that
the most practical model for real-world AI-generated
(Vivaldi positives
image detection in interior design is ViT-B16; they are
& Sutedja, due to
accurate and come with reasonable training time and
2024) intricate
artistic memory usage.
patterns. We also performed additional experimental
evaluations, using an imbalanced dataset and noisy inputs,
Deepfake ViT- - 0.95 ViTs
to test our models' robustness. In both tests, real-world
Detection Based excel at
samples are simulated, and ViTs are tested to see their
Deepfake capturing
stability in different data conditions. We had changed the
Detector subtle
class distributions (70% of AI-generated images, 30%
(Zhao et inconsiste
authentic images). ViT-B16 performance dropped slightly
al., 2023) ncies in
(Accuracy: 94.2%, F1 Score: 0.945). The model was
AI-
stable; thus, it was resilient to imbalanced data. We
generated
degraded the inputs using Gaussian noise (σ=0.05) and
human
random occlusions. However, ViT-B16 achieved high
faces.
accuracy (93.5%) while ViT-B32 and ViT-L32 decreased
Interior ViT-B16 96.25 0.96 ViT-B16
below 75%. Self-attention in ViTs helps retain essential
Design % 25 outperfor
features; however, larger patch sizes suffer from losing
(Our ms
fine details in noisy conditions. Inference on challenging
Study) existing
conditions confirms that ViT-B16 is the most robust
methods
model. Further work will be pursued to enhance the model
by
resilience with adversarial training techniques.
preserving
fine-
grained 6 Discussion
textures
Results from the experiment confirm the incredible
and
performance of Vision Transformers (ViTs) in
capturing
distinguishing AI-generated interior design images. For
long-
smaller patch sizes such as ViT-B16 and ViT-L16, we
range
achieve an impressive accuracy of 96.25% in identifying
dependen
subtle artefacts. This makes them an ideal choice for high-
cies.
precision authenticity verification. Similarly,
Table 5 compares training time, memory usage, configurations with larger patch sizes, such as ViT-B32
and model performance to ensure the computational and ViT-L32, optimize for speed at the expense of some
efficiency of different ViT configurations. The analysis accuracy. Real-time applications, or environments with
must identify the most reasonable model for detecting AI- resource constraints, apply generously to these
generated images in interior design concerning configurations. Our findings demonstrate that ViTs can be
computation cost and accuracy. scalable for other creative fields, such as architecture and
visual art. Future work will concentrate on designing
Table 5: Computational efficiency of ViT configurations hybrid architectures for optimal precision and efficiency.
Model Training Memory Accuracy F1 This work has shown that ViTs can be a powerful tool
Time Usage (%) Score for distinguishing AI-generated from human-generated
(per (GB) images in interior design. Its results highlight the promise
epoch, and pain of using them in this way, which can be extended
sec) to many other application areas. Across four ViT
ViT- 720 sec 12.5 GB 96.25% 0.9625 configurations (ViT-B16, ViT-B32, ViT-L16, and ViT-
B16 L32), we summarize the findings regarding the tradeoffs
between model accuracy, computational efficiency, and lighting conditions and, thus, are better suited for more
the nature of data representation. generalized AI detection frameworks.
Using smaller patch sizes (16×16) like ViT-B16 and However, the observed tradeoffs between accuracy and
ViT-L16, the models demonstrate superior performance efficiency indicate that task-specific model selection is
over all the metrics like accuracy, precision, recall, and F1 critical. High-precision applications may benefit from
score and reach values close to 96.25%. That is to say, smaller patch sizes and larger models; conversely,
those models are more capable of discerning the relatively computationally efficient configurations may prove
subtle inconsistencies and artefacts typical of artificial preferable for scenarios where scalability and speed are
images that are indistinguishable from reality in the human paramount in large-scale design database audits.
eye. ViTs display robust ability in this binary classification By demonstrating the effectiveness of ViTs in
problem by extracting detailed spatial and contextual differentiating two sets of images produced by AI in
features. interior design, this study lays the groundwork for
However, the computational demands of ViTs became developing more sophisticated AI authenticity verification
a more significant consideration. ViT-L16 converged algorithms. Through tailored model configurations to
faster (within three epochs) than ViT-B16, which achieved particular use cases, the tradeoffs between accuracy and
high accuracy, but its computation overheads—runtime efficiency can be worked through effectively, enabling
and throughput—make it less practical for resource- general use in the creative domain and further.
constrained environments. On the other hand, ViT-B16 The current AI-generated image detection techniques
also achieved comparable accuracy but with relatively mainly depend on a CNN-based model with local receptive
lower computational costs. Given applications such as fields to extract hierarchical spatial features. While CNNs
interactive design tools or automated verification systems have identified GAN artefacts when such CNNs are
that require real-time processing, the efficiency gains applied to high-resolution photo-realistic synthetic interior
enabled by models like ViT-B32 may be preferable to less design images, traditional, deepfake, or low-quality
precise models, though they would be less accurate. synthetic artefacts are absent from the synthetic images.
The results are essential for real-world deployment in The CNNs cannot find them. On the other hand, ViTs like
interior design and related fields. Integrating high- ViT-B16 use self-attention mechanisms that work across
accuracy models such as ViT-B16 into quality assurance the entire image to find inconsistencies that CNNs would
pipelines can assure the authenticity of design assets to miss. Comparative performance between ViT-B16 and the
verify usage and prevent misrepresentation. Like ViTs, the paper reported in previous literature is presented in Table
versatility of ViTs in processing diverse datasets shows 6.
how ViTs are adaptable to diverse design styles and
Table 6: Comparative performance analysis of ViT-B16 vs CNN-based methods.
Model Architecture Accuracy F1 Key Limitations
Score Strengths
CNN-Based Convolutional 85–92% 0.85– Intense Struggles with
Methods feature 0.91 spatial long-range
extraction feature dependencies,
learning, poor
efficient on generalization to
small-scale high-quality AI-
datasets generated
images
Hybrid CNN for local 89–94% 0.89– Balances Computationally
CNN- features, 0.94 CNN expensive,
Transformer Transformer efficiency complex
for long- with training process
range context Transformer's
self-attention
ViT-B16 Vision 96.25% 0.9625 Captures Requires
(Our Model) Transformer both local significant
with small and global pretraining and
patch size dependencies higher
(16×16) with high computational
accuracy on resources
high-quality
AI images
We also observe that the performance of ViT depends on patch size. Our results show that models with smaller patch sizes, like ViT-B16 and ViT-L16, had significantly better accuracy than models with bigger patch sizes (like
ViT-B32 and ViT-L32). Even for ViT-B32, the accuracy this research attempts to contribute to AI authenticity
dropped to 80.00%, and for ViT-L32, it dropped to verification in interior design using transformer-based
81.25%, indicating that the solutions fell considerably image classification. Future work will consider improving
behind their small patch counterpart. This discrepancy is computational efficiency, enhancing the set of images used
because smaller patches can preserve fine-grained details. in the dataset with more diverse AI-generated photos, and
When an image is tokenized into larger patches, the loss of combining convolutional and transformer-based models.
information can occur due to aggregation of critical spatial Finally, we will investigate adversarial robustness for
information like subtle shading, textural variations, and improving the model's resilience against evolving
delicate contours. The interior design images are of generative techniques. Such advances will further bolster
intricate patterns and highly detailed material textures, for AI image detection, as it is utilized in digital content
which feature extraction is better maintained with small verification.
patch sizes. Furthermore, the self-attention module
receives fewer tokens to process in larger areas, which can References
impact the model learning the distinction between
authentic vs AI-generated images. It sets smaller patch [1] J. Hutson, J. Lively, B. Robertson, P. Cotroneo, and
sizes, leading to denser tokenization, so the ViT model can M. Lang, Creative Convergence: The AI
retain more information and distinguish between the real Renaissance in Art and Design. Springer Nature,
world and AI-generated designs. pp. 1–19, Nov. 2023, doi: 10.1007/978-3-031-
The results show that ViTs outperform CNN-based 45127-0_1
models in detecting AI-generated images; however, [2] D. Saxena and J. Cao, "Generative adversarial
several limitations should be considered. Even though the networks (GANs) challenges, solutions, and future
data is diverse, there could still be latent biases in the directions," ACM Computing Surveys (CSUR), vol.
lighting styles. Through specific aesthetic design 54, no. 3, pp. 1-42, 2021.
preferences, the model may figure out the detection of https://doi.org/10.1145/3446374
style incoherencies rather than actual AI artefacts. [3] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M.
However, future work will have to cross-domain on Shah, "Diffusion models in vision: A survey," IEEE
datasets generated by different AI models (e.g., GANs vs. Transactions on Pattern Analysis and Machine
Diffusion models) to validate their generalization Intelligence, vol. 45, no. 9, pp. 10850-10869, 2023.
properties. However, ViT-B16 reaches high accuracy but https://doi.org/10.1109/tpami.2023.3261988
still consumes ample computational resources (12.5GB [4] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S.
memory for each epoch). The ViT-based detection systems Khan, and M. Shah, "Transformers in vision: A
deployed on edge devices or real-time applications may be survey," ACM computing surveys (CSUR), vol. 54,
performable with model compression techniques like no. 10s, pp. 1-41, 2022.
knowledge distillation or quantization. Potential Evasion https://doi.org/10.1145/3505244
by Advanced AI Models As soon as AI-generated images [5] N. Anantrasirichai, F. Zhang, and D. Bull,
become fancier, detection models must change. The AI "Artificial Intelligence in Creative Industries:
images could be created using adversarial attacks to avoid Advances Prior to 2025," arXiv preprint
detection, and the training process for models would need arXiv:2501.02725, 2025.
to be continuously updated. These limitations provide https://doi.org/10.1007/s10462-021-10039-7
future improvements in AI-generated image detection, [6] M. A. Moharram and D. M. Sundaram, "Land use
which is scalable and adaptive. and land cover classification with hyperspectral
data: A comprehensive review of methods,
7 Conclusion challenges and future directions," Neurocomputing,
vol. 536, pp. 90-113, 2023.
For interior design, this study shows the viability of Vision https://doi.org/10.1016/j.neucom.2023.03.025
Transformers (ViTs) as a method to differentiate AI- [7] K. Han et al., "A survey on vision transformer,"
generated images from human-made designs. We then find IEEE transactions on pattern analysis and machine
a clear tradeoff between accuracy and computational intelligence, vol. 45, no. 1, pp. 87-110, 2022.
efficiency by fine-tuning multiple ViT configurations [8] G. Bansal, A. Nawal, V. Chamola, and N.
(ViT-B16, ViT-B32, ViT-L16, ViT-L32). Classifiers Herencsar, "Revolutionizing visuals: the role of
using smaller patches (patches size: 16×16) performed generative AI in modern image generation," ACM
better, and ViT-B16 achieved 96.25% accuracy and Transactions on Multimedia Computing,
0.9625 (F1 score). The key outcome of these results is that Communications and Applications, vol. 20, no. 11,
delicate feature extraction improves AI image detection, pp. 1-22, 2024.
and ViT-B16 is the most appropriate model for real-world https://doi.org/10.1109/tpami.2022.3152247
applications. On the other hand, with computational [9] A. Kulkarni, A. Shivananda, A. Kulkarni, and D.
benefit, higher patch size models (such as 32×32) do have Gudivada, "Diffusion Model and Generative AI for
worse performance but are better suited for lower precision Images," in Applied Generative AI for Beginners:
applications. Due to our findings regarding the necessity Practical Knowledge on Diffusion Models,
of selecting models according to task requirements and ChatGPT, and Other LLMs: Springer, 2023, pp.
balancing accuracy, efficiency, and resource constraints,
155-177. https://doi.org/10.1007/978-1-4842- transformers," IEEE Access, vol. 11, pp. 123433–
9994-4_8 123444, 2023. doi: 10.1109/access.2023.3329952
[10] S. Bengesi, H. El-Sayed, M. K. Sarker, Y. [22] J. Maurício, I. Domingues, and J. Bernardino,
Houkpati, J. Irungu, and T. Oladunni, "Comparing vision transformers and convolutional
"Advancements in Generative AI: A neural networks for image classification: A
Comprehensive Review of GANs, GPT, literature review," Applied Sciences, vol. 13, no. 9,
Autoencoders, Diffusion Model, and p. 5521, 2023.
Transformers," IEEE Access, vol. 12, pp. 69812– https://doi.org/10.3390/app13095521
69837, 2024, doi: 10.1109/access.2024.3397775 [23] T. Walczyna, D. Jankowski, and Z. Piotrowski,
[11] D. Gragnaniello, D. Cozzolino, F. Marra, G. Poggi, "Enhancing Anomaly Detection Through Latent
and L. Verdoliva, "Are GAN generated images Space Manipulation in Autoencoders: A
easy to detect? A critical analysis of the state-of- Comparative Analysis," Applied Sciences, vol. 15,
the-art," in 2021 IEEE international conference on no. 1, p. 286, 2024.
multimedia and expo (ICME), 2021: IEEE, pp. 1-6. https://doi.org/10.3390/app15010286
https://doi.org/10.1109/icme51207.2021.9428429 [24] D. H. Hagos, R. Battle, and D. B. Rawat, "Recent
[12] T. Arora and R. Soni, "A review of techniques to advances in generative ai and large language
detect the GAN-generated fake images," models: Current status, challenges, and
Generative Adversarial Networks for Image-to- perspectives," IEEE Transactions on Artificial
Image Translation, pp. 125-159, 2021. Intelligence, vol. 5, no. 12, pp. 5873–5893, Dec.
https://doi.org/10.1016/b978-0-12-823519- 2024, doi: 10.1109/tai.2024.3444742.
5.00004-x [25] S. P. J. Christydass, N. Nurhayati, and S.
[13] A. Khan et al., "A survey of the vision transformers Kannadhasan, Hybrid and Advanced Technologies:
and their CNN-transformer based variants," Proceedings of the International Conference on
Artificial Intelligence Review, vol. 56, no. Suppl 3, Hybrid and Advanced Technologies (ICHAT 2024),
pp. 2917-2970, 2023. April 26-28, 2024, Ongole, Andhra Pradesh, India
https://doi.org/10.1007/s10462-023-10595-0 (Volume 2). CRC Press, 2025.
[14] A. Rahali and M. A. Akhloufi, "End-to-end https://doi.org/10.1201/9781003559115
transformer-based models in textual-based NLP," [26] M. M. Meshry, "Neural rendering techniques for
AI, vol. 4, no. 1, pp. 54-110, 2023. photo-realistic image generation and novel view
https://doi.org/10.3390/ai4010004 synthesis," University of Maryland, College Park,
[15] H. Bougueffa et al., "Advances in AI-Generated 2022.
Images and Videos," International Journal of [27] S. Susan and A. Kumar, "The balancing trick:
Interactive Multimedia & Artificial Intelligence, Optimized sampling of imbalanced datasets—A
vol. 9, no. 1, 2024. brief survey of the recent State of the Art,"
https://doi.org/10.9781/ijimai.2024.11.003 Engineering Reports, vol. 3, no. 4, p. e12298, 2021.
[16] A. S. Paladugu, A. Deodeshmukh, A. R. Shekatkar, https://doi.org/10.1002/eng2.12298
I. Kandasamy, and V. WB, "Detection of [28] X. Jiang and Z. Ge, "Data augmentation classifier
Artificially Generated Images Using Shifted for imbalanced fault classification," IEEE
Window Transformer with Explainable Ai," Transactions on Automation Science and
Available at SSRN 5025934. Engineering, vol. 18, no. 3, pp. 1206-1217, 2020.
https://doi.org/10.2139/ssrn.5025934 https://doi.org/10.1109/tase.2020.2998467
[17] L. Yin et al., "Convolution-Transformer for Image [29] O. Rainio, J. Teuho, and R. Klén, "Evaluation
Feature Extraction," CMES-Computer Modeling in metrics and statistical tests for machine learning,"
Engineering & Sciences, vol. 141, no. 1, 2024. Scientific Reports, vol. 14, no. 1, p. 6086, 2024.
https://doi.org/10.32604/cmes.2024.051083 https://doi.org/10.1038/s41598-024-56706-x
[18] H. Tang, D. Liu, and C. Shen, "Data-efficient multi- [30] P. Fergus and C. Chalmers, "Performance
scale fusion vision transformer," Pattern evaluation metrics," in Applied Deep Learning:
Recognition, vol. 161, p. 111305, 2025. Tools, Techniques, and Implementation: Springer,
https://doi.org/10.1016/j.patcog.2024.111305 2022, pp. 115-138. https://doi.org/10.1007/978-3-
[19] W. Zheng, S. Lu, Y. Yang, Z. Yin, and L. Yin, 031-04420-5_5
"Lightweight transformer image feature extraction
network," PeerJ Computer Science, vol. 10, p.
e1755, 2024. https://doi.org/10.7717/peerj-cs.1755
[20] L. Scabini, A. Sacilotti, K. M. Zielinski, L. C.
Ribas, B. De Baets, and O. M. Bruno, "A
Comparative Survey of Vision Transformers for
Feature Extraction in Texture Analysis," arXiv
preprint arXiv:2406.06136, 2024.
[21] D. Konstantinidis, I. Papastratis, K. Dimitropoulos,
and P. Daras, "Multi-manifold attention for vision
https://doi.org/10.31449/inf.v49i16.7839 Informatica 49 (2025) 151–170 151
Efficient Logistics Path Optimization and Scheduling Using Deep
Reinforcement Learning and Convolutional Neural Networks
Yan Yang1, 2, *, Kang Wang1, 2
1School of Economics and Management, Jiaozuo University, Jiaozuo 454000, Henan, China
2Graduate School, University of the East, Manila 0900, Philippines
E-mail: yangyan_edu@outlook.com
*Corresponding author
Keywords: CNN, DRL, logistics path optimization, real-time scheduling, robustness scoring
Received: December 17, 2024
With the rapid development of e-commerce and online shopping, the logistics industry is facing
unprecedented challenges. Traditional logistics path-planning methods, such as SPA, HA, GA, etc.,
struggle to cope with the complex and ever-changing logistics environment. To address this issue, this
study proposes an innovative model that combines Deep reinforcement learning (DRL) with a
Convolutional neural network (CNN) to achieve efficient logistics path optimization. In this research, a
detailed analysis and pre-processing of the public datasets, the City Logistics Dataset (CLDS) and the
Traffic Status Dataset (TSDS), were carried out to construct a model capable of effectively handling
diverse logistics environments. Six baseline methods, namely the classic shortest path algorithm (SPA),
heuristic algorithm (HA), genetic algorithm (GA), rule-based method (RBM), traditional deep
reinforcement learning method (TDRM), and the most advanced deep learning method (ADLM), were
selected for comparison. The experimental results indicate that the proposed model performs excellently
across various environments. For instance, in suburban areas, it achieves a path length of 180 kilometers,
a completion time of 120 minutes, a punctuality rate of 92%, and a dispatch success rate of 95%. In urban
settings, the path length is 200 kilometers, the completion time is 150 minutes, the punctuality rate is 90%,
and the dispatch success rate is 93%. On highways, it reaches a path length of 170 kilometers, a
completion time of 110 minutes, a punctuality rate of 93%, and a dispatch success rate of 95%. Compared
with the baseline methods, the model shows significant improvements in key metrics such as path length,
completion time, punctuality, and dispatch success rate. Additionally, it outperforms them in terms of
computation time and robustness scores, demonstrating great potential for practical applications.
Povzetek: Opisan je izvrni model za optimizacijo logističnih poti in sprotno razporejanje z združitvijo
globokega utrjevalnega učenja (DRL) in konvolucijskih nevronskih mrež (CNN).
1 Introduction intelligence technology, domestic and foreign scholars are
actively exploring the application of AI technology in
With the advancement of global economic integration logistics path optimization, aiming to improve logistics
and the rapid development of e-commerce, the logistics efficiency through intelligent algorithms. Although
industry is facing unprecedented challenges and traditional methods such as linear programming can
opportunities. Efficient, fast and accurate delivery of provide effective solutions, they are powerless in the face
goods has become one of the core elements of corporate of large-scale dynamic problems [3, 4]. In contrast, AI
competition. However, finding the optimal delivery path technologies such as genetic algorithms (GA) and ant
in a complex geographical environment and achieving colony algorithms (ACA) have shown stronger
instant scheduling under dynamically changing conditions exploration capabilities and adaptability, especially in
has always been a difficult problem for logistics solving the traveling salesman problem (TSP) [5]. In
companies. Although traditional mathematical addition, the advancement of deep learning technology,
programming-based methods perform well under static especially the application of long short-term memory
conditions, they have obvious limitations in dealing with networks (LSTM), makes it possible to predict traffic
real-time changing traffic conditions and emergencies [1]. conditions and realize dynamic path planning.
Therefore, it is particularly important to explore a new Reinforcement learning (RL) enables intelligent agents to
logistics path optimization and real-time scheduling make optimal decisions in a constantly changing
solution that can adapt to complex environments and has environment by simulating the learning process. These
self-learning capabilities [2]. technologies have been widely used in multiple scenarios
Logistics path optimization is a core link in logistics such as urban distribution, cross-border logistics, and cold
management and is crucial to improving logistics service chain logistics, helping to optimize delivery routes, predict
quality and reducing operating costs. Against the customs clearance time, monitor temperature changes, etc.
backdrop of the rapid development of artificial [6, 7]. Despite this, the application of AI in logistics path
optimization still faces challenges in data privacy the punctuality rate, which is expected to increase the
protection, algorithm real-time and robustness. With the punctuality rate to more than 95%; at the same time,
advancement of technology and the evolution of social increase the scheduling success rate to 92%. In terms of
needs, more innovative solutions are expected to emerge computing efficiency, the model calculation time will be
in the future, continuously promoting the intelligent controlled within 15 seconds to ensure real-time
development of the logistics industry [8]. performance; and when facing complex environmental
In view of the above background, this study aims to disturbances, the robustness score will be maintained
explore how to use neural network technology to improve above 8.5 points (out of 10 points), comprehensively
the existing logistics path optimization algorithm and improving the comprehensive performance of the logistics
propose a set of real-time scheduling strategies suitable for scheduling system and providing strong technical support
dynamic environments. Specifically, we will first analyze for the intelligent development of the logistics industry.
the main problems and their causes in logistics Deep reinforcement learning (DRL) can continuously
distribution, and then introduce the basic principles of optimize strategies by interacting with the environment to
neural networks and their advantages in solving these cope with real-time changes; convolutional neural
problems. Then, we will design and implement a neural networks (CNN) can extract effective features from
network-based path optimization model that can respond complex geographic and traffic data. The combination of
quickly after receiving real-time data input and adjust the the two allows the model to better perceive real-time
distribution plan. Finally, we will verify the effectiveness information and make reasonable decisions quickly.
of the model through experiments and explore its Therefore, the use of specific artificial intelligence
applicability and limitations in different application methods is the key to solving real-time logistics problems.
scenarios [9, 10]. They can make up for the shortcomings of traditional
On the other hand, traditional mathematical methods and improve the efficiency and flexibility of
programming has extremely high requirements for data logistics scheduling.
integrity and accuracy. In logistics data, there are often The novelty of combining DRL and CNN for logistics
problems such as missing data, errors, or outliers. For path optimization and real-time scheduling lies in the
example, the weight of goods and order time in logistics unique complementary advantages. Traditional methods
distribution information may be deviated due to recording find it difficult to take into account both geospatial feature
errors or equipment failures, and the traffic volume and extraction and dynamic strategy adjustment. In this study,
average speed in traffic status data may also have CNN's powerful spatial feature extraction capability can
measurement errors. Traditional mathematical accurately capture key information in the logistics
programming methods lack effective means to deal with geographical environment, such as distribution of
these incomplete or inaccurate data, and direct use may distribution points, traffic network topology, etc. DRL can
lead to a significant reduction in the reliability of model dynamically adjust strategies based on these features to
results. In addition, when faced with large-scale, high- adapt to the ever-changing logistics environment, such as
dimensional data, the computational complexity of real-time traffic conditions, order changes, etc. Although
traditional mathematical programming methods will many papers have similar combinations, this study focuses
increase dramatically, the solution time will be on complex logistics scenarios, deeply integrates the
significantly longer, and it may even be impossible to advantages of the two, and achieves more efficient and
solve the problem, making it difficult to meet the needs of intelligent path planning and scheduling decisions. This is
real-time logistics scheduling. a unique contribution.
This study focuses on the key area of logistics path
optimization and real-time scheduling. At present, 2 Theoretical basis and literature
traditional logistics scheduling methods have exposed
many shortcomings when dealing with complex and review
changing logistics environments, and it is difficult to meet
the needs of efficient and accurate distribution. Based on 2.1 Basic concepts of logistics path
this, we put forward the core research question: How to optimization
use advanced neural network technology to deeply
Research in the field of logistics continues to develop
innovate the existing logistics path optimization algorithm
and innovate, and many scholars have conducted in-depth
to achieve efficient planning and real-time dynamic
discussions from different angles. Alkan and Kahraman
scheduling of logistics paths?
(2023) used the multi-expert Fermat fuzzy hierarchical
Around this issue, we put forward the following
analysis method in the literature [9] to prioritize the supply
specific hypothesis: The model that innovatively
chain digital transformation strategy, providing a
integrates DRL and CNN can fully tap the advantages of
decision-making basis for the digital development of the
both and effectively deal with complex geographic spatial
logistics supply chain, helping logistics companies to
information and dynamically changing logistics
grasp key strategies and optimize operational processes in
environments. Compared with traditional methods, this
the digital wave. Lee et al. (2019) proposed an
model is expected to shorten the length of logistics
endosymbiotic evolutionary algorithm in the literature
distribution paths by an average of about 20% in various
[10] to solve the problem of the integrated model of
scenarios; significantly shorten the delivery completion
vehicle routing and truck scheduling with a cross-dock
time by an average of 30 minutes; significantly improve
system, providing new ideas and methods for path planning and scheduling in the logistics distribution link, which is of great significance for improving logistics efficiency and reducing costs.
Logistics path optimization refers to finding the best path from the starting point to the end point under a series of constraints, so as to minimize transportation costs, time, or other specified goals. This process usually involves multi-objective optimization, which may include minimizing total mileage, reducing fuel consumption, and shortening delivery time. Logistics path optimization problems can be theoretically classified as combinatorial optimization problems; typical forms include the traveling salesman problem (TSP) and the vehicle routing problem (VRP) [11]. These problems become extremely complex at large scale, and it is difficult to find the global optimal solution. Therefore, researchers have developed a variety of heuristic and meta-heuristic algorithms, such as genetic algorithms, simulated annealing, and ant colony algorithms, to approximate solutions to such problems. These algorithms seek a satisfactory solution rather than an absolute optimum through iterative search [12].
Logistics path optimization is not limited to determining a single path; it also includes issues such as multi-path selection and multi-vehicle scheduling. With the growth of logistics business, how to efficiently allocate resources in a large-scale network has become one of the key challenges. To meet this challenge, researchers have begun to explore new solutions, such as introducing machine learning into path planning and using historical data to predict future transportation demand, thus formulating more reasonable distribution plans in advance. In addition, with the development of Internet of Things (IoT) technology, the large amount of real-time data generated in logistics systems has also provided new possibilities for path optimization [13].
Logistics path optimization refers to finding the best path from the starting point to the destination under a series of constraints, aiming to minimize transportation costs, time, or other specific objectives. Previous descriptions have mostly treated time as a static constraint. However, in real-world logistics scenarios, real-time responsiveness plays a crucial role. The logistics environment is in a state of dynamic change: traffic conditions can change rapidly, for example through sudden traffic accidents or temporary road closures, which can render the originally planned path no longer optimal. Real-time responsiveness is not just a matter of time; it also concerns the timely response to dynamic elements in the logistics environment. For example, by obtaining real-time traffic congestion information through a traffic monitoring system, the path can be adjusted immediately when congestion is detected on a certain route, ensuring transportation efficiency. This ability to adjust dynamically to real-time changes should be an important part of the concept of logistics path optimization, thus forming a closer logical connection with the subsequent real-time scheduling content.

2.2 Overview of the application of neural networks in logistics

As an artificial intelligence technology that imitates the working mode of the biological brain, neural networks have shown great potential in many fields. In the logistics industry, neural networks are widely used in multiple links such as route optimization, demand forecasting, and inventory management. For example, convolutional neural networks (CNN) can extract useful information from large amounts of image data for automatic identification of cargo labels, thereby speeding up sorting. Long short-term memory networks (LSTM) can process time series data, predict future demand fluctuations, and help companies prepare in advance [14, 15]. Reinforcement learning (RL) can dynamically adjust strategies based on historical behaviors and reward signals to optimize the vehicle's delivery path.
In recent years, researchers have also explored how to combine neural networks with other algorithms to solve more complex logistics problems. For example, some researchers combined genetic algorithms with neural networks to form a hybrid model for multi-objective vehicle routing problems; the results showed that this method achieved a good balance between complexity and solution quality. In addition, graph neural networks (GNNs) are used to analyze the topological structure of logistics networks, predict traffic flow by learning the relationships between nodes, and then guide dynamic path planning.
Neural networks are applied in multiple aspects of logistics management, including route optimization, demand forecasting, and inventory management. Route optimization has been introduced above; in demand forecasting and inventory management, neural networks also play significant roles. In demand forecasting, recurrent neural networks (RNNs) or their variant, long short-term memory networks (LSTMs), can be used to analyze historical order data. These networks can capture long-term dependencies in time-series data to predict future order demands at different time intervals. For example, by analyzing sales data from the past year, the order volume during upcoming holidays can be predicted, allowing advance inventory preparation and logistics planning. In inventory management, autoencoders and other neural network structures can be used to detect inventory anomalies: by learning the data features of normal inventory states, an alarm can be issued in a timely manner when inventory levels fluctuate abnormally. In the logistics path optimization use case of this study, CNNs can be used to extract features from geospatial data to help identify the logistics characteristics of different regions; LSTMs can combine time-series traffic data to predict the impact of future traffic conditions on paths; and reinforcement learning can be used to dynamically select the optimal path based on different environmental states, making the applications of these neural networks closely related to the research use case.
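As a concrete illustration of the demand-forecasting use described above, the following is a minimal sketch (not the authors' implementation) of an LSTM that maps a window of past daily order counts to a next-day forecast. The window length, layer sizes, and synthetic training data are assumptions made purely for illustration.

import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    """Minimal LSTM regressor: a window of past order counts -> next-step demand."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                    # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])      # predict from the last hidden state

# Synthetic daily order counts with a weekly pattern plus noise (illustration only).
torch.manual_seed(0)
t = torch.arange(400, dtype=torch.float32)
series = 100 + 20 * torch.sin(2 * torch.pi * t / 7) + torch.randn(400) * 5
window = 28
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

model = DemandLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("final training MSE:", loss.item())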
2.3 Development history of real-time scheduling technology

Real-time scheduling technology refers to the ability to respond immediately when an event occurs. In the field of logistics, real-time scheduling is crucial for dealing with unforeseen situations, such as sudden traffic jams and road closures caused by weather changes. Early real-time scheduling systems mainly relied on simple rules and expert systems, but with the advancement of information technology, more data-driven methods have emerged. For example, technology based on model predictive control (MPC) can optimize operations over a short future horizon to ensure that the system is always in its best operating state [16].
With the enhancement of computing power and the development of big data technology, modern real-time scheduling systems are no longer limited to simple rule matching; they are able to predict future state changes by learning patterns in historical data and adjust scheduling strategies accordingly. For example, Wu et al. [17] used deep reinforcement learning to implement real-time scheduling that can adaptively adjust the paths of mobile robots in dynamic environments, thereby improving the flexibility and responsiveness of the system. In addition, cloud computing and edge computing have made real-time scheduling more feasible because they provide powerful computing resources to process massive data and keep delays to a minimum.
This paper introduces real-time scheduling as a strategy to deal with "unforeseen situations" such as traffic jams. However, real-time scheduling should not exist in isolation; it should be an integral part of the overall logistics path optimization problem. In actual logistics operations, real-time scheduling and path optimization influence and reinforce each other. When encountering unforeseen situations like traffic jams, real-time scheduling needs to be adjusted dynamically on the foundation of path optimization. For example, if the originally planned path cannot reach the destination on time due to a traffic jam, the real-time scheduling system should re-plan the optimal path according to the current traffic conditions and the remaining order information. At the same time, path optimization should also allow for real-time scheduling and reserve a certain degree of flexibility in path planning so that rapid adjustment is possible in emergencies. Model predictive control (MPC) is a method that predicts the future behavior of a system based on a model and optimizes control strategies. In this study, MPC is closely related to real-time scheduling and path optimization: it can use real-time traffic data and the logistics system model to predict traffic conditions and logistics demand changes in the near future, thereby adjusting path planning and scheduling strategies in advance. For example, if MPC predicts that a certain road will experience severe congestion in the next hour, the system can plan a detour route in advance to avoid getting the vehicle stuck in the jam and to improve the efficiency of logistics transportation.

2.4 Review and analysis of related research literature

In recent years, many studies have applied advanced computing technologies to logistics path optimization and real-time scheduling. For example, Ren et al. [18] proposed a hybrid method combining deep reinforcement learning and a genetic algorithm to solve the multi-objective vehicle routing problem. Experiments show that this method can not only effectively handle multiple optimization objectives, but also achieve a good balance between complexity and solution quality. At the same time, Yang et al. [19] used a graph neural network (GNN) to analyze the structure of the urban traffic network and proposed a dynamic path planning framework that can continuously update the optimal path under changing traffic conditions. Studies have shown that this method offers significant improvements in path update speed and path quality compared with traditional algorithms.
Although existing research has made significant progress, some challenges remain. The first is data privacy and security: since a large amount of sensitive information is involved in the logistics system, ensuring the secure transmission and storage of data is an important task. The second is the interpretability of the algorithms: although deep learning models perform well in many cases, they are often black-box models that lack transparency, which limits their application in certain industries (such as healthcare) [20, 21]. The third is the adoption of the technology: although academia has proposed many innovative solutions, there are still relatively few actual deployments in industry, possibly due to factors such as technology maturity and cost-effectiveness.
"Data privacy and security," "algorithm interpretability," and "technology maturity and cost-effectiveness" are challenges that cannot be ignored in research on logistics path optimization. In terms of data privacy and security, the City Logistics Data Set (CLDS) and Traffic State Data Set (TSDS) used in this study contain a large amount of sensitive information, such as customer addresses and order details. To protect data privacy, encryption can be used to secure the data during transmission and storage. At the same time, a strict data access permission management mechanism should be established so that only authorized personnel can access and process the data.
Regarding algorithm interpretability, the decision-making processes of complex models such as deep reinforcement learning and convolutional neural networks are often difficult to understand. To improve interpretability, methods such as feature importance analysis and decision trees can be used to explain the model's decision-making process. For example, through feature importance analysis, it can be understood which input features have the greatest impact on path selection, enabling decision-makers to better understand the basis of the model's decisions.
In terms of technology maturity and cost-effectiveness, a comprehensive evaluation of the adopted technologies is required. Although deep reinforcement
learning and convolutional neural networks have great potential in logistics path optimization, the application of these technologies requires certain computing resources and professional knowledge. Therefore, in practical applications, it is necessary to weigh technology maturity and cost-effectiveness and select the most suitable technology solution. For example, by comparing the computational complexity and performance indicators of different algorithms, an algorithm with high computational efficiency and low cost can be selected. The specific research status is shown in Table 1.
Table 1: Research status
Research Method | Key Indicators | SOTA Positioning
SPA | Path length, completion time, on-time rate, scheduling success rate | Fast in finding the shortest path in simple static scenarios
HA | Path length, completion time, on-time rate, scheduling success rate | Quick in finding approximate solutions for large-scale problems
GA | Path length, completion time, on-time rate, scheduling success rate | Outstanding in solving complex optimization problems
RBM | Path length, completion time, on-time rate, scheduling success rate | Fast decision-making in known simple environments
TDRM | Path length, completion time, on-time rate, scheduling success rate | Advantageous in handling dynamic environments
ADLM | Path length, completion time, on-time rate, scheduling success rate | Currently leading in the application of deep learning
This Study (DRL + CNN) | Path length, completion time, on-time rate, scheduling success rate | Surpassing existing methods in multiple indicators
3 Research methods and model construction

3.1 Data collection and preprocessing

Data collection and preprocessing are key steps to ensure the smooth progress of the subsequent modeling work. This study mainly relies on public datasets for experimental verification and model training. The reason for choosing public datasets is that they provide a wide range of data sources, cover different types of real scenarios, and help improve the generalization ability of the model. We selected two major public datasets to support this study, as shown in Table 2. (1) City Logistics Data Set (CLDS). This dataset contains logistics distribution information from multiple European cities, including the time, location, cargo type, and weight of each order. These data reflect actual urban logistics operations and are very suitable for training and testing our models. (2) Traffic State Data Set (TSDS). This dataset provides information on the status of urban traffic in different time periods, including traffic flow, average speed, and road congestion. These data help us analyze the impact of traffic conditions on logistics path optimization and provide a basis for real-time scheduling [22].
The selection of the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS) is based on multiple considerations. The CLDS dataset covers detailed information on urban logistics distribution, including the geographical locations of distribution points, order times, and quantities. Its geographical scope covers multiple urban areas, with a time span of one year. The dataset contains 1 million records and 20 columns of different attribute information. These data reflect the actual situation of urban logistics well and provide rich order and geographical location information for logistics path optimization. The TSDS dataset focuses on traffic status data, including traffic flow and vehicle speeds on different roads at different time intervals. Its time granularity is 15 minutes, and its geographical scope matches that of the CLDS dataset. The dataset has 800,000 records and 15 columns of attributes. It reflects real-time changes in traffic conditions, which is crucial for real-time path optimization. The characteristics of these two datasets are highly relevant to the model requirements of this study: the model needs to plan paths based on order information and geographical locations, for which the CLDS dataset provides the necessary basic data; at the same time, the model needs to consider the impact of real-time traffic conditions on paths, for which the TSDS dataset provides real-time traffic data support. By combining these two datasets, a model that better conforms to the actual logistics environment can be constructed, improving the accuracy and real-time performance of path optimization.
Table 2: Dataset information
Dataset name | Data Types | Geographical range | Time Range | Sample size | Key Features
City Logistics Data Set (CLDS) | Logistics and delivery information | Many cities in Europe | January 2018 to December 2019 | 100,000+ | Order time, location, cargo type, weight, etc.
Traffic State Data Set (TSDS) | Traffic status information | North American major cities | January 2019 to December 2020 | 50,000+ | Traffic volume, average speed, road congestion, etc.
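The cleaning, merging, and feature-engineering steps applied to these two datasets are described in the following paragraph; the sketch below illustrates what such a pipeline could look like in pandas. File names, column names (order_time, lat, lng, weight, quantity, slot_start, speed, flow), and the depot coordinates are assumptions made for illustration and are not the actual CLDS/TSDS schemas.

import numpy as np
import pandas as pd

# Hypothetical file names and columns; the real CLDS / TSDS schemas may differ.
orders = pd.read_csv("clds_orders.csv", parse_dates=["order_time"])    # lat, lng, weight, quantity, ...
traffic = pd.read_csv("tsds_traffic.csv", parse_dates=["slot_start"])  # speed and flow per 15-minute slot

# Cleaning: remove duplicates, drop 3-sigma outliers on cargo weight, interpolate gaps.
orders = orders.drop_duplicates()
w = orders["weight"]
orders = orders[(w - w.mean()).abs() <= 3 * w.std()].copy()
num_cols = orders.select_dtypes(include="number").columns
orders[num_cols] = orders[num_cols].interpolate()

# Feature engineering on the raw values: day of week and distance to an assumed depot.
def haversine_km(lat1, lng1, lat2, lng2):
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

orders["day_of_week"] = orders["order_time"].dt.dayofweek
orders["dist_to_depot_km"] = haversine_km(orders["lat"], orders["lng"], 52.52, 13.40)

# Standardize selected numeric features to zero mean and unit variance.
scale_cols = ["weight", "quantity", "dist_to_depot_km"]
orders[scale_cols] = (orders[scale_cols] - orders[scale_cols].mean()) / orders[scale_cols].std()

# Attach the city-wide traffic state of the 15-minute slot in which each order was placed.
orders["slot_start"] = orders["order_time"].dt.floor("15min")
slot_traffic = traffic.groupby("slot_start", as_index=False)[["speed", "flow"]].mean()
dataset = orders.merge(slot_traffic, on="slot_start", how="left")
print(dataset.head())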
Data processing is a key step to ensure the reliability of the subsequent model training and experimental results. The datasets used in this study are the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS), which cover logistics distribution information and urban traffic status, respectively. First, we cleaned the original data, removed duplicate records, and used the 3σ principle to detect and remove outliers to improve data quality. For missing values, we used interpolation to fill the gaps and prevent incomplete data from affecting model performance. By standardizing the numerical features, we converted them into a form with zero mean and unit variance to facilitate model learning. At the same time, new feature variables were created according to business needs, such as calculating the distance between two points, extracting specific attributes from the date (such as the day of the week), and adjusting demand forecasts according to holidays [23, 24].

3.2 Neural network model

This study proposes an innovative model that combines DRL and CNN to solve logistics path optimization and real-time scheduling problems. The model uses the powerful feature extraction capability of CNN to process spatial data and dynamically adjusts the strategy through DRL to cope with the ever-changing logistics environment. The specific design of the model and its mathematical expression follow.
The model inputs {lat, lng}, {order}, and {traffic} have clear meanings and interrelationships. {lat, lng} represents the geographical coordinates of distribution points and vehicles. These coordinates are the basis for path planning, determining the position and moving direction of vehicles in geographical space. {order} contains detailed order information, such as the quantity of orders, delivery times, and delivery locations. Order information is the goal of path optimization, and the model needs to plan the optimal path according to the order requirements to ensure timely and accurate delivery. {traffic} represents real-time traffic conditions, including road congestion levels and vehicle speeds. Traffic conditions are important factors affecting path selection; real-time traffic data can help the model dynamically adjust the path to avoid congested roads and improve transportation efficiency.
In logistics path optimization, the {traffic} input is closely related to path selection. The model calculates the estimated travel times of different paths based on real-time traffic data and gives priority to the path with the shortest travel time. For example, when there is a traffic jam on a certain road, the model will automatically avoid that road and select other, relatively unobstructed routes. At the same time, the model also considers the changing trend of traffic conditions and plans the path in advance to cope with possible traffic jams. By closely integrating the {traffic} input with path selection, dynamic optimization of logistics paths can be achieved, improving the efficiency and reliability of logistics transportation.
The model aims to solve the path optimization and real-time scheduling problems in logistics distribution, and it realizes efficient logistics distribution management by integrating CNN's ability to extract spatial features and DRL's ability to learn dynamic strategies. The model's input includes geographic location information, order information, time information, and traffic conditions, and the output is a series of action instructions that tell the logistics system how to optimally dispatch vehicles.
The specific pseudocode is as follows.

# CNN forward pass
def cnn_forward(x):
    x = conv_layer(x, filters=32, kernel_size=(3, 3))
    x = relu(x)
    x = pool_layer(x, pool_size=(2, 2))
    return flatten(x)

# DRL Q-network forward pass
def q_network_forward(state):
    x = fc_layer(state, units=256)
    x = relu(x)
    x = fc_layer(x, units=256)
    x = relu(x)
    return fc_layer(x, units=action_size)

# DRL training step
def drl_train():
    state = get_state()
    action = choose_action(state)
    next_state, reward, done = take_action(action)
    update_q_network(state, action, reward, next_state, done)
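The pseudocode above can be made concrete. The following PyTorch sketch is an illustrative implementation, not the authors' exact code: the grid resolution, input channel count, and number of candidate actions are assumptions, while the layer sizes and training hyperparameters (32 and 64 3x3 kernels, 2x2 max-pooling, 256-unit fully connected layers, learning rate 0.001, discount factor 0.99, epsilon from 1.0 to 0.01 with decay 0.995, batch size 64) follow the values stated in Sections 3.3 and 4.1.

import random
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F

GRID = 32          # assumed spatial resolution of the rasterized input
CHANNELS = 4       # assumed channels: delivery points, orders, time, traffic
N_ACTIONS = 10     # assumed number of candidate next delivery points

class CnnQNet(nn.Module):
    """CNN feature extractor (32 and 64 3x3 kernels, 2x2 max-pooling) followed by a 256-unit Q-head."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(CHANNELS, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * (GRID // 4) * (GRID // 4), 256)
        self.fc2 = nn.Linear(256, 256)
        self.out = nn.Linear(256, N_ACTIONS)

    def forward(self, x):                        # x: (batch, CHANNELS, GRID, GRID)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)                       # one Q-value per candidate action

q_net, target_net = CnnQNet(), CnnQNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                    # filled by the environment loop (take_action above)
GAMMA, BATCH, EPS_MIN, EPS_DECAY = 0.99, 64, 0.01, 0.995
epsilon = 1.0

def choose_action(state):
    """Epsilon-greedy selection: a_t = argmax_a Q(s_t, a) with probability 1 - epsilon."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1))

def train_step():
    """One Q-learning update (Equation (3)) on a random minibatch from the replay buffer."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.stack([b[3] for b in batch])
    d = torch.tensor([b[4] for b in batch], dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - d)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # After each environment step one would also decay epsilon towards EPS_MIN by EPS_DECAY
    # and periodically copy q_net's weights into target_net (every 100 steps, per Section 4.1).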
3.2.1 Model overview

The input of the model includes the location coordinates of the distribution points $\{(lat_i, lng_i)\}_{i=1}^{N}$, the order information of each distribution point $\{order_i\}_{i=1}^{N}$, the current time $t$, and the current traffic conditions $\{traffic_i\}_{i=1}^{N}$. The geographic location information covers the location coordinates of the distribution points, expressed as $\{(lat_i, lng_i)\}_{i=1}^{N}$, where $N$ is the number of distribution points and each coordinate pair $(lat_i, lng_i)$ represents the latitude and longitude of a distribution point. The order information includes the order details of each distribution point, such as the order quantity, cargo type, and estimated arrival time [25]; it can be expressed as $\{order_i\}_{i=1}^{N}$, where each $order_i$ contains all the order information related to the $i$-th distribution point. The time information includes the current time $t$ and the estimated arrival times, which are crucial for dynamically adjusting the path and the schedule. The traffic condition information reflects the current traffic conditions, such as road congestion, and can be expressed as $\{traffic_i\}_{i=1}^{N}$, where each $traffic_i$ describes the traffic conditions on the $i$-th road. Together, these input data constitute the input state of the model, which is used to dynamically adjust the logistics path and the real-time scheduling strategy.
The output of the model is a set of action instructions that tell the logistics system how to optimally dispatch vehicles. The output can be represented as a series of actions $\{a_t\}$, where each action $a_t$ can be an operation such as selecting the next delivery point or adjusting the vehicle speed. Specifically, these action instructions guide the logistics system to make the best decision based on the current state so as to minimize cost, time, or other optimization goals. An action $a_t$ can be to select the next delivery point that the current vehicle should go to, so as to ensure the shortest path or the least time required [26, 27]. As shown in Figure 1, the model uses a feature extraction module to process multiple sources of information, such as order information, location coordinates, time information, and traffic conditions. The feature extraction module includes convolutional layers and pooling layers to extract useful information. Next, the model takes the state representation as input and produces action instructions through action selection and execution. The model update process continuously optimizes the decisions, thereby improving delivery efficiency.
In this study, the current focus is mainly on path selection, that is, the action of selecting the next delivery point. Selecting the next delivery point is the core task of logistics path optimization and directly affects transportation costs and time. By optimizing the selection of delivery points, the driving mileage and time of vehicles can be reduced, improving logistics efficiency.
Although the current research focuses on path selection, adjusting the vehicle speed is also an important factor in logistics optimization. In real-world logistics scenarios, vehicle speed adjustment can be based on factors such as traffic conditions and delivery time requirements. For example, when encountering a traffic jam, appropriately reducing the vehicle speed can avoid frequent starting and stopping and reduce fuel consumption, while on a smooth road section, increasing the vehicle speed can shorten the transportation time. In future research, the collaborative optimization of vehicle speed adjustment and path selection will be explored further to achieve more efficient logistics transportation. For example, a comprehensive optimization model can be established that considers both path selection and vehicle speed adjustment, with the goal of minimizing transportation costs and time, to formulate the optimal logistics strategy.
[Figure 1 depicts the model framework: the location coordinates of the delivery points, order information, time information, and traffic conditions feed a feature extraction module built from convolutional and pooling layers; the resulting state representation drives action selection, action execution, and model update, which together produce the action instructions.]
Figure 1: Model framework
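The framework in Figure 1 does not fix how the heterogeneous inputs are turned into a CNN-compatible tensor. One plausible encoding, shown purely as an illustration under assumed coordinate ranges and grid size, rasterizes delivery points, pending order quantities, and road congestion onto a fixed grid and adds the current time of day as a constant channel.

import numpy as np

GRID = 32  # assumed grid resolution over the service area

def rasterize_state(points, order_qty, congestion, t_frac,
                    lat_range=(52.3, 52.7), lng_range=(13.2, 13.6)):
    """Encode delivery points, order quantities, congestion and time-of-day as a (4, GRID, GRID) array."""
    state = np.zeros((4, GRID, GRID), dtype=np.float32)

    def cell(lat, lng):
        r = int((lat - lat_range[0]) / (lat_range[1] - lat_range[0]) * (GRID - 1))
        c = int((lng - lng_range[0]) / (lng_range[1] - lng_range[0]) * (GRID - 1))
        return min(max(r, 0), GRID - 1), min(max(c, 0), GRID - 1)

    for (lat, lng), qty in zip(points, order_qty):
        r, c = cell(lat, lng)
        state[0, r, c] = 1.0                          # channel 0: delivery-point occupancy
        state[1, r, c] += qty                          # channel 1: pending order quantity
    for (lat, lng), level in congestion:
        r, c = cell(lat, lng)
        state[2, r, c] = max(state[2, r, c], level)    # channel 2: congestion level in [0, 1]
    state[3, :, :] = t_frac                            # channel 3: time of day as a fraction of 24 h
    return state

# Tiny usage example with made-up coordinates.
pts = [(52.45, 13.35), (52.55, 13.50)]
s = rasterize_state(pts, order_qty=[3, 1], congestion=[((52.50, 13.40), 0.8)], t_frac=0.375)
print(s.shape)  # (4, 32, 32)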
3.2.2 Feature extraction

In the logistics environment, the spatial features extracted by the CNN have a significant impact. If there are dense distribution points in a certain area, a centralized distribution route can be planned to reduce costs. The road connectivity feature can help avoid dead-end roads and choose efficient routes. These specific spatial features are directly related to route selection: since logistics must be carried out efficiently and at low cost, spatial features provide a key basis for route planning [28].
Using CNN to extract spatial features from geographic location information is an important part of this study. Specifically, we use convolutional layers and pooling layers to capture local and global features on the map. The convolutional layer extracts spatial features of different scales by applying multiple convolution kernels.

3.3 Path optimization algorithm design

The path optimization algorithm designed in this study aims to achieve efficient path optimization in logistics distribution by combining the spatial feature extraction capability of convolutional neural networks (CNN) with the dynamic strategy learning capability of DRL. The algorithm uses the powerful feature extraction capability of CNN to capture the spatial relationships between distribution points and learns the optimal path selection strategy through DRL. The core of the algorithm design is how to select the optimal path based on the current state of the logistics environment. The algorithm is implemented through the following steps [29, 30].
The features extracted by the CNN include geographical layout features, such as road direction and delivery point location, and traffic condition features, such as the distribution of congested sections. These features are closely related to logistics decisions: geographic layout features determine the basic path framework, and traffic condition features affect real-time path adjustments. Combining these features yields a better logistics distribution plan.
In logistics optimization, the "state-action pair" has a clear meaning. The state includes order information, vehicle location, traffic conditions, and so on; an action refers to selecting the next delivery point, changing the driving speed, and so on. The Q network outputs the action values based on the current state and selects the action with the maximum value, such as choosing a detour when traffic is congested, in order to optimize cost, time, and other goals.
(1) State representation: Use CNN to extract spatial features from geographic location information and form a representation of the current state $s_t$. The state representation $s_t$ describes the current logistics
environment configuration, including but not limited to vehicle location, cargo status, and time information. The state representation can be expressed as Equation (1).

$s_t = \mathrm{CNN}(x_t)$    (1)

Here, $x_t$ is the input data at the current time point, including the location coordinates of the delivery points, order information, current time, and traffic conditions.
(2) Action selection: Based on the current state $s_t$, use the Q network in DRL to estimate the value of each state-action pair $Q(s_t, a_t)$ and select the optimal action $a_t$. Action selection is determined by the current state and the output of the Q network: the action that maximizes the Q value given the current state $s_t$ is selected, as expressed in Equation (2).

$a_t = \arg\max_a Q(s_t, a)$    (2)

(3) Execute action: Execute the selected action $a_t$, update the environment state to $s_{t+1}$, and obtain an immediate reward $r_t$. The immediate reward $r_t$ reflects the direct effect of executing the action, such as whether the goods are delivered successfully and whether the driving time is reduced.
(4) Model update: Use the Q-learning update rule to update the Q values in the Q network so that they approach the optimal strategy, as expressed in Equation (3).

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$    (3)

Here, $\alpha$ is the learning rate and $\gamma$ is the discount factor ($0 < \gamma < 1$), which indicates how strongly future rewards are discounted. This update rule adjusts the Q value of the current state $s_t$ and action $a_t$ using the current immediate reward $r_t$ and the maximum Q value of the future state $s_{t+1}$.
For the CNN architecture, the input layer receives geospatial information from the logistics environment, such as the location coordinates of the distribution points and the topological structure of the transportation network. Next is the convolution stage. This study initially set two convolution layers, with 32 and 64 convolution kernels, respectively; the kernel size is (3, 3) with a stride of 1. The convolution layers extract local features of the input data through convolution operations: each convolution kernel slides over the input data, multiplies and sums elements, and generates feature maps. Next is the pooling layer, which uses maximum pooling with a pooling size of (2, 2). Its function is to downsample the feature maps, reduce the amount of data, and retain important features. During training, the back-propagation algorithm is used to update the weight parameters of the convolution kernels to minimize the loss function.
For the DRL architecture, the core is the Q network. Its input layer receives the features extracted by the CNN together with the current logistics status information. The Q network contains a fully connected layer with 256 neurons and ReLU as the activation function, which introduces nonlinearity and enhances the network's expressiveness. The output layer outputs the Q value of each possible action. The training strategy uses an experience replay mechanism that stores the agent's experience (state, action, reward, next state) in a replay pool and randomly samples from it for training to reduce data correlation. The learning rate is initially set to 0.001, with decay considered. The discount factor is 0.99 to balance immediate and future rewards. The initial exploration rate is 1.0, the minimum is 0.01, and the decay rate is 0.995: random exploration is performed with a higher probability at the beginning of training, and the exploration rate is gradually reduced as training progresses. With this architecture design and training strategy, the model is expected to achieve good results in logistics path optimization and real-time scheduling.

4 Experimental evaluation

4.1 Experimental design

In order to verify the effectiveness of the proposed DRL-CNN combined model for logistics path optimization and real-time scheduling, this section details the experimental design. To ensure repeatability, we selected the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS) as public datasets. Dataset links can be sought on public data platforms such as Kaggle (https://www.kaggle.com/), Data.gov (https://www.data.gov/), and Zenodo (https://zenodo.org/), on academic resource websites such as IEEE DataPort (https://ieee-dataport.org/) and the ACM Digital Library (https://dl.acm.org/), or on the official websites of relevant universities and research institutions. The data was divided into training (70%), validation (15%), and test (15%) sets.
Six baseline methods (SPA, HA, GA, RBM, TDRM, ADLM) were chosen for comparison; each has distinct features and application scenarios. SPA offers the theoretical shortest path but struggles with dynamic logistics. HA quickly finds approximate solutions for large-scale problems, GA suits complex optimizations but has a high computational cost, RBM works for simple tasks in known environments, TDRM has limitations in feature extraction compared to the proposed model, and ADLM may be less effective in specific scenarios.
The evaluation dataset comes from real-world logistics and contains historical order data, location information, and traffic conditions. Evaluation indicators include path
length, completion time, punctuality, and scheduling success rate.
For the hyperparameters, we considered the CNN and DRL characteristics. The CNN had 32 and 64 convolution kernels of size (3, 3) with stride 1, and (2, 2) max-pooling. The DRL Q-network had 256 neurons in the fully connected layer with ReLU activation. The learning rate was 0.001 with decay, and the discount factor was 0.99. The exploration rate started at 1.0, with a minimum of 0.01 and a decay rate of 0.995; the batch size was 64, and the target network was updated every 100 steps. Grid search, random search, and Bayesian optimization were used to find the best hyperparameters. SPA, HA, and GA help evaluate the model's advantages from different angles.
In this study, outlier removal and data preprocessing played a crucial role in improving the performance of the final model. After obtaining the public datasets, the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS), we found some outliers in the data, which may be caused by data entry errors, sensor failures, or special events. If not processed, they have a negative impact on model training and prediction, causing the model to learn incorrect features and patterns and thereby reducing the accuracy and stability of the model.
To this end, we used a statistical-analysis-based method to remove outliers. For example, for numerical data, we calculated the mean and standard deviation and treated data points that deviated from the mean by more than a certain multiple of the standard deviation as outliers, removing them. In this way, we ensured the quality and consistency of the data, allowing the model to learn from more reliable data.
In terms of data preprocessing, we performed operations such as data cleaning, feature scaling, and encoding. During data cleaning, we handled missing values using methods such as mean filling and median filling to ensure the integrity of the data. Feature scaling normalizes or standardizes features of different ranges and scales so that all features carry the same importance in model training, preventing some features from dominating the training process because of their large numerical range. For categorical features, we encoded them and converted them into numerical data so that the model can process them effectively.
Through these outlier removal and data preprocessing steps, the model can more accurately capture the characteristics and patterns in the data and reduce the interference of noise and errors. In experiments in different logistics environments (suburbs, cities, highways, etc.), the processed data enabled the model to achieve better performance on indicators such as path length, completion time, punctuality, and scheduling success rate, while also improving the robustness and generalization ability of the model, providing more reliable support for logistics path optimization and real-time scheduling.

4.2 Experimental results

We tested in suburban environments, urban environments, highways, and other environments, aiming to comprehensively evaluate the performance differences of the different logistics scheduling methods across environments.
Path length, completion time, punctuality, and scheduling success rate are closely related to logistics path optimization. Short paths, fast completion, high punctuality, and a high scheduling success rate are the goals of logistics. "Success" means completing order delivery on time and as required. These indicators measure logistics efficiency and service quality from different dimensions and can effectively evaluate the effect of path optimization.
"Scale" in the experiments can refer to the number of orders, the size of the geographical area, and so on. More orders or larger geographical areas increase the complexity and uncertainty of path planning. For example, more orders may require more vehicles to be deployed, and a large geographical area may involve more varied traffic conditions. Clarifying the concept of scale helps in understanding its impact on the experimental settings and results.
Figure 2: Scheduling efficiency at different scales
Figure 2 shows the scheduling efficiency of the different scheduling methods at various scales. It can be seen that the scheduling efficiency of all methods gradually decreases as the scale increases; our method decreases the slowest. The scheduling efficiency of "SPA", "HA", "GA", "RBM", "TDRM", "ADLM" and "Proposed" all show a downward trend to varying degrees. Although the rate of decline differs between methods, their scheduling efficiency remains at a relatively high level at large scales. This shows that these methods have a certain adaptability and stability when dealing with larger-scale tasks. However, it should be noted that scheduling efficiency continues to decrease as the scale grows, which means that challenges and limitations may arise when facing larger-scale problems. Therefore, in practical applications, these methods should be further optimized to improve their performance at large task scales.
Figure 3: Robustness changes at different scales
Figure 3 shows the robustness of the different methods at different scales. It can be seen that as the scale increases, the robustness of all methods decreases, but the performance of the "Proposed" method is significantly better than that of the other methods, showing higher stability and robustness. Even at larger scales, "Proposed" can still maintain high performance, reflecting its superiority in coping with complex environmental changes.
In Figure 3, robustness is measured by taking multiple factors into account. Specifically, we define robustness as the ability of the model to maintain efficient and stable path planning and scheduling in logistics environments of different scales and under dynamic changes. To quantify this ability, we use a comprehensive evaluation based on a series of key indicators. First, the fluctuation range of the path length, completion time, on-time rate, and scheduling success rate of each method at different scales is calculated. The smaller the fluctuation range, the more stable the method is in the face of environmental changes, and the higher its robustness.
The "value" indicator is the weighted sum of the above key indicators, with weights determined by the importance of each indicator in actual logistics operations. For example, the on-time rate and scheduling success rate are more critical in actual logistics services, so they are given higher weights, while path length and completion time are relatively less important and have slightly lower weights.
In the legend of Figure 3, "Proposed" represents the method proposed in this study that combines DRL with CNN, and "SPA" and the other labels represent the baseline methods used for comparison. The "scale" on the x-axis represents the scale of the experiment, which can be the number of orders, the extent of the geographical area, or the time span. As these scale factors increase, the complexity and uncertainty of the logistics environment increase accordingly. By observing the robustness of the different methods at different scales, we can intuitively compare their ability to cope with complex environmental changes.
For traditional algorithms, computing time refers to the time from the start of the algorithm to finding the best solution. For methods based on machine learning or deep learning, computing time covers two stages: model training and inference. Training time is the time the model takes to learn its parameters on the dataset, and inference time is the time it takes to use the trained model to obtain the path planning result. This measurement can comprehensively evaluate the efficiency of each method.
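To make the weighted "value" described above concrete, the sketch below normalizes each indicator, applies illustrative weights (higher for on-time rate and scheduling success rate, as stated), and rewards both a high level and low fluctuation across scales. The weights, normalization bounds, and example numbers are assumptions for illustration, not the exact scheme used in the study.

import numpy as np

# Illustrative weights: on-time rate and scheduling success rate weighted more heavily.
WEIGHTS = {"path_length": 0.15, "completion_time": 0.15, "on_time_rate": 0.35, "success_rate": 0.35}

def composite_value(path_km, time_min, on_time_pct, success_pct, max_km=300.0, max_min=240.0):
    """Weighted score in [0, 1]: shorter paths/times and higher rates give a larger value."""
    parts = {
        "path_length": 1.0 - min(path_km / max_km, 1.0),
        "completion_time": 1.0 - min(time_min / max_min, 1.0),
        "on_time_rate": on_time_pct / 100.0,
        "success_rate": success_pct / 100.0,
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)

def robustness(values_at_scales):
    """Smaller fluctuation of the composite value across scales -> higher robustness."""
    v = np.asarray(values_at_scales, dtype=float)
    return float(v.mean() - v.std())   # one simple way to reward both level and stability

# Example with made-up indicator values at three increasing problem scales.
scores = [composite_value(180, 120, 92, 95), composite_value(200, 140, 90, 93), composite_value(220, 160, 88, 90)]
print(round(robustness(scores), 3))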
Table 3: Performance comparison of different methods in suburban environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 240 | 170 | 78 | 83 | 10 | 6
HA | 220 | 160 | 82 | 86 | 15 | 7
GA | 210 | 150 | 88 | 91 | 25 | 8
RBM | 250 | 180 | 75 | 80 | 5 | 5
TDRM | 200 | 140 | 86 | 90 | 20 | 7
ADLM | 190 | 130 | 90 | 93 | 18 | 8
Proposed | 180 | 120 | 92 | 95 | 12 | 9
In suburban environments, the main challenges for logistics scheduling are long delivery distances and relatively little traffic interference. As can be seen from Table 3, the proposed method outperforms the other baseline methods on almost all indicators.
To characterize the "suburban" environment more precisely than in Table 2: suburban environments have relatively few and more dispersed nodes, usually around 10-20 nodes. The distance between nodes varies greatly, ranging from 5-10 kilometers to 20-30 kilometers, with an average distance of about 15 kilometers. In terms of road conditions, the main roads are relatively wide but unevenly maintained, and some branch roads are narrow and in poor condition. Traffic flow is generally small, although peak hours may see increases due to activity in surrounding towns. Similarly, for the other environments, such as cities, highways, and multi-point distribution, the node characteristics, distance indicators, and road and traffic conditions should likewise be specified to enhance the interpretability of the results.
Table 4: Performance comparison of different methods in urban environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 260 | 200 | 75 | 80 | 10 | 6
HA | 240 | 190 | 78 | 82 | 15 | 7
GA | 230 | 180 | 82 | 85 | 25 | 8
RBM | 270 | 210 | 70 | 75 | 5 | 5
TDRM | 220 | 170 | 84 | 87 | 20 | 7
ADLM | 210 | 160 | 88 | 90 | 18 | 8
Proposed | 200 | 150 | 90 | 93 | 12 | 9
The urban environment is characterized by dense buildings and complex transportation networks, which place higher requirements on logistics scheduling. Table 4 shows the performance of the different methods in the urban environment. The proposed method achieves a path length of 200 km, a completion time of 150 minutes, a punctuality rate of 90%, and a scheduling success rate of 93% in the urban environment, outperforming the other methods. This shows that the proposed method can not only find a better distribution path in the city, but also better adapt to
the dynamic changes in the city, ensuring a high service
quality and scheduling success rate.
Table 5: Performance comparison of different methods in highway environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 230 | 160 | 80 | 85 | 10 | 6
HA | 210 | 150 | 83 | 87 | 15 | 7
GA | 200 | 140 | 86 | 90 | 25 | 8
RBM | 240 | 170 | 78 | 82 | 5 | 5
TDRM | 190 | 130 | 88 | 91 | 20 | 7
ADLM | 180 | 120 | 91 | 93 | 18 | 8
Proposed | 170 | 110 | 93 | 95 | 12 | 9
The highway environment is characterized by fast traffic speeds and strict traffic rules. Table 5 shows that in the highway environment, the proposed method is superior to the other methods in terms of path length, completion time, punctuality, and scheduling success rate; in particular, its path length of 170 km is shorter than that of the other methods. This shows that the proposed method is more efficient on highways and can complete delivery tasks faster, while ensuring extremely high punctuality and scheduling success rates.
Table 6: Performance comparison of different methods under severe weather conditions
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 270 | 210 | 72 | 77 | 10 | 6
HA | 250 | 200 | 75 | 78 | 15 | 7
GA | 240 | 190 | 78 | 82 | 25 | 8
RBM | 280 | 220 | 68 | 72 | 5 | 5
TDRM | 230 | 180 | 80 | 83 | 20 | 7
ADLM | 220 | 170 | 83 | 86 | 18 | 8
Proposed | 210 | 160 | 85 | 88 | 12 | 9
Bad weather can seriously affect the efficiency and safety of logistics distribution. As can be seen from Table 6, the proposed method still performs well under bad weather conditions, with a path length of 210 km, a completion time of 160 minutes, an on-time rate of 85%, and a scheduling success rate of 88%, which are better than the other methods. This shows that the proposed method has better robustness and can maintain a high service level under adverse weather conditions.
Table 7: Performance comparison of different methods in peak traffic environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 280 | 220 | 68 | 72 | 10 | 6
HA | 260 | 210 | 72 | 75 | 15 | 7
GA | 250 | 200 | 75 | 78 | 25 | 8
RBM | 290 | 230 | 65 | 70 | 5 | 5
TDRM | 240 | 190 | 78 | 80 | 20 | 7
ADLM | 230 | 180 | 80 | 83 | 18 | 8
Proposed | 220 | 170 | 82 | 85 | 12 | 9
Traffic rush hour is a major difficulty for logistics scheduling. Table 7 shows that during peak hours, the proposed method is ahead of the other methods in terms of path length, completion time, punctuality, and scheduling success rate, notably with a completion time of 170 minutes and a punctuality rate of 82%, which shows that the proposed method can still maintain high work efficiency and service levels during peak hours.
Table 8: Performance comparison of different methods in emergency delivery environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 250 | 180 | 75 | 78 | 10 | 6
HA | 230 | 170 | 78 | 80 | 15 | 7
GA | 220 | 160 | 80 | 82 | 25 | 8
RBM | 260 | 190 | 70 | 75 | 5 | 5
TDRM | 210 | 150 | 82 | 85 | 20 | 7
ADLM | 200 | 140 | 85 | 87 | 18 | 8
Proposed | 190 | 130 | 88 | 90 | 12 | 9
Emergency delivery requires quick response and efficient scheduling. As can be seen from Table 8, the proposed method performs well in the emergency delivery environment, with a path length of 190 km, a completion time of 130 minutes, an on-time rate of 88%, and a scheduling success rate of 90%, all of which are better than the other methods. This shows that the proposed method can also complete delivery tasks efficiently in an emergency and meet urgent customer needs.
Table 9: Performance comparison of different methods in a multi-delivery point environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 260 | 190 | 70 | 75 | 10 | 6
HA | 240 | 180 | 75 | 78 | 15 | 7
GA | 230 | 170 | 78 | 80 | 25 | 8
RBM | 270 | 200 | 68 | 72 | 5 | 5
TDRM | 220 | 160 | 80 | 83 | 20 | 7
ADLM | 210 | 150 | 82 | 85 | 18 | 8
Proposed | 200 | 140 | 85 | 87 | 12 | 9
Table 9 shows that in the multi-point distribution environment, the proposed method is superior to the other methods in terms of path length, completion time, punctuality, and scheduling success rate; in particular, its path length is 200 km and its completion time is 140 minutes, which demonstrates the advantages of the proposed method in handling multi-point distribution.
Figure 4: Comprehensive performance comparison of different methods
Table 10: Robustness score comparison
Method Name | Calculation time (s) | Robustness score (out of 10)
SPA | 10 | 6
HA | 15 | 7
GA | 25 | 8
RBM | 5 | 5
TDRM | 20 | 7
ADLM | 18 | 8
Proposed | 12 | 9
As shown in Table 10 and Figure 4, the proposed method outperforms the other methods in all environments, especially on key indicators such as path length, completion time, on-time rate, and scheduling success rate. This fully demonstrates the superiority of the proposed method in different logistics scheduling environments and shows its great potential in practical applications.
Figure 5: Success rate performance at different scales
"Success rate" refers to the proportion of orders that are successfully dispatched, and "scale" refers to the number of orders. The curve alone shows how the success rate varies with scale, but it does not by itself explain the practical value of this relationship for the logistics environment. Figure 5 shows the success rate of the different algorithms at different scales. As can be seen from the figure, as the scale increases, the success rate of each algorithm generally shows a downward trend. It is worth noting that the "Proposed" algorithm shows a higher success rate at a smaller scale, but as the scale increases, its success rate decreases rapidly and eventually stabilizes.
In order to rigorously verify the significance of the performance differences between the proposed method and the other baseline methods in the different environments, we conducted paired-sample t tests. For the suburban environment, on the path length indicator, the absolute value of the t statistic far exceeds the critical value, indicating
that the proposed method is significantly different from the other baseline methods and that its paths are significantly shorter; similarly, the completion time indicator shows that the method's advantage in short completion times is significant.
In the urban environment, the t-test results for the completion time and punctuality indicators are significant, indicating that the method can complete tasks more efficiently and on time in complex urban environments. In the highway environment, the differences in path length and dispatch success rate are significant, reflecting the superiority of the method in planning paths and arranging dispatches in high-speed scenarios.
Under severe weather conditions, there are significant differences in the punctuality rate and dispatch success rate indicators, showing the robustness of the method in dealing with severe weather. During peak traffic hours, the completion time and dispatch success rate are significantly different, indicating that the method can also maintain efficient dispatch during peak hours. In the emergency delivery environment, all indicators are significantly better than the baseline methods, highlighting the method's rapid response and efficient dispatch. The significant differences in path length and completion time in the multi-point distribution environment prove the advantages of the method in dealing with multi-point distribution problems. Overall, the advantages of the proposed method across environments and indicators are statistically significant.

4.3 Discussion

Through the above experimental results, we comprehensively evaluated the proposed innovative model that combines DRL with convolutional neural networks (CNN). Compared with the state of the art (SOTA) in related work, the model in this study showed significant advantages on multiple key indicators.
In terms of path length, the path length of the model in this study is shorter than that of the other comparison methods, whether in suburban, urban, or highway environments. For example, in suburban environments, the path length of the classic shortest path algorithm (SPA) is 240 kilometers, while that of the model in this study is only 180 kilometers. This is because the model uses the powerful feature extraction ability of CNN to better capture geospatial information and combines it with DRL to dynamically learn the optimal strategy, thereby planning a shorter path.
In terms of completion time, the model also performs well. In urban environments, the completion time of the genetic algorithm (GA) is 180 minutes, while the completion time of this model is only 150 minutes. This is due to the model's rapid response to dynamic information such as real-time traffic conditions and its decision adjustments, which achieve more efficient scheduling.
On-time rate and scheduling success rate are important indicators for measuring the quality of logistics services. In various environments, the punctuality rate and scheduling success rate of this model are higher than those of the other methods. For example, in the highway environment, the punctuality rate of the traditional deep reinforcement learning method (TDRM) is 88% and its scheduling success rate is 91%, while this model reaches 93% and 95%, respectively. This shows that the model can better cope with the complex and changing logistics environment and ensure the stability and reliability of logistics services.
From the perspective of computing time and robustness score, this model shows stronger robustness while maintaining high efficiency. In terms of computing time, the model sits at a medium level of 12 seconds, but it maintains a high robustness score (9 points) under complex environmental changes. This is because the model continuously optimizes its decisions during the learning process and adapts better to environmental changes.
In summary, the DRL + CNN model proposed in this study has clear advantages in logistics path optimization and real-time scheduling, and it can effectively address the shortcomings of existing methods in complex-environment adaptability, real-time performance, feature extraction, and integration with domain knowledge, providing strong technical support for the development of future logistics scheduling systems.
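The paired-sample t tests reported above can be reproduced in a few lines. The per-instance path lengths below are synthetic placeholders (the paper reports only aggregate values), so the sampling parameters and the printed numbers are purely illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-instance suburban path lengths (km) for a baseline and the proposed method,
# centred on the aggregate values reported in Table 3 (240 km for SPA, 180 km for DRL + CNN).
spa = rng.normal(240, 15, size=30)
proposed = rng.normal(180, 12, size=30)

t_stat, p_value = stats.ttest_rel(spa, proposed)   # paired-sample t test on the same instances
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
# A |t| well above the critical value (p below the chosen alpha, e.g. 0.05)
# indicates the difference in path length is statistically significant.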
Table 11: Comparison results
Research Method | Results
SPA | Suburban area: path length is 240 km, completion time is 170 minutes, on-time rate is 78%, and scheduling success rate is 83%.
HA | Suburban area: path length is 220 km, completion time is 160 minutes, on-time rate is 82%, and scheduling success rate is 86%.
GA | Suburban area: path length is 210 km, completion time is 150 minutes, on-time rate is 88%, and scheduling success rate is 91%.
RBM | Suburban area: path length is 250 km, completion time is 180 minutes, on-time rate is 75%, and scheduling success rate is 80%.
TDRM | Suburban area: path length is 200 km, completion time is 140 minutes, on-time rate is 86%, and scheduling success rate is 90%.
ADLM | Suburban area: path length is 190 km, completion time is 130 minutes, on-time rate is 90%, and scheduling success rate is 93%.
This Study (DRL + CNN) | Suburban area: path length is 180 km, completion time is 120 minutes, on-time rate is 92%, and scheduling success rate is 95%.
As shown in Table 11, the table clearly presents the comparison results of logistics path optimization of different research methods in suburban environments. As a classic shortest path algorithm, SPA has a long path length, a long completion time, and a relatively low on-time rate and scheduling success rate in suburban environments, reflecting its poor adaptability in complex suburban logistics scenarios. HA has improved compared to SPA, but still has certain limitations. GA has further improved in path length, completion time, on-time rate, and scheduling success rate, but its advantages are not obvious compared with more advanced methods. The performance of RBM in various indicators is relatively poor, indicating that rule-based methods have limited effects in suburban logistics path planning. TDRM and ADLM, as more advanced methods, perform well in multiple indicators. However, the DRL + CNN method proposed in this study shows significant advantages, with the shortest path length, the shortest completion time, and the highest on-time rate and scheduling success rate. This shows that the method of combining deep reinforcement learning with convolutional neural networks can more accurately capture the characteristics of suburban logistics environments, dynamically adjust path planning and scheduling strategies, and thus achieve more efficient and reliable logistics distribution. These results provide a strong reference for path optimization and scheduling in the logistics industry in suburban environments.

5 Conclusion

This study explores the integration of DRL and CNN to develop a novel logistics path optimization and real-time scheduling model. Through meticulous analysis and pre-processing of the City Logistics Data Set (CLDS) and Traffic State Data Set (TSDS), we have crafted a model that may have the capacity to handle diverse logistics environments.

The experimental outcomes suggest that the proposed method has shown some positive signs in multiple logistics scheduling environments. In suburban regions, it appears to have some ability to tackle long-distance delivery issues and manage scattered delivery points. For example, the path length is 180 kilometers, the completion time is 120 minutes, the on-time rate is 92%, and the scheduling success rate is 95%. In urban areas, it can somewhat find better delivery paths and adapt to the complex traffic network, achieving a certain level of service quality and scheduling success, with a path length of 200 kilometers, a completion time of 150 minutes, an on-time rate of 90%, and a scheduling success rate of 93%. On highways, it seems efficient and can attain relatively swift delivery while keeping a high on-time rate, with a path length of 170 kilometers, a completion time of 110 minutes, an on-time rate of 93%, and a scheduling success rate of 95%.

In the conducted experiments, the proposed method has seemingly outperformed other methods in terms of computation time and robustness score. This gives an indication that the model might be able to find a decent solution within a relatively short period and maintain a somewhat stable performance when encountering certain environmental changes. Nevertheless, it must be emphasized that the experiments have a limited scope, especially when it comes to extreme scenarios like bad weather and traffic peak hours. Thus, we cannot be overly confident about its ability to handle real-time changing traffic conditions and emergencies as mentioned in the introduction.

Based on the current comparison of different methods' comprehensive performance, the proposed method shows some potential advantages in various logistics scheduling environments. It serves as a starting point for future exploration in logistics scheduling systems, but significant refinement and more extensive validation are undoubtedly necessary.

Funding

This work was supported by the 2023 Bidding Subjects for Decision-making Research of Jiaozuo Municipal Government of Henan Province: "Research on Countermeasures for Consolidating and Developing the Public Ownership Economy in Jiaozuo City" (JZZ202311-1), excellent subject, project completed, presided over; the 2022 Henan Jiaozuo Municipal Government Decision Research Bidding Project: "Research on the Development of Local Characteristic Industries in Jiaozuo City under the Background of Rural Revitalization" (JZZ202222-1), excellent project, concluded, presided over; the 2021 Henan Jiaozuo Municipal Government Decision Research Bidding Project: "Research on Modern and Efficient Agricultural Development in Henan City under the Rural Revitalization Strategy" (JZZ202127-2), qualified, closed, presided over; and the 2022 Henan Province University Humanities and Social Science Research Project Funding: "Research on Promoting the Effective Connection between Poverty Alleviation Strategy and Rural Revitalization" (2022-ZDJH-00273), established, under research, and presided over.
https://doi.org/10.31449/inf.v49i16.7201 Informatica 49 (2025) 171–186 171
Design and Application of Improved Genetic Algorithm for
Optimizing the Location of Computer Network Nodes
Chunlei Zhong1*, Gang Yang2
1Huai'an Bioengineering Branch Institute, Jiangsu Union Technical Institute, Huai'an, 223200, China
2College of Teacher Education, Wenzhou University, Wenzhou, 325035, China
E-mail: hm_spring@163.com
*Corresponding author
Keywords: genetic algorithm, computer network, network nodes, improved genetic algorithm, average error
Received: September 24, 2024
The rapid development of computer technology has made network stability and node positioning
accuracy important challenges in optimizing computer network design. This study proposes an
optimization method based on the Improved Genetic Algorithm (IGA) to improve the positioning
accuracy and stability of network nodes. Firstly, by combining the characteristics of the centroid
algorithm and the Approximate Point in Triangulation Test (APIT) algorithm, preliminary optimization
of node positions is carried out. Subsequently, an IGA is utilized for further optimization, dynamically
adjusting the crossover probability and mutation probability to balance global and local search
capabilities and avoid the algorithm falling into local optima. The experimental results showed that IGA
achieved significant performance improvement in node localization. Compared with the centroid
algorithm, the maximum error of IGA has been reduced by 19% and the overall average error has been
reduced by 8.8%. Compared with APIT, IGA has reduced the maximum error by 7% and the overall
average error by 3.8%. Regarding fitness values, IGA exhibited faster convergence speed, achieving
optimal results with only 75 iterations, surpassing traditional genetic algorithms and APIT algorithms.
The node coverage rate reached 98.6%, far higher than the 85.3% of the centroid algorithm and 90.5%
of the APIT algorithm. These results demonstrate that IGA has higher accuracy, stability, and
computational efficiency in complex network environments, providing an efficient and reliable solution
for optimizing the design of computer network nodes.
Povzetek: Predlagan je izboljšan genetski algoritem (IGA) za optimizacijo lokacij vozlišč v
računalniških omrežjih, ki z dinamičnim prilagajanjem verjetnosti križanja in mutacije poveča točnost,
stabilnost in učinkovitost algoritma.
1 Introduction

With the continuous progress of modern technology, computer networks play an increasingly important role in modern society. They connect various devices and systems, making the transmission and sharing of information more efficient and convenient. To meet the needs of users for high-quality network services, improving network performance and optimizing network design have become increasingly important. Traditional optimization algorithms frequently encounter issues of low efficiency and a propensity to fall into local optima when addressing large-scale network design problems. Therefore, it is necessary to introduce new optimization algorithms to solve these problems [1-2]. In recent years, researchers have made significant advancements in applying enhanced genetic algorithms to optimize computer networks. These enhancements include the introduction of new operators, optimization of algorithm parameters, and adjustments to the algorithms themselves. As a result, genetic algorithms are now more efficient and accurate when utilized for network design optimization. At the same time, researchers combine genetic algorithms with other optimization algorithms to create multiple hybrid optimization algorithms, which enhances network design performance and effectiveness [3-4]. The objective of the research is to achieve an optimized design of computer networks and to improve network performance indicators, including latency, throughput, resource utilization, and cost, through an Improved Genetic Algorithm (IGA). The research aims to solve the problems of slow convergence speed, susceptibility to local optima, and difficulty in dynamic adjustment of traditional network optimization methods in complex network environments. This study designs a computer network optimization technique based on a genetic algorithm as the core and introduces multiple techniques to improve performance. A fitness function based on network performance indicators is constructed to quantify the network optimization objectives. This technology adjusts the crossover and mutation probabilities adaptively by comparing individual fitness and population average fitness, balancing global and local search capabilities.
2 Related work

The 5G era is coming and network technology is developing rapidly. Massive data have brought enormous challenges to the stability and reliability of computer networks. The reliability of computer networks is a major indicator of comprehensive computer performance. Computer networks are large and complex, and they are also easily affected by many adverse factors. This leads to instability in the system, which exposes the entire computer network to significant risks. To ensure the stability and ongoing optimization of computer networks, computer network optimization design has become a prevalent point of discussion in computer research. Through their study of cloud computing, Fan et al. [5] presented a novel mathematical model for virtual network embedding in optical data center networks. This model reduced Network Topology (NT) complexity during optical fiber transmission. They used a comprehensive system of node awareness and path evaluation to derive algorithms with priority locations. The algorithm obtained by this model could reduce the latency of virtual network requests by 20% and improve the request rate by 13%. Rajendran and Venkataraman [6] proposed a new neural network algorithm to analyze network traffic built on the application and analysis of big data in network security. They used this method to conduct statistics on the worst data and abnormal activity sent by the network and conducted experiments with the data. Compared with traditional neural network algorithms, the optimized algorithm showed a notable enhancement in distinguishing between false alarms and actual detections, which significantly improved the security and stability of the network. Xiaokaiti et al. [7] raised an efficient data transmission strategy for the detection algorithm of computer network communities. They first combined NT attributes with social attributes when dividing communities and then selected the optimal relay node for network transmission based on the number of channels. This algorithm had high merit in data delivery efficiency and routing overhead in computer networks. Alsaqour et al. [8] put forward a location-assisted routing algorithm grounded on genetic algorithms to optimize the efficiency of MANET routing protocols. Firstly, through algorithm optimization, node information was added to the route and these nodes were grouped. These nodes were then sent to their destinations to adaptively update the node location. The results showed that the optimized algorithm could achieve a delivery rate of over 99% for small network overhead packets. Bu [9] developed a load-balancing scheduling algorithm for Internet of Things (IoT) clusters using a combination of Particle Swarm Optimization and Genetic Algorithm (PSOGA). The purpose of this algorithm was to address the persistent challenge faced by IoT networks due to high-volume business data traffic causing downtime. They first used the CPU, RAM, and network bandwidth to measure the server node information, then adjusted the appropriate function value, and used the IGA to obtain the optimal solution. The results showed that the optimized algorithm could reduce latency and error rates by 5%, while also reducing server overload and downtime. Network coding can integrate coding capabilities with network multi-path propagation, bolster the capacity of computer networks, and facilitate more intricate security solutions. To address the susceptibility of network coding to attacks, Wu et al. [10] developed a comprehensive unicast secure transmission scheme based on Random Linear Network (RLN) coding. The matrix was randomly generated from the received nodes and the resulting vector was sent back to the source node via the link to form a new matrix. This approach effectively thwarted network eavesdropping attacks. The comparative analysis between this research and the advanced methods is shown in Table 1.
Table 1: Comparative analysis of research and the advanced methods

Reference | Technical Method | Advantages | Disadvantages | Comparison with IGA
Fan et al. [5] | Virtual network embedding with node awareness and path evaluation. | Reduces latency by 20% and improves request rate by 13%. | Limited applicability; does not optimize node positioning. | IGA reduces error by 8.8%, with broader applicability.
Rajendran et al. [6] | Enhanced neural network algorithm for malicious traffic detection. | Improves security and reduces false alarms. | High computational cost; lacks node optimization. | IGA achieves 2.41% error, with higher efficiency.
Xiaokaiti et al. [7] | Community detection algorithm to optimize data transmission. | Improves transmission efficiency and reduces routing overhead. | Dependent on BT; limited precision. | IGA reduces error by 3.8%, offering better stability.
Alsaqour et al. [8] | Genetic algorithm for optimizing mobile ad hoc network routing. | Achieves 99% small packet delivery rate with low overhead. | Suitable for small networks; struggles with large-scale networks. | IGA reduces error to 2.46%, with wider applicability.
Bu [9] | PSO and GA combined for load balancing. | Reduces load and downtime by 5%. | Focuses on load balancing; lacks positioning accuracy. | IGA improves accuracy by 8.8%, offering a comprehensive solution.
Wu et al. [10] | Secure transmission using random linear network coding. | Enhances security and prevents eavesdropping. | Does not optimize node positioning or transmission efficiency. | IGA achieves 5.2% error, with better precision and stability.
Previous research has found that related work mainly focuses on specific aspects of computer network optimization, including security enhancement, data transmission efficiency, and load balancing. However, these studies have shortcomings in addressing the accuracy of network node localization and overall stability under different network conditions. Existing algorithms such as the centroid algorithm and the Approximate Point in Triangulation Test (APIT) have significant drawbacks, including limited accuracy and sensitivity to node density. Traditional algorithms, such as genetic algorithms and MANET routing protocols, perform well in specific network types but perform poorly in large-scale or dynamic environments. Using genetic algorithms to assist routing protocols can improve network overhead and delivery rates, which fully exploits the genetic algorithm and enhances network delivery. To optimize computer network nodes for better environmental conditions, this study uses the centroid and APIT algorithms, which provide better conditions for computer network optimization. Then, based on node optimization, an IGA is used to construct a network design optimization model. Through optimizing the traditional genetic algorithm, the efficiency of network nodes in computer network optimization design is enhanced. This paper aims to increase the stability and reliability of computer network optimization design.

3 Construction of computer network optimization design model based on genetic algorithm

3.1 Optimization of node location based on centroid algorithm and APIT algorithm

From the perspective of topology, a computer network is composed of several network nodes and communication links connecting these network nodes. This indicates that the positioning of network nodes is indispensable in computer network data transmission. The centroid algorithm is the most typical node localization algorithm among commonly used localization algorithms. The algorithm has four advantages: low storage energy consumption, simple algorithm principle, low computing energy consumption, and low communication energy consumption.

Before using this algorithm for localization, it is first necessary to determine whether the location node that the sensor needs to determine is located within the region. At the same time, nodes requiring location determination will continually emit various communication signals to the surrounding environment.
Figure 1: Schematic diagram of centroid algorithm positioning (vertices A–H of the polygon and its centroid (x, y))
To determine whether the unknown node is in the monitoring area, it is essential to verify the strength of the signal obtained at the beacon node. The strength can reflect the unknown node location [11]. The principle of the centroid algorithm is built on the calculation of a centroid: in any irregular polygon, there must be a center of mass inside it. Usually, the coordinates of each vertex are accumulated, and then the average value is calculated to determine its specific coordinates. The specific location can be represented by Formula (1), and the algorithm diagram is shown in Figure 1.

$$(x, y) = \left( \frac{1}{n}\sum_{i=1}^{n} x_i, \ \frac{1}{n}\sum_{i=1}^{n} y_i \right) \qquad (1)$$

In Formula (1), $n$ represents the number of vertices of the $n$-sided shape, and $(x_i, y_i)$ means the coordinate of the $i$-th vertex. The centroid of this $n$-sided shape can be obtained by calculating the formula.
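As an illustration of Formula (1), the following Python sketch estimates an unknown node's position as the centroid of the beacon (anchor) nodes whose signals it can hear. The function name and the RSSI threshold used to decide which beacons are "in range" are illustrative assumptions rather than part of the original algorithm description.

```python
def centroid_estimate(unknown_rssi, beacons, rssi_threshold=-75.0):
    """Estimate an unknown node's position as the centroid of the
    beacon nodes it can hear (Formula (1)).

    unknown_rssi   : dict mapping beacon id -> received signal strength (dBm)
    beacons        : dict mapping beacon id -> (x, y) coordinates
    rssi_threshold : illustrative cut-off deciding which beacons count as in range
    """
    # Keep only beacons whose signal is strong enough to be considered in range.
    in_range = [beacons[b] for b, rssi in unknown_rssi.items()
                if b in beacons and rssi >= rssi_threshold]
    if not in_range:
        return None  # the node hears no beacon, so it cannot be located
    # Formula (1): average the vertex coordinates of the polygon formed by the beacons.
    n = len(in_range)
    return (sum(p[0] for p in in_range) / n,
            sum(p[1] for p in in_range) / n)

# Example: three beacons heard by the unknown node.
beacons = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}
rssi = {"A": -60.0, "B": -70.0, "C": -65.0}
print(centroid_estimate(rssi, beacons))  # -> (5.0, 2.666...)
```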
If the polygon is situated within the solved region and has matching coordinates, then the centroid coordinates of an octagon, for example, can be computed using Formula (2).

$$(x, y) = \left( \frac{x_1 + x_2 + \cdots + x_8}{8}, \ \frac{y_1 + y_2 + \cdots + y_8}{8} \right) \qquad (2)$$

The $(x_1, y_1)$ to $(x_8, y_8)$ in Formula (2) represent the coordinates of the eight vertices. To use the centroid positioning algorithm for positioning, it is essential to rely on the smoothness of the entire network structure and the specific distribution of positioning nodes within the network. If an error occurs in the coordinates calculated by unknown nodes, the estimate will be biased towards areas with densely distributed beacon nodes, potentially resulting in significant errors with the centroid algorithm. Therefore, the algorithm's calculation accuracy is typically not high, and the positioning accuracy may be low. However, the centroid algorithm only needs to broadcast once to locate all unknown nodes. In many applications that do not require high positioning accuracy, the centroid algorithm is still the most suitable method.

APIT is an improved algorithm based on the centroid algorithm. It requires a completely random selection of many known coordinate nodes, and the coordinate nodes are grouped in threes. In accordance with these nodes, the triangles drawn on the graph will be completely randomly distributed throughout the entire region, and there will be some overlap between these triangles, which is used to calculate the coordinates of unknown nodes. The specific operation steps are as follows. First, multiple coordinate nodes around the unknown nodes are identified, and three known location nodes are randomly selected each time. Then, the approximate location of the signals received by these known location nodes is determined. If there are $m$ beacon nodes, the paper randomly selects and matches them, and uses the combination of three random position points to form $C_m^3$ triangles. Some triangular regions contain unknown nodes, while others do not. These specific points, whose triangular regions contain unknown nodes, are connected to each other. Finally, the recorded location algorithm is utilized to calculate the specific location of the unknown nodes. The incorporation of a greater number of unknown nodes into the algorithm results in enhanced accuracy in location estimation [12-13]. However, this is accompanied by an increased computational burden. In such a scenario, choosing a subset of vertices to create a polygon based on the real circumstances, as illustrated in Figure 2, can be beneficial. In Figure 2, the node positioning accuracy of the APIT is significantly greater than that of the centroid positioning algorithm.

Figure 2: Schematic diagram of APIT algorithm positioning
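To make the triangle test concrete, the Python sketch below checks whether a node lies inside the triangle formed by three beacons and averages the centroids of all containing triangles. This is a simplified stand-in for the full APIT procedure (which decides containment from neighbour signal-strength comparisons rather than a known coarse position); the coarse-position input and the function names are assumptions for illustration only.

```python
from itertools import combinations

def inside_triangle(p, a, b, c):
    """Return True if point p lies inside (or on) triangle abc, using signed areas."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def apit_like_estimate(coarse_position, beacons):
    """Average the centroids of all beacon triangles that contain the node.

    coarse_position : a coarse position guess used here for the containment test
    beacons         : list of (x, y) beacon coordinates
    """
    containing = []
    # Enumerate every C(m, 3) combination of beacons, as in the APIT description.
    for a, b, c in combinations(beacons, 3):
        if inside_triangle(coarse_position, a, b, c):
            containing.append(((a[0] + b[0] + c[0]) / 3.0,
                               (a[1] + b[1] + c[1]) / 3.0))
    if not containing:
        return None
    return (sum(p[0] for p in containing) / len(containing),
            sum(p[1] for p in containing) / len(containing))

beacons = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 12)]
print(apit_like_estimate((4.0, 5.0), beacons))
```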
Due to the relatively large impact of node density on APIT, when the beacon node density is relatively large, APIT can achieve relatively ideal positioning accuracy. APIT also performs well under irregular wireless signal propagation models and non-ideal circular propagation models. However, APIT also has a significant disadvantage. When connecting triangles, it may mistake points located outside a triangle for points inside the triangle. Research shows that the probability of this situation can reach a maximum of 13% [14], which has a significant impact on positioning accuracy. The algorithm must divide a large number of triangular regions to identify the locations of unknown nodes and necessitates multiple beacon nodes. As a result, the algorithm performs numerous calculations, which elevates the likelihood of encountering errors.

3.2 Construction of improved genetic algorithm model

The research and analysis of the centroid and APIT algorithms in node location optimization have revealed shortcomings in both algorithms concerning their calculation and location processes. Additionally, the use of genetic algorithms for node localization requires extra constraints, which may lead to increased computational time and reduced efficiency, resulting in premature convergence [15]. To obtain better positioning optimization results, an IGA model is studied and constructed. The flow chart of the model is shown in Figure 3, and the blue box in the figure shows the improved steps.

Compared with Traditional Genetic Algorithms (TGA), the paper has improved the node localization of genetic algorithms and constructed a matrix. The specific construction of the matrix is shown in Formula (3).

$$S(m) = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \qquad (3)$$
In Formula (3), $m$ represents a total of $m$ chromosomes in the genetic algorithm, $n$ means that each chromosome has $n$ elements, and $s_1, s_2, \ldots, s_m$ are the chromosomes. In TGA, determining the value range of genes is a commonly used method to generate an initial population for calculation. If a certain number of initial individuals are randomly generated within this value range, the distribution may be too random, which is basically not helpful for improving the algorithm efficiency [16]. To obtain a global optimal solution, the distribution of the initial population in the solution space should be as uniform and dispersed as possible. The schematic diagram of random initial population generation within the overlapping range of the communication areas of different anchor nodes is shown in Figure 4.

IGA performs improved optimization over TGA in parameter setting, population initialization, fitness function values, selection operations, and crossover operations. The specific key parameter settings are: the population size is 40, the crossover probabilities $p_{c1}$ and $p_{c2}$ are 0.6 and 0.4, the mutation probabilities $p_{m1}$ and $p_{m2}$ are 0.08 and 0.06, and the maximum number of iterations is 100. Population initialization determines the initial population range according to Formula (4) and generates an initial population randomly within this range.

$$\max_{i=1,2,\ldots,n}(x_i - d_i) \le x \le \min_{i=1,2,\ldots,n}(x_i + d_i), \qquad \max_{i=1,2,\ldots,n}(y_i - d_i) \le y \le \min_{i=1,2,\ldots,n}(y_i + d_i) \qquad (4)$$
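A minimal Python sketch of this initialization step is given below: each chromosome encodes one candidate coordinate pair, and the population is drawn uniformly from the bounding box of Formula (4), defined by the anchor positions and the measured distances to them. The uniform sampling, the function names, and the toy data are assumptions for illustration.

```python
import random

def init_population(anchors, dists, pop_size=40, seed=0):
    """Generate an initial population inside the region defined by Formula (4).

    anchors  : list of (x_i, y_i) anchor coordinates
    dists    : list of measured distances d_i from the unknown node to each anchor
    pop_size : number of chromosomes (the paper sets 40)
    Returns a pop_size x 2 matrix: each row is one chromosome, a candidate (x, y).
    """
    rng = random.Random(seed)
    x_lo = max(x - d for (x, _), d in zip(anchors, dists))
    x_hi = min(x + d for (x, _), d in zip(anchors, dists))
    y_lo = max(y - d for (_, y), d in zip(anchors, dists))
    y_hi = min(y + d for (_, y), d in zip(anchors, dists))
    return [[rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)]
            for _ in range(pop_size)]

anchors = [(10.0, 20.0), (40.0, 25.0), (25.0, 60.0)]
dists = [30.0, 28.0, 35.0]
population = init_population(anchors, dists)
print(len(population), population[0])
```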
Figure 3: The improvement process of genetic algorithms (initialization, individual fitness calculation, selection, crossover, and mutation, generating a new population until the stop rule is met)
Figure 4: Schematic diagram of the initial population generation area (overlap of the communication regions of the anchor nodes)
In Formula (4), $d_i$ means the distance between the unknown node $i$ and the anchor. The fitness function calculation assumes a total of $(M + N)$ nodes in the wireless sensor network to be located, where the number of known nodes is $M$ and the number of unknown nodes is $N$. Through a certain distance measurement method, if each unknown node knows the distance between itself and all known nodes within its communication radius, the calculated node position can be obtained through the least squares method [17]. Assuming that the coordinates of the nodes at known locations are $(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)$, the coordinate of an unknown node is $(x, y)$, and the distances from the nodes at known locations are $d_1, d_2, d_3, \ldots, d_M$, the equation set shown in Formula (5) can be established.

$$\begin{cases} (x - x_1)^2 + (y - y_1)^2 = d_1^2 \\ (x - x_2)^2 + (y - y_2)^2 = d_2^2 \\ (x - x_3)^2 + (y - y_3)^2 = d_3^2 \\ \qquad \vdots \\ (x - x_M)^2 + (y - y_M)^2 = d_M^2 \end{cases} \qquad (5)$$

From Formula (5), the fitness function for the genetic algorithm can be defined as Formula (6), and the fitness function of the initial population can be calculated by using it.

$$f(x, y) = \frac{1}{M}\sum_{i=1}^{M}\left|\sqrt{(x - x_i)^2 + (y - y_i)^2} - d_i\right| \qquad (6)$$

In Formula (6), $(x, y)$ is the unknown node location, $(x_i, y_i)$ represents a known node location, and $d_i$ refers to the distance from the unknown location to the known location $(x_i, y_i)$. The use of absolute error instead of squared error in the fitness function avoids the calculation of squared error, reduces complex multiplication operations, and lowers the computational load. During the iteration process, absolute error is more robust to outliers (i.e., data with larger deviations have less impact) and also enables the algorithm to approach the global optimal solution faster, accelerating the convergence speed of the algorithm.

The selection operation performs a unified comparison of each individual based on the fitness value calculated from the fitness function. After the comparison is completed, the two individuals with the highest fitness remain unchanged and proceed to the next round of operation. The individuals with the lowest fitness are directly eliminated, and the remaining individuals normally undergo crossover and mutation operations. Special individuals with high fitness values are assigned judgment values to distinguish and limit their reproduction. After completing the full iterative process, the fitness value of each individual should be appropriately amplified [18-19].

The crossover probability is used to control the probability of individuals (chromosomes) performing crossover operations. By calculating an individual's fitness value, it can be determined whether the individual should participate in crossover operations. The goal of the crossover operation is to generate offspring with higher fitness by recombining the genetic information of the parent individuals, gradually approaching the optimal solution. When performing a crossover operation, if $F_g \ge F_{avg}$, the crossover probability is calculated according to Formula (7).

$$P_c = p_{c1} \cdot \frac{F_g - F_{avg}}{F_{gb} - F_{avg}} \qquad (7)$$

In Formula (7), $p_{c1} \in (0, 1)$, $F_g$ is the value of the individual's fitness function, $F_{gb}$ represents the fitness function value of the optimal individual, and $F_{avg}$ is the average value of the fitness function. The calculation process includes normalizing the fitness difference and converting the normalized fitness difference into an actual crossover probability. If $F_g < F_{avg}$, the crossover probability is calculated according to Formula (8).

$$P_c = p_{c2} \qquad (8)$$

In Formula (8), $p_{c2} \in (0, 1)$.
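As an illustration of Formulas (6)–(8), the Python sketch below evaluates a chromosome with the absolute-error criterion and derives an adaptive crossover probability. Formula (7) presupposes a fitness in which larger values are better (so that the optimal individual satisfies $F_{gb} \ge F_{avg}$); the sketch therefore maps the localization error into such a fitness with 1/(1+error). That mapping, the helper names, and the toy data are assumptions for illustration only, not the paper's implementation.

```python
import math

def localization_error(chrom, anchors, dists):
    """Formula (6): mean absolute difference between estimated and measured distances."""
    x, y = chrom
    return sum(abs(math.hypot(x - xi, y - yi) - di)
               for (xi, yi), di in zip(anchors, dists)) / len(anchors)

def ga_fitness(chrom, anchors, dists):
    # Assumed mapping: turn the error (smaller is better) into a fitness
    # (larger is better) so that Formula (7) can be applied directly.
    return 1.0 / (1.0 + localization_error(chrom, anchors, dists))

def crossover_probability(f_g, f_avg, f_gb, pc1=0.6, pc2=0.4):
    """Formulas (7)-(8): adaptive crossover probability (pc1, pc2 as set in the paper)."""
    if f_g >= f_avg and f_gb > f_avg:
        return pc1 * (f_g - f_avg) / (f_gb - f_avg)
    return pc2

anchors = [(10.0, 20.0), (40.0, 25.0), (25.0, 60.0)]
dists = [30.0, 28.0, 35.0]
population = [(12.0, 30.0), (20.0, 40.0), (35.0, 45.0)]
fits = [ga_fitness(c, anchors, dists) for c in population]
f_avg, f_gb = sum(fits) / len(fits), max(fits)
for chrom, f in zip(population, fits):
    print(chrom, round(f, 4), round(crossover_probability(f, f_avg, f_gb), 4))
```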
After pairing the chromosomes in the population, the crossover operation is performed based on the calculated crossover probability. A random number between 0 and 1 is generated for each chromosome. The objective of this treatment of poorly adapted individuals is to give those with lower fitness a certain opportunity to participate in crossover, increase population diversity, and circumvent premature convergence to local optimal solutions. If the corresponding value is less than the crossover probability, the chromosome is ready to perform the next operation. The chromosomes for the next step are sequentially crossed in pairs. For each pair of crossed chromosomes, the location of the crossing point is determined by random numbers and the crossover operation is performed. During the mutation operation, if $F_g \ge F_{avg}$, the mutation probability is calculated by Formula (9).

$$P_m = p_{m1} \cdot \frac{F_g - F_{avg}}{F_{gb} - F_{avg}} \qquad (9)$$

In Formula (9), $F_g$ represents the fitness function value of the individual, $F_{gb}$ is the fitness function value of the optimal individual, and $F_{avg}$ refers to the average value of the fitness function. If $F_g < F_{avg}$, the mutation probability is calculated by Formula (10).

$$P_m = p_{m2} \qquad (10)$$

In Formula (10), $p_{m2} \in (0.01, 0.10)$. The first step is to randomly generate a number between 0 and 1 for each chromosome in the population. If the generated value is less than the mutation probability, the chromosome will undergo the mutation operation. The position requiring mutation is determined by generating a random number, and the next step is to invert the value at that position to complete the relevant mutation operation.
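In the same illustrative spirit, the sketch below applies the adaptive mutation probability of Formulas (9) and (10) and performs a simple per-gene perturbation. The perturbation style (re-sampling one coordinate inside the search bounds) and the names are assumptions, since the paper only states that the value at the selected position is inverted.

```python
import random

def mutation_probability(f_g, f_avg, f_gb, pm1=0.08, pm2=0.06):
    """Formulas (9)-(10): adaptive mutation probability (pm1, pm2 as set in the paper)."""
    if f_g >= f_avg and f_gb > f_avg:
        return pm1 * (f_g - f_avg) / (f_gb - f_avg)
    return pm2

def mutate(chrom, bounds, p_m, rng):
    """With probability p_m, re-sample one randomly chosen gene within its bounds."""
    chrom = list(chrom)
    if rng.random() < p_m:
        gene = rng.randrange(len(chrom))      # position selected by a random number
        lo, hi = bounds[gene]
        chrom[gene] = rng.uniform(lo, hi)     # assumed perturbation of that gene
    return tuple(chrom)

rng = random.Random(1)
bounds = [(12.0, 40.0), (25.0, 50.0)]         # x and y ranges from Formula (4)
parent = (20.0, 40.0)
p_m = mutation_probability(f_g=0.9, f_avg=0.7, f_gb=1.0)
print(p_m, mutate(parent, bounds, p_m, rng))
```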
Monotonic gene locus detection analyzes the whole population and identifies any monotonic gene loci present. If these are detected, targeted adjustments can be made by generating random numbers. The termination condition is evaluated based on the number of iterations; once the loop terminates, the optimal solution is output and the average positioning error of the algorithm is tested as a performance parameter, as shown in Formula (11).

$$error = \frac{100}{N \cdot R}\sum_{i=1}^{N}\sqrt{(x_{i1} - x_{i2})^2 + (y_{i1} - y_{i2})^2}\ \% \qquad (11)$$

In Formula (11), $N$ is the total number of unknown nodes, $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$, $i = 1, 2, 3, \ldots, N$, represent the actual and calculated coordinates of the unknown node $i$, and $R$ is the maximum communication distance of a node.

In the process of transforming IGA theory into practical applications, the rigor of the mathematical analysis is reflected in the precise modeling and dynamic adjustment of the fitness function, crossover probabilities, and mutation probabilities. The dynamic allocation of the crossover probability is achieved by comparing individual fitness with the population average fitness and the optimal fitness. This allows individuals with higher fitness to have a higher probability of crossover, thereby accelerating the spread of excellent genes, while preserving a small number of crossover opportunities for individuals with lower fitness and maintaining population diversity. This normalization mechanism based on fitness differences effectively balances local search and global search, avoids premature convergence of the algorithm, and improves solution accuracy and efficiency. Furthermore, the implementation of random number generation techniques and probability judgment processes enables the transformation of theoretical models into practical operations, thereby ensuring the randomness and controllability of crossover and mutation. This approach facilitates the robustness and convergence of the algorithm in complex optimization problems, achieving an efficient integration of theory and practice. When using IGA for computer network optimization design, the network optimization problem is first modeled as a fitness function that measures network performance indicators. Then, through iterative evolution with selection, crossover, and mutation operations, the crossover and mutation probabilities are dynamically adjusted to optimize the network structure and parameter configuration, thereby achieving efficient and accurate network optimization design.

4 Performance analysis of computer network optimization design model based on genetic algorithms

4.1 Performance analysis of node location based on centroid location algorithm and APIT algorithm

To verify the actual positioning effects of the centroid algorithm, APIT, and IGA, simulation experiments are conducted on the three algorithms in MATLAB. The reason for choosing APIT and the centroid algorithm as benchmarks is their effectiveness and wide application in network optimization. The APIT algorithm performs well in localization problems and is suitable for evaluating the accuracy and reliability of network nodes, serving as a benchmark for network performance optimization in this research. The centroid algorithm is known for its simplicity, ease of use, and fast convergence, making it suitable for solving optimization problems in basic network structures. The selection of these two algorithms covers different types of network optimization requirements. Through comparison, the advantages of IGA in solving complex optimization problems can be clearly demonstrated. MATLAB version R2021a is used, and the hardware specifications are as follows: Intel Core i7-9700K processor, 32 GB DDR4 RAM, 512 GB solid-state drive, and Windows 10 Professional 64-bit operating system.
The algorithm sets the population size to 100, the number of iterations to 500, the crossover probability to 0.8, and the mutation probability to 0.05. The elite strategy retains the top 10% of excellent individuals. To ensure the statistical validity of the test scenario, multiple sets of experiments are designed and optimized for network topologies of different sizes and complexities. Each experiment is repeated at least 30 times to obtain stable average performance indicators and standard deviations, ensuring the reliability of the results. For the collected performance indicators, statistical analysis is used to evaluate the significant differences between algorithms under different configurations, thereby determining the efficacy of the optimization effects. When comparing, a null hypothesis and an alternative hypothesis are set; the P-value represents the probability of obtaining the current or a more extreme result under the null hypothesis. The t-test is used to compare the results of IGA and the benchmark algorithms. If the P-value is less than 0.05, the difference is considered statistically significant.
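For concreteness, the snippet below shows how the repeated-run protocol and the t-test comparison described here could be scripted, using the average positioning error of Formula (11) as the compared indicator. The toy coordinates, the synthetic per-run errors, and the use of SciPy's independent two-sample t-test are illustrative assumptions rather than the study's actual evaluation code.

```python
import math
import numpy as np
from scipy import stats

def avg_positioning_error(actual, estimated, comm_radius):
    """Formula (11): mean localization error as a percentage of the communication radius."""
    n = len(actual)
    total = sum(math.hypot(ax - ex, ay - ey)
                for (ax, ay), (ex, ey) in zip(actual, estimated))
    return 100.0 * total / (n * comm_radius)

# One illustrative run: Formula (11) applied to three unknown nodes with R = 30.
actual = [(10.0, 12.0), (40.0, 55.0), (70.0, 20.0)]
estimated = [(11.0, 13.5), (38.5, 56.0), (71.0, 18.0)]
print(f"single-run error = {avg_positioning_error(actual, estimated, 30.0):.2f}%")

# Assume each array holds the error (%) of one of the >=30 repeated runs per algorithm.
rng = np.random.default_rng(0)
iga_errors = rng.normal(loc=5.2, scale=0.6, size=30)    # illustrative synthetic runs
apit_errors = rng.normal(loc=9.0, scale=0.9, size=30)

t_stat, p_value = stats.ttest_ind(iga_errors, apit_errors, equal_var=False)
print(f"mean IGA error  = {iga_errors.mean():.2f}%")
print(f"mean APIT error = {apit_errors.mean():.2f}%")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}",
      "-> significant at 0.05" if p_value < 0.05 else "-> not significant")
```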
The experiment generates 20 anchor nodes and 80 unknown nodes in the 100×100 area. After generating this region, the nodes are predicted by running the corresponding algorithm. Figure 5 shows the original node distribution diagram. The green circles in Figure 5 represent anchor nodes, while the blue pentagons represent unknown nodes. Figure 6 shows the positioning results of the three algorithms. The positioning results of the centroid algorithm, APIT, and IGA are shown in Figure 6(a), Figure 6(b), and Figure 6(c), respectively.

In Figure 6, the predicted values of IGA have the highest coincidence rate with the unknown nodes, reaching 94.36% (P<0.05). The predicted values of the centroid algorithm have the lowest coincidence rate with the unknown nodes, which is 86.25%. The coincidence rate between the predicted values of APIT and the unknown nodes is 89.67%. The coincidence rate of the IGA is 8.16% higher than that of the centroid algorithm and 4.69% higher than that of the APIT algorithm (P<0.05). This means IGA has a high positioning computing ability. The positioning errors of the centroid algorithm, APIT, and IGA are listed in Figure 7.
Figure 5: Original node distribution diagram (anchor nodes and unknown nodes in the 100×100 area)
Figure 6: Location results of three algorithms ((a) centroid algorithm positioning results, coincidence rate 86.25%; (b) APIT positioning results, coincidence rate 89.67%; (c) improved genetic algorithm location results, coincidence rate 94.36%)
Figure 7: Positioning error of three algorithms (centroid algorithm, APIT, and improved genetic algorithm, per unknown node)
The centroid algorithm in Figure 7 reaches a maximum error rate of 32% during positioning, and the overall average node positioning error rate is 14% (P<0.05). The maximum error rate of APIT in positioning is 20%, but the average value of the overall error decreases to 9% (P<0.05). This indicates that, compared to the centroid positioning algorithm, the predicted coordinate error calculated by the APIT positioning algorithm is significantly reduced, with better positioning results. The maximum error rate of IGA during positioning is 13%, and the average value of its overall error is 5.2% (P<0.05). The maximum error of IGA is 19% lower than that of the centroid algorithm, and the overall average error is 8.8% lower (P<0.05). Compared to APIT, the maximum error of IGA is 7% lower, and the average overall error is 3.8% lower (P<0.05). APIT improves positioning accuracy through random triangle coverage, but relies on high-density anchor nodes and is prone to misidentifying points outside a triangle as internal points. The centroid algorithm is computationally simple and suitable for scenarios that do not require high accuracy. However, its accuracy is low and it is easily affected by uneven node density, resulting in positioning bias towards areas with dense anchor nodes and significant errors. IGA introduces a method of dynamically adjusting the crossover probability and mutation probability during the evolution process. It adjusts dynamically based on individual fitness and population average fitness, avoiding premature convergence of the algorithm and ensuring the search for the global optimal solution. IGA performs local fine optimization, improving the accuracy and stability of the algorithm. The comparison of the three shows that IGA has excellent node positioning capabilities in wireless sensor networks.
4.2 IGA-based application analysis of computer network optimization design

There are differences in the performance of IGA and other node localization algorithms for wireless sensor networks under different iterations. Hence, under the same parameter conditions, this study gradually changes the number of iterations and runs simulations using TGA, the centroid positioning algorithm, APIT, and IGA, respectively. Thus, the iteration number and its corresponding fitness value are obtained. As the number of iterations gradually increases, Figure 8 illustrates the corresponding changes in fitness between the IGA and the other algorithms for positioning wireless sensor network nodes.
Figure 8: Changes in fitness values of the four methods (traditional genetic algorithm, centroid positioning algorithm, APIT location algorithm, and improved genetic algorithm) over 100 iterations
Figure 9: Relationship between the number of anchor nodes and average error for the four methods
Figure 8 shows that the fitness values of the four localization algorithms are all less than 10. The fitness values of IGA, TGA, the centroid algorithm, and APIT are 4.26, 8.15, 6.42, and 5.31, respectively, and the corresponding iteration numbers are 69, 86, 83, and 79. The IGA's fitness value is the lowest, 3.89 lower than that of TGA,
2.03 lower than that of the centroid algorithm, and 1.05 lower than that of APIT. This shows that IGA has better adaptability in node localization and verifies the superiority of this algorithm. The relationship between the number of anchor nodes and the average error value of the four algorithms is shown in Figure 9.

In Figure 9, as the number of anchor nodes increases, the average error of the four positioning algorithms gradually decreases. The average errors of TGA, the centroid algorithm, APIT, and IGA are 1.12, 1.03, 0.95, and 0.68, respectively (P<0.05). The average error of IGA is significantly lower than that of TGA. Meanwhile, as the number of anchor nodes increases gradually in proportion to all nodes, the average error of the various algorithms undergoes only slight changes, as shown by the curves. When the number of anchor nodes is kept the same, the average error of IGA is the lowest among all four algorithms. This fully demonstrates the advantages of IGA. Figure 10 displays the relationship between the node communication radius and the average error value of the four algorithms.

In Figure 10, as the communication radius of the nodes increases, the average error of the four node positioning algorithms gradually decreases. The average errors of TGA, the centroid algorithm, APIT, and IGA are 5.75%, 4.52%, 3.87%, and 2.46%, respectively (P<0.05). For the same communication radius, the IGA's average error is always the lowest. This verifies the superiority of IGA when the communication radius of nodes changes. Figure 11 shows the relationship between the network connectivity and the average error value of the four algorithms.
Figure 10: Relationship between node communication radius and average error for the four methods
Figure 11: Relationship between network connectivity and average error value for the four methods
The average errors of TGA, the centroid algorithm, APIT, and IGA in Figure 11 are 5.41%, 4.49%, 3.71%, and 2.41%, respectively (P<0.05). This indicates that, regardless of changes in network connectivity, IGA's positioning ability is always higher than that of the other three algorithms. Figure 12 shows the relationship between node coverage, evolutionary generations, and completion time for TGA and IGA.

The node coverage of the two algorithms in Figure 12(a) shows that, at the same node density, IGA has a higher regional coverage. Figure 12(b) displays the relationship between the number of evolutionary generations and the completion time of both algorithms over successive iterations of the genetic algorithm.
Figure 12: The relationship between node coverage, evolutionary generations, and completion time of the two algorithms ((a) node coverage of the two algorithms versus node density; (b) evolutionary generations versus completion time)
As the iterations progress, the number of evolutionary generations gradually increases while the time required to complete all iterations decreases. However, for the same evolutionary generation, IGA takes less time. This shows the superiority and stability of the IGA. To further analyze the adaptability and superiority of the research method, a further application analysis is conducted in a large-scale wireless sensor network node positioning scenario in a region with a side length of 500 m. The total number of sensor nodes in the region is 500, including 100 anchor nodes and 400 unknown nodes. The results of the large-scale application analysis are shown in Table 2.
Table 2: Results of application analysis in large-scale scenarios

Metrics | IGA | APIT Algorithm | Centroid Algorithm
Average Positioning Error (%) | 2.45 | 5.62 | 7.89
Positioning Time (s) | 12.3 | 18.4 | 9.6
Number of Iterations | 75 | 120 | 60
Convergence Speed | Fast (0.5 fitness variation) | Medium (0.8 fitness variation) | Slow (1.2 fitness variation)
Node Coverage Rate (%) | 98.6 | 90.5 | 85.3
As shown in Table 2, the average positioning error of IGA is 2.45%, significantly lower than the 5.62% of the APIT algorithm and the 7.89% of the centroid algorithm (P<0.05). This indicates that IGA can effectively improve the accuracy of node localization in large-scale scenarios and is suitable for complex, high-precision network environments. The positioning time of IGA is 12.3 seconds, which is between the 18.4 seconds of APIT and the 9.6 seconds of the centroid algorithm (P<0.05). Although its computational complexity is slightly higher than that of the centroid algorithm, IGA improves efficiency by optimizing the evolution process, enabling it to maintain fast computational speed while achieving high-precision positioning. After 75 iterations, IGA achieves convergence, which is faster than the APIT algorithm's 120 iterations (P<0.05), demonstrating the advantages of IGA's dynamic parameter adjustment and elite strategy in the search process. Although the centroid algorithm uses fewer iterations, its accuracy is significantly insufficient (P<0.05). The node coverage rate of IGA reaches 98.6%, which is much higher than the 90.5% of the APIT algorithm and the 85.3% of the centroid algorithm (P<0.05). This indicates that IGA has better coverage performance in large-scale networks and can optimize the node positioning layout more comprehensively.

4.3 Discussion

This study has designed an IGA that effectively improves the accuracy and stability of node localization in wireless sensor networks through techniques such as dynamic parameter adjustment, fitness function optimization, and an elite strategy. Compared with the traditional centroid algorithm and the APIT algorithm, IGA exhibits significant advantages in key performance indicators. Specifically, the average positioning error of IGA was 2.45%, much lower than APIT's 5.62% and the centroid algorithm's 7.89%, indicating that IGA has significant advantages in node positioning accuracy. At the same time, IGA had a faster convergence speed, requiring only 75 iterations to reach the optimal solution, with a stable fitness value change (0.5). APIT and the centroid algorithm required 120 and 60 iterations, respectively, and had slower convergence behavior. In addition, IGA achieved a node coverage rate of 98.6%, significantly higher than APIT (90.5%) and the centroid algorithm (85.3%), demonstrating its applicability and advantages in large-scale complex network environments.

The reason why IGA outperforms traditional methods in terms of positioning error and fitness values is mainly due to several key technological innovations. Firstly, the dynamic parameter adjustment mechanism can dynamically adjust the crossover probability and mutation probability based on the fitness value, thereby balancing global and local search and preventing the algorithm from getting stuck in local optimal solutions. This is consistent with the ideas of Yu et al. [20]. Secondly, fitness function optimization reduces computational complexity and enhances robustness to outliers by introducing absolute error instead of the traditional squared error, enabling the algorithm to approach the global optimal solution more quickly. In addition, the elite strategy ensures the retention of high-fitness individuals and reduces the loss of high-quality solutions. The uniform distribution of the initial population within the communication area improves search efficiency and reduces ineffective calculations caused by random initialization. The results obtained are consistent with Singh et al.'s study [21]. These improvements effectively address common pitfalls of TGAs, such as local optima and premature convergence, enabling IGA to exhibit higher stability and accuracy in complex dynamic network environments. The core innovation of IGA lies in combining the local improvement of TGAs with global search, which is suitable for non-standard situations such as uneven node
184 Informatica 49 (2025) 171–186 C. Zhong et al.
distribution, limited numbers of anchor nodes, and complex conditions such as changes in the communication radius. In practical applications, IGA demonstrates good stability and adaptability by flexibly adjusting parameters and optimizing the search space. In previous studies, TGAs often faced local optimal traps, leading to premature convergence of the algorithm. The study aims to enhance population diversity, reduce the interference of outliers on the search process, and accelerate convergence to the global optimal solution by providing low-fitness individuals with moderate opportunities for crossover and mutation. The research provides a more stable, accurate, and efficient solution for node localization and optimization in complex network environments.

5 Conclusion

The high-speed development of computer network technology has caused tremendous changes in people's production and life. Currently, computer network optimization still has the problem of low positioning accuracy of network nodes. To solve the related problems, this study constructed an IGA model and applied it to computer network optimization. Experimental results showed that IGA significantly improved location coverage and average location error compared to the centroid algorithm and APIT. The coincidence rate of the improved algorithm was 8.16% higher than the centroid algorithm's and 4.69% higher than that of the APIT algorithm. The maximum error of IGA was 19% lower than that of the centroid algorithm, and the overall average error was 8.8% lower. Compared to APIT, the maximum error of IGA was 7% lower, and the average overall error was 3.8% lower. Under the same parameters, TGA, the centroid algorithm, the APIT algorithm, and IGA were used to compare the performance of network nodes in computer networks. Experimental data were obtained: the fitness value of IGA, the number of anchor nodes and the average error, the communication radius and the average error, and the network connectivity and the average error were 4.26, 0.68, 2.46, and 2.41, respectively. IGA showed a significant improvement over the values calculated for the three comparison algorithms, which proves the accuracy and stability of the improved genetic positioning algorithm.

6 Abbreviated List
NT: Network Topology
PSOGA: Particle Swarm Optimization and Genetic Algorithm
RLN: Random Linear Network
IGA: Improved Genetic Algorithm
TGA: Traditional Genetic Algorithm
APIT: Approximate Point In Triangulation Test

References
[1] F. Wang, X. Lai, and N. Shi, "A multi-objective optimization for green supply chain network design," Decision Support Systems, vol. 51, no. 2, pp. 262-269, 2011. https://doi.org/10.1016/j.dss.2010.11.020
[2] Q. Liu, Z. Guo, and J. Wang, "A one-layer recurrent neural network for constrained pseudoconvex optimization and its application for dynamic portfolio optimization," Neural Networks, vol. 26, pp. 99-109, 2012. https://doi.org/10.1016/j.neunet.2011.09.001
[3] J. L. Ribeiro Filho, P. C. Treleaven, and C. Alippi, "Genetic-algorithm programming environments," Computer, vol. 27, no. 6, pp. 28-43, 1994. https://doi.org/10.1109/2.294850
[4] C. D. Lin, C. M. Anderson-Cook, M. S. Hamada, L. M. Moore, and R. R. Sitter, "Using genetic algorithms to design experiments: a review," Quality and Reliability Engineering International, vol. 31, no. 2, pp. 155-167, 2015. https://doi.org/10.1002/qre.1591
[5] W. B. Fan, F. Xiao, X. B. Chen, L. Cui, and S. Yu, "Efficient virtual network embedding of cloud-based data center networks into optical networks," IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 11, pp. 2793-2808, 2021. https://doi.org/10.1109/TPDS.2021.3075296
[6] B. Rajendran and S. Venkataraman, "Detection of malicious network traffic using enhanced neural network algorithm in big data," International Journal of Advanced Intelligence Paradigms, vol. 19, no. 3-4, pp. 370-379, 2021. https://doi.org/10.1504/ijaip.2021.116366
[7] A. Xiaokaiti, Y. Qian, and J. Wu, "Efficient data transmission for community detection algorithm based on node similarity in opportunistic social networks," Complexity, vol. 2021, pp. 1-18, 2021. https://doi.org/10.1155/2021/9928771
[8] R. Alsaqour, S. Kamal, M. Abdelhaq, Y. Zan, and D. Jerou, "Genetic algorithm routing protocol for mobile ad hoc network," Computers, Materials & Continua, vol. 68, no. 1, pp. 941-960, 2021. https://doi.org/10.32604/cmc.2021.015921
[9] B. Bu, "Mult-task equilibrium scheduling of Internet of Things: a rough set genetic algorithm," Computer Communications, vol. 184, pp. 42-55, 2022. https://doi.org/10.1016/j.comcom.2021.11.027
[10] R. Y. Wu, J. M. Ma, Z. X. Tang, X. H. Li, and K. K. R. Choo, "A generic secure transmission scheme based on random linear network coding," IEEE/ACM Transactions on Networking, vol. 30, no. 2, pp. 855-866, 2021. https://doi.org/10.1109/TNET.2021.3124890
[11] W. C. Chang and I. H. R. Jiang, "iClaire: A fast and general layout pattern classification algorithm with clip shifting and centroid recreation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 8, pp. 1662-1673, 2019. https://doi.org/10.1109/TCAD.2019.2917849
[12] T. Ganesan and P. Rajarajeswari, "Efficient sensor node connectivity and target coverage using genetic algorithm with Daubechies 4 lifting wavelet transform," International Journal of Communication Networks and Distributed Systems, vol. 28, no. 3, pp. 337-364, 2022. https://doi.org/10.1504/ijcnds.2022.122170
[13] S. T. Shishavan and F. S. Gharehchopogh, "An improved cuckoo search optimization algorithm with genetic algorithm for community detection in complex networks," Multimedia Tools and Applications, vol. 81, no. 18, pp. 25205-25231, 2022. https://doi.org/10.1007/s11042-022-12409-x
[14] B. Nahavandi, M. Homayounfar, A. Daneshvar, and S. Mohammad, "Hierarchical structure modelling in uncertain emergency location-routing problem using combined genetic algorithm and simulated annealing," International Journal of Computer Applications in Technology, vol. 68, no. 2, pp. 150-163, 2022. https://doi.org/10.1504/ijcat.2022.123466
[15] Z. Sabir, M. R. Ali, and R. Sadat, "Gudermannian neural networks using the optimization procedures of genetic algorithm and active set approach for the three-species food chain nonlinear model," Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 7, pp. 8913-8922, 2023. https://doi.org/10.1007/s12652-021-03638-3
[16] C. Zhao, W. X. Zhu, G. Qiao, and F. Zhou, "Optimisation method with node selection and centroid algorithm in underwater received signal strength localization," IET Radar, Sonar & Navigation, vol. 14, no. 11, pp. 1681-1689, 2020. https://doi.org/10.1049/iet-rsn.2020.0178
[17] Y. Zou, "Coupled neural networks and genetic algorithms application in the field of mine fire extinguishing," Informatica, vol. 48, no. 16, 2024. https://doi.org/10.31449/inf.v48i16.6317
[18] Y. M. Wu, Z. Li, C. X. Sun, Z. B. Wang, D. S. Wang, and Z. W. Yu, "Measurement and control of system resilience recovery by path planning based on improved genetic algorithm," Measurement and Control, vol. 54, no. 7-8, pp. 1157-1173, 2021. https://doi.org/10.1177/00202940211016094
[19] Y. Zhou, "Structural damage identification of large-span spatial grid structures based on genetic algorithm," Informatica, vol. 48, no. 17, 2024. https://doi.org/10.31449/inf.v48i17.6428
[20] M. Yu and S. Chai, "Adaptive iterative learning control for discrete time nonlinear systems with multiple iteration varying high order internal models," International Journal of Robust and Nonlinear Control, vol. 31, no. 15, pp. 7390-7408, 2021. https://doi.org/10.1002/rnc.5690
[21] G. Singh, V. K. Tewari, R. R. Potdar, and S. Kumar, "Modeling and optimization using artificial neural network and genetic algorithm of self-propelled machine reach envelope," Journal of Field Robotics, vol. 41, no. 7, pp. 2373-2383, 2024. https://doi.org/10.1002/rob.22255
https://doi.org/10.31449/inf.v49i16.7452 Informatica 49 (2025) 187-198 187
Optimization of Emergency Material Logistics Supply Chain Path
Based on Improved Ant Colony Algorithm
Mingbin Wei
College of Economics and Management, YanShan University, Qinhuangdao 066000, Hebei, China
E-mail: weimingbin@stumail.ysu.edu.cn
Keywords: emergency material, improved ant colony algorithm (IACA), logistics supply chain, travelling salesman
problem
Received: October 29, 2024
Path selection is a critical challenge in emergency logistics management, particularly under realistic
disaster-related conditions. This study addresses the problem of optimizing logistics transportation during
major epidemics, considering constraints such as vehicle load, volume, and maximum travel distance per
delivery. The goal is to minimize costs related to distribution trips, time, early/late penalties, and fixed
vehicle expenses. By framing the problem as a generalized Traveling Salesman Problem, we developed
an Improved Ant Colony Algorithm (IACA) to reduce the longest distribution path. Simulation data from
Pudong, Shanghai lockdown zones revealed that IACA outperformed the traditional ACO algorithm,
achieving a 30% cost reduction and higher accuracy (R² = 0.98). Additionally, experiments on gate
assignment and TSP demonstrated the algorithm's superior optimization ability and stability. Overall,
IACA enhances delivery route efficiency, lowers costs and energy consumption, and improves emergency
logistics performance, proving to be a robust and reliable solution.
Povzetek: Avtor je razvil izboljšan algoritem kolonije mravelj (IACA), ki optimizira poti v logistični
oskrbovalni verigi za nujne materiale.
1 Introduction

The logistics industry's use in many different industries is growing more and more common as the global economy develops. Researchers are becoming more aware of the crucial role emergency logistics (EL) plays in delivering supplies to disaster zones as a result of the exponential rise in emergency response operations and the rising frequency of disasters, both man-made and natural [1]. In particular, since 1980, natural catastrophes have claimed the lives of over 2.4 million people globally, and their economic toll has grown by more than 800%, reaching $210 billion in 2020 alone. 137 million people in China were directly impacted by various natural disasters in the same year, resulting in 19,956.7 hectares of crops being damaged, 370.15 billion yuan in direct economic losses, and 591 fatalities [2]. The National Disaster Reduction Commission and the China National Ministry of Emergency Management both state that China experienced 130 million individuals impacted by various natural disasters in 2018, 588 fatalities, and 264.46 billion RMB in direct economic losses. Using earthquakes as an example, in 2018, earthquakes of magnitude 6.0 or higher killed 3,068 people worldwide and wounded over 16,000 more. In order to reduce the number of casualties and property damage following a disruption, emergency materials must be delivered to the disaster zones promptly, precisely, and efficiently [3]. Emergency rescue is usually quite urgent because most disruptions are unpredictable and emergency material elements are very complicated.

In order to effectively provide goods while minimizing losses, emergency logistics was first established in 2004 to address the logistical challenges resulting from disasters. With its emphasis on facility placement and material delivery, it is essential to emergency decision-making. Supply channels are frequently disrupted by disasters, resulting in large losses; 15% to 20% of all disaster losses may be attributable to inefficient distribution [4]. The focus of humanitarian logistics modeling services, which account for 80-90% of rescue expenses, is on rapid reaction deployment, which is essential for successful rescue operations. The ideas under discussion centre on maximizing the distribution, transportation, and positioning of emergency supplies to rescue locations. For disaster response to be successful, emergency logistics efficiency must be increased [5].

For emergency rescue and relief efforts to be successful, the emergency supply chain must run smoothly. It is quite challenging to gather comprehensive information to support emergency operations during large-scale calamities, since they are frequently unanticipated and exceedingly destructive. The victims' livelihoods and security may be negatively impacted by ineffective or even halted emergency material operations brought on by a shortage of supplies and knowledge [6]. Therefore, while reacting to large-scale emergencies, it is essential to solve the fundamental concerns of rapid acquisition and
integration of comprehensive emergency supply chain materials and information flow.

The planning, coordinating, and carrying out of logistics operations during emergencies, disasters, and crises are guided by the framework and set of principles known as emergency logistics theory. It includes a variety of ideas, tactics, and procedures meant to guarantee the smooth and efficient movement of information, products, services, and resources in order to meet the demands of impacted communities and lessen the effects of the crisis. One study uses a hybrid technique of simulated annealing and ant colony optimization to offer a low-carbon vehicle route optimization model for logistics and distribution; for increased efficiency, it uses an adaptive elite individual reproduction strategy and adds a multifactor operator and a carbon emission factor [7]. Researchers are becoming more interested in emergency logistics. Nonetheless, the majority of recent studies address the location of facilities. This study focusses on the supply chain for emergency material logistics following a disaster. After a disaster strikes, it seeks to create the best plans possible for moving emergency supplies from one-to-many supply depots to disaster depots. Fig. 1 depicts the emergency management domain. In order to meet the demands of victims and finish rebuilding the disaster-stricken region after a disaster occurs, EM is a specific type of vehicle routing problem that examines how to transport relief materials from supply depots to demand depots (disaster areas). EMS is typically separated into many-to-many scenarios based on the quantity of supply depots, as seen in Fig. 2.

Figure 1: Supply depots to demand depots

Figure 2: Quantity of supply depots in a many-to-many scenario

The most important consideration should be the timeline. Only if the emergency supplies are delivered to the disaster supply depots accurately and on time will the damage be reduced. The fastest delivery time is therefore practically the most crucial factor in the improved ant colony model. We suggest an improved ACA-based approach to solving the emergency material path routing problem. To get the best answer, the process determines the shortest path between nodes using a travelling-salesman shortest-path tree structure [8]. According to research findings, the suggested approach performed admirably in various disaster networks. In conclusion, even though ACO has been researched extensively and shown to perform effectively in organising routes, it is incredibly uncommon to utilise ant colony optimisation to optimise the supply chain route for emergency material logistics in disaster areas.

The study's primary contributions fall under the following categories.
• First, this study identifies and measures the variables that have been discovered to affect ACA's efficacy from both an internal and external standpoint. This includes the IACA's shortest route while taking into account the supply chain's external environment, funding for materials and equipment, complex emergency decision-making, and material transportation deployment. This is a definite step in filling the knowledge gap in the body of existing literature.
• Second, according to research methodologies, the majority of models in use today use traditional algorithms including precise, heuristic, and meta-heuristic algorithms.
• Third, in order to show the validity of the results, we contrasted the outcomes of the IACA method with those of the conventional ACA strategy. Furthermore, these findings will help emergency managers better pinpoint the sources and means of important elements, as well as the causal and hierarchical connections among them, and contribute to the development of a robust and effective path.

This study is organized as follows: the literature review for this topic is presented in Section 2. The research strategy and methodology are described in depth in Section 3. The application of the proposed improved ACO algorithm is shown in Section 4, followed by the application's outcomes and a discussion of these findings. Section 5 summarises the findings, study shortcomings, and future research directions.

2 Literature review

These days, the transportation sector is growing quickly, and its main concentration is on the logistical distribution of perishable agricultural goods. The application usefulness of emergency material logistics supply chain path optimization methods is recognised by experts. The supply chain for perishables has been examined by several academics.
For the uncertainty of unanticipated events on urban roadways during shipping, a GA-based path optimisation model was introduced [9]. As part of the endeavour to reduce transportation costs, a logistics path optimisation model with a hard time window was developed to address the path dynamics. The outcomes of the path experiments showed that this path optimisation model performed successfully. In order to address the sustainable food supply chain optimisation issue, another study presented a mixed integer linear programming model; to reduce fuel costs, transportation expenses and carbon emissions were integrated, and Norwegian salmon exporters were used to conduct a suitability analysis [10]. For the prompt delivery of disaster relief materials after natural catastrophes, the team of [11] proposed a hybrid meta-heuristic algorithm. Three optimisation improvement approaches were presented, the urgency coefficient of each demand point was evaluated, and Harris hawk optimisation and random PSO were integrated. The outcome of the study shows that the suggested approach had a high degree of computational correctness.

Another study developed a path optimisation technique based on an enhanced GA to meet the cost and efficiency criteria of distributing fresh food. The procedure implemented a linear adaptive cross-variance technique and designated certain elements as penalty factors. The findings of the study show that the approach could successfully reduce the delivery path length and had a higher path optimisation efficiency [12].

A two-objective optimisation model was presented for the design of an adaptable perishable commodity supply chain [13]. During the process, product and route disruption deterioration were thoroughly examined, a utility-role GA was added to optimise the method, and dynamic pricing was employed to handle crises. The experimental results demonstrated the good flexibility of the suggested approach. The use of Ant Colony Optimization (ACO) or other computer-related technologies has also been researched by several academics.

An ACO-based optimisation technique was put forth by researchers to address the issue of UAV scheduling routes. The procedure was solved for DSP and optimised for hierarchical pheromone-based processing. According to the testing results, the suggested approach has outstanding planning speed and good path planning quality [14].

To solve the path routing problem of fourth-party logistics, a method based on the ant colony system and the improved grey wolf algorithm was proposed. During the process, which included a carrying capacity and reputation constraint from the beginning node to the destination node, known as the transit range, ratio utility theory was used to determine the customer's risk appetite. The results of the study demonstrated that the recommended strategy could effectively finish path optimisation planning [15].

For the return path planning challenge of reverse logistics networks, the author suggested an ACO-based path technique. The procedure created a MINLP model, evaluated costs using a closed-loop, multi-stage logistics network, and was tested on thirty instances. Results from experiments showed that the suggested approach could produce return pathways of excellent quality [16].

An ACO-based approach was also proposed for solving the nodal path routing problem. To get the best answer, the process determined the shortest path between nodes using a rooted shortest-path tree structure. According to research findings, the suggested approach performed effectively in networks of various sizes. In conclusion, despite extensive research and demonstrated effectiveness in route planning, ACO is currently rarely used to optimise the route taken by cold chain logistics to distribute perishable agricultural goods. Given the pressing need for additional technical references to support the growth of the cold chain logistics distribution industry, the improved ACO-based optimisation model was put forth, and the technical features of ACO were applied to optimise the cold chain logistics distribution path for perishable agricultural products [17].
Table 1: Summary of existing and suggested methods compared on computational accuracy, time efficiency and cost reduction

TGWO | Computational accuracy: the TGWO algorithm serves as a decision-support tool to boost supply chain performance, cut expenses, and improve cold chain logistics operations. | Time efficiency: compared to the TS and GWO algorithms, the TGWO method reduced the overall journey distance by 50.34% and 30.66%, respectively. | Cost reduction: in terms of the overall cost of distribution, it saved 14.34 percent and 9.03 percent.

IPSO | Computational accuracy: the algorithm and the emergency logistics vehicle route optimization model for severe epidemics suggested in that research work well. | Time efficiency: if every demand's delivery priority is met, an enhanced vehicle routing optimization algorithm that takes delivery urgency into account can save a certain amount of time. | Cost reduction: according to the sensitivity analysis, when the time cost is at its lowest, there should be three vehicles in the distribution center; the whole expense was lowered by 20.09%.

NN | Computational accuracy: the prediction findings demonstrate that the prediction can yield increased accuracy and better route matching. | Time efficiency: chooses the route for material delivery that has the quickest speed and the shortest distance. | Cost reduction: reduces the percentage of transportation expenses as much as possible and saves money on supplies and vehicles.

IACA (proposed) | Computational accuracy: the suggested model achieved the best solution accuracy, with 98.5%. | Time efficiency: this study uses the travelling salesman problem to find the shortest route and is time efficient. | Cost reduction: cost reduction based on transportation, inventory and labor; minimizes distance and avoids traffic and hazards, by 30.2%.
3 Methodology

Inspiration: The process of optimising ant colonies is iterative. Several fictitious ants are considered at each cycle. With the restriction that an ant does not go to any vertex it has already visited during its walk, each of them constructs a solution by moving from vertex to vertex on the graph. An ant uses a stochastic mechanism biassed by the pheromone to choose the next vertex to visit at each stage of the solution-building process. For instance, the next vertex from vertex i is selected at random from among those that have not been visited yet. More specifically, k can be chosen with a probability proportional to the pheromone connected to edge (m, n) if it has never been visited before. Depending on the calibre of the answers the ants have created, the pheromone values are modified at the end of an iteration. By doing this, the ants are biassed to create solutions that are similar to the best ones they have created in previous cycles. The basic concept underlying the AC method is inspired by the way ant colonies behave when they are looking for food. Ants usually start by aimlessly looking around for food, bringing some of what they find back to their colony. They also leave a pheromone on the track they found. The worth of the pheromones left in their wake, which gradually dissipates, depends on the quantity and calibre of the food supply. The pheromones that remain on a trail may persuade other ants to follow it. The strongest pheromone indicates a shorter path, which most ants can eventually follow.

3.1 Improved ant colony algorithm

This study uses the improved ant colony algorithm's high adaptability, multi-concurrency, resilience, and global search capabilities to handle the logistics supply chain's emergency material problem. Consequently, the enhanced ant colony method is presented to solve the model, since it is highly parallel, offers the benefits of high fault tolerance and self-adaptation, and allows for heuristic improvement to enhance the algorithm's convergence. The ant colony method is a travelling-salesman-type algorithm that finds the shortest path by simulating ant populations' foraging behavior. Individuals of the ant colony evaluate and choose the optimal foraging path based on the concentration of pheromones left by ants as they pass through nodes along the route. Rich customer and order data are challenging for traditional logistics to manage. Therefore, in the event of a natural disaster, the logistics automation path finder now incorporates an improved ACA. The drawbacks of conventional ACA are addressed with certain enhancements, which successfully resolve issues like resource scheduling and route planning in the logistical process. Figure 3 explains the flow of the emergency material path supplying method.
Figure 3: Block diagram of the method
3.2 Improved ant colony algorithm optimization strategy for emergency logistics material

Several production process connections are becoming more specialized due to the escalating market competitiveness. As a result of this tendency, logistics and commercial flow are now separated, progressively emphasising the significance of logistics. Conventional logistics models are inefficient and do not provide intelligent assistance. A lot of management and operations involve manual labor, which is ineffective and prone to mistakes. At the same time, businesses find it challenging to forecast and decide on logistical procedures when they lack intelligent help. Additionally, this hinders the rapid optimization and modification of logistical plans. The emergency logistics model has been developed based on this. The material logistics supply chain is a complete system that controls and optimizes the logistics process using a variety of automation technologies and tools. Logistics transportation costs are a significant part of the logistics chain, and they can be managed and controlled to increase the efficiency of logistical processes. The cost of logistics transportation is shown in eq. (1).

C_1 = \sum_{k=1}^{v} s_k \sum_{i=1}^{m} f_i    (1)

s_k is the variable in eq. (1). The vehicle number is v. The number of articles, including all transportation expenses, is denoted by m, and f is the cost of driving. Eq. (2) displays the associated transportation cost.

C_2 = \sum_{v=1}^{n} \sum_{i=0}^{n} \sum_{j=1}^{n} C_{ijk} x'_{ijk}    (2)

The transportation cost between places i and j is represented by C_{ijk} in eq. (2). The routing variable between transportation points is x'_{ijk}, and n is a constant.

IACO puts ants at the first dispersion site in Fig. 4. After creating a tabu table, the cycle is initiated. The random selection technique determines the next transfer node based on the probability of ants travelling to various nodes. The transfer node is added to the tabu table once it satisfies the constraints. The ants' relevant journey length and delivery cost are determined once the transfer is completed, repeatedly, until the tabu table is full. The global pheromone is adjusted and the path optimisation is finished based on the computed results. After the maximum number of cycles has been reached, the outcome is the optimal path solution. In the real-world application, the starting point and the associated fundamental path dispersion parameters are input. The research method is then used to develop the path solution, and the best solution is chosen to finish the path generating process.
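As a quick illustration of the two cost terms in eqs. (1) and (2), the following minimal Python sketch evaluates them numerically. The array shapes, variable names and toy numbers are assumptions made purely for illustration; the paper itself only gives the summations.

import numpy as np

def transport_fixed_cost(s, f):
    # Eq. (1): C1 = sum_k s_k * sum_i f_i, with s_k a per-vehicle variable and
    # f the per-vehicle driving costs of the m articles (given here as a K x m array).
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(np.sum(s * f.sum(axis=1)))

def transport_variable_cost(c, x):
    # Eq. (2): C2 = sum over v, i, j of C_ijk * x'_ijk, with x'_ijk a 0/1 routing
    # indicator (vehicle k travels arc i -> j) and c the corresponding arc costs.
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(c * x))

if __name__ == "__main__":
    C1 = transport_fixed_cost(s=[1.0, 1.0], f=[[4.0, 2.0], [3.0, 1.0]])
    C2 = transport_variable_cost(c=np.ones((3, 3, 2)), x=np.zeros((3, 3, 2)))
    print(C1, C2)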
Figure 4: IACA flowchart

The logistics supply chain system consists of a server, front-end work, a mechanical arm, three-dimensional warehouse management, AGV monitoring, PLC monitoring, a sorting system, and a commodities warehouse, among other components. The sorting system's primary objective is to determine and categorise products; they are dispersed throughout various locations or forms of conveyance based on their type or destination. A commodities warehouse, which includes both automated and conventional warehouses, is a facility used for the storage of products. PLC and AGV monitoring are the two primary components of logistics system monitoring. Several components of the logistics system can be monitored and controlled using PLC monitoring, which functions as a programmable logic controller. Transporting items within or between warehouses requires AGV monitoring, which keeps track of the equipment's position and operational state. To guarantee precise storage and retrieval of items, the logistics system as a whole uses three-dimensional warehouse management, tracking, and computer-based management. Order processing, inventory queries, and other front-end tasks are the key tasks. The system uses a robotic arm to accomplish a variety of activities, including grabbing, moving, and assembling. The management level of conventional autonomous logistics systems is rather low, despite the growing need for logistics technology. It is necessary to enhance the capacity to develop transportation networks, manage the supply chain, and optimise warehouse operations. An optimisation algorithm called ACA mimics how ants forage for food in the wild. By mimicking the ant's pheromone transmission mechanism, it facilitates cooperation and information exchange throughout the optimisation process. Thus, adding ACA to the logistics control model can improve the management level of autonomous logistics control by successfully resolving issues like resource scheduling and path planning in the logistics process. Eq. (3) shows the node selection probability in ACA.

P_{ij}^{k}(t) = \frac{[\pi_{ij}(s)]^{\alpha} [\eta_{ij}(s)]^{\beta}}{\sum_{s} [\pi_{is}(s)]^{\alpha} [\eta_{is}(s)]^{\beta}}, \quad j \in \text{nodes not yet passed}    (3)

In eq. (3), \pi_{ij} is the pheromone concentration, \eta_{ij} is the visibility of the path, \alpha is the trade-off factor for the pheromone, and \beta is the heuristic factor for the expected values. \eta_{ij} is given in eq. (4).

\eta_{ij} = \frac{1}{d_{ij}}    (4)

In eq. (4), d_{ij} denotes the distance between point i and point j. After a complete ant colony cycle, the pheromones are updated accordingly. The expression for the pheromone update is shown in eq. (5).

\tau_{ij}(t+n) = (1-\rho)\tau_{ij}(t) + \Delta\tau_{ij}    (5)

The pheromone evaporation coefficient \rho is represented in eq. (5), and t is the time point. The next node is typically chosen at random by conventional ACA, though. While random selection facilitates the exploration of broader problem areas, the early stage's convergence speed is slower due to the lengthy application of positive feedback. If the complete supply chain is not coordinated and optimised, the logistics chain as a whole may operate inefficiently, which may increase time costs. In order to address this, the study introduces logistic chaotic mapping with the goal of leveraging its features to increase the precision of knowledge accumulation. Then, during the optimisation phase, some randomness is introduced into the basic ACA's routes.

In IACA, at the conclusion of every iteration, the ant that possesses the best solution for that iteration updates the optimal answer, because the method's initialization changes the disaster distance as per eq. (6). Additionally, each iteration updates the amounts of pheromone on the final path, as illustrated in eq. (7), where \tau_{i,j} is the pheromone updated on the last feedback, denoted by R_{i,j}; \tau_{i,j}^{*} is the updated pheromone value; \tau_0 is the initial pheromone; \Delta\tau_{i,j} is taken as 1/L_{best} for the optimal route of length L_{best}; and r and q are variables in (0,1) that correspond to the pheromone and its decay
coefficient feedback rate, respectively. Based on the likelihood p_{i,j}^{k}, which is computed as shown in eq. (8), each ant k chooses its new path, where \eta_{I,J} is the inverse of the route's length. The locations that have not yet been visited are denoted by the letters i, j, and l, while the respective impacts of the pheromone concentrations and the heuristic data are indicated by the control parameters \alpha and \beta.

\tau_{i,j}^{*} = (1-r)\tau_{i,j} + r \cdot \Delta\tau_{i,j}, \quad \text{if } R_{i,j} \text{ is in the best path}    (6)
\tau_{i,j}^{*} = (1-q)\tau_{i,j} + q\tau_0, \quad \text{otherwise}    (7)

As the process runs, the algorithm first generates a variety of randomly generated solutions. Pheromones are then updated based on the problem type and the IAC algorithm, with pheromone placed on the graph's edges or vertices, to improve the solutions. The probability of the edge, which is computed in eq. (8), determines whether or not to traverse the edge between two nodes i and j.

P_{ij}^{k} = \frac{\pi_{ij}(t)^{a}\, \eta_{ij}^{\beta}}{\sum_{j \in N_{k}(i)} \pi_{ij}(t)^{a}\, \eta_{ij}^{\beta}}, \quad \text{if } j \text{ is feasible; } 0 \text{ otherwise}    (8)

where P_{ij}^{k} is the probability that ant k moves from node i to node j, \pi_{ij}(t) denotes the pheromone value on the arc, \eta_{ij} is the heuristic (material) term, and N_{k}(i) is the set of nodes that can still be visited by ant k. To get better results, it is advised to run a local search before updating the pheromones. Nonetheless, the following update rule is recommended in eq. (9):

\pi_{ij}(t+1) = (1-\rho) \cdot \pi_{ij}(t) + \Delta\pi_{ij}(t)    (9)

Researchers and supply chain practitioners can learn more about efficacy and customer communication, as well as pinpoint areas for development, by utilizing the IAC algorithm to analyze emergency locations from the warehouse.

3.3 IACA method with the travelling salesman problem

Mathematicians and computer scientists in particular have focused a lot of attention on the travelling salesman problem (TSP) because it is both straightforward to explain and challenging to solve; it looks for the shortest restricted path to the target. A fully directed graph G = (N, A) can be used to represent the TSP, where A is a collection of arcs and D = (d_{ij}) is the price (distance) vector for every arc (i, j) in A. Often referred to as cities, N is a collection of n nodes, or vertices. The cost matrix D could be either symmetric or asymmetric. Finding the shortest closed tour that visits each of the n = |N| nodes of G precisely once is known as the TSP. In the symmetric TSP, the distances between the cities are irrespective of the direction in which the arcs are traversed, therefore d_{ij} = d_{ji} for any pair of nodes. In the asymmetric TSP, d_{ij} differs from d_{ji} for at least one pair of nodes (i, j).

Define the variables in eq. (10):

x_{ij} = 1 if the arc (i, j) is in the tour, and 0 otherwise    (10)

The TSP can be formulated as a generalisation of a well-known integer program formulation. The constraints are written as:

\sum_{c=1}^{n} x_{ij} = 1, \quad d = 1, 2, 3, \dots, n    (11)
\sum_{d=1}^{n} x_{ij} = 1, \quad c = 1, 2, 3, \dots, n    (12)
x_{ij} \in \{0, 1\}, \quad c, d = 1, 2, 3, \dots, n    (13)
\sum_{c, d \in S} x_{ij} \le |S| - 1, \quad 2 \le |S| \le N - 1    (14)

The objective function to be minimised in this formulation is the overall cost. Constraints (11) and (12) ensure that each city is entered and left exactly once, constraint (13) enforces the integrality of the zero-one variables x_{ij}, and constraint (14) guarantees that no subtours are created, so every city on the final itinerary is visited exactly once.

Algorithm 1: Pseudocode for IACA
1. Build the environment model
2. Initialise the number of ants and the parameters P_max, M, S, E, alpha, beta, a, b, c, R_0, pi_ij, eta_ij
3. For P = 1 to P_max do
4.   Calculate beta according to eq. (3);
5.   Calculate rho according to eq. (8);
6.   For k = 1 to M do
7.     Place ant k at S;
8.     While ant k has not reached E and the number of optional nodes > 0 do
9.       Determine the next emergency logistics node from eqs. (4), (5), (9);
10.      While ant k is in a deadlock do
11.        Use the deadlock-handling route mechanism;
12.        Set the deadlock point as an obstacle point;
13.      End while
14.    End while
15.    Save the path taken by ant k;
16.    Calculate the path length of ant k;
17.  End for
18.  Calculate the shortest path for the iteration;
19.  Divide the path into subparts by the partitioning method;
20.  Update the pheromones for each subpart by eqs. (10)-(13);
21.  Set upper and lower pheromone limits by eq. (14);
22. End for
23. Output the optimal path;
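To make the mechanics of Algorithm 1 concrete, the sketch below runs a generic ant colony loop on a TSP-style distance matrix: tours are built with the transition probability of eqs. (3)/(8), pheromone is evaporated and deposited as in eqs. (5)/(9), and the iteration-best path receives extra reinforcement in the spirit of eqs. (6)-(7). This is not the authors' MATLAB implementation; the parameter names and values (alpha, beta, rho, n_ants, n_iter) are conventional ACO assumptions used only for illustration, and the deadlock handling and subpart partitioning of Algorithm 1 are omitted.

import numpy as np

rng = np.random.default_rng(0)

def tour_length(d, tour):
    # Length of a closed tour over the distance matrix d.
    return sum(d[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def run_aco(d, n_ants=20, n_iter=100, alpha=1.0, beta=2.0, rho=0.1, q=1.0):
    n = len(d)
    eta = 1.0 / (d + np.eye(n))          # eq. (4): heuristic visibility 1/d_ij
    tau = np.ones((n, n))                # initial pheromone
    best_tour, best_len = None, np.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [int(rng.integers(n))]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                # eqs. (3)/(8): probability proportional to tau^alpha * eta^beta over unvisited nodes
                w = (tau[i, cand] ** alpha) * (eta[i, cand] ** beta)
                j = int(rng.choice(cand, p=w / w.sum()))
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(d, tour)))
        # eqs. (5)/(9): evaporation plus deposit on every traversed arc
        tau *= (1.0 - rho)
        for tour, length in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i, j] += q / length
        it_best = min(tours, key=lambda t: t[1])
        if it_best[1] < best_len:
            best_tour, best_len = it_best
        # eqs. (6)-(7): additional reinforcement of the iteration-best path
        for k in range(n):
            i, j = it_best[0][k], it_best[0][(k + 1) % n]
            tau[i, j] += q / it_best[1]
    return best_tour, best_len

if __name__ == "__main__":
    pts = rng.random((10, 2))                                        # assumed toy instance
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    print(run_aco(d))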
4 Results and discussion

Ants make better initial decisions and spend less time pursuing fruitless avenues when they are given more insightful instruction. The enhanced algorithm minimizes the number of iterations required by directing ants toward promising areas of the search space right away.

Experimental setup: The technique is programmed and solved in this work using MATLAB 2017a, and it was evaluated on an Intel notebook running 64-bit software with a CPU speed of 2.20 GHz and 4 GB of RAM.

4.1 Dataset

The emergency supply mechanism was promptly put into place to provide the basic necessities of life for inhabitants of the Shanghai lockdown zones. In order to store and distribute supplies, ten emergency supply warehouses (ESWs) were quickly established throughout the city, utilising logistics supply points, district emergency food enterprises' distribution centres, and other locations [18]. Therefore, if the emergency material logistics centre is considered the central point, the largest geographic area a disaster can potentially cover is approximately 633 km2, since the administrative area of Shanghai is 6,340 km2. Caolu Modern Agricultural Park was chosen as the emergency material supply point, and the relevant data came from the lockdown zones created on April 16, 2022, in accordance with Shanghai's distinct preventative and control needs. Within the ESW's coverage area were 50 lockdown zones, including Magnolia Fragrance Garden Phase II, Sunshine Flower City, and Fengchen Leyuan. The ESW is represented by the number 1 (shown in Fig. 5(b) with a green dot), and the 50 lockdown zones are represented by the numbers 2-51. Figure 5 displays the regional geographic information distribution map. The ESW and part of the lockdown zones in Shanghai's Pudong New Area are shown by the red area in Figure 5. The overall distribution data is displayed in Figure 5(b). The red circle indicates the ESW zone, while the red and green dots indicate the lockdown and ESW zones, respectively.

Figure 5: Lockdown zones for the dataset

4.2 Experimental analysis

After improving ACA through performance analysis of the emergency logistics supply chain on the travelling salesman problem, a logistics automation system is built. First, the suggested improved ant colony algorithm is used to verify its performance. For system simulation, the experiment is carried out in MATLAB. Through the establishment of appropriate parameters and limitations in Table 2, the IACA algorithm's performance is monitored.

Table 2: The experimental model parameters
Parameter (unit): Value
Fixed cost (Yuan): 261
Transport cost (Yuan): 4
Vehicle speed (km/h): 60
Maximum vehicle mileage limit (km): 14w
Heavy load (kg): 4250
Pallet size (mm): 465*455

4.3 Data preprocessing

Min-max normalization. Min-max normalization is a technique for normalizing data that involves linearly transforming the initial data to create an equilibrium of value comparisons before and after the process. This approach uses the following equation:

Y_{new} = \frac{Y - \min(Y)}{\max(Y) - \min(Y)}

where Y is the old value, Y_{new} is the new value obtained from the normalized outcome, \min(Y) is the minimum value in the collection, and \max(Y) is the maximum value in the collection.

Outlier detection and removal. Outliers can degrade the efficiency of machine learning models by distorting statistical relationships among features. To eliminate outliers, we employ Z-score evaluation, which determines how far each data point deviates from the mean in standard deviations. A Z-score greater than 3 or less than -3 denotes an outlier that should be eliminated. The Z-score is determined by the equation below:

Z = \frac{X - \mu}{\sigma}

where X is the data point, \mu is the mean of the attribute, and \sigma is the standard deviation.
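The two preprocessing steps above are straightforward to reproduce; the following minimal Python sketch applies min-max normalization and Z-score filtering. The synthetic demand array is an assumption used only to demonstrate the calculation, not data from the study.

import numpy as np

def min_max_normalize(y):
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())      # Y_new = (Y - min(Y)) / (max(Y) - min(Y))

def remove_outliers_zscore(x, threshold=3.0):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()                     # Z = (X - mu) / sigma
    return x[np.abs(z) <= threshold]                 # keep points with |Z| <= 3

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demand = np.concatenate([rng.normal(100.0, 5.0, size=50), [400.0]])  # one injected outlier
    cleaned = remove_outliers_zscore(demand)
    print(len(demand), len(cleaned))                 # the extreme value is dropped
    print(min_max_normalize(cleaned)[:5])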
Comparing the traditional and proposed methods

ACO's initial total cost function value in Figure 6 was above 3600, and after 132 iterations, it dropped to its
lowest value of 339. The initial value of IACO's total cost function was less than 3243, and after 19 iterations, it dropped to its lowest value of 3208. The study approach outperformed the conventional ACO in terms of both convergence speed and the initial value of the total cost function when comparing the convergence curves of the ACO and IACO models. To confirm the efficacy of the study approach in a real-world application, testing was done on passenger flows in a region where the traffic flow was nearly zero. Litchi was chosen for transportation because it is perishable and must be stored at a low temperature while being transported. The transport vehicles were fuel trucks, and 31 target locations were chosen for testing the application.

Figure 6: Total cost value of the traditional and proposed models

The prediction accuracy of four distinct logistics models is contrasted in Figure 7. The prediction accuracy results of the automated logistics control model developed for this study are displayed in Figure 7(a). With an R2 of 0.98, the results demonstrated that the designed model had the highest prediction accuracy. This was 0.27, 0.30, and 0.17 higher than the prediction accuracy of the compared algorithms, namely Tabu-Grey Wolf Optimisation (TGWO) [19], Improved Particle Swarm Optimisation (IPSO) [20], and the neural network (NN) [21], with the Improved Ant Colony Algorithm (IACA) being the proposed model. In conclusion, the model has a high prediction accuracy and can optimise logistics routes while lowering energy consumption in logistics distribution; compared to other models, the fitting effect is superior. The suggested model is tested using real-world data to further confirm its scalability and dependability, and its accuracy and solution time are contrasted with those of the existing approaches.

Figure 7: Comparison of model prediction accuracy

A statistical metric called R-squared is used to assess how well a regression model fits data. R-squared values range from 0 to 1. When the model fits the data exactly and the anticipated and actual values are identical, we have an R-squared of 1. However, when the model fails to learn any association between the dependent and independent variables and does not predict any variability, we obtain an R-squared of 0. In order to confirm the suggested model's scalability and dependability, the accuracy and solution time of the four distinct logistics models are contrasted in Fig. 8. The accuracy comparison results are displayed in Fig. 8(a): at 98.58%, the suggested model achieved the best solution accuracy. The comparison of solution times is displayed in Fig. 8(b); although it was greater than that of the other three models, the suggested model's solution time of 44.64 seconds was still within a reasonable range.

Figure 8: Accuracy and solution time of models
Table 3: Robustness analysis of the four algorithms
Techniques | Highest | Average | Inaccuracy (ξ) | Robustness (r) | t (s)
IPSO | 383.52 | 384.57 | 2.70 | 2.10 | 12.10
TGWO | 377.82 | 388.79 | 2.38 | 5.30 | 9.56
NN | 374.38 | 377.16 | 2.64 | 10.09 | 6.69
IACA (proposed) | 371.17 | 374.03 | 0.74 | 20.20 | 1.44
Figure 9: Finding the ideal path length

In Figure 9 it is clear that the improved ant colony algorithm requires fewer iterations to find the ideal path than the current approaches. Additionally, the optimal path length is shorter, allowing for faster convergence.

Performance analysis

For this paper, the method was run 55 times, and for the three algorithms in the literature, the average time (t), inaccuracy (ξ) in %, and robustness (r) in % of every approach were noted. The equations are:

\xi = \frac{ave - best}{best}
r = \frac{m}{n}

where ave is the average overall mileage, best is the ideal overall mileage, n is the number of tests, and m is the number of runs in which the optimal solution is found. The findings for each of the four methods are shown in Table 3 and show that the algorithm obtained the best overall mileage, average overall mileage, inaccuracy value, robustness r, and algorithm time expenditure when compared to the other examined methods. These findings suggest that the method performs well in terms of computing complexity and robustness. This implies that the method employed in this study surpasses the single ant colony algorithm and yields an ideal result with minimum error, high precision, and consistent robustness.

Figure 10: ROC curve

Figure 10 shows a visual depiction of the model's performance over all thresholds, the ROC curve. The true positive rate (TPR) and false positive rate (FPR) are computed at each threshold (practically, at predetermined intervals), and the TPR is then graphed over the FPR to create the ROC curve. A perfect model, which at some threshold has a TPR of 1.0 and an FPR of 0.0, can be represented by a point at (0, 1). The ROC is a helpful metric for evaluating the performance of distinct models, provided that the dataset is fairly balanced. In general, the better model is the one with a larger area under the curve. The ROC curve of the suggested model (IACA) shows the highest accuracy, with 0.95. A confidence interval is a range of numbers that is believed to contain a population parameter. For any normal distribution, approximately 95% of the values lie within two standard deviations of the mean. The following formula determines a 95% confidence interval:

95\% \text{ confidence interval} = \bar{x} \pm 1.96 \frac{s}{\sqrt{n}}
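The evaluation quantities defined above are simple to compute; the sketch below evaluates the inaccuracy ξ = (ave − best)/best, the robustness r = m/n, and the 95% confidence interval x̄ ± 1.96·s/√n. The sample run lengths and counts are invented purely to show the calculation and are not the study's measurements.

import numpy as np

def inaccuracy(avg_mileage, best_mileage):
    # xi = (ave - best) / best
    return (avg_mileage - best_mileage) / best_mileage

def robustness(n_optimal_hits, n_runs):
    # r = m / n, the fraction of runs that reach the optimal solution
    return n_optimal_hits / n_runs

def confidence_interval_95(samples):
    # x_bar +/- 1.96 * s / sqrt(n), using the sample standard deviation
    x = np.asarray(samples, dtype=float)
    half_width = 1.96 * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - half_width, x.mean() + half_width

if __name__ == "__main__":
    run_lengths = [374.0, 375.2, 373.8, 374.6, 374.1]   # assumed per-run mileages
    print(inaccuracy(np.mean(run_lengths), min(run_lengths)))
    print(robustness(n_optimal_hits=11, n_runs=55))
    print(confidence_interval_95(run_lengths))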
4.4 Discussion

IACA has been applied to the emergency logistics supply chain path in order to increase logistics efficiency. The enhanced ACA served as the foundation for building the emergency logistics system model. The IACA algorithm demonstrated strong optimization capabilities based on the test results on the dataset. It was able to locate the ideal solution after 132 iterations, lowering the cost to more than 30% below the average cost. Additionally, the suggested model's delivery distance was greater, its average power consumption per logistics node was lower, its emergency material supply for disasters was higher, and its prediction accuracy, with an R2 of 0.98, was higher than that of the NN, TGWO, and IPSO. This suggests that the suggested approach has practical application value and good optimization capabilities that can successfully increase delivery efficiency and lower fuel costs. While this study enhanced the program's performance, it also made the method more complex, which reduced its computing efficiency. Therefore, in order to maintain performance, it will be necessary to significantly lower the algorithm's complexity in the future.

Benefits of the proposed approach include: lower operating costs due to the algorithm's optimization of routes and vehicle usage, which lowers labor, fuel, and maintenance expenses; when essential materials are delivered on time, penalties or reputational harm from delays are avoided; alternate routes are promptly found to avoid blocked highways; and time and expense are balanced to supply essential materials effectively.

5 Conclusion

The logistics sector is changing dramatically as a result of its ongoing expansion. The IAC algorithm was developed to reduce the amount of time it takes to distribute supplies from an Emergency Support Area (ESA) to Shanghai's 50 lockdown zones. Using data from Pudong, Shanghai, the system was evaluated and contrasted with three other intelligent optimisation techniques. The findings demonstrated that the algorithm's accuracy was enhanced by its local optimisation operation and that its relative error value was lower than that of the other algorithms. In addition to being applicable to discrete optimisation issues, this method can serve as a general algorithmic framework for a number of scenarios, including rescue supplies, resource allocation during wildfires, emergency rescue during floods, and the transportation of hazardous items. However, the impact of road networks on transportation and supply distribution, lockdown zone configuration, and population density is not taken into account in this article. Existing logistics systems also frequently lack the necessary flexibility and insight; to address this, the study modified the logistics supply chain system by integrating the TSP into IACA. Simulation experiments were used to confirm the model's superiority. Future research can produce predictions with more accuracy and fewer iterations, which enhances scalability, adaptability, and real-time capabilities while integrating developing technologies such as the IoT. This is achieved by reflecting the precision and flexibility of the data in the optimization model.

Limitations of IACA:
• Enhanced features like dynamic pheromone updates, hybridization, and real-time data integration add computing overhead;
• Even with improvements, ACO may not be able to handle the exponential expansion in the number of paths as the network size increases;
• The effectiveness of the enhanced ant colony is highly dependent on sensitive parameters, including the number of vehicles, the heuristic weighting, and the pheromone evaporation rate;
• If dynamic data is imprecise or delayed, the algorithm may offer less-than-ideal routes.
There is also a trade-off between time and money when shipping products to several high-priority sites.

References
[1] T. Kundu, J.-B. Sheu, and H.-T. Kuo, "Emergency logistics management—Review and propositions for future research," Transportation Research Part E: Logistics and Transportation Review, vol. 164, p. 102789, Aug. 2022. https://doi.org/10.1016/j.tre.2022.102789
[2] Z. Li and X. Guo, "Quantitative evaluation of China's disaster relief policies: A PMC index model approach," International Journal of Disaster Risk Reduction, vol. 74, p. 102911, May 2022. https://doi.org/10.1016/j.ijdrr.2022.102911
[3] Y. Zhang, Q. Ding, and J.-B. Liu, "Performance evaluation of emergency logistics capability for public health emergencies: perspective of COVID-19," International Journal of Logistics Research and Applications, pp. 1-14, Apr. 2021. https://doi.org/10.1080/13675567.2021.1914566
[4] F. Diehlmann, M. Lüttenberg, L. Verdonck, M. Wiens, A. Zienau, and F. Schultmann, "Public-private collaborations in emergency logistics: A framework based on logistical and game-theoretical concepts," Safety Science, vol. 141, p. 105301, Sep. 2021. https://doi.org/10.1016/j.ssci.2021.105301
[5] S. Jomthanachai, W.-P. Wong, K.-L. Soh, and C.-P. Lim, "A global trade supply chain vulnerability in COVID-19 pandemic: An assessment metric of risk and resilience-based efficiency of CoDEA method," Research in Transportation Economics, vol. 93, p. 101166, Dec. 2021. https://doi.org/10.1016/j.retrec.2021.101166
[6] "International Journal of Disaster Risk Reduction | Vol 74, May 2022 | ScienceDirect.com by Elsevier," Sciencedirect.com, 2022. Available: https://www.sciencedirect.com/journal/international-journal-of-disaster-risk-reduction/vol/74/suppl/C [Accessed: Oct. 26, 2024]
[7] Y. Liu, J. Li, M. Liu, and B. Jiao, "An Enhanced Ant Colony Algorithm-Based Low-Carbon Distribution Control Method for Logistics Leveraging Internet of Things (IoT)," Wireless Communications and Mobile Computing, vol. 2023, pp. 1-12, Nov. 2023. https://doi.org/10.1155/2023/5555221
[8] H. Jin, Q. He, M. He, S. Lu, F. Hu, and D. Hao, "Optimization for medical logistics robot based on model of traveling salesman problems and vehicle routing problems," International Journal of Advanced Robotic Systems, vol. 18, no. 3, May 2021. https://doi.org/10.1177/17298814211022539
[9] M. Yang, "Research on vehicle automatic driving target perception technology based on improved MSRPN algorithm," Journal of Computational and Cognitive Engineering, vol. 1, no. 3, pp. 147-151, 2022. https://doi.org/10.47852/bonviewJCCE20514
[10] A. De, M. Gorton, C. Hubbard, and P. Aditjandra, "Optimization model for sustainable food supply chains: An application to Norwegian salmon," Transportation Research Part E: Logistics and Transportation Review, vol. 161, p. 102723, May 2022. https://doi.org/10.1016/j.tre.2022.102723
[11] T. Yan, F. Lu, S. Wang, L. Wang, and H. Bi, "A hybrid metaheuristic algorithm for the multi-objective location-routing problem in the early post-disaster stage," Journal of Industrial and Management Optimization, vol. 19, no. 6, pp. 4663-4691, Jan. 2023. https://doi.org/10.3934/jimo.2022145
[12] A. Zhu and Y. Wen, "Green Logistics Location-Routing Optimization Solution Based on Improved GA Algorithm considering Low-Carbon and Environmental Protection," Journal of Mathematics, vol. 2021, pp. 1-16, Nov. 2021. https://doi.org/10.1155/2021/6101194
[13] M. Abbasian, Z. Sazvar, and M. Mohammadisiahroudi, "A hybrid optimization method to design a sustainable resilient supply chain in a perishable food industry," Environmental Science and Pollution Research, Aug. 2022. https://doi.org/10.1007/s11356-022-22115-8
[14] Z.-H. Sun, X. Luo, E. Q. Wu, T.-Y. Zuo, Z.-R. Tang, and Z. Zhuang, "Monitoring Scheduling of Drones for Emission Control Areas: An Ant Colony-Based Approach," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 11699-11709, Aug. 2022. https://doi.org/10.1109/tits.2021.3106305
[15] F. Lu, W. Feng, M. Gao, H. Bi, and S. Wang, "The Fourth-Party Logistics Routing Problem Using Ant Colony System-Improved Grey Wolf Optimization," vol. 2020, pp. 1-15, Oct. 2020. https://doi.org/10.1155/2020/8831746
[16] M. Ashour, R. Elshaer, and G. Nawara, "Ant Colony Approach for Optimizing a Multi-stage Closed-Loop Supply Chain with a Fixed Transportation Charge," Journal of Advanced Manufacturing Systems, pp. 1-24, Nov. 2021. https://doi.org/10.1142/s0219686722500159
[17] M. Abdolhosseinzadeh and M. M. Alipour, "Design of experiment for tuning parameters of an ant colony optimization method for the constrained shortest Hamiltonian path problem in the grid networks," Numerical Algebra, Control & Optimization, vol. 11, no. 2, p. 321, 2021. https://doi.org/10.3934/naco.2020028
[18] H. Chen, "Shanghai Starts 10 Emergency Supply Warehouses," 2022. https://contentstatic.cctvnews.cctv.com/snowbook/index.html?item_id=32234810742465611
[19] H. Zhang, J. Yan, and L. Wang, "Hybrid Tabu-Grey wolf optimizer algorithm for enhancing fresh cold-chain logistics distribution," PLoS ONE, vol. 19, no. 8, p. e0306166, Aug. 2024. https://doi.org/10.1371/journal.pone.0306166
[20] K. Tan, W. Liu, F. Xu, and C. Li, "Optimization Model and Algorithm of Logistics Vehicle Routing Problem under Major Emergency," Mathematics, vol. 11, no. 5, p. 1274, Mar. 2023. https://doi.org/10.3390/math11051274
[21] M. Chen, "RETRACTED ARTICLE: Optimal path planning and data simulation of emergency material distribution based on improved neural network algorithm," Soft Computing, vol. 27, no. 9, pp. 5995-6005, Apr. 2023. https://doi.org/10.1007/s00500-023-08073-4
https://doi.org/10.31449/inf.v49i16.6990 Informatica 49 (2025) 199-212 199
Application Method and Least Squares Support Vector Machine
Analysis of a Heat Pipe Network Leakage Monitoring System Using an
Inspection Robot
Xu Wang, Xiaobo Long, Guangwei Li, Jing Li, Yuweijia Zhao*
Kunming Metallurgy College, Kunming, Yunnan, 650033, China
E-mail: kmyz_wang79@163.com
*Corresponding author
Keywords: heat pipe network inspection robot, human-computer interaction, heat pipe network side leakage monitoring,
mobile platform, robot control
Received: August 24, 2024
With the maturity of the Internet and big data technology, heat supply intelligence has become a
development trend, and the traditional heat pipe network management mode is gradually transitioning to
an "intelligent heat pipe network". It has become a hot spot for research and development at home and
abroad. Combining big data technology, inspection robot control, heat pipe network leakage warning and
data monitoring, scientific monitoring and evaluation of the energy-saving operation of heat pipe
networks, and intelligent operation of heat pipes have become the current development trend. Whether in
terms of the economic benefits of energy-saving operation of heat pipe networks or the social benefits of
realizing intelligent operation and management of heat pipe networks, the study of a lateral leakage
monitoring system for heat pipe networks is of great significance. This paper examines a technique for
implementing a lateral leakage monitoring system for heat pipe networks using an inspection robot
control system, which includes a real-time tracking module utilizing LSSVM (Least Squares Support
Vector Machine) optimization to improve detection accuracy. The monitoring module can acquire, store,
visualize, and send sensor data and video data; the user-defined interface module receives and parses
XML user files from the server and generates user-defined interfaces and logic, thus realizing the human-
computer interaction function. The experimental findings show that enhancing the weight factor and
radial basis kernel function parameters of the LSSVM with the gravitational search technique resulted in
an outstanding classification accuracy of 99.99% with a classification time of only 55.938 seconds,
surpassing other optimization techniques.
Povzetek: Z uporabo robotskega nadzornega sistema in optimizirane metode LSSVM so avtorji razvili
inteligentni sistem za spremljanje puščanja v toplotnih cevnih omrežjih
1 Introduction
Leakage in a thermal pipe network is a sudden change in liquid flow head or flow pressure caused by the flow rate of the medium escaping the pipe exceeding a set value, which results in a leak [1]. After a leakage accident, the damaged section is usually sealed off to minimize energy loss, but this approach cannot accurately and reliably monitor the actual environmental conditions around the failure; delays in detecting and treating a leak can have serious consequences and bring huge economic losses to the enterprise [2-3]. Therefore, a great deal of research has been carried out at home and abroad on the diagnosis of leakage faults in heat pipe networks, and many leakage monitoring methods have been proposed. Although the various methods have certain limitations and still need to be improved [4], leakage monitoring of heat pipe networks and their compensators is of great significance and is widely referred to in the study of heat pipe network inspection robot monitoring systems [5].
Numerous studies have investigated different methods of inspecting and identifying leaks in pipeline networks, highlighting the significance of intelligent systems. Zholtayev et al. [6] created a smart pipe inspection robot with in-chassis motor actuation and AI-powered defect identification, showcasing sophisticated robotics incorporation in network tracking. Murtazin et al. [7] examined internal inspection techniques for district heating networks, highlighting the importance of resilient inspection techniques in energy systems. Wong and McCann [8] conducted an in-depth analysis of pipeline failure identification methods, ranging from acoustic sensing to cyber-physical systems, emphasizing the growing use of IoT solutions in fault detection. Liu et al. [9] presented an enhanced BP neural network algorithm for leakage detection in air conditioning water systems, demonstrating the efficacy of machine learning in detecting faults. Korlapati et al. [10] performed a thorough review of pipeline leak identification approaches, ranging from conventional to AI-based methods. Similarly, Yussof and Ho [11] examined water leak detection techniques in smart buildings, emphasizing the significance of these technologies in contemporary infrastructure. Langroudi and Weidlich [12] investigated predictive maintenance assessment techniques for district heating pipes, which added to service-life prediction techniques. Van Dreven et al. [13] addressed smart fault detection in district heating, finding significant patterns and obstacles in the area. Hossain et al. [14] used UAV image evaluation and machine learning to identify leaks in district heating, demonstrating the value of aerial monitoring for infrastructure surveillance. Finally, Vollmer et al. [15] compared anomaly detection techniques in thermal imagery for district heating leak identification, which advances the use of thermal imaging in fault identification. Table 1 shows a summary of these studies.
Table 1: Summary table

Citation | Title | Accuracy | Efficiency | Limitations | Innovations | Gaps in SOTA
[6] Zholtayev et al. (2024) | Smart Pipe Inspection Robot with AI-Powered Defect Discovery | 95% | High (real-time) | Scalability, cost | AI-powered discovery, high precision | Flexibility for different pipe types
[7] Murtazin et al. (2021) | Internal Inspection of District Heating Networks | 92% | Moderate | Constrained to magnetic testing | Non-destructive testing | Constrained to particular pipelines
[8] Wong & McCann (2021) | Pipeline Failure Discovery: Acoustic to Cyber-Physical Systems | 70-85% | Low (real-time) | Inconsistent accuracy, high cost | Discovery taxonomy, spatial enhancement | High computational cost
[9] Liu et al. (2022) | Leakage Analysis for Air Conditioning Water Systems | 86.96% | High | Fault location error | Two-stage diagnosis, BP neural network | Constrained real-time localization
[10] Korlapati et al. (2022) | Review of Pipeline Leak Discovery Techniques | 87% | Varies | No standardization | Review of subsea techniques | Variability in reliability
[11] Yussof & Ho (2022) | Water Leak Discovery in Smart Buildings | 81% | Varies | Real-time gaps in smart buildings | Incorporation with building automation | Absence of automated discovery
[12] Langroudi & Weidlich (2020) | Predictive Maintenance for District Heating Pipes | 85-90% | High | Constrained to district heating | Proactive AI-driven maintenance | Narrow concentration
[13] van Dreven et al. (2023) | Fault Discovery in District Heating with ML | 80-93% | Medium-High | Data restrictions | ML methods for fault discovery | Absence of open-source data
[14] Hossain et al. (2020) | UAV-Based Leakage Discovery for District Heating | 85% | Moderate | Constrained to UAV image examination | UAV with infrared discovery | Poor scalability for large systems
[15] Vollmer et al. (2021) | Anomaly Discovery in Water Networks with Self-Learning Algorithms | 90% | High | False positives in intricate systems | Self-learning algorithms | Difficulties with dynamic settings
Existing state-of-the-art (SOTA) techniques have many shortcomings, such as constrained flexibility for particular pipeline types, whereas the proposed system has wider applicability. Previous UAV and subsea detection techniques lack scalability, but this paper presents a scalable AI-based framework for large-scale networks. Real-time efficiency is hampered by high computational expenses; the proposed system improves this with improved algorithms.
Based on the above, this paper focuses on the design of the heat pipe network leakage monitoring system, including hardware circuits and software programs, which involves the following key technologies: the selection of each component. According to the different types of components, the corresponding models are selected to analyze the working conditions under various parameters. The microcontroller control module, the sensor data acquisition part, and peripheral devices such as the display alarm form the overall structure to complete the design scheme, and the thermal pipe network leakage monitoring system is designed around the inspection robot control system.

2 Leakage fault diagnosis in heat pipe networks

2.1 Leakage fault modelling
Leakage faults at key nodes of the heat pipe network are classified into three levels: normal, normal leakage, and severe leakage; a leak leads to a sudden drop in pressure inside the pipe and to changes in ambient temperature and conductivity [16-17]. Therefore, in this paper, the ambient temperatures T1, T2, T3, and T4, the ambient conductivities G1 and G2, and the internal pressure p of the pipe are selected as inputs, and the leakage level of the critical node of the heat pipe network is taken as the output to establish the leakage level discrimination model. The leakage level is expressed as {1, 2, 3} and is used as the output of the leakage fault model, while the ambient temperature and conductivity around the heat pipe network and the internal operating pressure of the pipe network, obtained online by the in-situ monitoring unit, are used as the model inputs. The multi-classification leakage fault diagnosis at key nodes of the heat pipe network consists of four steps: sample collection, data pre-processing, building and optimizing the multi-classification leakage fault diagnosis model, and model testing. Specifically, x training samples and y test samples are arbitrarily selected; the extracted training and test samples are normalized; the multi-classification heat pipe network critical node leakage fault diagnosis model is established; the model parameters are optimized; and the experimental samples are substituted into the established model for testing.
Since the characteristic indicators of temperature, conductivity, and pressure have different units and orders of magnitude, an indicator with a particularly large order of magnitude may dominate the classification. To eliminate differences in units and the effect of different orders of magnitude, the data must be pre-processed so that each indicator value falls within a uniform numerical range. In this paper, 500 training samples and 90 test samples are randomly selected, the independent variable of the leakage fault diagnosis model is denoted as x, and the leakage fault level of the key nodes of the heat pipe network obtained after random sampling is denoted as the dependent variable y. The conductivities G1 and G2 are processed logarithmically, the temperature and pressure are normalized, and the sample data of the independent variable after pre-processing are

T_{norm,i} = (T_i - T_{min,i}) / (T_{max,i} - T_{min,i}),  i = 1, 2, 3, 4
G_{norm,j} = lg(G_j),  j = 1, 2                                            (1)
p_{norm} = (p - p_{min}) / (p_{max} - p_{min})
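A minimal sketch of the pre-processing in equation (1), assuming the raw samples are held as a NumPy array with one column per indicator; the column ordering and the function name are illustrative, not taken from the paper.

import numpy as np

def preprocess(samples):
    """samples: array of shape (n, 7) with columns [T1, T2, T3, T4, G1, G2, p].
    Applies equation (1): min-max scaling for the temperatures and the pressure,
    base-10 logarithm for the two conductivities."""
    X = np.asarray(samples, dtype=float).copy()
    minmax_cols = [0, 1, 2, 3, 6]                       # T1..T4 and p
    lo = X[:, minmax_cols].min(axis=0)
    hi = X[:, minmax_cols].max(axis=0)
    X[:, minmax_cols] = (X[:, minmax_cols] - lo) / (hi - lo)
    X[:, [4, 5]] = np.log10(X[:, [4, 5]])               # G1, G2
    return X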
Using equation (1) we can obtain the normalized data for the training samples as well as the test samples, and the dependent variable, the leakage fault level at the key nodes of the heat pipe network, is given by equation (2):

y_i ∈ {1, 2, 3}    (2)

The least squares support vector machine algorithm is then used to build a classifier to achieve a multi-classification fault diagnosis model for critical node leakage in the thermal network. In this model we assume that the independent variable is x and define the nonlinear least squares support vector machine leakage fault diagnosis model as:

x = [T_1, T_2, T_3, T_4, G_1, G_2, p]    (3)

y(x) = ⟨ω, φ(x)⟩ + b    (4)

Given a set of data points that are closely related to the fault diagnosis of leakage at critical nodes of the thermal network, i.e. ambient temperature, ambient conductivity, and internal pipe pressure, d is the dimensionality of the model input variables, y is the result of the model classification, i.e. normal (1), normal leakage (2) and severe leakage (3), l is the total number of known data points, and b is a constant. Therefore, the target equation and the nonlinear decision function used in the input space can be defined as:

min (1/2)‖ω‖² + (C/2) Σ_{i=1}^{l} e_i²    (5)

y(x) = sgn( Σ_{i=1}^{S} y_i a_i K(x, x_i) + b )    (6)
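For illustration, the following sketch solves the binary LS-SVM problem behind equations (5)-(6) with an RBF kernel; the paper's three leakage levels would be handled by combining several such binary machines (for example one-vs-rest), and the variable names are assumptions rather than the authors' code.

import numpy as np

def rbf_kernel(X1, X2, sigma):
    # K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_lssvm(X, y, C, sigma):
    """Solve the LS-SVM dual system for a binary problem with labels y in {-1, +1}.
    The linear system follows from the KKT conditions of equation (5)."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    omega = (y[:, None] * y[None, :]) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # support values a_i and bias b

def predict_lssvm(X_train, y_train, a, b, X_new, sigma):
    # Decision function of equation (6): sign(sum_i y_i a_i K(x, x_i) + b)
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (a * y_train) + b)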
2.2 Optimization of the parameters of the leakage fault diagnosis model
In this paper, the gravitational search method is used to optimize the weight factor and the radial basis kernel function parameters of the least squares support vector machine. The gravitational search algorithm uses the law of gravity between two objects to guide the search for the optimal solution through the motion of each object. In this algorithm, each individual is considered as an object whose performance is measured by its mass; all these objects are attracted to each other through gravity, and this force causes all objects to move towards the object with the heavier mass [18]. The position of an object in motion corresponds to a candidate solution. The gravitational search method can be thought of as an isolated system of masses in which each object follows the law of gravity and the law of motion. Assuming a system with N objects, the position of the ith particle is defined as

x_i = (x_i^1, ..., x_i^d, ..., x_i^n),  i = 1, 2, ..., N    (7)

The interaction force and its parameters are then given by

F_{ij}^d(t) = G(t) · (M_{pi}(t) × M_{aj}(t)) / (R_{ij}(t) + ε) · (x_j^d(t) - x_i^d(t))    (8)

R_{ij}(t) = ‖x_i(t), x_j(t)‖_2    (9)

F_i^d(t) = Σ_{j=1, j≠i}^{N} rand_j · F_{ij}^d(t)    (10)

where rand_j is a random number generated in the interval [0, 1]. Therefore, according to Newton's laws of motion, the acceleration of particle i in d-dimensional space at time t is calculated as

a_i^d(t) = F_i^d(t) / M_{ii}(t)    (11)

v_i^d(t+1) = rand_i × v_i^d(t) + a_i^d(t),  x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)    (12)

where rand_i is a uniform random variable in the interval [0, 1], which gives a random character to the gravitational search. The gravitational constant G is initialized at the start and decreases with time to control the search accuracy. The gravitational and inertial masses are simply calculated by fitness evaluation: a heavier mass corresponds to a more efficient object, which exerts a higher gravitational force and moves with a slower velocity. Assuming that the gravitational mass is equal to the inertial mass, gravity and inertia are updated using equation (13), while equations (14)-(15) define the parameters best(t) and worst(t) for minimization and maximization problems, respectively.

M_{ai} = M_{pi} = M_{ii} = M_i,  i = 1, 2, ..., N
m_i(t) = (fit_i(t) - worst(t)) / (best(t) - worst(t))    (13)
M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t)

best(t) = min_{j∈{1,...,N}} fit_j(t),  worst(t) = max_{j∈{1,...,N}} fit_j(t)    (14)

best(t) = max_{j∈{1,...,N}} fit_j(t),  worst(t) = min_{j∈{1,...,N}} fit_j(t)    (15)

According to the above principles of the gravitational search method and the LSSVM algorithm, the idea of using the gravitational search method to optimize the LSSVM learning parameters is to search a certain region of the parameter space for a vector that minimizes the value of the target fitness function in equation (16):

min f(C, δ) = (1/n) Σ_{i=1}^{N} (y_i - ŷ_i)²    (16)

The basic principle is to optimally adjust the weight factor and radial basis kernel function parameters of the least squares support vector machine by exploiting the strong global search capability of the gravitational search method. The optimization steps are as follows: first, randomly select and normalize the training and test samples; given the population size N and the maximum number of iterations, randomly initialize the N particles; during each iteration, substitute the position of each particle into the least squares support vector machine model to obtain the fitness value of the current particle; calculate the sum of the forces acting on each particle in the different directions and the acceleration of each particle according to equation (10); compute the new particle position according to the update formulas for the particle velocity and position; and judge the termination condition. If the maximum number of iterations is reached, the iteration is terminated and the optimal parameter values are output.
The parameter tuning method for the gravitational search technique (GSA) was carefully planned to improve the LSSVM's efficiency. During this procedure, key parameters were adjusted, including the gravitational constant, agent mass, and initial population size. Particular settings comprised G0 = 100, a mass range of [1, 10], and 50 agents. The GSA procedure is depicted in the flowchart in Figure 1, starting with the initialization of agent positions and masses, followed by iterative updates using gravitational forces, and finally the evaluation of the LSSVM classification efficacy.
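A compact sketch of the gravitational search loop described above, used here to tune a parameter vector such as [C, δ] against the fitness of equation (16); the exponential decay of G, the numerical guards, and the helper names are assumptions for illustration, not the authors' implementation.

import numpy as np

def gsa_optimize(fitness, bounds, n_agents=50, n_iter=100, G0=100.0, alpha=20.0, seed=0):
    """Minimal gravitational search over a box-constrained space.
    `fitness` maps a parameter vector (e.g. [C, sigma]) to the MSE of eq. (16)."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    X = rng.uniform(lo, hi, size=(n_agents, len(lo)))      # agent positions, eq. (7)
    V = np.zeros_like(X)
    for t in range(n_iter):
        fit = np.array([fitness(x) for x in X])
        best, worst = fit.min(), fit.max()
        m = np.ones(n_agents) if best == worst else (fit - worst) / (best - worst)  # eqs. (13)-(14)
        M = m / m.sum()
        G = G0 * np.exp(-alpha * t / n_iter)                # decaying gravitational constant (assumed schedule)
        F = np.zeros_like(X)
        for i in range(n_agents):
            for j in range(n_agents):
                if i != j:
                    diff = X[j] - X[i]
                    R = np.linalg.norm(diff) + 1e-12
                    F[i] += rng.random() * G * M[i] * M[j] / R * diff   # eqs. (8)-(10)
        acc = F / (M[:, None] + 1e-12)                      # eq. (11)
        V = rng.random(X.shape) * V + acc                   # eq. (12)
        X = np.clip(X + V, lo, hi)
    return min(X, key=fitness)

# Usage sketch, assuming a hypothetical helper lssvm_mse(C, sigma) that trains the
# LS-SVM and returns the validation MSE:
# best_params = gsa_optimize(lambda p: lssvm_mse(p[0], p[1]), bounds=[(0.1, 1000.0), (0.01, 10.0)])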
A total of 590 samples were used for dataset selection, with 500 serving as training samples and 90 as testing samples. The training dataset had a balanced distribution across the fault classes, guaranteeing that each class was sufficiently represented to avoid bias. Each training sample was created from real-world functional data and represents a variety of fault situations. To guarantee an unbiased assessment of the model's efficacy, the test samples were chosen at random from the same dataset while retaining the identical distribution features. This comprehensive description of the parameter tuning and dataset choice procedures not only improves replicability but also strengthens the validation of the reported outcomes, allowing other researchers to apply the approach efficiently.

Figure 1: Flowchart of the GSA process

2.3 Leakage fault diagnosis and result analysis
We introduce the mean square error (MSE) as an index to evaluate the correct classification rate, which is calculated as

MSE = (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²    (17)

In this paper, the weight factor and radial basis kernel function parameters of the LS-SVM are optimized using the particle swarm algorithm, the cuckoo algorithm, and the gravitational search method; the resulting multi-classification fault diagnosis models for leak monitoring at key nodes of the heat pipe network are used to classify the test samples, and the classification results are compared. The 500 training samples were fed into the leak fault diagnosis model, and the parameters of the LS-SVM weight factor and radial basis kernel function were first optimized using the gravitational search method to obtain the optimal values; the run took 55.938 seconds and 99.99% of the test samples were correctly classified.
Figure 2 illustrates a comparison of two elements of the gravitational search technique. The top section displays a parametric merit search plot, demonstrating how the technique assesses various parameters to identify the best solution. The bottom section shows the classification findings for the test set, demonstrating the method's ability to correctly categorize data using the optimum parameters discovered during the merit search. In general, this figure depicts the relationship between the parameter optimization procedure and its effect on classification efficiency.

Figure 2: Comparison of the parametric merit search graph of the gravitational search method (top) and the test set classification of the merit search (bottom)

Optimization of the weight factor and radial basis kernel function parameters of the LS-SVM using the cuckoo algorithm resulted in an optimal value of 28.7282 and an optimum value of 15.8259; the run took 60.491 seconds, giving a 97.89% correct classification rate
for the test sample. In this paper, we optimize the weight factor and radial basis kernel function parameters of the LS-SVM based on the gravitational search method, the cuckoo algorithm, and the particle swarm algorithm, and use a randomly selected set of 90 test samples to check the correct classification rate. The findings show that the multi-classification fault diagnosis model, which uses a least squares support vector machine algorithm enhanced by the gravitational search technique, attains a classification accuracy of 99.99% in only 55.938 seconds. The results show that the multi-classification fault diagnosis model based on the least squares support vector machine algorithm optimized by the gravitational search method has the best classification effect and the lowest algorithm complexity.
Table 2 shows a confusion matrix comparing the Gravitational Search Algorithm (GSA) and the Cuckoo Algorithm. The GSA had 450 true positives (TP) versus 425 for the Cuckoo Algorithm, showing superior efficiency in finding positive cases. The GSA also recorded 40 true negatives (TN), which exceeded the Cuckoo Algorithm's 35. Particularly, the GSA had only two false positives (FP), whereas the Cuckoo Algorithm had ten, indicating higher precision. Furthermore, the GSA had three false negatives (FN) compared to the Cuckoo's fifteen, demonstrating its detection efficiency. Overall, the GSA outperformed the Cuckoo Algorithm.

Table 2: Confusion matrix

Metric | Gravitational Search Method (GSA) | Cuckoo Algorithm
True Positives (TP) | 450 | 425
True Negatives (TN) | 40 | 35
False Positives (FP) | 2 | 10
False Negatives (FN) | 3 | 15

Table 3 presents efficiency metrics for the GSA and the Cuckoo Algorithm. The GSA attained an impressive 99.99% accuracy, substantially higher than the Cuckoo Algorithm's 95.75%. The GSA had a precision of 99.95%, compared to 93.00% for the Cuckoo Algorithm, suggesting that it was more reliable at predicting the positive class. The GSA's recall was 99.90%, demonstrating its ability to detect pertinent instances, whereas the Cuckoo Algorithm had a recall of 94.50%. The F1 score for the GSA was 99.92%, while the Cuckoo Algorithm's was 93.75%, demonstrating the GSA's overall superiority. Finally, the ROC AUC score for the GSA was 0.999, indicating outstanding discriminative capacity, as opposed to 0.950 for the Cuckoo Algorithm. These metrics demonstrate the GSA's improved classification efficiency.

Table 3: Performance metrics

Metric | Gravitational Search Method (GSA) | Cuckoo Algorithm
Accuracy | 99.99% | 95.75%
Precision | 99.95% | 93.00%
Recall | 99.90% | 94.50%
F1 Score | 99.92% | 93.75%
ROC AUC Score | 0.999 | 0.950
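For reference, the quantities reported in Tables 2 and 3 are related by the standard confusion-matrix formulas. The short sketch below shows those definitions; the printed values are derived only from the Table 2 counts and are not guaranteed to reproduce Table 3 exactly, since the paper's figures come from its own per-class evaluation.

def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics derived from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# The GSA counts of Table 2 (450/40/2/3) give roughly accuracy 0.99,
# precision 0.996, recall 0.993 and F1 0.994.
print(classification_metrics(450, 40, 2, 3))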
2.4 Heat pipe network inspection robot control system
The thermal pipe network inspection robot system is divided into four modules: the server, the master controller (mobile side), the slave controller (motion controller), and the pipe robot mechanical system. The principle of operation and control of the thermal pipe network inspection robot system is based on the motion controller as the core [19]. When the user's control logic is burned into the motion controller and parsed, the motion controller sends control commands to the actuators to realize the motion control of the thermal pipe network inspection robot. The motion controller collects and processes the robot motion data from the photoelectric encoder and the mobile terminal in real time during the motion of the heat pipe network inspection robot and, according to the results of the data processing, adjusts the motion state of the pipe robot in real time to achieve closed-loop control of the pipe robot motion. In addition, the mobile side is equipped with a self-developed thermal network monitoring system, which sends the sensor data and video data to a remote server via network communication (TCP/IP communication protocol), enabling remote monitoring of the pipeline robot. The specific functions of the four components are as follows.
Firstly, the server. The server side is equipped with a self-developed remote thermal pipe monitoring system client, whose function is to receive the sensor data and video data from the mobile side and to visualize them; the server can also send control commands to the motion controller via the mobile side to adjust the motion of the pipe robot. Secondly, the mobile side. The mobile side is fitted with an on-site thermal pipe monitoring system client, which acquires sensor data and video data in real time using the mobile side's hardware-integrated sensor set and HD camera. The system processes the data in two ways: one is the local visualization and storage of the data; the other is the sending of the data to the server side and the motion controller. Thirdly, the multi-core heterogeneous motion controller. It is used to implement the user's control logic and is the core of the motion control of the heat pipe network inspection robot. The main hardware modules of the motion controller are a master MCU, two slave MCUs, a motor driver chip, a voltage converter chip, and a sensor set. Fourth, the mechanical structure of the thermal pipe network inspection robot: it carries the mobile end, the power module of the motion controller, etc., and is also the final executor of the motion controller's operating instructions.
In addition, the mechanical structure of the thermal pipe network inspection robot in this paper is mainly divided into the chassis, the walking mechanism, the drive module, and the articulation mechanism between them. The chassis is the main part of the mechanical system of the thermal pipe network inspection robot and carries the motion controller, the drive system, and the mobile end of the pipe robot. The travel mechanism and drive module are the key factors in ensuring that the thermal pipe network inspection robot walks normally in the pipeline, and they are the focus of the pipeline robot mechanism design.
The PC-microcontroller control system was chosen because the thermal network inspection robot in this paper needs to process information such as video and scanner data as well as execute control commands. Based on the functional requirements of the system and the tasks to be completed by the robot, the robot system is composed of a power supply system, a sensor system, an upper computer system, a lower computer system, a motion control unit, a bus communication system, a video system, and a laser scanner system. The power supply system is responsible for supplying power to all parts of the robot. The robot in this project requires power for the motor driver (24V); the sensor unit (12V, 5V); the main controller (5V), an ATmega series microcontroller; and the motion control system (5V). The robot system is controlled by the lower computer: the upper computer sends commands to the lower computer through the bus communication system, and the lower computer controls the normal operation of the robot according to the received commands.

3 Experimental procedures for testing the control system of the thermal pipe network inspection robot
Preparation: Assemble the robot's chassis, locomotion mechanism, drive module, and articulation system, making sure that the master MCU, two slave MCUs, motor driver chip, and sensor array are properly linked.
Microcontroller configuration: Set up the ATmega microcontroller with a 16 MHz clock frequency, 490 Hz PWM frequency, and 10-bit ADC resolution, and configure the UART communication rate to 9600 baud.
Sensor calibration: To guarantee precise readings, calibrate the incorporated sensors by adjusting the temperature sensors to a reference temperature of 25°C. The HD camera should be set up to capture video at 1080p resolution and 30 frames per second, while the laser scanner is set to a maximum detection range of 5 meters.
Control logic execution: Program control commands using user-defined logic into the motion controller, allowing the robot to perform particular movement patterns within the pipeline.
Test environment setup: To simulate real-world circumstances, build a scaled-down model of a 10-meter-long thermal pipe network with differing diameters (50 mm and 100 mm).
Conducting trials: Perform at least five trials to evaluate the robot's efficiency, recording control commands, sensor readings, and motion execution times, with a target execution time of less than 120 seconds.
Data gathering and examination: Gather and evaluate sensor and camera data to compare actual detection findings with expected results, with a target detection accuracy of 90%. Record any deviations in effectiveness; a sketch of this evaluation is given after the list below.
Expected vs. actual outcomes:
Detection accuracy: The detection accuracy target is set at 90% for detecting known defects in the thermal pipe network.
Execution time: The anticipated execution time should not surpass 120 seconds, and actual times will be logged for comparison.
Power requirements: The power supply system should supply 24V to the motor driver, 12V and 5V to the sensor unit, and 5V to the microcontroller and motion control system.
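As a worked illustration of the data gathering and examination step, the following sketch aggregates trial records against the stated targets (90% detection accuracy, 120 s execution time). The record values are hypothetical placeholders, not measured results from the paper.

# Hypothetical trial records; fields follow the quantities the procedure logs
# (defects found vs. known defects, execution time per trial).
trials = [
    {"defects_found": 9, "defects_known": 10, "execution_time_s": 112},
    {"defects_found": 10, "defects_known": 10, "execution_time_s": 108},
    {"defects_found": 9, "defects_known": 10, "execution_time_s": 117},
    {"defects_found": 10, "defects_known": 10, "execution_time_s": 121},
    {"defects_found": 9, "defects_known": 10, "execution_time_s": 110},
]

detection_accuracy = sum(t["defects_found"] for t in trials) / sum(t["defects_known"] for t in trials)
mean_time = sum(t["execution_time_s"] for t in trials) / len(trials)

print(f"detection accuracy: {detection_accuracy:.1%} (target 90%)")
print(f"mean execution time: {mean_time:.1f} s (target <= 120 s)")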
The overall structure of the robot control is shown in Figure 3.

Figure 3: Control system structure block diagram

Motor drive system design:
In this paper, the heat pipe network inspection robot has multiple motors: two main motors for walking, one camera rotation motor, one camera tilt servo, one scanner rotation motor, and one head lift motor. Since DC motors are used, this paper only focuses on the drive system of the main travel motors. The motion control of the motors is carried out by a central processor, an ATmega8 in the robot control module, which outputs PWM (pulse width modulated) signals to the motor drivers in software to carry out the forward, reverse, and stop actions of the motors.

Design of the upper computer control system:
The upper control system is mainly responsible for communication with the lower computer of the pipeline robot. Through the control knobs on the upper control panel it sends various commands to the robot, such as forward, backward, left turn, right turn, stop and other action commands; the camera control knob adjusts the camera head, for example camera rotation and tilt; the control panel also has a knob for the scanner head, which allows the scanner to be rotated, scanned and reset, as well as the adjustment of the brightness of the LEDs. All commands are sent to the robot via the host control system. The upper computer development process is shown in Figure 4.

Figure 4: Upper computer development process

Some of the program code for the upper computer is as follows.

////1. Capture user operation commands
temp_char=PINB;                        //Read port B
if(temp_char&BIT(0))                   //Rocker "run" trigger
{
    flag_run=1;                        //the robot has entered the forward state
    direction_left=1;                  //left wheel direction: forward
    direction_right=1;                 //right wheel direction: forward
    velocity_left=velocity_robot;      //left wheel speed
    velocity_right=velocity_robot;     //right wheel speed
}
else if(temp_char&BIT(1))              //rocker "back" trigger
{
    flag_run=0;                        //the robot leaves the forward state
    direction_left=2;                  //left wheel direction: backward
    direction_right=2;                 //right wheel direction: backward
    velocity_left=velocity_robot;
    velocity_right=velocity_robot;
}
else if(temp_char&BIT(2))              //rocker "left" trigger
{
    if(flag_run==0)                    //rotate in place
    {
        direction_left=2;              //left wheel backward
        direction_right=1;             //right wheel forward
        velocity_left=100;
        velocity_right=100;
    }
    else
    {
        direction_left=1;
        direction_right=1;
        velocity_left=velocity_robot;
        velocity_right=velocity_robot+100;   //speed up the right wheel to turn left
    }
}
else if(temp_char&BIT(3))              //rocker "right" trigger
{
    if(flag_run==0)                    //rotate in place
    {
        direction_left=1;              //left wheel forward
        direction_right=2;             //right wheel backward
        velocity_left=100;
        velocity_right=100;
    }
    else
    {
        direction_left=1;
        direction_right=1;
        velocity_left=velocity_robot+100;    //speed up the left wheel to turn right
        velocity_right=velocity_robot;
    }
}
else if(flag_run==1)                   //no trigger and the robot is in the forward state
{
    direction_left=1;
    direction_right=1;
    velocity_left=velocity_robot;
    velocity_right=velocity_robot;
}
else if(flag_run==0)                   //no trigger and the robot is stationary
{
    direction_left=3;                  //left wheel stopped
    direction_right=3;                 //right wheel stopped
    velocity_left=0;
    velocity_right=0;
}
if(temp_char&BIT(4))                   //robot automatic travel state
{
    direction_left=8;                  //left wheel automatic
    direction_right=8;                 //right wheel automatic
}
......

The main functions of the lower unit of the pipeline robot are to receive commands from the upper unit to control the motors; to collect data from sensors such as the tilt angle and return the information to the upper unit; and to provide power to the robot motors, lights and cameras and control the normal operation of the robot components. Part of the program code of the lower computer is as follows.

void main(void)
{
    init_devices();
    while(1)                           /////control cycle (ms)
    {
        value_adc[2]=value_adc[0];     //back up the last AD acquisition
        value_adc[3]=value_adc[1];     //back up the last AD acquisition
        flag_adc=0;
        adc_start(0);                  //start AD acquisition on channel 0
        Delayms(1);
        flag_adc=1;
        adc_start(1);                  //start AD acquisition on channel 1
        Delayms(1);
        if(flag_auto==1)
        {                              //calculate and output the motor speeds
            velocity_left=0;           //motor speed (P+D)
            velocity_right=0;          //motor speed (P+D)
            if(velocity_left>10)
            {
                DIRLEFT_H;             //forward rotation
                STOPLEFT_L;
                pwm_left=velocity_left;
            }
            else if(velocity_left<-10)
            {
                DIRLEFT_L;             //reverse
                STOPLEFT_L;
                pwm_left=-velocity_left;
            }
            else
            {
                STOPLEFT_H;            //brake
                pwm_left=0XFF;
            }
            if(velocity_right>10)
            {
                DIRRIGHT_H;            //forward rotation
                STOPRIGHT_L;
                pwm_right=velocity_right;
            }
            else if(velocity_right<-10)
            {
                DIRRIGHT_L;            //reverse
                STOPRIGHT_L;
                pwm_right=-velocity_right;
            }
            else
            {
                STOPRIGHT_H;           //brake
                pwm_right=0XFF;
            }
        }/////////////////////////////////////////////////////////////////
        ......
3.2 Heat pipe network monitoring system under the control of the inspection robot

3.2.1 Analysis of the functional requirements and overall architecture of the heat pipe network monitoring system
From the foregoing, this heat pipe network monitoring system needs to achieve the following functions.
(1) The heat pipe network monitoring system must be able to acquire data measured by the mobile side's integrated sensors and HD cameras in real time, and follow the TCP/IP and OTG communication protocols to send the collected data and the corresponding operation instructions to the server and the motion controller.
(2) The thermal network monitoring system should be able to visualize sensor data: to enable the user to observe specific sensor data, the system must be able to display the data as dynamic text; to visualize the trend of the data, the system must be able to display the data as dynamic curves.
(3) To provide access to historical sensor data and video data, and to help the user further confirm the operation status of the heat pipe network inspection robot and the internal environment of the pipe, the heat pipe network monitoring system must be able to store the acquired data in the database and support querying and deleting historical data.
(4) The main difference between this thermal network monitoring system and other thermal network monitoring systems is the ability to achieve human-machine interaction, i.e. by parsing the XML file sent by the server and dynamically generating user-defined interfaces and background logic, the thermal network inspection robot can be controlled to achieve the functions set by the user.
(5) To ensure the security of user information, the thermal network monitoring system needs a user login interface so that the user can only use the thermal network monitoring system after entering the corresponding user name and password.
Based on the analysis of the functional requirements of the monitoring software, the design of its overall architecture was completed. The main functions of the system are: to obtain sensor data and video data in real time and store the data in an intermediate database; to implement some basic functions such as listening to events (sensor listening events, SMS listening events, etc.) and sending and receiving broadcasts; and to provide a system and user code exception handling mechanism. When an exception occurs in the user's code, the corresponding dialog box pops up, and from the information in the dialog box the user can see the location and the cause of the error; this helps the user modify the code and avoids crashes or flashing caused by errors during the operation of the heat pipe network monitoring system, which ensures the normal operation of the heat pipe network monitoring system.
The system reads data from the intermediate database at regular intervals to visualize (dynamic text display and dynamic curve display), store, and send data. In the architecture of the heat pipe network monitoring system, the intermediate database, the basic functions, the exception handling, and the library functions form the underlying code of the heat pipe network monitoring system, and the user can call the functions in the library to achieve the corresponding logical functions, which reduces the difficulty of the user's development.

3.3 Interface development
Based on the analysis of the functions and architecture of the monitoring software, the interface structure of the monitoring software is divided into three modules: the login module, the monitoring module, and the user UI module. The login module verifies the user's information, and only when the user enters accurate information can the monitoring software be opened; when the information entered does not pass the background verification, the software prompts the user to enter it again or register an account until the login is successful.
The monitoring module is divided into six interfaces: the dynamic text display of data, the dynamic curve display of data, the video display, the sensor selection interface, the network connection interface, and the data query interface. In the dynamic text display and dynamic curve display interfaces, the data is refreshed once per second; the video monitoring interface can preview the video data collected by the mobile terminal in real time. To reduce the amount of data and allow the user to select the required sensor data according to the specific project needs, a sensor selection interface is designed. To communicate with the server, a network connection interface is designed, where the user only needs to enter the corresponding IP and port number to connect to the server and transfer the data. To meet the user's need to query historical data, a data query interface is designed.
The user UI module is used to display the interface dynamically generated by parsing the XML file sent by the server, enabling human-computer interaction.
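The paper does not give the schema of the user-defined XML files, so the sketch below assumes a minimal hypothetical layout purely to illustrate how such a file can be parsed into interface elements and control descriptors; element and attribute names are assumptions.

import xml.etree.ElementTree as ET

# Hypothetical user-defined interface file; the real schema used by the
# monitoring system is not specified in the paper.
SAMPLE_UI_XML = """
<ui name="valve-room-patrol">
    <control type="button" label="Start patrol" command="RUN"/>
    <control type="button" label="Stop" command="STOP"/>
    <sensor type="temperature" id="T1" refresh="1s"/>
</ui>
"""

def parse_user_interface(xml_text):
    """Turn a user-defined XML file into a list of interface descriptors."""
    root = ET.fromstring(xml_text)
    widgets = [{"tag": child.tag, **child.attrib} for child in root]
    return root.get("name"), widgets

name, widgets = parse_user_interface(SAMPLE_UI_XML)
print(name, widgets)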
The interface design of this heat network monitoring system uses Activity and Fragment components. Since a Fragment takes up less memory than an Activity, the interface design in this paper uses the more lightweight Fragment to improve the running efficiency of the application. The interface of the heat pipe network monitoring system takes the main interface as the core and dynamically loads the monitoring interface, the user interface, and so on. Data is passed between the interfaces by binding objects of the Bundle class, as shown in Figure 4.

Figure 4: Interface interaction process of the heat pipe network monitoring system

The monitoring system can selectively display sensor data, i.e. the user can select the required sensors in the sensor type selection interface, after which the system jumps to the data display interface to visualize the data (dynamic text display and dynamic curve display) according to the selection result. The process is as follows: first, the system converts the selected sensor types into a string and separates the different sensor names with the special symbol "%". The string is then bound to a Bundle object, and the setArguments() function is called to attach the Bundle object carrying the string data to the data display Fragment to be switched to; the system function for switching between Fragments is called to switch from the sensor selection interface to the data display interface; finally, the getArguments() function is called in the Fragment responsible for the data display to obtain the Bundle object, the Bundle.getString() function is called to obtain the String data passed from the sensor selection interface, and the string is split using the special symbol "%" as the separator. Each member of the resulting array is the name of a sensor selected by the user, and the system iterates through the array to obtain and display the real-time data of the selected sensors.
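The string packing and unpacking used for this hand-off can be illustrated outside the Android Bundle/Fragment machinery; the selection list below is an illustrative assumption.

# The monitoring software passes the chosen sensor names between interfaces as a
# single "%"-separated string; this sketch shows that round trip in plain Python.
selected_sensors = ["T1", "T2", "G1", "p"]

packed = "%".join(selected_sensors)     # what is stored in the Bundle
unpacked = packed.split("%")            # what the display Fragment recovers

assert unpacked == selected_sensors
print(packed, unpacked)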
3.4 Interface design
1. Main interface design
The main interface of the thermal network monitoring system in this paper adopts a "segmented" structure, i.e. the top of the interface is the options bar, the bottom is the visualization bar, and the middle is used as a container that loads different forms of interfaces. The purpose of this design is to make the functions of the system clearer, so that the user can jump to the required function with a one-key switch and operate the system more easily. The main monitoring interface is divided into two modules, the monitoring module and the user interface module, so the options bar at the top is divided into two sections: "Data Monitoring" and "User Interface". The visualization bar at the bottom is divided into three sections according to the functions of the monitoring module: "Text data", "Curve data" and "Video data", and the layout between the sections is a LinearLayout. The controls in the options bar and the visualization bar are not the basic controls provided by the mobile platform system, but rather developer-defined combinations of controls, with a uniform image at the top and text at the bottom, arranged in a LinearLayout. The background color becomes lighter when a control is selected.
In the design of the heat network monitoring system, the main interface was created by writing an XML layout file for the corresponding interface. This makes it easier to control the layout of the controls, and the orientation and layout_weight properties in LinearLayout allow the combined controls to be distributed according to a certain layout ratio. The main interface is visualized by loading the activity_main.xml file in the main Activity, as shown in the following code.

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    ......
    setContentView(R.layout.activity_main);
}

The onCreate() function is called when the Activity is initialized, and the setContentView() function called inside it is its core; the parameter R.layout.activity_main is the layout file of the main monitoring interface.
2. Design of the history data query interface
To facilitate viewing and deleting historical data, an independent query interface is designed for the user. The user can enter the start and end time in the edit boxes to query and delete data within a specific period according to their needs; when the queried period does not exist, the system prompts the user until a correct time is entered or the user returns. The data search process is shown in Figure 5. When the queried historical data exists, the heat pipe network monitoring system provides two ways of displaying the data, namely text display and curve display.
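A minimal sketch of the query-and-delete behaviour described above, assuming the stored samples are plain records with a timestamp field; the row contents and time format are illustrative assumptions.

from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def query_history(records, start, end):
    """Return stored samples whose timestamp lies in [start, end]; empty list if none."""
    t0, t1 = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    return [r for r in records if t0 <= datetime.strptime(r["time"], FMT) <= t1]

def delete_history(records, start, end):
    """Remove samples inside the chosen period and return the remaining records."""
    hits = {id(r) for r in query_history(records, start, end)}
    return [r for r in records if id(r) not in hits]

# Hypothetical stored rows; the real system keeps these in its database.
rows = [{"time": "2024-05-01 10:00:00", "T1": 41.2}, {"time": "2024-05-01 10:00:01", "T1": 41.3}]
print(query_history(rows, "2024-05-01 10:00:00", "2024-05-01 10:00:01"))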
Figure 5: Data search process

4 Usability testing
The thermal network surveillance system's usability testing analyses both the main interface and the historical data query interface to guarantee user satisfaction. The main interface is segmented, with an options bar for "Data Monitoring" and "User Interface," as well as a visualization bar that displays "Text data," "Curve data," and "Video data." This design provides easy access to operations and intuitive interaction via custom-designed controls. The historical data query interface allows users to enter start and end times for data retrieval and deletion, with error messages for invalid queries. Testing concentrates on task completion times, error rates, and user feedback in order to validate efficiency and identify enhancements.

4.2 Discussion
The presented multi-classification fault diagnosis model outperforms existing SOTA techniques, with a classification accuracy of 99.99% and a computation time of only 55.938 seconds. This enhancement is due to the algorithmic optimizations, particularly the gravitational search technique integrated with parameter tuning of the LSSVM. These optimizations boost the model's capacity to navigate the parameter space efficiently, leading to better classification efficiency than prior methods, which attained a maximum accuracy of 97.89% and required longer run times. This work makes a unique contribution by incorporating sophisticated optimization methods that not only raise accuracy but also decrease algorithmic intricacy, rendering the model more effective. While trade-offs between computation time and accuracy are prevalent in machine learning, the proposed solution provides practical benefits by offering better accuracy without substantially reducing processing speed, establishing it as a feasible choice for real-time fault identification in a variety of uses.
The complexity analysis of optimization algorithms such as the gravitational search algorithm (GSA), the cuckoo algorithm (CA), and particle swarm optimization (PSO) focuses on their time complexity and resource needs. GSA has a time complexity of O(n·k), rendering it effective for moderate-sized problems, while CA exhibits O(n·log n), allowing for rapid convergence on smaller datasets. PSO, with a complexity of O(n·m), can become expensive as dimensionality rises. These variations affect scalability: GSA's effectiveness renders it appropriate for larger heat pipe networks, whereas CA and PSO may incur higher computational overhead with large datasets, requiring optimizations for practical use.

5 Conclusion
In summary, the author has analysed the requirements of this heat pipe network monitoring system, focusing on the heat pipe network inspection robot, and completed the overall architecture design and basic interface design of the heat pipe network monitoring system. The specific functional requirements of the heat pipe network monitoring system were analysed as follows: user login, collection of sensor data, visualization of data, storage of data, communication, dynamic generation of user-defined interfaces and logic, and other functions. The overall architecture of the heat pipe network monitoring system was designed. According to the functional requirements of the system, the data acquisition, the basic functions, and the system and user code exception handling functions are unified and managed by the Service, which reduces code redundancy and facilitates maintenance at a later stage. Based on the functional requirements and architecture of the heat network monitoring system, the interface architecture of the system was designed, using Activity as the carrier and achieving the system interface interaction by dynamically loading Fragment layouts. From the application of common login methods and the characteristics of this software, the login module, the data query module, and the main interface of the heat pipe network monitoring system were designed. The design of this monitoring system is important for the improvement of monitoring efficiency.

Data Availability
All data are included within the article.

Conflicts of interest
The authors declare no conflicts of interest.

Funding statement
Not applicable.
References
[1] Shen, Y., Chen, J., Fu, Q., Wu, H., Wang, Y., & Lu, Y. (2021). Detection of district heating pipe network leakage fault using UCB arm selection method. Buildings, 11(7), 275. https://doi.org/10.3390/buildings11070275
[2] Perpar, M., & Rek, Z. (2020). Soil temperature gradient is a useful tool for small water leakage detection from district heating pipes in buried channels. Energy, 201, 117684. https://doi.org/10.1016/j.energy.2020.117684
[3] Al Qahtani, T., Yaakob, M. S., Yidris, N., Sulaiman, S., & Ahmad, K. A. (2020). A review of water leakage detection method in the water distribution network. Journal of Advanced Research in Fluid Mechanics and Thermal Sciences, 68(2), 152-163. https://doi.org/10.37934/arfmts.68.2.152163
[4] Gams, M., & Kolenik, T. (2021). Relations between electronics, artificial intelligence and information society through information society rules. Electronics, 10(4), 514. https://doi.org/10.3390/electronics10040514
[5] Li, W., Liu, T., & Xiang, H. (2021). Leakage detection of water pipelines based on active thermometry and FBG-based quasi-distributed fiber optic temperature sensing. Journal of Intelligent Material Systems and Structures, 32(15), 1744-1755. https://doi.org/10.1177/1045389x20987002
[6] Zholtayev, D., Dauletiya, D., Tileukulova, A., Akimbay, D., Nursultan, M., Bushanov, Y., ... & Yeshmukhametov, A. (2024). Smart pipe inspection robot with in-chassis motor actuation design and integrated AI-powered defect detection system. IEEE Access. https://doi.org/10.1109/access.2024.3450502
[7] Murtazin, I. I., Kozhevnikov, M. V., & Starikov, E. M. (2021). Development and application of methods of internal inspection of district heating networks. International Journal of Energy Production and Management, 6(1), 56-70. https://doi.org/10.2495/eq-v6-n1-56-70
[8] Wong, B., & McCann, J. A. (2021). Failure detection methods for pipeline networks: From acoustic sensing to cyber-physical systems. Sensors, 21(15), 4959. https://doi.org/10.3390/s21154959
[9] Liu, R., Zhang, Y., & Li, Z. (2022). Leakage diagnosis of air conditioning water system networks based on an improved BP neural network algorithm. Buildings, 12(5), 610. https://doi.org/10.3390/buildings12050610
[10] Korlapati, N. V. S., Khan, F., Noor, Q., Mirza, S., & Vaddiraju, S. (2022). Review and analysis of pipeline leak detection methods. Journal of Pipeline Science and Engineering, 2(4), 100074. https://doi.org/10.1016/j.jpse.2022.100074
[11] Yussof, N. A. M., & Ho, H. W. (2022). Review of water leak detection methods in smart building applications. Buildings, 12(10), 1535. https://doi.org/10.3390/buildings12101535
[12] Langroudi, P. P., & Weidlich, I. (2020). Applicable predictive maintenance diagnosis methods in service-life prediction of district heating pipes. Rigas Tehniskas Universitates Zinatniskie Raksti, 24(3), 294-304. https://doi.org/10.2478/rtuect-2020-0104
[13] van Dreven, J., Boeva, V., Abghari, S., Grahn, H., Al Koussa, J., & Motoasca, E. (2023). Intelligent approaches to fault detection and diagnosis in district heating: Current trends, challenges, and opportunities. Electronics, 12(6), 1448. https://doi.org/10.3390/electronics12061448
[14] Hossain, K., Villebro, F., & Forchhammer, S. (2020). UAV image analysis for leakage detection in district heating systems using machine learning. Pattern Recognition Letters, 140, 158-164. https://doi.org/10.1016/j.patrec.2020.05.024
[15] Vollmer, E., Ruck, J., Volk, R., & Schultmann, F. (2024). Detecting district heating leaks in thermal imagery: Comparison of anomaly detection methods. Automation in Construction, 168, 105709. https://doi.org/10.1016/j.autcon.2024.105709
[16] Kim, H., Lee, J., Kim, T., Park, S. J., & Kim, H. (2023). Advanced thermal fluid leakage detection system with machine learning algorithm for pipe-in-pipe structure. Case Studies in Thermal Engineering, 42, 102747. https://doi.org/10.2139/ssrn.4147041
[17] Pérez-Pérez, E. D. J., López-Estrada, F. R., Valencia-Palomo, G., Torres, L., Puig, V., & Mina-Antonio, J. D. (2021). Leak diagnosis in pipelines using a combined artificial neural network approach. Control Engineering Practice, 107, 104677. https://doi.org/10.1016/j.conengprac.2020.104677
[18] García-Ródenas, R., Linares, L. J., & López-Gómez, J. A. (2021). Memetic algorithms for training feedforward neural networks: an approach based on gravitational search algorithm. Neural Computing and Applications, 33(7), 2561-2588. https://doi.org/10.1007/s00521-020-05131-y
[19] Kazeminasab, S., & Banks, M. K. (2022). Towards long-distance inspection for in-pipe robots in water distribution systems with smart motion facilitated by a particle filter and multi-phase motion controller. Intelligent Service Robotics, 15(3), 259-273. https://doi.org/10.1007/s11370-022-00410-0
https://doi.org/10.31449/inf.v49i16.7779 Informatica 49 (2025) 213–234 213
Biometric-Based Secure Encryption Key Generation Using
Convolutional Neural Networks and Particle Swarm Optimization
Sahera A. S. Almola, Raidah S. Khudeyer, Hameed Abdulkareem Younis
Department of Computer Information Systems, College of Computer Science and Information Technology, University
of Basrah, Basrah, Iraq
E-mail: sahera.sead@uobasrah.edu.iq, raidah.khudayer@uobasrah.edu.iq, hameed.younis@uobasrah.edu.iq
*Corresponding author
Keywords: biometric verification, fingerprints, deep learning, particle swarm optimization (pso) algorithm, encryption
key generation
Received: December 7, 2024
With the rapid expansion of computer networks and information technology, ensuring secure data
transmission is increasingly vital—especially for image data, which often contains sensitive information.
This research presents a biometric-based encryption system that uses fingerprint recognition and deep
learning to generate strong, random encryption keys. Two convolutional neural networks (CNNs) are
employed: one to verify identity based on a user’s ID and another to extract fingerprint features for key
generation. These keys are optimized using Particle Swarm Optimization (PSO), enhancing their
randomness and resistance to brute-force attacks.
The system generates keys in real-time, eliminating the need for storage and minimizing the risk of theft or
leakage. To further improve security, encryption keys are automatically updated after every ten messages,
with different keys generated from multiple fingerprints of the same individual. Testing with the SOCOFing
dataset (6,000 original and 49,270 synthetic images) achieved 99.75% identity verification and 99.83%
classification accuracy. Performance metrics—entropy of 7.89, correlation factor of 0.00628, and zero
repetition—demonstrate high robustness. This approach offers a secure, adaptive, and personalized
encryption method ideal for sensitive domains like finance and healthcare.
Povzetek: Opisana je izvirna metoda za generiranje varnih šifrirnih ključev z uporabo prstnih odtisov, CNN
modelov in optimizacije roja delcev (PSO)
1 Introduction
Internet and network users share millions of color images daily, which are utilized in various applications such as telemedicine, remote learning, business, and military operations. Color images, in particular, often contain sensitive and detailed information, making them prime targets for unauthorized access and cyberattacks. Securing these images is crucial not only to prevent data loss during transmission but also to protect sensitive information from attackers. Various techniques are employed to secure digital images, such as watermarking, steganography, and image encryption. Encryption operates in two main stages: encryption and decryption. During encryption, the input image is transformed into an unreadable form using a secret key, while in decryption, the content is restored using the same key [1]. The encryption key is a fundamental element in the encryption and decryption processes, and it significantly determines the security system's strength. However, a critical challenge faced by encryption systems lies in managing the encryption key itself [2]. Traditional encryption methods require transmitting the encryption key to the recipient to decrypt the data. This approach introduces vulnerabilities, as any exposure of the key during transmission could lead to the compromise of the encrypted data. Consequently, there is an increasing need for systems that dynamically generate encryption keys on-demand at the user's end, eliminating the need for key transmission over networks [3]. This innovative approach ensures that the encryption key is generated locally each time data is decrypted, significantly reducing risks associated with key interception. It also eliminates the need for key exchange, adding an extra layer of security since unauthorized parties cannot generate the key even if communication is intercepted.
The keyless exchange method, when combined with biometric verification, offers a highly secure solution by minimizing the risk of key theft. This approach aligns with the methodology presented in this research. However, implementing such a solution poses significant challenges in the fields of secure computing and key management, as it requires a robust system to ensure the consistent and accurate generation of keys [4]. The importance of this research lies in emphasizing the generation of encryption keys locally at the user's end to safeguard data and mitigate risks associated with key transmission over networks. This is particularly critical for securing color images, as their high information content often correlates with increased sensitivity, making them especially vulnerable to sophisticated attacks.
To address these challenges, advanced techniques based on artificial intelligence and machine learning,
particularly deep learning, have emerged. One notable technique involves using deep learning to generate encryption keys from fingerprints. This method leverages the extraction of unique features from fingerprints, converting them into robust, non-repetitive encryption keys to ensure high data security [5]. This method addresses limitations in traditional encryption systems, such as the need for key transmission over networks. Since a fingerprint is a unique biometric identifier that cannot be easily copied or mimicked, it serves as an ideal source for generating encryption keys. Moreover, deep learning enhances the accuracy and strength of the generated keys by utilizing deep neural networks to analyze biometric images and extract unique features for each fingerprint [6]. This approach also resists advanced threats, including brute-force and quantum encryption attacks, by dynamically generating encryption keys in real time. The added layer of complexity and secrecy prevents unauthorized parties from accessing the keys, even if communication data is partially intercepted [7]. The integration of deep learning in generating encryption keys from fingerprints represents a significant advancement in information security. This approach combines robust security measures with individual privacy, paving the way for building encryption systems that are highly resistant to breaches and better equipped to address modern security challenges.

The remainder of this paper is organized as follows: Section 2 reviews related works, while Section 3 provides background on the key techniques utilized in this research. Section 4 explains the management of secret keys. Section 5 details the proposed method. Section 6 focuses on experimental results and performance analysis. Section 7 discusses the results, and Section 8 concludes this study.

2 Related works

The integration of biometric data, chaotic systems, and deep learning in encryption key generation has been a prominent research area. Various studies have explored innovative approaches to enhance the security and robustness of encryption systems. Hashem and Kuban (2023) [8] introduced a system that leverages fingerprint biometrics to generate long, random encryption keys. The approach involves preprocessing fingerprint images to remove noise, utilizing a modified VGG-16 convolutional neural network (CNN) to extract unique features, and employing transfer learning to build a key generation model without the need for retraining. Erkan et al. (2024) [9] proposed a secure image encryption framework that combines a chaotic logarithmic map with a deep CNN for key generation. Their system incorporates advanced operations such as permutation, DNA encoding, diffusion, and bit-reversal to ensure security. The robustness of this framework was validated through comprehensive analyses, including key sensitivity and resistance to various attacks, demonstrating superior performance compared to traditional encryption methods. Quinga Socasi, Zhinin-Vera, and Chang (2020) [10] developed a method for generating encryption keys from alphanumeric passwords using an autoencoder neural network. Their experiments revealed that this method outperforms conventional algorithms, particularly when encrypting small text files, making it highly resistant to cracking attempts. Wu et al. (2022) [11] presented a biometric key generation framework that uses fingerprints to achieve over 1024-bit key strength and 98% accuracy. However, their method depends on a predefined pipeline and fuzzy extractors for key stabilization. In contrast, the method proposed in this research dynamically extracts high-resolution fingerprint features using deep learning models, ensuring greater adaptability across datasets. These features are combined with chaotic encryption systems to enhance randomness and security. Furthermore, Particle Swarm Optimization (PSO) is employed to optimize the generated keys, achieving over 99% accuracy and producing 1024-byte keys without requiring stabilization layers. This approach demonstrates superior flexibility and security for real-world IoT applications. Alesawy and Muniyandi (2016) [12] investigated data security in cloud environments using random encryption keys. Their study analyzed the impact of incorporating Elliptic Curve Diffie-Hellman (ECDH) keys and demonstrated significant improvements in efficiency and performance by integrating Artificial Neural Networks (ANNs) with ECDH and genetic algorithms, despite increased processing times for larger datasets. Saini and Sehrawat (2024) [13] proposed a technique for generating unique encryption keys by combining an autoencoder network with hashing techniques and prime numbers derived from the MNIST dataset. To enhance security, the system incorporates XOR operations and Blum-Blum-Shub (BBS) generators. Extensive testing confirmed the robustness of this approach against attacks. Kurtninykh, Ghita, and Shiaeles (2021) [14] addressed the complexities of cryptographic key management in systems with increasing users and applications. They evaluated five key management systems, including Hashicorp Vault and Pinterest Knox, focusing on features such as security, scalability, and access control. The study concluded that Hashicorp Vault is particularly suitable for small businesses due to its superior security features. A summary of the related studies is provided in Table 1 for further reference.
Table 1: Previous works on key generation
This research builds upon the foundations laid by these studies, emphasizing the dynamic generation of encryption keys using deep learning and chaotic systems to address challenges in key management and enhance security. The comparison in Table 1 clearly demonstrates the superiority of our proposed method over all previous approaches. The proposed method utilizes dynamic keys generated by deep learning networks, which significantly enhance randomness and security. Moreover, the key is non-portable, non-persistent, and achieves the largest size and highest accuracy compared to other methods.

3 Background

This section addresses two main techniques: CNNs and PSO, which form the foundation of the methodology proposed in this research. In the following paragraphs, we provide a summary of each technique and explain its significance in the study.

A. CNNs are advanced models in the field of deep learning, specifically designed to handle grid-like data, such as images. In this research, two CNN models were used to generate an encryption key based on fingerprint images. Table 2 summarizes the components of each model used in the work.
Table 2: Components of CNN models used
Layer (type)                                    Output shape          Parameters (#)
Conv2D (conv2d_1)                               (None, 92, 92, 32)    832
BatchNormalization (batch_normalization_1)      (None, 92, 92, 32)    128
MaxPooling2D (max_pooling2d_1)                  (None, 46, 46, 32)    0
Conv2D (conv2d_2)                               (None, 42, 42, 64)    51,264
BatchNormalization (batch_normalization_2)      (None, 42, 42, 64)    256
MaxPooling2D (max_pooling2d_2)                  (None, 21, 21, 64)    0
Conv2D (conv2d_3)                               (None, 19, 19, 128)   73,856
BatchNormalization (batch_normalization_3)      (None, 19, 19, 128)   512
MaxPooling2D (max_pooling2d_3)                  (None, 9, 9, 128)     0
Dropout (dropout_1)                             (None, 9, 9, 128)     0
Flatten (flatten_1)                             (None, 10368)         0
Dense (dense_1)                                 (None, 1024)          10,617,856
Dropout (dropout_2)                             (None, 1024)          0
Dense (dense_2)                                 (None, 600)           615,000
The first model was designed to identify a person's identity based on their ID number. After confirming the person's identity, the second model identifies the selected fingerprint and extracts its features. Both models rely on convolutional layers to automatically and progressively extract important features from the input data, making them effective in performing their tasks, which, in turn, aids in generating strong encryption keys by analyzing fine patterns in the images. The two models were trained using the backpropagation technique with a suitable loss function for each task. This architectural design was chosen to achieve accurate performance in recognizing the identity of the fingerprint owner through the identifier number in the file name, and then generating an encryption key based on the unique features of the fingerprint using two convolutional neural networks.
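For illustration, the layer stack of Table 2 can be written as a small Keras model. This is a sketch only: the 96×96×1 input shape is inferred from the parameter counts in Table 2, while the kernel sizes (5×5, 5×5, 3×3), activation functions, and dropout rates are assumptions consistent with those counts rather than values stated in the text; num_classes would be 600 for the SubjectID model and 10 for the finger-number model.

from tensorflow.keras import layers, models

def build_fingerprint_cnn(num_classes=600):
    # Layer stack matching the output shapes and parameter counts in Table 2.
    return models.Sequential([
        layers.Input(shape=(96, 96, 1)),                  # assumed 96x96 grayscale input
        layers.Conv2D(32, 5, activation="relu"),          # (92, 92, 32), 832 params
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),                           # (46, 46, 32)
        layers.Conv2D(64, 5, activation="relu"),          # (42, 42, 64), 51,264 params
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),                           # (21, 21, 64)
        layers.Conv2D(128, 3, activation="relu"),         # (19, 19, 128), 73,856 params
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),                           # (9, 9, 128)
        layers.Dropout(0.25),
        layers.Flatten(),                                 # 10368 features
        layers.Dense(1024, activation="tanh"),            # 10,617,856 params; dense feature layer
        layers.Dropout(0.25),
        layers.Dense(num_classes, activation="softmax"),  # 615,000 params for 600 classes
    ])

model = build_fingerprint_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])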
Pseudo-code for the PSO algorithm

1. Initialize parameters:
   o Define bounds:
     • Lower bound (lb) = 0
     • Upper bound (ub) = 255
   o Set PSO parameters:
     • Number of particles = len(keys)
     • Maximum iterations = 200
     • Inertia weight (w) = 0.9
     • Cognitive coefficient (c1) = 0.5
     • Social coefficient (c2) = 0.5
   o Set random seed for reproducibility.
2. Initialize particles:
   o Convert keys to a NumPy array.
   o Set initial particle positions = keys.
   o Set initial velocities = zeros.
   o Initialize personal bests:
     • Personal best positions = initial positions.
     • Personal best scores = evaluate fitness for each particle.
   o Find global best:
     • global_best_position = position with the best score.
     • global_best_score = best personal score.
3. Run PSO optimization:
   For each iteration in range(num_iterations) do:
   o For each particle do:
     • Update velocity:
       new_velocity = (w × current velocity)
                      + (c1 × random factor × (personal best − current position))
                      + (c2 × random factor × (global best − current position)).
     • Update position:
       new_position = current position + new_velocity.
       Clip positions to bounds (lb, ub).
     • Evaluate fitness of the new position.
     • Update personal best position and score.
   o Update global best:
     • If any particle's score is better than the global best score, update the global best position and score.
4. Output results:
   o Convert global_best_position to integers (best_key).
   o Compute best_entropy_value using the fitness function.
Figure 1: PSO algorithm
B. PSO (Particle Swarm Optimization) is an optimization algorithm inspired by the collective behavior of birds or fish. It involves a group of particles, each representing a potential solution in the solution space. Each particle adjusts its movement based on its own experience and the experiences of neighboring particles, with the aim of reaching the optimal solution. PSO is known for its efficiency and ability to find optimal solutions in multi-dimensional spaces. In this research, PSO is applied to optimize the process of encryption key generation. The algorithm enhances the randomness and strength of the generated keys, ensuring that they are both secure and resistant to attacks. PSO improves the key generation process by fine-tuning the key parameters in real time, making it more robust against potential security threats. This approach adds an extra layer of security, ensuring that the keys are not only unique and non-repetitive but also resilient to various forms of attacks. The use of PSO ensures that the final encryption keys are both optimized for security and generated dynamically, without the need for permanent storage, thus reducing the risks of key leakage or unauthorized access. Key enhancement using PSO: the PSO algorithm is used to enhance the quality of the initial key, making it stronger and more secure. Figure 1 illustrates the detailed steps of the PSO algorithm using pseudo-code. This pseudocode reflects the essence of the PSO algorithm applied to optimize encryption keys based on the fitness function (such as randomness or security). The process iteratively adjusts the position (key) and velocity of the particles to find the optimal encryption key with high security.
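As a concrete illustration of Figure 1, the following is a minimal NumPy sketch of the key-optimization loop. The fitness function is assumed here to be the Shannon entropy of the candidate key's byte values (the text describes fitness only generically as randomness/security, so this is an assumption); the parameter values follow the pseudo-code (w = 0.9, c1 = c2 = 0.5, 200 iterations, bounds 0–255).

import numpy as np

def entropy_fitness(key):
    # Shannon entropy of the byte values in the key (higher = more random).
    counts = np.bincount(key.astype(np.uint8), minlength=256)
    p = counts[counts > 0] / key.size
    return -np.sum(p * np.log2(p))

def pso_optimize_key(keys, num_iterations=200, w=0.9, c1=0.5, c2=0.5, lb=0, ub=255, seed=42):
    # Each particle is one candidate key (a vector of byte values).
    rng = np.random.default_rng(seed)
    positions = np.array(keys, dtype=float)
    velocities = np.zeros_like(positions)
    pbest_pos = positions.copy()
    pbest_score = np.array([entropy_fitness(p) for p in positions])
    g = int(np.argmax(pbest_score))
    gbest_pos, gbest_score = pbest_pos[g].copy(), pbest_score[g]

    for _ in range(num_iterations):
        r1 = rng.random(positions.shape)
        r2 = rng.random(positions.shape)
        velocities = (w * velocities
                      + c1 * r1 * (pbest_pos - positions)
                      + c2 * r2 * (gbest_pos - positions))
        positions = np.clip(positions + velocities, lb, ub)
        scores = np.array([entropy_fitness(p) for p in positions])
        improved = scores > pbest_score
        pbest_pos[improved] = positions[improved]
        pbest_score[improved] = scores[improved]
        if pbest_score.max() > gbest_score:
            g = int(np.argmax(pbest_score))
            gbest_pos, gbest_score = pbest_pos[g].copy(), pbest_score[g]

    return gbest_pos.astype(np.uint8), gbest_score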
4 Secure key management

Secure key management is a critical process to ensure the protection of encrypted data across encryption systems. In the proposed methodology, the focus is on generating cryptographic keys in real time without permanent storage, thus reducing the risks associated with key leakage. However, temporary handling and protection of keys during their lifecycle remain essential. Below is a detailed explanation of the steps and importance of secure key management, updated to reflect the real-time generation approach:

1. Key generation: In the proposed system, keys are generated dynamically and in real time using advanced techniques such as artificial neural networks, particularly convolutional neural networks (CNNs). This approach ensures that the keys are both highly secure and non-repetitive, avoiding the need for long-term storage. These keys are designed to be sufficiently random and robust, minimizing the possibility of guessing or tampering.

2. Temporary key handling: While keys are not stored permanently, they are managed securely during their temporary existence within the system. During encryption or decryption processes, the keys are stored in memory with strict safeguards, such as memory encryption or secure enclaves, to prevent unauthorized access. Once the operation is complete, the keys are securely erased from the system to eliminate any residual risk.

3. Key distribution: Since the system eliminates the need for traditional key exchange, the reliance on secure protocols like SSL/TLS or Diffie-Hellman for key distribution is significantly reduced. Instead, the generated key remains local to the system, mitigating risks associated with interception during transmission [16].

4. Key rotation: In systems where keys are reused for multiple sessions or extended periods, regular key rotation is critical. However, in the proposed system, each key is uniquely generated for a specific session or operation, inherently providing the benefits of key rotation by design.

5. Key revocation: Although the system minimizes the use of persistent keys, mechanisms for immediate key invalidation are essential for scenarios involving session-based or temporarily stored keys. These mechanisms ensure that any exposed or misused keys are rendered unusable promptly [17].

6. Importance of key management in real-time systems: The proposed approach emphasizes the importance of secure key handling during the active lifecycle of keys. By avoiding permanent storage and focusing on real-time generation and temporary protection, the system significantly reduces the risks associated with key leakage or unauthorized access. This approach aligns with best practices in modern cybersecurity by combining the advantages of real-time key generation with robust temporary key management to ensure the highest level of data protection throughout the encryption process [16].

5 Proposed method

Figure 2 presents the diagram for the proposed encryption key management and generation. The diagram consists of three stages: "Securing Communication and Transferring Confidential Information", "Generating Encryption Key using CNN and Encrypting the Image", and "Generating Encryption Key using CNN and Decrypting Image".

Figure 2: Proposed method diagram
The proposed method consists of three main parts. The first part begins with an algorithm for securing communication and managing encryption keys. This is followed by the second part, which involves the process of generating the encryption key and encrypting the image. Finally, the third part focuses on decrypting the image after the key has been generated. Each of these parts will be explained in detail later.

Part One: Securing communication and managing confidential information transfer

The first part of Figure 2 illustrates an algorithm designed to ensure secure communication and reliable key management between branches and the main branch. When a branch requests access to sensitive information (such as encrypted images), the main branch fulfills this request by sending the requested information after encrypting it with a secure key, ensuring data protection during transmission. The user ID is used to control access.

Algorithm execution steps

The algorithm is executed in cooperation with the following two parts in the diagram as follows:

1. Starting the process (start): The process begins by initializing the user's counter Counter[ID] to zero.
2. Entering the ID number: The system prompts the user to input their identification number to verify their identity.
3. Verifying the ID range (ID in 1..600): The system checks whether the entered ID number falls within the allowed range (1 to 600).
   • If the number is outside the range, an error message is displayed, and the user is asked to re-enter the ID.
   • If the number is valid, the process moves to the next step.
4. Checking the match with the exit indicator (ID in exit): The system compares the entered ID with the exit indicator list.
   • If a match is found, the process is terminated.
   • If no match is found, the process continues to the next step.
5. Incrementing the message counter (Counter[ID] += 1): If the ID is valid and not listed in the exit indicator, the user's message counter is incremented by 1.
6. Managing the number of sent messages (dynamic key management): The system checks whether the number of messages sent by the user has exceeded the allowed limit (10 messages).
   • If the limit is exceeded, the counter is reset to 1.
   • If the limit is not exceeded, the current counter is used as an index for generating the encryption key.
   This mechanism ensures unique encryption keys for each set of messages, enhancing data security (a short sketch of this counter logic is given at the end of this part). Additionally, it raises a critical question: "Can biometric fingerprint data generate dynamic encryption keys resistant to quantum attacks?" This approach aims to strengthen the security of biometric keys against advanced threats such as quantum attacks.
7. Sending the request to the branch (send request to branch): The request containing the ID and the fingerprint index (P) is sent to the second branch for processing.
   • In the second part: A key is generated for image encryption, and the encryption process is executed. After encryption, the encrypted image is sent back to the first part.
   • In the third part: A new key is generated to decrypt the image. Once decryption is completed, the data is returned to the first part for the remaining steps.

Note: The details of the second and third parts will be explained in the following sections of the document for a precise and comprehensive understanding. In this way, the three parts form an integrated system that ensures secure communication and the safe transmission of sensitive information effectively.

Algorithm features

• Biometric security: Fingerprints are used as a means to verify user identities, which reduces the risks of unauthorized access.
• Synchronization: The system relies on concurrent processing, enhancing performance efficiency and reducing response times for requests.
• Dynamic key management: Each key is generated uniquely for each user based on their fingerprint, increasing the difficulty of breaching the system.

This algorithm ensures effective protection of encrypted data and enhances the security of communications between branches, making it an excellent choice for systems that require a high level of security and privacy.
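The ID-validation and message-counter logic of steps 1–6 can be sketched as follows. The function name, the in-memory dictionary standing in for Counter[ID], and the empty default exit list are illustrative assumptions, not part of the published algorithm.

def next_key_index(counter, user_id, limit=10, valid_ids=range(1, 601), exit_list=frozenset()):
    # Steps 3-6: validate the ID, check the exit list, then advance the message counter.
    if user_id not in valid_ids:
        raise ValueError("ID outside the allowed range 1..600")
    if user_id in exit_list:
        return None                                     # step 4: a match with the exit indicator terminates the process
    counter[user_id] = counter.get(user_id, 0) + 1      # step 5: increment the message counter
    if counter[user_id] > limit:                        # step 6: more than 10 messages resets the counter
        counter[user_id] = 1
    return counter[user_id]                             # used as fingerprint index P for key generation

# Example: the 11th message from user 42 reuses index 1, so a fresh key is generated.
counter = {}
indices = [next_key_index(counter, 42) for _ in range(11)]   # 1..10, then 1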
Part Two: Encryption key generation using CNN and image encryption

The encryption key is generated using CNN based on the fingerprint. This process is carried out as specified in Part 2 of the diagram, which includes the following operations:

1. Database loading phase: This step is considered one of the main preparatory phases in the system to ensure the readiness of the data and models required to achieve accuracy and security in encryption key generation. In this research, the SOCOFing database was used, which contains fingerprints from 600 people of African descent, with each person having 10 fingerprints, resulting in a total of 6,000 original fingerprints. Additionally, synthetic groups were created with three levels of variation in the fingerprints: minor changes (Easy), medium changes (Medium), and significant changes (Hard). The total number of synthetic fingerprints used in training was approximately 49,270. The variation fingerprints were used for training the model, while the original fingerprints were used solely for testing.

2. Data preprocessing phase: The following processes are included:

3. Image size standardization: To ensure that all images in the database are compatible with the model requirements, the dimensions of all images are standardized. A common size, such as 96×96 pixels, is often chosen to prepare the images for efficient model processing. The formula for resizing the images can be expressed mathematically as shown in Equation (1) below:

I′(x′, y′) = I(x′/Sx, y′/Sy)   (1)

Where I(x, y) is the original image, I′(x′, y′) is the image after resizing, and Sx and Sy represent the scaling factors in the image dimensions [18].

A. Image enhancement using histogram equalization:

The histogram equalization technique was applied to enhance contrast in fingerprint images and highlight fine details. This technique is one of the fundamental methods in image processing and quality enhancement, aiming to improve the distribution of grayscale levels in the image to make fine details more visible. In images with low contrast, gray values may cluster within a narrow range, leading to the loss of fine details in dark or bright areas. Histogram equalization is used to address this issue by improving the distribution of these gray values over a broader range of available colors, enhancing contrast and making details easier to detect. The process of adjusting the tonal gradients in the image is carried out using the following equation (2) [19]:

H′(I) = ((CDF(I) − CDFmin) / ((N×M) − CDFmin)) × (L − 1)   (2)

The histogram equalization process involves several key parameters that affect the final outcome of the operation:

1. Cumulative distribution function (CDF): This is the primary factor that determines how grayscale values are redistributed in the image. The CDF accumulates grayscale values progressively from the lowest to the highest and is used to adjust the distribution. Through this function, the grayscale value distribution in the image is calculated, and adjustments are made to spread these values evenly across the color range.

2. Minimum non-zero value (CDFmin): This refers to the smallest non-zero value in the cumulative distribution function. It is used to determine how grayscale values in the image will be adjusted to achieve a more balanced distribution. For example, if the grayscale values in the image are concentrated around a particular value, utilizing this minimum helps improve the distribution of those values without significantly affecting the overall contrast of the image.

3. Image size (N×M): This refers to the number of pixels in the image. The larger the image (i.e., a greater N×M), the more opportunities there are for accurately redistributing grayscale values. However, it is important to note that image size can impact processing speed, as larger images require more computations.

4. Number of gray levels (L): Typically, L = 256 in grayscale images (meaning there are 256 possible tonal levels ranging from 0 to 255). The number of gray levels defines the range of colors that can be distributed across the image. In images with a high number of gray levels, tonal gradations can be distributed more evenly, leading to better contrast enhancement.

When applying this technique, the range of grayscale values in the image is expanded, and these values are evenly distributed across the color range, leading to increased contrast. This enhanced contrast reveals fine details in the image, such as the minutiae in fingerprints, which might be poorly visible in low-contrast images. In the case of fingerprints, fine details such as ridges and patterns are often crucial for analysis and classification. By using histogram equalization, the clarity of these fine details can be improved, aiding in better feature extraction of the fingerprint and achieving higher performance in systems that use fingerprint recognition. Figure 3 shows an example of fingerprints before and after contrast enhancement using histogram equalization. Notice how the enhanced images display finer and clearer details compared to the original images.

Figure 3: (a) Original image, (b) image after histogram equalization
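A minimal NumPy sketch of the mapping in Equation (2), assuming an 8-bit grayscale input; library routines such as OpenCV's equalizeHist provide equivalent functionality.

import numpy as np

def equalize_histogram(img, L=256):
    # Histogram equalization following Equation (2); img is an 8-bit grayscale array.
    hist = np.bincount(img.ravel(), minlength=L)     # per-level pixel counts
    cdf = np.cumsum(hist)                            # cumulative distribution function
    cdf_min = cdf[cdf > 0].min()                     # smallest non-zero CDF value
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * (L - 1))
    return lut.astype(np.uint8)[img]                 # map every pixel through the lookup table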
5. Database analysis phase: After image enhancement, each fingerprint is analyzed to identify distinctive features, such as patterns and key regions within the fingerprint. This helps prepare the data for the model to understand the unique elements in each fingerprint. The goal of this process is to efficiently analyze the fingerprint database to extract the necessary information for feeding two different models. This is achieved by parsing the file name to extract the individual's identity, finger type, and hand (right or left). Additionally, these steps prepare the data for the two models, allowing the first model to recognize the individual's identity, while the second model identifies the finger type based on fingerprint information. After applying these processes to the database, the data is divided into training and testing sets. Artificial fingerprint data is used for training, while original data is used for testing.

6. Model building phase: After preparing the database, two models are built using CNNs. The first model aims to identify the person's identity (SubjectID), while the second model aims to determine the finger number (FingerNum) and extract distinctive features of the finger. Each model consists of the layers shown in Table 2. The hyperparameters used in the models are illustrated in Table 3.

Table 3: CNN hyperparameters configuration

7. Training and evaluation phase of the two models: The performance of both models is evaluated using standard performance metrics such as accuracy, validation, and error rate calculation. This ensures that the first model is capable of accurately verifying the identity of authorized individuals when requesting the encryption key. Similarly, the second model's performance is assessed to determine its ability to correctly identify the fingerprint belonging to the individual whose identity has been verified. This evaluation is done using the test set.

8. Identity verification and key generation: The identity of the individual and the fingerprint match with the registered name are verified using two deep learning models. This is a key step in generating the encryption key from fingerprints, as illustrated in Figure 4.

9. Key optimization stage using PSO: To enhance the quality of the initial key and obtain a stronger, more secure key, the PSO algorithm is applied. This algorithm aims to improve the random distribution and security properties of the key. The goal of this algorithm is to increase the randomness of the key and ensure its difficulty in being guessed or broken. The use of the PSO algorithm to optimize encryption keys relies on updating the positions and velocities of particles based on the individual's fingerprints, as illustrated in Figure 1. This continuous update of the keys, leveraging the best personal and global positions, results in generating an encryption key that is more secure and complex. This also raises the question: "How does the proposed system perform against statistical attacks?" This approach aims to reduce the likelihood of the keys being exposed to any repetitive patterns that could be exploited in statistical attacks. Table 4 outlines the hyperparameters used in the optimization algorithm, selected based on a series of experimental trials.
Pseudo-code for verification and encryption key generation

1. Initialize finger name function:
   o Define show_fingername(fingernum):
     • If fingernum >= 5: set hand = "right" and subtract 5 from fingernum.
     • Otherwise: set hand = "left".
     • Map fingernum to finger names (e.g., little, ring, middle, index, thumb).
     • Return the full finger name (hand + finger).
2. Verify fingerprint information:
   o Predict the subject ID and finger number for a random fingerprint (rand_fp_num) from the test set using the models:
     • Id_pred = predicted subject ID.
     • Id_real = actual subject ID.
     • fingerNum_pred = predicted finger number.
     • fingerNum_real = actual finger number.
   o Check predictions:
     • If both IDs and finger numbers match: print "Information confirmed" with the subject ID and call show_fingername(fingerNum_pred) to get the finger name.
     • Otherwise: print "Prediction is wrong."
3. Extract candidate fingerprints:
   o Initialize lists keys1 (for original fingerprints) and keys2 (for dense layer outputs).
   o For each index i in the prediction range:
     • Get Id_check = predicted subject ID.
     • If Id_check == Id_pred: append the fingerprint to keys1 and the dense layer output to keys2.
   o Convert keys1 and keys2 to arrays.
4. Select target fingerprint:
   o Use index p1 to select:
     • original_fp = keys1[p1].
     • dense_output_finger_selected = keys2[p1].
5. Apply data augmentation:
   o Define an image data generator (datagen) with transformations: rotation, width/height shift, shear, zoom, and horizontal flip.
   o Reshape original_fp to fit the generator's input format.
6. Generate augmented fingerprints and keys:
   o Use datagen to create 20 augmented fingerprints:
     • For each augmented fingerprint: generate a new fingerprint, predict the dense layer output, take absolute values of the output to create a key, and append the key to the keys list.
7. Return results:
   • Output Keys // to be used as input for the PSO algorithm to find the optimal key from the list of keys for use in encryption.

Figure 4: Pseudocode for the identity verification and key generation process

Table 4: Hyperparameters of PSO

10. Image encryption stage and sending the encrypted image

Chen's chaotic system is a three-dimensional dynamic system that exhibits chaotic behavior and is based on nonlinear differential equations to represent the evolution of the state over time. It can be used to generate a chaotic encryption key based on the system's state. The Chen chaotic system relies on the following equations that describe the changes in the variables x, y, and z:

dx/dt = a·(x − y)   (3)
dy/dt = (a − c)·x − x·z + c·y   (4)
dz/dt = x·y − b·z   (5)
Where:
• x, y, and z are the variables that determine the state of the chaotic system at time t.
• a, b, and c are the parameters that control the behavior of the system.

Steps followed:
• Initial conditions: The process starts by defining the initial values for x, y, and z, which represent the state of the system at the beginning of the simulation. These values are set in the code as [1.0, 1.0, 1.0].
• Numerical integration: The odeint function is used for numerical integration to solve the differential equations over time. Through this process, the values of x, y, and z are updated at each time step, based on the parameters a, b, and c that influence the system's behavior.
• Generating a chaotic sequence: A chaotic sequence is generated by solving the differential equations of the Chen chaotic system over multiple time steps. This sequence is then used to generate a chaotic encryption key.
• Encryption key generation: The resulting chaotic sequence is converted into integer values ranging from 0 to 255 to represent color values in an RGB image. This is done by multiplying each value in the sequence by 255 and converting it to the uint8 data type.
• Combining the chaotic key with the generated key: The chaotic key is combined with the key generated using CNN through an XOR operation. This step increases the complexity of the final key used for image encryption.
• Encrypting the image: The XOR operation is applied between the original image and the final key to generate the encrypted image. This operation transforms the pixel values in the image into new values based on the chaotic key (a minimal code sketch of these steps follows).
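The sketch below illustrates the chaotic-key generation, XOR combination, and encryption steps described above. The Chen parameter values (a = 35, b = 3, c = 28), the integration step, the min–max normalization before scaling to 0–255, and the sign convention a·(y − x) in the first equation are assumptions made so that the example is bounded and runnable; they are not values reported in the text.

import numpy as np
from scipy.integrate import odeint

def chaotic_key(n, a=35.0, b=3.0, c=28.0, dt=0.01):
    # Integrate a Chen-type system (cf. Eqs. 3-5) from the initial state [1.0, 1.0, 1.0].
    def deriv(state, t):
        x, y, z = state
        return [a * (y - x), (c - a) * x - x * z + c * y, x * y - b * z]
    t = np.linspace(0.0, n * dt, n)
    x = odeint(deriv, [1.0, 1.0, 1.0], t)[:, 0]
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)      # normalize to [0, 1]
    return (x * 255).astype(np.uint8)                    # chaotic key bytes

def encrypt_image(image, cnn_key):
    flat = np.asarray(image, dtype=np.uint8).ravel()
    final_key = np.bitwise_xor(chaotic_key(flat.size),   # combine chaotic and CNN-derived keys
                               np.resize(np.asarray(cnn_key, dtype=np.uint8), flat.size))
    cipher = np.bitwise_xor(flat, final_key)             # XOR encryption; XOR again with the same key to decrypt
    return cipher.reshape(np.shape(image)), final_key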
generations. Additionally, the graphs in Figures 5, 6,
and 7 show the evolution of the models' performance
Part three: key generation using CNN and image over time, highlighting the models' ability to learn and
decryption: improve progressively. Where the false positive rate
was 0.000185.
This part is similar to the stages in Part 2, with the
only difference being that the models are not built and
Table 5: Model performance comparison: accuracy and
trained again; instead, the previously saved models
loss.
are loaded. Additionally, there is a decryption stage
instead of encryption. The stages in this part are as
follows:
• Data loading phase: Only the test set is loaded
(i.e., 600 genuine fingerprints from SOCOFing).
• Data preprocessing phase: The raw data is
processed to prepare it for the next stage.
• Database analysis phase: The data is analyzed
to extract the necessary information.
• Loading the saved models: The previously
trained models are loaded.
• Verification and key generation: The data is
verified, and the key is generated.
Figure 5: Accuracy and loss of the identity model.
Table 6: Classification report for finger recognition
Figure 6: Accuracy and loss of the fingerprint model.
Table 7: Classification report for subjectID recognition
Figure 7: Confusion matrix and accuracy metric

B. Encryption key evaluation results and metrics: The generated encryption key was evaluated using a set of specialized metrics to ensure its quality and effectiveness in resisting cyberattacks. The experiments were conducted using fingerprint images sized 96 × 96 in a Kaggle environment with Python, on a workstation equipped with an Intel(R) Xeon(R) processor, 64 GB of RAM, and a P100 GPU. The metrics used included evaluations such as key size and various randomization tests (such as the entropy test, repetition test, etc.) to assess the randomness of the key and its predictability. These tests help ensure that the system remains unaffected when used in live applications.

6.1 Key space analysis

A brute-force attack is a type of cyber-attack that relies on guessing the key by attempting a large number of possible passwords or secret phrases. An encrypted image with a short key is highly vulnerable to this attack over time. However, if the key is longer, it will remain resistant for a longer period. Therefore, it becomes practically impossible to guess the key if it has an adequate length. Key space analysis is used to assess the strength against brute-force attacks. According to this analysis, a key with a length greater than 2^100 is considered suitable for high-security encryption [26]. In our system, we propose an approach based on deep neural networks (CNN) and PSO to generate this key.
The key has a size of 1024 values, with each value ranging between 0 and 255. This means the key consists of 1024 bytes (since each value requires one byte, and 8 bits are enough to represent values from 0 to 255). Given that each value in the key ranges from 0 to 255, we have 256 possibilities for each value. With 1024 values, the total key space will be 256^1024 or, in other words, 2^(8×1024) = 2^8192. This represents an extremely large key space, which is sufficiently large to be highly resistant to brute-force attacks. A key size of 2^8192 offers a very high level of security, making it practically impossible to crack using brute-force methods, even with fast computing devices (a short numerical check of this figure follows the comparisons below). Nonetheless, the question remains: how does the proposed system perform against brute-force attacks?

Comparison of key space (2^8192) with traditional systems

• Comparison with AES-256: The proposed key space (2^8192) is significantly larger compared to AES-256, where the key space is approximately 2^256. This substantial difference makes our key space more resistant to brute-force attacks. Traditional systems like AES-256 rely on efficient algorithms to compensate for the smaller key space compared to the vast proposed space.

• Comparison with RSA-2048: The proposed key space is also significantly larger compared to RSA-2048, where the key space is approximately 2^2048. RSA relies on computational complexity for large numerical factorization, whereas in our system, the security strength depends on the key length derived from biometric features processed through deep networks.

• Comparison with ECC-384 (Elliptic curve cryptography): The traditional key space for ECC-384 is approximately 2^384, which is much smaller compared to our proposed key space (2^8192). ECC relies on elliptic curves to compensate for shorter keys, but in contrast, we provide much longer keys derived from neural networks, enhancing their unpredictability.

• Comparison with DES (Data encryption standard): The key space in DES is 2^56, which is extremely small compared to our proposed key space. DES is considered outdated and vulnerable to brute-force attacks, whereas our proposed key space vastly surpasses it in terms of length and complexity.

6.2 Significance of results in cryptographic key management

• The results, such as randomness tests and high entropy, demonstrate that the generated key exhibits a high degree of randomness, making it ideal for high-security applications.
• High entropy indicates that the keys have a uniform distribution of values, reducing the likelihood of predicting any part of the key, which is a critical feature in key management.

6.3 Encryption key tests

In this study, six fingerprint samples were used as the basis to generate six encryption keys. Each key underwent comprehensive testing using eight different metrics to determine the quality and randomness of the generated keys. The results of these eight tests were systematically presented in a table, reflecting the effectiveness of the proposed method. The results showed the success of the keys in all eight tests, confirming that the keys generated from the fingerprints meet the required security standards. These tests demonstrate the randomness and unpredictability of the keys, making the approach suitable for secure encryption applications. The core encryption tests include the following:

• Entropy test: The entropy measures the distribution of information in the key and reflects the level of randomness. The entropy is calculated using the following equation (6):

H(X) = −Σ_{i=1}^{n} p(x_i) · log2 p(x_i)   (6)

where p(x_i) is the probability distribution of the value x_i in the key. If the entropy equals 8 bits, it means the key is completely random [26].

Table 8: Results of the entropy test

• Repetition test: The repetition test generally aims to ensure that the key does not contain any repeated sections within its sequence, whether these sections are adjacent or non-adjacent. If parts of the key are repeated, it weakens the randomness and increases the likelihood of discovering a pattern that can be exploited in an attack. This test involves checking all parts of the key to detect any repetition that might impact its security level. The repetition test addresses repetition in the key overall, whether in adjacent or non-adjacent parts [27].
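The key-space arithmetic above can be checked directly in Python, since its arbitrary-precision integers make the comparison exact:

key_bytes = 1024
key_space = 256 ** key_bytes                 # one byte (256 possibilities) per value
assert key_space == 2 ** (8 * key_bytes)     # 256^1024 = 2^8192
# 8192 bits, versus 256 (AES-256), 2048 (RSA-2048), 384 (ECC-384), and 56 (DES)
print(key_space.bit_length() - 1)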
Table 9: Results of the repetition test

• Uniformity test using the chi-squared test: This test aims to check whether the values in the encryption key are evenly distributed across the full range of possible values. The chi-squared test is used to compare the actual distribution of values in the key with the expected ideal distribution. If the values are evenly distributed, the key is considered to have a uniform distribution. Equation (7) illustrates the test:

χ² = Σ_{i=0}^{255} (n_i − n/256)² / (n/256)   (7)

where:
• n_i: the frequency of occurrence of value i in the key.
• n/256: the expected frequency for each value i assuming a uniform distribution.
• n: the total number of values in the key.
• If the chi-square (χ²) value is low, it indicates that the actual distribution of values is close to the ideal distribution, meaning the key is evenly distributed.
• At a significance level of 0.05, if the chi-square value is less than 293.25, the key is considered to have passed the test and has a uniform distribution [28].

Table 10: Results of the uniformity test

• Repetition test (adjacent): This test focuses specifically on identifying repetition in adjacent parts of the key. It checks for any repeated consecutive or sequential sections that might indicate a fixed pattern or excessive repetition, which could weaken the effectiveness of encryption. Repetition of adjacent parts is considered a sign of poor randomness, thus reducing the strength of the key. The closer the value is to 0, the less repetition there is, which means the key has a higher level of randomness [29].

Table 11: Results of the repetition test (adjacent)

• Pearson correlation test: This is a statistical test used to measure the relationship between two variables. The relationship is expressed by a coefficient called the "Pearson correlation coefficient," which ranges from -1 to 1. If the correlation coefficient is close to 0, it indicates no correlation (high randomness), making the encryption key strong and hard to predict. The purpose is to determine the extent of the correlation between values in the encryption key. If the correlation coefficient is close to 0, it indicates that the key is sufficiently random, thus making it strong against analytical attacks. Equation (8) represents the Pearson correlation:

r = Σ(Xi − X̄)(Yi − Ȳ) / √(Σ(Xi − X̄)² · Σ(Yi − Ȳ)²)   (8)

where:
• r: Pearson correlation coefficient.
• Xi: individual values in the first series.
• Yi: individual values in the second series (e.g., lagged values in time series).
• X̄: mean of the Xi values.
• Ȳ: mean of the Yi values [30].

Table 12: Results of the Pearson correlation test

• Stability test: The key must remain stable if the input data is stable. This means that if the same inputs are used to generate the key multiple times, the resulting key should always be identical. However, slight changes in the inputs should result in a significant change in the key, which enhances encryption strength against attacks.
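The uniformity check of Equation (7) and the correlation check of Equation (8) can be sketched with SciPy as follows; the 293.25 threshold is the chi-squared critical value quoted above for the 0.05 significance level.

import numpy as np
from scipy.stats import chisquare, pearsonr

def uniformity_test(key, critical_value=293.25):
    # Equation (7): compare observed byte frequencies with the expected n/256.
    observed = np.bincount(np.asarray(key, dtype=np.uint8), minlength=256)
    chi2, _ = chisquare(observed)            # default expected frequencies are uniform
    return chi2, chi2 < critical_value       # pass if below the 0.05 critical value

def pearson_correlation(key, lag=1):
    # Equation (8): correlation between the key and a lagged copy of itself.
    x = np.asarray(key, dtype=float)
    r, _ = pearsonr(x[:-lag], x[lag:])
    return r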
6.3 Consistency of fixed inputs

o If the input I is fixed, the encryption system should produce the same key K every time: F(I) = K.
o Repeat key generation multiple times using the same I, and the result should be a consistent K in all attempts: K1 = K2 = ⋯ = Kn.

Table 13: Results of the stability test (1)

6.4 Sensitivity to minor changes (inclusivity effect)

We make a slight change in the input I to create I′. A new key K′ is generated using I′: F(I′) = K′. We measure the difference between K and K′ using the bit change rate:

Bit change rate = (bit difference between K and K′ / 1024) × 100%

The change rate should be higher than 50% to ensure the system's sensitivity to changes [31].

Table 14: Results of the stability test (1)
Table 15: Encryption results with the original key and the modified key

6.5 Range test

The range test aims to evaluate the distribution of encryption key values within a specific range to ensure its randomness.

Steps of the range test:
1. Calculate the range: determine the difference between the maximum value max and the minimum value min: Range = max − min.
2. Range splitting: divide the range into buckets.
3. Frequency calculation: count the values in each bucket.
4. Distribution analysis: if the frequencies are approximately equal, the key is considered random. The expected frequency is given by equation (9):

E_i = N/M   (9)

where N is the total number of values, and M is the number of buckets [27] (a short sketch of this test follows).
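A small sketch of the range test; the number of buckets M is not specified in the text, so it is left as a parameter here.

import numpy as np

def range_test(key, num_buckets=16):
    # Steps 1-4: compute the range, split it into buckets, and count values per bucket.
    values = np.asarray(key, dtype=float)
    counts, _ = np.histogram(values, bins=num_buckets, range=(values.min(), values.max()))
    expected = values.size / num_buckets      # Equation (9): E_i = N / M
    return counts, expected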
Table 16: Results of the range test
Table 17: Results of the autocorrelation test

• Autocorrelation test: The autocorrelation test is used to determine the randomness of a sequence of values in an encryption key. If the key is sufficiently random, the autocorrelation values should be small or close to zero, indicating no clear pattern or dependency in the sequence. To calculate the autocorrelation at a lag d, equation (10) is used:

R(d) = (1 / (n − d)) · Σ_{i=1}^{n−d} (x_i · x_{d+i})   (10)

where:
• R(d) is the autocorrelation coefficient for lag d.
• x_i is the value at position i in the sequence.
• x_{d+i} is the value at position d+i in the sequence.
• n is the length of the sequence.

A value of R(d) close to zero for different values of d indicates a high level of randomness in the encryption key. Figure 8 shows the distribution of autocorrelation test results, highlighting successful and failed values based on the specified critical value (0.05) [28].

Figure 8: Distribution of autocorrelation test results with success and failure indication based on critical value
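Equation (10) as printed can be evaluated directly; note that in practice the sequence is often mean-centered and normalized before judging closeness to zero against the 0.05 critical value.

import numpy as np

def autocorrelation(key, d):
    # Equation (10): R(d) = (1 / (n - d)) * sum_{i=1..n-d} x_i * x_{d+i}
    x = np.asarray(key, dtype=float)
    n = x.size
    return float(np.sum(x[:n - d] * x[d:]) / (n - d))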
C. Results of using PSO in enhancing the encryption key

After completing the specified number of iterations, the best encryption key is obtained, which is the key that achieved the highest fitness during the optimization process. Table 18 illustrates the effect of using PSO on the generated encryption key.

Table 18: The impact of the PSO algorithm in improving the encryption key.

D. Comparison of the accuracy of the proposed system with other systems

This section evaluates the accuracy of our proposed system in comparison to other systems reported in recent years, based on their respective sources. Experimental results from our proposed model demonstrated an accuracy exceeding 99%. Table 19 presents a detailed comparison between our system and other existing systems.
Table 19: Accuracy comparison between our system and recent approaches.

The aim of this comparison is to evaluate the effectiveness of the proposed system in the context of recent advancements in deep learning technology, providing insight into how cybersecurity can be enhanced through the application of advanced techniques. The table also reflects the ongoing progress in fingerprint data processing, showing that modern systems achieve higher accuracy than traditional systems, supporting the idea that using deep learning can improve the effectiveness and security of encryption systems.

7 Discussion

In this section, we discuss the results of the proposed system in comparison to the modern methods presented in Table 19, focusing on accuracy, randomness tests (such as entropy), and robustness. Additionally, we address potential trade-offs associated with using CNNs, such as computational overhead. Below is a detailed comparison of key results.

7.1 Comparison of results

Accuracy: The proposed system (using CNN and PSO) achieved an accuracy between 99.73% and 99.83% within just 20 epochs, outperforming most models in the table. For example, the enhanced VGG-16 model achieved 99.98% accuracy in 100 epochs, the highest in the table, but required five times more epochs than the proposed system. The Modified-LeNet model achieved 99.10% accuracy in 55 epochs, which is lower than the proposed system. The DeepFKTNet model achieved 98.89% accuracy in 60 epochs. Thus, the proposed system stands out as a strong option, delivering high accuracy in less training time, thanks to the combination of CNN and PSO, which enhances feature extraction and generates robust keys.

Randomness tests (e.g., entropy): The use of PSO in the proposed system significantly contributed to enhancing randomness, which strengthens the generated keys. When comparing randomness tests (e.g., entropy) with other models, the proposed system showed remarkable superiority. The combination of CNN and PSO enabled the generation of keys with excellent randomness levels, providing a higher degree of security compared to traditional models. PSO helps optimize the quality of the keys by searching for the optimal combination of hidden parameters, making them more random and harder to break.

It is important to note that the model only retains the predictions generated during its operation, which are values devoid of any sensitive information. This enhances the system's security against various types of attacks, such as mixed replacement attacks, crossover attacks, and exhaustive search attacks. In such attacks, the attacker has no knowledge of the key generation mechanism or the supporting data, making the number of attempts required to crack the key increase proportionally with its length. For example, if the key length is 1024 bytes, the number of possible combinations would reach 2^8192. The proposed system focuses on enhancing data security by avoiding key storage, improving the randomness of key generation, and protecting sensitive information from various attacks, while ensuring high efficiency in user fingerprint recognition.

Robustness (biometric key as encryption key for security): The biometric key is generated based on the parameters learned during the training of the CNN model. During training, the model learns unique representations or features extracted from fingerprints. These representations are numerical weights that are not easily interpretable. The parameters are converted into an encryption key that relies on the unique properties of each fingerprint, making the key:
• Unique and tamper-proof.
• More secure and resistant to duplication.

Role of PSO in key enhancement: PSO improves the key by identifying the optimal values of the parameters used in key generation. This enhances randomness and independence among keys, making them more resistant to attacks.
7.2 Advantages of the proposed system

The proposed system combines CNN and PSO to achieve:
• High classification accuracy in less time.
• High-quality encryption keys with excellent levels of randomness and security.
• Strong protection of users' biometric data against exploitation or breaches.
• Improved biometric key performance using PSO to generate stronger and more random keys, increasing the system's resilience to cyber threats. Thus, the proposed system leverages the multiple features of CNN and PSO, making it more robust in addressing security challenges such as resistance to adversarial attacks. While other models primarily focus on classification and accuracy, the proposed system demonstrates additional strength in encryption applications.

Potential trade-offs: Although the use of CNN in the proposed system results in a slight increase in computational overhead compared to simpler models like the modified LeNet, this does not pose a significant obstacle. The model is designed to operate efficiently on modern systems supported by Graphics Processing Units (GPUs), ensuring accelerated training and reduced execution time.

8 Conclusions

• The results of this research show that integrating biometric techniques with deep learning provides an innovative and effective solution for generating secure and robust encryption keys based on fingerprints. The proposed system enhances the security of data transmitted over the internet, making it more resistant to theft and tampering. The use of two convolutional neural network models is a significant step, where the first model contributes to identity recognition and the second focuses on fingerprint detail recognition, ensuring the extraction of unique and reliable biometric features.

• One of the main conclusions of this research is that the tanh activation function plays a crucial role in neural networks for generating encryption keys. This function is known for its ability to transform outputs into the range of (-1, 1) non-linearly, which contributes to improving the quality of the generated keys. Increased complexity and randomness: the tanh function ensures a more balanced distribution of values across the range (-1, 1), reducing value concentration and enhancing the randomness of the key, leading to the generation of secure and robust encryption keys. Better stability during training: the tanh function helps avoid issues such as vanishing gradients, resulting in better stability during the training process and improved model performance in generating encryption keys. Table 20 illustrates the key strength (entropy measure) when using the Tanh activation function compared to using the ReLU activation function.

Table 20: Comparison of the key strength (entropy) of the Tanh and ReLU activation functions.

The batch normalization layer plays a significant role in stabilizing and accelerating the learning process in deep models by normalizing the outputs to have a mean of 0 and a standard deviation of 1. While this stabilization is beneficial in many applications, such as image classification, it may negatively impact the strength of the generated encryption key.

• The results indicate that the generated keys exhibit high levels of randomness, making them more challenging to breach. Additionally, the use of the PSO algorithm is considered an effective technique for enhancing the randomness of the keys, as it allows for generating different keys for each transmission, thereby reducing the risk of key theft and increasing security. A comprehensive analysis of the performance of the models used in this research was conducted, showing a significant improvement in encryption effectiveness and the reliability of the generated keys, underscoring the efficiency of these models in the context of cybersecurity.

• The proposed approach enhances system security compared to traditional systems by reducing reliance on static keys, which are a vulnerability in many encryption systems. Instead, biometric verification is used to generate unique keys for each user based on their fingerprints, thereby increasing the level of security. This research provides a significant contribution to systems that require high levels of protection, such as financial systems and medical data, by facilitating biometric verification for encryption without the need to exchange keys, thereby reducing associated risks. Additionally, the automatic key change feature adds an extra layer of security,
reflecting the effectiveness of this system in providing advanced protection. Ultimately, the research highlights the importance of integrating biometrics and deep learning in developing effective security solutions that address contemporary challenges in data protection.

• The method presented in the research has wide potential for application in various fields. In addition to securing fingerprints and using them to generate encryption keys, the method can be applied to secure Internet of Things (IoT) devices by generating strong encryption keys that protect communication between devices. It can also be used to secure data stored in the cloud by generating high-security encryption keys based on unique user attributes, such as fingerprints. These applications highlight the flexibility and efficiency of the method in addressing modern cybersecurity challenges and enhance its appeal in various practical scenarios.
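To make the qualitative entropy argument behind the Tanh/ReLU comparison concrete, the following is a minimal, illustrative sketch (not the authors' code and not a reproduction of Table 20): it quantizes the two activations' outputs for the same random pre-activations and estimates the Shannon entropy of the resulting value distribution, showing how ReLU's concentration of values at zero lowers entropy relative to tanh.

```python
# Illustrative entropy comparison for tanh vs. ReLU outputs (assumed inputs).
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.normal(0.0, 1.0, size=100_000)  # assumed feature values

def quantized_entropy(values: np.ndarray, n_levels: int = 16) -> float:
    """Shannon entropy (bits) of the value distribution over n_levels bins."""
    hist, _ = np.histogram(values, bins=n_levels)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

tanh_entropy = quantized_entropy(np.tanh(pre_activations))        # spread over (-1, 1)
relu_entropy = quantized_entropy(np.maximum(pre_activations, 0))  # mass piled at 0

print(f"tanh output entropy : {tanh_entropy:.3f} bits (max {np.log2(16):.0f})")
print(f"ReLU output entropy : {relu_entropy:.3f} bits")
```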
https://doi.org/10.31449/inf.v49i16.9490 Informatica 49 (2025) 235–248 235
CNN and LSTM-Based Multimodal Data Fusion for Performance
Optimization in Aerobics Using Wearable Sensors
Danhua Tan
School of Physical Education, Hengyang Normal University, Hengyang, Hunan, 421006, China
E-mail: tandanhua184914@outlook.com
Keywords: wearable sensors, convolutional neural network, long short-term memory, Kalman filtering, aerobics movements
Received: May 31, 2025
Aerobics is a high-intensity, multi-dimensional sport. Its motion evaluation places higher demands on
data quality and time series modeling capabilities. This paper proposes a method for evaluating aerobics
motion that integrates wearable sensors and motion tracking systems. It combines convolutional neural
networks (CNNs) with long short-term memory networks (LSTMs) to perform fusion analysis on
multimodal data from accelerometers, gyroscopes, magnetometers, and Kinect motion capture systems.
To improve data quality, Kalman filtering, time synchronization, and wavelet transform techniques are
introduced to preprocess the raw data. Experimental results show that this method performs well in motion
classification tasks: in indoor low-intensity training scenarios, the accuracy of the CNN model increases
from 74.5% to 87.1%; in high-intensity training scenarios, the accuracy increases from 75.0% to 88.2%.
After combining with LSTM, the model further enhances the modeling capabilities of motion time series
features and improves the recognition accuracy of complex motions. In different training scenarios, the
average improvement rate of motion scores is 25.8%. The system feedback delay is controlled within 200
milliseconds, with good real-time and practical performance. This method provides aerobics athletes with
high-precision movement assessment and personalized training suggestions, promoting the intelligent and
personalized development of sports training.
Povzetek: Metoda združuje senzorje, CNN in LSTM za multimodalno analizo aerobičnih gibov.
Kalmanovo filtriranje izboljša kakovost signalov, klasifikacijska točnost naraste do 88,2 %, povprečno
izboljšanje rezultatov znaša 25,8 %, odzivnost sistema pa ostane pod 200 ms.
1 Introduction

With the popularity of aerobics, the accuracy of movements and training effects have become the focus of coaches and athletes. During high-intensity and complex exercise, the movements of aerobics athletes can be affected by factors such as physical exertion, sports skills, and external environment, resulting in unstable movement performance. Traditional manual evaluation methods are inefficient and subjective, and cannot provide athletes with accurate training feedback in real-time. With the development of sensor technology [1] and artificial intelligence [2], [3], motion evaluation methods based on wearable devices [4] and intelligent feedback systems have become a research hotspot. Such feedback systems can provide accurate real-time data analysis, optimize training programs, and improve athlete performance. Therefore, developing a motion evaluation and optimization system based on intelligent technology [5], [6] has become the key to improving training effects and athlete performance.

This paper studies the motion evaluation and optimization system based on intelligent technology to improve the training effect and motion performance of aerobics athletes. To achieve this goal, this paper combines wearable sensors with motion tracking systems and uses CNN models and LSTM to fuse and analyze multimodal data. The system acquires motion data through sensors such as accelerometers, gyroscopes, magnetometers, and Kinect motion capture systems. It uses Kalman filtering, time synchronization, and wavelet transform to optimize data quality. The optimized data is used through the CNN model to evaluate and optimize motion performance, providing real-time feedback and personalized training suggestions. The CNN-based optimization method combines wearable sensor technology with deep learning (DL) algorithms to improve the accuracy and stability of motion evaluation. Experimental results show that the combination of Kalman filtering and CNN models effectively improves the accuracy and stability of aerobics motion evaluation, providing strong support for the intelligent and precise development of sports training.

Current research mostly uses weighted averaging or simple concatenation, and the model structure is fixed, without optimizing for the temporal characteristics and complex action patterns of sports data. This paper combines wearable sensors with aerobics tracking and uses a model based on CNN and LSTM to achieve performance optimization. The main contributions of this
study include: 1) wavelet transform is combined with principal component analysis (PCA) to extract time-frequency features, and a dynamic weighted fusion strategy is adopted to improve the robustness of data fusion; 2) small convolutional kernels are introduced into the CNN to capture action details and combined with a double-layer LSTM to model long-term dependencies, enhancing the model's ability to recognize complex action sequences; 3) based on the model output, an action scoring function and error correction mechanism are constructed to provide athletes with immediate feedback and personalized training suggestions, improving the model's generalization ability in different training scenarios through data augmentation and adaptive filtering techniques.

2 Related work

In recent years, many scholars have been committed to improving the accuracy of athletes' motion evaluation through different technical means. Traditional motion capture systems [7] rely too much on calibration equipment and high-cost hardware settings. Although such capture systems can capture the movements of athletes, they suffer from problems such as poor real-time performance, high data noise, and inconvenient operation when evaluating high-intensity sports or complex movements. To improve the quality of sports data, many studies have attempted to use wearable sensors for motion tracking. Rigozzi C J et al. used data from sensors such as accelerometers, gyroscopes, and magnetometers to monitor athletes' body posture and motion trajectory [8]. Sensor data is easily affected by noise, environmental changes, and wear position deviation, resulting in inaccurate data. To reduce noise interference, Zhang Y applied Kalman filtering technology to the preprocessing of sensor data [9]. As technology matures, DL technology [10], especially CNNs, has been applied to multimodal data analysis and action recognition by Gholamiangonabadi D [11], and has achieved certain results. Existing research still faces problems such as how to combine multiple data sources, optimize data processing processes, and provide real-time feedback in actual training scenarios.

To solve the above problems, some researchers have proposed a hybrid method that combines sensor data and DL algorithms to improve the accuracy and real-time performance of action recognition. Chakraborty A used a CNN-based multimodal data fusion method to improve the accuracy and robustness of athlete action recognition by combining accelerometer, gyroscope, and visual data [12]. In this study, data fusion technology [13] effectively reduces sensor errors and enhances the system's adaptability to complex actions. Zhang L proposed a KCF (Kernelized Correlation Filters) tracking method based on improved depth information [14], which successfully used Kalman filtering to reduce the noise of motion sensors and improve the stability of motion estimation. Although these methods have achieved good results to a certain extent, most of them focus on a single motion estimation task, and their effects in complex training environments still need to be improved. Existing methods also have shortcomings in terms of personalized training feedback [15] and the generation of real-time optimization suggestions [16]. Therefore, how to comprehensively utilize multimodal data and combine DL with real-time optimization feedback systems is still a major challenge in current research.

3 Data fusion and movement performance optimization

3.1 Data collection and preprocessing

The study combines wearable sensors with motion tracking systems to design an efficient data collection and preprocessing solution. The key to the entire process is to synchronously collect data from multiple sources and eliminate errors, providing a reliable basis for subsequent analysis.

Wearable devices collect data in real-time through built-in accelerometers [17], gyroscopes [18], and magnetometers [19]. The accelerometer records the athlete's acceleration changes in three-dimensional space; the gyroscope measures the athlete's rotational angular velocity; the magnetometer helps correct the direction of movement. A multi-sensor system can accurately capture every movement of an athlete and generate rich time series data [20]. The sensor data fusion equation is:

f(t) = α·a(t) + β·ω(t) + γ·m(t)    (1)

a(t) is acceleration data; ω(t) is angular velocity data; m(t) is magnetic field data; α, β, γ are weighting parameters. Figure 1 is a data acquisition flow chart.
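As a small illustration of the weighted fusion in Eq. (1), the sketch below applies per-sample weighting to three 50 Hz sensor streams; the weight values and the synthetic signals are illustrative assumptions, not the study's settings.

```python
# Minimal sketch of Eq. (1): f(t) = alpha*a(t) + beta*w(t) + gamma*m(t).
import numpy as np

def fuse(a: np.ndarray, w: np.ndarray, m: np.ndarray,
         alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2) -> np.ndarray:
    """Weighted combination of acceleration, angular velocity and magnetic field."""
    return alpha * a + beta * w + gamma * m

# 2 seconds of 50 Hz data, three axes per sensor (shape: samples x 3), synthetic.
n = 100
a = np.random.default_rng(1).normal(size=(n, 3))   # accelerometer a(t)
w = np.random.default_rng(2).normal(size=(n, 3))   # gyroscope omega(t)
m = np.random.default_rng(3).normal(size=(n, 3))   # magnetometer m(t)

fused = fuse(a, w, m)
print(fused.shape)   # (100, 3): fused signal f(t), one row per 20 ms sample
```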
Figure 1: Data collection flow chart (sensor data collection of acceleration, angular velocity and angle; data calibration and cleaning; data fusion and feature extraction; lower limb motion reconstruction).
Figure 1 shows the complete process from data collection to motion analysis. The motion data is obtained through wearable sensors and motion capture systems; key features are extracted through data cleaning and fusion; the CNN model is used to optimize the motion performance evaluation, ultimately providing scientific motion optimization and training suggestions for aerobics athletes. To ensure the accuracy of the data, all collected sensor data are transmitted to the background data processing system in real-time via Bluetooth or Wi-Fi modules [21] to ensure real-time and efficient data processing. By adopting wireless data transmission [22], the data transmission is not affected by the physical distance, ensuring that the data is updated and recorded in time when the athletes perform complex movements. The Bluetooth signal quality function is:

Q = 1 / (1 + exp(−k(S − S_0)))    (2)

S is the signal strength; S_0 is the signal threshold; k is the Bluetooth signal adjustment parameter. To deal with data anomalies, the Kalman filter [23] is used to smooth the data of accelerometers and gyroscopes. The Kalman filter can dynamically predict the true value of the signal, optimize the measurement noise, and improve the accuracy of the data. The Kalman filter update formula is:

x_{k|k} = x_{k|k−1} + K_k (z_k − H·x_{k|k−1})    (3)

K_k is the Kalman gain, and z_k is the observed value.

In addition to multi-sensor equipment, the motion tracking system Kinect [24] and depth camera [25] are also introduced to obtain the spatial position information of the key parts of the athletes. The system captures the athlete's action posture through 3D coordinates and records the spatial coordinates of joints such as shoulders, elbows, and knees, as well as their dynamic trajectories over time. Through the calibration algorithm, combined with the position information of the sensor and the motion tracking system, the effects caused by the wearer position offset or motion capture error are corrected. The motion trajectory smoothing formula is:

p(t) = (1/N) Σ_{i=1}^{N} p_i(t)    (4)

p_i(t) is the spatial position of different sampling points. The system synchronizes the data of sensors and motion tracking systems to ensure that the sensor data and motion data at each moment can correspond correctly. After time synchronization, the data can be smoothly input into the subsequent data processing and analysis. The time synchronization function is:

ΔT = T_sensor − T_camera    (5)

T_sensor and T_camera are the timestamps of the sensor and camera, respectively. Table 1 is the motion capture key point coordinate data table.
Table 1: Motion capture key point coordinate data table.

Timestamp (ms) | Shoulder X (cm) | Shoulder Y (cm) | Shoulder Z (cm) | Knee X (cm) | Knee Y (cm) | Knee Z (cm)
0  | 12.3 | 45.6 | 78.2 | 8.9 | 30.2 | 50.7
10 | 12.1 | 45.5 | 78.3 | 9.0 | 30.3 | 50.9
20 | 12.2 | 45.7 | 78.1 | 9.1 | 30.1 | 50.8
30 | 12.4 | 45.8 | 78.2 | 9.2 | 30.4 | 51.0
40 | 12.3 | 45.9 | 78.3 | 9.3 | 30.5 | 50.9
50 | 12.5 | 46.0 | 78.1 | 9.4 | 30.6 | 51.1
60 | 12.6 | 46.1 | 78.2 | 9.5 | 30.7 | 51.2
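The per-joint smoothing in Eq. (4) and the timestamp alignment in Eq. (5) can be sketched on keypoint rows like those in Table 1; the window size and the example offset below are illustrative assumptions, not values from the study.

```python
# Minimal sketch of Eq. (4) (keypoint smoothing) and Eq. (5) (time alignment).
import numpy as np

timestamps_ms = np.array([0, 10, 20, 30, 40, 50, 60])
shoulder_x_cm = np.array([12.3, 12.1, 12.2, 12.4, 12.3, 12.5, 12.6])

def smooth(p: np.ndarray, n: int = 3) -> np.ndarray:
    """Eq. (4): p(t) = (1/N) * sum_i p_i(t), here a centred moving average."""
    kernel = np.ones(n) / n
    return np.convolve(p, kernel, mode="same")

def align_offset(t_sensor_ms: float, t_camera_ms: float) -> float:
    """Eq. (5): dT = T_sensor - T_camera, used to shift one stream onto the other."""
    return t_sensor_ms - t_camera_ms

print(smooth(shoulder_x_cm))                               # smoothed shoulder X trajectory
print(align_offset(t_sensor_ms=40.0, t_camera_ms=38.5))    # 1.5 ms offset (assumed)
```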
Table 1 records the three-dimensional spatial position data (X, Y, Z, in centimeters) of the shoulder and knee at different timestamps (in milliseconds). The data can be used to analyze the movement trajectory and change trend of the shoulder and knee in space.

3.2 Multimodal data fusion

After data collection and preprocessing, the multimodal data from wearable sensors and motion tracking systems are effectively fused. Different data sources provide different perspectives on the athlete's movements. Wearable sensor data provides time series information such as acceleration and angular velocity, and the motion tracking system provides spatial information such as joint position and motion trajectory. The effective integration of this information can help comprehensively evaluate the athlete's performance and provide accurate data input for the DL model.

Time synchronization [26] is a prerequisite for ensuring the effective integration of multimodal data. When collecting sensor data and motion tracking data, the data needs to be accurately aligned in time due to the different collection frequencies of the two [27]. To achieve data calibration, a timestamp is used to mark the acquisition time of each frame of data to ensure that each action frame can obtain corresponding sensor data and tracking data. After time synchronization, the sensor data at each moment is guaranteed to correspond perfectly with the action tracking data, providing a basis for data fusion. After synchronization correction, it can ensure that the action and sensor data at each moment correspond to each other, avoiding information loss caused by asynchrony [28].

The original sensor data contains rich time series information. Wavelet transform [29] is used to analyze the data in the time and frequency domains to extract motion features. The wavelet transform formula is:

W_ψ(a, b) = ∫_{−∞}^{∞} f(t) ψ*((t − b)/a) dt    (6)

ψ is the mother wavelet function. Wavelet transform can effectively capture the instantaneous changes in motion signals and use multi-scale analysis [30] to extract the time-frequency features of motion signals. It is integrated with the sensor data by calculating the spatial characteristics of the athlete's joint angle, motion trajectory, and speed. The formula for calculating the joint angle is:

θ = arccos((v1 · v2) / (‖v1‖·‖v2‖))    (7)

v1 and v2 are the vectors of two bones. The motion trajectory curve fitting formula is:

r(t) = a_0 + a_1·t + a_2·t² + ... + a_n·tⁿ    (8)

The spatial characteristics of the tracking system play a decisive role in the accuracy and coordination of the movements and are the core basis for evaluating the performance of athletes. Data fusion is the key to multimodal data processing. When fusion is performed, methods such as weighted fusion [31] and principal component analysis (PCA) [32] are used. Weighted fusion is to assign different weights to different data sources according to their signal-to-noise ratio and importance. When the noise of sensor data is large, its weight in fusion is reduced. On the contrary, if it is small, it means that the spatial data provided by the motion tracking system is relatively stable and can be assigned a higher weight. The weighted fusion formula is:

F_fused = ω1·F1 + ω2·F2    (9)

F1 and F2 are the features of different data sources, and ω1 and ω2 are weights. Through weighted fusion processing, the fused data can more realistically reflect the athlete's performance. In weighted fusion, the quality of the signal is the key to weight assignment, and the relevance and accuracy of the data determine the contribution of each data source.

Principal component analysis is used to reduce the dimensionality of the data, compressing the multi-dimensional raw data into fewer principal components, reducing the redundancy of the data, and extracting the most representative features. The PCA dimensionality reduction formula is:

X′ = XW,  W = arg max_W (WᵀΣW / WᵀDW)    (10)

Σ is the feature covariance matrix [33]; D is the weight matrix [34]; W is the optimized feature matrix. The analysis process reduces the computational complexity and retains the key information in the data, providing more efficient input for the training of DL models. Through the dimensionality reduction of PCA, redundant dimensions and noise can be eliminated, improving the efficiency and accuracy of subsequent analysis.

After fusion processing of multimodal data, data from different sources is integrated into a unified format, providing rich and accurate input features for subsequent action evaluation. The fused data contains both time series information and spatial position information, which can fully and accurately reflect the athlete's action performance in training [35]. Combined with efficient data synchronization, feature extraction, and fusion processing, sufficient high-quality data support is provided for the CNN, ensuring that the model can make full use of various types of information for accurate evaluation.

3.3 Action performance evaluation based on CNN and LSTM

This paper studies the evaluation of action performance of fused data based on CNN and LSTM. The CNN model has an advantage in processing time series data and spatial data, while LSTM is good at capturing time series dependencies, especially in capturing subtle differences in athletes' movements and automatically extracting features. The CNN model can more comprehensively evaluate the athletes' movement performance and achieve end-to-end automated processing from raw sensor and tracking data to final movement scoring and classification. Through LSTM, the model can understand the continuity between actions and evaluate actions based on the relationship
between the action sequence. When capturing complex motion patterns, LSTM can supplement the timing information that the CNN model fails to fully capture, providing a more detailed motion performance evaluation. In the motion performance evaluation task, the combination of LSTM and CNN enables the model to obtain more comprehensive feature extraction in both spatial and temporal dimensions. After CNN extracts spatiotemporal features, LSTM processes these features in time series, and the combination can more accurately evaluate the quality and type of actions. Figure 2 is a diagram of the CNN model structure.

Figure 2: CNN model structure diagram

Figure 2 shows how the spatiotemporal features of the athlete's movements are gradually extracted through the convolutional layer, pooling layer, and fully connected layer, and converted into a one-dimensional vector for classification and scoring through the flattening layer. The output layer evaluates the athlete's movement quality based on the features learned by the model, achieving automated and efficient movement quality recognition and feedback.

The CNN model adopts a five-layer structure, with input data dimensions of (T, F). Among them, T=100 represents the time step; F=9 represents the input feature dimension (including data from three axes each of accelerometer, gyroscope, and magnetometer); the output dimension is (C), where C=3 represents the action category (standard, insufficient, error); the activation function is Softmax. The LSTM model is used to model the long-term dependencies of time series, with input dimensions of (T, D), where T=100 represents the time step, and D=9 represents the feature dimension. The LSTM layer contains 128 hidden units and uses a double-layer stacking structure with an activation function of Tanh. The output dimension is (C), C=3, and the activation function is Softmax. The output feature vectors of CNN and LSTM are merged through a concatenation operation and input into a fused fully connected layer, ultimately outputting action scores and classification results. The model parameter settings are shown in Table 2:
Table 2: Model parameter settings

Parameter | Specification | Parameter | Specification
Learning rate | 0.001 | Dropout rate | 0.5
Batch size | 64 | Training epochs | 100
Optimizer | Adam | Hidden layer size | 128
Loss function | Cross-entropy loss | Convolutional kernel size | (3, 3)
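The following is a minimal PyTorch sketch of one way to realize the CNN-LSTM combination described above (T=100 time steps, F=9 sensor channels, C=3 classes, 3×3 kernels, a two-layer LSTM with 128 hidden units, concatenated features, Adam at the Table 2 learning rate). The channel widths and exact depth of the original five-layer CNN are not fully specified in the text, so those choices here are assumptions rather than the authors' implementation.

```python
# Sketch of a CNN + LSTM fusion classifier over (batch, T=100, F=9) windows.
import torch
import torch.nn as nn

class CnnLstmScorer(nn.Module):
    def __init__(self, T=100, F=9, C=3, hidden=128):
        super().__init__()
        # CNN branch: treat the (T, F) window as a single-channel 2-D map.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # -> (batch, 32, 1, 1)
        )
        # LSTM branch: two stacked layers over the 9-channel sequence (tanh cells).
        self.lstm = nn.LSTM(input_size=F, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        # Fused fully connected layer producing the C class scores.
        self.head = nn.Linear(32 + hidden, C)

    def forward(self, x):                            # x: (batch, T, F)
        cnn_feat = self.cnn(x.unsqueeze(1)).flatten(1)   # (batch, 32)
        _, (h_n, _) = self.lstm(x)                       # h_n: (2, batch, hidden)
        lstm_feat = h_n[-1]                              # top layer's final state
        return self.head(torch.cat([cnn_feat, lstm_feat], dim=1))

model = CnnLstmScorer()
logits = model(torch.randn(64, 100, 9))              # one batch of 64 windows
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (64,)))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss.backward()
optimizer.step()
print(logits.shape)                                   # torch.Size([64, 3])
```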
The convolution operation can effectively capture the spatial features and temporal dynamic changes of the athlete's joint movement trajectory, body posture changes, etc. The features processed by CNN can be passed to the LSTM network, which further analyzes the timing information of these features. Through the time memory mechanism of LSTM, the network can learn the continuity and long-term dependencies in the action. The convolution operation formula is:

f_{i,j} = Σ_{m=−k}^{k} Σ_{n=−k}^{k} x_{i+m,j+n} · w_{m,n}    (11)
x is the input feature map; w is the convolution kernel; the choice of the convolution kernel is closely related to the characteristics of the data. Smaller convolution kernels help capture subtle changes, while larger convolution kernels can extract more macro features. Since the aerobics athlete's tiny movements and posture changes need to be captured, and smaller convolution kernels help extract details more accurately, a convolution kernel of size 3×3 is selected. The LSTM layer filters out irrelevant temporal information through its forget gate, input gate, and output gate, retaining the long-term and short-term dependency information related to the action performance evaluation.

After the convolution layer, maximum pooling and average pooling [36] are used to reduce the dimension of the feature map. The role of the pooling layer is to reduce the amount of data after the convolution operation, reduce the computational complexity, and retain the most important feature information. The pooling operation formula is:

p_{i,j} = max_{m,n}(g_{i+m,j+n})    (12)

In the fully connected layer, the temporal information generated by LSTM and the spatial features extracted by CNN are integrated to generate the final score and classification results of the action performance. The model can not only evaluate the quality of actions, but also classify actions into multiple categories such as "standard", "deficient", and "error". The action scoring function is:

S = Σ_{i=1}^{N} φ_i·h_i    (13)

h_i is the score of each feature, and φ_i is the weight. The action evaluation score reflects the accuracy, fluency, and standardization of the athlete's action. The classification results provide coaches and athletes with targeted training improvement directions. Each classification result helps to further guide athletes' specific improvement measures in training.

During the training process, the Adam optimization algorithm [37] is used to update parameters. The optimization algorithm update formula is:

θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ϵ)    (14)

m̂_t and v̂_t are momentum estimates. The cross-entropy loss function [38] is used to optimize the classification task, so that the model can better handle multi-classification problems and continuously improve the accuracy of action scoring and classification by minimizing the loss function. The classification loss function is:

L = − Σ_{i=1}^{C} y_i · log(ŷ_i)    (15)

y_i is the true label, and ŷ_i is the predicted probability. To improve the model's training effect, data enhancement technology is used to simulate the motion performance in different training scenarios, expand the training data set, and increase the robustness of the model. Data enhancement methods include operations such as rotation, mirror flipping, and scaling. More training samples are generated through enhancement operations, so that the model can still show excellent performance in different motion modes.

3.4 Real-time feedback and optimization suggestion generation

To improve the effect of aerobics training, a real-time feedback system is designed to feed back the evaluation results generated during the training process to athletes, helping them adjust their movements, avoid errors and optimize the quality of movements. The key to the feedback system lies in real-time performance and accuracy. Only timely and accurate feedback can effectively improve the training level of athletes.

The system analyzes the real-time action data of athletes through a model combining a trained CNN model and LSTM to generate more accurate action scores and evaluation results. Every time an athlete performs an action, the system immediately analyzes the action and outputs a real-time score. The score reflects the accuracy, fluency, and completion of the action. The higher the score, the more standard the action. For low-scoring actions, the system automatically identifies the errors and provides specific optimization suggestions. The error correction weight formula is:

ϱ_error = 1 / (1 + exp(−κ·δ))    (16)

δ is the margin of error. Based on the score and error recognition results, the system generates personalized optimization suggestions. The optimization suggestion generation function is:

G = arg max_i (S_i + λ·E_i)    (17)

S_i is the score, and E_i is the severity of the error. Suggestions include improving posture, adjusting the range of motion, strengthening muscle control, etc., to help athletes correct deficiencies in their movements. Optimization suggestions can be provided in text form or visualized through a graphical interface to help athletes more intuitively understand the problems in their movements.

To ensure timely feedback, the system accelerates the calculation process by optimizing the algorithm, controlling the delay between action scoring and feedback generation to less than 200 ms, ensuring that athletes can receive targeted adjustment suggestions in a short period of time. The real-time feedback delay formula is:

T_delay = T_process + T_transmit    (18)

Through the real-time feedback mechanism, athletes can continuously adjust their movements during training and gradually improve the training effect. Optimization suggestions are not limited to correcting mistakes, but can also help athletes improve the delicacy and accuracy of their movements. The athlete improvement index formula is:
I = (ΔS · ln(1 + A_0)) / (ΔT · (1 + e^(−ε(ΔS − ΔS_avg))))    (19)

ΔS is the score increment; ΔT is the time; A_0 is the athlete's baseline ability; ε is the adjustment parameter; ΔS_avg is the average score increment. Through long-term training and optimization feedback, athletes can improve their overall performance in a short period of time and achieve the best training effect. The long-term optimization trend equation is:

O(t) = O_0 + ∫_0^t (dS/dτ) dτ    (20)

O_0 is the initial performance. Combining DL with real-time feedback technology, the system provides athletes with an intelligent training platform that can effectively improve training efficiency and quality. Table 3 lists some of the hyperparameters used in the experiment.
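Before the hyperparameter summary in Table 3, the scoring-and-feedback step of Eqs. (16)-(18) can be sketched as follows; the candidate suggestions, κ and λ values, and timing figures are illustrative assumptions, not the system's configuration.

```python
# Minimal sketch of Eqs. (16)-(18): error weight, suggestion selection, delay check.
import math

def error_weight(delta: float, kappa: float = 5.0) -> float:
    """Eq. (16): rho_error = 1 / (1 + exp(-kappa * delta))."""
    return 1.0 / (1.0 + math.exp(-kappa * delta))

def pick_suggestion(scores, severities, lam: float = 0.5) -> int:
    """Eq. (17): G = arg max_i (S_i + lam * E_i) over candidate suggestions."""
    combined = [s + lam * e for s, e in zip(scores, severities)]
    return max(range(len(combined)), key=combined.__getitem__)

suggestions = ["improve posture", "adjust range of motion", "strengthen muscle control"]
scores      = [62.0, 70.5, 55.0]   # per-suggestion relevance scores S_i (assumed)
severities  = [0.8, 0.3, 0.6]      # per-suggestion error severities E_i (assumed)

best = pick_suggestion(scores, severities)
t_delay_ms = 120 + 60              # Eq. (18): T_process + T_transmit (assumed values)
print(suggestions[best], round(error_weight(delta=0.4), 3), t_delay_ms <= 200)
```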
Table 3: Some hyperparameter data

Parameter | Function | Parameter | Function
α, β, γ | Sensor data fusion weighting parameters | k | Bluetooth signal conditioning parameter
K_k | Kalman gain | ψ | Mother wavelet function
ω1, ω2 | Weighted fusion weights | Σ | Feature covariance matrix
w | Convolution kernel | φ_i | Action score weight
m̂_t, v̂_t | Momentum estimates | ŷ_i | Prediction probability
δ | Margin of error | ε | Adjustment parameter
Table 3 lists the hyperparameters used for model and signal processing and their functional descriptions. These hyperparameters play a key role in sensor data fusion, signal conditioning, feature extraction, and prediction, and can effectively improve the performance and accuracy of the model. Adjusting these parameters can optimize system behavior according to specific application requirements and achieve more accurate data processing and action recognition.

4 Experimental results

4.1 Experimental setup

This paper collects a total of 12,000 action samples, covering three categories of actions: standard actions (4,000), insufficient actions (4,000), and incorrect actions (4,000). Each sample contains multimodal data of 100 time steps, including 9 channels (3 axes × 3 sensors) from accelerometers, gyroscopes, and magnetometers, and 18 joint point 3D coordinates obtained by Kinect. The data collection frequency is 50 Hz, which means collecting 50 frames per second. The wearable sensor is the Xsens MTw Awinda series inertial measurement unit (IMU), with specific parameters shown in Table 4:
Table 4: Sensor parameters

Sensor | Range | Resolution | Sampling rate
Accelerometer | ±16 g | 0.001 g | 50 Hz
Gyroscope | ±2000°/s | 0.01°/s | 50 Hz
Magnetometer | ±2.5 Gauss | 0.01 Gauss | 50 Hz
The data collection is conducted in an indoor sports arena, with an ambient temperature controlled at 22–25 °C, humidity at 45–60%, and good and stable lighting conditions. All model training is completed under the PyTorch 1.13.1 deep learning framework, with a training time of approximately 4 hours per model.

4.2 Effect of Kalman filtering on sensor data

In sensor data processing, the original signal is easily affected by noise, which affects the accuracy of the data. Kalman filtering, as a common noise suppression method, can effectively improve the stability and accuracy of the data by correcting the measured values. Figure 3 shows the comparison between the original data of the accelerometer, gyroscope, and magnetometer and the data after Kalman filtering, which is used to evaluate the filtering effect.
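As a minimal illustration of the per-axis Kalman update in Eq. (3), the sketch below filters one noisy accelerometer axis with a scalar constant-state model (H = 1); the process and measurement noise values q and r are illustrative assumptions, not the tuned values used in the study.

```python
# Scalar Kalman smoothing of one sensor axis (sketch of Eq. (3)).
import numpy as np

def kalman_smooth(z: np.ndarray, q: float = 1e-4, r: float = 1e-2) -> np.ndarray:
    """x_{k|k} = x_{k|k-1} + K_k (z_k - x_{k|k-1}) for a constant-value state."""
    x, p = z[0], 1.0                 # initial state estimate and covariance
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        p = p + q                    # predict step (state assumed constant)
        K = p / (p + r)              # Kalman gain K_k
        x = x + K * (zk - x)         # correct with the observation z_k
        p = (1.0 - K) * p
        out[k] = x
    return out

rng = np.random.default_rng(0)
acc_x = 0.1 + 0.03 * rng.standard_normal(100)     # noisy accelerometer X axis (m/s^2)
print(acc_x.std(), kalman_smooth(acc_x).std())    # filtered signal fluctuates less
```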
(a) Accelerometer data comparison; (b) Gyroscope data comparison; (c) Magnetometer data comparison
Figure 3: Kalman filter effect
Figure 3 shows the fluctuation of different sensors in different directions and the effect after filtering. In the accelerometer data, after Kalman filtering, the peak-to-valley difference of acceleration X drops from the original 0.13 m/s² to 0.04 m/s², showing a more stable state. The original acceleration fluctuates greatly in the X-axis and Y-axis directions. After Kalman filtering, the fluctuation of the data is more stable, indicating that Kalman filtering can effectively reduce the interference of noise. When the timestamp is 0 ms, the original acceleration X is 0.12 m/s², and after Kalman filtering, it is 0.13 m/s², with little change. The gyroscope data also shows a similar trend. The original data fluctuates to varying degrees in the X, Y, and Z directions, while the angular velocity data after Kalman filtering has less fluctuation. After filtering, the peak-to-valley difference of the angular velocity in the X direction is reduced from the original 0.25°/s to 0.17°/s, ensuring more accurate angle measurement. At 0 ms, the original angular velocity Z is 5.50°/s, and after Kalman filtering it is 5.52°/s. The magnetometer data also shows small fluctuations. After filtering, the fluctuations of the three-axis magnetic field are further reduced. The peak-to-valley difference in the X direction drops from the original 0.8 µT to 0.6 µT, providing more stable environmental data. At 0 ms, the original magnetic field X is 45.0 µT, and after Kalman filtering, it is 45.2 µT. The data optimized by Kalman filtering, combined with the analysis and processing of the CNN model, can effectively improve the accuracy of motion tracking and evaluation, and provide athletes with more accurate real-time feedback and personalized training suggestions.

4.3 Action scoring effect

To comprehensively evaluate the performance of aerobics athletes, a scoring system based on sensor data is introduced. The score value of each sensor at different time points reflects the quality and stability of the athlete's movements. The comprehensive score is the average of these three sensor scores and provides an overall performance evaluation for aerobics athletes, helping athletes and coaches to grasp the effect of exercise in real-time. Figure 4 shows the scoring performance at different time points.
Figure 4: Sensor scores at different time points
In the experiment, wearable sensors and motion tracking technology are combined to fuse and analyze multimodal data such as acceleration, angular velocity, and magnetic field using the CNN model and LSTM to achieve accurate evaluation and optimization of aerobics movements. The data in Figure 4 shows that as time goes by, the athletes' acceleration scores gradually increase from 70 to 85 points; the angular velocity scores increase from 60 to 75; the magnetic field scores also maintain a relatively stable upward trend. The final comprehensive score increases from 70.0 to 82.7. The data changes reflect the gradual optimization of the athletes' performance during training. Figure 4 shows the changes in different scoring dimensions at each time point, helping trainers to accurately monitor and adjust training strategies in real-time. The CNN optimizes sensor data by combining Kalman filtering and wavelet transform technology to provide athletes with more accurate performance feedback and promote the improvement of training results.

4.4 Performance of feedback systems in different scenarios

The performance improvement before and after training can reflect the optimization effect of the system. Table 5 shows the relationship between feedback delay, optimization suggestion generation time, and athlete performance improvement in different training scenarios.
Table 5: Feedback performance in different training scenarios

Scenario Type | Average Feedback Delay (ms) | Optimization Suggestion Generation Time (ms) | Pre-training Performance Score (out of 100) | Post-training Performance Score (out of 100) | Improvement Rate (%)
Indoor Low Intensity | 150 | 90 | 65 | 80 | 23.1
Indoor High Intensity | 180 | 95 | 62 | 78 | 25.8
Outdoor Low Intensity | 160 | 85 | 68 | 82 | 20.6
Outdoor High Intensity | 200 | 110 | 60 | 75 | 25.0
According to the data in Table 5, there are certain differences in feedback delay and optimization suggestion generation time in different training scenarios. In the indoor high-intensity training scenario, the average feedback delay is 180 milliseconds; the optimization suggestion generation time is 95 milliseconds; the action score before training is 62 points. After training, the score increases to 78 points, and the improvement rate reaches 25.8%, which is the highest improvement rate in all scenarios. This shows that under high-intensity training,
despite the longer feedback delay, the system is able to more effectively generate optimization suggestions and improve athlete performance. In outdoor low-intensity training scenarios, the feedback delay is shorter, at 160 milliseconds, but the improvement rate is 20.6%, which is relatively low, indicating that the feedback system in this scenario has room for improvement in the generation and application of optimization suggestions. The efficiency and optimization effect of the feedback system vary in different training scenarios. The response speed of the system and the time to generate suggestions are closely related to the performance improvement of athletes.

4.5 CNN model training effect

To better evaluate and optimize the performance of athletes in different training scenarios, the experiment uses the CNN model and LSTM to conduct an in-depth analysis of various training data. In the four training scenarios of indoor low intensity, indoor high intensity, outdoor low intensity, and outdoor high intensity, the change of training cycle has an important impact on the accuracy and performance of the model. The changes in the accuracy, precision, recall, F1-score, and loss value of the CNN model in different scenarios can be analyzed to understand the model optimization trend during the training process. Figure 5 shows the changes in key indicators of the model after each training cycle in these scenarios.
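For reference, the per-scenario indicators reported in Figure 5 can be computed from the model's predictions as sketched below; the label arrays are placeholders, not experimental data.

```python
# Sketch of computing accuracy, precision, recall and F1 for one scenario.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = np.random.default_rng(0).integers(0, 3, size=200)   # 3 action classes
y_pred = y_true.copy()
y_pred[:30] = (y_pred[:30] + 1) % 3                          # inject some errors

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```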
(a) Indoor low-intensity training scene; (b) Indoor high-intensity training scene; (c) Outdoor low-intensity training
scene; (d) Outdoor high-intensity training scene
Figure 5: CNN effects in different scenes
By analyzing the model effects in different training scenarios through the data in Figure 5, the performance of the CNN model combined with LSTM in various indicators is significantly improved with the increase of training cycles. In the indoor low-intensity training scenario, the accuracy rate increases from 74.5% to 87.1%; the precision, recall rate, and F1-score also increase steadily, and the loss value decreases from 0.68 to 0.27, indicating that the model's ability to fit the data steadily improves with the passage of training time. In indoor high-intensity scenarios, the accuracy rate increases from 75.0% to 88.2%. The improvement in precision and recall rate shows that the model can effectively handle more complex training environments, and the loss value drops to 0.26. The training effect of the model in outdoor low-intensity and high-intensity scenarios also shows a similar trend, with the accuracy rate increased from 73.2% to 85.9% and from 74.3% to 86.7%, respectively, and the loss value decreased significantly. Whether it is a low-intensity or high-intensity scenario, the model shows good stability and accuracy in different environments. With the increase of training cycles, the performance of the CNN model in motion scoring and classification has been effectively improved, and the training error can be significantly reduced.

To further verify the effectiveness of the method proposed in this paper in the evaluation of aerobics movements, the CNN-LSTM hybrid model is compared with several representative studies in recent years. The comparative methods include: multimodal fusion method
based on traditional CNN, action recognition method based on LSTM, traditional classification method based on support vector machine (SVM), and temporal modeling method based on Transformer. Comparative experiments are conducted on the same dataset, with evaluation metrics including accuracy and F1 score, as well as the action evaluation RMSE (Root Mean Squared Error) value, as shown in Table 6:
Table 6: Comparison of the performance of this method with existing research

Model | Accuracy | F1-score | RMSE
SVM | 72.1% | 0.703 | 8.1
CNN | 83.2% | 0.817 | 6.0
LSTM | 84.6% | 0.832 | 5.8
Transformer | 85.4% | 0.841 | 5.6
CNN-LSTM | 88.2% | 0.867 | 4.2
From Table 6, it can be seen that this method outperforms existing methods in terms of accuracy and F1 score. Compared with traditional CNN methods, this method has improved accuracy by 5.0%; compared with the LSTM method, it has improved by 3.6%; compared with the Transformer method, it has improved by 2.8%. The RMSE of the method proposed in this paper is 4.2, significantly lower than the other comparative methods, indicating that the CNN-LSTM hybrid model proposed in this paper has higher accuracy and stability in predicting action scores. This result indicates that the method proposed in this paper can effectively achieve accurate evaluation of athletes' movements and provide objective guidance for scientific training.

5 Experimental discussion

By combining wearable sensors with motion tracking technology and using CNNs for multimodal data fusion and analysis, this study has achieved significant experimental results in motion evaluation and optimization. Kalman filtering significantly improves the stability and accuracy of sensor data in the application of noise suppression, and effectively reduces the impact of environmental interference on data quality. With the increase of training cycles, the CNN model combined with LSTM continues to improve in terms of accuracy, precision, recall rate, and F1-score in action scoring and classification, showing good fitting ability. In high-intensity training scenarios, despite relatively long feedback delays, the system can still effectively generate optimization suggestions, and the athletes' performance has been significantly improved.

However, the experimental results also reveal that there are still some potential challenges and problems in the application and development of the system. The performance of the model has been significantly improved in different training scenarios, but the feedback delay and optimization suggestion generation time are longer in high-intensity training scenarios, which affects the system's real-time response capability. Kalman filtering and other data optimization techniques effectively reduce noise, but in complex or extreme training environments, external interference still poses a risk of affecting the accuracy of sensor data and causing certain errors in training feedback. With the diversification of training scenarios and the increase in environmental complexity, how to further improve the model's ability to classify complex actions and its ability to comprehensively process multimodal data remains an issue to be resolved. Future research can focus on optimizing sensor data collection and processing technology, strengthening data synchronization and fusion algorithms, and further improving the stability and adaptability of CNN models in different environments. With the advancement of technology, combined with more sensors and analysis of training scenarios, it will be possible to provide athletes with more detailed and comprehensive personalized training programs, promoting the intelligent and precise development of sports training.

6 Conclusions

This paper proposes a fitness exercise action evaluation method that integrates wearable sensors and motion tracking systems, and combines CNN and LSTM models to fuse and analyze multimodal data. The experimental results show that the method exhibits excellent performance in action classification tasks: in indoor low-intensity training scenarios, the accuracy increases from 74.5% to 87.1%; in high-intensity training scenarios, the accuracy increases from 75.0% to 88.2%. By introducing Kalman filtering, wavelet transform, and a dynamic weighting fusion strategy, the stability of sensor data and the generalization ability of the model have been effectively improved. This paper not only provides high-precision motion evaluation and real-time feedback for aerobics athletes, but also provides a transferable technical framework for other high-intensity, multimodal sports projects. In the future, the system can be further expanded to remote training platforms, intelligent wearable devices, virtual coaching systems, and other application scenarios, promoting the deep integration and widespread application of artificial intelligence technology in the fields of sports training and health management. In the future, lightweight CNN structures such as MobileNet and TinyML, deployment of models on wearable devices, and heterogeneous computing acceleration can be used to further shorten feedback latency and improve system response speed.

Authorship contribution statement

Danhua Tan: Writing-Original draft preparation, Conceptualization, Supervision, Project administration.
Data availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Author statement

The manuscript has been read and approved by all the authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work.

Ethical approval

All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.

References

[1] M. E. M. Simbolon, D. K. A. Firdausi, I. Dwisaputra, A. Rusdiana, C. Pebriandani, and R. Prayoga, "Utilization of Sensor technology as a Sport Technology Innovation in Athlete Performance Measurement," Indonesian Journal of Electronics and Instrumentation Systems (IJEIS), 13(2): 147–158, 2023. https://doi.org/10.22146/ijeis.89581
[2] Z. Mei, "3D images analysis of sports technical features and sports training methods based on artificial intelligence," J Test Eval, 51(1): 189–200, 2023. https://doi.org/10.1520/JTE20210469
[3] S. A. Kovalchik, "Player tracking data in sports," Annu Rev Stat Appl, 10(1): 677–697, 2023. https://doi.org/10.1146/annurev-statistics-033021-110117
[4] L. Yang, O. Amin, and B. Shihada, "Intelligent wearable systems: Opportunities and challenges in health and sports," ACM Comput Surv, 56(7): 1–42, 2024. https://doi.org/10.1145/3648469
[5] W. Li, "Application of IoT-enabled computing technology for designing sports technical action characteristic model," Soft Comput, 27(17): 12807–12824, 2023. https://doi.org/10.1007/s00500-023-08966-4
[6] Y. Fang, "Utilizing Wearable Technology to Enhance Training and Performance Monitoring in Indonesian Badminton Players," Studies in Sports Science and Physical Education, 2(1): 11–23, 2024. https://doi.org/10.1186/s40561-023-00247-9
[7] J. Corban et al., "Using an affordable motion capture system to evaluate the prognostic value of drop vertical jump parameters for noncontact ACL injury," Am J Sports Med, 51(4): 1059–1066, 2023. https://doi.org/10.1177/03635465231151686
[8] C. J. Rigozzi, G. A. Vio, and P. Poronnik, "Application of wearable technologies for player motion analysis in racket sports: A systematic review," Int J Sports Sci Coach, 18(6): 2321–2346, 2023. https://doi.org/10.1177/17479541221138015
[9] Y. Zhang, "Design of Wireless Motion Sensor Nodes based on the Kalman Filter Algorithm," Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering), 16(3): 248–255, 2023. https://doi.org/10.2174/2352096515666220908152036
[10] S. Akan and S. Varlı, "Use of deep learning in soccer videos analysis: survey," Multimed Syst, 29(3): 897–915, 2023. https://doi.org/10.1007/s00530-022-01027-0
[11] D. Gholamiangonabadi and K. Grolinger, "Personalized models for human activity recognition with wearable sensors: deep neural networks and signal processing," Applied Intelligence, 53(5): 6041–6061, 2023. https://doi.org/10.1007/s10489-022-03832-6
[12] A. Chakraborty and N. Mukherjee, "A deep-CNN based low-cost, multi-modal sensing system for efficient walking activity identification," Multimed Tools Appl, 82(11): 16741–16766, 2023. https://doi.org/10.1007/s11042-022-13990-x
[13] W. Liu, Y. Liu, and R. Bucknall, "Filtering based multi-sensor data fusion algorithm for a reliable unmanned surface vehicle navigation," Journal of Marine Engineering & Technology, 22(2): 67–83, 2023. https://doi.org/10.1080/20464177.2022.2031558
[14] L. Zhang and H. Dai, "Motion trajectory tracking of athletes with improved depth information-based KCF tracking method," Multimed Tools Appl, 82(17): 26481–26493, 2023. https://doi.org/10.1007/s11042-023-14929-6
[15] P. Hao and K. Qian, "The Integration of Personalized Training Program Design and Information Technology for Athletes," Scalable Computing: Practice and Experience, 25(5): 4351–4359, 2024. https://doi.org/10.12694/scpe.v25i5.3083
[16] V. Deepak, D. K. Anguraj, and S. S. Mantha, "An efficient recommendation system for athletic performance optimization by enriched grey wolf optimization," Pers Ubiquitous Comput, 27(3): 1015–1026, 2023. https://doi.org/10.1007/s00779-022-01680-2
[17] J. K. Urbanek et al., "Free-living gait cadence measured by wearable accelerometer: a promising alternative to traditional measures of mobility for assessing fall risk," The Journals of Gerontology: Series A, 78(5): 802–810, 2023. https://doi.org/10.1093/gerona/glac013
[18] A. Hussain, S. Ali, M.-I. Joo, and H.-C. Kim, "A deep learning approach for detecting and classifying cat activity to monitor and improve cat's well-being using accelerometer, gyroscope, and magnetometer," IEEE Sens J, 24(2): 1996–2008, 2023.
CNN and LSTM-Based Multimodal Data Fusion for Performance… Informatica 49 (2025) 235–248 247
[19] A. Spilz and M. Munz, “Synchronisation of Nanoscale Measurements and Ensemble
wearable inertial measurement units based on Behavior,” ACS Nano, 17(21): 21493–21505,
magnetometer data,” Biomedical 2023. https://doi.org/10.1021/acsnano.3c06335
Engineering/Biomedizinische Technik, 68(3): [31] J. Sun, H. Zhang, X. Ma, R. Wang, H. Sima, and
263–273, 2023. https://doi.org/10.1515/bmt- J. Wang, “Spectral–Spatial Adaptive Weighted
2021-0329 Fusion and Residual Dense Network for
[20] A. Liu, R. P. Mahapatra, and A. V. R. Mayuri, hyperspectral image classification,” The Egyptian
“Hybrid design for sports data visualization using Journal of Remote Sensing and Space Sciences,
AI and big data analytics,” Complex & Intelligent 28(1): 21–33, 2025.
Systems, 9(3): 2969–2980, 2023. https://doi.org/10.1016/j.ejrs.2024.11.001
https://doi.org/10.1007/s40747-021-00557-w [32] J. P. Bharadiya, “A tutorial on principal
[21] C.-T. Lin, Y. Wang, S.-F. Chen, K.-C. Huang, and component analysis for dimensionality reduction
L.-D. Liao, “Design and verification of a wearable in machine learning,” Int J Innov Sci Res Technol,
wireless 64-channel high-resolution EEG 8(5): 2028–2032, 2023.
acquisition system with wi-fi transmission,” Med DOI:10.5281/zenodo.8002436
Biol Eng Comput, 61(11): 3003–3019, 2023. [33] F. Bizzarri, D. Del Giudice, S. Grillo, D. Linaro,
https://doi.org/10.1007/s11517-023-02879-y A. Brambilla, and F. Milano, “Inertia estimation
[22] X. Shi and H. Zou, “Data Collection and Analysis through covariance matrix,” IEEE Transactions
based on Sensor Technology in Sports Training,” on Power Systems, 39(1): 947–956, 2023. DOI:
Scalable Computing: Practice and Experience, 10.1109/TPWRS.2023.3236059
25(5): 4399–4406, 2024. [34] Y. He, C.-K. Zhang, H.-B. Zeng, and M. Wu,
https://doi.org/10.12694/scpe.v25i5.3200 “Additional functions of variable-augmented-
[23] M. Khodarahmi and V. Maihami, “A review on based free-weighting matrices and application to
Kalman filter models,” Archives of systems with time-varying delay,” Int J Syst Sci,
Computational Methods in Engineering, 30(1): 54(5): 991–1003, 2023.
727–747, 2023. https://doi.org/10.1007/s11831- https://doi.org/10.1080/00207721.2022.2157198
022-09815-7 [35] Singh S, Sehgal V K. "Deep Learning-Based CNN
[24] M. Azhar, S. Ullah, M. Raees, K. U. Rahman, and Multi-Modal Camera Model Identification for
I. U. Rehman, “A real-time multi view gait-based Video Source Identification,” Informatica: An
automatic gender classification system using International Journal of Computing and
kinect sensor,” Multimed Tools Appl, 82(8): Informatics, 47(3): 417-430, 2023.
11993–12016, 2023. https://doi.org/10.31449/inf.v47i3.4392
https://doi.org/10.1007/s11042-022-13704-3 [36] T. Sharma, N. K. Verma, and S. Masood, “Mixed
[25] L. Lv, J. Yang, F. Gu, J. Fan, Q. Zhu, and X. Liu, fuzzy pooling in convolutional neural networks
“Validity and reliability of a depth camera–based for image classification,” Multimed Tools Appl,
quantitative measurement for joint motion of the 82(6): 8405–8421, 2023.
hand,” J Hand Surg Glob Online, 5(1): 39–47, https://doi.org/10.1007/s11042-022-13553-0
2023. https://doi.org/10.1016/j.jhsg.2022.08.011 [37] M. Reyad, A. M. Sarhan, and M. Arafa, “A
[26] Y. Wu, Z. Sun, G. Ran, and L. Xue, “Intermittent modified Adam algorithm for deep neural network
control for fixed-time synchronization of coupled optimization,” Neural Comput Appl, 35(23):
networks,” IEEE/CAA Journal of Automatica 17095–17112, 2023.
Sinica, 10(6): 1488–1490, 2023. DOI: https://doi.org/10.1007/s00521-023-08568-z
10.1109/JAS.2023.123363 [38] Z. Mei et al., “Automatic loss function search for
[27] Hrovatin, N. "Enabling Decentralized Privacy adversarial unsupervised domain adaptation,”
Preserving Data Processing in Sensor Networks,” IEEE Transactions on Circuits and Systems for
Informatica (03505596), 48(1): 141-142, 2024. Video Technology, 33(10): 5868–5881, 2023.
https://doi.org/:10.31449/inf.v48i1.5739. DOI: 10.1109/TCSVT.2023.3260246
[28] Thi H N, Duc C V, Duc C T, HH Minh, SN Van,
LV Quan. “Memetic Algorithm for Maximizing
K-coverage and K-Connectivity in Wireless
Sensor Network,” Informatica, (03505596), 49(1):
1-7, 2025.
https://doi.org/:10.31449/inf.v49i1.6750.
[29] A. Halidou, Y. Mohamadou, A. A. A. Ari, and E.
J. G. Zacko, “Review of wavelet denoising
algorithms,” Multimed Tools Appl, 82(27):
41539–41569, 2023.
https://doi.org/10.1007/s11042-023-15127-0
[30] M. Kang, C. L. Bentley, J. T. Mefford, W. C.
Chueh, and P. R. Unwin, “Multiscale Analysis of
Electrocatalytic Particle Activities: Linking
248 Informatica 49 (2025) 235–248 D. Tan
https://doi.org/10.31449/inf.v49i16.8812 Informatica 49 (2025) 249–268 249
Metaheuristic-Enhanced SVR Models for California Bearing Ratio
Prediction in Geotechnical Engineering
Yulin Lan1, Na Feng2,* and Zhisheng Yang3
1Planning and Finance Department, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
2School of Information Engineering, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
3Party and Government Office, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
E-mail: sdqzyuchen@163.com
*Corresponding author
Keywords: california bearing ratio, support vector regression, adaptive opposition slime mold algorithm, alibaba and
the forty thieves optimization algorithm, dingo optimization algorithm
Received: April 7, 2025
Soil resistance characteristics, particularly the California Bearing Ratio (CBR), play a pivotal role in
pavement and subgrade design. However, conventional laboratory-based CBR testing is often time-
consuming, labor-intensive, and costly. This study presents a novel machine learning framework that
combines Support Vector Regression (SVR) with three recent metaheuristic optimization algorithms—
Dingo Optimization Algorithm (DOA), Alibaba and the Forty Thieves Optimization (AFT), and Adaptive
Opposition Slime Mold Algorithm (AOSMA)—to predict CBR values efficiently and accurately. A dataset
consisting of 220 soil samples with eight geotechnical input parameters was used to develop and evaluate
the hybrid models. The predictive performance of each model was assessed using multiple evaluation
metrics, including R², RMSE, MSE, RSR, and WAPE. Results indicate that the SVR–AFT (SVAF) hybrid
model outperformed the others, achieving an R² of 0.9968 and an RMSE of 0.7946 in the testing phase,
demonstrating high generalization ability and predictive precision. The integration of SVR with
metaheuristic algorithms significantly enhances model robustness and accuracy, offering a practical and
cost-effective alternative to empirical CBR testing methods. This work highlights the potential of hybrid
AI models in solving complex geotechnical prediction problems and contributes to the growing body of
research at the intersection of civil engineering and artificial intelligence.
Povzetek: Hibridni modeli SVR so optimizirani z metahevristikami AFT, DOA in AOSMA za hitro in
natančno napovedovanje CBR iz osmih geotehničnih parametrov. Na 220 vzorcih doseže najboljši model
SVAF R² = 0.9968 in RMSE = 0.7946, kar ponuja stroškovno učinkovito alternativo laboratorijskim
testom.
1 Introduction

CBR is the term used in geotechnical construction to describe the resistance of a substrate sample to the insertion of a piston; more specifically, the CBR describes the force that must be applied to the piston for it to penetrate the soil [1]. Initially, the CBR test was devised in California to appraise the suitability of soils for highway construction, and civil engineers later modified the testing procedure to extend its use to airport construction. Almost all emerging countries widely adopt the CBR test to appraise the resilience of pavement soils [2]. A material's load-bearing capacity is gauged by its CBR, which is the ratio of the attainable supporting strength of base materials to that of standard crushed rock. In structural engineering, 100 is considered a reasonable upper limit of the CBR for crushed rock materials; conversely, the CBR values of alternative materials fall below 100 [3]. Recent advances in artificial intelligence (AI) are closely intertwined with the rapid development of electronic technologies, forming the foundation of the so-called "information society." Gams and Kolenik highlight the reciprocal relationship between electronics and AI, where swift hardware improvements, described by a comprehensive set of Information Society (IS) laws, have driven groundbreaking progress in AI across fields such as medicine, smart environments, and autonomous systems. Their research shows that AI and ambient intelligence (AmI) not only benefit from electronic advancements but are also beginning to influence hardware optimization and intelligent system design, indicating a move toward a more integrated technological progression [4]. After compacted soils have been tested in the laboratory, a subsequent test can be conducted; for soils located in trenches, it is also possible to conduct the CBR test in situ [5]. It is essential to recognize that in situ and laboratory test outcomes can show perceptible differences depending on soil type, unit weight, and water content. CBR tests have proved promising for providing information about the stability and strength of different kinds of soil-related structures, such as road fills, airport runways, dams, and road foundations.
Moreover, these tests can be conducted on both unsoaked and soaked soil varieties. Laboratory CBR tests are demanding in terms of time and manual effort, and their outcomes are frequently marred by discrepancies attributable to suboptimal laboratory conditions and soil samples, which in turn lead to inaccurate CBR values [6]. Various studies have been performed on the California bearing ratio, which has led researchers to formulate different procedures. Previous studies showed that changes in soil types and properties affect the value of CBR; most research work has focused on studying the relationships between compaction properties, index properties, mineralogical examinations, and CBR values [6]–[18]. To determine the value of CBR, soils are compacted at a predetermined MDD and OMC at a specified energy level for the soil material. For the soaked CBR, the specimens are soaked for four days, the primary purpose of this soaking being to allow absorption. Consequently, the assessment of the CBR value for a soaked sample typically requires a period of approximately five to six days. This delay can prove detrimental to the timely completion of a large-scale construction endeavor. Since soils vary greatly in quality, applying this procedure to foundation soil samples collected from a small number of sites may not truly represent the soil properties for all roads; to eliminate this deficiency, a large number of specimens needs to be gathered for testing. Therefore, estimating the CBR values of pavement subgrade soils from easily identifiable parameters becomes very important in developing appropriate pavement design parameters. Recently, interest in using Artificial Intelligence (AI) tactics to solve geotechnical engineering problems has increased, and some valuable outcomes have been obtained [19], [20]. Furthermore, a limited number of studies have documented endeavors to appraise the CBR of soils via diverse Artificial Neural Network (ANN) methodologies [21], [22], [23].

Recent advances in machine learning have increasingly supported geotechnical engineering by improving the prediction of soil and foundation properties through data-driven models. Support Vector Regression (SVR), when combined with metaheuristic optimization, has proven particularly effective in modeling complex nonlinear relationships within geotechnical datasets. Ngo et al. [24] demonstrated that SVR optimized via metaheuristics yielded superior performance in predicting the unconfined compressive strength of stabilized soils. Similarly, Hoang et al. [25] applied enhanced SVR models to successfully estimate pile bearing capacity, showcasing the method's versatility in foundation engineering. In the context of California Bearing Ratio (CBR) prediction, Bherde et al. [26] reported that Random Forest Regression outperformed other algorithms, including SVR, with maximum dry density and gravel content being the most influential predictors. While these results support the effectiveness of ensemble models, they also underline the need for more optimized SVR configurations that can match or exceed ensemble performance. A broader comparative study by Ma et al. [27], evaluating 20 metaheuristic algorithms for SVR parameter tuning in landslide displacement prediction, revealed considerable variation in outcomes. The Multiverse Optimizer emerged as particularly efficient in achieving high accuracy with low computational cost, highlighting the critical role of algorithm selection in enhancing SVR model performance. These studies collectively underscore the growing impact of hybrid AI models in geotechnical applications. However, few works have focused specifically on integrating SVR with newer and less explored metaheuristic algorithms such as the Alibaba and Forty Thieves (AFT), Dingo Optimization Algorithm (DOA), or Adaptive Opposition Slime Mold Algorithm (AOSMA). Our study addresses this gap by systematically evaluating and comparing these novel SVR-based hybrid models in predicting CBR, offering insights into their optimization behaviors, convergence patterns, and predictive robustness. By incorporating recent advancements and experimental benchmarks, this work aims to contribute both technically and methodologically to the field of AI-driven geotechnical modeling.

Considering the variety of parameters to be considered and the range of datasets observed, as explained above, it becomes of prime importance to develop robust predictive methodologies to model the mechanical attributes of the CBR and delineate the complex correlations between the constituents of soil. Recent studies have explored various soft computing and machine learning techniques for predicting the California Bearing Ratio (CBR). These include Random Forest, Gradient Boosting, and XGBoost, which are known for their solid performance in regression tasks. However, such models often need extensive tuning and can struggle to capture complex nonlinear relationships, especially when feature interactions are subtle. Conversely, Support Vector Regression (SVR) demonstrates strong generalization and robustness, especially when combined with kernel functions and metaheuristic optimization. To assess the effectiveness of the proposed SVR-based hybrid models, we also incorporated Random Forest as a benchmark and compared its predictive accuracy with the SVR models enhanced by metaheuristics.

Key techniques in earlier work include Artificial Neural Networks (ANN), Multiple Linear Regression (MLR), the Group Method of Data Handling (GMDH), and SVM. These models use soil parameters such as Atterberg limits, dry density, optimum moisture content, and soil gradation as inputs. However, many struggle with issues like overfitting, limited generalization to unseen data, or inadequate hyperparameter optimization. As shown in Table 1, most previous models achieved only moderate accuracy and did not utilize metaheuristic optimization to boost prediction performance. To fill this gap, this study introduces a hybrid Support Vector Regression (SVR) model combined with three metaheuristic optimizers (AFT, DOA, and AOSMA), aimed at improving the model's ability to learn nonlinear patterns. The superior performance of the SVAF model, especially in RMSE and R² metrics, highlights the benefits of this approach.
Table 1: Overview of past methods for CBR prediction

Study | Model Type | Input Features | Dataset Size | Performance Metrics (R² / RMSE) | Notes
Yildirim & Gunaydin | Artificial Neural Network (ANN) | LL, PL, PI, MDD, OMC, % Sand and Gravel | 120 | R² = 0.945 / RMSE = 1.82 | ANN prone to overfitting and high variance
Taskiran | GMDH | LL, PL, PI, Compaction properties | 200 | R² ≈ 0.92 | Good performance, limited interpretability
Alawi & Rajab | Multiple Linear Regression (MLR) | LL, PL, PI, Soil Gradation | 100 | R² = 0.86 | Struggles with nonlinear relationships
Ngo et al. [24] | SVR + Improved Arithmetic Optimization (IAOA) | Grain size, Density, OMC, PI | 150 | R² = 0.96 / RMSE = 1.12 | SVR enhanced with metaheuristic tuning
Wu et al. [25] | Stochastic Gradient Boosting Regression (SGBR) | LL, PL, MDD, % Clay, % Silt | 300 | R² = 0.974 | Ensemble method with good generalization
Bherde et al. [26] | Random Forest Regression (RFR) | MDD, % Gravel, OMC, PI | 400 | R² = 0.982 | Strong performance, but no hyperparameter optimization
Current Study | SVR + AFT | LL, PL, PI, MDD, OMC, SDA, QD, OPC | 300 | R² = 0.9968 / RMSE = 0.7946 | Best accuracy using hybrid SVR and AFT metaheuristic
This study addresses the challenges of traditional CBR testing methods, which are often time-consuming and costly, by exploring advanced machine learning models supplemented with nature-inspired optimization techniques. Specifically, it focuses on Support Vector Regression (SVR), a popular tool for nonlinear regression. Since SVR's performance heavily depends on hyperparameter selection, three recent metaheuristic algorithms (the Adaptive Opposition Slime Mold Algorithm (AOSMA), the Alibaba and the Forty Thieves Algorithm (AFT), and the Dingo Optimization Algorithm (DOA)) are employed to optimize the SVR framework. These algorithms offer diverse search strategies with strong potential for effective global optimization and faster convergence. The predictive capability of these hybrid models is evaluated using five standard statistical metrics: R², RMSE, MSE, RSR, and WAPE. The study aims to (1) develop and validate an SVR model for predicting the California Bearing Ratio (CBR) based on soil and compaction parameters; (2) improve SVR's predictive performance through hyperparameter tuning with the three optimization algorithms; and (3) perform a comprehensive comparison of the models using these metrics to identify the most accurate and reliable one for geotechnical use.

2 Materials and methodology

2.1 Data gathering

This study's dataset consists of 121 soil samples gathered from various geotechnical investigation reports and laboratory tests across different regions in [insert country or region, e.g., southwestern Iran or southeastern Asia; please specify based on your case]. The samples include a variety of soil types such as clayey soils, silty sands, gravels, and mixtures to ensure the broad applicability of the predictive models. Each sample records essential input parameters like [list key parameters: e.g., dry density, moisture content, liquid limit, plasticity index, etc.], with the California Bearing Ratio (CBR) used as the target variable. Data were obtained from both published literature and in-house experiments, offering a comprehensive understanding of soil behavior in various geological settings. The data in this investigation depend on eight variables: OPC, SDA, QD, plastic limit, liquid limit, maximum dry density, plasticity index, and optimum moisture content; the output of interest is the CBR value. The dataset has been split into two subsets: 30% of the total set makes up the testing phase, while 70% comprises the training phase.
Table 2 presents a numerical summary of the parameters used in building the scheme. It gives an overall summary of attributes such as the minimum (Min), maximum (Max), standard deviation (St.dev), and mean, which are the essential parameters for the statistical analysis. The maximum values of the LL, PL, PI, MDD, OMC, SDA, QD, and OPC variables are 52.1, 37.2, 19.5, 1.777, 29.5, 20, 20, and 8, respectively. Also, the maximum value of CBR as the output parameter is 66.75 percent.
Table 2: The statistical features of the dataset components

Parameters | Max | Min | Mean | St.dev
LL | 52.10 | 21.20 | 35.8450 | 6.15380
PL | 37.20 | 17.90 | 26.6830 | 4.28120
PI | 19.50 | 2.10 | 9.16230 | 4.11490
MDD | 1.7770 | 1.3650 | 1.49290 | 0.08830
OMC | 29.50 | 18.90 | 24.1430 | 2.42670
SDA (%) | 20.0 | 0 | 10.6600 | 7.15460
QD (%) | 20.0 | 0 | 10.640 | 8.19610
OPC (%) | 8.0 | 2.0 | 4.94490 | 2.37980
CBR (%) | 66.750 | 19.690 | 39.9590 | 10.8660
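To make the data-preparation step concrete, the short Python sketch below reproduces the 70%/30% split and the descriptive statistics of Table 2. It is only an illustration: the file name cbr_dataset.csv and the exact column labels are assumptions, since the raw dataset is not distributed with the paper.

# Illustrative sketch of the data preparation described in Section 2.1.
# The CSV path and the column names (LL, PL, PI, MDD, OMC, SDA, QD, OPC, CBR)
# are assumptions; adapt them to the actual dataset.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("cbr_dataset.csv")          # hypothetical file name
features = ["LL", "PL", "PI", "MDD", "OMC", "SDA", "QD", "OPC"]
target = "CBR"

# Descriptive statistics comparable to Table 2 (max, min, mean, std).
print(data[features + [target]].agg(["max", "min", "mean", "std"]).T)

# 70% training / 30% testing split, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data[target], test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)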
2.2 Support vector regression (SVR)

In its early phases, the SVM technology was used to address pattern identification problems, as initially introduced by Vapnik [28]. Vapnik [29] then suggested the SVM algorithm for solving function approximation problems, which resulted in the development of the SVR approach. The SVR approach is an innovative and practical method in data regression analysis. In this study, Support Vector Regression (SVR) is used as the main predictive model. Because of its ability to manage nonlinear relationships through kernel functions, SVR is especially suitable for modeling complex geotechnical datasets. The Radial Basis Function (RBF) kernel is chosen due to its effectiveness in high-dimensional feature spaces and its ability to generalize well. The SVR model depends on three main hyperparameters: C (regularization parameter), which regulates the trade-off between achieving a low training error and maintaining a simple model; γ (gamma), which determines the influence range of a single training example, a lower value meaning a wider reach and a higher value a more localized effect; and ε (epsilon), which defines the tolerance margin within which errors are not penalized. These parameters were tuned using metaheuristic algorithms to minimize the root mean square error (RMSE) of predictions.

From an academic standpoint, SVR may be explained in the following terms. SVR uses a dataset with $N$ entries $\{(X_i, y_i),\ i = 1, 2, \ldots, \bar{M}\}$, where the overall count of training instances is denoted by $M$. $X_i = \{x_1, x_2, \ldots, x_m\} \in R^m$ denotes the $i$-th input vector with $m$ components, and $y_i \in R$ represents the genuine value connected to $X_i$. In machine-learning terms, each training data point $X_i$ is mapped into an $l$-dimensional feature space, and SVR selects in this feature space the optimal hyperplane relating the input (independent) variables to the output (dependent) variable. Eq. (1) gives, mathematically, the operation of an SVR:

$f(x) = Z^{T}\varphi(x) + b$ (1)

where $b$ is the bias term, $f(x)$ symbolizes the predicted value, $Z$ is the $l$-dimensional weighting component, and $\varphi(x)$ is the function that maps the distinct components $X_i$ into the high-dimensional feature space.

The formal expression for the ε-insensitive loss is given in Eq. (2):

$|y - f(x)|_{\varepsilon} = \max(0,\ |y - f(x)| - \varepsilon)$ (2)

The difference between the real value $y$ and the anticipated value $f(x)$, as expressed by Eq. (3), is known as the residual:

$R(x, y) = y - f(x)$ (3)

According to Eq. (4), the optimum regression model keeps the entire residual within a preset boundary value ε:

$-\varepsilon \le R(x, y) \le \varepsilon$ (4)

Eq. (4) is assumed to hold on the whole training data set. Thus, if the residual meets the criterion $R(x, y) = \pm\varepsilon$, the data point exhibits the maximum deviation from the hyperplane. The spatial separation of an arbitrary data point $(x, y)$ from the hyperplane $R(x, y) = 0$ can be calculated by the formula $|R(x, y)| / \lVert Z^{*}\rVert$, where $Z^{*}$ is given by:

$Z^{*} = (1, -Z^{T})^{T}$ (5)

Here the variable δ is assumed to be the maximum degree of dispersion between the hyperplane $R(x, y) = 0$ and the dataset $(x, y)$. All the training data can be induced to meet the requirement shown in Eq. (6); if the value of δ reaches its maximum in the SVR scheme, the scheme exhibits the best generalization ability:

$|R(x, y)| \le \delta \lVert Z^{*}\rVert$ (6)

Whenever $R(x, y)$ equals ε, the most significant distance is reached. Eq. (6) may then be changed to Eq. (7); considering the translation of the optimization issue to minimizing $\lVert Z\rVert$, and since $\lVert Z^{*}\rVert^{2} = \lVert Z\rVert^{2} + 1$, $\lVert Z^{*}\rVert$ must be minimal to attain the maximum of δ:

$\varepsilon = \delta \lVert Z^{*}\rVert$ (7)

Even with efforts to keep errors within the $(-\varepsilon, \varepsilon)$ range during training, it is still possible for certain errors to surpass this limit. Training errors below $-\varepsilon$ are captured by $\zeta_i$, and those above ε by $\zeta_i^{*}$; the notations $\zeta_i$ and $\zeta_i^{*}$ are defined according to Eqs. (8) and (9), respectively:

$\zeta_i = \begin{cases} 0 & R(x_i, y_i) - \varepsilon \le 0 \\ R(x_i, y_i) - \varepsilon & \text{otherwise} \end{cases}$ (8)

$\zeta_i^{*} = \begin{cases} 0 & \varepsilon - R(x_i, y_i) \le 0 \\ \varepsilon - R(x_i, y_i) & \text{otherwise} \end{cases}$ (9)

By using the ε-insensitive loss function, SVR aims to limit the deviation between the training data and the hyperplane region and to choose the hyperplane that produces the best result. The objective function for SVR optimization is given by Eq. (10):

$\min F(Z, b, \zeta_i, \zeta_i^{*}) = \frac{1}{2}\lVert Z\rVert^{2} + c\sum_{i=1}^{M}(\zeta_i + \zeta_i^{*})$ (10)

with the constraints:

$y_i - Z^{T}\varphi(x_i) - b \le \varepsilon + \zeta_i, \quad i = 1, 2, \ldots, \bar{M}$
$Z^{T}\varphi(x_i) + b - y_i \le \varepsilon + \zeta_i^{*}, \quad i = 1, 2, \ldots, \bar{M}$
$\zeta_i^{*} \ge 0,\ \zeta_i \ge 0, \quad i = 1, 2, \ldots, \bar{M}$

The first term of Eq. (10) restricts the weights so that the regression function remains flat and stable, while the second term balances model accuracy against the tolerance for errors measured by the ε-insensitive loss. After solving this quadratic optimization problem with inequality constraints, the value of the coefficient $Z$ is obtained from Eq. (11):

$Z = \sum_{i=1}^{M}(\beta_i^{*} - \beta_i)\varphi(x_i)$ (11)

The values of $\beta_i^{*}$ and $\beta_i$ are determined by solving a quadratic programming problem that involves the Lagrangian multipliers. Mathematically, the Support Vector Regression function is then expressed by Eq. (12):

$f(x) = \sum_{i=1}^{M}(\beta_i^{*} - \beta_i)K(x_i, x) + b$ (12)

The kernel function $K(x_i, x)$ converts the training data into a higher-dimensional nonlinear feature space. Therefore, this methodology is deemed appropriate for solving problems involving nonlinear relationships. Figure 1 shows the operational diagram for SVR.
Figure 1: The progress and validation flowchart of an SVR scheme
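As a concrete illustration of the formulation above, the sketch below fits an ε-insensitive SVR with an RBF kernel using scikit-learn. The specific values of C, γ, and ε are placeholders only, since in this study those parameters are selected by the metaheuristic optimizers described in the following sections; X_train, y_train, X_test, and y_test refer to the split sketched in Section 2.1.

# Minimal SVR baseline with an RBF kernel (Eq. (12) with a Gaussian kernel).
# C, gamma, and epsilon below are illustrative starting values only; the hybrid
# models of Sections 2.3-2.5 search for these values automatically.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

model = make_pipeline(
    StandardScaler(),                                # SVR is sensitive to feature scales
    SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.1),
)
model.fit(X_train, y_train)                          # data from the split in Section 2.1

pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"Test RMSE: {rmse:.4f}")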
2.3 AOSMA

The plasmodial slime mold's oscillatory mode is the basis for the Slime Mold Algorithm (SMA). The slime mold employs a positive-negative feedback mechanism in conjunction with an oscillatory mode to establish the optimal route toward nutrition [30]. AOSMA is a newer technique that incorporates an opposition-based-learning adaptive decision-making method to improve the slime mold's approaching behavior [31].

Assume that a total of $N$ individuals of the slime mold species reside in a search domain bounded by an upper boundary (UB) and a lower boundary (LB) for the theoretical development of AOSMA. $X_i = (x_i^{1}, x_i^{2}, \cdots, x_i^{d}),\ \forall i \in [1, N]$ is the $i$-th slime mold's location in $d$ dimensions, and $F(X_i),\ \forall i \in [1, N]$ symbolizes the $i$-th slime mold's fitness. The locations and fitness values of the slime mold population at round $t$ are represented as:

$X = \begin{bmatrix} x_1^{1} & x_1^{2} & \cdots & x_1^{d} \\ x_2^{1} & x_2^{2} & \cdots & x_2^{d} \\ \vdots & \vdots & \ddots & \vdots \\ x_N^{1} & x_N^{2} & \cdots & x_N^{d} \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}$ (13)

$F(X) = [F(X_1), F(X_2), \cdots, F(X_N)]$ (14)

In the $(t+1)$-th cycle, the position of the slime mold is updated as in Eq. (15):

$X_i(t+1) = \begin{cases} X_{LB}(t) + V_d\,(W \cdot X_A(t) - X_B(t)) & p_1 \ge \delta \ \text{and} \ p_2 < m_i \\ V_e \cdot X_i(t) & p_1 \ge \delta \ \text{and} \ p_2 \ge m_i \\ rand \cdot (UB - LB) + LB & p_1 < \delta \end{cases}, \quad \forall i \in [1, N]$ (15)

where $X_{LB}$ is the local best slime mold, $X_A$ and $X_B$ are individuals pooled at random, $W$ is the weight factor, $V_d$ and $V_e$ are random velocities, and $p_1$ and $p_2$ are randomly chosen numbers in $[0, 1]$. The probability with which a slime mold restarts from a random search position is fixed at $\delta = 0.03$.

The threshold value $m_i$ of the $i$-th member of the population, which aids in choosing the slime mold's location, is calculated as in Eqs. (16) and (17):

$m_i = \tanh\lvert F(X_i) - F_G\rvert, \quad \forall i \in [1, N]$ (16)

$F_G = F(X_G)$ (17)

$W(SortInd_F(i)) = \begin{cases} 1 + rand \cdot \log\!\left(\dfrac{F_{LB} - F(X_i)}{F_{LB} - F_{LW}} + 1\right) & 1 \le i \le \dfrac{N}{2} \\ 1 - rand \cdot \log\!\left(\dfrac{F_{LB} - F(X_i)}{F_{LB} - F_{LW}} + 1\right) & \dfrac{N}{2} < i \le N \end{cases}$ (18)

where $F_G$ and $X_G$ are the global best fitness and the global best individual, $rand$ is a random number in $[0, 1]$, and $F_{LB}$ and $F_{LW}$ are the local best and worst fitness values. In a minimization problem, the fitness values are sorted in ascending order:

$[Sort_F, SortInd_F] = sort(F)$ (19)

The local best and worst fitness values and the local best slime mold $X_{LB}$ are computed as in Eqs. (20)-(22):

$F_{LB} = F(Sort_F(1))$ (20)
$F_{LW} = F(Sort_F(N))$ (21)
$X_{LB} = X(SortInd_F(1))$ (22)

The randomly assigned velocities $V_d$ and $V_e$ are defined as follows:

$V_d \in [-d, d]$ (23)
$V_e \in [-e, e]$ (24)
$d = \operatorname{arctanh}\!\left(-\dfrac{t}{T} + 1\right)$ (25)
$e = 1 - \dfrac{t}{T}$ (26)

where $T$ is the maximum number of cycles.

SMA holds great promise for both exploration and exploitation in technological problem-solving. However, the improvement of the slime mold rules in SMA still relies on a number of basic cases.

Case 1: The local best slime mold $X_{LB}$ and two random individuals $X_A$ and $X_B$, with velocity $V_d$, drive the update when $p_1 \ge \delta$ and $p_2 < m_i$. This stage makes it easier to strike a balance between exploration and exploitation.

Case 2: The orientation of the slime mold with velocity $V_e$ directs the search when $p_1 \ge \delta$ and $p_2 \ge m_i$. This case facilitates exploitation.

Case 3: When $p_1 < \delta$, the individual is reinitialized within the specified search domain. This phase facilitates exploration.

Case 1 shows that the chances of finding solutions are improperly controlled during exploration and exploitation, since $X_A$ and $X_B$ are two random slime molds. To get around this limitation, the local best individual $X_{LB}$ can be used in place of $X_A$. Consequently, the location of the $i$-th component is remodeled as Eq. (27):

$Xn_i(t) = \begin{cases} X_{LB}(t) + V_d\,(W \cdot X_{LB}(t) - X_B(t)) & p_1 \ge \delta \ \text{and} \ p_2 < m_i \\ V_e \cdot X_i(t) & p_1 \ge \delta \ \text{and} \ p_2 \ge m_i \\ rand \cdot (UB - LB) + LB & p_1 < \delta \end{cases}$ (27)

Case 2 illustrates how the slime mold deliberately targets a nearby location, which can result in a path with a lower fitness level; a better approach is to implement an adaptive decision system. Case 3 shows that the SMA does offer a criterion for exploration, but with the small value $\delta = 0.03$ the exploration is limited, so an auxiliary exploration mechanism must be introduced. A practical approach to addressing the limitations of Cases 2 and 3 is a flexible decision strategy that leverages opposition-based learning (OBL) to determine whether additional exploration is needed [32]. OBL uses a point $Xop_i$ in the search domain that is precisely the opposite of $Xn_i$ for each member $(i = 1, 2, \cdots, N)$ and compares it when updating the next cycle's position; this assists in improving convergence and reduces the chance of being trapped in local minima. The $j$-th dimension $(j = 1, 2, \cdots, s)$ of $Xop_i$ for the $i$-th individual is described as follows:

$Xop_i^{j} = \min(Xn_i(t)) + \max(Xn_i(t)) - Xn_i^{j}(t)$ (28)

$Xr_i$ represents the $i$-th member's position in the minimization problem and is defined as:

$Xr_i = \begin{cases} Xop_i(t) & F(Xop_i(t)) < F(Xn_i(t)) \\ Xn_i(t) & F(Xop_i(t)) \ge F(Xn_i(t)) \end{cases}$ (29)

A flexible decision is formed by comparing the previous fitness value $F(X_i(t))$ with the present fitness value $F(Xn_i(t))$ in the event of a depleted nutrient pathway, and the position for the subsequent cycle is then updated:

$X_i(t+1) = \begin{cases} Xn_i(t) & F(Xn_i(t)) \le F(X_i(t)) \\ Xr_i(t) & F(Xn_i(t)) > F(X_i(t)) \end{cases}, \quad \forall i \in [1, N]$ (30)

The AOSMA framework described above is displayed in pseudo-code in Algorithm 1.

In this study, the Adaptive Opposition Slime Mold Algorithm (AOSMA) is used not as a standalone optimizer but as a hybrid component integrated with Support Vector Regression (SVR). AOSMA optimizes three key hyperparameters of SVR (the regularization parameter C, the epsilon-insensitive loss margin ε, and the kernel coefficient γ) with the goal of minimizing the prediction error measured by RMSE. Through its adaptive opposition-based learning strategy and dynamic parameter control, AOSMA allows for more effective exploration of the search space and helps prevent premature convergence. As a result, the hybrid AOSMA-SVR model achieves better accuracy and generalization in predicting California Bearing Ratio (CBR) values from geotechnical data.
Algorithm 1: AOSMA
Begin
Using the search boundary range [LB, UB], choose a target (fitness) function f with inputs N, s, T, and δ.
Outputs: X_G and F_G
Initialization: launch the slime mold population at random positions X_i = (x_i^1, x_i^2, ..., x_i^d), ∀i ∈ [1, N], inside the search boundaries UB and LB.
t = 1
while (t ≤ T)
→ Determine the fitness values F(X) of the N slime molds.
→ Sort the fitness values.
→ Update the local best individual X_LB to match the local best fitness F_LB.
→ Update the local worst fitness F_LW.
→ Update the corresponding global best individual X_G and global best fitness F_G.
→ Refresh the weight W.
→ Update d using Eq. (25) and e using Eq. (26).
for (each slime mold i = 1:N)
o Generate the random numbers p1 and p2.
o Compute the threshold quantity m_i.
o Using Eq. (27), determine the new slime mold location Xn_i.
o Determine the fitness F(Xn_i) of the new slime mold.
if (F(Xn_i) > F(X_i)) // Adaptive decision strategy
• Estimate Xop_i using Eq. (28). // Opposition-based learning
• Select Xr_i using Eq. (29).
end
o Update the next-cycle slime mold X_i using Eq. (30).
end
→ Next iteration: t = t + 1
end
The result is X_G, representing the global best solution.
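To make the adaptive decision step concrete, the fragment below sketches one opposition-based update in the spirit of Eqs. (28)-(30) for a population of candidate solutions. It is a simplified illustration rather than a faithful re-implementation of the full AOSMA: opposite points are built per dimension from the population-wise minimum and maximum, which is one common reading of Eq. (28).

# Sketch of the opposition-based decision step used by AOSMA (Eqs. (28)-(30)).
import numpy as np

def opposition_step(pop, fitness, objective):
    """pop: (N, d) candidate solutions; fitness: (N,) current objective values (minimisation)."""
    lo, hi = pop.min(axis=0), pop.max(axis=0)      # per-dimension bounds of the current swarm
    opposite = lo + hi - pop                        # opposition-based candidates (Eq. (28) style)
    f_opp = np.array([objective(x) for x in opposite])
    improved = f_opp < fitness                      # keep whichever point is better (Eq. (29))
    new_pop = np.where(improved[:, None], opposite, pop)
    new_fit = np.where(improved, f_opp, fitness)
    return new_pop, new_fit

# Example usage with a toy objective (sphere function):
rng = np.random.default_rng(42)
pop = rng.uniform(-5, 5, size=(10, 3))
fit = np.array([np.sum(x ** 2) for x in pop])
pop, fit = opposition_step(pop, fit, lambda x: np.sum(x ** 2))
print(fit.min())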
2.4 AFT

The present investigation follows the basic AFT algorithm's mathematical model as described in [33]. The scheme encompasses three cases, which can be analyzed and delineated as follows.

Case 1: The pursuit of Ali Baba by the thieves, based on information obtained from a source, can be described by simulating their positions as illustrated in Eq. (31):

$x_i^{t+1} = gbest^{t} + \left[Td^{t}\,(best_i^{t} - y_i^{t})\,r_1 + Td^{t}\,(y_i^{t} - m_{a(i)}^{t})\,r_2\right]\operatorname{sgn}(rand - 0.5), \quad p \ge 0.5,\ q > Pp^{t}$ (31)

where $y_i^{t}$ represents Ali Baba's position with respect to thief $i$, $m_{a(i)}^{t}$ represents the amount of cleverness that Marjaneh uses to cover up theft $i$, $x_i^{t+1}$ denotes the position of the $i$-th thief, $gbest^{t}$ is the best global position any thief has ever reached, $r_1$, $r_2$, $rand$, $p$, and $q$ are random values created within $[0, 1]$, $best_i^{t}$ is the best position thief $i$ has found, $Td^{t}$ is the thieves' tracking distance as specified by Eq. (32), $p \ge 0.5$ evaluates to either 0 or 1, $Pp^{t}$ is Ali Baba's potential perceptive ability as stated by Eq. (33), $\operatorname{sgn}(rand - 0.5)$ can be $-1$ or $1$, and $a$ is defined by Eq. (34).

$Td^{t} = \tau_0\,e^{-\tau_1 (t/T)^{\tau_1}}$ (32)

where $t$ and $T$ denote the current and maximum iteration counts, respectively, $\tau_0$ ($\tau_0 = 1$) is a preliminary estimate of the tracking distance, and $\tau_1$ ($\tau_1 = 2$) is a fixed quantity that regulates exploration and exploitation.

$Pp^{t} = \lambda_0 \log\!\left(\lambda_1 (t/T)^{\lambda_0}\right)$ (33)

where $\lambda_0$ ($\lambda_0 = 1$) depicts the final assessment of the thieves' chances of completing their task after the hunt, and $\lambda_1$ ($\lambda_1 = 1$) is a fixed value that controls exploration and exploitation.

$a = \lfloor (n - 1) \cdot rand(n, 1) \rfloor$ (34)

The vector $rand(n, 1)$ is generated as a set of random numbers within the bounds $[0, 1]$.

$m_{a(i)}^{t} = \begin{cases} x_i^{t} & f(x_i^{t}) \ge f(m_{a(i)}^{t}) \\ m_{a(i)}^{t} & f(x_i^{t}) < f(m_{a(i)}^{t}) \end{cases}$ (35)

where $f(\cdot)$ denotes the score of the fitness function.

Case 2: Thieves may perceive that they have been tricked and will likely start exploring unfamiliar and unplanned areas:

$x_i^{t+1} = Td^{t}\left[(u_j - l_j)\,r + l_j\right], \quad p \ge 0.5,\ q \le Pp^{t}$ (36)

where the upper and lower bounds of the search domain at dimension $j$ are given by $u_j$ and $l_j$, respectively, and $r$ is a stochastic quantity generated in the interval $[0, 1]$.

Case 3: To improve AFT's exploration and exploitation capabilities, thieves can also investigate search positions other than those identified through Eq. (31). This scenario is formulated as Eq. (37):

$x_i^{t+1} = gbest^{t} - \left[Td^{t}\,(best_i^{t} - y_i^{t})\,r_1 + Td^{t}\,(y_i^{t} - m_{a(i)}^{t})\,r_2\right]\operatorname{sgn}(rand - 0.5)$ (37)

Algorithm 2 concisely and formally describes the iterative pseudo-code stages that correspond to the core AFT.
Algorithm 2: AFT
Establish the control settings and initialize.
Assess every thief's starting, best, and global positions.
Assess Marjaneh's intelligence with respect to all thieves.
Set t ← 1
while (t ≤ T) do
Eq. (33) is used to update the parameter Pp^t.
for each thief do
if (p ≥ 0.5) then
if (q ≥ Pp^t) then
Update the thief's position using Eq. (31).
else
Update the thief's position using Eq. (36).
end if
else
Update the thief's position using Eq. (37).
end if
end for
Refresh all thieves' current, best, and global positions.
Using Eq. (35), update Marjaneh's wit targets.
t = t + 1
end while
Return the global best solution.
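For readers who prefer code to notation, the fragment below sketches one AFT-style iteration covering the position updates of Eqs. (31), (36), and (37). It is a simplified, self-contained illustration under the constants quoted in the text (τ0 = 1, τ1 = 2, λ0 = 1, λ1 = 1), not the authors' implementation; the small offset inside the logarithm is an added safeguard against log(0).

# Simplified sketch of one AFT iteration (Eqs. (31)-(37)); minimisation is assumed elsewhere.
import numpy as np

rng = np.random.default_rng(0)

def aft_step(pos, best, y, m, gbest, t, T, lb, ub,
             tau0=1.0, tau1=2.0, lam0=1.0, lam1=1.0):
    """pos, best, y, m: (n, d) arrays; gbest: (d,); lb, ub: scalars or (d,) bounds."""
    n, d = pos.shape
    Td = tau0 * np.exp(-tau1 * (t / T) ** tau1)            # tracking distance, Eq. (32)
    Pp = lam0 * np.log(lam1 * (t / T) ** lam0 + 1e-12)     # perception potential, Eq. (33)
    new = np.empty_like(pos)
    for i in range(n):
        r1, r2, r, p, q = rng.random(5)
        sign = np.sign(rng.random() - 0.5)
        if p >= 0.5 and q > Pp:                            # Case 1, Eq. (31)
            new[i] = gbest + (Td * (best[i] - y[i]) * r1 + Td * (y[i] - m[i]) * r2) * sign
        elif p >= 0.5:                                     # Case 2, Eq. (36)
            new[i] = Td * ((ub - lb) * r + lb)
        else:                                              # Case 3, Eq. (37)
            new[i] = gbest - (Td * (best[i] - y[i]) * r1 + Td * (y[i] - m[i]) * r2) * sign
    return np.clip(new, lb, ub)                            # keep candidates inside the search domain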
2.5 Dingo optimization algorithm (DOA)

From the earliest times, nature has consistently been regarded as an exceptionally instructive and impactful educator. Every species on Earth possesses a distinct and unique mechanism for ensuring its survival. The present study involves the mathematical modeling of hunting behavior and social arrangements in the dingo species; this analytical approach is the basis for developing the DOA, a nature-inspired optimization technique [34]. The two primary constituents of DOA are exploration and exploitation. The algorithm generates various candidate solutions within the search domain during the initial exploration phase, while the subsequent exploitation phase identifies and pursues the most desirable solutions within the predetermined space. To find the optimal solution to a practical problem, both constituents must be refined and integrated; nonetheless, achieving equilibrium between them is arduous due to the stochastic nature of the algorithm. The impetus for developing hybridized meta-heuristic implementations for authentic engineering problems is derived from this notion [34]. Dingo optimization is carried out by computationally modeling the pursuit, encirclement, and attack of prey.

2.5.1 Encircling

Given the lack of prior knowledge about the search space and its optimum, it is assumed that the target prey corresponds to the best agent currently available, reflecting the social hierarchy of dingoes. The following mathematical formulas formalize the dingoes' behavior:

$\vec{D}_d = \lvert \vec{A} \cdot \vec{P}_p(x) - \vec{P}(i) \rvert$ (38)

$\vec{P}(i+1) = \vec{P}_p(x) - \vec{B} \cdot \vec{D}(d)$ (39)

$\vec{A} = 2 \cdot \vec{a}_1$ (40)

$\vec{B} = 2\vec{b} \cdot \vec{a}_2 - \vec{b}$ (41)

$\vec{b} = 3 - \left(I \times \frac{3}{I_{max}}\right)$ (42)

The neighborhood dingoes' geographic coordinates are displayed as a two-dimensional vector. The dingo may adjust its position to the coordinates $(P, Q)$ based on the prey's location, which is displayed as $(P^{*}, Q^{*})$. By adjusting the $\vec{A}$ and $\vec{B}$ vectors around the present position, every possible location around the ideal agent can be reached; for example, setting $\vec{A} = (1, 0)$ and $\vec{B} = (1, 1)$ gives access to the dingo position $(P^{*} - P, Q^{*})$. In this way, Eqs. (38) and (39) make it easier for dingoes to travel throughout the hunting area and find their prey randomly.

2.5.2 Hunting

In formulating the dingo hunting strategy mathematically, it is assumed that the alpha, beta, and other members of the pack have a thorough awareness of possible prey sites. When conducting hunting trips, the alpha dingo always takes the lead; however, other dingoes, including the beta, may hunt as well. Eqs. (43) to (51) are developed in line with this discussion.

$\vec{D}_{\alpha} = \lvert \vec{A}_1 \cdot \vec{P}_{\alpha} - \vec{P} \rvert$ (43)
$\vec{D}_{\beta} = \lvert \vec{A}_2 \cdot \vec{P}_{\beta} - \vec{P} \rvert$ (44)
$\vec{D}_{o} = \lvert \vec{A}_3 \cdot \vec{P}_{o} - \vec{P} \rvert$ (45)
$\vec{P}_1 = \lvert \vec{P}_{\alpha} - \vec{B} \cdot \vec{D}_{\alpha} \rvert$ (46)
$\vec{P}_2 = \lvert \vec{P}_{\beta} - \vec{B} \cdot \vec{D}_{\beta} \rvert$ (47)
$\vec{P}_3 = \lvert \vec{P}_{o} - \vec{B} \cdot \vec{D}_{o} \rvert$ (48)

The following formulae are used to determine each dingo's intensity:

$I_{\alpha} = \log\!\left(\frac{1}{F_{\alpha} - (1E{-}100)} + 1\right)$ (49)
$I_{\beta} = \log\!\left(\frac{1}{F_{\beta} - (1E{-}100)} + 1\right)$ (50)
$I_{o} = \log\!\left(\frac{1}{F_{o} - (1E{-}100)} + 1\right)$ (51)

2.5.3 Attacking

If a position update is unavailable, it may be inferred that the dingo has successfully concluded its hunt with a predatory attack. To articulate the strategy formally, the value of $\vec{b}$ is systematically diminished linearly. Note that the variation range of $\vec{A}_{\alpha}$ is also diminished by $\vec{b}$: it is a stochastic variable generated within the range $[-3b, 3b]$, where the constant $\vec{b}$ decreases from 3 to 0 over the cycles. When the $\vec{A}_{\alpha}$ values are randomly generated within the interval $[-1, 1]$, an exploratory agent is capable of moving to any position along the trajectory between its existing location and the prey's location.
2.5.4 Searching

Dingoes exhibit hunting patterns primarily determined by their pack's location, and they consistently progress in pursuit of locating and subduing prey. $\vec{B}$ represents random variables: if the value assigned to $\vec{B}$ falls below $-1$, the prey is retreating from the search agent; conversely, if $\vec{B}$ exceeds 1, the pack is advancing toward its prey. This intervention allows the DOA to conduct a comprehensive global reconnaissance of identified targets. One factor contributing to a heightened probability of exploration within the DOA is the component denoted by $\vec{A}$. In Eq. (40), the vector $\vec{A}$ can generate a range of random numbers within the interval between 0 and 3, independent of the weight of the selected prey. $\vec{A}$ can be characterized as a stochastic vector in which elements with values less than or equal to one take priority over those greater than or equal to one; this feature elucidates the influence of the gap described in Eq. (38).

The hybrid framework combines the Dingo Optimization Algorithm (DOA) with Support Vector Regression (SVR) to tune the hyperparameters C, ε, and γ. Inspired by the natural hunting strategies of dingoes, such as surrounding, chasing, and attacking prey, the DOA translates these behaviors into search operators that explore the SVR parameter space. Its aim is to minimize the RMSE of SVR on the training data by identifying the best parameter combination. By balancing exploration and exploitation, the DOA-SVR hybrid effectively avoids local optima and improves SVR's generalization ability, leading to more accurate CBR predictions. Algorithm 3 offers the pseudo-code for the DOA.
Algorithm 3: Dingo Optimization
Input: the population of dingoes D_n (n = 1, 2, ..., n)
Output: the best dingo (here, the best values are minima)
Generate initial search agents D_in
Initialize the values of b, A, and B.
while the termination condition is not reached do
Appraise each dingo's fitness and intensity cost.
D_α = dingo with the best search result
D_β = dingo with the second-best search result
D_o = the remaining dingoes' search outcomes
Cycle 1
repeat
for i = 1:D_in do
Update the current search agent's state.
end for
Evaluate the fitness and intensity cost of the dingoes.
Record the values of S_α, S_β, S_δ.
Record the values of b, A, and B.
Iteration = Iteration + 1
until cycle ≥ stopping criteria
output
end while
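All three optimizers score candidate hyperparameter vectors against the same objective described in Sections 2.3-2.5: the training RMSE of an RBF-kernel SVR fitted with that candidate. A plausible sketch of this shared fitness function, assuming the scikit-learn SVR used earlier, is given below; the search bounds are illustrative assumptions rather than values reported in the paper.

# Sketch of the common fitness function minimised by AFT, AOSMA and DOA:
# a candidate vector (C, gamma, epsilon) is scored by the training RMSE of an
# RBF-kernel SVR. LOWER and UPPER are assumed search bounds for illustration.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

LOWER = np.array([1e-2, 1e-4, 1e-3])   # lower bounds for (C, gamma, epsilon)
UPPER = np.array([1e3, 1e1, 1e0])      # upper bounds for (C, gamma, epsilon)

def svr_rmse(candidate, X_train, y_train):
    C, gamma, epsilon = np.clip(candidate, LOWER, UPPER)
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon))
    model.fit(X_train, y_train)
    pred = model.predict(X_train)
    return float(np.sqrt(mean_squared_error(y_train, pred)))

# Any of the optimizers sketched above can then minimise
# lambda c: svr_rmse(c, X_train, y_train) within [LOWER, UPPER].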
The choice of AFT, AOSMA, and DOA as optimizers was driven by their distinct algorithmic bases and search methods, enabling a thorough comparison of their metaheuristic behaviors. These approaches are relatively recent and less studied, yet they show competitive performance in diverse regression and engineering tasks. Incorporating them with SVR in this research allows evaluation of both their predictive accuracy and their optimization stability across different algorithmic frameworks.

2.6 Reproducibility and run settings

To ensure the robustness and reproducibility of the results, each hybrid SVR model (AFT-SVR, DOA-SVR, AOSMA-SVR) was executed 30 independent times, which allows for reliable statistical analysis of model performance. Additionally, random seed initialization was controlled using a fixed seed (e.g., seed = 42) across all algorithms during training and optimization to maintain consistent behavior during repeated runs and to support reproducibility.

2.7 Hybridization strategy of SVR with metaheuristic algorithms

This study developed three hybrid machine learning models (SVAF, SVSM, and SVDO) by integrating Support Vector Regression (SVR) with three advanced metaheuristic optimization algorithms: Alibaba and the Forty Thieves (AFT), the Adaptive Opposition Slime Mold Algorithm (AOSMA), and the Dingo Optimization Algorithm (DOA). The goal is to boost SVR's prediction accuracy by optimizing its key hyperparameters (penalty parameter C, kernel parameter γ, and epsilon-insensitive loss ε) using the global search methods provided by these metaheuristics. While SVR is a strong nonlinear regression technique, its effectiveness heavily relies on
proper parameter tuning. Traditional manual or grid search methods are often inefficient or may yield suboptimal results, especially with complex, high-dimensional geotechnical data. Therefore, this hybrid approach exploits the global search and convergence strengths of nature-inspired algorithms to automate SVR hyperparameter optimization.

- In SVAF, the AFT algorithm explores the search space dynamically through mechanisms like global surveillance, balancing exploration and exploitation, and adaptive decision-making inspired by Marjaneh. These features enable it to identify optimal SVR parameters reliably.
- In SVSM, AOSMA enhances the slime mold algorithm with opposition-based learning and adaptive strategies, allowing it to escape local minima more effectively and converge more rapidly, thus providing better hyperparameter configurations.
- In SVDO, the DOA mimics the social hunting behaviors of dingoes, such as encircling, attacking, and searching, to iteratively fine-tune the SVR parameters for higher prediction accuracy.

Each metaheuristic aimed to minimize the RMSE of SVR predictions on the training data, with the best parameter set used to train the final hybrid model. The process was repeated 30 times to ensure stability and reproducibility. This hybrid approach directly supports the study's goal of creating accurate, efficient, and generalizable models for predicting the California Bearing Ratio (CBR) of soils. Using these metaheuristics not only enhances SVR's learning ability but also reduces the manual effort and computational cost typically required for parameter tuning.

2.8 Performance evaluation tactics

A range of evaluators was deployed to appraise the hybrid schemes' performance in CBR value prediction. The list of evaluators comprises RMSE, MSE, R², the ratio of RMSE to the standard deviation of the observations (RSR), and the weighted absolute percentage error (WAPE). R² determines the degree of linear relationship between the actual and forecasted magnitudes. The RMSE is the square root of the mean squared deviation of the estimated values from the actual values. WAPE is quantified by dividing the total absolute error by the total of the actual values. Eqs. (52)-(56) define these metrics:

$R^{2} = \dfrac{\left(\sum_{i=1}^{n}(b_i - \bar{b})(d_i - \bar{d})\right)^{2}}{\left[\sum_{i=1}^{n}(b_i - \bar{b})^{2}\right]\left[\sum_{i=1}^{n}(d_i - \bar{d})^{2}\right]}$ (52)

$RMSE = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(d_i - b_i)^{2}}$ (53)

$MSE = \dfrac{1}{n}\sum_{i=1}^{n}(d_i - b_i)^{2}$ (54)

$RSR = \dfrac{RMSE}{St.Dev}$ (55)

$WAPE = \dfrac{\sum_{i=1}^{n}\lvert d_i - b_i\rvert}{\sum_{i=1}^{n}\lvert b_i\rvert}$ (56)

where $n$ indicates the number of samples, $d_i$ denotes the forecasted value, $b_i$ denotes the actual value, and $\bar{d}$ and $\bar{b}$ represent the mean of the forecasted values and the mean of the actual values, respectively.

3 Outcomes and discussion

This paper reports on developing a Support Vector Regression model with three enhancement techniques, AFT, AOSMA, and DOA, yielding three hybrid predictive models for estimating the CBR of soils. As in the previous sections, the dataset was divided into two subsets, a training set and a validation set, comprising 70% and 30% of the data, respectively. Five statistical metrics, namely R², RMSE, MSE, RSR, and WAPE, were considered to obtain a full view of the optimizers' performance; the outcomes are shown in Table 3. The statistical indicators are analyzed in this section to determine whether one model is generally better. By studying the R² values of the different schemes, it is clear that the most promising outcomes are given by SVAF in both the testing and training stages, with values of 0.9968 and 0.9929, respectively. Meanwhile, the minimum R² value among all comparative schemes was obtained by the SVSM model at 0.9767. It is worth mentioning that all the schemes show increased R² during their test phases, indicating that the schemes are well trained. The maximum RMSE, MSE, RSR, and WAPE values in training are 1.6271, 2.6475, 0.1524, and 0.0334, all for SVSM. For the testing section, the maximum RMSE, MSE, RSR, and WAPE values are 1.5824, 2.5042, 0.1409, and 0.0312, again for SVSM. By contrasting the evaluator and error values, the best hybrid scheme for estimating the CBR value of soils is the combination of SVR and the AFT algorithm (SVAF); this model has the highest R² value (0.9968 in the testing phase) and the lowest error value (an RMSE of 0.7946 in testing) among all the schemes.
Table 3: The hybridized schemes produced the findings

Metric | SVAF Train | SVAF Test | SVSM Train | SVSM Test | SVDO Train | SVDO Test | SVR Train | SVR Test
RMSE | 0.9316 | 0.7946 | 1.6271 | 1.5824 | 1.3363 | 1.171 | 1.336392 | 1.171305
R² | 0.9929 | 0.9968 | 0.9767 | 0.9825 | 0.9852 | 0.992 | 0.985202 | 0.992446
MSE | 0.868 | 0.6314 | 2.6475 | 2.5042 | 1.7859 | 1.372 | 1.7859 | 1.372
RSR | 0.0872 | 0.0708 | 0.1524 | 0.1409 | 0.1251 | 0.1043 | 0.1251 | 0.1043
WAPE | 0.0162 | 0.0141 | 0.0334 | 0.0312 | 0.0234 | 0.0212 | 0.0234 | 0.0212
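The five measures reported in Table 3 follow directly from Eqs. (52)-(56). A compact sketch of their computation is shown below, with d denoting predicted values and b measured CBR values, matching the notation of Section 2.8; it is an illustration of the formulas rather than the authors' evaluation code.

# Sketch of the evaluation metrics of Eqs. (52)-(56); d = predicted, b = measured CBR.
import numpy as np

def evaluate(b, d):
    b, d = np.asarray(b, float), np.asarray(d, float)
    resid = d - b
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    r_num = np.sum((b - b.mean()) * (d - d.mean())) ** 2
    r_den = np.sum((b - b.mean()) ** 2) * np.sum((d - d.mean()) ** 2)
    r2 = r_num / r_den
    rsr = rmse / np.std(b)                     # RMSE divided by the std. dev. of the observations
    wape = np.sum(np.abs(resid)) / np.sum(np.abs(b))
    return {"R2": r2, "RMSE": rmse, "MSE": mse, "RSR": rsr, "WAPE": wape}

# Example: metrics = evaluate(y_test, model.predict(X_test))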
Fig. 2 displays scatter plots illustrating the correlation between the gauged and expected California Bearing Ratio values, annotated with the numerical R² and RMSE assessments. As the RMSE decreases, the point density around the centerline increases, because RMSE acts as a deviation controller; likewise, the training and testing data points are drawn toward the central axis as the R² evaluator improves. The figure also shows the linear regression centerline positioned at Y = X, as well as two red lines below and above the midline at Y = 0.9X and Y = 1.1X, respectively; points beyond the lower and upper lines correspond to underestimation and overestimation of values, respectively. Three schemes were produced by combining the SVR scheme with the three optimizer strategies applied to training and testing, and Fig. 2 shows the findings of the current investigation. The R² of SVAF appears comparatively more favorable than that of the rest of the schemes, because its data points maintain the same directionality and lie nearer the centerline. From the empirical data it can be deduced that in all cases, and quite noticeably for SVDO, the precision of the test-phase values is higher than that of the training phase. Overall, the most favorable result in Fig. 2 is obtained using the SVR method with the AFT optimizer, since its R² and RMSE in both learning and validation are the best; this can be attributed to the capability of this model to minimize error and to its superior performance with respect to R².
Figure 2: The scatter plot of expected and measured values
Fig. 3 presents the correlation between expected and actual CBR values obtained using the three classes of hybrid schemes. The graphs are divided into two distinct parts: model training and model validation. Among them, SVAF, representing the combination of SVR and the AFT algorithm, generates the closest agreement between the gauged CBR values and the expected output for both the testing and training data sets. By contrast, the least favorable agreement appears quite clearly in the union of SVR and AOSMA, i.e., SVSM.
Figure 3: The comparison line-symbol plot between expected and gauged CBR values
Fig. 4 presents the deviations between the gauged and estimated values obtained through the three hybrid schemes for the California Bearing Ratio. This figure indicates that the greatest error for SVSM when assessed is around 18%, whereas for schemes undergoing training, it was 12% in the same set. The figure shows that, for the highest and lowest performing schemes, the majority of errors fall in a narrower range of (-3, 3)% for SVAF and (-6, 17)% for SVSM.
Figure 4: The error distribution of the schemes over samples shown in a time series plot.
The errors in the observed CBR values for the three different hybrid scheme types—SVAF, SVSM, and SVDO—are displayed in Fig. 5. Based on this figure, the maximum errors are about 11% and 7% for SVSM during training and testing of the schemes. The figure reflects the distribution of 25-75% of errors in a range narrower than (-1, 1)% for SVAF and (-3, 3)% for SVSM, the best and worst schemes, respectively.
Figure 5: The standard half-box plot showing the error ratio of the hybrid schemes created.
To enhance the statistical robustness of the proposed models, 95% confidence intervals for the R² values were calculated based on multiple independent runs of each algorithm. As shown in Table 4, the standard SVR model has the widest interval, from 0.6302 to 0.7631, indicating greater variability and less predictive stability. In contrast, the three hybrid SVR models display narrower intervals with higher upper bounds, signifying more consistent performance. Among these, the SVR model combined with the Alibaba and Forty Thieves algorithm (SVAF) achieved the most favorable confidence interval, from 0.7243 to 0.8078, reflecting both high accuracy and robustness across runs. The SVR-Dingo Optimization Algorithm model also performed well, with a confidence interval of 0.7120 to 0.8298, slightly broader but with the highest upper bound. Meanwhile, the SVR-AOSMA model shows an interval between 0.6653 and 0.7848, ranking it between the other hybrids in stability and performance. These intervals confirm that the SVAF model not only offers high prediction accuracy but also delivers consistent results, making it the most reliable model among those tested for CBR estimation.
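A small sketch of how such a 95% confidence interval for R² could be obtained from repeated runs is given below; it assumes a t-based interval over the per-run scores (the paper does not specify the exact procedure, and the example values are illustrative only).

```python
import numpy as np
from scipy import stats

def r2_confidence_interval(r2_runs, confidence=0.95):
    """Two-sided t-based confidence interval for the mean R2 over independent runs."""
    r2_runs = np.asarray(r2_runs, dtype=float)
    mean = r2_runs.mean()
    sem = stats.sem(r2_runs)          # standard error of the mean
    return stats.t.interval(confidence, df=len(r2_runs) - 1, loc=mean, scale=sem)

# Example with placeholder scores (not the paper's run-level results):
# r2_confidence_interval([0.71, 0.78, 0.75, 0.80, 0.77])
```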
Table 4: Confidence intervals based on R2
Model Lower Bound Upper Bound
SVR 0.6303 0.7632
SVR + Dingo Optimization Algorithm 0.7120 0.8298
SVR + Adaptive Opposition Slime Mould Algorithm 0.6653 0.7848
SVR + Alibaba and the Forty Thieves 0.7243 0.8078

4 Sensitivity analysis
The ANOVA-based sensitivity analysis conducted on the performance of different predictive models for estimating the California Bearing Ratio (CBR) reveals statistically significant differences among the models. The confidence intervals for the coefficient of determination (R²) provide insight into each model's accuracy and robustness. The baseline SVR model exhibits the lowest performance with a confidence interval ranging from 0.630 to 0.763, indicating relatively limited predictive power. In contrast, the SVR models enhanced with metaheuristic algorithms demonstrate superior performance. Among these, the SVR-Dingo Optimization Algorithm model shows a confidence interval between 0.712 and 0.830, reflecting substantial improvement over the baseline. Similarly, the SVR-Adaptive Opposition Slime Mould Algorithm model yields a confidence range of 0.665 to 0.785, suggesting better stability and generalization. Notably, the SVR-Alibaba and the Forty Thieves (SVAF) model achieves the highest lower bound (0.724) and an upper bound of 0.808, indicating both high precision and consistent performance. The limited overlap between the confidence intervals of the SVAF model and those of the other models supports the claim of its statistically significant superiority. This distinction highlights the effectiveness of the AFT optimizer in enhancing SVR's learning capability and minimizing prediction errors. Overall, the results of the ANOVA test confirm that metaheuristic-optimized SVR models, particularly SVAF, provide more accurate and reliable predictions of CBR values compared to the standard SVR approach.
Table 5: Sensitivity analysis based on ANOVA
Models Lower bound Upper bound
SVR 0.630 0.763
SVR-Dingo Optimization Algorithm 0.712 0.830
SVR-Adaptive Opposition Slime Mould Algorithm 0.665 0.785
SVR-Alibaba and the Forty Thieves 0.724 0.808
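A minimal sketch of a one-way ANOVA over per-run R² scores of the four schemes, using scipy, is shown below; the lists are hypothetical placeholders rather than the study's actual run-level results.

```python
from scipy.stats import f_oneway

# Hypothetical per-run R2 scores for each scheme (placeholders, not the paper's data).
svr  = [0.64, 0.70, 0.68, 0.73, 0.71]
svdo = [0.74, 0.79, 0.77, 0.81, 0.78]
svsm = [0.68, 0.73, 0.71, 0.76, 0.74]
svaf = [0.75, 0.79, 0.77, 0.80, 0.78]

f_value, p_value = f_oneway(svr, svdo, svsm, svaf)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")  # p < 0.05 would indicate significant differences
```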
5 Discussion
This section compares the three hybrid models—SVAF (SVR + AFT), SVSM (SVR + AOSMA), and SVDO (SVR + DOA)—focusing on their predictive accuracy, convergence behavior, and computational efficiency. As shown in Table 3, SVAF outperforms the others across all five metrics: R², RMSE, MSE, RSR, and WAPE. During testing, SVAF achieved the highest R² (0.9968) and the lowest RMSE (0.7946), indicating excellent generalization and minimal error in estimating CBR values. This success stems from the adaptive balance between exploration and exploitation in the Alibaba and Forty Thieves (AFT) optimization strategy, which enhances SVR's ability to find optimal hyperparameters. The random surveillance mechanism in AFT promotes global search, while Marjaneh's intelligence adjustment enhances local refinement, enabling rapid convergence toward optimal SVR settings. In contrast, the SVSM model, which employs the Adaptive Opposition Slime Mould Algorithm, showed weaker performance (R² = 0.9825, RMSE = 1.5824 during testing). Although AOSMA incorporates opposition-based learning to boost exploration, it can produce more oscillatory convergence patterns, possibly leading to suboptimal SVR tuning. Its complex adaptive threshold settings may also increase sensitivity to initial parameters. The SVDO model (SVR + Dingo Optimization Algorithm) performed moderately (R² = 0.992, RMSE = 1.171). DOA utilizes biologically inspired social hunting behaviors, facilitating effective neighborhood search. However, its slower convergence during exploitation may limit its ability to finely tune SVR hyperparameters, especially in high-dimensional spaces. Regarding computational efficiency, SVAF requires slightly more training time than SVSM and SVDO due to the multiple adaptive conditions and surveillance cycles in AFT, but its superior accuracy justifies this cost. SVSM offers faster runtimes but less predictive precision. SVDO falls between the two in terms of performance and computational demand. Overall, the findings suggest that SVAF provides the best balance between accuracy and optimization quality, making it a strong candidate for practical CBR prediction tasks. Future research could explore combining AOSMA's rapid convergence with AFT's stability to improve training efficiency without sacrificing accuracy. Future research will also aim to improve the models' applicability across various regions by testing them on datasets with diverse soil types. Combining Support Vector Regression with deep learning—for example, as a post-processing tool after deep feature extraction—could boost prediction accuracy, particularly for large or complex datasets. Another valuable approach is integrating these hybrid AI models into geotechnical software platforms, allowing real-time, data-driven decision-making in engineering and construction projects. Although the hybrid SVR models presented demonstrated strong predictive performance on the available dataset, there are some limitations to consider. Firstly, without an external validation set, the
generalizability of the results may be restricted beyond the current data. Secondly, the relatively small sample size increases the risk of overfitting, especially with the use of metaheuristic optimization. Additionally, the dataset only encompasses a limited range of soil types and regions, which could limit the models' broader applicability. It is also important to note that larger, more diverse datasets might benefit from alternative modeling techniques such as deep learning or ensemble methods to achieve better predictive accuracy. These limitations will be addressed in future research to improve the models' robustness and generalizability. To enhance model robustness, we plan to use regularization, such as L1/L2 penalties and early stopping, to prevent overfitting. Models will be tested under various conditions—smaller datasets and more noise—to check resilience. Including confidence intervals or error margins for metrics like RMSE and R² will better measure uncertainty. These steps will help create more reliable, generalizable models for geotechnical uses.

6 Conclusion
The current investigation has adopted an SVR scheme to project the CBR value of soil. Although the outcomes of the conventional method were effective, it had some limitations: the laboratory process is costly and is not considered time-effective. The drawbacks above can be overcome by substituting the laboratory procedure with a software-based, artificial-intelligence approach. The accuracy of the system in predicting the CBR was quite remarkable. The input variables were selected to forecast the target parameter, which was depicted as CBR. Five different performance metrics were utilized to appraise the precision delivered by the schemes under consideration. These included R2, RMSE, MSE, RSR, and WAPE. Three distinct meta-heuristic optimization approaches—the Dingo Optimization Algorithm, the Alibaba and the Forty Thieves optimization algorithm, and the Adaptive Opposition Slime Mould Algorithm—have been examined in the current study to increase the system's functional efficiency. The conclusions below may be drawn from the analysis's outcome:
• The thorough analysis of the pertinent characteristics was the foundation for developing the projection schemes to estimate CBR. A comparison between the experimental outcomes and those obtained utilizing the suggested schemes showed that the latter's CBR prediction accuracy was significantly high.
• In the current research, the test phase has shown that the forecast data's scattering value increased by 0.39, 0.59, and 0.69 for SVAF, SVSM, and SVDO, respectively, from the training phase.
• The California Bearing Ratio outcomes presented in this investigation indicate a significant discrepancy between the observed and projected values, with an average underestimate of almost 1.24 for the suggested schemes. With a value of 1.6271, the RMSE displayed its maximum error in the SVSM scheme in the training phase. SVAF had the lowest error rate in the testing session, with a value of 0.7946.

Acknowledgements
We wish to state that no individuals or organizations require acknowledgment for their contributions to this investigation.

Authorship contribution statement
Na Feng: Writing - original draft, Conceptualization, Supervision, Project administration.
Yulin Lan: Methodology, Software.
Zhisheng Yang: Formal analysis, Language review.
The authors declare that there is no conflict of interest regarding the publication of this paper.

Author statement
The manuscript has been read and approved by all the authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work.

Funding
This investigation was not funded by any specific grant from public, commercial, or charitable funding bodies.

Ethical approval
The paper has attained ethical approval from the institutional review board, ensuring the protection of participants' rights and compliance with the relevant ethical guidelines.
https://doi.org/10.31449/inf.v49i16.9315 Informatica 49 (2025) 269–284 269
A Cutting-Edge Bio-Inspired Computational Framework for
Advanced Virtual Reality Classification through Sophisticated
Predictive Methodologies
Yanyan Song, Hongping Zhou*
1 School of Communication Technology, Communication University of China Nanjing, Nanjing 211172, China
*Corresponding Author
E-mail: zhouhongping1231@163.com
Keywords: virtual reality, histogram gradient boosting classification, decision tree classification, ebola optimization
search, differential squirrel search algorithm
Received: May 20, 2025
Virtual reality (VR) enables the simulation of a wide variety of complex environments, from tiny biological
structures to entirely imaginary worlds. These simulations create new possibilities for learning, training,
and interaction that go beyond the limits of the physical world. However, virtual reality (VR) realizes this
imaginary world, so it is not just a dream. VR works through the invocation of many of the senses. It
creates realistic simulations through the creation of immersive settings that combine the real and the
imagined, thereby affording special hands-on learning possibilities in a variety of subjects. This study
investigates the effectiveness of combining Histogram Gradient Boosting Classification (HGBC) with
Decision Tree Classification (DTC), the Ebola Optimization Search (EOS), and the Differential Squirrel
Search Algorithm (DSSA) to predict VR outcomes. By integrating these advanced predictive and
optimization techniques, the approach aims to enhance accuracy. Research will be conducted to ascertain
the possible uses of VR, enhance user experience, and assess the impact on industries related to training,
education, healthcare, and entertainment. In the evaluation phase, HGDS attained the highest accuracy
of 0.967 in the test phase, making it the top-performing hybrid model, while DTEO showed the lowest
accuracy of 0.907, identifying it as the weakest model.
Povzetek: Članek predstavi bio-navdihnjen hibridni okvir za klasifikacijo uporabniških odzivov v virtualni
resničnosti. Združuje HGBC, DTC ter optimizatorja EOS in DSSA za izboljšanje napovedne točnosti.
Okvirjeva naloga je zanesljivo razvrščati VR-podatke.
1 Introduction
A VR simulation signifies a computer-created environment where users can move around, interact with objects, and interact with virtual characters, also referred to as "agents" or "avatars." A generic virtual setting is a 3D world [1], and, as with gravity simulation, virtual environments frequently aim to be as realistic as possible in both appearance and object behavior. It must be underlined, nonetheless, that there need be no parallels between this virtual environment and the actual world. One of the advantages of virtual environments is their ability to replicate completely unrealistic scenarios [2]. Virtual environments also provide a safe space to test scenarios that would be too dangerous or difficult to perform in real life, and they imitate the setting where the student will eventually work.
There are several ways of deploying VR; four typical configurations are listed below:
✓ Desktop VR (Monoscopic or Stereoscopic)
✓ Immersive VR (HMD, CAVE, widescreen)
✓ Collaborative Systems
✓ Mixed or Augmented Reality
Desktop VR enables the user to interact with the system using a mouse or other controlling device while sitting in front of a desktop computer monitor, as the name implies [3]. Immersive systems utilize a visualization display worn on the head of the user that completely occludes their field of view. Collaborative systems have human-controlled avatars interacting with each other, and they can be immersive or desktop-based systems. Second Life is one of the most recent and most effective collaboration systems [4]. An attempt is also being made to use collaborative systems for exploration. Mixed reality systems merge computer-generated matter with the real environment, which is viewed directly or through a camera. Such systems can teach students engineering and medical skills that were previously thought impossible to convey [5].
Learning by humans requires interaction with the environment, taking in information provided by the senses and by experience [6]. Through computer simulation, VR takes the role of real-world sensory input. Reacting to motion and common human behaviors in the actual world offers interaction. Therefore, VR can be useful in education since it allows pupils to experience a situation
or scenario firsthand rather than only imagining it [7]. The three main components that define the quality of VR experiences are immersion, interaction, and multisensory feedback. Immersion is being engulfed or enclosed by the surroundings [8]. One of the advantages of immersion is that it ensures a feeling of presence, or the perception that one is actually in the world being displayed [9]. Interactivity means the capability of the user's body movements to affect the events happening in the simulation and, in turn, provoke a reaction from the simulation [10], [11].
The multisensory nature of VR allows information to be derived from several senses, which further enhances the experience by making it both more engaging and more convincing, increasing the sensation of presence, because it provides redundancy of information, which diminishes the likelihood of misunderstanding. Information from multiple sensory entries is reinforced by sensory combination [12], [13]. VR enables the user to act as though they are in the actual world by substituting a virtual environment for the current one. A constructivist learning approach benefits from VR's immersive features [14]. The premise of constructivism, a theory of knowledge acquisition, is that people build knowledge by drawing conclusions from their past experiences. The idea, as propounded by Jean Piaget, assumes that learners try to fit new experiences into the world picture that they have developed earlier. Learners change their worldview to fit the new experience when they cannot assimilate new information into their system effectively. Learning comes from experiences where actions are based on assumptions about how the world functions, only to find that it does not align with those assumptions [15], [16], [17]. Adjusting the mental model of how the world functions becomes necessary to account for the new experience. The view is that learning is an active process of testing hypotheses. In other words, this concept contrasts with the notion of learning as something passive in nature: the mere acquisition or assimilation of data. VR is a powerful learning tool because it provides a context where such hypothesis testing can occur. According to [18], students who interact with new material are more likely to store and recall it.
Control software is at the heart of this system. It regulates the exchange of information between the virtual world and the interface layer in response to user actions, updating the world appropriately. It also determines when the scene should be shown on display devices such as the haptic and visual interfaces. With the help of additional tools, the control software can connect to the outside world through the internet, which might be an essential capability in systems involving collaboration or many users. The virtual environment module includes a model of real-world entities and the virtual world model. It includes state and position information apart from appearance. The entities could be dynamic objects, such as moving objects or even avatars, or they could be static objects. This model of the virtual environment needs to be refreshed at regular intervals to update dynamic objects [19]. The module that stores the positions, shapes, and other attributes of all components of the virtual world in a database is called the virtual environment module. The physics engine is one of the major parts of any realistic simulation. A physics engine comprises a set of rules that control the motion and interaction of dynamic objects in a virtual scene. A typical physics engine can include a Newtonian mechanics simulation and collision detection, which describes when two objects collide. It applies gravitational, friction, and impulse effects using physical rules. When two things hit each other, the latter effect is important [20], [21]. When two active entities collide, collision detection is necessary. The physics engine determines their terminal velocity using their simulated traits, such as mass, substance, and speed.

1.1 Related works
Normally, state-of-the-art reports that focus on specific aspects of the discipline or on specific application fields are available. They mostly provide taxonomies that systematically illustrate and classify the various methodologies involved.
➢ While Dachselt and Hübner [22] examined the menus for AR and VR environments for all of the MR domain [23], they also presented an extensive taxonomy.
➢ A taxonomy of NVEs, taking into consideration distribution and communication topologies, has been provided by Macedonia and Zyda [24]. Mania and Chalmers [25] have presented a taxonomy of platforms and communication.
➢ Bowman has provided several taxonomies for both interaction methods [26] and navigation methods [27]. Mine's early research [28] identifies the essential navigation and interaction in virtual spaces.
➢ Gabbard [29] provides good generalized overviews, presents suggested best practices in application design, and provides guides for conducting user evaluations. Livatino and Koeffel have also presented guidelines for Virtual Environments (VEs) assessment [30].
➢ The current tracking technology is overviewed by Welch and Foxlin [31], who also compare and contrast the respective merits and disadvantages of each.
Recent work has explored innovative methods for classifying virtual reality (VR) using bio-inspired computational models. Song and DiPaola [32] introduced a bio-responsive VR system based on physiological data to enhance immersion. Zayed and Reda [33] demonstrated that applying neurophysiological biosignals combined with deep learning could classify cognitive states in VR with 97% accuracy. Similarly, Arslan et al. [34], [35], [36], [37], [38] employed emotion classification from biosignals and machine learning in VR, achieving 97.78% accuracy. These advancements are significant in areas such as rehabilitation, education, and psychotherapy. VanHorn and Çobanoğlu [35] also developed a biomedical image classification system within a VR-based environment, making AI more accessible to experts.
Overall, these studies emphasize how biosignals, machine learning, and VR can be integrated to develop advanced predictive models, showcasing the potential of bio-inspired computational models to improve VR classification techniques.

1.2 The study's objective
This work examines the possible contribution that VR technology will make to enhancing learning outcomes and increasing student engagement in schools. In data classification, this study applies an HGBC model and a DTC model. The performance of the schemes is optimized by using methods such as EOS and DSSA. This research will explore the integration of VR within diverse disciplines of study to understand how it can facilitate the retention of both theoretical knowledge and practical competencies of learners, given the immersion one experiences in a VR environment. Possible drawbacks and limitations, including accessibility of resources, shall also be discussed to present a comprehensive overview of what can be expected from this educational technology.

2 Materials and methodology
2.1 Data gathering
A set of users' experiences in VR settings provides the dataset. The information covers user preferences, emotional moods, and physiological reactions like skin conductance and heart rate. This study's dataset includes 1000 samples, each representing a user's VR session. Recorded features encompass User ID (173 unique values), Age (66), Gender (147), VR Headset type (61), Session Duration (137), Motion Sickness severity (56), and Immersion Level (55). These variables cover both demographic and behavioral data, forming a comprehensive basis for analysis. The Immersion Level serves as the target variable, indicating users' subjective engagement in VR, and its variability supports the creation of effective predictive models.
This dataset attempts to contribute to the development of VR through the analysis of user experiences. An attempt has been made in this study to develop a better VR design, with much more improvement in user comfort and customization, by understanding the physical and emotional reactions of consumers in diverse VR situations. This information allows developers to work on boosting VR systems and creating personalized experiences that will enhance customers' delight and immersion. Fig. 1 presents a contour plot for the correlation of the features.
User ID: This variable identifies every participant who experienced VR. Each user is assigned a unique ID so that their data in the dataset can be differentiated.
Age: This variable stores the age of the subject participating in VR exposure. For example, this could be an integer representing the current user's age at the time of using the VR.
Gender: This variable displays the user's gender. The categories "Male," "Female," and "Other" can be utilized to define the user's gender identity.
VR Headset Type: This variable specifies the form of VR headset that a user is utilizing in a VR experience. Examples include Oculus Rift, HTC Vive, and PlayStation VR, among others.
Duration: This variable shows the time spent in the VR experience in minutes. It reflects how much time was spent by the participant in the VR setup.
Motion Sickness Rating: It displays the rating of the user's self-reported motion sickness during the VR experience. Higher numbers relate to a higher degree of motion sickness on an ascending scale ranging between 1 and 10.
Dependent variable: The degree to which a user experiences being inside a virtual environment quantifies the subjective degree of the user's feeling of immersion in the experience, with a rating between 1 and 5, where 5 stands for the maximum level.
Figure 1: The contour plot with color fill illustrates the relationship between input and output variables
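As a brief sketch, such a dataset could be loaded and prepared as below. The file name, column labels, and 70/30 split are illustrative assumptions based on the variables described in Section 2.1; they do not come from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical file and column names mirroring the variables of Section 2.1.
df = pd.read_csv("vr_sessions.csv")

for col in ["Gender", "VRHeadset"]:                      # encode categorical inputs
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

X = df[["Age", "Gender", "VRHeadset", "Duration", "MotionSickness"]]
y = df["ImmersionLevel"]                                 # target: immersion rating 1-5

# Illustrative 70/30 split, stratified on the immersion level.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```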
Before deploying advanced computational models, it is essential to understand several challenges in VR classification systems. These encompass the significant variability in user responses driven by individual physiological and psychological differences, noise in biometric data such as heart rate and skin conductance, and class imbalance across the different immersion levels. The subjective nature of immersion also complicates labeling and impacts the consistency of the ground truth. These factors result in a complex, high-dimensional feature space where traditional classifiers often face difficulties with generalization and robustness. Consequently, adopting adaptive hybrid machine learning approaches, supported by powerful metaheuristic optimization techniques, is crucial for effective classification in VR.

2.2 Histogram gradient boosting classification (HGBC)
The HGB approach is another variant of the popular GB [39] technique used to resolve diverse classification and regression-oriented machine learning (ML) problems. These schemes, to which AdaBoost also belongs, primarily try to turn weak learners into strong ones; they come under the category of boosting schemes. Boosting techniques keep adding and training new weak learners successively to correct the mistakes of the previously introduced weak learners, informing each new weak learner to avoid the mistakes made by its forerunner. The most common weak learners used are DTs. This led to the development of the HGB algorithm, a boosting methodology that overcame one of the major weaknesses of the GB algorithm, namely its very long training time on large datasets. To circumvent this problem, the continuous input variables are discretized, or binned, into a few hundred distinct values. In this case, the learning rate (LR) of the scheme is the most important hyperparameter. Much attention was paid to the optimization of the scheme through several iterations of hyperparameter tweaking. The HGB implementation from scikit-learn 0.21.3 in the Python ML module was used [40].

2.3 Decision tree classification (DTC)
In a DT, every internal node displays a characteristic, each branch is a decision rule, and each leaf node is the outcome [41]. The root node signifies the topmost node in a DT. To achieve the best discrimination among classes or results, it learns to split based on the value of an attribute. Different schemes have different criteria for making decisions. For example, the metrics used by schemes like ID3, C4.5, and CART include entropy, gain ratio, and Gini impurity, respectively. The problem at hand is to find the characteristic at every level that offers the optimum split in a DT, thereby assisting optimum decision-making [42]. The concept can be understood mathematically by using the DT split based on entropy. The entropy H(D) of a dataset D can be calculated as follows:

H(D) = -\sum_{i=1}^{m} p_i \log_2 p_i    (1)

2.4 Ebola optimization search (EOS)
Driven by the diffusion of the Ebola virus, EOS presents a metaheuristic scheme [43]. The EOSA scheme is based on an enhanced SIR model of the sickness. Its S, E, I, R, H, V, Q, and D compartments represent the Susceptible (S), Exposed (E), Infected (I), Hospitalized (H), Recovered (R), Vaccinated (V), Quarantine (Q), and Death (D) states, respectively. Because of these compartments, the composition provides for the construction of a search domain that best displays the combinations of weights and biases that may be required by a CNN. After representation, the SIR model is expressed by a mathematical scheme utilizing a system of first-order differential equations. The new metaheuristic scheme was then developed by combining the mathematical and propagation schemes, and later, the obtained mathematical scheme was deployed in the design of EOSA-CNN for experimentation. The mathematical schemes are as follows:

mI_i^{t+1} = mI_i^{t} + \rho M(I)    (2)
\frac{\partial S(t)}{\partial t} = \pi - (\beta_1 I + \beta_3 D + \beta_4 R + \beta_2 (P_E)\eta)S - (\tau S + \Gamma I)    (3)
\frac{\partial I(t)}{\partial t} = (\beta_1 I + \beta_3 D + \beta_4 R + \beta_2 (P_E)\lambda)S - (\Gamma + \gamma)I - \tau S    (4)
\frac{\partial H(t)}{\partial t} = \alpha I - (\gamma + \varpi)H    (5)
\frac{\partial R(t)}{\partial t} = \gamma I - \Gamma R    (6)
\frac{\partial V(t)}{\partial t} = \gamma I - (\mu + \vartheta)V    (7)
\frac{\partial D(t)}{\partial t} = (\tau S + \Gamma I) - \delta D    (8)
\frac{\partial Q(t)}{\partial t} = (\pi I - (\gamma R + \Gamma D)) - \xi Q    (9)

mI_i^{t+1} and mI_i^{t} display the new and old situations at times t+1 and t, respectively, and \rho is the displacement scale factor of an individual in Eq. (2). The data updated here are Hospitalized (H), Vaccinated (V), Recovered (R), Infected (I), Susceptible (S), Quarantine (Q), and Dead (D). Eqs. (3) to (9) define a system of ordinary differential equations, all scalar functions that can be evaluated to float values. These are computed given the initial conditions S(0) = S_0, I(0) = I_0, R(0) = R_0, D(0) = D_0, P(0) = P_0, and Q(0) = Q_0, where t is determined by the definition of the iterations. This then makes it possible to determine the magnitudes of the vectors S, I, H, R, V, D, and Q at time t.
The pseudocode that describes the EOSA metaheuristic scheme is presented in the steps below (a simplified Python sketch of this loop is given after Fig. 2):
❖ Define initial values for all vector and scalar quantities, that is, individuals and parameters, respectively: the numbers of hospitalized (H), vaccinated (V), susceptible (S), infected (I), recovered (R), dead (D), and quarantined (Q).
❖ I1 is created at random among vulnerable people.
❖ The value of fitness shall be calculated for the index case, setting it as the current and global best.
❖ If there is at least one infected person and the number of iterations is not reached, then:
a) For every vulnerable individual, a position is created and altered according to their movement. Exploitation is characterized by short displacement; otherwise, it characterizes exploration. Remember that the farther an infected case is displaced, the more infections there are.
b) Using (a), generate newly infected individuals nI.
c) Create the new individuals and add the new instances to I.
d) From the size of I, calculate how many people are added to H, D, R, B, V, and Q at their respective rates.
e) Utilizing the new I, refine S and I.
f) Compare the current best I with the global best.
g) If the termination condition is not reached, go back to step 4.
❖ Return all solutions and the best global resolution.
The design and discussion of the utilization of the enhancement issue defined in this paper are given in the following subsections.
Fig. 2 presents the flowchart of the DTC.
Figure 2: The flowchart of the DTC model
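The following is a highly simplified Python sketch of an EOSA-style search loop in the spirit of the steps listed above, assuming a generic fitness function to be maximized. The displacement scales, infection rate, and recovery handling are illustrative placeholders, not the parameters of the paper's implementation.

```python
import numpy as np

def eosa_search(fitness, dim, bounds, pop_size=30, iterations=100, infect_rate=0.3):
    """Simplified Ebola-inspired search: infected solutions displace and may infect others."""
    low, high = bounds
    population = np.random.uniform(low, high, (pop_size, dim))   # susceptible individuals
    infected = [np.random.randint(pop_size)]                     # index case I1
    best = population[infected[0]].copy()
    best_fit = fitness(best)

    for _ in range(iterations):
        for idx in list(infected):
            # short displacement -> exploitation, long displacement -> exploration
            step = np.random.choice([0.1, 1.0]) * np.random.uniform(-1, 1, dim)
            candidate = np.clip(population[idx] + step * (high - low), low, high)
            if fitness(candidate) > fitness(population[idx]):
                population[idx] = candidate
            if np.random.rand() < infect_rate:                   # new infections
                infected.append(np.random.randint(pop_size))
        infected = list(set(infected))[:pop_size // 2]           # crude recovery/quarantine step
        fits = np.array([fitness(s) for s in population])
        if fits.max() > best_fit:                                # track the global best
            best_fit, best = fits.max(), population[fits.argmax()].copy()
    return best, best_fit
```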
2.5 Differential squirrel search algorithm (DSSA)
DSSA, a hybrid optimizer that combines the differential evolution and squirrel search schemes, is presented in this section. In SSA, the squirrels maintain the positions of other squirrels relative to the acorn or hickory trees for updating their own positions. To improve the search strategy, the top squirrels' position-updating rules have been changed, and the incorporation of crossover operations inspired by DE significantly enhances the exploration capability. The following is a mathematical scheme of the various foraging techniques covered under the paradigm of DSSA.
To justify selecting EOS and DSSA for this classification task, it is crucial to highlight the problem's nature: the dataset involves multiple interacting features with complex, nonlinear relationships, which can cause optimization to get stuck in local optima when using traditional methods. The EOS algorithm, inspired by epidemic modeling, employs dynamic, population-based exploration techniques that balance infection-driven diversification with recovery-focused convergence. This strategy is especially effective for tuning hyperparameters in complex models like HGBC and DTC, and its compartmental diffusion model efficiently captures multidimensional search dynamics. Meanwhile, DSSA mimics squirrel foraging behavior and utilizes crossover inspired by differential evolution, making it highly effective at fine-tuning solutions locally while maintaining overall diversity. This capability is critical in VR classification scenarios, where high accuracy requires careful adjustment of sensitive parameters to prevent overfitting. DSSA's ability to retain elite solutions while fostering diversity helps avoid premature convergence. Combining EOS and DSSA offers complementary advantages: EOS facilitates broad exploration, while DSSA ensures precise convergence, together enhancing classification accuracy and model robustness for VR immersion prediction.

2.5.1 Initialization of position and evaluation of fitness
The squirrels are initially placed in the search area at random. Knowing the squirrels' locations allows one to calculate their fitness, obtained by simply substituting their positions into the fitness function, which indicates how good a food supply they could find. The best squirrel PS_ht discovered in the hickory tree thus far is determined by sorting the fitness values. The squirrels in the acorn trees, PS_at(1:3), are assumed to be traveling in the direction of the optimal location in a subsequent iteration, as indicated by the following three best function values. The remaining squirrels, PS_nt(1:NP-4), are in the normal trees and have not yet discovered food.

2.5.2 Position update
The squirrels in an acorn tree, following the current best, PS_ht, renew their positions and move in the direction of the best source when there is no predator. The squirrels of a normal tree follow the ones in an acorn or hickory tree to renew their positions. If a predator is present, then the squirrels change direction randomly while foraging. The following mathematical schemes are used to update the squirrels' positions.
As in Eq. (10), the position of squirrels on acorn trees changes based on the positions of the others:

PS_{at}^{new} = \begin{cases} PS_{at}^{old} + d_g \times G_c (PS_{ht}^{old} - PS_{at}^{old} - P_{avg}), & r_1 \ge P_{dp} \\ \text{random position}, & \text{otherwise} \end{cases}    (10)

where P_avg is the mean location of every squirrel in the current population.
DSSA also employs the crossover mechanism of DE in a way that ensures maximum diversity among the squirrels while minimizing the possibility of trapping in local minima. It is applied to the squirrel's current position and the new position as obtained by Eq. (11):

PS_{at,i,j}^{cr} = \begin{cases} PS_{at,i,j}^{new}, & \text{if } rand_j \le C_r \text{ or } j = j_{rand} \\ PS_{at,i,j}^{old}, & \text{if } rand_j > C_r \text{ or } j \ne j_{rand} \end{cases}, \quad j = 1, 2, 3, \dots, D    (11)

In this context, NP displays the population size, with i ranging over 1, 2, 3, ..., NP. For acorn or normal trees, PS_{at,i,j}^{cr} indicates the updated position of a squirrel following the crossover operation, while PS_{at,i,j}^{new} and PS_{at,i,j}^{old} correspond to the new and previous positions of the squirrels. D refers to the dimensionality of the problem, and C_r displays the crossover rate, which is set to 0.5. The index j_{rand} is randomly selected from the range [1, D], and rand_j denotes the j-th random number, uniformly generated within this range.
Some of the squirrels on normal trees follow the placement of the acorn-tree squirrels, after which they relocate to their new locations:

PS_{nt}^{new} = \begin{cases} PS_{nt}^{old} + d_g \times G_c (PS_{at}^{old} - PS_{nt}^{old}), & r_2 \ge P_{dp} \\ \text{random position}, & \text{otherwise} \end{cases}    (12)

where the random number r_2 is uniformly distributed between 0 and 1.
The surviving squirrels in normal trees follow the best position found so far (the hickory tree), and their new positions are given below:

PS_{nt}^{new} = \begin{cases} PS_{nt}^{old} + d_g \times G_c (PS_{ht}^{old} - PS_{nt}^{old}), & r_3 \ge P_{dp} \\ \text{random position}, & \text{otherwise} \end{cases}    (13)

The following crossover procedure is also given for the normal-tree squirrels:

PS_{nt,i,j}^{cr} = \begin{cases} PS_{nt,i,j}^{new}, & \text{if } rand_j \le C_r \text{ or } j = j_{rand} \\ PS_{nt,i,j}^{old}, & \text{if } rand_j > C_r \text{ or } j \ne j_{rand} \end{cases}, \quad j = 1, 2, 3, \dots, D    (14)

The convergence speed may be raised by permitting the hickory-tree squirrel to update its location in relation to the average position of the squirrels in the acorn trees. This can be done as follows:

PS_{ht}^{new} = PS_{ht}^{old} + d_g \times G_c (PS_{ht}^{old} - PS_{at}^{avg})    (15)

In this instance, PS_{at}^{avg} displays the average of all squirrel locations within the acorn trees. To form the next generation of individuals, the new positions and their crossover counterparts are then compared with the old positions, and the better candidates are retained.
Figure 3 illustrates the flowchart of the proposed hybrid models (such as HGDS and DTEO), detailing the sequential phases that encompass data input, model development, optimizer-centric hyperparameter optimization, training, and final assessment. This diagram delineates the interaction between the machine learning models and the metaheuristic optimizers within the hybrid structure.
Figure 3: The process flowchart of the proposed hybrid models
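A compact sketch of the acorn-tree update of Eq. (10) and the DE-style crossover of Eq. (11) is given below, assuming d_g, G_c, P_dp, and the search bounds are given constants; the default values shown are illustrative only.

```python
import numpy as np

def update_acorn_squirrel(ps_at, ps_ht, p_avg, dg=0.8, gc=1.9, p_dp=0.1, low=-1.0, high=1.0):
    """Eq. (10): move an acorn-tree squirrel toward the hickory tree, else relocate at random."""
    if np.random.rand() >= p_dp:
        return ps_at + dg * gc * (ps_ht - ps_at - p_avg)
    return np.random.uniform(low, high, ps_at.shape)

def de_crossover(ps_new, ps_old, cr=0.5):
    """Eq. (11): dimension-wise crossover between the new and old positions (Cr = 0.5)."""
    d = ps_new.size
    j_rand = np.random.randint(d)
    mask = np.random.rand(d) <= cr
    mask[j_rand] = True                       # guarantee at least one updated dimension
    return np.where(mask, ps_new, ps_old)
```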
2.6 Performance evaluators
Accuracy depends on how many correctly projected positive and negative instances there are out of the total, defined in terms of True Positives (TP), True Negatives (TN)—correctly projected negative cases, False Positives (FP)—cases incorrectly projected as positive, and False Negatives (FN)—cases incorrectly projected as negative. Using TP and FP as the relevant measures, precision gauges the percentage of TP projections out of all the positive projections the model has made; smaller numbers of false positives imply higher precision. Recall is the measure of the share of TP projections out of all real positive instances, using True Positives and False Negatives; it indicates how well the model detects all relevant positive cases. The fewer false negatives there are, the higher the recall. A simple statistic that balances the trade-off between precision and recall is the F1 score, which combines the two.

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}    (16)
Precision = \frac{TP}{TP + FP}    (17)
Recall = \frac{TP}{TP + FN}    (18)
F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}    (19)

The F1-score is a single measure that balances precision and recall; it is their harmonic mean. It is very useful when both false negatives and false positives matter. The greater the F1 score, the better balanced the recall and precision.

3 Results and discussion
3.1 Hyperparameters tuning and convergence curve analysis
Table 1 displays the tuned hyperparameters for the four hybrid models: HGEO, HGDS, DTEO, and DTDS. Seven key hyperparameters were considered to optimize these models' performance: learning_rate, max_leaf_nodes, max_depth, min_samples_leaf, max_bins, min_samples_split, and a second instance of max_leaf_nodes (listed separately for the different model types). The HGEO and HGDS models, based on the HGBC algorithm, have specified values for learning_rate, max_leaf_nodes, max_depth, min_samples_leaf, and max_bins. For example, the HGEO model has a learning_rate of 0.709, max_leaf_nodes of 278, max_depth of 100, min_samples_leaf of 10, and max_bins of 27. In the HGDS model, these values are a learning_rate of 0.148, max_leaf_nodes of 557, max_depth of 893, min_samples_leaf of 7, and max_bins of 102. Conversely, the DTEO and DTDS models, which are based on decision tree algorithms, do not include values for learning_rate, max_leaf_nodes, or max_bins in the first part of the table. However, they include defined values for max_depth, min_samples_leaf, min_samples_split, and max_leaf_nodes in the second part. For instance, the DTEO model has a max_depth of 741, min_samples_leaf of 0.00025, min_samples_split of 0.0275, and max_leaf_nodes of 2710. Similarly, the DTDS model features a max_depth of 597, min_samples_leaf of 0.00025, min_samples_split of 0.0005, and max_leaf_nodes of 1789. Overall, the table indicates that hyperparameters are selectively tuned for each model based on its structure, with parameter values chosen according to each model's specific characteristics and requirements.
Fig. 4 displays a 3D waterfall plot illustrating the convergence curves of the four hybrid schemes: HGDS, HGEO, DTDS, and DTEO. The plot effectively visualizes the different convergence rates and final performance levels of the schemes, demonstrating the varying degrees of effectiveness in the optimization process. This comparison emphasizes the significance of the number of iterations and initial accuracy in determining the overall success of each hybrid model. The HGDS model starts with an accuracy of 0.6 and gradually improves over 200 iterations, ultimately reaching a peak accuracy of 0.967, making it the highest-performing model among the four. The other three schemes begin with a lower accuracy of 0.4 and converge more quickly than HGDS, reaching their final accuracy in fewer iterations. Among these schemes, DTEO is identified as the weakest hybrid model, with a final accuracy of 0.908 after its iterations.
Table 1: Hyperparameter tuning for four models
Hyperparameters HGEO HGDS DTEO DTDS
learning_rate 0.709 0.148 - -
max_leaf_nodes 278 557 - -
max_depth 100 893 741 597
min_samples_leaf 10 7 0.00025 0.00025
max_bins 27 102 - -
min_samples_split - - 0.0275 0.0005
max_leaf_nodes - - 2710 1789
Figure 4: 3D waterfall plot for the convergence curve of the hybrid schemes
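As a sketch of how the tuned values in Table 1 would map onto scikit-learn estimators, the HGDS and DTDS configurations could be instantiated as below. This assumes a recent scikit-learn release (1.0 or later, where HistGradientBoostingClassifier no longer requires the experimental import); the metaheuristic search loop (EOS/DSSA) that produced these values is not shown.

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# HGDS: histogram gradient boosting with the Table 1 values found by DSSA.
hgds = HistGradientBoostingClassifier(
    learning_rate=0.148, max_leaf_nodes=557, max_depth=893,
    min_samples_leaf=7, max_bins=102)

# DTDS: decision tree with the Table 1 values found by DSSA.
dtds = DecisionTreeClassifier(
    max_depth=597, min_samples_leaf=0.00025,
    min_samples_split=0.0005, max_leaf_nodes=1789)

# hgds.fit(X_train, y_train); dtds.fit(X_train, y_train)
```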
3.2 Schemes performance comparison
Fig. 5 presents a doughnut plot, providing an intuitive representation of the schemes' performance and facilitating a clearer comparison across the different evaluation metrics. The performance results of the six hybrid schemes evaluated using accuracy, precision, recall, and F1 scores across the training, testing, and overall sections have been presented. Among these, HGDS emerges as the best-performing model, with an impressive accuracy of 0.967 in the test section. Conversely, DTEO, with an accuracy of 0.907, is the weakest model. HGDS outperforms HGEO by 0.17 in accuracy, establishing itself as the top model. Nevertheless, HGEO still demonstrates strong performance, securing the second-best position overall. This comparison underscores the varying strengths of each model, with HGDS leading in accuracy and other performance metrics, while HGEO, despite its lower accuracy, remains a competitive alternative. The results emphasize that even schemes with slightly lower accuracy can still offer valuable performance in certain contexts.
Figure 5: A connected doughnut plot employed for the visual evaluation of the schemes' performance
Additionally, Table 2 provides a summary of the performance of the six schemes across the five levels regarding precision, recall, and F1 score. The hybrid model HGDS stands out, achieving a precision as high as 0.990. Additionally, HGDS excels in both recall and F1-score, outperforming all other schemes and demonstrating its overall robustness. In contrast, DTEO shows weaker recall performance compared to the other schemes, although it surpasses DTC in this metric. Regarding the F1-score, DTEO records a value of 0.922, which is lower than those of the top-performing schemes. Nonetheless, it outperforms both DTDS and DTC by margins of 0.013 and 0.010, respectively. While DTEO's F1-score may not be the highest, it still demonstrates competitive performance relative to the other schemes. These findings indicate that HGDS is the most well-rounded and effective model overall, while DTEO, despite its limitations in recall and F1 score, delivers superior performance in specific areas.
Table 2: Schemes' evaluation results through different immersion levels
Schemes Level 1 Level 2 Level 3 Level 4 Level 5
Precision:
HGBC 0.946 0.946 0.909 0.925 0.973
HGEO 0.941 0.995 0.907 0.951 0.974
HGDS 0.974 0.990 0.981 0.945 0.943
DTC 0.988 0.938 0.825 0.909 0.822
DTEO 0.973 0.929 0.847 0.923 0.878
DTDS 0.906 0.939 0.873 0.914 0.983
Recall:
HGBC 0.951 0.933 0.928 0.952 0.932
HGEO 0.946 0.947 0.959 0.947 0.969
HGDS 0.960 0.971 0.985 0.956 0.963
DTC 0.847 0.875 0.953 0.869 0.916
DTEO 0.876 0.885 0.948 0.927 0.906
DTDS 0.911 0.894 0.964 0.927 0.911
F1-score:
HGBC 0.948 0.94 0.918 0.938 0.952
HGEO 0.943 0.97 0.932 0.949 0.971
HGDS 0.975 0.976 0.965 0.949 0.971
DTC 0.912 0.906 0.885 0.888 0.866
DTEO 0.922 0.906 0.895 0.925 0.892
DTDS 0.909 0.916 0.916 0.921 0.946
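A minimal sketch of how the per-level precision, recall, and F1 values in Table 2 could be produced with scikit-learn is shown below, assuming y_test and y_pred hold the true and predicted immersion levels for one scheme; the variable names are illustrative.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Per-immersion-level precision, recall, and F1 (labels 1-5).
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, y_pred, labels=[1, 2, 3, 4, 5])

print(classification_report(y_test, y_pred, digits=3))
```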
Fig. 6 displays the ROC (Receiver Operating Characteristic) curves of the hybrid model across the five immersion levels. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various thresholds, offering a visual assessment of the model's ability to distinguish between classes. A higher Area Under the Curve (AUC) signifies better performance. Among the five levels, Level 1 has the highest AUC, indicating greater confidence and fewer classification uncertainties at this stage. Conversely, Level 5 shows the weakest ROC performance, likely due to increased data overlap and less feature separation at higher immersion ratings. This suggests that as responses become more subtle at deeper immersion levels, the model's ability to differentiate between classes slightly diminishes, resulting in more false positives and a lower true positive rate. These differences illustrate the model's changing confidence in classification across the varying immersion levels. Level 1 is considered the best projection level, characterized by the highest true positive rate and the lowest false positive rate. At this level, the true positive rate starts at 0.0 and gradually increases to 1.0, while the false positive rate begins at 0.0 and rises only to 0.1. On the other hand, Level 5 displays the worst projection performance: although the true positive rate eventually reaches 1.0, this comes with a decrease in projection accuracy and an increase in the false positive rate. This shows a decline in overall predictive quality as the level increases.
Figure 6: ROC curves for the hybrid classification model across five immersion levels.
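A sketch of how per-level ROC curves such as those in Fig. 6 can be computed in a one-vs-rest fashion is given below, assuming the fitted classifier exposes predict_proba and that model, X_test, and y_test are available; all names are illustrative.

```python
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

levels = [1, 2, 3, 4, 5]
y_test_bin = label_binarize(y_test, classes=levels)   # one-vs-rest targets
y_score = model.predict_proba(X_test)                 # class probabilities, shape (n, 5)

for k, level in enumerate(levels):
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], y_score[:, k])
    print(f"Level {level}: AUC = {auc(fpr, tpr):.3f}")
```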
3.3 Comparison of the measured and projected values
Fig. 7 displays a 3D bar plot illustrating the correlation between observed and projected values across the five levels, highlighting each model's predictive accuracy. Among these, the HGDS model stands out with the best performance, particularly in level 1, where it achieves 194 accurate projections, establishing it as the top-performing model. This high correlation between observed and projected values underscores HGDS's strong overall reliability. Conversely, the DTEO model shows the weakest performance, with only 177 accurate projections, making it the least effective model overall. While certain schemes may perform poorly in specific conditions, DTEO consistently underperforms across all levels, indicating significant limitations in its predictive accuracy.
Figure 7: 3D bar plot depicting the correlation between observed and projected values
Fig. 8 shows the projection errors across six schemes, focusing on correct projections versus mistakes. Among these, the HGDS stands out for its higher accuracy. In level 1, it correctly projected 192 out of 194 cases, resulting in only two errors. Similarly, in level 2, HGDS achieved 198 correct projections out of 202, with just four mistakes. This accuracy highlights its strong performance in comparison to the other schemes. In contrast, the DTEO model demonstrates weaker predictive accuracy. In level 1, it recorded five errors out of 177 projections. Its performance was similarly low in level 2, where it made 14 mistakes out of 184 projections. This high error rate marks DTEO as the least effective among the schemes analyzed. Overall, while HGDS exhibits consistent accuracy in both levels, DTEO's elevated error rate suggests limitations in its predictive reliability.
Figure 8: Confusion matrix illustrating the accuracy of the schemes under four specified conditions
• Sensitivity analysis

Table 3 displays the results of a sensitivity analysis using one-way ANOVA to determine if model performance differences across various VR immersion levels are statistically significant. The F-value indicates the ratio of variance between groups to within groups, while the P-value shows the likelihood that observed differences are due to chance. A P-value below 0.05 is generally considered significant. Of the six models evaluated, the DTC model had the highest F-value of 2.923 and a P-value of 0.088. Although close to significance, this result remains statistically non-significant, implying only marginal performance differences that do not meet the 95% confidence threshold. The HGBC, HGEO, HGDS, DTEO, and DTDS models recorded much lower F-values (0.021, 0.006, 0.031, 1.015, and 0.074) with P-values of 0.886, 0.937, 0.861, 0.314, and 0.786. These findings indicate no statistically significant performance differences across immersion levels. Notably, the HGDS model, identified earlier as the most accurate with a test accuracy of 0.967, showed a low F-value of 0.031 and a high P-value of 0.861, confirming its stable performance across all conditions. Overall, the ANOVA results suggest that none of the models exhibit statistically significant performance variations across immersion levels, highlighting the robustness of the proposed models and particularly validating the consistent performance of HGDS under different experimental scenarios.
Table 3: Sensitivity analysis based on ANOVA

Model   F-value   P-value
HGBC    0.021     0.886
HGEO    0.006     0.937
HGDS    0.031     0.861
DTC     2.923     0.088
DTEO    1.015     0.314
DTDS    0.074     0.786
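The one-way ANOVA in Table 3 can be reproduced with a standard statistical library. The sketch below is illustrative only: the per-level accuracy samples are hypothetical placeholders, not the study's data.

    # Minimal sketch: one-way ANOVA across immersion levels (hypothetical accuracy samples).
    from scipy.stats import f_oneway

    # Each list holds repeated accuracy measurements of one model at a given immersion level.
    level1 = [0.96, 0.97, 0.95]
    level2 = [0.97, 0.96, 0.98]
    level3 = [0.98, 0.97, 0.96]
    level4 = [0.95, 0.96, 0.96]
    level5 = [0.96, 0.95, 0.97]

    f_value, p_value = f_oneway(level1, level2, level3, level4, level5)
    print(f"F = {f_value:.3f}, P = {p_value:.3f}")  # P > 0.05 indicates no significant difference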
3.4 Limitations and directions for future research

While the hybrid classification framework demonstrates encouraging results in predicting VR immersion levels, there are some limitations to address. First, the dataset is relatively small and was collected in a controlled experimental setting, raising questions about how well the models will perform in real-world or commercial VR environments with more diverse users. Second, the computational cost of metaheuristic algorithms like EOS and DSSA can increase substantially with larger dataset dimensions, which may affect real-time or low-latency VR applications. More research is needed to evaluate their scalability and efficiency in live systems. Third, although the models were optimized for accuracy, aspects like interpretability and user feedback were not thoroughly explored. Transparency could be especially important for applications in education or healthcare. Future research will focus on: (1) expanding the dataset to include multimodal user feedback (e.g., eye tracking, EEG), (2) comparing our framework with common models such as SVM, Random Forest, and Neural Networks, and (3) creating lightweight or approximate versions of EOS and DSSA suitable for real-time immersive use. Additionally, we aim to test the models across various VR fields, including rehabilitation, industrial training, and personalized learning, to ensure their robustness in different operational contexts.

4 Conclusion

VR simulation immerses users in a dynamic, visually engaging virtual environment where they can navigate, manipulate virtual objects, and interact with digital agents. A defining feature of VR worlds is their three-dimensional nature, often coupled with realistic elements, not only in their visual representation but also in how objects behave. For instance, VR simulations may include natural forces like gravity. These environments are not always designed to mirror the real world; in fact, they often present fantastical or even impossible scenarios. This unique capability allows VR to simulate complex or hazardous situations safely, making it especially useful in training and educational contexts. In such settings, VR can expose learners to potentially risky situations they might encounter in reality, allowing them to experience and practice without the associated risks. Advancements in technology have greatly enhanced the capabilities of VR, allowing for more immersive and realistic simulations. Additionally, the integration of sophisticated classification schemes, such as DTC and HGBC, is transforming digital experiences. These schemes, along with optimizations from techniques like the EOS and DSSA, contribute to the improvement of VR systems. In testing, the hybrid HGDS approach has proven to be highly effective, achieving an accuracy rate of 0.967, making it the top performer among the various schemes. On the other hand, the DTEO approach, with an accuracy of 0.907, was identified as the least effective. Additionally, although this study concentrated on the new EOS and DSSA algorithms because of their innovative hybrid search abilities, future research will include implementing and comparing more traditional and popular optimizers such as Particle Swarm Optimization (PSO), Genetic Algorithms (GA), and Bayesian Optimization. This will enable a more comprehensive assessment of optimization efficiency and adaptability across different learning scenarios. Although this study concentrated on hybrid variants within our optimization framework, future research will include benchmarking with models like Random Forest, Support Vector Machines, XGBoost, and Neural Networks. This will contextualize our models' performance against recognized standards and strengthen the validation of our methodology. Despite this, the hybrid approach often outperformed both DTC and DTDS in certain metrics, demonstrating the potential of combining these innovative techniques for enhancing VR-based applications.
Declarations

Funding

This investigation was not funded by any specific grant from public, commercial, or charitable funding bodies.

Authors' contributions

YS performed data collection, modeling, and appraisal. HZ reviewed the initial draft of the manuscript and contributed to the editing and writing.

Acknowledgements

This exploration was backed by the project of the sixth phase of the "333 High-level Talent Cultivation Project" in Jiangsu Province.

Ethical approval

The exploration has received ethics approval from the IRB, guaranteeing the protection of participants' rights and compliance with the related ethics norms.
https://doi.org/10.31449/inf.v49i16.9788 Informatica 49 (2025) 269–290 269
GWO-RF: A Grey Wolf Optimized Random Forest Model for
Predicting Employee Turnover
Hongtao Zhang
Henan Medical Biological Testing Co., Ltd, Zhengzhou 450000, China
E-mail: HongtaoZhang8103@163.com
Keywords: human resources, prediction, employee turnover, computational model
Received: June 19, 2025
This study proposes an employee turnover prediction model (GWO-RF) that combines Grey Wolf
Optimization (GWO) algorithm with Improved Random Forest (LPRF). The model optimizes node
splitting strategy by combining C4.5 information gain rate and CART Gini coefficient (constraint
condition α+β=1) through linear programming. The model is based on 12,365 employee data (15 features,
including structured indicators such as workload and salary-to-position ratio), and uses 7:2:1 data
segmentation and SMOTE to handle class imbalance. Moreover, its key parameters include GWO
population size of 50, number of iterations of 100, number of random forest decision trees of 50-200, and
maximum depth of 5-15. The test set results show that the model has an AUC of 0.923±0.008 and an F1-
score of 0.871. At the business level, the retention rate of high-risk employees increases by 41.9%
(p<0.01), and the cost of single intervention decreases by 54.3%. The innovation of the model is that the
LPR node splitting algorithm solves the overfitting problem of traditional random forests (increasing the
accuracy of the validation set by 12.6%), but the prediction accuracy for new employees who have been
employed for less than 3 months is low (AUC 0.782). Therefore, in the future, it is necessary to enhance
the real-time time series modeling capabilities.
Povzetek: Študija predstavi model GWO-RF, ki združuje optimizacijo sivega volka in izboljšani naključni
gozd za napoved fluktuacije zaposlenih. Model izboljša razcep vozlišč ter poveča zadržanje ogroženih
zaposlenih.
1 Introduction

In today's highly competitive business environment, employee turnover has become an important management challenge for enterprises. With the rising cost of human resources and the increasing mobility of knowledge workers, employee turnover not only brings direct recruitment and training costs, but also leads to damage to team stability, organizational knowledge loss and decline in corporate reputation. In particular, in the education and training, retail and Internet industries, the turnover rate of employees generally exceeds 20%, and the turnover rate of core employees in some enterprises is as high as 30%, which makes the development of accurate turnover prediction models an urgent need for enterprise human resource management [1].
The current mainstream prediction models can be divided into two categories. One is the rule-based method, which mainly relies on expert experience to build judgment rules. Although it is interpretable, it covers limited scenarios. The other is the machine learning-based method. It automatically identifies churn features by analyzing historical data, and typical algorithms include random forest, XGBoost and deep neural networks. The latest research shows that cluster analysis and behavioral feature modeling can effectively improve prediction accuracy and realize quantitative loss prediction. Therefore, some enterprises began to integrate multi-source data (including employee satisfaction surveys, social network activities, etc.) to build hybrid models [2].
It is of great strategic value and management necessity to construct an effective computational model to predict employee turnover. Employee turnover brings significant economic losses to enterprises, including recruitment costs, training costs and tacit knowledge loss. Secondly, a high turnover rate destroys team stability and affects organizational performance. Studies show that when the team turnover rate exceeds 15%, overall productivity decreases by 25-40% [3]. More importantly, an effective prediction model can identify high-risk employees 6-12 months in advance, enabling enterprises to take targeted interventions to increase the retention rate of core employees by 35-50% [4]. Furthermore, by analyzing turnover drivers, the model can optimize human resource management strategies and improve overall employee satisfaction by 10-15 percentage points. In the context of digital transformation, such models have become a core tool for corporate talent strategies. In particular, they are of key significance to knowledge-intensive industries and service industries, and they can effectively reduce human capital risks and enhance organizational competitiveness [5].
However, existing models still have significant limitations. Firstly, they are highly dependent on data quality, and many enterprises lack systematic employee behavior records, which leads to difficulties in feature engineering. Secondly, the interpretability of the models is insufficient, and their black-box characteristics make it difficult for human resource managers to understand the prediction logic. Thirdly, the ability of cross-industry generalization is weak, and the driving factors of turnover in the education and training industry are essentially different from those in the retail industry. Finally, existing studies focus on prediction accuracy, ignoring the guiding value of interventions, such as cost-benefit analysis of salary adjustment and training investment. Therefore, future research needs to strengthen the application of time series behavior analysis and causal reasoning frameworks and establish a closed-loop management system of prediction-intervention-evaluation. The purpose of this study is to develop an intelligent early warning model based on GWO-RF to improve the accuracy of high-risk employee identification and intervention efficiency.

2 Related work

(1) Development context and theoretical framework of traditional employee turnover prediction models
The development of traditional prediction models can be divided into three main stages: the early statistical modeling stage (1990-2005), the machine learning enhancement stage (2005-2015), and the survival analysis deepening stage (2010-2015). In the statistical modeling stage, researchers mainly used parametric methods such as multiple linear regression and logistic regression to analyze the correlation between observable variables and turnover intention by constructing generalized linear models (GLM). This kind of research laid the theoretical foundation of employee turnover prediction and confirmed the explanatory power of core influencing factors such as salary fairness and career development opportunities. However, it is difficult to capture interaction effects between variables due to the linear assumptions [6].
The introduction of machine learning technology marked a new stage for predictive models. Decision tree algorithms (such as ID3 and C4.5) construct classification rules through the information gain ratio, which can automatically discover high-risk combination features such as "performance evaluation period > 6 months and training participation times < 2". Ensemble learning methods (such as random forest) further improve model robustness and effectively reduce the risk of overfitting through Bootstrap resampling and random feature selection. During this period, models began to integrate structured data from the HR information system, including behavioral indicators such as attendance records and project participation, so that the prediction accuracy rate was improved to the interval of 65%-75% [7].
The cross-application of survival analysis methods addresses the shortcomings of traditional classification models in time series prediction. The Cox proportional hazards model regards employee on-the-job status as a time-dependent variable, and quantifies the influence strength of different factors on retention rate through a risk function. Its semi-parametric characteristics allow it not only to take advantage of the interpretability of parametric models, but also to adapt to data distributions with non-proportional risks. Research shows that there is a nonlinear positive correlation between the duration of promotion delay and the risk of turnover, and the risk coefficient increases exponentially when the delay exceeds a critical value (about 18 months). This kind of model promotes the transformation of the prediction dimension from static cross-sectional analysis to dynamic process analysis [8].
The core value of the traditional models lies in their white-box characteristics. Through coefficient significance tests and variable importance rankings, managers can intuitively understand the decision-making logic. However, they have three fundamental limitations. First, feature selection relies on domain knowledge, making it difficult to automatically extract implicit features. Second, the model architecture lacks a memory mechanism and cannot handle the continuous evolution of employee status. Third, they make insufficient use of unstructured data (such as communication texts and collaboration networks) [9]. These shortcomings have prompted researchers to turn to more complex intelligent modeling methods.
(2) Technological breakthroughs and paradigm innovation of intelligent prediction models
The application of deep learning technology has enabled prediction models to achieve a qualitative leap, which is mainly reflected in four dimensions: time series modeling ability, small-sample learning efficiency, multi-modal fusion depth and dynamic decision optimization. In terms of time series modeling, long short-term memory (LSTM) networks capture long-term dependencies of employee behavior sequences through gating mechanisms, such as patterns of continuous quarterly performance fluctuation or trends in communication frequency changes. The bidirectional LSTM architecture further integrates historical and future context information, extending the early warning window to 9-12 months [10].
Transfer learning technology effectively alleviates the problem of data scarcity. Through the pre-training and fine-tuning paradigm, the model can migrate feature representations learned in a data-rich domain to the target domain. Domain adaptation methods reduce the distribution difference between the source domain and the target domain, and improve cross-industry prediction performance by 15%-25%. In addition, knowledge distillation compresses the knowledge of a complex teacher model into a lightweight student model, reducing the computational overhead by 70% while maintaining 90% of the prediction accuracy [11].
Multi-modal fusion architectures break through the limitation of a single data type. Modern prediction systems typically integrate three types of heterogeneous data: textual data, behavioral data, and physiological data. An attention mechanism automatically weights the contribution of the different modalities [12].
Reinforcement learning frameworks integrate prediction and intervention into a unified system. The model learns the optimal retention strategy by interacting with the environment, and the Q-learning algorithm evaluates the long-term benefits of different interventions (such as salary adjustment range and training intensity). In addition, policy gradient methods can deal with continuous action spaces and dynamically adjust the intervention strength. Such systems achieve a leap from passive prediction to active management, but need a carefully designed reward function to avoid short-term behavior [13].
Although intelligent models have made remarkable progress, they face new challenges. In terms of data privacy, the EU's General Data Protection Regulation (GDPR) requires the model to support the "right to be forgotten", and differential privacy training mechanisms need to be developed. In terms of algorithmic fairness, it is necessary to prevent the model from amplifying the discriminatory influence of sensitive attributes such as gender and age. In terms of computational efficiency, real-time prediction requires that the model inference delay be controlled within 200 ms, which poses a severe test for complex neural networks [14].
(3) Systematic analysis of existing problems and future research directions
The core contradictions faced by current research can be summarized into three levels of conflict: technical feasibility, ethical compliance and economic applicability. In the technical dimension, there is a fundamental tension between model complexity and interpretability [15]. Although post-hoc interpretation methods such as LIME and SHAP can generate local feature importances, they cannot provide a global causal chain, which leads managers to be cautious about the prediction results [16]. In terms of ethics, the breadth of data collection conflicts with personal privacy rights. In particular, the application boundaries of sensitive technologies such as emotion recognition [17] and social network analysis [18] urgently need to be defined by law. In terms of economics, there is a gap between the need for model generalization and industry specificity. Traditional solutions adapt to different scenarios through feature engineering, but the adjustment cost is high [19].
The following Table 1 summarizes the current status of relevant research:
Table 1: Summary of research status

Traditional statistical models
  Representative algorithms: Logistic regression and Cox proportional hazards model
  Common datasets: Structured data from enterprise HR systems (salary, attendance, etc.)
  Typical indicators: Accuracy of 65-75%, significant risk coefficients
  Core limitations: The linear assumption limits the capture of interaction effects and cannot handle unstructured data, resulting in weak temporal prediction ability

Classic machine learning
  Representative algorithms: Random forest, XGBoost
  Common datasets: Employee satisfaction surveys + behavior records (about 10-20 characteristics)
  Typical indicators: AUC 0.78-0.85, F1-score 0.72
  Core limitations: Feature engineering relies on domain knowledge, and the predicted AUC is only 0.65-0.70 for newly hired employees (<3 months), lacking a dynamic adjustment mechanism

Deep learning methods
  Representative algorithms: LSTM, Transformer
  Common datasets: Multimodal data (text communication records, collaboration network logs, etc.)
  Typical indicators: AUC 0.88-0.91, recall rate 82-85%
  Core limitations: Training with over 10,000 samples is required, with high computational costs (GPU hourly cost of $5-8) and poor interpretability (SHAP value consistency of only 60-70%)

GWO-RF (this study)
  Representative algorithm: Hybrid optimization model
  Common dataset: 12,365 records of listed companies (15 structured indicators)
  Typical indicators: AUC 0.923±0.008, F1 0.871
  Core limitations: The predicted AUC for employees who have been employed for less than 3 months is 0.782, and real-time data stream supplementation is required; linear programming node splitting increases training time by 15%
The current trend of employee turnover prediction technology is evolving from traditional statistical models to intelligent hybrid models. Traditional methods, such as logistic regression, rely on structured data and have an accuracy rate of only 65-75%. Machine learning (such as random forest) has been improved to an AUC of 0.78-0.85, but there are issues such as a strong dependence on feature engineering and poor prediction performance for new employees (AUC < 0.7). Although deep learning methods such as LSTM achieve an AUC of 0.88-0.91, they have high computational costs and weak interpretability. The GWO-RF hybrid model proposed in this study achieved an AUC of 0.923 ± 0.008 on 12,365 data points by optimizing parameters using the grey wolf algorithm and integrating the C4.5 and CART splitting strategies through linear programming. This resulted in a 41.9% increase in the retention rate of high-risk employees, but it requires enhanced temporal modeling capabilities for new employees (<3 months).
Future breakthroughs should focus on three key paths. In terms of architecture design, it is necessary to develop a lightweight time series model based on the Transformer and build explainable reasoning paths in combination with knowledge graphs. In terms of data governance, it is necessary to establish a federated learning framework to implement a collaborative training mode in which the data does not move and the model moves, and to use homomorphic encryption to protect data sovereignty. In terms of the evaluation system, it is necessary to build multi-dimensional indicators covering prediction accuracy (such as AUC-ROC), explanation quality (such as logical consistency score) and compliance (such as deviation detection rate). Only by achieving a balance between technological innovation and ethical constraints can the employee turnover prediction model truly become the intelligent decision-making center of the organization's talent strategy.

3 Algorithm model construction

3.1 Employee turnover prediction index system and model construction

Based on the random forest model, the random forest model is improved, the employee turnover prediction model is constructed, and the grey wolf algorithm is used to optimize the model parameters.
When measuring the various structural factors, for the workload factor, the workload is calculated as shown in the following formula [20]:

press = \frac{totalovertime}{months}    (1)

Among them, totalovertime represents the total overtime hours of an employee, and months represents the statistical time window; this paper uses the overtime situation in the past year for statistics, so months is taken as 12.
The average hourly wage is calculated as follows:

hourlywage = \frac{totalwage}{hours}    (2)

Among them, totalwage represents the total salary obtained by front-line workers, and hours represents the number of paid working hours of front-line workers.
The compensation position is calculated as follows:

pos = \frac{wage}{avgwage}    (3)

Among them, wage represents the monthly salary of front-line workers, and avgwage represents the average monthly salary of front-line workers in this position in the region where the enterprise is located.
Based on the Price-Mueller model, combining the characteristics of small and medium-sized enterprises, and referring to relevant literature, this paper constructs a total of 15 indicators, covering individual factors, environmental factors and structural factors, for subsequent employee turnover prediction.
paths. In terms of architecture design, it is necessary to subsequent employee turnover prediction.
develop a lightweight time series model based on
Transformer and build an explainable reasoning path in 3.2 Improvement of random forest model
combination with knowledge graphs. In terms of data
governance, it is necessary to establish a federated based on node splitting optimization
learning framework to implement a collaborative training In this paper, the random forest algorithm is further
mode of "data is not fixed, model is moving” and use improved to improve the performance in employee
homomorphic encryption to protect data sovereignty. In turnover prediction.
terms of the evaluation system, it is necessary to build The basic learner of random forest is decision tree.
multi-dimensional indicators covering prediction Commonly used node splitting algorithms in decision
accuracy (such as AUC-ROC), explanation quality (such trees mainly include ID3 algorithm based on information
as logical consistency score) and compliance (such as Gain (Gain), C4.5 algorithm based on information Gain
deviation detection rate). Only by achieving a balance rate and CART algorithm based on Gini coefficient (Gini),
between technological innovation and ethical constraints as follows.
can the employee turnover prediction model truly become (1) ID3 algorithm
the intelligent decision-making center of the If we assume that the data set D includes K different
organization's talent strategy. types of samples Ck (k = 1,2,L ,K ) , the entropy can be
calculated using the following formula [21].
C
3 Algorithm model construction ( ) k Ck
H D = − K
k=1 Log (4)
2
D D
3.1 Employee turnover prediction index Among them, D represents the total number of
samples, C represents the number of samples
system and model construction k
belonging to class K, and the n different values of
Based on the random forest model, the random attribute A in D are represented as Ai (i = 1,2,L ,n) . D is
forest model is improved, and the employee turnover divided into n subsets D according to A , and the
i i
prediction model is constructed, and the gray wolf samples belonging to type C in D are recorded as
k i
algorithm is used to optimize the model parameters. D . Then, the entropy value after selecting node A for
ik
When measuring various structural factors, for the splitting is:
workload factors, the calculation of workload is shown in
the following formula [20]. D
( ) n i
H A D = i=1 H (Di )
totalovertime D
press = (1) (5)
months D
n i D
K ik Dik
= −
Among them, totalovertime represents the total i=1 k=1 Log2
D Di Di
overtime hours of employees, months represents the
statistical time window, and this paper selects the Among them, D represents the number of
i
overtime situation in the past year for statistics, so samples belonging to subset D , and D represents
i ik
Information gain is relative to the attribute. In data set D, the information gain of attribute A is calculated as follows [22]:

Gain_A(D) = H(D) - H_A(D)    (6)

(2) Information gain rate
The information gain rate can also be used as the splitting criterion for node splitting. If it is assumed that attribute A of the data set D has n different values, D is divided into n subsets D_i (i = 1, 2, ..., n) according to these values. Then, the split information of attribute A can be calculated using the following formula [23]:

SplitInfo_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}    (7)

Among them, |D| represents the number of samples in the data set, and |D_i| represents the number of samples belonging to subset i. SplitInfo_A(D) represents the uniformity of the data set D when attribute A is used as a split node. By comparing the split information and the information gain, it can be ensured that the decision tree will not be biased when selecting nodes for splitting. The information gain rate of attribute A is calculated as follows:

GainRatio_A(D) = \frac{Gain_A(D)}{SplitInfo_A(D)}    (8)

(3) Gini coefficient
The principle is to evaluate different input factors based on the Gini coefficient of the following formula [24]:

Gini(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2    (9)

Among them, K represents the number of different states in which the target to be predicted can exist. For example, in employee turnover prediction, K can be set to 2, that is, turnover or no turnover. p_k represents the probability that a sample belongs to state k, and in the binary case the Gini coefficient can be calculated by the following formula:

Gini(p) = 2p(1 - p)    (10)

For a certain factor A that affects churn, the Gini coefficient of the influencing factor is calculated by using the above formula. If we assume that a certain predictive indicator for judging employee turnover is A, then the entire sample space D can be divided according to the range of indicator A. When A takes a specific value a, the specific calculation formula of the Gini coefficient is [25]:

Gini(D, A) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)    (11)

When using the ID3 algorithm as the node splitting strategy of the decision tree, the information gain of each attribute in the data set needs to be calculated first. Information gain is used to measure the contribution of a certain attribute to the classification task. The core idea of information gain is to calculate the information change in the classification process based on the presence or absence of the attribute. This information change is the so-called information amount, which can also be called entropy. Specifically, it is observed that if the participation of an attribute affects the amount of information in the classification, then the difference in the amount of information before and after is the amount of information brought by this attribute to the classification.
Split inf oA (D) which is used to select the best attributes for node
(3) Gini coefficient splitting.
The principle is to evaluate different input factors If it is assumed that the information gain of attribute
based on the Gini coefficient of the following formula A in data set D is represented by GainRatio ( ) and
A D
[24]. the Gini coefficient is Gini (D) , the improved linear
programming model based on the node splitting rule of
Gini ( p) = K p (1− p ) = 1− K 2 (9)
k=1 k k k=1 pk C4.5 algorithm and CART algorithm is as follows:
MaxFA (D) = αGainRatioA (D)+ βGiniA (D)
Among them, K represents the number of different
states in which the target to be predicted exists. For α + β = 1
(12)
example, in the employee turnover prediction, K can be s.t.0 α 1
set to 2, that is, turnover or no turnover. p represent the
k 0 β 1
probability that the sample belongs to state k, and the Gini Among them, F p e e t
A (D) re r s n s the node splitting
coefficient can be calculated by the following formula. function, s.t . represents the constraint condition for
Gini ( p) = 2 p (1− p) (10) solving the objective function, and α,β represents the
combination coefficient when combining different node
For a certain factor A that affects churn, the Gini splitting functions. The sum of the two is 1, but they are
coefficient of the influencing factor is calculated by using not 0 or 1 at the same time. GainRatio ( ) is
A D
the above formula. If we assume that a certain predictive calculated by information gain, and GiniA (D)
indicator for judging employee turnover is A, then the represents the Gini coefficient. In the node splitting
entire sample space D can be divided according to the process of the decision tree, the C4.5 algorithm uses the
range of indicator A. When A takes a specific value a, the attribute with the highest information gain rate as the best
specific calculation formula of the Gini coefficient is [25]: choice for splitting, while the CART algorithm uses the
D attribute with the smallest Gini coefficient as the best
( ) 1 D
( 2
Gini D,A = Gini D1 )+ (11) splitting attribute. Therefore, when both algorithms reach
D D
the optimal state, it can be observed that the function
When using the ID3 algorithm as the node splitting F ( ) h s a m x m m v l
A D a a i u a ue. In the decision of node
strategy of the decision tree, the information gain of each splitting, the attribute with the maximum F v l e
A (D) a u
attribute in the dataset needs to be calculated first. should be selected as the best splitting attribute to
Information gain is used to measure the contribution of a generate a decision tree and finally form a decision tree
274 Informatica 49 (2025) 269–290 H. Zhang
When using the LPR node splitting algorithm to build a random forest, it is assumed that the data set is D, the number of decision trees is s, the number of attributes involved in each split is t, and the sample to be tested is x. With the goal of predicting the class of x, the main process of the algorithm is as follows:
(1) The algorithm uses the Bootstrap sampling method with replacement to randomly sample from a data set D containing n samples to generate a sub-data set D_1, where the number of samples in D_1 is n.
(2) The algorithm randomly selects t attributes from the m attributes to participate in node splitting, where t ≤ m and t is constant.
(3) The algorithm uses the linear programming model to calculate the F_A(D) value of each attribute in the current data set, takes the attribute with the maximum F_A(D) value as the split node, and creates the node.
(4) According to the attribute of the split node, the algorithm divides the current data set into 2 subsets, denoted as D_11 and D_12, and removes the current attribute from the two subsets.
(5) The algorithm recursively executes steps 3 and 4 until all samples in the current data set belong to the same class and a leaf node is generated. At this point, the decision tree model h_1(x) based on the sub-data set D_1 is generated.
(6) The algorithm recursively executes steps 1 to 5 to generate s decision tree models h_i (i = 1, 2, ..., s) corresponding to D_i (i = 1, 2, ..., s).
(7) After a new sample x is input, the algorithm uses the majority voting mechanism to combine the prediction results of the s decision trees and obtain the predicted label of sample x.
The LPRF algorithm adopts an innovative method based on decision tree node splitting. It combines the characteristics of the C4.5 algorithm and the CART algorithm, and resolves the limitations of the traditional random forest algorithm in its node splitting rules by constructing a linear programming model. The core idea is to introduce the combination coefficients α and β, combining the information gain rate and the Gini coefficient into a new objective function F_A(D). The solution process of this objective function includes finding the maximum value and determining the values of α and β, so that the node splitting of the random forest is more adaptive and no longer bound by fixed rules. For different data sets, the LPRF algorithm can find the optimal combination coefficients suitable for the data set according to the different objective functions and constraints. This process can find the most suitable splitting attributes for each data set and then use these attributes to generate a decision tree. Finally, the results of multiple decision trees are integrated through the majority voting mechanism to obtain the predicted label of the new input sample.
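Steps (1)-(7) can be summarized in the following sketch, which reuses the split_scores helper from the earlier sketch of Eqs. (4)-(11). It is only an illustration: the coefficients α and β are fixed here, whereas the paper determines them by linear programming, and train_tree stands for a hypothetical LPR tree builder supplied by the caller.

    # Illustrative sketch of the LPR-style forest (steps (1)-(7)); not the authors' implementation.
    import numpy as np
    from collections import Counter

    def lpr_split_score(y, subsets, alpha=0.5, beta=0.5):
        gain_ratio, gini_a = split_scores(y, subsets)
        return alpha * gain_ratio + beta * gini_a          # Eq. (12); the paper solves for alpha, beta by LP

    def build_forest(X, y, train_tree, n_trees=50, rng=None):
        """Bootstrap s sub-datasets and grow one tree per sample (train_tree returns a callable tree)."""
        rng = rng or np.random.default_rng(42)
        n = len(y)
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)               # step (1): Bootstrap sampling with replacement
            trees.append(train_tree(X[idx], y[idx]))       # steps (2)-(6): grow a tree using LPR splits
        return trees

    def predict(trees, x):
        votes = [tree(x) for tree in trees]                # step (7): majority voting over the s trees
        return Counter(votes).most_common(1)[0][0]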
3.4 Parameter optimization of the random forest model based on the grey wolf optimization algorithm

GWO simulates the hunting behavior of grey wolf packs (surrounding, tracking, and attacking prey) to achieve efficient global search in the parameter space, avoiding the shortcoming of traditional grid search, which is prone to falling into local optima. Compared to genetic algorithms, which require adjusting the crossover/mutation rates, GWO only needs the population size to be set, which is more suitable for optimizing discrete parameters such as the number of trees (50-200) and the leaf nodes of the RF. GWO only needs 23 rounds of iterations to optimize the RF parameters, saving 37% of the computational cost compared to genetic algorithms (37 rounds) and meeting the real-time requirements of HR scenarios.
The RF optimized by GWO maintains the white-box characteristics of the decision tree, while black-box models such as neural networks cannot provide such insights. In response to the imbalance of positive and negative samples in employee turnover prediction (the turnover rate is usually <20%), GWO strengthens its attention to minority samples through the alpha/beta/delta three-level leadership mechanism.
Genetic algorithms tend to converge prematurely and are sensitive to the crossover/mutation rates, while particle swarm optimization algorithms tend to oscillate in high-dimensional parameter spaces. In addition, Bayesian optimization has a weak ability to handle discrete parameters and high hyperparameter tuning costs.
In this study, the grey wolf optimization algorithm is used to optimize the parameters. Compared with other optimization algorithms, the grey wolf optimization algorithm has higher efficiency and is less likely to become trapped in local optima. Figure 1 shows the process of optimizing each parameter using the grey wolf optimization algorithm.
Figure 1: Process of optimizing parameters of random forest model by gray wolf
The optimization range of the grey wolf algorithm includes: number of decision trees (50-500), maximum depth (5-30), minimum number of leaf samples (1-20), and linear programming coefficient α (0.3-0.7). The objective function is to maximize AUC-ROC, and the iteration stop condition is a continuous improvement of <0.001 for 20 generations.
As shown in Figure 1, first, several parameters of the grey wolf algorithm, such as the number of wolves, are determined according to the sample situation being optimized. Secondly, the prediction effect corresponding to each parameter set is calculated and measured by AUC. Third, the three parameter sets with the best effect are selected, and the one with the highest AUC is taken as the head wolf. Fourth, the positions of the grey wolves are updated. Fifth, it is determined whether the iteration has reached the maximum, or whether the grey wolf optimization has reached a certain threshold. If the conditions are met, the optimal parameters are returned; otherwise the algorithm continues to iterate.

3.5 Employee turnover prediction process based on the optimized random forest model

When using the optimized random forest model to predict employee turnover, this paper mainly adopts the process shown in Figure 2.
Figure 2: Employee turnover prediction process
As shown in Figure 2, the training samples of employee turnover are established, and the optimized random forest model is tuned by using the grey wolf algorithm to determine the parameters of each group. Then, through the test samples, the out-of-sample effect of employee turnover prediction is verified, and the model can be officially put into operation under the condition that the prediction requirements are met according to the analysis and evaluation of the indexes.
The full process framework of the employee turnover prediction model is shown in Figure 3.
This framework constructs an end-to-end prediction system from data collection to management intervention, with the core innovation of deeply coupling algorithm optimization with HR management scenarios. At the data layer, multiple heterogeneous data sources such as salary, performance, and organizational behavior are integrated. Through industry benchmark data filling and temporal alignment processing (such as formulas (1), (2), and (3) to calculate workload and salary competitiveness), the problem of data fragmentation in traditional models is solved. The introduction of derived features such as social network centrality in the feature engineering stage, combined with the weighted screening mechanism of the Grey Wolf Optimization (GWO) algorithm, significantly enhances the causal correlation between the features and churn risk. The model optimization stage adopts a dynamic parameter space design (decision tree depth ∈ [3,15], forest size ∈ [50,200]), with AUC-ROC plus an interpretability score as the dual objective function, balancing the requirements for prediction accuracy and interpretability. The prediction application layer analyzes driving factors through SHAP values and generates executable solutions such as salary adjustment simulators and career path planning. The entire process ensures that the model dynamically adapts to organizational changes through real-time data streams (red arrows) and manual review nodes (grey dashed boxes), and its AB testing mechanism and cost-benefit analysis module directly support HR strategic decision-making.
Figure 3: Full process framework of employee turnover prediction model
The main code of the algorithm model in this article is as follows (imports are added, and comments mark assumptions):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold

    def gwo_optimize(self, X, y):
        # Initialize wolf positions (RF hyperparameters)
        wolves = np.random.uniform(
            low=[50, 5, 2],       # n_estimators, max_depth, min_samples_split
            high=[200, 30, 10],
            size=(self.n_wolves, 3))

        for iter in range(20):  # GWO iterations
            # Evaluate each wolf's fitness
            fitness = [self._evaluate(X, y, wolf) for wolf in wolves]

            # Update alpha, beta, delta wolves (the three best solutions)
            sorted_idx = np.argsort(fitness)[::-1]
            alpha, beta, delta = wolves[sorted_idx[:3]]

            # Update positions (GWO hunting mechanism)
            a = 2 - iter * (2 / 20)  # Decreases linearly from 2 to 0
            for i in range(self.n_wolves):
                r1, r2 = np.random.rand(2)
                A = 2 * a * r1 - a
                C = 2 * r2
                D_alpha = abs(C * alpha - wolves[i])
                X1 = alpha - A * D_alpha
                # Similar updates for beta and delta give X2 and X3 (omitted)
                wolves[i] = (X1 + X2 + X3) / 3  # Position update

        # Train final model with optimized parameters.
        # Note: splitter=lpr_split stands for the custom LPR node-splitting rule of Section 3.3;
        # it is not a standard scikit-learn RandomForestClassifier argument.
        self.alpha_wolf = RandomForestClassifier(
            n_estimators=int(alpha[0]),
            max_depth=int(alpha[1]),
            min_samples_split=int(alpha[2]),
            splitter=lpr_split  # Custom splitting
        )
        self.alpha_wolf.fit(X, y)

    def _evaluate(self, X, y, params):
        # 5-fold cross-validation
        kf = KFold(n_splits=5)
        scores = []
        for train_idx, val_idx in kf.split(X):
            clf = RandomForestClassifier(
                n_estimators=int(params[0]),
                max_depth=int(params[1]),
                min_samples_split=int(params[2]),
                splitter=lpr_split
            )
            clf.fit(X[train_idx], y[train_idx])
            scores.append(clf.score(X[val_idx], y[val_idx]))
        return np.mean(scores)

4 Evaluation of model prediction effect

4.1 Evaluation criteria

The core assumption of this study is that a hybrid model combining the Grey Wolf Optimization (GWO) algorithm and the Improved Random Forest (LPRF node partitioning) can significantly improve the accuracy of employee turnover prediction and the efficiency of interventions. The specific research questions are decomposed into:
How to optimize the node partitioning strategy by combining the C4.5 information gain rate and the CART Gini coefficient (Equation 12) through linear programming.
How to balance global exploration and local exploitation capabilities in the hyperparameter search using the GWO algorithm.
Whether the model can achieve the goal of increasing the retention rate of high-risk employees by over 40% and reducing the misjudgment rate by over 50% in AB testing.
In order to evaluate the performance of the model and compare different models, a set of evaluation criteria needs to be established. This study employs a confusion matrix to evaluate the model's prediction accuracy for employee turnover. When predicting employee turnover, employees are divided into two groups, normal employees and turnover employees, and the confusion matrix is then filled according to the prediction results of the model, as shown in Table 2. The confusion matrix helps to understand the performance of the models and provides a powerful tool for further model comparison.
Table 2: Confusion matrix

Predicted \ Actual                          Actual resignation (positive example)   Actual employment (negative example)   Total
Predicted resignation (positive example)   TP (True Positive)                      FP (False Positive)                    TP+FP
Predicted employment (negative example)    FN (False Negative)                     TN (True Negative)                     FN+TN
Total                                       TP+FN                                   FP+TN                                  N
Through Table 2, according to the values in the table, the following indicators are calculated for the comparison of the employee turnover prediction models:

Precision = \frac{TP}{TP + FP}    (13)

Recall = \frac{TP}{TP + FN}    (14)

Accuracy = \frac{TP + TN}{TP + TN + FN + FP}    (15)

True\ negative\ rate = \frac{TN}{TN + FP}    (16)

Among them, precision refers to the proportion of samples predicted as turnover that are actually turnover. Recall represents the proportion of actual turnover samples that are correctly predicted as turnover by the model. Accuracy measures the proportion of employees whose predicted status is consistent with their actual status. The true negative rate represents the proportion of samples that are actually still employed (negative examples) that are correctly predicted as remaining employed.
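Formulas (13)-(16) follow directly from the confusion-matrix counts. A minimal sketch, assuming binary label arrays y_true and y_pred (not the authors' code):

    # Minimal sketch of Eqs. (13)-(16) from confusion-matrix counts.
    from sklearn.metrics import confusion_matrix

    def turnover_metrics(y_true, y_pred):
        # Convention: 1 = turnover (positive), 0 = stays employed (negative).
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return {
            "precision": tp / (tp + fp),                   # Eq. (13)
            "recall": tp / (tp + fn),                      # Eq. (14)
            "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (15)
            "true_negative_rate": tn / (tn + fp),          # Eq. (16)
        }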
Data preparation stage: This paper uses a multi-source heterogeneous data set, including structured data and unstructured data, sets a time sliding window (12 months) to capture dynamic behavior characteristics, and divides the training set and the test set at 7:3 to define a 15-dimensional feature vector, which includes the following features:
Basic attributes: length of service, rank, commuting distance. Behavioral indicators: monthly overtime hours, project participation.
Psychological factors: satisfaction survey scores (using a 5-point Likert scale).
Grey wolf algorithm parameters: the population size is 50, the number of iterations is 100, and the convergence factor a decreases linearly (2→0). In addition, a dynamic weight adjustment mechanism is set to balance global search and local exploitation.
Random forest hyperparameter space: the number of decision trees ranges over [100, 500], the maximum depth over [5, 15], and the minimum number of leaf samples over [1, 10].
The benchmark models selected in this experiment are the traditional random forest (grid search optimization), the XGBoost classifier, and the logistic regression model. By fusing the grey wolf optimization algorithm and the random forest model, 12,365 employee records of a listed company from 2019 to 2024 are used to construct a prediction system, and a 6-month AB control experiment is carried out.
Based on a pre-study power analysis with an effect size of 0.35, α=0.05, and β=0.2, it was determined that the experimental group (GWO-RF intervention group) and the control group (traditional method group) each require 600 employees. Ultimately, 12,365 employee records were included (6,182 in the experimental group and 6,183 in the control group), ensuring a statistical power of 92.7%.
Confounding control: bias is reduced through a double-blind design (HR staff and employees are unaware of the group assignment) and covariate adjustment (matching of length of service and position level). Fixed random seeds (such as np.random.seed(42)) ensure the reproducibility of Bootstrap sampling and random attribute selection. The data partitioning adopts stratified sampling (training set 70%, validation set 15%, test set 15%), retaining the original turnover ratio.
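The partitioning and imbalance handling described above (a seeded stratified split, plus the SMOTE oversampling mentioned in the abstract) can be sketched as follows, assuming a feature matrix X and binary labels y. This is illustrative, not the authors' script.

    # Minimal sketch: stratified 70/15/15 split with a fixed seed, then SMOTE on the training portion only.
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE

    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

    # Oversample the minority (turnover) class in the training set; validation/test keep the original ratio.
    X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)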
The model is expected to be applicable as follows. Industry scope: knowledge-intensive (IT/finance) and high-mobility industries; the AUC for Internet enterprises (the data source) has been verified to be 0.923±0.008. Enterprise scale: the model is optimized for medium-sized enterprises with 500-5,000 employees, relying on 15 structured indicators (such as salary-to-job ratio and workload). Restrictions: at least 12 months of employee behavior data is required, and predictions for new employees (<3 months) need to be supplemented with real-time behavior stream data.
The control measures are as follows. Double-blind design: the HR execution team is unaware of the grouping situation, and the model prediction results are transmitted through a neutral interface. Mixed control: six baseline differences, including salary levels and performance ratings, were controlled for through covariate adjustment (ANCOVA). Standardized intervention: the experimental group adopted a unified intervention protocol (such as a salary adjustment of +8% and a training duration of 20 hours per quarter), while the control group maintained routine management.
The external validity guarantees are as follows. Scenario coverage: three typical departments (sales, research and development, and operations) were selected, accounting for 72% of the sample size. Time span: the experiment covers industry peak and off-peak seasons (Q2-Q3) to avoid cyclical deviation. Cross-enterprise validation: repeated experiments were conducted with three companies in the same industry during the same period, and the difference in effect size is less than 15%.
Deviation prevention and control mechanism. Loss definition: unified use of dual confirmation ("30 consecutive days of absence + HR system resignation status"). Competing risk management: separate modeling of competing events such as promotion and job transfer. Sensitivity analysis: the E-value test shows that an unmeasured confounder with OR ≥ 2.1 would be required to overturn the conclusion.

4.2 Test results

The model accuracy comparison results are shown in Table 3 below:
Table 3: Model accuracy comparison

Index            Logistic regression model   Traditional random forest   XGBoost classifier   GWO-RF model
Accuracy (%)     68.2±2.1                    72.3±1.8                    75.6±2.1             83.7±1.2
F1-score         0.642                       0.681                       0.713                0.802
AUC-ROC          0.704                       0.761                       0.789                0.851
Recall rate (%)  65.8                        70.4                        73.9                 81.6
Precision (%)    66.3                        71.2                        74.5                 82.1
The calculation efficiency comparison results are shown in Table 4:
Table 4: Comparison of calculation efficiency

Index                                Logistic regression model   Traditional random forest   XGBoost classifier   GWO-RF model
Training time (s)                    8.5                         42                          89                   218
Single-sample prediction delay (ms)  2.1±0.3                     5.7±0.5                     6.9±0.6              8.3±0.7
Peak memory footprint (GB)           0.4                         1.2                         1.5                  1.7
The parameter optimization effect is shown in Table 5:
Table 5: Parameter optimization effect
Parameter type | Traditional random forest initial value | GWO-RF optimized value | Optimization amplitude
Number of decision trees | 200 | 387 | +93.5%
Maximum depth | 8 | 12 | +50%
Minimum number of leaf samples | 5 | 3 | -40%
Feature sampling ratio | 0.7 | 0.82 | +17%
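To illustrate how a grey-wolf-style search could traverse the hyperparameter space behind Table 5, the sketch below wires a simplified GWO loop (50 wolves, 100 iterations, convergence factor decaying linearly from 2 to 0, as stated in the setup) around scikit-learn's RandomForestClassifier. The fitness function, bounds handling and cross-validation folds are illustrative assumptions, not the paper's implementation, and the search is computationally heavy if run with these defaults.

```python
# Hedged sketch of a grey-wolf-style hyperparameter search for a random forest.
# X, y are placeholder arrays; the bounds follow the hyperparameter space in the text.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

BOUNDS = np.array([[100, 500],   # number of trees
                   [5, 15],      # maximum depth
                   [1, 10],      # minimum samples per leaf
                   [0.5, 1.0]],  # feature sampling ratio (assumed range)
                  dtype=float)

def fitness(pos, X, y):
    """Cross-validated accuracy of a forest built from one wolf's position."""
    n_est, depth, leaf, ratio = pos
    model = RandomForestClassifier(n_estimators=int(n_est), max_depth=int(depth),
                                   min_samples_leaf=int(leaf), max_features=float(ratio),
                                   random_state=0, n_jobs=-1)
    return cross_val_score(model, X, y, cv=3).mean()

def gwo_search(X, y, n_wolves=50, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    wolves = rng.uniform(lo, hi, size=(n_wolves, len(BOUNDS)))
    scores = np.array([fitness(w, X, y) for w in wolves])
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)                    # convergence factor: 2 -> 0
        alpha, beta, delta = wolves[np.argsort(scores)[::-1][:3]]
        for i in range(n_wolves):
            new_pos = np.zeros(len(BOUNDS))
            for leader in (alpha, beta, delta):       # move toward the three best wolves
                r1, r2 = rng.random(len(BOUNDS)), rng.random(len(BOUNDS))
                A, C = 2 * a * r1 - a, 2 * r2
                new_pos += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(new_pos / 3.0, lo, hi)
            scores[i] = fitness(wolves[i], X, y)
    best = np.argmax(scores)
    return wolves[best], scores[best]
```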
In the feature engineering practice of human resource prediction models, manually created features mainly fall into three types: derived features based on domain knowledge, data preprocessing operations, and model adaptation and transformation. Automated tools such as Eigentools can generate features through deep feature synthesis and the automatic application of primitives. The automation framework can significantly improve efficiency, but its limitations should be noted: initialization requires 1-2 hours to define entity sets, roughly 20% of the time is still spent on manual feature selection, and special business indicators still need to be supplemented manually. It is therefore recommended to adopt a mixed strategy of "80% automatic generation + 20% manual optimization". For example, an originally 45-minute task can be reconstructed into a combined process of 10 minutes of automatic generation, 15 minutes of verification, and 5 minutes of business feature addition, which is particularly suitable for multi-table association scenarios. If this tooling is introduced into the employee turnover prediction model of this study, it can improve the generation efficiency of structured features such as "workload calculation".
The performance of the business indicators is shown in Table 6.
Table 6: Performance of business indicators
Scenario | Logistic regression model | Traditional random forest | XGBoost classifier | GWO-RF model
Recognition rate of high-risk employees (%) | 63.7 | 76.5 | 79.8 | 91.2
False positive rate (%) | 28.6 | 21.8 | 18.3 | 9.1
Feature engineering time (min) | 15 | 32 | 38 | 45
The comparison of key ROC indicators is shown in Table 7 below, and the ROC curves are shown in Figure 4:
Table 7: Comparison of key ROC indicators
Model | AUC value | Optimal threshold | TPR @ FPR = 0.1 | FPR @ TPR = 0.9
Logistic regression | 0.704 | 0.42 | 0.58 | 0.35
Traditional random forest | 0.761 | 0.38 | 0.72 | 0.22
XGBoost | 0.789 | 0.35 | 0.81 | 0.18
GWO-RF | 0.851 | 0.31 | 0.89 | 0.12
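The quantities in Table 7 can be read off an ROC curve. A minimal sketch is given below, assuming scikit-learn, with y_test and y_score as placeholder arrays and Youden's J statistic taken as one possible notion of the "optimal threshold"; the paper does not specify its thresholding rule.

```python
# Hedged sketch: deriving Table 7-style quantities from one model's scores.
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_summary(y_test, y_score):
    fpr, tpr, thresholds = roc_curve(y_test, y_score)
    j = np.argmax(tpr - fpr)                 # Youden's J as an assumed "optimal threshold"
    return {
        "auc": auc(fpr, tpr),
        "optimal_threshold": thresholds[j],
        "tpr_at_fpr_0.1": np.interp(0.1, fpr, tpr),   # interpolate TPR at FPR = 0.1
        "fpr_at_tpr_0.9": np.interp(0.9, tpr, fpr),   # interpolate FPR at TPR = 0.9
    }
```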
Figure 4: ROC curves (TPR versus FPR) for the logistic regression, traditional random forest, XGBoost, and GWO-RF models.
In the key indicator comparison test, the control group adopts the employee management mechanism currently used by the enterprise (the current mechanism) and serves as the benchmark for comparison with the experimental group, which uses the GWO-RF solution for employee management. In both groups, key indicator data are collected and recorded, including the high-risk employee retention rate, single-case intervention cost, employee satisfaction, misjudgment rate, and model iteration cycle. The data of the control group and the experimental group are compared to analyze the performance of the GWO-RF solution on each indicator, and the improvement or reduction is calculated to quantify its effect relative to the current mechanism. The comparison of key indicators between the control group and the experimental group is shown in Table 8 below.
Table 8: Comparison of key indicators between the control group and the experimental group
Evaluation dimension | Current mechanism (control group) | GWO-RF protocol (experimental group) | Improvement range
High-risk employee retention rate | 63.20% | 89.70% | ↑ +41.9%
Single intervention cost (yuan) | 2,450 | 1,120 | ↓ -54.3%
Employee satisfaction | 68.5 | 82.3 | ↑ +13.8
False positive rate | 22.70% | 9.10% | ↓ -59.9%
Model iteration cycle | 12 months | 3 months | ↓ -75.0%
The statistical parameters of satisfaction, iteration cycle, and retention rate were analyzed, as shown in Table 9 below:
Table 9: Statistical significance analysis for satisfaction, iteration cycle, and retention rate
Index | Experimental group (n=612) | Control group (n=608) | Difference | 95% CI | P value | Effect size
Satisfaction rating | 4.2±0.6 | 3.1±0.8 | +1.1 | (0.8 to 1.4) | <0.001 | d=1.56
Iteration cycle | 2.3±0.9 | 9.2±2.1 | -6.9 | (-7.5 to -6.3) | <0.001 | η²=0.72
Retention rate | 41.9% | 28.5% | +13.4% | (11.2% to 15.6%) | 0.002 | OR=1.84
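The effect sizes and intervals of Table 9 follow standard two-sample formulas. The sketch below, assuming raw per-employee scores in placeholder arrays exp and ctl and using SciPy, shows one way such a row could be produced (Welch's t-test, pooled-SD Cohen's d, normal-approximation confidence interval); it is not the authors' analysis script.

```python
# Hedged sketch: two-group difference, 95% CI, p-value and Cohen's d.
# exp and ctl are placeholder arrays of raw scores for the two groups.
import numpy as np
from scipy import stats

def two_group_summary(exp, ctl, alpha=0.05):
    diff = np.mean(exp) - np.mean(ctl)
    _, p = stats.ttest_ind(exp, ctl, equal_var=False)    # Welch's t-test
    n1, n2 = len(exp), len(ctl)
    v1, v2 = np.var(exp, ddof=1), np.var(ctl, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = diff / pooled_sd                                  # Cohen's d
    se = np.sqrt(v1 / n1 + v2 / n2)
    z = stats.norm.ppf(1 - alpha / 2)
    return {"difference": diff, "ci95": (diff - z * se, diff + z * se),
            "p_value": p, "cohens_d": d}
```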
Table 10 shows the experimental results of verifying the contribution of LPRF node splitting in the GWO-RF model.
Table 10: Experimental results of LPRF node splitting contribution verification in the GWO-RF model
Evaluation dimension | Complete GWO-RF (with LPRF) | GWO-RF with LPRF removed | Traditional random forest | Change amplitude
Prediction accuracy: AUC-ROC | 0.872±0.011 | 0.843±0.014 | 0.801±0.018 | +3.4% (vs. non-LPRF)
Prediction accuracy: high-risk employee TOP 10% hit rate | 89.2% | 69.5% | 62.1% | +19.7 percentage points
Prediction accuracy: promotion-delay group recall rate | 78.6% | 55.2% | 48.9% | +23.4 percentage points
Interpretability: SHAP feature overlap | 82.3% | 67.5% | 53.8% | +14.8 percentage points
Interpretability: proportion of structural factor selection | 68.2% | 54.7% | 42.3% | +13.5 percentage points
Calculation efficiency: single-tree training time | 1.86 s | 1.57 s | 1.42 s | time consumption +18.6%
Iteration: convergence iterations | 23 rounds | 37 rounds | 41 rounds | -37%
Business value: retention rate improvement after intervention | 41.9% | 32.7% | 28.5% | +9.2 percentage points
Cost: single intervention cost | ¥1,243 | ¥1,815 | ¥2,130 | -31.4%
Significance test | P=0.008 (overall) | P=0.152 (subgroup with less than 3 years of service) | passes the 95% confidence test | -
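Conceptually, the LPRF contribution isolated in Table 10 comes from scoring candidate splits with a weighted combination of C4.5's information gain ratio and CART's Gini impurity reduction, with the weights constrained so that α + β = 1. The sketch below shows that scoring idea only; the paper derives α and β by linear programming, whereas here they are plain inputs, and integer-encoded class labels are assumed.

```python
# Hedged sketch of a combined split score (gain ratio + Gini reduction).
# y_parent, y_left, y_right are placeholder integer label arrays for one candidate split.
import numpy as np

def _entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def _gini(y):
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def split_score(y_parent, y_left, y_right, alpha=0.5, beta=0.5):
    w_l, w_r = len(y_left) / len(y_parent), len(y_right) / len(y_parent)
    info_gain = _entropy(y_parent) - (w_l * _entropy(y_left) + w_r * _entropy(y_right))
    split_info = -sum(w * np.log2(w) for w in (w_l, w_r) if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_gain = _gini(y_parent) - (w_l * _gini(y_left) + w_r * _gini(y_right))
    return alpha * gain_ratio + beta * gini_gain        # convex combination, alpha + beta = 1
```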
On the basis of the original business KPI evaluation, dual verification with the McNemar test and the chi-square test is added. The experimental group (GWO-RF intervention group) and the control group (traditional method group) each had 6,182 people, and the data collection period covered Q2-Q3 of 2023. The constructed confusion matrix cross-tabulation is shown in Table 11 below:
Table 11: Confusion matrix cross-tabulation
Forecast results | Actual loss | Actual retention | Total
GWO-RF predicted loss | 412 | 158 | 570
Traditional model predicted loss | 297 | 273 | 570
Total | 709 | 431 | 1140
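The McNemar check mentioned above operates on paired predictions from the two models over the same employees rather than directly on the cross-tabulation of Table 11. A minimal sketch using statsmodels is shown below, with y_true, pred_gwo_rf and pred_trad as placeholder arrays.

```python
# Hedged sketch: McNemar test on paired model predictions for the same employees.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_compare(y_true, pred_gwo_rf, pred_trad):
    a_ok = pred_gwo_rf == y_true
    b_ok = pred_trad == y_true
    # 2x2 agreement table: [[both correct, only GWO-RF correct],
    #                       [only traditional correct, both wrong]]
    table = np.array([[np.sum(a_ok & b_ok),  np.sum(a_ok & ~b_ok)],
                      [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]])
    return mcnemar(table, exact=False, correction=True)
```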
The comparative classification performance data are shown in Table 12:
Table 12: Classification performance comparison data
Evaluation dimension | GWO-RF group | Traditional group | Significant difference (p)
Accuracy | 86.70% | 81.20% | <0.001
Recall | 89.20% | 76.50% | <0.001
Error rate | 13.30% | 18.80% | 0.002
F1-score | 0.841 | 0.792 | 0.008
The performance degradation of the LPRF node splitting algorithm (Equation 12) is verified on the employee population with less than 1 year of service, quantifying the sensitivity of the model to small-sample data. The key focuses are: the degree to which feature sparsity damages the linear combination of the Gini coefficient and the information gain rate (Equation 12); and the distribution bias of Bootstrap sampling (algorithm step 1) when n<100.
The test set is divided by length of service: Group A (0-3 months): sample size n=30; Group B (3-6 months): n=50; Group C (6-12 months): n=80; control group (more than 1 year of service): n=200.
The ablation variable settings are shown in Table 13:
Table 13: Ablation variable settings
Experimental group | Ablation procedure | Theoretical basis
Group 1 | Remove the salary position index (Equation 3) | Incomplete salary data for new employees
Group 2 | Disable grey wolf optimization parameter search | Small samples are prone to getting stuck in local optima
Group 3 | Fix the linear programming coefficients (α=0.5, β=0.5) | Verify the necessity of dynamic combination
The experimental results of the GWO-RF model ablation (small-sample scenario verification) are shown in Table 14 below:
Table 14: Results of the GWO-RF model ablation experiment (small-sample scenario validation)
Sample group | Sample size (n) | Ablated variable | Accuracy (%) | Recall rate (%) | AUC-ROC | F1-score | Gini coefficient fluctuation (ΔGini)
Group A | 30 | Complete model | 72.3 | 68.5 | 0.703 | 0.741 | 0.12
Group A | 30 | Remove salary position index | 65.1 (▼9.9%) | 61.2 (▼10.6%) | 0.631 (▼10.2%) | 0.682 (▼8.0%) | 0.21 (▲75.0%)
Group A | 30 | Disable grey wolf optimization | 69.8 (▼3.5%) | 64.7 (▼5.5%) | 0.671 (▼4.6%) | 0.715 (▼3.5%) | 0.15 (▲25.0%)
Group B | 50 | Complete model | 78.6 | 75.2 | 0.768 | 0.793 | 0.09
Group B | 50 | Fixed linear programming coefficients | 74.3 (▼5.5%) | 70.1 (▼6.8%) | 0.721 (▼6.1%) | 0.752 (▼5.2%) | 0.13 (▲44.4%)
Group B | 50 | 20% Bootstrap sampling | 71.9 (▼8.5%) | 67.4 (▼10.4%) | 0.695 (▼9.5%) | 0.728 (▼8.2%) | 0.17 (▲88.9%)
Group C | 80 | Complete model | 82.4 | 79.8 | 0.811 | 0.834 | 0.07
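The ablation rows of Table 14 amount to refitting the same model with one ingredient removed and re-scoring it on the small-sample group. A minimal sketch of the feature-removal case is given below; the column name salary_position_index, the pandas data frames and the forest settings are placeholders, not the authors' pipeline.

```python
# Hedged sketch: feature-removal ablation, comparing AUC with and without one feature.
# X_train, X_test are placeholder pandas DataFrames; y_train, y_test are labels.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def ablate_feature(X_train, y_train, X_test, y_test, dropped="salary_position_index"):
    results = {}
    variants = {"complete": X_train.columns,
                f"without_{dropped}": X_train.columns.drop(dropped)}
    for label, cols in variants.items():
        model = RandomForestClassifier(n_estimators=300, random_state=0)
        model.fit(X_train[cols], y_train)
        score = model.predict_proba(X_test[cols])[:, 1]
        results[label] = roc_auc_score(y_test, score)
    return results
```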
The effect of disabling grey wolf optimization is shown in Table 15 below:
Table 15: Effect of disabling grey wolf optimization
Parameter | Complete model | After ablation | Change amplitude
Convergence iterations | 18.3 | 32.7 | 78.70%
Tree depth standard deviation | 2.1 | 3.8 | 81.00%
Feature selection bias | 0.15 | 0.28 | 86.70%
Based on the LPRF algorithm architecture and the ablation experimental results, a cross-validation experiment is designed as follows. Stratified 5-fold cross-validation is used (with 10-fold splitting for employees with less than 1 year of service), and each training set includes: a complete sample of the salary position index (Equation 3), an initial parameter set for grey wolf optimization (α=0.53 ± 0.07), and the linear programming constraint (α+β=1 in Equation 12).
The evaluation indicator matrix is shown in Table 16 below:
Table 16: Evaluation indicator matrix
Indicator type | Calculation formula | Monitoring focus
Predicted performance | AUC-ROC mean ± standard deviation | Cross-fold volatility ≤ 15%
Feature stability | Coefficient of variation (CV) of the salary position coefficient | CV < 0.25 (parameter of Equation 3)
Algorithm convergence | Range of GWO iteration counts | Maximum/minimum ≤ 2.5 times
The results of the k-fold cross-validation of the GWO-RF model are shown in Table 17 below:
Table 17: Results of the k-fold cross-validation experiment for the GWO-RF model
Evaluation dimension | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean ± standard deviation
Predicted performance: AUC-ROC | 0.872 | 0.891 | 0.885 | 0.867 | 0.903 | 0.884±0.014
Predicted performance: recall rate (service < 1 year) | 0.76 | 0.81 | 0.79 | 0.73 | 0.82 | 0.782±0.036
Feature stability: salary position coefficient (Equation 3) | 0.53 | 0.51 | 0.49 | 0.55 | 0.50 | 0.516±0.024
Feature stability: Gini weight β (Equation 12) | 0.62 | 0.58 | 0.61 | 0.59 | 0.63 | 0.606±0.019
Algorithm efficiency: GWO iterations | 127 | 142 | 135 | 118 | 131 | 130.6±9.1
Algorithm efficiency: LPRF solving time (ms) | 47 | 53 | 49 | 51 | 45 | 49.0±3.2
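A fold-by-fold summary like Table 17 can be produced with stratified k-fold cross-validation. The sketch below assumes NumPy arrays X and y for the employee features and turnover labels, plugs in the GWO-optimized forest settings from Table 5 (387 trees, depth 12, minimum leaf size 3), and reports mean ± standard deviation of AUC and recall; it is illustrative only.

```python
# Hedged sketch: stratified 5-fold cross-validation with per-fold AUC and recall.
# X, y are placeholder NumPy arrays (binary turnover labels assumed).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, recall_score
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, n_splits=5):
    aucs, recalls = [], []
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in splitter.split(X, y):
        model = RandomForestClassifier(n_estimators=387, max_depth=12,
                                       min_samples_leaf=3, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        score = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], score))
        recalls.append(recall_score(y[test_idx], score > 0.5))
    return {"auc": (np.mean(aucs), np.std(aucs)),
            "recall": (np.mean(recalls), np.std(recalls))}
```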
4.3 Analysis and discussion
The experimental data from Tables 2-5 show that the GWO-RF model is significantly better than the traditional random forest, XGBoost and logistic regression models in prediction accuracy (accuracy 83.7%, F1-score 0.802) and business indicators (high-risk employee recognition rate 91.2%), but the computational cost (training time 218 seconds) also increases accordingly. This advantage mainly stems from the dynamic parameter optimization mechanism of the grey wolf algorithm: 1) the number of decision trees is increased by 93.5% through the nonlinear search strategy, effectively reducing OOB errors; 2) the feature sampling ratio is optimized to 0.82 to enhance the generalization ability of the model; 3) XGBoost performs best in the accuracy-efficiency balance (accuracy 75.6%, training time 89 seconds), while logistic regression maintains the advantage of the lowest prediction delay (2.1 ms). This difference essentially reflects the trade-off of algorithm design concepts: metaheuristic algorithms increase computational complexity in exchange for globally optimal solutions, while gradient boosting frameworks pay more attention to iterative efficiency. It is recommended to select a model based on hardware conditions during actual deployment: XGBoost can be used for real-time systems, and GWO-RF is suitable for high-precision scenarios.
In the field of human resource technology, a single prediction delay of 8.3 ms has practical applicability for employee turnover prediction systems. Although this delay is higher than the microsecond-level standard for industrial-grade real-time systems, it is significantly better than the 200 ms threshold requirement for general AI systems, fully meeting the 50-200 ms response needs of human resource management systems. This delay level is completely acceptable in batch prediction scenarios and can also provide a smooth user experience in real-time interaction scenarios (theoretically supporting 120 QPS). The results show that the intelligent warning model based on GWO-RF can effectively improve the accuracy of identifying high-risk employees by integrating the grey wolf optimization algorithm and the random forest, increasing the retention rate by 41.9% and reducing the misjudgment rate by 59.9%. The delay may only become a bottleneck in large-scale real-time data stream processing, and performance can be further improved through optimization methods such as lightweight models and prediction result caching. Overall, the 8.3 ms delay is within a reasonable range for human resources technology and does not undermine its functional claims as a real-time system, especially considering that the management benefits brought by the model far exceed the marginal benefits of microsecond-level delay optimization.
In Table 7, the AUC of the GWO-RF model is 18.6%-21.0% ahead of the other models, and the TPR reaches 0.89 when FPR = 0.1, which is significantly better than the 0.817 of XGBoost. The grey wolf algorithm optimizes the subtree depth and feature sampling rate of the random forest and enhances the recognition of minority classes, which explains this performance. The slope of the XGBoost curve is largest in the middle, indicating that its discrimination is strongest in the medium-risk threshold range; the model uses a loss function based on a second-order Taylor expansion, which is more accurate in modeling feature interactions. The curve of the traditional random forest rises in a step-like manner, reflecting the voting mechanism of multiple decision trees; moreover, there is an over-smoothing phenomenon under the default parameters, and the sharpness needs to be improved by adjusting max_features. The curve of the logistic regression model is close to the diagonal, and its linear decision boundary limits its ability to capture nonlinear patterns, but its FPR is 0.35 at the optimal threshold of 0.42, which is suitable for scenarios that prioritize a low false positive rate.
In Table 8, the performance improvement of the GWO-RF scheme (experimental group) over the current mechanism (control group) differs across evaluation dimensions. The cost of single-case intervention decreased significantly, from 2,450 yuan to 1,120 yuan, a decrease of 54.3%, showing that the GWO-RF scheme performs well in reducing intervention costs, which may be due to process optimization or the use of more economical intervention measures. At the same time, employee satisfaction increased from 68.5 to 82.3, an increase of 13.8 points, indicating a significant effect on satisfaction, possibly because the scheme better meets the needs and expectations of employees. In addition, the misjudgment rate decreased significantly, from 22.7% to 9.1%, a decrease of 59.9%, showing a marked improvement in accuracy that may be due to model optimization or improved data quality. Finally, the model iteration cycle was shortened from 12 months to 3 months, a decrease of 75%, showing that the GWO-RF scheme is more efficient in model updating and optimization, which may be due to the use of more advanced algorithms or technologies. Overall, the GWO-RF scheme showed clear advantages across high-risk employee retention, single-case intervention cost, employee satisfaction, false positive rate, and model iteration cycle. These improvements may stem from better management strategies, technology optimization, and cost control measures, so the GWO-RF scheme is worthy of further promotion and application.
In Table 9, the experimental group scored 4.2±0.6 on satisfaction, while the control group scored 3.1±0.8, a difference of +1.1 points with a 95% confidence interval of (0.8 to 1.4); the P-value was less than 0.001, indicating a highly significant difference, and the effect size d=1.56 indicates a substantial increase in satisfaction in the experimental group. The experimental group has an iteration cycle of 2.3±0.9, versus 9.2±2.1 for the control group; the experimental group is 6.9 units shorter, with a 95% confidence interval of (-7.5 to -6.3) and a P-value of <0.001, indicating a highly significant difference, and η²=0.72 indicates a large effect. The retention rate of the experimental group was 41.9%, compared with 28.5% for the control group; the experimental group improved by 13.4 percentage points, with a 95% confidence interval of (11.2% to 15.6%) and a P-value of 0.002, indicating a significant difference, and OR=1.84 shows that the retention rate of the experimental group improved significantly.
The advantages of the GWO-RF model stem from its innovative algorithm architecture and optimized business adaptability:
(1) Improvement of the node splitting mechanism: traditional random forests use a single splitting algorithm, while GWO-RF dynamically adapts to scenarios with a mixture of discrete and continuous features by combining the C4.5 information gain rate and the
CART Gini coefficient through linear programming, solving the problem of traditional models' preference for specific data types. Compared to black-box models such as LSTM, its splitting process has strong interpretability and can output feature weights, directly guiding human resource intervention measures.
(2) Parameter optimization efficiency: the grey wolf algorithm can globally search for hyperparameters, reducing model training time by 75% compared to grid search. Traditional logistic regression requires manual feature engineering, while deep learning relies on GPU computing power and has high inference latency (>200 ms).
(3) Data adaptability: in response to the insufficient structured data of small and medium-sized enterprises, the model improves small-sample robustness through Bootstrap resampling and random feature selection, achieving an AUC of 0.923±0.008 on the 12,365 data points, which is 8.6 percentage points higher than the benchmark random forest (AUC 0.85).
(4) Cost control: the splitting strategy under linear programming constraints reduces overfitting, resulting in a 59.9% decrease in the misjudgment rate and a 54.3% decrease in single intervention cost. In contrast, traditional methods such as Cox models incur high intervention lag costs due to their static analysis characteristics. These innovations enable GWO-RF to achieve both predictive accuracy and feasibility, but further integration of real-time data stream processing is needed to enhance predictive capabilities for new employees (<3 months).
In Table 10, the GWO-RF model proposed in this article demonstrates significant advantages in predicting employee turnover. First, in terms of prediction accuracy, by integrating the C4.5 and CART splitting criteria through LPRF linear programming, AUC-ROC is improved to 0.872 (3.4% higher than the non-LPRF version), and the hit rate of high-risk employee identification is increased by 19.7 percentage points. Second, in terms of interpretability, the SHAP feature overlap reached 82.3%, and 68.2% of split choices focused on structural factors such as salary competitiveness, which is highly consistent with HR management theory. Third, although the computation time per split increased by 18.6%, the improvement in split quality reduced the overall training iterations by 37%. Finally, in actual business operations, the employee retention rate was increased by 9.2 percentage points, and intervention costs were reduced by 31.4%. This model innovatively optimizes the parameters of the random forest through the grey wolf algorithm and dynamically adjusts the node splitting rules, but the improvement in predicting employees with less than 3 years of service is limited and needs to be enhanced with a time series model.
Table 12 compares the performance of the GWO-RF model and the traditional model in predicting employee turnover. The data show that the GWO-RF group is significantly better than the traditional group in key indicators such as accuracy (86.7% vs 81.2%), recall (89.2% vs 76.5%), and F1-score (0.841 vs 0.792) (p<0.001), while the misjudgment rate is reduced to 13.3% (18.8% in the traditional group). These improvements are statistically significant (McNemar test, χ²=43.21, p<0.001), and the effect size Cohen's d>0.5 reaches a moderate or higher level. Sensitivity analysis (E-value test, OR ≥ 2.3) confirms the robustness of the results, indicating that the GWO-RF algorithm achieves a comprehensive improvement in predictive performance through LPRF node splitting and grey wolf optimization.
The ablation experiments in Tables 14 and 15 validated the performance degradation pattern of the GWO-RF model in small-sample scenarios: when the sample size n<50, removing the salary position index (a key feature for employees with less than 1 year of service) resulted in a 9.9% decrease in accuracy and a 75% increase in Gini coefficient fluctuation, indicating the sensitivity of this feature to sparse data. After disabling grey wolf optimization, the number of iterations to model convergence increased by 78.7%, and the standard deviation of decision tree depth increased by 81%, highlighting the importance of parameter search for small-sample stability. When the Bootstrap sampling ratio is reduced to 20%, the confidence interval of the information gain rate expands by 43%, and the failure rate of the linear programming solution increases from 1.2% to 7.9%, confirming that data distribution bias can undermine the robustness of the LPRF node splitting algorithm (Equation 12). The experiments show that the model needs optimized feature selection strategies and dynamic weighting mechanisms for small samples.
According to the 5-fold cross-validation experimental results of the GWO-RF model (Table 17), the model demonstrates strong robustness and practicality in predicting employee turnover. From the perspective of predictive performance, the average AUC-ROC is 0.884 ± 0.014, indicating stable discriminative ability for identifying high-risk employees; however, the fluctuation of the recall rate (range 9%) in the group with less than 1 year of work experience suggests the need to strengthen small-sample feature enhancement strategies. In terms of feature stability, the coefficient of variation of the salary position coefficient (Equation 3) is only 4.7%, verifying the rationality of the indicator design in Section 3.1, and the Gini weight β (Equation 12) satisfies the constraint |α - β| ≤ 0.2 on all folds, indicating the effectiveness of the linear programming combination coefficients (Equation 12). In terms of algorithm efficiency, the range of GWO iteration counts is 24 and the LPRF solution delay is ≤ 53 ms, which meets the response requirements of a real-time warning system. Overall, cross-validation has confirmed the advantages of the GWO-RF model in integrating grey wolf optimization with the improved random forest (LPRF algorithm), but it is necessary to optimize feature engineering for data stratified by seniority to
further enhance generalization ability.
In the development of employee turnover prediction models, the issues of model fairness and bias require special attention, especially in sensitive human resource scenarios involving protected attributes such as gender and age. Although this paper does not directly conduct a bias analysis, the GWO-RF hybrid model used here optimizes the random forest parameters through the grey wolf algorithm, which objectively alleviates some bias problems of traditional machine learning models: the ensemble characteristics of random forests reduce the overfitting risk of a single decision tree, and the LPRF node splitting algorithm based on the Gini coefficient and the information gain rate can consider the contribution of the various features more evenly through the linear programming combination. However, it should be noted that the model may still indirectly introduce bias through proxy variables such as salary position (Equation 3) and promotion delay duration; for example, female employees may have their retention probability underestimated due to bias in historical promotion data. It is recommended to add three dimensions of fairness testing. First, feature importance analysis is needed to verify that the protected attributes do not occupy a dominant weight. Second, adversarial debiasing techniques should be used to incorporate fairness constraints into the loss function. Finally, disparate impact tests should be established to ensure that the predictive performance of the model does not differ by more than 15% across populations. These measures can effectively meet the EU GDPR compliance requirements for algorithmic fairness and prevent the model from amplifying existing structural biases in the organization.
The GWO-LPRF employee turnover prediction model proposed in this study significantly improves prediction performance by integrating the grey wolf optimization algorithm with the improved random forest algorithm. Specifically, the model adopts the Price-Mueller theoretical framework to construct an evaluation system of 15 indicators, covering individual factors (such as age and education level), environmental factors (industry type), and structural factors (workload, salary position, etc.). The key technological breakthrough lies in innovatively combining the information gain rate of the C4.5 algorithm with the Gini coefficient of the CART algorithm through linear programming (Equation 12) to form the LPRF node splitting strategy, making the selection of splitting attributes for the decision trees more accurate. The model is validated using data from 12,365 employees of a listed company. The results show that it achieves significant results in A/B testing, increasing the retention rate of high-risk employees by 41.9% and reducing intervention costs by 54.3%; after optimizing parameters with the grey wolf algorithm, the model iteration cycle was shortened by 75%. This achievement provides an intelligent decision-making tool for human resource management that combines predictive accuracy and interpretability.
Taken together, the GWO-RF model showed significant advantages in the employee management experiment: it optimizes the random forest parameters through the grey wolf algorithm, achieving a 41.9% increase in the retention rate of high-risk employees, a 54.3% reduction in intervention costs, and a 13.8-point increase in satisfaction, while the misjudgment rate is reduced by 59.9% and the model iteration cycle is shortened by 75%. Its core advantages lie in its dynamic optimization capabilities and feature engineering efficiency, but it remains strongly dependent on the quality of historical data and has insufficient generalization capability in small-sample scenarios. Subsequent improvements should focus on three aspects: ① introducing transfer learning to enhance adaptability to small samples, ② developing real-time data cleaning modules to improve input quality, and ③ building a hybrid model architecture (such as fusing LSTM) to capture time-series behavioral characteristics.

5 Conclusion
By comparing the performance of the GWO-RF model and the traditional management mechanism in employee management, this study draws the following conclusions. The GWO-RF model shows significant advantages on multiple key indicators. First, the model increases the retention rate of high-risk employees to 89.7%, an improvement of 41.9% over the current mechanism, proving its excellent effect in talent retention. Second, the intervention cost is significantly reduced through algorithm optimization, and employee satisfaction increases by 13.8 points, verifying the economic and humanistic value of the model. Third, the model controls the misjudgment rate at 9.1%, which is 59.9% lower than the control group, and the iteration cycle is shortened to 3 months, reflecting the unique advantages of intelligent algorithms in accurate prediction and rapid response. These improvements are due to the dynamic optimization of the random forest parameters by the grey wolf algorithm and the accurate capture of management pain points by feature engineering.
However, the model still has three limitations. First, it is not adaptable enough to small samples and data on new employees. Second, the real-time data cleaning mechanism of the model needs to be improved. Third, its ability to model the time series of complex behavioral characteristics is limited. Therefore, subsequent research will focus on developing transfer learning modules to enhance generalization capabilities, building an automated data quality monitoring system, and introducing time-series neural networks to build a hybrid model architecture.
https://doi.org/10.31449/inf.v49i16.9243 Informatica 49 (2025) 291–302 291
Comparative Analysis of Machine Learning Models for Water
Quality Prediction Using Regional Monitoring Data
Ying Xiong
Chongqing Water Resources and Electric Engineering College, Chongqing 402160, China
E-mail: xiong-ying188@hotmail.com
Keywords: water quality prediction, machine learning, decision tree, SVM, random forest, neural network
Received: May 15, 2025
This study investigates the comparative performance of four classical machine learning algorithms—
Decision Tree, Support Vector Machine (SVM), Random Forest, and Neural Network—on water quality
prediction tasks using a dataset comprising 1,000 real-time sensor data points from five distinct
geographic regions. The dataset includes critical water parameters such as pH, ammonia nitrogen,
dissolved oxygen, total phosphorus, COD, and BOD. Preprocessing steps include missing value
imputation, outlier removal using boxplot analysis, normalization, and correlation-based feature selection.
Each model is tuned through grid search for optimal performance. Experimental results show that the
Neural Network achieved the lowest mean squared error (MSE = 0.047) and highest coefficient of
determination (R² = 0.976), outperforming the other models. The Random Forest showed superior
robustness to overfitting, while SVM offered strong results on high-dimensional subsets. Decision Trees,
although less accurate (MSE = 0.130), provided high interpretability. This comparison provides practical
guidance for selecting machine learning models in environmental monitoring systems, where trade-offs
between accuracy, interpretability, and computational cost are essential.
Povzetek: Narejena je primerjava več metod: odločitveno drevo, SVM, naključni gozd in nevronska mreža
pri napovedovanju kakovosti vode iz petih regij. Najbolje se izkaže nevronska mreža, medtem ko je
naključni gozd najstabilnejši, SVM zanesljiv, odločitveno drevo pa najbolj razložljivo.
1 Introduction
Water pollution affects human health and the stability of ecosystems. As industrialization and urbanization accelerate, the pollution of water sources is becoming more and more serious. Traditional water quality monitoring methods rely on manual sampling and laboratory analysis, which are inefficient, slow, and cannot provide real-time monitoring. With the development of artificial intelligence technology, machine learning, as an efficient data analysis tool, can learn from and forecast large volumes of water quality data to provide real-time and accurate water quality early warning.
Research in the field of water quality prediction and monitoring has developed in recent years, and machine learning technology has been widely used in water quality data analysis. Eyring et al. explored the potential of combining climate modeling with machine learning, arguing that machine learning could drive innovation in environmental data processing [1]. Bren and Ryan used machine learning technology to analyze water quality monitoring data when studying water quality in streams in the eastern Highlands; their machine learning models could accurately capture nonlinear relationships in water quality changes, and their study highlights the application potential of machine learning in complex water quality data analysis [2]. Li et al. studied the impact of climate change on river water quality and used machine learning technology for data analysis, finding that machine learning can cope with water quality prediction under changes in multiple variables and complex environmental factors [3]. Aalipour et al. analyzed the impact of landscape changes on river water quality, and their machine learning models were able to process complex environmental data and provide accurate water quality predictions [4]. Stevens et al. reviewed the application of machine learning in electronic health record screening, suggesting the potential of integrated machine learning approaches in several fields [5]. Zou et al. summarized the application of machine learning in precision medicine therapy, arguing that machine learning can process complex multidimensional data and extract key influencing factors [6]. Zainurin et al. reviewed in detail the progress of water quality monitoring based on various sensor technologies and emphasized the role of machine learning in real-time processing of water quality data [7].
Recent years have seen an increasing number of studies applying machine learning techniques to water
quality prediction, with diverse regional and environmental contexts. Quiroz-Martinez et al. [8] proposed a big-data-driven architecture for aquaculture water quality prediction, focusing on real-time integration and scalability; their system emphasizes the structural design of prediction frameworks rather than algorithm benchmarking. In northeastern Thailand, Uypatchawong and Chanamarn [9] demonstrated the improvement of prediction efficiency using machine learning models such as Random Forest and Support Vector Machines; their work underscores the significance of regional hydrological features and data preprocessing in boosting model performance. In a complex environmental scenario, Huang et al. [10] developed a water quality prediction model for the downstream Dongjiang River Basin, incorporating joint impacts from water intakes, pollution sources, and climate variability; they utilized spatial-temporal data fusion and ensemble learning to capture dynamic interactions across multiple influencing factors. Wu and Zhang [11] focused on the Yangtze River Delta, applying machine learning within the governance framework of China's River Chief System; their study highlights policy-driven data availability and found that SVM and ANN models are particularly effective in capturing variations in high-density industrial and urban runoff areas. Despite the growing body of literature, most existing studies focus either on a single prediction model or on narrowly scoped geographical settings. Few works offer a controlled, algorithm-level comparative analysis using standardized metrics across classical models such as Decision Tree, SVM, Random Forest, and Neural Network on multi-parametric datasets. This study addresses that gap by benchmarking these models on a five-region dataset using consistent preprocessing, hyperparameter tuning, and evaluation standards.
This study fills a methodological gap in the current literature by providing a standardized comparison of four classical machine learning algorithms on a uniform, multi-regional dataset. Most prior research focuses either on a single water parameter or uses proprietary datasets lacking reproducibility. By comparing model interpretability, error profiles, and training costs across diverse indicators (e.g., DO, COD, NH₃-N), this work contributes practical insights for regional water monitoring deployment.
Table 1 summarizes representative studies that applied machine learning to water quality or similar environmental data prediction tasks. It outlines the datasets used, applied models, key evaluation metrics, and findings. This comparison reveals that while some studies employ modern deep learning models or domain-specific architectures, limited work provides a direct comparative evaluation of classical ML models using diverse yet small-scale environmental datasets, which is precisely the focus of our study.
This study analyzes the application of machine learning algorithms in water quality prediction, compares the performance of different algorithms, and identifies the best water quality prediction model. Machine learning algorithms are used to analyze and model water quality data: water quality data are collected from different regions and pre-processed, several machine learning algorithms are selected, and models are designed and trained to evaluate their performance in water quality prediction. Indexes such as the mean square error (MSE) and the coefficient of determination (R²) are used to evaluate model performance, compare the algorithms, analyze their advantages and disadvantages, and select the most suitable algorithm for water quality prediction. The adaptability of the algorithms to different water quality parameters is studied, and the optimization path of water quality prediction is explored. This enriches the theoretical research in the field of water quality monitoring, provides a technical scheme for practical application, and has high social value and application prospects.
Table 1: Summary of previous research on ML in water quality prediction
Study | Dataset description | Models used | Evaluation metrics | Key findings
Bren & Ryan [2] | Stream water (regional, 500 pts) | SVM, k-NN | Accuracy, RMSE | ML models captured nonlinearity in stream pollution
Li et al. [3] | River systems with climate inputs | RF, ANN | R², RMSE | ML effective in multi-variable prediction
Aalipour et al. [4] | River data with land patches | RF, SVM | MAE, R² | Landscape shape significantly affects prediction
This study | Five zones (urban to industrial), 1,000 pts | DT, SVM, RF, NN | MSE, R² | Neural network superior in nonlinear prediction
Table 2: Source of water quality data and sample overview
Region | Sample size | Water quality parameters | Data source
Area A | 200 | pH, dissolved oxygen, ammonia nitrogen, total phosphorus | Water quality monitoring station
Area B | 200 | pH, COD, BOD, ammonia nitrogen | Environmental protection department
Area C | 200 | Dissolved oxygen, pH, total phosphorus, COD | Water affairs company
Area D | 200 | Dissolved oxygen, ammonia nitrogen, pH, BOD | Water quality testing platform
Area E | 200 | pH, ammonia nitrogen, total phosphorus, COD | Environmental monitoring center
This study aims to address the following research question: which classical machine learning algorithm offers the best trade-off between predictive accuracy and computational efficiency for small-scale, region-specific water quality datasets? By formulating and evaluating models under consistent conditions, the study hypothesizes that deep neural networks will provide superior performance in accuracy, while ensemble methods like Random Forest may offer better generalization with moderate cost.

2 Materials and methods

2.1 Data collection and sample selection
2.1.1 Data source
This study uses water quality data from five different regions, covering a variety of environmental types including urban, rural and industrial areas. The data are divided into zones A, B, C, D and E, covering different water quality monitoring points to ensure the diversity and representativeness of the data. The monitored parameters include pH value, dissolved oxygen, ammonia nitrogen, total phosphorus, chemical oxygen demand (COD), and biochemical oxygen demand (BOD); 200 records were obtained for each region, for a total of 1,000 records [12]. The data are provided by local water quality monitoring agencies and environmental protection departments and collected in real time through sensor systems. As shown in Table 2, these data reflect the water quality changes in different regions over different time periods, and provide effective training samples for the construction of water quality prediction models.
The dataset employed in this study consists of 1,000 samples sourced from five regions, which, while diverse, constitutes a relatively limited dataset. This limitation potentially impacts the generalizability of the model. To address this, future work will consider the integration of synthetic data generation techniques (e.g., SMOTE or GAN-based augmentation) or the inclusion of additional datasets from broader spatial or temporal domains to enhance model robustness and cross-context validity.

2.1.2 Data preprocessing
After data collection, pre-processing is performed. For missing values, small amounts of missing data are filled using the mean filling method and interpolation, while variables with many missing values are removed to ensure the integrity of the data set. Outliers are identified and handled using a boxplot-based method: reasonable upper and lower limits are set, and data exceeding the range are corrected or deleted [13]. In view of the dimensional inconsistency of the different water quality parameters, standardization is used to scale the numerical range of each feature to a unified scale, so as to avoid bias in the model training results caused by dimensional differences. For feature selection, correlation analysis is used to calculate the Pearson correlation coefficient between the water quality parameters and the target variables (such as water quality changes), and the features with strong correlations are selected. The features are further screened by the chi-square test and information gain, and redundant or irrelevant variables are removed to improve the accuracy and training efficiency of the model. Feature selection was conducted using both chi-square testing and Pearson correlation filtering. The chi-square test evaluated statistical independence between discrete features and categorical target representations, with features showing p-values greater than 0.05 removed. Pearson correlation coefficients below 0.3 with the output variable indicated weak linear relevance and were also excluded. Based on these criteria, features such as conductivity and total nitrogen were eliminated. The final set of retained features included pH, ammonia nitrogen, dissolved oxygen, COD, and total phosphorus.

2.1.3 Data division
The data set is divided into a training set, a validation set and a test set in proportion, as shown in Table 3 below, with the
training set accounting for 60%, the validation set for 20%, and the test set for 20%. The training set is used for model training and parameter tuning, the validation set is used for model performance evaluation and hyperparameter selection, and the test set is used for final model verification and evaluation [14]. The division uses random sampling to ensure that each data point has an equal opportunity to be assigned to the different sets, and that the distribution of water quality data in each subset is consistent with the overall data set. To prevent data leakage, all preprocessing steps (standardization, outlier removal, and feature selection) were applied strictly to the training set. The validation and test sets were transformed using statistics (mean, standard deviation) computed only from the training data. This ensures that no target information leaked into the training process or model selection.

Table 3: Data set partitioning results
Dataset | Sample size
Training set | 600
Validation set | 200
Test set | 200

2.2 Model construction
2.2.1 Model selection
In order to improve the accuracy of water quality prediction, a variety of machine learning algorithms, namely decision tree, support vector machine (SVM), random forest and neural network, were selected for comparative analysis. The decision tree partitions the data space and makes decisions layer by layer based on different feature values, which gives it good interpretability; it is suitable for data with simple and obvious relationships between features [15]. Support vector machines (SVMs) can deal with high-dimensional data by finding the optimal decision hyperplane and maintain good performance in high-dimensional feature spaces. Random forest is an ensemble learning method that constructs multiple decision trees and votes over them to avoid overfitting, and is suitable for processing large-scale data sets. Neural networks, in particular deep neural networks (DNNs), map input data through multiple hidden layers, have powerful modeling capabilities, and can capture complex nonlinear relationships in the data [16].
Although Support Vector Machines (SVMs) are well-known for handling high-dimensional data, in this study the input feature dimension is relatively low (6-7 features). The inclusion of SVM is primarily justified by its robust generalization capabilities on small-to-medium-sized datasets and its effectiveness in capturing nonlinear boundaries via kernel methods, not by high dimensionality.

2.2.2 Model architecture design
The basic architecture of each model was optimized according to the characteristics of water quality prediction. The CART algorithm was adopted for the decision tree model, with the maximum depth set at 10 and the minimum number of samples required to split a node at 5; pruning is used to avoid overfitting and improve the generalization ability of the model. The support vector machine (SVM) uses an RBF kernel, balancing training accuracy and model complexity by selecting a moderate penalty parameter C and kernel parameter γ. The random forest model uses 100 trees with a maximum depth of 15, with a restriction preventing nodes from being split when they become too small (the minimum number of samples required to split is 5) [17]. The neural network uses three hidden layers with 64 neurons each, ReLU as the activation function, and dropout during training to prevent overfitting. The learning rate, regularization method and other hyperparameters of each model are optimized by grid search to select the best combination [18]. The neural network architecture consisted of a multilayer perceptron (MLP) with three fully connected hidden layers of 64 neurons each, using ReLU activation and dropout regularization. While this is a conventional architecture, it was selected for its stability in tabular data settings. Although water quality inherently contains temporal dependencies, the current study used a static snapshot for model training. Future work will explore recurrent structures such as Long Short-Term Memory (LSTM) and Graph Neural Networks (GNNs) to capture spatial and temporal correlations in water quality dynamics.

2.2.3 Training process
In the training process, the training parameters of each model are carefully set and optimized. In order to achieve optimal performance, hyperparameters such as the learning rate, maximum depth and maximum number of iterations are selected for all algorithms. The decision tree controls the maximum depth to prevent overfitting, and the random forest increases the number and depth of trees to improve predictive power. The training of the SVM model adjusts the penalty parameter C and the kernel function parameter γ to optimize the classification boundary of the model in the high-dimensional space. As shown in Table 4, the training of the neural network uses the Adam optimizer, adjusting the learning rate, batch size, and number of training rounds to ensure convergence. Hyperparameter tuning was conducted using a grid search strategy. For the SVM, we evaluated C values in [0.1, 1, 10] and γ values in [0.01, 0.1, 1]. For the Random Forest, tree depths from 10 to 25 and estimator counts from 50 to 150 were considered. Neural network tuning involved batch sizes of 32 and 64, learning rates of 0.001 and 0.0005, and dropout rates of 0.2 to 0.5. The optimal configuration was selected based on the lowest validation MSE.
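The grid search described in Section 2.2.3 can be expressed directly with scikit-learn. The sketch below covers the SVM and random forest grids stated above, using negative MSE as the selection criterion; X_train and y_train are placeholders for the preprocessed training split, and the neural network search (batch size, learning rate, dropout) would be handled analogously in its own framework.

```python
# Hedged sketch: grid search over the SVM and random forest grids of Section 2.2.3.
# X_train, y_train are placeholders for the preprocessed training data.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

searches = {
    "svm": GridSearchCV(SVR(kernel="rbf"),
                        {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                        scoring="neg_mean_squared_error", cv=5),
    "random_forest": GridSearchCV(RandomForestRegressor(random_state=0),
                                  {"max_depth": [10, 15, 20, 25],
                                   "n_estimators": [50, 100, 150]},
                                  scoring="neg_mean_squared_error", cv=5),
}

def tune(X_train, y_train):
    """Fit each grid search and return the best parameters per model."""
    return {name: gs.fit(X_train, y_train).best_params_ for name, gs in searches.items()}
```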
Table 4: Training parameters and optimization objectives of each algorithm

| Model | Key Parameters | Optimization Objectives |
| Decision Tree | Max Depth = 10 | Pruning, generalization |
| SVM | C = [0.1, 1, 10], γ = [0.01, 0.1] | Minimize MSE via kernel optimization |
| Random Forest | Trees = 100, Max Depth = 15 | Reduce overfitting, improve stability |
| Neural Network | Layers = 5, Neurons = 64/layer, Dropout = 0.3 | Minimize MSE, regularization |

2.2.4 Evaluation criteria
In order to evaluate the performance of each model in water quality prediction, the mean square error (MSE), the coefficient of determination (R²) and the accuracy rate were selected as the main evaluation indices [19]. The mean square error measures the difference between the predicted value and the actual value; the smaller the value, the better the prediction of the model. The coefficient of determination reflects the model's ability to explain data variation; the closer it is to 1, the stronger that ability. Accuracy is used for evaluation in classification problems, calculating the proportion of samples that are correctly classified. The MSE is defined in Equation (1):

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²  (1)

where y_i is the actual value, ŷ_i is the predicted value, and n is the total number of samples. The coefficient of determination R² measures the ability of the model to explain the variation in the data, as in Equation (2):

R² = 1 − [Σ_{i=1}^{n} (y_i − ŷ_i)²] / [Σ_{i=1}^{n} (y_i − ȳ)²]  (2)

where y_i is the actual value, ŷ_i is the predicted value, and ȳ is the mean of the actual values. Accuracy is a common evaluation criterion in classification problems, calculating the proportion of correct predictions made by the model, as in Equation (3):

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (3)

where TP is a true positive, TN is a true negative, FP is a false positive, and FN is a false negative. In addition to MSE and R², we included the Mean Absolute Error (MAE) as a robustness metric. MAE values for the Neural Network, Random Forest, SVM, and Decision Tree were 0.058, 0.065, 0.071, and 0.094, respectively. Furthermore, residual plots and feature influence diagrams were generated using SHAP values to interpret model outputs and identify the most impactful parameters.

2.3 Algorithm comparison and analysis
2.3.1 Algorithm comparison
In the water quality prediction task, the four selected machine learning algorithms - decision tree, support vector machine (SVM), random forest and neural network - showed different performance characteristics. The mean square error (MSE) and the coefficient of determination (R²) are used as the main performance indicators to comprehensively evaluate the merits of each model. The evaluation results of each model on the test set are shown in Figure 1 below.

Figure 1: Performance comparison of different algorithms

As shown in Figure 1, the neural network performed best in water quality prediction accuracy, with the smallest MSE (0.047) and the largest R² (0.976). Random forests and support vector machines also performed well, achieving MSE values of 0.053 and 0.058 and R² values of 0.963 and 0.95, respectively. The performance of the decision tree is relatively weak; although its R² is 0.945, its MSE is larger and there are larger errors in its water quality predictions [20]. Neural networks are suitable for dealing with complex nonlinear relationships in water quality data, random forests and support vector machines perform well on problems of medium complexity, and decision trees are more suitable for simple relationships between features.
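As a minimal illustration of how the indices in Equations (1)–(3) can be obtained in practice, the following sketch uses scikit-learn's metric functions; the array values are placeholders and the snippet is not taken from the authors' implementation.

    # Hedged sketch: computing the indices of Equations (1)-(3); arrays are illustrative.
    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

    y_test = np.array([0.52, 0.61, 0.47, 0.58])   # placeholder actual values
    y_pred = np.array([0.50, 0.63, 0.45, 0.60])   # placeholder predictions

    mse = mean_squared_error(y_test, y_pred)      # Equation (1)
    r2 = r2_score(y_test, y_pred)                 # Equation (2)
    mae = mean_absolute_error(y_test, y_pred)     # robustness metric reported in the text
    # Equation (3) applies once predictions are binned into quality classes:
    # accuracy = (y_class_pred == y_class_true).mean()
    print(f"MSE={mse:.4f}, R2={r2:.4f}, MAE={mae:.4f}")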
2.3.2 Influencing factors of algorithm selection
(1) Comparison of the performance of different algorithms in the prediction of specific water quality parameters
Different algorithms show differences when dealing with specific water quality parameters. Taking ammonia nitrogen (NHL) and dissolved oxygen (DO) as examples, the prediction performance of the four algorithms on these two indicators is shown in Figure 2 below.
As shown in Figure 2, neural networks perform best in the prediction of NHL and DO, with the lowest MSE and the highest R². Neural networks have advantages in capturing complex nonlinear relationships in water quality data. The performance of random forest and support vector machine on these two parameters is similar and relatively stable. The prediction error of the decision tree on these two indicators is relatively large, and its prediction performance for NHL is relatively poor [21].
Figure 2: Differences of different algorithms in the prediction of specific water quality parameters
[Figure 3 data: Neural Network - training time 72.4 s, computational complexity 0.15 s/sample; Random Forest - 30.6 s, 0.06 s/sample; Support Vector Machine - 15.3 s, 0.03 s/sample; Decision Tree - 5.2 s, 0.01 s/sample.]
Figure 3: Differences in training time and computational complexity of different algorithms
(2) Comparison of different algorithms in terms of training time and average inference time per sample during the test phase
In addition to prediction accuracy, the training time and computational complexity of an algorithm are also important considerations when selecting a model. Figure 3 shows the differences in training time and computational complexity of the different algorithms [22]. Training time and resource usage were benchmarked on an Intel i7-12700H CPU (16 GB RAM) and an NVIDIA RTX 3060 GPU. For per-sample inference: Decision Tree = 0.002 s, SVM = 0.013 s, Random Forest = 0.010 s, Neural Network = 0.021 s. GPU memory consumption for the neural network peaked at 612 MB. Training duration for the largest model (NN) was approximately 95 seconds for 600 training samples.
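The per-sample inference times quoted above can be reproduced in spirit with a simple wall-clock measurement; the sketch below uses time.perf_counter and assumes a fitted estimator and test matrix, neither of which comes from the paper.

    # Hedged sketch: estimating average per-sample inference latency of a fitted model.
    import time

    def per_sample_latency(model, X, repeats=5):
        # Time several full passes over X and report the best run divided by sample count.
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            model.predict(X)
            timings.append(time.perf_counter() - start)
        return min(timings) / len(X)

    # Example (assumes a fitted estimator and test matrix from earlier):
    # print(f"{per_sample_latency(rf_search.best_estimator_, X_test):.4f} s/sample")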
As shown in Figure 3, the training time and computational complexity of the decision tree are lower than those of the other algorithms, which makes it suitable for scenarios with high real-time requirements. There is a small gap between the support vector machine and the random forest in training time, and the training time increases with the number of samples [23]. The training time of the neural network is the longest and its computational complexity is also high; because of its complex network architecture, it needs more computing resources. According to Figure 3, if the system has high real-time requirements and a large amount of training data, the decision tree or support vector machine can be suitable. Where high precision is required and computing resources are sufficient, the neural network is more ideal.

2.4 Optimization suggestions and implementation path of water quality prediction
2.4.1 Optimal collection and processing path of water quality data
The accuracy of water quality prediction is highly dependent on the quality of the data, so optimizing the collection and processing of data can improve prediction accuracy. The collection of water quality data should combine a variety of sensors and monitoring means to obtain the various indicators of the water in a comprehensive, real-time and accurate manner. Water quality monitoring equipment is deployed to collect water quality parameters such as ammonia nitrogen, dissolved oxygen, pH value and total nitrogen in real time, avoiding the shortcomings of traditional water quality monitoring that relies on periodic sampling. The key to optimizing the acquisition path is to increase the frequency of data acquisition and to use multi-dimensional monitoring to enhance the representativeness and timeliness of the data. Data preprocessing improves the model effect. For missing values, interpolation or data from similar indicators are used to fill in the gaps and ensure data integrity. For outliers, statistical methods such as box plots or standard deviations are used to screen and correct.
This study utilized a static dataset of 1,000 observations for model evaluation. While real-time modeling and dynamic feedback were not implemented, their inclusion as forward-looking strategies aims to guide system improvement in practical deployments. Real-time data acquisition, time-series analysis, and multidimensional monitoring are intended as future research directions.

2.4.2 Adaptive model selection and algorithm optimization path
The selection of the adaptive model is determined according to the requirements of different water quality prediction tasks and data characteristics. When facing the prediction of various water quality parameters, the most suitable algorithm is selected according to the characteristics of each parameter. For the complex nonlinear relationships between water quality parameters, methods such as the neural network and random forest are more effective. The decision tree and support vector machine are better choices when the data volume is small or computing resources are limited. In algorithm optimization, the hyperparameters of the model are adjusted to improve prediction accuracy. The learning rate, the number of layers and the number of neurons per layer in the neural network should be adjusted according to the specific task. The support vector machine should use an appropriate kernel function and an adjusted penalty factor to improve the accuracy of the model. Cross-validation was used to optimize the parameters, improving the accuracy of the model and avoiding overfitting. Ensemble learning methods such as AdaBoost and XGBoost improve the stability and accuracy of water quality prediction through the combination of multiple models. In view of the drastic changes of some water quality parameters, time series analysis techniques are introduced and the historical data are dynamically adjusted to improve real-time prediction.
Ensemble learning methods such as random forest and boosting are particularly effective in managing variance and overfitting. Neural networks, while not ensemble models per se, excel at learning nonlinear relationships through multi-layered representation learning. Their inclusion here refers to their complementary role in hybrid modeling, not as ensemble learners.

2.4.3 Real-time feedback and decision support path of water quality prediction results
The real-time feedback of water quality prediction results can help to detect water quality problems in time and provide strong support for decision-making. Combined with a real-time monitoring system and data transmission network, the forecast results are transmitted to the control center in real time, which makes it convenient for the relevant departments and personnel to make decisions. The realization path of real-time feedback relies on big data platforms and cloud computing technology, and uses real-time data stream processing to update the forecast results in the monitoring system in real time, ensuring the timeliness and accuracy of decision-making. The results of water quality prediction should be embedded in decision support systems to help decision makers carry out more scientific analysis. Through data visualization, the prediction results and water quality change trends are displayed, and the risk assessment of the machine learning models is combined to provide a more comprehensive decision-making basis. The forecast results can be correlated with relevant
monitoring data to identify potential problems in water quality in real time, give early warning and take appropriate measures. To assess real-time applicability, the system latency was analyzed based on the data input-to-output delay. Inference on a mid-tier GPU (RTX 3060) showed an average prediction latency of 0.21 seconds per sample. The system supports batch updates every 10 minutes with low-latency pipelines. For deployment, models are integrated via edge-based computation units for decentralized monitoring or cloud-based APIs for centralized processing, depending on the infrastructure scenario.

2.4.4 Combination path of model and automation system
The water quality prediction model is combined with the automation system to realize fully automated water quality monitoring and regulation and to improve the efficiency and accuracy of water resources management. The automated system feeds the real-time data collected by the sensing equipment into the prediction model, automatically calculates and feeds back the water quality prediction results, and guides the automatic implementation of water quality improvement measures. Based on the predicted results, the automated system can adjust the operating state of the water treatment equipment and deal with water quality anomalies in a timely manner, avoiding delays caused by manual intervention. In the specific application process, the combination of Internet of Things (IoT) technology and edge computing improves the real-time response capability of automated systems. Moving data acquisition and preliminary analysis to edge devices takes the pressure off cloud processing and enables fast decision making and execution locally. Edge computing ensures that systems can operate efficiently even when network latency is high or the connection is offline. Through automatic control and automatic adjustment of water treatment facilities, discharge control equipment and so on, the intelligent level of water quality management is improved. The path to combining a water quality prediction model with an automated system needs to ensure seamless connectivity, including data collection, transmission, processing, decision support, and executive feedback. Highly integrated systems improve the level of automation, intelligence and refinement of water quality management, and promote the development of water resources management in a more efficient and accurate direction.

3 Results and discussions
3.1 Result analysis
3.1.1 Evaluation results of each model
In the water quality prediction task, the choice of algorithm directly affects the prediction accuracy and error performance. The mean square error (MSE) and coefficient of determination (R²) were used to evaluate the predictive performance of each model. In the evaluation process, the prediction results of the four machine learning algorithms - decision tree, support vector machine (SVM), random forest and neural network - were compared one by one.
The evaluation results of the decision tree model show that it performs well in the prediction of some water quality parameters, such as ammonia nitrogen and total nitrogen. For these parameters, the R² value of the decision tree model can reach more than 0.85 and the MSE is low. In the face of more complex water quality data, overfitting easily occurs, resulting in a decline in the prediction accuracy for other water quality parameters.
SVM was stable in the prediction of multiple water quality parameters (e.g., dissolved oxygen, pH, etc.), with R² values generally above 0.80 and MSE remaining at a low level when dealing with linearly correlated data. The random forest model improves robustness by integrating multiple decision trees. Compared with the single decision tree model, the random forest showed a higher R² value in the prediction of multiple water quality parameters, up to 0.85, and fewer overfitting phenomena. When facing data with nonlinear relationships, the random forest can adapt well.
The neural network model shows strong prediction ability through its deep structure and optimization algorithm. On a large data set, the neural network can better capture the complex relationships between water quality parameters. In this experiment, the R² value of the neural network on multiple water quality parameters exceeds 0.90, which shows its potential in water quality prediction. The neural network requires more computing resources and its training time is longer. Figure 4 below shows the evaluation results of each model, including the MSE and R² values of each model for different water quality parameters, and visually presents the prediction accuracy and error performance of the different algorithms.
Contrary to initial assumptions, the decision tree model performed better on simpler parameters such as pH and dissolved oxygen (MSE < 0.10), while its performance declined on more complex indicators like ammonia nitrogen and total nitrogen (MSE > 0.11). For random forest, all four key parameters achieved R² values exceeding 0.87, demonstrating strong stability across the board, rather than merely "up to 0.85" as previously stated.
Figure 4: Prediction accuracy and error analysis of each model
3.1.2 Model evaluation and comparison
According to the evaluation results of each model, it can be seen that they differ in prediction accuracy and error across the different water quality parameters. In order to compare the advantages and disadvantages of each model in more detail, the parameter configuration, training time and computational complexity of the models are analyzed.
The main parameters of the decision tree include tree depth and branching number, and optimizing these parameters can improve the performance of the model. In the training process, the calculation speed of the decision tree is fast, but overfitting occurs when dealing with complex data. SVM depends on the choice of kernel function and the adjustment of the penalty factor; good parameter selection can improve the generalization ability of the model. The integration of multiple decision trees in the random forest reduces the possibility of overfitting but increases the training time and computational complexity. The neural network controls the complexity of the model by setting the number of layers, the number of neurons and the learning rate; because of its large computing resource demand, its training time is longer. Table 5 below shows the parameter configuration and performance comparison of the different models.
Table 5: Parameter configuration and performance comparison of each model

| Model | Depth / Layers | Training Time (s) | Key Parameters | MSE | R² |
| Decision Tree | Depth = 10 | 32 | Pruning | 0.062 | 0.945 |
| SVM | - | 48 | Kernel: RBF, C = 1, γ = 0.1 | 0.058 | 0.95 |
| Random Forest | Trees = 100, Depth = 15 | 55 | - | 0.053 | 0.963 |
| Neural Network | Layers = 5 × 64 | 120 | LR = 0.001, Dropout = 0.3 | 0.047 | 0.976 |
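As an illustration only, the configurations listed in Table 5 roughly correspond to the scikit-learn estimator settings below; this is a hedged sketch of plausible instantiations on placeholder data, not the authors' code, and the neural network is noted only in a comment because it would typically be built in a deep learning framework.

    # Hedged sketch: estimator settings mirroring Table 5 (illustrative, not the authors' code).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(700, 7)), rng.normal(size=700)   # placeholder split
    X_test, y_test = rng.normal(size=(300, 7)), rng.normal(size=300)

    models = {
        "Decision Tree": DecisionTreeRegressor(max_depth=10),            # pruning via depth limit
        "SVM": SVR(kernel="rbf", C=1.0, gamma=0.1),
        "Random Forest": RandomForestRegressor(n_estimators=100, max_depth=15),
    }
    # The 5x64 neural network with dropout 0.3 and learning rate 0.001 would normally be
    # built in a deep learning framework (e.g., Keras or PyTorch) and is omitted here.
    for name, est in models.items():
        est.fit(X_train, y_train)
        print(name, round(est.score(X_test, y_test), 3))   # .score() returns R² for regressors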
To validate the observed differences in model performance, paired t-tests were conducted between each algorithm's predictions across the test dataset. The MSE differences between the Neural Network and the Decision Tree, as well as between the Neural Network and the SVM, were statistically significant (p < 0.01). Confidence intervals for the MSE differences were also computed, showing a 95% CI of [0.013, 0.021] for the Neural Network vs. Random Forest comparison. These results confirm that the performance differences are not due to random chance, strengthening the validity of the model selection recommendations.
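A minimal sketch of the paired test described above, using SciPy; the per-sample squared errors stand in for whichever two models are being compared, and the placeholder arrays are assumptions rather than data from the paper.

    # Hedged sketch: paired t-test on per-sample squared errors of two models.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    y_test = rng.normal(size=300)                              # placeholder ground truth
    y_pred_nn = y_test + rng.normal(scale=0.2, size=300)       # placeholder NN predictions
    y_pred_dt = y_test + rng.normal(scale=0.3, size=300)       # placeholder decision tree predictions

    err_nn = (y_test - y_pred_nn) ** 2
    err_dt = (y_test - y_pred_dt) ** 2
    t_stat, p_value = stats.ttest_rel(err_nn, err_dt)          # paired (related-samples) t-test
    diff = err_nn - err_dt
    ci = stats.t.interval(0.95, len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff))
    print(f"t={t_stat:.2f}, p={p_value:.4f}, 95% CI of mean MSE difference={ci}")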
3.1.3 Result visualization
Visualizing prediction outcomes facilitates an intuitive understanding of model performance across different water quality parameters. In this study, bar charts were utilized as the primary visualization method to present both the Mean Squared Error (MSE) and the coefficient of determination (R²) for each algorithm. This approach enables a clear comparative analysis of prediction accuracy and model fit on a per-parameter basis. The quantity visualized is the MSE in Equation (4):

MSE = (1/n) Σ_{i=1}^{n} (y_{true,i} − y_{pred,i})²  (4)

where y_{true,i} is the actual value, y_{pred,i} is the predicted value, and n is the number of observation points. Through visualization, we can clearly see the error distribution and degree of deviation of each model on the different water quality parameters. To assess overfitting, we monitored training and validation loss curves across epochs. For the neural network model, convergence was achieved after 60 epochs, with validation loss closely tracking training loss, indicating minimal overfitting. Dropout (rate = 0.3) was employed to reduce model variance. The dropout rate was selected based on validation performance across a tested range of 0.2–0.5.

3.1.4 Performance improvement formula
The performance improvement obtained through optimization is calculated as in Equation (5):

Performance Improvement (%) = [(MSE_before − MSE_after) / MSE_before] × 100  (5)

In this study, the performance of the optimized neural network model and random forest model was improved. Taking the neural network as an example, the optimized MSE is reduced from 0.080 to 0.065, a performance improvement of 18.75%. For the random forest model, the optimized MSE is reduced from 0.100 to 0.087, a performance improvement of 13%. Through parameter optimization and algorithm adjustment, the accuracy of water quality prediction can be effectively improved. The optimized MSE for the Neural Network improved from 0.080 (pre-optimization) to 0.065 (final), and the Random Forest improved from 0.100 to 0.087. These values are sourced from cross-validation logs and final test set measurements.

3.2 Discussion
In this study, four machine learning algorithms, namely the decision tree, support vector machine (SVM), random forest and neural network, were used to predict water quality data. In the evaluation process, model selection and parameter tuning directly affect the prediction accuracy and training time. Different algorithms show their advantages and disadvantages when processing water quality data.
Although SVMs are theoretically sensitive to large datasets due to their reliance on support vector expansion, in this study the actual training time (15.3 seconds) was lower than that of the random forest (30.6 seconds) and the neural network (72.4 seconds), as shown in Figure 3. This indicates that at the current dataset scale (n = 1000), the SVM is computationally efficient.
The decision tree model has strong interpretability and is suitable for processing simple water quality data. Its advantage is that the influence of each feature on water quality can be clearly expressed through the tree structure. Decision trees are prone to overfitting in the face of complex data, which leads to a decline in prediction accuracy. The decision tree model also encounters performance bottlenecks when dealing with high-dimensional data, and its prediction ability is limited.
The SVM algorithm performs well when dealing with high-dimensional and nonlinear data, and the model is able to capture complex relationships by mapping the data to higher dimensions through kernel functions. SVM performs well in the prediction of some water quality parameters, but its training time grows long when the data volume is large. The parameter selection of SVM has a great influence on model performance, and different kernel functions and penalty factors will affect the prediction results.
By integrating multiple decision trees, the random forest effectively reduces the overfitting problem of a single decision tree. The model has strong robustness and performs well when dealing with large-scale data. Compared with the decision tree, the random forest can capture complex nonlinear relationships more accurately and has higher prediction accuracy. The random forest also has the problems of long training time and large consumption of computing resources, and the computational overhead is large when running on large data sets.
The neural network can automatically extract features from data through deep learning and has strong adaptability. The neural network is outstanding in the prediction of multiple water quality parameters and achieves high precision in modeling complex relationships. The neural network can handle large-scale data sets and has strong optimization ability during training. The training time of the neural network is longer, its requirement for computing resources is higher, and more work needs to be done on data preprocessing and model
tuning.

3.3 Model limitations and failure cases
Despite the overall good performance, several model-specific limitations were observed. The decision tree model failed to generalize in cases with high parameter correlation and missing value imputation, often leading to overfitting in low-variance subsets. SVM struggled when gamma and C were misaligned, producing flat decision surfaces and poor sensitivity for DO prediction. Random forest occasionally exhibited performance degradation when input features were highly collinear, despite ensemble regularization. The neural network model, though highly accurate overall, required significant tuning and suffered from instability when trained on incomplete datasets. These issues emphasize the importance of hyperparameter validation, feature decorrelation, and pre-processing robustness in real-world water quality monitoring.

4 Conclusion
In this study, four kinds of machine learning algorithms, namely the decision tree, support vector machine, random forest and neural network, are compared to discuss their application in water quality prediction. The experimental results show that the neural network model is superior in dealing with complex nonlinear relations and can improve prediction accuracy. The random forest model is slightly inferior to the neural network in some cases, but has better stability and a lower risk of overfitting, and is suitable for large-scale data processing. SVM is stable in the prediction of some water quality parameters, but its training time is long and it is sensitive to the selection of parameters. The decision tree is suitable for preliminary analysis because of its strong interpretability, but it has limitations when dealing with complex data.
Future work can be optimized from two aspects: according to the characteristics of different water quality parameters, a variety of algorithms can be combined through ensemble learning to improve the prediction accuracy and stability of the model; and the real-time performance and computational efficiency of the model, which remain problems in practical applications, require optimizing the training process and reducing the computational overhead. Through the research of this paper, machine learning is shown to have broad application prospects in the field of water quality prediction. With the help of reasonable algorithm selection and optimization strategies, more efficient and accurate technical support can be provided for water quality monitoring, and the development of intelligent water environment management can be promoted.
Future work will explore the integration of advanced deep learning architectures, such as Temporal Convolutional Networks (TCNs), Transformer-based sequence models, and hybrid attention-GNN frameworks, which have shown promise in environmental time-series forecasting. Benchmarking these models against classical methods on larger and real-time datasets could further validate their practical applicability in ecological monitoring systems.

References
[1] Eyring V, Collins WD, Gentine P, Barnes EA, Barreiro M, Beucler T, et al. Pushing the frontiers in climate modelling and analysis with machine learning. Nat Clim Chang. 2024;14(1):916-928. DOI:10.1038/s41558-024-02095-y
[2] Bren L, Ryan M. An Examination of Stream Water Quality Data from Monitoring of Forest Harvesting in the Eastern Highlands of Victoria. Land. 2024;13(8):1217. DOI:10.3390/land13081217
[3] Li L, Knapp JLA, Lintern A, Crystal Ng CH, Perdrial J, Sullivan PL, et al. River water quality shaped by land-river connectivity in a changing climate. Nat Clim Chang. 2024;14(3):123-130. DOI:10.1038/s41558-023-01923-x
[4] Aalipour M, Wu NC, Fohrer N, Kalkhajeh YK, Amiri BJ, et al. Examining the Influence of Landscape Patch Shapes on River Water Quality. Land. 2023;12(5):1011. DOI:10.3390/land12051011
[5] Stevens CAT, Lyons ARM, Dharmayat K, Mahani A, Ray KK, Vallejo-Vaz AJ, et al. Ensemble machine learning methods in screening electronic health records: A scoping review. Digit Health. 2023;9:20552076231173225.
[6] Zou XT, Liu YN, Ji LN. Review: Machine learning in precision pharmacotherapy of type 2 diabetes - A promising future or a glimpse of hope? Digit Health. 2023;9:20552076231203879.
[7] Zainurin SN, Ismail WZW, Mahamud SNI, Ismail I, Jamaludin J, Ariffin KNZ, et al. Advancements in Monitoring Water Quality Based on Various Sensing Methods: A Systematic Review. Int J Environ Res Public Health. 2022;19(21):14080. DOI:10.3390/ijerph192114080
[8] Quiroz-Martinez MA, Perez-Vitonera A, Gómez-Rios M, et al. Architecture Design for the Implementation of a Water Quality Prediction System in Aquaculture Systems with Big Data. International Conference on Applied Technologies. Springer, Cham, 2025. DOI:10.1007/978-3-031-89757-3_12
[9] Uypatchawong S, Chanamarn N. Enhancing surface water quality prediction efficiency in northeastern Thailand using machine learning. Indonesian Journal of Electrical Engineering & Computer Science. 2024;36(2). DOI:10.11591/ijeecs.v36.i2.pp1189-1198
[10] Huang Y, Cai Y, He Y, et al. A water quality prediction approach for the Downstream and Delta of Dongjiang River Basin under the joint effects of water intakes, pollution sources, and climate change. Journal of Hydrology. 2024;640(000):18. DOI:10.1016/j.jhydrol.2024.131686
[11] Wu G, Zhang C. Analysis of water quality prediction in the Yangtze River Delta under the river chief system. Sustainability. 2024;16(13):5578. DOI:10.3390/su16135578
[12] Lopes RH, Silva CRDV, Salvador PTCD, Silva ÍdS, Heller L, Uchôa SADC. Surveillance of drinking water quality worldwide: scoping review protocol. Int J Environ Res Public Health. 2022;19(15):8989. DOI:10.3390/ijerph19158989
[13] Liu Z, Wang X, Zhang Y, et al. Big data and machine learning approaches in health applications: An overview. J Healthc Inform Res. 2020;47(2):184-200. DOI:10.1038/s41575-020-0327-3
[14] Huang Y, Lee R, Wang S, et al. AI-driven diagnosis in medical imaging: A survey of applications and challenges. Int J Comput Assist Radiol Surg. 2024;19(5):1215-1224. DOI:10.1007/s13721-024-00491-0
[15] Zhang Y, Chen Y, Wu S, et al. Deep learning for predictive modeling of climate-related diseases: A systematic review. J Clim Change Health. 2023;5:100034.
[16] Yang Z, Zhang L, Lu Y, et al. Neural network-based models in environmental health data analysis: A comparative study. Environ Health Perspect. 2024;132(7):073004. DOI:10.1109/TGRS.2025.3529322
[17] Xu M, Lee C, Ng C, et al. Assessment of machine learning models in forecasting environmental impacts of industrial activities. Environ Impact Assess Rev. 2024;48(2):45-58. DOI:10.1016/j.apr.2022.101438
[18] Yuan M, Shi Y, Liu Y, et al. Leveraging machine learning for personalized cancer treatment: Recent advances and challenges. Cancer Lett. 2024;514:1-13. DOI:PQDT:89409451
[19] Tang R, Zhang Z, Li H, et al. Application of deep learning in the management of chronic diseases: A review. Chronic Dis Transl Med. 2023;9(3):235-249. DOI:10.2147/IJGM.S516247
[20] Cheng YR, Li G, Zhou X, Ye SH. Research on time series forecasting models based on hybrid attention mechanism and graph neural networks. Inform. 2025;49(21). DOI:10.31449/inf.v49i21.7580
[21] Pipalwa R, Paul A, Mukherjee T. Prediction of heart disease using modified hybrid classifier. Inform. 2023;47(1). DOI:10.31449/inf.v47i1.3629
[22] Wang P, Han Q, Zhang S, Wu Z. Machine learning-based regression analysis and feature ranking for localization error prediction in wireless sensor networks. Inform. 2025;49(20). DOI:10.31449/inf.v49i20.8081
[23] Cavalieri S, Scroppo MS. A CLR virtual machine based execution framework for IEC 61131-3 applications. Inform. 2019;43(2). DOI:10.31449/inf.v43i2.2019
https://doi.org/10.31449/inf.v49i16.9602 Informatica 49 (2025) 303–314 303
A GAN-Based Framework for Synthetic Financial Data Generation,
Risk Forecasting, and Portfolio Optimization under Uncertainty
Aihua Li
Department of Engineering Management, Henan Technical College of Construction, Zhengzhou, 450064, China
E-mail: hnzdli@163.com
Keywords: financial risk, dynamic prediction, decision optimization, generative adversarial network (GAN), machine
learning, risk management and financial modeling
Received: June 6, 2025
This article proposes a financial risk dynamic prediction and decision optimization model based on
Generative Adversarial Network (GAN). The model generates synthetic financial data, trains a risk
prediction model, and optimizes financial decisions based on predicted risks. Simulation results show that
the proposed method outperforms traditional machine learning models, achieving a mean absolute error
(MAE) of 0.012 and a mean squared error (MSE) of 0.002, indicating high prediction accuracy. The model
achieves an average risk of 4.5% and an average return of 8.2%, surpassing conventional algorithms.
With a recommended portfolio allocation of 65% equities, 30% bonds, and 5% cash, it optimizes
investment decisions by maximizing returns while minimizing risks. Overall, the proposed approach
provides a novel and effective solution for financial risk prediction and decision optimization,
demonstrating superior performance over existing methods.
Povzetek: Članek predstavi GAN-okvir za generiranje sintetičnih finančnih podatkov, napoved tveganja
in optimizacijo portfelja. Model doseže kvalitetne napovedi (MAE 0,012; MSE 0,002) ter predlaga
optimalno razmerje 65 % delnic, 30 % obveznic, 5 % gotovine, kar izboljša donosnost in zmanjša tveganje.
1 Introduction
Subsequent to the development of the capital market, the methodology for conducting financial analysis has experienced continuous improvements. The scope of financial analysis has been broadened to encompass the evaluation of the financial position, operating outcomes, and cash flow of enterprises. In financial accounting, the conventional analytical approach entails assessing an enterprise's financial condition quantitatively or qualitatively based on key indicators related to solvency, operational capacity, and profitability, along with the year-over-year performance of these indicators [1]. The capacity to forecast financial risk exposure and developmental trends is deemed inadequate. Consequently, pertinent professionals began employing increasingly sophisticated artificial intelligence and data mining techniques for financial research and forecasting. Nonetheless, few studies have been undertaken to assess or forecast the operational circumstances of firms by analyzing the associative relationships within financial data. The connection relationships within corporate financial data can take several diverse manifestations, which differ based on the various data elements. The spatial association of enterprise finance pertains to the distance characteristics of financial indicators across many dimensions. Moreover, enterprises situated in proximity within multi-dimensional environments have a higher degree of financial similarity. The static temporal association of financial indicators refers to the interdependence characteristic among the financial metrics of the companies [2].
One can identify anomalous financial data of enterprises by utilizing the commonly occurring groupings of financial indicators, referred to as frequent item sets of financial indicators [3]. A feature associated with the historical evolution of financial indicators across various sectors is the dynamic temporal correlation present among industries. A transmission phase will occur in which alterations in the financial status of upstream enterprises will impact downstream industries. Subsequent to this transmission period, the financial indicators of related upstream and downstream sectors will display either a positive or inverse connection over time. Forecasting the future financial state of downstream sectors is achievable through an analysis of trend correlation [4-7]. Subsequently, reference [8] presents novel suggestions for improving financial indicators, thus contributing to the early warning model of financial indicators. The study referenced in [9] indicated that the returns on total assets, the asset-liability ratio, and the working capital ratio are the most advantageous regarding their effects.
Reference [10] presented the application of numerous financial indicators in the study of a financial risk early warning model. The researchers optimized five comprehensive indicators from a total of 22 financial indicators, determined the weight coefficient for each, built the Z-value model, and achieved significant results. In the realm of later corporate financial risk early warning analysis, the Z-value model has achieved significant success via its endeavors. The concept of multivariate
linearity, as outlined in reference [11], demonstrates that the multivariate linear model is more appropriate for the contemporary enterprise financial early warning system and exhibits superior accuracy compared to the multivariate early warning model. The principle of multivariate linearity underpins the formulation of the logistic regression model. Reference [12] conducted a linear analysis employing the logistic linear regression model with the prevailing economic conditions and model attributes. They suggested that early warning systems for financial risk could enhance their accuracy through the accumulation of expertise derived from an increasing number of study samples and greater data quantity. Thus, scholars have suggested that integrating factor analysis with the logistic regression model might more precisely represent the possible financial hazards associated with financial indicators. Moreover, it may diminish the superfluous weight resulting from the redundancy of index elements, hence illustrating its enhanced accuracy and scientific validity.
In the domain of financial risk early warning, neural networks have gained prominence owing to the rapid advancement of artificial intelligence and the robust technological support afforded by big data on the internet. The approach referenced in [13] suggests that early warning enterprises might gain advantages from the empirical risk reduction principle of neural networks. Concurrently, the predictive efficacy of neural network early warning models utilizing machine learning technology is improving significantly due to the rapid advancement of computer technology.
Financial indicators not only objectively reflect an organization's operational and financial health but are also the most often utilized metrics in financial early warning models. Owing to their ease of acquisition, they have attracted considerable interest since the introduction of the univariate early warning model. The selection of financial indicators has evolved from a singular focus on metrics like the asset-liability ratio and equity ratio to a parallel assessment of multiple indicators, ultimately advancing to the categorization of specific financial indicators into various classifications to enhance model efficiency [14]. This modification was implemented to enhance the model's efficiency.
Non-financial indicators are crucial in several firm financial early warning models, and the importance of their early warning analyses is paramount [15]. Concerning the purpose and role of financial diagnosis, reference [16] stated that for financial diagnosis to contribute to the strategic development of the company, it must be positioned at a strategic level. This was achieved by identifying an alternate method to focus on the strategic perspective.
A specific time period is frequently predicted using machine learning (ML) models, remote sensing techniques, and empirical models [18, 19]. The most promising technologies for forecast prediction are ML models, among which artificial neural networks (ANNs) are frequently used because of their high accuracy. ARIMA is a well-known model that is particularly popular for time series data and has excellent accuracy on small datasets [20, 21]. Table 1 presents a comparison of the proposed work with recent literature.

Motivation and contribution
Using Generative Adversarial Networks (GANs), the proposed financial risk dynamic prediction and decision optimization model has several novel characteristics. First, it creates synthetic financial data using GANs, offering a new way to forecast financial risk and support wiser decisions. Second, it streamlines financial decision-making by combining risk prediction with decision optimization. Synthetic data production generates realistic data, making the risk prediction model more accurate and trustworthy. By taking risks into account, it helps decision makers make sound financial choices and lowers the risk of financial loss.
The suggested approach uses a GAN architecture to generate synthetic financial data, a novel application of GANs in finance. A risk prediction model trained on the GAN-generated synthetic data is also used, making risk estimates more reliable.
To test the proposed model, we simulate it, which gives an exact and full picture of its performance. Comparing it to other machine learning models shows its superiority and usefulness. These experiments allow us to fully examine the model's abilities and observe how it can identify financial risks and support wiser decisions.
Table 1: Comparison of proposed work with recent literature

| Reference | Key Focus / Contribution | Advantages Highlighted | Disadvantages (Implied/Potential) | Gaps (Unaddressed by the Text) |
| [15] | Importance of non-financial indicators in early warning models | Crucial for firm financial early warning; paramount for early warning analyses; provides a more holistic view beyond traditional financial ratios | Not explicitly stated, but non-financial data can be qualitative, harder to quantify, or less standardized; data collection might be more complex | Specific types of non-financial indicators (e.g., ESG, operational, governance) and their individual impact; methodologies for integrating diverse non-financial data |
| [16] | Strategic positioning of financial diagnosis | Contributes to the strategic development of the company when positioned at a strategic level; shifts focus from mere solvency to long-term viability and growth | Implies that if not strategically positioned, financial diagnosis might be limited to a tactical or operational view, missing broader implications | How to effectively integrate financial diagnosis into the strategic planning process; the specific "alternate methods" for a strategic focus |
| [18] | Overview of prediction techniques (ML, remote sensing, empirical models) | Diverse range of methods available for predicting specific time periods; suggests adaptability across various domains | No specific disadvantages mentioned for these general categories | Comparative analysis of these techniques for financial early warning specifically; when to choose one over the other for financial applications |
| [20] | Machine learning models (ANNs, ARIMA) for forecast prediction | ML models are "most promising technologies" with "high accuracy"; ANNs are frequently used; ARIMA is "well-known", "popular for time series data" and has "excellent accuracy for small datasets" | ARIMA's limitation to "small datasets" is mentioned, implying it might not be as suitable for large or complex financial datasets without significant preprocessing or combination with other models | Specific limitations of ANNs (e.g., interpretability, data requirements); how to handle highly volatile or non-stationary financial time series; the challenges of implementing and validating these models in real-world financial settings; addressing data quality issues in financial datasets for ML models |
| Proposed model | Developing a financial risk dynamic prediction and decision optimization model using Generative Adversarial Networks (GANs) | Improved accuracy, robust risk prediction and optimized decision-making | Complexity | Interpretability |
2 The proposed system
To address financial risk, the suggested system is a multifaceted structure combining three main components. It uses Generative Adversarial Networks (GANs) to create realistic synthetic financial data and improve prediction accuracy, optimization models to guide the best decision-making based on risk predictions, and time-series financial data to capture the dynamic character of financial risk, thus offering a complete method of managing financial risk. Figure 1 presents the block diagram of the proposed system. The proposed model architecture comprises four phases. Firstly, a Generative Adversarial Network (GAN) is trained to generate synthetic financial data that closely resembles real financial data. Secondly, a risk prediction model is trained using a combination of real and synthetic financial data to predict future financial risk. Thirdly, the trained risk prediction model is utilized to predict future financial risk based on new, unknown input data. Lastly, the predicted financial risk is leveraged to optimize financial decisions, such as portfolio allocation and risk management strategies.

2.1 Data representation
Financial data is inherently time-series based. Let X = {x_t}_{t=1}^{T} represent the financial time series, where x_t is a vector of financial features at time t. These features could include stock prices, interest rates, volatility indices, etc. We can represent this as x_t = [p_t, i_t, v_t, ...], where p_t is the price, i_t is the interest rate, and v_t is the volatility at time t.

Generative Adversarial Network (GAN)
A generator plus a discriminator make up a generative adversarial network (GAN). While the discriminator separates actual from synthetic data [22–25], the generator generates synthetic financial data similar to genuine data. An adversarial loss function reduces the difference between actual and synthetic data, hence training the GAN. For the purpose of capturing interactions throughout time, TimeGANs make use of recurrent neural networks. These methods generate respectable synthetic time series data by accurately simulating time-series dynamics using extra networks.
Generator (G): The generator aims to produce synthetic financial data that closely resembles the real data. It is written G(z; θ_g), where z is a random noise vector and θ_g represents the generator's parameters; G(z) generates a synthetic financial time series x̃.
Discriminator (D): The discriminator aims to distinguish between real and synthetic financial data. It is written D(x; θ_d), where x is the input data (either real or synthetic) and θ_d represents the discriminator's parameters; D(x) outputs the probability that x is real.
Loss Function: The GAN is trained by optimizing the following adversarial objective:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]  (1)

In Eq. (1), p_data(x) is the distribution of real financial data, p_z(z) is the distribution of the random noise, and E[·] denotes the expected value.
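A minimal PyTorch-style sketch of the adversarial objective in Eq. (1) is given below; the layer sizes, feature dimension, and variable names are illustrative assumptions and do not reproduce the paper's actual TimeGAN implementation.

    # Hedged sketch: one training step of a vanilla GAN over vectors of financial features.
    import torch
    import torch.nn as nn

    feat_dim, noise_dim = 7, 16                      # assumed sizes, not from the paper
    G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
    D = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

    def train_step(real_batch):
        n = real_batch.size(0)
        # Discriminator step: push D(x) toward 1 on real data and toward 0 on generated data.
        fake = G(torch.randn(n, noise_dim)).detach()
        d_loss = bce(D(real_batch), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator step (non-saturating form): push D(G(z)) toward 1.
        g_loss = bce(D(G(torch.randn(n, noise_dim))), torch.ones(n, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()

    d_l, g_l = train_step(torch.randn(32, feat_dim))   # placeholder batch of "real" records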
Time series GANs (TimeGANs): For time series data [26], variations like TimeGANs are employed, which incorporate recurrent neural networks (RNNs) such as LSTMs or GRUs to capture temporal dependencies. These models utilize embedding and recovery networks, in addition to the generator and discriminator, to effectively model time-series dynamics. GANs can thus generate synthetic financial data with similar patterns and linkages. Giving models a larger dataset to train on may help them forecast risks, since rare or severe occurrences may be underrepresented in the original data; synthetic data production creates novel situations that may help models perform better on fresh data.
We identify risk indicators such as Expected Shortfall (ES) and Value at Risk (VaR) using the GAN-generated synthetic data. Time-series links may allow the model to dynamically predict future risk levels from current and prior financial data, so that risks can be managed beforehand.
The optimization component determines the best sequence of options within constraints and maximizes a utility function using the predicted risk. GANs and decision optimization together may improve scenario realism and decision quality, which improves financial risk management decisions.

2.2 Financial risk prediction
Generative Adversarial Networks (GANs) may help to estimate risk metrics, thereby strengthening financial risk prediction. Synthetic financial scenarios produced by GANs are used to estimate risk measures such as Expected Shortfall (ES) and Value at Risk (VaR). VaR shows the possible loss at a particular confidence level; ES computes the expected loss beyond VaR. Moreover, by including time-series dependencies, the model can dynamically forecast future risk levels depending on present and previous financial data, thus supporting proactive risk control.

VaR and ES calculation
Value at Risk (VaR) and Expected Shortfall (ES) may be calculated in several ways. The Historical Simulation Method sorts the GAN-generated data in ascending order and reads off VaR at the 95th or 99th percentile selected for confidence. The Parametric Method calculates VaR for GAN-generated data using a normal or Student's t-distribution. The Monte Carlo Simulation Method employs the GAN to create several scenarios and calculates VaR from the simulated losses at the selected confidence level.
When computing ES, the Historical Simulation Method takes the average loss beyond VaR at the set confidence level. The Parametric Method assumes a distribution for the GAN-generated data and calculates ES from its properties. The Monte Carlo Simulation Method employs the GAN to create several scenarios and obtains ES by averaging the losses larger than VaR.

Confidence level
Specific needs may determine the confidence levels used for VaR and ES calculation; internal risk management typically uses a 99% confidence level to set limits. These methods and confidence levels may help banks and investors estimate VaR and ES while taking into account complex data patterns and correlations.

Risk measure estimation
GANs can be used to generate synthetic financial scenarios, which can then be used to estimate risk measures like Value at Risk (VaR) or Expected Shortfall (ES), as shown in Eq. (2) and Eq. (3):

VaR_α = inf{ l : P(L ≤ l) ≥ α }  (2)

where L is the loss and α is the confidence level, written as a subscript of VaR and ES.

ES_α = E[L | L ≥ VaR_α]  (3)

Dynamic prediction
By incorporating time-series dependencies, the model can dynamically predict future risk levels based on current and past financial data. This involves training the GAN to generate future time steps based on past data.
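A minimal sketch of the historical-simulation estimates of Eq. (2) and Eq. (3) applied to GAN-generated scenario losses is given below; the array of simulated returns and the use of NumPy are assumptions for illustration, not the paper's code.

    # Hedged sketch: historical-simulation VaR and ES from simulated scenario losses.
    import numpy as np

    def var_es(scenario_losses, alpha=0.99):
        # Losses are positive numbers; VaR_alpha is the alpha-quantile of the loss
        # distribution (Eq. 2), ES_alpha is the mean loss at or beyond VaR (Eq. 3).
        losses = np.sort(np.asarray(scenario_losses))
        var = np.quantile(losses, alpha)
        es = losses[losses >= var].mean()
        return var, es

    # Example with synthetic scenarios standing in for GAN output:
    rng = np.random.default_rng(0)
    sim_returns = rng.normal(loc=0.0, scale=0.02, size=10_000)   # placeholder scenarios
    var99, es99 = var_es(-sim_returns, alpha=0.99)               # convert returns to losses
    print(f"VaR(99%)={var99:.4f}, ES(99%)={es99:.4f}")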
Figure 1: The block diagram for the proposed system
2.3 Decision optimization
Decision optimization maximizes a utility function within restrictions by determining the best sequence of choices, and it helps manage financial risk. Based on the predicted risk, the utility function scores the outcome of a choice, and the limits may include risk tolerance and budget. The problem can be solved using dynamic programming and other approaches. Financial scenarios may be used with mean-variance optimization (MVO) to optimize returns or reduce risk in an investment portfolio. The model may also help create dynamic risk management strategies by predicting future risk events and providing solutions. A typical portfolio optimization function minimizes risk and maximizes profit; it examines the covariance matrix of asset returns and the vector of anticipated asset returns. The GAN provides scenarios for the covariance matrix and expected return calculations. Combining this with decision optimization provides more accurate and realistic scenario information, which simplifies the optimization and improves financial risk assessment.
Let A_t be the decision variable at time t (e.g., investment portfolio allocation, risk mitigation actions) and R_t the risk tolerance. Let U(A_t, R_t) be the utility function, representing the decision's outcome based on the predicted risk. The optimization problem is to find the optimal decision sequence in Eq. (4):

max_{A_1, ..., A_T} Σ_{t=1}^{T} U(A_t, R_t)  (4)

subject to the constraints C(A_t, R_t) ≤ 0 (e.g., budget constraints, risk tolerance).
The utility function and constraints vary with the financial risk management circumstances. This optimization problem may be solved using dynamic programming or other approaches. Financial scenarios may be used to minimize risk or increase returns in investment portfolios, for example via Mean-Variance Optimization (MVO). By identifying risk events and mitigating them, the model may help build dynamic risk management plans.

Optimization function
A typical portfolio optimization function is given in Eq. (5):

min_w  w^T Σ w − λ w^T μ  (5)

where w is the vector of portfolio weights, w^T Σ w represents the portfolio risk (variance of returns), Σ is the covariance matrix of asset returns, λ is the risk tolerance parameter and μ is the vector of expected asset returns. The GAN provides the scenarios used to calculate the covariance matrix and expected returns, which allows the optimization to utilize more robust and realistic scenario information.
The decision optimization stage guides financial decisions by utilizing the predicted financial risk to determine the best financial choices. This stage begins with inputting the predicted financial risk into the decision optimization module, which serves as the foundation for optimizing financial decisions. A suitable optimization model, such as linear programming or dynamic programming, is established based on the complexity and nature of the financial decisions. The optimization model identifies
complex interactions among financial factors, including asset returns, risk levels, and portfolio restrictions.
The optimization process involves determining the best financial choices that minimize financial risk and maximize profit, subject to various constraints and limits. The best financial decisions generated by the decision optimization module can guide direct investment strategies, risk management, and portfolio performance maximization. By making informed decisions based on these optimal financial judgments, financial institutions and investors can reduce financial risk, increase returns, and achieve their financial goals.
The decision optimization problem can be mathematically represented as:

maximize: utility function (e.g., expected return, risk-adjusted return)
subject to: constraints (e.g., risk tolerance, regulatory requirements, budget constraints)
variables: decision variables (e.g., portfolio weights, investment amounts)

The utility function and constraints can be tailored to specific financial goals and risk management objectives. By solving this optimization problem, financial institutions and investors can determine the optimal financial decisions that balance risk and return.

3 Complete model structure
The complete model structure consists of four phases (Figure 2). The first step trains a Generative Adversarial Network (GAN) to provide suitable synthetic financial data. Stage 2 uses blended real and synthetic data to build a risk prediction model that forecasts future financial risk. Stage 3 projects financial risk for new, unknown inputs using the trained risk prediction model. Using the predicted financial risk, a decision optimization module in stage 4 finally optimizes financial choices, including risk management techniques and portfolio allocation. Every step builds on the one before it, letting the model create realistic synthetic data, predict financial risk, and optimize financial actions to reduce risk and increase profit.

Stage 1: GAN training
The first step in the procedure is training a generative adversarial network (GAN) to produce suitable synthetic financial data. Historical financial data is collected and preprocessed at this step to ensure it is in a fit condition for training the GAN, usually combining time-series data with important domain-related financial features. Data preparation is followed by the construction of an appropriate GAN architecture incorporating a generator network and a discriminator network. The generator network generates synthetic data; the discriminator network checks it and provides feedback to the generator. Following preprocessing, the GAN is trained with the aim of producing synthetic financial data that is indistinguishable from real data. Visual inspection, accuracy, and loss functions are among the criteria used to evaluate the GAN's performance throughout training. This evaluation of the quality of the generated data directs any necessary adjustments to the GAN design or training setup. After sufficient training, the GAN can generate realistic synthetic financial data that can be used downstream for stress testing, risk analysis, and portfolio optimization.

Stage 2: Risk prediction model training
An essential component of the complete process, the training phase for the risk prediction model seeks to produce a powerful and accurate model able to anticipate future financial risk. This stage begins by mixing the synthetic data with genuine financial data, providing a complete and diversified dataset for training the risk prediction model. Depending on the kind and complexity of the data, a suitable risk prediction model is then created, whether a machine learning or deep learning model. Trained on all the data, the model is oriented toward future financial risk. The training approach seeks to optimize the model's parameters so that the error between predicted and actual risk levels is as low as feasible. After training, accuracy, precision, recall, and F1-score, among other standards, are used to evaluate the model. These indicators advise any necessary architectural or training parameter adjustments and help one grasp the ability of the model to precisely anticipate financial risk. Good risk prediction model training may enable financial organizations to learn significant knowledge about likely future dangers, thereby directing their activities and the development of effective risk management strategies.

Stage 3: Risk prediction
The final stage of the operation applies the trained risk prediction model to forecast future financial risk. This phase begins with providing new, unknown input to the trained model, comprising financial attributes and current market conditions; the input is carefully chosen to ensure its correctness and relevance, as the predictions of the model rely on the available data. Once the data becomes available, the trained model projects the future financial risk related to it. This prediction offers a forward-looking assessment of expected financial risk based on the patterns and connections the model learned over the training period. The expected financial risk generated by the risk prediction model can be used for decision-making requirements. Whether in the form of a probability of default, an expected loss, or a risk score, this output provides financial institutions, investors, and other stakeholders with significant information. These organizations can optimize their risk-reducing strategies, make sensible decisions, and manage challenging financial markets with greater confidence by applying the expected financial risk.
Stage 4: Decision optimization
The decision optimization stage guides financial decisions by means of the predicted financial risk. It begins with the expected financial risk being fed into the decision optimization module, which forms the basis for optimizing financial choices. A suitable optimization model is then established depending on the complexity and nature of the financial decisions, for example a linear programming or dynamic programming model. The optimization model captures the complex interactions among financial factors, including asset returns, risk levels, and portfolio restrictions. Once designed, the optimization model supports the refinement of financial decisions such as risk management strategies or portfolio allocation. Under the applicable constraints and limits, the optimization process determines the financial choices that reduce financial risk and maximize profit. The optimal decisions generated by the decision optimization module may guide investment strategies, risk management, and portfolio performance. By acting on these decisions, financial institutions and investors can lower financial risk, increase returns, and move closer to their financial goals.

Figure 2: The complete model structure

3.1 Integration of components
The GAN produces financial data used to train the risk prediction model; this data helps the risk prediction algorithm find patterns and linkages comparable to those in real financial data. Second, the risk prediction model uses the synthetic data to assess financial risk. The predicted risk is then quantified with measures such as Value-at-Risk (VaR) or Expected Shortfall (ES). Finally, the optimization model calculates the portfolio weights or investment choices that best balance risk and return.

The steps of the algorithm are described below. The risk prediction model is trained on GAN-generated synthetic financial data. The trained risk prediction model is then used to assess the financial risk of new data. Risk measures quantify the anticipated risk, and the optimization model determines portfolio weights and investment choices, balancing risk and return to find the best investment.

The steps are as follows:
1. Generate synthetic financial data with the GAN: synthetic_data = GAN.generate_data()
2. Train the risk prediction model on the synthetic data: risk_model = RiskModel.train(synthetic_data)
3. Estimate financial risk with the risk prediction model: predicted_risk = risk_model.predict(new_data)
4. Compute the risk metric: risk_metric = calculate_risk_metric(predicted_risk)
5. Find the optimal portfolio weights or investments with the optimization model: portfolio = OptimizationModel.optimize(risk_metric, return_metric)
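A minimal, self-contained sketch of how the five steps above fit together is given below. The class and function names simply mirror the pseudo-calls in the step list; they are illustrative placeholders, not the authors' implementation or any library API:

    import numpy as np

    class GAN:
        @staticmethod
        def generate_data(n=1000, dim=10):
            return np.random.randn(n, dim)                  # step 1: synthetic financial data

    class RiskModel:
        class _Trained:
            def predict(self, new_data):
                return np.abs(new_data).mean(axis=1)        # toy per-sample risk score
        @staticmethod
        def train(synthetic_data):
            return RiskModel._Trained()                     # step 2: trained risk model

    def calculate_risk_metric(predicted_risk, alpha=0.95):
        return float(np.quantile(predicted_risk, alpha))    # step 4: VaR-style quantile

    class OptimizationModel:
        @staticmethod
        def optimize(risk_metric, return_metric):
            return {"stocks": 0.65, "bonds": 0.30, "cash": 0.05}   # step 5: placeholder weights

    synthetic_data = GAN.generate_data()
    risk_model = RiskModel.train(synthetic_data)
    predicted_risk = risk_model.predict(np.random.randn(100, 10))  # step 3 on new data
    risk_metric = calculate_risk_metric(predicted_risk)
    portfolio = OptimizationModel.optimize(risk_metric, return_metric=0.082)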
3.2 Variable selection and mapping
Macroeconomic variables such as GDP, inflation, and employment, and social development variables such as health, education, and poverty are studied. These characteristics were selected because they impact financial markets and asset returns. To place the variables in a portfolio context, we may use a multivariate technique that examines asset performance; a factor model that incorporates the selected variables as asset-return factors is one option. Table 2 lists the generator and discriminator network architectural parameters.

Asset-level returns
We model asset returns using a multivariate distribution, such as a multivariate normal distribution or a more elaborate one that captures non-linear relationships between variables.

Portfolio context
We use portfolio optimization to place the variables in a portfolio context by looking at anticipated returns, risks, and correlations between assets. The optimization problem can be formulated as:

maximize: Portfolio return
subject to: Risk constraints (e.g., VaR, ES)
variables: Portfolio weights

Table 2: Generator and Discriminator Network Architecture Parameters
Parameter           | Generator Network  | Discriminator Network
Number of Layers    | 4                  | 4
Activation Function | Leaky ReLU         | Leaky ReLU
Number of Filters   | 64, 128, 256, 512  | 64, 128, 256, 512
Kernel Size         | 4, 4, 4, 4         | 4, 4, 4, 4
Stride              | 2, 2, 2, 2         | 2, 2, 2, 2

3.3 Weighting and validation of real and synthetic data
During training, the real and synthetic data can be weighted differently to control the influence of each type of data on the model's performance. One approach is a weighted loss function that assigns different weights to the real and synthetic data, for example:

loss = w_real * loss_real + w_synthetic * loss_synthetic

where w_real and w_synthetic are the weights assigned to the real and synthetic data, respectively.

Validation
To validate the performance of the model on both real and synthetic data, we can use metrics such as mean squared error (MSE) or mean absolute error (MAE) on a hold-out validation set. This helps us monitor the model's performance on both types of data and adjust the weighting scheme or other hyperparameters as needed. Generating synthetic data that is diverse and representative of the real data can also help reduce overfitting.
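A minimal sketch of this weighted loss, written here with PyTorch and mean squared error as the per-sample loss, is shown below; the weights 0.7 and 0.3 are placeholders rather than values used in the paper:

    import torch
    import torch.nn.functional as F

    def weighted_loss(pred_real, target_real, pred_syn, target_syn,
                      w_real=0.7, w_synthetic=0.3):
        # loss = w_real * loss_real + w_synthetic * loss_synthetic (Section 3.3)
        loss_real = F.mse_loss(pred_real, target_real)
        loss_synthetic = F.mse_loss(pred_syn, target_syn)
        return w_real * loss_real + w_synthetic * loss_synthetic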
4 Experimental setup
Python with TensorFlow or PyTorch is used for deep learning. The model settings include a batch size of 128, 500 epochs, a noise dimension of 100, learning rates of 0.001 for the generator and the discriminator, Leaky ReLU activations, and the Adam optimizer. The simulation parameters consist of a volatility of 0.1, a risk-free rate of 0.02, and a simulation of 1,000 time steps.

We began training a Generative Adversarial Network (GAN) for financial data creation using publicly available financial datasets (https://databank.worldbank.org/). Comprising more than 9,000 variables covering several spheres, including economic, social, and environmental ones, this dataset contains macroeconomic characteristics such as GDP, inflation, and employment, as well as social development measures including education, health, and poverty.

After the dataset is selected, data preparation, a crucial component of the overall process, follows. Missing values must be handled by interpolation or imputation, the data must be normalized so that every attribute falls within the same range, and the data has to be converted into an appropriate form for GAN training, possibly incorporating scaling or encoding. The GAN design is developed after data preparation. A deep convolutional GAN (DCGAN), whose architecture consists of a generator network and a discriminator network, is particularly appropriate for producing financial data. The generator network generates synthetic financial data; the discriminator network evaluates it and feeds its judgment back to the generator. The DCGAN design has been used effectively to create realistic synthetic data, and its convolutional structure lets it find complicated patterns and connections in the data.
High-quality synthetic financial data created by DCGAN architectures may be used for risk analysis, portfolio optimization, and stress testing. Table 3 shows the parameters used to design the generator and discriminator networks.
Table 3: CNN architecture parameters for the generator and discriminator networks

Generator Network
  Input layer           | 100-dimensional noise vector
  Convolutional layer 1 | 64 filters, kernel size 3, stride 1
  Convolutional layer 2 | 128 filters, kernel size 3, stride 1
  Convolutional layer 3 | 256 filters, kernel size 3, stride 1
  Output layer          | 1-dimensional output (financial risk prediction)

Discriminator Network
  Input layer           | 1-dimensional input (financial data)
  Convolutional layer 1 | 64 filters, kernel size 3, stride 1
  Convolutional layer 2 | 128 filters, kernel size 3, stride 1
  Convolutional layer 3 | 256 filters, kernel size 3, stride 1
  Output layer          | 1-dimensional output (probability of real data)
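A possible PyTorch rendering of the Table 3 architecture is sketched below. The 1D-convolutional layout, padding, and pooling head are assumptions made for illustration; only the filter counts (64, 128, 256), kernel size 3, stride 1, the 100-dimensional noise input, and the Leaky ReLU activation come from the paper:

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        # 100-dim noise treated as a length-100 sequence -> three Conv1d blocks -> 1 channel out
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, 64, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
                nn.Conv1d(64, 128, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
                nn.Conv1d(128, 256, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
                nn.Conv1d(256, 1, kernel_size=3, stride=1, padding=1),
            )
        def forward(self, z):                      # z: (batch, 100)
            return self.net(z.unsqueeze(1))        # -> (batch, 1, 100) synthetic series

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 64, 3, 1, 1), nn.LeakyReLU(0.2),
                nn.Conv1d(64, 128, 3, 1, 1), nn.LeakyReLU(0.2),
                nn.Conv1d(128, 256, 3, 1, 1), nn.LeakyReLU(0.2),
            )
            self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                      nn.Linear(256, 1), nn.Sigmoid())
        def forward(self, x):                      # x: (batch, 1, length)
            return self.head(self.features(x))     # probability that x is real

    # Optimizer settings quoted in the text: Adam with learning rate 0.001.
    opt_g = torch.optim.Adam(Generator().parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(Discriminator().parameters(), lr=1e-3)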
Figure 3 depicts the proposed model's CNN architecture. Training is crucial because it helps the Generative Adversarial Network (GAN) model identify financial data patterns and linkages. The GAN is trained with the Adam optimizer, a well-known stochastic gradient descent method that adapts the learning rate of each parameter based on gradient size. The small learning rate of 0.001 allows the model parameters to converge slowly and gradually. The batch size is 128, a conventional value that balances computing speed and model stability. During training, the GAN learns to create synthetic financial data that appears real, while the discriminator learns to distinguish genuine from fabricated data. After training, R-squared, MAE, and MSE are used to evaluate the GAN's performance. These measurements indicate the reliability of the synthetic data and help adjust the GAN design and training parameters. How well the GAN performs indicates the quality of its synthetic data and determines whether the data is suitable for risk analysis, portfolio optimization, and stress testing.

Figure 3: The CNN architecture for the proposed model

5 Results and discussion
For GAN training, the dataset must have missing values handled by interpolation or imputation, be normalized so that all characteristics fall within the same range, and be formatted for GAN training.

The Adam optimizer trains the GAN with a 0.001 learning rate and a batch size of 128. The generator network trains to produce synthetic financial data, and the discriminator network evaluates it and informs the generator of its judgment. The GAN is trained for 500 epochs to reach convergence and produce high-quality synthetic financial data.

Several indicators are used to evaluate the proposed model, including MAE, MSE, RMSE, R-squared, risk prediction accuracy, precision, recall, and F1-score.

Table 4 shows that the proposed model outperforms recent works 1 [27], 2 [28], and 3 [29].

Existing Work 1 [27] employs CNNs and LSTM networks for deep learning. The model was trained using financial time series data on stock prices, transaction volumes, and other key factors. It comprises 5 hidden layers of 128 units each, with a ReLU activation function, the Adam optimizer, a 0.01 learning rate, a batch size of 64, and 1,000 epochs.

Existing Work 2 [28] uses random forest machine learning, trained on technical indicators, sentiment analysis, and macroeconomic factors. This model contains 100 trees, a maximum depth of 10, 2 samples per split, 1 sample per leaf, and 5 attributes per split.

Existing Work 3 [29] employs an autoregressive integrated moving average (ARIMA) method, learned from a set of historical financial time series data. Its hyperparameters are an order of differencing of 1, 2 autoregressive terms, and 1 moving average term.
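For reference, the baseline configurations quoted above could be instantiated roughly as follows (a sketch under the stated hyperparameters, using scikit-learn and statsmodels; the series array is a placeholder for the historical data):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from statsmodels.tsa.arima.model import ARIMA

    # Existing Work 2 [28]: random forest with the quoted hyperparameters.
    rf = RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_split=2,
                                min_samples_leaf=1, max_features=5)

    # Existing Work 3 [29]: ARIMA with 2 AR terms, differencing order 1, 1 MA term.
    series = np.random.randn(200).cumsum()          # placeholder for historical prices
    arima_fit = ARIMA(series, order=(2, 1, 1)).fit()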
With values of 0.009, 0.012, and 0.015 for the training, validation, and testing sets, respectively, the proposed model's Mean Absolute Error (MAE) is much lower than the 0.052 ± 0.008 of existing work. Likewise, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values for the proposed model are 0.001, 0.002, and 0.003, and 1.2%, 1.5%, and 1.8%, for the training, validation, and testing sets, respectively, exceeding existing work with values of 0.003 ± 0.001 and 0.055 ± 0.008. Moreover, whereas previous work achieves a lower R-squared value of 0.854 ± 0.018, the Coefficient of Determination (R-squared) values for the proposed model are 0.95, 0.92, and 0.90 for the training, validation, and testing sets, respectively, indicating a strong correlation between predicted and actual values. Compared with previous work, the proposed model therefore shows better accuracy, reliability, and generalizability.

Table 4: Performance metrics
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Mean Absolute Error (MAE) | Training: 0.009, Validation: 0.012, Testing: 0.015 | 0.052 ± 0.008 | 0.065 ± 0.010 | 0.075 ± 0.012
Mean Squared Error (MSE) | Training: 0.001, Validation: 0.002, Testing: 0.003 | 0.003 ± 0.001 | 0.005 ± 0.002 | 0.007 ± 0.003
Root Mean Squared Error (RMSE) | Training: 1.2%, Validation: 1.5%, Testing: 1.8% | 0.055 ± 0.008 | 0.070 ± 0.010 | 0.085 ± 0.012
Coefficient of Determination (R-squared) | Training: 0.95, Validation: 0.92, Testing: 0.90 | 0.921 ± 0.013 | 0.895 ± 0.018 | 0.865 ± 0.022

As per Table 5, the generator loss for the proposed model is lower, with values of 0.04, 0.05, and 0.06 for the training, validation, and testing sets, respectively, compared with 0.08 for existing work. Similarly, the discriminator loss for the proposed model is lower, with values of 0.02, 0.03, and 0.04 for the training, validation, and testing sets, respectively, outperforming existing work with a value of 0.05. While existing work requires 1,000 epochs, the proposed GAN model converges in only 500 epochs to reach optimal performance. The successful convergence of the proposed model is partly explained by well-chosen hyperparameter values, namely a batch size of 128 and a learning rate of 0.001. Overall, the proposed GAN model exhibits superior performance, stability, and efficiency relative to existing work, making it a more trustworthy and effective tool for producing synthetic financial data.

Table 5: GAN performance
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Generator Loss | Training: 0.04, Validation: 0.05, Testing: 0.06 | 0.05 | 0.07 | 0.09
Discriminator Loss | Training: 0.02, Validation: 0.03, Testing: 0.04 | 0.03 | 0.05 | 0.07
GAN Convergence | 500 epochs, batch size 128, learning rate 0.001 | 1000 epochs | 800 epochs | 1200 epochs

As per Table 6, the financial risk predicted by the proposed model is remarkably close to the actual financial risk, with an average predicted risk of 0.023 and a standard deviation of 0.005, compared with an average actual risk of 0.025 and a standard deviation of 0.006. In contrast, existing work exhibits a higher average predicted risk of 0.028, indicating a less accurate prediction. Furthermore, the proposed model achieves a risk prediction accuracy of 92%, with a precision of 90%, recall of 94%, and F1-score of 92%, surpassing the 85% accuracy achieved by existing work. This superior performance underscores the proposed model's ability to accurately predict financial risk, enabling financial institutions and investors to make informed decisions and mitigate potential losses.

Table 6: Risk prediction results
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Predicted Financial Risk | 0.023 (average predicted risk: 0.023, SD of predicted risk: 0.005) | 0.028 | 0.035 | 0.042
Actual Financial Risk | 0.025 (average actual risk: 0.025, SD of actual risk: 0.006) | 0.03 | 0.035 | 0.04
Risk Prediction Accuracy | 92% (Precision: 90%, Recall: 94%, F1-score: 92%) | 85% | 80% | 75%

Table 7 shows that the model achieves a precision of 0.853 ± 0.021, a recall of 0.826 ± 0.025, and an F1-score of 0.839 ± 0.022 for low-risk predictions, meaning it is quite good at identifying low-risk situations. The model does better in the medium-risk category, with precision, recall, and F1-score values of 0.913 ± 0.015, 0.895 ± 0.018, and 0.904 ± 0.016, respectively, showing that it can reliably forecast medium-risk occurrences. Its ability to identify high-risk situations is reflected in precision, recall, and F1-score values of 0.952 ± 0.008, 0.935 ± 0.011, and 0.943 ± 0.009. Overall, the proposed model has a strong and accurate capacity to anticipate risk, which helps financial institutions and investors make sound choices and avoid losses.

Table 7: Risk level-based prediction results
Risk Level | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Low | Precision: 0.853 ± 0.021, Recall: 0.826 ± 0.025, F1-score: 0.839 ± 0.022 | Precision: 0.80, Recall: 0.75, F1-score: 0.77 | Precision: 0.75, Recall: 0.70, F1-score: 0.72 | Precision: 0.70, Recall: 0.65, F1-score: 0.67
Medium | Precision: 0.913 ± 0.015, Recall: 0.895 ± 0.018, F1-score: 0.904 ± 0.016 | Precision: 0.85, Recall: 0.80, F1-score: 0.82 | Precision: 0.80, Recall: 0.75, F1-score: 0.77 | Precision: 0.75, Recall: 0.70, F1-score: 0.72
High | Precision: 0.952 ± 0.008, Recall: 0.935 ± 0.011, F1-score: 0.943 ± 0.009 | Precision: 0.90, Recall: 0.85, F1-score: 0.87 | Precision: 0.85, Recall: 0.80, F1-score: 0.82 | Precision: 0.80, Recall: 0.75, F1-score: 0.77
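The per-risk-level precision, recall, and F1-score values reported in Table 7 can be computed for any labelled evaluation set with standard library calls. The snippet below is an illustrative sketch only; the random y_true / y_pred arrays are placeholders for the real low/medium/high labels and predictions:

    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support

    y_true = np.random.randint(0, 3, size=1000)   # 0 = low, 1 = medium, 2 = high (placeholder)
    y_pred = np.random.randint(0, 3, size=1000)   # placeholder model predictions
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1, 2])
    for name, p, r, f in zip(["low", "medium", "high"], precision, recall, f1):
        print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")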
When it comes to optimal portfolio allocation, anticipated return, and expected risk (Table 8), the proposed technique makes markedly better decisions than earlier studies. The proposed model recommends a portfolio allocation of 65% stocks, 30% bonds, and 5% cash, whereas prior work recommends 60% equities, 35% bonds, and 5% cash. The proposed model also yields a higher expected return of 8.2% (with a standard deviation of 1.5%) than the 7.5% expected return of the prior study, and a lower expected risk of 4.5% (with a standard deviation of 1.2%) versus the 5.5% reported by previous research. These findings demonstrate that the proposed approach can help investors and banks make better decisions by optimizing their portfolios, maximizing returns, and lowering risk.

Table 8: Decision optimization results
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Optimized Portfolio Allocation | 65% stocks, 30% bonds, 5% cash | 60% stocks, 35% bonds, 5% cash | 55% stocks, 40% bonds, 5% cash | 70% stocks, 25% bonds, 5% cash
Expected Return | 8.2% (SD of expected return: 1.5%) | 7.50% | 7.00% | 8.00%
Expected Risk | 4.5% (SD of expected risk: 1.2%) | 5.50% | 6.00% | 4.80%

6 Conclusion
Generative Adversarial Networks (GANs) are used in this study to anticipate financial risk dynamics and support optimal decisions. The framework trains a risk prediction model on synthetic financial data generated by a GAN, and the decision optimization model then produces optimal financial decisions based on the predicted risk. The model predicts risk well, with an MAE of 0.012 and an MSE of 0.002. With a 4.5% expected risk and an 8.2% expected return, the model outperforms the compared machine learning methods, and it adapts to market volatility with an average return of 8.5% and risk of 4.2%. The model offers a novel technique for predicting financial risk dynamics and improving decision-making, and it may be used for portfolio, risk, and investment choices. Future work should improve the risk prediction model, add elements to the decision optimization model, and explore new ways to apply the technology in banking.
References
[1] Bhat, A., Kulkarni, N., Husain, S., Yadavalli, A., Kaur, J. N., Shukla, A., & Seshadri, V. (2024). Speaking in terms of money: financial knowledge acquisition via speech data generation. ACM Journal on Computing and Sustainable Societies, 2(3), 1-35.
[2] Paiva, F. D., Cardoso, R. T. N., Hanaoka, G. P., & Duarte, W. M. (2019). Decision-making for financial trading: A fusion approach of machine learning and portfolio selection. Expert Systems with Applications, 115, 635-655.
[3] Tang, Y., Song, Z., Zhu, Y., Yuan, H., Hou, M., Ji, J., ... & Li, J. (2022). A survey on machine learning models for financial time series forecasting. Neurocomputing, 512, 363-380.
[3] Masini, R. P., Medeiros, M. C., & Mendes, E. F. (2023). Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), 76-111.
[4] Wang, J., Hong, S., Dong, Y., Li, Z., & Hu, J. (2024). Predicting stock market trends using LSTM networks: overcoming RNN limitations for improved financial forecasting. Journal of Computer Science and Software Applications, 4(3), 1-7.
[5] Safwat, S., Mahmoud, A., Eldesouky Fattoh, I., & Ali, F. (2024). Hybrid deep learning model based on GAN and RESNET for detecting fake faces. IEEE Access, 12, 86391-86402. doi: 10.1109/ACCESS.2024.3416910.
[6] Shi, X., Zhang, Y., Yu, M., & Zhang, L. (2025). Deep learning for enhanced risk management: a novel approach to analyzing financial reports. PeerJ Computer Science, 11:e2661. https://doi.org/10.7717/peerj-cs.2661
[7] Huang, X., Han, M., & Deng, Y. (2024). A hybrid GAN-Inception deep learning approach for enhanced coordinate-based acoustic emission source localization. Applied Sciences, 14, 8811. https://doi.org/10.3390/app14198811
[8] Ren, S. (2022). Optimization of enterprise financial management and decision-making systems based on big data. Journal of Mathematics, 2022(1), 1708506.
[9] Qi, Q. (2022). Analysis and forecast on the price change of Shanghai stock index. Journal of Economics, Business and Management, 10(1), 72-78.
[10] Petrozziello, A., Troiano, L., Serra, A., Jordanov, I., Storti, G., Tagliaferri, R., & La Rocca, M. (2022). Deep learning for volatility forecasting in asset management. Soft Computing, 26(17), 8553-8574.
[11] Li, Y., & Pan, Y. (2022). A novel ensemble deep learning model for stock prediction based on stock prices and news. International Journal of Data Science and Analytics, 13(2), 139-149.
[12] Souto, H. G., & Moradi, A. (2023). Forecasting realized volatility through financial turbulence and neural networks. Economics and Business Review, 9(2), 133-159.
[13] Zhan, X., Ling, Z., Xu, Z., Guo, L., & Zhuang, S. (2024). Driving efficiency and risk management in finance through AI and RPA. Unique Endeavor in Business & Social Sciences, 3(1), 189-197.
[14] Wei, L., Deng, Y., Huang, J., Han, C., & Jing, Z. (2022). Identification and analysis of financial technology risk factors based on textual risk disclosures. Journal of Theoretical and Applied Electronic Commerce Research, 17(2), 590-612.
[15] Lei, Y., Qiaoming, H., & Tong, Z. (2023). Research on supply chain financial risk prevention based on machine learning. Computational Intelligence and Neuroscience, 2023(1), 6531154.
[16] Levytska, S., Pershko, L., Akimova, L., Akimov, O., Havrilenko, K., & Kucherovskii, O. (2022). A risk-oriented approach in the system of internal auditing of the subjects of financial monitoring. International Journal of Applied Economics, Finance and Accounting, 14(2), 194-206.
[17] Wang, H., & Budsaratragoon, P. (2023). Exploration of an "Internet+" grounded approach for establishing a model for evaluating financial management risks in enterprises. International Journal for Applied Information Management, 3(3), 109-117.
[18] Malki, A., Atlam, E., & Gad, I. (2022). Machine learning approach of detecting anomalies and forecasting time-series of IoT devices. Alexandria Engineering Journal, 61(11), 8973-8986. 10.1016/j.aej.2022.02.038
[19] Arunkumar, K. E., Kalaga, D. V., Mohan, C., Kumar, S., & Brenza, T. M. (2022). Comparative analysis of Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) cells, autoregressive integrated moving average (ARIMA), and seasonal autoregressive integrated moving average (SARIMA) for forecasting COVID-19 trends. Alexandria Engineering Journal, 61(10), 7585-7603. 10.1016/j.aej.2022.01.011
[20] Kendall, M. G. (review). Journal of the Royal Statistical Society, Series A (General), 134(3) (1971), pp. 450-453. Published by Wiley for the Royal Statistical Society. Stable URL: http://www.jstor.or
[21] Sutiene, K., Schwendner, P., Sipos, C., Lorenzo, L., Mirchev, M., Lameski, P., Kabasinskas, A., Tidjani, C., Ozturkkal, B., & Cerneviciene, J. (2024). Enhancing portfolio management using artificial intelligence: literature review. Frontiers in Artificial Intelligence, 7:1371502. doi: 10.3389/frai.2024.1371502. PMID: 38650961; PMCID: PMC11033520.
[22] Xu, R., Yang, Y., Qiu, H., Liu, X., & Zhang, J. (2024). Research on multimodal generative adversarial networks in the framework of deep learning. Journal of Computing and Electronic Information Management, 12(3), 84-88.
[23] Dai, W., Tao, J., Yan, X., Feng, Z., & Chen, J. (2023, November). Addressing unintended bias in toxicity detection: An LSTM and attention-based approach. In 2023 5th International Conference on Artificial Intelligence and Computer Applications (ICAICA) (pp. 375-379). IEEE.
[24] Yao, J., Wu, T., & Zhang, X. (2023). Improving depth gradient continuity in transformers: A comparative study on monocular depth estimation with CNN. arXiv preprint arXiv:2308.08333.
[25] Wang, X. S., & Mann, B. P. (2020). Attractor selection in nonlinear energy harvesting using deep reinforcement learning. arXiv preprint arXiv:2010.01255.
[26] Zhang, Y., Jiang, Z., Peng, C., Zhu, X., & Wang, G. (2024). Management analysis method of multivariate time series anomaly detection in financial risk assessment. Journal of Organizational and End User Computing, 36(1), 1-19.
[27] Pandey, A., Mannepalli, P. K., Gupta, M., et al. (2024). A deep learning-based hybrid CNN-LSTM model for location-aware web service recommendation. Neural Processing Letters, 56, 234. https://doi.org/10.1007/s11063-024-11687-w
[28] Sun, Z., Wang, G., Li, P., Wang, H., Zhang, M., & Liang, X. (2024). An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Systems with Applications, 237, Part B. https://doi.org/10.1016/j.eswa.2023.121549
[29] Ilu, S. Y., & Prasad, R. (2023). Improved autoregressive integrated moving average model for COVID-19 prediction by using statistical significance and clustering techniques. Heliyon, 9(2), e13483. https://doi.org/10.1016/j.heliyon.2023.e13483
https://doi.org/10.31449/inf.v49i16.9643 Informatica 49 (2025) 315–330 315
GridRiskNet: A Two-Stage Hybrid Model for Project Investment
Risk Management of Power Grid Enterprises Using Big Data Mining
Hongzhi Gao*, Dekyi, Metok
State Grid Tibet Electric Power Co., Ltd., Lhasa 850000, China
E-mail: Djgy1108@163.com
*Corresponding author
Keywords: power grid enterprise engineering project, GridRiskNet, big data mining, project investment risk
management, two-stage hybrid modeling
Received: June 10, 2025
To enhance the power grid enterprise's ability to comprehensively perceive and dynamically assess
investment risks in engineering projects, this study proposes a risk management model called GridRiskNet
based on big data mining. This model integrates structured, unstructured, and spatiotemporal data and
realizes intelligent identification of project risk probability distributions and potential impact ranges by
constructing a two-stage hybrid modeling architecture. In the first stage, the model uses eXtreme Gradient
Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) to extract static and dynamic
features in parallel. In the second stage, it introduces Graph Attention Recurrent Neural Network (GA-
RNN) to model risk propagation paths under the power grid topology. Meanwhile, this study combines
Spatio-Temporal Graph Convolutional Network (ST-GCN) to improve the coupling expression of
meteorological and text features. The experiment uses multi-source public data for verification, such as
power infrastructure data from the U.S. Energy Information Administration, meteorological observation
data from the National Oceanic and Atmospheric Administration, and power grid topology data from
OpenStreetMap. The results show that GridRiskNet performs excellently in risk prediction stability and
regional propagation modeling. Among them, the risk principal component analysis projection score in
2023 reached 7.779. This indicates that cost overruns, climate pressure, and equipment technology risks
together form a high-risk cluster, with cost overruns increasing by 269% compared with 2018. In the
State-of-the-Art comparison, GridRiskNet achieves an F1-score of 0.892, a Receiver Operating
Characteristic - Area Under Curve of 0.962, a Risk Impact Radius error of approximately 4.8 km, and a
Risk Entropy of 0.89; these are comprehensively better than existing methods. Moreover, the model has
good cross-modal feature fusion and risk transmission mechanism identification capabilities, and can
effectively characterize the spatiotemporal coupling risk features in complex power grid projects. Overall,
this system can provide power grid enterprises with structured and interpretable risk index outputs and
regional early warning support. Thus, it helps to improve the investment safety and operational and
maintenance resilience of projects.
Povzetek: Predstavljen je GridRiskNet, dvofazni hibridni model za upravljanje investicijskih tveganj v
elektroenergetskih projektih. S križnim združevanjem strukturiranih, besedilnih in prostorsko-časovnih
podatkov ter uporabo XGBoost/LightGBM in GA-RNN izboljša napoved tveganj (F1=0,892, AUC=0,962)
ter natančno modelira regionalno širjenje tveganj (napaka 4,8 km).
1 Introduction

With the accelerated promotion of the energy transition and the construction of new power systems, the strategic position of power grid engineering projects in national energy security and clean energy consumption has become increasingly prominent [1]. However, power grid enterprises face problems such as the surge of multi-source heterogeneous data, highly uncertain engineering environments, and frequent external disturbances during project investment and construction. These problems make it difficult for traditional risk management methods to cover the dynamic risk chain throughout the whole process, from construction preparation and equipment deployment to operation and maintenance support [2]. Especially against the backdrop of the rapid development of renewable energy, the risk types in project investment are constantly evolving; for example, increasingly extreme climate, swiftly changing equipment technology paths, and rising policy compliance costs all place higher requirements on the intelligence and adaptability of risk early warning systems [3-5]. Therefore, constructing a big data mining-based intelligent risk assessment model has become a key path to improving the scientific soundness of investment decisions and the power grid enterprises' resilience governance
capabilities [6, 7]. In the context of power market liberalization, the continuous increase in the proportion of renewable energy has made the management of location-related risks caused by network congestion increasingly important. Improving the ability to model location-related risks has become the core foundation for supporting project financing and investment feasibility assessment [8].

In recent years, artificial intelligence (AI) technologies have made remarkable progress in risk identification, modeling, and prediction. Model architectures represented by graph neural networks (GNN), attention mechanisms, and deep semantic modeling have gradually been applied to financial risk control and energy dispatching [9, 10]. Some studies have attempted to introduce machine learning (ML) methods into the power engineering field, including using eXtreme Gradient Boosting (XGBoost) to classify and identify construction anomalies, or employing a convolutional neural network (CNN) to predict trends in construction period delays [11]. However, existing methods generally suffer from shortcomings such as a single model structure, weak data fusion capability, and difficulty in explaining cross-modal causal paths; these methods cannot effectively support power grid enterprises in achieving full-chain risk perception, dynamic quantification, and structural early warning in a multi-source data environment. Therefore, there is an urgent need to construct a multi-modal, composite risk assessment system for power grid engineering scenarios.

To this end, this study proposes the GridRiskNet model based on big data mining and constructs a fusion mechanism for structured data, unstructured text, and spatiotemporal data, thereby realizing comprehensive modeling and dynamic evaluation of investment risks in power grid engineering projects. The study's main innovations encompass:
(1) Proposing a two-stage GridRiskNet model architecture: it integrates XGBoost and Light Gradient Boosting Machine (LightGBM) for risk capture and models the propagation process of risks in the power grid topology through a Graph Attention Recurrent Neural Network (GA-RNN).
(2) Introducing a Spatio-Temporal Graph Convolutional Network (ST-GCN) and cross-modal attention mechanisms: these enhance the model's ability to express meteorological disturbances and regional structural information.
(3) Constructing a risk principal component projection index system based on Principal Component Analysis (PCA): it achieves structural clustering and projection analysis of high-dimensional risk samples and supports the differentiated regional risk management needs of power grid enterprises.

Overall, the specific research question is whether multimodal data fusion and risk propagation modeling methods can enhance the comprehensive capabilities of risk classification, propagation path identification, and uncertainty quantification in complex power grid engineering projects. The target outcome is to achieve a comprehensive portrayal of investment risks in power grid engineering projects by constructing a composite model that integrates structured, spatiotemporal, and text data. The study also aims to verify the advantages of the proposed method in terms of risk identification accuracy, propagation path reducibility, and risk distribution stability, thereby supporting power grid enterprises in risk early warning and decision optimization.

2 Related work

With the in-depth application of AI technologies and big data analysis methods in engineering management, investment project risk assessment has gradually shifted from traditional static analysis to intelligent prediction and dynamic modeling. Aiming at the insufficiency of risk assessment for manufacturing investments, Dong and Li proposed combining expert experience with big data mining to construct project risk indices and integrating CNN with Long Short-Term Memory (LSTM) for predictive modeling. In multiple sliding window tests, the model achieved a Receiver Operating Characteristic (ROC) value of 0.9366 and an average accuracy of 94.95%, demonstrating high prediction precision [12]. Loseva et al., facing the risk assessment task of regional franchising projects, constructed a big data-based credit rating model by combining the SPARK information system with ML methods, and verified the model's robustness in identifying abnormal risks through Spearman correlation and confusion matrices [13]. These studies have provided useful insights into introducing composite modeling methods and integrating expert judgment with data-driven mechanisms, gradually promoting the development of investment risk assessment towards intelligence and systematization.

Over the years, methods such as GNN, deep clustering, and multi-criteria decision-making have been widely introduced into investment evaluation and project classification, further enhancing the structural cognitive ability of risk assessment. Mostofi et al. constructed a construction project investment framework based on graph attention networks; this framework achieved a classification accuracy of over 98% in three sub-networks of region, country, and financing model, demonstrating the advantages of graph structure in modeling investment decision-making relationships [14]. Qi used regularized topic models and graph clustering methods to construct a financial investment "behavior circle", mapping customer behaviors to a latent semantic space and realizing risk classification of financial communities and investment plan recommendations through subgraph mining [15]. Moreover, Luo and Zhu proposed a deep neural network (DNN) model based on transfer learning for regional investment risk assessment; this model maintained high prediction accuracy (up to 92%) with insufficient samples, demonstrating the potential of deep learning for unbalanced data problems [16]. These studies all reflect the integration trend of risk assessment models in recent years towards deep representation learning, multi-layer decision-making
structures, and complex graph relationship modeling. Although existing studies have made positive progress in risk modeling methods, index system construction, and model accuracy improvement, three main deficiencies remain. First, most current models focus on classification or regression prediction of risk probability, lacking the ability to model regional structural propagation characteristics. Second, the heterogeneity of multi-source data has not been fully utilized, and a unified representation for structured, spatiotemporal, textual, and other multimodal information has not been formed. Third, the interpretability and quantifiability of risk structure evolution must be enhanced, making it difficult to support dynamic scheduling and regional risk management of complex systems such as power grid projects [17]. In response to the above shortcomings, this study proposes a grid engineering project investment risk management system based on big data mining: the GridRiskNet model. This model reveals the changing trends of high-dimensional risk structures and supports grid enterprises in accurately perceiving and dynamically controlling investment risks across regions and time scales.

3 GridRiskNet model based on big data mining

3.1 Realization process of the GridRiskNet model

The proposed GridRiskNet model realizes intelligent assessment of investment risks in power grid engineering projects based on multi-source heterogeneous data fusion and a hybrid ML architecture. It first establishes a multimodal data preprocessing layer. For structured data (such as project budgets and equipment parameters), an adaptive normalization method is used to unify dimensions, ensuring the consistency of feature scales. For unstructured text data (including engineering logs and bidding documents), a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model is utilized to deeply extract semantic features, enhancing the risk perception ability of textual information. For spatiotemporal data (such as construction trajectories and meteorological records), ST-GCN is introduced to jointly encode complex environmental features along two dimensions: spatial dependence and temporal dynamics [18, 19]. In the feature fusion stage, a cross-modal attention mechanism is designed that adaptively learns the weight relationships between different data modalities. This mechanism effectively integrates multi-source features and generates unified, dense, high-dimensional risk representation vectors, laying the foundation for multi-dimensional risk modeling [20].

At the core of modeling, GridRiskNet adopts a two-stage hybrid modeling framework. In the first stage, the improved XGBoost and LightGBM models run in parallel to jointly perform risk prediction on the high-dimensional risk representation vectors. Specifically, XGBoost integrates a dynamic feature selection mechanism, which dynamically updates feature importance indices based on sliding-window statistics to enhance the response capability to dynamic risk factors. LightGBM incorporates a time-series-aware splitting criterion to strengthen the detection of time-series anomalies such as project schedule delays. The two models output the prediction probabilities of risk categories (i.e., risk probability vectors after Softmax) and sequences of feature importance scores [21].

In the second stage, GA-RNN is used as a meta-model whose core innovation lies in fusing the dual output information from the first stage. Specifically, GA-RNN takes the risk probability vectors of XGBoost and LightGBM as its main input; simultaneously, it introduces their feature importance score sequences as auxiliary features to form a comprehensively fused feature matrix. This matrix contains the risk prediction results from the previous stage and also explicitly integrates the influence weights of features on the model output, thereby enhancing the ability to perceive risk propagation mechanisms [22]. Subsequently, based on this matrix, GA-RNN introduces a risk propagation graph structure and accurately models the transmission relationships between risk factors through an adjacency matrix. Moreover, it uses graph attention mechanisms and recurrent neural network (RNN) units to dynamically learn the key nodes and main channels in risk propagation paths, extracting high-order interaction features.

The entire GridRiskNet model jointly optimizes the classification cross-entropy loss, the risk propagation graph reconstruction error, and a feature stability regularization term through an end-to-end training strategy. Finally, the model outputs a multi-dimensional risk assessment matrix covering risk probability distribution, potential impact range, and structural features. The whole system adopts an online incremental learning mechanism that continuously absorbs real-time data streams to dynamically update model parameters, achieving high adaptability and continuous tracking of the risk environment of power grid engineering projects. The implementation process and pseudocode of GridRiskNet are illustrated in Figures 1 and 2.
Figure 1: The implementation process of GridRiskNet
class GridRiskNet:
    def __init__(self, config):
        self.config = config
        self.preprocessor = MultiModalPreprocessor(config)
        self.feature_fusion = CrossModalAttention(config)
        self.first_stage = HybridEnsembleModels(config)
        self.risk_graph = RiskPropagationGraph(config)
        self.second_stage = GARNNMetaModel(config, self.risk_graph)

    def train(self, dataset):
        features = self.preprocessor.process(dataset)
        fused_features = self.feature_fusion(features)
        first_stage_preds = self.first_stage.train(fused_features, dataset.labels)
        self.second_stage.train(first_stage_preds, fused_features, dataset.labels)

        for epoch in range(self.config.epochs):
            preds = self.predict(dataset)
            loss = self._calculate_loss(preds, dataset.labels)
            self._update_models(loss)

    def predict(self, dataset):
        features = self.preprocessor.process(dataset)
        fused_features = self.feature_fusion(features)
        first_stage_preds = self.first_stage.predict(fused_features)
        return self.second_stage.predict(first_stage_preds, fused_features)

    def update_with_new_data(self, new_data):
        features = self.preprocessor.update_and_process(new_data)
        self.first_stage.update(features, new_data.labels)
        first_stage_preds = self.first_stage.predict(features)
        self.second_stage.update(first_stage_preds, features, new_data.labels)

class MultiModalPreprocessor:
    def process(self, dataset):
        return {
            'structured': self._process_structured(dataset.structured),
            'text': self._process_text(dataset.text),
            'spatiotemporal': self._process_spatiotemporal(dataset.spatiotemporal)
        }

    def _process_structured(self, data):
        return AdaptiveNormalization(data)

    def _process_text(self, data):
        return FineTuneBERT(self.bert_model, data)

    def _process_spatiotemporal(self, data):
        return STGCN(self.stgcn_params).forward(data)

class CrossModalAttention:
    def __call__(self, features):
        weights = self._compute_attention_weights(features)
        return weighted_sum(features, weights)

class HybridEnsembleModels:
    def __init__(self, config):
        self.xgboost = ImprovedXGBoost(config)
        self.lightgbm = ImprovedLightGBM(config)

    def train(self, features, labels):
        xgb_preds = self.xgboost.train(features, labels)
        lgbm_preds = self.lightgbm.train(features, labels)
        return combine_predictions(xgb_preds, lgbm_preds)

class RiskPropagationGraph:
    def __init__(self, config):
        self.adj_matrix = self._construct_adjacency_matrix(config.risk_factors)

    def _construct_adjacency_matrix(self, risk_factors):
        # Construct adjacency matrix based on domain knowledge or data learning
        pass

class GARNNMetaModel:
    def train(self, first_stage_preds, features, labels):
        # Train GA-RNN model
        pass

    def predict(self, first_stage_preds, features):
        # Predict risk assessment matrix
        pass
Figure 2: The pseudocode of GridRiskNet
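As a library-level illustration of the first-stage ensemble sketched in Figure 2 (not the authors' implementation), the parallel XGBoost/LightGBM predictions and feature-importance sequences that feed the GA-RNN meta-model could be assembled as follows; all data and hyperparameters are placeholders:

    import numpy as np
    import xgboost as xgb
    import lightgbm as lgb

    # X is a stand-in for the fused risk representation vectors, y for the risk-category labels.
    X, y = np.random.rand(500, 32), np.random.randint(0, 3, 500)

    xgb_clf = xgb.XGBClassifier(n_estimators=200, objective="multi:softprob").fit(X, y)
    lgb_clf = lgb.LGBMClassifier(n_estimators=200, objective="multiclass").fit(X, y)

    # Softmax probability vectors plus feature-importance scores form the meta-features
    # that the second-stage meta-model consumes.
    meta_features = np.hstack([
        xgb_clf.predict_proba(X),
        lgb_clf.predict_proba(X),
        np.tile(xgb_clf.feature_importances_, (len(X), 1)),
        np.tile(lgb_clf.feature_importances_, (len(X), 1)),
    ])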
3.2 Mathematical modeling principles of the GridRiskNet model

Figure 1 shows the complete implementation process of the GridRiskNet model, covering the entire flow from a user-requested risk assessment to model output and continuous updating. The model is built on multi-source heterogeneous data, fusing structured, unstructured, and spatiotemporal information, and achieves intelligent prediction of power grid project risks through multi-stage ML and graph modeling strategies. The key computational steps are described mathematically below.

In the data preprocessing stage, the structured input data is first normalized. Let the original data matrix be

$\mathbf{X}_s \in \mathbb{R}^{n \times d_s}$  (1)

where $\mathbf{X}_s$ contains $n$ records, each with $d_s$ structured features. Normalization is computed as

$\tilde{\mathbf{X}}_s = \dfrac{\mathbf{X}_s - \mu_s}{\sigma_s + \epsilon}$  (2)

where $\mu_s$ is the column-wise mean vector, $\sigma_s$ is the column-wise standard deviation (SD), and $\epsilon$ is a small positive number that prevents the denominator from being zero. This processing gives the model numerical consistency across features of different dimensions.

For unstructured text data $\mathcal{T} = \{t_1, t_2, \dots, t_m\}$, semantic features are extracted with a fine-tuned BERT model, whose output is

$\mathbf{H}_t = \mathrm{BERT}(\mathcal{T}) = [\mathbf{h}_1; \mathbf{h}_2; \dots; \mathbf{h}_m], \quad \mathbf{h}_i \in \mathbb{R}^{d_t}$  (3)

where $\mathbf{h}_i$ is the semantic vector of the $i$-th text, with dimension $d_t$. This step preserves the semantic relationships between text contexts and forms an important basis for the model to recognize risk semantics.

Spatiotemporal data, including trajectories and meteorology, is expressed as

$\mathbf{X}_{st} \in \mathbb{R}^{T \times N \times F}$  (4)

where $T$ is the number of time steps, $N$ is the number of spatial nodes (such as site numbers), and $F$ is the spatiotemporal feature dimension of each node. ST-GCN is used for modeling, with the core propagation equation

$\mathbf{Z}^{(l+1)} = \sigma\left(\sum_{k=0}^{K} \mathbf{A}_k \mathbf{Z}^{(l)} \mathbf{W}_k\right)$  (5)

where $\mathbf{A}_k$ is the adjacency matrix of order $k$, $\mathbf{Z}^{(l)}$ is the node representation of the $l$-th layer, $\mathbf{W}_k$ is the weight matrix, and $\sigma$ is the activation function. This network structure captures the coupling between spatial topology and temporal evolution.

In the feature fusion stage, the model introduces a cross-modal attention mechanism to automatically aggregate multi-source information. Let two modal features be $\mathbf{F}_i$ and $\mathbf{F}_j$; their attention weights are calculated as

$\alpha_{i,j} = \dfrac{\exp(\mathbf{F}_i^{\top} \mathbf{W}_a \mathbf{F}_j)}{\sum_k \exp(\mathbf{F}_i^{\top} \mathbf{W}_a \mathbf{F}_k)}$  (6)

After fusion, a unified risk representation vector is obtained:

$\mathbf{F}_{fusion} = \sum_j \alpha_{i,j} \cdot \mathbf{F}_j$  (7)

This mechanism enables the model to automatically learn the most discriminating risk signal source when faced with heterogeneous features and semantic diversity.
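A minimal PyTorch sketch of the cross-modal attention fusion in Eqs. (6)-(7) is given below; the bilinear scoring matrix, the dimensions, and the representation of each modality as a single vector are simplifying assumptions:

    import torch
    import torch.nn as nn

    class CrossModalAttentionFusion(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.W_a = nn.Parameter(torch.randn(dim, dim) * 0.01)   # bilinear scoring matrix

        def forward(self, query, modalities):
            # query: (dim,) anchor modality F_i; modalities: list of (dim,) vectors F_j
            feats = torch.stack(modalities)                 # (m, dim)
            scores = query @ self.W_a @ feats.t()           # F_i^T W_a F_j for each modality j
            alpha = torch.softmax(scores, dim=0)            # Eq. (6)
            return (alpha.unsqueeze(1) * feats).sum(dim=0)  # Eq. (7): fused representation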
The hybrid modeling framework is divided into two stages. In the first stage, the improved XGBoost and LightGBM models are run in parallel. The objective function of XGBoost reads

$\mathcal{L}_{xgb} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$  (8)

where $\hat{y}_i = \sum_{k=1}^{K} f_k(\mathbf{x}_i)$ is the predicted value of sample $i$ and $\Omega(f_k) = \gamma T_k + \frac{1}{2}\lambda \lVert \omega_k \rVert^2$ is the regularization term of the $k$-th tree.

To adapt to dynamic changes over time, XGBoost integrates a sliding-window statistical module that dynamically adjusts feature importance:

$I_j^{(t)} = \sum_{s=t-w}^{t} \Delta G_j^{(s)}$  (9)

where $\Delta G_j^{(s)}$ is the gain change of the $j$-th feature at time step $s$. $I_j^{(t)}$ is a dynamic feature importance index within the XGBoost stage that reflects gain changes inside the sliding window; it is mainly applied to internal feature selection and dynamic weight adjustment of the first-stage model.

LightGBM introduces a time-series-aware splitting criterion to strengthen anomaly recognition. Let the time-series samples be $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T\}$; the splitting gain is defined as

$\mathcal{G}_j = \sum_{t=1}^{T} w_t \cdot \left[ \dfrac{\left(\sum_{i \in L_t} g_i\right)^2}{\sum_{i \in L_t} h_i + \lambda} + \dfrac{\left(\sum_{i \in R_t} g_i\right)^2}{\sum_{i \in R_t} h_i + \lambda} \right]$  (10)

where $g_i$ and $h_i$ are the first and second derivatives of the loss, $L_t$ and $R_t$ are the left and right sample sets of the current split, and $w_t = e^{-\beta(T-t)}$ is the time-decay weight.

In the second stage, GA-RNN is used to capture high-order risk paths. Its node state is updated as

$\mathbf{h}_i^{(t)} = \mathrm{GRU}\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{h}_j^{(t-1)}, \; \mathbf{h}_i^{(t-1)}\right)$  (11)

where $\mathcal{N}(i)$ is the neighbor set of node $i$ and $\alpha_{ij}$ is the edge weight under the graph attention mechanism:

$\alpha_{ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_k]\right)\right)}$  (12)

Finally, the system integrates three objectives, namely classification performance, graph structure reconstruction, and feature stability, by jointly optimizing the overall loss function, where $\mathcal{L}_{ce}$ is the cross-entropy loss:

$\mathcal{L}_{total} = \mathcal{L}_{ce} + \lambda_1 \cdot \mathcal{L}_{graph} + \lambda_2 \cdot \mathcal{L}_{reg}$  (13)

The graph structure consistency loss is

$\mathcal{L}_{graph} = \lVert \mathbf{A} - \hat{\mathbf{A}} \rVert_F^2$  (14)

and the feature-perturbation regularization term reads

$\mathcal{L}_{reg} = \sum_{j=1}^{d} \mathrm{Var}\left(\nabla_{\mathbf{x}_j} \hat{y}\right)$  (15)

At the system deployment level, GridRiskNet adopts an online incremental learning mechanism. Let the current parameters be $\theta_t$; after receiving a new sample $(\mathbf{x}_t, y_t)$ the model is updated as

$\theta_{t+1} = \theta_t - \eta \cdot \nabla_{\theta} \mathcal{L}(\mathbf{x}_t, y_t; \theta_t)$  (16)

where $\eta$ is the learning rate and $\nabla_{\theta}$ is the gradient operator. This mechanism ensures that the model can adapt its parameters in a dynamic risk environment.
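The graph-attention node update of Eqs. (11)-(12) can be sketched as follows; this is an illustrative, loop-based rendering with assumed dimensions and a plain neighbour list, not the GA-RNN implementation used in the experiments:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphAttentionStep(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.W = nn.Linear(dim, dim, bias=False)     # shared projection W
            self.a = nn.Parameter(torch.randn(2 * dim))  # attention vector a
            self.gru = nn.GRUCell(dim, dim)              # recurrent node-state update

        def forward(self, h, neighbours):
            # h: (num_nodes, dim) node states; neighbours: dict {node index: [neighbour indices]}
            Wh = self.W(h)
            new_h = h.clone()
            for i, nbrs in neighbours.items():
                cat = torch.cat([Wh[i].expand(len(nbrs), -1), Wh[nbrs]], dim=1)
                e = F.leaky_relu(cat @ self.a)                   # logits of Eq. (12)
                alpha = torch.softmax(e, dim=0)                  # attention coefficients
                msg = (alpha.unsqueeze(1) * Wh[nbrs]).sum(0)     # aggregated neighbour message
                new_h[i] = self.gru(msg.unsqueeze(0), h[i].unsqueeze(0)).squeeze(0)  # Eq. (11)
            return new_h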
4 Experimental analysis of GridRiskNet model project investment risk management based on big data mining

4.1 Data used in the study

To verify the risk management capability of the GridRiskNet model for power grid enterprise engineering projects, the study uses three core public datasets for experimental validation and designs a fusion scheme for data heterogeneity. First, the structured data adopts the U.S. Energy Information Administration (EIA) power infrastructure dataset (https://www.eia.gov/electricity/data.php). Its API interface screens power grid engineering project data from 2018 to 2023, including budget, construction period, equipment models, and other fields. After extracting the original CSV-format data using Python's eia-python library, adaptive normalization is performed to eliminate dimension differences, and the records are associated with subsequent spatiotemporal data through project IDs and date fields. Second, the spatiotemporal data uses the National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network-Daily (GHCN-Daily) dataset (https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00861). Daily values of temperature, precipitation, and wind speed are downloaded, and stations are matched to the projects' geographic coordinates. The rnoaa toolkit converts them into spatiotemporal tensors, from which meteorological risk features are extracted through ST-GCN encoding. The spatial topology data is obtained from the OpenStreetMap power network dataset (https://wiki.openstreetmap.org/wiki/Power_networks). The OSMnx library extracts GIS data of substations and transmission lines, constructing an adjacency matrix to model the physical connections of the power grid. For unstructured text data, engineering accident reports from 2018 to 2023 corresponding to EIA projects are manually screened from the Federal Energy Regulatory Commission (FERC) engineering accident report library (https://elibrary.ferc.gov/eLibrary/search). After the text is parsed with Apache Tika, it is fed into the fine-tuned BERT to generate semantic vectors.

The following fusion strategies are adopted to address the heterogeneity of multi-source data. 1) Temporal alignment: all data is uniformly converted to Universal Time Coordinated (UTC) timestamps and aggregated at a granularity of 1 day. 2) Spatial alignment: meteorological stations, power grid nodes, and engineering sites are associated through GIS coordinate matching (error <1 km). 3) Consistency of feature encoding: structured data is normalized to [0, 1], text vectors are unified into 768 dimensions via BERT, and spatiotemporal data is compressed into 256-dimensional features through ST-GCN. 4) Cross-modal attention mechanisms automatically learn the weights of each modality, assigning higher attention scores to extreme meteorological text descriptions (such as "hurricane damage"). The specific process of importing data into GridRiskNet is presented in Figure 3.
Figure 3: The specific process of importing data into GridRiskNet
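The temporal-alignment step of the fusion scheme (UTC conversion and aggregation at a one-day granularity) can be sketched with pandas as below; the column names are hypothetical and only stand in for the actual schema:

    import pandas as pd

    def align_daily(df, time_col="timestamp", key="project_id"):
        # Convert every source to UTC and aggregate to one record per project per day.
        df = df.copy()
        df[time_col] = pd.to_datetime(df[time_col], utc=True)   # unify to UTC
        df["day"] = df[time_col].dt.floor("D")                  # 1-day granularity
        return df.groupby([key, "day"]).mean(numeric_only=True).reset_index()

    # Weather, project, and grid-node tables would then be joined on (project_id, day)
    # after GIS coordinate matching assigns each record to its nearest project site.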
4.2 Analysis of the GridRiskNet model's risk management capability for power grid enterprise engineering projects

The study analyzes the risk management capability of the GridRiskNet model for power grid enterprise engineering projects from two aspects. First, at the level of risk probability distribution, the probability of risks such as budget overrun and construction period delay is evaluated based on structured data and spatiotemporal features [23]. Second, at the level of potential impact scope, the propagation path of risks in the power grid topology is analyzed with the GNN to identify high-risk nodes and the surrounding areas they may affect. The model feeds the fused multi-dimensional features into the two-stage modeling framework and outputs a risk assessment matrix containing both types of indices, supporting refined and structured management and decision-making for power grid project risks [24, 25]. The key indices and evaluation criteria of the analysis are given in Tables 1 and 2.
Table 1: Explanation of key indices for analysis using the GridRiskNet model

Analytical dimension | Index | Data source / calculation method | Description
Risk probability distribution analysis | Risk PCA Projection Score | Principal component score of the high-dimensional risk vector output by the GridRiskNet model after PCA dimensionality reduction | Reflects the position of samples in the risk principal component space; used to identify high-risk clusters or structurally abnormal samples
Risk probability distribution analysis | Time-series Anomaly Frequency | Number of abnormal events captured by LightGBM | Monitors the frequency of abnormal progress
Risk probability distribution analysis | Model Confidence Score | Maximum probability value of the Softmax output | Evaluates the credibility of the model output
Risk probability distribution analysis | Risk Coefficient of Variation | Ratio of the SD to the mean of the risk probability distribution | Assesses the dispersion of the risk probability distribution; the greater it is, the higher the risk instability
Risk probability distribution analysis | Risk Importance Index | Comprehensive weighted scores across multiple dimensions | Represents the strength of risk influence
Risk probability distribution analysis | Risk Entropy | Information entropy of the risk probability distribution | Degree of uncertainty in the risk evaluation results
Analysis of the potential influence range | Risk Propagation Path Length | Critical path length identified by GA-RNN | Length and complexity of the risk propagation path
Analysis of the potential influence range | Node Vulnerability Score | Weighted average of the affected probability of each node in the GNN | Reflects the vulnerability of nodes in the power grid
Analysis of the potential influence range | Risk Impact Radius | Based on propagation path depth and the spatial adjacency matrix of the graph structure | Indicates the physical scope of risk propagation
Table 2: Criteria for determining key indices in the GridRiskNet model analysis

| Index | Type | Criteria |
|---|---|---|
| Risk PCA Projection Score | Secondary calculation | [0, 2) Low projection; [2, 5) Medium projection; ≥5 High projection, tending to abnormal samples or extreme types |
| Time-series Anomaly Frequency | Model output | [0, 2) Normal; [2, 5) Early warning; ≥5 Abnormal |
| Model Confidence Score | Model output | [0.9, 1] High credibility; [0.7, 0.9) Medium credibility; <0.7 Low credibility |
| Risk Coefficient of Variation | Secondary calculation | [0, 0.3) Stable; [0.3, 0.6) Fluctuating; ≥0.6 Highly unstable |
| Risk Importance Index | Secondary calculation | [0, 40) Secondary; [40, 70) Important; [70, 100] Critical |
| Risk Entropy | Secondary calculation | [0, 1) Low uncertainty; [1, 2) Medium; ≥2 High |
| Risk Propagation Path Length | Model output | [1, 3) Local; [3, 6) Regional; ≥6 Global |
| Node Vulnerability Score | Model output | [0, 0.4) Low; [0.4, 0.7) Medium; [0.7, 1] High |
| Risk Impact Radius | Secondary calculation | [0, 5) Station level; [5, 20) Line level; ≥20 Regional level |
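As an illustration of how the criteria in Table 2 can be applied, the following small helper functions (hypothetical names, not part of GridRiskNet's released code) map three of the indices to the categories defined above; the example values at the end are taken from results reported later in the paper.

```python
def grade_pca_projection(score):
    # Thresholds from Table 2: [0, 2) low, [2, 5) medium, >=5 high projection
    if score < 2:
        return "Low projection"
    if score < 5:
        return "Medium projection"
    return "High projection (abnormal or extreme)"

def grade_risk_entropy(h):
    # [0, 1) low uncertainty, [1, 2) medium, >=2 high
    if h < 1:
        return "Low uncertainty"
    if h < 2:
        return "Medium uncertainty"
    return "High uncertainty"

def grade_impact_radius(r_km):
    # [0, 5) station level, [5, 20) line level, >=20 regional level
    if r_km < 5:
        return "Station level"
    if r_km < 20:
        return "Line level"
    return "Regional level"

# Example: 2023 projection score, full-model risk entropy, 2023 TIPG impact radius
print(grade_pca_projection(7.779), grade_risk_entropy(0.89), grade_impact_radius(26.4))
```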
The equations of the indices in Table 2 that involve secondary calculation are as follows.

(1) Risk PCA Projection Score

The "Risk PCA Projection Score" measures the position of a sample in the dominant risk structure within the risk feature space, revealing the main variation trends in complex multi-dimensional risk features. Specifically, this index is calculated with the PCA method. First, the annual high-dimensional risk features (such as cost overrun risk and environmental and climate pressure) are standardized. Then, the first K principal component directions are extracted and the sample's projection value in the principal component space is measured through eigenvalue weighting. This score reflects the degree of variance contribution of the sample along the principal component axes of risk, rather than a simple sum of the scores of each risk factor. Because the statistical distributions of the risk features differ from year to year, the index changes with the year; it comprehensively reflects the overall trend of the risk structure of power grid projects in the current year and potential abnormal clustering characteristics. The calculation is expressed as:

$$ s_i = \sum_{k=1}^{K} \lambda_k \left( \mathbf{u}_k^{\top} (\mathbf{x}_i - \boldsymbol{\mu}) \right)^2 \qquad (17) $$

$\mathbf{x}_i$ represents the high-dimensional risk feature vector of the $i$-th sample, $\boldsymbol{\mu}$ is the sample mean vector, $\mathbf{u}_k$ denotes the eigenvector of the $k$-th principal component direction, $\lambda_k$ is the eigenvalue of the $k$-th principal component, and $K$ is the number of selected principal components.

(2) Risk Coefficient of Variation

This index measures the relative dispersion of the risk probability distribution and is an important indicator of risk instability. It describes the overall fluctuation range of the risk probabilities through the ratio of the standard deviation (SD) of the risk probabilities to their mean. A higher value indicates a more dispersed risk probability distribution and stronger overall instability:

$$ CV = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - \bar{p})^2}}{\bar{p} + \epsilon} \qquad (18) $$

$p_i$ denotes the predicted probability of Class $i$ risk, $\bar{p}$ is the mean of the risk probabilities, and $n$ is the total number of risk categories.

(3) Risk Entropy

"Risk Entropy" measures the degree of uncertainty in the risk probability distribution, reflecting the discreteness and unpredictability of the risk results. Based on information entropy theory, this index reveals the potential risk mixture in the system by computing the entropy of the probabilities of all risk categories. A higher risk entropy indicates more uncertainty in the system, which helps to identify complex and unpredictable risk scenarios:

$$ H = -\sum_{i=1}^{n} p_i \log_2 (p_i + \epsilon) \qquad (19) $$

$H$ denotes the information entropy of the risk distribution.

(4) Risk Importance Index

This index quantifies the comprehensive contribution of each risk feature to the overall risk assessment results. It reflects the importance level of each feature through the weighted accumulation of the feature's impact on the model loss, combined with the model weights and normalized averaging. Features with higher values play a greater role in the overall risk decision-making:

$$ RI_j = \frac{1}{T} \sum_{t=1}^{T} \frac{w_j^{(t)} \cdot \Delta L_j^{(t)}}{\sum_{k=1}^{d} \Delta L_k^{(t)}} \qquad (20) $$

$RI_j$ represents the risk importance index of the $j$-th feature, a unified, global-level index of the entire GridRiskNet framework computed from the feature weights and loss impacts during global model training. $T$ is the number of model iterations (or averaging steps); $w_j^{(t)}$ is the model weight of the $j$-th feature in the $t$-th iteration; $\Delta L_j^{(t)}$ is the influence of the $j$-th feature on the loss function; $d$ is the total number of features.

(5) Risk Impact Radius

This index evaluates the spatial propagation range of risks in the power grid graph structure and is the key index for measuring the physical scope affected by risks. It calculates the average impact radius of all risk source nodes in the network from the power grid topology, the geographical distance between nodes, and the risk propagation probability. A larger value indicates a wider spatial propagation range of risk events, which supports regional risk impact analysis:

$$ R = \frac{1}{N_s} \sum_{i=1}^{N_s} \sum_{j=1}^{N} t_{ij} \cdot d_{ij} \cdot p_{ij} \qquad (21) $$

$N_s$ is the number of risk source nodes, $N$ is the total number of nodes in the graph, $t_{ij}$ is the adjacency relation between nodes $i$ and $j$ (1 means connected), $d_{ij}$ is the geographical distance between the nodes, and $p_{ij}$ is the risk propagation probability from node $i$ to node $j$.

Figure 4 presents the pseudocode of the index implementations involving secondary calculation.
```python
import math
import numpy as np

# 1. Risk PCA Projection Score (Eq. 17)
def compute_risk_pca_projection_scores(X, K):
    mu = np.mean(X, axis=0)
    X_centered = X - mu
    cov_matrix = np.cov(X_centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    sorted_idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[sorted_idx][:K]          # top-K eigenvalues (lambda_k)
    eigenvectors = eigenvectors[:, sorted_idx][:, :K]  # top-K eigenvectors (u_k)
    # s_i = sum_k lambda_k * (u_k^T (x_i - mu))^2  (return step reconstructed per Eq. 17)
    projections = X_centered @ eigenvectors            # shape: (n_samples, K)
    return (projections ** 2) @ eigenvalues

# 2. Risk Coefficient of Variation (Eq. 18)
def compute_risk_cv(probabilities):
    mean_p = np.mean(probabilities)
    std_p = np.std(probabilities)
    epsilon = 1e-6
    return std_p / (mean_p + epsilon)

# 3. Risk Entropy (Eq. 19)
def compute_risk_entropy(probabilities):
    epsilon = 1e-6
    return -sum(p * math.log2(p + epsilon) for p in probabilities)

# 4. Risk Importance Index (Eq. 20)
def compute_risk_importance(weights, delta_losses):
    T = len(weights)       # number of iterations
    D = len(weights[0])    # number of features
    importance = [0.0] * D
    for j in range(D):
        for t in range(T):
            total_delta = sum(delta_losses[t])
            if total_delta == 0:
                continue
            importance[j] += weights[t][j] * delta_losses[t][j] / total_delta
        importance[j] /= T
    return importance

# 5. Risk Impact Radius (Eq. 21)
def compute_risk_impact_radius(adj_matrix, distance_matrix,
                               propagation_probs, source_nodes):
    N = len(adj_matrix)
    total_radius = 0.0
    for i in source_nodes:
        for j in range(N):
            if adj_matrix[i][j] == 1:
                total_radius += distance_matrix[i][j] * propagation_probs[i][j]
    return total_radius / len(source_nodes)
```
Figure 4: Pseudocode of index implementation involving secondary calculation
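For reference, the functions in Figure 4 can be exercised on toy inputs as follows; all values below are illustrative only and do not correspond to the experimental data.

```python
import numpy as np

# Toy data: 6 annual samples with 5 risk features (C1-C5)
X = np.random.rand(6, 5)
probs = [0.55, 0.20, 0.15, 0.07, 0.03]           # predicted class probabilities
weights = [[0.2, 0.3, 0.1, 0.25, 0.15]] * 3      # feature weights over 3 iterations
delta_losses = [[0.4, 0.1, 0.2, 0.2, 0.1]] * 3   # per-feature loss impacts

print("PCA projection scores:", np.round(compute_risk_pca_projection_scores(X, K=2), 3))
print("Coefficient of variation:", round(compute_risk_cv(probs), 3))
print("Risk entropy:", round(compute_risk_entropy(probs), 3))
print("Importance index:", np.round(compute_risk_importance(weights, delta_losses), 3))

# A 3-node toy graph with one risk source node (node 0)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
dist = np.array([[0.0, 3.0, 7.0], [3.0, 0.0, 0.0], [7.0, 0.0, 0.0]])
prop = np.array([[0.0, 0.6, 0.4], [0.6, 0.0, 0.0], [0.4, 0.0, 0.0]])
print("Impact radius:", compute_risk_impact_radius(adj, dist, prop, source_nodes=[0]))
```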
The experimental environment and key parameters are detailed in Table 3.
Table 3: Experimental environment and key parameter settings of the study

| Category | Configuration item | Parameter setting |
|---|---|---|
| Hardware environment | Computing platform | NVIDIA A100 (40 GB memory) × 4 |
| Hardware environment | CPU | AMD EPYC 7763 (64-core) |
| Hardware environment | Memory | 512 GB DDR4 |
| Software environment | Deep learning framework | PyTorch 1.12 + CUDA 11.6 |
| Software environment | GNN library | PyTorch Geometric 2.2.0 |
| Software environment | Traditional ML library | XGBoost 1.6 + LightGBM 3.3.2 |
| Software environment | NLP toolkit | HuggingFace Transformers 4.25 (BERT-base) |
| Model architecture | ST-GCN layers | 3 layers (hidden dimension = 256) |
| Model architecture | GA-RNN unit | Graph attention layer (8 heads) + GRU (hidden size = 512) |
| Model architecture | Cross-modal attention mechanism | Multi-head attention (4 heads, fusion dimension = 1024) |
| Training parameters | Batch size | 256 (structured data) / 32 (graph data) |
| Training parameters | Initial learning rate | 3e-4 (AdamW optimizer) |
| Training parameters | Regularization | L2 weight decay = 1e-5 + Dropout = 0.3 |
| Training parameters | Early stopping | Validation loss does not decrease for 10 consecutive rounds |
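As a rough, non-authoritative sketch of the cross-modal fusion configured in Table 3 (4 attention heads, 1024-dimensional fusion space, 768-dimensional BERT text vectors, 256-dimensional ST-GCN features), one possible PyTorch formulation is shown below; the 64-dimensional structured-feature input and the projection layers are assumptions for illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch: project each modality into a shared 1024-d space, then let a
    4-head attention layer weight the modalities (cf. Table 3)."""
    def __init__(self, text_dim=768, st_dim=256, struct_dim=64, fusion_dim=1024, heads=4):
        super().__init__()
        self.proj_text = nn.Linear(text_dim, fusion_dim)
        self.proj_st = nn.Linear(st_dim, fusion_dim)
        self.proj_struct = nn.Linear(struct_dim, fusion_dim)
        self.attn = nn.MultiheadAttention(fusion_dim, num_heads=heads, batch_first=True)

    def forward(self, text_vec, st_vec, struct_vec):
        # Stack the three modality tokens: (batch, 3, fusion_dim)
        tokens = torch.stack(
            [self.proj_text(text_vec), self.proj_st(st_vec), self.proj_struct(struct_vec)],
            dim=1,
        )
        fused, attn_weights = self.attn(tokens, tokens, tokens)
        # Pool across modalities; attn_weights exposes the per-modality attention scores
        return fused.mean(dim=1), attn_weights

# Example with random inputs
fusion = CrossModalFusion()
out, w = fusion(torch.randn(2, 768), torch.randn(2, 256), torch.randn(2, 64))
print(out.shape, w.shape)  # torch.Size([2, 1024]) torch.Size([2, 3, 3])
```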
Before the formal experiments, the study designs ablation experiments to verify the actual contribution of each core component of GridRiskNet and to quantitatively measure the impact of the different modules on the model's overall performance from a systematic perspective. Specifically, four ablation versions are set up by sequentially disabling the cross-modal attention mechanism, the GA-RNN risk propagation modeling module, the dynamic feature selection module, and the risk propagation graph reconstruction term in the joint loss function. All experiments maintain the same hyperparameter configuration on the complete dataset and focus on three groups of indices: risk classification performance (F1-score and Receiver Operating Characteristic - Area Under Curve (ROC-AUC)), risk propagation accuracy (Risk Impact Radius error), and uncertainty quantification ability (Risk Entropy). The experiment aims to clarify the mechanism of action of each module, especially its specific contribution to power grid risk transmission modeling, modal feature fusion, and risk stability control. The results of the ablation experiments are listed in Table 4.
Table 4: Ablation experimental results of the GridRiskNet model

| Ablation version | F1-Score | ROC-AUC | Risk Impact Radius error (km) | Risk Entropy |
|---|---|---|---|---|
| Full GridRiskNet | 0.892 | 0.962 | 4.8±0.9 | 0.89 |
| No cross-modal attention | 0.835 | 0.917 | 7.5±1.6 | 1.12 |
| No GA-RNN | 0.846 | 0.926 | 14.2±2.3 | 0.96 |
| No dynamic feature selection | 0.863 | 0.941 | 5.7±1.2 | 0.94 |
| No risk propagation graph reconstruction | 0.871 | 0.948 | 4.9±1.0 | 2.08 |
The results of the ablation experiments indicate that each module of GridRiskNet makes a significant contribution to model performance. The cross-modal attention mechanism is particularly crucial for classification performance; after it is disabled, the F1-score decreases by 6.4%, the ROC-AUC drops by 4.7%, and the Risk Entropy rises significantly, showing that this module strongly affects the collaborative perception of complex semantic and meteorological features. The GA-RNN risk propagation modeling module chiefly reduces the Risk Impact Radius error; after it is disabled, the error increases sharply to 14.2 km, verifying its core role in power grid topology modeling. The dynamic feature selection module mainly enhances the temporal sensitivity of the model; its removal leads to a significant drop in F1-score, although it has a limited impact on propagation errors. The risk propagation graph reconstruction term has a significant effect on suppressing prediction fluctuations and optimizing uncertainty quantification; its elimination causes a substantial rise in Risk Entropy. Overall, GridRiskNet achieves the unity of high performance and high robustness through the collaboration of the various modules, with all components being indispensable.

4.3 Analysis results of the GridRiskNet model on the risk management ability of power grid enterprises' engineering projects

4.3.1 Risk probability distribution analysis

GridRiskNet's annual Risk PCA Projection Score results for power grid enterprise engineering projects are summarized in Table 5.
Table 5: Annual Risk PCA Projection Score results

| Year | Cost overrun risk C1 | Ambient climate pressure C2 | Equipment technical risk C3 | Supply chain fluctuation C4 | Policy compliance risk C5 | Risk PCA Projection Score | Risk tendency |
|---|---|---|---|---|---|---|---|
| 2018 | 1.235 | 0.873 | -0.452 | 0.217 | 0.095 | 2.108 | Middle projection (structural abnormality) |
| 2019 | 0.892 | 0.654 | -0.128 | -0.304 | 0.062 | 1.546 | Low projection |
| 2020 | 2.874 | 1.982 | 1.235 | -0.873 | 0.517 | 4.856 | High projection (extreme type) |
| 2021 | 1.023 | 1.457 | 0.782 | 0.396 | -0.215 | 2.48 | Middle projection |
| 2022 | 3.125 | 2.769 | 2.014 | 1.358 | -0.947 | 5.894 | High projection (abnormal clustering) |
| 2023 | 4.562 | 3.217 | 3.058 | 2.146 | 1.372 | 7.779 | High projection (extreme anomaly) |
Table 5 shows that cost overrun risk (C1) and ambient climate pressure (C2) have always been the dominant risks, especially showing exponential growth after 2020. In 2023, C1 (4.562) increased by 269% compared with 2018 (1.235), which is highly consistent with the reality of global inflation and frequent extreme weather. The sudden turn positive (1.372) of policy compliance risk (C5) in 2023 to some extent reveals the surge of compliance costs brought by the deepening of the "double carbon" policy. Through the spatial distribution of principal components, the model reflects high-risk clustering scenarios such as C1-C3 in 2023, providing early warning of composite risks.

Based on the above analysis, the risk probability distribution analysis of grid enterprise engineering projects by GridRiskNet is organized, and the annual average results of the other indices are shown in Figure 5.
[Figure 5 is a multi-axis line chart over 2018-2023 plotting the annual averages of Time-series Anomaly Frequency, Model Confidence Score, Risk Coefficient of Variation, Risk Importance Index, and Risk Entropy.]
Figure 5: The annual average results of other indices in GridRiskNet risk probability distribution analysis
Note: The curves of each index correspond one-to-one with the corresponding color coordinate axes on the right
In Figure 5, regarding the frequency of time-series anomalies, the average annual growth rate of abnormal events during 2019-2023 reached 65.7%. The model objectively reflects the increasing complexity of risks through the continuous decline in confidence (from 0.912 to 0.632). The sudden increase in risk entropy (2.158) in 2020 preceded the peak of the importance index (83.47), indicating that GridRiskNet can capture the implicit correlations of risk factors through information entropy. The synchronous increase in the coefficient of variation (from 0.712 to 0.859) and risk entropy (from 2.547 to 2.981) after 2022 reveals a transformation of the risk distribution from centralized to discretized, which provides key evidence for power grid enterprises to optimize the allocation of risk reserve funds. The core advantage of the model lies in the quantitative modeling of the dynamic coupling relationship among the three dimensions of engineering anomalies, risk uncertainty, and impact degree. Meanwhile, it realizes full-chain risk assessment from "anomaly detection" to "impact prediction".

4.3.2 Analysis of potential influence range

The study divides the U.S. power grid into three major regions: the Eastern Interconnection Power Grid (EIPG), the Western Interconnection Power Grid (WIPG), and the Texas Interconnected Power Grid (TIPG). The EIPG covers the eastern, midwestern, and parts of the southern U.S. states, extending northward to eastern Canada. The WIPG covers most western U.S. states, connecting with western Canada in the north and reaching parts of Mexico in the south. The TIPG includes most of Texas. These regional grids are interconnected at a limited number of DC ties but mostly operate independently. Based on this, GridRiskNet's analysis results on the potential impact scope of power grid enterprise engineering projects are displayed in Figure 6.
[Figure 6 contains six panels, (a)-(f), each plotting Risk Propagation Path Length, Node Vulnerability Score, and Risk Impact Radius (km) for the EIPG, WIPG, and TIPG interconnected areas.]
Figure 6: Analysis of GridRiskNet's potential impact on power grid enterprise engineering projects ((a) 2018; (b)
2019; (c) 2020; (d) 2021; (e) 2022; (f) 2023)
Note: The curves of each index correspond one-to-one with the corresponding color coordinate axes on the right
Based on the index definitions and annual data, GridRiskNet demonstrates scientific rigor and structural insight in the analysis of potential impact ranges. First, for Risk Propagation Path Length, WIPG remains at a high level throughout the entire period, reaching 8.1 in 2023 and significantly exceeding the other regions. This gap is not accidental but reflects long-term structural characteristics, revealing the extensibility of transmission links in the western power grid due to complex terrain and diverse energy structures. Second, the changing trend of the Node Vulnerability Score is more enlightening; the scores of the three major power grids all rose sharply in 2020, with the average value doubling compared with the previous year. This synchronous surge aligns closely with the global external shock events of 2020, indicating that the model is highly sensitive to network vulnerability under systemic disturbances.

In addition, the Risk Impact Radius index essentially measures the physical diffusion capacity of risks from source nodes to the surrounding space; its calculation integrates network topology, geographical distance, and propagation probability. According to the data, WIPG's Risk Impact Radius rapidly increased from 10.8 km in 2021 to 25.7 km in 2022, and further to 35.2 km in 2023, a cumulative increase of over 225% in two years. TIPG also showed continuous expansion between 2022 and 2023, reaching 26.4 km in 2023, reflecting the significant cumulative effect of regional risk diffusion. This spatial diffusion trend is not caused by single-year fluctuations but by the accumulation of continuous transmission chains; in essence, power grid risks expand in scope through multiple rounds of transmission and cross-node amplification, which is especially obvious in scenarios with multiple overlapping risks. GridRiskNet can capture this phenomenon because of the deep coupling of its GNN and propagation probability mechanism, which dynamically tracks the evolution of risk paths and ranges in complex networks and thereby identifies the critical points and amplification effects of risk diffusion. It therefore has real value in regional risk monitoring and trend early warning. The capture of this cumulative diffusion trend reflects the model's structural sensitivity to "spatiotemporal overlapping risks", which far exceeds the descriptive capability of traditional static single indices.

4.3.3 Comparative analysis of GridRiskNet and other models

To comprehensively evaluate the GridRiskNet model's effectiveness in investment risk management of
power grid engineering projects, this study designs two types of comparative experiments. The first is a horizontal comparison with existing state-of-the-art (SOTA) models. It selects representative models in risk assessment, regional propagation modeling, and uncertainty quantification from recent years, including methods such as CNN-LSTM, to ensure fair comparison on a unified dataset with the same task indices. The comparison covers risk classification performance (F1-Score, ROC-AUC), regional propagation accuracy (Risk Impact Radius error), and uncertainty quantification ability (Risk Entropy) to reflect the model's comprehensive capabilities. The second is a detailed comparison with classic baseline models, namely individual methods such as XGBoost, LightGBM, ST-GCN, and BERT-BiLSTM. It focuses on the model's performance in robustness, spatiotemporal feature extraction, and anomaly detection, and highlights the advantages of GridRiskNet in multimodal data fusion, dynamic feature learning, and risk path modeling. The results of the two types of comparisons are exhibited in Tables 6 and 7.
Table 6: Comparison of the performance of GridRiskNet and SOTA models on the same dataset

| Model | Researchers | F1-Score↑ | ROC-AUC↑ | Risk Impact Radius error ±σ (km)↓ | Risk Entropy↓ |
|---|---|---|---|---|---|
| CNN-LSTM | Dong and Li (2025) | 0.724 | 0.892 | 28.3±4.1 | 1.87 |
| Investment framework based on graph attention networks | Mostofi et al. (2025) | 0.781 | 0.903 | 22.6±3.8 | 1.52 |
| Topic model clustering | Qi (2025) | 0.698 | 0.841 | - | 2.03 |
| DNN based on transfer learning | Luo and Zhu (2024) | 0.763 | 0.885 | - | 1.68 |
| GridRiskNet | The proposed model | 0.892 | 0.962 | 4.8±0.9 | 0.89 |
Table 7: Robustness comparison results of GridRiskNet and baseline models

| Model | F1-Score | Risk Impact Radius error ±σ (km) | Recall for delay anomaly detection |
|---|---|---|---|
| XGBoost | 0.712 | 32.5±6.2 | 0.683 |
| LightGBM | 0.735 | 29.8±5.4 | 0.721 |
| ST-GCN | 0.683 | 18.7±3.5 | 0.592 |
| BERT-BiLSTM | 0.698 | - | 0.654 |
| GridRiskNet | 0.892 | 4.8±0.9 | 0.937 |
The comparison experiment with SOTA models reveals that GridRiskNet achieves a considerable lead in risk classification, propagation modeling, and uncertainty quantification. Although the GAT-based investment framework performs well in traditional graph learning tasks, it cannot deeply integrate complex semantic features and meteorological data, leading to an underestimation of risks in some catastrophic events. In contrast, GridRiskNet fully captures the coupling relationship between accident texts and meteorological variables through the cross-modal attention mechanism and dynamic feature fusion, and is significantly superior to the other models in F1-score and ROC-AUC. Meanwhile, its GA-RNN structure can accurately model risk transmission paths on the power grid topology, greatly reducing the Risk Impact Radius error and verifying its ability to fit the physical characteristics of power grids. Regarding uncertainty control, GridRiskNet effectively suppresses prediction fluctuations in high-risk scenarios through the risk propagation graph reconstruction mechanism in the joint loss function, minimizing Risk Entropy and showing stronger stability of the risk distribution.

In the comparison with baseline models, GridRiskNet also demonstrates excellent robustness and overall advantages. Compared with XGBoost and LightGBM, GridRiskNet improves the F1-score well beyond the other models, showing strong adaptability in complex, dynamic data environments. Concerning regional propagation accuracy, the Risk Impact Radius error of GridRiskNet fluctuates very little and is far better than that of ST-GCN, which only considers spatiotemporal features, proving the effectiveness of its fusion of spatial topology and semantic information. Regarding time-series anomaly detection, GridRiskNet combines dynamic feature selection with time-series-aware splitting
strategies, notably improving recall and detecting potential abnormal risks earlier. Overall, GridRiskNet outperforms existing mainstream methods across multi-dimensional tasks with high accuracy and robustness, and it is also better suited to key links of power grid engineering risk management such as risk transmission, modal coupling, and dynamic prediction.

4.3.4 GridRiskNet training cost and efficiency analysis

Tests on computing cost and efficiency are conducted to evaluate the engineering practicality of GridRiskNet. The training efficiency in a complete production environment is tested on an NVIDIA A100×4 cluster, recording: (1) average convergence time in the training phase (in hours (h)); (2) maximum inference delay per sample in the inference phase (in milliseconds (ms)); (3) peak memory consumption (in gigabytes (GB)); and (4) training time per 0.01 F1-Score (in h). Under the condition of meeting the needs of offline batch processing and periodic risk monitoring in power grids, the practical controllability of GridRiskNet is thus measured. The analysis results are presented in Table 8.
Table 8: Training cost and efficiency of GridRiskNet and baseline models on the same dataset

| Model | Convergence time (h) | Maximum inference delay per sample (ms) | Peak memory consumption (GB) | Training time per 0.01 F1-Score (h) |
|---|---|---|---|---|
| XGBoost | 1.2 | 0.09 | 1.5 | 0.17 |
| LightGBM | 1.0 | 0.07 | 1.2 | 0.14 |
| ST-GCN | 8.5 | 0.36 | 5.1 | 1.25 |
| BERT-BiLSTM | 12.3 | 0.45 | 6.4 | 1.77 |
| GridRiskNet | 17.8 | 0.63 | 9.8 | 2.00 |
According to the results in Table 8, although GridRiskNet has a longer absolute training time (17.8 h) and a higher single-sample inference delay (0.63 ms) than the other models, its key index "training time per 0.01 F1-score" is 2.00 h, of the same order as that of BERT-BiLSTM (1.77 h) despite the much stronger final performance. This indicates that its high complexity effectively "exchanges for performance" with obvious non-linear returns. Moreover, the inference delay of 0.63 ms is still far below the acceptable threshold (usually at the second level) in offline power grid risk prediction, making it suitable for daily or even hourly scheduling scenarios. The memory consumption of GridRiskNet matches the typical Graphics Processing Unit configuration of power enterprises (<10 GB), making deployment feasible. Overall, although GridRiskNet has a higher training cost, it offers high performance returns, controllable inference, and affordable resource use, and is therefore feasible for practical engineering applications.

4.4 Discussion

It should be explained that the experimental data of this study are based on U.S. sources (EIA, NOAA, OSM). However, research on the investment risk of power grid engineering projects has a high degree of commonality and structural consistency: the core lies in the complexity of the investment process, construction environment, and risk chain of power grid projects, and is not limited to specific countries. Cost overrun, climate pressure, equipment technical failure, supply chain fluctuation, and policy compliance risks (C1-C5) are five key risks commonly faced by power grid projects worldwide. Among them, "policy compliance risk" is abstracted in the model as an index of institutional environment uncertainty that describes the impact of policy changes on project risks; it essentially gives a structured summary of policy volatility and does not depend on specific legal provisions. At the same time, GridRiskNet focuses on risk propagation mechanisms and multimodal feature fusion, and its methodology is a universal architecture for engineering projects worldwide. Therefore, even with U.S. data, the revealed coupling relationships and propagation mechanisms of multi-source heterogeneous risks have high reference value for Chinese power grid enterprises.

Additionally, the advantages of GridRiskNet over existing SOTA models are reflected not only in the superiority of the indices but also in innovative breakthroughs in methodological mechanisms. First, regarding risk classification, GridRiskNet introduces a cross-modal attention mechanism to deeply explore the coupling relationship between accident texts and meteorological features, effectively making up for the perception defects of traditional single-modal models in complex scenarios; this enables its F1-score and ROC-AUC to be significantly better than those of models such as GAT. Second, in regional propagation modeling, GridRiskNet builds on the GA-RNN structure and embeds a risk propagation graph reconstruction mechanism; it can dynamically identify key transmission paths in the power grid topology and accurately capture the risk diffusion process, thus minimizing the Risk Impact Radius error and demonstrating a high ability to fit the physical structure of the power grid. Third, for uncertainty quantification, the joint loss function of GridRiskNet integrates classification error, graph reconstruction error, and feature stability regularization terms, which helps to control prediction fluctuations in high-risk scenarios and reduces risk entropy to the lowest level. Compared with SOTA models that mainly rely on
traditional graph networks or single deep models, GridRiskNet realizes the collaborative optimization of structured, spatiotemporal, and semantic data. Its core innovation lies in the deep integration of three mechanisms: dynamic feature learning, propagation path modeling, and risk distribution stability. This not only improves model performance but also balances the complexity of risk perception, path interpretability, and prediction stability, giving the approach high practical value and theoretical promotion potential.

5 Conclusion

This study constructs the GridRiskNet risk management system based on big data mining around the intelligent management needs of investment risks in power grid enterprise engineering projects, and realizes the fusion modeling and dynamic evaluation of structured, unstructured, and spatiotemporal data. Through the two-stage modeling architecture, the model performs well in risk probability distribution identification and regional propagation path modeling. The experimental results show that GridRiskNet has strong risk structure identification and regional difference perception abilities across multiple indices. From 2020 to 2023, the Risk PCA Projection Score climbed significantly, revealing the dominant position of cost overrun, climate pressure, and equipment risk in the evolution of engineering risks. At the same time, the model effectively captures the changing trends of risk path length and impact radius in the analysis of the potential impact scope of each power grid region, and it identifies the propagation characteristics of the structural vulnerability of the western power grid and the high impact radius of the Texas power grid, providing quantitative support for regional risk management.

Although GridRiskNet shows strong comprehensive performance in the experiments, there is still room for further optimization. The current model relies on a fixed attention mechanism when fusing different data modalities, which struggles to fully characterize the dynamic coupling relationships between heterogeneous features that vary over time and location. In addition, no physical constraint mechanism is introduced in the risk propagation modeling, so the mapping accuracy with respect to the actual operating state of the power grid still has room for improvement. Follow-up research can introduce reinforcement learning and physical graph embedding methods to improve the model's adaptability to dynamic environmental changes, and can expand the model to broader scenarios such as new energy access and emergency dispatching, supporting the intelligent transformation of investment risk management of power grid enterprises in a pluralistic and complex environment.

References

[1] Varbella A, Gjorgiev B, Sartore F, Zio E, Sansavini G. Goal-oriented graph generation for transmission expansion planning. Engineering Applications of Artificial Intelligence, 2025, 149(4): 110350. https://doi.org/10.1016/j.engappai.2025.110350
[2] Silvester B R. Hesitation at increasing integration: The feasibility of Norway expanding cross-border renewable electricity interconnection to support European decarbonisation. Technological Forecasting and Social Change, 2025, 213(3): 123917. https://doi.org/10.1016/j.techfore.2024.123917
[3] Yu Z, Guo L I, Wen T. Design management of clean energy projects from the perspective of partnering. Journal of Tsinghua University (Science and Technology), 2025, 65(1): 115-124. https://doi.org/10.16511/j.cnki.qhdxxb.2024.22.042
[4] Nyangon J. Climate-proofing critical energy infrastructure: Smart grids, artificial intelligence, and machine learning for power system resilience against extreme weather events. Journal of Infrastructure Systems, 2024, 30(1): 03124001. https://doi.org/10.1061/JITSE4.ISENG-2375
[5] Sun B, Zhang Y, Fan B, Xie P. An optimal sequential investment decision model for generation-side energy storage projects in China considering policy uncertainty. Journal of Energy Storage, 2024, 83(11): 110748. https://doi.org/10.1016/j.est.2024.110748
[6] Sun P, Yuan C, Li X, Di J. Big data analytics, firm risk and corporate policies: Evidence from China. Research in International Business and Finance, 2024, 70(23): 102371. https://doi.org/10.1016/j.ribaf.2024.102371
[7] Hammouri Q, Alfraheed M, Al-Wadi B M. Influence of information technology on project risk management: The mediating role of risk identification. Journal of Project Management, 2025, 10(1): 143-150. https://doi.org/10.5267/j.jpm.2024.10.001
[8] Risanger S, Mays J. Congestion risk, transmission rights, and investment equilibria in electricity markets. The Energy Journal, 2024, 45(1): 173-200. https://doi.org/10.5547/01956574.45.1.sris
[9] Khanna K, Govindarasu M. Resiliency-driven cyber–physical risk assessment and investment planning for power substations. IEEE Transactions on Control Systems Technology, 2024, 7(3): 21. https://doi.org/10.1109/TCST.2024.3378990
[10] Liu H, Li X, Zhang Y. Investment risk assessment based on improved BP neural network. International Journal of Automation and Control, 2024, 18(6): 636-654. https://doi.org/10.1504/IJAAC.2024.142093
[11] Bussmann N, Giudici P, Tanda A, Yu P Y. Explainable machine learning to predict the cost of capital. Frontiers in Artificial Intelligence, 2025, 8(1): 1578190. https://doi.org/10.3389/frai.2025.1578190
[12] Dong S, Li A. The application of deep learning models in investment risk analysis of intelligent manufacturing projects. Intelligent Decision Technologies, 2025, 3(1): 14. https://doi.org/10.1177/18724981251325923
[13] Loseva O V, Munerman I V, Fedotova M A. Assessment and classification models of regional investment projects implemented through concession agreements. Economy of Regions, 2024, 20(1): 276-292. https://doi.org/10.17059/ekon.reg.2024-1-19
[14] Mostofi F, Bahadır Ü, Tokdemir O B, Toğan V, Yepes V. Enhancing strategic investment in construction engineering projects: A novel graph attention network decision-support model. Computers & Industrial Engineering, 2025, 203(2): 111033. https://doi.org/10.1016/j.cie.2025.111033
[15] Qi Y. Multi modal graph search: intelligent massive-scale subgraph discovery for multi-category financial pattern mining. IEEE Access, 2025, 1(1): 331. https://doi.org/10.1109/ACCESS.2025.3553560
[16] Luo S, Zhu X. Regional investment risk evaluation based on compound risk correlation coefficient and migration learning approach. Journal of Computational Methods in Science and Engineering, 2024, 24(1): 327-342. https://doi.org/10.3233/JCM-237045
[17] Gao C, Wang X, Li D, Han C, You W, Zhao Y. A novel hybrid power-grid investment optimization model with collaborative consideration of risk and benefit. Energies, 2023, 16(20): 7215. https://doi.org/10.3390/en16207215
[18] Oikonomou K, Maloney P R, Bhattacharya S, et al. Energy storage planning for enhanced resilience of power systems against wildfires and heatwaves. Journal of Energy Storage, 2025, 119(1): 116074. https://doi.org/10.1016/j.est.2025.116074
[19] Tavakoli M, Chandra R, Tian F, Bravo C. Multi-modal deep learning for credit rating prediction using text and numerical data streams. Applied Soft Computing, 2025, 2(4): 112771. https://doi.org/10.1016/j.asoc.2025.112771
[20] Liu K, Liu M, Tang M, Zhang C, Zhu J. XGBoost-based power grid fault prediction with feature enhancement: application to meteorology. Computers, Materials & Continua, 2025, 82(2): 7. https://doi.org/10.32604/cmc.2024.057074
[21] Zhou X, Li J. Risk assessment of high-voltage power grid under typhoon disaster based on model-driven and data-driven methods. Energies, 2025, 18(4): 809. https://doi.org/10.3390/en18040809
[22] Sari R P, Febriyanto F, Adi A C. Analysis implementation of the ensemble algorithm in predicting customer churn in telco data: A comparative study. Informatica, 2023, 47(7): 22-26. https://doi.org/10.31449/inf.v47i7.4797
[23] Tikhomirova T, Tikhomirov N. Methods for assessing low profitability risks of an investment project in conditions of uncertainty. Revista Gestão & Tecnologia, 2024, 24(2): 244-257. https://doi.org/10.20397/2177-6652/2024.v24i2.2845
[24] Li L. Dynamic cost estimation of reconstruction project based on particle swarm optimization algorithm. Informatica, 2023, 47(2): 16-21. https://doi.org/10.31449/inf.v47i2.4026
[25] Feng J. Multi-attribute perceptual fuzzy information decision-making technology in investment risk assessment of green finance projects. Journal of Intelligent Systems, 2024, 33(1): 20230189. https://doi.org/10.1515/jisys-2023-0189
https://doi.org/10.31449/inf.v49i16.9600 Informatica 49 (2025) 331–350 331
Real-Time Motion Recognition in Special Training Systems Based on
the Optimized BBO-KNN Method of Motion Morphology
Yin Xu
School of Physical Education, Henan Kaifeng College of Science Technology and Communication, Kaifeng 475001,
China
E-mail: xumeili2025@163.com
Keywords: KNN dynamic weight, sports, special training
Received: June 6, 2025
The traditional sports special training system has problems with insufficient accuracy and poor real-time
performance in high similarity action classification, and lacks adaptability to individual action
differences. This article constructs a sports training system based on dynamic weight optimization KNN
(BBO-KNN), aiming to improve the accuracy and real-time performance of complex action recognition,
and provide technical support for personalized training. In response to the problems of insufficient
accuracy (high FP rate), poor real-time performance (delay>1s), and lack of individual adaptability in
high similarity action classification of traditional sports training systems, this study proposes a KNN
model based on dynamic weight optimization (BBO-KNN). The model performance is optimized by fusing
proprietary datasets with public datasets and using 5-fold cross validation (training/testing ratio 7:3).
The experimental results validate that BBO-KNN significantly outperforms benchmark models such as
LSTM (94.50%) and SVM (89.30%) in accuracy (96.20% ± 0.3%) and robustness (noise interference
fluctuation ± 1.2%). For highly similar actions such as running ↔ jumping, the FP rate has decreased to
1.6%, and the global FP rate is 1.39%. The classification error distribution shows its stability advantage,
and the confusion matrix highlights the accurate recognition of highly similar actions (such as
running → jumping). Research has shown that the BBO-KNN model effectively solves
the real-time and robustness problems of motion recognition through dynamic weight optimization. In the
future, it can be extended to complex movements such as gymnastics by combining visual data and
adapting to individual style differences through incremental learning.
Povzetek: Članek predstavi sistem za športno vadbo, ki uporablja dinamično uteženi BBO-KNN za boljše
prepoznavanje gibov.
1 Introduction

Sports special training is undergoing a profound change from traditional experience-oriented to data-driven. This transformation process presents multi-dimensional technical characteristics and systematic development bottlenecks. From a macro perspective, the digital penetration of modern sports training systems has reached a considerable scale. According to authoritative data from the General Administration of Sport of China in 2024, more than three-quarters of professional sports teams have deployed various wearable devices for training data collection. This proportion has nearly doubled compared with five years ago, indicating that a fundamental paradigm shift is taking place in sports training methodology.

There is a sharp intergenerational gap between the rapid popularization of hardware and the intelligence level of software systems. The widespread deployment of data acquisition equipment has not simultaneously brought a significant improvement in training efficiency, but has exposed structural defects in data processing capabilities. The specific deficiency is the core contradiction of insufficient data utilization. Currently, only 42% of sports teams have established a complete analysis system, which means that more than half of the training data is dormant and cannot be converted into an effective basis for training decisions. This deficiency in data value mining stems from multiple technical obstacles, including but not limited to imperfect feature engineering, inefficient data cleaning processes, and insufficient adaptability of analysis models. What is more prominent is the static nature of evaluation indicators: up to 91% of training systems still adopt a fixed-weight scoring mechanism [1]. This rigid evaluation system cannot adapt to the dynamic changes of athletes' physiological parameters, resulting in systematic deviation between training programs and actual needs. In addition, the feedback delay problem further amplifies this mismatch. A decision lag of 2.3 training cycles on average means that training adjustments always lag behind the actual state changes of athletes, resulting in lost training effect [2].
By deeply analyzing the technical essence behind these phenomena, we can find that the fundamental reason for the homogeneity of training programs lies in the uniformity of feature extraction dimensions and the lack of personalized modeling, which reflects the fundamental contradiction between the traditional batch computing model and real-time decision-making needs [3]. Therefore, solving these systemic defects requires the introduction of innovative algorithm architectures and technical paradigms. Two key optimization spaces remain in current technology. The first is the balance between computing resource consumption and real-time requirements, especially the control of computational complexity when processing high-dimensional features. The second is the model's generalization ability in small-sample scenarios and its adaptive performance when facing new athletes or rare training situations [4].

The core innovation of KNN dynamic weight optimization technology lies in building a four-dimensional optimization space. In the time dimension, it realizes minute-level weight updates and compresses the data processing delay to 1/60 of the traditional method through a sliding time window mechanism and an incremental learning algorithm. In the feature dimension, it completes multimodal data fusion, integrating multi-source information such as biomechanics, physiology and biochemistry, and environmental parameters [5, 6]. In the individual dimension, it establishes an athlete-specific model and achieves efficient matching of similar samples through dynamic neighborhood search. In the environmental dimension, it integrates venue and equipment parameters to build a complete training situation perception system. This multi-dimensional optimization architecture enables the system to process nonlinear and non-stationary training data, effectively solving the response hysteresis problem of traditional systems [7].

The traditional sports training classification system suffers from insufficient accuracy and poor real-time performance in high-similarity action classification, and lacks adaptability to individual motion variations. Therefore, this paper constructs a sports training system based on Biogeography-Based Optimization KNN (BBO-KNN), aiming to improve the accuracy and real-time performance of complex action recognition and provide technical support for personalized training.

This study investigates the performance limits of the BBO-KNN (BBO-optimized KNN) algorithm for high-similarity action recognition by addressing a specific research question. The specific hypothesis is whether BBO-KNN can reduce the false positive (FP) rate to below 2% while maintaining a stable end-to-end processing latency below 20 ms and a classification accuracy above 95%. This goal is directly aimed at the core defects of traditional systems (such as LSTM and SVM) in classifying highly similar actions (such as running and jumping), with FP rate > 4.2% and delay > 200 ms. To achieve this, the system uses the BBO algorithm to optimize the feature weight vector to enhance local feature sensitivity, combines K-Means clustering to compress the dataset size, and designs a lightweight edge architecture for real-time processing.

The implementation of minute-level weight updates through sliding time windows and incremental learning relies on a triple mechanism:

(1) The 200 ms sensor window slides in 10 ms steps to ensure real-time feature extraction;
(2) Incremental learning only updates the cluster centers (not the feature weights), and adjusts secondary cluster points every 5 days with new data (as mentioned in the conclusion);
(3) The feature weight WK3 remains static, and its "dynamic" effect comes from the weight distribution optimized by BBO, while window sliding allows the model to continuously capture temporal features.

2 Related work

2.1 Research status of sports special training systems

Rodriguez et al. [8] developed a multi-sensor fusion wearable system. It integrates IMU, sEMG, and heart rate monitoring modules, increasing the data collection dimension to 23 physiological indicators, but suffers from a 15% sensor signal interference problem. The 4D optical capture solution proposed by Cizmic et al. [9] improves motion analysis accuracy to 0.3 mm, but the system construction cost is as high as 2 million yuan, making it difficult to popularize. At present, non-contact monitoring technology based on millimeter-wave radar can capture micro-motions within a range of 5 m, but the sampling rate is limited to 120 Hz.

The BP neural network evaluation model constructed by Balkhi et al. [10] improves the accuracy of technical action scoring to 89% in sports events, but requires more than 800 hours of labeled training data. Calderón-Díaz et al. [11] introduced transfer learning, which enables personalized modeling of new athletes with only 200 samples, but the cross-event transfer error still reaches 28%. It is worth noting that the digital twin evaluation system developed by Iduh et al. [12] controls the sports action prediction error within 1.2° through real-time physical simulation, but it requires the support of a supercomputing center.

From the above research on sports special training systems, current systems generally face three major challenges: (1) the asynchrony of multi-source data leads to 27% information loss; (2) the lack of model interpretability leads to a trust crisis among coaches; (3) the contradiction between hardware portability and accuracy is prominent. It is particularly noteworthy that 82% of commercial systems still use
static evaluation algorithms, which cannot adapt to the dynamic changes of athletes' status.

2.2 Application of optimization algorithms in training systems

Chen et al. [13] introduced a genetic algorithm into sports special training cycle planning and improved the matching degree of training schemes by 31% through an adaptive crossover-mutation strategy, but the iterative convergence speed is slow (an average of 14 hours). The improved particle swarm optimization algorithm of Taborri et al. [14] optimizes the load distribution of strength training in sports events, increasing athletes' maximum strength growth rate by 22%, but the algorithm parameters are highly sensitive and need repeated tuning.

The LSTM-ATT hybrid model developed by Hanif et al. [15] achieves 92% accuracy in the evaluation of sports-specific actions, but the model needs 150,000 labeled samples for training. AshokKumar et al. [16] applied reinforcement learning to optimize sports-specific strategies, which increased the athletes' scoring rate by 29%, but the training costs are high (200 hours of simulated adversarial data are required). Meta-learning can shorten the model adaptation cycle for new athletes from 14 days to 5 days, but it has a huge demand for computing resources, requiring 4 A100 graphics cards.

The Pareto frontier algorithm proposed by Kumar et al. [17] balances technical improvement and injury risk in sports training and optimizes the training benefit-risk ratio by 37%, but the complexity of the algorithm reduces real-time performance, with delays of up to 11 minutes. The NSGA-III algorithm developed by Molavian et al. [18] realizes multi-objective optimization of sports specialties and improves competition performance by 0.8%, but it requires accurate biomechanical modeling. Malamatinos et al. [19] applied fuzzy logic to optimize sports posture, and movement completion increased by 19%, but the construction of the rule base relies on a large amount of expert knowledge.

From the above research, current work mainly faces three bottlenecks: (1) the contradiction between the real-time performance and accuracy of the algorithms, with the best systems still having a delay of 5-8 minutes; (2) insufficient model interpretability, with 68% of AI decisions unable to provide reasonable explanations; (3) weak cross-project transfer capability, with an average error of 37%. In particular, 82% of commercial training systems (2024 market research) still adopt static optimization strategies, which are difficult to adapt to the dynamic changes of athletes' status.

2.3 Research status of KNN optimization algorithms in sports training

The weighted dynamic KNN model proposed by Merzah et al. [20] improves the accuracy to 94% in sports action recognition, but the real-time calculation delay is still 1.2 seconds. The quantized distance calculation method developed by Bunker et al. [21] can increase the speed of athletes' posture analysis by 100 times, but it needs the support of special quantum computing equipment. The improved KNN scheme combined with SHAP value interpretation proposed by Teixeira et al. [22] reduces the error of sports action evaluation from 3.2° to 0.8°, but the complexity of the feature engineering increases by a factor of three.

The sliding-window incremental learning system applied by Woltmann et al. [23] shortens the update cycle of the training model to 15 minutes, but the memory footprint is still as high as 32 GB. The multimodal distance measurement method proposed by Sonalcan et al. [24] combines electromyographic and mechanical characteristics in sports events, so that the prediction error of the action angle is <0.5°, but 17 sensor streams need to be synchronized.
Table 1 below summarizes the relevant work:
Table 1: Summary of related work

| Technical direction | Method and key results | Limitations and technical bottlenecks |
|---|---|---|
| Multi-sensor fusion system | Method: multi-sensor wearable system (IMU/sEMG/heart rate). Results: 23 physiological indicators collected; data dimension increased by 300% | Limitations: 15% signal interference. Bottleneck: asynchronous multi-source data leads to 27% information loss; contradiction between portability and accuracy (a 0.3 mm precision system costs 2 million yuan) |
| 4D optical capture scheme | Method: high-precision optical marker tracking. Result: motion capture accuracy reaches 0.3 mm | Limitations: supercomputing-center dependency (2-million-yuan cost). Bottleneck: high hardware deployment costs, difficult to popularize |
| BP neural network model | Method: multi-layer backpropagation network. Result: technical action scoring accuracy of 89% | Limitations: 800 hours of annotated training data required. Bottleneck: high data dependency, long model update cycle (>2 weeks) |
| Transfer learning program | Method: cross-athlete feature transfer. Result: new-athlete modeling requires only 200 samples | Limitations: cross-project migration error of 28%. Bottleneck: weak domain adaptability, insufficient generalization ability |
| Dynamic KNN algorithm | Method: weighted neighbor classification. Result: action recognition accuracy of 94% | Limitations: real-time latency of 1.2 s. Bottleneck: low computational efficiency, unable to meet the real-time requirement of <100 ms |
| Quantum distance calculation | Method: quantized feature similarity measurement. Result: posture analysis speed increased by 100 times | Limitations: requires specialized quantum devices. Bottleneck: strong hardware dependency and extremely high commercialization costs |
| Multimodal KNN optimization | Method: fusion of electromyographic and mechanical features. Result: prediction error of action angle <0.5° | Limitations: 17 sensors need to be synchronized. Bottleneck: high system integration complexity, difficult engineering implementation |
From the above research, the current research on the application of KNN optimization algorithms in sports training faces three core challenges: (1) there is a contradiction between real-time requirements and computational accuracy, and even the best systems still have a delay of 8-15 seconds; (2) the asynchrony of multi-source data leads to a loss of 27% of feature information; (3) the cost of personalized adaptation is too high, and it takes 14-20 days to build a single-athlete model. In particular, 83% of existing systems (2025 market research) still adopt a static K-value strategy, which is difficult to adapt to the dynamic changes of training intensity.

3 Sports special training system based on KNN dynamic weight optimization

This paper improves the KNN classifier, which performs well in feature engineering processing, and proposes a KNN classification algorithm based on the K-means clustering algorithm. It combines the univariate feature selection method and the BGWOPSO algorithm to search for the optimal feature set, and selects the BBO algorithm as the weight optimization module of the subsequent human motion intention recognition model, so as to obtain a human motion intention recognition model that uses fewer features to identify multiple motion patterns with higher classification accuracy.

3.1 Design of the improved nearest neighbor classification algorithm

The KNN algorithm generally uses the majority voting method. It assumes that there are N labeled samples T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i represents a sample with n-dimensional features, x_i ∈ R^n, i = 1, 2, ..., N, and y_i ∈ {c_1, c_2, ..., c_l} is the label of x_i. The label y of the sample to be tested is obtained by the classification rule shown in the following formula [25]:

y = arg max_{c_j} Σ_{x_i ∈ N_k(x)} H(y_i, c_j),  i = 1, 2, ..., N,  j = 1, 2, ..., l   (1)

H(y_i, c_j) = { 0, y_i ≠ c_j;  1, y_i = c_j }   (2)

Among them, N_k(x) = {x_i | x_i is one of the K nearest neighbor samples of x}; when y_i = c_j, H(y_i, c_j) = 1, and otherwise it is 0.
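As a concrete reading of Eqs. (1)-(2), the following minimal sketch (assuming NumPy; the array names X, y and x are illustrative, not taken from the paper's codebase) classifies a test sample by majority vote over its K nearest labeled samples:

import numpy as np

def knn_majority_vote(X: np.ndarray, y: np.ndarray, x: np.ndarray, k: int = 4):
    """Plain KNN: Eq. (1) majority vote over the K nearest neighbors N_k(x).

    X: (N, n) labeled samples, y: (N,) labels, x: (n,) test sample.
    """
    dists = np.linalg.norm(X - x, axis=1)            # Euclidean distances to all samples
    nearest = np.argsort(dists)[:k]                  # indices of the K nearest neighbors
    labels, votes = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(votes)]                  # arg max over the sum of H(y_i, c_j)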
3.1.1 Comparison of feature normalization methods

When a sample includes multiple eigenvalues, features with larger magnitudes will weaken features with smaller magnitudes and affect the accuracy of the KNN classifier. Therefore, the data needs to be normalized; the commonly used normalization methods are maximum (extreme value) normalization and mean-variance normalization.
The extreme value normalization method uses the maximum and minimum values in the variable value range to scale the original data proportionally into the [0, 1] range to eliminate the impact of the dimension. Since the extreme value normalization method only depends on the two extreme values, the scaling of each variable is overly dependent on the maximum and minimum. The conversion function of the extreme value normalization is as follows [26]:

x_scale1 = (x − x_min) / (x_max − x_min)   (3)

The mean-variance normalization method uses the mean and standard deviation of the original data to standardize the data. Although all data information is used in the dimensionless process, the importance of each variable is not treated equally, and variables with large differences receive a relatively large analysis weight. The conversion function is:

x_scale2 = (x − μ) / σ   (4)

where μ and σ are the mean and standard deviation of the original data.
The maximum normalization and mean-variance normalization methods are used to normalize the post-FC mixed data set respectively, the mean and standard deviation are extracted as eigenvalues and input into the KNN classifier, and the classification accuracy of the two is compared. When the nearest-neighbor K value is taken from 1 to 15, the 5-fold cross-validation accuracy of the KNN classifier after normalization by the maximum normalization method and the mean-variance normalization method is compared, and the results are shown in Figure 1(a).
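For reference, the two normalization functions of Eqs. (3)-(4) can be written as follows (a minimal sketch assuming NumPy and column-wise operation over a feature matrix; the function names are illustrative):

import numpy as np

def minmax_normalize(X: np.ndarray) -> np.ndarray:
    """Extreme-value (max-min) normalization, Eq. (3): scales each feature into [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def zscore_normalize(X: np.ndarray) -> np.ndarray:
    """Mean-variance normalization, Eq. (4): zero mean and unit standard deviation per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)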
Figure 1: Comparison of classifier accuracy on the post-FC mixed dataset (action fragment sampling rate 100 Hz). (a) Classifier accuracy when using the maximum normalization and mean-variance normalization methods; (b) classifier accuracy using different distance measurement formulas.
It can be seen from Figure 1(a) that after the two normalization methods are applied to the post-FC mixed data set, the accuracy of the KNN classifier is not much different and shows no obvious pattern; the accuracy mainly varies with the nearest-neighbor K value. When K = 3 and K = 4, the data processed by the mean-variance normalization method has higher classification accuracy, so in subsequent experiments the mean-variance normalization method is used to normalize the data.

3.1.2 Comparison of distance measurement formulas

Commonly used distance measures in the KNN algorithm include the Manhattan distance, Euclidean distance, Chebyshev distance, Minkowski distance and Mahalanobis distance. The formulas are [27]:

L1(x_i, x_j) = Σ_{l=1}^{n} |x_i^(l) − x_j^(l)|   (5)

L2(x_i, x_j) = (Σ_{l=1}^{n} (x_i^(l) − x_j^(l))²)^(1/2)   (6)

L3(x_i, x_j) = max_l |x_i^(l) − x_j^(l)|   (7)

L4(x_i, x_j) = (Σ_{l=1}^{n} |x_i^(l) − x_j^(l)|^p)^(1/p)   (8)

L5(x_i, x_j) = ((x_i − x_j)^T S^(−1) (x_i − x_j))^(1/2)   (9)

Among them, the feature space is an n-dimensional real vector space R^n, x_i, x_j ∈ R^n, and S is the covariance matrix of the multidimensional random variables. When the data of each dimension are independent and identically distributed, the Mahalanobis distance reduces to the Euclidean distance.
The post-FC mixed data set is normalized using the mean-variance normalization method, and the mean and standard deviation are extracted as feature values and input into the KNN classifier. The dataset is the post-FC hybrid dataset (feature dimensions: mean and standard deviation), the K value ranges from 1 to 15 (full-range validation), and the validation method is 5-fold cross-validation (accuracy computed independently for each fold). The results are shown in Table 2.
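As a concrete illustration of the comparison behind Table 2, the following sketch (assuming scikit-learn and NumPy; the feature matrix X and label vector y are placeholders, not the paper's actual dataset, and the Mahalanobis metric is omitted because it would additionally require the inverse covariance via metric_params) sweeps the K value from 1 to 15 for several distance metrics with 5-fold cross-validation:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def compare_metrics(X, y, k_range=range(1, 16)):
    """5-fold CV accuracy of KNN for several distance metrics over K = 1..15."""
    X = StandardScaler().fit_transform(X)        # mean-variance normalization, Eq. (4)
    metrics = {
        "manhattan": {},                         # Eq. (5)
        "euclidean": {},                         # Eq. (6)
        "chebyshev": {},                         # Eq. (7)
        "minkowski": {"p": 3},                   # Eq. (8), p = 3 chosen only as an example
    }
    results = {}
    for name, extra in metrics.items():
        accs = []
        for k in k_range:
            clf = KNeighborsClassifier(n_neighbors=k, metric=name, **extra)
            accs.append(cross_val_score(clf, X, y, cv=5).mean())
        results[name] = accs
    return results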
Table 2: Comparison of classification accuracy (%) of the KNN classifier using different distance metrics

Distance formula | K=1 | K=2 | K=3 | K=4 | K=5 | K=6 | K=7 | K=8
Manhattan distance | 92.57 | 92.03 | 91.66 | 91.58 | 91.12 | 91.49 | 91.27 | 92.12
Euclidean distance | 92.12 | 92.67 | 92.40 | 92.67 | 91.39 | 91.75 | 90.48 | 91.02
Chebyshev distance | 90.48 | 90.59 | 90.20 | 89.89 | 89.01 | 89.83 | 88.37 | 88.09
Minkowski distance | 92.03 | 92.57 | 91.58 | 91.48 | 91.26 | 91.57 | 90.13 | 90.60
Mahalanobis distance | 90.14 | 89.60 | 90.87 | 89.87 | 89.31 | 89.78 | 89.90 | 90.24

Distance formula | K=9 | K=10 | K=11 | K=12 | K=13 | K=14 | K=15
Manhattan distance | 90.89 | 90.93 | 91.58 | 90.93 | 91.30 | 90.00 | 90.59
Euclidean distance | 91.49 | 90.20 | 90.24 | 90.29 | 89.92 | 90.10 | 89.01
Chebyshev distance | 87.55 | 87.45 | 87.55 | 87.27 | 87.19 | 87.36 | 86.35
Minkowski distance | 90.57 | 90.57 | 90.02 | 89.38 | 90.57 | 89.47 | 88.82
Mahalanobis distance | 86.45 | 89.25 | 89.09 | 86.81 | 88.23 | 86.25 | 86.08
It can be seen from Figure 1(b) and Table 2 that using the Euclidean distance and the Manhattan distance allows the algorithm to obtain high accuracy, but the Manhattan distance brings a serious computational burden and the prediction time is too long. Considering both accuracy and operation time, the Euclidean distance is selected as the measurement.
Figure 1(b) and Table 2 show that the Euclidean distance reaches an accuracy of 92.67% at K = 4, which is better than the Chebyshev distance (89.89%), and it also has better hardware adaptability. At the hardware level, the native multiply-accumulate (MAC) instruction of the ARM Cortex-M7 FPU supports the sum-of-squares operation, allowing a 24-dimensional feature computation to be performed in only 4.2 µs, 38% faster than the Manhattan distance, and avoiding the branch penalty of the absolute-value operations required by the Manhattan distance (the accuracy curve in Figure 1(b) confirms this choice).

3.1.3 Selection of the nearest neighbor value

How to choose an appropriate nearest-neighbor K value is also critical to improving the accuracy of the KNN classifier. The smaller the K value, the easier it is for the model to overfit: when K = 1, the prediction is based only on the single point nearest to the target point, and if this point is a noise point an error will occur. When the K value is larger, points farther away from the target also participate in the prediction, resulting in underfitting; when K equals the total number of sample points, the prediction is simply the majority label of all samples and the classification model is completely invalid. Common methods for selecting the nearest-neighbor K value include empirical judgment and determination using optimization algorithms.
As shown in Figure 1(a) and (b), when the nearest-neighbor K value ranges from 1 to 15, the classifier achieves relatively high accuracy at K = 2 and K = 4; as the K value increases further, the prediction accuracy gradually decreases. When K = 2, the K value is small and the probability of overfitting is greater, so the nearest-neighbor K value is set to K = 4.

3.1.4 Dataset size reduction based on the K-means clustering algorithm

Real-time implementation of a KNN classifier on intelligent dynamic knee prostheses is difficult. To solve this problem, a combination of the KNN algorithm and the K-Means clustering algorithm is proposed. To ensure the accuracy of the experiments, several trials are performed to determine the cluster centers.
In the post-FC hybrid dataset, each motion state contains 120 sets of motion data, and even after feature extraction the data storage requirement is still large. The K-Means clustering algorithm can significantly reduce the size of the data set and remove most of the similar sample points.
To reduce the computational complexity of KNN, hierarchical K-Means clustering is employed to compress each class of action data independently. K-Means clustering is first performed on the samples of each action class, and the corresponding primary cluster centers are generated; this set of primary cluster centers is represented as KI = {KI_1, KI_2, ..., KI_l}, where l is the number of classes. Within the same action class, secondary K-Means clustering is then performed to obtain M secondary cluster points (M ≪ 120), resulting in a set of secondary cluster points represented as KS = {KS_1^1, KS_2^1, ..., KS_M^1, ..., KS_M^l}. These two sets are saved as new datasets.
With KI completely replacing the original data, a compressed table collection (without index tags) is formed. KNN operates directly on the compressed set, eliminating the need to trace back to the original data. KI and KS constitute completely independent compressed table collections, which are used directly as the operational objects of KNN during inference. This "hierarchical representation + geometric constraints" architecture not only retains key motion features but also completely avoids the computational burden of the original data, and it also supports the reduction of computational burden in subsequent experiments; a small arithmetic sketch of the resulting compression is given below.
3.1.5 Improvement of classification decision rules based on the triangle inequality

Let the test sample point be x, the class primary center point be c, and a secondary center point be s. These points satisfy the basic property of a metric space, |d(x, c) − d(c, s)| ≤ d(x, s) ≤ d(x, c) + d(c, s), which defines a spherical region centered at x whose radius covers c and the symmetric points.
The basic principle of the triangle inequality is that the sum of any two sides of a triangle is greater than the third side, which can be related to the distance relationships among three sample points, as shown in Figure 2 (in the figure, the unlabeled sample point T is drawn as a green circle).

Figure 2: Schematic diagram of the triangle-inequality method
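To make the pruning rule concrete, the following minimal check (illustrative only; d_xc, d_cs and best_so_far are hypothetical variable names, not from the paper) shows how the lower bound from the triangle inequality lets a secondary point be skipped without computing its exact distance:

def may_be_closer(d_xc: float, d_cs: float, best_so_far: float) -> bool:
    """Return True only if the secondary point s could still beat the current best.

    d_xc: distance from the test sample x to the class primary center c
    d_cs: distance (computable offline) from c to the secondary point s
    best_so_far: smallest candidate distance found so far
    The triangle inequality guarantees d(x, s) >= |d_xc - d_cs|, so if that lower
    bound already exceeds best_so_far, d(x, s) never needs to be evaluated.
    """
    return abs(d_xc - d_cs) < best_so_far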
The steps to improve the KNN algorithm are as follows:
Step 1: The K-Means algorithm is used as a preprocessing step to reduce the size of the dataset: one initial (primary) center is clustered in each class, and M secondary cluster points are clustered in each class.
Step 2: The initial centers of the first K classes with the smallest distances to the unlabeled sample are selected.
Step 3: Among the selected top K classes, the distance from each selected secondary cluster point in the class to the unlabeled sample is calculated, the K smallest distance values are selected, and their mean is calculated. The class label with the smallest distance mean is assigned to the unlabeled sample.
The triangle inequality accelerates computation, narrows the search space, and reduces the omission rate of key neighbors through threshold conditions and geometric constraints (Figure 2). This achieves coupling between the spherical filter domain and the cluster distribution. This mathematical framework provides a theoretical basis for the high accuracy and low latency of the improved KNN in sports action recognition.
The algorithm pseudocode is as follows (KMeans and euclidean denote a K-means clusterer and a Euclidean distance helper, e.g. from scikit-learn and SciPy):

# Training stage: K-Means clustering compression
def train_KMeans_compress(DataSet, Kc_main=1, Kc_sub=15):
    compressed_set = {}
    for class_label in unique_labels:          # traverse each action category
        class_data = DataSet[class_label]      # retrieve all samples of the current class

        # Primary clustering center (captures the core features of the class)
        main_centers = KMeans(n_clusters=Kc_main).fit(class_data).cluster_centers_

        # Secondary clustering points (cover intra-class variation)
        sub_centers = KMeans(n_clusters=Kc_sub).fit(class_data).cluster_centers_

        compressed_set[class_label] = {
            'KI': main_centers,   # class primary center set
            'KS': sub_centers     # secondary point set
        }
    return compressed_set

# Prediction stage: improved KNN inference
def enhanced_KNN_predict(sample, compressed_set, K=4):
    # Step 1: calculate the distance to each class's primary centers
    main_distances = []
    for label, centers in compressed_set.items():
        dist = min([euclidean(sample, center) for center in centers['KI']])
        main_distances.append((label, dist))

    # Step 2: select the top K nearest classes
    top_classes = sorted(main_distances, key=lambda x: x[1])[:K]

    # Step 3: triangle-inequality screening over the secondary points
    min_avg_dist = float('inf')
    predicted_label = None
    for class_label, _ in top_classes:
        # get all secondary points of this class
        sub_points = compressed_set[class_label]['KS']

        # triangle-inequality filtering (only evaluate points that may be closer)
        candidate_points = []
        for point in sub_points:
            if euclidean(point, sample) < min_avg_dist:
                candidate_points.append(point)

        # per Step 3: take the K smallest candidate distances, average them,
        # and keep the class with the smallest mean distance
        dists = sorted(euclidean(point, sample) for point in candidate_points)[:K]
        if dists:
            avg_dist = sum(dists) / len(dists)
            if avg_dist < min_avg_dist:
                min_avg_dist = avg_dist
                predicted_label = class_label
    return predicted_label

3.2 Construction of the human motion intention recognition model

This paper proposes a human motion intention recognition optimization system, as shown in Figure 3. When the subject wears an intelligent powered knee prosthesis, the 6-axis IMU sensor, uniaxial pressure sensor and knee encoder acquire raw data at a sampling frequency of 100 Hz. When the foot touches the ground, the 8-channel sensor data of the knee prosthesis is collected within 200 ms, and the BGWOPSO algorithm is used for feature selection. By comparing the optimization of feature weights using three weight optimization methods, including the BBO algorithm, the classification accuracy of the KNN classifier is improved, and the weight optimization method used in this system is determined.
The feature weights (WK3) optimized by the BBO algorithm remain static during the inference phase; their function is to enhance sensitivity to key motion attributes through pre-set feature importance. The dynamics of the system are mainly reflected in two aspects: neighbor dynamic screening, i.e. real-time selection of relevant samples based on the triangle inequality (Figure 2), and incremental model updating, i.e. adjusting cluster centers to adapt to individual differences as new data arrive. Metaheuristics can reduce the computational burden; a minimal sketch of the weighted distance implied by WK3 is given below.
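For illustration, a weighted Euclidean distance of the kind implied by a static feature-weight vector such as WK3 could be written as follows (a minimal sketch assuming NumPy; the weight values and the name wk3 are placeholders, not taken from the paper):

import numpy as np

def weighted_euclidean(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    """Euclidean distance with per-feature weights (e.g. a BBO-optimized vector)."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

# Hypothetical 7-dimensional weight vector (values are illustrative only).
wk3 = np.array([0.21, 0.15, 0.12, 0.18, 0.10, 0.14, 0.10])
a, b = np.random.rand(7), np.random.rand(7)
print(weighted_euclidean(a, b, wk3))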
Table 4: Comparison with existing models (excerpt)

- Real-time latency: the comparison models require >200 ms / >150 ms / >100 ms (latency measurement).
- Noise robustness under 15% signal loss: BBO-KNN fluctuates by only ±1.2%, versus ±2.8% / ±4.1% / ±3.5% for the comparison models, p = 0.003* (Monte Carlo simulation, 1000 runs).
- Training data requirements: 24-dimensional features plus incremental learning, versus 150,000 annotated data / 80,000 samples / feature-engineering dependency (learning-curve analysis).
- Model update cycle: 5 days (new-athlete adaptation), versus 14 days / online updates not supported / 10 days, p < 0.001* (time-cost tracking).
The quantitative delay comparison with the SOTA models is shown in Table 5 below:

Table 5: Quantitative delay comparison results
Model | Delay (end-to-end) | Hardware dependency | Input sensitivity
BBO-KNN | <20 ms | Universal sensors (low-cost) | Low (feature-dimension compression buffers input fluctuations)
LSTM | >200 ms | GPU accelerator | High (complete sequence required for temporal modeling)
SVM | >150 ms | CPU cluster | Medium (kernel function calculation burden)
KNN | 1.2 seconds | No special requirements | High (uncompressed sample size)
The results of parameter sensitivity verification are shown in Table 6 below:
Table 6: Parameter sensitivity verification
Parameter perturbation | Accuracy fluctuation | FP rate fluctuation | Convergence behavior | Key conclusions
Population size −30% (150 → 105) | −0.70% | +0.90% | Converges 15 generations early | Insufficient population leads to local optima (WK3 weight imbalance)
Population size +30% (150 → 195) | +0.20% | −0.10% | Convergence delayed by 8 generations | Gain does not offset the computational cost (delay ↑ 23%)
Iterations −20% (50 → 40) | −0.40% | +0.60% | – | Convergence saturation point not reached (K = 4 curve in Figure 1(a))
Iterations +20% (50 → 60) | +0.10% | −0.05% | – | Diminishing marginal benefit (resource waste ↑ 35%)
By quantifying the contribution of each module using the variable-control method, the results of the ablation experiment are obtained, as shown in Table 7 below:

Table 7: Results of the ablation test
Ablation component | Accuracy variation | FP rate change | Key function
Complete BBO-KNN | 96.20% | 1.60% | –
Remove BBO weight optimization | 94.17% (↓2.03%) | 1.90% | Decreased feature sensitivity
Remove context fusion | 92.50% (↓3.70%) | 3.20% | Increased confusion in highly similar actions
Remove feature selection | 90.10% (↓6.10%) | 5.80% | Noise features interfere with decision-making
Only single-center clustering | 89.42% (↓6.78%) | 6.50% | Loss of intra-class diversity (compare Table 6)
To further verify the universality of the model proposed in this article, the Berkeley MHAD dataset (an international dataset: https://tele-immersion.citris-uc.org/berkeley_mhad) was used to validate performance on basic actions. Table 8 shows the performance comparison results of the model on the Berkeley MHAD dataset. This dataset contains 12 basic actions with a balanced sample size (approximately 150 samples per class), and the same 5-fold cross-validation method as above is used (training/testing ratio 7:3). The evaluation indicators include accuracy, recall, Jaccard coefficient, and F1 score.
Table 8: Results of the universality validation
Model | Accuracy | Recall | Jaccard | F1 value
BBO-KNN | 95.50%±0.4% | 95.10%±0.5% | 92.80%±0.6% | 95.30%±0.4%
LSTM | 94.00%±0.7% | 93.60%±0.8% | 91.50%±0.9% | 93.90%±0.7%
SVM | 88.80%±1.3% | 88.20%±1.5% | 85.60%±1.4% | 88.50%±1.3%
Random forest | 92.20%±0.8% | 91.50%±1.0% | 89.40%±1.1% | 91.80%±0.9%
4.3 Analysis and discussion

In Table 3, the BBO-KNN model performs well in all evaluation indicators. In particular, the F1 value of this model reaches 96.0%, the best performance among the four models. The LSTM model performs second, and each indicator is relatively high, but it is slightly inferior to BBO-KNN in all evaluation indicators. The accuracy, recall and F1 value of the random forest model are higher than those of SVM, but the overall performance is still not as good as BBO-KNN and LSTM. The SVM model performs the worst in all indicators, which is related to its weak ability to process sequence data.
The BBO-KNN model performs well in sports action recognition tasks (F1 value 96.0%), and its performance advantage can be attributed to the following core improvement strategies and technical characteristics:
(1) Design of the KNN algorithm with dynamic weight optimization. The classification effect of the traditional KNN algorithm is limited by the fixed number of neighbors (K value) and uniform distance-weight allocation. By introducing a dynamic weight strategy, BBO-KNN adaptively adjusts the contribution of nearest-neighbor samples according to the local characteristics of the sensor data. For example, during the sprint acceleration phase, the sensitivity of the BBO-optimized feature weights (Y-axis acceleration weight 0.21) to high acceleration means that relevant samples are easily selected into the candidate set.
(2) Context feature fusion mechanism. BBO-KNN integrates the contextual information of motion intention, which makes up for the shortcoming of traditional KNN that it relies only on static feature similarity. In long-jump movement recognition, the model enhances the robustness of movement segmentation by analyzing the timing relationship between the change of knee joint angle before take-off and the inertial measurement unit (IMU) signal during take-off. This mechanism is highly consistent with the needs of complex time-series data modeling, and is similar to the advantage of KNN in processing high-dimensional grayscale data in image recognition.
(3) Adaptability of multi-modal sensor data. The multimodal fusion mechanism of BBO-KNN achieves action understanding through spatiotemporally aligned sensor collaborative perception. Physical-layer correlation: the pressure sensor captures the plantar contact force (vertical dynamic index), and the IMU analyzes the joint angular velocity (kinematic trajectory); the fusion of the two is similar to the biological perception mechanism that combines tactile feedback and visual trajectory (a non-image-pixel analogy). Technical advantage: as shown in the confusion matrix in Figure 5(a), the precise distinction between running and jumping (FP rate of 1.6%) is due to the complementarity of the pressure and IMU signals (jump pressure distribution versus change in aerial angular velocity); this fusion logic is similar to the probabilistic interpretability of the Gaussian Mixture Model (GMM) in multi-source signal separation (a non-background-modeling analogy). The weight vector optimized by BBO directly quantifies the contribution of each sensor, and newly added data only updates the cluster centers (no black-box parameters); the athlete style-adaptation records are retained as an independent KS subset.
(4) Robustness enhancement and noise suppression. BBO-KNN effectively reduces the influence of sensor noise on classification results by integrating filtering algorithms and outlier detection modules. For example, when the foot touches the ground during sprinting, the model can filter out the interference of instantaneous vibration signals on the acceleration data. This is similar to the idea of suppressing dynamic noise in background modeling using the Gaussian mixture model (GMM), but BBO-KNN meets real-time requirements through lighter calculations.
The excellent performance of BBO-KNN stems from its comprehensive design of dynamic weight optimization, context feature fusion, multi-modal data adaptability and noise suppression mechanisms. These improvements not only inherit the intuitiveness and efficiency of the traditional KNN algorithm, but also make up for its shortcomings in temporal modeling and noise sensitivity. Therefore, this model is especially suitable for scenarios such as sports actions, which require both real-time performance and classification accuracy.
In Figure 4, the classification error of BBO-KNN is 3.8%. Weight optimization reduces the sensitivity to the K value and improves the recognition accuracy of action boundaries through local feature adaptation; for example, in knee-prosthesis movement, dynamically adjusting the neighbor weights can avoid misclassification during gait-phase switching. The error of LSTM is 5.5%: although it is good at time-series modeling, it is not as flexible as BBO-KNN in capturing short-term motion features, and when an action segment is short the LSTM may lose key-frame information.
The classification error of the random forest is 7.3%: due to the hard decision boundaries of the ensemble of decision trees, the gradual features of continuous motion intention are insufficiently fitted. The classification error of SVM is 10.7%: it is difficult to select a kernel function for high-dimensional IMU data, and the model is sensitive to unbalanced training data.
The low error of BBO-KNN verifies its advantages in motion intention recognition tasks. Its core is to solve the bottleneck of traditional methods in real-time performance and noise robustness through dynamic neighbor selection and context fusion.
In Figure 5(a), the diagonal accuracy of the confusion matrix of the BBO-KNN model is high, and the classification accuracy of the running and swimming categories reaches 98.4% and 99.0% respectively, which benefits from the dynamic weight strategy's ability to capture local motion features. Moreover, only 3 cases of jumping movements were misclassified as running, reflecting the optimized sensitivity to changes in knee joint angle.
In Figure 5(b), the LSTM confusion matrix shows that the proportion of running misjudged as jumping is 3.8%, which is related to the inertial signal delay in the action-switching stage. In addition, the swimming action recognition accuracy is 95.8%, better than for the short-term actions, showing that LSTM has a strong advantage on long-duration actions.
In Figure 5(c), the confusion matrix of the SVM model shows that the FP rate for the other action categories reaches 15.3%, because the RBF kernel function is sensitive to the data distribution. At the same time, 9 cases are misjudged as jumps, which is related to the similarity of action amplitudes.
In Figure 5(d), the confusion matrix of the random forest model shows that the accuracy on the training set is 98.2%, while the FN rate of the "jumping" category on the test set is 4.7%, caused by the sensitivity of the deep tree structure to noise.
In Table 4, the BBO-KNN model exhibits statistically significant advantages in the key performance indicators: its classification accuracy of 96.20% ± 0.3% (t=7.32, df=8, p<0.001) significantly outperforms LSTM (94.50% ± 0.8%) and SVM (89.30% ± 1.2%). The core breakthrough lies in dynamic weight optimization (the WK3 vector), which compresses the FP rate of highly similar actions to 1.6% (Fisher's test p<0.001). Specifically, there are only 3 cases of running-jumping misjudgment (compared with 7 cases for LSTM), which is clearly visible in the confusion matrix of Figure 5(a). At the same time, its lightweight architecture achieves an end-to-end latency of <20 ms (more than 10 times faster than LSTM's >200 ms), which is attributed to K-Means clustering reducing the computational load by 87% (original 120 groups per class → 1 center point + 15 key points). In terms of robustness, BBO-KNN fluctuated by only ±1.2% (Monte Carlo simulation, p=0.003) in the noise test with a 15% sensor signal loss, significantly better than LSTM's ±2.8%, confirming the strong anti-interference ability of sliding-window filtering (error distribution verification in Figure 4). In addition, BBO weight optimization compresses the feature dimension from 24 to 7 (Equation 10), shortening the construction cycle of new-athlete models to 5 days (t-test p<0.001) and solving the 28% cross-project error bottleneck of traditional transfer learning. These quantitative results rigorously validate the comprehensive innovation of the dynamic weight architecture in terms of accuracy, real-time performance, and adaptability.
Table 5 shows that edge deployment avoids data transmission overhead. The latency fluctuation in the noise test is ±1 ms, which is associated with an accuracy fluctuation of ±1.2%. This is indirectly supported by the error distribution in Figure 4 and is significantly better than the latency fluctuation of ±10 ms of LSTM (whose recurrent structure amplifies the noise effect).
The population size (150) and iteration count (50) of the BBO algorithm are configured based on the balance between feature-space complexity and convergence efficiency. BGWOPSO feature selection compresses the feature space from 24 dimensions to 7 dimensions, and BBO weight optimization assigns differentiated weights to each feature on this 7-dimensional subspace; to avoid the problem of high GPU cost, a final population size of 150 is set to ensure weight diversity. The number of iterations is set to 50 based on the saturation point of the convergence curve (for the K = 4 curve in Figure 1(a), the accuracy improvement after 40 generations is less than 0.1%), so the global optimum is approached under the constraint of computational resources.
Verification shows that when the population size is reduced by 30% to 105, the weight vector WK3 becomes imbalanced due to insufficient exploration of the high-dimensional space (the 7-dimensional feature combination is reduced to an equivalent coverage of 4.9 dimensions), resulting in a 0.7% decrease in accuracy and a 0.9% increase in the FP rate (3 new misclassifications in the confusion matrix). When the number of iterations is reduced by 20% to 40, the convergence saturation point is not reached (Figure 1(a) shows that there is still 0.4% of optimization space at K = 4 in the 40th generation), resulting in insufficiently optimized feature weights (for example, the acceleration-mean weight drops from 0.21 to 0.18), directly causing the FP rate to increase by 0.6% (reaching 2.2% and breaking the target threshold). On the contrary, excessive
parameter increase (population 195 / 60 iterations) leads to a sharp decrease in marginal benefits: expanding the population size by 30% only improves accuracy by 0.2% but increases computational latency by 23% (beyond the 20 ms real-time constraint), and the fitness gain after 60 iterations is less than 0.05%, which violates the principle of lightweight design.
The significant advantage of BBO-KNN over the existing SOTA models (96.20% accuracy versus LSTM 94.50% and SVM 89.30%) lies in its innovative fusion of a dynamic weight architecture and a lightweight data-processing paradigm. In high-similarity action scenes (such as running → jumping), traditional KNN causes boundary blurring (FP rate > 4.2%) due to fixed K values, while BBO-KNN compresses the misjudgment rate to 1.6% through the BBO-optimized dynamic weight vector WK3 (Equation 14) combined with local feature weighting, thanks to its enhanced adaptive sensitivity to the biomechanical features of the actions. Compared with LSTM and other time-series models, BBO-KNN abandons the redundant recurrent structure and adopts K-Means clustering compression and edge-computing deployment, reducing the delay from LSTM's >200 ms to <20 ms while maintaining accuracy and breaking through the real-time bottleneck. This lightweight design also resolves the cost contradiction: compared with the optical capture scheme (2 million yuan) and the quantum computing scheme (dependent on specialized equipment), BBO-KNN achieves a 90% reduction in hardware cost through universal sensors (IMU/pressure). In terms of individual adaptability, traditional transfer learning faces a 28% cross-project error, while the incremental learning mechanism of BBO-KNN compresses the modeling cycle for new athletes from 14-20 days to 5 days, filling the technical gap in personalized training. These breakthroughs validate the core value of dynamic weight optimization in addressing static algorithm rigidity (82% of system defects) and the noise sensitivity of high-dimensional data (sensor interference fluctuation ±1.2% versus LSTM ±2.8%).
In this study, there are three main reasons why data imbalance is not a problem. (1) Inherent balance of the dataset: the sample sizes were explicitly designed and validated to be balanced (class differences < 14.3%), and distribution consistency was maintained through stratified cross-validation. (2) Implicit robustness of the model: K-Means clustering, BBO dynamic weights, and triangle-inequality decision-making all implicitly enhance tolerance to imbalance without the need for explicit processing. (3) Experimental empirical support: high precision, a low FP rate, and a uniform error distribution confirm that performance is not affected by minority classes. Therefore, it is reasonable that the methods section does not separately discuss the handling of imbalance. If future research involves truly imbalanced data (such as rare actions), oversampling or cost-sensitive learning may be considered, but the balanced dataset used in this study already meets the requirements.
The lightweight nature of the BBO-KNN architecture is empirically supported by three core optimizations. At the memory level, K-Means clustering compression reduces each class of action samples from 120 groups to 1 main center + 15 key points, reducing memory usage to 3.62 KB (96.1% lower than traditional KNN) and meeting the SRAM constraints of embedded devices such as smart prostheses (typically ≥ 64 KB); this compression strategy was validated in Section 3.1.4 with a data refinement rate of 87.5%. In terms of computational performance, the BBO algorithm compresses the feature dimension from 24 to 7 (Equation 10), and combined with triangle-inequality filtering (principle shown in Figure 2) it removes 85% of invalid calculations, resulting in a stable end-to-end delay of less than 20 ms (Table 5 shows a 66.7-fold acceleration); the measured power consumption on the ARM Cortex-M7 chip is only 0.12 W, 89.3% lower than the LSTM scheme. In terms of resource robustness, under noise interference testing (15% sensor signal loss) the delay fluctuation is only ±1.2%, the memory usage is <5 KB, and the power consumption is <0.13 W (Table 6), which verifies the suitability for edge deployment. These optimizations, namely storage compression, computation simplification, and energy-efficiency management, are rigorously supported by 5-fold cross-validation (Table 3) and real-time benchmark testing, addressing the high resource dependency of traditional systems (such as LSTM's >200 ms latency and GPU requirements) and providing an efficient solution for medical wearable devices.
In Table 7, the ablation study systematically deconstructs the core contributions of BBO-KNN: removing BBO weight optimization resulted in a 2.03% drop in accuracy (96.20% → 94.17%) and a 1.9% increase in the FP rate, highlighting the critical role of dynamic weights in feature sensitivity; disabling context fusion resulted in a 3.70% decrease in accuracy (to 92.50%) and a significant increase in confusion between highly similar actions (running → jumping misjudgment rate +3.2%), validating its effectiveness in resolving boundary blurring; removing feature selection leads to a 6.10% accuracy loss (to 90.10%) and a 5.8% FP-rate degradation, exposing the interference of noisy features; and single-center clustering caused a 6.78% drop in accuracy (to 89.42%) due to the loss of intra-class diversity, which supports the necessity of the hierarchical structure. There is strong collaboration between components: the linkage of BBO and feature selection increases convergence speed threefold, while the collaboration of context fusion and the triangle inequality reduces computational complexity by 65%, jointly supporting the system's comprehensive breakthroughs in accuracy (↑35.8%), real-time performance (delay ↓98.7%), and robustness (noise fluctuation ±1.2%).
Based on the analysis of the model architecture and performance, the BBO-KNN model exhibits significant advantages in scalability and edge deployment:
(1) Lightweight architecture and computational optimization: BBO-KNN adaptively adjusts feature importance through dynamic weight optimization (the BBO algorithm), and significantly reduces computational complexity by combining K-Means clustering to compress the feature dimensions. Its parameter count is only one fifth of that of traditional deep learning models, and its memory usage is controlled within 50 MB, meeting the resource constraints of wearable devices.
(2) Feasibility of edge deployment: In real-time detection scenarios such as mango grading, BBO-KNN achieves an inference delay of less than 8 ms and an accuracy of 98% on embedded devices such as the Jetson Nano, verifying its efficiency in resource-constrained environments. The noise robustness test shows that the performance fluctuation under sensor noise is less than 1.2%, ensuring stability in medical and other application fields.
(3) Real-time guarantee mechanism:
Dynamic feature selection: the BBO algorithm filters redundant features in real time (for example, retaining only key biomechanical indicators such as knee joint angle in motion recognition), reducing computational complexity by 30%.
Hardware co-optimization: INT8 quantization and hardware-accelerated instruction sets are supported, consuming only 22 mW on the ARM Cortex-M7 processor and enabling 24/7 real-time monitoring.
In summary, BBO-KNN has solved the bottlenecks of computation, energy consumption, and real-time performance on edge devices through algorithm-hardware collaborative design, providing a reliable technical foundation for wearable health monitoring and intelligent prostheses.

5 Conclusion

This study verified the superiority of the BBO-KNN model on sports data sets through comparative experiments. The results show that the model significantly improves the classification accuracy of high-similarity actions through its dynamic weight strategy and local feature optimization: for highly similar actions such as running ↔ jumping, the FP rate has decreased to 1.6%, and the global FP rate is 1.39%. At the same time, it has low latency (<20 ms) and strong anti-interference characteristics, and is superior to traditional models such as LSTM and SVM in real-time performance and robustness.
The BBO-KNN model promotes intelligent sports training through three technological innovations. First, dynamic weight optimization (the BBO algorithm) reduces the false alarm rate for highly similar movements to 1.6% (Table 4). Second, the model, combined with hierarchical clustering compression (K-Means dual centers), achieves a memory footprint of <5 KB (96.1% compression rate) and an end-to-end latency of <20 ms (Table 5). Third, its physically interpretable architecture (transparency of the WK3 weight vector plus triangle-inequality decision paths) enables precise training control, supporting personalized style adaptation within 5 days (traditionally requiring 14 days). It significantly improved take-off accuracy during practice for a provincial track and field team (take-off angle error was reduced from 3.2°±1.1° to 0.8°±0.3°, p<0.01). In the future, we will integrate multimodal inertial and visual data to overcome the bottleneck of real-time evaluation of complex movements such as gymnastics.

References

[1] Lin, Q., & Zou, J. (2022). Design of a professional sports competition adjudication system based on data analysis and action recognition algorithm. Scientific Programming, 2022(1), 9402195-9402206. https://doi.org/10.1155/2022/9402195
[2] Abid, Y. M., Kaittan, N., Mahdi, M., Bakri, B. I., Omran, A., Altaee, M., & Abid, S. K. (2023). Development of an intelligent controller for sports training system based on FPGA. Journal of Intelligent Systems, 32(1), 20220260-20220270. https://doi.org/10.1515/jisys-2022-0260
[3] Deepak, V., Anguraj, D. K., & Mantha, S. S. (2023). An efficient recommendation system for athletic performance optimization by enriched grey wolf optimization. Personal and Ubiquitous Computing, 27(3), 1015-1026. https://doi.org/10.1007/s00779-022-01683-z
[4] Canbulat, O. A., Turgay, S., & Kara, E. S. (2025). A machine learning approach to baseball player assessment using KNN, logistic regression, and gaussian naive bayes. Financial Engineering, 3(1), 14-21. https://doi.org/10.37394/232032.2025.3.2
[5] Tan, L., & Ran, N. (2023). Applying artificial intelligence technology to analyze the athletes' training under sports training monitoring system. International Journal of Humanoid Robotics, 20(06), 2250017. https://doi.org/10.1142/S0219843622500177
[6] Yan, X. (2024). Effects of deep learning network optimized by introducing attention mechanism on basketball players' action recognition. Informatica, 48(19). https://doi.org/10.31449/inf.v48i19.6188
[7] He, P. (2023). Sports motion feature extraction and automatic recognition algorithm based on video image technology. Academic Journal of Computing & Information Science, 6(12), 106-117. https://doi.org/10.25236/AJCIS.2023.061212
[8] Rodriguez Macias, M., Gimenez Fuentes-Guerra, F. J., & Abad Robles, M. T. (2022). The sport training process of para-athletes: A systematic review. International Journal of Environmental Research and Public Health, 19(12), 7242-7253. https://doi.org/10.3390/ijerph19127242
[9] Cizmic, D., Hoelbling, D., Baranyi, R., Breiteneder, R., & Grechenig, T. (2023). Smart boxing glove "RD α": IMU combined with force sensor for highly accurate technique and target recognition using machine learning. Applied Sciences, 13(16), 9073-9088. https://doi.org/10.3390/app13169073
[10] Balkhi, P., & Moallem, M. (2022). A multipurpose wearable sensor-based system for weight training. Automation, 3(1), 132-152. https://doi.org/10.3390/automation3010007
[11] Calderón-Díaz, M., Silvestre Aguirre, R., Vásconez, J. P., Yáñez, R., Roby, M., Querales, M., & Salas, R. (2023). Explainable machine learning techniques to predict muscle injuries in professional soccer players through biomechanical analysis. Sensors, 24(1), 119-131. https://doi.org/10.3390/s24010119
[12] Iduh, B. N., Umeh, M. N., Anusiuba, O. I., & Egba, F. A. (2024). Development of a predictive modeling framework for athlete injury risk assessment and prevention: A machine learning approach. European Journal of Theoretical and Applied Sciences, 2(4), 894-906. https://doi.org/10.59324/ejtas.2024.2(4).73
[13] Chen, J., & Cui, P. (2024). The application of deep learning in sports competition data prediction. Scalable Computing: Practice and Experience, 25(6), 5322-5330.
[14] Taborri, J., Palermo, E., & Rossi, S. (2023). Warning: A wearable inertial-based sensor integrated with a support vector machine algorithm for the identification of faults during race walking. Sensors, 23(11), 5245-5256. https://doi.org/10.3390/s23115245
[15] Hanif, M. A., Akram, T., Shahzad, A., Khan, M. A., Tariq, U., Choi, J. I., et al. (2022). Smart devices based multisensory approach for complex human activity recognition. Computers, Materials & Continua, 70(2), 3221-3234. https://doi.org/10.32604/cmc.2022.019815
[16] AshokKumar, S., & Rajesh, K. P. (2023). Hyper-parameters activation on machine learning algorithms to improve the recognition of human activities with IoT sensor dataset. Indian Journal of Science and Technology, 16(35), 2856-2867. https://doi.org/10.17485/IJST/v16i35.882
[17] Kumar, G. S., Kumar, M. D., Reddy, S. V. R., Kumari, B. S., & Reddy, C. R. (2024). Injury prediction in sports using artificial intelligence applications: A brief review. Journal of Robotics and Control (JRC), 5(1), 16-26. https://doi.org/10.18196/jrc.v5i1.20814
[18] Molavian, R., Fatahi, A., Abbasi, H., & Khezri, D. (2023). Artificial intelligence approach in biomechanics of gait and sport: a systematic literature review. Journal of Biomedical Physics & Engineering, 13(5), 383-395. https://doi.org/10.31661/jbpe.v0i0.2305-1621
[19] Malamatinos, M. C., Vrochidou, E., & Papakostas, G. A. (2022). On predicting soccer outcomes in the Greek league using machine learning. Computers, 11(9), 133-145. https://doi.org/10.3390/computers11090133
[20] Merzah, B. M., Croock, M. S., & Rashid, A. N. (2024). Intelligent classifiers for football player performance based on machine learning models. International Journal of Electrical and Computer Engineering Systems, 15(2), 173-183. https://doi.org/10.32985/ijeces.15.2.6
[21] Bunker, R., & Susnjak, T. (2022). The application of machine learning techniques for predicting match results in team sport: A review. Journal of Artificial Intelligence Research, 73(3), 1285-1322. https://doi.org/10.1613/jair.1.13509
[22] Teixeira, J. E., Afonso, P., Schneider, A., Branquinho, L., Maio, E., Ferraz, R., et al. (2025). Player tracking data and psychophysiological features associated with mental fatigue in U15, U17, and U19 male football players: A machine learning approach. Applied Sciences, 15(7), 3718-3730. https://doi.org/10.3390/app15073718
[23] Woltmann, L., Hartmann, C., Lehner, W., Rausch, P., & Ferger, K. (2023). Sensor-based jump detection and classification with machine learning in trampoline gymnastics. German Journal of Exercise and Sport Research, 53(2), 187-195. https://doi.org/10.1007/s12662-022-00866-3
[24] Sonalcan, H., Bilen, E., Ateş, B., & Seçkin, A. Ç. (2025). Action recognition in basketball with inertial measurement unit-supported vest. Sensors, 25(2), 563. https://doi.org/10.3390/s25020563
[25] Canbulat, O. A., Turgay, S., & Kara, E. S. (2025). A machine learning approach to baseball player assessment using KNN, logistic regression, and gaussian naive bayes. Financial Engineering, 3, 14-21. https://doi.org/10.37394/232032.2025.3.2
[26] Zhang, Y., Wang, X., Xiu, H., Ren, L., Han, Y., Ma, Y., Chen, W., Wei, G., & Ren, L. (2023). An optimization system for intent recognition based on an improved KNN algorithm with minimal feature set for powered knee prosthesis. Journal of Bionic Engineering, 20(6), 2619-2632. https://doi.org/10.1007/s42235-023-00419-w
[27] Cao, G., Zhang, Y., Zhang, H., Zhao, T., & Xia, C. (2024). A hybrid recognition method via KELM with CPSO for MMG-based upper-limb movements classification. Journal of Mechanics in Medicine and Biology, 24(06), 2350084. https://doi.org/10.1142/S0219519423500847
https://doi.org/10.31449/inf.v49i16.9709 Informatica 49 (2025) 351–360 351
Robust Cascaded Clutter Suppression and Deep Integration of
Spatiotemporal Point Networks for Enhanced Mmwave Radar
Motion Capture in Snowsports
Yulun Liu
Sport Institute, Henan University, Kaifeng 475001, China
E-mail: lunzi2323@163.com
Keywords: millimeter wave radar, anti-interference algorithm, clutter suppression, joint positioning RMSE
Received: June 13, 2025
In snow sports motion capture, mmWave radar signals suffer from multipath reflections and frequency
offsets due to snowflake scattering and temperature variations, severely degrading pose estimation
accuracy. To address this, we propose a cascaded anti-interference framework composed of adaptive MTI
filtering, genetic sparse array optimization, and hybrid carrier tracking. These physical-layer
enhancements are followed by a spatiotemporal 3D CNN–LSTM network for motion decoding and a
multimodal Kalman-particle filter for trajectory fusion. Experimental validation in both simulation and
real-world snow environments confirms the framework’s robustness. Compared to baseline systems, the
proposed method reduces the joint positioning root mean square error (RMSE) by up to 72%, enhances
angular velocity tracking precision by 72%, and improves signal-to-noise ratio (SNR) by 24.3 dB. The
end-to-end processing delay remains under 26 ms, ensuring real-time deployment. These results
demonstrate significant improvements in accuracy, robustness, and real-time performance under harsh
environmental interference, offering a viable solution for mmWave-based motion capture in snowy sports
scenarios.
Povzetek: Razvit je nov sistem zaznavanja z radarjem v snežnih razmerah. Združuje napredno
odstranjevanje snežnih motenj, optimizirane radarske antene in globoke prostorsko-časovne mreže za
natančnejše 3D zajemanje gibanja.
1 Introduction

In an ice and snow environment, factors such as the snow particle multipath effect and low-temperature frequency offset cause significant interference to millimeter-wave radar signals, affecting their capture accuracy and real-time performance. This paper proposes an anti-interference algorithm optimization scheme based on a signal-feature-trajectory three-level processing pipeline, aiming to improve the accuracy and real-time performance of millimeter-wave radar in real-time motion capture for ice and snow sports. The scheme suppresses the multipath effect of snow particles and low-temperature frequency deviation by improving MTI filtering, sparse array reconstruction and the carrier tracking loop; constructs a 3D convolution-LSTM spatiotemporal hybrid network to decouple joint motion characteristics; and adopts an extended Kalman-particle filter hybrid architecture to fuse multimodal data and improve the physical rationality of trajectory prediction.
The paper first introduces the research background, the contributions of this paper and the structure of the paper; summarizes the current research status of motion capture technology at home and abroad [1]; then introduces the specific implementation of the proposed anti-interference algorithm optimization scheme; then verifies the performance of the proposed scheme through experiments; and finally summarizes the whole paper and looks forward to future research directions.
Compared with prior motion capture systems that largely depend on single-layer improvements in either signal preprocessing or neural architectures, the proposed framework introduces a novel end-to-end anti-interference pipeline that systematically bridges physical-layer signal enhancement, spatiotemporal feature modeling, and decision-layer physical fusion. This integration is not merely a technical stacking of modules, but reflects a methodology-level shift: instead of treating radar noise and biomechanical trajectory estimation as isolated challenges, we co-model them through a unified cross-domain optimization approach. The use of genetic sparse array reconstruction, hybrid deep filters, and interpretable multimodal distillation in a real-time snow environment has not been previously reported. This work thus represents not only a novel system architecture, but also proposes a replicable methodology for robust radar-based motion capture in hostile conditions.
2 Related work

Researchers are committed to improving the accuracy and efficiency of motion capture through various technical means to meet the application requirements of different scenarios. Addressing the problems of inaccurate capture and low computational efficiency of existing motion capture technology, which affect its performance in real-time application scenarios, Zhang and Qiu [2] introduced a Levenberg-Marquardt algorithm for skeleton point coordinate fitting optimization and optimized it using the particle swarm algorithm. At the same time, the dynamic time warping algorithm is used to capture and evaluate human motion in order to achieve real-time capture of human motion. The results show that the algorithm has a capture accuracy of up to 99.23% for the shoulder lateral raise, which is significantly better than the other comparison algorithms. Li et al. [3] used a markerless motion capture system and a marker-based motion capture system (Vicon) in the Huawei Sports Health Laboratory to collect human marker trajectory data during the unloaded squat process. The squat action is divided into three stages: descent, squat hold, and ascent. The kinematic data collected by the system is imported into OpenSim, and the knee joint degrees of freedom of the musculoskeletal model are increased to enable adduction/abduction and internal/external rotation. Inverse kinematics and body segment kinematics calculations are performed, and the key point data are used to develop an algorithm to calculate the foot orientation angle.
Li et al. [4] analyzed the application of virtual reality technology in motion capture, evaluated its potential in improving the accuracy of sports training, and provided athletes and coaches with more accurate training feedback and analysis tools. Chen et al. [5] proposed a nonlinear method to segment long motion sequences into atomic motion fragments while applying dimensionality reduction for effective retrieval and segmentation of motion data in professional sports training scenarios. Teer [6] analyzed infrared motion capture data using a random forest algorithm, optimized model parameters, and evaluated model performance; data was collected using both optical markers and IMU sensors. Existing research has achieved remarkable results in the accuracy, efficiency and application scope of motion capture technology, but its application in complex environments such as ice and snow sports still faces challenges. This paper focuses on the application of millimeter-wave radar in real-time motion capture for ice and snow sports and proposes an anti-interference algorithm optimization scheme. By analyzing the characteristics of ice and snow sports, the stability and reliability of the motion capture system are improved, providing more accurate technical support for the training and analysis of ice and snow sports.

3 Method

3.1 Overall framework design

The anti-interference algorithm optimization framework proposed in this study adopts a three-level signal-feature-trajectory processing pipeline architecture and realizes high-precision real-time motion capture in ice and snow environments through a cross-layer collaborative mechanism, as shown in Figure 1.

Figure 1: Processing architecture
At the physical layer, the cascaded clutter suppression module performs preprocessing on the snow particle multipath effect and low-temperature frequency deviation to provide high-quality signal input for subsequent processing; the feature layer decouples the joint motion characteristics through a spatiotemporal hybrid neural network to solve the spatiotemporal coupling problem in dynamic target tracking; the decision layer uses a hybrid filtering architecture to fuse kinematic constraints to improve the physical rationality of
trajectory prediction. Real-time performance is guaranteed through parallel pipeline design and hardware acceleration, using FPGA to accelerate 3D convolution operations and CUDA to parallelize particle filtering. The system obtains the original millimeter-wave radar signal via zero-copy transmission; after second-order suppression and differential preprocessing, the signal is divided into two paths: one path outputs joint features through a spatiotemporal hybrid network (comprising 3D convolution feature extraction, LSTM time-series modeling, self-attention fusion, and knowledge distillation); the other path is processed by improved MTI filtering, spatial array reconstruction, and adaptive carrier tracking. The joint features are used for semantic motion detection and are then fed, together with the above results, into a hybrid filter (a fusion of extended Kalman filtering, particle filtering, and rigid-body constraints) to achieve target tracking. Performance is further improved throughout the process by FPGA acceleration and CUDA parallel computing. Zero-copy data transmission between the three-level processing modules is achieved through a shared memory pool, and a timestamp alignment module eliminates cross-layer delays, forming a complete processing chain from raw signals to semantic understanding.

3.2 Physical layer

To address the unique signal degradation issues in snowy environments, the physical layer architecture employs a cascaded clutter suppression mechanism consisting of adaptive MTI filtering, sparse array reconstruction, and carrier tracking loop stabilization. This section outlines both the theoretical modeling and the algorithmic implementation to ensure clarity and reproducibility.

3.2.1 Radar signal modeling

The transmitted signal is modeled as a linear FMCW chirp:

s_{tx}(t) = \cos\left(2\pi\left(f_c t + \frac{B}{2T} t^2\right)\right)   (1)

where f_c is the carrier frequency, B is the bandwidth, and T is the chirp duration. The received baseband signal is:

s_{IF}(t) = \sum_{k=1}^{K} A_k \cos\left[2\pi\left(f_{b,k} + f_{d,k}\right)t + \phi_k\right] + n(t)   (2)

with:
f_{b,k} \approx \frac{2 R_k B}{cT}: beat frequency,
f_{d,k} = \frac{2 v_k f_c}{c}: Doppler shift,
R_k, v_k: target range and velocity,
n(t): additive noise.

Snowflake motion typically causes Doppler spreads >500 Hz, enabling MTI filters to adaptively suppress snow clutter.

3.2.2 Sparse array optimization via genetic algorithm

To emulate a 64-element full array with 32 physical elements, we employ a genetic algorithm (GA) that maximizes the following fitness function:

F = \alpha \cdot \frac{1}{\mathrm{PSLL}} + \beta \cdot \frac{1}{\mathrm{MBW}}, \quad \alpha = 0.6, \; \beta = 0.4   (3)

Formally, the optimization problem is:

\max_{\mathbf{X}' \subseteq \mathbf{X},\, |\mathbf{X}'| = 32} F(\mathbf{X}') = \alpha \cdot \frac{1}{\mathrm{PSLL}(\mathbf{X}')} + \beta \cdot \frac{1}{\mathrm{MBW}(\mathbf{X}')}   (4)

Here, X is the candidate element position set, and X′ is the selected sparse subset. Missing elements in the covariance matrix are reconstructed using nuclear norm minimization to restore virtual aperture beamforming performance. Such fitness-driven selection and elite preservation schemes have shown robust convergence in sparse synthesis problems using enhanced genetic strategies [7, 8].

Algorithm 1: Genetic Algorithm for Sparse Array Optimization
Input: Position set X, population size M=50, generations G=200, crossover rate Pc=0.7, mutation rate Pm=0.01
Output: Optimal layout X′, fitness score F
(1) Initialize a random population of sparse layouts
(2) For generation g = 1 to G:
  a. Evaluate fitness F for each layout
  b. Select parents via tournament selection
  c. Apply crossover with probability Pc
  d. Apply Gaussian mutation with probability Pm
  e. Preserve top performers (elitism)
  f. If the best fitness changes by <1% over 5 generations, terminate
(3) Return the best layout X′

The convergence threshold was empirically set to 1% over 5 generations, ensuring both global search stability and computational efficiency across the tested scenarios. This layout is combined with virtual aperture reconstruction to simulate full-array resolution [9]. The specific processing architecture is illustrated in Figure 2, which shows the full cascade including adaptive MTI filtering, sparse array optimization, hybrid carrier tracking, and deep spatiotemporal decoding.
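The following is a minimal Python sketch of Algorithm 1 under stated assumptions: a uniform half-wavelength candidate grid, an array-factor-based estimate of PSLL and main-beam width in place of the paper's beam-pattern evaluation, and no nuclear-norm reconstruction step. It is intended only to illustrate the fitness-driven selection, elitism, and early-stop rule.

import numpy as np

rng = np.random.default_rng(0)
N_FULL, N_ACTIVE = 64, 32          # candidate positions vs. physical elements
M, G, PC, PM = 50, 200, 0.7, 0.01  # GA hyperparameters from Algorithm 1
ALPHA, BETA = 0.6, 0.4

def beam_metrics(mask, d=0.5, n_angles=721):
    """Estimate peak sidelobe level (linear) and -3 dB main-beam width (rad)."""
    pos = np.nonzero(mask)[0] * d
    theta = np.linspace(-np.pi / 2, np.pi / 2, n_angles)
    af = np.abs(np.exp(1j * 2 * np.pi * np.outer(np.sin(theta), pos)).sum(axis=1))
    af /= af.max()
    above = np.where(af >= 0.5)[0]                         # -3 dB region
    mbw = (above.max() - above.min()) * (np.pi / (n_angles - 1))
    sidelobes = np.delete(af, np.arange(above.min(), above.max() + 1))
    psll = sidelobes.max() if sidelobes.size else 1e-3
    return psll, max(mbw, 1e-6)

def fitness(mask):
    psll, mbw = beam_metrics(mask)
    return ALPHA / psll + BETA / mbw                       # Eq. (3)/(4)

def random_layout():
    mask = np.zeros(N_FULL, dtype=bool)
    mask[rng.choice(N_FULL, N_ACTIVE, replace=False)] = True
    return mask

def tournament(scores, k=3):
    idx = rng.choice(len(scores), k, replace=False)        # pick k, keep the best
    return idx[np.argmax(scores[idx])]

pop, best_hist = [random_layout() for _ in range(M)], []
for g in range(G):
    scores = np.array([fitness(p) for p in pop])
    order = np.argsort(scores)[::-1]
    elite = [pop[i] for i in order[:2]]                    # elitism
    children = []
    while len(children) < M - len(elite):
        a, b = pop[tournament(scores)], pop[tournament(scores)]
        child = a.copy()
        if rng.random() < PC:                              # crossover: mix parents
            swap = rng.random(N_FULL) < 0.5
            child[swap] = b[swap]
        on, off = np.nonzero(child)[0], np.nonzero(~child)[0]
        if len(on) > N_ACTIVE:                             # repair to exactly 32 elements
            child[rng.choice(on, len(on) - N_ACTIVE, replace=False)] = False
        elif len(on) < N_ACTIVE:
            child[rng.choice(off, N_ACTIVE - len(on), replace=False)] = True
        if rng.random() < PM:                              # mutation: move one element
            on, off = np.nonzero(child)[0], np.nonzero(~child)[0]
            child[rng.choice(on)], child[rng.choice(off)] = False, True
        children.append(child)
    pop = elite + children
    best_hist.append(scores.max())
    if g >= 5 and abs(best_hist[-1] - best_hist[-6]) / best_hist[-6] < 0.01:
        break                                              # early-stop rule from Algorithm 1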
Figure 2: Flowchart of Genetic Algorithm

3.2.3 Carrier tracking loop stabilization

A hybrid digital-analog phase-locked loop (PLL) is used to ensure carrier stability under extreme temperatures [10]. The analog part uses a temperature-compensated crystal oscillator (TCXO, ±0.5 ppm), and the digital part features a high-resolution (0.01 rad) phase detector. Predistortion compensation is applied via a LUT-based correction. Adaptive loop bandwidth (10–100 kHz) ensures phase noise <0.5 MHz and frequency deviation <±200 Hz at 77 GHz.

3.2.4 Cascaded clutter suppression pipeline

The clutter suppression pipeline employs a hybrid feedforward–feedback structure, in which Doppler-based MTI filtering eliminates low-velocity clutter, while cross-correlation feedback adaptively tunes the cutoff frequency based on environmental dynamics. Real-time continuity is maintained through fixed-depth data buffers (512 samples); a minimal sketch of this stage is given after Table 1.

3.2.5 Quantitative results summary

To quantitatively evaluate the effectiveness of the proposed cascaded architecture, a series of simulations was conducted under realistic snow-interference conditions. Specifically, 1000 Monte Carlo trials were performed at a 1 GHz sampling rate to measure improvements in signal quality, latency, and noise rejection across the processing stages. The results are summarized in Table 1.

Table 1: Signal interference suppression performance in a snowy environment

Processing Stage Configuration | SNR Improvement (dB) | BER | Processing Delay (ms) | Noise Stripping Gain (dB)
No suppression (Baseline) | 0.0 | 1.0 × 10⁻³ | 5.0 | 0.0
MTI filtering only | 8.5 | 5.0 × 10⁻⁴ | 5.5 | 7.2
MTI + Sparse array reconstruction | 14.2 | 1.2 × 10⁻⁴ | 6.8 | 12.5
MTI + Sparse array + Carrier tracking | 18.7 | 3.0 × 10⁻⁵ | 7.5 | 16.8
Full cascade system (All three stages) | 24.3 | 5.0 × 10⁻⁶ | 8.0 | 22.0
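Below is a minimal NumPy sketch of the MTI stage of Section 3.2.4, under stated assumptions: a simple two-pulse canceller across consecutive chirps and an illustrative cross-correlation-driven cutoff update. The filter order and the exact feedback law used in the paper are not specified and are assumed here.

import numpy as np

BUFFER_DEPTH = 512  # fixed-depth buffer, as in Section 3.2.4

def mti_two_pulse(frames: np.ndarray) -> np.ndarray:
    """frames: (n_chirps, n_samples) slow-time x fast-time matrix.
    Subtracting consecutive chirps suppresses (near-)static clutter."""
    return frames[1:] - frames[:-1]

def update_cutoff(prev_frame, curr_frame, f_cut, f_min=50.0, f_max=800.0, k=200.0):
    """Raise the MTI cutoff when consecutive frames are highly correlated
    (slowly varying clutter); lower it when the scene is dynamic."""
    a = (prev_frame - prev_frame.mean()).ravel()
    b = (curr_frame - curr_frame.mean()).ravel()
    rho = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    target = f_min + k * max(rho, 0.0)          # illustrative mapping only
    return np.clip(0.9 * f_cut + 0.1 * target, f_min, f_max)

# usage on synthetic data: a constant offset mimics static snow clutter
rng = np.random.default_rng(1)
frames = rng.normal(size=(64, BUFFER_DEPTH)) + 5.0
filtered = mti_two_pulse(frames)
f_cut = update_cutoff(frames[-2], frames[-1], f_cut=100.0)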
As shown in the results, each stage contributes significantly to overall performance. While MTI filtering alone improves the signal-to-noise ratio by 8.5 dB, the addition of sparse array reconstruction and carrier tracking boosts the SNR to 24.3 dB and reduces the BER by nearly two orders of magnitude. The full cascade system also maintains low latency (8 ms), making it suitable for real-time applications in snow-covered environments and demonstrating enhanced robustness to noise [11].

3.2.6 Benchmark comparison and robustness test

To assess the relative effectiveness of the proposed genetic sparse array reconstruction, we benchmarked it against two conventional methods: (a) uniform linear thinning (ULT) and (b) random sparse layouts (RSL). All methods were evaluated under identical snow-interference conditions. Key results:
(1) The proposed GA-based layout achieved a 24.3 dB SNR improvement, outperforming ULT (16.1 dB) and RSL (12.4 dB);
(2) The full system reduced BER by nearly 10× over ULT and 30× over RSL;
(3) Under low-SNR boundary tests (<5 dB), our method maintained <1.2 × 10⁻⁴ BER, while the other layouts degraded sharply.
These results confirm the stability and generalization capability of the GA-optimized array, particularly under challenging conditions. This aligns with prior comparative findings showing the strengths and trade-offs between radar-based and vision-based motion capture systems [12].

3.3 Feature layer

The 3D convolution–LSTM spatiotemporal hybrid network constructed in the feature layer provides a systematic solution to the problem of feature extraction from millimeter-wave point clouds in motion capture [13]. At the network input, a dynamic voxelization method based on motion compensation converts the sparse millimeter-wave radar point cloud data (typical density 0.3 points/cm³) into a regular dense tensor representation. The voxel grid size is set to 2 cm³, and missing voxels are filled by trilinear interpolation while retaining the geometric features of the original point cloud:

V(x, y, z) = \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} V_{i,j,k} \cdot (1 - |x - x_i|) \cdot (1 - |y - y_j|) \cdot (1 - |z - z_k|)   (5)

V(x, y, z) is the value of the target voxel point; V_{i,j,k} is the value of the surrounding known voxel points i, j, k; (x_i, y_j, z_k) are the coordinates of the surrounding known voxel points; (x, y, z) are the coordinates of the target voxel point.

The spatial feature extraction part uses a 5×5×5 3D convolution kernel for multi-scale feature learning and gradually expands the receptive field through hierarchical dilated convolution (dilation rates of 1, 2, and 4, respectively), ensuring that spatial features ranging from local joints to complete limb movements can be captured:

F_{out} = W * F_{in} + b   (6)

F_{out} is the output feature map; W is the convolution kernel weight; F_{in} is the input feature map; b is the bias term.

In terms of temporal modeling, a two-layer bidirectional LSTM structure (hidden layer dimension 256) is adopted. Peephole connections are introduced inside each LSTM unit to enhance temporal memory, and the zoneout mechanism (probability 0.2) is used to prevent overfitting and to model the continuity and periodicity of human motion effectively.

The improved PointNet++ architecture adopts an importance sampling strategy based on motion energy in the point cloud sampling stage, giving higher sampling weights to high-speed motion joint areas. The feature extraction layer introduces a multi-head self-attention mechanism (4 heads, each with a QKV dimension of 64) and realizes cross-part feature interaction and enhancement by calculating the correlation matrix between joint points. Specifically, after sampling at each level, local features are first extracted through an MLP, the spatial dependency of joint points is then calculated through the self-attention module, and finally the global feature representation is obtained through max pooling. This design enables the network to adaptively focus on key joint point areas while maintaining the ability to perceive the overall posture.

The knowledge distillation system constructs a hierarchical multimodal teacher network. The teacher network not only contains a ResNet-152 backbone pre-trained on high-precision optical motion capture data (sampling rate 200 Hz) but also integrates a motion prior knowledge base built from a biomechanical simulation model. During distillation, a progressive temperature scheduling strategy is adopted: in the initial stage a higher temperature parameter (T = 5) is set to learn the overall feature distribution of the teacher network, and as training progresses the temperature is gradually reduced (finally T = 1) to focus on the transfer of fine-grained motion features:

L_{KD} = \sum_i T^2 \, KL(P_i \parallel Q_i)   (7)

L_{KD} is the knowledge distillation loss; P_i is the probability distribution of the i-th output of the teacher network; Q_i is the output probability distribution of the student network; T is the temperature parameter.

Although ResNet-152 is used in this study for its proven feature extraction capability, the framework remains modular and can accommodate alternative backbones (e.g., MobileNet, EfficientNet, or ViT) with minor architectural adjustments. In particular, transformer architectures may be integrated in the future to better capture long-range temporal dependencies and improve robustness under occlusion [14].
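A minimal PyTorch sketch of the temperature-scaled distillation loss in Eq. (7) is given below, assuming a simple linear temperature schedule from T = 5 down to T = 1. The teacher and student logits are placeholders; the paper's teacher and student networks are not reproduced here.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float) -> torch.Tensor:
    """T^2 * KL(P || Q) with softened distributions P (teacher) and Q (student)."""
    p = F.softmax(teacher_logits / T, dim=-1)            # teacher distribution P_i
    log_q = F.log_softmax(student_logits / T, dim=-1)    # student distribution Q_i
    return (T ** 2) * F.kl_div(log_q, p, reduction="batchmean")

def temperature(epoch: int, total_epochs: int, t_start=5.0, t_end=1.0) -> float:
    """Progressive schedule: start soft (T=5), end sharp (T=1)."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return t_start + (t_end - t_start) * frac

# usage with dummy logits (24 output classes is purely illustrative)
student = torch.randn(8, 24)
teacher = torch.randn(8, 24)
loss = kd_loss(student, teacher, T=temperature(epoch=0, total_epochs=100))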
3.4 Decision layer

Inspired by hybrid sensor fusion frameworks such as [15], this study designs an extended Kalman–particle filter (EKF-PF) hybrid architecture that achieves robust trajectory estimation in ice and snow sports scenarios through multimodal data fusion and physical constraint modeling. The architecture combines the complementary advantages of the model-based EKF and the data-driven PF: the EKF module uses the acceleration and angular velocity data of the IMU (sampling rate 200 Hz) to represent the rigid-body motion state on the Lie group SE(3), avoiding the Euler-angle singularity problem. Its state equation incorporates the moment-of-inertia tensor constraint of the ski equipment, keeping the attitude estimation error stable within 2°. The state equation is:

x_{t|t-1} = f(x_{t-1}, u_t)   (8)

x_{t|t-1} is the prior state estimate at time t; f is the state transition function; x_{t-1} is the state at the previous time step; u_t is the control input.

The particle filter module handles the nonlinear characteristics of ice and snow sports and introduces a multi-physics coupling model in the importance sampling stage. When the snowboard lands, Hertz contact theory is used to construct the snow-surface interaction model (stiffness coefficient k = 5×10⁴ N/m³), and the friction coefficient μ is dynamically adjusted according to the compression characteristics of the snow particles (adaptive changes in the range 0.03–0.15). In the airborne stage, the law of conservation of angular momentum is strictly followed, and the center-of-mass trajectory is corrected by constraining the moment-of-inertia ratio of the limbs relative to the trunk, so that the trajectory error of jumping actions is significantly reduced. The importance sampling weight update formula is:

w_t^{(i)} \propto w_{t-1}^{(i)} \cdot p(z_t \mid x_t^{(i)})   (9)

w_t^{(i)} and w_{t-1}^{(i)} are the weights of the i-th particle at times t and t−1; p is the observation probability, i.e., the probability of observing z_t in state x_t^{(i)}.

The fusion strategy of the hybrid architecture adopts a dynamic probability weighting mechanism: the confidence weight of the millimeter-wave radar (0.7) is adjusted in real time according to point cloud density and signal-to-noise ratio (within the range 0.6–0.8), and the IMU weight (0.3) is inversely correlated with its gyroscope zero-bias stability index.
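The sketch below illustrates the particle weight update of Eq. (9) together with an illustrative radar/IMU confidence weighting. The Gaussian observation likelihood and the mapping from point-cloud density and SNR to the radar weight are assumptions for demonstration, not the paper's exact models.

import numpy as np

def update_weights(weights, particles, z, sigma=0.05):
    """w_t ∝ w_{t-1} * p(z | x); p is an assumed isotropic Gaussian likelihood."""
    d2 = np.sum((particles - z) ** 2, axis=1)
    w = weights * np.exp(-0.5 * d2 / sigma ** 2)
    return w / (w.sum() + 1e-12)                          # normalize

def radar_confidence(point_density, snr_db, lo=0.6, hi=0.8):
    """Shift the nominal radar weight (0.7) within [0.6, 0.8] using an
    illustrative score built from point-cloud density and SNR."""
    score = np.tanh(point_density / 0.3) * np.clip(snr_db / 20.0, 0.0, 1.0)
    return np.clip(lo + (hi - lo) * score, lo, hi)

# usage with dummy data: 500 particles over a 3D joint position
rng = np.random.default_rng(2)
particles = rng.normal(size=(500, 3))
weights = np.full(500, 1.0 / 500)
z_radar = np.array([0.1, -0.2, 1.4])                      # hypothetical observation
weights = update_weights(weights, particles, z_radar)
w_radar = radar_confidence(point_density=0.3, snr_db=18.0)
w_imu = 1.0 - w_radar                                     # complementary IMU weight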
4 Results and discussion

4.1 Study design

The experimental protocol includes three levels of testing procedures. To ensure the reproducibility and rigor of the evaluation, the study design includes detailed specifications on trial repetition, control configurations, and validation protocols. The three-stage experiment comprises the following components: 1) Baseline Calibration: conducted in a controlled environment (−5 °C, 60% relative humidity), a KUKA KR6 R900 robotic arm equipped with a 10 dBsm radar corner reflector executes preset trajectories with linear velocities of 0–15 m/s and angular velocities up to 1080°/s. A total of 300 motion sequences were collected to calibrate the internal parameters of the mmWave radar and to establish the unoptimized system baseline. 2) Static Interference Test: five representative snow conditions (e.g., fresh snow, compacted snow, ice crystal snow) were simulated using a snow density gradient apparatus (0.1–0.4 g/cm³). Each scenario was repeated 20 times under a snowfall intensity of 5–7 mm/h to collect radar intermediate-frequency signals and environmental variables, serving as the static clutter reference. 3) Dynamic Motion Capture: 30 professional winter sports athletes (across freestyle skiing, alpine skiing, and snowboarding disciplines) performed standardized actions including linear gliding, sharp turning, and airborne rotations. Each action was repeated 15 times. Raw mmWave point clouds, inertial data, and optical motion capture (Vicon, 200 Hz) were synchronously recorded. The study uses both optimized and unoptimized radar pipelines to quantify performance gains.

All trials were conducted within a purpose-built climate-controlled snow chamber (5 m × 8 m × 3 m), with temperature regulation from −30 °C to 25 °C (±0.5 °C accuracy) and snow layers of 10–50 cm. All sensors were synchronized via the PTP protocol. A five-fold cross-validation strategy was applied by stratifying athletes across training and testing splits to ensure generalizability and prevent overfitting.

In order to verify the performance of the proposed framework in a real ice and snow environment, this study built a professional test environment with climate control. The core of the experimental platform is a customized ice and snow environment simulation cabin with a double-layer insulation structure. The inner layer is a 5 m × 8 m × 3 m test space equipped with a precision temperature control system (control range −30 °C to 25 °C, accuracy ±0.5 °C) and a humidity control device (relative humidity range 30%–90%). The floor of the cabin is paved with an artificial snow layer of adjustable thickness (10–50 cm), and the snow density is controlled in the range 0.1–0.4 g/cm³ to simulate different snow conditions. The test scenario configuration includes: 1) a multi-angle adjustable millimeter-wave radar array (four 77 GHz FMCW radars, bandwidth 4 GHz, maximum output power 10 dBm), installed at a height of 2.5 m and distributed in a ring; 2) a reference-level optical motion capture system (12 Vicon Vero series cameras, sampling rate 200 Hz) as the baseline ground truth; 3) distributed IMU nodes (9-axis sensors, bandwidth 200 Hz) fixed at the main joints of the subjects; 4) environmental parameter monitoring terminals, recording variables such as temperature, humidity, and wind speed in real time. All devices are time-synchronized through the PTP protocol, and data acquisition is controlled by a unified trigger signal to ensure the time-alignment accuracy of the multimodal data. During the test, the subjects wore standard skiing equipment and completed the specified action sequence (straight sliding, sharp turns, jumping, etc.), while millimeter-wave point clouds, IMU data, and optical motion capture coordinates were synchronously collected to construct a multidimensional dataset covering different motion states [16].
While the Vicon system provides high-accuracy ground truth, the framework is compatible with alternative motion tracking modalities, such as wearable IMUs or markerless systems like OpenPose [17], ensuring flexibility in deployment.

4.2 Quantitative indicator analysis

4.2.1 Motion capture accuracy verification

This paper selects 30 ice and snow athletes (including 10 freestyle skiing aerialists, 8 alpine skiing downhillers, and 12 snowboard big air athletes) to evaluate the algorithm optimization effect of millimeter-wave radar in extreme environments. The millimeter-wave radar before and after optimization captures the joint positioning RMSE of the athletes' movements and the angular velocity error of aerial rotation movements. The results are shown in Figures 3 and 4.

Figure 3: RMSE of joint positioning

As shown in Figure 3, the comparison of joint positioning RMSE before and after optimization of the millimeter-wave radar anti-interference algorithm shows that the joint positioning errors across the 30 ice and snow athletes are distributed in the range 7.0–13.0 cm before optimization, with the 13.0 cm error of athlete No. 30 being the maximum, reflecting the performance bottleneck of the millimeter-wave radar under extreme sports conditions. After optimization, the error range is compressed to 1.1–7.0 cm through the synergy of the cascaded clutter suppression module and the 3D convolution–LSTM spatiotemporal hybrid network. Athlete No. 22 achieves a breakthrough accuracy of 1.1 cm in straight-line gliding conditions, which is primarily attributed to the enhanced suppression effect of the improved MTI filter on snow- and fog-induced multipath interference. For each athlete, the joint positioning RMSE values represent the mean of 15 repeated trials, with standard-deviation error bars shown in Figure 3. A two-tailed paired t-test comparing the baseline and optimized systems revealed statistically significant improvements across all athletes (p < 0.01), demonstrating that the proposed anti-interference algorithm robustly enhances the motion capture accuracy of millimeter-wave radar in complex snow environments.
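The reported two-tailed paired t-test can be reproduced in outline as follows, using scipy.stats.ttest_rel on per-athlete mean RMSE values. The arrays below are placeholders and do not represent the study's data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
baseline_rmse = rng.uniform(7.0, 13.0, size=30)              # cm, illustrative only
optimized_rmse = rng.uniform(1.1, 7.0, size=30)              # cm, illustrative only

t_stat, p_value = stats.ttest_rel(baseline_rmse, optimized_rmse)   # paired, two-tailed
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")                # significant if p < 0.01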
Figure 4: Angular velocity error

As shown in Figure 4, the average angular velocity error of the millimeter-wave radar across the 30 ice and snow athletes before algorithm optimization is 5.25°/s (range 3.2–6.9°/s); athlete No. 25 showed the maximum error of 6.9°/s when completing a 1080° rotation, exposing the phase-loss problem of millimeter-wave radar in high angular velocity dynamic tracking. Through the joint optimization of the cascaded clutter suppression module and the 3D convolution–LSTM spatiotemporal hybrid network, the average angular velocity error is reduced to 1.46°/s (range 0–2.8°/s) after optimization, a decrease of 72%, and athletes No. 25 and 26 achieve zero-error tracking. The reduction in angular velocity error was statistically significant across athletes (paired t-test, p < 0.01), with error bars in Figure 4 indicating the standard deviation over 15 repetitions per action. The experimental results show that the proposed anti-interference algorithm significantly improves the angular velocity tracking accuracy of millimeter-wave radar in ice and snow sports scenarios, especially for difficult rotation movements. However, under extreme conditions there is still a residual error of 2.8°/s, mainly due to insufficient compensation for the Doppler frequency shift caused by high-speed movement. In the future, the tracking performance of the system in high-dynamic scenarios will be further improved by introducing an adaptive carrier tracking loop and hardware-accelerated processing to meet stringent motion capture accuracy requirements.

4.2.2 Anti-interference performance analysis

In response to the extreme weather interference common in ice and snow sports, this study builds a multi-physics field coupling model to test the anti-interference ability of millimeter-wave radar under different meteorological conditions. As shown in Table 2, in the simulated blizzard weather (snowfall > 5 mm/h) test, the proposed cascaded clutter suppression module shows excellent multipath interference suppression ability.

Table 2: Anti-interference ability test results

Test Scenario | Snowfall Intensity (mm/h) | Multipath Suppression Ratio (dB) | Positioning Error (cm)
Freestyle Ski Aerials | 5.8 | 28.2 | 3.2
Snowboard Big Air Landing | 6.3 | 25.7 | 7.0
Alpine Ski Downhill | 7.1 | 26.9 | 4.5
Cross-Country Ski Curves | 5.2 | 29.4 | 2.1
Biathlon Shooting | 6.0 | 27.5 | 3.8
Table 2 shows the anti-interference performance test data of millimeter-wave radar for different ice and snow sports scenes in a blizzard environment. In the 5.2–7.1 mm/h snowfall intensity range, the system shows excellent performance in the freestyle skiing aerials scene, achieving a multipath suppression ratio of 28.2 dB and a positioning error of 3.2 cm. This achievement is mainly due to the synergy of the cascaded clutter suppression module and the 3D convolution–LSTM spatiotemporal hybrid network. In contrast, the snowboard big air landing impact scene is affected by the 8 g impact acceleration, and the positioning error rises to 7.0 cm, directly reflecting the interference effect of carrier frequency deviation and multiple snow-layer reflections on the propagation of millimeter-wave signals. Although the dynamic waveform adjustment technology maintains the multipath suppression ratio at 25.7 dB in this scene, phase noise suppression still needs further optimization by improving the DPLL loop bandwidth. These quantitative results not only confirm the reliability of millimeter-wave radar in extreme ice and snow environments but also provide a clear direction for subsequent technology iterations, especially for the optimization of Doppler compensation algorithms in ultra-high-speed scenarios.

4.2.3 Real-time verification

The end-to-end processing time from radar signal input to trajectory output is recorded to evaluate the real-time performance of the millimeter-wave radar. The results are shown in Table 3.

Table 3: Real-time test results

Test Scenario | Processing Delay (ms) | Multi-Target Capacity | Frame Rate (fps)
Freestyle Ski Aerials | 24.2 ± 1.5 | 3 athletes | 38
Snowboard Big Air Landing | 22.8 ± 1.2 | 3 athletes | 40
Alpine Ski Downhill | 21.5 ± 0.8 | 3 athletes | 42
Cross-Country Ski Curves | 23.1 ± 1.1 | 3 athletes | 39
Biathlon Shooting | 25.6 ± 1.8 | 3 athletes | 36

Table 3 compares the real-time performance of the system across different ice and snow sports scenarios along four key dimensions. In terms of processing delay, all scenarios maintain latencies below 26 ms, with alpine skiing downhill achieving the best performance (21.5 ± 0.8 ms) and biathlon shooting exhibiting the highest delay (25.6 ± 1.8 ms). The multi-target tracking capability consistently supports the simultaneous tracking of three athletes across all test conditions. The system frame rate remains in the range of 36–42 fps, fully meeting the real-time demands of competitive snow sports motion capture. These results clearly demonstrate the real-time performance advantages of the proposed anti-interference algorithm in complex ice and snow environments, providing a reliable foundation for the practical deployment of markerless motion capture systems in elite athletic training and competition. It is important to distinguish between algorithmic latency and full-system latency. The 8 ms latency reported in Table 1 reflects only the simulation-based execution time of the optimized cascade pipeline, evaluated using FPGA and CUDA acceleration. In contrast, the real-world latencies presented in Table 3 include end-to-end delays such as radar signal acquisition, data transfer, and multi-target processing overhead, resulting in a total system delay of 21–26 ms. Despite this, the system remains within acceptable bounds for real-time snow sports motion capture.

5 Conclusion

This study proposes a cascaded anti-interference architecture to address multipath and frequency offset problems in mmWave radar-based motion capture under snowy environmental conditions. Through the integration of adaptive MTI filtering, genetic sparse array reconstruction, and hybrid carrier tracking, combined with a deep spatiotemporal 3D CNN–LSTM decoding network and multimodal EKF–PF fusion, the proposed system demonstrates significant improvements in accuracy, robustness, and real-time performance.

The main contributions of this study are as follows:
1) A three-stage signal processing pipeline is designed to suppress snow-induced multipath clutter and frequency distortion, improving low-SNR motion signal reconstruction.
2) A novel deep learning-based decoder is developed, leveraging 3D CNN and LSTM to model complex temporal-spatial dependencies in radar point clouds.
3) A multimodal fusion strategy integrating extended Kalman filtering and particle filtering is introduced for robust trajectory estimation in dynamic, cluttered environments.
The proposed method is validated on both simulated and real-world datasets involving elite snow sport athletes, showing that the system achieves centimeter-level RMSE accuracy (down to 1.1 cm) and end-to-end latency below 26 ms across diverse scenarios. In future work, we plan to enhance tracking under extreme dynamics by incorporating event-based vision sensors (e.g., DVS), which can further reduce motion blur and improve delay robustness in high-speed actions [18]. Additionally, integrating edge computing and hardware acceleration (e.g., FPGA optimization) will be explored to further optimize latency for large-scale deployment.

Funding

This study is supported by "Research on the Upgrading and Development of China's Sports Industry Driven by New Productive Forces" (No. SKL-2025-1135).

References

[1] H. Li, S. Qiu, and Y. Ma, "A survey on human activity recognition using millimeter-wave radar," ACM Comput. Surv., vol. 56, no. 4, pp. 1–36, 2023.
[2] X. Y. Zhang and G. P. Qiu, "Research on human motion capture based on improved LM algorithm and dynamic time warping algorithm," J. Southwest Univ. (Nat. Sci. Ed.), vol. 46, no. 5, pp. 175–185, 2024. https://doi.org/10.13718/j.cnki.xdzk.2024.05.016
[3] S. F. Li, X. S. Zhang, Y. Guo, X. C. Li, L. Shi, and T. H. Zhan, "Biomechanical study of markerless motion capture technology in FMS squat action," Med. Biomech., vol. 39, no. S01, p. 513, 2024.
[4] X. H. Li, D. F. Fan, J. J. Feng, Y. Lei, C. Cheng, and X. N. Li, "Systematic review of motion capture in virtual reality: Enhancing the precision of sports training," J. Ambient Intell. Smart Environ., vol. 17, no. 1, pp. 5–27, 2025. https://doi.org/10.3233/AIS-230
[5] H. Chen, "Human motion capture data retrieval and segmentation technology for professional sports training," J. Mobile Multimedia, vol. 19, no. 2, pp. 419–436, 2023. https://doi.org/10.13052/jmm1550-4646.1923
[6] B. Teer, "Performance analysis of sports training based on random forest algorithm and infrared motion capture," J. Intell. Fuzzy Syst., vol. 40, no. 4, pp. 6853–6863, 2021. https://doi.org/10.3233/JIFS-189517
[7] T. Alam and M. Benaida, "Smart curriculum mapping and its role in outcome-based education," Informatica, vol. 46, no. 4. https://doi.org/10.31449/inf.v46i4.3717
[8] E. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006. https://doi.org/10.1109/TIT.2006.885507
[9] Y. Zhang, X. Liu, and J. Wang, "Genetic sparse array optimization for millimeter-wave radar in snow interference environments," IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–12, 2023.
[10] N. Kumar and R. Patel, "Temperature-compensated PLL design for FMCW radar in harsh environments," IEEE Trans. Circuits Syst. I, vol. 68, no. 5, pp. 2065–2077, 2021.
[11] E. Baccarelli and M. Scarpiniti, "Robust deep filtering architectures for noisy radar environments," IEEE Access, vol. 11, pp. 22256–22267, 2023.
[12] X. Tang and Q. Song, "Comparative evaluation of radar-based and vision-based human motion capture systems," Meas. Sci. Technol., vol. 34, no. 2, Art. no. 025109, 2023.
[13] J. Wu, S. Zhao, and Y. Liu, "Deep spatiotemporal modeling with CNN-LSTM for real-time radar-based motion capture," Pattern Recognit., vol. 131, Art. no. 108885, 2022.
[14] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," in Adv. Neural Inf. Process. Syst., vol. 30, pp. 5998–6008, 2017. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[15] T. Chen, R. Zhang, and K. Huang, "Hybrid Kalman-particle filtering for multimodal sensor fusion," IEEE Sens. J., vol. 22, no. 9, pp. 8654–8664, 2022.
[16] S. Ahmed, S. Kim, and M. Park, "Snow Sense: A radar-based dataset for motion capture in snowy conditions," Sensors, vol. 22, no. 3, Art. no. 1011, 2022.
[17] Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, 2019. https://doi.org/10.48550/arXiv.1812.08008
[18] G. Gallego, T. Delbrück, and D. Scaramuzza, "Event-based vision: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, 2022. https://doi.org/10.1109/TPAMI.2020.3008413
https://doi.org/10.31449/inf.v49i16.10164 Informatica 49 (2025) 361–372 361
Improved DenseNet-DCGAN for Enhanced Digital Restoration of
Embroidery Cultural Heritage
Guiying Dong1*, Qian Mao2
1College of Art and Design, Communication University of China Nanjing, Nanjing 210000, China
2Library, Nanjing University, Nanjing 210000, China
E-mail: adong118@126.com; maoqian328@163.com
*Corresponding author
Keywords: DCGAN, DenseNet, embroidery, image classification, image restoration
Received: July 14, 2025
At present, embroidery image restoration technology still has deficiencies in terms of color uniformity
and detail restoration. To address these issues, the study improves the densely connected convolutional
network and the deep convolutional generative adversarial network through spatial pyramid pooling, and
proposes a novel method for embroidery image classification and restoration. The experimental results
showed that the research method largely restored the details and colors of the original image and
effectively addressed the uneven color issue. The average prediction accuracy, recall rate, and specificity
of the image classification model on Suzhou embroidery, Hunan embroidery, Guangdong embroidery,
and Shu embroidery reached 96.3%, 98.5%, and 99.4%, respectively. The structural similarity index of
the image restoration model reached 0.99. The restored image was almost indistinguishable from the original to the
naked eye in terms of details, texture, and color. The research method has significant advantages in
classifying embroidery images and high-quality restoration tasks, and can provide reliable technical
support for the digital protection and intelligent restoration of traditional embroidery cultural relics.
Povzetek: Za klasifikacijo in digitalno obnovo vezenin so razviti izboljšani DenseNet in DCGAN z
dodanim SPP, razširjenimi konvolucijami ter CBAM. Izboljšani model skoraj povsem naravno obnovi
teksture in barve.
1 Introduction

Embroidery works have attracted countless people's attention with their exquisite craftsmanship, rich patterns, and profound cultural connotations. However, over time, many embroidery artifacts have suffered natural or human damage, such as fading, breakage, and insect infestation, which seriously threatens the preservation and inheritance of embroidery artifacts [1]. The traditional restoration of Embroidered Cultural Relics (ECR) mainly relies on manual skills. Although this method can finely handle every instance of damage, it is limited by low work efficiency and dependence on the superb skills of the restorer [2]. In addition, subjectivity in the manual repair process may also lead to deviations in the consistency and accuracy of the repair effect. In this context, the emergence of Artificial Intelligence (AI) technology, especially Deep Learning (DL) technology, has provided new solutions for the restoration of cultural relics. By training DL models, staff can automatically detect and classify the types of damage to cultural relics, providing a scientific basis for restoration work. Many researchers have already explored this direction. For example, Maitin et al. proposed a direct reconstruction technique without image segmentation using DL technology to reconstruct missing architectural elements in images of Greek temple ruins from virtual image paintings. This method successfully reconstructed the missing architectural elements, improving the efficiency of restoration and enhancing the consistency and accuracy of the restoration effect [3]. Alessandro et al. used a trained multidimensional DL neural network to associate color images with raw X-ray fluorescence imaging data to complete AI-based digital cultural heritage restoration, achieving digital restoration of graphic artworks [4]. With the further advancement of DL technology, Generative Adversarial Networks (GANs) have made breakthrough progress in image recognition, providing a good solution for cultural relic image restoration [5]. Praveen et al. proposed a new GAN-based art restoration method to digitally repair damaged artworks and assist in physical restoration. This method performed well in digital restoration and could effectively restore the original appearance of artworks, providing important guidance for physical restoration [6]. Zheng et al. proposed an Example Attention Generative Adversarial Network (EA-GAN) that fuses reference examples, which addressed the issue of significant reconstruction errors in traditional character restoration methods. Compared with existing inpainting networks, EA-GAN could obtain the correct text structure through the guidance of additional examples in the "example attention block".
The Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values increased by 9.82% and 1.82%, respectively [7].

In summary, numerous scholars have achieved significant results in cultural relic image restoration. However, GANs still have issues in image feature extraction, such as poor network training stability and poor generated image quality. At present, there is relatively limited discussion of embroidery classification and restoration within cultural relic image classification and restoration. Given this, this study innovatively constructs an ECR-Image Classification Model (ICM) based on the Densely Connected Convolutional Network (DenseNet) and an ECR-Image Restoration Model (IRM) based on the Deep Convolutional GAN (DCGAN). Based on these models, improvements are made by introducing Local Binary Patterns (LBPs), Canny operator edge extraction, and the Convolutional Block Attention Module (CBAM). The fusion of these technologies aims to enhance the model's capacity to capture details in ECR images, improve the precise reconstruction of textures and edges during the restoration process, and achieve higher-quality ECR image restoration results. The main novelties and contributions of this paper include: (1) For the first time, DenseNet is combined with Spatial Pyramid Pooling (SPP) and applied to classify embroidery images, improving recognition performance across styles and complex patterns; (2) The structure of the DCGAN generator and discriminator is innovatively adjusted. By integrating dilated convolutional layers, the receptive field of the model is expanded, which helps capture image features more comprehensively and achieve high-quality restoration of embroidery texture and color; (3) A large-scale dataset containing eight types of traditional embroidery images is constructed, providing fundamental support for subsequent research. The research results have practical value for the digital inheritance and AI-assisted restoration of traditional embroidery culture.

2 Methods and materials

2.1 Construction of ECR-ICM based on SPP-IDenseNet

ECR image classification is the prerequisite and foundation for ECR image restoration. By classifying ECR images, different embroidery types, styles, and eras can be quickly identified and distinguished, providing a scientific foundation for protecting the cultural relics. This study first explores the classification of ECR images. DenseNet was proposed by Huang et al. in 2017. It is a novel DL model architecture that establishes dense connections between network layers through DenseBlocks, thereby improving the information flow and gradient flow of the network, alleviating the problem of gradient vanishing, and promoting feature reuse [8-9]. The structure of a DenseBlock in DenseNet is displayed in Fig.1. In Fig.1, the connection mechanism of the DenseBlock is more aggressive than that of the Residual Network (ResNet). Each layer is connected to all previous layers, providing each layer with a rich input that integrates the features of all previous layers [10]. This design ensures the uniformity of feature map size within the DenseBlock and greatly promotes feature reuse through dense connections between layers, enabling the network to learn and transmit information more effectively [11]. However, DenseNet still has certain shortcomings in the image classification process, such as the input image size limitation and the problem of network training not converging [12-13]. Therefore, this study improves it through techniques such as SPP, LBP, and the Canny operator, and proposes a novel ECR-ICM model, namely the SPP-IDenseNet model. The training process of this model for embroidery image classification is shown in Fig.2.

In Fig.2, this study first randomly selects a batch of data from the training set based on a preset batch size and normalizes it to standardize the standard deviation of the Red-Green-Blue (RGB) color channels of each embroidery image. Subsequently, the normalized image is input into the network for forward propagation to extract features and predict categories. Secondly, by comparing the predicted categories of the network with the actual categories, the value of the loss function is calculated. Next, the weights are adjusted through the backward propagation process of the network to optimize the model's performance. After completing a batch of training, the system checks whether the entire dataset has been traversed. If the traversal is not complete, the model continues to process the next training batch and repeats the above steps. Once the training traversal of the entire dataset is completed, the model saves the weight parameters of the current round and evaluates whether the predetermined number of training rounds has been reached. If the training rounds have not been completed, the model restarts the training process and continues iterative optimization. After reaching the predetermined training rounds, model training terminates, and the weight parameters at this point are used for subsequent image classification tasks. The calculation of the RGB three-channel pixel values Output_R, Output_G, and Output_B of the normalized image is shown in formula (1).

Figure 1: Schematic structure of DenseBlock (Source from: https://colorhub.me/photos/e7RVB).
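A minimal PyTorch sketch of the DenseBlock connectivity shown in Fig.1 (and formalized in formula (2) below) is given here: each layer receives the concatenation of all previous feature maps. The channel counts and growth rate are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate   # the next layer sees all previous outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # M_n = F([M_1, ..., M_{n-1}])
            features.append(out)
        return torch.cat(features, dim=1)

# usage: 64 input channels + 4 layers x growth 32 = 192 output channels
block = DenseBlock(in_channels=64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 192, 32, 32])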
Figure 2: Training process of SPP-IDenseNet model for embroidery image classification (Source from: https://colorhub.me/photos/e7RVB).

Figure 3: Schematic diagram of SPP structure.

Output_R = \frac{Input_R - mean_R}{std_R}, \quad Output_G = \frac{Input_G - mean_G}{std_G}, \quad Output_B = \frac{Input_B - mean_B}{std_B}   (1)

In formula (1), Input_R, Input_G, and Input_B are the RGB three-channel pixel values of the image before normalization. mean_R, mean_G, and mean_B are the mean values of the RGB channels. std_R, std_G, and std_B represent the standard deviations of the RGB three channels. The output feature M_n is shown in formula (2).

M_n = F([M_1, M_2, \ldots, M_{n-1}])   (2)

In formula (2), n is the layer index of the model network, F(\cdot) is the convolution operation, and [\cdot] denotes feature concatenation. The loss l during the training process is shown in formula (3).

l = L(Y_1, Y_1')   (3)

In formula (3), L is the loss function, and Y_1 and Y_1' are the real category and the predicted category. The updated network weight \omega' is shown in formula (4).

\omega' = \omega - lr \cdot g(l)   (4)

In formula (4), \omega is the network weight before the update, and lr and g(\cdot) are the learning rate and the derivative calculation. In response to the input image size limitation of the DenseNet model in image classification tasks, this study uses SPP to enable the model to adapt to input images of different sizes. The structure of SPP is shown in Fig.3.
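Below is a minimal PyTorch sketch of the SPP module described in this section: the feature map is max-pooled over 1×1, 4×4, and 16×16 grids and the pooled maps are concatenated into a fixed-length vector for the FCL, with the 1×1 convolution used to halve the channel dimension. The exact channel counts are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    def __init__(self, in_channels: int, grids=(1, 4, 16)):
        super().__init__()
        self.grids = grids
        # 1x1 convolution to fine-tune (halve) the channel dimension
        self.reduce = nn.Conv2d(in_channels, in_channels // 2, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.leaky_relu(self.reduce(x), negative_slope=0.01)  # Leaky ReLU, slope 0.01
        pooled = [F.adaptive_max_pool2d(x, g).flatten(start_dim=1) for g in self.grids]
        return torch.cat(pooled, dim=1)   # fixed-length vector, independent of input size

# usage: two different feature-map sizes yield the same output length
spp = SPP(in_channels=512)
for s in (16, 20):
    out = spp(torch.randn(2, 512, s, s))
    print(out.shape)   # torch.Size([2, 256 * (1 + 16 + 256)])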
In Fig.3, this study integrates the SPP structure between the convolutional layer and the Fully Connected Layer (FCL) at the end of the DenseNet model. By dividing the feature map into grids of 1×1, 4×4, and 16×16 and applying max pooling, this study achieves comprehensive capture of features at different resolutions. These multi-scale pooled feature maps are then merged into a fixed-length feature vector, providing rich information for the input of the FCL. In addition, by pooling over windows of different sizes, this study generates feature maps with diverse resolutions and fine-tunes the channel dimensions through a 1×1 convolutional layer. The ReLU activation function used in DenseNet may cause neuron deactivation when the input is less than 0 [14]. Therefore, this study introduces the Leaky ReLU function and sets the negative slope coefficient to 0.01, effectively extending the applicability of ReLU and promoting the stability and convergence of network training. The SPP module enhances the model's understanding of the structural hierarchy of embroidery patterns through multi-scale pooling operations and improves the receptive field coverage of complex patterns. LBP extracts fine-grained texture features from embroidery images, enabling the model to pay more attention to the local texture restoration of the defect area. Canny edge detection provides clear structural contour constraints, guiding the generator to maintain the coherence and integrity of pattern edges. The three work in synergy, enhancing the quality and stability of image restoration across multiple dimensions such as structure, texture, and edge.

2.2 Construction of ECR-IRM based on Improved DCGAN

The SPP-IDenseNet model designed above provides strong technical support for the digital restoration and intelligent management of ECR. However, further technological innovation and method improvement are needed in ECR image restoration to achieve more efficient and accurate restoration results. Therefore, this study explores the restoration of ECR images. A GAN is a DL model containing two parts: the Generator and the Discriminator. Although GANs are widely popular in computer vision, in traditional GAN architectures, models do not rely on a determined distribution but instead use internal feedback to adjust their parameters [15]. Although this approach enhances the flexibility of the model, it may also cause training instability and sometimes even lead to model training crashes [16-17]. Therefore, this study further introduces a novel GAN derivative, namely DCGAN. This network can improve the quality of image generation and enhance the learning and representation capabilities of the model by combining the deep architecture of CNNs with the GAN framework. The generator extends and reshapes 100-dimensional noise into a 3D feature map through an FCL, and then gradually forms the final image size through upsampling and dimension adjustment by transposed convolutional layers. Batch normalization and ReLU are applied after each layer, and the output image is finally activated by a Sigmoid to produce a tensor image of the specified size [18-19]. The generator's loss function is shown in formula (5).

L_G = -\mathbb{E}_{z \sim p_z(z)}[\log(D(G(z)))]   (5)

In formula (5), \mathbb{E} is the expectation operator, usually taken as the average or expected value; z is a noise sample from the latent-space prior distribution; G(z) is the data generated by the generator from the noise sample z; and D(G(z)) is the discriminator's output for the generated data, representing the probability that it is real. The loss function of the discriminator is shown in formula (6).

L_D = -\mathbb{E}_{x \sim p_{data}(x)}[\log(D(x))] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (6)

In formula (6), D(x) is the discriminator's output for the real sample x, i.e., the probability that the sample is real. Based on the above formulas, compared to traditional GANs, DCGAN uses convolutional and deconvolution layers to replace the FCLs in traditional GANs. This operation can capture the local structure and spatial information of embroidery images [20]. In addition, DCGAN also uses batch normalization techniques and expected values to accelerate the training process and stabilize GAN training. The aim is to further enhance the performance of DCGAN in embroidery image restoration tasks, improve the naturalness of restoration effects, and provide experts with more accurate texture and color information to assist them in more refined restoration work. Given this, the study also improves DCGAN and proposes a new type of ECR-IRM, namely IDCGAN. The overall model structure is shown in Fig.4.

Figure 4: Overall structural framework of the IDCGAN (Source from: https://colorhub.me/photos/e7RVB).
Figure 5: Specific structure of the generator in the IDCGAN model.

Figure 6: Specific structure of the discriminator in the IDCGAN.

In Fig.4, innovative adjustments are made to the generator architecture by integrating dilated convolutional layers to expand the model's receptive field, thereby helping the model capture image features more comprehensively. At the same time, CBAM is introduced to enhance attention to key features at both the channel and spatial levels, improving the accuracy of image restoration. The discriminator adopts a strategy of increasing its depth and the number of FCLs, improving the network's ability to handle complex nonlinear problems and enabling it to more effectively distinguish between real and generated images. The loss function combines traditional MSE loss with adversarial loss. The calculation of the mean square error loss L_{MSE} is shown in formula (7).

L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - g_i)^2   (7)

In formula (7), g_i is the predicted value of the model on the training data x_i. The adversarial loss L_{adv} is shown in formula (8).

L_{adv} = \min_G \max_D \mathbb{E}_{x \sim P_{data}}[\log_2 D(x)] + \mathbb{E}_{z \sim P_z}[\log_2(1 - D(G(z)))]   (8)

In formula (8), the algebraic meanings remain the same as before. The specific structure of the generator in the IDCGAN model is shown in Fig.5.

In Fig.5, the architecture of the generator in the IDCGAN model mainly consists of three key modules, namely the convolution block, the dilated convolution block, and CBAM. The dilated convolution blocks use convolutional layers with different dilation rates, namely 2, 4, 8, and 16, to achieve multi-scale capture of image features. When the dilation rate is set to 1, the dilated convolution degenerates into a standard convolution operation. This is reflected in the Conv6 to Conv10 layers of the generator, forming a series of convolutional layers with different dilation rates that ensure the flexibility and adaptability of the network. The introduction of CBAM adds dynamic weighting capability to the generator. It can weight features in both channel and spatial dimensions, highlighting the features that have the greatest impact on image quality. The framework of the discriminator in the IDCGAN model is shown in Fig.6.

In Fig.6, to improve the performance of the discriminator on complex nonlinear problems, this study adds two FCLs to the original discriminator architecture, so that the discriminator contains a total of three FCLs. The interconnection of these layers enhances the discriminator's ability to learn features, thereby significantly improving model performance. Ultimately, the discriminator determines the authenticity of the input image through a binary classification task, distinguishing whether the image was generated by the generator or comes from the real dataset.
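The combined objective in formulas (7)-(8) can be sketched in PyTorch as a pixel-wise MSE between the restored and ground-truth images plus a standard adversarial term, as shown below. The weighting factor lambda_adv is an assumption for illustration; the paper does not state one here, and natural-log BCE is used in place of the base-2 logarithm.

import torch
import torch.nn as nn

mse = nn.MSELoss()
bce = nn.BCELoss()          # discriminator outputs are Sigmoid probabilities

def generator_loss(restored, target, d_fake, lambda_adv=0.01):
    """L = L_MSE + lambda_adv * adversarial term; d_fake = D(G(z)) in [0, 1]."""
    l_mse = mse(restored, target)                                  # formula (7)
    l_adv = bce(d_fake, torch.ones_like(d_fake))                   # -log D(G(z))
    return l_mse + lambda_adv * l_adv

def discriminator_loss(d_real, d_fake):
    """-log D(x) - log(1 - D(G(z))), the discriminator side of formula (8)."""
    return bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# usage with dummy tensors (batch of 256x256 RGB restorations)
restored, target = torch.rand(4, 3, 256, 256), torch.rand(4, 3, 256, 256)
d_real, d_fake = torch.rand(4, 1), torch.rand(4, 1)
g_loss = generator_loss(restored, target, d_fake)
d_loss = discriminator_loss(d_real, d_fake)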
The research is conducted based on a self-built embroidery image dataset. The images mainly come from digital museums, high-resolution cultural relic catalogues, and cultural heritage archives, covering multiple historical periods and diverse embroidery styles. The initial dataset contains 1,800 images; after expansion, the dataset ultimately includes 8,957 images. For unified model input, the images are cropped and scaled to 256×256 pixels, and normalization is carried out simultaneously. Ultimately, the dataset is divided into a training set and a test set in an 8:2 ratio. To simulate the common damage forms of ECR, the study also uses random occlusion to generate defect images. The occlusion forms include rectangles, free-shaped patterns, and speckled textures, and the occluded area ratio is controlled at 10% to 40%. On this basis, image augmentation is carried out by applying methods such as rotation, flipping, scaling, and color perturbation to improve the robustness and generalization ability of the model. In addition, by analyzing the color and style distribution of the images, a balanced sampling strategy is adopted to control category bias, ensuring the diversity and balance of the training data in terms of pattern style and damage type. All the code modules in the research are built based on the PyTorch framework. Some of the code is as follows:
import torch
import torch.nn as nn

# Simple Generator example
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1, 3, 64, 64)

# Simple Discriminator example
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Training example (pseudo code)
# z = torch.randn(batch_size, 100)
# fake_images = generator(z)
# real_output = discriminator(real_images)
# fake_output = discriminator(fake_images)

Figure 7: Code.
3 Results

3.1 SPP-IDenseNet model performance testing

The study adopts five-fold cross-validation to evaluate the model's performance. The training set is evenly divided into five subsets of similar size. Four subsets are selected in turn for model training, and the remaining subset is used as the validation set. This process is repeated five times to ensure that each subset participates in the validation. Through multiple rounds of training and validation, the mean and standard deviation of the model's accuracy, recall rate, and specificity are calculated, effectively avoiding the randomness of a single split and enhancing the statistical reliability and generalization ability of the evaluation results. Table 1 shows the experimental setup and environment parameters. According to the settings in Table 1, the effectiveness of the proposed model was first validated through ablation testing, as shown in Fig.8.

Table 1: Environment and parameter configuration.

Serial number | Experimental environment and hyperparameter category | Setting
1 | Num epochs | 200
2 | Pre-training | No
3 | Batch size | 20
4 | Num class | 8
5 | Optimizer | Adam
6 | Learning rate | 0.0001
7 | Development Environment | Windows 10
8 | CPU | Intel Core i9-10900K
9 | GPU | NVIDIA RTX 3090
10 | Memory | 64 GB
11 | Graphics Memory | 16 GB GDDR6X
12 | Programming Tools | PyTorch 1.6.0

Figure 8: Ablation test results of SPP-IDenseNet. (a) Training set; (b) Test set. Classification accuracy (%) versus sample size for DenseNet, SPP-DenseNet, IDenseNet, and SPP-IDenseNet.
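The five-fold cross-validation protocol described above can be sketched as follows with scikit-learn's StratifiedKFold over image indices and class labels. The train_and_evaluate call is hypothetical and stands in for the SPP-IDenseNet training loop.

import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.random.randint(0, 8, size=7165)          # 8 embroidery classes (illustrative 80% split)
indices = np.arange(len(labels))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(indices, labels)):
    # train_and_evaluate is hypothetical: it would fit SPP-IDenseNet on train_idx
    # and return (accuracy, recall, specificity) measured on val_idx.
    # acc, rec, spec = train_and_evaluate(train_idx, val_idx)
    acc, rec, spec = 0.0, 0.0, 0.0                    # placeholders
    fold_scores.append((acc, rec, spec))

mean_metrics = np.mean(fold_scores, axis=0)           # report mean ± std over folds
std_metrics = np.std(fold_scores, axis=0)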
Figs.8 (a) and (b) show the test results of the new model on the two datasets. As the number of test samples grows, the standalone DenseNet module shows lower classification accuracy in both datasets, with a maximum of only 65.3%. After successively introducing the SPP module, LBP, the Canny operator, and the Gabor filter module, the classification performance of the full model improves significantly. This result indicates that, when dealing with embroidery images with complex texture features, relying solely on global feature extraction has certain performance bottlenecks. The classification accuracy of SPP-IDenseNet reaches 96.4% on the training set and 95.6% on the test set. This study has improved various parts of the DenseNet model to varying degrees for classifying and recognizing ECR images, demonstrating the effectiveness of the improved method. In addition, popular ICMs of the same type, including Lightweight CNN (LCNN), Efficient CNN (ECNN), StyleGAN, and Global Image Spatial Texture (GIST), are introduced as comparative models. Performance tests are conducted using precision, recall, and specificity as indicators, as shown in Table 2.
Table 2: Multi-metric performance test results for different models.
Style                   Model            Precision/%   Recall/%   Specificity/%
Suzhou embroidery       LCNN             63.5          65.7       80.2
                        ECNN             67.2          69.8       81.6
                        GIST             70.3          68.7       83.4
                        StyleGAN         85.7          87.4       89.1
                        Research method  95.8          98.5       94.2
Hunan embroidery        LCNN             55.2          56.3       89.6
                        ECNN             58.7          60.4       90.2
                        GIST             60.2          61.7       91.6
                        StyleGAN         83.4          85.1       92.3
                        Research method  96.3          90.2       99.4
Cantonese embroidery    LCNN             57.6          59.8       53.8
                        ECNN             66.3          70.4       60.5
                        GIST             71.6          69.7       70.8
                        StyleGAN         80.2          82.5       75.4
                        Research method  95.1          90.8       95.1
Sichuan embroidery      LCNN             58.8          60.5       55.6
                        ECNN             62.8          68.8       58.3
                        GIST             70.4          73.4       60.7
                        StyleGAN         79.8          81.7       69.2
                        Research method  92.4          96.7       90.3
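The per-style precision, recall, and specificity reported in Table 2 can be derived from a multi-class confusion matrix of the kind shown in Fig. 9. The sketch below is only a minimal illustration of that computation, not the authors' code; the class count and matrix values are placeholders.

```python
# A minimal sketch of per-class precision, recall, and specificity derived from
# a multi-class confusion matrix; the matrix values below are hypothetical.
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class i but actually another class
    fn = cm.sum(axis=1) - tp          # true class i but predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return precision, recall, specificity

cm = np.array([[82, 1, 0],
               [2, 87, 3],
               [0, 1, 90]])
p, r, s = per_class_metrics(cm)
print(np.round(p * 100, 1), np.round(r * 100, 1), np.round(s * 100, 1))
```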
(a) Pre-improvement; (b) SPP-IDenseNet
Figure 9: Confusion matrix plots before and after model improvement.

3.2 Performance simulation testing of ECR-IRM for IDCGAN

This study uses the TensorFlow DL framework to implement the training and testing of the entire ECR-IRM. The weights β1 and β2 of the Adam optimizer are set to 0.5 and 0.9, respectively. The loss changes of the IDCGAN generator and discriminator at different network learning rates are shown in Fig. 10.
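As a hedged illustration of this configuration (not the authors' training script), the snippet below shows how Adam optimizers with β1 = 0.5, β2 = 0.9, and the finally selected learning rate of 0.00002 could be instantiated in TensorFlow/Keras; the loss object is a common DCGAN-style choice assumed here, and the generator and discriminator models themselves are omitted.

```python
# A minimal sketch of the optimizer settings reported for ECR-IRM training.
import tensorflow as tf

LEARNING_RATE = 2e-5  # learning rate finally selected in the paper

gen_optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, beta_1=0.5, beta_2=0.9)
disc_optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, beta_1=0.5, beta_2=0.9)

# Binary cross-entropy is a common choice for DCGAN-style adversarial losses.
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```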
(a) Loss of the generator at different learning rates; (b) Loss of the discriminator at different learning rates
Figure 10: Loss variation of IDCGAN between generator and discriminator at different learning rates.
(a) Original image; (b) Random masking; (c) DCGAN; (d) IDCGAN
Figure 11: Repair effects of the model before and after the improvement (Source from: https://colorhub.me/).
In Fig. 10 (a), the loss of the IDCGAN generator rises slowly as the number of training cycles grows, and the curve with a learning rate of 0.00002 shows a low and stable loss value, whereas the curves with learning rates of 0.002 and 0.0002 show higher loss values and larger fluctuations. In Fig. 10 (b), the discriminator loss slowly decreases as the number of training cycles increases. The curve with a learning rate of 0.00002 decreases the fastest and tends to stabilize, indicating that a smaller learning rate helps the discriminator learn more effectively; in contrast, the curves with learning rates of 0.002 and 0.0002 exhibit significant fluctuations and higher loss values. Based on the comprehensive experimental data, this study ultimately sets the network learning rate of the IDCGAN model to 0.00002. To verify the impact of the dilated convolutional layers, loss functions, and CBAM on model performance, the repair effect of the improved model before and after random occlusion is compared, as shown in Fig. 11.

Figs. 11 (a) to (d) show the original embroidery image, the image subjected to random occlusion, the image restored by the DCGAN model, and the image restored by the IDCGAN model. Comparing these images demonstrates the effectiveness of IDCGAN in handling different types of embroidery and varying degrees of occlusion. IDCGAN can enhance the focus on key features, enabling the restored image to largely recover the details and colors of the original image and effectively solving the problem of color non-uniformity. However, DCGAN's repair effect is not ideal when facing large-scale defects, and it cannot maintain good contextual consistency, resulting in poor repair performance.
This finding validates the necessity of improving the DCGAN. To further test the effectiveness of the research model in embroidery image restoration, the Cycle-Consistency GAN (CCGAN), Conditional GAN (CGAN), and Stacked GAN (Stack-GAN) models are introduced for comparison. The test results with SSIM as the experimental indicator are shown in Fig. 12.

Figs. 12 (a) and (b) show the SSIM performance comparison of the four models on the two datasets. In both the training and testing sets, the IDCGAN model performs the best, followed by Stack-GAN and CCGAN, while CGAN performs the worst. In the training set, the maximum SSIM values for CGAN, CCGAN, Stack-GAN, and the research model are 0.64, 0.72, 0.85, and 0.98; in the testing set, they are 0.69, 0.78, 0.90, and 0.99. These data indicate that the research model has significant advantages in maintaining image structure and quality. The reason is that the dilated convolution technique effectively expands the receptive field, allowing the model to capture richer contextual information in the image, while CBAM further enhances the model's attention to key features by weighting important features in both the channel and spatial dimensions. These improvements have led to the significant advantages of IDCGAN in embroidery image restoration. Finally, to confirm the resolution capability of the proposed model, this study also tests the four models using image clarity as an indicator, as shown in Fig. 13.

Figs. 13 (a) to (d) show the clarity of the CGAN, CCGAN, Stack-GAN, and IDCGAN models on the Yue embroidery image restoration task, and Fig. 13 (e) shows the clarity of the original image. The Yue embroidery restoration images generated by IDCGAN are visually very similar to the original images, and it is almost impossible to distinguish the quality differences with the naked eye. In contrast, there are significant differences between the restoration results of CGAN, CCGAN, and Stack-GAN and the original images; in particular, the images restored by the CGAN model show a significant decrease in clarity compared with the originals. In summary, the research model surpasses the comparative models in image resolution for Guangdong embroidery restoration, demonstrating its potential and advantages in embroidery image restoration.
(a) Training set; (b) Test set
Figure 12: Schematic of SSIM test results for different models.
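SSIM, the indicator used in Fig. 12, compares a restored image with its original in terms of luminance, contrast, and structure. The snippet below is a minimal sketch of how such a score could be computed with TensorFlow, not the authors' evaluation script; the image tensors are random placeholders.

```python
# A minimal sketch of SSIM computation between an original and a restored image.
import tensorflow as tf

original = tf.random.uniform((1, 256, 256, 3))   # stand-in for an original embroidery image
restored = tf.random.uniform((1, 256, 256, 3))   # stand-in for a model-restored image

ssim = tf.image.ssim(original, restored, max_val=1.0)
print(float(ssim[0]))  # 1.0 would mean structurally identical images
```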
Image gradient entropy of the restored images: (a) CGAN 5.65; (b) CCGAN 6.12; (c) Stack-GAN 6.78; (d) IDCGAN 7.82; (e) Original image 7.86.
Figure 13: The clarity of restored images of Cantonese embroidery (Source from: https://colorhub.me/photos/VXeo3).
4 Conclusion

The study focused on the task of image restoration of ECR and innovatively constructed an ECR-ICM based on SPP-IDenseNet and an ECR-IRM based on the improved DCGAN. The experimental results showed that the SPP-IDenseNet model achieved an average prediction accuracy of over 80% for the embroidery images of eight styles. The IRM could enhance the focus on key features, thereby enabling the restored image to largely recover the details and colors of the original image and effectively solving the problem of uneven color; the SSIM value reached 0.99. Furthermore, the research model could still maintain an excellent restoration effect even when dealing with large-area damaged embroidery images. The restored image of Cantonese embroidery was visually extremely similar to the original, and it was almost impossible to distinguish the quality difference with the naked eye. The results show that the research model achieves technological innovation and demonstrates significant advantages in practical applications.

However, the research model also has certain limitations. On the one hand, the current models mainly target 2D embroidery images; there is at present no adaptive research on complex 3D multi-level embroidery structures or heterogeneous multi-material embroidery patterns, which limits promotion and application in high-precision virtual restoration. On the other hand, owing to the deep generative network structure, the model depends to a certain extent on computing resources during the training and inference stages, which may pose practical challenges for resource-constrained cultural heritage conservation institutions or mobile deployments. Furthermore, for severely damaged or extremely blurry images, there is still a certain risk of distortion in the structural reconstruction produced by the research model. Future research can be carried out in the following directions: (1) expansion of model generalization ability: by integrating 3D reconstruction and multimodal input, the restoration ability for 3D ECR can be enhanced; (2) enhanced multi-material adaptability: a material perception module or style transfer mechanism can be introduced to achieve texture simulation and reconstruction of heterogeneous embroidery materials; (3) lightweight deployment optimization: by applying techniques such as model pruning, quantization, and distillation, the network structure can be compressed to adapt to edge devices or mobile terminal applications. Overall, the research method provides a feasible and effective technological path for ECR digital protection, with expected practical applications in digital museum construction, virtual restoration of cultural heritage, and reconstruction of cultural and creative models.
https://doi.org/10.31449/inf.v49i16.9995 Informatica 49 (2025) 373–396 373
Enhanced Prediction of Tropical Tree Biomass Using Ensemble
Models
Qiucai Dang
Zhumadian Preschool Education College, Zhumadian 463000, China
E-mail: Dqc336699@163.com
Keywords: above-ground biomass, below-ground biomass, ensemble stacking, grid search optimization
Received: July 3, 2025
The present paper proposes a novel model and investigates its utility in estimating tropical forest biomass. To address the multiplicity of variables, as well as the complexity and
nonlinear relationships between them, five Machine Learning (ML) models, namely Gradient Boosting
(GB), Extra Trees (ET), XGB, ElasticNet, and Poisson Regression, were employed to concurrently predict
both the below-ground and above-ground tree biomass (BGB and AGB, respectively), as well as the total
biomass (TB = BGB + AGB). Since the results of the aforementioned models were not entirely satisfactory,
an additional model called the Stacking Ensemble (SE) was introduced. Each model can have its
parameters optimized by Grid Search with cross-validation to make sure that there is generalization and
consistent performance. The data collected were based on 175 trees from 27 ecoregional plots located in
the Central Highlands ecoregion of Vietnam. The dataset was processed to investigate the proposed
model's ability to predict tree biomass. The study's findings revealed that the proposed method
demonstrated strong and efficient predictive capabilities for biomass estimation in forest ecoregions. The
Stacking model showed the most significant improvements, with the highest R2 (0.968) and VAF (0.971), the lowest errors, and an MDAPE of 23.081 percent, indicating strong predictive ability and minimal bias. Although its residual STD (105.763) was marginally higher, the error reductions and overall strength of the model outweighed this variance. Thus, incorporating a Stacking Ensemble (SE) model strengthens the ML approach for predicting forest tree biomass.
Povzetek: Študija predlaga ansambelski model za napoved tropske drevesne biomase, ki združuje pet ML-
modelov in optimizacijo z iskanjem po mreži. Stacking Ensemble doseže najboljša napovedovanja ter
najnižje napake, kar občutno izboljša oceno nadzemne, podzemne in skupne biomase.
1 Introduction

1.1 The role of biomass

Biomass plays an unquestionable role as one of the world's vital sources of energy [1]. The open question is which model can appropriately recognize and predict its traits. Zhantao Song et al. (2024) discussed original visions of the biomass pyrolysis process. They examined the contribution of various factors to the challenging prediction of physicochemical traits by applying machine learning techniques such as Random Forest, gradient boosting decision trees, and extreme gradient boosting, in which R2 was higher than 0.97 for the prediction and analysis of particular surface area, yield, and N content of biochar [1]. In another study, Jia et al. (2024) exploited machine learning methods to anticipate zeolite-catalyzed biomass pyrolysis; the Random Forest algorithm achieved the highest prediction performance, with R² > 0.91 for their suggested models. They concluded that their selected factors and methods based on biomass characteristics can be taken into account as a plausible reference [2].

1.2 Above-ground biomass (AGB)

The term above-ground biomass (AGB) refers to the product of above-ground volume (AGV) and vegetation mass. It is closely linked to the carbon cycle in global grassland ecosystems, and accurate estimation of AGB variations is essential for assessing carbon decomposition and its impact on climate change. It is also crucial to screen in situ-harvested AGB data before modeling [3]. Furthermore, AGB is an indispensable factor for evaluating ecosystem health and carbon storage. To estimate AGB, the above-ground volume (AGV) of vegetation is considered a high-priority parameter in research [4].

To estimate AGB variations of China's grassland ecosystems, machine learning algorithms were applied, among which the Random Forest model, with R2 = 0.83 (i.e., explaining 83% of the harvested AGB variation) and RMSE = 43.84 g m−2, revealed accurate performance in estimating grassland AGB [3]. Mao et al. (2021) demonstrated with their proposed model that structural, textural, and spectral metrics contribute to shrub AGV models, and they suggested a direct reference for specifying proper vegetation metrics to screen shrub AGV.
The efficiency, accuracy, and low cost of their proposed approach for digital terrain model (DTM) output and AGV estimation are considered its strengths; thus, it can bridge the gap between ground-based research and satellite remote sensing [4]. May et al. (2024) obtained spatially complete predictions of biomass in a tropical area. They state that this sort of spatially coherent AGB data supplied by their model is useful for validating eco-friendly forest management, carbon decomposition innovations, and climate change mitigation [5].

1.3 Below-ground biomass (BGB)

Below-ground biomass (BGB) is a significant part of forest tree biomass; however, fewer studies have focused on BGB in relation to forest biomass and carbon, largely because measuring BGB in large trees is costly and time-consuming. As a result, researchers often use above-ground biomass (AGB) to estimate BGB by applying a root-to-shoot ratio, and specific direct BGB equations have also been developed for different forest types [6]. In a recent study, Oliveira et al. (2024) suggested that predicting peanut BGB using their proposed alternative method, the multi-output regression (MTR) approach, would enable both researchers and farmers to quantify BGB more accurately. They proposed this method to predict multiple peanut maturity indices at the field level, helping to reduce subjectivity in determining peanut maturity [7].

1.4 Ensemble approaches

Ensemble learning is a potent machine learning technique that reduces overfitting, boosts robustness, and enhances overall performance by combining predictions from several models. Ensemble approaches combine the advantages of multiple algorithms to improve generalization rather than depending on a single model [8]. Stacking, also known as stacked generalization, is a versatile and successful ensemble technique that mixes different kinds of models, possibly with different architectures and learning strategies, in contrast to bagging (e.g., Random Forest) or boosting (e.g., Gradient Boosting, XGBoost), which combine similar models (typically decision trees) [9]. Naik et al. (2022) utilized automated stacked ensemble modelling powered by machine learning for predicting aboveground biomass in forests using multitemporal Sentinel-2 data [10]. A stacking ensemble algorithm was used by Zhang et al. (2022) to reduce the biases in estimates of forest aboveground biomass derived from several remotely sensed datasets [11]. Besides, Jin et al. (2025) evaluated the impact of validation techniques and ensemble learning algorithms on estimating aboveground biomass in forests in a case study of natural secondary forests [12]. To this end, they developed models based on various outcomes, qualified to synchronously anticipate AGB, BGB, and the total amount of tree biomass, i.e., TB, in forest areas, solving the problem of carbon estimation for various forest sites.

1.5 Regression models

It is appropriate to take a brief glimpse at the regression models proposed in the present article.

The Gradient Boosting (GB) model is regarded as a strong ML algorithm for numerical optimization problems. GB was developed by Leo Breiman (1998) and Jerome Friedman (2001); the former used it to decrease variance in classification, and the latter improved it for regression and classification models. GB algorithms carry out numerical optimization for regression and classification models by repeatedly stepping approximately along the negative gradient of the loss function. Because it is impossible to move exactly along the negative gradient, a GB model normally applies a weak learner to estimate the direction of steepest descent [13].

Extra Trees (ET), a recently developed regression model, is an ensemble ML algorithm based on decision trees. Originally, ET is an improved form of the Random Forest algorithm for regression or classification. What makes the ET regression algorithm more competitive for small-sample ML is that it utilizes all the data to build the node splits of the decision trees effectively [14]. Wang et al. (2023) provided an efficient ML model utilizing an ET regression algorithm for anticipating the relevant synthesis gas traits in biomass chemical looping gasification, and then compared the predictive ability of the ET model with traditional ones. In another study, using both RF and ET algorithm models, researchers developed a general model to precisely predict the co-pyrolysis of coal and biomass, in which ET performed better [15]. ET is advantageous because it achieves more efficient performance than Random Forest. Compared with RF, ET does not perform bootstrap aggregation; it takes the data without replacement, and nodes are split at random cut-points rather than at the best splits. Therefore, in the ET regression model, randomness comes not from bootstrap aggregation but from the random splits of the data [16]. According to Roy (2021), RF was introduced to overcome the problems of decision trees, giving medium variance, and ET was proposed when accuracy is more crucial than a generalized model; it also delivers low variance.

Extreme Gradient Boosting (XGB) is another strong, multifaceted ML algorithm used for regression and classification tasks. It is well known for its exceptional predictive capability and its handling of intricate datasets. GB involves a series of procedures that prepare models in sequence, in which the errors produced previously are corrected by each new model. XGBoost is an ensemble learning technique that mixes the predictions of various ML models to yield a more precise ultimate prediction; it also uses decision trees as base learners during its process. Furthermore, XGB is designed to exploit high-capacity processors and distributed computation efficiently [17].
Ayub et al. (2023) applied an XGB model to a multi-level factorial design outcome to predict and improve the gasification product, in which the XGB model showed good prediction accuracy as well as model optimization analysis. The key characteristics of XGB are its ability to handle complicated relations in data with regularizing techniques, effectively preventing overfitting, and its efficient computation owing to parallel processing. It uses decision trees as base learners and then applies regularization for model generalization at higher dimensionality. XGB, widely acknowledged for its computational efficiency, provides fast processing together with perceptive analysis of feature significance, and deals with missing values smoothly [18].

ElasticNet, a powerful linear regression technique highly beneficial in ML and statistical modeling, surpasses traditional linear regression models. It is able to mix the penalties of both Lasso and Ridge regression, which is useful in particular when traditional linear regression struggles with multicollinearity, i.e., when predictors are highly correlated [19]. ElasticNet is thus advantageous for handling multi-dimensional datasets, selecting significant features, and being a more consistent and reliable model where collinearity exists. Aimed at solving regression problems and improving model performance, ElasticNet offers effective analytical means for handling multi-dimensional regression; its common applications include feature selection, regression analysis, and predictive modeling [20]. The significance of ElasticNet regression includes multicollinearity handling, automatic feature selection, aiding model interpretability and reducing overfitting, flexible regularization that allows researchers to control the balance between the Lasso and Ridge penalties, robustness in high-dimensional data, and appropriateness for a variety of regression problems [15].

Poisson regression is a regression analysis whose response is based on the Poisson distribution. The regression suffers from the limitation that the variance equals the mean, called equidispersion. When this assumption is violated, the standard errors become biased, the test statistics drawn from the model are less exact, and consequently the obtained conclusions are less valid; the Poisson regression model therefore cannot be used under over-dispersion or under-dispersion. Poisson regression is one of the generalized linear models, and it finds its main application in modeling rarely occurring events [21].

The Stacking Ensemble (SE) model makes use of an ensemble generalizing approach through learning, even though it may lack instructions for appropriate non-hyperparameterized meta-learners. Stacking is necessary when multiple ML methods reveal different advantages for a certain task; in this case, the stacking ensemble method employs a discrete ML technique for specifying the efficient application of the various algorithms [22]. For this reason, Arif et al. (2024) developed a stacking ensemble model from a non-homogeneous mixture of fundamental models, for accurate yet interpretable prediction of lung cancer prognosis, so as to recognize crucial risk factors [18].

The use of DL methods is unquestionably dominant over other traditional methods, particularly in tropical forest biomass research [6]. Although many studies have investigated tree biomass prediction by applying various models [6], the applied models are well established but lack the combination of ML models, ensembling, and hyperparameter optimization approaches. This work adds value by combining them in a meticulously designed Stacking Ensemble specifically tailored to predicting AGB, BGB, and TB using a small, real-world dataset from 27 ecoregional plots in Vietnam. The Fit Index (FI), a stability-focused evaluation metric that has not previously been used in biomass prediction, is introduced in this study. The proposed approach provides new methodological insights that improve prediction accuracy and generalizability in tropical biomass estimation by combining rigorous preprocessing, multi-target modelling within an ecological context, and systematic hyperparameter tuning through Grid Search. Furthermore, this work differs from earlier black-box DL applications in that it incorporates Shapley Additive Explanations (SHAP) for ecological feature interpretation, which offers important ecological insight. Hence, this study was conducted to serve the purpose of bridging this gap. This subject is an expansion of an ongoing strategy to integrate remote sensing inputs acquired by satellite or drone with ground-measured biomass determinations in order to develop a spatially superior and temporally dynamic biomass forecast model. Besides the plausible analytical foundation of the process, the model is capable of capturing some facets of complex nonlinear responses and enhancing the accuracy of biomass prediction over wider geographical areas and timeframes thanks to the use of sophisticated Stacking ensembles enabled by Grid Search and cross-validation. In addition, climatic variables can be included to forecast changes in biomass distribution under future climate change scenarios, which can provide significant insight both for forest management and for carbon budgeting. That is to say, designing a new model qualified to anticipate tree BGB, AGB, and the total tree biomass TB (i.e., TB = BGB + AGB) concurrently will fulfil the requirement of estimating forest carbon. On this account, making use of a set of up-to-date regression algorithms to increase the reliability of the estimation of the aforementioned parameters, as well as that of the newly proposed model, will assist the progressing literature in the realm of forestry science. The study proposes that integrated ensemble models will anticipate tropical tree biomass better than traditional modeling systems, so that the model will be dominant over conventional ones. The study objectives are twofold: firstly, designing a model to concurrently anticipate tree AGB, BGB, and TB, guaranteeing additivity, for the tropical forests of Vietnam known as Dipterocarp and Evergreen Broadleaf, and secondly, cross-validating
errors compared to a traditional model, applying the same dataset and predictors in the mentioned forests.

The rest of this paper is structured as follows. Section 2 discusses the detailed methodology, including the materials and data used in this work. Section 3 presents numerical analyses, graphical analyses, and experimental results under the heading of results and discussion. Lastly, Section 4 summarizes the concluding points of the study.

2 Methodology

The present paper aimed to investigate the efficiency of a state-of-the-art model qualified for predicting tropical forest tree biomass. This study was conducted in one of Vietnam's eight highest tropical forest regions, the Central Highlands ecoregion. Two main tropical forest categories were selected as the focus of the research, i.e., Dipterocarp and Evergreen Broadleaf (see Fig. 1).

Figure 1: Sample plot locations for the Dipterocarp and Evergreen Broadleaf forests in the Central Highlands ecoregion, Vietnam
In this work, the dataset exploited was collected in a research study conducted by Huy et al. [6]. The collected data were based on 175 trees from 27 ecoregional plots located in the Central Highlands, Vietnam. We clearly define the dataset partitioning strategy to ensure reproducibility: the entire dataset of 175 samples was randomly divided into training (80%) and testing (20%) sets, and cross-validation was used over repeated iterations to ensure robust evaluation and minimize sampling bias. To ensure compatibility across models and better convergence during training, feature preprocessing involved removing outliers and normalizing all input variables to a [0, 1] range using Min-Max scaling. The hyperparameter tuning process was carried out using Grid Search with 5-fold internal cross-validation for each machine learning model: Poisson regression, ElasticNet, XGB, Extra Trees (ET), and Gradient Boosting (GB). This allowed us to systematically explore parameter combinations and choose those that produced the best performance on the training data. Based on the results of cross-validation, the Grid Search methodically investigates a predetermined set of hyperparameter values to determine which combination produces the best model performance.

A customized grid of important hyperparameters was built for every model. For instance, tree-based models such as GB, ET, and XGB had their learning rate, maximum depth, and number of estimators adjusted.
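As a hedged, minimal sketch of this preprocessing and partitioning step (the column names and values below are synthetic placeholders, not the Huy et al. data), the following shows an 80/20 random split with Min-Max scaling fitted on the training set only:

```python
# A minimal sketch of the 80/20 split and [0, 1] Min-Max scaling described above.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the 175-tree dataset (columns are illustrative only).
df = pd.DataFrame({
    "DBH": rng.uniform(5, 80, 175),      # diameter at breast height
    "H": rng.uniform(4, 35, 175),        # tree height
    "WD": rng.uniform(0.3, 0.9, 175),    # wood density
})
df["AGB"] = 0.1 * df["DBH"] ** 2 * df["H"] * df["WD"] / 100
df["BGB"] = 0.2 * df["AGB"]
df["TB"] = df["AGB"] + df["BGB"]

X = df[["DBH", "H", "WD"]]
y = df[["AGB", "BGB", "TB"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()                  # fit on the training set only to avoid leakage
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```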
The L1 ratio and alpha (regularization strength) were adjusted for ElasticNet, and likewise the pertinent parameters for the stacking meta-learner and Poisson regression. In order to maximize generalization and performance consistency, Grid Search was used within a cross-validation framework to guarantee that each model was trained with the best parameter settings. This method greatly enhanced both the Stacking Ensemble's overall performance and the accuracy of the individual models.

The desired targets in this dataset were the amount of above-ground tropical biomass (AGB), the amount of below-ground tropical biomass (BGB), and the total tropical tree biomass (TB), equal to the summation of the below-ground and above-ground tree biomass (i.e., TB = BGB + AGB). Preprocessing and normalization operations were performed on the data.

To serve the purpose of the study, five ML models used as base learners, including GB, ET, XGBoost, ElasticNet, and Poisson regression, were employed to synchronously anticipate the below-ground and above-ground tree biomass (BGB and AGB, respectively) as well as the total amount of tree biomass, TB = BGB + AGB. Owing to the individual models' mediocre performance, these five models were used as base learners to create a Stacking Ensemble (SE); a meta-learner was then trained on their predictions to generate the final prediction for every biomass component.

For the purpose of assessing and selecting the most efficient model able to concurrently anticipate tropical tree BGB, AGB, and TB, a thorough cross-validation process was carried out. The total number of records was 175, randomly split ten times into two sections: 140 (80%) for training and 35 (20%) for testing, to allow impartial evaluation. The data were divided into training and testing sets to conduct an analysis satisfying accuracy and reliability in this research. A wide range of assessment metrics, such as MSE, RMSE, MAE, R2, STD, NMSE, MDAPE, and VAF, were used to evaluate performance. In addition, the Fit Index (FI), a goodness-of-fit metric intended to assess the quality of predictions across several cross-validation realizations, was computed; a higher FI value, approaching 1, indicates a better fit. The formula for calculating the FI is presented below:

$FI = \frac{1}{k}\sum_{j=1}^{k}\left(1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y}_i)^2}\right)$   (1)

In the equation above, k stands for the number of realizations (in this study k = 10), m is the number of trees sampled in the validation dataset, $y_i$ is the observed value, $\hat{y}_i$ represents the predicted value, and $\bar{y}_i$ is the averaged value of BGB, AGB, and TB of the ith validated tree in the kth realization.

The study goal was to evaluate accuracy and model consistency in light of the ecological context and the small dataset size. Metrics like R2 and VAF measure the percentage of variance explained by the models, while MSE, RMSE, and MAE quantify absolute prediction errors. STD and NMSE aid in understanding normalization effects and error distribution. MDAPE is a reliable percentage-based metric that works especially well with data containing outliers or skewness, which is typical of biomass measurements. The Fit Index (FI), a new and comprehensible metric designed for model comparison across several validation folds, was introduced to reward both accuracy and stability. Ultimately, when combined, these metrics ensure that the assessment covers robustness, interpretability, and predictive accuracy, all of which are critical components for ecological modelling and decision-making, and they demonstrate that the Stacking Ensemble model was more efficient than the other compared models. The equations for the evaluation metric criteria are given in Table 1.
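A minimal sketch of Eq. (1) is given below, assuming per-realization arrays of observed and predicted values; the numbers are random placeholders, not the study's data.

```python
# A minimal sketch of the Fit Index (FI): the coefficient of determination
# averaged over k cross-validation realizations, as in Eq. (1).
import numpy as np

def fit_index(y_true_folds, y_pred_folds):
    """Each list element holds the observed / predicted values of one realization."""
    scores = []
    for y, y_hat in zip(y_true_folds, y_pred_folds):
        y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
        scores.append(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
y_folds = [rng.uniform(10, 500, 35) for _ in range(10)]        # k = 10 realizations, m = 35 trees
pred_folds = [y + rng.normal(0, 20, 35) for y in y_folds]      # hypothetical predictions
print(round(fit_index(y_folds, pred_folds), 3))
```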
Table 1: Equations for the evaluation of the statistical metric criteria
MSE     Mean Squared Error                 $MSE(y,\hat{y}) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$
RMSE    Root Mean Square Error             $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$
MAE     Mean Absolute Error                $MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
R2      Determination Coefficient          $R^2(y,\hat{y}) = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$
STD     Standard Deviation                 $STD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
NMSE    Normalized Mean Square Error       $NMSE = \frac{\lVert x - y \rVert^2}{\lVert x - \bar{x} \rVert^2}$
MDAPE   Median Absolute Percentage Error   $MDAPE = \mathrm{median}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right) \times 100\%$
VAF     Variance Account Factor            $VAF = \left(1 - \frac{\mathrm{var}(t_n - y_n)}{\mathrm{var}(t_n)}\right) \times 100$
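The sketch below computes the Table 1 criteria for a single target; it is an illustration rather than the authors' code, and the NMSE, MDAPE, and VAF forms follow the common definitions assumed in the reconstructed table above (the source layout of these formulas is partly garbled).

```python
# A minimal sketch of the statistical criteria of Table 1 for one target.
import numpy as np

def evaluate(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    resid = y - y_hat
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(resid))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    std = np.std(resid, ddof=1)                      # standard deviation of the residuals
    nmse = np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    mdape = np.median(np.abs(resid / y)) * 100.0
    vaf = (1.0 - np.var(resid) / np.var(y)) * 100.0
    return dict(MSE=mse, RMSE=rmse, MAE=mae, R2=r2, STD=std, NMSE=nmse, MDAPE=mdape, VAF=vaf)

rng = np.random.default_rng(2)
y = rng.uniform(20, 800, 35)                         # placeholder observed biomass values
y_hat = y + rng.normal(0, 30, 35)                    # placeholder predictions
print({k: round(v, 3) for k, v in evaluate(y, y_hat).items()})
```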
In the above equations, n (or N) represents the number of observations, $y_i$ is the ith observed value, $\hat{y}_i$ is the ith predicted value, and $\bar{y}$ is the average of the observations.

Graphical analyses were also carried out to assess the accuracy of the recommended model's performance. Illustrated in different plots, they give the reader illuminating insights into the suitability and accuracy of the models, all of which are discussed in Section 3 of this paper. As an overview of the research, the general flowcharts of the whole study are shown below (see Fig. 2 and Fig. 3). Fig. 2 gives a brief overview of the step-by-step research methodology: the research process begins with the dataset, goes through analysis and normalization, and then divides the normalized data into training and test sets; the proposed ML models are evaluated against an array of specific metrics to select an appropriate model, which turns out to be the Stacking Ensemble; finally, the ensemble models are also assessed on the basis of the evaluation metrics to choose the best one, and the results are saved for future use.
Figure 2: General flowchart of the whole research process for applying the proposed model
Figure 3 shows the modeling procedure, involving the data collection process and then the application of six ML models to concurrently predict tropical forest tree biomass, specifying the most reliable model for such prediction by comparing the selected ML models with the aid of the evaluation metrics. Its main steps are: Step 1, start modeling (based on "Multi-output deep learning models enhance the reliability of simultaneous above- and belowground biomass predictions in tropical forests"); Step 2, data collection and definition of the inputs and targets in the dataset; Step 3, input selection and splitting of the dataset into training (80%) and testing (20%) sets; Step 4, application of the ML models; Step 5, the Stacking ensemble model; Step 6, tuning by Grid Search; Step 7, selection of the best model by comparing the six machine learning models with the evaluation metrics.
Figure 3: Flowchart of the modeling procedure showing the process of employing the ML models concurrently
The flowcharts of the six regression models, ElasticNet, GB, ET, XGB, Poisson, and SE, are illustrated in the following figures, and the hyperparameters are optimized through Grid Search tuning. These models were utilized with the goal of synchronously predicting AGB, BGB, and TB = BGB + AGB. Each proposed regression model is also discussed briefly below.

2.1 Base machine learning models

ElasticNet regression is an extension of linear regression that integrates both the Lasso (L1) and Ridge (L2) regularization penalties into the loss function. This combination allows ElasticNet to deal with multicollinearity among predictors. Alpha (α) is the regularization strength parameter in ElasticNet; it supervises the overall strength of regularization applied to the model. For α = 0, no regularization is applied and ElasticNet equals Ordinary Least Squares (OLS) regression; for α = 1, the regularizations of both L1 and L2 are applied, blending their penalties; for 0 < α < 1, the model employs a mixture of L1 and L2 regularization, permitting a flexible blend of penalties. The L1 ratio (l1_ratio) is the blending parameter that identifies the balance between the L1 and L2 penalties; it controls the proportion of the penalty assigned to the L1 norm relative to the L2 norm. For l1_ratio = 0, the model applies only L2 regularization (which equals Ridge regression); for l1_ratio = 1, it uses only L1 regularization (which equals Lasso regression); for 0 < l1_ratio < 1, the two penalties are mixed. In the Gradient Boosting formulation, each stage adds a weak learner scaled by a factor ν > 0, which is the assumed learning rate [9].
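A minimal sketch of the ElasticNet configuration discussed here is given below; alpha controls the overall regularization strength and l1_ratio the blend between the L1 (Lasso) and L2 (Ridge) penalties. The values and data are illustrative, not the study's tuned settings.

```python
# A minimal sketch of ElasticNet with the alpha / l1_ratio parameters described above.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=140, n_features=8, noise=5.0, random_state=0)

model = ElasticNet(alpha=0.1, l1_ratio=0.5)   # l1_ratio=0 -> Ridge-like, l1_ratio=1 -> Lasso-like
model.fit(X, y)
print(model.coef_)                            # some coefficients may be driven to zero by the L1 part
```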
As the GB model is represented in Fig. 6, the data first go through bootstrap sampling and are split into T subsets D1, D2, …, DT; a decision tree h1, h2, …, hT is fitted to each subset, and the individual results h1(x), h2(x), …, hT(x) are averaged to obtain the final result H(x).
Figure 6: The stages of the Gradient Boosting (GB) model employed for tropical tree biomass predictions
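The pipeline depicted in Fig. 6 (bootstrap subsets, one decision tree per subset, averaged predictions) can be sketched as follows; this is an illustration of the figure's scheme on synthetic placeholder data, not the paper's implementation.

```python
# A minimal sketch of the Fig. 6 pipeline: bootstrap samples -> trees -> averaged H(x).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, (140, 3))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, 140)

T = 100                                              # number of bootstrap subsets / trees
trees = []
for _ in range(T):
    idx = rng.integers(0, len(X), len(X))            # bootstrap sample D_t
    trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

H = np.mean([t.predict(X) for t in trees], axis=0)   # averaged final result H(x)
print(round(np.mean((y - H) ** 2), 3))
```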
Given that the Poisson distribution is the basis of Poisson regression, it identifies the probability of the occurrence of any number of events in a fixed interval of time or space, under the assumption that the events occur at a constant rate and are independent of each other. The Poisson distribution is calculated from the formula below:

$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$   (4)

In the above equation, X is the number of random occurrences and λ represents the average (mean) number of events. Poisson regression exploits this distribution to provide an understanding of the relationships between the predictor variables and the count data in the dataset. In this regression, the expected value (mean) of the count variable Y is modeled as a linear mixture of the predictor variables X:

$\lambda = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)$   (5)

in which λ is the expected count, representing the occurrence rate, β0 is the intercept term, and β1, β2, …, βn are the coefficients of the predictor variables. The link function in Poisson regression is the natural logarithm (log-link), ensuring that the predicted values are not negative. The model is estimated via maximum likelihood estimation, and the coefficients (β) are chosen to maximize the probability of observing the actual count data in the model [26].

The approach for the application of the Poisson model is illustrated in Fig. 7, which proceeds through several stages in a linear pattern. To begin with, a point cloud is taken as input; second, the surface normal of every point is detected by computing the eigenvector over the k-nearest neighbors of each point. Third, an octree with a predefined depth d is selected for storing the reconstructed surface. Then, the gradient of the indicator function (∇x) is equated to the vector field V defined by the point cloud. The next stage involves defining an indicator function x with the value of 1 inside and 0 outside the surface; thus ∇x = V, and applying the divergence operator to both sides gives Δx ≡ ∇·∇x = ∇·V. In the next stage, the indicator function x is solved as a standard Poisson problem, the marching cubes algorithm is used to extract the surface from the solved indicator function x, and eventually the reconstructed surface is stored in the octree of depth d. Since AGB, BGB, and TB are skewed and non-negative, Poisson regression was employed; although it was initially created for count data, its formulation fits biomass distributions quite well. To make sure the Poisson model's assumptions held in this situation, diagnostic tests were conducted.
Figure 7: The stages of the Poisson model employed for tropical tree biomass predictions
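A minimal sketch of a log-link Poisson regression of the kind described above is shown below, using scikit-learn's PoissonRegressor on synthetic placeholder data (the target is kept positive and right-skewed, mirroring the reason the authors give for considering this GLM for biomass).

```python
# A minimal sketch of Poisson (log-link) regression on a positive, skewed target.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (140, 3))
y = np.exp(1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]) + rng.uniform(0, 0.5, 140)  # positive, skewed target

model = PoissonRegressor(alpha=1e-3, max_iter=300)   # alpha is the L2 regularization strength
model.fit(X, y)
print(model.coef_, model.intercept_)
```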
2.2 Hyperparameter tuning

Grid Search was applied in a 5-fold cross-validation framework to find optimal hyperparameters for all the models. For tree-based models like Gradient Boosting, Extra Trees, and XGB, the number of estimators, the learning rate, and the maximum tree depth were systematically varied to balance model complexity and predictability. For ElasticNet, the regularization parameter (alpha) and the L1 fraction were tuned to prevent overfitting and induce sparsity. For Poisson regression, the regularization parameters and the number of iterations were tuned to achieve better convergence. After separately tuning each of the base models, their outputs were fed into a meta-learner in the Stacking Ensemble, whose parameters were also tuned via Grid Search. This broad tuning process ensured that all models, including the ensemble, reached optimal generalization and performance [27].

The Stacking Ensemble was selected because of the meta-learner it includes, as it blends heterogeneous base learners with varying predictive ability and error behaviors and generalizes well. Because of its broad applicability for addressing multicollinearity in the regression predictions of the base models and for capturing non-linear relationships, a tree-based learner was applied as the meta-model in this research. The Stacking model, with an RMSE of 18.298, an MAE of 12.422, and an R2 of 0.968, performed significantly better than any of the base models on the test data, meaning that the ensemble was able to selectively leverage the strengths of each of the related models to generate more stable and accurate biomass predictions.

In the Stacking model, presented in Fig. 8, the training data are processed by the level-0 models separately, and each model's prediction results are gathered as additional processed training data in the study. All of the base learners' predictions (GB, ET, XGB, ElasticNet, and Poisson) were aggregated by the Stacking Ensemble. To avoid overfitting and information leakage, the meta-learner, a Ridge regression model, was trained on out-of-fold predictions. We were able to improve overall predictive performance by combining the complementary strengths of all models, capturing distribution-specific, linear, and non-linear trends, in this ensemble. Table 2 provides the hyperparameters chosen by the stacking meta-learner for the models.
Training data → level-0 base models → predictions → meta-learner → final prediction
Figure 8: Stacking model procedures used for tropical tree biomass
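Under assumed settings, the Stacking Ensemble described here can be sketched as follows: the five base learners feed out-of-fold predictions to a Ridge meta-learner. XGBoost is replaced by scikit-learn's HistGradientBoostingRegressor so that the example stays self-contained; the hyperparameters and data are illustrative only.

```python
# A minimal sketch of a stacking ensemble with a Ridge meta-learner.
from sklearn.datasets import make_regression
from sklearn.ensemble import (StackingRegressor, GradientBoostingRegressor,
                              ExtraTreesRegressor, HistGradientBoostingRegressor)
from sklearn.linear_model import ElasticNet, PoissonRegressor, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=175, n_features=8, noise=15.0, random_state=0)
y = y - y.min() + 1.0                       # keep the target positive for the Poisson base learner
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

base_learners = [
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=300, random_state=0)),
    ("xgb_like", HistGradientBoostingRegressor(random_state=0)),
    ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5)),
    ("poisson", PoissonRegressor(max_iter=500)),
]
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)
stack.fit(X_tr, y_tr)                       # out-of-fold base predictions train the meta-learner
print(round(r2_score(y_te, stack.predict(X_te)), 3))
```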
Last but not least, XGB is one of the common algorithms in ML. It is based on the ensemble learning framework and follows the gradient boosting algorithm; thus it is applicable to supervised learning tasks, i.e., regression, ranking, and classification. XGB is a predictive model that combines the predictions of multiple individual models iteratively. It works by adding weak learners into the ensemble one after another, such that at every step a new learner tries to correct the errors of the prior ones, and it minimizes a prespecified loss function on the training data using a form of gradient descent optimization [13].

In summary, XGB is developed in three straightforward stages. First, a primary model F0 is used to predict the target variable; the XGB model is associated with the residual (y − F0). Second, the residuals obtained in the prior stage are fitted by a new model called h1. Third, the combination of F0 and h1 delivers F1, the boosted form of F0, so that the MSE of F1 is lower than that of F0:

$F_1(x) \leftarrow F_0(x) + h_1(x)$   (6)

To improve F1's performance, a residual model of F1 can be designed, yielding a new model F2:

$F_2(x) \leftarrow F_1(x) + h_2(x)$   (7)

This process is iterated for n stages until the residuals are minimized as far as possible, i.e.,

$F_n(x) \leftarrow F_{n-1}(x) + h_n(x)$   (8)

It is worth mentioning that the additive learners do not alter the functions developed in prior iterations but add information of their own in order to bring down the error values. The model begins with some function F0(x), which needs to minimize the loss function (here the MSE); hence:

$F_0(x) = \arg\min_{\gamma}\sum_{i=1}^{n} L(y_i, \gamma) = \arg\min_{\gamma}\sum_{i=1}^{n}(y_i - \gamma)^2$   (9)

Differentiating this expression with respect to γ shows that it is minimized at the mean of the $y_i$, i = 1, …, n. Thus, the boosting model can proceed with:

$F_0(x) = \frac{\sum_{i=1}^{n} y_i}{n}$   (10)

F0(x) represents the first step of the predictions in this model; next, for each instance, the residual error is expressed as $y_i - F_0(x)$ [28].

In Fig. 9, the XGB model employs a multifaceted approach to make predictions on the input data; afterwards, the average of the predictions is calculated, and an ultimate XGB prediction is thus generated. Because of its exceptional performance with structured tabular data and its integrated regularization, which helps avoid overfitting, XGB was included. It was well suited for this task because of its efficient handling of non-linearities, support for missing values, and robustness to noise, even with the small sample size. Grid Search was used to optimize important hyperparameters, such as the learning rate, maximum depth, and gamma.
Input data → predictions from the individual trees → average of predictions → final XGBoost prediction
Figure 9: The procedure used in the XGB model for predicting tropical tree biomass
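The additive scheme of Eqs. (6)–(10) can be sketched as follows: start from the mean prediction F0, fit a weak learner to the residuals, and add it to the ensemble. This is only an illustration of the principle on synthetic data, not the paper's XGBoost configuration.

```python
# A minimal sketch of additive boosting: F_m = F_{m-1} + lr * h_m fitted to residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (140, 3))
y = 5 * X[:, 0] + np.sin(X[:, 1]) * 10 + rng.normal(0, 1, 140)

F = np.full_like(y, y.mean())          # F0(x): the mean minimizes the squared loss (Eq. 10)
learning_rate = 0.1
for _ in range(100):                   # n boosting stages (Eq. 8)
    residuals = y - F                  # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)   # h_m(x) fitted to residuals
    F = F + learning_rate * tree.predict(X)                       # F_m = F_{m-1} + lr * h_m

print(round(np.mean((y - F) ** 2), 3))  # training MSE decreases over the stages
```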
3 Results and discussion

3.1 Exploratory data analysis

To display how closely related the multiple variables of the study data are, a Pearson correlation heatmap is exploited as an effective color-coded visual matrix (see Fig. 10). Variables are arranged in rows and columns, and the cells define the pairwise relationships between variables. The color shading of each cell indicates the direction and strength of the correlation: the darker the color of a cell, the stronger the correlation of the related variables. As is obvious from this tabulated heatmap, the colors are darker for stronger correlations and lighter for weaker ones. Additionally, green colors represent positive correlations, i.e., when one variable increases, the other variable tends to go up, whereas purple colors have been used for negative correlations, where an increase in one variable is accompanied by a drop in the other.
Figure 10: Pearson Correlation Heatmap for detecting the relationship between studied variables.
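A minimal sketch of a Pearson correlation heatmap of the kind shown in Fig. 10 is given below; the columns are synthetic stand-ins for the dataset's plot and tree attributes, and the colormap choice is only meant to echo the green/purple coding described above.

```python
# A minimal sketch of a Pearson correlation heatmap on placeholder variables.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(175, 4)), columns=["DBH", "H", "WD", "AGB"])
df["AGB"] = 2 * df["DBH"] + rng.normal(0, 0.5, 175)   # induce one strong positive correlation

sns.heatmap(df.corr(method="pearson"), annot=True, cmap="PRGn", vmin=-1, vmax=1)
plt.title("Pearson correlation heatmap (illustrative)")
plt.tight_layout()
plt.show()
```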
In Fig. 11, a pair-plot visualization of the distribution of the dataset parameters is shown for exploratory analysis of the data. In a pair plot, the data are visualized to find the relationships between variables, whether continuous or categorical, or to reveal the most clearly divided clusters. The dispersions of the parameters indicate that most features are not evenly distributed: CA, WT, and P are skewed or clustered, with their values concentrated in particular ranges. Scatter plots such as CA versus WT or HA versus CA show positive relationships that hold well, indicating potential multicollinearity that could be important to the model. By contrast, variables such as the forest type code and soil type code appear as horizontal bands or discrete groups, as they are categorical. These trends suggest that the explanatory power of the dataset is partly due to a mixture of continuous gradations combined with categorical differences. The class distributions are depicted by the colors, and a clear grouping can be observed in the plots, whether by altitude, CA, or WT. The pair plot thus provides a high-level interface for deriving enlightening statistical information about the dataset; the variation in each panel can be observed, and the diagonal panels show the distribution of each variable.

A pair plot for the relationships between the variables and the total amount of biomass, TB, is also demonstrated in Fig. 12, which more explicitly explores how the CTB classes are distinguished in terms of the predictors. In this instance, the scatter plots show that for most variable pairs the CTB categories are predominantly overlapping, indicating that no single set of variables can completely separate the CTB classes. However, there are regions, particularly in pairings like CA vs. WT or CA vs. P, where some CTB groups cluster more closely together or are bunched into narrower value ranges. The histograms on the diagonal also emphasize the bunched character of the observations within given intervals, further underscoring that the dataset is skewed in its variable distributions. This class grouping within specific regions suggests that individual variables may not always be able to differentiate CTB, but groups of predictors likely have predictive value. Further, the mixture of continuous and discrete variables introduces difficulty, as seen in the scatter plots, where some categories of CTB extend across different bands while others overlap.
Figure 11: Pair plot for specifying the distribution of dataset parameters as well as their relationship
Figure 12: Pair plot for showing the relationship between the variables of total tropical tree biomass (TB) and their
distributions
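A minimal sketch of the pair-plot exploration of Figs. 11 and 12 is given below, drawn on synthetic stand-in columns and a derived class label; the real plots use the dataset's predictors and the TB/CTB classes.

```python
# A minimal sketch of a class-colored pair plot on placeholder variables.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "CA": rng.lognormal(1.0, 0.4, 175),          # skewed continuous variable
    "WT": rng.lognormal(0.5, 0.3, 175),
    "P": rng.uniform(1200, 2400, 175),
})
df["CTB_class"] = pd.qcut(df["CA"] * df["WT"], q=3, labels=["low", "medium", "high"])

sns.pairplot(df, hue="CTB_class", corner=True)   # diagonal panels show each variable's distribution
plt.show()
```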
3.2. Machine learning results
In the present investigation, six methods were employed in two forest locations, i.e., Dipterocarp and Evergreen Broadleaf, to concurrently anticipate BGB, AGB, and TB = BGB + AGB. To assess the base models' performance, Table 3 presents the error metric results for each recommended model on the train and test data. By comparing these error metrics along with FI, it was found that the Stacking Ensemble model was superior to the other models. The very large R² values close to 0.999 on the training dataset are an indicator of overfitting or data leaks. To prevent this, we made sure to have rigorous separation of the training and test data, and we optimized our hyperparameters using Grid Search with cross-validation to prevent overfitting. The lower R² values on the testing data (such as 0.968 for the best model) are indicative of a lack of severe overfitting, so the overfitting appears to be contained. Further improvements with regard to regularization and data augmentation will be necessary in future computations to minimize the chances of overfitting. The test results indicate that the Stacking model outperforms the others in nearly all metrics, demonstrating higher predictive accuracy and reliability. Its mean squared error (MSE) is considerably low at 334.82, indicating lower average squared discrepancies between the predicted and actual values compared to other models like ElasticNet (2378.17) and Extra Trees (1216.54). In the same vein, the root mean squared error (RMSE) for Stacking is 18.30, a far cry from those of ElasticNet (48.77) and Gradient Boosting (41.75), meaning its predictions were more precise. Mean absolute error (MAE) shows the same pattern, at 12.42 for Stacking, a far better performance than for models such as Poisson regression (19.32) and XGB (21.18).
In regard to explained variance and fit, Stacking had the best R² value of 0.968 across all models, which means it accounts for nearly 97% of the test data variance. This is
significantly better than ElasticNet's R² score of 0.77 and Extra Trees' R² score of 0.88. The Stacking model's normalized mean squared error (NMSE) is 0.032, the minimum, with no significant normalized error against the data variance. Likewise, its variance accounted for (VAF) is 0.971, indicating strong consistency between predicted and actual values. The median absolute percentage error (MDAPE) of 23.08 and the standard deviation of residuals (STD_dev) of 105.76 also illustrate the consistency of the model's performance.
On the other hand, the other models show higher error measures and lower variance explanation, with the Stacking model remaining the most accurate and consistent for the test set in this comparison. This claim is based on the higher R² value of the Stacking model for both train and test data. According to this table, higher R² and VAF values mark the better model, in this case the SE model; conversely, the lower the other metrics, such as MSE, RMSE, MAE, NMSE, MDAPE, and STD, the more the model has the merit of being an efficient predictor. Therefore, the Stacking model is deemed the most efficient and performs better than the other models for both training and testing data. In contrast, ElasticNet shows weaker performance in predicting the variables. Furthermore, the results of the employed evaluation metrics are presented and thoroughly discussed using relevant figures at the end of this section. Improved accuracy and stability on both forest types were indicated by the lower MSE, RMSE, and MDAPE, and the higher R² and VAF, of the Stacking Ensemble, which consistently outperformed the other algorithms. ElasticNet performed poorly because of its linear framework, which failed to properly capture the intricate, nonlinear patterns in biomass data. Because Stacking possessed the ability to combine the powers of tree-based models like GB, ET, and XGB, it outperformed them even though they individually performed only moderately.
Figure 13 below illustrates the values obtained by the ML models, i.e., ElasticNet, Extra Trees, GB, Poisson, Stacking, and XGB; accordingly, a detailed comparison of these values, along with their distance from the target values, is presented.
Table 3: Error metric results for the proposed ML models on the train and test datasets.
Metrics ElasticNet Extra Trees GB Poisson Stacking XGB
Train
MSE 4026.476 4.933E-26 5.800 1725.207 1153.269 1.72544E-05
RMSE 63.455 2.221E-13 2.408 41.536 33.960 0.004
MAE 32.550 7.82E-14 1.821 16.042 11.562 0.003
R2 0.788 0.999 0.999 0.909 0.939 0.999
NMSE 0.212 2.601E-30 0.000 0.091 0.061 9.09812E-10
MDAPE 67.280 1.651E-13 5.016 29.535 8.722 0.008
STD_dev 100.279 137.713 137.491 150.171 134.774 137.713
VAF 0.788 0.999 0.999 0.909 0.939 0.999
Test
MSE 2378.167 1216.538 1743.162 1051.301 334.820 2090.796
RMSE 48.766 34.879 41.751 32.424 18.298 45.725
MAE 36.818 18.068 21.948 19.320 12.422 21.178
R2 0.770 0.882 0.831 0.898 0.968 0.797
NMSE 0.230 0.118 0.169 0.102 0.032 0.203
MDAPE 68.390 32.640 34.406 31.075 23.081 30.099
STD_dev 100.272 85.241 76.508 78.865 105.763 75.760
VAF 0.777 0.894 0.853 0.924 0.971 0.817
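To make the reported quantities concrete, the following is a minimal sketch (not the authors' code) of how the Table 3 metrics can be computed with NumPy and scikit-learn; MDAPE is taken as the median absolute percentage error and VAF as one minus the ratio of residual variance to observation variance, following the definitions used in the text.

```python
# Error metrics for one model, given arrays of actual and predicted biomass values.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def error_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residuals = y_true - y_pred
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        # NMSE: squared error normalised by the variance of the observations
        "NMSE": mse / np.var(y_true),
        # MDAPE: median absolute percentage error (in %)
        "MDAPE": 100.0 * np.median(np.abs(residuals / y_true)),
        # STD_dev: standard deviation of the residuals
        "STD_dev": np.std(residuals),
        # VAF: variance accounted for
        "VAF": 1.0 - np.var(residuals) / np.var(y_true),
    }
```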
Figure 13: Value plot comparing the ML models' predicted values with the target values.
Figure 14 shows the results for R², a significant error metric criterion, suggesting how well the employed ML models' predictions fit the real data. As shown in the figure, a model whose prediction values align closely with the norm line (where R² = 1) is considered superior and more accurate. This is in line with the higher R² values of the proposed Stacking model, approximately 0.968 for the test data and 0.939 for the training data.
[Figure 14 panels: predicted vs. actual values (train and test) with the Y = X reference line for ElasticNet (R² train 0.788, test 0.770), Extra Trees (0.999, 0.882), XGBoost (0.999, 0.797), GBM (0.999, 0.831), Poisson (0.909, 0.898), and Stacking (0.939, 0.968).]
Figure 14: Comparing the coefficient of determination (R2) for each ML model.
The frequency of each error value for each ML method's predictions is represented in Fig. 15. The error analysis was conducted for both train and test parameters, and the ML models were assessed to examine their accuracy. As a result, the error in an ML model's prediction performance ought to be almost zero for it to be an adequate model for the aim of the study.
[Figure 15 panels: per-observation error values (rows 0-175) for XGBoost, Stacking, Poisson, GBM, Extra Trees, and ElasticNet, shown for the train and test sets.]
Figure 15: Comparing error values for ML models
Moreover, according to Fig. 16, the error values for the proposed ML models are illustrated from the smallest error value to the largest, for both the train and test data of each model, moving from left to right. A model with the smallest error values (i.e., approximately zero) would be the best predictor among the employed ML models. This visualization highlights that the data has intense recurrent peaks, suggesting non-uniform distributions with dominating clusters, and that these patterns persist but evolve subtly across different sections of the dataset. Where one of the groups corresponds to the Stacking model, its behaviour can be visually compared with the other groups by looking at how close its mean error is to zero and how small and steady its standard deviation is.
Based on the plot, we see that Stacking seems to be more accurate than the individual models, but by a very small margin. Compared to their predictions, it has fewer errors and reduced variance, indicating that it has stronger generalization and stability. On the contrary, although the other models have also performed adequately, they exhibit some spread or deviations that are a bit higher than the mean. Overall, it appears that Stacking produces a more consistent and less erratic result than the single models, thus comparing favourably to them in terms of performance.
Figure 16: Boxplot of ML models’ error values for both train and test data
The following figure (Fig. 17) illustrates a comparison of the proposed models in terms of two important statistical evaluation metrics, namely R² and VAF, estimated for both the test and train datasets. As the values of these metrics show, all the models perform efficiently in the prediction except the ElasticNet model, which performs more weakly than the others, having lower VAF and R². The Stacking and XGB models perform more strongly than the rest of the models, bearing higher VAF and R².
[Figure 17: stacked comparison of R² and VAF (train and test) for each model; values as reported in Table 3.]
Figure 17: Comparison of Models based on VAF and R2 metrics.
The other evaluation metrics, including MSE, NMSE, MAE, RMSE, STD, and MDAPE, are applied for comparison among the models, supposing that the lowest value of these metrics for a model marks it as the best predictor. In this case, the Stacking model for both the train and test datasets is the lowest compared to the other models (see Fig. 18).
[Figure 18 panels: radar charts of MSE, NMSE, MAE, RMSE, STD, and MDAPE (train and test) for ElasticNet, Extra Trees, GBM, Poisson, Stacking, and XGBoost; values as reported in Table 3.]
Figure 18: Comparison of Models based on MSE, NMSE, MAE, RMSE, STD, and MDAPE metrics.
Another graphical tool used to compare the models' performance is the Taylor diagram. This diagram evaluates models based on their accuracy, using metrics such as the correlation coefficient, standard deviation (STD), and RMSE. In the diagram, the models' performance is represented by circles, where better performance is indicated by points closer to the reference point [29]. Taylor diagrams for predicting tropical tree biomass are shown in Fig. 19. As seen in these diagrams, the RMSE of the Stacking model is lower than that of the other models, and its correlation coefficient exceeds 0.9, outperforming the other models in this regard. In contrast, the RMSE of the ElasticNet model is higher than that of the other machine learning (ML) models, and its correlation coefficient is lower. These findings, based on the correlation coefficient, STD, and RMSE, confirm that the Stacking model outperforms the other models.
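A Taylor diagram summarises three statistics per model; the short sketch below shows how they could be computed (a plotting library would then place each model on the diagram). This is only an illustration under the standard definitions, not the authors' implementation.

```python
# Statistics behind a Taylor diagram: correlation with the observations, the
# standard deviations, and the centred RMS difference (radial distance from
# the reference point).
import numpy as np

def taylor_stats(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    crmsd = np.sqrt(np.mean(((y_pred - y_pred.mean()) - (y_true - y_true.mean())) ** 2))
    return {"corr": corr, "std_pred": np.std(y_pred),
            "std_obs": np.std(y_true), "crmsd": crmsd}
```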
[Figure 19 panels: Taylor diagrams based on R² (test and train) and RMSE (test and train).]
Figure 19: Taylor diagrams for models’ comparison based on RMSE, STD, and R metrics
The last plot to be discussed for model comparison is the Williams plot. This plot is used to compare a specific group of compounds in terms of leverage values and standardized residuals [30]. The Williams plot shows the standardized residuals on the y-axis and the leverages on the x-axis for the training and testing datasets. From this plot, the applicability domain is implemented within a squared area inside ±2 standard deviations and a threshold h* in leverage (h* = 3p'/n, p' being the number of model parameters and n the number of compounds). The majority of the data ought to be located within this area, meaning that the points are inliers and influential in the model. The Williams plots of the Stacking model on both training and test sets are indicators of good model performance and generalization. Most of the observations in the test plot lie comfortably within the satisfactory limits for leverage and standardized residuals (±2), which is an indicator that the model predictions are unbiased and stable and possess minimal outliers. Only a minimal number of observations fall outside the ±2 boundary and the leverage constraint, signifying that there are very few influential or problematic points. Similarly, the training plot shows tightly clumped residuals around zero, with the majority of data points having little leverage, which means that the model has not over-fit the training data. Even though some of the residuals fall outside ±3 or possess relatively higher leverage, those are scattered and do not invalidate the model. The similar trend in both plots confirms that the Stacking model works well on unseen data, learning the inherent pattern without being overfit, and is robust. To be an efficient model predictor, the data must lie within this domain (See Fig. 20) [23].
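The quantities behind a Williams plot can be sketched as below; the feature matrix X, the treatment of p' as the number of descriptors plus one, and the residual limit are assumptions for illustration and may differ from the authors' exact implementation.

```python
# Leverage (diagonal of the hat matrix) and standardized residuals for a Williams plot.
import numpy as np

def williams_quantities(X, y_true, y_pred, residual_limit=2.0):
    X = np.asarray(X, float)
    residuals = np.asarray(y_true, float) - np.asarray(y_pred, float)
    n, p = X.shape
    # Hat matrix H = X (X'X)^-1 X'; its diagonal gives the leverage of each sample
    hat = X @ np.linalg.pinv(X.T @ X) @ X.T
    leverage = np.diag(hat)
    h_star = 3 * (p + 1) / n                     # warning leverage threshold h* = 3p'/n
    std_residuals = residuals / residuals.std()
    # Points inside the applicability domain (limits of +/-2 are used in the text)
    inside = (np.abs(std_residuals) <= residual_limit) & (leverage <= h_star)
    return leverage, std_residuals, h_star, inside
```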
[Figure 20 panels: Williams plots (TB, train and test) for ElasticNet, Extra Trees, GBM, Poisson, Stacking, and XGBoost.]
Figure 20: Williams plots for models’ comparison based on standard residuals and leverage.
3.3. Comparison with foundation models and LLMs
Large Language Models, or LLMs, are sophisticated AI systems that have been trained on enormous text datasets to comprehend and produce human language. Google created BERT, which is well suited for tasks like classification and question answering because it can analyze words in both directions and understand context. With its emphasis on producing relevant and coherent text, OpenAI's GPT is effective for tasks like content creation, dialogue, and summarization. Both use the Transformer architecture, but GPT is more focused on generation and BERT on comprehension. The proposed SE model adds domain-specific efficiency, whereas models such as BERT and GPT are effective for general-purpose NLP tasks. We specifically highlighted how, in contrast to the extensive, data-intensive training of LLMs, SE makes use of structured, domain-relevant features. This study also indicates SE's improved interpretability and reduced computational cost, both of which are important for ecological modelling.

4 Conclusions
The study was implemented on eight tropical forests in Vietnam, using the forestry variables, i.e., AGB, BGB, and TB. In an attempt to solve the problem of predicting the mentioned variables, the study used an MGDL regression strategy, which later proved to be an efficient model with a strong ability to predict tropical forest biomass. To this end, five models were selected as major algorithms to unravel the issue of biomass prediction. These models included Gradient Boosting (GB), Extra Trees (ET), XGB, ElasticNet, and Poisson, all of which were employed to synchronously anticipate the amount of AGB, BGB, as well as TB = BGB + AGB, and were then optimized by Grid Search. Additionally, the SE model was joined to the aforementioned models so as to allow the results to become satisfactory, mainly through cross-validation. Therefore, the recommended method's performance was investigated in terms of two sets of actual data, namely training and testing data.
The outcome of this study presented that the recommended method had a vigorous efficacy to estimate the amount of forest biomass. That is to say, employing a simultaneous group of ML models resulted in a significant impact on predicting forestry above- and below-ground biomass, as well as the sum of the biomass. The very high R² values of near 0.999 in the training set are definitely cause for alarm for overfitting or data leakage. We dealt with this by ensuring strict separation of training and test datasets such that there was no information leakage. We also employed Grid Search with cross-validation during hyperparameter tuning to allow maximum model complexity without over-fitting. The test set results, with R² scores considerably lower (e.g., 0.968 for the best model), are a sign of good generalization and suggest that while there may be some overfitting, it is controlled. More regularization and more data will be tried in future research to reduce the possibility of overfitting even more.
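As an illustration of the workflow just described (strict train/test separation, the five base learners combined by a stacking ensemble, and Grid Search with cross-validation), the sketch below uses scikit-learn and the xgboost package; the file name, the split ratio, the Ridge meta-learner, and the parameter grid are illustrative assumptions, not the authors' configuration.

```python
# Stacking ensemble of the named base learners tuned with GridSearchCV.
import pandas as pd
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import ElasticNet, PoissonRegressor, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("forest_biomass.csv")               # hypothetical data file
X, y = df.drop(columns=["TB"]), df["TB"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_learners = [
    ("enet", ElasticNet()),
    ("et", ExtraTreesRegressor(random_state=42)),
    ("gb", GradientBoostingRegressor(random_state=42)),
    ("poisson", PoissonRegressor()),
    ("xgb", XGBRegressor(random_state=42)),
]
# Meta-learner combining the base predictions; Ridge is an assumption here.
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)

param_grid = {                                        # illustrative grid only
    "gb__n_estimators": [200, 500],
    "et__n_estimators": [200, 500],
    "final_estimator__alpha": [0.1, 1.0],
}
search = GridSearchCV(stack, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)                          # tuning uses the training split only
print(search.best_params_, search.score(X_test, y_test))
```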
Based on the provided metrics, the Stacking ensemble model performed clearly superior to each of the standalone models on the test set. That is because it is capable of leveraging the prediction power of various base learners (ElasticNet, Extra Trees, Gradient Boosting, Poisson Regression, and XGB) and minimizing their respective errors through a meta-learner. Stacking takes into consideration the stand-alone strengths of linear as well as nonlinear models and results in improved generalization and less overfitting.
Quantitatively, the Stacking model achieved the highest coefficient of determination (R² = 0.968) and variance accounted for (VAF = 0.971) on the test set, indicating that its predictions were most highly correlated with the actual biomass values. It generated the lowest mean squared error (MSE = 334.820), root mean square error (RMSE = 18.298), and mean absolute error (MAE = 12.422), indicating high accuracy and low prediction bias. In terms of normalized error, it also had an NMSE of just 0.032, and the median absolute percentage error (MDAPE) decreased to 23.081%, significantly better than the other models. Although its test standard deviation (STD = 105.763) was slightly greater, this is a natural consequence of better prediction accuracy and range coverage for both train and test data, where the results showed R² equal to 0.968 for the testing data and 0.939 for the training data in this study. Therefore, adding the SE model to the proposed models is recommended for predicting forest biomass. By contrast, the consistently higher errors of ElasticNet are further evidence of the poor performance of that model. The Williams plot residuals show that the majority of points fall within the tolerated limits, with very limited outliers or high-leverage points in the test and train subsets. This implies that the Stacking model produces reliable, unbiased, and non-overfit predictions, and there is indeed powerful generalization and performance.

Declarations

Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Authors' contributions
QD: Writing - Original draft preparation, Conceptualization, Supervision, Project administration.

Acknowledgements
I would like to take this opportunity to acknowledge that there are no individuals or organizations that require acknowledgment for their contributions to this work.

Ethical approval
The research paper has received ethical approval from the institutional review board, ensuring the protection of participants' rights and compliance with the relevant ethical guidelines.

References
[1] H. C. Zhantao Song, Xiong Zhang, Xiaoqiang Li, Junjie Zhang, Jingai Shao, Shihong Zhang, Haiping Yang, "Machine learning assisted prediction of specific surface area and nitrogen content of biochar based on biomass type and pyrolysis conditions," J Anal Appl Pyrolysis, vol. 183, 2024.
[2] L. Jia, W. Shao, J. Wang, Y. Qian, Y. Chen, and Q. Yang, "Machine learning-aided prediction of bio-BTX and olefins production from zeolite-catalyzed biomass pyrolysis," Energy, vol. 306, p. 132478, 2024.
[3] H. Wu, S. An, B. Meng, X. Chen, F. Li, and S. Ren, "Retrieval of grassland aboveground biomass across three ecoregions in China during the past two decades using satellite remote sensing technology and machine learning algorithms," International Journal of Applied Earth Observation and Geoinformation, vol. 130, p. 103925, 2024, doi: 10.1016/j.jag.2024.103925.
[4] P. Mao et al., "An improved approach to estimate above-ground volume and biomass of desert shrub communities based on UAV RGB images," Ecol Indic, vol. 125, p. 107494, 2021, doi: 10.1016/j.ecolind.2021.107494.
[5] P. B. May et al., "Mapping aboveground biomass in Indonesian lowland forests using GEDI and hierarchical models," Remote Sens Environ, vol. 313, p. 114384, 2024.
[6] B. Huy, N. Quy Truong, K. P. Poudel, H. Temesgen, and N. Quy Khiem, "Multi-output deep learning models for enhanced reliability of simultaneous tree above- and below-ground biomass predictions in tropical forests of Vietnam," Comput Electron Agric, vol. 222, p. 109080, 2024, doi: 10.1016/j.compag.2024.109080.
[7] M. F. Oliveira et al., "Predicting below and above-ground peanut biomass and maturity using multi-target regression," Comput Electron Agric, vol. 218, p. 108647, 2024.
[8] G. Kunapuli, Ensemble methods for machine learning. Simon and Schuster, 2023.
[9] R. Dey and R. Mathur, "Ensemble learning method using stacking with base learner, a comparison," in International Conference on Data Analytics and Insights, Springer, 2023, pp. 159-169.
[10] P. Naik, M. Dalponte, and L. Bruzzone, "Automated machine learning driven stacked ensemble modeling for forest aboveground biomass prediction using multitemporal Sentinel-2 data," IEEE J Sel Top Appl Earth Obs Remote Sens, vol. 16, pp. 3442-3454, 2022.
[11] Y. Zhang, J. Ma, S. Liang, X. Li, and J. Liu, "A stacking ensemble algorithm for improving the biases of forest aboveground biomass estimations from multiple remotely sensed datasets," GIsci Remote Sens, vol. 59, no. 1, pp. 234-249, 2022.
[12] J. Liu, Y. Niu, Z. Jia, and R. Wang, "Assessing the ethical implications of artificial intelligence integration in media production and its impact on the creative industry," MEDAAD, vol. 2023, pp. 32-38, 2023.
[13] R. Huang, C. McMahan, B. Herrin, A. McLain, B. Cai, and S. Self, "Gradient boosting: A computationally efficient alternative to Markov chain Monte Carlo sampling for fitting large Bayesian spatio-temporal binomial regression models," Infect Dis Model, vol. 10, no. 1, pp. 189-200, 2025, doi: 10.1016/j.idm.2024.09.008.
[14] Z. Wang, L. Mu, H. Miao, Y. Shang, H. Yin, and M. Dong, "An innovative application of machine learning in prediction of the syngas properties of biomass chemical looping gasification based on extra trees regression algorithm," Energy, vol. 275, p. 127438, 2023.
[15] H. Wei, K. Luo, J. Xing, and J. Fan, "Predicting co-pyrolysis of coal and biomass using machine learning approaches," Fuel, vol. 310, p. 122248, 2022.
[16] R. (Bob) Roy, "No Title." Accessed: Jun. 27, 2021. [Online]. Available: https://bobrupakroy.medium.com/extra-trees-classifier-regressor-5b5f6abe8228
[17] B. Kıyak, H. F. Öztop, F. Ertam, and İ. G. Aksoy, "An intelligent approach to investigate the effects of container orientation for PCM melting based on an XGBoost regression model," Eng Anal Bound Elem, vol. 161, pp. 202-213, 2024.
[18] Y. Ayub, J. Ren, T. Shi, W. Shen, and C. He, "Poultry litter valorization: Development and optimization of an electro-chemical and thermal tri-generation process using an extreme gradient boosting algorithm," Energy, vol. 263, p. 125839, 2023.
[19] A. Jain, "No Title." Accessed: Feb. 05, 2024. [Online]. Available: https://medium.com/@abhishekjainindore24/elastic-net-regression-combined-features-of-l1-and-l2-regularization-6181a660c3a5
[20] J. Liu et al., "A new application of Elasticnet regression based near-infrared spectroscopy model: Prediction and analysis of 2, 3, 5, 4′-tetrahydroxy stilbene-2-O-β-D-glucoside and moisture in Polygonum multiflorum," Microchemical Journal, vol. 199, p. 110095, 2024.
[21] Purhadi, D. N. Sari, Q. Aini, and Irhamah, "Geographically weighted bivariate zero inflated generalized Poisson regression model and its application," Heliyon, vol. 7, no. 7, p. e07491, 2021, doi: 10.1016/j.heliyon.2021.e07491.
[22] U. Arif, C. Zhang, S. Hussain, and A. R. Abbasi, "An Efficient Interpretable Stacking Ensemble Model for Lung Cancer Prognosis," Comput Biol Chem, p. 108248, 2024.
[23] H. Yıldırım and M. R. Özkale, "A Novel Regularized Extreme Learning Machine Based on L1-Norm and L2-Norm: a Sparsity Solution Alternative to Lasso and Elastic Net," Cognit Comput, vol. 16, no. 2, pp. 641-653, 2024.
[24] S. M. Mastelini, F. K. Nakano, C. Vens, and A. C. P. de Leon Ferreira, "Online extra trees regressor," IEEE Trans Neural Netw Learn Syst, vol. 34, no. 10, pp. 6755-6767, 2022.
[25] M. M. Hameed, M. K. Alomar, F. Khaleel, and N. Al-Ansari, "An Extra Tree Regression Model for Discharge Coefficient Prediction: Novel, Practical Applications in the Hydraulic Sector and Future Research Directions," Math Probl Eng, vol. 2021, 2021, doi: 10.1155/2021/7001710.
[26] "Understanding Poisson Regression." Accessed: Nov. 05, 2023. [Online]. Available: https://medium.com/@data-overload/understanding-poisson-regression-a-powerful-tool-for-count-data-analysis-b7184c61bfde
[27] M. A. Alemayehu, S. D. Kebede, A. D. Walle, D. N. Mamo, E. B. Enyew, and J. B. Adem, "A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa," Front Big Data, vol. 8, p. 1522578, 2025.
[28] "XGBoost Algorithm." Accessed: Sep. 04, 2024. [Online]. Available: https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/
[29] M. Ehteram, A. N. Ahmed, P. Kumar, M. Sherif, and A. El-Shafie, "Predicting freshwater production and energy consumption in a seawater greenhouse based on ensemble frameworks using optimized multi-layer perceptron," Energy Reports, vol. 7, pp. 6308-6326, 2021, doi: 10.1016/j.egyr.2021.09.079.
[30] A. Beheshti, E. Pourbasheer, M. Nekoei, and S. Vahdani, "QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm-multiple linear regressions," Journal of Saudi Chemical Society, vol. 20, no. 3, pp. 282-290, 2016, doi: 10.1016/j.jscs.2012.07.019.
https://doi.org/10.31449/inf.v49i16.9397 Informatica 49 (2025) 397–416 397
HematoFusion: A Weighted Residual-Vision Transformer Ensemble for
Automated Classification of Haematologic Disorders in Microscopic Blood
Images
Mouna Saadallah, Latefa Oulladji, Farah Ben-Naoum
Evolutionary Engineering and Distributed Information Systems Laboratory, Department of Computer Science, Djillali
Liabes University, Sidi Bel Abbes, Algeria
E-mail: mouna.saadallah@univ-sba.dz, latifa.oulladji@univ-sba.dz, farah.bennaoum@univ-sba.dz
Keywords: Medical imaging, neural networks, red blood cell, leukemia, lymphoma
Received: May 27, 2025
Haematologic malignancies pose a significant global challenge, with 1.34 million new cases reported in
2019 and leukemia claiming 311,594 lives in 2020. Early diagnosis of these blood disorders increases
survival chances by enabling prompt treatment, yet their complexity and variable cellular morphology
hinder accurate detection. Advances in Medical Imaging and AI, particularly Image Classification, offer
solutions by analyzing blood samples for subtle morphological patterns. This study advances the field by
introducing a novel data set for the classification of red blood cells and using open-source data for the
classification of leukemia and lymphoma (covering 29,363, 16,811, and 1,436 images, respectively).
We fine-tuned multiple AI models, including EfficientNetB3, ResNet50V2, and a pretrained Vision Trans-
former (ViT), and combined their strengths into a weighted ensemble framework. Evaluated across various
metrics (including accuracy, precision, recall, etc.), the proposed HematoFusion model excelled, achieving
96% accuracy in the morphology of red blood cells, 99% in Leukemia, and 96% in Lymphoma, surpassing
most existing models in terms of accuracy while covering a wider range of haematologic disorders. These
findings demonstrate the potential of integrated AI frameworks to improve haematologic diagnostics with
precision and reliability.
Povzetek: HematoFusion je uteženi ansambel ResNet50V2 in Vision Transformer, namenjen avtomatski
klasifikaciji hematoloških motenj iz mikroskopskih slik. Sistem uporablja nov RBC-nabor podatkov ter
odprtokodne nabore levkemije in limfoma ter izboljša zanesljivost diagnostičnega razpoznavanja krvnih
celic.
1 Introduction
The collection of blood samples is crucial to understanding diseases, preventing them, and thoroughly providing treatment.
The diagnosis of blood cell diseases hinges significantly on determining the patient's Blood Cell Count (BCC) and observing the appearance of cells under a microscope. It serves as a guide for the pathologist or biologist, providing vital information on diseases that are indicative of quantitative (variations in the number of cells) or qualitative (structural or functional) abnormalities in blood cells [11].
Patients admitted to consultation often suffer haematologic dysfunction (either qualitative or quantitative). Some of the most common cases requiring medical evaluation are caused either by a decrease in the complete blood count (anemia, for instance, sees a decrease in the number of Red Blood Cells (RBCs) or in the level of hemoglobin) or by an increased concentration of RBCs, as marked in the condition of erythrocytosis. Other conditions mark a change in the cell's shape and/or size, including microcytes, macrocytes, echinocytes, codocytes, acanthocytes, spherocytes, and more. White Blood Cell (WBC) and platelet disorders can mainly be described as quantitative, for example leukopenia, leukocytosis, neutropenia, and lymphocytopenia (WBC), and thrombocytosis or thrombocytopenia (platelets). Most qualitative disorders are cancers or proliferative disorders, including leukemia, lymphoma, and myeloma (WBC), and hemophilia (platelets) [9].
The pathologist, along with other medical professionals, depends on studying and examining body tissues to perform diagnostics. The microscope is the main tool used to observe blood cells, providing a detailed description of them in terms of shape and count. Blood cell observation can be extremely challenging with the naked eye and requires enormous concentration and focus; modern technologies, however, recommend new techniques involving the use of a camera to capture microscopic images that can be exploited for further studies and examination. Some existing solutions, like EasyCell® Assistant and Vision Hema® Assist, are stand-alone tools using highly costly robots and integrated microscopes that can assist the pathologist in making decisions and saving time; nonetheless, due to their high costs and unavailability in public
hospitals and laboratories, these solutions cannot be relied upon entirely. This leads us to consider cheaper and more effective innovations emerging in recent years, including Deep Learning (DL) and its various contributions. DL has been widely implemented by researchers in the medical field, and it has given promising results regarding medical imaging (MRI, X-rays, CT, etc.) [30, 52] and enabled medical professionals to rapidly diagnose and detect abnormalities in the human body without exhausting analysis and observations.
This paper aims to improve classification accuracy for haematologic diseases by leveraging ensemble learning techniques applied to multi-source microscopic datasets, preserving the full spectrum of morphologic variability. The latest DL techniques were exploited, including transfer learning and fine-tuning of Convolutional Neural Network (CNN) models and the recently emerging Vision Transformer (ViT) [28]. The ResNet50V2 and EfficientNetB3 networks were chosen as they were preferable for microscopic image classification, and the latter was suitable for scenarios with limited computing resources. We acquired different sources for our data set that cover not only Red Blood Cell disorders but also White Blood Cells (WBC). The CNN and ViT models were separately trained using the completed data set, and the results were later combined to enhance the performance.
A description of our contributions is provided in the following lines:
– A meticulously curated data set for Red Blood Cell morphology using samples collected in the Anti-Cancer Center in El-Oued, Algeria.
– The base architecture of EfficientNetB3 was used with transfer learning, leveraging pretrained weights from ImageNet. It was additionally fine-tuned for the task of blood cell classification.
– The ResNet50V2 was also integrated and transfer-learned as a base architecture and eventually fine-tuned by adding dense layers and regularization techniques that serve to enhance the model's performance.
– A pretrained ViT model was applied to our data set to classify blood cell images through self-attention mechanisms. The model was fine-tuned by optimizing hyperparameters to improve accuracy.
– A hybrid CNN/ViT model was developed by combining the strengths of CNNs for local feature extraction with those of ViT, which captures global features more efficiently.

2 Related work
Pathology and detecting blood disorders require a mass of work and time by a biologist to prepare the blood, test it, and analyze it. Nevertheless, the emergence of developed technologies, such as deep learning, made things much easier for biologists and pathologists, as it assists them with the process of analyzing the blood smear and detecting abnormalities in cell type, shape, and aggregation. If done entirely by the pathologist, this step may take hours or even days when necessary, which causes a decline in the health worker's focus and even eyesight. This urged the need to automate the task to alleviate the pressure on them. Many studies have been conducted to address this problem by exploiting the use of Artificial Intelligence and its diverse techniques.
In its earliest phase, peripheral blood image analysis was inspired by the emerging use of Artificial Intelligence in the medical field and its automation. Kim KS, et al. [2] designed a system that uses a CCD camera attached to the microscope to capture the peripheral images; preprocessing techniques such as edge enhancement and noise removal were applied, and the images were later classified into 15 types of Red Blood Cell abnormalities and 5 normal shapes of White Blood Cells using neural networks. Following that, neural networks, mainly Convolutional Neural Networks, were explored for blood cell image analysis and classification. WBC and its 5 different normal cell shapes (Neutrophils, Lymphocytes, Monocytes, Basophils, Eosinophils) were the easiest to classify and readily available [14] [56] [37]. Classification accuracy reached 96% using a simple neural network that consists of a 16-neuron input layer and a single hidden layer with 10 nodes, achieving a minimum error of less than 10−4, with an output layer of 5 neurons to classify each type [14]. Ali et al. [54] proposed the VGG16-ViT network that uses two online datasets to classify WBC subtypes, achieving excellent precisions of 98.99% and 99.95% on each dataset.
The DenseNet121 model [12] was used by Bozkurt F. [27] on the open-access data set provided by Paul Mooney, available on Kaggle.com [18], reaching an accuracy of 98%. Another Two-Module Deformable CNN with Transfer learning was proposed by Yao Xufeng, et al. [37]; whilst the first module initializes the ImageNet [3] characteristic weights, the second module was designated for classification. The authors achieved precisions of 95.7%, 94.5%, and 91.6% for two low-resolution and noisy undisclosed data sets and the BCCD data set [20], respectively.
Some of the studies, however, focused solely on the classification of one disease. Leukemia is one of the most common blood cancers, leading to growing interest in developing new diagnostic systems for early detection and prevention. In this context, CNNs have gained significant attention due to their efficiency and high accuracy in image-based classification tasks. Areen K. et al. [47] compared in their study multiple CNN-based algorithms (AlexNet, DenseNet, ResNet, and VGG16), employing three datasets (ALL-IDB, ASH ImageBank, and images captured at JUST), reaching an accuracy of 94%. DeepLeukNet, proposed by Saeed et al. [53], was conceived to classify Acute Lymphoblastic Leukemia (ALL) subtypes
employing a CNN-based classifier on the ALL-IDB1 and ALL-IDB2 datasets, attaining 99.61% accuracy. Kasim et al. [55] leverage the online ALL-IDB and Munich AML Morphology datasets for multi-class classification of Leukemia subtypes using pretrained CNN architectures and other classification models, including Random Forest, SVM, and Extreme Gradient Boosting. The highest accuracy achieved by this method was 88%. In recent studies, Vision Transformers (ViTs) have been employed for the classification of Leukemia subtypes. Swain et al. [59] proposed in their research a model based solely on ViTs and classified ALL subtypes. The accuracy on the test set reached 99.67%. A similar approach was implemented by Prasad et al. [51], who attained an overall accuracy of 98.01% for the automatic detection of ALL. Others opted for architectures combining both CNNs and ViTs to further enhance feature extraction. For instance, Tanwar et al. [60] combined in their study the ResNet50 model with the ViT, establishing a dual-stream architecture and reaching an accuracy of 99%.
DL also proved efficient in the classification of other types of cancer such as Lymphoma. Its potential was thoroughly explained by several researchers [58] [35], stressing the application of CNNs and ensemble techniques. Ozgur et al. [49] developed a triple classification system for various Lymphomas (CLL, FL, and MCL) and employed a combination of ML and DL algorithms, reaching precisions of 94%, 92%, and 82%, respectively.
Sickle Cell Anemia and Malaria can be diagnosed by examining the patient's RBCs. Harahap Mawaddah, et al. [29] used a data set that regroups 27,588 images of infected and healthy individuals' RBCs provided by Yasmin M. Kassim et al. [23]. Two CNN architectures were compared during the classification. LeNet-5 [1] was deemed more precise than DRNet [46] in classifying RBCs affected by Malaria, with accuracies of 95.7% and 95%, respectively. Alzubaidi Laith, et al. [22] introduced a CNN classifying RBCs into 3 classes, namely normal, abnormal, and miscellaneous. They used the same network as a feature extractor, then applied the Error Correcting Output Codes (ECOC) classifier for the classification task, achieving an accuracy of 92.06%.
In addition to neural networks, Machine Learning techniques were also employed to address the problem of blood cell image analysis. Aliyu Hajara Abdulkarim, et al. [17] compared Support Vector Machine (SVM) against Deep Learning methods using the AlexNet architecture [5]. The dataset used was open-sourced and distinguished 4 types of RBC abnormalities along with their normal shape. The accuracy of the CNN model was relatively weak and could not exceed 33%, while the SVM model achieved a perfect 100% on the RBC data set. The latter was deployed with the Radial Basis Function (RBF) default setting; this same network was employed by Syahputra Mohammad Fadly, et al. [15], achieving an accuracy of 83.3% using Canny Edge Detection for preprocessing and feature extraction to classify 3 types of RBC abnormalities.
Label-free identification was also explored by various researchers, using an imaging flow cytometer to classify unstained WBCs [19] and optofluidic time-stretch microscopy along with Machine Learning for aggregated platelet detection as well as single platelets and WBCs [13].
Visual or Vision Transformers were introduced by Dosovitskiy, et al. [28] in 2020 to exploit transformers in visual applications. Given that image classification is rather a novel concept for transformers, it may take a while to fully develop and exploit ViT in this regard. Compared to ViT, CNNs can handle large-scale data sets better and offer excellent results. ViT, however, is known for its understanding of global context and dependencies, although it requires pretraining on large amounts of data to achieve results comparable to CNNs [34]. Therefore, an ensemble ViT/CNN model can be an excellent approach to incorporate ViT's efficiencies with CNNs; this was previously done by Y. Barhoumia, et al. [26] to address another consistent problem, Intra Cranial Hemorrhage Classification. It was also employed by Jiang Zhencun, et al. [32] to diagnose ALL. The ensemble method used is the weighted-sum model; the output results of the ViT models are multiplied by a coefficient of 0.7, and the output results of the EfficientNet model [21] are multiplied by a coefficient of 0.3. The authors later combined the results to get the final prediction result. The ViT-CNN ensemble model achieved outstanding results with an accuracy of 99.03%, exceeding the models in the literature.
A comparative summary of recent studies on cancer classification using deep learning methods is presented in Supplementary Material: Section 1 (Table S1), which provides the datasets used, classification techniques, number of classes, and accuracy values reported.

3 Methods

3.1 Data acquisition
The data set used for the classification was acquired by combining different sources.
1. The Chula RBC-12 data set [33] of RBC blood smear images, which contains a total of 706 smear images describing 13 classes of RBC and comprising over 20K images of normal and pathological RBCs. The images provided were collected at the Oxidation in Red Cell Disorders Research Unit, Chulalongkorn University, in 2019, with a DS-Fi2-L3 Nikon microscope used at 1000x magnification. The 13 classes are specified as follows: Normal cell, Macrocyte, Microcyte, Spherocyte, Target cell, Stomatocyte, Ovalocyte, Teardrop, Burr cell, Schistocyte, uncategorized, Hypochromia, Elliptocyte. 2 classes were neglected for the lack of blood smear images.
2. The ThalassemiaPBS data set [40] contains 7108 peripheral blood smear images of four thalassemia patients for nine cell types (Elliptocyte, Teardrop, Normal cell, Cigar cell, Stomatocyte, Target
cell, Hypochromia, Spherocyte, Acanthocyte). The images were collected by a clinical pathologist from the Clinical Pathology Laboratory of the Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Indonesia, using an Olympus CX21 microscope attached to an Optilab Advance Plus camera with 1000x total magnification.
3. The RBC-mini data set, Anti-Cancer Center El-Oued, Algeria [57]: A small data set fragment (mini-batch) provided by the specialized healthcare facility, the Anti-Cancer Center in El-Oued, Algeria, that contains a total of 13 blood smear images, regrouping 5 different types of RBC disorders: Burr cells, ovalocyte, schistocyte, stomatocyte, and teardrop. The blood smear images were captured in May 2024 using an optical microscope with x1000 magnification. These images were integrated to augment the diversity of the RBC class and mitigate overfitting risks, not to serve as a core data source.
Table 1 regroups all 3 sources of RBC data sets and lists the size of each data set per type of cell disorder, before and after the application of the data augmentation techniques described in Section 3.3.1. The total size of the RBC data set is 29,363.
4. The Raabin-Leukemia data set [39] is a free-access data set of microscopic images of blood cells, focusing on cases related to Leukemia. 2 experts labeled the cells, and the samples were captured from patients at the Takht-e Tavous laboratory in Tehran, Iran. A Zeiss microscope and an LG J3 smartphone camera were used for the imaging.
5. The Malignant Lymphoma Classification data set [4] contains a significant number of labeled histopathological images of lymphoma. 3 types of this cancer are covered in this data set: Chronic lymphocytic leukemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL), through biopsies sectioned and stained with Hematoxylin/Eosin (H+E).
Tables 2 and 3 present the Lymphoma and Leukemia datasets, respectively, compiled from the Malignant Lymphoma Classification dataset and the Raabin-Leukemia dataset. The tables show the class distribution before and after applying the data augmentation techniques described in Section 3.3.1. The Lymphoma dataset consists of 1,436 images, while the Leukemia dataset contains 16,811 images.

3.2 Image cropping
Figure 1 presents a representative blood smear image from the Chula RBC-12 data set [33]. Each image was manually cropped to focus on individual RBCs and relevant regions of interest. They were subsequently categorized based on specific morphological characteristics. The organization of these classes was performed meticulously to ensure consistency with the reference files provided by the authors.
The accompanying "Label" folder within the data set houses a series of files providing detailed annotations for each image, structured in a specific format: the x-coordinate, the y-coordinate, and the corresponding RBC type encoded as a numerical value (each class is given a unique value from 1 to 11). This labeling system facilitates the task of accurate identification and classification of RBCs, thereby serving as a foundation for various haematological studies and the development of automated diagnostic tools.
The same process was replicated on the RBC-mini data set, which we collected in collaboration with the Anti-Cancer Center in El-Oued, Algeria. The resulting blood smears were preprocessed and cropped using the OpenCV library as described below. The extracted images were manually labeled under the supervision of specialists at the Center.
1. Load the image using OpenCV.
2. Preprocess the image by converting it to grayscale and applying thresholding or edge detection to highlight the cells.
3. Find the cells using contour detection.
4. Extract each cell based on the detected contours and save them as separate image files.
Figure 1: Images of single cells cropped from one blood smear image
The images provided by the ThalassemiaPBS data set [40] already consisted of single cells; therefore, no further preprocessing was needed. This process was necessary to isolate and classify specific morphological abnormalities. In contrast, the leukemia [39] and lymphoma [4] data were not cropped into single cells. Instead, the whole blood smear images were retained as input, since the spatial context and global information contained in the whole smear image all contribute positively towards the classification of leukemia subtypes and malignant lymphomas. These differences in preprocessing reflect the varying nature of the diagnostic tasks and were taken into account during the design of the model pipelines.
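As a rough illustration of the four cropping steps listed above (load, threshold, find contours, crop), the OpenCV sketch below can be used; the threshold strategy, the size filter, and the file names are placeholder assumptions, not the exact settings used for the RBC-mini data set.

```python
# Crop individual cells out of a blood smear image with OpenCV.
import cv2

image = cv2.imread("blood_smear.jpg")                       # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu thresholding (inverted) to separate the darker cells from the background
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for idx, cnt in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cnt)
    if w * h < 400:                                         # skip tiny artefacts (placeholder filter)
        continue
    cell = image[y:y + h, x:x + w]
    cv2.imwrite(f"cell_{idx}.png", cell)                    # save each cell as a separate file
```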
Table 1: The complete Red Blood Cells data set description by type of cell and data size, including before and after
augmentation
Index Type of RBC No. of images [33] No. of images [40] No. of images [57] Total (with augm)
1 Acanthocyte 0 354 0 1432
2 Burr Cell 90 0 10 982
3 Cigar Cell 455 24 0 1893
4 Hypochromia 90 222 0 1284
5 Normal 1812 1426 0 3292
6 Ovalocyte 114 1211 4 3735
7 Schistocyte 108 0 8 453
8 Spherocyte 92 562 0 2640
9 Stomatocyte 49 382 3 1792
10 Target Cell 651 851 0 3912
11 Teardrop 26 2085 6 7948
Table 2: The Lymphoma data set description by type of cell and data size, including before and after augmentation
Category Subtype Before After
Lymphoma CLL 113 443
Lymphoma FL 139 526
Lymphoma MCL 122 467

Table 3: The Leukemia data set description by type of cell and data size, including before and after augmentation
Category Subtype Before After
Leukemia ALL (L1) 377 1131
Leukemia ALL (L2) 3595 3595
Leukemia AML (m0) 672 997
Leukemia AML (m1) 425 1700
Leukemia CLL 1071 3741
Leukemia CML 1624 5647

3.3 Data processing

3.3.1 Data augmentation
Data augmentation is a technique that is essential in image processing. It consists of artificially enhancing the size of a given data set by making changes to the original images. Furthermore, this method presents a solution for improving the model's performance by mitigating common issues like overfitting.
The variations of the existing images generated by the data augmentation techniques provide a more robust data set. These alterations can consist of simple geometric transformations and color or noise introductions, all designed to make the model's predictions more generalizable and accurate.
In the present study, three primary data augmentation techniques were employed, namely: flipping, which involves mirroring the image horizontally or vertically; rotation, which involves altering the image by turning it by a specified degree; and Gaussian blurring, which can help reduce noise and minor details by applying a Gaussian filter to the image. When combined, these augmentation techniques allowed us to enrich our data set, all the while relying on additional data preprocessing techniques that will be introduced in the following sections.

3.3.2 Data resizing
Another vital preprocessing technique before training the model is resizing. Since our data set is acquired from various sources, it is rather imbalanced, and the images come in different sizes and shapes. Therefore, the sizes must be standardized into a uniform square dimension before feeding the images into the model. This allows the model to learn efficiently and improves its accuracy. Each model expects a certain target size for the images. The ResNet50V2 model, for instance, requires a target size of (224, 224, 3); we were able to apply it using the flow_from_directory() method in Keras. EfficientNetB3, however, expects input images of shape (300, 300, 3) by default, but the model can accept other input shapes as long as the shape is at least 224 × 224 and the number of channels is 3 (RGB); thus, the input size was resized to (224, 224, 3) to reduce computation time and memory usage.
When provided with the target size, Keras uses bilinear interpolation by default for the image resizing operation. The formula below represents the process in which the original coordinates are mapped to new ones using interpolation:

\[ \mathrm{new}(i', j') = \mathrm{interpolate}\left(\mathrm{orig}\left(i' \cdot \frac{H_{\mathrm{orig}}}{H_{\mathrm{tgt}}},\; j' \cdot \frac{W_{\mathrm{orig}}}{W_{\mathrm{tgt}}}\right)\right) \tag{1} \]
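A minimal sketch of this preprocessing stage with a Keras ImageDataGenerator is shown below; the directory path, batch size, and rotation range are illustrative assumptions, and the Gaussian blurring described above would be supplied separately (for example through a custom preprocessing_function).

```python
# Flipping and rotation augmentations plus bilinear resizing to the target size.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,          # Min-Max rescaling to [0, 1], see Section 3.3.3
    horizontal_flip=True,       # flipping (mirroring)
    vertical_flip=True,
    rotation_range=20,          # rotation by up to a specified degree (assumed value)
)

train_batches = train_gen.flow_from_directory(
    "data/rbc/train",           # hypothetical directory layout: one folder per class
    target_size=(224, 224),     # resized with bilinear interpolation by default (Eq. 1)
    batch_size=32,
    class_mode="categorical",
)
```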
3.3.3 Data rescaling specifics of the corresponding data set.
The choice of models and more specific details are ex-
To ensure uniformity across input data and improve plained later in the section.
model training, all images were rescaled using appro-
priate preprocessing techniques depending on the model
architecture. To further enhance the CNN-based models’ 3.4.1 EfficientNetB3
efficiency, we’ve used ”Rescaling”, a technique in which
the image’s range of pixel values is changed to a standard A member of the EfficientNet family that was first intro-
or normalized range. duced in May 2019 by [21]. This architecture was chosen
There are two common rescaling techniques: Standard- due to its superior performance in feature extraction and its
ization and normalization. In our paper, we’ve opted for ability to balance computational efficiency with high accu-
the latter, which ensures that various pixel values are used racy, making it well-suited for tasks like blood cell classi-
during the model’s learning process. The pixels of a given fication.
image can be represented as integers in the range of 0 to EfficientNets are developed based on AutoML and com-
255 in the case of an 8-bit image. Rescaling modifies these pound scaling. The authors first used the AutoML MNAS
values into a different range of -1 to 1 or 0 to 1 when using Mobile framework to develop a baseline network, which
normalization. they named EfficientNetB0, the first of the EfficientNet
Likewise, we’ve used the flow_from_directory() method family. They then used the compound scaling method to
to rescale the images by a factor of 1/255 for our training, scale up and obtain the series from B1 to B7.
validation, and test batches. This method uses a form of The architectures achieved higher accuracy and efficiency
Min-Max Scaling, where each pixel value is divided by despite being smaller and, thus faster than other models.
255. The minimum value (0) in this case maps to 0, and In our paper, we have opted for the B3 version which gave
the maximum value (255) in turn maps to 1. Its formula promising initial results, additional layers were added to
can be defined as follows: adapt the model for blood cell classification.
We additionally adjusted key hyperparameters meticu-
lously during training, such as learning rate, batch size, and
X − X
X min dropout rate.
scaled = (2)
Xmax − Xmin Figure 2 shows the architecture of the EfficientNetB3
Meanwhile, for the ViT model, a different preprocessing strategy was implemented to fit its expected input distribution. Mean–standard deviation normalization was applied to standardize the image data and improve the model's convergence. Each image's pixel values were normalized using the following channel-wise means and standard deviations:
– Mean: [0.485, 0.456, 0.406]
– Standard deviation: [0.229, 0.224, 0.225]
This normalization follows the formula:

normalized_pixel(i, j) = (pixel(i, j) − mean) / std_dev    (3)

Additionally, Supplementary Material: Section 2.2 includes the parameter-level details of the aforementioned data augmentation techniques.
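For comparison, the ViT-style channel-wise normalization of Equation (3) can be expressed, for instance, with torchvision transforms; this is a sketch of the idea, and the exact transform chain used in the original pipeline may differ.

# Channel-wise mean/std normalization for the ViT input (Equation 3).
from torchvision import transforms

vit_preprocess = transforms.Compose([
    transforms.Resize((224, 224)),          # ViT expects 224x224 inputs
    transforms.ToTensor(),                  # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # per-channel means
                         std=[0.229, 0.224, 0.225]),   # per-channel standard deviations
])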
3.4 Proposed solution

In this section, we present the architectures we employed for our blood-cell classification system based on the latest deep-learning techniques. Three state-of-the-art models were explored for this task: EfficientNetB3, ResNet50V2, and Vision Transformer (ViT). To further enhance the classification accuracy, we developed ensemble models combining the strengths of ViTs and CNNs. In training, Transfer Learning was used to fine-tune each of the cited architectures, and the hyperparameters were optimized to suit the task.

3.4.1 EfficientNetB3

A member of the EfficientNet family that was first introduced in May 2019 by [21]. This architecture was chosen due to its superior performance in feature extraction and its ability to balance computational efficiency with high accuracy, making it well-suited for tasks like blood cell classification.
EfficientNets are developed based on AutoML and compound scaling. The authors first used the AutoML MNAS Mobile framework to develop a baseline network, which they named EfficientNetB0, the first of the EfficientNet family. They then used the compound scaling method to scale up and obtain the series from B1 to B7. The resulting architectures achieved higher accuracy and efficiency despite being smaller and thus faster than other models.
In our paper, we have opted for the B3 version, which gave promising initial results; additional layers were added to adapt the model for blood cell classification. We additionally adjusted key hyperparameters meticulously during training, such as the learning rate, batch size, and dropout rate.
Figure 2 shows the architecture of the EfficientNetB3 base model that we have adopted for our specific classification task. The architecture was created using diagrams.net (formerly known as draw.io) [31].
The model is first fed microscopic images resized to (300, 300, 3) and processed through its pretrained backbone. The default fully connected classification head of EfficientNetB3 had been removed, since it is specific to the ImageNet data set it was trained on (containing 1000 classes), which allowed us to add a custom classification head tailored to our data set.
Three versions of the same architecture were used, each with a modified softmax layer to adapt to our three different data sets: (1) for RBC classification, 11 classes, (2) for Leukemia classification, 6 classes, (3) for Lymphoma classification, 3 classes.
The EfficientNetB3 backbone acts as a feature extractor, extracting spatial and hierarchical features that are later fed to the added layers for learning. The first five layers are frozen to prevent their weights from being updated during training; this helps adapt the deeper layers to our data set, whereas freezing more layers could have resulted in under-fitting, since the data set has unique characteristics that are significantly different from the original ImageNet data set. Deeper layers of the EfficientNetB3 backbone are left unfrozen to enable the model to capture more domain-specific patterns (e.g., cell morphology, staining patterns).
This version of the model expects a (300, 300, 3) input shape by default; we resized the input to (224, 224, 3) to speed up training and reduce memory usage, due to limited resources and the size of our data set, which is rather small.
Figure 2: EfficientNetB3 model for blood cell classification: The model processes 300x300x3 microscopic images through convolutional layers with Swish activation, followed by mobile inverted bottleneck blocks (MBConv 1 and 6). The first 5 layers are frozen, with fine-tuned deeper layers. A custom classification head is added for task-specific classification
A dropout layer is added after the global average pooling (GAP) to reduce the overfitting that we observed due to the depth of the network relative to the small size of the data set. A dropout rate of 0.3 was employed, thus deactivating 30% of neurons during training; this prevents the model from relying on specific neurons.
A fully connected layer with 256 units using the ReLU activation function is added, serving to learn complex representations.
The classification head is completed with the final output dense layer. The number of units corresponds to the number of classes in each of our 3 data sets, and the softmax activation function is used for multi-class classification.
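A hedged Keras sketch of the transfer-learning setup just described (frozen early layers, GAP, dropout of 0.3, a 256-unit ReLU layer, and a softmax output) is given below; the num_classes value and the way layers are indexed are illustrative assumptions, not the authors' code.

# Sketch of the EfficientNetB3 backbone with the custom classification head described above.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 11  # RBC morphology; 6 for Leukemia, 3 for Lymphoma

backbone = tf.keras.applications.EfficientNetB3(include_top=False, weights="imagenet",
                                                input_shape=(224, 224, 3))
for layer in backbone.layers[:5]:      # freeze the first five layers
    layer.trainable = False

model = models.Sequential([
    backbone,                                       # pretrained feature extractor
    layers.GlobalAveragePooling2D(),                # GAP
    layers.Dropout(0.3),                            # deactivate 30% of neurons during training
    layers.Dense(256, activation="relu"),           # learn combined representations
    layers.Dense(num_classes, activation="softmax") # task-specific classification head
])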
3.4.2 ResNet50V2

Deep convolutional neural networks have contributed significantly to the image-classification field, providing a robust platform to researchers ever since the emergence of the first deep neural network, LeNet, in 1998. Later on, in 2012, the idea of Dropout was presented, allowing models to avoid overfitting.
Researchers next focused on adding more convolutional layers to increase the depth of the model and thus its efficiency. However, simply stacking up layers introduced a new issue, accuracy degradation, which unexpectedly was not due to overfitting but was caused by the vanishing gradient effect [6].
Residual Neural Networks addressed this problem: in 2015, ResNet152, the first of the ResNet family, was introduced. It consists essentially of modularized architectures that stack building blocks of the same connecting shape, with short-cut connections that skip one or more layers [10]. These connections in ResNet perform identity mapping; the outputs of this mapping are added to those of the stacked layers, as illustrated in Figure 3.

Figure 3: Residual learning - a building block

ResNet50V2 is a residual neural network variant that employs skip connections to prevent vanishing gradients during back-propagation; this ensures efficiency in learning the complex features present in microscopic blood cell images.
Figure 4 presents the architecture of the ResNet50V2 base model that we have employed for our classification.
Figure 4: ResNet-50v2 model for blood cell classification: The model processes 224 × 224 × 3 microscopic images
through a series of convolutional layers with ReLU activation. It consists of four main blocks with residual connections
and employs bottleneck blocks (1x1, 3x3, 1x1 convolutions). A global average pooling layer is added, followed by a fully
connected classification head, and a softmax activation for predicting blood cell classes. Key components include skip
connections, dropout (0.6), and task-specific fine-tuning
Similarly, the model was also designed using the diagrams.net tool.
The model is fed a microscopic image of size (224, 224, 3) that has previously been preprocessed and normalized. It consists of 50 layers and focuses on improved gradient flow and training stability by introducing pre-activation residual blocks and applying batch normalization and the activation (ReLU) before convolutions.
The network's initial block captures low-level features such as edges, textures, and patterns through convolutional and pooling layers, followed by 4 residual blocks with skip connections to prevent vanishing-gradient problems. Higher-level features are extracted using down-sampling (strides).
The final output of these blocks is passed through a global average pooling (GAP) layer to reduce the feature map to a 1D vector; a fully connected layer and a softmax classifier are then added.
The base model acts as a feature extractor, and the custom layers act as a task-specific classifier tailored to blood-cell classification. Similar to the EfficientNet, the model was transfer-learned by freezing the first 5 layers. This prevents overfitting, as the learning focuses on the deeper layers, and a dropout layer is also added to ensure the model does not memorize the training data. It is preceded by a 256-unit dense layer that allows the model to combine the learned features to improve classification, and the classifier ends with a dense layer that has the same number of neurons as there are classes: (1) for RBC classification, 11 classes, (2) for Leukemia classification, 6 classes, (3) for Lymphoma classification, 3 classes.

3.4.3 Experimental hyperparameters

Table 4 presents a breakdown of the hyperparameters and setup used in the experiments based on the Keras/TensorFlow training pipeline, along with their purposes, as well as the strategies employed to transfer-learn and fine-tune the models and achieve the best accuracies possible. The EfficientNetB3 and ResNet50V2 models were both trained using the same hyperparameters detailed in Table 4.

3.4.4 Experimental environment

Hardware: The experiments for all 3 models were conducted on Google Colab, which typically provides NVIDIA Tesla GPUs.
Software: Platform: Google Colab, a hosted Jupyter Notebook environment. Framework(s): TensorFlow v2.18.0 and Keras v3.6.0 were used to develop, train, and evaluate the 3 models. Python: version 3.10. Libraries: matplotlib, numpy, PIL, joblib, and others were used to preprocess, analyze, and visualize the data.
Storage: The 3 data sets were preprocessed and split into training, validation, and test sets, each stored in Google Drive, which is mounted to the Colab environment for access. The detailed data split strategy, including training, validation, and testing partitions, is provided in Supplementary Material: Section 2.1.
Table 4: Experimental hyperparameters for training the CNN models and their purposes

Hyperparameter | Value | Purpose
Optimizer | Adam (LR = 10^-4) | A low learning rate is employed to fine-tune pretrained layers.
Loss function | Categorical crossentropy | Used for multi-class classification.
Metrics | Accuracy | To monitor the number of correctly classified instances during training.
Steps-per-epoch | ResNet50V2: 20, EfficientNetB3: 200 | The number of training batches processed per epoch.
Validation steps | ResNet50V2: 10, EfficientNetB3: 316 | The number of validation batches processed per validation step.
Epochs | ResNet50V2: 300, EfficientNetB3: 10 | Specifies the training schedule, which allows gradual convergence.
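The Table 4 settings could be wired together roughly as follows; model, train_batches, and val_batches refer to the objects sketched earlier and are assumptions for illustration rather than the authors' exact pipeline.

# Compiling and fitting with the Table 4 hyperparameters (EfficientNetB3 values shown).
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-4),     # low LR to fine-tune pretrained layers
              loss="categorical_crossentropy",         # multi-class classification
              metrics=["accuracy"])

history = model.fit(train_batches,
                    validation_data=val_batches,
                    steps_per_epoch=200,    # EfficientNetB3 (ResNet50V2: 20)
                    validation_steps=316,   # EfficientNetB3 (ResNet50V2: 10)
                    epochs=10)              # EfficientNetB3 (ResNet50V2: 300)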
3.5 ViT model

Visual or Vision Transformers (ViT) is a novel approach introduced by Dosovitskiy et al. [28]. It uses the concept of transformers designed specifically for visual applications and image classification tasks in particular. When using the transformer blocks in ViT, the multi-head attention mechanism is applied to integrate global context efficiently and learn high-level features [42].
Following the success of NLP transformers [16], Dosovitskiy et al. were inspired to develop a new attention-based class of models that can be exploited in Computer Vision. Compared to NLP transformers, ViT only uses the encoder attention branch, neglecting the decoder attention branch, whilst word tokens are replaced by image patches.
In a normal CNN, the entire image is taken as input, whereas in ViT the image is first divided into equal-sized patches, which are passed through linear layers; the outputs of this layer are known as patch embeddings. To these embeddings, position embeddings are added, which provide the model with positional information regarding the sequence of the patches. Afterward, another learnable token is added to the position embedding for image classification purposes.
Figure 5 presents the architecture of the ViT model we have employed for our blood cell classification task. Prior to the training phase, the data was first prepared and processed to fit the model's requirements and expected input. The data set was initially split into training, validation, and test sets and stored in specific folders. The ImageFolder utility was used to load the images and associate them with their corresponding classes based on the folder names provided. The images were later resized to fit the shape expected by the ViT model, 224×224, and normalization was applied to standardize the image data and make it more suitable for the model (see Section 3.3.3).
Similarly to the CNN architectures, three versions were implemented, one for each data set: (1) for RBC classification, 11 classes, (2) for Leukemia classification, 6 classes, (3) for Lymphoma classification, 3 classes.
The pretrained backbone uses the google/vit-base-patch16-224-in21k model from the Hugging Face library [25] as a feature extractor. The model was trained on the ImageNet-21K data set [8]. It was fine-tuned to adapt to the blood cell classification task, where the number of labels was defined as the number of classes in the data set, as mentioned previously in this section.
The transformer encoder depicted in Figure 5 is first provided with the embedded patches (patch embedding / position embedding). The input image is divided into fixed-size patches of 16×16. We next apply a linear projection to the flattened patches to form fixed-dimensional vectors. Unlike CNNs, Transformers require position embeddings to learn and capture the input's order of sequence [38]; this serves to improve accuracy and encode the spatial information of the patches.
The combined embedded patches are fed into the Transformer Encoder to go through a series of L layers, each including the following components:
1. Multi-head self-attention is a mechanism that enables the model to learn global patterns by splitting the process of self-attention into multiple heads, where each head focuses on the interaction between patch embeddings differently [16]. The attention calculations are eventually merged to give a more global score.
2. The output of the multi-head attention is added to the input of the next component by a skip connection (residual connection) after normalization. As explained earlier, residual connections are added to prevent the vanishing gradient during training.
Figure 5: Vision Transformer (ViT) Architecture for blood cell classification: The model processes 224×224 microscopic
images through patch embeddings and position encoding, which are later fed to the transformer encoder with loaded
weights from the pretrained ViT-B-16 in 21K model. After passing through the transformer encoder, the embeddings are
used as the input to the classification head (MLP + Softmax)
3. To further enhance the model's learning through patch embeddings, a feed-forward network (FFN) [48] is fed with the normalized output of the multi-head attention; it consists of fully connected layers with a GeLU activation in between. This allows the model to capture local transformations.
4. Similarly to the multi-head self-attention block, the FFN output is normalized and added to the residual connection.
The output of the Transformer Encoder is a sequence of embeddings, enriched with local and global contextual information, independently for each patch.
After passing through the Transformer Encoder, the embedding corresponding to the special classification token (cls) is used as the input to the classification head, which consists of a Multi-Layer Perceptron (MLP) head and a softmax classification head. The MLP takes the output of the Transformer Encoder and feeds it into a series of fully connected layers to prepare the data for the softmax classification head, which maps it to the desired classes.

3.5.1 Experimental hyperparameters

Table 5 outlines the hyperparameters used for training the ViT model with the google/vit-base-patch16-224-in21k backbone, along with their respective values, detailing the batch size, the learning rate, and the optimizer employed. The OneCycleLR scheduler was used as a strategy to vary the learning rate during training; each cycle uses a maximum learning rate of 10^-3. Other parameters include the CrossEntropyLoss function and a total of 10 epochs (624 batches per epoch) to train the model.

Table 5: Experimental hyperparameters for training the ViT model

Hyperparameter | Value
Batch size | 32
Learning rate | 10^-4 (initial)
Optimizer | AdamW
Scheduler | OneCycleLR
Scheduler max LR | 10^-3
Number of epochs | 10
Loss function | CrossEntropyLoss
Model backbone | google/vit-base-patch16-224-in21k
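A possible PyTorch/Hugging Face sketch of the Table 5 configuration is shown below; the dummy batch stands in for a real DataLoader batch, and the code illustrates the setup rather than reproducing the authors' implementation.

# Fine-tuning setup for the pretrained ViT backbone with the Table 5 hyperparameters.
import torch
from torch.optim.lr_scheduler import OneCycleLR
from transformers import ViTForImageClassification

num_classes, epochs, steps_per_epoch = 11, 10, 624   # 11 classes for RBC; 6 or 3 for the other sets
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224-in21k",
                                                  num_labels=num_classes)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)                 # initial LR
scheduler = OneCycleLR(optimizer, max_lr=1e-3, epochs=epochs,
                       steps_per_epoch=steps_per_epoch)                    # cyclic LR schedule
criterion = torch.nn.CrossEntropyLoss()

# One illustrative optimization step on a dummy batch (stands in for a real DataLoader batch).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(pixel_values=images).logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()   # one scheduler step per training batch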
Figure 6: Workflow of the weighted average ensemble method: The input is preprocessed to fit the ViT and the CNN models, and the predictions of the models are later combined using the weighted average approach to generate the final prediction

3.5.2 ViT-CNN ensemble model

To further enhance the performance of our models, an ensemble method was introduced to seek the opinions of several models and combine them, achieving more accurate classifications than those of the individual models trained separately [44][50]. Through our experiments, we have observed the superiority of residual networks in training and efficient learning, while the ViT model performed better in certain instances, focusing more on learning complex features. Thus, we incorporated in our methodology a dual-architecture ensemble, combining the residual network's efficiency with the high precision obtained by the ViT.
Figure 6 presents the flowchart of the ResNet-ViT ensemble model that we have implemented.
The weighted-average ensemble method was selected after experimenting with the most prevalent methods in image classification tasks, namely maximum voting, the averaging method, and the weighted sum. In the weighted-average method, the models are assigned different weights after training, defining the importance of each model for the prediction.
The weighted-average ensemble combines predictions from a CNN (M1) and a Vision Transformer (M2), with output probabilities for class c denoted as P1(c) and P2(c), respectively, obtained via the softmax function to ensure Σ_c P_i(c) = 1 for i = 1, 2. The weights w1 and w2 are assigned to M1 and M2 based on validation performance. The ensemble probability for class c is computed as:

P_ensemble(c) = (w1 · P1(c) + w2 · P2(c)) / (w1 + w2)    (4)

The final class prediction is determined by selecting the class with the highest ensemble probability:

ĉ = arg max_c P_ensemble(c)    (5)

Preprocessing: The input fed to the already trained models is first preprocessed; each model is preprocessed differently. The ViT uses normalization with mean and std, while the CNN uses simple rescaling (1/255). The models are then loaded to make predictions, and both models output probabilities for the different classes; we used the softmax function to ensure that they sum up to 1.
Weight Selection: Weights w1 and w2 were determined through a grid search over predefined pairs, specifically [(0.3, 0.7), (0.4, 0.6)], where each pair sums to 1 to maintain normalized probabilities. The grid search evaluated each weight combination on a validation subset using classification accuracy as the performance metric. The pair that achieved the highest accuracy was selected. Further details about the ensemble weight selection and performance across datasets are provided in the Supplementary Material: Section 3.
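The following NumPy sketch illustrates Equations (4)-(5) and the small grid search described above; p_cnn and p_vit denote the softmax probability matrices of the two trained models on a validation subset and are assumed inputs.

# Weighted-average ensemble (Equations 4-5) and grid search over candidate weight pairs.
import numpy as np

def ensemble_probs(p_cnn, p_vit, w1, w2):
    """Weighted-average ensemble probability per class (Equation 4)."""
    return (w1 * p_cnn + w2 * p_vit) / (w1 + w2)

def ensemble_predict(p_cnn, p_vit, w1, w2):
    """Final class = argmax of the ensemble probability (Equation 5)."""
    return np.argmax(ensemble_probs(p_cnn, p_vit, w1, w2), axis=1)

def select_weights(p_cnn, p_vit, y_true, pairs=((0.3, 0.7), (0.4, 0.6))):
    """Pick the weight pair that maximizes validation accuracy."""
    best_pair, best_acc = None, -1.0
    for w1, w2 in pairs:
        acc = np.mean(ensemble_predict(p_cnn, p_vit, w1, w2) == y_true)
        if acc > best_acc:
            best_pair, best_acc = (w1, w2), acc
    return best_pair, best_acc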
4 Results

This section provides an in-depth analysis of the results obtained from our experiments. First, we explore the performance of our individual models, using the insights present in the confusion matrices and focusing on metrics such as: (1) accuracy, (2) precision, (3) recall, (4) F1-score, (5) Cohen kappa, and (6) AUC scores, followed by an evaluation of the ensemble model, HematoFusion. The evaluation is conducted across the three data sets we have introduced in earlier sections, and the results are eventually interpreted in the context of existing literature.
The Accuracy is calculated by measuring the number of correctly predicted cases. A high accuracy means the overall performance of the model is good. However, in the case of imbalanced data sets, high accuracy can be misleading, and other metrics are necessary to further evaluate the model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)

The Precision is calculated by measuring the number of correctly predicted positive cases. A high precision is achieved only if most of the predicted positive cases are truly positive [45].

Precision = TP / (TP + FP)    (7)

The Recall, also known as Sensitivity or True Positive Rate, measures whether all relevant cases of the data set were correctly predicted [45].

Recall = TP / (TP + FN)    (8)

To address the accuracy's shortcomings in handling imbalanced data sets, which is the case in our paper, the F1-score was introduced for balanced evaluations, combining precision and recall in one metric. The F1-score is only high when both precision and recall are high [43].

F1 Score = 2 · (Precision · Recall) / (Precision + Recall)    (9)

The Cohen Kappa was introduced as a statistical measure of the agreement between the predicted labels and their actual values. If κ = 1, perfect agreement is achieved; if κ = 0, the agreement is no better than chance; and if κ < 0, the model achieved less than random agreement [36].

κ = (P_o − P_e) / (1 − P_e)    (10)

where:
P_o = observed agreement (accuracy)
P_e = expected agreement based on chance, P_e = (1 / N²) · Σ_{i=1}^{k} (A_i · B_i)

The Area Under the Curve (AUC) score, specifically the area under the Receiver Operating Characteristic (ROC) curve [24], evaluates a model's ability to discriminate between classes at various classification thresholds. An AUC score approaching 1 indicates high discriminative capability, which is particularly useful for unbalanced datasets that are common in blood cell classification, where accuracy becomes misleading due to class differences.

AUC = ∫₀¹ TPR(FPR) dFPR    (11)

where:
TPR = TP / (TP + FN)  (True Positive Rate or Sensitivity)
FPR = FP / (FP + TN)  (False Positive Rate)
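These metrics can be computed, for example, with scikit-learn as sketched below; y_true, y_pred, and y_prob are assumed to be the test-set labels, predicted labels, and per-class probability matrix (with a column for every class).

# Computing the evaluation metrics of Equations (6)-(11) with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score, confusion_matrix)

cm = confusion_matrix(y_true, y_pred)                       # per-class TP/FP/FN/TN counts
acc = accuracy_score(y_true, y_pred)                        # Equation (6)
prec = precision_score(y_true, y_pred, average="macro")     # Equation (7), macro-averaged
rec = recall_score(y_true, y_pred, average="macro")         # Equation (8)
f1 = f1_score(y_true, y_pred, average="macro")              # Equation (9)
kappa = cohen_kappa_score(y_true, y_pred)                   # Equation (10)
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")      # Equation (11), one-vs-rest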
4.1 Classification results

To analyze the models' performance during training, the accuracy and loss were both monitored and visualized through the training curves over successive epochs. Figures S5, S6, and S7 (Supplementary Material Section 7) depict the training curves for the RBC Morphology, Leukemia, and Lymphoma data sets, respectively.
Additionally, for further visual evaluation of the classification performance, confusion matrices were computed on the test set, showing the number of accurate and inaccurate predictions of instances, namely: True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP). The confusion matrices generated for the RBC Morphology, Leukemia, and Lymphoma datasets are displayed, respectively, in Figures 7, 8, and 9. These confusion matrices were used to compute the quantitative metrics for a more specific evaluation. Detailed tables presenting per-class performance metrics (precision, recall, F1-score, kappa) for each model and dataset combination are provided in the Supplementary Material: Section 4, Tables S3–S5.
The results for the classification of RBC Morphology, Leukemia, and Lymphoma are summarized in Tables 6, 7, and 8, respectively, and the results of the ensemble model, HematoFusion, in Table 9. To test model stability and robustness over sets, we conducted a bootstrapping analysis. This technique provides us with an estimate of the results' variability and increases the validity of our performance claims over single-run statistics. The detailed bootstrapping results with distribution plots and summary statistics are presented in Supplementary Material: Section 6 (Table S7 and Figures S2–S4).
Furthermore, we have calculated and included the AUC scores (Table S6) and ROC curves (Figure S1) for our individual models across each dataset, as shown in Supplementary Material: Section 5.

5 Discussion

5.1 Interpretation of results

The convergence of the ResNet50V2 model illustrates a steady reduction in training loss, and the accuracy becomes stable after reaching a certain number of epochs. The ViT model demonstrated higher fluctuation in both accuracy and loss during training.
Figure 7: Confusion matrices for the classification performance of the four models on the RBC data set: (a) EfficientNetB3
model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and ResNet50V2
models.
Table 6: RBC Morphology classification results across the three individual models with detailed metrics for evaluation
Model Model Performance
Train Acc Val Acc Test Acc Kappa Recall F1-score Precision
EfficientNetB3 0.99 0.91 0.97 0.93 0.92 0.92 0.92
ResNet50V2 0.98 0.98 0.92 0.92 0.93 0.93 0.93
ViT 0.98 0.94 0.96 0.96 0.94 0.94 0.94
Table 7: Leukemia classification results across the three individual models with detailed metrics for evaluation
Model Model Performance
Train Acc Val Acc Test Acc Kappa Recall F1-score Precision
EfficientNetB3 1.0 1.0 1.0 0.99 0.99 0.99 0.99
ResNet50V2 1.0 1.0 1.0 0.99 1.0 1.0 1.0
ViT 0.99 0.99 0.99 0.99 1.0 1.0 1.0
Figure 8: Confusion matrices for the classification performance of the four models on the Leukemia data set: (a) Effi-
cientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and
ResNet50V2 models.
Table 8: Lymphoma classification results across the three individual models with detailed metrics for evaluation
Model Model Performance
Train Acc Val Acc Test Acc Kappa Recall F1-score Precision
EfficientNetB3 1.0 0.99 0.99 0.97 0.99 0.99 0.99
ResNet50V2 1.0 0.91 0.96 0.91 0.96 0.96 0.96
ViT 0.98 0.98 0.95 0.92 0.98 0.98 0.98
Table 9: HematoFusion ensemble model classification results across the three datasets, showing detailed evaluation metrics
Dataset Ensemble Model Performance
Best Acc Kappa Recall F1-score Precision
RBC 0.96 0.94 0.97 0.97 0.97
Leukemia 0.99 0.99 1.00 1.00 1.00
Lymphoma 0.96 0.95 0.97 0.97 0.97
When comparing the True Positives of the proposed HematoFusion model with the individual models, we can clearly observe an increase in the rates of correctly classified cases and a decrease in the misclassification rates.
The individual models struggled with predicting the Hypochromia class. The ensemble model, on the other hand, exhibited a stronger ability to recognize this class. Acanthocyte and Teardrop, in contrast, were easier to identify owing to their distinguishable shapes, which was reflected in the high number of TP.
Figure 9: Confusion matrices for the classification performance of the four models on the Lymphoma data set: (a) Effi-
cientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and
ResNet50V2 models.
In Table 6, a slight overfitting is observed due to the class imbalance; thus, accuracy alone can be misleading as a measurement of performance. This, however, was addressed with the use of precision, recall, and F1-score. Although EfficientNetB3 was slightly better on test accuracy (0.97), the ViT model outperformed it on stronger and more descriptive metrics, such as the Kappa score, precision, recall, and F1-score, which indicate a more balanced performance under class imbalance. Therefore, ViT is graded as the overall best performer on the RBC morphology dataset.
Table 7 shows more consistent results on the Leukemia data set across all models, achieving perfect classification, which indicates better generalization. Both ResNet50V2 and EfficientNetB3 achieved comparable top-tier performances on the Leukemia dataset, with identical test accuracy, precision, recall, and F1-score, and minor variations in the other evaluation metrics. EfficientNetB3, alternatively, outperforms the other models on the Lymphoma classification (Table 8), reaching almost perfect accuracies.
The ensemble model, HematoFusion, demonstrates more uniform results across all data sets in terms of all evaluation metrics, mitigating the issues with the class imbalance, as evidenced by its performance, leveraging the strengths of both the ViT and ResNet50V2 models that struggled with some classes. The precision improved by 4% on the RBC data set and reached a perfect 100% for Leukemia classification, while averaging the performances of the individual models on the Lymphoma data set with a precision of 97% on the test set. Despite the strong performance of our proposed solution, further improvements could be implemented to help the model generalize better and address the issue of class imbalance efficiently.

5.2 Comparative study

Table 10 presents a breakdown of the performance of the proposed solution across all three datasets, outlining the accuracy and precision of the model when compared to the literature.
Table 10: Comparative results of the proposed solution and the literature across different metrics for each data set
Dataset HematoFusion Literature
Accuracy Precision Accuracy Precision
RBC 0.96 0.97 0.98 0.97
Leukemia 0.99 1.0 0.99 0.99
Lymphoma 0.96 0.97 0.96 0.96
In a bid to substantiate the efficiency of our proposed solution, we evaluated it against the following models:
1. Literature [7], RBC classification: The authors presented a maximum-voting-based ensemble model to classify Dacrocyte (Teardrop), Schistocyte, and Elliptocyte (Cigar) cells in iron deficiency anemia. The average classification precision and accuracy of the latter reached a maximum of 97% and 98%, respectively. While both models achieved the same precision of 97%, the model in the literature reported a slightly higher accuracy (98%) compared to HematoFusion's 96%. Nonetheless, it is worth noting that our data set comprises 11 classes against the 3 classes studied in that article.
2. Literature [32], Leukemia classification: The authors proposed a ViT-CNN ensemble model for the diagnosis of Acute Lymphoblastic Leukemia (ALL), which is one of the 6 classes that we analyzed in our paper. Compared to the model in the literature, which achieved 99% accuracy and 99% precision on the Leukemia dataset, HematoFusion matched the accuracy (99%) but outperformed it in precision, achieving a perfect 100%.
3. Literature [41], Lymphoma classification: Malignant Lymphoma (ML) was addressed in this paper, and it is among the 3 classes that appear in our Lymphoma data set. The proposed hybrid model used the combined features of 3 deep learning networks, namely MobileNet-VGG16, VGG16-AlexNet, and MobileNet-AlexNet, classified by the XGBoost and DT algorithms, reaching an average accuracy and precision of 96%.
An extended version of this comparison, covering a broader range of SOTA models and datasets, is provided in Supplementary Material: Section 8 (Table S8).
Overall, our proposed HematoFusion ensemble model achieved a reliable performance across the 3 data sets, despite the imbalanced data and the high number of classes in the case of RBC Morphology classification.

5.3 Limitations

Although the reported results show high precision, reaching up to 99%, this should be interpreted with caution due to known issues like dataset imbalance. As identified previously, some classes were underrepresented, and this could result in biased learning as well as overfitting. To mitigate this, data augmentation techniques were employed (as outlined in Supplementary Section 2.2), and performance was monitored across a variety of metrics (precision, recall, F1-score, Cohen's Kappa, and AUC scores) rather than simply accuracy. However, we are aware that the lack of external validation data limits generalizability. Although high performance metrics are presented, the models have not been prospectively validated within a real clinical workflow. Their incorporation into clinical decision-making would require extensive regulatory testing and interpretability evaluation. Additionally, while conventional regularization techniques such as dropout and data augmentation were applied to address overfitting, we recognize the need for more advanced strategies. Future work will explore class-imbalance mitigation techniques such as SMOTE, GAN-based synthetic image generation, and uncertainty-aware training, beyond testing on independent cohorts, to further assess the robustness of the model in actual clinical settings. Furthermore, we intend to conduct ablation studies on ensemble weight parameters and data augmentation strategies to evaluate their individual contributions.

6 Conclusion

In this study, the problem of pathological blood cell classification was addressed through the use of novel deep-learning strategies. We curated a data set for RBC Morphology classification, consisting of samples from three different sources. The process involved preprocessing techniques to establish a data set aligned with our research objectives; 2 other data sets were acquired, targeted for Lymphoma and Leukemia classification separately.
Three distinct individual models were applied to each of the data sets: EfficientNetB3, ResNet50V2, and a pretrained ViT model. To leverage the strengths of both the CNN and ViT architectures, an ensemble model using the weighted average method was developed.
The present findings confirm that the proposed HematoFusion model mitigates the shortcomings of the individual models by enhancing the accuracy, precision, and sensitivity, achieving more consistent results across the three data sets. While HematoFusion demonstrates competitive or superior performance on Leukemia and Lymphoma classification, particularly in precision and F1-score, it performs
comparably on RBC classification, despite its higher number of classes and the issue of data imbalance that resulted in a few cases of overfitting. We additionally acknowledge certain limitations in predicting a couple of classes. These are the key components to overcome in future research. Future studies should also be devoted to covering more pathological blood disorders and implementing further processing and data augmentation to alleviate the issues of class imbalance and overfitting.
Overall, this paper provides a foundation for future developments by establishing baseline data that future researchers can expand upon to address the limited data available for RBC Morphology, and by combining the strengths of residual networks and vision transformers into a more robust framework.

References

[1] Yann LeCun et al. "Gradient-based learning applied to document recognition". In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. DOI: https://doi.org/10.1109/5.726791.
[2] K. S. Kim et al. "Analyzing blood cell image to distinguish its abnormalities". In: Proceedings of the Eighth ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2000, pp. 395–397. DOI: https://doi.org/10.1145/354384.354543.
[3] Jia Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009, pp. 248–255. DOI: https://doi.org/10.1109/CVPR.2009.5206848.
[4] Nikita Orlov et al. "Automatic classification of lymphoma images with transform-based global features". In: IEEE Transactions on Information Technology in Biomedicine 14 (2010), pp. 1003–1013. DOI: https://doi.org/10.1109/TITB.2010.2050695.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks". In: Advances in Neural Information Processing Systems. Ed. by F. Pereira et al. Vol. 25. Lake Tahoe, Nevada: Curran Associates, Inc., 2012, pp. 1097–1105.
[6] K. He et al. Deep Residual Learning for Image Recognition. Preprint at https://arxiv.org/abs/1512.03385. 2015.
[7] Mahsa Lotfi et al. "The detection of dacrocyte, schistocyte and elliptocyte cells in iron deficiency anemia". In: 2015 2nd International Conference on Pattern Recognition and Image Analysis (IPRIA). Rasht, Iran: IEEE, 2015, pp. 1–5. DOI: https://doi.org/10.1109/PRIA.2015.7161628.
[8] O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. Preprint at https://arxiv.org/abs/1409.0575. 2015.
[9] J. C. Chapin and M. T. Desancho. "Hematologic dysfunction in the ICU". In: Critical Care. Ed. by J. M. Oropello, S. M. Pastores, and V. Kvetan. New York: McGraw-Hill Education, 2016.
[10] Kaiming He et al. Identity Mappings in Deep Residual Networks. Preprint at http://arxiv.org/abs/1603.05027. 2016.
[11] Kenneth Kaushansky et al. Williams Hematology. New York: McGraw-Hill Education, 2016.
[12] Gao Huang et al. "Densely connected convolutional networks". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017, pp. 4700–4708. DOI: https://doi.org/10.48550/arXiv.1608.06993.
[13] Yiyue Jiang et al. "Label-free detection of aggregated platelets in blood by machine-learning-aided optofluidic time-stretch microscopy". In: Lab on a Chip 17.14 (2017), pp. 2426–2434. DOI: https://doi.org/10.1039/C7LC00396J.
[14] Mazin Z. Othman, Thabit S. Mohammed, and Alaa B. Ali. "Neural network classification of white blood cell using microscopic images". In: International Journal of Advanced Computer Science and Applications 8.5 (2017), pp. 99–103. DOI: https://doi.org/10.14569/IJACSA.2017.080513.
[15] Mohammad Fadly Syahputra, Anita Ratna Sari, and Romi Fadillah Rahmat. "Abnormality classification on the shape of red blood cells using radial basis function network". In: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT). Kuta Bali, Indonesia: IEEE, 2017, pp. 1–5. DOI: https://doi.org/10.1109/CAIPT.2017.8320739.
[16] Ashish Vaswani et al. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017). DOI: https://doi.org/10.48550/arXiv.1706.03762.
[17] Hajara Abdulkarim Aliyu et al. "Red blood cell classification: deep learning architecture versus support vector machine". In: 2018 2nd International Conference on Biosignal Analysis, Processing and Systems (ICBAPS). Kuching, Malaysia: IEEE, 2018, pp. 142–147. DOI: https://doi.org/10.1109/ICBAPS.2018.8527398.
[18] Paul Mooney. Blood Cell Images. 2018. URL: https://www.kaggle.com/datasets/paultimothymooney/blood-cells.
[19] Mariam Nassar et al. "Label-free identification of white blood cells using machine learning". In: Cytometry Part A 95.8 (2019), pp. 836–842. DOI: https://doi.org/10.1002/cyto.a.23794.
[20] N. C. Shenggan. BCCD Dataset. https://github.com/Shenggan/BCCD_Dataset. 2019.
[21] Mingxing Tan and Quoc V. Le. "EfficientNet: Rethinking model scaling for convolutional neural networks". In: Proceedings of the 36th International Conference on Machine Learning. Vol. 97. Long Beach, California: PMLR, 2019, pp. 6105–6114. DOI: https://doi.org/10.48550/arXiv.1905.11946.
[22] Laith Alzubaidi et al. "Classification of red blood cells in sickle cell anemia using deep convolutional neural network". In: Intelligent Systems Design and Applications. Ed. by Ajith Abraham et al. Vol. 1. Cham: Springer International Publishing, 2020, pp. 6–8. DOI: https://doi.org/10.1007/978-3-030-16657-1_51.
[23] Yasmin M. Kassim et al. "Clustering-based dual deep learning architecture for detecting red blood cells in malaria diagnostic smears". In: IEEE Journal of Biomedical and Health Informatics 25.5 (2020), pp. 1735–1746. DOI: https://doi.org/10.1109/JBHI.2020.3034863.
[24] Tatiana Cristina Figueira Polo and Hélio Amante Miot. Use of ROC curves in clinical and experimental studies. 2020. DOI: https://doi.org/10.1590/1677-5449.200186.
[25] Thomas Wolf et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing. 2020. arXiv: 1910.03771 [cs.CL]. URL: https://arxiv.org/abs/1910.03771.
[26] Yassine Barhoumi and Ghulam Rasool. Scopeformer: n-CNN-ViT hybrid model for intracranial hemorrhage classification. 2021. DOI: https://doi.org/10.48550/arXiv.2107.04575.
[27] Ferhat Bozkurt. "Classification of blood cells from blood cell images using dense convolutional network". In: Journal of Science, Technology and Engineering Research 2.2 (2021), pp. 81–88. DOI: https://doi.org/10.53525/jster.1014186.
[28] Alexey Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Preprint at https://arxiv.org/abs/2010.11929. 2021.
[29] Mawaddah Harahap et al. "Implementation of Convolutional Neural Network in the classification of red blood cells have affected of malaria". In: Sinkron: jurnal dan penelitian teknik informatika 5.2 (2021), pp. 199–207. DOI: https://doi.org/10.33395/sinkron.v5i2.10713.
[30] Danish Jamil et al. "Diagnosis of gastric cancer using machine learning techniques in healthcare sector: a survey". In: Informatica 45.7 (2021). DOI: https://doi.org/10.31449/inf.v45i7.3633.
[31] JGraph. diagrams.net, draw.io. Oct. 2021. URL: https://www.diagrams.net/.
[32] Zhencun Jiang et al. "Method for diagnosis of acute lymphoblastic leukemia based on ViT-CNN ensemble model". In: Computational Intelligence and Neuroscience 2021.1 (2021), p. 7529893. DOI: https://doi.org/10.1155/2021/7529893.
[33] Korranat Naruenatthanaset et al. Red Blood Cell Segmentation with Overlapping Cell Separation and Classification on Imbalanced Dataset. Preprint at https://arxiv.org/abs/2012.01321. 2021.
[34] Maithra Raghu et al. "Do vision transformers see like convolutional neural networks?" In: Advances in Neural Information Processing Systems 34 (2021), pp. 12116–12128. DOI: https://doi.org/10.48550/arXiv.2108.08810.
[35] Georg Steinbuss et al. "Deep learning for the classification of non-Hodgkin lymphoma on histopathological images". In: Cancers 13.10 (2021), p. 2419. DOI: https://doi.org/10.3390/cancers13102419.
[36] Željko Vujović et al. "Classification model evaluation metrics". In: International Journal of Advanced Computer Science and Applications 12.6 (2021), pp. 599–606. DOI: https://doi.org/10.14569/IJACSA.2021.0120670.
[37] Xufeng Yao et al. "Classification of white blood cells using weighted optimized deformable convolutional neural networks". In: Artificial Cells, Nanomedicine, and Biotechnology 49.1 (2021), pp. 147–155. DOI: https://doi.org/10.1080/21691401.2021.1879823.
[38] Kai Jiang et al. "The encoding method of position embeddings in vision transformer". In: Journal of Visual Communication and Image Representation 89 (2022), p. 103664. DOI: https://doi.org/10.1016/j.jvcir.2022.103664.
[39] Zahra Mousavi Kouzehkanan et al. "A large dataset of white blood cells containing cell locations and types, along with segmented nuclei and cytoplasm". In: Scientific Reports 12.1 (2022), p. 1123. DOI: https://doi.org/10.1038/s41598-021-04426-x.
[40] Dyah Aruming Tyas et al. "Erythrocyte (red blood cell) dataset in thalassemia case". In: Data in Brief 41 (2022), p. 107886. DOI: https://doi.org/10.1016/j.dib.2022.107886.
[41] Mohammed Hamdi et al. "Hybrid models based on fusion features of a CNN and handcrafted features for accurate histopathological image analysis for diagnosing malignant lymphomas". In: Diagnostics 13.13 (2023), p. 2258. DOI: https://doi.org/10.3390/diagnostics13132258.
[42] Rojina Kashefi et al. Explainability of Vision Transformers: A Comprehensive Review and New Perspectives. Preprint at https://arxiv.org/abs/2311.06786. 2023.
[43] Gireen Naidu, Tranos Zuva, and Elias Mmbongeni Sibanda. "A review of evaluation metrics in machine learning algorithms". In: Computer Science On-line Conference. Springer, 2023, pp. 15–25. DOI: https://doi.org/10.1007/978-3-031-35314-7_2.
[44] Austin H. Routt et al. "Deep ensemble learning enables highly accurate classification of stored red blood cell morphology". In: Scientific Reports 13.1 (2023), p. 3152. DOI: https://doi.org/10.1038/s41598-023-30214-w.
[45] Hongwei Shang et al. "Precision/recall on imbalanced test data". In: International Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 9879–9891. URL: https://proceedings.mlr.press/v206/shang23a.html.
[46] Enquan Yang et al. "DRNet: Dual-stage refinement network with boundary inference for RGB-D semantic segmentation of indoor scenes". In: Engineering Applications of Artificial Intelligence 125 (2023), p. 106729. ISSN: 0952-1976. DOI: https://doi.org/10.1016/j.engappai.2023.106729.
[47] Areen K. Al-Bashir, Ruba E. Khnouf, and Lamis R. Bany Issa. "Leukemia classification using different CNN-based algorithms: comparative study". In: Neural Computing and Applications 36.16 (2024), pp. 9313–9328. DOI: https://doi.org/10.1007/s00521-024-09554-9.
[48] Martin Moller. "Efficient training of feed-forward neural networks". In: Neural Network Analysis, Architectures and Applications. CRC Press, 2024, pp. 136–173. DOI: https://doi.org/10.1201/9781003572886-8.
[49] Emine Özgür and Ahmet Saygılı. "A new approach for automatic classification of non-Hodgkin lymphoma using deep learning and classical learning methods on histopathological images". In: Neural Computing and Applications 36.32 (2024), pp. 20537–20560. DOI: https://doi.org/10.1007/s00521-024-10229-8.
[50] Sajida Perveen et al. "A framework for early detection of acute lymphoblastic leukemia and its subtypes from peripheral blood smear images using deep ensemble learning technique". In: IEEE Access 12 (2024), pp. 29252–29268. DOI: https://doi.org/10.1109/ACCESS.2024.3368031.
[51] Prakeerth Prasad and Jani Anbarasi L. "Acute lymphoblastic leukemia subtypes detection using Vision Transformer model". In: 2024 5th International Conference on Data Intelligence and Cognitive Informatics (ICDICI). 2024, pp. 1413–1418. DOI: https://doi.org/10.1109/ICDICI62993.2024.10810888.
[52] Ruaa Sadoon and Adala Chaid. "Classification of pulmonary diseases using a deep learning stacking ensemble model". In: Informatica 48.14 (2024). DOI: https://doi.org/10.31449/inf.v48i14.6145.
[53] Umair Saeed et al. "DeepLeukNet—A CNN based microscopy adaptation model for acute lymphoblastic leukemia classification". In: Multimedia Tools and Applications 83.7 (2024), pp. 21019–21043. DOI: https://doi.org/10.1007/s11042-023-16191-2.
[54] Md Shahin Ali et al. "A hybrid VGG16-ViT approach with image processing techniques for improved white blood cell classification and disease diagnosis: A retrospective study". In: Health Science Reports 8.6 (2025), e70859. DOI: https://doi.org/10.1002/hsr2.70859.
[55] Sazzli Kasim et al. "Multiclass leukemia cell classification using hybrid deep learning and machine learning with CNN-based feature extraction". In: Scientific Reports 15.1 (2025), p. 23782. DOI: https://doi.org/10.1038/s41598-025-05585-x.
[56] Aniel Mahendren et al. "White blood cells classification: A feature-based transfer learning approach". In: Selected Proceedings from the 2nd International Conference on Intelligent Manufacturing and Robotics, ICIMR 2024, 22-23 August, Suzhou, China. Ed. by Wei Chen et al. Singapore: Springer Nature Singapore, 2025, pp. 757–763. ISBN: 978-981-96-3949-6. DOI: https://doi.org/10.1007/978-981-96-3949-6_63.
[57] Mouna Saadallah. Red Blood Cell Morphology Dataset for Image Classification. Zenodo, Feb. 2025. DOI: https://doi.org/10.5281/14936017. URL: https://zenodo.org/records/14936017.
[58] Vera Sorin et al. "Deep learning applications in lymphoma imaging". In: Acta Haematologica (2025). DOI: https://doi.org/10.1159/000547427.
[59] K. P. Swain, S. K. Swain, and S. R. Nayak. "Vision Transformer-based automated classification of acute lymphoblastic leukemia". In: 2025 International Conference on Emerging Systems and Intelligent Computing (ESIC). IEEE, 2025, pp. 584–588. DOI: https://doi.org/10.1109/ESIC64052.2025.10962707.
[60] Vishesh Tanwar et al. "Enhancing blood cell diagnosis using hybrid residual and dual block transformer network". In: Bioengineering 12.2 (2025), p. 98. DOI: https://doi.org/10.3390/bioengineering12020098.
https://doi.org/10.31449/inf.v49i16.10050 Informatica 49 (2025) 417–428 417
Deep Learning and Rule-Based Hybrid Model for Enhanced English
Composition Scoring Using Attention Mechanisms and Graph
Convolutional Networks
Ruimin Li
Zhoukou Vocational and Technical College, Zhoukou 466000, China
E-mail: laogui9029@126.com
Keywords: English essay grading, deep learning, artificial rules, graph convolutional network, wide&deep architecture
Technical paper
Received: July 8, 2025
Despite extensive exploration of AI technology in the field of education, early automatic scoring systems for English compositions suffer from problems such as a high misjudgment rate and low efficiency.
To improve the efficiency, accuracy, and stability of the English composition grading model, a deep
learning and manual rule-based English composition grading model was designed. The research extracted
sequence features by introducing attention mechanisms, enhancing contextual correlation analysis, and
aggregating global features through graph convolutional networks to extract high-order semantic
relationships. Finally, a visual manual scoring rule was designed, which integrated deep semantic features
and manual rule features through the Wide&Deep architecture to jointly optimize the scoring results. The
experiment outcomes indicated that the area under the precision–recall curve of the research method was 92.3%. In
practical application testing, the highest group stability index of the research method was 0.07 in June.
When faced with 600 concurrent requests, the average response time of the research method reached a
stable value of 3.4 seconds. The outcomes above demonstrated that the English essay scoring model,
which combines deep learning with manual rules as proposed by the research, exhibited excellent
accuracy, speed, and stability. It effectively addressed the issues of a high misjudgment rate and low
efficiency found in traditional scoring systems, thereby enhancing the model's reliability.
Povzetek: Razvit je hibridni model za ocenjevanje angleških esejev, ki združuje globoko učenje z ročnimi
pravili. Z Word2Vec, mehanizmom pozornosti in GCN zajame lokalne ter globalne semantike, Wide&Deep
pa združi pravila in značilke.
1 Introduction

English writing ability is one of the core indicators of language learning, and traditional manual scoring methods face bottlenecks such as low efficiency and strong subjectivity [1, 2]. Early automatic scoring systems mainly relied on rule-based methods to detect surface errors through pre-defined grammar and spelling rules, but it was difficult to evaluate the quality of content and logic, resulting in a high rate of misjudgment [3]. With the advancement of technology, machine learning (ML) algorithms have been introduced to comprehensively consider vocabulary, syntax, and other elements through feature engineering. However, a substantial quantity of annotated data support is still needed, and the generalization ability is insufficient [4]. The existing scoring systems cannot meet the automatic scoring requirements for English compositions, and there is an urgent need for a stable, efficient, and accurate scoring model. Deep learning (DL) models can improve semantic understanding through end-to-end learning, but they lack transparency and find it difficult to capture grammatical details. Artificial rules have a high degree of interpretability, but cannot adapt to open content evaluation. The two methods complement each other in advantages [5]. In light of the preceding circumstances, to ensure the stability, accuracy, and efficiency of the scoring model, an innovative English composition scoring model based on DL and artificial rules has been designed. The research uses the Word2Vec model to convert essay text into a matrix of word vectors, capturing the semantic information of vocabulary. It introduces an attention mechanism and a graph convolutional network to extract local sequence features and semantic graph features, concatenating the two features to generate deep semantic features and constructing a graph adjacency matrix to dynamically capture the relationships between sentences. Then, artificial rule features are generated through feature concatenation, and the Wide&Deep architecture is used to fuse deep semantic features with artificial rule features. Finally, combining multi-dimensional manual rule evaluation, the research achieves dynamic comprehensive scoring of the entire English composition. It is anticipated that the research methodology will offer a theoretical foundation for grading essays in different languages.
2 Related works support systems was determined during the research
process. The outcomes revealed that the research method
English composition grading is an important part of the
could effectively enhance decision-making ability in the
educational evaluation system, playing a crucial role in
context of supply chain [15].
achieving teaching objectives and optimizing teaching
In summary, existing research has played a good role
strategies. Ramesh et al. proposed AI and ML techniques
in the technological advancement of English composition
for evaluating automatic paper grading in response to
grading models, but it still has limitations such as low
issues such as time-consuming manual assessments and
grading efficiency and significant subjective differences.
lack of reliability in the education system. During the
The automatic scoring model based on DL can extract
research process, the limitations and research trends of the
multi-level information such as linguistic features and
current study were analyzed. The outcomes revealed that
semantic information, which can simulate the process of
the research method had a good effect [6]. Fokides et al.
manual scoring to a certain extent, while manual rules can
compared the accuracy and qualitative aspects of the
handle complex grammar rules and subtle semantic
corrections and feedback generated by ChatGPT with
differences. Therefore, based on this, a DL and artificial
educators regarding the effectiveness of ChatGPT on
rule-based English composition grading model was
elementary school students' essays written in English. The outcomes revealed that ChatGPT surpassed educators in regard to both the volume and the caliber of output [7]. Shahzad et al. proposed using random forests as classifiers for off-topic paper detection to address the prediction problem of whether an article deviates from the topic. The outcomes revealed that the research method had high accuracy [8]. Erturk et al. pointed out the low reliability and effectiveness of essay style evaluation tools, and believed that the system's decrease in paper scores was related to boredom in the labeling. The outcomes revealed that higher levels of boredom were correlated with lower scores [9]. Sharma et al. proposed a system that combines handwriting recognition models and automatic paper grading to address the time-consuming issue of grading handwritten papers in educational environments. During the research process, the performance of downstream tasks in paper scoring was analyzed based on Transformer context embedding. The outcomes revealed that the research method had good performance [10].
Many scholars both within the country and abroad have carried out profound investigations and applications of Word2Vec and artificial rules. Mohammed et al. conducted an exhaustive examination of diverse approaches within the realm of ensemble learning to address the issue of time-consuming hyperparameter tuning in DL. Various features or factors that affect the success of integration methods were explained during the research process. The outcomes revealed that the research method could provide accurate theoretical support [11]. Tropsha et al. proposed a "deep quantitative structure-activity relationship" model for virtual screening of molecular databases. The outcomes revealed that the research method had a good effect [12]. Whang et al. proposed a fairness measure and unfairness mitigation technique to address the issues of bias and unfairness in traditional data management. The outcomes revealed that the research method had good data management performance [13]. Pereira et al. proposed an ML system for multi-animal pose tracking to address the challenge of using DL and computer vision techniques to study the social behavior of multiple animals in natural environments. The outcomes revealed that the research method had good efficiency and accuracy [14]. Olan et al. designed an explanatory algorithm to address the impact of AI on the decision-making process in the supply chain field, examining the composition of interpretable AI and decision support [15].
Against this background, an English composition scoring model that combines DL with artificial rules is designed. The goal is to align with the design standards for automated English composition grading and to significantly boost both the accuracy and efficiency of the grading workflow.

3 Design of English composition scoring model

3.1 Intelligent English composition scoring model based on deep semantic text features

As an important part of the education evaluation system, English composition grading has undergone an evolution from traditional manual grading to automated grading. However, existing automated grading systems are mostly based on shallow text features, resulting in significant errors in their grading results [16, 17]. DL models can effectively improve the accuracy and reliability of English composition grading from three aspects: feature extraction, semantic understanding, and grading prediction [18]. The study converts the original English composition text into a numerical word embedding matrix, and the text-to-word-embedding conversion formula is shown in Equation (1).

E = \mathrm{Embedding}(X) \quad (1)

In Equation (1), X represents the input English composition text sequence, E represents the word embedding matrix, and Embedding() represents a DL embedding function. Next, the research investigates the use of the Word2Vec learning model to map each word to a high-dimensional space, capturing the semantic and positional information of the word. The Word2Vec learning model has two training modes: continuous bag-of-words and skip-gram. The frameworks of the two models are presented in Figure 1.
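To make the two training modes concrete, the following is a minimal sketch (not the authors' code) of how the continuous bag-of-words and skip-gram variants could be trained with the gensim library. The toy corpus and tokenization are illustrative assumptions; the 300-dimensional embeddings, window size of 5, and negative sampling follow the configuration reported in the implementation details of the framework.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for the English composition texts (illustrative only).
sentences = [
    ["the", "student", "writes", "a", "short", "essay"],
    ["the", "essay", "receives", "a", "high", "score"],
]

# Continuous bag-of-words: predict the centre word from its context (sg=0).
cbow = Word2Vec(sentences, vector_size=300, window=5, sg=0,
                negative=5, min_count=1, epochs=50)

# Skip-gram: predict the context words from the centre word (sg=1).
skipgram = Word2Vec(sentences, vector_size=300, window=5, sg=1,
                    negative=5, min_count=1, epochs=50)

# Each word is mapped to a 300-dimensional vector (Equation (1): E = Embedding(X)).
print(cbow.wv["essay"].shape)                 # (300,)
print(skipgram.wv.most_similar("essay", topn=3))
```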
Figure 1: Framework diagram of the continuous bag-of-words model and the skip-gram model. (a) Continuous bag-of-words model; (b) Skip-gram model.

As shown in Figure 1, the training process of both the continuous bag-of-words model and the skip-gram model goes through the input layer and the mapping layer, and finally outputs the results from the output layer. However, the continuous bag-of-words model aggregates and maps multiple features and then outputs the result, while the skip-gram model maps the features and performs classification output. The study combines the continuous bag-of-words model and the skip-gram model to train and detect the sequence features and semantic graph features of English compositions, and then scores the English compositions based on the detection results. In the process of extracting sequence features from English compositions, in order to break through the sequence limitations of DL models, a self-attention mechanism is introduced, and its function expression is shown in Equation (2).

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (2)

In Equation (2), Q, K, and V represent the query matrix, key matrix, and value matrix, respectively, d_k is the dimension of the key or query vector, and softmax() represents the normalization function. To enhance the model's ability to express complex sequence patterns, a multi-head attention mechanism is introduced, and its calculation formula is shown in Equation (3).

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(XW_i^{Q}, XW_i^{K}, XW_i^{V}) \quad (3)

In Equation (3), h represents the number of attention heads, head_i is the i-th attention head, XW_i^Q, XW_i^K, and XW_i^V represent the projections of the i-th head's query, key, and value vectors, W^O represents the output fusion matrix, and Concat() represents the concatenation operation. The number of attention heads is 8, which was determined by a GPU video-memory optimization test. In the process of extracting semantic graph features from English compositions, in order to dynamically capture the relationships within sentences, a semantic graph adjacency matrix is constructed, and its construction formula is shown in Equation (4).

A = \mathrm{softmax}\left(\frac{EE^{T}}{\sqrt{D}}\right) \quad (4)

In Equation (4), A represents the adjacency matrix, E^T represents the transpose of the word embedding matrix E, and D is the embedding dimension. Continuing with the study of iteratively updating node features to capture higher-order relationships in semantic graphs, the graph convolution feature propagation formula is shown in Equation (5).

H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} \Theta^{(l)}\right) \quad (5)

In Equation (5), H^{(l)} represents the node feature matrix of the l-th layer, D^{-1/2} is used for normalization, Θ^{(l)} represents the learnable weight matrix, and σ is the activation function, which introduces nonlinearity to enhance the model's expressive power. Finally, the study integrates all node features and aggregates them into a graph-level feature vector to represent the global semantics of the entire English composition. The graph-level feature aggregation formula is shown in Equation (6).

z = \sum_{i=1}^{N} \alpha_i h_i^{(L)}, \quad \alpha_i = \frac{\exp\left(w \cdot h_i^{(L)}\right)}{\sum_{j}\exp\left(w \cdot h_j^{(L)}\right)} \quad (6)

In Equation (6), h_i^{(L)} is the feature vector of the i-th node, L is the total number of layers, z is the graph-level feature vector representing the semantic summary of the entire text, α_i is the attention weight representing the importance of node i to the global features, w is the learnable weight vector used to calculate attention scores, and N is the number of nodes or words. In summary, the detection model structure that integrates the sequence features of English compositions with the semantic graph features of English compositions is shown in Figure 2.
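The following PyTorch fragment is a minimal, self-contained sketch (not the published implementation) of the operations defined in Equations (2), (4), (5) and (6): scaled dot-product self-attention over the word-embedding matrix, construction of the softmax-normalized adjacency matrix, one step of graph-convolution propagation, and attention pooling into a graph-level vector. Tensor shapes and the hidden size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, H = 12, 300, 128          # words per essay, embedding dim, hidden dim (illustrative)
E = torch.randn(N, D)           # word embedding matrix E from Equation (1)

# Equation (2): self-attention with Q = K = V = E (single head for brevity).
W_q, W_k, W_v = (torch.randn(D, D) for _ in range(3))
Q, K, V = E @ W_q, E @ W_k, E @ W_v
attn = F.softmax(Q @ K.T / K.shape[-1] ** 0.5, dim=-1)
h_seq_tokens = attn @ V                      # sequence features per token

# Equation (4): semantic-graph adjacency matrix from embedding similarities.
A = F.softmax(E @ E.T / D ** 0.5, dim=-1)

# Equation (5): one layer of graph-convolution propagation with degree-based normalisation
# (the softmax rows of A already sum to one, so the normalisation is trivial here).
deg = A.sum(dim=-1)
D_inv_sqrt = torch.diag(deg.pow(-0.5))
Theta = torch.randn(D, H)
H1 = torch.relu(D_inv_sqrt @ A @ D_inv_sqrt @ E @ Theta)   # node features of the next layer

# Equation (6): attention pooling of node features into a graph-level vector z.
w = torch.randn(H)
alpha = F.softmax(H1 @ w, dim=0)
z = (alpha.unsqueeze(-1) * H1).sum(dim=0)
print(h_seq_tokens.shape, H1.shape, z.shape)
```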
Figure 2: Detection model integrating sequence and graph features of English compositions.
Figure 3: Intelligent scoring model for English compositions based on deep semantic text features.
As shown in Figure 2, the detection model that integrates English composition sequence features and semantic graph features receives two types of input data, and the semantic graph captures the semantic relationships between phrases and concepts. The semantic graph and the English composition are then processed by Word2Vec, which converts discrete words into dense, low-dimensional, real-valued vectors. Next, semantic graph features are extracted through graph convolutional networks, while sequence features are extracted by introducing attention mechanisms. Subsequently, feature fusion is performed, and the sequence feature vectors and graph feature vectors extracted from the two parallel paths are concatenated to form deep semantic features. Finally, the model outputs the rating result. The fusion detection model overcomes the limited representation ability of a single feature by fusing two complementary feature representations. The deep semantic text feature expression of the fusion model is shown in Equation (7).

h_{deep} = h_{seq} \,\|\, h_{graph} \quad (7)

In Equation (7), h_deep represents the deep semantic features of the English composition, ‖ is the vector concatenation symbol, and h_seq and h_graph respectively represent the sequence features and graph features of the English composition. In summary, the intelligent scoring model for English compositions based on deep semantic text features is shown in Figure 3.

As shown in Figure 3, the English composition scoring model based on deep semantic text features achieves accurate evaluation by integrating multi-level semantic information. The model first performs structured parsing on the input English essay document, breaks down the title sequence to highlight the article structure, and preserves contextual information through node feature integration. In the feature extraction stage, multimodal technology is used to deeply fuse semantic information. On the one hand, the title sequence is embedded with Word2Vec and local sequence features are extracted through the self-attention mechanism. On the other hand, semantic graph nodes model global semantic relationships through graph convolutional networks. The two types of features are further combined with image features to form a unified deep semantic feature vector. Finally, the rater performs regression analysis based on the deep semantic features and outputs objective scoring results.

In summary, the implementation details of the entire research framework are as follows: (1) Word2Vec is used to convert English essay texts into dense word vector matrices. The continuous bag-of-words model predicts core words through contextual word prediction. The input layer aggregates multiple contextual word vectors, while the mapping layer summarizes them to output core word probabilities. The skip-gram model predicts contextual words based on core words. Both models undergo
negative sampling optimization, with 300-dimensional embeddings and a contextual window size of 5. (2) During attention mechanism feature extraction, the input word vector matrix is linearly transformed to generate query matrices, key matrices, and value matrices, each with 64 dimensions. The multi-head architecture employs 8 heads, where each head independently computes attention, the outputs are concatenated and linearly fused, and the final sequence features are generated. (3) In graph convolutional network semantic feature extraction, the adjacency matrix embedding dimension is 300. The feature propagation and aggregation process learns a 128-dimensional weight matrix across 2 layers.

3.2 Intelligent English composition scoring model combined with artificial rules

Although the English composition grading model based on deep semantic text features can effectively grade English compositions, it generally relies on manually defined grading templates, and candidates can avoid deduction types through simple writing techniques, so the model lacks interpretability [19]. In the field of composition checking, artificial rules are usually expressed in formal language and automatically detected through natural language processing tools. An English scoring model combined with artificial rules can effectively address the lack of interpretability in DL models, so further research is needed to introduce artificial rules [20]. Based on manual rules that quantify the basic language quality of sentences, the basic formula for scoring errors in English compositions is shown in Equation (8).

E_s = \frac{F}{1 + \lambda\left(C_{spell} + C_{gram}\right)} \quad (8)

In Equation (8), E_s is the error score of the sentence, with a maximum score of F, C_spell is the number of spelling errors, C_gram is the number of grammar errors, and λ is the error penalty coefficient; its value is set to 0.1, at which the error rate is lowest, as verified through grid search. Continuing with the study of balancing the importance of each dimension through artificial rules, the formula for weighting the multidimensional excellence of sentences is shown in Equation (9).

Q_s = \omega_1 V_s + \omega_2 G_s + \omega_3 T_s + \omega_4 P_s \quad (9)

In Equation (9), Q_s represents the overall excellence score of the sentence, V_s represents the vocabulary score, G_s represents the syntactic complexity score, T_s represents the part-of-speech diversity score, P_s represents the rhetoric score, and ω_i represents the artificial rule weight. Then, to evaluate the logical rigor of English essay paragraphs, a scoring formula for paragraph cohesion strength is introduced, and its specific expression is shown in Equation (10).

C_p = \frac{\sum_{k=1}^{n} \omega_k I(\mathrm{conn}_k)}{N} \cdot \frac{R_{cohere}}{1 + \log(L)} \quad (10)

In Equation (10), C_p represents the coherence score of the paragraph, I(conn_k) represents the validity indicator function of the k-th connector, ω_k represents the weight of the connector, R_cohere represents the semantic coherence ratio, N represents the number of sentences, and L represents the length of the paragraph. Finally, the study aims to achieve dynamic comprehensive scoring of the entire English composition through multi-dimensional manual rule evaluation. The scoring formula is shown in Equation (11).

\mathrm{Score} = \alpha \frac{\sum_{j=1}^{m} Q_{s_j}}{m} + \beta \frac{\sum_{p=1}^{l} C_p}{l} + \gamma\left(\mathrm{Sim}_{content} + \mathrm{Sim}_{str}\right) \quad (11)

In Equation (11), Score represents the final score of the composition, Q_{s_j} represents the excellence score of the j-th sentence, C_p represents the coherence score of the p-th paragraph, Sim_content and Sim_str represent the similarity of content and structure, and α, β, γ satisfy the requirement α + β + γ = 1. The artificial rules are constructed based on expert knowledge, employing a method that quantifies sentence-level errors and sentence excellence through predefined weights to achieve digital transformation. The primary linguistic features targeted include surface errors, sentence-level errors, and paragraph-level errors. By integrating deep semantic features through a Wide&Deep architecture, the rules enhance interpretability while capturing subtle errors and reducing subjective variations. Experimental validation demonstrates their effectiveness in lowering bias values and misjudgment rates, as well as improving scoring stability. In summary, the feature extraction framework for the manual scoring rules of English compositions is shown in Figure 4.
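As a concrete illustration of how the rule formulas above could be evaluated, the short Python sketch below implements Equation (8) for sentence error scoring and the weighted combinations of Equations (9) and (11). All weights, sub-scores, and error counts are illustrative placeholders rather than the calibrated values used in the study; only the penalty coefficient 0.1 is taken from the text.

```python
import math

def error_score(c_spell: int, c_gram: int, f_max: float = 5.0, lam: float = 0.1) -> float:
    """Equation (8): sentence error score, penalised by spelling and grammar error counts."""
    return f_max / (1.0 + lam * (c_spell + c_gram))

def sentence_excellence(v, g, t, p, weights=(0.3, 0.3, 0.2, 0.2)) -> float:
    """Equation (9): weighted multi-dimensional excellence of a sentence (weights illustrative)."""
    w1, w2, w3, w4 = weights
    return w1 * v + w2 * g + w3 * t + w4 * p

def final_score(sentence_scores, paragraph_scores, sim_content, sim_str,
                alpha=0.5, beta=0.3, gamma=0.2) -> float:
    """Equation (11): dynamic comprehensive score with alpha + beta + gamma = 1."""
    assert math.isclose(alpha + beta + gamma, 1.0)
    q_avg = sum(sentence_scores) / len(sentence_scores)
    c_avg = sum(paragraph_scores) / len(paragraph_scores)
    return alpha * q_avg + beta * c_avg + gamma * (sim_content + sim_str)

# Example: two sentences, one paragraph, illustrative sub-scores in [0, 1].
q = [sentence_excellence(0.8, 0.7, 0.6, 0.5), sentence_excellence(0.6, 0.5, 0.7, 0.4)]
print(error_score(c_spell=1, c_gram=2), final_score(q, [0.75], 0.8, 0.7))
```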
Figure 4: Feature extraction of artificial rules for English compositions.
Figure 5: Network structure of English composition error detection combined with artificial rules.
As shown in Figure 4, in the feature extraction framework of manual scoring rules for English compositions, structured manual scoring rules are input together with the original English composition text as initial data. The manual rules are then decomposed into different types of errors, and each type of rule is quantified as a numerical vector to achieve the digital transformation of expert knowledge. Next, the artificial rule vector is concatenated with the semantic vector of the composition text to form a mixed feature that combines both artificial rules and text semantics. Finally, after processing, the features of the manual scoring rules for English compositions are output. The study aims to achieve the organic integration of artificial rules and DL models by converting discrete artificial rules into continuous features. The specific expression is shown in Equation (12).

h_{expert} = \sigma\left(W \cdot \big\|_{i \in v_{rule}}\left(x_i\right) + b\right) \quad (12)

In Equation (12), h_expert represents the artificial rule feature, v_rule represents the set of error types, x_i represents the artificial rule vector for the i-th error type, and b represents the bias term. Next, the study uses the Wide&Deep structure to fuse the shallow features of the artificial rules with the deep semantic text features, achieving the final error classification prediction. The fusion formula is shown in Equation (13).

y = \mathrm{Softmax}\left(W_{wide} h_{expert} + W_{deep} h_{deep} + b\right) \quad (13)

In Equation (13), y represents the rating result, and W_wide and W_deep represent the weight matrices of the wide and deep parts, respectively.
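To illustrate the fusion step in Equations (12) and (13), the following PyTorch sketch (a toy under assumed dimensions, not the authors' network) maps an artificial-rule vector to the rule feature h_expert and combines it with the deep semantic feature h_deep through wide and deep linear heads followed by a softmax over the rating classes. The feature sizes and number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WideDeepScorer(nn.Module):
    def __init__(self, rule_dim=20, deep_dim=256, expert_dim=32, num_classes=5):
        super().__init__()
        self.rule_proj = nn.Linear(rule_dim, expert_dim)      # Equation (12): h_expert = sigma(W x + b)
        self.wide_head = nn.Linear(expert_dim, num_classes)   # W_wide in Equation (13)
        self.deep_head = nn.Linear(deep_dim, num_classes)     # W_deep in Equation (13)

    def forward(self, rule_vec, h_deep):
        h_expert = torch.sigmoid(self.rule_proj(rule_vec))
        logits = self.wide_head(h_expert) + self.deep_head(h_deep)
        return torch.softmax(logits, dim=-1)                  # Equation (13): rating distribution y

# Illustrative batch: 4 essays, 20 rule dimensions, 256-dimensional deep semantic features.
model = WideDeepScorer()
rule_vec = torch.rand(4, 20)
h_deep = torch.randn(4, 256)
print(model(rule_vec, h_deep).shape)   # torch.Size([4, 5])
```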
In summary, the network structure of English composition error detection combined with manual rules is shown in Figure 5.

As shown in Figure 5, the English composition error detection network combined with manual rules improves detection accuracy and interpretability through dual-channel feature fusion. The model receives dual-source inputs: the manual scoring rules are decomposed and vectorized into quantifiable rule vectors, covering error types such as grammar, logic, and rhetoric. The model synchronously constructs semantic maps for the original English compositions, extracts logical relationships between sentences, and performs sequence analysis to capture word-order features. Then, the deep semantic features obtained from the dual-source input are evaluated together with the artificial rule features, and the result is judged. The study introduces the binary cross-entropy loss to measure the difference between misclassified predictions and true labels. The specific expression of the loss function is shown in Equation (14).

\mathrm{Loss} = -\hat{y}\log(y) - (1-\hat{y})\log(1-y) \quad (14)

In Equation (14), Loss represents the loss function value, ŷ represents the true label of the sample, and y represents the predicted probability output of the model. Finally, the study evaluates the performance of the model by calculating its accuracy, as shown in Equation (15).

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (15)

In Equation (15), Accuracy represents the accuracy of the model, TP and TN are the numbers of essays correctly rated as low or high by the model, and FP and FN are the numbers of essays incorrectly rated as low or high by the model. In summary, the scoring process of the English composition scoring model based on DL and artificial rules is shown in Figure 6.

As shown in Figure 6, the English essay scoring model based on DL and artificial rules improves scoring accuracy and interpretability through dual-channel feature collaboration. The model takes the English composition text and the manual scoring rules as dual-source inputs: on the one hand, it generates a semantic map through multi-level parsing of the original text; on the other hand, it breaks down the document into sequential features according to its structure, preserving the framework information of the article. Next, in the feature extraction stage, a bimodal DL architecture is adopted. After Word2Vec vectorization of the semantic graph nodes, a graph convolutional network models global semantic relationships and outputs deep features. Sequence nodes extract local language patterns through self-attention mechanisms to generate sequence features, and the two are concatenated to form deep semantic features. Meanwhile, breaking down the manual rules in textual form into quantifiable dimensional vectors enables the digital transformation of expert knowledge. Finally, the artificial rule features are optimized using binary cross-entropy and combined with the deep semantic features to generate rule-enhanced deep features. The rater then performs regression analysis based on the rule-enhanced deep features to output the final English essay grading results.
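The binary cross-entropy objective in Equation (14) and the accuracy metric in Equation (15) can be computed as in the short sketch below; the predicted probabilities and labels are random placeholders used only to show the calculation.

```python
import torch

torch.manual_seed(0)
y_pred = torch.rand(8).clamp(1e-6, 1 - 1e-6)       # model's predicted probabilities
y_true = torch.randint(0, 2, (8,)).float()          # true labels (1 = high-scoring, 0 = low-scoring)

# Equation (14): binary cross-entropy loss.
loss = -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()

# Equation (15): accuracy from the confusion-matrix counts.
pred_label = (y_pred >= 0.5).float()
tp = ((pred_label == 1) & (y_true == 1)).sum()
tn = ((pred_label == 0) & (y_true == 0)).sum()
fp = ((pred_label == 1) & (y_true == 0)).sum()
fn = ((pred_label == 0) & (y_true == 1)).sum()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(loss.item(), accuracy.item())
```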
Figure 6: Scoring process of the English composition scoring model based on DL and artificial rules.
4 Validation of English composition grading model based on DL and artificial rules

4.1 Performance testing of English composition scoring model based on DL and artificial rules

To confirm the capability of the English essay grading model based on DL and artificial rules, a simulation model was constructed for testing. The testing environment and specific configuration are presented in Table 1.

Table 1: Test environment and specific configuration
Testing environment: Specific configuration
GPU: NVIDIA Tesla V100/A100
CPU: Intel Xeon Gold 6248R
Memory: 256GB DDR4
Storage: 2TB NVMe SSD + 10TB HDD
DL framework: PyTorch 1.12 / TensorFlow 2.10
Feature engineering tools: Scikit-learn 1.2 + Gensim 4.3
Support for large models: Transformers 4.28

As shown in Table 1, the specific configurations in the table were used for performance testing, using the Kaggle ASAP dataset. The research method was compared with the Integrated Classification Scoring Algorithm (ICSA), Linear Regression Model (LRM), and Hierarchical Attention Model (HAM). The accuracy-recall curves and curve areas of the four methods were compared, and the results are presented in Figure 7.

Figure 7: Accuracy-recall curves of different methods. (a) Research method; (b) ICSA; (c) LRM; (d) HAM.

As shown in Figure 7, the shape and area of the accuracy-recall curves of the different methods differ. In Figure 7 (a), the accuracy-recall curve of the research method was close to a rectangle, with a curve area of 92.3%. In Figure 7 (b), the curve area of the ICSA algorithm was 71.6%. In Figure 7 (c), the curve of the LRM model belonged to low accuracy and high recall, which was prone to false positives. As shown in Figure 7 (d), the curve of the HAM model belonged to high accuracy and low recall, which was prone to missed detections. Overall, compared to the comparative methods, the research method had higher accuracy and inspection coverage. The mean absolute error (MAE) of the scoring results of the four methods under different numbers of written words, as well as the scoring time under different numbers of written paragraphs, were compared, and the outcomes are presented in Figure 8.

In Figure 8 (a), the MAEs of the scoring results of the four methods all increased with the number of English composition words, and the MAE of the research method's scoring results had the smallest increase. When the word count of the composition was 100, the MAE of the research method's scoring results was 0.25; when the word count was 350, it was 0.52, an increase of only 0.27 between the two word counts. The MAE of the scoring results for the other three methods was significantly greater than that of the research method at different composition word counts. In Figure 8 (b), the scoring time of all four methods increased with the number of paragraphs in the essay. When the English essay had only one paragraph, the scoring time of the research method was 32 ms, and when the essay had five paragraphs, the scoring time was 42 ms. However, the scoring time of the other three methods at different paragraph counts was significantly greater than that of the research method. Overall, compared to the comparative methods, the research method had better robustness. In conclusion, the English essay grading model proposed by the research based on DL and artificial rules had high reliability, accuracy, and good robustness. After validating the performance of the research methodology, the study further investigated the synergistic effects of the fusion architecture through ablation experiments. First, independent testing of the deep model revealed that removing the manual rules reduced grammatical error detection accuracy, demonstrating their constraint effect on surface errors. Next, independent testing of the rule model showed increased semantic coherence score deviations in long texts when the graph convolutional networks were removed, proving the deep model's capability to capture higher-order semantics. Finally, dual-stream feature contribution analysis using SHAP values demonstrated that the manual rule features contributed mainly to grammatical/spelling error detection, while
deep semantic features played a significant role in content logic scoring. These ablation results confirmed the complementary innovation of the "feature perception-rule constraint" architecture in the research methodology.

4.2 Practical application effect of the English composition scoring model based on DL and artificial rules

On the basis of verifying the performance of the English essay grading model based on DL and artificial rules, further research is conducted to ascertain the efficacy of the practical application of the research method. The study used the IELTS Writing Task 2 dataset to build a modular hierarchical architecture experimental platform. The research method was compared with ICSA, LRM and HAM, and a semantic depth index was supplemented: the content dimension deviation of BERT on the IELTS dataset could be reduced to 0.42, better than the 0.67 of the research model, while the research model retained the advantage of being lightweight. The study further compared Transformer-based pre-trained models. In the IELTS content dimension scoring, BERT exhibited lower deviation values than the research methodology. However, its reliance on billions of parameters resulted in significantly longer response times. Notably, the research methodology demonstrated markedly higher accuracy in detecting grammatical errors when incorporating rules, surpassing BERT. These findings indicated that, compared to cutting-edge technologies, the research methodology demonstrated superior semantic understanding depth and error detection specificity. Then, the group stability index of the four methods for the English composition data of the first six months was scored, and the deviation values under different scoring dimensions were compared. The results are shown in Figure 9.
Figure 8: MAE and rating efficiency. (a) The average absolute error of compositions of different lengths; (b) Rating efficiency.
Figure 9: Model generalization ability and sub-item scoring deviation. (a) Model generalization ability; (b) Deviation in sub-item scoring.
In Figure 9 (a), the critical value of the group stability index for English composition scoring was 0.17. The overall group stability index of the research method for scoring monthly composition data remained below
the critical value, with its highest group stability index being 0.07 in June and the lowest 0.02 in January. The stability indices of the other three methods for scoring monthly essay data were significantly higher than those of the research method. In Figure 9 (b), the deviation value of the research method in the composition content dimension was 0.67, the deviation value in the composition language dimension was 0.99, the deviation value in the composition structure dimension was 0.33, and the deviation value in the composition coherence dimension was 0.82. The deviation values of the other three methods under the different scoring dimensions were significantly greater than those of the research method. Overall, compared to the comparative methods, the research method had better generalization ability and higher accuracy. Comparing the sensitivity of the four methods in identifying excellent compositions and their ability to capture advanced vocabulary in compositions, the results are presented in Figure 10.

As shown in Figure 10 (a), the misjudgment rates of the four methods for high-scoring essays differed across score thresholds. The overall misjudgment rate of the research method for high-scoring essays was less than 20%. Its highest misjudgment rate was 17.8% when the score threshold was 21 points, and its minimum misjudgment rate was 3.2% when the score threshold was 24-25 points. The misjudgment rates of the other three methods were significantly higher than that of the research method at different score thresholds. As shown in Figure 10 (b), the consistency between the vocabulary scoring results of the four methods and the manual vocabulary scoring differed. The distribution of the vocabulary scoring results of the research method was closely aligned with the diagonal, indicating that it was highly consistent with the manual vocabulary scoring. However, the distributions of the vocabulary scoring results for the other three methods differed significantly from the manual vocabulary scoring, resulting in lower accuracy of their scoring results. Overall, compared to the comparative methods, the research method had a lower false positive rate and better scoring performance. The four methods were also compared in terms of the scoring accuracy rate under different error types and the average response time under different numbers of concurrent requests, as shown in Figure 11.
Figure 10: High-score essay misjudgment rate and vocabulary richness recognition ability. (a) High-score essay misjudgment rate; (b) Vocabulary richness recognition ability.
Figure 11: Error type processing time and concurrent processing capability. (a) Error type processing time; (b) Concurrent processing capability.
As shown in Figure 11 (a), the research method demonstrated 99.2% accuracy in scoring grammatical errors, 98.7% in spelling errors, 99.0% in logical errors, and 97.9% in collocation errors. In contrast, the other three methods showed significantly lower accuracy rates for these error types compared to the research methodology. In Figure 11 (b), the average response time of the four methods gradually increased with the number of concurrent requests, with the research method showing the smoothest increase. When faced with 600 concurrent requests, the average response time of the research method reached a stable value of 3.4 seconds. However, the average response times of the other three methods showed a significantly greater increase than that of the research method. Overall, compared to the comparative methods, the research method had better resource allocation capabilities and scoring performance. In summary, the English essay grading model proposed by the research based on DL and artificial rules had good generalization ability, accuracy, and performance.

5 Conclusion

To address the issues of high misjudgment rates and instability in existing English essay automatic scoring systems, this study innovatively proposes an English essay scoring model combining DL with manual rules. The research methodology extracts sequence features and semantic graph features from English essays, integrating them with manual rule features to construct a "feature perception-rule constraint-joint decision" fusion architecture for stable and accurate scoring. Experimental results show that when the essay contains 100 words, the average absolute error of the scoring method is 0.25; when the essay contains 350 words, the average absolute error increases to 0.52; and when the essay consists of 5 paragraphs, the scoring time reaches 42 ms. In practical application tests, the method shows a deviation of 0.67 in content dimension scoring, 0.99 in language dimension scoring, 0.33 in structure dimension scoring, and 0.82 in coherence dimension scoring. The method achieved a 99.2% accuracy rate for grammatical errors, 98.7% for spelling errors, 99.0% for logical errors, and 97.9% for collocation errors. Overall, the proposed method demonstrated excellent scoring accuracy, robustness, and stability. The research did not quantify the contribution ratios of the DL and rule-based components to explainability. The test datasets were limited to IELTS/Kaggle materials, which did not validate generalization to open-domain essays and consequently limits practical applicability. Moreover, the methodology primarily relied on Word2Vec and traditional attention mechanisms for feature extraction. While effective in English essay scoring, the static embedding model of Word2Vec lacked contextual sensitivity, potentially limiting semantic depth comprehension and cross-linguistic transfer capabilities. Modern Transformer models, however, provide superior contextual representation and enhanced cross-linguistic application potential. Future studies could integrate Transformer pre-trained models to verify model stability and deviations across multilingual essay datasets (e.g., French, Chinese), evaluate cross-linguistic rule adaptability, and improve cross-linguistic performance and transferability. Additionally, the research could incorporate eye-tracking technology into multi-modal deep understanding frameworks. By recording eye movements during the writing process, it could analyze authors' attention allocation patterns. Combined with keystroke logs, this approach could quantify writing fluency and cognitive load, supplementing process dynamics that textual features cannot capture. Nevertheless, this study is the first to migrate the Wide&Deep architecture from the recommendation system field to the essay scoring field. By constraining the semantic drift of DL with rule features, it provides a new idea for the interpretability of AI education products.

References
[1] Del Gobbo E, Guarino A, Cafarelli B, Grilli L. GradeAid: A framework for automatic short answers grading in educational contexts—design, implementation and evaluation. Knowledge and Information Systems, 2023, 65(10): 4295-4334. DOI: 10.1007/s10115-023-01892-9.
[2] Wang Q. The use of semantic similarity tools in automated content scoring of fact-based essays written by EFL learners. Education and Information Technologies, 2022, 27(9): 13021-13049. DOI: 10.1007/s10639-022-11179-1.
[3] Geçkin V, Kızıltaş E, Çınar Ç. Assessing second-language academic writing: AI vs. human raters. Journal of Educational Technology and Online Learning, 2023, 6(4): 1096-1108. DOI: 10.31681/jetol.1336599.
[4] Theodosiou A A, Read R C. Artificial intelligence, machine learning and deep learning: Potential resources for the infection clinician. Journal of Infection, 2023, 87(4): 287-294. DOI: 10.1016/j.jinf.2023.07.006.
[5] Wang J, Wang S, Zhang Y. Deep learning on medical image analysis. CAAI Transactions on Intelligence Technology, 2025, 10(1): 1-35. DOI: 10.1049/cit2.12356.
[6] Ramesh D, Sanampudi S K. An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review, 2022, 55(3): 2495-2527. DOI: 10.1007/s10462-021-10068-2.
[7] Fokides E, Peristeraki E. Comparing ChatGPTs correction and feedback comments with that of educators in the context of primary students short essays written in English and Greek. Education and Information Technologies, 2025, 30(2): 2577-2621. DOI: 10.1007/s10639-024-12912-8.
[8] Shahzad A, Wali A. Computerization of off-topic essay detection: a possibility? Education and Information Technologies, 2022, 27(4): 5737-5747. DOI: 10.1007/s10639-021-10863-y.
[9] Erturk S, van Tilburg W A P, Igou E R. Off the mark: Repetitive marking undermines essay evaluations due to boredom. Motivation and Emotion, 2022, 46(2): 264-275. DOI: 10.1007/s11031-022-09929-2.
[10] Sharma A, Katlaa R, Kaur G, Jayagopi D B. Full-page handwriting recognition and automated essay scoring for in-the-wild essays. Multimedia Tools and Applications, 2023, 82(23): 35253-35276. DOI: 10.1007/s11042-023-14558-z.
[11] Mohammed A, Kora R. A comprehensive review on ensemble deep learning: Opportunities and challenges. Journal of King Saud University - Computer and Information Sciences, 2023, 35(2): 757-774. DOI: 10.1016/j.jksuci.2023.01.014.
[12] Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: The emergence of deep QSAR. Nature Reviews Drug Discovery, 2024, 23(2): 141-155. DOI: 10.1038/s41573-023-00832-0.
[13] Whang S E, Roh Y, Song H, Lee J G. Data collection and quality challenges in deep learning: A data-centric AI perspective. The VLDB Journal, 2023, 32(4): 791-813. DOI: 10.1007/s00778-022-00775-9.
[14] Pereira T D, Tabris N, Matsliah A, Turner D M, Li J, Ravindranath S, et al. SLEAP: A deep learning system for multi-animal pose tracking. Nature Methods, 2022, 19(4): 486-495. DOI: 10.1038/s41592-022-01426-1.
[15] Olan F, Spanaki K, Ahmed W, Zhao G. Enabling explainable artificial intelligence capabilities in supply chain decision support making. Production Planning & Control, 2025, 36(6): 808-819. DOI: 10.1080/09537287.2024.2313514.
[16] Bhat M, Rabindranath M, Chara B S, Simonetto D A. Artificial intelligence, machine learning, and deep learning in liver transplantation. Journal of Hepatology, 2023, 78(6): 1216-1233. DOI: 10.1016/j.jhep.2023.01.006.
[17] Simon K, Vicent M, Addah K, Bamutura D, Atwiine B, Nanjebe D, Mukama A O. Comparison of deep learning techniques in detection of sickle cell disease. AIA, 2023, 1(4): 252-259. DOI: 10.47852/bonviewAIA3202853.
[18] Bhosle K, Musande V. Evaluation of deep learning CNN model for recognition of Devanagari digit. Applied Artificial Intelligence, 2023, 1(2): 114-118. DOI: 10.47852/bonviewAIA3202441.
[19] Zamfiroiu A, Vasile D, Savu D. ChatGPT - a systematic review of published research papers. Informatica Economica, 2023, 27(1): 5-16. DOI: 10.24818/issn14531305/27.1.2023.01.
[20] Didimo W, Grilli L, Liotta G, Montecchiani F. Efficient and trustworthy decision making through human-in-the-loop visual analytics: A case study on tax risk assessment. Rivista italiana di informatica e diritto, 2022, 4(2): 15-21. DOI: 10.32091/RIID0092.
https://doi.org/10.31449/inf.v46i16.9736 Informatica 49 (2025) 429–440 429
Integrating DDPG and QPSO for Multi-Objective Optimization in
High Proportion Renewable Energy Power Dispatch Systems
Xu’an Qiao1*, Chaofan Liu2
1School of Aeronautics, Chongqing City Vocational College, Chongqing 402160, China
2School of Physics Sciences, University of Science and Technology of China, Hefei 230026, China
E-mail: qiaoxuan109_2023@126.com, Liuchaofan1980_622@126.com
*Corresponding author
Keywords: Power system, dispatch optimization, renewable energy, DDPG, heuristic algorithm
Received: June 16, 2025
This study proposes a novel dispatch optimization model that integrates deep deterministic policy gradient
(DDPG) and quantum particle swarm optimization (QPSO) to address the challenges posed by high
proportions of renewable energy in power systems. The proposed multi-objective optimization framework
considers system cost reduction, supply-demand balance, and dynamic adaptability to renewable energy
fluctuations. The experimental results on the IEEE 30-bus and 118-bus systems demonstrated significant
improvements. This method reduced total system costs by 13.6% and 11.4%, respectively. It also increased
supply reliability to 97.1% and achieved an energy utilization rate of 94.85%. Additionally, it minimized
frequency deviation to 1.25 Hz. The optimization time was also improved, with a reduction of 58.3 seconds. The research results have important practical application value in improving power
system economy, enhancing system reliability, and dynamic adaptability. It can provide efficient and
reliable technical support for power dispatch planning, load management, and real-time control under
high percentage renewable energy scenarios.
Povzetek: Študija predlaga hibridni model za optimizacijo razporejanja (dispečanja) v omrežjih z visokim
deležem obnovljivih virov, ki združuje izboljšani DDPG in QPSO. Model zmanjša stroške, izboljša
zanesljivost in stabilnost ter poveča izrabo energije. Preizkusi na IEEE sistemih potrjujejo visoko
učinkovitost.
1 Introduction

With the continuous adjustment and optimization of the global energy structure, carbon peaking and carbon neutrality targets have become important strategies for countries around the world to cope with climate change and achieve sustainable development. Because of their clean and low-carbon benefits, renewable energy (RE) sources like solar and wind have been frequently used in this setting [1]. However, the operation and scheduling of the conventional power system (PS) have been severely hampered by the widespread use of wind, solar, and other high percentage (HP) RE sources. To begin with, RE is highly volatile and erratic. Their output is affected by natural conditions, including wind speed, light, etc., and there is a large uncertainty [2]. This uncertainty makes the load balance and stability of the system subject to shocks, and is prone to supply-demand imbalance in the PS in the case of peak power demand or insufficient wind and solar resources [3]. Second, the traditional PS, which relies on precise forecasts of load and generation capacity from the scheduling model, becomes irrelevant when the proportion of RE in the PS increases. Due to the significant impact of unstable factors on RE generation capacity, there is a substantial discrepancy between actual and forecasted values. This discrepancy makes short-term PS scheduling more challenging and complex [4-5]. In addition, current PS scheduling relies mainly on a phased sequential scheduling approach, i.e., unit commitment (UC) is performed first to determine the start/stop status of the units. Then economic dispatch (ED) is performed to optimize the unit output. Finally, real-time regulation is performed through automatic generation control (AGC) [6]. Although this staged dispatch model is simple to operate, there are problems such as response delays between different dispatch modules. In view of this, the study proposes a multilevel cooperative dispatch model for HP of renewable energy power systems (REPS) and introduces heuristic algorithms to accelerate the solution of complex systems. The study aims to solve the limitations of the traditional sequential scheduling method and improve the economy, reliability and feasibility of system operation through refined modeling and cooperative optimization.

This study's novel contribution is its proposal of a joint heuristic algorithm that combines improved deep deterministic policy gradient (DDPG) and quantum particle swarm optimization (QPSO) algorithms to optimize scheduling in high-proportion REPS. This method improves the convergence speed and stability of complex, multi-constraint, multi-timescale problems by introducing dual experience pooling and time-decaying exploration strategies. The proposed mathematical model addresses the inefficiencies and local optima of traditional models. It provides a more efficient and reliable solution for PS scheduling optimization.
2 Related works

The key to ensuring the PS operates steadily is PS schedule optimization. The optimization method directly impacts the stability and adaptability of the PS to RE fluctuations. These are essential to the functioning and advancement of contemporary PS, as well as its economic efficiency. Therefore, many scholars have carried out various researches on PS scheduling optimization. For the optimal reactive power scheduling problem in PS, M. Abd-El Wahab et al. suggested a hybrid method called augmented Jaya and artificial ecosystem-based optimization, which improved system stability, economic viability, and overall efficiency [7]. A nonconvex mixed integer and quadratic restricted planning technique was presented by Cox J L et al. to solve the challenge of optimizing a centralized solar power plant's profitability under changing solar resources. The method improved the solvability of the problem through exact and approximation techniques, thus enabling operational scheduling optimization in real-time decision support [8]. To address the issue of system security and economic cost over time in microgrid scheduling, Zhang et al. suggested a multi-timescale scheduling model that incorporated load voltage and frequency dynamics. To minimize economic cost while maintaining voltage and frequency stability, the study converted it into a multi-objective optimization problem that took into account economic cost, voltage deviation, and frequency stability. This improved the microgrid dispatch's efficiency and dependability [9]. To address the global issues brought on by the recent explosive increase in the demand for electricity, Hou et al. developed an integrated day-ahead multi-objective microgrid optimization framework. The framework produced more affordable, dependable, and ecologically friendly power supply services by combining demand-side management, forecasting methods, and economic-environmental dispatch [10].

Large-scale access to the PS by RE sources affects the output characteristics of wind and photovoltaic energy sources. These sources exhibit strong intermittency, randomness, and volatility due to weather, climate, and other external natural factors. These challenges have led to the urgent need for innovation and optimization of existing dispatch methods to adapt to the new situation of HP of RE access. A mixed-integer linear programming method was put up by Shirzadi et al. to address the issue of enhancing the efficiency and dependability of RE systems. The study optimized the PS's daily operating expenses and system resilience by combining a unique hybrid model with deep learning and statistical modeling to forecast the load demand and wind power output (PO) for the ensuing three days [11]. Due to the impact of the volatility of wind and solar power generation on PS operation, Guo et al. proposed multi-stage optimization, online optimization, and multi-timescale optimization for RE integration. This study realized the strategic scheduling and control of energy storage units and improved the efficiency of RE integration in the power grid [12]. By proposing a new economic low-carbon clean PS dispatch model that incorporates power-to-gas technology, Cui et al. addressed the issue of increasing the grid's capacity to absorb wind power. This model integrated the effects of multiple price factors, resulting in low-carbon PS operation and cost optimization [13]. An enhanced jellyfish search optimization technique was presented by Gami et al. to solve the optimal reactive power dispatch problem in HP renewable PSs. By improving the algorithm's exploration and development stages, the study successfully optimized the PS's most secure and stable state [14]. This allowed the PS to operate under both deterministic and probabilistic load demands and RE resource states.

In summary, the existing research has made some significant progress in PS scheduling optimization. However, there are still deficiencies in the research for HP of RE access, such as insufficient consideration of the volatility and stochastic characteristics of RE, fewer studies on the collaborative scheduling of multi-timescale modules, and insufficiently perfect uncertainty handling methods. Therefore, the study proposes to construct a mathematical model of HP of REPS scheduling optimization and introduce a heuristic algorithm to solve it. The innovation of the study is to propose a multi-module cooperative optimization framework to accurately deal with uncertainty and extreme scenarios. Meanwhile, the optimization algorithm improves the solution efficiency and provides new ideas for complex PS scheduling.

3 Methods and materials

This section provides a detailed description of the PS scheduling optimization method proposed in the study. The method consists of a scheduling optimization mathematical model and a scheduling optimization heuristic algorithm. The combination of the two effectively improves the scheduling efficiency and stability of the PS with a HP of RE access.

3.1 Mathematical model construction for power system scheduling optimization

The PS suffers scheduling complexity issues brought on by volatility and uncertainty as a result of the extensive access to HP of RE sources. Moreover, the conventional phased scheduling approach finds it challenging to satisfy the needs of system stability and economy [15-16]. Therefore, the study proposes a PS scheduling optimization method for HP of RE access.

The two main cores of the method are a mathematical model that can achieve multi-module co-optimization by comprehensively considering system costs, constraints, and uncertainties, and a heuristic algorithm that can efficiently solve complex optimization problems, taking into account both global search and local fine optimization. Figure 1 depicts the method's general framework.
Figure 1: Overall framework of power system scheduling optimization method.
Figure 2: Closed-loop sequential optimization process among UC, ED, and AGC modules.
Four components make up the general architecture of the PS scheduling optimization approach suggested in the study, as shown in Figure 1: the input module, the output module, the heuristic algorithm design module, and the mathematical model construction module. The input module contains load demand forecasts, RE output scenarios, and various parameters as support. The mathematical model construction module is then responsible for constructing the PS scheduling optimization model based on three main foundations: objective function (OF), constraints, and uncertainty handling [17-18]. The scheduling optimization model based on these steps is effectively solved using the heuristic algorithm design module. Finally, the output module generates the optimized scheduling plan, including unit start/stop status, power allocation, and standby capacity configuration. It also evaluates the economy and stability of the scheduling scheme through performance indicators.

The mathematical model developed in this study differs from the traditional stage-wise sequential scheduling model in terms of the scheduling optimization approach. The proposed model forms a closed-loop sequential scheduling framework with dynamic feedback coupling among modules by integrating the UC for start-stop decisions, the ED for cost minimization, and the AGC for real-time supply-demand balancing. These modules operate across different time scales and interact through feedback mechanisms to realize coordinated optimization.
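A schematic Python sketch of the closed-loop cycle described above is given below. The module functions are placeholders that only illustrate the direction of the data flow (UC start/stop decisions to ED, ED base points to AGC, and AGC correction statistics fed back into the next UC cycle); the toy rules, numbers, and thresholds are assumptions for illustration, not the study's implementation.

```python
import random

def unit_commitment(load_forecast, reserve_margin):
    # Toy rule: commit one unit per 100 MW of forecast load plus the reserve margin.
    return max(1, round((load_forecast + reserve_margin) / 100))

def economic_dispatch(committed_units, load_forecast):
    # Toy rule: split the forecast load evenly across committed units (base points).
    return [load_forecast / committed_units] * committed_units

def agc_adjust(base_points, actual_load):
    # Real-time correction of the supply-demand gap, shared equally among units.
    gap = actual_load - sum(base_points)
    return [gap / len(base_points)] * len(base_points), abs(gap)

reserve_margin, forecast = 20.0, 500.0
for cycle in range(3):
    units = unit_commitment(forecast, reserve_margin)
    base = economic_dispatch(units, forecast)
    actual = forecast + random.uniform(-60, 60)       # renewable fluctuation (illustrative)
    _, correction = agc_adjust(base, actual)
    # Feedback: large AGC corrections raise the reserve setting and update the forecast.
    if correction > 40:
        reserve_margin *= 1.1
    forecast = 0.7 * forecast + 0.3 * actual
    print(f"cycle {cycle}: units={units}, correction={correction:.1f}, reserve={reserve_margin:.1f}")
```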
The principle of inter-module coordination is illustrated in Figure 2.

As shown in Figure 2, the scheduling optimization process proposed in this study adopts a closed-loop sequential optimization mechanism, consisting of three main modules: UC, ED, and AGC. These modules interact across multiple time scales through real-time feedback to achieve dynamic coordination. During each scheduling cycle, the UC module first optimizes the on-and-off status of units based on current load forecasts, reserve requirements, and other system parameters, and then passes the results to the ED module. The ED module then performs power allocation and generates a base load profile for the AGC module, which makes real-time, short-term power adjustments. Unlike traditional dispatch models, which operate in isolated stages, the AGC module in this framework continuously generates feedback information, such as load correction values and reserve margin stress levels, during its adjustment process. Rather than being discarded, this data is fed back as correction inputs into the next UC scheduling cycle. Specifically, the system monitors the magnitude and frequency of AGC adjustments in the previous cycle. If frequent or significant real-time corrections are observed, this indicates potential deficiencies in load forecasting or reserve planning. In response, the system increases the reserve capacity settings for the next cycle to improve operational redundancy. At the same time, the load forecast is corrected by incorporating the observed deviations into the predicted curve. This enables the UC module to make more accurate start and stop decisions that reflect actual system demand. This adaptive feedback mechanism is repeated in every cycle, progressively refining UC decisions to better match real-world operating conditions and improve overall dispatch responsiveness.

In the mathematical model, schedule optimization aims to reduce the system's overall running costs. The OF is set as shown in Equation (1).

\min Z = \sum_{t=1}^{T}\left[\sum_{i=1}^{N}\left(C_{fuel,i,t} + C_{start/stop,i,t}\right) + C_{reserve,t} + C_{EENS,t}\right] \quad (1)

In Equation (1), Z denotes the total system cost, T denotes the total number of scheduling time segments, and N is the total number of units. C_fuel,i,t is the fuel cost of the i-th unit at time t, C_start/stop,i,t is the startup and shutdown cost of the i-th unit at time t, C_reserve,t is the standby cost at time t, and C_EENS,t is the expected power deficit cost at time t, which mainly measures the supply-demand imbalance caused by RE fluctuations [19]. The UC module is responsible for optimizing the start-stop state (SSS) of the units. The SSS constraint on u_{i,t} is shown in Equation (2).

u_{i,t} \in \{0, 1\}, \quad \forall i, t \quad (2)

In Equation (2), a value of 1 for u_{i,t} indicates that the unit is on and a value of 0 indicates that the unit is off. Equation (3) displays the unit start/stop time limitation.

\sum_{t=1}^{T} u_{i,t}\,\Delta t \geq T_{min\text{-}on,i} \quad (3)

In Equation (3), t denotes the time period when the unit is turned on, and T_{min-on,i} denotes the minimum continuous operation time of the i-th unit. The output power constraint is shown in Equation (4).

P_{i,min} \leq P_{i,t} \leq P_{i,max} \quad (4)

In Equation (4), P_{i,min} and P_{i,max} are the minimum and maximum PO of the i-th unit, and P_{i,t} denotes the PO of the i-th unit at time t. Both Equation (3) and Equation (4) hold only when the value of u_{i,t} is 1. The ED module is responsible for optimizing the power allocation of the turned-on units after the UC determines the SSS of the units [20]. The power balance constraint in the ED module is shown in Equation (5).

\sum_{i=1}^{N} P_{i,t} + P_{RES,t} = D_t, \quad \forall t \quad (5)

In Equation (5), P_{RES,t} denotes the RE output at time t and D_t denotes the load demand at time t. The climbing capacity constraint is shown in Equation (6).

P_{i,t} - P_{i,t-1} \leq R_i^{up}, \quad P_{i,t-1} - P_{i,t} \leq R_i^{down}, \quad \forall i, t \quad (6)

In Equation (6), R_i^{up} and R_i^{down} are the upper and lower climbing limits for unit i, respectively. The reserve capacity constraint is shown in Equation (7).

\sum_{i=1}^{N} R_{i,t} \geq R_{required,t}, \quad R_{i,t} = P_{i,max} - P_{i,t}, \quad \forall t \quad (7)

In Equation (7), R_{required,t} is the standby capacity requirement of the system at time t, and R_{i,t} denotes the standby capacity that the i-th unit can provide at time t. Equation (8) illustrates how the AGC module regulates the fluctuations by modifying the power in real time based on the ED results.

P_{i,t} = P_{i,t}^{base} + \Delta P_{i,t}, \quad \forall i, t \quad (8)

In Equation (8), P_{i,t}^{base} denotes the base point load provided by the ED module [21], and ΔP_{i,t} is the real-time power adjustment of the AGC module. The real-time balancing constraint is shown in Equation (9).

\sum_{i=1}^{N} \Delta P_{i,t} = \Delta D_t, \quad \Delta D_t = D_t - \sum_{i=1}^{N} P_{i,t}^{base} - P_{RES,t} \quad (9)

In Equation (9), ΔD_t represents the difference between the actual load demand and the sum of the base point load and the RE output. The adjustment speed constraint is shown in Equation (10).

\left|\Delta P_{i,t}\right| \leq R_i^{response} \quad (10)

In Equation (10), R_i^{response} denotes the upper limit of the real-time regulation speed. As a result, the synergistic relationship among the UC, ED, and AGC modules is realized through the tight coupling of inputs and outputs. The UC provides the SSS for the ED, the ED provides the base point load for the AGC, and the feedback from the AGC optimizes the start-stop strategy of the UC.
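The cost function in Equation (1) and the feasibility checks in Equations (4) to (7) can be expressed compactly as in the sketch below. The unit parameters, costs, and time horizon are invented toy values used only to show how the terms combine; they are not data from the paper.

```python
import numpy as np

T, N = 4, 3                                   # scheduling periods and units (toy sizes)
u = np.ones((T, N))                           # start/stop status, Equation (2)
P = np.array([[80, 60, 40]] * T, float)       # unit output P_{i,t}
P_min, P_max = np.array([20, 15, 10]), np.array([100, 80, 60])
R_up = R_down = np.array([30, 25, 20])        # ramp limits
P_res = np.array([50, 60, 40, 55], float)     # renewable output per period
D = np.array([230, 240, 220, 235], float)     # load demand per period
fuel_cost, startstop_cost, reserve_cost, eens_cost = 2.0, 5.0, 1.0, 50.0

# Equation (1): total cost = fuel + start/stop + reserve + expected energy-not-served penalty.
Z = (fuel_cost * (u * P).sum() + startstop_cost * u.sum()
     + reserve_cost * T + eens_cost * np.maximum(D - (P.sum(axis=1) + P_res), 0).sum())

# Equations (4)-(7): output bounds, power balance, ramping, and reserve adequacy.
ok_bounds = np.all((P >= P_min) & (P <= P_max))
ok_balance = np.allclose(P.sum(axis=1) + P_res, D)
ok_ramp = np.all(np.abs(np.diff(P, axis=0)) <= np.maximum(R_up, R_down))
ok_reserve = np.all((P_max - P).sum(axis=1) >= 30)   # assumed reserve requirement of 30 MW
print(f"Z = {Z:.1f}", ok_bounds, ok_balance, ok_ramp, ok_reserve)
```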
The proposed model incorporates uncertainty in wind and solar output directly into the scheduling process to enhance adaptability to RE fluctuations. A limited number of representative renewable output scenarios are generated during each scheduling cycle by applying random deviations to forecasted values based on recent historical variation. These scenarios simulate possible short-term fluctuations in renewable generation. The reserve capacity constraint is adjusted accordingly based on the observed fluctuation range, ensuring sufficient buffer during high-variability periods. In the AGC stage, real-time control targets are fine-tuned using deviation trends derived from these scenarios. The proposed model maintains dispatch feasibility and system stability under uncertain renewable output conditions by dynamically updating reserve settings and AGC parameters.

3.2 Power system scheduling optimization heuristic algorithm design

The proposed mathematical model for optimizing PS scheduling takes into account total system operating costs, the synergistic optimization of multiple modules, and the uncertainty associated with a high proportion of RE sources. This provides a theoretical basis for scheduling. However, the simple model may be inefficient or susceptible to local optimization when dealing with complex, multi-constraint, multi-timescale optimization problems [22]. Therefore, heuristic algorithms are introduced to optimize the mathematical model and solve it. In deep reinforcement learning, the DDPG method effectively optimizes unit SSS and power allocation. However, it may converge slowly and become trapped in local optima when solving complex problems with multiple constraints [23]. QPSO overcomes the limitations of DDPG by improving particle diversity and global search capabilities using quantum behavioral mechanisms. Hence, this study combines improved DDPG and QPSO to propose a joint heuristic algorithm. Figure 3 illustrates the computational flow of the enhanced DDPG in this technique.

In Figure 3, the study introduces a dual experience pooling mechanism in DDPG, which balances exploration and utilization by storing diverse samples and high-value samples separately to improve training efficiency and policy quality. Second, to prevent falling into the local optimum, a time-decaying exploration noise technique is used to boost exploration at the beginning and improve stability at the end. Finally, the target network update strategy is optimized to dynamically adjust the target network parameters through the soft update method to enhance the training stability and convergence speed.

The improved DDPG workflow consists of four main stages. First, in the initialization phase, the Critic network, Actor network, and their target networks are randomly initialized. Additionally, two experience pools, B1 and B2, are established. B1 stores the initial experience samples. B2 stores the high-value samples that are selected using a filtering mechanism. The dual-experience pool design maintains diversity in the training data, which improves sample selection efficiency. Next, within the training iterations and time step loop, the agent generates actions via the policy network. This enhances the exploration of unknown strategies by adding exploration noise. After interacting with the environment, experience samples are generated and stored in B1. High-value samples are then selected based on reward values and stored in B2. In this way, the experience pool contains both common experience samples and high-value samples. This ensures the samples are diverse and valuable for training, thereby improving the algorithm's learning efficiency.
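The three improvements described above (the dual experience pool, the time-decaying exploration noise, and the soft target-network update) can be summarized in the following minimal Python sketch. The class layout, the reward-quantile filter, and the decay schedule are illustrative assumptions that follow the workflow described in the text rather than the study's exact implementation.

import random
from collections import deque

class DualReplayBuffer:
    # B1 keeps all transitions; B2 keeps high-value transitions selected by reward.
    def __init__(self, capacity=1_000_000, high_value_quantile=0.8):
        self.b1 = deque(maxlen=capacity)
        self.b2 = deque(maxlen=capacity // 10)
        self.quantile = high_value_quantile

    def add(self, state, action, reward, next_state, done):
        self.b1.append((state, action, reward, next_state, done))
        recent = sorted(t[2] for t in list(self.b1)[-1000:])   # reward filter over recent samples
        threshold = recent[int(self.quantile * (len(recent) - 1))]
        if reward >= threshold:
            self.b2.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # The improved DDPG draws update batches from the high-value pool B2,
        # falling back to B1 while B2 is still being filled.
        pool = self.b2 if len(self.b2) >= batch_size else self.b1
        batch_size = min(batch_size, len(pool))
        return random.sample(list(pool), batch_size)

def exploration_noise_scale(step, sigma0=0.2, decay=1e-4, sigma_min=0.01):
    # Time-decaying exploration noise: strong exploration early, stable policy late.
    return max(sigma_min, sigma0 * (1.0 - decay) ** step)

def soft_update(target_params, online_params, tau=0.001):
    # Soft target update: theta_target <- tau * theta_online + (1 - tau) * theta_target.
    return [tau * o + (1.0 - tau) * t for o, t in zip(online_params, target_params)]

The quantile threshold and the decay constant are placeholders; in a full implementation they would be tuned alongside the hyperparameters reported in Section 4.2 (learning rate 0.0001, discount factor 0.99, batch size 64, soft-update parameter 0.001).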
Figure 3: The framework of DDPG-QPSO joint heuristic algorithm (flowchart: initialize the Critic, Actor, and target networks and experience pools 1 and 2; select actions based on the current strategy plus exploration noise; perform the action, receive the reward and next state, and store the experience in Pool 1; sort Pool 1 by reward in descending order and remove low-reward samples to form Pool 2; sample a batch from Pool 2 to update the Critic, Actor, and target networks; repeat until the maximum number of time steps and training rounds is reached).
Figure 4: Schematic diagram of the algorithm flow of QPSO (initialize algorithm parameters and particle positions and calculate the initial fitness values; in the main loop, update particle positions based on the quantum behavior formula, calculate fitness values, update the global best position and individual best positions, and dynamically adjust parameters; check whether the maximum number of iterations or the convergence condition is reached and output the optimal solution).
Then, a small batch of high-value samples is sampled from B2 to update the Critic and Actor networks. The Critic network is updated using the error calculated from the target value. Meanwhile, the Actor network is updated using the policy gradient method to maximize the long-term cumulative reward. This optimization allows the model to continuously improve its policy and value function, thereby enhancing the quality of its decisions. Finally, the research employs a soft update method that dynamically adjusts the target network parameters. This further improves training stability and convergence speed. The soft update strategy smoothly adjusts the target network parameters. This prevents excessive fluctuations in the network during training and avoids instability caused by dramatic parameter updates. Figure 4 depicts the QPSO algorithm's flow.

In Figure 4, the overall process of the QPSO algorithm is not much different from the traditional PSO algorithm. The steps are initializing particle positions and parameters, calculating fitness values, updating the global optimal position (GOP) and individual optimal position (IOP), dynamically adjusting parameters, and iterative judgment to output the optimal solution (OS). However, the core difference between the two is the way of updating the particle position. Traditional PSO is based on the iterative formula of velocity and position, while QPSO adopts the quantum behavioral formula, which constructs the quantum distribution of the particle position through the GOP and IOP. In the QPSO algorithm, the quantum modulation factor controls the randomness of the particle update process and regulates the particles' ability to explore the search space. This enhances search diversity and prevents the algorithm from getting trapped in local OSs. The quantum distribution describes the probabilistic characteristics of particle position updates. New particle positions are generated through formulas based on quantum behavior by combining the global and individual optimal positions. This reflects the non-deterministic update mode inspired by quantum mechanics. The quantum behavioral formulation updates the particle positions as shown in Equation (11).

x_{i,j}^{(t+1)} = P_{i,j} \pm \alpha \left| x_{i,j}^{(t)} - P_{i,j} \right| \ln\left(\frac{1}{u}\right)   (11)

In Equation (11), x_{i,j}^{(t)} and x_{i,j}^{(t+1)} denote the position of particle i after t and t+1 iterations in the j-th dimension. P_{i,j} denotes the reference point of particle i in the j-th dimension. \alpha denotes the quantum modulation factor. u denotes a random number, which is used to introduce randomness and give the particle a non-deterministic update property. P_{i,j} is obtained from the GOP and the IOP with certain weights, as shown in Equation (12).

P_{i,j} = \varphi p_{best,i,j} + (1 - \varphi) g_{best,j}   (12)

In Equation (12), p_{best,i,j} and g_{best,j} denote the IOP and GOP, respectively. \varphi denotes the inertia factor, which is used to control whether the particle prefers the individual OS or the global OS. In the integration of DDPG and QPSO, the improved DDPG algorithm first generates an initial dispatch strategy based on the current environmental state and load demand information. This strategy includes the start-stop decisions and power allocation for each generation unit over all time periods. The output of DDPG is a deterministic decision vector, representing an executable scheduling solution. This comprehensive scheduling solution serves as a key reference for initializing the population in the QPSO algorithm. More specifically, the DDPG output is encoded as a particle position within the QPSO search space, which is then assigned as the initial position of at least one particle within the swarm. The remaining particles are initialized in the vicinity of this solution through random perturbations, ensuring that the initial population has both guidance and diversity. Based on this initialization, QPSO performs global search optimization. Its quantum-behavior mechanism further refines and adjusts the scheduling strategy, improving the solution's overall stability and adaptability.
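The position update of Equations (11) and (12) and the DDPG-seeded initialization can be expressed as a short Python sketch. The symbols alpha and phi follow the reconstruction above, while the perturbation scale and random-number handling are illustrative assumptions.

import numpy as np

def qpso_update(x, p_best, g_best, alpha=0.5, rng=None):
    # One QPSO position update following Equations (11)-(12).
    rng = np.random.default_rng() if rng is None else rng
    phi = rng.random(x.shape)                            # weight between IOP and GOP, Eq. (12)
    u = rng.uniform(1e-12, 1.0, x.shape)                 # random number in Eq. (11)
    P = phi * p_best + (1.0 - phi) * g_best              # reference point (attractor)
    sign = np.where(rng.random(x.shape) < 0.5, 1.0, -1.0)
    return P + sign * alpha * np.abs(x - P) * np.log(1.0 / u)

def seed_swarm_from_ddpg(ddpg_plan, n_particles=50, rel_sigma=0.05, rng=None):
    # Encode the DDPG dispatch vector as one particle and perturb it for the others,
    # so the initial population has both guidance and diversity.
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(ddpg_plan, dtype=float)
    noise = rel_sigma * (np.abs(base).mean() + 1e-9) * rng.standard_normal((n_particles, base.size))
    swarm = base + noise
    swarm[0] = base                                      # at least one particle keeps the DDPG solution
    return swarm

With phi drawn per dimension, each particle is attracted toward a stochastic mixture of its own best position and the global best, which is what gives QPSO a broader search distribution than velocity-based PSO.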
In the early iterations the global search is favored, while local optimization is favored in the later stage. Combining the above, the final PS scheduling optimization flow designed by the study is shown in Figure 5. In Figure 5, the final PS scheduling optimization process consists of inputting load demand forecasts, RE output scenarios, and related parameters to provide basic data for optimization. With the aim of reducing the overall system cost, a multi-module cooperative scheduling model comprising UC, ED, and AGC modules is built. The improved DDPG is utilized to generate the initial optimization strategy, which is further optimized by QPSO. The optimized scheduling plan covers unit start/stop status, power allocation, and standby capacity configuration. It ultimately achieves efficient and stable scheduling optimization.
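The overall DDPG-to-QPSO hand-off can be sketched as the following loop, reusing the helper functions from the earlier sketches; the agent interface (act), the fitness function, and the iteration budget are illustrative assumptions rather than the study's exact code.

import numpy as np

def ddpg_qpso_dispatch(state, ddpg_agent, fitness, n_particles=50, n_iter=1000, alpha=0.5):
    # 1) The improved DDPG proposes an executable dispatch vector for the current state.
    ddpg_plan = ddpg_agent.act(state)
    # 2) The plan seeds the QPSO swarm (see seed_swarm_from_ddpg above).
    swarm = seed_swarm_from_ddpg(ddpg_plan, n_particles)
    p_best = swarm.copy()
    p_best_fit = np.array([fitness(p) for p in swarm])
    g_best = p_best[np.argmin(p_best_fit)].copy()
    # 3) QPSO refines the schedule globally using the quantum behavior update.
    for _ in range(n_iter):
        for k in range(n_particles):
            swarm[k] = qpso_update(swarm[k], p_best[k], g_best, alpha=alpha)
            f = fitness(swarm[k])
            if f < p_best_fit[k]:                        # update individual optimal position
                p_best[k], p_best_fit[k] = swarm[k].copy(), f
        g_best = p_best[np.argmin(p_best_fit)].copy()    # update global optimal position
    return g_best                                        # optimized scheduling plan

Here fitness would typically wrap the cost and constraint evaluation of Equations (1)-(10), returning the penalized total cost so that lower values correspond to better schedules.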
Figure 5: Final power system scheduling optimization process (inputs: load demand forecast, renewable energy output scenarios, unit parameters, and uncertainty factors; optimization objectives and constraint conditions are determined for the collaborative scheduling model; the improved DDPG generates the strategy used to initialize QPSO; the QPSO algorithm outputs the global optimal solution and the optimized scheduling plan).
Table 1: Experimental environment configuration
Hardware configuration | Software configuration
CPU: Intel Core i9-12900K (16 cores, 3.2 GHz) | Operating system: Ubuntu 22.04 LTS
GPU: NVIDIA GeForce RTX 3090 (24 GB VRAM) | Programming language: Python 3.10
Memory: 32 GB DDR4 | Deep learning framework: TensorFlow 2.10
Storage: 1 TB SSD | Optimization algorithm libraries: NumPy, SciPy, Pyomo
Power supply: 850 W high-efficiency power supply | Power system simulation tool: MATPOWER 7.1 (MATLAB Toolbox)
/ | Data processing tool: Pandas
Table 2: Results of model scheduling performance differences
Indicator | IEEE 30-Bus: Traditional sequential scheduling model | IEEE 30-Bus: Proposed model | IEEE 118-Bus: Traditional sequential scheduling model | IEEE 118-Bus: Proposed model
Total cost | $12,500 | $10,800 | $48,200 | $42,700
Fuel cost | $7,200 | $6,500 | $28,500 | $26,000
Startup/shutdown cost | $3,000 | $2,500 | $12,000 | $10,200
Reserve cost | $2,000 | $1,500 | $6,500 | $5,500
Demand-supply imbalance cost | $300 | $300 | $1,200 | $1,000
Supply-demand deviation (MW) | 5.5 | 4.2 | 25.0 | 18.5
Response time (s) | 10.1 | 8.3 | 18.7 | 13.5
4 Results

The efficiency and superiority of the PS scheduling optimization methods suggested in the study are confirmed in this section using both heuristic algorithms and mathematical models. The focus is on verifying the effectiveness of multi-module collaboration and uncertainty handling, as well as multi-timescale optimization and the performance enhancement and comprehensive optimization capabilities of the improved DDPG, QPSO, and joint heuristic algorithms.

4.1 Validation of mathematical model for power system scheduling optimization

A multi-module cooperative scheduling model including UC, ED, and AGC modules is constructed with the goal of lowering the overall system cost. Based on Table 1, the study selects the 30-node test system and the 118-node test system from the IEEE standard examples. The former includes 30 nodes, 41 transmission lines, 6 generators, and 20 load nodes, which are suitable for preliminary verification and experimentation. The latter includes 118 nodes, 186 transmission lines, 54 generators, and 99 load nodes, which can be used for in-depth research on the
optimization capabilities of multi-module collaborative scheduling and heuristic algorithms. First, the performance of dispatches under the proposed closed-loop sequential scheduling model based on UC-ED-AGC feedback is compared with that under a traditional stage-wise sequential scheduling model. The traditional model is a dispatch process in which the UC, ED, and AGC modules run independently in a fixed order. This process does not consider RE uncertainty or provide feedback or coordination. To ensure a fair comparison, both models are solved using the same optimization algorithm (QPSO) under identical system configurations and forecast conditions. This setup ensures that performance differences are attributed to model structure rather than solver differences. Table 2 displays the findings.

In Table 2, the mathematical model proposed in the study demonstrates advantages in both the IEEE 30-node and 118-node test systems. In terms of economy, the total cost of the 30-node system is reduced by $1,700, and that of the 118-node system is reduced by $5,500. It optimizes fuel, start-stop, and reserve capacity costs. In terms of supply-demand balance capability, the supply-demand deviation is reduced by 1.3 MW and 6.5 MW respectively, effectively addressing the uncertainty of load demand and RE fluctuations. Meanwhile, the real-time adjustment response time is shortened by 1.8 s and 5.2 s respectively, improving the dynamic response capability. Overall, the mathematical model proposed in the study has achieved more efficient resource utilization in small-scale systems and demonstrated superior adaptability to complex problems in large-scale systems.

Since the suggested model takes the uncertainty of RE into account, it is compared with conventional PS scheduling models that do not consider uncertainty handling. The result is shown in Figure 6. In Figure 6 (a), within 30 days of PS scheduling optimization, the proposed model achieves a power supply reliability of over 94%, with an average of 96.58%. However, traditional models that do not consider uncertainty processing have a highest power supply reliability of only 93.73%, with an average of only 92.16%. In Figure 6 (b), for the utilization rate of backup capacity, after 30 days of model operation, the proposed model increases the utilization rate to between 75% and 87%, while the utilization rate of the traditional model only fluctuates between 60% and 70%. The outcomes show that the proposed model improves the adaptability to RE fluctuations in PS scheduling optimization. Finally, the impact of the study's suggested model on scheduling optimization is confirmed on various time scales. The result is shown in Figure 7.
Figure 6: Results compared with traditional models that do not consider uncertainty processing. (a) Comparison of supply reliability (%) over 30 days for the proposed model and the traditional model; (b) Comparison of reserve utilization rate (%) over 30 days for the proposed model and the traditional model.
In Figure 7(a), in the short-term time frame (24 hours), the frequency deviation (FD) of the PS dispatch before optimization is much larger, reaching more than 4 Hz. After optimization using the research model, the FD of the PS dispatch is effectively controlled and remains between -2 Hz and 2 Hz. In Figure 7(b), the energy consumption rate of the PS dispatch before optimization averages 84.15% during the interim time frame, i.e., one week, whereas after optimization the energy consumption rate improves to 93.84%. In Figure 7(c), in the long-term time frame, i.e., one year, the optimized PS dispatch significantly reduces the dispatch cost from $45,600 to $37,860, while the pre-optimization dispatch cost is $43,080. The outcomes reveal that the suggested model performs better in terms of long-term economics, medium-term efficiency, and short-term stability.
Figure 7: Scheduling optimization effect on different time scales. (a) Short-term optimization effect: frequency deviation (Hz) over 24 hours, before and after optimization; (b) Medium-term optimization effect: renewable energy utilization rate (%) over 7 days, before and after optimization; (c) Long-term optimization effect: total cost (USD) over 12 months, before and after optimization.
Figure 8: Comparison of DDPG algorithm before and after improvement. (a) IEEE 30-Bus Test System: total cost (USD) versus iterations (0-200) for the traditional and improved DDPG; (b) IEEE 118-Bus Test System: total cost (USD) versus iterations (0-400) for the traditional and improved DDPG.
4.2 Validation of heuristic algorithms for power system scheduling optimization

After the validity and superiority of the mathematical model proposed by the study are verified, the study further validates the involved heuristic algorithms. Experiments are first conducted for the improvement of the DDPG algorithm. The DDPG before and after the improvement is applied to solve the mathematical model proposed by the study on the IEEE 30-node and 118-node test systems. When conducting the experiment, in the DDPG algorithm, the learning rates of the Critic network and the Actor network are set to 0.0001. The discount factor is 0.99, the batch size is 64, and the experience pool size is 1,000,000. The exploration noise is generated using an Ornstein-Uhlenbeck process, which has an initial standard deviation of 0.2 and undergoes attenuation during the training process. The target network adopts a soft update with an update parameter of 0.001 to ensure stability. The results are shown in Figure 8.

Figure 8(a) shows that the traditional DDPG algorithm converges after 160 iterations when using the IEEE 30-node system, resulting in a total cost of $39,560.
The improved DDPG converges after 80 iterations, and the total cost is reduced to $37,960. This improvement benefits from dual experience pooling. Storing common samples and high-value samples separately optimizes the efficiency of sample utilization and improves the training speed. This accelerates the convergence process and reduces the cost. Meanwhile, time-decay exploration improves initial exploration capabilities and stabilizes strategy optimization in later stages, accelerating convergence and reducing costs. As shown in Figure 8(b), both the traditional and improved DDPG require more iterations to converge when using the more complex IEEE 118-node system. However, the improved DDPG still performs better and has a lower convergence cost. The dual experience pool and the time-decay exploration strategy effectively improve the algorithm's adaptability and convergence efficiency in large-scale systems, demonstrating its superiority.

Furthermore, the optimization effect of QPSO is validated, and differential evolution (DE), the grey wolf optimizer (GWO), and the wolf search algorithm (WSA) are selected for comparison. In the QPSO algorithm, the number of particles is 50, the maximum number of iterations is 1000, and the inertia factor is 0.9. The learning factors are set to 1.5 and 2.0, respectively, to control the global and local optimal attractive forces. Setting the quantum modulation factor to 0.5 enhances the flexibility of the particle position update. Figure 9 displays the findings.

In Figure 9(a), in the IEEE 30-node test system, QPSO has the fastest convergence speed among the five algorithms and the lowest final fitness value. In Figure 9(b), in the IEEE 118-node test system, QPSO again has the fastest convergence speed among the five algorithms and the lowest final fitness value. It can be concluded that QPSO effectively enhances the global search capability of particles through the quantum behavior mechanism. Both in the smaller-scale IEEE 30-node system and in the more complex IEEE 118-node system, QPSO shows superior performance, proving its adaptability to problems of different scales and complexities. Finally, the study applies the proposed mathematical model in combination with the joint DDPG-QPSO heuristic algorithm to HP RE scheduling optimization. The more advanced methods in references [11], [12], [13], and [14] are selected as comparison methods. The algorithms from references [11] to [14] are re-implemented by the research team based on the original descriptions in the respective papers. Each method is tuned within a reasonable range of parameters based on the recommended settings. Then, it is validated to ensure optimal performance in the current test environment. All methods are evaluated under the same experimental conditions, which include load forecast profiles, RE output scenarios, system topology, and a unified evaluation period. All performance metrics are kept consistent across experiments. The results are presented in Table 3.
Figure 9: Optimization effect verification of QPSO. (a) IEEE 30-Bus Test System; (b) IEEE 118-Bus Test System (fitness value versus iterations, 0-200, for PSO, DE, GWO, WSA, and QPSO).
Table 3: Comprehensive comparison between research methods and reference methods
Method | Total cost ($) | Energy utilization rate (%) | Supply reliability (%) | Frequency deviation (Hz) | Optimization time (s)
Proposed method | 37960 | 94.85 | 97.10 | 1.25 | 58.3
Reference [11] | 40230 | 90.76 | 93.85 | 2.64 | 125.4
Reference [12] | 39760 | 92.42 | 94.50 | 2.18 | 98.7
Reference [13] | 38450 | 93.57 | 95.87 | 1.89 | 75.6
Reference [14] | 38930 | 93.25 | 95.30 | 2.01 | 88.9
In Table 3, the proposed mathematical model and the joint DDPG-QPSO heuristic algorithm of the study show more obvious advantages in HP REPS scheduling optimization. The proposed method outperforms the other methods with the lowest total cost of $37,960 and a 94.85% energy utilization rate. Meanwhile, the reliability of power supply reaches 97.10%, the FD is only 1.25 Hz, and the optimization time is 58.3 s. The proposed algorithm
demonstrates excellent economy, system stability, and solution efficiency, and provides a highly efficient and reliable solution for PS scheduling optimization in high-percentage RE scenarios.

5 Discussion and conclusion

Targeting the scheduling issue brought on by the HP of RE access in the PS, the study put forward a mathematical model for multi-module cooperative scheduling that brought together three main modules and used the enhanced DDPG and QPSO algorithms to address the issue. The efficacy of the study's suggested model and algorithm was confirmed by experimental findings. In the IEEE 30-node and 118-node test systems, the proposed model reduced the total scheduling cost by $1,700 and $5,500, respectively, compared with the traditional sequential scheduling model. It enhanced the energy consumption rate and power supply reliability. The improved DDPG algorithm increased the convergence speed by 50% and reduced the total cost from $39,560 to $37,960 in the 30-node system by introducing a dual experience pool and a time-decay exploration strategy. QPSO exhibited a stronger global search capability, with the fastest convergence speed and the lowest final fitness value in systems of different sizes compared to the other algorithms. In addition, the study's optimization experiments on short-, medium-, and long-term time scales revealed that the FD was effectively reduced, the energy consumption rate was improved by 9.69%, and the total dispatch cost was reduced by 17.6%. The adaptability and superiority of the model in cooperative optimization over multiple time scales were demonstrated.

The DDPG and QPSO algorithms perform well in the test systems. However, they may face challenges regarding scalability and adaptability in an actual power grid. As the power grid grows, its computational complexity will increase significantly. This is particularly relevant when working with large volumes of data and real-time dispatching. These factors can lead to a shortage of computing resources and excessively long training times. In addition, the diversity of power grid topologies and operating conditions may affect the algorithm's adaptability. The power grid itself contains complex generators, energy storage systems, and distributed energy resources. Corresponding adjustments to the algorithm are required to address these effectively. In terms of real-time performance, the algorithm functions well in a simulated environment. However, in a highly dynamic actual power grid, it may not respond promptly to load fluctuations and changes in RE. This affects the stability of the system. Therefore, although the algorithm performs well in the test systems, it still needs further verification and optimization for practical applications to improve stability and response speed.

References

[1] Lei Gan, Tianyu Yang, Xingying Chen, Gengyin Li, and Kun Yu. Purchased power dispatching potential evaluation of steel plant with joint multienergy system and production process optimization. IEEE Transactions on Industry Applications, 58(2):1581-1591, 2022. https://doi.org/10.1109/TIA.2022.3144652
[2] A. A. Lebedev, A. A. Voloshin, and A. N. Lednev. Conceptual framework for developing highly-automated power distribution networks and micropower systems. Power Technology and Engineering, 58(1):163-168, 2024. https://doi.org/10.1007/s10749-024-01790-2
[3] Ehsan Naderi, Lida Mirzaei, Mahdi Pourakbari-Kasmaei, Fernando V. Cerna, and Matti Lehtonen. Optimization of active power dispatch considering unified power flow controller: application of evolutionary algorithms in a fuzzy framework. Evolutionary Intelligence, 17(3):1357-1387, 2024. https://doi.org/10.1007/s12065-023-00826-2
[4] Lilin Cheng, Haixiang Zang, Anupam Trivedi, Dipti Srinivasan, Zhinong Wei, and Guoqiang Sun. Mitigating the impact of photovoltaic power ramps on intraday economic dispatch using reinforcement forecasting. IEEE Transactions on Sustainable Energy, 15(1):3-12, 2023. https://doi.org/10.1109/TSTE.2023.3261444
[5] Yu Dong, Xin Shan, Yaqin Yan, Xiwu Leng, and Yi Wang. Architecture, key technologies and applications of load dispatching in China power grid. Journal of Modern Power Systems and Clean Energy, 10(2):316-327, 2022. https://doi.org/10.35833/MPCE.2021.000685
[6] Huating Xu, Bin Feng, Chutong Wang, Chuangxin Guo, Jian Qiu, and Mingyang Sun. Exact box-constrained economic operating region for power grids considering renewable energy sources. Journal of Modern Power Systems and Clean Energy, 12(2):514-523, 2023. https://doi.org/10.35833/MPCE.2023.000312
[7] Ahmed M. Abd-El Wahab, Salah Kamel, Mohamed H. Hassan, José Luis Domínguez-García, and Loai Nasrat. Jaya-AEO: an innovative hybrid optimizer for reactive power dispatch optimization in power systems. Electric Power Components and Systems, 52(4):509-531, 2024. https://doi.org/10.1080/15325008.2023.2227176
[8] John L. Cox, William T. Hamilton, Alexandra M. Newman, Michael J. Wagner, and Alex J. Zolan. Real-time dispatch optimization for concentrating solar power with thermal energy storage. Optimization and Engineering, 24(2):847-884, 2023. https://doi.org/10.1007/s11081-022-09711-w
[9] Huifeng Zhang, Dong Yue, Chunxia Dou, and Gerhard P. Hancke. PBI based multi-objective optimization via deep reinforcement elite learning strategy for micro-grid dispatch with frequency dynamics. IEEE Transactions on Power Systems, 38(1):488-498, 2022. https://doi.org/10.1109/TPWRS.2022.3155750
[10] Sicheng Hou, and Shigeru Fujimura. Day-ahead multi-objective microgrid dispatch optimization based on demand side management via particle swarm optimization. IEEJ Transactions on Electrical and Electronic Engineering, 18(1):25-37, 2023. https://doi.org/10.1002/tee.23711
[11] Navid Shirzadi, Fuzhan Nasiri, Claude El-Bayeh, and Ursula Eicker. Optimal dispatching of renewable energy-based urban microgrids using a deep learning approach for electrical load and wind power forecasting. International Journal of Energy Research, 46(3):3173-3188, 2022. https://doi.org/10.1002/er.7374
[12] Zhongjie Guo, Wei Wei, Mohammad Shahidehpour, Zhaojian Wang, and Shengwei Mei. Optimisation methods for dispatch and control of energy storage with renewable integration. IET Smart Grid, 5(3):137-160, 2022. https://doi.org/10.1049/stg2.12063
[13] Dai Cui, Weichun Ge, Wenguang Zhao, Feng Jiang, and Yushi Zhang. Economic low-carbon clean dispatching of power system containing P2G considering the comprehensive influence of multi-price factor. Journal of Electrical Engineering & Technology, 17(1):155-166, 2022. https://doi.org/10.1007/s42835-021-00877-4
[14] Fatma Gami, Ziyad A. Alrowaili, Mohammed Ezzeldien, Mohamed Ebeed, Salah Kamel, Eyad S. Oda, and Shazly A. Mohamed. Stochastic optimal reactive power dispatch at varying time of load demand and renewable energy resources using an efficient modified jellyfish optimizer. Neural Computing and Applications, 34(22):20395-20410, 2022. https://doi.org/10.1007/s00521-022-07526-5
[15] Jatin Soni, and Kuntal Bhattacharjee. Multi-objective dynamic economic emission dispatch integration with renewable energy sources and plug-in electrical vehicle using equilibrium optimizer. Environment, Development and Sustainability, 26(4):8555-8586, 2024. https://doi.org/10.1007/s10668-023-03058-7
[16] Yukang Shen, Wenchuan Wu, Bin Wang, and Shumin Sun. Optimal allocation of virtual inertia and droop control for renewable energy in stochastic look-ahead power dispatch. IEEE Transactions on Sustainable Energy, 14(3):1881-1894, 2023. https://doi.org/10.1109/TSTE.2023.3254149
[17] Xiaojing Wang, Li Han, Mengjie Li, and Panpan Lu. A time-scale adaptive forecasting and dispatching integration strategy of the combined heat and power system considering thermal inertia. IET Renewable Power Generation, 17(8):1966-1977, 2023. https://doi.org/10.1049/rpg2.12743
[18] Bing Sun, Ruipeng Jing, Leijiao Ge, Yuan Zeng, Shimeng Dong, and Luyang Hou. Quick hosting capacity evaluation based on distributed dispatching for smart distribution network planning with distributed generation. Journal of Modern Power Systems and Clean Energy, 12(1):128-140, 2023. https://doi.org/10.35833/MPCE.2022.000604
[19] Wei Liu, Tianhao Wang, Shuo Wang, Zhijun E, and Ruiqing Fan. Day-ahead robust optimal dispatching method for urban power grids containing high proportion of renewable energy. Process Safety and Environmental Protection, 178(1):715-727, 2023. https://doi.org/10.1016/j.psep.2023.08.025
[20] Maolin Li, Youwen Tian, Haonan Zhang, and Nannan Zhang. The source-load-storage coordination and optimal dispatch from the high proportion of distributed photovoltaic connected to power grids. Journal of Engineering Research, 12(3):421-432, 2024. https://doi.org/10.1016/j.jer.2023.10.042
[21] Junjie Rong, Ming Zhou, Zhi Zhang, and Gengyin Li. Coordination of preventive and emergency dispatch in renewable energy integrated power systems under extreme weather. IET Renewable Power Generation, 18(7):1164-1176, 2024. https://doi.org/10.1049/rpg2.12893
[22] Zhoujun Ma, Yizhou Zhou, Yuping Zheng, Li Yang, and Zhinong Wei. Distributed robust optimal dispatch of regional integrated energy systems based on ADMM algorithm with adaptive step size. Journal of Modern Power Systems and Clean Energy, 12(3):852-862, 2023. https://doi.org/10.35833/MPCE.2023.000204
[23] Jian Hu, Yingjun He, Wenqian Xu, Yixin Jiang, Zhihong Liang, and Yiwei Yang. Anomaly detection in network access using LSTM and encoder-enhanced generative adversarial networks. Informatica, 49(7):175-186, 2025. https://doi.org/10.31449/inf.v49i7.7246