https://doi.org/10.31449/inf.v49i16.7805 Informatica 49 (2025) 1–20 1
Automating Financial Audits with Random Forests and Real-Time
Stream Processing: A Case Study on Efficiency and Risk Detection
Jianlin Li1,2, Wanli Liu3*, Jie Zhang4
1School of Business Administration, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
2Research Center for Corporate Governance and Enterprise Growth of Hebei University of Economics and Business,
Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
3Finance Department, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
4Office of Scientific Research, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
E-mail: Jianlin_Li1748@outlook.com, L135821liu_ll@hotmail.com, Jie_Zhang0152@outlook.com
*Corresponding author
Keywords: artificial intelligence, financial audit, automated method
Received: December 11, 2024
In the current complex economic environment, enterprises increasingly need efficient, accurate and real-time financial audits, and traditional audit methods struggle to cope with the challenges posed by massive data volumes and dynamic risks. This paper explores in depth an artificial-intelligence-based method for automating financial audits, aiming to improve audit efficiency and risk identification capabilities. The study introduces the random forest algorithm: 100 decision trees are constructed, each trained on a bootstrap sample of the training set, with features selected at random for splitting at each node, which reduces the overfitting risk of a single decision tree and improves the generalization ability of the model. At the same time, real-time data processing platforms such as Kafka and Flink are used to collect, process and analyze financial data in real time, ensuring the timeliness and dynamism of the audit process. After a series of steps, including extracting 500 features from multi-source data and dividing a data set of 5,000 records into a 70% training set and a 30% test set, the model is trained and evaluated. The results show that the method achieves remarkable improvements: audit efficiency increased by 30%, risk detection accuracy rose to 90%, audit coverage was enhanced, and the error detection rate, data processing speed, accuracy and risk identification rate were all optimized. In addition, the average adoption rate of audit recommendations reached 87%, the average effectiveness of corrective measures was 91%, audit satisfaction was about 90%, the average error rate after improvement fell by 47%, and average efficiency increased by more than 50%. These results provide strong technical support for corporate financial management and promote the intelligent transformation of financial auditing.
Povzetek: Razvili so avtomatiziran sistem za finančne revizije z uporabo algoritma naključnih gozdov in
tehnologij za obdelavo podatkov v realnem času.
1 Introduction

In the current global economic environment, enterprises face increasingly complex financial management and audit requirements. With the rapid development of information technology, traditional financial audit methods have struggled to meet enterprises' requirements for efficient, accurate and real-time audits. Advances in artificial intelligence, especially machine learning and data analysis, have provided new solutions for financial auditing. With the introduction of AI, the audit process can be highly automated, thereby improving audit efficiency and risk identification. Advanced algorithms such as random forests can process massive financial data, automatically identify abnormal transactions and potential risks, reduce human errors, and improve the accuracy and reliability of audits. At the same time, the application of real-time data processing technologies such as Kafka and Flink ensures that the audit process is real-time and dynamic, meeting the needs of modern enterprises for real-time risk monitoring and rapid response. In this context, this study explores a financial audit automation method based on artificial intelligence, aiming to promote the intelligent transformation of financial auditing through technological innovation and to improve the financial management level and competitiveness of enterprises.

In the current research landscape of financial auditing, scholars have put forward many viewpoints and theories exploring how various factors influence audit pricing, audit quality and financial reporting. Sun pointed out that the comparability of financial statements is related to audit pricing: the higher the comparability of financial statements, the lower the audit cost [1]. Condie et al. studied the effect of audit experience on the financial reporting aggressiveness of chief financial officers (CFOs) and found that CFOs with more audit experience tended to report more conservatively [2]. Koh et al. discussed the impact of the refinement of financial statements on audit pricing: the
higher the refinement of financial statements, the higher the audit cost [3]. Lyshchenko et al. emphasized the role of financial audit in ensuring the reliability of financial statements, pointing out that auditing can improve the information quality of financial statements [4]. Suryani's research shows that the scale of audit firms and the audit period have an impact on fraud in financial statements: larger audit firms and longer audit periods can effectively reduce the occurrence of financial fraud [5]. Xu et al. used the simultaneous equation method to study the relationship between the readability of financial reports and audit costs, and found that the more difficult financial reports are to read, the higher the audit costs [6].

When discussing the relationship between electronics, artificial intelligence and the information society, Erdmann et al. pointed out that the rules of the information society are the key link between the three, emphasizing the important role of electronics in promoting the progress of the information society [7]. In addition, Ijadi Maghsoodi et al. proposed a method based on individual risk attitudes when studying the optimization of investment strategies in virtual financial markets, which is of great significance for understanding the dynamic changes of financial markets [8]. At the same time, Pragarauskaitė and Dzemyda used a Markov model to analyze frequent patterns in financial data, providing a new perspective for financial market analysis [9]. These studies not only provide a theoretical basis for the in-depth analysis in this study, but also provide rich empirical evidence for understanding the interaction between electronics, artificial intelligence and the information society.

Lim's research finds that there is a relationship between the financial capability of enterprises and the demand for audit quality: enterprises with strong financial capability are more inclined to choose high-quality audit services [10]. Ismail et al. studied the relationship between the effectiveness of the audit committee, the internal audit function and the delay of financial reports, and found that an effective audit committee and a strong internal audit function can reduce the delay of financial reports [11]. Oussii and Boulila provided evidence on the relationship between the financial expertise of the audit committee and the effectiveness of the internal audit function, pointing out that an audit committee with rich financial expertise can improve the effectiveness of the internal audit [12]. This research has shown that various aspects of auditing, such as audit experience, readability of financial statements, the internal audit function, and audit committee expertise, all have an impact on audit quality and the reliability of financial reporting. It provides a theoretical basis and empirical support for further exploring how to improve the quality of financial reports by improving the audit process and methods.

At present, the financial audit field faces multiple challenges, including low audit efficiency, inaccurate risk identification, insufficient data processing ability and the lack of a real-time audit process. Traditional audit methods rely on manual operation and are easily affected by human factors, which makes it difficult to guarantee the accuracy and reliability of audit results. With the explosive growth of enterprise financial data, how to efficiently process and analyze data and discover potential risks in time has become an urgent problem. The data sources involved in the audit process are diverse and their formats differ, and the complexity of data integration and cleaning increases the difficulty of the audit. In view of these problems, the purpose of this study is to build an efficient and accurate financial audit automation system by introducing artificial intelligence technology, especially the random forest algorithm and real-time data processing technology, aiming to improve audit efficiency, enhance risk identification ability, optimize the data processing process, and realize a real-time audit process.

In order to achieve the research objectives, this study adopts a number of advanced technologies and methods. The random forest algorithm is used to analyze and predict financial data and automatically identify abnormal transactions and potential risks; the algorithm improves the accuracy and robustness of the model by constructing multiple decision trees and randomly selecting features at each node for splitting. Real-time data processing platforms such as Kafka and Flink are introduced to realize real-time collection, processing and analysis of massive financial data, ensuring a dynamic and timely audit process. The application of explainable AI techniques, such as LIME and SHAP, improves the transparency and explainability of the model, so that auditors can understand and interpret the audit results and gain trust in the audit conclusions. A dynamic feedback and continuous learning system is established to collect and analyze user feedback, continuously optimize the audit strategy and model parameters, and achieve continuous improvement of the system.

Figure 1: Schematic diagram of research content (financial audit challenges, data integration, AI technology, risk identification, random forest algorithm, real-time data processing)

As shown in Figure 1, the implications of this research for the current scientific field are reflected in several ways. By applying artificial intelligence technology to financial auditing, audit efficiency and accuracy are improved, human errors are reduced, and the reliability of audit results is enhanced. The real-time data processing technology and dynamic feedback mechanism introduced in the study ensure the real-time performance and flexibility of the audit process, and meet the needs of modern enterprises for fast response and real-time monitoring. By increasing the level of automation in the audit process, this study frees the auditor's energy to focus on higher-level analysis and
decision-making, and improves the value and effectiveness of the overall audit work. The results of this study have practical significance for the financial audit industry, provide a useful reference for automation and intelligence in other fields, and promote the application and development of artificial intelligence technology across a wider range of fields.

To better illustrate the position of the current study relative to the existing literature, we have summarized the related research in Table 1 in terms of audit accuracy, audit efficiency, the model types used, and dataset sizes. The table shows that the current study outperforms previous research in both audit accuracy and audit efficiency. Specifically, the current study uses a combination of random forest algorithms and real-time data processing techniques, achieving 90% audit accuracy and 87% audit efficiency, particularly when handling large datasets. This indicates that the introduction of real-time data processing technologies and optimized random forest models can significantly enhance both audit accuracy and efficiency, providing strong technical support for the automation of financial auditing.

Table 1: Comparison of related research with current study

Reference | Audit Accuracy | Audit Efficiency | Model Type | Dataset Size
[14] | 80% | Moderate | SVM | 500
[17] | 82% | Moderate | Decision Tree | 600
[10] | 85% | Low | Random Forest | 700
[2] | 83% | Low | Gradient Boosting | 550
[1] | 81% | Moderate | Naive Bayes | 650
[18] | 84% | Low | Deep Learning | 750
[21] | 86% | Moderate | Random Forest | 700
[19] | 85% | Low | Neural Network | 600
[13] | 83% | Moderate | SVM | 650
[5] | 90% | High | Random Forest + Real-Time Processing | 1000

Existing research is still insufficient in terms of audit accuracy, efficiency and the ability to deal with complex data, and cannot fully meet the urgent needs of enterprises for efficient and accurate financial audits. Therefore, this study aims to break through these bottlenecks and explore better financial audit automation methods through innovative technology integration, providing solid guarantees for corporate financial management.

When processing large-scale, high-dimensional financial data, support vector machines (SVMs) have high computational complexity, are prone to overfitting, and are sensitive to the choice of kernel function, making it difficult for them to adapt to the diversity and complexity of financial data. Although the gradient boosting algorithm performs well in some scenarios, it is sensitive to outliers, and financial data often contain abnormal transaction records, which affects the accuracy and stability of the model. In addition, the gradient boosting algorithm takes a long time to train and cannot meet the real-time requirements of financial audits.

In the process of financial audit automation, existing methods struggle to meet the needs of enterprises for efficient and accurate audits. How, then, can the random forest algorithm be deeply integrated with real-time data processing technology so that massive data can be processed, sensitivity to subtle anomalies in complex financial data improved, and more comprehensive and accurate risk identification and auditing achieved? This is the key issue that needs to be explored.

This study aims to build a financial audit automation system based on random forests and real-time processing technology. It strives to achieve a 35% increase in audit efficiency and to shorten data processing time by more than half; at the same time, it aims to increase audit accuracy to 92% and reduce the false alarm rate to less than 8%, providing enterprises with efficient and reliable financial
audit services, and helping enterprises strengthen financial management and risk prevention and control.

We hypothesize that the combination of the random forest algorithm and real-time data processing technology in financial auditing can significantly improve audit efficiency. Random forests can mine complex data features through the parallel processing of multiple decision trees, and real-time processing technology ensures real-time data analysis. Working together, the two shorten the audit cycle, improve accuracy, and reduce false alarms, thereby achieving the improvement in audit efficiency and accuracy targeted by the research objectives.

When discussing the application of real-time processing technology in financial auditing, an earlier solution was to use a real-time processing framework with low latency and high throughput to monitor financial transaction data in real time. By setting a sliding window, the system can analyze the data in the window in real time and detect abnormal transactions, essentially completing the detection within seconds. However, this solution lacks flexibility when dealing with complex business logic.

In comparison, this study uses a combination of Kafka and Flink. Kafka serves as a data buffer and distribution platform, efficiently collecting and temporarily storing financial data to ensure the stability of data transmission, while Flink is responsible for real-time processing and analysis of the data. Flink not only guarantees low latency but also, through its powerful stream processing functions, can properly handle complex financial audit logic, such as multi-dimensional financial indicator correlation analysis and risk assessment under complex business processes, showing good adaptability and processing capability.

In terms of anomaly detection algorithms, one approach identifies abnormal data by building probabilistic relationships between financial data. This approach focuses on the dependency relationships between data and determines whether the data is abnormal by analyzing the probabilistic connection between data points. In contrast, the random forest algorithm used in this study is better able to extract and classify data features by virtue of the ensemble learning of multiple decision trees. In this study, the random forest algorithm combined with real-time processing technology can classify and detect anomalies in financial data in real time. This is more in line with the timeliness requirements of modern financial auditing and allows potential risks to be detected in a more timely manner in the ever-changing financial environment.

In the context of the widespread application of artificial intelligence, research results in related fields have provided valuable ideas and references for our exploration of financial auditing. For example, some studies focus on the application of artificial intelligence in complex business processes and analyze in depth how to build a sustainable implementation model, prompting us to consider how to use artificial intelligence technology more efficiently to optimize audit processes and strategies in financial audit automation [7]. Other studies have demonstrated successful cases of using innovative methods in complex system decision-making, which is consistent with our goal of achieving financial audit automation through random forests and real-time stream processing technology, and of improving audit efficiency and risk detection capabilities in complex financial data environments [8].

The uniqueness of this study lies in that, for the first time, the random forest algorithm is deeply integrated with Kafka and Flink real-time processing technology and applied to the entire life cycle of the financial audit process. In terms of audit process optimization, real-time processing technology realizes real-time collection, analysis and feedback of audit data, transforming the traditional post-audit into an in-process audit and shortening the audit cycle from an original average of 15 days to 7 days. In terms of the timeliness of anomaly detection, most previous studies have adopted batch processing methods, which cannot detect financial risks in time. This system can process and analyze data at the moment it is generated; once an anomaly is detected, an alarm is issued immediately, providing strong support for enterprises to take timely risk response measures. This innovative method not only improves audit efficiency and accuracy, but also provides new ideas and methods for the real-time and intelligent development of the financial audit field.

2 Materials and methods

2.1 Data collection and sample selection

2.1.1 Data collection and sample selection

Data collection and sample selection are key steps in the research of AI-based financial audit automation. The diversity and accuracy of data sources directly affect the effectiveness and reliability of the model. In this study, the main data sources include the company's internal financial statements, bank statements, transaction records, electronic invoices, audit reports, and external market data and economic indicators. In order to ensure the comprehensiveness and representativeness of the data, the financial data of a number of enterprises from 2015 to 2023 were selected, covering manufacturing, service, retail and other industries. The data include the daily operation data of the companies as well as key financial reports such as quarterly and annual reports [1].

Data sources include public databases such as Yahoo Finance, which provide real-time and historical financial market data, company financial statements, and so on. The ETL process first extracts real-time streaming data through the integration of Flink and Kafka to ensure high throughput and low latency. Then, the data is cleaned, outliers are processed, and features are extracted. Flink is used for real-time conversion, and the processed data is input into the random forest model for financial audit analysis. Finally, the converted data and model output are stored in a database or real-time data warehouse to ensure real-time
monitoring and automated financial auditing, and timely detection of anomalies and potential financial risks.

According to the previous estimate of the data volume, in the early stage of system operation the amount of data grows by only a small number of records per day on average, and the frequency of data generation is relatively stable. Testing showed that setting the number of Kafka partitions to 8 meets the parallelism requirements of data processing. For example, in a high-concurrency scenario, 8 partitions allow 8 consumers to process data at the same time, effectively reducing data backlogs. When the data volume fluctuates, Kafka's partition adjustment mechanism can be used to adjust the number of partitions according to the rate and accumulation of data generation, ensuring that the system always maintains efficient operation.

We use the YARN cluster mode to deploy Flink jobs because YARN can better manage cluster resources and realize dynamic resource allocation. The parallelism is set to 16 according to the complexity of the task and the number of CPU cores in the cluster. Each parallel task is allocated 2 GB of memory, based on monitoring and analysis of task memory usage: across multiple tests, 2 GB per task gave the highest task execution efficiency without memory overflow. The YARN cluster is configured with 10 nodes, each with an Intel Xeon Platinum 8380 CPU and 32 GB of memory, to meet the hardware resource requirements of the Flink jobs.
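To make the partition-level parallelism concrete, the following is a minimal sketch of how transaction records could be consumed on the application side with the kafka-python client; the topic name, broker address and message schema are illustrative assumptions rather than details taken from the study.

    # Minimal sketch: one consumer in a consumer group; with 8 partitions,
    # up to 8 such processes can read the topic in parallel.
    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "financial-transactions",            # hypothetical topic name
        bootstrap_servers="localhost:9092",  # hypothetical broker address
        group_id="audit-workers",            # consumers in one group share partitions
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
        auto_offset_reset="latest",
    )

    for record in consumer:
        txn = record.value  # e.g. {"company": ..., "amount": ..., "account_type": ...}
        # Hand the transaction to the real-time processing / anomaly-detection logic.
        print(record.partition, txn.get("company"), txn.get("amount"))

Running eight such processes under one group_id would mirror the 8-partition, 8-consumer configuration described above.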
In this study, the "real-time" processing achieved with Kafka and Flink refers to processing at the transaction level. That is, once new financial data is generated, Kafka immediately receives it and quickly passes it to Flink for processing, with almost no delay, in contrast to daily or other low-frequency batch processing. Through this real-time processing, Flink can calculate key financial indicators such as accounts receivable turnover and cash flow in a very short time. The real-time availability of these indicators allows auditors to promptly detect subtle changes in the company's financial situation and quickly discover potential risks. For example, a sudden decrease in cash flow may indicate that the company's capital chain is strained, so that measures can be taken in advance, improving the audit process and audit efficiency. Real-time processing and analysis technology enhances risk control capability mainly through its ability to monitor financial data in real time: once the data fluctuates abnormally, the audit system immediately issues an early warning, allowing auditors to intervene in time and reduce the company's financial risks.
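As a simple illustration of the kind of early-warning rule described here, the sketch below flags a sudden drop in cash flow relative to a short moving average; the window size and the 30% threshold are illustrative assumptions, not parameters reported in the study.

    # Illustrative early-warning rule for a sharp cash-flow decrease.
    from collections import deque

    WINDOW, DROP_THRESHOLD = 12, 0.30
    recent = deque(maxlen=WINDOW)

    def check_cash_flow(value):
        """Return True if the latest cash-flow figure falls sharply below the recent mean."""
        alert = False
        if len(recent) == WINDOW:
            baseline = sum(recent) / WINDOW
            if baseline > 0 and (baseline - value) / baseline > DROP_THRESHOLD:
                alert = True  # auditors would be notified at this point
        recent.append(value)
        return alert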
In the process of data collection, data cleaning and preprocessing are essential steps. Duplicate data and data items with obvious errors are eliminated to ensure data consistency and accuracy. Missing values are handled, for example by mean filling or interpolation. For outliers, the 3σ principle is adopted to detect and process them and ensure the reasonableness of the data. To improve data quality and usability, data standardization is also carried out to convert data from different sources and formats into a unified format, facilitating subsequent integration and analysis.

When dealing with outliers, we adopted the 3σ principle, which means detecting and handling outliers that exceed 3 standard deviations from the mean. This method is simple and effective, but it may have limitations in some cases, such as when the data distribution is not normal. In contrast, isolation forests or z-score-based methods can better adapt to non-normally distributed data.

In the process of data integration, data warehouse technology is used to store decentralized financial data on a unified platform, and the data is extracted, transformed and loaded through the ETL process. To meet the needs of real-time data processing in particular, the Kafka stream processing platform is introduced to realize real-time data acquisition and analysis and ensure the timeliness of the data. In order to further improve the efficiency and accuracy of data analysis, feature engineering is also carried out to extract and select multi-dimensional features from the original data. In the audit, for the detection of financial fraud, key financial indicators including the revenue growth rate, cost change rate and accounts receivable turnover rate are extracted as the input features of the model. These characteristics reflect the financial health of the enterprise and can effectively identify potential financial risks. The process of data collection and sample selection thus includes data source selection, data cleaning and preprocessing, data integration and real-time processing, as well as the application of feature engineering, ensuring the comprehensiveness, accuracy and timeliness of the data and providing the data foundation for the subsequent construction and optimization of the audit automation model [2].

Through correlation analysis, we found that the revenue growth rate and accounts receivable turnover rate are highly correlated with financial risk, so these two features are used as important inputs of the model. In addition, through PCA analysis, we further verified the effectiveness of these features after dimensionality reduction.

Ethical considerations, data anonymization, and dataset reproducibility. Ethical considerations are crucial when collecting financial data. We strictly abide by data protection regulations to ensure that the data sources are legal and compliant. For data obtained from public databases and internal reports, strict anonymization is performed: all information that can directly or indirectly identify individuals is removed, for example by replacing company names with codes and desensitizing key personnel information. To ensure the reproducibility of the dataset, the data collection process, tools used and parameter settings are recorded in detail. For example, when extracting data from public databases such as Yahoo Finance, the SQL query statements and Python scripts are recorded to allow other researchers to reproduce the research process and to ensure the scientific rigor and credibility of the research.

The dataset reflects real-world conditions and potential biases. The dataset used in this study is
comprehensive, covering financial data of companies in multiple industries from 2015 to 2023, including manufacturing, service and retail, and involving various aspects of company daily operations as well as quarterly and annual reports. However, potential biases may still exist. The data mainly come from companies with data disclosure capabilities, which may overlook some small, micro or emerging enterprises. In addition, different industries have different financial characteristics and risk patterns; although the samples cover multiple industries, they may be underrepresented in certain segments, resulting in limited adaptability of the model in these special scenarios.

2.1.2 Data cleaning and preprocessing steps

In the research of financial audit automation based on artificial intelligence, the data cleaning and preprocessing step is very important: it ensures the accuracy and consistency of the input data and thus improves the performance and reliability of the model. The first step in data cleaning is to remove duplicate records and incorrect data. Duplicate transaction records in financial statements are checked and deleted using unique identifiers to ensure the uniqueness of the data. Logical rules and domain knowledge are used to detect and correct obvious errors, such as negative revenue records or unreasonable transaction amounts. Dealing with missing values is a key step in data preprocessing. Several methods are used, including mean filling, median filling and the K-nearest neighbor algorithm. Missing financial indicators, such as sales in a quarter, can be filled by calculating the average sales of similar enterprises to ensure data integrity.

When dealing with outliers, the 3σ principle is adopted, that is, abnormal data exceeding 3 standard deviations from the mean is detected and processed. Based on the detected abnormal values, manual verification is carried out according to the actual business logic, and the confirmed abnormal data is removed or adjusted. For an expense that is higher than the industry average, further verification is conducted to confirm whether it stems from data entry errors or unusual financial activity. Data standardization is a step in data preprocessing. The z-score standardization method is used to convert the data into a standard normal distribution, which eliminates the dimensional differences between different financial indicators and enhances the stability of the model. The financial data of different enterprises, such as revenue, cost and profit, are standardized so that all indicators can be analyzed and compared on the same scale [3].

Figure 2: Preprocessing steps (A: data cleaning, B: missing values, C: outlier detection, D: data standardization, E: real-time processing, F: feature engineering)

As shown in Figure 2, in order to ensure the timeliness and real-time performance of the data, real-time data processing frameworks such as Apache Kafka and Apache Flink are introduced into the preprocessing process to realize real-time processing and analysis of the data flow. Through this framework, real-time financial data can be cleaned and preprocessed in time to ensure the freshness and accuracy of the data. Feature engineering plays a key role in data preprocessing. Multi-dimensional feature extraction and selection are carried out on the financial data: key financial indicators such as the revenue growth rate, cost change rate and asset-liability ratio are extracted from the original transaction data, and correlation analysis is carried out to select the features that affect the audit model. These preprocessing steps ensure the high quality of the data and lay the foundation for the subsequent training and optimization of the audit automation model.
checked and deleted by unique identifiers to ensure the
change rate and asset-liability ratio are extracted from the
uniqueness of data. Use logical rules and domain
original transaction data, and correlation analysis is
knowledge to detect and correct obvious errors, such as
carried out to select the features that have an impact on the
negative revenue records or unreasonable transaction
audit model. The re-processing steps ensure the high
amounts. Dealing with missing values is a key step in data
quality of the data and lay the foundation for the
preprocessing. Many methods are used to deal with
subsequent training and optimization of the audit
missing values, including mean filling method, median
automation model.
filling method and K-nearest neighbor algorithm. Missing
financial indicators, such as sales in a quarter, can be filled
2.1.3 Financial data integration
by calculating the average sales of similar enterprises to
ensure data integrity. In the research of financial audit automation based on
When dealing with outliers, the 3σ principle is artificial intelligence, financial data integration is a key
adopted, that is, abnormal data that exceeds 3 standard step to realize comprehensive data analysis and real-time
deviations of the mean value is detected and processed. processing. Integrate data from multiple sources into a
Based on the detected abnormal values, manual unified database for centralized processing and analysis.
verification is carried out according to the actual business Key financial indicators such as revenue, cost, profit,
logic, and the confirmed abnormal data is removed or accounts receivable, and accounts payable are shown
adjusted. For an expense that is higher than the industry below.
average, further verification is conducted to confirm the Table 2: Financial data integration
presence of data entry errors or unusual financial activity.
Data standardization is a step-in data processioning. The Reve Profi Play
Cost Receiv
z-score standardization method is used to convert the data Compa nue t able
Ye (mill ables
into a standard normal distribution, which can eliminate ny (mill (mill (mill
ar ion (millio
the dimensional differences between different financial Name ion ion ion
$) n $)
indicators and enhance the stability of the model. The $) $) $)
financial data of different enterprises such as revenue, cost Tech
20 143. 97.5 46.1 36.2
and profit are standardized, so that all indicators are Solutio 50.37
15 67 4 3 4
analyzed and compared under the same dimension [3]. ns Inc.
Tech
20 153. 103. 50.1 38.7
Solutio 53.89
16 45 29 6 6
ns Inc.
Green
20 165. 110. 55.2 41.2
Energy 60.34
17 78 56 2 1
Corp.
Green
20 172. 115. 56.7 44.3
Energy 63.96
18 14 37 7 2
Corp.
20 Health 188. 129. 58.6 68.48 47.8
As shown in Table 2, data warehouse technology is used in the data integration process to realize data extraction, conversion and loading through the ETL process. Relevant financial data are extracted from different data sources (such as ERP, CRM and e-invoice systems) to ensure the comprehensiveness and completeness of the data. Data from different sources undergo format conversion and standardization, such as unifying date formats and converting currency units, to ensure data consistency and comparability. The converted data are loaded into a unified database, and partitioning and indexing techniques are used to improve the efficiency of data query and processing.

In order to meet the needs of real-time data processing, the Kafka stream processing platform is introduced to realize real-time data acquisition and processing. Transaction records and bank statements are monitored in real time, and accounts receivable and accounts payable data are updated to support dynamic financial audit analysis. The data integration step ensures the high quality and timeliness of the data and provides the data foundation for the subsequent construction and optimization of the audit automation model [4].

2.1.4 Real-time data processing and analysis

In the research of financial audit automation based on artificial intelligence, real-time data processing and analysis is the key to ensuring the efficiency and accuracy of the audit process. The system uses advanced stream processing technology to realize real-time processing and analysis of financial data. Stream processing platforms such as Kafka and Flink are introduced into the system to support real-time monitoring and analysis of large-scale financial data.

Table 3: Real-time financial transactions

Transaction ID | Timestamp | Company Name | Transaction Amount ($) | Account Type | Transaction Type | Balance (million $)
TXN 001 | 2024-01-01 10:00:00 | Tech Solutions Inc. | 1200.45 | Receivables | Credit | 52.38
TXN 002 | 2024-01-01 10:05:00 | Green Energy Corp. | 850.78 | Payables | Debit | 45.12
TXN 003 | 2024-01-01 10:10:00 | Health Plus Ltd. | 1560.90 | Receivables | Credit | 72.55
TXN 004 | 2024-01-01 10:15:00 | Auto Tech Global | 1120.23 | Payables | Debit | 49.34
TXN 005 | 2024-01-01 10:20:00 | Food Innovations Inc. | 1330.67 | Receivables | Credit | 64.78
TXN 006 | 2024-01-01 10:25:00 | Tech Solutions Inc. | 975.34 | Payables | Debit | 50.85

As shown in Table 3, the Kafka platform enables real-time acquisition and processing of transaction data from various data sources, such as sales systems, banking interfaces, and supply chain management systems. Kafka's high throughput and low latency ensure timely data transmission and processing. Flink is used for real-time data analysis and processing; with Flink, various key financial indicators, such as accounts receivable turnover and cash flow, can be calculated in real time. Using such data, the changes in Tech Solutions Inc.'s accounts receivable and accounts payable can be monitored in real time, and the company's cash flow and financial health can be calculated instantly through the stream processing algorithm.

Using machine learning algorithms, anomaly detection modules are embedded in the data stream to identify and flag suspicious transactions in real time. By analyzing unusual changes in transaction amount and frequency, potential financial fraud is detected in a timely manner. This real-time processing and analysis technology improves audit efficiency and enhances risk control ability in the audit process. The real-time data processing and analysis method provides strong support for the automation of financial auditing and ensures the accuracy and timeliness of the data.
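As an illustration of an anomaly-flagging step embedded in the transaction stream, the sketch below flags a transaction whose amount deviates strongly from that company's recent history; the window size, warm-up length and 3-sigma cut-off are illustrative assumptions.

    # Illustrative per-company anomaly flag on streaming transaction amounts.
    from collections import defaultdict, deque
    import statistics

    history = defaultdict(lambda: deque(maxlen=50))  # recent amounts per company

    def flag_transaction(company, amount):
        amounts = history[company]
        suspicious = False
        if len(amounts) >= 10:
            mean = statistics.mean(amounts)
            std = statistics.pstdev(amounts)
            if std > 0 and abs(amount - mean) > 3 * std:
                suspicious = True  # raise an alert for auditor review
        amounts.append(amount)
        return suspicious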
2.2 Model construction

2.2.1 Selection of audit automation model

In the study of AI-based financial audit automation methods, model selection is a key step in ensuring the efficiency and accuracy of the audit process. According to the research objectives and data characteristics, the random
forest algorithm is chosen as the core audit automation model. This choice is based on the superior performance of random forests in processing large-scale, high-dimensional data, as well as their high accuracy and robustness in classification and regression tasks. The random forest algorithm classifies and predicts data by constructing multiple decision trees and splitting on randomly selected features at each node. Its advantages include the ability to handle a large number of input variables, resistance to overfitting, and robustness to missing data. The specific model selection and construction process is as follows:

Feature selection: Key features are extracted from the integrated financial data, such as the revenue growth rate, accounts receivable turnover, asset-liability ratio, and cash flow. These characteristics fully reflect the financial health and potential risks of the enterprise. In total, 500 characteristics were extracted from the financial data of several enterprises, covering revenue, cost, profit, accounts receivable and accounts payable.

We selected the three key features of revenue growth rate, cost change rate and accounts receivable turnover rate from the 500 initial features using a combination of stepwise regression and correlation analysis. First, we performed univariate correlation analysis between all features and the target variables (such as financial risk indicators), selected features with high correlation (absolute value greater than 0.5), and initially reduced the number of features to about 100. Then, we used stepwise regression to introduce the initially selected features into the regression model one by one, and selected the feature combination with the best model fit and the fewest variables based on indicators such as the AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). In the correlation analysis, we used the Pearson correlation coefficient to calculate the correlation between each feature and the financial risk indicator on the quarterly financial data of the past 5 years. For example, the Pearson correlation coefficient between the revenue growth rate and the financial risk indicator is 0.7, indicating a strong positive correlation between the two. To verify the validity of the features by principal component analysis (PCA), we first standardized the three selected features, then calculated the covariance matrix and solved for the eigenvalues and eigenvectors. The number of principal components was determined by requiring a cumulative contribution rate of more than 90%. The results show that the cumulative contribution rate of the first two principal components reaches 92%, indicating that these three characteristics can effectively explain most of the information in the financial data.
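The two-stage feature screening described above can be sketched as follows: a Pearson correlation filter (|r| > 0.5) followed by a PCA check on how much variance the retained features explain. The stepwise regression with AIC/BIC is omitted for brevity, and the feature and target names are assumptions.

    # Sketch of correlation-based feature screening plus a PCA check.
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def screen_features(X: pd.DataFrame, y: pd.Series, threshold: float = 0.5):
        # Keep features whose absolute Pearson correlation with the risk label exceeds the threshold.
        corr = X.apply(lambda col: col.corr(y))          # Pearson by default
        selected = corr[corr.abs() > threshold].index.tolist()

        # Cumulative explained-variance ratio of the retained (standardized) features.
        scaled = StandardScaler().fit_transform(X[selected])
        pca = PCA().fit(scaled)
        cumulative = pca.explained_variance_ratio_.cumsum()
        return selected, cumulative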
Data set partitioning: The data set is divided into a training set and a test set to ensure the generalization ability of the model. Typically, 70% of the data is used for training and 30% for testing. There are 5,000 records in total; the training set contains 3,500 records and the test set contains 1,500 records [5].

To determine the number of trees, we ran experiments with 50, 100, 150, and 200 trees. On the training set, the accuracy of the model gradually increases with the number of trees, but the improvement slows once the number of trees exceeds 100. On the validation set, the recall rate is highest with 100 trees and overfitting is not apparent. Therefore, considering the generalization ability and computational cost of the model, the number of trees is set to 100. For the maximum depth, we started testing at a depth of 5 and gradually increased it. At a depth of 10, the accuracy of the model on the training set reaches 90% and the accuracy on the validation set remains at around 85%. Increasing the depth further raises the training accuracy slightly but lowers the validation accuracy, indicating overfitting. Therefore, the maximum depth is set to 10 to balance capturing data features and preventing overfitting.

Model training: A random forest model is trained on the training set to build a forest containing 100 decision trees. Each decision tree is generated from a bootstrap sample of the training set. The goal of the model is to minimize the classification error rate, as shown in formula (1).

E = \frac{1}{N} \sum_{i=1}^{N} I(y_i \neq \hat{y}_i)    (1)

Model evaluation: Model performance is evaluated on the test set by calculating metrics such as accuracy, recall, and F1 score. Among the 1,500 test records, the model correctly classifies 1,400 and misclassifies 100, so the accuracy of the model is given by formula (2).

Accuracy = \frac{1400}{1500} = 0.9333    (2)

The recall rate and F1 score are calculated as shown in formulas (3) and (4).

Recall = \frac{TP}{TP + FN}    (3)

F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}    (4)

Model optimization: Model performance and stability are improved by adjusting parameters such as the number of trees, the maximum depth, and the minimum number of samples required for a split. Cross-validation was used to further verify the generalization ability of the model.

Through these steps, the random forest algorithm can effectively identify and classify various financial anomalies and risks in the automation of financial auditing and provide accurate and reliable audit results. This method improves audit efficiency, strengthens risk control ability, and supports the healthy development of enterprise financial management.
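The training and evaluation set-up of Section 2.2.1 can be sketched with scikit-learn as follows: 100 trees, maximum depth 10, a 70/30 split, and accuracy, recall and F1 on the held-out set. Here X and y are assumed to be the prepared feature matrix and binary risk labels for the 5,000 records; they are assumptions, not artifacts provided with the paper.

    # Sketch of random forest training and evaluation under the settings above.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, recall_score, f1_score

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)        # 3,500 / 1,500 records

    rf = RandomForestClassifier(
        n_estimators=100,     # number of trees chosen in the experiments above
        max_depth=10,         # depth at which validation accuracy peaked
        bootstrap=True,       # each tree is fit on a bootstrap sample
        random_state=42)
    rf.fit(X_train, y_train)

    y_pred = rf.predict(X_test)
    print("accuracy:", accuracy_score(y_test, y_pred))
    print("recall:  ", recall_score(y_test, y_pred))
    print("F1:      ", f1_score(y_test, y_pred))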
2.2.2 Model architecture design

In the research of financial audit automation methods based on artificial intelligence, the design of the model architecture is the core step in building an efficient and accurate audit automation system. As the core model, the random forest algorithm can give full play to its advantages in processing
high-dimensional and large-scale data through a well-designed architecture.

In the model architecture, the main function of the "data input layer" is to receive raw financial data from different data sources, including internal corporate financial statements and bank statement records, and to perform preliminary format verification and missing-value marking on the data. For example, data in date format is checked against a unified standard format, and missing positions are marked for subsequent processing. The "feature extraction layer" builds on the data input layer and processes the raw data in depth to extract information that reflects the financial status and risk characteristics of the enterprise. For example, financial ratios such as the debt-to-asset ratio and gross profit margin are calculated from financial statement data, and trend and seasonal features are extracted from time series data. Data conversion changes the form of the raw data to meet the input requirements of the model, such as converting text data into numerical data and applying one-hot encoding to categorical data. Data standardization normalizes numerical data so that data with different features share the same scale; commonly used methods include z-score standardization and min-max standardization. Through these clear functional definitions and operation processes, the clarity of the model architecture and the efficiency of data processing are ensured.

Data input layer: This layer is responsible for receiving and processing financial data from a variety of data sources, such as ERP systems, bank statements, and electronic invoices. The data input layer needs to realize real-time data acquisition and preprocessing to ensure the integrity and consistency of the input data [6].

In the data standardization layer, we used the z-score standardization method to convert data from different sources to the same scale and eliminate dimensional differences.

Below is the pseudo code of the model framework.

    # Data collection and real-time ingestion
    data = collect(['internal', 'bank', 'external'])
    cleaned = clean(data)
    unified = integrate(cleaned)
    kafka = setup_kafka()
    while True:                        # streaming loop; runs continuously alongside the steps below
        new = kafka.get()
        process(new)
        detect_anomaly(new)

    # Model construction
    features = select(unified)
    train, test = split(features, 0.7)           # 70/30 split
    rf = RandomForest(n_trees=100, max_depth=10)
    rf.train(train)

    # Evaluation and optimization
    pred = rf.predict(test)
    metrics = evaluate(pred, test.labels)
    rf = optimize(rf, train, cv=5)               # cross-validated tuning

    # Audit task planning
    tasks = define_tasks()
    paths = define_paths(tasks)
    path = a_star(tasks, paths)

    # Audit execution
    for task in path:
        audit(task)
        report(task)

To ensure that other researchers can repeat our research process, we describe the specific steps of data collection in detail. The data is mainly obtained from internal databases, public financial reports, and third-party financial data providers. Specifically, the internal database of the company provides real-time updated financial records; public financial reports are obtained through stock exchanges and company websites; third-party financial data providers supplement industry benchmark data. In addition, we recorded the SQL query statements and Python scripts used for data extraction in detail to ensure the consistency and integrity of the data. The ETL process includes data extraction using Apache NiFi, data cleaning and transformation with the Pandas library, and finally loading into the data warehouse using Apache Hive. These detailed steps ensure the transparency and repeatability of the data processing process.

Feature extraction layer: In this layer, key features are extracted from the raw data, including but not limited to the revenue growth rate, accounts receivable turnover, asset-liability ratio, and cash flow. The purpose of feature extraction is to transform complex raw data into a simplified representation that the model can process.

Data standardization layer: In order to eliminate dimensional differences between features, the data standardization layer standardizes the extracted features, as shown in formula (5).

Z = \frac{X - \mu}{\sigma}    (5)

Model training layer: This layer contains the concrete implementation of the random forest algorithm, which builds multiple decision trees from the training set data. Each decision tree is generated from a bootstrap sample of the training set and split on randomly selected features at each node. The prediction of the random forest algorithm is shown in formula (6).

\hat{y} = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_k(x)\}    (6)

Model optimization layer: In order to improve the performance and stability of the model, the model optimization layer optimizes the model through parameter tuning and cross-validation. The parameters include the number of decision trees, the maximum depth and the minimum number of samples required for a split [10].

Prediction layer: After model training is complete, the prediction layer is responsible for making predictions on the test set and outputting the results. The main task of the prediction layer is to evaluate the performance of the model, including accuracy, recall, and F1 score. The accuracy on the test set is 93.33%, as shown in formula (7).
Accuracy = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}    (7)

In the anomaly detection model of financial auditing, we define a "correct" classification as follows: for a transaction record, if its financial indicators fall within the reasonable ranges specified by accounting standards, and in-depth analysis reveals no signs of financial fraud, such as fictitious income or concealed expenses, the transaction is judged to be normal. Conversely, if indicators in the transaction data fluctuate abnormally, or differ significantly from the company's past operating data and the industry average, and data analysis reveals possible clues of financial fraud, such as a mismatch between income and costs or abnormal cash flow, the transaction is judged to be abnormal. In the anomaly detection process, the model first extracts multi-dimensional features of the input financial data, including financial ratios and trend analysis. The trained classifier then determines whether the data belongs to the normal or abnormal class according to preset thresholds and decision rules. For example, when the accounts receivable turnover rate is lower than a certain percentage of the industry average and the revenue growth rate fluctuates significantly in a short period, the model judges the transaction as abnormal, triggering further audit investigation.

Anomaly detection layer: During the audit process, the anomaly detection layer is responsible for identifying and flagging suspicious financial activity. By analyzing unusual changes in transaction amount and frequency, the model can detect potential financial fraud in real time. In order to evaluate the real-time fraud detection system more comprehensively, we added the evaluation of false positive and false negative rates, which better measure the accuracy and robustness of the system. Specifically, we calculated the false positive and false negative rates from the confusion matrix and analyzed their impact on system performance. The results show that the system has a low false positive rate, meaning that normal activities are rarely mislabeled as abnormal; at the same time, the false negative rate is effectively controlled, ensuring that potential risks are not ignored. These results further confirm the efficiency and accuracy of the system in real-time fraud detection.

In addition to false positives and false negatives, the anomaly detection layer also focuses on precision and recall. Precision is the proportion of samples correctly identified as anomalies among all samples identified as anomalies, reflecting the accuracy of the model in identifying anomalies. Recall is the proportion of samples correctly identified as anomalies among all actual anomaly samples, reflecting the model's ability to detect anomalies. In financial auditing, high precision reduces the misjudgment of normal transactions and lowers audit costs, while high recall ensures that more potential financial risks are discovered. Through a combined evaluation of precision and recall, the performance of the anomaly detection layer can be measured more comprehensively.
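The metrics discussed here can be derived from a confusion matrix as sketched below, assuming binary labels in which 1 marks an abnormal transaction; the helper function and its name are illustrative, not part of the study's code.

    # Sketch: false-positive rate, false-negative rate, precision and recall
    # of the anomaly detector from a confusion matrix.
    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    def detector_report(y_true, y_pred):
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return {
            "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
            "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
        }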
Feedback and improvement layer: This layer compares the model's predicted results with the actual audit results and continuously improves and optimizes the model based on the feedback. Through iterative cycles, the accuracy and robustness of the model are continuously improved. Through this architecture design, the application of the random forest algorithm in financial audit automation is fully optimized. The systematic design ensures the efficiency and accuracy of the model when dealing with large-scale, high-dimensional financial data and provides powerful technical support for enterprise financial audits [11].

2.2.3 Configuring the data layer and processing layer

In the research of financial audit automation based on artificial intelligence, the configuration of the data layer and processing layer is a key part of model construction, which directly affects the efficiency and performance of the system. The data layer is responsible for storing and managing financial data, while the processing layer is responsible for data cleaning, transformation, analysis and modeling; the configuration of the data layer needs to account for the diversity of the data and storage efficiency.

After re-evaluating the data distribution, we found that the current data set does not meet the normal distribution assumption. Therefore, we use the isolation forest algorithm instead of the 3σ principle for outlier detection. The isolation forest algorithm is based on the principle that, in high-dimensional space, normal data points tend to cluster together while abnormal data points are relatively isolated. The algorithm constructs multiple random binary trees to partition the data points randomly and calculates the path length of each data point in the trees: the shorter the path length, the more isolated the data point and the more likely it is an outlier. In practice, we first normalize the original financial data to eliminate the impact of dimension. The processed data are then fed into the isolation forest model, with the number of trees set to 100 and the subsample size set to 256 to ensure the stability and accuracy of the model. After model training is completed, for new financial data we calculate its anomaly score in the isolation forest and set a suitable threshold (such as 0.5); when the anomaly score exceeds the threshold, the data point is determined to be an outlier.
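The isolation-forest screening described above can be sketched with scikit-learn as follows, using 100 trees and a sub-sample size of 256. How the library's anomaly score is mapped onto the 0.5 cut-off is an assumption of this sketch, and X is assumed to be the numeric financial feature matrix.

    # Sketch of isolation-forest outlier screening under the settings above.
    from sklearn.ensemble import IsolationForest
    from sklearn.preprocessing import StandardScaler

    X_scaled = StandardScaler().fit_transform(X)   # normalize to remove dimensional effects

    iso = IsolationForest(n_estimators=100, max_samples=256, random_state=42)
    iso.fit(X_scaled)

    # score_samples returns higher values for "normal" points; negate so that
    # larger scores mean "more anomalous", then apply the chosen threshold.
    anomaly_score = -iso.score_samples(X_scaled)
    is_outlier = anomaly_score > 0.5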
Table 4: Financial indicators

Record ID | Year | Company Name | Gross Margin (%) | Operating Margin (%) | Return on Assets (ROA) (%) | Quick Ratio | Debt-to-Equity Ratio
001 | 2019 | Tech Solutions Inc. | 35.67 | 12.45 | 8.34 | 0.67 | 1.25
002 | 2020 | Tech Solutions Inc. | 37.12 | 13.22 | 9.11 | 0.65 | 1.30
003 | 2019 | Green Energy Corp. | 30.78 | 10.89 | 7.56 | 0.72 | 1.15
004 | 2020 | Green Energy Corp. | 32.45 | 11.34 | 8.12 | 0.70 | 1.20
005 | 2019 | Health Plus Ltd. | 28.56 | 9.45 | 6.78 | 0.75 | 1.10
006 | 2020 | Health Plus Ltd. | 29.67 | 10.12 | 7.23 | 0.73 | 1.18
007 | 2021 | Auto Tech Global | 34.89 | 12.67 | 8.56 | 0.68 | 1.22
008 | 2022 | Auto Tech Global | 36.45 | 13.45 | 9.45 | 0.66 | 1.28
009 | 2021 | Food Innovations Inc. | 33.56 | 12.12 | 8.12 | 0.69 | 1.20
010 | 2022 | Food Innovations Inc. | 35.12 | 12.78 | 8.89 | 0.67 | 1.25
As shown in Table 4, the processing layer is to make predictions on the test set. The model is classified
responsible for cleaning, transforming, analyzing, and or regression based on the prediction results of the
modeling the financial data in the data layer. The majority decision tree. Of the 300 records in the test set,
processing layer cleans the financial data in the data layer, the model classified 280 correctly and 20 incorrectly [13].
including dealing with missing values, outliers, and Model evaluation: Evaluate the performance of the
duplicate data. For missing values, the median fill method model, mainly including calculation accuracy, recall rate
is used, and for outliers, the 3σ principle is used for and F1 score.
detection and processing. Data transformation involves Parameter optimization: Optimize model
standardizing and normalizing the raw data to eliminate performance by adjusting model parameters. The cross-
dimensional differences between different features. The validation method is used to verify the generalization
processing layer improves the performance of the model ability of the model to ensure the consistency and stability
through feature extraction and feature selection. Extract of the model on different data sets.
key features from financial indicators, such as gross profit
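The step-by-step process above maps directly onto a standard scikit-learn workflow. The following sketch is illustrative only and uses synthetic data in place of the financial features named in the text (revenue growth rate, accounts receivable turnover, asset turnover, debt ratio, net profit margin):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, recall_score, f1_score

# Hypothetical data set: five columns standing in for the financial features above.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Data preparation: standardize features to remove dimensional differences.
X = StandardScaler().fit_transform(X)

# 70% / 30% split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Model training: 100 trees, each grown on a bootstrap sample.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
rf.fit(X_train, y_train)

# Model prediction and evaluation: accuracy, recall and F1 score.
y_pred = rf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("recall:  ", recall_score(y_test, y_pred))
print("F1:      ", f1_score(y_test, y_pred))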
Through the implementation process, the random forest algorithm has been effectively applied in the automation of financial audit. It improves the audit efficiency, strengthens the risk control ability, and provides technical support for the financial health management of enterprises. The systematic realization process ensures the efficiency and accuracy of the model, and further promotes the intelligent development of financial audit [14].

2.3 Training and optimization

2.3.1 Training process description

In the research of financial audit automation based on artificial intelligence, the training process is the key step to ensure the performance of the random forest model. Key characteristics are extracted from the data sheet and standardized, including revenue growth rate, accounts receivable turnover, asset turnover, debt ratio, and net profit margin. After the features are preprocessed, the input data set of the model is formed. Next, the data set is divided into a training set (70%) and a test set (30%). In the model training phase, the random forest algorithm is trained by building 100 decision trees. Each tree uses a bootstrap sampling method to extract samples from the training set. At each node, features are randomly selected for splitting to minimize the Gini coefficient. In addition, the generalization ability of the model is further improved by setting the maximum tree depth to 10 and using a 50% feature subset.

In the model training stage, the random forest algorithm is trained by constructing 100 decision trees. Each decision tree uses a bootstrap sampling method to extract samples from the training set to ensure the diversity of data and the robustness of the model. At each node, randomly selected features are split to minimize the Gini coefficient or maximize the information gain, thus building the structure of the tree. The goal of model training is to reduce the overfitting risk of a single decision tree and improve the generalization ability of the whole model through the voting results of the majority of decision trees.

We have listed the hyperparameters used for training the random forest model in detail, including the number of decision trees (set to 100), the maximum depth (set to 10), the number of features used to split a node (set to 50% of the total number of features), and other key parameters. Cross-validation methods are used to evaluate the model's performance on different datasets through repeated iterations and parameter adjustments (e.g., number of decision trees, maximum depth) to ensure high accuracy and stability. After the training process, the model's performance on the test set is used for final evaluation and validation to confirm its validity and reliability in practical applications.

2.3.2 Model optimization strategy

In the research of financial audit automation based on artificial intelligence, the model optimization strategy is the key to improving the performance of the random forest algorithm. The optimization strategy mainly includes parameter tuning, feature selection, data enhancement and model integration. In feature selection, by calculating feature importance, the features that contribute little to the model are eliminated, so as to reduce noise and improve the interpretability and efficiency of the model. Feature importance can be determined by calculating the contribution of each feature to the reduction of model impurity. In the analysis, the revenue growth rate and accounts receivable turnover rate contribute the most, and these features can be preferentially retained.

Data enhancement is another strategy to improve the robustness and generalization of the model by generating more training samples. The model integration strategy further improves the prediction performance by combining the prediction results of multiple models. Random forest and gradient boosting decision trees are combined to form an integrated model, and the advantages of different algorithms are utilized to enhance the accuracy and stability of prediction. In the concrete implementation, the random forest and GBDT are trained respectively, and then the predicted results of the two are fused by a weighted average or voting mechanism to obtain the final predicted value [15].

Through the optimization strategy, the performance of the random forest algorithm in financial audit automation has been improved, ensuring the efficiency and reliability of the model in different data sets and scenarios, and providing technical support for the financial health management of enterprises.

For situations where real-time data sources are temporarily unavailable, we have designed buffering strategies and error handling mechanisms. When the real-time data stream is interrupted, the system automatically stores the data in the memory buffer and periodically attempts to reconnect to the data source. Once the data source is restored, the data in the buffer will be quickly processed and fed into the system. In addition, the system is also configured with error handling logic. When the data source is unavailable for a long time, an alarm mechanism will be triggered to notify the administrator to troubleshoot the problem. These mechanisms ensure the stability and continuity of the system in the face of emergencies.

In the random forest algorithm, the number of trees and the tree depth are two key hyperparameters. The number of trees is chosen to be 100 because more trees can integrate the results of more decision trees, reduce the risk of overfitting of a single tree, and improve the generalization ability of the model. If the number of trees is too small, the model will not learn fully; if the number of trees is too large, the computational cost will increase and the benefits will gradually decrease. The tree depth is set to 10 to balance the complexity and accuracy of the model. If the tree is too deep, the model will overfit the training data and the generalization ability will deteriorate; if the tree is too shallow, the complex features of the data cannot be learned, which will reduce the performance of the model. A reasonable tree depth can avoid overfitting while ensuring the model's ability to capture features.
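As a concrete illustration of the hyperparameter choices and the ensemble strategy discussed in this section, the sketch below configures the random forest with 100 trees, a maximum depth of 10 and a 50% feature subset, checks generalization with cross-validation, and fuses random forest and GBDT probabilities by a weighted average. It reuses the synthetic X_train/X_test arrays from the earlier sketch; the 0.6/0.4 weights are illustrative only and are not values reported by the authors:

import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Random forest with the hyperparameters discussed above:
# 100 trees, maximum depth 10, 50% of the features considered at each split.
rf = RandomForestClassifier(n_estimators=100, max_depth=10, max_features=0.5, random_state=0)

# Cross-validation to check generalization before the final fit.
cv_acc = cross_val_score(rf, X_train, y_train, cv=5, scoring="accuracy")
print("5-fold CV accuracy: %.3f +/- %.3f" % (cv_acc.mean(), cv_acc.std()))

# Ensemble strategy: fuse random forest and GBDT by a weighted average of
# their predicted probabilities (weights here are purely illustrative).
gbdt = GradientBoostingClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
gbdt.fit(X_train, y_train)

proba = 0.6 * rf.predict_proba(X_test)[:, 1] + 0.4 * gbdt.predict_proba(X_test)[:, 1]
y_fused = (proba >= 0.5).astype(int)
print("fused accuracy:", (y_fused == y_test).mean())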
2.4 Automatic path planning of financial audit

2.4.1 Path planning algorithm selection

In the research of the financial audit automation method based on artificial intelligence, the selection of the path planning algorithm is the link to realize an efficient audit process. Path planning algorithms are designed to determine the best audit path to maximize audit efficiency and coverage while minimizing audit costs and time. Based on the demand of this research, the shortest path algorithm and heuristic search algorithm based on graph theory, namely the Dijkstra algorithm and the A* (A-Star) algorithm, are selected as the core algorithms of path planning. The Dijkstra algorithm is a classical shortest path algorithm, which can find the shortest path from the starting point to the end point in a weighted graph. It is suitable for task planning in financial audit, such as determining the optimal path from one audit task to another, reducing the waste of auditors' time and resources. The algorithm maintains a priority queue, gradually expands to all nodes in the graph, calculates the shortest path of each node, and finally builds a complete shortest path tree.

On the basis of Dijkstra's algorithm, the A* algorithm introduces a heuristic function, which makes it more efficient to search for the optimal path. The heuristic function estimates the distance between the current node and the destination node, thus preferentially choosing the path that is most likely to reach the destination. The A* algorithm has practical application value in financial audit automation; for example, in large-scale data sets or complex audit tasks, it can quickly find an efficient audit path and improve the overall audit efficiency. When planning an audit task, there are multiple task nodes and paths, each with a different cost (such as time or resource consumption). Using Dijkstra's algorithm, a path that minimizes the total cost can be calculated. In more complex scenarios, the A* algorithm further optimizes the path selection by introducing heuristic evaluation, making the audit process more efficient and intelligent [16].

By combining the Dijkstra and A* algorithms, we can effectively plan the path of financial audit tasks and improve the overall performance and efficiency of the audit automation system. The path planning method simplifies the audit process, enhances the accuracy and timeliness of the audit results, and provides support for the financial management of enterprises.

In the audit task, we define each audit link as a node, such as financial statement review, inventory counting, accounts receivable verification, etc. The edges between nodes represent the order and dependency between tasks; for example, inventory counting can only be performed after the financial statement review is completed. Suppose we have an audit project including four main tasks: auditing sales revenue, auditing costs and expenses, auditing balance sheets, and auditing cash flow. Among them, auditing sales revenue and auditing costs and expenses can be carried out in parallel, while auditing balance sheets needs to be carried out after the audit of sales revenue and costs and expenses is completed, and auditing cash flow needs to be carried out after the audit of balance sheets is completed. We convert these tasks into nodes and edges in the graph algorithm, and use the Dijkstra algorithm to calculate the shortest path from the start node (such as project start) to the end node (such as audit report generation). By optimizing path planning, we can reasonably arrange the work order of auditors and reduce unnecessary waiting time and repetitive work, such as avoiding auditors frequently switching between different tasks, thereby improving audit efficiency; it is expected that the audit time can be shortened by about 20%. A sketch of this task-graph formulation is given below.
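The following is a minimal, self-contained sketch of that formulation in Python. The task names mirror the example above, but the edge weights (effort in hours) are hypothetical and only serve to make the Dijkstra computation concrete. Note that a plain shortest path traverses only one of the two parallel branches, so the sketch reflects the text's formulation rather than a full scheduling model:

import heapq

# Audit tasks as nodes; edge weights are hypothetical effort estimates (hours).
# Dependencies follow the example in the text: revenue and cost audits come first,
# the balance sheet audit follows them, and the cash flow audit precedes the report.
graph = {
    "start":          {"sales_revenue": 8, "costs_expenses": 6},
    "sales_revenue":  {"balance_sheet": 10},
    "costs_expenses": {"balance_sheet": 12},
    "balance_sheet":  {"cash_flow": 5},
    "cash_flow":      {"audit_report": 4},
    "audit_report":   {},
}

def dijkstra(graph, source, target):
    # Return (total_cost, path) of the cheapest path from source to target.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

cost, path = dijkstra(graph, "start", "audit_report")
print(cost, " -> ".join(path))  # prints the cheapest route and its total cost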
2.4.2 Audit process design

In the research of financial audit automation based on artificial intelligence, the audit process design is the key to realizing the efficient automation of audit tasks. Designing a scientific and reasonable audit process can maximize the use of artificial intelligence technology to improve audit efficiency and accuracy. The design of the audit process mainly includes pre-audit preparation, data acquisition and preprocessing, model application and analysis, anomaly detection and processing, and audit report generation.

In the pre-audit preparation stage, the system establishes the audit plan and determines the audit focus and risk areas according to the historical financial data and industry benchmark data of the enterprise. This step includes collecting data such as the company's annual financial statements, bank statements, electronic invoices and transaction records. Next is the data acquisition and preprocessing stage: through the API interface and data crawler technology, the system acquires the latest financial data of enterprises in real time and performs data cleaning, format conversion and feature extraction. The processed data is stored in a data warehouse for subsequent analysis [17].

In the stage of model application and analysis, the random forest algorithm is applied to financial data for risk assessment and anomaly detection. The system analyzes key financial indicators, such as revenue growth rate, asset-liability ratio and cash flow, and predicts potential financial risks and abnormal transactions through the model. During an audit, if the system finds that a company's accounts receivable turnover is lower than the industry average, the model flags this as an anomaly and further analyzes the cause. During the anomaly detection and handling phase, the system analyzes the detected exceptions in detail and provides actionable audit suggestions. For example, the system recommends that auditors further verify whether the low turnover is due to poor collection of accounts or errors in the financial statements.

The last stage is audit report generation. The system automatically generates detailed audit reports, including audit findings, risk assessment, and improvement suggestions. The report format is standardized, which makes it easy for auditors and management to read and make decisions. The system generates a PDF report with a summary of the audit, a detailed exception list, and recommendations for improvement. Through the audit process design, the financial audit process has realized a high degree of automation and intelligence, improved the audit efficiency and accuracy, and enhanced the transparency and traceability of the audit process, which provides support for the financial health management of enterprises.

2.4.3 Implementation of audit policies

In the research of the financial audit automation method based on artificial intelligence, the realization of the audit strategy is the link to ensure an efficient and accurate audit process. The implementation process includes the formulation, implementation and dynamic adjustment of the audit strategy. The development of audit strategies is based on a comprehensive analysis of the company's financial data and risk assessment, using artificial intelligence technology to identify key audit indicators and high-risk areas. Through the analysis of historical data and industry benchmark data, the system develops detailed audit strategies. Audit policies include the audit scope, key audit areas, schedule, and resource allocation. For an enterprise, the system focuses on its accounts receivable and inventory management, and makes a corresponding audit schedule and resource allocation plan.

In the execution stage, the system obtains the latest financial data of the enterprise in real time through the API interface and data crawler technology, and performs data analysis and audit procedures according to the predetermined audit strategy. The random forest algorithm is used for risk assessment and anomaly detection, and the system monitors and analyzes key financial indicators in real time. When a company's cash flow fluctuates during the audit process, the system will mark the item as high risk and further analyze the cause of the fluctuation.

Dynamic adjustment is the closing link of audit strategy implementation. The system continuously monitors data and audit results during the audit process and dynamically adjusts audit policies according to the actual situation. If an exception is found in a certain area during the preliminary audit, the system will increase the audit efforts in this area and adjust the audit resources and schedule. If, in the process of audit, it is found that the accounts payable turnover rate of an enterprise is abnormally high, the system will increase the audit of the supplier payment process to ensure the legality and compliance of all accounts transactions [18].

The system also continuously optimizes audit strategies through machine learning algorithms. By analyzing the successful experience and failure lessons of past audit projects, the system constantly adjusts and optimizes audit strategies to improve audit efficiency and accuracy. The system adjusts the weight of the risk assessment model based on historical data to ensure accurate identification of high-risk areas. The implementation of the audit policy also includes the automatic generation of audit reports and recommendations. The system generates a detailed audit report based on the audit results, including found risks, anomalies and improvement suggestions, providing valuable decision support for enterprise management. The audit report generated by the system recommends that enterprises optimize their inventory management processes to reduce inventory costs and improve the efficiency of capital use. Through the implementation of the audit strategy, the financial audit process has realized a high degree of automation and intelligence, improved the audit efficiency and accuracy, and enhanced the transparency of enterprise financial management and the risk control ability.

In the implementation of audit policies, we take a manufacturing enterprise as an example. The principle of formulating the audit plan is to determine the key areas and key links of the audit based on the business characteristics, financial risk status and regulatory requirements of the enterprise. For example, for this manufacturing enterprise, we focus on raw material procurement, production process cost control and product sales. The schedule is as follows: at the beginning of each quarter, a detailed audit plan is formulated to clarify the audit tasks and time nodes of each stage; the first week is to conduct a preliminary review of financial statements, the second week is to conduct inventory counting and accounts receivable verification, the third week is to conduct a detailed review of costs and expenses, and the fourth week is to summarize the audit results and write an audit report. Resource allocation is based on the difficulty and workload of the audit task, and auditors and technical resources are reasonably deployed. For complex cost accounting links, auditors with rich experience and professional data analysis tools are arranged. Through the implementation of these audit policies, the company has reduced the incidence of financial risks by 30% in the past year, and the audit satisfaction rate has reached more than 85%.

3 Results and discussion

3.1 Results

3.1.1 Audit efficiency improvement result

In the research of the financial audit automation method based on artificial intelligence, audit efficiency is improved by introducing the random forest algorithm and optimizing the audit process. Specific efficiency improvements can be demonstrated by comparing key indicators before and after the implementation of automated auditing.

Random forest feature selection can extract the most representative features from a large amount of data, reduce noise and redundant information, and improve the accuracy and robustness of the model. The real-time analysis function enables the system to quickly respond to data changes, optimize the decision-making process, and improve the real-time performance and adaptability of the system. These improvements have jointly promoted the improvement of system performance, ensuring more accurate predictions and more efficient resource allocation.
Figure 3: Audit efficiency improvement result

As shown in Figure 3, through the introduction of the random forest algorithm, the automated audit system has shown improved efficiency in many aspects. Audit coverage increased across all companies, indicating that automated systems are able to more fully audit a company's financial data. Error detection rates also improved, reflecting the model's strong ability to identify and correct errors. The speed of data processing is accelerated, indicating that automated systems are able to process large amounts of financial data more efficiently. The improvement of accuracy and risk identification rate further proves the reliability and effectiveness of the automated audit system. Under the joint action of these indicators, the financial audit process becomes more efficient and accurate, which provides a guarantee for the financial management of enterprises.

3.1.2 Audit risk identification effect

In the research of the financial audit automation method based on artificial intelligence, the random forest algorithm is introduced to effectively improve the effect of audit risk identification. Through the integration of multiple decision trees, the random forest algorithm improves the detection ability of abnormal data and potential risks. This paper presents the audit risk identification effect of different companies, including the risk detection rate, high-risk transaction identification rate, low-risk transaction misjudgment rate, false positive rate and false negative rate. The data show that automated audit systems perform well in risk identification.

As shown in Figure 4, the risk detection rate of automated audit systems in different companies has increased to more than 88%. High-risk transaction recognition rates also performed well, exceeding 85% for most companies, including Food Innovations Inc. at 90 percent. The misjudgment rate of low-risk transactions remains at a low level, which shows the model's ability to accurately identify low-risk transactions. Both the false alarm rate and the missed alarm rate are reduced, which proves the effectiveness of the system in reducing false positives and false negatives. Tech Solutions Inc. had a false alarm rate of 10% and a missed alarm rate of 5%, showing the model's stability in balancing the two.

Through the application of the random forest algorithm, the automated audit system shows high efficiency and accuracy in risk identification, improves the risk detection rate and the identification rate of high-risk transactions, and reduces the misjudgment rate, false positive rate and false negative rate of low-risk transactions. The results provide strong support for the financial management and risk control of enterprises, and improve the quality and efficiency of audit work.

3.1.3 Audit feedback and improvement results

In the research of financial audit automation based on artificial intelligence, audit feedback and improvement results are the key to ensuring the continuous optimization and efficient operation of the audit process. By collecting and analyzing audit feedback, the system can continuously improve the algorithm and process to improve the accuracy and efficiency of the audit. Figure 5 shows the performance of key indicators after audit feedback and improvement for different companies, including the adoption rate of audit recommendations, the effectiveness of corrective measures, audit satisfaction, the reduction rate of the error rate after improvement and the increase rate of efficiency after improvement.

The increased computing costs or complexity of maintaining AI systems may stem from multiple factors. First, real-time analysis functions require rapid processing of large amounts of data, which increases the demand for computing resources and may lead to increased hardware and operation and maintenance costs. Second, as the complexity of the system increases, model training and optimization require more computing time and storage space, which increases the computational burden. Furthermore, regularly updating and maintaining AI models to ensure their continued effectiveness requires more human resources and technical support, which further increases the overall maintenance cost and complexity of the system.

Figure 4: Audit efficiency improvement result
Figure 5: Audit efficiency improvement result

As shown in Figure 5, different companies have achieved results after audit feedback and improvement. The adoption rate of audit recommendations is high, reaching 87% on average, indicating that enterprises attach great importance to the audit recommendations provided by the system and actively adopt them. Health Plus Ltd. had an adoption rate of 89%. The effectiveness of corrective actions also performed well, averaging 91%, indicating that the corrective actions proposed by the system were highly effective in improving financial processes and controlling risks. The audit satisfaction reflects the overall evaluation of the automated audit system by the enterprise, with an average of about 90%, indicating that enterprises are very satisfied with the audit results and the feedback process of the system. The reduction of the error rate after improvement shows the improvement effect after audit feedback, with an average reduction of 47%. Tech Solutions Inc.'s error rate was reduced by 45 percent, while Food Innovations Inc.'s was reduced by 49 percent.

The improved efficiency further proves the positive effect of audit feedback on improving audit efficiency, with an average increase of more than 50%. The efficiency of Green Energy Corp. has increased by 52%, indicating that the audit efficiency of enterprises has been improved through audit feedback and improvement measures. Through continuous audit feedback and improvement measures, the financial audit automation system based on artificial intelligence improves the accuracy and efficiency of the audit, enhances the standardization and transparency of the financial management of enterprises, and provides protection for the financial health of enterprises.

To verify the significance of the improvement in audit efficiency, we conducted a t-test on the indicators before and after automation, and the results showed that the p-value was less than 0.05, indicating that the improvement was statistically significant.

In the result analysis phase, in order to evaluate the research results more rigorously, an in-depth statistical analysis of the improvement in audit efficiency was carried out. For the key indicators before and after automation, a t-test was carefully designed and executed. Through the calculation and analysis of a large amount of sample data, the result of a p-value less than 0.05 was finally obtained. This strongly shows that the improvement in audit efficiency is statistically significant and not accidental. In addition, in order to further verify the stability of the risk identification accuracy, its 95% confidence interval was carefully calculated. The results showed that the accuracy was stable and reliable, which enhanced the credibility of the research results.

While enjoying the results of a 30% improvement in audit efficiency and 90% accuracy, the trade-offs cannot be ignored. With the introduction of real-time processing technology and machine learning models, the computing cost of the system has increased significantly, and higher requirements have been put forward for the hardware configuration. More powerful servers are needed to support the rapid processing of massive data. At the same time, the complexity of the system has been greatly increased. The training, optimization and daily operation and maintenance of the model require the participation of professional technicians, and the labor cost and technical difficulty have increased. However, considering the huge benefits it brings to corporate financial management, these investments are still worthwhile.

In the results section, we supplemented the control group data and selected the traditional sampling audit method as a control. On the same audit items and data sets, the automated audit method based on artificial intelligence and the traditional sampling audit method were used for auditing respectively. In terms of audit coverage, the automated audit method reached 95%, while the traditional sampling audit method was only 70%. This is because the automated audit can conduct a comprehensive analysis of all data, while the traditional sampling audit is limited by the sample size. In terms of detection rate, the automated audit method has a detection rate of 90% for financial risks, while the traditional sampling audit method is at 75%, indicating that the automated audit method can more effectively detect potential financial risks. Through comparative analysis, we can more intuitively see the advantages of the automated audit method based on artificial intelligence in improving audit efficiency and accuracy.

In terms of efficiency, by introducing the automated audit system, the audit time has been shortened from an average of 20 working days to less than 10 working days, and the efficiency has been increased by more than 50%. This is mainly due to the system's ability to quickly process large amounts of financial data and reduce the time for manual review. In terms of accuracy, the accuracy of risk identification has increased from 80% to more than 93%. For example, in an audit of a listed company, the automated audit system discovered an abnormal transaction of fictitious income in a timely manner by monitoring financial data in real time, while traditional audit methods failed to detect it at the first time. Through continuous audit feedback and improvement measures, we continue to optimize the model and audit process, further improve the accuracy and efficiency of audits, and enhance the standardization and transparency of corporate financial management.

The increase in computing costs mainly includes the following aspects: hardware equipment upgrade costs: in order to meet the needs of big data processing and model calculation, the cost increased by 500,000 yuan; software licensing fees: we use professional data analysis software and artificial intelligence algorithm libraries, and the annual software licensing fee is 200,000 yuan; and human resource investment: we recruited and trained professionals with data analysis and artificial intelligence technology, and the human resource cost increased by 300,000 yuan each year. Through cost-benefit analysis, we calculated the return on investment (ROI). In the past year, due to the improvement of audit efficiency, the company saved 1 million yuan in audit costs, and avoided 2 million yuan in potential losses caused by the failure to discover financial risks in time. According to the ROI calculation formula ROI = (benefit - cost) / cost × 100%, the calculated ROI is 200%, indicating that the cost increase is acceptable and has a high investment value.

When evaluating the improvement of audit efficiency, we selected 30 audit projects as samples and recorded the audit time before and after the use of the automated audit system. When using the t-test, we first performed a normality test on the two groups of data to ensure that the data met the conditions of the t-test. Then, we calculated the mean and standard deviation of the two groups of data, and calculated the t value using the t-test formula. After calculation, the t value was 3.5, the degree of freedom was 58, and the corresponding p value was 0.01, which was less than 0.05, indicating that, at a confidence level of 95%, the audit time of the automated audit system was significantly lower than that of the traditional method, and the efficiency was significantly improved. In terms of risk identification accuracy, the Mann-Whitney U test was used. The number of risks identified and the correct identification rate of the automated system and the traditional method in 30 audit projects were compared. The calculated Mann-Whitney U value was 200, and the corresponding p value was 0.03, which was less than 0.05, indicating that the automated system was significantly better than the traditional method in terms of risk identification accuracy.

Audit systems based on deep learning, such as convolutional neural networks (CNN) and recurrent neural networks (RNN), have powerful feature learning capabilities when processing financial data and can automatically extract complex data features. However, deep learning models require a large amount of labeled data for training, and the training process consumes large computing resources and takes a long time. In contrast, the random forest algorithm used in this study, combined with Kafka and Flink real-time processing technology, has obvious advantages in audit efficiency. When processing financial data of the same scale, the audit time of this system is only 50% of that of the deep learning system: the average audit time of the deep learning system is 20 days, while this system only takes 10 days. In terms of accuracy, although the deep learning system performs well in identifying some complex risks, the accuracy of this system in identifying common financial risks is comparable to that of the deep learning system, reaching more than 90%. At the same time, this system has a low demand for computing resources and can run on ordinary servers, while deep learning systems usually require high-performance servers equipped with GPUs.

3.2 Discussion

3.2.1 Problem summary

In the research of the financial audit automation method based on artificial intelligence, although efficiency improvement and risk identification effects have been achieved, there are still some problems that need to be summarized and solved. Data quality remains a challenge. Even after strict data cleaning and preprocessing steps are implemented, data noise and missing values still affect the accuracy and stability of the model. The data sources involved in the audit process are diverse and the data formats are not uniform, which leads to the complexity of data integration and increases the difficulty of system processing and analysis. The issues of model interpretation and transparency need attention. Although complex algorithms such as random forest perform well in accuracy and efficiency, their internal decision-making process is complicated and difficult for non-technical personnel to understand and explain. In the process of generating audit reports and interpreting audit results, this reduces users' trust in the audit conclusions.

The real-time processing capability of the system needs to be improved. Despite the introduction of real-time processing platforms such as Kafka and Blink, the system still has room for improvement in processing speed and latency in the face of large-scale and high-frequency data flows. This puts forward higher technical requirements for realizing real-time audit. The generalization ability of the model also needs attention. Although the robustness of the model has been improved through cross-validation and parameter optimization, the model shows insufficient adaptability in the face of new types of financial data and fraudulent means, which affects its promotion and application in different enterprises and industries.

The user feedback mechanism needs to be improved. Although the system can automatically generate audit reports and improvement suggestions, how to effectively collect and process user feedback in the feedback and improvement process, so as to continuously optimize the audit strategy and model performance, is still a problem that needs in-depth research. Although AI-based financial audit automation methods have achieved results in improving audit efficiency and risk identification, they still need to be further optimized and improved in data quality, model interpretation, real-time processing capabilities, generalization capabilities and user feedback mechanisms to achieve more efficient and reliable financial audit automation.

Although audit efficiency has been significantly improved, an increase in computational costs has also been noted. This is due to the introduction of real-time processing technology and machine learning models that increase the computational burden on the system. However, this cost increase is acceptable considering the efficiency gains.
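The significance testing workflow described in the results above (a normality check, a two-sample t-test on audit time, and a Mann-Whitney U test on risk identification) can be reproduced with standard tools. The sketch below uses synthetic samples in place of the 30 recorded audit projects and assumes SciPy; the numbers it prints are illustrative and will not match the t = 3.5 and U = 200 reported in the text:

import numpy as np
from scipy import stats

# Hypothetical audit-time samples (working days) for 30 projects each,
# standing in for the before/after measurements described in the results.
rng = np.random.default_rng(1)
time_manual = rng.normal(loc=20, scale=3, size=30)
time_automated = rng.normal(loc=10, scale=2, size=30)

# Normality check on each group before applying the t-test.
w1, p1 = stats.shapiro(time_manual)
w2, p2 = stats.shapiro(time_automated)
print("Shapiro p-values:", p1, p2)

# Independent two-sample t-test on audit time.
t_stat, p_val = stats.ttest_ind(time_manual, time_automated)
print("t = %.2f, p = %.4f" % (t_stat, p_val))

# Mann-Whitney U test on risk-identification outcomes (non-parametric).
risk_manual = rng.binomial(1, 0.75, size=30)
risk_automated = rng.binomial(1, 0.90, size=30)
u_stat, p_u = stats.mannwhitneyu(risk_automated, risk_manual, alternative="greater")
print("U = %.1f, p = %.4f" % (u_stat, p_u))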
3.2.2 Research suggestions

In the research of the financial audit automation method based on artificial intelligence, in order to further improve the efficiency and accuracy of the system, the following research suggestions are put forward. Data quality management needs to be further improved. It is recommended to establish a more comprehensive data cleaning and preprocessing mechanism and to adopt advanced missing value processing methods and anomaly detection techniques, such as adaptive filtering and deep learning models, to improve data reliability and integrity. Model interpretation and transparency should be enhanced. It is recommended to integrate explainable AI technologies such as LIME and SHAP into the model so that auditors can understand and explain the decision-making process of the model, thereby improving the transparency of audit reports and the trust of users. Real-time processing capabilities should also be enhanced. It is suggested to optimize the existing real-time data processing architecture and introduce more efficient stream processing technologies and hardware acceleration schemes, such as GPU acceleration and distributed computing frameworks, to cope with large-scale and high-frequency data stream processing needs and to ensure that the system can respond to and process data in real time.

Improving the generalization ability of models is another direction. It is suggested to enhance the adaptability and robustness of the model in different enterprises and industries through ensemble learning and transfer learning techniques. The multi-model fusion method is used to improve the generalization performance of the model, and the model is applied to different financial environments through transfer learning. The user feedback mechanism should be optimized. It is suggested to establish a dynamic feedback and continuous learning system, collect and analyze user feedback, and adjust and optimize the audit strategy and model parameters in time. Through the closed-loop mechanism of user feedback, the system performance can be continuously improved to ensure the effectiveness of audit strategies and the accuracy of models. These suggestions aim to further improve the financial audit automation method based on artificial intelligence, improve the intelligence level and practicability of the system, and provide more powerful technical support and guarantees for the financial management of enterprises. Through such improvements, more efficient and accurate financial audits can be achieved, and the intelligent transformation of the financial audit industry can be promoted.

3.2.3 SHAP effect

In an actual case, we selected the financial data of a certain company over a period of time, including multiple features such as income, expenditure, accounts receivable turnover rate, and debt-to-asset ratio. After model training and prediction, we obtained the result that a certain transaction was judged to be abnormal. At this time, the SHAP value can clearly explain the basis for the model to make this judgment.

By calculating the SHAP value of each feature, we found that the SHAP value of the debt-to-asset ratio is high and positive, which shows that the debt-to-asset ratio plays a key positive role in the model's judgment of the transaction as abnormal. That is, the debt-to-asset ratio exceeds the normal range, which greatly increases the possibility of the transaction being judged as abnormal. Visualizing the SHAP values (for example, using a SHAP value bar chart, with the horizontal axis as the feature name and the vertical axis as the SHAP value size) can more intuitively show the degree of influence of each feature on the prediction result. Auditors can see at a glance which features have the greatest impact on the model's decision, and thus conduct an in-depth analysis of why the transaction was judged to be abnormal, greatly improving the transparency and credibility of the audit, so that audit decisions are no longer "black box" operations, but are based on clear and explainable evidence.

3.3 Discussion

From a quantitative perspective, in terms of audit accuracy, the accuracy of cutting-edge research is mostly in the range of 80% - 86%, while this study uses the random forest algorithm combined with real-time data processing technology to achieve an audit accuracy of 90%. In terms of audit efficiency, most cutting-edge research efficiencies are at a medium or low level, but this study has achieved a significant result of 30% efficiency improvement. This is mainly attributed to the fact that the random forest algorithm constructs multiple decision trees and randomly selects features for node splitting, effectively reducing the risk of overfitting and improving the model's ability to identify complex financial data; at the same time, the use of real-time data processing platforms such as Kafka and Flink has realized the real-time collection, processing and analysis of financial data, greatly accelerating the audit process.

Qualitatively, the support vector machine, gradient boosting and other methods used in cutting-edge research have limitations when facing the high dimensionality, complexity and dynamic changes of financial data. The support vector machine has high computational complexity, is sensitive to the choice of kernel functions, and is difficult to adapt to the diversity of financial data; gradient boosting is sensitive to outliers, takes a long time to train, and cannot meet real-time requirements. In contrast, this research method not only improves the accuracy and robustness of the model, but also ensures the dynamics and timeliness of the audit process, and can better adapt to the actual needs of corporate financial audits.

However, this research method still has certain limitations. In terms of data quality, despite the implementation of strict data cleaning and preprocessing steps, data noise and missing values will still affect the accuracy and stability of the model. Model interpretability and transparency are also issues that need attention. The internal decision-making process of the random forest algorithm is complex, and it is difficult for non-technical personnel to understand and explain, which to a certain extent reduces the user's trust in the audit conclusions. In addition, when facing large-scale, high-frequency data traffic, the system's real-time processing capabilities have been improved, but there is still room for improvement. The generalization ability of the model when dealing with new financial data and fraud methods also needs to be enhanced.

4 Conclusion

In this study, an AI-based financial audit automation method is discussed in depth and implemented to improve audit efficiency, accuracy and risk identification ability. Through the introduction of the random forest algorithm, combined with several key steps such as data integration, real-time processing, model training and optimization, the system has shown improvement in many aspects. By constructing and optimizing the random forest model, the audit coverage and error detection rate are improved, and risk identification and processing are realized efficiently. The data cleaning and preprocessing steps ensure the quality and consistency of the input data, providing a reliable basis for the model. The introduction of real-time processing technologies such as Kafka and Blink has accelerated data processing and met the processing needs of high-frequency data streams. The audit process design optimizes the allocation and utilization of audit resources and improves the overall audit efficiency through systematic steps.

The maintainability and transparency of the model were also emphasized in the study, and by introducing explainable AI technology, users' trust in audit results was increased. At the same time, through the establishment of a dynamic feedback mechanism, the system can constantly collect and analyze user feedback, adjust the audit strategy and model parameters in time, and achieve continuous optimization. Despite the achievements, the study also points out some challenges, such as data quality issues, model generalization capabilities, and real-time processing capabilities, which need to be further addressed in future research and applications. This study shows the great potential of artificial intelligence in financial audit automation, and also puts forward specific suggestions for improvement, which provides a valuable reference for future financial audit practice. Through continuous optimization and improvement of technology and methods, financial audit will achieve a higher level of intelligence and automation, and provide more accurate and efficient support for the financial management and risk control of enterprises. Intelligent audit methods not only improve the quality and efficiency of audit work, but also enhance the transparency and standardization of financial management, and promote the innovation and development of the financial audit industry.

Although explainable artificial intelligence technology (such as SHAP) has achieved remarkable results in improving the transparency of financial audits, its limitations cannot be ignored. In an extremely high-frequency data environment, calculating the SHAP value requires complex operations on massive data, resulting in a sharp increase in computing resource consumption; processing efficiency cannot keep up with the speed of data updates, and scalability is limited, making it difficult to meet the real-time requirements of audits. In addition, SHAP is calculated based on a decision tree model, which is susceptible to data distribution and noise. Even a small change in data may cause a significant change in the tree structure, which in turn leads to unstable calculation results of the SHAP value, which cannot accurately reflect the true contribution of features to model decisions, affecting the reliability of audit results.

Acknowledgement

This study was funded by the Science Research Project of Hebei Education Department (BJS2023041).

References

[1] Sun JH, Li LC, Qi BL. Financial statement comparability and audit pricing. Accounting Finance. 2022; 62(5): 4631-4661. https://doi.org/10.1111/acfi.12970.
[2] Condie ER, Obermire KM, Seidel TA, Wilkins MS. Prior Audit Experience and CFO Financial Reporting Aggressiveness. Audit J Pract Theory. 2021; 40(4): 99-121. https://doi.org/10.2308/AJPT-2020-012.
[3] Koh K, Tong YH, Zhu ZN. The effects of financial statement disaggregation on audit pricing. Int J Audit. 2022; 26(2): 94-112. https://doi.org/10.1111/ijau.12253.
[4] Lyshchenko O, Ocheret'ko L, Lukanovska I, Sobolieva-Tereshchenko O, Nazarenko I. The role of financial audit in ensuring the reliability of financial statements. Ad Alta J Interdiscip Res. 2024; 14(1). https://doi.org/10.33543/140139141145.
[5] Suryani E, Winarningsih S, Avianti I, Sofia P, Dewi N. Does Audit Firm Size and Audit Tenure Influence Fraudulent Financial Statements? Australas Account Bus Finance J. 2023; 17(2): 26-37.
[6] Xu Q, Fernando G, Tam K, Zhang W. Financial report readability and audit fees: a simultaneous equation approach. Managerial Aud J. 2020; 35(3): 345-372. https://doi.org/10.1108/MAJ-02-2019-2177.
[7] Erdmann A, Yazdani M, Mas Iglesias JM, Marin Palacios C. Pricing Powered by Artificial Intelligence: An Assessment Model for the Sustainable Implementation of AI Supported Price Functions. Informatica. 2024; 35(3): 529-556. https://doi.org/10.15388/24-infor559.
[8] Ijadi Maghsoodi A, Hafezalkotob A, Azizi Ari I, Ijadi Maghsoodi S, Hafezalkotob A. Selection of Waste Lubricant Oil Regenerative Technology Using Entropy-Weighted Risk-Based Fuzzy Axiomatic Design Approach. Informatica. 2018; 29(1): 41-74. https://doi.org/10.15388/Informatica.2018.157.
[9] Pragarauskaitė J, Dzemyda G. Markov Models in the analysis of frequent patterns in financial data. Informatica. 2013; 24(1): 87-102. http://dx.doi.org/10.15388/Informatica.2013.386.
[10] Lim CY, Lobo GJ, Rao PG, Yue H. Financial capacity and the demand for audit quality. Accounting Bus Res. 2022; 52(1): 1-37. https://doi.org/10.1080/00014788.2020.1824116.
[11] Ismail R, Mohd-Saleh N, Yaakob R. Audit committee effectiveness, internal audit function and financial reporting lag: Evidence from Malaysia. Asian Acad Manage J Account Finance. 2022; 18(2): 169-193. https://doi.org/10.21315/aamjaf2022.18.2.8.
[12] Oussii AA, Boulila N. Evidence on the relation between audit committee financial expertise and internal audit function effectiveness. J Econ Adm Sci. 2021; 37(4): 659-676. https://doi.org/10.1108/JEAS-04-2020-0041.
[13] Lyubenko A, Znak N, Karpachova O. Audit features of the first IFRS financial statements. Financial Credit Activity Probl Theory Pract. 2022; 1(42): 185-194.
[14] Endrawes M, Feng ZA, Lu MT, Shan YW. Audit committee characteristics and financial statement comparability. Accounting Finance. 2020; 60(3): 2361-2395. https://doi.org/10.1111/acfi.12354.
[15] Lutfi A, Alkilani SZ, Saad M, Alshirah MH, Alshirah AF, Alrawad M, et al. The influence of audit committee chair characteristics on financial reporting quality. J Risk Financial Manage. 2022; 15(12): 563. https://doi.org/10.3390/jrfm15120563.
[16] Calvin CG, Holt M. The impact of domain-specific internal audit education on financial reporting quality and external audit efficiency. Accounting Horizons. 2023; 37(2): 47-65. https://doi.org/10.2308/HORIZONS-2020-105.
[17] Alcaide-Ruiz MD, Bravo-Urquiza F. Does audit committee financial expertise actually improve information readability? Rev Contab Span Account Rev. 2022; 25(2): 257-270. https://doi.org/10.6018/rcsar.420261.
[18] Driskill MW, Knechel WR, Thomas E. Financial auditing as an economic service. Curr Issues Aud. 2022; 16(2). https://doi.org/10.2308/CIIA-2021-021.
https://doi.org/10.31449/inf.v49i16.7705 Informatica 49 (2025) 21–36 21
Graph Neural Network-Based User Preference Model for Social
Network Access Control
Yuan Zhang1,2*
1Xuchang Vocational Technical College, Xuchang 461000, China
2Henan Province Data Intelligence and Security Application Engineering Technology Research Center, Xuchang
461000, China
E-mail: hnxc_z@126.com
*Corresponding author
Keywords: social networks, user preferences, graph neural network, multi-layer attention, access control
Received: November 11, 2024
The popularity and deepening of social networks have increased the risk of personal information
leakage for users. To enhance the security of social networks, this study constructed an access control
model based on the preferences of social network users. This model utilizes graph neural networks to
generate access control strategies based on user preferences, and introduces a multi-layer attention
mechanism to optimize the graph neural network. To better capture user preference information, the
study sets the learning rate to 0.0001. The experimental results demonstrated that in the Twitter dataset,
the accuracy of the proposed model reached 95.7% and the F1 score reached 96.2%, which were
significantly higher than those of other models. These results indicated that the model could more
accurately classify access control in social networks and reduce false positives. The area under the
receiver operation characteristic curve of the proposed model was 0.982, which was higher than other
models. The decision time was 13.77 seconds, significantly lower than other models. This indicated that
the model could more effectively distinguish different types of user access requests and provide more
reliable guarantees for secure access to social networks. The user's preferred social network access
control model based on graph neural networks has superior performance, effectively ensuring the
information security of social network users and laying the foundation for further development of access
control technology.
Povzetek: Predstavljen je nov model za nadzor dostopa v družbenih omrežjih, ki temelji na grafskih
nevronskih mrežah in uporabniških preferencah. Z uporabo večslojnega pozornostnega mehanizma
model omogoča zanesljivo in varno upravljanje dostopa.
1 Introduction

In the era of rapid digital development, social networks play an important role in today's society. Through social platforms, people can not only obtain the information and exchange ideas they need but also engage in commercial activities through social networks, greatly changing their communication methods and lifestyle habits [1, 2]. However, the popularity of social networks has made the issue of user privacy protection increasingly prominent. Using social networks means that users need to expose their personal information to a certain extent. Criminals can steal user information through cyber attacks and use it for illegal activities, thereby posing potential risks to users [3]. Meanwhile, there is a large amount of false information and rumors on social networks. The rapid dissemination of this information may lead to misunderstandings among the public about certain events or issues, resulting in adverse social impacts. Access control is a critical component in information security, used to manage user access permissions to systems, networks, or applications. Access control can help organizations protect important data and resources from unauthorized access and malicious activities. It can also prevent data leakage, tampering, and destruction, protect sensitive information from being leaked to unauthorized personnel, and ensure the reliability of network systems [4]. Therefore, the importance of implementing effective access control for social networks is self-evident.

Nowadays, there are mainly attribute-based, policy-based, and relation-based Access Control Models (ACM), which are widely used in various scenarios [5]. However, traditional models still have drawbacks such as complex permission management and difficulty in adapting to dynamic network environments. Specifically, traditional models often require manual intervention in the process of assigning, revoking, and updating permissions, resulting in increased management costs and error rates. The user behavior and social relationships of social networks are constantly changing, and traditional models are difficult to adapt, resulting in insufficient flexibility of access control policies and inability to effectively respond to new security threats. In this context, this study constructs ACM based on the preferences of social users, uses Graph Neural Network
(GNN) for access control, and introduces Multi-Layer proof-based Ethereum access blockchain to accelerate
Attention (MLA) to optimize GNN. Finally, data storage. This method could significantly improve
UP-GNN-SNAC model, a GNN-based social network network security [9]. Zhang L et al. designed a
ACM catering to user preferences, is designed. The lightweight decentralized multi-authorization ACM based
innovation of the research lies in constructing an ACM on ciphertext policy attribute-based encryption and
based on user preferences. Compared with existing blockchain to enhance the security of in-vehicle social
GNN-based ACMs, this model better balances privacy networks. Distributed multi-authorization nodes
protection and user experience by capturing user supported vehicle users by performing lightweight
preferences, providing a more efficient and accurate computing with the help of vehicle cloud service
solution for secure access to social networks. providers. This model had significant advantages
compared to existing solutions [10].
2 Related works

The progress of the Internet has made it a part of people's daily life to interact with others through social networks. However, due to system vulnerabilities in online platforms, many criminals exploit these vulnerabilities to launch attacks, resulting in the leakage of user information and even its malicious exploitation. Access control, as a key technology for maintaining social network security, is currently a hot topic of research among relevant professionals. You M et al. designed a knowledge graph-based access control decision-making method to improve access control performance under different degrees of imbalance. It extracted topological features to represent high cardinality classification users and resource attributes, revealing the interrelationships between different objects. This method could significantly improve access control performance [6]. Gai K et al. designed a zero-trust cross-organizational data sharing ACM based on blockchain to enhance security in network data sharing. It utilized blockchain alliances to establish a trusted environment and deployed role-based access control through multi-signature protocols and smart contract methods, which had high practicality [7]. Wu H et al. designed a cloud network secure storage data ACM based on association rules to improve the security of social network data access control. It utilized association rule feature extraction methods for data mining and attack detection in network security storage areas and achieved data access control in network security storage areas through adaptive partition-weighted interface scheduling. This method was superior to traditional methods [8]. Azbeg K et al. designed an ACM based on improved blockchain technology to enhance the security and privacy of network systems. It stored data in the interstellar file system and utilized authorization proof-based Ethereum access blockchain to accelerate data storage. This method could significantly improve network security [9]. Zhang L et al. designed a lightweight decentralized multi-authorization ACM based on ciphertext policy attribute-based encryption and blockchain to enhance the security of in-vehicle social networks. Distributed multi-authorization nodes supported vehicle users by performing lightweight computing with the help of vehicle cloud service providers. This model had significant advantages compared to existing solutions [10].

Zhao Y et al. designed a policy-protected, cleanable ACM to improve the efficiency of data encryption in vehicle social networks. It could test and clean encrypted data, and divide access policies into attribute names and attribute values, thereby hiding information in the ciphertext and achieving good encryption performance [11]. Squicciarini A et al. designed a discrete ACM based on individual decision-making to address privacy and security issues arising from data sharing in social networks. It took into account individual preferences in social networks and selected discrete privacy values from a fixed set of options. This model had a good privacy protection effect in data sharing [12]. Dixit M S et al. designed a deep learning-based real-time user ACM for social networks to address user login restrictions. It used CNN and LSTM to predict the age of users and adopted multi-task CNN for face detection and feature extraction, thus achieving significant control over user login [13]. Wen W et al. designed an autonomous privacy control and identity verification sharing scheme built on fast response codes in social networks to solve the problem of users being unable to independently control privacy sharing. It used fast response codes with high-quality images for error correction, combining the advantages of polynomial-based and visual-based secret image sharing. This scheme had low computational complexity and scalability [14]. Safi S M et al. designed an improved end-to-end mobile social network security ACM to protect the personal privacy of social network users. It encrypted user-shared data through ciphertext policy attribute encryption, utilizing advanced encryption standards to prevent unauthorized user access. This scheme had high security and practicality [15]. The summary of related work is shown in Table 1.
Table 1: Summary of related work.
References | Model | Key features | Dataset | Indicator results | Insufficient
[6] | Access control decision method based on knowledge graph | Extracting topological features to represent user and resource attributes | Synthesized social network data | Improved access control performance | Not considering the balance between privacy protection and user experience
[7] | Blockchain zero trust cross-organizational data sharing access control | Establishing a trusted environment through a blockchain alliance | Cross-organizational transaction data | High practicality | Slow response speed
[8] | Association rules cloud network security storage access control | Using association rules for attack detection | Cloud storage logs | Superior to traditional methods | Poor adaptability to new types of attack modes and difficulty in handling dynamically changing environments
[9] | Improved blockchain technology access control | Interstellar file system and authorization-proof Ethereum | File transfer records | Significant improvement in network security | Requires a large amount of storage space
[10] | Decentralized multi-authorization model for vehicle-mounted social networks | Ciphertext policy attribute-based encryption and blockchain | Vehicle communication records | Significant advantages compared to existing solutions | Complex key management increases deployment difficulty and slows response speed
[11] | Policy-protected sanitizable access control | Testing and cleaning encrypted data | Encrypted dataset | Good encryption effect | The cleaning process may result in information loss
[12] | Individual decision discrete ACM | Personal preference privacy protection in social networks | User behavior data | Good privacy protection effect | Lack of effective modeling of group behavior and insufficient consideration of personalized preferences
[13] | Real-time user access control with deep learning | Convolutional neural network predicts age | Social network user data | Superior to traditional methods | Deep learning models require a large amount of data for training, which poses a risk of privacy leakage
[14] | Quick response code autonomous privacy control | Image correction combined with secret image sharing | User-uploaded images | Low computational complexity and good scalability | High image quality requirements and sensitivity to image noise
[15] | Mobile social network security access control | Ciphertext policy attribute encryption | Mobile device logs | High safety and practicality | The key distribution and management of ciphertext policy attribute encryption are relatively complex
In summary, many scholars have achieved significant results in social network access control. However, these methods still have slow response times and fail to consider the balance between privacy protection and user experience. Therefore, this study constructs an ACM based on user preferences and simulates it using an improved GNN with the MLA mechanism to design the UP-GNN-SNAC model and improve access control effectiveness.

3 GNN-based ACM based on user preferences

This section mainly elaborates on the construction process of the UP-GNN-SNAC model. The first subsection presents the design of the ACM based on user preferences, and the second subsection describes the implementation of an access control algorithm based on improved GNN.

3.1 ACM construction based on user preferences

User preference refers to the preferences of users towards certain things, which are formed by the comprehensive influence of various factors such as personal factors and social environment. Among them, personal factors include internal characteristics such as age, gender, occupation, interests, values, and behavioral habits of users. Social factors include external environmental factors such as social circles, interaction objects, social frequency, cultural values, and social interactions. In social networks, users express their preferences through posting and activity operations. These operations generate a large amount of data. By analyzing these data, user behavior patterns and characteristics can be understood, and appropriate access permissions can be generated for users to meet their privacy needs in different scenarios, thereby protecting user privacy [16]. Therefore, this study constructs a model based on the preferences of social network users, as shown in Figure 1.
Figure 1: Specific architecture of access control model.
In Figure 1, the ACM consists of six modules: user, data protection, access control, preference analysis, historical data, and algorithm. When users need to post or obtain information from social networks, the request first goes through the data protection module, which can encrypt and back up the information posted by users. Then, the data protection module sends the user's request to the access control module. This module sends requests to the preference analysis module, historical data module, and algorithm module respectively. The historical data module can extract and preprocess user interaction behavior data, basic attributes, and social relationship data. After cleaning, deduplication, and standardization, these data provide input for the preference analysis module and algorithm module. When the request sent by the data protection module is transmitted to the preference analysis module, it analyzes the user's historical social data, obtains the user's preferences, and returns Personal Preferences (PPs). When the data is transmitted to the algorithm module, it trains on the obtained data and finally returns the best result to the access control module. Specifically, different users have different preferences. When users upload information, different preference information corresponds to different access control policies [17]. It is necessary to determine the level of privacy of uploaded information based on user preferences, that is, to establish a quantitative model of user preferences to measure social information entropy. Figure 2 shows a social information sensitivity measurement model based on user preferences.
Figure 2: Social information sensitivity measurement model based on user preferences.
In Figure 2, after users post information, they need to calculate the sensitivity of the information based on information entropy and obtain the user's social information sharing degree based on their historical visits and social friends. It is also necessary to use methods such as information entropy weight and conditional information entropy to calculate and obtain an information entropy measurement model. Information entropy is a basic concept in information theory, which is used to measure the uncertainty of a random variable. It reflects how difficult it is to predict the outcome of an event, or how much information is needed to describe the event: the greater the information entropy, the greater the uncertainty of the event outcome, and vice versa. Information entropy weight is a weight allocation method based on information entropy, which is used to measure the importance of different features or data dimensions. Conditional information entropy is used to measure the uncertainty of a random event given certain conditions. Therefore, information entropy can be used to describe the amount of privacy contained in social data, determine the degree of privacy of the social data, and construct an information sensitivity measurement model for social data. The calculation method for the amount of social data of users is shown in equation (1).

$H(x) = -\sum_i p(x_i)\log_2 p(x_i)$ (1)

In equation (1), $H(x)$ is the average amount of private information uploaded by all users in the social network, and $p(x_i)$ is the proportion of the privacy level of information $i$ in the total privacy information. As the social breadth of a user increases with the number of social friends, the relationship between the social breadth of a user and the number of social friends can be obtained as shown in equation (2).

$w(F) = \frac{2}{\pi}\arctan F$ (2)

In equation (2), $w(F)$ represents the social breadth of the user and $F$ is the number of social friends of the user. The confidentiality of the information posted by the user can be calculated based on whether the user's uploaded information is blocked from their friends. The calculation method is shown in equation (3).

$h_i = \frac{F_a^i}{F}$ (3)

In equation (3), $h_i$ is the confidentiality level of information $i$ and $F_a^i$ is the number of friends blocked by the user. The level of confidentiality is thus the ratio of the number of friends blocked from viewing the social data to the total number of friends: as the number of friends permitted to view the social data increases, the level of confidentiality decreases. The degree of social information sharing can be used to describe the impact of the number of social friends and the number of friends blocked by the user on social data sharing. The degree to which friends are permitted to access information is directly correlated with the extent of social information sharing. The calculation method is shown in equation (4).

$s_i(F, F_a^i) = h_i \cdot w(F) = \frac{2 F_a^i \arctan F}{\pi F}$ (4)

In equation (4), $s_i(F, F_a^i)$ is the user's social information sharing degree. Information entropy can then be measured based on the degree of social information sharing among users, as shown in equation (5).

$H_s(x) = -\sum_i s_i\, p(x_i)\log_2 p(x_i)$ (5)

In equation (5), $H_s(x)$ is the sharing-weighted information entropy. According to the information entropy analysis of the user preference mechanism, in the algorithm module, user social data are divided into a training set and a testing set, and they are trained separately to obtain the final access control policy. Figure 3 displays the obtaining process of the strategy.
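For readers who want to experiment with the sensitivity measure, equations (1)–(5) can be prototyped directly. The following Python sketch is illustrative only: the 2/π normalisation in w(F), the input format, and all variable names (for example `privacy_probs` and `blocked`) are assumptions made for this example rather than details taken from the paper's implementation.

```python
import numpy as np

def information_entropy(p):
    """Equation (1): H(x) = -sum_i p(x_i) log2 p(x_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # ignore zero-probability privacy levels
    return -np.sum(p * np.log2(p))

def social_breadth(num_friends):
    """Equation (2): w(F) = (2/pi) * arctan(F); the normalisation is an assumption."""
    return 2.0 / np.pi * np.arctan(num_friends)

def confidentiality(num_blocked, num_friends):
    """Equation (3): h_i = F_a^i / F (blocked friends over total friends)."""
    return num_blocked / num_friends

def sharing_degree(num_blocked, num_friends):
    """Equation (4): s_i = h_i * w(F)."""
    return confidentiality(num_blocked, num_friends) * social_breadth(num_friends)

def weighted_entropy(privacy_probs, blocked, num_friends):
    """Equation (5): H_s(x) = -sum_i s_i p(x_i) log2 p(x_i)."""
    p = np.asarray(privacy_probs, dtype=float)
    s = np.array([sharing_degree(b, num_friends) for b in blocked])
    mask = p > 0
    return -np.sum(s[mask] * p[mask] * np.log2(p[mask]))

# Toy example: three posts, 200 friends, different numbers of blocked friends.
print(information_entropy([0.5, 0.3, 0.2]))
print(weighted_entropy([0.5, 0.3, 0.2], blocked=[10, 50, 120], num_friends=200))
```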
Figure 3: Access control policy acquisition process.
In Figure 3, after dividing the social data into two datasets, the user history records of the different datasets are first obtained, and then user feature extraction is performed to calculate user preferences. The next step is to combine the feature vectors to obtain a training model and a testing model, and then make decisions separately through the training model and the testing model. The final step is to make a comprehensive decision based on the access request, obtain the final decision, and thus obtain the access control policy.

3.2 Access control method based on improved GNN

In the ACM, the algorithm module is the core of the entire model; it processes and analyzes user preference data, historical behavior data, and social relationship data. The algorithm module not only determines whether the model can accurately analyze user preferences but also directly relates to whether the model can make effective decisions [18]. GNN is a graph-based deep learning method that can enrich node representations by utilizing the relationships between nodes. Specifically, GNN can update the representation of nodes by defining the connection relationships between nodes on the graph, utilizing their neighbor information to achieve information transfer and learning over the entire graph. GNN mainly includes three core functions: node representation, graph structure representation, and message passing. Among them, node representation maps each node to a low dimensional vector space for subsequent calculations, and graph structure representation encodes the topology of the graph in a low dimensional vector space for subsequent calculations. Message passing is defined as the process by which a node updates its own representation by exchanging information with its neighboring nodes. This process enables the transmission of information on the graph [19]. The attention mechanism is an important technique in deep learning that allows models to selectively focus on different parts of the input sequence, assigning different weights to each part of the input sequence to highlight the more critical information for the task. The attention mechanism is a process that dynamically assigns weights to the elements of the input sequence. This allows the model to focus on key parts of the input in a targeted manner. As a result, the model processes and learns information in the data more efficiently [20]. Therefore, GNN performs well on graph-structured data such as social networks and chemical molecular structures. Research is conducted on constructing a GNN model based on user preferences. To improve the performance of the model, the MLA mechanism is introduced to optimize the model, and an access control method based on improved GNN is designed to capture the complex patterns of user social relationships and personal behavior, and optimize access permission allocation. The study aims to enhance the model's understanding of the relationships between different nodes by integrating MLA mechanisms into the GNN. Each layer of the attention mechanism enables the model to focus on different node characteristics, thereby enabling the model to more finely distinguish the importance of users and their associated objects, enhance the model's learning ability, improve its resolution of user preferences, and more accurately capture the user's true intentions. Consequently, this enhances the effectiveness of access control. The model structure is shown in Figure 4.
Figure 4: Access control method based on improved GNN.
In Figure 4, nodes are constructed based on users, their social friends, and the social data posted by users. The user nodes include basic attributes, social relationship characteristics, behavior characteristics, and privacy setting characteristics. Social data nodes contain features of published content, interactive behavior, and social relationships. These features are transformed into low dimensional vectors through numerical processing and embedding learning, and embedded into the GNN as input vectors. Given the input data, the user's Social Preferences (SPs) and PPs are obtained, and the two preferences are fused. Then, the fused data is trained and the nodes are classified. The user attributes are selected to represent the user, and after embedding, the user node embedding matrix is obtained. The calculation method is shown in equation (6).

$u_a = f(W_1 [P_a, E_a])$ (6)

In equation (6), $u_a$ is the user node, $P_a$ is the node embedding matrix, $E_a$ is the node free embedding matrix, and $W_1$ is the node embedding weight. By using natural language processing tools to process and extract each piece of social data, the embedding matrix of the user's posted social data nodes is obtained after embedding, and the calculation method is shown in equation (7).

$d_i = f(W_2 [Q_i, V_i])$ (7)

In equation (7), $d_i$ is a social data node, $Q_i$ is the social data embedding matrix, $V_i$ is the free embedding matrix of data nodes, and $W_2$ is the embedding weight of social data. The embedded user nodes and social data nodes are input into the fusion layer and updated simultaneously through the MLA mechanism. The MLA mechanism calculation method is shown in equation (8).

$A(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ (8)

In equation (8), $A(Q, K, V)$ represents attention; $Q$, $K$, and $V$ represent query, key, and value, respectively; $T$ represents transpose; and $d_k$ represents the dimension of the key vector, used to scale the dot product results and prevent gradient vanishing. The method for updating user SP nodes is shown in equation (9).

$p_a^{n+1} = u_a^n + \sum_{b \in S_a} \alpha_a^n u_b^n, \quad \alpha_a^n = \mathrm{softmax}\!\left(\mathrm{Relu}(u_a^n)^T W_3\, \mathrm{Relu}(u_b^n)\right)$ (9)

In equation (9), $p_a^{n+1}$ is the updated temporary node of user SPs. $\alpha_a^n$ is the attention score, which is the aggregated weight ratio of each neighboring node during the node update. $u_a^n$ and $u_b^n$ are the $n$-th embeddings of nodes $a$ and $b$. $S_a$ represents all the explicit and implicit neighbor nodes of node $a$ in the graph. $\mathrm{softmax}()$ and $\mathrm{Relu}()$ both represent activation functions, and $W_3$ is the attention weight. The update of user PP nodes is shown in equation (10).

$q_a^{n+1} = u_a^n + \sum_{i \in C_a} \beta_i^n d_i^n, \quad \beta_i^n = \mathrm{softmax}\!\left(\mathrm{MLP}[u_a^n, d_i^n]\right)$ (10)

In equation (10), $q_a^{n+1}$ is the updated PP temporary node. $\beta_i^n$ is the weight ratio of adjacent nodes when a user node updates. $C_a$ is the set of all data nodes related to user $a$ in the graph. MLP is a multi-layer perceptron, a simple neural network used to perform nonlinear transformations on the feature vectors of nodes. It typically consists of multiple fully connected layers, each of which can be followed by a nonlinear activation function. The embeddings of user nodes and social data nodes have different meanings in each dimension. If attention scores are calculated using functions such as dot product or mean pooling, the resulting attention scores will be inaccurate. Therefore, attention neural networks are used to calculate the attention scores of each neighboring node, and the results obtained by each neural network are finally normalized. The updated user social preference temporary node and personal preference temporary node are weighted and fused to obtain the updated user node. The calculation method is shown in equation (11).

$u_a^{n+1} = \gamma^{n+1} p_a^{n+1} + \delta^{n+1} q_a^{n+1}, \quad \gamma^{n+1} + \delta^{n+1} = 1$ (11)

In equation (11), $u_a^{n+1}$ is the updated user node, and $\gamma^{n+1}$ and $\delta^{n+1}$ are the weights of the SP temporary node and the PP temporary node in the updated user node. Figure 5 shows the user preference fusion process.
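Equations (6)–(11) can be read as attention-weighted aggregation followed by a convex fusion. The sketch below is a minimal NumPy prototype under simplifying assumptions (a single attention head, a one-layer perceptron in place of the MLP, and an arbitrary fusion weight); it only illustrates the flow of the SP and PP updates, not the trained model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_social_pref(u_a, neighbours, W3):
    """Equation (9): attention-weighted aggregation over neighbour users."""
    scores = np.array([np.maximum(0, u_a) @ W3 @ np.maximum(0, u_b)
                       for u_b in neighbours])
    alpha = softmax(scores)
    return u_a + (alpha[:, None] * neighbours).sum(axis=0)

def update_personal_pref(u_a, data_nodes, mlp_w):
    """Equation (10): attention over the user's social-data nodes, scored
    here by a single-layer perceptron on the concatenated pair."""
    scores = np.array([np.concatenate([u_a, d]) @ mlp_w for d in data_nodes])
    beta = softmax(scores)
    return u_a + (beta[:, None] * data_nodes).sum(axis=0)

def fuse(p_a, q_a, gamma=0.5):
    """Equation (11): convex combination of SP and PP temporary nodes."""
    return gamma * p_a + (1.0 - gamma) * q_a

rng = np.random.default_rng(0)
dim = 16
u_a = rng.normal(size=dim)
neigh = rng.normal(size=(4, dim))      # explicit/implicit neighbour users
data = rng.normal(size=(3, dim))       # social-data nodes posted by the user
W3 = rng.normal(size=(dim, dim))
mlp_w = rng.normal(size=2 * dim)
u_new = fuse(update_social_pref(u_a, neigh, W3),
             update_personal_pref(u_a, data, mlp_w))
print(u_new.shape)                     # (16,)
```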
Figure 5: User preference fusion process.
In Figure 5, user preference fusion consists of three parts: the embedding layer, the preference propagation fusion layer, and the access control output layer. In the embedding layer, word embeddings are performed on user nodes and social data as inputs to the model. In the preference propagation fusion layer, two GNNs simulate the propagation and change patterns of user social preferences among users and the propagation and change patterns of user personal preferences in social data. After N rounds of propagation and fusion, user nodes are updated using attention mechanisms based on explicit neighbor nodes, implicit neighbor nodes, and social data nodes. In the access control output layer, the N user nodes obtained through the N rounds of preference propagation fusion are used to calculate the fusion coefficient through a linear neural network. Based on the fusion coefficient, the N user nodes are finally fused to obtain the final user nodes with user preferences. The fusion coefficient can quantify the importance or weight of the user social preference temporary nodes and user personal preference temporary nodes. Therefore, the calculation of the fusion coefficient is completed by updating the user nodes and normalizing through a nonlinear transformation and a softmax function. At the same time, the user embedding vectors after each propagation are multiplied by their corresponding fusion coefficients, and these weighted embedding vectors are added to obtain the final user node vector. The calculation method is shown in equation (12).

$C_{u_a}^n = \mathrm{Softmax}\!\left(\tanh(W_4 u_a^n + e\, d^T)\right), \quad u_a = \mathrm{sigmoid}\!\left(\sum_{n=1}^{N} C_{u_a}^n u_a^n\right)$ (12)

In equation (12), $C_{u_a}^n$ is the fusion coefficient, $W_4$ is the fusion weight of user nodes, $e$ is the natural constant, $d$ is the dimension, $\mathrm{sigmoid}()$ is the activation function, and $\tanh()$ is the hyperbolic tangent function. Finally, the loss function of the model is defined to measure the difference between the predicted results of the model and the true labels, as expressed in equation (13).

$L = -\frac{1}{M}\sum_a \left[y_a \log(pr_a) + (1 - y_a)\log(1 - pr_a)\right]$ (13)

In equation (13), $L$ is the loss and $M$ is the number of user nodes. $y_a$ is the true label of the $a$-th user node, with a value of 0 or 1: when access is allowed, it is 1, and when access is prohibited, it is 0. $pr_a$ is the probability of being judged as allowed access. During the training phase, the model continuously adjusts its parameters based on the difference between the true labels and the predicted probabilities to minimize the loss function and optimize the probability estimates for allowed access. According to the loss function, all user nodes are classified based on whether they are allowed access, thus completing access control. The implementation process of the designed ACM is shown in Figure 6.
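Equation (12) amounts to softmax-normalised fusion coefficients over the N propagation rounds, and equation (13) is a standard binary cross-entropy loss. A minimal sketch is given below; the e·d^T bias term of equation (12) is omitted for simplicity, and the embeddings and weights are random placeholders rather than values from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_rounds(round_embeddings, W4):
    """Equation (12), simplified: softmax fusion coefficients over N rounds,
    then a sigmoid over the coefficient-weighted sum of embeddings."""
    scores = np.array([np.tanh(W4 @ u).sum() for u in round_embeddings])
    coeffs = softmax(scores)
    return sigmoid((coeffs[:, None] * round_embeddings).sum(axis=0))

def bce_loss(y_true, p_pred, eps=1e-12):
    """Equation (13): mean binary cross-entropy over user nodes."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(1)
rounds = rng.normal(size=(5, 16))      # N = 5 propagation rounds, 16-dim user node
W4 = rng.normal(size=(16, 16))
final_node = fuse_rounds(rounds, W4)
print(final_node.shape)                # (16,)
print(bce_loss(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
```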
Figure 6: Implementation process of the proposed access control algorithm.
In Figure 6, the original data such as user information, social relationships, and content posted by users are first collected from social networks. Redundant information in the data is eliminated by removing duplicate items, and missing values are filled in to ensure the integrity of the data. The data are then uniformly converted through format conversion to complete the preprocessing of the collected data. The random seed is set to 42 using the random module and the numpy library in Python, thereby ensuring that the generated random number sequence is the same every time the code is run. Then, the basic attributes and social behavior of users are extracted to construct user feature vectors, content feature vectors, and social relationship features. The model is used to map user node content to the status space, outputting user node embedding matrices and content node embedding matrices. Then, by analyzing users' social behavior, personal and social preferences are obtained, and the MLA mechanism is introduced to update the representations of user nodes and content nodes, highlighting the influence of important neighbors. Users' social and personal preferences are combined to form a comprehensive user preference representation. Next, using the fused user nodes as input, iterative propagation is performed through the GNN to update the node representations. A loss function is then defined to measure the difference between the model's predicted results and the true labels, and the model parameters are adjusted to minimize the loss function. Finally, the trained model is employed to classify and predict new user nodes, determine whether to allow access to specific resources, and implement access control policies based on the results of the access control decisions. The purpose of these actions is to ensure user privacy protection in social networks.

4 Analysis of ACM results on social networks

This chapter mainly elaborates on the experimental results of the UP-GNN-SNAC model. The first subsection is a performance analysis of the ACM based on improved GNN. The second subsection is an analysis of the practical application effect of the ACM based on improved GNN.

4.1 Performance testing of social network ACM

To verify the performance of the proposed UP-GNN-SNAC model, this study conducts simulation experiments using Python 3.7 on a Windows 11 64-bit operating system equipped with an Intel Core i7-14700KF central processor, 16GB of RAM, and a 256GB hard drive. The preference propagation depth is 5, the learning rate is 0.0001, and the maximum number of iterations is 200. Accuracy is the most intuitive evaluation metric in classification models, representing the proportion of correctly classified samples to the total sample size. It measures the accuracy of the model in classifying user access permissions. The F1 value is the harmonic mean of precision and recall, used to comprehensively measure the performance of a model. It can balance the precision and recall of the model and avoid bias caused by imbalanced data. Firstly, the Twitter dataset is introduced to calculate the accuracy and F1 value of the research model, which are compared with the accuracy and F1 value of the traditional GNN and of the blockchain-based IoT ACM in reference [20]. The results are shown in Figure 7.
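The seeding step described in the preprocessing above, together with the accuracy and F1 metrics used in this subsection, can be reproduced with a few lines of Python. The snippet below is illustrative; scikit-learn's metric functions are used here for convenience, and the labels are fabricated, since the paper does not state which implementation was used.

```python
import random
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Fix the random seed so every run produces the same random sequences.
random.seed(42)
np.random.seed(42)

# Hypothetical access-control decisions: 1 = allow, 0 = deny.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```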
(a) Accuracy (b) F1
Figure 7: Accuracy and F1 value of different models.
In Figure 7 (a), as the number of iterations increases, the accuracy of the three models shows an upward trend. When iterating 200 times, the accuracy of the traditional GNN is 88.3±1.97%, the accuracy of the model in reference [20] is 91.4±2.03%, and the accuracy of the research model is 95.7±2.11%. Compared with the traditional GNN and the model in reference [20], the accuracy of the proposed ACM is improved by 7.4% and 4.3%, respectively. In Figure 7 (b), as iterations increase, the F1 values of the various models gradually increase and tend to flatten out. When the iteration reaches its maximum, the F1 values of the GNN, the model in reference [20], and the research model are 83.6±1.16%, 89.8±1.09%, and 96.2±1.22%, respectively. Compared with the traditional GNN model and the model in reference [20], the F1 value of the proposed model is increased by 12.6% and 6.4%, respectively. The accuracy and F1 value of the research model are significantly higher, proving its high classification accuracy and good effectiveness. The loss function can be used to measure the difference between the model's predicted results and the true labels. The experiment then introduces the Yelp dataset and calculates the loss of the research algorithm under the Twitter and Yelp datasets. The results compared with the other two algorithms are shown in Figure 8.
(a) Twitter dataset (b) Yelp dataset
Figure 8: Loss of different algorithms in different datasets.
In Figure 8 (a), on Twitter, as iterations increase, the losses of the different models all show a decreasing trend. The loss values of the traditional GNN, the model in reference [20], and the research model are 0.23±0.03, 0.14±0.02, and 0.07±0.03. In Figure 8 (b), the changes in the loss curves of the different models on Yelp are consistent with those on Twitter. The loss values of the three models are 0.12±0.02, 0.07±0.01, and 0.04±0.01. The loss value of the research model is much lower than the others, indicating its good generalization ability. It performs well on different datasets, indicating good scalability. To further validate the performance of the proposed model, its precision, recall, and Area Under the Curve (AUC) are calculated on both the Twitter and Reddit datasets, and it is compared with the traditional GNN, the Graph Convolutional Network, the signature scheme based on ciphertext policy attributes in reference [19], and the model in reference [20]. The AUC metric is a statistical technique that can comprehensively reflect the model's ability to distinguish between different categories. A higher AUC value indicates that the model can more accurately predict which users should be granted access permissions, thereby reducing the likelihood of erroneously denying legitimate access or erroneously approving illegal access.
Meanwhile, analysis of variance is used to evaluate the differences between models. ANOVA is a statistical method used to compare whether there is a significant difference in the means of two or more groups. It is a widely used tool in experimental design and data analysis. ANOVA compares the variability between different groups to determine whether the within-group variability is significantly smaller than the between-group variability. If the inter-group variability is significantly greater than the intra-group variability, it can be concluded that there are significant differences between the groups. The significance level is set to 0.05: if P<0.05, the difference between groups is statistically significant; otherwise, it is not. The results are shown in Table 2.

Table 2: Precision, recall, and AUC values of different models.
Dataset | Model | Precision | Recall | AUC | P
Twitter | GNN | 0.782 | 0.825 | 0.791 | <0.05
Twitter | Graph Convolutional Network | 0.825 | 0.831 | 0.796 | <0.05
Twitter | Reference [19] | 0.865 | 0.904 | 0.836 | <0.05
Twitter | Reference [20] | 0.903 | 0.932 | 0.919 | <0.05
Twitter | Designed algorithm | 0.966 | 0.943 | 0.982 | <0.05
Reddit | GNN | 0.768 | 0.813 | 0.778 | <0.05
Reddit | Graph Convolutional Network | 0.821 | 0.846 | 0.815 | <0.05
Reddit | Reference [19] | 0.871 | 0.913 | 0.857 | <0.05
Reddit | Reference [20] | 0.896 | 0.935 | 0.911 | <0.05
Reddit | Designed algorithm | 0.972 | 0.938 | 0.976 | <0.05

From Table 2, in the Twitter dataset, the precision, recall, and AUC values of the traditional GNN model are 0.782, 0.825, and 0.791, respectively. The three indicators of the Graph Convolutional Network are 0.825, 0.831, and 0.796, respectively. The three indicators of the model in reference [19] are 0.865, 0.904, and 0.836, respectively. The three indicators of the model in reference [20] are 0.903, 0.932, and 0.919, respectively. The three indicators of the proposed model are 0.966, 0.943, and 0.982, respectively. On the Reddit dataset, the precision values of the five models are 0.768, 0.821, 0.871, 0.896, and 0.972, respectively, with recall rates of 0.813, 0.846, 0.913, 0.935, and 0.938, and AUC values of 0.778, 0.815, 0.857, 0.911, and 0.976, respectively. On the different datasets, the precision, recall, and AUC values of the proposed model are significantly higher than those of the other models, and the differences between the three indicators of the five models are statistically significant (P<0.05), proving its good comprehensive performance and reliability. Finally, ablation experiments are conducted on the proposed model to calculate the accuracy, recall, F1 value, and running time of the different modules. The results are shown in Table 3.

Table 3: Results of ablation experiment.
Module | Accuracy | Recall | F1 | Running time (s)
Attention module | 0.774 | 0.819 | 0.807 | 69.58
GNN module | 0.862 | 0.842 | 0.853 | 43.12
Designed algorithm | 0.973 | 0.956 | 0.961 | 46.89

From Table 3, the accuracy, recall, F1 value, and running time of the attention module are 0.774, 0.819, 0.807, and 69.58 s, respectively. The accuracy, recall, F1 value, and running time of the GNN module are 0.862, 0.842, 0.853, and 43.12 s, respectively. The four indicators of the designed model are 0.973, 0.956, 0.961, and 46.89 s, respectively. The accuracy, recall, and F1 score of the designed model are higher than those of the two sub-modules, and its running time is lower than that of the attention module but slightly higher than that of the GNN module. Despite the augmented computational complexity of the model, it has been demonstrated to enhance prediction accuracy. In practical application scenarios, the additional temporal expenditure is deemed justifiable.
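The AUC and one-way ANOVA comparisons reported above follow standard procedures; the sketch below shows the typical calls with scikit-learn and SciPy. The scores and per-run accuracies are fabricated placeholders used only to demonstrate the calculation, not the paper's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy import stats

rng = np.random.default_rng(42)

# AUC for one model: true labels versus predicted allow-probabilities.
y_true = rng.integers(0, 2, size=200)
scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, size=200), 0, 1)
print("AUC:", roc_auc_score(y_true, scores))

# One-way ANOVA across three models' per-run accuracies (significance level 0.05).
acc_gnn   = rng.normal(0.78, 0.02, size=10)
acc_ref20 = rng.normal(0.90, 0.02, size=10)
acc_ours  = rng.normal(0.97, 0.01, size=10)
f_stat, p_value = stats.f_oneway(acc_gnn, acc_ref20, acc_ours)
print("F =", f_stat, "p =", p_value, "significant:", p_value < 0.05)
```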
4.2 Analysis of the practical application effect of ACM in social networks

To verify the practical application effect of the ACM based on improved GNN, this study first calculates the space overhead and computation time of the research model during encryption and decryption. It is compared with the results of the traditional GNN and the model in reference [20]. Space overhead refers to the storage space
occupied during the storage and operation of the model, as shown in Figure 9.
(a) Computational cost (b) Computation time
Figure 9: The computational cost and time of different models.
In Figure 9, as the social data volume increases, the space overhead and computation time of the different models gradually increase. When the social data scale is 30, the space overhead of the traditional GNN, the model in reference [20], and the research model is 43.7±3.13 Kb, 30.4±3.05 Kb, and 16.2±2.88 Kb, respectively, with computation times of 32.1±2.79 s, 15.9±2.92 s, and 5.3±0.97 s. The space overhead and computation time of the research model are much lower than those of the other models, which proves its high computational efficiency and low computational complexity. This study then validates the access control effectiveness of the research model from seven aspects: User Preference Quantification (UPQ), Historical Records (HR), Privacy Metrics (PM), Sensitivity, User Attributes (UA), Trust, and Personalization. If the effect matches, the output is 1; otherwise, the output is 0. Table 4 compares the model with the traditional GNN and the models in references [19] and [20]. Among them, UPQ can meet user needs, HRs are used to evaluate the consistency of user behavior, PMs and sensitivity can ensure data security and compliance, UAs can provide a basic access control basis, trust can evaluate the reliability of user behavior, and personalization can improve user experience. The results are shown in Table 4.

Table 4: Access control effectiveness of different models.
Index | GNN | Reference [19] | Reference [20] | Research algorithm
UPQ | 0 | 0 | 0 | 1
HR | 1 | 1 | 1 | 1
PM | 0 | 1 | 1 | 1
Sensitivity | 1 | 1 | 0 | 1
UA | 1 | 1 | 1 | 1
Trust level | 1 | 1 | 0 | 1
Personalization | 1 | 0 | 1 | 1

In Table 4, only the research model is consistent in terms of UPQ. In terms of HR and UA, all four models are consistent. In terms of PM, the traditional GNN does not comply. The model in reference [20] does not match in terms of sensitivity and trust level. This may be because that model has not dynamically evaluated user behavior, authentication, or contextual information, resulting in an inability to accurately measure trust levels. In terms of personalization, only the model in reference [19] does not match. The research model is consistent in all seven aspects, proving that its access control effect is relatively ideal. Finally, the Receiver Operating Characteristic (ROC) curve is introduced. The horizontal axis of the ROC curve represents the false positive rate, which is the proportion of all negative samples that were incorrectly predicted as positive. The vertical axis represents the true positive rate, which is the proportion of all actual positive samples correctly predicted as positive. The model should correctly identify requests that are actually positive samples as legitimate access and requests that are actually negative samples as illegal access. The ROC curves of the four models are calculated separately, and the results are shown in Figure 10.
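The ROC comparison in Figure 10 is a standard false positive rate versus true positive rate plot. The following sketch shows how such curves could be produced and compared for several models; the model names are taken from the text, but the score distributions are synthetic placeholders rather than the paper's outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, size=500)

# Placeholder scores: a larger separation mimics a stronger classifier.
models = {"GNN": 0.8, "Reference [19]": 1.2,
          "Reference [20]": 1.8, "Designed algorithm": 2.6}
for name, sep in models.items():
    score = y_true * sep + rng.normal(size=500)   # synthetic allow scores
    fpr, tpr, _ = roc_curve(y_true, score)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], "k--", label="chance")   # the diagonal reference line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```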
Figure 10: ROC curves and correlation coefficients R of four models.
From Figure 10, the ACM based on the traditional GNN is closest to the diagonal reference line and has the smallest area under the curve. The area under the curve of the model in reference [19] is the second smallest, followed by that of the model in reference [20]. The ROC curve of the proposed model is closest to the upper left corner, with the largest area under it, indicating its strong classification ability and proving the high accuracy of the user preference-based ACM built on the improved GNN.

5 Discussion

The research aims to improve the effectiveness of social network access control by utilizing the MLA mechanism to enhance GNN's understanding of complex social relationships and personal behavior. A GNN-based social network ACM based on user preferences is proposed. The results showed that the accuracy and F1 value of the proposed ACM improved by 7.4% and 12.6% respectively compared to the GNN model, and it also outperformed the blockchain-based IoT ACM, demonstrating its high classification accuracy. This is similar to the conclusion drawn by You M et al. [6], while the proposed model is superior. This is because the proposed model optimizes the GNN through the MLA mechanism, which can more effectively capture complex patterns of user preferences and social relationships, thereby significantly improving performance. The computation time for encryption and decryption of the proposed model was 5.3 seconds, which was much lower than that of the GNN model and the blockchain-based IoT ACM. This conclusion is consistent with the findings of Gai K et al. [7], but the running efficiency of the proposed model is higher than that of the method proposed by Gai K et al. This is because the proposed model significantly improves computation time through MLA and information entropy. In summary, the proposed model performs well in multiple aspects. Although the proposed model can more accurately identify legitimate and illegitimate access through user preferences and privacy measurement mechanisms, effectively improving network security, it also incurs certain additional computational overhead. Therefore, in practical applications, tuning needs to be carried out according to specific requirements.

6 Conclusion

ACM is crucial for the security of social networks, as it can help protect sensitive data and prevent malicious attacks and violations. To improve the accuracy and operational efficiency of social network ACM, a new type of ACM was designed based on the preferences of social network users. The user preferences were simulated using GNN, and the MLA mechanism was introduced to improve the model. The experimental results showed that the accuracy and F1 value of the proposed model were 95.7% and 96.2%, respectively, significantly higher than those of other models. This proved that, through the GNN and MLA mechanism, the model could dynamically capture user preference features and improve classification accuracy. The space cost of the proposed model was 16.2 Kb and the computation time was 5.3 s, which were significantly lower than the space cost and computation time of the other models. This proved that the model adopted a lightweight GNN architecture, reducing computational complexity and optimizing algorithm design to reduce space cost. Although the proposed ACM has superior performance, there are still some shortcomings. The study did not test it on different types of social platforms, and future research will further test the performance of the model on different social network platforms to improve its universality. At the same time, the performance of the model in dynamic environments will be explored to cope with the constantly changing user behavior and data traffic in social networks.
Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Data availability statement

All data generated or analysed during this study are included in this article.

References

[1] Gai T, Cao M, Chiclana F, Zhang Z, Dong Y, Herrera-Viedma E, Wu J. (2023). Consensus-trust driven bidirectional feedback mechanism for improving consensus in social network large-group decision making. Group Decision and Negotiation, 32(1): 45-74. https://doi.org/10.1007/s10726-022-09798-7
[2] Kashmar N, Adda M, Ibrahim H. (2022). Access control metamodels: review, critical analysis, and research issues. Journal of Ubiquitous Systems and Pervasive Networks, 16(2): 93-102. https://doi.org/10.5383/JUSPN.03.01.000
[3] Wang W, Huang H, Yin Z, Gadekallu T R, Alazab M, Su C. (2023). Smart contract token-based privacy-preserving access control system for industrial Internet of Things. Digital Communications and Networks, 9(2): 337-346. https://doi.org/10.1016/j.dcan.2022.10.005
[4] Thabit S, Yan L S, Tao Y, Abdullah A B. (2022). Trust management and data protection for online social networks. IET Communications, 16(12): 1355-1368. https://doi.org/10.1049/cmu2.12401
[5] Ameer S, Benson J, Sandhu R. (2022). Hybrid approaches (ABAC and RBAC) toward secure access control in smart home IoT. IEEE Transactions on Dependable and Secure Computing, 20(5): 4032-4051. https://doi.org/10.1109/TDSC.2022.3216297
[6] You M, Yin J, Wang H, Cao J, Wang K, Miao Y, Bertino E. (2023). A knowledge graph empowered online learning framework for access control decision-making. World Wide Web, 26(2): 827-848. https://doi.org/10.1007/s11280-022-01076-5
[7] Gai K, She Y, Zhu L, Choo K K R, Wan Z. (2023). A blockchain-based access control scheme for zero trust cross-organizational data sharing. ACM Transactions on Internet Technology, 23(3): 1-25. https://doi.org/10.1145/3511899
[8] Wu H, Ye W, Guo Y. (2023). Data access control method of cloud network secure storage under Social Internet of Things environment. International Journal of System Assurance Engineering and Management, 14(4): 1379-1386. https://doi.org/10.1007/s13198-023-01942-z
[9] Azbeg K, Ouchetto O, Andaloussi S J. (2022). Access control and privacy-preserving blockchain-based system for diseases management. IEEE Transactions on Computational Social Systems, 10(4): 1515-1527. https://doi.org/10.1109/TCSS.2022.3186945
[10] Zhang L, Zhang Y, Wu Q, Mu Y, Rezaeibagha F. (2022). A secure and efficient decentralized access control scheme based on blockchain for vehicular social networks. IEEE Internet of Things Journal, 9(18): 17938-17952. https://doi.org/10.1109/JIOT.2022.3161047
[11] Zhao Y, Yu H, Liang Y, Conti M, Bazzi W, Ren Y. (2023). A sanitizable access control with policy-protection for vehicular social networks. IEEE Transactions on Intelligent Transportation Systems, 25(3): 2956-2965. https://doi.org/10.1109/TITS.2023.3285623
[12] Squicciarini A, Rajtmajer S, Gao Y, Semonsen J, Belmonte A, Agarwal P. (2022). An extended ultimatum game for multi-party access control in social networks. ACM Transactions on the Web (TWEB), 16(3): 1-23. https://doi.org/10.1145/3555351
[13] Dixit M S, Wajgi M D, Wanjari S. (2022). Real time user access control on social network using deep learning. International Journal for Research Publication and Seminar, 13(2): 246-251. https://jrps.shodhsagar.com/index.php/j/article/view/598
[14] Wen W, Fan J, Zhang Y, Fang Y. (2022). APCAS: Autonomous privacy control and authentication sharing in social networks. IEEE Transactions on Computational Social Systems, 10(6): 3169-3180. https://doi.org/10.1109/TCSS.2022.3218883
[15] Safi S M, Movaghar A, Ghorbani M. (2022). Privacy protection scheme for mobile social network. Journal of King Saud University-Computer and Information Sciences, 34(7): 4062-4074. https://doi.org/10.1016/j.jksuci.2022.05.011
[16] Ahmed F, Wei L, Niu Y, Zhao T, Zhang W, Zhang D, Dong W. (2022). Toward fine-grained access control and privacy protection for video sharing in media convergence environment. International Journal of Intelligent Systems, 37(5): 3025-3049. https://doi.org/10.1002/int.22810
[17] Salem R B, Aimeur E, Hage H. (2023). A multi-party agent for privacy preference elicitation. Artificial Intelligence and Applications, 1(2): 98-105. https://doi.org/10.47852/bonviewAIA2202514
[18] Mayeke N R, Arigbabu A T, Olaniyi O O, Okunleye O J, Adigwe C S. (2024). Evolving access control paradigms: A comprehensive multi-dimensional analysis of security risks and system assurance in cyber engineering. Asian Journal of Research in Computer Science, 17(5): 108-124. https://doi.org/10.2139/ssrn.4752902
[19] Patil R Y. (2024). A secure privacy preserving and access control scheme for medical internet of things (MIoT) using attribute-based signcryption. International Journal of Information Technology, 16(1): 181-191. https://doi.org/10.1007/s41870-023-01569-0
[20] Zhonghua C, Goyal S B, Rajawat A S. (2024). Smart contracts attribute-based access control model for security & privacy of IoT system using blockchain and edge computing. The Journal of Supercomputing, 80(2): 1396-1425. https://doi.org/10.1007/s11227-023-05517-4
https://doi.org/10.31449/inf.v49i16.7787 Informatica 49 (2025) 37–52 37
Fusion of Deep Convolutional Neural Networks and Brain Visual
Cognition for Enhanced Image Classification
Xintao Li1, *, Hongyan Guo2
1College of Innovation and Entrepreneurship, Henan Open University, Zhengzhou 450046, China
2School of Information Engineering and Artificial Intelligence, Zhengzhou Vocational University of Information and
Technology, Zhengzhou 450046, China
*Email of Corresponding Author: lxt5168@163.com
Keywords: deep convolutional neural network, brain, visual cognition, intelligent computing model, image
classification
Received: December 9, 2024
The brain visual system is one of the core centers for human perception of external information. How to
establish the brain visual cognitive system to classify and process image information is a key matter in
the area of human-computer connection. In order to improve the accuracy of computer vision image
classification, a fusion intelligent computing model based on deep convolutional neural network and brain
visual cognition is proposed. This model simulates the visual processing mechanism of the human brain
and uses brain computer interface technology to extract electroencephalogram signals, thereby achieving
efficient classification and processing of image information. When designing an image classification
model based on DCNN, a long short-term memory network structure is introduced to extract time series
features of electroencephalogram signals. In order to enhance the classification accuracy of the model,
attention mechanism and occlusion independent neural response methods are also applied to improve the
accuracy of capturing the correlation information between brain response and image features. The results
show that the prediction accuracy of the research model reaches 93.54% and 94.03% in the V4 visual
region and L0 visual region, respectively. The highest accuracy on facial visual images reaches 95.46%,
while the lowest accuracy on animal visual images is 91.57%. By introducing the long short-term memory
module, the loss value of the model decreases from 0.26 to 0.21, with a reduction of 19.23%. In addition,
ablation experiments show that by introducing attention mechanisms and occlusion independent neural
responses, the final classification accuracy is improved to 93.94%. In summary, the research on the fusion
intelligent computing model grounded on deep convolutional neural networks and brain visual cognition
effectively improves the accuracy of image classification and demonstrated its potential in the field of
intelligent computing.
Povzetek: Predstavljen je inteligentni model za razvrščanje slik, ki združuje globoke konvolucijske
nevronske mreže (DCNN) in možgansko vizualno kognicijo preko EEG signalov.
1 Introduction

With the rapid prosperity of artificial intelligence, human-computer interaction has turned into a trend in the current research field. Brain computer interface (BCI), as a cutting-edge scientific research direction, is gradually becoming a meaningful bridge in the area of human-computer connection. The visual system of the human brain has evolved over millions of years and possesses extremely efficient visual processing capabilities. Through multi-level visual processing mechanisms, the brain can quickly and accurately understand complex visual information [1]. When external objects are transmitted to the visual center of the brain through the visual organs, the brain quickly recognizes, classifies, and understands this visual information, thereby forming cognition of the object or scene [2]. BCI can interpret visual cognition of the brain by recording and analyzing electroencephalogram (EEG) signals [3]. The Deep Convolutional Neural Network (DCNN) in computer vision technology has attracted much attention due to its outstanding performance in image processing tasks [4]. However, despite the excellent performance of computer technology in image classification, computers still cannot fully replace the precise image recognition and classification capabilities of the human brain in complex and diverse open environments with interference and occlusion [5]. So, the challenge currently facing the field of computer vision is figuring out how to empower artificial intelligence systems to more effectively mimic human brain cognition and attain precise image classification in intricate scenarios. Therefore, in this context, the research innovatively combines the powerful computing power of DCNN with the cognitive characteristics of the brain's visual system, and constructs an intelligent computing model based on the fusion of DCNN and brain visual cognitive information, in order to achieve accurate image classification in complex backgrounds.

The research objectives include designing and implementing an intelligent computing model based on DCNN and EEG signal fusion to improve the performance
of image classification in interference and occlusion generative adversarial networks and variational
environments. The research aims to explore how the autoencoders to produce composite EEG cues. The results
model simulates the visual recognition process of the showed that the method was effective [8]. Kumari et al.
human brain, especially for accurate image classification proposed a multi-channel EEG movement sorting model
in complex backgrounds. The research hypothesis is that to improve the precision of EEG movement sorting. The
by combining the visual feedback and image features of model utilized CNN to extract descriptive emotional state
the brain, intelligent computing models can simulate the characteristics from EEG signals and generates two-
visual recognition process of the human brain, thereby dimensional images to represent these features. The
improving the accuracy of classification results. The outcomes revealed that the overall precision of this model
preset results demonstrate that by introducing visual reached 83.04% [9].
cognitive information from the brain, the model can mimic DCNN occupies a momentous position in EEG
the cognitive process of the human brain in actual visual picture sorting tasks. Santamaria-Vazquez et al. raised a
tasks, providing new ideas and directions for the sorting model grounded on different control signals to
integration of BCIs and intelligent systems. extract complex features from EEG data for classification.
The research content mainly includes four sections. The model used DCNN for time calibration of BCIs and
The second section provides a survey of the current study integrated modules for detection of event-related
status of visual EEG picture classification and DCNN potentials. The outcomes revealed that the command
around the world. The third section conducts research on decoding accuracy of this way improved by 16.0% [10].
intelligent computing models that integrate DCNN and Yıldırım et al. raised a novel deep one-dimensional CNN
brain visual cognition. The first section proposes the monitoring model to optimize the precision of EEG
design of a picture sorting model grounded on the fusion monitoring. The model utilized machine learning
of DCNN and brain visual cognition information. The techniques to automatically identify regular and aberrant
second section designs an intelligent computing model EEG signals, and classified EEG signals using an end-to-
based on the fusion of DCNN and brain visual cognitive end structure. The outcomes revealed that this way was
information. The fourth section validates the intelligent feasible [11]. Miao et al. raised a multi-layer CNN model
computing model that integrates DCNN with brain visual using a DCNN structure to raise the classification
cognition. precision of EEG pattern identification algorithms. The
model utilized prior knowledge and complex parameter
2 Related works

The visual cognitive ability of the brain enables it to recognize, classify, and understand visual information. In recent years, research on visual interpretation based on monitoring the neural response of the brain during visual cognition has attracted the attention of numerous professionals and scholars. Gao et al. proposed an attention-based parallel multi-scale Convolutional Neural Network (CNN) model to improve the accuracy of decoding visual evoked potentials from EEG. The model used two parallel convolutional layers to extract temporal features and utilized attention mechanisms to weight features at different times. The outcomes revealed that the model effectively improved the decoding of visual evoked potentials under complex conditions [6]. Ahirwal et al. proposed a new channel selection technique that could identify and characterize harmful emotions, aiming to raise the precision of emotion classification from EEG signals. This technique extracted three forms of characteristics from EEG signals, namely time-domain, frequency-domain, and entropy-based characteristics, and used Support Vector Machines (SVM) and artificial neural networks to classify emotions based on the extracted features. The outcomes showed that this method effectively improved classification performance [7]. Komolovaitė et al. proposed a method that combined CNN with steady-state visual evoked potentials to obtain interpretable characteristics from raw EEG signals, in order to improve the effectiveness of brain activity data in classifying visual stimuli. This method also introduced generative adversarial networks and variational autoencoders to produce synthetic EEG signals. The results showed that the method was effective [8]. Kumari et al. proposed a multi-channel EEG emotion classification model to improve the precision of EEG emotion classification. The model utilized CNN to extract descriptive emotional state characteristics from EEG signals and generated two-dimensional images to represent these features. The outcomes revealed that the overall precision of this model reached 83.04% [9].

DCNN occupies an important position in EEG image classification tasks. Santamaria-Vazquez et al. proposed a classification model grounded on different control signals to extract complex features from EEG data for classification. The model used DCNN for time calibration of BCIs and integrated modules for the detection of event-related potentials. The outcomes revealed that the command decoding accuracy of this method improved by 16.0% [10]. Yıldırım et al. proposed a novel deep one-dimensional CNN monitoring model to optimize the precision of EEG monitoring. The model utilized machine learning techniques to automatically identify normal and abnormal EEG signals, and classified EEG signals using an end-to-end structure. The outcomes revealed that this method was feasible [11]. Miao et al. proposed a multi-layer CNN model using a DCNN structure to raise the classification precision of EEG pattern identification algorithms. The model utilized prior knowledge and complex parameter adjustments to extract spatial-frequency features. The outcomes showed that this method had good classification capability [12]. Li et al. proposed a method combining DCNN with the continuous wavelet transform to enhance the identification rate of limb motor imagery EEG signals. This method mapped the limb motor imagery EEG signals to time-frequency image signals using the continuous wavelet transform, and input the image signals into the CNN structure to extract characteristics and classify them. The outcomes revealed that this method effectively raised the recognition rate [13]. In recent years, the combination of BCI and DCNN has become an important research direction in the analysis of EEG and brain visual neural activity signals. The detailed progress of BCI is as follows. Tang et al. proposed an end-to-end BCI method based on CNNs, which directly extracts spatiotemporal features from EEG signals and classifies them. The results showed that this method could achieve higher classification accuracy than traditional manual feature extraction methods, especially in motor imagery tasks and various emotional state classification tasks [14]. In addition, Kawala-Sterniuk et al. reviewed over 50 years of BCI use and concluded that BCI not only enables brain control, but also opens the door to regulating the central nervous system through neural interfaces, demonstrating the potential applications of this technology [15]. Research on integrating BCI and DCNN will provide a more solid foundation for the popularization and application of BCI technology. The comparative summary is shown in Table 1.
Table 1: Comparison summary table

| Study | Method | Advantages | Limitations | Missing features |
|---|---|---|---|---|
| Gao et al. [6] | Parallel multi-scale CNN based on attention | Improved the decoding performance of visual evoked potentials | Still affected by noise in complex environments; requires processing a large amount of temporal features | Failed to effectively combine spatial and temporal features in the brain's visual cognitive process; cannot adapt to complex environmental visual information processing |
| Ahirwal et al. [7] | EEG-based emotion classification model combining SVM and artificial neural networks | Improved emotion classification accuracy | Focuses mainly on emotion classification; lacks deep classification and processing of visual information | Cannot process complex visual information and its complex relationship with emotions |
| Komolovaitė et al. [8] | Steady-state visual evoked potentials combined with CNNs | Effectively improved visual stimulus classification | Poor robustness to signal noise; high complexity in training generative adversarial networks | Failed to effectively combine visual cognitive mechanisms; limited to static visual stimulus processing |
| Kumari et al. [9] | Multi-channel EEG-based emotion classification model | Achieved an average accuracy of 83.04% | Focuses on emotion classification, mainly uses image feature representations, lacks handling of more complex scenarios | Cannot handle complex image classification tasks, especially multi-class image recognition |
| Santamaria-Vazquez et al. [10] | Classification model based on different control signals using DCNNs | Increased command decoding accuracy by 16.0% | Relies heavily on event-related potential detection; may face difficulties in decoding complex EEG data | Lacks adaptability to dynamic EEG signals; unable to combine spatial and temporal features |
| Yıldırım et al. [11] | EEG monitoring model based on deep one-dimensional CNNs | Provides a feasible classification method | Focuses on normal vs. abnormal EEG signal classification; lacks the ability to handle complex visual tasks | Cannot effectively process multi-class or dynamically changing visual information |
| Miao et al. [12] | EEG pattern recognition based on multi-layer DCNNs | Shows good classification performance | Mainly focuses on spatial-frequency feature extraction; may be limited in handling complex dynamic tasks | Lacks comprehensive capture of dynamic EEG data or multi-dimensional features of visual information |
| Li et al. [13] | Classification of left/right hand motor imagery EEG signals combined with continuous wavelet transform and DCNNs | Significantly improved recognition rate | Relies on signal preprocessing; suitable for specific tasks | Cannot process EEG signals related to visual tasks; sensitive to environmental noise |
In summary, although existing methods have made some progress in EEG classification tasks, they have certain limitations in handling complex dynamic tasks, enhancing robustness, and adapting to multiple tasks. The research combines the visual cognitive mechanism of the brain with DCNNs and Long Short-Term Memory (LSTM) networks to design a fusion intelligent computing model. This model can more comprehensively capture the spatial and temporal features of EEG signals, addressing the poor robustness and limited adaptability to complex environments of existing methods. It has higher classification accuracy and wide application prospects.

3 Intelligent computing model integrating DCNN and brain visual cognition

The research receives EEG information through a BCI, combines voxel encoding and an improved DCNN model to achieve image classification, and uses LSTM to extract the temporal characteristics of EEG signals. An attention mechanism is utilized to raise the accuracy of image feature extraction, and the correlation between brain responses and image features is enhanced by masking irrelevant neural responses.
3.1 Design of image classification model based on DCNN and brain visual cognitive information

Neuroscience research has found that the human brain achieves complex cognitive processing through parallel information exchange between the dorsal and ventral streams during visual activities [16]. The ventral stream is a pathway that connects the primary sensory cortex with the temporal and prefrontal regions, and it is primarily responsible for recognizing visual and auditory stimuli and mapping basic information to higher-level semantic concepts [17]. The dorsal stream is responsible for spatial information and motion control. The activity of brain neurons triggered by visual stimuli is recorded as EEG signals, and BCIs can record and measure these signals through biometric technology to reflect the brain's response to behavior. The core areas of the ventral stream include the primary visual cortex, the ventral intermediate cortex, and the ventral inferior temporal cortex, among other regions. The ventral inferior temporal cortex is particularly closely related to complex visual recognition and is the main functional area for object and face recognition. When the brain receives visual stimuli, the cortical regions in the ventral stream are stimulated, transforming simple visual features into higher-level cognitive concepts. For instance, visual information is initially processed by the primary visual cortex and then passed through intermediate areas, ultimately being mapped to the inferior temporal cortex within the ventral stream, where intricate functions such as object recognition and color discrimination take place. The dorsal stream is mainly responsible for processing spatial information, motion perception, and action control; it helps the brain perform functions such as object localization, motion tracking, and hand-eye coordination through connections with the parietal lobe, motor cortex, and other areas. Therefore, given the core role of the ventral stream in image classification tasks, the research focuses on analyzing the brain signal response of the ventral stream to better understand the process of visual feature extraction and semantic comprehension. The encoding framework for the ventral response based on brain visual cognition is shown in Figure 1.
Figure 1: A coding framework for ventral response based on brain visual cognition (the stimulus image passes through a feature extraction model and a linear layer that predicts brain activity in visual areas V1, V2, V4, and L0)
As shown in Figure 1, in the ventral response encoding framework based on brain visual cognition, the brain activity caused by visual stimuli can be obtained through the BCI, and the stimulus image can be input into the feature extraction model. After nonlinear calculation, the feature space of the image is obtained. Then, these features are used to predict the voxel space of the visual region through linear layers. The voxel encoding model transforms human-readable data into a format that machines can store, facilitating either shared encoding across various visual regions or unique encoding for specific visual areas. This process aids in pinpointing the regions within the brain's visual cortex that are responsible for processing visual information [18]. Voxel encoding converts brain activity into a feature space, enabling precise association between cognitive responses and visual stimuli. This mapping helps to reveal the roles of different brain regions in visual information processing, thereby enhancing the accuracy of image classification tasks. EEG signals capture the electrical activity of the cerebral cortex, which can be mapped to specific regions of the brain through modeling techniques such as source localization in order to infer activity responses in different areas. This type of method can correlate the spatiotemporal patterns of EEG signals with voxels in functional neuroimaging data. There may be some common neural response patterns between multiple visual regions. These shared response patterns can be captured in voxel encoding models, revealing how these regions collectively respond to the same visual stimuli. For example, in image classification tasks, certain visual regions may exhibit similar neural activity responses to the same visual features, so voxel encoding can reflect the similarity and interactivity between these regions as a shared encoding pattern. By combining the results of brain visual cognition and image classification, complementary information exchange and expression can be achieved, thereby obtaining a more comprehensive joint representation. DCNN can automatically learn image feature representations by combining convolutional and pooling layers, which helps extract abstract features from data. Therefore, the study adopts DCNN to extract image features and designs an image classification model grounded on the fusion of DCNN and brain visual cognitive information, as shown in Figure 2.
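The voxel encoding step of Figure 1 amounts to a learned linear readout from image features to voxel responses. The following is a minimal PyTorch sketch of that idea; the layer sizes and names (`feature_dim`, `n_voxels`) are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VoxelEncoder(nn.Module):
    """Predict brain (voxel) responses from image features via a linear layer."""
    def __init__(self, feature_dim: int = 512, n_voxels: int = 256):
        super().__init__()
        self.readout = nn.Linear(feature_dim, n_voxels)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, feature_dim) -> predicted responses: (batch, n_voxels)
        return self.readout(image_features)

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = VoxelEncoder()
    features = torch.randn(16, 512)   # hypothetical image features
    measured = torch.randn(16, 256)   # hypothetical measured voxel responses
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(5):                # a few illustrative training steps
        optimizer.zero_grad()
        loss = loss_fn(encoder(features), measured)
        loss.backward()
        optimizer.step()
```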
Figure 2: Image classification model grounded on information fusion (EEG signal acquisition produces brain response features and the DCNN produces image features; reliability prediction weights the two feature sets, which are spliced together and classified by an SVM)
As represented in Figure 2, the image classification model based on information fusion mainly includes three parts: a feature extraction structure, a feature reliability prediction structure, and a brain-computer information fusion classification structure. The brain response data utilizes the ventral response encoding framework to extract semantic features, while image features are extracted through the DCNN structure. Next, the extracted features are input into the feature reliability prediction structure for reliability calculation, and the fusion weights of the image features and brain response features are adjusted automatically. Finally, the fused features are input into the SVM for classification. After acquisition, EEG signals undergo denoising and other preprocessing operations, such as bandpass filtering, independent component analysis, and signal normalization, to ensure signal quality. Subsequently, the EEG signals are synchronized with the presentation time of the visual stimuli to ensure accurate matching between brain responses and image features at each moment. The loss function for reliability prediction is shown in equation (1).

L_{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( d_p - d_{f_b} \right)^2    (1)

In equation (1), L_{MSE} is the loss function of reliability prediction, d_p represents the predicted feature reliability, d_{f_b} represents the classification sensitivity index of the brain response features, and N represents the batch size. The fusion weight of the image features is shown in equation (2).

w_v = d_{f_v} / (d_{f_v} + d_{f_b})    (2)

In equation (2), w_v represents the fusion weight of the image features, and d_{f_v} represents the classification sensitivity index of the image features. The fusion weight of the brain response features is shown in equation (3).

w_b = d_{f_b} / (d_{f_b} + d_{f_v})    (3)

In equation (3), w_b represents the fusion weight of the brain response features. The mathematical description of the fused feature is given in equation (4).

f_F = (w_b f(b)) \, \mathrm{concat} \, (w_v f(v))    (4)

In equation (4), f_F represents the fused features, f(b) represents the brain response features, and f(v) represents the image features. Because EEG signals are collected over a continuous period of time and have time-series characteristics, there is a continuity relationship between the signals at each moment and those before and after [19]. However, although existing feature extraction models perform well in many application scenarios, they often do not fully consider the temporal dependencies in time series data. Traditional models such as CNNs excel at extracting features from images and static data, but they can fall short at capturing temporal information and dynamic signal changes in time-series data such as EEG signals. In response, the study uses an LSTM structure to extract time series features of EEG signals. The architecture for extracting brain response features based on time series is shown in Figure 3.
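To make the weighting scheme of equations (1)-(4) concrete, the sketch below computes fusion weights from the two classification-sensitivity indices and concatenates the weighted feature vectors, with the reliability predictor trained under the mean-squared-error loss of equation (1). Tensor sizes, the sensitivity values, and module names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

def fusion_weights(d_fv: torch.Tensor, d_fb: torch.Tensor):
    """Equations (2)-(3): weights from the classification sensitivity indices."""
    w_v = d_fv / (d_fv + d_fb)
    w_b = d_fb / (d_fv + d_fb)
    return w_v, w_b

def fuse(f_v: torch.Tensor, f_b: torch.Tensor, w_v, w_b) -> torch.Tensor:
    """Equation (4): concatenate the weighted image and brain-response features."""
    return torch.cat([w_b * f_b, w_v * f_v], dim=-1)

# Reliability prediction trained with the MSE loss of equation (1).
reliability_net = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
mse = nn.MSELoss()

if __name__ == "__main__":
    torch.manual_seed(0)
    f_v = torch.randn(16, 128)        # hypothetical image features
    f_b = torch.randn(16, 128)        # hypothetical brain-response features
    d_fb_true = torch.rand(16, 1)     # hypothetical sensitivity index of brain features
    d_p = reliability_net(torch.cat([f_v, f_b], dim=-1))   # predicted reliability
    loss = mse(d_p, d_fb_true)        # equation (1)
    w_v, w_b = fusion_weights(d_fv=torch.tensor(0.6), d_fb=torch.tensor(0.4))
    fused = fuse(f_v, f_b, w_v, w_b)  # fused features passed on to the SVM classifier
    print(fused.shape, loss.item())
```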
Figure 3: Architecture for extracting brain response features based on time series (T time stamps and N class tokens are filtered and passed through a temporal Transformer module, yielding T + N global features of the EEG signal)
As shown in Figure 3, the brain response feature extraction architecture based on time series uses a Transformer module to extract global features of EEG signals along the time series, and embeds absolute positions to maintain the temporal order for the model. Before the positional encoding is applied, the classification identification tokens are concatenated with the time series and then mapped through a linear transformation to increase the diversity of feature extraction. In the research model, LSTM is mainly used to integrate the brain response data collected from BCIs. The integration process is as follows. First, the brain response signals collected from the BCI system are preprocessed, for example by denoising and normalization, to obtain clean time-series data. Then, these preprocessed brain response data are used as inputs to the LSTM network. LSTM networks can capture temporal dependencies in the data and learn the neural response patterns of the brain at different time points. Next, through the time-dependent modeling of the LSTM, the output data contains the gradual response patterns of the brain to visual stimuli throughout the entire image processing process. Finally, the temporal response of the brain processed by the LSTM is combined with the image features extracted by the DCNN. The calculation of the forget gate of the LSTM structure is shown in equation (5).

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    (5)

In equation (5), f_t is the output of the forget gate, W_f is the weight of the forget gate, \sigma is the Sigmoid activation function, b_f represents the offset term of the forget gate, x_t represents the input signal at time t, and h_{t-1} represents the output signal at time t-1. The cell state update is shown in equation (6).

C_t = f_t C_{t-1} + i_t \tilde{C}_t    (6)

In equation (6), C_t is the cell state at time t, C_{t-1} represents the cell state at time t-1, i_t represents the output of the input gate, and \tilde{C}_t is the candidate cell state. The calculation of the output gate is shown in equation (7).

o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    (7)

In equation (7), o_t represents the output gate, W_o is the weight of the output gate, and b_o is the offset term of the output gate. The output features are shown in equation (8).

h_t = o_t \tanh(C_t)    (8)

In equation (8), h_t represents the output feature. The unique gating mechanism of the LSTM can effectively handle long time intervals and delays in time series, and can discard or store large-span information in EEG data, thus better encoding EEG signals.
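As a concrete illustration of the temporal branch defined by equations (5)-(8), the sketch below runs a standard PyTorch LSTM over a window of preprocessed EEG frames and keeps the final hidden state as the brain-response feature; the channel count, sequence length, and hidden size are assumptions for the example only.

```python
import torch
import torch.nn as nn

class EEGTemporalEncoder(nn.Module):
    """Encode a preprocessed EEG sequence with an LSTM (cf. equations (5)-(8))."""
    def __init__(self, n_channels: int = 64, hidden_size: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden_size,
                            num_layers=1, batch_first=True)

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # eeg: (batch, time_steps, n_channels)
        outputs, (h_n, c_n) = self.lstm(eeg)
        return h_n[-1]   # (batch, hidden_size): temporal brain-response feature

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = EEGTemporalEncoder()
    eeg_batch = torch.randn(16, 200, 64)   # hypothetical 200-step, 64-channel EEG window
    brain_feature = encoder(eeg_batch)
    print(brain_feature.shape)             # torch.Size([16, 128])
```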
3.2 Design of intelligent computing model based on DCNN and brain visual cognitive information

The study simulates the connectivity and classification patterns of biological brain neurons, exploring the connection between image features and brain reactions. DCNN has demonstrated significant capabilities in image feature extraction: by combining convolutional and pooling layers, it can automatically learn multi-level abstract feature representations of images, effectively capturing both low-level and high-level features. However, despite the DCNN's efficiency in feature extraction, the image features it extracts still struggle to fully explain the brain's response patterns. This is because the visual cognitive process of the brain relies not only on low-level visual features of images, but also involves complex high-level semantic information processing, perceptual integration, and interaction with other cognitive processes such as memory and emotion. The features extracted by the DCNN mainly focus on salient visual information in the image, but they often lack sufficient high-level semantic depth and are difficult to fully integrate with the complex responses of the brain in visual cognition. Therefore, the image characteristics extracted by the DCNN cannot fully explain the representation information of the brain response, and there are modality-specific expressions between the two, making it difficult to explore their deep correlations [20, 21]. In response to this limitation, a network structure based on DCNN is studied to construct an intelligent computing model for the brain. The brain response is used as supervised information for the images, and high-level semantic information that is difficult to interpret is transferred to the DCNN model to achieve more accurate mining of the brain's visual cognitive response. The intelligent computing model structure grounded on data fusion is represented in Figure 4.
Figure 4: Intelligent computing model structure grounded on data fusion (EEG signal acquisition yields brain response features b1 ... bN and the DCNN yields image features v1 ... vN; after normalization, both are mapped onto a shared hypersphere)
As represented in Figure 4, in the intelligent computing model framework based on information fusion, feature extraction is first performed on the cognitive response data of the brain to visual images collected by the BCI. Then, the DCNN structure is applied to extract features from the input image. After the two extracted feature sets are normalized separately, the fused features are mapped onto an N-dimensional sphere. Subsequently, based on the normalized features, a set of positive and negative samples is constructed, and the InfoNCE loss function is used for calculation, thereby achieving the transfer of correlated information between the two feature maps. The mathematical description of the InfoNCE loss function is represented in equation (9).

L_i = -\log \frac{\exp(S(z_i, z_i^{+}) / \tau)}{\sum_{j=0}^{N} \exp(S(z_i, z_j) / \tau)}    (9)

In equation (9), L_i represents the InfoNCE loss function, \tau represents the temperature coefficient, z_i represents the image representation corresponding to the input data x_i, S(z_i, z_j) represents the cosine similarity between image representations, S(z_i, z_i^{+}) represents the alignment characteristics during hypersphere mapping, and N represents the total number of positive and negative samples. The calculation of the contrastive loss is represented in equation (10).

L_i = -\log \frac{\sum_{j=0}^{m} \exp(S(f(v_i), f(b_j^{+})) / \tau)}{\sum_{k=0}^{n} \exp(S(f(v_i), f(b_k^{-})) / \tau)}    (10)

In equation (10), L_i represents the contrastive loss, m is the number of positive samples, n is the number of negative samples, f(v_i) is the mapped image feature, f(b_j^{+}) represents brain response features of the same category as the image features, and f(b_k^{-}) represents brain response features of categories different from the image features. In intelligent computing models based on the fusion of DCNN and brain visual cognitive information, the classification accuracy of the DCNN structure may be affected by irrelevant information. To address this issue, the study improves the DCNN structure by incorporating attention mechanisms. The DCNN feature extraction model grounded on the attention mechanism is represented in Figure 5.
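A minimal sketch of the hypersphere alignment objective in equations (9) and (10) is given below: both feature sets are L2-normalized, cosine similarities are scaled by a temperature, and each image representation is pulled toward brain responses of the same category. The batch construction, temperature value, and averaging over multiple positives are assumptions made for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(image_feats: torch.Tensor, brain_feats: torch.Tensor,
             labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss over normalized features (cf. equations (9)-(10)).

    image_feats, brain_feats: (batch, dim); labels: (batch,) category ids.
    Positives are brain responses sharing the image's category.
    """
    z_v = F.normalize(image_feats, dim=-1)   # map onto the unit hypersphere
    z_b = F.normalize(brain_feats, dim=-1)
    sim = z_v @ z_b.t() / temperature        # cosine similarities / temperature
    pos_mask = labels.unsqueeze(1).eq(labels.unsqueeze(0)).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-probability of the positive pairs for each image
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return loss.mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    v = torch.randn(16, 128)                 # hypothetical image features
    b = torch.randn(16, 128)                 # hypothetical brain-response features
    y = torch.randint(0, 4, (16,))           # hypothetical category labels
    print(info_nce(v, b, y).item())
```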
Figure 5: DCNN feature extraction model grounded on attention mechanism (feature maps X0, X1, X2, X3 produced by the transforms Frr1 and Frr2, with a parallel attention branch Fat producing the positional importance map A)
As represented in Figure 5, the DCNN architecture used mainly includes multiple convolutional layers, pooling layers, activation functions, and fully connected layers, aiming to extract multi-level feature representations from images to improve classification performance. The core idea of the DCNN is to extract local features from images through a series of convolution and pooling operations, and then add nonlinear transformations through activation functions to learn more complex image representations. The convolutional layer, as a fundamental component of the DCNN architecture, extracts local features of the input image through convolution operations. After each convolution operation in each layer, the study uses activation functions to perform nonlinear transformations on the output results. The purpose of the activation function is to introduce nonlinear factors so that the network can learn more complex mapping relationships. The function of the pooling layer is to downsample the feature map output by the convolutional layer, thereby reducing the spatial size of the feature map while preserving important features. After sufficient local features have been extracted by the convolutional and pooling layers, the last few layers are usually fully connected. The fully connected layer linearly combines the extracted features and generates the final output through an activation function. The DCNN feature extraction model grounded on the attention mechanism adds a parallel attention branch to the initial DCNN structure to learn the positional importance of the feature map. This branch can correct the activation values of feature maps and reduce the activation values of redundant information, thus improving the accuracy of image feature extraction [22, 23]. The feature transformation process is shown in equation (11).

X_1 = F_{rr1}(X_0)    (11)

In equation (11), X_1 represents the transformed abstract feature map, X_0 represents the initial feature map, and F_{rr1} represents the downsampling operation. The calculation of positional importance is shown in equation (12).

A = F_{at}(X_1)    (12)

In equation (12), A represents the positional importance and F_{at} represents a fully connected operation. The new feature map obtained by further downsampling the abstract features is shown in equation (13).

X_2 = F_{rr2}(X_1)    (13)

In equation (13), X_2 represents the new feature map after further downsampling, and F_{rr2} represents the further downsampling operation. The corrected feature map is shown in equation (14).

X_3 = \sum_{i=1}^{W_2} \sum_{j=1}^{H_2} A(i, j) X_2(i, j)    (14)

In equation (14), X_3 represents the feature map obtained after attention branch correction, W_2 and H_2 represent the width and height of the feature map, and (i, j) indexes the feature values on the feature map. When capturing the correlation information between brain visual cognitive responses and image features, some non-correlated neural responses may affect the determination of representation similarity. These "unrelated neural responses" pertain to neural activities that are not directly tied to visual tasks and might stem from background noise, irrelevant visual cues, or various other bodily influences. For example, the activity of certain regions in EEG signals may be unrelated to the current visual task, and this irrelevant neural activity can lead to misleading similarity judgments when the brain processes visual information. To address this issue, the intelligent computing model based on the fusion of DCNN and brain visual cognitive information is designed to raise the precision of capturing correlated information by masking non-correlated neural responses. Specifically, the study adds windows of different scales to the extracted image features to mask non-correlated neural responses. The motivation of this method is to better highlight the effective response of the brain to visual information and to improve the accuracy of similarity determination between brain visual cognitive responses and image features by reducing or eliminating the influence of irrelevant neural reactions. The visualization process of the correlation information between brain responses and image features is shown in Figure 6.
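The parallel attention branch of equations (11)-(14) can be sketched as follows: the feature map is downsampled, a per-position importance map A is produced, the features are downsampled again, and the attention map reweights the result (equation (14) is interpreted here as per-position reweighting). The concrete layer choices (kernel sizes, channel counts, a 1x1 convolution plus sigmoid standing in for F_at) are assumptions; the paper describes F_at only as a fully connected operation.

```python
import torch
import torch.nn as nn

class AttentionBranchBlock(nn.Module):
    """Feature transform with a parallel positional-attention branch (cf. eqs (11)-(14))."""
    def __init__(self, in_ch: int = 32, out_ch: int = 64):
        super().__init__()
        self.frr1 = nn.Sequential(                 # F_rr1: downsampling transform, eq. (11)
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU())
        self.fat = nn.Sequential(                  # F_at: positional importance, eq. (12)
            nn.Conv2d(out_ch, 1, 1), nn.Sigmoid())
        self.frr2 = nn.Sequential(                 # F_rr2: further downsampling, eq. (13)
            nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AvgPool2d(2)                # match A's resolution to X2 (assumption)

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        x1 = self.frr1(x0)                         # eq. (11)
        a = self.pool(self.fat(x1))                # eq. (12), resized to X2's spatial size
        x2 = self.frr2(x1)                         # eq. (13)
        x3 = a * x2                                # eq. (14): attention-corrected features
        return x3

if __name__ == "__main__":
    torch.manual_seed(0)
    block = AttentionBranchBlock()
    x = torch.randn(2, 32, 64, 64)                 # hypothetical feature map
    print(block(x).shape)                          # torch.Size([2, 64, 16, 16])
```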
Figure 6: Visualization process of brain response and image feature correlation information (features are extracted from the EEG signal and the image; the distances d(f(v), g(b)) and d(f(v_i(x, y)), f(b)) are compared across occlusion windows 1 ... n, and the resulting saliency maps V_1 ... V_n are combined into V_t)
As shown in Figure 6, in the process of visualizing the correlation information between brain responses and image features, the correlated features are first represented in the shared representation space and their Euclidean distance is calculated. Then, a window of scale i is added to the extracted image features to mask non-correlated neural responses. Based on the Euclidean distances calculated from the various image features, occlusion windows of different sizes are determined. The saliency maps obtained through occlusion at different scales are then combined to create a comprehensive saliency map that encapsulates the relationship between brain responses and image features. The calculation of the saliency map is shown in equation (15).

V_t = \left| d(f(v_t(x, y)), f(b)) - d(f(v), g(b)) \right|    (15)

In equation (15), V_t represents the saliency map and d represents the Euclidean distance.
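A minimal sketch of the multi-scale occlusion idea behind equation (15): each window of the image is masked in turn, the Euclidean distance between the re-extracted image features and the brain-response features is recomputed, and the absolute change relative to the unoccluded distance is recorded as the saliency of that window. The feature extractor, window size, and stride used here are illustrative assumptions.

```python
import torch

def occlusion_saliency(image: torch.Tensor, brain_feat: torch.Tensor,
                       extract_features, window: int = 32, stride: int = 32) -> torch.Tensor:
    """Saliency map via occlusion (cf. equation (15)).

    image: (C, H, W); brain_feat: (D,); extract_features: callable image -> (D,).
    """
    base_dist = torch.dist(extract_features(image), brain_feat)   # d(f(v), g(b))
    _, h, w = image.shape
    saliency = torch.zeros(h // stride, w // stride)
    for i, y in enumerate(range(0, h - window + 1, stride)):
        for j, x in enumerate(range(0, w - window + 1, stride)):
            occluded = image.clone()
            occluded[:, y:y + window, x:x + window] = 0.0          # mask one window
            dist = torch.dist(extract_features(occluded), brain_feat)
            saliency[i, j] = (dist - base_dist).abs()              # equation (15)
    return saliency

if __name__ == "__main__":
    torch.manual_seed(0)
    extractor = lambda img: img.mean(dim=(1, 2))                   # stand-in feature extractor
    img = torch.randn(3, 224, 224)
    b = torch.randn(3)
    print(occlusion_saliency(img, b, extractor).shape)             # torch.Size([7, 7])
```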
4 Validation of an intelligent computing model integrating DCNN and brain visual cognition

After setting up the experimental environment, the performance of the image classification model grounded on information fusion was first verified, and then the intelligent computing model based on information fusion was experimentally analyzed.

4.1 Experiment environment construction

To verify the effectiveness of the intelligent computing model that integrates DCNN and brain visual cognition, the study first constructed an experimental environment. The experimental hardware configuration was as follows: the processor was an Intel i7-8700, the GPU was an Nvidia GeForce 1080Ti, and the memory was 64 GB DDR4. The experimental model used the Python language and was implemented with the PyTorch framework. The experimental parameters were set as follows: the batch size was 16, the initial learning rate was 0.001, the Adam optimizer was used during training, the output layer size was 40, and the key vector value in the self-attention mechanism was 128. The dataset was sourced from the comprehensive evaluation platform Brain Score. This dataset aims to evaluate the effectiveness and accuracy of computational models that simulate brain operation, and therefore covers response data of primate visual systems. The dataset contains approximately 5000 image stimuli, each corresponding to recorded brain electrophysiological response data. The stimulus images cover a total of 40 categories, including natural scenes and artificial objects, and the number of images in each category is roughly equal to ensure data balance. The size of each image is 224x224 pixels, which retains sufficient visual information and meets the input requirements of the CNN. The data augmentation techniques used in the research include random cropping, horizontal flipping, random rotation, and color jitter. These augmentation techniques can effectively expand the diversity of the training data, avoid model overfitting, and improve generalization to various visual stimuli. After preprocessing, the data was separated into a training set and a testing set in a 3:7 ratio. While primarily intended for evaluating brain functional models, the Brain Score dataset is well-suited as a data source for verifying the efficacy of intelligent computing models that integrate image classification with brain visual cognition, given its abundance of visual stimulus images and corresponding EEG response data. In the experimental design of this study, the evaluation of image classification focuses on guiding the learning and classification of image features through brain response data, rather than on simple image classification. The detailed experiment environment configuration and network training parameters are represented in Table 2.
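The configuration in Table 2 and the augmentations listed above can be expressed roughly as follows in PyTorch/torchvision; the rotation angle, crop padding, jitter strengths, and the model variable are placeholders, since the paper names only the augmentation types and the optimizer settings.

```python
import torch
from torchvision import transforms

# Data augmentation named in the text: random crop, horizontal flip, rotation, color jitter.
train_transform = transforms.Compose([
    transforms.RandomCrop(224, padding=8),           # padding amount is an assumption
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),           # rotation range is an assumption
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam optimizer with the initial learning rate from Table 2."""
    return torch.optim.Adam(model.parameters(), lr=1e-3)

BATCH_SIZE = 16          # Table 2
OUTPUT_CLASSES = 40      # Table 2: output layer size
KEY_VECTOR_DIM = 128     # Table 2: key vector value of the self-attention module
```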
Table 2: Experiment environment configuration and network training parameters

| Experimental environment | Configuration | Training parameter | Configuration |
|---|---|---|---|
| CPU | Intel i7-8700 | Batch size | 16 |
| GPU | Nvidia GeForce 1080Ti | Initial learning rate | 0.001 |
| Memory | 64 GB DDR4 | Output layer size | 40 |
| Programming language | Python | Key vector value | 128 |
| Framework | PyTorch | Optimizer | Adam |

4.2 Performance verification of image classification model based on information fusion

In order to verify the predictive accuracy of ventral response encoding based on brain visual cognition for brain cognitive responses, this method was compared with other voxel encoding methods, including the Convolutional Neural Network Enhancement Model (CNN-EM) and GaborNet Visual Encoding (GaborNet-VE). The accuracy comparison of the different encoding methods in different visual regions is represented in Figure 7. From Figure 7(a), within the V4 visual region, the prediction accuracy of the ventral response encoding method based on brain visual cognition was significantly higher than that of the other two methods. The maximum prediction accuracy of this method reached 93.54%, which was 6.05% and 19.57% higher than the maximum prediction accuracies of CNN-EM and GaborNet-VE, which were 87.49% and 73.97%, respectively. From Figure 7(b), the results for visual area L0 show that the maximum prediction accuracy of the ventral response encoding method based on brain visual cognition was 94.03%, which was 11.49% and 18.95% higher than the maximum accuracies of 82.54% and 75.08% of the other two methods, respectively. In addition, the study used paired t-tests to validate the credibility of the results. In the V4 region, the difference in accuracy between the ventral response encoding based on brain visual cognition and CNN-EM reached a statistically significant level (t=4.72, P<0.05), and the difference between the ventral response encoding and GaborNet-VE was also statistically significant (t=6.88, P<0.05). In the L0 region, the accuracy difference between the ventral response encoding based on brain visual cognition and CNN-EM was statistically significant (t=5.23, P<0.05), and the difference between the ventral response encoding and GaborNet-VE was likewise statistically significant (t=7.14, P<0.05). Thus, ventral response encoding based on brain visual cognition could accurately predict brain cognitive responses.
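The paired t-tests reported above compare the accuracies of two encoders evaluated on the same splits; a minimal sketch with SciPy is shown below. The accuracy arrays are made-up placeholders, not the study's measurements.

```python
from scipy import stats

# Hypothetical paired accuracy scores of two encoders on the same evaluation splits.
brain_cognitive_encoding = [0.93, 0.91, 0.94, 0.92, 0.935]
cnn_em = [0.87, 0.86, 0.88, 0.85, 0.875]

t_statistic, p_value = stats.ttest_rel(brain_cognitive_encoding, cnn_em)
print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")   # significant if p < 0.05
```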
Figure 7: Comparison of prediction accuracy of different encoding methods (brain cognitive encoding, CNN-EM, GaborNet-VE) in (a) the V4 visual region and (b) the L0 visual region (* indicates P<0.05)
To further verify the performance of the image classification model based on information fusion, a comparative analysis was performed on the classification models before and after adding LSTM, as represented in Figure 8. From Figure 8, the loss value of the model before adding LSTM converged to 0.26, while the loss value of the model after inserting LSTM converged to 0.21, a reduction of 19.23%. This indicated that the classification model incorporating LSTM had better convergence performance.
Figure 8: Comparison of training loss values for the networks before and after incorporating the LSTM module (loss magnitude versus number of iterations, 0 to 1000)
To further validate the capability of the image classification model grounded on information fusion, this study compared the model with other advanced image classification models, including Feature Weighted Classification (FWC), the Residual Network (ResNet), and the Visual Geometry Group network (VGG). In addition, to ensure the broad applicability and contextualization of the research results, the performance of these models was compared with benchmark test results in the current field of computer vision. The accuracy comparison of the different classification models is represented in Figure 9. From Figure 9, across the datasets of the various visual image types, the accuracy of the image classification model grounded on information fusion was the best. On facial visual images, the accuracy of this model was as high as 95.46%, an improvement of 6.30%, 5.41%, and 10.03% over the accuracies of 89.16%, 90.05%, and 85.43% of FWC, ResNet, and VGG, respectively. This result also compares favorably with mainstream benchmarks in the field of facial recognition. In facial recognition tasks, many state-of-the-art technologies, such as FaceNet and ArcFace, achieved high accuracy on standard datasets such as LFW and CASIA-WebFace; for example, FaceNet reported an accuracy of 94.63% on the LFW dataset [6], and ArcFace achieved a recognition rate of nearly 94.51% on the same dataset [7]. However, those results were obtained in interference-free environments, while this study still achieved an accuracy of up to 95.46% in environments with complex interference and occlusion. Compared with existing benchmark tests, the image classification model based on information fusion therefore has a performance advantage. On animal visual images, the classification accuracy of this model was its lowest, at 91.57%, which was still 16.31%, 12.03%, and 19.08% higher than the accuracies of 75.26%, 79.54%, and 72.49% of the other three models, respectively. Compared with basic image classification tasks, animal classification often involves more complex backgrounds and varying object shapes, which makes this task an important criterion for testing model robustness. Therefore, the significant improvement of the research model on this task indicated that it has stronger generalization ability and adaptability when facing highly complex and dynamically changing visual environments. In summary, the image classification model based on information fusion demonstrated excellent classification performance across multiple tasks, and its performance retains a clear advantage over benchmark testing.

The reason why facial image recognition achieved better accuracy than animal image recognition is that facial images have more stable and easily recognizable features. Facial images typically have fixed structural features and relatively consistent backgrounds, which enables information fusion based models to fully exploit the effective information in the brain's visual cognitive model, thereby improving recognition accuracy. Animal images, by contrast, face more complex challenges, including background noise, changes in animal size and morphology, different shooting angles, and different species. These factors make the classification task more complex and varied. Therefore, in terms of recognition accuracy, the performance of facial image classification was better than that of animal image classification.
Figure 9: Comparison of classification accuracy (%) of the intelligent computing model, FWC, ResNet, and VGG across visual image types (face recognition, automobile, animal, fruits, chair)
4.3 Performance verification of intelligent computing models based on information fusion

To verify the ability of the intelligent computing model grounded on information fusion, the visualized sample distribution results of different models under various brain visual cognitive image stimuli were compared and studied. The dataset contains 40 categories of images, which are divided into two main groups: natural scenes and artificial objects. Natural scenes include categories such as faces and animals, while artificial objects include categories such as cars, fruits, and chairs. In the experiment, a combination of these image categories was used to test the classification performance of the model under different visual stimuli. The visualization outcomes of the various models are represented in Figure 10. From Figure 10, the sample distributions of FWC and VGG were relatively chaotic, while the sample distribution of ResNet was relatively clear. The ResNet model showed a more obvious distinction between facial and animal visual images, but it was more confused when distinguishing images such as fruits and cars. The intelligent computing model based on information fusion exhibited clear class separation and good classification performance under all visual image stimuli. This was because images of the facial and animal categories were more consistent in natural scenes and were easily distinguishable by the models, whereas categories such as cars, fruits, and chairs belong to artificial objects with significant visual differences among them, posing greater challenges to the models. The intelligent computing model based on information fusion revealed the deep-level features of the brain response, effectively improving classification accuracy.
Figure 10: Visualization results of the sample distributions of different models under various visual stimuli: (a) FWC, (b) ResNet, (c) VGG, (d) the intelligent computing model (legend: face recognition, automobile, animal, fruits, chair; axes: X-axis and Y-axis sample distribution)
To further validate the ability of the intelligent computing model grounded on information fusion, ablation experiments were conducted. The classification accuracy in the ablation experiments was calculated from the precision of the image classification task, and thus only reflects the accuracy of the image classification results. The ablation experiment results are shown in Table 3. From Table 3, the classification accuracy of the brain visual cognitive response encoding framework alone was 81.42%. When the DCNN structure was fused, the accuracy improved by 5.64%. After adding the LSTM structure, the accuracy increased to 90.48%. When the attention mechanism was added, the classification accuracy increased by a further 2.17%. When the model was further optimized by occluding non-correlated neural responses, its accuracy reached 93.94%. It can be seen that each of the added modules benefited the classification performance of the model, effectively raising the classification accuracy on images.

Table 3: Ablation experiment

| Brain response coding framework | DCNN | LSTM | Attention mechanism | Obstructing non-correlated neural responses | Accuracy rate/% |
|---|---|---|---|---|---|
| √ | / | / | / | / | 81.42 |
| √ | √ | / | / | / | 87.06 |
| √ | √ | √ | / | / | 90.48 |
| √ | √ | √ | √ | / | 92.65 |
| √ | √ | √ | √ | √ | 93.94 |

Note: "√" indicates that the module is present; "/" indicates that it is not.

5 Discussion

In order to improve the accuracy of computer vision image classification, a fusion intelligent computing model was constructed by simulating the visual processing mechanism of the human brain, using BCI technology to extract the EEG signals generated by human visual cognition, and combining them with a DCNN structure. The results showed that after adding LSTM, the convergence of the model was significantly improved, with the loss value decreasing
from 0.26 to 0.21, a reduction of 19.23%, indicating improved convergence. This showed that LSTM could effectively capture time series features and improve the model's ability to process time-series data, making the model more accurate in learning dynamic information. After incorporating LSTM, the model could better understand the temporal dependencies in brain activity, resulting in more accurate prediction performance. In addition, compared with other advanced methods, the research method was clearly superior. For example, although the model studied by Gao et al. effectively improved the decoding performance of visual evoked potentials in complex environments, it still faced the problem of noise interference and failed to effectively integrate the spatial and temporal features of the brain's visual cognitive process [6]. The model studied in this article not only considers spatial features but also integrates dynamic temporal information when predicting brain responses in visual regions, significantly improving the accuracy of predictions. In addition, the model proposed by Ahirwal et al. achieved good results in emotion classification, but it mainly focused on emotion classification and cannot handle complex visual information or multi-class image classification tasks [7]. The model studied in this article can not only handle complex visual information, but can also adapt to the multidimensional features of the brain's visual cognitive process, thus exhibiting a more comprehensive classification and understanding of visual information.

Potential extensions of the research model to other tasks include video analysis and multi-modal data fusion. Video data contains not only the spatial information of static images but also dynamic time series information. Therefore, models based on brain visual cognition can better understand the dynamic changes in videos by integrating spatial and temporal features, especially with the addition of LSTM modules. In the field of multi-modal data fusion, cross-modal learning can be achieved by introducing multi-modal neural network structures and combining data from different modalities. For example, in video description generation tasks, the visual information of video frames can be combined with speech or text information to generate more accurate and natural descriptions.

The reason why the research method is superior to other methods is that it considers both the spatial and the temporal characteristics of the brain in the visual cognitive process, while other methods rely more on a single spatial or static feature. In addition, the introduction of LSTM further enhances the model's ability to process temporal information, enabling the model to decode complex dynamic brain signals more accurately. The potential applications of this finding cover fields such as neuroscience experiments, intelligent medical devices, and brain-computer interaction systems. However, the method still has certain limitations. For example, the study only explored the classification of EEG images, so the research results are not comprehensive enough. This aspect can be improved in the future.

6 Conclusion

In recent years, the introduction of the brain's visual cognitive mechanisms has provided new solutions to the limitations in accuracy and generalization of traditional DCNNs when processing complex visual information. The research used BCIs to receive EEG information, used voxel encoding models to obtain the expression content of visual images, and combined them with an improved DCNN structure to construct an efficient image classification model. On this basis, the LSTM structure was further introduced to extract time series features of EEG signals, and attention mechanisms and the occlusion of irrelevant neural responses were utilized to enhance the accuracy of capturing correlation information between brain responses and image features. The outcomes revealed that the ventral response encoding method grounded on brain visual cognition achieved prediction accuracies of 93.54% and 94.03% in the V4 and L0 visual regions, significantly better than the CNN-EM and GaborNet-VE methods. In the model validation, after adding the LSTM module, the loss value decreased from 0.26 to 0.21, a reduction of 19.23%. In terms of image classification capability, the accuracy of the information fusion based model on facial visual images was as high as 95.46%, and its lowest accuracy, on animal visual images, was 91.57%, both significantly better than comparative models such as FWC, ResNet, and VGG. In addition, ablation experiments showed that by introducing attention mechanisms and occluding irrelevant neural responses, the final classification accuracy was improved to 93.94%. Overall, the fusion intelligent computing model based on DCNN and brain visual cognition effectively improved the accuracy of computer vision image classification.

Although the research focused on EEG image classification and achieved good classification results in the relevant areas of the ventral stream and visual regions, its current scope has not yet covered other brain tissues and neural mechanisms. Therefore, future research can be extended to explore the functions of other brain regions, such as their contributions to tasks like cognitive control and emotion recognition. In addition, combining different neural mechanisms and multi-modal data will help improve the comprehensiveness and accuracy of cognitive image classification, thereby promoting further development in the field of BCIs. Future work will strive to further enhance the ability to analyze EEG information for complex visual stimuli through the integration of broader neural regions and mechanisms, in order to promote the widespread application of intelligent computing models in practice.

Funding

This study was supported by the Key Science and Technology Program of Henan Province (Project Name: Research on NoC Routing Algorithm and Fault-Tolerant Technology Based on Spanning Tree Sub-Domains; Grant No. 252102210225).
References

[1] Wilson H, Chen X, Golbabaee M, Proulx M J, O'Neill E. Feasibility of decoding visual information from electroencephalogram. Brain-Computer Interfaces, 2024, 11(1-2): 33-60. DOI: 10.1080/2326263X.2023.2287719
[2] Finlayson S G, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Saria S. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 2021, 385(3): 283-286. DOI: 10.1056/NEJMc2104626
[3] Masana M, Liu X, Twardowski B, Menta M, Bagdanov A D, Van De Weijer J. Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(5): 5513-5533. DOI: 10.1109/TPAMI.2022.3213473
[4] Zhu Y, Zhuang F, Wang J, Ke G, Chen J, Bian J, He Q. Deep subdomain adaptation network for image classification. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(4): 1713-1722. DOI: 10.1109/TNNLS.2020.2988928
[5] Hong D, Gao L, Yao J, Zhang B, Plaza A, Chanussot J. Graph convolutional networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(7): 5966-5978. DOI: 10.1109/TGRS.2020.3015157
[6] Gao Z, Sun X, Liu M, Dang W, Ma C, Chen G. Attention-based parallel multiscale convolutional neural network for visual evoked potentials electroencephalogram classification. IEEE Journal of Biomedical and Health Informatics, 2021, 25(8): 2887-2894. DOI: 10.1109/JBHI.2021.3059686
[7] Ahirwal M K, Kose M R. Audio-visual stimulation based emotion classification by correlated electroencephalogram channels. Health and Technology, 2020, 10(1): 7-23. DOI: 10.1007/s12553-019-00394-5
[8] Komolovaitė D, Maskeliūnas R, Damaševičius R. Deep convolutional neural network-based visual stimuli classification using electroencephalography signals of healthy and Alzheimer's disease subjects. Life, 2022, 12(3): 374-379. DOI: 10.3390/life12030374
[9] Kumari N, Anwar S, Bhattacharjee V. Time series-dependent feature of electroencephalogram signals for improved visually evoked emotion classification using EmotionCapsNet. Neural Computing and Applications, 2022, 34(16): 13291-13303. DOI: 10.1007/s00521-022-06942-x
[10] Santamaria-Vazquez E, Martinez-Cagigal V, Vaquerizo-Villar F, Hornero R. Electroencephalogram-inception: a novel deep convolutional neural network for assistive ERP-based brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2020, 28(12): 2773-2782. DOI: 10.1109/TNSRE.2020.3048106
[11] Yıldırım Ö, Baloglu U B, Acharya U R. A deep convolutional neural network model for automated identification of abnormal electroencephalogram signals. Neural Computing and Applications, 2020, 32(20): 15857-15868. DOI: 10.1007/s00521-018-3889-z
[12] Miao M, Hu W, Yin H, Zhang K. Spatial-frequency feature learning and classification of motor imagery electroencephalogram based on deep convolution neural network. Computational and Mathematical Methods in Medicine, 2020, 2020(1): 1981728-1981752. DOI: 10.1155/2020/1981728
[13] Li F, He F, Wang F, Zhang D, Li X. A novel simplified convolutional neural network classification algorithm of motor imagery electroencephalogram signals based on deep learning. Applied Sciences, 2020, 10(5): 1605-1624. DOI: 10.3390/app10051605
[14] Tang X, Shen H, Zhao S, Li N, Liu J. Flexible brain-computer interfaces. Nature Electronics, 2023, 6(2): 109-118. DOI: 10.1038/s41928-022-00913-9
[15] Kawala-Sterniuk A, Browarska N, Al-Bakri A, Pelc M, Zygarlicki J, Sidikova M, et al. Summary of over fifty years with brain-computer interfaces - a review. Brain Sciences, 2021, 11(1): 43-45. DOI: 10.3390/brainsci11010043
[16] Cohn N. Your brain on comics: a cognitive model of visual narrative comprehension. Topics in Cognitive Science, 2020, 12(1): 352-386. DOI: 10.1111/tops.12421
[17] Finlayson S G, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Saria S. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 2021, 385(3): 283-286. DOI: 10.1056/NEJMc2104626
[18] Bicanski A, Burgess N. Neuronal vector coding in spatial cognition. Nature Reviews Neuroscience, 2020, 21(9): 453-470. DOI: 10.1038/s41583-020-0336-9
[19] Franzen L, Stark Z, Johnson A P. Individuals with dyslexia use a different visual sampling strategy to read text. Scientific Reports, 2021, 11(1): 6449-6455. DOI: 10.1038/s41598-021-84945-9
[20] Zhou S K, Greenspan H, Davatzikos C, Duncan J S, Van Ginneken B, Madabhushi A, Summers R M. A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE, 2021, 109(5): 820-838. DOI: 10.1109/JPROC.2021.3054390
[21] Basso M A, Bickford M E, Cang J. Unraveling circuits of visual perception and cognition through the superior colliculus. Neuron, 2021, 109(6): 918-937. DOI: 10.1016/j.neuron.2021.01.013
[22] Jeong J J, Tariq A, Adejumo T, Trivedi H, Gichoya J W, Banerjee I. Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. Journal of Digital Imaging, 2022, 35(2): 137-152. DOI: 10.1007/s10278-021-00556-w
[23] Wang F. Automatic ink painting rendering technique based on deep convolutional neural networks. Informatica, 2025, 49(5): 95-108. DOI: 10.31449/inf.v49i5.7112
https://doi.org/10.31449/inf.v49i16.6312 Informatica 49 (2024) 53-66 53
Research on Optimization Method of Landscape Architecture
Planning and Design Based on Two-Dimensional Fractal Graph
Generation Algorithm
Sheng Chen
Shanxi Vocational University of Engineering and Technology, School of Architectural Design, Department of Cultural Heritage Conservation Engineering
E-mail: cs10078910@163.com
Keywords: design optimization, generation algorithm, landscape architecture, two-dimensional fractal graph.
Received: May 30, 2024
The development of modern mathematical theory, especially the two-dimensional fractal graph algorithm, makes large-scale landscape data processing possible. Landscape digital identification technology is an innovative technology based on digital landscape technology and computer identification of experimental data. It is an important artificial intelligence technology that includes three steps: landscape acquisition, landscape processing, and landscape identification. The characteristics of the scene in a landscape picture can be collected by special instruments, such as cameras; the collected data can then be processed by the two-dimensional fractal graph algorithm, finally realizing the automatic identification of the landscape. For images with significant boundary characteristics, the boundary of the region can be extracted quickly and accurately, so as to realize the segmentation of the region. However, when the edge features of the image are not good enough, when there is little color difference between the background and the region, or when there is interference, the result is very poor. In this paper, based on the two-dimensional fractal graph generation algorithm, a series of optimizations of landscape architecture planning and design is carried out. The landscape pixel accuracy can reflect whether specific types of landscape pictures can be correctly identified and segmented. 200 pictures are divided into six categories, namely water scene, landscape scene, living scene, sky scene, architecture, and transportation, and the two-dimensional fractal graph network-8s, network-16s, and network-32s variants are then compared. The method reached the best level in pixel accuracy, average accuracy, average IU, and other metrics, with pixel accuracy and average accuracy reaching as high as 100%. When compared to the recommended algorithm, the 2D fractal graph generation algorithm has the highest accuracy (94.52%), precision (93.34%), and recall (94.18%) in the classification process.
Povzetek: Razvita je optimirana metoda načrtovanja krajinske arhitekture z uporabo algoritma
generiranja dvodimenzionalnih fraktalnih grafov, kar je omogočilo učinkovito avtomatsko prepoznavanje
in segmentacijo krajinskih elementov.
1 Introduction

The performance of landscape architecture is a comprehensive concept, which refers to the characteristics and functions of the whole life cycle of landscape architecture, including early analysis, conceptual design, construction and operation. For a long time, landscape architecture has depended on the designer's prediction: the design is completed on the basis of what the designer foresees. However, designers cannot predict all factors with complete accuracy, so it is impossible to produce a perfect design. It is nevertheless possible to maximize the perfection of design results through as much analysis and thinking as possible and by considering as many factors as possible. The elements of the site environment are the material premise on which landscape architecture depends. The design of landscape architecture is not carried out in a static way, and its form will inevitably exist in the site environment. Human perception in the space environment, together with sound, light, heat and other factors in the natural environment, affects the form. Determining how to respond to these "unpredictable" immaterial influences, as well as how to respond to force, energy and feeling in the structure, is the main task in the performance optimization process; Figure 1 below shows the specific landscape architecture design plan.
Figure 1: Planar heat planning and design in landscape architecture design

For example, in the landscape architecture design shown in Figure 1, designers can operate on and optimize the form according to their perception of the natural environment of the site or its human environment, so as to realize the transformation of the original single, pure form.
The form-finding model and multi-objective optimization based on performance digital analysis make performance itself a factor and method of form creation, and help designers complete the design as an optimization technique. The performance-oriented optimization design process of landscape architecture structures can be understood as follows: the designer's interpretation of the spatial and environmental conditions is the foundation, performance optimization software is the main technical means, and the landscape architecture structure is dissected into a material system whose form self-organizes from the macro through the medium to the micro scale; it is a form generation process from top to bottom and a form of self-expression. Here, the dominant position of the designer is expressed by interpreting the site environment and predicting the function and the relationship between the constructed form and the performance target. This design idea can be roughly summarized into three kinds of form generation and feedback processes: the first is a dynamic interaction between the form of the structure and the human subject; the second is the environmental or other external forces acting on the various forms of the structure, and the resistance of these forms to the environment or other factors; the third is the interaction between the components of the structure itself. Therefore, the performance of landscape architecture can also be summarized into three categories, the first of which is the spiritual demand (the spatial feeling brought by the space to the user).
In terms of the connotation of garden landscape, in addition to the traditional garden landscape there are also landscape preference, landscape competitiveness evaluation, etc., and the connotation of evaluation is gradually deepening. From the 1980s to the 1990s, landscape evaluation focused more on beauty estimates, the environment, the study of different models, and the evaluation of visual landscape and visual effects; the main landscape evaluation models fall into three categories: the descriptive factor method, the questionnaire survey method and the aesthetic attitude determination method. At present, the evaluation of landscape resources uses both qualitative and quantitative approaches, and fuzzy comprehensive evaluation models established with the AHP method are common, covering the perspective of ecology, GIS technology, the tourists' demand angle, landscape image, etc. High-resolution landscape architecture imagery can now provide a large amount of landscape information with rich characteristics, so it is widely used in landscape assessment [1-3].
The two-dimensional fractal image generation algorithm is an image segmentation method based on image features. Its working principle is to select the boundary points in a region as candidate boundaries and to select a method that can splice them to obtain the boundary of the region through the inconsistency of the features of the regions. Several edge detection operators, including the first-order differential Sobel operator, the Roberts operator and the second-order differential Laplacian operator, are usually used to extract the edge of a region.
The partition technique of the generation algorithm based on the two-dimensional fractal graph is essentially a partition based on a similarity criterion, which includes several common methods. In this kind of method, a series of basic texture pixels is used to describe each region, and an extended growth criterion is determined to expand the region. Then the growth of adjacent seed pixels is calculated to determine whether the adjacent pixels should be added to the set of seed pixels. When no new pixel is found, the growth process ends. The most critical step is to define the rules of seed selection and growth [4-5].
The watershed method mainly regards the pixel points of each scene in landscape architecture as coordinates in the whole graph and represents each location with one pixel value. It then works in a way similar to a flood overflowing: low-lying places with low pixel values are plains, and high places are mountains of a basin; as water is poured in, the lower the terrain, the easier it is to be flooded. After enough water is poured into the area, a depression is formed, creating an open region. However, there is too much segmentation due to the interference of the pixels.
In general, this method transforms the landscape into a grid in the sense of graph theory: each pixel point is treated as a node, and the connection between these nodes is called an edge. The common way to define edges is to calculate the dissimilarity of pixel points in the neighborhood according to the correlation of pixel values and then treat it as the edge weight, so as to obtain the graph G = <V, E>. In general, G is a weighted undirected graph, and its weights are usually defined according to the actual situation. The basic principle of the graph-theoretic approach is to cut off several edges of G and separate those parts of G that are no longer connected together, thus realizing the partition of G. In the partitioned graph G, each independent subgraph is matched with the corresponding partition, which realizes the image segmentation [6-8].
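As an illustration of this graph-based partition idea (a minimal sketch, not the paper's exact procedure), the following Python code builds a 4-neighbour pixel graph G = <V, E> with edge weights given by grey-level dissimilarity, removes edges whose weight exceeds a threshold, and labels the remaining connected components as regions; the threshold value and the use of absolute grey differences as weights are assumptions made only for this example.

import numpy as np

def graph_partition(img, tau=10.0):
    """Partition a grey-level image by cutting high-dissimilarity edges of a
    4-neighbour pixel graph and labelling the remaining connected components."""
    h, w = img.shape
    parent = np.arange(h * w)          # union-find forest over the pixel nodes V

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Edges E: connect 4-neighbours whose dissimilarity (absolute grey difference)
    # stays below the threshold tau; edges above tau are "cut".
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w and abs(float(img[y, x]) - float(img[y, x + 1])) <= tau:
                union(i, i + 1)
            if y + 1 < h and abs(float(img[y, x]) - float(img[y + 1, x])) <= tau:
                union(i, i + w)

    labels = np.array([find(i) for i in range(h * w)]).reshape(h, w)
    return labels  # each independent subgraph gets one label, i.e. one region

if __name__ == "__main__":
    demo = np.zeros((6, 6), dtype=np.uint8)
    demo[:, 3:] = 200                  # two flat regions separated by a strong edge
    print(np.unique(graph_partition(demo)).size, "regions found")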
2 Related works

Table 1: Summary of related works

[9] Methods/Algorithm: a multi-bit Trie-tree technique with non-collision hashing. Merits: test results indicate that this approach has a good success rate for data reconstruction and a high performance efficiency. Limitations: under this design, the necessary data reconstruction technology has not been developed.
[10] Methods/Algorithm: a survey that updates trends in organizations, processes, and outcomes for NPD in the United States, performed a little over five years after PDMA's initial best-practices survey. Merits: best-practice companies have higher expectations for their NPD programs, use more versatile teams, and are more likely to monitor NPD processes and outcomes. Limitations: businesses that do not maintain current NPD practices will find their competitiveness lacking in growth.
[11] Methods/Algorithm: an effective hybrid approach for dynamic mesh generation based on Delaunay graph mapping and Radial Basis Functions (RBFs). Merits: the hybrid RBFs-Delaunay graph mapping approach is found to be as accurate and effective as the Delaunay technique in building dynamic meshes for various test scenarios. Limitations: Delaunay graph mapping is lacking in efficiency.
[12] Methods/Algorithm: a novel two-dimensional modification of hat functions (2D-MHFs) to solve linear Fredholm integral equations. Merits: the method is desirable from a computational standpoint, and its great accuracy is demonstrated by a few numerical examples. Limitations: further information on the error analysis is needed.
[13] Methods/Algorithm: an introduction to genetic algorithm-based aeronautics. Merits: the method, particularly when incorporating numerous free design factors and configurable modeling parameters, is more versatile and computationally efficient than the traditional approach. Limitations: only a small number of snapshots are computed via computational fluid dynamics.
[14] Methods/Algorithm: a thorough approach to water and energy operation optimization. Merits: simulated experiments based on real network data validate the viability of the suggested operation optimization strategy; the comprehensive energy-saving rate reaches 31.3%, effectively lowering the costs associated with system operation. Limitations: lack of performance metrics for the suggested operation.
3 Research methods

3.1 Two-dimensional fractal graph: an algorithm based on conditional random point partition

According to the human landscape of the scene, the subjectivity of architectural design is used to the full to carry out the artistic creation of landscape architecture and visualize its art. The design of this experimental landscape architecture structure is based on a landscape pavilion with roof and columns where people can watch and chat. The reasons are as follows.
First of all, in landscape architecture works the interaction between people and places is very critical, which requires the architectural design to be considered comprehensively according to the designer's own experience and the situation of the site. After a detailed arrangement of the area, the author divides it into two parts: the natural environment and the cultural environment. The experiment was conducted in a humanized manner centered on the central axis of the Guangzhou center, targeting white-collar workers, tourists and nearby residents. In view of the large urban population and its various groups, the construction of the landscape architecture layout should not only meet the needs of people but also take into account the needs of history, culture and society, that is, pay full attention to the structure of landscape architecture and people's behavior.
On this basis, a concept of random spatial structure based on the background, the two-dimensional fractal graph, is proposed. This method uses each pixel in the background as a node, as shown in Figure 2 below, and uses the corresponding relationship between each pair of points as the boundary. The minimum-state conditional random field is then used to divide the landscape garden scene.

Figure 2: The most basic representation of the two-dimensional fractal graph algorithm

The energy of the conditional random field in image semantic segmentation is expressed as follows:

E(X) = \sum_{i \in V} \varphi_i(x_i) + \sum_{i \in V, j \in N_i} \varphi_{ij}(x_i, x_j), \quad \forall i, \; x_i \in L   (1)

In the formula, V represents the set composed of all landscape element pixels in the landscape image, N_i represents the set of pixels adjacent to landscape element pixel i, which usually consists of the four neighbouring pixels at the top, bottom, left and right. L represents the set of categories of the landscape element pixel classification, and x_i is the category corresponding to pixel i. \varphi_i(x_i) is the unary potential energy function; its general form is the logarithm of the class likelihood probability corresponding to pixel x_i. The likelihood probability can be learned from the actual pixel features, and the most common pattern is trained directly on the pixel values. However, such models only consider the color of the landscape image pixels and are not particularly comprehensive. Therefore, color and texture are generally selected:

\varphi_i(x_i) = \lambda_T \varphi_T(x_i) + \lambda_{col} \varphi_{col}(x_i) + \lambda_l \varphi_l(x_i)   (2)

Here \varphi_T and \varphi_{col} are potential energy functions trained on the texture and value-size (color) features, and \varphi_l is a potential function defined according to the location characteristics of the pixels. \lambda_T, \lambda_{col} and \lambda_l are their respective weights, which are generally obtained through training.
The binary potential energy is generally defined in the form

\varphi_{ij}(x_i, x_j)   (3)

\varphi_{ij}(x_i, x_j) = \begin{cases} 0, & x_i = x_j \\ g(i, j), & x_i \neq x_j \end{cases}   (4)

Such functions are generally defined on the relationship between the pixel values of adjacent pixels: for adjacent pixels belonging to the same class the function value is 0, otherwise it is determined by the function g(i, j).
At present, the random field model is used for post-processing to correct the early semantic model; the unary potential function is usually used to repair and refine the preceding semantic segmentation, making the original semantic segmentation more accurate.
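To make the role of the unary and binary potentials in Eqs. (1)-(4) concrete, the following minimal Python sketch evaluates the energy E(X) of a candidate labelling over a 4-neighbourhood grid; the simple Potts-style penalty used for g(i, j) and the synthetic unary costs are assumptions for illustration only, not the paper's trained potentials.

import numpy as np

def crf_energy(labels, unary, beta=1.0):
    """Energy of Eq. (1): sum of unary potentials phi_i(x_i) plus pairwise
    potentials phi_ij(x_i, x_j) over 4-neighbour pairs (Eq. (4), Potts form)."""
    h, w = labels.shape
    # Unary term: unary[y, x, c] = phi_i(x_i = c), e.g. negative log class likelihoods.
    e_unary = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()

    # Pairwise term: 0 when neighbouring labels agree, g(i, j) = beta otherwise.
    e_pair = beta * np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    e_pair += beta * np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return e_unary + e_pair

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unary = rng.random((4, 4, 3))               # 3 candidate landscape classes
    labels = unary.argmin(axis=2)               # labelling that minimises the unary term
    print("E(X) =", crf_energy(labels, unary))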
3.2 Description of the algorithm

• Updating 2D Search Velocity: every search agent's velocity is updated according to its velocity, position, and the best-known positions of both itself and its neighbors in the same level.
• Updating 2D Search Position: next, each search agent's position is modified in accordance with its velocity.
• Termination of Optimization: the optimization process is carried out recursively until a termination criterion is satisfied.
• Best Answer Extraction: the ideal answer to the landscape design challenge is ultimately determined to be the best-known position among all search agents.

Effective navigation of the landscape design space is made possible by the 2D fractal graph creation technique, which takes advantage of this hierarchical classification to balance exploration and exploitation across various layers of search agents. Fractal graphs converge to ideal landscape designs through iterative refinement directed by the equations guiding velocity and position updates. The hierarchical classification that the 2D fractal graph introduces improves its effectiveness and efficiency in navigating the intricate landscape design space. Fundamentally, the fractal graph method is a multi-level categorization scheme that groups search agents according to how well they can explore and exploit new areas. Because of its hierarchical structure, the 2D method is able to strike a balance between exploitation, which helps to refine promising solutions, and exploration, which allows for the discovery of a variety of landscape design alternatives.
Algorithm 1: Two-dimensional fractal graph for landscape architecture planning and design
Initialize:
- Define the 2D landscape planning and design problem.
- Parameters: population size (N), maximum number of iterations (MaxIter), hierarchical levels (L), weights (w, w_local, w_global),
acceleration coefficients (c1, c2), and the classifications.
Generate initial fractal graph:
- Randomly initialize N search agents with positions and velocities within the solution space.
Evaluate stability:
- Evaluate the stability of each search agent using the landscape design objective function.
Main loop:
For iter = 1 to MaxIter:
Update hierarchical classification:
- Classify search agents into hierarchical levels based on their stability and exploration-exploitation characteristics.
For each level L:
Update velocity and position:
For each search agent in level L:
Update velocity:
- Calculate cognitive and social components:
cognitive_component = c1 * rand() * (p_local - position)
social_component = c2 * rand() * (p_global - position)
58 Informatica 49 (2025) 53-66 S. Chen
- Update velocity:
velocity = w * velocity + cognitive_component + social_component
Update position:
- Update position:
position = position + velocity
Evaluate fitness:
- Evaluate the stability of the new position using the landscape design objective function.
Update local best:
- Update local best position if current stability is better than previous.
Update global best:
- Identify the search agent with the best architecture stability among all levels.
Return global best as the optimal solution.
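A minimal Python sketch of the velocity and position updates described in Algorithm 1 is given below. It follows the cognitive/social update equations of the pseudocode on a generic objective function; the sphere-like test objective, the parameter values and the omission of the hierarchical level classification are simplifying assumptions of this example, not part of the paper's method.

import numpy as np

def optimize(stability, dim=2, n=20, max_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Velocity/position update loop of Algorithm 1 (hierarchical levels omitted):
    each agent is pulled toward its local best and the global best position."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n, dim))       # initial search agents
    vel = np.zeros((n, dim))
    p_local = pos.copy()
    f_local = np.array([stability(p) for p in pos])
    g = p_local[f_local.argmin()].copy()               # global best position

    for _ in range(max_iter):
        # cognitive_component = c1 * rand() * (p_local - position)
        # social_component    = c2 * rand() * (p_global - position)
        vel = (w * vel
               + c1 * rng.random((n, 1)) * (p_local - pos)
               + c2 * rng.random((n, 1)) * (g - pos))
        pos = pos + vel                                 # position update
        f = np.array([stability(p) for p in pos])       # evaluate stability
        improved = f < f_local                          # update local bests
        p_local[improved], f_local[improved] = pos[improved], f[improved]
        g = p_local[f_local.argmin()].copy()            # update global best
    return g, stability(g)

if __name__ == "__main__":
    best, value = optimize(lambda p: float(np.sum(p ** 2)))  # placeholder objective
    print("best design parameters:", best, "objective:", value)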
3.3 A survey of the computational methods of two-dimensional fractal graphs

Since the emergence of the two-dimensional fractal graph, it has attracted wide attention for its unique advantages; with the development of technology, people's understanding of it has become more and more profound, its performance keeps improving, and it has been widely used in many fields. This is especially true for identification tasks, and in particular for the landscape. The 2D fractal graph has many advantages; the key point is weight sharing which, as in a biological neural network, reduces the number of weights and thus the difficulty of network modeling. Compared with the conventional approach, it also saves a lot of tedious preprocessing, such as reconstructing the related data.
Among the numerous hierarchical networks, the 2D fractal graph is one of the most widely used. Ahead of BP, the method can effectively reduce the training parameters and thus greatly improve the performance of the algorithm. The 2D fractal graph method can effectively shorten the preparation of the input, saving working time for the user and reducing the workload. Layer by layer, with a new way of computing, each kind of new data is added into the new system, and this method can be applied in many respects.
The formation of architectural form is mainly an innovative embodiment of the relationship between the spatial structure and the architectural structure of the building site. The landscape structure is a unique kind of landscape architecture, and its structure has a high degree of similarity and a high degree of convergence. This makes it possible for the environment and the structural form in landscape design to communicate directly, not mechanically or indirectly adapting one to the other, but being integrated into the purpose of the landscape design behavior. Although the message contained in each place is different, its essence is to seek a symbiotic relationship between human and nature. Its inner essence is palpable and obvious, such as the use of local materials and traditional crafts. The internal expression is a spiritual guidance for designers to convey and express the inner meaning of the place by accurately grasping the characteristics of the scene.
Its meaning includes two levels: first, designers use the power of nature to show a landscape structure with deep humanistic characteristics and spiritual emotions, and make its spatial connotation appear and continue according to their own experience. Due to the addition of the landscape architecture structure, a new artistic spirit and cultural connotation are added, so that the symbol has the natural artistic spirit of the place. Whatever the meaning, it shows that the landscape architecture structure is a man-made intermediary between people and places, and the essence of its design is to integrate human thought and subjective will into the information of places. Under the guidance of the design, people often naturally associate or recall, realize the intention of the designer, and thus resonate with it.
The 2D fractal graphic method belongs to the category of deep learning; its structural characteristics are similar to those of deep learning, having both locality and a hierarchical organization. The method adopts a kind of supervised training, which allows it to extract more accurately the information contained in the input data. This method can improve the learning efficiency of the 2D fractal graph.
In practical applications there is usually no classification marking, so it is necessary first to perform unsupervised learning on the data according to the characteristics of the label information itself in order to obtain rules, and then to learn from the labelled supervision data; this both makes full use of the samples and copes with the situation in which labelled information is scarce.

Figure 3: Computational architecture and general formula of two-dimensional fractal graphs

As can be seen in Figure 3, the input level is a neural network; "+1" refers to a point known as a split (bias) point. The system has three kinds of structure: input, output and hidden. There is only one output; the input is on the left, the output is on the right, and in the center is the hidden part, which is fully connected. The hidden data, including the data on the hidden nodes, cannot be displayed during training.
In the whole network structure, n_l denotes the number of layers, and the network in the figure contains three layers; the last layer is the output layer. The parameters can be written as:

(W, b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)})   (5)

W_{ij}^{(1)} is the connection parameter between the j-th unit of layer l and the i-th unit of layer l+1, and b_i^{(l)} is the bias of the i-th unit in layer l+1. The neural network computation uses one of the units as the output; given the known input and the two parameter groups, the function h_{W,b}(x) can be used for prediction, thus producing the final result and output, where a_i^{(2)} denotes the activation of the i-th hidden unit. The specific calculation steps are as follows:

a_1^{(2)} = f(W_{11}^{(1)} x_1 + W_{12}^{(1)} x_2 + W_{13}^{(1)} x_3 + b_1^{(1)})
a_2^{(2)} = f(W_{21}^{(1)} x_1 + W_{22}^{(1)} x_2 + W_{23}^{(1)} x_3 + b_2^{(1)})
a_3^{(2)} = f(W_{31}^{(1)} x_1 + W_{32}^{(1)} x_2 + W_{33}^{(1)} x_3 + b_3^{(1)})
h_{W,b}(x) = a_1^{(3)} = f(W_{11}^{(2)} a_1^{(2)} + W_{12}^{(2)} a_2^{(2)} + W_{13}^{(2)} a_3^{(2)} + b_1^{(2)})   (6)

The forward calculation principle is shown in Figure 3. The neural network structure and the Logistic model structure are nearly the same in both calculation method and principle, but the biggest difference between them is the convolution in the neural network. Suppose the 2D fractal graph algorithm handles data of a given size: the width of the image is denoted by w and its height by h, the two forming a plane, and the number of image colour channels is denoted by d, giving a volume of h * w * d.
The basic network is formed by convolution and pooling layers, and these two basic structures take local inputs as their unit, so the 2D fractal graph algorithm keeps the structure invariant under image translation; in other words, an element is only related to its position in space. If the data vector at given coordinates in one layer is known, the data vector of the next layer is computed by the following formula:

Y_{ij} = f_{ks}(\{X_{si+\delta i,\, sj+\delta j}\}), \quad 0 \le \delta i, \delta j \le k   (7)

In the above formula, k is the size of the convolution kernel, s is the step length (stride), and f_{ks}() is the function defining the adopted operation, which in general can be a matrix convolution, average pooling, a nonlinear excitation function, or a maximum-value operation such as max pooling.
When the parameters satisfy the corresponding conditions, the principle of mutual transformation between such operations can be represented as:

f_{ks} \circ g_{k's'} = (f \circ g)_{k'+(k-1)s',\, ss'}   (8)

When dealing with nonlinear equations, the general 2D fractal graph algorithm very commonly adopts the method of solving with nonlinear filters, which is fully adopted by the two-dimensional fractal graph. A fully convolutional net can be used to express the two-dimensional fractal graph network, and because its output image corresponds to its input image, the size of the input is not restricted.
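The layer rule of Eq. (7) can be illustrated with a short, self-contained sketch: a generic sliding-window operator with kernel size k and stride s, instantiated once as average pooling and once as max pooling. The toy input and the choice of these two instantiations are assumptions made purely for illustration.

import numpy as np

def layer_op(x, k, s, f_ks):
    """Eq. (7): Y[i, j] = f_ks({X[s*i + di, s*j + dj] : 0 <= di, dj < k})."""
    h, w = x.shape
    out_h, out_w = (h - k) // s + 1, (w - k) // s + 1
    y = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = x[s * i: s * i + k, s * j: s * j + k]
            y[i, j] = f_ks(window)
    return y

if __name__ == "__main__":
    x = np.arange(36, dtype=float).reshape(6, 6)       # toy feature map
    avg_pool = layer_op(x, k=2, s=2, f_ks=np.mean)     # average pooling instance
    max_pool = layer_op(x, k=2, s=2, f_ks=np.max)      # max pooling instance
    print(avg_pool.shape, max_pool.shape)              # both (3, 3)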
3.4 Evaluation indexes of the garden landscape element level based on the two-dimensional fractal graph algorithm

In the process of semantic classification of various scenes, it is sometimes impossible to differentiate each scene, so a scene may be classified into other scene types, causing a somewhat fuzzy effect. This paper adopts a new method: the images are compared with the real world, and the classification results obtained are digitally processed and judged as the final result for the landscape image. The commonly used segmentation criteria are adopted here for the statistics of accuracy.
In this paper, n_ij denotes the number of pixels whose true semantics belong to class i and which are judged to belong to class j, n_cl denotes the total number of categories, and t_i = \sum_j n_{ij} is the total number of pixels of class i. The overall accuracy is calculated with the following formula:

\frac{\sum_i n_{ii}}{\sum_i t_i}   (9)

The per-class accuracy, which measures how many pixels belonging to a given scenery class are correctly assigned to that class, can be expressed as:

\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i}   (10)

The average IU (mean intersection over union) is obtained by calculating, for each landscape element category, the ratio of the correctly predicted scenery pixels to the union of the predicted pixels and the pixels of the original category; the result is the final discriminant index, which can be expressed by the formula:

\frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}}   (11)
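The three indexes in Eqs. (9)-(11) can be computed directly from the class confusion matrix n_ij; the short Python sketch below does so for an arbitrary pair of ground-truth and predicted label maps (the toy 3-class example is an assumption for illustration only).

import numpy as np

def segmentation_metrics(gt, pred, n_cl):
    """Pixel accuracy (9), mean per-class accuracy (10) and mean IU (11)
    computed from the confusion matrix n[i, j] = #pixels of class i predicted as j."""
    n = np.zeros((n_cl, n_cl), dtype=np.int64)
    np.add.at(n, (gt.ravel(), pred.ravel()), 1)

    t = n.sum(axis=1).astype(float)                  # t_i = sum_j n_ij
    diag = np.diag(n).astype(float)                  # n_ii
    pixel_acc = diag.sum() / t.sum()                 # Eq. (9)
    mean_acc = np.nanmean(diag / t)                  # Eq. (10)
    union = t + n.sum(axis=0) - diag                 # t_i + sum_j n_ji - n_ii
    mean_iu = np.nanmean(diag / union)               # Eq. (11)
    return pixel_acc, mean_acc, mean_iu

if __name__ == "__main__":
    gt = np.array([[0, 0, 1], [2, 1, 1], [2, 2, 0]])
    pred = np.array([[0, 1, 1], [2, 1, 1], [2, 0, 0]])
    print(segmentation_metrics(gt, pred, n_cl=3))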
4 Result analysis

4.1 The practical application of the two-dimensional fractal figure

After a long period of improvement and development of 2D fractal graph network training, there are two main ways of training, one of which is training in the presence of supervision. This paper uses a supervised learning algorithm: the original image and its corresponding manually segmented image are used in the modeling of the 2D fractal graph network [14].
However, the influence ratio of each factor of the actual landscape architecture structure on the landscape architecture structure is not the same, and the aspects have some kind of mutual coupling relationship. For example, when the landscape building structure is opened to the light environment, attention must be paid to the structural performance to ensure the stability and constructability of the structure. From this it can be seen that the combined effects of building characteristics and lighting conditions are very complex and some even contradict each other. For another example, the average sunshine of the structure is strongly related to factors such as the vertical scale, but in a specific design the coupling of sunlight with the horizontal and vertical scales is difficult to handle.
Supervised learning is a kind of machine learning whose technical feature is the ability to learn a mapping function. During training, every sample has a target and a desired output, which is what is called "supervision". The supervision technique refers to comprehensively analysing the input data at runtime and obtaining the corresponding mapping, so as to process a new set of sampled data: if new data of the kind generated before arrives, the model labels these new samples with categories. In the algorithms for two-dimensional fractal graphs, guided learning algorithms are usually used for training, while supervised learning is usually based on gradients (Krizhevsky et al, 2012). Batch stochastic gradient descent methods are commonly used. In describing the learning process of the two-dimensional fractal graph, we use only one example to simplify the description. The method is divided into two stages: a forward stage and a reverse stage. The first stage is carried out in turn until the final result is produced; in the second stage, the weights and biases are adjusted according to the error of the output, and after the operation ends the weights and biases of each level are updated accordingly. If c is the number of classes for a sample during classification, its error function formula is as follows:

J(W, b; x, y) = \frac{1}{2} \sum_{k=1}^{c} (t_k - y_k)^2 = \frac{1}{2} \lVert t - y \rVert^2   (12)

Here W represents the weights in the neural network, b represents the biases in the neural network, the training sample is represented by x, and the corresponding label of the training sample is represented by y. t_k denotes the k-th dimension component of the predicted value generated when predicting sample x, and y_k represents the k-th dimension component of the label of the sample to be predicted.
When doing back propagation, the first thing to do is to calculate the error terms at each level in a certain order. Suppose that the error term \delta^{(l+1)} of the (l+1)-th layer has been calculated according to the above formula, the weight of this layer is W and the bias parameter is b. If both layers are fully connected, the error term of the l-th layer can be calculated using the following formula:

\delta^{(l)} = ((W^{(l)})^T \delta^{(l+1)}) \cdot f'(z^{(l)})   (13)

The corresponding gradient calculation formulas are as follows:

\nabla_{W^{(l)}} J(W, b; x, y) = \delta^{(l+1)} (a^{(l)})^T, \quad \nabla_{b^{(l)}} J(W, b; x, y) = \delta^{(l+1)}   (14)

If the l-th layer is a feature extraction stage, that is, a convolution layer followed by a sampling layer, then the error term of the l-th layer can be calculated by the formula:

\delta_k^{(l)} = \mathrm{upsample}((W_k^{(l)})^T \delta_k^{(l+1)}) \cdot f'(z_k^{(l)})   (15)

Here the subscript k indexes the k-th convolution kernel. After a series of runs in the upsampling layer, the error obtained can be transmitted to the previous layer through the subsampling layer, which refers back to the convolution layer. If average sampling is used, the sampling layer assigns the simple average of the error to the sub-zones processed before sampling; if max sampling is used, then during forward propagation the sample location that produced the maximum value receives all of the error, and the remaining values are 0 [15].
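As a small illustration of Eqs. (12)-(14) (not of the convolutional case in Eq. (15)), the following Python sketch runs one forward pass and one backward pass of a fully connected two-layer network with the squared-error loss; the layer sizes, sigmoid activation and random initialisation are assumptions chosen only to keep the example short.

import numpy as np

def f(z):          # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):    # derivative of the sigmoid
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x = rng.random(3)                      # input sample
t = np.array([1.0])                    # target label y in Eq. (12)

W1, b1 = rng.random((3, 3)), rng.random(3)     # layer 1 parameters
W2, b2 = rng.random((1, 3)), rng.random(1)     # layer 2 parameters

# Forward stage (the Eq. (6)-style computation)
z2 = W1 @ x + b1
a2 = f(z2)
z3 = W2 @ a2 + b2
y = f(z3)                              # network prediction h_{W,b}(x)

loss = 0.5 * np.sum((t - y) ** 2)      # Eq. (12)

# Reverse stage: output-layer error, then Eq. (13) for the hidden layer
delta3 = (y - t) * f_prime(z3)
delta2 = (W2.T @ delta3) * f_prime(z2)             # Eq. (13)

grad_W2 = np.outer(delta3, a2)                     # Eq. (14): delta^{(l+1)} (a^{(l)})^T
grad_b2 = delta3
grad_W1 = np.outer(delta2, x)
grad_b1 = delta2
print("loss:", loss, "||grad_W1||:", np.linalg.norm(grad_W1))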
The accuracy of landscape classification reflects whether specific types in the landscape have been correctly identified and divided. As shown in Table 3, the classification of landscape scenery has relatively low accuracy, and the classification accuracy of natural landscape is higher for the 2D fractal network-16s than for the network-8s. It can be seen that the two-dimensional fractal graph network-16s has the best overall performance in the classification of habitat landscape in landscape images. Finally, three values, the pixel accuracy, the average accuracy and the average IU of the image pixels, are tested; the final values are shown in Table 2.

Table 2: Results of three kinds of upsampled semantic segmentation on landscape pixels

Scene Elements | Pixel accuracy (%) | Average accuracy (%) | Average IU (%)
FCN-8s  | 86.25 | 85.98 | 74.33
FCN-16s | 88.97 | 88.58 | 75.35
FCN-32s | 83.06 | 82.75 | 72.69

4.2 Optimization of landscape architecture planning and design and expansion of the two-dimensional fractal graph

Through the detection and classification of 200 pictures, the correctness of the landscape element values in the landscape garden scenes is obtained, as shown in Table 3 below, which intuitively reflects the performance of three different upper sampling structures. The accuracy of landscape element classification reflects whether specific types of landscape pictures can be correctly identified and divided. The pictures are divided into six categories, namely water scene, landscape scene, living scene, sky scene, architecture and transportation; water scenes include water surface, river and mountain, and the other categories include vegetation, sky, architecture, traffic, etc. It can be seen from Table 3 that the classification accuracy of the two-dimensional fractal graph network-32s is the worst among all classifications. The classification accuracy of the network-16s is higher than that of the network-8s, while the accuracy of the network-8s is in turn higher than that of the network-32s. Therefore, the two-dimensional fractal graph network-8s also has a good effect on the landscape quality classification of landscape images. On the two-dimensional fractal graph network-8s, the "sky view" class has the best classification rate, but in the real world it has the lowest classification rate, only occasionally appearing in something similar to the real scene.

Table 3: Classification accuracy of the three networks on landscape elements

Categories | FCN-8s (%) | FCN-16s (%) | FCN-32s (%)
The surface of the water | 91.89 | 90.32 | 87.85
The mountain | 87.66 | 86.35 | 82.43
Vegetation | 85.94 | 88.28 | 83.12
The sky | 93.56 | 90.65 | 88.45
Building | 88.29 | 87.56 | 84.68
The traffic | 86.12 | 85.08 | 83.28

Table 3 shows the exact ratios of the two-dimensional fractal graph network-8s, network-16s and network-32s. From these results we can know the pixel accuracy, average accuracy and average IU value of the three kinds of upper sampling.
Through the comparison of the three different upper sampling modes, it is concluded that the two-dimensional fractal graph net-8s reached the best level in pixel accuracy, average accuracy, average IU, etc., with the pixel accuracy, average accuracy and average IU reaching as high as 100%. The average accuracy is lower than the pixel accuracy, as it is calculated from the data of each classification of the image; too much data lowers the average accuracy and the average IU.

Table 4: Classification of landscape planning and design based on the two-dimensional fractal graph generation algorithm

Classification | FCN-8s | FCN-16s | FCN-32s
Ecological Sustainability (0-1) | 0.86 | 0.79 | 0.72
Aesthetic Appeal (0-1) | 0.92 | 0.88 | 0.81
Resource Efficiency (0-1) | 0.84 | 0.79 | 0.82
Robustness | 0.94 | 0.90 | 0.89

Figure 4: Outcome of landscape planning and design based on the two-dimensional fractal graph generation algorithm

The landscape planning findings, which were obtained through the use of the 2D fractal network generation algorithm, are shown in Figure 4 and Table 4. The assessments are based on a number of factors, such as ecological sustainability, aesthetic appeal, resource efficiency, and robustness. Ecological sustainability, aesthetic appeal, and resource efficiency are numerical values assigned to each design solution, ranging from 0 to 1 and representing the quality of the design in each respective area. The algorithm generates a wide range of solutions every time, which encourages the exploration of the solution space and makes it possible to identify several different design options. The term "robustness" describes how stable the solutions produced by the 2D fractal network generation algorithm are under different circumstances. The robustness of all trials shows that the algorithm's results are dependable and consistent in the FCN-8s, FCN-16s, and FCN-32s scenarios.

Table 5: Overall performance

Algorithm | Accuracy (%) | Precision (%) | Recall (%)
Traditional Optimization Algorithm | 86.34 | 85.12 | 85.78
Suggested Algorithm | 94.52 | 93.34 | 94.18
Figure 5: Overall performance of the methods

Landscape architecture planning and design is an academic discipline that focuses on the interaction between human habitation and the natural environment. Table 5 and Figure 5 demonstrate that the recommended 2D fractal graph generation approach has the best accuracy (94.52%), precision (93.34%), and recall (94.18%).

5 Discussion

A fractal graph is a complex entity that is formed using recursive iteration rules and can be found in both natural systems and man-made systems, such as cities. It displays intrinsic self-similarity on both large and small scales. The two-dimensional fractal is a scientific technique for measuring aspects of landscape architecture and its evolution in the context of network systems. Furthermore, it is an important metric for determining whether a city is experiencing self-organizational evolution. Previous research results show that self-organized architecture systems have notable fractal properties that can be measured using two-dimensional fractal graphs. Nevertheless, prior research has solely utilized these fractal dimensions to investigate the general fractal properties of the entire city, without carrying out more accurate fractal measurements for subzones in various directions and layers.
The existence of a fractal structure, which acts as an analogy and supplement to earlier research findings on the general fractal laws discovered in other cities, is one of the study's major discoveries. Unlike earlier research, this finding supports the different 2D values that contribute to the spatial variability of separate subzone structures. The cause is the differing paths taken by various subzones in terms of planning and development, as well as the ways in which people, land, and architecture occupy and use space in distinct urban blocks. The box-counting dimension is a measure of spatial occupancy capacity, hence urban expansion will cause it to rise. In other words, the self-organizational objective of urban development is to provide a more balanced and effective distribution of urban space while optimizing the spatial configuration and enabling coordinated growth within its internal areas. By illustrating a progressive decrease in the fractal dimension from the city's center to its periphery, suggesting outward growth, the study contributes to the notion of fractal cities. Furthermore, it shows that pixels with mature landscape architecture have larger fractal dimensions than those that are still developing or are quickly classified as FCN-8s, FCN-16s, or FCN-32s.
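Since the discussion relies on the box-counting dimension as the measure of spatial occupancy, a brief Python sketch of a standard box-counting estimate on a binary occupancy mask is given below; the synthetic mask and the chosen box sizes are assumptions for illustration and do not reproduce the paper's measurements.

import numpy as np

def box_counting_dimension(mask, sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a binary mask: count occupied
    boxes N(s) for several box sizes s and fit log N(s) ~ -D log s."""
    counts = []
    for s in sizes:
        h, w = mask.shape
        hs, ws = (h // s) * s, (w // s) * s           # crop to a multiple of s
        blocks = mask[:hs, :ws].reshape(hs // s, s, ws // s, s)
        occupied = blocks.any(axis=(1, 3))            # does a box contain any occupied pixel?
        counts.append(max(occupied.sum(), 1))
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope                                     # dimension D

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = rng.random((64, 64)) < 0.3                 # synthetic occupancy pattern
    print("estimated box-counting dimension:", round(box_counting_dimension(mask), 2))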
6 Conclusion

In order to solve this problem, we adopt a method based on the two-dimensional fractal graph. Two-dimensional fractal graphs are mainly divided according to the pixel points in landscape architecture. Different from the earlier method, the final step of the 2D fractal graph is to transform the whole fully connected level into a convolutional one, so that the operational architecture of the 2D fractal graph is preserved. The final image is the same size as the original image, and the segmentation of the original image can be obtained. Compared with the traditional two-dimensional fractal graph method, this approach has a higher computational speed and is not constrained by the size of the input image. In this paper, a new image semantic partition method is introduced.
After establishing a virtual environment, the model is trained on the commonly used SiftFlow data on the network. Image preprocessing technology is used to augment the data, thus effectively overcoming the problem of overfitting the model. On this basis, a second stage of learning for the segmentation is carried out to shorten the learning period of the model.
In the image analysis, three different upper sampling methods are adopted: the two-dimensional fractal graph network-32s, network-16s and network-8s. In terms of pixel accuracy, average accuracy and average IU, the upper sampling structure of the 2D fractal graph net-8s is selected, with a pixel accuracy of 90.3%, an average accuracy of 88.91% and an average IU of 75.83%. At the same time, the pixel accuracy of the model for each scene type in the landscape image is more than 86%, indicating that the method is an ideal method for landscape images and that, especially for landscape images containing multiple scene types, it can obtain a higher pixel segmentation accuracy. In this paper, the classification experiment of landscape elements in landscape architecture is carried out, and the proposed algorithm achieves an accuracy of 94.52%.
Declaration statement

Ethics approval and consent to participate

I confirm that all the research meets ethical guidelines and adheres to the legal requirements of the study country.
Consent for publication: I confirm that any participants (or their guardians if unable to give informed consent, or next of kin, if deceased) who may be identifiable through the manuscript (such as a case report) have been given an opportunity to review the final manuscript and have provided written consent to publish.

Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests

There are no conflicts of interest to declare. All authors have seen and agree with the contents of the manuscript and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

Funding

Funding: Shanxi Province Project: National Education Research Letter 2021, "Rural Revitalization" Research on Teaching Innovation of Traditional Culture Protection and Tourism Planning in the Background, JGCY2693.
Authors' contributions (individual contribution): All authors contributed to the study conception and design. All authors read and approved the final manuscript.

References

[1] Li X, Li S, Jiao H, 2020. Research on Multi-objective Optimization Method of Central Air Conditioning Air Treatment System Based on NSGA-II[J]. Journal of Physics: Conference Series, 1626:012113. https://doi.org/10.1088/1742-6596/1626/1/012113
[2] He P, Gao F, Li Y, et al, 2020. Research on optimization of spindle bearing preload based on the efficiency coefficient method[J]. Industrial Lubrication and Tribology, ahead-of-print. https://doi.org/10.1108/ILT-06-2020-0205
[3] Hong E, Ban H, Qi M, 2019. Design optimization and analysis of a vaned diffuser based on the one-dimensional impeller-diffuser throat area model[J]. Journal of Physics: Conference Series, 1300:012007. https://doi.org/10.1088/1742-6596/1300/1/012007
[4] Mohammadi M, Raise A, Regi A, 2019. Design and performance optimization of a very low head turbine with high pitch angle based on two-dimensional optimization[J]. Journal of the Brazilian Society of Mechanical Sciences and Engineering, 42(1):9. https://doi.org/10.1007/s40430-019-2084-1
[5] Liu W, Yang S, Ye Z, et al, 2019. An Image Segmentation Method Based on Two-Dimensional Entropy and Chaotic Lightning Attachment Procedure Optimization Algorithm[J]. International Journal of Pattern Recognition and Artificial Intelligence. https://doi.org/10.1142/s0218001420540300
[6] Wang S, Lin S, 2019. Optimization on ultrasonic plastic welding systems based on two-dimensional photonic crystal[J]. Ultrasonics, 99:105954. https://doi.org/10.1016/j.ultras.2019.105954
[7] Hu H Q, Yang L, 2013. Research on Routing Optimization of Regional Logistics Based on Gravity Model: A Case of Blue and Yellow Zones[J]. iBusiness, 5(4):167-172. https://doi.org/10.4236/ib.2013.54021
[8] Diaz-Casas V, Becerra J A, Lopez-Pena F, et al, 2013. Wind turbine design through the evolutionary algorithms based on surrogate CFD methods[J]. Optimization and Engineering, 14(2):305-329. https://doi.org/10.1007/s11081-012-9187-1
[9] Liu Y, Wan M, Zhang H K, et al, 2011. Research on Data Reconstruction Method Based on Identifier Locator Separation Architecture[J]. Journal of Internet Technology, 12(4):531-539. https://doi.org/10.6138/JIT.2011.12.4.01
[10] Griffin A, 1997. PDMA Research on New Product Development Practices: Updating Trends and Benchmarking Best Practices[J]. Journal of Product Innovation Management, 14(6):429-458. https://doi.org/10.1016/s0737-6782(97)00061-1
[11] Ding L, Guo T, Lu Z, 2015. A Hybrid Method for Dynamic Mesh Generation Based on Radial Basis Functions and Delaunay Graph Mapping[J]. Advances in Applied Mathematics & Mechanics, 7(03):338-356. https://doi.org/10.4208/aamm.2014.m614
[12] Hatamzadeh-Varmazyar S, Masouri Z, 2011. Numerical method for analysis of one- and two-dimensional electromagnetic scattering based on using linear Fredholm integral equation models[J]. Mathematical & Computer Modelling, 54(9-10):2199-2210. https://doi.org/10.1016/j.mcm.2011.05.028
[13] Lucas S D, Vega J M, Velazquez A, 2015. Aeronautic Conceptual Design Optimization Method Based on High-Order Singular Value Decomposition[J]. AIAA Journal, 49(12):2713-2725. https://doi.org/10.2514/1.j051133
[14] Zhu X, Niu D, Wang F, et al, 2018. Operation Optimization Research of Circulating Cooling Water System Based on Superstructure and Domain Knowledge[J]. Chemical Engineering Research and Design, 142. https://doi.org/10.1016/j.cherd.2018.12.012
[15] Hu G C, Liu J H, 2011. The Optimization Design of Mechanical Structure Based on CAE Technology[J]. Machine Design & Research, 130-134:672-676. https://doi.org/10.4028/www.scientific.net/amm.130-134.672
https://doi.org/10.31449/inf.v49i16.5869 Informatica 49 (2025) 67–76 67
Enhanced COVID-19 Detection Through Combined Image
Enhancement and Deep Learning Techniques
Abderrazak Benchabane*, Fella Charif
Department of electronics and telecommunications, University of Kasdi Merbah, Ouargla, Algeria
E-mail : benchabane.abderrazak@univ-ouargla.dz, cherif.fella@univ-ouargla.dz
*Corresponding author
Keywords: COVID-19, image enhancement, chest x-ray images, deep learning
Received: March 6, 2024
The rapid spread of COVID-19 has highlighted the need for automated patient data analysis to enable
faster and more accurate diagnosis. Using pre-trained deep learning models on X-ray images has
shown potential for effective COVID-19 detection. However, the performance of these models is highly
dependent on the quality and quantity of training data. To address these challenges, enhancing the
visual quality of X-ray images is critical for reliable virus detection. This study evaluates and combines
three image enhancement techniques—Histogram Equalization, Contrast-Limited Adaptive Histogram
Equalization (CLAHE), and Gamma Correction—to determine the optimal approach for improving
detection accuracy. A dataset comprising 125 chest X-ray images from COVID-19-positive patients and
500 images from non-COVID-19 cases was used. The images were preprocessed using the enhancement
techniques, and the enhanced datasets were employed to train ResNet50 and DenseNet201 models.
Simulation results demonstrate that enhanced images consistently yield higher detection accuracy than
unenhanced images. Among the techniques tested, combining Histogram Equalization, CLAHE, and
Gamma Correction with the DenseNet201 model achieved the highest performance, attaining a
remarkable accuracy of 99.03%. This outperforms previous methods, including the DarkCovidNet
model, which achieved an accuracy of 98.08% on the same dataset.
Povzetek: Avtorja sta izboljšala zaznavanje COVID-19 iz rentgenskih slik prsnega koša z uporabo
tehnik izboljšave slike (Histogram Equalization, CLAHE, Gamma Correction) v kombinaciji z modeli
globokega učenja (ResNet50, DenseNet201).
1 Introduction

Corona disease is currently considered one of the most widespread, dangerous and fastest-spreading diseases, so it is necessary to find ways and methods to detect infected cases and diagnose them in the fastest and clearest way. RT-PCR is a nuclear-derived technique that detects the presence of genetic material specific to a pathogen, including a virus. A formal diagnosis of COVID-19 requires a laboratory test (RT-PCR) of nose and throat samples and takes at least 24 hours to produce a result. Nowadays, medical images and computerized analysis have become very important tools for medical diagnosis and disease detection [1]. The radiology images show typical COVID-19 pneumonia in the lungs and the numerous complications that the virus causes in the body. The radiology imaging modalities include computed tomography (CT), radiograph X-rays, ultrasound, echocardiograms and magnetic resonance imaging (MRI). These imaging modalities optimize and greatly facilitate the process of discovering affected areas in the body [2]. Chest X-ray tests are easily available and have a low risk of radiation. On the other hand, CT scans have a high risk of radiation, are expensive, need clinical expertise to handle and are non-portable. This makes the use of X-ray scans more convenient than CT scans. A radiograph is obtained by exposing a film to X-rays that have passed through the human body. The result is an analog image which is often sufficient to obtain a reliable diagnosis and for low-cost screening. Various studies have indicated the failure of CXR imaging in diagnosing COVID-19 and differentiating it from other types of pneumonia [3]. The radiologist cannot use X-rays to detect pleural effusion and determine the volume involved. However, regardless of the low accuracy of X-ray diagnosis of COVID-19, it remains widely used. To overcome the limitations of COVID-19 diagnostic tests using radiological images, various studies have been conducted on the use of deep learning (DL) in the analysis of radiological images [2-11]. It has also been shown that image enhancement techniques can significantly improve classification performance [12, 13].

1.1 Contribution

In this paper, we investigate the impact of using image enhancement techniques as a preprocessing step to improve the accuracy of convolutional neural network (CNN) models for COVID-19 detection. Specifically, histogram equalization, Contrast Limited Adaptive Histogram Equalization (CLAHE), and gamma correction were applied to enhance chest X-ray (CXR) images before training.
The enhanced images significantly improved the visibility of key diagnostic features, such as ground-glass opacities and consolidations, which are critical for accurate COVID-19 diagnosis. The proposed preprocessing pipeline was evaluated on a challenging COVID-19 dataset with an imbalanced number of samples for the COVID and non-COVID classes. Experimental results demonstrated that the enhanced images led to a notable improvement in the classification performance of CNN models, achieving higher accuracy, sensitivity, and specificity compared to using raw images.
The rest of the paper is organized as follows: the Materials and methods section contains details about our proposed technique along with some context about the state-of-the-art models that we have used. The Results and discussion section presents the experimental results, including the classification accuracy, sensitivity, and F1-score obtained from the proposed work. The paper closes with a conclusion.

1.2 Related works

Numerous studies have applied advanced artificial intelligence (AI) techniques, particularly deep learning (DL) and machine learning (ML), to detect COVID-19 using X-ray images. Zhang et al. [14] developed an anomaly detection algorithm with EfficientNet for multiclass classification, achieving an accuracy of 72.77% on 43,370 samples. Deng et al. [15] employed models such as SVM, CNN, ResNet50, InceptionNetV2, Xception, and VGG16 to assess health status through X-ray imaging, obtaining an accuracy of 84% using 5,857 samples. Wang et al. [16] introduced a COVID-19 X-ray image detection model based on the multi-head self-attention mechanism and residual neural network, achieving 95.52% accuracy with 5,173 samples.
Transfer learning has also played a pivotal role in COVID-19 detection. Apostolopoulos et al. [17] utilized pre-trained models like VGG19, Inception ResNet v2, and MobileNet v2, achieving 96.78% accuracy on 1,427 samples for COVID-19 classification. Mahmoud et al. [18] applied the CovXNet architecture, achieving 97.4% accuracy on 610 samples. Mohit Kumar et al. [19] utilized a hybrid deep learning approach for multiclass classification, achieving 98.20% accuracy on 6,000 samples.
Several studies focused on binary classification tasks with high accuracy. Guefrechi et al. [20] achieved 97.20% accuracy using deep learning methods on 5,000 images. Feki et al. [21] employed a deep CNN model for binary classification, reaching an accuracy of 95.30% on 216 images. Mohan et al. [22] used a hybrid deep transfer learning CNN model achieving 92% accuracy with 9,220 images. Malik et al. [23] applied deep neural networks for multiclass classification, attaining 98.45% accuracy on 10,017 images. Gulmez [24] explored Xception and genetic algorithms for multiclass classification, reporting an accuracy of 92.4% on 1,251 images. Lastly, Zakariya et al. [25] proposed to combine the Xception, VGG-16, and VGG-19 models, achieving an accuracy of 97.91% using 964 images.
Table 1 provides a summary of various research studies focusing on state-of-the-art models for COVID-19 detection using AI and ML techniques.
Table 1: Summary of related works on COVID-19 detection
Source | Method/Model | Samples used | Accuracy (%)
[14] | EfficientNet | 43,370 | 72.77
[15] | SVM, CNN, ResNet50, Xception, VGG16 | 5,857 | 84.00
[16] | MHSA-ResNet neural network model | 5,173 | 95.52
[17] | VGG19, Inception ResNet v2, and MobileNet v2 | 1,427 | 96.78
[18] | CovXNet | 610 | 97.40
[19] | Hybrid deep learning approach | 6,000 | 98.20
[20] | Deep Learning (ResNet50) | 5,000 | 97.20
[21] | Deep CNN (Centralized-ResNet50) | 216 | 95.30
[22] | Deep Transfer Learning | 9,220 | 92.00
[23] | Deep Neural Networks | 10,017 | 98.45
[24] | Xception and Genetic Algorithm | 1,251 | 92.40
[25] | Xception + VGG-16 + VGG-19 | 964 | 97.91
2 Materials and methods

2.1 Dataset generation
The dataset of chest X-ray images used in this paper for classifying negative and positive COVID-19 cases is available at https://github.com/muhammedtalo/COVID-19. It contains 125 chest X-ray images of patients infected with the virus and 500 chest X-ray images of non-COVID-19 cases. The data is divided into 2 classes; 50% of the images were used for training and 50% for testing. Figure 1 shows some samples that have been used in our simulation [6].
Figure 1: Samples of chest X-ray images from the dataset.

2.2 Image enhancement techniques

Image enhancement is a very important task in image pre-processing. Its aim is to improve the visual details of an image or to provide a transformed representation suitable for use in different fields [4, 11]. In this paper, we have considered the following enhancement techniques.

2.2.1 Histogram equalization

Histogram Equalization (HE) is a technique for adjusting the contrast of an image using the image's histogram. The goal of histogram equalization is to obtain a uniform histogram, which improves contrast [13].

2.2.2 Contrast limited adaptive histogram equalization

Contrast Limited Adaptive Histogram Equalization (CLAHE) was originally developed for the enhancement of low-contrast medical images. The CLAHE algorithm creates non-overlapping contextual regions (also called sub-images, tiles, or blocks), applies histogram equalization to each contextual region, clips the original histogram at a specified value, and redistributes the clipped pixels across the gray levels. The clipping level determines how much noise in the histogram is smoothed and hence how much the contrast is enhanced [13].

2.2.3 Gamma correction

Gamma Correction (GC) is a nonlinear adjustment applied to every pixel value. It alters pixel values according to the mapping between the pixel value and the gamma parameter: to compute the corrected output, the normalized input value is raised to the power of the inverse gamma. The formula is as follows [13]:

\( I_{out} = 255 \left( \dfrac{I_{in}}{255} \right)^{1/\gamma} \)   (1)

Values of \( \gamma < 1 \) shift the image towards the darker end of the spectrum, while \( \gamma > 1 \) makes the image appear lighter; \( \gamma = 1 \) has no effect on the input image.
The application of histogram equalization (HE), Contrast Limited Adaptive Histogram Equalization (CLAHE), gamma correction, and their combinations significantly improves the quality of COVID-19 X-ray images, aiding better feature visualization and extraction (see Figure 2).

Figure 2: X-ray image processed with various image enhancement techniques.

Histogram equalization enhances global contrast, making subtle abnormalities more visible, while CLAHE adaptively improves local contrast, preserving fine details and reducing noise amplification. Gamma correction adjusts image brightness non-linearly, enhancing low-intensity features like ground-glass opacities. When combined, these techniques provide a comprehensive enhancement by leveraging global and local adjustments, ultimately producing images with improved visibility of critical diagnostic features. This preprocessing step enhances the performance of convolutional neural networks (CNNs) by supplying higher-quality inputs, resulting in superior COVID-19 detection accuracy and robustness.

2.3 Pre-trained CNN

Two different CNN models (ResNet50 [26] and DenseNet201 [27]) were compared separately, using eight different image enhancement configurations, for the classification of COVID-19 versus non-COVID images, in order to investigate the effect of image enhancement on COVID-19 detection.

2.4 Performance metrics

In order to evaluate the performance of each deep learning model, the following metrics were applied in this study [13, 28]:

\( Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN} \)   (2)

\( Sensitivity = \dfrac{TP}{TP + FN} \)   (3)

\( Specificity = \dfrac{TN}{TN + FP} \)   (4)
\( F1\text{-}score = \dfrac{2TP}{2TP + FP + FN} \)   (5)

where:
True Positive (TP): the prediction is COVID and the image is COVID.
True Negative (TN): the prediction is non-COVID and the image is non-COVID.
False Positive (FP): the prediction is COVID and the image is non-COVID.
False Negative (FN): the prediction is non-COVID and the image is COVID.
2.5 Methodology

Firstly, we train and test two pre-trained convolutional neural networks with the original CXR images, and then we repeat the same operation with the same images enhanced using the techniques described above. The main experiments carried out in this study combine the enhancement methods: HE and CLAHE; CLAHE and GC; HE and GC; and finally CLAHE, HE and GC (see Figure 2). For each combination, we compute the four performance metrics. The detailed methodology adopted in the study is shown in Figure 3.
Figure 3: Flowchart of the proposed method.
3 Results

Firstly, the X-ray images were enhanced using the different techniques mentioned above. The image sets are formed either from the original images (without enhancement), from images enhanced by a single technique (HE, CLAHE, or GC), or from combinations of these, which allows us to build 8 databases. Secondly, we train two pre-trained networks, ResNet50 and DenseNet201, for detecting COVID-19 in chest X-ray scan images. The last fully connected layer of each pre-trained network was modified to classify two classes: COVID-19 positive and negative. For both pre-trained networks, the learning rate was set to 0.0003, while the validation frequency was set to every 5 steps to track model performance. The maximum number of epochs was limited to 6, and the minimum batch size was set to 10. The Adam optimizer and the cross-entropy loss function were chosen. Additionally, data augmentation techniques were applied, including random rotations (-10, 10), random horizontal and vertical shifting (-30, 30), and random scaling (0.5, 1.1). Performance metrics were evaluated using 10 repeated cross-validation runs, each processing randomly selected image sets for training and testing.
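A hedged PyTorch sketch of a fine-tuning setup with the hyperparameters stated above is given below (DenseNet201 shown; ResNet50 is analogous via its `fc` layer). The dataset path, the exact parameterization of the augmentation, and the choice of framework are assumptions; the original experiments may have been implemented differently, and the validation-frequency setting is omitted here.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Augmentation roughly matching the stated settings: rotation in (-10, 10),
# shifts of up to ~30 px and scaling in (0.5, 1.1). translate is expressed as
# a fraction of the image size here, which is only an approximation.
train_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.RandomAffine(degrees=10, translate=(0.13, 0.13), scale=(0.5, 1.1)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=train_tf)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=10, shuffle=True)

model = models.densenet201(weights=models.DenseNet201_Weights.DEFAULT)
model.classifier = nn.Linear(model.classifier.in_features, 2)  # COVID vs. non-COVID

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for epoch in range(6):                      # maximum of 6 epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```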
3.1 Results without enhancement

In this part, we train and test the two networks using images without any enhancement. DenseNet201 and ResNet50 achieved an accuracy of 98.08% and 98.35%, respectively. The confusion matrices constructed from the test evaluation results are shown in Figure 4.

3.2 Results with enhancement

The confusion matrices in Figure 5 illustrate the performance of the DenseNet201 and ResNet50 models with different image enhancement techniques. In the case of DenseNet201 using the CLAHE and GC techniques, a high accuracy of 99.68% was achieved with only 1 misclassification out of 312 samples.
Figure 4: Confusion matrices for DenseNet201 and ResNet50 without enhancement.

Figure 5: Confusion matrices for DenseNet201 and ResNet50 with enhancement.

The model demonstrates perfect recall (100%) for detecting COVID cases and a very high precision of 98.4%, indicating its ability to correctly identify COVID with minimal false negatives. Similarly, for NON-COVID cases, the recall and precision are near-perfect at 99.6% and 100%, respectively, showing excellent discrimination between the classes.

On the other hand, the ResNet50 model, utilizing HE, CLAHE, or a combination of CLAHE with HE or GC, demonstrates excellent performance with an accuracy of 99.36%. Out of 312 samples, the model correctly classifies 62 COVID cases and 248 NON-COVID cases, with only 2 misclassifications: 2 false positives (NON-COVID predicted as COVID) and no false negatives (COVID misclassified as NON-COVID). The recall for COVID detection is perfect at 100%, indicating that no COVID cases were missed, while the precision is slightly lower at 96.9% due to the false positives. These preprocessing techniques enhance image contrast and normalize brightness, aiding the model's ability to discriminate between the classes effectively.

4 Discussion

The performance metrics reported in Tables 2 and 3 provide a comprehensive comparison of the different image enhancement techniques applied to the ResNet50 and DenseNet201 models. The analysis includes accuracy, sensitivity, specificity, and F1-score, along with their respective standard deviations.

From Table 2, it is evident that histogram equalization (HE) and contrast-limited adaptive histogram equalization (Clahe) generally improve the classification performance compared to the original images. The highest accuracy (98.17% ± 0.64) is achieved using the HE enhancement technique, with an F1-score of 98.86% ± 0.40. Clahe also performs well, achieving an accuracy of 98.07% ± 0.93 and the highest F1-score of 98.81% ± 0.57. This indicates that contrast enhancement techniques effectively highlight important features in the images, leading to improved model performance. However, combinations of enhancement techniques such as HE+GC and Clahe+GC do not consistently outperform individual techniques. For instance, HE+Clahe+GC results in a lower accuracy (97.43% ± 0.56) compared to HE alone, though it provides the highest specificity (93.38% ± 4.46). The standard deviation (SD) values suggest that these combined methods may introduce more variability in performance, as seen in the specificity values.

Table 3 demonstrates that the DenseNet201 model generally exhibits higher accuracy than ResNet50 for most enhancement techniques. The best performance is achieved using the combined HE+Clahe+GC technique, yielding an accuracy of 99.03% ± 0.54 and an F1-score of 99.39% ± 0.34. This suggests that DenseNet201 is better at leveraging the enhanced features provided by multi-enhancement approaches. Among individual techniques, HE and Clahe both result in comparable accuracy (97.40% ± 0.82 and 97.30% ± 1.13, respectively), with Clahe producing a slightly higher F1-score of 98.32% ± 0.71. The GC technique results in relatively lower specificity (90.32% ± 2.94) compared to other methods, indicating that while it improves sensitivity, it may not be as effective in distinguishing negative cases.

Comparing the two models, DenseNet201 consistently outperforms ResNet50 across all enhancement techniques, with higher accuracy and F1-score values. The sensitivity of DenseNet201 is slightly lower in some cases but remains competitive. Specificity improvements are more pronounced in DenseNet201, which indicates better handling of false positives. Regarding variability, DenseNet201 exhibits lower standard deviations in most metrics, particularly in accuracy and F1-score, suggesting more stable and reliable performance across different enhancement techniques. Conversely, the ResNet50 model experiences greater variability, especially in its specificity values. These results highlight the importance of image enhancement techniques in improving deep learning model performance. While individual techniques such as HE and Clahe provide significant improvements, combining multiple techniques can further enhance performance, particularly for DenseNet201.

In addition, examining the confidence intervals through the error bars in Figure 6 shows that the DenseNet201 model generally achieves higher accuracy and F1-scores, particularly with the combination of HE, Clahe and GC, while ResNet50 shows better specificity for techniques like GC and HE+GC. Sensitivity remains high for both models, with
overlapping confidence intervals indicating comparable performance.
Table 2: Mean and Standard Deviation (SD) of Performance metrics for the ResNet50 model
Enhancement Accuracy (%) Sensitivity (%) Specificity (%) F1-Score (%)
Techniques Mean SD Mean SD Mean SD Mean SD
Original 96.60 0.52 98.88 1.20 87.41 5.67 97.90 0.31
HE 98.17 0.64 99.40 0.54 93.22 3.20 98.86 0.40
Clahe 98.07 0.93 99.80 0.38 91.12 4.04 98.81 0.57
GC 97.78 0.99 98.40 1.46 95.32 2.68 98.61 0.63
HE+GC 97.62 0.50 98.96 0.82 92.25 3.46 98.53 0.31
HE+ Clahe 98.01 0.78 99.44 0.84 92.25 2.49 98.77 0.49
Clahe+GC 97.75 0.89 98.96 1.16 92.90 3.58 98.60 0.56
HE+Clahe+GC 97.43 0.56 98.44 1.36 93.38 4.46 98.40 0.36
Table 3: Mean and Standard Deviation (SD) of Performance metrics for the DenseNet201 model
Enhancement Accuracy (%) Sensitivity (%) Specificity (%) F1-Score (%)
Techniques Mean SD Mean SD Mean SD Mean SD
Original 96.95 1.08 98.24 0.92 91.77 4.95 98.10 0.66
HE 97.40 0.82 98.48 1.39 93.06 4.56 98.38 0.51
Clahe 97.30 1.13 98.16 1.15 93.87 4.15 98.32 0.71
GC 97.46 0.81 99.24 0.66 90.32 2.94 98.43 0.50
HE+GC 97.62 1.41 99.08 1.10 91.77 5.01 98.52 0.87
HE+ Clahe 98.75 0.63 99.68 0.52 95.00 2.89 99.22 0.38
Clahe+GC 98.87 0.83 99.64 0.66 95.80 2.17 99.30 0.51
HE+Clahe+GC 99.03 0.54 99.56 0.63 96.93 2.45 99.39 0.34
ResNet50 exhibits greater variability across metrics, whereas DenseNet201 provides more stable results. In addition, examining the confidence intervals through the error bars in Figure 6 shows that the DenseNet201 model generally outperforms ResNet50, particularly in sensitivity and F1-score, with significant improvements observed when using combined enhancement techniques. However, DenseNet201 exhibits larger confidence intervals, indicating greater variability in its performance, whereas ResNet50 shows more consistent results with narrower confidence intervals, especially in specificity. This variability suggests that while DenseNet201 may achieve higher performance, its predictions are less stable across trials or datasets.
Figure 6: Confidence intervals for performance metrics of COVID-19 detection models.
The radar chart shown in Figure 7 compares the area under the ROC curve (AUC-ROC) for the two deep learning models across the different image preprocessing techniques. It can be seen that DenseNet201 generally demonstrates higher AUC-ROC values than ResNet50 across most preprocessing techniques, particularly in combinations involving multiple enhancements like HE+Clahe+GC and Clahe+GC. However, ResNet50 shows comparable performance in cases such as HE and GC. The results indicate that preprocessing techniques significantly impact model performance, with DenseNet201 being more responsive to enhancements. This suggests that model selection and preprocessing strategy should be carefully considered to optimize classification performance based on the desired evaluation metric.

Figure 7: Radar plot of the area under the ROC curve (AUC-ROC) performance for ResNet50 and DenseNet201 with various image preprocessing techniques.

5 Comparison with the state-of-the-art CNN approaches

To evaluate the proposed method, we compared it with existing models using COVID-19 X-ray images. Narin et al. [7] proposed three different DL models and achieved 96.1% accuracy on a dataset containing 3,141 chest X-ray images. Ozturk et al. [5] used DarkCovidNet and the same dataset used in this paper, achieving an accuracy of 98.08%. Purohit et al. [6] used a convolutional neural network with augmented data to enlarge the dataset and achieved 99.44% accuracy.

In addition, the proposed enhancement methods show significant improvements in accuracy compared to models trained on other datasets. The DenseNet201 model achieved an accuracy of 99.67%, outperforming models such as that of Feki et al. [21], which reported 95.3% accuracy using a deep CNN, and Guefrechi et al. [20], which achieved 97.20% with a deep learning approach. Similarly, ResNet50 demonstrated high performance with an accuracy of 99.35%, surpassing Apostolopoulos et al. [17] (96.78%) and Mahmud et al. [18] (97.40%). Furthermore, Deng et al. [15] achieved a maximum accuracy of 84.0%, showcasing the substantial improvement offered by the proposed methods. These results demonstrate that the proposed models consistently achieve higher accuracy, emphasizing their reliability and effectiveness in improving COVID-19 detection compared to previously established state-of-the-art models.

6 Conclusion

This paper investigates how image enhancement techniques can improve the performance of pre-trained neural networks when working with limited data. Two pre-trained convolutional neural networks, ResNet50 and DenseNet201, were selected for COVID-19 detection. The training set was constructed by applying various enhancement techniques to chest X-ray images, which highlight critical structures such as lung opacities and consolidations, key features for accurate COVID-19 diagnosis. The results demonstrate that COVID-19 detection accuracy is significantly improved when using enhanced images compared to non-enhanced ones for both pre-trained networks.
Based on metrics such as accuracy, sensitivity, specificity, and F1-score, the best-performing model was DenseNet201, achieving an accuracy of 99.67%, sensitivity of 100%, specificity of 98.38%, and an F1-score of 99.80% for classifying positive and negative cases. When compared to previous studies using the same dataset, DenseNet201 outperforms DarkCovidNet, which achieved 98.08%, underscoring the effectiveness of these enhanced models on real-world X-ray images.
Table 4: Results comparison with related works on COVID-19 detection
Sources Method/Model Samples Accuracy (%)
[7] Inception V3, ResNet50, Inception-ResNet V2 3141 96.1
[5] DarkCovidNet 625 98.08
[6] CNN 1072 99.44
[15] SVM, CNN, ResNet50, Xception, VGG16 5857 84.00
[17] VGG16, VGG19, ResNet, DenseNet, InceptionV3 1427 96.78
[18] COVID-Net 610 97.40
[20] Deep Learning 5000 97.20
[21] Deep CNN 216 95.30
[29] End-to-end CNN 5184 95.70
Proposed ResNet50 with HE 625 99.36
Proposed DenseNet201 with Clahe+HE+GC 625 99.67
References

[1] Janko V, Slapničar G, Dovgan E, Reščič N, Kolenik T, Gjoreski M, Smerkol M, Gams M, Luštrek M. Machine Learning for Analyzing Non-Countermeasure Factors Affecting Early Spread of COVID-19. International Journal of Environmental Research and Public Health, 18(13):6750, 2021. https://doi.org/10.3390/ijerph18136750
[2] Rajkumar S, Rajaraman PV, Meganathan HS, Sapthagirivasan V, Tejaswinee V, Ashwin R. COVID-detect: A Deep Learning Approach for Classification of COVID-19 Pneumonia from Lung Segmented Chest X-rays. Biomedical Engineering: Applications, Basis and Communications, 33(2), 2021. https://doi.org/10.4015/S1016237221500101
[3] Gams M, Kolenik T. Relations between Electronics, Artificial Intelligence and Information Society through Information Society Rules. Electronics, 10(4):514, 2021. https://doi.org/10.3390/electronics10040514
[4] Tahir A, Qiblawey Y, Khandakar A, Rahman T, Khurshid U, Musharavati F, Islam MT, Kiranyaz S, Al-Maadeed S, Chowdhury MEH. Deep Learning for Reliable Classification of COVID-19, MERS, and SARS from Chest X-ray Images. Cognitive Computation, 14:1752–1772, 2022. https://doi.org/10.1007/s12559-021-09955-1
[5] Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Computers in Biology and Medicine, 121:103792, 2020. https://doi.org/10.1016/j.compbiomed.2020.103792
[6] Purohit K, Kesarwani A, Ranjan Kisku D, Dalui M. COVID-19 Detection on Chest X-Ray and CT Scan Images Using Multi-image Augmented Deep Learning Model. Advances in Intelligent Systems and Computing, 1412:395–413, 2022. http://dx.doi.org/10.1007/978-981-16-6890-6_30
[7] Narin A, Kaya C, Pamuk Z. Automatic Detection of Coronavirus Disease (COVID-19) Using X-ray Images and Deep Convolutional Neural Networks. Pattern Analysis and Applications, 24(3):1207–1220, 2021. https://doi.org/10.1007/s10044-021-00984-y
[8] Sarki R, Ahmed K, Wang H, Zhang Y, Wang K. Automated detection of COVID-19 through convolutional neural network using chest X-ray images. PLoS ONE, 17(1), 2022. https://doi.org/10.1371/journal.pone.0262052
[9] Masud M. A light-weight convolutional Neural Network Architecture for classification of COVID-19 chest X-ray images. Multimedia Systems, 28:1165–1174, 2022. https://doi.org/10.1007/s00530-021-00857-8
[10] Ravi V, Narasimhan H, Chakraborty C, et al. Deep learning-based meta-classifier approach for COVID-19 classification using CT scan and chest X-ray images. Multimedia Systems, 28:1401–1415, 2022. https://doi.org/10.1007/s00530-021-00826-1
[11] Asif S, Zhao M, Tang F, et al. A deep learning-based framework for detecting COVID-19 patients using chest X-rays. Multimedia Systems, 28:1495–1513, 2022. https://doi.org/10.1007/s00530-022-00917-7
[12] Tahir A, Qiblawey Y, Khandakar A, Rahman T, Khurshid U, Musharavati F, Islam MT, Kiranyaz S, Al-Maadeed S, Chowdhury MEH. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Computers in Biology and Medicine, 132, 2021. https://doi.org/10.1016/j.compbiomed.2021.104319
[13] Kandhway P, Bhandari AK, Singh A. A novel reformed histogram equalization based medical image contrast enhancement using krill herd optimization. Biomedical Signal Processing and Control, 56:101677, 2020. https://doi.org/10.1016/j.bspc.2019.101677
[14] Zhang J, Xie Y, Pang G, Liao Z, Verjans J, Li W, Sun Z, He J, Li Y, Shen C, et al. Viral Pneumonia Screening on Chest X-Rays Using Confidence-Aware Anomaly Detection. IEEE Transactions on Medical Imaging, 40(3):879–890, 2021. https://doi.org/10.1109/tmi.2020.3040950
[15] Deng X, Shao H, Shi L, Wang X, Xie T. A classification–detection approach of COVID-19 based on chest X-ray and CT by using Keras pre-trained deep learning models. Computer Modeling in Engineering & Sciences, 125(2):579–596, 2020. https://doi.org/10.32604/cmes.2020.011920
[16] Wang Z, Zhang K, Wang B. Detection of COVID-19 Cases Based on Deep Learning with X-ray Images. Electronics, 11(21):3511, 2022. https://doi.org/10.3390/electronics11213511
[17] Apostolopoulos ID, Mpesiana TA. COVID-19: Automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Physical and Engineering Sciences in Medicine, 43:635–640, 2020. https://doi.org/10.1007/s13246-020-00865-4
[18] Mahmud T, Rahman A, Fattah SA. CovXNet: A multi-dilation convolutional neural network for automatic COVID-19 and other pneumonia detection from chest X-ray images with transferable multi-receptive feature optimization. Computers in Biology and Medicine, 122:103869, 2020. https://doi.org/10.1016/j.compbiomed.2020.103869
[19] Mohit K, Dhairyata S, Vinod K, Wanich S. COVID-19 prediction through X-ray images using transfer learning-based hybrid deep learning approach. Materials Today: Proceedings, 51:2520–2524, 2022. https://doi.org/10.1016/j.matpr.2021.12.123
[20] Guefrechi S, Jabra MB, Ammar A, Koubaa A, Hamam H. Deep learning-based detection of COVID-19 from chest X-ray images. Multimedia Tools and Applications, 80:31803–31820, 2021. https://doi.org/10.1007/s11042-021-11192-5
[21] Feki I, Ammar S, Kessentini Y, Muhammad K. Federated learning for COVID-19 screening from chest X-ray images. Applied Soft Computing, 106, 2021. https://doi.org/10.1016/j.asoc.2021.107330
[22] Mohan A, Ftsum bAa, Beshir K, Takore TT. A Hybrid Deep Learning CNN model for COVID-19 detection from chest X-rays. Heliyon, 10(5), 2024. https://doi.org/10.1016/j.heliyon.2024.e26938
[23] Malik H, Naeem A, Naqvi RA, Loh WK. DMFL-Net: A Federated Learning-Based Framework for the Classification of COVID-19 from Multiple Chest Diseases Using X-rays. Sensors, 23(2):743, 2023. https://doi.org/10.3390/s23020743
[24] Gulmez B. A novel deep neural network model based on Xception and genetic algorithm for detection of COVID-19 from X-ray images. Annals of Operations Research, 328:617–641, 2022. https://doi.org/10.1007/s10479-022-05151-y
[25] Zakariya A, Oraibi SA. Efficient COVID-19 Prediction by Merging Various Deep Learning Architectures. Informatica, 48(5):55–62, 2024. https://doi.org/10.31449/inf.v48i5.5424
[26] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://doi.org/10.1109/CVPR.2016.90
[27] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. https://doi.ieeecomputersociety.org/10.1109/CVPR.2017.243
[28] Patel S, Patel L. Deep Learning Architectures and its Applications: A Survey. International Journal of Computer Sciences and Engineering, 6(6):1177–1183, 2018. http://dx.doi.org/10.26438/ijcse/v6i6.11771183
[29] Zakariya A. Oraibi, Safaa Albasri. A Robust End-to-End CNN Architecture for Efficient COVID-19 Prediction from X-ray Images with Imbalanced Data. Informatica, 47(7):115–126, 2023. https://doi.org/10.31449/inf.v47i7.4790
https://doi.org/10.31449/inf.v49i16.7635 Informatica 49 (2025) 77–86 77
Enhancing Predictive Capabilities for Cyber Physical Systems
Through Supervised Learning
Dhanalakshmi B*, Tamije Selvy P
Department of Computer Science and Engineering, Dr.N.G. P Institute of technology, India
Department of Computer Science and Engineering, Hindusthan College of Engineering and
Technology, India
E-mail: dhanalakshmib@drngpit.ac.in, tamijeselvy@gmail.com
*Corresponding author
Keywords: Cyber-physical system, real time data, traffic, machine learning
Received: November 20, 2024
The rapid advancement and proliferation of Cyber-Physical Systems (CPS) have led to an exponential
increase in the volume of data generated continuously. Efficient classification of this streaming data is
crucial for predicting system behaviors and enabling proactive decision-making. This research aims to
extract actionable knowledge from the continuous data streams of CPS and predict their behavior using
advanced supervised learning algorithms. The predictions facilitate timely interventions and necessary
actions within the interconnected physical network. The background of this work lies in the intersection
of CPS, machine learning, and data stream mining. Traditional batch processing methods are inadequate
for real-time analysis of CPS data due to their inherent latency and computational inefficiency. This
research employs state-of-the-art techniques for real-time data processing, including incremental
learning, sliding window models, and ensemble methods tailored for streaming data. Our approach differs
from existing works by focusing on a comprehensive framework that integrates real-time data ingestion,
preprocessing, feature extraction, and model updating in a seamless pipeline. Unlike previous studies that
often rely on static datasets and offline analysis, our method ensures continuous learning and adaptation
to evolving data patterns. Comparative analysis with existing techniques demonstrates superior
performance in terms of accuracy, latency, and scalability. Specifically, our models achieved an average
classification accuracy of 92%, with a precision of 90%, recall of 89%, and an F1 score of 89.5%. These
metrics indicate significant improvements over traditional batch processing methods, which typically lag
in responsiveness and adaptability. This research provides a robust and efficient solution for the real-
time classification of streaming data from CPS, enhancing the system's ability to predict behaviors and
take necessary actions promptly.
Povzetek: Predstavljen je izviren celovit ogrodni model za razvrščanje podatkov v realnem času v
kibernetsko-fizičnih sistemih (CPS) z uporabo nadzorovanega učenja.
1 Introduction

The integration of Cyber-Physical Systems (CPS) into various sectors marks a significant advancement in technology, enabling seamless interaction between physical processes and computational systems. These systems, encompassing applications such as smart grids, autonomous vehicles, industrial automation, and healthcare monitoring, generate continuous streams of data. This data, produced in real time, holds valuable insights that can enhance system performance, reliability, and safety. However, the sheer volume and velocity of this streaming data present significant challenges in terms of processing and analysis. Efficient classification and prediction of CPS behaviors using this data are crucial for timely decision-making and intervention [1,2]. Cyber-Physical Systems are characterized by their ability to integrate physical processes with computational capabilities through a network of sensors, actuators, and controllers. The data generated from these components need to be processed in real time to ensure optimal performance and to address potential issues proactively. Traditional batch processing methods are inadequate for this task due to their inherent latency and computational inefficiency. Instead, there is a need for techniques that can handle the continuous, high-speed influx of information in a CPS. Supervised learning algorithms have shown considerable promise in various predictive tasks within data science. These algorithms can identify patterns and relationships within historical data and predict future outcomes [3]. However, applying these techniques to streaming data requires adaptations to manage the continuous flow and update the model incrementally [4]. This research focuses on developing an efficient framework for classifying and predicting CPS behavior using supervised learning, including advanced models like Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

To achieve these objectives, this research employs a variety of advanced techniques tailored for the unique
challenges of streaming data from CPS. Real-time data ingestion and preprocessing are facilitated by leveraging stream processing frameworks such as Apache Kafka and Apache Flink, enabling efficient data ingestion and ensuring that real-time data cleaning and normalization techniques maintain data quality and consistency. Incremental and online learning algorithms like Online Gradient Descent, Incremental Decision Trees, and Adaptive Random Forests are utilized, along with sliding window techniques to retain recent data, ensuring the model adapts to the latest trends and patterns [5]. Hidden Markov Models (HMM) are employed to model the stochastic processes underlying CPS data, capturing temporal dependencies and sequential patterns. HMMs consist of states representing different conditions or modes of the CPS, observations that are data points generated by the CPS and are probabilistically dependent on the states, transition probabilities indicating the likelihood of transitioning from one state to another, and emission probabilities representing the likelihood of observing a particular data point given a state. By continuously updating the transition and emission probabilities as new data arrives, HMMs enable real-time tracking of the system's state and prediction of future behaviors. Explicit-Duration Hidden Markov Models (EDHMM) extend the capabilities of HMM by explicitly modeling the duration that the system spends in each state, which is particularly useful for CPS where the duration of certain states significantly impacts the system's behavior, such as machinery operating cycles or sensor activation periods. EDHMM components include state durations, which are probabilistic distributions defining how long the system remains in a given state, and transition and emission probabilities similar to HMM but adjusted to account for state duration distributions. By incorporating state durations, EDHMM provides more accurate temporal modeling, enhancing the prediction of CPS behaviors over time.

Feature extraction and engineering are also crucial, involving the development of methods for real-time feature extraction that allow dynamic computation of features as new data arrives, and the creation of features based on domain knowledge that capture critical aspects of CPS behavior such as temporal patterns and anomaly indicators. Model evaluation and adaptation are facilitated by establishing a real-time evaluation pipeline that continuously monitors model performance using metrics like accuracy, precision, recall, and F1 score, and by implementing strategies to handle concept drift, such as retraining models based on performance degradation. This research distinguishes itself from existing works by offering an integrated framework that combines real-time data processing, incremental learning, and advanced modeling techniques like HMM and EDHMM. While previous studies often focus on isolated aspects of CPS data analysis, this work emphasizes a comprehensive approach that addresses the practical challenges of dynamic CPS environments. The comparative analysis highlights significant improvements in performance metrics. The proposed methods achieved an average classification accuracy of 92%, with precision, recall, and F1 scores consistently outperforming traditional batch processing techniques. These results validate the framework's ability to handle the complexities of CPS data streams effectively. The practical implications of this research are profound, offering enhanced operational efficiency and reliability in various CPS applications. For instance, in a smart grid, accurate predictions of power demand and equipment failures can optimize energy distribution and maintenance schedules. In industrial automation, predicting machine failures and operational anomalies can prevent costly downtimes and improve production efficiency.

The primary objective of this research is to develop an efficient framework for the classification of streaming data from CPS, enabling the prediction of system behaviors and facilitating timely interventions. This overarching goal can be broken down into several specific objectives: develop methods for real-time ingestion and preprocessing of streaming data; ensure the system can handle high-velocity data streams without significant latency; implement supervised learning algorithms capable of incremental learning, allowing the model to update continuously; explore techniques such as sliding window models and online learning to maintain model relevance over time; design robust feature extraction mechanisms that can operate in real time; identify and create features that are predictive of CPS behaviors, ensuring these features can be computed on the fly; apply HMMs to model the probabilistic relationships and temporal dependencies in CPS data; extend HMMs with EDHMM to incorporate state durations, providing more precise temporal modeling; establish metrics for evaluating model performance on streaming data, including accuracy, precision, recall, and F1 score; develop strategies for model adaptation to cope with concept drift and changing data patterns; compare the performance of the proposed framework against traditional batch processing methods and other state-of-the-art techniques; conduct experiments to demonstrate improvements in accuracy, latency, and scalability; apply the framework to real-world CPS scenarios, such as smart grids and industrial automation systems; and showcase how the predictions and classifications can drive actionable decisions within the CPS.

2 Literature review

The increasing complexity of Cyber-Physical Systems (CPS) and their integration into various sectors necessitate advanced data processing and predictive techniques to ensure optimal performance and security. The literature reveals a range of approaches for handling streaming data, including supervised learning, clustering, active learning, semi-supervised learning, and advanced models such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

Cheng et al. (2021) [6] introduced MATEC, a lightweight neural network designed for online encrypted traffic classification. This approach addresses the challenges of real-time data classification in CPS by focusing on the efficiency and speed of the model, making
it suitable for environments where data streams are continuous and rapid. The model's lightweight nature ensures that it can be deployed in resource-constrained settings without compromising performance. Coletta et al. (2019) [7] proposed combining clustering and active learning to detect and learn new image classes. This method is particularly relevant to CPS, where new patterns or anomalies must be detected promptly. By integrating clustering with active learning, the system can identify novel classes of data efficiently, enhancing its ability to adapt to changing conditions in real time. Din et al. (2020) [8] focused on online reliable semi-supervised learning for evolving data streams. Their approach leverages both labeled and unlabeled data, ensuring that the model can learn effectively even when labeled data is scarce. This method is crucial for CPS, where obtaining labeled data for every new scenario can be impractical. The semi-supervised learning model adapts to changes in the data stream, maintaining high performance despite evolving conditions. Dong et al. (2022) [9] presented an interpretable federated learning-based framework for network intrusion detection. Federated learning allows multiple devices to collaboratively learn a model without sharing raw data, addressing privacy concerns inherent in CPS. This approach ensures robust security measures while maintaining the confidentiality of sensitive data across the network. Folino et al. (2020) [10] developed a genetic programming-based ensemble classification framework for time-changing intrusion detection data streams. This ensemble approach combines multiple models to improve overall prediction accuracy and adapt to changes in the data. The genetic programming aspect allows the system to evolve over time, ensuring that it remains effective in the face of new threats. Hu et al. (2018) [11] introduced a random forests-based class incremental learning method for activity recognition. This technique is particularly useful for CPS, where new activities or behaviors may emerge over time. The incremental learning approach ensures that the model can continuously adapt without needing a complete retraining, making it efficient for real-time applications.

Yagyu et al. (2020) [12] discussed hierarchical aggregation of select network traffic statistics, emphasizing the importance of efficient data aggregation in CPS. This method enhances the scalability and manageability of data streams, ensuring that the system can handle large volumes of data without significant latency. Júnior et al. (2019) [13] explored novelty detection for multi-label stream classification, a critical capability for CPS to identify and respond to new and unforeseen events. Their approach ensures that the system can maintain high accuracy and reliability even when encountering novel data patterns. Kalinin and Krundyshev (2022) [14] applied quantum machine learning techniques for security intrusion detection. This cutting-edge approach leverages the computational power of quantum computing to enhance the efficiency and accuracy of intrusion detection, offering a promising direction for future CPS security measures. Kumar et al. (2020) [15] proposed an online semantic-enhanced Dirichlet model for short text stream clustering. This model addresses the challenges of clustering and classifying short text data in real time, which is relevant for CPS applications involving text data, such as social media analysis or sensor logs. Li et al. (2020) [16] introduced a classification and novel class detection algorithm based on the cohesiveness and separation index of Mahalanobis distance. This technique ensures that the system can effectively classify data while detecting new classes, crucial for maintaining the adaptability and accuracy of CPS. Lu et al. (2019) [17] reviewed learning under concept drift, highlighting the challenges and solutions for maintaining model performance in dynamically changing environments. Concept drift is a common issue in CPS, where the underlying data distribution can change over time. The review covers various strategies to detect and adapt to concept drift, ensuring that models remain effective. Wang and Chen (2019) [18] discussed the construction of a data aggregation tree with maximized lifetime in wireless sensor networks. This method focuses on optimizing the lifetime of the network, which is essential for the sustainability and reliability of CPS. Xu and Duan (2019) [19] surveyed big data applications for CPS in Industry 4.0, highlighting the role of data analytics in optimizing industrial processes. Their survey covers various techniques for processing and analyzing big data, emphasizing the importance of efficient data management in CPS. Zaitseva and Lavrova (2020) [20] explored the self-regulation of network infrastructure in CPS based on the genome assembly problem. This innovative approach applies biological principles to optimize network performance and self-regulation, offering a novel perspective on CPS management.

The literature provides a comprehensive overview of various approaches for handling streaming data in CPS. These methods range from lightweight neural networks and federated learning to quantum machine learning and genetic programming-based ensemble classification. Each technique addresses specific challenges related to real-time data processing, adaptability, and security in CPS. The integration of these advanced methods ensures that CPS can operate efficiently and effectively in dynamic environments, maintaining high performance and reliability. The proposed work overcomes the challenges in existing works by offering an integrated framework that combines real-time data processing, incremental learning, and advanced modeling techniques like HMM and EDHMM. Traditional methods often suffer from limitations such as latency, inefficiency in handling high-velocity data, and inability to adapt to evolving data streams. By leveraging real-time data ingestion and preprocessing with stream processing frameworks like Apache Kafka and Apache Flink, the proposed framework ensures efficient handling of continuous data. Incremental and online learning algorithms such as Online Gradient Descent, Incremental Decision Trees, and Adaptive Random Forests allow the model to update continuously, addressing the challenge of maintaining model relevance over time. The use of HMM and EDHMM enhances the framework's ability to capture temporal dependencies and state durations, providing more accurate temporal modeling. This approach ensures
robust performance even in the face of concept drift, a common issue in dynamic CPS environments.

3 Proposed methodology

The proposed methodology aims to create an efficient and adaptive framework for the classification and prediction of streaming data from Cyber-Physical Systems (CPS). This section outlines the key components and techniques employed in the framework, including real-time data ingestion, preprocessing, supervised learning algorithms, advanced modeling with Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM), and real-time feature extraction. In the realm of Cyber-Physical Systems, the continuous influx of data presents a significant challenge and opportunity for real-time analysis and prediction. Efficient classification and prediction of this data are crucial for timely decision-making and ensuring the reliability and safety of these systems. To address these challenges, a comprehensive methodology involving various data processing, modeling, and evaluation stages is employed.

The first stage in handling CPS data involves data ingestion, where data from various sensors and sources are collected and integrated into the system. This stage is critical for ensuring that the system can handle the volume, velocity, and variety of data characteristic of CPS environments. Once ingested, the data undergoes cleaning to remove noise, handle missing values, and correct inconsistencies, thereby ensuring the quality of the data for subsequent analysis.

Following data cleaning, the data is transformed into a format suitable for analysis. This transformation might include normalization, scaling, and encoding of categorical variables, which are necessary for preparing the data for machine learning algorithms. Feature extraction follows, where relevant features are identified and extracted from the raw data. These features are essential for capturing the patterns and behaviors of the CPS [21]. Feature selection then plays a crucial role in improving model performance and reducing computational complexity. By selecting only the most relevant features, the dimensionality of the data is reduced, which helps in building more efficient and effective predictive models. For modeling, supervised learning algorithms are typically employed. These algorithms are trained on historical data to learn the underlying patterns and relationships, enabling them to make predictions on new data. Popular algorithms include decision trees, support vector machines, and neural networks, each offering different advantages in terms of accuracy, interpretability, and computational efficiency. In addition to traditional supervised learning models, advanced modeling techniques like Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM) are used. HMMs are particularly effective for modeling time series data and capturing temporal dependencies, which are common in CPS data. EDHMMs extend HMMs by incorporating explicit state duration modeling, making them suitable for applications where the duration of states is an important factor. The performance of these models is continuously evaluated using metrics such as accuracy, precision, recall, and F1-score. This evaluation ensures that the models remain effective over time. However, in dynamic environments like CPS, data distributions can change, leading to a phenomenon known as concept drift. Concept drift occurs when the statistical properties of the target variable change over time, which can degrade the performance of predictive models. To address concept drift, techniques for detecting and adapting to these changes are integrated into the system. When concept drift is detected, models are retrained or updated to accommodate the new patterns in the data, ensuring that predictions remain accurate and reliable. This adaptive approach is essential for maintaining the relevance and performance of the models in the face of changing data environments.

Figure 1: Proposed architecture

Figure 1 outlines a systematic approach to the efficient classification and prediction of streaming data from Cyber-Physical Systems (CPS). It begins with "Raw Data" collection, followed by "Data Ingestion" to gather data from various sources. "Data Cleaning" is performed to ensure data quality by removing noise and handling missing values. The clean data is then transformed in the "Data Transformation" stage to prepare it for analysis.

Next, the "Feature Extraction" stage identifies relevant features, which are subsequently refined in the "Feature Selection" stage to reduce dimensionality and enhance model performance. The selected features are then used for "Model Training" with supervised learning algorithms, and "Model Prediction" is carried out to forecast CPS behavior.

In parallel, the diagram includes advanced modeling techniques like "HMM Training" and "EDHMM Training," which produce the "HMM Model" and "EDHMM
Model," respectively. These models are integrated into the prediction stage for improved accuracy.

"Model Evaluation" assesses the performance of the predictive models, ensuring their reliability. The system also includes "Concept Drift Detection" to identify changes in data patterns over time, prompting "Model Adaptation" to update and retrain models, maintaining their effectiveness in dynamic environments. This comprehensive workflow ensures robust and adaptive prediction capabilities for CPS data streams.

3.1 Real-time data ingestion and preprocessing

Efficient handling of continuous data streams is critical for CPS. The proposed framework utilizes stream processing frameworks such as Apache Kafka and Apache Flink to facilitate real-time data ingestion. These technologies ensure that data can be ingested at high speed and with low latency, which is crucial for maintaining the performance of CPS.

Data ingestion
Apache Kafka: Kafka is used to handle the ingestion of large volumes of streaming data. Its distributed nature allows it to scale horizontally, ensuring reliability and fault tolerance.
Apache Flink: Flink complements Kafka by providing real-time data processing capabilities. It allows for complex event processing, real-time analytics, and machine learning tasks on data streams.

Data preprocessing
Real-Time Data Cleaning: Techniques such as filtering, normalization, and handling missing values are applied in real time to ensure data quality.
Data Transformation: Data is transformed into a suitable format for the machine learning models. This includes scaling features and encoding categorical variables.
Supervised learning algorithms
The core of the predictive framework relies on supervised learning algorithms capable of incremental learning. Incremental learning, also known as online learning, allows models to update their parameters as new data arrives without requiring a complete retraining from scratch.

Algorithms used
• Online gradient descent: This algorithm updates the model weights incrementally for each new data point, making it suitable for real-time applications.
• Incremental decision trees: Algorithms like Hoeffding Trees are used to build decision trees incrementally, allowing the model to adapt as new data comes in.
• Adaptive random forests: This method extends the random forest algorithm by allowing trees to be added or pruned based on their performance on new data, ensuring adaptability to changing data distributions.

3.2 Advanced Modeling with HMM and EDHMM

To capture the temporal dependencies and state transitions in CPS data, the proposed framework employs Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

Hidden Markov Models (HMM)
State Representation: HMMs consist of hidden states that represent different conditions or modes of the CPS. Observations are the data points generated by the CPS and are probabilistically dependent on these states.
Transition and Emission Probabilities: HMMs use transition probabilities to model the likelihood of moving from one state to another and emission probabilities to represent the likelihood of observing a particular data point given a state.
Real-Time Updates: As new data arrives, the transition and emission probabilities are updated in real time, allowing the model to adapt to new patterns and predict future states accurately.

Explicit-Duration Hidden Markov Models (EDHMM)
State Duration Modeling: EDHMM extends HMM by explicitly modeling the duration that the system spends in each state. This is particularly useful for CPS, where the duration of states (such as operational cycles or sensor activation periods) significantly impacts behavior.
Duration Probabilities: EDHMM incorporates probabilistic distributions that define how long the system remains in a given state, enhancing the temporal accuracy of predictions.
Temporal Precision: By incorporating state durations, EDHMM provides more precise temporal modeling, improving the prediction of CPS behaviors over time.

3.3 Real-time feature extraction and engineering

Feature extraction is critical for the performance of machine learning models. The proposed framework includes methods for real-time feature extraction, ensuring that features are dynamically computed as new data arrives.

Feature Extraction Methods
• Sliding Window Technique: This technique involves maintaining a window of the most recent data points and computing features based on this window. It ensures that the model focuses on the most relevant and recent data (a small sketch is given after this list).
• Domain-Specific Features: Features are created based on domain knowledge, capturing critical aspects of CPS behavior such as temporal patterns, trend analysis, and anomaly indicators.
• Dynamic Computation: Features are computed on the fly, allowing the system to adapt to new data points and maintain high predictive performance.
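The sketch below illustrates one way the sliding-window feature computation could be implemented; the window length and the specific statistics (mean, standard deviation, trend, simple anomaly flag) are illustrative choices, not the paper's exact feature set.

```python
from collections import deque
import statistics

class SlidingWindowFeatures:
    """Maintain the most recent readings and derive simple streaming features."""

    def __init__(self, size=50):
        self.window = deque(maxlen=size)   # only the newest `size` readings are kept

    def update(self, value):
        self.window.append(value)
        mean = statistics.fmean(self.window)
        stdev = statistics.pstdev(self.window) if len(self.window) > 1 else 0.0
        trend = self.window[-1] - self.window[0]          # crude trend indicator
        anomaly = stdev > 0 and abs(value - mean) > 3 * stdev  # 3-sigma rule
        return {"mean": mean, "std": stdev, "trend": trend, "anomaly": int(anomaly)}
```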
Model evaluation and adaptation
Evaluating the performance of the predictive framework in real time is crucial for maintaining its effectiveness. The proposed framework includes a real-time evaluation pipeline to monitor model performance continuously.

Evaluation metrics
• Accuracy, Precision, Recall, and F1 Score: These metrics are used to evaluate the performance of classification models. Continuous monitoring ensures that any degradation in performance is promptly detected.
• Concept drift detection: Strategies such as window-based evaluation and performance monitoring are employed to detect concept drift, ensuring that the model adapts to changing data patterns.

Model adaptation strategies
• Retraining and update mechanisms: When performance degradation is detected, the model is retrained or updated to maintain its accuracy.
• Adaptive learning rates: Adjusting the learning rate based on model performance helps in fine-tuning the model continuously.

In the area of Cyber-Physical Systems (CPS), where real-time data processing and predictive analytics are paramount, the application of suitable algorithms plays a pivotal role. Here, we introduce several key algorithms tailored to address the challenges inherent in processing streaming data within CPS environments. Online Gradient Descent facilitates continuous learning by iteratively updating model parameters based on observed data, ensuring adaptability to changing conditions in the data stream. Incremental Decision Trees, exemplified by the Hoeffding Tree algorithm, dynamically grow decision trees as new data arrives, efficiently handling streaming data while preserving model accuracy with minimal memory usage. Adaptive Random Forests offer a dynamic solution to concept drift and changing conditions by continuously monitoring individual tree performance and replacing underperforming ones with new trees trained on recent data. Hidden Markov Models (HMMs) capture temporal dependencies and state transitions in streaming data, enabling predictive modeling and anomaly detection in dynamic CPS environments. Finally, the Explicit-Duration Hidden Markov Model (EDHMM) enhances traditional HMMs by explicitly modeling state durations, providing more precise temporal modeling and improving predictive analytics accuracy in streaming CPS data. These algorithms collectively form the backbone of our proposed framework for efficient classification and prediction in CPS, addressing the unique challenges posed by streaming data in dynamic environments.

Algorithm: Online Gradient Descent
Input:
• Learning rate η
• Initial weights w_0
• Stream of data points (x_t, y_t), where x_t is the feature vector and y_t is the target
Output:
• Updated weights w_t
Procedure:
1. Initialize weights w_0
2. For each data point (x_t, y_t) in the stream:
   1. Predict ŷ_t = w_{t-1} · x_t
   2. Compute the error e_t = y_t − ŷ_t
   3. Update the weights: w_t = w_{t-1} + η e_t x_t
3. Continue until the end of the data stream
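The listing above translates almost directly into Python; this sketch assumes numeric feature vectors and a real-valued target, and the learning rate is a placeholder.

```python
import numpy as np

def online_gradient_descent(stream, n_features, eta=0.01):
    """Incrementally fit a linear model w on a stream of (x_t, y_t) pairs."""
    w = np.zeros(n_features)          # step 1: initialize weights w_0
    for x_t, y_t in stream:
        y_hat = w @ x_t               # step 2.1: predict y_hat_t = w_{t-1} . x_t
        e_t = y_t - y_hat             # step 2.2: prediction error
        w = w + eta * e_t * x_t       # step 2.3: gradient-style weight update
    return w
```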
Incremental Decision Trees (Hoeffding Tree)

Algorithm: Incremental Decision Tree (Hoeffding Tree)
Input:
• Stream of data points (x_t, y_t), where x_t is the feature vector and y_t is the target
• Confidence parameter δ
• Grace period n
Output:
• Decision tree
Procedure:
1. Initialize an empty decision tree
2. For each data point (x_t, y_t) in the stream:
   • Traverse the tree to find the appropriate leaf for (x_t, y_t)
   • Update sufficient statistics at the leaf
   • If the number of data points at the leaf mod n = 0:
     1. Compute the Gini impurity for each attribute
     2. Identify the best attribute to split on using the Hoeffding bound
     3. If the difference in impurity between the best attribute and the second-best attribute exceeds the bound, split the leaf node on the best attribute
3. Continue until the end of the data stream

Algorithm: Adaptive Random Forests
Input:
• Number of trees K
• Stream of data points (x_t, y_t), where x_t is the feature vector and y_t is the target
Output:
• Ensemble of decision trees
Procedure:
1. Initialize an ensemble of K decision trees
2. For each data point (x_t, y_t) in the stream:
   • For each tree T_i in the ensemble:
     • Traverse T_i to find the appropriate leaf for (x_t, y_t)
     • Update sufficient statistics at the leaf
     • If the number of data points at the leaf mod n = 0:
       1. Compute the Gini impurity (or another splitting criterion) for each attribute
       2. Identify the best attribute to split on using the Hoeffding bound
       3. If the difference in impurity between the best attribute and the second-best attribute exceeds
the bound, split the leaf node on the best maintaining model accuracy with minimal memory usage.
attribute Adaptive Random Forests further enhance model
• Monitor the performance of 𝑇𝑖 using a adaptability by dynamically adjusting the ensemble of
sliding window of recent predictions decision trees based on performance feedback, effectively
• If the performance of 𝑇𝑖 degrades combating concept drift. Hidden Markov Models (HMM)
significantly, replace 𝑇𝑖 with a new tree capture temporal dependencies in CPS data, allowing for
trained on recent data probabilistic modeling of sequential observations. The
3. Continue until the end of the data stream Explicit-Duration Hidden Markov Model (EDHMM)
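For concreteness, the split test used in step 2 of the Hoeffding Tree and Adaptive Random Forests procedures can be sketched as follows. This is a minimal illustration, not the exact implementation used in the framework; the impurity gains and the range R of the splitting criterion are assumed to be available from the leaf statistics.

import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # Hoeffding bound: with probability 1 - delta, the observed mean of n samples
    # of a variable with range R lies within epsilon of its true mean.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split when the advantage of the best attribute over the runner-up
    # exceeds the Hoeffding bound computed from the n examples at the leaf.
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon

# Example: Gini-based gains observed at a leaf after n = 200 examples (illustrative values).
print(should_split(best_gain=0.12, second_gain=0.05, value_range=1.0, delta=1e-7, n=200))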
Algorithm: Explicit-Duration Hidden Markov Model (EDHMM)
Input:
• Number of states N
• Observation sequence O = O_1, O_2, ..., O_T
• Initial state distribution π
• State transition matrix A
• Observation probability matrix B
Output:
• Updated parameters π, A, B
Procedure:
1. Initialize π, A, and B
2. Expectation-Maximization (EM) algorithm:
   1. E-step: Compute the forward probabilities α and backward probabilities β
   2. M-step: Update π, A, and B using α and β
3. Iterate the EM steps until convergence or for a fixed number of iterations

E-step:
• Compute forward probabilities
  α_t(i, d) = P(O_{t−d+1}, ..., O_t, q_t = S_i, duration = d | λ)
• Compute backward probabilities
  β_t(i, d) = P(O_{t+1}, ..., O_T | q_t = S_i, duration = d, λ)

M-step:
Update initial state distribution:
  π_i = γ_1(i)
Update state transition matrix:
  a_ij = Σ_{t=1}^{T−1} ξ_t(i, j) / Σ_{t=1}^{T−1} γ_t(i)
Update observation probability matrix:
  b_j(k) = Σ_{t=1}^{T} γ_t(j) · 1(O_t = v_k) / Σ_{t=1}^{T} γ_t(j)
Update duration probability matrix:
  d_i(d) = Σ_{t=1}^{T−1} γ_t(i, d) / Σ_{t=1}^{T−1} γ_t(i)
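As a concrete illustration of the M-step transition update above, the sketch below recomputes A from already-computed posteriors ξ_t(i, j) and γ_t(i). It assumes these arrays have been obtained from a forward-backward pass; it is a minimal example, not the authors' implementation.

import numpy as np

def update_transition_matrix(xi: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """M-step update for the state transition matrix.

    xi    : shape (T-1, N, N), xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    gamma : shape (T, N),      gamma[t, i] = P(q_t = S_i | O, lambda)
    """
    numerator = xi.sum(axis=0)            # sum over t of xi_t(i, j)
    denominator = gamma[:-1].sum(axis=0)  # sum over t of gamma_t(i)
    return numerator / denominator[:, None]

# Toy example with T = 4 observations and N = 2 states.
rng = np.random.default_rng(0)
xi = rng.random((3, 2, 2))
gamma = np.concatenate([xi.sum(axis=2), xi[-1].sum(axis=0, keepdims=True)])
A = update_transition_matrix(xi, gamma)
print(A, A.sum(axis=1))  # rows of A sum to 1 when gamma is consistent with xi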
The proposed framework utilizes several key algorithms to effectively handle streaming data in Cyber-Physical Systems (CPS). Online Gradient Descent enables continuous learning by updating model parameters incrementally as new data arrives, ensuring adaptability to evolving patterns. Incremental Decision Trees, such as the Hoeffding Tree algorithm, dynamically grow decision trees in response to changing data distributions, maintaining model accuracy with minimal memory usage. Adaptive Random Forests further enhance model adaptability by dynamically adjusting the ensemble of decision trees based on performance feedback, effectively combating concept drift. Hidden Markov Models (HMM) capture temporal dependencies in CPS data, allowing for probabilistic modeling of sequential observations. The Explicit-Duration Hidden Markov Model (EDHMM) extends the HMM by explicitly modeling state durations, providing more precise temporal modeling and enhancing prediction accuracy. These algorithms collectively enable real-time feature extraction, model updating, and predictive analytics, ensuring the framework's efficacy in handling the complexities of streaming data in CPS environments.

4 Results and discussion
The proposed methodology for efficient classification of streaming data from Cyber Physical Systems (CPS) was evaluated using various performance metrics. The metrics used include accuracy, precision, recall, F1-score, and processing time. The models were tested on a dataset consisting of [insert dataset details here], and the results are summarized in the tables below.
The performance of traditional supervised learning models (e.g., Decision Trees, Support Vector Machines, and Neural Networks) is presented in Table 1. Figures 2 to 6 show the performance comparison of the supervised learning models.

Figure 2: Accuracy comparison

Figure 3: Precision comparison
Figure 4: Recall comparison
Figure 6: Comparison of processing time
Figure 5: F1 score comparison
Table 1: Performance metrics for supervised learning models

Model            Accuracy   Precision   Recall   F1-Score   Processing Time (ms)
Decision Tree    92.3%      91.8%       92.0%    91.9%      150
SVM              93.7%      93.2%       93.5%    93.3%      300
Neural Network   95.2%      94.8%       95.0%    94.9%      500
The Neural Network outperforms both the Decision Tree and SVM in terms of accuracy, precision, recall, and F1-score, achieving 95.2%, 94.8%, 95.0%, and 94.9%, respectively. This indicates that the Neural Network is more effective at accurately predicting CPS behavior and identifying relevant instances, with fewer false positives and negatives. However, this enhanced performance comes with a higher processing time of 500 ms, reflecting its greater computational complexity.
The SVM, with an accuracy of 93.7%, precision of 93.2%, recall of 93.5%, and F1-score of 93.3%, performs better than the Decision Tree but requires twice the processing time (300 ms). This makes SVM a good middle-ground option, balancing improved predictive performance with moderate computational demands. The Decision Tree, while being the fastest with a processing time of 150 ms, has the lowest performance metrics (92.3% accuracy, 91.8% precision, 92.0% recall, and 91.9% F1-score). This model is suitable for applications where speed is critical, but slight compromises in prediction accuracy are acceptable. The performance of the HMM and EDHMM is shown in Table 2. HMMs are particularly effective for time series data and capturing temporal dependencies.

Table 2: Performance metrics for Hidden Markov Model (HMM) and EDHMM

Metric                 HMM     EDHMM
Accuracy               94.5%   96.1%
Precision              94.0%   95.7%
Recall                 94.3%   95.9%
F1-Score               94.1%   95.8%
Processing Time (ms)   400     600
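Before turning to the detailed comparison, note that the per-model metrics reported in Tables 1-3 can be reproduced from predicted and true labels on the held-out stream; a minimal sketch using scikit-learn is given below, with placeholder label arrays standing in for the study's data.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder labels standing in for the true and predicted classes of the test stream.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")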
Table 2 presents a comparison between the Hidden Markov Model (HMM) and the Explicit-Duration Hidden Markov Model (EDHMM) based on key performance metrics. In terms of accuracy, EDHMM achieves 96.1%, compared to 94.5% for HMM. This indicates that EDHMM makes fewer classification errors and is better at correctly predicting CPS behavior. Precision, which measures the proportion of true positive predictions among all positive predictions, is 95.7% for EDHMM and 94.0% for HMM, suggesting that EDHMM has a lower rate of false positives. Recall, the proportion of true positive predictions among all actual positives, is 95.9% for EDHMM versus 94.3% for HMM, showing EDHMM's improved ability to identify relevant instances. The F1-score, which harmonizes precision and recall, is higher for EDHMM at 95.8% compared to HMM's 94.1%, confirming EDHMM's overall better performance. However, this enhanced performance comes at the cost of processing time. EDHMM's processing time is 600 ms, higher than HMM's 400 ms, reflecting the additional computational complexity of modeling explicit state durations. Despite this, the trade-off is justified by the substantial gains in predictive accuracy and reliability, making EDHMM a more robust choice for real-time CPS applications.
To assess the system's ability to handle concept drift, the models were evaluated before and after the adaptation process. Table 3 summarizes the performance of the models before and after detecting and adapting to concept drift.

Table 3: Performance metrics before and after concept drift adaptation

Metric                 Before Adaptation   After Adaptation
Accuracy               85.0%               92.0%
Precision              84.5%               91.5%
Recall                 84.8%               91.8%
F1-Score               84.6%               91.6%
Processing Time (ms)   200                 250

The results demonstrate the effectiveness of the proposed methodology in classifying and predicting streaming data from CPS. The supervised learning models, particularly the Neural Network, achieved high accuracy and F1-scores, indicating strong predictive performance. However, the Neural Network required more processing time compared to the Decision Tree and SVM. The HMM and EDHMM models showed superior performance in handling time series data, with EDHMM outperforming HMM in all metrics. This highlights the advantage of explicitly modeling state durations in CPS data, where the duration of states can significantly impact system behavior.
The concept drift detection and model adaptation mechanism proved crucial in maintaining model performance over time. The significant improvement in performance metrics after adaptation underscores the importance of continuously monitoring and updating models to handle evolving data distributions in CPS. In summary, the proposed methodology, combining traditional supervised learning with advanced HMM and EDHMM models, and incorporating concept drift detection, provides a robust framework for efficient classification and prediction of CPS data. This approach ensures high accuracy, adaptability, and scalability, making it suitable for real-time applications in dynamic CPS environments.

5 Conclusion
In this research, we presented an efficient framework for classification and prediction of streaming data from Cyber Physical Systems (CPS). The study utilized traditional supervised learning algorithms and advanced modeling techniques such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM). Our approach aimed to extract valuable knowledge from continuous data streams and predict system behavior accurately, facilitating timely decision-making within interconnected CPS environments. The results demonstrated the effectiveness of the proposed methodology across various performance metrics, including accuracy, precision, recall, and F1-score. Among the traditional models, the Neural Network outperformed the others, achieving the highest accuracy of 95.2%, albeit with a higher processing time. The SVM struck a balance between accuracy and computational efficiency, while the Decision Tree offered the fastest processing time with acceptable accuracy. The advanced HMM and EDHMM models showed significant advantages in handling time series data, capturing temporal dependencies, and explicitly modeling state durations. The EDHMM, in particular, achieved superior performance with an accuracy of 96.1% and an F1-score of 95.8%, despite its higher computational cost. These models proved to be robust in dynamic environments, maintaining high predictive accuracy over time. A crucial aspect of the methodology was the integration of concept drift detection and model adaptation mechanisms. This ensured that the models remained relevant and effective in the face of changing data distributions, a common challenge in CPS applications. The ability to detect concept drift and adapt models accordingly significantly improved their performance, as evidenced by the post-adaptation metrics.
https://doi.org/10.31449/inf.v49i16.6639 Informatica 49 (2025) 87–96 87
A Comparative Study of Deep Learning Algorithms for Detecting
Fungal Infection Skin Diseases
Fajar Masya1, Joko Triloka* 2, Setia Wulandari2
1Mercu Buana University, Meruya Sel., Kembangan, Jakarta 11650, Indonesia
2Institute of Informatics and Business Darmajaya, Jl. Z.A. Pagar Alam No.93, Bandar Lampung 35141, Indonesia
E-mail: fajar.masya@mercubuana.ac.id, joko.triloka@darmajaya.ac.id, setiawulan.2121211001@mail.darmajaya.ac.id
*Corresponding author
Keywords: mask r-cnn, yolov5, image classification, skin fungal infection
Received: July 11, 2024
Many people place a high value on the health of their skin, frequently spending large sums of money on
skincare products. Fungal infections are one of the most common skin conditions that can damage a
person's self-esteem. When dealing with skin health issues, seeking advice from a knowledgeable
dermatologist is essential. Deep learning is a contemporary technique that saves doctors time and helps
them spot diseases early. Two deep learning algorithms that are useful in identifying patterns of skin
illnesses are Mask R-CNN and YOLOv5. This paper explores using Mask R-CNN and YOLOv5 to
recognize skin illnesses caused by fungal infections, going through several processing phases. The
research results show that the YOLOv5 strategy performed best in accuracy, recall, precision, F1-Score,
and AUC. This algorithm shows great potential and warrants further investigation in practical
applications.
Povzetek: Primerjava algoritmov Mask R-CNN in YOLOv5 za zaznavanje glivičnih kožnih bolezni kaže,
da YOLOv5 dosega najboljše rezultate, s čimer izkazuje velik praktični potencial.
1 Introduction enhance images by extracting valuable information.
Object detection algorithms, often employing machine
Skin covers the entire surface of the human body and is learning or deep learning, automate relevant findings. In
the largest organ, directly exposed to the external medical science, digital image processing is instrumental
environment [1]. Various diseases affect the skin, ranging in automating diagnostic processes [9].
from mild, itchy conditions to serious, potentially fatal Several studies have applied popular object detection
ones [2]. Despite the importance of skin health, it is often algorithms, such as the Mask Regional-based
overlooked, and many underestimate skin conditions. Convolutional Neural Network (Mask R-CNN) and You
Most skin diseases result from bacterial, fungal, or viral Only Look Once (YOLO) algorithms. One study using the
infections and allergies [3]. Several factors can directly or Mask R-CNN algorithm for breast cancer detection
indirectly impact the skin, causing diseases that may be reported an accuracy of 91% and a precision of 84% [10].
treatable with medications, while others necessitate Another study implemented Mask R-CNN to find, detect,
consultation with a professional skin disease specialist and classify objects in images or videos of the Ryze Tello
[4,5]. Consultation with a specialist in dermatology is drone, achieving an average accuracy of 95.6% [11].
essential for individuals with skin health concerns. Additionally, research using Mask R-CNN for
However, due to embarrassment and the high cost of automatically detecting and recognizing small magnetic
treatment, many individuals with skin diseases remain targets in shallow underground layers demonstrated an
silent, leading to decreased self-confidence and social average detection accuracy of 97%, a recall rate of 94%,
withdrawal. This social isolation can contribute to and an average detection speed of 0.35 seconds per image
depression. Therefore, dermatologists must engage in on a GPU [12]. Studies employing the YOLOv5 algorithm
early detection and prevention of skin diseases, as these have also shown significant results. One study detecting
conditions can be easily transmitted. face masks with YOLOv5 after 300 epochs achieved an
accuracy rate of approximately 96.6% [13]. Another study
In the modern era, nearly all sectors, including
using YOLOv5 to determine whether a face mask is being
medicine, rely on computerized systems to replace
worn reported an accuracy of 97.90% [14]. The
conventional methods with automated technology [6].
application of popular object detection algorithms like
Researchers, particularly in medical science, are actively
Mask R-CNN and YOLOv5 has been widely successful
seeking solutions to help doctors diagnose diseases early
across diverse fields. The specific accuracies and
without excessive time expenditure [7]. This is where
precision rates mentioned for different applications like
digital image processing becomes essential [8]. Digital
breast cancer detection, drone imagery classification,
image processing involves using computer algorithms to
underground magnetic target detection, and face mask
detection highlight these algorithms' versatility and high pathology detection using CNN algorithms, reaching a test
performance in various domains. accuracy of 89%.
While multiple studies have investigated the use of Meanwhile, several studies have utilized YOLO in
Mask R-CNN and YOLO for a variety of medical research on different tasks, including [21], which has
applications, including breast cancer detection, face mask achieved 92.20% accuracy in real-time face mask
recognition, and other skin illnesses, there has been a detection under multiple conditions. [22] calculating
striking paucity of research focusing on fungal skin melanoma skin cancer using a web application integrated
infections. Existing research focuses mostly on bacterial with the YOLOv5. The model evaluates if the stain is
or viral skin disorders or non-specific skin diseases, cancerous or benign. [23] applying YOLO for early skin
leaving a vacuum in the early identification and cancer detection with the test results showed that the
categorization of fungal infections with advanced deep- YOLOv5's model has an accuracy of
learning models. This gap is crucial since fungal infections 89.1% in detecting skin cancer types. Moreover, a
are common and sometimes misdiagnosed due to proposed Yolo deep neural network which can classify 9
symptoms that overlap with other skin disorders. different classes of skin cancer was conducted by [24],
This study intends to close the highlighted gap by their experimental analysis shows that the proposed
thoroughly comparing two cutting-edge deep learning method achieves the mean average precision score of
systems, Mask R-CNN, and YOLOv5, for identifying and 88.03% and 86.52% for Yolo V3 and Yolo V4
categorizing fungal skin diseases. This is critical since respectively.
fungal infections are among the most common skin
disorders, affecting millions of people worldwide, and
early detection is essential for avoiding consequences. 3 System model
This study enhances the application of deep learning in
dermatology by comparing the performance of these 3.1 Mask R-CNN
algorithms. It also provides practical insights for real-time
Mask R-CNN, developed by the Facebook AI Research
diagnostic tools in healthcare settings.
(FAIR) team in 2017, is a deep learning algorithm
renowned for detecting objects in images while
2 Related work simultaneously generating a segmentation mask for each
instance, a technique commonly referred to as instance
Numerous studies have explored the efficacy of various
segmentation [25]. As depicted in Figure 1. instance
algorithms for classifying skin diseases caused by fungal
segmentation shares similarities with object detection,
infections. In 2017, [15] investigated the use of image
wherein individual objects are detected sequentially.
processing techniques, including Discrete Cosine
However, it integrates semantic segmentation, enabling
Transform (DCT), Discrete Wavelet Transform (DWT),
each object to be categorized, localized, and distinguished
and Singular Value Decomposition (SVD), achieving an
at the pixel level.
impressive detection efficiency of up to 80%. The average
During the detection process, Mask R-CNN operates
training time across the three transformations and their
across three main components: the feature extraction
parallel combinations was 2.066 seconds, with an average
network, region-proposal network, and instance detection
testing time of 0.7866 seconds. Subsequently, in 2018,
and segmentation networks. Mask R-CNN employs
[16] delved into the utilization of the K-Means and Fuzzy
various backbone architectures [26], including ResNet-
C-Means algorithms, providing valuable insights into skin
101 and FPN for feature extraction. Through
disease detection. The adoption of these algorithms
experimentation, the ResNet-101 backbone has
supported early diagnosis and disease-type identification.
demonstrated above-average accuracy and speed in
In deep learning, [17] introduced Convolutional
feature extraction. In the Region Proposal Network (RPN)
Neural Network (CNN) algorithms for skin disease
phase, Regions of Interest (ROIs) are generated, serving
detection in 2018, demonstrating enhanced accuracy and
as input for the subsequent instance detection and
efficiency compared to traditional methods. The CNN
segmentation networks stage.
approach yielded better results, paving the way for more
a. Feature extraction: Feature extraction aims to
advanced diagnostic tools. Building on this progress, [18]
distill information from images and represent it
explored the application of the YOLOv3 algorithm in the
in a lower-dimensional space, facilitating the
medical field in 2019. Their investigation encompassed
classification of patterns. In the context of Mask
diverse tasks, such as white blood cell detection and
R-CNN, feature extraction involves generating
identifying target strings of bananas and fruit stems.
Region of Interest (RoI) features through the
Notably, the YOLOv3 algorithm achieved impressive
fusion of ResNet-101 architecture with FPN
accuracy rates, showcasing its versatility and potential in
(Feature Pyramid Network). FPN plays a crucial
medical imaging.
role in recognition systems by enabling the
Further research by [19] focused on facial skin
identification of objects of various sizes within
disease analysis using CNN algorithms based on clinical
the same image. FPN enhances information
images. Their study encompassed the detection of five
quality by utilizing multiple feature maps. It
facial skin diseases, achieving notable accuracies for
adopts a pyramid design principle for feature
various conditions. Additionally, [20] investigated skin
extraction, offering superior speed and accuracy.
Figure 1: The Mask R-CNN framework for instance segmentation
FPN integrates both bottom-up and top-down
information processing techniques to achieve 3.2 YOLO
comprehensive feature representation.
b. RPN (Region Proposal Network): Within the You Only Look Once (YOLO) is an algorithm developed
feature extraction process, a 3 x 3 convolution by the Facebook AI Research (FAIR) team to quickly and
layer is applied to each generated feature map. accurately detect various types of objects. YOLO
Initially, the feature map undergoes scanning addresses single regression problems directly by mapping
utilizing anchor boxes of various sizes and image pixels to bounding box coordinates and class
ratios. Subsequently, the output is bifurcated probabilities. It requires only one look at an image to
into two branches: one associated with the predict what objects are present and where they are
objectivity or confidence score, and the other located. YOLO operates by using a single convolutional
with the bounding box regressor, as depicted in network that simultaneously predicts multiple bounding
Figure 2. boxes and the probability of each class within those boxes.
c. Instance Detection and Semantic it has overall 24 convolutional layers, four max-pooling
Segmentation: During the instance layers, and two fully connected layers as illustrated in
segmentation process, objects, bounding boxes, Figure 3. It is trained on images to optimize detection
class labels, and confidence values are detected performance. The architecture works as follows:
through a fully connected network that takes the a. The input image is resized to 448x448 before
Region of Interest (RoI) as input. Semantic being processed by the convolutional network.
segmentation is then performed on the image b. A 1x1 convolution is initially applied to reduce
using a Fully Convolutional Network (FCN), the number of channels, followed by a 3x3
which predicts the semantic class of each pixel convolution to generate a cuboidal output.
within the bounding box. As a result, distinct c. The ReLU activation function is used
colors are assigned to each instance based on the throughout, except for the final layer, which
bounding box delineation, facilitating visual uses a linear activation function.
differentiation of individual objects. d. Additional techniques, such as batch
normalization and dropout, are employed to
regularise the model and prevent overfitting.
Figure 2: RPN processing
Figure 3: YOLOv5 architecture
4 Proposed procedures
The proposed Mask R-CNN comprises three primary stages. To begin with, it uses the darknet-53 architecture to extract features. Second, it uses the input image to derive the coordinates of Regions of Interest (RoI) using the Region Proposal Network (RPN) approach. Finally, it predicts the class of the discovered objects, revealing information about the RoI sites. This procedure yields a mask that highlights areas suggestive of fungal-induced skin disorders. The suggested Mask R-CNN technique, based on edge detection, is shown in Figure 4 and is used to identify the skin conditions in the dataset.
The YOLOv5 algorithm utilized in this study has multiple phases for object detection. Using PyTorch as a feature extractor, YOLOv5 detects objects by classifying them and locating them based on the extracted features. The goal of YOLOv5 feature extraction is to supply input variables for the classification procedure. The suggested YOLOv5 algorithm architecture is displayed in Figure 5.

Figure 4: Proposed Mask R-CNN architecture

Figure 5: Proposed YOLOv5 architecture
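To make the two detection pipelines concrete, the sketch below shows one common way to instantiate a Mask R-CNN model (via torchvision) and a YOLOv5 model (via torch.hub) for inference. It is an illustrative setup under assumed library versions, not the exact training pipeline used in this study, and the random tensor stands in for a preprocessed skin-lesion image.

import numpy as np
import torch
import torchvision

# Mask R-CNN with a ResNet-50 FPN backbone (torchvision detection model zoo).
mask_rcnn = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
mask_rcnn.eval()

# YOLOv5 small model loaded from the ultralytics/yolov5 hub repository.
yolov5 = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# A random stand-in for a preprocessed image (3 x H x W, values in [0, 1]).
image = torch.rand(3, 448, 448)

with torch.no_grad():
    # Mask R-CNN returns per-instance boxes, labels, scores, and segmentation masks.
    rcnn_out = mask_rcnn([image])[0]
    # The YOLOv5 hub model accepts HWC uint8 arrays (or file paths) and returns detections.
    yolo_out = yolov5((image.permute(1, 2, 0).numpy() * 255).astype(np.uint8))

print(rcnn_out["boxes"].shape, rcnn_out["masks"].shape)
yolo_out.print()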
5 Experiments on algorithm 5.3 Algorithm processing
processing Figure 9 illustrates the idea of processing both algorithms.
Installing Python, TensorFlow, Keras, and other necessary
5.1 Dataset description software is part of the dependency installation process for
the Mask R-CNN algorithm. Installing deep learning
This study's publicly accessible dataset is from
packages such as PyTorch, NumPy, and Pandas for
http://www.dermnet.com/dermatology-pictures-skin-
YOLOv5 was required for YOLOv5. The dataset that will
disease-pictures and consists of images of 1,473 data
be utilized to train the object detection algorithms is
points and 3 class labels of skin diseases:
prepared during the object detection data loading stage.
Dermatomycosis, Mucocutaneous Candidiasis, and
The dataset needs to include pictures and properly
Pityriasis Versicolor. Before splitting the dataset, it is first
formatted annotations (labels and bounding boxes) for the
preprocessed to ensure each image is appropriate for
Mask R-CNN technique to function. Every object in the
labeling. Figure 6 shows an example of a dataset with skin
dataset needs to be labeled for detection for the YOLOv5
conditions brought on by fungus infections.
algorithm to work. The training configurations for both
algorithms are established in the configuration setting.
5.2 Data Pre-processing
Raw data needs to be treated first. Partitioning and labeling the dataset are the preprocessing steps. Labeling assigns a name to each object in the image and ensures that it belongs to the appropriate class. After that, the 1,473-image dataset is split into training and testing sets. The Mask R-CNN algorithm's labeling procedure entails object segmentation with a polygon tool, whereas the YOLO algorithm utilizes bounding boxes created with a bounding-box tool.
With 10% of the data for testing and the remaining 90% for training, the dataset was reduced to 1,136 images following the labeling phase. A ratio of 80% for training and 20% for testing was also used in the experiment. As with the 90/10 split, the pre-processing and data cleaning procedures led to a minor decrease in the overall dataset size, which was likewise reduced to 1,136 images for this split, allowing its performance to be compared against the initial 90/10 split. Figure 7 provides an example of data labeling with the polygon tool, while Figure 8 provides a bounding-box tool example.

Figure 7: Data labeling on images using the polygon tool
Figure 6: Sample images of skin diseases caused by fungal infections
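The 90/10 and 80/20 partitions described above can be produced with a standard stratified split; the sketch below uses scikit-learn and assumes the image paths and class labels have already been collected into lists (the variable contents are illustrative, not the actual dataset).

from sklearn.model_selection import train_test_split

# Illustrative lists of image paths and their class labels
# (Dermatomycosis, Mucocutaneous Candidiasis, Pityriasis Versicolor).
image_paths = [f"img_{i:04d}.jpg" for i in range(1136)]
labels = [i % 3 for i in range(1136)]

# 90/10 split, stratified so each class keeps its proportion in both sets.
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.10, stratify=labels, random_state=42
)
print(len(train_paths), len(test_paths))  # roughly 1,022 training and 114 testing images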
The setup for the Mask RCNN algorithm contains
details about the number of iterations, batch size, number
of classes, and other pertinent parameters. Configuration
options for the YOLOv5 include batch size, learning rate,
and number of epochs. To improve object detection
accuracy, Mask R-CNN performs gradient computations
and modifies model weights throughout the training
phase. Parameters in the YOLOv5 algorithm are
optimized to improve the accuracy of object detection.
During the testing phase, fresh photos are used to perform
object detection. For every object that is recognized, the
Mask RCNN algorithm produces bounding boxes and
class labels. To evaluate the object detection performance
of the YOLOv5 algorithm, a dataset that hasn't been seen
before is used.
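The configuration parameters named above (iterations, batch size, and number of classes for Mask R-CNN; batch size, learning rate, and epochs for YOLOv5) can be summarized, together with the experiment grids used later, as plain settings dictionaries. The specific batch sizes and learning rate below are assumed values for illustration only; the grids mirror the iteration, epoch, and threshold ranges described in the results section.

# Illustrative experiment configuration mirroring the setup described in the text;
# the exact framework-specific training code is not reproduced here.
mask_rcnn_config = {
    "iterations": [1000, 1500, 2000, 2500, 3000],
    "batch_size": 2,          # assumed value for illustration
    "num_classes": 3,         # Dermatomycosis, Mucocutaneous Candidiasis, Pityriasis Versicolor
    "score_thresholds": [round(0.1 * t, 1) for t in range(1, 10)],
}
yolov5_config = {
    "epochs": [50, 75, 100, 125, 150],
    "batch_size": 16,         # assumed value for illustration
    "learning_rate": 0.01,    # assumed value for illustration
    "conf_thresholds": [round(0.1 * t, 1) for t in range(1, 10)],
}
print(mask_rcnn_config, yolov5_config)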
5.4 Algorithm evaluation
We assessed YOLOv5 and Mask R-CNN for object
detection in this study because of their high accuracy and
effectiveness in managing related tasks. These algorithms
were selected due to their resilience and efficacy in object
identification and classification, particularly in situations
requiring quick and accurate detection. Although these Figure 9: The notion of algorithmic
techniques are the focus of our study, we acknowledge the processing
possibility of expanding it to include other highly
respected object detection algorithms, like SSD (Single
Shot MultiBox Detector), EfficientDet, and Faster R-
CNN. We consider these algorithms for future research 6 Results and discussion
initiatives because they may offer further insights into the Once training and testing are finished, the assessment step
comparative performance of our dataset. This procedure is carried out to gauge the effectiveness of the YOLOv5
assesses how well the YOLOv5 and Mask R-CNN and Mask R-CNN algorithms. We test the Mask RCNN
algorithms work. There are two phases to the evaluation: algorithm with various iteration settings and thresholds. In
testing and training. During the training phase, both contrast, the Mask RCNN algorithm is evaluated using
methods employed 1,023 photos from the dataset. During 1000, 1500, 2000, 2500, and 3000 iteration values,
the testing phase, 113 different photos are used to assess respectively, with threshold values varying between 0.1
the algorithms. At this stage, the algorithms' object and 0.9 for every iteration. The YOLOv5 algorithm, on the
detection performance is evaluated, and a confusion other hand, makes use of distinct threshold values and
matrix is used to calculate the algorithms' accuracy. epochs. 50, 75, 100, 125, and 150 are the employed epoch
Numerous significant performance indicators, including values, and each epoch's threshold values range from 0.1
accuracy, precision, recall, F1-score, mean average to 0.9.
precision (MAP), and area under the curve (AUC), can be a. Mask R-CNN Algorithm: The Mask R-CNN
obtained from the confusion matrix. Five iterations and algorithm identified 80 data labels for
epochs of performance evaluation are granted for both Dermatomycosis (D), 19 for Mucocutaneous
algorithms. Candidiasis (MC), and 0 for Pityriasis Versicolor
(PV) after five tests. A total of 113 photos were
positively detected. Table 1 presents the
interpretation of the performance calculation for
the Mask RCNN method used to treat skin
diseases. The algorithm uses 3000 iterations and
varies the threshold (T) from 0.1 to 0.9. The F1-
score is the method for identifying the optimal
model, with a threshold value for model
evaluation between 0.1 and 0.9. When assessing
the binary model, the harmonic mean of precision
and recall is employed using the F1-score.
According to Table 1, the maximum F1-score of
0.28 is attained at the 0.1 level. The precision is
49%, the recall is 19%, and the accuracy is 67%
at this threshold.
Figure 8: Data labeling on images using the
bounding box tool
Table 1: Performance of Mask R-CNN with 3000 iterations

      Accuracy                 Recall                   Precision                F1-Score
T     D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP
0.1   0.39  0.70  0.92  0.67   0.35  0.23  0     0.19   0.87  0.59  0     0.49   0.50  0.33  0     0.28
0.2   0.40  0.72  0.93  0.68   0.35  0.19  0     0.18   0.91  0.63  0     0.51   0.51  0.29  0     0.27
0.3   0.40  0.73  0.94  0.69   0.34  0.15  0     0.16   0.92  0.58  0     0.50   0.50  0.24  0     0.25
0.4   0.40  0.73  0.94  0.69   0.33  0.14  0     0.16   0.95  0.69  0     0.55   0.49  0.23  0     0.24
0.5   0.40  0.74  0.95  0.70   0.33  0.15  0     0.16   0.96  0.73  0     0.56   0.49  0.25  0     0.25
0.6   0.40  0.75  0.96  0.70   0.32  0.14  0     0.15   0.96  0.71  0     0.56   0.48  0.23  0     0.24
0.7   0.39  0.75  0.96  0.70   0.31  0.12  0     0.14   0.96  0.67  0     0.54   0.47  0.20  0     0.22
0.8   0.38  0.77  0.97  0.71   0.31  0.13  0     0.15   0.96  0.67  0     0.54   0.47  0.22  0     0.23
0.9   0.33  0.81  0.98  0.71   0.26  0.12  0     0.13   0.91  0.50  0     0.47   0.40  0.19  0     0.20
The area under the ROC graph is then computed using the c. Evaluation of Proposed Algorithms: The Mask
AUC value. It is employed as a performance evaluation R-CNN and the YOLOv5 algorithm can be
statistic to gauge a classification model's effectiveness. A compared to the calculation results obtained from
higher AUC score indicates better model performance in the algorithm testing technique for detecting
differentiating between positive and negative classes. In fungal infections-caused skin problems, which
the fifth test, an AUC value of 0.55 is displayed on the contained 113 data points from three different
ROC graph of the Mask R-CNN method, as depicted in skin conditions. Table 3 displays the comparison
Figure 10. values.
In every metric that is examined, YOLOv5 outperforms
Mask R-CNN, including accuracy (0.87), recall (0.80),
precision (0.85), F1-Score (0.81), and AUC (0.88). The
variety of fungal infections in appearance, size, form, and
texture makes it particularly difficult to diagnose skin
illnesses caused by these infections. Algorithms that
process medical pictures accurately and efficiently are
necessary for effective detection. Here, the effectiveness
of two widely used object detection algorithms, YOLOv5
and Mask R-CNN, is compared.
This can be beneficial when precise infection borders
are critical in medical imaging. Dermatologists may find
the capacity to create segmentation masks especially
Figure 10: ROC following the fifth Mask R-CNN helpful in diagnosing and treating infections since they
algorithm test offer comprehensive details about the affected regions.
However, because of its multi-stage processing, Mask R-
b. YOLOv5 Algorithm: By the time the fifth test CNN requires a lot of computing power. This may lead to
was reached, the YOLOv5 algorithm had slower inference and longer training times, which could be
accurately identified 113 images; however, it had problematic for real-time applications or for handling big
only identified 86 data labels for Pityriasis datasets.
Versicolor, 39 for Mucocutaneous Candidiasis, By adding a branch for predicting segmentation masks
and 86 for Dermatomycosis. The performance on each Region of Interest (RoI) in parallel with the
testing at epoch 150 with a threshold value of 0.1 current branch for classification and bounding box
is displayed in Table 2. According to Table 2, the regression, Mask R-CNN expands upon Faster R-CNN.
greatest F1-Score value is 0.81, at the 0.1 The two-stage method of Mask R-CNN, which includes
threshold, with 86% accuracy, 8% recall, and region proposal and refining, enables very accurate object
94% precision. With an AUC value of 0.88. identification and segmentation.
Figure 11. displays a graph for the ROC of the
YOLOv5 algorithm in the fifth test.
Table 2: Performance of YOLOv5 with 150 epochs

      Accuracy                 Recall                   Precision                F1-Score
T     D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP    D     MC    PV    MAP
0.1   0.77  0.83  0.99  0.86   0.77  0.72  0.92  0.80   0.71  0.92  0.80  0.81   0.80  0.71  0.92  0.81
0.2   0.74  0.86  0.99  0.86   0.68  0.72  0.92  0.77   0.78  0.92  0.84  0.85   0.76  0.75  0.92  0.81
0.3   0.72  0.86  0.98  0.85   0.62  0.69  0.83  0.71   0.80  0.91  0.85  0.85   0.72  0.74  0.87  0.78
0.4   0.67  0.85  0.98  0.83   0.52  0.62  0.82  0.65   0.83  0.90  0.86  0.86   0.65  0.71  0.86  0.74
0.5   0.62  0.83  0.97  0.81   0.42  0.53  0.50  0.48   0.85  1     0.89  0.91   0.57  0.65  0.67  0.63
0.6   0.58  0.80  0.96  0.78   0.33  0.42  0.33  0.36   0.89  1     0.92  0.94   0.49  0.57  0.50  0.52
0.7   0.53  0.75  0.93  0.74   0.23  0.25  0     0.16   1     0     1     0.67   0.37  0.40  0     0.26
0.8   0.44  0.69  0.93  0.69   0.08  0.07  0     0.05   1     0     1     0.67   0.15  0.13  0     0.09
0.9   0.40  0.67  0.93  0.67   0.02  0     0     0.01   0     0     1     0.33   0.04  0     0     0.01
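The tables above report metrics over a grid of confidence thresholds, and the best operating point is chosen by F1-score; a minimal sketch of that selection logic is shown below, using illustrative per-threshold detection counts rather than the study's actual detections.

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    # F1 is the harmonic mean of precision and recall computed from detection counts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Illustrative counts per confidence threshold (not the paper's detections).
counts_by_threshold = {
    0.1: (90, 20, 23),
    0.5: (60, 7, 53),
    0.9: (5, 1, 108),
}
best_t = max(counts_by_threshold, key=lambda t: f1_from_counts(*counts_by_threshold[t]))
print(best_t, f1_from_counts(*counts_by_threshold[best_t]))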
Table 3: Performance comparison of the proposed algorithms (M = Mask R-CNN, Y = YOLOv5)

Iterations/  Threshold   Prediction   Accuracy    Recall      Precision   F1-Score    AUC
Epochs       M     Y     M     Y      M     Y     M     Y     M     Y     M     Y     M     Y
1000 / 50    0.6   0.1   176   163    0.79  0.84  0.34  0.76  0.60  0.76  0.43  0.75  0.61  0.81
1500 / 75    0.2   0.1   179   179    0.73  0.87  0.42  0.76  0.36  0.85  0.38  0.78  0.61  0.88
2000 / 100   0.1   0.1   220   183    0.67  0.86  0.22  0.78  0.45  0.82  0.30  0.80  0.55  0.87
2500 / 125   0.7   0.1   199   161    0.80  0.87  0.40  0.75  0.67  0.85  0.45  0.79  0.57  0.84
3000 / 150   0.1   0.1   260   183    0.67  0.86  0.19  0.80  0.49  0.81  0.28  0.81  0.55  0.88
predictions, which is essential for identifying various
Additionally, YOLOv5 produced higher recall and
precision metrics, demonstrating its efficacy in reducing
false positives and false negatives. This is important for
medical diagnosis since misdiagnosing a healthy area as
sick (false positive) or failing to detect an infection (false
negative) can have serious repercussions. This implies that
YOLOv5 has a higher degree of accuracy when it comes
to recognizing contaminated regions in the pictures.
Furthermore, the high AUC suggests that YOLOv5
performs better across a range of threshold values in
differentiating between infected and non-infected areas.
Although Mask R-CNN provides thorough segmentation,
the comparison research indicates that for this specific
task, the benefits of segmentation are not greater than
those of YOLOv5's higher detection accuracy and
Figure 11: ROC following the fifth of the YOLOv5 efficiency. However, in some clinical situations where
Algorithm test precise infection boundaries are required, Mask R-CNN's
segmentation function might still be useful.
Our proposed technique, YOLOv5, is a quick and Because YOLOv5 processes information more
efficient single-stage object detection algorithm that quickly, it is more suited for real-world uses where timely
outperforms two-stage detectors such as Mask R-CNN. findings are crucial, like automated screening systems in
Large-scale image analysis and real-time detection healthcare settings. According to the comparison analysis,
scenarios benefit greatly from this efficiency. Although it YOLOv5 is a more sensible option for large-scale screens
might not offer as much segmentation depth as Mask R- and real-time applications. Nonetheless, the needs of the
CNN, YOLOv5 has a very high degree of object detection application, such as the necessity for segmentation against
and classification accuracy. It captures items of different the requirement for quick and precise identification,
sizes with the aid of anchor boxes and multi-scale should be considered while choosing between the two
algorithms.
7 Conclusion [5] Altammami, G. S et al. 2024. Dermatological
Conditions in The Intensive Care Unit at A Tertiary
There are notable variations in performance parameters Care Hospital in Riyadh, Saudi Arabia.” Saudi
including accuracy, recall, precision, F1-Score, and AUC Medical Journal vol. 45, 8. pp. 834-839.
when YOLOv5 and Mask R-CNN algorithms are https://doi.org/10.15537/smj.2024.45.8.20240479
compared to identify fungal diseases on the skin. The
disparities arise from variations in iteration (or epoch) [6] Khavandi, S. et al. 2023. Investigating the Impact of
values, which affect the algorithms' capacity to acquire Automation on the Health Care Workforce Through
knowledge and generalize from the training set. The Autonomous Telemedicine in the Cataract Pathway:
results of performance tests indicate that algorithm Protocol for a Multicenter Study.” JMIR research
performance is influenced by epoch or iteration values protocols vol.12. e49374. pp.1-10.
from the Mask R-CNN and YOLOv5 algorithms' first to https://doi.org/10.2196/49374
fifth tests. The second test had the highest AUC value with [7] Roy, K. S. et al. 2019. Skin Disease Detection Based
1500 iterations of the Mask R-CNN method and 75 epochs on Different Segmentation Techniques. 2019 Int. Conf.
of the YOLOv5 algorithm. With an AUC value of 66% Opto-Electronics Appl. Opt. Optronix. pp. 1–5, 2019.
and an F1-score of 38%, the Mask R-CNN algorithm is https://doi.org/10.1109/OPTRONIX.2019.8862403
less successful in 1500 iterations at identifying diseases
caused by fungal infections of the skin. On the other hand, [8] Archana, R. and Jeevaraj, P.S.E. 2024. Deep learning
with an AUC value of 88% and an F1-Score value of 78%, models for digital image processing: a review. Artif
the YOLO algorithm's test results demonstrate its good Intell Rev 57, 11.
ability to identify diseases brought on by skin infections, https://doi.org/10.1007/s10462-023-10631-z
as evidenced by its ability to forecast 179 disorders in [9] Thakur, G. K. et al. 2024. Deep Learning Approaches
epoch 75. for Medical Image Analysis and Diagnosis. Cureus
The thorough investigation demonstrates that vol. 16,5 e59507. pp. 1-8.
YOLOv5 performs better than Mask R-CNN in terms of https://doi.org/10.7759/cureus.59507
accuracy, recall, precision, F1-Score, and AUC when it
[10] Bhatti, H.M.A. et al. 2020. Multi-detection and
comes to identifying fungal diseases on the skin. Iteration
Segmentation of Breast Lesions Based on Mask
and epoch settings have a significant impact on
RCNN-FPN. Proc. - 2020 IEEE Int. Conf.
performance; YOLOv5 shows the best performance at 75
Bioinforma. Biomed. BIBM 2020, pp. 2698–2704.
epochs. Mask R-CNN is less suited for this application
https://doi.org/10.1109/BIBM49941.2020.9313170
due to its computational intensity and lower detection
accuracy, even though it has segmentation capabilities. As [11] Subash, K.V.V. et al. 2020. Object Detection Using
a result, in both clinical and real-world contexts, YOLOv5 Ryze Tello Drone With Help of Mask-RCNN. 2nd Int.
is the recommended algorithm for identifying fungal- Conf. Innov. Mech. Ind. Appl. ICIMIA 2020 - Conf.
induced skin disorders due to its exceptional efficiency Proc., no. Icimia, pp. 484-490, 2020.
and accuracy. Future research can continue to enhance the https://doi.org/10.1109/ICIMIA48430.2020.907488
detection accuracy and practical applicability of deep 1
learning models for diagnosing skin fungal infections.
[12] Zhou, Zhijian. et al. 2020. Detection and
Classification of Multi-Magnetic Targets Using
References Mask-RCNN. IEEE Access. 8. 187202-187207.
[1] Majaranta, P. et al. 2019. Eye Movements and Human- https://doi.org/10.1109/access.2020.3030676
Computer Interaction In: Klein, C., Ettinger, U. (eds) [13] Ieamsaard, J. et al. 2021. Deep Learning-based Face
Eye Movement Research. Studies in Neuroscience, Mask Detection Using YoloV5. In Proceeding of the
Psychology, and Behavioral Economics. Springer, 2021 9th International Electrical Engineering
Cham, pp.971-1015. Congress, iEECON 2021, Mar. 2021, pp. 428–431.
https://doi.org/10.1007/978-3-030-20085-5_23 https://doi.org/10.1109/iEECON51072.2021.94403
[2] Al Bshabshe, A. et al. 2023. An Overview of Clinical 46
Manifestations of Dermatological Disorders in [14] Yang, G. et al. Face Mask Recognition System with
Intensive Care Units: What Should Intensivists Be YOLOV5 Based on Image Recognition. 2020. In
Aware of? Diagnostics 13, no. 7: 1290. 2020 IEEE 6th International Conference on
https://doi.org/10.3390/diagnostics13071290 Computer and Communications, ICCC 2020, Dec.
[3] Nigat, T.D.; Sitote, T.M. 2023. Gedefaw, B.M. Fungal 2020, pp. 1398-1404.
Skin Disease Classification Using the Convolutional https://doi.org/10.1109/ICCC51575.2020.9345042
Neural Network. J. Healthc. Eng. 6370416, pp.1-9. [15] Ajith, A. et al. 2017. Digital Dermatology Skin
https://doi.org/10.1155/2023/6370416 Disease Detection Model using Image Processing.
[4] Badia, M. et al. 2020. Dermatological Manifestations Int. Conf. Intell. Comput. Control Syst. ICICCS
in the Intensive Care Unit: A Practical Approach. Crit 2017 Digit., vol. 24, no. 7, pp. 168-173.
Care Res Pract. 2020:9729814. https://doi.org/10.1109/iccons.2017.8250703
https://doi.org/10.1155/2020/9729814
[16] Haddad, A and Hameed, S.A. 2018. Image Analysis [22] Chhatlani, J. et al. 2022. DermaGenics - Early
Model for Skin Disease Detection: Framework. Proc. Detection of Melanoma using YOLOv5 Deep
2018 7th Int. Conf. Comput. Commun. Eng. ICCCE Convolutional Neural Networks. 2022 IEEE Delhi
2018, no. c, pp. 280-283, 2018. Section Conference (DELCON). pp. 1-6.
https://doi.org/10.1109/ICCCE.2018.8539270 https://doi.org/10.1109/DELCON54057.2022.9753
227
[17] Rathod, J. et al. 2018. Diagnosis of skin diseases
using Convolutional Neural Networks. Proc. 2nd Int. [23] Wiliani, N., et al. 2023. Identifying Skin Cancer
Conf. Electron. Commun. Aerosp. Technol. ICECA Disease Types with You Only Look Once (YOLO)
2018, no. Iceca, pp. 1048-1051, 2018. Algorithm. Jurnal Riset Informatika. 5(3), 455-464.
https://doi.org/10.1109/ICECA.2018.8474593 https://doi.org/10.34288/jri.v5i3.241
[18] Rohaziat, N. et al. 2020. White Blood Cells Detection [24] Aishwarya, N., et al. 2023. Skin Cancer diagnosis
using YOLOv3 with CNN Feature Extraction with Yolo Deep Neural Network. Procedia
Models. International Journal of Advanced Computer Science. vol 220, pp. 651-658.
Computer Science and Applications. 11. https://doi.org/10.1016/j.procs.2023.03.083
https://doi.org/10.14569/IJACSA.2020.0111058
[25] He, K., et al. 2020. Mask R-CNN. IEEE Trans.
[19] Wu, Z. et al. 2019. Studies on Different CNN Pattern Anal. Mach. Intell. vol. 42, no. 2, pp. 386-
Algorithms for Face Skin Disease Classification 397.
Based on Clinical Images. IEEE Access. vol. 7, no. https://doi.org/10.1109/TPAMI.2018.2844175
c, pp. 66505-66511.
[26] Shi, W., et al. 2019. Plant-part segmentation using
https://doi.org/10.1109/ACCESS.2019.2918221
deep learning and multi-view vision. Biosyst. Eng.
[20] Li, L. F. et al. 2020. Deep Learning in Skin Disease vol. 187, no. September, pp. 81–95.
Image Recognition: A Review. IEEE Access. vol. 8, https://doi.org/10.1016/j.biosystemseng.2019.08.01
pp. 208264-208280. 4
https://doi.org/10.1109/ACCESS.2020.3037258
[21] Salama, A. M. et al. 2024. A YOLO-based Deep
Learning Model for Real-Time Face Mask Detection
via Drone Surveillance in Public Spaces,
Information Sciences, vol. 676, 120865.
https://doi.org/10.1016/j.ins.2024.120865
https://doi.org/10.31449/inf.v49i16.4974 Informatica 49 (2025) 97–114 97
A Unified Trace Meta-Model for Alignment and Synchronization of BPMN
and UML Models
Aljia Bouzidi1, Nahla Haddar2 and Kais Haddar2
1ISIMM University of Monastir, Monastir, Tunisia
2University of Sfax, Sfax, Tunisia
E-mail: aljia.bouzidi@gmail.com, nahla_haddar@yahoo.fr, kais.haddar@yahoo.fr
Keywords: Traceability, synchronization, alignment, use case diagram, class diagram, MVC, BPMN diagram, integration,
transformation
Received: February 24, 2025
Organizations often face information system (IS) failures due to misalignment with business goals. Business
process models (BPMs) play a crucial role in addressing this issue but are often developed independently
of IS models (ISMs), resulting in non-interoperable systems. This paper proposes a traceability method to
link BPMs and ISMs, bridging the gap between business and software domains.
We introduce a unified trace meta-model integrating BPMN elements with UML constructs (use cases and
class diagrams) via traceability links. This meta-model is instantiated as the BPMNTraceISM diagram,
ensuring seamless integration through bidirectional transformation models.
To validate our approach, we developed a graphical editor for BPMNTraceISM diagrams and implemented
transformations using the ATLAS Transformation Language (ATL). A case study on a loan approval process
demonstrates the method’s effectiveness in aligning BPMN and UML elements, improving interoperability
and model alignment across domains.
Povzetek: Razvit je enoten sledilni meta-model, ki povezuje elemente BPMN in UML (diagram primerov
uporabe, razredov po vzorcu MVC) za uskladitev poslovnih procesov in informacijskih sistemov, ki ga
validirajo z grafičnim urejevalnikom in transformacijami ATL.
1 Introduction evant explicit traceability model and defined it using the
integration mechanism. Indeed, we propose a requirements
In the software engineering field, Business Process Mod- engineering method that works at both the meta-model and
els (BPMs) playb an increasingly central role in the devel- model levels, establishing traceability between BPMs and
opment and continued management of software systems. ISMs to bridge the gap between business modeling and re-
Therefore, it is crucial to have Information System Mod- quirements elicitation.modeling. This method is deliber-
els (ISMs) that tackle BPMs. modelling. However, these ately influenced by the Object Management Group (OMG)
models are mostly expressed using different modeling lan- specifications. Particular attention is given to UML use
guages, and only a few information systems (IS) are devel- case models [2] as the most commonly used way to elicit
oped with explicit consideration of the business processes software needs and BPMN [3] as the most widely used lan-
they are supposed to support. This separation causes gaps guage to specify the business process model (BPMs). In-
between business and IS models. Thus, a methodology is deed, in [1], we firstly defined a unified trace meta-model
needed to examine the gap between BPMs and ISMs, and of the BPMN and the UML use case models in the form of
keep them aligned even as they evolve. Traceability in soft- an integrated single meta-model. It defines also traceability
ware development proves its ability to associate overlap- links between interrelated concepts to correlate overlapped
ping artefacts of heterogeneous models (for example, busi- concepts as new modeling concepts. This meta-model is
ness models, requirements, uses cases, design models), im- then instantiated in the form of a new diagram that we called
prove project results by helping designers and other stake- BPSUC (Business Process SupportedUse Cases). This new
holders with common tasks such as analysis of change im- diagram permits business teams and requirements design
pacts, etc. Thereby creating an explicit traceability model teams to work together within the same model, and allows
that is not a standalone guideline, but it has significant ben- specifying trace links graphically.
efits in terms of quality, automation, and consistency. Al-
though creating it is not a trivial task, an explicit traceabil- The practical benefits of the proposed method lie in its
ity model remains a reference for a consistent definition of ability to bridge the gap between business process man-
typed traceability links between heterogeneous model con- agement (BPM) and software systems development. In the
cepts, helping to ensure their alignment and coevolution. context of Business Process Models (BPMs) and Informa-
In our previous work presented in [1], we proposed a rel- tion System Models (ISMs), this method enables seamless
In the context of Business Process Models (BPMs) and Information System Models (ISMs), this method enables seamless integration and traceability across heterogeneous models, which is crucial for ensuring their alignment and coevolution. By establishing clear and accurate traceability links between BPMs, UML use case diagrams, and class diagrams, our method enhances communication and collaboration among business analysts, software engineers, and stakeholders, ensuring that software systems are developed with a clear understanding of the business processes they aim to support. Furthermore, the integration of these models, coupled with the explicit traceability model, provides several practical advantages, including improving change impact analysis, enhancing automation, and maintaining consistency throughout the system lifecycle. These benefits are particularly significant in dynamic environments where both business processes and software systems evolve frequently. Thus, the method not only improves the quality of the development process but also provides a robust framework for aligning business and software models, ensuring their cohesive adaptation to changing requirements and system developments.

This paper enriches and extends our work presented in [1]. The enrichment involves adding class diagram concepts structured according to the MVC pattern. Our intervention considers both the meta-model and the model levels. Hence, in the integrated trace meta-model proposed in [1], we add new modeling concepts to express trace links between the class diagram, use case diagram, and BPMN concepts. Class diagram concepts that have no corresponding concepts are also included in the integrated trace meta-model. The proposed traceability concepts and class diagram concepts are instantiated in the BPSUC diagram. Accordingly, BPSUC now enables the design of class diagram elements and the proposed traceability concepts combined with their corresponding BPMN and use case diagram artefacts.

We validate our theoretical method by implementing a visual modeling tool that supports the enriched integrated trace meta-model and the new diagram supplemented with class diagram elements.

The rest of this paper is organized as follows: Section 2 is dedicated to discussing related work. In Section 3, we give an overview of the method presented in [1]. Section 4 is devoted to explaining our contributions. Sections 5 and 6 are dedicated to demonstrating the feasibility of our proposal in practice and through a topical case study. In Section 7, we evaluate and discuss our method. Finally, in Section 8, we conclude the current work and give some outlooks.

2 Related work

We classify related work into two groups based on the methodologies they have used to establish traceability between elements of heterogeneous models: (1) works that have proposed transformation models to define internal or implicit traceability models, and (2) approaches that defined external traceability models manually, based on mechanisms such as model integration, model merge/composition, UML profiles, or matrices.

2.1 Traceability via transformation models

In the first category, existing implicit traceability models are commonly MDA-compliant approaches that define traceability through exogenous, endogenous, horizontal, or vertical transformation models. In these approaches, BPMN models are widely used to generate alternative models through different transformation model types. Among the various uses of BPMN models are: an exogenous transformation for mapping users'/organizations' requirements to BPMN models [4]; a vertical transformation for the generation of artefacts between BPMN and user stories [5] and [6], and for the generation of UML models [7]; a horizontal and exogenous transformation for the generation of activity diagrams from BPMN [8] and [9]; and a vertical transformation of textual requirements into a BPMN model [10]. Some approaches define endogenous transformations between UML diagram elements to establish their traceability. For instance, the approach in [11] uses machine learning techniques to maintain traceability information between software models. Their focus is particularly on the requirements, analysis, and design models, which are specified in the UML language. To trace links between requirements documents and UML diagrams, several approaches use Natural Language Processing (NLP). For example, the approach in [12] uses a system requirement description expressed in natural language to extract the actors and the actions automatically.

The core benefit of defining implicit traceability is that it does not require supplemental effort, because a single transformation chain is sufficient to perform transformations in both directions. Moreover, it offers multiple trace links between generated artefacts. However, the identified trace links consider exclusively transformed artefacts. Moreover, the transformation chain is static and cannot be updated to obtain the required traces for such traceability scenarios.

2.2 Explicit traceability models

The second category includes approaches that define explicit traceability models separate from the source models. This category includes approaches that propose guidelines for creating traceability models. For instance, the author of [13] defines a method for guiding the establishment of traceability between the software requirements and the UML diagrams. This guideline has two main components: (i) a meta-model and (ii) a process step. The process step defines the detailed processes, the mapping of requirements to UML diagrams, and the types of requirements. Requirements can be classified according to their aspects.
This classification can be carried out according to certain types of UML diagrams. However, this guideline focuses only on establishing traceability at the meta-model level. Moreover, the business field is not considered in this work. The authors of [14] propose a meta-model-based approach to create traceability links between different levels of the same system. Indeed, this approach focuses on defining a traceability meta-model over the source code, stored as an Abstract Syntax Tree (AST), and other possible artefacts such as requirements, test cases, etc. To show the identified trace links, the authors develop an editor. Nevertheless, storing the source code of a system as an AST can cause several problems, such as the appearance of syntax errors in the source code, which leads to the loss of traceability links.

There is other model-based research that aims to maintain traceability. For example, the research in [15] proposes a co-evolution of transformations based on the propagation of change. Its hypothesis is that knowledge of the evolution of meta-models can be disseminated by decisions aimed at driving the co-evolution of transformations. To address particular cases, the authors present composition-based techniques that help developers compose resolutions that meet their needs. For the same purpose, the approach in [11] refers to machine learning techniques to introduce an approach called TRAIL (TRAceability lInk cLassifier). The classifier is trained on a dataset that contains histories of existing traceability links between pairs of artefacts, in order to output the trace link (related or unrelated) of any future pair of artefacts (new or already existing). Some other approaches define traceability models for eliciting requirements of complex systems [16] and [17]. Likewise, the authors of [19] base their work on deep learning techniques and propose a neural network architecture based on word embedding and Recurrent Neural Network (RNN) algorithms to predict trace links automatically. The output of this model is a vector that contains the semantic data of the artefact. Then, the trained model compares the semantic vectors of a pair of artefacts and predicts whether they are related or unrelated. However, considering all meta-models from many different abstraction levels in one unified single traceability model is not a trivial task and can result in very complex models.

In [18], the authors propose an approach to promote traceability and synchronization of computational models in an Enterprise Architecture (EA), using meta-models, model traceability, and synchronization structures. The authors represent the meta-models of the EA at all abstraction levels (strategic, tactical, and operational). These levels are denoted within the integrated meta-model by three packages. Each package incorporates the core concepts of the level it represents. They integrate the three meta-models by adding alignment points between them. In addition, they define a traceability framework and a synchronization framework to support the analysis of the impact of organizational changes.

There are also studies on specific languages. For example, the approach in [20] uses Natural Language Processing techniques to define a framework for managing traceability between software artefacts. To demonstrate their work in practice, the authors develop a tool that supports traceability links between software models, including requirements and UML class diagrams, and the source code written in the Java programming language.

2.3 Identified gaps in existing works

Overall, existing works that define explicit traceability models are mostly focused on the meta-model level only and ignore the model level. Moreover, existing explicit models establish traceability either between software models expressed in UML diagrams at the same or different abstraction levels or between business model artefacts. However, none of the existing approaches has achieved successful results in establishing or maintaining traceability between BPMN models, UML use case models, and the UML class diagram.

The disadvantages of the proposed approaches stem from rigid relationship types that fail to adapt to the changing needs and practices of organizations. Furthermore, most of the proposed approaches define or use very generic traceability meta-models, capable of generating highly abstract trace models. In practice, there is no prescription for how to add customized tracing information or how to adapt a generic traceability meta-model to express valuable and context-specific traces. Concerning the approaches that focus on concrete modeling languages, to our knowledge, there is no approach that proposes an explicit trace model or meta-model between BPMN and UML models, even though they are the most popular standards for modeling business processes and automated information systems.

3 Background of our previous traceability method

The method presented in this paper is an extension of our previous work [1]. In this previous work, we explored the advantages of defining an integrated traceability model to establish traceability between the BPMN and the UML use case models and ensure their coevolution once a change has occurred. This method acts at both the meta-model and model levels, and it includes three core steps:

(i) First, we defined an integrated trace meta-model that is a specification of traceability between the existing artefacts, while keeping them unchanged and independent. This integrated trace meta-model contains all the BPMN and the UML use case meta-model artefacts (meta-classes and associations), unified with new meta-classes and associations for expressing traceability links at the meta-model level. The integrated trace meta-model favors simplicity and uniformity because the source meta-models are kept and unified with their traceability information in one unified meta-model.

(ii) Next, we instantiated the integrated trace meta-model at the model level.
We represent it as a new diagram called Business Process Modeling Notation Traces Use Case (BPSUC). This diagram also incorporates the BPMN and the UML use case elements together with traceability links, and allows designing BPMN and use case diagram artefacts jointly. Moreover, visualizations and queries on traced elements are straightforward, because business analysts and software designers are now able to work together on one integrated model. BPSUC can also be used to analyse change impacts and validate them before propagating them to the source models.

(iii) Finally, we defined bidirectional transformation models between the BPSUC diagram and the source models (BPMN and the use case models) to ensure the coevolution of the origin models.

3.1 Integrated trace meta-model

In our previous work presented in [1], a unified trace meta-model is proposed based on a semantic mapping of pairs of BPMN and use case meta-model artefacts. The definition of this meta-model follows the following scenario: for each pair of overlapping BPMN and UML use case concepts, we add a new modeling concept that can be either a link, such as an association, a composition, or an inheritance, or a new meta-class. Each trace link represented by a new meta-class is associated with the pair of artefacts it specifies, generally by an inheritance relationship.

Table 1 summarizes the mappings between the use case diagram concepts (first column), the BPMN model concepts (second column), and the corresponding new meta-classes (third column) that are associated with them in the integrated meta-model (the full mapping and its explanation are available in [1]). To validate the proposed mapping further, we have conducted additional evaluations across a variety of BPMN and UML diagram scenarios.

Table 1: Mapping of BPMN, use case, and trace meta-model concepts

Use case concept | BPMN concept | Meta-model concept
Package | Non-empty lane (a lane including other sub-lanes) | Organisation Unit Package
Actor | Empty lane (a lane that does not contain other sub-lanes) | Organisation Unit Actor
Use case | Fragment represented by a sequence of BPMN artefacts that is performed by the same role and manipulates the same item-aware element (business object, input data, data store, data state) | UCsF
Extends | Exclusive gateway between two different fragments | Exclusive Gateway, Extends
Association | Fragment within the lowest nesting level of sub-lanes | Association
Includes | Redundant fragment (that appears multiple times in the BPMN model) | Fragment that appears multiple times, Includes
Extends | Inclusive gateway between two different fragments | Inclusive Gateway, Extends
Extension point | Condition of a sequence flow + the name of the fragment that represents the extending use case | Extension Point

This expansion includes not only the core BPMN and UML elements, such as activities, actors, and use cases, but also more complex diagrams, such as:
– BPMN models: event-driven processes, process variants, and sub-processes with different complexity levels, such as loan approval and inventory management systems.
– UML diagrams: class diagrams, including inheritance and association relationships, and more sophisticated use case diagrams representing different business functions, such as order fulfilment, customer support, and system maintenance.

These diverse scenarios have allowed us to assess how well the mapping between BPMN, UML use case, and class diagrams holds up in real-world business process and system modeling. By applying the proposed traceability method to these varied scenarios, we demonstrate the scalability and robustness of our meta-model. We have also provided examples where the mapping effectively handles the integration of different BPMN and UML model types, ensuring traceability between the business and software models.

The integrated meta-model is depicted in Figure 1. In order to keep it readable, we present in this figure only the core artefacts of the source meta-models (use case meta-model and BPMN meta-model) and all the trace links (meta-classes and associations). Dark grey meta-classes represent new meta-classes; light grey meta-classes represent UML use case elements; white meta-classes represent BPMN elements; and black lines represent existing associations from the source meta-models. The blue lines represent trace relationships, providing the foundational traceability between BPMN and UML elements, as further detailed in [1] and [22].

– Organizational-Unit-Package
In BPMN, a non-empty lane is a grouping element and therefore has the same meaning as a package in UML.
Consequently, the Organizational-Unit-Package (OUPackage) is defined to trace the link between the pair BPMN non-empty lane and use case package, thereby defining an inheritance relationship between the new meta-class and this artefact pair.

– Organizational-Unit-Actor
In the proposed integrated trace meta-model of [1], we have defined a meta-class designated Organizational-Unit-Actor (OUActor). This new meta-class traces artefacts of the pair UML actor and BPMN empty lane (i.e. a lane that does not have embedded lanes). That is, it unifies the properties of a lane and an actor and combines them without changing their semantics, thereby defining the OUActor as a specialization of the UML actor and BPMN lane pair. In this way, OUActor inherits the properties of this pair of artefacts without updating their original semantics and structures. For example, in a loan approval process, the Loan Officer is represented in a BPMN diagram by a lane and in a UML use case diagram as an actor involved in the process. The OUActor meta-class links these representations, inheriting properties from both while preserving their original semantics. This ensures synchronization between BPMN and UML elements, offering a unified view of the Loan Officer's role across models.

– Fragment
A fragment is defined by [22] as "a set of interrelated BPMN elements that has inputs and outputs, and which is executed by the same performer". This artefact is specified in the unified trace meta-model as an instance of the meta-class Fragment (cf. Figure 1). As a Fragment is just an activity that can contain other BPMN concepts such as tasks, events, gateways, and sequence flows, we have aggregated a BPMN sub-process to a Fragment by creating an aggregation relationship, called fragments, between the fragment and sub-process meta-classes in the integrated trace meta-model (cf. Figure 1). Its cardinality is 1..* to point out that a sub-process should contain at least one fragment, but it may incorporate more than one Fragment. In addition, we define a many-to-many reflexive association on the fragment to represent the fact that a fragment may be an aggregation of other fragments (cf. Figure 1). Moreover, we create an association between the data object and the fragment (cf. Figure 1) to associate each fragment with the objects it manipulates. The cardinality of this relationship is fixed to 1..* to indicate that each fragment manipulates at least one business object type, but it may manipulate more than one business object. Furthermore, we have defined an association called organizationUA between the meta-classes OU-Actor and Fragment with a cardinality of 1..* to associate a fragment with its performer. For instance, tasks such as Review Application, Assess Credit Score, and Approve Loan in the BPMN Loan Approval Process are all performed by the Loan Officer. The Fragment aggregates these tasks into a cohesive group and connects them to the BPMN sub-process as well as to the business objects (e.g., Loan Applications, Credit Scores) manipulated during the process. This provides clear traceability between tasks, business objects, and performers while maintaining logical consistency.

– Use case supporting fragment
In order to support business objectives, a UML use case should be able to realize some business activities, which are specified in the integrated trace meta-model by a Fragment. A separate specification of the use case and the fragment it is supposed to realise does not allow explicitly representing the semantic links between them. To do this, we have defined the integrated trace meta-model presented in [1], which introduces a new meta-class that we designate Use Case supporting Fragment (UCsF). This new meta-class is defined as a specialization of a UML use case in order to inherit all its properties without updating its initial meaning.

3.2 BPSUC diagram

To allow modeling the artefacts of the proposed integrated trace meta-model, we instantiated it in the form of an integrated trace model in our previous work [1]. We represent it as a new diagram that we have called the BPMN Supporting Use Case model (BPSUC).

Table 2: Notation of the traceability artefacts

Meta-model concept | Graphical notation
OU-Actor | (dedicated icon)
OU-Package | (dedicated icon)
Use Case supporting Fragment (UCsF) | (dedicated icon)

For each concept, we have provided a graphical notation as follows: we have introduced new notations for the proposed new meta-classes UCsF, Organization Unit Package, and Organization Unit Actor. These notations are inspired by and extended from the icons of the pair of artefacts they represent. This inspiration ensures that experienced business and system designers are comfortable using the BPSUC diagram. Each concept originating from the UML use case and BPMN models [1] retains its original notation. In the BPSUC diagram, the Fragment is instantiated as a specific activity within the Loan Approval Process, linking BPMN tasks to the Loan Officer (OUActor) and business objects like the Loan Application. Additionally, the Organization Unit Package (OUPackage) meta-class is used to trace relationships between BPMN lanes and UML packages.
For example, functional areas such as Loan Review, Credit Assessment, and Loan Approval in the BPMN diagram are mapped to corresponding UML packages, ensuring alignment and traceability between these functional areas and their UML counterparts.

Figure 1: Traceability of BPMN and use case meta-model concepts

Table 2 depicts the graphical notations of the new meta-classes Organisation Unit Actor, Organisation Unit Package, and the UCsF.

4 Traceability method

The research work conducted in this paper is an extension and enhancement of our previous work presented in [1]. The extension consists of improving the integrated trace meta-model and the BPSUC diagram to include the artefacts of the UML class diagram structured according to the MVC design pattern. Our contribution aims not only to establish alignment but also to keep the source models aligned even as they evolve. The propagation of changes from our trace model to the source models is carried out through defined MDA model transformations. These transformations are exploited to guarantee the coevolution of the BPMN and the UML models. Figure 2 illustrates the background of our traceability method. In this section, we further explain how we extend and improve the integrated trace meta-model and the BPSUC diagram, as well as the rectifications made to them.

Figure 2: Background of the traceability method

4.1 Integrated trace meta-model improvement

Our first improvement of the integrated trace meta-model consists of defining an adequate strategy for defining its concepts and the relationships between them. Indeed, we propose a methodology for defining the integrated trace meta-model which includes two main steps: (1) identifying overlapping concepts of the BPMN and UML meta-models to define a relevant mapping between them, and (2) defining an adequate methodology to link each pair of interrelated concepts without changing their semantics. Thus, we propose to keep overlapping concepts and connect them either by a new concept or by a new relationship, which specifies the trace link between the existing concepts at the meta-model level. Afterwards, we connect each pair of artefacts to the new concepts representing them through a generalization/specialization relationship. This relationship allows inheriting the properties of both separated concepts as well as combining their usage without updating their initial semantics.
Our second improvement consists of adding the class diagram meta-model artefacts to the previous version of the integrated trace meta-model.

To apply our meta-model construction, we need to identify adequate mappings between BPMN and UML class diagram concepts. In the literature, the mapping between BPMN and UML class diagram concepts is widely discussed. Among these works, [23] defined model transformations from BPMN into UML class diagrams structured according to the MVC design pattern, and into use case diagrams, based on semantic mappings. For example, they propose mapping each BPMN empty lane (i.e. a lane that does not include other lanes) into a class in the class diagram and into a UML actor in the use case model. We reuse the semantic mappings defined in this approach to continue the definition of the integrated trace meta-model.

In Table 3, we summarize the semantic mapping between the BPMN meta-model artefacts and the class diagram meta-model artefacts from [23]. In this table, the class diagram meta-model concepts are structured according to the MVC design pattern.

Table 3: Mapping of BPMN and class diagram meta-model concepts

BPMN concept | UML class diagram meta-model concept
Item-aware element (data object, data store, data input, data output or data state) | Entity class; Association
Empty lane | Entity class; View class; Control class; Association
Fragment | View class; Control class
Exception event | Exception class; Operation
Signal event | Signal class; Operation
Automated task t (business rule task, receive task, send task, user task, script task, service task) within a fragment | Operation; Association
Item-aware element type (single or collection), Gateway, or Loop task / Rollback sequence flow | Cardinality of associations
Item-aware element attached to an automated task t within a fragment f | Parameters of an operation
Conditional sequence flow | Attribute

In contrast to the mapped concepts of the use case and BPMN meta-model artefacts, the mapping between the interrelated concepts of the UML class diagram and the BPMN meta-model is not tight, as shown in Table 3. Indeed, one UML concept may be represented by many BPMN concepts and vice versa. This is due mainly to the high degree of heterogeneity between the BPMN and class diagram artefacts. Thus, our mapping is limited to defining associations instead of defining new traceability concepts, as we aim not to complicate our integrated trace meta-model, and therefore to facilitate its readability while maintaining its consistency. The aforementioned trace meta-classes can also be reused to define BPMN–class diagram concept traceability.

The excerpt of the meta-model defined to trace the BPMN and the class diagram meta-models is presented in Figure 3. To ensure readability, Figure 3 depicts only the main artefacts of the class diagram and BPMN meta-models, as well as the reused traceability concepts. White meta-classes are BPMN concepts, orange meta-classes are UML class diagram meta-model concepts, khaki meta-classes denote UML class diagram concepts used for structuring the class diagram according to the MVC design pattern, while new concepts are specified by dark grey meta-classes. The blue associations represent the new trace links, while the black ones are the existing associations. It is important to note that all the use case concepts, BPMN concepts, traceability links, and existing associations defined in the previous extract of the integrated trace meta-model, which are not present in this extract, remain valid.

In the excerpt of Figure 3, each BPMN concept is associated with its corresponding concept in the class diagram meta-model. For example, we define a trace link called trace between the data object and the entity class to establish traceability between them. The multiplicity of this association is 1..* to indicate that each item-aware element should represent exactly one entity class. Moreover, we define a trace link between the gateway and the property meta-classes, as gateways can be indicators of association cardinalities. The multiplicity of this association is 0..*. On the other hand, UCsF is linked to the following meta-classes: Class, ClassDIPackage, and Association, by composition. This means that a UCsF can include classes, associations, and packages. These associations mean that a UCsF is a use case that incorporates its supported class diagram elements, representing the supported fragment elements. The cardinality of the composition association UCsF–ClassDIPackage is 3..* to indicate that a UCsF should incorporate at least three packages: View, Control, and Model, which represent the three parts of the MVC design pattern.

In addition, we define an association between OUActor and Class to express that an actor in the integrated trace meta-model is represented as a class in the class diagram meta-model.
Furthermore, in our integrated trace meta-model, a generalization/specialization relationship between the meta-classes OUPackage and ClassDIPackage is defined to point out that this trace meta-class inherits all the properties of the Package meta-class.

4.2 BPSUC diagram improvement

In contrast to most existing approaches [11], [13], [14], [15], [16], [17], [18], [19], [20], [21], which focus only on the meta-model level, our traceability method includes both the meta-model and the model levels. Thus, the second step of our contribution is devoted to describing how the traceability of BPMN and UML artefacts is established at the model level. We have improved the BPSUC diagram proposed in [1], in which the diagram features are limited to designing the BPMN and the use case diagram artefacts combined with their traceability links, which its designation already reflects.

In this paper, we aim to enrich this diagram to incorporate class diagram elements combined with BPMN and use case diagram elements. The first thing we do is update the name of BPSUC to be in harmony with its newly supported features. The new designation we have chosen is BPMNTraceISM (Business Process Model and Notation Traces Information System Models). BPMNTraceISM is an instantiation of the new version of the improved meta-model and forms a single unified model that combines the usage of UML elements, including the use case diagram and class diagram elements, as well as the BPMN elements. Thus, this diagram is now able to design elements and relationships of both UML use case and class diagrams, as well as BPMN models, concurrently. Moreover, it specifies the traceability information of the interrelated artefacts.

Each artefact in a BPMNTraceISM diagram has its specific notation. Some of them retain their original notation (BPMN or UML notations), while the others have a new representation, which does not differ greatly from the BPMN and UML notations.

Table 4: Graphical notations of overlapping elements of the BPMNTraceISM diagram (the icons themselves are not reproducible here). The elements covered are: use case association, signal event, extends relationship, exclusive gateway, includes relationship, parallel gateway, annotation flow, inclusive gateway, start event, data input, end event, data output, manual task, data store, normal task, sequence flow, error event, group, cancel event, entity class, control class, generalization, signal class, aggregation association, view class, composition association, exception class, and directed association.

4.2.1 BPMNTraceISM artefacts that conserve their initial notations

The mappings on which we base the definition of the integrated trace meta-model comprise neither all the BPMN concepts nor all the UML concepts. This is due to the fact that some BPMN artefacts do not have corresponding UML artefacts, and vice versa. For example, the mapping does not define any UML concept representing a BPMN start event. Even so, in a BPMNTraceISM diagram, it is possible to specify UML artefacts with no corresponding elements in BPMN. According to the mapping, many UML elements may be mapped to one BPMN element. Thus, the representation of these elements in UML diagrams requires grouping them. On the other hand, one UML element may be linked to many BPMN elements. For example, a data store in the BPMN diagram is transformed into (i) an association, (ii) an entity class, and (iii) an operation of a class, in the class diagram. In this situation, it is very difficult to represent the mapped elements in one unifying element.
At the meta-model level, we have proposed associating each pair of these mapped concepts by an association instead of defining new traceability meta-classes. At the model level, these artefacts are processed similarly to the non-mapped concepts and retain their original notations in the BPMNTraceISM diagram. Table 4 outlines the graphical notations of the core artefacts of the BPMNTraceISM diagram that retain the initial notations. OUPackage and OUActor are new meta-classes defined by [1] to represent traceability links of BPMN and UML use case diagram elements. In the integrated trace meta-model, we did not reuse these meta-classes to define new associations. Thus, the instantiation of these meta-classes keeps the notations provided in [1].

Figure 3: Traceability of the BPMN meta-model and the UML class diagram meta-model

4.2.2 UCsF notation

In the previous version of the BPMNTraceISM diagram (the BPSUC diagram), [1] states that a UCsF is a specialization of a use case and inherits its properties. Therefore, the graphical notation of UCsFs extends the graphical notation of a UML use case. Moreover, a UCsF has a composition relationship to a BPMN Fragment. To represent this trace link graphically, [1] defines a compartment that incorporates the corresponding BPMN fragment.

In our integrated trace meta-model, we have defined composition relationships from a UCsF to some UML class diagram artefacts (cf. Figure 3). Indeed, a UCsF should encapsulate the classes, associations, and packages which correspond to its supported fragment. Accordingly, we propose to update the graphical notation of the UCsF. Thus, a UCsF should act as a complex symbol that describes concurrently BPMN elements and UML class diagram elements. In order to represent explicitly the different elements incorporated by a UCsF, the use case notation needs to be extended. Therefore, we adjust the UCsF notation by adding another compartment (cf. Figure 4) to encapsulate the class diagram elements representing the components (classes, associations, and packages) of the supported fragment. In order to avoid the complexity of this element, the designer can choose to hide or show each compartment. Figure 4 depicts the graphical notation of a UCsF in which all compartments are hidden.

Figure 4: UCsF notation

4.3 Change propagation improvement
Our traceability method aims to ensure the coevolution of the separated models when a change occurs either in the source models (BPMN model, use case model, and/or class diagram) or in the BPMNTraceISM diagram (cf. Figure 5). To do this, we have improved the transformation model defined in [1] by including the class diagram concepts in the bi-directional transformation rules defined in [1] as two sets of transformation models (forward and backward transformation rules). They ensure the transformation between the BPMNTraceISM diagram, the BPMN model, and the UML models that include a class diagram and a use case diagram, using a semantic mapping between BPMN, BPMNTraceISM, and UML elements derived from our integrated trace meta-model.

Each transformation model includes a well-defined transformation program, or set of transformation rules, $T_{ab}$ (with $T_{ab}$ conforming to $MM_t$) that transforms source models $M_a$ conforming to source meta-models $MM_a$ (noted $M_a/MM_a$) into target models $M_b$ conforming to target meta-models $MM_b$ (noted $M_b/MM_b$), according to a mapping between the source and target model artefacts (noted $map(M_a, M_b)$). Formally, we specify transformation models according to a function that we call $MTransF$, written as follows:

$MTransF\big(M_a/MM_a,\ map(M_a, M_b)\big) \xrightarrow{T_{ab}/MM_t} M_b/MM_b$   (1)

For example, consider the forward transformation rule R1 that transforms a UML package and a non-empty BPMN lane into an OU-Package in the BPMNTraceISM diagram. This transformation ensures that elements in the source models are correctly mapped to their counterparts in the BPMNTraceISM diagram, facilitating traceability across both business and software models. The proposed bi-directional transformation models (backward and forward) ensure the coevolution of the BPMN and UML models, as well as the coevolution of the source models (the business model specified by a BPMN diagram and the software models specified by a UML class diagram and a UML use case diagram) and the BPMNTraceISM diagram. Formally, the forward and backward transformation model is specified as follows:

$MTransF\big(M_a/MM_a,\ map(M_a, M_b)\big) \xleftrightarrow{forwardRules,\ backwardRules} M_b/MM_b$   (2)

The rest of this section will be devoted to providing more details on how we created the bidirectional transformation rules.

4.3.1 Forward transformation rules

We propose a forward transformation model (forward rules) to produce automatically a BPMNTraceISM diagram ($M_{BPMNTraceISM}$) conforming to our integrated trace meta-model ($MM_{BPMNTraceISM}$) from the source models, namely a BPMN diagram ($M_{BPMN}$) conforming to the BPMN meta-model ($MM_{BPMN}$), a use case model ($M_{UCM}$), and a class diagram ($M_{CD}$) conforming to the UML meta-model ($MM_{UML}$). This transformation is carried out based on mappings between the new diagram and the BPMN and UML models. The formal definition of our forward transformation rules is as follows:

$MTransF\big(M_{BPMN}/MM_{BPMN},\ M_{UCM}/MM_{UML},\ M_{CD}/MM_{UML},\ map(M_{UCM}, M_{BPMNTraceISM}),\ map(M_{CD}, M_{BPMNTraceISM}),\ map(M_{BPMN}, M_{BPMNTraceISM})\big) \xrightarrow{forwardRules} M_{BPMNTraceISM}/MM_{BPMNTraceISM}$   (3)

There are two possible scenarios for producing the BPMNTraceISM elements based on the forward transformation rules.

The first scenario consists of applying a forward transformation rule ($R_X$) to derive trace modeling elements ($tre$) represented in the BPMNTraceISM diagram from a BPMN element ($M_{BPMN}!Element$) and a UML element ($M_{UML}!Element$). More precisely, an OUActor, an OUPackage, and a UCsF of the BPMNTraceISM diagram are generated from BPMN and UML elements. Formally, these transformation rules are as follows:

$MTransF_{tre}\big((M_{BPMN}!Element,\ M_{UML}!Element),\ map(M_{BPMN}, M_{BPMNTraceISM}),\ map(M_{UML}, M_{BPMNTraceISM})\big) \xrightarrow{R_X} M_{BPMNTraceISM}!tre$   (4)

For instance, suppose that a forward transformation rule R1 produces an OU-Package from a UML package and a BPMN non-empty lane. Formally, this rule can be written as follows:

$MTransF_{OUPackage}\big((M_{BPMN}!Lane,\ M_{UCM}!Package),\ map(M_{BPMN}, M_{BPMNTraceISM}),\ map(M_{UCM}, M_{BPMNTraceISM})\big) \xrightarrow{R_1} M_{BPMNTraceISM}!OUPackage$   (5)
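To make this first scenario more concrete, the sketch below shows what a rule in the spirit of R1 could look like in ATL, the language in which our transformation rules are expressed (see Section 5). It is only an illustrative sketch under assumed names: the metamodel identifiers (BPMN, UML, BPMNTraceISM), the childLaneSet feature used to detect a non-empty lane, and the matching of the pair by name are assumptions made for the example, not the exact identifiers of our prototype.

-- Illustrative sketch of forward rule R1 (assumed metamodel and feature names).
-- A UML package and a non-empty BPMN lane carrying the same name are matched
-- together and give rise to one OUPackage trace element in the BPMNTraceISM model.
rule PackageAndLane2OUPackage {
    from
        p : UML!Package,
        l : BPMN!Lane (
            -- keep only non-empty lanes (lanes that contain sub-lanes)
            not l.childLaneSet.oclIsUndefined() and l.name = p.name
        )
    to
        ou : BPMNTraceISM!OUPackage (
            name <- p.name
        )
}

ATL matches the combinations of the source pattern elements and keeps only the pairs that satisfy the guard, which corresponds to the pairwise mapping of overlapping artefacts expressed by rule (5).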
The second scenario consists of generating unrelated elements ($ure$) from either the UML models or the BPMN model only. Indeed, each concept in the BPMNTraceISM diagram that corresponds to a concept in a single source model (the BPMN model, the UML class diagram, or the UML use case diagram) needs only its original model. In this case, the input of the transformation rule is either a BPMN model, if the concept comes from BPMN, or a UML model, if its origin is the UML class diagram or the UML use case diagram. For example, the generation of a manual task in the BPMNTraceISM diagram requires the BPMN model only, because a manual task does not have corresponding elements in the UML diagrams. This transformation rule is written as follows:

$MTransF_{ure}\big(M_{BPMN}!manualTask,\ map(M_{BPMN}, M_{BPMNTraceISM})\big) \xrightarrow{R_{ManualTask}} M_{BPMNTraceISM}!manualTask$   (6)

Let us illustrate how a change in the BPMN model (e.g., adding a new lane) is propagated into the BPMNTraceISM diagram. Assume that rule R1 is applied to generate an OU-Package from the UML package and the BPMN lane. The transformation process involves the following steps:
1. The BPMN lane and UML package elements are identified.
2. Rule R1 is triggered, creating a corresponding OU-Package in the BPMNTraceISM diagram.
3. The OU-Package is updated within the BPMNTraceISM diagram, and the changes are synchronized with the BPMN and UML models.

4.3.2 Backward transformation rules

To obtain the opposite direction of the forward transformation rules, we have defined a backward transformation model. This means that the source elements of each forward transformation rule become the target elements of a backward transformation rule, and its target elements become the source elements of the backward transformation rule. Formally, the backward transformation rules are written as follows:

$MTransF\big(M_{BPMNTraceISM}/MM_{BPMNTraceISM},\ map(M_{BPMNTraceISM}, M_{UCM}),\ map(M_{BPMNTraceISM}, M_{CD}),\ map(M_{BPMNTraceISM}, M_{BPMN})\big) \xrightarrow{backwardRules} M_{BPMN}/MM_{BPMN},\ M_{UCM}/MM_{UML},\ M_{CD}/MM_{UML}$   (7)

We use the same logic as in the forward transformation rules to define the reverse transformation rules. Therefore, each backward transformation rule for a pair of artefacts is defined according to the following formula:

$MTransF_{tre}\big(M_{BPMNTraceISM}!tre,\ map(M_{BPMNTraceISM}, M_{BPMN}),\ map(M_{BPMNTraceISM}, M_{UML})\big) \xrightarrow{R_X} (M_{BPMN}!Element,\ M_{UML}!Element)$   (8)

Non-overlapping artefact transformation rules are defined according to the formula below:

$MTransF_{ure}\big(M_{BPMNTraceISM}!ure,\ map(M_{BPMNTraceISM}, M_{BPMN}),\ map(M_{BPMNTraceISM}, M_{UML})\big) \xrightarrow{R_X} M_{BPMN}!Element\ \text{or}\ M_{UML}!Element$   (9)

For example, when a change occurs in the BPMNTraceISM diagram, such as adding a new OU-Actor, the corresponding elements in the BPMN and UML models need to be updated. The backward transformation rule ensures that:
1. The OU-Actor is mapped to both the UML actor in the use case diagram and a new BPMN lane in the BPMN model.
2. The changes are propagated back into the source models, maintaining alignment across the models.

4.3.3 Change propagation process

The bidirectional transformation rules allow propagating changes that occur in the source models into the target models. By applying these rules, this approach enables the coevolution of the business and software models. The change propagation process is carried out in two ways (cf. Figure 5): (1) by manually updating the source models (the BPMN model, the UML class diagram, and the UML use case diagram), or (2) by designing the BPMNTraceISM diagram.

In the first case, software designers and business analysts separately and concurrently update the BPMN model and, consequently, the use case diagram and/or the class diagram. For example, a software designer adds a new use case to the use case model and new classes responsible for realizing the new use case, and simultaneously, a business analyst changes the name of a lane in the BPMN model. A direct generation of the software models leads to the loss of the changes made by the software designers. Additionally, to avoid unintentional updates, the impact of changes involved in a (business or UML) model needs to be analysed before propagating it to the target model. To tackle this problem, an intermediate step is required to gather all updates made in the separate models. This step can be reached by executing our forward model (user task "Execute forward transformation rules"), which derives a BPMNTraceISM diagram from both UML and BPMN. Thus, all changes made on the BPMN and/or on the class and use case diagrams are considered in the derived BPMNTraceISM diagram.

In the second case, all updates made by business analysts and software designers are done in the unified trace model (the BPMNTraceISM diagram) instead of being made in the BPMs and the ISMs. Using BPMNTraceISM overcomes the gap between the business analysts and software designers, and enables them to work together using the same model. Indeed, this diagram covers all business and software model elements and the traceability concepts of pairs of mapped artefacts. Any change involving a BPMNTraceISM element (bp) leads to the modification of the BPMN and/or UML model elements traced by bp. For example, the insertion of a new OU-Actor in the BPMNTraceISM diagram leads to the insertion of a new UML actor in the UML use case diagram and a new BPMN lane in the BPMN model.
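As an illustration of this backward propagation, the following ATL-style sketch regenerates a UML actor and a BPMN lane from an OUActor of the BPMNTraceISM diagram; here again, the metamodel and feature names are assumptions made for the example rather than the identifiers of our implementation.

-- Illustrative sketch of a backward rule (assumed metamodel and feature names).
-- One OUActor trace element is mapped back to two target elements: a UML actor
-- in the use case model and a lane in the BPMN model, both reusing its name.
rule OUActor2ActorAndLane {
    from
        oua : BPMNTraceISM!OUActor
    to
        a : UML!Actor (
            name <- oua.name
        ),
        l : BPMN!Lane (
            name <- oua.name
        )
}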
BPMNTraceISM can act as a gateway allowing business analysts and software designers to work together to test, analyse, and correct inconsistencies due to unwanted updates before propagating them to the source models (BPMN and UML models). In addition, this new diagram can be used to analyse and estimate the impact of changes made to business or system components or services. Up to this step, although the BPMNTraceISM diagram is aware of the updates made by both business analysts and software designers, the source models are aware neither of the BPMNTraceISM diagram nor of each other. Accordingly, propagating the modifications is an essential step to ensure the coevolution of the source models. We can do this easily by running our backward transformation model (user task "Execute backward transformation rules"). Once the backward transformation model is run, the changes are propagated to the BPMN and UML models, and thus these models are aligned with each other.

Figure 5: Synchronization process of BPMN and UML

5 Implementation

To use the proposed traceability approach, we implement a visual editor called Business Process model Traced with Information System Models (BPTraceISM). Moreover, we develop a prototype called Business Process to Information System Models (BP2ISM) that provides significant practical support for the transformations involved in our traceability method. This prototype automates the suggested forward and backward transformation models between the business process and the ISMs on the one hand, and the BPMNTraceISM diagram on the other. These transformation models are automatically applied via transformation rules expressed in the ATL transformation language.

5.1 Visual editor implementation

To implement the BPTraceISM editor, we have used Eclipse EMF to implement the trace meta-model and Eclipse GMF to design the concrete syntax of the BPMNTraceISM diagram. Indeed, the modeling tool includes a graphical editor that conforms to the trace meta-model and enables concurrently viewing and managing trace relationships between the BPMN model, the use case diagram, and the class diagram. BPTraceISM can be integrated within other modeling tools to enhance their modeling capabilities. To make our modeling tool available in any Eclipse environment without needing to start an Eclipse runtime, we implement it as an Eclipse plug-in.

Figure 6: The environment of the BPTraceISM editor

The construction process of BPTraceISM consists of two main phases: (1) the definition of the modeling tool, and (2) the definition of the plug-in that supports it. The first phase begins with the implementation of the trace meta-model using the Ecore meta-modeling language. Then we build a toolbox for creating instances of the meta-model classes. In the second phase, we develop a feature that supports the modeling tool. Afterward, we construct an update site to ensure the portability of our plug-in and allow its installation via any Eclipse update manager.
The BPTraceISM environment is composed of four main parts (cf. Figure 6): the project explorer containing an EMF project that includes BPMNTraceISM diagrams (part a), the modeling space (part b), the toolbox containing the graphical elements of a BPMNTraceISM diagram (part c), and the properties tab to edit the properties of an element selected in the modeling space (part d).

Figure 7 outlines a simple example of a BPMNTraceISM diagram created using the editor. The modeling space contains an OUActor called Supplier associated with a UCsF called Manage purchase order. In the business compartment of the UCsF Manage purchase order, we have a user task called Accept purchase order. In the class diagram compartment, we have four classes linked via undirected associations. Each class has a name and a stereotype. The boundary class Manage purchase order contains an operation called acceptPurchaseOrder().

Figure 7: Example of a BPMNTraceISM diagram within the modeling tool

5.2 Prototype for the transformation models

BP2ISM is implemented within the Eclipse Modeling Framework (EMF) environment. It includes two components:
– BPISM2BPMNTrISM: it automates the forward transformation, which is the conversion of BPMN and UML models into BPMNTraceISM.
– BPMNTrISM2BPISM: it automates the backward transformations.

The transformation process requires tools, editors, or plugins in order to specify the source and target models. For this reason, tools are required to represent the BPMN, UML, and BPMNTraceISM diagrams, which serve as the source and target models of the BP2ISM components. Because BPMN and UML are widely used standards, many plugins and tools have been created and certified to support them. We choose to employ internal plugins within EMF instead of existing plugins. As a result, we develop BPMN models with the Eclipse BPMN2 modeler plugin and UML use case and class diagrams with the UML Designer plugin. The internal meta-models in these plugins closely adhere to OMG requirements. We incorporate these meta-models into the EMF environment for use in the execution of our prototype components. In addition, we integrate the trace meta-model to visualize (backward transformation) or design (forward transformation) BPMNTraceISM diagrams. We built the transformation rule sets in the Atlas Transformation Language (ATL), which is provided as an internal EMF plugin.

BPISM2BPMNTrISM takes three files as input: (1) a file with the extension ".bpmn" that must conform to the BPMN 2.0 meta-model, and (2) two files with the extension ".uml" that must conform to the UML meta-model and contain the use case model and the class diagram. It generates as output a BPMNTraceISM diagram with the extension ".BPMNTraceISM".

BPMNTrISM2BPISM implements the backward transformations, i.e., the transformation rules from a BPMNTraceISM diagram into BPMN and UML models. It takes as input a BPMNTraceISM diagram with the extension ".BPMNTraceISM". It generates as output three files: (1) a file with ".bpmn" as extension, which conforms to the BPMN 2.0 meta-model, and (2) two files with the extension ".uml" that include the generated use case model and the class diagram.
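As an indicative sketch of how these two components can be declared in ATL (each module lives in its own file; the model and metamodel names are assumptions used for illustration, not necessarily those of our prototype), their headers might look as follows:

-- Forward component: BPMN + UML (use case model and class diagram) -> BPMNTraceISM.
-- Assumed model and metamodel names; one module per .atl file.
module BPISM2BPMNTrISM;
create OUT : BPMNTraceISM from INBPMN : BPMN, INUML : UML;

-- Backward component: BPMNTraceISM -> BPMN + UML.
module BPMNTrISM2BPISM;
create OUTBPMN : BPMN, OUTUML : UML from IN : BPMNTraceISM;

When a module is run, the ATL engine is given a concrete file for each declared model, which corresponds to the ".bpmn", ".uml", and ".BPMNTraceISM" files described above.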
6 Case study

We take a common business process model for online purchasing and selling to demonstrate the viability of our traceability method. The model is specified using BPMN 2.0 (cf. Figure 8) [23].

Figure 8: The online purchasing and selling business process specified in BPMN 2.0

This business process begins when a customer selects a product to purchase and adds it to the basket, resulting in the creation of an online purchase order and the transmission of the order to the vendor. The customer has the option to cancel the purchase order before entering their personal information. Otherwise, they must fill in their personal information and submit an online purchase order to the stock management. When an online purchase order is received, the stock manager checks the warehouse for the availability of the ordered items to see if there are enough products to fulfil the order. If not, the restocking procedure is initiated to reorder raw materials and create the ordered products based on the supplier's catalogue. The restocking procedure can be performed as many times as necessary within the same business process instance. An extreme scenario occurs when raw materials are unavailable. If all items are available, sales validate the purchase order, generate an invoice, and begin collecting and packaging products for shipment. When sales receive payment and store the delivered order, the procedure is complete. Purchase order cancellation requests, however, can be made before the purchase order is verified. As a result, sales proceed with purchase order cancellation and a penalty charge to the buyer. In [23], the authors decompose the BPMN model of the case study into nine fragments (F1–F9) (cf. Figure 8) based on their fragment definition (see [23] for further explanation). By applying their transformation rules, the approach from [23] allows generating the use case diagram and the class diagram from the case study BPMN model, which is taken as the input model.

The online purchasing and selling BPMN model, use case model, and class diagram presented in [23] can be combined and designed in a single unified model, namely the BPMNTraceISM diagram, by using the BPTraceISM editor. We would like to highlight that this diagram can be created manually by designers or automatically by running the BPISM2BPMNTrISM component. Figure 9 depicts the resulting BPMNTraceISM diagram and shows how each fragment and its corresponding use case are merged and expressed as a UCsF. For example, we combine fragment F1 with the use case "Manage preparing purchase order" to form the UCsF "Manage preparing purchase order". Each UCsF displays the traced BPMN elements and the corresponding class diagram elements. For each UCsF, the elements of the BPMN model are represented in the BPMN compartment, while the corresponding class diagram elements are represented in the class diagram compartment. In Figure 9, these compartments are hidden in the UCsF Cancel purchase order, while the BPMN compartment is visible in all the other UCsFs.

In the UCsF Receive payment, the BPMN compartment contains a service task called receive payment, a data object called invoice, and a data output called purchase order [paid]. These elements are the BPMN elements of fragment F8. Moreover, the class diagram compartment of the UCsF Archive purchase order is displayed and contains the class diagram elements, such as the classes VArchivePurchaseOrder, CArchivePurchaseOrder, paid, archived, operations, attributes, etc., corresponding to fragment F10. Furthermore, an OUActor specifies each actor and the corresponding empty lane. For example, the actor Stock manager and the empty lane Stock manager map to the OUActor Stock manager.

Assume that business analysts and system designers collaborate on the BPMNTraceISM diagram and update the business and system functionalities accordingly. Suppose they delete the UCsF Manage preparing purchase order and the OUActor Customer from the BPMNTraceISM diagram. The UCsF Manage preparing purchase order is traced to the elements of fragment F1, the UML use case Manage preparing purchase order, as well as all class diagram elements derived from F1. By deleting this UCsF, all its components are also removed from the BPMNTraceISM diagram. Then, the change involved in the BPMNTraceISM diagram is propagated to the source models by executing the BPMNTrISM2BPISM tool. The output of this component is a BPMN model without the pool Customer or the fragment F1, a UML use case model that contains neither the use case Manage preparing purchase order nor the actor Customer, and a class diagram without the elements corresponding to F1.

7 Evaluation results

7.1 Comparison with existing approaches
To evaluate the effectiveness of our traceability method, we compare it with existing traceability approaches based on defined evaluation criteria. These criteria include: (i) the proposal of a traceability approach at both the meta-model and model levels, (ii) explicit representation of relationship types between elements, (iii) graphical notation for trace links, and (iv) the consideration of both business and information system (IS) models.

Figure 9: Online purchasing and selling in the BPMNTraceISM diagram

Table 5 presents the results of this comparison, with rows listing the methods studied and columns representing the evaluation criteria. Cells are coded to show the extent to which each criterion is satisfied: "N" indicates a criterion that is not addressed, "P" represents partial satisfaction, and "Y" stands for full satisfaction. Based on this comparison, we conclude that our approach is the only one that meets all the evaluation criteria. Specifically, [17] and [20] are the only other works that consider both business and IS modeling, but they fall short in addressing the full range of traceability needs compared to our method. Furthermore, only a few approaches, such as those of [15], [18], and [20], take into account the functional and static views of IS models.

Our method is unique in that it does not require any extensions and works seamlessly with standard UML and BPMN tools, making it more adaptable and accessible. Additionally, our method provides rich software modeling-level artefacts, incorporating both static views (class diagrams) and functional views (use case models). The class diagrams are designed in accordance with the MVC pattern, simplifying the prototyping process for developers.

When focusing on traceability, our method stands out because it provides traceability at both the meta-model and model levels, ensuring a unified view of BPMN and UML elements. In contrast, many approaches specify traceability only at the meta-model level, without providing a visual tool for combined model use. Additionally, our method introduces a graphical visualization for traceability links, enabling users to more easily trace and align elements across models. This graphical representation reduces analysis time, simplifies development, and minimizes the risk of misalignment.

Moreover, our method's assessment methodology is more comprehensive than most others in the field, which typically rely on simple case studies. We provide a fully implemented prototype of the transformation approach and an Eclipse plug-in for the traceability process, demonstrating the practicality and feasibility of our contributions through a relevant case study.

7.2 Shortcomings of our contribution

Despite the strengths of our traceability method, there are some limitations that need to be addressed. One notable drawback is that our evaluation was based on a single use case, which may not be sufficient to fully assess the accuracy and robustness of the method. To address this, we are conducting ongoing evaluations using more complex case studies, which will allow us to better validate the method's performance in different contexts. This extended evaluation will help ensure that the method is robust and adaptable across various scenarios, enhancing its overall credibility.

Additionally, our current transformation approach relies on forward and backward transformation rules, which require the recreation of all components, even if they have not been affected by changes. This process can lead to inefficiencies, especially when working with large or complex models. To overcome this issue, we plan to develop incremental transformations that will update only the components directly impacted by changes. This will improve efficiency and minimize unnecessary recalculations, ensuring faster and more resource-efficient updates.
Table 5: Comparison of our contribution with approaches based on the external traceability practice

Approach | Business field | Software field: functional | Software field: static | Construction model level: meta | Construction model level: model | Trace links | Graphic notation | BPM and ISM | Assessment methodology
[11] | N | P | P | Y | N | N | N | N | CS
[13] | N | CS | | Y | N | N | N | N | CS
[14] | N | N | N | Y | N | N | N | N | tool
[15] | P | P | P | Y | N | N | N | N |
[16] | RM | CS | | Y | N | N | N | | CS
[17] | BPMN | N | N | Y | N | N | N | N | CS
[18] | P | P | P | Y | Y | N | N | Y | N
[19] | RM | N | N | Y | N | Y | N | N | N
[20] | N | N | CD | Y | N | N | N | N | N
[21] | BPMN | N | N | Y | Y | N | Y | N | T
Our contribution | BPMN2 | UC | CCD | Y | Y | Y | Y | Y | CS & T

Legend: Y: Yes; N: No; CS: case study; RM: requirement model; T: tool/editor; CS: complex systems; CCD: conception class diagram; UC: use case diagram.
While we have explored the use of traceability information to keep BPMN and UML models aligned, the analysis process is still manual. As a result, we aim to improve this by investigating the development of heuristics that could automatically detect modifications in the source models and suggest necessary adjustments to the corresponding elements. These heuristics would dynamically support developers, making the process of maintaining alignment between the models more efficient and less error-prone.

To address the limitations mentioned above, we propose a comprehensive roadmap for future improvements. This will involve extending the evaluation process with more complex case studies, enhancing the transformation approach to support incremental updates, and automating diagram analysis. In addition, the implementation of heuristic-based tools will enable the automatic detection of changes across models, improving traceability and ensuring consistency without requiring manual intervention. These advancements will significantly strengthen the method's capabilities, making it more robust and easier to apply in practical scenarios.

8 Conclusion

The work conducted in this paper fits within the context of model-based development of ISMs, their alignment and their coevolution with BPMs. Indeed, we have used integration and model transformation methodologies to define a traceability method oriented towards the development of (meta) model-based solutions, purposely influenced by the Object Management Group (OMG) specifications. Particular attention is paid to the BPMN and UML use case and class diagram models. Our traceability method acts at both the meta-model and the model levels. Hence, (1) we first defined a unified trace meta-model that includes all the BPMN and the UML elements (use case and class diagram) and traceability links between interrelated elements. (2) Then, we defined an integrated model that conforms to the proposed trace meta-model. We defined it as a new diagram named BPMNTraceISM (BPMN Traces Information System Models). This diagram serves many purposes: it promotes collaboration between business and software designers and allows them to work together using one single unified model. The joint representation of both BPM and ISM elements enables users to drill down and easily trace any business artefact to its corresponding software artefacts. (3) Finally, we defined a set of bidirectional model transformation rules between the BPMN and UML models, as well as the BPMNTraceISM diagram.

The rules are useful when a change propagation-based co-evolution is required to synchronize models after changes. To prove the feasibility of our traceability method in practice, we developed a modeling tool in the form of a plugin that can be integrated into the Eclipse platform. This tool is named BPTraceISM (Business Process model Trace with Information System Models) and allows designing and handling BPMNTraceISM diagrams in accordance with the proposed integrated trace meta-model. Additionally, we specified the set of bidirectional transformation rules using the ATL language and we implemented them as components of the BPM2ISM prototype. Furthermore, we applied the proposed approaches to a typical case study.

In future research, we look forward to optimizing our editor to support traceability and synchronization between BPMN models and other UML diagrams.
References

[1] Bouzidi, A., Haddar, N., Haddar, K. (2019). Traceability and Synchronization Between BPMN and UML Use Case Models, Ingénierie des Systèmes d'Information, Vol. 24, No. 2, pp. 215-228. https://doi.org/10.18280/isi.240214.

[2] OMG UML Specification (2017). OMG Unified Modeling Language (OMG UML), Superstructure, V2, Object Management Group, Vol. 70.

[3] OMG BPMN Specification. Business Process Model and Notation. Available at: http://www.bpmn.org/. Accessed: 2023-01-31.

[4] Driss, M., Aljehani, A., Boulila, W., Ghandorh, H., Al-Sarem, M. (2020). Servicing your requirements: An FCA and RCA-driven approach for semantic web services composition, IEEE Access, Vol. 8, pp. 59326-59339. https://doi.org/10.1109/ACCESS.2020.2982592.

[5] Ghiffari, K. A., Fariqi, H., Rahmatullah, M. D., Zulfikarsyah, M. R., Evendi, M. R. S., Fathoni, T. A., Raharjana, I. K. (2023). BPMN2 user story: Web application for generating user stories from BPMN, In AIP Conference Proceedings, AIP Publishing LLC, Vol. 2554, No. 1, pp. 040003. https://doi.org/10.1063/5.0103685.

[6] Raharjana, I. K., Aprillya, V., Zaman, B., Justitia, A., Fauzi, S. S. M. (2021). Enhancing software feature extraction results using sentiment analysis to aid requirements reuse, Computers, Vol. 10, No. 3, pp. 36. https://doi.org/10.3390/computers10030036.

[7] Khlif, W., Elleuch, N., Alotabi, E., Ben-Abdallah, H. (2018). Designing BP-IS Aligned Models: An MDA-based Transformation Methodology. https://doi.org/10.5220/0006704302580266.

[8] Kharmoum, N., Retal, S., Rhazali, Y., Ziti, S., Omary, F. (2021). A Disciplined Method to Generate UML2 Communication Diagrams Automatically From the Business Value Model, In Advancements in Model-Driven Architecture in Software Engineering, IGI Global, pp. 218-237. https://doi.org/10.4018/978-1-7998-3661-2.ch012.

[9] Rahmoune, Y., Chaoui, A. (2022). Automatic Bridge Between BPMN Models and UML Activity Diagrams Based on Graph Transformation, Computer Science, Vol. 23, No. 3. https://doi.org/10.7494/csci.2022.23.3.4356.

[10] Ivanchikj, A., Serbout, S., Pautasso, C. (2020). From Text to Visual BPMN Process Models: Design and Evaluation, In Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 229-239. https://doi.org/10.1145/3365438.3410990.

[11] Mills, C., Escobar-Avila, J., Haiduc, S. (2018). Automatic Traceability Maintenance via Machine Learning Classification, In 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 369-380. https://doi.org/10.1109/ICSME.2018.00045.

[12] Al-Hroob, A., Imam, A. T., Al-Heisa, R. (2018). The Use of Artificial Neural Networks for Extracting Actions and Actors from Requirements Documents, Information and Software Technology, Vol. 101, pp. 1-15. https://doi.org/10.1016/j.infsof.2018.04.010.

[13] Min, H. S. (2016). Traceability Guideline for Software Requirements and UML Design, International Journal of Software Engineering and Knowledge Engineering, Vol. 26, No. 1, pp. 87-113. https://doi.org/10.1142/S0218194016500054.

[14] Eyl, M., Reichmann, C., Müller-Glaser, K. (2017). Traceability in a Fine-Grained Software Configuration Management System, In Software Quality: Complexity and Challenges of Software Engineering in Emerging Technologies, 9th International Conference, SWQD 2017, Vienna, Austria, January 17-20, 2017, Springer International Publishing, pp. 15-29. https://doi.org/10.1007/978-3-319-49421-0_2.

[15] Khelladi, D. E., Kretschmer, R., Egyed, A. (2018). Change Propagation-Based and Composition-Based Co-Evolution of Transformations with Evolving Meta-Models, In Proceedings of the 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, pp. 404-414. https://doi.org/10.1145/3239372.3239380.

[16] de Carvalho, E. A., Gomes, J. O., Jatobá, A., da Silva, M. F., de Carvalho, P. V. R. (2021). Employing Resilience Engineering in Eliciting Software Requirements for Complex Systems: Experiments with the Functional Resonance Analysis Method (FRAM), Cognition, Technology and Work, Vol. 23, pp. 65-83. https://doi.org/10.1007/s10111-019-00620-0.

[17] Lopez-Arredondo, L. P., Perez, C. B., Villavicencio-Navarro, J., Mercado, K. E., Encinas, M., Inzunza-Mejia, P. (2020). Reengineering of the Software Development Process in a Technology Services Company, Business Process Management Journal, Vol. 26, No. 2, pp. 655-674. https://doi.org/10.1108/BPMJ-06-2018-0155.

[18] Moreira, J. R. P., Maciel, R. S. P. (2017). Towards a Models Traceability and Synchronization Approach of an Enterprise Architecture, In SEKE, pp. 24-29. https://doi.org/10.1109/CBI.2019.00028.

[19] Guo, J., Cheng, J., Cleland-Huang, J. (2017). Semantically Enhanced Software Traceability Using Deep Learning Techniques, In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 3-14. https://doi.org/10.1109/ICSE.2017.9.

[20] Swathine, K., Sumathi, N., Nadu, T. (2017). Study on Requirement Engineering and Traceability Techniques in Software Artefacts, International Journal of Innovative Research in Computer and Communication Engineering, Vol. 5, No. 1. https://doi.org/10.1109/ICSRS.2017.8272863.

[21] Pavalkis, S., Nemuraite, L., Milevičienė, E. (2011). Towards Traceability Meta-Model for Business Process Modeling Notation, In Conference on e-Business, e-Services and e-Society, Springer, Berlin, Heidelberg, pp. 177-188. https://doi.org/10.1007/978-3-642-27260-8_14.

[22] Bouzidi, A., Haddar, N., Abdallah, M. B., Haddar, K. (2018). Alignment of Business Processes and Requirements Through Model Integration, In 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), pp. 1-8, IEEE. https://doi.org/10.1109/AICCSA.2018.8612870.

[23] Bouzidi, A., Haddar, N. Z., Ben-Abdallah, M., Haddar, K. (2020). Toward the Alignment and Traceability Between Business Process and Software Models, In ICEIS, Vol. 23. https://doi.org/10.5220/0009004607010708.
https://doi.org/10.31449/inf.v49i16.6934 Informatica 49 (2025) 115–136 115
A Review of Machine Learning Techniques in the Medical Domain
Enas M.F. El Houby
Department of Systems and Information, National Research Centre, Giza, Egypt
E-mail: em.fahmy@nrc.sci.eg, enas_mfahmy@yahoo.com
Keywords: active learning, curriculum learning, deep learning, federated learning, medical, transfer learning
Received: August 19, 2024
We have witnessed a rapid exponential growth of all types of data in all domains specifically in the medical
domain. The utilization of machine learning techniques has made significant strides across various
domains, with deep learning achieving notable success in recent years. Lately, deep learning has gained
increasing attention in the medical field. While deep learning excels at automatically learning
discriminative features from raw data, it is still challenging to achieve high performance without a huge
amount of data and some handcrafted steps. To address these challenges, deep learning has been combined with other new trends and with domain knowledge to enhance its capabilities and improve performance to meet the ever-growing needs. Transfer learning utilizes knowledge from natural images, curriculum learning integrates domain-specific knowledge, active learning selects the most informative samples to reduce reliance on labeled data, and federated learning enables collaborative training across organizations while ensuring data privacy. In this review paper, these new trends incorporated with deep learning are investigated and presented through applications in the medical domain, by reviewing articles that applied these trends and were published in highly reputable journals in the ScienceDirect database in recent years.
Povzetek: V pregledni študiji so predstavljeni sodobni trendi strojnega učenja v medicini, kot so
transferno, aktivno in federativno učenje na podlagi učnih načrtov, ki v kombinaciji z globokim učenjem
izboljšujejo diagnostiko, personalizacijo zdravljenja in varnost podatkov.
1 Introduction

Recently, we have witnessed the growth of all types of data in all domains. Medical data specifically has grown dramatically in the last few years due to the exponential increase of knowledge in the medical domain. Medical data can be found in various forms such as clinical and biomedical data. Biomedical data contains data related to genomics, drug discovery, and biomedicine. Clinical data contains patient records such as patients' medical history, laboratory investigations, and image data from magnetic resonance imaging (MRI), ultrasound (US), X-rays, and computerized tomography (CT) scans. Clinical data exists in two forms, structured and unstructured. The structured format includes the disease history and living habits of the patients, while unstructured clinical data includes items such as doctors' investigation records and the conversations between doctors and patients [1-3]. Therefore, this rapidly growing volume of medical data requires advanced methods for analysis.

Applying artificial intelligence (AI) in the medical domain comprises a promising technology for different healthcare providers. These technologies, particularly data mining, help extract hidden patterns and insights from large datasets using machine learning techniques (MLTs). Traditional MLTs include Artificial Neural Networks (ANN), Decision Trees (DT), Support Vector Machines (SVM), and many other techniques. Machine learning techniques are usually categorized into supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning, labeled data is available; therefore, the model can be trained using this manually tagged data to extract patterns. When there is no labeled training data, unsupervised techniques are employed. They group similar entities in the same cluster, and each cluster demonstrates a relation between these grouped entities. Semi-supervised learning depends on a set of hand-crafted extraction patterns and a few tagged instances as initial seeds of the target relation to start the training. The training output is used as the training input for the following generation, and the process of learning is repeated for many generations. Reinforcement learning is based on evaluative feedback, so it can automatically perform goal-oriented learning and process decision-making problems [4, 5].

Deep learning is an advanced form of artificial neural network (ANN), with a larger number of layers than a conventional ANN model, used to automatically learn the features from the data, which makes more refined predictions possible. In numerous recent medical image classification tasks, convolutional neural networks (CNNs), which are a kind of deep learning network particularized in image analysis, were utilized and achieved high performance. The success of CNNs in the classification of medical images has motivated researchers to utilize pre-trained models in building new ones.
These high-performing CNN pre-trained models have been utilized for different image classification tasks by employing the transfer learning (TL) approach. Pre-trained CNN models utilize features that were learned from a specific domain to fine-tune any other data. They can be utilized as-is to classify new images, or to extract features using the output from the layer previous to the output layer and introduce it to another classifier [6].

However, many challenges face the application of machine learning techniques generally, and in the medical domain specifically, such as: I) The limitations of available datasets for training the models, because collecting and labeling the data is a labor-intensive and expensive task, especially in the case of medical image data such as ultrasound imaging (US), CT, and MRI. The annotation of data includes the segmentation annotations of abnormality regions and classification labels such as normal, benign, and malignant. Also, that limitation may result from the scarcity of some diseases for which it is difficult to obtain enough positive cases. II) The low quality of some data is another major challenge, where some of the data can be found unlabeled, inconsistent, inaccurate, or in an unstructured format—such as handwritten notes, radiology reports, and conversations between doctors and patients—which are difficult for machine learning algorithms to process effectively. In the case of medical image modalities, there may be variations in image resolution and quality. III) The shortage of explanations of the pathological basis, such as the diagnosis reasons, where the techniques depend only on the differences between the normal and patient cases. For healthcare professionals to trust and act on ML-generated results, it is essential to understand how these models arrive at their predictions. IV) Ethical and regulatory concerns play a crucial role, where the healthcare industry is tightly regulated, and machine learning models must comply with stringent standards to ensure patient privacy, data security, and model safety. Furthermore, any biases in the data could lead to unequal or unfair treatment recommendations, making fairness an ongoing concern in the application of ML in healthcare [7-10].

Despite these challenges, machine learning presents a wealth of opportunities that can significantly improve healthcare outcomes, such as: I) Diagnostic accuracy and speed, where ML algorithms, particularly deep learning models, have demonstrated remarkable success in automating and enhancing diagnostic processes, especially in medical imaging. For instance, ML models can analyze radiographs, MRI scans, and other images to identify abnormalities such as tumors or lesions with a level of precision that often rivals or exceeds that of human experts. This capability can lead to earlier detection, which is critical for improving patient outcomes, particularly in cancer and cardiovascular diseases. II) Personalized medicine: by analyzing large datasets, including patient demographics, genetic information, and medical history, machine learning can help tailor treatments to individual patients, optimizing therapeutic interventions based on their unique characteristics. III) Predictive analytics is another powerful opportunity that ML offers. By analyzing trends in patient data, machine learning models can predict disease progression, forecast complications in chronic conditions, and identify high-risk patients who may benefit from earlier interventions. IV) Automation is another key opportunity in healthcare, with ML models capable of automating routine tasks such as image analysis, patient triaging, and administrative work. This allows healthcare providers to focus more on direct patient care, improving overall efficiency. V) Drug discovery, by identifying promising drug candidates and predicting their behavior in the human body, which can reduce the time and cost associated with bringing new medications to market [7-10].

In response to the mentioned challenges, recent research has shifted towards using advanced techniques such as deep learning with some incorporated techniques and domain knowledge, like transfer learning, which provides deep learning with information from natural images. Curriculum learning integrates domain knowledge through training patterns of the processed task. Active learning explores the most informative samples and retrieves them from an unlabeled pool to fulfill better performance with less labeled data. Federated learning allows many organizations to collaborate on deep learning without sharing clients' data or devices, which provides efficient data access and security and an improvement of the learning model utilizing a large decentralized dataset. The purpose of this research is to illustrate the new trends of machine learning in the medical domain. The selected articles that are reviewed show these new trends in the medical domain using different medical dataset types, including medical images, tabular datasets, genes, etc., in different tasks. The remainder of this research is organized as follows: Section 2 illustrates the different types of medical data. Section 3 presents some data preprocessing steps. Section 4 presents the new trends of MLTs. Section 5 describes the search methodology for articles that apply the mentioned new trends of MLTs in the medical domain. Section 6 presents some of the applications of new trends of MLTs in the medical domain. Section 7 presents the conclusion and some of the recommended points for future work.

2 Types of medical data

Medical data can be found in different forms such as arrays of numerical data, images, sequences of DNA, amino acids, etc. For developing any ML model, the data is split into three parts: training, validation, and testing. The training part is used to learn and tune the parameters of the model, the validation part is used to stop overfitting, and the test part is used to assess the performance of the model. In the next subsections, a brief overview of different medical data forms will be presented.

2.1 Numerical data

Different diseases' related data are found as arrays of lab tests, which is numerical data. These numerical datasets can be used to manage the related diseases, such as the datasets available on the UCI machine learning repository [11].
Most numerical data are available in table form, such as Excel sheets or database tables, where rows represent samples from patients and columns represent different features that describe the intended diseases, or vice versa. A huge number of numerical datasets are available, such as the patient demographics of some diseases like COVID-19, and the lab results for different diseases such as thyroid, heart disease, dermatology, cancer, etc.

2.2 Microarray gene expression data

Microarray techniques provide a platform for measuring the expression levels of thousands of genes in various conditions. A microarray is composed of a small glass slide or membrane that contains samples of many genes arranged in a regular pattern. It is used to find genes associated with specific diseases by analyzing and finding the differences between two mRNA sets, where one set is from normal cells and the other set includes cells from pathological tissues such as cancer cells. Microarray data contains a lot of redundant genes, and many genes include inappropriate information for the accurate classification of diseases. Thus, the analysis of the large amount of data generated by this technology is not an easy task for biologists [12]. Figure 1 shows a cDNA microarray spotted on a glass surface, while in Figure 2 the general structure of the microarray is illustrated, which is represented as an array of numerical values. Cancer gene expression datasets for leukemia, lung, prostate, etc. can be found in [13].

Figure 1: cDNA microarray spotted on a glass surface.
https://www.cell.com/fulltext/S0960-9822%2898%2970103-4

Figure 2: General structure of microarray.

2.3 Image modalities

Information obtained from medical imaging modalities is clinically beneficial in many applications like computer-aided detection, diagnosis, and treatment planning. Many imaging modalities can be used to check abnormalities in different body organs. They include radiation-based modalities such as CT and X-rays, as well as US and MRI, and they are categorized according to the method of producing images. They help radiologists to recognize abnormal regions. The interpretation of different image modalities needs expertise, and it is operator dependent. Therefore, the process of reading image modalities is exhausting, costly, and prone to error.

Ultrasound (US) is a suitable modality for tumor detection. It can estimate the size of the tumor and distinguish abnormalities, although its capability of detecting contra-lateral malignant lesions is limited [14].

Magnetic resonance imaging (MRI) produces images based on the response of hydrogen atoms to radio waves and magnetic fields. MRI images are valuable as they present physiology and anatomy. MRI images the target organ and prepares it as thin slices; moreover, it provides information about the vascularity of the tissue [15].

Computed tomography (CT) scanners display better image clarity using multiple X-ray sources and detectors [15]. Radiation X-ray generated images are 2-dimensional images. Fluoroscopy units show real-time moving images produced by X-ray exposure; angiography is a widespread usage of fluoroscopy, imaging blood flow in vessels [15].

Digital Mammography (DM) is an X-ray imaging modality that is specialized for breast tissue. DM is the most common and most important screening method in clinical practice. It can detect tumors before they develop further and become easily detected and felt by the physician [16].

Microscopic images are the images that are captured by the microscope to enlarge small scanned objects and extract fine details that cannot be obtained otherwise [17]. Figure 3 shows samples of different image modalities for different body organs.

2.4 DNA and protein sequences

The fast growth of sequencing has resulted in huge numbers of DNA and protein sequences. Sequences can be used to predict diseases associated with a given DNA or protein sequence. DNA is a long polymer chain of units named nucleotides; it exists in a double helical shape as shown in Figure 4. There are 4 types of nucleotides, which are A (adenine), C (cytosine), G (guanine), and T (thymine); they are considered the alphabet of DNA. They are arranged into 3-letter sequences called codons. The double-stranded helical structure of DNA is complementary, where "G" is chemically paired with "C", and "A" with "T" within the replication of DNA [18].
(a) X-ray of Lung [19] (b) DM of breast [20] (c) Microscopic blood image [21]
Figure 3: Samples of different image modalities for different body organs.
Amino acids are linked into linear chains to produce proteins. The properties of proteins are defined by the composition of their amino acids. The triplets of consecutive DNA nucleotides, which are called codons, are responsible for forming the amino acid sequence in a protein. There are 4³ = 64 possible codons formed from the 4 letters [22], which is more than 3 times larger than the number of amino acids, which is 20; 3 codons represent stop codons and one is a start codon, while the remaining codons are responsible for generating the 20 amino acids. So, it is possible that more than one codon maps to the same amino acid [18]. Figure 5 shows the transcription of a DNA sequence into molecules of mRNA, and then the translation of the transcribed mRNA into the associated chain of amino acid sequence, which later folds into fully functional proteins.

Single nucleotide polymorphisms (SNPs) are the most common human genetic variations, appearing as mutations or insertions/deletions (indels). If an SNP changes the codon triplet without changing the encoded amino acid, it is synonymous (sSNP) and the gene is not mutated. Otherwise, it is non-synonymous (nsSNP), as it changes the codon while the encoded amino acid is changed into a different amino acid; these are called missense mutations, which are the reason for many diseases [23, 24]. Figure 6 shows single nucleotide polymorphisms (SNPs).
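As a small illustration of codon-to-amino-acid mapping and codon degeneracy (ours, not from the reviewed articles; only a partial codon table is used), the following sketch translates a short DNA sequence codon by codon:

```python
# A minimal, hypothetical sketch illustrating codon translation and degeneracy:
# several different codons map to the same amino acid.
# Only a small part of the standard codon table is included here.
CODON_TABLE = {
    "ATG": "M",                          # start codon (methionine)
    "TTT": "F", "TTC": "F",              # phenylalanine (two codons, one amino acid)
    "GGT": "G", "GGC": "G",
    "GGA": "G", "GGG": "G",              # glycine (four codons, one amino acid)
    "TAA": "*", "TAG": "*", "TGA": "*",  # the three stop codons
}

def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # "X": codon not in this partial table
        if amino_acid == "*":            # a stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGTTTGGATAA"))         # -> "MFG"
```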
Figure 4: Chain of DNA sequence.
http://acer.disl.org/news/2016/08/17/tool-talk-gene-sequencing/
Figure 5: The process of translation from DNA sequence to the associated amino acid sequence.
https://courses.lumenlearning.com/suny-ap1/chapter/3-4-protein-synthesis/
Figure 6: Single nucleotide polymorphisms (SNPs).
https://isogg.org/wiki/Single-nucleotide_polymorphism
3 Data preprocessing

The knowledge discovery process includes 3 main phases, which are the preprocessing phase, the data mining phase, and the post-processing phase. Data pre-processing is a crucial phase of knowledge discovery to build an accurate machine learning model. In the preprocessing phase, a set of data preprocessing steps are performed (cleaning the data from noise, handling missing values, merging appropriate data from different databases, normalizing the data, extracting features, and selecting the most informative features) to prepare the data for the data mining phase. Datasets can also be small, so the relevant features may not have been captured, and thus data augmentation is performed by applying different data augmentation techniques. Data mining, which is the core phase in knowledge discovery, is performed by applying MLTs. The preprocessing facilitates the application of the MLTs to extract important patterns or correlations. In the post-processing phase, the discovered knowledge is refined and improved, then interpreted into meaningful knowledge for the user's presentation [25].

Feature selection is a key preprocessing step that should be highlighted when comparing deep learning with the traditional MLTs, so it will be tackled in more detail in the next subsection.

3.1 Features selection

Feature selection is the process of finding the optimal feature subset that is strongly distinguishing among different classes. The purpose of this process is the reduction of the dataset and the elimination of redundant and irrelevant features that impact the classification process negatively. Feature selection is a combinatorial optimization problem whose aim is to select the feature subset with the least number of features that achieves the highest possible classification accuracy. It is one of the data preprocessing steps for pattern recognition and data mining, specifically when working on high-dimensional datasets [26, 27].

Feature selection has 2 main approaches: the filter and the wrapper. In the filter approach, the feature selection is based on statistical individual feature ranking. It is easily implemented, but it eliminates the interaction among features and does not rely on the ML algorithm applied to the selected features. Whereas, for the wrapper approach, the feature selection depends on the outcome of the ML algorithm to decide how favorable the feature subset is. Candidate solutions of feature subsets are iteratively generated and their characteristics are assessed by the applied ML algorithm [28].

The wrapper-based feature selection approach evaluates the quality of feature subsets using the learning algorithm. Thus, it can determine and discard irrelevant and redundant features effectively. As the learning algorithm is frequently used in the search process, high computational time is required, especially when the datasets are large. On the other hand, hybrid methods aim to utilize the advantages of both approaches: the computational efficiency of the filter approach and the high performance of the wrapper approach [29].

Feature selection algorithms based on heuristic search methods are needed, as the computation over a huge number of features is not feasible. Many meta-heuristic approaches have been used for feature selection; among these algorithms are the nature-inspired algorithms such as the genetic algorithm (GA), firefly [30, 31] and ant colony optimization (ACO) [32, 33].
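To illustrate the filter approach in a few lines (an illustrative sketch, not taken from the reviewed articles; the scikit-learn breast cancer dataset is used here only as a stand-in for medical tabular data), each feature is scored individually by mutual information with the class label and only the top-k features are kept:

```python
# Minimal, hypothetical sketch of filter-based feature selection:
# features are ranked individually by mutual information with the label,
# independently of any classifier, and only the top-k features are kept.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

# Compare a traditional MLT (SVM) with and without the filter step.
print("all features :", cross_val_score(SVC(), X, y, cv=5).mean())
print("top-10 filter:", cross_val_score(SVC(), X_selected, y, cv=5).mean())
```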
4 New trends of machine learning techniques

After data preprocessing, various Machine Learning Techniques (MLTs) are applied to uncover hidden patterns and correlations in the data. As mentioned earlier, disease-related data, often represented through numerical lab tests, microarray data, medical imaging, and genetic sequences, can be processed to predict disease presence or other related tasks. Traditional MLTs like Support Vector Machines (SVM), Decision Trees (DT), and K-Nearest Neighbors (KNN) are effective across these data types for decision-making. However, Artificial Neural Networks (ANNs), which mimic the brain's neural structure, are increasingly used for more complex tasks and in various domains, especially the medical domain. ANNs consist of input, hidden, and output layers, and are trained through techniques like backpropagation. Their success across domains, especially healthcare, has led to the development of Deep Learning (DL), a more advanced form of ANN. This section explores the role of deep learning in modern machine learning.
Figure 7: The general structure of Deep Neural Network.

4.1 Deep learning (DL)

Deep Learning (DL) models have gained prominence due to their ability to automatically extract complex patterns from data, eliminating the need for manual feature engineering. However, DL models require large datasets, making them particularly suited for high-dimensional data, such as in medical fields, where they can uncover intricate structures through multiple intermediate layers. The depth of a DL model—referring to the number of hidden layers—enables it to learn complex mappings between input and output. Unlike shallow networks, which struggle with intricate data patterns, deeper networks excel at learning these relationships [34, 35]. Figure 7 shows the general structure of Deep Neural Networks (DNNs).

There are several deep learning algorithms such as Convolutional Neural Networks (CNN), radial basis function networks, deep belief networks, autoencoders, and Recurrent Neural Networks (RNN) [35, 36].

Deep learning depends on hyperparameters such as the activation function, learning rate, batch size, number of epochs, optimizer, dropout rate, etc. Different deep learning algorithms, like RNNs and CNNs, also have additional specific hyperparameters. Adjusting these hyperparameters is critical, as their values significantly affect the model's behavior. Finding the optimal combination of hyperparameters can be an exhaustive task, requiring substantial computational resources and time [37, 38].

The performance of a DL model heavily depends on the selection of these hyperparameters, particularly in complex domains like medical data analysis. Medical data often have high dimensionality, noise, and imbalanced class distributions, making hyperparameter optimization crucial to enhancing model performance. Careful selection improves robustness and generalizability, ensuring reliability in real-world clinical settings. While methods like grid search and random search are widely used, more advanced techniques, such as Bayesian optimization, offer significant advantages [37, 38].

Common techniques for hyperparameter optimization:

1. Grid search: Exhaustively searches across all possible hyperparameter combinations. While it guarantees to find the best parameter set within the grid, it can be computationally expensive, especially for models with many hyperparameters.

2. Random search: A more computationally efficient approach, randomly selecting combinations of hyperparameters from specified ranges. It often achieves comparable or better results than grid search in fewer trials.

3. Bayesian optimization: An advanced method that builds a probabilistic model of the objective function. It predicts the best hyperparameters based on past performance, guiding the search toward the most promising regions with fewer trials. Libraries like Optuna and Hyperopt can implement Bayesian optimization efficiently.

For example, a CNN can be used to classify medical images like X-rays or MRI scans. Random search can explore different values for hyperparameters (e.g., learning rate, batch size, number of layers). Alternatively, Bayesian optimization can be used for a more efficient search, predicting the most promising hyperparameter configurations based on prior evaluations. By optimizing the model's parameters using these methods, we can improve classification accuracy, reduce overfitting, and ensure the model performs well on unseen medical data.
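To make the Bayesian-optimization option concrete, the sketch below (ours, not from the reviewed articles) uses Optuna to tune a small fully connected network on the scikit-learn breast cancer dataset as a stand-in for medical tabular data; the hyperparameter names, ranges, and number of trials are illustrative assumptions:

```python
# Minimal, hypothetical sketch of Bayesian-style hyperparameter optimization
# with Optuna: each trial samples a hyperparameter configuration, trains a
# small neural network, and reports cross-validated accuracy.
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Hyperparameter ranges are illustrative assumptions.
    hidden = trial.suggest_int("hidden_units", 16, 128)
    lr = trial.suggest_float("learning_rate_init", 1e-4, 1e-1, log=True)
    alpha = trial.suggest_float("alpha", 1e-6, 1e-2, log=True)
    model = MLPClassifier(hidden_layer_sizes=(hidden,),
                          learning_rate_init=lr,
                          alpha=alpha,
                          max_iter=300)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("Best hyperparameters:", study.best_params)
print("Best CV accuracy   :", study.best_value)
```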
Advantages and Limitations of DNNs for Medical Data

Advantages:

• Versatility: DNNs can be adapted to work with various data types, including structured clinical data (e.g., patient demographics, lab results), unstructured data (e.g., free-text medical records), and image data.

• Feature Learning: DNNs can automatically learn relevant features from the data, making them more flexible than traditional machine learning algorithms that rely on feature engineering.

Limitations:

• Training complexity: Training deep neural networks can be computationally expensive and time-consuming. Additionally, DNNs require large datasets to avoid overfitting.

• Overfitting: If not carefully tuned, DNNs can overfit to small or imbalanced datasets, a common issue in medical data where datasets may not be as large or diverse as needed for training.

Use Cases in Medicine: DNNs have been applied to a variety of tasks in medicine, including predicting patient outcomes, disease progression modeling, and disease classification from images and clinical data.

CNNs and RNNs are two of the most common and promising deep learning algorithms used in medical applications. These algorithms have demonstrated success in a variety of tasks, such as medical image classification and time-series data analysis. Further details on these algorithms will be discussed in the next subsections.

4.1.1 Recurrent neural networks (RNN)

Recurrent neural networks (RNNs) are neural networks that contain memories that can capture the information stored in the prior elements of a given sequence. Therefore, RNNs are suitable for processing sequential data types such as the diagnostic history of patients, DNA and protein sequences, etc., where the information is remembered through the network. An RNN is called recurrent because it executes the same task for each element of the input sequence while its output is based on the prior computations (memory). Thus, the decision of the recurrent net at time t-1 affects the decision that will be taken later at time t. Therefore, an RNN has two sources of input, the recent past and the present, which are combined to define the response to new data. Figure 8 shows the architecture of the RNN, in which a set of input x values are mapped into a sequence of output o values. A loss L measures the difference between the expected output o and the actual output y [35].

Here $x$, $h$, $o$, $L$, and $y$ denote the input, hidden state, output, loss, and target value. A weight matrix $U$ defines the input-to-hidden connection, a weight matrix $W$ defines the hidden-to-hidden connection, and a weight matrix $V$ defines the hidden-to-output connection. Then, from time step $t = 1$ through time step $t = n$, the following equations are used:

$a_t = b + W h_{t-1} + U x_t$  (1)

$h_t = \tanh(a_t)$  (2)

$o_t = c + V h_t$  (3)

$\hat{y}_t = \mathrm{softmax}(o_t)$  (4)

The forward propagation of the RNN is defined by the preceding equations, where $b$ and $c$ are the bias vectors, while tanh and softmax are the activation functions. To update the weight matrices $U$, $V$, and $W$, we compute the gradient of the loss function for each weight matrix. Gradient computation requires both forward and backward propagation of the network. Any loss function can be used depending on the goal. At each time step, the sum of all losses is the total loss for a particular sequence of x values.

However, traditional RNNs suffer from gradient exploding and gradient vanishing issues, making them unsuitable for long-term dependencies. On the other hand, long short-term memory (LSTM) is effective in capturing long-term time dependence. LSTM networks address this by introducing gating mechanisms that control the memory flow, allowing for better long-term sequence learning. Gated Recurrent Units (GRUs) offer a simplified version of LSTMs with similar benefits but fewer parameters.

Advantages and limitations of RNNs for medical data

Advantages:

• Sequential data processing: RNNs can handle different types of sequential data, including time series and text, where the past medical history or a series of clinical events influence future outcomes.

• Memory of past inputs: RNNs can remember information from previous time steps in the sequence, allowing them to capture temporal dependencies in data. This is particularly useful for tracking disease progression over time or analyzing patient histories.

Limitations:

• Training difficulties: RNNs are prone to the vanishing gradient problem, especially in long sequences, making them harder to train effectively.

• Data complexity: RNNs are best suited for data where the relationship between input and output is sequential. For static data like images or tabular data, CNNs or DNNs might be more appropriate.
• Resource intensive: Training RNNs, especially on
long sequences, can be computationally expensive.
Figure 8: The architecture of recurrent neural network.
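As a concrete rendering of the forward pass in equations (1)–(4) (an illustrative NumPy sketch, not from the reviewed articles; the dimensions and random weights are assumptions), a single-layer RNN can be run over a short sequence as follows:

```python
# Hypothetical NumPy sketch of the RNN forward pass in equations (1)-(4):
# a_t = b + W h_{t-1} + U x_t,  h_t = tanh(a_t),  o_t = c + V h_t,  y_hat_t = softmax(o_t)
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, n_steps = 4, 8, 3, 5   # assumed sizes

U = rng.normal(size=(hidden_dim, input_dim))    # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden weights
V = rng.normal(size=(output_dim, hidden_dim))   # hidden-to-output weights
b = np.zeros(hidden_dim)                        # hidden bias
c = np.zeros(output_dim)                        # output bias

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

x_sequence = rng.normal(size=(n_steps, input_dim))  # e.g., a short clinical time series
h = np.zeros(hidden_dim)                            # initial hidden state h_0
for t, x_t in enumerate(x_sequence, start=1):
    a = b + W @ h + U @ x_t        # equation (1)
    h = np.tanh(a)                 # equation (2)
    o = c + V @ h                  # equation (3)
    y_hat = softmax(o)             # equation (4)
    print(f"t={t}, predicted class probabilities: {np.round(y_hat, 3)}")
```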
Use cases in medicine: RNNs (and their variants like LSTMs) are commonly used in medical applications such as gene sequence classification, predicting disease progression over time, and analyzing time-series medical signals (e.g., ECG readings) [35].

4.1.2 Convolutional Neural Network

CNNs are a type of deep learning network specialized for image analysis. Unlike traditional MLTs that rely on manual feature extraction, CNNs can automatically learn hierarchical features from raw image data. This is especially useful in the medical field, where CNNs are applied to analyze medical images for tasks like disease detection and classification [36, 39].

A CNN contains an input, an output and many hidden layers which represent the convolutional network. The convolutional network includes three types of layers: convolutional, activation, and pooling. The convolutional layers apply filters to detect features (edges, textures, etc.). As the image proceeds through the layers, the filters can detect more sophisticated features. An activation function like the Rectified Linear Unit (ReLU) follows the convolution layer to control the output; it introduces non-linearity. Pooling layers reduce the dimensionality of the data, making the model more computationally efficient and less sensitive to minor positional changes in the features. The final layer is fully connected, producing predictions for classification tasks. The overall number of network parameters is defined by the number of layers, the number of neurons in each layer, and the connections between neurons. The weights should be tuned through the training phase to achieve good performance [40].

The convnet processes the image $I$ using a matrix of weights called filters, which can recognize certain features at specific positions. At a specific layer $l$, the feature map at position $(i, j)$ is defined as $h^l_{ij}$, the bias as $b^l$, and the weights as $W^l$. The feature map can be expressed as follows:

$h^l_{ij} = \mathrm{ReLU}((W^l * I)_{ij} + b^l)$  (5)

where ReLU is the activation function which controls the output. The basic structure of the CNN is shown in Figure 9.
which represent convolutional networks. Convolutional traditional methods.
network includes three types of layers: convolutional,
activation, and pooling. The convolutional layers apply • Spatial relationships: The convolutional layers
filters to detect features (edges, textures, etc.). As the can detect local patterns (e.g., edges, textures) in
image proceeds through layers, the filters can detect more images, which are crucial for tasks like tumor
sophisticated features. The activation function like detection or organ segmentation.
Rectified Linear Unit (ReLU) follows the convolution
layer to control the output, it introduces non-linearity. • Efficiency: CNNs are computationally efficient
Pooling layers reduce the dimensionality of the data, due to shared weights in convolution layers,
making the model more computationally efficient and less allowing them to process large datasets more
sensitive to minor positional changes in the features. The effectively.
final layer is fully connected, producing predictions for
classification tasks. The overall number of network Limitations:
parameters is defined by the number of layers, the number
of neurons in each layer, and the connection between • Data requirements: CNNs require large labeled
neurons. The weights should be tuned through the training datasets to perform well, which may not always
phase to achieve good performance [40]. be available in medical settings.
convnet processes the image (I) using a matrix of • Limited to spatial data: While CNNs excel in
weights called filters which can recognize certain features image-based data, they are not as effective for
non-spatial data like time-series or sequential data.

Use cases in medicine: CNNs have been widely applied in diagnostic tasks such as detecting cancers, classifying lesions, and analyzing radiological images (e.g., X-rays, MRIs, CT scans) [35, 36, 41].

Recent advances in CNNs, like AlexNet [42], VGGNet [43], GoogLeNet [44], and ResNet [45], have significantly improved image classification accuracy, with models now outperforming human experts in some cases. These networks have been trained on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) using millions of annotated images [46, 47], and their success has spurred the rise of transfer learning, where pre-trained models are fine-tuned for specific tasks [48]. The next subsections give a brief description of some of the high-performance pre-trained models trained on ImageNet.

4.1.2.1 Visual geometry group network VGG 16–19

The VGG16 network is the winning model architecture of the 2014 ImageNet competition. VGG consists of 16–19 layers. The size of the input image to VGG is (224×224). VGG has a set of convolutional filters with small sizes (3×3) to capture the information of the up/down and left/right center. The size of the pre-trained weights is 528 MB. The overall number of parameters of VGG16 is 138 357 544 parameters [43].

4.1.2.2 InceptionV3 model architecture

The InceptionV3 network is the winning model architecture of the 2015 ImageNet competition. The InceptionV3 model has a total of 48 layers. The size of the input image to InceptionV3 is (299×299). It is deeper than VGG16 but with fewer parameters. The size of the pre-trained weights is 92 MB. It has 23 851 784 parameters [44].

4.1.2.3 Residual neural network (ResNet)

The ResNet network is the winning model architecture of the 2016 ImageNet competition. ResNet-50 contains a 50-layer architecture. The size of the input image to ResNet is (224×224). The size of the pre-trained weights is 99 MB. It has 25 636 712 parameters [45].
Figure 9: The basic structure of CNN.
Figure 10: Transfer learning architecture.
4.2 Transfer learning

Transfer learning is a more appropriate approach when the available data for training is limited. In transfer learning, an intricate model can be trained using available large-scale annotated images such as natural images. Therefore, the TL process transfers knowledge from a source domain (e.g. natural images) to a target domain or network where the domain images are limited. Only the small amount of available annotated data of the target domain is used to tune the model. Where the fundamental features used for classification are similar between domains, retraining the entire model is unnecessary. In such cases, TL allows for the transfer of learned features, with only the classification layer(s) being retrained on the small new dataset [48]. TL leverages pre-trained models such as VGG [43], NasNetLarge [22], Inception GoogLeNet [44], ResNet [45], etc., that have been developed for image classification and presented at the annual ILSVRC [46, 47]. TL saves a great amount of time otherwise lost in developing and training CNN models. The pre-trained model, or the required part of the model, can be incorporated directly into the new model and used as a classifier, standalone feature extractor, integrated feature extractor, or weight initializer [48, 49]. Figure 10 shows the transfer learning architecture.
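A minimal Keras sketch of this idea (illustrative only; the dataset, image size, and class count are assumptions): the VGG16 convolutional base pre-trained on ImageNet is frozen and only a new classification head is trained on the small target dataset.

```python
# Hypothetical transfer-learning sketch: reuse VGG16 weights learned on
# ImageNet (natural images) and retrain only a new classification head
# on a small medical image dataset.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False            # freeze the pre-trained convolutional base

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),   # e.g., benign vs. malignant (assumed)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # small target-domain dataset
```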
4.3 Curriculum learning

In the standard educational method, learning depends on a curriculum that presents new concepts based on previously acquired ones. The rationale behind this is that people learn better if the information is introduced in a meaningful order instead of randomly. By using the same ideas to train neural networks starting with simple cases, it was noticed that the networks perform better, which indicates the significance of gradual and systematic learning [50].

The curriculum learning (CL) approach is motivated by the capacity of humans to pick up new tasks fast with finite "training sets". Similarly, the training procedure of medical students, called teacher-student curriculum learning, is based on training by tasks with gradually growing difficulty, while each task uses smaller datasets than those utilized in machine learning. For example, students can start with a simple task, such as deciding if an image includes lesions, and later are asked to determine if the lesions are malignant or benign, which is a more complicated task. With time, they will progress to a more complex task, like recognizing the subtypes of lesions [8].

In machine learning, CL works with a series of training samples sorted in increasing order according to learning difficulty. The order in which the samples are introduced to the model is critical, as it can significantly impact the model's performance. Curriculum learning is an active area of research, particularly in applications such as medical image diagnosis [8].

A key point in CL is the design of data schedulers that control the sequence in which training samples are fed into the model. These schedulers can use a variety of methods to determine sample difficulty, such as expert input, heuristics, or natural language processing (NLP) applied to radiology reports.

Given a sample x_i which should be assigned to a class label C_i ∈ {C1, C2, …, Cm}, suppose the training set consists of pairs {X, C}, and the training is processed in batches of size B for a total of E epochs. To train a CNN with CL, it is preferable to start the training with simpler samples. Practically, CL is performed by assigning a probability to every training pair, where the simpler samples are given higher probabilities to be chosen first. Initially, every sample x_i is assigned a probability p_i(0). At the beginning of each epoch e, the training set {X, C} is permuted to {X, C}k by the reordering function F(e), where this mapping is produced by sampling the training set based on the probabilities at the present epoch p_i(e). After executing many iterations, these probabilities are updated using a scheduler, aiming to achieve a regular distribution by the end of the training process [50].
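The probability-based sampling just described can be sketched as follows (an illustrative NumPy example, not taken from [50]; the difficulty scores and the linear schedule toward a uniform distribution are assumptions):

```python
# Hypothetical curriculum-learning scheduler: each sample i gets a selection
# probability p_i(e); easy samples start with higher probability, and the
# distribution is annealed toward uniform as the epochs progress.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_epochs, batch_size = 100, 10, 16

difficulty = rng.random(n_samples)          # assumed difficulty score in [0, 1]
p_initial = 1.0 - difficulty                # p_i(0): easier samples are more likely
p_initial = p_initial / p_initial.sum()
p_uniform = np.full(n_samples, 1.0 / n_samples)

for epoch in range(n_epochs):
    # Scheduler: interpolate from the easy-biased distribution toward the
    # uniform distribution by the end of training.
    mix = epoch / (n_epochs - 1)
    p_epoch = (1.0 - mix) * p_initial + mix * p_uniform   # p_i(e)

    # Reordering function F(e): sample the epoch's training order from p_i(e).
    order = rng.choice(n_samples, size=n_samples, replace=False, p=p_epoch)
    first_batch = order[:batch_size]
    print(f"epoch {epoch}: mean difficulty of first batch = "
          f"{difficulty[first_batch].mean():.2f}")
```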
4.4 Active learning (optimal experimental design)

Supervised learning techniques rely heavily on annotated data. Although more datasets are becoming available, the effort, cost, and time required to annotate them remain significant. On the other side, any error, especially in some important applications such as those in the medical domain, can have severe consequences. Achieving reliable outcomes often requires an interactive process where predictions are reviewed or modified by an oracle or user. This means users must be able to override and adjust automated predictions to meet specific criteria. Techniques such as Active Learning (AL), or what is called Human-in-the-Loop computing, have witnessed progress in overcoming these challenges [51].

Active learning is a semi-supervised learning approach that begins with a small set of labeled samples (seed samples) and iteratively selects the most informative samples from a pool of unlabeled data for annotation. By focusing training on the most informative subset of samples, AL improves model performance and reduces the annotation burden, particularly for image data. In AL, an MLT scans unlabeled data and recognizes the most informative samples. These samples are then presented to a human annotator (oracle) for labeling. This makes AL a part of the Human-in-the-Loop paradigm, where only selected samples are used for training, often far fewer than in traditional supervised learning [51].

Formally, suppose that U is an available big pool of unannotated data and that there are oracles to request annotations for any unannotated sample xU to be added to the annotated set L. The goal is to train a model f(x | L∗) using the annotated set L∗ ⊆ L. A brute-force solution would involve requesting the oracle(s) to annotate each sample xU, resulting in L∗ = L. However, this is a costly and not practical solution. Theoretically, there is an optimal subset L∗ of data that can achieve performance equivalent to that obtained using the whole annotated dataset L, i.e. f(x | L∗) ≈ f(x | L). AL is a trend of ML that tries to explore this optimal subset L∗, where the current model is f´(x | L´) and L´ is an intermediate annotated set. AL intends to iteratively
explore the most informative data samples x*_i to train the model, assuming that the unannotated data samples and the model will evolve through time, rather than choosing a constant subset of samples once for training.

The selection of samples to be annotated is based on the informativeness of these requested samples. The evaluation of the informativeness of each un-annotated data sample xU is done given f´(xU | L´), then all selected samples are demanded to be annotated. After the annotations, the new annotated data is used to improve the model. This is done by retraining the whole model using all available annotated data L´, or by using the most recently annotated sample x*_i to fine-tune the network [51].

Active learning typically employs three methods to select samples for annotation:

Stream-based selective sampling supposes the existence of a continuous flow of unannotated data samples xU. In this method, the present model and an informativeness measure I(xU) are the criteria used to specify, for each incoming sample, whether or not to require an annotation from the oracle(s). Thus, while the model is being trained, it is offered a data sample and instantly decides if it needs to query for the label. Although this type of query is inexpensive, its performance is limited because it does not consider the broader context of the underlying distribution, but depends on the separate nature of each decision; therefore the balance between exploration and exploitation is less than in other query kinds.

Membership query synthesis generates the sample x*_G that the model believes to be most informative, rather than selecting from real-world data, and this generated sample is then annotated by the oracle(s). This method may be very effective in bounded domains, but it may struggle when the model has no knowledge of unrepresented areas of the data distribution, similar to stream-based methods.

Pool-based sampling selects N data samples x*_0, ..., x*_N from a large unlabeled dataset U to pull samples from. Pool-based approaches use the present model to do a prediction on un-annotated data samples to get a ranked measure of informativeness for each data sample in the un-annotated data. The highest N informative samples are selected for annotation by the oracle(s). Therefore, the model is initially trained on labeled samples, which are then used to find which data samples would be most informative to be inserted into the training set for the next AL loop. This approach has proved to be the most promising, and it depends on batch-based training. Figure 11 shows the full process of active learning.

AL uses some informativeness measures of unlabeled samples to select the most informative samples. They depend on probabilities; these approaches are least confidence sampling, margin sampling, and entropy sampling. Least confidence sampling selects the sample whose most likely label has the lowest predicted probability. Margin sampling computes the probabilities of the two most likely labels and the difference between them, then considers the sample that has the smallest difference between the first and second most likely labels to be annotated. Entropy sampling uses entropy, as it is a measure of uncertainty, to select a sample to be annotated. Entropy measures the amount of information gained by considering a sample, and so it selects the sample that has the largest entropy value [51].
kinds.
models for the medical domain, large medical data is
Membership query synthesis generates the sample
needed to develop these models. Therefore, many medical
𝑥∗
𝐺 that the model believes to be most informative, rather
researchers illustrated that federated learning is a good
than selecting from real-world data. Therefore, it is
technique to connect different medical organizations and
annotated by the oracle(s). This method may be very
let them share their experiences while keeping privacy.
effective in bounded domains, but it may struggle when the
Furthermore, the performance of the learning model will
model has no knowledge of unrepresented areas of the data
be improved using a large medical dataset. However, the
distribution, similar to stream-based methods.
resulting models may be biased toward organizations that
Pool-based sampling selects N data samples 𝑥∗
0 , . . .
have larger training datasets [53].
, 𝑥∗
𝑁 from a large unlabeled dataset U to pull samples from. In federated learning, the process begins by sending a
Pool-based approaches use the present model to do a global model with unified initial weights to each client. At
prediction on un-annotated data samples to get a ranked each client side, there is a local dataset, where the model is
measure of informativeness for each data sample in the un- trained in each separately. After completing local training,
annotated data. The highest N informative samples are the client sends its model updates back to the server, which
selected for annotation by the oracle(s). Therefore, the aggregates these updates to refine the global model, while
model is initially trained on labeled samples which are then the data at the clients remains local in each client. The
used to find which data samples would be most server has the authority to manage the whole process
informative to be inserted into the training set for the next where it sends the model to the client, collects the updates,
AL loop. This approach has proved to be the most and synchronizes them to build the updated model with the
promising, which depends on batch-based training. Figure new parameters. This method enables medical
11 shows the full process of active learning. organizations to collaborate on training models while
AL uses some informativeness measures of unlabeled maintaining data privacy. There are different federated
samples to select the most informative samples. They learning algorithms according to the computation method
depend on probabilities, these approaches are least of gradients such as federated stochastic gradient descent,
confidence sampling, margin sampling, and entropy federated averaging, and federated learning with dynamic
sampling [51]. regularization. [53, 54]. Figure 12 shows the architecture
Least confidence sampling the model selects the of federated learning.
highest uncertainty sample or least confidence for
annotation and therefore is given to the oracle to be
labeled.
Margin sampling can be utilized in a multi-class, it
uses the first and second most likely labels and computes
Figure 11: The process of active learning.
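To make the three probability-based informativeness measures above concrete, the following minimal Python sketch (illustrative only, not code from the cited works) scores a pool of unlabeled samples with least confidence, margin, and entropy sampling and picks the top-N samples to send to the oracle; the predicted-probability array and the pool size are assumed purely for the example.

import numpy as np

def least_confidence(probs):
    # higher score = more uncertain (1 minus the top predicted probability)
    return 1.0 - probs.max(axis=1)

def margin_score(probs):
    # smaller score = more uncertain (gap between the two most likely labels)
    ordered = np.sort(probs, axis=1)
    return ordered[:, -1] - ordered[:, -2]

def entropy_score(probs, eps=1e-12):
    # higher score = more uncertain (Shannon entropy of the predictive distribution)
    return -(probs * np.log(probs + eps)).sum(axis=1)

# probs: softmax outputs of the current model f'(x | L') on the unlabeled pool (n_pool x n_classes)
probs = np.random.dirichlet([1.0, 1.0, 1.0], size=200)
N = 10  # number of samples requested from the oracle(s) per AL loop
query_lc = np.argsort(-least_confidence(probs))[:N]
query_margin = np.argsort(margin_score(probs))[:N]
query_entropy = np.argsort(-entropy_score(probs))[:N]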
Figure 12: Federated Learning architecture.
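The aggregation step of the federated loop shown in Figure 12 can be illustrated with a small federated-averaging sketch; the per-client weight lists, the local dataset sizes, and the client_update routine mentioned in the comments are illustrative assumptions, not a specific implementation from the cited works.

import numpy as np

def federated_averaging(client_weights, client_sizes):
    # Weighted average of client model weights, weighted by local dataset size (FedAvg).
    # client_weights: one list of numpy arrays per client, all with matching shapes.
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [
        sum((size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# One communication round (sketch):
#   1. the server sends global_weights to every client;
#   2. each client k trains locally, e.g. updated_k = client_update(global_weights, local_data_k);
#   3. the server aggregates: global_weights = federated_averaging([updated_1, ...], [n_1, ...]).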
5 Search methodology

5.1. Search criteria

This research investigates recent trends in machine learning (ML) within the medical domain. To achieve this, we explored the ScienceDirect (Elsevier) database (http://www.sciencedirect.com). The following keywords were used in the search: "active learning", "curriculum learning", "deep learning", "transfer learning" and "federated learning", to investigate the different research that utilizes these recent trends. Additional keywords ("medical", "disease", "cancer" and "gene") were included to focus the search on medical applications that used these new trends. Although the search was intended to retrieve articles related to any disease, "cancer" was added to retrieve more relevant results, given that much of the recent research in ML is focused on cancer. Publications from 2016 to 2024 were considered. The search query used for deep learning-based techniques in the medical domain was composed as:

"Deep learning" AND ("medical" OR "Disease" OR "Cancer" OR "Gene").

Since the aim of this research is to find the new trends in machine learning techniques, which after careful investigation were found to be mostly based on "deep learning", either alone or combined with other new techniques such as "transfer learning", "active learning", "curriculum learning" and "federated learning", the same query was used as for deep learning-based techniques, with the other techniques' keywords added as follows:

("Deep learning" AND "*") AND ("medical" OR "Disease" OR "Cancer" OR "Gene")

where "*" can be replaced by any of the other techniques' keywords ("transfer learning", "active learning", "curriculum learning", and "federated learning").

The following criteria were applied to select the publications: (1) articles related to human diseases (diseases of other organisms are excluded); (2) inclusion of at least one of the new ML techniques; (3) only complete research articles were included (excluding letters, surveys, book chapters, and non-English articles); (4) publications published from 2016 to 2024.

5.2. Data extraction

As the search retrieved a large number of articles, only a subset of the retrieved articles was selected for analysis. Figures 13-15 illustrate the number of publications per year for the various techniques between 2016 and 2024, based on the Elsevier database, to show the growth rates of these new trends.
• Figure 13 shows the steady increase in deep learning publications, from 17 articles in 2016 to 2,958 in 2024, indicating a growing interest in applying deep learning in the medical domain.
• Figure 14 shows that transfer learning started to be applied in the medical domain in 2017, with only 2 articles, and reached 218 in 2024.
• Figure 15 shows that the number of publications on active learning, curriculum learning, and federated learning is limited and scattered across the years, as they are newly emerged trends.
The selected articles were drawn from top journals in ScienceDirect, adhering to the criteria mentioned above. The references provide a sample of the applications of these new ML techniques in the medical domain, rather than an exhaustive list. For each reference, key details such as the task, disease, technique(s) used, evaluation results, and data type are presented.
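As a small reproducibility aid, the query template above can be instantiated programmatically; the Python sketch below simply composes the five search strings described in Section 5.1 (the script itself is illustrative and is not part of the original methodology).

# Compose the ScienceDirect query strings described in Section 5.1
medical_filter = '("medical" OR "Disease" OR "Cancer" OR "Gene")'
other_techniques = ["transfer learning", "active learning",
                    "curriculum learning", "federated learning"]

queries = ['"Deep learning" AND ' + medical_filter]
queries += ['("Deep learning" AND "{}") AND {}'.format(t, medical_filter)
            for t in other_techniques]

for query in queries:
    print(query)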
Figure 13: The number of articles published on deep learning from 2016 to 2024 in the Elsevier database.
Figure 14: The number of articles published on transfer learning from 2016 to 2024 in the Elsevier database.
Figure 15: The number of articles published on active/curriculum/federated learning from 2016 to 2024 in the Elsevier database.
6 Some applications of new trends of MLTs in the medical domain

This section illustrates the selected articles, from those retrieved by searching the databases, that represent applications of the previously discussed emerging ML trends in the medical domain.

Li, X., et al. [55] proposed a DL model to detect lung nodules. First, segmentation and rib suppression were applied to extract the region of interest and enhance the nodules' visibility. Then, a histogram-based enhancement was applied to the images. After that, a patch-based multi-resolution CNN was used for feature extraction, and four fusion methods were employed for classification; the best-performing method for detecting lung nodules achieved an accuracy of more than 99% and an FAUC of 0.982 when applied to the chest X-ray radiographs dataset [56].

El Houby & Yassin [57] developed a CNN model to classify breast mammographic images into nonmalignant or malignant. They used two methods: the first is based on patches of the region of interest (ROI) in the mammogram, and the second is based on the whole breast. The accuracy, specificity, sensitivity, and AUC were 95.3%, 92.6%, 98%, and 0.974 respectively using the MIAS [20] dataset, and 96.52%, 96.49%, 96.55%, and 0.98 using the INbreast [58] dataset.

Dai, Y., et al. [59] developed a deep learning CNN model for detecting coronary artery disease utilizing raw heart sound signals. It extracts 206 multidomain features and 126 medical multidomain features. The heart sound signal datasets have been collected from 400 patients from
the hospital of Xinjiang Medical University. The model achieved an accuracy of 87.86%, sensitivity of 90.67%, specificity of 82.38%, and AUC of 94.70 using the multidomain features, and an accuracy of 85.6%, sensitivity of 88.04%, specificity of 80.83%, and AUC of 92.74 using the medical multidomain features.

Alassafi et al. [60] proposed a model that predicts the distribution of the COVID-19 outbreak in Saudi Arabia, Malaysia, and Morocco. A DL RNN and an LSTM network were developed to predict the number of possible COVID-19 cases. The LSTM achieved an accuracy of 98.58%, while the RNN achieved an accuracy of 93.45%. A comparison was conducted between the number of resulting deaths and the number of coronavirus cases in each of the three countries. The model predicted the number of COVID-19 cases and deaths for the following 7 days and was tested using a public dataset from the European Centre for Disease Prevention and Control [61].

Maiti et al. [62] developed a deep learning (DL)-based framework to automatically detect and segment the optic disc from fundus images for the diagnosis of diabetic retinopathy. The framework utilized an adjusted CNN, experimenting with seven different encoder networks: DenseNet121, InceptionV3, ResNet34, VGG11, VGG19, VGG13, and VGG16. VGG16 was selected as the adopted encoder, while the decoder was designed with a harmonic structure based on that of the encoder to improve segmentation performance. The framework was applied to several fundus image datasets, including DIARETDB1, MESSIDOR, IDRiD, DIARETDB0, CHASE-DB1, DRIVE, and STARE, and achieved an impressive accuracy of 99.44%.

Zareen et al. [63] developed a skin cancer classification deep learning CNN-RNN model, with ResNet-50 for spatial feature extraction and an LSTM for temporal dependencies. The model was applied to a dataset of 9,000 images of skin lesions representing 9 cancer types and achieved an accuracy of 94.48%, a sensitivity of 94.38%, and a specificity of 93%.

Ge, R., et al. [64] proposed a Dual-Enhanced Convolutional Ensemble Neural Network (DECENN) to detect the presence or absence of metastasis in whole slide imaging patches of breast cancer. It utilizes VGG16 and DenseNet121 in the network. It was applied to the updated version of a benchmark dataset of microscopic images and histopathologic scans of lymph node sections for the breast [65], achieving an accuracy of about 98.92%, an AUC of 99.70%, and an F-score of 98.93%.

Liu, Q., X. She & Q. Xia [66] proposed a model to classify osteosarcoma cells and other cell types using an updated version of CA-MobileNet V3 based on transfer learning. It was applied to an osteosarcoma cell microscopy imaging dataset of bone cancer [67] and achieved an accuracy of 98.69% and an F1-score of 94.11%.

Oommen & Arunnehru [68] proposed a model to diagnose Alzheimer's disease in its early stages. The proposed model contains 3 phases: preprocessing the images, extracting features using TL with ResNet-18, which are then compressed by cascaded autoencoders (AE), and finally classifying the disease into one of its 5 stages using a DNN. The model was applied to the MRI Neuroimaging dataset [69] and achieved an accuracy of 98.54%, recall of 98.9%, precision of 98.98%, and an F1 score of 98.82%.

Kumar et al. [70] developed a CNN model using the ResNet152 TL approach with feature extractors to classify brain tumor images into normal, benign, and malignant. The model was applied to the BraTS MRI image dataset. The proposed transfer learning model achieved a high accuracy, reaching 99.57%.

Manickam et al. [71] proposed a deep TL model for pneumonia detection. The chest X-ray images were preprocessed to recognize the existence of pneumonia based on the U-Net segmentation network, and the cases were then classified as normal or abnormal (bacterial, viral) using pre-trained models such as ResNet50, InceptionV3, and Inception-ResNetV2. It was evaluated using a publicly available database which includes 5,232 chest X-ray images. The ResNet50 model achieved an accuracy of 93.06%, precision of 88.97%, recall of 96.78%, and F1-score of 92.71%.

Venugopal et al. [72] developed a DNN using a modified EfficientNetV2-M based on transfer learning to detect skin cancer in dermoscopic images. The model was applied to 58,032 dermoscopic images collected from [73-77] and was tested on binary and multiclass classification tasks. It achieved an accuracy of 97.62% for the multiclass classification of the ISIC 2020 dataset and 99.23% for the binary classification of the same dataset.

Mehmood et al. [78] developed a model to diagnose Alzheimer's disease (AD) in its early stage based on TL using the VGG-19 pre-trained model. The model distinguishes among 4 classes: AD, late mild cognitive impairment (LMCI), early mild cognitive impairment (EMCI), and normal control (NC). The used dataset was collected from the AD Neuroimaging Initiative (ADNI) [69] database. In the pre-processing phase, the gray matter (GM) tissue was segmented from brain MRI, and then VGG-19 was used to classify the segmented parts. The model achieved an accuracy of 98.73% to distinguish between AD and NC, 83.72% to distinguish between LMCI and EMCI cases, and more than 80% to distinguish between the other combinations of classes.

Al-Shabi, Shak, and Tan [79] developed a Progressive Growing Channel Attentive Non-Local (ProCAN) deep learning model to classify lung nodules as benign or malignant. Curriculum Learning (CL) was used to train on easy samples before hard samples, and the model was gradually grown to improve its ability to classify the samples based on CL. The model was applied to samples from two publicly available CT scan datasets, LIDC-IDRI [80] and LUNGx [81]. It achieved an accuracy of 95.28%, AUC of 98.05%, precision of 95.75%, sensitivity of 94.33%, and F1-score of 95.04%.

Cho, Y., et al. [82] proposed a CL model using a DL CNN to classify chest radiograph (CXR) images into normal and five types of pulmonary abnormalities. The model used ResNet-50 for training on patches of CXR images with various patch ratios according to pre-trained weights, with fine-tuning using transfer learning
(TL). The model was applied to CXR images from hospitals, including Seoul National University Bundang Hospital (SNUBH) and Asan Medical Center (AMC). It achieved the following accuracies: 90.97% for 20% of the dataset at SNUBH, 91.92% for 50%, and 93.00% for 100%. At AMC, the accuracies were 93.90%, 94.54%, and 95.39%, respectively.

Wong et al. [83] developed a CL-based method for classifying medical images, using features from segmentation networks. The model first learns simpler shapes and features through a segmentation network pre-trained on similar data, then applies this knowledge to more complex classification tasks. The M-Net, a CNN modified from U-Net to work with fewer training samples, was used for segmentation; the CNN classifier then receives the features from the segmentation network as inputs. The model achieved an accuracy of 82% on a 3D three-class brain tumor classification problem and 86% on a 2D nine-class cardiac semantic level classification problem.

Wu et al. [84] developed a weakly-supervised deep AL framework to diagnose COVID-19 using CT scans. The framework contains a 2D U-Net for segmentation of the lung region and a hybrid active learning approach that maintains sample diversity and predicted loss for the diagnosis of COVID-19. The framework classifies the CT scans into one of three classes: pneumonia, coronavirus pneumonia caused by SARS-CoV-2, and normal cases. The framework was validated on a CT scan dataset from the China Consortium of Chest CT Image Investigation (CC-CCII) [85]. With only 30% of the labeled data, the accuracy of the framework reached 0.867, while the AUC was 0.968.

Wu, X., et al. [86] proposed a hybrid active learning (HAL) framework that combines AL with deep TL using ResNet18. The framework applies data augmentation to the unlabeled data pool and uses a hybrid sampling approach that maintains sample variety and classification loss (data uncertainty). The diversity sampling is based on data augmentation, while the noise in the generated data is discarded with an outlier detection process. The HAL was validated on three medical image datasets: Hyper-Kvasir for gastrointestinal disease [87], Messidor for eye fundus images [88], and a breast cancer dataset [89]. By applying the proposed framework to the Hyper-Kvasir dataset, it achieves an accuracy of 0.871, precision of 0.602, recall of 0.587, and F1-score of 0.594.

Meirelles et al. [90] used pool-based AL to train DL models for classifying tumor-infiltrating lymphocytes. The proposed approach selects image patches based on feature grouping and prediction uncertainty. They introduced a Diversity-Aware Data Acquisition (DADA) method, which ensures diverse batch selection by clustering images based on features and then choosing uncertain patches from each cluster. The most uncertain patches from each cluster are prioritized for selection, the clusters with the most uncertain patches contribute more patches, and the pool is updated by removing the selected patches. By applying the proposed model to the cancer tissue image dataset [91], it achieved an AUC of 0.78 with fewer tissue patches and less execution time.

Zhang et al. [92] developed a semi-supervised framework for brain segmentation that incorporates quality-driven active learning (QDAL). In the AL module, a deep supervision loss and an attention mechanism improve the accuracy of segmentation and return quality information for the unlabeled slices. The AL module chooses the most informative slices to be annotated, and the segmentation network is trained iteratively using the updated labeled data. The framework was tested on two brain MRI datasets [93, 94]. The experimental results showed that segmentation utilizing QDAL needs only 15–20% of the annotated slices for the brain extraction task and 30–40% for tissue segmentation, achieving results competitive with full supervision and an accuracy of 90.7%.

Lu, Q., et al. [95] presented a blood cell classification method called MAE4AL, which combines the self-supervised Masked Autoencoder (MAE) and active learning (AL). It chooses the most remarkable samples for labeling based on the self-supervised loss of the MAE and sample uncertainty. Tested on blood smear samples obtained from [96], MAE4AL needed to label only 20% of the data to perform the same as ResNeXt trained on the full dataset. When trained using half of the labeled data, MAE4AL achieved an accuracy of 96.36%, outperforming ResNeXt trained on all the data.

Kumbhare et al. [97] developed an FL method for breast cancer diagnosis using mammogram images from the "Curated Breast Imaging Subset of DDSM (CBIS-DDSM)" dataset [98]. The DenseNet pre-trained model was used for feature extraction, and the extracted features were classified using Enhanced Recurrent Neural Networks (E-RNN). FL was employed to reduce processing time and improve model performance. The method achieved an accuracy of 95%.

Feki et al. [53] proposed a decentralized FL framework that permits different medical organizations to screen for COVID-19 using chest X-ray images based on deep learning while keeping patient data private. Two pre-trained models, VGG16 and ResNet50, were used for classification. The framework was tested using four clients, where each client has its private dataset and the same CNN models. The proposed FL framework achieved competitive results compared to models trained by sharing data. The best achieved accuracy was 97%, using the ResNet50 model with data augmentation.

Zhang et al. [99] proposed an FL-based DL framework for diagnosing brain disorders. The proposed framework was tested on the Autism Brain Imaging Data Exchange (ABIDE) [100] dataset. It achieved an average accuracy of 79% and reduced the communication burden of FL.

Shaikh et al. [101] developed an FL-based DL method to classify respiratory diseases by listening to lung sounds. Generative Adversarial Networks created new lung sounds to train a neural network that classifies four lung diseases, heart attack, and normal breathing patterns. Using two datasets [102, 103], the proposed method achieved an accuracy of 92% for the classification of the different respiratory diseases and heart failure.
Table 1 provides a summary of 25 selected articles from top journals on ScienceDirect, published between 2016 and 2024, based on a database search. These articles showcase applications of recent trends of MLTs in the medical domain and are intended to illustrate these trends, not to present a comprehensive list. For each reference, the table includes the task, disease, techniques used, evaluation results, and data type.
Table 1: Summary of the selected articles from the search results for applications of the new ML techniques in the medical domain.
Ref. | Task | Disease | Used Technique(s) | Evaluation results | Data Type
[55] | Detection | Lung cancer (chest) | DL-CNN | Acc. = 99%; FAUC = 0.982 | X-ray radiographs
[57] | Classification | Breast cancer | DL-CNN | Acc. = 96.52%; Spec. = 96.4%; Sen. = 96.5%; AUC = 0.98 | Mammograms
[59] | Detection | Coronary artery disease | DL-CNN | Acc. = 87.86%; Sen. = 90.67%; Spec. = 82.38%; AUC = 94.70 | Heart sound signals
[60] | Prediction | COVID-19 | DL-RNN; LSTM | RNN Acc. = 93.45%; LSTM Acc. = 98.58% | Numerical
[62] | Segmentation, diagnosis | Diabetic retinopathy | DL-CNN | Acc. = 99.44% | Fundus images
[63] | Classification | Skin cancer | DL-CNN-RNN | Acc. = 94.48; Sen. = 94.38; Spec. = 93 | Skin lesion images
[64] | Detection | Breast cancer | DL-TL-VGG16-DenseNet121 | Acc. = 98.92%; AUC = 99.70%; F-score = 98.93% | Histopathologic images of lymph node
[66] | Classification | Bone cancer | TL CA-MobileNetV3 | Acc. = 98.69%; F1-score = 94.11% | Microscopic images of bone cancer
[68] | Classification | Alzheimer's disease | TL-ResNet-18-AE-DL | Acc. = 98.54%; Recall = 98.9%; Prec. = 98.98%; F1-score = 98.82% | MRI Neuroimaging dataset
[70] | Classification | Brain tumor | TL-ResNet152-CNN | Acc. = 99.57% | MRI
[71] | Segmentation, detection | Pneumonia | U-Net; TL-ResNet50 | Acc. = 93.06%; Prec. = 88.97%; Rec. = 96.78%; F1-score = 92.7 | Chest X-ray
[72] | Classification | Skin cancer | TL-EfficientNetV2-M | Acc. = 99.23 | Dermoscopic images
[78] | Classification | Alzheimer's disease | TL-VGG19 | Acc. = 98.73% | MRI
[79] | Classification | Lung nodules | DL-CNN-CL | Acc. = 95.28%; AUC = 98.05%; Prec. = 95.75; Sen. = 94.33; F1-score = 95.04 | CT scans
[82] | Classification | Pulmonary abnormalities | TL-ResNet-50-CL | Acc. = 93.90, 94.54, 95.39 for 20%, 50%, 100% of the dataset | CXR
[83] | Segmentation, classification | Brain tumor; cardiac | TL-M-Net; DL-CNN-CL | Acc. = 82% (brain tumor); Acc. = 86% (cardiac) | MR
[84] | Segmentation, classification | COVID-19 | TL-U-Net; DL-AL | Acc. = 0.866; ROC = 0.968 | CT scans
[86] | Classification | Gastrointestinal disease | TL-ResNet18-AL | Acc. = 0.871; Prec. = 0.602; Recall = 0.587; F1-score = 0.594 | Images
[90] | Classification | Tumor-infiltrating lymphocytes | DL-CNN-AL | AUC = 0.78 | Histology images
[92] | Segmentation | Brain | DL-CNN-AL | Acc. = 90.7 | MRI
[95] | Classification | Blood diseases (leukemia) | Masked Autoencoder (MAE4AL) | Acc. = 96.36% | Blood smear samples
[97] | Classification | Breast cancer | FL-TL-DenseNet-RNN | Acc. = 95% | Mammograms
[53] | Classification | COVID-19 | FL-TL-VGG16/ResNet50 | Acc. = 97% | X-ray images
[99] | Classification | Brain disorders | FL-CNN | Acc. = 79% | Autism Brain Imaging
[101] | Classification | Respiratory diseases & heart failure | FL-DL | Acc. = 92% | Breathing sounds
7 Conclusion and future work

This research explored the emerging trends in machine learning techniques (MLTs) within the medical domain. Through a comprehensive literature review, we found that deep learning has become the dominant trend, holding significant promise for developing intelligent medical applications. A key advantage of deep learning is its ability to perform automatic feature engineering, simplifying the model-building process and reducing reliance on manual input. Current research predominantly addresses diagnostic tasks, with disease classification being the most common approach. Other tasks, such as segmentation, are also explored. Cancer, in its various forms, is the most frequently studied condition, while the COVID-19 pandemic has notably led to a surge in research on lung diseases.

In the realm of medical imaging, traditional machine learning approaches require extensive pre-processing, including feature extraction and selection. Deep learning, particularly Convolutional Neural Networks (CNNs), has advanced the field by automating feature engineering, reducing the need for manual intervention. However, this comes with an increased demand for large datasets and significant computational resources. To address these challenges, recent trends like transfer learning, curriculum learning, active learning, and federated learning have been introduced to enhance model performance, expedite the training process, and improve data security. In summary, the overarching goal in this field is to automate processes, reduce human intervention, and maximize the value derived from limited labeled data, thereby enhancing medical decision-making and patient outcomes.

Looking ahead, there are several key areas where further work is needed. While the number of publications on deep learning in the medical domain has steadily increased since its initial applications in 2016, and although these applications have yielded promising results, further research is essential to address several key challenges. Areas such as active learning, curriculum learning, and federated learning have shown promise but remain under-explored and require more attention in future research. A critical direction for future work is to focus on reducing the time and computational costs associated with deep learning models and the other trends. These processes often consume substantial energy, indirectly contributing to environmental and climate concerns. Therefore, developing more energy-efficient techniques will be crucial. Additionally, data augmentation, a significant pre-processing step in deep learning, could be integrated more effectively into the model-building process itself, thereby enhancing sample diversity and improving class representation with less manual effort. Another important aspect for future research is the development of standardized, public databases that include diverse patient data, such as DNA sequences. These databases would enable more comprehensive studies and improve the accuracy of predictive models by providing a richer set of input data. Additionally, integrating knowledge from multiple domains could further enhance the performance of deep learning models in different medical applications. Despite the progress
made, the real challenge lies in translating these advancements into practical, real-world applications that can be implemented in clinical settings. Bridging the gap between theoretical research and clinical deployment will be vital to realizing the full potential of deep learning in medicine.

Conflicts of interest

The author has no competing interests to declare.

References

[1] Chen, M., et al., Disease prediction by machine learning over big data from healthcare communities. IEEE Access, 2017. 5: p. 8869-8879.
[2] Grossman, R.L., et al., Toward a shared vision for cancer genomic data. New England Journal of Medicine, 2016. 375(12): p. 1109-1112.
[3] Schaekermann, M., et al., Understanding expert disagreement in medical data analysis through structured adjudication. Proceedings of the ACM on Human-Computer Interaction, 2019. 3(CSCW): p. 1-23.
[4] Garg, A. and V. Mago, Role of machine learning in medical research: A survey. Computer Science Review, 2021. 40: p. 100370.
[5] Dallora, A.L., et al., Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. PloS one, 2017. 12(6): p. e0179804.
[6] Kharazmi, P., et al., A computer-aided decision support system for detection and localization of cutaneous vasculature in dermoscopy images via deep feature learning. Journal of Medical Systems, 2018. 42(2): p. 1-11.
[7] Xu, J., K. Xue, and K. Zhang, Current status and future trends of clinical diagnoses via image-based deep learning. Theranostics, 2019. 9(25): p. 7556.
[8] Xie, X., et al., A survey on incorporating domain knowledge into deep learning for medical image analysis. Medical Image Analysis, 2021. 69: p. 101985.
[9] Lee, C.H. and H.-J. Yoon, Medical big data: promise and challenges. Kidney Research and Clinical Practice, 2017. 36(1): p. 3.
[10] Dinov, I.D., Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data. Gigascience, 2016. 5(1): p. s13742-016-0117-6.
[11] http://archive.ics.uci.edu/ml/datasets/.
[12] Available from: http://image-net.org/challenges/LSVRC/.
[13] https://file.biolab.si/biolab/supp/bicancer/projections/.
[14] Corsetti, V., et al., Evidence of the effect of adjunct ultrasound screening in women with mammography-negative dense breasts: interval breast cancers at 1 year follow-up. European Journal of Cancer, 2011. 47(7): p. 1021-1026.
[15] Zhang, Z. and E. Sejdić, Radiological images and machine learning: trends, perspectives, and prospects. Computers in Biology and Medicine, 2019. 108: p. 354-370.
[16] Saslow, D., et al., American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA: A Cancer Journal for Clinicians, 2007. 57(2): p. 75-89.
[17] http://medicaldictionary.thefreedictionary.com/operating+microscope.
[18] Jones, N.C. and P.A. Pevzner, An introduction to bioinformatics algorithms. 2004: MIT Press.
[19] Rahman, T., et al., COVID-19 radiography database. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database.
[20] Suckling, J., et al., Mammographic image analysis society (MIAS) database v1.21. 2015.
[21] Labati, R.D., V. Piuri, and F. Scotti, All-IDB: The acute lymphoblastic leukemia image database for image processing. In 2011 18th IEEE International Conference on Image Processing. 2011. IEEE.
[22] Zoph, B., et al., Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[23] Thusberg, J. and M. Vihinen, Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Human Mutation, 2009. 30(5): p. 703-714.
[24] El Houby, E.M., Machine learning techniques for pathogenicity prediction of non-synonymous single nucleotide polymorphisms in human body. Journal of Ambient Intelligence and Humanized Computing, 2023. 14(7): p. 8099-8113.
[25] Han, J., J. Pei, and M. Kamber, Data mining: concepts and techniques. 2011: Elsevier.
[26] Hamla, H. and K. Ghanem, A hybrid feature selection based on Fisher score and SVM-RFE for microarray data. Informatica, 2024. 48(1).
[27] Fahrudin, T.M., I. Syarif, and A.R. Barakbah, Ant colony algorithm for feature selection on microarray datasets. In 2016 International Electronics Symposium (IES). 2016. IEEE.
[28] Talavera, L., An evaluation of filter and wrapper methods for feature selection in categorical clustering. In International Symposium on Intelligent Data Analysis. 2005. Springer.
[29] Tabakhi, S. and P. Moradi, Relevance–redundancy feature selection based on ant colony optimization. Pattern Recognition, 2015. 48(9): p. 2798-2811.
[30] Yang, X.-S., Firefly algorithm, stochastic test functions and design optimisation. International Journal of Bio-Inspired Computation, 2010. 2(2): p. 78-84.
[31] Mashhour, E.M., et al., Feature Selection Approach based on Firefly Algorithm and Chi-square. International Journal of Electrical & Computer Engineering (2088-8708), 2018. 8(4).
[32] Neagoe, V.-E. and E.-C. Neghina, Feature selection with ant colony optimization and its
applications for pattern recognition in space imagery. In 2016 International Conference on Communications (COMM). 2016. IEEE.
[33] El Houby, E.M., N.I. Yassin, and S. Omran, A hybrid approach from ant colony optimization and K-nearest neighbor for classifying datasets using selected features. Informatica, 2017. 41(4).
[34] LeCun, Y., Y. Bengio, and G. Hinton, Deep learning. Nature, 2015. 521(7553): p. 436-444.
[35] Goodfellow, I., Y. Bengio, and A. Courville, Deep learning. 2016: MIT Press.
[36] Dezaki, F.T., et al., Cardiac phase detection in echocardiograms with densely gated recurrent neural networks and global extrema loss. IEEE Transactions on Medical Imaging, 2018. 38(8): p. 1821-1832.
[37] Raiaan, M.A.K., et al., A systematic review of hyperparameter optimization techniques in Convolutional Neural Networks. Decision Analytics Journal, 2024: p. 100470.
[38] Ali, M.J., et al., A review of AutoML optimization techniques for medical image applications. Computerized Medical Imaging and Graphics, 2024: p. 102441.
[39] Ambekar, S. and R. Phalnikar, Disease risk prediction by using convolutional neural network. In 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). 2018. IEEE.
[40] Anwar, S.M., et al., Medical image analysis using convolutional neural networks: a review. Journal of Medical Systems, 2018. 42(11): p. 1-13.
[41] Oraibi, Z.A. and S. Albasri, A robust end-to-end CNN architecture for efficient COVID-19 prediction from X-ray images with imbalanced data. Informatica, 2023. 47(7).
[42] Krizhevsky, A., I. Sutskever, and G.E. Hinton, ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012. 25.
[43] Simonyan, K. and A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[44] Szegedy, C., et al., Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[45] He, K., et al., Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
[46] Deng, J., et al., ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009. IEEE.
[47] Russakovsky, O., et al., ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015. 115(3): p. 211-252.
[48] Oquab, M., et al., Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
[49] Wang, L., et al., Trends in the application of deep learning networks in medical image analysis: Evolution between 2012 and 2020. European Journal of Radiology, 2022. 146: p. 110069.
[50] Jiménez-Sánchez, A., et al., Medical-based deep curriculum learning for improved fracture classification. In International Conference on Medical Image Computing and Computer-Assisted Intervention. 2019. Springer.
[51] Budd, S., E.C. Robinson, and B. Kainz, A survey on active learning and human-in-the-loop deep learning for medical image analysis. Medical Image Analysis, 2021. 71: p. 102062.
[52] KhoKhar, F.A., et al., A review on federated learning towards image processing. Computers and Electrical Engineering, 2022. 99: p. 107818.
[53] Feki, I., et al., Federated learning for COVID-19 screening from Chest X-ray images. Applied Soft Computing, 2021. 106: p. 107330.
[54] Wu, J.C.-H., et al., Dynamically Synthetic Images for Federated Learning of Medical Images. Computer Methods and Programs in Biomedicine, 2023: p. 107845.
[55] Li, X., et al., Multi-resolution convolutional networks for chest X-ray radiograph based lung nodule detection. Artificial Intelligence in Medicine, 2020. 103: p. 101744.
[56] Li, X., et al., Rib suppression in chest radiographs for lung nodule enhancement. In 2015 IEEE International Conference on Information and Automation. 2015. IEEE.
[57] El Houby, E.M. and N.I. Yassin, Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks. Biomedical Signal Processing and Control, 2021. 70: p. 102954.
[58] Moreira, I.C., I. Amaral, I. Domingues, A. Cardoso, M.J. Cardoso, and J.S. Cardoso, INbreast: toward a full-field digital mammographic database. Academic Radiology, 2012. 19(2): p. 236-248.
[59] Dai, Y., et al., Deep learning fusion framework for automated coronary artery disease detection using raw heart sound signals. Heliyon, 2024. 10(16).
[60] Alassafi, M.O., M. Jarrah, and R. Alotaibi, Time series predicting of COVID-19 based on deep learning. Neurocomputing, 2022. 468: p. 335-344.
[61] https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide.
[62] Maiti, S., et al., Automatic detection and segmentation of optic disc using a modified convolution network. Biomedical Signal Processing and Control, 2022. 76: p. 103633.
[63] Zareen, S.S., et al., Enhancing Skin Cancer Diagnosis with Deep Learning: A Hybrid CNN-RNN Approach. Computers, Materials & Continua, 2024. 79(1).
[64] Ge, R., et al., Detection of presence or absence of metastasis in WSI patches of breast cancer using the dual-enhanced convolutional ensemble neural network. Machine Learning with Applications, 2024. 17: p. 100579.
[65] Cukierski, W., Histopathologic cancer detection. Kaggle. https://kaggle.com/competitions/histopathologic-cancer-detection, 2018.
[66] Liu, Q., X. She, and Q. Xia, AI based diagnostics product design for osteosarcoma cells microscopy imaging of bone cancer patients using CA-MobileNet V3. Journal of Bone Oncology, 2024: p. 100644.
[67] Charilaou, P. and R. Battat, Machine learning models and over-fitting considerations. World Journal of Gastroenterology, 2022. 28(5): p. 605.
[68] Oommen, D.K. and J. Arunnehru, Alzheimer's Disease Stage Classification Using a Deep Transfer Learning and Sparse Auto Encoder Method. Computers, Materials & Continua, 2023. 76(1).
[69] http://adni.loni.usc.edu.
[70] Kumar, K.A., A. Prasad, and J. Metan, A hybrid deep CNN-Cov-19-Res-Net Transfer learning architype for an enhanced Brain tumor Detection and Classification scheme in medical image processing. Biomedical Signal Processing and Control, 2022. 76: p. 103631.
[71] Manickam, A., et al., Automated pneumonia detection on chest X-ray images: A deep learning approach with different optimizers and transfer learning architectures. Measurement, 2021. 184: p. 109953.
[72] Venugopal, V., et al., A deep neural network using modified EfficientNet for skin cancer detection in dermoscopic images. Decision Analytics Journal, 2023. 8: p. 100278.
[73] Rotemberg, V., et al., A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Scientific Data, 2021. 8(1): p. 34.
[74] Tschandl, P., C. Rosendahl, and H. Kittler, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 2018. 5(1): p. 1-9.
[75] Codella, N.C., et al., Skin lesion analysis toward melanoma detection: A challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). 2018. IEEE.
[76] Combalia, M., et al., BCN20000: Dermoscopic lesions in the wild. arXiv preprint arXiv:1908.02288, 2019.
[77] Codella, N., et al., Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the International Skin Imaging Collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.
[78] Mehmood, A., et al., A transfer learning approach for early diagnosis of Alzheimer's disease on MRI images. Neuroscience, 2021. 460: p. 43-52.
[79] Al-Shabi, M., K. Shak, and M. Tan, ProCAN: Progressive growing channel attentive non-local network for lung nodule classification. Pattern Recognition, 2022. 122: p. 108309.
[80] Armato III, S.G., et al., The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics, 2011. 38(2): p. 915-931.
[81] Armato III, S.G., et al., LUNGx Challenge for computerized lung nodule classification. Journal of Medical Imaging, 2016. 3(4): p. 044506-044506.
[82] Cho, Y., et al., Optimal number of strong labels for curriculum learning with convolutional neural network to classify pulmonary abnormalities in chest radiographs. Computers in Biology and Medicine, 2021. 136: p. 104750.
[83] Wong, K.C., T. Syeda-Mahmood, and M. Moradi, Building medical image classifiers with very limited data using segmentation networks. Medical Image Analysis, 2018. 49: p. 105-116.
[84] Wu, X., et al., COVID-AL: The diagnosis of COVID-19 with deep active learning. Medical Image Analysis, 2021. 68: p. 101913.
[85] Zhang, K., et al., Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell, 2020. 181(6): p. 1423-1433.e11.
[86] Wu, X., et al., HAL: Hybrid active learning for efficient labeling in medical domain. Neurocomputing, 2021. 456: p. 563-572.
[87] Borgli, H., et al., HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data, 2020. 7(1): p. 1-14.
[88] Decencière, E., et al., Feedback on a publicly distributed image database: the Messidor database. Image Analysis & Stereology, 2014. 33(3): p. 231-234.
[89] Aresta, G., et al., BACH: Grand challenge on breast cancer histology images. Medical Image Analysis, 2019. 56: p. 122-139.
[90] Meirelles, A.L., et al., Effective Active Learning in Digital Pathology: A Case Study in Tumor Infiltrating Lymphocytes. Computer Methods and Programs in Biomedicine, 2022: p. 106828.
[91] Saltz, J., et al., Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Reports, 2018. 23(1): p. 181-193.e7.
[92] Zhang, Z., et al., Quality-driven deep active learning method for 3D brain MRI segmentation. Neurocomputing, 2021. 446: p. 106-117.
[93] Shattuck, D.W., et al., Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage, 2008. 39(3): p. 1064-1080.
[94] https://www.nitrc.org/projects/ibsr.
[95] Lu, Q., et al., A blood cell classification method based on MAE and active learning. Biomedical Signal Processing and Control, 2024. 90: p. 105813.
[96] Matek, C., et al., Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence, 2019. 1(11): p. 538-544.
[97] Kumbhare, S., A.B. Kathole, and S. Shinde, Federated learning aided breast cancer detection with intelligent Heuristic-based deep learning framework. Biomedical Signal Processing and Control, 2023. 86: p. 105080.
[98] Lee, R.S., et al., A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific Data, 2017. 4(1): p. 1-9.
[99] Zhang, C., et al., FedBrain: A robust multi-site brain network analysis framework based on federated learning for brain disease diagnosis. Neurocomputing, 2023. 559: p. 126791.
[100] Heinsfeld, A.S., et al., Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical, 2018. 17: p. 16-23.
[101] Shaikh, A.A.S. and M. Bhargavi, Weighted aggregation through probability based ranking: An optimized federated learning architecture to classify respiratory diseases. Computer Methods and Programs in Biomedicine, 2023. 242: p. 107821.
[102] Fraiwan, L., et al., Automatic identification of respiratory diseases from stethoscopic lung sound signals using ensemble classifiers. Biocybernetics and Biomedical Engineering, 2021. 41(1): p. 1-14.
[103] Rocha, B., et al., A respiratory sound database for the development of automated classification. In Precision Medicine Powered by pHealth and Connected Health: ICBHI 2017, Thessaloniki, Greece, 18-21 November 2017. 2018. Springer.
https://doi.org/10.31449/inf.v49i16.7979 Informatica 49 (2025) 137–150 137
Vision Transformer-Based Framework for AI-Generated Image
Detection in Interior Design
Hui Wang
AnHui Business and Technology College, Hefei City, AnHui Province, 230041, China
E-mail: leZhang2024@163.com
Keywords: artificial intelligence-generated images, interior design, vision transformers, deep learning, image
classification
Received: January 7, 2025
Increasingly, images generated by artificial intelligence (AI) are being used within interior design, which raises questions of authenticity and ethical use. Motivated by the limitations of Convolutional Neural Networks (CNNs) in capturing long-range dependencies and global patterns in image data, this study examines how Vision Transformers (ViTs) can be utilized to detect AI-generated interior design images. We fine-tuned and evaluated four ViT models, ViT-B16, ViT-B32, ViT-L16, and ViT-L32, on a dataset with 1,000 samples per class. Accuracy, precision, recall, F1-score, and computational efficiency were used to assess performance. Results show that models with smaller patch sizes (i.e., 16x16) perform better than larger ones (i.e., 32x32). ViT-B16 and ViT-L16 had the highest accuracy (96.25%) and F1-score (0.9625) in identifying minor inconsistencies in the AI-generated images. ViT-B32 and ViT-L32 offer better computational efficiency at the cost of lower classification performance (80.00% and 81.25% accuracy, respectively). ViT-B16 provides the best tradeoff between accuracy and resource efficiency; ViT-L16, although just as accurate, incurred higher computational costs. ViT-B32 and ViT-L32 were computationally efficient and are therefore more appropriate for real-time applications that prioritize speed over accuracy. Through this work, we contribute a domain-specific deep learning framework for AI-generated image detection in interior design to strengthen authenticity verification. Future work will address improving computational efficiency and generalizing the model across all (or most) generative models and design styles.
Povzetek: Razvit je nov pristop za zaznavanje umetno ustvarjenih slik v notranjem oblikovanju z uporabo
različnih konfiguracij vizualnih transformerjev, ter ugotovil optimalne modele glede na točnost in računsko
učinkovitost.
1 Introduction

Artificial Intelligence (AI) has become increasingly embedded in practice in creative industries, such as interior design, through the generation of photo-realistic and innovative imagery [1]. Lately, tools like Generative Adversarial Networks (GANs) and diffusion models have democratized access to this high-quality design, and their use has become ubiquitous [2, 3]. This brings challenging problems around what 'authentic' designs are, how designs can be used ethically, and intellectual property rights. Nearly all current AI detection methods leverage Convolutional Neural Networks (CNNs) as their feature extractors, and they are mainly limited to short-range dependencies in image data.

Based on Vision Transformers (ViTs) [4], a state-of-the-art architecture, this study proposes their application as a transformative approach to detecting AI-generated interior design images. This research lays out a solid foundation for authenticating AI-generated content by removing barriers to scalability, computational efficiency, and domain-specific application. Artificial intelligence (AI) has profoundly changed practice in most industries, including interior design, where visualization, creativity, and presentation are increasingly led by AI-generated images [5]. With the advent of Generative Adversarial Networks (GANs) and diffusion models, highly realistic images can now be created that often outperform human-generated designs in quality and detail. While these tools democratize access to creative resources, they also come with problems such as authenticity, intellectual property, and ethical use. For example, it is essential to differentiate between generated and human-made images in interior design because professional work in commercial and academic spaces may otherwise be compromised. While AI is increasingly applied to create visual content, domain-specific applications such as interior design are still in their infancy, and robust means to detect such images have received little attention.

Despite their effectiveness, many existing detection approaches rely on Convolutional Neural Networks (CNNs), which cannot model long-range dependencies and global patterns of high-dimensionality data such as images [6]. In recent years, with their self-attention-based mechanisms, Vision Transformers (ViTs) have emerged as powerful alternative models, achieving state-of-the-art results in image classification and artefact detection tasks [7]. One of their key attributes is their ability to model such noncontiguous relationships, thus offering a means for identifying the subtle inconsistencies underlying AI-generated images. This study proposes a deep learning framework based on Vision Transformers to detect
AI-generated interior design images. The study fine-tunes multiple ViT configurations (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on a balanced dataset and compares their performance with respect to accuracy, precision, recall, F1-score, and computational efficiency. The results guide the choice of model configuration when resources impose a tradeoff with detection accuracy.
The contributions of this work are threefold:
• Developing a domain-specific AI image detection approach targeted at interior design,
• Comparing a large number of ViT configurations to establish cost-benefit relationships,
• Distilling the lessons learned from deploying transformer-based models for AI content detection.
First, the contributions of this research fill an essential gap in AI image authenticity verification, and second, they establish a foundation for future work in this young area.

2 Background and related work

Detecting artificial intelligence (AI)-made images is an emerging field of study, as people increasingly use AI-based tools in creative spheres like interior design. This literature review provides an overview of state-of-the-art AI-generated content detection, specifically methodologies and techniques that can be applied, using Vision Transformers (ViTs), to discriminate between AI-generated and human-created images.

Thanks to the integration of AI, photo-realistic images that resemble human-produced designs are generated. The advanced generative models used by tools such as DALL-E, MidJourney, and Stable Diffusion make images increasingly indistinguishable from real ones. These democratizing advancements to creativity are also a concern, as they raise worries about authenticity and intellectual property rights once such images enter the public domain [8-10]. There have been few attempts to identify the key difficulties of detecting AI-generated interior design images, leaving a vacant area for studying this field.

AI-generated image detection usually relies on machine learning or deep learning models to identify subtle traces in artificial intelligence-generated images that would not appear in authentic ones. Some commonly used techniques include:

Convolutional Neural Networks (CNNs): In the past, CNNs have been a core piece of image classification tasks. They have been shown to learn spatial hierarchies in images and to detect AI artefacts; for example, CNNs have been used successfully to detect GAN-generated images [11, 12]. However, global contextual relationships in high-dimensional data remain challenging for CNNs [13].

Transformer-Based Architectures: Transformers, initially designed for natural language processing [14], have been adapted for vision tasks. The self-attention mechanisms used by Vision Transformers (ViTs) to capture local and global image patterns make ViTs very powerful for detecting minute inconsistencies in AI-generated content [5, 15, 16]. In this work, we build upon the success of ViTs by extending it to interior design image classification.

Ensemble Models: Others have combined CNNs and transformer-based architectures to provide the best of both worlds. For example, hybrid architectures such as DeiT (data-efficient image transformer) extract early features via convolutional blocks [17-20] and subsequently use transformer layers to perform global attention.

Image classification and manipulation detection have reached the state of the art using Vision Transformers. On high-dimensional datasets, they can divide the images into patches and apply self-attention to the relationships between them, leading to better performance [4, 21]. Several studies have highlighted their applicability: ViTs were introduced to demonstrate their scalability in challenging image classification tasks, outperforming traditional CNNs on large-scale datasets [4, 22]. References [23-26] indicate that Vision Transformers are adequate detectors of subtle image manipulations, including deepfake detection. They are therefore a natural choice of methodology for tasks that are exceedingly sensitive to subtle, minute image artefacts. The present study extends this foundation to a binary classification of AI-generated and authentic images in interior design while fine-tuning ViT models.

For the success of deep learning models, effective preprocessing is critical. Standard techniques for making models robust include image resizing, normalization, and data augmentation. References [27, 28] have shown that dataset balancing is necessary and that augmentation strategies are a better way to tackle class imbalances. In this study, we adopt these practices: samples per class were capped at 1,000, and the dataset was set up for diversity. Metrics such as accuracy, precision, recall, F1-score, and loss are commonly used to evaluate detection models, and confusion matrices are used to find misclassification patterns [29, 30]. In line with best current practice in the field, a range of metrics is suggested to capture distinct aspects of model performance, which justifies the choice of metrics made in this study.

Despite these advancements, several challenges persist in detecting AI-generated images: (i) Subtle artifacts: detecting high-quality AI-generated images is complex because they are often not marked by visual artefacts; recent generative models have demonstrated their ability to learn and generate increasingly high-quality, seamless image samples. (ii) Computational complexity: despite being highly accurate, transformer-based models are computationally expensive, which makes them difficult to use in resource-constrained environments. (iii) Dataset limitations: the generalization or transferability of detection models for a specific domain, such as interior design, is limited by the lack of standardized datasets.

We compare deep learning-based methods for detecting AI-generated images, particularly in interior design, as shown in Table 1, which summarizes the approaches' strengths, accuracy, precision, recall, and limitations.
Table 1: Comparison of AI-generated image detection methods

Methodology | Key Strengths | Accuracy | Precision | Recall | Limitations
CNN-Based Approaches | Intense feature extraction for local patterns; effective for GAN-based images | 85-92% | High | High | Struggles with long-range dependencies; limited effectiveness on high-quality textures
Hybrid CNN-Transformer Models | Combines CNN's spatial awareness with the Transformer's self-attention | 89-94% | High | High | Increased computational cost; complex model training
Ensemble Models | Enhances classification robustness by integrating multiple architectures | 91-95% | High | High | Requires large-scale datasets; computationally expensive
Vision Transformers (ViTs) (Our Approach) | Captures fine-grained, global dependencies via self-attention; excels at detecting subtle artefacts | 96.25% | 0.9637 | 0.9625 | High computational cost; requires extensive pretraining

Previous literature has discussed the detection of AI-generated images across more general areas at length, with little focus on the domain-specific application of interior design. Furthermore, most studies employ CNN-based solutions, while work exploiting the full capability of Vision Transformers is less central. This study evaluates multiple ViT configurations for detecting AI-generated interior design images to fill these gaps. This literature review points out the significance of Vision Transformers as a current state-of-the-art approach for detecting AI-generated images; this study builds on that capability and helps grow the body of work on the authenticity of AI-generated content. Future work will need to improve computational efficiency, tackle domain-specific challenges, and standardize benchmarks for performance evaluation in interior design and beyond.

3 Proposed method

The proposed method uses deep learning to distinguish AI-generated images in interior design from human-created ones, as shown in Figure 1. For preprocessing and balancing the input images, we limit samples per class to be uniform and split the data into training and validation sets. The system uses features extracted by Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, ViT-L32) to classify images. The model is trained with the defined parameters and then evaluated with metrics such as accuracy and F1 score. Performance analysis is carried out through visualization of training samples, predictions, and validation metrics, leading to a robust and interpretable approach.
Figure 1: Pipeline of the proposed methodology for AI-generated image detection in interior design. It consists of dataset collection, preprocessing, Vision Transformer (ViT) feature extraction, training with AdamW optimization, and evaluation using accuracy, precision, recall, and F1 score to maintain an optimal tradeoff between efficiency and performance.
Base (B), Large (L), and Huge (H) Vision Transformer (ViT) models differ in network depth, hidden dimension size, number of self-attention heads, and total parameters. ViT-B (Base) has 12 layers, a hidden dimension of 768, and 86 million parameters, offering a good tradeoff between performance and computational cost and being practical for real-world AI-generated image detection. ViT-L (Large), with 24 layers, a hidden dimension of 1024, and 307M parameters, provides better feature extraction at a higher computational cost. The most resource-intensive variant, ViT-H (Huge), has 32 layers, a hidden dimension of 1280, and 632 million parameters; it was left out because of its high computational demands with no proportional accuracy gains. For this reason, only the Base and Large models are addressed in this study, as they ensure the best balance between accuracy and efficiency, making them feasible for AI-generated image detection in interior design.
The proposed methodology uses deep learning algorithms to detect artificial intelligence (AI) generated images in interior design. The process consists of multiple steps, described in detail below.
The first step is to collect an extensive image dataset. This dataset comprises two main categories:
• AI-Generated Images: interior design pictures produced by AI tools and algorithms.
• Real Images: actual interior designs captured with cameras or professionally curated photographs.
The dataset must be diverse in design styles, lighting conditions, and resolutions so that the model generalizes well to new images.
Raw input images are standardized to make them appropriate as input to the ViT model and to improve performance. Each image is resized to 224 × 224 pixels:

𝐼′ = Resize(𝐼, 224, 224) (1)

where 𝐼 is the original image and 𝐼′ is the resized image.
To prevent overfitting and improve robustness, data augmentation is performed, which includes:
• Random Rotation (±15°), applied to introduce random variability in image orientation, where 15° is the maximum rotational deviation.
• Horizontal Flipping (50% probability), to simulate mirrored interior design perspectives.
• Random Cropping (90% of the original size), which forces the model to pay attention to different image portions.
• Color Jitter (±0.2 on brightness, contrast, and saturation), which simulates variations that might occur through lighting conditions.
Pixel values are normalized to the range [0, 1] or standardized using the mean 𝜇 and standard deviation 𝜎 of the dataset:

𝐼norm = (𝐼′ − 𝜇) / 𝜎 (2)
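For illustration, the preprocessing and augmentation described above can be written as the following minimal sketch using torchvision. The exact implementation is not published, so the transform parameters below simply mirror Eq. (1), Eq. (2), and the augmentation list; the normalization statistics follow the ImageNet mean and standard deviation specified later in Table 3.

```python
# Sketch of the preprocessing and augmentation pipeline (assumes torchvision).
# Values mirror the description above: resize to 224x224, ±15° rotation,
# 50% horizontal flip, ~90% random crop, ±0.2 color jitter, normalization.
import torchvision.transforms as T

IMG_SIZE = 224

train_transform = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),                      # Eq. (1)
    T.RandomRotation(degrees=15),                        # ±15° rotation
    T.RandomHorizontalFlip(p=0.5),                       # mirrored perspectives
    T.RandomResizedCrop(IMG_SIZE, scale=(0.9, 1.0)),     # ~90% random crop
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),                                        # pixels scaled to [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],              # Eq. (2), ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

val_transform = T.Compose([
    T.Resize((IMG_SIZE, IMG_SIZE)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```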
Images are divided into non-overlapping patches of size 𝑃 × 𝑃 (e.g., 16 × 16 or 32 × 32):

Patch = {𝑝𝑖,𝑗 : 𝑝𝑖,𝑗 ∈ ℝ^(𝑃×𝑃)}, ∀ 𝑖, 𝑗 ∈ [1, 𝑁]

where 𝑁 is the number of patches per dimension, calculated as:

𝑁 = Image Size / Patch Size (3)

For an image of 224 × 224 and a patch size of 16, 𝑁 = 14 (i.e., 14 × 14 = 196 patches). Each patch is flattened into a 1D vector and linearly projected into a 𝐷-dimensional embedding space using a learnable matrix 𝑊𝑒:

𝑧𝑝 = 𝑊𝑒 ⋅ Flatten(𝑝𝑖,𝑗) (4)

where 𝑧𝑝 ∈ ℝ^𝐷 is the embedded representation of a patch.
To encode spatial information, a positional embedding 𝑒pos is added to each patch embedding:

𝑧′𝑝 = 𝑧𝑝 + 𝑒pos (5)

where 𝑒pos is a learnable positional embedding vector.
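As an illustration of Eqs. (3)-(5), the patch extraction, linear projection, and positional embedding can be sketched as follows. This is a simplified stand-in for the embedding layer inside the pre-trained ViT models rather than the exact implementation; the tensor shapes are assumptions for a ViT-B16-style configuration.

```python
# Minimal sketch of patch embedding (Eqs. 3-5), assuming PyTorch.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        self.n_per_dim = img_size // patch_size          # N = 224 / 16 = 14
        n_patches = self.n_per_dim ** 2                  # 14 * 14 = 196 patches
        # Flatten(p_ij) followed by W_e is equivalent to a strided convolution.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_emb = nn.Parameter(torch.zeros(1, n_patches, dim))  # learnable e_pos

    def forward(self, x):                                # x: (B, 3, 224, 224)
        z = self.proj(x)                                 # (B, D, 14, 14)
        z = z.flatten(2).transpose(1, 2)                 # (B, 196, D), Eq. (4)
        return z + self.pos_emb                          # Eq. (5)

# Example: a batch of 2 normalized images produces 196 patch tokens of width 768.
tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])
```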
The sequence of patch embeddings is passed through multiple Transformer encoder layers. Each layer consists of Multi-Head Self-Attention (MHSA), whose attention scores are computed as follows:

Attention(𝑄, 𝐾, 𝑉) = Softmax(𝑄𝐾^T / √𝑑𝑘) 𝑉 (6)

where:
• 𝑄 = 𝑊𝑞 ⋅ 𝑧′𝑝 (query)
• 𝐾 = 𝑊𝑘 ⋅ 𝑧′𝑝 (key)
• 𝑉 = 𝑊𝑣 ⋅ 𝑧′𝑝 (value)
• 𝑊𝑞, 𝑊𝑘, 𝑊𝑣 are learnable weight matrices
• 𝑑𝑘 is the dimensionality of the key.

Multi-head attention is computed as:

MHSA(𝑧′𝑝) = Concat(head1, …, headℎ) 𝑊𝑜 (7)

where 𝑊𝑜 is an output projection matrix.
Feed-Forward Network (FFN): each patch embedding is processed through a two-layer fully connected network with activation:

FFN(𝑧) = ReLU(𝑧𝑊1 + 𝑏1)𝑊2 + 𝑏2 (8)

where 𝑊1, 𝑊2 and 𝑏1, 𝑏2 are learnable parameters.
Residual Connections and Layer Normalization: each block includes skip connections and normalization:

𝑧𝑝^(𝑙+1) = LayerNorm(𝑧𝑝^𝑙 + MHSA(𝑧𝑝^𝑙)) (9)
𝑧𝑝^(𝑙+1) = LayerNorm(𝑧𝑝^𝑙 + FFN(𝑧𝑝^𝑙)) (10)

A unique learnable classification token 𝑧cls is prepended to the patch sequence:

𝑧cls^(𝑙+1) = Transformer(𝑧cls^𝑙, {𝑧𝑝^𝑙}) (11)

where 𝑧cls aggregates global information for classification.
The output of the classification token is passed through a softmax layer to produce probabilities for the two classes (𝑦real, 𝑦AI):

ŷ = Softmax(𝑊𝑐 ⋅ 𝑧cls + 𝑏𝑐) (12)

where 𝑊𝑐 and 𝑏𝑐 are learnable parameters.
The binary cross-entropy loss is:

𝐿 = −(1/𝑁) Σ_(𝑖=1)^𝑁 [𝑦𝑖 log(ŷ𝑖) + (1 − 𝑦𝑖) log(1 − ŷ𝑖)] (13)

where 𝑦𝑖 is the ground-truth label.
The model is trained with the AdamW optimizer; writing the update in its generic gradient-descent form:

θ𝑡+1 = θ𝑡 − η ∇𝐿(θ𝑡) (14)

where θ represents the model parameters, η is the learning rate, and ∇𝐿 is the loss gradient.
To guarantee reproducibility, we provide a detailed breakdown of the hyperparameters and training configurations of our experiments in Table 2. We use AdamW, which is known for its good generalization on Transformer-based architectures; a weight decay of 0.01 helps to prevent overfitting. Beginning with a warm-up over the first five epochs, we apply a cosine annealing schedule to avoid early instability and then gradually decay the learning rate over the rest of training. A batch size of 16 provides memory-efficient yet stable updates. Gradient clipping at a norm of 1.0 ensures numerical stability when training deep ViT models, and the configuration is easy to replicate and adapt in future studies.

Table 2: Training hyperparameters

Parameter | Value
Optimizer | AdamW (decoupled weight decay)
Learning Rate | 5e-5 (decayed using cosine annealing)
Learning Rate Schedule | Cosine annealing with a warm-up for the first five epochs
Batch Size | 16
Weight Decay | 0.01
Dropout Rate | 0.1
Training Epochs | 10
Gradient Clipping | Norm clipped at 1.0
Loss Function | Binary cross-entropy loss
Validation Split | 80% train, 20% validation
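To complement Table 2, the following is a minimal sketch of a fine-tuning loop under these settings (AdamW, five warm-up epochs followed by cosine annealing, gradient clipping at a norm of 1.0). It assumes a PyTorch model with a two-class head and uses cross-entropy as the two-class analogue of the binary cross-entropy in Eq. (13), so it is an illustration rather than the exact training code.

```python
# Sketch of one fine-tuning run with the settings of Table 2 (assumes PyTorch).
import torch
import torch.nn as nn

def fine_tune(model, train_loader, epochs=10, warmup_epochs=5, device="cuda"):
    model.to(device)
    criterion = nn.CrossEntropyLoss()             # two-class analogue of Eq. (13)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
    # Linear warm-up for the first 5 epochs, then cosine annealing (Table 2).
    warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1,
                                               total_iters=warmup_epochs)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                        T_max=epochs - warmup_epochs)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, [warmup, cosine], milestones=[warmup_epochs])

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            # Gradient clipping at a norm of 1.0 for numerical stability.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
        scheduler.step()                           # one scheduler step per epoch
    return model
```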
The results of the proposed method are evaluated using the following metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (15)

Accuracy is a general measure of the overall correctness of the model. However, it is not robust to class imbalance: a model that always predicts "AI-generated" would still score highly if the dataset were skewed. An accuracy above 90% indicates that the model is working reasonably well overall, but it does not by itself show that the model is unbiased toward one class.

Precision = TP / (TP + FP) (16)

Precision measures how many of the detected AI-generated images are actually AI-generated. It is essential in applications where false positives must be minimized, such as incorrectly labelling authentic interior designs as AI-generated. A high precision (>90%) implies the model rarely misclassifies human-created images as AI-generated; a precision below roughly 80% (i.e., a high false-positive rate) would make the model too unreliable for commercial use.

Recall = TP / (TP + FN) (17)

Recall measures how well the model identifies AI-generated images without missing them. It is the key metric for applications where finding all AI-generated content is more important than avoiding false positives. A high recall (>90%) means the model misses few AI-generated images, whereas a low recall (<80%) means the model fails to detect many of them, resulting in many false negatives.

F1-Score = 2 ⋅ (Precision ⋅ Recall) / (Precision + Recall) (18)

The F1 score is a balanced metric that reflects the tradeoff between precision and recall. It is particularly suitable for AI image detection, where both false positives and false negatives should be minimized. A high F1 score (>90%) indicates that the model balances precision and recall well; a low F1 score (<80%) suggests the model is overfitting to one class (i.e., sacrificing precision or recall disproportionately).
Different ViT configurations are used: ViT-B16 (Base model, patch size 16 × 16), ViT-B32 (Base model, patch size 32 × 32), ViT-L16 (Large model, patch size 16 × 16), and ViT-L32 (Large model, patch size 32 × 32). Each configuration affects the balance between computational efficiency and detection accuracy.
Alternative hybrid transformer architectures, such as DeiT (Data-efficient Image Transformer) and the Swin Transformer, were considered but not included in this study, for the following reasons:
• DeiT models are optimized for smaller datasets, and their efficiency relies on knowledge distillation. Although they reduce training costs, they are less suitable for capturing the global dependencies needed for AI image authenticity verification because they rely on CNN-like inductive biases.
• Swin Transformers use hierarchical feature learning with shifted windows, which makes them efficient for object detection applications. Nevertheless, our main objective of global feature extraction is better served by standard ViTs owing to their pure self-attention mechanism.
Consequently, we did not explore hybrid transformers and instead examined the effects of patch size and model capacity on AI-generated image detection.
Figure 2 illustrates the proposed method's ability to classify images as AI-generated or human-created. Using Vision Transformers, visual tokens are used to classify images as either AI-created (T: AI) or human-created (T: Human); the predicted label (P: AI or P: Human) is shown below each classification. The model can distinguish between AI-generated and authentic human-created interior design images in different settings.
Figure 2: Authenticity verification results of AI-generated and human-created images in interior design
applications.
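Before turning to the experimental setup, the following sketch shows how the metrics of Eqs. (15)-(18) can be computed from validation predictions. It relies on scikit-learn's standard implementations and is only an illustration of the evaluation protocol, not the authors' code.

```python
# Sketch of the evaluation metrics in Eqs. (15)-(18), assuming scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """y_true / y_pred: lists of 0 (human-created) and 1 (AI-generated)."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),             # Eq. (15)
        "precision": precision_score(y_true, y_pred),            # Eq. (16)
        "recall":    recall_score(y_true, y_pred),                # Eq. (17)
        "f1":        f1_score(y_true, y_pred),                    # Eq. (18)
        "confusion_matrix": confusion_matrix(y_true, y_pred),     # TP/FP/TN/FN
    }

# Toy example with 8 validation labels; real runs use the 20% validation split.
print(evaluate([0, 0, 1, 1, 1, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1, 0]))
```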
4 Experimental setup

This research study fine-tuned Vision Transformers (ViTs) to classify human-created versus AI-created interior design images. The experiments were conducted with various ViT variants to account for model capacity and different patch sizes. The database of interior design images was compiled to be balanced, and the images were preprocessed to guarantee rigorous training and testing. The dataset of AI-vs-human images is available at https://www.kaggle.com/datasets/shirshaka/ai-vs-human-generated-images. Important values such as the learning rate, batch size, and evaluation criteria were tuned to ensure reliability, as shown in Table 3.
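A minimal sketch of the dataset preparation summarized in Table 3 (at most 1,000 samples per class and an 80/20 train-validation split) is given below; the folder layout and helper names are assumptions, since the exact data-loading code is not described.

```python
# Sketch of balanced sampling (<=1000 per class) and an 80/20 split (assumes
# torchvision); the class-subfolder layout under `root` is hypothetical.
import torch
from torch.utils.data import Subset, random_split
from torchvision.datasets import ImageFolder

def build_datasets(root, per_class_cap=1000, val_fraction=0.2, seed=42):
    full = ImageFolder(root)                     # class 0: human, class 1: AI
    # Keep at most `per_class_cap` samples of each class to balance the data.
    kept, counts = [], {}
    for idx, (_, label) in enumerate(full.samples):
        if counts.get(label, 0) < per_class_cap:
            kept.append(idx)
            counts[label] = counts.get(label, 0) + 1
    balanced = Subset(full, kept)
    # 80% training / 20% validation split with a fixed seed for reproducibility.
    n_val = int(len(balanced) * val_fraction)
    generator = torch.Generator().manual_seed(seed)
    train_set, val_set = random_split(balanced, [len(balanced) - n_val, n_val],
                                      generator=generator)
    # The transforms from the preprocessing sketch would be attached per split.
    return train_set, val_set
```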
Table 3: Overview of the experimental setting, including the model architectures used, details of the dataset and preprocessing, and the training and evaluation parameters applied in classifying human- and AI-generated interior design images.

Aspect | Details
Models | Vision Transformer (ViT) variants: ViT-B16 (Base model, patch size 16), ViT-B32 (Base model, patch size 32), ViT-L16 (Large model, patch size 16), ViT-L32 (Large model, patch size 32)
Pretraining | All models were pre-trained on ImageNet-21k.
Fine-tuning Task | Binary classification: Class 0 – human-generated images; Class 1 – AI-generated images
Dataset | Custom dataset of interior design images categorized as real (human) or fake (AI).
Sample Limitation | At most 1,000 samples per class.
Data Splitting | 80% training, 20% validation split.
Image Processing | Transformation pipeline: resize to 224×224 pixels, convert to tensor, normalize using the ImageNet mean and standard deviation.
Optimizer | Adam
Learning Rate | 5e-5
Batch Size | 16
Epochs | 10
Evaluation Metrics | Accuracy, precision, recall, and F1-score.
Validation Strategy | Evaluation performed after each epoch.

5 Results and analysis

For this task, we evaluate four Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) to distinguish between real interior design images and artificial ones generated by AI. This section presents the validation results and analysis. The models were compared on essential metrics such as loss, accuracy, F1 score, precision, recall, runtime, and computational efficiency, as reported in Table 4 and Figures 3-6. The results quantify the tradeoff between accuracy and efficiency across the model configurations, with smaller patch sizes (16×16) achieving higher accuracy and F1 scores and larger patch sizes (32×32) offering more computational throughput. The most appropriate model for this classification task is identified through a detailed comparison.

Table 4: Validation performance reached by the ViT models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on AI-generated image classification.

Metric | ViT-B16 | ViT-B32 | ViT-L16 | ViT-L32
Accuracy | 96.25% | 80.00% | 96.25% | 81.25%
F1 Score | 0.9625 | 0.8000 | 0.9625 | 0.8118
Precision | 0.9637 | 0.8002 | 0.9637 | 0.8175
Recall | 0.9625 | 0.8000 | 0.9625 | 0.8125
Loss | 0.1154 | 0.4970 | 0.1206 | 0.4469
Runtime (s) | 15.7407 | 15.3469 | 18.4198 | 15.1096
Samples per Second | 10.165 | 10.426 | 8.686 | 10.589
Steps per Second | 0.635 | 0.652 | 0.543 | 0.662
Figure 3: ViT-B16 model validation results over ten epochs, showing a decline in loss and convergence of accuracy, F1 score, precision, and recall around 96% at epoch 8.

Figure 4: Validation metrics of the ViT-B32 model over ten epochs, with the loss converging and accuracy, F1 score, precision, and recall plateauing around 80% by the final epoch.

Figure 5: ViT-L16 model validation metrics over ten epochs, converging quickly within three epochs, with a loss of around 0.12 and accuracy, F1 score, precision, and recall of around 96%.

Figure 6: Validation metrics of the ViT-L32 model over ten epochs, showing the loss declining to 0.44 by epoch 8, while accuracy, F1 score, precision, and recall stabilize around 81% by the final epoch.

The results of four Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) were examined as detectors for determining whether images are AI-generated or traditional interior design. The performance results, consisting of accuracy, F1 score, precision, recall, loss, runtime, and computational efficiency for each model, help identify which configurations are usable. A qualitative analysis follows, based on the results in Table 4 and the validation trends in Figures 3-6.
Models using the smaller 16×16 patch size overwhelmingly outperformed those using the larger 32×32 patch size. Their validation accuracy was 96.25%, with an F1 score of 0.9625, precision of 0.9637, and recall of 0.9625. These results demonstrate that these models can accurately discriminate between AI-generated and authentic images: the smaller patch size preserves finer details from which features can be extracted, allowing more accurate detection of subtle artefacts in AI-generated images.
On the other hand, ViT-B32 and ViT-L32, using the larger 32×32 patches, achieved significantly lower accuracy (80.00% and 81.25%) and F1 scores (0.8000 and 0.8118). These results suggest the models are limited by the coarser granularity, which explains the weaker classification performance of the 32×32 patch size option.
The validation graphs also show interesting differences in how quickly and efficiently each model converges. By the end of epoch 8, ViT-B16 (Figure 3) has steadily reduced its validation loss to 0.1154, and its accuracy, precision, recall, and F1 scores settle at around 96%, showing how robust and efficient its learning is.
As shown in Figure 5, ViT-L16 converges to its validation loss (0.1206) even more quickly, as early as epoch 3; its performance metrics also reach 96% by epoch three, affirming its capability to capture complex patterns in the data in fewer epochs. However, this comes at a higher computational price.
ViT-B32 (Figure 4) and ViT-L32 (Figure 6) take longer to converge, with losses of 0.4970 and 0.4469, respectively. These models reach precision and recall of
around 80-81%, whereas the smaller patch-size models reach their precision and recall plateau earlier.
On the other hand, the small patch-size models (ViT-B16, ViT-L16), although providing higher classification performance, incur higher computational costs. ViT-L16 has a runtime of 18.4198 seconds with the lowest throughput of 8.686 samples per second and 0.543 steps per second, reflecting its high computational complexity. Though less efficient than the 32×32 patch models, ViT-B16 processes 10.165 samples per second at a runtime of 15.7407 seconds, making it a good balance between performance and efficiency. A comparison of ViT-B32 and ViT-L32 shows that ViT-L32 is the most efficient, reaching a throughput of 10.589 samples per second and a runtime of 15.1096 seconds, which makes it the fastest. Nevertheless, the lower F1 scores and reduced accuracy of the 32×32 models make them less appropriate for high-precision tasks.
Further analyses of the precision and recall metrics highlight the tradeoff between models. The precision and recall values of both ViT-B16 and ViT-L16 are in the 96% range, meaning they carry a low risk of false positives and false negatives, making them ideal for tasks demanding high accuracy.
ViT-B32 and ViT-L32, however, have precision and recall values in the 80-81% range; while their consistency across metrics is good, the lower precision implies less reliability in accurately identifying AI-generated images. The validation metric trends provide additional clarity:
• ViT-B16 (Figure 3): with a growing number of epochs, it shows steady improvement and stable performance from epoch 8, an excellent balance between learning capacity and efficiency.
• ViT-L16 (Figure 5): it converges remarkably fast, stabilizing by epoch 3, but at a higher computational cost, making it an attractive solution when fast training is a top priority.
• ViT-B32 (Figure 4) and ViT-L32 (Figure 6): slower learning with limited ability to capture minute differences in the data; both exhibit gradual improvement over ten epochs.
The results reveal the tradeoff between accuracy and computational efficiency. ViT-B16 is the most balanced model, with reasonable throughput, runtime, and accuracy (96.25%). Equally accurate, ViT-L16 is too computationally intensive when accuracy is not the top concern. For tasks that demand higher computational efficiency (i.e., speed), ViT-B32 and ViT-L32 are favourable; however, since their reduced accuracy renders them unsuitable for high-precision use, the entire ViT family may be overkill for some applications. ViT-B16 thus appears to be the best model for detecting AI-generated images in interior design, as its tradeoff between accuracy and computational efficiency is superior. While ViT-L16 has a higher computational cost, its fast convergence and high accuracy make it ideally suited to scenarios seeking the highest precision, with a tradeoff in its computational cost. On the other hand, ViT-B32 and ViT-L32 take the path of efficiency over precision, making them good candidates for real-time applications where speed is more important than classification accuracy. This comprehensive comparison makes clear the importance of choosing the model configuration according to the specific needs of the task.
The Area Under the Curve (AUC) is a standard evaluation measure for classification tasks, summarizing the model's performance across different thresholds in a single value. It provides an overall score of model effectiveness by measuring the tradeoff between the True Positive Rate (sensitivity) and the False Positive Rate. In the context of authenticity verification, we use evaluation accuracy as a proxy for AUC, which allows the performance of the models to be compared directly in Figure 7.

Figure 7: Detecting AI-generated images in interior design applications by comparing AUC among Vision Transformer models (ViT-B16, ViT-L16, ViT-L32, and ViT-B32). ViT-B16 is the strongest, although the other models achieve comparable outcomes.

This study's results show that Vision Transformers (ViTs) outperform conventional CNN-based methods in detecting AI-generated interior design imagery. Comparing the models, the best-performing one, ViT-B16, achieved an accuracy of 96.25% and an F1 score of 0.9625, proving able to distinguish AI-generated images from real ones. While these results are promising, it is necessary to contextualize them by comparing them to prior AI-generated image detection work in other fields, such as medical imaging, digital art, and deepfake detection, as shown in Table 5.

Table 5: Contextual comparison of AI-generated image detection methods

Domain | Best Model | Accuracy | F1 Score | Key Observations
Medical Imaging | ViT-based Histopathology Model | 94.7% | - | ViTs effectively detect synthetic
(Arshed et medical ViT- 580 sec 10.2 GB 80.00% 0.8000
al., 2023) images B32
but ViT- 940 sec 16.8 GB 96.25% 0.9625
struggle L16
with ViT- 810 sec 14.3 GB 81.25% 0.8118
highly L32
high-
resolution The ViT-B16 configuration achieves the best tradeoff
textures. between accuracy and computational efficiency. ViT-L16
gets comparable accuracy but requires much more memory
Digital GAN- 85– - CNNs are
Art Based 92% effective and training time than Quilt. ViT-L16, ViT-B16, ViT-B32,
Authentic CNN but prone and ViT-L32 require less computational load than larger
ation Model to false patch sizes but offer lower accuracy. The results show that
the most practical model for real-world AI-generated
(Vivaldi positives
image detection in interior design is ViT-B16; they are
& Sutedja, due to
accurate and come with reasonable training time and
2024) intricate
artistic memory usage.
patterns. We also performed additional experimental
evaluations, using an imbalanced dataset and noisy inputs,
Deepfake ViT- - 0.95 ViTs
to test our models' robustness. In both tests, real-world
Detection Based excel at
samples are simulated, and ViTs are tested to see their
Deepfake capturing
stability in different data conditions. We had changed the
Detector subtle
class distributions (70% of AI-generated images, 30%
(Zhao et inconsiste
authentic images). ViT-B16 performance dropped slightly
al., 2023) ncies in
(Accuracy: 94.2%, F1 Score: 0.945). The model was
AI-
stable; thus, it was resilient to imbalanced data. We
generated
degraded the inputs using Gaussian noise (σ=0.05) and
human
random occlusions. However, ViT-B16 achieved high
faces.
accuracy (93.5%) while ViT-B32 and ViT-L32 decreased
Interior ViT-B16 96.25 0.96 ViT-B16
below 75%. Self-attention in ViTs helps retain essential
Design % 25 outperfor
features; however, larger patch sizes suffer from losing
(Our ms
fine details in noisy conditions. Inference on challenging
Study) existing
conditions confirms that ViT-B16 is the most robust
methods
model. Further work will be pursued to enhance the model
by
resilience with adversarial training techniques.
preserving
fine-
grained 6 Discussion
textures
Results from the experiment confirm the incredible
and
performance of Vision Transformers (ViTs) in
capturing
distinguishing AI-generated interior design images. For
long-
smaller patch sizes such as ViT-B16 and ViT-L16, we
range
achieve an impressive accuracy of 96.25% in identifying
dependen
subtle artefacts. This makes them an ideal choice for high-
cies.
precision authenticity verification. Similarly,
Table 5 compares training time, memory usage, configurations with larger patch sizes, such as ViT-B32
and model performance to ensure the computational and ViT-L32, optimize for speed at the expense of some
efficiency of different ViT configurations. The analysis accuracy. Real-time applications, or environments with
must identify the most reasonable model for detecting AI- resource constraints, apply generously to these
generated images in interior design concerning configurations. Our findings demonstrate that ViTs can be
computation cost and accuracy. scalable for other creative fields, such as architecture and
visual art. Future work will concentrate on designing
Table 5: Computational efficiency of ViT configurations hybrid architectures for optimal precision and efficiency.
Model Training Memory Accuracy F1 This work has shown that ViTs can be a powerful tool
Time Usage (%) Score for distinguishing AI-generated from human-generated
(per (GB) images in interior design. Its results highlight the promise
epoch, and pain of using them in this way, which can be extended
sec) to many other application areas. Across four ViT
ViT- 720 sec 12.5 GB 96.25% 0.9625 configurations (ViT-B16, ViT-B32, ViT-L16, and ViT-
B16 L32), we summarize the findings regarding the tradeoffs
between model accuracy, computational efficiency, and lighting conditions and, thus, are better suited for more
the nature of data representation. generalized AI detection frameworks.
Using smaller patch sizes (16×16) like ViT-B16 and However, the observed tradeoffs between accuracy and
ViT-L16, the models demonstrate superior performance efficiency indicate that task-specific model selection is
over all the metrics like accuracy, precision, recall, and F1 critical. High-precision applications may benefit from
score and reach values close to 96.25%. That is to say, smaller patch sizes and larger models; conversely,
those models are more capable of discerning the relatively computationally efficient configurations may prove
subtle inconsistencies and artefacts typical of artificial preferable for scenarios where scalability and speed are
images that are indistinguishable from reality in the human paramount in large-scale design database audits.
eye. ViTs display robust ability in this binary classification By demonstrating the effectiveness of ViTs in
problem by extracting detailed spatial and contextual differentiating two sets of images produced by AI in
features. interior design, this study lays the groundwork for
However, the computational demands of ViTs became developing more sophisticated AI authenticity verification
a more significant consideration. ViT-L16 converged algorithms. Through tailored model configurations to
faster (within three epochs) than ViT-B16, which achieved particular use cases, the tradeoffs between accuracy and
high accuracy, but its computation overheads—runtime efficiency can be worked through effectively, enabling
and throughput—make it less practical for resource- general use in the creative domain and further.
constrained environments. On the other hand, ViT-B16 The current AI-generated image detection techniques
also achieved comparable accuracy but with relatively mainly depend on a CNN-based model with local receptive
lower computational costs. Given applications such as fields to extract hierarchical spatial features. While CNNs
interactive design tools or automated verification systems have identified GAN artefacts when such CNNs are
that require real-time processing, the efficiency gains applied to high-resolution photo-realistic synthetic interior
enabled by models like ViT-B32 may be preferable to less design images, traditional, deepfake, or low-quality
precise models, though they would be less accurate. synthetic artefacts are absent from the synthetic images.
The results are essential for real-world deployment in The CNNs cannot find them. On the other hand, ViTs like
interior design and related fields. Integrating high- ViT-B16 use self-attention mechanisms that work across
accuracy models such as ViT-B16 into quality assurance the entire image to find inconsistencies that CNNs would
pipelines can assure the authenticity of design assets to miss. Comparative performance between ViT-B16 and the
verify usage and prevent misrepresentation. Like ViTs, the paper reported in previous literature is presented in Table
versatility of ViTs in processing diverse datasets shows 6.
how ViTs are adaptable to diverse design styles and
Table 6: Comparative performance analysis of ViT-B16 vs CNN-based methods.
Model Architecture Accuracy F1 Key Limitations
Score Strengths
CNN-Based Convolutional 85–92% 0.85– Intense Struggles with
Methods feature 0.91 spatial long-range
extraction feature dependencies,
learning, poor
efficient on generalization to
small-scale high-quality AI-
datasets generated
images
Hybrid CNN for local 89–94% 0.89– Balances Computationally
CNN- features, 0.94 CNN expensive,
Transformer Transformer efficiency complex
for long- with training process
range context Transformer's
self-attention
ViT-B16 Vision 96.25% 0.9625 Captures Requires
(Our Model) Transformer both local significant
with small and global pretraining and
patch size dependencies higher
(16×16) with high computational
accuracy on resources
high-quality
AI images
We also observe that the performance of ViT depends on patch size. Our results show that models with smaller patch sizes, like ViT-B16 and ViT-L16, had significantly better accuracy than models with bigger patch sizes (like
ViT-B32 and ViT-L32). Even for ViT-B32, the accuracy this research attempts to contribute to AI authenticity
dropped to 80.00%, and for ViT-L32, it dropped to verification in interior design using transformer-based
81.25%, indicating that the solutions fell considerably image classification. Future work will consider improving
behind their small patch counterpart. This discrepancy is computational efficiency, enhancing the set of images used
because smaller patches can preserve fine-grained details. in the dataset with more diverse AI-generated photos, and
When an image is tokenized into larger patches, the loss of combining convolutional and transformer-based models.
information can occur due to aggregation of critical spatial Finally, we will investigate adversarial robustness for
information like subtle shading, textural variations, and improving the model's resilience against evolving
delicate contours. The interior design images are of generative techniques. Such advances will further bolster
intricate patterns and highly detailed material textures, for AI image detection, as it is utilized in digital content
which feature extraction is better maintained with small verification.
patch sizes. Furthermore, the self-attention module
receives fewer tokens to process in larger areas, which can References
impact the model learning the distinction between
authentic vs AI-generated images. It sets smaller patch [1] J. Hutson, J. Lively, B. Robertson, P. Cotroneo, and
sizes, leading to denser tokenization, so the ViT model can M. Lang, Creative Convergence: The AI
retain more information and distinguish between the real Renaissance in Art and Design. Springer Nature,
world and AI-generated designs. pp. 1–19, Nov. 2023, doi: 10.1007/978-3-031-
The results show that ViTs outperform CNN-based 45127-0_1
models in detecting AI-generated images; however, [2] D. Saxena and J. Cao, "Generative adversarial
several limitations should be considered. Even though the networks (GANs) challenges, solutions, and future
data is diverse, there could still be latent biases in the directions," ACM Computing Surveys (CSUR), vol.
lighting styles. Through specific aesthetic design 54, no. 3, pp. 1-42, 2021.
preferences, the model may figure out the detection of https://doi.org/10.1145/3446374
style incoherencies rather than actual AI artefacts. [3] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M.
However, future work will have to cross-domain on Shah, "Diffusion models in vision: A survey," IEEE
datasets generated by different AI models (e.g., GANs vs. Transactions on Pattern Analysis and Machine
Diffusion models) to validate their generalization Intelligence, vol. 45, no. 9, pp. 10850-10869, 2023.
properties. However, ViT-B16 reaches high accuracy but https://doi.org/10.1109/tpami.2023.3261988
still consumes ample computational resources (12.5GB [4] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S.
memory for each epoch). The ViT-based detection systems Khan, and M. Shah, "Transformers in vision: A
deployed on edge devices or real-time applications may be survey," ACM computing surveys (CSUR), vol. 54,
performable with model compression techniques like no. 10s, pp. 1-41, 2022.
knowledge distillation or quantization. Potential Evasion https://doi.org/10.1145/3505244
by Advanced AI Models As soon as AI-generated images [5] N. Anantrasirichai, F. Zhang, and D. Bull,
become fancier, detection models must change. The AI "Artificial Intelligence in Creative Industries:
images could be created using adversarial attacks to avoid Advances Prior to 2025," arXiv preprint
detection, and the training process for models would need arXiv:2501.02725, 2025.
to be continuously updated. These limitations provide https://doi.org/10.1007/s10462-021-10039-7
future improvements in AI-generated image detection, [6] M. A. Moharram and D. M. Sundaram, "Land use
which is scalable and adaptive. and land cover classification with hyperspectral
data: A comprehensive review of methods,
7 Conclusion challenges and future directions," Neurocomputing,
vol. 536, pp. 90-113, 2023.
For interior design, this study shows the viability of Vision https://doi.org/10.1016/j.neucom.2023.03.025
Transformers (ViTs) as a method to differentiate AI- [7] K. Han et al., "A survey on vision transformer,"
generated images from human-made designs. We then find IEEE transactions on pattern analysis and machine
a clear tradeoff between accuracy and computational intelligence, vol. 45, no. 1, pp. 87-110, 2022.
efficiency by fine-tuning multiple ViT configurations [8] G. Bansal, A. Nawal, V. Chamola, and N.
(ViT-B16, ViT-B32, ViT-L16, ViT-L32). Classifiers Herencsar, "Revolutionizing visuals: the role of
using smaller patches (patches size: 16×16) performed generative AI in modern image generation," ACM
better, and ViT-B16 achieved 96.25% accuracy and Transactions on Multimedia Computing,
0.9625 (F1 score). The key outcome of these results is that Communications and Applications, vol. 20, no. 11,
delicate feature extraction improves AI image detection, pp. 1-22, 2024.
and ViT-B16 is the most appropriate model for real-world https://doi.org/10.1109/tpami.2022.3152247
applications. On the other hand, with computational [9] A. Kulkarni, A. Shivananda, A. Kulkarni, and D.
benefit, higher patch size models (such as 32×32) do have Gudivada, "Diffusion Model and Generative AI for
worse performance but are better suited for lower precision Images," in Applied Generative AI for Beginners:
applications. Due to our findings regarding the necessity Practical Knowledge on Diffusion Models,
of selecting models according to task requirements and ChatGPT, and Other LLMs: Springer, 2023, pp.
balancing accuracy, efficiency, and resource constraints,
155-177. https://doi.org/10.1007/978-1-4842- transformers," IEEE Access, vol. 11, pp. 123433–
9994-4_8 123444, 2023. doi: 10.1109/access.2023.3329952
[10] S. Bengesi, H. El-Sayed, M. K. Sarker, Y. [22] J. Maurício, I. Domingues, and J. Bernardino,
Houkpati, J. Irungu, and T. Oladunni, "Comparing vision transformers and convolutional
"Advancements in Generative AI: A neural networks for image classification: A
Comprehensive Review of GANs, GPT, literature review," Applied Sciences, vol. 13, no. 9,
Autoencoders, Diffusion Model, and p. 5521, 2023.
Transformers," IEEE Access, vol. 12, pp. 69812– https://doi.org/10.3390/app13095521
69837, 2024, doi: 10.1109/access.2024.3397775 [23] T. Walczyna, D. Jankowski, and Z. Piotrowski,
[11] D. Gragnaniello, D. Cozzolino, F. Marra, G. Poggi, "Enhancing Anomaly Detection Through Latent
and L. Verdoliva, "Are GAN generated images Space Manipulation in Autoencoders: A
easy to detect? A critical analysis of the state-of- Comparative Analysis," Applied Sciences, vol. 15,
the-art," in 2021 IEEE international conference on no. 1, p. 286, 2024.
multimedia and expo (ICME), 2021: IEEE, pp. 1-6. https://doi.org/10.3390/app15010286
https://doi.org/10.1109/icme51207.2021.9428429 [24] D. H. Hagos, R. Battle, and D. B. Rawat, "Recent
[12] T. Arora and R. Soni, "A review of techniques to advances in generative ai and large language
detect the GAN-generated fake images," models: Current status, challenges, and
Generative Adversarial Networks for Image-to- perspectives," IEEE Transactions on Artificial
Image Translation, pp. 125-159, 2021. Intelligence, vol. 5, no. 12, pp. 5873–5893, Dec.
https://doi.org/10.1016/b978-0-12-823519- 2024, doi: 10.1109/tai.2024.3444742.
5.00004-x [25] S. P. J. Christydass, N. Nurhayati, and S.
[13] A. Khan et al., "A survey of the vision transformers Kannadhasan, Hybrid and Advanced Technologies:
and their CNN-transformer based variants," Proceedings of the International Conference on
Artificial Intelligence Review, vol. 56, no. Suppl 3, Hybrid and Advanced Technologies (ICHAT 2024),
pp. 2917-2970, 2023. April 26-28, 2024, Ongole, Andhra Pradesh, India
https://doi.org/10.1007/s10462-023-10595-0 (Volume 2). CRC Press, 2025.
[14] A. Rahali and M. A. Akhloufi, "End-to-end https://doi.org/10.1201/9781003559115
transformer-based models in textual-based NLP," [26] M. M. Meshry, "Neural rendering techniques for
AI, vol. 4, no. 1, pp. 54-110, 2023. photo-realistic image generation and novel view
https://doi.org/10.3390/ai4010004 synthesis," University of Maryland, College Park,
[15] H. Bougueffa et al., "Advances in AI-Generated 2022.
Images and Videos," International Journal of [27] S. Susan and A. Kumar, "The balancing trick:
Interactive Multimedia & Artificial Intelligence, Optimized sampling of imbalanced datasets—A
vol. 9, no. 1, 2024. brief survey of the recent State of the Art,"
https://doi.org/10.9781/ijimai.2024.11.003 Engineering Reports, vol. 3, no. 4, p. e12298, 2021.
[16] A. S. Paladugu, A. Deodeshmukh, A. R. Shekatkar, https://doi.org/10.1002/eng2.12298
I. Kandasamy, and V. WB, "Detection of [28] X. Jiang and Z. Ge, "Data augmentation classifier
Artificially Generated Images Using Shifted for imbalanced fault classification," IEEE
Window Transformer with Explainable Ai," Transactions on Automation Science and
Available at SSRN 5025934. Engineering, vol. 18, no. 3, pp. 1206-1217, 2020.
https://doi.org/10.2139/ssrn.5025934 https://doi.org/10.1109/tase.2020.2998467
[17] L. Yin et al., "Convolution-Transformer for Image [29] O. Rainio, J. Teuho, and R. Klén, "Evaluation
Feature Extraction," CMES-Computer Modeling in metrics and statistical tests for machine learning,"
Engineering & Sciences, vol. 141, no. 1, 2024. Scientific Reports, vol. 14, no. 1, p. 6086, 2024.
https://doi.org/10.32604/cmes.2024.051083 https://doi.org/10.1038/s41598-024-56706-x
[18] H. Tang, D. Liu, and C. Shen, "Data-efficient multi- [30] P. Fergus and C. Chalmers, "Performance
scale fusion vision transformer," Pattern evaluation metrics," in Applied Deep Learning:
Recognition, vol. 161, p. 111305, 2025. Tools, Techniques, and Implementation: Springer,
https://doi.org/10.1016/j.patcog.2024.111305 2022, pp. 115-138. https://doi.org/10.1007/978-3-
[19] W. Zheng, S. Lu, Y. Yang, Z. Yin, and L. Yin, 031-04420-5_5
"Lightweight transformer image feature extraction
network," PeerJ Computer Science, vol. 10, p.
e1755, 2024. https://doi.org/10.7717/peerj-cs.1755
[20] L. Scabini, A. Sacilotti, K. M. Zielinski, L. C.
Ribas, B. De Baets, and O. M. Bruno, "A
Comparative Survey of Vision Transformers for
Feature Extraction in Texture Analysis," arXiv
preprint arXiv:2406.06136, 2024.
[21] D. Konstantinidis, I. Papastratis, K. Dimitropoulos,
and P. Daras, "Multi-manifold attention for vision
https://doi.org/10.31449/inf.v49i16.7839 Informatica 49 (2025) 151–170 151
Efficient Logistics Path Optimization and Scheduling Using Deep
Reinforcement Learning and Convolutional Neural Networks
Yan Yang1, 2, *, Kang Wang1, 2
1School of Economics and Management, Jiaozuo University, Jiaozuo 454000, Henan, China
2Graduate School, University of the East, Manila 0900, Philippines
E-mail: yangyan_edu@outlook.com
*Corresponding author
Keywords: CNN, DRL, logistics path optimization, real-time scheduling, robustness scoring
Received: December 17, 2024
With the rapid development of e-commerce and online shopping, the logistics industry is facing
unprecedented challenges. Traditional logistics path-planning methods, such as SPA, HA, GA, etc.,
struggle to cope with the complex and ever-changing logistics environment. To address this issue, this
study proposes an innovative model that combines Deep reinforcement learning (DRL) with a
Convolutional neural network (CNN) to achieve efficient logistics path optimization. In this research, a
detailed analysis and pre-processing of the public datasets, the City Logistics Dataset (CLDS) and the
Traffic Status Dataset (TSDS), were carried out to construct a model capable of effectively handling
diverse logistics environments. Six baseline methods, namely the classic shortest path algorithm (SPA),
heuristic algorithm (HA), genetic algorithm (GA), rule-based method (RBM), traditional deep
reinforcement learning method (TDRM), and the most advanced deep learning method (ADLM), were
selected for comparison. The experimental results indicate that the proposed model performs excellently
across various environments. For instance, in suburban areas, it achieves a path length of 180 kilometers,
a completion time of 120 minutes, a punctuality rate of 92%, and a dispatch success rate of 95%. In urban
settings, the path length is 200 kilometers, the completion time is 150 minutes, the punctuality rate is 90%,
and the dispatch success rate is 93%. On highways, it reaches a path length of 170 kilometers, a
completion time of 110 minutes, a punctuality rate of 93%, and a dispatch success rate of 95%. Compared
with the baseline methods, the model shows significant improvements in key metrics such as path length,
completion time, punctuality, and dispatch success rate. Additionally, it outperforms them in terms of
computation time and robustness scores, demonstrating great potential for practical applications.
Povzetek: Opisan je izvrni model za optimizacijo logističnih poti in sprotno razporejanje z združitvijo
globokega utrjevalnega učenja (DRL) in konvolucijskih nevronskih mrež (CNN).
1 Introduction intelligence technology, domestic and foreign scholars are
actively exploring the application of AI technology in
With the advancement of global economic integration logistics path optimization, aiming to improve logistics
and the rapid development of e-commerce, the logistics efficiency through intelligent algorithms. Although
industry is facing unprecedented challenges and traditional methods such as linear programming can
opportunities. Efficient, fast and accurate delivery of provide effective solutions, they are powerless in the face
goods has become one of the core elements of corporate of large-scale dynamic problems [3, 4]. In contrast, AI
competition. However, finding the optimal delivery path technologies such as genetic algorithms (GA) and ant
in a complex geographical environment and achieving colony algorithms (ACA) have shown stronger
instant scheduling under dynamically changing conditions exploration capabilities and adaptability, especially in
has always been a difficult problem for logistics solving the traveling salesman problem (TSP) [5]. In
companies. Although traditional mathematical addition, the advancement of deep learning technology,
programming-based methods perform well under static especially the application of long short-term memory
conditions, they have obvious limitations in dealing with networks (LSTM), makes it possible to predict traffic
real-time changing traffic conditions and emergencies [1]. conditions and realize dynamic path planning.
Therefore, it is particularly important to explore a new Reinforcement learning (RL) enables intelligent agents to
logistics path optimization and real-time scheduling make optimal decisions in a constantly changing
solution that can adapt to complex environments and has environment by simulating the learning process. These
self-learning capabilities [2]. technologies have been widely used in multiple scenarios
Logistics path optimization is a core link in logistics such as urban distribution, cross-border logistics, and cold
management and is crucial to improving logistics service chain logistics, helping to optimize delivery routes, predict
quality and reducing operating costs. Against the customs clearance time, monitor temperature changes, etc.
backdrop of the rapid development of artificial [6, 7]. Despite this, the application of AI in logistics path
optimization still faces challenges in data privacy the punctuality rate, which is expected to increase the
protection, algorithm real-time and robustness. With the punctuality rate to more than 95%; at the same time,
advancement of technology and the evolution of social increase the scheduling success rate to 92%. In terms of
needs, more innovative solutions are expected to emerge computing efficiency, the model calculation time will be
in the future, continuously promoting the intelligent controlled within 15 seconds to ensure real-time
development of the logistics industry [8]. performance; and when facing complex environmental
In view of the above background, this study aims to disturbances, the robustness score will be maintained
explore how to use neural network technology to improve above 8.5 points (out of 10 points), comprehensively
the existing logistics path optimization algorithm and improving the comprehensive performance of the logistics
propose a set of real-time scheduling strategies suitable for scheduling system and providing strong technical support
dynamic environments. Specifically, we will first analyze for the intelligent development of the logistics industry.
the main problems and their causes in logistics Deep reinforcement learning (DRL) can continuously
distribution, and then introduce the basic principles of optimize strategies by interacting with the environment to
neural networks and their advantages in solving these cope with real-time changes; convolutional neural
problems. Then, we will design and implement a neural networks (CNN) can extract effective features from
network-based path optimization model that can respond complex geographic and traffic data. The combination of
quickly after receiving real-time data input and adjust the the two allows the model to better perceive real-time
distribution plan. Finally, we will verify the effectiveness information and make reasonable decisions quickly.
of the model through experiments and explore its Therefore, the use of specific artificial intelligence
applicability and limitations in different application methods is the key to solving real-time logistics problems.
scenarios [9, 10]. They can make up for the shortcomings of traditional
On the other hand, traditional mathematical methods and improve the efficiency and flexibility of
programming has extremely high requirements for data logistics scheduling.
integrity and accuracy. In logistics data, there are often The novelty of combining DRL and CNN for logistics
problems such as missing data, errors, or outliers. For path optimization and real-time scheduling lies in the
example, the weight of goods and order time in logistics unique complementary advantages. Traditional methods
distribution information may be deviated due to recording find it difficult to take into account both geospatial feature
errors or equipment failures, and the traffic volume and extraction and dynamic strategy adjustment. In this study,
average speed in traffic status data may also have CNN's powerful spatial feature extraction capability can
measurement errors. Traditional mathematical accurately capture key information in the logistics
programming methods lack effective means to deal with geographical environment, such as distribution of
these incomplete or inaccurate data, and direct use may distribution points, traffic network topology, etc. DRL can
lead to a significant reduction in the reliability of model dynamically adjust strategies based on these features to
results. In addition, when faced with large-scale, high- adapt to the ever-changing logistics environment, such as
dimensional data, the computational complexity of real-time traffic conditions, order changes, etc. Although
traditional mathematical programming methods will many papers have similar combinations, this study focuses
increase dramatically, the solution time will be on complex logistics scenarios, deeply integrates the
significantly longer, and it may even be impossible to advantages of the two, and achieves more efficient and
solve the problem, making it difficult to meet the needs of intelligent path planning and scheduling decisions. This is
real-time logistics scheduling. a unique contribution.
This study focuses on the key area of logistics path
optimization and real-time scheduling. At present, 2 Theoretical basis and literature
traditional logistics scheduling methods have exposed
many shortcomings when dealing with complex and review
changing logistics environments, and it is difficult to meet
the needs of efficient and accurate distribution. Based on 2.1 Basic concepts of logistics path
this, we put forward the core research question: How to optimization
use advanced neural network technology to deeply
Research in the field of logistics continues to develop
innovate the existing logistics path optimization algorithm
and innovate, and many scholars have conducted in-depth
to achieve efficient planning and real-time dynamic
discussions from different angles. Alkan and Kahraman
scheduling of logistics paths?
(2023) used the multi-expert Fermat fuzzy hierarchical
Around this issue, we put forward the following
analysis method in the literature [9] to prioritize the supply
specific hypothesis: The model that innovatively
chain digital transformation strategy, providing a
integrates DRL and CNN can fully tap the advantages of
decision-making basis for the digital development of the
both and effectively deal with complex geographic spatial
logistics supply chain, helping logistics companies to
information and dynamically changing logistics
grasp key strategies and optimize operational processes in
environments. Compared with traditional methods, this
the digital wave. Lee et al. (2019) proposed an
model is expected to shorten the length of logistics
endosymbiotic evolutionary algorithm in the literature
distribution paths by an average of about 20% in various
[10] to solve the problem of the integrated model of
scenarios; significantly shorten the delivery completion
vehicle routing and truck scheduling with a cross-dock
time by an average of 30 minutes; significantly improve
system, providing new ideas and methods for path planning and scheduling in the logistics distribution link, which is of great significance for improving logistics efficiency and reducing costs.
Logistics path optimization refers to finding the best path from the starting point to the end point under a series of constraints, so as to minimize transportation costs, time, or other specified goals. This process usually involves multi-objective optimization, which may include minimizing total mileage, reducing fuel consumption, and shortening delivery time. Logistics path optimization problems can be theoretically classified as combinatorial optimization problems; typical forms include the traveling salesman problem (TSP) and the vehicle routing problem (VRP) [11]. These problems become extremely complex at large scale, and it is difficult to find the global optimal solution. Therefore, researchers have developed a variety of heuristic and meta-heuristic algorithms, such as genetic algorithms, simulated annealing, and ant colony algorithms, to approximate solutions to such problems. These algorithms seek a satisfactory solution rather than an absolute optimum through iterative search [12].
Logistics path optimization is not limited to determining a single path; it also includes issues such as multi-path selection and multi-vehicle scheduling. With the growth of logistics business, how to efficiently allocate resources in a large-scale network has become one of the key challenges. To meet this challenge, researchers have begun to explore new solutions, such as introducing machine learning into path planning and using historical data to predict future transportation demand, thus formulating more reasonable distribution plans in advance. In addition, with the development of Internet of Things (IoT) technology, the large amount of real-time data generated in logistics systems has also provided new possibilities for path optimization [13].
Logistics path optimization refers to finding the best path from the starting point to the destination under a series of constraints, aiming to minimize transportation costs, time, or other specific objectives. Previous descriptions have mostly treated time as a static constraint. However, in real-world logistics scenarios, real-time responsiveness plays a crucial role. The logistics environment is in a state of dynamic change: traffic conditions can change rapidly, for example through sudden traffic accidents or temporary road closures, which can render the originally planned path no longer optimal. Real-time responsiveness is not just a matter of time; it also concerns the timely response to dynamic elements in the logistics environment. For example, by obtaining real-time traffic congestion information through a traffic monitoring system, the path can be adjusted immediately when congestion is detected on a certain route, ensuring transportation efficiency. This ability to adjust dynamically to real-time changes should be an important part of the concept of logistics path optimization, thus forming a closer logical connection with the subsequent real-time scheduling content.

2.2 Overview of the application of neural networks in logistics

As an artificial intelligence technology that imitates the working mode of the biological brain, neural networks have shown great potential in many fields. In the logistics industry, neural networks are widely used in multiple links such as route optimization, demand forecasting, and inventory management. For example, convolutional neural networks (CNN) can extract useful information from large amounts of image data for automatic identification of cargo labels, thereby speeding up sorting. Long short-term memory networks (LSTM) can process time series data, predict future demand fluctuations, and help companies prepare in advance [14, 15]. Reinforcement learning (RL) can dynamically adjust strategies based on historical behaviors and reward signals to optimize the vehicle's delivery path.
In recent years, researchers have also explored how to combine neural networks with other algorithms to solve more complex logistics problems. For example, some researchers combined genetic algorithms with neural networks to form a hybrid model for multi-objective vehicle routing problems; the results showed that this method achieved a good balance between complexity and solution quality. In addition, graph neural networks (GNNs) are used to analyze the topological structure of logistics networks, predict traffic flow by learning the relationships between nodes, and then guide dynamic path planning.
Neural networks are applied in multiple aspects of logistics management, including route optimization, demand forecasting, and inventory management. Route optimization has been introduced above; in demand forecasting and inventory management, neural networks also play significant roles. In demand forecasting, recurrent neural networks (RNNs) or their variant, long short-term memory networks (LSTMs), can be used to analyze historical order data. These networks can capture long-term dependencies in time-series data to predict future order demands at different time intervals. For example, by analyzing sales data from the past year, the order volume during upcoming holidays can be predicted, allowing advance inventory preparation and logistics planning. In inventory management, autoencoders and other neural network structures can be used to detect inventory anomalies: by learning the data features of normal inventory states, an alarm can be issued in a timely manner when inventory levels fluctuate abnormally. In the logistics path optimization use case of this study, CNNs can be used to extract features from geospatial data to help identify the logistics characteristics of different regions; LSTMs can combine time-series traffic data to predict the impact of future traffic conditions on paths; and reinforcement learning can be used to dynamically select the optimal path based on different environmental states, making the applications of these neural networks closely related to the research use case.
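As a concrete illustration of the demand-forecasting use described above, the following is a minimal sketch (not the authors' implementation) of an LSTM that maps a window of past daily order counts to a next-day forecast. The window length, layer sizes, and synthetic training data are assumptions made purely for illustration.

import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    """Minimal LSTM regressor: a window of past order counts -> next-step demand."""
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                    # x: (batch, window, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])      # predict from the last hidden state

# Synthetic daily order counts with a weekly pattern plus noise (illustration only).
torch.manual_seed(0)
t = torch.arange(400, dtype=torch.float32)
series = 100 + 20 * torch.sin(2 * torch.pi * t / 7) + torch.randn(400) * 5
window = 28
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

model = DemandLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("final training MSE:", loss.item())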
2.3 Development history of real-time scheduling technology

Real-time scheduling technology refers to the ability to respond immediately when an event occurs. In the field of logistics, real-time scheduling is crucial for dealing with unforeseen situations, such as sudden traffic jams and road closures caused by weather changes. Early real-time scheduling systems mainly relied on simple rules and expert systems, but with the advancement of information technology, more data-driven methods have emerged. For example, technology based on model predictive control (MPC) can optimize operations over a short future horizon to ensure that the system is always in its best operating state [16].
With the enhancement of computing power and the development of big data technology, modern real-time scheduling systems are no longer limited to simple rule matching; they are able to predict future state changes by learning patterns in historical data and adjust scheduling strategies accordingly. For example, Wu et al. [17] used deep reinforcement learning to implement real-time scheduling that can adaptively adjust the paths of mobile robots in dynamic environments, thereby improving the flexibility and responsiveness of the system. In addition, cloud computing and edge computing have made real-time scheduling more feasible because they provide powerful computing resources to process massive data and keep delays to a minimum.
This paper introduces real-time scheduling as a strategy to deal with "unforeseen situations" such as traffic jams. However, real-time scheduling should not exist in isolation; it should be an integral part of the overall logistics path optimization problem. In actual logistics operations, real-time scheduling and path optimization influence and reinforce each other. When encountering unforeseen situations like traffic jams, real-time scheduling needs to be adjusted dynamically on the foundation of path optimization. For example, if the originally planned path cannot reach the destination on time due to a traffic jam, the real-time scheduling system should re-plan the optimal path according to the current traffic conditions and the remaining order information. At the same time, path optimization should also allow for real-time scheduling and reserve a certain degree of flexibility in path planning so that rapid adjustment is possible in emergencies. Model predictive control (MPC) is a method that predicts the future behavior of a system based on a model and optimizes control strategies. In this study, MPC is closely related to real-time scheduling and path optimization: it can use real-time traffic data and the logistics system model to predict traffic conditions and logistics demand changes in the near future, thereby adjusting path planning and scheduling strategies in advance. For example, if MPC predicts that a certain road will experience severe congestion in the next hour, the system can plan a detour route in advance to avoid getting the vehicle stuck in the jam and to improve the efficiency of logistics transportation.

2.4 Review and analysis of related research literature

In recent years, many studies have applied advanced computing technologies to logistics path optimization and real-time scheduling. For example, Ren et al. [18] proposed a hybrid method combining deep reinforcement learning and a genetic algorithm to solve the multi-objective vehicle routing problem. Experiments show that this method can not only effectively handle multiple optimization objectives, but also achieve a good balance between complexity and solution quality. At the same time, Yang et al. [19] used a graph neural network (GNN) to analyze the structure of the urban traffic network and proposed a dynamic path planning framework that can continuously update the optimal path under changing traffic conditions. Studies have shown that this method offers significant improvements in path update speed and path quality compared with traditional algorithms.
Although existing research has made significant progress, some challenges remain. The first is data privacy and security: since a large amount of sensitive information is involved in the logistics system, ensuring the secure transmission and storage of data is an important task. The second is the interpretability of the algorithms: although deep learning models perform well in many cases, they are often black-box models that lack transparency, which limits their application in certain industries (such as healthcare) [20, 21]. The third is the adoption of the technology: although academia has proposed many innovative solutions, there are still relatively few actual deployments in industry, possibly due to factors such as technology maturity and cost-effectiveness.
"Data privacy and security," "algorithm interpretability," and "technology maturity and cost-effectiveness" are challenges that cannot be ignored in research on logistics path optimization. In terms of data privacy and security, the City Logistics Data Set (CLDS) and Traffic State Data Set (TSDS) used in this study contain a large amount of sensitive information, such as customer addresses and order details. To protect data privacy, encryption can be used to secure the data during transmission and storage. At the same time, a strict data access permission management mechanism should be established so that only authorized personnel can access and process the data.
Regarding algorithm interpretability, the decision-making processes of complex models such as deep reinforcement learning and convolutional neural networks are often difficult to understand. To improve interpretability, methods such as feature importance analysis and decision trees can be used to explain the model's decision-making process. For example, through feature importance analysis, it can be understood which input features have the greatest impact on path selection, enabling decision-makers to better understand the basis of the model's decisions.
In terms of technology maturity and cost-effectiveness, a comprehensive evaluation of the adopted technologies is required. Although deep reinforcement
learning and convolutional neural networks have great potential in logistics path optimization, the application of these technologies requires certain computing resources and professional knowledge. Therefore, in practical applications, it is necessary to weigh technology maturity and cost-effectiveness and select the most suitable technology solution. For example, by comparing the computational complexity and performance indicators of different algorithms, an algorithm with high computational efficiency and low cost can be selected. The specific research status is shown in Table 1.
Table 1: Research status
Research Method | Key Indicators | SOTA Positioning
SPA | Path length, completion time, on-time rate, scheduling success rate | Fast in finding the shortest path in simple static scenarios
HA | Path length, completion time, on-time rate, scheduling success rate | Quick in finding approximate solutions for large-scale problems
GA | Path length, completion time, on-time rate, scheduling success rate | Outstanding in solving complex optimization problems
RBM | Path length, completion time, on-time rate, scheduling success rate | Fast decision-making in known simple environments
TDRM | Path length, completion time, on-time rate, scheduling success rate | Advantageous in handling dynamic environments
ADLM | Path length, completion time, on-time rate, scheduling success rate | Currently leading in the application of deep learning
This Study (DRL + CNN) | Path length, completion time, on-time rate, scheduling success rate | Surpassing existing methods in multiple indicators
3 Research methods and model construction

3.1 Data collection and preprocessing

Data collection and preprocessing are key steps to ensure the smooth progress of the subsequent modeling work. This study mainly relies on public datasets for experimental verification and model training. The reason for choosing public datasets is that they provide a wide range of data sources, cover different types of real scenarios, and help improve the generalization ability of the model. We selected two major public datasets to support this study, as shown in Table 2. (1) City Logistics Data Set (CLDS). This dataset contains logistics distribution information from multiple European cities, including the time, location, cargo type, and weight of each order. These data reflect actual urban logistics operations and are very suitable for training and testing our models. (2) Traffic State Data Set (TSDS). This dataset provides information on the status of urban traffic in different time periods, including traffic flow, average speed, and road congestion. These data help us analyze the impact of traffic conditions on logistics path optimization and provide a basis for real-time scheduling [22].
The selection of the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS) is based on multiple considerations. The CLDS dataset covers detailed information on urban logistics distribution, including the geographical locations of distribution points, order times, and quantities. Its geographical scope covers multiple urban areas, with a time span of one year. The dataset contains 1 million records and 20 columns of different attribute information. These data reflect the actual situation of urban logistics well and provide rich order and geographical location information for logistics path optimization. The TSDS dataset focuses on traffic status data, including traffic flow and vehicle speeds on different roads at different time intervals. Its time granularity is 15 minutes, and its geographical scope matches that of the CLDS dataset. The dataset has 800,000 records and 15 columns of attributes. It reflects real-time changes in traffic conditions, which is crucial for real-time path optimization. The characteristics of these two datasets are highly relevant to the model requirements of this study: the model needs to plan paths based on order information and geographical locations, for which the CLDS dataset provides the necessary basic data; at the same time, the model needs to consider the impact of real-time traffic conditions on paths, for which the TSDS dataset provides real-time traffic data support. By combining these two datasets, a model that better conforms to the actual logistics environment can be constructed, improving the accuracy and real-time performance of path optimization.
Table 2: Dataset information
Dataset name | Data Types | Geographical range | Time Range | Sample size | Key Features
City Logistics Data Set (CLDS) | Logistics and delivery information | Many cities in Europe | January 2018 to December 2019 | 100,000+ | Order time, location, cargo type, weight, etc.
Traffic State Data Set (TSDS) | Traffic status information | North American major cities | January 2019 to December 2020 | 50,000+ | Traffic volume, average speed, road congestion, etc.
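The cleaning, merging, and feature-engineering steps applied to these two datasets are described in the following paragraph; the sketch below illustrates what such a pipeline could look like in pandas. File names, column names (order_time, lat, lng, weight, quantity, slot_start, speed, flow), and the depot coordinates are assumptions made for illustration and are not the actual CLDS/TSDS schemas.

import numpy as np
import pandas as pd

# Hypothetical file names and columns; the real CLDS / TSDS schemas may differ.
orders = pd.read_csv("clds_orders.csv", parse_dates=["order_time"])    # lat, lng, weight, quantity, ...
traffic = pd.read_csv("tsds_traffic.csv", parse_dates=["slot_start"])  # speed and flow per 15-minute slot

# Cleaning: remove duplicates, drop 3-sigma outliers on cargo weight, interpolate gaps.
orders = orders.drop_duplicates()
w = orders["weight"]
orders = orders[(w - w.mean()).abs() <= 3 * w.std()].copy()
num_cols = orders.select_dtypes(include="number").columns
orders[num_cols] = orders[num_cols].interpolate()

# Feature engineering on the raw values: day of week and distance to an assumed depot.
def haversine_km(lat1, lng1, lat2, lng2):
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

orders["day_of_week"] = orders["order_time"].dt.dayofweek
orders["dist_to_depot_km"] = haversine_km(orders["lat"], orders["lng"], 52.52, 13.40)

# Standardize selected numeric features to zero mean and unit variance.
scale_cols = ["weight", "quantity", "dist_to_depot_km"]
orders[scale_cols] = (orders[scale_cols] - orders[scale_cols].mean()) / orders[scale_cols].std()

# Attach the city-wide traffic state of the 15-minute slot in which each order was placed.
orders["slot_start"] = orders["order_time"].dt.floor("15min")
slot_traffic = traffic.groupby("slot_start", as_index=False)[["speed", "flow"]].mean()
dataset = orders.merge(slot_traffic, on="slot_start", how="left")
print(dataset.head())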
Data processing is a key step to ensure the reliability of the subsequent model training and experimental results. The datasets used in this study are the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS), which cover logistics distribution information and urban traffic status, respectively. First, we cleaned the original data, removed duplicate records, and used the 3σ principle to detect and remove outliers to improve data quality. For missing values, we used interpolation to fill the gaps and prevent incomplete data from affecting model performance. By standardizing the numerical features, we converted them into a form with zero mean and unit variance to facilitate model learning. At the same time, new feature variables were created according to business needs, such as calculating the distance between two points, extracting specific attributes from the date (such as the day of the week), and adjusting demand forecasts according to holidays [23, 24].

3.2 Neural network model

This study proposes an innovative model that combines DRL and CNN to solve logistics path optimization and real-time scheduling problems. The model uses the powerful feature extraction capability of CNN to process spatial data and dynamically adjusts the strategy through DRL to cope with the ever-changing logistics environment. The specific design of the model and its mathematical expression follow.
The model inputs {lat, lng}, {order}, and {traffic} have clear meanings and interrelationships. {lat, lng} represents the geographical coordinates of distribution points and vehicles. These coordinates are the basis for path planning, determining the position and moving direction of vehicles in geographical space. {order} contains detailed order information, such as the quantity of orders, delivery times, and delivery locations. Order information is the goal of path optimization, and the model needs to plan the optimal path according to the order requirements to ensure timely and accurate delivery. {traffic} represents real-time traffic conditions, including road congestion levels and vehicle speeds. Traffic conditions are important factors affecting path selection; real-time traffic data can help the model dynamically adjust the path to avoid congested roads and improve transportation efficiency.
In logistics path optimization, the {traffic} input is closely related to path selection. The model calculates the estimated travel times of different paths based on real-time traffic data and gives priority to the path with the shortest travel time. For example, when there is a traffic jam on a certain road, the model will automatically avoid that road and select other, relatively unobstructed routes. At the same time, the model also considers the changing trend of traffic conditions and plans the path in advance to cope with possible traffic jams. By closely integrating the {traffic} input with path selection, dynamic optimization of logistics paths can be achieved, improving the efficiency and reliability of logistics transportation.
The model aims to solve the path optimization and real-time scheduling problems in logistics distribution, and it realizes efficient logistics distribution management by integrating CNN's ability to extract spatial features and DRL's ability to learn dynamic strategies. The model's input includes geographic location information, order information, time information, and traffic conditions, and the output is a series of action instructions that tell the logistics system how to optimally dispatch vehicles.
The specific pseudocode is as follows.

# CNN forward pass
def cnn_forward(x):
    x = conv_layer(x, filters=32, kernel_size=(3, 3))
    x = relu(x)
    x = pool_layer(x, pool_size=(2, 2))
    return flatten(x)

# DRL Q-network forward pass
def q_network_forward(state):
    x = fc_layer(state, units=256)
    x = relu(x)
    x = fc_layer(x, units=256)
    x = relu(x)
    return fc_layer(x, units=action_size)

# DRL training step
def drl_train():
    state = get_state()
    action = choose_action(state)
    next_state, reward, done = take_action(action)
    update_q_network(state, action, reward, next_state, done)
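The pseudocode above can be made concrete. The following PyTorch sketch is an illustrative implementation, not the authors' exact code: the grid resolution, input channel count, and number of candidate actions are assumptions, while the layer sizes and training hyperparameters (32 and 64 3x3 kernels, 2x2 max-pooling, 256-unit fully connected layers, learning rate 0.001, discount factor 0.99, epsilon from 1.0 to 0.01 with decay 0.995, batch size 64) follow the values stated in Sections 3.3 and 4.1.

import random
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F

GRID = 32          # assumed spatial resolution of the rasterized input
CHANNELS = 4       # assumed channels: delivery points, orders, time, traffic
N_ACTIONS = 10     # assumed number of candidate next delivery points

class CnnQNet(nn.Module):
    """CNN feature extractor (32 and 64 3x3 kernels, 2x2 max-pooling) followed by a 256-unit Q-head."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(CHANNELS, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * (GRID // 4) * (GRID // 4), 256)
        self.fc2 = nn.Linear(256, 256)
        self.out = nn.Linear(256, N_ACTIONS)

    def forward(self, x):                        # x: (batch, CHANNELS, GRID, GRID)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)                       # one Q-value per candidate action

q_net, target_net = CnnQNet(), CnnQNet()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                    # filled by the environment loop (take_action above)
GAMMA, BATCH, EPS_MIN, EPS_DECAY = 0.99, 64, 0.01, 0.995
epsilon = 1.0

def choose_action(state):
    """Epsilon-greedy selection: a_t = argmax_a Q(s_t, a) with probability 1 - epsilon."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1))

def train_step():
    """One Q-learning update (Equation (3)) on a random minibatch from the replay buffer."""
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.stack([b[3] for b in batch])
    d = torch.tensor([b[4] for b in batch], dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s2).max(dim=1).values * (1.0 - d)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # After each environment step one would also decay epsilon towards EPS_MIN by EPS_DECAY
    # and periodically copy q_net's weights into target_net (every 100 steps, per Section 4.1).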
3.2.1 Model overview

The input of the model includes the location coordinates of the distribution points $\{(lat_i, lng_i)\}_{i=1}^{N}$, the order information of each distribution point $\{order_i\}_{i=1}^{N}$, the current time $t$, and the current traffic conditions $\{traffic_i\}_{i=1}^{N}$. The geographic location information covers the location coordinates of the distribution points, expressed as $\{(lat_i, lng_i)\}_{i=1}^{N}$, where $N$ is the number of distribution points and each coordinate pair $(lat_i, lng_i)$ represents the latitude and longitude of a distribution point. The order information includes the order details of each distribution point, such as the order quantity, cargo type, and estimated arrival time [25]; it can be expressed as $\{order_i\}_{i=1}^{N}$, where each $order_i$ contains all the order information related to the $i$-th distribution point. The time information includes the current time $t$ and the estimated arrival times, which are crucial for dynamically adjusting the path and the schedule. The traffic condition information reflects the current traffic conditions, such as road congestion, and can be expressed as $\{traffic_i\}_{i=1}^{N}$, where each $traffic_i$ describes the traffic conditions on the $i$-th road. Together, these input data constitute the input state of the model, which is used to dynamically adjust the logistics path and the real-time scheduling strategy.
The output of the model is a set of action instructions that tell the logistics system how to optimally dispatch vehicles. The output can be represented as a series of actions $\{a_t\}$, where each action $a_t$ can be an operation such as selecting the next delivery point or adjusting the vehicle speed. Specifically, these action instructions guide the logistics system to make the best decision based on the current state so as to minimize cost, time, or other optimization goals. An action $a_t$ can be to select the next delivery point that the current vehicle should go to, so as to ensure the shortest path or the least time required [26, 27]. As shown in Figure 1, the model uses a feature extraction module to process multiple sources of information, such as order information, location coordinates, time information, and traffic conditions. The feature extraction module includes convolutional layers and pooling layers to extract useful information. Next, the model takes the state representation as input and produces action instructions through action selection and execution. The model update process continuously optimizes the decisions, thereby improving delivery efficiency.
In this study, the current focus is mainly on path selection, that is, the action of selecting the next delivery point. Selecting the next delivery point is the core task of logistics path optimization and directly affects transportation costs and time. By optimizing the selection of delivery points, the driving mileage and time of vehicles can be reduced, improving logistics efficiency.
Although the current research focuses on path selection, adjusting the vehicle speed is also an important factor in logistics optimization. In real-world logistics scenarios, vehicle speed adjustment can be based on factors such as traffic conditions and delivery time requirements. For example, when encountering a traffic jam, appropriately reducing the vehicle speed can avoid frequent starting and stopping and reduce fuel consumption, while on a smooth road section, increasing the vehicle speed can shorten the transportation time. In future research, the collaborative optimization of vehicle speed adjustment and path selection will be explored further to achieve more efficient logistics transportation. For example, a comprehensive optimization model can be established that considers both path selection and vehicle speed adjustment, with the goal of minimizing transportation costs and time, to formulate the optimal logistics strategy.
[Figure 1 depicts the model framework: the location coordinates of the delivery points, order information, time information, and traffic conditions feed a feature extraction module built from convolutional and pooling layers; the resulting state representation drives action selection, action execution, and model update, which together produce the action instructions.]
Figure 1: Model framework
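The framework in Figure 1 does not fix how the heterogeneous inputs are turned into a CNN-compatible tensor. One plausible encoding, shown purely as an illustration under assumed coordinate ranges and grid size, rasterizes delivery points, pending order quantities, and road congestion onto a fixed grid and adds the current time of day as a constant channel.

import numpy as np

GRID = 32  # assumed grid resolution over the service area

def rasterize_state(points, order_qty, congestion, t_frac,
                    lat_range=(52.3, 52.7), lng_range=(13.2, 13.6)):
    """Encode delivery points, order quantities, congestion and time-of-day as a (4, GRID, GRID) array."""
    state = np.zeros((4, GRID, GRID), dtype=np.float32)

    def cell(lat, lng):
        r = int((lat - lat_range[0]) / (lat_range[1] - lat_range[0]) * (GRID - 1))
        c = int((lng - lng_range[0]) / (lng_range[1] - lng_range[0]) * (GRID - 1))
        return min(max(r, 0), GRID - 1), min(max(c, 0), GRID - 1)

    for (lat, lng), qty in zip(points, order_qty):
        r, c = cell(lat, lng)
        state[0, r, c] = 1.0                          # channel 0: delivery-point occupancy
        state[1, r, c] += qty                          # channel 1: pending order quantity
    for (lat, lng), level in congestion:
        r, c = cell(lat, lng)
        state[2, r, c] = max(state[2, r, c], level)    # channel 2: congestion level in [0, 1]
    state[3, :, :] = t_frac                            # channel 3: time of day as a fraction of 24 h
    return state

# Tiny usage example with made-up coordinates.
pts = [(52.45, 13.35), (52.55, 13.50)]
s = rasterize_state(pts, order_qty=[3, 1], congestion=[((52.50, 13.40), 0.8)], t_frac=0.375)
print(s.shape)  # (4, 32, 32)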
3.2.2 Feature extraction

In the logistics environment, the spatial features extracted by the CNN have a significant impact. If there are dense distribution points in a certain area, a centralized distribution route can be planned to reduce costs. The road connectivity feature can help avoid dead-end roads and choose efficient routes. These specific spatial features are directly related to route selection: since logistics must be carried out efficiently and at low cost, spatial features provide a key basis for route planning [28].
Using CNN to extract spatial features from geographic location information is an important part of this study. Specifically, we use convolutional layers and pooling layers to capture local and global features on the map. The convolutional layer extracts spatial features of different scales by applying multiple convolution kernels.

3.3 Path optimization algorithm design

The path optimization algorithm designed in this study aims to achieve efficient path optimization in logistics distribution by combining the spatial feature extraction capability of convolutional neural networks (CNN) with the dynamic strategy learning capability of DRL. The algorithm uses the powerful feature extraction capability of CNN to capture the spatial relationships between distribution points and learns the optimal path selection strategy through DRL. The core of the algorithm design is how to select the optimal path based on the current state of the logistics environment. The algorithm is implemented through the following steps [29, 30].
The features extracted by the CNN include geographical layout features, such as road direction and delivery point location, and traffic condition features, such as the distribution of congested sections. These features are closely related to logistics decisions: geographic layout features determine the basic path framework, and traffic condition features affect real-time path adjustments. Combining these features yields a better logistics distribution plan.
In logistics optimization, the "state-action pair" has a clear meaning. The state includes order information, vehicle location, traffic conditions, and so on; an action refers to selecting the next delivery point, changing the driving speed, and so on. The Q network outputs the action values based on the current state and selects the action with the maximum value, such as choosing a detour when traffic is congested, in order to optimize cost, time, and other goals.
(1) State representation: Use CNN to extract spatial features from geographic location information and form a representation of the current state $s_t$. The state representation $s_t$ describes the current logistics
environment configuration, including but not limited to vehicle location, cargo status, and time information. The state representation can be expressed as Equation (1).

$s_t = \mathrm{CNN}(x_t)$    (1)

Here, $x_t$ is the input data at the current time point, including the location coordinates of the delivery points, order information, current time, and traffic conditions.
(2) Action selection: Based on the current state $s_t$, use the Q network in DRL to estimate the value of each state-action pair $Q(s_t, a_t)$ and select the optimal action $a_t$. Action selection is determined by the current state and the output of the Q network: the action that maximizes the Q value given the current state $s_t$ is selected, as expressed in Equation (2).

$a_t = \arg\max_a Q(s_t, a)$    (2)

(3) Execute action: Execute the selected action $a_t$, update the environment state to $s_{t+1}$, and obtain an immediate reward $r_t$. The immediate reward $r_t$ reflects the direct effect of executing the action, such as whether the goods are delivered successfully and whether the driving time is reduced.
(4) Model update: Use the Q-learning update rule to update the Q values in the Q network so that they approach the optimal strategy, as expressed in Equation (3).

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$    (3)

Here, $\alpha$ is the learning rate and $\gamma$ is the discount factor ($0 < \gamma < 1$), which indicates how strongly future rewards are discounted. This update rule adjusts the Q value of the current state $s_t$ and action $a_t$ using the current immediate reward $r_t$ and the maximum Q value of the future state $s_{t+1}$.
For the CNN architecture, the input layer receives geospatial information from the logistics environment, such as the location coordinates of the distribution points and the topological structure of the transportation network. Next is the convolution stage. This study initially set two convolution layers, with 32 and 64 convolution kernels, respectively; the kernel size is (3, 3) with a stride of 1. The convolution layers extract local features of the input data through convolution operations: each convolution kernel slides over the input data, multiplies and sums elements, and generates feature maps. Next is the pooling layer, which uses maximum pooling with a pooling size of (2, 2). Its function is to downsample the feature maps, reduce the amount of data, and retain important features. During training, the back-propagation algorithm is used to update the weight parameters of the convolution kernels to minimize the loss function.
For the DRL architecture, the core is the Q network. Its input layer receives the features extracted by the CNN together with the current logistics status information. The Q network contains a fully connected layer with 256 neurons and ReLU as the activation function, which introduces nonlinearity and enhances the network's expressiveness. The output layer outputs the Q value of each possible action. The training strategy uses an experience replay mechanism that stores the agent's experience (state, action, reward, next state) in a replay pool and randomly samples from it for training to reduce data correlation. The learning rate is initially set to 0.001, with decay considered. The discount factor is 0.99 to balance immediate and future rewards. The initial exploration rate is 1.0, the minimum is 0.01, and the decay rate is 0.995: random exploration is performed with a higher probability at the beginning of training, and the exploration rate is gradually reduced as training progresses. With this architecture design and training strategy, the model is expected to achieve good results in logistics path optimization and real-time scheduling.

4 Experimental evaluation

4.1 Experimental design

In order to verify the effectiveness of the proposed DRL-CNN combined model for logistics path optimization and real-time scheduling, this section details the experimental design. To ensure repeatability, we selected the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS) as public datasets. Dataset links can be sought on public data platforms such as Kaggle (https://www.kaggle.com/), Data.gov (https://www.data.gov/), and Zenodo (https://zenodo.org/), on academic resource websites such as IEEE DataPort (https://ieee-dataport.org/) and the ACM Digital Library (https://dl.acm.org/), or on the official websites of relevant universities and research institutions. The data was divided into training (70%), validation (15%), and test (15%) sets.
Six baseline methods (SPA, HA, GA, RBM, TDRM, ADLM) were chosen for comparison; each has distinct features and application scenarios. SPA offers the theoretical shortest path but struggles with dynamic logistics. HA quickly finds approximate solutions for large-scale problems, GA suits complex optimizations but has a high computational cost, RBM works for simple tasks in known environments, TDRM has limitations in feature extraction compared to the proposed model, and ADLM may be less effective in specific scenarios.
The evaluation dataset comes from real-world logistics and contains historical order data, location information, and traffic conditions. Evaluation indicators include path
length, completion time, punctuality, and scheduling success rate.
For the hyperparameters, we considered the CNN and DRL characteristics. The CNN had 32 and 64 convolution kernels of size (3, 3) with stride 1, and (2, 2) max-pooling. The DRL Q-network had 256 neurons in the fully connected layer with ReLU activation. The learning rate was 0.001 with decay, and the discount factor was 0.99. The exploration rate started at 1.0, with a minimum of 0.01 and a decay rate of 0.995; the batch size was 64, and the target network was updated every 100 steps. Grid search, random search, and Bayesian optimization were used to find the best hyperparameters. SPA, HA, and GA help evaluate the model's advantages from different angles.
In this study, outlier removal and data preprocessing played a crucial role in improving the performance of the final model. After obtaining the public datasets, the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS), we found some outliers in the data, which may be caused by data entry errors, sensor failures, or special events. If not processed, they have a negative impact on model training and prediction, causing the model to learn incorrect features and patterns and thereby reducing the accuracy and stability of the model.
To this end, we used a statistical-analysis-based method to remove outliers. For example, for numerical data, we calculated the mean and standard deviation and treated data points that deviated from the mean by more than a certain multiple of the standard deviation as outliers, removing them. In this way, we ensured the quality and consistency of the data, allowing the model to learn from more reliable data.
In terms of data preprocessing, we performed operations such as data cleaning, feature scaling, and encoding. During data cleaning, we handled missing values using methods such as mean filling and median filling to ensure the integrity of the data. Feature scaling normalizes or standardizes features of different ranges and scales so that all features carry the same importance in model training, preventing some features from dominating the training process because of their large numerical range. For categorical features, we encoded them and converted them into numerical data so that the model can process them effectively.
Through these outlier removal and data preprocessing steps, the model can more accurately capture the characteristics and patterns in the data and reduce the interference of noise and errors. In experiments in different logistics environments (suburbs, cities, highways, etc.), the processed data enabled the model to achieve better performance on indicators such as path length, completion time, punctuality, and scheduling success rate, while also improving the robustness and generalization ability of the model, providing more reliable support for logistics path optimization and real-time scheduling.

4.2 Experimental results

We tested in suburban environments, urban environments, highways, and other environments, aiming to comprehensively evaluate the performance differences of the different logistics scheduling methods across environments.
Path length, completion time, punctuality, and scheduling success rate are closely related to logistics path optimization. Short paths, fast completion, high punctuality, and a high scheduling success rate are the goals of logistics. "Success" means completing order delivery on time and as required. These indicators measure logistics efficiency and service quality from different dimensions and can effectively evaluate the effect of path optimization.
"Scale" in the experiments can refer to the number of orders, the size of the geographical area, and so on. More orders or larger geographical areas increase the complexity and uncertainty of path planning. For example, more orders may require more vehicles to be deployed, and a large geographical area may involve more varied traffic conditions. Clarifying the concept of scale helps in understanding its impact on the experimental settings and results.
Figure 2: Scheduling efficiency at different scales
Figure 2 shows the scheduling efficiency of the different scheduling methods at various scales. It can be seen that the scheduling efficiency of all methods gradually decreases as the scale increases; our method decreases the slowest. The scheduling efficiency of "SPA", "HA", "GA", "RBM", "TDRM", "ADLM" and "Proposed" all show a downward trend to varying degrees. Although the rate of decline differs between methods, their scheduling efficiency remains at a relatively high level at large scales. This shows that these methods have a certain adaptability and stability when dealing with larger-scale tasks. However, it should be noted that scheduling efficiency continues to decrease as the scale grows, which means that challenges and limitations may arise when facing larger-scale problems. Therefore, in practical applications, these methods should be further optimized to improve their performance at large task scales.
Figure 3: Robustness changes at different scales
Figure 3 shows the robustness of the different methods at different scales. It can be seen that as the scale increases, the robustness of all methods decreases, but the performance of the "Proposed" method is significantly better than that of the other methods, showing higher stability and robustness. Even at larger scales, "Proposed" can still maintain high performance, reflecting its superiority in coping with complex environmental changes.
In Figure 3, robustness is measured by taking multiple factors into account. Specifically, we define robustness as the ability of the model to maintain efficient and stable path planning and scheduling in logistics environments of different scales and under dynamic changes. To quantify this ability, we use a comprehensive evaluation based on a series of key indicators. First, the fluctuation range of the path length, completion time, on-time rate, and scheduling success rate of each method at different scales is calculated. The smaller the fluctuation range, the more stable the method is in the face of environmental changes, and the higher its robustness.
The "value" indicator is the weighted sum of the above key indicators, with weights determined by the importance of each indicator in actual logistics operations. For example, the on-time rate and scheduling success rate are more critical in actual logistics services, so they are given higher weights, while path length and completion time are relatively less important and have slightly lower weights.
In the legend of Figure 3, "Proposed" represents the method proposed in this study that combines DRL with CNN, and "SPA" and the other labels represent the baseline methods used for comparison. The "scale" on the x-axis represents the scale of the experiment, which can be the number of orders, the extent of the geographical area, or the time span. As these scale factors increase, the complexity and uncertainty of the logistics environment increase accordingly. By observing the robustness of the different methods at different scales, we can intuitively compare their ability to cope with complex environmental changes.
For traditional algorithms, computing time refers to the time from the start of the algorithm to finding the best solution. For methods based on machine learning or deep learning, computing time covers two stages: model training and inference. Training time is the time the model takes to learn its parameters on the dataset, and inference time is the time it takes to use the trained model to obtain the path planning result. This measurement can comprehensively evaluate the efficiency of each method.
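To make the weighted "value" described above concrete, the sketch below normalizes each indicator, applies illustrative weights (higher for on-time rate and scheduling success rate, as stated), and rewards both a high level and low fluctuation across scales. The weights, normalization bounds, and example numbers are assumptions for illustration, not the exact scheme used in the study.

import numpy as np

# Illustrative weights: on-time rate and scheduling success rate weighted more heavily.
WEIGHTS = {"path_length": 0.15, "completion_time": 0.15, "on_time_rate": 0.35, "success_rate": 0.35}

def composite_value(path_km, time_min, on_time_pct, success_pct, max_km=300.0, max_min=240.0):
    """Weighted score in [0, 1]: shorter paths/times and higher rates give a larger value."""
    parts = {
        "path_length": 1.0 - min(path_km / max_km, 1.0),
        "completion_time": 1.0 - min(time_min / max_min, 1.0),
        "on_time_rate": on_time_pct / 100.0,
        "success_rate": success_pct / 100.0,
    }
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)

def robustness(values_at_scales):
    """Smaller fluctuation of the composite value across scales -> higher robustness."""
    v = np.asarray(values_at_scales, dtype=float)
    return float(v.mean() - v.std())   # one simple way to reward both level and stability

# Example with made-up indicator values at three increasing problem scales.
scores = [composite_value(180, 120, 92, 95), composite_value(200, 140, 90, 93), composite_value(220, 160, 88, 90)]
print(round(robustness(scores), 3))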
Table 3: Performance comparison of different methods in suburban environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 240 | 170 | 78 | 83 | 10 | 6
HA | 220 | 160 | 82 | 86 | 15 | 7
GA | 210 | 150 | 88 | 91 | 25 | 8
RBM | 250 | 180 | 75 | 80 | 5 | 5
TDRM | 200 | 140 | 86 | 90 | 20 | 7
ADLM | 190 | 130 | 90 | 93 | 18 | 8
Proposed | 180 | 120 | 92 | 95 | 12 | 9
In suburban environments, the main challenges for logistics scheduling are long delivery distances and relatively little traffic interference. As can be seen from Table 3, the proposed method outperforms the other baseline methods on almost all indicators.
To characterize the "suburban" environment more precisely than in Table 2: suburban environments have relatively few and more dispersed nodes, usually around 10-20 nodes. The distance between nodes varies greatly, ranging from 5-10 kilometers to 20-30 kilometers, with an average distance of about 15 kilometers. In terms of road conditions, the main roads are relatively wide but unevenly maintained, and some branch roads are narrow and in poor condition. Traffic flow is generally small, although peak hours may see increases due to activity in surrounding towns. Similarly, for the other environments, such as cities, highways, and multi-point distribution, the node characteristics, distance indicators, and road and traffic conditions should likewise be specified to enhance the interpretability of the results.
Table 4: Performance comparison of different methods in urban environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 260 | 200 | 75 | 80 | 10 | 6
HA | 240 | 190 | 78 | 82 | 15 | 7
GA | 230 | 180 | 82 | 85 | 25 | 8
RBM | 270 | 210 | 70 | 75 | 5 | 5
TDRM | 220 | 170 | 84 | 87 | 20 | 7
ADLM | 210 | 160 | 88 | 90 | 18 | 8
Proposed | 200 | 150 | 90 | 93 | 12 | 9
The urban environment is characterized by dense buildings and complex transportation networks, which place higher requirements on logistics scheduling. Table 4 shows the performance of the different methods in the urban environment. The proposed method achieves a path length of 200 km, a completion time of 150 minutes, a punctuality rate of 90%, and a scheduling success rate of 93% in the urban environment, outperforming the other methods. This shows that the proposed method can not only find a better distribution path in the city, but also better adapt to
the dynamic changes in the city, ensuring a high service
quality and scheduling success rate.
Table 5: Performance comparison of different methods in highway environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 230 | 160 | 80 | 85 | 10 | 6
HA | 210 | 150 | 83 | 87 | 15 | 7
GA | 200 | 140 | 86 | 90 | 25 | 8
RBM | 240 | 170 | 78 | 82 | 5 | 5
TDRM | 190 | 130 | 88 | 91 | 20 | 7
ADLM | 180 | 120 | 91 | 93 | 18 | 8
Proposed | 170 | 110 | 93 | 95 | 12 | 9
The highway environment is characterized by fast traffic speeds and strict traffic rules. Table 5 shows that in the highway environment, the proposed method is superior to the other methods in terms of path length, completion time, punctuality, and scheduling success rate; in particular, its path length of 170 km is shorter than that of the other methods. This shows that the proposed method is more efficient on highways and can complete delivery tasks faster, while ensuring extremely high punctuality and scheduling success rates.
Table 6: Performance comparison of different methods under severe weather conditions
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 270 | 210 | 72 | 77 | 10 | 6
HA | 250 | 200 | 75 | 78 | 15 | 7
GA | 240 | 190 | 78 | 82 | 25 | 8
RBM | 280 | 220 | 68 | 72 | 5 | 5
TDRM | 230 | 180 | 80 | 83 | 20 | 7
ADLM | 220 | 170 | 83 | 86 | 18 | 8
Proposed | 210 | 160 | 85 | 88 | 12 | 9
Bad weather can seriously affect the efficiency and safety of logistics distribution. As can be seen from Table 6, the proposed method still performs well under bad weather conditions, with a path length of 210 km, a completion time of 160 minutes, an on-time rate of 85%, and a scheduling success rate of 88%, which are better than the other methods. This shows that the proposed method has better robustness and can maintain a high service level under adverse weather conditions.
Table 7: Performance comparison of different methods in peak traffic environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 280 | 220 | 68 | 72 | 10 | 6
HA | 260 | 210 | 72 | 75 | 15 | 7
GA | 250 | 200 | 75 | 78 | 25 | 8
RBM | 290 | 230 | 65 | 70 | 5 | 5
TDRM | 240 | 190 | 78 | 80 | 20 | 7
ADLM | 230 | 180 | 80 | 83 | 18 | 8
Proposed | 220 | 170 | 82 | 85 | 12 | 9
Traffic rush hour is a major difficulty for logistics scheduling. Table 7 shows that during peak hours, the proposed method is ahead of the other methods in terms of path length, completion time, punctuality, and scheduling success rate, notably with a completion time of 170 minutes and a punctuality rate of 82%, which shows that the proposed method can still maintain high work efficiency and service levels during peak hours.
Table 8: Performance comparison of different methods in emergency delivery environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 250 | 180 | 75 | 78 | 10 | 6
HA | 230 | 170 | 78 | 80 | 15 | 7
GA | 220 | 160 | 80 | 82 | 25 | 8
RBM | 260 | 190 | 70 | 75 | 5 | 5
TDRM | 210 | 150 | 82 | 85 | 20 | 7
ADLM | 200 | 140 | 85 | 87 | 18 | 8
Proposed | 190 | 130 | 88 | 90 | 12 | 9
Emergency delivery requires quick response and efficient scheduling. As can be seen from Table 8, the proposed method performs well in the emergency delivery environment, with a path length of 190 km, a completion time of 130 minutes, an on-time rate of 88%, and a scheduling success rate of 90%, all of which are better than the other methods. This shows that the proposed method can also complete delivery tasks efficiently in an emergency and meet urgent customer needs.
Table 9: Performance comparison of different methods in a multi-delivery point environment
Method Name | Path length (km) | Completion time (min) | Punctuality rate (%) | Scheduling success rate (%) | Calculation time (s) | Robustness score (out of 10)
SPA | 260 | 190 | 70 | 75 | 10 | 6
HA | 240 | 180 | 75 | 78 | 15 | 7
GA | 230 | 170 | 78 | 80 | 25 | 8
RBM | 270 | 200 | 68 | 72 | 5 | 5
TDRM | 220 | 160 | 80 | 83 | 20 | 7
ADLM | 210 | 150 | 82 | 85 | 18 | 8
Proposed | 200 | 140 | 85 | 87 | 12 | 9
Table 9 shows that in the multi-point distribution environment, the proposed method is superior to the other methods in terms of path length, completion time, punctuality, and scheduling success rate; in particular, its path length is 200 km and its completion time is 140 minutes, which demonstrates the advantages of the proposed method in handling multi-point distribution.
Figure 4: Comprehensive performance comparison of different methods
Table 10: Robustness score comparison
Method Name | Calculation time (s) | Robustness score (out of 10)
SPA | 10 | 6
HA | 15 | 7
GA | 25 | 8
RBM | 5 | 5
TDRM | 20 | 7
ADLM | 18 | 8
Proposed | 12 | 9
As shown in Table 10 and Figure 4, the proposed method outperforms the other methods in all environments, especially on key indicators such as path length, completion time, on-time rate, and scheduling success rate. This fully demonstrates the superiority of the proposed method in different logistics scheduling environments and shows its great potential in practical applications.
Figure 5: Success rate performance at different scales
"Success rate" refers to the proportion of orders that are successfully dispatched, and "scale" refers to the number of orders. The curve alone shows how the success rate varies with scale, but it does not by itself explain the practical value of this relationship for the logistics environment. Figure 5 shows the success rate of the different algorithms at different scales. As can be seen from the figure, as the scale increases, the success rate of each algorithm generally shows a downward trend. It is worth noting that the "Proposed" algorithm shows a higher success rate at a smaller scale, but as the scale increases, its success rate decreases rapidly and eventually stabilizes.
In order to rigorously verify the significance of the performance differences between the proposed method and the other baseline methods in the different environments, we conducted paired-sample t tests. For the suburban environment, on the path length indicator, the absolute value of the t statistic far exceeds the critical value, indicating
that the proposed method is significantly different from the other baseline methods and that its paths are significantly shorter; similarly, the completion time indicator shows that the method's advantage in short completion times is significant.
In the urban environment, the t-test results for the completion time and punctuality indicators are significant, indicating that the method can complete tasks more efficiently and on time in complex urban environments. In the highway environment, the differences in path length and dispatch success rate are significant, reflecting the superiority of the method in planning paths and arranging dispatches in high-speed scenarios.
Under severe weather conditions, there are significant differences in the punctuality rate and dispatch success rate indicators, showing the robustness of the method in dealing with severe weather. During peak traffic hours, the completion time and dispatch success rate are significantly different, indicating that the method can also maintain efficient dispatch during peak hours. In the emergency delivery environment, all indicators are significantly better than the baseline methods, highlighting the method's rapid response and efficient dispatch. The significant differences in path length and completion time in the multi-point distribution environment prove the advantages of the method in dealing with multi-point distribution problems. Overall, the advantages of the proposed method across environments and indicators are statistically significant.

4.3 Discussion

Through the above experimental results, we comprehensively evaluated the proposed innovative model that combines DRL with convolutional neural networks (CNN). Compared with the state of the art (SOTA) in related work, the model in this study showed significant advantages on multiple key indicators.
In terms of path length, the path length of the model in this study is shorter than that of the other comparison methods, whether in suburban, urban, or highway environments. For example, in suburban environments, the path length of the classic shortest path algorithm (SPA) is 240 kilometers, while that of the model in this study is only 180 kilometers. This is because the model uses the powerful feature extraction ability of CNN to better capture geospatial information and combines it with DRL to dynamically learn the optimal strategy, thereby planning a shorter path.
In terms of completion time, the model also performs well. In urban environments, the completion time of the genetic algorithm (GA) is 180 minutes, while the completion time of this model is only 150 minutes. This is due to the model's rapid response to dynamic information such as real-time traffic conditions and its decision adjustments, which achieve more efficient scheduling.
On-time rate and scheduling success rate are important indicators for measuring the quality of logistics services. In various environments, the punctuality rate and scheduling success rate of this model are higher than those of the other methods. For example, in the highway environment, the punctuality rate of the traditional deep reinforcement learning method (TDRM) is 88% and its scheduling success rate is 91%, while this model reaches 93% and 95%, respectively. This shows that the model can better cope with the complex and changing logistics environment and ensure the stability and reliability of logistics services.
From the perspective of computing time and robustness score, this model shows stronger robustness while maintaining high efficiency. In terms of computing time, the model sits at a medium level of 12 seconds, but it maintains a high robustness score (9 points) under complex environmental changes. This is because the model continuously optimizes its decisions during the learning process and adapts better to environmental changes.
In summary, the DRL + CNN model proposed in this study has clear advantages in logistics path optimization and real-time scheduling, and it can effectively address the shortcomings of existing methods in complex-environment adaptability, real-time performance, feature extraction, and integration with domain knowledge, providing strong technical support for the development of future logistics scheduling systems.
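The paired-sample t tests reported above can be reproduced in a few lines. The per-instance path lengths below are synthetic placeholders (the paper reports only aggregate values), so the sampling parameters and the printed numbers are purely illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-instance suburban path lengths (km) for a baseline and the proposed method,
# centred on the aggregate values reported in Table 3 (240 km for SPA, 180 km for DRL + CNN).
spa = rng.normal(240, 15, size=30)
proposed = rng.normal(180, 12, size=30)

t_stat, p_value = stats.ttest_rel(spa, proposed)   # paired-sample t test on the same instances
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
# A |t| well above the critical value (p below the chosen alpha, e.g. 0.05)
# indicates the difference in path length is statistically significant.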
Table 11: Comparison results
Research Method | Results
SPA | Suburban area: path length is 240 km, completion time is 170 minutes, on-time rate is 78%, and scheduling success rate is 83%.
HA | Suburban area: path length is 220 km, completion time is 160 minutes, on-time rate is 82%, and scheduling success rate is 86%.
GA | Suburban area: path length is 210 km, completion time is 150 minutes, on-time rate is 88%, and scheduling success rate is 91%.
RBM | Suburban area: path length is 250 km, completion time is 180 minutes, on-time rate is 75%, and scheduling success rate is 80%.
TDRM | Suburban area: path length is 200 km, completion time is 140 minutes, on-time rate is 86%, and scheduling success rate is 90%.
ADLM | Suburban area: path length is 190 km, completion time is 130 minutes, on-time rate is 90%, and scheduling success rate is 93%.
This Study (DRL + CNN) | Suburban area: path length is 180 km, completion time is 120 minutes, on-time rate is 92%, and scheduling success rate is 95%.
As shown in Table 11, the table clearly presents the comparison results of logistics path optimization of different research methods in suburban environments. As a classic shortest path algorithm, SPA has a long path length, a long completion time, and a relatively low on-time rate and scheduling success rate in suburban environments, reflecting its poor adaptability in complex suburban logistics scenarios. HA has improved compared to SPA, but still has certain limitations. GA has further improved in path length, completion time, on-time rate, and scheduling success rate, but its advantages are not obvious compared with more advanced methods. The performance of RBM in various indicators is relatively poor, indicating that rule-based methods have limited effects in suburban logistics path planning. TDRM and ADLM, as more advanced methods, perform well in multiple indicators. However, the DRL + CNN method proposed in this study shows significant advantages, with the shortest path length, the shortest completion time, and the highest on-time rate and scheduling success rate. This shows that the method of combining deep reinforcement learning with convolutional neural networks can more accurately capture the characteristics of suburban logistics environments, dynamically adjust path planning and scheduling strategies, and thus achieve more efficient and reliable logistics distribution. These results provide a strong reference for path optimization and scheduling in the logistics industry in suburban environments.

5 Conclusion

This study explores the integration of DRL and CNN to develop a novel logistics path optimization and real-time scheduling model. Through meticulous analysis and pre-processing of the City Logistics Data Set (CLDS) and Traffic State Data Set (TSDS), we have crafted a model that may have the capacity to handle diverse logistics environments.

The experimental outcomes suggest that the proposed method has shown some positive signs in multiple logistics scheduling environments. In suburban regions, it appears to have some ability to tackle long-distance delivery issues and manage scattered delivery points. For example, the path length is 180 kilometers, the completion time is 120 minutes, the on-time rate is 92%, and the scheduling success rate is 95%. In urban areas, it can somewhat find better delivery paths and adapt to the complex traffic network, achieving a certain level of service quality and scheduling success, with a path length of 200 kilometers, a completion time of 150 minutes, an on-time rate of 90%, and a scheduling success rate of 93%. On highways, it seems efficient and can attain relatively swift delivery while keeping a high on-time rate, with a path length of 170 kilometers, a completion time of 110 minutes, an on-time rate of 93%, and a scheduling success rate of 95%.

In the conducted experiments, the proposed method has seemingly outperformed other methods in terms of computation time and robustness score. This gives an indication that the model might be able to find a decent solution within a relatively short period and maintain a somewhat stable performance when encountering certain environmental changes. Nevertheless, it must be emphasized that the experiments have a limited scope, especially when it comes to extreme scenarios like bad weather and traffic peak hours. Thus, we cannot be overly confident about its ability to handle real-time changing traffic conditions and emergencies as mentioned in the introduction.

Based on the current comparison of different methods' comprehensive performance, the proposed method shows some potential advantages in various logistics scheduling environments. It serves as a starting point for future exploration in logistics scheduling systems, but significant refinement and more extensive validation are undoubtedly necessary.

Funding

This work was supported by the 2023 Bidding Subjects for Decision-making Research of Jiaozuo Municipal Government of Henan Province: "Research on Countermeasures for Consolidating and Developing the Public Ownership Economy in Jiaozuo City" (JZZ202311-1), excellent subject, project completed, presided over; the 2022 Henan Jiaozuo Municipal Government Decision Research Bidding Project: "Research on the Development of Local Characteristic Industries in Jiaozuo City under the Background of Rural Revitalization" (JZZ202222-1), excellent project, concluded, presided over; the 2021 Henan Jiaozuo Municipal Government Decision Research Bidding Project: "Research on Modern and Efficient Agricultural Development in Henan City under the Rural Revitalization Strategy" (JZZ202127-2), qualified, closed, presided over; and the 2022 Henan Province University Humanities and Social Science Research Project Funding: "Research on Promoting the Effective Connection between Poverty Alleviation Strategy and Rural Revitalization" (2022-ZDJH-00273), established, under research, and presided over.
https://doi.org/10.31449/inf.v49i16.7201 Informatica 49 (2025) 171–186 171
Design and Application of Improved Genetic Algorithm for
Optimizing the Location of Computer Network Nodes
Chunlei Zhong1*, Gang Yang2
1Huai'an Bioengineering Branch Institute, Jiangsu Union Technical Institute, Huai'an, 223200, China
2College of Teacher Education, Wenzhou University, Wenzhou, 325035, China
E-mail: hm_spring@163.com
*Corresponding author
Keywords: genetic algorithm, computer network, network nodes, improved genetic algorithm, average error
Received: September 24, 2024
The rapid development of computer technology has made network stability and node positioning
accuracy important challenges in optimizing computer network design. This study proposes an
optimization method based on the Improved Genetic Algorithm (IGA) to improve the positioning
accuracy and stability of network nodes. Firstly, by combining the characteristics of the centroid
algorithm and the Approximate Point in Triangulation Test (APIT) algorithm, preliminary optimization
of node positions is carried out. Subsequently, an IGA is utilized for further optimization, dynamically
adjusting the crossover probability and mutation probability to balance global and local search
capabilities and avoid the algorithm falling into local optima. The experimental results showed that IGA
achieved significant performance improvement in node localization. Compared with the centroid
algorithm, the maximum error of IGA has been reduced by 19% and the overall average error has been
reduced by 8.8%. Compared with APIT, IGA has reduced the maximum error by 7% and the overall
average error by 3.8%. Regarding fitness values, IGA exhibited faster convergence speed, achieving
optimal results with only 75 iterations, surpassing traditional genetic algorithms and APIT algorithms.
The node coverage rate reached 98.6%, far higher than the 85.3% of the centroid algorithm and 90.5%
of the APIT algorithm. These results demonstrate that IGA has higher accuracy, stability, and
computational efficiency in complex network environments, providing an efficient and reliable solution
for optimizing the design of computer network nodes.
Povzetek: Predlagan je izboljšan genetski algoritem (IGA) za optimizacijo lokacij vozlišč v
računalniških omrežjih, ki z dinamičnim prilagajanjem verjetnosti križanja in mutacije poveča točnost,
stabilnost in učinkovitost algoritma.
1 Introduction

With the continuous progress of modern technology, computer networks play an increasingly important role in modern society. They connect various devices and systems, making the transmission and sharing of information more efficient and convenient. To meet the needs of users for high-quality network services, improving network performance and optimizing network design have become increasingly important. Traditional optimization algorithms frequently encounter issues of low efficiency and a propensity to fall into local optima when addressing large-scale network design problems. Therefore, it is necessary to introduce new optimization algorithms to solve these problems [1-2]. In recent years, researchers have made significant advancements in applying enhanced genetic algorithms to optimize computer networks. These enhancements include the introduction of new operators, optimization of algorithm parameters, and adjustments to the algorithms themselves. As a result, genetic algorithms are now more efficient and accurate when utilized for network design optimization. At the same time, researchers combine genetic algorithms with other optimization algorithms to create multiple hybrid optimization algorithms, which enhances network design performance and effectiveness [3-4]. The objective of the research is to achieve an optimized design of computer networks and to improve network performance indicators, including latency, throughput, resource utilization, and cost, through an Improved Genetic Algorithm (IGA). The research aims to solve the problems of slow convergence speed, susceptibility to local optima, and difficulty in dynamic adjustment of traditional network optimization methods in complex network environments. This study designs a computer network optimization technique based on a genetic algorithm as the core and introduces multiple techniques to improve performance. A fitness function based on network performance indicators is constructed to quantify the network optimization objectives. This technology adjusts the crossover and mutation probabilities adaptively by comparing individual fitness and population average fitness, balancing global and local search capabilities.
2 Related work

The 5G era is coming and network technology is developing rapidly. Massive data have brought enormous challenges to the stability and reliability of computer networks. The reliability of computer networks is a major indicator of comprehensive computer performance. Computer networks are large and complex, and they are also easily affected by many adverse factors. This leads to instability in the system, which exposes the entire computer network to significant risks. To ensure the stability and ongoing optimization of computer networks, computer network optimization design has become a prevalent point of discussion in computer research. Through their study of cloud computing, Fan et al. [5] presented a novel mathematical model for virtual network embedding in optical data center networks. This model reduced Network Topology (NT) complexity during optical fiber transmission. They used a comprehensive system of node awareness and path evaluation to derive algorithms with priority locations. The algorithm obtained by this model could reduce the latency of virtual network requests by 20% and improve the request rate by 13%. Rajendran and Venkataraman [6] proposed a new neural network algorithm to analyze network traffic built on the application and analysis of big data in network security. They used this method to conduct statistics on the worst data and abnormal activity sent by the network and conducted experiments with the data. Compared with traditional neural network algorithms, the optimized algorithm showed a notable enhancement in distinguishing between false alarms and actual detections, which significantly improved the security and stability of the network. Xiaokaiti et al. [7] raised an efficient data transmission strategy for the detection algorithm of computer network communities. They first combined NT attributes with social attributes when dividing communities and then selected the optimal relay node for network transmission based on the number of channels. This algorithm had high merit in data delivery efficiency and routing overhead in computer networks. Alsaqour et al. [8] put forward a location-assisted routing algorithm grounded on genetic algorithms to optimize the efficiency of MANET routing protocols. Firstly, through algorithm optimization, node information was added to the route and these nodes were grouped. These nodes were then sent to their destinations to adaptively update the node location. The results showed that the optimized algorithm could achieve a delivery rate of over 99% for small network overhead packets. Bu [9] developed a load-balancing scheduling algorithm for Internet of Things (IoT) clusters using a combination of Particle Swarm Optimization and Genetic Algorithm (PSOGA). The purpose of this algorithm was to address the persistent challenge faced by IoT networks due to high-volume business data traffic causing downtime. They first used the CPU, RAM, and network bandwidth to measure the server node information, then adjusted the appropriate function value, and used the IGA to obtain the optimal solution. The results showed that the optimized algorithm could reduce latency and error rates by 5%, while also reducing server overload and downtime. Network coding can integrate coding capabilities with network multi-path propagation, bolster the capacity of computer networks, and facilitate more intricate security solutions. To address the susceptibility of network coding to attacks, Wu et al. [10] developed a comprehensive unicast secure transmission scheme based on Random Linear Network (RLN) coding. The matrix was randomly generated from the received nodes and the resulting vector was sent back to the source node via the link to form a new matrix. This approach effectively thwarted network eavesdropping attacks. The comparative analysis between this research and the advanced methods is shown in Table 1.
Table 1: Comparative analysis of research and the advanced methods

Reference | Technical Method | Advantages | Disadvantages | Comparison with IGA
Fan et al. [5] | Virtual network embedding with node awareness and path evaluation. | Reduces latency by 20% and improves request rate by 13%. | Limited applicability; does not optimize node positioning. | IGA reduces error by 8.8%, with broader applicability.
Rajendran et al. [6] | Enhanced neural network algorithm for malicious traffic detection. | Improves security and reduces false alarms. | High computational cost; lacks node optimization. | IGA achieves 2.41% error, with higher efficiency.
Xiaokaiti et al. [7] | Community detection algorithm to optimize data transmission. | Improves transmission efficiency and reduces routing overhead. | Dependent on BT; limited precision. | IGA reduces error by 3.8%, offering better stability.
Alsaqour et al. [8] | Genetic algorithm for optimizing mobile ad hoc network routing. | Achieves 99% small packet delivery rate with low overhead. | Suitable for small networks; struggles with large-scale networks. | IGA reduces error to 2.46%, with wider applicability.
Bu [9] | PSO and GA combined for load balancing. | Reduces load and downtime by 5%. | Focuses on load balancing; lacks positioning accuracy. | IGA improves accuracy by 8.8%, offering a comprehensive solution.
Wu et al. [10] | Secure transmission using random linear network coding. | Enhances security and prevents eavesdropping. | Does not optimize node positioning or transmission efficiency. | IGA achieves 5.2% error, with better precision and stability.
Previous research has found that related work mainly focuses on specific aspects of computer network optimization, including security enhancement, data transmission efficiency, and load balancing. However, these studies have shortcomings in addressing the accuracy of network node localization and overall stability under different network conditions. Existing algorithms such as the centroid algorithm and the Approximate Point in Triangulation Test (APIT) have significant drawbacks, including limited accuracy and sensitivity to node density. Traditional algorithms, such as genetic algorithms and MANET routing protocols, perform well in specific network types but perform poorly in large-scale or dynamic environments. Using genetic algorithms to assist routing protocols can improve network overhead and delivery rates, which fully exploits the genetic algorithm and enhances network delivery. To optimize computer network nodes for better environmental conditions, this study uses the centroid and APIT algorithms, which provide better conditions for computer network optimization. Then, based on node optimization, an IGA is used to construct a network design optimization model. Through optimizing the traditional genetic algorithm, the efficiency of network nodes in computer network optimization design is enhanced. This paper aims to increase the stability and reliability of computer network optimization design.

3 Construction of computer network optimization design model based on genetic algorithm

3.1 Optimization of node location based on centroid algorithm and APIT algorithm

From the perspective of topology, a computer network is composed of several network nodes and communication links connecting these network nodes. This indicates that the positioning of network nodes is indispensable in computer network data transmission. The centroid algorithm is the most typical node localization algorithm among commonly used localization algorithms. The algorithm has four advantages: low storage energy consumption, simple algorithm principle, low computing energy consumption, and low communication energy consumption.

Before using this algorithm for localization, it is first necessary to determine whether the location node that the sensor needs to determine is located within the region. At the same time, nodes requiring location determination will continually emit various communication signals to the surrounding environment.
Figure 1: Schematic diagram of centroid algorithm positioning (vertices A–H of the polygon and its centroid (x, y))
To determine whether the unknown node is in the monitoring area, it is essential to verify the strength of the signal obtained at the beacon node. The strength can reflect the unknown node location [11]. The principle of the centroid algorithm is built on the calculation of a centroid: in any irregular polygon, there must be a center of mass inside it. Usually, the coordinates of each vertex are accumulated, and then the average value is calculated to determine its specific coordinates. The specific location can be represented by Formula (1), and the algorithm diagram is shown in Figure 1.

$$(x, y) = \left( \frac{1}{n}\sum_{i=1}^{n} x_i, \ \frac{1}{n}\sum_{i=1}^{n} y_i \right) \qquad (1)$$

In Formula (1), $n$ represents the number of vertices of the $n$-sided shape, and $(x_i, y_i)$ means the coordinate of the $i$-th vertex. The centroid of this $n$-sided shape can be obtained by calculating the formula.
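As an illustration of Formula (1), the following Python sketch estimates an unknown node's position as the centroid of the beacon (anchor) nodes whose signals it can hear. The function name and the RSSI threshold used to decide which beacons are "in range" are illustrative assumptions rather than part of the original algorithm description.

```python
def centroid_estimate(unknown_rssi, beacons, rssi_threshold=-75.0):
    """Estimate an unknown node's position as the centroid of the
    beacon nodes it can hear (Formula (1)).

    unknown_rssi   : dict mapping beacon id -> received signal strength (dBm)
    beacons        : dict mapping beacon id -> (x, y) coordinates
    rssi_threshold : illustrative cut-off deciding which beacons count as in range
    """
    # Keep only beacons whose signal is strong enough to be considered in range.
    in_range = [beacons[b] for b, rssi in unknown_rssi.items()
                if b in beacons and rssi >= rssi_threshold]
    if not in_range:
        return None  # the node hears no beacon, so it cannot be located
    # Formula (1): average the vertex coordinates of the polygon formed by the beacons.
    n = len(in_range)
    return (sum(p[0] for p in in_range) / n,
            sum(p[1] for p in in_range) / n)

# Example: three beacons heard by the unknown node.
beacons = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}
rssi = {"A": -60.0, "B": -70.0, "C": -65.0}
print(centroid_estimate(rssi, beacons))  # -> (5.0, 2.666...)
```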
If the polygon is situated within the solved region and has matching coordinates, then the centroid coordinates of an octagon, for example, can be computed using Formula (2).

$$(x, y) = \left( \frac{x_1 + x_2 + \cdots + x_8}{8}, \ \frac{y_1 + y_2 + \cdots + y_8}{8} \right) \qquad (2)$$

The $(x_1, y_1)$ to $(x_8, y_8)$ in Formula (2) represent the coordinates of the eight vertices. To use the centroid positioning algorithm for positioning, it is essential to rely on the smoothness of the entire network structure and the specific distribution of positioning nodes within the network. If an error occurs in the coordinates calculated by unknown nodes, the estimate will be biased towards areas with densely distributed beacon nodes, potentially resulting in significant errors with the centroid algorithm. Therefore, the algorithm's calculation accuracy is typically not high, and the positioning accuracy may be low. However, the centroid algorithm only needs to broadcast once to locate all unknown nodes. In many applications that do not require high positioning accuracy, the centroid algorithm is still the most suitable method.

APIT is an improved algorithm based on the centroid algorithm. It requires a completely random selection of many known coordinate nodes, and the coordinate nodes are grouped in threes. In accordance with these nodes, the triangles drawn on the graph will be completely randomly distributed throughout the entire region, and there will be some overlap between these triangles, which is used to calculate the coordinates of unknown nodes. The specific operation steps are as follows. First, multiple coordinate nodes around the unknown nodes are identified, and three known location nodes are randomly selected each time. Then, the approximate location of the signals received by these known location nodes is determined. If there are $m$ beacon nodes, the paper randomly selects and matches them, and uses the combination of three random position points to form $C_m^3$ triangles. Some triangular regions contain unknown nodes, while others do not. These specific points, whose triangular regions contain unknown nodes, are connected to each other. Finally, the recorded location algorithm is utilized to calculate the specific location of the unknown nodes. The incorporation of a greater number of unknown nodes into the algorithm results in enhanced accuracy in location estimation [12-13]. However, this is accompanied by an increased computational burden. In such a scenario, choosing a subset of vertices to create a polygon based on the real circumstances, as illustrated in Figure 2, can be beneficial. In Figure 2, the node positioning accuracy of the APIT is significantly greater than that of the centroid positioning algorithm.

Figure 2: Schematic diagram of APIT algorithm positioning
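To make the triangle test concrete, the Python sketch below checks whether a node lies inside the triangle formed by three beacons and averages the centroids of all containing triangles. This is a simplified stand-in for the full APIT procedure (which decides containment from neighbour signal-strength comparisons rather than a known coarse position); the coarse-position input and the function names are assumptions for illustration only.

```python
from itertools import combinations

def inside_triangle(p, a, b, c):
    """Return True if point p lies inside (or on) triangle abc, using signed areas."""
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def apit_like_estimate(coarse_position, beacons):
    """Average the centroids of all beacon triangles that contain the node.

    coarse_position : a coarse position guess used here for the containment test
    beacons         : list of (x, y) beacon coordinates
    """
    containing = []
    # Enumerate every C(m, 3) combination of beacons, as in the APIT description.
    for a, b, c in combinations(beacons, 3):
        if inside_triangle(coarse_position, a, b, c):
            containing.append(((a[0] + b[0] + c[0]) / 3.0,
                               (a[1] + b[1] + c[1]) / 3.0))
    if not containing:
        return None
    return (sum(p[0] for p in containing) / len(containing),
            sum(p[1] for p in containing) / len(containing))

beacons = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 12)]
print(apit_like_estimate((4.0, 5.0), beacons))
```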
Due to the relatively large impact of node density on APIT, when the beacon node density is relatively large, APIT can achieve relatively ideal positioning accuracy. APIT also performs well under irregular wireless signal propagation models and non-ideal circular propagation models. However, APIT also has a significant disadvantage. When connecting triangles, it may mistake points located outside a triangle for points inside the triangle. Research shows that the probability of this situation can reach a maximum of 13% [14], which has a significant impact on positioning accuracy. The algorithm must divide a large number of triangular regions to identify the locations of unknown nodes and necessitates multiple beacon nodes. As a result, the algorithm performs numerous calculations, which elevates the likelihood of encountering errors.

3.2 Construction of improved genetic algorithm model

The research and analysis of the centroid and APIT algorithms in node location optimization have revealed shortcomings in both algorithms concerning their calculation and location processes. Additionally, the use of genetic algorithms for node localization requires extra constraints, which may lead to increased computational time and reduced efficiency, resulting in premature convergence [15]. To obtain better positioning optimization results, an IGA model is studied and constructed. The flow chart of the model is shown in Figure 3, and the blue box in the figure shows the improved steps.

Compared with Traditional Genetic Algorithms (TGA), the paper has improved the node localization of genetic algorithms and constructed a matrix. The specific construction of the matrix is shown in Formula (3).

$$S(m) = \begin{pmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \qquad (3)$$
In Formula (3), $m$ represents a total of $m$ chromosomes in the genetic algorithm, $n$ means that each chromosome has $n$ elements, and $s_1, s_2, \ldots, s_m$ are the chromosomes. In TGA, determining the value range of genes is a commonly used method to generate an initial population for calculation. If a certain number of initial individuals are randomly generated within this value range, the distribution may be too random, which is basically not helpful for improving the algorithm efficiency [16]. To obtain a global optimal solution, the distribution of the initial population in the solution space should be as uniform and dispersed as possible. The schematic diagram of random initial population generation within the overlapping range of the communication areas of different anchor nodes is shown in Figure 4.

IGA performs improved optimization over TGA in parameter setting, population initialization, fitness function values, selection operations, and crossover operations. The specific key parameter settings are: the population size is 40, the crossover probabilities $p_{c1}$ and $p_{c2}$ are 0.6 and 0.4, the mutation probabilities $p_{m1}$ and $p_{m2}$ are 0.08 and 0.06, and the maximum number of iterations is 100. Population initialization determines the initial population range according to Formula (4) and generates an initial population randomly within this range.

$$\max_{i=1,2,\ldots,n}(x_i - d_i) \le x \le \min_{i=1,2,\ldots,n}(x_i + d_i), \qquad \max_{i=1,2,\ldots,n}(y_i - d_i) \le y \le \min_{i=1,2,\ldots,n}(y_i + d_i) \qquad (4)$$
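A minimal Python sketch of this initialization step is given below: each chromosome encodes one candidate coordinate pair, and the population is drawn uniformly from the bounding box of Formula (4), defined by the anchor positions and the measured distances to them. The uniform sampling, the function names, and the toy data are assumptions for illustration.

```python
import random

def init_population(anchors, dists, pop_size=40, seed=0):
    """Generate an initial population inside the region defined by Formula (4).

    anchors  : list of (x_i, y_i) anchor coordinates
    dists    : list of measured distances d_i from the unknown node to each anchor
    pop_size : number of chromosomes (the paper sets 40)
    Returns a pop_size x 2 matrix: each row is one chromosome, a candidate (x, y).
    """
    rng = random.Random(seed)
    x_lo = max(x - d for (x, _), d in zip(anchors, dists))
    x_hi = min(x + d for (x, _), d in zip(anchors, dists))
    y_lo = max(y - d for (_, y), d in zip(anchors, dists))
    y_hi = min(y + d for (_, y), d in zip(anchors, dists))
    return [[rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)]
            for _ in range(pop_size)]

anchors = [(10.0, 20.0), (40.0, 25.0), (25.0, 60.0)]
dists = [30.0, 28.0, 35.0]
population = init_population(anchors, dists)
print(len(population), population[0])
```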
Figure 3: The improvement process of genetic algorithms (initialization, individual fitness calculation, selection, crossover, and mutation, generating a new population until the stop rule is met)
Figure 4: Schematic diagram of the initial population generation area (overlap of the communication regions of the anchor nodes)
In Formula (4), $d_i$ means the distance between the unknown node $i$ and the anchor. The fitness function calculation assumes a total of $(M + N)$ nodes in the wireless sensor network to be located, where the number of known nodes is $M$ and the number of unknown nodes is $N$. Through a certain distance measurement method, if each unknown node knows the distance between itself and all known nodes within its communication radius, the calculated node position can be obtained through the least squares method [17]. Assuming that the coordinates of the nodes at known locations are $(x_1, y_1), (x_2, y_2), \ldots, (x_M, y_M)$, the coordinate of an unknown node is $(x, y)$, and the distances from the nodes at known locations are $d_1, d_2, d_3, \ldots, d_M$, the equation set shown in Formula (5) can be established.

$$\begin{cases} (x - x_1)^2 + (y - y_1)^2 = d_1^2 \\ (x - x_2)^2 + (y - y_2)^2 = d_2^2 \\ (x - x_3)^2 + (y - y_3)^2 = d_3^2 \\ \qquad \vdots \\ (x - x_M)^2 + (y - y_M)^2 = d_M^2 \end{cases} \qquad (5)$$

From Formula (5), the fitness function for the genetic algorithm can be defined as Formula (6), and the fitness function of the initial population can be calculated by using it.

$$f(x, y) = \frac{1}{M}\sum_{i=1}^{M}\left|\sqrt{(x - x_i)^2 + (y - y_i)^2} - d_i\right| \qquad (6)$$

In Formula (6), $(x, y)$ is the unknown node location, $(x_i, y_i)$ represents a known node location, and $d_i$ refers to the distance from the unknown location to the known location $(x_i, y_i)$. The use of absolute error instead of squared error in the fitness function avoids the calculation of squared error, reduces complex multiplication operations, and lowers the computational load. During the iteration process, absolute error is more robust to outliers (i.e., data with larger deviations have less impact) and also enables the algorithm to approach the global optimal solution faster, accelerating the convergence speed of the algorithm.

The selection operation performs a unified comparison of each individual based on the fitness value calculated from the fitness function. After the comparison is completed, the two individuals with the highest fitness remain unchanged and proceed to the next round of operation. The individuals with the lowest fitness are directly eliminated, and the remaining individuals normally undergo crossover and mutation operations. Special individuals with high fitness values are assigned judgment values to distinguish and limit their reproduction. After completing the full iterative process, the fitness value of each individual should be appropriately amplified [18-19].

The crossover probability is used to control the probability of individuals (chromosomes) performing crossover operations. By calculating an individual's fitness value, it can be determined whether the individual should participate in crossover operations. The goal of the crossover operation is to generate offspring with higher fitness by recombining the genetic information of the parent individuals, gradually approaching the optimal solution. When performing a crossover operation, if $F_g \ge F_{avg}$, the crossover probability is calculated according to Formula (7).

$$P_c = p_{c1} \cdot \frac{F_g - F_{avg}}{F_{gb} - F_{avg}} \qquad (7)$$

In Formula (7), $p_{c1} \in (0, 1)$, $F_g$ is the value of the individual's fitness function, $F_{gb}$ represents the fitness function value of the optimal individual, and $F_{avg}$ is the average value of the fitness function. The calculation process includes normalizing the fitness difference and converting the normalized fitness difference into an actual crossover probability. If $F_g < F_{avg}$, the crossover probability is calculated according to Formula (8).

$$P_c = p_{c2} \qquad (8)$$

In Formula (8), $p_{c2} \in (0, 1)$.
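As an illustration of Formulas (6)–(8), the Python sketch below evaluates a chromosome with the absolute-error criterion and derives an adaptive crossover probability. Formula (7) presupposes a fitness in which larger values are better (so that the optimal individual satisfies $F_{gb} \ge F_{avg}$); the sketch therefore maps the localization error into such a fitness with 1/(1+error). That mapping, the helper names, and the toy data are assumptions for illustration only, not the paper's implementation.

```python
import math

def localization_error(chrom, anchors, dists):
    """Formula (6): mean absolute difference between estimated and measured distances."""
    x, y = chrom
    return sum(abs(math.hypot(x - xi, y - yi) - di)
               for (xi, yi), di in zip(anchors, dists)) / len(anchors)

def ga_fitness(chrom, anchors, dists):
    # Assumed mapping: turn the error (smaller is better) into a fitness
    # (larger is better) so that Formula (7) can be applied directly.
    return 1.0 / (1.0 + localization_error(chrom, anchors, dists))

def crossover_probability(f_g, f_avg, f_gb, pc1=0.6, pc2=0.4):
    """Formulas (7)-(8): adaptive crossover probability (pc1, pc2 as set in the paper)."""
    if f_g >= f_avg and f_gb > f_avg:
        return pc1 * (f_g - f_avg) / (f_gb - f_avg)
    return pc2

anchors = [(10.0, 20.0), (40.0, 25.0), (25.0, 60.0)]
dists = [30.0, 28.0, 35.0]
population = [(12.0, 30.0), (20.0, 40.0), (35.0, 45.0)]
fits = [ga_fitness(c, anchors, dists) for c in population]
f_avg, f_gb = sum(fits) / len(fits), max(fits)
for chrom, f in zip(population, fits):
    print(chrom, round(f, 4), round(crossover_probability(f, f_avg, f_gb), 4))
```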
After pairing the chromosomes in the population, the crossover operation is performed based on the calculated crossover probability. A random number between 0 and 1 is generated for each chromosome. The objective of this treatment of poorly adapted individuals is to give those with lower fitness a certain opportunity to participate in crossover, increase population diversity, and circumvent premature convergence to local optimal solutions. If the corresponding value is less than the crossover probability, the chromosome is ready to perform the next operation. The chromosomes for the next step are sequentially crossed in pairs. For each pair of crossed chromosomes, the location of the crossing point is determined by random numbers and the crossover operation is performed. During the mutation operation, if $F_g \ge F_{avg}$, the mutation probability is calculated by Formula (9).

$$P_m = p_{m1} \cdot \frac{F_g - F_{avg}}{F_{gb} - F_{avg}} \qquad (9)$$

In Formula (9), $F_g$ represents the fitness function value of the individual, $F_{gb}$ is the fitness function value of the optimal individual, and $F_{avg}$ refers to the average value of the fitness function. If $F_g < F_{avg}$, the mutation probability is calculated by Formula (10).

$$P_m = p_{m2} \qquad (10)$$

In Formula (10), $p_{m2} \in (0.01, 0.10)$. The first step is to randomly generate a number between 0 and 1 for each chromosome in the population. If the generated value is less than the mutation probability, the chromosome will undergo the mutation operation. The position requiring mutation is determined by generating a random number, and the next step is to invert the value at that position to complete the relevant mutation operation.
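In the same illustrative spirit, the sketch below applies the adaptive mutation probability of Formulas (9) and (10) and performs a simple per-gene perturbation. The perturbation style (re-sampling one coordinate inside the search bounds) and the names are assumptions, since the paper only states that the value at the selected position is inverted.

```python
import random

def mutation_probability(f_g, f_avg, f_gb, pm1=0.08, pm2=0.06):
    """Formulas (9)-(10): adaptive mutation probability (pm1, pm2 as set in the paper)."""
    if f_g >= f_avg and f_gb > f_avg:
        return pm1 * (f_g - f_avg) / (f_gb - f_avg)
    return pm2

def mutate(chrom, bounds, p_m, rng):
    """With probability p_m, re-sample one randomly chosen gene within its bounds."""
    chrom = list(chrom)
    if rng.random() < p_m:
        gene = rng.randrange(len(chrom))      # position selected by a random number
        lo, hi = bounds[gene]
        chrom[gene] = rng.uniform(lo, hi)     # assumed perturbation of that gene
    return tuple(chrom)

rng = random.Random(1)
bounds = [(12.0, 40.0), (25.0, 50.0)]         # x and y ranges from Formula (4)
parent = (20.0, 40.0)
p_m = mutation_probability(f_g=0.9, f_avg=0.7, f_gb=1.0)
print(p_m, mutate(parent, bounds, p_m, rng))
```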
Monotonic gene locus detection analyzes the whole population and identifies any monotonic gene loci present. If these are detected, targeted adjustments can be made by generating random numbers. The termination condition is evaluated based on the number of iterations; once the loop terminates, the optimal solution is output and the average positioning error of the algorithm is tested as a performance parameter, as shown in Formula (11).

$$error = \frac{100}{N \cdot R}\sum_{i=1}^{N}\sqrt{(x_{i1} - x_{i2})^2 + (y_{i1} - y_{i2})^2}\ \% \qquad (11)$$

In Formula (11), $N$ is the total number of unknown nodes, $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$, $i = 1, 2, 3, \ldots, N$, represent the actual and calculated coordinates of the unknown node $i$, and $R$ is the maximum communication distance of a node.

In the process of transforming IGA theory into practical applications, the rigor of the mathematical analysis is reflected in the precise modeling and dynamic adjustment of the fitness function, crossover probabilities, and mutation probabilities. The dynamic allocation of the crossover probability is achieved by comparing individual fitness with the population average fitness and the optimal fitness. This allows individuals with higher fitness to have a higher probability of crossover, thereby accelerating the spread of excellent genes, while preserving a small number of crossover opportunities for individuals with lower fitness and maintaining population diversity. This normalization mechanism based on fitness differences effectively balances local search and global search, avoids premature convergence of the algorithm, and improves solution accuracy and efficiency. Furthermore, the implementation of random number generation techniques and probability judgment processes enables the transformation of theoretical models into practical operations, thereby ensuring the randomness and controllability of crossover and mutation. This approach facilitates the robustness and convergence of the algorithm in complex optimization problems, achieving an efficient integration of theory and practice. When using IGA for computer network optimization design, the network optimization problem is first modeled as a fitness function that measures network performance indicators. Then, through iterative evolution with selection, crossover, and mutation operations, the crossover and mutation probabilities are dynamically adjusted to optimize the network structure and parameter configuration, thereby achieving efficient and accurate network optimization design.

4 Performance analysis of computer network optimization design model based on genetic algorithms

4.1 Performance analysis of node location based on centroid location algorithm and APIT algorithm

To verify the actual positioning effects of the centroid algorithm, APIT, and IGA, simulation experiments are conducted on the three algorithms in MATLAB. The reason for choosing APIT and the centroid algorithm as benchmarks is their effectiveness and wide application in network optimization. The APIT algorithm performs well in localization problems and is suitable for evaluating the accuracy and reliability of network nodes, serving as a benchmark for network performance optimization in this research. The centroid algorithm is known for its simplicity, ease of use, and fast convergence, making it suitable for solving optimization problems in basic network structures. The selection of these two algorithms covers different types of network optimization requirements. Through comparison, the advantages of IGA in solving complex optimization problems can be clearly demonstrated. MATLAB version R2021a is used, and the hardware specifications are as follows: Intel Core i7-9700K processor, 32 GB DDR4 RAM, 512 GB solid-state drive, and Windows 10 Professional 64-bit operating system.
The algorithm sets the population size to 100, the number of iterations to 500, the crossover probability to 0.8, and the mutation probability to 0.05. The elite strategy retains the top 10% of excellent individuals. To ensure the statistical validity of the test scenario, multiple sets of experiments are designed and optimized for network topologies of different sizes and complexities. Each experiment is repeated at least 30 times to obtain stable average performance indicators and standard deviations, ensuring the reliability of the results. For the collected performance indicators, statistical analysis is used to evaluate the significant differences between algorithms under different configurations, thereby determining the efficacy of the optimization effects. When comparing, a null hypothesis and an alternative hypothesis are set; the P-value represents the probability of obtaining the current or a more extreme result under the null hypothesis. The t-test is used to compare the results of IGA and the benchmark algorithms. If the P-value is less than 0.05, the difference is considered statistically significant.
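For concreteness, the snippet below shows how the repeated-run protocol and the t-test comparison described here could be scripted, using the average positioning error of Formula (11) as the compared indicator. The toy coordinates, the synthetic per-run errors, and the use of SciPy's independent two-sample t-test are illustrative assumptions rather than the study's actual evaluation code.

```python
import math
import numpy as np
from scipy import stats

def avg_positioning_error(actual, estimated, comm_radius):
    """Formula (11): mean localization error as a percentage of the communication radius."""
    n = len(actual)
    total = sum(math.hypot(ax - ex, ay - ey)
                for (ax, ay), (ex, ey) in zip(actual, estimated))
    return 100.0 * total / (n * comm_radius)

# One illustrative run: Formula (11) applied to three unknown nodes with R = 30.
actual = [(10.0, 12.0), (40.0, 55.0), (70.0, 20.0)]
estimated = [(11.0, 13.5), (38.5, 56.0), (71.0, 18.0)]
print(f"single-run error = {avg_positioning_error(actual, estimated, 30.0):.2f}%")

# Assume each array holds the error (%) of one of the >=30 repeated runs per algorithm.
rng = np.random.default_rng(0)
iga_errors = rng.normal(loc=5.2, scale=0.6, size=30)    # illustrative synthetic runs
apit_errors = rng.normal(loc=9.0, scale=0.9, size=30)

t_stat, p_value = stats.ttest_ind(iga_errors, apit_errors, equal_var=False)
print(f"mean IGA error  = {iga_errors.mean():.2f}%")
print(f"mean APIT error = {apit_errors.mean():.2f}%")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}",
      "-> significant at 0.05" if p_value < 0.05 else "-> not significant")
```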
The experiment generates 20 anchor nodes and 80 unknown nodes in the 100×100 area. After generating this region, the nodes are predicted by running the corresponding algorithm. Figure 5 shows the original node distribution diagram. The green circles in Figure 5 represent anchor nodes, while the blue pentagons represent unknown nodes. Figure 6 shows the positioning results of the three algorithms. The positioning results of the centroid algorithm, APIT, and IGA are shown in Figure 6(a), Figure 6(b), and Figure 6(c), respectively.

In Figure 6, the predicted values of IGA have the highest coincidence rate with the unknown nodes, reaching 94.36% (P<0.05). The predicted values of the centroid algorithm have the lowest coincidence rate with the unknown nodes, which is 86.25%. The coincidence rate between the predicted values of APIT and the unknown nodes is 89.67%. The coincidence rate of the IGA is 8.16% higher than that of the centroid algorithm and 4.69% higher than that of the APIT algorithm (P<0.05). This means IGA has a high positioning computing ability. The positioning errors of the centroid algorithm, APIT, and IGA are listed in Figure 7.
Figure 5: Original node distribution diagram (anchor nodes and unknown nodes in the 100×100 area)
Figure 6: Location results of three algorithms ((a) centroid algorithm positioning results, coincidence rate 86.25%; (b) APIT positioning results, coincidence rate 89.67%; (c) improved genetic algorithm location results, coincidence rate 94.36%)
Figure 7: Positioning error of three algorithms (centroid algorithm, APIT, and improved genetic algorithm, per unknown node)
The centroid algorithm in Figure 7 reaches a maximum error rate of 32% during positioning, and the overall average node positioning error rate is 14% (P<0.05). The maximum error rate of APIT in positioning is 20%, but the average value of the overall error decreases to 9% (P<0.05). This indicates that, compared to the centroid positioning algorithm, the predicted coordinate error calculated by the APIT positioning algorithm is significantly reduced, with better positioning results. The maximum error rate of IGA during positioning is 13%, and the average value of its overall error is 5.2% (P<0.05). The maximum error of IGA is 19% lower than that of the centroid algorithm, and the overall average error is 8.8% lower (P<0.05). Compared to APIT, the maximum error of IGA is 7% lower, and the average overall error is 3.8% lower (P<0.05). APIT improves positioning accuracy through random triangle coverage, but relies on high-density anchor nodes and is prone to misidentifying points outside a triangle as internal points. The centroid algorithm is computationally simple and suitable for scenarios that do not require high accuracy. However, its accuracy is low and it is easily affected by uneven node density, resulting in positioning bias towards areas with dense anchor nodes and significant errors. IGA introduces a method of dynamically adjusting the crossover probability and mutation probability during the evolution process. It adjusts dynamically based on individual fitness and population average fitness, avoiding premature convergence of the algorithm and ensuring the search for the global optimal solution. IGA performs local fine optimization, improving the accuracy and stability of the algorithm. The comparison of the three shows that IGA has excellent node positioning capabilities in wireless sensor networks.
4.2 IGA-based application analysis of computer network optimization design

There are differences in the performance of IGA and other node localization algorithms for wireless sensor networks under different iterations. Hence, under the same parameter conditions, this study gradually changes the number of iterations and runs simulations using TGA, the centroid positioning algorithm, APIT, and IGA, respectively. Thus, the iteration number and its corresponding fitness value are obtained. As the number of iterations gradually increases, Figure 8 illustrates the corresponding changes in fitness between the IGA and the other algorithms for positioning wireless sensor network nodes.
Figure 8: Changes in fitness values of the four methods (traditional genetic algorithm, centroid positioning algorithm, APIT location algorithm, and improved genetic algorithm) over 100 iterations
Figure 9: Relationship between the number of anchor nodes and average error for the four methods
Figure 8 shows that the fitness values of the four localization algorithms are all less than 10. The fitness values of IGA, TGA, the centroid algorithm, and APIT are 4.26, 8.15, 6.42, and 5.31, respectively, and the corresponding iteration numbers are 69, 86, 83, and 79. The IGA's fitness value is the lowest, 3.89 lower than that of TGA,
2.03 lower than that of the centroid algorithm, and 1.05 lower than that of APIT. This shows that IGA has better adaptability in node localization and verifies the superiority of this algorithm. The relationship between the number of anchor nodes and the average error value of the four algorithms is shown in Figure 9.

In Figure 9, as the number of anchor nodes increases, the average error of the four positioning algorithms gradually decreases. The average errors of TGA, the centroid algorithm, APIT, and IGA are 1.12, 1.03, 0.95, and 0.68, respectively (P<0.05). The average error of IGA is significantly lower than that of TGA. Meanwhile, as the number of anchor nodes increases gradually in proportion to all nodes, the average error of the various algorithms undergoes only slight changes, as shown by the curves. When the number of anchor nodes is kept the same, the average error of IGA is the lowest among all four algorithms. This fully demonstrates the advantages of IGA. Figure 10 displays the relationship between the node communication radius and the average error value of the four algorithms.

In Figure 10, as the communication radius of the nodes increases, the average error of the four node positioning algorithms gradually decreases. The average errors of TGA, the centroid algorithm, APIT, and IGA are 5.75%, 4.52%, 3.87%, and 2.46%, respectively (P<0.05). For the same communication radius, the IGA's average error is always the lowest. This verifies the superiority of IGA when the communication radius of nodes changes. Figure 11 shows the relationship between the network connectivity and the average error value of the four algorithms.
Figure 10: Relationship between node communication radius and average error for the four methods
Figure 11: Relationship between network connectivity and average error value for the four methods
The average errors of TGA, the centroid algorithm, APIT, and IGA in Figure 11 are 5.41%, 4.49%, 3.71%, and 2.41%, respectively (P<0.05). This indicates that, regardless of changes in network connectivity, IGA's positioning ability is always higher than that of the other three algorithms. Figure 12 shows the relationship between node coverage, evolutionary generations, and completion time for TGA and IGA.

The node coverage of the two algorithms in Figure 12(a) shows that, at the same node density, IGA has a higher regional coverage. Figure 12(b) displays the relationship between the number of evolutionary generations and the completion time of both algorithms over successive iterations of the genetic algorithm.
Figure 12: The relationship between node coverage, evolutionary generations, and completion time of the two algorithms ((a) node coverage of the two algorithms versus node density; (b) evolutionary generations versus completion time)
As the iterations progress, the number of evolutionary generations gradually increases while the time required to complete all iterations decreases. However, for the same evolutionary generation, IGA takes less time. This shows the superiority and stability of the IGA. To further analyze the adaptability and superiority of the research method, a further application analysis is conducted in a large-scale wireless sensor network node positioning scenario in a region with a side length of 500 m. The total number of sensor nodes in the region is 500, including 100 anchor nodes and 400 unknown nodes. The results of the large-scale application analysis are shown in Table 2.
Table 2: Results of application analysis in large-scale scenarios

Metrics | IGA | APIT Algorithm | Centroid Algorithm
Average Positioning Error (%) | 2.45 | 5.62 | 7.89
Positioning Time (s) | 12.3 | 18.4 | 9.6
Number of Iterations | 75 | 120 | 60
Convergence Speed | Fast (0.5 fitness variation) | Medium (0.8 fitness variation) | Slow (1.2 fitness variation)
Node Coverage Rate (%) | 98.6 | 90.5 | 85.3
As shown in Table 2, the average positioning error of IGA is 2.45%, significantly lower than the 5.62% of the APIT algorithm and the 7.89% of the centroid algorithm (P<0.05). This indicates that IGA can effectively improve the accuracy of node localization in large-scale scenarios and is suitable for complex, high-precision network environments. The positioning time of IGA is 12.3 seconds, which is between the 18.4 seconds of APIT and the 9.6 seconds of the centroid algorithm (P<0.05). Although its computational complexity is slightly higher than that of the centroid algorithm, IGA improves efficiency by optimizing the evolution process, enabling it to maintain fast computational speed while achieving high-precision positioning. After 75 iterations, IGA achieves convergence, which is faster than the APIT algorithm's 120 iterations (P<0.05), demonstrating the advantages of IGA's dynamic parameter adjustment and elite strategy in the search process. Although the centroid algorithm uses fewer iterations, its accuracy is significantly insufficient (P<0.05). The node coverage rate of IGA reaches 98.6%, which is much higher than the 90.5% of the APIT algorithm and the 85.3% of the centroid algorithm (P<0.05). This indicates that IGA has better coverage performance in large-scale networks and can optimize the node positioning layout more comprehensively.

4.3 Discussion

This study has designed an IGA that effectively improves the accuracy and stability of node localization in wireless sensor networks through techniques such as dynamic parameter adjustment, fitness function optimization, and an elite strategy. Compared with the traditional centroid algorithm and the APIT algorithm, IGA exhibits significant advantages in key performance indicators. Specifically, the average positioning error of IGA was 2.45%, much lower than APIT's 5.62% and the centroid algorithm's 7.89%, indicating that IGA has significant advantages in node positioning accuracy. At the same time, IGA had a faster convergence speed, requiring only 75 iterations to reach the optimal solution, with a stable fitness value change (0.5). APIT and the centroid algorithm required 120 and 60 iterations, respectively, and had slower convergence behavior. In addition, IGA achieved a node coverage rate of 98.6%, significantly higher than APIT (90.5%) and the centroid algorithm (85.3%), demonstrating its applicability and advantages in large-scale complex network environments.

The reason why IGA outperforms traditional methods in terms of positioning error and fitness values is mainly due to several key technological innovations. Firstly, the dynamic parameter adjustment mechanism can dynamically adjust the crossover probability and mutation probability based on the fitness value, thereby balancing global and local search and preventing the algorithm from getting stuck in local optimal solutions. This is consistent with the ideas of Yu et al. [20]. Secondly, fitness function optimization reduces computational complexity and enhances robustness to outliers by introducing absolute error instead of the traditional squared error, enabling the algorithm to approach the global optimal solution more quickly. In addition, the elite strategy ensures the retention of high-fitness individuals and reduces the loss of high-quality solutions. The uniform distribution of the initial population within the communication area improves search efficiency and reduces ineffective calculations caused by random initialization. The results obtained are consistent with Singh et al.'s study [21]. These improvements effectively address common pitfalls of TGAs, such as local optima and premature convergence, enabling IGA to exhibit higher stability and accuracy in complex dynamic network environments. The core innovation of IGA lies in combining the local improvement of TGAs with global search, which is suitable for non-standard situations such as uneven node
184 Informatica 49 (2025) 171–186 C. Zhong et al.
distribution, limited numbers of anchor nodes, and complex conditions such as changes in the communication radius. In practical applications, IGA demonstrates good stability and adaptability by flexibly adjusting parameters and optimizing the search space. In previous studies, TGAs often faced local optimal traps, leading to premature convergence of the algorithm. The study aims to enhance population diversity, reduce the interference of outliers on the search process, and accelerate convergence to the global optimal solution by providing low-fitness individuals with moderate opportunities for crossover and mutation. The research provides a more stable, accurate, and efficient solution for node localization and optimization in complex network environments.

5 Conclusion

The high-speed development of computer network technology has caused tremendous changes in people's production and life. Currently, computer network optimization still has the problem of low positioning accuracy of network nodes. To solve the related problems, this study constructed an IGA model and applied it to computer network optimization. Experimental results showed that IGA significantly improved location coverage and average location error compared to the centroid algorithm and APIT. The coincidence rate of the improved algorithm was 8.16% higher than the centroid algorithm's and 4.69% higher than that of the APIT algorithm. The maximum error of IGA was 19% lower than that of the centroid algorithm, and the overall average error was 8.8% lower. Compared to APIT, the maximum error of IGA was 7% lower, and the average overall error was 3.8% lower. Under the same parameters, TGA, the centroid algorithm, the APIT algorithm, and IGA were used to compare the performance of network nodes in computer networks. Experimental data were obtained: the fitness value of IGA, the number of anchor nodes and the average error, the communication radius and the average error, and the network connectivity and the average error were 4.26, 0.68, 2.46, and 2.41, respectively. IGA showed a significant improvement over the values calculated for the three comparison algorithms, which proves the accuracy and stability of the improved genetic positioning algorithm.

6 Abbreviated List
NT: Network Topology
PSOGA: Particle Swarm Optimization and Genetic Algorithm
RLN: Random Linear Network
IGA: Improved Genetic Algorithm
TGA: Traditional Genetic Algorithm
APIT: Approximate Point In Triangulation Test

References
[1] F. Wang, X. Lai, and N. Shi, "A multi-objective optimization for green supply chain network design," Decision Support Systems, vol. 51, no. 2, pp. 262-269, 2011. https://doi.org/10.1016/j.dss.2010.11.020
[2] Q. Liu, Z. Guo, and J. Wang, "A one-layer recurrent neural network for constrained pseudoconvex optimization and its application for dynamic portfolio optimization," Neural Networks, vol. 26, pp. 99-109, 2012. https://doi.org/10.1016/j.neunet.2011.09.001
[3] J. L. Ribeiro Filho, P. C. Treleaven, and C. Alippi, "Genetic-algorithm programming environments," Computer, vol. 27, no. 6, pp. 28-43, 1994. https://doi.org/10.1109/2.294850
[4] C. D. Lin, C. M. Anderson-Cook, M. S. Hamada, L. M. Moore, and R. R. Sitter, "Using genetic algorithms to design experiments: a review," Quality and Reliability Engineering International, vol. 31, no. 2, pp. 155-167, 2015. https://doi.org/10.1002/qre.1591
[5] W. B. Fan, F. Xiao, X. B. Chen, L. Cui, and S. Yu, "Efficient virtual network embedding of cloud-based data center networks into optical networks," IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 11, pp. 2793-2808, 2021. https://doi.org/10.1109/TPDS.2021.3075296
[6] B. Rajendran and S. Venkataraman, "Detection of malicious network traffic using enhanced neural network algorithm in big data," International Journal of Advanced Intelligence Paradigms, vol. 19, no. 3-4, pp. 370-379, 2021. https://doi.org/10.1504/ijaip.2021.116366
[7] A. Xiaokaiti, Y. Qian, and J. Wu, "Efficient data transmission for community detection algorithm based on node similarity in opportunistic social networks," Complexity, vol. 2021, pp. 1-18, 2021. https://doi.org/10.1155/2021/9928771
[8] R. Alsaqour, S. Kamal, M. Abdelhaq, Y. Zan, and D. Jerou, "Genetic algorithm routing protocol for mobile ad hoc network," Computers, Materials & Continua, vol. 68, no. 1, pp. 941-960, 2021. https://doi.org/10.32604/cmc.2021.015921
[9] B. Bu, "Mult-task equilibrium scheduling of Internet of Things: a rough set genetic algorithm," Computer Communications, vol. 184, pp. 42-55, 2022. https://doi.org/10.1016/j.comcom.2021.11.027
[10] R. Y. Wu, J. M. Ma, Z. X. Tang, X. H. Li, and K. K. R. Choo, "A generic secure transmission scheme based on random linear network coding," IEEE/ACM Transactions on Networking, vol. 30, no. 2, pp. 855-866, 2021. https://doi.org/10.1109/TNET.2021.3124890
[11] W. C. Chang and I. H. R. Jiang, "iClaire: A fast and general layout pattern classification algorithm with clip shifting and centroid recreation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 8, pp. 1662-1673, 2019. https://doi.org/10.1109/TCAD.2019.2917849
[12] T. Ganesan and P. Rajarajeswari, "Efficient sensor node connectivity and target coverage using genetic algorithm with Daubechies 4 lifting wavelet transform," International Journal of Communication Networks and Distributed Systems, vol. 28, no. 3, pp. 337-364, 2022. https://doi.org/10.1504/ijcnds.2022.122170
[13] S. T. Shishavan and F. S. Gharehchopogh, "An improved cuckoo search optimization algorithm with genetic algorithm for community detection in complex networks," Multimedia Tools and Applications, vol. 81, no. 18, pp. 25205-25231, 2022. https://doi.org/10.1007/s11042-022-12409-x
[14] B. Nahavandi, M. Homayounfar, A. Daneshvar, and S. Mohammad, "Hierarchical structure modelling in uncertain emergency location-routing problem using combined genetic algorithm and simulated annealing," International Journal of Computer Applications in Technology, vol. 68, no. 2, pp. 150-163, 2022. https://doi.org/10.1504/ijcat.2022.123466
[15] Z. Sabir, M. R. Ali, and R. Sadat, "Gudermannian neural networks using the optimization procedures of genetic algorithm and active set approach for the three-species food chain nonlinear model," Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 7, pp. 8913-8922, 2023. https://doi.org/10.1007/s12652-021-03638-3
[16] C. Zhao, W. X. Zhu, G. Qiao, and F. Zhou, "Optimisation method with node selection and centroid algorithm in underwater received signal strength localization," IET Radar, Sonar & Navigation, vol. 14, no. 11, pp. 1681-1689, 2020. https://doi.org/10.1049/iet-rsn.2020.0178
[17] Y. Zou, "Coupled neural networks and genetic algorithms application in the field of mine fire extinguishing," Informatica, vol. 48, no. 16, 2024. https://doi.org/10.31449/inf.v48i16.6317
[18] Y. M. Wu, Z. Li, C. X. Sun, Z. B. Wang, D. S. Wang, and Z. W. Yu, "Measurement and control of system resilience recovery by path planning based on improved genetic algorithm," Measurement and Control, vol. 54, no. 7-8, pp. 1157-1173, 2021. https://doi.org/10.1177/00202940211016094
[19] Y. Zhou, "Structural damage identification of large-span spatial grid structures based on genetic algorithm," Informatica, vol. 48, no. 17, 2024. https://doi.org/10.31449/inf.v48i17.6428
[20] M. Yu and S. Chai, "Adaptive iterative learning control for discrete time nonlinear systems with multiple iteration varying high order internal models," International Journal of Robust and Nonlinear Control, vol. 31, no. 15, pp. 7390-7408, 2021. https://doi.org/10.1002/rnc.5690
[21] G. Singh, V. K. Tewari, R. R. Potdar, and S. Kumar, "Modeling and optimization using artificial neural network and genetic algorithm of self-propelled machine reach envelope," Journal of Field Robotics, vol. 41, no. 7, pp. 2373-2383, 2024. https://doi.org/10.1002/rob.22255
https://doi.org/10.31449/inf.v49i16.7452 Informatica 49 (2025) 187-198 187
Optimization of Emergency Material Logistics Supply Chain Path
Based on Improved Ant Colony Algorithm
Mingbin Wei
College of Economics and Management, YanShan University, Qinhuangdao 066000, Hebei, China
E-mail: weimingbin@stumail.ysu.edu.cn
Keywords: emergency material, improved ant colony algorithm (IACA), logistics supply chain, travelling salesman
problem
Received: October 29, 2024
Path selection is a critical challenge in emergency logistics management, particularly under realistic
disaster-related conditions. This study addresses the problem of optimizing logistics transportation during
major epidemics, considering constraints such as vehicle load, volume, and maximum travel distance per
delivery. The goal is to minimize costs related to distribution trips, time, early/late penalties, and fixed
vehicle expenses. By framing the problem as a generalized Traveling Salesman Problem, we developed
an Improved Ant Colony Algorithm (IACA) to reduce the longest distribution path. Simulation data from
Pudong, Shanghai lockdown zones revealed that IACA outperformed the traditional ACO algorithm,
achieving a 30% cost reduction and higher accuracy (R² = 0.98). Additionally, experiments on gate
assignment and TSP demonstrated the algorithm's superior optimization ability and stability. Overall,
IACA enhances delivery route efficiency, lowers costs and energy consumption, and improves emergency
logistics performance, proving to be a robust and reliable solution.
Povzetek: Avtor je razvil izboljšan algoritem kolonije mravelj (IACA), ki optimizira poti v logistični
oskrbovalni verigi za nujne materiale.
1 Introduction

The logistics industry's use in many different industries is growing more and more common as the global economy develops. Researchers are becoming more aware of the crucial role emergency logistics (EL) plays in delivering supplies to disaster zones as a result of the exponential rise in emergency response operations and the rising frequency of disasters, both man-made and natural [1]. In particular, since 1980, natural catastrophes have claimed the lives of over 2.4 million people globally, and their economic toll has grown by more than 800%, reaching $210 billion in 2020 alone. 137 million people in China were directly impacted by various natural disasters in the same year, resulting in 19,956.7 hectares of crops being damaged, 370.15 billion yuan in direct economic losses, and 591 fatalities [2]. The National Disaster Reduction Commission and the China National Ministry of Emergency Management both state that China experienced 130 million individuals impacted by various natural disasters in 2018, 588 fatalities, and 264.46 billion RMB in direct economic losses. Using earthquakes as an example, in 2018, earthquakes of magnitude 6.0 or higher killed 3,068 people worldwide and wounded over 16,000 more. In order to reduce the number of casualties and property damage following a disruption, emergency materials must be delivered to the disaster zones promptly, precisely, and efficiently [3]. Emergency rescue is usually quite urgent because most disruptions are unpredictable and emergency material elements are very complicated.

In order to effectively provide goods while minimizing losses, emergency logistics was first established in 2004 to address the logistical challenges resulting from disasters. With its emphasis on facility placement and material delivery, it is essential to emergency decision-making. Supply channels are frequently disrupted by disasters, resulting in large losses; 15% to 20% of all disaster losses may be attributable to inefficient distribution [4]. The focus of humanitarian logistics modeling services, which account for 80-90% of rescue expenses, is on rapid reaction deployment, which is essential for successful rescue operations. The ideas under discussion centre on maximizing the distribution, transportation, and positioning of emergency supplies to rescue locations. For disaster response to be successful, emergency logistics efficiency must be increased [5].

For emergency rescue and relief efforts to be successful, the emergency supply chain must run smoothly. It is quite challenging to gather comprehensive information to support emergency operations during large-scale calamities, since they are frequently unanticipated and exceedingly destructive. The victims' livelihoods and security may be negatively impacted by ineffective or even halted emergency material operations brought on by a shortage of supplies and knowledge [6]. Therefore, while reacting to large-scale emergencies, it is essential to solve the fundamental concerns of rapid acquisition and
integration of comprehensive emergency supply chain materials and information flow.

The planning, coordinating, and carrying out of logistics operations during emergencies, disasters, and crises are guided by the framework and set of principles known as emergency logistics theory. It includes a variety of ideas, tactics, and procedures meant to guarantee the smooth and efficient movement of information, products, services, and resources in order to meet the demands of impacted communities and lessen the effects of the crisis. One study uses a hybrid technique of simulated annealing and ant colony optimization to offer a low-carbon vehicle route optimization model for logistics and distribution; for increased efficiency, it uses an adaptive elite individual reproduction strategy and adds a multifactor operator and a carbon emission factor [7]. Researchers are becoming more interested in emergency logistics. Nonetheless, the majority of recent studies address the location of facilities. This study focusses on the supply chain for emergency material logistics following a disaster. After a disaster strikes, it seeks to create the best plans possible for moving emergency supplies from one-to-many supply depots to disaster depots. Fig. 1 depicts the emergency management domain. In order to meet the demands of victims and finish rebuilding the disaster-stricken region after a disaster occurs, EM is a specific type of vehicle routing problem that examines how to transport relief materials from supply depots to demand depots (disaster areas). EMS is typically separated into many-to-many scenarios based on the quantity of supply depots, as seen in Fig. 2.

Figure 1: Supply depots to demand depots

Figure 2: Quantity of supply depots in a many-to-many scenario

The most important consideration should be the timeline. Only if the emergency supplies are delivered to the disaster supply depots accurately and on time will the damage be reduced. The fastest delivery time is therefore practically the most crucial factor in the improved ant colony model. We suggest an improved ACA-based approach to solving the emergency material path routing problem. To get the best answer, the process determines the shortest path between nodes using a travelling-salesman shortest-path tree structure [8]. According to research findings, the suggested approach performed admirably in various disaster networks. In conclusion, even though ACO has been researched extensively and shown to perform effectively in organising routes, it is incredibly uncommon to utilise ant colony optimisation to optimise the supply chain route for emergency material logistics in disaster areas.

The study's primary contributions fall under the following categories.
• First, this study identifies and measures the variables that have been discovered to affect ACA's efficacy from both an internal and external standpoint. This includes the IACA's shortest route while taking into account the supply chain's external environment, funding for materials and equipment, complex emergency decision-making, and material transportation deployment. This is a definite step in filling the knowledge gap in the body of existing literature.
• Second, according to research methodologies, the majority of models in use today use traditional algorithms including precise, heuristic, and meta-heuristic algorithms.
• Third, in order to show the validity of the results, we contrasted the outcomes of the IACA method with those of the conventional ACA strategy. Furthermore, these findings will help emergency managers better pinpoint the sources and means of important elements, as well as the causal and hierarchical connections among them, and contribute to the development of a robust and effective path.

This study is organized as follows: the literature review for this topic is presented in Section 2. The research strategy and methodology are described in depth in Section 3. The application of the proposed improved ACO algorithm is shown in Section 4, followed by the application's outcomes and a discussion of these findings. Section 5 summarises the findings, study shortcomings, and future research directions.

2 Literature review

These days, the transportation sector is growing quickly, and its main concentration is on the logistical distribution of perishable agricultural goods. The application usefulness of emergency material logistics supply chain path optimization methods is recognised by experts. The supply chain for perishables has been examined by several academics.
For the uncertainty of unanticipated events on urban roadways during shipping, a GA-based path optimisation model was introduced [9]. As part of the endeavour to reduce transportation costs, a logistics path optimisation model with a hard time window was developed to address the path dynamics. The outcomes of the path experiments showed that this path optimisation model performed successfully. In order to address the sustainable food supply chain optimisation issue, another study presented a mixed integer linear programming model; to reduce fuel costs, transportation expenses and carbon emissions were integrated, and Norwegian salmon exporters were used to conduct a suitability analysis [10]. For the prompt delivery of disaster relief materials after natural catastrophes, the team of [11] proposed a hybrid meta-heuristic algorithm. Three optimisation improvement approaches were presented, the urgency coefficient of each demand point was evaluated, and Harris hawk optimisation and random PSO were integrated. The outcome of the study shows that the suggested approach had a high degree of computational correctness.

Another study developed a path optimisation technique based on an enhanced GA to meet the cost and efficiency criteria of distributing fresh food. The procedure implemented a linear adaptive cross-variance technique and designated certain elements as penalty factors. The findings of the study show that the approach could successfully reduce the delivery path length and had a higher path optimisation efficiency [12].

A two-objective optimisation model was presented for the design of an adaptable perishable commodity supply chain [13]. During the process, product and route disruption deterioration were thoroughly examined, a utility-role GA was added to optimise the method, and dynamic pricing was employed to handle crises. The experimental results demonstrated the good flexibility of the suggested approach. The use of Ant Colony Optimization (ACO) or other computer-related technologies has also been researched by several academics.

An ACO-based optimisation technique was put forth by researchers to address the issue of UAV scheduling routes. The procedure was solved for DSP and optimised for hierarchical pheromone-based processing. According to the testing results, the suggested approach has outstanding planning speed and good path planning quality [14].

To solve the path routing problem of fourth-party logistics, a method based on the ant colony system and the improved grey wolf algorithm was proposed. During the process, which included a carrying capacity and reputation constraint from the beginning node to the destination node, known as the transit range, ratio utility theory was used to determine the customer's risk appetite. The results of the study demonstrated that the recommended strategy could effectively finish path optimisation planning [15].

For the return path planning challenge of reverse logistics networks, the author suggested an ACO-based path technique. The procedure created a MINLP model, evaluated costs using a closed-loop, multi-stage logistics network, and was tested on thirty instances. Results from experiments showed that the suggested approach could produce return pathways of excellent quality [16].

An ACO-based approach was also proposed for solving the nodal path routing problem. To get the best answer, the process determined the shortest path between nodes using a rooted shortest-path tree structure. According to research findings, the suggested approach performed effectively in networks of various sizes. In conclusion, despite extensive research and demonstrated effectiveness in route planning, ACO is currently rarely used to optimise the route taken by cold chain logistics to distribute perishable agricultural goods. Given the pressing need for additional technical references to support the growth of the cold chain logistics distribution industry, the improved ACO-based optimisation model was put forth, and the technical features of ACO were applied to optimise the cold chain logistics distribution path for perishable agricultural products [17].
Table 1: Summary of existing and suggested methods compared on computational accuracy, time efficiency and cost reduction

TGWO | Computational accuracy: the TGWO algorithm serves as a decision-support tool to boost supply chain performance, cut expenses, and improve cold chain logistics operations. | Time efficiency: compared to the TS and GWO algorithms, the TGWO method reduced the overall journey distance by 50.34% and 30.66%, respectively. | Cost reduction: in terms of the overall cost of distribution, it saved 14.34 percent and 9.03 percent.

IPSO | Computational accuracy: the algorithm and the emergency logistics vehicle route optimization model for severe epidemics suggested in that research work well. | Time efficiency: if every demand's delivery priority is met, an enhanced vehicle routing optimization algorithm that takes delivery urgency into account can save a certain amount of time. | Cost reduction: according to the sensitivity analysis, when the time cost is at its lowest, there should be three vehicles in the distribution center; the whole expense was lowered by 20.09%.

NN | Computational accuracy: the prediction findings demonstrate that the prediction can yield increased accuracy and better route matching. | Time efficiency: chooses the route for material delivery that has the quickest speed and the shortest distance. | Cost reduction: reduces the percentage of transportation expenses as much as possible and saves money on supplies and vehicles.

IACA (proposed) | Computational accuracy: the suggested model achieved the best solution accuracy, with 98.5%. | Time efficiency: this study uses the travelling salesman problem to find the shortest route and is time efficient. | Cost reduction: cost reduction based on transportation, inventory and labor; minimizes distance and avoids traffic and hazards, by 30.2%.
3 Methodology

Inspiration: The process of optimising ant colonies is iterative. Several fictitious ants are considered at each cycle. With the restriction that an ant does not go to any vertex it has already visited during its walk, each of them constructs a solution by moving from vertex to vertex on the graph. An ant uses a stochastic mechanism biassed by the pheromone to choose the next vertex to visit at each stage of the solution-building process. For instance, the next vertex from vertex i is selected at random from among those that have not been visited yet. More specifically, k can be chosen with a probability proportional to the pheromone connected to edge (m, n) if it has never been visited before. Depending on the calibre of the answers the ants have created, the pheromone values are modified at the end of an iteration. By doing this, the ants are biassed to create solutions that are similar to the best ones they have created in previous cycles. The basic concept underlying the AC method is inspired by the way ant colonies behave when they are looking for food. Ants usually start by aimlessly looking around for food, bringing some of what they find back to their colony. They also leave a pheromone on the track they found. The worth of the pheromones left in their wake, which gradually dissipates, depends on the quantity and calibre of the food supply. The pheromones that remain on a trail may persuade other ants to follow it. The strongest pheromone indicates a shorter path, which most ants can eventually follow.

3.1 Improved ant colony algorithm

This study uses the improved ant colony algorithm's high adaptability, multi-concurrency, resilience, and global search capabilities to handle the logistics supply chain's emergency material problem. Consequently, the enhanced ant colony method is presented to solve the model, since it is highly parallel, offers the benefits of high fault tolerance and self-adaptation, and allows for heuristic improvement to enhance the algorithm's convergence. The ant colony method is a travelling-salesman-type algorithm that finds the shortest path by simulating ant populations' foraging behavior. Individuals of the ant colony evaluate and choose the optimal foraging path based on the concentration of pheromones left by ants as they pass through nodes along the route. Rich customer and order data are challenging for traditional logistics to manage. Therefore, in the event of a natural disaster, the logistics automation path finder now incorporates an improved ACA. The drawbacks of conventional ACA are addressed with certain enhancements, which successfully resolve issues like resource scheduling and route planning in the logistical process. Figure 3 explains the flow of the emergency material path supplying method.
Figure 3: Block diagram of the method
3.2 Improved ant colony algorithm optimization strategy for emergency logistics material

Several production process connections are becoming more specialized due to the escalating market competitiveness. As a result of this tendency, logistics and commercial flow are now separated, progressively emphasising the significance of logistics. Conventional logistics models are inefficient and do not provide intelligent assistance. A lot of management and operations involve manual labor, which is ineffective and prone to mistakes. At the same time, businesses find it challenging to forecast and decide on logistical procedures when they lack intelligent help. Additionally, this hinders the rapid optimization and modification of logistical plans. The emergency logistics model has been developed based on this. The material logistics supply chain is a complete system that controls and optimizes the logistics process using a variety of automation technologies and tools. Logistics transportation costs are a significant part of the logistics chain, and they can be managed and controlled to increase the efficiency of logistical processes. The cost of logistics transportation is shown in eq. (1).

C_1 = \sum_{k=1}^{v} s_k \sum_{i=1}^{m} f_i    (1)

s_k is the variable in eq. (1). The vehicle number is v. The number of articles, including all transportation expenses, is denoted by m, and f is the cost of driving. Eq. (2) displays the associated transportation cost.

C_2 = \sum_{v=1}^{n} \sum_{i=0}^{n} \sum_{j=1}^{n} C_{ijk} x'_{ijk}    (2)

The transportation cost between places i and j is represented by C_{ijk} in eq. (2). The routing variable between transportation points is x'_{ijk}, and n is a constant.

IACO puts ants at the first dispersion site in Fig. 4. After creating a tabu table, the cycle is initiated. The random selection technique determines the next transfer node based on the probability of ants travelling to various nodes. The transfer node is added to the tabu table once it satisfies the constraints. The ants' relevant journey length and delivery cost are determined once the transfer is completed, repeatedly, until the tabu table is full. The global pheromone is adjusted and the path optimisation is finished based on the computed results. After the maximum number of cycles has been reached, the outcome is the optimal path solution. In the real-world application, the starting point and the associated fundamental path dispersion parameters are input. The research method is then used to develop the path solution, and the best solution is chosen to finish the path generating process.
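As a quick illustration of the two cost terms in eqs. (1) and (2), the following minimal Python sketch evaluates them numerically. The array shapes, variable names and toy numbers are assumptions made purely for illustration; the paper itself only gives the summations.

import numpy as np

def transport_fixed_cost(s, f):
    # Eq. (1): C1 = sum_k s_k * sum_i f_i, with s_k a per-vehicle variable and
    # f the per-vehicle driving costs of the m articles (given here as a K x m array).
    s = np.asarray(s, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(np.sum(s * f.sum(axis=1)))

def transport_variable_cost(c, x):
    # Eq. (2): C2 = sum over v, i, j of C_ijk * x'_ijk, with x'_ijk a 0/1 routing
    # indicator (vehicle k travels arc i -> j) and c the corresponding arc costs.
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(c * x))

if __name__ == "__main__":
    C1 = transport_fixed_cost(s=[1.0, 1.0], f=[[4.0, 2.0], [3.0, 1.0]])
    C2 = transport_variable_cost(c=np.ones((3, 3, 2)), x=np.zeros((3, 3, 2)))
    print(C1, C2)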
Figure 4: IACA flowchart

The logistics supply chain system consists of a server, front-end work, a mechanical arm, three-dimensional warehouse management, AGV monitoring, PLC monitoring, a sorting system, and a commodities warehouse, among other components. The sorting system's primary objective is to determine and categorise products; they are dispersed throughout various locations or forms of conveyance based on their type or destination. A commodities warehouse, which includes both automated and conventional warehouses, is a facility used for the storage of products. PLC and AGV monitoring are the two primary components of logistics system monitoring. Several components of the logistics system can be monitored and controlled using PLC monitoring, which functions as a programmable logic controller. Transporting items within or between warehouses requires AGV monitoring, which keeps track of the equipment's position and operational state. To guarantee precise storage and retrieval of items, the logistics system as a whole uses three-dimensional warehouse management, tracking, and computer-based management. Order processing, inventory queries, and other front-end tasks are the key tasks. The system uses a robotic arm to accomplish a variety of activities, including grabbing, moving, and assembling. The management level of conventional autonomous logistics systems is rather low, despite the growing need for logistics technology. It is necessary to enhance the capacity to develop transportation networks, manage the supply chain, and optimise warehouse operations. An optimisation algorithm called ACA mimics how ants forage for food in the wild. By mimicking the ant's pheromone transmission mechanism, it facilitates cooperation and information exchange throughout the optimisation process. Thus, adding ACA to the logistics control model can improve the management level of autonomous logistics control by successfully resolving issues like resource scheduling and path planning in the logistics process. Eq. (3) shows the node selection probability in ACA.

P_{ij}^{k}(t) = \frac{[\pi_{ij}(s)]^{\alpha} [\eta_{ij}(s)]^{\beta}}{\sum_{s} [\pi_{is}(s)]^{\alpha} [\eta_{is}(s)]^{\beta}}, \quad j \in \text{nodes not yet passed}    (3)

In eq. (3), \pi_{ij} is the pheromone concentration, \eta_{ij} is the visibility of the path, \alpha is the trade-off factor for the pheromone, and \beta is the heuristic factor for the expected values. \eta_{ij} is given in eq. (4).

\eta_{ij} = \frac{1}{d_{ij}}    (4)

In eq. (4), d_{ij} denotes the distance between point i and point j. After a complete ant colony cycle, the pheromones are updated accordingly. The expression for the pheromone update is shown in eq. (5).

\tau_{ij}(t+n) = (1-\rho)\tau_{ij}(t) + \Delta\tau_{ij}    (5)

The pheromone evaporation coefficient \rho is represented in eq. (5), and t is the time point. The next node is typically chosen at random by conventional ACA, though. While random selection facilitates the exploration of broader problem areas, the early stage's convergence speed is slower due to the lengthy application of positive feedback. If the complete supply chain is not coordinated and optimised, the logistics chain as a whole may operate inefficiently, which may increase time costs. In order to address this, the study introduces logistic chaotic mapping with the goal of leveraging its features to increase the precision of knowledge accumulation. Then, during the optimisation phase, some randomness is introduced into the basic ACA's routes.

In IACA, at the conclusion of every iteration, the ant that possesses the best solution for that iteration updates the optimal answer, because the method's initialization changes the disaster distance as per eq. (6). Additionally, each iteration updates the amounts of pheromone on the final path, as illustrated in eq. (7), where \tau_{i,j} is the pheromone updated on the last feedback, denoted by R_{i,j}; \tau_{i,j}^{*} is the updated pheromone value; \tau_0 is the initial pheromone; \Delta\tau_{i,j} is taken as 1/L_{best} for the optimal route of length L_{best}; and r and q are variables in (0,1) that correspond to the pheromone and its decay
coefficient feedback rate, respectively. Based on the likelihood p_{i,j}^{k}, which is computed as shown in eq. (8), each ant k chooses its new path, where \eta_{I,J} is the inverse of the route's length. The locations that have not yet been visited are denoted by the letters i, j, and l, while the respective impacts of the pheromone concentrations and the heuristic data are indicated by the control parameters \alpha and \beta.

\tau_{i,j}^{*} = (1-r)\tau_{i,j} + r \cdot \Delta\tau_{i,j}, \quad \text{if } R_{i,j} \text{ is in the best path}    (6)
\tau_{i,j}^{*} = (1-q)\tau_{i,j} + q\tau_0, \quad \text{otherwise}    (7)

As the process runs, the algorithm first generates a variety of randomly generated solutions. Pheromones are then updated based on the problem type and the IAC algorithm, with pheromone placed on the graph's edges or vertices, to improve the solutions. The probability of the edge, which is computed in eq. (8), determines whether or not to traverse the edge between two nodes i and j.

P_{ij}^{k} = \frac{\pi_{ij}(t)^{a}\, \eta_{ij}^{\beta}}{\sum_{j \in N_{k}(i)} \pi_{ij}(t)^{a}\, \eta_{ij}^{\beta}}, \quad \text{if } j \text{ is feasible; } 0 \text{ otherwise}    (8)

where P_{ij}^{k} is the probability that ant k moves from node i to node j, \pi_{ij}(t) denotes the pheromone value on the arc, \eta_{ij} is the heuristic (material) term, and N_{k}(i) is the set of nodes that can still be visited by ant k. To get better results, it is advised to run a local search before updating the pheromones. Nonetheless, the following update rule is recommended in eq. (9):

\pi_{ij}(t+1) = (1-\rho) \cdot \pi_{ij}(t) + \Delta\pi_{ij}(t)    (9)

Researchers and supply chain practitioners can learn more about efficacy and customer communication, as well as pinpoint areas for development, by utilizing the IAC algorithm to analyze emergency locations from the warehouse.

3.3 IACA method with the travelling salesman problem

Mathematicians and computer scientists in particular have focused a lot of attention on the travelling salesman problem (TSP) because it is both straightforward to explain and challenging to solve; it looks for the shortest restricted path to the target. A fully directed graph G = (N, A) can be used to represent the TSP, where A is a collection of arcs and D = (d_{ij}) is the price (distance) vector for every arc (i, j) in A. Often referred to as cities, N is a collection of n nodes, or vertices. The cost matrix D could be either symmetric or asymmetric. Finding the shortest closed tour that visits each of the n = |N| nodes of G precisely once is known as the TSP. In the symmetric TSP, the distances between the cities are irrespective of the direction in which the arcs are traversed, therefore d_{ij} = d_{ji} for any pair of nodes. In the asymmetric TSP, d_{ij} differs from d_{ji} for at least one pair of nodes (i, j).

Define the variables in eq. (10):

x_{ij} = 1 if the arc (i, j) is in the tour, and 0 otherwise    (10)

The TSP can be formulated as a generalisation of a well-known integer program formulation. The constraints are written as:

\sum_{c=1}^{n} x_{ij} = 1, \quad d = 1, 2, 3, \dots, n    (11)
\sum_{d=1}^{n} x_{ij} = 1, \quad c = 1, 2, 3, \dots, n    (12)
x_{ij} \in \{0, 1\}, \quad c, d = 1, 2, 3, \dots, n    (13)
\sum_{c, d \in S} x_{ij} \le |S| - 1, \quad 2 \le |S| \le N - 1    (14)

The objective function to be minimised in this formulation is the overall cost. Constraints (11) and (12) ensure that each city is entered and left exactly once, constraint (13) enforces the integrality of the zero-one variables x_{ij}, and constraint (14) guarantees that no subtours are created, so every city on the final itinerary is visited exactly once.

Algorithm 1: Pseudocode for IACA
1. Build the environment model
2. Initialise the number of ants and the parameters P_max, M, S, E, alpha, beta, a, b, c, R_0, pi_ij, eta_ij
3. For P = 1 to P_max do
4.   Calculate beta according to eq. (3);
5.   Calculate rho according to eq. (8);
6.   For k = 1 to M do
7.     Place ant k at S;
8.     While ant k has not reached E and the number of optional nodes > 0 do
9.       Determine the next emergency logistics node from eqs. (4), (5), (9);
10.      While ant k is in a deadlock do
11.        Use the deadlock-handling route mechanism;
12.        Set the deadlock point as an obstacle point;
13.      End while
14.    End while
15.    Save the path taken by ant k;
16.    Calculate the path length of ant k;
17.  End for
18.  Calculate the shortest path for the iteration;
19.  Divide the path into subparts by the partitioning method;
20.  Update the pheromones for each subpart by eqs. (10)-(13);
21.  Set upper and lower pheromone limits by eq. (14);
22. End for
23. Output the optimal path;
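To make the mechanics of Algorithm 1 concrete, the sketch below runs a generic ant colony loop on a TSP-style distance matrix: tours are built with the transition probability of eqs. (3)/(8), pheromone is evaporated and deposited as in eqs. (5)/(9), and the iteration-best path receives extra reinforcement in the spirit of eqs. (6)-(7). This is not the authors' MATLAB implementation; the parameter names and values (alpha, beta, rho, n_ants, n_iter) are conventional ACO assumptions used only for illustration, and the deadlock handling and subpart partitioning of Algorithm 1 are omitted.

import numpy as np

rng = np.random.default_rng(0)

def tour_length(d, tour):
    # Length of a closed tour over the distance matrix d.
    return sum(d[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def run_aco(d, n_ants=20, n_iter=100, alpha=1.0, beta=2.0, rho=0.1, q=1.0):
    n = len(d)
    eta = 1.0 / (d + np.eye(n))          # eq. (4): heuristic visibility 1/d_ij
    tau = np.ones((n, n))                # initial pheromone
    best_tour, best_len = None, np.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [int(rng.integers(n))]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                # eqs. (3)/(8): probability proportional to tau^alpha * eta^beta over unvisited nodes
                w = (tau[i, cand] ** alpha) * (eta[i, cand] ** beta)
                j = int(rng.choice(cand, p=w / w.sum()))
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(d, tour)))
        # eqs. (5)/(9): evaporation plus deposit on every traversed arc
        tau *= (1.0 - rho)
        for tour, length in tours:
            for k in range(n):
                i, j = tour[k], tour[(k + 1) % n]
                tau[i, j] += q / length
        it_best = min(tours, key=lambda t: t[1])
        if it_best[1] < best_len:
            best_tour, best_len = it_best
        # eqs. (6)-(7): additional reinforcement of the iteration-best path
        for k in range(n):
            i, j = it_best[0][k], it_best[0][(k + 1) % n]
            tau[i, j] += q / it_best[1]
    return best_tour, best_len

if __name__ == "__main__":
    pts = rng.random((10, 2))                                        # assumed toy instance
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    print(run_aco(d))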
4 Results and discussion

Ants make better initial decisions and spend less time pursuing fruitless avenues when they are given more insightful instruction. The enhanced algorithm minimizes the number of iterations required by directing ants toward promising areas of the search space right away.

Experimental setup: The technique is programmed and solved in this work using MATLAB 2017a, and it was evaluated on an Intel notebook running 64-bit software with a CPU speed of 2.20 GHz and 4 GB of RAM.

4.1 Dataset

The emergency supply mechanism was promptly put into place to provide the basic necessities of life for inhabitants of the Shanghai lockdown zones. In order to store and distribute supplies, ten emergency supply warehouses (ESWs) were quickly established throughout the city, utilising logistics supply points, district emergency food enterprises' distribution centres, and other locations [18]. Therefore, if the emergency material logistics centre is considered the central point, the largest geographic area a disaster can potentially cover is approximately 633 km2, since the administrative area of Shanghai is 6,340 km2. Caolu Modern Agricultural Park was chosen as the emergency material supply point, and the relevant data came from the lockdown zones created on April 16, 2022, in accordance with Shanghai's distinct preventative and control needs. Within the ESW's coverage area were 50 lockdown zones, including Magnolia Fragrance Garden Phase II, Sunshine Flower City, and Fengchen Leyuan. The ESW is represented by the number 1 (shown in Fig. 5(b) with a green dot), and the 50 lockdown zones are represented by the numbers 2-51. Figure 5 displays the regional geographic information distribution map. The ESW and part of the lockdown zones in Shanghai's Pudong New Area are shown by the red area in Figure 5. The overall distribution data is displayed in Figure 5(b). The red circle indicates the ESW zone, while the red and green dots indicate the lockdown and ESW zones, respectively.

Figure 5: Lockdown zones for the dataset

4.2 Experimental analysis

After improving ACA through performance analysis of the emergency logistics supply chain on the travelling salesman problem, a logistics automation system is built. First, the suggested improved ant colony algorithm is used to verify its performance. For system simulation, the experiment is carried out in MATLAB. Through the establishment of appropriate parameters and limitations in Table 2, the IACA algorithm's performance is monitored.

Table 2: The experimental model parameters
Parameter (unit): Value
Fixed cost (Yuan): 261
Transport cost (Yuan): 4
Vehicle speed (km/h): 60
Maximum vehicle mileage limit (km): 14w
Heavy load (kg): 4250
Pallet size (mm): 465*455

4.3 Data preprocessing

Min-max normalization. Min-max normalization is a technique for normalizing data that involves linearly transforming the initial data to create an equilibrium of value comparisons before and after the process. This approach uses the following equation:

Y_{new} = \frac{Y - \min(Y)}{\max(Y) - \min(Y)}

where Y is the old value, Y_{new} is the new value obtained from the normalized outcome, \min(Y) is the minimum value in the collection, and \max(Y) is the maximum value in the collection.

Outlier detection and removal. Outliers can degrade the efficiency of machine learning models by distorting statistical relationships among features. To eliminate outliers, we employ Z-score evaluation, which determines how far each data point deviates from the mean in standard deviations. A Z-score greater than 3 or less than -3 denotes an outlier that should be eliminated. The Z-score is determined by the equation below:

Z = \frac{X - \mu}{\sigma}

where X is the data point, \mu is the mean of the attribute, and \sigma is the standard deviation.
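The two preprocessing steps above are straightforward to reproduce; the following minimal Python sketch applies min-max normalization and Z-score filtering. The synthetic demand array is an assumption used only to demonstrate the calculation, not data from the study.

import numpy as np

def min_max_normalize(y):
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())      # Y_new = (Y - min(Y)) / (max(Y) - min(Y))

def remove_outliers_zscore(x, threshold=3.0):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()                     # Z = (X - mu) / sigma
    return x[np.abs(z) <= threshold]                 # keep points with |Z| <= 3

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demand = np.concatenate([rng.normal(100.0, 5.0, size=50), [400.0]])  # one injected outlier
    cleaned = remove_outliers_zscore(demand)
    print(len(demand), len(cleaned))                 # the extreme value is dropped
    print(min_max_normalize(cleaned)[:5])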
Comparing the traditional and proposed methods

ACO's initial total cost function value in Figure 6 was above 3600, and after 132 iterations, it dropped to its
lowest value of 339. The initial value of IACO's total cost function was less than 3243, and after 19 iterations, it dropped to its lowest value of 3208. The study approach outperformed the conventional ACO in terms of both convergence speed and the initial value of the total cost function when comparing the convergence curves of the ACO and IACO models. To confirm the efficacy of the study approach in a real-world application, testing was done on passenger flows in a region where the traffic flow was nearly zero. Litchi was chosen for transportation because it is perishable and must be stored at a low temperature while being transported. The transport vehicles were fuel trucks, and 31 target locations were chosen for testing the application.

Figure 6: Total cost value of the traditional and proposed models

The prediction accuracy of four distinct logistics models is contrasted in Figure 7. The prediction accuracy results of the automated logistics control model developed for this study are displayed in Figure 7(a). With an R2 of 0.98, the results demonstrated that the designed model had the highest prediction accuracy. This was 0.27, 0.30, and 0.17 higher than the prediction accuracy of the compared algorithms, namely Tabu-Grey Wolf Optimisation (TGWO) [19], Improved Particle Swarm Optimisation (IPSO) [20], and the neural network (NN) [21], with the Improved Ant Colony Algorithm (IACA) being the proposed model. In conclusion, the model has a high prediction accuracy and can optimise logistics routes while lowering energy consumption in logistics distribution; compared to other models, the fitting effect is superior. The suggested model is tested using real-world data to further confirm its scalability and dependability, and its accuracy and solution time are contrasted with those of the existing approaches.

Figure 7: Comparison of model prediction accuracy

A statistical metric called R-squared is used to assess how well a regression model fits data. R-squared values range from 0 to 1. When the model fits the data exactly and the anticipated and actual values are identical, we have an R-squared of 1. However, when the model fails to learn any association between the dependent and independent variables and does not predict any variability, we obtain an R-squared of 0. In order to confirm the suggested model's scalability and dependability, the accuracy and solution time of the four distinct logistics models are contrasted in Fig. 8. The accuracy comparison results are displayed in Fig. 8(a): at 98.58%, the suggested model achieved the best solution accuracy. The comparison of solution times is displayed in Fig. 8(b); although it was greater than that of the other three models, the suggested model's solution time of 44.64 seconds was still within a reasonable range.

Figure 8: Accuracy and solution time of models
Table 3: Robustness analysis of the four algorithms
Techniques | Highest | Average | Inaccuracy (ξ) | Robustness (r) | t (s)
IPSO | 383.52 | 384.57 | 2.70 | 2.10 | 12.10
TGWO | 377.82 | 388.79 | 2.38 | 5.30 | 9.56
NN | 374.38 | 377.16 | 2.64 | 10.09 | 6.69
IACA (proposed) | 371.17 | 374.03 | 0.74 | 20.20 | 1.44
Figure 9: Finding the ideal path length

In Figure 9 it is clear that the improved ant colony algorithm requires fewer iterations to find the ideal path than the current approaches. Additionally, the optimal path length is shorter, allowing for faster convergence.

Performance analysis

For this paper, the method was run 55 times, and for the three algorithms in the literature, the average time (t), inaccuracy (ξ) in %, and robustness (r) in % of every approach were noted. The equations are:

\xi = \frac{ave - best}{best}
r = \frac{m}{n}

where ave is the average overall mileage, best is the ideal overall mileage, n is the number of tests, and m is the number of runs in which the optimal solution is found. The findings for each of the four methods are shown in Table 3 and show that the algorithm obtained the best overall mileage, average overall mileage, inaccuracy value, robustness r, and algorithm time expenditure when compared to the other examined methods. These findings suggest that the method performs well in terms of computing complexity and robustness. This implies that the method employed in this study surpasses the single ant colony algorithm and yields an ideal result with minimum error, high precision, and consistent robustness.

Figure 10: ROC curve

Figure 10 shows a visual depiction of the model's performance over all thresholds, the ROC curve. The true positive rate (TPR) and false positive rate (FPR) are computed at each threshold (practically, at predetermined intervals), and the TPR is then graphed over the FPR to create the ROC curve. A perfect model, which at some threshold has a TPR of 1.0 and an FPR of 0.0, can be represented by a point at (0, 1). The ROC is a helpful metric for evaluating the performance of distinct models, provided that the dataset is fairly balanced. In general, the better model is the one with a larger area under the curve. The ROC curve of the suggested model (IACA) shows the highest accuracy, with 0.95. A confidence interval is a range of numbers that is believed to contain a population parameter. For any normal distribution, approximately 95% of the values lie within two standard deviations of the mean. The following formula determines a 95% confidence interval:

95\% \text{ confidence interval} = \bar{x} \pm 1.96 \frac{s}{\sqrt{n}}
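The evaluation quantities defined above are simple to compute; the sketch below evaluates the inaccuracy ξ = (ave − best)/best, the robustness r = m/n, and the 95% confidence interval x̄ ± 1.96·s/√n. The sample run lengths and counts are invented purely to show the calculation and are not the study's measurements.

import numpy as np

def inaccuracy(avg_mileage, best_mileage):
    # xi = (ave - best) / best
    return (avg_mileage - best_mileage) / best_mileage

def robustness(n_optimal_hits, n_runs):
    # r = m / n, the fraction of runs that reach the optimal solution
    return n_optimal_hits / n_runs

def confidence_interval_95(samples):
    # x_bar +/- 1.96 * s / sqrt(n), using the sample standard deviation
    x = np.asarray(samples, dtype=float)
    half_width = 1.96 * x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - half_width, x.mean() + half_width

if __name__ == "__main__":
    run_lengths = [374.0, 375.2, 373.8, 374.6, 374.1]   # assumed per-run mileages
    print(inaccuracy(np.mean(run_lengths), min(run_lengths)))
    print(robustness(n_optimal_hits=11, n_runs=55))
    print(confidence_interval_95(run_lengths))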
4.4 Discussion

IACA has been applied to the emergency logistics supply chain path in order to increase logistics efficiency. The enhanced ACA served as the foundation for building the emergency logistics system model. The IACA algorithm demonstrated strong optimization capabilities based on the test results on the dataset. It was able to locate the ideal solution after 132 iterations, lowering the cost to more than 30% below the average cost. Additionally, the suggested model's delivery distance was greater, its average power consumption per logistics node was lower, its emergency material supply for disasters was higher, and its prediction accuracy, with an R2 of 0.98, was higher than that of the NN, TGWO, and IPSO. This suggests that the suggested approach has practical application value and good optimization capabilities that can successfully increase delivery efficiency and lower fuel costs. While this study enhanced the program's performance, it also made the method more complex, which reduced its computing efficiency. Therefore, in order to maintain performance, it will be necessary to significantly lower the algorithm's complexity in the future.

Benefits of the proposed approach include: lower operating costs due to the algorithm's optimization of routes and vehicle usage, which lowers labor, fuel, and maintenance expenses; when essential materials are delivered on time, penalties or reputational harm from delays are avoided; alternate routes are promptly found to avoid blocked highways; and time and expense are balanced to supply essential materials effectively.

5 Conclusion

The logistics sector is changing dramatically as a result of its ongoing expansion. The IAC algorithm was developed to reduce the amount of time it takes to distribute supplies from an Emergency Support Area (ESA) to Shanghai's 50 lockdown zones. Using data from Pudong, Shanghai, the system was evaluated and contrasted with three other intelligent optimisation techniques. The findings demonstrated that the algorithm's accuracy was enhanced by its local optimisation operation and that its relative error value was lower than that of the other algorithms. In addition to being applicable to discrete optimisation issues, this method can serve as a general algorithmic framework for a number of scenarios, including rescue supplies, resource allocation during wildfires, emergency rescue during floods, and the transportation of hazardous items. However, the impact of road networks on transportation and supply distribution, lockdown zone configuration, and population density is not taken into account in this article. Existing logistics systems also frequently lack the necessary flexibility and insight; to address this, the study modified the logistics supply chain system by integrating the TSP into IACA. Simulation experiments were used to confirm the model's superiority. Future research can produce predictions with more accuracy and fewer iterations, which enhances scalability, adaptability, and real-time capabilities while integrating developing technologies such as the IoT. This is achieved by reflecting the precision and flexibility of the data in the optimization model.

Limitations of IACA:
• Enhanced features like dynamic pheromone updates, hybridization, and real-time data integration add computing overhead;
• Even with improvements, ACO may not be able to handle the exponential expansion in the number of paths as the network size increases;
• The effectiveness of the enhanced ant colony is highly dependent on sensitive parameters, including the number of vehicles, the heuristic weighting, and the pheromone evaporation rate;
• If dynamic data is imprecise or delayed, the algorithm may offer less-than-ideal routes.
There is also a trade-off between time and money when shipping products to several high-priority sites.

References
[1] T. Kundu, J.-B. Sheu, and H.-T. Kuo, "Emergency logistics management—Review and propositions for future research," Transportation Research Part E: Logistics and Transportation Review, vol. 164, p. 102789, Aug. 2022. https://doi.org/10.1016/j.tre.2022.102789
[2] Z. Li and X. Guo, "Quantitative evaluation of China's disaster relief policies: A PMC index model approach," International Journal of Disaster Risk Reduction, vol. 74, p. 102911, May 2022. https://doi.org/10.1016/j.ijdrr.2022.102911
[3] Y. Zhang, Q. Ding, and J.-B. Liu, "Performance evaluation of emergency logistics capability for public health emergencies: perspective of COVID-19," International Journal of Logistics Research and Applications, pp. 1-14, Apr. 2021. https://doi.org/10.1080/13675567.2021.1914566
[4] F. Diehlmann, M. Lüttenberg, L. Verdonck, M. Wiens, A. Zienau, and F. Schultmann, "Public-private collaborations in emergency logistics: A framework based on logistical and game-theoretical concepts," Safety Science, vol. 141, p. 105301, Sep. 2021. https://doi.org/10.1016/j.ssci.2021.105301
[5] S. Jomthanachai, W.-P. Wong, K.-L. Soh, and C.-P. Lim, "A global trade supply chain vulnerability in COVID-19 pandemic: An assessment metric of risk and resilience-based efficiency of CoDEA method," Research in Transportation Economics, vol. 93, p. 101166, Dec. 2021. https://doi.org/10.1016/j.retrec.2021.101166
[6] "International Journal of Disaster Risk Reduction | Vol 74, May 2022 | ScienceDirect.com by Elsevier," Sciencedirect.com, 2022. Available: https://www.sciencedirect.com/journal/international-journal-of-disaster-risk-reduction/vol/74/suppl/C [Accessed: Oct. 26, 2024]
[7] Y. Liu, J. Li, M. Liu, and B. Jiao, "An Enhanced Ant Colony Algorithm-Based Low-Carbon Distribution Control Method for Logistics Leveraging Internet of Things (IoT)," Wireless Communications and Mobile Computing, vol. 2023, pp. 1-12, Nov. 2023. https://doi.org/10.1155/2023/5555221
[8] H. Jin, Q. He, M. He, S. Lu, F. Hu, and D. Hao, "Optimization for medical logistics robot based on model of traveling salesman problems and vehicle routing problems," International Journal of Advanced Robotic Systems, vol. 18, no. 3, May 2021. https://doi.org/10.1177/17298814211022539
[9] M. Yang, "Research on vehicle automatic driving target perception technology based on improved MSRPN algorithm," Journal of Computational and Cognitive Engineering, vol. 1, no. 3, pp. 147-151, 2022. https://doi.org/10.47852/bonviewJCCE20514
[10] A. De, M. Gorton, C. Hubbard, and P. Aditjandra, "Optimization model for sustainable food supply chains: An application to Norwegian salmon," Transportation Research Part E: Logistics and Transportation Review, vol. 161, p. 102723, May 2022. https://doi.org/10.1016/j.tre.2022.102723
[11] T. Yan, F. Lu, S. Wang, L. Wang, and H. Bi, "A hybrid metaheuristic algorithm for the multi-objective location-routing problem in the early post-disaster stage," Journal of Industrial and Management Optimization, vol. 19, no. 6, pp. 4663-4691, Jan. 2023. https://doi.org/10.3934/jimo.2022145
[12] A. Zhu and Y. Wen, "Green Logistics Location-Routing Optimization Solution Based on Improved GA Algorithm considering Low-Carbon and Environmental Protection," Journal of Mathematics, vol. 2021, pp. 1-16, Nov. 2021. https://doi.org/10.1155/2021/6101194
[13] M. Abbasian, Z. Sazvar, and M. Mohammadisiahroudi, "A hybrid optimization method to design a sustainable resilient supply chain in a perishable food industry," Environmental Science and Pollution Research, Aug. 2022. https://doi.org/10.1007/s11356-022-22115-8
[14] Z.-H. Sun, X. Luo, E. Q. Wu, T.-Y. Zuo, Z.-R. Tang, and Z. Zhuang, "Monitoring Scheduling of Drones for Emission Control Areas: An Ant Colony-Based Approach," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 11699-11709, Aug. 2022. https://doi.org/10.1109/tits.2021.3106305
[15] F. Lu, W. Feng, M. Gao, H. Bi, and S. Wang, "The Fourth-Party Logistics Routing Problem Using Ant Colony System-Improved Grey Wolf Optimization," vol. 2020, pp. 1-15, Oct. 2020. https://doi.org/10.1155/2020/8831746
[16] M. Ashour, R. Elshaer, and G. Nawara, "Ant Colony Approach for Optimizing a Multi-stage Closed-Loop Supply Chain with a Fixed Transportation Charge," Journal of Advanced Manufacturing Systems, pp. 1-24, Nov. 2021. https://doi.org/10.1142/s0219686722500159
[17] M. Abdolhosseinzadeh and M. M. Alipour, "Design of experiment for tuning parameters of an ant colony optimization method for the constrained shortest Hamiltonian path problem in the grid networks," Numerical Algebra, Control & Optimization, vol. 11, no. 2, p. 321, 2021. https://doi.org/10.3934/naco.2020028
[18] H. Chen, "Shanghai Starts 10 Emergency Supply Warehouses," 2022. https://contentstatic.cctvnews.cctv.com/snowbook/index.html?item_id=32234810742465611
[19] H. Zhang, J. Yan, and L. Wang, "Hybrid Tabu-Grey wolf optimizer algorithm for enhancing fresh cold-chain logistics distribution," PLoS ONE, vol. 19, no. 8, p. e0306166, Aug. 2024. https://doi.org/10.1371/journal.pone.0306166
[20] K. Tan, W. Liu, F. Xu, and C. Li, "Optimization Model and Algorithm of Logistics Vehicle Routing Problem under Major Emergency," Mathematics, vol. 11, no. 5, p. 1274, Mar. 2023. https://doi.org/10.3390/math11051274
[21] M. Chen, "RETRACTED ARTICLE: Optimal path planning and data simulation of emergency material distribution based on improved neural network algorithm," Soft Computing, vol. 27, no. 9, pp. 5995-6005, Apr. 2023. https://doi.org/10.1007/s00500-023-08073-4
https://doi.org/10.31449/inf.v49i16.6990 Informatica 49 (2025) 199-212 199
Application Method and Least Squares Support Vector Machine
Analysis of a Heat Pipe Network Leakage Monitoring System Using an
Inspection Robot
Xu Wang, Xiaobo Long, Guangwei Li, Jing Li, Yuweijia Zhao*
Kunming Metallurgy College, Kunming, Yunnan, 650033, China
E-mail: kmyz_wang79@163.com
*Corresponding author
Keywords: heat pipe network inspection robot, human-computer interaction, heat pipe network side leakage monitoring,
mobile platform, robot control
Received: August 24, 2024
With the maturity of the Internet and big data technology, heat supply intelligence has become a
development trend, and the traditional heat pipe network management mode is gradually transitioning to
an "intelligent heat pipe network". It has become a hot spot for research and development at home and
abroad. Combining big data technology, inspection robot control, heat pipe network leakage warning and
data monitoring, scientific monitoring and evaluation of the energy-saving operation of heat pipe
networks, and intelligent operation of heat pipes have become the current development trend. Whether in
terms of the economic benefits of energy-saving operation of heat pipe networks or the social benefits of
realizing intelligent operation and management of heat pipe networks, the study of a lateral leakage
monitoring system for heat pipe networks is of great significance. This paper examines a technique for
implementing a lateral leakage monitoring system for heat pipe networks using an inspection robot
control system, which includes a real-time tracking module utilizing LSSVM (Least Squares Support
Vector Machine) optimization to improve detection accuracy. The monitoring module can acquire, store,
visualize, and send sensor data and video data; the user-defined interface module receives and parses
XML user files from the server and generates user-defined interfaces and logic, thus realizing the human-
computer interaction function. The experimental findings show that enhancing the weight factor and
radial basis kernel function parameters of the LSSVM with the gravitational search technique resulted in
an outstanding classification accuracy of 99.99% with a classification time of only 55.938 seconds,
surpassing other optimization techniques.
Povzetek: Z uporabo robotskega nadzornega sistema in optimizirane metode LSSVM so avtorji razvili
inteligentni sistem za spremljanje puščanja v toplotnih cevnih omrežjih
1 Introduction
Leakage in a thermal pipe network is a sudden change in liquid flow head or flow pressure caused by the flow rate of the medium escaping the pipe exceeding a set value, which results in a leak [1]. After a leakage accident, the damaged section is usually sealed off to minimize energy loss, but this approach cannot accurately and reliably monitor the actual environmental conditions around the failure; delays in detecting and treating a leak can have serious consequences and bring huge economic losses to the enterprise [2-3]. Therefore, a great deal of research has been carried out at home and abroad on the diagnosis of leakage faults in heat pipe networks, and many leakage monitoring methods have been proposed. Although the various methods have certain limitations and still need to be improved [4], leakage monitoring of heat pipe networks and their compensators is of great significance and is widely referred to in the study of heat pipe network inspection robot monitoring systems [5].
Numerous studies have investigated different methods of inspecting and identifying leaks in pipeline networks, highlighting the significance of intelligent systems. Zholtayev et al. [6] created a smart pipe inspection robot with in-chassis motor actuation and AI-powered defect identification, showcasing sophisticated robotics incorporation in network tracking. Murtazin et al. [7] examined internal inspection techniques for district heating networks, highlighting the importance of resilient inspection techniques in energy systems. Wong and McCann [8] conducted an in-depth analysis of pipeline failure identification methods, ranging from acoustic sensing to cyber-physical systems, emphasizing the growing use of IoT solutions in fault detection. Liu et al. [9] presented an enhanced BP neural network algorithm for leakage detection in air conditioning water systems, demonstrating the efficacy of machine learning in detecting faults. Korlapati et al. [10] performed a thorough review of pipeline leak identification approaches, ranging from conventional to AI-based methods. Similarly, Yussof and Ho [11] examined water leak detection techniques in smart buildings, emphasizing the significance of these technologies in contemporary infrastructure. Langroudi and Weidlich [12] investigated predictive maintenance assessment techniques for district heating pipes, which added to service-life prediction techniques. Van Dreven et al. [13] addressed smart fault detection in district heating, finding significant patterns and obstacles in the area. Hossain et al. [14] used UAV image evaluation and machine learning to identify leaks in district heating, demonstrating the value of aerial monitoring for infrastructure surveillance. Finally, Vollmer et al. [15] compared anomaly detection techniques in thermal imagery for district heating leak identification, which advances the use of thermal imaging in fault identification. Table 1 shows a summary of these studies.
Table 1: Summary table

Citation | Title | Accuracy | Efficiency | Limitations | Innovations | Gaps in SOTA
[6] Zholtayev et al. (2024) | Smart Pipe Inspection Robot with AI-Powered Defect Discovery | 95% | High (real-time) | Scalability, cost | AI-powered discovery, high precision | Flexibility for different pipe types
[7] Murtazin et al. (2021) | Internal Inspection of District Heating Networks | 92% | Moderate | Constrained to magnetic testing | Non-destructive testing | Constrained to particular pipelines
[8] Wong & McCann (2021) | Pipeline Failure Discovery: Acoustic to Cyber-Physical Systems | 70-85% | Low (real-time) | Inconsistent accuracy, high cost | Discovery taxonomy, spatial enhancement | High computational cost
[9] Liu et al. (2022) | Leakage Analysis for Air Conditioning Water Systems | 86.96% | High | Fault location error | Two-stage diagnosis, BP neural network | Constrained real-time localization
[10] Korlapati et al. (2022) | Review of Pipeline Leak Discovery Techniques | 87% | Varies | No standardization | Review of subsea techniques | Variability in reliability
[11] Yussof & Ho (2022) | Water Leak Discovery in Smart Buildings | 81% | Varies | Real-time gaps in smart buildings | Incorporation with building automation | Absence of automated discovery
[12] Langroudi & Weidlich (2020) | Predictive Maintenance for District Heating Pipes | 85-90% | High | Constrained to district heating | Proactive AI-driven maintenance | Narrow concentration
[13] van Dreven et al. (2023) | Fault Discovery in District Heating with ML | 80-93% | Medium-High | Data restrictions | ML methods for fault discovery | Absence of open-source data
[14] Hossain et al. (2020) | UAV-Based Leakage Discovery for District Heating | 85% | Moderate | Constrained to UAV image examination | UAV with infrared discovery | Poor scalability for large systems
[15] Vollmer et al. (2021) | Anomaly Discovery in Water Networks with Self-Learning Algorithms | 90% | High | False positives in intricate systems | Self-learning algorithms | Difficulties with dynamic settings
Existing state-of-the-art (SOTA) techniques have many shortcomings, such as constrained flexibility for particular pipeline types, whereas the proposed system has wider applicability. Previous UAV and subsea detection techniques lack scalability, but this paper presents a scalable AI-based framework for large-scale networks. Real-time efficiency is hampered by high computational expenses; the proposed system improves this with improved algorithms.
Based on the above, this paper focuses on the design of the heat pipe network leakage monitoring system, including hardware circuits and software programs, which involves the following key technologies: the selection of each component. According to the different types of components, the corresponding models are selected to analyze the working conditions under various parameters. The microcontroller control module, the sensor data acquisition part, and peripheral devices such as the display alarm form the overall structure to complete the design scheme, and the thermal pipe network leakage monitoring system is designed around the inspection robot control system.

2 Leakage fault diagnosis in heat pipe networks

2.1 Leakage fault modelling
Leakage faults at key nodes of the heat pipe network are classified into three levels: normal, normal leakage, and severe leakage; a leak leads to a sudden drop in pressure inside the pipe and to changes in ambient temperature and conductivity [16-17]. Therefore, in this paper, the ambient temperatures T1, T2, T3, and T4, the ambient conductivities G1 and G2, and the internal pressure p of the pipe are selected as inputs, and the leakage level of the critical node of the heat pipe network is taken as the output to establish the leakage level discrimination model. The leakage level is expressed as {1, 2, 3} and is used as the output of the leakage fault model, while the ambient temperature and conductivity around the heat pipe network and the internal operating pressure of the pipe network, obtained online by the in-situ monitoring unit, are used as the model inputs. The multi-classification leakage fault diagnosis at key nodes of the heat pipe network consists of four steps: sample collection, data pre-processing, building and optimizing the multi-classification leakage fault diagnosis model, and model testing. Specifically, x training samples and y test samples are arbitrarily selected; the extracted training and test samples are normalized; the multi-classification heat pipe network critical node leakage fault diagnosis model is established; the model parameters are optimized; and the experimental samples are substituted into the established model for testing.
Since the characteristic indicators of temperature, conductivity, and pressure have different units and orders of magnitude, an indicator with a particularly large order of magnitude may dominate the classification. To eliminate differences in units and the effect of different orders of magnitude, the data must be pre-processed so that each indicator value falls within a uniform numerical range. In this paper, 500 training samples and 90 test samples are randomly selected, the independent variable of the leakage fault diagnosis model is denoted as x, and the leakage fault level of the key nodes of the heat pipe network obtained after random sampling is denoted as the dependent variable y. The conductivities G1 and G2 are processed logarithmically, the temperature and pressure are normalized, and the sample data of the independent variable after pre-processing are

T_{norm,i} = (T_i - T_{min,i}) / (T_{max,i} - T_{min,i}),  i = 1, 2, 3, 4
G_{norm,j} = lg(G_j),  j = 1, 2                                            (1)
p_{norm} = (p - p_{min}) / (p_{max} - p_{min})
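A minimal sketch of the pre-processing in equation (1), assuming the raw samples are held as a NumPy array with one column per indicator; the column ordering and the function name are illustrative, not taken from the paper.

import numpy as np

def preprocess(samples):
    """samples: array of shape (n, 7) with columns [T1, T2, T3, T4, G1, G2, p].
    Applies equation (1): min-max scaling for the temperatures and the pressure,
    base-10 logarithm for the two conductivities."""
    X = np.asarray(samples, dtype=float).copy()
    minmax_cols = [0, 1, 2, 3, 6]                       # T1..T4 and p
    lo = X[:, minmax_cols].min(axis=0)
    hi = X[:, minmax_cols].max(axis=0)
    X[:, minmax_cols] = (X[:, minmax_cols] - lo) / (hi - lo)
    X[:, [4, 5]] = np.log10(X[:, [4, 5]])               # G1, G2
    return X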
Using equation (1) we can obtain the normalized data for the training samples as well as the test samples, and the dependent variable, the leakage fault level at the key nodes of the heat pipe network, is given by equation (2):

y_i ∈ {1, 2, 3}    (2)

The least squares support vector machine algorithm is then used to build a classifier to achieve a multi-classification fault diagnosis model for critical node leakage in the thermal network. In this model we assume that the independent variable is x and define the nonlinear least squares support vector machine leakage fault diagnosis model as:

x = [T_1, T_2, T_3, T_4, G_1, G_2, p]    (3)

y(x) = ⟨ω, φ(x)⟩ + b    (4)

Given a set of data points that are closely related to the fault diagnosis of leakage at critical nodes of the thermal network, i.e. ambient temperature, ambient conductivity, and internal pipe pressure, d is the dimensionality of the model input variables, y is the result of the model classification, i.e. normal (1), normal leakage (2) and severe leakage (3), l is the total number of known data points, and b is a constant. Therefore, the target equation and the nonlinear decision function used in the input space can be defined as:

min (1/2)‖ω‖² + (C/2) Σ_{i=1}^{l} e_i²    (5)

y(x) = sgn( Σ_{i=1}^{S} y_i a_i K(x, x_i) + b )    (6)
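For illustration, the following sketch solves the binary LS-SVM problem behind equations (5)-(6) with an RBF kernel; the paper's three leakage levels would be handled by combining several such binary machines (for example one-vs-rest), and the variable names are assumptions rather than the authors' code.

import numpy as np

def rbf_kernel(X1, X2, sigma):
    # K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_lssvm(X, y, C, sigma):
    """Solve the LS-SVM dual system for a binary problem with labels y in {-1, +1}.
    The linear system follows from the KKT conditions of equation (5)."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    omega = (y[:, None] * y[None, :]) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # support values a_i and bias b

def predict_lssvm(X_train, y_train, a, b, X_new, sigma):
    # Decision function of equation (6): sign(sum_i y_i a_i K(x, x_i) + b)
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (a * y_train) + b)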
2.2 Optimization of the parameters of the leakage fault diagnosis model
In this paper, the gravitational search method is used to optimize the weight factor and the radial basis kernel function parameters of the least squares support vector machine. The gravitational search algorithm uses the law of gravity between two objects to guide the search for the optimal solution through the motion of each object. In this algorithm, each individual is considered as an object whose performance is measured by its mass; all these objects are attracted to each other through gravity, and this force causes all objects to move towards the object with the heavier mass [18]. The position of an object in motion corresponds to a candidate solution. The gravitational search method can be thought of as an isolated system of masses in which each object follows the law of gravity and the law of motion. Assuming a system with N objects, the position of the ith particle is defined as

x_i = (x_i^1, ..., x_i^d, ..., x_i^n),  i = 1, 2, ..., N    (7)

The interaction force and its parameters are then given by

F_{ij}^d(t) = G(t) · (M_{pi}(t) × M_{aj}(t)) / (R_{ij}(t) + ε) · (x_j^d(t) - x_i^d(t))    (8)

R_{ij}(t) = ‖x_i(t), x_j(t)‖_2    (9)

F_i^d(t) = Σ_{j=1, j≠i}^{N} rand_j · F_{ij}^d(t)    (10)

where rand_j is a random number generated in the interval [0, 1]. Therefore, according to Newton's laws of motion, the acceleration of particle i in d-dimensional space at time t is calculated as

a_i^d(t) = F_i^d(t) / M_{ii}(t)    (11)

v_i^d(t+1) = rand_i × v_i^d(t) + a_i^d(t),  x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)    (12)

where rand_i is a uniform random variable in the interval [0, 1], which gives a random character to the gravitational search. The gravitational constant G is initialized at the start and decreases with time to control the search accuracy. The gravitational and inertial masses are simply calculated by fitness evaluation: a heavier mass corresponds to a more efficient object, which exerts a higher gravitational force and moves with a slower velocity. Assuming that the gravitational mass is equal to the inertial mass, gravity and inertia are updated using equation (13), while equations (14)-(15) define the parameters best(t) and worst(t) for minimization and maximization problems, respectively.

M_{ai} = M_{pi} = M_{ii} = M_i,  i = 1, 2, ..., N
m_i(t) = (fit_i(t) - worst(t)) / (best(t) - worst(t))    (13)
M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t)

best(t) = min_{j∈{1,...,N}} fit_j(t),  worst(t) = max_{j∈{1,...,N}} fit_j(t)    (14)

best(t) = max_{j∈{1,...,N}} fit_j(t),  worst(t) = min_{j∈{1,...,N}} fit_j(t)    (15)

According to the above principles of the gravitational search method and the LSSVM algorithm, the idea of using the gravitational search method to optimize the LSSVM learning parameters is to search a certain region of the parameter space for a vector that minimizes the value of the target fitness function in equation (16):

min f(C, δ) = (1/n) Σ_{i=1}^{N} (y_i - ŷ_i)²    (16)

The basic principle is to optimally adjust the weight factor and radial basis kernel function parameters of the least squares support vector machine by exploiting the strong global search capability of the gravitational search method. The optimization steps are as follows: first, randomly select and normalize the training and test samples; given the population size N and the maximum number of iterations, randomly initialize the N particles; during each iteration, substitute the position of each particle into the least squares support vector machine model to obtain the fitness value of the current particle; calculate the sum of the forces acting on each particle in the different directions and the acceleration of each particle according to equation (10); compute the new particle position according to the update formulas for the particle velocity and position; and judge the termination condition. If the maximum number of iterations is reached, the iteration is terminated and the optimal parameter values are output.
The parameter tuning method for the gravitational search technique (GSA) was carefully planned to improve the LSSVM's efficiency. During this procedure, key parameters were adjusted, including the gravitational constant, agent mass, and initial population size. Particular settings comprised G0 = 100, a mass range of [1, 10], and 50 agents. The GSA procedure is depicted in the flowchart in Figure 1, starting with the initialization of agent positions and masses, followed by iterative updates using gravitational forces, and finally the evaluation of the LSSVM classification efficacy.
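A compact sketch of the gravitational search loop described above, used here to tune a parameter vector such as [C, δ] against the fitness of equation (16); the exponential decay of G, the numerical guards, and the helper names are assumptions for illustration, not the authors' implementation.

import numpy as np

def gsa_optimize(fitness, bounds, n_agents=50, n_iter=100, G0=100.0, alpha=20.0, seed=0):
    """Minimal gravitational search over a box-constrained space.
    `fitness` maps a parameter vector (e.g. [C, sigma]) to the MSE of eq. (16)."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    X = rng.uniform(lo, hi, size=(n_agents, len(lo)))      # agent positions, eq. (7)
    V = np.zeros_like(X)
    for t in range(n_iter):
        fit = np.array([fitness(x) for x in X])
        best, worst = fit.min(), fit.max()
        m = np.ones(n_agents) if best == worst else (fit - worst) / (best - worst)  # eqs. (13)-(14)
        M = m / m.sum()
        G = G0 * np.exp(-alpha * t / n_iter)                # decaying gravitational constant (assumed schedule)
        F = np.zeros_like(X)
        for i in range(n_agents):
            for j in range(n_agents):
                if i != j:
                    diff = X[j] - X[i]
                    R = np.linalg.norm(diff) + 1e-12
                    F[i] += rng.random() * G * M[i] * M[j] / R * diff   # eqs. (8)-(10)
        acc = F / (M[:, None] + 1e-12)                      # eq. (11)
        V = rng.random(X.shape) * V + acc                   # eq. (12)
        X = np.clip(X + V, lo, hi)
    return min(X, key=fitness)

# Usage sketch, assuming a hypothetical helper lssvm_mse(C, sigma) that trains the
# LS-SVM and returns the validation MSE:
# best_params = gsa_optimize(lambda p: lssvm_mse(p[0], p[1]), bounds=[(0.1, 1000.0), (0.01, 10.0)])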
A total of 590 samples were used for dataset selection, with 500 serving as training samples and 90 as testing samples. The training dataset had a balanced distribution across the fault classes, guaranteeing that each class was sufficiently represented to avoid bias. Each training sample was created from real-world functional data and represents a variety of fault situations. To guarantee an unbiased assessment of the model's efficacy, the test samples were chosen at random from the same dataset while retaining the identical distribution features. This comprehensive description of the parameter tuning and dataset choice procedures not only improves replicability but also strengthens the validation of the reported outcomes, allowing other researchers to apply the approach efficiently.

Figure 1: Flowchart of the GSA process

2.3 Leakage fault diagnosis and result analysis
We introduce the mean square error (MSE) as an index to evaluate the correct classification rate, which is calculated as

MSE = (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²    (17)

In this paper, the weight factor and radial basis kernel function parameters of the LS-SVM are optimized using the particle swarm algorithm, the cuckoo algorithm, and the gravitational search method; the resulting multi-classification fault diagnosis models for leak monitoring at key nodes of the heat pipe network are used to classify the test samples, and the classification results are compared. The 500 training samples were fed into the leak fault diagnosis model, and the parameters of the LS-SVM weight factor and radial basis kernel function were first optimized using the gravitational search method to obtain the optimal values; the run took 55.938 seconds and 99.99% of the test samples were correctly classified.
Figure 2 illustrates a comparison of two elements of the gravitational search technique. The top section displays a parametric merit search plot, demonstrating how the technique assesses various parameters to identify the best solution. The bottom section shows the classification findings for the test set, demonstrating the method's ability to correctly categorize data using the optimum parameters discovered during the merit search. In general, this figure depicts the relationship between the parameter optimization procedure and its effect on classification efficiency.

Figure 2: Comparison of the parametric merit search graph of the gravitational search method (top) and the test set classification of the merit search (bottom)

Optimization of the weight factor and radial basis kernel function parameters of the LS-SVM using the cuckoo algorithm resulted in an optimal value of 28.7282 and an optimum value of 15.8259; the run took 60.491 seconds, giving a 97.89% correct classification rate
for the test sample. In this paper, we optimize the weight factor and radial basis kernel function parameters of the LS-SVM based on the gravitational search method, the cuckoo algorithm, and the particle swarm algorithm, and use a randomly selected set of 90 test samples to check the correct classification rate. The findings show that the multi-classification fault diagnosis model, which uses a least squares support vector machine algorithm enhanced by the gravitational search technique, attains a classification accuracy of 99.99% in only 55.938 seconds. The results show that the multi-classification fault diagnosis model based on the least squares support vector machine algorithm optimized by the gravitational search method has the best classification effect and the lowest algorithm complexity.
Table 2 shows a confusion matrix comparing the Gravitational Search Algorithm (GSA) and the Cuckoo Algorithm. The GSA had 450 true positives (TP) versus 425 for the Cuckoo Algorithm, showing superior efficiency in finding positive cases. The GSA also recorded 40 true negatives (TN), which exceeded the Cuckoo Algorithm's 35. Particularly, the GSA had only two false positives (FP), whereas the Cuckoo Algorithm had ten, indicating higher precision. Furthermore, the GSA had three false negatives (FN) compared to the Cuckoo's fifteen, demonstrating its detection efficiency. Overall, the GSA outperformed the Cuckoo Algorithm.

Table 2: Confusion matrix

Metric | Gravitational Search Method (GSA) | Cuckoo Algorithm
True Positives (TP) | 450 | 425
True Negatives (TN) | 40 | 35
False Positives (FP) | 2 | 10
False Negatives (FN) | 3 | 15

Table 3 presents efficiency metrics for the GSA and the Cuckoo Algorithm. The GSA attained an impressive 99.99% accuracy, substantially higher than the Cuckoo Algorithm's 95.75%. The GSA had a precision of 99.95%, compared to 93.00% for the Cuckoo Algorithm, suggesting that it was more reliable at predicting the positive class. The GSA's recall was 99.90%, demonstrating its ability to detect pertinent instances, whereas the Cuckoo Algorithm had a recall of 94.50%. The F1 score for the GSA was 99.92%, while the Cuckoo Algorithm's was 93.75%, demonstrating the GSA's overall superiority. Finally, the ROC AUC score for the GSA was 0.999, indicating outstanding discriminative capacity, as opposed to 0.950 for the Cuckoo Algorithm. These metrics demonstrate the GSA's improved classification efficiency.

Table 3: Performance metrics

Metric | Gravitational Search Method (GSA) | Cuckoo Algorithm
Accuracy | 99.99% | 95.75%
Precision | 99.95% | 93.00%
Recall | 99.90% | 94.50%
F1 Score | 99.92% | 93.75%
ROC AUC Score | 0.999 | 0.950
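For reference, the quantities reported in Tables 2 and 3 are related by the standard confusion-matrix formulas. The short sketch below shows those definitions; the printed values are derived only from the Table 2 counts and are not guaranteed to reproduce Table 3 exactly, since the paper's figures come from its own per-class evaluation.

def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics derived from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# The GSA counts of Table 2 (450/40/2/3) give roughly accuracy 0.99,
# precision 0.996, recall 0.993 and F1 0.994.
print(classification_metrics(450, 40, 2, 3))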
2.4 Heat pipe network inspection robot control system
The thermal pipe network inspection robot system is divided into four modules: the server, the master controller (mobile side), the slave controller (motion controller), and the pipe robot mechanical system. The principle of operation and control of the thermal pipe network inspection robot system is based on the motion controller as the core [19]. When the user's control logic is burned into the motion controller and parsed, the motion controller sends control commands to the actuators to realize the motion control of the thermal pipe network inspection robot. The motion controller collects and processes the robot motion data from the photoelectric encoder and the mobile terminal in real time during the motion of the heat pipe network inspection robot and, according to the results of the data processing, adjusts the motion state of the pipe robot in real time to achieve closed-loop control of the pipe robot motion. In addition, the mobile side is equipped with a self-developed thermal network monitoring system, which sends the sensor data and video data to a remote server via network communication (TCP/IP communication protocol), enabling remote monitoring of the pipeline robot. The specific functions of the four components are as follows.
Firstly, the server. The server side is equipped with a self-developed remote thermal pipe monitoring system client, whose function is to receive the sensor data and video data from the mobile side and to visualize them; the server can also send control commands to the motion controller via the mobile side to adjust the motion of the pipe robot. Secondly, the mobile side. The mobile side is fitted with an on-site thermal pipe monitoring system client, which acquires sensor data and video data in real time using the mobile side's hardware-integrated sensor set and HD camera. The system processes the data in two ways: one is the local visualization and storage of the data; the other is the sending of the data to the server side and the motion controller. Thirdly, the multi-core heterogeneous motion controller. It is used to implement the user's control logic and is the core of the motion control of the heat pipe network inspection robot. The main hardware modules of the motion controller are a master MCU, two slave MCUs, a motor driver chip, a voltage converter chip, and a sensor set. Fourth, the mechanical structure of the thermal pipe network inspection robot: it carries the mobile end, the power module of the motion controller, etc., and is also the final executor of the motion controller's operating instructions.
In addition, the mechanical structure of the thermal pipe network inspection robot in this paper is mainly divided into the chassis, the walking mechanism, the drive module, and the articulation mechanism between them. The chassis is the main part of the mechanical system of the thermal pipe network inspection robot and carries the motion controller, the drive system, and the mobile end of the pipe robot. The travel mechanism and drive module are the key factors in ensuring that the thermal pipe network inspection robot walks normally in the pipeline, and they are the focus of the pipeline robot mechanism design.
The PC-microcontroller control system was chosen because the thermal network inspection robot in this paper needs to process information such as video and scanner data as well as execute control commands. Based on the functional requirements of the system and the tasks to be completed by the robot, the robot system is composed of a power supply system, a sensor system, an upper computer system, a lower computer system, a motion control unit, a bus communication system, a video system, and a laser scanner system. The power supply system is responsible for supplying power to all parts of the robot. The robot in this project requires power for the motor driver (24V); the sensor unit (12V, 5V); the main controller (5V), an ATmega series microcontroller; and the motion control system (5V). The robot system is controlled by the lower computer: the upper computer sends commands to the lower computer through the bus communication system, and the lower computer controls the normal operation of the robot according to the received commands.

3 Experimental procedures for testing the control system of the thermal pipe network inspection robot
Preparation: Assemble the robot's chassis, locomotion mechanism, drive module, and articulation system, making sure that the master MCU, two slave MCUs, motor driver chip, and sensor array are properly linked.
Microcontroller configuration: Set up the ATmega microcontroller with a 16 MHz clock frequency, 490 Hz PWM frequency, and 10-bit ADC resolution, and configure the UART communication rate to 9600 baud.
Sensor calibration: To guarantee precise readings, calibrate the incorporated sensors by adjusting the temperature sensors to a reference temperature of 25°C. The HD camera should be set up to capture video at 1080p resolution and 30 frames per second, while the laser scanner is set to a maximum detection range of 5 meters.
Control logic execution: Program control commands using user-defined logic into the motion controller, allowing the robot to perform particular movement patterns within the pipeline.
Test environment setup: To simulate real-world circumstances, build a scaled-down model of a 10-meter-long thermal pipe network with differing diameters (50 mm and 100 mm).
Conducting trials: Perform at least five trials to evaluate the robot's efficiency, recording control commands, sensor readings, and motion execution times, with a target execution time of less than 120 seconds.
Data gathering and examination: Gather and evaluate sensor and camera data to compare actual detection findings with expected results, with a target detection accuracy of 90%. Record any deviations in effectiveness; a sketch of this evaluation is given after the list below.
Expected vs. actual outcomes:
Detection accuracy: The detection accuracy target is set at 90% for detecting known defects in the thermal pipe network.
Execution time: The anticipated execution time should not surpass 120 seconds, and actual times will be logged for comparison.
Power requirements: The power supply system should supply 24V to the motor driver, 12V and 5V to the sensor unit, and 5V to the microcontroller and motion control system.
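As a worked illustration of the data gathering and examination step, the following sketch aggregates trial records against the stated targets (90% detection accuracy, 120 s execution time). The record values are hypothetical placeholders, not measured results from the paper.

# Hypothetical trial records; fields follow the quantities the procedure logs
# (defects found vs. known defects, execution time per trial).
trials = [
    {"defects_found": 9, "defects_known": 10, "execution_time_s": 112},
    {"defects_found": 10, "defects_known": 10, "execution_time_s": 108},
    {"defects_found": 9, "defects_known": 10, "execution_time_s": 117},
    {"defects_found": 10, "defects_known": 10, "execution_time_s": 121},
    {"defects_found": 9, "defects_known": 10, "execution_time_s": 110},
]

detection_accuracy = sum(t["defects_found"] for t in trials) / sum(t["defects_known"] for t in trials)
mean_time = sum(t["execution_time_s"] for t in trials) / len(trials)

print(f"detection accuracy: {detection_accuracy:.1%} (target 90%)")
print(f"mean execution time: {mean_time:.1f} s (target <= 120 s)")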
The overall structure of the robot control is shown in Figure 3.

Figure 3: Control system structure block diagram

Motor drive system design:
In this paper, the heat pipe network inspection robot has multiple motors: two main motors for walking, one camera rotation motor, one camera tilt servo, one scanner rotation motor, and one head lift motor. Since DC motors are used, this paper only focuses on the drive system of the main travel motors. The motion control of the motors is carried out by a central processor, an ATmega8 in the robot control module, which outputs PWM (pulse width modulated) signals to the motor drivers in software to carry out the forward, reverse, and stop actions of the motors.

Design of the upper computer control system:
The upper control system is mainly responsible for communication with the lower computer of the pipeline robot. Through the control knobs on the upper control panel it sends various commands to the robot, such as forward, backward, left turn, right turn, stop and other action commands; the camera control knob adjusts the camera head, for example camera rotation and tilt; the control panel also has a knob for the scanner head, which allows the scanner to be rotated, scanned and reset, as well as the adjustment of the brightness of the LEDs. All commands are sent to the robot via the host control system. The upper computer development process is shown in Figure 4.

Figure 4: Upper computer development process

Some of the program code for the upper computer is as follows.

////1. Capture user operation commands
temp_char=PINB;                        //Read port B
if(temp_char&BIT(0))                   //Rocker "run" trigger
{
    flag_run=1;                        //the robot has entered the forward state
    direction_left=1;                  //left wheel direction: forward
    direction_right=1;                 //right wheel direction: forward
    velocity_left=velocity_robot;      //left wheel speed
    velocity_right=velocity_robot;     //right wheel speed
}
else if(temp_char&BIT(1))              //rocker "back" trigger
{
    flag_run=0;                        //the robot leaves the forward state
    direction_left=2;                  //left wheel direction: backward
    direction_right=2;                 //right wheel direction: backward
    velocity_left=velocity_robot;
    velocity_right=velocity_robot;
}
else if(temp_char&BIT(2))              //rocker "left" trigger
{
    if(flag_run==0)                    //rotate in place
    {
        direction_left=2;              //left wheel backward
        direction_right=1;             //right wheel forward
        velocity_left=100;
        velocity_right=100;
    }
    else
    {
        direction_left=1;
        direction_right=1;
        velocity_left=velocity_robot;
        velocity_right=velocity_robot+100;   //speed up the right wheel to turn left
    }
}
else if(temp_char&BIT(3))              //rocker "right" trigger
{
    if(flag_run==0)                    //rotate in place
    {
        direction_left=1;              //left wheel forward
        direction_right=2;             //right wheel backward
        velocity_left=100;
        velocity_right=100;
    }
    else
    {
        direction_left=1;
        direction_right=1;
        velocity_left=velocity_robot+100;    //speed up the left wheel to turn right
        velocity_right=velocity_robot;
    }
}
else if(flag_run==1)                   //no trigger and the robot is in the forward state
{
    direction_left=1;
    direction_right=1;
    velocity_left=velocity_robot;
    velocity_right=velocity_robot;
}
else if(flag_run==0)                   //no trigger and the robot is stationary
{
    direction_left=3;                  //left wheel stopped
    direction_right=3;                 //right wheel stopped
    velocity_left=0;
    velocity_right=0;
}
if(temp_char&BIT(4))                   //robot automatic travel state
{
    direction_left=8;                  //left wheel automatic
    direction_right=8;                 //right wheel automatic
}
......

The main functions of the lower unit of the pipeline robot are to receive commands from the upper unit to control the motors; to collect data from sensors such as the tilt angle and return the information to the upper unit; and to provide power to the robot motors, lights and cameras and control the normal operation of the robot components. Part of the program code of the lower computer is as follows.

void main(void)
{
    init_devices();
    while(1)                           /////control cycle (ms)
    {
        value_adc[2]=value_adc[0];     //back up the last AD acquisition
        value_adc[3]=value_adc[1];     //back up the last AD acquisition
        flag_adc=0;
        adc_start(0);                  //start AD acquisition on channel 0
        Delayms(1);
        flag_adc=1;
        adc_start(1);                  //start AD acquisition on channel 1
        Delayms(1);
        if(flag_auto==1)
        {                              //calculate and output the motor speeds
            velocity_left=0;           //motor speed (P+D)
            velocity_right=0;          //motor speed (P+D)
            if(velocity_left>10)
            {
                DIRLEFT_H;             //forward rotation
                STOPLEFT_L;
                pwm_left=velocity_left;
            }
            else if(velocity_left<-10)
            {
                DIRLEFT_L;             //reverse
                STOPLEFT_L;
                pwm_left=-velocity_left;
            }
            else
            {
                STOPLEFT_H;            //brake
                pwm_left=0XFF;
            }
            if(velocity_right>10)
            {
                DIRRIGHT_H;            //forward rotation
                STOPRIGHT_L;
                pwm_right=velocity_right;
            }
            else if(velocity_right<-10)
            {
                DIRRIGHT_L;            //reverse
                STOPRIGHT_L;
                pwm_right=-velocity_right;
            }
            else
            {
                STOPRIGHT_H;           //brake
                pwm_right=0XFF;
            }
        }/////////////////////////////////////////////////////////////////
        ......
3.2 Heat pipe network monitoring system under the control of the inspection robot

3.2.1 Analysis of the functional requirements and overall architecture of the heat pipe network monitoring system
From the foregoing, this heat pipe network monitoring system needs to achieve the following functions.
(1) The heat pipe network monitoring system must be able to acquire data measured by the mobile side's integrated sensors and HD cameras in real time, and follow the TCP/IP and OTG communication protocols to send the collected data and the corresponding operation instructions to the server and the motion controller.
(2) The thermal network monitoring system should be able to visualize sensor data: to enable the user to observe specific sensor data, the system must be able to display the data as dynamic text; to visualize the trend of the data, the system must be able to display the data as dynamic curves.
(3) To provide access to historical sensor data and video data, and to help the user further confirm the operation status of the heat pipe network inspection robot and the internal environment of the pipe, the heat pipe network monitoring system must be able to store the acquired data in the database and support querying and deleting historical data.
(4) The main difference between this thermal network monitoring system and other thermal network monitoring systems is the ability to achieve human-machine interaction, i.e. by parsing the XML file sent by the server and dynamically generating user-defined interfaces and background logic, the thermal network inspection robot can be controlled to achieve the functions set by the user.
(5) To ensure the security of user information, the thermal network monitoring system needs a user login interface so that the user can only use the thermal network monitoring system after entering the corresponding user name and password.
Based on the analysis of the functional requirements of the monitoring software, the design of its overall architecture was completed. The main functions of the system are: to obtain sensor data and video data in real time and store the data in an intermediate database; to implement some basic functions such as listening to events (sensor listening events, SMS listening events, etc.) and sending and receiving broadcasts; and to provide a system and user code exception handling mechanism. When an exception occurs in the user's code, the corresponding dialog box pops up, and from the information in the dialog box the user can see the location and the cause of the error; this helps the user modify the code and avoids crashes or flashing caused by errors during the operation of the heat pipe network monitoring system, which ensures the normal operation of the heat pipe network monitoring system.
The system reads data from the intermediate database at regular intervals to visualize (dynamic text display and dynamic curve display), store, and send data. In the architecture of the heat pipe network monitoring system, the intermediate database, the basic functions, the exception handling, and the library functions form the underlying code of the heat pipe network monitoring system, and the user can call the functions in the library to achieve the corresponding logical functions, which reduces the difficulty of the user's development.

3.3 Interface development
Based on the analysis of the functions and architecture of the monitoring software, the interface structure of the monitoring software is divided into three modules: the login module, the monitoring module, and the user UI module. The login module verifies the user's information, and only when the user enters accurate information can the monitoring software be opened; when the information entered does not pass the background verification, the software prompts the user to enter it again or register an account until the login is successful.
The monitoring module is divided into six interfaces: the dynamic text display of data, the dynamic curve display of data, the video display, the sensor selection interface, the network connection interface, and the data query interface. In the dynamic text display and dynamic curve display interfaces, the data is refreshed once per second; the video monitoring interface can preview the video data collected by the mobile terminal in real time. To reduce the amount of data and allow the user to select the required sensor data according to the specific project needs, a sensor selection interface is designed. To communicate with the server, a network connection interface is designed, where the user only needs to enter the corresponding IP and port number to connect to the server and transfer the data. To meet the user's need to query historical data, a data query interface is designed.
The user UI module is used to display the interface dynamically generated by parsing the XML file sent by the server, enabling human-computer interaction.
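The paper does not give the schema of the user-defined XML files, so the sketch below assumes a minimal hypothetical layout purely to illustrate how such a file can be parsed into interface elements and control descriptors; element and attribute names are assumptions.

import xml.etree.ElementTree as ET

# Hypothetical user-defined interface file; the real schema used by the
# monitoring system is not specified in the paper.
SAMPLE_UI_XML = """
<ui name="valve-room-patrol">
    <control type="button" label="Start patrol" command="RUN"/>
    <control type="button" label="Stop" command="STOP"/>
    <sensor type="temperature" id="T1" refresh="1s"/>
</ui>
"""

def parse_user_interface(xml_text):
    """Turn a user-defined XML file into a list of interface descriptors."""
    root = ET.fromstring(xml_text)
    widgets = [{"tag": child.tag, **child.attrib} for child in root]
    return root.get("name"), widgets

name, widgets = parse_user_interface(SAMPLE_UI_XML)
print(name, widgets)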
The interface design of this heat network monitoring system uses Activity and Fragment components. Since a Fragment takes up less memory than an Activity, the interface design in this paper uses the more lightweight Fragment to improve the running efficiency of the application. The interface of the heat pipe network monitoring system takes the main interface as the core and dynamically loads the monitoring interface, the user interface, and so on. Data is passed between the interfaces by binding objects of the Bundle class, as shown in Figure 4.

Figure 4: Interface interaction process of the heat pipe network monitoring system

The monitoring system can selectively display sensor data, i.e. the user can select the required sensors in the sensor type selection interface, after which the system jumps to the data display interface to visualize the data (dynamic text display and dynamic curve display) according to the selection result. The process is as follows: first, the system converts the selected sensor types into a string and separates the different sensor names with the special symbol "%". The string is then bound to a Bundle object, and the setArguments() function is called to attach the Bundle object carrying the string data to the data display Fragment to be switched to; the system function for switching between Fragments is called to switch from the sensor selection interface to the data display interface; finally, the getArguments() function is called in the Fragment responsible for the data display to obtain the Bundle object, the Bundle.getString() function is called to obtain the String data passed from the sensor selection interface, and the string is split using the special symbol "%" as the separator. Each member of the resulting array is the name of a sensor selected by the user, and the system iterates through the array to obtain and display the real-time data of the selected sensors.
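The string packing and unpacking used for this hand-off can be illustrated outside the Android Bundle/Fragment machinery; the selection list below is an illustrative assumption.

# The monitoring software passes the chosen sensor names between interfaces as a
# single "%"-separated string; this sketch shows that round trip in plain Python.
selected_sensors = ["T1", "T2", "G1", "p"]

packed = "%".join(selected_sensors)     # what is stored in the Bundle
unpacked = packed.split("%")            # what the display Fragment recovers

assert unpacked == selected_sensors
print(packed, unpacked)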
3.4 Interface design
1. Main interface design
The main interface of the thermal network monitoring system in this paper adopts a "segmented" structure, i.e. the top of the interface is the options bar, the bottom is the visualization bar, and the middle is used as a container that loads different forms of interfaces. The purpose of this design is to make the functions of the system clearer, so that the user can jump to the required function with a one-key switch and operate the system more easily. The main monitoring interface is divided into two modules, the monitoring module and the user interface module, so the options bar at the top is divided into two sections: "Data Monitoring" and "User Interface". The visualization bar at the bottom is divided into three sections according to the functions of the monitoring module: "Text data", "Curve data" and "Video data", and the layout between the sections is a LinearLayout. The controls in the options bar and the visualization bar are not the basic controls provided by the mobile platform system, but rather developer-defined combinations of controls, with a uniform image at the top and text at the bottom, arranged in a LinearLayout. The background color becomes lighter when a control is selected.
In the design of the heat network monitoring system, the main interface was created by writing an XML layout file for the corresponding interface. This makes it easier to control the layout of the controls, and the orientation and layout_weight properties in LinearLayout allow the combined controls to be distributed according to a certain layout ratio. The main interface is visualized by loading the activity_main.xml file in the main Activity, as shown in the following code.

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    ......
    setContentView(R.layout.activity_main);
}

The onCreate() function is called when the Activity is initialized, and the setContentView() function called inside it is its core; the parameter R.layout.activity_main is the layout file of the main monitoring interface.
2. Design of the history data query interface
To facilitate viewing and deleting historical data, an independent query interface is designed for the user. The user can enter the start and end time in the edit boxes to query and delete data within a specific period according to their needs; when the queried period does not exist, the system prompts the user until a correct time is entered or the user returns. The data search process is shown in Figure 5. When the queried historical data exists, the heat pipe network monitoring system provides two ways of displaying the data, namely text display and curve display.
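A minimal sketch of the query-and-delete behaviour described above, assuming the stored samples are plain records with a timestamp field; the row contents and time format are illustrative assumptions.

from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def query_history(records, start, end):
    """Return stored samples whose timestamp lies in [start, end]; empty list if none."""
    t0, t1 = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    return [r for r in records if t0 <= datetime.strptime(r["time"], FMT) <= t1]

def delete_history(records, start, end):
    """Remove samples inside the chosen period and return the remaining records."""
    hits = {id(r) for r in query_history(records, start, end)}
    return [r for r in records if id(r) not in hits]

# Hypothetical stored rows; the real system keeps these in its database.
rows = [{"time": "2024-05-01 10:00:00", "T1": 41.2}, {"time": "2024-05-01 10:00:01", "T1": 41.3}]
print(query_history(rows, "2024-05-01 10:00:00", "2024-05-01 10:00:01"))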
Figure 5: Data search process

4 Usability testing
The thermal network surveillance system's usability testing analyses both the main interface and the historical data query interface to guarantee user satisfaction. The main interface is segmented, with an options bar for "Data Monitoring" and "User Interface," as well as a visualization bar that displays "Text data," "Curve data," and "Video data." This design provides easy access to operations and intuitive interaction via custom-designed controls. The historical data query interface allows users to enter start and end times for data retrieval and deletion, with error messages for invalid queries. Testing concentrates on task completion times, error rates, and user feedback in order to validate efficiency and identify enhancements.

4.2 Discussion
The presented multi-classification fault diagnosis model outperforms existing SOTA techniques, with a classification accuracy of 99.99% and a computation time of only 55.938 seconds. This enhancement is due to the algorithmic optimizations, particularly the gravitational search technique integrated with parameter tuning of the LSSVM. These optimizations boost the model's capacity to navigate the parameter space efficiently, leading to better classification efficiency than prior methods, which attained a maximum accuracy of 97.89% and required longer run times. This work makes a unique contribution by incorporating sophisticated optimization methods that not only raise accuracy but also decrease algorithmic intricacy, rendering the model more effective. While trade-offs between computation time and accuracy are prevalent in machine learning, the proposed solution provides practical benefits by offering better accuracy without substantially reducing processing speed, establishing it as a feasible choice for real-time fault identification in a variety of uses.
The complexity analysis of optimization algorithms such as the gravitational search algorithm (GSA), the cuckoo algorithm (CA), and particle swarm optimization (PSO) focuses on their time complexity and resource needs. GSA has a time complexity of O(n·k), rendering it effective for moderate-sized problems, while CA exhibits O(n·log n), allowing for rapid convergence on smaller datasets. PSO, with a complexity of O(n·m), can become expensive as dimensionality rises. These variations affect scalability: GSA's effectiveness renders it appropriate for larger heat pipe networks, whereas CA and PSO may incur higher computational overhead with large datasets, requiring optimizations for practical use.

5 Conclusion
In summary, the author has analysed the requirements of this heat pipe network monitoring system, focusing on the heat pipe network inspection robot, and completed the overall architecture design and basic interface design of the heat pipe network monitoring system. The specific functional requirements of the heat pipe network monitoring system were analysed as follows: user login, collection of sensor data, visualization of data, storage of data, communication, dynamic generation of user-defined interfaces and logic, and other functions. The overall architecture of the heat pipe network monitoring system was designed. According to the functional requirements of the system, the data acquisition, the basic functions, and the system and user code exception handling functions are unified and managed by the Service, which reduces code redundancy and facilitates maintenance at a later stage. Based on the functional requirements and architecture of the heat network monitoring system, the interface architecture of the system was designed, using Activity as the carrier and achieving the system interface interaction by dynamically loading Fragment layouts. From the application of common login methods and the characteristics of this software, the login module, the data query module, and the main interface of the heat pipe network monitoring system were designed. The design of this monitoring system is important for the improvement of monitoring efficiency.

Data Availability
All data are included within the article.

Conflicts of interest
The authors declare no conflicts of interest.

Funding statement
Not applicable.
References
[1] Shen, Y., Chen, J., Fu, Q., Wu, H., Wang, Y., & Lu, Y. (2021). Detection of district heating pipe network leakage fault using UCB arm selection method. Buildings, 11(7), 275. https://doi.org/10.3390/buildings11070275
[2] Perpar, M., & Rek, Z. (2020). Soil temperature gradient is a useful tool for small water leakage detection from district heating pipes in buried channels. Energy, 201, 117684. https://doi.org/10.1016/j.energy.2020.117684
[3] Al Qahtani, T., Yaakob, M. S., Yidris, N., Sulaiman, S., & Ahmad, K. A. (2020). A review of water leakage detection method in the water distribution network. Journal of Advanced Research in Fluid Mechanics and Thermal Sciences, 68(2), 152-163. https://doi.org/10.37934/arfmts.68.2.152163
[4] Gams, M., & Kolenik, T. (2021). Relations between electronics, artificial intelligence and information society through information society rules. Electronics, 10(4), 514. https://doi.org/10.3390/electronics10040514
[5] Li, W., Liu, T., & Xiang, H. (2021). Leakage detection of water pipelines based on active thermometry and FBG-based quasi-distributed fiber optic temperature sensing. Journal of Intelligent Material Systems and Structures, 32(15), 1744-1755. https://doi.org/10.1177/1045389x20987002
[6] Zholtayev, D., Dauletiya, D., Tileukulova, A., Akimbay, D., Nursultan, M., Bushanov, Y., ... & Yeshmukhametov, A. (2024). Smart pipe inspection robot with in-chassis motor actuation design and integrated AI-powered defect detection system. IEEE Access. https://doi.org/10.1109/access.2024.3450502
[7] Murtazin, I. I., Kozhevnikov, M. V., & Starikov, E. M. (2021). Development and application of methods of internal inspection of district heating networks. International Journal of Energy Production and Management, 6(1), 56-70. https://doi.org/10.2495/eq-v6-n1-56-70
[8] Wong, B., & McCann, J. A. (2021). Failure detection methods for pipeline networks: From acoustic sensing to cyber-physical systems. Sensors, 21(15), 4959. https://doi.org/10.3390/s21154959
[9] Liu, R., Zhang, Y., & Li, Z. (2022). Leakage diagnosis of air conditioning water system networks based on an improved BP neural network algorithm. Buildings, 12(5), 610. https://doi.org/10.3390/buildings12050610
[10] Korlapati, N. V. S., Khan, F., Noor, Q., Mirza, S., & Vaddiraju, S. (2022). Review and analysis of pipeline leak detection methods. Journal of Pipeline Science and Engineering, 2(4), 100074. https://doi.org/10.1016/j.jpse.2022.100074
[11] Yussof, N. A. M., & Ho, H. W. (2022). Review of water leak detection methods in smart building applications. Buildings, 12(10), 1535. https://doi.org/10.3390/buildings12101535
[12] Langroudi, P. P., & Weidlich, I. (2020). Applicable predictive maintenance diagnosis methods in service-life prediction of district heating pipes. Rigas Tehniskas Universitates Zinatniskie Raksti, 24(3), 294-304. https://doi.org/10.2478/rtuect-2020-0104
[13] van Dreven, J., Boeva, V., Abghari, S., Grahn, H., Al Koussa, J., & Motoasca, E. (2023). Intelligent approaches to fault detection and diagnosis in district heating: Current trends, challenges, and opportunities. Electronics, 12(6), 1448. https://doi.org/10.3390/electronics12061448
[14] Hossain, K., Villebro, F., & Forchhammer, S. (2020). UAV image analysis for leakage detection in district heating systems using machine learning. Pattern Recognition Letters, 140, 158-164. https://doi.org/10.1016/j.patrec.2020.05.024
[15] Vollmer, E., Ruck, J., Volk, R., & Schultmann, F. (2024). Detecting district heating leaks in thermal imagery: Comparison of anomaly detection methods. Automation in Construction, 168, 105709. https://doi.org/10.1016/j.autcon.2024.105709
[16] Kim, H., Lee, J., Kim, T., Park, S. J., & Kim, H. (2023). Advanced thermal fluid leakage detection system with machine learning algorithm for pipe-in-pipe structure. Case Studies in Thermal Engineering, 42, 102747. https://doi.org/10.2139/ssrn.4147041
[17] Pérez-Pérez, E. D. J., López-Estrada, F. R., Valencia-Palomo, G., Torres, L., Puig, V., & Mina-Antonio, J. D. (2021). Leak diagnosis in pipelines using a combined artificial neural network approach. Control Engineering Practice, 107, 104677. https://doi.org/10.1016/j.conengprac.2020.104677
[18] García-Ródenas, R., Linares, L. J., & López-Gómez, J. A. (2021). Memetic algorithms for training feedforward neural networks: an approach based on gravitational search algorithm. Neural Computing and Applications, 33(7), 2561-2588. https://doi.org/10.1007/s00521-020-05131-y
[19] Kazeminasab, S., & Banks, M. K. (2022). Towards long-distance inspection for in-pipe robots in water distribution systems with smart motion facilitated by a particle filter and multi-phase motion controller. Intelligent Service Robotics, 15(3), 259-273. https://doi.org/10.1007/s11370-022-00410-0
https://doi.org/10.31449/inf.v49i16.7779 Informatica 49 (2025) 213–234 213
Biometric-Based Secure Encryption Key Generation Using
Convolutional Neural Networks and Particle Swarm Optimization
Sahera A. S. Almola, Raidah S. Khudeyer, Hameed Abdulkareem Younis
Department of Computer Information Systems, College of Computer Science and Information Technology, University
of Basrah, Basrah, Iraq
E-mail: sahera.sead@uobasrah.edu.iq, raidah.khudayer@uobasrah.edu.iq, hameed.younis@uobasrah.edu.iq
*Corresponding author
Keywords: biometric verification, fingerprints, deep learning, particle swarm optimization (pso) algorithm, encryption
key generation
Received: December 7, 2024
With the rapid expansion of computer networks and information technology, ensuring secure data
transmission is increasingly vital—especially for image data, which often contains sensitive information.
This research presents a biometric-based encryption system that uses fingerprint recognition and deep
learning to generate strong, random encryption keys. Two convolutional neural networks (CNNs) are
employed: one to verify identity based on a user’s ID and another to extract fingerprint features for key
generation. These keys are optimized using Particle Swarm Optimization (PSO), enhancing their
randomness and resistance to brute-force attacks.
The system generates keys in real-time, eliminating the need for storage and minimizing the risk of theft or
leakage. To further improve security, encryption keys are automatically updated after every ten messages,
with different keys generated from multiple fingerprints of the same individual. Testing with the SOCOFing
dataset (6,000 original and 49,270 synthetic images) achieved 99.75% identity verification and 99.83%
classification accuracy. Performance metrics—entropy of 7.89, correlation factor of 0.00628, and zero
repetition—demonstrate high robustness. This approach offers a secure, adaptive, and personalized
encryption method ideal for sensitive domains like finance and healthcare.
Povzetek: Opisana je izvirna metoda za generiranje varnih šifrirnih ključev z uporabo prstnih odtisov, CNN
modelov in optimizacije roja delcev (PSO)
1 Introduction
Internet and network users share millions of color images daily, which are utilized in various applications such as telemedicine, remote learning, business, and military operations. Color images, in particular, often contain sensitive and detailed information, making them prime targets for unauthorized access and cyberattacks. Securing these images is crucial not only to prevent data loss during transmission but also to protect sensitive information from attackers. Various techniques are employed to secure digital images, such as watermarking, steganography, and image encryption. Encryption operates in two main stages: encryption and decryption. During encryption, the input image is transformed into an unreadable form using a secret key, while in decryption, the content is restored using the same key [1]. The encryption key is a fundamental element in the encryption and decryption processes, and it significantly determines the security system's strength. However, a critical challenge faced by encryption systems lies in managing the encryption key itself [2]. Traditional encryption methods require transmitting the encryption key to the recipient to decrypt the data. This approach introduces vulnerabilities, as any exposure of the key during transmission could lead to the compromise of the encrypted data. Consequently, there is an increasing need for systems that dynamically generate encryption keys on-demand at the user's end, eliminating the need for key transmission over networks [3]. This innovative approach ensures that the encryption key is generated locally each time data is decrypted, significantly reducing risks associated with key interception. It also eliminates the need for key exchange, adding an extra layer of security since unauthorized parties cannot generate the key even if communication is intercepted.
The keyless exchange method, when combined with biometric verification, offers a highly secure solution by minimizing the risk of key theft. This approach aligns with the methodology presented in this research. However, implementing such a solution poses significant challenges in the fields of secure computing and key management, as it requires a robust system to ensure the consistent and accurate generation of keys [4]. The importance of this research lies in emphasizing the generation of encryption keys locally at the user's end to safeguard data and mitigate risks associated with key transmission over networks. This is particularly critical for securing color images, as their high information content often correlates with increased sensitivity, making them especially vulnerable to sophisticated attacks.
To address these challenges, advanced techniques based on artificial intelligence and machine learning,
particularly deep learning, have emerged. One notable technique involves using deep learning to generate encryption keys from fingerprints. This method leverages the extraction of unique features from fingerprints, converting them into robust, non-repetitive encryption keys to ensure high data security [5]. This method addresses limitations in traditional encryption systems, such as the need for key transmission over networks. Since a fingerprint is a unique biometric identifier that cannot be easily copied or mimicked, it serves as an ideal source for generating encryption keys. Moreover, deep learning enhances the accuracy and strength of the generated keys by utilizing deep neural networks to analyze biometric images and extract unique features for each fingerprint [6]. This approach also resists advanced threats, including brute-force and quantum encryption attacks, by dynamically generating encryption keys in real time. The added layer of complexity and secrecy prevents unauthorized parties from accessing the keys, even if communication data is partially intercepted [7]. The integration of deep learning in generating encryption keys from fingerprints represents a significant advancement in information security. This approach combines robust security measures with individual privacy, paving the way for building encryption systems that are highly resistant to breaches and better equipped to address modern security challenges.

The remainder of this paper is organized as follows: Section 2 reviews related works, while Section 3 provides background on the key techniques utilized in this research. Section 4 explains the management of secret keys. Section 5 details the proposed method. Section 6 focuses on experimental results and performance analysis. Section 7 discusses the results, and Section 8 concludes this study.

2 Related works

The integration of biometric data, chaotic systems, and deep learning in encryption key generation has been a prominent research area. Various studies have explored innovative approaches to enhance the security and robustness of encryption systems. Hashem and Kuban (2023) [8] introduced a system that leverages fingerprint biometrics to generate long, random encryption keys. The approach involves preprocessing fingerprint images to remove noise, utilizing a modified VGG-16 convolutional neural network (CNN) to extract unique features, and employing transfer learning to build a key generation model without the need for retraining. Erkan et al. (2024) [9] proposed a secure image encryption framework that combines a chaotic logarithmic map with a deep CNN for key generation. Their system incorporates advanced operations such as permutation, DNA encoding, diffusion, and bit-reversal to ensure security. The robustness of this framework was validated through comprehensive analyses, including key sensitivity and resistance to various attacks, demonstrating superior performance compared to traditional encryption methods. Quinga Socasi, Zhinin-Vera, and Chang (2020) [10] developed a method for generating encryption keys from alphanumeric passwords using an autoencoder neural network. Their experiments revealed that this method outperforms conventional algorithms, particularly when encrypting small text files, making it highly resistant to cracking attempts. Wu et al. (2022) [11] presented a biometric key generation framework that uses fingerprints to achieve over 1024-bit key strength and 98% accuracy. However, their method depends on a predefined pipeline and fuzzy extractors for key stabilization. In contrast, the method proposed in this research dynamically extracts high-resolution fingerprint features using deep learning models, ensuring greater adaptability across datasets. These features are combined with chaotic encryption systems to enhance randomness and security. Furthermore, Particle Swarm Optimization (PSO) is employed to optimize the generated keys, achieving over 99% accuracy and producing 1024-byte keys without requiring stabilization layers. This approach demonstrates superior flexibility and security for real-world IoT applications. Alesawy and Muniyandi (2016) [12] investigated data security in cloud environments using random encryption keys. Their study analyzed the impact of incorporating Elliptic Curve Diffie-Hellman (ECDH) keys and demonstrated significant improvements in efficiency and performance by integrating Artificial Neural Networks (ANNs) with ECDH and genetic algorithms, despite increased processing times for larger datasets. Saini and Sehrawat (2024) [13] proposed a technique for generating unique encryption keys by combining an autoencoder network with hashing techniques and prime numbers derived from the MNIST dataset. To enhance security, the system incorporates XOR operations and Blum-Blum-Shub (BBS) generators. Extensive testing confirmed the robustness of this approach against attacks. Kurtninykh, Ghita, and Shiaeles (2021) [14] addressed the complexities of cryptographic key management in systems with increasing users and applications. They evaluated five key management systems, including Hashicorp Vault and Pinterest Knox, focusing on features such as security, scalability, and access control. The study concluded that Hashicorp Vault is particularly suitable for small businesses due to its superior security features. A summary of the related studies is provided in Table 1 for further reference.
Table 1: Previous works on key generation
This research builds upon the foundations laid by these studies, emphasizing the dynamic generation of encryption keys using deep learning and chaotic systems to address challenges in key management and enhance security. The comparison in Table 1 clearly demonstrates the superiority of our proposed method over all previous approaches. The proposed method utilizes dynamic keys generated by deep learning networks, which significantly enhance randomness and security. Moreover, the key is non-portable, non-persistent, and achieves the largest size and highest accuracy compared to other methods.

3 Background

This section addresses two main techniques: CNNs and PSO, which form the foundation of the methodology proposed in this research. In the following paragraphs, we provide a summary of each technique and explain its significance in the study.

A. CNNs are advanced models in the field of deep learning, specifically designed to handle grid-like data, such as images. In this research, two CNN models were used to generate an encryption key based on fingerprint images. Table 2 summarizes the components of each model used in the work.
Table 2: Components of CNN models used
Layer (type)                                    Output shape          Parameters (#)
Conv2D (conv2d_1)                               (None, 92, 92, 32)    832
BatchNormalization (batch_normalization_1)      (None, 92, 92, 32)    128
MaxPooling2D (max_pooling2d_1)                  (None, 46, 46, 32)    0
Conv2D (conv2d_2)                               (None, 42, 42, 64)    51,264
BatchNormalization (batch_normalization_2)      (None, 42, 42, 64)    256
MaxPooling2D (max_pooling2d_2)                  (None, 21, 21, 64)    0
Conv2D (conv2d_3)                               (None, 19, 19, 128)   73,856
BatchNormalization (batch_normalization_3)      (None, 19, 19, 128)   512
MaxPooling2D (max_pooling2d_3)                  (None, 9, 9, 128)     0
Dropout (dropout_1)                             (None, 9, 9, 128)     0
Flatten (flatten_1)                             (None, 10368)         0
Dense (dense_1)                                 (None, 1024)          10,617,856
Dropout (dropout_2)                             (None, 1024)          0
Dense (dense_2)                                 (None, 600)           615,000
The first model was designed to identify a person's identity based on their ID number. After confirming the person's identity, the second model identifies the selected fingerprint and extracts its features. Both models rely on convolutional layers to automatically and progressively extract important features from the input data, making them effective in performing their tasks, which, in turn, aids in generating strong encryption keys by analyzing fine patterns in the images. The two models were trained using the backpropagation technique with a suitable loss function for each task. This architectural design was chosen to achieve accurate performance in recognizing the identity of the fingerprint owner through the identifier number in the file name, and then generating an encryption key based on the unique features of the fingerprint using two convolutional neural networks.
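For illustration, the layer stack of Table 2 can be written as a small Keras model. This is a sketch only: the 96×96×1 input shape is inferred from the parameter counts in Table 2, while the kernel sizes (5×5, 5×5, 3×3), activation functions, and dropout rates are assumptions consistent with those counts rather than values stated in the text; num_classes would be 600 for the SubjectID model and 10 for the finger-number model.

from tensorflow.keras import layers, models

def build_fingerprint_cnn(num_classes=600):
    # Layer stack matching the output shapes and parameter counts in Table 2.
    return models.Sequential([
        layers.Input(shape=(96, 96, 1)),                  # assumed 96x96 grayscale input
        layers.Conv2D(32, 5, activation="relu"),          # (92, 92, 32), 832 params
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),                           # (46, 46, 32)
        layers.Conv2D(64, 5, activation="relu"),          # (42, 42, 64), 51,264 params
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),                           # (21, 21, 64)
        layers.Conv2D(128, 3, activation="relu"),         # (19, 19, 128), 73,856 params
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),                           # (9, 9, 128)
        layers.Dropout(0.25),
        layers.Flatten(),                                 # 10368 features
        layers.Dense(1024, activation="tanh"),            # 10,617,856 params; dense feature layer
        layers.Dropout(0.25),
        layers.Dense(num_classes, activation="softmax"),  # 615,000 params for 600 classes
    ])

model = build_fingerprint_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])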
Pseudo-code for the PSO algorithm

1. Initialize parameters:
   o Define bounds:
     • Lower bound (lb) = 0
     • Upper bound (ub) = 255
   o Set PSO parameters:
     • Number of particles = len(keys)
     • Maximum iterations = 200
     • Inertia weight (w) = 0.9
     • Cognitive coefficient (c1) = 0.5
     • Social coefficient (c2) = 0.5
   o Set random seed for reproducibility.
2. Initialize particles:
   o Convert keys to a NumPy array.
   o Set initial particle positions = keys.
   o Set initial velocities = zeros.
   o Initialize personal bests:
     • Personal best positions = initial positions.
     • Personal best scores = evaluate fitness for each particle.
   o Find global best:
     • global_best_position = position with the best score.
     • global_best_score = best personal score.
3. Run PSO optimization:
   For each iteration in range(num_iterations) do:
   o For each particle do:
     • Update velocity:
       new_velocity = (w × current velocity)
                      + (c1 × random factor × (personal best − current position))
                      + (c2 × random factor × (global best − current position)).
     • Update position:
       new_position = current position + new_velocity.
       Clip positions to bounds (lb, ub).
     • Evaluate fitness of the new position.
     • Update personal best position and score.
   o Update global best:
     • If any particle's score is better than the global best score, update the global best position and score.
4. Output results:
   o Convert global_best_position to integers (best_key).
   o Compute best_entropy_value using the fitness function.
Figure 1: PSO algorithm
B. PSO (Particle Swarm Optimization) is an optimization algorithm inspired by the collective behavior of birds or fish. It involves a group of particles, each representing a potential solution in the solution space. Each particle adjusts its movement based on its own experience and the experiences of neighboring particles, with the aim of reaching the optimal solution. PSO is known for its efficiency and ability to find optimal solutions in multi-dimensional spaces. In this research, PSO is applied to optimize the process of encryption key generation. The algorithm enhances the randomness and strength of the generated keys, ensuring that they are both secure and resistant to attacks. PSO improves the key generation process by fine-tuning the key parameters in real time, making it more robust against potential security threats. This approach adds an extra layer of security, ensuring that the keys are not only unique and non-repetitive but also resilient to various forms of attacks. The use of PSO ensures that the final encryption keys are both optimized for security and generated dynamically, without the need for permanent storage, thus reducing the risks of key leakage or unauthorized access. Key enhancement using PSO: the PSO algorithm is used to enhance the quality of the initial key, making it stronger and more secure. Figure 1 illustrates the detailed steps of the PSO algorithm using pseudo-code. This pseudocode reflects the essence of the PSO algorithm applied to optimize encryption keys based on the fitness function (such as randomness or security). The process iteratively adjusts the position (key) and velocity of the particles to find the optimal encryption key with high security.
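As a concrete illustration of Figure 1, the following is a minimal NumPy sketch of the key-optimization loop. The fitness function is assumed here to be the Shannon entropy of the candidate key's byte values (the text describes fitness only generically as randomness/security, so this is an assumption); the parameter values follow the pseudo-code (w = 0.9, c1 = c2 = 0.5, 200 iterations, bounds 0–255).

import numpy as np

def entropy_fitness(key):
    # Shannon entropy of the byte values in the key (higher = more random).
    counts = np.bincount(key.astype(np.uint8), minlength=256)
    p = counts[counts > 0] / key.size
    return -np.sum(p * np.log2(p))

def pso_optimize_key(keys, num_iterations=200, w=0.9, c1=0.5, c2=0.5, lb=0, ub=255, seed=42):
    # Each particle is one candidate key (a vector of byte values).
    rng = np.random.default_rng(seed)
    positions = np.array(keys, dtype=float)
    velocities = np.zeros_like(positions)
    pbest_pos = positions.copy()
    pbest_score = np.array([entropy_fitness(p) for p in positions])
    g = int(np.argmax(pbest_score))
    gbest_pos, gbest_score = pbest_pos[g].copy(), pbest_score[g]

    for _ in range(num_iterations):
        r1 = rng.random(positions.shape)
        r2 = rng.random(positions.shape)
        velocities = (w * velocities
                      + c1 * r1 * (pbest_pos - positions)
                      + c2 * r2 * (gbest_pos - positions))
        positions = np.clip(positions + velocities, lb, ub)
        scores = np.array([entropy_fitness(p) for p in positions])
        improved = scores > pbest_score
        pbest_pos[improved] = positions[improved]
        pbest_score[improved] = scores[improved]
        if pbest_score.max() > gbest_score:
            g = int(np.argmax(pbest_score))
            gbest_pos, gbest_score = pbest_pos[g].copy(), pbest_score[g]

    return gbest_pos.astype(np.uint8), gbest_score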
4 Secure key management

Secure key management is a critical process to ensure the protection of encrypted data across encryption systems. In the proposed methodology, the focus is on generating cryptographic keys in real time without permanent storage, thus reducing the risks associated with key leakage. However, temporary handling and protection of keys during their lifecycle remain essential. Below is a detailed explanation of the steps and importance of secure key management, updated to reflect the real-time generation approach:

1. Key generation: In the proposed system, keys are generated dynamically and in real time using advanced techniques such as artificial neural networks, particularly convolutional neural networks (CNNs). This approach ensures that the keys are both highly secure and non-repetitive, avoiding the need for long-term storage. These keys are designed to be sufficiently random and robust, minimizing the possibility of guessing or tampering.

2. Temporary key handling: While keys are not stored permanently, they are managed securely during their temporary existence within the system. During encryption or decryption processes, the keys are stored in memory with strict safeguards, such as memory encryption or secure enclaves, to prevent unauthorized access. Once the operation is complete, the keys are securely erased from the system to eliminate any residual risk.

3. Key distribution: Since the system eliminates the need for traditional key exchange, the reliance on secure protocols like SSL/TLS or Diffie-Hellman for key distribution is significantly reduced. Instead, the generated key remains local to the system, mitigating risks associated with interception during transmission [16].

4. Key rotation: In systems where keys are reused for multiple sessions or extended periods, regular key rotation is critical. However, in the proposed system, each key is uniquely generated for a specific session or operation, inherently providing the benefits of key rotation by design.

5. Key revocation: Although the system minimizes the use of persistent keys, mechanisms for immediate key invalidation are essential for scenarios involving session-based or temporarily stored keys. These mechanisms ensure that any exposed or misused keys are rendered unusable promptly [17].

6. Importance of key management in real-time systems: The proposed approach emphasizes the importance of secure key handling during the active lifecycle of keys. By avoiding permanent storage and focusing on real-time generation and temporary protection, the system significantly reduces the risks associated with key leakage or unauthorized access. This approach aligns with best practices in modern cybersecurity by combining the advantages of real-time key generation with robust temporary key management to ensure the highest level of data protection throughout the encryption process [16].

5 Proposed method

Figure 2 presents the diagram for the proposed encryption key management and generation. The diagram consists of three stages: "Securing Communication and Transferring Confidential Information", "Generating Encryption Key using CNN and Encrypting the Image", and "Generating Encryption Key using CNN and Decrypting Image".

Figure 2: Proposed method diagram
The proposed method consists of three main parts. The first part begins with an algorithm for securing communication and managing encryption keys. This is followed by the second part, which involves the process of generating the encryption key and encrypting the image. Finally, the third part focuses on decrypting the image after the key has been generated. Each of these parts will be explained in detail later.

Part One: Securing communication and managing confidential information transfer

The first part of Figure 2 illustrates an algorithm designed to ensure secure communication and reliable key management between branches and the main branch. When a branch requests access to sensitive information (such as encrypted images), the main branch fulfills this request by sending the requested information after encrypting it with a secure key, ensuring data protection during transmission. The user ID is used to control access.

Algorithm execution steps

The algorithm is executed in cooperation with the following two parts in the diagram as follows:

1. Starting the process (start): The process begins by initializing the user's counter Counter[ID] to zero.
2. Entering the ID number: The system prompts the user to input their identification number to verify their identity.
3. Verifying the ID range (ID in 1..600): The system checks whether the entered ID number falls within the allowed range (1 to 600).
   • If the number is outside the range, an error message is displayed, and the user is asked to re-enter the ID.
   • If the number is valid, the process moves to the next step.
4. Checking the match with the exit indicator (ID in exit): The system compares the entered ID with the exit indicator list.
   • If a match is found, the process is terminated.
   • If no match is found, the process continues to the next step.
5. Incrementing the message counter (Counter[ID] += 1): If the ID is valid and not listed in the exit indicator, the user's message counter is incremented by 1.
6. Managing the number of sent messages (dynamic key management): The system checks whether the number of messages sent by the user has exceeded the allowed limit (10 messages).
   • If the limit is exceeded, the counter is reset to 1.
   • If the limit is not exceeded, the current counter is used as an index for generating the encryption key.
   This mechanism ensures unique encryption keys for each set of messages, enhancing data security (a short sketch of this counter logic is given at the end of this part). Additionally, it raises a critical question: "Can biometric fingerprint data generate dynamic encryption keys resistant to quantum attacks?" This approach aims to strengthen the security of biometric keys against advanced threats such as quantum attacks.
7. Sending the request to the branch (send request to branch): The request containing the ID and the fingerprint index (P) is sent to the second branch for processing.
   • In the second part: A key is generated for image encryption, and the encryption process is executed. After encryption, the encrypted image is sent back to the first part.
   • In the third part: A new key is generated to decrypt the image. Once decryption is completed, the data is returned to the first part for the remaining steps.

Note: The details of the second and third parts will be explained in the following sections of the document for a precise and comprehensive understanding. In this way, the three parts form an integrated system that ensures secure communication and the safe transmission of sensitive information effectively.

Algorithm features

• Biometric security: Fingerprints are used as a means to verify user identities, which reduces the risks of unauthorized access.
• Synchronization: The system relies on concurrent processing, enhancing performance efficiency and reducing response times for requests.
• Dynamic key management: Each key is generated uniquely for each user based on their fingerprint, increasing the difficulty of breaching the system.

This algorithm ensures effective protection of encrypted data and enhances the security of communications between branches, making it an excellent choice for systems that require a high level of security and privacy.
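The ID-validation and message-counter logic of steps 1–6 can be sketched as follows. The function name, the in-memory dictionary standing in for Counter[ID], and the empty default exit list are illustrative assumptions, not part of the published algorithm.

def next_key_index(counter, user_id, limit=10, valid_ids=range(1, 601), exit_list=frozenset()):
    # Steps 3-6: validate the ID, check the exit list, then advance the message counter.
    if user_id not in valid_ids:
        raise ValueError("ID outside the allowed range 1..600")
    if user_id in exit_list:
        return None                                     # step 4: a match with the exit indicator terminates the process
    counter[user_id] = counter.get(user_id, 0) + 1      # step 5: increment the message counter
    if counter[user_id] > limit:                        # step 6: more than 10 messages resets the counter
        counter[user_id] = 1
    return counter[user_id]                             # used as fingerprint index P for key generation

# Example: the 11th message from user 42 reuses index 1, so a fresh key is generated.
counter = {}
indices = [next_key_index(counter, 42) for _ in range(11)]   # 1..10, then 1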
Part Two: Encryption key generation using CNN and image encryption

The encryption key is generated using CNN based on the fingerprint. This process is carried out as specified in Part 2 of the diagram, which includes the following operations:

1. Database loading phase: This step is considered one of the main preparatory phases in the system to ensure the readiness of the data and models required to achieve accuracy and security in encryption key generation. In this research, the SOCOFing database was used, which contains fingerprints from 600 people of African descent, with each person having 10 fingerprints, resulting in a total of 6,000 original fingerprints. Additionally, synthetic groups were created with three levels of variation in the fingerprints: minor changes (Easy), medium changes (Medium), and significant changes (Hard). The total number of synthetic fingerprints used in training was approximately 49,270. The variation fingerprints were used for training the model, while the original fingerprints were used solely for testing.

2. Data preprocessing phase: The following processes are included:

3. Image size standardization: To ensure that all images in the database are compatible with the model requirements, the dimensions of all images are standardized. A common size, such as 96×96 pixels, is often chosen to prepare the images for efficient model processing. The formula for resizing the images can be expressed mathematically as shown in Equation (1) below:

I′(x′, y′) = I(x′/Sx, y′/Sy)   (1)

Where I(x, y) is the original image, I′(x′, y′) is the image after resizing, and Sx and Sy represent the scaling factors in the image dimensions [18].

A. Image enhancement using histogram equalization:

The histogram equalization technique was applied to enhance contrast in fingerprint images and highlight fine details. This technique is one of the fundamental methods in image processing and quality enhancement, aiming to improve the distribution of grayscale levels in the image to make fine details more visible. In images with low contrast, gray values may cluster within a narrow range, leading to the loss of fine details in dark or bright areas. Histogram equalization is used to address this issue by improving the distribution of these gray values over a broader range of available colors, enhancing contrast and making details easier to detect. The process of adjusting the tonal gradients in the image is carried out using the following equation (2) [19]:

H′(I) = ((CDF(I) − CDFmin) / ((N×M) − CDFmin)) × (L − 1)   (2)

The histogram equalization process involves several key parameters that affect the final outcome of the operation:

1. Cumulative distribution function (CDF): This is the primary factor that determines how grayscale values are redistributed in the image. The CDF accumulates grayscale values progressively from the lowest to the highest and is used to adjust the distribution. Through this function, the grayscale value distribution in the image is calculated, and adjustments are made to spread these values evenly across the color range.

2. Minimum non-zero value (CDFmin): This refers to the smallest non-zero value in the cumulative distribution function. It is used to determine how grayscale values in the image will be adjusted to achieve a more balanced distribution. For example, if the grayscale values in the image are concentrated around a particular value, utilizing this minimum helps improve the distribution of those values without significantly affecting the overall contrast of the image.

3. Image size (N×M): This refers to the number of pixels in the image. The larger the image (i.e., a greater N×M), the more opportunities there are for accurately redistributing grayscale values. However, it is important to note that image size can impact processing speed, as larger images require more computations.

4. Number of gray levels (L): Typically, L = 256 in grayscale images (meaning there are 256 possible tonal levels ranging from 0 to 255). The number of gray levels defines the range of colors that can be distributed across the image. In images with a high number of gray levels, tonal gradations can be distributed more evenly, leading to better contrast enhancement.

When applying this technique, the range of grayscale values in the image is expanded, and these values are evenly distributed across the color range, leading to increased contrast. This enhanced contrast reveals fine details in the image, such as the minutiae in fingerprints, which might be poorly visible in low-contrast images. In the case of fingerprints, fine details such as ridges and patterns are often crucial for analysis and classification. By using histogram equalization, the clarity of these fine details can be improved, aiding in better feature extraction of the fingerprint and achieving higher performance in systems that use fingerprint recognition. Figure 3 shows an example of fingerprints before and after contrast enhancement using histogram equalization. Notice how the enhanced images display finer and clearer details compared to the original images.

Figure 3: (a) Original image, (b) image after histogram equalization
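A minimal NumPy sketch of the mapping in Equation (2), assuming an 8-bit grayscale input; library routines such as OpenCV's equalizeHist provide equivalent functionality.

import numpy as np

def equalize_histogram(img, L=256):
    # Histogram equalization following Equation (2); img is an 8-bit grayscale array.
    hist = np.bincount(img.ravel(), minlength=L)     # per-level pixel counts
    cdf = np.cumsum(hist)                            # cumulative distribution function
    cdf_min = cdf[cdf > 0].min()                     # smallest non-zero CDF value
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * (L - 1))
    return lut.astype(np.uint8)[img]                 # map every pixel through the lookup table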
5. Database analysis phase: After image enhancement, each fingerprint is analyzed to identify distinctive features, such as patterns and key regions within the fingerprint. This helps prepare the data for the model to understand the unique elements in each fingerprint. The goal of this process is to efficiently analyze the fingerprint database to extract the necessary information for feeding two different models. This is achieved by parsing the file name to extract the individual's identity, finger type, and hand (right or left). Additionally, these steps prepare the data for the two models, allowing the first model to recognize the individual's identity, while the second model identifies the finger type based on fingerprint information. After applying these processes to the database, the data is divided into training and testing sets. Artificial fingerprint data is used for training, while original data is used for testing.

6. Model building phase: After preparing the database, two models are built using CNNs. The first model aims to identify the person's identity (SubjectID), while the second model aims to determine the finger number (FingerNum) and extract distinctive features of the finger. Each model consists of the layers shown in Table 2. The hyperparameters used in the models are illustrated in Table 3.

Table 3: CNN hyperparameters configuration

7. Training and evaluation phase of the two models: The performance of both models is evaluated using standard performance metrics such as accuracy, validation, and error rate calculation. This ensures that the first model is capable of accurately verifying the identity of authorized individuals when requesting the encryption key. Similarly, the second model's performance is assessed to determine its ability to correctly identify the fingerprint belonging to the individual whose identity has been verified. This evaluation is done using the test set.

8. Identity verification and key generation: The identity of the individual and the fingerprint match with the registered name are verified using two deep learning models. This is a key step in generating the encryption key from fingerprints, as illustrated in Figure 4.

9. Key optimization stage using PSO: To enhance the quality of the initial key and obtain a stronger, more secure key, the PSO algorithm is applied. This algorithm aims to improve the random distribution and security properties of the key. The goal of this algorithm is to increase the randomness of the key and ensure its difficulty in being guessed or broken. The use of the PSO algorithm to optimize encryption keys relies on updating the positions and velocities of particles based on the individual's fingerprints, as illustrated in Figure 1. This continuous update of the keys, leveraging the best personal and global positions, results in generating an encryption key that is more secure and complex. This also raises the question: "How does the proposed system perform against statistical attacks?" This approach aims to reduce the likelihood of the keys being exposed to any repetitive patterns that could be exploited in statistical attacks. Table 4 outlines the hyperparameters used in the optimization algorithm, selected based on a series of experimental trials.
Pseudo-code for verification and encryption key generation

1. Initialize finger name function:
   o Define show_fingername(fingernum):
     • If fingernum >= 5: set hand = "right" and subtract 5 from fingernum.
     • Otherwise: set hand = "left".
     • Map fingernum to finger names (e.g., little, ring, middle, index, thumb).
     • Return the full finger name (hand + finger).
2. Verify fingerprint information:
   o Predict the subject ID and finger number for a random fingerprint (rand_fp_num) from the test set using the models:
     • Id_pred = predicted subject ID.
     • Id_real = actual subject ID.
     • fingerNum_pred = predicted finger number.
     • fingerNum_real = actual finger number.
   o Check predictions:
     • If both IDs and finger numbers match: print "Information confirmed" with the subject ID and call show_fingername(fingerNum_pred) to get the finger name.
     • Otherwise: print "Prediction is wrong."
3. Extract candidate fingerprints:
   o Initialize lists keys1 (for original fingerprints) and keys2 (for dense layer outputs).
   o For each index i in the prediction range:
     • Get Id_check = predicted subject ID.
     • If Id_check == Id_pred: append the fingerprint to keys1 and the dense layer output to keys2.
   o Convert keys1 and keys2 to arrays.
4. Select target fingerprint:
   o Use index p1 to select:
     • original_fp = keys1[p1].
     • dense_output_finger_selected = keys2[p1].
5. Apply data augmentation:
   o Define an image data generator (datagen) with transformations: rotation, width/height shift, shear, zoom, and horizontal flip.
   o Reshape original_fp to fit the generator's input format.
6. Generate augmented fingerprints and keys:
   o Use datagen to create 20 augmented fingerprints:
     • For each augmented fingerprint: generate a new fingerprint, predict the dense layer output, take absolute values of the output to create a key, and append the key to the keys list.
7. Return results:
   • Output Keys // to be used as input for the PSO algorithm to find the optimal key from the list of keys for use in encryption.

Figure 4: Pseudocode for the identity verification and key generation process

Table 4: Hyperparameters of PSO

10. Image encryption stage and sending the encrypted image

Chen's chaotic system is a three-dimensional dynamic system that exhibits chaotic behavior and is based on nonlinear differential equations to represent the evolution of the state over time. It can be used to generate a chaotic encryption key based on the system's state. The Chen chaotic system relies on the following equations that describe the changes in the variables x, y, and z:

dx/dt = a·(x − y)   (3)
dy/dt = (a − c)·x − x·z + c·y   (4)
dz/dt = x·y − b·z   (5)
Where:
• x, y, and z are the variables that determine the state of the chaotic system at time t.
• a, b, and c are the parameters that control the behavior of the system.

Steps followed:
• Initial conditions: The process starts by defining the initial values for x, y, and z, which represent the state of the system at the beginning of the simulation. These values are set in the code as [1.0, 1.0, 1.0].
• Numerical integration: The odeint function is used for numerical integration to solve the differential equations over time. Through this process, the values of x, y, and z are updated at each time step, based on the parameters a, b, and c that influence the system's behavior.
• Generating a chaotic sequence: A chaotic sequence is generated by solving the differential equations of the Chen chaotic system over multiple time steps. This sequence is then used to generate a chaotic encryption key.
• Encryption key generation: The resulting chaotic sequence is converted into integer values ranging from 0 to 255 to represent color values in an RGB image. This is done by multiplying each value in the sequence by 255 and converting it to the uint8 data type.
• Combining the chaotic key with the generated key: The chaotic key is combined with the key generated using CNN through an XOR operation. This step increases the complexity of the final key used for image encryption.
• Encrypting the image: The XOR operation is applied between the original image and the final key to generate the encrypted image. This operation transforms the pixel values in the image into new values based on the chaotic key (a minimal code sketch of these steps follows).
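The sketch below illustrates the chaotic-key generation, XOR combination, and encryption steps described above. The Chen parameter values (a = 35, b = 3, c = 28), the integration step, the min–max normalization before scaling to 0–255, and the sign convention a·(y − x) in the first equation are assumptions made so that the example is bounded and runnable; they are not values reported in the text.

import numpy as np
from scipy.integrate import odeint

def chaotic_key(n, a=35.0, b=3.0, c=28.0, dt=0.01):
    # Integrate a Chen-type system (cf. Eqs. 3-5) from the initial state [1.0, 1.0, 1.0].
    def deriv(state, t):
        x, y, z = state
        return [a * (y - x), (c - a) * x - x * z + c * y, x * y - b * z]
    t = np.linspace(0.0, n * dt, n)
    x = odeint(deriv, [1.0, 1.0, 1.0], t)[:, 0]
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)      # normalize to [0, 1]
    return (x * 255).astype(np.uint8)                    # chaotic key bytes

def encrypt_image(image, cnn_key):
    flat = np.asarray(image, dtype=np.uint8).ravel()
    final_key = np.bitwise_xor(chaotic_key(flat.size),   # combine chaotic and CNN-derived keys
                               np.resize(np.asarray(cnn_key, dtype=np.uint8), flat.size))
    cipher = np.bitwise_xor(flat, final_key)             # XOR encryption; XOR again with the same key to decrypt
    return cipher.reshape(np.shape(image)), final_key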
generations. Additionally, the graphs in Figures 5, 6,
and 7 show the evolution of the models' performance
Part three: key generation using CNN and image over time, highlighting the models' ability to learn and
decryption: improve progressively. Where the false positive rate
was 0.000185.
This part is similar to the stages in Part 2, with the
only difference being that the models are not built and
Table 5: Model performance comparison: accuracy and
trained again; instead, the previously saved models
loss.
are loaded. Additionally, there is a decryption stage
instead of encryption. The stages in this part are as
follows:
• Data loading phase: Only the test set is loaded
(i.e., 600 genuine fingerprints from SOCOFing).
• Data preprocessing phase: The raw data is
processed to prepare it for the next stage.
• Database analysis phase: The data is analyzed
to extract the necessary information.
• Loading the saved models: The previously
trained models are loaded.
• Verification and key generation: The data is
verified, and the key is generated.
Figure 5: Accuracy and loss of the identity model.
Table 6: Classification report for finger recognition
Figure 6: Accuracy and loss of the fingerprint model.
Table 7: Classification report for subjectID recognition
Figure 7: Confusion matrix and accuracy metric

B. Encryption key evaluation results and metrics: The generated encryption key was evaluated using a set of specialized metrics to ensure its quality and effectiveness in resisting cyberattacks. The experiments were conducted using fingerprint images sized 96 × 96 in a Kaggle environment with Python, on a workstation equipped with an Intel(R) Xeon(R) processor, 64 GB of RAM, and a P100 GPU. The metrics used included evaluations such as key size and various randomization tests (such as the entropy test, repetition test, etc.) to assess the randomness of the key and its predictability. These tests help ensure that the system remains unaffected when used in live applications.

6.1 Key space analysis

A brute-force attack is a type of cyber-attack that relies on guessing the key by attempting a large number of possible passwords or secret phrases. An encrypted image with a short key is highly vulnerable to this attack over time. However, if the key is longer, it will remain resistant for a longer period. Therefore, it becomes practically impossible to guess the key if it has an adequate length. Key space analysis is used to assess the strength against brute-force attacks. According to this analysis, a key with a length greater than 2^100 is considered suitable for high-security encryption [26]. In our system, we propose an approach based on deep neural networks (CNN) and PSO to generate this key.
The key has a size of 1024 values, with each value ranging between 0 and 255. This means the key consists of 1024 bytes (since each value requires one byte, and 8 bits are enough to represent values from 0 to 255). Given that each value in the key ranges from 0 to 255, we have 256 possibilities for each value. With 1024 values, the total key space will be 256^1024 or, in other words, 2^(8×1024) = 2^8192. This represents an extremely large key space, which is sufficiently large to be highly resistant to brute-force attacks. A key size of 2^8192 offers a very high level of security, making it practically impossible to crack using brute-force methods, even with fast computing devices (a short numerical check of this figure follows the comparisons below). Nonetheless, the question remains: how does the proposed system perform against brute-force attacks?

Comparison of key space (2^8192) with traditional systems

• Comparison with AES-256: The proposed key space (2^8192) is significantly larger compared to AES-256, where the key space is approximately 2^256. This substantial difference makes our key space more resistant to brute-force attacks. Traditional systems like AES-256 rely on efficient algorithms to compensate for the smaller key space compared to the vast proposed space.

• Comparison with RSA-2048: The proposed key space is also significantly larger compared to RSA-2048, where the key space is approximately 2^2048. RSA relies on computational complexity for large numerical factorization, whereas in our system, the security strength depends on the key length derived from biometric features processed through deep networks.

• Comparison with ECC-384 (Elliptic curve cryptography): The traditional key space for ECC-384 is approximately 2^384, which is much smaller compared to our proposed key space (2^8192). ECC relies on elliptic curves to compensate for shorter keys, but in contrast, we provide much longer keys derived from neural networks, enhancing their unpredictability.

• Comparison with DES (Data encryption standard): The key space in DES is 2^56, which is extremely small compared to our proposed key space. DES is considered outdated and vulnerable to brute-force attacks, whereas our proposed key space vastly surpasses it in terms of length and complexity.

6.2 Significance of results in cryptographic key management

• The results, such as randomness tests and high entropy, demonstrate that the generated key exhibits a high degree of randomness, making it ideal for high-security applications.
• High entropy indicates that the keys have a uniform distribution of values, reducing the likelihood of predicting any part of the key, which is a critical feature in key management.

6.3 Encryption key tests

In this study, six fingerprint samples were used as the basis to generate six encryption keys. Each key underwent comprehensive testing using eight different metrics to determine the quality and randomness of the generated keys. The results of these eight tests were systematically presented in a table, reflecting the effectiveness of the proposed method. The results showed the success of the keys in all eight tests, confirming that the keys generated from the fingerprints meet the required security standards. These tests demonstrate the randomness and unpredictability of the keys, making the approach suitable for secure encryption applications. The core encryption tests include the following:

• Entropy test: The entropy measures the distribution of information in the key and reflects the level of randomness. The entropy is calculated using the following equation (6):

H(X) = −Σ_{i=1}^{n} p(x_i) · log2 p(x_i)   (6)

where p(x_i) is the probability distribution of the value x_i in the key. If the entropy equals 8 bits, it means the key is completely random [26].

Table 8: Results of the entropy test

• Repetition test: The repetition test generally aims to ensure that the key does not contain any repeated sections within its sequence, whether these sections are adjacent or non-adjacent. If parts of the key are repeated, it weakens the randomness and increases the likelihood of discovering a pattern that can be exploited in an attack. This test involves checking all parts of the key to detect any repetition that might impact its security level. The repetition test addresses repetition in the key overall, whether in adjacent or non-adjacent parts [27].
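The key-space arithmetic above can be checked directly in Python, since its arbitrary-precision integers make the comparison exact:

key_bytes = 1024
key_space = 256 ** key_bytes                 # one byte (256 possibilities) per value
assert key_space == 2 ** (8 * key_bytes)     # 256^1024 = 2^8192
# 8192 bits, versus 256 (AES-256), 2048 (RSA-2048), 384 (ECC-384), and 56 (DES)
print(key_space.bit_length() - 1)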
Table 9: Results of the repetition test

• Uniformity test using the chi-squared test: This test aims to check whether the values in the encryption key are evenly distributed across the full range of possible values. The chi-squared test is used to compare the actual distribution of values in the key with the expected ideal distribution. If the values are evenly distributed, the key is considered to have a uniform distribution. Equation (7) illustrates the test:

χ² = Σ_{i=0}^{255} (n_i − n/256)² / (n/256)   (7)

where:
• n_i: the frequency of occurrence of value i in the key.
• n/256: the expected frequency for each value i assuming a uniform distribution.
• n: the total number of values in the key.
• If the chi-square (χ²) value is low, it indicates that the actual distribution of values is close to the ideal distribution, meaning the key is evenly distributed.
• At a significance level of 0.05, if the chi-square value is less than 293.25, the key is considered to have passed the test and has a uniform distribution [28].

Table 10: Results of the uniformity test

• Repetition test (adjacent): This test focuses specifically on identifying repetition in adjacent parts of the key. It checks for any repeated consecutive or sequential sections that might indicate a fixed pattern or excessive repetition, which could weaken the effectiveness of encryption. Repetition of adjacent parts is considered a sign of poor randomness, thus reducing the strength of the key. The closer the value is to 0, the less repetition there is, which means the key has a higher level of randomness [29].

Table 11: Results of the repetition test (adjacent)

• Pearson correlation test: This is a statistical test used to measure the relationship between two variables. The relationship is expressed by a coefficient called the "Pearson correlation coefficient," which ranges from -1 to 1. If the correlation coefficient is close to 0, it indicates no correlation (high randomness), making the encryption key strong and hard to predict. The purpose is to determine the extent of the correlation between values in the encryption key. If the correlation coefficient is close to 0, it indicates that the key is sufficiently random, thus making it strong against analytical attacks. Equation (8) represents the Pearson correlation:

r = Σ(Xi − X̄)(Yi − Ȳ) / √(Σ(Xi − X̄)² · Σ(Yi − Ȳ)²)   (8)

where:
• r: Pearson correlation coefficient.
• Xi: individual values in the first series.
• Yi: individual values in the second series (e.g., lagged values in time series).
• X̄: mean of the Xi values.
• Ȳ: mean of the Yi values [30].

Table 12: Results of the Pearson correlation test

• Stability test: The key must remain stable if the input data is stable. This means that if the same inputs are used to generate the key multiple times, the resulting key should always be identical. However, slight changes in the inputs should result in a significant change in the key, which enhances encryption strength against attacks.
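The uniformity check of Equation (7) and the correlation check of Equation (8) can be sketched with SciPy as follows; the 293.25 threshold is the chi-squared critical value quoted above for the 0.05 significance level.

import numpy as np
from scipy.stats import chisquare, pearsonr

def uniformity_test(key, critical_value=293.25):
    # Equation (7): compare observed byte frequencies with the expected n/256.
    observed = np.bincount(np.asarray(key, dtype=np.uint8), minlength=256)
    chi2, _ = chisquare(observed)            # default expected frequencies are uniform
    return chi2, chi2 < critical_value       # pass if below the 0.05 critical value

def pearson_correlation(key, lag=1):
    # Equation (8): correlation between the key and a lagged copy of itself.
    x = np.asarray(key, dtype=float)
    r, _ = pearsonr(x[:-lag], x[lag:])
    return r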
6.3 Consistency of fixed inputs

o If the input I is fixed, the encryption system should produce the same key K every time: F(I) = K.
o Repeat key generation multiple times using the same I, and the result should be a consistent K in all attempts: K1 = K2 = ⋯ = Kn.

Table 13: Results of the stability test (1)

6.4 Sensitivity to minor changes (inclusivity effect)

We make a slight change in the input I to create I′. A new key K′ is generated using I′: F(I′) = K′. We measure the difference between K and K′ using the bit change rate:

Bit change rate = (bit difference between K and K′ / 1024) × 100%

The change rate should be higher than 50% to ensure the system's sensitivity to changes [31].

Table 14: Results of the stability test (1)
Table 15: Encryption results with the original key and the modified key

6.5 Range test

The range test aims to evaluate the distribution of encryption key values within a specific range to ensure its randomness.

Steps of the range test:
1. Calculate the range: determine the difference between the maximum value max and the minimum value min: Range = max − min.
2. Range splitting: divide the range into buckets.
3. Frequency calculation: count the values in each bucket.
4. Distribution analysis: if the frequencies are approximately equal, the key is considered random. The expected frequency is given by equation (9):

E_i = N/M   (9)

where N is the total number of values, and M is the number of buckets [27] (a short sketch of this test follows).
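A small sketch of the range test; the number of buckets M is not specified in the text, so it is left as a parameter here.

import numpy as np

def range_test(key, num_buckets=16):
    # Steps 1-4: compute the range, split it into buckets, and count values per bucket.
    values = np.asarray(key, dtype=float)
    counts, _ = np.histogram(values, bins=num_buckets, range=(values.min(), values.max()))
    expected = values.size / num_buckets      # Equation (9): E_i = N / M
    return counts, expected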
Table 16: Results of the range test
Table 17: Results of the autocorrelation test

• Autocorrelation test: The autocorrelation test is used to determine the randomness of a sequence of values in an encryption key. If the key is sufficiently random, the autocorrelation values should be small or close to zero, indicating no clear pattern or dependency in the sequence. To calculate the autocorrelation at a lag d, equation (10) is used:

R(d) = (1 / (n − d)) · Σ_{i=1}^{n−d} (x_i · x_{d+i})   (10)

where:
• R(d) is the autocorrelation coefficient for lag d.
• x_i is the value at position i in the sequence.
• x_{d+i} is the value at position d+i in the sequence.
• n is the length of the sequence.

A value of R(d) close to zero for different values of d indicates a high level of randomness in the encryption key. Figure 8 shows the distribution of autocorrelation test results, highlighting successful and failed values based on the specified critical value (0.05) [28].

Figure 8: Distribution of autocorrelation test results with success and failure indication based on critical value
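Equation (10) as printed can be evaluated directly; note that in practice the sequence is often mean-centered and normalized before judging closeness to zero against the 0.05 critical value.

import numpy as np

def autocorrelation(key, d):
    # Equation (10): R(d) = (1 / (n - d)) * sum_{i=1..n-d} x_i * x_{d+i}
    x = np.asarray(key, dtype=float)
    n = x.size
    return float(np.sum(x[:n - d] * x[d:]) / (n - d))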
C. Results of using PSO in enhancing the encryption key

After completing the specified number of iterations, the best encryption key is obtained, which is the key that achieved the highest fitness during the optimization process. Table 18 illustrates the effect of using PSO on the generated encryption key.

Table 18: The impact of the PSO algorithm in improving the encryption key.

D. Comparison of the accuracy of the proposed system with other systems

This section evaluates the accuracy of our proposed system in comparison to other systems reported in recent years, based on their respective sources. Experimental results from our proposed model demonstrated an accuracy exceeding 99%. Table 19 presents a detailed comparison between our system and other existing systems.
Table 19: Accuracy comparison between our system and recent approaches.

The aim of this comparison is to evaluate the effectiveness of the proposed system in the context of recent advancements in deep learning technology, providing insight into how cybersecurity can be enhanced through the application of advanced techniques. The table also reflects the ongoing progress in fingerprint data processing, showing that modern systems achieve higher accuracy than traditional systems, supporting the idea that using deep learning can improve the effectiveness and security of encryption systems.

7 Discussion

In this section, we discuss the results of the proposed system in comparison to the modern methods presented in Table 19, focusing on accuracy, randomness tests (such as entropy), and robustness. Additionally, we address potential trade-offs associated with using CNNs, such as computational overhead. Below is a detailed comparison of key results.

7.1 Comparison of results

Accuracy: The proposed system (using CNN and PSO) achieved an accuracy between 99.73% and 99.83% within just 20 epochs, outperforming most models in the table. For example, the enhanced VGG-16 model achieved 99.98% accuracy in 100 epochs, the highest in the table, but required five times more epochs than the proposed system. The Modified-LeNet model achieved 99.10% accuracy in 55 epochs, which is lower than the proposed system. The DeepFKTNet model achieved 98.89% accuracy in 60 epochs. Thus, the proposed system stands out as a strong option, delivering high accuracy in less training time, thanks to the combination of CNN and PSO, which enhances feature extraction and generates robust keys.

Randomness tests (e.g., entropy): The use of PSO in the proposed system significantly contributed to enhancing randomness, which strengthens the generated keys. When comparing randomness tests (e.g., entropy) with other models, the proposed system showed remarkable superiority. The combination of CNN and PSO enabled the generation of keys with excellent randomness levels, providing a higher degree of security compared to traditional models. PSO helps optimize the quality of the keys by searching for the optimal combination of hidden parameters, making them more random and harder to break.

It is important to note that the model only retains the predictions generated during its operation, which are values devoid of any sensitive information. This enhances the system's security against various types of attacks, such as mixed replacement attacks, crossover attacks, and exhaustive search attacks. In such attacks, the attacker has no knowledge of the key generation mechanism or the supporting data, making the number of attempts required to crack the key increase proportionally with its length. For example, if the key length is 1024 bytes, the number of possible combinations would reach 2^8192. The proposed system focuses on enhancing data security by avoiding key storage, improving the randomness of key generation, and protecting sensitive information from various attacks, while ensuring high efficiency in user fingerprint recognition.

Robustness (biometric key as encryption key for security): The biometric key is generated based on the parameters learned during the training of the CNN model. During training, the model learns unique representations or features extracted from fingerprints. These representations are numerical weights that are not easily interpretable. The parameters are converted into an encryption key that relies on the unique properties of each fingerprint, making the key:
• Unique and tamper-proof.
• More secure and resistant to duplication.

Role of PSO in key enhancement: PSO improves the key by identifying the optimal values of the parameters used in key generation. This enhances randomness and independence among keys, making them more resistant to attacks.
7.2 Advantages of the proposed system

The proposed system combines CNN and PSO to achieve:
• High classification accuracy in less time.
• High-quality encryption keys with excellent levels of randomness and security.
• Strong protection of users' biometric data against exploitation or breaches.
• Improved biometric key performance using PSO to generate stronger and more random keys, increasing the system's resilience to cyber threats. Thus, the proposed system leverages the multiple features of CNN and PSO, making it more robust in addressing security challenges such as resistance to adversarial attacks. While other models primarily focus on classification and accuracy, the proposed system demonstrates additional strength in encryption applications.

Potential trade-offs: Although the use of CNN in the proposed system results in a slight increase in computational overhead compared to simpler models like the modified LeNet, this does not pose a significant obstacle. The model is designed to operate efficiently on modern systems supported by Graphics Processing Units (GPUs), ensuring accelerated training and reduced execution time.

8 Conclusions

• The results of this research show that integrating biometric techniques with deep learning provides an innovative and effective solution for generating secure and robust encryption keys based on fingerprints. The proposed system enhances the security of data transmitted over the internet, making it more resistant to theft and tampering. The use of two convolutional neural network models is a significant step, where the first model contributes to identity recognition and the second focuses on fingerprint detail recognition, ensuring the extraction of unique and reliable biometric features.

• One of the main conclusions of this research is that the tanh activation function plays a crucial role in neural networks for generating encryption keys. This function is known for its ability to transform outputs into the range of (-1, 1) non-linearly, which contributes to improving the quality of the generated keys. Increased complexity and randomness: the tanh function ensures a more balanced distribution of values across the range (-1, 1), reducing value concentration and enhancing the randomness of the key, leading to the generation of secure and robust encryption keys. Better stability during training: the tanh function helps avoid issues such as vanishing gradients, resulting in better stability during the training process and improved model performance in generating encryption keys. Table 20 illustrates the key strength (entropy measure) when using the Tanh activation function compared to using the ReLU activation function.

Table 20: Comparison of the key strength (entropy) of the Tanh and ReLU activation functions.

The batch normalization layer plays a significant role in stabilizing and accelerating the learning process in deep models by normalizing the outputs to have a mean of 0 and a standard deviation of 1. While this stabilization is beneficial in many applications, such as image classification, it may negatively impact the strength of the generated encryption key.

• The results indicate that the generated keys exhibit high levels of randomness, making them more challenging to breach. Additionally, the use of the PSO algorithm is considered an effective technique for enhancing the randomness of the keys, as it allows for generating different keys for each transmission, thereby reducing the risk of key theft and increasing security. A comprehensive analysis of the performance of the models used in this research was conducted, showing a significant improvement in encryption effectiveness and the reliability of the generated keys, underscoring the efficiency of these models in the context of cybersecurity.

• The proposed approach enhances system security compared to traditional systems by reducing reliance on static keys, which are a vulnerability in many encryption systems. Instead, biometric verification is used to generate unique keys for each user based on their fingerprints, thereby increasing the level of security. This research provides a significant contribution to systems that require high levels of protection, such as financial systems and medical data, by facilitating biometric verification for encryption without the need to exchange keys, thereby reducing associated risks. Additionally, the automatic key change feature adds an extra layer of security,
reflecting the effectiveness of this system in providing advanced protection. Ultimately, the research highlights the importance of integrating biometrics and deep learning in developing effective security solutions that address contemporary challenges in data protection.

• The method presented in the research has wide potential for application in various fields. In addition to securing fingerprints and using them to generate encryption keys, the method can be applied to secure Internet of Things (IoT) devices by generating strong encryption keys that protect communication between devices. It can also be used to secure data stored in the cloud by generating high-security encryption keys based on unique user attributes, such as fingerprints. These applications highlight the flexibility and efficiency of the method in addressing modern cybersecurity challenges and enhance its appeal in various practical scenarios.
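To make the qualitative entropy argument behind the Tanh/ReLU comparison concrete, the following is a minimal, illustrative sketch (not the authors' code and not a reproduction of Table 20): it quantizes the two activations' outputs for the same random pre-activations and estimates the Shannon entropy of the resulting value distribution, showing how ReLU's concentration of values at zero lowers entropy relative to tanh.

```python
# Illustrative entropy comparison for tanh vs. ReLU outputs (assumed inputs).
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.normal(0.0, 1.0, size=100_000)  # assumed feature values

def quantized_entropy(values: np.ndarray, n_levels: int = 16) -> float:
    """Shannon entropy (bits) of the value distribution over n_levels bins."""
    hist, _ = np.histogram(values, bins=n_levels)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

tanh_entropy = quantized_entropy(np.tanh(pre_activations))        # spread over (-1, 1)
relu_entropy = quantized_entropy(np.maximum(pre_activations, 0))  # mass piled at 0

print(f"tanh output entropy : {tanh_entropy:.3f} bits (max {np.log2(16):.0f})")
print(f"ReLU output entropy : {relu_entropy:.3f} bits")
```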
https://doi.org/10.31449/inf.v49i16.9490 Informatica 49 (2025) 235–248 235
CNN and LSTM-Based Multimodal Data Fusion for Performance
Optimization in Aerobics Using Wearable Sensors
Danhua Tan
School of Physical Education, Hengyang Normal University, Hengyang, Hunan, 421006, China
E-mail: tandanhua184914@outlook.com
Keywords: wearable sensors, convolutional neural network, long short-term memory, Kalman filtering, aerobics movements
Received: May 31, 2025
Aerobics is a high-intensity, multi-dimensional sport. Its motion evaluation places higher demands on
data quality and time series modeling capabilities. This paper proposes a method for evaluating aerobics
motion that integrates wearable sensors and motion tracking systems. It combines convolutional neural
networks (CNNs) with long short-term memory networks (LSTMs) to perform fusion analysis on
multimodal data from accelerometers, gyroscopes, magnetometers, and Kinect motion capture systems.
To improve data quality, Kalman filtering, time synchronization, and wavelet transform techniques are
introduced to preprocess the raw data. Experimental results show that this method performs well in motion
classification tasks: in indoor low-intensity training scenarios, the accuracy of the CNN model increases
from 74.5% to 87.1%; in high-intensity training scenarios, the accuracy increases from 75.0% to 88.2%.
After combining with LSTM, the model further enhances the modeling capabilities of motion time series
features and improves the recognition accuracy of complex motions. In different training scenarios, the
average improvement rate of motion scores is 25.8%. The system feedback delay is controlled within 200
milliseconds, with good real-time and practical performance. This method provides aerobics athletes with
high-precision movement assessment and personalized training suggestions, promoting the intelligent and
personalized development of sports training.
Povzetek: Metoda združuje senzorje, CNN in LSTM za multimodalno analizo aerobičnih gibov.
Kalmanovo filtriranje izboljša kakovost signalov, klasifikacijska točnost naraste do 88,2 %, povprečno
izboljšanje rezultatov znaša 25,8 %, odzivnost sistema pa ostane pod 200 ms.
1 Introduction

With the popularity of aerobics, the accuracy of movements and training effects have become the focus of coaches and athletes. During high-intensity and complex exercise, the movements of aerobics athletes can be affected by factors such as physical exertion, sports skills, and external environment, resulting in unstable movement performance. Traditional manual evaluation methods are inefficient and subjective, and cannot provide athletes with accurate training feedback in real-time. With the development of sensor technology [1] and artificial intelligence [2], [3], motion evaluation methods based on wearable devices [4] and intelligent feedback systems have become a research hotspot. Such feedback systems can provide accurate real-time data analysis, optimize training programs, and improve athlete performance. Therefore, developing a motion evaluation and optimization system based on intelligent technology [5], [6] has become the key to improving training effects and athlete performance.

This paper studies the motion evaluation and optimization system based on intelligent technology to improve the training effect and motion performance of aerobics athletes. To achieve this goal, this paper combines wearable sensors with motion tracking systems and uses CNN models and LSTM to fuse and analyze multimodal data. The system acquires motion data through sensors such as accelerometers, gyroscopes, magnetometers, and Kinect motion capture systems. It uses Kalman filtering, time synchronization, and wavelet transform to optimize data quality. The optimized data is used through the CNN model to evaluate and optimize motion performance, providing real-time feedback and personalized training suggestions. The CNN-based optimization method combines wearable sensor technology with deep learning (DL) algorithms to improve the accuracy and stability of motion evaluation. Experimental results show that the combination of Kalman filtering and CNN models effectively improves the accuracy and stability of aerobics motion evaluation, providing strong support for the intelligent and precise development of sports training.

Current research mostly uses weighted averaging or simple concatenation, and the model structure is fixed, without optimizing for the temporal characteristics and complex action patterns of sports data. This paper combines wearable sensors with aerobics tracking and uses a model based on CNN and LSTM to achieve performance optimization. The main contributions of this
study include: 1) wavelet transform is combined with principal component analysis (PCA) to extract time-frequency features, and a dynamic weighted fusion strategy is adopted to improve the robustness of data fusion; 2) small convolutional kernels are introduced into the CNN to capture action details and combined with a double-layer LSTM to model long-term dependencies, enhancing the model's ability to recognize complex action sequences; 3) based on the model output, an action scoring function and error correction mechanism are constructed to provide athletes with immediate feedback and personalized training suggestions, improving the model's generalization ability in different training scenarios through data augmentation and adaptive filtering techniques.

2 Related work

In recent years, many scholars have been committed to improving the accuracy of athletes' motion evaluation through different technical means. Traditional motion capture systems [7] rely too much on calibration equipment and high-cost hardware settings. Although such capture systems can capture the movements of athletes, they suffer from problems such as poor real-time performance, high data noise, and inconvenient operation when evaluating high-intensity sports or complex movements. To improve the quality of sports data, many studies have attempted to use wearable sensors for motion tracking. Rigozzi C J et al. used data from sensors such as accelerometers, gyroscopes, and magnetometers to monitor athletes' body posture and motion trajectory [8]. Sensor data is easily affected by noise, environmental changes, and wear position deviation, resulting in inaccurate data. To reduce noise interference, Zhang Y applied Kalman filtering technology to the preprocessing of sensor data [9]. As technology matures, DL technology [10], especially CNNs, has been applied to multimodal data analysis and action recognition by Gholamiangonabadi D [11], and has achieved certain results. Existing research still faces problems such as how to combine multiple data sources, optimize data processing processes, and provide real-time feedback in actual training scenarios.

To solve the above problems, some researchers have proposed a hybrid method that combines sensor data and DL algorithms to improve the accuracy and real-time performance of action recognition. Chakraborty A used a CNN-based multimodal data fusion method to improve the accuracy and robustness of athlete action recognition by combining accelerometer, gyroscope, and visual data [12]. In this study, data fusion technology [13] effectively reduces sensor errors and enhances the system's adaptability to complex actions. Zhang L proposed a KCF (Kernelized Correlation Filters) tracking method based on improved depth information [14], which successfully used Kalman filtering to reduce the noise of motion sensors and improve the stability of motion estimation. Although these methods have achieved good results to a certain extent, most of them focus on a single motion estimation task, and their effects in complex training environments still need to be improved. Existing methods also have shortcomings in terms of personalized training feedback [15] and the generation of real-time optimization suggestions [16]. Therefore, how to comprehensively utilize multimodal data and combine DL with real-time optimization feedback systems is still a major challenge in current research.

3 Data fusion and movement performance optimization

3.1 Data collection and preprocessing

The study combines wearable sensors with motion tracking systems to design an efficient data collection and preprocessing solution. The key to the entire process is to synchronously collect data from multiple sources and eliminate errors, providing a reliable basis for subsequent analysis.

Wearable devices collect data in real-time through built-in accelerometers [17], gyroscopes [18], and magnetometers [19]. The accelerometer records the athlete's acceleration changes in three-dimensional space; the gyroscope measures the athlete's rotational angular velocity; the magnetometer helps correct the direction of movement. A multi-sensor system can accurately capture every movement of an athlete and generate rich time series data [20]. The sensor data fusion equation is:

f(t) = α·a(t) + β·ω(t) + γ·m(t)    (1)

a(t) is acceleration data; ω(t) is angular velocity data; m(t) is magnetic field data; α, β, γ are weighting parameters. Figure 1 is a data acquisition flow chart.
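As a small illustration of the weighted fusion in Eq. (1), the sketch below applies per-sample weighting to three 50 Hz sensor streams; the weight values and the synthetic signals are illustrative assumptions, not the study's settings.

```python
# Minimal sketch of Eq. (1): f(t) = alpha*a(t) + beta*w(t) + gamma*m(t).
import numpy as np

def fuse(a: np.ndarray, w: np.ndarray, m: np.ndarray,
         alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2) -> np.ndarray:
    """Weighted combination of acceleration, angular velocity and magnetic field."""
    return alpha * a + beta * w + gamma * m

# 2 seconds of 50 Hz data, three axes per sensor (shape: samples x 3), synthetic.
n = 100
a = np.random.default_rng(1).normal(size=(n, 3))   # accelerometer a(t)
w = np.random.default_rng(2).normal(size=(n, 3))   # gyroscope omega(t)
m = np.random.default_rng(3).normal(size=(n, 3))   # magnetometer m(t)

fused = fuse(a, w, m)
print(fused.shape)   # (100, 3): fused signal f(t), one row per 20 ms sample
```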
Figure 1: Data collection flow chart (sensor data collection of acceleration, angular velocity and angle; data calibration and cleaning; data fusion and feature extraction; lower limb motion reconstruction).
Figure 1 shows the complete process from data collection to motion analysis. The motion data is obtained through wearable sensors and motion capture systems; key features are extracted through data cleaning and fusion; the CNN model is used to optimize the motion performance evaluation, ultimately providing scientific motion optimization and training suggestions for aerobics athletes. To ensure the accuracy of the data, all collected sensor data are transmitted to the background data processing system in real-time via Bluetooth or Wi-Fi modules [21] to ensure real-time and efficient data processing. By adopting wireless data transmission [22], the data transmission is not affected by the physical distance, ensuring that the data is updated and recorded in time when the athletes perform complex movements. The Bluetooth signal quality function is:

Q = 1 / (1 + exp(−k(S − S_0)))    (2)

S is the signal strength; S_0 is the signal threshold; k is the Bluetooth signal adjustment parameter. To deal with data anomalies, the Kalman filter [23] is used to smooth the data of accelerometers and gyroscopes. The Kalman filter can dynamically predict the true value of the signal, optimize the measurement noise, and improve the accuracy of the data. The Kalman filter update formula is:

x_{k|k} = x_{k|k−1} + K_k (z_k − H·x_{k|k−1})    (3)

K_k is the Kalman gain, and z_k is the observed value.

In addition to multi-sensor equipment, the motion tracking system Kinect [24] and depth camera [25] are also introduced to obtain the spatial position information of the key parts of the athletes. The system captures the athlete's action posture through 3D coordinates and records the spatial coordinates of joints such as shoulders, elbows, and knees, as well as their dynamic trajectories over time. Through the calibration algorithm, combined with the position information of the sensor and the motion tracking system, the effects caused by the wearer position offset or motion capture error are corrected. The motion trajectory smoothing formula is:

p(t) = (1/N) Σ_{i=1}^{N} p_i(t)    (4)

p_i(t) is the spatial position of different sampling points. The system synchronizes the data of sensors and motion tracking systems to ensure that the sensor data and motion data at each moment can correspond correctly. After time synchronization, the data can be smoothly input into the subsequent data processing and analysis. The time synchronization function is:

ΔT = T_sensor − T_camera    (5)

T_sensor and T_camera are the timestamps of the sensor and camera, respectively. Table 1 is the motion capture key point coordinate data table.
Table 1: Motion capture key point coordinate data table.

Timestamp (ms) | Shoulder X (cm) | Shoulder Y (cm) | Shoulder Z (cm) | Knee X (cm) | Knee Y (cm) | Knee Z (cm)
0  | 12.3 | 45.6 | 78.2 | 8.9 | 30.2 | 50.7
10 | 12.1 | 45.5 | 78.3 | 9.0 | 30.3 | 50.9
20 | 12.2 | 45.7 | 78.1 | 9.1 | 30.1 | 50.8
30 | 12.4 | 45.8 | 78.2 | 9.2 | 30.4 | 51.0
40 | 12.3 | 45.9 | 78.3 | 9.3 | 30.5 | 50.9
50 | 12.5 | 46.0 | 78.1 | 9.4 | 30.6 | 51.1
60 | 12.6 | 46.1 | 78.2 | 9.5 | 30.7 | 51.2
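The per-joint smoothing in Eq. (4) and the timestamp alignment in Eq. (5) can be sketched on keypoint rows like those in Table 1; the window size and the example offset below are illustrative assumptions, not values from the study.

```python
# Minimal sketch of Eq. (4) (keypoint smoothing) and Eq. (5) (time alignment).
import numpy as np

timestamps_ms = np.array([0, 10, 20, 30, 40, 50, 60])
shoulder_x_cm = np.array([12.3, 12.1, 12.2, 12.4, 12.3, 12.5, 12.6])

def smooth(p: np.ndarray, n: int = 3) -> np.ndarray:
    """Eq. (4): p(t) = (1/N) * sum_i p_i(t), here a centred moving average."""
    kernel = np.ones(n) / n
    return np.convolve(p, kernel, mode="same")

def align_offset(t_sensor_ms: float, t_camera_ms: float) -> float:
    """Eq. (5): dT = T_sensor - T_camera, used to shift one stream onto the other."""
    return t_sensor_ms - t_camera_ms

print(smooth(shoulder_x_cm))                               # smoothed shoulder X trajectory
print(align_offset(t_sensor_ms=40.0, t_camera_ms=38.5))    # 1.5 ms offset (assumed)
```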
Table 1 records the three-dimensional spatial position data (X, Y, Z, in centimeters) of the shoulder and knee at different timestamps (in milliseconds). The data can be used to analyze the movement trajectory and change trend of the shoulder and knee in space.

3.2 Multimodal data fusion

After data collection and preprocessing, the multimodal data from wearable sensors and motion tracking systems are effectively fused. Different data sources provide different perspectives on the athlete's movements. Wearable sensor data provides time series information such as acceleration and angular velocity, and the motion tracking system provides spatial information such as joint position and motion trajectory. The effective integration of this information can help comprehensively evaluate the athlete's performance and provide accurate data input for the DL model.

Time synchronization [26] is a prerequisite for ensuring the effective integration of multimodal data. When collecting sensor data and motion tracking data, the data needs to be accurately aligned in time due to the different collection frequencies of the two [27]. To achieve data calibration, a timestamp is used to mark the acquisition time of each frame of data to ensure that each action frame can obtain corresponding sensor data and tracking data. After time synchronization, the sensor data at each moment is guaranteed to correspond perfectly with the action tracking data, providing a basis for data fusion. After synchronization correction, it can ensure that the action and sensor data at each moment correspond to each other, avoiding information loss caused by asynchrony [28].

The original sensor data contains rich time series information. Wavelet transform [29] is used to analyze the data in the time and frequency domains to extract motion features. The wavelet transform formula is:

W_ψ(a, b) = ∫_{−∞}^{∞} f(t) ψ*((t − b)/a) dt    (6)

ψ is the mother wavelet function. Wavelet transform can effectively capture the instantaneous changes in motion signals and use multi-scale analysis [30] to extract the time-frequency features of motion signals. It is integrated with the sensor data by calculating the spatial characteristics of the athlete's joint angle, motion trajectory, and speed. The formula for calculating the joint angle is:

θ = arccos((v1 · v2) / (‖v1‖·‖v2‖))    (7)

v1 and v2 are the vectors of two bones. The motion trajectory curve fitting formula is:

r(t) = a_0 + a_1·t + a_2·t² + ... + a_n·tⁿ    (8)

The spatial characteristics of the tracking system play a decisive role in the accuracy and coordination of the movements and are the core basis for evaluating the performance of athletes. Data fusion is the key to multimodal data processing. When fusion is performed, methods such as weighted fusion [31] and principal component analysis (PCA) [32] are used. Weighted fusion is to assign different weights to different data sources according to their signal-to-noise ratio and importance. When the noise of sensor data is large, its weight in fusion is reduced. On the contrary, if it is small, it means that the spatial data provided by the motion tracking system is relatively stable and can be assigned a higher weight. The weighted fusion formula is:

F_fused = ω1·F1 + ω2·F2    (9)

F1 and F2 are the features of different data sources, and ω1 and ω2 are weights. Through weighted fusion processing, the fused data can more realistically reflect the athlete's performance. In weighted fusion, the quality of the signal is the key to weight assignment, and the relevance and accuracy of the data determine the contribution of each data source.

Principal component analysis is used to reduce the dimensionality of the data, compressing the multi-dimensional raw data into fewer principal components, reducing the redundancy of the data, and extracting the most representative features. The PCA dimensionality reduction formula is:

X′ = XW,  W = arg max_W (WᵀΣW / WᵀDW)    (10)

Σ is the feature covariance matrix [33]; D is the weight matrix [34]; W is the optimized feature matrix. The analysis process reduces the computational complexity and retains the key information in the data, providing more efficient input for the training of DL models. Through the dimensionality reduction of PCA, redundant dimensions and noise can be eliminated, improving the efficiency and accuracy of subsequent analysis.

After fusion processing of multimodal data, data from different sources is integrated into a unified format, providing rich and accurate input features for subsequent action evaluation. The fused data contains both time series information and spatial position information, which can fully and accurately reflect the athlete's action performance in training [35]. Combined with efficient data synchronization, feature extraction, and fusion processing, sufficient high-quality data support is provided for the CNN, ensuring that the model can make full use of various types of information for accurate evaluation.

3.3 Action performance evaluation based on CNN and LSTM

This paper studies the evaluation of action performance of fused data based on CNN and LSTM. The CNN model has an advantage in processing time series data and spatial data, while LSTM is good at capturing time series dependencies, especially in capturing subtle differences in athletes' movements and automatically extracting features. The CNN model can more comprehensively evaluate the athletes' movement performance and achieve end-to-end automated processing from raw sensor and tracking data to final movement scoring and classification. Through LSTM, the model can understand the continuity between actions and evaluate actions based on the relationship
between the action sequence. When capturing complex motion patterns, LSTM can supplement the timing information that the CNN model fails to fully capture, providing a more detailed motion performance evaluation. In the motion performance evaluation task, the combination of LSTM and CNN enables the model to obtain more comprehensive feature extraction in both spatial and temporal dimensions. After CNN extracts spatiotemporal features, LSTM processes these features in time series, and the combination can more accurately evaluate the quality and type of actions. Figure 2 is a diagram of the CNN model structure.

Figure 2: CNN model structure diagram

Figure 2 shows how the spatiotemporal features of the athlete's movements are gradually extracted through the convolutional layer, pooling layer, and fully connected layer, and converted into a one-dimensional vector for classification and scoring through the flattening layer. The output layer evaluates the athlete's movement quality based on the features learned by the model, achieving automated and efficient movement quality recognition and feedback.

The CNN model adopts a five-layer structure, with input data dimensions of (T, F). Among them, T=100 represents the time step; F=9 represents the input feature dimension (including data from three axes each of accelerometer, gyroscope, and magnetometer); the output dimension is (C), where C=3 represents the action category (standard, insufficient, error); the activation function is Softmax. The LSTM model is used to model the long-term dependencies of time series, with input dimensions of (T, D), where T=100 represents the time step, and D=9 represents the feature dimension. The LSTM layer contains 128 hidden units and uses a double-layer stacking structure with an activation function of Tanh. The output dimension is (C), C=3, and the activation function is Softmax. The output feature vectors of CNN and LSTM are merged through a concatenation operation and input into a fused fully connected layer, ultimately outputting action scores and classification results. The model parameter settings are shown in Table 2:
Table 2: Model parameter settings

Parameter | Specification | Parameter | Specification
Learning rate | 0.001 | Dropout rate | 0.5
Batch size | 64 | Training epochs | 100
Optimizer | Adam | Hidden layer size | 128
Loss function | Cross-entropy loss | Convolutional kernel size | (3, 3)
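The following is a minimal PyTorch sketch of one way to realize the CNN-LSTM combination described above (T=100 time steps, F=9 sensor channels, C=3 classes, 3×3 kernels, a two-layer LSTM with 128 hidden units, concatenated features, Adam at the Table 2 learning rate). The channel widths and exact depth of the original five-layer CNN are not fully specified in the text, so those choices here are assumptions rather than the authors' implementation.

```python
# Sketch of a CNN + LSTM fusion classifier over (batch, T=100, F=9) windows.
import torch
import torch.nn as nn

class CnnLstmScorer(nn.Module):
    def __init__(self, T=100, F=9, C=3, hidden=128):
        super().__init__()
        # CNN branch: treat the (T, F) window as a single-channel 2-D map.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # -> (batch, 32, 1, 1)
        )
        # LSTM branch: two stacked layers over the 9-channel sequence (tanh cells).
        self.lstm = nn.LSTM(input_size=F, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        # Fused fully connected layer producing the C class scores.
        self.head = nn.Linear(32 + hidden, C)

    def forward(self, x):                            # x: (batch, T, F)
        cnn_feat = self.cnn(x.unsqueeze(1)).flatten(1)   # (batch, 32)
        _, (h_n, _) = self.lstm(x)                       # h_n: (2, batch, hidden)
        lstm_feat = h_n[-1]                              # top layer's final state
        return self.head(torch.cat([cnn_feat, lstm_feat], dim=1))

model = CnnLstmScorer()
logits = model(torch.randn(64, 100, 9))              # one batch of 64 windows
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (64,)))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss.backward()
optimizer.step()
print(logits.shape)                                   # torch.Size([64, 3])
```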
The convolution operation can effectively capture the spatial features and temporal dynamic changes of the athlete's joint movement trajectory, body posture changes, etc. The features processed by CNN can be passed to the LSTM network, which further analyzes the timing information of these features. Through the time memory mechanism of LSTM, the network can learn the continuity and long-term dependencies in the action. The convolution operation formula is:

f_{i,j} = Σ_{m=−k}^{k} Σ_{n=−k}^{k} x_{i+m,j+n} · w_{m,n}    (11)
x is the input feature map; w is the convolution kernel; the choice of the convolution kernel is closely related to the characteristics of the data. Smaller convolution kernels help capture subtle changes, while larger convolution kernels can extract more macro features. Since the aerobics athlete's tiny movements and posture changes need to be captured, and smaller convolution kernels help extract details more accurately, a convolution kernel of size 3×3 is selected. The LSTM layer filters out irrelevant temporal information through its forget gate, input gate, and output gate, retaining the long-term and short-term dependency information related to the action performance evaluation.

After the convolution layer, maximum pooling and average pooling [36] are used to reduce the dimension of the feature map. The role of the pooling layer is to reduce the amount of data after the convolution operation, reduce the computational complexity, and retain the most important feature information. The pooling operation formula is:

p_{i,j} = max_{m,n}(g_{i+m,j+n})    (12)

In the fully connected layer, the temporal information generated by LSTM and the spatial features extracted by CNN are integrated to generate the final score and classification results of the action performance. The model can not only evaluate the quality of actions, but also classify actions into multiple categories such as "standard", "deficient", and "error". The action scoring function is:

S = Σ_{i=1}^{N} φ_i·h_i    (13)

h_i is the score of each feature, and φ_i is the weight. The action evaluation score reflects the accuracy, fluency, and standardization of the athlete's action. The classification results provide coaches and athletes with targeted training improvement directions. Each classification result helps to further guide athletes' specific improvement measures in training.

During the training process, the Adam optimization algorithm [37] is used to update parameters. The optimization algorithm update formula is:

θ_{t+1} = θ_t − η · m̂_t / (√v̂_t + ϵ)    (14)

m̂_t and v̂_t are momentum estimates. The cross-entropy loss function [38] is used to optimize the classification task, so that the model can better handle multi-classification problems and continuously improve the accuracy of action scoring and classification by minimizing the loss function. The classification loss function is:

L = − Σ_{i=1}^{C} y_i · log(ŷ_i)    (15)

y_i is the true label, and ŷ_i is the predicted probability. To improve the model's training effect, data enhancement technology is used to simulate the motion performance in different training scenarios, expand the training data set, and increase the robustness of the model. Data enhancement methods include operations such as rotation, mirror flipping, and scaling. More training samples are generated through enhancement operations, so that the model can still show excellent performance in different motion modes.

3.4 Real-time feedback and optimization suggestion generation

To improve the effect of aerobics training, a real-time feedback system is designed to feed back the evaluation results generated during the training process to athletes, helping them adjust their movements, avoid errors and optimize the quality of movements. The key to the feedback system lies in real-time performance and accuracy. Only timely and accurate feedback can effectively improve the training level of athletes.

The system analyzes the real-time action data of athletes through a model combining a trained CNN model and LSTM to generate more accurate action scores and evaluation results. Every time an athlete performs an action, the system immediately analyzes the action and outputs a real-time score. The score reflects the accuracy, fluency, and completion of the action. The higher the score, the more standard the action. For low-scoring actions, the system automatically identifies the errors and provides specific optimization suggestions. The error correction weight formula is:

ϱ_error = 1 / (1 + exp(−κ·δ))    (16)

δ is the margin of error. Based on the score and error recognition results, the system generates personalized optimization suggestions. The optimization suggestion generation function is:

G = arg max_i (S_i + λ·E_i)    (17)

S_i is the score, and E_i is the severity of the error. Suggestions include improving posture, adjusting the range of motion, strengthening muscle control, etc., to help athletes correct deficiencies in their movements. Optimization suggestions can be provided in text form or visualized through a graphical interface to help athletes more intuitively understand the problems in their movements.

To ensure timely feedback, the system accelerates the calculation process by optimizing the algorithm, controlling the delay between action scoring and feedback generation to less than 200 ms, ensuring that athletes can receive targeted adjustment suggestions in a short period of time. The real-time feedback delay formula is:

T_delay = T_process + T_transmit    (18)

Through the real-time feedback mechanism, athletes can continuously adjust their movements during training and gradually improve the training effect. Optimization suggestions are not limited to correcting mistakes, but can also help athletes improve the delicacy and accuracy of their movements. The athlete improvement index formula is:
I = (ΔS · ln(1 + A_0)) / (ΔT · (1 + e^(−ε(ΔS − ΔS_avg))))    (19)

ΔS is the score increment; ΔT is the time; A_0 is the athlete's baseline ability; ε is the adjustment parameter; ΔS_avg is the average score increment. Through long-term training and optimization feedback, athletes can improve their overall performance in a short period of time and achieve the best training effect. The long-term optimization trend equation is:

O(t) = O_0 + ∫_0^t (dS/dτ) dτ    (20)

O_0 is the initial performance. Combining DL with real-time feedback technology, the system provides athletes with an intelligent training platform that can effectively improve training efficiency and quality. Table 3 lists some of the hyperparameters used in the experiment.
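Before the hyperparameter summary in Table 3, the scoring-and-feedback step of Eqs. (16)-(18) can be sketched as follows; the candidate suggestions, κ and λ values, and timing figures are illustrative assumptions, not the system's configuration.

```python
# Minimal sketch of Eqs. (16)-(18): error weight, suggestion selection, delay check.
import math

def error_weight(delta: float, kappa: float = 5.0) -> float:
    """Eq. (16): rho_error = 1 / (1 + exp(-kappa * delta))."""
    return 1.0 / (1.0 + math.exp(-kappa * delta))

def pick_suggestion(scores, severities, lam: float = 0.5) -> int:
    """Eq. (17): G = arg max_i (S_i + lam * E_i) over candidate suggestions."""
    combined = [s + lam * e for s, e in zip(scores, severities)]
    return max(range(len(combined)), key=combined.__getitem__)

suggestions = ["improve posture", "adjust range of motion", "strengthen muscle control"]
scores      = [62.0, 70.5, 55.0]   # per-suggestion relevance scores S_i (assumed)
severities  = [0.8, 0.3, 0.6]      # per-suggestion error severities E_i (assumed)

best = pick_suggestion(scores, severities)
t_delay_ms = 120 + 60              # Eq. (18): T_process + T_transmit (assumed values)
print(suggestions[best], round(error_weight(delta=0.4), 3), t_delay_ms <= 200)
```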
Table 3: Some hyperparameter data

Parameter | Function | Parameter | Function
α, β, γ | Sensor data fusion weighting parameters | k | Bluetooth signal conditioning parameter
K_k | Kalman gain | ψ | Mother wavelet function
ω1, ω2 | Weighted fusion weights | Σ | Feature covariance matrix
w | Convolution kernel | φ_i | Action score weight
m̂_t, v̂_t | Momentum estimates | ŷ_i | Prediction probability
δ | Margin of error | ε | Adjustment parameter
Table 3 lists the hyperparameters used for model and signal processing and their functional descriptions. These hyperparameters play a key role in sensor data fusion, signal conditioning, feature extraction, and prediction, and can effectively improve the performance and accuracy of the model. Adjusting these parameters can optimize system behavior according to specific application requirements and achieve more accurate data processing and action recognition.

4 Experimental results

4.1 Experimental setup

This paper collects a total of 12,000 action samples, covering three categories of actions: standard actions (4,000), insufficient actions (4,000), and incorrect actions (4,000). Each sample contains multimodal data of 100 time steps, including 9 channels (3 axes × 3 sensors) from accelerometers, gyroscopes, and magnetometers, and 18 joint point 3D coordinates obtained by Kinect. The data collection frequency is 50 Hz, which means collecting 50 frames per second. The wearable sensor is the Xsens MTw Awinda series inertial measurement unit (IMU), with specific parameters shown in Table 4:
Table 4: Sensor parameters

Sensor | Range | Resolution | Sampling rate
Accelerometer | ±16 g | 0.001 g | 50 Hz
Gyroscope | ±2000°/s | 0.01°/s | 50 Hz
Magnetometer | ±2.5 Gauss | 0.01 Gauss | 50 Hz
The data collection is conducted in an indoor sports arena, with an ambient temperature controlled at 22–25 °C, humidity at 45–60%, and good and stable lighting conditions. All model training is completed under the PyTorch 1.13.1 deep learning framework, with a training time of approximately 4 hours per model.

4.2 Effect of Kalman filtering on sensor data

In sensor data processing, the original signal is easily affected by noise, which affects the accuracy of the data. Kalman filtering, as a common noise suppression method, can effectively improve the stability and accuracy of the data by correcting the measured values. Figure 3 shows the comparison between the original data of the accelerometer, gyroscope, and magnetometer and the data after Kalman filtering, which is used to evaluate the filtering effect.
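As a minimal illustration of the per-axis Kalman update in Eq. (3), the sketch below filters one noisy accelerometer axis with a scalar constant-state model (H = 1); the process and measurement noise values q and r are illustrative assumptions, not the tuned values used in the study.

```python
# Scalar Kalman smoothing of one sensor axis (sketch of Eq. (3)).
import numpy as np

def kalman_smooth(z: np.ndarray, q: float = 1e-4, r: float = 1e-2) -> np.ndarray:
    """x_{k|k} = x_{k|k-1} + K_k (z_k - x_{k|k-1}) for a constant-value state."""
    x, p = z[0], 1.0                 # initial state estimate and covariance
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        p = p + q                    # predict step (state assumed constant)
        K = p / (p + r)              # Kalman gain K_k
        x = x + K * (zk - x)         # correct with the observation z_k
        p = (1.0 - K) * p
        out[k] = x
    return out

rng = np.random.default_rng(0)
acc_x = 0.1 + 0.03 * rng.standard_normal(100)     # noisy accelerometer X axis (m/s^2)
print(acc_x.std(), kalman_smooth(acc_x).std())    # filtered signal fluctuates less
```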
(a) Accelerometer data comparison; (b) Gyroscope data comparison; (c) Magnetometer data comparison
Figure 3: Kalman filter effect
Figure 3 shows the fluctuation of different sensors in different directions and the effect after filtering. In the accelerometer data, after Kalman filtering, the peak-to-valley difference of acceleration X drops from the original 0.13 m/s² to 0.04 m/s², showing a more stable state. The original acceleration fluctuates greatly in the X-axis and Y-axis directions. After Kalman filtering, the fluctuation of the data is more stable, indicating that Kalman filtering can effectively reduce the interference of noise. When the timestamp is 0 ms, the original acceleration X is 0.12 m/s², and after Kalman filtering, it is 0.13 m/s², with little change. The gyroscope data also shows a similar trend. The original data fluctuates to varying degrees in the X, Y, and Z directions, while the angular velocity data after Kalman filtering has less fluctuation. After filtering, the peak-to-valley difference of the angular velocity in the X direction is reduced from the original 0.25°/s to 0.17°/s, ensuring more accurate angle measurement. At 0 ms, the original angular velocity Z is 5.50°/s, and after Kalman filtering it is 5.52°/s. The magnetometer data also shows small fluctuations. After filtering, the fluctuations of the three-axis magnetic field are further reduced. The peak-to-valley difference in the X direction drops from the original 0.8 µT to 0.6 µT, providing more stable environmental data. At 0 ms, the original magnetic field X is 45.0 µT, and after Kalman filtering, it is 45.2 µT. The data optimized by Kalman filtering, combined with the analysis and processing of the CNN model, can effectively improve the accuracy of motion tracking and evaluation, and provide athletes with more accurate real-time feedback and personalized training suggestions.

4.3 Action scoring effect

To comprehensively evaluate the performance of aerobics athletes, a scoring system based on sensor data is introduced. The score value of each sensor at different time points reflects the quality and stability of the athlete's movements. The comprehensive score is the average of these three sensor scores and provides an overall performance evaluation for aerobics athletes, helping athletes and coaches to grasp the effect of exercise in real-time. Figure 4 shows the scoring performance at different time points.
Figure 4: Sensor scores at different time points
In the experiment, wearable sensors and motion tracking technology are combined to fuse and analyze multimodal data such as acceleration, angular velocity, and magnetic field using the CNN model and LSTM to achieve accurate evaluation and optimization of aerobics movements. The data in Figure 4 shows that as time goes by, the athletes' acceleration scores gradually increase from 70 to 85 points; the angular velocity scores increase from 60 to 75; the magnetic field scores also maintain a relatively stable upward trend. The final comprehensive score increases from 70.0 to 82.7. The data changes reflect the gradual optimization of the athletes' performance during training. Figure 4 shows the changes in different scoring dimensions at each time point, helping trainers to accurately monitor and adjust training strategies in real-time. The CNN optimizes sensor data by combining Kalman filtering and wavelet transform technology to provide athletes with more accurate performance feedback and promote the improvement of training results.

4.4 Performance of feedback systems in different scenarios

The performance improvement before and after training can reflect the optimization effect of the system. Table 5 shows the relationship between feedback delay, optimization suggestion generation time, and athlete performance improvement in different training scenarios.
Table 5: Feedback performance in different training scenarios

Scenario Type | Average Feedback Delay (ms) | Optimization Suggestion Generation Time (ms) | Pre-training Performance Score (out of 100) | Post-training Performance Score (out of 100) | Improvement Rate (%)
Indoor Low Intensity | 150 | 90 | 65 | 80 | 23.1
Indoor High Intensity | 180 | 95 | 62 | 78 | 25.8
Outdoor Low Intensity | 160 | 85 | 68 | 82 | 20.6
Outdoor High Intensity | 200 | 110 | 60 | 75 | 25.0
According to the data in Table 5, there are certain differences in feedback delay and optimization suggestion generation time in different training scenarios. In the indoor high-intensity training scenario, the average feedback delay is 180 milliseconds; the optimization suggestion generation time is 95 milliseconds; the action score before training is 62 points. After training, the score increases to 78 points, and the improvement rate reaches 25.8%, which is the highest improvement rate in all scenarios. This shows that under high-intensity training,
despite the longer feedback delay, the system is able to more effectively generate optimization suggestions and improve athlete performance. In outdoor low-intensity training scenarios, the feedback delay is shorter, at 160 milliseconds, but the improvement rate is 20.6%, which is relatively low, indicating that the feedback system in this scenario has room for improvement in the generation and application of optimization suggestions. The efficiency and optimization effect of the feedback system vary in different training scenarios. The response speed of the system and the time to generate suggestions are closely related to the performance improvement of athletes.

4.5 CNN model training effect

To better evaluate and optimize the performance of athletes in different training scenarios, the experiment uses the CNN model and LSTM to conduct an in-depth analysis of various training data. In the four training scenarios of indoor low intensity, indoor high intensity, outdoor low intensity, and outdoor high intensity, the change of training cycle has an important impact on the accuracy and performance of the model. The changes in the accuracy, precision, recall, F1-score, and loss value of the CNN model in different scenarios can be analyzed to understand the model optimization trend during the training process. Figure 5 shows the changes in key indicators of the model after each training cycle in these scenarios.
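For reference, the per-scenario indicators reported in Figure 5 can be computed from the model's predictions as sketched below; the label arrays are placeholders, not experimental data.

```python
# Sketch of computing accuracy, precision, recall and F1 for one scenario.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = np.random.default_rng(0).integers(0, 3, size=200)   # 3 action classes
y_pred = y_true.copy()
y_pred[:30] = (y_pred[:30] + 1) % 3                          # inject some errors

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```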
(a) Indoor low-intensity training scene; (b) Indoor high-intensity training scene; (c) Outdoor low-intensity training
scene; (d) Outdoor high-intensity training scene
Figure 5: CNN effects in different scenes
By analyzing the model effects in different training scenarios through the data in Figure 5, the performance of the CNN model combined with LSTM in various indicators is significantly improved with the increase of training cycles. In the indoor low-intensity training scenario, the accuracy rate increases from 74.5% to 87.1%; the precision, recall rate, and F1-score also increase steadily, and the loss value decreases from 0.68 to 0.27, indicating that the model's ability to fit the data steadily improves with the passage of training time. In indoor high-intensity scenarios, the accuracy rate increases from 75.0% to 88.2%. The improvement in precision and recall rate shows that the model can effectively handle more complex training environments, and the loss value drops to 0.26. The training effect of the model in outdoor low-intensity and high-intensity scenarios also shows a similar trend, with the accuracy rate increased from 73.2% to 85.9% and from 74.3% to 86.7%, respectively, and the loss value decreased significantly. Whether it is a low-intensity or high-intensity scenario, the model shows good stability and accuracy in different environments. With the increase of training cycles, the performance of the CNN model in motion scoring and classification has been effectively improved, and the training error can be significantly reduced.

To further verify the effectiveness of the method proposed in this paper in the evaluation of aerobics movements, the CNN-LSTM hybrid model is compared with several representative studies in recent years. The comparative methods include: multimodal fusion method
based on traditional CNN, action recognition method based on LSTM, traditional classification method based on support vector machine (SVM), and temporal modeling method based on Transformer. Comparative experiments are conducted on the same dataset, with evaluation metrics including accuracy and F1 score, as well as the action evaluation RMSE (Root Mean Squared Error) value, as shown in Table 6:
Table 6: Comparison of the performance of this method with existing research

Model | Accuracy | F1-score | RMSE
SVM | 72.1% | 0.703 | 8.1
CNN | 83.2% | 0.817 | 6.0
LSTM | 84.6% | 0.832 | 5.8
Transformer | 85.4% | 0.841 | 5.6
CNN-LSTM | 88.2% | 0.867 | 4.2
From Table 6, it can be seen that this method outperforms existing methods in terms of accuracy and F1 score. Compared with traditional CNN methods, this method has improved accuracy by 5.0%; compared with the LSTM method, it has improved by 3.6%; compared with the Transformer method, it has improved by 2.8%. The RMSE of the method proposed in this paper is 4.2, significantly lower than the other comparative methods, indicating that the CNN-LSTM hybrid model proposed in this paper has higher accuracy and stability in predicting action scores. This result indicates that the method proposed in this paper can effectively achieve accurate evaluation of athletes' movements and provide objective guidance for scientific training.

5 Experimental discussion

By combining wearable sensors with motion tracking technology and using CNNs for multimodal data fusion and analysis, this study has achieved significant experimental results in motion evaluation and optimization. Kalman filtering significantly improves the stability and accuracy of sensor data in the application of noise suppression, and effectively reduces the impact of environmental interference on data quality. With the increase of training cycles, the CNN model combined with LSTM continues to improve in terms of accuracy, precision, recall rate, and F1-score in action scoring and classification, showing good fitting ability. In high-intensity training scenarios, despite relatively long feedback delays, the system can still effectively generate optimization suggestions, and the athletes' performance has been significantly improved.

However, the experimental results also reveal that there are still some potential challenges and problems in the application and development of the system. The performance of the model has been significantly improved in different training scenarios, but the feedback delay and optimization suggestion generation time are longer in high-intensity training scenarios, which affects the system's real-time response capability. Kalman filtering and other data optimization techniques effectively reduce noise, but in complex or extreme training environments, external interference still poses a risk of affecting the accuracy of sensor data and causing certain errors in training feedback. With the diversification of training scenarios and the increase in environmental complexity, how to further improve the model's ability to classify complex actions and its ability to comprehensively process multimodal data remains an issue to be resolved. Future research can focus on optimizing sensor data collection and processing technology, strengthening data synchronization and fusion algorithms, and further improving the stability and adaptability of CNN models in different environments. With the advancement of technology, combined with more sensors and analysis of training scenarios, it will be possible to provide athletes with more detailed and comprehensive personalized training programs, promoting the intelligent and precise development of sports training.

6 Conclusions

This paper proposes a fitness exercise action evaluation method that integrates wearable sensors and motion tracking systems, and combines CNN and LSTM models to fuse and analyze multimodal data. The experimental results show that the method exhibits excellent performance in action classification tasks: in indoor low-intensity training scenarios, the accuracy increases from 74.5% to 87.1%; in high-intensity training scenarios, the accuracy increases from 75.0% to 88.2%. By introducing Kalman filtering, wavelet transform, and a dynamic weighting fusion strategy, the stability of sensor data and the generalization ability of the model have been effectively improved. This paper not only provides high-precision motion evaluation and real-time feedback for aerobics athletes, but also provides a transferable technical framework for other high-intensity, multimodal sports projects. In the future, the system can be further expanded to remote training platforms, intelligent wearable devices, virtual coaching systems, and other application scenarios, promoting the deep integration and widespread application of artificial intelligence technology in the fields of sports training and health management. In the future, lightweight CNN structures such as MobileNet and TinyML, deployment of models on wearable devices, and heterogeneous computing acceleration can be used to further shorten feedback latency and improve system response speed.

Authorship contribution statement

Danhua Tan: Writing-Original draft preparation, Conceptualization, Supervision, Project administration.
Data availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Author statement

The manuscript has been read and approved by all the authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work.

Ethical approval

All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.

References

[1] M. E. M. Simbolon, D. K. A. Firdausi, I. Dwisaputra, A. Rusdiana, C. Pebriandani, and R. Prayoga, "Utilization of Sensor technology as a Sport Technology Innovation in Athlete Performance Measurement," Indonesian Journal of Electronics and Instrumentation Systems (IJEIS), 13(2): 147–158, 2023. https://doi.org/10.22146/ijeis.89581
[2] Z. Mei, "3D images analysis of sports technical features and sports training methods based on artificial intelligence," J Test Eval, 51(1): 189–200, 2023. https://doi.org/10.1520/JTE20210469
[3] S. A. Kovalchik, "Player tracking data in sports," Annu Rev Stat Appl, 10(1): 677–697, 2023. https://doi.org/10.1146/annurev-statistics-033021-110117
[4] L. Yang, O. Amin, and B. Shihada, "Intelligent wearable systems: Opportunities and challenges in health and sports," ACM Comput Surv, 56(7): 1–42, 2024. https://doi.org/10.1145/3648469
[5] W. Li, "Application of IoT-enabled computing technology for designing sports technical action characteristic model," Soft Comput, 27(17): 12807–12824, 2023. https://doi.org/10.1007/s00500-023-08966-4
[6] Y. Fang, "Utilizing Wearable Technology to Enhance Training and Performance Monitoring in Indonesian Badminton Players," Studies in Sports Science and Physical Education, 2(1): 11–23, 2024. https://doi.org/10.1186/s40561-023-00247-9
[7] J. Corban et al., "Using an affordable motion capture system to evaluate the prognostic value of drop vertical jump parameters for noncontact ACL injury," Am J Sports Med, 51(4): 1059–1066, 2023. https://doi.org/10.1177/03635465231151686
[8] C. J. Rigozzi, G. A. Vio, and P. Poronnik, "Application of wearable technologies for player motion analysis in racket sports: A systematic review," Int J Sports Sci Coach, 18(6): 2321–2346, 2023. https://doi.org/10.1177/17479541221138015
[9] Y. Zhang, "Design of Wireless Motion Sensor Nodes based on the Kalman Filter Algorithm," Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering), 16(3): 248–255, 2023. https://doi.org/10.2174/2352096515666220908152036
[10] S. Akan and S. Varlı, "Use of deep learning in soccer videos analysis: survey," Multimed Syst, 29(3): 897–915, 2023. https://doi.org/10.1007/s00530-022-01027-0
[11] D. Gholamiangonabadi and K. Grolinger, "Personalized models for human activity recognition with wearable sensors: deep neural networks and signal processing," Applied Intelligence, 53(5): 6041–6061, 2023. https://doi.org/10.1007/s10489-022-03832-6
[12] A. Chakraborty and N. Mukherjee, "A deep-CNN based low-cost, multi-modal sensing system for efficient walking activity identification," Multimed Tools Appl, 82(11): 16741–16766, 2023. https://doi.org/10.1007/s11042-022-13990-x
[13] W. Liu, Y. Liu, and R. Bucknall, "Filtering based multi-sensor data fusion algorithm for a reliable unmanned surface vehicle navigation," Journal of Marine Engineering & Technology, 22(2): 67–83, 2023. https://doi.org/10.1080/20464177.2022.2031558
[14] L. Zhang and H. Dai, "Motion trajectory tracking of athletes with improved depth information-based KCF tracking method," Multimed Tools Appl, 82(17): 26481–26493, 2023. https://doi.org/10.1007/s11042-023-14929-6
[15] P. Hao and K. Qian, "The Integration of Personalized Training Program Design and Information Technology for Athletes," Scalable Computing: Practice and Experience, 25(5): 4351–4359, 2024. https://doi.org/10.12694/scpe.v25i5.3083
[16] V. Deepak, D. K. Anguraj, and S. S. Mantha, "An efficient recommendation system for athletic performance optimization by enriched grey wolf optimization," Pers Ubiquitous Comput, 27(3): 1015–1026, 2023. https://doi.org/10.1007/s00779-022-01680-2
[17] J. K. Urbanek et al., "Free-living gait cadence measured by wearable accelerometer: a promising alternative to traditional measures of mobility for assessing fall risk," The Journals of Gerontology: Series A, 78(5): 802–810, 2023. https://doi.org/10.1093/gerona/glac013
[18] A. Hussain, S. Ali, M.-I. Joo, and H.-C. Kim, "A deep learning approach for detecting and classifying cat activity to monitor and improve cat's well-being using accelerometer, gyroscope, and magnetometer," IEEE Sens J, 24(2): 1996–2008, 2023.
CNN and LSTM-Based Multimodal Data Fusion for Performance… Informatica 49 (2025) 235–248 247
[19] A. Spilz and M. Munz, “Synchronisation of Nanoscale Measurements and Ensemble
wearable inertial measurement units based on Behavior,” ACS Nano, 17(21): 21493–21505,
magnetometer data,” Biomedical 2023. https://doi.org/10.1021/acsnano.3c06335
Engineering/Biomedizinische Technik, 68(3): [31] J. Sun, H. Zhang, X. Ma, R. Wang, H. Sima, and
263–273, 2023. https://doi.org/10.1515/bmt- J. Wang, “Spectral–Spatial Adaptive Weighted
2021-0329 Fusion and Residual Dense Network for
[20] A. Liu, R. P. Mahapatra, and A. V. R. Mayuri, hyperspectral image classification,” The Egyptian
“Hybrid design for sports data visualization using Journal of Remote Sensing and Space Sciences,
AI and big data analytics,” Complex & Intelligent 28(1): 21–33, 2025.
Systems, 9(3): 2969–2980, 2023. https://doi.org/10.1016/j.ejrs.2024.11.001
https://doi.org/10.1007/s40747-021-00557-w [32] J. P. Bharadiya, “A tutorial on principal
[21] C.-T. Lin, Y. Wang, S.-F. Chen, K.-C. Huang, and component analysis for dimensionality reduction
L.-D. Liao, “Design and verification of a wearable in machine learning,” Int J Innov Sci Res Technol,
wireless 64-channel high-resolution EEG 8(5): 2028–2032, 2023.
acquisition system with wi-fi transmission,” Med DOI:10.5281/zenodo.8002436
Biol Eng Comput, 61(11): 3003–3019, 2023. [33] F. Bizzarri, D. Del Giudice, S. Grillo, D. Linaro,
https://doi.org/10.1007/s11517-023-02879-y A. Brambilla, and F. Milano, “Inertia estimation
[22] X. Shi and H. Zou, “Data Collection and Analysis through covariance matrix,” IEEE Transactions
based on Sensor Technology in Sports Training,” on Power Systems, 39(1): 947–956, 2023. DOI:
Scalable Computing: Practice and Experience, 10.1109/TPWRS.2023.3236059
25(5): 4399–4406, 2024. [34] Y. He, C.-K. Zhang, H.-B. Zeng, and M. Wu,
https://doi.org/10.12694/scpe.v25i5.3200 “Additional functions of variable-augmented-
[23] M. Khodarahmi and V. Maihami, “A review on based free-weighting matrices and application to
Kalman filter models,” Archives of systems with time-varying delay,” Int J Syst Sci,
Computational Methods in Engineering, 30(1): 54(5): 991–1003, 2023.
727–747, 2023. https://doi.org/10.1007/s11831- https://doi.org/10.1080/00207721.2022.2157198
022-09815-7 [35] Singh S, Sehgal V K. "Deep Learning-Based CNN
[24] M. Azhar, S. Ullah, M. Raees, K. U. Rahman, and Multi-Modal Camera Model Identification for
I. U. Rehman, “A real-time multi view gait-based Video Source Identification,” Informatica: An
automatic gender classification system using International Journal of Computing and
kinect sensor,” Multimed Tools Appl, 82(8): Informatics, 47(3): 417-430, 2023.
11993–12016, 2023. https://doi.org/10.31449/inf.v47i3.4392
https://doi.org/10.1007/s11042-022-13704-3 [36] T. Sharma, N. K. Verma, and S. Masood, “Mixed
[25] L. Lv, J. Yang, F. Gu, J. Fan, Q. Zhu, and X. Liu, fuzzy pooling in convolutional neural networks
“Validity and reliability of a depth camera–based for image classification,” Multimed Tools Appl,
quantitative measurement for joint motion of the 82(6): 8405–8421, 2023.
hand,” J Hand Surg Glob Online, 5(1): 39–47, https://doi.org/10.1007/s11042-022-13553-0
2023. https://doi.org/10.1016/j.jhsg.2022.08.011 [37] M. Reyad, A. M. Sarhan, and M. Arafa, “A
[26] Y. Wu, Z. Sun, G. Ran, and L. Xue, “Intermittent modified Adam algorithm for deep neural network
control for fixed-time synchronization of coupled optimization,” Neural Comput Appl, 35(23):
networks,” IEEE/CAA Journal of Automatica 17095–17112, 2023.
Sinica, 10(6): 1488–1490, 2023. DOI: https://doi.org/10.1007/s00521-023-08568-z
10.1109/JAS.2023.123363 [38] Z. Mei et al., “Automatic loss function search for
[27] Hrovatin, N. "Enabling Decentralized Privacy adversarial unsupervised domain adaptation,”
Preserving Data Processing in Sensor Networks,” IEEE Transactions on Circuits and Systems for
Informatica (03505596), 48(1): 141-142, 2024. Video Technology, 33(10): 5868–5881, 2023.
https://doi.org/:10.31449/inf.v48i1.5739. DOI: 10.1109/TCSVT.2023.3260246
[28] Thi H N, Duc C V, Duc C T, HH Minh, SN Van,
LV Quan. “Memetic Algorithm for Maximizing
K-coverage and K-Connectivity in Wireless
Sensor Network,” Informatica, (03505596), 49(1):
1-7, 2025.
https://doi.org/:10.31449/inf.v49i1.6750.
[29] A. Halidou, Y. Mohamadou, A. A. A. Ari, and E.
J. G. Zacko, “Review of wavelet denoising
algorithms,” Multimed Tools Appl, 82(27):
41539–41569, 2023.
https://doi.org/10.1007/s11042-023-15127-0
[30] M. Kang, C. L. Bentley, J. T. Mefford, W. C.
Chueh, and P. R. Unwin, “Multiscale Analysis of
Electrocatalytic Particle Activities: Linking
248 Informatica 49 (2025) 235–248 D. Tan
https://doi.org/10.31449/inf.v49i16.8812 Informatica 49 (2025) 249–268 249
Metaheuristic-Enhanced SVR Models for California Bearing Ratio
Prediction in Geotechnical Engineering
Yulin Lan1, Na Feng2,* and Zhisheng Yang3
1Planning and Finance Department, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
2School of Information Engineering, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
3Party and Government Office, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
E-mail: sdqzyuchen@163.com
*Corresponding author
Keywords: california bearing ratio, support vector regression, adaptive opposition slime mold algorithm, alibaba and
the forty thieves optimization algorithm, dingo optimization algorithm
Received: April 7, 2025
Soil resistance characteristics, particularly the California Bearing Ratio (CBR), play a pivotal role in
pavement and subgrade design. However, conventional laboratory-based CBR testing is often time-
consuming, labor-intensive, and costly. This study presents a novel machine learning framework that
combines Support Vector Regression (SVR) with three recent metaheuristic optimization algorithms—
Dingo Optimization Algorithm (DOA), Alibaba and the Forty Thieves Optimization (AFT), and Adaptive
Opposition Slime Mold Algorithm (AOSMA)—to predict CBR values efficiently and accurately. A dataset
consisting of 220 soil samples with eight geotechnical input parameters was used to develop and evaluate
the hybrid models. The predictive performance of each model was assessed using multiple evaluation
metrics, including R², RMSE, MSE, RSR, and WAPE. Results indicate that the SVR–AFT (SVAF) hybrid
model outperformed the others, achieving an R² of 0.9968 and an RMSE of 0.7946 in the testing phase,
demonstrating high generalization ability and predictive precision. The integration of SVR with
metaheuristic algorithms significantly enhances model robustness and accuracy, offering a practical and
cost-effective alternative to empirical CBR testing methods. This work highlights the potential of hybrid
AI models in solving complex geotechnical prediction problems and contributes to the growing body of
research at the intersection of civil engineering and artificial intelligence.
Povzetek: Hibridni modeli SVR so optimizirani z metahevristikami AFT, DOA in AOSMA za hitro in
natančno napovedovanje CBR iz osmih geotehničnih parametrov. Na 220 vzorcih doseže najboljši model
SVAF R² = 0.9968 in RMSE = 0.7946, kar ponuja stroškovno učinkovito alternativo laboratorijskim
testom.
1 Introduction

CBR is the term used in geotechnical construction to describe the resistance of a substrate sample to the insertion of a piston; more specifically, the CBR describes the force that must be applied to the piston for it to penetrate the soil [1]. Initially, the CBR test was devised in California to appraise the suitability of soils for highway construction, and civil engineers later modified the testing procedure to extend its use to airport construction. Almost all emerging countries widely adopt the CBR test to appraise the resilience of pavement soils [2]. A material's load-bearing capacity is gauged by its CBR, which is the ratio of the attainable supporting strength of base materials to that of standard crushed rock. In structural engineering, 100 is considered a reasonable upper limit of the CBR for crushed rock materials; conversely, the CBR values of alternative materials fall below 100 [3]. Recent advances in artificial intelligence (AI) are closely intertwined with the rapid development of electronic technologies, forming the foundation of the so-called "information society." Gams and Kolenik highlight the reciprocal relationship between electronics and AI, where swift hardware improvements, described by a comprehensive set of Information Society (IS) laws, have driven groundbreaking progress in AI across fields such as medicine, smart environments, and autonomous systems. Their research shows that AI and ambient intelligence (AmI) not only benefit from electronic advancements but are also beginning to influence hardware optimization and intelligent system design, indicating a move toward a more integrated technological progression [4]. After compacted soils have been tested in the laboratory, a subsequent test can be conducted; for soils located in trenches, it is also possible to conduct the CBR test in situ [5]. It is essential to recognize that in situ and laboratory test outcomes can show perceptible differences depending on soil type, unit weight, and water content. CBR tests have proved promising for providing information about the stability and strength of different kinds of soil-related structures, such as road fills, airport runways, dams, and road foundations.
Moreover, these tests can be conducted on both unsoaked and soaked soil varieties. Laboratory CBR tests are demanding in terms of time and manual effort, and their outcomes are frequently marred by discrepancies attributable to suboptimal laboratory conditions and soil samples, which in turn lead to inaccurate CBR values [6]. Various studies have been performed on the California bearing ratio, which has led researchers to formulate different procedures. Previous studies showed that changes in soil types and properties affect the value of CBR; most research work has focused on studying the relationships between compaction properties, index properties, mineralogical examinations, and CBR values [6]–[18]. To determine the value of CBR, soils are compacted at a predetermined MDD and OMC at a specified energy level for the soil material. For the soaked CBR, the specimens are soaked for four days, the primary purpose of this soaking being to allow absorption. Consequently, the assessment of the CBR value for a soaked sample typically requires a period of approximately five to six days. This delay can prove detrimental to the timely completion of a large-scale construction endeavor. Since soils vary greatly in quality, applying this procedure to foundation soil samples collected from a small number of sites may not truly represent the soil properties for all roads; to eliminate this deficiency, a large number of specimens needs to be gathered for testing. Therefore, estimating the CBR values of pavement subgrade soils from easily identifiable parameters becomes very important in developing appropriate pavement design parameters. Recently, interest in using Artificial Intelligence (AI) tactics to solve geotechnical engineering problems has increased, and some valuable outcomes have been obtained [19], [20]. Furthermore, a limited number of studies have documented endeavors to appraise the CBR of soils via diverse Artificial Neural Network (ANN) methodologies [21], [22], [23].

Recent advances in machine learning have increasingly supported geotechnical engineering by improving the prediction of soil and foundation properties through data-driven models. Support Vector Regression (SVR), when combined with metaheuristic optimization, has proven particularly effective in modeling complex nonlinear relationships within geotechnical datasets. Ngo et al. [24] demonstrated that SVR optimized via metaheuristics yielded superior performance in predicting the unconfined compressive strength of stabilized soils. Similarly, Hoang et al. [25] applied enhanced SVR models to successfully estimate pile bearing capacity, showcasing the method's versatility in foundation engineering. In the context of California Bearing Ratio (CBR) prediction, Bherde et al. [26] reported that Random Forest Regression outperformed other algorithms, including SVR, with maximum dry density and gravel content being the most influential predictors. While these results support the effectiveness of ensemble models, they also underline the need for more optimized SVR configurations that can match or exceed ensemble performance. A broader comparative study by Ma et al. [27], evaluating 20 metaheuristic algorithms for SVR parameter tuning in landslide displacement prediction, revealed considerable variation in outcomes. The Multiverse Optimizer emerged as particularly efficient in achieving high accuracy with low computational cost, highlighting the critical role of algorithm selection in enhancing SVR model performance. These studies collectively underscore the growing impact of hybrid AI models in geotechnical applications. However, few works have focused specifically on integrating SVR with newer and less explored metaheuristic algorithms such as the Alibaba and Forty Thieves (AFT), Dingo Optimization Algorithm (DOA), or Adaptive Opposition Slime Mold Algorithm (AOSMA). Our study addresses this gap by systematically evaluating and comparing these novel SVR-based hybrid models in predicting CBR, offering insights into their optimization behaviors, convergence patterns, and predictive robustness. By incorporating recent advancements and experimental benchmarks, this work aims to contribute both technically and methodologically to the field of AI-driven geotechnical modeling.

Considering the variety of parameters to be considered and the range of datasets observed, as explained above, it becomes of prime importance to develop robust predictive methodologies to model the mechanical attributes of the CBR and delineate the complex correlations between the constituents of soil. Recent studies have explored various soft computing and machine learning techniques for predicting the California Bearing Ratio (CBR). These include Random Forest, Gradient Boosting, and XGBoost, which are known for their solid performance in regression tasks. However, such models often need extensive tuning and can struggle to capture complex nonlinear relationships, especially when feature interactions are subtle. Conversely, Support Vector Regression (SVR) demonstrates strong generalization and robustness, especially when combined with kernel functions and metaheuristic optimization. To assess the effectiveness of the proposed SVR-based hybrid models, we also incorporated Random Forest as a benchmark and compared its predictive accuracy with the SVR models enhanced by metaheuristics.

Key techniques in earlier work include Artificial Neural Networks (ANN), Multiple Linear Regression (MLR), the Group Method of Data Handling (GMDH), and SVM. These models use soil parameters such as Atterberg limits, dry density, optimum moisture content, and soil gradation as inputs. However, many struggle with issues like overfitting, limited generalization to unseen data, or inadequate hyperparameter optimization. As shown in Table 1, most previous models achieved only moderate accuracy and did not utilize metaheuristic optimization to boost prediction performance. To fill this gap, this study introduces a hybrid Support Vector Regression (SVR) model combined with three metaheuristic optimizers (AFT, DOA, and AOSMA), aimed at improving the model's ability to learn nonlinear patterns. The superior performance of the SVAF model, especially in RMSE and R² metrics, highlights the benefits of this approach.
Table 1: Overview of past methods for CBR prediction

Study | Model Type | Input Features | Dataset Size | Performance Metrics (R² / RMSE) | Notes
Yildirim & Gunaydin | Artificial Neural Network (ANN) | LL, PL, PI, MDD, OMC, % Sand and Gravel | 120 | R² = 0.945 / RMSE = 1.82 | ANN prone to overfitting and high variance
Taskiran | GMDH | LL, PL, PI, Compaction properties | 200 | R² ≈ 0.92 | Good performance, limited interpretability
Alawi & Rajab | Multiple Linear Regression (MLR) | LL, PL, PI, Soil Gradation | 100 | R² = 0.86 | Struggles with nonlinear relationships
Ngo et al. [24] | SVR + Improved Arithmetic Optimization (IAOA) | Grain size, Density, OMC, PI | 150 | R² = 0.96 / RMSE = 1.12 | SVR enhanced with metaheuristic tuning
Wu et al. [25] | Stochastic Gradient Boosting Regression (SGBR) | LL, PL, MDD, % Clay, % Silt | 300 | R² = 0.974 | Ensemble method with good generalization
Bherde et al. [26] | Random Forest Regression (RFR) | MDD, % Gravel, OMC, PI | 400 | R² = 0.982 | Strong performance, but no hyperparameter optimization
Current Study | SVR + AFT | LL, PL, PI, MDD, OMC, SDA, QD, OPC | 300 | R² = 0.9968 / RMSE = 0.7946 | Best accuracy using hybrid SVR and AFT metaheuristic
This study addresses the challenges of traditional CBR testing methods, which are often time-consuming and costly, by exploring advanced machine learning models supplemented with nature-inspired optimization techniques. Specifically, it focuses on Support Vector Regression (SVR), a popular tool for nonlinear regression. Since SVR's performance heavily depends on hyperparameter selection, three recent metaheuristic algorithms (the Adaptive Opposition Slime Mold Algorithm (AOSMA), the Alibaba and the Forty Thieves Algorithm (AFT), and the Dingo Optimization Algorithm (DOA)) are employed to optimize the SVR framework. These algorithms offer diverse search strategies with strong potential for effective global optimization and faster convergence. The predictive capability of these hybrid models is evaluated using five standard statistical metrics: R², RMSE, MSE, RSR, and WAPE. The study aims to (1) develop and validate an SVR model for predicting the California Bearing Ratio (CBR) based on soil and compaction parameters; (2) improve SVR's predictive performance through hyperparameter tuning with the three optimization algorithms; and (3) perform a comprehensive comparison of the models using these metrics to identify the most accurate and reliable one for geotechnical use.

2 Materials and methodology

2.1 Data gathering

This study's dataset consists of 121 soil samples gathered from various geotechnical investigation reports and laboratory tests across different regions in [insert country or region, e.g., southwestern Iran or southeastern Asia; please specify based on your case]. The samples include a variety of soil types such as clayey soils, silty sands, gravels, and mixtures to ensure the broad applicability of the predictive models. Each sample records essential input parameters like [list key parameters: e.g., dry density, moisture content, liquid limit, plasticity index, etc.], with the California Bearing Ratio (CBR) used as the target variable. Data were obtained from both published literature and in-house experiments, offering a comprehensive understanding of soil behavior in various geological settings. The data in this investigation depend on eight variables: OPC, SDA, QD, plastic limit, liquid limit, maximum dry density, plasticity index, and optimum moisture content; the output of interest is the CBR value. The dataset has been split into two subsets: 30% of the total set makes up the testing phase, while 70% comprises the training phase.
Table 2 presents a numerical summary of the parameters used in building the scheme. It gives an overall summary of attributes such as the minimum (Min), maximum (Max), standard deviation (St.dev), and mean, which are the essential parameters for the statistical analysis. The maximum values of the LL, PL, PI, MDD, OMC, SDA, QD, and OPC variables are 52.1, 37.2, 19.5, 1.777, 29.5, 20, 20, and 8, respectively. Also, the maximum value of CBR as the output parameter is 66.75 percent.
Table 2: The statistical features of the dataset components

Parameters | Max | Min | Mean | St.dev
LL | 52.10 | 21.20 | 35.8450 | 6.15380
PL | 37.20 | 17.90 | 26.6830 | 4.28120
PI | 19.50 | 2.10 | 9.16230 | 4.11490
MDD | 1.7770 | 1.3650 | 1.49290 | 0.08830
OMC | 29.50 | 18.90 | 24.1430 | 2.42670
SDA (%) | 20.0 | 0 | 10.6600 | 7.15460
QD (%) | 20.0 | 0 | 10.640 | 8.19610
OPC (%) | 8.0 | 2.0 | 4.94490 | 2.37980
CBR (%) | 66.750 | 19.690 | 39.9590 | 10.8660
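To make the data-preparation step concrete, the short Python sketch below reproduces the 70%/30% split and the descriptive statistics of Table 2. It is only an illustration: the file name cbr_dataset.csv and the exact column labels are assumptions, since the raw dataset is not distributed with the paper.

# Illustrative sketch of the data preparation described in Section 2.1.
# The CSV path and the column names (LL, PL, PI, MDD, OMC, SDA, QD, OPC, CBR)
# are assumptions; adapt them to the actual dataset.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("cbr_dataset.csv")          # hypothetical file name
features = ["LL", "PL", "PI", "MDD", "OMC", "SDA", "QD", "OPC"]
target = "CBR"

# Descriptive statistics comparable to Table 2 (max, min, mean, std).
print(data[features + [target]].agg(["max", "min", "mean", "std"]).T)

# 70% training / 30% testing split, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data[target], test_size=0.30, random_state=42
)
print(X_train.shape, X_test.shape)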
2.2 Support vector regression (SVR)

In its early phases, the SVM technology was used to address pattern identification problems, as initially introduced by Vapnik [28]. Vapnik [29] then suggested the SVM algorithm for solving function approximation problems, which resulted in the development of the SVR approach. The SVR approach is an innovative and practical method in data regression analysis. In this study, Support Vector Regression (SVR) is used as the main predictive model. Because of its ability to manage nonlinear relationships through kernel functions, SVR is especially suitable for modeling complex geotechnical datasets. The Radial Basis Function (RBF) kernel is chosen due to its effectiveness in high-dimensional feature spaces and its ability to generalize well. The SVR model depends on three main hyperparameters: C (regularization parameter), which regulates the trade-off between achieving a low training error and maintaining a simple model; γ (gamma), which determines the influence range of a single training example, a lower value meaning a wider reach and a higher value a more localized effect; and ε (epsilon), which defines the tolerance margin within which errors are not penalized. These parameters were tuned using metaheuristic algorithms to minimize the root mean square error (RMSE) of predictions.

From an academic standpoint, SVR may be explained in the following terms. SVR uses a dataset with $N$ entries $\{(X_i, y_i),\ i = 1, 2, \ldots, \bar{M}\}$, where the overall count of training instances is denoted by $M$. $X_i = \{x_1, x_2, \ldots, x_m\} \in R^m$ denotes the $i$-th input vector with $m$ components, and $y_i \in R$ represents the genuine value connected to $X_i$. In machine-learning terms, each training data point $X_i$ is mapped into an $l$-dimensional feature space, and SVR selects in this feature space the optimal hyperplane relating the input (independent) variables to the output (dependent) variable. Eq. (1) gives, mathematically, the operation of an SVR:

$f(x) = Z^{T}\varphi(x) + b$ (1)

where $b$ is the bias term, $f(x)$ symbolizes the predicted value, $Z$ is the $l$-dimensional weighting component, and $\varphi(x)$ is the function that maps the distinct components $X_i$ into the high-dimensional feature space.

The formal expression for the ε-insensitive loss is given in Eq. (2):

$|y - f(x)|_{\varepsilon} = \max(0,\ |y - f(x)| - \varepsilon)$ (2)

The difference between the real value $y$ and the anticipated value $f(x)$, as expressed by Eq. (3), is known as the residual:

$R(x, y) = y - f(x)$ (3)

According to Eq. (4), the optimum regression model keeps the entire residual within a preset boundary value ε:

$-\varepsilon \le R(x, y) \le \varepsilon$ (4)

Eq. (4) is assumed to hold on the whole training data set. Thus, if the residual meets the criterion $R(x, y) = \pm\varepsilon$, the data point exhibits the maximum deviation from the hyperplane. The spatial separation of an arbitrary data point $(x, y)$ from the hyperplane $R(x, y) = 0$ can be calculated by the formula $|R(x, y)| / \lVert Z^{*}\rVert$, where $Z^{*}$ is given by:

$Z^{*} = (1, -Z^{T})^{T}$ (5)

Here the variable δ is assumed to be the maximum degree of dispersion between the hyperplane $R(x, y) = 0$ and the dataset $(x, y)$. All the training data can be induced to meet the requirement shown in Eq. (6); if the value of δ reaches its maximum in the SVR scheme, the scheme exhibits the best generalization ability:

$|R(x, y)| \le \delta \lVert Z^{*}\rVert$ (6)

Whenever $R(x, y)$ equals ε, the most significant distance is reached. Eq. (6) may then be changed to Eq. (7); considering the translation of the optimization issue to minimizing $\lVert Z\rVert$, and since $\lVert Z^{*}\rVert^{2} = \lVert Z\rVert^{2} + 1$, $\lVert Z^{*}\rVert$ must be minimal to attain the maximum of δ:

$\varepsilon = \delta \lVert Z^{*}\rVert$ (7)

Even with efforts to keep errors within the $(-\varepsilon, \varepsilon)$ range during training, it is still possible for certain errors to surpass this limit. Training errors below $-\varepsilon$ are captured by $\zeta_i$, and those above ε by $\zeta_i^{*}$; the notations $\zeta_i$ and $\zeta_i^{*}$ are defined according to Eqs. (8) and (9), respectively:

$\zeta_i = \begin{cases} 0 & R(x_i, y_i) - \varepsilon \le 0 \\ R(x_i, y_i) - \varepsilon & \text{otherwise} \end{cases}$ (8)

$\zeta_i^{*} = \begin{cases} 0 & \varepsilon - R(x_i, y_i) \le 0 \\ \varepsilon - R(x_i, y_i) & \text{otherwise} \end{cases}$ (9)

By using the ε-insensitive loss function, SVR aims to limit the deviation between the training data and the hyperplane region and to choose the hyperplane that produces the best result. The objective function for SVR optimization is given by Eq. (10):

$\min F(Z, b, \zeta_i, \zeta_i^{*}) = \frac{1}{2}\lVert Z\rVert^{2} + c\sum_{i=1}^{M}(\zeta_i + \zeta_i^{*})$ (10)

with the constraints:

$y_i - Z^{T}\varphi(x_i) - b \le \varepsilon + \zeta_i, \quad i = 1, 2, \ldots, \bar{M}$
$Z^{T}\varphi(x_i) + b - y_i \le \varepsilon + \zeta_i^{*}, \quad i = 1, 2, \ldots, \bar{M}$
$\zeta_i^{*} \ge 0,\ \zeta_i \ge 0, \quad i = 1, 2, \ldots, \bar{M}$

The first term of Eq. (10) restricts the weights so that the regression function remains flat and stable, while the second term balances model accuracy against the tolerance for errors measured by the ε-insensitive loss. After solving this quadratic optimization problem with inequality constraints, the value of the coefficient $Z$ is obtained from Eq. (11):

$Z = \sum_{i=1}^{M}(\beta_i^{*} - \beta_i)\varphi(x_i)$ (11)

The values of $\beta_i^{*}$ and $\beta_i$ are determined by solving a quadratic programming problem that involves the Lagrangian multipliers. Mathematically, the Support Vector Regression function is then expressed by Eq. (12):

$f(x) = \sum_{i=1}^{M}(\beta_i^{*} - \beta_i)K(x_i, x) + b$ (12)

The kernel function $K(x_i, x)$ converts the training data into a higher-dimensional nonlinear feature space. Therefore, this methodology is deemed appropriate for solving problems involving nonlinear relationships. Figure 1 shows the operational diagram for SVR.
Figure 1: The progress and validation flowchart of an SVR scheme
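As a concrete illustration of the formulation above, the sketch below fits an ε-insensitive SVR with an RBF kernel using scikit-learn. The specific values of C, γ, and ε are placeholders only, since in this study those parameters are selected by the metaheuristic optimizers described in the following sections; X_train, y_train, X_test, and y_test refer to the split sketched in Section 2.1.

# Minimal SVR baseline with an RBF kernel (Eq. (12) with a Gaussian kernel).
# C, gamma, and epsilon below are illustrative starting values only; the hybrid
# models of Sections 2.3-2.5 search for these values automatically.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

model = make_pipeline(
    StandardScaler(),                                # SVR is sensitive to feature scales
    SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.1),
)
model.fit(X_train, y_train)                          # data from the split in Section 2.1

pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"Test RMSE: {rmse:.4f}")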
2.3 AOSMA

The plasmodial slime mold's oscillatory mode is the basis for the Slime Mold Algorithm (SMA). The slime mold employs a positive-negative feedback mechanism in conjunction with an oscillatory mode to establish the optimal route toward nutrition [30]. AOSMA is a newer technique that incorporates an opposition-based-learning adaptive decision-making method to improve the slime mold's approaching behavior [31].

Assume that a total of $N$ individuals of the slime mold species reside in a search domain bounded by an upper boundary (UB) and a lower boundary (LB) for the theoretical development of AOSMA. $X_i = (x_i^{1}, x_i^{2}, \cdots, x_i^{d}),\ \forall i \in [1, N]$ is the $i$-th slime mold's location in $d$ dimensions, and $F(X_i),\ \forall i \in [1, N]$ symbolizes the $i$-th slime mold's fitness. The locations and fitness values of the slime mold population at round $t$ are represented as:

$X = \begin{bmatrix} x_1^{1} & x_1^{2} & \cdots & x_1^{d} \\ x_2^{1} & x_2^{2} & \cdots & x_2^{d} \\ \vdots & \vdots & \ddots & \vdots \\ x_N^{1} & x_N^{2} & \cdots & x_N^{d} \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}$ (13)

$F(X) = [F(X_1), F(X_2), \cdots, F(X_N)]$ (14)

In the $(t+1)$-th cycle, the position of the slime mold is updated as in Eq. (15):

$X_i(t+1) = \begin{cases} X_{LB}(t) + V_d\,(W \cdot X_A(t) - X_B(t)) & p_1 \ge \delta \ \text{and} \ p_2 < m_i \\ V_e \cdot X_i(t) & p_1 \ge \delta \ \text{and} \ p_2 \ge m_i \\ rand \cdot (UB - LB) + LB & p_1 < \delta \end{cases}, \quad \forall i \in [1, N]$ (15)

where $X_{LB}$ is the local best slime mold, $X_A$ and $X_B$ are individuals pooled at random, $W$ is the weight factor, $V_d$ and $V_e$ are random velocities, and $p_1$ and $p_2$ are randomly chosen numbers in $[0, 1]$. The probability with which a slime mold restarts from a random search position is fixed at $\delta = 0.03$.

The threshold value $m_i$ of the $i$-th member of the population, which aids in choosing the slime mold's location, is calculated as in Eqs. (16) and (17):

$m_i = \tanh\lvert F(X_i) - F_G\rvert, \quad \forall i \in [1, N]$ (16)

$F_G = F(X_G)$ (17)

$W(SortInd_F(i)) = \begin{cases} 1 + rand \cdot \log\!\left(\dfrac{F_{LB} - F(X_i)}{F_{LB} - F_{LW}} + 1\right) & 1 \le i \le \dfrac{N}{2} \\ 1 - rand \cdot \log\!\left(\dfrac{F_{LB} - F(X_i)}{F_{LB} - F_{LW}} + 1\right) & \dfrac{N}{2} < i \le N \end{cases}$ (18)

where $F_G$ and $X_G$ are the global best fitness and the global best individual, $rand$ is a random number in $[0, 1]$, and $F_{LB}$ and $F_{LW}$ are the local best and worst fitness values. In a minimization problem, the fitness values are sorted in ascending order:

$[Sort_F, SortInd_F] = sort(F)$ (19)

The local best and worst fitness values and the local best slime mold $X_{LB}$ are computed as in Eqs. (20)-(22):

$F_{LB} = F(Sort_F(1))$ (20)
$F_{LW} = F(Sort_F(N))$ (21)
$X_{LB} = X(SortInd_F(1))$ (22)

The randomly assigned velocities $V_d$ and $V_e$ are defined as follows:

$V_d \in [-d, d]$ (23)
$V_e \in [-e, e]$ (24)
$d = \operatorname{arctanh}\!\left(-\dfrac{t}{T} + 1\right)$ (25)
$e = 1 - \dfrac{t}{T}$ (26)

where $T$ is the maximum number of cycles.

SMA holds great promise for both exploration and exploitation in technological problem-solving. However, the improvement of the slime mold rules in SMA still relies on a number of basic cases.

Case 1: The local best slime mold $X_{LB}$ and two random individuals $X_A$ and $X_B$, with velocity $V_d$, drive the update when $p_1 \ge \delta$ and $p_2 < m_i$. This stage makes it easier to strike a balance between exploration and exploitation.

Case 2: The orientation of the slime mold with velocity $V_e$ directs the search when $p_1 \ge \delta$ and $p_2 \ge m_i$. This case facilitates exploitation.

Case 3: When $p_1 < \delta$, the individual is reinitialized within the specified search domain. This phase facilitates exploration.

Case 1 shows that the chances of finding solutions are improperly controlled during exploration and exploitation, since $X_A$ and $X_B$ are two random slime molds. To get around this limitation, the local best individual $X_{LB}$ can be used in place of $X_A$. Consequently, the location of the $i$-th component is remodeled as Eq. (27):

$Xn_i(t) = \begin{cases} X_{LB}(t) + V_d\,(W \cdot X_{LB}(t) - X_B(t)) & p_1 \ge \delta \ \text{and} \ p_2 < m_i \\ V_e \cdot X_i(t) & p_1 \ge \delta \ \text{and} \ p_2 \ge m_i \\ rand \cdot (UB - LB) + LB & p_1 < \delta \end{cases}$ (27)

Case 2 illustrates how the slime mold deliberately targets a nearby location, which can result in a path with a lower fitness level; a better approach is to implement an adaptive decision system. Case 3 shows that the SMA does offer a criterion for exploration, but with the small value $\delta = 0.03$ the exploration is limited, so an auxiliary exploration mechanism must be introduced. A practical approach to addressing the limitations of Cases 2 and 3 is a flexible decision strategy that leverages opposition-based learning (OBL) to determine whether additional exploration is needed [32]. OBL uses a point $Xop_i$ in the search domain that is precisely the opposite of $Xn_i$ for each member $(i = 1, 2, \cdots, N)$ and compares it when updating the next cycle's position; this assists in improving convergence and reduces the chance of being trapped in local minima. The $j$-th dimension $(j = 1, 2, \cdots, s)$ of $Xop_i$ for the $i$-th individual is described as follows:

$Xop_i^{j} = \min(Xn_i(t)) + \max(Xn_i(t)) - Xn_i^{j}(t)$ (28)

$Xr_i$ represents the $i$-th member's position in the minimization problem and is defined as:

$Xr_i = \begin{cases} Xop_i(t) & F(Xop_i(t)) < F(Xn_i(t)) \\ Xn_i(t) & F(Xop_i(t)) \ge F(Xn_i(t)) \end{cases}$ (29)

A flexible decision is formed by comparing the previous fitness value $F(X_i(t))$ with the present fitness value $F(Xn_i(t))$ in the event of a depleted nutrient pathway, and the position for the subsequent cycle is then updated:

$X_i(t+1) = \begin{cases} Xn_i(t) & F(Xn_i(t)) \le F(X_i(t)) \\ Xr_i(t) & F(Xn_i(t)) > F(X_i(t)) \end{cases}, \quad \forall i \in [1, N]$ (30)

The AOSMA framework described above is displayed in pseudo-code in Algorithm 1.

In this study, the Adaptive Opposition Slime Mold Algorithm (AOSMA) is used not as a standalone optimizer but as a hybrid component integrated with Support Vector Regression (SVR). AOSMA optimizes three key hyperparameters of SVR (the regularization parameter C, the epsilon-insensitive loss margin ε, and the kernel coefficient γ) with the goal of minimizing the prediction error measured by RMSE. Through its adaptive opposition-based learning strategy and dynamic parameter control, AOSMA allows for more effective exploration of the search space and helps prevent premature convergence. As a result, the hybrid AOSMA-SVR model achieves better accuracy and generalization in predicting California Bearing Ratio (CBR) values from geotechnical data.
Algorithm 1: AOSMA
Begin
Using the search boundary range [LB, UB], choose a target (fitness) function f with inputs N, s, T, and δ.
Outputs: X_G and F_G
Initialization: launch the slime mold population at random positions X_i = (x_i^1, x_i^2, ..., x_i^d), ∀i ∈ [1, N], inside the search boundaries UB and LB.
t = 1
while (t ≤ T)
→ Determine the fitness values F(X) of the N slime molds.
→ Sort the fitness values.
→ Update the local best individual X_LB to match the local best fitness F_LB.
→ Update the local worst fitness F_LW.
→ Update the corresponding global best individual X_G and global best fitness F_G.
→ Refresh the weight W.
→ Update d using Eq. (25) and e using Eq. (26).
for (each slime mold i = 1:N)
o Generate the random numbers p1 and p2.
o Compute the threshold quantity m_i.
o Using Eq. (27), determine the new slime mold location Xn_i.
o Determine the fitness F(Xn_i) of the new slime mold.
if (F(Xn_i) > F(X_i)) // Adaptive decision strategy
• Estimate Xop_i using Eq. (28). // Opposition-based learning
• Select Xr_i using Eq. (29).
end
o Update the next-cycle slime mold X_i using Eq. (30).
end
→ Next iteration: t = t + 1
end
The result is X_G, representing the global best solution.
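To make the adaptive decision step concrete, the fragment below sketches one opposition-based update in the spirit of Eqs. (28)-(30) for a population of candidate solutions. It is a simplified illustration rather than a faithful re-implementation of the full AOSMA: opposite points are built per dimension from the population-wise minimum and maximum, which is one common reading of Eq. (28).

# Sketch of the opposition-based decision step used by AOSMA (Eqs. (28)-(30)).
import numpy as np

def opposition_step(pop, fitness, objective):
    """pop: (N, d) candidate solutions; fitness: (N,) current objective values (minimisation)."""
    lo, hi = pop.min(axis=0), pop.max(axis=0)      # per-dimension bounds of the current swarm
    opposite = lo + hi - pop                        # opposition-based candidates (Eq. (28) style)
    f_opp = np.array([objective(x) for x in opposite])
    improved = f_opp < fitness                      # keep whichever point is better (Eq. (29))
    new_pop = np.where(improved[:, None], opposite, pop)
    new_fit = np.where(improved, f_opp, fitness)
    return new_pop, new_fit

# Example usage with a toy objective (sphere function):
rng = np.random.default_rng(42)
pop = rng.uniform(-5, 5, size=(10, 3))
fit = np.array([np.sum(x ** 2) for x in pop])
pop, fit = opposition_step(pop, fit, lambda x: np.sum(x ** 2))
print(fit.min())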
2.4 AFT

The present investigation follows the basic AFT algorithm's mathematical model as described in [33]. The scheme encompasses three cases, which can be analyzed and delineated as follows.

Case 1: The pursuit of Ali Baba by the thieves, based on information obtained from a source, can be described by simulating their positions as illustrated in Eq. (31):

$x_i^{t+1} = gbest^{t} + \left[Td^{t}\,(best_i^{t} - y_i^{t})\,r_1 + Td^{t}\,(y_i^{t} - m_{a(i)}^{t})\,r_2\right]\operatorname{sgn}(rand - 0.5), \quad p \ge 0.5,\ q > Pp^{t}$ (31)

where $y_i^{t}$ represents Ali Baba's position with respect to thief $i$, $m_{a(i)}^{t}$ represents the amount of cleverness that Marjaneh uses to cover up theft $i$, $x_i^{t+1}$ denotes the position of the $i$-th thief, $gbest^{t}$ is the best global position any thief has ever reached, $r_1$, $r_2$, $rand$, $p$, and $q$ are random values created within $[0, 1]$, $best_i^{t}$ is the best position thief $i$ has found, $Td^{t}$ is the thieves' tracking distance as specified by Eq. (32), $p \ge 0.5$ evaluates to either 0 or 1, $Pp^{t}$ is Ali Baba's potential perceptive ability as stated by Eq. (33), $\operatorname{sgn}(rand - 0.5)$ can be $-1$ or $1$, and $a$ is defined by Eq. (34).

$Td^{t} = \tau_0\,e^{-\tau_1 (t/T)^{\tau_1}}$ (32)

where $t$ and $T$ denote the current and maximum iteration counts, respectively, $\tau_0$ ($\tau_0 = 1$) is a preliminary estimate of the tracking distance, and $\tau_1$ ($\tau_1 = 2$) is a fixed quantity that regulates exploration and exploitation.

$Pp^{t} = \lambda_0 \log\!\left(\lambda_1 (t/T)^{\lambda_0}\right)$ (33)

where $\lambda_0$ ($\lambda_0 = 1$) depicts the final assessment of the thieves' chances of completing their task after the hunt, and $\lambda_1$ ($\lambda_1 = 1$) is a fixed value that controls exploration and exploitation.

$a = \lfloor (n - 1) \cdot rand(n, 1) \rfloor$ (34)

The vector $rand(n, 1)$ is generated as a set of random numbers within the bounds $[0, 1]$.

$m_{a(i)}^{t} = \begin{cases} x_i^{t} & f(x_i^{t}) \ge f(m_{a(i)}^{t}) \\ m_{a(i)}^{t} & f(x_i^{t}) < f(m_{a(i)}^{t}) \end{cases}$ (35)

where $f(\cdot)$ denotes the score of the fitness function.

Case 2: Thieves may perceive that they have been tricked and will likely start exploring unfamiliar and unplanned areas:

$x_i^{t+1} = Td^{t}\left[(u_j - l_j)\,r + l_j\right], \quad p \ge 0.5,\ q \le Pp^{t}$ (36)

where the upper and lower bounds of the search domain at dimension $j$ are given by $u_j$ and $l_j$, respectively, and $r$ is a stochastic quantity generated in the interval $[0, 1]$.

Case 3: To improve AFT's exploration and exploitation capabilities, thieves can also investigate search positions other than those identified through Eq. (31). This scenario is formulated as Eq. (37):

$x_i^{t+1} = gbest^{t} - \left[Td^{t}\,(best_i^{t} - y_i^{t})\,r_1 + Td^{t}\,(y_i^{t} - m_{a(i)}^{t})\,r_2\right]\operatorname{sgn}(rand - 0.5)$ (37)

Algorithm 2 concisely and formally describes the iterative pseudo-code stages that correspond to the core AFT.
Algorithm 2: AFT
Establish the control settings and initialize.
Assess every thief's starting, best, and global positions.
Assess Marjaneh's intelligence with respect to all thieves.
Set t ← 1
while (t ≤ T) do
Eq. (33) is used to update the parameter Pp^t.
for each thief do
if (p ≥ 0.5) then
if (q ≥ Pp^t) then
Update the thief's position using Eq. (31).
else
Update the thief's position using Eq. (36).
end if
else
Update the thief's position using Eq. (37).
end if
end for
Refresh all thieves' current, best, and global positions.
Using Eq. (35), update Marjaneh's wit targets.
t = t + 1
end while
Return the global best solution.
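For readers who prefer code to notation, the fragment below sketches one AFT-style iteration covering the position updates of Eqs. (31), (36), and (37). It is a simplified, self-contained illustration under the constants quoted in the text (τ0 = 1, τ1 = 2, λ0 = 1, λ1 = 1), not the authors' implementation; the small offset inside the logarithm is an added safeguard against log(0).

# Simplified sketch of one AFT iteration (Eqs. (31)-(37)); minimisation is assumed elsewhere.
import numpy as np

rng = np.random.default_rng(0)

def aft_step(pos, best, y, m, gbest, t, T, lb, ub,
             tau0=1.0, tau1=2.0, lam0=1.0, lam1=1.0):
    """pos, best, y, m: (n, d) arrays; gbest: (d,); lb, ub: scalars or (d,) bounds."""
    n, d = pos.shape
    Td = tau0 * np.exp(-tau1 * (t / T) ** tau1)            # tracking distance, Eq. (32)
    Pp = lam0 * np.log(lam1 * (t / T) ** lam0 + 1e-12)     # perception potential, Eq. (33)
    new = np.empty_like(pos)
    for i in range(n):
        r1, r2, r, p, q = rng.random(5)
        sign = np.sign(rng.random() - 0.5)
        if p >= 0.5 and q > Pp:                            # Case 1, Eq. (31)
            new[i] = gbest + (Td * (best[i] - y[i]) * r1 + Td * (y[i] - m[i]) * r2) * sign
        elif p >= 0.5:                                     # Case 2, Eq. (36)
            new[i] = Td * ((ub - lb) * r + lb)
        else:                                              # Case 3, Eq. (37)
            new[i] = gbest - (Td * (best[i] - y[i]) * r1 + Td * (y[i] - m[i]) * r2) * sign
    return np.clip(new, lb, ub)                            # keep candidates inside the search domain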
2.5 Dingo optimization algorithm (DOA)

From the earliest times, nature has consistently been regarded as an exceptionally instructive and impactful educator. Every species on Earth possesses a distinct and unique mechanism for ensuring its survival. The present study involves the mathematical modeling of hunting behavior and social arrangements in the dingo species; this analytical approach is the basis for developing the DOA, a nature-inspired optimization technique [34]. The two primary constituents of DOA are exploration and exploitation. The algorithm generates various candidate solutions within the search domain during the initial exploration phase, while the subsequent exploitation phase identifies and pursues the most desirable solutions within the predetermined space. To find the optimal solution to a practical problem, both constituents must be refined and integrated; nonetheless, achieving equilibrium between them is arduous due to the stochastic nature of the algorithm. The impetus for developing hybridized meta-heuristic implementations for authentic engineering problems is derived from this notion [34]. Dingo optimization is carried out by computationally modeling the pursuit, encirclement, and attack of prey.

2.5.1 Encircling

Given the lack of prior knowledge about the search space and its optimum, it is assumed that the target prey corresponds to the best agent currently available, reflecting the social hierarchy of dingoes. The following mathematical formulas formalize the dingoes' behavior:

$\vec{D}_d = \lvert \vec{A} \cdot \vec{P}_p(x) - \vec{P}(i) \rvert$ (38)

$\vec{P}(i+1) = \vec{P}_p(x) - \vec{B} \cdot \vec{D}(d)$ (39)

$\vec{A} = 2 \cdot \vec{a}_1$ (40)

$\vec{B} = 2\vec{b} \cdot \vec{a}_2 - \vec{b}$ (41)

$\vec{b} = 3 - \left(I \times \frac{3}{I_{max}}\right)$ (42)

The neighborhood dingoes' geographic coordinates are displayed as a two-dimensional vector. The dingo may adjust its position to the coordinates $(P, Q)$ based on the prey's location, which is displayed as $(P^{*}, Q^{*})$. By adjusting the $\vec{A}$ and $\vec{B}$ vectors around the present position, every possible location around the ideal agent can be reached; for example, setting $\vec{A} = (1, 0)$ and $\vec{B} = (1, 1)$ gives access to the dingo position $(P^{*} - P, Q^{*})$. In this way, Eqs. (38) and (39) make it easier for dingoes to travel throughout the hunting area and find their prey randomly.

2.5.2 Hunting

In formulating the dingo hunting strategy mathematically, it is assumed that the alpha, beta, and other members of the pack have a thorough awareness of possible prey sites. When conducting hunting trips, the alpha dingo always takes the lead; however, other dingoes, including the beta, may hunt as well. Eqs. (43) to (51) are developed in line with this discussion.

$\vec{D}_{\alpha} = \lvert \vec{A}_1 \cdot \vec{P}_{\alpha} - \vec{P} \rvert$ (43)
$\vec{D}_{\beta} = \lvert \vec{A}_2 \cdot \vec{P}_{\beta} - \vec{P} \rvert$ (44)
$\vec{D}_{o} = \lvert \vec{A}_3 \cdot \vec{P}_{o} - \vec{P} \rvert$ (45)
$\vec{P}_1 = \lvert \vec{P}_{\alpha} - \vec{B} \cdot \vec{D}_{\alpha} \rvert$ (46)
$\vec{P}_2 = \lvert \vec{P}_{\beta} - \vec{B} \cdot \vec{D}_{\beta} \rvert$ (47)
$\vec{P}_3 = \lvert \vec{P}_{o} - \vec{B} \cdot \vec{D}_{o} \rvert$ (48)

The following formulae are used to determine each dingo's intensity:

$I_{\alpha} = \log\!\left(\frac{1}{F_{\alpha} - (1E{-}100)} + 1\right)$ (49)
$I_{\beta} = \log\!\left(\frac{1}{F_{\beta} - (1E{-}100)} + 1\right)$ (50)
$I_{o} = \log\!\left(\frac{1}{F_{o} - (1E{-}100)} + 1\right)$ (51)

2.5.3 Attacking

If a position update is unavailable, it may be inferred that the dingo has successfully concluded its hunt with a predatory attack. To articulate the strategy formally, the value of $\vec{b}$ is systematically diminished linearly. Note that the variation range of $\vec{A}_{\alpha}$ is also diminished by $\vec{b}$: it is a stochastic variable generated within the range $[-3b, 3b]$, where the constant $\vec{b}$ decreases from 3 to 0 over the cycles. When the $\vec{A}_{\alpha}$ values are randomly generated within the interval $[-1, 1]$, an exploratory agent is capable of moving to any position along the trajectory between its existing location and the prey's location.
2.5.4 Searching

Dingoes exhibit hunting patterns primarily determined by their pack's location, and they consistently progress in pursuit of locating and subduing prey. $\vec{B}$ represents random variables: if the value assigned to $\vec{B}$ falls below $-1$, the prey is retreating from the search agent; conversely, if $\vec{B}$ exceeds 1, the pack is advancing toward its prey. This intervention allows the DOA to conduct a comprehensive global reconnaissance of identified targets. One factor contributing to a heightened probability of exploration within the DOA is the component denoted by $\vec{A}$. In Eq. (40), the vector $\vec{A}$ can generate a range of random numbers within the interval between 0 and 3, independent of the weight of the selected prey. $\vec{A}$ can be characterized as a stochastic vector in which elements with values less than or equal to one take priority over those greater than or equal to one; this feature elucidates the influence of the gap described in Eq. (38).

The hybrid framework combines the Dingo Optimization Algorithm (DOA) with Support Vector Regression (SVR) to tune the hyperparameters C, ε, and γ. Inspired by the natural hunting strategies of dingoes, such as surrounding, chasing, and attacking prey, the DOA translates these behaviors into search operators that explore the SVR parameter space. Its aim is to minimize the RMSE of SVR on the training data by identifying the best parameter combination. By balancing exploration and exploitation, the DOA-SVR hybrid effectively avoids local optima and improves SVR's generalization ability, leading to more accurate CBR predictions. Algorithm 3 offers the pseudo-code for the DOA.
Algorithm 3: Dingo Optimization
Input: the population of dingoes D_n (n = 1, 2, ..., n)
Output: the best dingo (here, the best values are minima)
Generate initial search agents D_in
Initialize the values of b, A, and B.
while the termination condition is not reached do
Appraise each dingo's fitness and intensity cost.
D_α = dingo with the best search result
D_β = dingo with the second-best search result
D_o = the remaining dingoes' search outcomes
Cycle 1
repeat
for i = 1:D_in do
Update the current search agent's state.
end for
Evaluate the fitness and intensity cost of the dingoes.
Record the values of S_α, S_β, S_δ.
Record the values of b, A, and B.
Iteration = Iteration + 1
until cycle ≥ stopping criteria
output
end while
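All three optimizers score candidate hyperparameter vectors against the same objective described in Sections 2.3-2.5: the training RMSE of an RBF-kernel SVR fitted with that candidate. A plausible sketch of this shared fitness function, assuming the scikit-learn SVR used earlier, is given below; the search bounds are illustrative assumptions rather than values reported in the paper.

# Sketch of the common fitness function minimised by AFT, AOSMA and DOA:
# a candidate vector (C, gamma, epsilon) is scored by the training RMSE of an
# RBF-kernel SVR. LOWER and UPPER are assumed search bounds for illustration.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

LOWER = np.array([1e-2, 1e-4, 1e-3])   # lower bounds for (C, gamma, epsilon)
UPPER = np.array([1e3, 1e1, 1e0])      # upper bounds for (C, gamma, epsilon)

def svr_rmse(candidate, X_train, y_train):
    C, gamma, epsilon = np.clip(candidate, LOWER, UPPER)
    model = make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=C, gamma=gamma, epsilon=epsilon))
    model.fit(X_train, y_train)
    pred = model.predict(X_train)
    return float(np.sqrt(mean_squared_error(y_train, pred)))

# Any of the optimizers sketched above can then minimise
# lambda c: svr_rmse(c, X_train, y_train) within [LOWER, UPPER].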
The choice of AFT, AOSMA, and DOA as optimizers was driven by their distinct algorithmic bases and search methods, enabling a thorough comparison of their metaheuristic behaviors. These approaches are relatively recent and less studied, yet they show competitive performance in diverse regression and engineering tasks. Incorporating them with SVR in this research allows evaluation of both their predictive accuracy and their optimization stability across different algorithmic frameworks.

2.6 Reproducibility and run settings

To ensure the robustness and reproducibility of the results, each hybrid SVR model (AFT-SVR, DOA-SVR, AOSMA-SVR) was executed 30 independent times, which allows for reliable statistical analysis of model performance. Additionally, random seed initialization was controlled using a fixed seed (e.g., seed = 42) across all algorithms during training and optimization to maintain consistent behavior during repeated runs and to support reproducibility.

2.7 Hybridization strategy of SVR with metaheuristic algorithms

This study developed three hybrid machine learning models (SVAF, SVSM, and SVDO) by integrating Support Vector Regression (SVR) with three advanced metaheuristic optimization algorithms: Alibaba and the Forty Thieves (AFT), the Adaptive Opposition Slime Mold Algorithm (AOSMA), and the Dingo Optimization Algorithm (DOA). The goal is to boost SVR's prediction accuracy by optimizing its key hyperparameters (penalty parameter C, kernel parameter γ, and epsilon-insensitive loss ε) using the global search methods provided by these metaheuristics. While SVR is a strong nonlinear regression technique, its effectiveness heavily relies on
proper parameter tuning. Traditional manual or grid search methods are often inefficient or may yield suboptimal results, especially with complex, high-dimensional geotechnical data. Therefore, this hybrid approach exploits the global search and convergence strengths of nature-inspired algorithms to automate SVR hyperparameter optimization.

- In SVAF, the AFT algorithm explores the search space dynamically through mechanisms like global surveillance, balancing exploration and exploitation, and adaptive decision-making inspired by Marjaneh. These features enable it to identify optimal SVR parameters reliably.
- In SVSM, AOSMA enhances the slime mold algorithm with opposition-based learning and adaptive strategies, allowing it to escape local minima more effectively and converge more rapidly, thus providing better hyperparameter configurations.
- In SVDO, the DOA mimics the social hunting behaviors of dingoes, such as encircling, attacking, and searching, to iteratively fine-tune the SVR parameters for higher prediction accuracy.

Each metaheuristic aimed to minimize the RMSE of SVR predictions on the training data, with the best parameter set used to train the final hybrid model. The process was repeated 30 times to ensure stability and reproducibility. This hybrid approach directly supports the study's goal of creating accurate, efficient, and generalizable models for predicting the California Bearing Ratio (CBR) of soils. Using these metaheuristics not only enhances SVR's learning ability but also reduces the manual effort and computational cost typically required for parameter tuning.

2.8 Performance evaluation tactics

A range of evaluators was deployed to appraise the hybrid schemes' performance in CBR value prediction. The list of evaluators comprises RMSE, MSE, R², the ratio of RMSE to the standard deviation of the observations (RSR), and the weighted absolute percentage error (WAPE). R² determines the degree of linear relationship between the actual and forecasted magnitudes. The RMSE is the square root of the mean squared deviation of the estimated values from the actual values. WAPE is quantified by dividing the total absolute error by the total of the actual values. Eqs. (52)-(56) define these metrics:

$R^{2} = \dfrac{\left(\sum_{i=1}^{n}(b_i - \bar{b})(d_i - \bar{d})\right)^{2}}{\left[\sum_{i=1}^{n}(b_i - \bar{b})^{2}\right]\left[\sum_{i=1}^{n}(d_i - \bar{d})^{2}\right]}$ (52)

$RMSE = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(d_i - b_i)^{2}}$ (53)

$MSE = \dfrac{1}{n}\sum_{i=1}^{n}(d_i - b_i)^{2}$ (54)

$RSR = \dfrac{RMSE}{St.Dev}$ (55)

$WAPE = \dfrac{\sum_{i=1}^{n}\lvert d_i - b_i\rvert}{\sum_{i=1}^{n}\lvert b_i\rvert}$ (56)

where $n$ indicates the number of samples, $d_i$ denotes the forecasted value, $b_i$ denotes the actual value, and $\bar{d}$ and $\bar{b}$ represent the mean of the forecasted values and the mean of the actual values, respectively.

3 Outcomes and discussion

This paper reports on developing a Support Vector Regression model with three enhancement techniques, AFT, AOSMA, and DOA, yielding three hybrid predictive models for estimating the CBR of soils. As in the previous sections, the dataset was divided into two subsets, a training set and a validation set, comprising 70% and 30% of the data, respectively. Five statistical metrics, namely R², RMSE, MSE, RSR, and WAPE, were considered to obtain a full view of the optimizers' performance; the outcomes are shown in Table 3. The statistical indicators are analyzed in this section to determine whether one model is generally better. By studying the R² values of the different schemes, it is clear that the most promising outcomes are given by SVAF in both the testing and training stages, with values of 0.9968 and 0.9929, respectively. Meanwhile, the minimum R² value among all comparative schemes was obtained by the SVSM model at 0.9767. It is worth mentioning that all the schemes show increased R² during their test phases, indicating that the schemes are well trained. The maximum RMSE, MSE, RSR, and WAPE values in training are 1.6271, 2.6475, 0.1524, and 0.0334, all for SVSM. For the testing section, the maximum RMSE, MSE, RSR, and WAPE values are 1.5824, 2.5042, 0.1409, and 0.0312, again for SVSM. By contrasting the evaluator and error values, the best hybrid scheme for estimating the CBR value of soils is the combination of SVR and the AFT algorithm (SVAF); this model has the highest R² value (0.9968 in the testing phase) and the lowest error value (an RMSE of 0.7946 in testing) among all the schemes.
Table 3: The hybridized schemes produced the findings

Metric | SVAF Train | SVAF Test | SVSM Train | SVSM Test | SVDO Train | SVDO Test | SVR Train | SVR Test
RMSE | 0.9316 | 0.7946 | 1.6271 | 1.5824 | 1.3363 | 1.171 | 1.336392 | 1.171305
R² | 0.9929 | 0.9968 | 0.9767 | 0.9825 | 0.9852 | 0.992 | 0.985202 | 0.992446
MSE | 0.868 | 0.6314 | 2.6475 | 2.5042 | 1.7859 | 1.372 | 1.7859 | 1.372
RSR | 0.0872 | 0.0708 | 0.1524 | 0.1409 | 0.1251 | 0.1043 | 0.1251 | 0.1043
WAPE | 0.0162 | 0.0141 | 0.0334 | 0.0312 | 0.0234 | 0.0212 | 0.0234 | 0.0212
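The five measures reported in Table 3 follow directly from Eqs. (52)-(56). A compact sketch of their computation is shown below, with d denoting predicted values and b measured CBR values, matching the notation of Section 2.8; it is an illustration of the formulas rather than the authors' evaluation code.

# Sketch of the evaluation metrics of Eqs. (52)-(56); d = predicted, b = measured CBR.
import numpy as np

def evaluate(b, d):
    b, d = np.asarray(b, float), np.asarray(d, float)
    resid = d - b
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    r_num = np.sum((b - b.mean()) * (d - d.mean())) ** 2
    r_den = np.sum((b - b.mean()) ** 2) * np.sum((d - d.mean()) ** 2)
    r2 = r_num / r_den
    rsr = rmse / np.std(b)                     # RMSE divided by the std. dev. of the observations
    wape = np.sum(np.abs(resid)) / np.sum(np.abs(b))
    return {"R2": r2, "RMSE": rmse, "MSE": mse, "RSR": rsr, "WAPE": wape}

# Example: metrics = evaluate(y_test, model.predict(X_test))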
Fig. 2 displays scatter plots illustrating the correlation between the gauged and expected California Bearing Ratio values, annotated with the numerical R² and RMSE assessments. As the RMSE decreases, the point density around the centerline increases, because RMSE acts as a deviation controller; likewise, the training and testing data points are drawn toward the central axis as the R² evaluator improves. The figure also shows the linear regression centerline positioned at Y = X, as well as two red lines below and above the midline at Y = 0.9X and Y = 1.1X, respectively; points beyond the lower and upper lines correspond to underestimation and overestimation of values, respectively. Three schemes were produced by combining the SVR scheme with the three optimizer strategies applied to training and testing, and Fig. 2 shows the findings of the current investigation. The R² of SVAF appears comparatively more favorable than that of the rest of the schemes, because its data points maintain the same directionality and lie nearer the centerline. From the empirical data it can be deduced that in all cases, and quite noticeably for SVDO, the precision of the test-phase values is higher than that of the training phase. Overall, the most favorable result in Fig. 2 is obtained using the SVR method with the AFT optimizer, since its R² and RMSE in both learning and validation are the best; this can be attributed to the capability of this model to minimize error and to its superior performance with respect to R².
Figure 2: The scatter plot of expected and measured values
Fig. 3 presents the correlation between expected and actual CBR values obtained using the three classes of hybrid schemes. The graphs are divided into two distinct parts: model training and model validation. Among them, SVAF, representing the combination of SVR and the AFT algorithm, generates the closest agreement between the gauged CBR values and the expected output for both the testing and training data sets. By contrast, the least favorable agreement appears quite clearly in the union of SVR and AOSMA, i.e., SVSM.
Figure 3: The comparison line-symbol plot between expected and gauged CBR values
Fig. 4 presents the deviations between the gauged and estimated values obtained through the three hybrid schemes for the California Bearing Ratio. This figure indicates that the greatest error for SVSM when assessed is around 18%, whereas for schemes undergoing training, it was 12% in the same set. The figure shows that, for the highest and lowest performing schemes, the majority of errors fall in a narrower range of (-3, 3)% for SVAF and (-6, 17)% for SVSM.
Figure 4: The error distribution of the schemes over samples shown in a time series plot.
The errors in the observed CBR values for the three different hybrid scheme types—SVAF, SVSM, and SVDO—are displayed in Fig. 5. Based on this figure, the maximum errors are about 11% and 7% for SVSM during training and testing of the schemes. The figure reflects the distribution of 25-75% of errors in a range narrower than (-1, 1)% for SVAF and (-3, 3)% for SVSM, the best and worst schemes, respectively.
Figure 5: The standard half-box plot showing the error ratio of the hybrid schemes created.
To enhance the statistical robustness of the proposed models, 95% confidence intervals for the R² values were calculated based on multiple independent runs of each algorithm. As shown in Table 4, the standard SVR model has the widest interval, from 0.6302 to 0.7631, indicating greater variability and less predictive stability. In contrast, the three hybrid SVR models display narrower intervals with higher upper bounds, signifying more consistent performance. Among these, the SVR model combined with the Alibaba and Forty Thieves algorithm (SVAF) achieved the most favorable confidence interval, from 0.7243 to 0.8078, reflecting both high accuracy and robustness across runs. The SVR-Dingo Optimization Algorithm model also performed well, with a confidence interval of 0.7120 to 0.8298, slightly broader but with the highest upper bound. Meanwhile, the SVR-AOSMA model shows an interval between 0.6653 and 0.7848, ranking it between the other hybrids in stability and performance. These intervals confirm that the SVAF model not only offers high prediction accuracy but also delivers consistent results, making it the most reliable model among those tested for CBR estimation.
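A small sketch of how such a 95% confidence interval for R² could be obtained from repeated runs is given below; it assumes a t-based interval over the per-run scores (the paper does not specify the exact procedure, and the example values are illustrative only).

```python
import numpy as np
from scipy import stats

def r2_confidence_interval(r2_runs, confidence=0.95):
    """Two-sided t-based confidence interval for the mean R2 over independent runs."""
    r2_runs = np.asarray(r2_runs, dtype=float)
    mean = r2_runs.mean()
    sem = stats.sem(r2_runs)          # standard error of the mean
    return stats.t.interval(confidence, df=len(r2_runs) - 1, loc=mean, scale=sem)

# Example with placeholder scores (not the paper's run-level results):
# r2_confidence_interval([0.71, 0.78, 0.75, 0.80, 0.77])
```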
Table 4: Confidence intervals based on R2
Model Lower Bound Upper Bound
SVR 0.6303 0.7632
SVR + Dingo Optimization Algorithm 0.7120 0.8298
SVR + Adaptive Opposition Slime Mould Algorithm 0.6653 0.7848
SVR + Alibaba and the Forty Thieves 0.7243 0.8078

4 Sensitivity analysis
The ANOVA-based sensitivity analysis conducted on the performance of different predictive models for estimating the California Bearing Ratio (CBR) reveals statistically significant differences among the models. The confidence intervals for the coefficient of determination (R²) provide insight into each model's accuracy and robustness. The baseline SVR model exhibits the lowest performance with a confidence interval ranging from 0.630 to 0.763, indicating relatively limited predictive power. In contrast, the SVR models enhanced with metaheuristic algorithms demonstrate superior performance. Among these, the SVR-Dingo Optimization Algorithm model shows a confidence interval between 0.712 and 0.830, reflecting substantial improvement over the baseline. Similarly, the SVR-Adaptive Opposition Slime Mould Algorithm model yields a confidence range of 0.665 to 0.785, suggesting better stability and generalization. Notably, the SVR-Alibaba and the Forty Thieves (SVAF) model achieves the highest lower bound (0.724) and an upper bound of 0.808, indicating both high precision and consistent performance. The limited overlap between the confidence intervals of the SVAF model and those of the other models supports the claim of its statistically significant superiority. This distinction highlights the effectiveness of the AFT optimizer in enhancing SVR's learning capability and minimizing prediction errors. Overall, the results of the ANOVA test confirm that metaheuristic-optimized SVR models, particularly SVAF, provide more accurate and reliable predictions of CBR values compared to the standard SVR approach.
Table 5: Sensitivity analysis based on ANOVA
Models Lower bound Upper bound
SVR 0.630 0.763
SVR-Dingo Optimization Algorithm 0.712 0.830
SVR-Adaptive Opposition Slime Mould Algorithm 0.665 0.785
SVR-Alibaba and the Forty Thieves 0.724 0.808
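A minimal sketch of a one-way ANOVA over per-run R² scores of the four schemes, using scipy, is shown below; the lists are hypothetical placeholders rather than the study's actual run-level results.

```python
from scipy.stats import f_oneway

# Hypothetical per-run R2 scores for each scheme (placeholders, not the paper's data).
svr  = [0.64, 0.70, 0.68, 0.73, 0.71]
svdo = [0.74, 0.79, 0.77, 0.81, 0.78]
svsm = [0.68, 0.73, 0.71, 0.76, 0.74]
svaf = [0.75, 0.79, 0.77, 0.80, 0.78]

f_value, p_value = f_oneway(svr, svdo, svsm, svaf)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")  # p < 0.05 would indicate significant differences
```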
5 Discussion
This section compares the three hybrid models—SVAF (SVR + AFT), SVSM (SVR + AOSMA), and SVDO (SVR + DOA)—focusing on their predictive accuracy, convergence behavior, and computational efficiency. As shown in Table 3, SVAF outperforms the others across all five metrics: R², RMSE, MSE, RSR, and WAPE. During testing, SVAF achieved the highest R² (0.9968) and the lowest RMSE (0.7946), indicating excellent generalization and minimal error in estimating CBR values. This success stems from the adaptive balance between exploration and exploitation in the Alibaba and Forty Thieves (AFT) optimization strategy, which enhances SVR's ability to find optimal hyperparameters. The random surveillance mechanism in AFT promotes global search, while Marjaneh's intelligence adjustment enhances local refinement, enabling rapid convergence toward optimal SVR settings. In contrast, the SVSM model, which employs the Adaptive Opposition Slime Mould Algorithm, showed weaker performance (R² = 0.9825, RMSE = 1.5824 during testing). Although AOSMA incorporates opposition-based learning to boost exploration, it can produce more oscillatory convergence patterns, possibly leading to suboptimal SVR tuning. Its complex adaptive threshold settings may also increase sensitivity to initial parameters. The SVDO model (SVR + Dingo Optimization Algorithm) performed moderately (R² = 0.992, RMSE = 1.171). DOA utilizes biologically inspired social hunting behaviors, facilitating effective neighborhood search. However, its slower convergence during exploitation may limit its ability to finely tune SVR hyperparameters, especially in high-dimensional spaces. Regarding computational efficiency, SVAF requires slightly more training time than SVSM and SVDO due to the multiple adaptive conditions and surveillance cycles in AFT, but its superior accuracy justifies this cost. SVSM offers faster runtimes but less predictive precision. SVDO falls between the two in terms of performance and computational demand. Overall, the findings suggest that SVAF provides the best balance between accuracy and optimization quality, making it a strong candidate for practical CBR prediction tasks. Future research could explore combining AOSMA's rapid convergence with AFT's stability to improve training efficiency without sacrificing accuracy. Future research will also aim to improve the models' applicability across various regions by testing them on datasets with diverse soil types. Combining Support Vector Regression with deep learning—for example, as a post-processing tool after deep feature extraction—could boost prediction accuracy, particularly for large or complex datasets. Another valuable approach is integrating these hybrid AI models into geotechnical software platforms, allowing real-time, data-driven decision-making in engineering and construction projects. Although the hybrid SVR models presented demonstrated strong predictive performance on the available dataset, there are some limitations to consider. Firstly, without an external validation set, the
generalizability of the results may be restricted beyond the current data. Secondly, the relatively small sample size increases the risk of overfitting, especially with the use of metaheuristic optimization. Additionally, the dataset only encompasses a limited range of soil types and regions, which could limit the models' broader applicability. It is also important to note that larger, more diverse datasets might benefit from alternative modeling techniques such as deep learning or ensemble methods to achieve better predictive accuracy. These limitations will be addressed in future research to improve the models' robustness and generalizability. To enhance model robustness, we plan to use regularization, such as L1/L2 penalties and early stopping, to prevent overfitting. Models will be tested under various conditions—smaller datasets and more noise—to check resilience. Including confidence intervals or error margins for metrics like RMSE and R² will better measure uncertainty. These steps will help create more reliable, generalizable models for geotechnical uses.

6 Conclusion
The current investigation has adopted an SVR scheme to project the CBR value of soil. Although the outcomes of the conventional method were effective, it had some limitations: the laboratory process is costly and is not considered time-effective. The drawbacks above can be overcome by substituting the laboratory procedure with a software-based, artificial-intelligence approach. The accuracy of the system in predicting the CBR was quite remarkable. The input variables were selected to forecast the target parameter, which was depicted as CBR. Five different performance metrics were utilized to appraise the precision delivered by the schemes under consideration. These included R2, RMSE, MSE, RSR, and WAPE. Three distinct meta-heuristic optimization approaches—the Dingo Optimization Algorithm, the Alibaba and the Forty Thieves optimization algorithm, and the Adaptive Opposition Slime Mould Algorithm—have been examined in the current study to increase the system's functional efficiency. The conclusions below may be drawn from the analysis's outcome:
• The thorough analysis of the pertinent characteristics was the foundation for developing the projection schemes to estimate CBR. A comparison between the experimental outcomes and those obtained utilizing the suggested schemes showed that the latter's CBR prediction accuracy was significantly high.
• In the current research, the test phase has shown that the forecast data's scattering value increased by 0.39, 0.59, and 0.69 for SVAF, SVSM, and SVDO, respectively, from the training phase.
• The California Bearing Ratio outcomes presented in this investigation indicate a significant discrepancy between the observed and projected values, with an average underestimate of almost 1.24 for the suggested schemes. With a value of 1.6271, the RMSE displayed its maximum error in the SVSM scheme in the training phase. SVAF had the lowest error rate in the testing session, with a value of 0.7946.

Acknowledgements
We wish to state that no individuals or organizations require acknowledgment for their contributions to this investigation.

Authorship contribution statement
Na Feng: Writing - original draft, Conceptualization, Supervision, Project administration.
Yulin Lan: Methodology, Software.
Zhisheng Yang: Formal analysis, Language review.
The authors declare that there is no conflict of interest regarding the publication of this paper.

Author statement
The manuscript has been read and approved by all the authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work.

Funding
This investigation was not funded by any specific grant from public, commercial, or charitable funding bodies.

Ethical approval
The paper has attained ethical approval from the institutional review board, ensuring the protection of participants' rights and compliance with the relevant ethical guidelines.
https://doi.org/10.31449/inf.v49i16.9315 Informatica 49 (2025) 269–284 269
A Cutting-Edge Bio-Inspired Computational Framework for
Advanced Virtual Reality Classification through Sophisticated
Predictive Methodologies
Yanyan Song, Hongping Zhou*
1 School of Communication Technology, Communication University of China Nanjing, Nanjing 211172, China
*Corresponding Author
E-mail: zhouhongping1231@163.com
Keywords: virtual reality, histogram gradient boosting classification, decision tree classification, ebola optimization
search, differential squirrel search algorithm
Received: May 20, 2025
Virtual reality (VR) enables the simulation of a wide variety of complex environments, from tiny biological
structures to entirely imaginary worlds. These simulations create new possibilities for learning, training,
and interaction that go beyond the limits of the physical world. However, virtual reality (VR) realizes this
imaginary world, so it is not just a dream. VR works through the invocation of many of the senses. It
creates realistic simulations through the creation of immersive settings that combine the real and the
imagined, thereby affording special hands-on learning possibilities in a variety of subjects. This study
investigates the effectiveness of combining Histogram Gradient Boosting Classification (HGBC) with
Decision Tree Classification (DTC), the Ebola Optimization Search (EOS), and the Differential Squirrel
Search Algorithm (DSSA) to predict VR outcomes. By integrating these advanced predictive and
optimization techniques, the approach aims to enhance accuracy. Research will be conducted to ascertain
the possible uses of VR, enhance user experience, and assess the impact on industries related to training,
education, healthcare, and entertainment. In the evaluation phase, HGDS attained the highest accuracy
of 0.967 in the test phase, making it the top-performing hybrid model, while DTEO showed the lowest
accuracy of 0.907, identifying it as the weakest model.
Povzetek: Članek predstavi bio-navdihnjen hibridni okvir za klasifikacijo uporabniških odzivov v virtualni
resničnosti. Združuje HGBC, DTC ter optimizatorja EOS in DSSA za izboljšanje napovedne točnosti.
Okvirjeva naloga je zanesljivo razvrščati VR-podatke.
1 Introduction
A VR simulation signifies a computer-created environment where users can move around, interact with objects, and interact with virtual characters, also referred to as "agents" or "avatars." A generic virtual setting is a 3D world [1], and, as with gravity simulation, virtual environments frequently aim to be as realistic as possible in both appearance and object behavior. It must be underlined, nonetheless, that there need be no parallels between this virtual environment and the actual world. One of the advantages of virtual environments is their ability to replicate completely unrealistic scenarios [2]. Virtual environments also provide a safe space to test scenarios that would be too dangerous or difficult to perform in real life, and they imitate the setting where the student will eventually work.
There are several ways of deploying VR; four typical configurations are listed below:
✓ Desktop VR (Monoscopic or Stereoscopic)
✓ Immersive VR (HMD, CAVE, widescreen)
✓ Collaborative Systems
✓ Mixed or Augmented Reality
Desktop VR enables the user to interact with the system using a mouse or other controlling device while sitting in front of a desktop computer monitor, as the name implies [3]. Immersive systems utilize a visualization display worn on the head of the user that completely occludes their field of view. Collaborative systems have human-controlled avatars interacting with each other, and they can be immersive or desktop-based systems. Second Life is one of the most recent and most effective collaboration systems [4]. An attempt is also being made to use collaborative systems for exploration. Mixed reality systems merge computer-generated matter with the real environment, which is viewed directly or through a camera. Such systems can teach students engineering and medical skills that were previously thought impossible to convey [5].
Learning by humans requires interaction with the environment, taking in information provided by the senses and by experience [6]. Through computer simulation, VR takes the role of real-world sensory input. Reacting to motion and common human behaviors in the actual world offers interaction. Therefore, VR can be useful in education since it allows pupils to experience a situation
or scenario firsthand rather than only imagining it [7]. The three main components that define the quality of VR experiences are immersion, interaction, and multisensory feedback. Immersion is being engulfed or enclosed by the surroundings [8]. One of the advantages of immersion is that it ensures a feeling of presence, or the perception that one is actually in the world being displayed [9]. Interactivity means the capability of the user's body movements to affect the events happening in the simulation and, in turn, provoke a reaction from the simulation [10], [11].
The multisensory nature of VR allows information to be derived from several senses, which further enhances the experience by making it both more engaging and more convincing, increasing the sensation of presence, because it provides redundancy of information, which diminishes the likelihood of misunderstanding. Information from multiple sensory entries is reinforced by sensory combination [12], [13]. VR enables the user to act as though they are in the actual world by substituting a virtual environment for the current one. A constructivist learning approach benefits from VR's immersive features [14]. The premise of constructivism, a theory of knowledge acquisition, is that people build knowledge by drawing conclusions from their past experiences. The idea, as propounded by Jean Piaget, assumes that learners try to fit new experiences into the world picture that they have developed earlier. Learners change their worldview to fit the new experience when they cannot assimilate new information into their system effectively. Learning comes from experiences where actions are based on assumptions about how the world functions, only to find that it does not align with those assumptions [15], [16], [17]. Adjusting the mental model of how the world functions becomes necessary to account for the new experience. The view is that learning is an active process of testing hypotheses. In other words, this concept contrasts with the notion of learning as something passive in nature: the mere acquisition or assimilation of data. VR is a powerful learning tool because it provides a context where such hypothesis testing can occur. According to [18], students who interact with new material are more likely to store and recall it.
Control software is at the heart of this system. It regulates the exchange of information between the virtual world and the interface layer in response to user actions, updating the world appropriately. It also determines when the scene should be shown on display devices such as the haptic and visual interfaces. With the help of additional tools, the control software can connect to the outside world through the internet, which might be an essential capability in systems involving collaboration or many users. The virtual environment module includes a model of real-world entities and the virtual world model. It includes state and position information apart from appearance. The entities could be dynamic objects, such as moving objects or even avatars, or they could be static objects. This model of the virtual environment needs to be refreshed at regular intervals to update dynamic objects [19]. The module that stores the positions, shapes, and other attributes of all components of the virtual world in a database is called the virtual environment module. The physics engine is one of the major parts of any realistic simulation. A physics engine comprises a set of rules that control the motion and interaction of dynamic objects in a virtual scene. A typical physics engine can include a Newtonian mechanics simulation and collision detection, which describes when two objects collide. It applies gravitational, friction, and impulse effects using physical rules. When two things hit each other, the latter effect is important [20], [21]. When two active entities collide, collision detection is necessary. The physics engine determines their terminal velocity using their simulated traits, such as mass, substance, and speed.

1.1 Related works
Normally, state-of-the-art reports that focus on specific aspects of the discipline or on specific application fields are available. They mostly provide taxonomies that systematically illustrate and classify the various methodologies involved.
➢ While Dachselt and Hübner [22] examined the menus for AR and VR environments for all of the MR domain [23], they also presented an extensive taxonomy.
➢ A taxonomy of NVEs, taking into consideration distribution and communication topologies, has been provided by Macedonia and Zyda [24]. Mania and Chalmers [25] have presented a taxonomy of platforms and communication.
➢ Bowman has provided several taxonomies for both interaction methods [26] and navigation methods [27]. Mine's early research [28] identifies the essential navigation and interaction in virtual spaces.
➢ Gabbard [29] provides good generalized overviews, presents suggested best practices in application design, and provides guides for conducting user evaluations. Livatino and Koeffel have also presented guidelines for Virtual Environments (VEs) assessment [30].
➢ The current tracking technology is overviewed by Welch and Foxlin [31], who also compare and contrast the respective merits and disadvantages of each.
Recent work has explored innovative methods for classifying virtual reality (VR) using bio-inspired computational models. Song and DiPaola [32] introduced a bio-responsive VR system based on physiological data to enhance immersion. Zayed and Reda [33] demonstrated that applying neurophysiological biosignals combined with deep learning could classify cognitive states in VR with 97% accuracy. Similarly, Arslan et al. [34], [35], [36], [37], [38] employed emotion classification from biosignals and machine learning in VR, achieving 97.78% accuracy. These advancements are significant in areas such as rehabilitation, education, and psychotherapy. VanHorn and Çobanoğlu [35] also developed a biomedical image classification system within a VR-based environment, making AI more accessible to experts.
Overall, these studies emphasize how biosignals, machine learning, and VR can be integrated to develop advanced predictive models, showcasing the potential of bio-inspired computational models to improve VR classification techniques.

1.2 The study's objective
This work examines the possible contribution that VR technology will make to enhancing learning outcomes and increasing student engagement in schools. In data classification, this study applies an HGBC model and a DTC model. The performance of the schemes is optimized by using methods such as EOS and DSSA. This research will explore the integration of VR within diverse disciplines of study to understand how it can facilitate the retention of both theoretical knowledge and practical competencies of learners, given the immersion one experiences in a VR environment. Possible drawbacks and limitations, including accessibility of resources, shall also be discussed to present a comprehensive overview of what can be expected from this educational technology.

2 Materials and methodology
2.1 Data gathering
A set of users' experiences in VR settings provides the dataset. The information covers user preferences, emotional moods, and physiological reactions like skin conductance and heart rate. This study's dataset includes 1000 samples, each representing a user's VR session. Recorded features encompass User ID (173 unique values), Age (66), Gender (147), VR Headset type (61), Session Duration (137), Motion Sickness severity (56), and Immersion Level (55). These variables cover both demographic and behavioral data, forming a comprehensive basis for analysis. The Immersion Level serves as the target variable, indicating users' subjective engagement in VR, and its variability supports the creation of effective predictive models.
This dataset attempts to contribute to the development of VR through the analysis of user experiences. An attempt has been made in this study to develop a better VR design, with much more improvement in user comfort and customization, by understanding the physical and emotional reactions of consumers in diverse VR situations. This information allows developers to work on boosting VR systems and creating personalized experiences that will enhance customers' delight and immersion. Fig. 1 presents a contour plot for the correlation of the features.
User ID: This variable identifies every participant who experienced VR. Each user is assigned a unique ID so that their data in the dataset can be differentiated.
Age: This variable stores the age of the subject participating in VR exposure. For example, this could be an integer representing the current user's age at the time of using the VR.
Gender: This variable displays the user's gender. The categories "Male," "Female," and "Other" can be utilized to define the user's gender identity.
VR Headset Type: This variable specifies the form of VR headset that a user is utilizing in a VR experience. Examples include Oculus Rift, HTC Vive, and PlayStation VR, among others.
Duration: This variable shows the time spent in the VR experience in minutes. It reflects how much time was spent by the participant in the VR setup.
Motion Sickness Rating: It displays the rating of the user's self-reported motion sickness during the VR experience. Higher numbers relate to a higher degree of motion sickness on an ascending scale ranging between 1 and 10.
Dependent variable: The degree to which a user experiences being inside a virtual environment quantifies the subjective degree of the user's feeling of immersion in the experience, with a rating between 1 and 5, where 5 stands for the maximum level.
Figure 1: The contour plot with color fill illustrates the relationship between input and output variables
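As a brief sketch, such a dataset could be loaded and prepared as below. The file name, column labels, and 70/30 split are illustrative assumptions based on the variables described in Section 2.1; they do not come from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical file and column names mirroring the variables of Section 2.1.
df = pd.read_csv("vr_sessions.csv")

for col in ["Gender", "VRHeadset"]:                      # encode categorical inputs
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

X = df[["Age", "Gender", "VRHeadset", "Duration", "MotionSickness"]]
y = df["ImmersionLevel"]                                 # target: immersion rating 1-5

# Illustrative 70/30 split, stratified on the immersion level.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```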
Before deploying advanced computational models, it is essential to understand several challenges in VR classification systems. These encompass the significant variability in user responses driven by individual physiological and psychological differences, noise in biometric data such as heart rate and skin conductance, and class imbalance across the different immersion levels. The subjective nature of immersion also complicates labeling and impacts the consistency of the ground truth. These factors result in a complex, high-dimensional feature space where traditional classifiers often face difficulties with generalization and robustness. Consequently, adopting adaptive hybrid machine learning approaches, supported by powerful metaheuristic optimization techniques, is crucial for effective classification in VR.

2.2 Histogram gradient boosting classification (HGBC)
The HGB approach is another variant of the popular GB [39] technique used to resolve diverse classification and regression-oriented machine learning (ML) problems. These schemes, to which AdaBoost also belongs, primarily try to turn weak learners into strong ones; they come under the category of boosting schemes. Boosting techniques keep adding and training new weak learners successively to correct the mistakes of the previously introduced weak learners, informing each new weak learner to avoid the mistakes made by its forerunner. The most common weak learners used are DTs. This led to the development of the HGB algorithm, a boosting methodology that overcame one of the major weaknesses of the GB algorithm, namely its very long training time on large datasets. To circumvent this problem, the continuous input variables are discretized, or binned, into a few hundred distinct values. In this case, the learning rate (LR) of the scheme is the most important hyperparameter. Much attention was paid to the optimization of the scheme through several iterations of hyperparameter tweaking. The HGB implementation from scikit-learn 0.21.3 in the Python ML module was used [40].

2.3 Decision tree classification (DTC)
In a DT, every internal node displays a characteristic, each branch is a decision rule, and each leaf node is the outcome [41]. The root node signifies the topmost node in a DT. To achieve the best discrimination among classes or results, it learns to split based on the value of an attribute. Different schemes have different criteria for making decisions. For example, the metrics used by schemes like ID3, C4.5, and CART include entropy, gain ratio, and Gini impurity, respectively. The problem at hand is to find the characteristic at every level that offers the optimum split in a DT, thereby assisting optimum decision-making [42]. The concept can be understood mathematically by using the DT split based on entropy. The entropy H(D) of a dataset D can be calculated as follows:

H(D) = -\sum_{i=1}^{m} p_i \log_2 p_i    (1)

2.4 Ebola optimization search (EOS)
Driven by the diffusion of the Ebola virus, EOS presents a metaheuristic scheme [43]. The EOSA scheme is based on an enhanced SIR model of the sickness. Its S, E, I, R, H, V, Q, and D compartments represent the Susceptible (S), Exposed (E), Infected (I), Hospitalized (H), Recovered (R), Vaccinated (V), Quarantine (Q), and Death (D) states, respectively. Because of these compartments, the composition provides for the construction of a search domain that best displays the combinations of weights and biases that may be required by a CNN. After representation, the SIR model is expressed by a mathematical scheme utilizing a system of first-order differential equations. The new metaheuristic scheme was then developed by combining the mathematical and propagation schemes, and later, the obtained mathematical scheme was deployed in the design of EOSA-CNN for experimentation. The mathematical schemes are as follows:

mI_i^{t+1} = mI_i^{t} + \rho M(I)    (2)
\frac{\partial S(t)}{\partial t} = \pi - (\beta_1 I + \beta_3 D + \beta_4 R + \beta_2 (P_E)\eta)S - (\tau S + \Gamma I)    (3)
\frac{\partial I(t)}{\partial t} = (\beta_1 I + \beta_3 D + \beta_4 R + \beta_2 (P_E)\lambda)S - (\Gamma + \gamma)I - \tau S    (4)
\frac{\partial H(t)}{\partial t} = \alpha I - (\gamma + \varpi)H    (5)
\frac{\partial R(t)}{\partial t} = \gamma I - \Gamma R    (6)
\frac{\partial V(t)}{\partial t} = \gamma I - (\mu + \vartheta)V    (7)
\frac{\partial D(t)}{\partial t} = (\tau S + \Gamma I) - \delta D    (8)
\frac{\partial Q(t)}{\partial t} = (\pi I - (\gamma R + \Gamma D)) - \xi Q    (9)

mI_i^{t+1} and mI_i^{t} display the new and old situations at times t+1 and t, respectively, and \rho is the displacement scale factor of an individual in Eq. (2). The data updated here are Hospitalized (H), Vaccinated (V), Recovered (R), Infected (I), Susceptible (S), Quarantine (Q), and Dead (D). Eqs. (3) to (9) define a system of ordinary differential equations, all scalar functions that can be evaluated to float values. These are computed given the initial conditions S(0) = S_0, I(0) = I_0, R(0) = R_0, D(0) = D_0, P(0) = P_0, and Q(0) = Q_0, where t is determined by the definition of the iterations. This then makes it possible to determine the magnitudes of the vectors S, I, H, R, V, D, and Q at time t.
The pseudocode that describes the EOSA metaheuristic scheme is presented in the steps below (a simplified Python sketch of this loop is given after Fig. 2):
❖ Define initial values for all vector and scalar quantities, that is, individuals and parameters, respectively: the numbers of hospitalized (H), vaccinated (V), susceptible (S), infected (I), recovered (R), dead (D), and quarantined (Q).
❖ I1 is created at random among vulnerable people.
❖ The value of fitness shall be calculated for the index case, setting it as the current and global best.
❖ If there is at least one infected person and the number of iterations is not reached, then:
a) For every vulnerable individual, a position is created and altered according to their movement. Exploitation is characterized by short displacement; otherwise, it characterizes exploration. Remember that the farther an infected case is displaced, the more infections there are.
b) Using (a), generate newly infected individuals nI.
c) Create the new individuals and add the new instances to I.
d) From the size of I, calculate how many people are added to H, D, R, B, V, and Q at their respective rates.
e) Utilizing the new I, refine S and I.
f) Compare the current best I with the global best.
g) If the termination condition is not reached, go back to step 4.
❖ Return all solutions and the best global resolution.
The design and discussion of the utilization of the enhancement issue defined in this paper are given in the following subsections.
Fig. 2 presents the flowchart of the DTC.
Figure 2: The flowchart of the DTC model
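The following is a highly simplified Python sketch of an EOSA-style search loop in the spirit of the steps listed above, assuming a generic fitness function to be maximized. The displacement scales, infection rate, and recovery handling are illustrative placeholders, not the parameters of the paper's implementation.

```python
import numpy as np

def eosa_search(fitness, dim, bounds, pop_size=30, iterations=100, infect_rate=0.3):
    """Simplified Ebola-inspired search: infected solutions displace and may infect others."""
    low, high = bounds
    population = np.random.uniform(low, high, (pop_size, dim))   # susceptible individuals
    infected = [np.random.randint(pop_size)]                     # index case I1
    best = population[infected[0]].copy()
    best_fit = fitness(best)

    for _ in range(iterations):
        for idx in list(infected):
            # short displacement -> exploitation, long displacement -> exploration
            step = np.random.choice([0.1, 1.0]) * np.random.uniform(-1, 1, dim)
            candidate = np.clip(population[idx] + step * (high - low), low, high)
            if fitness(candidate) > fitness(population[idx]):
                population[idx] = candidate
            if np.random.rand() < infect_rate:                   # new infections
                infected.append(np.random.randint(pop_size))
        infected = list(set(infected))[:pop_size // 2]           # crude recovery/quarantine step
        fits = np.array([fitness(s) for s in population])
        if fits.max() > best_fit:                                # track the global best
            best_fit, best = fits.max(), population[fits.argmax()].copy()
    return best, best_fit
```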
2.5 Differential squirrel search algorithm (DSSA)
DSSA, a hybrid optimizer that combines the differential evolution and squirrel search schemes, is presented in this section. In SSA, the squirrels maintain the positions of other squirrels relative to the acorn or hickory trees for updating their own positions. To improve the search strategy, the top squirrels' position-updating rules have been changed, and the incorporation of crossover operations inspired by DE significantly enhances the exploration capability. The following is a mathematical scheme of the various foraging techniques covered under the paradigm of DSSA.
To justify selecting EOS and DSSA for this classification task, it is crucial to highlight the problem's nature: the dataset involves multiple interacting features with complex, nonlinear relationships, which can cause optimization to get stuck in local optima when using traditional methods. The EOS algorithm, inspired by epidemic modeling, employs dynamic, population-based exploration techniques that balance infection-driven diversification with recovery-focused convergence. This strategy is especially effective for tuning hyperparameters in complex models like HGBC and DTC, and its compartmental diffusion model efficiently captures multidimensional search dynamics. Meanwhile, DSSA mimics squirrel foraging behavior and utilizes crossover inspired by differential evolution, making it highly effective at fine-tuning solutions locally while maintaining overall diversity. This capability is critical in VR classification scenarios, where high accuracy requires careful adjustment of sensitive parameters to prevent overfitting. DSSA's ability to retain elite solutions while fostering diversity helps avoid premature convergence. Combining EOS and DSSA offers complementary advantages: EOS facilitates broad exploration, while DSSA ensures precise convergence, together enhancing classification accuracy and model robustness for VR immersion prediction.

2.5.1 Initialization of position and evaluation of fitness
The squirrels are initially placed in the search area at random. Knowing the squirrels' locations allows one to calculate their fitness, obtained by simply substituting their positions into the fitness function, which indicates how good a food supply they could find. The best squirrel PS_ht discovered in the hickory tree thus far is determined by sorting the fitness values. The squirrels in the acorn trees, PS_at(1:3), are assumed to be traveling in the direction of the optimal location in a subsequent iteration, as indicated by the following three best function values. The remaining squirrels, PS_nt(1:NP-4), are in the normal trees and have not yet discovered food.

2.5.2 Position update
The squirrels in an acorn tree, following the current best, PS_ht, renew their positions and move in the direction of the best source when there is no predator. The squirrels of a normal tree follow the ones in an acorn or hickory tree to renew their positions. If a predator is present, then the squirrels change direction randomly while foraging. The following mathematical schemes are used to update the squirrels' positions.
As in Eq. (10), the position of squirrels on acorn trees changes based on the positions of the others:

PS_{at}^{new} = \begin{cases} PS_{at}^{old} + d_g \times G_c (PS_{ht}^{old} - PS_{at}^{old} - P_{avg}), & r_1 \ge P_{dp} \\ \text{random position}, & \text{otherwise} \end{cases}    (10)

where P_avg is the mean location of every squirrel in the current population.
DSSA also employs the crossover mechanism of DE in a way that ensures maximum diversity among the squirrels while minimizing the possibility of trapping in local minima. It is applied to the squirrel's current position and the new position as obtained by Eq. (11):

PS_{at,i,j}^{cr} = \begin{cases} PS_{at,i,j}^{new}, & \text{if } rand_j \le C_r \text{ or } j = j_{rand} \\ PS_{at,i,j}^{old}, & \text{if } rand_j > C_r \text{ or } j \ne j_{rand} \end{cases}, \quad j = 1, 2, 3, \dots, D    (11)

In this context, NP displays the population size, with i ranging over 1, 2, 3, ..., NP. For acorn or normal trees, PS_{at,i,j}^{cr} indicates the updated position of a squirrel following the crossover operation, while PS_{at,i,j}^{new} and PS_{at,i,j}^{old} correspond to the new and previous positions of the squirrels. D refers to the dimensionality of the problem, and C_r displays the crossover rate, which is set to 0.5. The index j_{rand} is randomly selected from the range [1, D], and rand_j denotes the j-th random number, uniformly generated within this range.
Some of the squirrels on normal trees follow the placement of the acorn-tree squirrels, after which they relocate to their new locations:

PS_{nt}^{new} = \begin{cases} PS_{nt}^{old} + d_g \times G_c (PS_{at}^{old} - PS_{nt}^{old}), & r_2 \ge P_{dp} \\ \text{random position}, & \text{otherwise} \end{cases}    (12)

where the random number r_2 is uniformly distributed between 0 and 1.
The surviving squirrels in normal trees follow the best position found so far (the hickory tree), and their new positions are given below:

PS_{nt}^{new} = \begin{cases} PS_{nt}^{old} + d_g \times G_c (PS_{ht}^{old} - PS_{nt}^{old}), & r_3 \ge P_{dp} \\ \text{random position}, & \text{otherwise} \end{cases}    (13)

The following crossover procedure is also given for the normal-tree squirrels:

PS_{nt,i,j}^{cr} = \begin{cases} PS_{nt,i,j}^{new}, & \text{if } rand_j \le C_r \text{ or } j = j_{rand} \\ PS_{nt,i,j}^{old}, & \text{if } rand_j > C_r \text{ or } j \ne j_{rand} \end{cases}, \quad j = 1, 2, 3, \dots, D    (14)

The convergence speed may be raised by permitting the hickory-tree squirrel to update its location in relation to the average position of the squirrels in the acorn trees. This can be done as follows:

PS_{ht}^{new} = PS_{ht}^{old} + d_g \times G_c (PS_{ht}^{old} - PS_{at}^{avg})    (15)

In this instance, PS_{at}^{avg} displays the average of all squirrel locations within the acorn trees. To form the next generation of individuals, the new positions and their crossover counterparts are then compared with the old positions, and the better candidates are retained.
Figure 3 illustrates the flowchart of the proposed hybrid models (such as HGDS and DTEO), detailing the sequential phases that encompass data input, model development, optimizer-centric hyperparameter optimization, training, and final assessment. This diagram delineates the interaction between the machine learning models and the metaheuristic optimizers within the hybrid structure.
Figure 3: The process flowchart of the proposed hybrid models
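A compact sketch of the acorn-tree update of Eq. (10) and the DE-style crossover of Eq. (11) is given below, assuming d_g, G_c, P_dp, and the search bounds are given constants; the default values shown are illustrative only.

```python
import numpy as np

def update_acorn_squirrel(ps_at, ps_ht, p_avg, dg=0.8, gc=1.9, p_dp=0.1, low=-1.0, high=1.0):
    """Eq. (10): move an acorn-tree squirrel toward the hickory tree, else relocate at random."""
    if np.random.rand() >= p_dp:
        return ps_at + dg * gc * (ps_ht - ps_at - p_avg)
    return np.random.uniform(low, high, ps_at.shape)

def de_crossover(ps_new, ps_old, cr=0.5):
    """Eq. (11): dimension-wise crossover between the new and old positions (Cr = 0.5)."""
    d = ps_new.size
    j_rand = np.random.randint(d)
    mask = np.random.rand(d) <= cr
    mask[j_rand] = True                       # guarantee at least one updated dimension
    return np.where(mask, ps_new, ps_old)
```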
2.6 Performance evaluators
Accuracy depends on how many correctly projected positive and negative instances there are out of the total, defined in terms of True Positives (TP), True Negatives (TN)—correctly projected negative cases, False Positives (FP)—cases incorrectly projected as positive, and False Negatives (FN)—cases incorrectly projected as negative. Using TP and FP as the relevant measures, precision gauges the percentage of TP projections out of all the positive projections the model has made; smaller numbers of false positives imply higher precision. Recall is the measure of the share of TP projections out of all real positive instances, using True Positives and False Negatives; it indicates how well the model detects all relevant positive cases. The fewer false negatives there are, the higher the recall. A simple statistic that balances the trade-off between precision and recall is the F1 score, which combines the two.

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}    (16)
Precision = \frac{TP}{TP + FP}    (17)
Recall = \frac{TP}{TP + FN}    (18)
F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}    (19)

The F1-score is a single measure that balances precision and recall; it is their harmonic mean. It is very useful when both false negatives and false positives matter. The greater the F1 score, the better balanced the recall and precision.

3 Results and discussion
3.1 Hyperparameters tuning and convergence curve analysis
Table 1 displays the tuned hyperparameters for the four hybrid models: HGEO, HGDS, DTEO, and DTDS. Seven key hyperparameters were considered to optimize these models' performance: learning_rate, max_leaf_nodes, max_depth, min_samples_leaf, max_bins, min_samples_split, and a second instance of max_leaf_nodes (listed separately for the different model types). The HGEO and HGDS models, based on the HGBC algorithm, have specified values for learning_rate, max_leaf_nodes, max_depth, min_samples_leaf, and max_bins. For example, the HGEO model has a learning_rate of 0.709, max_leaf_nodes of 278, max_depth of 100, min_samples_leaf of 10, and max_bins of 27. In the HGDS model, these values are a learning_rate of 0.148, max_leaf_nodes of 557, max_depth of 893, min_samples_leaf of 7, and max_bins of 102. Conversely, the DTEO and DTDS models, which are based on decision tree algorithms, do not include values for learning_rate, max_leaf_nodes, or max_bins in the first part of the table. However, they include defined values for max_depth, min_samples_leaf, min_samples_split, and max_leaf_nodes in the second part. For instance, the DTEO model has a max_depth of 741, min_samples_leaf of 0.00025, min_samples_split of 0.0275, and max_leaf_nodes of 2710. Similarly, the DTDS model features a max_depth of 597, min_samples_leaf of 0.00025, min_samples_split of 0.0005, and max_leaf_nodes of 1789. Overall, the table indicates that hyperparameters are selectively tuned for each model based on its structure, with parameter values chosen according to each model's specific characteristics and requirements.
Fig. 4 displays a 3D waterfall plot illustrating the convergence curves of the four hybrid schemes: HGDS, HGEO, DTDS, and DTEO. The plot effectively visualizes the different convergence rates and final performance levels of the schemes, demonstrating the varying degrees of effectiveness in the optimization process. This comparison emphasizes the significance of the number of iterations and initial accuracy in determining the overall success of each hybrid model. The HGDS model starts with an accuracy of 0.6 and gradually improves over 200 iterations, ultimately reaching a peak accuracy of 0.967, making it the highest-performing model among the four. The other three schemes begin with a lower accuracy of 0.4 and converge more quickly than HGDS, reaching their final accuracy in fewer iterations. Among these schemes, DTEO is identified as the weakest hybrid model, with a final accuracy of 0.908 after its iterations.
Table 1: Hyperparameter tuning for four models
Hyperparameters HGEO HGDS DTEO DTDS
learning_rate 0.709 0.148 - -
max_leaf_nodes 278 557 - -
max_depth 100 893 741 597
min_samples_leaf 10 7 0.00025 0.00025
max_bins 27 102 - -
min_samples_split - - 0.0275 0.0005
max_leaf_nodes - - 2710 1789
Figure 4: 3D waterfall plot for the convergence curve of the hybrid schemes
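As a sketch of how the tuned values in Table 1 would map onto scikit-learn estimators, the HGDS and DTDS configurations could be instantiated as below. This assumes a recent scikit-learn release (1.0 or later, where HistGradientBoostingClassifier no longer requires the experimental import); the metaheuristic search loop (EOS/DSSA) that produced these values is not shown.

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# HGDS: histogram gradient boosting with the Table 1 values found by DSSA.
hgds = HistGradientBoostingClassifier(
    learning_rate=0.148, max_leaf_nodes=557, max_depth=893,
    min_samples_leaf=7, max_bins=102)

# DTDS: decision tree with the Table 1 values found by DSSA.
dtds = DecisionTreeClassifier(
    max_depth=597, min_samples_leaf=0.00025,
    min_samples_split=0.0005, max_leaf_nodes=1789)

# hgds.fit(X_train, y_train); dtds.fit(X_train, y_train)
```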
3.2 Schemes performance comparison
Fig. 5 presents a doughnut plot, providing an intuitive representation of the schemes' performance and facilitating a clearer comparison across the different evaluation metrics. The performance results of the six hybrid schemes evaluated using accuracy, precision, recall, and F1 scores across the training, testing, and overall sections have been presented. Among these, HGDS emerges as the best-performing model, with an impressive accuracy of 0.967 in the test section. Conversely, DTEO, with an accuracy of 0.907, is the weakest model. HGDS outperforms HGEO by 0.17 in accuracy, establishing itself as the top model. Nevertheless, HGEO still demonstrates strong performance, securing the second-best position overall. This comparison underscores the varying strengths of each model, with HGDS leading in accuracy and other performance metrics, while HGEO, despite its lower accuracy, remains a competitive alternative. The results emphasize that even schemes with slightly lower accuracy can still offer valuable performance in certain contexts.
Figure 5: A connected doughnut plot employed for the visual evaluation of the schemes' performance
Additionally, Table 2 provides a summary of the performance of the six schemes across the five levels regarding precision, recall, and F1 score. The hybrid model HGDS stands out, achieving a precision as high as 0.990. Additionally, HGDS excels in both recall and F1-score, outperforming all other schemes and demonstrating its overall robustness. In contrast, DTEO shows weaker recall performance compared to the other schemes, although it surpasses DTC in this metric. Regarding the F1-score, DTEO records a value of 0.922, which is lower than those of the top-performing schemes. Nonetheless, it outperforms both DTDS and DTC by margins of 0.013 and 0.010, respectively. While DTEO's F1-score may not be the highest, it still demonstrates competitive performance relative to the other schemes. These findings indicate that HGDS is the most well-rounded and effective model overall, while DTEO, despite its limitations in recall and F1 score, delivers superior performance in specific areas.
Table 2: Schemes' evaluation results through different immersion levels
Schemes Level 1 Level 2 Level 3 Level 4 Level 5
Precision:
HGBC 0.946 0.946 0.909 0.925 0.973
HGEO 0.941 0.995 0.907 0.951 0.974
HGDS 0.974 0.990 0.981 0.945 0.943
DTC 0.988 0.938 0.825 0.909 0.822
DTEO 0.973 0.929 0.847 0.923 0.878
DTDS 0.906 0.939 0.873 0.914 0.983
Recall:
HGBC 0.951 0.933 0.928 0.952 0.932
HGEO 0.946 0.947 0.959 0.947 0.969
HGDS 0.960 0.971 0.985 0.956 0.963
DTC 0.847 0.875 0.953 0.869 0.916
DTEO 0.876 0.885 0.948 0.927 0.906
DTDS 0.911 0.894 0.964 0.927 0.911
F1-score:
HGBC 0.948 0.94 0.918 0.938 0.952
HGEO 0.943 0.97 0.932 0.949 0.971
HGDS 0.975 0.976 0.965 0.949 0.971
DTC 0.912 0.906 0.885 0.888 0.866
DTEO 0.922 0.906 0.895 0.925 0.892
DTDS 0.909 0.916 0.916 0.921 0.946
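A minimal sketch of how the per-level precision, recall, and F1 values in Table 2 could be produced with scikit-learn is shown below, assuming y_test and y_pred hold the true and predicted immersion levels for one scheme; the variable names are illustrative.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Per-immersion-level precision, recall, and F1 (labels 1-5).
precision, recall, f1, support = precision_recall_fscore_support(
    y_test, y_pred, labels=[1, 2, 3, 4, 5])

print(classification_report(y_test, y_pred, digits=3))
```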
Fig. 6 displays the ROC (Receiver Operating Characteristic) curves of the hybrid model across the five immersion levels. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various thresholds, offering a visual assessment of the model's ability to distinguish between classes. A higher Area Under the Curve (AUC) signifies better performance. Among the five levels, Level 1 has the highest AUC, indicating greater confidence and fewer classification uncertainties at this stage. Conversely, Level 5 shows the weakest ROC performance, likely due to increased data overlap and less feature separation at higher immersion ratings. This suggests that as responses become more subtle at deeper immersion levels, the model's ability to differentiate between classes slightly diminishes, resulting in more false positives and a lower true positive rate. These differences illustrate the model's changing confidence in classification across the varying immersion levels. Level 1 is considered the best projection level, characterized by the highest true positive rate and the lowest false positive rate. At this level, the true positive rate starts at 0.0 and gradually increases to 1.0, while the false positive rate begins at 0.0 and rises only to 0.1. On the other hand, Level 5 displays the worst projection performance: although the true positive rate eventually reaches 1.0, this comes with a decrease in projection accuracy and an increase in the false positive rate. This shows a decline in overall predictive quality as the level increases.
Figure 6: ROC curves for the hybrid classification model across five immersion levels.
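A sketch of how per-level ROC curves such as those in Fig. 6 can be computed in a one-vs-rest fashion is given below, assuming the fitted classifier exposes predict_proba and that model, X_test, and y_test are available; all names are illustrative.

```python
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

levels = [1, 2, 3, 4, 5]
y_test_bin = label_binarize(y_test, classes=levels)   # one-vs-rest targets
y_score = model.predict_proba(X_test)                 # class probabilities, shape (n, 5)

for k, level in enumerate(levels):
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], y_score[:, k])
    print(f"Level {level}: AUC = {auc(fpr, tpr):.3f}")
```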
3.3 Comparison of the measured and projected values
Fig. 7 displays a 3D bar plot illustrating the correlation between observed and projected values across the five levels, highlighting each model's predictive accuracy. Among these, the HGDS model stands out with the best performance, particularly in level 1, where it achieves 194 accurate projections, establishing it as the top-performing model. This high correlation between observed and projected values underscores HGDS's strong overall reliability. Conversely, the DTEO model shows the weakest performance, with only 177 accurate projections, making it the least effective model overall. While certain schemes may perform poorly in specific conditions, DTEO consistently underperforms across all levels, indicating significant limitations in its predictive accuracy.
Figure 7: 3D bar plot depicting the correlation between observed and projected values
Fig. 8 shows the projection errors across six schemes, focusing on correct projections versus mistakes. Among these, the HGDS stands out for its higher accuracy. In level 1, it correctly projected 192 out of 194 cases, resulting in only two errors. Similarly, in level 2, HGDS achieved 198 correct projections out of 202, with just four mistakes. This accuracy highlights its strong performance in comparison to the other schemes. In contrast, the DTEO model demonstrates weaker predictive accuracy. In level 1, it recorded five errors out of 177 projections. Its performance was similarly low in level 2, where it made 14 mistakes out of 184 projections. This high error rate marks DTEO as the least effective among the schemes analyzed. Overall, while HGDS exhibits consistent accuracy in both levels, DTEO's elevated error rate suggests limitations in its predictive reliability.
Figure 8: Confusion matrix illustrating the accuracy of the schemes under four specified conditions
• Sensitivity analysis

Table 3 displays the results of a sensitivity analysis using one-way ANOVA to determine if model performance differences across various VR immersion levels are statistically significant. The F-value indicates the ratio of variance between groups to within groups, while the P-value shows the likelihood that observed differences are due to chance. A P-value below 0.05 is generally considered significant. Of the six models evaluated, the DTC model had the highest F-value of 2.923 and a P-value of 0.088. Although close to significance, this result remains statistically non-significant, implying only marginal performance differences that do not meet the 95% confidence threshold. The HGBC, HGEO, HGDS, DTEO, and DTDS models recorded much lower F-values (0.021, 0.006, 0.031, 1.015, and 0.074) with P-values of 0.886, 0.937, 0.861, 0.314, and 0.786. These findings indicate no statistically significant performance differences across immersion levels. Notably, the HGDS model, identified earlier as the most accurate with a test accuracy of 0.967, showed a low F-value of 0.031 and a high P-value of 0.861, confirming its stable performance across all conditions. Overall, the ANOVA results suggest that none of the models exhibit statistically significant performance variations across immersion levels, highlighting the robustness of the proposed models and particularly validating the consistent performance of HGDS under different experimental scenarios.
Table 3: Sensitivity analysis based on ANOVA

Model   F-value   P-value
HGBC    0.021     0.886
HGEO    0.006     0.937
HGDS    0.031     0.861
DTC     2.923     0.088
DTEO    1.015     0.314
DTDS    0.074     0.786
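The one-way ANOVA in Table 3 can be reproduced with a standard statistical library. The sketch below is illustrative only: the per-level accuracy samples are hypothetical placeholders, not the study's data.

    # Minimal sketch: one-way ANOVA across immersion levels (hypothetical accuracy samples).
    from scipy.stats import f_oneway

    # Each list holds repeated accuracy measurements of one model at a given immersion level.
    level1 = [0.96, 0.97, 0.95]
    level2 = [0.97, 0.96, 0.98]
    level3 = [0.98, 0.97, 0.96]
    level4 = [0.95, 0.96, 0.96]
    level5 = [0.96, 0.95, 0.97]

    f_value, p_value = f_oneway(level1, level2, level3, level4, level5)
    print(f"F = {f_value:.3f}, P = {p_value:.3f}")  # P > 0.05 indicates no significant difference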
3.4 Limitations and directions for future research

While the hybrid classification framework demonstrates encouraging results in predicting VR immersion levels, there are some limitations to address. First, the dataset is relatively small and was collected in a controlled experimental setting, raising questions about how well the models will perform in real-world or commercial VR environments with more diverse users. Second, the computational cost of metaheuristic algorithms like EOS and DSSA can increase substantially with larger dataset dimensions, which may affect real-time or low-latency VR applications. More research is needed to evaluate their scalability and efficiency in live systems. Third, although the models were optimized for accuracy, aspects like interpretability and user feedback were not thoroughly explored. Transparency could be especially important for applications in education or healthcare. Future research will focus on: (1) expanding the dataset to include multimodal user feedback (e.g., eye tracking, EEG), (2) comparing our framework with common models such as SVM, Random Forest, and Neural Networks, and (3) creating lightweight or approximate versions of EOS and DSSA suitable for real-time immersive use. Additionally, we aim to test the models across various VR fields, including rehabilitation, industrial training, and personalized learning, to ensure their robustness in different operational contexts.

4 Conclusion

VR simulation immerses users in a dynamic, visually engaging virtual environment where they can navigate, manipulate virtual objects, and interact with digital agents. A defining feature of VR worlds is their three-dimensional nature, often coupled with realistic elements, not only in their visual representation but also in how objects behave. For instance, VR simulations may include natural forces like gravity. These environments are not always designed to mirror the real world; in fact, they often present fantastical or even impossible scenarios. This unique capability allows VR to simulate complex or hazardous situations safely, making it especially useful in training and educational contexts. In such settings, VR can expose learners to potentially risky situations they might encounter in reality, allowing them to experience and practice without the associated risks. Advancements in technology have greatly enhanced the capabilities of VR, allowing for more immersive and realistic simulations. Additionally, the integration of sophisticated classification schemes, such as DTC and HGBC, is transforming digital experiences. These schemes, along with optimizations from techniques like the EOS and DSSA, contribute to the improvement of VR systems. In testing, the hybrid HGDS approach has proven to be highly effective, achieving an accuracy rate of 0.967, making it the top performer among the various schemes. On the other hand, the DTEO approach, with an accuracy of 0.907, was identified as the least effective. Additionally, although this study concentrated on the new EOS and DSSA algorithms because of their innovative hybrid search abilities, future research will include implementing and comparing more traditional and popular optimizers such as Particle Swarm Optimization (PSO), Genetic Algorithms (GA), and Bayesian Optimization. This will enable a more comprehensive assessment of optimization efficiency and adaptability across different learning scenarios. Although this study concentrated on hybrid variants within our optimization framework, future research will include benchmarking with models like Random Forest, Support Vector Machines, XGBoost, and Neural Networks. This will contextualize our models' performance against recognized standards and strengthen the validation of our methodology. Despite this, the hybrid approach often outperformed both DTC and DTDS in certain metrics, demonstrating the potential of combining these innovative techniques for enhancing VR-based applications.
Declarations

Funding

This investigation was not funded by any specific grant from public, commercial, or charitable funding bodies.

Authors' contributions

YS performed data collection, modeling, and appraisal. HZ reviewed the initial draft of the manuscript and contributed to the editing and writing.

Acknowledgements

This exploration was backed by the project of the sixth phase of the "333 High-level Talent Cultivation Project" in Jiangsu Province.

Ethical approval

The exploration has received ethics approval from the IRB, guaranteeing the protection of participants' rights and compliance with the related ethics norms.
https://doi.org/10.31449/inf.v49i16.9788 Informatica 49 (2025) 269–290 269
GWO-RF: A Grey Wolf Optimized Random Forest Model for
Predicting Employee Turnover
Hongtao Zhang
Henan Medical Biological Testing Co., Ltd, Zhengzhou 450000, China
E-mail: HongtaoZhang8103@163.com
Keywords: human resources, prediction, employee turnover, computational model
Received: June 19, 2025
This study proposes an employee turnover prediction model (GWO-RF) that combines Grey Wolf
Optimization (GWO) algorithm with Improved Random Forest (LPRF). The model optimizes node
splitting strategy by combining C4.5 information gain rate and CART Gini coefficient (constraint
condition α+β=1) through linear programming. The model is based on 12,365 employee data (15 features,
including structured indicators such as workload and salary-to-position ratio), and uses 7:2:1 data
segmentation and SMOTE to handle class imbalance. Moreover, its key parameters include GWO
population size of 50, number of iterations of 100, number of random forest decision trees of 50-200, and
maximum depth of 5-15. The test set results show that the model has an AUC of 0.923±0.008 and an F1-
score of 0.871. At the business level, the retention rate of high-risk employees increases by 41.9%
(p<0.01), and the cost of single intervention decreases by 54.3%. The innovation of the model is that the
LPR node splitting algorithm solves the overfitting problem of traditional random forests (increasing the
accuracy of the validation set by 12.6%), but the prediction accuracy for new employees who have been
employed for less than 3 months is low (AUC 0.782). Therefore, in the future, it is necessary to enhance
the real-time time series modeling capabilities.
Povzetek: Študija predstavi model GWO-RF, ki združuje optimizacijo sivega volka in izboljšani naključni
gozd za napoved fluktuacije zaposlenih. Model izboljša razcep vozlišč ter poveča zadržanje ogroženih
zaposlenih.
1 Introduction

In today's highly competitive business environment, employee turnover has become an important management challenge for enterprises. With the rising cost of human resources and the increasing mobility of knowledge workers, employee turnover not only brings direct recruitment and training costs, but also leads to damage to team stability, organizational knowledge loss and decline in corporate reputation. In particular, in the education and training, retail and Internet industries, the turnover rate of employees generally exceeds 20%, and the turnover rate of core employees in some enterprises is as high as 30%, which makes the development of accurate turnover prediction models an urgent need for enterprise human resource management [1].
The current mainstream prediction models can be divided into two categories. One is the rule-based method, which mainly relies on expert experience to build judgment rules. Although it is interpretable, it covers limited scenarios. The other is the machine learning-based method. It automatically identifies churn features by analyzing historical data, and typical algorithms include random forest, XGBoost and deep neural networks. The latest research shows that cluster analysis and behavioral feature modeling can effectively improve prediction accuracy and realize quantitative loss prediction. Therefore, some enterprises began to integrate multi-source data (including employee satisfaction surveys, social network activities, etc.) to build hybrid models [2].
It is of great strategic value and management necessity to construct an effective computational model to predict employee turnover. Employee turnover brings significant economic losses to enterprises, including recruitment costs, training costs and tacit knowledge loss. Secondly, a high turnover rate destroys team stability and affects organizational performance. Studies show that when the team turnover rate exceeds 15%, overall productivity decreases by 25-40% [3]. More importantly, an effective prediction model can identify high-risk employees 6-12 months in advance, enabling enterprises to take targeted interventions to increase the retention rate of core employees by 35-50% [4]. Furthermore, by analyzing turnover drivers, the model can optimize human resource management strategies and improve overall employee satisfaction by 10-15 percentage points. In the context of digital transformation, such models have become a core tool for corporate talent strategies. In particular, they are of key significance to knowledge-intensive industries and service industries, and they can effectively reduce human capital risks and enhance organizational competitiveness [5].
However, existing models still have significant limitations. Firstly, they are highly dependent on data quality, and many enterprises lack systematic employee behavior records, which leads to difficulties in feature engineering. Secondly, the interpretability of the models is insufficient, and their black-box characteristics make it difficult for human resource managers to understand the prediction logic. Thirdly, the ability of cross-industry generalization is weak, and the driving factors of turnover in the education and training industry are essentially different from those in the retail industry. Finally, existing studies focus on prediction accuracy, ignoring the guiding value of interventions, such as cost-benefit analysis of salary adjustment and training investment. Therefore, future research needs to strengthen the application of time series behavior analysis and causal reasoning frameworks and establish a closed-loop management system of prediction-intervention-evaluation. The purpose of this study is to develop an intelligent early warning model based on GWO-RF to improve the accuracy of high-risk employee identification and intervention efficiency.

2 Related work

(1) Development context and theoretical framework of traditional employee turnover prediction models
The development of traditional prediction models can be divided into three main stages: the early statistical modeling stage (1990-2005), the machine learning enhancement stage (2005-2015), and the survival analysis deepening stage (2010-2015). In the statistical modeling stage, researchers mainly used parametric methods such as multiple linear regression and logistic regression to analyze the correlation between observable variables and turnover intention by constructing generalized linear models (GLM). This kind of research laid the theoretical foundation of employee turnover prediction and confirmed the explanatory power of core influencing factors such as salary fairness and career development opportunities. However, it is difficult to capture interaction effects between variables due to the linear assumptions [6].
The introduction of machine learning technology marked a new stage for predictive models. Decision tree algorithms (such as ID3 and C4.5) construct classification rules through the information gain ratio, which can automatically discover high-risk combination features such as "performance evaluation period > 6 months and training participation times < 2". Ensemble learning methods (such as random forest) further improve model robustness and effectively reduce the risk of overfitting through Bootstrap resampling and random feature selection. During this period, models began to integrate structured data from the HR information system, including behavioral indicators such as attendance records and project participation, so that the prediction accuracy rate was improved to the interval of 65%-75% [7].
The cross-application of survival analysis methods addresses the shortcomings of traditional classification models in time series prediction. The Cox proportional hazards model regards employee on-the-job status as a time-dependent variable, and quantifies the influence strength of different factors on retention rate through a risk function. Its semi-parametric characteristics allow it not only to take advantage of the interpretability of parametric models, but also to adapt to data distributions with non-proportional risks. Research shows that there is a nonlinear positive correlation between the duration of promotion delay and the risk of turnover, and the risk coefficient increases exponentially when the delay exceeds a critical value (about 18 months). This kind of model promotes the transformation of the prediction dimension from static cross-sectional analysis to dynamic process analysis [8].
The core value of the traditional models lies in their white-box characteristics. Through coefficient significance tests and variable importance rankings, managers can intuitively understand the decision-making logic. However, they have three fundamental limitations. First, feature selection relies on domain knowledge, making it difficult to automatically extract implicit features. Second, the model architecture lacks a memory mechanism and cannot handle the continuous evolution of employee status. Third, they make insufficient use of unstructured data (such as communication texts and collaboration networks) [9]. These shortcomings have prompted researchers to turn to more complex intelligent modeling methods.
(2) Technological breakthroughs and paradigm innovation of intelligent prediction models
The application of deep learning technology has enabled prediction models to achieve a qualitative leap, which is mainly reflected in four dimensions: time series modeling ability, small-sample learning efficiency, multi-modal fusion depth and dynamic decision optimization. In terms of time series modeling, long short-term memory (LSTM) networks capture long-term dependencies of employee behavior sequences through gating mechanisms, such as patterns of continuous quarterly performance fluctuation or trends in communication frequency changes. The bidirectional LSTM architecture further integrates historical and future context information, extending the early warning window to 9-12 months [10].
Transfer learning technology effectively alleviates the problem of data scarcity. Through the pre-training and fine-tuning paradigm, the model can migrate feature representations learned in a data-rich domain to the target domain. Domain adaptation methods reduce the distribution difference between the source domain and the target domain, and improve cross-industry prediction performance by 15%-25%. In addition, knowledge distillation compresses the knowledge of a complex teacher model into a lightweight student model, reducing the computational overhead by 70% while maintaining 90% of the prediction accuracy [11].
Multi-modal fusion architectures break through the limitation of a single data type. Modern prediction systems typically integrate three types of heterogeneous data: textual data, behavioral data, and physiological data. An attention mechanism automatically weights the contribution of the different modalities [12].
Reinforcement learning frameworks integrate prediction and intervention into a unified system. The model learns the optimal retention strategy by interacting with the environment, and the Q-learning algorithm evaluates the long-term benefits of different interventions (such as salary adjustment range and training intensity). In addition, policy gradient methods can deal with continuous action spaces and dynamically adjust the intervention strength. Such systems achieve a leap from passive prediction to active management, but need a carefully designed reward function to avoid short-term behavior [13].
Although intelligent models have made remarkable progress, they face new challenges. In terms of data privacy, the EU's General Data Protection Regulation (GDPR) requires the model to support the "right to be forgotten", and differential privacy training mechanisms need to be developed. In terms of algorithmic fairness, it is necessary to prevent the model from amplifying the discriminatory influence of sensitive attributes such as gender and age. In terms of computational efficiency, real-time prediction requires that the model inference delay be controlled within 200 ms, which poses a severe test for complex neural networks [14].
(3) Systematic analysis of existing problems and future research directions
The core contradictions faced by current research can be summarized into three levels of conflict: technical feasibility, ethical compliance and economic applicability. In the technical dimension, there is a fundamental tension between model complexity and interpretability [15]. Although post-hoc interpretation methods such as LIME and SHAP can generate local feature importances, they cannot provide a global causal chain, which leads managers to be cautious about the prediction results [16]. In terms of ethics, the breadth of data collection conflicts with personal privacy rights. In particular, the application boundaries of sensitive technologies such as emotion recognition [17] and social network analysis [18] urgently need to be defined by law. In terms of economics, there is a gap between the need for model generalization and industry specificity. Traditional solutions adapt to different scenarios through feature engineering, but the adjustment cost is high [19].
The following Table 1 summarizes the current status of relevant research:
Table 1: Summary of research status

Traditional statistical models
  Representative algorithms: Logistic regression and Cox proportional hazards model
  Common datasets: Structured data from enterprise HR systems (salary, attendance, etc.)
  Typical indicators: Accuracy of 65-75%, significant risk coefficients
  Core limitations: The linear assumption limits the capture of interaction effects and cannot handle unstructured data, resulting in weak temporal prediction ability

Classic machine learning
  Representative algorithms: Random forest, XGBoost
  Common datasets: Employee satisfaction surveys + behavior records (about 10-20 characteristics)
  Typical indicators: AUC 0.78-0.85, F1-score 0.72
  Core limitations: Feature engineering relies on domain knowledge, and the predicted AUC is only 0.65-0.70 for newly hired employees (<3 months), lacking a dynamic adjustment mechanism

Deep learning methods
  Representative algorithms: LSTM, Transformer
  Common datasets: Multimodal data (text communication records, collaboration network logs, etc.)
  Typical indicators: AUC 0.88-0.91, recall rate 82-85%
  Core limitations: Training with over 10,000 samples is required, with high computational costs (GPU hourly cost of $5-8) and poor interpretability (SHAP value consistency of only 60-70%)

GWO-RF (this study)
  Representative algorithm: Hybrid optimization model
  Common dataset: 12,365 records of listed companies (15 structured indicators)
  Typical indicators: AUC 0.923±0.008, F1 0.871
  Core limitations: The predicted AUC for employees who have been employed for less than 3 months is 0.782, and real-time data stream supplementation is required; linear programming node splitting increases training time by 15%
The current trend of employee turnover prediction technology is evolving from traditional statistical models to intelligent hybrid models. Traditional methods, such as logistic regression, rely on structured data and have an accuracy rate of only 65-75%. Machine learning (such as random forest) has been improved to an AUC of 0.78-0.85, but there are issues such as a strong dependence on feature engineering and poor prediction performance for new employees (AUC < 0.7). Although deep learning methods such as LSTM achieve an AUC of 0.88-0.91, they have high computational costs and weak interpretability. The GWO-RF hybrid model proposed in this study achieved an AUC of 0.923 ± 0.008 on 12,365 data points by optimizing parameters using the grey wolf algorithm and integrating the C4.5 and CART splitting strategies through linear programming. This resulted in a 41.9% increase in the retention rate of high-risk employees, but it requires enhanced temporal modeling capabilities for new employees (<3 months).
Future breakthroughs should focus on three key paths. In terms of architecture design, it is necessary to develop a lightweight time series model based on the Transformer and build explainable reasoning paths in combination with knowledge graphs. In terms of data governance, it is necessary to establish a federated learning framework to implement a collaborative training mode in which the data does not move and the model moves, and to use homomorphic encryption to protect data sovereignty. In terms of the evaluation system, it is necessary to build multi-dimensional indicators covering prediction accuracy (such as AUC-ROC), explanation quality (such as logical consistency score) and compliance (such as deviation detection rate). Only by achieving a balance between technological innovation and ethical constraints can the employee turnover prediction model truly become the intelligent decision-making center of the organization's talent strategy.

3 Algorithm model construction

3.1 Employee turnover prediction index system and model construction

Based on the random forest model, the random forest model is improved, the employee turnover prediction model is constructed, and the grey wolf algorithm is used to optimize the model parameters.
When measuring the various structural factors, for the workload factor, the workload is calculated as shown in the following formula [20]:

press = \frac{totalovertime}{months}    (1)

Among them, totalovertime represents the total overtime hours of an employee, and months represents the statistical time window; this paper uses the overtime situation in the past year for statistics, so months is taken as 12.
The average hourly wage is calculated as follows:

hourlywage = \frac{totalwage}{hours}    (2)

Among them, totalwage represents the total salary obtained by front-line workers, and hours represents the number of paid working hours of front-line workers.
The compensation position is calculated as follows:

pos = \frac{wage}{avgwage}    (3)

Among them, wage represents the monthly salary of front-line workers, and avgwage represents the average monthly salary of front-line workers in this position in the region where the enterprise is located.
Based on the Price-Mueller model, combining the characteristics of small and medium-sized enterprises, and referring to relevant literature, this paper constructs a total of 15 indicators, covering individual factors, environmental factors and structural factors, for subsequent employee turnover prediction.
paths. In terms of architecture design, it is necessary to subsequent employee turnover prediction.
develop a lightweight time series model based on
Transformer and build an explainable reasoning path in 3.2 Improvement of random forest model
combination with knowledge graphs. In terms of data
governance, it is necessary to establish a federated based on node splitting optimization
learning framework to implement a collaborative training In this paper, the random forest algorithm is further
mode of "data is not fixed, model is moving” and use improved to improve the performance in employee
homomorphic encryption to protect data sovereignty. In turnover prediction.
terms of the evaluation system, it is necessary to build The basic learner of random forest is decision tree.
multi-dimensional indicators covering prediction Commonly used node splitting algorithms in decision
accuracy (such as AUC-ROC), explanation quality (such trees mainly include ID3 algorithm based on information
as logical consistency score) and compliance (such as Gain (Gain), C4.5 algorithm based on information Gain
deviation detection rate). Only by achieving a balance rate and CART algorithm based on Gini coefficient (Gini),
between technological innovation and ethical constraints as follows.
can the employee turnover prediction model truly become (1) ID3 algorithm
the intelligent decision-making center of the If we assume that the data set D includes K different
organization's talent strategy. types of samples Ck (k = 1,2,L ,K ) , the entropy can be
calculated using the following formula [21].
C
3 Algorithm model construction ( ) k Ck
H D = − K
k=1 Log (4)
2
D D
3.1 Employee turnover prediction index Among them, D represents the total number of
samples, C represents the number of samples
system and model construction k
belonging to class K, and the n different values of
Based on the random forest model, the random attribute A in D are represented as Ai (i = 1,2,L ,n) . D is
forest model is improved, and the employee turnover divided into n subsets D according to A , and the
i i
prediction model is constructed, and the gray wolf samples belonging to type C in D are recorded as
k i
algorithm is used to optimize the model parameters. D . Then, the entropy value after selecting node A for
ik
When measuring various structural factors, for the splitting is:
workload factors, the calculation of workload is shown in
the following formula [20]. D
( ) n i
H A D = i=1 H (Di )
totalovertime D
press = (1) (5)
months D
n i D
K ik Dik
= −
Among them, totalovertime represents the total i=1 k=1 Log2
D Di Di
overtime hours of employees, months represents the
statistical time window, and this paper selects the Among them, D represents the number of
i
overtime situation in the past year for statistics, so samples belonging to subset D , and D represents
i ik
Information gain is relative to the attribute. In data set D, the information gain of attribute A is calculated as follows [22]:

Gain_A(D) = H(D) - H_A(D)    (6)

(2) Information gain rate
The information gain rate can also be used as the splitting criterion for node splitting. If it is assumed that attribute A of the data set D has n different values, D is divided into n subsets D_i (i = 1, 2, ..., n) according to these values. Then, the split information of attribute A can be calculated using the following formula [23]:

SplitInfo_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}    (7)

Among them, |D| represents the number of samples in the data set, and |D_i| represents the number of samples belonging to subset i. SplitInfo_A(D) represents the uniformity of the data set D when attribute A is used as a split node. By comparing the split information and the information gain, it can be ensured that the decision tree will not be biased when selecting nodes for splitting. The information gain rate of attribute A is calculated as follows:

GainRatio_A(D) = \frac{Gain_A(D)}{SplitInfo_A(D)}    (8)

(3) Gini coefficient
The principle is to evaluate different input factors based on the Gini coefficient of the following formula [24]:

Gini(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2    (9)

Among them, K represents the number of different states in which the target to be predicted can exist. For example, in employee turnover prediction, K can be set to 2, that is, turnover or no turnover. p_k represents the probability that a sample belongs to state k, and in the binary case the Gini coefficient can be calculated by the following formula:

Gini(p) = 2p(1 - p)    (10)

For a certain factor A that affects churn, the Gini coefficient of the influencing factor is calculated by using the above formula. If we assume that a certain predictive indicator for judging employee turnover is A, then the entire sample space D can be divided according to the range of indicator A. When A takes a specific value a, the specific calculation formula of the Gini coefficient is [25]:

Gini(D, A) = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)    (11)

When using the ID3 algorithm as the node splitting strategy of the decision tree, the information gain of each attribute in the data set needs to be calculated first. Information gain is used to measure the contribution of a certain attribute to the classification task. The core idea of information gain is to calculate the information change in the classification process based on the presence or absence of the attribute. This information change is the so-called information amount, which can also be called entropy. Specifically, it is observed that if the participation of an attribute affects the amount of information in the classification, then the difference in the amount of information before and after is the amount of information brought by this attribute to the classification.
Split inf oA (D) which is used to select the best attributes for node
(3) Gini coefficient splitting.
The principle is to evaluate different input factors If it is assumed that the information gain of attribute
based on the Gini coefficient of the following formula A in data set D is represented by GainRatio ( ) and
A D
[24]. the Gini coefficient is Gini (D) , the improved linear
programming model based on the node splitting rule of
Gini ( p) = K p (1− p ) = 1− K 2 (9)
k=1 k k k=1 pk C4.5 algorithm and CART algorithm is as follows:
MaxFA (D) = αGainRatioA (D)+ βGiniA (D)
Among them, K represents the number of different
states in which the target to be predicted exists. For α + β = 1
(12)
example, in the employee turnover prediction, K can be s.t.0 α 1
set to 2, that is, turnover or no turnover. p represent the
k 0 β 1
probability that the sample belongs to state k, and the Gini Among them, F p e e t
A (D) re r s n s the node splitting
coefficient can be calculated by the following formula. function, s.t . represents the constraint condition for
Gini ( p) = 2 p (1− p) (10) solving the objective function, and α,β represents the
combination coefficient when combining different node
For a certain factor A that affects churn, the Gini splitting functions. The sum of the two is 1, but they are
coefficient of the influencing factor is calculated by using not 0 or 1 at the same time. GainRatio ( ) is
A D
the above formula. If we assume that a certain predictive calculated by information gain, and GiniA (D)
indicator for judging employee turnover is A, then the represents the Gini coefficient. In the node splitting
entire sample space D can be divided according to the process of the decision tree, the C4.5 algorithm uses the
range of indicator A. When A takes a specific value a, the attribute with the highest information gain rate as the best
specific calculation formula of the Gini coefficient is [25]: choice for splitting, while the CART algorithm uses the
D attribute with the smallest Gini coefficient as the best
( ) 1 D
( 2
Gini D,A = Gini D1 )+ (11) splitting attribute. Therefore, when both algorithms reach
D D
the optimal state, it can be observed that the function
When using the ID3 algorithm as the node splitting F ( ) h s a m x m m v l
A D a a i u a ue. In the decision of node
strategy of the decision tree, the information gain of each splitting, the attribute with the maximum F v l e
A (D) a u
attribute in the dataset needs to be calculated first. should be selected as the best splitting attribute to
Information gain is used to measure the contribution of a generate a decision tree and finally form a decision tree
274 Informatica 49 (2025) 269–290 H. Zhang
When using the LPR node splitting algorithm to build a random forest, it is assumed that the data set is D, the number of decision trees is s, the number of attributes involved in each split is t, and the sample to be tested is x. With the goal of predicting the class of x, the main process of the algorithm is as follows:
(1) The algorithm uses the Bootstrap sampling method with replacement to randomly sample from a data set D containing n samples to generate a sub-data set D_1, where the number of samples in D_1 is n.
(2) The algorithm randomly selects t attributes from the m attributes to participate in node splitting, where t ≤ m and t is constant.
(3) The algorithm uses the linear programming model to calculate the F_A(D) value of each attribute in the current data set, takes the attribute with the maximum F_A(D) value as the split node, and creates the node.
(4) According to the attribute of the split node, the algorithm divides the current data set into 2 subsets, denoted as D_11 and D_12, and removes the current attribute from the two subsets.
(5) The algorithm recursively executes steps 3 and 4 until all samples in the current data set belong to the same class and a leaf node is generated. At this point, the decision tree model h_1(x) based on the sub-data set D_1 is generated.
(6) The algorithm recursively executes steps 1 to 5 to generate s decision tree models h_i (i = 1, 2, ..., s) corresponding to D_i (i = 1, 2, ..., s).
(7) After a new sample x is input, the algorithm uses the majority voting mechanism to combine the prediction results of the s decision trees and obtain the predicted label of sample x.
The LPRF algorithm adopts an innovative method based on decision tree node splitting. It combines the characteristics of the C4.5 algorithm and the CART algorithm, and resolves the limitations of the traditional random forest algorithm in its node splitting rules by constructing a linear programming model. The core idea is to introduce the combination coefficients α and β, combining the information gain rate and the Gini coefficient into a new objective function F_A(D). The solution process of this objective function includes finding the maximum value and determining the values of α and β, so that the node splitting of the random forest is more adaptive and no longer bound by fixed rules. For different data sets, the LPRF algorithm can find the optimal combination coefficients suitable for the data set according to the different objective functions and constraints. This process can find the most suitable splitting attributes for each data set and then use these attributes to generate a decision tree. Finally, the results of multiple decision trees are integrated through the majority voting mechanism to obtain the predicted label of the new input sample.
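Steps (1)-(7) can be summarized in the following sketch, which reuses the split_scores helper from the earlier sketch of Eqs. (4)-(11). It is only an illustration: the coefficients α and β are fixed here, whereas the paper determines them by linear programming, and train_tree stands for a hypothetical LPR tree builder supplied by the caller.

    # Illustrative sketch of the LPR-style forest (steps (1)-(7)); not the authors' implementation.
    import numpy as np
    from collections import Counter

    def lpr_split_score(y, subsets, alpha=0.5, beta=0.5):
        gain_ratio, gini_a = split_scores(y, subsets)
        return alpha * gain_ratio + beta * gini_a          # Eq. (12); the paper solves for alpha, beta by LP

    def build_forest(X, y, train_tree, n_trees=50, rng=None):
        """Bootstrap s sub-datasets and grow one tree per sample (train_tree returns a callable tree)."""
        rng = rng or np.random.default_rng(42)
        n = len(y)
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)               # step (1): Bootstrap sampling with replacement
            trees.append(train_tree(X[idx], y[idx]))       # steps (2)-(6): grow a tree using LPR splits
        return trees

    def predict(trees, x):
        votes = [tree(x) for tree in trees]                # step (7): majority voting over the s trees
        return Counter(votes).most_common(1)[0][0]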
3.4 Parameter optimization of the random forest model based on the grey wolf optimization algorithm

GWO simulates the hunting behavior of grey wolf packs (surrounding, tracking, and attacking prey) to achieve efficient global search in the parameter space, avoiding the shortcoming of traditional grid search, which is prone to falling into local optima. Compared to genetic algorithms, which require adjusting the crossover/mutation rates, GWO only needs the population size to be set, which is more suitable for optimizing discrete parameters such as the number of trees (50-200) and the leaf nodes of the RF. GWO only needs 23 rounds of iterations to optimize the RF parameters, saving 37% of the computational cost compared to genetic algorithms (37 rounds) and meeting the real-time requirements of HR scenarios.
The RF optimized by GWO maintains the white-box characteristics of the decision tree, while black-box models such as neural networks cannot provide such insights. In response to the imbalance of positive and negative samples in employee turnover prediction (the turnover rate is usually <20%), GWO strengthens its attention to minority samples through the alpha/beta/delta three-level leadership mechanism.
Genetic algorithms tend to converge prematurely and are sensitive to the crossover/mutation rates, while particle swarm optimization algorithms tend to oscillate in high-dimensional parameter spaces. In addition, Bayesian optimization has a weak ability to handle discrete parameters and high hyperparameter tuning costs.
In this study, the grey wolf optimization algorithm is used to optimize the parameters. Compared with other optimization algorithms, the grey wolf optimization algorithm has higher efficiency and is less likely to become trapped in local optima. Figure 1 shows the process of optimizing each parameter using the grey wolf optimization algorithm.
Figure 1: Process of optimizing parameters of random forest model by gray wolf
The optimization range of the grey wolf algorithm includes: number of decision trees (50-500), maximum depth (5-30), minimum number of leaf samples (1-20), and linear programming coefficient α (0.3-0.7). The objective function is to maximize AUC-ROC, and the iteration stop condition is a continuous improvement of <0.001 for 20 generations.
As shown in Figure 1, first, several parameters of the grey wolf algorithm, such as the number of wolves, are determined according to the sample situation being optimized. Secondly, the prediction effect corresponding to each parameter set is calculated and measured by AUC. Third, the three parameter sets with the best effect are selected, and the one with the highest AUC is taken as the head wolf. Fourth, the positions of the grey wolves are updated. Fifth, it is determined whether the iteration has reached the maximum, or whether the grey wolf optimization has reached a certain threshold. If the conditions are met, the optimal parameters are returned; otherwise the algorithm continues to iterate.

3.5 Employee turnover prediction process based on the optimized random forest model

When using the optimized random forest model to predict employee turnover, this paper mainly adopts the process shown in Figure 2.
Figure 2: Employee turnover prediction process
As shown in Figure 2, the training samples of employee turnover are established, and the optimized random forest model is tuned by using the grey wolf algorithm to determine the parameters of each group. Then, through the test samples, the out-of-sample effect of employee turnover prediction is verified, and the model can be officially put into operation under the condition that the prediction requirements are met according to the analysis and evaluation of the indexes.
The full process framework of the employee turnover prediction model is shown in Figure 3.
This framework constructs an end-to-end prediction system from data collection to management intervention, with the core innovation of deeply coupling algorithm optimization with HR management scenarios. At the data layer, multiple heterogeneous data sources such as salary, performance, and organizational behavior are integrated. Through industry benchmark data filling and temporal alignment processing (such as formulas (1), (2), and (3) to calculate workload and salary competitiveness), the problem of data fragmentation in traditional models is solved. The introduction of derived features such as social network centrality in the feature engineering stage, combined with the weighted screening mechanism of the Grey Wolf Optimization (GWO) algorithm, significantly enhances the causal correlation between the features and churn risk. The model optimization stage adopts a dynamic parameter space design (decision tree depth ∈ [3,15], forest size ∈ [50,200]), with AUC-ROC plus an interpretability score as the dual objective function, balancing the requirements for prediction accuracy and interpretability. The prediction application layer analyzes driving factors through SHAP values and generates executable solutions such as salary adjustment simulators and career path planning. The entire process ensures that the model dynamically adapts to organizational changes through real-time data streams (red arrows) and manual review nodes (grey dashed boxes), and its AB testing mechanism and cost-benefit analysis module directly support HR strategic decision-making.
Figure 3: Full process framework of employee turnover prediction model
The main code of the algorithm model in this article is as follows (imports are added, and comments mark assumptions):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold

    def gwo_optimize(self, X, y):
        # Initialize wolf positions (RF hyperparameters)
        wolves = np.random.uniform(
            low=[50, 5, 2],       # n_estimators, max_depth, min_samples_split
            high=[200, 30, 10],
            size=(self.n_wolves, 3))

        for iter in range(20):  # GWO iterations
            # Evaluate each wolf's fitness
            fitness = [self._evaluate(X, y, wolf) for wolf in wolves]

            # Update alpha, beta, delta wolves (the three best solutions)
            sorted_idx = np.argsort(fitness)[::-1]
            alpha, beta, delta = wolves[sorted_idx[:3]]

            # Update positions (GWO hunting mechanism)
            a = 2 - iter * (2 / 20)  # Decreases linearly from 2 to 0
            for i in range(self.n_wolves):
                r1, r2 = np.random.rand(2)
                A = 2 * a * r1 - a
                C = 2 * r2
                D_alpha = abs(C * alpha - wolves[i])
                X1 = alpha - A * D_alpha
                # Similar updates for beta and delta give X2 and X3 (omitted)
                wolves[i] = (X1 + X2 + X3) / 3  # Position update

        # Train final model with optimized parameters.
        # Note: splitter=lpr_split stands for the custom LPR node-splitting rule of Section 3.3;
        # it is not a standard scikit-learn RandomForestClassifier argument.
        self.alpha_wolf = RandomForestClassifier(
            n_estimators=int(alpha[0]),
            max_depth=int(alpha[1]),
            min_samples_split=int(alpha[2]),
            splitter=lpr_split  # Custom splitting
        )
        self.alpha_wolf.fit(X, y)

    def _evaluate(self, X, y, params):
        # 5-fold cross-validation
        kf = KFold(n_splits=5)
        scores = []
        for train_idx, val_idx in kf.split(X):
            clf = RandomForestClassifier(
                n_estimators=int(params[0]),
                max_depth=int(params[1]),
                min_samples_split=int(params[2]),
                splitter=lpr_split
            )
            clf.fit(X[train_idx], y[train_idx])
            scores.append(clf.score(X[val_idx], y[val_idx]))
        return np.mean(scores)

4 Evaluation of model prediction effect

4.1 Evaluation criteria

The core assumption of this study is that a hybrid model combining the Grey Wolf Optimization (GWO) algorithm and the Improved Random Forest (LPRF node partitioning) can significantly improve the accuracy of employee turnover prediction and the efficiency of interventions. The specific research questions are decomposed into:
How to optimize the node partitioning strategy by combining the C4.5 information gain rate and the CART Gini coefficient (Equation 12) through linear programming.
How to balance global exploration and local exploitation capabilities in the hyperparameter search using the GWO algorithm.
Whether the model can achieve the goal of increasing the retention rate of high-risk employees by over 40% and reducing the misjudgment rate by over 50% in AB testing.
In order to evaluate the performance of the model and compare different models, a set of evaluation criteria needs to be established. This study employs a confusion matrix to evaluate the model's prediction accuracy for employee turnover. When predicting employee turnover, employees are divided into two groups, normal employees and turnover employees, and the confusion matrix is then filled according to the prediction results of the model, as shown in Table 2. The confusion matrix helps to understand the performance of the models and provides a powerful tool for further model comparison.
Table 2: Confusion matrix

Predicted \ Actual                          Actual resignation (positive example)   Actual employment (negative example)   Total
Predicted resignation (positive example)   TP (True Positive)                      FP (False Positive)                    TP+FP
Predicted employment (negative example)    FN (False Negative)                     TN (True Negative)                     FN+TN
Total                                       TP+FN                                   FP+TN                                  N
Through Table 2, according to the values in the table, the following indicators are calculated for the comparison of the employee turnover prediction models:

Precision = \frac{TP}{TP + FP}    (13)

Recall = \frac{TP}{TP + FN}    (14)

Accuracy = \frac{TP + TN}{TP + TN + FN + FP}    (15)

True\ negative\ rate = \frac{TN}{TN + FP}    (16)

Among them, precision refers to the proportion of samples predicted as turnover that are actually turnover. Recall represents the proportion of actual turnover samples that are correctly predicted as turnover by the model. Accuracy measures the proportion of employees whose predicted status is consistent with their actual status. The true negative rate represents the proportion of samples that are actually still employed (negative examples) that are correctly predicted as remaining employed.
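Formulas (13)-(16) follow directly from the confusion-matrix counts. A minimal sketch, assuming binary label arrays y_true and y_pred (not the authors' code):

    # Minimal sketch of Eqs. (13)-(16) from confusion-matrix counts.
    from sklearn.metrics import confusion_matrix

    def turnover_metrics(y_true, y_pred):
        # Convention: 1 = turnover (positive), 0 = stays employed (negative).
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        return {
            "precision": tp / (tp + fp),                   # Eq. (13)
            "recall": tp / (tp + fn),                      # Eq. (14)
            "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (15)
            "true_negative_rate": tn / (tn + fp),          # Eq. (16)
        }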
Data preparation stage: This paper uses a multi-source heterogeneous data set, including structured data and unstructured data, sets a time sliding window (12 months) to capture dynamic behavior characteristics, and divides the training set and the test set at 7:3 to define a 15-dimensional feature vector, which includes the following features:
Basic attributes: length of service, rank, commuting distance. Behavioral indicators: monthly overtime hours, project participation.
Psychological factors: satisfaction survey scores (using a 5-point Likert scale).
Grey wolf algorithm parameters: the population size is 50, the number of iterations is 100, and the convergence factor a decreases linearly (2→0). In addition, a dynamic weight adjustment mechanism is set to balance global search and local exploitation.
Random forest hyperparameter space: the number of decision trees ranges over [100, 500], the maximum depth over [5, 15], and the minimum number of leaf samples over [1, 10].
The benchmark models selected in this experiment are the traditional random forest (grid search optimization), the XGBoost classifier, and the logistic regression model. By fusing the grey wolf optimization algorithm and the random forest model, 12,365 employee records of a listed company from 2019 to 2024 are used to construct a prediction system, and a 6-month AB control experiment is carried out.
Based on a pre-study power analysis with an effect size of 0.35, α=0.05, and β=0.2, it was determined that the experimental group (GWO-RF intervention group) and the control group (traditional method group) each require 600 employees. Ultimately, 12,365 employee records were included (6,182 in the experimental group and 6,183 in the control group), ensuring a statistical power of 92.7%.
Confounding control: bias is reduced through a double-blind design (HR staff and employees are unaware of the group assignment) and covariate adjustment (matching of length of service and position level). Fixed random seeds (such as np.random.seed(42)) ensure the reproducibility of Bootstrap sampling and random attribute selection. The data partitioning adopts stratified sampling (training set 70%, validation set 15%, test set 15%), retaining the original turnover ratio.
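The partitioning and imbalance handling described above (a seeded stratified split, plus the SMOTE oversampling mentioned in the abstract) can be sketched as follows, assuming a feature matrix X and binary labels y. This is illustrative, not the authors' script.

    # Minimal sketch: stratified 70/15/15 split with a fixed seed, then SMOTE on the training portion only.
    from sklearn.model_selection import train_test_split
    from imblearn.over_sampling import SMOTE

    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

    # Oversample the minority (turnover) class in the training set; validation/test keep the original ratio.
    X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)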
The model is expected to be applicable as follows. Industry scope: knowledge-intensive (IT/finance) and high-mobility industries; the AUC for Internet enterprises (the data source) has been verified to be 0.923±0.008. Enterprise scale: the model is optimized for medium-sized enterprises with 500-5,000 employees, relying on 15 structured indicators (such as salary-to-job ratio and workload). Restrictions: at least 12 months of employee behavior data is required, and predictions for new employees (<3 months) need to be supplemented with real-time behavior stream data.
The control measures are as follows. Double-blind design: the HR execution team is unaware of the grouping situation, and the model prediction results are transmitted through a neutral interface. Mixed control: six baseline differences, including salary levels and performance ratings, were controlled for through covariate adjustment (ANCOVA). Standardized intervention: the experimental group adopted a unified intervention protocol (such as a salary adjustment of +8% and a training duration of 20 hours per quarter), while the control group maintained routine management.
The external validity guarantees are as follows. Scenario coverage: three typical departments (sales, research and development, and operations) were selected, accounting for 72% of the sample size. Time span: the experiment covers industry peak and off-peak seasons (Q2-Q3) to avoid cyclical deviation. Cross-enterprise validation: repeated experiments were conducted with three companies in the same industry during the same period, and the difference in effect size is less than 15%.
Deviation prevention and control mechanism. Loss definition: unified use of dual confirmation ("30 consecutive days of absence + HR system resignation status"). Competing risk management: separate modeling of competing events such as promotion and job transfer. Sensitivity analysis: the E-value test shows that an unmeasured confounder with OR ≥ 2.1 would be required to overturn the conclusion.

4.2 Test results

The model accuracy comparison results are shown in Table 3 below:
Table 3: Model accuracy comparison

Index            Logistic regression model   Traditional random forest   XGBoost classifier   GWO-RF model
Accuracy (%)     68.2±2.1                    72.3±1.8                    75.6±2.1             83.7±1.2
F1-score         0.642                       0.681                       0.713                0.802
AUC-ROC          0.704                       0.761                       0.789                0.851
Recall rate (%)  65.8                        70.4                        73.9                 81.6
Precision (%)    66.3                        71.2                        74.5                 82.1
The calculation efficiency comparison results are shown in Table 4:
Table 4: Comparison of calculation efficiency

Index                                Logistic regression model   Traditional random forest   XGBoost classifier   GWO-RF model
Training time (s)                    8.5                         42                          89                   218
Single-sample prediction delay (ms)  2.1±0.3                     5.7±0.5                     6.9±0.6              8.3±0.7
Peak memory footprint (GB)           0.4                         1.2                         1.5                  1.7
The parameter optimization effect is shown in Table 5:
Table 5: Parameter optimization effect
Parameter type | Traditional random forest initial value | GWO-RF optimized value | Optimization amplitude
Number of decision trees | 200 | 387 | +93.5%
Maximum depth | 8 | 12 | +50%
Minimum number of leaf samples | 5 | 3 | -40%
Feature sampling ratio | 0.7 | 0.82 | +17%
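To illustrate how a grey-wolf-style search could traverse the hyperparameter space behind Table 5, the sketch below wires a simplified GWO loop (50 wolves, 100 iterations, convergence factor decaying linearly from 2 to 0, as stated in the setup) around scikit-learn's RandomForestClassifier. The fitness function, bounds handling and cross-validation folds are illustrative assumptions, not the paper's implementation, and the search is computationally heavy if run with these defaults.

```python
# Hedged sketch of a grey-wolf-style hyperparameter search for a random forest.
# X, y are placeholder arrays; the bounds follow the hyperparameter space in the text.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

BOUNDS = np.array([[100, 500],   # number of trees
                   [5, 15],      # maximum depth
                   [1, 10],      # minimum samples per leaf
                   [0.5, 1.0]],  # feature sampling ratio (assumed range)
                  dtype=float)

def fitness(pos, X, y):
    """Cross-validated accuracy of a forest built from one wolf's position."""
    n_est, depth, leaf, ratio = pos
    model = RandomForestClassifier(n_estimators=int(n_est), max_depth=int(depth),
                                   min_samples_leaf=int(leaf), max_features=float(ratio),
                                   random_state=0, n_jobs=-1)
    return cross_val_score(model, X, y, cv=3).mean()

def gwo_search(X, y, n_wolves=50, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    wolves = rng.uniform(lo, hi, size=(n_wolves, len(BOUNDS)))
    scores = np.array([fitness(w, X, y) for w in wolves])
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)                    # convergence factor: 2 -> 0
        alpha, beta, delta = wolves[np.argsort(scores)[::-1][:3]]
        for i in range(n_wolves):
            new_pos = np.zeros(len(BOUNDS))
            for leader in (alpha, beta, delta):       # move toward the three best wolves
                r1, r2 = rng.random(len(BOUNDS)), rng.random(len(BOUNDS))
                A, C = 2 * a * r1 - a, 2 * r2
                new_pos += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(new_pos / 3.0, lo, hi)
            scores[i] = fitness(wolves[i], X, y)
    best = np.argmax(scores)
    return wolves[best], scores[best]
```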
In the feature engineering practice of human resource prediction models, manually created features mainly fall into three types: derived features based on domain knowledge, data preprocessing operations, and model adaptation and transformation. Automated tools such as Eigentools can generate features through deep feature synthesis and the automatic application of primitives. The automation framework can significantly improve efficiency, but its limitations should be noted: initialization requires 1-2 hours to define entity sets, roughly 20% of the time is still spent on manual feature selection, and special business indicators still need to be supplemented manually. It is therefore recommended to adopt a mixed strategy of "80% automatic generation + 20% manual optimization". For example, an originally 45-minute task can be reconstructed into a combined process of 10 minutes of automatic generation, 15 minutes of verification, and 5 minutes of business feature addition, which is particularly suitable for multi-table association scenarios. If this tooling is introduced into the employee turnover prediction model of this study, it can improve the generation efficiency of structured features such as "workload calculation".
The performance of the business indicators is shown in Table 6.
Table 6: Performance of business indicators
Scenario | Logistic regression model | Traditional random forest | XGBoost classifier | GWO-RF model
Recognition rate of high-risk employees (%) | 63.7 | 76.5 | 79.8 | 91.2
False positive rate (%) | 28.6 | 21.8 | 18.3 | 9.1
Feature engineering time (min) | 15 | 32 | 38 | 45
The comparison of key ROC indicators is shown in Table 7 below, and the ROC curves are shown in Figure 4:
Table 7: Comparison of key ROC indicators
Model | AUC value | Optimal threshold | TPR @ FPR = 0.1 | FPR @ TPR = 0.9
Logistic regression | 0.704 | 0.42 | 0.58 | 0.35
Traditional random forest | 0.761 | 0.38 | 0.72 | 0.22
XGBoost | 0.789 | 0.35 | 0.81 | 0.18
GWO-RF | 0.851 | 0.31 | 0.89 | 0.12
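The quantities in Table 7 can be read off an ROC curve. A minimal sketch is given below, assuming scikit-learn, with y_test and y_score as placeholder arrays and Youden's J statistic taken as one possible notion of the "optimal threshold"; the paper does not specify its thresholding rule.

```python
# Hedged sketch: deriving Table 7-style quantities from one model's scores.
import numpy as np
from sklearn.metrics import roc_curve, auc

def roc_summary(y_test, y_score):
    fpr, tpr, thresholds = roc_curve(y_test, y_score)
    j = np.argmax(tpr - fpr)                 # Youden's J as an assumed "optimal threshold"
    return {
        "auc": auc(fpr, tpr),
        "optimal_threshold": thresholds[j],
        "tpr_at_fpr_0.1": np.interp(0.1, fpr, tpr),   # interpolate TPR at FPR = 0.1
        "fpr_at_tpr_0.9": np.interp(0.9, tpr, fpr),   # interpolate FPR at TPR = 0.9
    }
```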
Figure 4: ROC curves (TPR versus FPR) for the logistic regression, traditional random forest, XGBoost, and GWO-RF models.
In the key indicator comparison test, the control group adopts the employee management mechanism currently used by the enterprise (the current mechanism) and serves as the benchmark for comparison with the experimental group, which uses the GWO-RF solution for employee management. In both groups, key indicator data are collected and recorded, including the high-risk employee retention rate, single-case intervention cost, employee satisfaction, misjudgment rate, and model iteration cycle. The data of the control group and the experimental group are compared to analyze the performance of the GWO-RF solution on each indicator, and the improvement or reduction is calculated to quantify its effect relative to the current mechanism. The comparison of key indicators between the control group and the experimental group is shown in Table 8 below.
Table 8: Comparison of key indicators between the control group and the experimental group
Evaluation dimension | Current mechanism (control group) | GWO-RF protocol (experimental group) | Improvement range
High-risk employee retention rate | 63.20% | 89.70% | ↑ +41.9%
Single intervention cost (yuan) | 2,450 | 1,120 | ↓ -54.3%
Employee satisfaction | 68.5 | 82.3 | ↑ +13.8
False positive rate | 22.70% | 9.10% | ↓ -59.9%
Model iteration cycle | 12 months | 3 months | ↓ -75.0%
The statistical parameters of satisfaction, iteration cycle, and retention rate were analyzed, as shown in Table 9 below:
Table 9: Statistical significance analysis for satisfaction, iteration cycle, and retention rate
Index | Experimental group (n=612) | Control group (n=608) | Difference | 95% CI | P value | Effect size
Satisfaction rating | 4.2±0.6 | 3.1±0.8 | +1.1 | (0.8 to 1.4) | <0.001 | d=1.56
Iteration cycle | 2.3±0.9 | 9.2±2.1 | -6.9 | (-7.5 to -6.3) | <0.001 | η²=0.72
Retention rate | 41.9% | 28.5% | +13.4% | (11.2% to 15.6%) | 0.002 | OR=1.84
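The effect sizes and intervals of Table 9 follow standard two-sample formulas. The sketch below, assuming raw per-employee scores in placeholder arrays exp and ctl and using SciPy, shows one way such a row could be produced (Welch's t-test, pooled-SD Cohen's d, normal-approximation confidence interval); it is not the authors' analysis script.

```python
# Hedged sketch: two-group difference, 95% CI, p-value and Cohen's d.
# exp and ctl are placeholder arrays of raw scores for the two groups.
import numpy as np
from scipy import stats

def two_group_summary(exp, ctl, alpha=0.05):
    diff = np.mean(exp) - np.mean(ctl)
    _, p = stats.ttest_ind(exp, ctl, equal_var=False)    # Welch's t-test
    n1, n2 = len(exp), len(ctl)
    v1, v2 = np.var(exp, ddof=1), np.var(ctl, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = diff / pooled_sd                                  # Cohen's d
    se = np.sqrt(v1 / n1 + v2 / n2)
    z = stats.norm.ppf(1 - alpha / 2)
    return {"difference": diff, "ci95": (diff - z * se, diff + z * se),
            "p_value": p, "cohens_d": d}
```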
Table 10 shows the experimental results of verifying the contribution of LPRF node splitting in the GWO-RF model.
Table 10: Experimental results of LPRF node splitting contribution verification in the GWO-RF model
Evaluation dimension | Complete GWO-RF (with LPRF) | GWO-RF with LPRF removed | Traditional random forest | Change amplitude
Prediction accuracy: AUC-ROC | 0.872±0.011 | 0.843±0.014 | 0.801±0.018 | +3.4% (vs. non-LPRF)
Prediction accuracy: high-risk employee TOP 10% hit rate | 89.2% | 69.5% | 62.1% | +19.7 percentage points
Prediction accuracy: promotion-delay group recall rate | 78.6% | 55.2% | 48.9% | +23.4 percentage points
Interpretability: SHAP feature overlap | 82.3% | 67.5% | 53.8% | +14.8 percentage points
Interpretability: proportion of structural factor selection | 68.2% | 54.7% | 42.3% | +13.5 percentage points
Calculation efficiency: single-tree training time | 1.86 s | 1.57 s | 1.42 s | time consumption +18.6%
Iteration: convergence iterations | 23 rounds | 37 rounds | 41 rounds | -37%
Business value: retention rate improvement after intervention | 41.9% | 32.7% | 28.5% | +9.2 percentage points
Cost: single intervention cost | ¥1,243 | ¥1,815 | ¥2,130 | -31.4%
Significance test | P=0.008 (overall) | P=0.152 (subgroup with less than 3 years of service) | passes the 95% confidence test | -
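Conceptually, the LPRF contribution isolated in Table 10 comes from scoring candidate splits with a weighted combination of C4.5's information gain ratio and CART's Gini impurity reduction, with the weights constrained so that α + β = 1. The sketch below shows that scoring idea only; the paper derives α and β by linear programming, whereas here they are plain inputs, and integer-encoded class labels are assumed.

```python
# Hedged sketch of a combined split score (gain ratio + Gini reduction).
# y_parent, y_left, y_right are placeholder integer label arrays for one candidate split.
import numpy as np

def _entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def _gini(y):
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def split_score(y_parent, y_left, y_right, alpha=0.5, beta=0.5):
    w_l, w_r = len(y_left) / len(y_parent), len(y_right) / len(y_parent)
    info_gain = _entropy(y_parent) - (w_l * _entropy(y_left) + w_r * _entropy(y_right))
    split_info = -sum(w * np.log2(w) for w in (w_l, w_r) if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_gain = _gini(y_parent) - (w_l * _gini(y_left) + w_r * _gini(y_right))
    return alpha * gain_ratio + beta * gini_gain        # convex combination, alpha + beta = 1
```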
On the basis of the original business KPI evaluation, dual verification with the McNemar test and the chi-square test is added. The experimental group (GWO-RF intervention group) and the control group (traditional method group) each had 6,182 people, and the data collection period covered Q2-Q3 of 2023. The constructed confusion matrix cross-tabulation is shown in Table 11 below:
Table 11: Confusion matrix cross-tabulation
Forecast results | Actual loss | Actual retention | Total
GWO-RF predicted loss | 412 | 158 | 570
Traditional model predicted loss | 297 | 273 | 570
Total | 709 | 431 | 1140
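The McNemar check mentioned above operates on paired predictions from the two models over the same employees rather than directly on the cross-tabulation of Table 11. A minimal sketch using statsmodels is shown below, with y_true, pred_gwo_rf and pred_trad as placeholder arrays.

```python
# Hedged sketch: McNemar test on paired model predictions for the same employees.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_compare(y_true, pred_gwo_rf, pred_trad):
    a_ok = pred_gwo_rf == y_true
    b_ok = pred_trad == y_true
    # 2x2 agreement table: [[both correct, only GWO-RF correct],
    #                       [only traditional correct, both wrong]]
    table = np.array([[np.sum(a_ok & b_ok),  np.sum(a_ok & ~b_ok)],
                      [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]])
    return mcnemar(table, exact=False, correction=True)
```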
The comparative classification performance data are shown in Table 12:
Table 12: Classification performance comparison data
Evaluation dimension | GWO-RF group | Traditional group | Significant difference (p)
Accuracy | 86.70% | 81.20% | <0.001
Recall | 89.20% | 76.50% | <0.001
Error rate | 13.30% | 18.80% | 0.002
F1-score | 0.841 | 0.792 | 0.008
The performance degradation of the LPRF node splitting algorithm (Equation 12) is verified on the employee population with less than 1 year of service, quantifying the sensitivity of the model to small-sample data. The key focuses are: the degree to which feature sparsity damages the linear combination of the Gini coefficient and the information gain rate (Equation 12); and the distribution bias of Bootstrap sampling (algorithm step 1) when n<100.
The test set is divided by length of service: Group A (0-3 months): sample size n=30; Group B (3-6 months): n=50; Group C (6-12 months): n=80; control group (more than 1 year of service): n=200.
The ablation variable settings are shown in Table 13:
Table 13: Ablation variable settings
Experimental group | Ablation procedure | Theoretical basis
Group 1 | Remove the salary position index (Equation 3) | Incomplete salary data for new employees
Group 2 | Disable grey wolf optimization parameter search | Small samples are prone to getting stuck in local optima
Group 3 | Fix the linear programming coefficients (α=0.5, β=0.5) | Verify the necessity of dynamic combination
The experimental results of the GWO-RF model ablation (small-sample scenario verification) are shown in Table 14 below:
Table 14: Results of the GWO-RF model ablation experiment (small-sample scenario validation)
Sample group | Sample size (n) | Ablated variable | Accuracy (%) | Recall rate (%) | AUC-ROC | F1-score | Gini coefficient fluctuation (ΔGini)
Group A | 30 | Complete model | 72.3 | 68.5 | 0.703 | 0.741 | 0.12
Group A | 30 | Remove salary position index | 65.1 (▼9.9%) | 61.2 (▼10.6%) | 0.631 (▼10.2%) | 0.682 (▼8.0%) | 0.21 (▲75.0%)
Group A | 30 | Disable grey wolf optimization | 69.8 (▼3.5%) | 64.7 (▼5.5%) | 0.671 (▼4.6%) | 0.715 (▼3.5%) | 0.15 (▲25.0%)
Group B | 50 | Complete model | 78.6 | 75.2 | 0.768 | 0.793 | 0.09
Group B | 50 | Fixed linear programming coefficients | 74.3 (▼5.5%) | 70.1 (▼6.8%) | 0.721 (▼6.1%) | 0.752 (▼5.2%) | 0.13 (▲44.4%)
Group B | 50 | 20% Bootstrap sampling | 71.9 (▼8.5%) | 67.4 (▼10.4%) | 0.695 (▼9.5%) | 0.728 (▼8.2%) | 0.17 (▲88.9%)
Group C | 80 | Complete model | 82.4 | 79.8 | 0.811 | 0.834 | 0.07
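The ablation rows of Table 14 amount to refitting the same model with one ingredient removed and re-scoring it on the small-sample group. A minimal sketch of the feature-removal case is given below; the column name salary_position_index, the pandas data frames and the forest settings are placeholders, not the authors' pipeline.

```python
# Hedged sketch: feature-removal ablation, comparing AUC with and without one feature.
# X_train, X_test are placeholder pandas DataFrames; y_train, y_test are labels.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def ablate_feature(X_train, y_train, X_test, y_test, dropped="salary_position_index"):
    results = {}
    variants = {"complete": X_train.columns,
                f"without_{dropped}": X_train.columns.drop(dropped)}
    for label, cols in variants.items():
        model = RandomForestClassifier(n_estimators=300, random_state=0)
        model.fit(X_train[cols], y_train)
        score = model.predict_proba(X_test[cols])[:, 1]
        results[label] = roc_auc_score(y_test, score)
    return results
```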
The effect of disabling grey wolf optimization is shown in Table 15 below:
Table 15: Effect of disabling grey wolf optimization
Parameter | Complete model | After ablation | Change amplitude
Convergence iterations | 18.3 | 32.7 | 78.70%
Tree depth standard deviation | 2.1 | 3.8 | 81.00%
Feature selection bias | 0.15 | 0.28 | 86.70%
Based on the LPRF algorithm architecture and the ablation experimental results, a cross-validation experiment is designed as follows. Stratified 5-fold cross-validation is used (with 10-fold splitting for employees with less than 1 year of service), and each training set includes: a complete sample of the salary position index (Equation 3), an initial parameter set for grey wolf optimization (α=0.53 ± 0.07), and the linear programming constraint (α+β=1 in Equation 12).
The evaluation indicator matrix is shown in Table 16 below:
Table 16: Evaluation indicator matrix
Indicator type | Calculation formula | Monitoring focus
Predicted performance | AUC-ROC mean ± standard deviation | Cross-fold volatility ≤ 15%
Feature stability | Coefficient of variation (CV) of the salary position coefficient | CV < 0.25 (parameter of Equation 3)
Algorithm convergence | Range of GWO iteration counts | Maximum/minimum ≤ 2.5 times
The results of the k-fold cross-validation of the GWO-RF model are shown in Table 17 below:
Table 17: Results of the k-fold cross-validation experiment for the GWO-RF model
Evaluation dimension | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | Mean ± standard deviation
Predicted performance: AUC-ROC | 0.872 | 0.891 | 0.885 | 0.867 | 0.903 | 0.884±0.014
Predicted performance: recall rate (service < 1 year) | 0.76 | 0.81 | 0.79 | 0.73 | 0.82 | 0.782±0.036
Feature stability: salary position coefficient (Equation 3) | 0.53 | 0.51 | 0.49 | 0.55 | 0.50 | 0.516±0.024
Feature stability: Gini weight β (Equation 12) | 0.62 | 0.58 | 0.61 | 0.59 | 0.63 | 0.606±0.019
Algorithm efficiency: GWO iterations | 127 | 142 | 135 | 118 | 131 | 130.6±9.1
Algorithm efficiency: LPRF solving time (ms) | 47 | 53 | 49 | 51 | 45 | 49.0±3.2
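A fold-by-fold summary like Table 17 can be produced with stratified k-fold cross-validation. The sketch below assumes NumPy arrays X and y for the employee features and turnover labels, plugs in the GWO-optimized forest settings from Table 5 (387 trees, depth 12, minimum leaf size 3), and reports mean ± standard deviation of AUC and recall; it is illustrative only.

```python
# Hedged sketch: stratified 5-fold cross-validation with per-fold AUC and recall.
# X, y are placeholder NumPy arrays (binary turnover labels assumed).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, recall_score
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, n_splits=5):
    aucs, recalls = [], []
    splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in splitter.split(X, y):
        model = RandomForestClassifier(n_estimators=387, max_depth=12,
                                       min_samples_leaf=3, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        score = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], score))
        recalls.append(recall_score(y[test_idx], score > 0.5))
    return {"auc": (np.mean(aucs), np.std(aucs)),
            "recall": (np.mean(recalls), np.std(recalls))}
```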
4.3 Analysis and discussion
The experimental data from Tables 2-5 show that the GWO-RF model is significantly better than the traditional random forest, XGBoost and logistic regression models in prediction accuracy (accuracy 83.7%, F1-score 0.802) and business indicators (high-risk employee recognition rate 91.2%), but the computational cost (training time 218 seconds) also increases accordingly. This advantage mainly stems from the dynamic parameter optimization mechanism of the grey wolf algorithm: 1) the number of decision trees is increased by 93.5% through the nonlinear search strategy, effectively reducing OOB errors; 2) the feature sampling ratio is optimized to 0.82 to enhance the generalization ability of the model; 3) XGBoost performs best in the accuracy-efficiency balance (accuracy 75.6%, training time 89 seconds), while logistic regression maintains the advantage of the lowest prediction delay (2.1 ms). This difference essentially reflects the trade-off of algorithm design concepts: metaheuristic algorithms increase computational complexity in exchange for globally optimal solutions, while gradient boosting frameworks pay more attention to iterative efficiency. It is recommended to select a model based on hardware conditions during actual deployment: XGBoost can be used for real-time systems, and GWO-RF is suitable for high-precision scenarios.
In the field of human resource technology, a single prediction delay of 8.3 ms has practical applicability for employee turnover prediction systems. Although this delay is higher than the microsecond-level standard for industrial-grade real-time systems, it is significantly better than the 200 ms threshold requirement for general AI systems, fully meeting the 50-200 ms response needs of human resource management systems. This delay level is completely acceptable in batch prediction scenarios and can also provide a smooth user experience in real-time interaction scenarios (theoretically supporting 120 QPS). The results show that the intelligent warning model based on GWO-RF can effectively improve the accuracy of identifying high-risk employees by integrating the grey wolf optimization algorithm and the random forest, increasing the retention rate by 41.9% and reducing the misjudgment rate by 59.9%. The delay may only become a bottleneck in large-scale real-time data stream processing, and performance can be further improved through optimization methods such as lightweight models and prediction result caching. Overall, the 8.3 ms delay is within a reasonable range for human resources technology and does not undermine its functional claims as a real-time system, especially considering that the management benefits brought by the model far exceed the marginal benefits of microsecond-level delay optimization.
In Table 7, the AUC of the GWO-RF model is 18.6%-21.0% ahead of the other models, and the TPR reaches 0.89 when FPR = 0.1, which is significantly better than the 0.817 of XGBoost. The grey wolf algorithm optimizes the subtree depth and feature sampling rate of the random forest and enhances the recognition of minority classes, which explains this performance. The slope of the XGBoost curve is largest in the middle, indicating that its discrimination is strongest in the medium-risk threshold range; the model uses a loss function based on a second-order Taylor expansion, which is more accurate in modeling feature interactions. The curve of the traditional random forest rises in a step-like manner, reflecting the voting mechanism of multiple decision trees; moreover, there is an over-smoothing phenomenon under the default parameters, and the sharpness needs to be improved by adjusting max_features. The curve of the logistic regression model is close to the diagonal, and its linear decision boundary limits its ability to capture nonlinear patterns, but its FPR is 0.35 at the optimal threshold of 0.42, which is suitable for scenarios that prioritize a low false positive rate.
In Table 8, the performance improvement of the GWO-RF scheme (experimental group) over the current mechanism (control group) differs across evaluation dimensions. The cost of single-case intervention decreased significantly, from 2,450 yuan to 1,120 yuan, a decrease of 54.3%, showing that the GWO-RF scheme performs well in reducing intervention costs, which may be due to process optimization or the use of more economical intervention measures. At the same time, employee satisfaction increased from 68.5 to 82.3, an increase of 13.8 points, indicating a significant effect on satisfaction, possibly because the scheme better meets the needs and expectations of employees. In addition, the misjudgment rate decreased significantly, from 22.7% to 9.1%, a decrease of 59.9%, showing a marked improvement in accuracy that may be due to model optimization or improved data quality. Finally, the model iteration cycle was shortened from 12 months to 3 months, a decrease of 75%, showing that the GWO-RF scheme is more efficient in model updating and optimization, which may be due to the use of more advanced algorithms or technologies. Overall, the GWO-RF scheme showed clear advantages across high-risk employee retention, single-case intervention cost, employee satisfaction, false positive rate, and model iteration cycle. These improvements may stem from better management strategies, technology optimization, and cost control measures, so the GWO-RF scheme is worthy of further promotion and application.
In Table 9, the experimental group scored 4.2±0.6 on satisfaction, while the control group scored 3.1±0.8, a difference of +1.1 points with a 95% confidence interval of (0.8 to 1.4); the P-value was less than 0.001, indicating a highly significant difference, and the effect size d=1.56 indicates a substantial increase in satisfaction in the experimental group. The experimental group has an iteration cycle of 2.3±0.9, versus 9.2±2.1 for the control group; the experimental group is 6.9 units shorter, with a 95% confidence interval of (-7.5 to -6.3) and a P-value of <0.001, indicating a highly significant difference, and η²=0.72 indicates a large effect. The retention rate of the experimental group was 41.9%, compared with 28.5% for the control group; the experimental group improved by 13.4 percentage points, with a 95% confidence interval of (11.2% to 15.6%) and a P-value of 0.002, indicating a significant difference, and OR=1.84 shows that the retention rate of the experimental group improved significantly.
The advantages of the GWO-RF model stem from its innovative algorithm architecture and optimized business adaptability:
(1) Improvement of the node splitting mechanism: traditional random forests use a single splitting algorithm, while GWO-RF dynamically adapts to scenarios with a mixture of discrete and continuous features by combining the C4.5 information gain rate and the
CART Gini coefficient through linear programming, solving the problem of traditional models' preference for specific data types. Compared to black-box models such as LSTM, its splitting process has strong interpretability and can output feature weights, directly guiding human resource intervention measures.
(2) Parameter optimization efficiency: the grey wolf algorithm can globally search for hyperparameters, reducing model training time by 75% compared to grid search. Traditional logistic regression requires manual feature engineering, while deep learning relies on GPU computing power and has high inference latency (>200 ms).
(3) Data adaptability: in response to the insufficient structured data of small and medium-sized enterprises, the model improves small-sample robustness through Bootstrap resampling and random feature selection, achieving an AUC of 0.923±0.008 on the 12,365 data points, which is 8.6 percentage points higher than the benchmark random forest (AUC 0.85).
(4) Cost control: the splitting strategy under linear programming constraints reduces overfitting, resulting in a 59.9% decrease in the misjudgment rate and a 54.3% decrease in single intervention cost. In contrast, traditional methods such as Cox models incur high intervention lag costs due to their static analysis characteristics. These innovations enable GWO-RF to achieve both predictive accuracy and feasibility, but further integration of real-time data stream processing is needed to enhance predictive capabilities for new employees (<3 months).
In Table 10, the GWO-RF model proposed in this article demonstrates significant advantages in predicting employee turnover. First, in terms of prediction accuracy, by integrating the C4.5 and CART splitting criteria through LPRF linear programming, AUC-ROC is improved to 0.872 (3.4% higher than the non-LPRF version), and the hit rate of high-risk employee identification is increased by 19.7 percentage points. Second, in terms of interpretability, the SHAP feature overlap reached 82.3%, and 68.2% of split choices focused on structural factors such as salary competitiveness, which is highly consistent with HR management theory. Third, although the computation time per split increased by 18.6%, the improvement in split quality reduced the overall training iterations by 37%. Finally, in actual business operations, the employee retention rate was increased by 9.2 percentage points, and intervention costs were reduced by 31.4%. This model innovatively optimizes the parameters of the random forest through the grey wolf algorithm and dynamically adjusts the node splitting rules, but the improvement in predicting employees with less than 3 years of service is limited and needs to be enhanced with a time series model.
Table 12 compares the performance of the GWO-RF model and the traditional model in predicting employee turnover. The data show that the GWO-RF group is significantly better than the traditional group in key indicators such as accuracy (86.7% vs 81.2%), recall (89.2% vs 76.5%), and F1-score (0.841 vs 0.792) (p<0.001), while the misjudgment rate is reduced to 13.3% (18.8% in the traditional group). These improvements are statistically significant (McNemar test, χ²=43.21, p<0.001), and the effect size Cohen's d>0.5 reaches a moderate or higher level. Sensitivity analysis (E-value test, OR ≥ 2.3) confirms the robustness of the results, indicating that the GWO-RF algorithm achieves a comprehensive improvement in predictive performance through LPRF node splitting and grey wolf optimization.
The ablation experiments in Tables 14 and 15 validated the performance degradation pattern of the GWO-RF model in small-sample scenarios: when the sample size n<50, removing the salary position index (a key feature for employees with less than 1 year of service) resulted in a 9.9% decrease in accuracy and a 75% increase in Gini coefficient fluctuation, indicating the sensitivity of this feature to sparse data. After disabling grey wolf optimization, the number of iterations to model convergence increased by 78.7%, and the standard deviation of decision tree depth increased by 81%, highlighting the importance of parameter search for small-sample stability. When the Bootstrap sampling ratio is reduced to 20%, the confidence interval of the information gain rate expands by 43%, and the failure rate of the linear programming solution increases from 1.2% to 7.9%, confirming that data distribution bias can undermine the robustness of the LPRF node splitting algorithm (Equation 12). The experiments show that the model needs optimized feature selection strategies and dynamic weighting mechanisms for small samples.
According to the 5-fold cross-validation experimental results of the GWO-RF model (Table 17), the model demonstrates strong robustness and practicality in predicting employee turnover. From the perspective of predictive performance, the average AUC-ROC is 0.884 ± 0.014, indicating stable discriminative ability for identifying high-risk employees; however, the fluctuation of the recall rate (range 9%) in the group with less than 1 year of work experience suggests the need to strengthen small-sample feature enhancement strategies. In terms of feature stability, the coefficient of variation of the salary position coefficient (Equation 3) is only 4.7%, verifying the rationality of the indicator design in Section 3.1, and the Gini weight β (Equation 12) satisfies the constraint |α - β| ≤ 0.2 on all folds, indicating the effectiveness of the linear programming combination coefficients (Equation 12). In terms of algorithm efficiency, the range of GWO iteration counts is 24 and the LPRF solution delay is ≤ 53 ms, which meets the response requirements of a real-time warning system. Overall, cross-validation has confirmed the advantages of the GWO-RF model in integrating grey wolf optimization with the improved random forest (LPRF algorithm), but it is necessary to optimize feature engineering for data stratified by seniority to
further enhance generalization ability.
In the development of employee turnover prediction models, the issues of model fairness and bias require special attention, especially in sensitive human resource scenarios involving protected attributes such as gender and age. Although this paper does not directly conduct a bias analysis, the GWO-RF hybrid model used here optimizes the random forest parameters through the grey wolf algorithm, which objectively alleviates some bias problems of traditional machine learning models: the ensemble characteristics of random forests reduce the overfitting risk of a single decision tree, and the LPRF node splitting algorithm based on the Gini coefficient and the information gain rate can consider the contribution of the various features more evenly through the linear programming combination. However, it should be noted that the model may still indirectly introduce bias through proxy variables such as salary position (Equation 3) and promotion delay duration; for example, female employees may have their retention probability underestimated due to bias in historical promotion data. It is recommended to add three dimensions of fairness testing. First, feature importance analysis is needed to verify that the protected attributes do not occupy a dominant weight. Second, adversarial debiasing techniques should be used to incorporate fairness constraints into the loss function. Finally, disparate impact tests should be established to ensure that the predictive performance of the model does not differ by more than 15% across populations. These measures can effectively meet the EU GDPR compliance requirements for algorithmic fairness and prevent the model from amplifying existing structural biases in the organization.
The GWO-LPRF employee turnover prediction model proposed in this study significantly improves prediction performance by integrating the grey wolf optimization algorithm with the improved random forest algorithm. Specifically, the model adopts the Price-Mueller theoretical framework to construct an evaluation system of 15 indicators, covering individual factors (such as age and education level), environmental factors (industry type), and structural factors (workload, salary position, etc.). The key technological breakthrough lies in innovatively combining the information gain rate of the C4.5 algorithm with the Gini coefficient of the CART algorithm through linear programming (Equation 12) to form the LPRF node splitting strategy, making the selection of splitting attributes for the decision trees more accurate. The model is validated using data from 12,365 employees of a listed company. The results show that it achieves significant results in A/B testing, increasing the retention rate of high-risk employees by 41.9% and reducing intervention costs by 54.3%; after optimizing parameters with the grey wolf algorithm, the model iteration cycle was shortened by 75%. This achievement provides an intelligent decision-making tool for human resource management that combines predictive accuracy and interpretability.
Taken together, the GWO-RF model showed significant advantages in the employee management experiment: it optimizes the random forest parameters through the grey wolf algorithm, achieving a 41.9% increase in the retention rate of high-risk employees, a 54.3% reduction in intervention costs, and a 13.8-point increase in satisfaction, while the misjudgment rate is reduced by 59.9% and the model iteration cycle is shortened by 75%. Its core advantages lie in its dynamic optimization capabilities and feature engineering efficiency, but it remains strongly dependent on the quality of historical data and has insufficient generalization capability in small-sample scenarios. Subsequent improvements should focus on three aspects: ① introducing transfer learning to enhance adaptability to small samples, ② developing real-time data cleaning modules to improve input quality, and ③ building a hybrid model architecture (such as fusing LSTM) to capture time-series behavioral characteristics.

5 Conclusion
By comparing the performance of the GWO-RF model and the traditional management mechanism in employee management, this study draws the following conclusions. The GWO-RF model shows significant advantages on multiple key indicators. First, the model increases the retention rate of high-risk employees to 89.7%, an improvement of 41.9% over the current mechanism, proving its excellent effect in talent retention. Second, the intervention cost is significantly reduced through algorithm optimization, and employee satisfaction increases by 13.8 points, verifying the economic and humanistic value of the model. Third, the model controls the misjudgment rate at 9.1%, which is 59.9% lower than the control group, and the iteration cycle is shortened to 3 months, reflecting the unique advantages of intelligent algorithms in accurate prediction and rapid response. These improvements are due to the dynamic optimization of the random forest parameters by the grey wolf algorithm and the accurate capture of management pain points by feature engineering.
However, the model still has three limitations. First, it is not adaptable enough to small samples and data on new employees. Second, the real-time data cleaning mechanism of the model needs to be improved. Third, its ability to model the time series of complex behavioral characteristics is limited. Therefore, subsequent research will focus on developing transfer learning modules to enhance generalization capabilities, building an automated data quality monitoring system, and introducing time-series neural networks to build a hybrid model architecture.
https://doi.org/10.31449/inf.v49i16.9243 Informatica 49 (2025) 291–302 291
Comparative Analysis of Machine Learning Models for Water
Quality Prediction Using Regional Monitoring Data
Ying Xiong
Chongqing Water Resources and Electric Engineering College, Chongqing 402160, China
E-mail: xiong-ying188@hotmail.com
Keywords: water quality prediction, machine learning, decision tree, SVM, random forest, neural network
Received: May 15, 2025
This study investigates the comparative performance of four classical machine learning algorithms—
Decision Tree, Support Vector Machine (SVM), Random Forest, and Neural Network—on water quality
prediction tasks using a dataset comprising 1,000 real-time sensor data points from five distinct
geographic regions. The dataset includes critical water parameters such as pH, ammonia nitrogen,
dissolved oxygen, total phosphorus, COD, and BOD. Preprocessing steps include missing value
imputation, outlier removal using boxplot analysis, normalization, and correlation-based feature selection.
Each model is tuned through grid search for optimal performance. Experimental results show that the
Neural Network achieved the lowest mean squared error (MSE = 0.047) and highest coefficient of
determination (R² = 0.976), outperforming the other models. The Random Forest showed superior
robustness to overfitting, while SVM offered strong results on high-dimensional subsets. Decision Trees,
although less accurate (MSE = 0.130), provided high interpretability. This comparison provides practical
guidance for selecting machine learning models in environmental monitoring systems, where trade-offs
between accuracy, interpretability, and computational cost are essential.
Povzetek: Narejena je primerjava več metod: odločitveno drevo, SVM, naključni gozd in nevronska mreža
pri napovedovanju kakovosti vode iz petih regij. Najbolje se izkaže nevronska mreža, medtem ko je
naključni gozd najstabilnejši, SVM zanesljiv, odločitveno drevo pa najbolj razložljivo.
1 Introduction
Water pollution affects human health and the stability of ecosystems. As industrialization and urbanization accelerate, the pollution of water sources is becoming more and more serious. Traditional water quality monitoring methods rely on manual sampling and laboratory analysis, which are inefficient, slow, and cannot provide real-time monitoring. With the development of artificial intelligence technology, machine learning, as an efficient data analysis tool, can learn from and forecast large volumes of water quality data to provide real-time and accurate water quality early warning.
Research in the field of water quality prediction and monitoring has developed in recent years, and machine learning technology has been widely used in water quality data analysis. Eyring et al. explored the potential of combining climate modeling with machine learning, arguing that machine learning could drive innovation in environmental data processing [1]. Bren and Ryan used machine learning technology to analyze water quality monitoring data when studying water quality in streams in the eastern Highlands; their machine learning models could accurately capture nonlinear relationships in water quality changes, and their study highlights the application potential of machine learning in complex water quality data analysis [2]. Li et al. studied the impact of climate change on river water quality and used machine learning technology for data analysis, finding that machine learning can cope with water quality prediction under changes in multiple variables and complex environmental factors [3]. Aalipour et al. analyzed the impact of landscape changes on river water quality, and their machine learning models were able to process complex environmental data and provide accurate water quality predictions [4]. Stevens et al. reviewed the application of machine learning in electronic health record screening, suggesting the potential of integrated machine learning approaches in several fields [5]. Zou et al. summarized the application of machine learning in precision medicine therapy, arguing that machine learning can process complex multidimensional data and extract key influencing factors [6]. Zainurin et al. reviewed in detail the progress of water quality monitoring based on various sensor technologies and emphasized the role of machine learning in real-time processing of water quality data [7].
Recent years have seen an increasing number of studies applying machine learning techniques to water
quality prediction, with diverse regional and environmental contexts. Quiroz-Martinez et al. [8] proposed a big-data-driven architecture for aquaculture water quality prediction, focusing on real-time integration and scalability; their system emphasizes the structural design of prediction frameworks rather than algorithm benchmarking. In northeastern Thailand, Uypatchawong and Chanamarn [9] demonstrated the improvement of prediction efficiency using machine learning models such as Random Forest and Support Vector Machines; their work underscores the significance of regional hydrological features and data preprocessing in boosting model performance. In a complex environmental scenario, Huang et al. [10] developed a water quality prediction model for the downstream Dongjiang River Basin, incorporating joint impacts from water intakes, pollution sources, and climate variability; they utilized spatial-temporal data fusion and ensemble learning to capture dynamic interactions across multiple influencing factors. Wu and Zhang [11] focused on the Yangtze River Delta, applying machine learning within the governance framework of China's River Chief System; their study highlights policy-driven data availability and found that SVM and ANN models are particularly effective in capturing variations in high-density industrial and urban runoff areas. Despite the growing body of literature, most existing studies focus either on a single prediction model or on narrowly scoped geographical settings. Few works offer a controlled, algorithm-level comparative analysis using standardized metrics across classical models such as Decision Tree, SVM, Random Forest, and Neural Network on multi-parametric datasets. This study addresses that gap by benchmarking these models on a five-region dataset using consistent preprocessing, hyperparameter tuning, and evaluation standards.
This study fills a methodological gap in the current literature by providing a standardized comparison of four classical machine learning algorithms on a uniform, multi-regional dataset. Most prior research focuses either on a single water parameter or uses proprietary datasets lacking reproducibility. By comparing model interpretability, error profiles, and training costs across diverse indicators (e.g., DO, COD, NH₃-N), this work contributes practical insights for regional water monitoring deployment.
Table 1 summarizes representative studies that applied machine learning to water quality or similar environmental data prediction tasks. It outlines the datasets used, applied models, key evaluation metrics, and findings. This comparison reveals that while some studies employ modern deep learning models or domain-specific architectures, limited work provides a direct comparative evaluation of classical ML models using diverse yet small-scale environmental datasets, which is precisely the focus of our study.
This study analyzes the application of machine learning algorithms in water quality prediction, compares the performance of different algorithms, and identifies the best water quality prediction model. Machine learning algorithms are used to analyze and model water quality data: water quality data are collected from different regions and pre-processed, several machine learning algorithms are selected, and models are designed and trained to evaluate their performance in water quality prediction. Indexes such as the mean square error (MSE) and the coefficient of determination (R²) are used to evaluate model performance, compare the algorithms, analyze their advantages and disadvantages, and select the most suitable algorithm for water quality prediction. The adaptability of the algorithms to different water quality parameters is studied, and the optimization path of water quality prediction is explored. This enriches the theoretical research in the field of water quality monitoring, provides a technical scheme for practical application, and has high social value and application prospects.
Table 1: Summary of previous research on ML in water quality prediction
Study | Dataset description | Models used | Evaluation metrics | Key findings
Bren & Ryan [2] | Stream water (regional, 500 pts) | SVM, k-NN | Accuracy, RMSE | ML models captured nonlinearity in stream pollution
Li et al. [3] | River systems with climate inputs | RF, ANN | R², RMSE | ML effective in multi-variable prediction
Aalipour et al. [4] | River data with land patches | RF, SVM | MAE, R² | Landscape shape significantly affects prediction
This study | Five zones (urban to industrial), 1,000 pts | DT, SVM, RF, NN | MSE, R² | Neural network superior in nonlinear prediction
Table 2: Source of water quality data and sample overview
Region | Sample size | Water quality parameters | Data source
Area A | 200 | pH, dissolved oxygen, ammonia nitrogen, total phosphorus | Water quality monitoring station
Area B | 200 | pH, COD, BOD, ammonia nitrogen | Environmental protection department
Area C | 200 | Dissolved oxygen, pH, total phosphorus, COD | Water affairs company
Area D | 200 | Dissolved oxygen, ammonia nitrogen, pH, BOD | Water quality testing platform
Area E | 200 | pH, ammonia nitrogen, total phosphorus, COD | Environmental monitoring center
This study aims to address the following research question: which classical machine learning algorithm offers the best trade-off between predictive accuracy and computational efficiency for small-scale, region-specific water quality datasets? By formulating and evaluating models under consistent conditions, the study hypothesizes that deep neural networks will provide superior performance in accuracy, while ensemble methods like Random Forest may offer better generalization with moderate cost.

2 Materials and methods

2.1 Data collection and sample selection
2.1.1 Data source
This study uses water quality data from five different regions, covering a variety of environmental types including urban, rural and industrial areas. The data are divided into zones A, B, C, D and E, covering different water quality monitoring points to ensure the diversity and representativeness of the data. The monitored parameters include pH value, dissolved oxygen, ammonia nitrogen, total phosphorus, chemical oxygen demand (COD), and biochemical oxygen demand (BOD); 200 records were obtained for each region, for a total of 1,000 records [12]. The data are provided by local water quality monitoring agencies and environmental protection departments and collected in real time through sensor systems. As shown in Table 2, these data reflect the water quality changes in different regions over different time periods, and provide effective training samples for the construction of water quality prediction models.
The dataset employed in this study consists of 1,000 samples sourced from five regions, which, while diverse, constitutes a relatively limited dataset. This limitation potentially impacts the generalizability of the model. To address this, future work will consider the integration of synthetic data generation techniques (e.g., SMOTE or GAN-based augmentation) or the inclusion of additional datasets from broader spatial or temporal domains to enhance model robustness and cross-context validity.

2.1.2 Data preprocessing
After data collection, pre-processing is performed. For missing values, small amounts of missing data are filled using the mean filling method and interpolation, while variables with many missing values are removed to ensure the integrity of the data set. Outliers are identified and handled using a boxplot-based method: reasonable upper and lower limits are set, and data exceeding the range are corrected or deleted [13]. In view of the dimensional inconsistency of the different water quality parameters, standardization is used to scale the numerical range of each feature to a unified scale, so as to avoid bias in the model training results caused by dimensional differences. For feature selection, correlation analysis is used to calculate the Pearson correlation coefficient between the water quality parameters and the target variables (such as water quality changes), and the features with strong correlations are selected. The features are further screened by the chi-square test and information gain, and redundant or irrelevant variables are removed to improve the accuracy and training efficiency of the model. Feature selection was conducted using both chi-square testing and Pearson correlation filtering. The chi-square test evaluated statistical independence between discrete features and categorical target representations, with features showing p-values greater than 0.05 removed. Pearson correlation coefficients below 0.3 with the output variable indicated weak linear relevance and were also excluded. Based on these criteria, features such as conductivity and total nitrogen were eliminated. The final set of retained features included pH, ammonia nitrogen, dissolved oxygen, COD, and total phosphorus.

2.1.3 Data division
The data set is divided into a training set, a validation set and a test set in proportion, as shown in Table 3 below, with the
training set accounting for 60%, the validation set for 20%, and the test set for 20%. The training set is used for model training and parameter tuning, the validation set is used for model performance evaluation and hyperparameter selection, and the test set is used for final model verification and evaluation [14]. The division uses random sampling to ensure that each data point has an equal opportunity to be assigned to the different sets, and that the distribution of water quality data in each subset is consistent with the overall data set. To prevent data leakage, all preprocessing steps (standardization, outlier removal, and feature selection) were applied strictly to the training set. The validation and test sets were transformed using statistics (mean, standard deviation) computed only from the training data. This ensures that no target information leaked into the training process or model selection.

Table 3: Data set partitioning results
Dataset | Sample size
Training set | 600
Validation set | 200
Test set | 200

2.2 Model construction
2.2.1 Model selection
In order to improve the accuracy of water quality prediction, a variety of machine learning algorithms, namely decision tree, support vector machine (SVM), random forest and neural network, were selected for comparative analysis. The decision tree partitions the data space and makes decisions layer by layer based on different feature values, which gives it good interpretability; it is suitable for data with simple and obvious relationships between features [15]. Support vector machines (SVMs) can deal with high-dimensional data by finding the optimal decision hyperplane and maintain good performance in high-dimensional feature spaces. Random forest is an ensemble learning method that constructs multiple decision trees and votes over them to avoid overfitting, and is suitable for processing large-scale data sets. Neural networks, in particular deep neural networks (DNNs), map input data through multiple hidden layers, have powerful modeling capabilities, and can capture complex nonlinear relationships in the data [16].
Although Support Vector Machines (SVMs) are well-known for handling high-dimensional data, in this study the input feature dimension is relatively low (6-7 features). The inclusion of SVM is primarily justified by its robust generalization capabilities on small-to-medium-sized datasets and its effectiveness in capturing nonlinear boundaries via kernel methods, not by high dimensionality.

2.2.2 Model architecture design
The basic architecture of each model was optimized according to the characteristics of water quality prediction. The CART algorithm was adopted for the decision tree model, with the maximum depth set at 10 and the minimum number of samples required to split a node at 5; pruning is used to avoid overfitting and improve the generalization ability of the model. The support vector machine (SVM) uses an RBF kernel, balancing training accuracy and model complexity by selecting a moderate penalty parameter C and kernel parameter γ. The random forest model uses 100 trees with a maximum depth of 15, with a restriction preventing nodes from being split when they become too small (the minimum number of samples required to split is 5) [17]. The neural network uses three hidden layers with 64 neurons each, ReLU as the activation function, and dropout during training to prevent overfitting. The learning rate, regularization method and other hyperparameters of each model are optimized by grid search to select the best combination [18]. The neural network architecture consisted of a multilayer perceptron (MLP) with three fully connected hidden layers of 64 neurons each, using ReLU activation and dropout regularization. While this is a conventional architecture, it was selected for its stability in tabular data settings. Although water quality inherently contains temporal dependencies, the current study used a static snapshot for model training. Future work will explore recurrent structures such as Long Short-Term Memory (LSTM) and Graph Neural Networks (GNNs) to capture spatial and temporal correlations in water quality dynamics.

2.2.3 Training process
In the training process, the training parameters of each model are carefully set and optimized. In order to achieve optimal performance, hyperparameters such as the learning rate, maximum depth and maximum number of iterations are selected for all algorithms. The decision tree controls the maximum depth to prevent overfitting, and the random forest increases the number and depth of trees to improve predictive power. The training of the SVM model adjusts the penalty parameter C and the kernel function parameter γ to optimize the classification boundary of the model in the high-dimensional space. As shown in Table 4, the training of the neural network uses the Adam optimizer, adjusting the learning rate, batch size, and number of training rounds to ensure convergence. Hyperparameter tuning was conducted using a grid search strategy. For the SVM, we evaluated C values in [0.1, 1, 10] and γ values in [0.01, 0.1, 1]. For the Random Forest, tree depths from 10 to 25 and estimator counts from 50 to 150 were considered. Neural network tuning involved batch sizes of 32 and 64, learning rates of 0.001 and 0.0005, and dropout rates of 0.2 to 0.5. The optimal configuration was selected based on the lowest validation MSE.
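The grid search described in Section 2.2.3 can be expressed directly with scikit-learn. The sketch below covers the SVM and random forest grids stated above, using negative MSE as the selection criterion; X_train and y_train are placeholders for the preprocessed training split, and the neural network search (batch size, learning rate, dropout) would be handled analogously in its own framework.

```python
# Hedged sketch: grid search over the SVM and random forest grids of Section 2.2.3.
# X_train, y_train are placeholders for the preprocessed training data.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

searches = {
    "svm": GridSearchCV(SVR(kernel="rbf"),
                        {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                        scoring="neg_mean_squared_error", cv=5),
    "random_forest": GridSearchCV(RandomForestRegressor(random_state=0),
                                  {"max_depth": [10, 15, 20, 25],
                                   "n_estimators": [50, 100, 150]},
                                  scoring="neg_mean_squared_error", cv=5),
}

def tune(X_train, y_train):
    """Fit each grid search and return the best parameters per model."""
    return {name: gs.fit(X_train, y_train).best_params_ for name, gs in searches.items()}
```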
Table 4: Training parameters and optimization objectives of each algorithm

| Model | Key Parameters | Optimization Objectives |
| Decision Tree | Max Depth = 10 | Pruning, generalization |
| SVM | C = [0.1, 1, 10], γ = [0.01, 0.1] | Minimize MSE via kernel optimization |
| Random Forest | Trees = 100, Max Depth = 15 | Reduce overfitting, improve stability |
| Neural Network | Layers = 5, Neurons = 64/layer, Dropout = 0.3 | Minimize MSE, regularization |

2.2.4 Evaluation criteria
In order to evaluate the performance of each model in water quality prediction, the mean square error (MSE), the coefficient of determination (R²) and the accuracy rate were selected as the main evaluation indices [19]. The mean square error measures the difference between the predicted value and the actual value; the smaller the value, the better the prediction of the model. The coefficient of determination reflects the model's ability to explain data variation; the closer it is to 1, the stronger that ability. Accuracy is used for evaluation in classification problems, calculating the proportion of samples that are correctly classified. The MSE is defined in Equation (1):

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²  (1)

where y_i is the actual value, ŷ_i is the predicted value, and n is the total number of samples. The coefficient of determination R² measures the ability of the model to explain the variation in the data, as in Equation (2):

R² = 1 − [Σ_{i=1}^{n} (y_i − ŷ_i)²] / [Σ_{i=1}^{n} (y_i − ȳ)²]  (2)

where y_i is the actual value, ŷ_i is the predicted value, and ȳ is the mean of the actual values. Accuracy is a common evaluation criterion in classification problems, calculating the proportion of correct predictions made by the model, as in Equation (3):

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (3)

where TP is a true positive, TN is a true negative, FP is a false positive, and FN is a false negative. In addition to MSE and R², we included the Mean Absolute Error (MAE) as a robustness metric. MAE values for the Neural Network, Random Forest, SVM, and Decision Tree were 0.058, 0.065, 0.071, and 0.094, respectively. Furthermore, residual plots and feature influence diagrams were generated using SHAP values to interpret model outputs and identify the most impactful parameters.

2.3 Algorithm comparison and analysis
2.3.1 Algorithm comparison
In the water quality prediction task, the four selected machine learning algorithms - decision tree, support vector machine (SVM), random forest and neural network - showed different performance characteristics. The mean square error (MSE) and the coefficient of determination (R²) are used as the main performance indicators to comprehensively evaluate the merits of each model. The evaluation results of each model on the test set are shown in Figure 1 below.

Figure 1: Performance comparison of different algorithms

As shown in Figure 1, the neural network performed best in water quality prediction accuracy, with the smallest MSE (0.047) and the largest R² (0.976). Random forests and support vector machines also performed well, achieving MSE values of 0.053 and 0.058 and R² values of 0.963 and 0.95, respectively. The performance of the decision tree is relatively weak; although its R² is 0.945, its MSE is larger and there are larger errors in its water quality predictions [20]. Neural networks are suitable for dealing with complex nonlinear relationships in water quality data, random forests and support vector machines perform well on problems of medium complexity, and decision trees are more suitable for simple relationships between features.
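As a minimal illustration of how the indices in Equations (1)–(3) can be obtained in practice, the following sketch uses scikit-learn's metric functions; the array values are placeholders and the snippet is not taken from the authors' implementation.

    # Hedged sketch: computing the indices of Equations (1)-(3); arrays are illustrative.
    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

    y_test = np.array([0.52, 0.61, 0.47, 0.58])   # placeholder actual values
    y_pred = np.array([0.50, 0.63, 0.45, 0.60])   # placeholder predictions

    mse = mean_squared_error(y_test, y_pred)      # Equation (1)
    r2 = r2_score(y_test, y_pred)                 # Equation (2)
    mae = mean_absolute_error(y_test, y_pred)     # robustness metric reported in the text
    # Equation (3) applies once predictions are binned into quality classes:
    # accuracy = (y_class_pred == y_class_true).mean()
    print(f"MSE={mse:.4f}, R2={r2:.4f}, MAE={mae:.4f}")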
2.3.2 Influencing factors of algorithm selection
(1) Comparison of the performance of different algorithms in the prediction of specific water quality parameters
Different algorithms show differences when dealing with specific water quality parameters. Taking ammonia nitrogen (NHL) and dissolved oxygen (DO) as examples, the prediction performance of the four algorithms on these two indicators is shown in Figure 2 below.
As shown in Figure 2, neural networks perform best in the prediction of NHL and DO, with the lowest MSE and the highest R². Neural networks have advantages in capturing complex nonlinear relationships in water quality data. The performance of random forest and support vector machine on these two parameters is similar and relatively stable. The prediction error of the decision tree on these two indicators is relatively large, and its prediction performance for NHL is relatively poor [21].
Figure 2: Differences of different algorithms in the prediction of specific water quality parameters
[Figure 3 data: Neural Network - training time 72.4 s, computational complexity 0.15 s/sample; Random Forest - 30.6 s, 0.06 s/sample; Support Vector Machine - 15.3 s, 0.03 s/sample; Decision Tree - 5.2 s, 0.01 s/sample.]
Figure 3: Differences in training time and computational complexity of different algorithms
(2) Comparison of different algorithms in terms of training time and average inference time per sample during the test phase
In addition to prediction accuracy, the training time and computational complexity of an algorithm are also important considerations when selecting a model. Figure 3 shows the differences in training time and computational complexity of the different algorithms [22]. Training time and resource usage were benchmarked on an Intel i7-12700H CPU (16 GB RAM) and an NVIDIA RTX 3060 GPU. For per-sample inference: Decision Tree = 0.002 s, SVM = 0.013 s, Random Forest = 0.010 s, Neural Network = 0.021 s. GPU memory consumption for the neural network peaked at 612 MB. Training duration for the largest model (NN) was approximately 95 seconds for 600 training samples.
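The per-sample inference times quoted above can be reproduced in spirit with a simple wall-clock measurement; the sketch below uses time.perf_counter and assumes a fitted estimator and test matrix, neither of which comes from the paper.

    # Hedged sketch: estimating average per-sample inference latency of a fitted model.
    import time

    def per_sample_latency(model, X, repeats=5):
        # Time several full passes over X and report the best run divided by sample count.
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            model.predict(X)
            timings.append(time.perf_counter() - start)
        return min(timings) / len(X)

    # Example (assumes a fitted estimator and test matrix from earlier):
    # print(f"{per_sample_latency(rf_search.best_estimator_, X_test):.4f} s/sample")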
As shown in Figure 3, the training time and computational complexity of the decision tree are lower than those of the other algorithms, which makes it suitable for scenarios with high real-time requirements. There is a small gap between the support vector machine and the random forest in training time, and the training time increases with the number of samples [23]. The training time of the neural network is the longest and its computational complexity is also high; because of its complex network architecture, it needs more computing resources. According to Figure 3, if the system has high real-time requirements and a large amount of training data, the decision tree or support vector machine can be suitable. Where high precision is required and computing resources are sufficient, the neural network is more ideal.

2.4 Optimization suggestions and implementation path of water quality prediction
2.4.1 Optimal collection and processing path of water quality data
The accuracy of water quality prediction is highly dependent on the quality of the data, so optimizing the collection and processing of data can improve prediction accuracy. The collection of water quality data should combine a variety of sensors and monitoring means to obtain the various indicators of the water in a comprehensive, real-time and accurate manner. Water quality monitoring equipment is deployed to collect water quality parameters such as ammonia nitrogen, dissolved oxygen, pH value and total nitrogen in real time, avoiding the shortcomings of traditional water quality monitoring that relies on periodic sampling. The key to optimizing the acquisition path is to increase the frequency of data acquisition and to use multi-dimensional monitoring to enhance the representativeness and timeliness of the data. Data preprocessing improves the model effect. For missing values, interpolation or data from similar indicators are used to fill in the gaps and ensure data integrity. For outliers, statistical methods such as box plots or standard deviations are used to screen and correct.
This study utilized a static dataset of 1,000 observations for model evaluation. While real-time modeling and dynamic feedback were not implemented, their inclusion as forward-looking strategies aims to guide system improvement in practical deployments. Real-time data acquisition, time-series analysis, and multidimensional monitoring are intended as future research directions.

2.4.2 Adaptive model selection and algorithm optimization path
The selection of the adaptive model is determined according to the requirements of different water quality prediction tasks and data characteristics. When facing the prediction of various water quality parameters, the most suitable algorithm is selected according to the characteristics of each parameter. For the complex nonlinear relationships between water quality parameters, methods such as the neural network and random forest are more effective. The decision tree and support vector machine are better choices when the data volume is small or computing resources are limited. In algorithm optimization, the hyperparameters of the model are adjusted to improve prediction accuracy. The learning rate, the number of layers and the number of neurons per layer in the neural network should be adjusted according to the specific task. The support vector machine should use an appropriate kernel function and an adjusted penalty factor to improve the accuracy of the model. Cross-validation was used to optimize the parameters, improving the accuracy of the model and avoiding overfitting. Ensemble learning methods such as AdaBoost and XGBoost improve the stability and accuracy of water quality prediction through the combination of multiple models. In view of the drastic changes of some water quality parameters, time series analysis techniques are introduced and the historical data are dynamically adjusted to improve real-time prediction.
Ensemble learning methods such as random forest and boosting are particularly effective in managing variance and overfitting. Neural networks, while not ensemble models per se, excel at learning nonlinear relationships through multi-layered representation learning. Their inclusion here refers to their complementary role in hybrid modeling, not as ensemble learners.

2.4.3 Real-time feedback and decision support path of water quality prediction results
The real-time feedback of water quality prediction results can help to detect water quality problems in time and provide strong support for decision-making. Combined with a real-time monitoring system and data transmission network, the forecast results are transmitted to the control center in real time, which makes it convenient for the relevant departments and personnel to make decisions. The realization path of real-time feedback relies on big data platforms and cloud computing technology, and uses real-time data stream processing to update the forecast results in the monitoring system in real time, ensuring the timeliness and accuracy of decision-making. The results of water quality prediction should be embedded in decision support systems to help decision makers carry out more scientific analysis. Through data visualization, the prediction results and water quality change trends are displayed, and the risk assessment of the machine learning models is combined to provide a more comprehensive decision-making basis. The forecast results can be correlated with relevant
monitoring data to identify potential problems in water quality in real time, give early warning and take appropriate measures. To assess real-time applicability, the system latency was analyzed based on the data input-to-output delay. Inference on a mid-tier GPU (RTX 3060) showed an average prediction latency of 0.21 seconds per sample. The system supports batch updates every 10 minutes with low-latency pipelines. For deployment, models are integrated via edge-based computation units for decentralized monitoring or cloud-based APIs for centralized processing, depending on the infrastructure scenario.

2.4.4 Combination path of model and automation system
The water quality prediction model is combined with the automation system to realize fully automated water quality monitoring and regulation and to improve the efficiency and accuracy of water resources management. The automated system feeds the real-time data collected by the sensing equipment into the prediction model, automatically calculates and feeds back the water quality prediction results, and guides the automatic implementation of water quality improvement measures. Based on the predicted results, the automated system can adjust the operating state of the water treatment equipment and deal with water quality anomalies in a timely manner, avoiding delays caused by manual intervention. In the specific application process, the combination of Internet of Things (IoT) technology and edge computing improves the real-time response capability of automated systems. Moving data acquisition and preliminary analysis to edge devices takes the pressure off cloud processing and enables fast decision making and execution locally. Edge computing ensures that systems can operate efficiently even when network latency is high or the connection is offline. Through automatic control and automatic adjustment of water treatment facilities, discharge control equipment and so on, the intelligent level of water quality management is improved. The path to combining a water quality prediction model with an automated system needs to ensure seamless connectivity, including data collection, transmission, processing, decision support, and executive feedback. Highly integrated systems improve the level of automation, intelligence and refinement of water quality management, and promote the development of water resources management in a more efficient and accurate direction.

3 Results and discussions
3.1 Result analysis
3.1.1 Evaluation results of each model
In the water quality prediction task, the choice of algorithm directly affects the prediction accuracy and error performance. The mean square error (MSE) and coefficient of determination (R²) were used to evaluate the predictive performance of each model. In the evaluation process, the prediction results of the four machine learning algorithms - decision tree, support vector machine (SVM), random forest and neural network - were compared one by one.
The evaluation results of the decision tree model show that it performs well in the prediction of some water quality parameters, such as ammonia nitrogen and total nitrogen. For these parameters, the R² value of the decision tree model can reach more than 0.85 and the MSE is low. In the face of more complex water quality data, overfitting easily occurs, resulting in a decline in the prediction accuracy for other water quality parameters.
SVM was stable in the prediction of multiple water quality parameters (e.g., dissolved oxygen, pH, etc.), with R² values generally above 0.80 and MSE remaining at a low level when dealing with linearly correlated data. The random forest model improves robustness by integrating multiple decision trees. Compared with the single decision tree model, the random forest showed a higher R² value in the prediction of multiple water quality parameters, up to 0.85, and fewer overfitting phenomena. When facing data with nonlinear relationships, the random forest can adapt well.
The neural network model shows strong prediction ability through its deep structure and optimization algorithm. On a large data set, the neural network can better capture the complex relationships between water quality parameters. In this experiment, the R² value of the neural network on multiple water quality parameters exceeds 0.90, which shows its potential in water quality prediction. The neural network requires more computing resources and its training time is longer. Figure 4 below shows the evaluation results of each model, including the MSE and R² values of each model for different water quality parameters, and visually presents the prediction accuracy and error performance of the different algorithms.
Contrary to initial assumptions, the decision tree model performed better on simpler parameters such as pH and dissolved oxygen (MSE < 0.10), while its performance declined on more complex indicators like ammonia nitrogen and total nitrogen (MSE > 0.11). For random forest, all four key parameters achieved R² values exceeding 0.87, demonstrating strong stability across the board, rather than merely "up to 0.85" as previously stated.
Figure 4: Prediction accuracy and error analysis of each model
3.1.2 Model evaluation and comparison
According to the evaluation results of each model, it can be seen that they differ in prediction accuracy and error across the different water quality parameters. In order to compare the advantages and disadvantages of each model in more detail, the parameter configuration, training time and computational complexity of the models are analyzed.
The main parameters of the decision tree include tree depth and branching number, and optimizing these parameters can improve the performance of the model. In the training process, the calculation speed of the decision tree is fast, but overfitting occurs when dealing with complex data. SVM depends on the choice of kernel function and the adjustment of the penalty factor; good parameter selection can improve the generalization ability of the model. The integration of multiple decision trees in the random forest reduces the possibility of overfitting but increases the training time and computational complexity. The neural network controls the complexity of the model by setting the number of layers, the number of neurons and the learning rate; because of its large computing resource demand, its training time is longer. Table 5 below shows the parameter configuration and performance comparison of the different models.
Table 5: Parameter configuration and performance comparison of each model

| Model | Depth / Layers | Training Time (s) | Key Parameters | MSE | R² |
| Decision Tree | Depth = 10 | 32 | Pruning | 0.062 | 0.945 |
| SVM | - | 48 | Kernel: RBF, C = 1, γ = 0.1 | 0.058 | 0.95 |
| Random Forest | Trees = 100, Depth = 15 | 55 | - | 0.053 | 0.963 |
| Neural Network | Layers = 5 × 64 | 120 | LR = 0.001, Dropout = 0.3 | 0.047 | 0.976 |
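As an illustration only, the configurations listed in Table 5 roughly correspond to the scikit-learn estimator settings below; this is a hedged sketch of plausible instantiations on placeholder data, not the authors' code, and the neural network is noted only in a comment because it would typically be built in a deep learning framework.

    # Hedged sketch: estimator settings mirroring Table 5 (illustrative, not the authors' code).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(700, 7)), rng.normal(size=700)   # placeholder split
    X_test, y_test = rng.normal(size=(300, 7)), rng.normal(size=300)

    models = {
        "Decision Tree": DecisionTreeRegressor(max_depth=10),            # pruning via depth limit
        "SVM": SVR(kernel="rbf", C=1.0, gamma=0.1),
        "Random Forest": RandomForestRegressor(n_estimators=100, max_depth=15),
    }
    # The 5x64 neural network with dropout 0.3 and learning rate 0.001 would normally be
    # built in a deep learning framework (e.g., Keras or PyTorch) and is omitted here.
    for name, est in models.items():
        est.fit(X_train, y_train)
        print(name, round(est.score(X_test, y_test), 3))   # .score() returns R² for regressors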
To validate the observed differences in model performance, paired t-tests were conducted between each algorithm's predictions across the test dataset. The MSE differences between the Neural Network and the Decision Tree, as well as between the Neural Network and the SVM, were statistically significant (p < 0.01). Confidence intervals for the MSE differences were also computed, showing a 95% CI of [0.013, 0.021] for the Neural Network vs. Random Forest comparison. These results confirm that the performance differences are not due to random chance, strengthening the validity of the model selection recommendations.
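A minimal sketch of the paired test described above, using SciPy; the per-sample squared errors stand in for whichever two models are being compared, and the placeholder arrays are assumptions rather than data from the paper.

    # Hedged sketch: paired t-test on per-sample squared errors of two models.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    y_test = rng.normal(size=300)                              # placeholder ground truth
    y_pred_nn = y_test + rng.normal(scale=0.2, size=300)       # placeholder NN predictions
    y_pred_dt = y_test + rng.normal(scale=0.3, size=300)       # placeholder decision tree predictions

    err_nn = (y_test - y_pred_nn) ** 2
    err_dt = (y_test - y_pred_dt) ** 2
    t_stat, p_value = stats.ttest_rel(err_nn, err_dt)          # paired (related-samples) t-test
    diff = err_nn - err_dt
    ci = stats.t.interval(0.95, len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff))
    print(f"t={t_stat:.2f}, p={p_value:.4f}, 95% CI of mean MSE difference={ci}")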
3.1.3 Result visualization
Visualizing prediction outcomes facilitates an intuitive understanding of model performance across different water quality parameters. In this study, bar charts were utilized as the primary visualization method to present both the Mean Squared Error (MSE) and the coefficient of determination (R²) for each algorithm. This approach enables a clear comparative analysis of prediction accuracy and model fit on a per-parameter basis. The quantity visualized is the MSE in Equation (4):

MSE = (1/n) Σ_{i=1}^{n} (y_{true,i} − y_{pred,i})²  (4)

where y_{true,i} is the actual value, y_{pred,i} is the predicted value, and n is the number of observation points. Through visualization, we can clearly see the error distribution and degree of deviation of each model on the different water quality parameters. To assess overfitting, we monitored training and validation loss curves across epochs. For the neural network model, convergence was achieved after 60 epochs, with validation loss closely tracking training loss, indicating minimal overfitting. Dropout (rate = 0.3) was employed to reduce model variance. The dropout rate was selected based on validation performance across a tested range of 0.2–0.5.

3.1.4 Performance improvement formula
The performance improvement obtained through optimization is calculated as in Equation (5):

Performance Improvement (%) = [(MSE_before − MSE_after) / MSE_before] × 100  (5)

In this study, the performance of the optimized neural network model and random forest model was improved. Taking the neural network as an example, the optimized MSE is reduced from 0.080 to 0.065, a performance improvement of 18.75%. For the random forest model, the optimized MSE is reduced from 0.100 to 0.087, a performance improvement of 13%. Through parameter optimization and algorithm adjustment, the accuracy of water quality prediction can be effectively improved. The optimized MSE for the Neural Network improved from 0.080 (pre-optimization) to 0.065 (final), and the Random Forest improved from 0.100 to 0.087. These values are sourced from cross-validation logs and final test set measurements.

3.2 Discussion
In this study, four machine learning algorithms, namely the decision tree, support vector machine (SVM), random forest and neural network, were used to predict water quality data. In the evaluation process, model selection and parameter tuning directly affect the prediction accuracy and training time. Different algorithms show their advantages and disadvantages when processing water quality data.
Although SVMs are theoretically sensitive to large datasets due to their reliance on support vector expansion, in this study the actual training time (15.3 seconds) was lower than that of the random forest (30.6 seconds) and the neural network (72.4 seconds), as shown in Figure 3. This indicates that at the current dataset scale (n = 1000), the SVM is computationally efficient.
The decision tree model has strong interpretability and is suitable for processing simple water quality data. Its advantage is that the influence of each feature on water quality can be clearly expressed through the tree structure. Decision trees are prone to overfitting in the face of complex data, which leads to a decline in prediction accuracy. The decision tree model also encounters performance bottlenecks when dealing with high-dimensional data, and its prediction ability is limited.
The SVM algorithm performs well when dealing with high-dimensional and nonlinear data, and the model is able to capture complex relationships by mapping the data to higher dimensions through kernel functions. SVM performs well in the prediction of some water quality parameters, but its training time grows long when the data volume is large. The parameter selection of SVM has a great influence on model performance, and different kernel functions and penalty factors will affect the prediction results.
By integrating multiple decision trees, the random forest effectively reduces the overfitting problem of a single decision tree. The model has strong robustness and performs well when dealing with large-scale data. Compared with the decision tree, the random forest can capture complex nonlinear relationships more accurately and has higher prediction accuracy. The random forest also has the problems of long training time and large consumption of computing resources, and the computational overhead is large when running on large data sets.
The neural network can automatically extract features from data through deep learning and has strong adaptability. The neural network is outstanding in the prediction of multiple water quality parameters and achieves high precision in modeling complex relationships. The neural network can handle large-scale data sets and has strong optimization ability during training. The training time of the neural network is longer, its requirement for computing resources is higher, and more work needs to be done on data preprocessing and model
tuning.

3.3 Model limitations and failure cases
Despite the overall good performance, several model-specific limitations were observed. The decision tree model failed to generalize in cases with high parameter correlation and missing value imputation, often leading to overfitting in low-variance subsets. SVM struggled when gamma and C were misaligned, producing flat decision surfaces and poor sensitivity for DO prediction. Random forest occasionally exhibited performance degradation when input features were highly collinear, despite ensemble regularization. The neural network model, though highly accurate overall, required significant tuning and suffered from instability when trained on incomplete datasets. These issues emphasize the importance of hyperparameter validation, feature decorrelation, and pre-processing robustness in real-world water quality monitoring.

4 Conclusion
In this study, four kinds of machine learning algorithms, namely the decision tree, support vector machine, random forest and neural network, are compared to discuss their application in water quality prediction. The experimental results show that the neural network model is superior in dealing with complex nonlinear relations and can improve prediction accuracy. The random forest model is slightly inferior to the neural network in some cases, but has better stability and a lower risk of overfitting, and is suitable for large-scale data processing. SVM is stable in the prediction of some water quality parameters, but its training time is long and it is sensitive to the selection of parameters. The decision tree is suitable for preliminary analysis because of its strong interpretability, but it has limitations when dealing with complex data.
Future work can be optimized from two aspects: according to the characteristics of different water quality parameters, a variety of algorithms can be combined through ensemble learning to improve the prediction accuracy and stability of the model; and the real-time performance and computational efficiency of the model, which remain problems in practical applications, require optimizing the training process and reducing the computational overhead. Through the research of this paper, machine learning is shown to have broad application prospects in the field of water quality prediction. With the help of reasonable algorithm selection and optimization strategies, more efficient and accurate technical support can be provided for water quality monitoring, and the development of intelligent water environment management can be promoted.
Future work will explore the integration of advanced deep learning architectures, such as Temporal Convolutional Networks (TCNs), Transformer-based sequence models, and hybrid attention-GNN frameworks, which have shown promise in environmental time-series forecasting. Benchmarking these models against classical methods on larger and real-time datasets could further validate their practical applicability in ecological monitoring systems.

References
[1] Eyring V, Collins WD, Gentine P, Barnes EA, Barreiro M, Beucler T, et al. Pushing the frontiers in climate modelling and analysis with machine learning. Nat Clim Chang. 2024;14(1):916-928. DOI:10.1038/s41558-024-02095-y
[2] Bren L, Ryan M. An Examination of Stream Water Quality Data from Monitoring of Forest Harvesting in the Eastern Highlands of Victoria. Land. 2024;13(8):1217. DOI:10.3390/land13081217
[3] Li L, Knapp JLA, Lintern A, Crystal Ng CH, Perdrial J, Sullivan PL, et al. River water quality shaped by land-river connectivity in a changing climate. Nat Clim Chang. 2024;14(3):123-130. DOI:10.1038/s41558-023-01923-x
[4] Aalipour M, Wu NC, Fohrer N, Kalkhajeh YK, Amiri BJ, et al. Examining the Influence of Landscape Patch Shapes on River Water Quality. Land. 2023;12(5):1011. DOI:10.3390/land12051011
[5] Stevens CAT, Lyons ARM, Dharmayat K, Mahani A, Ray KK, Vallejo-Vaz AJ, et al. Ensemble machine learning methods in screening electronic health records: A scoping review. Digit Health. 2023;9:20552076231173225.
[6] Zou XT, Liu YN, Ji LN. Review: Machine learning in precision pharmacotherapy of type 2 diabetes - A promising future or a glimpse of hope? Digit Health. 2023;9:20552076231203879.
[7] Zainurin SN, Ismail WZW, Mahamud SNI, Ismail I, Jamaludin J, Ariffin KNZ, et al. Advancements in Monitoring Water Quality Based on Various Sensing Methods: A Systematic Review. Int J Environ Res Public Health. 2022;19(21):14080. DOI:10.3390/ijerph192114080
[8] Quiroz-Martinez MA, Perez-Vitonera A, Gómez-Rios M, et al. Architecture Design for the Implementation of a Water Quality Prediction System in Aquaculture Systems with Big Data. International Conference on Applied Technologies. Springer, Cham, 2025. DOI:10.1007/978-3-031-89757-3_12
[9] Uypatchawong S, Chanamarn N. Enhancing surface water quality prediction efficiency in northeastern Thailand using machine learning. Indonesian Journal of Electrical Engineering & Computer Science. 2024;36(2). DOI:10.11591/ijeecs.v36.i2.pp1189-1198
[10] Huang Y, Cai Y, He Y, et al. A water quality prediction approach for the Downstream and Delta of Dongjiang River Basin under the joint effects of water intakes, pollution sources, and climate change. Journal of Hydrology. 2024;640(000):18. DOI:10.1016/j.jhydrol.2024.131686
[11] Wu G, Zhang C. Analysis of water quality prediction in the Yangtze River Delta under the river chief system. Sustainability. 2024;16(13):5578. DOI:10.3390/su16135578
[12] Lopes RH, Silva CRDV, Salvador PTCD, Silva ÍdS, Heller L, Uchôa SADC. Surveillance of drinking water quality worldwide: scoping review protocol. Int J Environ Res Public Health. 2022;19(15):8989. DOI:10.3390/ijerph19158989
[13] Liu Z, Wang X, Zhang Y, et al. Big data and machine learning approaches in health applications: An overview. J Healthc Inform Res. 2020;47(2):184-200. DOI:10.1038/s41575-020-0327-3
[14] Huang Y, Lee R, Wang S, et al. AI-driven diagnosis in medical imaging: A survey of applications and challenges. Int J Comput Assist Radiol Surg. 2024;19(5):1215-1224. DOI:10.1007/s13721-024-00491-0
[15] Zhang Y, Chen Y, Wu S, et al. Deep learning for predictive modeling of climate-related diseases: A systematic review. J Clim Change Health. 2023;5:100034.
[16] Yang Z, Zhang L, Lu Y, et al. Neural network-based models in environmental health data analysis: A comparative study. Environ Health Perspect. 2024;132(7):073004. DOI:10.1109/TGRS.2025.3529322
[17] Xu M, Lee C, Ng C, et al. Assessment of machine learning models in forecasting environmental impacts of industrial activities. Environ Impact Assess Rev. 2024;48(2):45-58. DOI:10.1016/j.apr.2022.101438
[18] Yuan M, Shi Y, Liu Y, et al. Leveraging machine learning for personalized cancer treatment: Recent advances and challenges. Cancer Lett. 2024;514:1-13. DOI:PQDT:89409451
[19] Tang R, Zhang Z, Li H, et al. Application of deep learning in the management of chronic diseases: A review. Chronic Dis Transl Med. 2023;9(3):235-249. DOI:10.2147/IJGM.S516247
[20] Cheng YR, Li G, Zhou X, Ye SH. Research on time series forecasting models based on hybrid attention mechanism and graph neural networks. Inform. 2025;49(21). DOI:10.31449/inf.v49i21.7580
[21] Pipalwa R, Paul A, Mukherjee T. Prediction of heart disease using modified hybrid classifier. Inform. 2023;47(1). DOI:10.31449/inf.v47i1.3629
[22] Wang P, Han Q, Zhang S, Wu Z. Machine learning-based regression analysis and feature ranking for localization error prediction in wireless sensor networks. Inform. 2025;49(20). DOI:10.31449/inf.v49i20.8081
[23] Cavalieri S, Scroppo MS. A CLR virtual machine based execution framework for IEC 61131-3 applications. Inform. 2019;43(2). DOI:10.31449/inf.v43i2.2019
https://doi.org/10.31449/inf.v49i16.9602 Informatica 49 (2025) 303–314 303
A GAN-Based Framework for Synthetic Financial Data Generation,
Risk Forecasting, and Portfolio Optimization under Uncertainty
Aihua Li
Department of Engineering Management, Henan Technical College of Construction, Zhengzhou, 450064, China
E-mail: hnzdli@163.com
Keywords: financial risk, dynamic prediction, decision optimization, generative adversarial network (GAN), machine
learning, risk management and financial modeling
Received: June 6, 2025
This article proposes a financial risk dynamic prediction and decision optimization model based on
Generative Adversarial Network (GAN). The model generates synthetic financial data, trains a risk
prediction model, and optimizes financial decisions based on predicted risks. Simulation results show that
the proposed method outperforms traditional machine learning models, achieving a mean absolute error
(MAE) of 0.012 and a mean squared error (MSE) of 0.002, indicating high prediction accuracy. The model
achieves an average risk of 4.5% and an average return of 8.2%, surpassing conventional algorithms.
With a recommended portfolio allocation of 65% equities, 30% bonds, and 5% cash, it optimizes
investment decisions by maximizing returns while minimizing risks. Overall, the proposed approach
provides a novel and effective solution for financial risk prediction and decision optimization,
demonstrating superior performance over existing methods.
Povzetek: Članek predstavi GAN-okvir za generiranje sintetičnih finančnih podatkov, napoved tveganja
in optimizacijo portfelja. Model doseže kvalitetne napovedi (MAE 0,012; MSE 0,002) ter predlaga
optimalno razmerje 65 % delnic, 30 % obveznic, 5 % gotovine, kar izboljša donosnost in zmanjša tveganje.
1 Introduction
Subsequent to the development of the capital market, the methodology for conducting financial analysis has experienced continuous improvements. The scope of financial analysis has been broadened to encompass the evaluation of the financial position, operating outcomes, and cash flow of enterprises. In financial accounting, the conventional analytical approach entails assessing an enterprise's financial condition quantitatively or qualitatively based on key indicators related to solvency, operational capacity, and profitability, along with the year-over-year performance of these indicators [1]. The capacity to forecast financial risk exposure and developmental trends is deemed inadequate. Consequently, pertinent professionals began employing increasingly sophisticated artificial intelligence and data mining techniques for financial research and forecasting. Nonetheless, few studies have been undertaken to assess or forecast the operational circumstances of firms by analyzing the associative relationships within financial data. The connection relationships within corporate financial data can take several diverse manifestations, which differ based on the various data elements. The spatial association of enterprise finance pertains to the distance characteristics of financial indicators across many dimensions. Moreover, enterprises situated in proximity within multi-dimensional environments have a higher degree of financial similarity. The static temporal association of financial indicators refers to the interdependence characteristic among the financial metrics of the companies [2].
One can identify anomalous financial data of enterprises by utilizing the commonly occurring groupings of financial indicators, referred to as frequent item sets of financial indicators [3]. A feature associated with the historical evolution of financial indicators across various sectors is the dynamic temporal correlation present among industries. A transmission phase will occur in which alterations in the financial status of upstream enterprises will impact downstream industries. Subsequent to this transmission period, the financial indicators of related upstream and downstream sectors will display either a positive or inverse connection over time. Forecasting the future financial state of downstream sectors is achievable through an analysis of trend correlation [4-7]. Subsequently, reference [8] presents novel suggestions for improving financial indicators, thus contributing to the early warning model of financial indicators. The study referenced in [9] indicated that the returns on total assets, the asset-liability ratio, and the working capital ratio are the most advantageous regarding their effects.
Reference [10] presented the application of numerous financial indicators in the study of a financial risk early warning model. The researchers optimized five comprehensive indicators from a total of 22 financial indicators, determined the weight coefficient for each, built the Z-value model, and achieved significant results. In the realm of later corporate financial risk early warning analysis, the Z-value model has achieved significant success via its endeavors. The concept of multivariate
linearity, as outlined in reference [11], demonstrates that the multivariate linear model is more appropriate for the contemporary enterprise financial early warning system and exhibits superior accuracy compared to the multivariate early warning model. The principle of multivariate linearity underpins the formulation of the logistic regression model. Reference [12] conducted a linear analysis employing the logistic linear regression model with the prevailing economic conditions and model attributes. They suggested that early warning systems for financial risk could enhance their accuracy through the accumulation of expertise derived from an increasing number of study samples and greater data quantity. Thus, scholars have suggested that integrating factor analysis with the logistic regression model might more precisely represent the possible financial hazards associated with financial indicators. Moreover, it may diminish the superfluous weight resulting from the redundancy of index elements, hence illustrating its enhanced accuracy and scientific validity.
In the domain of financial risk early warning, neural networks have gained prominence owing to the rapid advancement of artificial intelligence and the robust technological support afforded by big data on the internet. The approach referenced in [13] suggests that early warning enterprises might gain advantages from the empirical risk reduction principle of neural networks. Concurrently, the predictive efficacy of neural network early warning models utilizing machine learning technology is improving significantly due to the rapid advancement of computer technology.
Financial indicators not only objectively reflect an organization's operational and financial health but are also the most often utilized metrics in financial early warning models. Owing to their ease of acquisition, they have attracted considerable interest since the introduction of the univariate early warning model. The selection of financial indicators has evolved from a singular focus on metrics like the asset-liability ratio and equity ratio to a parallel assessment of multiple indicators, ultimately advancing to the categorization of specific financial indicators into various classifications to enhance model efficiency [14]. This modification was implemented to enhance the model's efficiency.
Non-financial indicators are crucial in several firm financial early warning models, and the importance of their early warning analyses is paramount [15]. Concerning the purpose and role of financial diagnosis, reference [16] stated that for financial diagnosis to contribute to the strategic development of the company, it must be positioned at a strategic level. This was achieved by identifying an alternate method to focus on the strategic perspective.
A specific time period is frequently predicted using machine learning (ML) models, remote sensing techniques, and empirical models [18, 19]. The most promising technologies for forecast prediction are ML models, among which artificial neural networks (ANNs) are frequently used because of their high accuracy. ARIMA is a well-known model that is particularly popular for time series data and has excellent accuracy on small datasets [20, 21]. Table 1 presents a comparison of the proposed work with recent literature.

Motivation and contribution
Using Generative Adversarial Networks (GANs), the proposed financial risk dynamic prediction and decision optimization model has several novel characteristics. First, it creates synthetic financial data using GANs, offering a new way to forecast financial risk and support wiser decisions. Second, it streamlines financial decision-making by combining risk prediction with decision optimization. Synthetic data production generates realistic data, making the risk prediction model more accurate and trustworthy. By taking risks into account, it helps decision makers make sound financial choices and lowers the risk of financial loss.
The suggested approach uses a GAN architecture to generate synthetic financial data, a novel application of GANs in finance. A risk prediction model trained on the GAN-generated synthetic data is also used, making risk estimates more reliable.
To test the proposed model, we simulate it, which gives an exact and full picture of its performance. Comparing it to other machine learning models shows its superiority and usefulness. These experiments allow us to fully examine the model's abilities and observe how it can identify financial risks and support wiser decisions.
Table 1: Comparison of proposed work with recent literature

| Reference | Key Focus / Contribution | Advantages Highlighted | Disadvantages (Implied/Potential) | Gaps (Unaddressed by the Text) |
| [15] | Importance of non-financial indicators in early warning models | Crucial for firm financial early warning; paramount for early warning analyses; provides a more holistic view beyond traditional financial ratios | Not explicitly stated, but non-financial data can be qualitative, harder to quantify, or less standardized; data collection might be more complex | Specific types of non-financial indicators (e.g., ESG, operational, governance) and their individual impact; methodologies for integrating diverse non-financial data |
| [16] | Strategic positioning of financial diagnosis | Contributes to the strategic development of the company when positioned at a strategic level; shifts focus from mere solvency to long-term viability and growth | Implies that if not strategically positioned, financial diagnosis might be limited to a tactical or operational view, missing broader implications | How to effectively integrate financial diagnosis into the strategic planning process; the specific "alternate methods" for a strategic focus |
| [18] | Overview of prediction techniques (ML, remote sensing, empirical models) | Diverse range of methods available for predicting specific time periods; suggests adaptability across various domains | No specific disadvantages mentioned for these general categories | Comparative analysis of these techniques for financial early warning specifically; when to choose one over the other for financial applications |
| [20] | Machine learning models (ANNs, ARIMA) for forecast prediction | ML models are "most promising technologies" with "high accuracy"; ANNs are frequently used; ARIMA is "well-known", "popular for time series data" and has "excellent accuracy for small datasets" | ARIMA's limitation to "small datasets" is mentioned, implying it might not be as suitable for large or complex financial datasets without significant preprocessing or combination with other models | Specific limitations of ANNs (e.g., interpretability, data requirements); how to handle highly volatile or non-stationary financial time series; the challenges of implementing and validating these models in real-world financial settings; addressing data quality issues in financial datasets for ML models |
| Proposed model | Developing a financial risk dynamic prediction and decision optimization model using Generative Adversarial Networks (GANs) | Improved accuracy, robust risk prediction and optimized decision-making | Complexity | Interpretability |
2 The proposed system
To address financial risk, the suggested system is a multifaceted structure combining three main components. It uses Generative Adversarial Networks (GANs) to create realistic synthetic financial data and improve prediction accuracy, optimization models to guide the best decision-making based on risk predictions, and time-series financial data to capture the dynamic character of financial risk, thus offering a complete method of managing financial risk. Figure 1 presents the block diagram of the proposed system. The proposed model architecture comprises four phases. Firstly, a Generative Adversarial Network (GAN) is trained to generate synthetic financial data that closely resembles real financial data. Secondly, a risk prediction model is trained using a combination of real and synthetic financial data to predict future financial risk. Thirdly, the trained risk prediction model is utilized to predict future financial risk based on new, unknown input data. Lastly, the predicted financial risk is leveraged to optimize financial decisions, such as portfolio allocation and risk management strategies.

2.1 Data representation
Financial data is inherently time-series based. Let X = {x_t}_{t=1}^{T} represent the financial time series, where x_t is a vector of financial features at time t. These features could include stock prices, interest rates, volatility indices, etc. We can represent this as x_t = [p_t, i_t, v_t, ...], where p_t is the price, i_t is the interest rate, and v_t is the volatility at time t.

Generative Adversarial Network (GAN)
A generator plus a discriminator make up a generative adversarial network (GAN). While the discriminator separates actual from synthetic data [22–25], the generator generates synthetic financial data similar to genuine data. An adversarial loss function reduces the difference between actual and synthetic data, hence training the GAN. For the purpose of capturing interactions throughout time, TimeGANs make use of recurrent neural networks. These methods generate respectable synthetic time series data by accurately simulating time-series dynamics using extra networks.
Generator (G): The generator aims to produce synthetic financial data that closely resembles the real data. It is written G(z; θ_g), where z is a random noise vector and θ_g represents the generator's parameters; G(z) generates a synthetic financial time series x̃.
Discriminator (D): The discriminator aims to distinguish between real and synthetic financial data. It is written D(x; θ_d), where x is the input data (either real or synthetic) and θ_d represents the discriminator's parameters; D(x) outputs the probability that x is real.
Loss Function: The GAN is trained by optimizing the following adversarial objective:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]  (1)

In Eq. (1), p_data(x) is the distribution of real financial data, p_z(z) is the distribution of the random noise, and E[·] denotes the expected value.
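A minimal PyTorch-style sketch of the adversarial objective in Eq. (1) is given below; the layer sizes, feature dimension, and variable names are illustrative assumptions and do not reproduce the paper's actual TimeGAN implementation.

    # Hedged sketch: one training step of a vanilla GAN over vectors of financial features.
    import torch
    import torch.nn as nn

    feat_dim, noise_dim = 7, 16                      # assumed sizes, not from the paper
    G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
    D = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

    def train_step(real_batch):
        n = real_batch.size(0)
        # Discriminator step: push D(x) toward 1 on real data and toward 0 on generated data.
        fake = G(torch.randn(n, noise_dim)).detach()
        d_loss = bce(D(real_batch), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # Generator step (non-saturating form): push D(G(z)) toward 1.
        g_loss = bce(D(G(torch.randn(n, noise_dim))), torch.ones(n, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()

    d_l, g_l = train_step(torch.randn(32, feat_dim))   # placeholder batch of "real" records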
Time series GANs (TimeGANs): For time series data [26], variations like TimeGANs are employed, which incorporate recurrent neural networks (RNNs) such as LSTMs or GRUs to capture temporal dependencies. These models utilize embedding and recovery networks, in addition to the generator and discriminator, to effectively model time-series dynamics. GANs can thus generate synthetic financial data with similar patterns and linkages. Giving models a larger dataset to train on may help them forecast risks, since rare or severe occurrences may be underrepresented in the original data; synthetic data production creates novel situations that may help models perform better on fresh data.
We identify risk indicators such as Expected Shortfall (ES) and Value at Risk (VaR) using the GAN-generated synthetic data. Time-series links may allow the model to dynamically predict future risk levels from current and prior financial data, so that risks can be managed beforehand.
The optimization component determines the best sequence of options within constraints and maximizes a utility function using the predicted risk. GANs and decision optimization together may improve scenario realism and decision quality, which improves financial risk management decisions.

2.2 Financial risk prediction
Generative Adversarial Networks (GANs) may help to estimate risk metrics, thereby strengthening financial risk prediction. Synthetic financial scenarios produced by GANs are used to estimate risk measures such as Expected Shortfall (ES) and Value at Risk (VaR). VaR shows the possible loss at a particular confidence level; ES computes the expected loss beyond VaR. Moreover, by including time-series dependencies, the model can dynamically forecast future risk levels depending on present and previous financial data, thus supporting proactive risk control.

VaR and ES calculation
Value at Risk (VaR) and Expected Shortfall (ES) may be calculated in several ways. The Historical Simulation Method sorts the GAN-generated data in ascending order and reads off VaR at the 95th or 99th percentile selected for confidence. The Parametric Method calculates VaR for GAN-generated data using a normal or Student's t-distribution. The Monte Carlo Simulation Method employs the GAN to create several scenarios and calculates VaR from the simulated losses at the selected confidence level.
When computing ES, the Historical Simulation Method takes the average loss beyond VaR at the set confidence level. The Parametric Method assumes a distribution for the GAN-generated data and calculates ES from its properties. The Monte Carlo Simulation Method employs the GAN to create several scenarios and obtains ES by averaging the losses larger than VaR.

Confidence level
Specific needs may determine the confidence levels used for VaR and ES calculation; internal risk management typically uses a 99% confidence level to set limits. These methods and confidence levels may help banks and investors estimate VaR and ES while taking into account complex data patterns and correlations.

Risk measure estimation
GANs can be used to generate synthetic financial scenarios, which can then be used to estimate risk measures like Value at Risk (VaR) or Expected Shortfall (ES), as shown in Eq. (2) and Eq. (3):

VaR_α = inf{ l : P(L ≤ l) ≥ α }  (2)

where L is the loss and α is the confidence level, written as a subscript of VaR and ES.

ES_α = E[L | L ≥ VaR_α]  (3)

Dynamic prediction
By incorporating time-series dependencies, the model can dynamically predict future risk levels based on current and past financial data. This involves training the GAN to generate future time steps based on past data.
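A minimal sketch of the historical-simulation estimates of Eq. (2) and Eq. (3) applied to GAN-generated scenario losses is given below; the array of simulated returns and the use of NumPy are assumptions for illustration, not the paper's code.

    # Hedged sketch: historical-simulation VaR and ES from simulated scenario losses.
    import numpy as np

    def var_es(scenario_losses, alpha=0.99):
        # Losses are positive numbers; VaR_alpha is the alpha-quantile of the loss
        # distribution (Eq. 2), ES_alpha is the mean loss at or beyond VaR (Eq. 3).
        losses = np.sort(np.asarray(scenario_losses))
        var = np.quantile(losses, alpha)
        es = losses[losses >= var].mean()
        return var, es

    # Example with synthetic scenarios standing in for GAN output:
    rng = np.random.default_rng(0)
    sim_returns = rng.normal(loc=0.0, scale=0.02, size=10_000)   # placeholder scenarios
    var99, es99 = var_es(-sim_returns, alpha=0.99)               # convert returns to losses
    print(f"VaR(99%)={var99:.4f}, ES(99%)={es99:.4f}")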
Figure 1: The block diagram for the proposed system
2.3 Decision optimization
Decision optimization maximizes a utility function within restrictions by determining the best sequence of choices, and it helps manage financial risk. Based on the predicted risk, the utility function scores the outcome of a choice, and the limits may include risk tolerance and budget. The problem can be solved using dynamic programming and other approaches. Financial scenarios may be used with mean-variance optimization (MVO) to optimize returns or reduce risk in an investment portfolio. The model may also help create dynamic risk management strategies by predicting future risk events and providing solutions. A typical portfolio optimization function minimizes risk and maximizes profit; it examines the covariance matrix of asset returns and the vector of anticipated asset returns. The GAN provides scenarios for the covariance matrix and expected return calculations. Combining this with decision optimization provides more accurate and realistic scenario information, which simplifies the optimization and improves financial risk assessment.
Let A_t be the decision variable at time t (e.g., investment portfolio allocation, risk mitigation actions) and R_t the risk tolerance. Let U(A_t, R_t) be the utility function, representing the decision's outcome based on the predicted risk. The optimization problem is to find the optimal decision sequence in Eq. (4):

max_{A_1, ..., A_T} Σ_{t=1}^{T} U(A_t, R_t)  (4)

subject to the constraints C(A_t, R_t) ≤ 0 (e.g., budget constraints, risk tolerance).
The utility function and constraints vary with the financial risk management circumstances. This optimization problem may be solved using dynamic programming or other approaches. Financial scenarios may be used to minimize risk or increase returns in investment portfolios, for example via Mean-Variance Optimization (MVO). By identifying risk events and mitigating them, the model may help build dynamic risk management plans.

Optimization function
A typical portfolio optimization function is given in Eq. (5):

min_w  w^T Σ w − λ w^T μ  (5)

where w is the vector of portfolio weights, w^T Σ w represents the portfolio risk (variance of returns), Σ is the covariance matrix of asset returns, λ is the risk tolerance parameter and μ is the vector of expected asset returns. The GAN provides the scenarios used to calculate the covariance matrix and expected returns, which allows the optimization to utilize more robust and realistic scenario information.
The decision optimization stage guides financial decisions by utilizing the predicted financial risk to determine the best financial choices. This stage begins with inputting the predicted financial risk into the decision optimization module, which serves as the foundation for optimizing financial decisions. A suitable optimization model, such as linear programming or dynamic programming, is established based on the complexity and nature of the financial decisions. The optimization model identifies
complex interactions among financial factors, including asset returns, risk levels, and portfolio restrictions.
The optimization process involves determining the best financial choices that minimize financial risk and maximize profit, subject to various constraints and limits. The best financial decisions generated by the decision optimization module can guide direct investment strategies, risk management, and portfolio performance maximization. By making informed decisions based on these optimal financial judgments, financial institutions and investors can reduce financial risk, increase returns, and achieve their financial goals.
The decision optimization problem can be mathematically represented as:

maximize: utility function (e.g., expected return, risk-adjusted return)
subject to: constraints (e.g., risk tolerance, regulatory requirements, budget constraints)
variables: decision variables (e.g., portfolio weights, investment amounts)

The utility function and constraints can be tailored to specific financial goals and risk management objectives. By solving this optimization problem, financial institutions and investors can determine the optimal financial decisions that balance risk and return.

3 Complete model structure
The complete model structure consists of four phases (Figure 2). The first step trains a Generative Adversarial Network (GAN) to provide suitable synthetic financial data. Stage 2 uses blended real and synthetic data to build a risk prediction model that forecasts future financial risk. Stage 3 projects financial risk for new, unknown inputs using the trained risk prediction model. Using the predicted financial risk, a decision optimization module in stage 4 finally optimizes financial choices, including risk management techniques and portfolio allocation. Every step builds on the one before it, letting the model create realistic synthetic data, predict financial risk, and optimize financial actions to reduce risk and increase profit.

Stage 1: GAN training
The first step in the procedure is training a generative adversarial network (GAN) to produce suitable synthetic financial data. Historical financial data is collected and preprocessed at this step to ensure it is in a fit condition for training the GAN, usually combining time-series data with important domain-related financial features. Data preparation is followed by the construction of an appropriate GAN architecture incorporating a generator network and a discriminator network. The generator network generates synthetic data; the discriminator network checks it and provides feedback to the generator. Following preprocessing, the GAN is trained with the aim of producing synthetic financial data that is indistinguishable from real data. Visual inspection, accuracy, and loss functions are among the criteria used to evaluate the GAN's performance throughout training. This evaluation of the quality of the generated data directs any necessary adjustments to the GAN design or training setup. After sufficient training, the GAN can generate realistic synthetic financial data that can be used downstream for stress testing, risk analysis, and portfolio optimization.

Stage 2: Risk prediction model training
An essential component of the complete process, the training phase for the risk prediction model seeks to produce a powerful and accurate model able to anticipate future financial risk. This stage begins by mixing the synthetic data with genuine financial data, providing a complete and diversified dataset for training the risk prediction model. Depending on the kind and complexity of the data, a suitable risk prediction model is then created, whether a machine learning or deep learning model. Trained on all the data, the model is oriented toward future financial risk. The training approach seeks to optimize the model's parameters so that the error between predicted and actual risk levels is as low as feasible. After training, accuracy, precision, recall, and F1-score, among other standards, are used to evaluate the model. These indicators advise any necessary architectural or training parameter adjustments and help one grasp the ability of the model to precisely anticipate financial risk. Good risk prediction model training may enable financial organizations to learn significant knowledge about likely future dangers, thereby directing their activities and the development of effective risk management strategies.

Stage 3: Risk prediction
The final stage of the operation applies the trained risk prediction model to forecast future financial risk. This phase begins with providing new, unknown input to the trained model, comprising financial attributes and current market conditions; the input is carefully chosen to ensure its correctness and relevance, as the predictions of the model rely on the available data. Once the data becomes available, the trained model projects the future financial risk related to it. This prediction offers a forward-looking assessment of expected financial risk based on the patterns and connections the model learned over the training period. The expected financial risk generated by the risk prediction model can be used for decision-making requirements. Whether in the form of a probability of default, an expected loss, or a risk score, this output provides financial institutions, investors, and other stakeholders with significant information. These organizations can optimize their risk-reducing strategies, make sensible decisions, and manage challenging financial markets with greater confidence by applying the expected financial risk.
Stage 4: Decision optimization
The decision optimization stage guides financial decisions by means of the predicted financial risk. It begins with the expected financial risk being fed into the decision optimization module, which forms the basis for optimizing financial choices. A suitable optimization model is then established depending on the complexity and nature of the financial decisions, for example a linear programming or dynamic programming model. The optimization model captures the complex interactions among financial factors, including asset returns, risk levels, and portfolio restrictions. Once designed, the optimization model supports the refinement of financial decisions such as risk management strategies or portfolio allocation. Under the applicable constraints and limits, the optimization process determines the financial choices that reduce financial risk and maximize profit. The optimal decisions generated by the decision optimization module may guide investment strategies, risk management, and portfolio performance. By acting on these decisions, financial institutions and investors can lower financial risk, increase returns, and move closer to their financial goals.

Figure 2: The complete model structure

3.1 Integration of components
The GAN produces financial data used to train the risk prediction model; this data helps the risk prediction algorithm find patterns and linkages comparable to those in real financial data. Second, the risk prediction model uses the synthetic data to assess financial risk. The predicted risk is then quantified with measures such as Value-at-Risk (VaR) or Expected Shortfall (ES). Finally, the optimization model calculates the portfolio weights or investment choices that best balance risk and return.

The steps of the algorithm are described below. The risk prediction model is trained on GAN-generated synthetic financial data. The trained risk prediction model is then used to assess the financial risk of new data. Risk measures quantify the anticipated risk, and the optimization model determines portfolio weights and investment choices, balancing risk and return to find the best investment.

The steps are as follows:
1. Generate synthetic financial data with the GAN: synthetic_data = GAN.generate_data()
2. Train the risk prediction model on the synthetic data: risk_model = RiskModel.train(synthetic_data)
3. Estimate financial risk with the risk prediction model: predicted_risk = risk_model.predict(new_data)
4. Compute the risk metric: risk_metric = calculate_risk_metric(predicted_risk)
5. Find the optimal portfolio weights or investments with the optimization model: portfolio = OptimizationModel.optimize(risk_metric, return_metric)
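A minimal, self-contained sketch of how the five steps above fit together is given below. The class and function names simply mirror the pseudo-calls in the step list; they are illustrative placeholders, not the authors' implementation or any library API:

    import numpy as np

    class GAN:
        @staticmethod
        def generate_data(n=1000, dim=10):
            return np.random.randn(n, dim)                  # step 1: synthetic financial data

    class RiskModel:
        class _Trained:
            def predict(self, new_data):
                return np.abs(new_data).mean(axis=1)        # toy per-sample risk score
        @staticmethod
        def train(synthetic_data):
            return RiskModel._Trained()                     # step 2: trained risk model

    def calculate_risk_metric(predicted_risk, alpha=0.95):
        return float(np.quantile(predicted_risk, alpha))    # step 4: VaR-style quantile

    class OptimizationModel:
        @staticmethod
        def optimize(risk_metric, return_metric):
            return {"stocks": 0.65, "bonds": 0.30, "cash": 0.05}   # step 5: placeholder weights

    synthetic_data = GAN.generate_data()
    risk_model = RiskModel.train(synthetic_data)
    predicted_risk = risk_model.predict(np.random.randn(100, 10))  # step 3 on new data
    risk_metric = calculate_risk_metric(predicted_risk)
    portfolio = OptimizationModel.optimize(risk_metric, return_metric=0.082)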
3.2 Variable selection and mapping
Macroeconomic variables such as GDP, inflation, and employment, and social development variables such as health, education, and poverty are studied. These characteristics were selected because they impact financial markets and asset returns. To place the variables in a portfolio context, we may use a multivariate technique that examines asset performance; a factor model that incorporates the selected variables as asset-return factors is one option. Table 2 lists the generator and discriminator network architectural parameters.

Asset-level returns
We model asset returns using a multivariate distribution, such as a multivariate normal distribution or a more elaborate one that captures non-linear relationships between variables.

Portfolio context
We use portfolio optimization to place the variables in a portfolio context by looking at anticipated returns, risks, and correlations between assets. The optimization problem can be formulated as:

maximize: Portfolio return
subject to: Risk constraints (e.g., VaR, ES)
variables: Portfolio weights

Table 2: Generator and Discriminator Network Architecture Parameters
Parameter           | Generator Network  | Discriminator Network
Number of Layers    | 4                  | 4
Activation Function | Leaky ReLU         | Leaky ReLU
Number of Filters   | 64, 128, 256, 512  | 64, 128, 256, 512
Kernel Size         | 4, 4, 4, 4         | 4, 4, 4, 4
Stride              | 2, 2, 2, 2         | 2, 2, 2, 2

3.3 Weighting and validation of real and synthetic data
During training, the real and synthetic data can be weighted differently to control the influence of each type of data on the model's performance. One approach is a weighted loss function that assigns different weights to the real and synthetic data, for example:

loss = w_real * loss_real + w_synthetic * loss_synthetic

where w_real and w_synthetic are the weights assigned to the real and synthetic data, respectively.

Validation
To validate the performance of the model on both real and synthetic data, we can use metrics such as mean squared error (MSE) or mean absolute error (MAE) on a hold-out validation set. This helps us monitor the model's performance on both types of data and adjust the weighting scheme or other hyperparameters as needed. Generating synthetic data that is diverse and representative of the real data can also help reduce overfitting.
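A minimal sketch of this weighted loss, written here with PyTorch and mean squared error as the per-sample loss, is shown below; the weights 0.7 and 0.3 are placeholders rather than values used in the paper:

    import torch
    import torch.nn.functional as F

    def weighted_loss(pred_real, target_real, pred_syn, target_syn,
                      w_real=0.7, w_synthetic=0.3):
        # loss = w_real * loss_real + w_synthetic * loss_synthetic (Section 3.3)
        loss_real = F.mse_loss(pred_real, target_real)
        loss_synthetic = F.mse_loss(pred_syn, target_syn)
        return w_real * loss_real + w_synthetic * loss_synthetic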
4 Experimental setup
Python with TensorFlow or PyTorch is used for deep learning. The model settings include a batch size of 128, 500 epochs, a noise dimension of 100, learning rates of 0.001 for the generator and the discriminator, Leaky ReLU activations, and the Adam optimizer. The simulation parameters consist of a volatility of 0.1, a risk-free rate of 0.02, and a simulation of 1,000 time steps.

We began training a Generative Adversarial Network (GAN) for financial data creation using publicly available financial datasets (https://databank.worldbank.org/). Comprising more than 9,000 variables covering several spheres, including economic, social, and environmental ones, this dataset contains macroeconomic characteristics such as GDP, inflation, and employment, as well as social development measures including education, health, and poverty.

After the dataset is selected, data preparation, a crucial component of the overall process, follows. Missing values must be handled by interpolation or imputation, the data must be normalized so that every attribute falls within the same range, and the data has to be converted into an appropriate form for GAN training, possibly incorporating scaling or encoding. The GAN design is developed after data preparation. A deep convolutional GAN (DCGAN), whose architecture consists of a generator network and a discriminator network, is particularly appropriate for producing financial data. The generator network generates synthetic financial data; the discriminator network evaluates it and feeds its judgment back to the generator. The DCGAN design has been used effectively to create realistic synthetic data, and its convolutional structure lets it find complicated patterns and connections in the data.
High-quality synthetic financial data created by DCGAN architectures may be used for risk analysis, portfolio optimization, and stress testing. Table 3 shows the parameters used to design the generator and discriminator networks.
Table 3: CNN architecture parameters for the generator and discriminator networks

Generator Network
  Input layer           | 100-dimensional noise vector
  Convolutional layer 1 | 64 filters, kernel size 3, stride 1
  Convolutional layer 2 | 128 filters, kernel size 3, stride 1
  Convolutional layer 3 | 256 filters, kernel size 3, stride 1
  Output layer          | 1-dimensional output (financial risk prediction)

Discriminator Network
  Input layer           | 1-dimensional input (financial data)
  Convolutional layer 1 | 64 filters, kernel size 3, stride 1
  Convolutional layer 2 | 128 filters, kernel size 3, stride 1
  Convolutional layer 3 | 256 filters, kernel size 3, stride 1
  Output layer          | 1-dimensional output (probability of real data)
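A possible PyTorch rendering of the Table 3 architecture is sketched below. The 1D-convolutional layout, padding, and pooling head are assumptions made for illustration; only the filter counts (64, 128, 256), kernel size 3, stride 1, the 100-dimensional noise input, and the Leaky ReLU activation come from the paper:

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        # 100-dim noise treated as a length-100 sequence -> three Conv1d blocks -> 1 channel out
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(1, 64, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
                nn.Conv1d(64, 128, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
                nn.Conv1d(128, 256, kernel_size=3, stride=1, padding=1), nn.LeakyReLU(0.2),
                nn.Conv1d(256, 1, kernel_size=3, stride=1, padding=1),
            )
        def forward(self, z):                      # z: (batch, 100)
            return self.net(z.unsqueeze(1))        # -> (batch, 1, 100) synthetic series

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 64, 3, 1, 1), nn.LeakyReLU(0.2),
                nn.Conv1d(64, 128, 3, 1, 1), nn.LeakyReLU(0.2),
                nn.Conv1d(128, 256, 3, 1, 1), nn.LeakyReLU(0.2),
            )
            self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                      nn.Linear(256, 1), nn.Sigmoid())
        def forward(self, x):                      # x: (batch, 1, length)
            return self.head(self.features(x))     # probability that x is real

    # Optimizer settings quoted in the text: Adam with learning rate 0.001.
    opt_g = torch.optim.Adam(Generator().parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(Discriminator().parameters(), lr=1e-3)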
Figure 3 depicts the proposed model's CNN architecture. Training is crucial because it helps the Generative Adversarial Network (GAN) model identify financial data patterns and linkages. The GAN is trained with the Adam optimizer, a well-known stochastic gradient descent method that adapts the learning rate of each parameter based on gradient size. The small learning rate of 0.001 allows the model parameters to converge slowly and gradually. The batch size is 128, a conventional value that balances computing speed and model stability. During training, the GAN learns to create synthetic financial data that appears real, while the discriminator learns to distinguish genuine from fabricated data. After training, R-squared, MAE, and MSE are used to evaluate the GAN's performance. These measurements indicate the reliability of the synthetic data and help adjust the GAN design and training parameters. How well the GAN performs indicates the quality of its synthetic data and determines whether the data is suitable for risk analysis, portfolio optimization, and stress testing.

Figure 3: The CNN architecture for the proposed model

5 Results and discussion
For GAN training, the dataset must have missing values handled by interpolation or imputation, be normalized so that all characteristics fall within the same range, and be formatted for GAN training.

The Adam optimizer trains the GAN with a 0.001 learning rate and a batch size of 128. The generator network trains to produce synthetic financial data, and the discriminator network evaluates it and informs the generator of its judgment. The GAN is trained for 500 epochs to reach convergence and produce high-quality synthetic financial data.

Several indicators are used to evaluate the proposed model, including MAE, MSE, RMSE, R-squared, risk prediction accuracy, precision, recall, and F1-score.

Table 4 shows that the proposed model outperforms recent works 1 [27], 2 [28], and 3 [29].

Existing Work 1 [27] employs CNNs and LSTM networks for deep learning. The model was trained using financial time series data on stock prices, transaction volumes, and other key factors. It comprises 5 hidden layers of 128 units each, with a ReLU activation function, the Adam optimizer, a 0.01 learning rate, a batch size of 64, and 1,000 epochs.

Existing Work 2 [28] uses random forest machine learning, trained on technical indicators, sentiment analysis, and macroeconomic factors. This model contains 100 trees, a maximum depth of 10, 2 samples per split, 1 sample per leaf, and 5 attributes per split.

Existing Work 3 [29] employs an autoregressive integrated moving average (ARIMA) method, learned from a set of historical financial time series data. Its hyperparameters are an order of differencing of 1, 2 autoregressive terms, and 1 moving average term.
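For reference, the baseline configurations quoted above could be instantiated roughly as follows (a sketch under the stated hyperparameters, using scikit-learn and statsmodels; the series array is a placeholder for the historical data):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from statsmodels.tsa.arima.model import ARIMA

    # Existing Work 2 [28]: random forest with the quoted hyperparameters.
    rf = RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_split=2,
                                min_samples_leaf=1, max_features=5)

    # Existing Work 3 [29]: ARIMA with 2 AR terms, differencing order 1, 1 MA term.
    series = np.random.randn(200).cumsum()          # placeholder for historical prices
    arima_fit = ARIMA(series, order=(2, 1, 1)).fit()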
With values of 0.009, 0.012, and 0.015 for the training, validation, and testing sets, respectively, the proposed model's Mean Absolute Error (MAE) is much lower than the 0.052 ± 0.008 of existing work. Likewise, the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) values for the proposed model are 0.001, 0.002, and 0.003, and 1.2%, 1.5%, and 1.8%, for the training, validation, and testing sets, respectively, exceeding existing work with values of 0.003 ± 0.001 and 0.055 ± 0.008. Moreover, whereas previous work achieves a lower R-squared value of 0.854 ± 0.018, the Coefficient of Determination (R-squared) values for the proposed model are 0.95, 0.92, and 0.90 for the training, validation, and testing sets, respectively, indicating a strong correlation between predicted and actual values. Compared with previous work, the proposed model therefore shows better accuracy, reliability, and generalizability.

Table 4: Performance metrics
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Mean Absolute Error (MAE) | Training: 0.009, Validation: 0.012, Testing: 0.015 | 0.052 ± 0.008 | 0.065 ± 0.010 | 0.075 ± 0.012
Mean Squared Error (MSE) | Training: 0.001, Validation: 0.002, Testing: 0.003 | 0.003 ± 0.001 | 0.005 ± 0.002 | 0.007 ± 0.003
Root Mean Squared Error (RMSE) | Training: 1.2%, Validation: 1.5%, Testing: 1.8% | 0.055 ± 0.008 | 0.070 ± 0.010 | 0.085 ± 0.012
Coefficient of Determination (R-squared) | Training: 0.95, Validation: 0.92, Testing: 0.90 | 0.921 ± 0.013 | 0.895 ± 0.018 | 0.865 ± 0.022

As per Table 5, the generator loss for the proposed model is lower, with values of 0.04, 0.05, and 0.06 for the training, validation, and testing sets, respectively, compared with 0.08 for existing work. Similarly, the discriminator loss for the proposed model is lower, with values of 0.02, 0.03, and 0.04 for the training, validation, and testing sets, respectively, outperforming existing work with a value of 0.05. While existing work requires 1,000 epochs, the proposed GAN model converges in only 500 epochs to reach optimal performance. The successful convergence of the proposed model is partly explained by well-chosen hyperparameter values, namely a batch size of 128 and a learning rate of 0.001. Overall, the proposed GAN model exhibits superior performance, stability, and efficiency relative to existing work, making it a more trustworthy and effective tool for producing synthetic financial data.

Table 5: GAN performance
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Generator Loss | Training: 0.04, Validation: 0.05, Testing: 0.06 | 0.05 | 0.07 | 0.09
Discriminator Loss | Training: 0.02, Validation: 0.03, Testing: 0.04 | 0.03 | 0.05 | 0.07
GAN Convergence | 500 epochs, batch size 128, learning rate 0.001 | 1000 epochs | 800 epochs | 1200 epochs

As per Table 6, the financial risk predicted by the proposed model is remarkably close to the actual financial risk, with an average predicted risk of 0.023 and a standard deviation of 0.005, compared with an average actual risk of 0.025 and a standard deviation of 0.006. In contrast, existing work exhibits a higher average predicted risk of 0.028, indicating a less accurate prediction. Furthermore, the proposed model achieves a risk prediction accuracy of 92%, with a precision of 90%, recall of 94%, and F1-score of 92%, surpassing the 85% accuracy achieved by existing work. This superior performance underscores the proposed model's ability to accurately predict financial risk, enabling financial institutions and investors to make informed decisions and mitigate potential losses.

Table 6: Risk prediction results
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Predicted Financial Risk | 0.023 (average predicted risk: 0.023, SD of predicted risk: 0.005) | 0.028 | 0.035 | 0.042
Actual Financial Risk | 0.025 (average actual risk: 0.025, SD of actual risk: 0.006) | 0.03 | 0.035 | 0.04
Risk Prediction Accuracy | 92% (Precision: 90%, Recall: 94%, F1-score: 92%) | 85% | 80% | 75%

Table 7 shows that the model achieves a precision of 0.853 ± 0.021, a recall of 0.826 ± 0.025, and an F1-score of 0.839 ± 0.022 for low-risk predictions, meaning it is quite good at identifying low-risk situations. The model does better in the medium-risk category, with precision, recall, and F1-score values of 0.913 ± 0.015, 0.895 ± 0.018, and 0.904 ± 0.016, respectively, showing that it can reliably forecast medium-risk occurrences. Its ability to identify high-risk situations is reflected in precision, recall, and F1-score values of 0.952 ± 0.008, 0.935 ± 0.011, and 0.943 ± 0.009. Overall, the proposed model has a strong and accurate capacity to anticipate risk, which helps financial institutions and investors make sound choices and avoid losses.

Table 7: Risk level-based prediction results
Risk Level | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Low | Precision: 0.853 ± 0.021, Recall: 0.826 ± 0.025, F1-score: 0.839 ± 0.022 | Precision: 0.80, Recall: 0.75, F1-score: 0.77 | Precision: 0.75, Recall: 0.70, F1-score: 0.72 | Precision: 0.70, Recall: 0.65, F1-score: 0.67
Medium | Precision: 0.913 ± 0.015, Recall: 0.895 ± 0.018, F1-score: 0.904 ± 0.016 | Precision: 0.85, Recall: 0.80, F1-score: 0.82 | Precision: 0.80, Recall: 0.75, F1-score: 0.77 | Precision: 0.75, Recall: 0.70, F1-score: 0.72
High | Precision: 0.952 ± 0.008, Recall: 0.935 ± 0.011, F1-score: 0.943 ± 0.009 | Precision: 0.90, Recall: 0.85, F1-score: 0.87 | Precision: 0.85, Recall: 0.80, F1-score: 0.82 | Precision: 0.80, Recall: 0.75, F1-score: 0.77
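The per-risk-level precision, recall, and F1-score values reported in Table 7 can be computed for any labelled evaluation set with standard library calls. The snippet below is an illustrative sketch only; the random y_true / y_pred arrays are placeholders for the real low/medium/high labels and predictions:

    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support

    y_true = np.random.randint(0, 3, size=1000)   # 0 = low, 1 = medium, 2 = high (placeholder)
    y_pred = np.random.randint(0, 3, size=1000)   # placeholder model predictions
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1, 2])
    for name, p, r, f in zip(["low", "medium", "high"], precision, recall, f1):
        print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")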
When it comes to optimal portfolio allocation, anticipated return, and expected risk (Table 8), the proposed technique makes markedly better decisions than earlier studies. The proposed model recommends a portfolio allocation of 65% stocks, 30% bonds, and 5% cash, whereas prior work recommends 60% equities, 35% bonds, and 5% cash. The proposed model also yields a higher expected return of 8.2% (with a standard deviation of 1.5%) than the 7.5% expected return of the prior study, and a lower expected risk of 4.5% (with a standard deviation of 1.2%) versus the 5.5% reported by previous research. These findings demonstrate that the proposed approach can help investors and banks make better decisions by optimizing their portfolios, maximizing returns, and lowering risk.

Table 8: Decision optimization results
Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Optimized Portfolio Allocation | 65% stocks, 30% bonds, 5% cash | 60% stocks, 35% bonds, 5% cash | 55% stocks, 40% bonds, 5% cash | 70% stocks, 25% bonds, 5% cash
Expected Return | 8.2% (SD of expected return: 1.5%) | 7.50% | 7.00% | 8.00%
Expected Risk | 4.5% (SD of expected risk: 1.2%) | 5.50% | 6.00% | 4.80%

6 Conclusion
Generative Adversarial Networks (GANs) are used in this study to anticipate financial risk dynamics and support optimal decisions. The framework trains a risk prediction model on synthetic financial data generated by a GAN, and the decision optimization model then produces optimal financial decisions based on the predicted risk. The model predicts risk well, with an MAE of 0.012 and an MSE of 0.002. With a 4.5% expected risk and an 8.2% expected return, the model outperforms the compared machine learning methods, and it adapts to market volatility with an average return of 8.5% and risk of 4.2%. The model offers a novel technique for predicting financial risk dynamics and improving decision-making, and it may be used for portfolio, risk, and investment choices. Future work should improve the risk prediction model, add elements to the decision optimization model, and explore new ways to apply the technology in banking.
References
[1] Bhat, A., Kulkarni, N., Husain, S., Yadavalli, A., Kaur, J. N., Shukla, A., & Seshadri, V. (2024). Speaking in terms of money: financial knowledge acquisition via speech data generation. ACM Journal on Computing and Sustainable Societies, 2(3), 1-35.
[2] Paiva, F. D., Cardoso, R. T. N., Hanaoka, G. P., & Duarte, W. M. (2019). Decision-making for financial trading: A fusion approach of machine learning and portfolio selection. Expert Systems with Applications, 115, 635-655.
[3] Tang, Y., Song, Z., Zhu, Y., Yuan, H., Hou, M., Ji, J., ... & Li, J. (2022). A survey on machine learning models for financial time series forecasting. Neurocomputing, 512, 363-380.
[3] Masini, R. P., Medeiros, M. C., & Mendes, E. F. (2023). Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), 76-111.
[4] Wang, J., Hong, S., Dong, Y., Li, Z., & Hu, J. (2024). Predicting stock market trends using LSTM networks: overcoming RNN limitations for improved financial forecasting. Journal of Computer Science and Software Applications, 4(3), 1-7.
[5] Safwat, S., Mahmoud, A., Eldesouky Fattoh, I., & Ali, F. (2024). Hybrid deep learning model based on GAN and RESNET for detecting fake faces. IEEE Access, 12, 86391-86402. doi: 10.1109/ACCESS.2024.3416910.
[6] Shi, X., Zhang, Y., Yu, M., & Zhang, L. (2025). Deep learning for enhanced risk management: a novel approach to analyzing financial reports. PeerJ Computer Science, 11:e2661. https://doi.org/10.7717/peerj-cs.2661
[7] Huang, X., Han, M., & Deng, Y. (2024). A hybrid GAN-Inception deep learning approach for enhanced coordinate-based acoustic emission source localization. Applied Sciences, 14, 8811. https://doi.org/10.3390/app14198811
[8] Ren, S. (2022). Optimization of enterprise financial management and decision-making systems based on big data. Journal of Mathematics, 2022(1), 1708506.
[9] Qi, Q. (2022). Analysis and forecast on the price change of Shanghai stock index. Journal of Economics, Business and Management, 10(1), 72-78.
[10] Petrozziello, A., Troiano, L., Serra, A., Jordanov, I., Storti, G., Tagliaferri, R., & La Rocca, M. (2022). Deep learning for volatility forecasting in asset management. Soft Computing, 26(17), 8553-8574.
[11] Li, Y., & Pan, Y. (2022). A novel ensemble deep learning model for stock prediction based on stock prices and news. International Journal of Data Science and Analytics, 13(2), 139-149.
[12] Souto, H. G., & Moradi, A. (2023). Forecasting realized volatility through financial turbulence and neural networks. Economics and Business Review, 9(2), 133-159.
[13] Zhan, X., Ling, Z., Xu, Z., Guo, L., & Zhuang, S. (2024). Driving efficiency and risk management in finance through AI and RPA. Unique Endeavor in Business & Social Sciences, 3(1), 189-197.
[14] Wei, L., Deng, Y., Huang, J., Han, C., & Jing, Z. (2022). Identification and analysis of financial technology risk factors based on textual risk disclosures. Journal of Theoretical and Applied Electronic Commerce Research, 17(2), 590-612.
[15] Lei, Y., Qiaoming, H., & Tong, Z. (2023). Research on supply chain financial risk prevention based on machine learning. Computational Intelligence and Neuroscience, 2023(1), 6531154.
[16] Levytska, S., Pershko, L., Akimova, L., Akimov, O., Havrilenko, K., & Kucherovskii, O. (2022). A risk-oriented approach in the system of internal auditing of the subjects of financial monitoring. International Journal of Applied Economics, Finance and Accounting, 14(2), 194-206.
[17] Wang, H., & Budsaratragoon, P. (2023). Exploration of an "Internet+" grounded approach for establishing a model for evaluating financial management risks in enterprises. International Journal for Applied Information Management, 3(3), 109-117.
[18] Malki, A., Atlam, E., & Gad, I. (2022). Machine learning approach of detecting anomalies and forecasting time-series of IoT devices. Alexandria Engineering Journal, 61(11), 8973-8986. 10.1016/j.aej.2022.02.038
[19] Arunkumar, K. E., Kalaga, D. V., Mohan, C., Kumar, S., & Brenza, T. M. (2022). Comparative analysis of Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) cells, autoregressive integrated moving average (ARIMA), and seasonal autoregressive integrated moving average (SARIMA) for forecasting COVID-19 trends. Alexandria Engineering Journal, 61(10), 7585-7603. 10.1016/j.aej.2022.01.011
[20] Kendall, M. G. (review). Journal of the Royal Statistical Society, Series A (General), 134(3) (1971), pp. 450-453. Published by Wiley for the Royal Statistical Society. Stable URL: http://www.jstor.or
[21] Sutiene, K., Schwendner, P., Sipos, C., Lorenzo, L., Mirchev, M., Lameski, P., Kabasinskas, A., Tidjani, C., Ozturkkal, B., & Cerneviciene, J. (2024). Enhancing portfolio management using artificial intelligence: literature review. Frontiers in Artificial Intelligence, 7:1371502. doi: 10.3389/frai.2024.1371502. PMID: 38650961; PMCID: PMC11033520.
[22] Xu, R., Yang, Y., Qiu, H., Liu, X., & Zhang, J. (2024). Research on multimodal generative adversarial networks in the framework of deep learning. Journal of Computing and Electronic Information Management, 12(3), 84-88.
[23] Dai, W., Tao, J., Yan, X., Feng, Z., & Chen, J. (2023, November). Addressing unintended bias in toxicity detection: An LSTM and attention-based approach. In 2023 5th International Conference on Artificial Intelligence and Computer Applications (ICAICA) (pp. 375-379). IEEE.
[24] Yao, J., Wu, T., & Zhang, X. (2023). Improving depth gradient continuity in transformers: A comparative study on monocular depth estimation with CNN. arXiv preprint arXiv:2308.08333.
[25] Wang, X. S., & Mann, B. P. (2020). Attractor selection in nonlinear energy harvesting using deep reinforcement learning. arXiv preprint arXiv:2010.01255.
[26] Zhang, Y., Jiang, Z., Peng, C., Zhu, X., & Wang, G. (2024). Management analysis method of multivariate time series anomaly detection in financial risk assessment. Journal of Organizational and End User Computing, 36(1), 1-19.
[27] Pandey, A., Mannepalli, P. K., Gupta, M., et al. (2024). A deep learning-based hybrid CNN-LSTM model for location-aware web service recommendation. Neural Processing Letters, 56, 234. https://doi.org/10.1007/s11063-024-11687-w
[28] Sun, Z., Wang, G., Li, P., Wang, H., Zhang, M., & Liang, X. (2024). An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Systems with Applications, 237, Part B. https://doi.org/10.1016/j.eswa.2023.121549
[29] Ilu, S. Y., & Prasad, R. (2023). Improved autoregressive integrated moving average model for COVID-19 prediction by using statistical significance and clustering techniques. Heliyon, 9(2), e13483. https://doi.org/10.1016/j.heliyon.2023.e13483
https://doi.org/10.31449/inf.v49i16.9643 Informatica 49 (2025) 315–330 315
GridRiskNet: A Two-Stage Hybrid Model for Project Investment
Risk Management of Power Grid Enterprises Using Big Data Mining
Hongzhi Gao*, Dekyi, Metok
State Grid Tibet Electric Power Co., Ltd., Lhasa 850000, China
E-mail: Djgy1108@163.com
*Corresponding author
Keywords: power grid enterprise engineering project, GridRiskNet, big data mining, project investment risk
management, two-stage hybrid modeling
Received: June 10, 2025
To enhance the power grid enterprise's ability to comprehensively perceive and dynamically assess
investment risks in engineering projects, this study proposes a risk management model called GridRiskNet
based on big data mining. This model integrates structured, unstructured, and spatiotemporal data and
realizes intelligent identification of project risk probability distributions and potential impact ranges by
constructing a two-stage hybrid modeling architecture. In the first stage, the model uses eXtreme Gradient
Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) to extract static and dynamic
features in parallel. In the second stage, it introduces Graph Attention Recurrent Neural Network (GA-
RNN) to model risk propagation paths under the power grid topology. Meanwhile, this study combines
Spatio-Temporal Graph Convolutional Network (ST-GCN) to improve the coupling expression of
meteorological and text features. The experiment uses multi-source public data for verification, such as
power infrastructure data from the U.S. Energy Information Administration, meteorological observation
data from the National Oceanic and Atmospheric Administration, and power grid topology data from
OpenStreetMap. The results show that GridRiskNet performs excellently in risk prediction stability and
regional propagation modeling. Among them, the risk principal component analysis projection score in
2023 reached 7.779. This indicates that cost overruns, climate pressure, and equipment technology risks
together form a high-risk cluster, with cost overruns increasing by 269% compared with 2018. In the
State-of-the-Art comparison, GridRiskNet achieves an F1-score of 0.892, a Receiver Operating
Characteristic - Area Under Curve of 0.962, a Risk Impact Radius error of approximately 4.8 km, and a
Risk Entropy of 0.89; these are comprehensively better than existing methods. Moreover, the model has
good cross-modal feature fusion and risk transmission mechanism identification capabilities, and can
effectively characterize the spatiotemporal coupling risk features in complex power grid projects. Overall,
this system can provide power grid enterprises with structured and interpretable risk index outputs and
regional early warning support. Thus, it helps to improve the investment safety and operational and
maintenance resilience of projects.
Povzetek: Predstavljen je GridRiskNet, dvofazni hibridni model za upravljanje investicijskih tveganj v
elektroenergetskih projektih. S križnim združevanjem strukturiranih, besedilnih in prostorsko-časovnih
podatkov ter uporabo XGBoost/LightGBM in GA-RNN izboljša napoved tveganj (F1=0,892, AUC=0,962)
ter natančno modelira regionalno širjenje tveganj (napaka 4,8 km).
1 Introduction

With the accelerated promotion of the energy transition and the construction of new power systems, the strategic position of power grid engineering projects in national energy security and clean energy consumption has become increasingly prominent [1]. However, power grid enterprises face problems such as the surge of multi-source heterogeneous data, highly uncertain engineering environments, and frequent external disturbances during project investment and construction. These problems make it difficult for traditional risk management methods to cover the dynamic risk chain throughout the whole process, from construction preparation and equipment deployment to operation and maintenance support [2]. Especially against the backdrop of the rapid development of renewable energy, the risk types in project investment are constantly evolving; for example, increasingly extreme climate, swiftly changing equipment technology paths, and rising policy compliance costs all place higher requirements on the intelligence and adaptability of risk early warning systems [3-5]. Therefore, constructing a big data mining-based intelligent risk assessment model has become a key path to improving the scientific soundness of investment decisions and the power grid enterprises' resilience governance
capabilities [6, 7]. In the context of power market liberalization, the continuous increase in the proportion of renewable energy has made the management of location-related risks caused by network congestion increasingly important. Improving the ability to model location-related risks has become the core foundation for supporting project financing and investment feasibility assessment [8].

In recent years, artificial intelligence (AI) technologies have made remarkable progress in risk identification, modeling, and prediction. Model architectures represented by graph neural networks (GNN), attention mechanisms, and deep semantic modeling have gradually been applied to financial risk control and energy dispatching [9, 10]. Some studies have attempted to introduce machine learning (ML) methods into the power engineering field, including using eXtreme Gradient Boosting (XGBoost) to classify and identify construction anomalies, or employing a convolutional neural network (CNN) to predict trends in construction period delays [11]. However, existing methods generally suffer from shortcomings such as a single model structure, weak data fusion capability, and difficulty in explaining cross-modal causal paths; these methods cannot effectively support power grid enterprises in achieving full-chain risk perception, dynamic quantification, and structural early warning in a multi-source data environment. Therefore, there is an urgent need to construct a multi-modal, composite risk assessment system for power grid engineering scenarios.

To this end, this study proposes the GridRiskNet model based on big data mining and constructs a fusion mechanism for structured data, unstructured text, and spatiotemporal data, thereby realizing comprehensive modeling and dynamic evaluation of investment risks in power grid engineering projects. The study's main innovations encompass:
(1) Proposing a two-stage GridRiskNet model architecture: it integrates XGBoost and Light Gradient Boosting Machine (LightGBM) for risk capture and models the propagation process of risks in the power grid topology through a Graph Attention Recurrent Neural Network (GA-RNN).
(2) Introducing a Spatio-Temporal Graph Convolutional Network (ST-GCN) and cross-modal attention mechanisms: these enhance the model's ability to express meteorological disturbances and regional structural information.
(3) Constructing a risk principal component projection index system based on Principal Component Analysis (PCA): it achieves structural clustering and projection analysis of high-dimensional risk samples and supports the differentiated regional risk management needs of power grid enterprises.

Overall, the specific research question is whether multimodal data fusion and risk propagation modeling methods can enhance the comprehensive capabilities of risk classification, propagation path identification, and uncertainty quantification in complex power grid engineering projects. The target outcome is to achieve a comprehensive portrayal of investment risks in power grid engineering projects by constructing a composite model that integrates structured, spatiotemporal, and text data. The study also aims to verify the advantages of the proposed method in terms of risk identification accuracy, propagation path reducibility, and risk distribution stability, thereby supporting power grid enterprises in risk early warning and decision optimization.

2 Related work

With the in-depth application of AI technologies and big data analysis methods in engineering management, investment project risk assessment has gradually shifted from traditional static analysis to intelligent prediction and dynamic modeling. Aiming at the insufficiency of risk assessment for manufacturing investments, Dong and Li proposed combining expert experience with big data mining to construct project risk indices and integrating CNN with Long Short-Term Memory (LSTM) for predictive modeling. In multiple sliding window tests, the model achieved a Receiver Operating Characteristic (ROC) value of 0.9366 and an average accuracy of 94.95%, demonstrating high prediction precision [12]. Loseva et al., facing the risk assessment task of regional franchising projects, constructed a big data-based credit rating model by combining the SPARK information system with ML methods, and verified the model's robustness in identifying abnormal risks through Spearman correlation and confusion matrices [13]. These studies have provided useful insights into introducing composite modeling methods and integrating expert judgment with data-driven mechanisms, gradually promoting the development of investment risk assessment towards intelligence and systematization.

Over the years, methods such as GNN, deep clustering, and multi-criteria decision-making have been widely introduced into investment evaluation and project classification, further enhancing the structural cognitive ability of risk assessment. Mostofi et al. constructed a construction project investment framework based on graph attention networks; this framework achieved a classification accuracy of over 98% in three sub-networks of region, country, and financing model, demonstrating the advantages of graph structure in modeling investment decision-making relationships [14]. Qi used regularized topic models and graph clustering methods to construct a financial investment "behavior circle", mapping customer behaviors to a latent semantic space and realizing risk classification of financial communities and investment plan recommendations through subgraph mining [15]. Moreover, Luo and Zhu proposed a deep neural network (DNN) model based on transfer learning for regional investment risk assessment; this model maintained high prediction accuracy (up to 92%) with insufficient samples, demonstrating the potential of deep learning for unbalanced data problems [16]. These studies all reflect the integration trend of risk assessment models in recent years towards deep representation learning, multi-layer decision-making
structures, and complex graph relationship modeling. Although existing studies have made positive progress in risk modeling methods, index system construction, and model accuracy improvement, three main deficiencies remain. First, most current models focus on classification or regression prediction of risk probability, lacking the ability to model regional structural propagation characteristics. Second, the heterogeneity of multi-source data has not been fully utilized, and a unified representation for structured, spatiotemporal, textual, and other multimodal information has not been formed. Third, the interpretability and quantifiability of risk structure evolution must be enhanced, making it difficult to support dynamic scheduling and regional risk management of complex systems such as power grid projects [17]. In response to the above shortcomings, this study proposes a grid engineering project investment risk management system based on big data mining: the GridRiskNet model. This model reveals the changing trends of high-dimensional risk structures and supports grid enterprises in accurately perceiving and dynamically controlling investment risks across regions and time scales.

3 GridRiskNet model based on big data mining

3.1 Realization process of the GridRiskNet model

The proposed GridRiskNet model realizes intelligent assessment of investment risks in power grid engineering projects based on multi-source heterogeneous data fusion and a hybrid ML architecture. It first establishes a multimodal data preprocessing layer. For structured data (such as project budgets and equipment parameters), an adaptive normalization method is used to unify dimensions, ensuring the consistency of feature scales. For unstructured text data (including engineering logs and bidding documents), a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model is utilized to deeply extract semantic features, enhancing the risk perception ability of textual information. For spatiotemporal data (such as construction trajectories and meteorological records), ST-GCN is introduced to jointly encode complex environmental features along two dimensions: spatial dependence and temporal dynamics [18, 19]. In the feature fusion stage, a cross-modal attention mechanism is designed that adaptively learns the weight relationships between different data modalities. This mechanism effectively integrates multi-source features and generates unified, dense, high-dimensional risk representation vectors, laying the foundation for multi-dimensional risk modeling [20].

At the core of modeling, GridRiskNet adopts a two-stage hybrid modeling framework. In the first stage, the improved XGBoost and LightGBM models run in parallel to jointly perform risk prediction on the high-dimensional risk representation vectors. Specifically, XGBoost integrates a dynamic feature selection mechanism, which dynamically updates feature importance indices based on sliding-window statistics to enhance the response capability to dynamic risk factors. LightGBM incorporates a time-series-aware splitting criterion to strengthen the detection of time-series anomalies such as project schedule delays. The two models output the prediction probabilities of risk categories (i.e., risk probability vectors after Softmax) and sequences of feature importance scores [21].

In the second stage, GA-RNN is used as a meta-model whose core innovation lies in fusing the dual output information from the first stage. Specifically, GA-RNN takes the risk probability vectors of XGBoost and LightGBM as its main input; simultaneously, it introduces their feature importance score sequences as auxiliary features to form a comprehensively fused feature matrix. This matrix contains the risk prediction results from the previous stage and also explicitly integrates the influence weights of features on the model output, thereby enhancing the ability to perceive risk propagation mechanisms [22]. Subsequently, based on this matrix, GA-RNN introduces a risk propagation graph structure and accurately models the transmission relationships between risk factors through an adjacency matrix. Moreover, it uses graph attention mechanisms and recurrent neural network (RNN) units to dynamically learn the key nodes and main channels in risk propagation paths, extracting high-order interaction features.

The entire GridRiskNet model jointly optimizes the classification cross-entropy loss, the risk propagation graph reconstruction error, and a feature stability regularization term through an end-to-end training strategy. Finally, the model outputs a multi-dimensional risk assessment matrix covering risk probability distribution, potential impact range, and structural features. The whole system adopts an online incremental learning mechanism that continuously absorbs real-time data streams to dynamically update model parameters, achieving high adaptability and continuous tracking of the risk environment of power grid engineering projects. The implementation process and pseudocode of GridRiskNet are illustrated in Figures 1 and 2.
Figure 1: The implementation process of GridRiskNet
class GridRiskNet:
    def __init__(self, config):
        self.config = config
        self.preprocessor = MultiModalPreprocessor(config)
        self.feature_fusion = CrossModalAttention(config)
        self.first_stage = HybridEnsembleModels(config)
        self.risk_graph = RiskPropagationGraph(config)
        self.second_stage = GARNNMetaModel(config, self.risk_graph)

    def train(self, dataset):
        features = self.preprocessor.process(dataset)
        fused_features = self.feature_fusion(features)
        first_stage_preds = self.first_stage.train(fused_features, dataset.labels)
        self.second_stage.train(first_stage_preds, fused_features, dataset.labels)

        for epoch in range(self.config.epochs):
            preds = self.predict(dataset)
            loss = self._calculate_loss(preds, dataset.labels)
            self._update_models(loss)

    def predict(self, dataset):
        features = self.preprocessor.process(dataset)
        fused_features = self.feature_fusion(features)
        first_stage_preds = self.first_stage.predict(fused_features)
        return self.second_stage.predict(first_stage_preds, fused_features)

    def update_with_new_data(self, new_data):
        features = self.preprocessor.update_and_process(new_data)
        self.first_stage.update(features, new_data.labels)
        first_stage_preds = self.first_stage.predict(features)
        self.second_stage.update(first_stage_preds, features, new_data.labels)

class MultiModalPreprocessor:
    def process(self, dataset):
        return {
            'structured': self._process_structured(dataset.structured),
            'text': self._process_text(dataset.text),
            'spatiotemporal': self._process_spatiotemporal(dataset.spatiotemporal)
        }

    def _process_structured(self, data):
        return AdaptiveNormalization(data)

    def _process_text(self, data):
        return FineTuneBERT(self.bert_model, data)

    def _process_spatiotemporal(self, data):
        return STGCN(self.stgcn_params).forward(data)

class CrossModalAttention:
    def __call__(self, features):
        weights = self._compute_attention_weights(features)
        return weighted_sum(features, weights)

class HybridEnsembleModels:
    def __init__(self, config):
        self.xgboost = ImprovedXGBoost(config)
        self.lightgbm = ImprovedLightGBM(config)

    def train(self, features, labels):
        xgb_preds = self.xgboost.train(features, labels)
        lgbm_preds = self.lightgbm.train(features, labels)
        return combine_predictions(xgb_preds, lgbm_preds)

class RiskPropagationGraph:
    def __init__(self, config):
        self.adj_matrix = self._construct_adjacency_matrix(config.risk_factors)

    def _construct_adjacency_matrix(self, risk_factors):
        # Construct adjacency matrix based on domain knowledge or data learning
        pass

class GARNNMetaModel:
    def train(self, first_stage_preds, features, labels):
        # Train GA-RNN model
        pass

    def predict(self, first_stage_preds, features):
        # Predict risk assessment matrix
        pass
Figure 2: The pseudocode of GridRiskNet
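As a library-level illustration of the first-stage ensemble sketched in Figure 2 (not the authors' implementation), the parallel XGBoost/LightGBM predictions and feature-importance sequences that feed the GA-RNN meta-model could be assembled as follows; all data and hyperparameters are placeholders:

    import numpy as np
    import xgboost as xgb
    import lightgbm as lgb

    # X is a stand-in for the fused risk representation vectors, y for the risk-category labels.
    X, y = np.random.rand(500, 32), np.random.randint(0, 3, 500)

    xgb_clf = xgb.XGBClassifier(n_estimators=200, objective="multi:softprob").fit(X, y)
    lgb_clf = lgb.LGBMClassifier(n_estimators=200, objective="multiclass").fit(X, y)

    # Softmax probability vectors plus feature-importance scores form the meta-features
    # that the second-stage meta-model consumes.
    meta_features = np.hstack([
        xgb_clf.predict_proba(X),
        lgb_clf.predict_proba(X),
        np.tile(xgb_clf.feature_importances_, (len(X), 1)),
        np.tile(lgb_clf.feature_importances_, (len(X), 1)),
    ])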
3.2 Mathematical modeling principles of the GridRiskNet model

Figure 1 shows the complete implementation process of the GridRiskNet model, covering the entire flow from a user-requested risk assessment to model output and continuous updating. The model is built on multi-source heterogeneous data, fusing structured, unstructured, and spatiotemporal information, and achieves intelligent prediction of power grid project risks through multi-stage ML and graph modeling strategies. The key computational steps are described mathematically below.

In the data preprocessing stage, the structured input data is first normalized. Let the original data matrix be

$\mathbf{X}_s \in \mathbb{R}^{n \times d_s}$  (1)

where $\mathbf{X}_s$ contains $n$ records, each with $d_s$ structured features. Normalization is computed as

$\tilde{\mathbf{X}}_s = \dfrac{\mathbf{X}_s - \mu_s}{\sigma_s + \epsilon}$  (2)

where $\mu_s$ is the column-wise mean vector, $\sigma_s$ is the column-wise standard deviation (SD), and $\epsilon$ is a small positive number that prevents the denominator from being zero. This processing gives the model numerical consistency across features of different dimensions.

For unstructured text data $\mathcal{T} = \{t_1, t_2, \dots, t_m\}$, semantic features are extracted with a fine-tuned BERT model, whose output is

$\mathbf{H}_t = \mathrm{BERT}(\mathcal{T}) = [\mathbf{h}_1; \mathbf{h}_2; \dots; \mathbf{h}_m], \quad \mathbf{h}_i \in \mathbb{R}^{d_t}$  (3)

where $\mathbf{h}_i$ is the semantic vector of the $i$-th text, with dimension $d_t$. This step preserves the semantic relationships between text contexts and forms an important basis for the model to recognize risk semantics.

Spatiotemporal data, including trajectories and meteorology, is expressed as

$\mathbf{X}_{st} \in \mathbb{R}^{T \times N \times F}$  (4)

where $T$ is the number of time steps, $N$ is the number of spatial nodes (such as site numbers), and $F$ is the spatiotemporal feature dimension of each node. ST-GCN is used for modeling, with the core propagation equation

$\mathbf{Z}^{(l+1)} = \sigma\left(\sum_{k=0}^{K} \mathbf{A}_k \mathbf{Z}^{(l)} \mathbf{W}_k\right)$  (5)

where $\mathbf{A}_k$ is the adjacency matrix of order $k$, $\mathbf{Z}^{(l)}$ is the node representation of the $l$-th layer, $\mathbf{W}_k$ is the weight matrix, and $\sigma$ is the activation function. This network structure captures the coupling between spatial topology and temporal evolution.

In the feature fusion stage, the model introduces a cross-modal attention mechanism to automatically aggregate multi-source information. Let two modal features be $\mathbf{F}_i$ and $\mathbf{F}_j$; their attention weights are calculated as

$\alpha_{i,j} = \dfrac{\exp(\mathbf{F}_i^{\top} \mathbf{W}_a \mathbf{F}_j)}{\sum_k \exp(\mathbf{F}_i^{\top} \mathbf{W}_a \mathbf{F}_k)}$  (6)

After fusion, a unified risk representation vector is obtained:

$\mathbf{F}_{fusion} = \sum_j \alpha_{i,j} \cdot \mathbf{F}_j$  (7)

This mechanism enables the model to automatically learn the most discriminating risk signal source when faced with heterogeneous features and semantic diversity.
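A minimal PyTorch sketch of the cross-modal attention fusion in Eqs. (6)-(7) is given below; the bilinear scoring matrix, the dimensions, and the representation of each modality as a single vector are simplifying assumptions:

    import torch
    import torch.nn as nn

    class CrossModalAttentionFusion(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.W_a = nn.Parameter(torch.randn(dim, dim) * 0.01)   # bilinear scoring matrix

        def forward(self, query, modalities):
            # query: (dim,) anchor modality F_i; modalities: list of (dim,) vectors F_j
            feats = torch.stack(modalities)                 # (m, dim)
            scores = query @ self.W_a @ feats.t()           # F_i^T W_a F_j for each modality j
            alpha = torch.softmax(scores, dim=0)            # Eq. (6)
            return (alpha.unsqueeze(1) * feats).sum(dim=0)  # Eq. (7): fused representation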
The hybrid modeling framework is divided into two stages. In the first stage, the improved XGBoost and LightGBM models are run in parallel. The objective function of XGBoost reads

$\mathcal{L}_{xgb} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$  (8)

where $\hat{y}_i = \sum_{k=1}^{K} f_k(\mathbf{x}_i)$ is the predicted value of sample $i$ and $\Omega(f_k) = \gamma T_k + \frac{1}{2}\lambda \lVert \omega_k \rVert^2$ is the regularization term of the $k$-th tree.

To adapt to dynamic changes over time, XGBoost integrates a sliding-window statistical module that dynamically adjusts feature importance:

$I_j^{(t)} = \sum_{s=t-w}^{t} \Delta G_j^{(s)}$  (9)

where $\Delta G_j^{(s)}$ is the gain change of the $j$-th feature at time step $s$. $I_j^{(t)}$ is a dynamic feature importance index within the XGBoost stage that reflects gain changes inside the sliding window; it is mainly applied to internal feature selection and dynamic weight adjustment of the first-stage model.

LightGBM introduces a time-series-aware splitting criterion to strengthen anomaly recognition. Let the time-series samples be $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T\}$; the splitting gain is defined as

$\mathcal{G}_j = \sum_{t=1}^{T} w_t \cdot \left[ \dfrac{\left(\sum_{i \in L_t} g_i\right)^2}{\sum_{i \in L_t} h_i + \lambda} + \dfrac{\left(\sum_{i \in R_t} g_i\right)^2}{\sum_{i \in R_t} h_i + \lambda} \right]$  (10)

where $g_i$ and $h_i$ are the first and second derivatives of the loss, $L_t$ and $R_t$ are the left and right sample sets of the current split, and $w_t = e^{-\beta(T-t)}$ is the time-decay weight.

In the second stage, GA-RNN is used to capture high-order risk paths. Its node state is updated as

$\mathbf{h}_i^{(t)} = \mathrm{GRU}\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{h}_j^{(t-1)}, \; \mathbf{h}_i^{(t-1)}\right)$  (11)

where $\mathcal{N}(i)$ is the neighbor set of node $i$ and $\alpha_{ij}$ is the edge weight under the graph attention mechanism:

$\alpha_{ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_k]\right)\right)}$  (12)

Finally, the system integrates three objectives, namely classification performance, graph structure reconstruction, and feature stability, by jointly optimizing the overall loss function, where $\mathcal{L}_{ce}$ is the cross-entropy loss:

$\mathcal{L}_{total} = \mathcal{L}_{ce} + \lambda_1 \cdot \mathcal{L}_{graph} + \lambda_2 \cdot \mathcal{L}_{reg}$  (13)

The graph structure consistency loss is

$\mathcal{L}_{graph} = \lVert \mathbf{A} - \hat{\mathbf{A}} \rVert_F^2$  (14)

and the feature-perturbation regularization term reads

$\mathcal{L}_{reg} = \sum_{j=1}^{d} \mathrm{Var}\left(\nabla_{\mathbf{x}_j} \hat{y}\right)$  (15)

At the system deployment level, GridRiskNet adopts an online incremental learning mechanism. Let the current parameters be $\theta_t$; after receiving a new sample $(\mathbf{x}_t, y_t)$ the model is updated as

$\theta_{t+1} = \theta_t - \eta \cdot \nabla_{\theta} \mathcal{L}(\mathbf{x}_t, y_t; \theta_t)$  (16)

where $\eta$ is the learning rate and $\nabla_{\theta}$ is the gradient operator. This mechanism ensures that the model can adapt its parameters in a dynamic risk environment.
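The graph-attention node update of Eqs. (11)-(12) can be sketched as follows; this is an illustrative, loop-based rendering with assumed dimensions and a plain neighbour list, not the GA-RNN implementation used in the experiments:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphAttentionStep(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.W = nn.Linear(dim, dim, bias=False)     # shared projection W
            self.a = nn.Parameter(torch.randn(2 * dim))  # attention vector a
            self.gru = nn.GRUCell(dim, dim)              # recurrent node-state update

        def forward(self, h, neighbours):
            # h: (num_nodes, dim) node states; neighbours: dict {node index: [neighbour indices]}
            Wh = self.W(h)
            new_h = h.clone()
            for i, nbrs in neighbours.items():
                cat = torch.cat([Wh[i].expand(len(nbrs), -1), Wh[nbrs]], dim=1)
                e = F.leaky_relu(cat @ self.a)                   # logits of Eq. (12)
                alpha = torch.softmax(e, dim=0)                  # attention coefficients
                msg = (alpha.unsqueeze(1) * Wh[nbrs]).sum(0)     # aggregated neighbour message
                new_h[i] = self.gru(msg.unsqueeze(0), h[i].unsqueeze(0)).squeeze(0)  # Eq. (11)
            return new_h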
4 Experimental analysis of GridRiskNet model project investment risk management based on big data mining

4.1 Data used in the study

To verify the risk management capability of the GridRiskNet model for power grid enterprise engineering projects, the study uses three core public datasets for experimental validation and designs a fusion scheme for data heterogeneity. First, the structured data adopts the U.S. Energy Information Administration (EIA) power infrastructure dataset (https://www.eia.gov/electricity/data.php). Its API interface screens power grid engineering project data from 2018 to 2023, including budget, construction period, equipment models, and other fields. After extracting the original CSV-format data using Python's eia-python library, adaptive normalization is performed to eliminate dimension differences, and the records are associated with subsequent spatiotemporal data through project IDs and date fields. Second, the spatiotemporal data uses the National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network-Daily (GHCN-Daily) dataset (https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00861). Daily values of temperature, precipitation, and wind speed are downloaded, and stations are matched to the projects' geographic coordinates. The rnoaa toolkit converts them into spatiotemporal tensors, from which meteorological risk features are extracted through ST-GCN encoding. The spatial topology data is obtained from the OpenStreetMap power network dataset (https://wiki.openstreetmap.org/wiki/Power_networks). The OSMnx library extracts GIS data of substations and transmission lines, constructing an adjacency matrix to model the physical connections of the power grid. For unstructured text data, engineering accident reports from 2018 to 2023 corresponding to EIA projects are manually screened from the Federal Energy Regulatory Commission (FERC) engineering accident report library (https://elibrary.ferc.gov/eLibrary/search). After the text is parsed with Apache Tika, it is fed into the fine-tuned BERT to generate semantic vectors.

The following fusion strategies are adopted to address the heterogeneity of multi-source data. 1) Temporal alignment: all data is uniformly converted to Universal Time Coordinated (UTC) timestamps and aggregated at a granularity of 1 day. 2) Spatial alignment: meteorological stations, power grid nodes, and engineering sites are associated through GIS coordinate matching (error <1 km). 3) Consistency of feature encoding: structured data is normalized to [0, 1], text vectors are unified into 768 dimensions via BERT, and spatiotemporal data is compressed into 256-dimensional features through ST-GCN. 4) Cross-modal attention mechanisms automatically learn the weights of each modality, assigning higher attention scores to extreme meteorological text descriptions (such as "hurricane damage"). The specific process of importing data into GridRiskNet is presented in Figure 3.
Figure 3: The specific process of importing data into GridRiskNet
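The temporal-alignment step of the fusion scheme (UTC conversion and aggregation at a one-day granularity) can be sketched with pandas as below; the column names are hypothetical and only stand in for the actual schema:

    import pandas as pd

    def align_daily(df, time_col="timestamp", key="project_id"):
        # Convert every source to UTC and aggregate to one record per project per day.
        df = df.copy()
        df[time_col] = pd.to_datetime(df[time_col], utc=True)   # unify to UTC
        df["day"] = df[time_col].dt.floor("D")                  # 1-day granularity
        return df.groupby([key, "day"]).mean(numeric_only=True).reset_index()

    # Weather, project, and grid-node tables would then be joined on (project_id, day)
    # after GIS coordinate matching assigns each record to its nearest project site.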
4.2 Analysis of the GridRiskNet model's risk management capability for power grid enterprise engineering projects

The study analyzes the risk management capability of the GridRiskNet model for power grid enterprise engineering projects from two aspects. First, at the level of risk probability distribution, the probability of risks such as budget overrun and construction period delay is evaluated based on structured data and spatiotemporal features [23]. Second, at the level of potential impact scope, the propagation path of risks in the power grid topology is analyzed with the GNN to identify high-risk nodes and the surrounding areas they may affect. The model feeds the fused multi-dimensional features into the two-stage modeling framework and outputs a risk assessment matrix containing both types of indices, supporting refined and structured management and decision-making for power grid project risks [24, 25]. The key indices and evaluation criteria of the analysis are given in Tables 1 and 2.
Table 1: Explanation of key indices for analysis using the GridRiskNet model

Analytical dimension | Index | Data source / calculation method | Description
Risk probability distribution analysis | Risk PCA Projection Score | Principal component score of the high-dimensional risk vector output by the GridRiskNet model after PCA dimensionality reduction | Reflects the position of samples in the risk principal component space; used to identify high-risk clusters or structurally abnormal samples
Risk probability distribution analysis | Time-series Anomaly Frequency | Number of abnormal events captured by LightGBM | Monitors the frequency of abnormal progress
Risk probability distribution analysis | Model Confidence Score | Maximum probability value of the Softmax output | Evaluates the credibility of the model output
Risk probability distribution analysis | Risk Coefficient of Variation | Ratio of the SD to the mean of the risk probability distribution | Assesses the dispersion of the risk probability distribution; the greater it is, the higher the risk instability
Risk probability distribution analysis | Risk Importance Index | Comprehensive weighted scores across multiple dimensions | Represents the strength of risk influence
Risk probability distribution analysis | Risk Entropy | Information entropy of the risk probability distribution | Degree of uncertainty in the risk evaluation results
Analysis of the potential influence range | Risk Propagation Path Length | Critical path length identified by GA-RNN | Length and complexity of the risk propagation path
Analysis of the potential influence range | Node Vulnerability Score | Weighted average of the affected probability of each node in the GNN | Reflects the vulnerability of nodes in the power grid
Analysis of the potential influence range | Risk Impact Radius | Based on propagation path depth and the spatial adjacency matrix of the graph structure | Indicates the physical scope of risk propagation
Table 2: Criteria for determining key indices in the GridRiskNet model analysis

| Index | Type | Criteria |
|---|---|---|
| Risk PCA Projection Score | Secondary calculation | [0, 2) Low projection; [2, 5) Medium projection; ≥5 High projection, tending to abnormal samples or extreme types |
| Time-series Anomaly Frequency | Model output | [0, 2) Normal; [2, 5) Early warning; ≥5 Abnormal |
| Model Confidence Score | Model output | [0.9, 1] High credibility; [0.7, 0.9) Medium credibility; <0.7 Low credibility |
| Risk Coefficient of Variation | Secondary calculation | [0, 0.3) Stable; [0.3, 0.6) Fluctuating; ≥0.6 Highly unstable |
| Risk Importance Index | Secondary calculation | [0, 40) Secondary; [40, 70) Important; [70, 100] Critical |
| Risk Entropy | Secondary calculation | [0, 1) Low uncertainty; [1, 2) Medium; ≥2 High |
| Risk Propagation Path Length | Model output | [1, 3) Local; [3, 6) Regional; ≥6 Global |
| Node Vulnerability Score | Model output | [0, 0.4) Low; [0.4, 0.7) Medium; [0.7, 1] High |
| Risk Impact Radius | Secondary calculation | [0, 5) Station level; [5, 20) Line level; ≥20 Regional level |
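As an illustration of how the criteria in Table 2 can be applied, the following small helper functions (hypothetical names, not part of GridRiskNet's released code) map three of the indices to the categories defined above; the example values at the end are taken from results reported later in the paper.

```python
def grade_pca_projection(score):
    # Thresholds from Table 2: [0, 2) low, [2, 5) medium, >=5 high projection
    if score < 2:
        return "Low projection"
    if score < 5:
        return "Medium projection"
    return "High projection (abnormal or extreme)"

def grade_risk_entropy(h):
    # [0, 1) low uncertainty, [1, 2) medium, >=2 high
    if h < 1:
        return "Low uncertainty"
    if h < 2:
        return "Medium uncertainty"
    return "High uncertainty"

def grade_impact_radius(r_km):
    # [0, 5) station level, [5, 20) line level, >=20 regional level
    if r_km < 5:
        return "Station level"
    if r_km < 20:
        return "Line level"
    return "Regional level"

# Example: 2023 projection score, full-model risk entropy, 2023 TIPG impact radius
print(grade_pca_projection(7.779), grade_risk_entropy(0.89), grade_impact_radius(26.4))
```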
The equations of the indices in Table 2 that involve secondary calculation are as follows.

(1) Risk PCA Projection Score

The "Risk PCA Projection Score" measures the position of a sample in the dominant risk structure within the risk feature space, revealing the main variation trends in complex multi-dimensional risk features. Specifically, this index is calculated with the PCA method. First, the annual high-dimensional risk features (such as cost overrun risk and environmental and climate pressure) are standardized. Then, the first K principal component directions are extracted and the sample's projection value in the principal component space is measured through eigenvalue weighting. This score reflects the degree of variance contribution of the sample along the principal component axes of risk, rather than a simple sum of the scores of each risk factor. Because the statistical distributions of the risk features differ from year to year, the index changes with the year; it comprehensively reflects the overall trend of the risk structure of power grid projects in the current year and potential abnormal clustering characteristics. The calculation is expressed as:

$$ s_i = \sum_{k=1}^{K} \lambda_k \left( \mathbf{u}_k^{\top} (\mathbf{x}_i - \boldsymbol{\mu}) \right)^2 \qquad (17) $$

$\mathbf{x}_i$ represents the high-dimensional risk feature vector of the $i$-th sample, $\boldsymbol{\mu}$ is the sample mean vector, $\mathbf{u}_k$ denotes the eigenvector of the $k$-th principal component direction, $\lambda_k$ is the eigenvalue of the $k$-th principal component, and $K$ is the number of selected principal components.

(2) Risk Coefficient of Variation

This index measures the relative dispersion of the risk probability distribution and is an important indicator of risk instability. It describes the overall fluctuation range of the risk probabilities through the ratio of the standard deviation (SD) of the risk probabilities to their mean. A higher value indicates a more dispersed risk probability distribution and stronger overall instability:

$$ CV = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - \bar{p})^2}}{\bar{p} + \epsilon} \qquad (18) $$

$p_i$ denotes the predicted probability of Class $i$ risk, $\bar{p}$ is the mean of the risk probabilities, and $n$ is the total number of risk categories.

(3) Risk Entropy

"Risk Entropy" measures the degree of uncertainty in the risk probability distribution, reflecting the discreteness and unpredictability of the risk results. Based on information entropy theory, this index reveals the potential risk mixture in the system by computing the entropy of the probabilities of all risk categories. A higher risk entropy indicates more uncertainty in the system, which helps to identify complex and unpredictable risk scenarios:

$$ H = -\sum_{i=1}^{n} p_i \log_2 (p_i + \epsilon) \qquad (19) $$

$H$ denotes the information entropy of the risk distribution.

(4) Risk Importance Index

This index quantifies the comprehensive contribution of each risk feature to the overall risk assessment results. It reflects the importance level of each feature through the weighted accumulation of the feature's impact on the model loss, combined with the model weights and normalized averaging. Features with higher values play a greater role in the overall risk decision-making:

$$ RI_j = \frac{1}{T} \sum_{t=1}^{T} \frac{w_j^{(t)} \cdot \Delta L_j^{(t)}}{\sum_{k=1}^{d} \Delta L_k^{(t)}} \qquad (20) $$

$RI_j$ represents the risk importance index of the $j$-th feature, a unified, global-level index of the entire GridRiskNet framework computed from the feature weights and loss impacts during global model training. $T$ is the number of model iterations (or averaging steps); $w_j^{(t)}$ is the model weight of the $j$-th feature in the $t$-th iteration; $\Delta L_j^{(t)}$ is the influence of the $j$-th feature on the loss function; $d$ is the total number of features.

(5) Risk Impact Radius

This index evaluates the spatial propagation range of risks in the power grid graph structure and is the key index for measuring the physical scope affected by risks. It calculates the average impact radius of all risk source nodes in the network from the power grid topology, the geographical distance between nodes, and the risk propagation probability. A larger value indicates a wider spatial propagation range of risk events, which supports regional risk impact analysis:

$$ R = \frac{1}{N_s} \sum_{i=1}^{N_s} \sum_{j=1}^{N} t_{ij} \cdot d_{ij} \cdot p_{ij} \qquad (21) $$

$N_s$ is the number of risk source nodes, $N$ is the total number of nodes in the graph, $t_{ij}$ is the adjacency relation between nodes $i$ and $j$ (1 means connected), $d_{ij}$ is the geographical distance between the nodes, and $p_{ij}$ is the risk propagation probability from node $i$ to node $j$.

Figure 4 presents the pseudocode of the index implementations involving secondary calculation.
```python
import math
import numpy as np

# 1. Risk PCA Projection Score (Eq. 17)
def compute_risk_pca_projection_scores(X, K):
    mu = np.mean(X, axis=0)
    X_centered = X - mu
    cov_matrix = np.cov(X_centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    sorted_idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[sorted_idx][:K]          # top-K eigenvalues (lambda_k)
    eigenvectors = eigenvectors[:, sorted_idx][:, :K]  # top-K eigenvectors (u_k)
    # s_i = sum_k lambda_k * (u_k^T (x_i - mu))^2  (return step reconstructed per Eq. 17)
    projections = X_centered @ eigenvectors            # shape: (n_samples, K)
    return (projections ** 2) @ eigenvalues

# 2. Risk Coefficient of Variation (Eq. 18)
def compute_risk_cv(probabilities):
    mean_p = np.mean(probabilities)
    std_p = np.std(probabilities)
    epsilon = 1e-6
    return std_p / (mean_p + epsilon)

# 3. Risk Entropy (Eq. 19)
def compute_risk_entropy(probabilities):
    epsilon = 1e-6
    return -sum(p * math.log2(p + epsilon) for p in probabilities)

# 4. Risk Importance Index (Eq. 20)
def compute_risk_importance(weights, delta_losses):
    T = len(weights)       # number of iterations
    D = len(weights[0])    # number of features
    importance = [0.0] * D
    for j in range(D):
        for t in range(T):
            total_delta = sum(delta_losses[t])
            if total_delta == 0:
                continue
            importance[j] += weights[t][j] * delta_losses[t][j] / total_delta
        importance[j] /= T
    return importance

# 5. Risk Impact Radius (Eq. 21)
def compute_risk_impact_radius(adj_matrix, distance_matrix,
                               propagation_probs, source_nodes):
    N = len(adj_matrix)
    total_radius = 0.0
    for i in source_nodes:
        for j in range(N):
            if adj_matrix[i][j] == 1:
                total_radius += distance_matrix[i][j] * propagation_probs[i][j]
    return total_radius / len(source_nodes)
```
Figure 4: Pseudocode of index implementation involving secondary calculation
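For reference, the functions in Figure 4 can be exercised on toy inputs as follows; all values below are illustrative only and do not correspond to the experimental data.

```python
import numpy as np

# Toy data: 6 annual samples with 5 risk features (C1-C5)
X = np.random.rand(6, 5)
probs = [0.55, 0.20, 0.15, 0.07, 0.03]           # predicted class probabilities
weights = [[0.2, 0.3, 0.1, 0.25, 0.15]] * 3      # feature weights over 3 iterations
delta_losses = [[0.4, 0.1, 0.2, 0.2, 0.1]] * 3   # per-feature loss impacts

print("PCA projection scores:", np.round(compute_risk_pca_projection_scores(X, K=2), 3))
print("Coefficient of variation:", round(compute_risk_cv(probs), 3))
print("Risk entropy:", round(compute_risk_entropy(probs), 3))
print("Importance index:", np.round(compute_risk_importance(weights, delta_losses), 3))

# A 3-node toy graph with one risk source node (node 0)
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
dist = np.array([[0.0, 3.0, 7.0], [3.0, 0.0, 0.0], [7.0, 0.0, 0.0]])
prop = np.array([[0.0, 0.6, 0.4], [0.6, 0.0, 0.0], [0.4, 0.0, 0.0]])
print("Impact radius:", compute_risk_impact_radius(adj, dist, prop, source_nodes=[0]))
```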
The experimental environment and key parameters are detailed in Table 3.
Table 3: Experimental environment and key parameter settings of the study

| Category | Configuration item | Parameter setting |
|---|---|---|
| Hardware environment | Computing platform | NVIDIA A100 (40 GB memory) × 4 |
| Hardware environment | CPU | AMD EPYC 7763 (64-core) |
| Hardware environment | Memory | 512 GB DDR4 |
| Software environment | Deep learning framework | PyTorch 1.12 + CUDA 11.6 |
| Software environment | GNN library | PyTorch Geometric 2.2.0 |
| Software environment | Traditional ML library | XGBoost 1.6 + LightGBM 3.3.2 |
| Software environment | NLP toolkit | HuggingFace Transformers 4.25 (BERT-base) |
| Model architecture | ST-GCN layers | 3 layers (hidden dimension = 256) |
| Model architecture | GA-RNN unit | Graph attention layer (8 heads) + GRU (hidden size = 512) |
| Model architecture | Cross-modal attention mechanism | Multi-head attention (4 heads, fusion dimension = 1024) |
| Training parameters | Batch size | 256 (structured data) / 32 (graph data) |
| Training parameters | Initial learning rate | 3e-4 (AdamW optimizer) |
| Training parameters | Regularization | L2 weight decay = 1e-5 + Dropout = 0.3 |
| Training parameters | Early stopping | Validation loss does not decrease for 10 consecutive rounds |
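As a rough, non-authoritative sketch of the cross-modal fusion configured in Table 3 (4 attention heads, 1024-dimensional fusion space, 768-dimensional BERT text vectors, 256-dimensional ST-GCN features), one possible PyTorch formulation is shown below; the 64-dimensional structured-feature input and the projection layers are assumptions for illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch: project each modality into a shared 1024-d space, then let a
    4-head attention layer weight the modalities (cf. Table 3)."""
    def __init__(self, text_dim=768, st_dim=256, struct_dim=64, fusion_dim=1024, heads=4):
        super().__init__()
        self.proj_text = nn.Linear(text_dim, fusion_dim)
        self.proj_st = nn.Linear(st_dim, fusion_dim)
        self.proj_struct = nn.Linear(struct_dim, fusion_dim)
        self.attn = nn.MultiheadAttention(fusion_dim, num_heads=heads, batch_first=True)

    def forward(self, text_vec, st_vec, struct_vec):
        # Stack the three modality tokens: (batch, 3, fusion_dim)
        tokens = torch.stack(
            [self.proj_text(text_vec), self.proj_st(st_vec), self.proj_struct(struct_vec)],
            dim=1,
        )
        fused, attn_weights = self.attn(tokens, tokens, tokens)
        # Pool across modalities; attn_weights exposes the per-modality attention scores
        return fused.mean(dim=1), attn_weights

# Example with random inputs
fusion = CrossModalFusion()
out, w = fusion(torch.randn(2, 768), torch.randn(2, 256), torch.randn(2, 64))
print(out.shape, w.shape)  # torch.Size([2, 1024]) torch.Size([2, 3, 3])
```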
Before the formal experiments, the study designs ablation experiments to verify the actual contribution of each core component of GridRiskNet and to quantitatively measure the impact of the different modules on the model's overall performance from a systematic perspective. Specifically, four ablation versions are set up by sequentially disabling the cross-modal attention mechanism, the GA-RNN risk propagation modeling module, the dynamic feature selection module, and the risk propagation graph reconstruction term in the joint loss function. All experiments maintain the same hyperparameter configuration on the complete dataset and focus on three groups of indices: risk classification performance (F1-score and Receiver Operating Characteristic - Area Under Curve (ROC-AUC)), risk propagation accuracy (Risk Impact Radius error), and uncertainty quantification ability (Risk Entropy). The experiment aims to clarify the mechanism of action of each module, especially its specific contribution to power grid risk transmission modeling, modal feature fusion, and risk stability control. The results of the ablation experiments are listed in Table 4.
Table 4: Ablation experimental results of the GridRiskNet model

| Ablation version | F1-Score | ROC-AUC | Risk Impact Radius error (km) | Risk Entropy |
|---|---|---|---|---|
| Full GridRiskNet | 0.892 | 0.962 | 4.8±0.9 | 0.89 |
| No cross-modal attention | 0.835 | 0.917 | 7.5±1.6 | 1.12 |
| No GA-RNN | 0.846 | 0.926 | 14.2±2.3 | 0.96 |
| No dynamic feature selection | 0.863 | 0.941 | 5.7±1.2 | 0.94 |
| No risk propagation graph reconstruction | 0.871 | 0.948 | 4.9±1.0 | 2.08 |
The results of the ablation experiments indicate that each module of GridRiskNet makes a significant contribution to model performance. The cross-modal attention mechanism is particularly crucial for classification performance; after it is disabled, the F1-score decreases by 6.4%, the ROC-AUC drops by 4.7%, and the Risk Entropy rises significantly, showing that this module strongly affects the collaborative perception of complex semantic and meteorological features. The GA-RNN risk propagation modeling module chiefly reduces the Risk Impact Radius error; after it is disabled, the error increases sharply to 14.2 km, verifying its core role in power grid topology modeling. The dynamic feature selection module mainly enhances the temporal sensitivity of the model; its removal leads to a significant drop in F1-score, although it has a limited impact on propagation errors. The risk propagation graph reconstruction term has a significant effect on suppressing prediction fluctuations and optimizing uncertainty quantification; its elimination causes a substantial rise in Risk Entropy. Overall, GridRiskNet achieves the unity of high performance and high robustness through the collaboration of the various modules, with all components being indispensable.

4.3 Analysis results of the GridRiskNet model on the risk management ability of power grid enterprises' engineering projects

4.3.1 Risk probability distribution analysis

GridRiskNet's annual Risk PCA Projection Score results for power grid enterprise engineering projects are summarized in Table 5.
Table 5: Annual Risk PCA Projection Score results

| Year | Cost overrun risk C1 | Ambient climate pressure C2 | Equipment technical risk C3 | Supply chain fluctuation C4 | Policy compliance risk C5 | Risk PCA Projection Score | Risk tendency |
|---|---|---|---|---|---|---|---|
| 2018 | 1.235 | 0.873 | -0.452 | 0.217 | 0.095 | 2.108 | Middle projection (structural abnormality) |
| 2019 | 0.892 | 0.654 | -0.128 | -0.304 | 0.062 | 1.546 | Low projection |
| 2020 | 2.874 | 1.982 | 1.235 | -0.873 | 0.517 | 4.856 | High projection (extreme type) |
| 2021 | 1.023 | 1.457 | 0.782 | 0.396 | -0.215 | 2.48 | Middle projection |
| 2022 | 3.125 | 2.769 | 2.014 | 1.358 | -0.947 | 5.894 | High projection (abnormal clustering) |
| 2023 | 4.562 | 3.217 | 3.058 | 2.146 | 1.372 | 7.779 | High projection (extreme anomaly) |
Table 5 shows that cost overrun risk (C1) and ambient climate pressure (C2) have always been the dominant risks, especially showing exponential growth after 2020. In 2023, C1 (4.562) increased by 269% compared with 2018 (1.235), which is highly consistent with the reality of global inflation and frequent extreme weather. The sudden turn positive (1.372) of policy compliance risk (C5) in 2023 to some extent reveals the surge of compliance costs brought by the deepening of the "double carbon" policy. Through the spatial distribution of principal components, the model reflects high-risk clustering scenarios such as C1-C3 in 2023, providing early warning of composite risks.

Based on the above analysis, the risk probability distribution analysis of grid enterprise engineering projects by GridRiskNet is organized, and the annual average results of the other indices are shown in Figure 5.
[Figure 5 is a multi-axis line chart over 2018-2023 plotting the annual averages of Time-series Anomaly Frequency, Model Confidence Score, Risk Coefficient of Variation, Risk Importance Index, and Risk Entropy.]
Figure 5: The annual average results of other indices in GridRiskNet risk probability distribution analysis
Note: The curves of each index correspond one-to-one with the corresponding color coordinate axes on the right
In Figure 5, regarding the frequency of time-series anomalies, the average annual growth rate of abnormal events during 2019-2023 reached 65.7%. The model objectively reflects the increasing complexity of risks through the continuous decline in confidence (from 0.912 to 0.632). The sudden increase in risk entropy (2.158) in 2020 preceded the peak of the importance index (83.47), indicating that GridRiskNet can capture the implicit correlations of risk factors through information entropy. The synchronous increase in the coefficient of variation (from 0.712 to 0.859) and risk entropy (from 2.547 to 2.981) after 2022 reveals a transformation of the risk distribution from centralized to discretized, which provides key evidence for power grid enterprises to optimize the allocation of risk reserve funds. The core advantage of the model lies in the quantitative modeling of the dynamic coupling relationship among the three dimensions of engineering anomalies, risk uncertainty, and impact degree. Meanwhile, it realizes full-chain risk assessment from "anomaly detection" to "impact prediction".

4.3.2 Analysis of potential influence range

The study divides the U.S. power grid into three major regions: the Eastern Interconnection Power Grid (EIPG), the Western Interconnection Power Grid (WIPG), and the Texas Interconnected Power Grid (TIPG). The EIPG covers the eastern, midwestern, and parts of the southern U.S. states, extending northward to eastern Canada. The WIPG covers most western U.S. states, connecting with western Canada in the north and reaching parts of Mexico in the south. The TIPG includes most of Texas. These regional grids are interconnected at a limited number of DC ties but mostly operate independently. Based on this, GridRiskNet's analysis results on the potential impact scope of power grid enterprise engineering projects are displayed in Figure 6.
[Figure 6 contains six panels, (a)-(f), each plotting Risk Propagation Path Length, Node Vulnerability Score, and Risk Impact Radius (km) for the EIPG, WIPG, and TIPG interconnected areas.]
Figure 6: Analysis of GridRiskNet's potential impact on power grid enterprise engineering projects ((a) 2018; (b)
2019; (c) 2020; (d) 2021; (e) 2022; (f) 2023)
Note: The curves of each index correspond one-to-one with the corresponding color coordinate axes on the right
Based on the index definitions and annual data, GridRiskNet demonstrates scientific rigor and structural insight in the analysis of potential impact ranges. First, for Risk Propagation Path Length, WIPG remains at a high level throughout the entire period, reaching 8.1 in 2023 and significantly exceeding the other regions. This gap is not accidental but reflects long-term structural characteristics, revealing the extensibility of transmission links in the western power grid due to complex terrain and diverse energy structures. Second, the changing trend of the Node Vulnerability Score is more enlightening; the scores of the three major power grids all rose sharply in 2020, with the average value doubling compared with the previous year. This synchronous surge aligns closely with the global external shock events of 2020, indicating that the model is highly sensitive to network vulnerability under systemic disturbances.

In addition, the Risk Impact Radius index essentially measures the physical diffusion capacity of risks from source nodes to the surrounding space; its calculation integrates network topology, geographical distance, and propagation probability. According to the data, WIPG's Risk Impact Radius rapidly increased from 10.8 km in 2021 to 25.7 km in 2022, and further to 35.2 km in 2023, a cumulative increase of over 225% in two years. TIPG also showed continuous expansion between 2022 and 2023, reaching 26.4 km in 2023, reflecting the significant cumulative effect of regional risk diffusion. This spatial diffusion trend is not caused by single-year fluctuations but by the accumulation of continuous transmission chains; in essence, power grid risks expand in scope through multiple rounds of transmission and cross-node amplification, which is especially obvious in scenarios with multiple overlapping risks. GridRiskNet can capture this phenomenon because of the deep coupling of its GNN and propagation probability mechanism, which dynamically tracks the evolution of risk paths and ranges in complex networks and thereby identifies the critical points and amplification effects of risk diffusion. It therefore has real value in regional risk monitoring and trend early warning. The capture of this cumulative diffusion trend reflects the model's structural sensitivity to "spatiotemporal overlapping risks", which far exceeds the descriptive capability of traditional static single indices.

4.3.3 Comparative analysis of GridRiskNet and other models

To comprehensively evaluate the GridRiskNet model's effectiveness in investment risk management of
power grid engineering projects, this study designs two types of comparative experiments. The first is a horizontal comparison with existing state-of-the-art (SOTA) models. It selects representative models in risk assessment, regional propagation modeling, and uncertainty quantification from recent years, including methods such as CNN-LSTM, to ensure fair comparison on a unified dataset with the same task indices. The comparison covers risk classification performance (F1-Score, ROC-AUC), regional propagation accuracy (Risk Impact Radius error), and uncertainty quantification ability (Risk Entropy) to reflect the model's comprehensive capabilities. The second is a detailed comparison with classic baseline models, namely individual methods such as XGBoost, LightGBM, ST-GCN, and BERT-BiLSTM. It focuses on the model's performance in robustness, spatiotemporal feature extraction, and anomaly detection, and highlights the advantages of GridRiskNet in multimodal data fusion, dynamic feature learning, and risk path modeling. The results of the two types of comparisons are exhibited in Tables 6 and 7.
Table 6: Comparison of the performance of GridRiskNet and SOTA models on the same dataset

| Model | Researchers | F1-Score↑ | ROC-AUC↑ | Risk Impact Radius error ±σ (km)↓ | Risk Entropy↓ |
|---|---|---|---|---|---|
| CNN-LSTM | Dong and Li (2025) | 0.724 | 0.892 | 28.3±4.1 | 1.87 |
| Investment framework based on graph attention networks | Mostofi et al. (2025) | 0.781 | 0.903 | 22.6±3.8 | 1.52 |
| Topic model clustering | Qi (2025) | 0.698 | 0.841 | - | 2.03 |
| DNN based on transfer learning | Luo and Zhu (2024) | 0.763 | 0.885 | - | 1.68 |
| GridRiskNet | The proposed model | 0.892 | 0.962 | 4.8±0.9 | 0.89 |
Table 7: Robustness comparison results of GridRiskNet and baseline models

| Model | F1-Score | Risk Impact Radius error ±σ (km) | Recall for delay anomaly detection |
|---|---|---|---|
| XGBoost | 0.712 | 32.5±6.2 | 0.683 |
| LightGBM | 0.735 | 29.8±5.4 | 0.721 |
| ST-GCN | 0.683 | 18.7±3.5 | 0.592 |
| BERT-BiLSTM | 0.698 | - | 0.654 |
| GridRiskNet | 0.892 | 4.8±0.9 | 0.937 |
The comparison experiment with SOTA models reveals that GridRiskNet achieves a considerable lead in risk classification, propagation modeling, and uncertainty quantification. Although the GAT-based investment framework performs well in traditional graph learning tasks, it cannot deeply integrate complex semantic features and meteorological data, leading to an underestimation of risks in some catastrophic events. In contrast, GridRiskNet fully captures the coupling relationship between accident texts and meteorological variables through the cross-modal attention mechanism and dynamic feature fusion, and is significantly superior to the other models in F1-score and ROC-AUC. Meanwhile, its GA-RNN structure can accurately model risk transmission paths on the power grid topology, greatly reducing the Risk Impact Radius error and verifying its ability to fit the physical characteristics of power grids. Regarding uncertainty control, GridRiskNet effectively suppresses prediction fluctuations in high-risk scenarios through the risk propagation graph reconstruction mechanism in the joint loss function, minimizing Risk Entropy and showing stronger stability of the risk distribution.

In the comparison with baseline models, GridRiskNet also demonstrates excellent robustness and overall advantages. Compared with XGBoost and LightGBM, GridRiskNet improves the F1-score well beyond the other models, showing strong adaptability in complex, dynamic data environments. Concerning regional propagation accuracy, the Risk Impact Radius error of GridRiskNet fluctuates very little and is far better than that of ST-GCN, which only considers spatiotemporal features, proving the effectiveness of its fusion of spatial topology and semantic information. Regarding time-series anomaly detection, GridRiskNet combines dynamic feature selection with time-series-aware splitting
strategies, notably improving recall and detecting potential abnormal risks earlier. Overall, GridRiskNet outperforms existing mainstream methods across multi-dimensional tasks with high accuracy and robustness, and it is also better suited to key links of power grid engineering risk management such as risk transmission, modal coupling, and dynamic prediction.

4.3.4 GridRiskNet training cost and efficiency analysis

Tests on computing cost and efficiency are conducted to evaluate the engineering practicality of GridRiskNet. The training efficiency in a complete production environment is tested on an NVIDIA A100×4 cluster, recording: (1) average convergence time in the training phase (in hours (h)); (2) maximum inference delay per sample in the inference phase (in milliseconds (ms)); (3) peak memory consumption (in gigabytes (GB)); and (4) training time per 0.01 F1-Score (in h). Under the condition of meeting the needs of offline batch processing and periodic risk monitoring in power grids, the practical controllability of GridRiskNet is thus measured. The analysis results are presented in Table 8.
Table 8: Training cost and efficiency of GridRiskNet and baseline models on the same dataset

| Model | Convergence time (h) | Maximum inference delay per sample (ms) | Peak memory consumption (GB) | Training time per 0.01 F1-Score (h) |
|---|---|---|---|---|
| XGBoost | 1.2 | 0.09 | 1.5 | 0.17 |
| LightGBM | 1.0 | 0.07 | 1.2 | 0.14 |
| ST-GCN | 8.5 | 0.36 | 5.1 | 1.25 |
| BERT-BiLSTM | 12.3 | 0.45 | 6.4 | 1.77 |
| GridRiskNet | 17.8 | 0.63 | 9.8 | 2.00 |
According to the results in Table 8, although GridRiskNet has a longer absolute training time (17.8 h) and a higher single-sample inference delay (0.63 ms) than the other models, its key index "training time per 0.01 F1-score" is 2.00 h, of the same order as that of BERT-BiLSTM (1.77 h) despite the much stronger final performance. This indicates that its high complexity effectively "exchanges for performance" with obvious non-linear returns. Moreover, the inference delay of 0.63 ms is still far below the acceptable threshold (usually at the second level) in offline power grid risk prediction, making it suitable for daily or even hourly scheduling scenarios. The memory consumption of GridRiskNet matches the typical Graphics Processing Unit configuration of power enterprises (<10 GB), making deployment feasible. Overall, although GridRiskNet has a higher training cost, it offers high performance returns, controllable inference, and affordable resource use, and is therefore feasible for practical engineering applications.

4.4 Discussion

It should be explained that the experimental data of this study are based on U.S. sources (EIA, NOAA, OSM). However, research on the investment risk of power grid engineering projects has a high degree of commonality and structural consistency: the core lies in the complexity of the investment process, construction environment, and risk chain of power grid projects, and is not limited to specific countries. Cost overrun, climate pressure, equipment technical failure, supply chain fluctuation, and policy compliance risks (C1-C5) are five key risks commonly faced by power grid projects worldwide. Among them, "policy compliance risk" is abstracted in the model as an index of institutional environment uncertainty that describes the impact of policy changes on project risks; it essentially gives a structured summary of policy volatility and does not depend on specific legal provisions. At the same time, GridRiskNet focuses on risk propagation mechanisms and multimodal feature fusion, and its methodology is a universal architecture for engineering projects worldwide. Therefore, even with U.S. data, the revealed coupling relationships and propagation mechanisms of multi-source heterogeneous risks have high reference value for Chinese power grid enterprises.

Additionally, the advantages of GridRiskNet over existing SOTA models are reflected not only in the superiority of the indices but also in innovative breakthroughs in methodological mechanisms. First, regarding risk classification, GridRiskNet introduces a cross-modal attention mechanism to deeply explore the coupling relationship between accident texts and meteorological features, effectively making up for the perception defects of traditional single-modal models in complex scenarios; this enables its F1-score and ROC-AUC to be significantly better than those of models such as GAT. Second, in regional propagation modeling, GridRiskNet builds on the GA-RNN structure and embeds a risk propagation graph reconstruction mechanism; it can dynamically identify key transmission paths in the power grid topology and accurately capture the risk diffusion process, thus minimizing the Risk Impact Radius error and demonstrating a high ability to fit the physical structure of the power grid. Third, for uncertainty quantification, the joint loss function of GridRiskNet integrates classification error, graph reconstruction error, and feature stability regularization terms, which helps to control prediction fluctuations in high-risk scenarios and reduces risk entropy to the lowest level. Compared with SOTA models that mainly rely on
traditional graph networks or single deep models, GridRiskNet realizes the collaborative optimization of structured, spatiotemporal, and semantic data. Its core innovation lies in the deep integration of three mechanisms: dynamic feature learning, propagation path modeling, and risk distribution stability. This not only improves model performance but also balances the complexity of risk perception, path interpretability, and prediction stability, giving the approach high practical value and theoretical promotion potential.

5 Conclusion

This study constructs the GridRiskNet risk management system based on big data mining around the intelligent management needs of investment risks in power grid enterprise engineering projects, and realizes the fusion modeling and dynamic evaluation of structured, unstructured, and spatiotemporal data. Through the two-stage modeling architecture, the model performs well in risk probability distribution identification and regional propagation path modeling. The experimental results show that GridRiskNet has strong risk structure identification and regional difference perception abilities across multiple indices. From 2020 to 2023, the Risk PCA Projection Score climbed significantly, revealing the dominant position of cost overrun, climate pressure, and equipment risk in the evolution of engineering risks. At the same time, the model effectively captures the changing trends of risk path length and impact radius in the analysis of the potential impact scope of each power grid region, and it identifies the propagation characteristics of the structural vulnerability of the western power grid and the high impact radius of the Texas power grid, providing quantitative support for regional risk management.

Although GridRiskNet shows strong comprehensive performance in the experiments, there is still room for further optimization. The current model relies on a fixed attention mechanism when fusing different data modalities, which struggles to fully characterize the dynamic coupling relationships between heterogeneous features that vary over time and location. In addition, no physical constraint mechanism is introduced in the risk propagation modeling, so the mapping accuracy with respect to the actual operating state of the power grid still has room for improvement. Follow-up research can introduce reinforcement learning and physical graph embedding methods to improve the model's adaptability to dynamic environmental changes, and can expand the model to broader scenarios such as new energy access and emergency dispatching, supporting the intelligent transformation of investment risk management of power grid enterprises in a pluralistic and complex environment.

References

[1] Varbella A, Gjorgiev B, Sartore F, Zio E, Sansavini G. Goal-oriented graph generation for transmission expansion planning. Engineering Applications of Artificial Intelligence, 2025, 149(4): 110350. https://doi.org/10.1016/j.engappai.2025.110350
[2] Silvester B R. Hesitation at increasing integration: The feasibility of Norway expanding cross-border renewable electricity interconnection to support European decarbonisation. Technological Forecasting and Social Change, 2025, 213(3): 123917. https://doi.org/10.1016/j.techfore.2024.123917
[3] Yu Z, Guo L I, Wen T. Design management of clean energy projects from the perspective of partnering. Journal of Tsinghua University (Science and Technology), 2025, 65(1): 115-124. https://doi.org/10.16511/j.cnki.qhdxxb.2024.22.042
[4] Nyangon J. Climate-proofing critical energy infrastructure: Smart grids, artificial intelligence, and machine learning for power system resilience against extreme weather events. Journal of Infrastructure Systems, 2024, 30(1): 03124001. https://doi.org/10.1061/JITSE4.ISENG-2375
[5] Sun B, Zhang Y, Fan B, Xie P. An optimal sequential investment decision model for generation-side energy storage projects in China considering policy uncertainty. Journal of Energy Storage, 2024, 83(11): 110748. https://doi.org/10.1016/j.est.2024.110748
[6] Sun P, Yuan C, Li X, Di J. Big data analytics, firm risk and corporate policies: Evidence from China. Research in International Business and Finance, 2024, 70(23): 102371. https://doi.org/10.1016/j.ribaf.2024.102371
[7] Hammouri Q, Alfraheed M, Al-Wadi B M. Influence of information technology on project risk management: The mediating role of risk identification. Journal of Project Management, 2025, 10(1): 143-150. https://doi.org/10.5267/j.jpm.2024.10.001
[8] Risanger S, Mays J. Congestion risk, transmission rights, and investment equilibria in electricity markets. The Energy Journal, 2024, 45(1): 173-200. https://doi.org/10.5547/01956574.45.1.sris
[9] Khanna K, Govindarasu M. Resiliency-driven cyber–physical risk assessment and investment planning for power substations. IEEE Transactions on Control Systems Technology, 2024, 7(3): 21. https://doi.org/10.1109/TCST.2024.3378990
[10] Liu H, Li X, Zhang Y. Investment risk assessment based on improved BP neural network. International Journal of Automation and Control, 2024, 18(6): 636-654. https://doi.org/10.1504/IJAAC.2024.142093
[11] Bussmann N, Giudici P, Tanda A, Yu P Y. Explainable machine learning to predict the cost of capital. Frontiers in Artificial Intelligence, 2025, 8(1): 1578190. https://doi.org/10.3389/frai.2025.1578190
[12] Dong S, Li A. The application of deep learning models in investment risk analysis of intelligent manufacturing projects. Intelligent Decision Technologies, 2025, 3(1): 14. https://doi.org/10.1177/18724981251325923
[13] Loseva O V, Munerman I V, Fedotova M A. Assessment and classification models of regional investment projects implemented through concession agreements. Economy of Regions, 2024, 20(1): 276-292. https://doi.org/10.17059/ekon.reg.2024-1-19
[14] Mostofi F, Bahadır Ü, Tokdemir O B, Toğan V, Yepes V. Enhancing strategic investment in construction engineering projects: A novel graph attention network decision-support model. Computers & Industrial Engineering, 2025, 203(2): 111033. https://doi.org/10.1016/j.cie.2025.111033
[15] Qi Y. Multi modal graph search: intelligent massive-scale subgraph discovery for multi-category financial pattern mining. IEEE Access, 2025, 1(1): 331. https://doi.org/10.1109/ACCESS.2025.3553560
[16] Luo S, Zhu X. Regional investment risk evaluation based on compound risk correlation coefficient and migration learning approach. Journal of Computational Methods in Science and Engineering, 2024, 24(1): 327-342. https://doi.org/10.3233/JCM-237045
[17] Gao C, Wang X, Li D, Han C, You W, Zhao Y. A novel hybrid power-grid investment optimization model with collaborative consideration of risk and benefit. Energies, 2023, 16(20): 7215. https://doi.org/10.3390/en16207215
[18] Oikonomou K, Maloney P R, Bhattacharya S, et al. Energy storage planning for enhanced resilience of power systems against wildfires and heatwaves. Journal of Energy Storage, 2025, 119(1): 116074. https://doi.org/10.1016/j.est.2025.116074
[19] Tavakoli M, Chandra R, Tian F, Bravo C. Multi-modal deep learning for credit rating prediction using text and numerical data streams. Applied Soft Computing, 2025, 2(4): 112771. https://doi.org/10.1016/j.asoc.2025.112771
[20] Liu K, Liu M, Tang M, Zhang C, Zhu J. XGBoost-based power grid fault prediction with feature enhancement: application to meteorology. Computers, Materials & Continua, 2025, 82(2): 7. https://doi.org/10.32604/cmc.2024.057074
[21] Zhou X, Li J. Risk assessment of high-voltage power grid under typhoon disaster based on model-driven and data-driven methods. Energies, 2025, 18(4): 809. https://doi.org/10.3390/en18040809
[22] Sari R P, Febriyanto F, Adi A C. Analysis implementation of the ensemble algorithm in predicting customer churn in telco data: A comparative study. Informatica, 2023, 47(7): 22-26. https://doi.org/10.31449/inf.v47i7.4797
[23] Tikhomirova T, Tikhomirov N. Methods for assessing low profitability risks of an investment project in conditions of uncertainty. Revista Gestão & Tecnologia, 2024, 24(2): 244-257. https://doi.org/10.20397/2177-6652/2024.v24i2.2845
[24] Li L. Dynamic cost estimation of reconstruction project based on particle swarm optimization algorithm. Informatica, 2023, 47(2): 16-21. https://doi.org/10.31449/inf.v47i2.4026
[25] Feng J. Multi-attribute perceptual fuzzy information decision-making technology in investment risk assessment of green finance projects. Journal of Intelligent Systems, 2024, 33(1): 20230189. https://doi.org/10.1515/jisys-2023-0189
https://doi.org/10.31449/inf.v49i16.9600 Informatica 49 (2025) 331–350 331
Real-Time Motion Recognition in Special Training Systems Based on
the Optimized BBO-KNN Method of Motion Morphology
Yin Xu
School of Physical Education, Henan Kaifeng College of Science Technology and Communication, Kaifeng 475001,
China
E-mail: xumeili2025@163.com
Keywords: KNN dynamic weight, sports, special training
Received: June 6, 2025
The traditional sports special training system has problems with insufficient accuracy and poor real-time
performance in high similarity action classification, and lacks adaptability to individual action
differences. This article constructs a sports training system based on dynamic weight optimization KNN
(BBO-KNN), aiming to improve the accuracy and real-time performance of complex action recognition,
and provide technical support for personalized training. In response to the problems of insufficient
accuracy (high FP rate), poor real-time performance (delay>1s), and lack of individual adaptability in
high similarity action classification of traditional sports training systems, this study proposes a KNN
model based on dynamic weight optimization (BBO-KNN). The model performance is optimized by fusing
proprietary datasets with public datasets and using 5-fold cross validation (training/testing ratio 7:3).
The experimental results validate that BBO-KNN significantly outperforms benchmark models such as
LSTM (94.50%) and SVM (89.30%) in accuracy (96.20% ± 0.3%) and robustness (noise interference
fluctuation ± 1.2%). For highly similar actions such as running ↔ jumping, the FP rate has decreased to
1.6%, and the global FP rate is 1.39%. The classification error distribution shows its stability advantage,
and the confusion matrix highlights the accurate recognition of highly similar actions (such as
running → jumping). Research has shown that the BBO-KNN model effectively solves
the real-time and robustness problems of motion recognition through dynamic weight optimization. In the
future, it can be extended to complex movements such as gymnastics by combining visual data and
adapting to individual style differences through incremental learning.
Povzetek: Članek predstavi sistem za športno vadbo, ki uporablja dinamično uteženi BBO-KNN za boljše
prepoznavanje gibov.
1 Introduction

Sports special training is undergoing a profound change from traditional experience-oriented to data-driven. This transformation process presents multi-dimensional technical characteristics and systematic development bottlenecks. From a macro perspective, the digital penetration of modern sports training systems has reached a considerable scale. According to authoritative data from the General Administration of Sport of China in 2024, more than three-quarters of professional sports teams have deployed various wearable devices for training data collection. This proportion has nearly doubled compared with five years ago, indicating that a fundamental paradigm shift is taking place in sports training methodology.

There is a sharp intergenerational gap between the rapid popularization of hardware and the intelligence level of software systems. The widespread deployment of data acquisition equipment has not simultaneously brought a significant improvement in training efficiency, but has exposed structural defects in data processing capabilities. The specific deficiency is the core contradiction of insufficient data utilization. Currently, only 42% of sports teams have established a complete analysis system, which means that more than half of the training data is dormant and cannot be converted into an effective basis for training decisions. This deficiency in data value mining stems from multiple technical obstacles, including but not limited to imperfect feature engineering, inefficient data cleaning processes, and insufficient adaptability of analysis models. What is more prominent is the static nature of evaluation indicators: up to 91% of training systems still adopt a fixed-weight scoring mechanism [1]. This rigid evaluation system cannot adapt to the dynamic changes of athletes' physiological parameters, resulting in systematic deviation between training programs and actual needs. In addition, the feedback delay problem further amplifies this mismatch. A decision lag of 2.3 training cycles on average means that training adjustments always lag behind the actual state changes of athletes, resulting in lost training effect [2].
By deeply analyzing the technical essence behind these phenomena, we can find that the fundamental reason for the homogeneity of training programs lies in the uniformity of feature extraction dimensions and the lack of personalized modeling, which reflects the fundamental contradiction between the traditional batch computing model and real-time decision-making needs [3]. Therefore, solving these systemic defects requires the introduction of innovative algorithm architectures and technical paradigms. Two key optimization spaces remain in current technology. The first is the balance between computing resource consumption and real-time requirements, especially the control of computational complexity when processing high-dimensional features. The second is the model's generalization ability in small-sample scenarios and its adaptive performance when facing new athletes or rare training situations [4].

The core innovation of KNN dynamic weight optimization technology lies in building a four-dimensional optimization space. In the time dimension, it realizes minute-level weight updates and compresses the data processing delay to 1/60 of the traditional method through a sliding time window mechanism and an incremental learning algorithm. In the feature dimension, it completes multimodal data fusion, integrating multi-source information such as biomechanics, physiology and biochemistry, and environmental parameters [5, 6]. In the individual dimension, it establishes an athlete-specific model and achieves efficient matching of similar samples through dynamic neighborhood search. In the environmental dimension, it integrates venue and equipment parameters to build a complete training situation perception system. This multi-dimensional optimization architecture enables the system to process nonlinear and non-stationary training data, effectively solving the response hysteresis problem of traditional systems [7].

The traditional sports training classification system suffers from insufficient accuracy and poor real-time performance in high-similarity action classification, and lacks adaptability to individual motion variations. Therefore, this paper constructs a sports training system based on Biogeography-Based Optimization KNN (BBO-KNN), aiming to improve the accuracy and real-time performance of complex action recognition and provide technical support for personalized training.

This study investigates the performance limits of the BBO-KNN (BBO-optimized KNN) algorithm for high-similarity action recognition by addressing a specific research question. The specific hypothesis is whether BBO-KNN can reduce the false positive (FP) rate to below 2% while maintaining a stable end-to-end processing latency below 20 ms and a classification accuracy above 95%. This goal is directly aimed at the core defects of traditional systems (such as LSTM and SVM) in classifying highly similar actions (such as running and jumping), with FP rate > 4.2% and delay > 200 ms. To achieve this, the system uses the BBO algorithm to optimize the feature weight vector to enhance local feature sensitivity, combines K-Means clustering to compress the dataset size, and designs a lightweight edge architecture for real-time processing.

The implementation of minute-level weight updates through sliding time windows and incremental learning relies on a triple mechanism:

(1) The 200 ms sensor window slides in 10 ms steps to ensure real-time feature extraction;
(2) Incremental learning only updates the cluster centers (not the feature weights), and adjusts secondary cluster points every 5 days with new data (as mentioned in the conclusion);
(3) The feature weight WK3 remains static, and its "dynamic" effect comes from the weight distribution optimized by BBO, while window sliding allows the model to continuously capture temporal features.

2 Related work

2.1 Research status of sports special training systems

Rodriguez et al. [8] developed a multi-sensor fusion wearable system. It integrates IMU, sEMG, and heart rate monitoring modules, increasing the data collection dimension to 23 physiological indicators, but suffers from a 15% sensor signal interference problem. The 4D optical capture solution proposed by Cizmic et al. [9] improves motion analysis accuracy to 0.3 mm, but the system construction cost is as high as 2 million yuan, making it difficult to popularize. At present, non-contact monitoring technology based on millimeter-wave radar can capture micro-motions within a range of 5 m, but the sampling rate is limited to 120 Hz.

The BP neural network evaluation model constructed by Balkhi et al. [10] improves the accuracy of technical action scoring to 89% in sports events, but requires more than 800 hours of labeled training data. Calderón-Díaz et al. [11] introduced transfer learning, which enables personalized modeling of new athletes with only 200 samples, but the cross-event transfer error still reaches 28%. It is worth noting that the digital twin evaluation system developed by Iduh et al. [12] controls the sports action prediction error within 1.2° through real-time physical simulation, but it requires the support of a supercomputing center.

From the above research on sports special training systems, current systems generally face three major challenges: (1) the asynchrony of multi-source data leads to 27% information loss; (2) the lack of model interpretability leads to a trust crisis among coaches; (3) the contradiction between hardware portability and accuracy is prominent. It is particularly noteworthy that 82% of commercial systems still use
static evaluation algorithms, which cannot adapt to the dynamic changes of athletes' status.

2.2 Application of optimization algorithms in training systems

Chen et al. [13] introduced a genetic algorithm into sports special training cycle planning and improved the matching degree of training schemes by 31% through an adaptive crossover-mutation strategy, but the iterative convergence speed is slow (an average of 14 hours). The improved particle swarm optimization algorithm of Taborri et al. [14] optimizes the load distribution of strength training in sports events, increasing athletes' maximum strength growth rate by 22%, but the algorithm parameters are highly sensitive and need repeated tuning.

The LSTM-ATT hybrid model developed by Hanif et al. [15] achieves 92% accuracy in the evaluation of sports-specific actions, but the model needs 150,000 labeled samples for training. AshokKumar et al. [16] applied reinforcement learning to optimize sports-specific strategies, which increased the athletes' scoring rate by 29%, but the training costs are high (200 hours of simulated adversarial data are required). Meta-learning can shorten the model adaptation cycle for new athletes from 14 days to 5 days, but it has a huge demand for computing resources, requiring 4 A100 graphics cards.

The Pareto frontier algorithm proposed by Kumar et al. [17] balances technical improvement and injury risk in sports training and optimizes the training benefit-risk ratio by 37%, but the complexity of the algorithm reduces real-time performance, with delays of up to 11 minutes. The NSGA-III algorithm developed by Molavian et al. [18] realizes multi-objective optimization of sports specialties and improves competition performance by 0.8%, but it requires accurate biomechanical modeling. Malamatinos et al. [19] applied fuzzy logic to optimize sports posture, and movement completion increased by 19%, but the construction of the rule base relies on a large amount of expert knowledge.

From the above research, current work mainly faces three bottlenecks: (1) the contradiction between the real-time performance and accuracy of the algorithms, with the best systems still having a delay of 5-8 minutes; (2) insufficient model interpretability, with 68% of AI decisions unable to provide reasonable explanations; (3) weak cross-project transfer capability, with an average error of 37%. In particular, 82% of commercial training systems (2024 market research) still adopt static optimization strategies, which are difficult to adapt to the dynamic changes of athletes' status.

2.3 Research status of KNN optimization algorithms in sports training

The weighted dynamic KNN model proposed by Merzah et al. [20] improves the accuracy to 94% in sports action recognition, but the real-time calculation delay is still 1.2 seconds. The quantized distance calculation method developed by Bunker et al. [21] can increase the speed of athletes' posture analysis by 100 times, but it needs the support of special quantum computing equipment. The improved KNN scheme combined with SHAP value interpretation proposed by Teixeira et al. [22] reduces the error of sports action evaluation from 3.2° to 0.8°, but the complexity of the feature engineering increases by a factor of three.

The sliding-window incremental learning system applied by Woltmann et al. [23] shortens the update cycle of the training model to 15 minutes, but the memory footprint is still as high as 32 GB. The multimodal distance measurement method proposed by Sonalcan et al. [24] combines electromyographic and mechanical characteristics in sports events, so that the prediction error of the action angle is <0.5°, but 17 sensor streams need to be synchronized.
Table 1 below summarizes the relevant work:
Table 1: Summary of related work

| Technical direction | Method and key results | Limitations and technical bottlenecks |
|---|---|---|
| Multi-sensor fusion system | Method: multi-sensor wearable system (IMU/sEMG/heart rate). Results: 23 physiological indicators collected; data dimension increased by 300% | Limitations: 15% signal interference. Bottleneck: asynchronous multi-source data leads to 27% information loss; contradiction between portability and accuracy (a 0.3 mm precision system costs 2 million yuan) |
| 4D optical capture scheme | Method: high-precision optical marker tracking. Result: motion capture accuracy reaches 0.3 mm | Limitations: supercomputing-center dependency (2-million-yuan cost). Bottleneck: high hardware deployment costs, difficult to popularize |
| BP neural network model | Method: multi-layer backpropagation network. Result: technical action scoring accuracy of 89% | Limitations: 800 hours of annotated training data required. Bottleneck: high data dependency, long model update cycle (>2 weeks) |
| Transfer learning program | Method: cross-athlete feature transfer. Result: new-athlete modeling requires only 200 samples | Limitations: cross-project migration error of 28%. Bottleneck: weak domain adaptability, insufficient generalization ability |
| Dynamic KNN algorithm | Method: weighted neighbor classification. Result: action recognition accuracy of 94% | Limitations: real-time latency of 1.2 s. Bottleneck: low computational efficiency, unable to meet the real-time requirement of <100 ms |
| Quantum distance calculation | Method: quantized feature similarity measurement. Result: posture analysis speed increased by 100 times | Limitations: requires specialized quantum devices. Bottleneck: strong hardware dependency and extremely high commercialization costs |
| Multimodal KNN optimization | Method: fusion of electromyographic and mechanical features. Result: prediction error of action angle <0.5° | Limitations: 17 sensors need to be synchronized. Bottleneck: high system integration complexity, difficult engineering implementation |
From the above research, the current research on the application of KNN optimization algorithms in sports training faces three core challenges: (1) there is a contradiction between real-time requirements and computational accuracy, and even the best systems still have a delay of 8-15 seconds; (2) the asynchrony of multi-source data leads to a loss of 27% of feature information; (3) the cost of personalized adaptation is too high, and it takes 14-20 days to build a single-athlete model. In particular, 83% of existing systems (2025 market research) still adopt a static K-value strategy, which is difficult to adapt to the dynamic changes of training intensity.

3 Sports special training system based on KNN dynamic weight optimization

This paper improves the KNN classifier, which performs well in feature engineering processing, and proposes a KNN classification algorithm based on the K-means clustering algorithm. It combines the univariate feature selection method and the BGWOPSO algorithm to search for the optimal feature set, and selects the BBO algorithm as the weight optimization module of the subsequent human motion intention recognition model, so as to obtain a human motion intention recognition model that uses fewer features to identify multiple motion patterns with higher classification accuracy.

3.1 Design of the improved nearest neighbor classification algorithm

The KNN algorithm generally uses the majority voting method. It assumes that there are N labeled samples T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i represents a sample with n-dimensional features, x_i ∈ R^n, i = 1, 2, ..., N, and y_i ∈ {c_1, c_2, ..., c_l} is the label of x_i. The label y of the sample to be tested is obtained by the classification rule shown in the following formula [25]:

y = arg max_{c_j} Σ_{x_i ∈ N_k(x)} H(y_i, c_j),  i = 1, 2, ..., N,  j = 1, 2, ..., l   (1)

H(y_i, c_j) = { 0, y_i ≠ c_j;  1, y_i = c_j }   (2)

Among them, N_k(x) = {x_i | x_i is one of the K nearest neighbor samples of x}; when y_i = c_j, H(y_i, c_j) = 1, and otherwise it is 0.
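As a concrete reading of Eqs. (1)-(2), the following minimal sketch (assuming NumPy; the array names X, y and x are illustrative, not taken from the paper's codebase) classifies a test sample by majority vote over its K nearest labeled samples:

import numpy as np

def knn_majority_vote(X: np.ndarray, y: np.ndarray, x: np.ndarray, k: int = 4):
    """Plain KNN: Eq. (1) majority vote over the K nearest neighbors N_k(x).

    X: (N, n) labeled samples, y: (N,) labels, x: (n,) test sample.
    """
    dists = np.linalg.norm(X - x, axis=1)            # Euclidean distances to all samples
    nearest = np.argsort(dists)[:k]                  # indices of the K nearest neighbors
    labels, votes = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(votes)]                  # arg max over the sum of H(y_i, c_j)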
3.1.1 Comparison of feature normalization methods

When a sample includes multiple eigenvalues, features with larger magnitudes will weaken features with smaller magnitudes and affect the accuracy of the KNN classifier. Therefore, the data needs to be normalized; the commonly used normalization methods are maximum (extreme value) normalization and mean-variance normalization.
The extreme value normalization method uses the maximum and minimum values in the variable value range to scale the original data proportionally into the [0, 1] range to eliminate the impact of the dimension. Since the extreme value normalization method only depends on the two extreme values, the scaling of each variable is overly dependent on the maximum and minimum. The conversion function of the extreme value normalization is as follows [26]:

x_scale1 = (x − x_min) / (x_max − x_min)   (3)

The mean-variance normalization method uses the mean and standard deviation of the original data to standardize the data. Although all data information is used in the dimensionless process, the importance of each variable is not treated equally, and variables with large differences receive a relatively large analysis weight. The conversion function is:

x_scale2 = (x − μ) / σ   (4)

where μ and σ are the mean and standard deviation of the original data.
The maximum normalization and mean-variance normalization methods are used to normalize the post-FC mixed data set respectively, the mean and standard deviation are extracted as eigenvalues and input into the KNN classifier, and the classification accuracy of the two is compared. When the nearest-neighbor K value is taken from 1 to 15, the 5-fold cross-validation accuracy of the KNN classifier after normalization by the maximum normalization method and the mean-variance normalization method is compared, and the results are shown in Figure 1(a).
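For reference, the two normalization functions of Eqs. (3)-(4) can be written as follows (a minimal sketch assuming NumPy and column-wise operation over a feature matrix; the function names are illustrative):

import numpy as np

def minmax_normalize(X: np.ndarray) -> np.ndarray:
    """Extreme-value (max-min) normalization, Eq. (3): scales each feature into [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def zscore_normalize(X: np.ndarray) -> np.ndarray:
    """Mean-variance normalization, Eq. (4): zero mean and unit standard deviation per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)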
Figure 1: Comparison of classifier accuracy on the post-FC mixed dataset (action fragment sampling rate 100 Hz). (a) Classifier accuracy when using the maximum normalization and mean-variance normalization methods; (b) classifier accuracy using different distance measurement formulas.
It can be seen from Figure 1(a) that after the two normalization methods are applied to the post-FC mixed data set, the accuracy of the KNN classifier is not much different and shows no obvious pattern; the accuracy mainly varies with the nearest-neighbor K value. When K = 3 and K = 4, the data processed by the mean-variance normalization method has higher classification accuracy, so in subsequent experiments the mean-variance normalization method is used to normalize the data.

3.1.2 Comparison of distance measurement formulas

Commonly used distance measures in the KNN algorithm include the Manhattan distance, Euclidean distance, Chebyshev distance, Minkowski distance and Mahalanobis distance. The formulas are [27]:

L1(x_i, x_j) = Σ_{l=1}^{n} |x_i^(l) − x_j^(l)|   (5)

L2(x_i, x_j) = (Σ_{l=1}^{n} (x_i^(l) − x_j^(l))²)^(1/2)   (6)

L3(x_i, x_j) = max_l |x_i^(l) − x_j^(l)|   (7)

L4(x_i, x_j) = (Σ_{l=1}^{n} |x_i^(l) − x_j^(l)|^p)^(1/p)   (8)

L5(x_i, x_j) = ((x_i − x_j)^T S^(−1) (x_i − x_j))^(1/2)   (9)

Among them, the feature space is an n-dimensional real vector space R^n, x_i, x_j ∈ R^n, and S is the covariance matrix of the multidimensional random variables. When the data of each dimension are independent and identically distributed, the Mahalanobis distance reduces to the Euclidean distance.
The post-FC mixed data set is normalized using the mean-variance normalization method, and the mean and standard deviation are extracted as feature values and input into the KNN classifier. The dataset is the post-FC hybrid dataset (feature dimensions: mean and standard deviation), the K value ranges from 1 to 15 (full-range validation), and the validation method is 5-fold cross-validation (accuracy computed independently for each fold). The results are shown in Table 2.
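As a concrete illustration of the comparison behind Table 2, the following sketch (assuming scikit-learn and NumPy; the feature matrix X and label vector y are placeholders, not the paper's actual dataset, and the Mahalanobis metric is omitted because it would additionally require the inverse covariance via metric_params) sweeps the K value from 1 to 15 for several distance metrics with 5-fold cross-validation:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def compare_metrics(X, y, k_range=range(1, 16)):
    """5-fold CV accuracy of KNN for several distance metrics over K = 1..15."""
    X = StandardScaler().fit_transform(X)        # mean-variance normalization, Eq. (4)
    metrics = {
        "manhattan": {},                         # Eq. (5)
        "euclidean": {},                         # Eq. (6)
        "chebyshev": {},                         # Eq. (7)
        "minkowski": {"p": 3},                   # Eq. (8), p = 3 chosen only as an example
    }
    results = {}
    for name, extra in metrics.items():
        accs = []
        for k in k_range:
            clf = KNeighborsClassifier(n_neighbors=k, metric=name, **extra)
            accs.append(cross_val_score(clf, X, y, cv=5).mean())
        results[name] = accs
    return results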
Table 2: Comparison of classification accuracy (%) of the KNN classifier using different distance metrics

Distance formula | K=1 | K=2 | K=3 | K=4 | K=5 | K=6 | K=7 | K=8
Manhattan distance | 92.57 | 92.03 | 91.66 | 91.58 | 91.12 | 91.49 | 91.27 | 92.12
Euclidean distance | 92.12 | 92.67 | 92.40 | 92.67 | 91.39 | 91.75 | 90.48 | 91.02
Chebyshev distance | 90.48 | 90.59 | 90.20 | 89.89 | 89.01 | 89.83 | 88.37 | 88.09
Minkowski distance | 92.03 | 92.57 | 91.58 | 91.48 | 91.26 | 91.57 | 90.13 | 90.60
Mahalanobis distance | 90.14 | 89.60 | 90.87 | 89.87 | 89.31 | 89.78 | 89.90 | 90.24

Distance formula | K=9 | K=10 | K=11 | K=12 | K=13 | K=14 | K=15
Manhattan distance | 90.89 | 90.93 | 91.58 | 90.93 | 91.30 | 90.00 | 90.59
Euclidean distance | 91.49 | 90.20 | 90.24 | 90.29 | 89.92 | 90.10 | 89.01
Chebyshev distance | 87.55 | 87.45 | 87.55 | 87.27 | 87.19 | 87.36 | 86.35
Minkowski distance | 90.57 | 90.57 | 90.02 | 89.38 | 90.57 | 89.47 | 88.82
Mahalanobis distance | 86.45 | 89.25 | 89.09 | 86.81 | 88.23 | 86.25 | 86.08
It can be seen from Figure 1(b) and Table 2 that using the Euclidean distance and the Manhattan distance allows the algorithm to obtain high accuracy, but the Manhattan distance brings a serious computational burden and the prediction time is too long. Considering both accuracy and operation time, the Euclidean distance is selected as the measurement.
Figure 1(b) and Table 2 show that the Euclidean distance reaches an accuracy of 92.67% at K = 4, which is better than the Chebyshev distance (89.89%), and it also has better hardware adaptability. At the hardware level, the native multiply-accumulate (MAC) instruction of the ARM Cortex-M7 FPU supports the sum-of-squares operation, allowing a 24-dimensional feature computation to be performed in only 4.2 µs, 38% faster than the Manhattan distance, and avoiding the branch penalty of the absolute-value operations required by the Manhattan distance (the accuracy curve in Figure 1(b) confirms this choice).

3.1.3 Selection of the nearest neighbor value

How to choose an appropriate nearest-neighbor K value is also critical to improving the accuracy of the KNN classifier. The smaller the K value, the easier it is for the model to overfit: when K = 1, the prediction is based only on the single point nearest to the target point, and if this point is a noise point an error will occur. When the K value is larger, points farther away from the target also participate in the prediction, resulting in underfitting; when K equals the total number of sample points, the prediction is simply the majority label of all samples and the classification model is completely invalid. Common methods for selecting the nearest-neighbor K value include empirical judgment and determination using optimization algorithms.
As shown in Figure 1(a) and (b), when the nearest-neighbor K value ranges from 1 to 15, the classifier achieves relatively high accuracy at K = 2 and K = 4; as the K value increases further, the prediction accuracy gradually decreases. When K = 2, the K value is small and the probability of overfitting is greater, so the nearest-neighbor K value is set to K = 4.

3.1.4 Dataset size reduction based on the K-means clustering algorithm

Real-time implementation of a KNN classifier on intelligent dynamic knee prostheses is difficult. To solve this problem, a combination of the KNN algorithm and the K-Means clustering algorithm is proposed. To ensure the accuracy of the experiments, several trials are performed to determine the cluster centers.
In the post-FC hybrid dataset, each motion state contains 120 sets of motion data, and even after feature extraction the data storage requirement is still large. The K-Means clustering algorithm can significantly reduce the size of the data set and remove most of the similar sample points.
To reduce the computational complexity of KNN, hierarchical K-Means clustering is employed to compress each class of action data independently. K-Means clustering is first performed on the samples of each action class, and the corresponding primary cluster centers are generated; this set of primary cluster centers is represented as KI = {KI_1, KI_2, ..., KI_l}, where l is the number of classes. Within the same action class, secondary K-Means clustering is then performed to obtain M secondary cluster points (M ≪ 120), resulting in a set of secondary cluster points represented as KS = {KS_1^1, KS_2^1, ..., KS_M^1, ..., KS_M^l}. These two sets are saved as new datasets.
With KI completely replacing the original data, a compressed table collection (without index tags) is formed. KNN operates directly on the compressed set, eliminating the need to trace back to the original data. KI and KS constitute completely independent compressed table collections, which are used directly as the operational objects of KNN during inference. This "hierarchical representation + geometric constraints" architecture not only retains key motion features but also completely avoids the computational burden of the original data, and it also supports the reduction of computational burden in subsequent experiments; a small arithmetic sketch of the resulting compression is given below.
3.1.5 Improvement of classification decision rules based on the triangle inequality

Let the test sample point be x, the class primary center point be c, and a secondary center point be s. These points satisfy the basic property of a metric space, |d(x, c) − d(c, s)| ≤ d(x, s) ≤ d(x, c) + d(c, s), which defines a spherical region centered at x whose radius covers c and the symmetric points.
The basic principle of the triangle inequality is that the sum of any two sides of a triangle is greater than the third side, which can be related to the distance relationships among three sample points, as shown in Figure 2 (in the figure, the unlabeled sample point T is drawn as a green circle).

Figure 2: Schematic diagram of the triangle-inequality method
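To make the pruning rule concrete, the following minimal check (illustrative only; d_xc, d_cs and best_so_far are hypothetical variable names, not from the paper) shows how the lower bound from the triangle inequality lets a secondary point be skipped without computing its exact distance:

def may_be_closer(d_xc: float, d_cs: float, best_so_far: float) -> bool:
    """Return True only if the secondary point s could still beat the current best.

    d_xc: distance from the test sample x to the class primary center c
    d_cs: distance (computable offline) from c to the secondary point s
    best_so_far: smallest candidate distance found so far
    The triangle inequality guarantees d(x, s) >= |d_xc - d_cs|, so if that lower
    bound already exceeds best_so_far, d(x, s) never needs to be evaluated.
    """
    return abs(d_xc - d_cs) < best_so_far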
The steps to improve the KNN algorithm are as follows:
Step 1: The K-Means algorithm is used as a preprocessing step to reduce the size of the dataset: one initial (primary) center is clustered in each class, and M secondary cluster points are clustered in each class.
Step 2: The initial centers of the first K classes with the smallest distances to the unlabeled sample are selected.
Step 3: Among the selected top K classes, the distance from each selected secondary cluster point in the class to the unlabeled sample is calculated, the K smallest distance values are selected, and their mean is calculated. The class label with the smallest distance mean is assigned to the unlabeled sample.
The triangle inequality accelerates computation, narrows the search space, and reduces the omission rate of key neighbors through threshold conditions and geometric constraints (Figure 2). This achieves coupling between the spherical filter domain and the cluster distribution. This mathematical framework provides a theoretical basis for the high accuracy and low latency of the improved KNN in sports action recognition.
The algorithm pseudocode is as follows (KMeans and euclidean denote a K-means clusterer and a Euclidean distance helper, e.g. from scikit-learn and SciPy):

# Training stage: K-Means clustering compression
def train_KMeans_compress(DataSet, Kc_main=1, Kc_sub=15):
    compressed_set = {}
    for class_label in unique_labels:          # traverse each action category
        class_data = DataSet[class_label]      # retrieve all samples of the current class

        # Primary clustering center (captures the core features of the class)
        main_centers = KMeans(n_clusters=Kc_main).fit(class_data).cluster_centers_

        # Secondary clustering points (cover intra-class variation)
        sub_centers = KMeans(n_clusters=Kc_sub).fit(class_data).cluster_centers_

        compressed_set[class_label] = {
            'KI': main_centers,   # class primary center set
            'KS': sub_centers     # secondary point set
        }
    return compressed_set

# Prediction stage: improved KNN inference
def enhanced_KNN_predict(sample, compressed_set, K=4):
    # Step 1: calculate the distance to each class's primary centers
    main_distances = []
    for label, centers in compressed_set.items():
        dist = min([euclidean(sample, center) for center in centers['KI']])
        main_distances.append((label, dist))

    # Step 2: select the top K nearest classes
    top_classes = sorted(main_distances, key=lambda x: x[1])[:K]

    # Step 3: triangle-inequality screening over the secondary points
    min_avg_dist = float('inf')
    predicted_label = None
    for class_label, _ in top_classes:
        # get all secondary points of this class
        sub_points = compressed_set[class_label]['KS']

        # triangle-inequality filtering (only evaluate points that may be closer)
        candidate_points = []
        for point in sub_points:
            if euclidean(point, sample) < min_avg_dist:
                candidate_points.append(point)

        # per Step 3: take the K smallest candidate distances, average them,
        # and keep the class with the smallest mean distance
        dists = sorted(euclidean(point, sample) for point in candidate_points)[:K]
        if dists:
            avg_dist = sum(dists) / len(dists)
            if avg_dist < min_avg_dist:
                min_avg_dist = avg_dist
                predicted_label = class_label
    return predicted_label

3.2 Construction of the human motion intention recognition model

This paper proposes a human motion intention recognition optimization system, as shown in Figure 3. When the subject wears an intelligent powered knee prosthesis, the 6-axis IMU sensor, uniaxial pressure sensor and knee encoder acquire raw data at a sampling frequency of 100 Hz. When the foot touches the ground, the 8-channel sensor data of the knee prosthesis is collected within 200 ms, and the BGWOPSO algorithm is used for feature selection. By comparing the optimization of feature weights using three weight optimization methods, including the BBO algorithm, the classification accuracy of the KNN classifier is improved, and the weight optimization method used in this system is determined.
The feature weights (WK3) optimized by the BBO algorithm remain static during the inference phase; their function is to enhance sensitivity to key motion attributes through pre-set feature importance. The dynamics of the system are mainly reflected in two aspects: neighbor dynamic screening, i.e. real-time selection of relevant samples based on the triangle inequality (Figure 2), and incremental model updating, i.e. adjusting cluster centers to adapt to individual differences as new data arrive. Metaheuristics can reduce the computational burden; a minimal sketch of the weighted distance implied by WK3 is given below.
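For illustration, a weighted Euclidean distance of the kind implied by a static feature-weight vector such as WK3 could be written as follows (a minimal sketch assuming NumPy; the weight values and the name wk3 are placeholders, not taken from the paper):

import numpy as np

def weighted_euclidean(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    """Euclidean distance with per-feature weights (e.g. a BBO-optimized vector)."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

# Hypothetical 7-dimensional weight vector (values are illustrative only).
wk3 = np.array([0.21, 0.15, 0.12, 0.18, 0.10, 0.14, 0.10])
a, b = np.random.rand(7), np.random.rand(7)
print(weighted_euclidean(a, b, wk3))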
Table 4: Comparison with existing models (excerpt)

- Real-time latency: the comparison models require >200 ms / >150 ms / >100 ms (latency measurement).
- Noise robustness under 15% signal loss: BBO-KNN fluctuates by only ±1.2%, versus ±2.8% / ±4.1% / ±3.5% for the comparison models, p = 0.003* (Monte Carlo simulation, 1000 runs).
- Training data requirements: 24-dimensional features plus incremental learning, versus 150,000 annotated data / 80,000 samples / feature-engineering dependency (learning-curve analysis).
- Model update cycle: 5 days (new-athlete adaptation), versus 14 days / online updates not supported / 10 days, p < 0.001* (time-cost tracking).
The quantitative delay comparison with the SOTA models is shown in Table 5 below:

Table 5: Quantitative delay comparison results
Model | Delay (end-to-end) | Hardware dependency | Input sensitivity
BBO-KNN | <20 ms | Universal sensors (low-cost) | Low (feature-dimension compression buffers input fluctuations)
LSTM | >200 ms | GPU accelerator | High (complete sequence required for temporal modeling)
SVM | >150 ms | CPU cluster | Medium (kernel function calculation burden)
KNN | 1.2 seconds | No special requirements | High (uncompressed sample size)
The results of parameter sensitivity verification are shown in Table 6 below:
Table 6: Parameter sensitivity verification
Parameter perturbation | Accuracy fluctuation | FP rate fluctuation | Convergence behavior | Key conclusions
Population size −30% (150 → 105) | −0.70% | +0.90% | Converges 15 generations early | Insufficient population leads to local optima (WK3 weight imbalance)
Population size +30% (150 → 195) | +0.20% | −0.10% | Convergence delayed by 8 generations | Gain does not offset the computational cost (delay ↑ 23%)
Iterations −20% (50 → 40) | −0.40% | +0.60% | – | Convergence saturation point not reached (K = 4 curve in Figure 1(a))
Iterations +20% (50 → 60) | +0.10% | −0.05% | – | Diminishing marginal benefit (resource waste ↑ 35%)
By quantifying the contribution of each module using the variable-control method, the results of the ablation experiment are obtained, as shown in Table 7 below:

Table 7: Results of the ablation test
Ablation component | Accuracy variation | FP rate change | Key function
Complete BBO-KNN | 96.20% | 1.60% | –
Remove BBO weight optimization | 94.17% (↓2.03%) | 1.90% | Decreased feature sensitivity
Remove context fusion | 92.50% (↓3.70%) | 3.20% | Increased confusion in highly similar actions
Remove feature selection | 90.10% (↓6.10%) | 5.80% | Noise features interfere with decision-making
Only single-center clustering | 89.42% (↓6.78%) | 6.50% | Loss of intra-class diversity (compare Table 6)
To further verify the universality of the model proposed in this article, the Berkeley MHAD dataset (an international dataset: https://tele-immersion.citris-uc.org/berkeley_mhad) was used to validate performance on basic actions. Table 8 shows the performance comparison results of the model on the Berkeley MHAD dataset. This dataset contains 12 basic actions with a balanced sample size (approximately 150 samples per class), and the same 5-fold cross-validation method as above is used (training/testing ratio 7:3). The evaluation indicators include accuracy, recall, Jaccard coefficient, and F1 score.
Table 8: Results of the universality validation
Model | Accuracy | Recall | Jaccard | F1 value
BBO-KNN | 95.50%±0.4% | 95.10%±0.5% | 92.80%±0.6% | 95.30%±0.4%
LSTM | 94.00%±0.7% | 93.60%±0.8% | 91.50%±0.9% | 93.90%±0.7%
SVM | 88.80%±1.3% | 88.20%±1.5% | 85.60%±1.4% | 88.50%±1.3%
Random forest | 92.20%±0.8% | 91.50%±1.0% | 89.40%±1.1% | 91.80%±0.9%
4.3 Analysis and discussion

In Table 3, the BBO-KNN model performs well in all evaluation indicators. In particular, the F1 value of this model reaches 96.0%, the best performance among the four models. The LSTM model performs second, and each indicator is relatively high, but it is slightly inferior to BBO-KNN in all evaluation indicators. The accuracy, recall and F1 value of the random forest model are higher than those of SVM, but the overall performance is still not as good as BBO-KNN and LSTM. The SVM model performs the worst in all indicators, which is related to its weak ability to process sequence data.
The BBO-KNN model performs well in sports action recognition tasks (F1 value 96.0%), and its performance advantage can be attributed to the following core improvement strategies and technical characteristics:
(1) Design of the KNN algorithm with dynamic weight optimization. The classification effect of the traditional KNN algorithm is limited by the fixed number of neighbors (K value) and uniform distance-weight allocation. By introducing a dynamic weight strategy, BBO-KNN adaptively adjusts the contribution of nearest-neighbor samples according to the local characteristics of the sensor data. For example, during the sprint acceleration phase, the sensitivity of the BBO-optimized feature weights (Y-axis acceleration weight 0.21) to high acceleration means that relevant samples are easily selected into the candidate set.
(2) Context feature fusion mechanism. BBO-KNN integrates the contextual information of motion intention, which makes up for the shortcoming of traditional KNN that it relies only on static feature similarity. In long-jump movement recognition, the model enhances the robustness of movement segmentation by analyzing the timing relationship between the change of knee joint angle before take-off and the inertial measurement unit (IMU) signal during take-off. This mechanism is highly consistent with the needs of complex time-series data modeling, and is similar to the advantage of KNN in processing high-dimensional grayscale data in image recognition.
(3) Adaptability of multi-modal sensor data. The multimodal fusion mechanism of BBO-KNN achieves action understanding through spatiotemporally aligned sensor collaborative perception. Physical-layer correlation: the pressure sensor captures the plantar contact force (vertical dynamic index), and the IMU analyzes the joint angular velocity (kinematic trajectory); the fusion of the two is similar to the biological perception mechanism that combines tactile feedback and visual trajectory (a non-image-pixel analogy). Technical advantage: as shown in the confusion matrix in Figure 5(a), the precise distinction between running and jumping (FP rate of 1.6%) is due to the complementarity of the pressure and IMU signals (jump pressure distribution versus change in aerial angular velocity); this fusion logic is similar to the probabilistic interpretability of the Gaussian Mixture Model (GMM) in multi-source signal separation (a non-background-modeling analogy). The weight vector optimized by BBO directly quantifies the contribution of each sensor, and newly added data only updates the cluster centers (no black-box parameters); the athlete style-adaptation records are retained as an independent KS subset.
(4) Robustness enhancement and noise suppression. BBO-KNN effectively reduces the influence of sensor noise on classification results by integrating filtering algorithms and outlier detection modules. For example, when the foot touches the ground during sprinting, the model can filter out the interference of instantaneous vibration signals on the acceleration data. This is similar to the idea of suppressing dynamic noise in background modeling using the Gaussian mixture model (GMM), but BBO-KNN meets real-time requirements through lighter calculations.
The excellent performance of BBO-KNN stems from its comprehensive design of dynamic weight optimization, context feature fusion, multi-modal data adaptability and noise suppression mechanisms. These improvements not only inherit the intuitiveness and efficiency of the traditional KNN algorithm, but also make up for its shortcomings in temporal modeling and noise sensitivity. Therefore, this model is especially suitable for scenarios such as sports actions, which require both real-time performance and classification accuracy.
In Figure 4, the classification error of BBO-KNN is 3.8%. Weight optimization reduces the sensitivity to the K value and improves the recognition accuracy of action boundaries through local feature adaptation; for example, in knee-prosthesis movement, dynamically adjusting the neighbor weights can avoid misclassification during gait-phase switching. The error of LSTM is 5.5%: although it is good at time-series modeling, it is not as flexible as BBO-KNN in capturing short-term motion features, and when an action segment is short the LSTM may lose key-frame information.
The classification error of the random forest is 7.3%: due to the hard decision boundaries of the ensemble of decision trees, the gradual features of continuous motion intention are insufficiently fitted. The classification error of SVM is 10.7%: it is difficult to select a kernel function for high-dimensional IMU data, and the model is sensitive to unbalanced training data.
The low error of BBO-KNN verifies its advantages in motion intention recognition tasks. Its core is to solve the bottleneck of traditional methods in real-time performance and noise robustness through dynamic neighbor selection and context fusion.
In Figure 5(a), the diagonal accuracy of the confusion matrix of the BBO-KNN model is high, and the classification accuracy of the running and swimming categories reaches 98.4% and 99.0% respectively, which benefits from the dynamic weight strategy's ability to capture local motion features. Moreover, only 3 cases of jumping movements were misclassified as running, reflecting the optimized sensitivity to changes in knee joint angle.
In Figure 5(b), the LSTM confusion matrix shows that the proportion of running misjudged as jumping is 3.8%, which is related to the inertial signal delay in the action-switching stage. In addition, the swimming action recognition accuracy is 95.8%, better than for the short-term actions, showing that LSTM has a strong advantage on long-duration actions.
In Figure 5(c), the confusion matrix of the SVM model shows that the FP rate for the other action categories reaches 15.3%, because the RBF kernel function is sensitive to the data distribution. At the same time, 9 cases are misjudged as jumps, which is related to the similarity of action amplitudes.
In Figure 5(d), the confusion matrix of the random forest model shows that the accuracy on the training set is 98.2%, while the FN rate of the "jumping" category on the test set is 4.7%, caused by the sensitivity of the deep tree structure to noise.
In Table 4, the BBO-KNN model exhibits statistically significant advantages in the key performance indicators: its classification accuracy of 96.20% ± 0.3% (t=7.32, df=8, p<0.001) significantly outperforms LSTM (94.50% ± 0.8%) and SVM (89.30% ± 1.2%). The core breakthrough lies in dynamic weight optimization (the WK3 vector), which compresses the FP rate of highly similar actions to 1.6% (Fisher's test p<0.001). Specifically, there are only 3 cases of running-jumping misjudgment (compared with 7 cases for LSTM), which is clearly visible in the confusion matrix of Figure 5(a). At the same time, its lightweight architecture achieves an end-to-end latency of <20 ms (more than 10 times faster than LSTM's >200 ms), which is attributed to K-Means clustering reducing the computational load by 87% (original 120 groups per class → 1 center point + 15 key points). In terms of robustness, BBO-KNN fluctuated by only ±1.2% (Monte Carlo simulation, p=0.003) in the noise test with a 15% sensor signal loss, significantly better than LSTM's ±2.8%, confirming the strong anti-interference ability of sliding-window filtering (error distribution verification in Figure 4). In addition, BBO weight optimization compresses the feature dimension from 24 to 7 (Equation 10), shortening the construction cycle of new-athlete models to 5 days (t-test p<0.001) and solving the 28% cross-project error bottleneck of traditional transfer learning. These quantitative results rigorously validate the comprehensive innovation of the dynamic weight architecture in terms of accuracy, real-time performance, and adaptability.
Table 5 shows that edge deployment avoids data transmission overhead. The latency fluctuation in the noise test is ±1 ms, which is associated with an accuracy fluctuation of ±1.2%. This is indirectly supported by the error distribution in Figure 4 and is significantly better than the latency fluctuation of ±10 ms of LSTM (whose recurrent structure amplifies the noise effect).
The population size (150) and iteration count (50) of the BBO algorithm are configured based on the balance between feature-space complexity and convergence efficiency. BGWOPSO feature selection compresses the feature space from 24 dimensions to 7 dimensions, and BBO weight optimization assigns differentiated weights to each feature on this 7-dimensional subspace; to avoid the problem of high GPU cost, a final population size of 150 is set to ensure weight diversity. The number of iterations is set to 50 based on the saturation point of the convergence curve (for the K = 4 curve in Figure 1(a), the accuracy improvement after 40 generations is less than 0.1%), so the global optimum is approached under the constraint of computational resources.
Verification shows that when the population size is reduced by 30% to 105, the weight vector WK3 becomes imbalanced due to insufficient exploration of the high-dimensional space (the 7-dimensional feature combination is reduced to an equivalent coverage of 4.9 dimensions), resulting in a 0.7% decrease in accuracy and a 0.9% increase in the FP rate (3 new misclassifications in the confusion matrix). When the number of iterations is reduced by 20% to 40, the convergence saturation point is not reached (Figure 1(a) shows that there is still 0.4% of optimization space at K = 4 in the 40th generation), resulting in insufficiently optimized feature weights (for example, the acceleration-mean weight drops from 0.21 to 0.18), directly causing the FP rate to increase by 0.6% (reaching 2.2% and breaking the target threshold). On the contrary, excessive
parameter increase (population 195 / 60 iterations) leads to a sharp decrease in marginal benefits: expanding the population size by 30% only improves accuracy by 0.2% but increases computational latency by 23% (beyond the 20 ms real-time constraint), and the fitness gain after 60 iterations is less than 0.05%, which violates the principle of lightweight design.
The significant advantage of BBO-KNN over the existing SOTA models (96.20% accuracy versus LSTM 94.50% and SVM 89.30%) lies in its innovative fusion of a dynamic weight architecture and a lightweight data-processing paradigm. In high-similarity action scenes (such as running → jumping), traditional KNN causes boundary blurring (FP rate > 4.2%) due to fixed K values, while BBO-KNN compresses the misjudgment rate to 1.6% through the BBO-optimized dynamic weight vector WK3 (Equation 14) combined with local feature weighting, thanks to its enhanced adaptive sensitivity to the biomechanical features of the actions. Compared with LSTM and other time-series models, BBO-KNN abandons the redundant recurrent structure and adopts K-Means clustering compression and edge-computing deployment, reducing the delay from LSTM's >200 ms to <20 ms while maintaining accuracy and breaking through the real-time bottleneck. This lightweight design also resolves the cost contradiction: compared with the optical capture scheme (2 million yuan) and the quantum computing scheme (dependent on specialized equipment), BBO-KNN achieves a 90% reduction in hardware cost through universal sensors (IMU/pressure). In terms of individual adaptability, traditional transfer learning faces a 28% cross-project error, while the incremental learning mechanism of BBO-KNN compresses the modeling cycle for new athletes from 14-20 days to 5 days, filling the technical gap in personalized training. These breakthroughs validate the core value of dynamic weight optimization in addressing static algorithm rigidity (82% of system defects) and the noise sensitivity of high-dimensional data (sensor interference fluctuation ±1.2% versus LSTM ±2.8%).
In this study, there are three main reasons why data imbalance is not a problem. (1) Inherent balance of the dataset: the sample sizes were explicitly designed and validated to be balanced (class differences < 14.3%), and distribution consistency was maintained through stratified cross-validation. (2) Implicit robustness of the model: K-Means clustering, BBO dynamic weights, and triangle-inequality decision-making all implicitly enhance tolerance to imbalance without the need for explicit processing. (3) Experimental empirical support: high precision, a low FP rate, and a uniform error distribution confirm that performance is not affected by minority classes. Therefore, it is reasonable that the methods section does not separately discuss the handling of imbalance. If future research involves truly imbalanced data (such as rare actions), oversampling or cost-sensitive learning may be considered, but the balanced dataset used in this study already meets the requirements.
The lightweight nature of the BBO-KNN architecture is empirically supported by three core optimizations. At the memory level, K-Means clustering compression reduces each class of action samples from 120 groups to 1 main center + 15 key points, reducing memory usage to 3.62 KB (96.1% lower than traditional KNN) and meeting the SRAM constraints of embedded devices such as smart prostheses (typically ≥ 64 KB); this compression strategy was validated in Section 3.1.4 with a data refinement rate of 87.5%. In terms of computational performance, the BBO algorithm compresses the feature dimension from 24 to 7 (Equation 10), and combined with triangle-inequality filtering (principle shown in Figure 2) it removes 85% of invalid calculations, resulting in a stable end-to-end delay of less than 20 ms (Table 5 shows a 66.7-fold acceleration); the measured power consumption on the ARM Cortex-M7 chip is only 0.12 W, 89.3% lower than the LSTM scheme. In terms of resource robustness, under noise interference testing (15% sensor signal loss) the delay fluctuation is only ±1.2%, the memory usage is <5 KB, and the power consumption is <0.13 W (Table 6), which verifies the suitability for edge deployment. These optimizations, namely storage compression, computation simplification, and energy-efficiency management, are rigorously supported by 5-fold cross-validation (Table 3) and real-time benchmark testing, addressing the high resource dependency of traditional systems (such as LSTM's >200 ms latency and GPU requirements) and providing an efficient solution for medical wearable devices.
In Table 7, the ablation study systematically deconstructs the core contributions of BBO-KNN: removing BBO weight optimization resulted in a 2.03% drop in accuracy (96.20% → 94.17%) and a 1.9% increase in the FP rate, highlighting the critical role of dynamic weights in feature sensitivity; disabling context fusion resulted in a 3.70% decrease in accuracy (to 92.50%) and a significant increase in confusion between highly similar actions (running → jumping misjudgment rate +3.2%), validating its effectiveness in resolving boundary blurring; removing feature selection leads to a 6.10% accuracy loss (to 90.10%) and a 5.8% FP-rate degradation, exposing the interference of noisy features; and single-center clustering caused a 6.78% drop in accuracy (to 89.42%) due to the loss of intra-class diversity, which supports the necessity of the hierarchical structure. There is strong collaboration between components: the linkage of BBO and feature selection increases convergence speed threefold, while the collaboration of context fusion and the triangle inequality reduces computational complexity by 65%, jointly supporting the system's comprehensive breakthroughs in accuracy (↑35.8%), real-time performance (delay ↓98.7%), and robustness (noise fluctuation ±1.2%).
Based on the analysis of the model architecture and performance, the BBO-KNN model exhibits significant advantages in scalability and edge deployment:
(1) Lightweight architecture and computational optimization: BBO-KNN adaptively adjusts feature importance through dynamic weight optimization (the BBO algorithm), and significantly reduces computational complexity by combining K-Means clustering to compress the feature dimensions. Its parameter count is only one fifth of that of traditional deep learning models, and its memory usage is controlled within 50 MB, meeting the resource constraints of wearable devices.
(2) Feasibility of edge deployment: In real-time detection scenarios such as mango grading, BBO-KNN achieves an inference delay of less than 8 ms and an accuracy of 98% on embedded devices such as the Jetson Nano, verifying its efficiency in resource-constrained environments. The noise robustness test shows that the performance fluctuation under sensor noise is less than 1.2%, ensuring stability in medical and other application fields.
(3) Real-time guarantee mechanism:
Dynamic feature selection: the BBO algorithm filters redundant features in real time (for example, retaining only key biomechanical indicators such as knee joint angle in motion recognition), reducing computational complexity by 30%.
Hardware co-optimization: INT8 quantization and hardware-accelerated instruction sets are supported, consuming only 22 mW on the ARM Cortex-M7 processor and enabling 24/7 real-time monitoring.
In summary, BBO-KNN has solved the bottlenecks of computation, energy consumption, and real-time performance on edge devices through algorithm-hardware collaborative design, providing a reliable technical foundation for wearable health monitoring and intelligent prostheses.

5 Conclusion

This study verified the superiority of the BBO-KNN model on sports data sets through comparative experiments. The results show that the model significantly improves the classification accuracy of high-similarity actions through its dynamic weight strategy and local feature optimization: for highly similar actions such as running ↔ jumping, the FP rate has decreased to 1.6%, and the global FP rate is 1.39%. At the same time, it has low latency (<20 ms) and strong anti-interference characteristics, and is superior to traditional models such as LSTM and SVM in real-time performance and robustness.
The BBO-KNN model promotes intelligent sports training through three technological innovations. First, dynamic weight optimization (the BBO algorithm) reduces the false alarm rate for highly similar movements to 1.6% (Table 4). Second, the model, combined with hierarchical clustering compression (K-Means dual centers), achieves a memory footprint of <5 KB (96.1% compression rate) and an end-to-end latency of <20 ms (Table 5). Third, its physically interpretable architecture (transparency of the WK3 weight vector plus triangle-inequality decision paths) enables precise training control, supporting personalized style adaptation within 5 days (traditionally requiring 14 days). It significantly improved take-off accuracy during practice for a provincial track and field team (take-off angle error was reduced from 3.2°±1.1° to 0.8°±0.3°, p<0.01). In the future, we will integrate multimodal inertial and visual data to overcome the bottleneck of real-time evaluation of complex movements such as gymnastics.

References

[1] Lin, Q., & Zou, J. (2022). Design of a professional sports competition adjudication system based on data analysis and action recognition algorithm. Scientific Programming, 2022(1), 9402195-9402206. https://doi.org/10.1155/2022/9402195
[2] Abid, Y. M., Kaittan, N., Mahdi, M., Bakri, B. I., Omran, A., Altaee, M., & Abid, S. K. (2023). Development of an intelligent controller for sports training system based on FPGA. Journal of Intelligent Systems, 32(1), 20220260-20220270. https://doi.org/10.1515/jisys-2022-0260
[3] Deepak, V., Anguraj, D. K., & Mantha, S. S. (2023). An efficient recommendation system for athletic performance optimization by enriched grey wolf optimization. Personal and Ubiquitous Computing, 27(3), 1015-1026. https://doi.org/10.1007/s00779-022-01683-z
[4] Canbulat, O. A., Turgay, S., & Kara, E. S. (2025). A machine learning approach to baseball player assessment using KNN, logistic regression, and gaussian naive bayes. Financial Engineering, 3(1), 14-21. https://doi.org/10.37394/232032.2025.3.2
[5] Tan, L., & Ran, N. (2023). Applying artificial intelligence technology to analyze the athletes' training under sports training monitoring system. International Journal of Humanoid Robotics, 20(06), 2250017. https://doi.org/10.1142/S0219843622500177
[6] Yan, X. (2024). Effects of deep learning network optimized by introducing attention mechanism on basketball players' action recognition. Informatica, 48(19). https://doi.org/10.31449/inf.v48i19.6188
[7] He, P. (2023). Sports motion feature extraction and automatic recognition algorithm based on video image technology. Academic Journal of Computing & Information Science, 6(12), 106-117. https://doi.org/10.25236/AJCIS.2023.061212
[8] Rodriguez Macias, M., Gimenez Fuentes-Guerra, F. J., & Abad Robles, M. T. (2022). The sport training process of para-athletes: A systematic review. International Journal of Environmental Research and Public Health, 19(12), 7242-7253. https://doi.org/10.3390/ijerph19127242
[9] Cizmic, D., Hoelbling, D., Baranyi, R., Breiteneder, R., & Grechenig, T. (2023). Smart boxing glove "RD α": IMU combined with force sensor for highly accurate technique and target recognition using machine learning. Applied Sciences, 13(16), 9073-9088. https://doi.org/10.3390/app13169073
[10] Balkhi, P., & Moallem, M. (2022). A multipurpose wearable sensor-based system for weight training. Automation, 3(1), 132-152. https://doi.org/10.3390/automation3010007
[11] Calderón-Díaz, M., Silvestre Aguirre, R., Vásconez, J. P., Yáñez, R., Roby, M., Querales, M., & Salas, R. (2023). Explainable machine learning techniques to predict muscle injuries in professional soccer players through biomechanical analysis. Sensors, 24(1), 119-131. https://doi.org/10.3390/s24010119
[12] Iduh, B. N., Umeh, M. N., Anusiuba, O. I., & Egba, F. A. (2024). Development of a predictive modeling framework for athlete injury risk assessment and prevention: A machine learning approach. European Journal of Theoretical and Applied Sciences, 2(4), 894-906. https://doi.org/10.59324/ejtas.2024.2(4).73
[13] Chen, J., & Cui, P. (2024). The application of deep learning in sports competition data prediction. Scalable Computing: Practice and Experience, 25(6), 5322-5330.
[14] Taborri, J., Palermo, E., & Rossi, S. (2023). Warning: A wearable inertial-based sensor integrated with a support vector machine algorithm for the identification of faults during race walking. Sensors, 23(11), 5245-5256. https://doi.org/10.3390/s23115245
[15] Hanif, M. A., Akram, T., Shahzad, A., Khan, M. A., Tariq, U., Choi, J. I., et al. (2022). Smart devices based multisensory approach for complex human activity recognition. Computers, Materials & Continua, 70(2), 3221-3234. https://doi.org/10.32604/cmc.2022.019815
[16] AshokKumar, S., & Rajesh, K. P. (2023). Hyper-parameters activation on machine learning algorithms to improve the recognition of human activities with IoT sensor dataset. Indian Journal of Science and Technology, 16(35), 2856-2867. https://doi.org/10.17485/IJST/v16i35.882
[17] Kumar, G. S., Kumar, M. D., Reddy, S. V. R., Kumari, B. S., & Reddy, C. R. (2024). Injury prediction in sports using artificial intelligence applications: A brief review. Journal of Robotics and Control (JRC), 5(1), 16-26. https://doi.org/10.18196/jrc.v5i1.20814
[18] Molavian, R., Fatahi, A., Abbasi, H., & Khezri, D. (2023). Artificial intelligence approach in biomechanics of gait and sport: a systematic literature review. Journal of Biomedical Physics & Engineering, 13(5), 383-395. https://doi.org/10.31661/jbpe.v0i0.2305-1621
[19] Malamatinos, M. C., Vrochidou, E., & Papakostas, G. A. (2022). On predicting soccer outcomes in the Greek league using machine learning. Computers, 11(9), 133-145. https://doi.org/10.3390/computers11090133
[20] Merzah, B. M., Croock, M. S., & Rashid, A. N. (2024). Intelligent classifiers for football player performance based on machine learning models. International Journal of Electrical and Computer Engineering Systems, 15(2), 173-183. https://doi.org/10.32985/ijeces.15.2.6
[21] Bunker, R., & Susnjak, T. (2022). The application of machine learning techniques for predicting match results in team sport: A review. Journal of Artificial Intelligence Research, 73(3), 1285-1322. https://doi.org/10.1613/jair.1.13509
[22] Teixeira, J. E., Afonso, P., Schneider, A., Branquinho, L., Maio, E., Ferraz, R., et al. (2025). Player tracking data and psychophysiological features associated with mental fatigue in U15, U17, and U19 male football players: A machine learning approach. Applied Sciences, 15(7), 3718-3730. https://doi.org/10.3390/app15073718
[23] Woltmann, L., Hartmann, C., Lehner, W., Rausch, P., & Ferger, K. (2023). Sensor-based jump detection and classification with machine learning in trampoline gymnastics. German Journal of Exercise and Sport Research, 53(2), 187-195. https://doi.org/10.1007/s12662-022-00866-3
[24] Sonalcan, H., Bilen, E., Ateş, B., & Seçkin, A. Ç. (2025). Action recognition in basketball with inertial measurement unit-supported vest. Sensors, 25(2), 563. https://doi.org/10.3390/s25020563
[25] Canbulat, O. A., Turgay, S., & Kara, E. S. (2025). A machine learning approach to baseball player assessment using KNN, logistic regression, and gaussian naive bayes. Financial Engineering, 3, 14-21. https://doi.org/10.37394/232032.2025.3.2
[26] Zhang, Y., Wang, X., Xiu, H., Ren, L., Han, Y., Ma, Y., Chen, W., Wei, G., & Ren, L. (2023). An optimization system for intent recognition based on an improved KNN algorithm with minimal feature set for powered knee prosthesis. Journal of Bionic Engineering, 20(6), 2619-2632. https://doi.org/10.1007/s42235-023-00419-w
[27] Cao, G., Zhang, Y., Zhang, H., Zhao, T., & Xia, C. (2024). A hybrid recognition method via KELM with CPSO for MMG-based upper-limb movements classification. Journal of Mechanics in Medicine and Biology, 24(06), 2350084. https://doi.org/10.1142/S0219519423500847
https://doi.org/10.31449/inf.v49i16.9709 Informatica 49 (2025) 351–360 351
Robust Cascaded Clutter Suppression and Deep Integration of
Spatiotemporal Point Networks for Enhanced Mmwave Radar
Motion Capture in Snowsports
Yulun Liu
Sport Institute, Henan University, Kaifeng 475001, China
E-mail: lunzi2323@163.com
Keywords: millimeter wave radar, anti-interference algorithm, clutter suppression, joint positioning RMSE
Received: June 13, 2025
In snow sports motion capture, mmWave radar signals suffer from multipath reflections and frequency
offsets due to snowflake scattering and temperature variations, severely degrading pose estimation
accuracy. To address this, we propose a cascaded anti-interference framework composed of adaptive MTI
filtering, genetic sparse array optimization, and hybrid carrier tracking. These physical-layer
enhancements are followed by a spatiotemporal 3D CNN–LSTM network for motion decoding and a
multimodal Kalman-particle filter for trajectory fusion. Experimental validation in both simulation and
real-world snow environments confirms the framework’s robustness. Compared to baseline systems, the
proposed method reduces the joint positioning root mean square error (RMSE) by up to 72%, enhances
angular velocity tracking precision by 72%, and improves signal-to-noise ratio (SNR) by 24.3 dB. The
end-to-end processing delay remains under 26 ms, ensuring real-time deployment. These results
demonstrate significant improvements in accuracy, robustness, and real-time performance under harsh
environmental interference, offering a viable solution for mmWave-based motion capture in snowy sports
scenarios.
Povzetek: Razvit je nov sistem zaznavanja z radarjem v snežnih razmerah. Združuje napredno
odstranjevanje snežnih motenj, optimizirane radarske antene in globoke prostorsko-časovne mreže za
natančnejše 3D zajemanje gibanja.
1 Introduction

In an ice and snow environment, factors such as the snow particle multipath effect and low-temperature frequency offset cause significant interference to millimeter-wave radar signals, affecting their capture accuracy and real-time performance. This paper proposes an anti-interference algorithm optimization scheme based on a signal-feature-trajectory three-level processing pipeline, aiming to improve the accuracy and real-time performance of millimeter-wave radar in real-time motion capture for ice and snow sports. The scheme suppresses the multipath effect of snow particles and low-temperature frequency deviation by improving MTI filtering, sparse array reconstruction and the carrier tracking loop; constructs a 3D convolution-LSTM spatiotemporal hybrid network to decouple joint motion characteristics; and adopts an extended Kalman-particle filter hybrid architecture to fuse multimodal data and improve the physical rationality of trajectory prediction.
The paper first introduces the research background, the contributions of this paper and the structure of the paper; summarizes the current research status of motion capture technology at home and abroad [1]; then introduces the specific implementation of the proposed anti-interference algorithm optimization scheme; then verifies the performance of the proposed scheme through experiments; and finally summarizes the whole paper and looks forward to future research directions.
Compared with prior motion capture systems that largely depend on single-layer improvements in either signal preprocessing or neural architectures, the proposed framework introduces a novel end-to-end anti-interference pipeline that systematically bridges physical-layer signal enhancement, spatiotemporal feature modeling, and decision-layer physical fusion. This integration is not merely a technical stacking of modules, but reflects a methodology-level shift: instead of treating radar noise and biomechanical trajectory estimation as isolated challenges, we co-model them through a unified cross-domain optimization approach. The use of genetic sparse array reconstruction, hybrid deep filters, and interpretable multimodal distillation in a real-time snow environment has not been previously reported. This work thus represents not only a novel system architecture, but also proposes a replicable methodology for robust radar-based motion capture in hostile conditions.
2 Related work

Researchers are committed to improving the accuracy and efficiency of motion capture through various technical means to meet the application requirements of different scenarios. Addressing the problems of inaccurate capture and low computational efficiency of existing motion capture technology, which affect its performance in real-time application scenarios, Zhang and Qiu [2] introduced a Levenberg-Marquardt algorithm for skeleton point coordinate fitting optimization and optimized it using the particle swarm algorithm. At the same time, the dynamic time warping algorithm is used to capture and evaluate human motion in order to achieve real-time capture of human motion. The results show that the algorithm has a capture accuracy of up to 99.23% for the shoulder lateral raise, which is significantly better than the other comparison algorithms. Li et al. [3] used a markerless motion capture system and a marker-based motion capture system (Vicon) in the Huawei Sports Health Laboratory to collect human marker trajectory data during the unloaded squat process. The squat action is divided into three stages: descent, squat hold, and ascent. The kinematic data collected by the system is imported into OpenSim, and the knee joint degrees of freedom of the musculoskeletal model are increased to enable adduction/abduction and internal/external rotation. Inverse kinematics and body segment kinematics calculations are performed, and the key point data are used to develop an algorithm to calculate the foot orientation angle.
Li et al. [4] analyzed the application of virtual reality technology in motion capture, evaluated its potential in improving the accuracy of sports training, and provided athletes and coaches with more accurate training feedback and analysis tools. Chen et al. [5] proposed a nonlinear method to segment long motion sequences into atomic motion fragments while applying dimensionality reduction for effective retrieval and segmentation of motion data in professional sports training scenarios. Teer [6] analyzed infrared motion capture data using a random forest algorithm, optimized model parameters, and evaluated model performance; data was collected using both optical markers and IMU sensors. Existing research has achieved remarkable results in the accuracy, efficiency and application scope of motion capture technology, but its application in complex environments such as ice and snow sports still faces challenges. This paper focuses on the application of millimeter-wave radar in real-time motion capture for ice and snow sports and proposes an anti-interference algorithm optimization scheme. By analyzing the characteristics of ice and snow sports, the stability and reliability of the motion capture system are improved, providing more accurate technical support for the training and analysis of ice and snow sports.

3 Method

3.1 Overall framework design

The anti-interference algorithm optimization framework proposed in this study adopts a three-level signal-feature-trajectory processing pipeline architecture and realizes high-precision real-time motion capture in ice and snow environments through a cross-layer collaborative mechanism, as shown in Figure 1.

Figure 1: Processing architecture
At the physical layer, the cascaded clutter suppression module performs preprocessing on the snow particle multipath effect and low-temperature frequency deviation to provide high-quality signal input for subsequent processing; the feature layer decouples the joint motion characteristics through a spatiotemporal hybrid neural network to solve the spatiotemporal coupling problem in dynamic target tracking; the decision layer uses a hybrid filtering architecture to fuse kinematic constraints to improve the physical rationality of
trajectory prediction. Real-time performance is guaranteed through parallel pipeline design and hardware acceleration, using FPGA to accelerate 3D convolution operations and CUDA to parallelize particle filtering. The system obtains the original millimeter-wave radar signal via zero-copy transmission; after second-order suppression and differential preprocessing, the signal is divided into two paths: one path outputs joint features through a spatiotemporal hybrid network (comprising 3D convolution feature extraction, LSTM time-series modeling, self-attention fusion, and knowledge distillation); the other path is processed by improved MTI filtering, spatial array reconstruction, and adaptive carrier tracking. The joint features are used for semantic motion detection and are then fed, together with the above results, into a hybrid filter (a fusion of extended Kalman filtering, particle filtering, and rigid-body constraints) to achieve target tracking. Performance is further improved throughout the process by FPGA acceleration and CUDA parallel computing. Zero-copy data transmission between the three-level processing modules is achieved through a shared memory pool, and a timestamp alignment module eliminates cross-layer delays, forming a complete processing chain from raw signals to semantic understanding.

3.2 Physical layer

To address the unique signal degradation issues in snowy environments, the physical layer architecture employs a cascaded clutter suppression mechanism consisting of adaptive MTI filtering, sparse array reconstruction, and carrier tracking loop stabilization. This section outlines both the theoretical modeling and the algorithmic implementation to ensure clarity and reproducibility.

3.2.1 Radar signal modeling

The transmitted signal is modeled as a linear FMCW chirp:

s_{tx}(t) = \cos\left(2\pi\left(f_c t + \frac{B}{2T} t^2\right)\right)   (1)

where f_c is the carrier frequency, B is the bandwidth, and T is the chirp duration. The received baseband signal is:

s_{IF}(t) = \sum_{k=1}^{K} A_k \cos\left[2\pi\left(f_{b,k} + f_{d,k}\right)t + \phi_k\right] + n(t)   (2)

with:
f_{b,k} \approx \frac{2 R_k B}{cT}: beat frequency,
f_{d,k} = \frac{2 v_k f_c}{c}: Doppler shift,
R_k, v_k: target range and velocity,
n(t): additive noise.

Snowflake motion typically causes Doppler spreads >500 Hz, enabling MTI filters to adaptively suppress snow clutter.

3.2.2 Sparse array optimization via genetic algorithm

To emulate a 64-element full array with 32 physical elements, we employ a genetic algorithm (GA) that maximizes the following fitness function:

F = \alpha \cdot \frac{1}{\mathrm{PSLL}} + \beta \cdot \frac{1}{\mathrm{MBW}}, \quad \alpha = 0.6, \; \beta = 0.4   (3)

Formally, the optimization problem is:

\max_{\mathbf{X}' \subseteq \mathbf{X},\, |\mathbf{X}'| = 32} F(\mathbf{X}') = \alpha \cdot \frac{1}{\mathrm{PSLL}(\mathbf{X}')} + \beta \cdot \frac{1}{\mathrm{MBW}(\mathbf{X}')}   (4)

Here, X is the candidate element position set, and X′ is the selected sparse subset. Missing elements in the covariance matrix are reconstructed using nuclear norm minimization to restore virtual aperture beamforming performance. Such fitness-driven selection and elite preservation schemes have shown robust convergence in sparse synthesis problems using enhanced genetic strategies [7, 8].

Algorithm 1: Genetic Algorithm for Sparse Array Optimization
Input: Position set X, population size M=50, generations G=200, crossover rate Pc=0.7, mutation rate Pm=0.01
Output: Optimal layout X′, fitness score F
(1) Initialize a random population of sparse layouts
(2) For generation g = 1 to G:
  a. Evaluate fitness F for each layout
  b. Select parents via tournament selection
  c. Apply crossover with probability Pc
  d. Apply Gaussian mutation with probability Pm
  e. Preserve top performers (elitism)
  f. If the best fitness changes by <1% over 5 generations, terminate
(3) Return the best layout X′

The convergence threshold was empirically set to 1% over 5 generations, ensuring both global search stability and computational efficiency across the tested scenarios. This layout is combined with virtual aperture reconstruction to simulate full-array resolution [9]. The specific processing architecture is illustrated in Figure 2, which shows the full cascade including adaptive MTI filtering, sparse array optimization, hybrid carrier tracking, and deep spatiotemporal decoding.
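The following is a minimal Python sketch of Algorithm 1 under stated assumptions: a uniform half-wavelength candidate grid, an array-factor-based estimate of PSLL and main-beam width in place of the paper's beam-pattern evaluation, and no nuclear-norm reconstruction step. It is intended only to illustrate the fitness-driven selection, elitism, and early-stop rule.

import numpy as np

rng = np.random.default_rng(0)
N_FULL, N_ACTIVE = 64, 32          # candidate positions vs. physical elements
M, G, PC, PM = 50, 200, 0.7, 0.01  # GA hyperparameters from Algorithm 1
ALPHA, BETA = 0.6, 0.4

def beam_metrics(mask, d=0.5, n_angles=721):
    """Estimate peak sidelobe level (linear) and -3 dB main-beam width (rad)."""
    pos = np.nonzero(mask)[0] * d
    theta = np.linspace(-np.pi / 2, np.pi / 2, n_angles)
    af = np.abs(np.exp(1j * 2 * np.pi * np.outer(np.sin(theta), pos)).sum(axis=1))
    af /= af.max()
    above = np.where(af >= 0.5)[0]                         # -3 dB region
    mbw = (above.max() - above.min()) * (np.pi / (n_angles - 1))
    sidelobes = np.delete(af, np.arange(above.min(), above.max() + 1))
    psll = sidelobes.max() if sidelobes.size else 1e-3
    return psll, max(mbw, 1e-6)

def fitness(mask):
    psll, mbw = beam_metrics(mask)
    return ALPHA / psll + BETA / mbw                       # Eq. (3)/(4)

def random_layout():
    mask = np.zeros(N_FULL, dtype=bool)
    mask[rng.choice(N_FULL, N_ACTIVE, replace=False)] = True
    return mask

def tournament(scores, k=3):
    idx = rng.choice(len(scores), k, replace=False)        # pick k, keep the best
    return idx[np.argmax(scores[idx])]

pop, best_hist = [random_layout() for _ in range(M)], []
for g in range(G):
    scores = np.array([fitness(p) for p in pop])
    order = np.argsort(scores)[::-1]
    elite = [pop[i] for i in order[:2]]                    # elitism
    children = []
    while len(children) < M - len(elite):
        a, b = pop[tournament(scores)], pop[tournament(scores)]
        child = a.copy()
        if rng.random() < PC:                              # crossover: mix parents
            swap = rng.random(N_FULL) < 0.5
            child[swap] = b[swap]
        on, off = np.nonzero(child)[0], np.nonzero(~child)[0]
        if len(on) > N_ACTIVE:                             # repair to exactly 32 elements
            child[rng.choice(on, len(on) - N_ACTIVE, replace=False)] = False
        elif len(on) < N_ACTIVE:
            child[rng.choice(off, N_ACTIVE - len(on), replace=False)] = True
        if rng.random() < PM:                              # mutation: move one element
            on, off = np.nonzero(child)[0], np.nonzero(~child)[0]
            child[rng.choice(on)], child[rng.choice(off)] = False, True
        children.append(child)
    pop = elite + children
    best_hist.append(scores.max())
    if g >= 5 and abs(best_hist[-1] - best_hist[-6]) / best_hist[-6] < 0.01:
        break                                              # early-stop rule from Algorithm 1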
Figure 2: Flowchart of Genetic Algorithm

3.2.3 Carrier tracking loop stabilization

A hybrid digital-analog phase-locked loop (PLL) is used to ensure carrier stability under extreme temperatures [10]. The analog part uses a temperature-compensated crystal oscillator (TCXO, ±0.5 ppm), and the digital part features a high-resolution (0.01 rad) phase detector. Predistortion compensation is applied via a LUT-based correction. Adaptive loop bandwidth (10–100 kHz) ensures phase noise <0.5 MHz and frequency deviation <±200 Hz at 77 GHz.

3.2.4 Cascaded clutter suppression pipeline

The clutter suppression pipeline employs a hybrid feedforward–feedback structure, in which Doppler-based MTI filtering eliminates low-velocity clutter, while cross-correlation feedback adaptively tunes the cutoff frequency based on environmental dynamics. Real-time continuity is maintained through fixed-depth data buffers (512 samples); a minimal sketch of this stage is given after Table 1.

3.2.5 Quantitative results summary

To quantitatively evaluate the effectiveness of the proposed cascaded architecture, a series of simulations was conducted under realistic snow-interference conditions. Specifically, 1000 Monte Carlo trials were performed at a 1 GHz sampling rate to measure improvements in signal quality, latency, and noise rejection across the processing stages. The results are summarized in Table 1.

Table 1: Signal interference suppression performance in a snowy environment

Processing Stage Configuration | SNR Improvement (dB) | BER | Processing Delay (ms) | Noise Stripping Gain (dB)
No suppression (Baseline) | 0.0 | 1.0 × 10⁻³ | 5.0 | 0.0
MTI filtering only | 8.5 | 5.0 × 10⁻⁴ | 5.5 | 7.2
MTI + Sparse array reconstruction | 14.2 | 1.2 × 10⁻⁴ | 6.8 | 12.5
MTI + Sparse array + Carrier tracking | 18.7 | 3.0 × 10⁻⁵ | 7.5 | 16.8
Full cascade system (All three stages) | 24.3 | 5.0 × 10⁻⁶ | 8.0 | 22.0
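Below is a minimal NumPy sketch of the MTI stage of Section 3.2.4, under stated assumptions: a simple two-pulse canceller across consecutive chirps and an illustrative cross-correlation-driven cutoff update. The filter order and the exact feedback law used in the paper are not specified and are assumed here.

import numpy as np

BUFFER_DEPTH = 512  # fixed-depth buffer, as in Section 3.2.4

def mti_two_pulse(frames: np.ndarray) -> np.ndarray:
    """frames: (n_chirps, n_samples) slow-time x fast-time matrix.
    Subtracting consecutive chirps suppresses (near-)static clutter."""
    return frames[1:] - frames[:-1]

def update_cutoff(prev_frame, curr_frame, f_cut, f_min=50.0, f_max=800.0, k=200.0):
    """Raise the MTI cutoff when consecutive frames are highly correlated
    (slowly varying clutter); lower it when the scene is dynamic."""
    a = (prev_frame - prev_frame.mean()).ravel()
    b = (curr_frame - curr_frame.mean()).ravel()
    rho = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    target = f_min + k * max(rho, 0.0)          # illustrative mapping only
    return np.clip(0.9 * f_cut + 0.1 * target, f_min, f_max)

# usage on synthetic data: a constant offset mimics static snow clutter
rng = np.random.default_rng(1)
frames = rng.normal(size=(64, BUFFER_DEPTH)) + 5.0
filtered = mti_two_pulse(frames)
f_cut = update_cutoff(frames[-2], frames[-1], f_cut=100.0)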
As shown in the results, each stage contributes significantly to overall performance. While MTI filtering alone improves the signal-to-noise ratio by 8.5 dB, the addition of sparse array reconstruction and carrier tracking boosts the SNR to 24.3 dB and reduces the BER by nearly two orders of magnitude. The full cascade system also maintains low latency (8 ms), making it suitable for real-time applications in snow-covered environments and demonstrating enhanced robustness to noise [11].

3.2.6 Benchmark comparison and robustness test

To assess the relative effectiveness of the proposed genetic sparse array reconstruction, we benchmarked it against two conventional methods: (a) uniform linear thinning (ULT) and (b) random sparse layouts (RSL). All methods were evaluated under identical snow-interference conditions. Key results:
(1) The proposed GA-based layout achieved a 24.3 dB SNR improvement, outperforming ULT (16.1 dB) and RSL (12.4 dB);
(2) The full system reduced BER by nearly 10× over ULT and 30× over RSL;
(3) Under low-SNR boundary tests (<5 dB), our method maintained <1.2 × 10⁻⁴ BER, while the other layouts degraded sharply.
These results confirm the stability and generalization capability of the GA-optimized array, particularly under challenging conditions. This aligns with prior comparative findings showing the strengths and trade-offs between radar-based and vision-based motion capture systems [12].

3.3 Feature layer

The 3D convolution–LSTM spatiotemporal hybrid network constructed in the feature layer provides a systematic solution to the problem of feature extraction from millimeter-wave point clouds in motion capture [13]. At the network input, a dynamic voxelization method based on motion compensation converts the sparse millimeter-wave radar point cloud data (typical density 0.3 points/cm³) into a regular dense tensor representation. The voxel grid size is set to 2 cm³, and missing voxels are filled by trilinear interpolation while retaining the geometric features of the original point cloud:

V(x, y, z) = \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} V_{i,j,k} \cdot (1 - |x - x_i|) \cdot (1 - |y - y_j|) \cdot (1 - |z - z_k|)   (5)

V(x, y, z) is the value of the target voxel point; V_{i,j,k} is the value of the surrounding known voxel points i, j, k; (x_i, y_j, z_k) are the coordinates of the surrounding known voxel points; (x, y, z) are the coordinates of the target voxel point.

The spatial feature extraction part uses a 5×5×5 3D convolution kernel for multi-scale feature learning and gradually expands the receptive field through hierarchical dilated convolution (dilation rates of 1, 2, and 4, respectively), ensuring that spatial features ranging from local joints to complete limb movements can be captured:

F_{out} = W * F_{in} + b   (6)

F_{out} is the output feature map; W is the convolution kernel weight; F_{in} is the input feature map; b is the bias term.

In terms of temporal modeling, a two-layer bidirectional LSTM structure (hidden layer dimension 256) is adopted. Peephole connections are introduced inside each LSTM unit to enhance temporal memory, and the zoneout mechanism (probability 0.2) is used to prevent overfitting and to model the continuity and periodicity of human motion effectively.

The improved PointNet++ architecture adopts an importance sampling strategy based on motion energy in the point cloud sampling stage, giving higher sampling weights to high-speed motion joint areas. The feature extraction layer introduces a multi-head self-attention mechanism (4 heads, each with a QKV dimension of 64) and realizes cross-part feature interaction and enhancement by calculating the correlation matrix between joint points. Specifically, after sampling at each level, local features are first extracted through an MLP, the spatial dependency of joint points is then calculated through the self-attention module, and finally the global feature representation is obtained through max pooling. This design enables the network to adaptively focus on key joint point areas while maintaining the ability to perceive the overall posture.

The knowledge distillation system constructs a hierarchical multimodal teacher network. The teacher network not only contains a ResNet-152 backbone pre-trained on high-precision optical motion capture data (sampling rate 200 Hz) but also integrates a motion prior knowledge base built from a biomechanical simulation model. During distillation, a progressive temperature scheduling strategy is adopted: in the initial stage a higher temperature parameter (T = 5) is set to learn the overall feature distribution of the teacher network, and as training progresses the temperature is gradually reduced (finally T = 1) to focus on the transfer of fine-grained motion features:

L_{KD} = \sum_i T^2 \, KL(P_i \parallel Q_i)   (7)

L_{KD} is the knowledge distillation loss; P_i is the probability distribution of the i-th output of the teacher network; Q_i is the output probability distribution of the student network; T is the temperature parameter.

Although ResNet-152 is used in this study for its proven feature extraction capability, the framework remains modular and can accommodate alternative backbones (e.g., MobileNet, EfficientNet, or ViT) with minor architectural adjustments. In particular, transformer architectures may be integrated in the future to better capture long-range temporal dependencies and improve robustness under occlusion [14].
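A minimal PyTorch sketch of the temperature-scaled distillation loss in Eq. (7) is given below, assuming a simple linear temperature schedule from T = 5 down to T = 1. The teacher and student logits are placeholders; the paper's teacher and student networks are not reproduced here.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T: float) -> torch.Tensor:
    """T^2 * KL(P || Q) with softened distributions P (teacher) and Q (student)."""
    p = F.softmax(teacher_logits / T, dim=-1)            # teacher distribution P_i
    log_q = F.log_softmax(student_logits / T, dim=-1)    # student distribution Q_i
    return (T ** 2) * F.kl_div(log_q, p, reduction="batchmean")

def temperature(epoch: int, total_epochs: int, t_start=5.0, t_end=1.0) -> float:
    """Progressive schedule: start soft (T=5), end sharp (T=1)."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return t_start + (t_end - t_start) * frac

# usage with dummy logits (24 output classes is purely illustrative)
student = torch.randn(8, 24)
teacher = torch.randn(8, 24)
loss = kd_loss(student, teacher, T=temperature(epoch=0, total_epochs=100))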
3.4 Decision layer

Inspired by hybrid sensor fusion frameworks such as [15], this study designs an extended Kalman–particle filter (EKF-PF) hybrid architecture that achieves robust trajectory estimation in ice and snow sports scenarios through multimodal data fusion and physical constraint modeling. The architecture combines the complementary advantages of the model-based EKF and the data-driven PF: the EKF module uses the acceleration and angular velocity data of the IMU (sampling rate 200 Hz) to represent the rigid-body motion state on the Lie group SE(3), avoiding the Euler-angle singularity problem. Its state equation incorporates the moment-of-inertia tensor constraint of the ski equipment, keeping the attitude estimation error stable within 2°. The state equation is:

x_{t|t-1} = f(x_{t-1}, u_t)   (8)

x_{t|t-1} is the prior state estimate at time t; f is the state transition function; x_{t-1} is the state at the previous time step; u_t is the control input.

The particle filter module handles the nonlinear characteristics of ice and snow sports and introduces a multi-physics coupling model in the importance sampling stage. When the snowboard lands, Hertz contact theory is used to construct the snow-surface interaction model (stiffness coefficient k = 5×10⁴ N/m³), and the friction coefficient μ is dynamically adjusted according to the compression characteristics of the snow particles (adaptive changes in the range 0.03–0.15). In the airborne stage, the law of conservation of angular momentum is strictly followed, and the center-of-mass trajectory is corrected by constraining the moment-of-inertia ratio of the limbs relative to the trunk, so that the trajectory error of jumping actions is significantly reduced. The importance sampling weight update formula is:

w_t^{(i)} \propto w_{t-1}^{(i)} \cdot p(z_t \mid x_t^{(i)})   (9)

w_t^{(i)} and w_{t-1}^{(i)} are the weights of the i-th particle at times t and t−1; p is the observation probability, i.e., the probability of observing z_t in state x_t^{(i)}.

The fusion strategy of the hybrid architecture adopts a dynamic probability weighting mechanism: the confidence weight of the millimeter-wave radar (0.7) is adjusted in real time according to point cloud density and signal-to-noise ratio (within the range 0.6–0.8), and the IMU weight (0.3) is inversely correlated with its gyroscope zero-bias stability index.
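The sketch below illustrates the particle weight update of Eq. (9) together with an illustrative radar/IMU confidence weighting. The Gaussian observation likelihood and the mapping from point-cloud density and SNR to the radar weight are assumptions for demonstration, not the paper's exact models.

import numpy as np

def update_weights(weights, particles, z, sigma=0.05):
    """w_t ∝ w_{t-1} * p(z | x); p is an assumed isotropic Gaussian likelihood."""
    d2 = np.sum((particles - z) ** 2, axis=1)
    w = weights * np.exp(-0.5 * d2 / sigma ** 2)
    return w / (w.sum() + 1e-12)                          # normalize

def radar_confidence(point_density, snr_db, lo=0.6, hi=0.8):
    """Shift the nominal radar weight (0.7) within [0.6, 0.8] using an
    illustrative score built from point-cloud density and SNR."""
    score = np.tanh(point_density / 0.3) * np.clip(snr_db / 20.0, 0.0, 1.0)
    return np.clip(lo + (hi - lo) * score, lo, hi)

# usage with dummy data: 500 particles over a 3D joint position
rng = np.random.default_rng(2)
particles = rng.normal(size=(500, 3))
weights = np.full(500, 1.0 / 500)
z_radar = np.array([0.1, -0.2, 1.4])                      # hypothetical observation
weights = update_weights(weights, particles, z_radar)
w_radar = radar_confidence(point_density=0.3, snr_db=18.0)
w_imu = 1.0 - w_radar                                     # complementary IMU weight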
4 Results and discussion

4.1 Study design

The experimental protocol includes three levels of testing procedures. To ensure the reproducibility and rigor of the evaluation, the study design includes detailed specifications on trial repetition, control configurations, and validation protocols. The three-stage experiment comprises the following components: 1) Baseline Calibration: conducted in a controlled environment (−5 °C, 60% relative humidity), a KUKA KR6 R900 robotic arm equipped with a 10 dBsm radar corner reflector executes preset trajectories with linear velocities of 0–15 m/s and angular velocities up to 1080°/s. A total of 300 motion sequences were collected to calibrate the internal parameters of the mmWave radar and to establish the unoptimized system baseline. 2) Static Interference Test: five representative snow conditions (e.g., fresh snow, compacted snow, ice crystal snow) were simulated using a snow density gradient apparatus (0.1–0.4 g/cm³). Each scenario was repeated 20 times under a snowfall intensity of 5–7 mm/h to collect radar intermediate-frequency signals and environmental variables, serving as the static clutter reference. 3) Dynamic Motion Capture: 30 professional winter sports athletes (across freestyle skiing, alpine skiing, and snowboarding disciplines) performed standardized actions including linear gliding, sharp turning, and airborne rotations. Each action was repeated 15 times. Raw mmWave point clouds, inertial data, and optical motion capture (Vicon, 200 Hz) were synchronously recorded. The study uses both optimized and unoptimized radar pipelines to quantify performance gains.

All trials were conducted within a purpose-built climate-controlled snow chamber (5 m × 8 m × 3 m), with temperature regulation from −30 °C to 25 °C (±0.5 °C accuracy) and snow layers of 10–50 cm. All sensors were synchronized via the PTP protocol. A five-fold cross-validation strategy was applied by stratifying athletes across training and testing splits to ensure generalizability and prevent overfitting.

In order to verify the performance of the proposed framework in a real ice and snow environment, this study built a professional test environment with climate control. The core of the experimental platform is a customized ice and snow environment simulation cabin with a double-layer insulation structure. The inner layer is a 5 m × 8 m × 3 m test space equipped with a precision temperature control system (control range −30 °C to 25 °C, accuracy ±0.5 °C) and a humidity control device (relative humidity range 30%–90%). The floor of the cabin is paved with an artificial snow layer of adjustable thickness (10–50 cm), and the snow density is controlled in the range 0.1–0.4 g/cm³ to simulate different snow conditions. The test scenario configuration includes: 1) a multi-angle adjustable millimeter-wave radar array (four 77 GHz FMCW radars, bandwidth 4 GHz, maximum output power 10 dBm), installed at a height of 2.5 m and distributed in a ring; 2) a reference-level optical motion capture system (12 Vicon Vero series cameras, sampling rate 200 Hz) as the baseline ground truth; 3) distributed IMU nodes (9-axis sensors, bandwidth 200 Hz) fixed at the main joints of the subjects; 4) environmental parameter monitoring terminals, recording variables such as temperature, humidity, and wind speed in real time. All devices are time-synchronized through the PTP protocol, and data acquisition is controlled by a unified trigger signal to ensure the time-alignment accuracy of the multimodal data. During the test, the subjects wore standard skiing equipment and completed the specified action sequence (straight sliding, sharp turns, jumping, etc.), while millimeter-wave point clouds, IMU data, and optical motion capture coordinates were synchronously collected to construct a multidimensional dataset covering different motion states [16].
While the Vicon system provides high-accuracy ground truth, the framework is compatible with alternative motion tracking modalities, such as wearable IMUs or markerless systems like OpenPose [17], ensuring flexibility in deployment.

4.2 Quantitative indicator analysis

4.2.1 Motion capture accuracy verification

This paper selects 30 ice and snow athletes (including 10 freestyle skiing aerialists, 8 alpine skiing downhillers, and 12 snowboard big air athletes) to evaluate the algorithm optimization effect of millimeter-wave radar in extreme environments. The millimeter-wave radar before and after optimization captures the joint positioning RMSE of the athletes' movements and the angular velocity error of aerial rotation movements. The results are shown in Figures 3 and 4.

Figure 3: RMSE of joint positioning

As shown in Figure 3, the comparison of joint positioning RMSE before and after optimization of the millimeter-wave radar anti-interference algorithm shows that the joint positioning errors across the 30 ice and snow athletes are distributed in the range 7.0–13.0 cm before optimization, with the 13.0 cm error of athlete No. 30 being the maximum, reflecting the performance bottleneck of the millimeter-wave radar under extreme sports conditions. After optimization, the error range is compressed to 1.1–7.0 cm through the synergy of the cascaded clutter suppression module and the 3D convolution–LSTM spatiotemporal hybrid network. Athlete No. 22 achieves a breakthrough accuracy of 1.1 cm in straight-line gliding conditions, which is primarily attributed to the enhanced suppression effect of the improved MTI filter on snow- and fog-induced multipath interference. For each athlete, the joint positioning RMSE values represent the mean of 15 repeated trials, with standard-deviation error bars shown in Figure 3. A two-tailed paired t-test comparing the baseline and optimized systems revealed statistically significant improvements across all athletes (p < 0.01), demonstrating that the proposed anti-interference algorithm robustly enhances the motion capture accuracy of millimeter-wave radar in complex snow environments.
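The reported two-tailed paired t-test can be reproduced in outline as follows, using scipy.stats.ttest_rel on per-athlete mean RMSE values. The arrays below are placeholders and do not represent the study's data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
baseline_rmse = rng.uniform(7.0, 13.0, size=30)              # cm, illustrative only
optimized_rmse = rng.uniform(1.1, 7.0, size=30)              # cm, illustrative only

t_stat, p_value = stats.ttest_rel(baseline_rmse, optimized_rmse)   # paired, two-tailed
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")                # significant if p < 0.01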
Figure 4: Angular velocity error

As shown in Figure 4, the average angular velocity error of the millimeter-wave radar across the 30 ice and snow athletes before algorithm optimization is 5.25°/s (range 3.2–6.9°/s); athlete No. 25 showed the maximum error of 6.9°/s when completing a 1080° rotation, exposing the phase-loss problem of millimeter-wave radar in high angular velocity dynamic tracking. Through the joint optimization of the cascaded clutter suppression module and the 3D convolution–LSTM spatiotemporal hybrid network, the average angular velocity error is reduced to 1.46°/s (range 0–2.8°/s) after optimization, a decrease of 72%, and athletes No. 25 and 26 achieve zero-error tracking. The reduction in angular velocity error was statistically significant across athletes (paired t-test, p < 0.01), with error bars in Figure 4 indicating the standard deviation over 15 repetitions per action. The experimental results show that the proposed anti-interference algorithm significantly improves the angular velocity tracking accuracy of millimeter-wave radar in ice and snow sports scenarios, especially for difficult rotation movements. However, under extreme conditions there is still a residual error of 2.8°/s, mainly due to insufficient compensation for the Doppler frequency shift caused by high-speed movement. In the future, the tracking performance of the system in high-dynamic scenarios will be further improved by introducing an adaptive carrier tracking loop and hardware-accelerated processing to meet stringent motion capture accuracy requirements.

4.2.2 Anti-interference performance analysis

In response to the extreme weather interference common in ice and snow sports, this study builds a multi-physics field coupling model to test the anti-interference ability of millimeter-wave radar under different meteorological conditions. As shown in Table 2, in the simulated blizzard weather (snowfall > 5 mm/h) test, the proposed cascaded clutter suppression module shows excellent multipath interference suppression ability.

Table 2: Anti-interference ability test results

Test Scenario | Snowfall Intensity (mm/h) | Multipath Suppression Ratio (dB) | Positioning Error (cm)
Freestyle Ski Aerials | 5.8 | 28.2 | 3.2
Snowboard Big Air Landing | 6.3 | 25.7 | 7.0
Alpine Ski Downhill | 7.1 | 26.9 | 4.5
Cross-Country Ski Curves | 5.2 | 29.4 | 2.1
Biathlon Shooting | 6.0 | 27.5 | 3.8
Table 2 shows the anti-interference performance test data of millimeter-wave radar for different ice and snow sports scenes in a blizzard environment. In the 5.2–7.1 mm/h snowfall intensity range, the system shows excellent performance in the freestyle skiing aerials scene, achieving a multipath suppression ratio of 28.2 dB and a positioning error of 3.2 cm. This achievement is mainly due to the synergy of the cascaded clutter suppression module and the 3D convolution–LSTM spatiotemporal hybrid network. In contrast, the snowboard big air landing impact scene is affected by the 8 g impact acceleration, and the positioning error rises to 7.0 cm, directly reflecting the interference effect of carrier frequency deviation and multiple snow-layer reflections on the propagation of millimeter-wave signals. Although the dynamic waveform adjustment technology maintains the multipath suppression ratio at 25.7 dB in this scene, phase noise suppression still needs further optimization by improving the DPLL loop bandwidth. These quantitative results not only confirm the reliability of millimeter-wave radar in extreme ice and snow environments but also provide a clear direction for subsequent technology iterations, especially for the optimization of Doppler compensation algorithms in ultra-high-speed scenarios.

4.2.3 Real-time verification

The end-to-end processing time from radar signal input to trajectory output is recorded to evaluate the real-time performance of the millimeter-wave radar. The results are shown in Table 3.

Table 3: Real-time test results

Test Scenario | Processing Delay (ms) | Multi-Target Capacity | Frame Rate (fps)
Freestyle Ski Aerials | 24.2 ± 1.5 | 3 athletes | 38
Snowboard Big Air Landing | 22.8 ± 1.2 | 3 athletes | 40
Alpine Ski Downhill | 21.5 ± 0.8 | 3 athletes | 42
Cross-Country Ski Curves | 23.1 ± 1.1 | 3 athletes | 39
Biathlon Shooting | 25.6 ± 1.8 | 3 athletes | 36

Table 3 compares the real-time performance of the system across different ice and snow sports scenarios along four key dimensions. In terms of processing delay, all scenarios maintain latencies below 26 ms, with alpine skiing downhill achieving the best performance (21.5 ± 0.8 ms) and biathlon shooting exhibiting the highest delay (25.6 ± 1.8 ms). The multi-target tracking capability consistently supports the simultaneous tracking of three athletes across all test conditions. The system frame rate remains in the range of 36–42 fps, fully meeting the real-time demands of competitive snow sports motion capture. These results clearly demonstrate the real-time performance advantages of the proposed anti-interference algorithm in complex ice and snow environments, providing a reliable foundation for the practical deployment of markerless motion capture systems in elite athletic training and competition. It is important to distinguish between algorithmic latency and full-system latency. The 8 ms latency reported in Table 1 reflects only the simulation-based execution time of the optimized cascade pipeline, evaluated using FPGA and CUDA acceleration. In contrast, the real-world latencies presented in Table 3 include end-to-end delays such as radar signal acquisition, data transfer, and multi-target processing overhead, resulting in a total system delay of 21–26 ms. Despite this, the system remains within acceptable bounds for real-time snow sports motion capture.

5 Conclusion

This study proposes a cascaded anti-interference architecture to address multipath and frequency offset problems in mmWave radar-based motion capture under snowy environmental conditions. Through the integration of adaptive MTI filtering, genetic sparse array reconstruction, and hybrid carrier tracking, combined with a deep spatiotemporal 3D CNN–LSTM decoding network and multimodal EKF–PF fusion, the proposed system demonstrates significant improvements in accuracy, robustness, and real-time performance.

The main contributions of this study are as follows:
1) A three-stage signal processing pipeline is designed to suppress snow-induced multipath clutter and frequency distortion, improving low-SNR motion signal reconstruction.
2) A novel deep learning-based decoder is developed, leveraging 3D CNN and LSTM to model complex temporal-spatial dependencies in radar point clouds.
3) A multimodal fusion strategy integrating extended Kalman filtering and particle filtering is introduced for robust trajectory estimation in dynamic, cluttered environments.
The proposed method is validated on both simulated and real-world datasets involving elite snow sport athletes, showing that the system achieves centimeter-level RMSE accuracy (down to 1.1 cm) and end-to-end latency below 26 ms across diverse scenarios. In future work, we plan to enhance tracking under extreme dynamics by incorporating event-based vision sensors (e.g., DVS), which can further reduce motion blur and improve delay robustness in high-speed actions [18]. Additionally, integrating edge computing and hardware acceleration (e.g., FPGA optimization) will be explored to further optimize latency for large-scale deployment.

Funding

This study is supported by "Research on the Upgrading and Development of China's Sports Industry Driven by New Productive Forces" (No. SKL-2025-1135).

References

[1] H. Li, S. Qiu, and Y. Ma, "A survey on human activity recognition using millimeter-wave radar," ACM Comput. Surv., vol. 56, no. 4, pp. 1–36, 2023.
[2] X. Y. Zhang and G. P. Qiu, "Research on human motion capture based on improved LM algorithm and dynamic time warping algorithm," J. Southwest Univ. (Nat. Sci. Ed.), vol. 46, no. 5, pp. 175–185, 2024. https://doi.org/10.13718/j.cnki.xdzk.2024.05.016
[3] S. F. Li, X. S. Zhang, Y. Guo, X. C. Li, L. Shi, and T. H. Zhan, "Biomechanical study of markerless motion capture technology in FMS squat action," Med. Biomech., vol. 39, no. S01, p. 513, 2024.
[4] X. H. Li, D. F. Fan, J. J. Feng, Y. Lei, C. Cheng, and X. N. Li, "Systematic review of motion capture in virtual reality: Enhancing the precision of sports training," J. Ambient Intell. Smart Environ., vol. 17, no. 1, pp. 5–27, 2025. https://doi.org/10.3233/AIS-230
[5] H. Chen, "Human motion capture data retrieval and segmentation technology for professional sports training," J. Mobile Multimedia, vol. 19, no. 2, pp. 419–436, 2023. https://doi.org/10.13052/jmm1550-4646.1923
[6] B. Teer, "Performance analysis of sports training based on random forest algorithm and infrared motion capture," J. Intell. Fuzzy Syst., vol. 40, no. 4, pp. 6853–6863, 2021. https://doi.org/10.3233/JIFS-189517
[7] T. Alam and M. Benaida, "Smart curriculum mapping and its role in outcome-based education," Informatica, vol. 46, no. 4. https://doi.org/10.31449/inf.v46i4.3717
[8] E. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006. https://doi.org/10.1109/TIT.2006.885507
[9] Y. Zhang, X. Liu, and J. Wang, "Genetic sparse array optimization for millimeter-wave radar in snow interference environments," IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–12, 2023.
[10] N. Kumar and R. Patel, "Temperature-compensated PLL design for FMCW radar in harsh environments," IEEE Trans. Circuits Syst. I, vol. 68, no. 5, pp. 2065–2077, 2021.
[11] E. Baccarelli and M. Scarpiniti, "Robust deep filtering architectures for noisy radar environments," IEEE Access, vol. 11, pp. 22256–22267, 2023.
[12] X. Tang and Q. Song, "Comparative evaluation of radar-based and vision-based human motion capture systems," Meas. Sci. Technol., vol. 34, no. 2, Art. no. 025109, 2023.
[13] J. Wu, S. Zhao, and Y. Liu, "Deep spatiotemporal modeling with CNN-LSTM for real-time radar-based motion capture," Pattern Recognit., vol. 131, Art. no. 108885, 2022.
[14] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," in Adv. Neural Inf. Process. Syst., vol. 30, pp. 5998–6008, 2017. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[15] T. Chen, R. Zhang, and K. Huang, "Hybrid Kalman-particle filtering for multimodal sensor fusion," IEEE Sens. J., vol. 22, no. 9, pp. 8654–8664, 2022.
[16] S. Ahmed, S. Kim, and M. Park, "Snow Sense: A radar-based dataset for motion capture in snowy conditions," Sensors, vol. 22, no. 3, Art. no. 1011, 2022.
[17] Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172–186, 2019. https://doi.org/10.48550/arXiv.1812.08008
[18] G. Gallego, T. Delbrück, and D. Scaramuzza, "Event-based vision: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 154–180, 2022. https://doi.org/10.1109/TPAMI.2020.3008413
https://doi.org/10.31449/inf.v49i16.10164 Informatica 49 (2025) 361–372 361
Improved DenseNet-DCGAN for Enhanced Digital Restoration of
Embroidery Cultural Heritage
Guiying Dong1*, Qian Mao2
1College of Art and Design, Communication University of China Nanjing, Nanjing 210000, China
2Library, Nanjing University, Nanjing 210000, China
E-mail: adong118@126.com; maoqian328@163.com
*Corresponding author
Keywords: DCGAN, DenseNet, embroidery, image classification, image restoration
Received: July 14, 2025
At present, embroidery image restoration technology still has deficiencies in terms of color uniformity
and detail restoration. To address these issues, the study improves the densely connected convolutional
network and the deep convolutional generative adversarial network through spatial pyramid pooling, and
proposes a novel method for embroidery image classification and restoration. The experimental results
showed that the research method largely restored the details and colors of the original image and
effectively addressed the uneven color issue. The average prediction accuracy, recall rate, and specificity
of the image classification model on Suzhou embroidery, Hunan embroidery, Guangdong embroidery,
and Shu embroidery reached 96.3%, 98.5%, and 99.4%, respectively. The structural similarity index of
the image restoration model reached 0.99. The restored image was almost indistinguishable from the original to the
naked eye in terms of details, texture, and color. The research method has significant advantages in
classifying embroidery images and high-quality restoration tasks, and can provide reliable technical
support for the digital protection and intelligent restoration of traditional embroidery cultural relics.
Povzetek: Za klasifikacijo in digitalno obnovo vezenin so razviti izboljšani DenseNet in DCGAN z
dodanim SPP, razširjenimi konvolucijami ter CBAM. Izboljšani model skoraj povsem naravno obnovi
teksture in barve.
1 Introduction

Embroidery works have attracted countless people's attention with their exquisite craftsmanship, rich patterns, and profound cultural connotations. However, over time, many embroidery artifacts have suffered natural or human damage, such as fading, breakage, and insect infestation, which seriously threatens the preservation and inheritance of embroidery artifacts [1]. The traditional restoration of Embroidered Cultural Relics (ECR) mainly relies on manual skills. Although this method can finely handle every instance of damage, it is limited by low work efficiency and dependence on the superb skills of the restorer [2]. In addition, subjectivity in the manual repair process may also lead to deviations in the consistency and accuracy of the repair effect. In this context, the emergence of Artificial Intelligence (AI) technology, especially Deep Learning (DL) technology, has provided new solutions for the restoration of cultural relics. By training DL models, staff can automatically detect and classify the types of damage to cultural relics, providing a scientific basis for restoration work. Many researchers have already explored this direction. For example, Maitin et al. proposed a direct reconstruction technique without image segmentation using DL technology to reconstruct missing architectural elements in images of Greek temple ruins from virtual image paintings. This method successfully reconstructed the missing architectural elements, improving the efficiency of restoration and enhancing the consistency and accuracy of the restoration effect [3]. Alessandro et al. used a trained multidimensional DL neural network to associate color images with raw X-ray fluorescence imaging data to complete AI-based digital cultural heritage restoration, achieving digital restoration of graphic artworks [4]. With the further advancement of DL technology, Generative Adversarial Networks (GANs) have made breakthrough progress in image recognition, providing a good solution for cultural relic image restoration [5]. Praveen et al. proposed a new GAN-based art restoration method to digitally repair damaged artworks and assist in physical restoration. This method performed well in digital restoration and could effectively restore the original appearance of artworks, providing important guidance for physical restoration [6]. Zheng et al. proposed an Example Attention Generative Adversarial Network (EA-GAN) that fuses reference examples, which addressed the issue of significant reconstruction errors in traditional character restoration methods. Compared with existing inpainting networks, EA-GAN could obtain the correct text structure through the guidance of additional examples in the "example attention block".
The Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values increased by 9.82% and 1.82%, respectively [7].

In summary, numerous scholars have achieved significant results in cultural relic image restoration. However, GANs still have issues in image feature extraction, such as poor network training stability and poor generated image quality. At present, there is relatively limited discussion of embroidery classification and restoration within cultural relic image classification and restoration. Given this, this study innovatively constructs an ECR-Image Classification Model (ICM) based on the Densely Connected Convolutional Network (DenseNet) and an ECR-Image Restoration Model (IRM) based on the Deep Convolutional GAN (DCGAN). Based on these models, improvements are made by introducing Local Binary Patterns (LBPs), Canny operator edge extraction, and the Convolutional Block Attention Module (CBAM). The fusion of these technologies aims to enhance the model's capacity to capture details in ECR images, improve the precise reconstruction of textures and edges during the restoration process, and achieve higher-quality ECR image restoration results. The main novelties and contributions of this paper include: (1) For the first time, DenseNet is combined with Spatial Pyramid Pooling (SPP) and applied to classify embroidery images, improving recognition performance across styles and complex patterns; (2) The structure of the DCGAN generator and discriminator is innovatively adjusted. By integrating dilated convolutional layers, the receptive field of the model is expanded, which helps capture image features more comprehensively and achieve high-quality restoration of embroidery texture and color; (3) A large-scale dataset containing eight types of traditional embroidery images is constructed, providing fundamental support for subsequent research. The research results have practical value for the digital inheritance and AI-assisted restoration of traditional embroidery culture.

2 Methods and materials

2.1 Construction of ECR-ICM based on SPP-IDenseNet

ECR image classification is the prerequisite and foundation for ECR image restoration. By classifying ECR images, different embroidery types, styles, and eras can be quickly identified and distinguished, providing a scientific foundation for protecting the cultural relics. This study first explores the classification of ECR images. DenseNet was proposed by Huang et al. in 2017. It is a novel DL model architecture that establishes dense connections between network layers through DenseBlocks, thereby improving the information flow and gradient flow of the network, alleviating the problem of gradient vanishing, and promoting feature reuse [8-9]. The structure of a DenseBlock in DenseNet is displayed in Fig.1. In Fig.1, the connection mechanism of the DenseBlock is more aggressive than that of the Residual Network (ResNet). Each layer is connected to all previous layers, providing each layer with a rich input that integrates the features of all previous layers [10]. This design ensures the uniformity of feature map size within the DenseBlock and greatly promotes feature reuse through dense connections between layers, enabling the network to learn and transmit information more effectively [11]. However, DenseNet still has certain shortcomings in the image classification process, such as the input image size limitation and the problem of network training not converging [12-13]. Therefore, this study improves it through techniques such as SPP, LBP, and the Canny operator, and proposes a novel ECR-ICM model, namely the SPP-IDenseNet model. The training process of this model for embroidery image classification is shown in Fig.2.

In Fig.2, this study first randomly selects a batch of data from the training set based on a preset batch size and normalizes it to standardize the standard deviation of the Red-Green-Blue (RGB) color channels of each embroidery image. Subsequently, the normalized image is input into the network for forward propagation to extract features and predict categories. Secondly, by comparing the predicted categories of the network with the actual categories, the value of the loss function is calculated. Next, the weights are adjusted through the backward propagation process of the network to optimize the model's performance. After completing a batch of training, the system checks whether the entire dataset has been traversed. If the traversal is not complete, the model continues to process the next training batch and repeats the above steps. Once the training traversal of the entire dataset is completed, the model saves the weight parameters of the current round and evaluates whether the predetermined number of training rounds has been reached. If the training rounds have not been completed, the model restarts the training process and continues iterative optimization. After reaching the predetermined training rounds, model training terminates, and the weight parameters at this point are used for subsequent image classification tasks. The calculation of the RGB three-channel pixel values Output_R, Output_G, and Output_B of the normalized image is shown in formula (1).

Figure 1: Schematic structure of DenseBlock (Source from: https://colorhub.me/photos/e7RVB).
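A minimal PyTorch sketch of the DenseBlock connectivity shown in Fig.1 (and formalized in formula (2) below) is given here: each layer receives the concatenation of all previous feature maps. The channel counts and growth rate are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate   # the next layer sees all previous outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # M_n = F([M_1, ..., M_{n-1}])
            features.append(out)
        return torch.cat(features, dim=1)

# usage: 64 input channels + 4 layers x growth 32 = 192 output channels
block = DenseBlock(in_channels=64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 192, 32, 32])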
Figure 2: Training process of SPP-IDenseNet model for embroidery image classification (Source from: https://colorhub.me/photos/e7RVB).

Figure 3: Schematic diagram of SPP structure.

Output_R = \frac{Input_R - mean_R}{std_R}, \quad Output_G = \frac{Input_G - mean_G}{std_G}, \quad Output_B = \frac{Input_B - mean_B}{std_B}   (1)

In formula (1), Input_R, Input_G, and Input_B are the RGB three-channel pixel values of the image before normalization. mean_R, mean_G, and mean_B are the mean values of the RGB channels. std_R, std_G, and std_B represent the standard deviations of the RGB three channels. The output feature M_n is shown in formula (2).

M_n = F([M_1, M_2, \ldots, M_{n-1}])   (2)

In formula (2), n is the layer index of the model network, F(\cdot) is the convolution operation, and [\cdot] denotes feature concatenation. The loss l during the training process is shown in formula (3).

l = L(Y_1, Y_1')   (3)

In formula (3), L is the loss function, and Y_1 and Y_1' are the real category and the predicted category. The updated network weight \omega' is shown in formula (4).

\omega' = \omega - lr \cdot g(l)   (4)

In formula (4), \omega is the network weight before the update, and lr and g(\cdot) are the learning rate and the derivative calculation. In response to the input image size limitation of the DenseNet model in image classification tasks, this study uses SPP to enable the model to adapt to input images of different sizes. The structure of SPP is shown in Fig.3.
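Below is a minimal PyTorch sketch of the SPP module described in this section: the feature map is max-pooled over 1×1, 4×4, and 16×16 grids and the pooled maps are concatenated into a fixed-length vector for the FCL, with the 1×1 convolution used to halve the channel dimension. The exact channel counts are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    def __init__(self, in_channels: int, grids=(1, 4, 16)):
        super().__init__()
        self.grids = grids
        # 1x1 convolution to fine-tune (halve) the channel dimension
        self.reduce = nn.Conv2d(in_channels, in_channels // 2, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.leaky_relu(self.reduce(x), negative_slope=0.01)  # Leaky ReLU, slope 0.01
        pooled = [F.adaptive_max_pool2d(x, g).flatten(start_dim=1) for g in self.grids]
        return torch.cat(pooled, dim=1)   # fixed-length vector, independent of input size

# usage: two different feature-map sizes yield the same output length
spp = SPP(in_channels=512)
for s in (16, 20):
    out = spp(torch.randn(2, 512, s, s))
    print(out.shape)   # torch.Size([2, 256 * (1 + 16 + 256)])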
In Fig.3, this study integrates the SPP structure between the convolutional layer and the Fully Connected Layer (FCL) at the end of the DenseNet model. By dividing the feature map into grids of 1×1, 4×4, and 16×16 and applying max pooling, this study achieves comprehensive capture of features at different resolutions. These multi-scale pooled feature maps are then merged into a fixed-length feature vector, providing rich information for the input of the FCL. In addition, by pooling over windows of different sizes, this study generates feature maps with diverse resolutions and fine-tunes the channel dimensions through a 1×1 convolutional layer. The ReLU activation function used in DenseNet may cause neuron deactivation when the input is less than 0 [14]. Therefore, this study introduces the Leaky ReLU function and sets the negative slope coefficient to 0.01, effectively extending the applicability of ReLU and promoting the stability and convergence of network training. The SPP module enhances the model's understanding of the structural hierarchy of embroidery patterns through multi-scale pooling operations and improves the receptive field coverage of complex patterns. LBP extracts fine-grained texture features from embroidery images, enabling the model to pay more attention to the local texture restoration of the defect area. Canny edge detection provides clear structural contour constraints, guiding the generator to maintain the coherence and integrity of pattern edges. The three work in synergy, enhancing the quality and stability of image restoration across multiple dimensions such as structure, texture, and edge.

2.2 Construction of ECR-IRM based on Improved DCGAN

The SPP-IDenseNet model designed above provides strong technical support for the digital restoration and intelligent management of ECR. However, further technological innovation and method improvement are needed in ECR image restoration to achieve more efficient and accurate restoration results. Therefore, this study explores the restoration of ECR images. A GAN is a DL model containing two parts: the Generator and the Discriminator. Although GANs are widely popular in computer vision, in traditional GAN architectures, models do not rely on a determined distribution but instead use internal feedback to adjust their parameters [15]. Although this approach enhances the flexibility of the model, it may also cause training instability and sometimes even lead to model training crashes [16-17]. Therefore, this study further introduces a novel GAN derivative, namely DCGAN. This network can improve the quality of image generation and enhance the learning and representation capabilities of the model by combining the deep architecture of CNNs with the GAN framework. The generator extends and reshapes 100-dimensional noise into a 3D feature map through an FCL, and then gradually forms the final image size through upsampling and dimension adjustment by transposed convolutional layers. Batch normalization and ReLU are applied after each layer, and the output image is finally activated by a Sigmoid to produce a tensor image of the specified size [18-19]. The generator's loss function is shown in formula (5).

L_G = -\mathbb{E}_{z \sim p_z(z)}[\log(D(G(z)))]   (5)

In formula (5), \mathbb{E} is the expectation operator, usually taken as the average or expected value; z is a noise sample from the latent-space prior distribution; G(z) is the data generated by the generator from the noise sample z; and D(G(z)) is the discriminator's output for the generated data, representing the probability that it is real. The loss function of the discriminator is shown in formula (6).

L_D = -\mathbb{E}_{x \sim p_{data}(x)}[\log(D(x))] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (6)

In formula (6), D(x) is the discriminator's output for the real sample x, i.e., the probability that the sample is real. Based on the above formulas, compared to traditional GANs, DCGAN uses convolutional and deconvolution layers to replace the FCLs in traditional GANs. This operation can capture the local structure and spatial information of embroidery images [20]. In addition, DCGAN also uses batch normalization techniques and expected values to accelerate the training process and stabilize GAN training. The aim is to further enhance the performance of DCGAN in embroidery image restoration tasks, improve the naturalness of restoration effects, and provide experts with more accurate texture and color information to assist them in more refined restoration work. Given this, the study also improves DCGAN and proposes a new type of ECR-IRM, namely IDCGAN. The overall model structure is shown in Fig.4.

Figure 4: Overall structural framework of the IDCGAN (Source from: https://colorhub.me/photos/e7RVB).
Figure 5: Specific structure of the generator in the IDCGAN model.

Figure 6: Specific structure of the discriminator in the IDCGAN.

In Fig.4, innovative adjustments are made to the generator architecture by integrating dilated convolutional layers to expand the model's receptive field, thereby helping the model capture image features more comprehensively. At the same time, CBAM is introduced to enhance attention to key features at both the channel and spatial levels, improving the accuracy of image restoration. The discriminator adopts a strategy of increasing its depth and the number of FCLs, improving the network's ability to handle complex nonlinear problems and enabling it to more effectively distinguish between real and generated images. The loss function combines traditional MSE loss with adversarial loss. The calculation of the mean square error loss L_{MSE} is shown in formula (7).

L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - g_i)^2   (7)

In formula (7), g_i is the predicted value of the model on the training data x_i. The adversarial loss L_{adv} is shown in formula (8).

L_{adv} = \min_G \max_D \mathbb{E}_{x \sim P_{data}}[\log_2 D(x)] + \mathbb{E}_{z \sim P_z}[\log_2(1 - D(G(z)))]   (8)

In formula (8), the algebraic meanings remain the same as before. The specific structure of the generator in the IDCGAN model is shown in Fig.5.

In Fig.5, the architecture of the generator in the IDCGAN model mainly consists of three key modules, namely the convolution block, the dilated convolution block, and CBAM. The dilated convolution blocks use convolutional layers with different dilation rates, namely 2, 4, 8, and 16, to achieve multi-scale capture of image features. When the dilation rate is set to 1, the dilated convolution degenerates into a standard convolution operation. This is reflected in the Conv6 to Conv10 layers of the generator, forming a series of convolutional layers with different dilation rates that ensure the flexibility and adaptability of the network. The introduction of CBAM adds dynamic weighting capability to the generator. It can weight features in both channel and spatial dimensions, highlighting the features that have the greatest impact on image quality. The framework of the discriminator in the IDCGAN model is shown in Fig.6.

In Fig.6, to improve the performance of the discriminator on complex nonlinear problems, this study adds two FCLs to the original discriminator architecture, so that the discriminator contains a total of three FCLs. The interconnection of these layers enhances the discriminator's ability to learn features, thereby significantly improving model performance. Ultimately, the discriminator determines the authenticity of the input image through a binary classification task, distinguishing whether the image was generated by the generator or comes from the real dataset.
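The combined objective in formulas (7)-(8) can be sketched in PyTorch as a pixel-wise MSE between the restored and ground-truth images plus a standard adversarial term, as shown below. The weighting factor lambda_adv is an assumption for illustration; the paper does not state one here, and natural-log BCE is used in place of the base-2 logarithm.

import torch
import torch.nn as nn

mse = nn.MSELoss()
bce = nn.BCELoss()          # discriminator outputs are Sigmoid probabilities

def generator_loss(restored, target, d_fake, lambda_adv=0.01):
    """L = L_MSE + lambda_adv * adversarial term; d_fake = D(G(z)) in [0, 1]."""
    l_mse = mse(restored, target)                                  # formula (7)
    l_adv = bce(d_fake, torch.ones_like(d_fake))                   # -log D(G(z))
    return l_mse + lambda_adv * l_adv

def discriminator_loss(d_real, d_fake):
    """-log D(x) - log(1 - D(G(z))), the discriminator side of formula (8)."""
    return bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# usage with dummy tensors (batch of 256x256 RGB restorations)
restored, target = torch.rand(4, 3, 256, 256), torch.rand(4, 3, 256, 256)
d_real, d_fake = torch.rand(4, 1), torch.rand(4, 1)
g_loss = generator_loss(restored, target, d_fake)
d_loss = discriminator_loss(d_real, d_fake)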
The research is conducted based on a self-built embroidery image dataset. The images mainly come from digital museums, high-resolution cultural relic catalogues, and cultural heritage archives, covering multiple historical periods and diverse embroidery styles. The initial dataset contains 1,800 images; after expansion, the dataset ultimately includes 8,957 images. For unified model input, the images are cropped and scaled to 256×256 pixels, and normalization is carried out simultaneously. Ultimately, the dataset is divided into a training set and a test set in an 8:2 ratio. To simulate the common damage forms of ECR, the study also uses random occlusion to generate defect images. The occlusion forms include rectangles, free-shaped patterns, and speckled textures, and the occluded area ratio is controlled at 10% to 40%. On this basis, image augmentation is carried out by applying methods such as rotation, flipping, scaling, and color perturbation to improve the robustness and generalization ability of the model. In addition, by analyzing the color and style distribution of the images, a balanced sampling strategy is adopted to control category bias, ensuring the diversity and balance of the training data in terms of pattern style and damage type. All the code modules in the research are built based on the PyTorch framework. Some of the code is as follows:
import torch
import torch.nn as nn

# Simple Generator example
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1, 3, 64, 64)

# Simple Discriminator example
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Training example (pseudo code)
# z = torch.randn(batch_size, 100)
# fake_images = generator(z)
# real_output = discriminator(real_images)
# fake_output = discriminator(fake_images)

Figure 7: Code.
3 Results

3.1 SPP-IDenseNet model performance testing

The study adopts five-fold cross-validation to evaluate the model's performance. The training set is evenly divided into five subsets of similar size. Four subsets are selected in turn for model training, and the remaining subset is used as the validation set. This process is repeated five times to ensure that each subset participates in the validation. Through multiple rounds of training and validation, the mean and standard deviation of the model's accuracy, recall rate, and specificity are calculated, effectively avoiding the randomness of a single split and enhancing the statistical reliability and generalization ability of the evaluation results. Table 1 shows the experimental setup and environment parameters. According to the settings in Table 1, the effectiveness of the proposed model was first validated through ablation testing, as shown in Fig.8.

Table 1: Environment and parameter configuration.

Serial number | Experimental environment and hyperparameter category | Setting
1 | Num epochs | 200
2 | Pre-training | No
3 | Batch size | 20
4 | Num class | 8
5 | Optimizer | Adam
6 | Learning rate | 0.0001
7 | Development Environment | Windows 10
8 | CPU | Intel Core i9-10900K
9 | GPU | NVIDIA RTX 3090
10 | Memory | 64 GB
11 | Graphics Memory | 16 GB GDDR6X
12 | Programming Tools | PyTorch 1.6.0

Figure 8: Ablation test results of SPP-IDenseNet. (a) Training set; (b) Test set. Classification accuracy (%) versus sample size for DenseNet, SPP-DenseNet, IDenseNet, and SPP-IDenseNet.
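The five-fold cross-validation protocol described above can be sketched as follows with scikit-learn's StratifiedKFold over image indices and class labels. The train_and_evaluate call is hypothetical and stands in for the SPP-IDenseNet training loop.

import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.random.randint(0, 8, size=7165)          # 8 embroidery classes (illustrative 80% split)
indices = np.arange(len(labels))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(indices, labels)):
    # train_and_evaluate is hypothetical: it would fit SPP-IDenseNet on train_idx
    # and return (accuracy, recall, specificity) measured on val_idx.
    # acc, rec, spec = train_and_evaluate(train_idx, val_idx)
    acc, rec, spec = 0.0, 0.0, 0.0                    # placeholders
    fold_scores.append((acc, rec, spec))

mean_metrics = np.mean(fold_scores, axis=0)           # report mean ± std over folds
std_metrics = np.std(fold_scores, axis=0)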
Figs.8 (a) and (b) show the test results of the new model on the two datasets. As the number of test samples grows, the standalone DenseNet module shows lower classification accuracy in both datasets, with a maximum of only 65.3%. After successively introducing the SPP module, LBP, the Canny operator, and the Gabor filter module, the classification performance of the full model improves significantly. This result indicates that, when dealing with embroidery images with complex texture features, relying solely on global feature extraction has certain performance bottlenecks. The classification accuracy of SPP-IDenseNet reaches 96.4% on the training set and 95.6% on the test set. This study has improved various parts of the DenseNet model to varying degrees for classifying and recognizing ECR images, demonstrating the effectiveness of the improved method. In addition, popular ICMs of the same type, including Lightweight CNN (LCNN), Efficient CNN (ECNN), StyleGAN, and Global Image Spatial Texture (GIST), are introduced as comparative models. Performance tests are conducted using precision, recall, and specificity as indicators, as shown in Table 2.
Table 2: Multi-metric performance test results for different models.
Style                   Model            Precision/%   Recall/%   Specificity/%
Suzhou embroidery       LCNN             63.5          65.7       80.2
                        ECNN             67.2          69.8       81.6
                        GIST             70.3          68.7       83.4
                        StyleGAN         85.7          87.4       89.1
                        Research method  95.8          98.5       94.2
Hunan embroidery        LCNN             55.2          56.3       89.6
                        ECNN             58.7          60.4       90.2
                        GIST             60.2          61.7       91.6
                        StyleGAN         83.4          85.1       92.3
                        Research method  96.3          90.2       99.4
Cantonese embroidery    LCNN             57.6          59.8       53.8
                        ECNN             66.3          70.4       60.5
                        GIST             71.6          69.7       70.8
                        StyleGAN         80.2          82.5       75.4
                        Research method  95.1          90.8       95.1
Sichuan embroidery      LCNN             58.8          60.5       55.6
                        ECNN             62.8          68.8       58.3
                        GIST             70.4          73.4       60.7
                        StyleGAN         79.8          81.7       69.2
                        Research method  92.4          96.7       90.3
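The per-style precision, recall, and specificity reported in Table 2 can be derived from a multi-class confusion matrix of the kind shown in Fig. 9. The sketch below is only a minimal illustration of that computation, not the authors' code; the class count and matrix values are placeholders.

```python
# A minimal sketch of per-class precision, recall, and specificity derived from
# a multi-class confusion matrix; the matrix values below are hypothetical.
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class i but actually another class
    fn = cm.sum(axis=1) - tp          # true class i but predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return precision, recall, specificity

cm = np.array([[82, 1, 0],
               [2, 87, 3],
               [0, 1, 90]])
p, r, s = per_class_metrics(cm)
print(np.round(p * 100, 1), np.round(r * 100, 1), np.round(s * 100, 1))
```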
(a) Pre-improvement; (b) SPP-IDenseNet
Figure 9: Confusion matrix plots before and after model improvement.

3.2 Performance simulation testing of ECR-IRM for IDCGAN

This study uses the TensorFlow DL framework to implement the training and testing of the entire ECR-IRM. The weights β1 and β2 of the Adam optimizer are set to 0.5 and 0.9, respectively. The loss changes of the IDCGAN generator and discriminator at different network learning rates are shown in Fig. 10.
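As a hedged illustration of this configuration (not the authors' training script), the snippet below shows how Adam optimizers with β1 = 0.5, β2 = 0.9, and the finally selected learning rate of 0.00002 could be instantiated in TensorFlow/Keras; the loss object is a common DCGAN-style choice assumed here, and the generator and discriminator models themselves are omitted.

```python
# A minimal sketch of the optimizer settings reported for ECR-IRM training.
import tensorflow as tf

LEARNING_RATE = 2e-5  # learning rate finally selected in the paper

gen_optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, beta_1=0.5, beta_2=0.9)
disc_optimizer = tf.keras.optimizers.Adam(
    learning_rate=LEARNING_RATE, beta_1=0.5, beta_2=0.9)

# Binary cross-entropy is a common choice for DCGAN-style adversarial losses.
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
```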
(a) Loss of the generator at different learning rates; (b) Loss of the discriminator at different learning rates
Figure 10: Loss variation of IDCGAN between generator and discriminator at different learning rates.
(a) Original image; (b) Random masking; (c) DCGAN; (d) IDCGAN
Figure 11: Repair effects of the model before and after the improvement (Source from: https://colorhub.me/).
In Fig. 10 (a), the loss of the IDCGAN generator rises slowly as the number of training cycles grows, and the curve with a learning rate of 0.00002 shows a low and stable loss value, whereas the curves with learning rates of 0.002 and 0.0002 show higher loss values and larger fluctuations. In Fig. 10 (b), the discriminator loss slowly decreases as the number of training cycles increases. The curve with a learning rate of 0.00002 decreases the fastest and tends to stabilize, indicating that a smaller learning rate helps the discriminator learn more effectively; in contrast, the curves with learning rates of 0.002 and 0.0002 exhibit significant fluctuations and higher loss values. Based on the comprehensive experimental data, this study ultimately sets the network learning rate of the IDCGAN model to 0.00002. To verify the impact of the dilated convolutional layers, loss functions, and CBAM on model performance, the repair effect of the improved model before and after random occlusion is compared, as shown in Fig. 11.

Figs. 11 (a) to (d) show the original embroidery image, the image subjected to random occlusion, the image restored by the DCGAN model, and the image restored by the IDCGAN model. Comparing these images demonstrates the effectiveness of IDCGAN in handling different types of embroidery and varying degrees of occlusion. IDCGAN can enhance the focus on key features, enabling the restored image to largely recover the details and colors of the original image and effectively solving the problem of color non-uniformity. However, DCGAN's repair effect is not ideal when facing large-scale defects, and it cannot maintain good contextual consistency, resulting in poor repair performance.
This finding validates the necessity of improving the DCGAN. To further test the effectiveness of the research model in embroidery image restoration, the Cycle-Consistency GAN (CCGAN), Conditional GAN (CGAN), and Stacked GAN (Stack-GAN) models are introduced for comparison. The test results with SSIM as the experimental indicator are shown in Fig. 12.

Figs. 12 (a) and (b) show the SSIM performance comparison of the four models on the two datasets. In both the training and testing sets, the IDCGAN model performs the best, followed by Stack-GAN and CCGAN, while CGAN performs the worst. In the training set, the maximum SSIM values for CGAN, CCGAN, Stack-GAN, and the research model are 0.64, 0.72, 0.85, and 0.98; in the testing set, they are 0.69, 0.78, 0.90, and 0.99. These data indicate that the research model has significant advantages in maintaining image structure and quality. The reason is that the dilated convolution technique effectively expands the receptive field, allowing the model to capture richer contextual information in the image, while CBAM further enhances the model's attention to key features by weighting important features in both the channel and spatial dimensions. These improvements have led to the significant advantages of IDCGAN in embroidery image restoration. Finally, to confirm the resolution capability of the proposed model, this study also tests the four models using image clarity as an indicator, as shown in Fig. 13.

Figs. 13 (a) to (d) show the clarity of the CGAN, CCGAN, Stack-GAN, and IDCGAN models on the Yue embroidery image restoration task, and Fig. 13 (e) shows the clarity of the original image. The Yue embroidery restoration images generated by IDCGAN are visually very similar to the original images, and it is almost impossible to distinguish the quality differences with the naked eye. In contrast, there are significant differences between the restoration results of CGAN, CCGAN, and Stack-GAN and the original images; in particular, the images restored by the CGAN model show a significant decrease in clarity compared with the originals. In summary, the research model surpasses the comparative models in image resolution for Guangdong embroidery restoration, demonstrating its potential and advantages in embroidery image restoration.
(a) Training set; (b) Test set
Figure 12: Schematic of SSIM test results for different models.
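SSIM, the indicator used in Fig. 12, compares a restored image with its original in terms of luminance, contrast, and structure. The snippet below is a minimal sketch of how such a score could be computed with TensorFlow, not the authors' evaluation script; the image tensors are random placeholders.

```python
# A minimal sketch of SSIM computation between an original and a restored image.
import tensorflow as tf

original = tf.random.uniform((1, 256, 256, 3))   # stand-in for an original embroidery image
restored = tf.random.uniform((1, 256, 256, 3))   # stand-in for a model-restored image

ssim = tf.image.ssim(original, restored, max_val=1.0)
print(float(ssim[0]))  # 1.0 would mean structurally identical images
```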
Image gradient entropy of the restored images: (a) CGAN 5.65; (b) CCGAN 6.12; (c) Stack-GAN 6.78; (d) IDCGAN 7.82; (e) Original image 7.86.
Figure 13: The clarity of restored images of Cantonese embroidery (Source from: https://colorhub.me/photos/VXeo3).
4 Conclusion

The study focused on the task of image restoration of ECR and innovatively constructed an ECR-ICM based on SPP-IDenseNet and an ECR-IRM based on the improved DCGAN. The experimental results showed that the SPP-IDenseNet model achieved an average prediction accuracy of over 80% for the embroidery images of eight styles. The IRM could enhance the focus on key features, thereby enabling the restored image to largely recover the details and colors of the original image and effectively solving the problem of uneven color; the SSIM value reached 0.99. Furthermore, the research model could still maintain an excellent restoration effect even when dealing with large-area damaged embroidery images. The restored image of Cantonese embroidery was visually extremely similar to the original, and it was almost impossible to distinguish the quality difference with the naked eye. The results show that the research model achieves technological innovation and demonstrates significant advantages in practical applications.

However, the research model also has certain limitations. On the one hand, the current models mainly target 2D embroidery images; there is at present no adaptive research on complex 3D multi-level embroidery structures or heterogeneous multi-material embroidery patterns, which limits promotion and application in high-precision virtual restoration. On the other hand, owing to the deep generative network structure, the model depends to a certain extent on computing resources during the training and inference stages, which may pose practical challenges for resource-constrained cultural heritage conservation institutions or mobile deployments. Furthermore, for severely damaged or extremely blurry images, there is still a certain risk of distortion in the structural reconstruction produced by the research model. Future research can be carried out in the following directions: (1) expansion of model generalization ability: by integrating 3D reconstruction and multimodal input, the restoration ability for 3D ECR can be enhanced; (2) enhanced multi-material adaptability: a material perception module or style transfer mechanism can be introduced to achieve texture simulation and reconstruction of heterogeneous embroidery materials; (3) lightweight deployment optimization: by applying techniques such as model pruning, quantization, and distillation, the network structure can be compressed to adapt to edge devices or mobile terminal applications. Overall, the research method provides a feasible and effective technological path for ECR digital protection, with expected practical applications in digital museum construction, virtual restoration of cultural heritage, and reconstruction of cultural and creative models.
https://doi.org/10.31449/inf.v49i16.9995 Informatica 49 (2025) 373–396 373
Enhanced Prediction of Tropical Tree Biomass Using Ensemble
Models
Qiucai Dang
Zhumadian Preschool Education College, Zhumadian 463000, China
E-mail: Dqc336699@163.com
Keywords: above-ground biomass, below-ground biomass, ensemble stacking, grid search optimization
Received: July 3, 2025
The present paper proposes a novel model and investigates its utility in estimating tropical forest biomass. To address the multiplicity of variables, as well as the complexity and
nonlinear relationships between them, five Machine Learning (ML) models, namely Gradient Boosting
(GB), Extra Trees (ET), XGB, ElasticNet, and Poisson Regression, were employed to concurrently predict
both the below-ground and above-ground tree biomass (BGB and AGB, respectively), as well as the total
biomass (TB = BGB + AGB). Since the results of the aforementioned models were not entirely satisfactory,
an additional model called the Stacking Ensemble (SE) was introduced. Each model can have its
parameters optimized by Grid Search with cross-validation to make sure that there is generalization and
consistent performance. The data collected were based on 175 trees from 27 ecoregional plots located in
the Central Highlands ecoregion of Vietnam. The dataset was processed to investigate the proposed
model's ability to predict tree biomass. The study's findings revealed that the proposed method
demonstrated strong and efficient predictive capabilities for biomass estimation in forest ecoregions. The
Stacking model showed the most significant improvements, with the highest R2 (0.968) and VAF (0.971), the lowest errors, and an MDAPE of 23.081 percent, indicating strong predictive ability and minimal bias. Although its residual STD (105.763) was marginally higher, the error reductions and overall strength of the model outweighed this variance. Thus, incorporating a Stacking Ensemble (SE) model strengthens the ML approach for predicting forest tree biomass.
Povzetek: Študija predlaga ansambelski model za napoved tropske drevesne biomase, ki združuje pet ML-
modelov in optimizacijo z iskanjem po mreži. Stacking Ensemble doseže najboljša napovedovanja ter
najnižje napake, kar občutno izboljša oceno nadzemne, podzemne in skupne biomase.
1 Introduction

1.1 The role of biomass

Biomass plays an unquestionable role as one of the world's vital sources of energy [1]. The open question is which model can appropriately recognize and predict its traits. Zhantao Song et al. (2024) discussed original visions of the biomass pyrolysis process. They examined the contribution of various factors to the challenging prediction of physicochemical traits by applying machine learning techniques such as Random Forest, gradient boosting decision trees, and extreme gradient boosting, in which R2 was higher than 0.97 for the prediction and analysis of particular surface area, yield, and N content of biochar [1]. In another study, Jia et al. (2024) exploited machine learning methods to anticipate zeolite-catalyzed biomass pyrolysis; the Random Forest algorithm achieved the highest prediction performance, with R² > 0.91 for their suggested models. They concluded that their selected factors and methods based on biomass characteristics can be taken into account as a plausible reference [2].

1.2 Above-ground biomass (AGB)

The term above-ground biomass (AGB) refers to the product of above-ground volume (AGV) and vegetation mass. It is closely linked to the carbon cycle in global grassland ecosystems, and accurate estimation of AGB variations is essential for assessing carbon decomposition and its impact on climate change. It is also crucial to screen in situ-harvested AGB data before modeling [3]. Furthermore, AGB is an indispensable factor for evaluating ecosystem health and carbon storage. To estimate AGB, the above-ground volume (AGV) of vegetation is considered a high-priority parameter in research [4].

To estimate AGB variations of China's grassland ecosystems, machine learning algorithms were applied, among which the Random Forest model, with R2 = 0.83 (i.e., explaining 83% of the harvested AGB variation) and RMSE = 43.84 g m−2, revealed accurate performance in estimating grassland AGB [3]. Mao et al. (2021) demonstrated with their proposed model that structural, textural, and spectral metrics contribute to shrub AGV models, and they suggested a direct reference for specifying proper vegetation metrics to screen shrub AGV.
The efficiency, accuracy, and low cost of their proposed approach for digital terrain model (DTM) output and AGV estimation are considered its strengths; thus, it can bridge the gap between ground-based research and satellite remote sensing [4]. May et al. (2024) obtained spatially complete predictions of biomass in a tropical area. They state that this sort of spatially coherent AGB data supplied by their model is useful for validating eco-friendly forest management, carbon decomposition innovations, and climate change mitigation [5].

1.3 Below-ground biomass (BGB)

Below-ground biomass (BGB) is a significant part of forest tree biomass; however, fewer studies have focused on BGB in relation to forest biomass and carbon, largely because measuring BGB in large trees is costly and time-consuming. As a result, researchers often use above-ground biomass (AGB) to estimate BGB by applying a root-to-shoot ratio, and specific direct BGB equations have also been developed for different forest types [6]. In a recent study, Oliveira et al. (2024) suggested that predicting peanut BGB using their proposed alternative method, the multi-output regression (MTR) approach, would enable both researchers and farmers to quantify BGB more accurately. They proposed this method to predict multiple peanut maturity indices at the field level, helping to reduce subjectivity in determining peanut maturity [7].

1.4 Ensemble approaches

Ensemble learning is a potent machine learning technique that reduces overfitting, boosts robustness, and enhances overall performance by combining predictions from several models. Ensemble approaches combine the advantages of multiple algorithms to improve generalization rather than depending on a single model [8]. Stacking, also known as stacked generalization, is a versatile and successful ensemble technique that mixes different kinds of models, possibly with different architectures and learning strategies, in contrast to bagging (e.g., Random Forest) or boosting (e.g., Gradient Boosting, XGBoost), which combine similar models (typically decision trees) [9]. Naik et al. (2022) utilized automated stacked ensemble modelling powered by machine learning for predicting aboveground biomass in forests using multitemporal Sentinel-2 data [10]. A stacking ensemble algorithm was used by Zhang et al. (2022) to reduce the biases in estimates of forest aboveground biomass derived from several remotely sensed datasets [11]. Besides, Jin et al. (2025) evaluated the impact of validation techniques and ensemble learning algorithms on estimating aboveground biomass in forests in a case study of natural secondary forests [12]. To this end, they developed models based on various outcomes, qualified to synchronously anticipate AGB, BGB, and the total amount of tree biomass, i.e., TB, in forest areas, solving the problem of carbon estimation for various forest sites.

1.5 Regression models

It is appropriate to take a brief glimpse at the regression models proposed in the present article.

The Gradient Boosting (GB) model is regarded as a strong ML algorithm for numerical optimization problems. GB was developed by Leo Breiman (1998) and Jerome Friedman (2001); the former used it to decrease variance in classification, and the latter improved it for regression and classification models. GB algorithms carry out numerical optimization for regression and classification models by repeatedly stepping approximately along the negative gradient of the loss function. Because it is impossible to move exactly along the negative gradient, a GB model normally applies a weak learner to estimate the direction of steepest descent [13].

Extra Trees (ET), a recently developed regression model, is an ensemble ML algorithm based on decision trees. Originally, ET is an improved form of the Random Forest algorithm for regression or classification. What makes the ET regression algorithm more competitive for small-sample ML is that it utilizes all the data to build the node splits of the decision trees effectively [14]. Wang et al. (2023) provided an efficient ML model utilizing an ET regression algorithm for anticipating the relevant synthesis gas traits in biomass chemical looping gasification, and then compared the predictive ability of the ET model with traditional ones. In another study, using both RF and ET algorithm models, researchers developed a general model to precisely predict the co-pyrolysis of coal and biomass, in which ET performed better [15]. ET is advantageous because it achieves more efficient performance than Random Forest. Compared with RF, ET does not perform bootstrap aggregation; it takes the data without replacement, and nodes are split at random cut-points rather than at the best splits. Therefore, in the ET regression model, randomness comes not from bootstrap aggregation but from the random splits of the data [16]. According to Roy (2021), RF was introduced to overcome the problems of decision trees, giving medium variance, and ET was proposed when accuracy is more crucial than a generalized model; it also delivers low variance.

Extreme Gradient Boosting (XGB) is another strong, multifaceted ML algorithm used for regression and classification tasks. It is well known for its exceptional predictive capability and its handling of intricate datasets. GB involves a series of procedures that prepare models in sequence, in which the errors produced previously are corrected by each new model. XGBoost is an ensemble learning technique that mixes the predictions of various ML models to yield a more precise ultimate prediction; it also uses decision trees as base learners during its process. Furthermore, XGB is designed to exploit high-capacity processors and distributed computation efficiently [17].
Ayub et al. (2023) applied an XGB model to a multi-level factorial design outcome to predict and improve the gasification product, in which the XGB model showed good prediction accuracy as well as model optimization analysis. The key characteristics of XGB are its ability to handle complicated relations in data with regularizing techniques, effectively preventing overfitting, and its efficient computation owing to parallel processing. It uses decision trees as base learners and then applies regularization for model generalization at higher dimensionality. XGB, widely acknowledged for its computational efficiency, provides fast processing together with perceptive analysis of feature significance, and deals with missing values smoothly [18].

ElasticNet, a powerful linear regression technique highly beneficial in ML and statistical modeling, surpasses traditional linear regression models. It is able to mix the penalties of both Lasso and Ridge regression, which is useful in particular when traditional linear regression struggles with multicollinearity, i.e., when predictors are highly correlated [19]. ElasticNet is thus advantageous for handling multi-dimensional datasets, selecting significant features, and being a more consistent and reliable model where collinearity exists. Aimed at solving regression problems and improving model performance, ElasticNet offers effective analytical means for handling multi-dimensional regression; its common applications include feature selection, regression analysis, and predictive modeling [20]. The significance of ElasticNet regression includes multicollinearity handling, automatic feature selection, aiding model interpretability and reducing overfitting, flexible regularization that allows researchers to control the balance between the Lasso and Ridge penalties, robustness in high-dimensional data, and appropriateness for a variety of regression problems [15].

Poisson regression is a regression analysis whose response is based on the Poisson distribution. The regression suffers from the limitation that the variance equals the mean, called equidispersion. When this assumption is violated, the standard errors become biased, the test statistics drawn from the model are less exact, and consequently the obtained conclusions are less valid; the Poisson regression model therefore cannot be used under over-dispersion or under-dispersion. Poisson regression is one of the generalized linear models, and it finds its main application in modeling rarely occurring events [21].

The Stacking Ensemble (SE) model makes use of an ensemble generalizing approach through learning, even though it may lack instructions for appropriate non-hyperparameterized meta-learners. Stacking is necessary when multiple ML methods reveal different advantages for a certain task; in this case, the stacking ensemble method employs a discrete ML technique for specifying the efficient application of the various algorithms [22]. For this reason, Arif et al. (2024) developed a stacking ensemble model from a non-homogeneous mixture of fundamental models, for accurate yet interpretable prediction of lung cancer prognosis, so as to recognize crucial risk factors [18].

The use of DL methods is unquestionably dominant over other traditional methods, particularly in tropical forest biomass research [6]. Although many studies have investigated tree biomass prediction by applying various models [6], the applied models are well established but lack the combination of ML models, ensembling, and hyperparameter optimization approaches. This work adds value by combining them in a meticulously designed Stacking Ensemble specifically tailored to predicting AGB, BGB, and TB using a small, real-world dataset from 27 ecoregional plots in Vietnam. The Fit Index (FI), a stability-focused evaluation metric that has not previously been used in biomass prediction, is introduced in this study. The proposed approach provides new methodological insights that improve prediction accuracy and generalizability in tropical biomass estimation by combining rigorous preprocessing, multi-target modelling within an ecological context, and systematic hyperparameter tuning through Grid Search. Furthermore, this work differs from earlier black-box DL applications in that it incorporates Shapley Additive Explanations (SHAP) for ecological feature interpretation, which offers important ecological insight. Hence, this study was conducted to serve the purpose of bridging this gap. This subject is an expansion of an ongoing strategy to integrate remote sensing inputs acquired by satellite or drone with ground-measured biomass determinations in order to develop a spatially superior and temporally dynamic biomass forecast model. Besides the plausible analytical foundation of the process, the model is capable of capturing some facets of complex nonlinear responses and enhancing the accuracy of biomass prediction over wider geographical areas and timeframes thanks to the use of sophisticated Stacking ensembles enabled by Grid Search and cross-validation. In addition, climatic variables can be included to forecast changes in biomass distribution under future climate change scenarios, which can provide significant insight both for forest management and for carbon budgeting. That is to say, designing a new model qualified to anticipate tree BGB, AGB, and the total tree biomass TB (i.e., TB = BGB + AGB) concurrently will fulfil the requirement of estimating forest carbon. On this account, making use of a set of up-to-date regression algorithms to increase the reliability of the estimation of the aforementioned parameters, as well as that of the newly proposed model, will assist the progressing literature in the realm of forestry science. The study proposes that integrated ensemble models will anticipate tropical tree biomass better than traditional modeling systems, so that the model will be dominant over conventional ones. The study objectives are twofold: firstly, designing a model to concurrently anticipate tree AGB, BGB, and TB, guaranteeing additivity, for the tropical forests of Vietnam known as Dipterocarp and Evergreen Broadleaf, and secondly, cross-validating
errors compared to a traditional model, applying the same dataset and predictors in the mentioned forests.

The rest of this paper is structured as follows. Section 2 discusses the detailed methodology, including the materials and data used in this work. Section 3 presents numerical analyses, graphical analyses, and experimental results under the heading of results and discussion. Lastly, Section 4 summarizes the concluding points of the study.

2 Methodology

The present paper aimed to investigate the efficiency of a state-of-the-art model qualified for predicting tropical forest tree biomass. This study was conducted in one of Vietnam's eight highest tropical forest regions, the Central Highlands ecoregion. Two main tropical forest categories were selected as the focus of the research, i.e., Dipterocarp and Evergreen Broadleaf (see Fig. 1).

Figure 1: Sample plot locations for the Dipterocarp and Evergreen Broadleaf forests in the Central Highlands ecoregion, Vietnam
In this work, the dataset exploited was collected in a research study conducted by Huy et al. [6]. The collected data were based on 175 trees from 27 ecoregional plots located in the Central Highlands, Vietnam. We clearly define the dataset partitioning strategy to ensure reproducibility: the entire dataset of 175 samples was randomly divided into training (80%) and testing (20%) sets, and cross-validation was used over repeated iterations to ensure robust evaluation and minimize sampling bias. To ensure compatibility across models and better convergence during training, feature preprocessing involved removing outliers and normalizing all input variables to a [0, 1] range using Min-Max scaling. The hyperparameter tuning process was carried out using Grid Search with 5-fold internal cross-validation for each machine learning model: Poisson regression, ElasticNet, XGB, Extra Trees (ET), and Gradient Boosting (GB). This allowed us to systematically explore parameter combinations and choose those that produced the best performance on the training data. Based on the results of cross-validation, the Grid Search methodically investigates a predetermined set of hyperparameter values to determine which combination produces the best model performance.

A customized grid of important hyperparameters was built for every model. For instance, tree-based models such as GB, ET, and XGB had their learning rate, maximum depth, and number of estimators adjusted.
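As a hedged, minimal sketch of this preprocessing and partitioning step (the column names and values below are synthetic placeholders, not the Huy et al. data), the following shows an 80/20 random split with Min-Max scaling fitted on the training set only:

```python
# A minimal sketch of the 80/20 split and [0, 1] Min-Max scaling described above.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the 175-tree dataset (columns are illustrative only).
df = pd.DataFrame({
    "DBH": rng.uniform(5, 80, 175),      # diameter at breast height
    "H": rng.uniform(4, 35, 175),        # tree height
    "WD": rng.uniform(0.3, 0.9, 175),    # wood density
})
df["AGB"] = 0.1 * df["DBH"] ** 2 * df["H"] * df["WD"] / 100
df["BGB"] = 0.2 * df["AGB"]
df["TB"] = df["AGB"] + df["BGB"]

X = df[["DBH", "H", "WD"]]
y = df[["AGB", "BGB", "TB"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()                  # fit on the training set only to avoid leakage
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```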
The L1 ratio and alpha (regularization strength) were adjusted for ElasticNet, and likewise the pertinent parameters for the stacking meta-learner and Poisson regression. In order to maximize generalization and performance consistency, Grid Search was used within a cross-validation framework to guarantee that each model was trained with the best parameter settings. This method greatly enhanced both the Stacking Ensemble's overall performance and the accuracy of the individual models.

The desired targets in this dataset were the amount of above-ground tropical biomass (AGB), the amount of below-ground tropical biomass (BGB), and the total tropical tree biomass (TB), equal to the summation of the below-ground and above-ground tree biomass (i.e., TB = BGB + AGB). Preprocessing and normalization operations were performed on the data.

To serve the purpose of the study, five ML models used as base learners, including GB, ET, XGBoost, ElasticNet, and Poisson regression, were employed to synchronously anticipate the below-ground and above-ground tree biomass (BGB and AGB, respectively) as well as the total amount of tree biomass, TB = BGB + AGB. Owing to the individual models' mediocre performance, these five models were used as base learners to create a Stacking Ensemble (SE); a meta-learner was then trained on their predictions to generate the final prediction for every biomass component.

For the purpose of assessing and selecting the most efficient model able to concurrently anticipate tropical tree BGB, AGB, and TB, a thorough cross-validation process was carried out. The total number of records was 175, randomly split ten times into two sections: 140 (80%) for training and 35 (20%) for testing, to allow impartial evaluation. The data were divided into training and testing sets to conduct an analysis satisfying accuracy and reliability in this research. A wide range of assessment metrics, such as MSE, RMSE, MAE, R2, STD, NMSE, MDAPE, and VAF, were used to evaluate performance. In addition, the Fit Index (FI), a goodness-of-fit metric intended to assess the quality of predictions across several cross-validation realizations, was computed; a higher FI value, approaching 1, indicates a better fit. The formula for calculating the FI is presented below:

$FI = \frac{1}{k}\sum_{j=1}^{k}\left(1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y}_i)^2}\right)$   (1)

In the equation above, k stands for the number of realizations (in this study k = 10), m is the number of trees sampled in the validation dataset, $y_i$ is the observed value, $\hat{y}_i$ represents the predicted value, and $\bar{y}_i$ is the averaged value of BGB, AGB, and TB of the ith validated tree in the kth realization.

The study goal was to evaluate accuracy and model consistency in light of the ecological context and the small dataset size. Metrics like R2 and VAF measure the percentage of variance explained by the models, while MSE, RMSE, and MAE quantify absolute prediction errors. STD and NMSE aid in understanding normalization effects and error distribution. MDAPE is a reliable percentage-based metric that works especially well with data containing outliers or skewness, which is typical of biomass measurements. The Fit Index (FI), a new and comprehensible metric designed for model comparison across several validation folds, was introduced to reward both accuracy and stability. Ultimately, when combined, these metrics ensure that the assessment covers robustness, interpretability, and predictive accuracy, all of which are critical components for ecological modelling and decision-making, and they demonstrate that the Stacking Ensemble model was more efficient than the other compared models. The equations for the evaluation metric criteria are given in Table 1.
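A minimal sketch of Eq. (1) is given below, assuming per-realization arrays of observed and predicted values; the numbers are random placeholders, not the study's data.

```python
# A minimal sketch of the Fit Index (FI): the coefficient of determination
# averaged over k cross-validation realizations, as in Eq. (1).
import numpy as np

def fit_index(y_true_folds, y_pred_folds):
    """Each list element holds the observed / predicted values of one realization."""
    scores = []
    for y, y_hat in zip(y_true_folds, y_pred_folds):
        y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
        scores.append(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(1)
y_folds = [rng.uniform(10, 500, 35) for _ in range(10)]        # k = 10 realizations, m = 35 trees
pred_folds = [y + rng.normal(0, 20, 35) for y in y_folds]      # hypothetical predictions
print(round(fit_index(y_folds, pred_folds), 3))
```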
Table 1: Equations for the evaluation of the statistical metric criteria
MSE     Mean Squared Error                 $MSE(y,\hat{y}) = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$
RMSE    Root Mean Square Error             $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$
MAE     Mean Absolute Error                $MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
R2      Determination Coefficient          $R^2(y,\hat{y}) = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$
STD     Standard Deviation                 $STD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
NMSE    Normalized Mean Square Error       $NMSE = \frac{\lVert x - y \rVert^2}{\lVert x - \bar{x} \rVert^2}$
MDAPE   Median Absolute Percentage Error   $MDAPE = \mathrm{median}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right|\right) \times 100\%$
VAF     Variance Account Factor            $VAF = \left(1 - \frac{\mathrm{var}(t_n - y_n)}{\mathrm{var}(t_n)}\right) \times 100$
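The sketch below computes the Table 1 criteria for a single target; it is an illustration rather than the authors' code, and the NMSE, MDAPE, and VAF forms follow the common definitions assumed in the reconstructed table above (the source layout of these formulas is partly garbled).

```python
# A minimal sketch of the statistical criteria of Table 1 for one target.
import numpy as np

def evaluate(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    resid = y - y_hat
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(resid))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    std = np.std(resid, ddof=1)                      # standard deviation of the residuals
    nmse = np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    mdape = np.median(np.abs(resid / y)) * 100.0
    vaf = (1.0 - np.var(resid) / np.var(y)) * 100.0
    return dict(MSE=mse, RMSE=rmse, MAE=mae, R2=r2, STD=std, NMSE=nmse, MDAPE=mdape, VAF=vaf)

rng = np.random.default_rng(2)
y = rng.uniform(20, 800, 35)                         # placeholder observed biomass values
y_hat = y + rng.normal(0, 30, 35)                    # placeholder predictions
print({k: round(v, 3) for k, v in evaluate(y, y_hat).items()})
```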
In the above equations, n (or N) represents the number of observations, $y_i$ is the ith observed value, $\hat{y}_i$ is the ith predicted value, and $\bar{y}$ is the average of the observations.

Graphical analyses were also carried out to assess the accuracy of the recommended model's performance. Illustrated in different plots, they give the reader illuminating insights into the suitability and accuracy of the models, all of which are discussed in Section 3 of this paper. As an overview of the research, the general flowcharts of the whole study are shown below (see Fig. 2 and Fig. 3). Fig. 2 gives a brief overview of the step-by-step research methodology: the research process begins with the dataset, goes through analysis and normalization, and then divides the normalized data into training and test sets; the proposed ML models are evaluated against an array of specific metrics to select an appropriate model, which turns out to be the Stacking Ensemble; finally, the ensemble models are also assessed on the basis of the evaluation metrics to choose the best one, and the results are saved for future use.
Figure 2: General flowchart of the whole research process for applying the proposed model
Figure 3 shows the modeling procedure, involving the data collection process and then the application of six ML models to concurrently predict tropical forest tree biomass, specifying the most reliable model for such prediction by comparing the selected ML models with the aid of the evaluation metrics. Its main steps are: Step 1, start modeling (based on "Multi-output deep learning models enhance the reliability of simultaneous above- and belowground biomass predictions in tropical forests"); Step 2, data collection and definition of the inputs and targets in the dataset; Step 3, input selection and splitting of the dataset into training (80%) and testing (20%) sets; Step 4, application of the ML models; Step 5, the Stacking ensemble model; Step 6, tuning by Grid Search; Step 7, selection of the best model by comparing the six machine learning models with the evaluation metrics.
Figure 3: Flowchart of the modeling procedure showing the process of employing the ML models concurrently
The flowcharts of the six regression models, ElasticNet, GB, ET, XGB, Poisson, and SE, are illustrated in the following figures, and the hyperparameters are optimized through Grid Search tuning. These models were utilized with the goal of synchronously predicting AGB, BGB, and TB = BGB + AGB. Each proposed regression model is also discussed briefly below.

2.1 Base machine learning models

ElasticNet regression is an extension of linear regression that integrates both the Lasso (L1) and Ridge (L2) regularization penalties into the loss function. This combination allows ElasticNet to deal with multicollinearity among predictors. Alpha (α) is the regularization strength parameter in ElasticNet; it supervises the overall strength of regularization applied to the model. For α = 0, no regularization is applied and ElasticNet equals Ordinary Least Squares (OLS) regression; for α = 1, the regularizations of both L1 and L2 are applied, blending their penalties; for 0 < α < 1, the model employs a mixture of L1 and L2 regularization, permitting a flexible blend of penalties. The L1 ratio (l1_ratio) is the blending parameter that identifies the balance between the L1 and L2 penalties; it controls the proportion of the penalty assigned to the L1 norm relative to the L2 norm. For l1_ratio = 0, the model applies only L2 regularization (which equals Ridge regression); for l1_ratio = 1, it uses only L1 regularization (which equals Lasso regression); for 0 < l1_ratio < 1, the two penalties are mixed. In the Gradient Boosting formulation, each stage adds a weak learner scaled by a factor ν > 0, which is the assumed learning rate [9].
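A minimal sketch of the ElasticNet configuration discussed here is given below; alpha controls the overall regularization strength and l1_ratio the blend between the L1 (Lasso) and L2 (Ridge) penalties. The values and data are illustrative, not the study's tuned settings.

```python
# A minimal sketch of ElasticNet with the alpha / l1_ratio parameters described above.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=140, n_features=8, noise=5.0, random_state=0)

model = ElasticNet(alpha=0.1, l1_ratio=0.5)   # l1_ratio=0 -> Ridge-like, l1_ratio=1 -> Lasso-like
model.fit(X, y)
print(model.coef_)                            # some coefficients may be driven to zero by the L1 part
```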
As the GB model is represented in Fig. 6, the data first go through bootstrap sampling and are split into T subsets D1, D2, …, DT; a decision tree h1, h2, …, hT is fitted to each subset, and the individual results h1(x), h2(x), …, hT(x) are averaged to obtain the final result H(x).
Figure 6: The stages of the Gradient Boosting (GB) model employed for tropical tree biomass predictions
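The pipeline depicted in Fig. 6 (bootstrap subsets, one decision tree per subset, averaged predictions) can be sketched as follows; this is an illustration of the figure's scheme on synthetic placeholder data, not the paper's implementation.

```python
# A minimal sketch of the Fig. 6 pipeline: bootstrap samples -> trees -> averaged H(x).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, (140, 3))
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, 140)

T = 100                                              # number of bootstrap subsets / trees
trees = []
for _ in range(T):
    idx = rng.integers(0, len(X), len(X))            # bootstrap sample D_t
    trees.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

H = np.mean([t.predict(X) for t in trees], axis=0)   # averaged final result H(x)
print(round(np.mean((y - H) ** 2), 3))
```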
Given that the Poisson distribution is the basis of Poisson regression, it identifies the probability of the occurrence of any number of events in a fixed interval of time or space, under the assumption that the events occur at a constant rate and are independent of each other. The Poisson distribution is calculated from the formula below:

$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$   (4)

In the above equation, X is the number of random occurrences and λ represents the average (mean) number of events. Poisson regression exploits this distribution to provide an understanding of the relationships between the predictor variables and the count data in the dataset. In this regression, the expected value (mean) of the count variable Y is modeled as a linear mixture of the predictor variables X:

$\lambda = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)$   (5)

in which λ is the expected count, representing the occurrence rate, β0 is the intercept term, and β1, β2, …, βn are the coefficients of the predictor variables. The link function in Poisson regression is the natural logarithm (log-link), ensuring that the predicted values are not negative. The model is estimated via maximum likelihood estimation, and the coefficients (β) are chosen to maximize the probability of observing the actual count data in the model [26].

The approach for the application of the Poisson model is illustrated in Fig. 7, which proceeds through several stages in a linear pattern. To begin with, a point cloud is taken as input; second, the surface normal of every point is detected by computing the eigenvector over the k-nearest neighbors of each point. Third, an octree with a predefined depth d is selected for storing the reconstructed surface. Then, the gradient of the indicator function (∇x) is equated to the vector field V defined by the point cloud. The next stage involves defining an indicator function x with the value of 1 inside and 0 outside the surface; thus ∇x = V, and applying the divergence operator to both sides gives Δx ≡ ∇·∇x = ∇·V. In the next stage, the indicator function x is solved as a standard Poisson problem, the marching cubes algorithm is used to extract the surface from the solved indicator function x, and eventually the reconstructed surface is stored in the octree of depth d. Since AGB, BGB, and TB are skewed and non-negative, Poisson regression was employed; although it was initially created for count data, its formulation fits biomass distributions quite well. To make sure the Poisson model's assumptions held in this situation, diagnostic tests were conducted.
Figure 7: The stages of the Poisson model employed for tropical tree biomass predictions
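A minimal sketch of a log-link Poisson regression of the kind described above is shown below, using scikit-learn's PoissonRegressor on synthetic placeholder data (the target is kept positive and right-skewed, mirroring the reason the authors give for considering this GLM for biomass).

```python
# A minimal sketch of Poisson (log-link) regression on a positive, skewed target.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, (140, 3))
y = np.exp(1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]) + rng.uniform(0, 0.5, 140)  # positive, skewed target

model = PoissonRegressor(alpha=1e-3, max_iter=300)   # alpha is the L2 regularization strength
model.fit(X, y)
print(model.coef_, model.intercept_)
```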
2.2 Hyperparameter tuning

Grid Search was applied in a 5-fold cross-validation framework to find optimal hyperparameters for all the models. For tree-based models like Gradient Boosting, Extra Trees, and XGB, the number of estimators, the learning rate, and the maximum tree depth were systematically varied to balance model complexity and predictability. For ElasticNet, the regularization parameter (alpha) and the L1 fraction were tuned to prevent overfitting and induce sparsity. For Poisson regression, the regularization parameters and the number of iterations were tuned to achieve better convergence. After separately tuning each of the base models, their outputs were fed into a meta-learner in the Stacking Ensemble, whose parameters were also tuned via Grid Search. This broad tuning process ensured that all models, including the ensemble, reached optimal generalization and performance [27].

The Stacking Ensemble was selected because of the meta-learner it includes, as it blends heterogeneous base learners with varying predictive ability and error behaviors and generalizes well. Because of its broad applicability for addressing multicollinearity in the regression predictions of the base models and for capturing non-linear relationships, a tree-based learner was applied as the meta-model in this research. The Stacking model, with an RMSE of 18.298, an MAE of 12.422, and an R2 of 0.968, performed significantly better than any of the base models on the test data, meaning that the ensemble was able to selectively leverage the strengths of each of the related models to generate more stable and accurate biomass predictions.

In the Stacking model, presented in Fig. 8, the training data are processed by the level-0 models separately, and each model's prediction results are gathered as additional processed training data in the study. All of the base learners' predictions (GB, ET, XGB, ElasticNet, and Poisson) were aggregated by the Stacking Ensemble. To avoid overfitting and information leakage, the meta-learner, a Ridge regression model, was trained on out-of-fold predictions. We were able to improve overall predictive performance by combining the complementary strengths of all models, capturing distribution-specific, linear, and non-linear trends, in this ensemble. Table 2 provides the hyperparameters chosen by the stacking meta-learner for the models.
Training data → level-0 base models → predictions → meta-learner → final prediction
Figure 8: Stacking model procedures used for tropical tree biomass
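Under assumed settings, the Stacking Ensemble described here can be sketched as follows: the five base learners feed out-of-fold predictions to a Ridge meta-learner. XGBoost is replaced by scikit-learn's HistGradientBoostingRegressor so that the example stays self-contained; the hyperparameters and data are illustrative only.

```python
# A minimal sketch of a stacking ensemble with a Ridge meta-learner.
from sklearn.datasets import make_regression
from sklearn.ensemble import (StackingRegressor, GradientBoostingRegressor,
                              ExtraTreesRegressor, HistGradientBoostingRegressor)
from sklearn.linear_model import ElasticNet, PoissonRegressor, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=175, n_features=8, noise=15.0, random_state=0)
y = y - y.min() + 1.0                       # keep the target positive for the Poisson base learner
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

base_learners = [
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("et", ExtraTreesRegressor(n_estimators=300, random_state=0)),
    ("xgb_like", HistGradientBoostingRegressor(random_state=0)),
    ("enet", ElasticNet(alpha=0.1, l1_ratio=0.5)),
    ("poisson", PoissonRegressor(max_iter=500)),
]
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)
stack.fit(X_tr, y_tr)                       # out-of-fold base predictions train the meta-learner
print(round(r2_score(y_te, stack.predict(X_te)), 3))
```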
Last but not least, XGB is one of the common algorithms in ML. It is based on the ensemble learning framework and follows the gradient boosting algorithm; thus it is applicable to supervised learning tasks, i.e., regression, ranking, and classification. XGB is a predictive model that combines the predictions of multiple individual models iteratively. It works by adding weak learners into the ensemble one after another, such that at every step a new learner tries to correct the errors of the prior ones, and it minimizes a prespecified loss function on the training data using a form of gradient descent optimization [13].

In summary, XGB is developed in three straightforward stages. First, a primary model F0 is used to predict the target variable; the XGB model is associated with the residual (y − F0). Second, the residuals obtained in the prior stage are fitted by a new model called h1. Third, the combination of F0 and h1 delivers F1, the boosted form of F0, so that the MSE of F1 is lower than that of F0:

$F_1(x) \leftarrow F_0(x) + h_1(x)$   (6)

To improve F1's performance, a residual model of F1 can be designed, yielding a new model F2:

$F_2(x) \leftarrow F_1(x) + h_2(x)$   (7)

This process is iterated for n stages until the residuals are minimized as far as possible, i.e.,

$F_n(x) \leftarrow F_{n-1}(x) + h_n(x)$   (8)

It is worth mentioning that the additive learners do not alter the functions developed in prior iterations but add information of their own in order to bring down the error values. The model begins with some function F0(x), which needs to minimize the loss function (here the MSE); hence:

$F_0(x) = \arg\min_{\gamma}\sum_{i=1}^{n} L(y_i, \gamma) = \arg\min_{\gamma}\sum_{i=1}^{n}(y_i - \gamma)^2$   (9)

Differentiating this expression with respect to γ shows that it is minimized at the mean of the $y_i$, i = 1, …, n. Thus, the boosting model can proceed with:

$F_0(x) = \frac{\sum_{i=1}^{n} y_i}{n}$   (10)

F0(x) represents the first step of the predictions in this model; next, for each instance, the residual error is expressed as $y_i - F_0(x)$ [28].

In Fig. 9, the XGB model employs a multifaceted approach to make predictions on the input data; afterwards, the average of the predictions is calculated, and an ultimate XGB prediction is thus generated. Because of its exceptional performance with structured tabular data and its integrated regularization, which helps avoid overfitting, XGB was included. It was well suited for this task because of its efficient handling of non-linearities, support for missing values, and robustness to noise, even with the small sample size. Grid Search was used to optimize important hyperparameters, such as the learning rate, maximum depth, and gamma.
Input data → predictions from the individual trees → average of predictions → final XGBoost prediction
Figure 9: The procedure used in the XGB model for predicting tropical tree biomass
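The additive scheme of Eqs. (6)–(10) can be sketched as follows: start from the mean prediction F0, fit a weak learner to the residuals, and add it to the ensemble. This is only an illustration of the principle on synthetic data, not the paper's XGBoost configuration.

```python
# A minimal sketch of additive boosting: F_m = F_{m-1} + lr * h_m fitted to residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (140, 3))
y = 5 * X[:, 0] + np.sin(X[:, 1]) * 10 + rng.normal(0, 1, 140)

F = np.full_like(y, y.mean())          # F0(x): the mean minimizes the squared loss (Eq. 10)
learning_rate = 0.1
for _ in range(100):                   # n boosting stages (Eq. 8)
    residuals = y - F                  # negative gradient of the squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)   # h_m(x) fitted to residuals
    F = F + learning_rate * tree.predict(X)                       # F_m = F_{m-1} + lr * h_m

print(round(np.mean((y - F) ** 2), 3))  # training MSE decreases over the stages
```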
3 Results and discussion

3.1 Exploratory data analysis

To display how closely related the multiple variables of the study data are, a Pearson correlation heatmap is exploited as an effective color-coded visual matrix (see Fig. 10). Variables are arranged in rows and columns, and the cells define the pairwise relationships between variables. The color shading of each cell indicates the direction and strength of the correlation: the darker the color of a cell, the stronger the correlation of the related variables. As is obvious from this tabulated heatmap, the colors are darker for stronger correlations and lighter for weaker ones. Additionally, green colors represent positive correlations, i.e., when one variable increases, the other variable tends to go up, whereas purple colors have been used for negative correlations, where an increase in one variable is accompanied by a drop in the other.
Figure 10: Pearson Correlation Heatmap for detecting the relationship between studied variables.
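A minimal sketch of a Pearson correlation heatmap of the kind shown in Fig. 10 is given below; the columns are synthetic stand-ins for the dataset's plot and tree attributes, and the colormap choice is only meant to echo the green/purple coding described above.

```python
# A minimal sketch of a Pearson correlation heatmap on placeholder variables.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(175, 4)), columns=["DBH", "H", "WD", "AGB"])
df["AGB"] = 2 * df["DBH"] + rng.normal(0, 0.5, 175)   # induce one strong positive correlation

sns.heatmap(df.corr(method="pearson"), annot=True, cmap="PRGn", vmin=-1, vmax=1)
plt.title("Pearson correlation heatmap (illustrative)")
plt.tight_layout()
plt.show()
```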
In Fig. 11, a pair-plot visualization of the distribution of the dataset parameters is shown for exploratory analysis of the data. In a pair plot, the data are visualized to find the relationships between variables, whether continuous or categorical, or to reveal the most clearly divided clusters. The dispersions of the parameters indicate that most features are not evenly distributed: CA, WT, and P are skewed or clustered, with their values concentrated in particular ranges. Scatter plots such as CA versus WT or HA versus CA show positive relationships that hold well, indicating potential multicollinearity that could be important to the model. By contrast, variables such as the forest type code and soil type code appear as horizontal bands or discrete groups, as they are categorical. These trends suggest that the explanatory power of the dataset is partly due to a mixture of continuous gradations combined with categorical differences. The class distributions are depicted by the colors, and a clear grouping can be observed in the plots, whether by altitude, CA, or WT. The pair plot thus provides a high-level interface for deriving enlightening statistical information about the dataset; the variation in each panel can be observed, and the diagonal panels show the distribution of each variable.

A pair plot for the relationships between the variables and the total amount of biomass, TB, is also demonstrated in Fig. 12, which more explicitly explores how the CTB classes are distinguished in terms of the predictors. In this instance, the scatter plots show that for most variable pairs the CTB categories are predominantly overlapping, indicating that no single set of variables can completely separate the CTB classes. However, there are regions, particularly in pairings like CA vs. WT or CA vs. P, where some CTB groups cluster more closely together or are bunched into narrower value ranges. The histograms on the diagonal also emphasize the bunched character of the observations within given intervals, further underscoring that the dataset is skewed in its variable distributions. This class grouping within specific regions suggests that individual variables may not always be able to differentiate CTB, but groups of predictors likely have predictive value. Further, the mixture of continuous and discrete variables introduces difficulty, as seen in the scatter plots, where some categories of CTB extend across different bands while others overlap.
Figure 11: Pair plot for specifying the distribution of dataset parameters as well as their relationship
Figure 12: Pair plot for showing the relationship between the variables of total tropical tree biomass (TB) and their
distributions
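A minimal sketch of the pair-plot exploration of Figs. 11 and 12 is given below, drawn on synthetic stand-in columns and a derived class label; the real plots use the dataset's predictors and the TB/CTB classes.

```python
# A minimal sketch of a class-colored pair plot on placeholder variables.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "CA": rng.lognormal(1.0, 0.4, 175),          # skewed continuous variable
    "WT": rng.lognormal(0.5, 0.3, 175),
    "P": rng.uniform(1200, 2400, 175),
})
df["CTB_class"] = pd.qcut(df["CA"] * df["WT"], q=3, labels=["low", "medium", "high"])

sns.pairplot(df, hue="CTB_class", corner=True)   # diagonal panels show each variable's distribution
plt.show()
```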
3.2. Machine learning results
In the present investigation, six methods were employed in two forest locations, i.e., Dipterocarp and Evergreen Broadleaf, to concurrently anticipate BGB, AGB, and TB = BGB + AGB. To assess the base models' performance, Table 3 presents the error metric results for each recommended model on the train and test data. By comparing these error metrics along with FI, it was found that the Stacking Ensemble model was superior to the other models. The very large R² values close to 0.999 on the training dataset are an indicator of overfitting or data leaks. To prevent this, we made sure to have rigorous separation of the training and test data, and we optimized our hyperparameters using Grid Search with cross-validation to prevent overfitting. The lower R² values on the testing data (such as 0.968 for the best model) are indicative of a lack of severe overfitting, so the overfitting appears to be contained. Further improvements with regard to regularization and data augmentation will be necessary in future computations to minimize the chances of overfitting. The test results indicate that the Stacking model outperforms the others in nearly all metrics, demonstrating higher predictive accuracy and reliability. Its mean squared error (MSE) is considerably low at 334.82, indicating lower average squared discrepancies between the predicted and actual values compared to other models like ElasticNet (2378.17) and Extra Trees (1216.54). In the same vein, the root mean squared error (RMSE) for Stacking is 18.30, a far cry from those of ElasticNet (48.77) and Gradient Boosting (41.75), meaning its predictions were more precise. Mean absolute error (MAE) shows the same pattern, at 12.42 for Stacking, a far better performance than for models such as Poisson regression (19.32) and XGB (21.18).
In regard to explained variance and fit, Stacking had the best R² value of 0.968 across all models, which means it accounts for nearly 97% of the test data variance. This is
significantly better than ElasticNet's R² score of 0.77 and Extra Trees' R² score of 0.88. The Stacking model's normalized mean squared error (NMSE) is 0.032, the minimum, with no significant normalized error against the data variance. Likewise, its variance accounted for (VAF) is 0.971, indicating strong consistency between predicted and actual values. The median absolute percentage error (MDAPE) of 23.08 and the standard deviation of residuals (STD_dev) of 105.76 also illustrate the consistency of the model's performance.
On the other hand, the other models show higher error measures and lower variance explanation, with the Stacking model remaining the most accurate and consistent for the test set in this comparison. This claim is based on the higher R² value of the Stacking model for both train and test data. According to this table, higher R² and VAF values mark the better model, in this case the SE model; conversely, the lower the other metrics, such as MSE, RMSE, MAE, NMSE, MDAPE, and STD, the more the model has the merit of being an efficient predictor. Therefore, the Stacking model is deemed the most efficient and performs better than the other models for both training and testing data. In contrast, ElasticNet shows weaker performance in predicting the variables. Furthermore, the results of the employed evaluation metrics are presented and thoroughly discussed using relevant figures at the end of this section. Improved accuracy and stability on both forest types were indicated by the lower MSE, RMSE, and MDAPE, and the higher R² and VAF, of the Stacking Ensemble, which consistently outperformed the other algorithms. ElasticNet performed poorly because of its linear framework, which failed to properly capture the intricate, nonlinear patterns in biomass data. Because Stacking possessed the ability to combine the powers of tree-based models like GB, ET, and XGB, it outperformed them even though they individually performed only moderately.
Figure 13 below illustrates the values obtained by the ML models, i.e., ElasticNet, Extra Trees, GB, Poisson, Stacking, and XGB; accordingly, a detailed comparison of these values, along with their distance from the target values, is presented.
Table 3: Error metric results for the proposed ML models on the train and test datasets.
Metrics ElasticNet Extra Trees GB Poisson Stacking XGB
Train
MSE 4026.476 4.933E-26 5.800 1725.207 1153.269 1.72544E-05
RMSE 63.455 2.221E-13 2.408 41.536 33.960 0.004
MAE 32.550 7.82E-14 1.821 16.042 11.562 0.003
R2 0.788 0.999 0.999 0.909 0.939 0.999
NMSE 0.212 2.601E-30 0.000 0.091 0.061 9.09812E-10
MDAPE 67.280 1.651E-13 5.016 29.535 8.722 0.008
STD_dev 100.279 137.713 137.491 150.171 134.774 137.713
VAF 0.788 0.999 0.999 0.909 0.939 0.999
Test
MSE 2378.167 1216.538 1743.162 1051.301 334.820 2090.796
RMSE 48.766 34.879 41.751 32.424 18.298 45.725
MAE 36.818 18.068 21.948 19.320 12.422 21.178
R2 0.770 0.882 0.831 0.898 0.968 0.797
NMSE 0.230 0.118 0.169 0.102 0.032 0.203
MDAPE 68.390 32.640 34.406 31.075 23.081 30.099
STD_dev 100.272 85.241 76.508 78.865 105.763 75.760
VAF 0.777 0.894 0.853 0.924 0.971 0.817
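To make the reported quantities concrete, the following is a minimal sketch (not the authors' code) of how the Table 3 metrics can be computed with NumPy and scikit-learn; MDAPE is taken as the median absolute percentage error and VAF as one minus the ratio of residual variance to observation variance, following the definitions used in the text.

```python
# Error metrics for one model, given arrays of actual and predicted biomass values.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def error_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    residuals = y_true - y_pred
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        # NMSE: squared error normalised by the variance of the observations
        "NMSE": mse / np.var(y_true),
        # MDAPE: median absolute percentage error (in %)
        "MDAPE": 100.0 * np.median(np.abs(residuals / y_true)),
        # STD_dev: standard deviation of the residuals
        "STD_dev": np.std(residuals),
        # VAF: variance accounted for
        "VAF": 1.0 - np.var(residuals) / np.var(y_true),
    }
```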
Figure 13: Value plot comparing the ML models' predicted values with the target values.
Figure 14 shows the results for R², a significant error metric criterion, suggesting how well the employed ML models' predictions fit the real data. As shown in the figure, a model whose prediction values align closely with the norm line (where R² = 1) is considered superior and more accurate. This is in line with the higher R² values of the proposed Stacking model, approximately 0.968 for the test data and 0.939 for the training data.
[Figure 14 panels: predicted vs. actual values (train and test) with the Y = X reference line for ElasticNet (R² train 0.788, test 0.770), Extra Trees (0.999, 0.882), XGBoost (0.999, 0.797), GBM (0.999, 0.831), Poisson (0.909, 0.898), and Stacking (0.939, 0.968).]
Figure 14: Comparing the coefficient of determination (R2) for each ML model.
The frequency of each error value for each ML method's predictions is represented in Fig. 15. The error analysis was conducted for both train and test parameters, and the ML models were assessed to examine their accuracy. As a result, the error in an ML model's prediction performance ought to be almost zero for it to be an adequate model for the aim of the study.
[Figure 15 panels: per-observation error values (rows 0-175) for XGBoost, Stacking, Poisson, GBM, Extra Trees, and ElasticNet, shown for the train and test sets.]
Figure 15: Comparing error values for ML models
Moreover, according to Fig. 16, the error values for the proposed ML models are illustrated from the smallest error value to the largest, for both the train and test data of each model, moving from left to right. A model with the smallest error values (i.e., approximately zero) would be the best predictor among the employed ML models. This visualization highlights that the data has intense recurrent peaks, suggesting non-uniform distributions with dominating clusters, and that these patterns persist but evolve subtly across different sections of the dataset. Where one of the groups corresponds to the Stacking model, its behaviour can be visually compared with the other groups by looking at how close its mean error is to zero and how small and steady its standard deviation is.
Based on the plot, we see that Stacking seems to be more accurate than the individual models, but by a very small margin. Compared to their predictions, it has fewer errors and reduced variance, indicating that it has stronger generalization and stability. On the contrary, although the other models have also performed adequately, they exhibit some spread or deviations that are a bit higher than the mean. Overall, it appears that Stacking produces a more consistent and less erratic result than the single models, thus comparing favourably to them in terms of performance.
Figure 16: Boxplot of ML models’ error values for both train and test data
The following figure (Fig. 17) illustrates a comparison of the proposed models in terms of two important statistical evaluation metrics, namely R² and VAF, estimated for both the test and train datasets. As the values of these metrics show, all the models perform efficiently in the prediction except the ElasticNet model, which performs more weakly than the others, having lower VAF and R². The Stacking and XGB models perform more strongly than the rest of the models, bearing higher VAF and R².
[Figure 17: stacked comparison of R² and VAF (train and test) for each model; values as reported in Table 3.]
Figure 17: Comparison of Models based on VAF and R2 metrics.
The other evaluation metrics, including MSE, NMSE, MAE, RMSE, STD, and MDAPE, are applied for comparison among the models, supposing that the lowest value of these metrics for a model marks it as the best predictor. In this case, the Stacking model for both the train and test datasets is the lowest compared to the other models (see Fig. 18).
[Figure 18 panels: radar charts of MSE, NMSE, MAE, RMSE, STD, and MDAPE (train and test) for ElasticNet, Extra Trees, GBM, Poisson, Stacking, and XGBoost; values as reported in Table 3.]
Figure 18: Comparison of Models based on MSE, NMSE, MAE, RMSE, STD, and MDAPE metrics.
Another graphical tool used to compare the models' performance is the Taylor diagram. This diagram evaluates models based on their accuracy, using metrics such as the correlation coefficient, standard deviation (STD), and RMSE. In the diagram, the models' performance is represented by circles, where better performance is indicated by points closer to the reference point [29]. Taylor diagrams for predicting tropical tree biomass are shown in Fig. 19. As seen in these diagrams, the RMSE of the Stacking model is lower than that of the other models, and its correlation coefficient exceeds 0.9, outperforming the other models in this regard. In contrast, the RMSE of the ElasticNet model is higher than that of the other machine learning (ML) models, and its correlation coefficient is lower. These findings, based on the correlation coefficient, STD, and RMSE, confirm that the Stacking model outperforms the other models.
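A Taylor diagram summarises three statistics per model; the short sketch below shows how they could be computed (a plotting library would then place each model on the diagram). This is only an illustration under the standard definitions, not the authors' implementation.

```python
# Statistics behind a Taylor diagram: correlation with the observations, the
# standard deviations, and the centred RMS difference (radial distance from
# the reference point).
import numpy as np

def taylor_stats(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    corr = np.corrcoef(y_true, y_pred)[0, 1]
    crmsd = np.sqrt(np.mean(((y_pred - y_pred.mean()) - (y_true - y_true.mean())) ** 2))
    return {"corr": corr, "std_pred": np.std(y_pred),
            "std_obs": np.std(y_true), "crmsd": crmsd}
```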
[Figure 19 panels: Taylor diagrams based on R² (test and train) and RMSE (test and train).]
Figure 19: Taylor diagrams for models’ comparison based on RMSE, STD, and R metrics
The last plot to be discussed for model comparison is the Williams plot. This plot is used to compare a specific group of compounds in terms of leverage values and standardized residuals [30]. The Williams plot shows the standardized residuals on the y-axis and the leverages on the x-axis for the training and testing datasets. From this plot, the applicability domain is implemented within a squared area inside ±2 standard deviations and a threshold h* in leverage (h* = 3p'/n, p' being the number of model parameters and n the number of compounds). The majority of the data ought to be located within this area, meaning that the points are inliers and influential in the model. The Williams plots of the Stacking model on both training and test sets are indicators of good model performance and generalization. Most of the observations in the test plot lie comfortably within the satisfactory limits for leverage and standardized residuals (±2), which is an indicator that the model predictions are unbiased and stable and possess minimal outliers. Only a minimal number of observations fall outside the ±2 boundary and the leverage constraint, signifying that there are very few influential or problematic points. Similarly, the training plot shows tightly clumped residuals around zero, with the majority of data points having little leverage, which means that the model has not over-fit the training data. Even though some of the residuals fall outside ±3 or possess relatively higher leverage, those are scattered and do not invalidate the model. The similar trend in both plots confirms that the Stacking model works well on unseen data, learning the inherent pattern without being overfit, and is robust. To be an efficient model predictor, the data must lie within this domain (See Fig. 20) [23].
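The quantities behind a Williams plot can be sketched as below; the feature matrix X, the treatment of p' as the number of descriptors plus one, and the residual limit are assumptions for illustration and may differ from the authors' exact implementation.

```python
# Leverage (diagonal of the hat matrix) and standardized residuals for a Williams plot.
import numpy as np

def williams_quantities(X, y_true, y_pred, residual_limit=2.0):
    X = np.asarray(X, float)
    residuals = np.asarray(y_true, float) - np.asarray(y_pred, float)
    n, p = X.shape
    # Hat matrix H = X (X'X)^-1 X'; its diagonal gives the leverage of each sample
    hat = X @ np.linalg.pinv(X.T @ X) @ X.T
    leverage = np.diag(hat)
    h_star = 3 * (p + 1) / n                     # warning leverage threshold h* = 3p'/n
    std_residuals = residuals / residuals.std()
    # Points inside the applicability domain (limits of +/-2 are used in the text)
    inside = (np.abs(std_residuals) <= residual_limit) & (leverage <= h_star)
    return leverage, std_residuals, h_star, inside
```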
[Figure 20 panels: Williams plots (TB, train and test) for ElasticNet, Extra Trees, GBM, Poisson, Stacking, and XGBoost.]
Figure 20: Williams plots for models’ comparison based on standard residuals and leverage.
3.3. Comparison with foundation models and LLMs
Large Language Models, or LLMs, are sophisticated AI systems that have been trained on enormous text datasets to comprehend and produce human language. Google created BERT, which is well suited for tasks like classification and question answering because it can analyze words in both directions and understand context. With its emphasis on producing relevant and coherent text, OpenAI's GPT is effective for tasks like content creation, dialogue, and summarization. Both use the Transformer architecture, but GPT is more focused on generation and BERT on comprehension. The proposed SE model adds domain-specific efficiency, whereas models such as BERT and GPT are effective for general-purpose NLP tasks. We specifically highlighted how, in contrast to the extensive, data-intensive training of LLMs, SE makes use of structured, domain-relevant features. This study also indicates SE's improved interpretability and reduced computational cost, both of which are important for ecological modelling.

4 Conclusions
The study was implemented on eight tropical forests in Vietnam, using the forestry variables, i.e., AGB, BGB, and TB. In an attempt to solve the problem of predicting the mentioned variables, the study used an MGDL regression strategy, which later proved to be an efficient model with a strong ability to predict tropical forest biomass. To this end, five models were selected as major algorithms to unravel the issue of biomass prediction. These models included Gradient Boosting (GB), Extra Trees (ET), XGB, ElasticNet, and Poisson, all of which were employed to synchronously anticipate the amount of AGB, BGB, as well as TB = BGB + AGB, and were then optimized by Grid Search. Additionally, the SE model was joined to the aforementioned models so as to allow the results to become satisfactory, mainly through cross-validation. Therefore, the recommended method's performance was investigated in terms of two sets of actual data, namely training and testing data.
The outcome of this study presented that the recommended method had a vigorous efficacy to estimate the amount of forest biomass. That is to say, employing a simultaneous group of ML models resulted in a significant impact on predicting forestry above- and below-ground biomass, as well as the sum of the biomass. The very high R² values of near 0.999 in the training set are definitely cause for alarm for overfitting or data leakage. We dealt with this by ensuring strict separation of training and test datasets such that there was no information leakage. We also employed Grid Search with cross-validation during hyperparameter tuning to allow maximum model complexity without over-fitting. The test set results, with R² scores considerably lower (e.g., 0.968 for the best model), are a sign of good generalization and suggest that while there may be some overfitting, it is controlled. More regularization and more data will be tried in future research to reduce the possibility of overfitting even more.
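As an illustration of the workflow just described (strict train/test separation, the five base learners combined by a stacking ensemble, and Grid Search with cross-validation), the sketch below uses scikit-learn and the xgboost package; the file name, the split ratio, the Ridge meta-learner, and the parameter grid are illustrative assumptions, not the authors' configuration.

```python
# Stacking ensemble of the named base learners tuned with GridSearchCV.
import pandas as pd
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import ElasticNet, PoissonRegressor, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("forest_biomass.csv")               # hypothetical data file
X, y = df.drop(columns=["TB"]), df["TB"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_learners = [
    ("enet", ElasticNet()),
    ("et", ExtraTreesRegressor(random_state=42)),
    ("gb", GradientBoostingRegressor(random_state=42)),
    ("poisson", PoissonRegressor()),
    ("xgb", XGBRegressor(random_state=42)),
]
# Meta-learner combining the base predictions; Ridge is an assumption here.
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)

param_grid = {                                        # illustrative grid only
    "gb__n_estimators": [200, 500],
    "et__n_estimators": [200, 500],
    "final_estimator__alpha": [0.1, 1.0],
}
search = GridSearchCV(stack, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)                          # tuning uses the training split only
print(search.best_params_, search.score(X_test, y_test))
```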
Based on the provided metrics, the Stacking ensemble model performed clearly superior to each of the standalone models on the test set. That is because it is capable of leveraging the prediction power of various base learners (ElasticNet, Extra Trees, Gradient Boosting, Poisson Regression, and XGB) and minimizing their respective errors through a meta-learner. Stacking takes into consideration the stand-alone strengths of linear as well as nonlinear models and results in improved generalization and less overfitting.
Quantitatively, the Stacking model achieved the highest coefficient of determination (R² = 0.968) and variance accounted for (VAF = 0.971) on the test set, indicating that its predictions were most highly correlated with the actual biomass values. It generated the lowest mean squared error (MSE = 334.820), root mean square error (RMSE = 18.298), and mean absolute error (MAE = 12.422), indicating high accuracy and low prediction bias. In terms of normalized error, it also had an NMSE of just 0.032, and the median absolute percentage error (MDAPE) decreased to 23.081%, significantly better than the other models. Although its test standard deviation (STD = 105.763) was slightly greater, this is a natural consequence of better prediction accuracy and range coverage for both train and test data, where the results showed R² equal to 0.968 for the testing data and 0.939 for the training data in this study. Therefore, adding the SE model to the proposed models is recommended for predicting forest biomass. By contrast, the consistently higher errors of ElasticNet are further evidence of the poor performance of that model. The Williams plot residuals show that the majority of points fall within the tolerated limits, with very limited outliers or high-leverage points in the test and train subsets. This implies that the Stacking model produces reliable, unbiased, and non-overfit predictions, and there is indeed powerful generalization and performance.

Declarations

Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Authors' contributions
QD: Writing - Original draft preparation, Conceptualization, Supervision, Project administration.

Acknowledgements
I would like to take this opportunity to acknowledge that there are no individuals or organizations that require acknowledgment for their contributions to this work.

Ethical approval
The research paper has received ethical approval from the institutional review board, ensuring the protection of participants' rights and compliance with the relevant ethical guidelines.

References
[1] H. C. Zhantao Song, Xiong Zhang, Xiaoqiang Li, Junjie Zhang, Jingai Shao, Shihong Zhang, Haiping Yang, "Machine learning assisted prediction of specific surface area and nitrogen content of biochar based on biomass type and pyrolysis conditions," J Anal Appl Pyrolysis, vol. 183, 2024.
[2] L. Jia, W. Shao, J. Wang, Y. Qian, Y. Chen, and Q. Yang, "Machine learning-aided prediction of bio-BTX and olefins production from zeolite-catalyzed biomass pyrolysis," Energy, vol. 306, p. 132478, 2024.
[3] H. Wu, S. An, B. Meng, X. Chen, F. Li, and S. Ren, "Retrieval of grassland aboveground biomass across three ecoregions in China during the past two decades using satellite remote sensing technology and machine learning algorithms," International Journal of Applied Earth Observation and Geoinformation, vol. 130, p. 103925, 2024, doi: 10.1016/j.jag.2024.103925.
[4] P. Mao et al., "An improved approach to estimate above-ground volume and biomass of desert shrub communities based on UAV RGB images," Ecol Indic, vol. 125, p. 107494, 2021, doi: 10.1016/j.ecolind.2021.107494.
[5] P. B. May et al., "Mapping aboveground biomass in Indonesian lowland forests using GEDI and hierarchical models," Remote Sens Environ, vol. 313, p. 114384, 2024.
[6] B. Huy, N. Quy Truong, K. P. Poudel, H. Temesgen, and N. Quy Khiem, "Multi-output deep learning models for enhanced reliability of simultaneous tree above- and below-ground biomass predictions in tropical forests of Vietnam," Comput Electron Agric, vol. 222, p. 109080, 2024, doi: 10.1016/j.compag.2024.109080.
[7] M. F. Oliveira et al., "Predicting below and above-ground peanut biomass and maturity using multi-target regression," Comput Electron Agric, vol. 218, p. 108647, 2024.
[8] G. Kunapuli, Ensemble methods for machine learning. Simon and Schuster, 2023.
[9] R. Dey and R. Mathur, "Ensemble learning method using stacking with base learner, a comparison," in International Conference on Data Analytics and Insights, Springer, 2023, pp. 159-169.
[10] P. Naik, M. Dalponte, and L. Bruzzone, "Automated machine learning driven stacked ensemble modeling for forest aboveground biomass prediction using multitemporal Sentinel-2 data," IEEE J Sel Top Appl Earth Obs Remote Sens, vol. 16, pp. 3442-3454, 2022.
[11] Y. Zhang, J. Ma, S. Liang, X. Li, and J. Liu, "A stacking ensemble algorithm for improving the biases of forest aboveground biomass estimations from multiple remotely sensed datasets," GIsci Remote Sens, vol. 59, no. 1, pp. 234-249, 2022.
[12] J. Liu, Y. Niu, Z. Jia, and R. Wang, "Assessing the ethical implications of artificial intelligence integration in media production and its impact on the creative industry," MEDAAD, vol. 2023, pp. 32-38, 2023.
[13] R. Huang, C. McMahan, B. Herrin, A. McLain, B. Cai, and S. Self, "Gradient boosting: A computationally efficient alternative to Markov chain Monte Carlo sampling for fitting large Bayesian spatio-temporal binomial regression models," Infect Dis Model, vol. 10, no. 1, pp. 189-200, 2025, doi: 10.1016/j.idm.2024.09.008.
[14] Z. Wang, L. Mu, H. Miao, Y. Shang, H. Yin, and M. Dong, "An innovative application of machine learning in prediction of the syngas properties of biomass chemical looping gasification based on extra trees regression algorithm," Energy, vol. 275, p. 127438, 2023.
[15] H. Wei, K. Luo, J. Xing, and J. Fan, "Predicting co-pyrolysis of coal and biomass using machine learning approaches," Fuel, vol. 310, p. 122248, 2022.
[16] R. (Bob) Roy, "No Title." Accessed: Jun. 27, 2021. [Online]. Available: https://bobrupakroy.medium.com/extra-trees-classifier-regressor-5b5f6abe8228
[17] B. Kıyak, H. F. Öztop, F. Ertam, and İ. G. Aksoy, "An intelligent approach to investigate the effects of container orientation for PCM melting based on an XGBoost regression model," Eng Anal Bound Elem, vol. 161, pp. 202-213, 2024.
[18] Y. Ayub, J. Ren, T. Shi, W. Shen, and C. He, "Poultry litter valorization: Development and optimization of an electro-chemical and thermal tri-generation process using an extreme gradient boosting algorithm," Energy, vol. 263, p. 125839, 2023.
[19] A. Jain, "No Title." Accessed: Feb. 05, 2024. [Online]. Available: https://medium.com/@abhishekjainindore24/elastic-net-regression-combined-features-of-l1-and-l2-regularization-6181a660c3a5
[20] J. Liu et al., "A new application of Elasticnet regression based near-infrared spectroscopy model: Prediction and analysis of 2, 3, 5, 4′-tetrahydroxy stilbene-2-O-β-D-glucoside and moisture in Polygonum multiflorum," Microchemical Journal, vol. 199, p. 110095, 2024.
[21] Purhadi, D. N. Sari, Q. Aini, and Irhamah, "Geographically weighted bivariate zero inflated generalized Poisson regression model and its application," Heliyon, vol. 7, no. 7, p. e07491, 2021, doi: 10.1016/j.heliyon.2021.e07491.
[22] U. Arif, C. Zhang, S. Hussain, and A. R. Abbasi, "An Efficient Interpretable Stacking Ensemble Model for Lung Cancer Prognosis," Comput Biol Chem, p. 108248, 2024.
[23] H. Yıldırım and M. R. Özkale, "A Novel Regularized Extreme Learning Machine Based on L1-Norm and L2-Norm: a Sparsity Solution Alternative to Lasso and Elastic Net," Cognit Comput, vol. 16, no. 2, pp. 641-653, 2024.
[24] S. M. Mastelini, F. K. Nakano, C. Vens, and A. C. P. de Leon Ferreira, "Online extra trees regressor," IEEE Trans Neural Netw Learn Syst, vol. 34, no. 10, pp. 6755-6767, 2022.
[25] M. M. Hameed, M. K. Alomar, F. Khaleel, and N. Al-Ansari, "An Extra Tree Regression Model for Discharge Coefficient Prediction: Novel, Practical Applications in the Hydraulic Sector and Future Research Directions," Math Probl Eng, vol. 2021, 2021, doi: 10.1155/2021/7001710.
[26] "Understanding Poisson Regression." Accessed: Nov. 05, 2023. [Online]. Available: https://medium.com/@data-overload/understanding-poisson-regression-a-powerful-tool-for-count-data-analysis-b7184c61bfde
[27] M. A. Alemayehu, S. D. Kebede, A. D. Walle, D. N. Mamo, E. B. Enyew, and J. B. Adem, "A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa," Front Big Data, vol. 8, p. 1522578, 2025.
[28] "XGBoost Algorithm." Accessed: Sep. 04, 2024. [Online]. Available: https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/
[29] M. Ehteram, A. N. Ahmed, P. Kumar, M. Sherif, and A. El-Shafie, "Predicting freshwater production and energy consumption in a seawater greenhouse based on ensemble frameworks using optimized multi-layer perceptron," Energy Reports, vol. 7, pp. 6308-6326, 2021, doi: 10.1016/j.egyr.2021.09.079.
[30] A. Beheshti, E. Pourbasheer, M. Nekoei, and S. Vahdani, "QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm-multiple linear regressions," Journal of Saudi Chemical Society, vol. 20, no. 3, pp. 282-290, 2016, doi: 10.1016/j.jscs.2012.07.019.
https://doi.org/10.31449/inf.v49i16.9397 Informatica 49 (2025) 397–416 397
HematoFusion: A Weighted Residual-Vision Transformer Ensemble for
Automated Classification of Haematologic Disorders in Microscopic Blood
Images
Mouna Saadallah, Latefa Oulladji, Farah Ben-Naoum
Evolutionary Engineering and Distributed Information Systems Laboratory, Department of Computer Science, Djillali
Liabes University, Sidi Bel Abbes, Algeria
E-mail: mouna.saadallah@univ-sba.dz, latifa.oulladji@univ-sba.dz, farah.bennaoum@univ-sba.dz
Keywords: Medical imaging, neural networks, red blood cell, leukemia, lymphoma
Received: May 27, 2025
Haematologic malignancies pose a significant global challenge, with 1.34 million new cases reported in
2019 and leukemia claiming 311,594 lives in 2020. Early diagnosis of these blood disorders increases
survival chances by enabling prompt treatment, yet their complexity and variable cellular morphology
hinder accurate detection. Advances in Medical Imaging and AI, particularly Image Classification, offer
solutions by analyzing blood samples for subtle morphological patterns. This study advances the field by
introducing a novel data set for the classification of red blood cells and using open-source data for the
classification of leukemia and lymphoma (covering 29,363, 16,811, and 1,436 images, respectively).
We fine-tuned multiple AI models, including EfficientNetB3, ResNet50V2, and a pretrained Vision Trans-
former (ViT), and combined their strengths into a weighted ensemble framework. Evaluated across various
metrics (including accuracy, precision, recall, etc.), the proposed HematoFusion model excelled, achieving
96% accuracy in the morphology of red blood cells, 99% in Leukemia, and 96% in Lymphoma, surpassing
most existing models in terms of accuracy while covering a wider range of haematologic disorders. These
findings demonstrate the potential of integrated AI frameworks to improve haematologic diagnostics with
precision and reliability.
Povzetek: HematoFusion je uteženi ansambel ResNet50V2 in Vision Transformer, namenjen avtomatski
klasifikaciji hematoloških motenj iz mikroskopskih slik. Sistem uporablja nov RBC-nabor podatkov ter
odprtokodne nabore levkemije in limfoma ter izboljša zanesljivost diagnostičnega razpoznavanja krvnih
celic.
1 Introduction
The collection of blood samples is crucial to understanding diseases, preventing them, and thoroughly providing treatment.
The diagnosis of blood cell diseases hinges significantly on determining the patient's Blood Cell Count (BCC) and observing the appearance of cells under a microscope. It serves as a guide for the pathologist or biologist, providing vital information on diseases that are indicative of quantitative (variations in the number of cells) or qualitative (structural or functional) abnormalities in blood cells [11].
Patients admitted to consultation often suffer haematologic dysfunction (either qualitative or quantitative). Some of the most common cases requiring medical evaluation are caused either by a decrease in the complete blood count (anemia, for instance, sees a decrease in the number of Red Blood Cells (RBCs) or in the level of hemoglobin) or by an increased concentration of RBCs, as marked in the condition of erythrocytosis. Other conditions mark a change in the cell's shape and/or size, including microcytes, macrocytes, echinocytes, codocytes, acanthocytes, spherocytes, and more. White Blood Cell (WBC) and platelet disorders can mainly be described as quantitative, for example leukopenia, leukocytosis, neutropenia, and lymphocytopenia (WBC), and thrombocytosis or thrombocytopenia (platelets). Most qualitative disorders are cancers or proliferative disorders, including leukemia, lymphoma, and myeloma (WBC), and hemophilia (platelets) [9].
The pathologist, along with other medical professionals, depends on studying and examining body tissues to perform diagnostics. The microscope is the main tool used to observe blood cells, providing a detailed description of them in terms of shape and count. Blood cell observation can be extremely challenging with the naked eye and requires enormous concentration and focus; modern technologies, however, recommend new techniques involving the use of a camera to capture microscopic images that can be exploited for further studies and examination. Some existing solutions, like EasyCell® Assistant and Vision Hema® Assist, are stand-alone tools using highly costly robots and integrated microscopes that can assist the pathologist in making decisions and saving time; nonetheless, due to their high costs and unavailability in public
hospitals and laboratories, these solutions cannot be relied upon entirely. This leads us to consider cheaper and more effective innovations emerging in recent years, including Deep Learning (DL) and its various contributions. DL has been widely implemented by researchers in the medical field, and it has given promising results regarding medical imaging (MRI, X-rays, CT, etc.) [30, 52] and enabled medical professionals to rapidly diagnose and detect abnormalities in the human body without exhausting analysis and observations.
This paper aims to improve classification accuracy for haematologic diseases by leveraging ensemble learning techniques applied to multi-source microscopic datasets, preserving the full spectrum of morphologic variability. The latest DL techniques were exploited, including transfer learning and fine-tuning of Convolutional Neural Network (CNN) models and the recently emerging Vision Transformer (ViT) [28]. The ResNet50V2 and EfficientNetB3 networks were chosen as they were preferable for microscopic image classification, and the latter was suitable for scenarios with limited computing resources. We acquired different sources for our data set that cover not only Red Blood Cell disorders but also White Blood Cells (WBC). The CNN and ViT models were separately trained using the completed data set, and the results were later combined to enhance the performance.
A description of our contributions is provided in the following lines:
– A meticulously curated data set for Red Blood Cell morphology using samples collected in the Anti-Cancer Center in El-Oued, Algeria.
– The base architecture of EfficientNetB3 was used with transfer learning, leveraging pretrained weights from ImageNet. It was additionally fine-tuned for the task of blood cell classification.
– The ResNet50V2 was also integrated and transfer-learned as a base architecture and eventually fine-tuned by adding dense layers and regularization techniques that serve to enhance the model's performance.
– A pretrained ViT model was applied to our data set to classify blood cell images through self-attention mechanisms. The model was fine-tuned by optimizing hyperparameters to improve accuracy.
– A hybrid CNN/ViT model was developed by combining the strengths of CNNs for local feature extraction with those of ViT, which captures global features more efficiently.

2 Related work
Pathology and detecting blood disorders require a mass of work and time by a biologist to prepare the blood, test it, and analyze it. Nevertheless, the emergence of developed technologies, such as deep learning, made things much easier for biologists and pathologists, as it assists them with the process of analyzing the blood smear and detecting abnormalities in cell type, shape, and aggregation. If done entirely by the pathologist, this step may take hours or even days when necessary, which causes a decline in the health worker's focus and even eyesight. This urged the need to automate the task to alleviate the pressure on them. Many studies have been conducted to address this problem by exploiting the use of Artificial Intelligence and its diverse techniques.
In its earliest phase, peripheral blood image analysis was inspired by the emerging use of Artificial Intelligence in the medical field and its automation. Kim KS, et al. [2] designed a system that uses a CCD camera attached to the microscope to capture the peripheral images; preprocessing techniques such as edge enhancement and noise removal were applied, and the images were later classified into 15 types of Red Blood Cell abnormalities and 5 normal shapes of White Blood Cells using neural networks. Following that, neural networks, mainly Convolutional Neural Networks, were explored for blood cell image analysis and classification. WBC and its 5 different normal cell shapes (Neutrophils, Lymphocytes, Monocytes, Basophils, Eosinophils) were the easiest to classify and readily available [14] [56] [37]. Classification accuracy reached 96% using a simple neural network that consists of a 16-neuron input layer and a single hidden layer with 10 nodes, achieving a minimum error of less than 10−4, with an output layer of 5 neurons to classify each type [14]. Ali et al. [54] proposed the VGG16-ViT network that uses two online datasets to classify WBC subtypes, achieving excellent precisions of 98.99% and 99.95% on each dataset.
The DenseNet121 model [12] was used by Bozkurt F. [27] on the open-access data set provided by Paul Mooney, available on Kaggle.com [18], reaching an accuracy of 98%. Another Two-Module Deformable CNN with Transfer learning was proposed by Yao Xufeng, et al. [37]; whilst the first module initializes the ImageNet [3] characteristic weights, the second module was designated for classification. The authors achieved precisions of 95.7%, 94.5%, and 91.6% for two low-resolution and noisy undisclosed data sets and the BCCD data set [20], respectively.
Some of the studies, however, focused solely on the classification of one disease. Leukemia is one of the most common blood cancers, leading to growing interest in developing new diagnostic systems for early detection and prevention. In this context, CNNs have gained significant attention due to their efficiency and high accuracy in image-based classification tasks. Areen K. et al. [47] compared in their study multiple CNN-based algorithms (AlexNet, DenseNet, ResNet, and VGG16), employing three datasets (ALL-IDB, ASH ImageBank, and images captured at JUST), reaching an accuracy of 94%. DeepLeukNet, proposed by Saeed et al. [53], was conceived to classify Acute Lymphoblastic Leukemia (ALL) subtypes
employing a CNN-based classifier on the ALL-IDB1 and ALL-IDB2 datasets, attaining 99.61% accuracy. Kasim et al. [55] leverage the online ALL-IDB and Munich AML Morphology datasets for multi-class classification of Leukemia subtypes using pretrained CNN architectures and other classification models, including Random Forest, SVM, and Extreme Gradient Boosting. The highest accuracy achieved by this method was 88%. In recent studies, Vision Transformers (ViTs) have been employed for the classification of Leukemia subtypes. Swain et al. [59] proposed in their research a model based solely on ViTs and classified ALL subtypes. The accuracy on the test set reached 99.67%. A similar approach was implemented by Prasad et al. [51], who attained an overall accuracy of 98.01% for the automatic detection of ALL. Others opted for architectures combining both CNNs and ViTs to further enhance feature extraction. For instance, Tanwar et al. [60] combined in their study the ResNet50 model with the ViT, establishing a dual-stream architecture and reaching an accuracy of 99%.
DL also proved efficient in the classification of other types of cancer such as Lymphoma. Its potential was thoroughly explained by several researchers [58] [35], stressing the application of CNNs and ensemble techniques. Ozgur et al. [49] developed a triple classification system for various Lymphomas (CLL, FL, and MCL) and employed a combination of ML and DL algorithms, reaching precisions of 94%, 92%, and 82%, respectively.
Sickle Cell Anemia and Malaria can be diagnosed by examining the patient's RBCs. Harahap Mawaddah, et al. [29] used a data set that regroups 27,588 images of infected and healthy individuals' RBCs provided by Yasmin M. Kassim et al. [23]. Two CNN architectures were compared during the classification. LeNet-5 [1] was deemed more precise than DRNet [46] in classifying RBCs affected by Malaria, with accuracies of 95.7% and 95%, respectively. Alzubaidi Laith, et al. [22] introduced a CNN classifying RBCs into 3 classes, namely normal, abnormal, and miscellaneous. They used the same network as a feature extractor, then applied the Error Correcting Output Codes (ECOC) classifier for the classification task, achieving an accuracy of 92.06%.
In addition to neural networks, Machine Learning techniques were also employed to address the problem of blood cell image analysis. Aliyu Hajara Abdulkarim, et al. [17] compared Support Vector Machine (SVM) against Deep Learning methods using the AlexNet architecture [5]. The dataset used was open-sourced and distinguished 4 types of RBC abnormalities along with their normal shape. The accuracy of the CNN model was relatively weak and could not exceed 33%, while the SVM model achieved a perfect 100% on the RBC data set. The latter was deployed with the Radial Basis Function (RBF) default setting; this same network was employed by Syahputra Mohammad Fadly, et al. [15], achieving an accuracy of 83.3% using Canny Edge Detection for preprocessing and feature extraction to classify 3 types of RBC abnormalities.
Label-free identification was also explored by various researchers, using an imaging flow cytometer to classify unstained WBCs [19] and optofluidic time-stretch microscopy along with Machine Learning for aggregated platelet detection as well as single platelets and WBCs [13].
Visual or Vision Transformers were introduced by Dosovitskiy, et al. [28] in 2020 to exploit transformers in visual applications. Given that image classification is rather a novel concept for transformers, it may take a while to fully develop and exploit ViT in this regard. Compared to ViT, CNNs can handle large-scale data sets better and offer excellent results. ViT, however, is known for its understanding of global context and dependencies, although it requires pretraining on large amounts of data to achieve results comparable to CNNs [34]. Therefore, an ensemble ViT/CNN model can be an excellent approach to incorporate ViT's efficiencies with CNNs; this was previously done by Y. Barhoumia, et al. [26] to address another consistent problem, Intra Cranial Hemorrhage Classification. It was also employed by Jiang Zhencun, et al. [32] to diagnose ALL. The ensemble method used is the weighted-sum model; the output results of the ViT models are multiplied by a coefficient of 0.7, and the output results of the EfficientNet model [21] are multiplied by a coefficient of 0.3. The authors later combined the results to get the final prediction result. The ViT-CNN ensemble model achieved outstanding results with an accuracy of 99.03%, exceeding the models in the literature.
A comparative summary of recent studies on cancer classification using deep learning methods is presented in Supplementary Material: Section 1 (Table S1), which provides the datasets used, classification techniques, number of classes, and accuracy values reported.

3 Methods

3.1 Data acquisition
The data set used for the classification was acquired by combining different sources.
1. The Chula RBC-12 data set [33] of RBC blood smear images, which contains a total of 706 smear images describing 13 classes of RBC and comprising over 20K images of normal and pathological RBCs. The images provided were collected at the Oxidation in Red Cell Disorders Research Unit, Chulalongkorn University, in 2019, with a DS-Fi2-L3 Nikon microscope used at 1000x magnification. The 13 classes are specified as follows: Normal cell, Macrocyte, Microcyte, Spherocyte, Target cell, Stomatocyte, Ovalocyte, Teardrop, Burr cell, Schistocyte, uncategorized, Hypochromia, Elliptocyte. 2 classes were neglected for the lack of blood smear images.
2. The ThalassemiaPBS data set [40] contains 7108 peripheral blood smear images of four thalassemia patients for nine cell types (Elliptocyte, Teardrop, Normal cell, Cigar cell, Stomatocyte, Target
cell, Hypochromia, Spherocyte, Acanthocyte). The images were collected by a clinical pathologist from the Clinical Pathology Laboratory of the Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Indonesia, using an Olympus CX21 microscope attached to an Optilab Advance Plus camera with 1000x total magnification.
3. The RBC-mini data set, Anti-Cancer Center El-Oued, Algeria [57]: A small data set fragment (mini-batch) provided by the specialized healthcare facility, the Anti-Cancer Center in El-Oued, Algeria, that contains a total of 13 blood smear images, regrouping 5 different types of RBC disorders: Burr cells, ovalocyte, schistocyte, stomatocyte, and teardrop. The blood smear images were captured in May 2024 using an optical microscope with x1000 magnification. These images were integrated to augment the diversity of the RBC class and mitigate overfitting risks, not to serve as a core data source.
Table 1 regroups all 3 sources of RBC data sets and lists the size of each data set per type of cell disorder, before and after the application of the data augmentation techniques described in Section 3.3.1. The total size of the RBC data set is 29,363.
4. The Raabin-Leukemia data set [39] is a free-access data set of microscopic images of blood cells, focusing on cases related to Leukemia. 2 experts labeled the cells, and the samples were captured from patients at the Takht-e Tavous laboratory in Tehran, Iran. A Zeiss microscope and an LG J3 smartphone camera were used for the imaging.
5. The Malignant Lymphoma Classification data set [4] contains a significant number of labeled histopathological images of lymphoma. 3 types of this cancer are covered in this data set: Chronic lymphocytic leukemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL), through biopsies sectioned and stained with Hematoxylin/Eosin (H+E).
Tables 2 and 3 present the Lymphoma and Leukemia datasets, respectively, compiled from the Malignant Lymphoma Classification dataset and the Raabin-Leukemia dataset. The tables show the class distribution before and after applying the data augmentation techniques described in Section 3.3.1. The Lymphoma dataset consists of 1,436 images, while the Leukemia dataset contains 16,811 images.

3.2 Image cropping
Figure 1 presents a representative blood smear image from the Chula RBC-12 data set [33]. Each image was manually cropped to focus on individual RBCs and relevant regions of interest. They were subsequently categorized based on specific morphological characteristics. The organization of these classes was performed meticulously to ensure consistency with the reference files provided by the authors.
The accompanying "Label" folder within the data set houses a series of files providing detailed annotations for each image, structured in a specific format: the x-coordinate, the y-coordinate, and the corresponding RBC type encoded as a numerical value (each class is given a unique value from 1 to 11). This labeling system facilitates the task of accurate identification and classification of RBCs, thereby serving as a foundation for various haematological studies and the development of automated diagnostic tools.
The same process was replicated on the RBC-mini data set, which we collected in collaboration with the Anti-Cancer Center in El-Oued, Algeria. The resulting blood smears were preprocessed and cropped using the OpenCV library as described below. The extracted images were manually labeled under the supervision of specialists at the Center.
1. Load the image using OpenCV.
2. Preprocess the image by converting it to grayscale and applying thresholding or edge detection to highlight the cells.
3. Find the cells using contour detection.
4. Extract each cell based on the detected contours and save them as separate image files.
Figure 1: Images of single cells cropped from one blood smear image
The images provided by the ThalassemiaPBS data set [40] already consisted of single cells; therefore, no further preprocessing was needed. This process was necessary to isolate and classify specific morphological abnormalities. In contrast, the leukemia [39] and lymphoma [4] data were not cropped into single cells. Instead, the whole blood smear images were retained as input, since the spatial context and global information contained in the whole smear image all contribute positively towards the classification of leukemia subtypes and malignant lymphomas. These differences in preprocessing reflect the varying nature of the diagnostic tasks and were taken into account during the design of the model pipelines.
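As a rough illustration of the four cropping steps listed above (load, threshold, find contours, crop), the OpenCV sketch below can be used; the threshold strategy, the size filter, and the file names are placeholder assumptions, not the exact settings used for the RBC-mini data set.

```python
# Crop individual cells out of a blood smear image with OpenCV.
import cv2

image = cv2.imread("blood_smear.jpg")                       # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu thresholding (inverted) to separate the darker cells from the background
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for idx, cnt in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cnt)
    if w * h < 400:                                         # skip tiny artefacts (placeholder filter)
        continue
    cell = image[y:y + h, x:x + w]
    cv2.imwrite(f"cell_{idx}.png", cell)                    # save each cell as a separate file
```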
Table 1: The complete Red Blood Cells data set description by type of cell and data size, including before and after
augmentation
Index Type of RBC No. of images [33] No. of images [40] No. of images [57] Total (with augm)
1 Acanthocyte 0 354 0 1432
2 Burr Cell 90 0 10 982
3 Cigar Cell 455 24 0 1893
4 Hypochromia 90 222 0 1284
5 Normal 1812 1426 0 3292
6 Ovalocyte 114 1211 4 3735
7 Schistocyte 108 0 8 453
8 Spherocyte 92 562 0 2640
9 Stomatocyte 49 382 3 1792
10 Target Cell 651 851 0 3912
11 Teardrop 26 2085 6 7948
Table 2: The Lymphoma data set description by type of cell and data size, including before and after augmentation
Category Subtype Before After
Lymphoma CLL 113 443
Lymphoma FL 139 526
Lymphoma MCL 122 467

Table 3: The Leukemia data set description by type of cell and data size, including before and after augmentation
Category Subtype Before After
Leukemia ALL (L1) 377 1131
Leukemia ALL (L2) 3595 3595
Leukemia AML (m0) 672 997
Leukemia AML (m1) 425 1700
Leukemia CLL 1071 3741
Leukemia CML 1624 5647

3.3 Data processing

3.3.1 Data augmentation
Data augmentation is a technique that is essential in image processing. It consists of artificially enhancing the size of a given data set by making changes to the original images. Furthermore, this method presents a solution for improving the model's performance by mitigating common issues like overfitting.
The variations of the existing images generated by the data augmentation techniques provide a more robust data set. These alterations can consist of simple geometric transformations and color or noise introductions, all designed to make the model's predictions more generalizable and accurate.
In the present study, three primary data augmentation techniques were employed, namely: flipping, which involves mirroring the image horizontally or vertically; rotation, which involves altering the image by turning it by a specified degree; and Gaussian blurring, which can help reduce noise and minor details by applying a Gaussian filter to the image. When combined, these augmentation techniques allowed us to enrich our data set, all the while relying on additional data preprocessing techniques that will be introduced in the following sections.

3.3.2 Data resizing
Another vital preprocessing technique before training the model is resizing. Since our data set is acquired from various sources, it is rather imbalanced, and the images come in different sizes and shapes. Therefore, the sizes must be standardized into a uniform square dimension before feeding the images into the model. This allows the model to learn efficiently and improves its accuracy. Each model expects a certain target size for the images. The ResNet50V2 model, for instance, requires a target size of (224, 224, 3); we were able to apply it using the flow_from_directory() method in Keras. EfficientNetB3, however, expects input images of shape (300, 300, 3) by default, but the model can accept other input shapes as long as the shape is at least 224 × 224 and the number of channels is 3 (RGB); thus, the input size was resized to (224, 224, 3) to reduce computation time and memory usage.
When provided with the target size, Keras uses bilinear interpolation by default for the image resizing operation. The formula below represents the process in which the original coordinates are mapped to new ones using interpolation:

\[ \mathrm{new}(i', j') = \mathrm{interpolate}\left(\mathrm{orig}\left(i' \cdot \frac{H_{\mathrm{orig}}}{H_{\mathrm{tgt}}},\; j' \cdot \frac{W_{\mathrm{orig}}}{W_{\mathrm{tgt}}}\right)\right) \tag{1} \]
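A minimal sketch of this preprocessing stage with a Keras ImageDataGenerator is shown below; the directory path, batch size, and rotation range are illustrative assumptions, and the Gaussian blurring described above would be supplied separately (for example through a custom preprocessing_function).

```python
# Flipping and rotation augmentations plus bilinear resizing to the target size.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,          # Min-Max rescaling to [0, 1], see Section 3.3.3
    horizontal_flip=True,       # flipping (mirroring)
    vertical_flip=True,
    rotation_range=20,          # rotation by up to a specified degree (assumed value)
)

train_batches = train_gen.flow_from_directory(
    "data/rbc/train",           # hypothetical directory layout: one folder per class
    target_size=(224, 224),     # resized with bilinear interpolation by default (Eq. 1)
    batch_size=32,
    class_mode="categorical",
)
```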
3.3.3 Data rescaling specifics of the corresponding data set.
The choice of models and more specific details are ex-
To ensure uniformity across input data and improve plained later in the section.
model training, all images were rescaled using appro-
priate preprocessing techniques depending on the model
architecture. To further enhance the CNN-based models’ 3.4.1 EfficientNetB3
efficiency, we’ve used ”Rescaling”, a technique in which
the image’s range of pixel values is changed to a standard A member of the EfficientNet family that was first intro-
or normalized range. duced in May 2019 by [21]. This architecture was chosen
There are two common rescaling techniques: Standard- due to its superior performance in feature extraction and its
ization and normalization. In our paper, we’ve opted for ability to balance computational efficiency with high accu-
the latter, which ensures that various pixel values are used racy, making it well-suited for tasks like blood cell classi-
during the model’s learning process. The pixels of a given fication.
image can be represented as integers in the range of 0 to EfficientNets are developed based on AutoML and com-
255 in the case of an 8-bit image. Rescaling modifies these pound scaling. The authors first used the AutoML MNAS
values into a different range of -1 to 1 or 0 to 1 when using Mobile framework to develop a baseline network, which
normalization. they named EfficientNetB0, the first of the EfficientNet
Likewise, we’ve used the flow_from_directory() method family. They then used the compound scaling method to
to rescale the images by a factor of 1/255 for our training, scale up and obtain the series from B1 to B7.
validation, and test batches. This method uses a form of The architectures achieved higher accuracy and efficiency
Min-Max Scaling, where each pixel value is divided by despite being smaller and, thus faster than other models.
255. The minimum value (0) in this case maps to 0, and In our paper, we have opted for the B3 version which gave
the maximum value (255) in turn maps to 1. Its formula promising initial results, additional layers were added to
can be defined as follows: adapt the model for blood cell classification.
We additionally adjusted key hyperparameters meticu-
lously during training, such as learning rate, batch size, and
X − X
X min dropout rate.
scaled = (2)
Xmax − Xmin Figure 2 shows the architecture of the EfficientNetB3
Meanwhile, for the ViT model, a different preprocessing strategy was implemented to fit its expected input distribution. Mean–standard deviation normalization was applied to standardize the image data and improve the model's convergence. Each image's pixel values were normalized using the following channel-wise means and standard deviations:
– Mean: [0.485, 0.456, 0.406]
– Standard deviation: [0.229, 0.224, 0.225]
This normalization follows the formula:

normalized_pixel(i, j) = (pixel(i, j) − mean) / std_dev    (3)

Additionally, Supplementary Material: Section 2.2 includes the parameter-level details of the aforementioned data augmentation techniques.
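For comparison, the ViT-style channel-wise normalization of Equation (3) can be expressed, for instance, with torchvision transforms; this is a sketch of the idea, and the exact transform chain used in the original pipeline may differ.

# Channel-wise mean/std normalization for the ViT input (Equation 3).
from torchvision import transforms

vit_preprocess = transforms.Compose([
    transforms.Resize((224, 224)),          # ViT expects 224x224 inputs
    transforms.ToTensor(),                  # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # per-channel means
                         std=[0.229, 0.224, 0.225]),   # per-channel standard deviations
])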
3.4 Proposed solution

In this section, we present the architectures we employed for our blood-cell classification system based on the latest deep-learning techniques. Three state-of-the-art models were explored for this task: EfficientNetB3, ResNet50V2, and Vision Transformer (ViT). To further enhance the classification accuracy, we developed ensemble models combining the strengths of ViTs and CNNs. In training, Transfer Learning was used to fine-tune each of the cited architectures, and the hyperparameters were optimized to suit the task.

3.4.1 EfficientNetB3

A member of the EfficientNet family that was first introduced in May 2019 by [21]. This architecture was chosen due to its superior performance in feature extraction and its ability to balance computational efficiency with high accuracy, making it well-suited for tasks like blood cell classification.
EfficientNets are developed based on AutoML and compound scaling. The authors first used the AutoML MNAS Mobile framework to develop a baseline network, which they named EfficientNetB0, the first of the EfficientNet family. They then used the compound scaling method to scale up and obtain the series from B1 to B7. The resulting architectures achieved higher accuracy and efficiency despite being smaller and thus faster than other models.
In our paper, we have opted for the B3 version, which gave promising initial results; additional layers were added to adapt the model for blood cell classification. We additionally adjusted key hyperparameters meticulously during training, such as the learning rate, batch size, and dropout rate.
Figure 2 shows the architecture of the EfficientNetB3 base model that we have adopted for our specific classification task. The architecture was created using diagrams.net (formerly known as draw.io) [31].
The model is first fed microscopic images resized to (300, 300, 3) and processed through its pretrained backbone. The default fully connected classification head of EfficientNetB3 had been removed, since it is specific to the ImageNet data set it was trained on (containing 1000 classes), which allowed us to add a custom classification head tailored to our data set.
Three versions of the same architecture were used, each with a modified softmax layer to adapt to our three different data sets: (1) for RBC classification, 11 classes, (2) for Leukemia classification, 6 classes, (3) for Lymphoma classification, 3 classes.
The EfficientNetB3 backbone acts as a feature extractor, extracting spatial and hierarchical features that are later fed to the added layers for learning. The first five layers are frozen to prevent their weights from being updated during training; this helps adapt the deeper layers to our data set, whereas freezing more layers could have resulted in under-fitting, since the data set has unique characteristics that are significantly different from the original ImageNet data set. Deeper layers of the EfficientNetB3 backbone are left unfrozen to enable the model to capture more domain-specific patterns (e.g., cell morphology, staining patterns).
This version of the model expects a (300, 300, 3) input shape by default; we resized the input to (224, 224, 3) to speed up training and reduce memory usage, due to limited resources and the size of our data set, which is rather small.
Figure 2: EfficientNetB3 model for blood cell classification: The model processes 300x300x3 microscopic images through convolutional layers with Swish activation, followed by mobile inverted bottleneck blocks (MBConv 1 and 6). The first 5 layers are frozen, with fine-tuned deeper layers. A custom classification head is added for task-specific classification
A dropout layer is added after the global average pooling (GAP) to reduce the overfitting that we observed due to the depth of the network relative to the small size of the data set. A dropout rate of 0.3 was employed, thus deactivating 30% of neurons during training; this prevents the model from relying on specific neurons.
A fully connected layer with 256 units using the ReLU activation function is added, serving to learn complex representations.
The classification head is completed with the final output dense layer. The number of units corresponds to the number of classes in each of our 3 data sets, and the softmax activation function is used for multi-class classification.
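A hedged Keras sketch of the transfer-learning setup just described (frozen early layers, GAP, dropout of 0.3, a 256-unit ReLU layer, and a softmax output) is given below; the num_classes value and the way layers are indexed are illustrative assumptions, not the authors' code.

# Sketch of the EfficientNetB3 backbone with the custom classification head described above.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 11  # RBC morphology; 6 for Leukemia, 3 for Lymphoma

backbone = tf.keras.applications.EfficientNetB3(include_top=False, weights="imagenet",
                                                input_shape=(224, 224, 3))
for layer in backbone.layers[:5]:      # freeze the first five layers
    layer.trainable = False

model = models.Sequential([
    backbone,                                       # pretrained feature extractor
    layers.GlobalAveragePooling2D(),                # GAP
    layers.Dropout(0.3),                            # deactivate 30% of neurons during training
    layers.Dense(256, activation="relu"),           # learn combined representations
    layers.Dense(num_classes, activation="softmax") # task-specific classification head
])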
3.4.2 ResNet50V2

Deep convolutional neural networks have contributed significantly to the image-classification field, providing a robust platform to researchers ever since the emergence of the first deep neural network, LeNet, in 1998. Later on, in 2012, the idea of Dropout was presented, allowing models to avoid overfitting.
Researchers next focused on adding more convolutional layers to increase the depth of the model and thus its efficiency. However, simply stacking up layers introduced a new issue, accuracy degradation, which unexpectedly was not due to overfitting but was caused by the vanishing gradient effect [6].
Residual Neural Networks addressed this problem: in 2015, ResNet152, the first of the ResNet family, was introduced. It consists essentially of modularized architectures that stack building blocks of the same connecting shape, with short-cut connections that skip one or more layers [10]. These connections in ResNet perform identity mapping; the outputs of this mapping are added to those of the stacked layers, as illustrated in Figure 3.

Figure 3: Residual learning - a building block

ResNet50V2 is a residual neural network variant that employs skip connections to prevent vanishing gradients during back-propagation; this ensures efficiency in learning the complex features present in microscopic blood cell images.
Figure 4 presents the architecture of the ResNet50V2 base model that we have employed for our classification.
Figure 4: ResNet-50v2 model for blood cell classification: The model processes 224 × 224 × 3 microscopic images
through a series of convolutional layers with ReLU activation. It consists of four main blocks with residual connections
and employs bottleneck blocks (1x1, 3x3, 1x1 convolutions). A global average pooling layer is added, followed by a fully
connected classification head, and a softmax activation for predicting blood cell classes. Key components include skip
connections, dropout (0.6), and task-specific fine-tuning
Similarly, the model was also designed using the diagrams.net tool.
The model is fed a microscopic image of size (224, 224, 3) that has previously been preprocessed and normalized. It consists of 50 layers and focuses on improved gradient flow and training stability by introducing pre-activation residual blocks and applying batch normalization and the activation (ReLU) before convolutions.
The network's initial block captures low-level features such as edges, textures, and patterns through convolutional and pooling layers, followed by 4 residual blocks with skip connections to prevent vanishing-gradient problems. Higher-level features are extracted using down-sampling (strides).
The final output of these blocks is passed through a global average pooling (GAP) layer to reduce the feature map to a 1D vector; a fully connected layer and a softmax classifier are then added.
The base model acts as a feature extractor, and the custom layers act as a task-specific classifier tailored to blood-cell classification. Similar to the EfficientNet, the model was transfer-learned by freezing the first 5 layers. This prevents overfitting, as the learning focuses on the deeper layers, and a dropout layer is also added to ensure the model does not memorize the training data. It is preceded by a 256-unit dense layer that allows the model to combine the learned features to improve classification, and the classifier ends with a dense layer that has the same number of neurons as there are classes: (1) for RBC classification, 11 classes, (2) for Leukemia classification, 6 classes, (3) for Lymphoma classification, 3 classes.

3.4.3 Experimental hyperparameters

Table 4 presents a breakdown of the hyperparameters and setup used in the experiments based on the Keras/TensorFlow training pipeline, along with their purposes, as well as the strategies employed to transfer-learn and fine-tune the models and achieve the best accuracies possible. The EfficientNetB3 and ResNet50V2 models were both trained using the same hyperparameters detailed in Table 4.

3.4.4 Experimental environment

Hardware: The experiments for all 3 models were conducted on Google Colab, which typically provides NVIDIA Tesla GPUs.
Software: Platform: Google Colab, a hosted Jupyter Notebook environment. Framework(s): TensorFlow v2.18.0 and Keras v3.6.0 were used to develop, train, and evaluate the 3 models. Python: version 3.10. Libraries: matplotlib, numpy, PIL, joblib, and others were used to preprocess, analyze, and visualize the data.
Storage: The 3 data sets were preprocessed and split into training, validation, and test sets, each stored in Google Drive, which is mounted to the Colab environment for access. The detailed data split strategy, including training, validation, and testing partitions, is provided in Supplementary Material: Section 2.1.
Table 4: Experimental hyperparameters for training the CNN models and their purposes

Hyperparameter | Value | Purpose
Optimizer | Adam (LR = 10^-4) | A low learning rate is employed to fine-tune pretrained layers.
Loss function | Categorical crossentropy | Used for multi-class classification.
Metrics | Accuracy | To monitor the number of correctly classified instances during training.
Steps-per-epoch | ResNet50V2: 20, EfficientNetB3: 200 | The number of training batches processed per epoch.
Validation steps | ResNet50V2: 10, EfficientNetB3: 316 | The number of validation batches processed per validation step.
Epochs | ResNet50V2: 300, EfficientNetB3: 10 | Specifies the training schedule, which allows gradual convergence.
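The Table 4 settings could be wired together roughly as follows; model, train_batches, and val_batches refer to the objects sketched earlier and are assumptions for illustration rather than the authors' exact pipeline.

# Compiling and fitting with the Table 4 hyperparameters (EfficientNetB3 values shown).
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-4),     # low LR to fine-tune pretrained layers
              loss="categorical_crossentropy",         # multi-class classification
              metrics=["accuracy"])

history = model.fit(train_batches,
                    validation_data=val_batches,
                    steps_per_epoch=200,    # EfficientNetB3 (ResNet50V2: 20)
                    validation_steps=316,   # EfficientNetB3 (ResNet50V2: 10)
                    epochs=10)              # EfficientNetB3 (ResNet50V2: 300)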
3.5 ViT model

Visual or Vision Transformers (ViT) is a novel approach introduced by Dosovitskiy et al. [28]. It uses the concept of transformers designed specifically for visual applications and image classification tasks in particular. When using the transformer blocks in ViT, the multi-head attention mechanism is applied to integrate global context efficiently and learn high-level features [42].
Following the success of NLP transformers [16], Dosovitskiy et al. were inspired to develop a new attention-based class of models that can be exploited in Computer Vision. Compared to NLP transformers, ViT only uses the encoder attention branch, neglecting the decoder attention branch, whilst word tokens are replaced by image patches.
In a normal CNN, the entire image is taken as input, whereas in ViT the image is first divided into equal-sized patches, which are passed through linear layers; the outputs of this layer are known as patch embeddings. To these embeddings, position embeddings are added, which provide the model with positional information regarding the sequence of the patches. Afterward, another learnable token is added to the position embedding for image classification purposes.
Figure 5 presents the architecture of the ViT model we have employed for our blood cell classification task. Prior to the training phase, the data was first prepared and processed to fit the model's requirements and expected input. The data set was initially split into training, validation, and test sets and stored in specific folders. The ImageFolder utility was used to load the images and associate them with their corresponding classes based on the folder names provided. The images were later resized to fit the shape expected by the ViT model, 224×224, and normalization was applied to standardize the image data and make it more suitable for the model (see Section 3.3.3).
Similarly to the CNN architectures, three versions were implemented, one for each data set: (1) for RBC classification, 11 classes, (2) for Leukemia classification, 6 classes, (3) for Lymphoma classification, 3 classes.
The pretrained backbone uses the google/vit-base-patch16-224-in21k model from the Hugging Face library [25] as a feature extractor. The model was trained on the ImageNet-21K data set [8]. It was fine-tuned to adapt to the blood cell classification task, where the number of labels was defined as the number of classes in the data set, as mentioned previously in this section.
The transformer encoder depicted in Figure 5 is first provided with the embedded patches (patch embedding / position embedding). The input image is divided into fixed-size patches of 16×16. We next apply a linear projection to the flattened patches to form fixed-dimensional vectors. Unlike CNNs, Transformers require position embeddings to learn and capture the input's order of sequence [38]; this serves to improve accuracy and encode the spatial information of the patches.
The combined embedded patches are fed into the Transformer Encoder to go through a series of L layers, each including the following components:
1. Multi-head self-attention is a mechanism that enables the model to learn global patterns by splitting the process of self-attention into multiple heads, where each head focuses on the interaction between patch embeddings differently [16]. The attention calculations are eventually merged to give a more global score.
2. The output of the multi-head attention is added to the input of the next component by a skip connection (residual connection) after normalization. As explained earlier, residual connections are added to prevent the vanishing gradient during training.
Figure 5: Vision Transformer (ViT) Architecture for blood cell classification: The model processes 224×224 microscopic
images through patch embeddings and position encoding, which are later fed to the transformer encoder with loaded
weights from the pretrained ViT-B-16 in 21K model. After passing through the transformer encoder, the embeddings are
used as the input to the classification head (MLP + Softmax)
3. To further enhance the model's learning through patch embeddings, a feed-forward network (FFN) [48] is fed with the normalized output of the multi-head attention; it consists of fully connected layers with a GeLU activation in between. This allows the model to capture local transformations.
4. Similarly to the multi-head self-attention block, the FFN output is normalized and added to the residual connection.
The output of the Transformer Encoder is a sequence of embeddings, enriched with local and global contextual information, independently for each patch.
After passing through the Transformer Encoder, the embedding corresponding to the special classification token (cls) is used as the input to the classification head, which consists of a Multi-Layer Perceptron (MLP) head and a softmax classification head. The MLP takes the output of the Transformer Encoder and feeds it into a series of fully connected layers to prepare the data for the softmax classification head, which maps it to the desired classes.

3.5.1 Experimental hyperparameters

Table 5 outlines the hyperparameters used for training the ViT model with the google/vit-base-patch16-224-in21k backbone, along with their respective values, detailing the batch size, the learning rate, and the optimizer employed. The OneCycleLR scheduler was used as a strategy to vary the learning rate during training; each cycle uses a maximum learning rate of 10^-3. Other parameters include the CrossEntropyLoss function and a total of 10 epochs (624 batches per epoch) to train the model.

Table 5: Experimental hyperparameters for training the ViT model

Hyperparameter | Value
Batch size | 32
Learning rate | 10^-4 (initial)
Optimizer | AdamW
Scheduler | OneCycleLR
Scheduler max LR | 10^-3
Number of epochs | 10
Loss function | CrossEntropyLoss
Model backbone | google/vit-base-patch16-224-in21k
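A possible PyTorch/Hugging Face sketch of the Table 5 configuration is shown below; the dummy batch stands in for a real DataLoader batch, and the code illustrates the setup rather than reproducing the authors' implementation.

# Fine-tuning setup for the pretrained ViT backbone with the Table 5 hyperparameters.
import torch
from torch.optim.lr_scheduler import OneCycleLR
from transformers import ViTForImageClassification

num_classes, epochs, steps_per_epoch = 11, 10, 624   # 11 classes for RBC; 6 or 3 for the other sets
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224-in21k",
                                                  num_labels=num_classes)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)                 # initial LR
scheduler = OneCycleLR(optimizer, max_lr=1e-3, epochs=epochs,
                       steps_per_epoch=steps_per_epoch)                    # cyclic LR schedule
criterion = torch.nn.CrossEntropyLoss()

# One illustrative optimization step on a dummy batch (stands in for a real DataLoader batch).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(pixel_values=images).logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()   # one scheduler step per training batch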
Figure 6: Workflow of the weighted average ensemble method: The input is preprocessed to fit the ViT and the CNN models, and the predictions of the models are later combined using the weighted average approach to generate the final prediction

3.5.2 ViT-CNN ensemble model

To further enhance the performance of our models, an ensemble method was introduced to seek the opinions of several models and combine them, achieving more accurate classifications than those of the individual models trained separately [44][50]. Through our experiments, we have observed the superiority of residual networks in training and efficient learning, while the ViT model performed better in certain instances, focusing more on learning complex features. Thus, we incorporated in our methodology a dual-architecture ensemble, combining the residual network's efficiency with the high precision obtained by the ViT.
Figure 6 presents the flowchart of the ResNet-ViT ensemble model that we have implemented.
The weighted-average ensemble method was selected after experimenting with the most prevalent methods in image classification tasks, namely maximum voting, the averaging method, and the weighted sum. In the weighted-average method, the models are assigned different weights after training, defining the importance of each model for the prediction.
The weighted-average ensemble combines predictions from a CNN (M1) and a Vision Transformer (M2), with output probabilities for class c denoted as P1(c) and P2(c), respectively, obtained via the softmax function to ensure Σ_c P_i(c) = 1 for i = 1, 2. The weights w1 and w2 are assigned to M1 and M2 based on validation performance. The ensemble probability for class c is computed as:

P_ensemble(c) = (w1 · P1(c) + w2 · P2(c)) / (w1 + w2)    (4)

The final class prediction is determined by selecting the class with the highest ensemble probability:

ĉ = arg max_c P_ensemble(c)    (5)

Preprocessing: The input fed to the already trained models is first preprocessed; each model is preprocessed differently. The ViT uses normalization with mean and std, while the CNN uses simple rescaling (1/255). The models are then loaded to make predictions, and both models output probabilities for the different classes; we used the softmax function to ensure that they sum up to 1.
Weight Selection: Weights w1 and w2 were determined through a grid search over predefined pairs, specifically [(0.3, 0.7), (0.4, 0.6)], where each pair sums to 1 to maintain normalized probabilities. The grid search evaluated each weight combination on a validation subset using classification accuracy as the performance metric. The pair that achieved the highest accuracy was selected. Further details about the ensemble weight selection and performance across datasets are provided in the Supplementary Material: Section 3.
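The following NumPy sketch illustrates Equations (4)-(5) and the small grid search described above; p_cnn and p_vit denote the softmax probability matrices of the two trained models on a validation subset and are assumed inputs.

# Weighted-average ensemble (Equations 4-5) and grid search over candidate weight pairs.
import numpy as np

def ensemble_probs(p_cnn, p_vit, w1, w2):
    """Weighted-average ensemble probability per class (Equation 4)."""
    return (w1 * p_cnn + w2 * p_vit) / (w1 + w2)

def ensemble_predict(p_cnn, p_vit, w1, w2):
    """Final class = argmax of the ensemble probability (Equation 5)."""
    return np.argmax(ensemble_probs(p_cnn, p_vit, w1, w2), axis=1)

def select_weights(p_cnn, p_vit, y_true, pairs=((0.3, 0.7), (0.4, 0.6))):
    """Pick the weight pair that maximizes validation accuracy."""
    best_pair, best_acc = None, -1.0
    for w1, w2 in pairs:
        acc = np.mean(ensemble_predict(p_cnn, p_vit, w1, w2) == y_true)
        if acc > best_acc:
            best_pair, best_acc = (w1, w2), acc
    return best_pair, best_acc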
4 Results

This section provides an in-depth analysis of the results obtained from our experiments. First, we explore the performance of our individual models, using the insights present in the confusion matrices and focusing on metrics such as: (1) accuracy, (2) precision, (3) recall, (4) F1-score, (5) Cohen kappa, and (6) AUC scores, followed by an evaluation of the ensemble model, HematoFusion. The evaluation is conducted across the three data sets we have introduced in earlier sections, and the results are eventually interpreted in the context of existing literature.
The Accuracy is calculated by measuring the number of correctly predicted cases. A high accuracy means the overall performance of the model is good. However, in the case of imbalanced data sets, high accuracy can be misleading, and other metrics are necessary to further evaluate the model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)

The Precision is calculated by measuring the number of correctly predicted positive cases. A high precision is achieved only if most of the predicted positive cases are truly positive [45].

Precision = TP / (TP + FP)    (7)

The Recall, also known as Sensitivity or True Positive Rate, measures whether all relevant cases of the data set were correctly predicted [45].

Recall = TP / (TP + FN)    (8)

To address the accuracy's shortcomings in handling imbalanced data sets, which is the case in our paper, the F1-score was introduced for balanced evaluations, combining precision and recall in one metric. The F1-score is only high when both precision and recall are high [43].

F1 Score = 2 · (Precision · Recall) / (Precision + Recall)    (9)

The Cohen Kappa was introduced as a statistical measure of the agreement between the predicted labels and their actual values. If κ = 1, perfect agreement is achieved; if κ = 0, the agreement is no better than chance; and if κ < 0, the model achieved less than random agreement [36].

κ = (P_o − P_e) / (1 − P_e)    (10)

where:
P_o = observed agreement (accuracy)
P_e = expected agreement based on chance, P_e = (1 / N²) · Σ_{i=1}^{k} (A_i · B_i)

The Area Under the Curve (AUC) score, specifically the area under the Receiver Operating Characteristic (ROC) curve [24], evaluates a model's ability to discriminate between classes at various classification thresholds. An AUC score approaching 1 indicates high discriminative capability, which is particularly useful for unbalanced datasets that are common in blood cell classification, where accuracy becomes misleading due to class differences.

AUC = ∫₀¹ TPR(FPR) dFPR    (11)

where:
TPR = TP / (TP + FN)  (True Positive Rate or Sensitivity)
FPR = FP / (FP + TN)  (False Positive Rate)
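These metrics can be computed, for example, with scikit-learn as sketched below; y_true, y_pred, and y_prob are assumed to be the test-set labels, predicted labels, and per-class probability matrix (with a column for every class).

# Computing the evaluation metrics of Equations (6)-(11) with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score, confusion_matrix)

cm = confusion_matrix(y_true, y_pred)                       # per-class TP/FP/FN/TN counts
acc = accuracy_score(y_true, y_pred)                        # Equation (6)
prec = precision_score(y_true, y_pred, average="macro")     # Equation (7), macro-averaged
rec = recall_score(y_true, y_pred, average="macro")         # Equation (8)
f1 = f1_score(y_true, y_pred, average="macro")              # Equation (9)
kappa = cohen_kappa_score(y_true, y_pred)                   # Equation (10)
auc = roc_auc_score(y_true, y_prob, multi_class="ovr")      # Equation (11), one-vs-rest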
4.1 Classification results

To analyze the models' performance during training, the accuracy and loss were both monitored and visualized through the training curves over successive epochs. Figures S5, S6, and S7 (Supplementary Material Section 7) depict the training curves for the RBC Morphology, Leukemia, and Lymphoma data sets, respectively.
Additionally, for further visual evaluation of the classification performance, confusion matrices were computed on the test set, showing the number of accurate and inaccurate predictions of instances, namely: True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP). The confusion matrices generated for the RBC Morphology, Leukemia, and Lymphoma datasets are displayed, respectively, in Figures 7, 8, and 9. These confusion matrices were used to compute the quantitative metrics for a more specific evaluation. Detailed tables presenting per-class performance metrics (precision, recall, F1-score, kappa) for each model and dataset combination are provided in the Supplementary Material: Section 4, Tables S3–S5.
The results for the classification of RBC Morphology, Leukemia, and Lymphoma are summarized in Tables 6, 7, and 8, respectively, and the results of the ensemble model, HematoFusion, in Table 9. To test model stability and robustness over sets, we conducted a bootstrapping analysis. This technique provides us with an estimate of the results' variability and increases the validity of our performance claims over single-run statistics. The detailed bootstrapping results with distribution plots and summary statistics are presented in Supplementary Material: Section 6 (Table S7 and Figures S2–S4).
Furthermore, we have calculated and included the AUC scores (Table S6) and ROC curves (Figure S1) for our individual models across each dataset, as shown in Supplementary Material: Section 5.

5 Discussion

5.1 Interpretation of results

The convergence of the ResNet50V2 model illustrates a steady reduction in training loss, and the accuracy becomes stable after reaching a certain number of epochs. The ViT model demonstrated higher fluctuation in both accuracy and loss during training.
Figure 7: Confusion matrices for the classification performance of the four models on the RBC data set: (a) EfficientNetB3
model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and ResNet50V2
models.
Table 6: RBC Morphology classification results across the three individual models with detailed metrics for evaluation
Model Model Performance
Train Acc Val Acc Test Acc Kappa Recall F1-score Precision
EfficientNetB3 0.99 0.91 0.97 0.93 0.92 0.92 0.92
ResNet50V2 0.98 0.98 0.92 0.92 0.93 0.93 0.93
ViT 0.98 0.94 0.96 0.96 0.94 0.94 0.94
Table 7: Leukemia classification results across the three individual models with detailed metrics for evaluation
Model Model Performance
Train Acc Val Acc Test Acc Kappa Recall F1-score Precision
EfficientNetB3 1.0 1.0 1.0 0.99 0.99 0.99 0.99
ResNet50V2 1.0 1.0 1.0 0.99 1.0 1.0 1.0
ViT 0.99 0.99 0.99 0.99 1.0 1.0 1.0
Figure 8: Confusion matrices for the classification performance of the four models on the Leukemia data set: (a) Effi-
cientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and
ResNet50V2 models.
Table 8: Lymphoma classification results across the three individual models with detailed metrics for evaluation
Model Model Performance
Train Acc Val Acc Test Acc Kappa Recall F1-score Precision
EfficientNetB3 1.0 0.99 0.99 0.97 0.99 0.99 0.99
ResNet50V2 1.0 0.91 0.96 0.91 0.96 0.96 0.96
ViT 0.98 0.98 0.95 0.92 0.98 0.98 0.98
Table 9: HematoFusion ensemble model classification results across the three datasets, showing detailed evaluation metrics
Dataset Ensemble Model Performance
Best Acc Kappa Recall F1-score Precision
RBC 0.96 0.94 0.97 0.97 0.97
Leukemia 0.99 0.99 1.00 1.00 1.00
Lymphoma 0.96 0.95 0.97 0.97 0.97
When comparing the True Positives of the proposed HematoFusion model with the individual models, we can clearly observe an increase in the rates of correctly classified cases and a decrease in the misclassification rates.
The individual models struggled with predicting the Hypochromia class. The ensemble model, on the other hand, exhibited a stronger ability to recognize this class. Acanthocyte and Teardrop, in contrast, were easier to identify owing to their distinguishable shapes, which was reflected in the high number of TP.
Figure 9: Confusion matrices for the classification performance of the four models on the Lymphoma data set: (a) Effi-
cientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and
ResNet50V2 models.
In Table 6, a slight overfitting is observed due to the class imbalance; thus, accuracy alone can be misleading as a measurement of performance. This, however, was addressed with the use of precision, recall, and F1-score. Although EfficientNetB3 was slightly better on test accuracy (0.97), the ViT model outperformed it on stronger and more descriptive metrics, such as the Kappa score, precision, recall, and F1-score, which indicate a more balanced performance under class imbalance. Therefore, ViT is graded as the overall best performer on the RBC morphology dataset.
Table 7 shows more consistent results on the Leukemia data set across all models, achieving perfect classification, which indicates better generalization. Both ResNet50V2 and EfficientNetB3 achieved comparable top-tier performances on the Leukemia dataset, with identical test accuracy, precision, recall, and F1-score, and minor variations in the other evaluation metrics. EfficientNetB3, alternatively, outperforms the other models on the Lymphoma classification (Table 8), reaching almost perfect accuracies.
The ensemble model, HematoFusion, demonstrates more uniform results across all data sets in terms of all evaluation metrics, mitigating the issues with the class imbalance, as evidenced by its performance, leveraging the strengths of both the ViT and ResNet50V2 models that struggled with some classes. The precision improved by 4% on the RBC data set and reached a perfect 100% for Leukemia classification, while averaging the performances of the individual models on the Lymphoma data set with a precision of 97% on the test set. Despite the strong performance of our proposed solution, further improvements could be implemented to help the model generalize better and address the issue of class imbalance efficiently.

5.2 Comparative study

Table 10 presents a breakdown of the performance of the proposed solution across all three datasets, outlining the accuracy and precision of the model when compared to the literature.
Table 10: Comparative results of the proposed solution and the literature across different metrics for each data set
Dataset HematoFusion Literature
Accuracy Precision Accuracy Precision
RBC 0.96 0.97 0.98 0.97
Leukemia 0.99 1.0 0.99 0.99
Lymphoma 0.96 0.97 0.96 0.96
In a bid to substantiate the efficiency of our proposed solution, we evaluated it against the following models:
1. Literature [7], RBC classification: The authors presented a maximum-voting-based ensemble model to classify Dacrocyte (Teardrop), Schistocyte, and Elliptocyte (Cigar) cells in iron deficiency anemia. The average classification precision and accuracy of the latter reached a maximum of 97% and 98%, respectively. While both models achieved the same precision of 97%, the model in the literature reported a slightly higher accuracy (98%) compared to HematoFusion's 96%. Nonetheless, it is worth noting that our data set comprises 11 classes against the 3 classes studied in that article.
2. Literature [32], Leukemia classification: The authors proposed a ViT-CNN ensemble model for the diagnosis of Acute Lymphoblastic Leukemia (ALL), which is one of the 6 classes that we analyzed in our paper. Compared to the model in the literature, which achieved 99% accuracy and 99% precision on the Leukemia dataset, HematoFusion matched the accuracy (99%) but outperformed it in precision, achieving a perfect 100%.
3. Literature [41], Lymphoma classification: Malignant Lymphoma (ML) was addressed in this paper, and it is among the 3 classes that appear in our Lymphoma data set. The proposed hybrid model used the combined features of 3 deep learning networks, namely MobileNet-VGG16, VGG16-AlexNet, and MobileNet-AlexNet, classified by the XGBoost and DT algorithms, reaching an average accuracy and precision of 96%.
An extended version of this comparison, covering a broader range of SOTA models and datasets, is provided in Supplementary Material: Section 8 (Table S8).
Overall, our proposed HematoFusion ensemble model achieved a reliable performance across the 3 data sets, despite the imbalanced data and the high number of classes in the case of RBC Morphology classification.

5.3 Limitations

Although the reported results show high precision, reaching up to 99%, this should be interpreted with caution due to known issues like dataset imbalance. As identified previously, some classes were underrepresented, and this could result in biased learning as well as overfitting. To mitigate this, data augmentation techniques were employed (as outlined in Supplementary Section 2.2), and performance was monitored across a variety of metrics (precision, recall, F1-score, Cohen's Kappa, and AUC scores) rather than simply accuracy. However, we are aware that the lack of external validation data limits generalizability. Although high performance metrics are presented, the models have not been prospectively validated within a real clinical workflow. Their incorporation into clinical decision-making would require extensive regulatory testing and interpretability evaluation. Additionally, while conventional regularization techniques such as dropout and data augmentation were applied to address overfitting, we recognize the need for more advanced strategies. Future work will explore class-imbalance mitigation techniques such as SMOTE, GAN-based synthetic image generation, and uncertainty-aware training, beyond testing on independent cohorts, to further assess the robustness of the model in actual clinical settings. Furthermore, we intend to conduct ablation studies on ensemble weight parameters and data augmentation strategies to evaluate their individual contributions.

6 Conclusion

In this study, the problem of pathological blood cell classification was addressed through the use of novel deep-learning strategies. We curated a data set for RBC Morphology classification, consisting of samples from three different sources. The process involved preprocessing techniques to establish a data set aligned with our research objectives; 2 other data sets were acquired, targeted for Lymphoma and Leukemia classification separately.
Three distinct individual models were applied to each of the data sets: EfficientNetB3, ResNet50V2, and a pretrained ViT model. To leverage the strengths of both the CNN and ViT architectures, an ensemble model using the weighted average method was developed.
The present findings confirm that the proposed HematoFusion model mitigates the shortcomings of the individual models by enhancing the accuracy, precision, and sensitivity, achieving more consistent results across the three data sets. While HematoFusion demonstrates competitive or superior performance on Leukemia and Lymphoma classification, particularly in precision and F1-score, it performs
comparably on RBC classification, despite its higher number of classes and the issue of data imbalance that resulted in a few cases of overfitting. We additionally acknowledge certain limitations in predicting a couple of classes. These are the key components to overcome in future research. Future studies should also be devoted to covering more pathological blood disorders and implementing further processing and data augmentation to alleviate the issues of class imbalance and overfitting.
Overall, this paper provides a foundation for future developments by establishing baseline data that future researchers can expand upon to address the limited data available for RBC Morphology, and by combining the strengths of residual networks and vision transformers into a more robust framework.

References

[1] Yann LeCun et al. "Gradient-based learning applied to document recognition". In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. DOI: https://doi.org/10.1109/5.726791.
[2] K. S. Kim et al. "Analyzing blood cell image to distinguish its abnormalities". In: Proceedings of the Eighth ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2000, pp. 395–397. DOI: https://doi.org/10.1145/354384.354543.
[3] Jia Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009, pp. 248–255. DOI: https://doi.org/10.1109/CVPR.2009.5206848.
[4] Nikita Orlov et al. "Automatic classification of lymphoma images with transform-based global features". In: IEEE Transactions on Information Technology in Biomedicine 14 (2010), pp. 1003–1013. DOI: https://doi.org/10.1109/TITB.2010.2050695.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks". In: Advances in Neural Information Processing Systems. Ed. by F. Pereira et al. Vol. 25. Lake Tahoe, Nevada: Curran Associates, Inc., 2012, pp. 1097–1105.
[6] K. He et al. Deep Residual Learning for Image Recognition. Preprint at https://arxiv.org/abs/1512.03385. 2015.
[7] Mahsa Lotfi et al. "The detection of dacrocyte, schistocyte and elliptocyte cells in iron deficiency anemia". In: 2015 2nd International Conference on Pattern Recognition and Image Analysis (IPRIA). Rasht, Iran: IEEE, 2015, pp. 1–5. DOI: https://doi.org/10.1109/PRIA.2015.7161628.
[8] O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. Preprint at https://arxiv.org/abs/1409.0575. 2015.
[9] J. C. Chapin and M. T. Desancho. "Hematologic dysfunction in the ICU". In: Critical Care. Ed. by J. M. Oropello, S. M. Pastores, and V. Kvetan. New York: McGraw-Hill Education, 2016.
[10] Kaiming He et al. Identity Mappings in Deep Residual Networks. Preprint at http://arxiv.org/abs/1603.05027. 2016.
[11] Kenneth Kaushansky et al. Williams Hematology. New York: McGraw-Hill Education, 2016.
[12] Gao Huang et al. "Densely connected convolutional networks". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017, pp. 4700–4708. DOI: https://doi.org/10.48550/arXiv.1608.06993.
[13] Yiyue Jiang et al. "Label-free detection of aggregated platelets in blood by machine-learning-aided optofluidic time-stretch microscopy". In: Lab on a Chip 17.14 (2017), pp. 2426–2434. DOI: https://doi.org/10.1039/C7LC00396J.
[14] Mazin Z. Othman, Thabit S. Mohammed, and Alaa B. Ali. "Neural network classification of white blood cell using microscopic images". In: International Journal of Advanced Computer Science and Applications 8.5 (2017), pp. 99–103. DOI: https://doi.org/10.14569/IJACSA.2017.080513.
[15] Mohammad Fadly Syahputra, Anita Ratna Sari, and Romi Fadillah Rahmat. "Abnormality classification on the shape of red blood cells using radial basis function network". In: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT). Kuta Bali, Indonesia: IEEE, 2017, pp. 1–5. DOI: https://doi.org/10.1109/CAIPT.2017.8320739.
[16] Ashish Vaswani et al. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017). DOI: https://doi.org/10.48550/arXiv.1706.03762.
[17] Hajara Abdulkarim Aliyu et al. "Red blood cell classification: deep learning architecture versus support vector machine". In: 2018 2nd International Conference on Biosignal Analysis, Processing and Systems (ICBAPS). Kuching, Malaysia: IEEE, 2018, pp. 142–147. DOI: https://doi.org/10.1109/ICBAPS.2018.8527398.
[18] Paul Mooney. Blood Cell Images. 2018. URL: https://www.kaggle.com/datasets/paultimothymooney/blood-cells.
[19] Mariam Nassar et al. "Label-free identification of white blood cells using machine learning". In: Cytometry Part A 95.8 (2019), pp. 836–842. DOI: https://doi.org/10.1002/cyto.a.23794.
[20] N. C. Shenggan. BCCD Dataset. https://github.com/Shenggan/BCCD_Dataset. 2019.
[21] Mingxing Tan and Quoc V. Le. "EfficientNet: Rethinking model scaling for convolutional neural networks". In: Proceedings of the 36th International Conference on Machine Learning. Vol. 97. Long Beach, California: PMLR, 2019, pp. 6105–6114. DOI: https://doi.org/10.48550/arXiv.1905.11946.
[22] Laith Alzubaidi et al. "Classification of red blood cells in sickle cell anemia using deep convolutional neural network". In: Intelligent Systems Design and Applications. Ed. by Ajith Abraham et al. Vol. 1. Cham: Springer International Publishing, 2020, pp. 6–8. DOI: https://doi.org/10.1007/978-3-030-16657-1_51.
[23] Yasmin M. Kassim et al. "Clustering-based dual deep learning architecture for detecting red blood cells in malaria diagnostic smears". In: IEEE Journal of Biomedical and Health Informatics 25.5 (2020), pp. 1735–1746. DOI: https://doi.org/10.1109/JBHI.2020.3034863.
[24] Tatiana Cristina Figueira Polo and Hélio Amante Miot. Use of ROC curves in clinical and experimental studies. 2020. DOI: https://doi.org/10.1590/1677-5449.200186.
[25] Thomas Wolf et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing. 2020. arXiv: 1910.03771 [cs.CL]. URL: https://arxiv.org/abs/1910.03771.
[26] Yassine Barhoumi and Ghulam Rasool. Scopeformer: n-CNN-ViT hybrid model for intracranial hemorrhage classification. 2021. DOI: https://doi.org/10.48550/arXiv.2107.04575.
[27] Ferhat Bozkurt. "Classification of blood cells from blood cell images using dense convolutional network". In: Journal of Science, Technology and Engineering Research 2.2 (2021), pp. 81–88. DOI: https://doi.org/10.53525/jster.1014186.
[28] Alexey Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Preprint at https://arxiv.org/abs/2010.11929. 2021.
[29] Mawaddah Harahap et al. "Implementation of Convolutional Neural Network in the classification of red blood cells have affected of malaria". In: Sinkron: jurnal dan penelitian teknik informatika 5.2 (2021), pp. 199–207. DOI: https://doi.org/10.33395/sinkron.v5i2.10713.
[30] Danish Jamil et al. "Diagnosis of gastric cancer using machine learning techniques in healthcare sector: a survey". In: Informatica 45.7 (2021). DOI: https://doi.org/10.31449/inf.v45i7.3633.
[31] JGraph. diagrams.net, draw.io. Oct. 2021. URL: https://www.diagrams.net/.
[32] Zhencun Jiang et al. "Method for diagnosis of acute lymphoblastic leukemia based on ViT-CNN ensemble model". In: Computational Intelligence and Neuroscience 2021.1 (2021), p. 7529893. DOI: https://doi.org/10.1155/2021/7529893.
[33] Korranat Naruenatthanaset et al. Red Blood Cell Segmentation with Overlapping Cell Separation and Classification on Imbalanced Dataset. Preprint at https://arxiv.org/abs/2012.01321. 2021.
[34] Maithra Raghu et al. "Do vision transformers see like convolutional neural networks?" In: Advances in Neural Information Processing Systems 34 (2021), pp. 12116–12128. DOI: https://doi.org/10.48550/arXiv.2108.08810.
[35] Georg Steinbuss et al. "Deep learning for the classification of non-Hodgkin lymphoma on histopathological images". In: Cancers 13.10 (2021), p. 2419. DOI: https://doi.org/10.3390/cancers13102419.
[36] Željko Vujović et al. "Classification model evaluation metrics". In: International Journal of Advanced Computer Science and Applications 12.6 (2021), pp. 599–606. DOI: https://doi.org/10.14569/IJACSA.2021.0120670.
[37] Xufeng Yao et al. "Classification of white blood cells using weighted optimized deformable convolutional neural networks". In: Artificial Cells, Nanomedicine, and Biotechnology 49.1 (2021), pp. 147–155. DOI: https://doi.org/10.1080/21691401.2021.1879823.
[38] Kai Jiang et al. "The encoding method of position embeddings in vision transformer". In: Journal of Visual Communication and Image Representation 89 (2022), p. 103664. DOI: https://doi.org/10.1016/j.jvcir.2022.103664.
[39] Zahra Mousavi Kouzehkanan et al. "A large dataset of white blood cells containing cell locations and types, along with segmented nuclei and cytoplasm". In: Scientific Reports 12.1 (2022), p. 1123. DOI: https://doi.org/10.1038/s41598-021-04426-x.
[40] Dyah Aruming Tyas et al. "Erythrocyte (red blood cell) dataset in thalassemia case". In: Data in Brief 41 (2022), p. 107886. DOI: https://doi.org/10.1016/j.dib.2022.107886.
[41] Mohammed Hamdi et al. "Hybrid models based on fusion features of a CNN and handcrafted features for accurate histopathological image analysis for diagnosing malignant lymphomas". In: Diagnostics 13.13 (2023), p. 2258. DOI: https://doi.org/10.3390/diagnostics13132258.
[42] Rojina Kashefi et al. Explainability of Vision Transformers: A Comprehensive Review and New Perspectives. Preprint at https://arxiv.org/abs/2311.06786. 2023.
[43] Gireen Naidu, Tranos Zuva, and Elias Mmbongeni Sibanda. "A review of evaluation metrics in machine learning algorithms". In: Computer Science On-line Conference. Springer, 2023, pp. 15–25. DOI: https://doi.org/10.1007/978-3-031-35314-7_2.
[44] Austin H. Routt et al. "Deep ensemble learning enables highly accurate classification of stored red blood cell morphology". In: Scientific Reports 13.1 (2023), p. 3152. DOI: https://doi.org/10.1038/s41598-023-30214-w.
[45] Hongwei Shang et al. "Precision/recall on imbalanced test data". In: International Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 9879–9891. URL: https://proceedings.mlr.press/v206/shang23a.html.
[46] Enquan Yang et al. "DRNet: Dual-stage refinement network with boundary inference for RGB-D semantic segmentation of indoor scenes". In: Engineering Applications of Artificial Intelligence 125 (2023), p. 106729. ISSN: 0952-1976. DOI: https://doi.org/10.1016/j.engappai.2023.106729.
[47] Areen K. Al-Bashir, Ruba E. Khnouf, and Lamis R. Bany Issa. "Leukemia classification using different CNN-based algorithms: comparative study". In: Neural Computing and Applications 36.16 (2024), pp. 9313–9328. DOI: https://doi.org/10.1007/s00521-024-09554-9.
[48] Martin Moller. "Efficient training of feed-forward neural networks". In: Neural Network Analysis, Architectures and Applications. CRC Press, 2024, pp. 136–173. DOI: https://doi.org/10.1201/9781003572886-8.
[49] Emine Özgür and Ahmet Saygılı. "A new approach for automatic classification of non-Hodgkin lymphoma using deep learning and classical learning methods on histopathological images". In: Neural Computing and Applications 36.32 (2024), pp. 20537–20560. DOI: https://doi.org/10.1007/s00521-024-10229-8.
[50] Sajida Perveen et al. "A framework for early detection of acute lymphoblastic leukemia and its subtypes from peripheral blood smear images using deep ensemble learning technique". In: IEEE Access 12 (2024), pp. 29252–29268. DOI: https://doi.org/10.1109/ACCESS.2024.3368031.
[51] Prakeerth Prasad and Jani Anbarasi L. "Acute lymphoblastic leukemia subtypes detection using Vision Transformer model". In: 2024 5th International Conference on Data Intelligence and Cognitive Informatics (ICDICI). 2024, pp. 1413–1418. DOI: https://doi.org/10.1109/ICDICI62993.2024.10810888.
[52] Ruaa Sadoon and Adala Chaid. "Classification of pulmonary diseases using a deep learning stacking ensemble model". In: Informatica 48.14 (2024). DOI: https://doi.org/10.31449/inf.v48i14.6145.
[53] Umair Saeed et al. "DeepLeukNet—A CNN based microscopy adaptation model for acute lymphoblastic leukemia classification". In: Multimedia Tools and Applications 83.7 (2024), pp. 21019–21043. DOI: https://doi.org/10.1007/s11042-023-16191-2.
[54] Md Shahin Ali et al. "A hybrid VGG16-ViT approach with image processing techniques for improved white blood cell classification and disease diagnosis: A retrospective study". In: Health Science Reports 8.6 (2025), e70859. DOI: https://doi.org/10.1002/hsr2.70859.
[55] Sazzli Kasim et al. "Multiclass leukemia cell classification using hybrid deep learning and machine learning with CNN-based feature extraction". In: Scientific Reports 15.1 (2025), p. 23782. DOI: https://doi.org/10.1038/s41598-025-05585-x.
[56] Aniel Mahendren et al. "White blood cells classification: A feature-based transfer learning approach". In: Selected Proceedings from the 2nd International Conference on Intelligent Manufacturing and Robotics, ICIMR 2024, 22-23 August, Suzhou, China. Ed. by Wei Chen et al. Singapore: Springer Nature Singapore, 2025, pp. 757–763. ISBN: 978-981-96-3949-6. DOI: https://doi.org/10.1007/978-981-96-3949-6_63.
[57] Mouna Saadallah. Red Blood Cell Morphology Dataset for Image Classification. Zenodo, Feb. 2025. DOI: https://doi.org/10.5281/14936017. URL: https://zenodo.org/records/14936017.
[58] Vera Sorin et al. "Deep learning applications in lymphoma imaging". In: Acta Haematologica (2025). DOI: https://doi.org/10.1159/000547427.
[59] K. P. Swain, S. K. Swain, and S. R. Nayak. "Vision Transformer-based automated classification of acute lymphoblastic leukemia". In: 2025 International Conference on Emerging Systems and Intelligent Computing (ESIC). IEEE, 2025, pp. 584–588. DOI: https://doi.org/10.1109/ESIC64052.2025.10962707.
[60] Vishesh Tanwar et al. "Enhancing blood cell diagnosis using hybrid residual and dual block transformer network". In: Bioengineering 12.2 (2025), p. 98. DOI: https://doi.org/10.3390/bioengineering12020098.
https://doi.org/10.31449/inf.v49i16.10050 Informatica 49 (2025) 417–428 417
Deep Learning and Rule-Based Hybrid Model for Enhanced English
Composition Scoring Using Attention Mechanisms and Graph
Convolutional Networks
Ruimin Li
Zhoukou Vocational and Technical College, Zhoukou 466000, China
E-mail: laogui9029@126.com
Keywords: English essay grading, deep learning, artificial rules, graph convolutional network, wide&deep architecture
Technical paper
Received: July 8, 2025
Despite extensive exploration of AI technology in the field of education, early automatic scoring systems for English compositions suffer from problems such as a high misjudgment rate and low efficiency.
To improve the efficiency, accuracy, and stability of the English composition grading model, a deep
learning and manual rule-based English composition grading model was designed. The research extracted
sequence features by introducing attention mechanisms, enhancing contextual correlation analysis, and
aggregating global features through graph convolutional networks to extract high-order semantic
relationships. Finally, a visual manual scoring rule was designed, which integrated deep semantic features
and manual rule features through the Wide&Deep architecture to jointly optimize the scoring results. The
experiment outcomes indicated that the area under the precision–recall curve of the research method was 92.3%. In
practical application testing, the highest group stability index of the research method was 0.07 in June.
When faced with 600 concurrent requests, the average response time of the research method reached a
stable value of 3.4 seconds. The outcomes above demonstrated that the English essay scoring model,
which combines deep learning with manual rules as proposed by the research, exhibited excellent
accuracy, speed, and stability. It effectively addressed the issues of a high misjudgment rate and low
efficiency found in traditional scoring systems, thereby enhancing the model's reliability.
Povzetek: Razvit je hibridni model za ocenjevanje angleških esejev, ki združuje globoko učenje z ročnimi
pravili. Z Word2Vec, mehanizmom pozornosti in GCN zajame lokalne ter globalne semantike, Wide&Deep
pa združi pravila in značilke.
1 Introduction

English writing ability is one of the core indicators of language learning, and traditional manual scoring methods face bottlenecks such as low efficiency and strong subjectivity [1, 2]. Early automatic scoring systems mainly relied on rule-based methods to detect surface errors through pre-defined grammar and spelling rules, but it was difficult to evaluate the quality of content and logic, resulting in a high rate of misjudgment [3]. With the advancement of technology, machine learning (ML) algorithms have been introduced to comprehensively consider vocabulary, syntax, and other elements through feature engineering. However, a substantial quantity of annotated data support is still needed, and the generalization ability is insufficient [4]. The existing scoring systems cannot meet the automatic scoring requirements for English compositions, and there is an urgent need for a stable, efficient, and accurate scoring model. Deep learning (DL) models can improve semantic understanding through end-to-end learning, but they lack transparency and find it difficult to capture grammatical details. Artificial rules have a high degree of interpretability, but cannot adapt to open content evaluation. The two methods complement each other in advantages [5]. In light of the preceding circumstances, to ensure the stability, accuracy, and efficiency of the scoring model, an innovative English composition scoring model based on DL and artificial rules has been designed. The research uses the Word2Vec model to convert essay text into a matrix of word vectors, capturing the semantic information of vocabulary. It introduces an attention mechanism and a graph convolutional network to extract local sequence features and semantic graph features, concatenating the two features to generate deep semantic features and constructing a graph adjacency matrix to dynamically capture the relationships between sentences. Then, artificial rule features are generated through feature concatenation, and the Wide&Deep architecture is used to fuse deep semantic features with artificial rule features. Finally, combining multi-dimensional manual rule evaluation, the research achieves dynamic comprehensive scoring of the entire English composition. It is anticipated that the research methodology will offer a theoretical foundation for grading essays in different languages.
2 Related works support systems was determined during the research
process. The outcomes revealed that the research method
English composition grading is an important part of the
could effectively enhance decision-making ability in the
educational evaluation system, playing a crucial role in
context of supply chain [15].
achieving teaching objectives and optimizing teaching
In summary, existing research has played a good role
strategies. Ramesh et al. proposed AI and ML techniques
in the technological advancement of English composition
for evaluating automatic paper grading in response to
grading models, but it still has limitations such as low
issues such as time-consuming manual assessments and
grading efficiency and significant subjective differences.
lack of reliability in the education system. During the
The automatic scoring model based on DL can extract
research process, the limitations and research trends of the
multi-level information such as linguistic features and
current study were analyzed. The outcomes revealed that
semantic information, which can simulate the process of
the research method had a good effect [6]. Fokides et al.
manual scoring to a certain extent, while manual rules can
compared the accuracy and qualitative aspects of the
handle complex grammar rules and subtle semantic
corrections and feedback generated by ChatGPT with
differences. Therefore, based on this, a DL and artificial
educators regarding the effectiveness of ChatGPT on
rule-based English composition grading model was
elementary school students' essays written in English. The outcomes revealed that ChatGPT surpassed educators in regard to both the volume and the caliber of output [7]. Shahzad et al. proposed using random forests as classifiers for off-topic paper detection to address the prediction problem of whether an article deviates from the topic. The outcomes revealed that the research method had high accuracy [8]. Erturk et al. pointed out the low reliability and effectiveness of essay style evaluation tools, and believed that the system's decrease in paper scores was related to boredom in the labeling. The outcomes revealed that higher levels of boredom were correlated with lower scores [9]. Sharma et al. proposed a system that combines handwriting recognition models and automatic paper grading to address the time-consuming issue of grading handwritten papers in educational environments. During the research process, the performance of downstream tasks in paper scoring was analyzed based on Transformer context embedding. The outcomes revealed that the research method had good performance [10].
Many scholars both within the country and abroad have carried out profound investigations and applications of Word2Vec and artificial rules. Mohammed et al. conducted an exhaustive examination of diverse approaches within the realm of ensemble learning to address the issue of time-consuming hyperparameter tuning in DL. Various features or factors that affect the success of integration methods were explained during the research process. The outcomes revealed that the research method could provide accurate theoretical support [11]. Tropsha et al. proposed a "deep quantitative structure-activity relationship" model for virtual screening of molecular databases. The outcomes revealed that the research method had a good effect [12]. Whang et al. proposed a fairness measure and unfairness mitigation technique to address the issues of bias and unfairness in traditional data management. The outcomes revealed that the research method had good data management performance [13]. Pereira et al. proposed an ML system for multi-animal pose tracking to address the challenge of using DL and computer vision techniques to study the social behavior of multiple animals in natural environments. The outcomes revealed that the research method had good efficiency and accuracy [14]. Olan et al. designed an explanatory algorithm to address the impact of AI on the decision-making process in the supply chain field, examining the composition of interpretable AI and decision support [15].
Against this background, an English composition scoring model that combines DL with artificial rules is designed. The goal is to align with the design standards for automated English composition grading and to significantly boost both the accuracy and efficiency of the grading workflow.

3 Design of English composition scoring model

3.1 Intelligent English composition scoring model based on deep semantic text features

As an important part of the education evaluation system, English composition grading has undergone an evolution from traditional manual grading to automated grading. However, existing automated grading systems are mostly based on shallow text features, resulting in significant errors in their grading results [16, 17]. DL models can effectively improve the accuracy and reliability of English composition grading from three aspects: feature extraction, semantic understanding, and grading prediction [18]. The study converts the original English composition text into a numerical word embedding matrix, and the text-to-word-embedding conversion formula is shown in Equation (1).

E = \mathrm{Embedding}(X) \quad (1)

In Equation (1), X represents the input English composition text sequence, E represents the word embedding matrix, and Embedding() represents a DL embedding function. Next, the research investigates the use of the Word2Vec learning model to map each word to a high-dimensional space, capturing the semantic and positional information of the word. The Word2Vec learning model has two training modes: continuous bag-of-words and skip-gram. The frameworks of the two models are presented in Figure 1.
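To make the two training modes concrete, the following is a minimal sketch (not the authors' code) of how the continuous bag-of-words and skip-gram variants could be trained with the gensim library. The toy corpus and tokenization are illustrative assumptions; the 300-dimensional embeddings, window size of 5, and negative sampling follow the configuration reported in the implementation details of the framework.

```python
from gensim.models import Word2Vec

# Toy tokenized corpus standing in for the English composition texts (illustrative only).
sentences = [
    ["the", "student", "writes", "a", "short", "essay"],
    ["the", "essay", "receives", "a", "high", "score"],
]

# Continuous bag-of-words: predict the centre word from its context (sg=0).
cbow = Word2Vec(sentences, vector_size=300, window=5, sg=0,
                negative=5, min_count=1, epochs=50)

# Skip-gram: predict the context words from the centre word (sg=1).
skipgram = Word2Vec(sentences, vector_size=300, window=5, sg=1,
                    negative=5, min_count=1, epochs=50)

# Each word is mapped to a 300-dimensional vector (Equation (1): E = Embedding(X)).
print(cbow.wv["essay"].shape)                 # (300,)
print(skipgram.wv.most_similar("essay", topn=3))
```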
Figure 1: Framework diagram of the continuous bag-of-words model and the skip-gram model. (a) Continuous bag-of-words model; (b) Skip-gram model.

As shown in Figure 1, the training process of both the continuous bag-of-words model and the skip-gram model goes through the input layer and the mapping layer, and finally outputs the results from the output layer. However, the continuous bag-of-words model aggregates and maps multiple features and then outputs the result, while the skip-gram model maps the features and performs classification output. The study combines the continuous bag-of-words model and the skip-gram model to train and detect the sequence features and semantic graph features of English compositions, and then scores the English compositions based on the detection results. In the process of extracting sequence features from English compositions, in order to break through the sequence limitations of DL models, a self-attention mechanism is introduced, and its function expression is shown in Equation (2).

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (2)

In Equation (2), Q, K, and V represent the query matrix, key matrix, and value matrix, respectively, d_k is the dimension of the key or query vector, and softmax() represents the normalization function. To enhance the model's ability to express complex sequence patterns, a multi-head attention mechanism is introduced, and its calculation formula is shown in Equation (3).

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(XW_i^{Q}, XW_i^{K}, XW_i^{V}) \quad (3)

In Equation (3), h represents the number of attention heads, head_i is the i-th attention head, XW_i^Q, XW_i^K, and XW_i^V represent the projections of the i-th head's query, key, and value vectors, W^O represents the output fusion matrix, and Concat() represents the concatenation operation. The number of attention heads is 8, which was determined by a GPU video-memory optimization test. In the process of extracting semantic graph features from English compositions, in order to dynamically capture the relationships within sentences, a semantic graph adjacency matrix is constructed, and its construction formula is shown in Equation (4).

A = \mathrm{softmax}\left(\frac{EE^{T}}{\sqrt{D}}\right) \quad (4)

In Equation (4), A represents the adjacency matrix, E^T represents the transpose of the word embedding matrix E, and D is the embedding dimension. Continuing with the study of iteratively updating node features to capture higher-order relationships in semantic graphs, the graph convolution feature propagation formula is shown in Equation (5).

H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} \Theta^{(l)}\right) \quad (5)

In Equation (5), H^{(l)} represents the node feature matrix of the l-th layer, D^{-1/2} is used for normalization, Θ^{(l)} represents the learnable weight matrix, and σ is the activation function, which introduces nonlinearity to enhance the model's expressive power. Finally, the study integrates all node features and aggregates them into a graph-level feature vector to represent the global semantics of the entire English composition. The graph-level feature aggregation formula is shown in Equation (6).

z = \sum_{i=1}^{N} \alpha_i h_i^{(L)}, \quad \alpha_i = \frac{\exp\left(w \cdot h_i^{(L)}\right)}{\sum_{j}\exp\left(w \cdot h_j^{(L)}\right)} \quad (6)

In Equation (6), h_i^{(L)} is the feature vector of the i-th node, L is the total number of layers, z is the graph-level feature vector representing the semantic summary of the entire text, α_i is the attention weight representing the importance of node i to the global features, w is the learnable weight vector used to calculate attention scores, and N is the number of nodes or words. In summary, the detection model structure that integrates the sequence features of English compositions with the semantic graph features of English compositions is shown in Figure 2.
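The following PyTorch fragment is a minimal, self-contained sketch (not the published implementation) of the operations defined in Equations (2), (4), (5) and (6): scaled dot-product self-attention over the word-embedding matrix, construction of the softmax-normalized adjacency matrix, one step of graph-convolution propagation, and attention pooling into a graph-level vector. Tensor shapes and the hidden size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, H = 12, 300, 128          # words per essay, embedding dim, hidden dim (illustrative)
E = torch.randn(N, D)           # word embedding matrix E from Equation (1)

# Equation (2): self-attention with Q = K = V = E (single head for brevity).
W_q, W_k, W_v = (torch.randn(D, D) for _ in range(3))
Q, K, V = E @ W_q, E @ W_k, E @ W_v
attn = F.softmax(Q @ K.T / K.shape[-1] ** 0.5, dim=-1)
h_seq_tokens = attn @ V                      # sequence features per token

# Equation (4): semantic-graph adjacency matrix from embedding similarities.
A = F.softmax(E @ E.T / D ** 0.5, dim=-1)

# Equation (5): one layer of graph-convolution propagation with degree-based normalisation
# (the softmax rows of A already sum to one, so the normalisation is trivial here).
deg = A.sum(dim=-1)
D_inv_sqrt = torch.diag(deg.pow(-0.5))
Theta = torch.randn(D, H)
H1 = torch.relu(D_inv_sqrt @ A @ D_inv_sqrt @ E @ Theta)   # node features of the next layer

# Equation (6): attention pooling of node features into a graph-level vector z.
w = torch.randn(H)
alpha = F.softmax(H1 @ w, dim=0)
z = (alpha.unsqueeze(-1) * H1).sum(dim=0)
print(h_seq_tokens.shape, H1.shape, z.shape)
```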
Figure 2: Detection model integrating sequence and graph features of English compositions.
Figure 3: Intelligent scoring model for English compositions based on deep semantic text features.
As shown in Figure 2, the detection model that integrates English composition sequence features and semantic graph features receives two types of input data, and the semantic graph captures the semantic relationships between phrases and concepts. The semantic graph and the English composition are then processed by Word2Vec, which converts discrete words into dense, low-dimensional, real-valued vectors. Next, semantic graph features are extracted through graph convolutional networks, while sequence features are extracted by introducing attention mechanisms. Subsequently, feature fusion is performed, and the sequence feature vectors and graph feature vectors extracted from the two parallel paths are concatenated to form deep semantic features. Finally, the model outputs the rating result. The fusion detection model overcomes the limited representation ability of a single feature by fusing two complementary feature representations. The deep semantic text feature expression of the fusion model is shown in Equation (7).

h_{deep} = h_{seq} \,\|\, h_{graph} \quad (7)

In Equation (7), h_deep represents the deep semantic features of the English composition, ‖ is the vector concatenation symbol, and h_seq and h_graph respectively represent the sequence features and graph features of the English composition. In summary, the intelligent scoring model for English compositions based on deep semantic text features is shown in Figure 3.

As shown in Figure 3, the English composition scoring model based on deep semantic text features achieves accurate evaluation by integrating multi-level semantic information. The model first performs structured parsing on the input English essay document, breaks down the title sequence to highlight the article structure, and preserves contextual information through node feature integration. In the feature extraction stage, multimodal technology is used to deeply fuse semantic information. On the one hand, the title sequence is embedded with Word2Vec and local sequence features are extracted through the self-attention mechanism. On the other hand, semantic graph nodes model global semantic relationships through graph convolutional networks. The two types of features are further combined with image features to form a unified deep semantic feature vector. Finally, the rater performs regression analysis based on the deep semantic features and outputs objective scoring results.

In summary, the implementation details of the entire research framework are as follows: (1) Word2Vec is used to convert English essay texts into dense word vector matrices. The continuous bag-of-words model predicts core words through contextual word prediction. The input layer aggregates multiple contextual word vectors, while the mapping layer summarizes them to output core word probabilities. The skip-gram model predicts contextual words based on core words. Both models undergo
negative sampling optimization, with 300-dimensional embeddings and a contextual window size of 5. (2) During attention mechanism feature extraction, the input word vector matrix is linearly transformed to generate query matrices, key matrices, and value matrices, each with 64 dimensions. The multi-head architecture employs 8 heads, where each head independently computes attention, the outputs are concatenated and linearly fused, and the final sequence features are generated. (3) In graph convolutional network semantic feature extraction, the adjacency matrix embedding dimension is 300. The feature propagation and aggregation process learns a 128-dimensional weight matrix across 2 layers.

3.2 Intelligent English composition scoring model combined with artificial rules

Although the English composition grading model based on deep semantic text features can effectively grade English compositions, it generally relies on manually defined grading templates, and candidates can avoid deduction types through simple writing techniques, so the model lacks interpretability [19]. In the field of composition checking, artificial rules are usually expressed in formal language and automatically detected through natural language processing tools. An English scoring model combined with artificial rules can effectively address the lack of interpretability in DL models, so further research is needed to introduce artificial rules [20]. Based on manual rules that quantify the basic language quality of sentences, the basic formula for scoring errors in English compositions is shown in Equation (8).

E_s = \frac{F}{1 + \lambda\left(C_{spell} + C_{gram}\right)} \quad (8)

In Equation (8), E_s is the error score of the sentence, with a maximum score of F, C_spell is the number of spelling errors, C_gram is the number of grammar errors, and λ is the error penalty coefficient; its value is set to 0.1, at which the error rate is lowest, as verified through grid search. Continuing with the study of balancing the importance of each dimension through artificial rules, the formula for weighting the multidimensional excellence of sentences is shown in Equation (9).

Q_s = \omega_1 V_s + \omega_2 G_s + \omega_3 T_s + \omega_4 P_s \quad (9)

In Equation (9), Q_s represents the overall excellence score of the sentence, V_s represents the vocabulary score, G_s represents the syntactic complexity score, T_s represents the part-of-speech diversity score, P_s represents the rhetoric score, and ω_i represents the artificial rule weight. Then, to evaluate the logical rigor of English essay paragraphs, a scoring formula for paragraph cohesion strength is introduced, and its specific expression is shown in Equation (10).

C_p = \frac{\sum_{k=1}^{n} \omega_k I(\mathrm{conn}_k)}{N} \cdot \frac{R_{cohere}}{1 + \log(L)} \quad (10)

In Equation (10), C_p represents the coherence score of the paragraph, I(conn_k) represents the validity indicator function of the k-th connector, ω_k represents the weight of the connector, R_cohere represents the semantic coherence ratio, N represents the number of sentences, and L represents the length of the paragraph. Finally, the study aims to achieve dynamic comprehensive scoring of the entire English composition through multi-dimensional manual rule evaluation. The scoring formula is shown in Equation (11).

\mathrm{Score} = \alpha \frac{\sum_{j=1}^{m} Q_{s_j}}{m} + \beta \frac{\sum_{p=1}^{l} C_p}{l} + \gamma\left(\mathrm{Sim}_{content} + \mathrm{Sim}_{str}\right) \quad (11)

In Equation (11), Score represents the final score of the composition, Q_{s_j} represents the excellence score of the j-th sentence, C_p represents the coherence score of the p-th paragraph, Sim_content and Sim_str represent the similarity of content and structure, and α, β, γ satisfy the requirement α + β + γ = 1. The artificial rules are constructed based on expert knowledge, employing a method that quantifies sentence-level errors and sentence excellence through predefined weights to achieve digital transformation. The primary linguistic features targeted include surface errors, sentence-level errors, and paragraph-level errors. By integrating deep semantic features through a Wide&Deep architecture, the rules enhance interpretability while capturing subtle errors and reducing subjective variations. Experimental validation demonstrates their effectiveness in lowering bias values and misjudgment rates, as well as improving scoring stability. In summary, the feature extraction framework for the manual scoring rules of English compositions is shown in Figure 4.
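As a concrete illustration of how the rule formulas above could be evaluated, the short Python sketch below implements Equation (8) for sentence error scoring and the weighted combinations of Equations (9) and (11). All weights, sub-scores, and error counts are illustrative placeholders rather than the calibrated values used in the study; only the penalty coefficient 0.1 is taken from the text.

```python
import math

def error_score(c_spell: int, c_gram: int, f_max: float = 5.0, lam: float = 0.1) -> float:
    """Equation (8): sentence error score, penalised by spelling and grammar error counts."""
    return f_max / (1.0 + lam * (c_spell + c_gram))

def sentence_excellence(v, g, t, p, weights=(0.3, 0.3, 0.2, 0.2)) -> float:
    """Equation (9): weighted multi-dimensional excellence of a sentence (weights illustrative)."""
    w1, w2, w3, w4 = weights
    return w1 * v + w2 * g + w3 * t + w4 * p

def final_score(sentence_scores, paragraph_scores, sim_content, sim_str,
                alpha=0.5, beta=0.3, gamma=0.2) -> float:
    """Equation (11): dynamic comprehensive score with alpha + beta + gamma = 1."""
    assert math.isclose(alpha + beta + gamma, 1.0)
    q_avg = sum(sentence_scores) / len(sentence_scores)
    c_avg = sum(paragraph_scores) / len(paragraph_scores)
    return alpha * q_avg + beta * c_avg + gamma * (sim_content + sim_str)

# Example: two sentences, one paragraph, illustrative sub-scores in [0, 1].
q = [sentence_excellence(0.8, 0.7, 0.6, 0.5), sentence_excellence(0.6, 0.5, 0.7, 0.4)]
print(error_score(c_spell=1, c_gram=2), final_score(q, [0.75], 0.8, 0.7))
```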
Figure 4: Feature extraction of artificial rules for English compositions.
Figure 5: Network structure of English composition error detection combined with artificial rules.
As shown in Figure 4, in the feature extraction framework of manual scoring rules for English compositions, structured manual scoring rules are input together with the original English composition text as initial data. The manual rules are then decomposed into different types of errors, and each type of rule is quantified as a numerical vector to achieve the digital transformation of expert knowledge. Next, the artificial rule vector is concatenated with the semantic vector of the composition text to form a mixed feature that combines both artificial rules and text semantics. Finally, after processing, the features of the manual scoring rules for English compositions are output. The study aims to achieve the organic integration of artificial rules and DL models by converting discrete artificial rules into continuous features. The specific expression is shown in Equation (12).

h_{expert} = \sigma\left(W \cdot \big\|_{i \in v_{rule}}\left(x_i\right) + b\right) \quad (12)

In Equation (12), h_expert represents the artificial rule feature, v_rule represents the set of error types, x_i represents the artificial rule vector for the i-th error type, and b represents the bias term. Next, the study uses the Wide&Deep structure to fuse the shallow features of the artificial rules with the deep semantic text features, achieving the final error classification prediction. The fusion formula is shown in Equation (13).

y = \mathrm{Softmax}\left(W_{wide} h_{expert} + W_{deep} h_{deep} + b\right) \quad (13)

In Equation (13), y represents the rating result, and W_wide and W_deep represent the weight matrices of the wide and deep parts, respectively.
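To illustrate the fusion step in Equations (12) and (13), the following PyTorch sketch (a toy under assumed dimensions, not the authors' network) maps an artificial-rule vector to the rule feature h_expert and combines it with the deep semantic feature h_deep through wide and deep linear heads followed by a softmax over the rating classes. The feature sizes and number of classes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WideDeepScorer(nn.Module):
    def __init__(self, rule_dim=20, deep_dim=256, expert_dim=32, num_classes=5):
        super().__init__()
        self.rule_proj = nn.Linear(rule_dim, expert_dim)      # Equation (12): h_expert = sigma(W x + b)
        self.wide_head = nn.Linear(expert_dim, num_classes)   # W_wide in Equation (13)
        self.deep_head = nn.Linear(deep_dim, num_classes)     # W_deep in Equation (13)

    def forward(self, rule_vec, h_deep):
        h_expert = torch.sigmoid(self.rule_proj(rule_vec))
        logits = self.wide_head(h_expert) + self.deep_head(h_deep)
        return torch.softmax(logits, dim=-1)                  # Equation (13): rating distribution y

# Illustrative batch: 4 essays, 20 rule dimensions, 256-dimensional deep semantic features.
model = WideDeepScorer()
rule_vec = torch.rand(4, 20)
h_deep = torch.randn(4, 256)
print(model(rule_vec, h_deep).shape)   # torch.Size([4, 5])
```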
In summary, the network structure of English composition error detection combined with manual rules is shown in Figure 5.

As shown in Figure 5, the English composition error detection network combined with manual rules improves detection accuracy and interpretability through dual-channel feature fusion. The model receives dual-source inputs: the manual scoring rules are decomposed and vectorized into quantifiable rule vectors, covering error types such as grammar, logic, and rhetoric. The model synchronously constructs semantic maps for the original English compositions, extracts logical relationships between sentences, and performs sequence analysis to capture word-order features. Then, the deep semantic features obtained from the dual-source input are evaluated together with the artificial rule features, and the result is judged. The study introduces the binary cross-entropy loss to measure the difference between misclassified predictions and true labels. The specific expression of the loss function is shown in Equation (14).

\mathrm{Loss} = -\hat{y}\log(y) - (1-\hat{y})\log(1-y) \quad (14)

In Equation (14), Loss represents the loss function value, ŷ represents the true label of the sample, and y represents the predicted probability output of the model. Finally, the study evaluates the performance of the model by calculating its accuracy, as shown in Equation (15).

\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (15)

In Equation (15), Accuracy represents the accuracy of the model, TP and TN are the numbers of essays correctly rated as low or high by the model, and FP and FN are the numbers of essays incorrectly rated as low or high by the model. In summary, the scoring process of the English composition scoring model based on DL and artificial rules is shown in Figure 6.

As shown in Figure 6, the English essay scoring model based on DL and artificial rules improves scoring accuracy and interpretability through dual-channel feature collaboration. The model takes the English composition text and the manual scoring rules as dual-source inputs: on the one hand, it generates a semantic map through multi-level parsing of the original text; on the other hand, it breaks down the document into sequential features according to its structure, preserving the framework information of the article. Next, in the feature extraction stage, a bimodal DL architecture is adopted. After Word2Vec vectorization of the semantic graph nodes, a graph convolutional network models global semantic relationships and outputs deep features. Sequence nodes extract local language patterns through self-attention mechanisms to generate sequence features, and the two are concatenated to form deep semantic features. Meanwhile, breaking down the manual rules in textual form into quantifiable dimensional vectors enables the digital transformation of expert knowledge. Finally, the artificial rule features are optimized using binary cross-entropy and combined with the deep semantic features to generate rule-enhanced deep features. The rater then performs regression analysis based on the rule-enhanced deep features to output the final English essay grading results.
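The binary cross-entropy objective in Equation (14) and the accuracy metric in Equation (15) can be computed as in the short sketch below; the predicted probabilities and labels are random placeholders used only to show the calculation.

```python
import torch

torch.manual_seed(0)
y_pred = torch.rand(8).clamp(1e-6, 1 - 1e-6)       # model's predicted probabilities
y_true = torch.randint(0, 2, (8,)).float()          # true labels (1 = high-scoring, 0 = low-scoring)

# Equation (14): binary cross-entropy loss.
loss = -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()

# Equation (15): accuracy from the confusion-matrix counts.
pred_label = (y_pred >= 0.5).float()
tp = ((pred_label == 1) & (y_true == 1)).sum()
tn = ((pred_label == 0) & (y_true == 0)).sum()
fp = ((pred_label == 1) & (y_true == 0)).sum()
fn = ((pred_label == 0) & (y_true == 1)).sum()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(loss.item(), accuracy.item())
```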
Figure 6: Scoring process of the English composition scoring model based on DL and artificial rules.
4 Validation of English composition grading model based on DL and artificial rules

4.1 Performance testing of English composition scoring model based on DL and artificial rules

To confirm the capability of the English essay grading model based on DL and artificial rules, a simulation model was constructed for testing. The testing environment and specific configuration are presented in Table 1.

Table 1: Test environment and specific configuration
Testing environment: Specific configuration
GPU: NVIDIA Tesla V100/A100
CPU: Intel Xeon Gold 6248R
Memory: 256GB DDR4
Storage: 2TB NVMe SSD + 10TB HDD
DL framework: PyTorch 1.12 / TensorFlow 2.10
Feature engineering tools: Scikit-learn 1.2 + Gensim 4.3
Support for large models: Transformers 4.28

As shown in Table 1, the specific configurations in the table were used for performance testing, using the Kaggle ASAP dataset. The research method was compared with the Integrated Classification Scoring Algorithm (ICSA), Linear Regression Model (LRM), and Hierarchical Attention Model (HAM). The accuracy-recall curves and curve areas of the four methods were compared, and the results are presented in Figure 7.

Figure 7: Accuracy-recall curves of different methods. (a) Research method; (b) ICSA; (c) LRM; (d) HAM.

As shown in Figure 7, the shape and area of the accuracy-recall curves of the different methods differ. In Figure 7 (a), the accuracy-recall curve of the research method was close to a rectangle, with a curve area of 92.3%. In Figure 7 (b), the curve area of the ICSA algorithm was 71.6%. In Figure 7 (c), the curve of the LRM model belonged to low accuracy and high recall, which was prone to false positives. As shown in Figure 7 (d), the curve of the HAM model belonged to high accuracy and low recall, which was prone to missed detections. Overall, compared to the comparative methods, the research method had higher accuracy and inspection coverage. The mean absolute error (MAE) of the scoring results of the four methods under different numbers of written words, as well as the scoring time under different numbers of written paragraphs, were compared, and the outcomes are presented in Figure 8.

In Figure 8 (a), the MAEs of the scoring results of the four methods all increased with the number of English composition words, and the MAE of the research method's scoring results had the smallest increase. When the word count of the composition was 100, the MAE of the research method's scoring results was 0.25; when the word count was 350, it was 0.52, an increase of only 0.27 between the two word counts. The MAE of the scoring results for the other three methods was significantly greater than that of the research method at different composition word counts. In Figure 8 (b), the scoring time of all four methods increased with the number of paragraphs in the essay. When the English essay had only one paragraph, the scoring time of the research method was 32 ms, and when the essay had five paragraphs, the scoring time was 42 ms. However, the scoring time of the other three methods at different paragraph counts was significantly greater than that of the research method. Overall, compared to the comparative methods, the research method had better robustness. In conclusion, the English essay grading model proposed by the research based on DL and artificial rules had high reliability, accuracy, and good robustness. After validating the performance of the research methodology, the study further investigated the synergistic effects of the fusion architecture through ablation experiments. First, independent testing of the deep model revealed that removing the manual rules reduced grammatical error detection accuracy, demonstrating their constraint effect on surface errors. Next, independent testing of the rule model showed increased semantic coherence score deviations in long texts when the graph convolutional networks were removed, proving the deep model's capability to capture higher-order semantics. Finally, dual-stream feature contribution analysis using SHAP values demonstrated that the manual rule features contributed mainly to grammatical/spelling error detection, while
deep semantic features played a significant role in content logic scoring. These ablation results confirmed the complementary innovation of the "feature perception-rule constraint" architecture in the research methodology.

4.2 Practical application effect of the English composition scoring model based on DL and artificial rules

On the basis of verifying the performance of the English essay grading model based on DL and artificial rules, further research is conducted to ascertain the efficacy of the practical application of the research method. The study used the IELTS Writing Task 2 dataset to build a modular hierarchical architecture experimental platform. The research method was compared with ICSA, LRM and HAM, and a semantic depth index was supplemented: the content dimension deviation of BERT on the IELTS dataset could be reduced to 0.42, better than the 0.67 of the research model, while the research model retained the advantage of being lightweight. The study further compared Transformer-based pre-trained models. In the IELTS content dimension scoring, BERT exhibited lower deviation values than the research methodology. However, its reliance on billions of parameters resulted in significantly longer response times. Notably, the research methodology demonstrated markedly higher accuracy in detecting grammatical errors when incorporating rules, surpassing BERT. These findings indicated that, compared to cutting-edge technologies, the research methodology demonstrated superior semantic understanding depth and error detection specificity. Then, the group stability index of the four methods for the English composition data of the first six months was scored, and the deviation values under different scoring dimensions were compared. The results are shown in Figure 9.
Figure 8: MAE and rating efficiency. (a) The average absolute error of compositions of different lengths; (b) Rating efficiency.
Figure 9: Model generalization ability and sub-item scoring deviation. (a) Model generalization ability; (b) Deviation in sub-item scoring.
In Figure 9 (a), the critical value of the group stability index for English composition scoring was 0.17. The overall group stability index of the research method for scoring monthly composition data remained below
the critical value, with its highest group stability index being 0.07 in June and the lowest 0.02 in January. The stability indices of the other three methods for scoring monthly essay data were significantly higher than those of the research method. In Figure 9 (b), the deviation value of the research method in the composition content dimension was 0.67, the deviation value in the composition language dimension was 0.99, the deviation value in the composition structure dimension was 0.33, and the deviation value in the composition coherence dimension was 0.82. The deviation values of the other three methods under the different scoring dimensions were significantly greater than those of the research method. Overall, compared to the comparative methods, the research method had better generalization ability and higher accuracy. Comparing the sensitivity of the four methods in identifying excellent compositions and their ability to capture advanced vocabulary in compositions, the results are presented in Figure 10.

As shown in Figure 10 (a), the misjudgment rates of the four methods for high-scoring essays differed across score thresholds. The overall misjudgment rate of the research method for high-scoring essays was less than 20%. Its highest misjudgment rate was 17.8% when the score threshold was 21 points, and its minimum misjudgment rate was 3.2% when the score threshold was 24-25 points. The misjudgment rates of the other three methods were significantly higher than that of the research method at different score thresholds. As shown in Figure 10 (b), the consistency between the vocabulary scoring results of the four methods and the manual vocabulary scoring differed. The distribution of the vocabulary scoring results of the research method was closely aligned with the diagonal, indicating that it was highly consistent with the manual vocabulary scoring. However, the distributions of the vocabulary scoring results for the other three methods differed significantly from the manual vocabulary scoring, resulting in lower accuracy of their scoring results. Overall, compared to the comparative methods, the research method had a lower false positive rate and better scoring performance. The four methods were also compared in terms of the scoring accuracy rate under different error types and the average response time under different numbers of concurrent requests, as shown in Figure 11.
Figure 10: High-score essay misjudgment rate and vocabulary richness recognition ability. (a) High-score essay misjudgment rate; (b) Vocabulary richness recognition ability.
Figure 11: Error type processing time and concurrent processing capability. (a) Error type processing time; (b) Concurrent processing capability.
As shown in Figure 11 (a), the research method demonstrated 99.2% accuracy in scoring grammatical errors, 98.7% in spelling errors, 99.0% in logical errors, and 97.9% in collocation errors. In contrast, the other three methods showed significantly lower accuracy rates for these error types compared to the research methodology. In Figure 11 (b), the average response time of the four methods gradually increased with the number of concurrent requests, with the research method showing the smoothest increase. When faced with 600 concurrent requests, the average response time of the research method reached a stable value of 3.4 seconds. However, the average response times of the other three methods showed a significantly greater increase than that of the research method. Overall, compared to the comparative methods, the research method had better resource allocation capabilities and scoring performance. In summary, the English essay grading model proposed by the research based on DL and artificial rules had good generalization ability, accuracy, and performance.

5 Conclusion

To address the issues of high misjudgment rates and instability in existing English essay automatic scoring systems, this study innovatively proposes an English essay scoring model combining DL with manual rules. The research methodology extracts sequence features and semantic graph features from English essays, integrating them with manual rule features to construct a "feature perception-rule constraint-joint decision" fusion architecture for stable and accurate scoring. Experimental results show that when the essay contains 100 words, the average absolute error of the scoring method is 0.25; when the essay contains 350 words, the average absolute error increases to 0.52; and when the essay consists of 5 paragraphs, the scoring time reaches 42 ms. In practical application tests, the method shows a deviation of 0.67 in content dimension scoring, 0.99 in language dimension scoring, 0.33 in structure dimension scoring, and 0.82 in coherence dimension scoring. The method achieved a 99.2% accuracy rate for grammatical errors, 98.7% for spelling errors, 99.0% for logical errors, and 97.9% for collocation errors. Overall, the proposed method demonstrated excellent scoring accuracy, robustness, and stability. The research did not quantify the contribution ratios of the DL and rule-based components to explainability. The test datasets were limited to IELTS/Kaggle materials, which did not validate generalization to open-domain essays and consequently limits practical applicability. Moreover, the methodology primarily relied on Word2Vec and traditional attention mechanisms for feature extraction. While effective in English essay scoring, the static embedding model of Word2Vec lacked contextual sensitivity, potentially limiting semantic depth comprehension and cross-linguistic transfer capabilities. Modern Transformer models, however, provide superior contextual representation and enhanced cross-linguistic application potential. Future studies could integrate Transformer pre-trained models to verify model stability and deviations across multilingual essay datasets (e.g., French, Chinese), evaluate cross-linguistic rule adaptability, and improve cross-linguistic performance and transferability. Additionally, the research could incorporate eye-tracking technology into multi-modal deep understanding frameworks. By recording eye movements during the writing process, it could analyze authors' attention allocation patterns. Combined with keystroke logs, this approach could quantify writing fluency and cognitive load, supplementing process dynamics that textual features cannot capture. Nevertheless, this study is the first to migrate the Wide&Deep architecture from the recommendation system field to the essay scoring field. By constraining the semantic drift of DL with rule features, it provides a new idea for the interpretability of AI education products.

References
[1] Del Gobbo E, Guarino A, Cafarelli B, Grilli L. GradeAid: A framework for automatic short answers grading in educational contexts—design, implementation and evaluation. Knowledge and Information Systems, 2023, 65(10): 4295-4334. DOI: 10.1007/s10115-023-01892-9.
[2] Wang Q. The use of semantic similarity tools in automated content scoring of fact-based essays written by EFL learners. Education and Information Technologies, 2022, 27(9): 13021-13049. DOI: 10.1007/s10639-022-11179-1.
[3] Geçkin V, Kızıltaş E, Çınar Ç. Assessing second-language academic writing: AI vs. human raters. Journal of Educational Technology and Online Learning, 2023, 6(4): 1096-1108. DOI: 10.31681/jetol.1336599.
[4] Theodosiou A A, Read R C. Artificial intelligence, machine learning and deep learning: Potential resources for the infection clinician. Journal of Infection, 2023, 87(4): 287-294. DOI: 10.1016/j.jinf.2023.07.006.
[5] Wang J, Wang S, Zhang Y. Deep learning on medical image analysis. CAAI Transactions on Intelligence Technology, 2025, 10(1): 1-35. DOI: 10.1049/cit2.12356.
[6] Ramesh D, Sanampudi S K. An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review, 2022, 55(3): 2495-2527. DOI: 10.1007/s10462-021-10068-2.
[7] Fokides E, Peristeraki E. Comparing ChatGPTs correction and feedback comments with that of educators in the context of primary students short essays written in English and Greek. Education and Information Technologies, 2025, 30(2): 2577-2621. DOI: 10.1007/s10639-024-12912-8.
[8] Shahzad A, Wali A. Computerization of off-topic essay detection: a possibility? Education and Information Technologies, 2022, 27(4): 5737-5747. DOI: 10.1007/s10639-021-10863-y.
[9] Erturk S, van Tilburg W A P, Igou E R. Off the mark: Repetitive marking undermines essay evaluations due to boredom. Motivation and Emotion, 2022, 46(2): 264-275. DOI: 10.1007/s11031-022-09929-2.
[10] Sharma A, Katlaa R, Kaur G, Jayagopi D B. Full-page handwriting recognition and automated essay scoring for in-the-wild essays. Multimedia Tools and Applications, 2023, 82(23): 35253-35276. DOI: 10.1007/s11042-023-14558-z.
[11] Mohammed A, Kora R. A comprehensive review on ensemble deep learning: Opportunities and challenges. Journal of King Saud University - Computer and Information Sciences, 2023, 35(2): 757-774. DOI: 10.1016/j.jksuci.2023.01.014.
[12] Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: The emergence of deep QSAR. Nature Reviews Drug Discovery, 2024, 23(2): 141-155. DOI: 10.1038/s41573-023-00832-0.
[13] Whang S E, Roh Y, Song H, Lee J G. Data collection and quality challenges in deep learning: A data-centric AI perspective. The VLDB Journal, 2023, 32(4): 791-813. DOI: 10.1007/s00778-022-00775-9.
[14] Pereira T D, Tabris N, Matsliah A, Turner D M, Li J, Ravindranath S, et al. SLEAP: A deep learning system for multi-animal pose tracking. Nature Methods, 2022, 19(4): 486-495. DOI: 10.1038/s41592-022-01426-1.
[15] Olan F, Spanaki K, Ahmed W, Zhao G. Enabling explainable artificial intelligence capabilities in supply chain decision support making. Production Planning & Control, 2025, 36(6): 808-819. DOI: 10.1080/09537287.2024.2313514.
[16] Bhat M, Rabindranath M, Chara B S, Simonetto D A. Artificial intelligence, machine learning, and deep learning in liver transplantation. Journal of Hepatology, 2023, 78(6): 1216-1233. DOI: 10.1016/j.jhep.2023.01.006.
[17] Simon K, Vicent M, Addah K, Bamutura D, Atwiine B, Nanjebe D, Mukama A O. Comparison of deep learning techniques in detection of sickle cell disease. AIA, 2023, 1(4): 252-259. DOI: 10.47852/bonviewAIA3202853.
[18] Bhosle K, Musande V. Evaluation of deep learning CNN model for recognition of Devanagari digit. Applied Artificial Intelligence, 2023, 1(2): 114-118. DOI: 10.47852/bonviewAIA3202441.
[19] Zamfiroiu A, Vasile D, Savu D. ChatGPT - a systematic review of published research papers. Informatica Economica, 2023, 27(1): 5-16. DOI: 10.24818/issn14531305/27.1.2023.01.
[20] Didimo W, Grilli L, Liotta G, Montecchiani F. Efficient and trustworthy decision making through human-in-the-loop visual analytics: A case study on tax risk assessment. Rivista italiana di informatica e diritto, 2022, 4(2): 15-21. DOI: 10.32091/RIID0092.
https://doi.org/10.31449/inf.v46i16.9736 Informatica 49 (2025) 429–440 429
Integrating DDPG and QPSO for Multi-Objective Optimization in
High Proportion Renewable Energy Power Dispatch Systems
Xu’an Qiao1*, Chaofan Liu2
1School of Aeronautics, Chongqing City Vocational College, Chongqing 402160, China
2School of Physics Sciences, University of Science and Technology of China, Hefei 230026, China
E-mail: qiaoxuan109_2023@126.com, Liuchaofan1980_622@126.com
*Corresponding author
Keywords: Power system, dispatch optimization, renewable energy, DDPG, heuristic algorithm
Received: June 16, 2025
This study proposes a novel dispatch optimization model that integrates deep deterministic policy gradient
(DDPG) and quantum particle swarm optimization (QPSO) to address the challenges posed by high
proportions of renewable energy in power systems. The proposed multi-objective optimization framework
considers system cost reduction, supply-demand balance, and dynamic adaptability to renewable energy
fluctuations. The experimental results on the IEEE 30-bus and 118-bus systems demonstrated significant
improvements. This method reduced total system costs by 13.6% and 11.4%, respectively. It also increased
supply reliability to 97.1% and achieved an energy utilization rate of 94.85%. Additionally, it minimized
frequency deviation to 1.25 Hz. The optimization time was also improved, with a reduction of 58.3 seconds. The research results have important practical application value in improving power
system economy, enhancing system reliability, and dynamic adaptability. It can provide efficient and
reliable technical support for power dispatch planning, load management, and real-time control under
high percentage renewable energy scenarios.
Povzetek: Študija predlaga hibridni model za optimizacijo razporejanja (dispečanja) v omrežjih z visokim
deležem obnovljivih virov, ki združuje izboljšani DDPG in QPSO. Model zmanjša stroške, izboljša
zanesljivost in stabilnost ter poveča izrabo energije. Preizkusi na IEEE sistemih potrjujejo visoko
učinkovitost.
1 Introduction

With the continuous adjustment and optimization of the global energy structure, carbon peaking and carbon neutrality targets have become important strategies for countries around the world to cope with climate change and achieve sustainable development. Because of their clean and low-carbon benefits, renewable energy (RE) sources like solar and wind have been frequently used in this setting [1]. However, the operation and scheduling of the conventional power system (PS) have been severely hampered by the widespread use of wind, solar, and other high percentage (HP) RE sources. To begin with, RE is highly volatile and erratic. Their output is affected by natural conditions, including wind speed, light, etc., and there is a large uncertainty [2]. This uncertainty makes the load balance and stability of the system subject to shocks, and is prone to supply-demand imbalance in the PS in the case of peak power demand or insufficient wind and solar resources [3]. Second, the traditional PS, which relies on precise forecasts of load and generation capacity from the scheduling model, becomes irrelevant when the proportion of RE in the PS increases. Due to the significant impact of unstable factors on RE generation capacity, there is a substantial discrepancy between actual and forecasted values. This discrepancy makes short-term PS scheduling more challenging and complex [4-5]. In addition, current PS scheduling relies mainly on a phased sequential scheduling approach, i.e., unit commitment (UC) is performed first to determine the start/stop status of the units. Then economic dispatch (ED) is performed to optimize the unit output. Finally, real-time regulation is performed through automatic generation control (AGC) [6]. Although this staged dispatch model is simple to operate, there are problems such as response delays between different dispatch modules. In view of this, the study proposes a multilevel cooperative dispatch model for HP of renewable energy power systems (REPS) and introduces heuristic algorithms to accelerate the solution of complex systems. The study aims to solve the limitations of the traditional sequential scheduling method and improve the economy, reliability and feasibility of system operation through refined modeling and cooperative optimization.

This study's novel contribution is its proposal of a joint heuristic algorithm that combines improved deep deterministic policy gradient (DDPG) and quantum particle swarm optimization (QPSO) algorithms to optimize scheduling in high-proportion REPS. This method improves the convergence speed and stability of complex, multi-constraint, multi-timescale problems by introducing dual experience pooling and time-decaying exploration strategies. The proposed mathematical model addresses the inefficiencies and local optima of traditional models. It provides a more efficient and reliable solution for PS scheduling optimization.
2 Related works

The key to ensuring the PS operates steadily is PS schedule optimization. The optimization method directly impacts the stability and adaptability of the PS to RE fluctuations. These are essential to the functioning and advancement of contemporary PS, as well as its economic efficiency. Therefore, many scholars have carried out various researches on PS scheduling optimization. For the optimal reactive power scheduling problem in PS, M. Abd-El Wahab et al. suggested a hybrid method called augmented Jaya and artificial ecosystem-based optimization, which improved system stability, economic viability, and overall efficiency [7]. A nonconvex mixed integer and quadratic restricted planning technique was presented by Cox J L et al. to solve the challenge of optimizing a centralized solar power plant's profitability under changing solar resources. The method improved the solvability of the problem through exact and approximation techniques, thus enabling operational scheduling optimization in real-time decision support [8]. To address the issue of system security and economic cost over time in microgrid scheduling, Zhang et al. suggested a multi-timescale scheduling model that incorporated load voltage and frequency dynamics. To minimize economic cost while maintaining voltage and frequency stability, the study converted it into a multi-objective optimization problem that took into account economic cost, voltage deviation, and frequency stability. This improved the microgrid dispatch's efficiency and dependability [9]. To address the global issues brought on by the recent explosive increase in the demand for electricity, Hou et al. developed an integrated day-ahead multi-objective microgrid optimization framework. The framework produced more affordable, dependable, and ecologically friendly power supply services by combining demand-side management, forecasting methods, and economic-environmental dispatch [10].

Large-scale access to the PS by RE sources affects the output characteristics of wind and photovoltaic energy sources. These sources exhibit strong intermittency, randomness, and volatility due to weather, climate, and other external natural factors. These challenges have led to the urgent need for innovation and optimization of existing dispatch methods to adapt to the new situation of HP of RE access. A mixed-integer linear programming method was put up by Shirzadi et al. to address the issue of enhancing the efficiency and dependability of RE systems. The study optimized the PS's daily operating expenses and system resilience by combining a unique hybrid model with deep learning and statistical modeling to forecast the load demand and wind power output (PO) for the ensuing three days [11]. Due to the impact of the volatility of wind and solar power generation on PS operation, Guo et al. proposed multi-stage optimization, online optimization, and multi-timescale optimization for RE integration. This study realized the strategic scheduling and control of energy storage units and improved the efficiency of RE integration in the power grid [12]. By proposing a new economic low-carbon clean PS dispatch model that incorporates power-to-gas technology, Cui et al. addressed the issue of increasing the grid's capacity to absorb wind power. This model integrated the effects of multiple price factors, resulting in low-carbon PS operation and cost optimization [13]. An enhanced jellyfish search optimization technique was presented by Gami et al. to solve the optimal reactive power dispatch problem in HP renewable PSs. By improving the algorithm's exploration and development stages, the study successfully optimized the PS's most secure and stable state [14]. This allowed the PS to operate under both deterministic and probabilistic load demands and RE resource states.

In summary, the existing research has made some significant progress in PS scheduling optimization. However, there are still deficiencies in the research for HP of RE access, such as insufficient consideration of the volatility and stochastic characteristics of RE, fewer studies on the collaborative scheduling of multi-timescale modules, and insufficiently perfect uncertainty handling methods. Therefore, the study proposes to construct a mathematical model of HP of REPS scheduling optimization and introduce a heuristic algorithm to solve it. The innovation of the study is to propose a multi-module cooperative optimization framework to accurately deal with uncertainty and extreme scenarios. Meanwhile, the optimization algorithm improves the solution efficiency and provides new ideas for complex PS scheduling.

3 Methods and materials

This section provides a detailed description of the PS scheduling optimization method proposed in the study. The method consists of a scheduling optimization mathematical model and a scheduling optimization heuristic algorithm. The combination of the two effectively improves the scheduling efficiency and stability of the PS with a HP of RE access.

3.1 Mathematical model construction for power system scheduling optimization

The PS suffers scheduling complexity issues brought on by volatility and uncertainty as a result of the extensive access to HP of RE sources. Moreover, the conventional phased scheduling approach finds it challenging to satisfy the needs of system stability and economy [15-16]. Therefore, the study proposes a PS scheduling optimization method for HP of RE access.

The two main cores of the method are a mathematical model that can achieve multi-module co-optimization by comprehensively considering system costs, constraints, and uncertainties, and a heuristic algorithm that can efficiently solve complex optimization problems, taking into account both global search and local fine optimization. Figure 1 depicts the method's general framework.
Figure 1: Overall framework of power system scheduling optimization method.
Figure 2: Closed-loop sequential optimization process among UC, ED, and AGC modules.
Four components make up the general architecture of the PS scheduling optimization approach suggested in the study, as shown in Figure 1: the input module, the output module, the heuristic algorithm design module, and the mathematical model construction module. The input module contains load demand forecasts, RE output scenarios, and various parameters as support. The mathematical model construction module is then responsible for constructing the PS scheduling optimization model based on three main foundations: objective function (OF), constraints, and uncertainty handling [17-18]. The scheduling optimization model based on these steps is effectively solved using the heuristic algorithm design module. Finally, the output module generates the optimized scheduling plan, including unit start/stop status, power allocation, and standby capacity configuration. It also evaluates the economy and stability of the scheduling scheme through performance indicators.

The mathematical model developed in this study differs from the traditional stage-wise sequential scheduling model in terms of the scheduling optimization approach. The proposed model forms a closed-loop sequential scheduling framework with dynamic feedback coupling among modules by integrating the UC for start-stop decisions, the ED for cost minimization, and the AGC for real-time supply-demand balancing. These modules operate across different time scales and interact through feedback mechanisms to realize coordinated optimization.
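A schematic Python sketch of the closed-loop cycle described above is given below. The module functions are placeholders that only illustrate the direction of the data flow (UC start/stop decisions to ED, ED base points to AGC, and AGC correction statistics fed back into the next UC cycle); the toy rules, numbers, and thresholds are assumptions for illustration, not the study's implementation.

```python
import random

def unit_commitment(load_forecast, reserve_margin):
    # Toy rule: commit one unit per 100 MW of forecast load plus the reserve margin.
    return max(1, round((load_forecast + reserve_margin) / 100))

def economic_dispatch(committed_units, load_forecast):
    # Toy rule: split the forecast load evenly across committed units (base points).
    return [load_forecast / committed_units] * committed_units

def agc_adjust(base_points, actual_load):
    # Real-time correction of the supply-demand gap, shared equally among units.
    gap = actual_load - sum(base_points)
    return [gap / len(base_points)] * len(base_points), abs(gap)

reserve_margin, forecast = 20.0, 500.0
for cycle in range(3):
    units = unit_commitment(forecast, reserve_margin)
    base = economic_dispatch(units, forecast)
    actual = forecast + random.uniform(-60, 60)       # renewable fluctuation (illustrative)
    _, correction = agc_adjust(base, actual)
    # Feedback: large AGC corrections raise the reserve setting and update the forecast.
    if correction > 40:
        reserve_margin *= 1.1
    forecast = 0.7 * forecast + 0.3 * actual
    print(f"cycle {cycle}: units={units}, correction={correction:.1f}, reserve={reserve_margin:.1f}")
```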
The principle of inter-module coordination is illustrated in Figure 2.

As shown in Figure 2, the scheduling optimization process proposed in this study adopts a closed-loop sequential optimization mechanism, consisting of three main modules: UC, ED, and AGC. These modules interact across multiple time scales through real-time feedback to achieve dynamic coordination. During each scheduling cycle, the UC module first optimizes the on-and-off status of units based on current load forecasts, reserve requirements, and other system parameters, and then passes the results to the ED module. The ED module then performs power allocation and generates a base load profile for the AGC module, which makes real-time, short-term power adjustments. Unlike traditional dispatch models, which operate in isolated stages, the AGC module in this framework continuously generates feedback information, such as load correction values and reserve margin stress levels, during its adjustment process. Rather than being discarded, this data is fed back as correction inputs into the next UC scheduling cycle. Specifically, the system monitors the magnitude and frequency of AGC adjustments in the previous cycle. If frequent or significant real-time corrections are observed, this indicates potential deficiencies in load forecasting or reserve planning. In response, the system increases the reserve capacity settings for the next cycle to improve operational redundancy. At the same time, the load forecast is corrected by incorporating the observed deviations into the predicted curve. This enables the UC module to make more accurate start and stop decisions that reflect actual system demand. This adaptive feedback mechanism is repeated in every cycle, progressively refining UC decisions to better match real-world operating conditions and improve overall dispatch responsiveness.

In the mathematical model, schedule optimization aims to reduce the system's overall running costs. The OF is set as shown in Equation (1).

\min Z = \sum_{t=1}^{T}\left[\sum_{i=1}^{N}\left(C_{fuel,i,t} + C_{start/stop,i,t}\right) + C_{reserve,t} + C_{EENS,t}\right] \quad (1)

In Equation (1), Z denotes the total system cost, T denotes the total number of scheduling time segments, and N is the total number of units. C_fuel,i,t is the fuel cost of the i-th unit at time t, C_start/stop,i,t is the startup and shutdown cost of the i-th unit at time t, C_reserve,t is the standby cost at time t, and C_EENS,t is the expected power deficit cost at time t, which mainly measures the supply-demand imbalance caused by RE fluctuations [19]. The UC module is responsible for optimizing the start-stop state (SSS) of the units. The SSS constraint on u_{i,t} is shown in Equation (2).

u_{i,t} \in \{0, 1\}, \quad \forall i, t \quad (2)

In Equation (2), a value of 1 for u_{i,t} indicates that the unit is on and a value of 0 indicates that the unit is off. Equation (3) displays the unit start/stop time limitation.

\sum_{t=1}^{T} u_{i,t}\,\Delta t \geq T_{min\text{-}on,i} \quad (3)

In Equation (3), t denotes the time period when the unit is turned on, and T_{min-on,i} denotes the minimum continuous operation time of the i-th unit. The output power constraint is shown in Equation (4).

P_{i,min} \leq P_{i,t} \leq P_{i,max} \quad (4)

In Equation (4), P_{i,min} and P_{i,max} are the minimum and maximum PO of the i-th unit, and P_{i,t} denotes the PO of the i-th unit at time t. Both Equation (3) and Equation (4) hold only when the value of u_{i,t} is 1. The ED module is responsible for optimizing the power allocation of the turned-on units after the UC determines the SSS of the units [20]. The power balance constraint in the ED module is shown in Equation (5).

\sum_{i=1}^{N} P_{i,t} + P_{RES,t} = D_t, \quad \forall t \quad (5)

In Equation (5), P_{RES,t} denotes the RE output at time t and D_t denotes the load demand at time t. The climbing capacity constraint is shown in Equation (6).

P_{i,t} - P_{i,t-1} \leq R_i^{up}, \quad P_{i,t-1} - P_{i,t} \leq R_i^{down}, \quad \forall i, t \quad (6)

In Equation (6), R_i^{up} and R_i^{down} are the upper and lower climbing limits for unit i, respectively. The reserve capacity constraint is shown in Equation (7).

\sum_{i=1}^{N} R_{i,t} \geq R_{required,t}, \quad R_{i,t} = P_{i,max} - P_{i,t}, \quad \forall t \quad (7)

In Equation (7), R_{required,t} is the standby capacity requirement of the system at time t, and R_{i,t} denotes the standby capacity that the i-th unit can provide at time t. Equation (8) illustrates how the AGC module regulates the fluctuations by modifying the power in real time based on the ED results.

P_{i,t} = P_{i,t}^{base} + \Delta P_{i,t}, \quad \forall i, t \quad (8)

In Equation (8), P_{i,t}^{base} denotes the base point load provided by the ED module [21], and ΔP_{i,t} is the real-time power adjustment of the AGC module. The real-time balancing constraint is shown in Equation (9).

\sum_{i=1}^{N} \Delta P_{i,t} = \Delta D_t, \quad \Delta D_t = D_t - \sum_{i=1}^{N} P_{i,t}^{base} - P_{RES,t} \quad (9)

In Equation (9), ΔD_t represents the difference between the actual load demand and the sum of the base point load and the RE output. The adjustment speed constraint is shown in Equation (10).

\left|\Delta P_{i,t}\right| \leq R_i^{response} \quad (10)

In Equation (10), R_i^{response} denotes the upper limit of the real-time regulation speed. As a result, the synergistic relationship among the UC, ED, and AGC modules is realized through the tight coupling of inputs and outputs. The UC provides the SSS for the ED, the ED provides the base point load for the AGC, and the feedback from the AGC optimizes the start-stop strategy of the UC.
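The cost function in Equation (1) and the feasibility checks in Equations (4) to (7) can be expressed compactly as in the sketch below. The unit parameters, costs, and time horizon are invented toy values used only to show how the terms combine; they are not data from the paper.

```python
import numpy as np

T, N = 4, 3                                   # scheduling periods and units (toy sizes)
u = np.ones((T, N))                           # start/stop status, Equation (2)
P = np.array([[80, 60, 40]] * T, float)       # unit output P_{i,t}
P_min, P_max = np.array([20, 15, 10]), np.array([100, 80, 60])
R_up = R_down = np.array([30, 25, 20])        # ramp limits
P_res = np.array([50, 60, 40, 55], float)     # renewable output per period
D = np.array([230, 240, 220, 235], float)     # load demand per period
fuel_cost, startstop_cost, reserve_cost, eens_cost = 2.0, 5.0, 1.0, 50.0

# Equation (1): total cost = fuel + start/stop + reserve + expected energy-not-served penalty.
Z = (fuel_cost * (u * P).sum() + startstop_cost * u.sum()
     + reserve_cost * T + eens_cost * np.maximum(D - (P.sum(axis=1) + P_res), 0).sum())

# Equations (4)-(7): output bounds, power balance, ramping, and reserve adequacy.
ok_bounds = np.all((P >= P_min) & (P <= P_max))
ok_balance = np.allclose(P.sum(axis=1) + P_res, D)
ok_ramp = np.all(np.abs(np.diff(P, axis=0)) <= np.maximum(R_up, R_down))
ok_reserve = np.all((P_max - P).sum(axis=1) >= 30)   # assumed reserve requirement of 30 MW
print(f"Z = {Z:.1f}", ok_bounds, ok_balance, ok_ramp, ok_reserve)
```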
The proposed model incorporates uncertainty in wind and solar output directly into the scheduling process to enhance adaptability to RE fluctuations. A limited number of representative renewable output scenarios are generated during each scheduling cycle by applying random deviations to forecasted values based on recent historical variation. These scenarios simulate possible short-term fluctuations in renewable generation. The reserve capacity constraint is adjusted accordingly based on the observed fluctuation range, ensuring sufficient buffer during high-variability periods. In the AGC stage, real-time control targets are fine-tuned using deviation trends derived from these scenarios. The proposed model maintains dispatch feasibility and system stability under uncertain renewable output conditions by dynamically updating reserve settings and AGC parameters.

3.2 Power system scheduling optimization heuristic algorithm design

The proposed mathematical model for optimizing PS scheduling takes into account total system operating costs, the synergistic optimization of multiple modules, and the uncertainty associated with a high proportion of RE sources. This provides a theoretical basis for scheduling. However, the simple model may be inefficient or susceptible to local optimization when dealing with complex, multi-constraint, multi-timescale optimization problems [22]. Therefore, heuristic algorithms are introduced to optimize the mathematical model and solve it. In deep reinforcement learning, the DDPG method effectively optimizes unit SSS and power allocation. However, it may converge slowly and become trapped in local optima when solving complex problems with multiple constraints [23]. QPSO overcomes the limitations of DDPG by improving particle diversity and global search capabilities using quantum behavioral mechanisms. Hence, this study combines improved DDPG and QPSO to propose a joint heuristic algorithm. Figure 3 illustrates the computational flow of the enhanced DDPG in this technique.

In Figure 3, the study introduces a dual experience pooling mechanism in DDPG, which balances exploration and utilization by storing diverse samples and high-value samples separately to improve training efficiency and policy quality. Second, to prevent falling into the local optimum, a time-decaying exploration noise technique is used to boost exploration at the beginning and improve stability at the end. Finally, the target network update strategy is optimized to dynamically adjust the target network parameters through the soft update method to enhance the training stability and convergence speed.

The improved DDPG workflow consists of four main stages. First, in the initialization phase, the Critic network, Actor network, and their target networks are randomly initialized. Additionally, two experience pools, B1 and B2, are established. B1 stores the initial experience samples. B2 stores the high-value samples that are selected using a filtering mechanism. The dual-experience pool design maintains diversity in the training data, which improves sample selection efficiency. Next, within the training iterations and time step loop, the agent generates actions via the policy network. This enhances the exploration of unknown strategies by adding exploration noise. After interacting with the environment, experience samples are generated and stored in B1. High-value samples are then selected based on reward values and stored in B2. In this way, the experience pool contains both common experience samples and high-value samples. This ensures the samples are diverse and valuable for training, thereby improving the algorithm's learning efficiency.
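The three improvements described above (the dual experience pool, the time-decaying exploration noise, and the soft target-network update) can be summarized in the following minimal Python sketch. The class layout, the reward-quantile filter, and the decay schedule are illustrative assumptions that follow the workflow described in the text rather than the study's exact implementation.

import random
from collections import deque

class DualReplayBuffer:
    # B1 keeps all transitions; B2 keeps high-value transitions selected by reward.
    def __init__(self, capacity=1_000_000, high_value_quantile=0.8):
        self.b1 = deque(maxlen=capacity)
        self.b2 = deque(maxlen=capacity // 10)
        self.quantile = high_value_quantile

    def add(self, state, action, reward, next_state, done):
        self.b1.append((state, action, reward, next_state, done))
        recent = sorted(t[2] for t in list(self.b1)[-1000:])   # reward filter over recent samples
        threshold = recent[int(self.quantile * (len(recent) - 1))]
        if reward >= threshold:
            self.b2.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # The improved DDPG draws update batches from the high-value pool B2,
        # falling back to B1 while B2 is still being filled.
        pool = self.b2 if len(self.b2) >= batch_size else self.b1
        batch_size = min(batch_size, len(pool))
        return random.sample(list(pool), batch_size)

def exploration_noise_scale(step, sigma0=0.2, decay=1e-4, sigma_min=0.01):
    # Time-decaying exploration noise: strong exploration early, stable policy late.
    return max(sigma_min, sigma0 * (1.0 - decay) ** step)

def soft_update(target_params, online_params, tau=0.001):
    # Soft target update: theta_target <- tau * theta_online + (1 - tau) * theta_target.
    return [tau * o + (1.0 - tau) * t for o, t in zip(online_params, target_params)]

The quantile threshold and the decay constant are placeholders; in a full implementation they would be tuned alongside the hyperparameters reported in Section 4.2 (learning rate 0.0001, discount factor 0.99, batch size 64, soft-update parameter 0.001).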
Figure 3: The framework of DDPG-QPSO joint heuristic algorithm (flowchart: initialize the Critic, Actor, and target networks and experience pools 1 and 2; select actions based on the current strategy plus exploration noise; perform the action, receive the reward and next state, and store the experience in Pool 1; sort Pool 1 by reward in descending order and remove low-reward samples to form Pool 2; sample a batch from Pool 2 to update the Critic, Actor, and target networks; repeat until the maximum number of time steps and training rounds is reached).
Figure 4: Schematic diagram of the algorithm flow of QPSO (initialize algorithm parameters and particle positions and calculate the initial fitness values; in the main loop, update particle positions based on the quantum behavior formula, calculate fitness values, update the global best position and individual best positions, and dynamically adjust parameters; check whether the maximum number of iterations or the convergence condition is reached and output the optimal solution).
Then, a small batch of high-value samples is sampled from B2 to update the Critic and Actor networks. The Critic network is updated using the error calculated from the target value. Meanwhile, the Actor network is updated using the policy gradient method to maximize the long-term cumulative reward. This optimization allows the model to continuously improve its policy and value function, thereby enhancing the quality of its decisions. Finally, the research employs a soft update method that dynamically adjusts the target network parameters. This further improves training stability and convergence speed. The soft update strategy smoothly adjusts the target network parameters. This prevents excessive fluctuations in the network during training and avoids instability caused by dramatic parameter updates. Figure 4 depicts the QPSO algorithm's flow.

In Figure 4, the overall process of the QPSO algorithm is not much different from the traditional PSO algorithm. The steps are initializing particle positions and parameters, calculating fitness values, updating the global optimal position (GOP) and individual optimal position (IOP), dynamically adjusting parameters, and iterative judgment to output the optimal solution (OS). However, the core difference between the two is the way of updating the particle position. Traditional PSO is based on the iterative formula of velocity and position, while QPSO adopts the quantum behavioral formula, which constructs the quantum distribution of the particle position through the GOP and IOP. In the QPSO algorithm, the quantum modulation factor controls the randomness of the particle update process and regulates the particles' ability to explore the search space. This enhances search diversity and prevents the algorithm from getting trapped in local OSs. The quantum distribution describes the probabilistic characteristics of particle position updates. New particle positions are generated through formulas based on quantum behavior by combining the global and individual optimal positions. This reflects the non-deterministic update mode inspired by quantum mechanics. The quantum behavioral formulation updates the particle positions as shown in Equation (11).

x_{i,j}^{(t+1)} = P_{i,j} \pm \alpha \left| x_{i,j}^{(t)} - P_{i,j} \right| \ln\left(\frac{1}{u}\right)   (11)

In Equation (11), x_{i,j}^{(t)} and x_{i,j}^{(t+1)} denote the position of particle i after t and t+1 iterations in the j-th dimension. P_{i,j} denotes the reference point of particle i in the j-th dimension. \alpha denotes the quantum modulation factor. u denotes a random number, which is used to introduce randomness and give the particle a non-deterministic update property. P_{i,j} is obtained from the GOP and the IOP with certain weights, as shown in Equation (12).

P_{i,j} = \varphi p_{best,i,j} + (1 - \varphi) g_{best,j}   (12)

In Equation (12), p_{best,i,j} and g_{best,j} denote the IOP and GOP, respectively. \varphi denotes the inertia factor, which is used to control whether the particle prefers the individual OS or the global OS. In the integration of DDPG and QPSO, the improved DDPG algorithm first generates an initial dispatch strategy based on the current environmental state and load demand information. This strategy includes the start-stop decisions and power allocation for each generation unit over all time periods. The output of DDPG is a deterministic decision vector, representing an executable scheduling solution. This comprehensive scheduling solution serves as a key reference for initializing the population in the QPSO algorithm. More specifically, the DDPG output is encoded as a particle position within the QPSO search space, which is then assigned as the initial position of at least one particle within the swarm. The remaining particles are initialized in the vicinity of this solution through random perturbations, ensuring that the initial population has both guidance and diversity. Based on this initialization, QPSO performs global search optimization. Its quantum-behavior mechanism further refines and adjusts the scheduling strategy, improving the solution's overall stability and adaptability.
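The position update of Equations (11) and (12) and the DDPG-seeded initialization can be expressed as a short Python sketch. The symbols alpha and phi follow the reconstruction above, while the perturbation scale and random-number handling are illustrative assumptions.

import numpy as np

def qpso_update(x, p_best, g_best, alpha=0.5, rng=None):
    # One QPSO position update following Equations (11)-(12).
    rng = np.random.default_rng() if rng is None else rng
    phi = rng.random(x.shape)                            # weight between IOP and GOP, Eq. (12)
    u = rng.uniform(1e-12, 1.0, x.shape)                 # random number in Eq. (11)
    P = phi * p_best + (1.0 - phi) * g_best              # reference point (attractor)
    sign = np.where(rng.random(x.shape) < 0.5, 1.0, -1.0)
    return P + sign * alpha * np.abs(x - P) * np.log(1.0 / u)

def seed_swarm_from_ddpg(ddpg_plan, n_particles=50, rel_sigma=0.05, rng=None):
    # Encode the DDPG dispatch vector as one particle and perturb it for the others,
    # so the initial population has both guidance and diversity.
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(ddpg_plan, dtype=float)
    noise = rel_sigma * (np.abs(base).mean() + 1e-9) * rng.standard_normal((n_particles, base.size))
    swarm = base + noise
    swarm[0] = base                                      # at least one particle keeps the DDPG solution
    return swarm

With phi drawn per dimension, each particle is attracted toward a stochastic mixture of its own best position and the global best, which is what gives QPSO a broader search distribution than velocity-based PSO.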
In the early iterations the global search is favored, while local optimization is favored in the later stage. Combining the above, the final PS scheduling optimization flow designed by the study is shown in Figure 5. In Figure 5, the final PS scheduling optimization process consists of inputting load demand forecasts, RE output scenarios, and related parameters to provide basic data for optimization. With the aim of reducing the overall system cost, a multi-module cooperative scheduling model comprising UC, ED, and AGC modules is built. The improved DDPG is utilized to generate the initial optimization strategy, which is further optimized by QPSO. The optimized scheduling plan covers unit start/stop status, power allocation, and standby capacity configuration. It ultimately achieves efficient and stable scheduling optimization.
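The overall DDPG-to-QPSO hand-off can be sketched as the following loop, reusing the helper functions from the earlier sketches; the agent interface (act), the fitness function, and the iteration budget are illustrative assumptions rather than the study's exact code.

import numpy as np

def ddpg_qpso_dispatch(state, ddpg_agent, fitness, n_particles=50, n_iter=1000, alpha=0.5):
    # 1) The improved DDPG proposes an executable dispatch vector for the current state.
    ddpg_plan = ddpg_agent.act(state)
    # 2) The plan seeds the QPSO swarm (see seed_swarm_from_ddpg above).
    swarm = seed_swarm_from_ddpg(ddpg_plan, n_particles)
    p_best = swarm.copy()
    p_best_fit = np.array([fitness(p) for p in swarm])
    g_best = p_best[np.argmin(p_best_fit)].copy()
    # 3) QPSO refines the schedule globally using the quantum behavior update.
    for _ in range(n_iter):
        for k in range(n_particles):
            swarm[k] = qpso_update(swarm[k], p_best[k], g_best, alpha=alpha)
            f = fitness(swarm[k])
            if f < p_best_fit[k]:                        # update individual optimal position
                p_best[k], p_best_fit[k] = swarm[k].copy(), f
        g_best = p_best[np.argmin(p_best_fit)].copy()    # update global optimal position
    return g_best                                        # optimized scheduling plan

Here fitness would typically wrap the cost and constraint evaluation of Equations (1)-(10), returning the penalized total cost so that lower values correspond to better schedules.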
Figure 5: Final power system scheduling optimization process (inputs: load demand forecast, renewable energy output scenarios, unit parameters, and uncertainty factors; optimization objectives and constraint conditions are determined for the collaborative scheduling model; the improved DDPG generates the strategy used to initialize QPSO; the QPSO algorithm outputs the global optimal solution and the optimized scheduling plan).
Table 1: Experimental environment configuration
Hardware configuration | Software configuration
CPU: Intel Core i9-12900K (16 cores, 3.2 GHz) | Operating system: Ubuntu 22.04 LTS
GPU: NVIDIA GeForce RTX 3090 (24 GB VRAM) | Programming language: Python 3.10
Memory: 32 GB DDR4 | Deep learning framework: TensorFlow 2.10
Storage: 1 TB SSD | Optimization algorithm libraries: NumPy, SciPy, Pyomo
Power supply: 850 W high-efficiency power supply | Power system simulation tool: MATPOWER 7.1 (MATLAB Toolbox)
/ | Data processing tool: Pandas
Table 2: Results of model scheduling performance differences
Indicator | IEEE 30-Bus: Traditional sequential scheduling model | IEEE 30-Bus: Proposed model | IEEE 118-Bus: Traditional sequential scheduling model | IEEE 118-Bus: Proposed model
Total cost | $12,500 | $10,800 | $48,200 | $42,700
Fuel cost | $7,200 | $6,500 | $28,500 | $26,000
Startup/shutdown cost | $3,000 | $2,500 | $12,000 | $10,200
Reserve cost | $2,000 | $1,500 | $6,500 | $5,500
Demand-supply imbalance cost | $300 | $300 | $1,200 | $1,000
Supply-demand deviation (MW) | 5.5 | 4.2 | 25.0 | 18.5
Response time (s) | 10.1 | 8.3 | 18.7 | 13.5
4 Results

The efficiency and superiority of the PS scheduling optimization methods suggested in the study are confirmed in this section using both heuristic algorithms and mathematical models. The focus is on verifying the effectiveness of multi-module collaboration and uncertainty handling, as well as multi-timescale optimization and the performance enhancement and comprehensive optimization capabilities of the improved DDPG, QPSO, and joint heuristic algorithms.

4.1 Validation of mathematical model for power system scheduling optimization

A multi-module cooperative scheduling model including UC, ED, and AGC modules is constructed with the goal of lowering the overall system cost. Based on Table 1, the study selects the 30-node test system and the 118-node test system from the IEEE standard examples. The former includes 30 nodes, 41 transmission lines, 6 generators, and 20 load nodes, which are suitable for preliminary verification and experimentation. The latter includes 118 nodes, 186 transmission lines, 54 generators, and 99 load nodes, which can be used for in-depth research on the
optimization capabilities of multi-module collaborative scheduling and heuristic algorithms. First, the performance of dispatches under the proposed closed-loop sequential scheduling model based on UC-ED-AGC feedback is compared with that under a traditional stage-wise sequential scheduling model. The traditional model is a dispatch process in which the UC, ED, and AGC modules run independently in a fixed order. This process does not consider RE uncertainty or provide feedback or coordination. To ensure a fair comparison, both models are solved using the same optimization algorithm (QPSO) under identical system configurations and forecast conditions. This setup ensures that performance differences are attributed to model structure rather than solver differences. Table 2 displays the findings.

In Table 2, the mathematical model proposed in the study demonstrates advantages in both the IEEE 30-node and 118-node test systems. In terms of economy, the total cost of the 30-node system is reduced by $1,700, and that of the 118-node system is reduced by $5,500. It optimizes fuel, start-stop, and reserve capacity costs. In terms of supply-demand balance capability, the supply-demand deviation is reduced by 1.3 MW and 6.5 MW respectively, effectively addressing the uncertainty of load demand and RE fluctuations. Meanwhile, the real-time adjustment response time is shortened by 1.8 s and 5.2 s respectively, improving the dynamic response capability. Overall, the mathematical model proposed in the study has achieved more efficient resource utilization in small-scale systems and demonstrated superior adaptability to complex problems in large-scale systems.

Since the suggested model takes the uncertainty of RE into account, it is compared with conventional PS scheduling models that do not consider uncertainty handling. The result is shown in Figure 6. In Figure 6 (a), within 30 days of PS scheduling optimization, the proposed model achieves a power supply reliability of over 94%, with an average of 96.58%. However, traditional models that do not consider uncertainty processing have a highest power supply reliability of only 93.73%, with an average of only 92.16%. In Figure 6 (b), for the utilization rate of backup capacity, after 30 days of model operation, the proposed model increases the utilization rate to between 75% and 87%, while the utilization rate of the traditional model only fluctuates between 60% and 70%. The outcomes show that the proposed model improves the adaptability to RE fluctuations in PS scheduling optimization. Finally, the impact of the study's suggested model on scheduling optimization is confirmed on various time scales. The result is shown in Figure 7.
Figure 6: Results compared with traditional models that do not consider uncertainty processing. (a) Comparison of supply reliability (%) over 30 days for the proposed model and the traditional model; (b) Comparison of reserve utilization rate (%) over 30 days for the proposed model and the traditional model.
In Figure 7(a), in the short-term time frame (24 hours), the frequency deviation (FD) of the PS dispatch before optimization is much larger, reaching more than 4 Hz. After optimization using the research model, the FD of the PS dispatch is effectively controlled and remains between -2 Hz and 2 Hz. In Figure 7(b), the energy consumption rate of the PS dispatch before optimization averages 84.15% during the interim time frame, i.e., one week, whereas after optimization the energy consumption rate improves to 93.84%. In Figure 7(c), in the long-term time frame, i.e., one year, the optimized PS dispatch significantly reduces the dispatch cost from $45,600 to $37,860, while the pre-optimization dispatch cost is $43,080. The outcomes reveal that the suggested model performs better in terms of long-term economics, medium-term efficiency, and short-term stability.
Figure 7: Scheduling optimization effect on different time scales. (a) Short-term optimization effect: frequency deviation (Hz) over 24 hours, before and after optimization; (b) Medium-term optimization effect: renewable energy utilization rate (%) over 7 days, before and after optimization; (c) Long-term optimization effect: total cost (USD) over 12 months, before and after optimization.
Figure 8: Comparison of DDPG algorithm before and after improvement. (a) IEEE 30-Bus Test System: total cost (USD) versus iterations (0-200) for the traditional and improved DDPG; (b) IEEE 118-Bus Test System: total cost (USD) versus iterations (0-400) for the traditional and improved DDPG.
4.2 Validation of heuristic algorithms for power system scheduling optimization

After the validity and superiority of the mathematical model proposed by the study are verified, the study further validates the involved heuristic algorithms. Experiments are first conducted for the improvement of the DDPG algorithm. The DDPG before and after the improvement is applied to solve the mathematical model proposed by the study on the IEEE 30-node and 118-node test systems. When conducting the experiment, in the DDPG algorithm, the learning rates of the Critic network and the Actor network are set to 0.0001. The discount factor is 0.99, the batch size is 64, and the experience pool size is 1,000,000. The exploration noise is generated using an Ornstein-Uhlenbeck process, which has an initial standard deviation of 0.2 and undergoes attenuation during the training process. The target network adopts a soft update with an update parameter of 0.001 to ensure stability. The results are shown in Figure 8.

Figure 8(a) shows that the traditional DDPG algorithm converges after 160 iterations when using the IEEE 30-node system, resulting in a total cost of $39,560.
The improved DDPG converges after 80 iterations, and the total cost is reduced to $37,960. This improvement benefits from dual experience pooling. Storing common samples and high-value samples separately optimizes the efficiency of sample utilization and improves the training speed. This accelerates the convergence process and reduces the cost. Meanwhile, time-decay exploration improves initial exploration capabilities and stabilizes strategy optimization in later stages, accelerating convergence and reducing costs. As shown in Figure 8(b), both the traditional and improved DDPG require more iterations to converge when using the more complex IEEE 118-node system. However, the improved DDPG still performs better and has a lower convergence cost. The dual experience pool and the time-decay exploration strategy effectively improve the algorithm's adaptability and convergence efficiency in large-scale systems, demonstrating its superiority.

Furthermore, the optimization effect of QPSO is validated, and differential evolution (DE), the grey wolf optimizer (GWO), and the wolf search algorithm (WSA) are selected for comparison. In the QPSO algorithm, the number of particles is 50, the maximum number of iterations is 1000, and the inertia factor is 0.9. The learning factors are set to 1.5 and 2.0, respectively, to control the global and local optimal attractive forces. Setting the quantum modulation factor to 0.5 enhances the flexibility of the particle position update. Figure 9 displays the findings.

In Figure 9(a), in the IEEE 30-node test system, QPSO has the fastest convergence speed among the five algorithms and the lowest final fitness value. In Figure 9(b), in the IEEE 118-node test system, QPSO again has the fastest convergence speed among the five algorithms and the lowest final fitness value. It can be concluded that QPSO effectively enhances the global search capability of particles through the quantum behavior mechanism. Both in the smaller-scale IEEE 30-node system and in the more complex IEEE 118-node system, QPSO shows superior performance, proving its adaptability to problems of different scales and complexities. Finally, the study applies the proposed mathematical model in combination with the joint DDPG-QPSO heuristic algorithm to HP RE scheduling optimization. The more advanced methods in references [11], [12], [13], and [14] are selected as comparison methods. The algorithms from references [11] to [14] are re-implemented by the research team based on the original descriptions in the respective papers. Each method is tuned within a reasonable range of parameters based on the recommended settings. Then, it is validated to ensure optimal performance in the current test environment. All methods are evaluated under the same experimental conditions, which include load forecast profiles, RE output scenarios, system topology, and a unified evaluation period. All performance metrics are kept consistent across experiments. The results are presented in Table 3.
Figure 9: Optimization effect verification of QPSO. (a) IEEE 30-Bus Test System; (b) IEEE 118-Bus Test System (fitness value versus iterations, 0-200, for PSO, DE, GWO, WSA, and QPSO).
Table 3: Comprehensive comparison between research methods and reference methods
Method | Total cost ($) | Energy utilization rate (%) | Supply reliability (%) | Frequency deviation (Hz) | Optimization time (s)
Proposed method | 37960 | 94.85 | 97.10 | 1.25 | 58.3
Reference [11] | 40230 | 90.76 | 93.85 | 2.64 | 125.4
Reference [12] | 39760 | 92.42 | 94.50 | 2.18 | 98.7
Reference [13] | 38450 | 93.57 | 95.87 | 1.89 | 75.6
Reference [14] | 38930 | 93.25 | 95.30 | 2.01 | 88.9
In Table 3, the proposed mathematical model and the joint DDPG-QPSO heuristic algorithm of the study show more obvious advantages in HP REPS scheduling optimization. The proposed method outperforms the other methods with the lowest total cost of $37,960 and a 94.85% energy utilization rate. Meanwhile, the reliability of power supply reaches 97.10%, the FD is only 1.25 Hz, and the optimization time is 58.3 s. The proposed algorithm
demonstrates excellent economy, system stability, and solution efficiency, and provides a highly efficient and reliable solution for PS scheduling optimization in high-percentage RE scenarios.

5 Discussion and conclusion

Targeting the scheduling issue brought on by the HP of RE access in the PS, the study put forward a mathematical model for multi-module cooperative scheduling that brought together three main modules and used the enhanced DDPG and QPSO algorithms to address the issue. The efficacy of the study's suggested model and algorithm was confirmed by experimental findings. In the IEEE 30-node and 118-node test systems, the proposed model reduced the total scheduling cost by $1,700 and $5,500, respectively, compared with the traditional sequential scheduling model. It enhanced the energy consumption rate and power supply reliability. The improved DDPG algorithm increased the convergence speed by 50% and reduced the total cost from $39,560 to $37,960 in the 30-node system by introducing a dual experience pool and a time-decay exploration strategy. QPSO exhibited a stronger global search capability, with the fastest convergence speed and the lowest final fitness value in systems of different sizes compared to the other algorithms. In addition, the study's optimization experiments on short-, medium-, and long-term time scales revealed that the FD was effectively reduced, the energy consumption rate was improved by 9.69%, and the total dispatch cost was reduced by 17.6%. The adaptability and superiority of the model in cooperative optimization over multiple time scales were demonstrated.

The DDPG and QPSO algorithms perform well in the test systems. However, they may face challenges regarding scalability and adaptability in an actual power grid. As the power grid grows, its computational complexity will increase significantly. This is particularly relevant when working with large volumes of data and real-time dispatching. These factors can lead to a shortage of computing resources and excessively long training times. In addition, the diversity of power grid topologies and operating conditions may affect the algorithm's adaptability. The power grid itself contains complex generators, energy storage systems, and distributed energy resources. Corresponding adjustments to the algorithm are required to address these effectively. In terms of real-time performance, the algorithm functions well in a simulated environment. However, in a highly dynamic actual power grid, it may not respond promptly to load fluctuations and changes in RE. This affects the stability of the system. Therefore, although the algorithm performs well in the test systems, it still needs further verification and optimization for practical applications to improve stability and response speed.

References

[1] Lei Gan, Tianyu Yang, Xingying Chen, Gengyin Li, and Kun Yu. Purchased power dispatching potential evaluation of steel plant with joint multienergy system and production process optimization. IEEE Transactions on Industry Applications, 58(2):1581-1591, 2022. https://doi.org/10.1109/TIA.2022.3144652
[2] A. A. Lebedev, A. A. Voloshin, and A. N. Lednev. Conceptual framework for developing highly-automated power distribution networks and micropower systems. Power Technology and Engineering, 58(1):163-168, 2024. https://doi.org/10.1007/s10749-024-01790-2
[3] Ehsan Naderi, Lida Mirzaei, Mahdi Pourakbari-Kasmaei, Fernando V. Cerna, and Matti Lehtonen. Optimization of active power dispatch considering unified power flow controller: application of evolutionary algorithms in a fuzzy framework. Evolutionary Intelligence, 17(3):1357-1387, 2024. https://doi.org/10.1007/s12065-023-00826-2
[4] Lilin Cheng, Haixiang Zang, Anupam Trivedi, Dipti Srinivasan, Zhinong Wei, and Guoqiang Sun. Mitigating the impact of photovoltaic power ramps on intraday economic dispatch using reinforcement forecasting. IEEE Transactions on Sustainable Energy, 15(1):3-12, 2023. https://doi.org/10.1109/TSTE.2023.3261444
[5] Yu Dong, Xin Shan, Yaqin Yan, Xiwu Leng, and Yi Wang. Architecture, key technologies and applications of load dispatching in China power grid. Journal of Modern Power Systems and Clean Energy, 10(2):316-327, 2022. https://doi.org/10.35833/MPCE.2021.000685
[6] Huating Xu, Bin Feng, Chutong Wang, Chuangxin Guo, Jian Qiu, and Mingyang Sun. Exact box-constrained economic operating region for power grids considering renewable energy sources. Journal of Modern Power Systems and Clean Energy, 12(2):514-523, 2023. https://doi.org/10.35833/MPCE.2023.000312
[7] Ahmed M. Abd-El Wahab, Salah Kamel, Mohamed H. Hassan, José Luis Domínguez-García, and Loai Nasrat. Jaya-AEO: an innovative hybrid optimizer for reactive power dispatch optimization in power systems. Electric Power Components and Systems, 52(4):509-531, 2024. https://doi.org/10.1080/15325008.2023.2227176
[8] John L. Cox, William T. Hamilton, Alexandra M. Newman, Michael J. Wagner, and Alex J. Zolan. Real-time dispatch optimization for concentrating solar power with thermal energy storage. Optimization and Engineering, 24(2):847-884, 2023. https://doi.org/10.1007/s11081-022-09711-w
[9] Huifeng Zhang, Dong Yue, Chunxia Dou, and Gerhard P. Hancke. PBI based multi-objective optimization via deep reinforcement elite learning strategy for micro-grid dispatch with frequency dynamics. IEEE Transactions on Power Systems, 38(1):488-498, 2022. https://doi.org/10.1109/TPWRS.2022.3155750
[10] Sicheng Hou, and Shigeru Fujimura. Day-ahead multi-objective microgrid dispatch optimization based on demand side management via particle swarm optimization. IEEJ Transactions on Electrical and Electronic Engineering, 18(1):25-37, 2023. https://doi.org/10.1002/tee.23711
[11] Navid Shirzadi, Fuzhan Nasiri, Claude El-Bayeh, and Ursula Eicker. Optimal dispatching of renewable energy-based urban microgrids using a deep learning approach for electrical load and wind power forecasting. International Journal of Energy Research, 46(3):3173-3188, 2022. https://doi.org/10.1002/er.7374
[12] Zhongjie Guo, Wei Wei, Mohammad Shahidehpour, Zhaojian Wang, and Shengwei Mei. Optimisation methods for dispatch and control of energy storage with renewable integration. IET Smart Grid, 5(3):137-160, 2022. https://doi.org/10.1049/stg2.12063
[13] Dai Cui, Weichun Ge, Wenguang Zhao, Feng Jiang, and Yushi Zhang. Economic low-carbon clean dispatching of power system containing P2G considering the comprehensive influence of multi-price factor. Journal of Electrical Engineering & Technology, 17(1):155-166, 2022. https://doi.org/10.1007/s42835-021-00877-4
[14] Fatma Gami, Ziyad A. Alrowaili, Mohammed Ezzeldien, Mohamed Ebeed, Salah Kamel, Eyad S. Oda, and Shazly A. Mohamed. Stochastic optimal reactive power dispatch at varying time of load demand and renewable energy resources using an efficient modified jellyfish optimizer. Neural Computing and Applications, 34(22):20395-20410, 2022. https://doi.org/10.1007/s00521-022-07526-5
[15] Jatin Soni, and Kuntal Bhattacharjee. Multi-objective dynamic economic emission dispatch integration with renewable energy sources and plug-in electrical vehicle using equilibrium optimizer. Environment, Development and Sustainability, 26(4):8555-8586, 2024. https://doi.org/10.1007/s10668-023-03058-7
[16] Yukang Shen, Wenchuan Wu, Bin Wang, and Shumin Sun. Optimal allocation of virtual inertia and droop control for renewable energy in stochastic look-ahead power dispatch. IEEE Transactions on Sustainable Energy, 14(3):1881-1894, 2023. https://doi.org/10.1109/TSTE.2023.3254149
[17] Xiaojing Wang, Li Han, Mengjie Li, and Panpan Lu. A time-scale adaptive forecasting and dispatching integration strategy of the combined heat and power system considering thermal inertia. IET Renewable Power Generation, 17(8):1966-1977, 2023. https://doi.org/10.1049/rpg2.12743
[18] Bing Sun, Ruipeng Jing, Leijiao Ge, Yuan Zeng, Shimeng Dong, and Luyang Hou. Quick hosting capacity evaluation based on distributed dispatching for smart distribution network planning with distributed generation. Journal of Modern Power Systems and Clean Energy, 12(1):128-140, 2023. https://doi.org/10.35833/MPCE.2022.000604
[19] Wei Liu, Tianhao Wang, Shuo Wang, Zhijun E, and Ruiqing Fan. Day-ahead robust optimal dispatching method for urban power grids containing high proportion of renewable energy. Process Safety and Environmental Protection, 178(1):715-727, 2023. https://doi.org/10.1016/j.psep.2023.08.025
[20] Maolin Li, Youwen Tian, Haonan Zhang, and Nannan Zhang. The source-load-storage coordination and optimal dispatch from the high proportion of distributed photovoltaic connected to power grids. Journal of Engineering Research, 12(3):421-432, 2024. https://doi.org/10.1016/j.jer.2023.10.042
[21] Junjie Rong, Ming Zhou, Zhi Zhang, and Gengyin Li. Coordination of preventive and emergency dispatch in renewable energy integrated power systems under extreme weather. IET Renewable Power Generation, 18(7):1164-1176, 2024. https://doi.org/10.1049/rpg2.12893
[22] Zhoujun Ma, Yizhou Zhou, Yuping Zheng, Li Yang, and Zhinong Wei. Distributed robust optimal dispatch of regional integrated energy systems based on ADMM algorithm with adaptive step size. Journal of Modern Power Systems and Clean Energy, 12(3):852-862, 2023. https://doi.org/10.35833/MPCE.2023.000204
[23] Jian Hu, Yingjun He, Wenqian Xu, Yixin Jiang, Zhihong Liang, and Yiwei Yang. Anomaly detection in network access using LSTM and encoder-enhanced generative adversarial networks. Informatica, 49(7):175-186, 2025. https://doi.org/10.31449/inf.v49i7.7246