https://doi.org/10.31449/inf.v49i16.7805 Informatica 49 (2025) 1–20 1

Automating Financial Audits with Random Forests and Real-Time Stream Processing: A Case Study on Efficiency and Risk Detection

Jianlin Li1,2, Wanli Liu3*, Jie Zhang4
1School of Business Administration, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
2Research Center for Corporate Governance and Enterprise Growth, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
3Finance Department, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
4Office of Scientific Research, Hebei University of Economics and Business, Shijiazhuang 050061, Hebei, China
E-mail: Jianlin_Li1748@outlook.com, L135821liu_ll@hotmail.com, Jie_Zhang0152@outlook.com
*Corresponding author

Keywords: artificial intelligence, financial audit, automated method

Received: December 11, 2024

In the current complex economic environment, enterprises increasingly need efficient, accurate and real-time financial audits, and traditional audit methods struggle to cope with the challenges posed by massive data and dynamic risks. This paper explores an automation method for financial audits based on artificial intelligence, aiming to improve audit efficiency and risk identification capabilities. The study introduces the random forest algorithm, constructing 100 decision trees, drawing bootstrap samples from the training set, and randomly selecting features at each node for splitting, which reduces the overfitting risk of a single decision tree and improves the generalization ability of the model. At the same time, with the help of real-time data processing platforms such as Kafka and Flink, real-time collection, processing and analysis of financial data are achieved to ensure the timeliness and dynamism of the audit process.
After a series of steps, including extracting 500 features from multi-source data and dividing the data set of 5,000 records into a 70% training set and a 30% test set, the model is trained and evaluated. The results show that the method achieves remarkable results: audit efficiency increased by 30%, risk detection accuracy rose to 90%, audit coverage was enhanced, and the error detection rate, data processing speed, accuracy and risk identification rate were all optimized. In addition, the average adoption rate of audit recommendations reached 87%, the average effectiveness of corrective measures was 91%, audit satisfaction was about 90%, the average error rate after improvement fell by 47%, and average efficiency rose by more than 50%. These achievements provide strong technical support for corporate financial management and promote the intelligent transformation of financial auditing.

Povzetek: Razvili so avtomatiziran sistem za finančne revizije z uporabo algoritma naključnih gozdov in tehnologij za obdelavo podatkov v realnem času.

1 Introduction

In the current global economic environment, enterprises face increasingly complex financial management and audit requirements. With the rapid development of information technology, traditional financial audit methods can no longer meet enterprises' requirements for efficient, accurate and real-time auditing. Advances in artificial intelligence, especially machine learning and data analysis, have provided new solutions for financial auditing. With the introduction of AI, the audit process can be highly automated, improving audit efficiency and risk identification. Advanced algorithms such as random forests can process massive financial data, automatically identify abnormal transactions and potential risks, reduce human errors, and improve the accuracy and reliability of audits. At the same time, real-time data processing technologies such as Kafka and Flink ensure that the audit process is real-time and dynamic, meeting the needs of modern enterprises for real-time risk monitoring and rapid response. In this context, this study explores a financial audit automation method based on artificial intelligence, aiming to promote the intelligent transformation of financial audit through technological innovation and to improve the financial management level and competitiveness of enterprises.

In current research on financial auditing, scholars have put forward many viewpoints and theories on how various factors influence audit pricing, audit quality and financial reporting. Sun pointed out that the comparability of financial statements is related to audit pricing: the higher the comparability, the lower the audit cost [1]. Condie et al. studied the effect of audit experience on the financial reporting aggressiveness of chief financial officers (CFOs) and found that CFOs with more audit experience tend to report more conservatively [2]. Koh et al. discussed the impact of the refinement of financial statements on audit pricing: the higher the refinement, the higher the audit cost [3]. Lyshchenko et al. emphasized the role of financial audit in ensuring the reliability of financial statements, pointing out that auditing can improve their information quality [4]. Suryani's research shows that the scale and audit tenure of audit firms affect fraud in financial statements: larger audit firms and longer audit tenure can effectively reduce the occurrence of financial fraud [5]. Xu et al. used a simultaneous equations method to study the relationship between the readability of financial reports and audit costs, finding that the harder financial reports are to read, the higher the audit costs [6]. Discussing the relationship between electronics, artificial intelligence and the information society, Erdmann et al. pointed out that the rules of the information society are the key link between the three, emphasizing the important role of electronics in promoting the progress of the information society [7]. Ijadi Maghsoodi et al. proposed a method based on individual risk attitudes for optimizing investment strategies in virtual financial markets, which is of great significance for understanding the dynamics of financial markets [8]. Pragarauskaitė and Dzemyda used a Markov model to analyze frequent patterns in financial data, providing a new perspective for financial market analysis [9]. These studies provide both a theoretical basis for the analysis in this paper and rich empirical evidence for understanding the interaction between electronics, artificial intelligence and the information society.

Lim finds a relationship between the financial capability of enterprises and the demand for audit quality: enterprises with strong financial capability are more inclined to choose high-quality audit services [10]. Ismail et al. studied the relationship between audit committee effectiveness, the internal audit function and financial report delays, finding that an effective audit committee and a strong internal audit function can reduce such delays [11]. Oussii and Boulila provided evidence on the relationship between the financial expertise of the audit committee and the effectiveness of the internal audit function, pointing out that audit committees with rich financial expertise improve internal audit effectiveness [12]. This research shows that many aspects of auditing, such as audit experience, readability of financial statements, the internal audit function and audit committee expertise, affect audit quality and the reliability of financial reporting, and it provides theoretical and empirical support for improving the quality of financial reports by improving audit processes and methods.

At present, the financial audit field faces multiple challenges, including low audit efficiency, inaccurate risk identification, insufficient data processing capability and the lack of a real-time audit process. Traditional audit methods rely on manual operation and are easily affected by human factors, making it difficult to guarantee the accuracy and reliability of audit results. With the explosive growth of enterprise financial data, efficiently processing and analyzing data and discovering potential risks in time has become an urgent problem. The data sources involved in the audit process are diverse and heterogeneous in format, and the complexity of data integration and cleaning increases the difficulty of the audit. In view of these problems, the purpose of this study is to build an efficient and accurate financial audit automation system by introducing artificial intelligence technology, especially the random forest algorithm and real-time data processing technology, to improve audit efficiency, enhance risk identification, optimize data processing, and realize a real-time audit process.

To achieve these objectives, this study adopts a number of advanced technologies and methods. The random forest algorithm is used to analyze and predict financial data and to automatically identify abnormal transactions and potential risks; it improves the accuracy and robustness of the model by constructing multiple decision trees and randomly selecting features at each node for splitting. Real-time data processing platforms such as Kafka and Flink are introduced to realize real-time collection, processing and analysis of massive financial data, ensuring a dynamic and timely audit process. Explainable AI techniques such as LIME and SHAP improve the transparency and explainability of the model, so that auditors can understand and interpret the audit results and trust the audit conclusions. A dynamic feedback and continuous learning system collects and analyzes user feedback to continuously optimize the audit strategy and model parameters, achieving continuous improvement of the system.

[Figure 1 near here: schematic diagram of the research content, linking financial audit challenges (data integration, risk identification) with AI technology (random forest algorithm, real-time data processing).]

Figure 1: Schematic diagram of research content

As shown in Figure 1, the implications of this research for the current scientific field are reflected in several ways. By applying artificial intelligence technology to financial audit, audit efficiency and accuracy are improved, human errors are reduced, and the reliability of audit results is enhanced. The real-time data processing technology and dynamic feedback mechanism introduced in the study ensure the real-time performance and flexibility of the audit process and meet the needs of modern enterprises for fast response and real-time monitoring. By increasing the level of automation in the audit process, this study frees auditors' energy to focus on higher-level analysis and decision-making, improving the value and effectiveness of the overall audit work. The results have practical significance for the financial audit industry, provide a useful reference for automation and intelligence in other fields, and promote the application and development of artificial intelligence technology in a wider range of fields.

To better position the current study relative to existing literature, Table 1 summarizes related research by audit accuracy, audit efficiency, model type and dataset size. The table shows that the current study outperforms previous research in both audit accuracy and audit efficiency. Specifically, the current study uses a combination of random forest algorithms and real-time data processing techniques, achieving 90% audit accuracy and 87% audit efficiency, particularly when handling large datasets. This indicates that introducing real-time data processing technologies and an optimized random forest model can significantly enhance both audit accuracy and efficiency, providing strong technical support for the automation of financial auditing.

Table 1: Comparison of related research with current study

Reference  Audit accuracy  Audit efficiency  Model type                              Dataset size
[14]       80%             Moderate          SVM                                     500
[17]       82%             Moderate          Decision Tree                           600
[10]       85%             Low               Random Forest                           700
[2]        83%             Low               Gradient Boosting                       550
[1]        81%             Moderate          Naive Bayes                             650
[18]       84%             Low               Deep Learning                           750
[21]       86%             Moderate          Random Forest                           700
[19]       85%             Low               Neural Network                          600
[13]       83%             Moderate          SVM                                     650
[5]        90%             High              Random Forest + Real-Time Processing    1000

Existing research is still insufficient in terms of audit accuracy, efficiency and the ability to handle complex data, and cannot fully meet enterprises' urgent needs for efficient and accurate financial audits. Therefore, this study aims to break through these bottlenecks and explore better financial audit automation methods through innovative technology integration, providing a solid guarantee for corporate financial management.

When processing large-scale, high-dimensional financial data, support vector machines (SVMs) have high computational complexity, are prone to overfitting, and are sensitive to the choice of kernel function, making it difficult for them to adapt to the diversity and complexity of financial data. Although gradient boosting performs well in some scenarios, it is sensitive to outliers, and financial data often contains abnormal transaction records that affect the accuracy and stability of the model; gradient boosting also takes a long time to train and cannot meet the real-time requirements of financial audits. Existing methods thus struggle to meet enterprises' needs for efficient and accurate audits. How, then, can the random forest algorithm be deeply integrated with real-time data processing technology so as to handle massive data, improve sensitivity to subtle anomalies in complex financial data, and achieve more comprehensive and accurate risk identification and auditing? This is the key issue this paper explores.

This study aims to build a financial audit automation system based on random forests and real-time processing technology. It strives to achieve a 35% increase in audit efficiency and to cut data processing time by more than half, while raising audit accuracy to 92% and reducing the false alarm rate below 8%, providing enterprises with efficient and reliable financial audit services and helping them strengthen financial management and risk prevention and control.

We hypothesize that combining the random forest algorithm with real-time data processing technology in financial auditing can significantly improve audit efficiency. Random forests can mine complex data features through the parallel processing of multiple decision trees, and real-time processing technology ensures real-time data analysis. Together they shorten the audit cycle, improve accuracy, and reduce false alarms, thereby achieving the efficiency and accuracy improvements targeted by the research objectives.

Regarding the application of real-time processing technology in financial auditing, an earlier solution is to use a real-time processing framework with low latency and high throughput to monitor financial transaction data. By setting a sliding window, the system can analyze the data in the window in real time and detect abnormal transactions, basically completing detection within seconds. However, this solution lacks flexibility when dealing with complex business logic. In comparison, this study uses a combination of Kafka and Flink. Kafka serves as a data buffer and distribution platform that efficiently collects and temporarily stores financial data, ensuring stable data transmission; Flink is responsible for real-time processing and analysis. Flink not only guarantees low latency, but its powerful stream processing functions can also handle complex financial audit logic, such as correlation analysis of multi-dimensional financial indicators and risk assessment under complex business processes, showing good adaptability and processing capability.

As for anomaly detection algorithms, one existing algorithm identifies abnormal data by building probabilistic relationships between financial data, focusing on dependency relationships and judging whether data is abnormal by analyzing the probabilistic connections between data points. By contrast, the random forest algorithm used in this study is better at extracting and classifying data features through the ensemble learning of multiple decision trees. Combined with real-time processing technology, it can classify financial data and detect anomalies in real time. This is more in line with the timeliness requirements of modern financial auditing and can detect potential risks sooner in an ever-changing financial environment.

In the context of the widespread application of artificial intelligence, research results in related fields provide valuable ideas and references for our exploration of financial auditing. For example, some studies focus on the application of artificial intelligence in complex business processes and analyze how to build a sustainable implementation model, which prompts us to think about how to use artificial intelligence more efficiently to optimize audit processes and strategies in financial audit automation [7]. Other studies have demonstrated successful cases of innovative methods in complex system decision-making, which is consistent with our goal of achieving financial audit automation through random forests and real-time stream processing and improving audit efficiency and risk detection capabilities in complex financial data environments [8].

The uniqueness of this study lies in the fact that, for the first time, the random forest algorithm is deeply integrated with Kafka and Flink real-time processing technology and applied to the entire life cycle of the financial audit process. In terms of audit process optimization, real-time collection, analysis and feedback of audit data transform the traditional post-hoc audit into an in-process audit, shortening the audit cycle from an average of 15 days to 7 days. In terms of the timeliness of anomaly detection, most previous studies adopted batch processing and could not detect financial risks in time; this system processes and analyzes data the moment it is generated, and once an anomaly is detected an alarm is issued immediately, giving enterprises strong support for timely risk response. This not only improves audit efficiency and accuracy but also provides new ideas and methods for the real-time and intelligent development of the financial audit field.

2 Materials and methods

2.1 Data collection and sample selection

2.1.1 Data collection and sample selection

Data collection and sample selection are key steps in research on AI-based financial audit automation: the diversity and accuracy of data sources directly affect the effectiveness and reliability of the model. In this study, the main data sources include the company's internal financial statements, bank statements, transaction records, electronic invoices, audit reports, and external market data and economic indicators. To ensure the comprehensiveness and representativeness of the data, financial data of a number of enterprises from 2015 to 2023 were selected, covering manufacturing, service, retail and other industries. The data includes daily operating data as well as key financial reports such as quarterly and annual reports [1].

Data sources include public databases such as Yahoo Finance, which provide real-time and historical financial market data, company financial statements, and so on. The ETL process first extracts real-time streaming data through the integration of Flink and Kafka to ensure high throughput and low latency. The data is then cleaned, outliers are processed, and features are extracted; Flink performs real-time conversion, and the processed data is fed into the random forest model for financial audit analysis. Finally, the converted data and model output are stored in a database or real-time data warehouse to ensure real-time monitoring and automated financial auditing, and timely detection of anomalies and potential financial risks.
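The transaction-level pipeline described above, with Kafka buffering events and Flink applying sliding-window checks, can be illustrated by a minimal, self-contained sketch. Kafka and Flink are replaced here by an in-memory event loop, and the window size, warm-up length and 3σ threshold are illustrative assumptions rather than the system's actual configuration:

```python
from collections import deque
from statistics import mean, stdev

def stream_anomaly_flags(amounts, window=50, k=3.0):
    """Flag each transaction whose amount deviates from the rolling
    window mean by more than k standard deviations (3-sigma rule)."""
    recent = deque(maxlen=window)   # sliding window, standing in for Flink state
    flags = []
    for amt in amounts:
        if len(recent) >= 10:       # warm-up before judging anomalies
            mu, sigma = mean(recent), stdev(recent)
            flags.append(sigma > 0 and abs(amt - mu) > k * sigma)
        else:
            flags.append(False)
        recent.append(amt)
    return flags

# Simulated feed: steady ~1000 USD transactions with one injected spike.
feed = [1000.0 + (i % 7) * 5 for i in range(60)]
feed.insert(40, 250000.0)           # anomalous transaction
flags = stream_anomaly_flags(feed)
print(sum(flags))                   # prints 1: only the spike is flagged
```

In the production setup described in the paper, the same logic would run inside a Flink window operator consuming a Kafka topic; the 3σ criterion mirrors the outlier rule the paper uses in preprocessing.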
According to the previous estimate of the data volume, in the early stage of system operation the average amount of data increases by several records per day, and the frequency of data generation is relatively stable. Testing showed that setting the number of Kafka partitions to 8 meets the parallelism requirements of data processing; in a high-concurrency scenario, for example, 8 partitions allow 8 consumers to process data simultaneously, effectively reducing data backlogs. When the data volume fluctuates, Kafka's dynamic partition adjustment mechanism can increase or decrease the number of partitions in real time according to the rate and accumulation of data generation, ensuring that the system always runs efficiently.

We deploy Flink jobs in YARN cluster mode because YARN manages cluster resources well and enables dynamic resource allocation. The parallelism is set to 16 according to task complexity and the number of CPU cores in the cluster. Each parallel task is allocated 2 GB of memory, based on monitoring and analysis of task memory usage: repeated tests showed that 2 GB per task gives the highest execution efficiency without memory overflow. The YARN cluster is configured with 10 nodes, each with an Intel Xeon Platinum 8380 CPU and 32 GB of memory, meeting the hardware resource requirements of the Flink jobs.

In this study, the "real-time" processing achieved with Kafka and Flink refers to processing at the transaction level: once new financial data is generated, Kafka receives it immediately and transmits it to Flink for processing with almost no delay, in contrast to daily or other low-frequency batch processing. Through this real-time processing, Flink can calculate key financial indicators such as accounts receivable turnover and cash flow in a very short time. The real-time availability of these indicators allows auditors to promptly detect subtle changes in a company's financial situation and quickly discover potential risks; a sudden decrease in cash flow, for example, may indicate that the company's capital chain is tight, so measures can be taken in advance, improving the audit process and audit efficiency. Real-time processing and analysis enhances risk control mainly through continuous monitoring of financial data: once the data fluctuates abnormally, the audit system immediately issues an early warning, allowing auditors to intervene in time and reduce the company's financial risk.

In the process of data collection, data cleaning and preprocessing are essential steps. Duplicate data and items with obvious errors are eliminated to ensure consistency and accuracy, and missing values are completed by means such as mean filling or interpolation. For outliers, the 3σ principle is adopted to detect and process them and ensure the rationality of the data. To improve data quality and availability, data standardization is also carried out to convert data from different sources and formats into a unified format, facilitating subsequent integration and analysis.

When dealing with outliers, the 3σ principle means detecting and handling values that deviate from the mean by more than 3 standard deviations. This method is simple and effective but has limitations, for example when the data is not normally distributed; isolation forests or z-score-based methods adapt better to non-normally distributed data.

In the process of data integration, data warehouse technology stores decentralized financial data on a unified platform, and the data is extracted, transformed and loaded through the ETL process. For real-time processing needs, the Kafka stream processing platform is introduced to realize real-time data acquisition and analysis and to ensure the timeliness of the data. To further improve the efficiency and accuracy of data analysis, feature engineering extracts and selects multi-dimensional features from the original data. For the detection of financial fraud, key financial indicators including revenue growth rate, cost change rate and accounts receivable turnover rate are extracted as input features of the model; these characteristics reflect the financial health of the enterprise and can effectively identify potential financial risks. The process of data collection and sample selection thus comprises data source selection, data cleaning and preprocessing, data integration and real-time processing, together with feature engineering, ensuring the comprehensiveness, accuracy and timeliness of the data and providing the foundation for subsequent construction and optimization of the audit automation model [2]. Through correlation analysis, we found that revenue growth rate and accounts receivable turnover rate are highly correlated with financial risk, so these two features are used as important model inputs; PCA further verified the effectiveness of these features after dimensionality reduction.

Ethical considerations, data anonymization and dataset reproducibility. Ethical considerations are crucial when collecting financial data. We strictly abide by data protection regulations to ensure that data sources are legal and compliant. Data obtained from public databases and internal reports is strictly anonymized: all information that can directly or indirectly identify individuals is removed, for example by replacing company names with codes and desensitizing key personnel information. To ensure reproducibility, the data collection process, tools used and parameter settings are recorded in detail; for example, when extracting data from public databases such as Yahoo Finance, the SQL queries and Python scripts are recorded so that other researchers can reproduce the process, ensuring the scientific rigor and credibility of the research.

The dataset reflects real-world conditions and potential biases. The dataset used in this study is comprehensive, covering financial data of companies in multiple industries from 2015 to 2023, including manufacturing, service and retail, and involving daily operations as well as quarterly and annual reports. However, potential biases may still exist. The data mainly comes from companies with data disclosure capabilities, which may overlook small, micro or emerging enterprises. In addition, different industries have different financial characteristics and risk patterns; although the samples cover multiple industries, certain segments may be underrepresented, limiting the adaptability of the model in those special scenarios.

2.1.2 Data cleaning and preprocessing steps

In research on financial audit automation based on artificial intelligence, data cleaning and preprocessing are critical steps: they ensure the accuracy and consistency of the input data and thus improve the performance and reliability of the model. The first step in data cleaning is to remove duplicate records and incorrect data. Duplicate transaction records in financial statements are identified by unique identifiers and deleted to ensure uniqueness, and logical rules and domain knowledge are used to detect and correct obvious errors such as negative revenue records or unreasonable transaction amounts. Handling missing values is a key preprocessing step; methods include mean filling, median filling and the K-nearest-neighbor algorithm. A missing financial indicator, such as one quarter's sales, can be filled with the average sales of similar enterprises to preserve data integrity.

When dealing with outliers, the 3σ principle is again applied: abnormal data more than 3 standard deviations from the mean is detected and processed. Detected values are manually verified against the actual business logic, and confirmed abnormal data is removed or adjusted; for an expense higher than the industry average, further verification confirms whether it reflects a data entry error or unusual financial activity. Data standardization is a further preprocessing step. The z-score method converts the data to a standard normal distribution, eliminating the dimensional differences between financial indicators and enhancing the stability of the model; the financial data of different enterprises, such as revenue, cost and profit, are standardized so that all indicators are analyzed and compared on the same scale [3].

[Figure 2 near here: preprocessing steps, comprising (A) data cleaning, (B) missing values, (C) outlier detection, (D) data standardization, (E) real-time processing and (F) feature engineering.]

Figure 2: Preprocessing steps

As shown in Figure 2, to ensure the timeliness and real-time performance of the data, real-time processing frameworks such as Apache Kafka and Apache Flink are introduced into the preprocessing pipeline to process and analyze the data flow as it arrives, so that real-time financial data is cleaned and preprocessed promptly and kept current and accurate. Feature engineering plays a key role in preprocessing: multi-dimensional features are extracted and selected from the financial data. Key financial indicators such as revenue growth rate, cost change rate and asset-liability ratio are extracted from the original transaction data, and correlation analysis selects the features that affect the audit model. These preprocessing steps ensure high data quality and lay the foundation for subsequent training and optimization of the audit automation model.

2.1.3 Financial data integration

In research on financial audit automation based on artificial intelligence, financial data integration is a key step toward comprehensive data analysis and real-time processing: data from multiple sources is integrated into a unified database for centralized processing and analysis. Key financial indicators such as revenue, cost, profit, accounts receivable and accounts payable are shown in Table 2.

Table 2: Financial data integration

Company name           Year  Revenue      Cost         Profit       Receivables  Payables
                             (million $)  (million $)  (million $)  (million $)  (million $)
Tech Solutions Inc.    2015  143.67       97.54        46.13        50.37        36.24
Tech Solutions Inc.    2016  153.45       103.29       50.16        53.89        38.76
Green Energy Corp.     2017  165.78       110.56       55.22        60.34        41.21
Green Energy Corp.     2018  172.14       115.37       56.77        63.96        44.32
Health Plus Ltd.       2019  188.23       129.58       58.65        68.48        47.89
Health Plus Ltd.       2020  194.36       133.45       60.91        70.87        49.76
Auto Tech Global       2021  210.47       145.67       64.80        75.34        53.12
Auto Tech Global       2022  223.19       154.32       68.87        78.92        56.47
Food Innovations Inc.  2023  235.56       162.89       72.67        82.45        59.34
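To make the feature-engineering and standardization steps concrete, the sketch below derives two of the audit features named above, revenue growth rate and receivables turnover, from records shaped like Table 2, and z-score standardizes them. The values come from Table 2, but the helper functions are our own illustration, not the paper's implementation:

```python
records = [
    # (company, year, revenue, cost, profit, receivables, payables) in million $
    ("Tech Solutions Inc.", 2015, 143.67, 97.54, 46.13, 50.37, 36.24),
    ("Tech Solutions Inc.", 2016, 153.45, 103.29, 50.16, 53.89, 38.76),
]

def revenue_growth_rate(prev_rev, cur_rev):
    """Year-over-year revenue growth, one of the model's input features."""
    return (cur_rev - prev_rev) / prev_rev

def receivables_turnover(revenue, receivables):
    """Revenue divided by receivables, a liquidity-style audit indicator."""
    return revenue / receivables

def zscore(values):
    """z-score standardization, putting indicators on one common scale."""
    mu = sum(values) / len(values)
    sd = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sd for v in values]

growth = revenue_growth_rate(records[0][2], records[1][2])   # 2015 -> 2016
turnover = [receivables_turnover(r[2], r[5]) for r in records]
z = zscore(turnover)
print(round(growth, 4))          # ≈ 0.0681, i.e. about 6.8% growth
```

In the paper's pipeline, such derived and standardized indicators form the feature vectors fed to the random forest model.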
01 Debit 004 Glob 23 les 34 As shown in Table 2, in the process of data 10:1 al integration, data warehouse technology is used to realize 5:00 data extraction, conversion and loading through ETL 2024 Food process. Extract relevant financial data from different data -01- Inno Recei TXN 1330. Credi 64. sources (such as ERP system, CRM system, e-invoice 01 vatio vable 005 67 t 78 system.) to ensure the comprehensiveness and 10:2 ns s completeness of the data. Format conversion and 0:00 Inc. standardization of data from different sources, such as 2024 Tech unified date format, currency unit conversion., to ensure -01- TXN Solut 975.3 Parab 50. data consistency and comparability. The converted data is 01 Debit 006 ions 4 les 85 loaded into a unified database, and the partitioning and 10:2 Inc. indexing techniques are used to improve the efficiency of 5:00 data query and processing. As shown in Table 3, the Kafka platform enables real- In order to meet the needs of real-time data time acquisition and processing of transaction data from processing, Kafka stream processing platform is various data sources, such as sales systems, banking introduced to realize real-time data acquisition and interfaces, and supply chain management systems. Kafka's processing. Monitor transaction records and bank high throughput and low latency features ensure timely statements in real time, and update accounts receivable data transmission and processing. Blink is used for real- and parables data to support dynamic financial audit time data analysis and processing. With Blink, it is analysis. The data integration step ensures the high quality possible to calculate various key financial indicators in and timeliness of the data, and provides the data real time, such as accounts receivable turnover, cash flow. foundation for the subsequent audit automation model In the table, you can monitor Tech Solutions Inc in real construction and optimization [4]. time. 
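The z-score standardization applied to the integrated indicators can be sketched in plain Python. This is a minimal illustration: the sample revenue figures below are hypothetical, not rows of Table 2.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize a series to zero mean and unit variance: z = (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical revenue figures (million $) for several periods.
revenue = [143.67, 153.45, 165.78, 172.14, 188.23]
standardized = z_score(revenue)

# After standardization the series has mean 0 and standard deviation 1,
# so indicators with different units become directly comparable.
print([round(z, 3) for z in standardized])
```

The same transformation would be applied independently to each indicator (cost, profit, receivables, payables) before comparison across enterprises.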
2.1.4 Real-time data processing and analysis

In the research of financial audit automation based on artificial intelligence, real-time data processing and analysis is the key to ensuring the efficiency and accuracy of the audit process. Advanced stream processing technology is used to process and analyze financial data in real time. Stream processing platforms such as Kafka and Blink are introduced into the system to support real-time monitoring and analysis of large-scale financial data.

Table 3: Real-time financial transactions

Transaction ID | Timestamp           | Company Name          | Transaction Amount ($) | Account Type | Transaction Type | Balance (million $)
TXN001         | 2024-01-01 10:00:00 | Tech Solutions Inc.   | 1200.45                | Receivables  | Credit           | 52.38
TXN002         | 2024-01-01 10:05:00 | Green Energy Corp.    | 850.78                 | Payables     | Debit            | 45.12
TXN003         | 2024-01-01 10:10:00 | Health Plus Ltd.      | 1560.90                | Receivables  | Credit           | 72.55
TXN004         | 2024-01-01 10:15:00 | Auto Tech Global      | 1120.23                | Payables     | Debit            | 49.34
TXN005         | 2024-01-01 10:20:00 | Food Innovations Inc. | 1330.67                | Receivables  | Credit           | 64.78
TXN006         | 2024-01-01 10:25:00 | Tech Solutions Inc.   | 975.34                 | Payables     | Debit            | 50.85

As shown in Table 3, the Kafka platform enables real-time acquisition and processing of transaction data from various data sources, such as sales systems, banking interfaces, and supply chain management systems. Kafka's high throughput and low latency ensure timely data transmission and processing. Blink is used for real-time data analysis and processing; with Blink, various key financial indicators, such as accounts receivable turnover and cash flow, can be calculated in real time. In the table, Tech Solutions Inc.'s accounts receivable and accounts payable changes can be monitored in real time, and the stream processing algorithm instantly calculates the company's cash flow and financial health.

Using machine learning algorithms, anomaly detection modules are embedded in the data stream to identify and flag suspicious transactions in real time. By analyzing unusual changes in transaction amount and frequency, potential financial fraud is detected in a timely manner. Real-time processing and analysis technology improves audit efficiency and enhances risk control in the audit process. This method provides strong support for the automation of financial audit and ensures the accuracy and timeliness of the data.
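The anomaly flagging embedded in the data stream (Section 2.1.4) can be illustrated with a small, self-contained sketch. An in-memory deque stands in for the Kafka/Blink pipeline here; the window size, warm-up length and 3-sigma rule are illustrative assumptions, not parameters reported in the paper.

```python
from collections import deque
from statistics import mean, pstdev

class StreamAnomalyDetector:
    """Flag transactions whose amount deviates sharply from a rolling window."""

    def __init__(self, window=20, n_sigma=3.0):
        self.window = deque(maxlen=window)  # recent transaction amounts
        self.n_sigma = n_sigma

    def observe(self, amount):
        """Return True if `amount` looks anomalous relative to the window."""
        flagged = False
        if len(self.window) >= 5:  # small warm-up before flagging anything
            mu, sigma = mean(self.window), pstdev(self.window)
            if sigma > 0 and abs(amount - mu) > self.n_sigma * sigma:
                flagged = True
        self.window.append(amount)
        return flagged

detector = StreamAnomalyDetector()
# Simulated stream of transaction amounts; 50000 is an obvious spike.
stream = [1000, 1020, 990, 1010, 1005, 995, 1015, 50000, 1000]
flags = [detector.observe(a) for a in stream]
print(flags)  # only the spike is flagged
```

In a production pipeline the `observe` call would sit inside the stream consumer loop, with flagged transactions routed to the audit investigation queue.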
2.2 Model construction

2.2.1 Selection of audit automation model

In the study of AI-based financial audit automation methods, model selection is a key step to ensure the efficiency and accuracy of the audit process. According to the research objectives and data characteristics, the random forest algorithm is chosen as the core audit automation model. This choice is based on the superior performance of random forests in processing large-scale, high-dimensional data, as well as their high accuracy and robustness in classification and regression tasks. The random forest algorithm classifies and predicts data by constructing multiple decision trees and splitting on randomly selected features at each node. Its advantages include the ability to handle a large number of input variables, resistance to overfitting, and robustness to missing data. The specific model selection and construction process is as follows:

Feature selection: Extract key features from the integrated financial data, such as revenue growth rate, accounts receivable turnover, asset-liability ratio, and cash flow. These characteristics comprehensively reflect the financial health and potential risks of the enterprise. In total, 500 features were extracted from the financial data of several enterprises, covering revenue, cost, profit, accounts receivable and accounts payable.

From the 500 initial features, we selected the three key features "income growth rate, cost change rate and accounts receivable turnover rate" using a combination of stepwise regression and correlation analysis. First, we performed univariate correlation analysis between all features and the target variables (such as financial risk indicators), selected features with high correlation (absolute value greater than 0.5), and initially reduced the number of features to about 100. Then, we used stepwise regression to introduce the initially selected features into the regression model one by one, and selected the feature combination with the best model fit and the fewest variables based on indicators such as AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion). In the correlation analysis, we used the Pearson correlation coefficient, computed on the quarterly financial data of the past five years, to measure the correlation between each feature and the financial risk indicator. For example, the Pearson correlation coefficient between the income growth rate and the financial risk indicator is 0.7, indicating a strong positive correlation between the two.
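The correlation screen (keep features with |r| > 0.5 against the target) reduces to a few lines of plain Python. The toy series below are hypothetical, much shorter than the five years of quarterly data used in the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.5):
    """Keep features whose |r| with the target exceeds the threshold."""
    return {name: pearson(series, target)
            for name, series in features.items()
            if abs(pearson(series, target)) > threshold}

# Hypothetical quarterly series for illustration only.
target = [0.1, 0.3, 0.2, 0.6, 0.8, 0.7]  # financial risk indicator
features = {
    "income_growth_rate": [0.02, 0.05, 0.04, 0.09, 0.12, 0.10],  # tracks the target
    "office_headcount":   [50, 48, 51, 50, 49, 50],              # essentially unrelated
}
kept = screen_features(features, target)
print(kept)  # only the strongly correlated feature survives
```

In the study this screen is only the first pass; stepwise regression with AIC/BIC then narrows the survivors to the final three features.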
When verifying the validity of the features by principal component analysis (PCA), we first standardized the selected three features, then calculated the covariance matrix and solved for its eigenvalues and eigenvectors. The number of principal components is determined by the cumulative contribution rate reaching more than 90%. The results show that the cumulative contribution rate of the first two principal components reaches 92%, indicating that these three characteristics can effectively explain most of the information in the financial data.

Data set partitioning: The data set is divided into a training set and a test set to ensure the generalization ability of the model. Typically, 70% of the data is used for training and 30% for testing. Of the 5,000 records in total, the training set contains 3,500 records and the test set contains 1,500 records [5].

To determine the number of trees, we ran experiments with 50, 100, 150 and 200 trees. On the training set, accuracy increases as the number of trees grows, but the improvement slows once the number of trees exceeds 100. On the validation set, recall peaks at 100 trees and no obvious overfitting occurs. Therefore, considering the generalization ability and computational cost of the model, the number of trees is set to 100. For the maximum depth, we started testing at a depth of 5 and gradually increased it. At a depth of 10, the accuracy of the model on the training set reaches 90%, while the accuracy on the validation set remains around 85%. Increasing the depth further raises training accuracy slightly but lowers validation accuracy, i.e., the model overfits. Therefore, the maximum depth is set to 10 to balance capturing data features against preventing overfitting.

Model training: Train a random forest model on the training set to build a forest containing 100 decision trees. Each decision tree is generated by bootstrap-sampling the training set. The goal of training is to minimize the classification error rate, as shown in formula (1):

E = (1/N) · Σ_{i=1..N} I(ŷ_i ≠ y_i)   (1)

Model evaluation: Evaluate model performance on the test set, calculating metrics such as accuracy, recall, and F1 score. Among the 1,500 test records, the model correctly classifies 1,400 and misclassifies 100, so the accuracy is given by formula (2):

Accuracy = 1400 / 1500 = 0.9333   (2)

The recall and F1 score are calculated as shown in formulas (3) and (4):

Recall = TP / (TP + FN)   (3)

F1 = 2 · (Precision · Recall) / (Precision + Recall)   (4)

Model optimization: Improve model performance and stability by adjusting model parameters such as the number of trees, the maximum depth, and the minimum number of samples per split. Cross-validation was used to further verify the generalization ability of the model. Through these steps, the random forest algorithm can effectively identify and classify financial anomalies and risks in the automation of financial audit, and provide accurate and reliable audit results. The method improves audit efficiency, strengthens risk control, and supports the healthy development of enterprise financial management.

2.2.2 Model architecture design

In the research of financial audit automation methods based on artificial intelligence, the design of the model architecture is the core step in building an efficient and accurate audit automation system. As the core model, the random forest algorithm can give full play to its advantages in processing high-dimensional, large-scale data through reasonable architecture design.

In the model architecture, the main function of the "data input layer" is to receive raw financial data from different data sources, including internal corporate financial statements, bank flow records, etc., and to perform preliminary format verification and missing-value marking. For example, date fields must conform to a unified standard format, and positions with missing values are marked for subsequent processing. The "feature extraction layer" builds on the data input layer and deep-processes the raw data to extract effective information that reflects the financial status and risk characteristics of the enterprise. For example, financial ratios such as the debt-to-asset ratio and gross profit margin are calculated from financial statement data, and trend features and seasonal features are extracted from time series data.
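The ratio and trend features just described can be derived mechanically from statement records. The sketch below is illustrative: the record field names and the sample figures are hypothetical, and real statements carry far more fields.

```python
def extract_features(record, prev_record=None):
    """Derive illustrative ratio and trend features from one statement record."""
    f = {
        "debt_to_asset": record["total_debt"] / record["total_assets"],
        "gross_margin": (record["revenue"] - record["cost"]) / record["revenue"],
    }
    if prev_record is not None:
        # Simple trend feature: period-over-period revenue growth.
        f["revenue_growth_rate"] = (
            (record["revenue"] - prev_record["revenue"]) / prev_record["revenue"]
        )
    return f

# Two hypothetical consecutive periods (million $).
q1 = {"revenue": 143.67, "cost": 97.54, "total_debt": 60.0, "total_assets": 200.0}
q2 = {"revenue": 153.45, "cost": 103.29, "total_debt": 62.0, "total_assets": 210.0}
print(extract_features(q2, prev_record=q1))
```

Seasonal features would be computed analogously, by comparing each period against the same period of the previous year rather than the immediately preceding one.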
Data conversion changes the form of the raw data to meet the input requirements of the model, such as converting text data into numerical data and one-hot encoding categorical data. Data standardization normalizes numerical data so that features with different scales become comparable; commonly used methods include Z-score standardization and Min-Max standardization. These clear functional definitions and operation processes ensure the clarity of the model architecture and the efficiency of data processing.

Data input layer: This layer is responsible for receiving and processing financial data from a variety of data sources, such as ERP systems, bank statements, and electronic invoices. The data input layer needs to realize real-time data acquisition and preprocessing to ensure the integrity and consistency of the input data [6]. In the data standardization layer, we used the z-score standardization method to convert data from different sources onto the same scale and eliminate dimensional differences.

Below is the pseudo code of the model framework.

# Data
data → collect(['internal', 'bank', 'external'])
cleaned → clean(data)
unified → integrate(cleaned)
kafka → setup_kafka()
while True:
    new → kafka.get()
    process(new)
    detect_anomaly(new)
# Model
features → select(unified)
train, test → split(features, 0.7)
rf → RandomForest(100, 10)
rf.train(train)
# Evaluation and Optimization
pred → rf.predict(test)
metrics → evaluate(pred, test.labels)
rf → optimize(rf, train, 5)
# Path Planning
tasks → define_tasks()
paths → define_paths(tasks)
path → a_star(tasks, paths)
# Audit
for task in path:
    audit(task)
    report(task)

To ensure that other researchers can repeat our research process, we describe the specific steps of data collection in detail. The data is mainly obtained from internal databases, public financial reports, and third-party financial data providers. Specifically, the company's internal database provides real-time updated financial records; public financial reports are obtained through stock exchanges and company websites; and third-party financial data providers supplement industry benchmark data. In addition, we recorded the SQL query statements and Python scripts used for data extraction in detail to ensure the consistency and integrity of the data. The ETL process uses Apache NiFi for data extraction, the Pandas library for data cleaning and transformation, and Apache Hive for loading into the data warehouse. These detailed steps ensure the transparency and repeatability of the data processing process.

Feature extraction layer: In this layer, key features are extracted from the raw data, including but not limited to revenue growth rate, accounts receivable turnover, asset-liability ratio, and cash flow. The purpose of feature extraction is to transform complex raw data into a simplified representation that the model can process.

Data standardization layer: To eliminate dimensional differences between different features, the data standardization layer standardizes the extracted features, as shown in formula (5):

Z = (X − μ) / σ   (5)

Model training layer: This layer contains the concrete implementation of the random forest algorithm, building multiple decision trees from the training set data. Each decision tree is generated by bootstrap-sampling the training set and split by randomly selecting features at each node. The prediction of the random forest algorithm is shown in formula (6):

ŷ = mode{h_1(x), h_2(x), …, h_k(x)}   (6)

Model optimization layer: To improve the performance and stability of the model, the model optimization layer optimizes the model through parameter tuning and cross-validation. The parameters include the number of decision trees, the maximum depth and the minimum number of samples per split [10].

Prediction layer: After model training is complete, the prediction layer is responsible for making predictions on the test set and outputting the results. Its main task is to evaluate the performance of the model, including accuracy, recall, and F1 score. The accuracy on the test set is 93.33%, as shown in formula (7):

Accuracy = Number of Correct Predictions / Total Number of Predictions   (7)
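Formulas (2)–(4) and (7) reduce to simple counting over the confusion matrix. The sketch below reproduces the paper's 1,400-of-1,500 accuracy; the split of the 1,500 test records into TP/FP/FN/TN is a hypothetical example, since the paper reports only the overall counts.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts,
    mirroring formulas (2)-(4) and (7) in the text."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# 1,400 of 1,500 correct (tp + tn = 1400), as in formula (2);
# the individual cell counts are assumptions for illustration.
acc, prec, rec, f1 = metrics(tp=120, fp=40, fn=60, tn=1280)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
```

Note that with heavily imbalanced audit data a high accuracy can coexist with modest recall, which is why the later sections track precision and recall separately.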
In the anomaly detection model of financial auditing, we define a "correct" classification as follows: if the financial indicators of a transaction fall within the reasonable ranges specified by accounting standards, and in-depth analysis finds no signs of financial fraud, such as fictitious income or concealed expenses, the transaction is judged to be normal. Conversely, if the indicators in the transaction data fluctuate abnormally, or differ significantly from the company's past operating data and the industry average, and data analysis reveals possible clues of financial fraud, such as a mismatch between income and costs or abnormal cash flow, the transaction is judged to be abnormal. In the anomaly detection process, the model first extracts multi-dimensional features of the input financial data, including financial ratios and trend analysis. Then the trained classifier assigns the data to the normal class or the abnormal class according to preset thresholds and decision rules. For example, when the accounts receivable turnover rate is below a certain percentage of the industry average and the revenue growth rate fluctuates sharply within a short period, the model judges the transaction as abnormal, triggering a further audit investigation.

Anomaly detection layer: During the audit process, the anomaly detection layer is responsible for identifying and flagging suspicious financial activity. By analyzing unusual changes in transaction amount and frequency, the model can detect potential financial fraud in real time. To evaluate the performance of the real-time fraud detection system more comprehensively, we added the evaluation of false positive and false negative rates, which allows the accuracy and robustness of the system to be measured more precisely. Specifically, we calculated the false positive and false negative rates from the confusion matrix and analyzed their impact on system performance. The results show that the system has a low false positive rate, meaning that normal activities are rarely mislabeled as abnormal; at the same time, the false negative rate is effectively controlled, ensuring that potential risks are not ignored. These evaluation results further confirm the efficiency and accuracy of the system in real-time fraud detection.

In addition to false positives and false negatives, the anomaly detection layer also tracks precision and recall. Precision refers to the proportion of samples identified as anomalies that are truly anomalous, reflecting the accuracy of the model in identifying anomalies. Recall refers to the proportion of all actual anomaly samples that are correctly identified, reflecting the model's ability to detect anomalies. In financial auditing, high precision reduces the misjudgment of normal transactions and lowers audit costs, while high recall ensures that more potential financial risks are discovered. Evaluating precision and recall together gives a more comprehensive measure of the anomaly detection layer's performance.

Feedback and improvement layer: This layer compares the predicted results of the model with the actual audit results and continuously improves and optimizes the model based on the feedback. Through cyclic iteration, the accuracy and robustness of the model are continuously improved. Through this architecture design, the application of the random forest algorithm in financial audit automation is fully optimized. The systematic design ensures the efficiency and accuracy of the model when dealing with large-scale, high-dimensional financial data, and provides powerful technical support for enterprise financial audit [11].

2.2.3 Configuring the data layer and the processing layer

In the research of financial audit automation based on artificial intelligence, the configuration of the data layer and the processing layer is a key part of model construction that directly affects the efficiency and performance of the system. The data layer is responsible for storing and managing financial data, while the processing layer is responsible for data cleaning, transformation, analysis and modeling; the data layer configuration must take the diversity of the data and storage efficiency into account.

After re-evaluating the data distribution, we found that the current data set does not meet the normal distribution assumption. Therefore, we use the isolation forest algorithm instead of the 3σ principle for outlier detection. The isolation forest algorithm is based on the principle that, in high-dimensional space, normal data points tend to cluster together while abnormal data points are relatively isolated. The algorithm constructs multiple random binary trees that randomly partition the data points, and calculates the path length of each data point in the trees: the shorter the path length, the more isolated the data point and the more likely it is an outlier. In practical applications, we first normalize the original financial data to eliminate the impact of scale. The processed data is then fed into the isolation forest model, with the number of trees set to 100 and the subsample size set to 256 to ensure the stability and accuracy of the model. After model training is complete, for new financial data we calculate its anomaly score in the isolation forest and set a suitable threshold (such as 0.5); when the anomaly score exceeds the threshold, the data point is judged to be an outlier.
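The intuition behind isolation-based scoring (outliers are isolated by fewer random splits) can be shown in a deliberately simplified one-dimensional sketch. Unlike the full isolation forest used in the study, this version omits subsampling and the standard path-length normalization, and reports raw average isolation depth instead of a [0, 1] anomaly score.

```python
import random

def isolation_tree_depth(point, data, depth=0, max_depth=10):
    """Depth at which `point` is isolated by random 1-D splits (one tree)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # Keep only the points on the same side of the split as `point`.
    side = [x for x in data if (x < split) == (point < split)]
    return isolation_tree_depth(point, side, depth + 1, max_depth)

def avg_isolation_depth(point, data, n_trees=100):
    """Average isolation depth over many random trees; shorter = more anomalous."""
    return sum(isolation_tree_depth(point, data) for _ in range(n_trees)) / n_trees

random.seed(7)
normal = [random.gauss(100.0, 5.0) for _ in range(200)]  # clustered values
outlier = 500.0
data = normal + [outlier]

score_out = avg_isolation_depth(outlier, data)
score_norm = avg_isolation_depth(100.0, data)
print(score_out, score_norm)  # the outlier is isolated far earlier
```

In the real algorithm the average depth is normalized by the expected path length for the sample size and mapped to a score in [0, 1], which is where the 0.5 threshold mentioned above applies.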
Table 4: Financial indicators

Record ID | Company Name          | Year | Gross Margin (%) | Operating Margin (%) | Return on Assets (ROA) (%) | Debt-to-Equity Ratio | Quick Ratio
001       | Tech Solutions Inc.   | 2019 | 35.67            | 12.45                | 8.34                       | 0.67                 | 1.25
002       | Tech Solutions Inc.   | 2020 | 37.12            | 13.22                | 9.11                       | 0.65                 | 1.30
003       | Green Energy Corp.    | 2019 | 30.78            | 10.89                | 7.56                       | 0.72                 | 1.15
004       | Green Energy Corp.    | 2020 | 32.45            | 11.34                | 8.12                       | 0.70                 | 1.20
005       | Health Plus Ltd.      | 2019 | 28.56            | 9.45                 | 6.78                       | 0.75                 | 1.10
006       | Health Plus Ltd.      | 2020 | 29.67            | 10.12                | 7.23                       | 0.73                 | 1.18
007       | Auto Tech Global      | 2021 | 34.89            | 12.67                | 8.56                       | 0.68                 | 1.22
008       | Auto Tech Global      | 2022 | 36.45            | 13.45                | 9.45                       | 0.66                 | 1.28
009       | Food Innovations Inc. | 2021 | 33.56            | 12.12                | 8.12                       | 0.69                 | 1.20
010       | Food Innovations Inc. | 2022 | 35.12            | 12.78                | 8.89                       | 0.67                 | 1.25

As shown in Table 4, the processing layer is responsible for cleaning, transforming, analyzing, and modeling the financial data in the data layer. The processing layer cleans the financial data, including dealing with missing values, outliers, and duplicate data. For missing values, the median fill method is used, and for outliers, the 3σ principle is used for detection and processing. Data transformation involves standardizing and normalizing the raw data to eliminate dimensional differences between different features. The processing layer improves the performance of the model through feature extraction and feature selection: key features such as gross profit margin, operating profit margin and return on assets are extracted from the financial indicators, and the features with the greatest impact on the model are selected through correlation analysis [12].

The processed data is stored back into the data warehouse for subsequent model training and real-time processing, and partitioning and indexing techniques are used to improve the efficiency of data query and processing. Real-time data processing and analysis are realized through stream processing platforms such as Kafka and Blink. The real-time processing layer is responsible for monitoring and analyzing the flow of financial data, calculating key financial metrics and performing anomaly detection in real time; for example, it monitors each company's gross and operating margin changes in real time and identifies and flags unusual transactions in a timely manner. The processing layer uses the random forest algorithm to train on and predict from the processed data. The model training process includes the partitioning of data sets, the adjustment of model parameters and cross-validation to ensure the generalization ability and prediction accuracy of the model. Together, the configured data layer and processing layer ensure high-quality management and efficient processing of financial data, which provides a basis for the construction and optimization of audit automation models.

2.2.4 Random forest algorithm selection and implementation

In the research of financial audit automation based on artificial intelligence, the implementation process of the random forest algorithm is very important, since it determines the accuracy and robustness of the model.

Data preparation: Load and process the characteristic data in the data tables, ensuring data integrity and consistency. The characteristic data include revenue growth rate, accounts receivable turnover, asset turnover, debt ratio, and net profit margin. During data preparation, the feature data is standardized to eliminate dimensional differences between different features.

Model training: Train a random forest model on the training set. Random forests achieve prediction by building multiple decision trees and splitting on randomly selected features at each node. Key parameters of the random forest, such as the number of decision trees, are set; in this study, 100 decision trees are constructed, each generated by bootstrap sampling to ensure the robustness and accuracy of the model.

Model prediction: Use the trained random forest model to make predictions on the test set. Classification or regression is decided by the majority vote of the decision trees. Of the 300 records in the test set, the model classified 280 correctly and 20 incorrectly [13].

Model evaluation: Evaluate the performance of the model, mainly by calculating accuracy, recall and F1 score.

Parameter optimization: Optimize model performance by adjusting the model parameters. The cross-validation method is used to verify the generalization ability of the model and to ensure its consistency and stability across different data sets.

Through this implementation process, the random forest algorithm has been effectively applied to the automation of financial audit. It improves audit efficiency, strengthens risk control, and provides technical support for the financial health management of enterprises. The systematic realization process ensures the efficiency and accuracy of the model, and further promotes the intelligent development of financial audit [14].
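The cross-validation used for parameter optimization needs only an index splitter; a minimal k-fold sketch in plain Python (the 3,500-record training-set size is taken from the text, the choice of k = 5 is an assumption):

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index pairs for k-fold cross-validation."""
    # Distribute any remainder across the first folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

# 5-fold CV over the 3,500-record training set described in the text.
folds = list(k_fold_indices(3500, k=5))
print(len(folds), len(folds[0][1]), len(folds[0][0]))  # 5 700 2800
```

Each candidate parameter setting (number of trees, maximum depth) would be trained on the five training partitions and scored on the corresponding validation folds, with the averaged score deciding the final configuration.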
2.3 Training and optimization

2.3.1 Training process description

In the research of financial audit automation based on artificial intelligence, the training process is the key step in ensuring the performance of the random forest model. Key characteristics, including revenue growth rate, accounts receivable turnover, asset turnover, debt ratio, and net profit margin, are extracted from the data sheets and standardized. After the features are preprocessed, they form the input data set of the model. Next, the data set is divided into a training set (70%) and a test set (30%). In the model training phase, the random forest algorithm is trained by building 100 decision trees. Each tree uses bootstrap sampling to extract samples from the training set, which ensures the diversity of the data and the robustness of the model. At each node, features are randomly selected for splitting so as to minimize the Gini coefficient (or, equivalently, maximize the information gain), thus building the structure of the tree. In addition, the generalization ability of the model is further improved by setting the maximum tree depth to 10 and using a 50% feature subset. The goal of model training is to reduce the overfitting risk of any single decision tree and to improve the generalization ability of the whole model through the voting results of the majority of decision trees.

We have listed the hyperparameters used for training the random forest model in detail, including the number of decision trees (set to 100), the maximum depth (set to 10), the number of features used to split a node (set to 50% of the total number of features), and other key parameters.
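The two sources of randomness described above (a bootstrap sample per tree and a 50% feature subset per split) can be sketched without the full tree-induction code. The record and feature counts come from the text; everything else is illustrative scaffolding.

```python
import random

def bootstrap_sample(rows, rng):
    """Sample len(rows) rows with replacement: one tree's training set."""
    return [rng.choice(rows) for _ in rows]

def feature_subset(n_features, fraction, rng):
    """Randomly pick a fraction of the feature indices for one split."""
    k = max(1, int(n_features * fraction))
    return rng.sample(range(n_features), k)

rng = random.Random(0)
rows = list(range(3500))        # indices of the 3,500 training records
n_trees, n_features = 100, 500  # hyperparameters from the text

# Only 5 of the 100 trees are drawn here, for brevity.
samples = [bootstrap_sample(rows, rng) for _ in range(5)]
split_features = feature_subset(n_features, 0.5, rng)  # 50% feature subset

print(len(samples[0]), len(split_features))  # 3500 250
```

Because sampling is with replacement, each bootstrap sample contains duplicates and omits roughly a third of the records, which is what makes the individual trees diverse enough for majority voting to help.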
Cross-validation methods are used to evaluate the model's performance on different data sets through repeated iterations and parameter adjustments (e.g., the number of decision trees and the maximum depth) to ensure high accuracy and stability. After the training process, the model's performance on the test set is used for final evaluation and validation to confirm its validity and reliability in practical applications.

2.3.2 Model optimization strategy

In the research of financial audit automation based on artificial intelligence, the model optimization strategy is the key to improving the performance of the random forest algorithm. The optimization strategy mainly includes parameter tuning, feature selection, data enhancement and model integration. In feature selection, the contribution of each feature is calculated and the features that contribute little to the model are eliminated, reducing noise and improving the interpretability and efficiency of the model. Feature importance can be determined by calculating each feature's contribution to the reduction of model impurity. In our analysis, the revenue growth rate and the accounts receivable turnover rate contribute the most, so these features are preferentially retained.

Data enhancement is another strategy that improves the robustness and generalization of the model by generating more training samples. The model integration strategy further improves prediction performance by combining the prediction results of multiple models. A random forest and a gradient boosting decision tree (GBDT) are combined into an integrated model, exploiting the advantages of different algorithms to enhance the accuracy and stability of prediction. In the concrete implementation, the random forest and the GBDT are trained separately, and their predictions are then fused by a weighted average or a voting mechanism to obtain the final predicted value [15].

Through these optimization strategies, the performance of the random forest algorithm in financial audit automation has been improved, ensuring the efficiency and reliability of the model across different data sets and scenarios and providing technical support for the financial health management of enterprises.

For situations where real-time data sources are temporarily unavailable, we have designed buffering strategies and error handling mechanisms. When the real-time data stream is interrupted, the system automatically stores the data in a memory buffer and periodically attempts to reconnect to the data source. Once the data source is restored, the data in the buffer is quickly processed and fed into the system. In addition, the system is configured with error handling logic: when a data source is unavailable for a long time, an alarm mechanism is triggered to notify the administrator to troubleshoot the problem. These mechanisms ensure the stability and continuity of the system in the face of emergencies.

In the random forest algorithm, the number of trees and the tree depth are two key hyperparameters. The number of trees is chosen to be 100 because more trees integrate the results of more decision trees, reduce the risk of overfitting of any single tree, and improve the generalization ability of the model. If the number of trees is too small, the model does not learn fully; if it is too large, the computational cost increases while the benefit gradually diminishes. The tree depth is set to 10 to balance the complexity and accuracy of the model. If the tree is too deep, the model overfits the training data and its generalization ability deteriorates; if it is too shallow, the complex features of the data cannot be learned, which reduces the performance of the model. A reasonable tree depth avoids overfitting while preserving the model's ability to capture features.

2.4 Automatic path planning of financial audit
2.4 Automatic path planning of financial audit
2.4.1 Path planning algorithm selection
In the research on financial audit automation based on artificial intelligence, the selection of the path planning algorithm is the link that realizes an efficient audit process. Path planning algorithms are designed to determine the best audit path so as to maximize audit efficiency and coverage while minimizing audit cost and time. Based on the demands of this research, shortest-path and heuristic search algorithms from graph theory, namely the Dijkstra algorithm and the A* (A-Star) algorithm, are selected as the core path planning algorithms. The Dijkstra algorithm is a classical shortest-path algorithm that finds the shortest path from a start point to an end point in a weighted graph. It is suitable for task planning in financial audits, such as determining the optimal path from one audit task to another, reducing the waste of auditors' time and resources. The algorithm maintains a priority queue, gradually expands to all nodes in the graph, calculates the shortest path to each node, and finally builds a complete shortest-path tree.
On the basis of the Dijkstra algorithm, the A* algorithm introduces a heuristic function, which makes the search for the optimal path more efficient. The heuristic function estimates the distance between the current node and the destination node, so that the path most likely to reach the destination is explored first. The A* algorithm has practical application value in financial audit automation; for example, on large-scale datasets or for complex audit tasks, it can quickly find an efficient audit path and improve overall audit efficiency. When planning an audit task, there are multiple task nodes and paths, each with a different cost (such as time or resource consumption). Using the Dijkstra algorithm, a path that minimizes the total cost can be calculated. In more complex scenarios, the A* algorithm further optimizes path selection by introducing a heuristic evaluation, making the audit process more efficient and intelligent [16].
By combining the Dijkstra and A* algorithms, we can effectively plan the paths of financial audit tasks and improve the overall performance and efficiency of the audit automation system. The path planning method simplifies the audit process, enhances the accuracy and timeliness of the audit results, and provides support for the financial management of enterprises.
In an audit project, we define each audit step as a node, such as financial statement review, inventory counting, and accounts receivable verification. The edges between nodes represent the order of and dependencies between tasks; for example, inventory counting can only be performed after the financial statement review is completed. Suppose we have an audit project with four main tasks: auditing sales revenue, auditing costs and expenses, auditing the balance sheet, and auditing cash flow. Among them, auditing sales revenue and auditing costs and expenses can be carried out in parallel, auditing the balance sheet must wait until the audits of sales revenue and of costs and expenses are completed, and auditing cash flow must wait until the audit of the balance sheet is completed.
We convert these tasks into nodes and edges in the graph, and use the Dijkstra algorithm to calculate the shortest path from the start node (such as project start) to the end node (such as audit report generation). By optimizing the path planning, we can reasonably arrange the work order of auditors and reduce unnecessary waiting time and repetitive work, for example by avoiding auditors frequently switching between different tasks, thereby improving audit efficiency; it is expected that the audit time can be shortened by about 20%.
2.4.2 Audit process design
In the research on financial audit automation based on artificial intelligence, the audit process design is the key to realizing the efficient automation of audit tasks. Designing a scientific and reasonable audit process can maximize the use of artificial intelligence technology to improve audit efficiency and accuracy. The audit process design mainly covers audit preparation, data acquisition and preprocessing, model application and analysis, anomaly detection and handling, and audit report generation.
In the audit preparation stage, the system establishes the audit plan and determines the audit focus and risk areas according to the enterprise's historical financial data and industry benchmark data. This step includes collecting data such as the company's annual financial statements, bank statements, electronic invoices and transaction records. Next is the data acquisition and preprocessing stage: through API interfaces and data crawler technology, the system acquires the latest financial data of the enterprise in real time and performs data cleaning, format conversion and feature extraction. The processed data are stored in a data warehouse for subsequent analysis [17].
In the model application and analysis stage, the random forest algorithm is applied to the financial data for risk assessment and anomaly detection. The system analyzes key financial indicators, such as the revenue growth rate, the asset-liability ratio and cash flow, and predicts potential financial risks and abnormal transactions through the model. For example, if during an audit the system finds that a company's accounts receivable turnover is lower than the industry average, the model flags this as an anomaly and further analyzes the cause. In the anomaly detection and handling phase, the system analyzes the detected anomalies in detail and provides actionable audit suggestions; for instance, it may recommend that auditors further verify whether the low turnover is due to poor collection of receivables or to errors in the financial statements.
Finally comes the audit report generation stage. The system automatically generates detailed audit reports, including audit findings, risk assessments, and improvement suggestions. The report format is standardized, which makes it easy for auditors and management to read and make decisions; the system generates a PDF report with an audit summary, a detailed exception list, and recommendations for improvement. Through this audit process design, the financial audit process achieves a high degree of automation and intelligence, improves audit efficiency and accuracy, and enhances the transparency and traceability of the audit process, providing support for the financial health management of enterprises.
2.4.3 Implementation of audit policies
In the research on financial audit automation based on artificial intelligence, the realization of the audit strategy is the link that ensures an efficient and accurate audit process. The implementation process includes the formulation, execution and dynamic adjustment of the audit strategy.
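The four-task example described in Section 2.4.1 above can be encoded as a weighted graph and solved with Dijkstra's algorithm. The sketch below is illustrative, not the authors' implementation: the edge weights (in person-days) are hypothetical, and, following the simplified description above, the plan is treated as a single cheapest chain of tasks rather than a full scheduling problem with parallel branches.

```python
# Illustrative sketch (hypothetical weights): the audit-task graph of
# Section 2.4.1 and a textbook Dijkstra search from project start to
# audit report generation.
import heapq

graph = {
    "start":          {"sales_revenue": 3, "costs_expenses": 2},
    "sales_revenue":  {"balance_sheet": 4},
    "costs_expenses": {"balance_sheet": 5},
    "balance_sheet":  {"cash_flow": 2},
    "cash_flow":      {"report": 1},
    "report":         {},
}

def dijkstra(graph, source, target):
    """Return (total_cost, path) of the cheapest path from source to target."""
    queue = [(0, source, [source])]      # (cost so far, node, path)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

cost, path = dijkstra(graph, "start", "report")
print(cost, path)
# -> 10 ['start', 'costs_expenses', 'balance_sheet', 'cash_flow', 'report']
```

A real audit plan with AND-dependencies (the balance sheet needing both predecessor audits finished) would call for a critical-path or topological-order computation rather than a plain shortest path.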
The development of audit strategies is based on a comprehensive analysis of the company's financial data and a risk assessment, using artificial intelligence technology to identify key audit indicators and high-risk areas. Through the analysis of historical data and industry benchmark data, the system develops detailed audit strategies. Audit policies cover the audit scope, key audit areas, the schedule, and resource allocation. For a given enterprise, the system may focus on its accounts receivable and inventory management and draw up a corresponding audit schedule and resource allocation plan.
In the execution stage, the system obtains the latest financial data of the enterprise in real time through API interfaces and data crawler technology, and performs data analysis and audit procedures according to the predetermined audit strategy. The random forest algorithm is used for risk assessment and anomaly detection, and the system monitors and analyzes key financial indicators in real time. If a company's cash flow fluctuates abnormally during the audit process, the system marks the item as high risk and further analyzes the cause of the fluctuation.
Dynamic adjustment is the final link of audit strategy implementation. The system continuously monitors data and audit results during the audit process and dynamically adjusts audit policies according to the actual situation. If an anomaly is found in a certain area during the preliminary audit, the system increases the audit effort in that area and adjusts the audit resources and schedule. For example, if the accounts payable turnover rate of an enterprise is found to be abnormally high during the audit, the system intensifies the audit of the supplier payment process to ensure the legality and compliance of all account transactions [18].
The system also continuously optimizes audit strategies through machine learning. By analyzing the successes and failures of past audit projects, the system constantly adjusts and optimizes audit strategies to improve audit efficiency and accuracy; for example, it adjusts the weights of the risk assessment model based on historical data to ensure accurate identification of high-risk areas. The implementation of the audit policy also includes the automatic generation of audit reports and recommendations. The system generates a detailed audit report based on the audit results, including the risks found, anomalies and improvement suggestions, providing valuable decision support for enterprise management. For instance, an audit report generated by the system may recommend that the enterprise optimize its inventory management processes to reduce inventory costs and improve the efficiency of capital use. Through the implementation of the audit strategy, the financial audit process achieves a high degree of automation and intelligence, improves audit efficiency and accuracy, and enhances the transparency of enterprise financial management and its risk control ability.
To illustrate the implementation of audit policies, we take a manufacturing enterprise as an example. The principle for formulating the audit plan is to determine the key areas and key links of the audit based on the business characteristics, financial risk status and regulatory requirements of the enterprise. For this manufacturing enterprise, we focus on raw material procurement, cost control in the production process, and product sales. The schedule is as follows: at the beginning of each quarter, a detailed audit plan is formulated to clarify the audit tasks and milestones of each stage; in the first week a preliminary review of the financial statements is conducted, in the second week inventory counting and accounts receivable verification are carried out, in the third week a detailed review of costs and expenses is performed, and in the fourth week the audit results are summarized and the audit report is written. Resource allocation is based on the difficulty and workload of the audit tasks, with auditors and technical resources deployed accordingly; for complex cost accounting links, experienced auditors and professional data analysis tools are assigned. Through the implementation of these audit policies, the company reduced the incidence of financial risks by 30% over the past year, and the audit satisfaction rate reached more than 85%.
3 Results and discussion
3.1 Results
3.1.1 Audit efficiency improvement result
In the research on financial audit automation based on artificial intelligence, audit efficiency is improved by introducing the random forest algorithm and optimizing the audit process. The specific efficiency improvements can be demonstrated by comparing key indicators before and after the implementation of automated auditing.
Random forest feature selection extracts the most representative features from a large amount of data, reducing noise and redundant information and improving the accuracy and robustness of the model. The real-time analysis function enables the system to respond quickly to data changes, optimize the decision-making process, and improve the system's timeliness and adaptability. Together, these improvements raise overall system performance, ensuring more accurate predictions and more efficient resource allocation.
Figure 3: Audit efficiency improvement result
As shown in Figure 3, with the introduction of the random forest algorithm, the automated audit system shows improved efficiency in many respects. Audit coverage increased across all companies, indicating that the automated system can audit a company's financial data more comprehensively. Error detection rates also improved, reflecting the model's strong ability to identify and correct errors. Data processing speed accelerated, indicating that the automated system can process large amounts of financial data more efficiently. The improvements in accuracy and in the risk identification rate further prove the reliability and effectiveness of the automated audit system. Under the joint action of these indicators, the financial audit process becomes more efficient and accurate, which provides a guarantee for the financial management of enterprises.
3.1.2 Audit risk identification effect
In the research on financial audit automation based on artificial intelligence, the random forest algorithm is introduced to effectively improve audit risk identification. Through the integration of multiple decision trees, the random forest algorithm improves the detection of abnormal data and potential risks. This paper presents the audit risk identification results for different companies, including the risk detection rate, the high-risk transaction identification rate, the low-risk transaction misjudgment rate, the false positive rate and the false negative rate. The data show that the automated audit system performs well in risk identification.
As shown in Figure 4, the risk detection rate of the automated audit system increased to more than 88% across the different companies. High-risk transaction recognition rates also performed well, exceeding 85% for most companies and reaching 90% for Food Innovations Inc. The misjudgment rate for low-risk transactions remained low, which shows the model's ability to accurately identify low-risk transactions. Both the false positive rate and the false negative rate were reduced, demonstrating the system's effectiveness in cutting down false alarms and missed detections; Tech Solutions Inc. had a false positive rate of 10% and a false negative rate of 5%, showing the model's stability in balancing the two. Through the application of the random forest algorithm, the automated audit system achieves high efficiency and accuracy in risk identification: it improves the risk detection rate and the identification rate of high-risk transactions, and reduces the misjudgment, false positive and false negative rates for low-risk transactions. These results provide strong support for the financial management and risk control of enterprises and improve the quality and efficiency of audit work.
3.1.3 Audit feedback and improvement results
In the research on financial audit automation based on artificial intelligence, audit feedback and improvement results are the key to ensuring the continuous optimization and efficient operation of the audit process. By collecting and analyzing audit feedback, the system can continuously improve its algorithms and processes to raise the accuracy and efficiency of the audit. We report the performance of key indicators after audit feedback and improvement for different companies, including the adoption rate of audit recommendations, the effectiveness of corrective measures, audit satisfaction, the reduction in error rate after improvement, and the increase in efficiency after improvement.
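The risk-identification metrics listed in Section 3.1.2 above (detection rate, false positive rate, false negative rate) all derive from a confusion matrix over flagged transactions. A minimal sketch, with hypothetical counts chosen only for illustration (they are not the study's data):

```python
# Illustrative sketch (hypothetical counts): the risk metrics of
# Section 3.1.2 computed from one company's confusion matrix.
tp, fn = 90, 10    # truly risky transactions: caught vs missed
fp, tn = 20, 180   # low-risk transactions: wrongly flagged vs cleared

detection_rate      = tp / (tp + fn)   # share of real risks caught
false_positive_rate = fp / (fp + tn)   # low-risk wrongly flagged
false_negative_rate = fn / (tp + fn)   # real risks missed

print(detection_rate, false_positive_rate, false_negative_rate)
# -> 0.9 0.1 0.1
```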
The increased computing cost and complexity of maintaining AI systems may stem from multiple factors. First, real-time analysis requires rapid processing of large amounts of data, which increases the demand for computing resources and may raise hardware and operation and maintenance costs. Second, as the complexity of the system grows, model training and optimization require more computing time and storage space, which increases the computational burden. Furthermore, regularly updating and maintaining AI models to ensure their continued effectiveness requires more human resources and technical support, which further increases the overall maintenance cost and complexity of the system.
Figure 4: Audit risk identification effect
Figure 5: Audit feedback and improvement results
As shown in Figure 5, the different companies achieved clear results after audit feedback and improvement. The adoption rate of audit recommendations is high, reaching 87% on average, indicating that enterprises attach great importance to the audit recommendations provided by the system and actively adopt them; Health Plus Ltd. had an adoption rate of 89%. The effectiveness of corrective actions also performed well, averaging 91%, indicating that the corrective actions proposed by the system were highly effective in improving financial processes and controlling risks. Audit satisfaction reflects the enterprises' overall evaluation of the automated audit system; with an average of about 90%, it indicates that the enterprises are very satisfied with the system's audit results and feedback process. The reduction in error rate after improvement shows the effect of acting on audit feedback, with an average reduction of 47%; Tech Solutions Inc.'s error rate fell by 45%, while Food Innovations Inc.'s fell by 49%. The improvement in efficiency further proves the positive effect of audit feedback, with an average increase of more than 50%; the efficiency of Green Energy Corp. increased by 52%. Through continuous audit feedback and improvement measures, the AI-based financial audit automation system improves the accuracy and efficiency of the audit, enhances the standardization and transparency of enterprises' financial management, and protects their financial health.
To verify the significance of the improvement in audit efficiency, we conducted a t-test on the indicators before and after automation; the p-value was less than 0.05, indicating that the improvement was statistically significant. In the result analysis phase, to evaluate the research results more rigorously, an in-depth statistical analysis of the improvement in audit efficiency was carried out. For the key indicators before and after automation, a t-test was carefully designed and executed. Based on a large amount of sample data, a p-value of less than 0.05 was obtained, strongly indicating that the improvement in audit efficiency is statistically significant and not accidental. In addition, to further verify the stability of the risk identification accuracy, its 95% confidence interval was calculated; the results showed that the accuracy is stable and reliable, which strengthens the credibility of the research results.
While enjoying a 30% improvement in audit efficiency and 90% accuracy, the trade-offs cannot be ignored. With the introduction of real-time processing technology and machine learning models, the computing cost of the system has increased significantly, and higher demands are placed on the hardware configuration: more powerful servers are needed to support the rapid processing of massive data. At the same time, the complexity of the system has grown considerably; training, optimizing and operating the model requires professional technicians, raising labor costs and technical difficulty. However, considering the substantial benefits for corporate financial management, these investments are still worthwhile.
In the results section, we supplemented the analysis with control group data, selecting the traditional sampling audit method as the control. On the same audit items and datasets, the AI-based automated audit method and the traditional sampling audit method were applied separately. In terms of audit coverage, the automated method reached 95%, while the traditional sampling method reached only 70%; this is because the automated audit can comprehensively analyze all the data, whereas the traditional sampling audit is limited by the sample size. In terms of detection rate, the automated method detected 90% of financial risks versus 75% for the traditional sampling method, indicating that the automated method detects potential financial risks more effectively. This comparative analysis shows more intuitively the advantages of the AI-based automated audit method in improving audit efficiency and accuracy.
In terms of efficiency, the introduction of the automated audit system shortened the audit time from an average of 20 working days to less than 10, an improvement of more than 50%. This is mainly due to the system's ability to process large amounts of financial data quickly and to reduce the time spent on manual review. In terms of accuracy, the accuracy of risk identification increased from 80% to more than 93%. For example, in the audit of a listed company, the automated audit system promptly discovered an abnormal transaction involving fictitious income by monitoring financial data in real time, while traditional audit methods failed to detect it immediately. Through continuous audit feedback and improvement measures, we keep optimizing the model and the audit process, further improving the accuracy and efficiency of audits and enhancing the standardization and transparency of corporate financial management.
The increase in computing costs mainly comprises the following items: hardware upgrade costs, as we purchased high-performance servers to meet the needs of big data processing and model computation, increasing costs by 500,000 yuan; software licensing fees, as we use professional data analysis software and artificial intelligence algorithm libraries, with an annual licensing fee of 200,000 yuan; and human resource investment.
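The significance test and the confidence interval described above can be sketched as follows. The before/after samples are synthetic stand-ins for the study's 30 audit projects, and the 1,500-record test size used for the interval is inferred from the paper's 70/30 split of the 5,000-record dataset; the numbers are illustrative, not the study's measurements.

```python
# Illustrative sketch (synthetic data): a two-sample t-test on audit
# times before and after automation, plus a 95% normal-approximation
# confidence interval for the reported 90% risk-identification accuracy.
import math
from scipy import stats

before = [19, 20, 21] * 10   # ~20 working days per project (manual)
after  = [9, 10, 11] * 10    # ~10 working days per project (automated)

# Independent two-sample t-test (df = 30 + 30 - 2 = 58, as in the study).
t_stat, p_value = stats.ttest_ind(before, after)
print(f"t = {t_stat:.1f}, p = {p_value:.2g}")

# 95% CI for an observed accuracy of 0.90 on n = 1,500 test records.
p_hat, n, z = 0.90, 1500, 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI: [{p_hat - z*se:.3f}, {p_hat + z*se:.3f}]")
```

With an accuracy of 0.90 on 1,500 records the normal-approximation interval is roughly [0.885, 0.915], consistent with the paper's claim that the accuracy estimate is stable.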
We recruited and trained professionals with data analysis and artificial intelligence skills, increasing human resource costs by 300,000 yuan per year. Through a cost-benefit analysis, we calculated the return on investment (ROI). Over the past year, the improved audit efficiency saved the company 1 million yuan in audit costs and avoided 2 million yuan in potential losses that would have been caused by failing to discover financial risks in time. According to the ROI formula, ROI = (benefit - cost) / cost × 100%, the calculated ROI is 200%, indicating that the cost increase is acceptable and the investment is worthwhile.
When evaluating the improvement in audit efficiency, we selected 30 audit projects as samples and recorded the audit time before and after the use of the automated audit system. Before applying the t-test, we first performed a normality test on the two groups of data to ensure that they met the test's assumptions. We then calculated the mean and standard deviation of the two groups and computed the t statistic. The resulting t value was 3.5 with 58 degrees of freedom, corresponding to a p-value of 0.01, which is less than 0.05. This indicates that, at the 95% confidence level, the audit time of the automated audit system is significantly lower than that of the traditional method, i.e., efficiency is significantly improved. For risk identification accuracy, the Mann-Whitney U test was used: the number of risks identified and the correct identification rates of the automated system and the traditional method were compared across the 30 audit projects. The calculated Mann-Whitney U value was 200, with a corresponding p-value of 0.03, which is less than 0.05, indicating that the automated system is significantly better than the traditional method in risk identification accuracy.
Audit systems based on deep learning, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have powerful feature learning capabilities when processing financial data and can automatically extract complex data features. However, deep learning models require large amounts of labeled data, and training them consumes substantial computing resources and time. In contrast, the random forest algorithm used in this study, combined with Kafka and Flink real-time processing technology, has clear advantages in audit efficiency. When processing financial data of the same scale, the audit time of our system is only 50% of that of a deep learning system: the average audit time of the deep learning system is 20 days, while our system takes only 10. In terms of accuracy, although deep learning systems perform well in identifying some complex risks, our system's accuracy in identifying common financial risks is comparable, exceeding 90%. At the same time, our system has low computing resource requirements and can run on ordinary servers, while deep learning systems usually require high-performance servers equipped with GPUs.
3.2 Discussion
3.2.1 Problem summary
In the research on financial audit automation based on artificial intelligence, although efficiency improvements and good risk identification have been achieved, some problems still need to be summarized and solved. Data quality remains a challenge: even after strict data cleaning and preprocessing, data noise and missing values still affect the accuracy and stability of the model. The data sources involved in the audit process are diverse and their formats are not uniform, which complicates data integration and increases the difficulty of system processing and analysis. Model interpretability and transparency also need attention. Although complex algorithms such as random forests perform well in accuracy and efficiency, their internal decision-making process is complicated and difficult for non-technical personnel to understand and explain, which reduces users' trust in the audit conclusions when reports are generated and results are interpreted.
The real-time processing capability of the system also needs improvement. Despite the introduction of real-time processing platforms such as Kafka and Blink, the system still has room for improvement in processing speed and latency when facing large-scale, high-frequency data flows, which places higher technical requirements on realizing real-time auditing. The generalization ability of the model likewise needs attention: although the robustness of the model has been improved through cross-validation and parameter optimization, it shows insufficient adaptability to new types of financial data and fraud, which limits its promotion and application across different enterprises and industries.
The user feedback mechanism also needs to be improved. Although the system can automatically generate audit reports and improvement suggestions, how to effectively collect and process user feedback so as to continuously optimize the audit strategy and model performance is still a problem that needs in-depth research. In short, although AI-based financial audit automation has achieved results in improving audit efficiency and risk identification, it still needs further optimization in data quality, model interpretability, real-time processing, generalization, and user feedback to achieve more efficient and reliable financial audit automation.
Although audit efficiency has been significantly improved, an increase in computational cost has also been noted, since the real-time processing technology and machine learning models add to the system's computational burden. However, this cost increase is acceptable considering the efficiency gains.
3.2.2 Research suggestions
In the research on financial audit automation based on artificial intelligence, the following suggestions are put forward to further improve the efficiency and accuracy of the system. Data quality management needs further improvement: it is recommended to establish a more comprehensive data cleaning and preprocessing mechanism and to adopt advanced missing-value handling and anomaly detection techniques, such as adaptive filtering and deep learning models, to improve data reliability and integrity. Model interpretability and transparency should be enhanced: it is recommended to integrate explainable AI techniques such as LIME and SHAP into the model so that auditors can understand and explain the model's decision-making process, thereby improving the transparency of audit reports and the trust of users.
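As a minimal illustration of the kind of per-feature attribution such explainability tools provide, the sketch below uses a crude occlusion-style measure: each feature of one flagged transaction is replaced in turn by its training mean, and the drop in the predicted anomaly probability is taken as that feature's contribution. This is a simplification standing in for true SHAP values (which the `shap` package's `TreeExplainer` would compute); the data and feature names are synthetic, not the authors' case.

```python
# Illustrative sketch (synthetic data; occlusion attribution as a
# stand-in for SHAP): which feature drove one transaction's anomaly flag?
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["income", "expenditure", "ar_turnover", "debt_ratio"]

# Synthetic training data: transactions with a high debt ratio are abnormal.
X = rng.normal(size=(500, 4))
y = (X[:, 3] > 0.8).astype(int)
model = RandomForestClassifier(n_estimators=100, max_depth=10,
                               random_state=0).fit(X, y)

x = np.array([0.1, -0.2, 0.0, 2.0])          # flagged transaction
base = model.predict_proba(x.reshape(1, -1))[0, 1]

contributions = {}
for i, name in enumerate(features):
    x_masked = x.copy()
    x_masked[i] = X[:, i].mean()              # "remove" the feature
    masked = model.predict_proba(x_masked.reshape(1, -1))[0, 1]
    contributions[name] = base - masked       # drop in anomaly probability

top = max(contributions, key=contributions.get)
print(top, round(contributions[top], 3))      # debt_ratio dominates
```

Plotted as a bar chart over feature names, such contributions give auditors the at-a-glance view of decisive features that the paper describes for SHAP.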
Real-time processing capabilities should be enhanced: it is suggested to optimize the existing real-time data processing architecture and to introduce more efficient stream processing technologies and hardware acceleration schemes, such as GPU acceleration and distributed computing frameworks, to cope with large-scale, high-frequency data streams and ensure that the system can respond to and process data in real time. Improving the generalization ability of the model is another direction: ensemble learning and transfer learning can enhance the adaptability and robustness of the model across different enterprises and industries, with multi-model fusion improving generalization performance and transfer learning carrying the model into different financial environments. Finally, the user feedback mechanism should be optimized: it is suggested to establish a dynamic feedback and continuous learning system that collects and analyzes user feedback and adjusts the audit strategy and model parameters in time. Through this closed-loop feedback mechanism, system performance can be continuously improved, ensuring effective audit strategies and accurate models. These suggestions aim to further improve the AI-based financial audit automation method, raise the intelligence and practicability of the system, and provide stronger technical support for corporate financial management, thereby promoting the intelligent transformation of the financial audit industry.
3.2.3 SHAP effect
In an actual case, we selected the financial data of a certain company over a period of time, including features such as income, expenditure, accounts receivable turnover rate, and debt-to-asset ratio. After model training and prediction, a certain transaction was judged to be abnormal, and the SHAP values clearly explain the basis for this judgment. By calculating the SHAP value of each feature, we found that the SHAP value of the debt-to-asset ratio was high and positive, which shows that the debt-to-asset ratio played a key positive role in the model's judgment of the transaction as abnormal; that is, a debt-to-asset ratio outside the normal range greatly increases the probability that the transaction is judged abnormal. Visualizing the SHAP values (for example with a bar chart whose horizontal axis is the feature name and whose vertical axis is the SHAP value) shows even more intuitively how much each feature influences the prediction. Auditors can see at a glance which features had the greatest impact on the model's decision and can then analyze in depth why the transaction was judged abnormal, greatly improving the transparency and credibility of the audit: audit decisions are no longer "black box" operations but are based on clear, explainable evidence.
3.3 Discussion
From a quantitative perspective, in terms of audit accuracy, the accuracy reported in cutting-edge research mostly lies in the range of 80% to 86%, while this study, using the random forest algorithm combined with real-time data processing technology, achieves an audit accuracy of 90%. In terms of audit efficiency, most cutting-edge research reaches only a medium or low level, whereas this study achieves a significant 30% efficiency improvement. This is mainly attributable to the random forest algorithm, which constructs multiple decision trees and randomly selects features for node splitting, effectively reducing the risk of overfitting and improving the model's ability to identify complex financial data; at the same time, the use of real-time data processing platforms such as Kafka and Flink realizes the real-time collection, processing and analysis of financial data, greatly accelerating the audit process.
Qualitatively, the support vector machines, gradient boosting and other methods used in cutting-edge research show limitations when facing the high dimensionality, complexity and dynamic changes of financial data. Support vector machines have high computational complexity, are sensitive to the choice of kernel function, and struggle to adapt to the diversity of financial data; gradient boosting is sensitive to outliers, takes a long time to train, and cannot meet real-time requirements. In contrast, our method not only improves the accuracy and robustness of the model but also ensures the dynamism and timeliness of the audit process, and can better adapt to the actual needs of corporate financial audits.
However, this method still has certain limitations. Regarding data quality, despite strict data cleaning and preprocessing, data noise and missing values still affect the accuracy and stability of the model. Model interpretability and transparency also need attention: the internal decision-making process of the random forest algorithm is complex and difficult for non-technical personnel to understand and explain, which to some extent reduces users' trust in the audit conclusions. In addition, when facing large-scale, high-frequency data traffic, the system's real-time processing capabilities, although improved, still leave room for further improvement, and the generalization ability of the model when dealing with new financial data and fraud methods also needs to be enhanced. Moreover, computing SHAP values is costly, making it difficult to meet the real-time requirements of audits. SHAP is also calculated on top of a decision tree model, which is susceptible to data distribution and noise: even a small change in the data may cause a significant change in the tree structure, leading to unstable SHAP values that cannot accurately reflect the true contribution of features to model decisions, affecting the reliability of the audit results.
4 Conclusion
In this study, an AI-based financial audit automation method is discussed in depth and implemented to improve audit efficiency, accuracy and risk identification ability. Through the introduction of the random forest algorithm, combined with key steps such as data integration, real-time processing, model training and optimization, the system has shown improvement in many respects.
Acknowledgement
This study was funded by the Science Research Project of Hebei Education Department (BJS2023041).
By constructing and optimizing the random forest model, the audit coverage and error detection rate are improved, and References the risk identification and processing are realized efficiently. The data cleaning and reprocessing steps [1] Sun JH, Li LC, Qi BL. Financial statement ensure the quality and consistency of the input data, comparability and audit pricing. Accounting Finance. providing a reliable basis for the model. The introduction 2022; 62(5): 4631-4661. of real-time processing technologies such as Kafka and https://doi.org/10.1111/acfi.12970. Slink has accelerated data processing and met the [2] Condie ER, Obermire KM, Seidel TA, Wilkins MS. processing needs of high frequency data streams. The Prior Audit Experience and CFO Financial Reporting audit process design optimizes the allocation and Aggressiveness. Audit J Pract Theory. 2021; 40(4): utilization of audit resources and improves the overall 99-121. https://doi.org/10.2308/AJPT-2020-012. audit efficiency through systematic steps. [3] Koh K, Tong YH, Zhu ZN. The effects of financial The maintainability and transparency of the model statement disaggregation on audit pricing. Int J Audit. was also emphasized in the study, and by introducing 2022; 26(2): 94-112. explainable AI technology, users' trust in audit results was https://doi.org/10.1111/ijau.12253. increased. At the same time, through the establishment of [4] Lyshchenko O, Ocheret'ko L, Lukanovska I, dynamic feedback mechanism, the system can constantly Sobolieva-Tereshchenko O, Nazarenko I. The role of collect and analyze user feedback, timely adjust the audit financial audit in ensuring the reliability of financial strategy and model parameters, and achieve continuous statements. Ad Alta J Interdiscip Res. 2024; 14(1). optimization. Despite the achievements, the study also https://doi.org/10.33543/140139141145 points out some challenges, such as data quality issues, [5] Suryani E, Winarningsih S, Avianti I, Sofia P, Dewi N. 
model generalization capabilities, and real-time Does Audit Firm Size and Audit Tenure Influence processing capabilities, which need to be further Fraudulent Financial Statements? Australas Account addressed in future research and applications. This study Bus Finance J. 2023; 17(2): 26-37. shows the great potential of artificial intelligence in [6] Xu Q, Fernando G, Tam K, Zhang W. Financial report financial audit automation, and also puts forward specific readability and audit fees: a simultaneous equation suggestions for improvement, which provides a valuable approach. Managerial Aud J. 2020; 35(3): 345-372. reference for future financial audit practice. Through https://doi.org/10.1108/MAJ-02-2019-2177. continuous optimization and improvement of technology [7] Erdmann A, Yazdani M, Mas Iglesias JM, Marin and methods, financial audit will achieve a higher level of Palacios C. Pricing Powered by Artificial intelligence and automation, and provide more accurate Intelligence: An Assessment Model for the and efficient support for the financial management and Sustainable Implementation of AI Supported Price risk control of enterprises. Intelligent audit methods Functions. Informatica. 2024;35(3):529-56. improve the quality and efficiency of audit work, but also https://doi.org/10.15388/24-infor559 enhance the transparency and standardization of financial [8] Ijadi Maghsoodi A, Hafezalkotob A, Azizi Ari I, Ijadi management, and promote the innovation and Maghsoodi S, Hafezalkotob A. Selection of Waste development of the financial audit industry. Lubricant Oil Regenerative Technology Using Although explainable artificial intelligence Entropy-Weighted Risk-Based Fuzzy Axiomatic technology (such as SHAP) has achieved remarkable Design Approach. Informatica. 2018;29(1):41-74. results in improving the transparency of financial audits, https://doi.org/10.15388/Informatica.2018.157 its limitations cannot be ignored. In an extremely high- [9] Pragarauskaitė J, Dzemyda G. 
Markov Models in the frequency data environment, calculating the SHAP value analysis of frequent patterns in financial data. requires complex operations on massive data, resulting in Informatica, 2013, 24(1): 87-102. a sharp increase in computing resource consumption, http://dx.doi.org/10.15388/Informatica.2013.386 processing efficiency cannot keep up with the speed of [10] Lim CY, Lobo GJ, Rao PG, Yue H. Financial data updates, and scalability is limited, making it difficult capacity and the demand for audit quality. Accounting 20 Informatica 49 (2025) 1–20 J. Li et al. Bus Res. 2022; 52(1): 1-37. https://doi.org/10.1080/00014788.2020.1824116. [11] Ismail R, Mohd-Saleh N, Yaakob R. Audit committee effectiveness, internal audit function and financial reporting lag: Evidence from Malaysia. Asian Acad Manage J Account Finance. 2022; 18(2): 169-193. https://doi.org/10.21315/aamjaf2022.18.2.8. [12] Oussii AA, Boulila N. Evidence on the relation between audit committee financial expertise and internal audit function effectiveness. J Econ Adm Sci. 2021; 37(4): 659-676. https://doi.org/10.1108/JEAS- 04-2020-0041. [13] Lyubenko A, Znak N, Karpachova O. Audit features of the first IFRS financial statements. Financial Credit Activity Probl Theory Pract. 2022; 1(42) :185-194. [14] Endrawes M, Feng ZA, Lu MT, Shan YW. Audit committee characteristics and financial statement comparability. Accounting Finance. 2020; 60(3): 2361-2395. https://doi.org/10.1111/acfi.12354. [15] Lutfi A, Alkilani SZ, Saad M, Alshirah MH, Alshirah AF, Alrawad M, et al. The influence of audit committee chair characteristics on financial reporting quality. J Risk Financial Manage. 2022; 15(12): 563. https://doi.org/10.3390/jrfm15120563. [16] Calvin CG, Holt M. The impact of domain-specific internal audit education on financial reporting quality and external audit efficiency. Accounting Horizons. 2023; 37(2): 47-65. https://doi.org/10.2308/HORIZONS-2020-105. [17] Alcaide-Ruiz MD, Bravo-Urquiza F. 
Does audit committee financial expertise actually improve information readability? Rev Contab Span Account Rev. 2022; 25(2): 257-270. https://doi.org/10.6018/rcsar.420261
[18] Driskill MW, Knechel WR, Thomas E. Financial auditing as an economic service. Curr Issues Aud. 2022; 16(2). https://doi.org/10.2308/CIIA-2021-021

https://doi.org/10.31449/inf.v49i16.7705 Informatica 49 (2025) 21–36 21

Graph Neural Network-Based User Preference Model for Social Network Access Control

Yuan Zhang 1,2*
1 Xuchang Vocational Technical College, Xuchang 461000, China
2 Henan Province Data Intelligence and Security Application Engineering Technology Research Center, Xuchang 461000, China
E-mail: hnxc_z@126.com
* Corresponding author

Keywords: social networks, user preferences, graph neural network, multi-layer attention, access control

Received: November 11, 2024

The popularity and deepening of social networks have increased the risk of personal information leakage for users. To enhance the security of social networks, this study constructed an access control model based on the preferences of social network users. The model utilizes graph neural networks to generate access control strategies based on user preferences, and introduces a multi-layer attention mechanism to optimize the graph neural network. To better capture user preference information, the study sets the learning rate to 0.0001. The experimental results demonstrated that on the Twitter dataset, the accuracy of the proposed model reached 95.7% and the F1 score reached 96.2%, both significantly higher than those of other models. These results indicated that the model could more accurately classify access control requests in social networks and reduce false positives. The area under the receiver operating characteristic curve of the proposed model was 0.982, higher than that of other models. The decision time was 13.77 seconds, significantly lower than that of other models.
This indicated that the model could more effectively distinguish different types of user access requests and provide more reliable guarantees for secure access to social networks. The user-preference-based social network access control model built on graph neural networks has superior performance, effectively ensuring the information security of social network users and laying the foundation for further development of access control technology.

Povzetek: Predstavljen je nov model za nadzor dostopa v družbenih omrežjih, ki temelji na grafskih nevronskih mrežah in uporabniških preferencah. Z uporabo večslojnega pozornostnega mehanizma model omogoča zanesljivo in varno upravljanje dostopa.

1 Introduction

In the era of rapid digital development, social networks play an important role in today's society. Through social platforms, people can not only obtain information and exchange ideas but also engage in commercial activities, greatly changing their communication methods and lifestyle habits [1, 2]. However, the popularity of social networks has made the issue of user privacy protection increasingly prominent. Using social networks means that users must expose their personal information to a certain extent. Criminals can steal user information through cyber attacks and use it for illegal activities, thereby posing potential risks to users [3]. Meanwhile, there is a large amount of false information and rumors on social networks, whose rapid dissemination may lead to public misunderstanding of certain events or issues, resulting in adverse social impacts. Access control is a critical component of information security, used to manage user access permissions to systems, networks, or applications. It can help organizations protect important data and resources from unauthorized access and malicious activities, prevent data leakage, tampering, and destruction, protect sensitive information from being disclosed to unauthorized personnel, and ensure the reliability of network systems [4]. Therefore, the importance of implementing effective access control for social networks is self-evident.

Nowadays, attribute-based, policy-based, and relation-based Access Control Models (ACMs) are widely used in various scenarios [5]. However, traditional models still have drawbacks such as complex permission management and difficulty in adapting to dynamic network environments. Specifically, traditional models often require manual intervention in the process of assigning, revoking, and updating permissions, resulting in increased management costs and error rates. User behavior and social relationships in social networks are constantly changing, and traditional models struggle to adapt, resulting in insufficient flexibility of access control policies and an inability to respond effectively to new security threats. In this context, this study constructs an ACM based on the preferences of social users, uses a Graph Neural Network (GNN) for access control, and introduces Multi-Layer Attention (MLA) to optimize the GNN. Finally, the UP-GNN-SNAC model, a GNN-based social network ACM catering to user preferences, is designed. The innovation of the research lies in constructing an ACM based on user preferences. Compared with existing GNN-based ACMs, this model better balances privacy protection and user experience by capturing user preferences, providing a more efficient and accurate solution for secure access to social networks.

2 Related works

The progress of the Internet has made interacting with others through social networks a part of people's daily life. However, due to system vulnerabilities in online platforms, many criminals exploit these vulnerabilities to launch attacks, resulting in the leakage of user information and even its malicious exploitation. Access control, as a key technology for maintaining social network security, is currently a hot research topic among relevant professionals. You M et al. designed a knowledge graph-based access control decision-making method to improve access control performance under different degrees of imbalance. It extracted topological features to represent high-cardinality categorical user and resource attributes, revealing the interrelationships between different objects. This method could significantly improve access control performance [6]. Gai K et al. designed a zero-trust cross-organizational data sharing ACM based on blockchain to enhance security in network data sharing. It utilized blockchain alliances to establish a trusted environment and deployed role-based access control through multi-signature protocols and smart contract methods, which had high practicality [7]. Wu H et al. designed a cloud network secure storage data ACM based on association rules to improve the security of social network data access control. It utilized association rule feature extraction methods for data mining and attack detection in network security storage areas and achieved data access control in those areas through adaptive partition-weighted interface scheduling. This method was superior to traditional methods [8]. Azbeg K et al. designed an ACM based on improved blockchain technology to enhance the security and privacy of network systems. It stored data in the InterPlanetary File System and utilized an authorization-proof-based Ethereum access blockchain to accelerate data storage. This method could significantly improve network security [9]. Zhang L et al. designed a lightweight decentralized multi-authorization ACM based on ciphertext-policy attribute-based encryption and blockchain to enhance the security of in-vehicle social networks. Distributed multi-authorization nodes supported vehicle users by performing lightweight computing with the help of vehicle cloud service providers. This model had significant advantages compared to existing solutions [10].

Zhao Y et al. designed a policy-protected, cleanable ACM to improve the efficiency of data encryption in vehicle social networks. It could test and clean encrypted data, and divide access policies into attribute names and attribute values, thereby hiding information in the ciphertext and achieving good encryption performance [11]. Squicciarini A et al. designed a discrete ACM based on individual decision-making to address privacy and security issues arising from data sharing in social networks. It took individual preferences in social networks into account and selected discrete privacy values from a fixed set of options. This model had a good privacy protection effect in data sharing [12]. Dixit M S et al. designed a deep learning-based real-time user ACM for social networks to address user login restrictions. It used CNN and LSTM to predict the age of users and adopted a multi-task CNN for face detection and feature extraction, thus achieving significant control over user login [13]. Wen W et al. designed an autonomous privacy control and identity verification sharing scheme built on quick response codes in social networks to solve the problem of users being unable to independently control privacy sharing. It used quick response codes with high-quality images for error correction, combining the advantages of polynomial-based and visual-based secret image sharing. This scheme had low computational complexity and good scalability [14]. Safi S M et al. designed an improved end-to-end mobile social network security ACM to protect the personal privacy of social network users. It encrypted user-shared data through ciphertext-policy attribute encryption, utilizing advanced encryption standards to prevent unauthorized user access. This scheme had high security and practicality [15]. The summary of related work is shown in Table 1.

Table 1: Summary of related work.
[6] Model: access control decision method based on knowledge graph. Key features: extracting topological features to represent user and resource attributes. Dataset: synthesized social network data. Results: improved access control performance. Limitations: insufficient consideration of the balance between privacy protection and user experience.
[7] Model: blockchain zero-trust cross-organizational data sharing access control. Key features: establishing a trusted environment through a blockchain alliance. Dataset: cross-organizational transaction data. Results: high practicality. Limitations: slow response speed.
[8] Model: association-rule cloud network security storage access control. Key features: using association rules for attack detection. Dataset: cloud storage logs. Results: superior to traditional methods. Limitations: poor adaptability to new types of attack modes and difficulty in handling dynamically changing environments.
[9] Model: improved blockchain technology access control. Key features: InterPlanetary File System and authorization-proof Ethereum. Dataset: file transfer records. Results: significant improvement in network security. Limitations: requires a large amount of storage space.
[10] Model: decentralized multi-authorization model for vehicle-mounted social networks. Key features: ciphertext-policy attribute-based encryption and blockchain. Dataset: vehicle communication records. Results: significant advantages compared to existing solutions. Limitations: complex key management increases deployment difficulty; slow response speed.
[11] Model: policy-protected cleanable access control. Key features: testing and cleaning encrypted data. Dataset: encrypted dataset. Results: good encryption effect. Limitations: the cleaning process may result in information loss.
[12] Model: individual-decision discrete ACM. Key features: personal preference privacy protection in social networks. Dataset: user behavior data. Results: good privacy protection effect. Limitations: lack of effective modeling of group behavior and insufficient consideration of personalized preferences.
[13] Model: real-time user access control with deep learning. Key features: convolutional neural network predicts age. Dataset: social network user data. Results: superior to traditional methods. Limitations: deep learning models require a large amount of training data, which poses a risk of privacy leakage.
[14] Model: quick response code autonomous privacy control. Key features: image correction combined with secret image sharing. Dataset: user-uploaded images. Results: low computational complexity and good scalability. Limitations: high image quality requirements and sensitivity to image noise.
[15] Model: mobile social network security access control. Key features: ciphertext-policy attribute encryption. Dataset: mobile device logs. Results: high security and practicality. Limitations: key distribution and management of ciphertext-policy attribute encryption are relatively complex.

In summary, many scholars have achieved significant results in social network access control. However, these methods still have slow response times and fail to consider the balance between privacy protection and user experience. Therefore, this study constructs an ACM based on user preferences and simulates it using an improved GNN with an MLA mechanism, designing the UP-GNN-SNAC model to improve access control effectiveness.

3 GNN-based ACM based on user preferences

This section elaborates the construction process of the UP-GNN-SNAC model. The first subsection presents the design of the ACM based on user preferences; the second presents the implementation of an access control algorithm based on an improved GNN.

3.1 ACM construction based on user preferences

User preference refers to users' inclinations toward certain things, formed by the combined influence of personal factors and the social environment. Personal factors include internal characteristics such as age, gender, occupation, interests, values, and behavioral habits. Social factors include external environmental factors such as social circles, interaction objects, social frequency, cultural values, and social interactions. In social networks, users express their preferences through posting and other activity operations, which generate a large amount of data. By analyzing these data, user behavior patterns and characteristics can be understood, and appropriate access permissions can be generated for users to meet their privacy needs in different scenarios, thereby protecting user privacy [16]. Therefore, this study constructs a model based on the preferences of social network users, as shown in Figure 1.

Figure 1: Specific architecture of the access control model (user, data protection, access control, preference analysis, historical data, and algorithm modules).

In Figure 1, the ACM consists of six modules: user, data protection, access control, preference analysis, historical data, and algorithm. When users need to post or obtain information from social networks, the request first goes through the data protection module, which can encrypt and back up the information posted by users. The data protection module then sends the user's request to the access control module, which in turn sends requests to the preference analysis module, historical data module, and algorithm module. The historical data module extracts and preprocesses user interaction behavior data, basic attributes, and social relationship data; after cleaning, deduplication, and standardization, these data provide input for the preference analysis module and the algorithm module. When the request sent by the data protection module reaches the preference analysis module, it analyzes the user's historical social data, obtains the user's preferences, and returns Personal Preferences (PPs). When the data reach the algorithm module, it trains on the obtained data and returns the best result to the access control module. Different users have different preferences: when users upload information, different preference information corresponds to different access control policies [17]. It is therefore necessary to determine the privacy level of uploaded information based on user preferences, that is, to establish a quantitative model of user preferences to measure social information entropy. Figure 2 shows a social information sensitivity measurement model based on user preferences.

Figure 2: Social information sensitivity measurement model based on user preferences (information entropy, information entropy weight, and conditional information entropy computed from user posts, historical data, and social friends).

In Figure 2, after a user posts information, the sensitivity of the information is calculated based on information entropy, and the user's social information sharing degree is obtained from their historical visits and social friends. Information entropy weight and conditional information entropy are also used to obtain the information entropy measurement model. Information entropy is a basic concept in information theory used to measure the uncertainty of a random variable: it reflects how difficult it is to predict the outcome of an event, or how much information is needed to describe the event. The greater the information entropy, the greater the uncertainty of the event outcome, and vice versa. Information entropy weight is a weight allocation method based on information entropy, used to measure the importance of different features or data dimensions. Conditional information entropy measures the uncertainty of a random event given certain conditions. Therefore, information entropy can be used to describe the amount of privacy contained in social data, determine the degree of privacy of the social data, and construct an information sensitivity measurement model for social data. The calculation method for the amount of social data of users is shown in equation (1).

H(x) = -\sum_i p(x_i) \log_2 p(x_i)    (1)

In equation (1), H(x) is the average amount of private information uploaded by all users in the social network, and p(x_i) is the proportion of the privacy level of information i in the total privacy information. As the social breadth of a user increases with the number of social friends, the relationship between the social breadth of a user and the number of social friends is given in equation (2).

w(F) = \frac{2}{\pi} \arctan F    (2)

In equation (2), w(F) represents the social breadth of the user and F is the number of social friends of the user. The confidentiality of the information posted by the user can be calculated based on whether the user's uploaded information is blocked from their friends, as shown in equation (3).

h_i = \frac{F_a}{F}    (3)

In equation (3), h_i is the confidentiality level of information i and F_a is the number of friends blocked by the user. The confidentiality level is thus the proportion of friends blocked from viewing the social data: as the number of friends permitted to view the data increases, its confidentiality decreases. The degree of social information sharing describes the impact of the number of social friends and the number of blocked friends on social data sharing; the degree to which friends are permitted to access information is directly correlated with the extent of social information sharing. The calculation method is shown in equation (4).

s_i(F, F_a) = h_i \, w(F) = \frac{2 F_a \arctan F}{\pi F}    (4)

In equation (4), s_i(F, F_a) is the user's social information sharing degree. Information entropy can then be measured based on the degree of social information sharing among users, as shown in equation (5).

H_s(x) = -\sum_i s_i \, p(x_i) \log_2 p(x_i)    (5)

In equation (5), H_s(x) is the information entropy weighted by the sharing degree. According to the information entropy analysis of the user preference mechanism, in the algorithm module the user's social data are divided into a training set and a testing set, which are trained separately to obtain the final access control policy. Figure 3 displays the process of obtaining the strategy.

Figure 3: Access control policy acquisition process (user history extraction, feature extraction, user preferences, training and test decision models, final decision).

In Figure 3, after dividing the social data into two datasets, the user history records of each dataset are first obtained, and user feature extraction is performed to calculate user preferences. The feature vectors are then combined to obtain a training model and a testing model, each of which makes decisions separately. Finally, a comprehensive decision is made for the access request, yielding the final decision and thus the access control policy.

3.2 Access control method based on improved GNN

In the ACM, the algorithm module is the core of the entire model; it is the dynamic processing unit for user preference data, historical behavior data, and social relationship data. The algorithm module not only determines whether the model can accurately analyze user preferences but also directly determines whether the model can make effective decisions [18]. GNN is a graph-based deep learning method that can enrich node representations by utilizing the relationships between nodes. Specifically, a GNN can update node representations by defining the connection relationships between nodes on the graph, using neighbor information to achieve information transfer and learning over the entire graph. A GNN mainly includes three core functions: node representation, graph structure representation, and message passing. Node representation maps each node to a low-dimensional vector space for subsequent calculations; the graph structure is likewise represented in a low-dimensional vector space. Message passing is the process by which a node updates its own representation by exchanging information with its neighboring nodes, enabling the transmission of information over the graph [19]. The attention mechanism is an important technique in deep learning that allows models to selectively focus on different parts of the input, assigning different weights to each part to highlight the information most critical to the task. Because weights are assigned dynamically, the model can focus on key parts of the input in a targeted manner and thus process and learn the information in the data more efficiently [20]. GNNs perform well on graph-structured data such as social networks and chemical molecular structures. This study therefore constructs a GNN model based on user preferences; to improve its performance, the MLA mechanism is introduced, and an access control method based on the improved GNN is designed to capture the complex patterns of user social relationships and personal behavior and to optimize access permission allocation. By integrating MLA mechanisms into the GNN, the model's understanding of the relationships between different nodes is enhanced. Each attention layer enables the model to focus on different node characteristics, so that it can more finely distinguish the importance of users and their associated objects, enhance its learning ability, improve its resolution of user preferences, and more accurately capture the user's true intentions, thereby enhancing the effectiveness of access control. The model structure is shown in Figure 4.

Figure 4: Access control method based on the improved GNN (node embedding, preference dissemination and fusion, user and social data classification).

In Figure 4, nodes are constructed from users, their social friends, and the social data posted by users. User nodes include basic attributes, social relationship characteristics, behavior characteristics, and privacy setting characteristics; social data nodes contain features of published content, interactive behavior, and social relationships. These features are transformed into low-dimensional vectors through numerical processing and embedding learning and fed into the GNN as input vectors. Given the input data, the user's Social Preferences (SPs) and PPs are obtained and fused; the fused data are then trained and the nodes classified. The user attributes selected to represent the user are embedded to obtain the user node embedding matrix, calculated as in equation (6).

u_a = f(W_1 [P_a, E_a])    (6)

In equation (6), u_a is the user node, P_a is the node embedding matrix, E_a is the node free embedding matrix, and W_1 is the node embedding weight. Natural language processing tools are used to process and extract each piece of social data; after embedding, the embedding matrix of the social data nodes posted by the user is obtained, calculated as in equation (7).

d_i = f(W_2 [Q_i, V_i])    (7)

In equation (7), d_i is a social data node, Q_i is the social data embedding matrix, V_i is the free embedding matrix of data nodes, and W_2 is the embedding weight of social data. The embedded user nodes and social data … respectively. T represents the transpose, and d_k represents the dimension of the key vector, used to scale the dot-product results and prevent gradient vanishing. The method for updating user SP nodes is shown in equation (9).

p_a^{n+1} = u_a^n + \sum_{b \in S} \alpha_a^n u_b^n, \quad \alpha_a^n = \mathrm{softmax}(\mathrm{Relu}(u_a^n)^T W_3 \, \mathrm{Relu}(u_b^n))    (9)

In equation (9), p_a^{n+1} is the updated temporary node of the user's SPs, and \alpha_a^n is the attention score, that is, the aggregated weight ratio of each neighboring node during node updating. u_a^n and u_b^n are the n-th embeddings of nodes a and b, where b ranges over S, the set of all explicit and implicit neighbor nodes of the node in the graph. softmax() and Relu() are activation functions, and W_3 is the attention weight. The update of user PP nodes is shown in equation (10).

q_a^{n+1} = u_a^n + \sum_{i \in C_a} \beta_i^n d_i^n, \quad \beta_i^n = \mathrm{softmax}(\mathrm{MLP}[u_a^n, d_i^n])    (10)

In equation (10), q_a^{n+1} is the updated temporary PP node, \beta_i^n is the weight ratio of adjacent nodes when a user node updates, and C_a is all user-related data in the graph. MLP is a multi-layer perceptron, a simple neural network used to perform nonlinear transformations on the feature vectors of nodes.
nodes are input into the fusion layer and updated It typically consists of multiple fully connected layers, simultaneously through the MLA mechanism. The MLA each of which can be followed by a nonlinear activation mechanism calculation method is shown in equation (8). function. The embedding of user nodes and social data QKT A(Q, K ,V ) = softmax( )V (8) nodes have different meanings in each dimension. If dk attention scores are calculated using functions such as dot product or mean pooling, it will result in inaccurate A(Q,K,V) In equation (8), represents attention. attention scores. Therefore, attention neural networks are used to calculate the attention scores of each neighboring Q , K , and V represent query, key, and value, node, and the results obtained by each neural network are finally normalized. The updated user's social preference 28 Informatica 49 (2025) 21–36 Y. Zhang temporary node and personal preference temporary node un+1 are weighted and fused to obtain the updated user node. In equation (11), a is the updated user node. The calculation method is shown in equation (11). n+1 n+1 and  are the weights of SP temporary nodes un+1  n+1 pn+1  n+1qn+1  a = a + a  (11) and PP temporary nodes in the updated user nodes.  n+1 n+1  + =1 Figure 5 shows the user preference fusion process. Embedded layer Fusion layer Access control output layer GNN N-order fusion User node Propagation Word Fusion law embedding coefficient Social data Social Personal End user preference preference node Figure 5: User preference fusion process. In Figure 5, user preference fusion consists of three C parts: embedding layer, preference propagation fusion n W In equation (12), ua is the fusion coefficient. 4 layer, and access control output layer. In the embedding layer, word embeddings are performed on user nodes and is the fusion weight of user nodes. e is a natural social data as inputs to the model. 
In the preference sigmod() propagation fusion layer, two GNNs simulate the constant. d is the dimension. is the propagation and change patterns of user social preferences among users and the propagation and change tanh() activation function. is a hyperbolic tangent patterns of user personal preferences in social data. After N rounds of propagation and fusion, user nodes are function. Finally, the loss function of the model is defined updated using attention mechanisms based on explicit to measure the difference between the predicted results of neighbor nodes, implicit neighbor nodes, and social data nodes. In the access control output layer, in the graph, N the model and the true labels, as expressed in equation user nodes obtained through N preference propagation (13). fusion are used to calculate the fusion coefficient through 1 a linear neural network. Based on the fusion coefficient, L = − [ya  log( pra ) + (1− ya )  log(1− pra )](13) the N user nodes are finally fused to obtain the final user M a nodes with user preferences. The fusion coefficient can In equation (13), L is the loss and M is the quantify the importance or weight of user social preference temporary nodes and user personal preference y amount of user nodes. a means the true label of the temporary nodes. Therefore, the calculation of the fusion coefficient is completed by updating the user nodes and normalizing through a nonlinear transformation and a softmax function. At the same time, the user embedding a -th user node, with a value of 0 or 1. When access is vectors after each propagation are multiplied by their corresponding fusion coefficients, and these weighted allowed, it is 1, and when access is prohibited, it is 0. embedding vectors are added to obtain the final user node vector. The calculation method is shown in equation (12). pra is the probability of being judged as allowed access. 
C During the training phase, the model will continuously n = Softmax(tanh(W4u n + e  dT )  u a ) a  N (12) adjust parameters based on the difference between the u = sigmod(C un  a un a ) n=1 a true labels and the predicted probabilities to minimize the Graph Neural Network-Based User Preference Model for Social… Informatica 49 (2025) 21–36 29 loss function and optimize the probability estimates allowed access, thus completing access control. The allowed for access. According to the loss function, all implementation process of the designed ACM is shown in user nodes are classified based on whether they are Figure 6. Update the user Start Merge End preference node Input Update user social End user node Exportation parameter preference node Iterative Calculated fusion Initialize Access control propagation coefficient Enter the Traverse the Loss function Determine the user GNN model social matrix classification prediction node type Figure 6: Implementation process of the proposed access control algorithm. In Figure 6, the original data such as user 4 Analysis of ACM results on social information, social relationships, and content posted by users are first collected from social networks. Redundant networks information in the data is eliminated by removing This chapter mainly elaborates on the experimental duplicate items, and missing values are filled in to ensure results of the UP-GNN-SNAC model. The first section is the integrity of the data. The data are then uniformly converted through format conversion to complete the a performance analysis of ACMs based on improved preprocessing of the collected data. The random seed is GNN. The second section is an analysis of the practical set to 42 using the random module and numpy library in application effect of the ACM based on improved GNN. Python, thereby ensuring that the generated random number sequence is the same every time the code is run. 
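The MLA fusion in equation (8) is the standard scaled dot-product attention operation. The sketch below is an illustration only, not the authors' code; all shapes and the random inputs are invented for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Equation (8): A(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries; K: (n_kv, d_k) keys; V: (n_kv, d_v) values.
    Dividing by sqrt(d_k) scales the dot products so the softmax does not
    saturate (the "gradient vanishing" the paper mentions).
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_kv)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: one user-node query attending over three neighbour embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 8))
out, w = scaled_dot_product_attention(Q, K, V)
assert out.shape == (1, 8)
assert np.allclose(w.sum(axis=-1), 1.0)  # attention weights form a distribution
```

In the model described above, the same operation is applied with user-node embeddings as queries and neighbour/social-data embeddings as keys and values; the normalized weights play the role of the α and β scores in equations (9) and (10).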
4.1 Performance testing of social network ACM

To verify the performance of the proposed UP-GNN-SNAC model, this study conducts simulation experiments using Python 3.7 on a Windows 11 64-bit operating system equipped with an Intel Core i7-14700KF processor, 16 GB of RAM, and a 256 GB hard drive. The preference propagation depth is 5, the learning rate is 0.0001, and the maximum number of iterations is 200. Accuracy is the most intuitive evaluation metric for classification models, representing the proportion of correctly classified samples in the total sample size; it measures how accurately the model classifies user access permissions. The F1 value is the harmonic mean of precision and recall, used to measure the performance of a model comprehensively; it balances the precision and recall of the model and avoids bias caused by imbalanced data. Firstly, the Twitter dataset is introduced to calculate the accuracy and F1 value of the research model, which are compared with those of the traditional GNN and of the blockchain-based IoT ACM in reference [20]. The results are shown in Figure 7.

Figure 7: Accuracy and F1 value of different models. (a) Accuracy; (b) F1.

In Figure 7 (a), as the iterations increase, the accuracy of all three models shows an upward trend. After 200 iterations, the accuracy of the traditional GNN is 88.3±1.97%, that of the model in reference [20] is 91.4±2.03%, and that of the research model is 95.7±2.11%. Compared with the traditional GNN and the model in reference [20], the accuracy of the proposed ACM is improved by 7.4% and 4.3%, respectively. In Figure 7 (b), as the iterations increase, the F1 values of the models gradually increase and flatten out. When the iteration count reaches its maximum, the F1 values of the GNN, the model in reference [20], and the research model are 83.6±1.16%, 89.8±1.09%, and 96.2±1.22%, respectively. Compared with the traditional GNN model and the model in reference [20], the F1 value of the proposed model is higher by 12.6% and 6.4%, respectively. The accuracy and F1 value of the research model are significantly higher, proving its high classification accuracy and good effectiveness. The loss function can be used to measure the difference between the model's predicted results and the true labels. The experiment then introduces the Yelp dataset and calculates the loss of the research algorithm on the Twitter and Yelp datasets. The results, compared with the other two algorithms, are shown in Figure 8.

Figure 8: Loss of different algorithms on different datasets. (a) Twitter dataset; (b) Yelp dataset.

In Figure 8 (a), on Twitter, as the iterations increase, the losses of the different models all show a decreasing trend. The loss values of the traditional GNN, the reference [20] model, and the research model are 0.23±0.03, 0.14±0.02, and 0.07±0.03, respectively. In Figure 8 (b), the loss curves of the different models on Yelp behave consistently with those on Twitter; the loss values of the three models are 0.12±0.02, 0.07±0.01, and 0.04±0.01. The loss value of the research model is far lower than the others, indicating good generalization ability; it performs well on different datasets, indicating good scalability. To further validate the performance of the proposed model, its precision, recall, and Area Under the Curve (AUC) are calculated on both the Twitter and Reddit datasets and compared with the traditional GNN, the Graph Convolutional Network, the ciphertext-policy attribute-based signature scheme in reference [19], and the model in reference [20]. The AUC metric comprehensively reflects the model's ability to distinguish between different categories: a higher AUC value indicates that the model can more accurately predict which users should be granted access permissions, thereby reducing the likelihood of erroneously denying legitimate access or erroneously approving illegal access. Meanwhile, analysis of variance (ANOVA) is used to evaluate the differences between models. ANOVA is a statistical method used to test whether there is a significant difference between the means of two or more groups and is a widely used tool in experimental design and data analysis. ANOVA compares the variability between groups with the variability within groups; if the inter-group variability is significantly greater than the intra-group variability, it can be concluded that there are significant differences between the groups. The significance level is set to 0.05: if P<0.05, the difference between groups is statistically significant; otherwise, it is not. The results are shown in Table 2.

Table 2: Precision, recall, and AUC values of different models (P<0.05 for each metric on each dataset).

Dataset   Model                        Precision  Recall  AUC
Twitter   GNN                          0.782      0.825   0.791
Twitter   Graph Convolutional Network  0.825      0.831   0.796
Twitter   Reference [19]               0.865      0.904   0.836
Twitter   Reference [20]               0.903      0.932   0.919
Twitter   Designed algorithm           0.966      0.943   0.982
Reddit    GNN                          0.768      0.813   0.778
Reddit    Graph Convolutional Network  0.821      0.846   0.815
Reddit    Reference [19]               0.871      0.913   0.857
Reddit    Reference [20]               0.896      0.935   0.911
Reddit    Designed algorithm           0.972      0.938   0.976

From Table 2, on the Twitter dataset, the precision, recall, and AUC values of the traditional GNN model are 0.782, 0.825, and 0.791, respectively. The three indicators of the Graph Convolutional Network are 0.825, 0.831, and 0.796; those of the model in reference [19] are 0.865, 0.904, and 0.836; and those of the model in reference [20] are 0.903, 0.932, and 0.919. The three indicators of the proposed model are 0.966, 0.943, and 0.982. On the Reddit dataset, the precisions of the five models are 0.768, 0.821, 0.871, 0.896, and 0.972, respectively, with recall rates of 0.813, 0.846, 0.913, 0.935, and 0.938, and AUC values of 0.778, 0.815, 0.857, 0.911, and 0.976. On both datasets, the precision, recall, and AUC values of the proposed model are significantly higher than those of the other models, and the differences between the five models on all three indicators are statistically significant (P<0.05), proving its good comprehensive performance and reliability. Finally, ablation experiments are conducted on the proposed model to calculate the accuracy, recall, F1 value, and running time of the different modules. The results are shown in Table 3.

Table 3: Results of the ablation experiment.

Module              Accuracy  Recall  F1     Running time (s)
Attention module    0.774     0.819   0.807  69.58
GNN module          0.862     0.842   0.853  43.12
Designed algorithm  0.973     0.956   0.961  46.89

From Table 3, the accuracy, recall, F1 value, and running time of the attention module alone are 0.774, 0.819, 0.807, and 69.58 s, respectively; those of the GNN module alone are 0.862, 0.842, 0.853, and 43.12 s. The four indicators of the designed model are 0.973, 0.956, 0.961, and 46.89 s. The accuracy, recall, and F1 score of the designed model are higher than those of the two sub-modules, and its running time is well below that of the attention module, though slightly above that of the GNN module. Despite the added computational complexity, the complete model clearly improves prediction accuracy, so the additional time cost is justifiable in practical application scenarios.
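The precision, recall, and F1 figures reported in Tables 2 and 3 follow the usual confusion-matrix definitions, with label 1 meaning "access allowed". A minimal sketch with hypothetical toy labels (not the paper's evaluation code):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and their harmonic mean (F1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy run: six access decisions with one false negative and one false positive.
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
# precision = 2/3, recall = 2/3, f1 = 2/3
```

Because F1 is the harmonic mean, it is dragged down by whichever of precision or recall is worse, which is why the paper uses it to guard against imbalanced-data bias.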
4.2 Analysis of the practical application effect of ACM in social networks

To verify the practical application effect of the ACM based on the improved GNN, this study first calculates the space overhead and computation time of the research model during encryption and decryption, compared with the results of the traditional GNN and of the model in reference [20]. Space overhead refers to the storage space occupied during the storage and operation of the model. The results are shown in Figure 9.

Figure 9: The computational cost and time of different models. (a) Space overhead (Kb); (b) computation time (s).

In Figure 9, as the volume of social data increases, the space overhead and computation time of the different models gradually increase. When the social data scale is 30, the space overhead of the traditional GNN, the model in reference [20], and the research model is 43.7±3.13 Kb, 30.4±3.05 Kb, and 16.2±2.88 Kb, respectively, with computation times of 32.1±2.79 s, 15.9±2.92 s, and 5.3±0.97 s. The space overhead and computation time of the research model are much lower than those of the other models, which proves its high computational efficiency and low computational complexity. This study validates the access control effectiveness of the research model from seven aspects: User Preference Quantification (UPQ), Historical Records (HR), Privacy Metrics (PM), Sensitivity, User Attributes (UA), Trust, and Personalization. If the effect matches, the output is 1; otherwise, the output is 0. Table 4 compares the model with the traditional GNN and the models in references [19] and [20]. Among these aspects, UPQ reflects whether user needs are met, HRs are used to evaluate the consistency of user behavior, PMs and sensitivity ensure data security and compliance, UAs provide the basic access control basis, trust evaluates the reliability of user behavior, and personalization improves the user experience. The results are shown in Table 4.

Table 4: Access control effectiveness of different models.

Index            GNN  Reference [19]  Reference [20]  Research algorithm
UPQ              0    0               0               1
HR               1    1               1               1
PM               0    1               1               1
Sensitivity      1    1               0               1
UA               1    1               1               1
Trust level      1    1               0               1
Personalization  1    0               1               1

In Table 4, only the research model is consistent in terms of UPQ, while in terms of HR and UA all four models are consistent. In terms of PM, the traditional GNN does not comply. The model in reference [20] does not match in terms of sensitivity and trust level; this may be because it does not dynamically evaluate user behavior, authentication, or contextual information, making it unable to measure trust levels accurately. In terms of personalization, only the model in reference [19] does not match. The research model is consistent in all seven aspects, proving that its access control effect is relatively ideal. Finally, the Receiver Operating Characteristic (ROC) curve is introduced. The horizontal axis of the ROC curve is the false positive rate, the proportion of all negative samples incorrectly predicted as positive; the vertical axis is the true positive rate, the proportion of all actual positive samples correctly predicted as positive. The model should identify requests that are actually positive samples as legitimate access and requests that are actually negative samples as illegal access. The ROC curves of the four models are calculated separately, and the results are shown in Figure 10.

Figure 10: ROC curves and correlation coefficients R of the four models.

From Figure 10, the ACM based on the traditional GNN is closest to the diagonal and has the smallest area under its curve; the area under the curve of the model in reference [19] is the next smallest, followed by that of the model in reference [20]. The ROC curve of the proposed model is closest to the upper left corner, with the largest area under it, indicating its strong classification ability and proving the high accuracy of the user preference-based ACM built on the improved GNN.
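The area under an ROC curve equals the probability that a randomly chosen positive (legitimate) request is scored above a randomly chosen negative (illegal) one, with ties counting one half. A small sketch of this rank interpretation, using invented scores rather than the paper's data:

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative).

    Uses the O(n_pos * n_neg) pairwise form, which is fine for a sketch;
    labels are 1 for legitimate access and 0 for illegal access.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores: one positive/negative pair is misordered (0.4 < 0.6).
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
# auc = 3/4
```

An AUC of 0.5 corresponds to the diagonal "standard line" in Figure 10 (random guessing), while a curve hugging the upper left corner pushes the value toward 1.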
5 Discussion

The research aims to improve the effectiveness of social network access control by utilizing the MLA mechanism to enhance the GNN's understanding of complex social relationships and personal behavior, and a GNN-based social network ACM built on user preferences is proposed. The results showed that the accuracy and F1 value of the proposed ACM improved by 7.4% and 12.6%, respectively, compared to the GNN model and the blockchain-based IoT ACM, demonstrating its high classification accuracy. This is similar to the conclusion drawn by You M et al. [6], while the proposed model is superior. This is because the proposed model optimizes the GNN through the MLA mechanism, which captures complex patterns of user preferences and social relationships more effectively and thereby significantly improves performance. The computation time for encryption and decryption of the proposed model was 5.3 seconds, much lower than that of the GNN model and the blockchain-based IoT ACM. This conclusion is consistent with the findings of Gai K et al. [7], but the running efficiency of the proposed model is higher than that of their method, because the proposed model significantly reduces computation time through the MLA mechanism and information entropy. In summary, the proposed model performs well in multiple aspects. Although it can more accurately identify legitimate and illegitimate access through user preferences and the privacy measurement mechanism, effectively improving network security, it also adds a certain computational overhead; in practical applications, tuning therefore needs to be carried out according to specific requirements.

6 Conclusion

ACM is crucial for the security of social networks, as it can help protect sensitive data and prevent malicious attacks and violations. To improve the accuracy and operational efficiency of social network ACM, a new type of ACM was designed based on the preferences of social network users. The user preferences were simulated using a GNN, and the MLA mechanism was introduced to improve the model. The experimental results showed that the accuracy and F1 value of the proposed model were 95.7% and 96.2%, respectively, significantly higher than those of the other models. This proved that, through the GNN and the MLA mechanism, the model could dynamically capture user preference features and improve classification accuracy. The space cost of the proposed model was 16.2 Kb and the computation time was 5.3 s, significantly lower than those of the other models. This proved that the model adopted a lightweight GNN architecture, reducing computational complexity and optimizing the algorithm design to reduce space cost. Although the proposed ACM has superior performance, there are still some shortcomings. The study did not test it on different types of social platforms; future research will further test the performance of the model on different social network platforms to improve its universality. At the same time, the performance of the model in dynamic environments will be explored, to cope with the constantly changing user behavior and data traffic in social networks.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Data availability statement

All data generated or analysed during this study are included in this article.

References

[1] Gai T, Cao M, Chiclana F, Zhang Z, Dong Y, Herrera-Viedma E, Wu J. (2023). Consensus-trust driven bidirectional feedback mechanism for improving consensus in social network large-group decision making. Group Decision and Negotiation, 32(1): 45-74. https://doi.org/10.1007/s10726-022-09798-7
[2] Kashmar N, Adda M, Ibrahim H. (2022). Access control metamodels: review, critical analysis, and research issues. Journal of Ubiquitous Systems and Pervasive Networks, 16(2): 93-102. https://doi.org/10.5383/JUSPN.03.01.000
[3] Wang W, Huang H, Yin Z, Gadekallu T R, Alazab M, Su C. (2023). Smart contract token-based privacy-preserving access control system for industrial Internet of Things. Digital Communications and Networks, 9(2): 337-346. https://doi.org/10.1016/j.dcan.2022.10.005
[4] Thabit S, Yan L S, Tao Y, Abdullah A B. (2022). Trust management and data protection for online social networks. IET Communications, 16(12): 1355-1368. https://doi.org/10.1049/cmu2.12401
[5] Ameer S, Benson J, Sandhu R. (2022). Hybrid approaches (ABAC and RBAC) toward secure access control in smart home IoT. IEEE Transactions on Dependable and Secure Computing, 20(5): 4032-4051. https://doi.org/10.1109/TDSC.2022.3216297
[6] You M, Yin J, Wang H, Cao J, Wang K, Miao Y, Bertino E. (2023). A knowledge graph empowered online learning framework for access control decision-making. World Wide Web, 26(2): 827-848. https://doi.org/10.1007/s11280-022-01076-5
[7] Gai K, She Y, Zhu L, Choo K K R, Wan Z. (2023). A blockchain-based access control scheme for zero trust cross-organizational data sharing. ACM Transactions on Internet Technology, 23(3): 1-25. https://doi.org/10.1145/3511899
[8] Wu H, Ye W, Guo Y. (2023). Data access control method of cloud network secure storage under Social Internet of Things environment. International Journal of System Assurance Engineering and Management, 14(4): 1379-1386. https://doi.org/10.1007/s13198-023-01942-z
[9] Azbeg K, Ouchetto O, Andaloussi S J. (2022). Access control and privacy-preserving blockchain-based system for diseases management. IEEE Transactions on Computational Social Systems, 10(4): 1515-1527. https://doi.org/10.1109/TCSS.2022.3186945
[10] Zhang L, Zhang Y, Wu Q, Mu Y, Rezaeibagha F. (2022). A secure and efficient decentralized access control scheme based on blockchain for vehicular social networks. IEEE Internet of Things Journal, 9(18): 17938-17952. https://doi.org/10.1109/JIOT.2022.3161047
[11] Zhao Y, Yu H, Liang Y, Conti M, Bazzi W, Ren Y. (2023). A sanitizable access control with policy-protection for vehicular social networks. IEEE Transactions on Intelligent Transportation Systems, 25(3): 2956-2965. https://doi.org/10.1109/TITS.2023.3285623
[12] Squicciarini A, Rajtmajer S, Gao Y, Semonsen J, Belmonte A, Agarwal P. (2022). An extended ultimatum game for multi-party access control in social networks. ACM Transactions on the Web (TWEB), 16(3): 1-23. https://doi.org/10.1145/3555351
[13] Dixit M S, Wajgi M D, Wanjari S. (2022). Real time user access control on social network using deep learning. International Journal for Research Publication and Seminar, 13(2): 246-251. https://jrps.shodhsagar.com/index.php/j/article/view/598
[14] Wen W, Fan J, Zhang Y, Fang Y. (2022). APCAS: Autonomous privacy control and authentication sharing in social networks. IEEE Transactions on Computational Social Systems, 10(6): 3169-3180. https://doi.org/10.1109/TCSS.2022.3218883
[15] Safi S M, Movaghar A, Ghorbani M. (2022). Privacy protection scheme for mobile social network. Journal of King Saud University - Computer and Information Sciences, 34(7): 4062-4074. https://doi.org/10.1016/j.jksuci.2022.05.011
[16] Ahmed F, Wei L, Niu Y, Zhao T, Zhang W, Zhang D, Dong W. (2022). Toward fine-grained access control and privacy protection for video sharing in media convergence environment. International Journal of Intelligent Systems, 37(5): 3025-3049. https://doi.org/10.1002/int.22810
[17] Salem R B, Aimeur E, Hage H. (2023). A multi-party agent for privacy preference elicitation. Artificial Intelligence and Applications, 1(2): 98-105. https://doi.org/10.47852/bonviewAIA2202514
[18] Mayeke N R, Arigbabu A T, Olaniyi O O, Okunleye O J, Adigwe C S. (2024). Evolving access control paradigms: A comprehensive multi-dimensional analysis of security risks and system assurance in cyber engineering. Asian Journal of Research in Computer Science, 17(5): 108-124. https://doi.org/10.2139/ssrn.4752902
[19] Patil R Y. (2024). A secure privacy preserving and access control scheme for medical internet of things (MIoT) using attribute-based signcryption. International Journal of Information Technology, 16(1): 181-191. https://doi.org/10.1007/s41870-023-01569-0
[20] Zhonghua C, Goyal S B, Rajawat A S. (2024). Smart contracts attribute-based access control model for security & privacy of IoT system using blockchain and edge computing. The Journal of Supercomputing, 80(2): 1396-1425. https://doi.org/10.1007/s11227-023-05517-4
https://doi.org/10.31449/inf.v49i16.7787 Informatica 49 (2025) 37–52 37

Fusion of Deep Convolutional Neural Networks and Brain Visual Cognition for Enhanced Image Classification

Xintao Li1,*, Hongyan Guo2
1College of Innovation and Entrepreneurship, Henan Open University, Zhengzhou 450046, China
2School of Information Engineering and Artificial Intelligence, Zhengzhou Vocational University of Information and Technology, Zhengzhou 450046, China
*Email of Corresponding Author: lxt5168@163.com

Keywords: deep convolutional neural network, brain, visual cognition, intelligent computing model, image classification

Received: December 9, 2024

The brain's visual system is one of the core centers for human perception of external information, and building a brain visual cognitive system that classifies and processes image information is a key problem in human-computer interaction. To improve the accuracy of computer vision image classification, a fusion intelligent computing model based on a deep convolutional neural network (DCNN) and brain visual cognition is proposed. The model simulates the visual processing mechanism of the human brain and uses brain-computer interface technology to extract electroencephalogram (EEG) signals, thereby achieving efficient classification and processing of image information. When designing the DCNN-based image classification model, a long short-term memory network is introduced to extract time-series features of the EEG signals. To further enhance classification accuracy, an attention mechanism and an occlusion-independent neural response method are applied to improve the capture of correlation information between brain responses and image features. The results show that the prediction accuracy of the research model reaches 93.54% and 94.03% in the V4 and L0 visual regions, respectively.
The highest accuracy, on facial visual images, reaches 95.46%, while the lowest, on animal visual images, is 91.57%. By introducing the long short-term memory module, the loss value of the model decreases from 0.26 to 0.21, a reduction of 19.23%. In addition, ablation experiments show that introducing the attention mechanism and occlusion-independent neural responses improves the final classification accuracy to 93.94%. In summary, the fusion intelligent computing model grounded on deep convolutional neural networks and brain visual cognition effectively improves the accuracy of image classification and demonstrates its potential in the field of intelligent computing.

Povzetek: An intelligent image classification model is presented that combines deep convolutional neural networks (DCNN) with brain visual cognition through EEG signals.

1 Introduction

With the rapid development of artificial intelligence, human-computer interaction has become a major research trend. The brain-computer interface (BCI), as a cutting-edge research direction, is gradually becoming an important bridge in human-computer interaction. The visual system of the human brain has evolved over millions of years and possesses extremely efficient visual processing capabilities. Through multi-level visual processing mechanisms, the brain can quickly and accurately understand complex visual information [1]. When external objects are transmitted to the visual center of the brain through the visual organs, the brain quickly recognizes, classifies, and understands this visual information, thereby forming cognition of the object or scene [2]. BCIs can interpret the visual cognition of the brain by recording and analyzing electroencephalogram (EEG) signals [3]. The deep convolutional neural network (DCNN) in computer vision has attracted much attention due to its outstanding performance in image processing tasks [4]. However, despite the excellent performance of computer technology in image classification, computers still cannot fully match the precise image recognition and classification capabilities of the human brain in complex and diverse open environments with interference and occlusion [5]. The challenge currently facing computer vision is therefore how to enable artificial intelligence systems to mimic human brain cognition more effectively and attain precise image classification in intricate scenarios. In this context, this research innovatively combines the computing power of DCNNs with the cognitive characteristics of the brain's visual system and constructs an intelligent computing model based on the fusion of DCNN and brain visual cognitive information, in order to achieve accurate image classification against complex backgrounds.

The research objectives include designing and implementing an intelligent computing model based on the fusion of DCNN and EEG signals to improve image classification performance in interference and occlusion environments. The research explores how the model simulates the visual recognition process of the human brain, especially for accurate image classification against complex backgrounds. The research hypothesis is that by combining the visual feedback of the brain with image features, the intelligent computing model can simulate the visual recognition process of the human brain and thereby improve classification accuracy. The expected outcome is that, by introducing visual cognitive information from the brain, the model can mimic the cognitive process of the human brain in actual visual tasks, providing new ideas and directions for the integration of BCIs and intelligent systems.

The paper comprises four sections. The second section surveys the current status of visual EEG image classification and DCNN research worldwide. The third section develops the intelligent computing model integrating DCNN and brain visual cognition: it first presents the design of an image classification model based on the fusion of DCNN and brain visual cognitive information, and then designs the intelligent computing model built on this fusion. The fourth section validates the model.

2 Related works

The visual cognitive ability of the brain can recognize, classify, and understand visual information. In recent years, research on visual interpretation based on monitoring the neural response of the brain during visual cognition has attracted the attention of many researchers. Gao et al. proposed an attention-based parallel multi-scale convolutional neural network (CNN) model to improve the accuracy of decoding EEG evoked potentials. The model used two parallel convolutional layers to extract temporal features and attention mechanisms to weight features at different times. The results showed that the model effectively improved the decoding of visual evoked potentials under complex conditions [6]. Ahirwal et al. proposed a new channel selection technique that could identify and characterize harmful emotions, aiming to raise the precision of EEG emotion classification. The technique extracted three kinds of features from EEG signals (time-domain, frequency-domain, and entropy-based features) and used support vector machines (SVM) and artificial neural networks to classify emotions based on the extracted features. The results showed that this approach effectively improved classification performance [7]. Komolovaitė et al. proposed combining CNNs with steady-state visual evoked potentials to obtain interpretable features from raw EEG signals, in order to improve the effectiveness of brain activity data in classifying visual stimuli; the method also introduced generative adversarial networks and variational autoencoders to produce synthetic EEG signals. The results showed that the method was effective [8]. Kumari et al. proposed a multi-channel EEG emotion classification model to improve the precision of EEG emotion classification. The model used a CNN to extract descriptive emotional-state features from EEG signals and generated two-dimensional images to represent these features. The results revealed an overall precision of 83.04% [9].

DCNNs occupy an important position in EEG image classification tasks. Santamaria-Vazquez et al. proposed a classification model based on different control signals to extract complex features from EEG data. The model used a DCNN for the time calibration of BCIs and integrated modules for the detection of event-related potentials. The results showed that the command decoding accuracy improved by 16.0% [10]. Yıldırım et al. proposed a novel deep one-dimensional CNN monitoring model to improve the precision of EEG monitoring. The model used machine learning techniques to automatically identify regular and aberrant EEG signals, classifying them with an end-to-end structure. The results showed that the approach was feasible [11]. Miao et al. proposed a multi-layer CNN model with a DCNN structure to raise the classification precision of EEG pattern identification algorithms. The model used prior knowledge and careful parameter adjustment to extract spatial-frequency features. The results showed good classification capability [12]. Li et al. proposed combining a DCNN with the continuous wavelet transform to enhance the recognition rate of motor imagery EEG signals. The method mapped motor imagery EEG signals to time-frequency images using the continuous wavelet transform and fed the images into the CNN for feature extraction and classification. The results showed that the approach effectively raised the recognition rate [13]. In recent years, the combination of BCI and DCNN has become an important research direction in the analysis of EEG and brain visual neural activity signals. Tang et al. proposed an end-to-end BCI method based on CNNs that directly extracts spatiotemporal features from EEG signals and classifies them. The results showed that this method achieved higher classification accuracy than traditional manual feature extraction methods, especially in motor imagery tasks and emotional state classification tasks [14]. In addition, Kawala-Sterniuk et al. reviewed over 50 years of BCI research and concluded that BCI not only enables brain control but also opens the door to regulating the central nervous system through neural interfaces, demonstrating the potential applications of this technology [15]. Research on integrating BCI and DCNN will provide a more solid foundation for the popularization and application of BCI technology. The comparative summary is shown in Table 1.

Table 1: Comparison summary table
Study | Method | Advantages | Limitations | Missing features
Gao et al. [6] | Parallel multi-scale CNN based on attention | Improved the decoding performance of visual evoked potentials | Still affected by noise in complex environments; requires processing a large amount of temporal features | Failed to effectively combine spatial and temporal features in the brain's visual cognitive process; cannot adapt to complex environmental visual information processing
Ahirwal et al. [7] | EEG-based emotion classification model combining SVM and artificial neural networks | Improved emotion classification accuracy | Focuses mainly on emotion classification; lacks deep classification processing of visual information | Cannot process complex visual information and its complex relationship with emotions
Komolovaitė et al. [8] | Steady-state visual evoked potentials combined with CNNs | Effectively improved visual stimulus classification | Poor robustness to signal noise; high complexity in training generative adversarial networks | Failed to effectively combine visual cognitive mechanisms; limited to static visual stimulus processing
Kumari et al. [9] | Multi-channel EEG-based emotion classification model | Achieved an average accuracy of 83.04% | Focuses on emotion classification and mainly uses image feature representations; lacks handling of more complex scenarios | Cannot handle complex image classification tasks, especially multi-class image recognition
Santamaria-Vazquez et al. [10] | Classification model based on different control signals using DCNNs | Increased command decoding accuracy by 16.0% | Relies heavily on event-related potential detection; may face difficulties in decoding complex EEG data | Lacks adaptability to dynamic EEG signals; unable to combine spatial and temporal features
Yıldırım et al. [11] | EEG monitoring model based on deep one-dimensional CNNs | Provides a feasible classification method | Focuses on normal vs abnormal EEG signal classification; lacks ability to handle complex visual tasks | Cannot effectively process multi-class or dynamically changing visual information
Miao et al. [12] | EEG pattern recognition based on multi-layer DCNNs | Shows good classification performance | Mainly focuses on spatial-frequency feature extraction; may be limited in handling complex dynamic tasks | Lacks comprehensive capture of dynamic EEG data or multi-dimensional features of visual information
Li et al. [13] | Classification of left/right hand motor imagery EEG signals combined with continuous wavelet transform and DCNNs | Significantly improved recognition rate | Relies on signal preprocessing; suitable for specific tasks | Cannot process EEG signals related to visual tasks; sensitive to environmental noise

In summary, although existing methods have made some progress in EEG classification tasks, they have certain limitations in handling complex dynamic tasks, enhancing robustness, and adapting to multiple tasks. This research combines the visual cognitive mechanism of the brain with DCNNs and long short-term memory (LSTM) networks to design a fusion intelligent computing model. The model can more comprehensively capture the spatial and temporal features of EEG signals, addressing the poor adaptability of existing methods to complex environments, and offers higher classification accuracy and wide application prospects.

3 Intelligent computing model integrating DCNN and brain visual cognition

The research receives EEG information through a BCI, combines voxel encoding with an improved DCNN model to achieve image classification, and uses an LSTM to extract the temporal characteristics of EEG signals. An attention
mechanism is utilized to raise the accuracy of image feature extraction, and the correlation between brain responses and image features is enhanced by masking irrelevant neural responses.

3.1 Design of the image classification model based on DCNN and brain visual cognitive information

Neuroscience research has found that the human brain achieves complex cognitive processing through parallel information exchange between the dorsal and ventral streams during visual activities [16]. The ventral stream is a pathway connecting the primary sensory cortex with the temporal and prefrontal regions; it is primarily responsible for recognizing visual and auditory stimuli and mapping basic information to higher-level semantic concepts [17]. The dorsal stream is responsible for spatial information and motion control. The activity of brain neurons triggered by visual stimuli gives rise to EEG signals, and BCIs can record and measure these signals through biometric technology to reflect the brain's response to behavior. The core areas of the ventral stream include the primary visual cortex, the ventral intermediate cortex, and the ventral inferior temporal cortex, among other regions. The ventral inferior temporal cortex is particularly closely related to complex visual recognition and is the main functional area for object and face recognition. When the brain receives visual stimuli, the cortical regions in the ventral stream are activated, transforming simple visual features into higher-level cognitive concepts. For instance, visual information is initially processed by the primary visual cortex and then passed through intermediate areas, ultimately being mapped to the inferior temporal cortex within the ventral stream, where intricate functions such as object recognition and color discrimination take place. The dorsal stream is mainly responsible for processing spatial information, motion perception, and action control; it supports functions such as object localization, motion tracking, and hand-eye coordination through connections with the parietal lobe, the motor cortex, and other areas. Therefore, given the core role of the ventral stream in image classification tasks, the research focuses on analyzing the brain signal response of the ventral stream, in order to better understand the process of visual feature extraction and semantic comprehension. The encoding framework for the ventral response based on brain visual cognition is shown in Figure 1.

Figure 1: A coding framework for ventral response based on brain visual cognition (stimulus → feature extraction model → linear layer → visual areas of brain activity V1, V2, V4, L0)

As shown in Figure 1, in the ventral response encoding framework based on brain visual cognition, the brain activity caused by visual stimuli is obtained through the BCI, and the stimulus image is input into the feature extraction model. After nonlinear computation, the feature space of the image is obtained. These features are then used to predict the voxel space of the visual region through linear layers. The voxel encoding model transforms human-readable data into a format that machines can store, facilitating either shared encoding across various visual regions or unique encoding for specific visual areas. This process helps pinpoint the regions within the brain's visual cortex responsible for processing visual information [18]. Voxel encoding converts brain activity into a feature space, enabling a precise association between cognitive responses and visual stimuli. This mapping helps reveal the roles of different brain regions in visual information processing, thereby enhancing the accuracy of image classification tasks. EEG signals capture the electrical activity of the cerebral cortex, which can be mapped to specific regions of the brain through modeling techniques such as source localization, in order to infer activity responses in different areas. Methods of this type correlate the spatiotemporal patterns of EEG signals with voxels in functional neuroimaging data. There may be common neural response patterns between multiple visual regions; these shared response patterns can be captured in voxel encoding models, revealing how the regions collectively respond to the same visual stimuli. For example, in image classification tasks, certain visual regions may exhibit similar neural activity responses to the same visual features, so voxel encoding can reflect the similarity and interactivity between these regions as a shared encoding pattern. By combining the results of brain visual cognition and image classification, complementary information exchange and expression can be achieved, yielding a more comprehensive joint representation. A DCNN can automatically learn image feature representations by combining convolutional and pooling layers, which helps extract abstract features from data. Therefore, the study adopts a DCNN to extract image features and designs an image classification model grounded on the fusion of DCNN and brain visual cognitive information, as shown in Figure 2.
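The voxel encoding step described above, a linear layer that predicts voxel responses from nonlinearly extracted image features, can be sketched as a ridge regression. This is a minimal NumPy illustration rather than the paper's implementation (which is stated to use PyTorch); the function name, the array shapes, and the simulated data are assumptions for illustration only.

```python
import numpy as np

def fit_voxel_encoder(features, voxels, alpha=1.0):
    """Linear voxel encoder: ridge regression from image features to voxel responses.

    Closed-form solution of (X^T X + alpha*I) W = X^T Y."""
    d = features.shape[1]
    return np.linalg.solve(features.T @ features + alpha * np.eye(d),
                           features.T @ voxels)

# Simulated data standing in for extracted image features and recorded voxel responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                      # 200 stimuli, 16-dim image features
true_W = rng.normal(size=(16, 8))                   # hidden linear mapping to 8 voxels
Y = X @ true_W + 0.01 * rng.normal(size=(200, 8))   # noisy voxel responses
W = fit_voxel_encoder(X, Y, alpha=0.1)
pred = X @ W                                        # predicted voxel space
```

The ridge penalty `alpha` is a common regularizer in voxel encoding; the paper does not specify one, so it is an assumption here.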
Figure 2: Image classification model grounded on information fusion (EEG signal acquisition → brain response data; image → DCNN; features spliced together → reliability prediction → SVM)

As represented in Figure 2, the image classification model based on information fusion mainly includes three parts: a feature extraction structure, a feature reliability prediction structure, and a brain-computer information fusion classification structure. The brain response data use the ventral response encoding framework to extract semantic features, while image features are extracted through the DCNN structure. Next, the extracted features are input into the feature reliability prediction structure for reliability calculation, and the fusion weights of the image features and brain response features are adjusted automatically. Finally, the fused features are input into the SVM for classification. After acquisition, EEG signals undergo denoising and preprocessing operations, such as bandpass filtering, independent component analysis, and signal normalization, to ensure signal quality. Subsequently, the EEG signals are synchronized with the presentation time of the visual stimuli to ensure accurate matching between brain responses and image features at each moment. The loss function for reliability prediction is shown in equation (1).

L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (d_p - d_f^b)^2    (1)

In equation (1), L_{MSE} is the loss function of reliability prediction, d_p represents the predicted feature reliability, d_f^b represents the classification sensitivity index of the brain response features, and n represents the batch size. The fusion weight of the image features is shown in equation (2).

w_v = d_f^v / (d_f^b + d_f^v)    (2)

In equation (2), w_v represents the fusion weight of the image features, and d_f^v represents the classification sensitivity index of the image features. The fusion weight of the brain response features is shown in equation (3).

w_b = d_f^b / (d_f^b + d_f^v)    (3)

In equation (3), w_b represents the fusion weight of the brain response features. The fused feature is given in equation (4).

f_F = (w_b \cdot f(b)) \operatorname{concat} (w_v \cdot f(v))    (4)

In equation (4), f_F represents the fused feature, f(b) represents the brain response features, and f(v) represents the image features. Because EEG signals are collected over a continuous period of time and have time-series characteristics, there is a continuity relationship between the signal at each moment and the signals before and after it [19]. However, although existing feature extraction models perform well in many application scenarios, they often do not fully consider the temporal dependencies in time-series data. Traditional models such as CNNs excel at extracting features from images and static data, but they fall short in capturing temporal information and dynamic signal changes in time-series data such as EEG signals. In response, the study uses an LSTM structure to extract the time-series features of EEG signals. The architecture for extracting brain response features based on time series is shown in Figure 3.

Figure 3: Architecture for extracting brain response features based on time series (temporal filtering → Transformer module over T time stamps with N class tokens → global features of EEG signals, T + N)

As shown in Figure 3, the brain response feature extraction architecture based on time series uses a
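The reliability-weighted fusion of equations (1)-(4) can be sketched as follows. This is a NumPy illustration under assumed toy values; the helper names are not from the paper.

```python
import numpy as np

def fusion_weights(d_f_v, d_f_b):
    """Eqs. (2)-(3): fusion weights from the two classification sensitivity indices."""
    w_v = d_f_v / (d_f_b + d_f_v)
    w_b = d_f_b / (d_f_b + d_f_v)
    return w_v, w_b

def fuse_features(f_b, f_v, d_f_b, d_f_v):
    """Eq. (4): weighted concatenation of brain-response and image features."""
    w_v, w_b = fusion_weights(d_f_v, d_f_b)
    return np.concatenate([w_b * f_b, w_v * f_v])

def reliability_loss(d_p, d_f_b):
    """Eq. (1): mean squared error between predicted and actual sensitivity indices."""
    d_p = np.asarray(d_p, dtype=float)
    d_f_b = np.asarray(d_f_b, dtype=float)
    return float(np.mean((d_p - d_f_b) ** 2))

# A brain sensitivity index of 0.6 against an image index of 0.4 weights the
# brain features by 0.6 and the image features by 0.4 before concatenation.
fused = fuse_features(np.ones(4), np.ones(4), d_f_b=0.6, d_f_v=0.4)
```

Note that by construction w_v + w_b = 1, so the two modalities always share a unit weight budget.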
Transformer module to extract global features of the EEG signals over the time series, embedding absolute positions to maintain the order of the model. Before the positional encoding is input, classification identification bits are concatenated with the time series and then mapped through a linear transformation to increase the diversity of feature extraction. In the research model, the LSTM is mainly used to integrate the brain response data collected from the BCI. The integration process is as follows. Firstly, the brain response signals collected from the BCI system are preprocessed, for example by denoising and normalization, to obtain clean time-series data. Then, these preprocessed brain response data are used as inputs to the LSTM network, which can capture temporal dependencies in the data and learn the neural response patterns of the brain at different time points. Next, through the time-dependent modeling of the LSTM, the output data contain the gradual response patterns of the brain to visual stimuli throughout the entire image processing process. Finally, the temporal response of the brain processed by the LSTM is combined with the image features extracted by the DCNN. The calculation for the forget gate of the LSTM structure is shown in equation (5).

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    (5)

In equation (5), f_t is the output of the forget gate, W_f is the weight of the forget gate, \sigma is the Sigmoid activation function, b_f represents the offset term of the forget gate, x_t represents the input signal at time t, and h_{t-1} represents the output signal at time t-1. The cell-state update is shown in equation (6).

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t    (6)

In equation (6), C_t is the cell state at time t, C_{t-1} represents the cell state at time t-1, \tilde{C}_t is the candidate cell state, and i_t represents the output matrix of the input gate. The calculation for the output gate is shown in equation (7).

o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    (7)

In equation (7), o_t represents the output gate, W_o is the weight of the output gate, and b_o is the offset term of the output gate. The output feature is shown in equation (8).

h_t = o_t \odot \tanh(C_t)    (8)

In equation (8), h_t represents the output feature. The unique gating mechanism of the LSTM can effectively handle long time intervals and delays in time series, discarding or storing large-span information in EEG data and thus better encoding EEG signals.

3.2 Design of the intelligent computing model based on DCNN and brain visual cognitive information

The study simulates the connectivity and classification patterns of biological brain neurons, exploring the connection between image features and brain responses. DCNNs have demonstrated significant capabilities in image feature extraction: by combining convolutional and pooling layers, they can automatically learn multi-level abstract feature representations of images, effectively capturing both low-level and high-level features. However, despite the DCNN's efficiency in feature extraction, the image features it extracts still struggle to fully explain the brain's response patterns. This is because the visual cognitive process of the brain relies not only on low-level visual features of images but also involves complex high-level semantic information processing, perceptual integration, and interaction with other cognitive processes such as memory and emotion. The features extracted by a DCNN mainly focus on salient visual information in the image; these features often lack sufficient high-level semantic depth and are difficult to fully integrate with the complex responses of the brain in visual cognition.
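One step of the LSTM gating described by equations (5)-(8) can be sketched directly. This NumPy illustration uses assumed toy dimensions (3-dim input, 2-dim hidden state) and is not the paper's PyTorch implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM update following Eqs. (5)-(8).

    W and b map gate names ('f', 'i', 'c', 'o') to parameters over [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate, Eq. (5)
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    C_hat = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    C_t = f_t * C_prev + i_t * C_hat        # cell-state update, Eq. (6)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate, Eq. (7)
    h_t = o_t * np.tanh(C_t)                # output feature, Eq. (8)
    return h_t, C_t

# Run a toy 4-step EEG-like sequence through the cell.
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(2, 5)) for k in "fico"}
b = {k: np.zeros(2) for k in "fico"}
h, C = np.zeros(2), np.zeros(2)
for x_t in rng.normal(size=(4, 3)):
    h, C = lstm_step(x_t, h, C, W, b)
```

Because h_t = o_t * tanh(C_t) with o_t in (0, 1), every component of the output feature stays strictly inside (-1, 1).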
Therefore, the image characteristics extracted by the DCNN are difficult to reconcile fully with the representation information of the brain response; there are modality-specific expressions on both sides, making it difficult to explore their deep correlations [20, 21]. In response to this limitation, a network structure based on the DCNN is studied to construct an intelligent computing model for the brain. The brain response is used as supervised information for the images, and high-level semantic information that is difficult to interpret is transferred to the DCNN model to achieve more accurate mining of the brain's visual cognitive response. The intelligent computing model structure grounded on data fusion is represented in Figure 4.

Figure 4: Intelligent computing model structure grounded on data fusion (EEG signal acquisition → brain responses b1…bN; image → DCNN → v1…vN; normalization → hypersphere)

As represented in Figure 4, in the intelligent computing model framework based on information fusion, feature extraction is first performed on the cognitive response data of the brain to visual images collected by the BCI. Then, the DCNN structure is applied to extract features from the input image. After the two sets of extracted features are normalized separately, the fused features are mapped onto an N-dimensional sphere. Subsequently, based on the normalized features, a set of positive and negative samples is constructed, and the InfoNCE loss function is used for calculation, thereby achieving the transfer of correlated information between the two feature maps. The mathematical description of the InfoNCE loss function is given in equation (9).

L_i = -\log \frac{\exp(S(z_i^+, z_i)/\tau)}{\sum_{j=0}^{N} \exp(S(z_i, z_j)/\tau)}    (9)

In equation (9), L_i represents the InfoNCE loss function, \tau represents the temperature coefficient, z_i represents the image representation corresponding to the input data x_i, S(z_i, z_j) represents the cosine similarity between image representations, S(z_i^+, z_i) represents the alignment characteristics during hypersphere mapping, and N represents the total number of positive and negative samples. The calculation of the contrastive loss is given in equation (10).

L_i = -\log \frac{\sum_{j=0}^{m} \exp(S(f(v_i), f(b_j^+))/\tau)}{\sum_{k=0}^{n} \exp(S(f(v_i), f(b_k^-))/\tau)}    (10)

In equation (10), L_i represents the contrastive loss, m is the number of positive samples, n is the number of negative samples, f(v_i) is the mapped image feature, f(b_j^+) represents brain response features of the same category as the image feature, and f(b_k^-) represents brain response features of categories different from the image feature. In intelligent computing models based on the fusion of DCNN and brain visual cognitive information, the classification accuracy of the DCNN structure may be degraded by irrelevant information. To address this issue, the DCNN structure is improved by incorporating an attention mechanism. The DCNN feature extraction model grounded on the attention mechanism is represented in Figure 5.

Figure 5: DCNN feature extraction model grounded on attention mechanism (X0 → F_rr1 → X1; attention branch F_at → A of size (1, W2·H2); X1 → F_rr2 → X2 of size (W2, H2) → corrected map X3)

As represented in Figure 5, the DCNN architecture mainly includes multiple convolutional layers, pooling layers, activation functions, and fully connected layers, aiming to extract multi-level abstract feature representations from images to improve classification performance.
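The InfoNCE form of equation (9), one positive pair contrasted against all candidate pairs under a cosine similarity S and temperature tau, can be sketched as follows. This is an illustrative NumPy version with assumed toy vectors, not the paper's training code.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity S(., .) used inside the loss."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(z_i, z_pos, candidates, tau=0.07):
    """Eq. (9): -log of the positive-pair score over the sum of all candidate scores."""
    num = np.exp(cosine(z_i, z_pos) / tau)
    den = sum(np.exp(cosine(z_i, z_j) / tau) for z_j in candidates)
    return float(-np.log(num / den))

# The positive is included among the candidates, so the loss is strictly positive
# and shrinks as the positive aligns with the anchor.
anchor = np.array([1.0, 0.0])
aligned = np.array([0.9, 0.1])
negatives = [np.array([0.0, 1.0]), np.array([-1.0, 0.2])]
loss_good = info_nce(anchor, aligned, [aligned] + negatives)
loss_bad = info_nce(anchor, negatives[0], [negatives[0], aligned, negatives[1]])
```

The temperature tau = 0.07 is a common default in contrastive learning and is an assumption here; the paper reports only the key-vector size of its attention mechanism.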
The core idea of DCNN is to extract local features from images through a series of convolution and In equation (13), X represents the new feature map 2 pooling operations, and then add nonlinear after further downsampling, and F represents the rr 2 transformations through nonlinear activation functions to further downsampling operation. The corrected feature learn more complex image representations. The map is shown in equation (14). convolutional layer, as a fundamental component in the W2 H2 DCNN architecture, can extract local features of the input X3 = A(i, j) X2 (i, j) (14) image through convolution operations. After each i=1j=1 convolution operation in each layer, the study will use In equation (14), X represents the feature map 3 activation functions to perform nonlinear transformations obtained after attention branch correction, W and H on the output results. The purpose of the activation represent the width and height of the characteristic map, function is to introduce nonlinear factors, so that the and (i, j) represents the feature values on the feature map. network can learn more complex mapping relationships. The function of the pooling layer is to downsample the When capturing the correlation information between brain feature map output by the convolutional layer, thereby visual cognitive responses and image features, some non- reducing the spatial size of the feature map while correlated neural responses may affect the determination preserving important features. After extracting sufficient of representation similarity. These "unrelated neural local features in the convolutional and pooling layers, the responses" pertain to neural activities that aren't directly last few layers are usually fully connected layers. The fully tied to visual tasks and might stem from background noise, connected layer linearly combines the extracted features irrelevant visual cues, or various other bodily influences. 
and generates the final output result through an activation For example, the activity of certain regions in EEG signals function. The DCNN feature extraction model grounded may be unrelated to the current visual task, and this on attention mechanism adds a parallel attention branch to irrelevant neural activity can lead to misleading similarity the initial DCNN structure to learn the importance judgments when the brain processes visual information. information of feature map position. This path can correct To address this issue, research has been conducted on an the activation values of feature maps, reduce the activation intelligent computing model based on the fusion of DCNN values of redundant information, and thus improve the and brain visual cognitive information, which devotes to accuracy of image characteristic collection [22, 23]. The raise the precision of capturing correlated information by feature transformation process is shown in equation (11). masking non-correlated neural responses. In the intelligent X = F computing model based on the fusion of DCNN and brain 1 rr1(X0 ) (11) visual cognitive information, the study aims to add In equation (11), X represents the transformed 1 windows of different scales to the extracted image features abstract feature map, X represents the initial feature map, to mask non-correlated neural responses. The motivation 0 of this method is to better highlight the effective response and F represents the downsampling operation. The rr1 of the brain to visual information and improve the calculation for position importance is shown in equation accuracy of similarity determination between brain visual (12). cognitive responses and image features by reducing or A = Fat (X1) (12) eliminating the influence of irrelevant neural reactions. The visualization process of brain response and image feature correlation information is shown in Figure 6. 
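The attention-branch correction of equations (11)–(14) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: 2×2 average pooling stands in for the downsampling operations F_rr1 and F_rr2, a positional softmax stands in for the fully connected F_at (applied on the (W2, H2) grid for shape compatibility), and, since X3 is described as a feature map, the weighting of equation (14) is applied element-wise.

```python
import numpy as np

def downsample(x, factor=2):
    # Stand-in for the paper's F_rr downsampling: 2x2 average pooling.
    h, w = x.shape[0] // factor, x.shape[1] // factor
    return x[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

def attention_map(x):
    # Stand-in for F_at: a softmax over all positions gives an importance map A
    # whose entries sum to 1 (an assumption; the paper uses a learned branch).
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_corrected_features(x0):
    x1 = downsample(x0)    # eq (11): X1 = F_rr1(X0)
    x2 = downsample(x1)    # eq (13): X2 = F_rr2(X1)
    a = attention_map(x2)  # eq (12): positional importance A on the (W2, H2) grid
    return a * x2          # eq (14): A(i, j) * X2(i, j), applied element-wise
```

The element-wise product suppresses positions with low importance, which matches the text's description of reducing the activation values of redundant information.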
Fusion of Deep Convolutional Neural Networks and Brain Visual… Informatica 49 (2025) 37–52

Figure 6: Visualization process of brain response and image feature correlation information (the electroencephalogram signal and the image pass through feature extraction, and occlusion windows at scales V1 … Vn yield the distance terms d(f(v), g(b)) and d(f(v_t(x, y)), f(b))).

As shown in Figure 6, in the process of visualizing the correlation information between brain response and image features, the correlation features are first represented in the shared representation space and their Euclidean distance is calculated. Then a window of scale i is added to the extracted image features to mask non-correlated neural responses. Based on the Euclidean distances calculated from the various image features, occlusion windows of different sizes are determined. The saliency maps obtained through occlusion at the different scales are then combined to create a comprehensive saliency map that encapsulates the relationship between brain responses and image features. The calculation of the significance map is shown in equation (15).

V_t = | d(f(v_t(x, y)), f(b)) - d(f(v), g(b)) |   (15)

In equation (15), V_t represents the significance map and d represents the Euclidean distance.

4 Validation of an intelligent computing model integrating DCNN and brain visual cognition

After setting up the experimental environment, the behaviour of the image classification model grounded on information fusion was first verified, and then the intelligent computing model based on information fusion was experimentally analyzed.

4.1 Experiment environment construction

To verify the effectiveness of the intelligent computing model that integrates DCNN and brain visual cognition, the study first constructed the experimental environment. The experimental model used the Python language and was implemented with the PyTorch framework. The experimental parameters were set as follows: the batch size was 16, the initial learning rate was 0.001, the Adam optimizer was used during training, the output layer size was 40, and the key vector value in the self-attention mechanism was 128. The dataset was sourced from the comprehensive evaluation platform Brain Score. This dataset aims to evaluate the effectiveness and accuracy of computer-simulated models of brain operation and therefore covers response data of primate visual systems. The dataset contains approximately 5000 image stimuli, each corresponding to recorded brain electrophysiological response data. The stimulus images cover a total of 40 categories, including natural scenes and artificial objects, and the number of images in each category is roughly equal to ensure data balance. Each image is 224x224 pixels, which retains sufficient visual information and meets the input requirements of the CNN. The data augmentation techniques used include random cropping, horizontal flipping, random rotation, and color jitter; these techniques effectively expand the diversity of the training data, help avoid overfitting, and improve generalization to various visual stimuli. After preprocessing, the data was separated into a training set and a testing set in a 3:7 ratio. While primarily intended for evaluating brain functional models, the Brain Score dataset is well-suited as a data source for this study, given its abundance of visual stimulus images and corresponding EEG response data, to verify the efficacy of intelligent computing models that integrate image classification with brain visual cognition. In the experimental design, the evaluation of image classification focuses on guiding the learning and classification of image features through brain response data, rather than on simple image classification.
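The occlusion-based significance map of equation (15) can be prototyped with a sliding window. This is a schematic sketch under stated assumptions: `extract` is a placeholder feature extractor (the paper uses the DCNN), `brain_feat` plays the role of the brain response feature f(b), and the occluded window is zero-filled; the combination of maps from several window scales described in the text is omitted here.

```python
import numpy as np

def euclid(a, b):
    return float(np.linalg.norm(a - b))

def occlusion_saliency(image, brain_feat, extract, scale):
    # Saliency per eq. (15): |d(f(v_t(x, y)), f(b)) - d(f(v), g(b))|, where
    # v_t(x, y) is the image with a scale-by-scale window masked at (x, y).
    base = euclid(extract(image), brain_feat)
    h, w = image.shape
    sal = np.zeros((h - scale + 1, w - scale + 1))
    for y in range(sal.shape[0]):
        for x in range(sal.shape[1]):
            occluded = image.copy()
            occluded[y:y + scale, x:x + scale] = 0.0  # mask one window
            sal[y, x] = abs(euclid(extract(occluded), brain_feat) - base)
    return sal
```

Positions whose occlusion changes the image-brain distance the most receive the highest saliency, which is what the comprehensive saliency map is meant to capture.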
The experimental hardware configuration was as follows: the processor was an Intel i7-8700, the GPU was an Nvidia GeForce 1080Ti, and the memory was 64 GB DDR4. The detailed experiment environment configuration and network training parameters are represented in Table 2.

Table 2: Experiment environment configuration and network training parameters

Experiment environment / Configuration:
  CPU: Intel i7-8700
  GPU: Nvidia GeForce 1080Ti
  Memory: 64 GB DDR4
  Programming language: Python
  Framework: PyTorch
Training parameters / Configuration:
  Batch size: 16
  Initial learning rate: 0.001
  Output layer size: 40
  Key vector value: 128
  Optimizer: Adam

4.2 Performance verification of image classification model based on information fusion

In order to verify the predictive accuracy of ventral response encoding based on brain visual cognition for brain cognitive response, this method was compared with other voxel encoding methods, including the Convolutional Neural Network Enhancement Model (CNN-EM) and GaborNet Visual Encoding (GaborNet-VE). The accuracy comparison of the different encoding methods in different visual regions is represented in Figure 7. From Figure 7(a), within the V4 visual region, the prediction accuracy of the ventral response encoding method based on brain visual cognition was significantly higher than that of the other two methods: its maximum prediction accuracy reached 93.54%, which was 6.05% and 19.57% higher than the maximum prediction accuracies of CNN-EM and GaborNet-VE, which were 87.49% and 73.97%, respectively. From Figure 7(b), the results for visual area L0 show that the maximum prediction accuracy of the ventral response encoding method based on brain visual cognition was 94.03%, which was 11.49% and 18.95% higher than the maximum accuracies of 82.54% and 75.08% of the other two methods, respectively. In addition, the study used paired t-tests to validate the credibility of the results. In the V4 region, the difference in accuracy between the ventral response encoding based on brain visual cognition and CNN-EM reached a statistically significant level (t=4.72, P<0.05), as did the difference between it and GaborNet-VE (t=6.88, P<0.05). In the L0 region, the accuracy difference between the ventral response encoding based on brain visual cognition and CNN-EM reached a statistically significant level (t=5.23, P<0.05); similarly, the difference between it and GaborNet-VE was also statistically significant (t=7.14, P<0.05). Ventral response encoding based on brain visual cognition could therefore accurately predict brain cognitive responses.

Figure 7: Comparison of accuracy of different encoding methods (brain cognitive encoding, CNN-EM, GaborNet-VE); (a) comparison of accuracy in the V4 visual region, (b) comparison of accuracy in the L0 visual region; * indicates P<0.05.

To further verify the performance of the image classification model based on information fusion, a comparison was run on the classification models before and after adding LSTM, as represented in Figure 8. From Figure 8, the loss value of the model before adding LSTM converged to 0.26, while the loss value of the model after inserting LSTM converged to 0.21, a relative reduction of 19.23%. This indicated that the classification model incorporating LSTM had better convergence performance.
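The paired t statistics reported above (e.g. t=4.72 in V4) follow the standard paired-samples formula; a stdlib-only sketch, assuming the inputs are matched per-trial or per-fold accuracy pairs (the paper does not state the pairing unit):

```python
import math

def paired_t(xs, ys):
    # Paired-samples t statistic on the per-pair differences; df = len(xs) - 1.
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # unbiased sample variance
    return mean / math.sqrt(var / n)
```

The resulting t is compared against the t distribution with n-1 degrees of freedom to obtain the P-values quoted in the text.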
Figure 8: Comparison of training loss values for the networks before and after incorporating the LSTM module, over 1000 training iterations.

To further validate the capability of the image classification model grounded on information fusion, this study compared the model with other advanced image classification models, including Feature Weighted Classification (FWC), Residual Network (ResNet), and Visual Geometry Group (VGG). In addition, to ensure the broad applicability and contextualization of the research results, the performance of these models was compared with benchmark test results in the current field of computer vision. The accuracy comparison of the different classification models is represented in Figure 9. From Figure 9, on the datasets of the various visual images, the accuracy of the image classification model grounded on information fusion was the best. On facial vision images, the accuracy of this model was as high as 95.46%, an improvement of 6.30%, 5.41%, and 10.03% over the accuracies of 89.16%, 90.05%, and 85.43% of FWC, ResNet, and VGG, respectively. On animal visual images, the classification accuracy of this model was at its lowest, 91.57%, which was still 16.31%, 12.03%, and 19.08% higher than the accuracies of 75.26%, 79.54%, and 72.49% of the other three models, respectively. Compared with basic image classification tasks, animal classification often faces more complex backgrounds and varying object shapes, which makes this task an important criterion for testing model robustness. The significant improvement of the research model on this task therefore indicated that it has stronger generalization ability and adaptability when facing highly complex and dynamically changing visual environments. In summary, the image classification model based on information fusion demonstrated excellent classification performance across multiple tasks, and its performance still has significant advantages compared to the benchmarks.

This result showed significant advantages over the mainstream benchmarks in the current field of facial recognition. In facial recognition tasks, many of the most advanced technologies, such as FaceNet and ArcFace, achieve high accuracy on multiple standard datasets such as LFW and CASIA WebFace. For example, FaceNet reported an accuracy of 94.63% on the LFW dataset [6], and ArcFace achieved a recognition rate of nearly 94.51% on the same dataset [7]. However, those results were obtained in interference-free environments, while this study still achieved an accuracy of up to 95.46% in real environments with complex interference and occlusion. Compared with the existing benchmark tests, the image classification model based on information fusion therefore had an advantage in performance.

The reason facial image recognition achieved better accuracy than animal image recognition is that facial images have more stable and easily recognizable features. Facial recognition typically involves fixed structural features and relatively consistent backgrounds, which enabled the information fusion based model to fully exploit the effective information in the brain's visual cognitive responses, thereby improving recognition accuracy. Animal images, however, pose more complex challenges, including background noise, changes in animal size and morphology, different shooting angles, and different species. These factors make the classification task more complex and varied. Therefore, in terms of recognition accuracy, facial image classification outperformed animal image classification.
Figure 9: Comparison of classification accuracy (%) between the intelligent computing model, the FWC model, the ResNet model, and the VGG model across types of visual images: face recognition, automobile, animal, fruits, and chair.

4.3 Performance verification of intelligent computing models based on information fusion

To verify the ability of intelligent computing models grounded on information fusion, the visualized sample distributions of the different models under various brain visual cognitive image stimuli were compared. The dataset contains 40 categories of images, divided into two main groups: natural scenes and artificial objects. Natural scenes include image categories such as faces and animals, while artificial objects include image categories such as cars, fruits, and chairs. In the experiment, a combination of these image categories was used to test the classification performance of the models under different visual stimuli. The visualization outcomes of the various models are represented in Figure 10. From Figure 10, the sample distribution of FWC and VGG was relatively chaotic, while the sample distribution of ResNet was relatively clear. The ResNet model showed a more obvious distinction between facial and animal visual images, but was more easily confused when distinguishing images such as fruits and cars. The intelligent computing model based on information fusion exhibited clear class separation and good classification performance under all visual image stimuli. This was because images of the facial and animal categories were more consistent in natural scenes and were easily distinguishable by the models, whereas categories such as cars, fruits, and chairs belong to artificial objects, and the significant visual differences between these categories posed greater challenges to the models. The intelligent computing model based on information fusion revealed the deep-level features of the brain response, effectively improving classification accuracy.

Figure 10: Visualization of the 2D sample distributions (face recognition, automobile, animal, fruits, chair) for (a) the FWC model, (b) the ResNet model, (c) the VGG model, and (d) the intelligent computing model.

To further validate the ability of the intelligent computing model grounded on information fusion, ablation experiments were conducted. The classification accuracy in the ablation experiments was calculated from the precision of the image classification task, and thus only reflects the accuracy of the image classification results. The ablation experiment results are shown in Table 3. From Table 3, the classification accuracy of the brain visual cognitive response encoding framework alone was 81.42%. When the DCNN structure was fused, the accuracy improved by 5.64%. After adding the LSTM structure, the accuracy increased to 90.48%. When the attention mechanism was added, the classification accuracy increased by a further 2.17%. When the model was further optimized by occluding non-correlated neural responses, its accuracy reached 93.94%. From the above, the addition of each of these modules benefited the classification performance of the model, effectively raising the classification accuracy of images.

Table 3: Ablation experiment

Brain response coding | DCNN | LSTM | Attention mechanism | Obstructing non-correlated neural responses | Accuracy/%
√ | / | / | / | / | 81.42
√ | √ | / | / | / | 87.06
√ | √ | √ | / | / | 90.48
√ | √ | √ | √ | / | 92.65
√ | √ | √ | √ | √ | 93.94
Note: "√" indicates the presence of the module; "/" indicates its absence.

5 Discussion

In order to improve the accuracy of computer vision image classification, a fusion intelligent computing model was constructed by simulating the visual processing mechanism of the human brain, using BCI technology to extract the EEG signals generated by human visual cognition, and combining a DCNN structure. The results showed that after adding the LSTM, the convergence of the model was significantly improved, with the loss value decreasing from 0.26 to 0.21, a 19.23% reduction indicating faster convergence. This indicated that the LSTM could effectively capture time-series features, improve the model's ability to process time-series data, and thus make the model more accurate in learning dynamic information. After incorporating the advantages of LSTM into the model, it could better understand the temporal dependencies in brain activity, resulting in more accurate prediction performance.
In addition, compared with other advanced methods, the research method was significantly superior. For example, although the model studied by Gao Z et al. effectively improved the decoding performance of VEPs in complex environments, it still faced the problem of noise interference and failed to effectively integrate spatial and temporal features in the brain's visual cognitive process [6]. The model studied in this article not only considers spatial features but also integrates dynamic temporal information when predicting brain responses in the visual regions, significantly improving the accuracy of its predictions. In addition, the model proposed by Ahirwal M K et al. achieved good results in emotion classification, but it mainly focused on emotion classification and could not handle complex visual information or multi-class image classification tasks [7]. The model studied in this article can not only handle complex visual information, but also adapt to the multidimensional features of the brain's visual cognitive process, thus exhibiting a more comprehensive classification and understanding of visual information.

The reason the research method is superior to other methods is that it considers both the spatial and the temporal characteristics of the brain in the visual cognitive process, while other methods rely more on a single spatial or static feature. In addition, the introduction of LSTM further enhances the model's ability to process temporal information, enabling the model to decode complex dynamic brain signals more accurately. The potential applications of this discovery cover fields such as neuroscience experiments, intelligent medical devices, and brain-computer interaction systems. However, this method still has certain limitations. For example, the study only explored the classification of EEG images, so the research results are not comprehensive; this aspect can be further improved in the future.

6 Conclusion

In recent years, the introduction of the brain's visual cognitive mechanisms has provided new solutions to the limitations in accuracy and generalization ability of traditional DCNNs when processing complex visual information. The research used BCIs to receive EEG information, used voxel encoding models to obtain the expression content of visual images, and combined an improved DCNN structure to construct an efficient image classification model. On this basis, the LSTM structure was further introduced to extract time-series features of the EEG signals, and attention mechanisms and the occlusion of non-correlated neural responses were utilized to enhance the accuracy of capturing correlation information between brain responses and image features. The outcomes revealed that the ventral response encoding method grounded on brain visual cognition achieved prediction accuracies of 93.54% and 94.03% in the V4 visual region and the L0 visual region, significantly better than the CNN-EM and GaborNet-VE methods. In the model validation, after adding the LSTM module, the loss value decreased from 0.26 to 0.21, a reduction of 19.23%. In terms of image classification capability, the accuracy of the information fusion based model on facial visual images was as high as 95.46%, and its lowest accuracy, on animal visual images, was 91.57%, both significantly better than the comparative models FWC, ResNet, and VGG. In addition, the ablation experiments showed that by introducing attention mechanisms and the occlusion of non-correlated neural responses, the final classification accuracy was improved to 93.94%. From the above, the fusion intelligent computing model based on DCNN and brain visual cognition effectively improved the accuracy of computer vision image classification.

The potential extensions of the research model to other tasks include video analysis, multi-modal data fusion, and related settings. Video data contains not only the spatial information of static images but also dynamic time-series information; models based on brain visual cognition can therefore better understand the dynamic changes in videos by integrating spatial and temporal features, especially with the addition of LSTM modules. In the field of multi-modal data fusion, cross-modal learning can be achieved by introducing multi-modal neural network structures and combining data from different modalities. For example, in video description generation tasks, the visual information of video frames can be combined with speech or text information to generate more accurate and natural descriptions.

Although the research focused on EEG image classification and achieved good classification results in the relevant areas of the ventral stream and visual regions, the current scope of research has not yet covered other brain tissues and neural mechanisms. Future research can therefore be extended to explore the functions of other brain regions, such as their contributions to tasks like cognitive control and emotion recognition. In addition, combining different neural mechanisms and multi-modal data will help improve the comprehensiveness and accuracy of cognitive image classification, thereby promoting further development in the field of BCIs. Future work will strive to further enhance the analytical ability of EEG information for complex visual stimuli through the integration of broader neural regions and mechanisms, in order to promote the widespread application of intelligent computing models in practical settings.

Funding

This study was supported by the Key Science and Technology Program of Henan Province (Project Name: Research on NoC Routing Algorithm and Fault-Tolerant Technology Based on Spanning Tree Sub-Domains; Grant No. 252102210225).
References

[1] Wilson H, Chen X, Golbabaee M, Proulx M J, O'Neill E. Feasibility of decoding visual information from electroencephalogram. Brain-Computer Interfaces, 2024, 11(1-2): 33-60. DOI: 10.1080/2326263X.2023.2287719
[2] Finlayson S G, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Saria S. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 2021, 385(3): 283-286. DOI: 10.1056/NEJMc2104626
[3] Masana M, Liu X, Twardowski B, Menta M, Bagdanov A D, Van De Weijer J. Class-incremental learning: survey and performance evaluation on image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(5): 5513-5533. DOI: 10.1109/TPAMI.2022.3213473
[4] Zhu Y, Zhuang F, Wang J, Ke G, Chen J, Bian J, He Q. Deep subdomain adaptation network for image classification. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(4): 1713-1722. DOI: 10.1109/TNNLS.2020.2988928
[5] Hong D, Gao L, Yao J, Zhang B, Plaza A, Chanussot J. Graph convolutional networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(7): 5966-5978. DOI: 10.1109/TGRS.2020.3015157
[6] Gao Z, Sun X, Liu M, Dang W, Ma C, Chen G. Attention-based parallel multiscale convolutional neural network for visual evoked potentials electroencephalogram classification. IEEE Journal of Biomedical and Health Informatics, 2021, 25(8): 2887-2894. DOI: 10.1109/JBHI.2021.3059686
[7] Ahirwal M K, Kose M R. Audio-visual stimulation based emotion classification by correlated electroencephalogram channels. Health and Technology, 2020, 10(1): 7-23. DOI: 10.1007/s12553-019-00394-5
[8] Komolovaitė D, Maskeliūnas R, Damaševičius R. Deep convolutional neural network-based visual stimuli classification using electroencephalography signals of healthy and Alzheimer's disease subjects. Life, 2022, 12(3): 374-379. DOI: 10.3390/life12030374
[9] Kumari N, Anwar S, Bhattacharjee V. Time series-dependent feature of electroencephalogram signals for improved visually evoked emotion classification using EmotionCapsNet. Neural Computing and Applications, 2022, 34(16): 13291-13303. DOI: 10.1007/s00521-022-06942-x
[10] Santamaria-Vazquez E, Martinez-Cagigal V, Vaquerizo-Villar F, Hornero R. Electroencephalogram-Inception: a novel deep convolutional neural network for assistive ERP-based brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2020, 28(12): 2773-2782. DOI: 10.1109/TNSRE.2020.3048106
[11] Yıldırım Ö, Baloglu U B, Acharya U R. A deep convolutional neural network model for automated identification of abnormal electroencephalogram signals. Neural Computing and Applications, 2020, 32(20): 15857-15868. DOI: 10.1007/s00521-018-3889-z
[12] Miao M, Hu W, Yin H, Zhang K. Spatial-frequency feature learning and classification of motor imagery electroencephalogram based on deep convolution neural network. Computational and Mathematical Methods in Medicine, 2020, 2020(1): 1981728-1981752. DOI: 10.1155/2020/1981728
[13] Li F, He F, Wang F, Zhang D, Li X. A novel simplified convolutional neural network classification algorithm of motor imagery electroencephalogram signals based on deep learning. Applied Sciences, 2020, 10(5): 1605-1624. DOI: 10.3390/app10051605
[14] Tang X, Shen H, Zhao S, Li N, Liu J. Flexible brain–computer interfaces. Nature Electronics, 2023, 6(2): 109-118. DOI: 10.1038/s41928-022-00913-9
[15] Kawala-Sterniuk A, Browarska N, Al-Bakri A, Pelc M, Zygarlicki J, Sidikova M, et al. Summary of over fifty years with brain-computer interfaces—a review. Brain Sciences, 2021, 11(1): 43-45. DOI: 10.3390/brainsci11010043
[16] Cohn N. Your brain on comics: a cognitive model of visual narrative comprehension. Topics in Cognitive Science, 2020, 12(1): 352-386. DOI: 10.1111/tops.12421
[17] Finlayson S G, Subbaswamy A, Singh K, Bowers J, Kupke A, Zittrain J, Saria S. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 2021, 385(3): 283-286. DOI: 10.1056/NEJMc2104626
[18] Bicanski A, Burgess N. Neuronal vector coding in spatial cognition. Nature Reviews Neuroscience, 2020, 21(9): 453-470. DOI: 10.1038/s41583-020-0336-9
[19] Franzen L, Stark Z, Johnson A P. Individuals with dyslexia use a different visual sampling strategy to read text. Scientific Reports, 2021, 11(1): 6449-6455. DOI: 10.1038/s41598-021-84945-9
[20] Zhou S K, Greenspan H, Davatzikos C, Duncan J S, Van Ginneken B, Madabhushi A, Summers R M. A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. Proceedings of the IEEE, 2021, 109(5): 820-838. DOI: 10.1109/JPROC.2021.3054390
[21] Basso M A, Bickford M E, Cang J. Unraveling circuits of visual perception and cognition through the superior colliculus. Neuron, 2021, 109(6): 918-937. DOI: 10.1016/j.neuron.2021.01.013
[22] Jeong J J, Tariq A, Adejumo T, Trivedi H, Gichoya J W, Banerjee I. Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. Journal of Digital Imaging, 2022, 35(2): 137-152. DOI: 10.1007/s10278-021-00556-w
[23] Wang F. Automatic ink painting rendering technique based on deep convolutional neural networks. Informatica, 2025, 49(5): 95-108.
DOI: 10.31449/inf.v49i5.7112

https://doi.org/10.31449/inf.v49i16.6312   Informatica 49 (2024) 53-66

Research on Optimization Method of Landscape Architecture Planning and Design Based on Two-Dimensional Fractal Graph Generation Algorithm

Sheng Chen
Shanxi Vocational University of Engineering and Technology, School of Architectural Design, Department of Cultural Heritage Conservation Engineering
E-mail: cs10078910@163.com

Keywords: design optimization, generation algorithm, landscape architecture, two-dimensional fractal graph

Received: May 30, 2024

The development of modern mathematical theory, especially the two-dimensional fractal graph algorithm, makes large-scale landscape data processing possible. Landscape digital identification technology is an innovative technology based on digital landscape technology and computer identification of experimental data. It is an important artificial intelligence technology, which includes three steps: landscape acquisition, landscape processing and landscape identification. The characteristics of the scene in a landscape picture can be collected by special instruments, such as cameras, and the collected data can then be processed by the two-dimensional fractal graph algorithm, finally realizing automatic identification of the landscape. For images with significant boundary characteristics, the boundary of a region can be extracted quickly and accurately, so as to realize the segmentation of the region. However, when the edge features of the image are not good enough, when there is little colour difference between the background and the region, or when there is interference, the result will be very poor. In this paper, based on the two-dimensional fractal graph generation algorithm, a series of optimizations of landscape architecture planning and design is carried out. The landscape pixel accuracy can reflect whether specific types of landscape pictures can be correctly identified and segmented.
200 pictures are divided into six categories, namely water scene, landscape scene, living scene, sky scene, architecture and transportation, and the exact ratios of two-dimensional fractal graph network-8s, network-16s and network-32s are then compared. The method reached the best level in pixel accuracy, average accuracy, average IU, etc., with pixel accuracy and average accuracy reaching as high as 100%. When compared to the recommended algorithm, the 2D fractal graph generation algorithm has the highest accuracy (94.52%), precision (93.34%), and recall (94.18%) in the classification process.

Povzetek: Razvita je optimirana metoda načrtovanja krajinske arhitekture z uporabo algoritma generiranja dvodimenzionalnih fraktalnih grafov, kar je omogočilo učinkovito avtomatsko prepoznavanje in segmentacijo krajinskih elementov.

1 Introduction

The performance of landscape architecture is a comprehensive concept, which refers to the characteristics and functions of the whole life cycle of landscape architecture, including early analysis, conceptual design, construction and operation. For a long time, landscape architecture has depended on the designer's prediction: the design is completed on the basis of what the designer foresees. However, designers cannot predict all factors with complete accuracy, so a perfect design is impossible. It is nevertheless possible to maximize the quality of design results through as much analysis and thinking as possible, and by considering as many factors as possible. The elements of the site environment are the material premise on which landscape architecture depends. The design of landscape architecture is not carried out in a static way, and its form inevitably exists within the site environment. Human perception of the space environment, and factors in the natural environment such as sound, light and heat, affect the form.
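The pixel accuracy, average accuracy, and average IU quoted in the abstract are standard semantic-segmentation metrics. A sketch of their computation from the class confusion matrix follows; the exact averaging convention used in the paper is not stated, so the usual FCN-style definitions are assumed:

```python
import numpy as np

def segmentation_metrics(pred, target, n_classes):
    # Build the class confusion matrix, then derive the three reported metrics.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(target.ravel(), pred.ravel()):
        cm[t, p] += 1
    pixel_acc = np.diag(cm).sum() / cm.sum()
    with np.errstate(invalid="ignore"):
        class_acc = np.diag(cm) / cm.sum(axis=1)  # per-class accuracy
        iu = np.diag(cm) / (cm.sum(axis=1) + cm.sum(axis=0) - np.diag(cm))  # per-class IU
    return float(pixel_acc), float(np.nanmean(class_acc)), float(np.nanmean(iu))
```

Classes absent from both prediction and ground truth produce NaN entries, which `nanmean` skips, so the averages are taken over the classes actually present.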
It is significant to determine how to respond to the "unpredictable" immaterial factors, as well as to the forces, energies and feelings in the structure; this is the main task of the performance optimization process. Figure 1 below shows a specific landscape architecture design plan.

Figure 1: Planar heat planning and design in landscape architecture design

For example, in the landscape architecture design shown in Figure 1, designers can operate on and optimize the form according to their perception of the natural environment of the site or its human environment.
In terms of the connotation of the garden landscape, besides the traditional garden landscape there are also landscape preference, landscape competitiveness evaluation and so on, and the connotation of evaluation is gradually deepening. From the 1980s to the 1990s, landscape evaluation mostly concerned beauty estimation and the environment, such as the study of different models and the evaluation of the visual landscape and visual effects; the main landscape evaluation models fall into three categories: the descriptive factor method, the questionnaire survey method and the aesthetic attitude determination method. At present, landscape resources are evaluated in both qualitative and quantitative ways, most often with fuzzy comprehensive evaluation models established by the AHP method, covering the ecological perspective, GIS technology, tourist demand, landscape image and so on. High-resolution landscape imagery can provide a large amount of landscape information with rich characteristics, so it is widely used in landscape assessment [1-3].
The two-dimensional fractal image generation algorithm is an image segmentation method based on image features.
To realize the transformation of the original single pure form, the form-finding model and multi-objective optimization based on performance digital analysis make performance itself a factor and a method of form creation, and help designers complete the design as an optimization technique. The performance-oriented optimization design process of landscape architecture structures takes the designer's interpretation of the spatial and environmental conditions as its foundation and performance optimization software as the main technical means: the landscape architecture structure is dissected into a material system that self-organizes from the macro scale through the medium scale to the micro scale. It is a top-down process of form generation and self-expression, in which the dominant position of the designer is expressed by interpreting the site environment and predicting the function and the relationship between the constructed form and the performance target.
The working principle of the algorithm is to select boundary points in a region as candidate boundaries and to splice them into the boundary of the region according to the inconsistency of features between regions. Several edge detection operators, including the first-order differential Sobel and Roberts operators and the second-order differential Laplacian operator, are usually used to extract the edges of a region.
The partition technique of the generation algorithm based on the two-dimensional fractal graph is essentially a partition based on a similarity criterion, which includes some common methods. In the region-growing method, a series of seed pixels is used to describe each region, and an extended growth criterion is determined to expand the region. The growth around adjacent seed pixels is then evaluated to determine whether neighboring pixels should be added to the seed region. When no new pixel is found, the growth is complete.
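The region-growing step just described (seed pixels, a growth criterion over the 4-neighborhood, termination when no new pixel qualifies) can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation; the grayscale test image and the fixed intensity-difference threshold are assumptions:

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Grow a region from `seed` over 4-connected pixels whose intensity
    differs from the seed pixel by at most `threshold`."""
    h, w = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:  # growth terminates when no new pixel qualifies
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # top/bottom/left/right
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_val) <= threshold:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
    return region

# Example: a 4x4 image with a bright 2x2 block in the top-left corner.
img = [[200, 210, 10, 12],
       [205, 198, 11, 13],
       [ 15,  14, 12, 10],
       [ 16,  13, 11,  9]]
bright = region_grow(img, (0, 0), threshold=20)
print(sorted(bright))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A real implementation would add multiple seeds and a merging pass; the termination condition ("no new pixel is found") is exactly the emptying of the frontier queue.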
This design idea can be roughly summarized as three kinds of form generation and feedback processes: the first is a dynamic interaction between the form of the structure and the human subject; the second is the action of the environment or other external forces on the various forms of the structure, together with the resistance of those forms to the environment or other factors; the third is the interaction among the components of the structure itself. Accordingly, the performance of landscape architecture can also be summarized into three categories, the first of which is spiritual demand (the spatial feeling the space brings to the user).
The most critical step is to know the rules of seed selection and growth [4-5]. The watershed method regards the pixel points of each scene in landscape architecture as coordinates in the whole graph and represents each location with one pixel value. It then proceeds in analogy with a flood: a low-lying place with small pixel values is a plain, a high place is a mountain; within a basin, as the water rises, the lower the terrain, the easier it is to be flooded. After enough water is poured into the area, a depression is filled and an open region forms. However, over-segmentation occurs because of the interference of noisy pixels. In general, the graph-based method transforms the landscape into a grid in the sense of graph theory, treats each pixel point as a node, and calls the connections between these nodes edges. The common way to define an edge weight is to compute the dissimilarity of neighboring pixel points from the correlation of their pixel values, obtaining the weighted graph G = ⟨V, E⟩.
In the partitioned graph G, each independent subgraph is matched with the corresponding partition, which realizes the image segmentation [6-8]. In general, G is a weighted undirected graph, and its weights are defined according to the actual situation. The basic principle of the graph-theoretic method is to cut several edges of G and remove those parts of G that are no longer connected together, thus realizing the partition of G.

2 Related works

Table 1: Summary of related works

Reference | Methods/Algorithm | Merits | Limitations
[9] | Presents the multi-bit Trie-tree technique with non-collision hashing. | Test results indicate a good success rate for data reconstruction and high performance efficiency. | Under this design, the necessary data reconstruction technology has not been developed.
[10] | Survey updating trends in organizations, processes and outcomes for NPD in the United States, performed a little over five years after PDMA's initial best-practices survey. | Best-practice companies have higher expectations for their NPD programs, use more versatile teams, and are more likely to monitor NPD processes and outcomes. | Businesses that do not keep their NPD practices current will find themselves lacking competitiveness and growth.
[11] | Presents an effective hybrid approach for dynamic mesh generation based on Delaunay graph mapping and Radial Basis Functions (RBFs). | The hybrid RBFs-Delaunay graph mapping approach is found to be as accurate and effective as the Delaunay technique in building dynamic meshes for various test scenarios. | Delaunay graph mapping is lacking in efficiency.
[12] | Utilizes a novel two-dimensional modification of hat functions (2D-MHFs) to solve linear Fredholm integral equations. | The method is desirable from a computational standpoint, and its great accuracy is demonstrated by a few numerical examples. | Further information on the error analysis is needed.
[13] | Gives an introduction to genetic-algorithm-based aeronautics design. | The method, particularly when incorporating numerous free design factors and configurable modeling parameters, is more versatile and computationally efficient than the traditional approach. | Only a small number of snapshots are computed via computational fluid dynamics.
[14] | Offers a thorough approach to water and energy operation optimization. | Simulated experiments based on real network data validate the viability of the suggested operation optimization strategy; the comprehensive energy-saving rate reaches 31.3%, effectively lowering system operating costs. | Lacks performance metrics for the suggested operation.

3 Research methods

3.1 Conditional random point partition based on the two-dimensional fractal graph

According to the human landscape of the scene, full use is made of the subjectivity of architectural design to carry out artistic creation of landscape architecture and to visualize its art. The design of this experimental landscape architecture structure is based on a landscape pavilion with a roof and columns where people can watch and chat. The reasons are as follows. First, in landscape architecture works the interaction between people and places is critical, which requires the architectural design to be considered comprehensively according to the designer's own experience and the situation of the site. After a detailed survey of the area, the author divides it into two parts: the natural environment and the cultural environment. The experiment was conducted in a humanized manner centered on the central axis of the Guangzhou center, targeting white-collar workers, tourists and nearby residents. In view of the large urban population and its various groups, the layout of landscape architecture should not only meet people's needs but also take the demands of history, culture and society into account, that is, pay full attention to the structure of landscape architecture and to people's behavior. On this basis, a concept of random spatial structure based on the background, the two-dimensional fractal graph, is proposed.
In the conditional random field formula below (Eq. 1), V represents the set of all landscape-element pixels in the landscape image, Ni represents the set of pixels adjacent to pixel i, usually its four-neighborhood (top, bottom, left and right), L represents the set of categories for landscape-element pixel classification, and xi is the category assigned to pixel i. φi is the potential energy function. The general form of the unary potential energy is the logarithm of the class-likelihood probability of the pixel. The likelihood probability φi(xi) can be learned from the actual pixel features; the most common approach is to train it directly on the pixel values. However, such methods consider only the color of the landscape image pixels, which is not particularly comprehensive. Therefore, color and texture are generally selected together:

φi(xi) = λT φT(xi) + λcol φcol(xi) + λl φl(xi)   (2)

Here φT is the potential energy function trained on texture features, φcol is the potential energy function trained on the value (color) feature, and φl is a potential function defined according to the location characteristics of the pixels. λT, λcol and λl are their respective weights, generally obtained through training.
The binary potential energy is generally defined in the form φij(xi, xj) (3), specifically:

φij(xi, xj) = { 0,      xi = xj
             { g(i,j),  xi ≠ xj   (4)

This method uses each pixel in the background as a node, as shown in Figure 2 below, and uses the corresponding relationships between points as the boundaries.
Then the minimum-energy state of the random field is used to partition the garden landscape.

Figure 2: The most basic representation of the two-dimensional fractal graph algorithm

The energy of the conditional random field in image semantic segmentation is expressed as follows:

E(X) = Σi∈V φi(xi) + Σi∈V, j∈Ni φij(xi, xj),  ∀i, xi ∈ L   (1)

The binary function is generally defined from the relationship between the values of adjacent pixels: for adjacent pixels with the same class the function value is 0; otherwise it is determined according to the function g(i, j).
At present, random field models are used for post-processing to correct the semantics of earlier models; the unary function is usually used to repair and refine the preceding semantic segmentation, making the original semantic segmentation more accurate.

3.2 Description of the algorithm

Effective navigation of the landscape design space is made possible by the 2D fractal graph creation technique, which takes advantage of a hierarchical classification to balance exploration and exploitation across various layers of search agents.
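Eqs. (1)-(4) amount to summing a unary cost per pixel plus a Potts-style penalty g(i, j) on unequal neighboring labels. The sketch below evaluates that energy for a candidate labeling; the 2x2 image, the unary cost table and the constant pairwise cost are illustrative assumptions, not values from the paper:

```python
# Sketch of the CRF energy E(X) = sum_i phi_i(x_i) + sum_{i, j in N_i} phi_ij(x_i, x_j).

def pairwise(xi, xj, g=1.0):
    """Potts-style binary potential of Eqs. (3)-(4): 0 for equal labels, g otherwise."""
    return 0.0 if xi == xj else g

def energy(labels, unary, g=1.0):
    """labels: dict pixel -> class; unary: dict (pixel, class) -> cost.
    Neighbours are the 4-neighbourhood; each edge is counted once (right/down)."""
    h = max(r for r, _ in labels) + 1
    w = max(c for _, c in labels) + 1
    e = sum(unary[(p, labels[p])] for p in labels)       # unary term of Eq. (2)
    for (r, c) in labels:                                # pairwise term of Eq. (4)
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < h and nc < w:
                e += pairwise(labels[(r, c)], labels[(nr, nc)], g)
    return e

# 2x2 image, two classes ("water", "sky"); unary costs stand in for -log likelihoods.
unary = {((0, 0), "water"): 0.2, ((0, 0), "sky"): 1.5,
         ((0, 1), "water"): 0.3, ((0, 1), "sky"): 1.2,
         ((1, 0), "water"): 1.4, ((1, 0), "sky"): 0.4,
         ((1, 1), "water"): 1.1, ((1, 1), "sky"): 0.3}
labeling = {(0, 0): "water", (0, 1): "water", (1, 0): "sky", (1, 1): "sky"}
print(energy(labeling, unary))  # 1.2 unary + 2 cut edges = 3.2
```

Minimizing this energy over all labelings is what "the minimum-energy state of the random field" refers to; here the mixed labeling (3.2) is slightly worse than labeling everything "water" (3.0), because the two cut edges outweigh the unary gain.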
Fractal graphs converge to ideal landscape designs through iterative refinement directed by equations guiding velocity and position updates. The hierarchical classification that the 2D fractal graph introduces improves its effectiveness and efficiency in navigating the intricate landscape design space. Fundamentally, the fractal graph method is a multi-level categorization scheme that groups search agents according to how well they can explore and exploit new areas. Because of its hierarchical structure, the 2D method is able to strike a balance between exploitation, which helps to refine promising solutions, and exploration, which allows the discovery of a variety of landscape design alternatives.
• Updating the 2D search velocity: every search agent's velocity is updated according to its velocity, its position, and the best-known positions of both itself and its neighbors at the same level.
• Updating the 2D search position: each search agent's position is then modified in accordance with its velocity.
• Termination of the optimization: the optimization process is carried out recursively until a termination criterion is satisfied.
• Best-answer extraction: the ideal answer to the landscape design challenge is ultimately the best-known position among all search agents.

Algorithm 1: Two-dimensional fractal graph for landscape architecture planning and design
Initialize:
- Define the 2D landscape planning and design problem.
- Parameters: population size (N), maximum iterations (MaxIter), hierarchical levels (L), weights (w, w_local, w_global), acceleration coefficients (c1, c2), and the classifications.
Generate initial fractal graph:
- Randomly initialize N search agents with positions and velocities within the solution space.
Evaluate stability:
- Evaluate the stability of each search agent using the landscape design objective function.
Main loop:
For iter = 1 to MaxIter:
  Update hierarchical classification:
  - Classify search agents into hierarchical levels based on their stability and exploration-exploitation characteristics.
  For each level L:
    For each search agent in level L:
      Update velocity:
      - Calculate cognitive and social components:
        cognitive_component = c1 * rand() * (p_local - position)
        social_component = c2 * rand() * (p_global - position)
      - Update velocity: velocity = w * velocity + cognitive_component + social_component
      Update position:
      - Update position: position = position + velocity
      Evaluate fitness:
      - Evaluate the stability of the new position using the landscape design objective function.
      Update local best:
      - Update the local best position if the current stability is better than the previous one.
  Update global best:
  - Identify the search agent with the best architectural stability among all levels.
Return the global best as the optimal solution.

3.3 A survey of the computational methods of two-dimensional fractal graphs

Since the emergence of the two-dimensional fractal graph, it has received wide attention for its unique advantages; with the development of the technology, people's understanding of it has become more and more profound, its performance has kept improving, and it has been widely used in many fields. This is especially true for identification, and above all for landscape identification.
A high degree of similarity and convergence allows the landscape design of the environment and the structural form to communicate directly: rather than mechanically or indirectly adapting one to the other, the structural form is integrated into the purposeful behavior of landscape design. Although the message contained in each place is different, its essence is to seek a symbiotic relationship between human beings and nature. Its outer essence is palpable and obvious, for example in the use of local materials and traditional crafts.
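Algorithm 1 is essentially a particle-swarm-style search. A runnable single-level sketch is given below; the hierarchical classification into levels is omitted, and the quadratic test function stands in for the landscape design objective, so this is only an illustration of the velocity and position updates:

```python
import random

def optimize(objective, dim=2, n_agents=20, max_iter=100,
             w=0.7, c1=1.5, c2=1.5, seed=0):
    """Single-level sketch of Algorithm 1: velocity/position updates driven by
    each agent's own best (p_local) and the swarm-wide best (p_global)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    p_local = [p[:] for p in pos]
    f_local = [objective(p) for p in pos]
    g = min(range(n_agents), key=lambda i: f_local[i])
    p_global, f_global = p_local[g][:], f_local[g]

    for _ in range(max_iter):
        for i in range(n_agents):
            for d in range(dim):
                cognitive = c1 * rng.random() * (p_local[i][d] - pos[i][d])
                social = c2 * rng.random() * (p_global[d] - pos[i][d])
                vel[i][d] = w * vel[i][d] + cognitive + social  # velocity update
                pos[i][d] += vel[i][d]                          # position update
            f = objective(pos[i])                               # evaluate "stability"
            if f < f_local[i]:                                  # update local best
                p_local[i], f_local[i] = pos[i][:], f
                if f < f_global:                                # update global best
                    p_global, f_global = pos[i][:], f
    return p_global, f_global

# Stand-in objective with its minimum at (1, 2); not the paper's objective function.
best, val = optimize(lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2)
print(best, val)
```

Termination here is a fixed iteration budget; any of the termination criteria named in the bullet list above could be substituted.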
The internal expression is a spiritual guidance by which designers convey and express the inner meaning of the museum through accurately grasping the characteristics of the scene. Its meaning includes two levels: first, designers use the power of nature to show a landscape structure with deep humanistic characteristics and spiritual emotions, and make its spatial connotation appear and continue according to their own experience. With the addition of the landscape architecture structure, a new artistic spirit and cultural connotation is introduced, so that its symbolism carries the natural artistic spirit of the place. Whatever the meaning, it shows that the landscape architecture structure is a man-made intermediary between people and places, and the essence of its design is to integrate human thought and subjective will into the information of places. Under the guidance of the design, people naturally associate or recall, realize the meaning intended by the designer, and thus resonate with it.
The 2D fractal graph has many advantages; the key points are the weighted averaging and, as in biological neural networks, the reduction of the number of weights, which lowers the difficulty of network modeling. Compared with the conventional 2D fractal graph, it saves a lot of tedious preprocessing, such as reconstructing related data. Among numerous hierarchical networks, the 2D fractal graph is one of the most widely used. The method can reduce the training parameters in advance relative to BP networks, thus greatly improving the performance of the algorithm. With the 2D fractal graph method, the preparation of the input can be shortened effectively, saving the user working time and reducing work pressure. Layer by layer, a new computation is applied and each kind of new data is added to a new system, so the method can be applied in many fields.
The formation of architectural form is mainly an innovative embodiment of the relationship between the spatial structure and the architectural structure of the building site. The landscape structure is a unique landscape architecture structure with a high degree of similarity and convergence.
The 2D fractal graphic method belongs to the category of deep learning; its structural characteristics are similar to those of deep learning, with both locality and hierarchy. The method adopts a kind of supervised training, which lets it extract the information contained in the input data more accurately. This can improve the learning efficiency of the 2D fractal graph. In practical applications the data usually carry no classification marking, so the data are first learned in an unsupervised way according to the characteristics of the label information itself in order to obtain rules, and are then learned with the supervised labeled data; this both gives full play to the samples and copes with the scarcity of labeled information.
Before propagation, the calculation principle is as shown in Figures 1-3. The neural network structure and the logistic model structure are nearly the same in both calculation method and principle; the biggest difference between them is the connection (convolution) structure of the neural network. The forward computation of a three-input network with one hidden layer of three units is:

a(2)1 = f(W(1)11 x1 + W(1)12 x2 + W(1)13 x3 + b(1)1)
a(2)2 = f(W(1)21 x1 + W(1)22 x2 + W(1)23 x3 + b(1)2)
a(2)3 = f(W(1)31 x1 + W(1)32 x2 + W(1)33 x3 + b(1)3)
hW,b(x) = a(3)1 = f(W(2)11 a(2)1 + W(2)12 a(2)2 + W(2)13 a(2)3 + b(2)1)   (6)
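Eq. (6) is the forward pass of a 3-input, 3-hidden-unit, 1-output fully connected network. A numeric sketch follows; the sigmoid activation and the particular weights are illustrative choices, not the paper's parameters:

```python
import numpy as np

def f(z):
    """Activation function f of Eq. (6); the sigmoid is an illustrative choice."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """h_{W,b}(x) for a 3-input, 3-hidden-unit, 1-output network (Eq. 6):
    a2 = f(W1 x + b1), output a3 = f(W2 a2 + b2)."""
    a2 = f(W1 @ x + b1)      # hidden activations a(2)_1..a(2)_3
    return f(W2 @ a2 + b2)   # output a(3)_1

x = np.array([1.0, 0.5, -0.5])
W1 = np.array([[0.2, -0.1, 0.4],
               [0.7, 0.3, -0.2],
               [-0.5, 0.6, 0.1]])
b1 = np.array([0.1, -0.1, 0.0])
W2 = np.array([[0.3, -0.4, 0.8]])
b2 = np.array([0.05])
out = forward(x, W1, b1, W2, b2)
print(out)  # a single value in (0, 1), roughly 0.568 with these weights
```

Each line of Eq. (6) corresponds to one row of the matrix product W1 @ x, and the last line to the single row of W2.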
Suppose the 2D fractal graph algorithm handles data of size h * w * d, where w denotes the width of the image, h its height (the two spatial axes forming a plane), and d the number of color-channel dimensions. The basic network is formed by convolution and pooling layers, and both basic structures take local inputs as their unit. The 2D fractal graph algorithm keeps the structure invariant under image translation; in other words, an element is related only to its spatial position. If the data vector at given coordinates in one layer is X, the data vector Y of the next layer is computed as:

Yij = fks({Xsi+δi, sj+δj}),  0 ≤ δi, δj ≤ k   (7)

In the formula, k is the size of the convolution kernel, s is the stride, and fks denotes the operation adopted by the layer, which in general can be matrix convolution, average pooling, a nonlinear excitation function, or max pooling (computing the maximum value).

Figure 3: Computational architecture and general formula of two-dimensional fractal graphs

As can be seen in Figure 3, the input level feeds a neural network; "+1" refers to a point known as a split point. The system has three kinds of structure: input, output and hidden. Only one unit acts as the output; the input is on the left, the output on the right, and the hidden units are in the center, with the figure showing full connections. The hidden data, including the data on the hidden nodes, cannot be displayed during training.
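Eq. (7) describes convolution and pooling uniformly: each output Y_ij is a function f_ks of the k x k input window taken at stride s. The sketch below instantiates f_ks as max pooling (an assumed choice); swapping the reduction function gives average pooling:

```python
def pool2d(X, k, s, reduce_fn=max):
    """Y_ij = f_ks({X_{si+di, sj+dj} : 0 <= di, dj < k}) from Eq. (7).
    reduce_fn=max gives max pooling; reduce_fn=lambda w: sum(w) / len(w)
    gives average pooling instead."""
    h, w = len(X), len(X[0])
    out_h, out_w = (h - k) // s + 1, (w - k) // s + 1
    return [[reduce_fn([X[s * i + di][s * j + dj]
                        for di in range(k) for dj in range(k)])
             for j in range(out_w)]
            for i in range(out_h)]

X = [[1, 3, 2, 4],
     [5, 6, 1, 2],
     [7, 2, 9, 0],
     [3, 8, 4, 1]]
print(pool2d(X, k=2, s=2))  # [[6, 4], [8, 9]]
```

A convolution layer fits the same template, with the reduction function replaced by a weighted sum against the kernel followed by the nonlinear excitation.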
In the whole network structure, one layer is denoted l; in total, Figure 3 contains three layers, the last of which is the output layer. The parameters can be written as:

(W, b) = (W(1), b(1), W(2), b(2))   (5)

W(l)ij denotes the connection parameter between unit j of layer l and unit i of layer l + 1, and b(l)i denotes the bias of unit i in layer l + 1. The neural network computation uses the units of one layer as the inputs of the next, and the output of the final layer gives the prediction hW,b(x). The specific calculation steps are those of Eq. (6) above.
When the parameters k, s and k', s' satisfy the composition principle, the mutual transformation between the operations can be expressed as:

fks ∘ gk's' = (f ∘ g)k'+(k−1)s', ss'   (8)

When dealing with nonlinear equations, the general 2D fractal graph algorithm very commonly adopts nonlinear filtering, which is fully adopted by the two-dimensional fractal graph. A fully convolutional network can be used to express the two-dimensional fractal graph network; because its output image conforms to its input image, the input size of the convolutional network is not restricted. Knowing the two functions and the set of cases they contain, the network can be used for function prediction, thus producing the final results and output.

3.4 Evaluation indices of the garden landscape-element (jingsu) level and of algorithm performance, based on the two-dimensional fractal graph

The factors influencing a landscape architecture structure are coupled with one another. For example, when the landscape building structure is opened to the light environment, attention must be paid to the structural performance to ensure the stability and constructability of the structure. From this it can be seen that the combined effects of building characteristics and lighting conditions are very complex, and some even contradict each other. As another example, the average-sunshine structure is strongly related to factors such as vertical scale, but in a specific design the coupling of sunlight with the horizontal and vertical scales is difficult to untangle.
Supervised ("monitored") learning is machine learning whose technical feature is the ability to learn the function of a mapping. During training, every sample has a target and a satisfactory expected output, which is what "supervision" means. Supervision refers to the comprehensive analysis of the input data at runtime to obtain the corresponding mapping, yielding a new set of sampled data; data generated afterwards are labeled into these categories. In the algorithms for two-dimensional fractal graphs, guided learning algorithms are usually used for training, and the supervised learning is usually gradient-based (Krizhevsky et al., 2012).
In the semantic classification of various scenes, the classes cannot always be differentiated, so a scene is sometimes classified into another scenario, causing some fuzziness. This paper therefore adopts a new method: the images are compared with the real world, and the classification results obtained are digitally processed as the final result of the judged landscape image. The commonly used segmentation criteria are adopted for the accuracy statistics. Let nij represent the number of pixels of class i judged to belong to class j, ncl the total number of categories, and ti = Σj nij the total number of pixels of class i. The overall accuracy is calculated as:

Σi nii / Σi ti   (9)
The accuracy with which a landscape element (jingsu) belonging to class i is correctly assigned to class i is given by the mean accuracy:

(1/ncl) Σi nii / ti   (10)

The mean IU (average intersection over union) is computed from the intersection of the pixels predicted for a landscape-element category with the pixels of the original category, divided by their union; the final discriminant index can be expressed as:

(1/ncl) Σi nii / (ti + Σj nji − nii)   (11)

Batch stochastic gradient descent methods are commonly used for training. In the learning process of the two-dimensional fractal graph network, we use only one example to simplify the description. The method is divided into two stages, a forward stage and a backward stage. The first stage is carried out layer by layer until the final result is produced; in the second stage, after the forward pass ends, the weights and biases of each level are adjusted according to the error of the output value. If the number of classes is c and a sample is specified during classification, its error function is:

J(W, b; x, y) = (1/2) Σk=1..c (tk − yk)² = (1/2) ||t − y||²   (12)

Here W represents the weights in the neural network, b the biases, x the training sample and y the corresponding label of the training sample; tk denotes the k-th component of the value predicted for the sample, and yk the k-th component of the sample label to be predicted.

4 Result analysis

4.1 The practical application of the two-dimensional fractal graph

After a long period of improvement and development of 2D fractal graph network training, there are two main ways of training, chief among them training in the presence of a monitor (supervised teaching).
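The indices of Eqs. (9)-(11) all follow directly from the confusion counts n_ij. A small sketch, with a made-up two-class confusion matrix rather than the paper's data:

```python
def segmentation_metrics(n):
    """n[i][j]: number of pixels of class i predicted as class j.
    Returns pixel accuracy (Eq. 9), mean accuracy (Eq. 10), mean IU (Eq. 11)."""
    n_cl = len(n)
    t = [sum(row) for row in n]                      # t_i = sum_j n_ij
    pixel_acc = sum(n[i][i] for i in range(n_cl)) / sum(t)
    mean_acc = sum(n[i][i] / t[i] for i in range(n_cl)) / n_cl
    mean_iu = sum(n[i][i] / (t[i] + sum(n[j][i] for j in range(n_cl)) - n[i][i])
                  for i in range(n_cl)) / n_cl       # union = t_i + col_i - n_ii
    return pixel_acc, mean_acc, mean_iu

# Made-up example: 80 of 100 "water" pixels and 60 of 80 "sky" pixels correct.
n = [[80, 20],
     [20, 60]]
print(segmentation_metrics(n))
```

With these counts, the pixel accuracy is 140/180, the mean accuracy is (0.80 + 0.75)/2, and the mean IU is (80/120 + 60/100)/2, matching the three formulas term by term.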
While this paper uses a back-propagation learning algorithm with supervision, the original image and its corresponding manually segmented image are used in modeling the 2D fractal graph network [14]. However, the influence of each factor of the actual landscape architecture structure on the structure is not the same, and each aspect is coupled with the others.
The first step is to calculate the error terms at each level in a certain order. Suppose that the error term δ(l+1) of layer l + 1 has been calculated according to the above formula, and let W denote the weights of this layer and b its bias parameters. If both layers are fully connected, the error term of layer l can be calculated using the following formula:

δ(l) = ((W(l))T δ(l+1)) ⋅ f′(z(l))   (13)

The corresponding gradient calculation formulas are:

∇W(l) J(W, b; x, y) = δ(l+1) (a(l))T
∇b(l) J(W, b; x, y) = δ(l+1)   (14)

If layer l is a feature extraction stage, that is, a convolution layer followed by a subsampling layer, then its error term is calculated by Eq. (15) below.

4.2 Optimization of landscape architecture planning and design and expansion of the two-dimensional fractal graph

Through the detection and classification of 200 pictures, the correctness of the landscape-element values in the garden landscape is obtained, as shown in Table 3 below, which intuitively reflects the performance of three different upsampling structures. The accuracy of the landscape-element values reflects whether specific types of landscape pictures can be correctly identified and divided. The pictures are divided into six categories, namely water scene, landscape scene, living scene, sky scene, architecture and transportation. Water scenes include water, river and mountain.
δ(l)k = upsample((W(l)k)T δ(l+1)k) ⋅ f′(z(l)k)   (15)

Here k indexes the convolution kernel. After a series of runs in the upsampling layer, the resulting error is transmitted to the previous layer, that is, the corresponding convolution layer, through the subsampling layer. If average pooling is used, the sampling layer distributes the error evenly over the units that were pooled before sampling; if max pooling is used, then during error propagation the unit that supplied the maximum in the forward pass receives all of the error, and the remaining units receive 0 [15].
The scenes further include landscape, sky, architecture, traffic, etc. The accuracy of landscape classification shows whether specific types in the landscape have been correctly identified and divided. It can be seen from Table 3 that the classification accuracy of the two-dimensional fractal graph network -32s is the worst among all classifications, while the accuracy of the -8s network is higher than that of the -16s network, which in turn is higher than that of the -32s network. Therefore, the two-dimensional fractal graph network -8s has a good effect on the classification of landscape quality zones in landscape images. On the two-dimensional fractal graph network -8s, the "sky view" class has the best classification rate, although in the real world it has the lowest occurrence rate, only occasionally appearing as something similar to the real scene.

Table 3: Landscape-element (jingsu) classification accuracy of the three structures
Categories | FCN-8s (%) | FCN-16s (%) | FCN-32s (%)
Water surface | 91.89 | 90.32 | 87.85
Mountain | 87.66 | 86.35 | 82.43
Vegetation | 85.94 | 88.28 | 83.12
Sky | 93.56 | 90.65 | 88.45
Building | 88.29 | 87.56 | 84.68
Traffic | 86.12 | 85.08 | 83.28

As shown in Table 3, the classification accuracy of man-made landscape scenery is relatively low, and the classification accuracy of natural landscape in the 2D fractal network -8s is higher. It can also be seen that the two-dimensional fractal graph network -16s has the best overall performance in the classification of habitat water landscapes in landscape images. Finally, the pixel accuracy, average accuracy and average IU of the image pixels are tested; the final values are shown in Table 2.

Table 2: Results of the three kinds of upsampled semantic segmentation on landscape pixels

Scene elements | Pixel accuracy (%) | Average accuracy (%) | Average IU (%)
FCN-8s | 86.25 | 85.98 | 74.33
FCN-16s | 88.97 | 88.58 | 75.35
FCN-32s | 83.06 | 82.75 | 72.69

Table 3 gives the exact classification ratios of the two-dimensional fractal graph networks -8s, -16s and -32s, and Table 2 gives the pixel accuracy, average accuracy and average IU of the three kinds of upsampling. Comparing the three different upsampling modes, the two-dimensional fractal graph net -16s attains the highest pixel accuracy (88.97%), average accuracy (88.58%) and average IU (75.35%) in Table 2, while net -8s leads in most of the per-category accuracies of Table 3.
The landscape planning findings obtained with the 2D fractal network generation algorithm are shown in Figure 4 and Table 4. The assessments are based on a number of factors, such as ecological sustainability, aesthetic appeal, resource efficiency and robustness.
The average accuracy is lower than aesthetic appeal, and resource efficiency are numerical pixel, which is calculated from the data of each values assigned to each design solution, ranging from 0 classification of the image. Too much data will result in to 1, representing the quality of the design in each the average accuracy and the average IU. respective area. The algorithm generates a wide range of solutions every time, which encourages the exploration Table 4: Classification of landscape planning and design of the solution space and makes it possible to identify based on two-dimensional fractal graph generation several different design options. The term "robustness" algorithm describes how stable the solutions produced by the 2D fractal network generation algorithm is under different Classification FCN- FCN- FCN- circumstances. The robustness of all trials shows that the 8s 16s 32s algorithm's results are dependable and consistent in FCN- 8s, FCN-16s, and FCN-32s scenarios. Ecological Sustainability 0.86 0.79 0.72 (0-1) Table 5: overall performance Aesthetic Appeal (0-1) 0.92 0.88 0.81 Algorithm Accuracy Precision Recall Resource Efficiency (0-1) 0.84 0.79 0.82 (%) (%) (%) Robustness 0.94 0.90 0.89 Traditional 86.34 85.12 85.78 Optimization Algorithm Suggested 94.52 93.34 94.18 Algorithm Figure 4: Outcome of landscape planning and design based on two-dimensional fractal graph generation algorithm Research on Optimization Method of Landscape Architecture… Informatica 49 (2025) 53-66 63 balanced and effective distribution of urban space while optimizing the spatial configuration and enabling coordinated growth within its internal areas. By illustrating a progressive decrease in the fractal dimension from the city's center to its periphery, suggesting outward growth, the study contributes to the notion of fractal cities. 
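Pixel accuracy, average (per-class) accuracy, and mean IU, the three aggregate measures reported in Table 2, are all derived from a segmentation confusion matrix. The sketch below illustrates the standard definitions; it is our illustration, not code from the paper:

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = number of pixels of true class i predicted as class j."""
    conf = conf.astype(float)
    tp = np.diag(conf)                                    # correctly labeled pixels per class
    pixel_acc = tp.sum() / conf.sum()                     # overall pixel accuracy
    per_class = tp / conf.sum(axis=1)                     # recall of each class
    iu = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)  # intersection over union per class
    return pixel_acc, per_class.mean(), iu.mean()

# Toy two-class example: 90 and 80 correct out of 100 pixels per class.
conf = np.array([[90, 10],
                 [20, 80]])
pa, avg_acc, mean_iu = segmentation_metrics(conf)
print(round(pa, 4), round(avg_acc, 4), round(mean_iu, 4))  # 0.85 0.85 0.7386
```

Since each class IU is never larger than that class's recall, the mean IU is always at most the average accuracy, which matches the pattern of the values in Table 2.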
Figure 5: Overall performance of the methods.

Landscape architecture planning and design is an academic discipline that focuses on the interaction between human habitation and the natural environment. Table 5 and Figure 5 demonstrate that the recommended 2D fractal graph generation approach achieves the best accuracy (94.52%), precision (93.34%), and recall (94.18%).

5 Discussion

A fractal graph is a complex entity formed by recursive iteration rules; it can be found in both natural and man-made systems, such as cities, and it displays intrinsic self-similarity at both large and small scales. The two-dimensional fractal dimension is a scientific technique for measuring aspects of landscape architecture and their evolution in the context of network systems. Furthermore, it is an important metric for determining whether a city is undergoing self-organizational evolution. Previous research shows that self-organized architectural systems have notable fractal properties that can be measured with two-dimensional fractal graphs, and that areas with mature landscape architecture have larger fractal dimensions than those that are still developing or being built up quickly. Nevertheless, prior research has used these fractal dimensions only to investigate the general fractal properties of an entire city, without carrying out more precise fractal measurements for subzones in various directions and layers.

The existence of a fractal structure, which acts as an analogy to and a supplement of earlier findings on the general fractal laws discovered in other cities, is one of the study's major discoveries. Unlike earlier research, this finding supports the different two-dimensional values that contribute to the spatial variability of the separate subzone structures. The cause is the differing paths taken by the various subzones in terms of planning and development.

6 Conclusion

To address this problem, we adopt a method based on two-dimensional fractal graphs, which partition the landscape architecture image according to its pixel points. Different from the traditional method, the final step transforms the whole connected level into a transition, so that the operational architecture of the two-dimensional fractal graph is preserved. The final image has the same size as the original image, from which the original image can be recovered. Compared with the traditional two-dimensional fractal graph method, the proposed variant has a higher computational speed and is not constrained by the size of the input image. In this paper, a new image semantic partition method is therefore introduced. After establishing a virtual environment, the model is trained on the common SiftFlow benchmark. Image preprocessing is used to augment the data, effectively mitigating overfitting of the model. On this basis, a second round of learning for the segmentation is carried out to shorten the training period of the model.

In the image analysis, three upsampling variants are adopted: the two-dimensional fractal graph networks -32s, -16s, and -8s. Sampling by pixel accuracy, average accuracy and mean IU, the sampling structure of the two-dimensional fractal graph net-8s is selected, with a pixel accuracy of 90.3%, an average accuracy of 88.91%, and a mean IU of 75.83%.
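The fractal dimensions that this discussion relies on are typically estimated by box counting: cover the image with boxes of decreasing size, count the occupied boxes, and fit the slope of log(count) against log(1/size). The following is a minimal NumPy sketch under that assumption, our illustration rather than the paper's algorithm:

```python
import numpy as np

def box_counting_dimension(img, sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of a 2D boolean image."""
    counts = []
    for s in sizes:
        h, w = img.shape
        # Tile the image with s x s boxes and count boxes containing any foreground.
        boxes = img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s)
        counts.append(boxes.any(axis=(1, 3)).sum())
    # Slope of log(count) versus log(1/size) estimates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Sanity checks: a filled square is 2-dimensional, a straight line 1-dimensional.
filled = np.ones((64, 64), dtype=bool)
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True
print(round(box_counting_dimension(filled), 2), round(box_counting_dimension(line), 2))  # 2.0 1.0
```

A shape that grows denser as a city expands occupies more boxes at every scale, which is why the box-counting dimension rises with urban expansion.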
The ways in which people, land, and architecture occupy and use space likewise differ across distinct urban blocks. The box-counting dimension is a measure of spatial occupancy capacity, hence urban expansion will cause it to rise. In other words, the self-organizational objective of urban development is to provide a more balanced and effective use of urban space.

At the same time, the pixel accuracy of the model for each scene type in the landscape image is above 86%, indicating that the method is well suited to landscape images; in particular, for landscape images containing multiple scene types it obtains a higher pixel segmentation accuracy. In this paper, the classification experiment on landscape architecture scenes is carried out with an accuracy of 94.52%.

Declaration statement

Ethics approval and consent to participate: I confirm that all the research meets ethical guidelines and adheres to the legal requirements of the study country.

Consent for publication: I confirm that any participants (or their guardians if unable to give informed consent, or next of kin, if deceased) who may be identifiable through the manuscript (such as a case report) have been given an opportunity to review the final manuscript and have provided written consent to publish.
Availability of data and materials: The data used to support the findings of this study are available from the corresponding author upon request.

Competing interests: We have no conflicts of interest to declare. All authors have seen and agree with the contents of the manuscript, and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

Funding: Shanxi Province Project: National Education Research Letter 2021, "Rural Revitalization": Research on Teaching Innovation of Traditional Culture Protection and Tourism Planning in the Background, JGCY2693.

Authors' contributions: All authors contributed to the study conception and design. All authors read and approved the final manuscript.
https://doi.org/10.31449/inf.v49i16.5869 Informatica 49 (2025) 67–76

Enhanced COVID-19 Detection Through Combined Image Enhancement and Deep Learning Techniques

Abderrazak Benchabane*, Fella Charif
Department of Electronics and Telecommunications, University of Kasdi Merbah, Ouargla, Algeria
E-mail: benchabane.abderrazak@univ-ouargla.dz, cherif.fella@univ-ouargla.dz
*Corresponding author

Keywords: COVID-19, image enhancement, chest x-ray images, deep learning

Received: March 6, 2024

The rapid spread of COVID-19 has highlighted the need for automated patient data analysis to enable faster and more accurate diagnosis. Using pre-trained deep learning models on X-ray images has shown potential for effective COVID-19 detection. However, the performance of these models is highly dependent on the quality and quantity of training data. To address these challenges, enhancing the visual quality of X-ray images is critical for reliable virus detection. This study evaluates and combines three image enhancement techniques, namely Histogram Equalization, Contrast-Limited Adaptive Histogram Equalization (CLAHE), and Gamma Correction, to determine the optimal approach for improving detection accuracy. A dataset comprising 125 chest X-ray images from COVID-19-positive patients and 500 images from non-COVID-19 cases was used. The images were preprocessed using the enhancement techniques, and the enhanced datasets were employed to train ResNet50 and DenseNet201 models. Simulation results demonstrate that enhanced images consistently yield higher detection accuracy than unenhanced images. Among the techniques tested, combining Histogram Equalization, CLAHE, and Gamma Correction with the DenseNet201 model achieved the highest performance, attaining a remarkable accuracy of 99.03%. This outperforms previous methods, including the DarkCovidNet model, which achieved an accuracy of 98.08% on the same dataset.
Povzetek: Avtorja sta izboljšala zaznavanje COVID-19 iz rentgenskih slik prsnega koša z uporabo tehnik izboljšave slike (Histogram Equalization, CLAHE, Gamma Correction) v kombinaciji z modeli globokega učenja (ResNet50, DenseNet201).

1 Introduction

Corona disease is currently considered one of the most widespread, dangerous and fastest-spreading diseases, so it is necessary to find ways to detect and diagnose infected cases as quickly and clearly as possible. RT-PCR is a nuclear-derived technique that detects the presence of genetic material specific to a pathogen, including a virus. A formal diagnosis of COVID-19 requires a laboratory test (RT-PCR) of nose and throat samples and takes at least 24 hours to produce a result. Nowadays, medical images and computerized analysis have become very important tools for medical diagnosis and disease detection [1]. Radiology images show typical COVID-19 pneumonia in the lungs and the numerous complications that the virus causes in the body.

The radiology imaging modalities include computed tomography (CT), radiograph X-rays, ultrasound, echocardiograms and magnetic resonance imaging (MRI). These imaging modalities optimize and greatly facilitate the process of discovering affected areas in the body [2]. Chest X-ray tests are easily available and carry a low risk of radiation. CT scans, on the other hand, carry a high risk of radiation, are expensive, require clinical expertise to handle and are non-portable. This makes the use of X-ray scans more convenient than CT scans. A radiograph is obtained by exposing a film to X-rays that have passed through the human body. The result is an analog image which is often sufficient to obtain a reliable diagnosis and allows low-cost screening. Various studies have indicated the failure of CXR imaging in diagnosing COVID-19 and differentiating it from other types of pneumonia [3], and the radiologist cannot use X-rays to detect pleural effusion and determine the volume involved. However, regardless of the low accuracy of X-ray diagnosis of COVID-19, it remains widely used. To overcome the limitations of COVID-19 diagnostic tests using radiological images, various studies have been conducted on the use of deep learning (DL) in the analysis of radiological images [2-11]. It has also been shown that image enhancement techniques can significantly improve classification performance [12, 13].

1.1 Contribution

In this paper, we investigate the impact of using image enhancement techniques as a preprocessing step to improve the accuracy of convolutional neural network (CNN) models for COVID-19 detection. Specifically, histogram equalization, Contrast Limited Adaptive Histogram Equalization (CLAHE), and gamma correction were applied to enhance chest X-ray (CXR) images before training. The enhanced images significantly improved the visibility of key diagnostic features, such as ground-glass opacities and consolidations, which are critical for accurate COVID-19 diagnosis. The proposed preprocessing pipeline was evaluated on a challenging COVID-19 dataset with an imbalanced number of samples for the COVID and non-COVID classes. Experimental results demonstrated that the enhanced images led to a notable improvement in the classification performance of CNN models, achieving higher accuracy, sensitivity, and specificity compared to using raw images.

The rest of the paper is organized as follows. The Materials and Methods section details the proposed technique, along with some context about the state-of-the-art models that we have used. The Results and Discussion sections present the experimental results, including classification accuracy, sensitivity, and F1-score. The paper closes with a conclusion.

1.2 Related works

Numerous studies have applied advanced artificial intelligence (AI) techniques, particularly deep learning (DL) and machine learning (ML), to detect COVID-19 using X-ray images. Zhang et al. [14] developed an anomaly detection algorithm with EfficientNet for multiclass classification, achieving an accuracy of 72.77% on 43,370 samples. Deng et al. [15] employed models such as SVM, CNN, ResNet50, InceptionNetV2, Xception, and VGG16 to assess health status through X-ray imaging, obtaining an accuracy of 84% using 5,857 samples. Wang et al. [16] introduced a COVID-19 X-ray image detection model based on the multi-head self-attention mechanism and a residual neural network, achieving 95.52% accuracy with 5,173 samples.

Transfer learning has also played a pivotal role in COVID-19 detection. Apostolopoulos et al. [17] utilized pre-trained models like VGG19, Inception ResNet v2, and MobileNet v2, achieving 96.78% accuracy on 1,427 samples for COVID-19 classification. Mahmoud et al. [18] applied the CovXNet architecture, achieving 97.4% accuracy on 610 samples. Mohit Kumar et al. [19] utilized a hybrid deep learning approach for multiclass classification, achieving 98.20% accuracy on 6,000 samples.

Several studies focused on binary classification tasks with high accuracy. Guefrechi et al. [20] achieved 97.20% accuracy using deep learning methods on 5,000 images. Feki et al. [21] employed a deep CNN model for binary classification, reaching an accuracy of 95.30% on 216 images. Mohan et al. [22] used a hybrid deep transfer learning CNN model, achieving 92% accuracy with 9,220 images. Malik et al. [23] applied deep neural networks for multiclass classification, attaining 98.45% accuracy on 10,017 images. Gulmez [24] explored Xception and genetic algorithms for multiclass classification, reporting an accuracy of 92.4% on 1,251 images. Lastly, Zakariya et al. [25] proposed combining the Xception, VGG-16, and VGG-19 models, achieving an accuracy of 97.91% using 964 images.

Table 1 provides a summary of various research studies focusing on state-of-the-art models for COVID-19 detection using AI and ML techniques.

Table 1: Summary of related works on COVID-19 detection

  Source   Method/Model                                   Samples used   Accuracy (%)
  [14]     EfficientNet                                   43,370         72.77
  [15]     SVM, CNN, ResNet50, Xception, VGG16            5,857          84.00
  [16]     MHSA-ResNet neural network model               5,173          95.52
  [17]     VGG19, Inception ResNet v2, MobileNet v2       1,427          96.78
  [18]     CovXNet                                        610            97.40
  [19]     Hybrid deep learning approach                  6,000          98.20
  [20]     Deep learning (ResNet50)                       5,000          97.20
  [21]     Deep CNN (Centralized-ResNet50)                216            95.30
  [22]     Deep transfer learning                         9,220          92.00
  [23]     Deep neural networks                           10,017         98.45
  [24]     Xception and genetic algorithm                 1,251          92.40
  [25]     Xception + VGG-16 + VGG-19                     964            97.91

2 Materials and methods

2.1 Dataset generation

The dataset of chest X-ray images used in this paper for classifying negative and positive COVID-19 cases is available at https://github.com/muhammedtalo/COVID-19. It contains 125 chest X-ray images of patients infected with the virus and 500 chest X-ray images of non-COVID-19 cases. The data is divided into two classes; 50% of the images were used for training and 50% for testing. Figure 1 shows some samples used in our simulation [6].
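The paper does not specify the split beyond 50/50 per class. The sketch below assumes a stratified per-class halving, which yields the 312-image test set referenced later in the results (62 COVID, 250 non-COVID); it is our illustration, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_half_split(labels):
    """Shuffle each class and put half of it in the test set, half in training."""
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        half = len(idx) // 2
        test.extend(idx[:half])   # floor half: 62 of the 125 COVID images
        train.extend(idx[half:])
    return np.sort(train), np.sort(test)

# 125 COVID-positive (label 1) and 500 non-COVID (label 0) images.
labels = np.array([1] * 125 + [0] * 500)
train_idx, test_idx = stratified_half_split(labels)
print(len(train_idx), len(test_idx))  # 313 312
```

Stratifying the split keeps the class imbalance of the dataset identical in the training and test halves, which matters when one class has only 125 samples.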
Figure 1: Samples of chest X-ray images from the dataset.

2.2 Image enhancement techniques

Image enhancement is a very important task in image pre-processing. Its aim is to improve the visual details of an image or to provide a transformed representation suitable for use in different fields [4, 11]. In this paper, we have considered the following enhancement techniques.

2.2.1 Histogram equalization

Histogram Equalization (HE) is a technique for adjusting the contrast of an image using the image's histogram. The goal of histogram equalization is to obtain a uniform histogram, which improves contrast [13].

2.2.2 Contrast limited adaptive histogram equalization

Contrast Limited Adaptive Histogram Equalization (CLAHE) was originally developed for the enhancement of low-contrast medical images. The CLAHE algorithm creates non-overlapping contextual regions (also called sub-images, tiles or blocks), applies histogram equalization to each contextual region, clips the original histogram at a specific value, and then redistributes the clipped pixels over the gray levels. The clipping level determines how much noise in the histogram should be smoothed and hence how much the contrast should be enhanced [13].

2.2.3 Gamma correction

Gamma Correction (GC) is a nonlinear adjustment applied to every pixel value, improving the image through the relationship between each pixel value and the gamma parameter. To calculate the gamma correction, the normalized input value is raised to the power of the inverse gamma. The formula is as follows [13]:

  I_out = 255 * (I_in / 255)^(1/γ)    (1)

Values of γ < 1 shift the image towards the darker end of the spectrum, while γ > 1 makes the image appear lighter; γ = 1 has no effect on the input image.

The application of histogram equalization (HE), Contrast Limited Adaptive Histogram Equalization (CLAHE), gamma correction, and their combination significantly improves the quality of COVID-19 X-ray images, aiding in better feature visualization and extraction (see Figure 2). Histogram equalization enhances global contrast, making subtle abnormalities more visible, while CLAHE adaptively improves local contrast, preserving fine details and reducing noise amplification. Gamma correction adjusts image brightness non-linearly, enhancing low-intensity features like ground-glass opacities. When combined, these techniques provide a comprehensive enhancement by leveraging global and local adjustments, ultimately producing images with improved visibility of critical diagnostic features. This preprocessing step enhances the performance of convolutional neural networks (CNNs) by supplying higher-quality inputs, resulting in superior COVID-19 detection accuracy and robustness.

Figure 2: X-ray image processed with various image enhancement techniques.

2.3 Pre-trained CNN

Two different CNN models (ResNet50 [26] and DenseNet201 [27]) were compared separately using eight different image enhancement configurations for the classification of COVID-19 and non-COVID images, to investigate the effect of image enhancement on COVID-19 detection.

2.4 Performance metrics

In order to evaluate the performance of each deep learning model, the following metrics were applied in this study [13, 28]:

  Accuracy    = (TP + TN) / (TP + TN + FP + FN)    (2)
  Sensitivity = TP / (TP + FN)                     (3)
  Specificity = TN / (TN + FP)                     (4)
  F1-Score    = 2TP / (2TP + FP + FN)              (5)

where:
  True Positive (TP): the prediction is COVID and the image is COVID.
  True Negative (TN): the prediction is non-COVID and the image is non-COVID.
  False Positive (FP): the prediction is COVID and the image is non-COVID.
  False Negative (FN): the prediction is non-COVID and the image is COVID.

2.5 Methodology

Firstly, we train and test two pre-trained convolutional neural networks with the original CXR images; then we repeat the same operation with the same images enhanced by the techniques cited above. The major experiments carried out in this study combine the enhancement methods: HE and CLAHE; CLAHE and GC; HE and GC; and finally CLAHE, HE and GC (see Figure 2). For each combination, we compute the four performance metric rates. Training additionally used data augmentation, including random scaling in the range (0.5, 1.1). The detailed methodology adopted in the study is shown in Figure 3.

Figure 3: Flowchart of the proposed method.
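The two simplest techniques, HE and the gamma mapping of Eq. (1), can be sketched with NumPy alone, as below; CLAHE is more involved and in practice would come from a library implementation such as OpenCV's createCLAHE. This is our illustration, not the authors' code:

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization for an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map each gray level through the normalized cumulative histogram.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]

def gamma_correct(img, gamma):
    """Eq. (1): I_out = 255 * (I_in / 255)**(1/gamma)."""
    return np.round(255.0 * (img / 255.0) ** (1.0 / gamma)).astype(np.uint8)
```

As the text states, gamma > 1 brightens the image (mid-gray 64 maps to about 128 for gamma = 2) while gamma < 1 darkens it, and equalization stretches the used gray levels over the full 0-255 range.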
3 Results

Firstly, the X-ray images were enhanced using the different techniques mentioned above. The databases are formed either from the original images (without enhancement), from images enhanced by a single technique (HE, CLAHE, or GC), or from a combination of these, giving eight databases in total. Secondly, we train the two pre-trained networks, ResNet50 and DenseNet201, to detect COVID-19 in chest X-ray scan images. The last fully connected layer of each pre-trained network was modified to classify two classes: COVID-19 positive and negative. For both pre-trained networks, the learning rate was set to 0.0003, and the validation frequency was set to every 5 steps to track model performance. The maximum number of epochs was limited to 6, and the minimum batch size was set to 10. The Adam optimizer and the cross-entropy loss function were chosen. Additionally, data augmentation techniques were applied, including random rotations in (-10, 10), random horizontal and vertical shifts in (-30, 30), and random scaling. Performance metrics were evaluated over 10 repeated cross-validation runs, each processing randomly selected image sets for training and testing.

3.1 Results without enhancement

In this part, we train and test the two networks using images without any enhancement. DenseNet201 and ResNet50 achieved accuracies of 98.08% and 98.35%, respectively. The confusion matrices constructed from the test evaluation results are shown in Figure 4.

Figure 4: Confusion matrices for DenseNet201 and ResNet50 without enhancement.

3.2 Results with enhancement

The confusion matrices in Figure 5 illustrate the performance of the DenseNet201 and ResNet50 models with different image enhancement techniques. In the case of DenseNet201 using the CLAHE and GC techniques, a high accuracy of 99.68% was achieved, with only 1 misclassification out of 312 samples. The model demonstrates perfect recall (100%) for detecting COVID cases and a very high precision of 98.4%, indicating its ability to correctly identify COVID with minimal false negatives. Similarly, for non-COVID cases, the recall and precision are near-perfect at 99.6% and 100%, respectively, showing excellent discrimination between the classes.

Figure 5: Confusion matrices for DenseNet201 and ResNet50 with enhancement.

On the other hand, the ResNet50 model, utilizing HE, CLAHE, or a combination of CLAHE with HE or GC, demonstrates excellent performance with an accuracy of 99.36%. Out of 312 samples, the model correctly classifies 62 COVID cases and 248 non-COVID cases, with only 2 misclassifications: 2 false positives (non-COVID predicted as COVID) and no false negatives. The recall for COVID detection is perfect at 100%, while the precision is slightly lower at 96.9% due to the false positives. These preprocessing techniques enhance image contrast and normalize brightness, aiding the model's ability to discriminate between the classes effectively.

4 Discussion

The performance metrics reported in Tables 2 and 3 provide a comprehensive comparison of the different image enhancement techniques applied to the ResNet50 and DenseNet201 models. The analysis includes accuracy, sensitivity, specificity, and F1-score, along with their respective standard deviations.

From Table 2, it is evident that histogram equalization (HE) and contrast-limited adaptive histogram equalization (CLAHE) generally improve the classification performance of ResNet50 compared to the original images. The highest accuracy (98.17% ± 0.64) is achieved using the HE enhancement technique, with an F1-score of 98.86% ± 0.40. CLAHE also performs well, achieving an accuracy of 98.07% ± 0.93 and the highest F1-score of 98.81% ± 0.57. This indicates that contrast enhancement techniques effectively highlight important features in the images, leading to improved model performance. However, combinations of enhancement techniques such as HE+GC and CLAHE+GC do not consistently outperform the individual techniques. For instance, HE+CLAHE+GC results in a lower accuracy (97.43% ± 0.56) than HE alone, though it provides the highest specificity (93.38% ± 4.46). The standard deviation (SD) values suggest that these combined methods may introduce more variability in performance, as seen in the specificity values.

Table 3 demonstrates that the DenseNet201 model generally exhibits higher accuracy than ResNet50 for most enhancement techniques. The best performance is achieved using the combined HE+CLAHE+GC technique, yielding an accuracy of 99.03% ± 0.54 and an F1-score of 99.39% ± 0.34. This suggests that DenseNet201 is better at leveraging the enhanced features provided by multi-enhancement approaches. Among individual techniques, HE and CLAHE both result in comparable accuracy (97.40% ± 0.82 and 97.30% ± 1.13, respectively), with CLAHE producing a slightly higher F1-score of 98.32% ± 0.71. The GC technique results in relatively lower specificity (90.32% ± 2.94) compared to other methods, indicating that while it improves sensitivity, it may not be as effective in distinguishing negative cases.

Comparing the two models, DenseNet201 consistently outperforms ResNet50 across all enhancement techniques, with higher accuracy and F1-score values. The sensitivity of DenseNet201 is slightly lower in some cases but remains competitive. Specificity improvements are more pronounced in DenseNet201, which indicates better handling of false positives. Regarding variability, DenseNet201 exhibits lower standard deviations in most metrics, particularly in accuracy and F1-score, suggesting more stable and reliable performance across the different enhancement techniques. Conversely, the ResNet50 model experiences greater variability, especially in the specificity values. The results highlight the importance of image enhancement techniques in improving deep learning model performance. While individual techniques such as HE and CLAHE provide significant improvements, combining multiple techniques can further enhance performance, particularly for DenseNet201.

In addition, when exploring the confidence intervals through the error bars in Figure 6, it can be seen that the DenseNet201 model generally achieves higher accuracy and F1-scores, particularly with the combination of the HE, CLAHE and GC techniques, while ResNet50 shows better specificity for techniques like GC and HE+GC. Sensitivity remains high for both models, with overlapping confidence intervals indicating comparable performance.
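The figures quoted above follow directly from the confusion-matrix counts via Eqs. (2)-(5). A small illustrative check using the 312-image ResNet50 example (TP = 62, TN = 248, FP = 2, FN = 0):

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. (2)-(5): accuracy, sensitivity (recall), specificity, and F1-score."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)       # recall on the COVID class
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1

# ResNet50 example from the text: 62 COVID and 248 non-COVID correct,
# 2 false positives, no false negatives, 312 test images in total.
acc, sens, spec, f1 = classification_metrics(tp=62, tn=248, fp=2, fn=0)
print(f"{acc:.2%} {sens:.2%} {spec:.2%}")  # 99.36% 100.00% 99.20%
# Precision = tp / (tp + fp) = 62/64, i.e. the 96.9% quoted in the text.
```

Note that with this heavy class imbalance, two false positives cost only 0.6 points of accuracy but 0.8 points of specificity and 3.1 points of precision, which is why the paper reports all four metrics.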
Table 2: Mean and standard deviation (SD) of performance metrics for the ResNet50 model

  Enhancement    Accuracy (%)    Sensitivity (%)   Specificity (%)   F1-Score (%)
  technique      Mean   SD       Mean   SD         Mean   SD         Mean   SD
  Original       96.60  0.52     98.88  1.20       87.41  5.67       97.90  0.31
  HE             98.17  0.64     99.40  0.54       93.22  3.20       98.86  0.40
  CLAHE          98.07  0.93     99.80  0.38       91.12  4.04       98.81  0.57
  GC             97.78  0.99     98.40  1.46       95.32  2.68       98.61  0.63
  HE+GC          97.62  0.50     98.96  0.82       92.25  3.46       98.53  0.31
  HE+CLAHE       98.01  0.78     99.44  0.84       92.25  2.49       98.77  0.49
  CLAHE+GC       97.75  0.89     98.96  1.16       92.90  3.58       98.60  0.56
  HE+CLAHE+GC    97.43  0.56     98.44  1.36       93.38  4.46       98.40  0.36

Table 3: Mean and standard deviation (SD) of performance metrics for the DenseNet201 model

  Enhancement    Accuracy (%)    Sensitivity (%)   Specificity (%)   F1-Score (%)
  technique      Mean   SD       Mean   SD         Mean   SD         Mean   SD
  Original       96.95  1.08     98.24  0.92       91.77  4.95       98.10  0.66
  HE             97.40  0.82     98.48  1.39       93.06  4.56       98.38  0.51
  CLAHE          97.30  1.13     98.16  1.15       93.87  4.15       98.32  0.71
  GC             97.46  0.81     99.24  0.66       90.32  2.94       98.43  0.50
  HE+GC          97.62  1.41     99.08  1.10       91.77  5.01       98.52  0.87
  HE+CLAHE       98.75  0.63     99.68  0.52       95.00  2.89       99.22  0.38
  CLAHE+GC       98.87  0.83     99.64  0.66       95.80  2.17       99.30  0.51
  HE+CLAHE+GC    99.03  0.54     99.56  0.63       96.93  2.45       99.39  0.34

ResNet50 exhibits greater variability across most metrics, whereas DenseNet201 provides more stable results. In addition, exploring the confidence intervals through the error bars in Figure 6 shows that the DenseNet201 model generally outperforms ResNet50, particularly in sensitivity and F1-score, with significant improvements observed when using the combined enhancement techniques. However, DenseNet201 exhibits larger confidence intervals for some metrics, indicating greater variability in its performance there, whereas ResNet50 shows more consistent results with narrower confidence intervals, especially in specificity. This variability suggests that while DenseNet201 may achieve higher performance, its predictions can be less stable across trials or datasets.
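The mean ± SD entries above summarize the 10 cross-validation runs. Assuming the error bars in Figure 6 are normal-approximation confidence intervals over those runs (the paper does not state the exact construction), they can be computed as in this illustrative sketch:

```python
import math

def mean_sd_ci(values, z=1.96):
    """Mean, sample SD, and half-width of a normal-approximation 95% CI."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd, z * sd / math.sqrt(n)

# Ten hypothetical per-run accuracies for one enhancement setting.
runs = [98.2, 97.5, 98.9, 98.0, 97.8, 98.4, 97.9, 98.1, 98.6, 97.7]
mean, sd, half = mean_sd_ci(runs)
```

With n = 10 runs, the CI half-width is roughly 0.62 times the SD, so two settings whose tabulated means differ by less than that will show overlapping error bars, as the text observes for sensitivity.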
However, DenseNet201 Enhanced COVID-19 Detection Through Combined Image… Informatica 49 (2025) 67–76 73 Figure 6: Confidence intervals for performance metrics of COVID-19 detection models. The radar chart shown in Figure (7) compares the models. They achieved 96.1% accuracy on a dataset area under the ROC curve (AUC-ROC) for the two deep containing 3141 chest X-ray images. Ozturk et al. [5] learning models across the different image preprocessing have used the DarkCovidNet and the same dataset used techniques. It can be seen that DenseNet201 generally in this paper. They achieved an accuracy of 98.08 %. demonstrates higher AUC-ROC values compared to Purohit ResNet50 across most preprocessing techniques, et al. [6] have used a Convolutional Neural Network with particularly in combinations involving multiple augmented data to increase the dataset. They have enhancements like HE+Clahe+GC and Clahe+GC. achieved 99.44 % accuracy. However, ResNet50 shows comparable performance in In addition, the proposed enhancement methods cases such as HE and GC. The results indicate that show significant improvements in accuracy compared to preprocessing techniques significantly impact model the models using other data sets. The DenseNet201model performance, with DenseNet201 being more responsive achieved an accuracy of 99.67%, outperforming models to enhancements. This suggests that model selection and like Feki et al. [21], which reported 95.3% accuracy preprocessing strategy should be carefully considered to using Deep CNN, and Guefrechi et al. [20], which optimize classification performance based on the desired achieved 97.20% with a deep learning approach. evaluation metric. Similarly, ResNet50 demonstrated high performance with an accuracy of 99.35%, surpassing Apostolopoulos et al. [17] (96.78%) and Mahmoud et al. [18] (97.40%). Furthermore, Deng et al. [15] achieved a maximum accuracy of 84.0%, showcasing the substantial improvement offered by the proposed methods. 
These results demonstrate that the proposed models consistently achieve higher accuracy, emphasizing their reliability and effectiveness in improving COVID-19 detection compared to the previously established state-of-the-art models. 6 Conclusion This paper investigates how image enhancement techniques can improve the performance of pre-trained neural networks when working with limited data. Two Figure 7: Radar plot of the area under the ROC curve pre-trained convolutional neural networks, ResNet50 and (AUC-ROC ) Performance for ResNet50 and DenseNet201, were selected for COVID-19 detection. DenseNet201 with Various Image Preprocessing The training set was constructed by applying various Techniques. enhancement techniques to chest X-ray images, which highlight critical structures such as lung opacities and 5 Comparison with the state-of-the- consolidations—key features for accurate COVID-19 art CNN approaches diagnosis. The results demonstrate that COVID-19 detection accuracy is significantly improved when using To evaluate the proposed method, we made a comparison enhanced images compared to non-enhanced ones for with some existing models using COVID-19 X-ray both pre-trained networks. images. Narin et al. [7] have proposed three different DL 74 Informatica 49 (2025) 67–76 A. Benchabane et al. Based on metrics such as accuracy, sensitivity, cases. When compared to previous studies using the specificity, and F1-score, the best-performing model was same dataset, DenseNet201 outperforms DarkCovidNet, DenseNet201, achieving an accuracy of 99.67%, which achieved 98.08%, underscoring the effectiveness sensitivity of 100%, specificity of 98.38%, and an F1- of these enhanced models on real-world X-ray images. 
score of 99.80% for classifying positive and negative Table 4: Results comparison with related works on COVID-19 detection Sources Method/Model Samples Accuracy (%) [7] Inception V3, ResNet50, Inception-ResNet V2 3141 96.1 [5] DarkCovidNet 625 98.08 [6] CNN 1072 99.44 [15] SVM, CNN, ResNet50, Xception, VGG16 5857 84.00 [17] VGG16, VGG19, ResNet, DenseNet, InceptionV3 1427 96.78 [18] COVID-Net 610 97.40 [20] Deep Learning 5000 97.20 [21] Deep CNN 216 95.30 [29] End-to-end CNN 5184 95.70 ResNet50 with HE 625 99.36 Proposed DenseNet201with Clahe+HE+GC 625 99.67 Images Using Multi-image Augmented Deep References Learning Model. Advances in Intelligent Systems and Computing, 1412:395–413, 2022. [1] Janko V, Slapničar G, Dovgan E, Reščič N, Kolenik http://dx.doi.org/10.1007/978-981-16-6890-6_30 T, Gjoreski M, Smerkol M, Gams M, Luštrek M. [7] Narin A, Kaya C, Pamuk Z. Automatic Detection of Machine Learning for Analyzing Non- Coronavirus Disease (COVID-19) Using X-ray Countermeasure Factors Affecting Early Spread of Images and Deep Convolutional Neural Networks, COVID-19. International Journal of Environment Pattern Analysis and Applications 24(3):1207– Research and Public Health, 18(13):6750, 2021. 1220, 2020. https://doi.org/10.3390/ijerph18136750 https://doi.org/10.1007/s10044-021-00984-y [2] Rajkumar S, Rajaraman PV, Meganathan HS, [8] Sarki R, Ahmed K, Wang H, Zhang Y, Wang K. Sapthagirivasan V, ejaswinee V, Ashwin R. Automated detection of COVID-19 through COVID-detect: a Deep Learning Approach for convolutional neural network using chest x-ray Classification of Covid-19 Pneumonia From Lung images. PLoS ONE 17(1), 2022. Segmented Chest Xrays. Biomedical Engineering: https://doi.org/10.1371/journal.pone.0262052 Applications, Basis and Communications 33(2), [9] Masud, M. A light-weight convolutional 2021. https://doi.org/10.4015/S1016237221500101 Neural Network Architecture for classification [3] Gams M, Kolenik, T. Relations between of COVID-19 chest X-Ray images. 
Multimedia Electronics, Artificial Intelligence and Information Systems, 28:1165–1174, 2022. Society through Information Society Rules. https://doi.org/10.1007/s00530-021-00857-8 Electronics, 10(4), 514, 2021. [10] Ravi, V, Narasimhan, H, Chakraborty, C et al. Deep https://doi.org/10.3390/electronics10040514 learning-based metaclassifier approach for COVID- [4] Tahir A, Qiblawey Y, Khandakar A, Rahman T, 19 classification using CT scan and chest X-ray Khurshid U, Musharavati F, Islam MT, Kiranyaz S, images. Multimedia Systems, 28:1401–1415, 2022. Al-Maadeed S, Chowdhury MEH. Deep Learning https://doi.org/10.1007/s00530-021-00826-1 for Reliable Classification of COVID-19, MERS, [11] Asif S, Zhao M, Tang F et al. A deep learning- and SARS from Chest X-ray Images. Cognitive based framework for detecting COVID-19 patients Computation. 14:1752-1772, 2022. using chest X-rays. Multimedia Systems, 28:1495– https://doi.org/10.1007/s12559-021-09955-1 1513, 2022. [5] Ozturk T, Talo M, Yildirim EA, Baloglu UB, https://doi.org/10.1007/s00530-022-00917-7 Yildirim O, Acharya UR. Automated detection of [12] Tahir A, Qiblawey Y, Khandakar A, Rahman T, covid-19 cases using deep neural networks with x- Khurshid U, Musharavati F, Islam MT, Kiranyaz S, ray images, Computers in Biology and Medicine. Al-Maadeed S, Chowdhury MEH. Exploring the 121(103792), 2020. effect of image enhancement techniques on https://doi.org/10.1016/j.compbiomed.2020.103792 COVID-19 detection using chest X-ray images. [6] Purohit K, Kesarwani A, Ranjan Kisku D, Dalui M. Computers in Biology and Medicine,132, 2021. COVID-19 Detection on Chest X-Ray and CT Scan https://doi.org/10.1016/j.compbiomed.2021.104319 Enhanced COVID-19 Detection Through Combined Image… Informatica 49 (2025) 67–76 75 [13] Kandhway P, Bhandari AK, Singh A. A novel from Multiple Chest Diseases Using Xrays. reformed histogram equalization based medical Sensors, 23(2):743, 2023. 
image contrast enhancement using krill herd https://doi.org/10.3390/s23020743 optimization, Biomedical Signal Processing and [24] Gulmez, B. A novel deep neural network model Control, 56 (101677), 2020. based Xception and genetic algorithm for detection https://doi.org/10.1016/j.bspc.2019.101677 of COVID-19 from Xray images. Annals of [14] Zhang J, Xie Y, Pang G, Liao Z, Verjans J, Li W, Operations Research, 328:617–641, 2022. Sun Z, He J, Li Y, Shen C et al. Viral Pneumonia https://doi.org/10.1007/s10479-022-05151-y Screening on Chest X-Rays Using Confidence [25] Zakariya A, Oraibi SA. Efficient COVID-19 Aware Anomaly Detection. IEEE Transaction on Prediction by Merging Various Deep Learning Medical Imaging, 40(3): 879–890, 2021. Architectures, Informatica 48(5): 55–62, 2024. https://doi.org/10.1109/tmi.2020.3040950 https://doi.org/10.31449/inf.v48i5.5424 [15] Deng X, Shao H, Shi L, Wang X, Xie T. A [26] He K, Zhang X, Ren S, Sun J. Deep residual classification–detection approach of COVID-19 learning for image recognition, Proceedings of the based on chest X-ray and CT by using keras IEEE Conference on Computer Vision and Pattern pretrained deep learning models. Computer Recognition.2016. Modeling in Engineering & Sciences. 125(2):579– https://doi.org/10.1109/CVPR.2016.90 596, 2020. [27] Huang G, Liu Z, Van Der Maaten L, Weinberger https://doi.org/10.32604/cmes.2020.011920 KQ. Densely connected convolutional networks, [16] Wang Z, Zhang K, Wang B. Detection of COVID- Proceedings of the IEEE Conference on 19 Cases Based on Deep Learning with X-ray Computer Vision and Pattern Recognition. Images. Electronics. 11(21):3511, 2022. 2017. https://doi.org/10.3390/electronics11213511 https://doi.ieeecomputersociety.org/10.1109/CVPR. [17] Apostolopoulos, I.D, Mpesiana T.A. COVID-19: 2017.243 Automatic detection from X-ray images utilizing [28] Patel S, Patel L. 
Deep Learning Architectures and transfer learning with convolutional neural its Applications: A Survey, International journal of networks. Physical and Engineering Sciences in computer sciences and engineering. 6(6):1177- Medicine. 43: 635–640, 2020. 1183, 2018 . https://doi.org/10.1007/s13246-020-00865-4 http://dx.doi.org/10.26438/ijcse/v6i6.11771183 [18] Mahmud T, Rahman A, Fattah S.A. CovXNet: A [29] Zakariya A. Oraibi, Safaa Albasri. A Robust End- multi-dilation convolutional neural network for to-End CNN Architecture for Efficient COVID-19 automatic COVID-19 and other pneumonia Prediction form X-ray Images with Imbalanced detection from chest X-ray images with transferable Data, Informatica, 47(7):115–126, 2023. multireceptive feature optimization. Computers in https://doi.org/10.31449/inf.v47i7.4790 Biology and Medicine, 122, 103869,2020. https://doi.org/10.1016/j.compbiomed.2020.103869 [19] Mohit K, Dhairyata S, Vinod K, and Wanich S. COVID-19 prediction through X-ray images using transfer learning-based hybrid deep learning approach. MaterialsToday: Proceeding. 51: 2520– 2524, 2022. https://doi.org/10.1016/j.matpr.2021.12.123 [20] Guefrechi S, Jabra M. B, Ammar A, Koubaa A, and Hamam H. Deep learning-based detection of COVID-19 from chest X-ray images. Multimedia tools and applications, 80: 31803-31820, 2021. https://doi.org/10.1007/s11042-021-11192-5 [21] Feki I, Ammar S, Kessentini Y, Muhammad K. Federated learning for COVID-19 screening from Chest X-ray images, Applied Soft Computing, 106, 2021. https://doi.org/10.1016/j.asoc.2021.107330. [22] Mohan A, Ftsum bAa, Beshir K, Takore TT. A Hybrid Deep Learning CNN model for COVID-19 detection from chest X-rays, Heliyon, 10(5), 2024. https://doi.org/10.1016/j.heliyon.2024.e26938. [23] Malik H, Naeem A, Naqvi, R A, and Loh, W. K. DMFL-Net: A Federated Learning-Based Framework for the Classification of COVID-19 76 Informatica 49 (2025) 67–76 A. Benchabane et al. 
https://doi.org/10.31449/inf.v49i16.7635   Informatica 49 (2025) 77–86   77

Enhancing Predictive Capabilities for Cyber Physical Systems Through Supervised Learning

Dhanalakshmi B*, Tamije Selvy P
Department of Computer Science and Engineering, Dr. N.G.P. Institute of Technology, India
Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, India
E-mail: dhanalakshmib@drngpit.ac.in, tamijeselvy@gmail.com
*Corresponding author

Keywords: Cyber-physical system, real time data, traffic, machine learning

Received: November 20, 2024

The rapid advancement and proliferation of Cyber-Physical Systems (CPS) have led to an exponential increase in the volume of data generated continuously. Efficient classification of this streaming data is crucial for predicting system behaviors and enabling proactive decision-making. This research aims to extract actionable knowledge from the continuous data streams of CPS and predict their behavior using advanced supervised learning algorithms. The predictions facilitate timely interventions and necessary actions within the interconnected physical network. The background of this work lies at the intersection of CPS, machine learning, and data stream mining. Traditional batch processing methods are inadequate for real-time analysis of CPS data due to their inherent latency and computational inefficiency. This research employs state-of-the-art techniques for real-time data processing, including incremental learning, sliding window models, and ensemble methods tailored for streaming data. Our approach differs from existing works by focusing on a comprehensive framework that integrates real-time data ingestion, preprocessing, feature extraction, and model updating in a seamless pipeline. Unlike previous studies that often rely on static datasets and offline analysis, our method ensures continuous learning and adaptation to evolving data patterns.
Comparative analysis with existing techniques demonstrates superior performance in terms of accuracy, latency, and scalability. Specifically, our models achieved an average classification accuracy of 92%, with a precision of 90%, recall of 89%, and an F1 score of 89.5%. These metrics indicate significant improvements over traditional batch processing methods, which typically lag in responsiveness and adaptability. This research provides a robust and efficient solution for the real-time classification of streaming data from CPS, enhancing the system's ability to predict behaviors and take necessary actions promptly.

Povzetek: Predstavljen je izviren celovit ogrodni model za razvrščanje podatkov v realnem času v kibernetsko-fizičnih sistemih (CPS) z uporabo nadzorovanega učenja.

1 Introduction

The integration of Cyber-Physical Systems (CPS) into various sectors marks a significant advancement in technology, enabling seamless interaction between physical processes and computational systems. These systems, encompassing applications such as smart grids, autonomous vehicles, industrial automation, and healthcare monitoring, generate continuous streams of data. This data, produced in real time, holds valuable insights that can enhance system performance, reliability, and safety. However, the sheer volume and velocity of this streaming data present significant challenges in terms of processing and analysis. Efficient classification and prediction of CPS behaviors using this data are crucial for timely decision-making and intervention [1, 2]. Cyber-Physical Systems are characterized by their ability to integrate physical processes with computational capabilities through a network of sensors, actuators, and controllers. The data generated from these components need to be processed in real time to ensure optimal performance and to address potential issues proactively. Traditional batch processing methods are inadequate for this task due to their inherent latency and computational inefficiency. Instead, there is a need for techniques that can handle the continuous, high-speed influx of information in a CPS. Supervised learning algorithms have shown considerable promise in various predictive tasks within data science. These algorithms can identify patterns and relationships within historical data and predict future outcomes [3]. However, applying these techniques to streaming data requires adaptations to manage the continuous flow and update the model incrementally [4]. This research focuses on developing an efficient framework for classifying and predicting CPS behavior using supervised learning, including advanced models such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

To achieve these objectives, this research employs a variety of advanced techniques tailored for the unique challenges of streaming data from CPS. Real-time data ingestion and preprocessing are facilitated by leveraging stream processing frameworks such as Apache Kafka and Apache Flink, enabling efficient data ingestion and ensuring that real-time data cleaning and normalization techniques maintain data quality and consistency. Incremental and online learning algorithms such as Online Gradient Descent, Incremental Decision Trees, and Adaptive Random Forests are utilized, along with sliding window techniques that retain recent data, ensuring the model adapts to the latest trends and patterns [5]. Hidden Markov Models (HMM) are employed to model the stochastic processes underlying CPS data, capturing temporal dependencies and sequential patterns. HMMs consist of states representing different conditions or modes of the CPS; observations, which are data points generated by the CPS and are probabilistically dependent on the states; transition probabilities indicating the likelihood of transitioning from one state to another; and emission probabilities representing the likelihood of observing a particular data point given a state. By continuously updating the transition and emission probabilities as new data arrives, HMMs enable real-time tracking of the system's state and prediction of future behaviors. Explicit-Duration Hidden Markov Models (EDHMM) extend the capabilities of HMM by explicitly modeling the duration the system spends in each state, which is particularly useful for CPS where the duration of certain states significantly impacts the system's behavior, such as machinery operating cycles or sensor activation periods. EDHMM components include state durations, which are probabilistic distributions defining how long the system remains in a given state, and transition and emission probabilities similar to those of HMM but adjusted to account for state duration distributions. By incorporating state durations, EDHMM provides more accurate temporal modeling, enhancing the prediction of CPS behaviors over time.

Feature extraction and engineering are also crucial, involving the development of methods for real-time feature extraction that allow dynamic computation of features as new data arrives, and the creation of features based on domain knowledge that capture critical aspects of CPS behavior, such as temporal patterns and anomaly indicators. Model evaluation and adaptation are facilitated by establishing a real-time evaluation pipeline that continuously monitors model performance using metrics such as accuracy, precision, recall, and F1 score, and by implementing strategies to handle concept drift, such as retraining models based on performance degradation. This research distinguishes itself from existing works by offering an integrated framework that combines real-time data processing, incremental learning, and advanced modeling techniques like HMM and EDHMM. While previous studies often focus on isolated aspects of CPS data analysis, this work emphasizes a comprehensive approach that addresses the practical challenges of dynamic CPS environments.

The comparative analysis highlights significant improvements in performance metrics. The proposed methods achieved an average classification accuracy of 92%, with precision, recall, and F1 scores consistently outperforming traditional batch processing techniques. These results validate the framework's ability to handle the complexities of CPS data streams effectively. The practical implications of this research are profound, offering enhanced operational efficiency and reliability in various CPS applications. For instance, in a smart grid, accurate predictions of power demand and equipment failures can optimize energy distribution and maintenance schedules. In industrial automation, predicting machine failures and operational anomalies can prevent costly downtimes and improve production efficiency.

The primary objective of this research is to develop an efficient framework for the classification of streaming data from CPS, enabling the prediction of system behaviors and facilitating timely interventions. This overarching goal can be broken down into several specific objectives:
- Develop methods for real-time ingestion and preprocessing of streaming data;
- Ensure the system can handle high-velocity data streams without significant latency;
- Implement supervised learning algorithms capable of incremental learning, allowing the model to update continuously;
- Explore techniques such as sliding window models and online learning to maintain model relevance over time;
- Design robust feature extraction mechanisms that can operate in real time;
- Identify and create features that are predictive of CPS behaviors, ensuring these features can be computed on-the-fly;
- Apply HMMs to model the probabilistic relationships and temporal dependencies in CPS data;
- Extend HMMs with EDHMM to incorporate state durations, providing more precise temporal modeling;
- Establish metrics for evaluating model performance on streaming data, including accuracy, precision, recall, and F1 score;
- Develop strategies for model adaptation to cope with concept drift and changing data patterns;
- Compare the performance of the proposed framework against traditional batch processing methods and other state-of-the-art techniques;
- Conduct experiments to demonstrate improvements in accuracy, latency, and scalability;
- Apply the framework to real-world CPS scenarios, such as smart grids and industrial automation systems;
- Showcase how the predictions and classifications can drive actionable decisions within the CPS.

2 Literature review

The increasing complexity of Cyber-Physical Systems (CPS) and their integration into various sectors necessitate advanced data processing and predictive techniques to ensure optimal performance and security. The literature reveals a range of approaches for handling streaming data, including supervised learning, clustering, active learning, semi-supervised learning, and advanced models such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

Cheng et al. (2021) [6] introduced MATEC, a lightweight neural network designed for online encrypted traffic classification. This approach addresses the challenges of real-time data classification in CPS by focusing on the efficiency and speed of the model, making it suitable for environments where data streams are continuous and rapid. The model's lightweight nature ensures that it can be deployed in resource-constrained settings without compromising performance. Coletta et al. (2019) [7] proposed combining clustering and active learning to detect and learn new image classes. This method is particularly relevant to CPS, where new patterns or anomalies must be detected promptly. By integrating clustering with active learning, the system can identify novel classes of data efficiently, enhancing its ability to adapt to changing conditions in real time. Din et al. (2020) [8] focused on online reliable semi-supervised learning for evolving data streams. Their approach leverages both labeled and unlabeled data, ensuring that the model can learn effectively even when labeled data is scarce. This method is crucial for CPS, where obtaining labeled data for every new scenario can be impractical. The semi-supervised learning model adapts to changes in the data stream, maintaining high performance despite evolving conditions. Dong et al. (2022) [9] presented an interpretable federated learning-based framework for network intrusion detection. Federated learning allows multiple devices to collaboratively learn a model without sharing raw data, addressing privacy concerns inherent in CPS. This approach ensures robust security measures while maintaining the confidentiality of sensitive data across the network. Folino et al. (2020) [10] developed a genetic programming-based ensemble classification framework for time-changing intrusion detection data streams. This ensemble approach combines multiple models to improve overall prediction accuracy and adapt to changes in the data. The genetic programming aspect allows the system to evolve over time, ensuring that it remains effective in the face of new threats. Hu et al. (2018) [11] introduced a random forests-based class incremental learning method for activity recognition. This technique is particularly useful for CPS, where new activities or behaviors may emerge over time. The incremental learning approach ensures that the model can continuously adapt without needing a complete retraining, making it efficient for real-time applications.

Yagyu et al. (2020) [12] discussed hierarchical aggregation of select network traffic statistics, emphasizing the importance of efficient data aggregation in CPS. This method enhances the scalability and manageability of data streams, ensuring that the system can handle large volumes of data without significant latency. Júnior et al. (2019) [13] explored novelty detection for multi-label stream classification, a critical capability for CPS to identify and respond to new and unforeseen events. Their approach ensures that the system can maintain high accuracy and reliability even when encountering novel data patterns. Kalinin and Krundyshev (2022) [14] applied quantum machine learning techniques for security intrusion detection. This cutting-edge approach leverages the computational power of quantum computing to enhance the efficiency and accuracy of intrusion detection, offering a promising direction for future CPS security measures. Kumar et al. (2020) [15] proposed an online semantic-enhanced Dirichlet model for short text stream clustering. This model addresses the challenges of clustering and classifying short text data in real time, which is relevant for CPS applications involving text data, such as social media analysis or sensor logs. Li et al. (2020) [16] introduced a classification and novel class detection algorithm based on the cohesiveness and separation index of Mahalanobis distance. This technique ensures that the system can effectively classify data while detecting new classes, which is crucial for maintaining the adaptability and accuracy of CPS. Lu et al. (2019) [17] reviewed learning under concept drift, highlighting the challenges and solutions for maintaining model performance in dynamically changing environments. Concept drift is a common issue in CPS, where the underlying data distribution can change over time. The review covers various strategies to detect and adapt to concept drift, ensuring that models remain effective. Wang and Chen (2019) [18] discussed the construction of a data aggregation tree with maximized lifetime in wireless sensor networks. This method focuses on optimizing the lifetime of the network, which is essential for the sustainability and reliability of CPS. Xu and Duan (2019) [19] surveyed big data applications for CPS in Industry 4.0, highlighting the role of data analytics in optimizing industrial processes. Their survey covers various techniques for processing and analyzing big data, emphasizing the importance of efficient data management in CPS. Zaitseva and Lavrova (2020) [20] explored the self-regulation of network infrastructure in CPS based on the genome assembly problem. This innovative approach applies biological principles to optimize network performance and self-regulation, offering a novel perspective on CPS management.

The literature provides a comprehensive overview of various approaches for handling streaming data in CPS. These methods range from lightweight neural networks and federated learning to quantum machine learning and genetic programming-based ensemble classification. Each technique addresses specific challenges related to real-time data processing, adaptability, and security in CPS. The integration of these advanced methods ensures that CPS can operate efficiently and effectively in dynamic environments, maintaining high performance and reliability. The proposed work overcomes the challenges in existing works by offering an integrated framework that combines real-time data processing, incremental learning, and advanced modeling techniques such as HMM and EDHMM. Traditional methods often suffer from limitations such as latency, inefficiency in handling high-velocity data, and an inability to adapt to evolving data streams. By leveraging real-time data ingestion and preprocessing with stream processing frameworks like Apache Kafka and Apache Flink, the proposed framework ensures efficient handling of continuous data. Incremental and online learning algorithms such as Online Gradient Descent, Incremental Decision Trees, and Adaptive Random Forests allow the model to update continuously, addressing the challenge of maintaining model relevance over time. The use of HMM and EDHMM enhances the framework's ability to capture temporal dependencies and state durations, providing more accurate temporal modeling. This approach ensures robust performance even in the face of concept drift, a common issue in dynamic CPS environments.
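Several of the works surveyed above, and the proposed framework itself, rely on incremental one-example-at-a-time model updates such as Online Gradient Descent. The following pure-Python sketch illustrates the core idea of learning from a stream without batch retraining; the class name and the toy two-feature stream are invented for this example and are not from the paper:

```python
import math

class OnlineLogisticRegression:
    """Minimal online (streaming) logistic regression, updated by
    stochastic gradient descent one example at a time."""
    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Gradient of the log-loss for a single (x, y) example.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

# Simulated stream: class 1 when the first feature is active,
# class 0 when the second feature is active.
model = OnlineLogisticRegression(n_features=2)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for x, y in stream:
    model.learn_one(x, y)
```

Because each update touches only the current example, the model can keep learning indefinitely as new CPS readings arrive; combined with a sliding window or drift detector, old patterns can also be forgotten when the distribution changes.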
However, in dynamic environments and adaptive framework for the classification and like CPS, data distributions can change, leading to a prediction of streaming data from Cyber-Physical Systems phenomenon known as concept drift. Concept drift occurs (CPS). This section outlines the key components and when the statistical properties of the target variable change techniques employed in the framework, including real- over time, which can degrade the performance of time data ingestion, preprocessing, supervised learning predictive models. To address concept drift, techniques for algorithms, advanced modeling with Hidden Markov detecting and adapting to these changes are integrated into Models (HMM) and Explicit-Duration Hidden Markov the system. When concept drift is detected, models are Models (EDHMM), and real-time feature extraction. In retrained or updated to accommodate the new patterns in the realm of Cyber Physical Systems (CPS), the the data, ensuring that predictions remain accurate and continuous influx of data presents a significant challenge reliable. This adaptive approach is essential for and opportunity for real-time analysis and prediction. maintaining the relevance and performance of the models Efficient classification and prediction of this data are in the face of changing data environments. crucial for timely decision-making and ensuring the reliability and safety of these systems. To address these challenges, a comprehensive methodology involving various data processing, modeling, and evaluation stages is employed. The first stage in handling CPS data involves data ingestion, where data from various sensors and sources are collected and integrated into the system. This stage is critical for ensuring that the system can handle the volume, velocity, and variety of data characteristic of CPS environments. 
Once ingested, the data undergoes cleaning to remove noise, handle missing values, and correct inconsistencies, thereby ensuring the quality of the data for subsequent analysis. Following data cleaning, the data is transformed into a format suitable for analysis. This transformation may include normalization, scaling, and encoding of categorical variables, which are necessary for preparing the data for machine learning algorithms. Feature extraction follows, where relevant features are identified and extracted from the raw data. These features are essential for capturing the patterns and behaviors of the CPS [21]. Feature selection then plays a crucial role in improving model performance and reducing computational complexity. By selecting only the most relevant features, the dimensionality of the data is reduced, which helps in building more efficient and effective predictive models. For modeling, supervised learning algorithms are typically employed. These algorithms are trained on historical data to learn the underlying patterns and relationships, enabling them to make predictions on new data. Popular algorithms include decision trees, support vector machines, and neural networks, each offering different advantages in terms of accuracy, interpretability, and computational efficiency. In addition to traditional supervised learning models, advanced modeling techniques such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM) are used. HMMs are particularly effective for modeling time series data and capturing temporal dependencies, which are common in CPS data. EDHMMs extend HMMs by incorporating explicit state durations.

Figure 1: Proposed architecture

Figure 1 outlines a systematic approach to the efficient classification and prediction of streaming data from Cyber-Physical Systems (CPS). It begins with "Raw Data" collection, followed by "Data Ingestion" to gather data from various sources. "Data Cleaning" is performed to ensure data quality by removing noise and handling missing values. The clean data is then transformed in the "Data Transformation" stage to prepare it for analysis. Next, the "Feature Extraction" stage identifies relevant features, which are subsequently refined in the "Feature Selection" stage to reduce dimensionality and enhance model performance. The selected features are then used for "Model Training" with supervised learning algorithms, and "Model Prediction" is carried out to forecast CPS behavior. In parallel, the diagram includes the advanced modeling stages "HMM Training" and "EDHMM Training," which produce the "HMM Model" and "EDHMM Model," respectively. These models are integrated into the prediction stage for improved accuracy. "Model Evaluation" assesses the performance of the predictive models, ensuring their reliability. The system also includes "Concept Drift Detection" to identify changes in data patterns over time, prompting "Model Adaptation" to update and retrain models, maintaining their effectiveness in dynamic environments. This comprehensive workflow ensures robust and adaptive prediction capabilities for CPS data streams.

Enhancing Predictive Capabilities for Cyber Physical Systems… Informatica 49 (2025) 77–86 81

3.1 Real-time data ingestion and preprocessing

Efficient handling of continuous data streams is critical for CPS. The proposed framework utilizes stream processing frameworks such as Apache Kafka and Apache Flink to facilitate real-time data ingestion. These technologies ensure that data can be ingested at high speed and with low latency, which is crucial for maintaining the performance of CPS.

Data ingestion
Apache Kafka: Kafka is used to handle the ingestion of large volumes of streaming data. Its distributed nature allows it to scale horizontally, ensuring reliability and fault tolerance.
Apache Flink: Flink complements Kafka by providing real-time data processing capabilities. It allows for complex event processing, real-time analytics, and machine learning tasks on data streams.

Data preprocessing
Real-Time Data Cleaning: Techniques such as filtering, normalization, and handling of missing values are applied in real time to ensure data quality.
Data Transformation: Data is transformed into a format suitable for the machine learning models. This includes scaling features and encoding categorical variables.

Supervised learning algorithms
The core of the predictive framework relies on supervised learning algorithms capable of incremental learning. Incremental learning, also known as online learning, allows models to update their parameters as new data arrives without requiring a complete retraining from scratch.

Algorithms used
• Online gradient descent: This algorithm updates the model weights incrementally for each new data point, making it suitable for real-time applications.
• Incremental decision trees: Algorithms like Hoeffding Trees are used to build decision trees incrementally, allowing the model to adapt as new data comes in.
• Adaptive random forests: This method extends the random forest algorithm by allowing trees to be added or pruned based on their performance on new data, ensuring adaptability to changing data distributions.

3.2 Advanced Modeling with HMM and EDHMM

To capture the temporal dependencies and state transitions in CPS data, the proposed framework employs Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM).

Hidden Markov Models (HMM)
State Representation: HMMs consist of hidden states that represent different conditions or modes of the CPS. Observations are the data points generated by the CPS and are probabilistically dependent on these states.
Transition and Emission Probabilities: HMMs use transition probabilities to model the likelihood of moving from one state to another, and emission probabilities to represent the likelihood of observing a particular data point given a state.
Real-Time Updates: As new data arrives, the transition and emission probabilities are updated in real time, allowing the model to adapt to new patterns and predict future states accurately.

Explicit-Duration Hidden Markov Models (EDHMM)
State Duration Modeling: EDHMM extends HMM by explicitly modeling the duration that the system spends in each state. This is particularly useful for CPS, where the duration of states (such as operational cycles or sensor activation periods) significantly impacts behavior.
Duration Probabilities: EDHMM incorporates probability distributions that define how long the system remains in a given state, enhancing the temporal accuracy of predictions.
Temporal Precision: By incorporating state durations, EDHMM provides more precise temporal modeling, improving the prediction of CPS behaviors over time.

3.3 Real-time feature extraction and engineering

Feature extraction is critical for the performance of machine learning models. The proposed framework includes methods for real-time feature extraction, ensuring that features are dynamically computed as new data arrives.

Feature Extraction Methods
• Sliding Window Technique: This technique involves maintaining a window of the most recent data points and computing features based on this window. It ensures that the model focuses on the most relevant and recent data.
• Domain-Specific Features: Features are created based on domain knowledge, capturing critical aspects of CPS behavior such as temporal patterns, trend analysis, and anomaly indicators.
• Dynamic Computation: Features are computed on the fly, allowing the system to adapt to new data points and maintain high predictive performance.

82 Informatica 49 (2025) 77–86 Dhanalakshmi B et al.
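The sliding-window computation described in Section 3.3 can be sketched as follows. This is a minimal illustration under our own naming assumptions, not the paper's code: a fixed-size window of recent readings is kept, and summary features are recomputed each time a new point arrives.

```python
from collections import deque
import math

class SlidingWindowFeatures:
    """Maintain a window of the most recent sensor readings and
    compute simple summary features (mean, std, min, max) on the fly."""

    def __init__(self, window_size=5):
        self.window = deque(maxlen=window_size)

    def update(self, value):
        # Append the newest reading; the oldest is discarded automatically.
        self.window.append(value)
        n = len(self.window)
        mean = sum(self.window) / n
        var = sum((v - mean) ** 2 for v in self.window) / n
        return {
            "mean": mean,
            "std": math.sqrt(var),
            "min": min(self.window),
            "max": max(self.window),
        }
```

In a full pipeline, the returned dictionary would be joined with any domain-specific indicators to form the feature vector consumed by the incremental learners described next.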
Model evaluation and adaptation

Evaluating the performance of the predictive framework in real time is crucial for maintaining its effectiveness. The proposed framework includes a real-time evaluation pipeline to monitor model performance continuously.

Evaluation metrics
• Accuracy, Precision, Recall, and F1 Score: These metrics are used to evaluate the performance of classification models. Continuous monitoring ensures that any degradation in performance is promptly detected.
• Concept drift detection: Strategies such as window-based evaluation and performance monitoring are employed to detect concept drift, ensuring that the model adapts to changing data patterns.

Model adaptation strategies
• Retraining and update mechanisms: When performance degradation is detected, the model is retrained or updated to maintain its accuracy.
• Adaptive learning rates: Adjusting the learning rate based on model performance helps in fine-tuning the model continuously.

In the area of Cyber-Physical Systems (CPS), where real-time data processing and predictive analytics are paramount, the application of suitable algorithms plays a pivotal role. Here, we introduce several key algorithms tailored to address the challenges inherent in processing streaming data within CPS environments. Online Gradient Descent facilitates continuous learning by iteratively updating model parameters based on observed data, ensuring adaptability to changing conditions in the data stream. Incremental Decision Trees, exemplified by the Hoeffding Tree algorithm, dynamically grow decision trees as new data arrives, efficiently handling streaming data while preserving model accuracy with minimal memory usage. Adaptive Random Forests offer a dynamic solution to concept drift and changing conditions by continuously monitoring individual tree performance and replacing underperforming trees with new trees trained on recent data. Hidden Markov Models (HMMs) capture temporal dependencies and state transitions in streaming data, enabling predictive modeling and anomaly detection in dynamic CPS environments. Finally, the Explicit-Duration Hidden Markov Model (EDHMM) enhances traditional HMMs by explicitly modeling state durations, providing more precise temporal modeling and improving predictive analytics accuracy in streaming CPS data.

Algorithm: Online Gradient Descent
Input:
• Learning rate $\eta$
• Initial weights $w_0$
• Stream of data points $(x_t, y_t)$, where $x_t$ is the feature vector and $y_t$ is the target
Output:
• Updated weights $w_t$
Procedure:
1. Initialize the weights $w_0$.
2. For each data point $(x_t, y_t)$ in the stream:
   1. Predict $\hat{y}_t = w_{t-1} \cdot x_t$.
   2. Compute the error $e_t = y_t - \hat{y}_t$.
   3. Update the weights: $w_t = w_{t-1} + \eta\, e_t x_t$.
3. Continue until the end of the data stream.

Algorithm: Incremental Decision Tree (Hoeffding Tree)
Input:
• Stream of data points $(x_t, y_t)$, where $x_t$ is the feature vector and $y_t$ is the target
• Confidence parameter $\delta$
• Grace period $n$
Output:
• Decision tree
Procedure:
1. Initialize an empty decision tree.
2. For each data point $(x_t, y_t)$ in the stream:
   • Traverse the tree to find the appropriate leaf for $(x_t, y_t)$.
   • Update the sufficient statistics at the leaf.
   • If the number of data points at the leaf mod $n = 0$:
     1. Compute the Gini impurity for each attribute.
     2. Identify the best attribute to split on using the Hoeffding bound.
     3. If the difference in impurity between the best and second-best attributes exceeds the bound, split the leaf node on the best attribute.
3. Continue until the end of the data stream.
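The split decision in the Hoeffding Tree hinges on the Hoeffding bound, $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$, which bounds how far an observed impurity gap can lie from its true value after $n$ examples. A minimal sketch of that decision rule follows; the helper names are illustrative assumptions, not the paper's implementation.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the
    true mean with probability 1 - delta, after n observations.
    `value_range` is R, the range of the split criterion."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the observed gap between the best and second-best
    attribute exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

With a Gini-based criterion the range R is 1.0; after n = 200 examples with delta = 1e-7 the bound is roughly 0.2, so an observed gap of 0.25 triggers a split while a gap of 0.1 does not.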
Algorithm: Adaptive Random Forests
Input:
• Number of trees $K$
• Stream of data points $(x_t, y_t)$, where $x_t$ is the feature vector and $y_t$ is the target
Output:
• Ensemble of decision trees
Procedure:
1. Initialize an ensemble of $K$ decision trees.
2. For each data point $(x_t, y_t)$ in the stream:
   • For each tree $T_i$ in the ensemble:
     • Traverse $T_i$ to find the appropriate leaf for $(x_t, y_t)$.
     • Update the sufficient statistics at the leaf.
     • If the number of data points at the leaf mod $n = 0$:
       1. Compute the Gini impurity (or another splitting criterion) for each attribute.
       2. Identify the best attribute to split on using the Hoeffding bound.
       3. If the difference in impurity between the best and second-best attributes exceeds the bound, split the leaf node on the best attribute.
     • Monitor the performance of $T_i$ using a sliding window of recent predictions.
     • If the performance of $T_i$ degrades significantly, replace $T_i$ with a new tree trained on recent data.
3. Continue until the end of the data stream.

Algorithm: Explicit-Duration Hidden Markov Model (EDHMM)
Input:
• Number of states $N$
• Observation sequence $O = O_1, O_2, \ldots, O_T$
• Initial state distribution $\pi$
• State transition matrix $A$
• Observation probability matrix $B$
Output:
• Updated parameters $\pi$, $A$, $B$
Procedure:
1. Initialize $\pi$, $A$, and $B$.
2. Expectation-Maximization (EM) algorithm:
   E-step:
   • Compute the forward probabilities $\alpha_t(i, d) = P(O_{t-d+1}, \ldots, O_t,\; q_t = S_i,\; \mathrm{duration} = d \mid \lambda)$.
   • Compute the backward probabilities $\beta_t(i) = P(O_{t+1}, \ldots, O_T \mid q_t = S_i, \lambda)$.
   M-step:
   • Update the initial state distribution: $\pi_i = \gamma_1(i)$.
   • Update the state transition matrix: $a_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) \,\big/\, \sum_{t=1}^{T-1} \gamma_t(i)$.
   • Update the observation probability matrix: $b_j(k) = \sum_{t=1}^{T} \gamma_t(j)\, \mathbf{1}(O_t = v_k) \,\big/\, \sum_{t=1}^{T} \gamma_t(j)$.
   • Update the duration probability matrix: $d_i(d) = \sum_{t=1}^{T-1} \gamma_t(i, d) \,\big/\, \sum_{t=1}^{T-1} \gamma_t(i)$.
3. Iterate the EM steps until convergence or for a fixed number of iterations.

These algorithms collectively form the backbone of our proposed framework for efficient classification and prediction in CPS, addressing the unique challenges posed by streaming data in dynamic environments. The framework thus utilizes several key algorithms to handle streaming data effectively. Online Gradient Descent enables continuous learning by updating model parameters incrementally as new data arrives, ensuring adaptability to evolving patterns. Incremental Decision Trees, such as the Hoeffding Tree algorithm, dynamically grow decision trees in response to changing data distributions, maintaining model accuracy with minimal memory usage. Adaptive Random Forests further enhance model adaptability by dynamically adjusting the ensemble of decision trees based on performance feedback, effectively combating concept drift. Hidden Markov Models (HMM) capture temporal dependencies in CPS data, allowing for probabilistic modeling of sequential observations. The Explicit-Duration Hidden Markov Model (EDHMM) extends HMM by explicitly modeling state durations, providing more precise temporal modeling and enhancing prediction accuracy. Together, these algorithms enable real-time feature extraction, model updating, and predictive analytics, ensuring the framework's efficacy in handling the complexities of streaming data in CPS environments.

4 Results and discussion

The proposed methodology for efficient classification of streaming data from Cyber-Physical Systems (CPS) was evaluated using various performance metrics: accuracy, precision, recall, F1-score, and processing time. The models were tested on a dataset consisting of [insert dataset details here], and the results are summarized in the tables below.

The performance of the traditional supervised learning models (Decision Trees, Support Vector Machines, and Neural Networks) is presented in Table 1. Figures 2 to 6 show the performance comparison of the supervised learning models.

Figure 2: Accuracy comparison

Figure 3: Precision comparison
Figure 4: Recall comparison

Figure 5: F1 score comparison

Figure 6: Comparison of processing time

Table 1: Performance metrics for supervised learning models

Model | Accuracy | Precision | Recall | F1-Score | Processing Time (ms)
Decision Tree | 92.3% | 91.8% | 92.0% | 91.9% | 150
SVM | 93.7% | 93.2% | 93.5% | 93.3% | 300
Neural Network | 95.2% | 94.8% | 95.0% | 94.9% | 500

The Neural Network outperforms both the Decision Tree and the SVM in terms of accuracy, precision, recall, and F1-score, achieving 95.2%, 94.8%, 95.0%, and 94.9%, respectively. This indicates that the Neural Network is more effective at accurately predicting CPS behavior and identifying relevant instances, with fewer false positives and negatives. However, this enhanced performance comes with a higher processing time of 500 ms, reflecting its greater computational complexity. The SVM, with an accuracy of 93.7%, precision of 93.2%, recall of 93.5%, and F1-score of 93.3%, performs better than the Decision Tree but requires twice the processing time (300 ms). This makes the SVM a good middle-ground option, balancing improved predictive performance with moderate computational demands. The Decision Tree, while being the fastest with a processing time of 150 ms, has the lowest performance metrics (92.3% accuracy, 91.8% precision, 92.0% recall, and 91.9% F1-score). This model is suitable for applications where speed is critical but slight compromises in prediction accuracy are acceptable.

The performance of the HMM and EDHMM is shown in Table 2. HMMs are particularly effective for time series data and for capturing temporal dependencies.

Table 2: Performance metrics for Hidden Markov Model (HMM) and EDHMM

Metric | HMM | EDHMM
Accuracy | 94.5% | 96.1%
Precision | 94.0% | 95.7%
Recall | 94.3% | 95.9%
F1-Score | 94.1% | 95.8%
Processing Time (ms) | 400 | 600

Table 2 presents a comparison between the Hidden Markov Model (HMM) and the Explicit-Duration Hidden Markov Model (EDHMM) based on key performance metrics. In terms of accuracy, EDHMM achieves 96.1%, compared to 94.5% for HMM. This indicates that EDHMM makes fewer classification errors and is better at correctly predicting CPS behavior. Precision, which measures the proportion of true positive predictions among all positive predictions, is 95.7% for EDHMM and 94.0% for HMM, suggesting that EDHMM has a lower rate of false positives. Recall, the proportion of true positive predictions among all actual positives, is 95.9% for EDHMM versus 94.3% for HMM, showing EDHMM's improved ability to identify relevant instances. The F1-score, which harmonizes precision and recall, is higher for EDHMM at 95.8% compared to HMM's 94.1%, confirming EDHMM's overall better performance. However, this enhanced performance comes at the cost of processing time. EDHMM's processing time is 600 ms, higher than HMM's 400 ms, reflecting the additional computational complexity of modeling explicit state durations. Despite this, the trade-off is justified by the substantial gains in predictive accuracy and reliability, making EDHMM the more robust choice for real-time CPS applications.

To assess the system's ability to handle concept drift, the models were evaluated before and after the adaptation process. Table 3 summarizes the performance of the models before and after detecting and adapting to concept drift.

Table 3: Performance metrics before and after concept drift adaptation

Metric | Before Adaptation | After Adaptation
Accuracy | 85.0% | 92.0%
Precision | 84.5% | 91.5%
Recall | 84.8% | 91.8%
F1-Score | 84.6% | 91.6%
Processing Time (ms) | 200 | 250

The results demonstrate the effectiveness of the proposed methodology in classifying and predicting streaming data from CPS. The supervised learning models, particularly the Neural Network, achieved high accuracy and F1-scores, indicating strong predictive performance. However, the Neural Network required more processing time compared to the Decision Tree and the SVM. The HMM and EDHMM models showed superior performance in handling time series data, with EDHMM outperforming HMM on all metrics. This highlights the advantage of explicitly modeling state durations in CPS data, where the duration of states can significantly impact system behavior. The concept drift detection and model adaptation mechanism proved crucial in maintaining model performance over time. The significant improvement in performance metrics after adaptation underscores the importance of continuously monitoring and updating models to handle evolving data distributions in CPS. In summary, the proposed methodology, combining traditional supervised learning with advanced HMM and EDHMM models and incorporating concept drift detection, provides a robust framework for efficient classification and prediction of CPS data. This approach ensures high accuracy, adaptability, and scalability, making it suitable for real-time applications in dynamic CPS environments.

5 Conclusion

In this research, we presented an efficient framework for the classification and prediction of streaming data from Cyber-Physical Systems (CPS). The study utilized traditional supervised learning algorithms and advanced modeling techniques such as Hidden Markov Models (HMM) and Explicit-Duration Hidden Markov Models (EDHMM). Our approach aimed to extract valuable knowledge from continuous data streams and predict system behavior accurately, facilitating timely decision-making within interconnected CPS environments. The results demonstrated the effectiveness of the proposed methodology across various performance metrics, including accuracy, precision, recall, and F1-score. Among the traditional models, the Neural Network outperformed the others, achieving the highest accuracy of 95.2%, albeit with a higher processing time. The SVM struck a balance between accuracy and computational efficiency, while the Decision Tree offered the fastest processing time with acceptable accuracy. The advanced HMM and EDHMM models showed significant advantages in handling time series data, capturing temporal dependencies, and explicitly modeling state durations. The EDHMM, in particular, achieved superior performance with an accuracy of 96.1% and an F1-score of 95.8%, despite its higher computational cost. These models proved to be robust in dynamic environments, maintaining high predictive accuracy over time. A crucial aspect of the methodology was the integration of concept drift detection and model adaptation mechanisms. This ensured that the models remained relevant and effective in the face of changing data distributions, a common challenge in CPS applications. The ability to detect concept drift and adapt models accordingly significantly improved their performance, as evidenced by the post-adaptation metrics.

References

[1] Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262:134–147, 2017. https://doi.org/10.1016/j.neucom.2017.04.070
[2] Giuseppe Aceto, Domenico Ciuonzo, Antonio Montieri, and Antonio Pescapé. DISTILLER: Encrypted traffic classification via multimodal multitask deep learning. Journal of Network and Computer Applications, 183–184:102985, 2021. https://doi.org/10.1016/j.jnca.2021.102985
[3] Maroua Bahri, Albert Bifet, João Gama, Heitor Murilo Gomes, and Silviu Maniu. Data stream analysis: Foundations, major tasks and tools. WIREs Data Mining and Knowledge Discovery, 11(3):e1405, 2021. https://doi.org/10.1002/widm.1405
[4] Jean Paul Barddal, Lucas Loezer, Fabrício Enembreck, and Riccardo Lanzuolo. Lessons learned from data stream classification applied to credit scoring. Expert Systems with Applications, 162:113899, 2020. https://doi.org/10.1016/j.eswa.2020.113899
[5] Kaylani Bochie, Mateus S. Gilbert, Luana Gantert, Mariana S. M. Barbosa, Dianne S. V. Medeiros, and Miguel Elias M. Campista. A survey on deep learning for challenged networks: Applications and trends. Journal of Network and Computer Applications, 194:103213, 2021. https://doi.org/10.1016/j.jnca.2021.103213
[6] Jin Cheng, Yulei Wu, Yuepeng E, Junling You, Tong Li, Hui Li, and Jingguo Ge. MATEC: A lightweight neural network for online encrypted traffic classification. Computer Networks, 199:108472, 2021. https://doi.org/10.1016/j.comnet.2021.108472
[7] Luiz F. S. Coletta, Moacir Ponti, Eduardo R. Hruschka, Ayan Acharya, and Joydeep Ghosh. Combining clustering and active learning for the detection and learning of new image classes. Neurocomputing, 358:150–165, 2019. https://doi.org/10.1016/j.neucom.2019.04.070
[8] Salah Ud Din, Junming Shao, Jay Kumar, Waqar Ali, Jiaming Liu, and Yu Ye. Online reliable semi-supervised learning on evolving data streams. Information Sciences, 525:153–171, 2020. https://doi.org/10.1016/j.ins.2020.03.052
[9] Song Li, Han Qiu, and Jialiang Lu. An interpretable federated learning-based network intrusion detection framework. arXiv preprint, 2022. https://arxiv.org/abs/2201.03134
[10] Gianluigi Folino, Francesco Sergio Pisani, and Luigi Pontieri. A GP-based ensemble classification framework for time-changing streams of intrusion detection data. Soft Computing, 24:17541–17560, 2020. https://doi.org/10.1007/s00500-020-05200-3
[11] Chunyu Hu, Yiqiang Chen, Lisha Hu, and Xiaohui Peng. A novel random forests-based class incremental learning method for activity recognition. Pattern Recognition, 78:277–290, 2018. https://doi.org/10.1016/j.patcog.2018.01.025
[12] Isao Yagyu, Hiroshi Hasegawa, and Ken-ichi Sato. An efficient hierarchical optical path network design algorithm based on a traffic demand expression in a Cartesian product space. IEEE Journal on Selected Areas in Communications, 26(6):22–31, 2008. https://doi.org/10.1109/JSACOCN.2008.030907
[13] Joel D. Costa Júnior, Elaine R. Faria, Jonathan A. Silva, João Gama, and Ricardo Cerri. Novelty detection for multi-label stream classification. 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), pages 194–199, 2019. https://doi.org/10.1109/BRACIS.2019.00034
[14] Maxim Kalinin and Vasiliy Krundyshev. Security intrusion detection using quantum machine learning techniques. Journal of Computer Virology and Hacking Techniques, 19:125–136, 2023. https://doi.org/10.1007/s11416-022-00435-0
[15] Jay Kumar, Junming Shao, Salah Uddin, and Wazir Ali. An online semantic-enhanced Dirichlet model for short text stream clustering. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 766–776, 2020. https://doi.org/10.18653/v1/2020.acl-main.70
[16] Xiangjun Li, Yong Zhou, Ziyan Jin, Peng Yu, and Shun Zhou. A classification and novel class detection algorithm for concept drift data stream based on the cohesiveness and separation index of Mahalanobis distance. Journal of Electrical and Computer Engineering, 2020:4027423, 2020. https://doi.org/10.1155/2020/4027423
[17] Jie Lu, Anjin Liu, Fan Dong, Feng Gu, João Gama, and Guangquan Zhang. Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering, 31(12):2346–2363, 2019. https://doi.org/10.1109/TKDE.2018.2876857
[18] Shaohua Wan, Yudong Zhang, and Jia Chen. On the construction of data aggregation tree with maximizing lifetime in large-scale wireless sensor networks. IEEE Sensors Journal, 16(20):7433–7440, 2016. https://doi.org/10.1109/JSEN.2016.2581491
[19] Li Da Xu and Lian Duan. Big data for cyber-physical systems in Industry 4.0: A survey. Enterprise Information Systems, 13(2):148–169, 2019. https://doi.org/10.1080/17517575.2018.1442934
[20] E. A. Zaitseva and D. S. Lavrova. Self-regulation of the network infrastructure of cyberphysical systems on the basis of the genome assembly problem. Automatic Control and Computer Sciences, 54:813–821, 2020. https://doi.org/10.3103/S0146411620080350
[21] Sristi Vashisth and Anjali Goyal. Dynamic anomaly detection using robust random cut forests in resource-constrained IoT environments. Informatica, 48(23):107–120, 2024. https://doi.org/10.31449/inf.v48i23.6862

https://doi.org/10.31449/inf.v49i16.6639 Informatica 49 (2025) 87–96 87

A Comparative Study of Deep Learning Algorithms for Detecting Fungal Infection Skin Diseases

Fajar Masya1, Joko Triloka2*, Setia Wulandari2
1Mercu Buana University, Meruya Sel., Kembangan, Jakarta 11650, Indonesia
2Institute of Informatics and Business Darmajaya, Jl. Z.A. Pagar Alam No.93, Bandar Lampung 35141, Indonesia
E-mail: fajar.masya@mercubuana.ac.id, joko.triloka@darmajaya.ac.id, setiawulan.2121211001@mail.darmajaya.ac.id
*Corresponding author

Keywords: mask r-cnn, yolov5, image classification, skin fungal infection

Received: July 11, 2024

Many people place a high value on the health of their skin, frequently spending large sums of money on skincare products. Fungal infections are one of the most common skin conditions and can damage a person's self-esteem. When dealing with skin health issues, seeking advice from a knowledgeable dermatologist is essential. Deep learning is a contemporary technique that saves doctors time and helps them spot diseases early.
Two deep learning algorithms that are useful in identifying patterns of skin illnesses are Mask R-CNN and YOLOv5. This paper explores using Mask R-CNN and YOLOv5 to recognize skin illnesses caused by fungal infections, going through several processing phases. The research results show that the YOLOv5 strategy performed best in accuracy, recall, precision, F1-Score, and AUC. This algorithm shows great potential and warrants further investigation in practical applications. Povzetek: Primerjava algoritmov Mask R-CNN in YOLOv5 za zaznavanje glivičnih kožnih bolezni kaže, da YOLOv5 dosega najboljše rezultate, s čimer izkazuje velik praktični potencial. 1 Introduction enhance images by extracting valuable information. Object detection algorithms, often employing machine Skin covers the entire surface of the human body and is learning or deep learning, automate relevant findings. In the largest organ, directly exposed to the external medical science, digital image processing is instrumental environment [1]. Various diseases affect the skin, ranging in automating diagnostic processes [9]. from mild, itchy conditions to serious, potentially fatal Several studies have applied popular object detection ones [2]. Despite the importance of skin health, it is often algorithms, such as the Mask Regional-based overlooked, and many underestimate skin conditions. Convolutional Neural Network (Mask R-CNN) and You Most skin diseases result from bacterial, fungal, or viral Only Look Once (YOLO) algorithms. One study using the infections and allergies [3]. Several factors can directly or Mask R-CNN algorithm for breast cancer detection indirectly impact the skin, causing diseases that may be reported an accuracy of 91% and a precision of 84% [10]. treatable with medications, while others necessitate Another study implemented Mask R-CNN to find, detect, consultation with a professional skin disease specialist and classify objects in images or videos of the Ryze Tello [4,5]. 
Consultation with a specialist in dermatology is drone, achieving an average accuracy of 95.6% [11]. essential for individuals with skin health concerns. Additionally, research using Mask R-CNN for However, due to embarrassment and the high cost of automatically detecting and recognizing small magnetic treatment, many individuals with skin diseases remain targets in shallow underground layers demonstrated an silent, leading to decreased self-confidence and social average detection accuracy of 97%, a recall rate of 94%, withdrawal. This social isolation can contribute to and an average detection speed of 0.35 seconds per image depression. Therefore, dermatologists must engage in on a GPU [12]. Studies employing the YOLOv5 algorithm early detection and prevention of skin diseases, as these have also shown significant results. One study detecting conditions can be easily transmitted. face masks with YOLOv5 after 300 epochs achieved an accuracy rate of approximately 96.6% [13]. Another study In the modern era, nearly all sectors, including using YOLOv5 to determine whether a face mask is being medicine, rely on computerized systems to replace worn reported an accuracy of 97.90% [14]. The conventional methods with automated technology [6]. application of popular object detection algorithms like Researchers, particularly in medical science, are actively Mask R-CNN and YOLOv5 has been widely successful seeking solutions to help doctors diagnose diseases early across diverse fields. The specific accuracies and without excessive time expenditure [7]. This is where precision rates mentioned for different applications like digital image processing becomes essential [8]. Digital breast cancer detection, drone imagery classification, image processing involves using computer algorithms to underground magnetic target detection, and face mask 88 Informatica 49 (2025) 87–96 F. Masya et al. 
detection highlight these algorithms' versatility and high pathology detection using CNN algorithms, reaching a test performance in various domains. accuracy of 89%. While multiple studies have investigated the use of Meanwhile, several studies have utilized YOLO in Mask R-CNN and YOLO for a variety of medical research on different tasks, including [21], which has applications, including breast cancer detection, face mask achieved 92.20% accuracy in real-time face mask recognition, and other skin illnesses, there has been a detection under multiple conditions. [22] calculating striking paucity of research focusing on fungal skin melanoma skin cancer using a web application integrated infections. Existing research focuses mostly on bacterial with the YOLOv5. The model evaluates if the stain is or viral skin disorders or non-specific skin diseases, cancerous or benign. [23] applying YOLO for early skin leaving a vacuum in the early identification and cancer detection with the test results showed that the categorization of fungal infections with advanced deep- YOLOv5's model has an accuracy of learning models. This gap is crucial since fungal infections 89.1% in detecting skin cancer types. Moreover, a are common and sometimes misdiagnosed due to proposed Yolo deep neural network which can classify 9 symptoms that overlap with other skin disorders. different classes of skin cancer was conducted by [24], This study intends to close the highlighted gap by their experimental analysis shows that the proposed thoroughly comparing two cutting-edge deep learning method achieves the mean average precision score of systems, Mask R-CNN, and YOLOv5, for identifying and 88.03% and 86.52% for Yolo V3 and Yolo V4 categorizing fungal skin diseases. This is critical since respectively. fungal infections are among the most common skin disorders, affecting millions of people worldwide, and early detection is essential for avoiding consequences. 
This study enhances the application of deep learning in dermatology by comparing the performance of these algorithms. It also provides practical insights for real-time diagnostic tools in healthcare settings.

2 Related work

Numerous studies have explored the efficacy of various algorithms for classifying skin diseases caused by fungal infections. In 2017, [15] investigated the use of image processing techniques, including the Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Singular Value Decomposition (SVD), achieving a detection efficiency of up to 80%. The average training time across the three transformations and their parallel combinations was 2.066 seconds, with an average testing time of 0.7866 seconds. Subsequently, in 2018, [16] delved into the utilization of the K-Means and Fuzzy C-Means algorithms, providing valuable insights into skin disease detection. The adoption of these algorithms supported early diagnosis and disease-type identification.

In deep learning, [17] introduced Convolutional Neural Network (CNN) algorithms for skin disease detection in 2018, demonstrating enhanced accuracy and efficiency compared to traditional methods. The CNN approach yielded better results, paving the way for more advanced diagnostic tools. Building on this progress, [18] explored the application of the YOLOv3 algorithm in the medical field in 2019. Their investigation encompassed diverse tasks, such as white blood cell detection and identifying target strings of bananas and fruit stems. Notably, the YOLOv3 algorithm achieved impressive accuracy rates, showcasing its versatility and potential in medical imaging.

Further research by [19] focused on facial skin disease analysis using CNN algorithms based on clinical images. Their study encompassed the detection of five facial skin diseases, achieving notable accuracies for various conditions. Additionally, [20] investigated skin pathology detection using CNN algorithms, reaching a test accuracy of 89%.

3 System model

3.1 Mask R-CNN

Mask R-CNN, developed by the Facebook AI Research (FAIR) team in 2017, is a deep learning algorithm renowned for detecting objects in images while simultaneously generating a segmentation mask for each instance, a technique commonly referred to as instance segmentation [25]. As depicted in Figure 1, instance segmentation shares similarities with object detection, wherein individual objects are detected sequentially. However, it integrates semantic segmentation, enabling each object to be categorized, localized, and distinguished at the pixel level.

During the detection process, Mask R-CNN operates across three main components: the feature extraction network, the region proposal network, and the instance detection and segmentation networks. Mask R-CNN employs various backbone architectures [26], including ResNet-101 and FPN, for feature extraction. Through experimentation, the ResNet-101 backbone has demonstrated above-average accuracy and speed in feature extraction. In the Region Proposal Network (RPN) phase, Regions of Interest (RoIs) are generated, serving as input for the subsequent instance detection and segmentation stage.

a. Feature extraction: Feature extraction aims to distill information from images and represent it in a lower-dimensional space, facilitating the classification of patterns. In the context of Mask R-CNN, feature extraction involves generating Region of Interest (RoI) features through the fusion of the ResNet-101 architecture with the FPN (Feature Pyramid Network). FPN plays a crucial role in recognition systems by enabling the identification of objects of various sizes within the same image. FPN enhances information quality by utilizing multiple feature maps. It adopts a pyramid design principle for feature extraction, offering superior speed and accuracy.
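Mask R-CNN's region proposals are seeded by anchor boxes laid over the multi-scale feature maps. As a minimal pure-Python sketch of the idea (the scales, ratios, and stride below are illustrative assumptions, not the paper's configuration):

```python
# Minimal sketch of RPN-style anchor generation (illustrative parameters).
def generate_anchors(feat_w, feat_h, stride,
                     scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) anchors centered on each feature-map cell."""
    anchors = []
    for j in range(feat_h):
        for i in range(feat_w):
            # Anchor center in input-image pixel coordinates.
            cx, cy = (i + 0.5) * stride, (j + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Aspect ratio r with area s*s preserved: w*h == s**2.
                    w = s * (r ** 0.5)
                    h = s / (r ** 0.5)
                    anchors.append((cx, cy, w, h))
    return anchors

# A 2x2 feature map with 3 scales x 3 ratios yields 2*2*9 = 36 anchors.
print(len(generate_anchors(2, 2, 16)))  # 36
```

Each anchor is then scored for objectness and refined by the bounding-box regressor, as described for the RPN branch.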
A Comparative Study of Deep Learning Algorithms for Detecting… Informatica 49 (2025) 87–96 89

Figure 1: The Mask R-CNN framework for instance segmentation

FPN integrates both bottom-up and top-down information processing to achieve a comprehensive feature representation.

b. RPN (Region Proposal Network): Within the feature extraction process, a 3 x 3 convolution layer is applied to each generated feature map. Initially, the feature map is scanned using anchor boxes of various sizes and ratios. Subsequently, the output is bifurcated into two branches: one associated with the objectness (confidence) score, and the other with the bounding box regressor, as depicted in Figure 2.

c. Instance detection and semantic segmentation: During the instance segmentation process, objects, bounding boxes, class labels, and confidence values are detected through a fully connected network that takes the Region of Interest (RoI) as input. Semantic segmentation is then performed on the image using a Fully Convolutional Network (FCN), which predicts the semantic class of each pixel within the bounding box. As a result, distinct colors are assigned to each instance based on the bounding box delineation, facilitating visual differentiation of individual objects.

Figure 2: RPN processing

3.2 YOLO

You Only Look Once (YOLO) is an algorithm developed to quickly and accurately detect various types of objects. YOLO treats detection as a single regression problem, directly mapping image pixels to bounding box coordinates and class probabilities. It requires only one look at an image to predict what objects are present and where they are located. YOLO operates by using a single convolutional network that simultaneously predicts multiple bounding boxes and the probability of each class within those boxes. It has 24 convolutional layers, four max-pooling layers, and two fully connected layers, as illustrated in Figure 3. It is trained on images to optimize detection performance. The architecture works as follows:

a. The input image is resized to 448x448 before being processed by the convolutional network.
b. A 1x1 convolution is initially applied to reduce the number of channels, followed by a 3x3 convolution to generate a cuboidal output.
c. The ReLU activation function is used throughout, except for the final layer, which uses a linear activation function.
d. Additional techniques, such as batch normalization and dropout, are employed to regularize the model and prevent overfitting.

Figure 3: YOLOv5 architecture

4 Proposed procedures

The proposed Mask R-CNN comprises three primary stages. First, it uses the Darknet-53 architecture to extract features. Second, it uses the input image to derive the coordinates of Regions of Interest (RoIs) using the Region Proposal Network (RPN) approach. Finally, it predicts the class of the discovered objects, revealing information about the RoI sites. This procedure yields a mask that highlights areas suggestive of fungal-induced skin disorders. The suggested Mask R-CNN technique, based on edge detection, is shown in Figure 4 for identifying the skin conditions in the dataset.

The YOLOv5 algorithm utilized in this study has multiple phases for object detection. Using PyTorch as a feature extractor, YOLOv5 detects objects by classifying them and locating them based on the extracted features. The goal of YOLOv5 feature extraction is to supply input variables for the classification procedure. The suggested YOLOv5 algorithm architecture is displayed in Figure 5.
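The single-pass prediction described above can be illustrated by decoding one grid-cell output into image coordinates. This sketch assumes the classic 7x7-grid YOLO head over a 448x448 input; the prediction values are illustrative:

```python
# Sketch: decoding a YOLO-style grid-cell prediction into image coordinates.
# YOLO predicts, per grid cell, a box center (x, y) relative to that cell and
# a width/height relative to the whole image.
def decode_cell(row, col, pred, grid=7, img=448):
    """pred = (x, y, w, h, conf) with x, y in [0, 1] relative to the cell."""
    x, y, w, h, conf = pred
    cell = img / grid                 # 448 / 7 = 64 pixels per cell
    cx = (col + x) * cell             # absolute box center, x axis
    cy = (row + y) * cell             # absolute box center, y axis
    return cx, cy, w * img, h * img, conf

# Center of the middle-ish cell, box a quarter of the image wide and tall.
cx, cy, bw, bh, conf = decode_cell(3, 3, (0.5, 0.5, 0.25, 0.25, 0.9))
print(cx, cy, bw, bh)  # 224.0 224.0 112.0 112.0
```

Class probabilities predicted for the same cell are then combined with `conf` to score each box before thresholding.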
Figure 4: Proposed Mask R-CNN architecture

Figure 5: Proposed YOLOv5 architecture

5 Experiments on algorithm processing

5.1 Dataset description

This study's publicly accessible dataset is from http://www.dermnet.com/dermatology-pictures-skin-disease-pictures and consists of 1,473 images with 3 class labels of skin diseases: Dermatomycosis, Mucocutaneous Candidiasis, and Pityriasis Versicolor. Before splitting the dataset, it is first preprocessed to ensure each image is appropriate for labeling. Figure 6 shows sample images of skin conditions brought on by fungal infections.

5.2 Data pre-processing

Raw data needs to be treated first. Partitioning and labeling the dataset are the preprocessing steps. Labeling gives each object in the picture a name and ensures that it belongs to the appropriate class. After that, the image dataset is split into training and testing sets. The Mask R-CNN algorithm's labeling procedure entails object segmentation with a polygon tool, while the YOLO algorithm utilizes bounding boxes created with a bounding box tool. With 10% of the data for testing and the remaining 90% for training, the dataset was reduced to 1,136 images following the labeling phase. A ratio of 80% for training and 20% for testing was also used in the experiment. As with the 90/10 split, pre-processing led to a minor decrease in the overall dataset size due to data cleaning, leaving 1,136 images for this split as well and allowing its performance to be compared against the initial 90/10 split. Figure 7 provides an example of data labeling with the polygon tool, while Figure 8 provides a bounding box tool example.

Figure 6: Sample images of skin diseases caused by fungal infections

Figure 7: Data labeling on images using the polygon tool

5.3 Algorithm processing

Figure 9 illustrates the processing of both algorithms. Installing Python, TensorFlow, Keras, and other necessary software is part of the dependency installation process for the Mask R-CNN algorithm; for YOLOv5, deep learning packages such as PyTorch, NumPy, and Pandas were required. The dataset used to train the object detection algorithms is prepared during the data loading stage. The dataset needs to include pictures and properly formatted annotations (labels and bounding boxes) for the Mask R-CNN technique to function, and every object in the dataset needs to be labeled for the YOLOv5 algorithm to work. The training configurations for both algorithms are established in the configuration setting. The setup for the Mask R-CNN algorithm contains details about the number of iterations, batch size, number of classes, and other pertinent parameters. Configuration options for YOLOv5 include batch size, learning rate, and number of epochs. To improve object detection accuracy, Mask R-CNN performs gradient computations and updates model weights throughout the training phase; parameters of the YOLOv5 algorithm are likewise optimized. During the testing phase, fresh photos are used to perform object detection. For every object that is recognized, the Mask R-CNN algorithm produces bounding boxes and class labels. To evaluate the object detection performance of the YOLOv5 algorithm, a previously unseen dataset is used.

5.4 Algorithm evaluation

We assessed YOLOv5 and Mask R-CNN for object detection in this study because of their high accuracy and effectiveness in managing related tasks.
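The 90/10 partition described in the pre-processing step (1,136 labeled images, 10% held out for testing) can be sketched as follows; the seed and file names are illustrative:

```python
# Sketch of the 90/10 train/test partition applied to the 1,136 labeled images.
import random

def split_dataset(items, test_frac=0.10, seed=42):
    """Shuffle deterministically, then hold out test_frac of the items."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    return items[n_test:], items[:n_test]   # (train, test)

images = [f"img_{i:04d}.jpg" for i in range(1136)]
train, test = split_dataset(images)
print(len(train), len(test))  # 1023 113
```

The resulting 1,023/113 counts match the training and testing set sizes reported in the evaluation section.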
These algorithms were selected due to their resilience and efficacy in object identification and classification, particularly in situations requiring quick and accurate detection. Although these techniques are the focus of our study, we acknowledge the possibility of expanding it to include other highly regarded object detection algorithms, such as SSD (Single Shot MultiBox Detector), EfficientDet, and Faster R-CNN. We consider these algorithms for future research initiatives because they may offer further insights into comparative performance on our dataset.

Figure 9: The notion of algorithmic processing

6 Results and discussion

Once training and testing are finished, the assessment step is carried out to gauge the effectiveness of the YOLOv5 and Mask R-CNN algorithms. There are two phases to the evaluation: training and testing. During the training phase, both methods employed 1,023 photos from the dataset. During the testing phase, 113 different photos are used to assess the algorithms. At this stage, the algorithms' object detection performance is evaluated, and a confusion matrix is used to calculate the algorithms' accuracy. Numerous significant performance indicators, including accuracy, precision, recall, F1-score, mean average precision (MAP), and area under the curve (AUC), can be obtained from the confusion matrix. Five rounds of performance evaluation are granted for both algorithms. We test the Mask R-CNN algorithm with various iteration settings and thresholds: 1000, 1500, 2000, 2500, and 3000 iterations, with threshold values varying between 0.1 and 0.9 for every iteration setting. The YOLOv5 algorithm, on the other hand, uses distinct epochs and threshold values: the employed epoch values are 50, 75, 100, 125, and 150, with threshold values from 0.1 to 0.9 for each.

a. Mask R-CNN algorithm: After five tests, the Mask R-CNN algorithm identified 80 data labels for Dermatomycosis (D), 19 for Mucocutaneous Candidiasis (MC), and 0 for Pityriasis Versicolor (PV). A total of 113 photos were positively detected. Table 1 presents the performance calculation for the Mask R-CNN method. The algorithm uses 3000 iterations and varies the threshold (T) from 0.1 to 0.9. The F1-score, the harmonic mean of precision and recall, is the criterion for identifying the optimal model across threshold values between 0.1 and 0.9. According to Table 1, the maximum F1-score of 0.28 is attained at the 0.1 threshold; at this threshold, the precision is 49%, the recall is 19%, and the accuracy is 67%.

Figure 8: Data labeling on images using the bounding box tool

Table 1: Performance of Mask R-CNN with 3000 iterations (per-class values for D, MC, and PV, with MAP their mean)

| T | Accuracy D/MC/PV/MAP | Recall D/MC/PV/MAP | Precision D/MC/PV/MAP | F1-Score D/MC/PV/MAP |
|---|---|---|---|---|
| 0.1 | 0.39/0.70/0.92/0.67 | 0.35/0.23/0.00/0.19 | 0.87/0.59/0.00/0.49 | 0.50/0.33/0.00/0.28 |
| 0.2 | 0.40/0.72/0.93/0.68 | 0.35/0.19/0.00/0.18 | 0.91/0.63/0.00/0.51 | 0.51/0.29/0.00/0.27 |
| 0.3 | 0.40/0.73/0.94/0.69 | 0.34/0.15/0.00/0.16 | 0.92/0.58/0.00/0.50 | 0.50/0.24/0.00/0.25 |
| 0.4 | 0.40/0.73/0.94/0.69 | 0.33/0.14/0.00/0.16 | 0.95/0.69/0.00/0.55 | 0.49/0.23/0.00/0.24 |
| 0.5 | 0.40/0.74/0.95/0.70 | 0.33/0.15/0.00/0.16 | 0.96/0.73/0.00/0.56 | 0.49/0.25/0.00/0.25 |
| 0.6 | 0.40/0.75/0.96/0.70 | 0.32/0.14/0.00/0.15 | 0.96/0.71/0.00/0.56 | 0.48/0.23/0.00/0.24 |
| 0.7 | 0.39/0.75/0.96/0.70 | 0.31/0.12/0.00/0.14 | 0.96/0.67/0.00/0.54 | 0.47/0.20/0.00/0.22 |
| 0.8 | 0.38/0.77/0.97/0.71 | 0.31/0.13/0.00/0.15 | 0.96/0.67/0.00/0.54 | 0.47/0.22/0.00/0.23 |
| 0.9 | 0.33/0.81/0.98/0.71 | 0.26/0.12/0.00/0.13 | 0.91/0.50/0.00/0.47 | 0.40/0.19/0.00/0.20 |

The area under the ROC graph is then computed using the AUC value.
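The per-class scores in Table 1 all derive from confusion-matrix counts. A minimal sketch of those derivations, with F1 as the harmonic mean of precision and recall (the counts below are illustrative, not the paper's data):

```python
# Sketch: classification metrics from confusion-matrix counts.
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) for one class."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

acc, p, r, f1 = metrics(tp=28, fp=29, fn=119, tn=121)
print(round(p, 2), round(r, 2))  # 0.49 0.19
print(round(f1, 3))
```

Note how a high precision combined with a low recall still yields a low F1, which is exactly the pattern seen in Table 1.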
It is employed as a performance evaluation statistic to gauge a classification model's effectiveness: a higher AUC score indicates better model performance in differentiating between positive and negative classes. In the fifth test, an AUC value of 0.55 is displayed on the ROC graph of the Mask R-CNN method, as depicted in Figure 10.

Figure 10: ROC following the fifth Mask R-CNN algorithm test

b. YOLOv5 algorithm: By the fifth test, the YOLOv5 algorithm had accurately identified 113 images; it identified 86 data labels for Pityriasis Versicolor, 39 for Mucocutaneous Candidiasis, and 86 for Dermatomycosis. The performance testing at epoch 150 with a threshold value of 0.1 is displayed in Table 2. According to Table 2, the greatest F1-score is 0.81, at the 0.1 threshold, with 86% accuracy, 80% recall, and 81% precision, and an AUC value of 0.88. Figure 11 displays the ROC graph of the YOLOv5 algorithm in the fifth test.

Table 2: Performance of YOLOv5 with 150 epochs (per-class values for D, MC, and PV, with MAP their mean)

| T | Accuracy D/MC/PV/MAP | Recall D/MC/PV/MAP | Precision D/MC/PV/MAP | F1-Score D/MC/PV/MAP |
|---|---|---|---|---|
| 0.1 | 0.77/0.83/0.99/0.86 | 0.77/0.72/0.92/0.80 | 0.71/0.92/0.80/0.81 | 0.80/0.71/0.92/0.81 |
| 0.2 | 0.74/0.86/0.99/0.86 | 0.68/0.72/0.92/0.77 | 0.78/0.92/0.84/0.85 | 0.76/0.75/0.92/0.81 |
| 0.3 | 0.72/0.86/0.98/0.85 | 0.62/0.69/0.83/0.71 | 0.80/0.91/0.85/0.85 | 0.72/0.74/0.87/0.78 |
| 0.4 | 0.67/0.85/0.98/0.83 | 0.52/0.62/0.82/0.65 | 0.83/0.90/0.86/0.86 | 0.65/0.71/0.86/0.74 |
| 0.5 | 0.62/0.83/0.97/0.81 | 0.42/0.53/0.50/0.48 | 0.85/1.00/0.89/0.91 | 0.57/0.65/0.67/0.63 |
| 0.6 | 0.58/0.80/0.96/0.78 | 0.33/0.42/0.33/0.36 | 0.89/1.00/0.92/0.94 | 0.49/0.57/0.50/0.52 |
| 0.7 | 0.53/0.75/0.93/0.74 | 0.23/0.25/0.00/0.16 | 1.00/0.00/1.00/0.67 | 0.37/0.40/0.00/0.26 |
| 0.8 | 0.44/0.69/0.93/0.69 | 0.08/0.07/0.00/0.05 | 1.00/0.00/1.00/0.67 | 0.15/0.13/0.00/0.09 |
| 0.9 | 0.40/0.67/0.93/0.67 | 0.02/0.00/0.00/0.01 | 0.00/0.00/1.00/0.33 | 0.04/0.00/0.00/0.01 |

c. Evaluation of proposed algorithms: The Mask R-CNN and YOLOv5 algorithms can be compared using the calculation results obtained from the algorithm testing for detecting skin problems caused by fungal infections, which contained 113 data points from three different skin conditions. Table 3 displays the comparison values. In every metric examined, YOLOv5 outperforms Mask R-CNN, including accuracy (0.87), recall (0.80), precision (0.85), F1-score (0.81), and AUC (0.88).

The variety of fungal infections in appearance, size, form, and texture makes it particularly difficult to diagnose skin illnesses caused by these infections. Algorithms that process medical pictures accurately and efficiently are necessary for effective detection. Here, the effectiveness of two widely used object detection algorithms, YOLOv5 and Mask R-CNN, is compared.

By adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression, Mask R-CNN expands upon Faster R-CNN. The two-stage method of Mask R-CNN, which includes region proposal and refinement, enables very accurate object identification and segmentation. This can be beneficial when precise infection borders are critical in medical imaging. Dermatologists may find the capacity to create segmentation masks especially helpful in diagnosing and treating infections, since they offer comprehensive details about the affected regions. However, because of its multi-stage processing, Mask R-CNN requires a lot of computing power. This may lead to slower inference and longer training times, which could be problematic for real-time applications or for handling big datasets.
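The AUC values reported above summarize the ROC curve numerically; a common way to compute it from (false-positive-rate, true-positive-rate) points collected at several thresholds is the trapezoidal rule. A minimal sketch with illustrative points:

```python
# Sketch: area under an ROC curve via the trapezoidal rule.
def auc(points):
    """points: iterable of (fpr, tpr) pairs; returns the trapezoidal area."""
    pts = sorted(points)                       # sort by false-positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0    # trapezoid between points
    return area

# Illustrative ROC points; a random classifier's diagonal would give 0.5.
roc = [(0.0, 0.0), (0.2, 0.6), (0.5, 0.8), (1.0, 1.0)]
print(auc(roc))
```

An AUC near 0.5 (as for Mask R-CNN here) means the classifier barely separates the classes; values near 1.0 (as for YOLOv5) indicate strong separation.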
Table 3: Performance comparison of the proposed algorithms (M = Mask R-CNN, Y = YOLOv5)

| Iterations (M) | Epochs (Y) | Threshold M/Y | Predictions M/Y | Accuracy M/Y | Recall M/Y | Precision M/Y | F1-Score M/Y | AUC M/Y |
|---|---|---|---|---|---|---|---|---|
| 1000 | 50 | 0.6/0.1 | 176/163 | 0.79/0.84 | 0.34/0.76 | 0.60/0.76 | 0.43/0.75 | 0.61/0.81 |
| 1500 | 75 | 0.2/0.1 | 179/179 | 0.73/0.87 | 0.42/0.76 | 0.36/0.85 | 0.38/0.78 | 0.61/0.88 |
| 2000 | 100 | 0.1/0.1 | 220/183 | 0.67/0.86 | 0.22/0.78 | 0.45/0.82 | 0.30/0.80 | 0.55/0.87 |
| 2500 | 125 | 0.7/0.1 | 199/161 | 0.80/0.87 | 0.40/0.75 | 0.67/0.85 | 0.45/0.79 | 0.57/0.84 |
| 3000 | 150 | 0.1/0.1 | 260/183 | 0.67/0.86 | 0.19/0.80 | 0.49/0.81 | 0.28/0.81 | 0.55/0.88 |

Our proposed technique, YOLOv5, is a quick and efficient single-stage object detection algorithm that outperforms two-stage detectors such as Mask R-CNN. Large-scale image analysis and real-time detection scenarios benefit greatly from this efficiency. Although it might not offer as much segmentation depth as Mask R-CNN, YOLOv5 has a very high degree of object detection and classification accuracy. It captures items of different sizes with the aid of anchor boxes and multi-scale predictions, which is essential for identifying various …

Additionally, YOLOv5 produced higher recall and precision metrics, demonstrating its efficacy in reducing false positives and false negatives. This is important for medical diagnosis, since misdiagnosing a healthy area as sick (false positive) or failing to detect an infection (false negative) can have serious repercussions. This implies that YOLOv5 has a higher degree of accuracy when it comes to recognizing infected regions in the pictures. Furthermore, the high AUC suggests that YOLOv5 performs better across a range of threshold values in differentiating between infected and non-infected areas. Although Mask R-CNN provides thorough segmentation, the comparison indicates that for this specific task the benefits of segmentation do not outweigh YOLOv5's higher detection accuracy and efficiency. However, in some clinical situations where precise infection boundaries are required, Mask R-CNN's segmentation function might still be useful.

Because YOLOv5 processes information more quickly, it is better suited for real-world uses where timely findings are crucial, like automated screening systems in healthcare settings. According to the comparison analysis, YOLOv5 is a more sensible option for large-scale screening and real-time applications. Nonetheless, the needs of the application, such as the necessity for segmentation versus the requirement for quick and precise identification, should be considered when choosing between the two algorithms.

Figure 11: ROC following the fifth test of the YOLOv5 algorithm

7 Conclusion

There are notable variations in performance parameters, including accuracy, recall, precision, F1-score, and AUC, when the YOLOv5 and Mask R-CNN algorithms are compared for identifying fungal diseases on the skin. The disparities arise from variations in iteration (or epoch) values, which affect the algorithms' capacity to acquire knowledge and generalize from the training set. The results of the performance tests indicate that algorithm performance is influenced by the epoch or iteration values across the first to fifth tests of the Mask R-CNN and YOLOv5 algorithms. The second test had the highest AUC value, with 1500 iterations of the Mask R-CNN method and 75 epochs of the YOLOv5 algorithm. With an AUC value of 66% and an F1-score of 38%, the Mask R-CNN algorithm at 1500 iterations is less successful at identifying diseases caused by fungal infections of the skin. On the other hand, with an AUC value of 88% and an F1-score of 78%, the YOLO algorithm's test results demonstrate its good ability to identify such diseases, as evidenced by its ability to forecast 179 disorders at epoch 75.

The thorough investigation demonstrates that YOLOv5 performs better than Mask R-CNN in terms of accuracy, recall, precision, F1-score, and AUC when it comes to identifying fungal diseases on the skin. Iteration and epoch settings have a significant impact on performance; YOLOv5 shows the best performance at 75 epochs. Mask R-CNN is less suited for this application due to its computational intensity and lower detection accuracy, even though it has segmentation capabilities. As a result, in both clinical and real-world contexts, YOLOv5 is the recommended algorithm for identifying fungal-induced skin disorders due to its efficiency and accuracy. Future research can continue to enhance the detection accuracy and practical applicability of deep learning models for diagnosing skin fungal infections.

References

[1] Majaranta, P. et al. 2019. Eye Movements and Human-Computer Interaction. In: Klein, C., Ettinger, U. (eds) Eye Movement Research. Studies in Neuroscience, Psychology, and Behavioral Economics. Springer, Cham, pp. 971-1015. https://doi.org/10.1007/978-3-030-20085-5_23
[2] Al Bshabshe, A. et al. 2023. An Overview of Clinical Manifestations of Dermatological Disorders in Intensive Care Units: What Should Intensivists Be Aware of? Diagnostics 13, no. 7, 1290. https://doi.org/10.3390/diagnostics13071290
[3] Nigat, T.D., Sitote, T.M., Gedefaw, B.M. 2023. Fungal Skin Disease Classification Using the Convolutional Neural Network. J. Healthc. Eng. 6370416, pp. 1-9. https://doi.org/10.1155/2023/6370416
[4] Badia, M. et al. 2020. Dermatological Manifestations in the Intensive Care Unit: A Practical Approach. Crit. Care Res. Pract. 2020:9729814. https://doi.org/10.1155/2020/9729814
[5] Altammami, G. S. et al. 2024. Dermatological Conditions in the Intensive Care Unit at a Tertiary Care Hospital in Riyadh, Saudi Arabia. Saudi Medical Journal vol. 45, 8, pp. 834-839. https://doi.org/10.15537/smj.2024.45.8.20240479
[6] Khavandi, S. et al. 2023. Investigating the Impact of Automation on the Health Care Workforce Through Autonomous Telemedicine in the Cataract Pathway: Protocol for a Multicenter Study. JMIR Research Protocols vol. 12, e49374, pp. 1-10. https://doi.org/10.2196/49374
[7] Roy, K. S. et al. 2019. Skin Disease Detection Based on Different Segmentation Techniques. 2019 Int. Conf. Opto-Electronics Appl. Opt. (Optronix), pp. 1-5. https://doi.org/10.1109/OPTRONIX.2019.8862403
[8] Archana, R. and Jeevaraj, P.S.E. 2024. Deep learning models for digital image processing: a review. Artif. Intell. Rev. 57, 11. https://doi.org/10.1007/s10462-023-10631-z
[9] Thakur, G. K. et al. 2024. Deep Learning Approaches for Medical Image Analysis and Diagnosis. Cureus vol. 16, 5, e59507, pp. 1-8. https://doi.org/10.7759/cureus.59507
[10] Bhatti, H.M.A. et al. 2020. Multi-detection and Segmentation of Breast Lesions Based on Mask RCNN-FPN. Proc. 2020 IEEE Int. Conf. Bioinforma. Biomed. (BIBM 2020), pp. 2698-2704. https://doi.org/10.1109/BIBM49941.2020.9313170
[11] Subash, K.V.V. et al. 2020. Object Detection Using Ryze Tello Drone With Help of Mask-RCNN. 2nd Int. Conf. Innov. Mech. Ind. Appl. (ICIMIA 2020), pp. 484-490. https://doi.org/10.1109/ICIMIA48430.2020.9074881
[12] Zhou, Z. et al. 2020. Detection and Classification of Multi-Magnetic Targets Using Mask-RCNN. IEEE Access 8, pp. 187202-187207. https://doi.org/10.1109/access.2020.3030676
[13] Ieamsaard, J. et al. 2021. Deep Learning-based Face Mask Detection Using YoloV5. Proc. 2021 9th International Electrical Engineering Congress (iEECON 2021), pp. 428-431. https://doi.org/10.1109/iEECON51072.2021.9440346
[14] Yang, G. et al. 2020. Face Mask Recognition System with YOLOV5 Based on Image Recognition. 2020 IEEE 6th International Conference on Computer and Communications (ICCC 2020), pp. 1398-1404. https://doi.org/10.1109/ICCC51575.2020.9345042
[15] Ajith, A. et al. 2017. Digital Dermatology: Skin Disease Detection Model using Image Processing. Int. Conf. Intell. Comput. Control Syst. (ICICCS 2017), pp. 168-173. https://doi.org/10.1109/iccons.2017.8250703
[16] Haddad, A. and Hameed, S.A. 2018. Image Analysis Model for Skin Disease Detection: Framework. Proc. 2018 7th Int. Conf. Comput. Commun. Eng. (ICCCE 2018), pp. 280-283. https://doi.org/10.1109/ICCCE.2018.8539270
[17] Rathod, J. et al. 2018. Diagnosis of skin diseases using Convolutional Neural Networks. Proc. 2nd Int. Conf. Electron. Commun. Aerosp. Technol. (ICECA 2018), pp. 1048-1051. https://doi.org/10.1109/ICECA.2018.8474593
[18] Rohaziat, N. et al. 2020. White Blood Cells Detection using YOLOv3 with CNN Feature Extraction Models. International Journal of Advanced Computer Science and Applications 11. https://doi.org/10.14569/IJACSA.2020.0111058
[19] Wu, Z. et al. 2019. Studies on Different CNN Algorithms for Face Skin Disease Classification Based on Clinical Images. IEEE Access 7, pp. 66505-66511. https://doi.org/10.1109/ACCESS.2019.2918221
[20] Li, L. F. et al. 2020. Deep Learning in Skin Disease Image Recognition: A Review. IEEE Access 8, pp. 208264-208280. https://doi.org/10.1109/ACCESS.2020.3037258
[21] Salama, A. M. et al. 2024. A YOLO-based Deep Learning Model for Real-Time Face Mask Detection via Drone Surveillance in Public Spaces. Information Sciences vol. 676, 120865. https://doi.org/10.1016/j.ins.2024.120865
[22] Chhatlani, J. et al. 2022. DermaGenics - Early Detection of Melanoma using YOLOv5 Deep Convolutional Neural Networks. 2022 IEEE Delhi Section Conference (DELCON), pp. 1-6. https://doi.org/10.1109/DELCON54057.2022.9753227
[23] Wiliani, N. et al. 2023. Identifying Skin Cancer Disease Types with You Only Look Once (YOLO) Algorithm. Jurnal Riset Informatika 5(3), pp. 455-464. https://doi.org/10.34288/jri.v5i3.241
[24] Aishwarya, N. et al. 2023. Skin Cancer diagnosis with Yolo Deep Neural Network. Procedia Computer Science vol. 220, pp. 651-658. https://doi.org/10.1016/j.procs.2023.03.083
[25] He, K. et al. 2020. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. vol. 42, no. 2, pp. 386-397. https://doi.org/10.1109/TPAMI.2018.2844175
[26] Shi, W. et al. 2019. Plant-part segmentation using deep learning and multi-view vision. Biosyst. Eng. vol. 187, pp. 81-95. https://doi.org/10.1016/j.biosystemseng.2019.08.014
Deep Learning in Skin Disease vol. 187, no. September, pp. 81–95. Image Recognition: A Review. IEEE Access. vol. 8, https://doi.org/10.1016/j.biosystemseng.2019.08.01 pp. 208264-208280. 4 https://doi.org/10.1109/ACCESS.2020.3037258 [21] Salama, A. M. et al. 2024. A YOLO-based Deep Learning Model for Real-Time Face Mask Detection via Drone Surveillance in Public Spaces, Information Sciences, vol. 676, 120865. https://doi.org/10.1016/j.ins.2024.120865 https://doi.org/10.31449/inf.v49i16.4974 Informatica 49 (2025) 97–114 97 A Unified Trace Meta-Model for Alignment and Synchronization of BPMN and UMLModels Aljia Bouzidi1, Nahla Haddar2 and Kais Haddar2 1ISIMM University of Monastir, Monastir, Tunisia 2University of Sfax, Sfax, Tunisia E-mail: aljia.bouzidi@gmail.com, nahla_haddar@yahoo.fr, kais.haddar@yahoo.fr Keywords: Traceability, synchronization alignment, use case diagram, class diagram, MVC, BPMN diagram, integration, transformation Received: February 24, 2025 Organizations often face information system (IS) failures due to misalignment with business goals. Business process models (BPMs) play a crucial role in addressing this issue but are often developed independently of IS models (ISMs), resulting in non-interoperable systems. This paper proposes a traceability method to link BPMs and ISMs, bridging the gap between business and software domains. We introduce a unified trace meta-model integrating BPMN elements with UML constructs (use cases and class diagrams) via traceability links. This meta-model is instantiated as the BPMNTraceISM diagram, ensuring seamless integration through bidirectional transformation models. To validate our approach, we developed a graphical editor for BPMNTraceISM diagrams and implemented transformations using the ATLAS Transformation Language (ATL). 
A case study on a loan approval process demonstrates the method's effectiveness in aligning BPMN and UML elements, improving interoperability and model alignment across domains.

Povzetek: Razvit je enoten sledilni meta-model, ki povezuje elemente BPMN in UML (diagram primerov uporabe, razredov po vzorcu MVC) za uskladitev poslovnih procesov in informacijskih sistemov, ki ga validirajo z grafičnim urejevalnikom in transformacijami ATL.

1 Introduction

In the software engineering field, Business Process Models (BPMs) play an increasingly central role in the development and continued management of software systems. Therefore, it is crucial to have Information System Models (ISMs) that take BPMs into account. However, these models are mostly expressed using different modeling languages, and only a few information systems (IS) are developed with explicit consideration of the business processes they are supposed to support. This separation causes gaps between business and IS models. Thus, a methodology is needed to examine the gap between BPMs and ISMs and keep them aligned even as they evolve. Traceability in software development has proved its ability to associate overlapping artefacts of heterogeneous models (for example, business models, requirements, use cases, design models) and to improve project results by helping designers and other stakeholders with common tasks such as the analysis of change impacts. An explicit traceability model is therefore not a standalone guideline: it brings significant benefits in terms of quality, automation, and consistency. Although creating one is not a trivial task, an explicit traceability model remains a reference for a consistent definition of typed traceability links between heterogeneous model concepts, helping to ensure their alignment and coevolution.

In our previous work presented in [1], we proposed a relevant explicit traceability model and defined it using the integration mechanism. Indeed, we propose a requirements engineering method that works at both the meta-model and model levels, establishing traceability between BPMs and ISMs to bridge the gap between business modeling and requirements elicitation. This method is deliberately influenced by the Object Management Group (OMG) specifications. Particular attention is given to UML use case models [2] as the most commonly used way to elicit software needs, and to BPMN [3] as the most widely used language to specify business process models. Indeed, in [1] we first defined a unified trace meta-model of the BPMN and the UML use case models in the form of an integrated single meta-model. It also defines traceability links between interrelated concepts to correlate overlapped concepts as new modeling concepts. This meta-model is then instantiated in the form of a new diagram that we called BPSUC (Business Process Supported Use Cases). This new diagram permits business teams and requirements design teams to work together within the same model, and allows specifying trace links graphically.

The practical benefits of the proposed method lie in its ability to bridge the gap between business process management (BPM) and software systems development. In the context of Business Process Models (BPMs) and Information System Models (ISMs), this method enables seamless integration and traceability across heterogeneous models, which is crucial for ensuring their alignment and coevolution. By establishing clear and accurate traceability links between BPMs, UML use case diagrams, and class diagrams, our method enhances communication and collaboration among business analysts, software engineers, and stakeholders, ensuring that software systems are developed with a clear understanding of the business processes they aim to support. Furthermore, the integration of these models, coupled with the explicit traceability model, provides several practical advantages, including improved change impact analysis, enhanced automation, and consistency maintained throughout the system lifecycle. These benefits are particularly significant in dynamic environments where both business processes and software systems evolve frequently. Thus, the method not only improves the quality of the development process but also provides a robust framework for aligning business and software models, ensuring their cohesive adaptation to changing requirements and system developments.

This paper enriches and extends our work presented in [1]. The enrichment involves adding class diagram concepts structured according to the MVC pattern. Our intervention considers both the meta-model and the model levels. Hence, in the integrated trace meta-model proposed in [1], we add new modeling concepts to express trace links between the class diagram, use case diagram, and BPMN concepts. Class diagram concepts that have no corresponding concepts are also included in the integrated trace meta-model. The proposed traceability concepts and class diagram concepts are instantiated in the BPSUC diagram. Accordingly, BPSUC now enables the design of class diagram elements and the proposed traceability concepts combined with their corresponding BPMN and use case diagram artefacts.

We validate our theoretical method by implementing a visual modeling tool to support the enriched integrated trace meta-model and the new diagram supplemented with class diagram elements.
The rest of this paper is organized as follows: Section 2 discusses related work. In Section 3, we give an overview of the method presented in [1]. Section 4 explains our contributions. Sections 5 and 6 demonstrate the feasibility of our proposal in practice and through a topical case study. In Section 7, we evaluate and discuss our method. Finally, in Section 8, we conclude the current work and give some outlooks.

2 Related work

We classify related work into two groups based on the methodologies used to establish traceability between elements of heterogeneous models: (1) works that have proposed transformation models to define internal or implicit traceability models, and (2) approaches that define external traceability models manually, based on mechanisms such as model integration, model merge/composition, UML profiles, or matrices.

2.1 Traceability via transformation models

In the first category, existing implicit traceability models are commonly MDA-compliant approaches that define traceability through exogenous, endogenous, horizontal, or vertical transformation models. In these approaches, BPMN models are widely used to generate alternative models through different transformation model types. Among the various uses of BPMN models are: an exogenous transformation for mapping users'/organizations' requirements to BPMN models [4]; a vertical transformation for the generation of artefacts between BPMN and user stories [5] and [6], and the generation of UML models [7]; a horizontal and exogenous transformation for the generation of activity diagrams from BPMN [8] and [9]; and a vertical transformation of textual requirements into a BPMN model [10]. Some approaches define endogenous transformations between UML diagram elements to establish their traceability. For instance, the approach in [11] uses machine learning techniques to maintain traceability information between software models; its focus is particularly on the requirements, analysis, and design models, which are specified in UML. To trace links between requirements documents and UML diagrams, several approaches use Natural Language Processing (NLP). For example, the approach in [12] uses a system requirement description expressed in natural language to extract the actors and the actions automatically.

The core benefit of defining implicit traceability is that it does not require supplemental effort, because a single chain of transformations is sufficient to perform transformations in both directions. Moreover, it offers multiple trace links between generated artefacts. However, the identified trace links consider exclusively transformed artefacts. Moreover, the transformation chain is static and cannot be updated to obtain the traces required for particular traceability scenarios.

2.2 Explicit traceability models

The second category includes approaches that define explicit traceability models separate from the source models. This category includes approaches that propose guidelines for creating traceability models. For instance, the author of [13] defines a method for guiding the establishment of traceability between software requirements and UML diagrams. This guideline has two main components: (i) a meta-model and (ii) a process step. The process step defines the detailed processes, the mapping of requirements to UML diagrams, and the types of requirements. Requirements can be classified according to their aspects; this classification can be carried out according to the type of requirement, which then requires the use of certain types of UML diagrams. However, this guideline focuses only on establishing traceability at the meta-model level. Moreover, the business field is not considered in this work. The authors of [14] propose a meta-model-based approach to create traceability links between different levels of the same system. Indeed, this approach focuses on defining a traceability meta-model for source code stored as an Abstract Syntax Tree (AST) and other possible artefacts such as requirements, test cases, etc. To show the identified trace links, the authors develop an editor. Nevertheless, storing the source code of a system as an AST can cause several problems, such as the appearance of syntax errors in the source code, which leads to the loss of traceability links. There is other model-based research that aims to maintain traceability. For example, the research in [15] proposes a co-evolution of transformations based on change propagation. Its hypothesis is that knowledge of the evolution of meta-models can be disseminated by decisions aimed at driving the co-evolution of transformations. To address particular cases, the authors present composition-based techniques that help developers compose resolutions that meet their needs. For the same purpose, the approach in [11] uses machine learning techniques to introduce an approach called TRAIL (TRAceability lInk cLassifier). The classifier is trained on a dataset that contains histories of existing traceability links between pairs of artefacts, in order to output the trace link status (related or unrelated) of any future pair of artefacts (new or already existing). Some other approaches define traceability models for eliciting requirements of complex systems [16] and [17]. Likewise, the authors of [19] base their work on deep learning techniques and propose a neural network architecture based on word embeddings and Recurrent Neural Network (RNN) algorithms to predict trace links automatically. The output of this model is a vector that contains the semantic data of the artefact. The trained model then compares the semantic vectors of a pair of artefacts and predicts whether they are related or unrelated. However, considering all meta-models from many different abstraction levels in one unified traceability model is not a trivial task and can result in very complex models.

In [18], the authors propose an approach to promote traceability and synchronization of computational models in an Enterprise Architecture (EA), using meta-models, model traceability, and synchronization structures. The authors represent the meta-models of the EA at all abstraction levels (strategic, tactical, and operational). These levels are denoted within the integrated meta-model by three packages, and each package incorporates the core concepts of the level it represents. They integrate the three meta-models by adding alignment points between them. In addition, they define a traceability framework and a synchronization framework to support the analysis of the impact of organizational changes.

There are also studies on specific languages. For example, the approach in [20] uses Natural Language Processing techniques to define a framework for managing traceability between software artefacts. To demonstrate their work in practice, the authors develop a tool that supports traceability links between software models, including requirements and UML class diagrams, and source code written in the Java programming language.

2.3 Identified gaps in existing works

Overall, existing works that define explicit traceability models are mostly focused on the meta-model level only and ignore the model level. Moreover, existing explicit models establish traceability either between software models expressed in UML diagrams at the same or different abstraction levels, or between business model artefacts. However, none of the existing approaches has achieved successful results in establishing or maintaining traceability between BPMN models, UML use case models, and the UML class diagram.

The disadvantages of the proposed approaches stem from rigid relationship types that fail to adapt to the changing needs and practices of organizations. Furthermore, most of the proposed approaches define or use very generic traceability meta-models, capable of generating only highly abstract trace models. In practice, there is no prescription for how to add customized tracing information or how to adapt a generic traceability meta-model to express valuable, context-specific traces. Concerning the approaches that focus on concrete modeling languages, to our knowledge there is no approach that proposes an explicit trace model or meta-model between BPMN and UML models, even though they are the most popular standards for modeling business processes and automated information systems.
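The machine-learning approaches surveyed above (TRAIL [11], the RNN model of [19]) treat trace-link recovery as binary classification over pairs of artefacts. As a purely illustrative sketch (not the TRAIL or RNN implementation), the idea can be approximated with a simple lexical similarity over artefact names, where the threshold value is an assumption:

```python
# Illustrative sketch only: trace-link recovery as binary classification
# over artefact pairs, approximated here by Jaccard similarity of name
# tokens with an assumed threshold (real approaches learn this decision).
import re

def tokens(name: str) -> set:
    """Split an artefact name such as 'AssessCreditScore' into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)
    return {p.lower() for p in parts}

def predict_link(bpmn_artefact: str, uml_artefact: str, threshold: float = 0.3) -> str:
    """Return 'related' or 'unrelated' for a candidate trace-link pair."""
    a, b = tokens(bpmn_artefact), tokens(uml_artefact)
    jaccard = len(a & b) / len(a | b) if a | b else 0.0
    return "related" if jaccard >= threshold else "unrelated"

print(predict_link("Assess Credit Score", "CreditScore"))   # related
print(predict_link("Approve Loan", "CustomerSupport"))      # unrelated
```

A learned classifier replaces the fixed threshold with a decision trained on histories of known trace links, which is precisely what distinguishes TRAIL-style approaches from such a heuristic.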
3 Background of our previous traceability method

The method presented in this paper is an extension of our previous work [1]. In that work, we explored the advantages of defining an integrated traceability model to establish traceability between the BPMN and the UML use case models and to ensure their coevolution once a change has occurred. This method acts at both the meta-model and the model level, and it includes three core steps:

(i) First, we defined an integrated trace meta-model that is a specification of traceability between the existing artefacts, while keeping them unchanged and independent. This integrated trace meta-model contains all the BPMN and UML use case meta-model artefacts (meta-classes and associations), unified with new meta-classes and associations for expressing traceability links at the meta-model level. The integrated trace meta-model favors simplicity and uniformity, because the source meta-models are kept and unified with their traceability information in one unified meta-model.

(ii) Next, we instantiated the integrated trace meta-model at the model level. We represent it as a new diagram called Business Process Modeling Notation Traces Use Case (BPSUC). This diagram incorporates the BPMN and the UML use case elements together with traceability links, and allows designing BPMN and use case diagram artefacts jointly. Moreover, visualizations and queries on traced elements are straightforward, because business analysts and software designers are now able to work together on one integrated model. BPSUC can also be used to analyse change impacts and validate them before propagating them to the source models.

(iii) Finally, we defined bidirectional transformation models between the BPSUC diagram and the source models (the BPMN and the use case models) to ensure the coevolution of the origin models.

3.1 Integrated trace meta-model

In our previous work presented in [1], a unified trace meta-model is proposed based on a semantic mapping of pairs of BPMN and use case meta-model artefacts. The definition of this meta-model follows this scenario: for each pair of overlapped BPMN and UML use case concepts, we add a new modeling concept that can be either a link, such as an association, a composition, or an inheritance, or a new meta-class. Each trace link represented by a new meta-class is associated with the pair of artefacts it specifies, generally by an inheritance relationship.

Table 1 summarizes the mappings between the use case diagram concepts (first column) and the BPMN concepts (second column), and the corresponding new meta-classes (third column) that are associated with them in the integrated meta-model (the full mapping and its explanation are available in [1]). To validate the proposed mapping further, we have conducted additional evaluations across a variety of BPMN and UML diagram scenarios.

Table 1: Mapping of BPMN, use case, and trace meta-model concepts

Use case concept | BPMN concept | Meta-model concept
Package | Non-empty lane (a lane including other sub-lanes) | Organisation Unit Package
Actor | Empty lane (a lane that does not contain other sub-lanes) | Organisation Unit Actor
Use case | Fragment represented by a sequence of BPMN artefacts that is performed by the same role and manipulates the same item-aware element (business object, input data, data store, data state) | UCsF
Extends | Exclusive gateway between two different fragments | Exclusive Gateway, Extends
Association | Fragment within the lowest nesting level of sub-lanes | Association
Includes | Redundant fragment (that appears multiple times in the BPMN model) | Includes
Extends | Inclusive gateway between two different fragments | Inclusive Gateway, Extends
Extension Point | Condition of a sequence flow + the name of the fragment that represents the extending use case | Extension Point

This expansion includes not only the core BPMN and UML elements such as activities, actors, and use cases, but also more complex diagrams, such as:

– BPMN models: event-driven processes, process variants, and sub-processes with different complexity levels, such as loan approval and inventory management systems.

– UML diagrams: class diagrams, including inheritance and association relationships, and more sophisticated use case diagrams representing different business functions, such as order fulfilment, customer support, and system maintenance.

These diverse scenarios have allowed us to assess how well the mapping between BPMN, UML use case, and class diagrams holds up in real-world business process and system modeling. By applying the proposed traceability method to these varied scenarios, we demonstrate the scalability and robustness of our meta-model. We have also provided examples where the mapping effectively handles the integration of different BPMN and UML model types, ensuring traceability between the business and software models.

The integrated meta-model is depicted in Figure 1. To keep it readable, we have presented in this figure only the core artefacts of the source meta-models (the use case meta-model and the BPMN meta-model) and all the trace links (meta-classes and associations). Dark grey meta-classes represent new meta-classes; light grey meta-classes represent UML use case elements; white meta-classes represent BPMN elements; and black lines represent existing associations from the source meta-models. The blue lines represent trace relationships, providing the foundational traceability between BPMN and UML elements, as further detailed in [1] and [22].

Figure 1: Traceability of BPMN and use case meta-model concepts

– Organizational-Unit-Package
In BPMN, a non-empty lane is a grouping element and therefore has the same meaning as a package in UML. Consequently, the Organizational-Unit-Package (OUPackage) is defined to trace the link between the pair BPMN non-empty lane and use case package, thereby defining an inheritance relationship between the new meta-class and this artefact pair.

– Organizational-Unit-Actor
In the integrated trace meta-model proposed in [1], we have defined a meta-class designated Organizational-Unit-Actor (OUActor). This new meta-class traces artefacts of the pair UML actor and BPMN empty lane (i.e. a lane that does not have embedded lanes). That is, it unifies the properties of a lane and an actor and combines them without changing their semantics, thereby defining the OUActor as a specialization of the UML actor and BPMN lane pair. In this way, OUActor inherits the properties of this pair of artefacts without updating their original semantics and structures. For example, in a loan approval process, the Loan Officer is represented in a BPMN diagram by an empty lane and in a UML use case diagram as an actor involved in the process. The OUActor meta-class links these representations, inheriting properties from both while preserving their original semantics. This ensures synchronization between BPMN and UML elements, offering a unified view of the Loan Officer's role across models.

– Use case supporting fragment
In order to support business objectives, a UML use case should be able to realize some business activities that are specified in the integrated trace meta-model by a Fragment. A separate specification of the use case and the fragment it is supposed to realise does not allow explicitly representing the semantic links between them. To do this, we defined in the integrated trace meta-model presented in [1] a new meta-class that we designate Use Case supporting Fragment (UCsF). This new meta-class is defined as a specialization of a UML use case in order to inherit all its properties without updating its initial meaning.
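The specialization idea behind OUActor and UCsF, where a trace meta-class inherits the properties of both members of an artefact pair without altering their semantics, can be sketched in code. This is only an illustration of the concept; the class names and attributes are assumptions, not the paper's implementation:

```python
# Sketch of the trace meta-class specialization idea (illustrative names):
# OUActor inherits from both the UML actor and the BPMN lane, so it carries
# the properties of the pair without changing either artefact's semantics.

class UMLActor:
    def __init__(self, name: str):
        self.name = name  # the actor's name in the use case diagram

class BPMNLane:
    def __init__(self, lane_name: str, sub_lanes=None):
        self.lane_name = lane_name
        self.sub_lanes = sub_lanes or []  # an empty lane has no embedded lanes

class OUActor(UMLActor, BPMNLane):
    """Organizational-Unit-Actor: traces a (UML actor, empty BPMN lane) pair."""
    def __init__(self, name: str):
        UMLActor.__init__(self, name)
        BPMNLane.__init__(self, lane_name=name)

loan_officer = OUActor("Loan Officer")
print(isinstance(loan_officer, UMLActor), isinstance(loan_officer, BPMNLane))
print(loan_officer.sub_lanes)  # [] — an empty lane has no embedded lanes
```

The design choice mirrors the meta-model: generalization/specialization lets the trace concept combine the usage of both artefacts while each keeps its original meaning.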
– Fragment
A fragment is defined by [22] as "a set of interrelated BPMN elements that has inputs and outputs, and which is executed by the same performer". This artefact is specified in the unified trace meta-model as an instance of the meta-class Fragment (cf. Figure 1). As a Fragment is simply an activity that can contain other BPMN concepts such as tasks, events, gateways, and sequence flows, we aggregated a BPMN sub-process to a Fragment by creating an aggregation relationship called fragments between the fragment and sub-process meta-classes in the integrated trace meta-model (cf. Figure 1). Its cardinality is 1-* to point out that a sub-process should contain at least one fragment but may incorporate more than one. In addition, we define a many-to-many reflexive association on the fragment to represent the fact that a fragment may be an aggregation of other fragments (cf. Figure 1). Moreover, we create an association between the data object and the fragment (cf. Figure 1) to associate each fragment with the objects it manipulates. The cardinality of this relationship is fixed to 1-* to indicate that each fragment manipulates at least one business object type, but may manipulate more than one. Furthermore, we have defined an association called organizationUA between the meta-classes OU-Actor and fragment, with a cardinality of 1-*, to associate a fragment with its performer. For instance, tasks such as Review Application, Assess Credit Score, and Approve Loan in the BPMN Loan Approval Process are all performed by the Loan Officer. The Fragment aggregates these tasks into a cohesive group and connects them to the BPMN sub-process as well as to the business objects (e.g., Loan Applications, Credit Scores) manipulated during the process. This provides clear traceability between tasks, business objects, and performers while maintaining logical consistency.

3.2 BPSUC diagram

To allow modeling the artefacts of the proposed integrated trace meta-model, we instantiated it in the form of an integrated trace model in our previous work [1]. We represent it as a new diagram that we have called the BPMN Supporting Use Case model (BPSUC).

For each concept, we have provided a graphical notation as follows. We have introduced new notations for the proposed new meta-classes UCsF, Organization Unit Package, and Organization Unit Actor. These notations are inspired by and extended from the icons of the pair of artefacts they represent. This ensures that experienced business and system designers are comfortable using the BPSUC diagram.

Table 2 depicts the graphical notations of the new meta-classes Organisation Unit Actor, Organisation Unit Package, and UCsF.

Table 2: Notation of the traceability artefacts

Meta-model concept | Graphical notation
OU-Actor | (graphical symbol; see the original figure)
OU-Package | (graphical symbol; see the original figure)
Use Case supporting Fragment (UCsF) | (graphical symbol; see the original figure)
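The fragment construction described above, BPMN tasks grouped by the performer that executes them, with each fragment keeping the business objects it manipulates (the 1-* cardinalities), can be sketched as follows. The data structure is illustrative, not the paper's implementation:

```python
# Illustrative sketch of fragment construction: BPMN tasks grouped by
# performer, with each resulting fragment recording the business objects
# it manipulates (cf. the 1-* cardinalities in the integrated meta-model).
from collections import defaultdict

tasks = [
    # (task, performer, manipulated business objects) — loan approval example
    ("Review Application",  "Loan Officer", {"Loan Application"}),
    ("Assess Credit Score", "Loan Officer", {"Credit Score"}),
    ("Approve Loan",        "Loan Officer", {"Loan Application", "Credit Score"}),
    ("Sign Contract",       "Customer",     {"Loan Contract"}),
]

def build_fragments(task_list):
    """Group tasks into fragments keyed by performer (the organizationUA link)."""
    fragments = defaultdict(lambda: {"tasks": [], "objects": set()})
    for task, performer, objects in task_list:
        fragments[performer]["tasks"].append(task)
        fragments[performer]["objects"] |= objects
    return dict(fragments)

frags = build_fragments(tasks)
print(frags["Loan Officer"]["tasks"])
# ['Review Application', 'Assess Credit Score', 'Approve Loan']
```

Each resulting fragment aggregates the tasks of one performer and manipulates at least one business object, matching the definition of [22] quoted above.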
Each concept originates from the UML use case and BPMN models [1] and retains its original notation. In the BPSUC diagram, the Fragment is instantiated as a specific activity within the Loan Approval Process, linking BPMN tasks to the Loan Officer (OUActor) and to business objects such as the Loan Application. Additionally, the Organization Unit Package (OUPackage) meta-class is used to trace relationships between BPMN lanes and UML packages. For example, functional areas such as Loan Review, Credit Assessment, and Loan Approval in the BPMN diagram are mapped to corresponding UML packages, ensuring alignment and traceability between these functional areas and their UML counterparts.
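The OUPackage mapping just described, each BPMN functional area traced to its corresponding UML package, amounts to a list of typed trace links. A minimal sketch of such a trace model follows; the `TraceLink` structure and the package-naming rule are assumptions for illustration only:

```python
# Minimal sketch of OUPackage trace links: each BPMN functional area (lane)
# is traced to a corresponding UML package. The link structure and naming
# convention are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceLink:
    kind: str          # trace meta-class, e.g. "OUPackage"
    bpmn_element: str  # BPMN lane / functional area
    uml_element: str   # traced UML package

functional_areas = ["Loan Review", "Credit Assessment", "Loan Approval"]
trace_model = [
    TraceLink("OUPackage", area, area.replace(" ", "") + "Package")
    for area in functional_areas
]

for link in trace_model:
    print(f"{link.bpmn_element} -> {link.uml_element}")
# Loan Review -> LoanReviewPackage
# Credit Assessment -> CreditAssessmentPackage
# Loan Approval -> LoanApprovalPackage
```

Keeping the links typed (the `kind` field) is what allows change-impact queries to distinguish, say, package-level traces from actor-level ones.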
The research work conducted in this paper is an extension and enhancement of our previous work presented in [1]. 4.1 Integrated trace meta-model The extension consists of improving the integrated trace improvement meta-model and the BPSUC diagram to include the arte- facts of the UML class diagram structured according to the Our first improvement of the integrated trace meta-model MVC design pattern. Our contribution aims not only to es- consists of defining an adequate strategy for defining its concepts and relationships between them. Indeed, we pro- pose a methodology for defining the integrated trace meta- model which includes two main steps: (1) identifying over- lapping concepts of BPMN and UML meta-models to de- fine a relevant mapping between them, and (2) Defining an adequate methodology to link each pair of interrelated concepts, without changing their semantics. Thus, we pro- pose to keep overlapping concepts and connect them ei- ther by a new concept, or a new relationship, which spec- ifies the trace link between existing concepts at the meta- model level. Afterwards, we connect each pair of artefacts to the new concepts representing them through a general- Figure 2: Background of the traceability method ization/specialization relationship. This relationship allows inheriting properties of both separated concepts as well as tablish alignment but also to keep source models always combining their usage without updating their initial seman- aligned even if they evolve. The propagation of changes tics. Our second improvement consists of adding the class A Unified Trace Meta-Model for Alignment and Synchronization… Informatica 49 (2025) 97–114 103 diagrammeta-model artefacts to the previous version of the In contrast to the mapped concepts of the use case and integrated trace meta-model. 
the BPMN meta-model artefacts, the mapping between the By applying our meta-model construction, we need to interrelated concepts of UML class diagram and the BPMN identify adequate mappings between BPMN and UML meta-model is not tight, as shown in Table 3. Indeed, one class diagram concepts. In the literature mapping between UML concept may be represented by many BPMN con- BPMN and UML class diagram concepts is widely dis- cepts and vice versa. This is due mainly to the important cussed. Among them, [23] defined model transformations degree of heterogeneity between the BPMN and the class from BPMN into UML class diagrams structured according diagram artefacts. Thus, our mapping is limited to defining to the MVC design pattern, and use case diagrams based on associations instead of defining new traceability concepts, semantic mappings. For example, they propose mapping as we aim not to complicate our integrated trace meta- each BPMN empty lane (i.e. lane that does not include model, and therefore facilitate its readability while main- other lanes) into a class in the class diagram, and into an taining its consistency. The aforementioned trace meta- UML actor in the use case model. We reuse the defined se- classes can also be reused to define BPMN- class diagram mantic mappings in this approach to continue the definition concept traceability. of the integrated trace meta-model. The excerpt of the meta-model defined to trace the BPMN and the class diagram meta-models is presented in Figure.3. To ensure readability, Figure 3) depicts only the Table 3: Mapping of BPMN and class diagram meta-model main artefacts of class diagram and BPMN meta-models, concepts as well as the reused traceability concepts. 
BPMN concept UML class diagram White meta-classes are BPMN concepts, orange meta- meta-model classes are UML class diagrammeta-model concepts, khaki Item Aware Element (Data – Entity class meta-classes denote UML class diagram concepts used for object, Data store, Data in- – Association structuring the class diagram according to the MVC de- put, Data output or Data sign pattern, while new concepts are specified by dark grey state) meta-classes. The blue associations represent the new trace Empty lane – Entity class links, while the black ones are the existing associations. – View class It is important to note that all the use case concepts, – Control class BPMN concepts, traceability links and existing associa- – Association Fragment – View class tions defined in the previous extract of the integrated trace – Control class meta-model, which are not present in this extract remain Exception event – Exception class valid. – Operation In the excerptof Figure.3, each BPMN concept is asso- Signal event ciated with its corresponding concept in the class diagram – Signal Class meta-model. For example, we define a trace link called – Operation trace between the data object and the entity class to estab- lish traceability between them. Automated task t (business The multiplicity of this association is 1..* to indicate that rule task, receive task, send – Operation task, user task, script task, – Association each item-aware element should represent exactly one en- tity class. Moreover, we define a trace link between the service task) within a frag- gateway and the property meta-classes as gateways can be ment indicators of association cardinalities. The multiplicity of Item aware element type Cardinality of associ- this association is 0. On the other hand, UCsF is linked (single or collection) or ations to the following meta-classes: class, ClassDIPackage, and Gateway or association by composition . 
This means that a UCsF can include classes, associations, and packages. These associations mean that a UCsF is a use case that incorporates its supported class diagram elements, representing the elements of its supported fragment. The cardinality of the composition association UCsF-ClassDIPackage is 3..*, to indicate that a UCsF should incorporate at least three packages: View, Control, and Models, which represent the three parts of the MVC design pattern.

In Table 3, we summarize the semantic mapping between the BPMN meta-model artefacts and the class diagram meta-model artefacts from [23]. In this table, the class diagram meta-model concepts are structured according to the MVC design pattern. For example, an item-aware element attached to an automated task t within a fragment f maps to the parameters of an operation, and a conditional sequence flow maps to an attribute (other rows cover, e.g., loop tasks and rollback sequence flows).

In addition, we define an association between OUActor and Class to express that an actor in the integrated trace meta-model is represented as a class in the class diagram meta-model. Furthermore, in our integrated trace meta-model, a generalization/specialization relationship between the meta-classes OUPackage and ClassDIPackage is defined to point out that this trace meta-class inherits all the properties of the Package meta-class.

104 Informatica 49 (2025) 97–114 A. Bouzidi et al

A Unified Trace Meta-Model for Alignment and Synchronization… Informatica 49 (2025) 97–114 105

Figure 3: Traceability of the BPMN meta-model and the UML class diagram meta-model

Table 4: Graphical notations of overlapping elements of the BPMNTraceISM diagram. The notation images cannot be reproduced in this text; the listed elements are: use case association, signal event, extends relationship, exclusive gateway, includes relationship, parallel gateway, annotation flow, inclusive gateway, start event, data input, end event, data output, manual task, data store, normal task, sequence flow, error event, group, cancel event, entity class, control class, generalization, signal class, aggregation association, view class, composition association, exception class, and directed association.

4.2 BPSUC diagram improvement

In contrast to most existing approaches [11], [13], [14], [15], [16], [17], [18], [19], [20], [21], which focus only on the meta-model level, our traceability method includes both the meta-model and the model levels. Thus, the second step of our contribution is devoted to describing how the traceability of BPMN and UML artefacts is established at the model level. We have improved the BPSUC diagram proposed in [1], whose features are limited to thoroughly designing the BPMN and use case diagram artefacts combined with their traceability links, which its designation already reflects.

In this paper, we aim to enrich this diagram to incorporate class diagram elements combined with BPMN and use case diagram elements. The first thing we do is update the name of BPSUC to be in harmony with its newly supported features. The new designation we have chosen is BPMNTraceISM (Business Process Model and Notation Traces Information System Models). BPMNTraceISM is an instantiation of the new version of the improved meta-model and forms a single unified model that thoroughly combines UML elements, including use case diagram and class diagram elements, with BPMN elements. Thus, this diagram is now able to design the elements and relationships of both UML use case and class diagrams, as well as BPMN models, concurrently. Moreover, it specifies the traceability information of the interrelated artefacts. Each artefact in a BPMNTraceISM diagram has its specific notation. Some artefacts retain their original notation (BPMN or UML), while the others have a new representation that does not differ greatly from the BPMN and UML notations.

4.2.1 BPMNTraceISM artefacts that conserve their initial notations

The mappings on which we base the definition of the integrated trace meta-model comprise neither all the BPMN concepts nor all the UML concepts. This is due to the fact that some BPMN artefacts do not have corresponding UML artefacts, and vice versa. For example, the mapping does not define any UML concept representing a BPMN start event. Even so, in a BPMNTraceISM diagram, it is possible to specify UML artefacts with no corresponding elements in BPMN.

According to the mapping, many UML elements may be mapped to one BPMN element; thus, the representation of these elements in UML diagrams requires grouping them. Conversely, one UML element may be linked to many BPMN elements. For example, a data store in the BPMN diagram is transformed into (i) an association, (ii) an entity class, and (iii) an operation of a class in the class diagram. In this situation, it is very difficult to represent the mapped elements in one unifying element. At the meta-model level, we have proposed associating each pair of these mapped concepts by an association instead of defining new traceability meta-classes. At the model level, these artefacts are processed similarly to the non-mapped concepts and retain their original notations in the BPMNTraceISM diagram.
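The composition constraint introduced above (a UCsF must incorporate at least the three MVC packages) can be sketched as a small validity check. This is an illustrative Python sketch only: the paper's meta-model is defined in Ecore, and the class and field names below merely mirror the meta-classes informally.

```python
from dataclasses import dataclass, field

@dataclass
class ClassDIPackage:
    """Informal stand-in for the ClassDIPackage meta-class."""
    name: str                                  # e.g. "View", "Control", "Models"
    classes: list = field(default_factory=list)

@dataclass
class UCsF:
    """A use case supported by a fragment: it incorporates the class
    diagram packages that realize its BPMN fragment."""
    name: str
    packages: list = field(default_factory=list)

    def satisfies_mvc_cardinality(self) -> bool:
        # The UCsF-ClassDIPackage composition is 3..*: at least the
        # View, Control, and Models packages must be present.
        names = {p.name for p in self.packages}
        return len(self.packages) >= 3 and {"View", "Control", "Models"} <= names

ucsf = UCsF("Manage purchase order",
            [ClassDIPackage("View"), ClassDIPackage("Control"), ClassDIPackage("Models")])
print(ucsf.satisfies_mvc_cardinality())  # True
```

A UCsF holding fewer than the three MVC packages would fail this check, mirroring the 3..* cardinality of the meta-model.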
Table 4 outlines the graphical notations of the core artefacts of the BPMNTraceISM diagram that retain their initial notations. OUPackage and OUActor are new meta-classes defined by [1] to represent the traceability links of BPMN and UML use case diagram elements. In the integrated trace meta-model, we did not reuse these meta-classes to define new associations. Thus, the instantiation of these meta-classes keeps the annotations provided in [1].

4.2.2 UCsF notation

In the previous version of the BPMNTraceISM diagram (the BPSUC diagram), [1] states that a UCsF is a specialization of a use case and inherits its properties. Therefore, the graphical notation of UCsFs extends the graphical notation of a UML use case. Moreover, a UCsF has a composition relationship to a BPMN fragment. To represent this trace link graphically, [1] defines a compartment that incorporates the corresponding BPMN fragment.

In our integrated trace meta-model, we have defined composition relationships from a UCsF to some UML class diagram artefacts (cf. Figure 3). Indeed, a UCsF should encapsulate the classes, associations, and packages that correspond to its supported fragment. Accordingly, we propose to update the graphical notation of the UCsF. Thus, a UCsF should act as a complex symbol that concurrently describes BPMN elements and UML class diagram elements. In order to represent explicitly the different elements incorporated by a UCsF, the use case notation needs to be extended. Therefore, we adjust the UCsF notation by adding another compartment (cf. Figure 4) to encapsulate the class diagram elements representing the components (classes, associations, and packages) of the supported fragment. To avoid the complexity of this element, the designer can choose to hide or show each compartment. Figure 4 depicts the graphical notation of a UCsF in which all compartments are hidden.

Figure 4: UCsF notation

4.3 Change propagation improvement

Our traceability method aims to ensure the coevolution of the separated models when a change occurs either in the source models (the BPMN model, the use case model, and/or the class diagram) or in the BPMNTraceISM diagram (cf. Figure 5). To do this, we have improved the transformation model defined in [1] by including the class diagram concepts in the bi-directional transformation rules defined in [1] as two sets of transformation models (forward and backward transformation rules). They ensure the transformation between the BPMNTraceISM diagram, the BPMN model, and the UML models that include a class diagram and a use case diagram, using a semantic mapping between BPMN, BPMNTraceISM, and UML elements derived from our integrated trace meta-model.

Each transformation model includes a set of well-defined transformation rules (Tab, conforming to a transformation meta-model MMt) that transform source models (Ma) conforming to source meta-models (MMa) (noted Ma/MMa) into target models (Mb) conforming to target meta-models (MMb) (noted Mb/MMb) according to a mapping between the source and target model artefacts (noted map(Ma, Mb)). Formally, we specify transformation models according to a function that we call MTransF, written as follows:

MTransF(Ma/MMa, map(Ma, Mb)) --Tab/MMt--> Mb/MMb    (1)

The proposed bi-directional transformation models (backward and forward) ensure the coevolution of the BPMN and UML models, i.e., the coevolution of the source models (the business model specified by a BPMN diagram and the software models specified by a UML class diagram and a UML use case diagram) and the BPMNTraceISM diagram. Formally, the forward and backward transformation model is specified as follows:

MTransF(Ma/MMa, map(Ma, Mb)) <--forwardRules, backwardRules--> Mb/MMb    (2)

The rest of this section is devoted to providing more details on how we created the bidirectional transformation rules.

4.3.1 Forward transformation rules

We propose a forward transformation model (forward rules) to automatically produce a BPMNTraceISM diagram (MBPMNTraceISM) conforming to our integrated trace meta-model (MMBPMNTraceISM) from the source models, namely a BPMN diagram (MBPMN) conforming to the BPMN meta-model (MMBPMN), a use case model (MUCM), and a class diagram (MCD), the latter two conforming to the UML meta-model (MMUML). This transformation is carried out based on the mappings between the new diagram, the BPMN model, and the UML models. The formal definition of our forward transformation rules is as follows:

MTransF((MBPMN/MMBPMN, MUCM/MMUML, MCD/MMUML), (map(MUCM, MBPMNTraceISM), map(MCD, MBPMNTraceISM), map(MBPMN, MBPMNTraceISM))) --forwardRules--> MBPMNTraceISM/MMBPMNTraceISM    (3)

There are two possible scenarios for producing the BPMNTraceISM elements based on the forward transformation rules.

The first scenario consists of applying a forward transformation rule (RX) to derive trace modeling elements (tre) represented in the BPMNTraceISM diagram from a BPMN element (MBPMN!Element) and a UML element (MUML!Element). More precisely, an OUActor, an OUPackage, and a UCsF of the BPMNTraceISM diagram are generated from the BPMN and UML elements. Formally, these transformation rules are as follows:

MTransF_tre((MBPMN!Element, MUML!Element), (map(MBPMN, MBPMNTraceISM), map(MUML, MBPMNTraceISM))) --RX--> MBPMNTraceISM!tre    (4)

For instance, suppose that a forward transformation rule R1 produces an OUPackage from a UML package and a non-empty BPMN lane. This transformation ensures that elements in the source models are correctly mapped to their counterparts in the BPMNTraceISM diagram, facilitating traceability across both business and software models. Formally, this rule can be written as follows:

MTransF_OUPackage((MBPMN!Lane, MUCM!Package), (map(MBPMN, MBPMNTraceISM), map(MUCM, MBPMNTraceISM))) --R1--> MBPMNTraceISM!OUPackage    (5)

The second scenario consists of generating unrelated elements (ure) from either the UML models or the BPMN model only. Indeed, each concept in the BPMNTraceISM diagram that corresponds to a concept in only one of the source models (the BPMN model, the UML class diagram, or the UML use case diagram) needs just its original model.
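The first forward scenario (rule R1 of Eq. (5)) can be sketched as a plain function. The authors implement such rules in ATL; the Python below is only a hypothetical illustration of the same mapping, with made-up dictionary structures standing in for model elements.

```python
def rule_r1_forward(bpmn_lane, uml_package):
    """Sketch of forward rule R1: derive an OUPackage trace element from a
    non-empty BPMN lane and the UML package mapped to it (Eq. (5))."""
    if not bpmn_lane.get("elements"):
        return None  # R1 only fires on non-empty lanes
    return {
        "type": "OUPackage",
        "name": uml_package["name"],
        # The trace links record which source elements this element traces.
        "traces": [("BPMN", "Lane", bpmn_lane["name"]),
                   ("UML", "Package", uml_package["name"])],
    }

lane = {"name": "Stock manager", "elements": ["Check availability"]}
package = {"name": "Stock manager"}
ou_package = rule_r1_forward(lane, package)
print(ou_package["type"])  # OUPackage
```

An empty lane yields no OUPackage, matching the "non-empty lane" precondition of R1.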
In this case, the input of the transformation rule is either a BPMN model, if the concept comes from BPMN, or a UML model, if its origin is the UML class diagram or the UML use case diagram. For example, the generation of a manual task in the BPMNTraceISM diagram requires the BPMN model only, because a manual task does not have corresponding elements in the UML diagrams. This transformation rule is written as follows:

MTransF_ure(MBPMN!ManualTask, map(MBPMN, MBPMNTraceISM)) --R_ManualTask--> MBPMNTraceISM!ManualTask    (6)

Let us illustrate how a change in the BPMN model (e.g., adding a new lane) is propagated into the BPMNTraceISM diagram. Assume that rule R1 is applied to generate an OUPackage from the UML package and the BPMN lane. The transformation process involves the following steps:
1. The BPMN lane and UML package elements are identified.
2. Rule R1 is triggered, creating a corresponding OUPackage in the BPMNTraceISM diagram.
3. The OUPackage is updated within the BPMNTraceISM diagram, and the changes are synchronized with the BPMN and UML models.

4.3.2 Backward transformation rules

To obtain the opposite direction of the forward transformation rules, we have defined a backward transformation model. This means that the source elements of each forward transformation rule become the target elements of a backward transformation rule, and its target elements become the source elements of the backward transformation rule. Formally, the backward transformation rules are written as follows:

MTransF(MBPMNTraceISM/MMBPMNTraceISM, (map(MBPMNTraceISM, MUCM), map(MBPMNTraceISM, MCD), map(MBPMNTraceISM, MBPMN))) --backwardRules--> (MBPMN/MMBPMN, MUCM/MMUML, MCD/MMUML)    (7)

For example, when a change occurs in the BPMNTraceISM diagram, such as adding a new OUActor, the corresponding elements in the BPMN and UML models need to be updated. The backward transformation rule ensures that:
1. The OUActor is mapped to both the UML actor in the use case diagram and a new BPMN lane in the BPMN model.
2. The changes are propagated back into the source models, maintaining alignment across the models.

4.3.3 Change propagation process

The bidirectional transformation rules allow propagating the changes that occur in the source models into the target models. By applying these rules, this approach enables the coevolution of the business and software models. The change propagation process is carried out in two ways (cf. Figure 5): (1) by manually updating the source models (the BPMN model, the use case diagram, and/or the class diagram), or (2) by designing the BPMNTraceISM diagram.

In the first case, software designers and business analysts separately and concurrently update the BPMN model and, consequently, the use case diagram and/or the class diagram. For example, a software designer adds a new use case to the use case model and new classes responsible for realizing the new use case while, simultaneously, a business analyst changes the name of a lane in the BPMN model. A direct generation of the software models would lead to the loss of the changes made by the software designers. Additionally, to avoid unintentional updates, the impact of the changes involved in a (business or UML) model needs to be analysed before propagating them to the target model. To tackle this problem, an intermediate step is required to gather all the updates made in the separate models. This step can be reached by executing our forward model (user task "Execute forward transformation rules"), which derives a BPMNTraceISM diagram from both the UML and BPMN models.
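Because a backward rule simply swaps the source and target of its forward counterpart, the OUActor example above can be sketched as an inverse mapping. Again, this is an illustrative Python sketch, not the authors' ATL code; the dictionary field names are assumptions.

```python
def backward_ouactor(ou_actor):
    """Sketch of a backward rule (cf. Eq. (8)): one OUActor of the
    BPMNTraceISM diagram regenerates its traced counterparts, i.e. a
    UML actor in the use case model and a lane in the BPMN model."""
    uml_actor = {"model": "UML", "kind": "Actor", "name": ou_actor["name"]}
    bpmn_lane = {"model": "BPMN", "kind": "Lane", "name": ou_actor["name"],
                 "elements": []}  # the new lane starts empty
    return uml_actor, bpmn_lane

actor, lane = backward_ouactor({"type": "OUActor", "name": "Customer"})
print(actor["kind"], lane["kind"])  # Actor Lane
```

Running the forward rule on the regenerated pair would reproduce the original OUActor, which is the round-trip property the bidirectional rules rely on.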
We use the same logic as in the forward transformation rules to define the reverse transformation rules. Therefore, each backward transformation rule of a pair of mapped artefacts is defined according to the following formula:

MTransF_tre(MBPMNTraceISM!tre, (map(MBPMNTraceISM, MBPMN), map(MBPMNTraceISM, MUML))) --RX--> (MBPMN!Element, MUML!Element)    (8)

The transformation rules of non-overlapping artefacts are defined according to the formula below:

MTransF_ure(MBPMNTraceISM!ure, (map(MBPMNTraceISM, MBPMN), map(MBPMNTraceISM, MUML))) --RX--> (MBPMN!Element, MUML!Element)    (9)

Thus, all the changes made on the BPMN model and/or on the class and use case diagrams are considered in the derived BPMNTraceISM diagram.

In the second case, all the updates made by business analysts and software designers are made in the unified trace model (the BPMNTraceISM diagram) instead of being made in the BPMs and the ISMs. Using BPMNTraceISM overcomes the gap between business analysts and software designers and enables them to work together using the same model. Indeed, this diagram covers all the business and software model elements and the traceability concepts of the pairs of mapped artefacts. Any change involving a BPMNTraceISM element (bp) leads to the modification of the BPMN and/or UML model elements traced by bp. For example, the insertion of a new OUActor in the BPMNTraceISM diagram leads to the insertion of a new UML actor in the UML use case diagram and of a new BPMN lane in the BPMN model. BPMNTraceISM can act as a gateway allowing business analysts and software designers to work together to test, analyse, and correct inconsistencies due to unwanted updates before propagating them to the source models (the BPMN and UML models). In addition, this new diagram can be used to analyse and estimate the impact of changes made to business or system components or services.

Figure 5: Synchronization process of BPMN and UML

Until this step, although the BPMNTraceISM diagram is aware of the updates made by both business analysts and software designers, the source models are aware neither of the BPMNTraceISM diagram nor of each other. Accordingly, propagating the modifications is an essential step to ensure the coevolution of the source models. We can do this easily by running our backward transformation model (user task "Execute backward transformation rules"). Once the backward transformation model is run, the changes are propagated to the BPMN and UML models, and these models are thus aligned with each other.

5 Implementation

To use the proposed traceability approach, we implement a visual editor called Business Process model Traced with Information System Models (BPTraceISM). The modeling tool includes a graphical editor that conforms to the trace meta-model and enables concurrently viewing and managing the trace relationships between the BPMN model, the use case diagram, and the class diagram. BPTraceISM can be integrated within other modeling tools to enhance their modeling capabilities. To make our modeling tool available in any Eclipse environment without the need to start an Eclipse runtime, we implement it as an Eclipse plug-in.

Figure 6: The environment of the BPTraceISM editor

Moreover, we develop a prototype called Business Process to Information System Models (BP2ISM) that provides significant practical support for the transformations involved in our traceability method. This prototype automates the suggested forward and backward transformation models between the business process and the ISMs on the one hand, and the BPMNTraceISM diagram on the other.

The construction process of BPTraceISM consists of two main phases: (1) the definition of the modeling tool, and (2) the definition of the plug-in that supports it. The first phase begins with the implementation of the trace meta-model using the Ecore meta-modeling language. Then
we build a toolbox for creating instances of the meta-model classes. In the second phase, we develop a feature that supports the modeling tool. Afterward, we construct an update site to ensure the portability of our plug-in and allow its installation via any Eclipse update manager. The transformation models of the BP2ISM prototype are applied automatically via transformation rules expressed in the ATL transformation language.

5.1 Visual editor implementation

To implement the BPTraceISM editor, we have used Eclipse EMF to implement the trace meta-model and Eclipse GMF to design the concrete syntax of the BPMNTraceISM diagram. The BPTraceISM environment is composed of four main parts (cf. Figure 6): the project explorer containing an EMF project that includes BPMNTraceISM diagrams (part a), the modeling space (part b), the toolbox containing the graphical elements of a BPMNTraceISM diagram (part c), and the properties tab to edit the properties of an element selected in the modeling space (part d).

Figure 7 outlines a simple example of a BPMNTraceISM diagram created using the editor. The modeling space contains an OUActor called Supplier associated with a UCsF called Manage purchase order. In the business compartment of the UCsF Manage purchase order, we have a user task called Accept purchase order. In the class diagram compartment, we have four classes linked via undirected associations. Each class has a name and a stereotype.

The BPISM2BPMNTrISM component described below takes three files as input: (1) a file with the extension ".bpmn" that must conform to the BPMN 2.0 meta-model, and (2) two files with the extension ".uml" that must conform to the UML meta-model and contain the use case model and the class diagram. It generates as output a BPMNTraceISM diagram with the extension ".BPMNTraceISM".
The boundary class Manage purchase order contains an operation called acceptPurchaseOrder().

Figure 7: Example of a BPMNTraceISM diagram within the modeling tool

5.2 Prototype for the transformation models

BP2ISM is implemented within the Eclipse Modeling Framework (EMF) environment. It includes two components:
– BPISM2BPMNTrISM: it automates the forward transformation, i.e., the conversion of the BPMN and UML models into a BPMNTraceISM diagram.
– BPMNTrISM2BPISM: it automates the backward transformations, i.e., the transformation rules from a BPMNTraceISM diagram into the BPMN and UML models. It takes as input a BPMNTraceISM diagram with the extension ".BPMNTraceISM" and generates as output three files: (1) a file with the extension ".bpmn" that conforms to the BPMN 2.0 meta-model, and (2) two files with the extension ".uml" that include the generated use case model and the class diagram.

The transformation process requires tools, editors, or plugins in order to specify the source and target models. For this reason, tools are required to represent the BPMN, UML, and BPMNTraceISM diagrams, which serve as the source and target models of the BP2ISM components. Because BPMN and UML are widely used standards, many plugins and tools have been created and certified to support them. We choose to employ internal plugins within EMF rather than external ones. As a result, we develop BPMN models with the Eclipse BPMN2 modeler plugin, and UML use case and class diagrams with the UML Designer plugin. The internal meta-models of these plugins closely adhere to the OMG specifications. We incorporate these meta-models into the EMF environment for use in the execution of our prototype components. In addition, we integrate the trace meta-model to visualize (backward transformation) or design (forward transformation) BPMNTraceISM diagrams. We built the transformation rule sets in the Atlas Transformation Language (ATL), which is provided as an internal EMF plugin.

6 Case study

We take a common business process model for online purchasing and selling to demonstrate the viability of our traceability method. The model is specified using BPMN 2.0 (cf. Figure 8) [23]. This business process begins when a customer selects a product to purchase and adds it to the basket, resulting in the creation of an online purchase order and the transmission of the order to the vendor. The customer has the option to cancel the purchase order before entering their personal information. Otherwise, they must fill in their personal information and submit the online purchase order to stock management. When an online purchase order is received, the stock manager checks the warehouse for the availability of the ordered items to see if there are enough products to fulfil the order. If not, the restocking procedure is initiated to reorder raw materials and create the ordered products based on the supplier's catalogue. The restocking procedure can be performed as many times as necessary within the same business process instance. An extreme scenario occurs when raw materials are unavailable. If all items are available, sales validate the purchase order, generate an invoice, and begin collecting and packaging the products for shipment. When sales receive payment and store the delivered order, the procedure is complete. Purchase order cancellation requests, however, can be made before the purchase order is verified. In that case, sales proceed with the purchase order cancellation and a penalty charge to the buyer.

Figure 8: Online purchasing and selling in BPMNTraceISM diagram

In [23], the authors decompose the BPMN model of the case study into nine fragments (F1–F9) (cf. Figure 8) based on their fragment definition (see [23] for further explanation). By applying their transformation rules, the approach from [23] allows generating the use case diagram and the class diagram from the case study BPMN model, which is taken as the input model.

The online purchasing and selling BPMN model, use case model, and class diagram presented in [23] can be combined and designed in a single unified model, namely the BPMNTraceISM diagram, by using the BPTraceISM editor. We would like to highlight that this diagram can be created manually by designers or automatically by running the BPISM2BPMNTrISM component. Figure 9 depicts the resulting BPMNTraceISM diagram and shows how each fragment and its corresponding use case are merged and expressed as a UCsF. For example, we combine fragment F1 with the use case "Manage preparing purchase order" to form the UCsF "Manage preparing purchase order". Each UCsF displays the traced BPMN elements and the corresponding class diagram elements. For each UCsF, the elements of the BPMN model are represented in the BPMN compartment, while the corresponding class diagram elements are represented in the class diagram compartment. In Figure 9, these compartments are hidden in the UCsF Cancel purchase order, while the BPMN compartment is visible in all the other UCsFs.

Figure 9: Online purchasing and selling in BPMNTraceISM diagram

In the UCsF Receive payment, the BPMN compartment contains a service task called Receive payment, a data object called Invoice, and a data output called Purchase order [paid]. These elements are the BPMN elements of fragment F8. Moreover, the class diagram compartment of the UCsF Archive purchase order is displayed and contains the class diagram elements corresponding to fragment F10, such as the classes VArchivePurchaseOrder and CArchivePurchaseOrder, the states paid and archived, operations, attributes, etc. Furthermore, an OUActor specifies each actor and the corresponding vacant lane. For example, the actor Stock manager and the empty lane Stock manager map to the OUActor Stock manager.

Assume that business analysts and system designers collaborate on the BPMNTraceISM diagram and update the business and system functionalities accordingly. Suppose they delete the UCsF Manage preparing purchase order and the OUActor Customer from the BPMNTraceISM diagram. The UCsF Manage preparing purchase order is traced to the elements of fragment F1, the UML use case Manage preparing purchase order, and all the class diagram elements derived from F1. By deleting this UCsF, all its components are also removed from the BPMNTraceISM diagram. Then, the change involved in the BPMNTraceISM diagram is propagated to the source models by executing the BPMNTrISM2BPISM tool. The output of this component is a BPMN model without the pool Customer or the fragment F1, a UML use case model that contains neither the use case Manage preparing purchase order nor the actor Customer, and a class diagram without the elements corresponding to F1.

7 Evaluation results

7.1 Comparison with existing approaches

To evaluate the effectiveness of our traceability method, we compare it with existing traceability approaches based on defined evaluation criteria. These criteria include: (i) the proposal of a traceability approach at both the meta-model and model levels, (ii) the explicit representation of relationship types between elements, (iii) a graphical notation for trace links, and (iv) the consideration of both business and information system (IS) models.
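The deletion scenario of the case study (removing the UCsF Manage preparing purchase order together with everything it traces) illustrates how trace links drive change propagation. The following Python sketch is hypothetical: the data layout and class names (e.g., VPreparePurchaseOrder) are placeholders, not the authors' actual generated artefacts.

```python
def propagate_deletion(models, trace_links, ucsf_name):
    """Remove a UCsF's trace links and delete every source-model element
    it traced, mimicking the backward propagation of the deletion."""
    for model_name, element in trace_links.pop(ucsf_name, []):
        if element in models[model_name]:
            models[model_name].remove(element)
    return models

models = {
    "bpmn": ["F1", "F2"],
    "use_cases": ["Manage preparing purchase order", "Cancel purchase order"],
    "classes": ["VPreparePurchaseOrder"],  # placeholder class name
}
trace_links = {
    "Manage preparing purchase order": [
        ("bpmn", "F1"),
        ("use_cases", "Manage preparing purchase order"),
        ("classes", "VPreparePurchaseOrder"),
    ],
}
propagate_deletion(models, trace_links, "Manage preparing purchase order")
print(models["bpmn"], models["classes"])  # ['F2'] []
```

After propagation, fragment F1, the homonymous use case, and the derived class diagram elements are gone from the source models, which is exactly what the BPMNTrISM2BPISM run produces in the case study.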
Moreover, our method's assessment methodology is Table 5 presents the results of this comparison, with rows more comprehensive than most others in the field, which listing the methods studied and columns representing the typically rely on simple case studies. We provide a fully im- evaluation criteria. Cells are color-coded to show the ex- plemented prototype of the transformation approach and an tent to which each criterion is satisfied: dark grey indicates Eclipse plug-in for the traceability process, demonstrating a criterion is not fully addressed, light grey represents par- the practicality and feasibility of our contributions through tial satisfaction, and ”Y” stands for full satisfaction. Based a relevant case study. on this comparison, we conclude that our approach is the only one that meets all evaluation criteria. Specifically, [17] and [20] are the only other works that consider both 7.2 Shortcomings of our contribution business and IS modeling, but they fall short in addressing Despite the strengths of our traceability method, there are the full range of traceability needs compared to our method. some limitations that need to be addressed. One notable Furthermore, only a few approaches, such as those by [15], drawback is that our evaluation was based on a single use [18], and [20], take into account the functional and static case, which may not be sufficient to fully assess the accu- views of IS models. racy and robustness of the method. To address this, we are Our method is unique in that it does not require any conducting ongoing evaluations using more complex case extensions and works seamlessly with standard UML and studies, which will allow us to better validate the method's BPMN tools, making it more adaptable and accessible. Ad- performance in different contexts. 
This extended evalua- ditionally, our method provides rich software modeling- tion will help ensure that the method is robust and adaptable level artefacts, incorporating both static views (class dia- across various scenarios, enhancing its overall credibility. grams) and functional views (use case models). The class Additionally, our current transformation approach relies diagrams are designed in accordance with the MVC pattern on forward and backward transformation rules, which re- simplifying the prototyping process for developers. quire the recreation of all components, even if they have When focusing on traceability, our method stands out be- not been affected by changes. This process can lead to in- cause it provides traceability at both the meta-model and efficiencies, especially when working with large or com- model levels, ensuring a unified view of BPMN and UML plex models. To overcome this issue, we plan to develop elements. In contrast, many approaches specify only trace- incremental transformations that will update only the com- ability at the meta-model level, without providing a visual ponents directly impacted by changes. This will improve tool for combined model use. Additionally, our method in- efficiency and minimize unnecessary recalculations, ensur- 112 Informatica 49 (2025) 97–114 A. 
Bouzidi et al Table 5: Comparison of our contribution with approaches based on the external traceability practice Construction model Traceability approach Approach Assessment Business software field evel Model Trace Graphic BPM and methodology field Meta l functional static level links notation ISM [11] N P P Y N N N N CS [13] N CS Y N N N N CS [14] N N N Y N N N N tool [15] P P P Y N N N N [16] RM CS Y N N N CS [17] BPMN N N Y N N N N CS [18] p p p Y Y N N Y N [19] RM N N Y N Y N N N [20] N N CD Y N N N N N [ 21] BPMN N N Y Y N Y N T Our con- BPMN2 UC CCD Y Y Y Y Y CS &T tribution Legend: Y:Yes N: No CS: case study RM: requirement model T:tool/editor CS: Complex systems CCD:conception class diagram UC: uqe case diagram. ing faster and more resource-efficient updates. meta-model and the model levels. Hence, (1) we first de- While we have explored the use of traceability informa- fined a unified tracemeta-model that includes all the BPMN tion to keep BPMN and UML models aligned, the analysis and the UML elements (use case and class diagram) and process is still manual. As a result, we aim to improve this traceability links between interrelated elements. (2) Then, by investigating the development of heuristics that could we defined integratedmodel conforms to the proposed trace automatically detect modifications in the source models meta-model. We defined it as a new diagram named BPM- and suggest necessary adjustments to the corresponding el- NTraceISM (BPMN Traces Information System Models). ements. These heuristics would dynamically support de- This diagram serves many purposes: it promotes the col- velopers, making the process of maintaining alignment be- laborative between business and software designers and al- tween the models more efficient and less error-prone. lows them to work together using one single unified model. 
To address the limitations mentioned above, we propose The joint representation of both BPMs and ISMs elements a comprehensive roadmap for future improvements. This enables users to drill down and easily trace any business will involve extending the evaluation process with more artefact to its corresponding software artefacts. (3) Finally, complex case studies, enhancing the transformation ap- we defined a set of bidirectional model transformation rules proach to support incremental updates, and automating dia- between the BPMN and UML models, as well as the BPM- gram analysis. In addition, the implementation of heuristic- NTraceISM diagram. based tools will enable the automatic detection of changes The rules are useful when a change propagation- across models, improving traceability and ensuring con- based co-evolution is required to synchronize models after sistency without requiring manual intervention. These ad- changes. To prove the feasibility of our traceability method vancements will significantly strengthen themethod's capa- in practice, we developed a modeling tool in the form of bilities, making it more robust and easier to apply in prac- a plugin that can be integrated into the Eclipse platform. tical scenarios. This tool is named BPTraceISM (Business Process model Trace with Information SystemModels) and allows design- ing and handling BPMNTraceISM diagrams in accordance 8 Conclusion with the proposed integrated trace meta-model. Addition- ally, we specified the set of bidirectional transformation The work conducted in this paper fits within the context rules using the ATL language and we implemented them of model-based development of ISMs, their alignment and as components of the BPM2ISM prototype. Furthermore, their we applied the proposed approaches to a typical case study. coevolution with BPMs. 
Indeed, we have used integra- In future research, we look forward to optimizing our tion and model transformation methodologies to define a editor to support traceability and synchronization between traceability method oriented towards the development of BPMN models and other UML diagrams. (meta) model-based solutions, purposely influenced by the Object Management Group (OMG) specifications. Partic- ular attention is paid to the BPMN and UML use case and class diagram models. Our traceability method acts at the A Unified Trace Meta-Model for Alignment and Synchronization… Informatica 49 (2025) 97–114 113 References [11] Mills, C., Escobar-Avila, J., Haiduc, S. (2018). Auto- matic Traceability Maintenance via Machine Learn- [1] Bouzidi, A., Haddar, N., Haddar, K. (2019). Trace- ing Classification, In 2018 IEEE International Con- ability and Synchronization Between BPMN and ference on Software Maintenance and Evolution (IC- UML Use Case Models, Ingénierie des Systèmes SME), pp. 369-380. 10.1109/ICSME.2018.00045. d’Information, Vol. 24, No. 2, pp. 215-228. https: //DOI.org/10.18280/isi.240214. [12] Al-Hroob, A., Imam, A. T., Al-Heisa, R. (2018). The Use of Artificial Neural Networks for Extracting [2] OMG UML Specification, O. A. (2017). OMG Uni- Actions and Actors from Requirements Documents, fied Modeling Language (OMG UML), Superstruc- Information and Software Technology, Vol. 101, ture, V2, Object Management Group, Vol. 70. pp. 1-15. https://DOI.org/10.1016/j.infsof. [3] OMG BPMN Specification. Business Process Model 2018.04.010. and Notation Available at: http://www.bpmn.org/. [13] Min, H. S. (2016). Traceability Guideline for Soft- Accessed: 2023-01-31. ware Requirements and UML Design, In International [4] Driss, M., Aljehani, A., Boulila, W., Ghandorh, H., Journal of Software Engineering and Knowledge En- Al-Sarem, M. (2020). Servicing your requirements: gineering, Vol. 26, No. 1, pp. 87-113. https://DOI. 
https://doi.org/10.31449/inf.v49i16.6934 Informatica 49 (2025) 115–136

A Review of Machine Learning Techniques in the Medical Domain

Enas M.F. El Houby
Department of Systems and Information, National Research Centre, Giza, Egypt
E-mail: em.fahmy@nrc.sci.eg, enas_mfahmy@yahoo.com

Keywords: active learning, curriculum learning, deep learning, federated learning, medical, transfer learning

Received: August 19, 2024

We have witnessed a rapid, exponential growth of all types of data in all domains, specifically in the medical domain. The utilization of machine learning techniques has made significant strides across various domains, with deep learning achieving notable success in recent years. Lately, deep learning has gained increasing attention in the medical field. While deep learning excels at automatically learning discriminative features from raw data, it is still challenging to achieve high performance without a huge amount of data and some handcrafted steps. To address these challenges, deep learning has been incorporated with other new trends and domain knowledge to enhance its capabilities and improve performance, covering the ever-growing needs.
Transfer learning utilizes knowledge from natural images, curriculum learning integrates domain-specific knowledge, active learning selects the most informative samples to reduce reliance on labeled data, and federated learning enables collaborative training across organizations while ensuring data privacy. In this review paper, these new trends incorporated with deep learning have been investigated and presented as applications in the medical domain by investigating articles that have applied these trends and were published in highly reputable journals in the Science Direct database in recent years.

Povzetek: V pregledni študiji so predstavljeni sodobni trendi strojnega učenja v medicini, kot so transferno, aktivno in federativno učenje na podlagi učnih načrtov, ki v kombinaciji z globokim učenjem izboljšujejo diagnostiko, personalizacijo zdravljenja in varnost podatkov.

1 Introduction

Recently, we have witnessed the growth of all types of data in all domains. Medical data specifically has grown dramatically in the last few years due to the exponential increase of knowledge in the medical domain. Medical data can be found in various forms, such as clinical and biomedical data. Biomedical data contains data related to genomics, drug discovery, and biomedicine. Clinical data contains patient records such as patients' medical history, laboratory investigations, and image data from magnetic resonance imaging (MRI), ultrasound (US), X-rays, and computerized tomography (CT) scans. Clinical data exists in two forms, structured and unstructured. The structured format includes the disease history and living habits of the patients, while unstructured clinical data includes doctors' investigation records and the conversations between doctors and patients [1-3]. Therefore, this rapidly growing volume of medical data requires advanced methods for analysis.

Applying artificial intelligence (AI) in the medical domain comprises a promising technology for different healthcare providers. These technologies, particularly data mining, help extract hidden patterns and insights from large datasets using machine learning techniques (MLTs). Traditional MLTs include Artificial Neural Networks (ANN), Decision Trees (DT), Support Vector Machines (SVM), and many other techniques. Machine learning techniques are usually categorized into supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning, labeled data is available; therefore, the model can be trained using this manually tagged data to extract patterns. When there is no labeled training data, unsupervised techniques are employed. They group similar entities in the same cluster, and each cluster demonstrates a relation between the grouped entities. Semi-supervised learning depends on a set of hand-crafted extraction patterns and a few tagged instances as initial seeds of the target relation to start the training. The training output is used as the training input for the following generation, and the process of learning is repeated for many generations. Reinforcement learning is based on evaluative feedback, so it can automatically perform goal-oriented learning and process decision-making problems [4, 5].

Deep learning is an advanced form of artificial neural network (ANN), with a larger number of layers than a conventional ANN model, which automatically learns features from the data and makes more refined predictions possible. In numerous recent medical image classification tasks, convolutional neural networks (CNNs), a kind of deep learning network specialized in image analysis, were utilized and achieved high performance. The success of CNNs in the classification of medical images has motivated researchers to utilize pre-trained models in building new ones. These high-performing CNN pre-trained models have been utilized for different image classification tasks by employing the transfer learning (TL) approach. Pre-trained CNN models utilize features that were learned from a specific domain to fine-tune on any other data. They can be utilized as-is to classify new images, or to extract features using the output from the layer previous to the output layer and introduce it to another classifier [6].

However, many challenges face the application of machine learning techniques generally, and in the medical domain specifically, such as: I) The limitations of available datasets for training the models, because collecting and labeling the data is a labor-intensive and expensive task, especially in the case of medical image data such as ultrasonic imaging (US), CT, and MRI. The annotation of data includes the segmentation annotations of abnormality regions and classification labels such as normal, benign, and malignant. This limitation may also result from the scarcity of some diseases, for which it is difficult to obtain enough positive cases. II) The low quality of some data is another major challenge, where some of the data can be found unlabeled, inconsistent, inaccurate, or in an unstructured format—such as handwritten notes, radiology reports, and conversations between doctors and patients—which are difficult for machine learning algorithms to process effectively. In the case of medical image modalities, there may be variations in image resolution and quality. III) The shortage of explanations of the pathological basis, such as the diagnosis reasons, where the techniques depend only on the differences between the normal and patient cases. For healthcare professionals to trust and act on ML-generated results, it is essential to understand how these models arrive at their predictions. IV) Ethical and regulatory concerns play a crucial role: the healthcare industry is tightly regulated, and machine learning models must comply with stringent standards to ensure patient privacy, data security, and model safety. Furthermore, any biases in the data could lead to unequal or unfair treatment recommendations, making fairness an ongoing concern in the application of ML in healthcare [7-10].

Despite these challenges, machine learning presents a wealth of opportunities that can significantly improve healthcare outcomes, such as: I) Diagnostic accuracy and speed, where ML algorithms, particularly deep learning models, have demonstrated remarkable success in automating and enhancing diagnostic processes, especially in medical imaging. For instance, ML models can analyze radiographs, MRI scans, and other images to identify abnormalities such as tumors or lesions with a level of precision that often rivals or exceeds that of human experts. This capability can lead to earlier detection, which is critical for improving patient outcomes, particularly in cancer and cardiovascular diseases. II) Personalized medicine: by analyzing large datasets, including patient demographics, genetic information, and medical history, machine learning can help tailor treatments to individual patients, optimizing therapeutic interventions based on their unique characteristics. III) Predictive analytics is another powerful opportunity that ML offers. By analyzing trends in patient data, machine learning models can predict disease progression, forecast complications in chronic conditions, and identify high-risk patients who may benefit from earlier interventions. IV) Automation is another key opportunity in healthcare, with ML models capable of automating routine tasks such as image analysis, patient triaging, and administrative work. This allows healthcare providers to focus more on direct patient care, improving overall efficiency. V) Drug discovery, by identifying promising drug candidates and predicting their behavior in the human body, which can reduce the time and cost associated with bringing new medications to market [7-10].

In response to the mentioned challenges, recent research has shifted towards using advanced techniques such as deep learning with incorporated techniques and domain knowledge, like transfer learning, which provides deep learning with information from natural images. Curriculum learning integrates domain knowledge through the training patterns of the processed task. Active learning explores the most informative samples and retrieves them from an unlabeled pool to achieve better performance with less labeled data. Federated learning allows many organizations to collaborate on deep learning without sharing clients' data or devices, which provides efficient data access and security and an improvement of the learning model utilizing a large decentralized dataset. The purpose of this research is to illustrate the new trends of machine learning in the medical domain. The selected articles that are reviewed show these new trends in the medical domain using different medical dataset types, including medical images, tabular datasets, genes, etc., in different tasks. The remainder of this research is organized as follows: Section 2 illustrates the different types of medical data. Section 3 presents some data preprocessing steps. Section 4 presents the new trends of MLTs. Section 5 describes the search methodology for articles that apply the mentioned new trends of MLTs in the medical domain. Section 6 presents some of the applications of new trends of MLTs in the medical domain. Section 7 presents the conclusion and some recommended points for future work.

2 Types of medical data

Medical data can be found in different forms such as arrays of numerical data, images, sequences of DNA, amino acids, etc. For developing any ML model, the data is split into three parts: training, validation, and testing. The training part is used to tune the parameters of the model, the validation part is used to stop overfitting, and the test part is used to assess the performance of the model. In the next subsections, a brief overview of the different medical data forms will be presented.

2.1 Numerical data

Disease-related data are often found as arrays of lab tests, which are numerical data. These numerical datasets can be used to manage the related diseases, such as the datasets available on the UCI machine learning repository [11]. Most numerical data are available in table form, such as Excel sheets or database tables, where rows represent samples from patients and columns represent different features that describe the intended diseases, or vice versa. A huge number of numerical datasets are available, such as the patient demographics of some diseases like COVID-19, and the lab results for different diseases such as thyroid, heart disease, dermatology, cancer, etc.

2.2 Microarray gene expression data

Microarray techniques provide a platform for measuring the expression levels of thousands of genes in various conditions. A microarray is composed of a small glass slide or membrane that contains samples of many genes arranged in a regular pattern. It is used to find genes associated with specific diseases by analyzing and finding the differences between two mRNA sets, where one set is from normal cells and the other set includes cells from pathological tissues such as cancer cells. Microarray data contains a lot of redundant genes, and many genes include inappropriate information for the accurate classification of diseases. Thus, the analysis of the large amount of data generated by this technology is not an easy task for biologists [12]. Figure 1 shows a cDNA microarray spotted on a glass surface, while Figure 2 illustrates the general structure of the microarray, which is represented as an array of numerical values. Cancer gene expression datasets for leukemia, lung, prostate, etc. can be found in [13].

Figure 1: cDNA microarray spotted on a glass surface. (Source: https://www.cell.com/fulltext/S0960-9822%2898%2970103-4)

Figure 2: General structure of microarray.

2.3 Image modalities

Information obtained from medical imaging modalities is clinically beneficial in many applications like computer-aided detection, diagnosis, and treatment planning. Many imaging modalities can be used to check abnormalities in different body organs. They include radiation-based modalities such as CT and X-rays, as well as US and MRI, and they are categorized according to the method of producing images. They help radiologists to recognize abnormal regions. The interpretation of different image modalities needs expertise and is operator dependent; therefore, the process of reading image modalities is exhausting, costly, and prone to error.

Ultrasound (US) is a suitable modality for tumor detection. It can estimate the size of the tumor and distinguish abnormalities, but its capability of detecting contra-lateral malignant lesions is limited [14].

Magnetic resonance imaging (MRI) produces images based on the response of hydrogen atoms to radio waves and magnetic fields. MRI images are valuable as they present physiology and anatomy. MRI images the target organ as thin slices; moreover, it provides information about the vascularity of the tissue [15].

Computed tomography (CT) scanners display better image clarity using multiple X-ray sources and detectors [15]. Radiation X-ray generated images are 2-dimensional images. Fluoroscopy units show real-time moving images produced by X-ray exposure; angiography is a widespread usage of fluoroscopy, imaging blood flow in vessels [15].

Digital mammography (DM) is an X-ray imaging technique that is specialized for breast tissue. DM is the most common and most important screening method in clinical practice. It can detect tumors before they develop further and become easily detected and felt by the physician [16].

Microscopic images are images captured by the microscope to enlarge small scanned objects and extract fine details that cannot be obtained otherwise [17]. Figure 3 shows samples of different image modalities for different body organs.

Figure 3: Samples of different image modalities for different body organs: (a) X-ray of lung [19]; (b) DM of breast [20]; (c) microscopic blood image [21].

2.4 DNA and protein sequences

The fast growth of sequencing has resulted in huge numbers of DNA and protein sequences. Sequences can be used to predict diseases associated with a given DNA or protein sequence. DNA is a long polymer chain of units named nucleotides; it exists in a double helical shape as shown in Figure 4. There are 4 types of nucleotides, A (adenine), C (cytosine), G (guanine), and T (thymine), which are considered the alphabet of DNA. They are arranged into 3-letter sequences called codons. The double-stranded helical structure of DNA is complementary, where "G" chemically pairs with "C", and "A" with "T", within the replication of DNA [18].

Figure 4: Chain of DNA sequence. (Source: http://acer.disl.org/news/2016/08/17/tool-talk-gene-sequencing/)

Amino acids are linked into linear chains to produce proteins, and the properties of proteins are defined by the composition of their amino acids. The triplets of consecutive DNA nucleotides called codons are responsible for forming the amino acid sequence in a protein. There are 4³ = 64 possible codons formed from the 4 letters [22], more than three times the number of amino acids, which is 20. Three codons represent stop codons and one is a start codon, while the remaining codons are responsible for generating the 20 amino acids, so it is possible that more than one codon maps to the same amino acid [18]. Figure 5 shows the transcription of a DNA sequence into molecules of mRNA and then the translation of the transcribed mRNA into the associated chain of amino acids, which later folds into fully functional proteins.

Figure 5: The process of translation from DNA sequence to the associated amino acid sequence. (Source: https://courses.lumenlearning.com/suny-ap1/chapter/3-4-protein-synthesis/)

Single nucleotide polymorphisms (SNPs) are the most common human genetic variations, occurring as mutations or insertions/deletions (indels). If an SNP changes a codon triplet without changing the encoded amino acid, it is synonymous (sSNP) and the gene product is not altered. Otherwise, it is non-synonymous (nsSNP): it changes the codon so that the encoded amino acid is changed into a different amino acid. These are called missense mutations, which are the reason for many diseases [23, 24]. Figure 6 shows single nucleotide polymorphisms (SNPs).

Figure 6: Single nucleotide polymorphisms (SNPs). (Source: https://isogg.org/wiki/Single-nucleotide_polymorphism)

3 Data preprocessing

Knowledge discovery includes three main phases: the preprocessing phase, the data mining phase, and the post-processing phase. Data preprocessing is a crucial phase of knowledge discovery to build an accurate machine learning model. In the preprocessing phase, a set of data preprocessing steps are performed (cleaning the data from noise, handling missing values, merging appropriate data from different databases, normalizing the data, extracting features, and selecting the most informative features) to prepare the data for the data mining phase. Datasets can also be small, so the relevant features may not have been captured, and thus data augmentation is performed by applying different data augmentation techniques. Data mining, which is the core phase in knowledge discovery, is performed by applying MLTs. The preprocessing facilitates the application of the MLTs to extract important patterns or correlations. In the post-processing phase, the discovered knowledge is refined and improved, then interpreted into meaningful knowledge for presentation to the user [25].

Feature selection is a key preprocessing step that differentiates traditional MLTs from deep learning, so it will be tackled in more detail in the next subsection.

3.1 Feature selection

Feature selection is the process of finding the optimal feature subset that most strongly distinguishes among different classes. The purpose of this process is the reduction of the dataset and the elimination of redundant and irrelevant features that impact the classification process negatively. Feature selection is a combinatorial optimization problem whose aim is to select the feature subset with the least number of features that achieves the highest possible classification accuracy. It is one of the data preprocessing steps for pattern recognition and data mining, specifically when working on high-dimensional datasets [26, 27].

Feature selection has two main approaches: the filter and the wrapper. In the filter approach, the feature selection is based on statistical individual feature ranking. It is easily implemented, but it ignores the interaction among features and does not rely on the applied ML algorithm to evaluate the selected features. In the wrapper approach, the feature selection depends on the outcome of the ML algorithm to decide how favorable the feature subset is. Candidate feature subsets are iteratively generated, and their characteristics are assessed by the applied ML algorithm [28]. The wrapper-based feature selection approach evaluates the quality of feature subsets using the learning algorithm; thus, it can determine and discard irrelevant and redundant features effectively. As the learning algorithm is frequently used in the search process, high computational time is required, especially when the datasets are large. On the other hand, hybrid methods aim to utilize the advantages of both approaches: the computational efficiency of the filter approach and the high performance of the wrapper approach [29].

Feature selection algorithms based on heuristic search methods are needed, as the exhaustive evaluation of a huge number of features is not feasible. Many meta-heuristic approaches have been used for feature selection, among them nature-inspired algorithms such as the genetic algorithm (GA), firefly [30, 31], and ant colony optimization (ACO) [32, 33].

4 New trends of machine learning techniques

After data preprocessing, various Machine Learning Techniques (MLTs) are applied to uncover hidden patterns and correlations in the data. As mentioned earlier, disease-related data, often represented through numerical lab tests, microarray data, medical imaging, and genetic sequences, can be processed to predict disease presence or for other related tasks. Traditional MLTs like Support Vector Machines (SVM), Decision Trees (DT), and K-Nearest Neighbors (KNN) are effective across these data types for decision-making. However, Artificial Neural Networks (ANNs), which mimic the brain's neural structure, are increasingly used for more complex tasks and in various domains, especially the medical domain. ANNs consist of input, hidden, and output layers, and are trained through techniques like backpropagation. Their success across domains, especially healthcare, has led to the development of Deep Learning (DL), a more advanced form of ANN. This section explores the role of deep learning in modern machine learning.

Figure 7: The general structure of a Deep Neural Network.

4.1 Deep learning (DL)

Deep Learning (DL) models have gained prominence due to their ability to automatically extract complex patterns from data, eliminating the need for manual feature engineering. However, DL models require large datasets, making them particularly suited for high-dimensional data, such as in medical fields, where they can uncover intricate structures through multiple intermediate layers. The depth of a DL model—referring to the number of hidden layers—enables it to learn complex mappings between input and output. Unlike shallow networks, which struggle with intricate data patterns, deeper networks excel at learning these relationships [34, 35]. Figure 7 shows the general structure of Deep Neural Networks (DNNs). There are several deep learning algorithms, such as the Convolutional Neural Network (CNN), radial basis function networks, deep belief networks, autoencoders, and the Recurrent Neural Network (RNN) [35, 36].

Deep learning depends on hyperparameters such as the activation function, learning rate, batch size, number of epochs, optimizer, dropout rate, etc. Different deep learning algorithms, like RNNs and CNNs, also have additional specific hyperparameters. Adjusting these hyperparameters is critical, as their values significantly affect the model's behavior. Finding the optimal combination of hyperparameters can be an exhaustive task, requiring substantial computational resources and time [37, 38].

The performance of a DL model heavily depends on the selection of these hyperparameters, particularly in complex domains like medical data analysis. Medical data often have high dimensionality, noise, and imbalanced class distributions, making hyperparameter optimization crucial to enhancing model performance. Careful selection improves robustness and generalizability, ensuring reliability in real-world clinical settings. While methods like grid search and random search are widely used, more advanced techniques, such as Bayesian optimization, offer significant advantages [37, 38].

Common techniques for hyperparameter optimization:

1. Grid search: Exhaustively searches across all possible hyperparameter combinations. While it is guaranteed to find the best parameter set within the grid, it can be computationally expensive, especially for models with many hyperparameters.

2. Random search: A more computationally efficient approach, randomly selecting combinations of hyperparameters from specified ranges. It often achieves comparable or better results than grid search in fewer trials.

3. Bayesian optimization: An advanced method that builds a probabilistic model of the objective function. It predicts the best hyperparameters based on past performance, guiding the search toward the most promising regions with fewer trials. Libraries like Optuna and Hyperopt implement Bayesian optimization efficiently.

For example, a CNN can be used to classify medical images like X-rays or MRI scans. Random search can explore different values for hyperparameters (e.g., learning rate, batch size, number of layers). Alternatively, Bayesian optimization can be used for a more efficient search, predicting the most promising hyperparameter configurations based on prior evaluations. By optimizing the model's hyperparameters using these methods, we can improve classification accuracy, reduce overfitting, and ensure the model performs well on unseen medical data.
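The random-search procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the reviewed articles: the search space, the hyperparameter names (learning_rate, batch_size, n_layers), and the toy objective are hypothetical stand-ins for training and validating a real CNN.

```python
import random

# Hypothetical search space for a CNN classifier (illustrative values only).
SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "n_layers": [2, 3, 4, 5],
}

def sample_config(rng):
    """Randomly pick one combination of hyperparameters from the space."""
    return {name: rng.choice(values) for name, values in SPACE.items()}

def random_search(objective, n_trials=20, seed=0):
    """Evaluate n_trials random configurations; keep the highest-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)  # in practice: validation accuracy of the trained model
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Stand-in objective: a real one would train the CNN with cfg and
# return its score on the validation split.
def toy_objective(cfg):
    return -abs(cfg["learning_rate"] - 1e-3) - abs(cfg["n_layers"] - 4) * 1e-4

best_cfg, best_score = random_search(toy_objective, n_trials=50)
print(best_cfg)
```

Bayesian optimization libraries such as Optuna follow the same objective-function interface: the user supplies a function mapping a configuration to a validation score, and the library decides which configurations to try next.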
While methods like grid search and A Review of Machine Learning Techniques in Medical Domain Informatica 49 (2025) 115–136 121 Advantages and Limitations of DNNs for Medical defines the input to hidden connection, a weight matrix Data W defines the hidden-to-hidden connection, and a weight matrix V defines the hidden-to-output Advantages: connection. Then, from time step t = 1 through time step t = n, the following equations are used: • Versatility: DNNs can be adapted to work with various data types, including structured clinical data 𝑎𝑡 = 𝑏 + 𝑊ℎ𝑡−1 + 𝑈𝑥𝑡 (1) (e.g., patient demographics, lab results), unstructured ℎ𝑡 = tanh(𝑎𝑡) data (e.g., free-text medical records), and image data. (2) 𝑜𝑡 = c + 𝑉ℎ𝑡 (3) • Feature Learning: DNNs can automatically learn relevant features from the data, making them more 𝑦𝑡 ̂ = SoftMax(𝑂𝑡) (4) flexible than traditional machine learning algorithms The forwarding propagation of RNN is defined by that rely on feature engineering. the preceding equations, where b and c are the bias vectors, while tanh and SoftMax are the activation Limitations: functions. To update the weight matrices U, V, and W, we compute the gradient of the loss function for each • Training complexity: Training deep neural weight matrix. Gradient computation requires both networks can be computationally expensive and forward and backward propagation of the network. Any time-consuming. Additionally, DNNs require large loss function can be used depending on the goal. At datasets to avoid overfitting. each time step, the sum of all losses is the total loss for a particular sequence of x values. • Overfitting: If not carefully tuned, DNNs can overfit However, traditional RNNs suffer from gradient to small or imbalanced datasets, a common issue in exploding and gradient vanishing issues, making them medical data where datasets may not be as large or unsuitable for long-term dependencies. On the other diverse as needed for training. 
hand, long short-term memory (LSTM) is effective in Use Cases in Medicine: DNNs have been applied to a capturing long-term time dependence. LSTM networks variety of tasks in medicine, including predicting address this by introducing gating mechanisms that patient outcomes, disease progression modeling, and control the memory flow, allowing for better long-term disease classification from images and clinical data. sequence learning. Gated Recurrent Units (GRUs) offer a simplified version of LSTMs with similar benefits but CNNs and RNNs are two of the most common and fewer parameters. promising deep learning algorithms used in medical applications. These algorithms have demonstrated Advantages and Limitations of RNNs for medical success in a variety of tasks, such as medical image data classification and time-series data analysis. Further Advantages: details on these algorithms will be discussed in the next subsections. • Sequential data processing: RNNs can handle different types of sequential data, including time 4.1.1 Recurrent neural networks (RNN) series and text, where the past medical history or a Recurrent neural networks (RNNs) are neural networks series of clinical events influence future outcomes. that contain memories that can capture the stored information in the prior element of the given sequence. • Memory of past inputs: RNNs can remember Therefore, RNN is suitable for processing sequential information from previous time steps in the data types such as the diagnostic history of patients, sequence, allowing them to capture temporal DNA and protein sequences, etc., where the dependencies in data. This is particularly useful for information is remembered through the network. RNN tracking disease progression over time or analyzing is called recurrent because it executes the same task for patient histories. each element of the input sequence while its output is based on the prior computations (memory). 
Thus, the Limitations: decision of recurrent net at time t-1 affects the decision that will be taken later at time t. Therefore, RNN has 2 • Training difficulties: RNNs are prone to the sources of input, the recent past and the present, which vanishing gradient problem, especially in long are combined to define the response to new data. Figure sequences, making them harder to train effectively. 8 shows the architecture of the RNN in which a set of input x values are mapped into a sequence of output o • Data complexity: RNNs are best suited for data values. A loss L measures the difference between the where the relationship between input and output is expected output o and the actual output y [35]. sequential. For static data like images or tabular data, Where 𝑥, ℎ, o, L, y symbolizes input, hidden state, CNNs or DNNs might be more appropriate. output, loss, and target value. A weight matrix U 122 Informatica 49 (2025) 115–136 E.M.F. El Houby • Resource intensive: Training RNNs, especially on long sequences, can be computationally expensive. Figure 8: The architecture of recurrent neural network. at specific positions. At a specific layer 𝑙, the feature map Use cases in medicine: RNNs (and their variants like at position (𝑖, 𝑗) is defined as ℎ𝑙 𝑖𝑗, the bias as 𝑏𝑙, and the LSTMs) are commonly used in medical applications such weight as 𝑊𝑙. The feature map can be expressed as as gene sequence classification, predicting disease follows: progression over time, and analyzing time-series medical h𝑙 ij = ReLU((W𝑙 ∗ I)ij + b𝑙) (5) signals (e.g., ECG readings) [35]. Where ReLU is activation function which controls the 4.1.2 Convolution Neural Network output. The basic structure of the CNN is shown in Figure 9. CNNs are a type of deep learning network specialized for image analysis. 
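Eq. (5) amounts to a discrete 2-D convolution (cross-correlation) followed by ReLU; a minimal sketch, where the 3×3 vertical-edge filter and tiny image are illustrative, not taken from the paper:

```python
def conv2d_relu(image, kernel, bias=0.0):
    # One feature map: h_ij = ReLU((W * I)_ij + b), valid padding.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    fmap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s + bias))   # ReLU clips negatives to zero
        fmap.append(row)
    return fmap

# 4x4 image containing a vertical edge, and a vertical-edge filter
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 0, 1]] * 3
fmap = conv2d_relu(image, kernel)
```

Pooling would then downsample `fmap`, and later layers would stack many such filters to detect progressively more complex features.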
Unlike traditional MLTs that rely on Advantages and limitations of CNNs for medical data manual feature extraction, CNNs can automatically learn Advantages: hierarchical features from raw image data. This is especially useful in the medical field, where CNNs are • Feature extraction: CNNs automatically learn applied to analyze medical images for tasks like disease hierarchical features from raw image data, detection and classification [36, 39]. eliminating the need for manual feature It contains an input, an output and many hidden layers extraction, which is time-consuming in which represent convolutional networks. Convolutional traditional methods. network includes three types of layers: convolutional, activation, and pooling. The convolutional layers apply • Spatial relationships: The convolutional layers filters to detect features (edges, textures, etc.). As the can detect local patterns (e.g., edges, textures) in image proceeds through layers, the filters can detect more images, which are crucial for tasks like tumor sophisticated features. The activation function like detection or organ segmentation. Rectified Linear Unit (ReLU) follows the convolution layer to control the output, it introduces non-linearity. • Efficiency: CNNs are computationally efficient Pooling layers reduce the dimensionality of the data, due to shared weights in convolution layers, making the model more computationally efficient and less allowing them to process large datasets more sensitive to minor positional changes in the features. The effectively. final layer is fully connected, producing predictions for classification tasks. The overall number of network Limitations: parameters is defined by the number of layers, the number of neurons in each layer, and the connection between • Data requirements: CNNs require large labeled neurons. The weights should be tuned through the training datasets to perform well, which may not always phase to achieve good performance [40]. 
be available in medical settings. convnet processes the image (I) using a matrix of • Limited to spatial data: While CNNs excel in weights called filters which can recognize certain features image-based data, they are not as effective for A Review of Machine Learning Techniques in Medical Domain Informatica 49 (2025) 115–136 123 non-spatial data like time-series or sequential layers. The size of the input image to VGG is (224×224). data. VGG has a set of convolutional filters with small sizes (3×3) to capture the information of the up/down and Use cases in medicine: CNNs have been widely applied left/right center. The size of the pre-trained weights is 528 in diagnostic tasks such as detecting cancers, classifying MB. The overall number of parameters of VGG16 is 138 lesions, and analyzing radiological images (e.g., X-rays, 357 544 parameters [43]. MRIs, CT scans) [35, 36, 41]. 4.1.2.2 InceptionV3 model architecture Recent advances in CNN, like AlexNet [42], VGGNet InceptionV3 network is the winning model [43], GoogLeNet [44], and ResNet [45], have significantly architecture of the 2015 ImageNet competition. The improved image classification accuracy, with models now Inception V3 model has a total of 48 layers. The size of outperforming human experts in some cases. These the input image to InceptionV3 is (299×299). It is deeper networks have been trained on the ImageNet Large Scale than VGG16 but with fewer parameters. The size of the Visual Recognition Challenge (ILSVRC) using millions pre-trained weights is 92 MB. It has 23 851 784 of annotated images [46, 47]. And their success has parameters [44]. spurred the rise of transfer learning, where pre-trained models are fine-tuned for specific tasks [48]. In the next 4.1.2.3 Residual neural network (ResNet) subsection, a brief description of some of the high- performance pre-trained models using ImageNet. ResNet network is the winning model architecture of the 2016 ImageNet competition. 
ResNet-50 contains a 50- 4.1.2.1 Visual geometry group network VGG 16– layer architecture. The size of the input image to ResNet 19 is (224×244). The size of the pre-trained weights is 99 VGG16 network is the winning model architecture of MB. It has 25 636 712 parameters [45]. the 2014 ImageNet competition. VGG consists of 16–19 Figure 9: The basic structure of CNN. Figure 10: Transfer learning architecture. 124 Informatica 49 (2025) 115–136 E.M.F. El Houby heuristics, or natural language processing (NLP) applied to 4.2 Transfer learning radiology reports. Given a sample xi which should be assigned to a class Transfer learning is a more appropriate approach when the label Ci 𝜖 {C1, C2, …., Cm}. Suppose the training set available data for training is limited. In transfer learning, consisted of pairs {X, C}, and the training is processed in an intricate model can be trained using available large- batches of size B for a total of E epochs. To train CNN scale annotated images such as natural images. Therefore, with CL, prefer to start the training with simpler samples. the TL process drives knowledge from source domain (e.g. Practically, CL is performed by assigning a probability to natural images) to a target domain or network where the every training pair, where the simpler samples are given domain images are limited. Only the small amount of higher probabilities to be chosen first. Initially, every available annotated data of the target domain is used to sample xi is assigned a probability pi(0). At the beginning tune the model. Where the fundamental features used for of each epoch e, the training set {X, C} is permuted to {X, classification are similar between domains, retraining the C}k by the reordering function F(e). Where this mapping entire model is unnecessary. In such cases, TL allows for is produced by sampling the training set based on the the transfer of learned features, with only the classification probabilities at the present epoch pi(e). 
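A minimal sketch of the freeze-and-retrain idea behind transfer learning: a fixed function stands in for a pre-trained feature extractor (e.g., frozen convolutional layers), and only the small classifier on top is trained on the new data. The extractor, toy dataset, and training loop are illustrative assumptions, not the paper's setup:

```python
import math

def frozen_features(x):
    # Stand-in for a pre-trained network's features; these "weights" are
    # fixed and never updated, mimicking a frozen backbone.
    return [x[0] + x[1], x[0] - x[1], 1.0]   # last entry acts as a bias term

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_classifier(data, epochs=200, lr=0.5):
    w = [0.0, 0.0, 0.0]                      # only the classifier is trained
    for _ in range(epochs):
        for x, label in data:
            f = frozen_features(x)
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            g = p - label                    # gradient of the log loss
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
    return w

# Tiny "target domain" dataset: label 1 when x0 + x1 is large
data = [([0.0, 0.0], 0), ([1.0, 1.0], 1), ([0.2, 0.3], 0), ([0.9, 0.8], 1)]
w = train_classifier(data)
predict = lambda x: sigmoid(sum(wi * fi
                                for wi, fi in zip(w, frozen_features(x)))) > 0.5
```

Because the backbone stays fixed, only a handful of weights need fitting, which is why TL works with small target-domain datasets.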
After executing layer(s) being retrained on the small new dataset [48]. TL many iterations, these probabilities are updated using a leverages pre-trained models such as VGG [43], scheduler, aiming to achieve a regular distribution by the NasNetLarge [22], Inception GoogLeNet [44], ResNet end of the training process [50]. [45], etc—that have been developed for image classification and they have been presented at the annual ILSVRC [46, 47]. TL saves a great amount of time lost in 4.4 Active learning (optimal experimental developing and training CNN models. The pre-trained design) model or the required part of the model can be Supervised learning techniques rely heavily on annotated incorporated directly into the new model and used as a data. Although more datasets are becoming available, the classifier, standalone feature extractor, integrated feature effort, cost, and time required to annotate them remain extractor, or weight initializer [48, 49]. Figure 10 shows significant. On the other side, any error especially in some transfer learning architecture. important applications such as those in the medical domain can have severe consequences. Achieving reliable 4.3 Curriculum learning outcomes often requires an interactive process where In the standard educational method, learning depends on a predictions are reviewed or modified by an oracle or user. curriculum that presents new concepts based on previously This means users must be able to override and adjust collected ones. The rationale beyond this is that people automated predictions to meet specific criteria. pick up better if the information is introduced in a Techniques such as Active Learning (AL) or what is called meaningful method instead of randomly. By using the Human-in-the-Loop computing have witnessed progress same ideas to train neural networks starting with simple in overcoming these challenges [51]. 
cases, it was noticed that the networks perform better, Active learning is a semi-supervised learning which indicates the significance of gradual and systematic approach that begins with a small set of labeled samples learning [50]. (seed samples) and iteratively selects the most informative The curriculum learning (CL) approach is motivated samples from a pool of unlabeled data for annotation. By by the capacity of humans to pick up new tasks fast with focusing training on the most informative subset of finite "training sets". Similarly, the training procedure of samples, AL improves model performance and reduces the medical students called teacher-student curriculum annotation burden, particularly for image data. In AL, an learning is based on training by tasks with gradually MLT scans unlabeled data and recognizes the most growing difficulty. While each task uses smaller datasets informative samples. These samples are then presented to than those utilized in machine learning. Like, students can a human annotator (oracle) for labeling. This makes AL a start with a simple task, such as deciding if an image part of the Human-in-the-Loop paradigm, where only includes lesions, and later are asked to determine if the selected samples are used for training, often far fewer than lesions are malignant or benign which is a more in traditional supervised learning [51]. complicated task. With time, they will progress to a more Formally, suppose that U is available big pool of complex task, like recognizing the subtypes of lesions [8]. unannotated data and that there are oracles to request In machine learning, CL works with a series of annotations for any unannotated sample xU to be added to training samples sorted in increasing order according to annotated set L. The goal is to train a model f(x | L∗) using learning difficulty. The order in which the samples are the annotated set L∗⊆L. 
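The probability-based reordering F(e) used in curriculum learning can be sketched as weighted sampling without replacement, with a scheduler annealing the probabilities p_i(e) toward a uniform distribution; the difficulty scores below are illustrative, not from any cited study:

```python
import random

def reorder(probs, rng):
    # Weighted sampling without replacement = the permutation F(e).
    order = []
    pool = list(range(len(probs)))
    weights = list(probs)
    while pool:
        i = rng.choices(range(len(pool)), weights=weights)[0]
        order.append(pool.pop(i))
        weights.pop(i)
    return order

def curriculum(difficulty, epochs=3, seed=0):
    rng = random.Random(seed)
    n = len(difficulty)
    probs = [1.0 - d for d in difficulty]   # easy samples get high p_i(0)
    schedule = []
    for e in range(epochs):
        schedule.append(reorder(probs, rng))
        # Scheduler update: move each p_i(e) toward the uniform value 1/n.
        probs = [p + (1.0 / n - p) * (e + 1) / epochs for p in probs]
    return schedule

difficulty = [0.1, 0.2, 0.5, 0.9]   # lower = easier; made-up scores
schedule = curriculum(difficulty)
```

Early epochs thus tend to front-load easy samples, while by the final epoch every sample is equally likely to appear anywhere in the ordering.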
A brute-force solution would introduced to the model is critical, as it can significantly involve requesting the oracle(s) to annotate each sample impact the model's performance. Curriculum learning is an xU, resulting in L∗ = L. However, this is a costly and not active area of research, particularly in applications such as practical solution. Theoretically, there is an optimal subset medical image diagnosis [8]. L∗ of data that can achieve performance equivalent to that A key point in CL is the design of data schedulers that obtained using the whole annotated dataset L, i.e. (f(x | L∗) control the sequence in which training samples are fed into ≈f(x | L)). AL is a trend of ML that tries to explore this the model. These schedulers can use a variety of methods optimal subset L∗, where the current model is f´(x | L´), L´ to determine sample difficulty, such as expert input, is an intermediate annotated data. AL intends to iteratively A Review of Machine Learning Techniques in Medical Domain Informatica 49 (2025) 115–136 125 explore the most informative data samples 𝑥∗ 𝑖 to train the the difference between them, then considers the sample model, assuming that the unannotated data samples and the that has the smallest difference between the first and model will evolve through time, rather than choosing a second most likely labels to be annotated. constant subset of samples once for training. Entropy sampling uses entropy as it is a measure of The selection of samples to be annotated is based on uncertainty to select a sample to be annotated. Entropy the informativeness of these requested samples. The measures the amount of information gained by considering evaluation of the informativeness of each un-annotated a sample and so selects the sample that maximizes the data sample xU is done given f´(xU | L´), then all selected information that has the largest entropy value [51]. samples are demanded to be annotated. 
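The three probability-based informativeness measures used in active learning (least confidence, margin, and entropy sampling) can be sketched as follows; the pool of predicted class-probability vectors is made up for illustration:

```python
import math

def least_confidence(p):
    return 1.0 - max(p)          # high when the top class is uncertain

def margin(p):
    top2 = sorted(p, reverse=True)[:2]
    return top2[0] - top2[1]     # a small margin means an informative sample

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

pool = [[0.98, 0.01, 0.01],      # confident prediction
        [0.40, 0.35, 0.25],      # uncertain prediction
        [0.70, 0.20, 0.10]]      # in-between

most_uncertain = max(range(len(pool)), key=lambda i: least_confidence(pool[i]))
smallest_margin = min(range(len(pool)), key=lambda i: margin(pool[i]))
highest_entropy = max(range(len(pool)), key=lambda i: entropy(pool[i]))
```

On this toy pool all three measures agree and pick the second sample, but on real data they can rank samples differently, which is why the choice of measure matters.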
After the annotations, the new annotated data has been used to 4.5 Federated learning improve the model. This is done by retraining the whole Federated learning (FL), developed by Google in 2017 is a model using all available annotated data L´, or by using the collective distributed decentralized learning method that most recently annotated sample 𝑥∗ 𝑖 to fine-tune the network allows many organizations to collaborate on machine [51]. learning or deep learning models without sharing clients' Active Learning typically employs three methods to or devices’ data. It allows the training data to be on the select samples for annotation: decentralized edge devices rather than keeping it in a data Stream-based selective sampling supposes the center. These individual nodes or devices jointly train a existence of a continuous flow of unannotated data machine learning or deep learning model from their local samples xU. In this method, the present model and data and then aggregate the devices' training outputs on the informativeness I(xU) measure are the criteria used to server to update the global model without sharing edge specify, for each incoming sample whether or not to data. The resulting model can be shared among all require an annotation from the oracle(s). Thus, while the participating devices or clients. Therefore, it provides model is being trained, it is offered a data sample and secure models that fulfill an efficient solution while instantly decides if it needs to query for the label. Although providing data access and security [52-54]. 
this type of query is inexpensive, its performance is limited One major issue with centralized models is that because it does not consider the broader context of the medical organizations do not allow to break doctor-patient underlying distribution, but it depends on the separation confidentiality by providing medical images such as CT nature of each decision, therefore the balance between the and X-ray images for training purposes because of privacy, exploration and exploitation is less than in other query legal, and data-ownership issues. To develop deep learning kinds. models for the medical domain, large medical data is Membership query synthesis generates the sample needed to develop these models. Therefore, many medical 𝑥∗ 𝐺 that the model believes to be most informative, rather researchers illustrated that federated learning is a good than selecting from real-world data. Therefore, it is technique to connect different medical organizations and annotated by the oracle(s). This method may be very let them share their experiences while keeping privacy. effective in bounded domains, but it may struggle when the Furthermore, the performance of the learning model will model has no knowledge of unrepresented areas of the data be improved using a large medical dataset. However, the distribution, similar to stream-based methods. resulting models may be biased toward organizations that Pool-based sampling selects N data samples 𝑥∗ 0 , . . . have larger training datasets [53]. , 𝑥∗ 𝑁 from a large unlabeled dataset U to pull samples from. In federated learning, the process begins by sending a Pool-based approaches use the present model to do a global model with unified initial weights to each client. At prediction on un-annotated data samples to get a ranked each client side, there is a local dataset, where the model is measure of informativeness for each data sample in the un- trained in each separately. After completing local training, annotated data. 
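A minimal sketch of the pool-based loop: rank the unlabeled pool with the current model, send the top-N samples to the oracle, retrain, and repeat. The toy threshold "model", oracle, and informativeness function are stand-ins, not a real MLT:

```python
def pool_based_al(pool, oracle, train, informativeness, n_per_round, rounds):
    labeled = {}                           # sample -> label
    model = train(labeled)                 # initial (possibly trivial) model
    for _ in range(rounds):
        unlabeled = [x for x in pool if x not in labeled]
        if not unlabeled:
            break
        # Rank the pool with the current model; take the top-N samples.
        ranked = sorted(unlabeled,
                        key=lambda x: informativeness(model, x), reverse=True)
        for x in ranked[:n_per_round]:
            labeled[x] = oracle(x)         # query the oracle for a label
        model = train(labeled)             # retrain on all labeled data
    return model, labeled

# Toy instantiation: the "model" is a decision threshold, the oracle labels
# x > 5, and informativeness is closeness to the current threshold.
train = lambda labeled: 5.0 if not labeled else sum(labeled) / len(labeled)
oracle = lambda x: int(x > 5)
info = lambda model, x: -abs(x - model)
model, labeled = pool_based_al(range(10), oracle, train, info,
                               n_per_round=2, rounds=3)
```

After three rounds only 6 of the 10 pool samples have been labeled, yet they are the ones nearest the decision boundary, which is the whole point of the method.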
The highest N informative samples are the client sends its model updates back to the server, which selected for annotation by the oracle(s). Therefore, the aggregates these updates to refine the global model, while model is initially trained on labeled samples which are then the data at the clients remains local in each client. The used to find which data samples would be most server has the authority to manage the whole process informative to be inserted into the training set for the next where it sends the model to the client, collects the updates, AL loop. This approach has proved to be the most and synchronizes them to build the updated model with the promising, which depends on batch-based training. Figure new parameters. This method enables medical 11 shows the full process of active learning. organizations to collaborate on training models while AL uses some informativeness measures of unlabeled maintaining data privacy. There are different federated samples to select the most informative samples. They learning algorithms according to the computation method depend on probabilities, these approaches are least of gradients such as federated stochastic gradient descent, confidence sampling, margin sampling, and entropy federated averaging, and federated learning with dynamic sampling [51]. regularization. [53, 54]. Figure 12 shows the architecture Least confidence sampling the model selects the of federated learning. highest uncertainty sample or least confidence for annotation and therefore is given to the oracle to be labeled. Margin sampling can be utilized in a multi-class, it uses the first and second most likely labels and computes 126 Informatica 49 (2025) 115–136 E.M.F. El Houby Figure 11: The process of active learning. Figure 12: Federated Learning architecture. 5 Search methodology were used in the search: “active learning”, “curriculum learning”, “deep learning”, “transfer learning” and 5.1. 
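The server-side aggregation step of federated averaging can be sketched as a dataset-size-weighted mean of the clients' locally trained weights; the client weights and dataset sizes below are made-up values for three hypothetical hospitals:

```python
def fed_avg(client_weights, client_sizes):
    # Weighted average of each parameter across clients; raw data never
    # leaves a client, only these weight vectors do.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[k] * s for w, s in zip(client_weights, client_sizes)) / total
            for k in range(n_params)]

clients = [[0.2, 1.0], [0.4, 2.0], [0.6, 3.0]]   # locally trained weights
sizes = [100, 100, 200]                          # local dataset sizes
global_weights = fed_avg(clients, sizes)
```

A full FL round would then broadcast `global_weights` back to the clients for the next pass of local training; note the size weighting is also what can bias the global model toward clients with more data.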
Search criteria “federated learning” to investigate the different research that utilizes these recent trends. Additional keywords— This research investigates recent trends in machine "medical", "disease", "cancer" and "gene" were included learning (ML) within the medical domain. To achieve this, to focus the search on medical applications that used these we explored a ScienceDirect (Elsevier) new trends. Although the search intended to retrieve the (http://www.sciencedirect.com). The following keywords articles related to any disease, "cancer" was added to A Review of Machine Learning Techniques in Medical Domain Informatica 49 (2025) 115–136 127 retrieve more relevant results, given that much of the 5.2. Data extraction recent research in ML is focused on cancer. Publications As the search retrieved a large number of articles, from 2016 to 2024 were considered. The composition of therefore only a subset of the retrieved articles was the used terms to form the search query used for deep selected for analysis. Figures 13-15 illustrate the number learning-based techniques in the medical domain was: of publications per year for the various techniques "Deep learning" AND ("medical" OR "Disease" OR between 2016- 2024, based on “Elsevier” database to "Cancer" OR "Gene"). show the growth rates of these new trends. Where the aim of this research is to find the new • Figure 13 shows the steady increase in deep learning trends in machine learning techniques which after accurate publications, from 17 articles in 2016 to 2958 in 2024, investigation were found to be mostly based on “deep indicating a growing interest in applying deep learning learning” either alone or combined with other new in the medical domain. 
techniques such as “transfer learning”, “active learning”, • Figure 14 shows that transfer learning started to be “curriculum learning” and “federated learning”, so the applied in the medical domain in 2017 with 2 articles same query was used as for deep learning-based only and reached 218 in 2024. techniques with adding the other techniques’ keyword as follow: • Figures 15 shows that the number of publications of ("Deep learning" AND "*") AND ("medical" OR active learning, curriculum learning, and federated "Disease" OR "Cancer" OR "Gene") learning is limited and scattered across the years as they Where “*” can be replaced by any of the other are newly emerged trends. techniques’ keywords (“transfer learning”, “active The selected articles were drawn from top journals in learning”, “curriculum learning”, and “federated ScienceDirect, adhering to the criteria mentioned above. learning”). The references provide a sample of the applications of The following criteria were applied to select the these new ML techniques in the medical domain, rather publications: (1) Articles related to human diseases (other than an exhaustive list. For each reference, key details organisms’ related diseases are excluded); (2) Inclusion of such as the task, disease, technique(s) used, evaluation at least one of the new ML techniques; (3) Only complete results, and data type are presented. research articles were included (excluding letters, surveys, book chapters, and non-English articles); (4) publications published from 2016 to 2024. Figure 13: The number of articles published on deep learning from 2016- 2024 in Elsevier database. 128 Informatica 49 (2025) 115–136 E.M.F. El Houby Figure 14: The number of articles published on transfer learning from 2016- 2024 in Elsevier database. Figure 15: The number of articles published on active/curriculum/Federated learning from 2016- 2024 in Elsevier database. 
6 Some applications of new trends of accuracy of more than 99% and FAUC of 0.982 when applied to the Chest X-ray radiographs dataset [56]. MLTs in the medical domain El Houby & Yassin [57] developed a CNN model to This section illustrates the selected articles from the classify the breast mammographic images' into retrieved ones from searching the databases which nonmalignant or malignant. They used 2 methods, the first represent the applications of previously discussed is based on patches of region of interest (ROI) in the emerging ML trends in the medical domain. mammogram and the second is based on the whole breast. Li, X., et al., [55] proposed a DL model to detect lung The accuracy, specificity, sensitivity, and AUC were nodules. First, segmentation and rib suppression were 95.3%, 92.6%, 98%, and 0.974 respectively using MIAS applied to extract the region of interest and enhance the [20] dataset, while they were 96.52%, 96.49%, 96.55%, nodules’ visibility. Then, the histogram was applied to and 0.98 using INbreast [58] dataset. enhance the images. After that, patch-based multi- Dai, Y., et al., [59] developed a deep learning CNN resolution CNN was used for feature extraction, and 4 model for detecting coronary artery disease utilizing raw fusion methods were employed for classification, the best heart sound signals. It extracts 206 multidomain features performance method to detect lung nodules achieved an and 126 medical multidomain features. The heart sound signal datasets have been collected from 400 patients from A Review of Machine Learning Techniques in Medical Domain Informatica 49 (2025) 115–136 129 the hospital of Xinjiang Medical University. The model Neuroimaging dataset [69] and achieved an accuracy of achieved an accuracy of 87.86, sensitivity of 90.67, 98.54% recalls of 98.9%, a precision of 98.98%, and an specificity of 82.38, and AUC of 94.70 using multidomain F1 score of 98.82%. features. 
It achieved an accuracy of 85.6, sensitivity of Kumar et al. [70] developed a CNN model using the 88.04, specificity of 80.83, and AUC of 92.74 using Resnet152 TL approach with feature extractors to classify medical multidomain features. the brain tumor images into normal, benign, and Alassafi et al., [60] proposed a model that predict the malignant. The model has been applied to the Brats MRI distribution of the COVID-19 outbreak in Saudi Arabia, image dataset. The proposed transfer Learning model Malaysia, and Morocco. A DL RNN and LSTM network achieved high accuracy reaching 99.57%. were developed to predict the number of possible cases of Manickam, et al., [71] proposed a deep TL model for COVID-19. The LSTM achieved an accuracy of 98.58%, pneumonia detection. The chest X-ray images were while the RNN achieved an accuracy of 93.45%. A preprocessed to recognize the existence of pneumonia comparison was conducted between the number of based on the U-Net segmentation network, then classify resulting deaths and the number of coronavirus cases in the cases as normal or abnormal (Bacteria, viral) using each of the 3 countries. The model predicted the number pre-trained models such as ResNet50, InceptionV3, and of certain COVID-19 cases and deaths for the following 7 Inception ResNetV2. It was evaluated using a publicly days. The model was tested using a public dataset from the available database which includes 5,232 chest x-ray European Centre for Disease Prevention and Control [61]. images. ResNet50 model achieved an accuracy of 93.06%, Maiti et al., [62] developed a deep learning (DL)- precision of 88.97%, Recall of 96.78%, and F1-score of based framework to automatically detect and segment the 92.71%. optic disc from fundus images for the diagnosis of diabetic Veknugopal, et al., [72] developed a DNN using retinopathy. 
The framework utilized an adjusted CNN, modified EfficientNetV2-M based on transfer learning to experimenting with seven different encoder networks: detect skin cancer on dermoscopic images. The model was DenseNet121, InceptionV3, ResNet34, GG11, VGG19, applied to 58,032 dermoscopic images collected from [73- VGG13, and VGG16. VGG16 was selected as the adopted 77]. The model was tested for binary classification tasks encoder, while the decoder was designed with a harmonic and the multiclass classification tasks. It achieved an structure based on that of the encoder to improve accuracy reached 97.62 for the multiclass classification of segmentation performance. The framework was applied to the ISIC 2020 dataset, while it achieved an accuracy of several fundus image datasets, including DIARETDB1, 99.23 for the binary classification of the same dataset. MESSIDOR, IDRiD, DIARETDB0, CHASE-DB1, Mehmood, et al., [78] developed a model to diagnose DRIVE, and STARE. It achieved an impressive accuracy Alzheimer’s disease (AD) in its early stage based on TL of 99.44%. using VGG-19 pre-trained model. The model Zareen, et al. [63] developed a skin cancer distinguishes among 4 classes which are AD, late mild classification deep learning CNN-RNN model with a cognitive impairment (LMCI), early mild cognitive ResNet-50 for spatial features extraction and LSTM for impairment (EMCI), and normal control (NC). The used temporal dependencies. The model has been applied to a dataset was collected from the AD Neuroimaging dataset of 9000 images of skin lesions representing 9 Initiative (ADNI) [69] database. In the pre-processing cancer types. The model achieved an accuracy of 94.48, a phase, the gray matter (GM) tissue was segmented from sensitivity of 94.38, and a specificity of 93. brain MRI, and then VGG-19 was used to classify the Ge, R., et al., [64] Proposed a Dual-Enhanced segmented parts. 
The model achieved an accuracy of Convolutional Ensemble Neural Network (DECENN) to 98.73% to distinguish between AD and NC, 83.72% to detect the presence or absence of metastasis in the whole distinguish between LMCI and EMCI cases, and more slide imaging patches of breast cancer. It utilizes VGG16 than 80% to distinguish between the other combinations and DenseNet121 in the network. It was applied to the of classes. updated version of a benchmark dataset of microscopic Al-Shabi, Shak, and Tan, [79] developed a images and histopathologic scans of lymph node sections Progressive Growing Channel Attentive Non-Local for the breast [65]. It achieved an accuracy of about (ProCAN) deep learning model to classify lung nodules as 98.92%, an AUC of 99.70%, and a F-score of 98.93%. benign or malignant. Curriculum Learning (CL) was used Liu, Q., X. She & Q. Xia [66] proposed a model to to train easy samples before hard samples. The model has classify osteosarcoma cells and other cell types using an gradually grown to improve the possibility of classifying updated version of CA-MobileNet V3 based on the the samples based on CL. The model has been applied to transfer learning model. It was applied to osteosarcoma samples from 2 publicly available CT scan datasets LIDC- cells microscopy imaging of bone cancer dataset [67]. It IDRI [80] and LUNGx [81]. It achieved an accuracy of achieved an accuracy of 98.69 % and f1-score of 94.11. 95.28% and AUC of 98.05%, precision of 95.75, Oommen & Arunnehru [68] proposed a model to sensitivity of 94.33 and F1-Score of 95.04. diagnose Alzheimer’s disease in its early stages. The Cho, Y., et al., [82] proposed CL model using a DL proposed model contains 3 phases: preprocessing the CNN to classify chest radiograph (CXR) images into images, extracting features using TL with ResNet-18, normal and five types of pulmonary abnormalities. 
Al-Shabi, Shak, and Tan [79] developed a Progressive Growing Channel Attentive Non-Local (ProCAN) deep learning model to classify lung nodules as benign or malignant. Curriculum Learning (CL) was used to train on easy samples before hard samples, and the model was grown gradually to improve its ability to classify the samples presented by the curriculum. The model was applied to samples from 2 publicly available CT scan datasets, LIDC-IDRI [80] and LUNGx [81]. It achieved an accuracy of 95.28%, an AUC of 98.05%, a precision of 95.75%, a sensitivity of 94.33%, and an F1-score of 95.04%.

130 Informatica 49 (2025) 115–136 E.M.F. El Houby

Cho, Y., et al. [82] proposed a CL model using a DL CNN to classify chest radiograph (CXR) images into normal and five types of pulmonary abnormalities. The model used ResNet-50 for training on patches of CXR images with various patch ratios according to pre-trained weights, with fine-tuning using transfer learning (TL). The model was applied to CXRs from hospitals including Seoul National University Bundang Hospital (SNUBH) and Asan Medical Center (AMC). It achieved the following accuracies: 90.97% for 20% of the dataset at SNUBH, 91.92% for 50%, and 93.00% for 100%. At AMC, the accuracies were 93.90%, 94.54%, and 95.39%, respectively.

Wong et al. [83] developed a CL-based method for classifying medical images, using features from segmentation networks. The model first learns simpler shapes and features through a segmentation network pre-trained on similar data, then applies this knowledge to more complex classification tasks. M-Net, a CNN modified from U-Net to work with fewer training samples, was used for segmentation; the CNN classifier then receives the features from the segmentation network as inputs. The model achieved an accuracy of 82% on a 3D 3-class brain tumor classification problem and 86% on a 2D nine-class cardiac semantic level classification problem.
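Curriculum learning, as used by Al-Shabi et al. [79], Cho et al. [82], and Wong et al. [83], reduces at its core to ordering the training stream from easy to hard and growing the training set in stages. A minimal sketch under hypothetical per-sample difficulty scores (in practice these would come from, e.g., a warm-up model's loss or label strength — they are invented here):

```python
def curriculum_schedule(samples, difficulty, n_stages=3):
    # Sort samples from easy to hard, then split them into stages.
    # Stage k of training sees all samples up to and including stage k's
    # slice, so the curriculum grows cumulatively, as in progressive
    # curriculum training.
    ordered = sorted(samples, key=lambda s: difficulty[s])
    stage_size = -(-len(ordered) // n_stages)   # ceiling division
    stages = []
    seen = []
    for k in range(n_stages):
        seen = seen + ordered[k * stage_size:(k + 1) * stage_size]
        stages.append(list(seen))
    return stages

# Hypothetical per-sample difficulty (e.g., loss under a warm-up model).
difficulty = {"a": 0.1, "b": 0.9, "c": 0.4, "d": 0.2, "e": 0.7, "f": 0.5}
stages = curriculum_schedule(list(difficulty), difficulty, n_stages=3)
```

Early stages contain only the easiest samples; the final stage covers the full dataset, which is the schedule a model such as ProCAN grows through.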
Wu, et al. [84] developed a weakly-supervised deep AL framework to diagnose COVID-19 using CT scans. The framework contains a 2D U-Net for segmentation of the lung region and a hybrid active learning approach, which maintains sample diversity and predicted loss for the diagnosis of COVID-19. The framework classifies the CT scans into one of three classes: pneumonia, coronavirus pneumonia caused by SARS-CoV-2, and normal cases. The framework was validated on a CT scan dataset from the China Consortium of Chest CT Image Investigation (CC-CCII) [85]. With only 30% of the labeled data, the accuracy of the framework reached 0.867, while the AUC was 0.968.

Wu, X., et al. [86] proposed a hybrid active learning (HAL) framework that combines AL with deep TL using ResNet18. The framework applies data augmentation to the unlabeled data pool and uses a hybrid sampling approach that maintains sample variety and classification loss (data uncertainty). The diversity sampling is based on data augmentation, while generated data noise is discarded with an outlier detection process. HAL was validated on 3 medical image datasets: Hyper-Kvasir for gastrointestinal disease [87], Messidor for eye fundus images [88], and breast cancer datasets [89]. Applied to the Hyper-Kvasir dataset, it achieves an accuracy of 0.871, a precision of 0.602, a recall of 0.587, and an F1-score of 0.594.
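The pool-based selection loop shared by COVID-AL [84] and HAL [86] — and, with feature clustering, by diversity-aware methods — scores unlabeled items by model uncertainty, enforces diversity by grouping, and removes the selections from the pool. A simplified sketch (the probabilities and group assignments are made-up stand-ins for model confidence and feature clusters, not the papers' actual pipelines):

```python
import math

def entropy(p):
    # Predictive entropy of a two-class probability: highest at p = 0.5,
    # i.e., where the model is most uncertain.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def select_batch(pool, prob, cluster, per_cluster=1):
    # Diversity-aware acquisition: within each cluster pick the most
    # uncertain item(s), then remove the selections from the pool so the
    # next round works on the reduced pool.
    by_cluster = {}
    for item in pool:
        by_cluster.setdefault(cluster[item], []).append(item)
    chosen = []
    for items in by_cluster.values():
        items.sort(key=lambda i: entropy(prob[i]), reverse=True)
        chosen.extend(items[:per_cluster])
    for item in chosen:
        pool.remove(item)
    return chosen

pool = ["p1", "p2", "p3", "p4"]
prob = {"p1": 0.55, "p2": 0.95, "p3": 0.50, "p4": 0.10}   # model confidence
cluster = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}            # feature groups
batch = select_batch(pool, prob, cluster)
```

Grouping before ranking is what prevents the batch from collapsing onto many near-duplicate uncertain patches, which is the problem DADA-style acquisition addresses.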
Meirelles, et al. [90] used pool-based AL to train DL models for classifying Tumor Infiltrating Lymphocytes. The proposed approach selects image patches based on feature grouping and prediction uncertainty. They introduced a Diversity-Aware Data Acquisition (DADA) method, which ensures diverse batch selection by clustering images based on features and then choosing uncertain patches from each cluster. The most uncertain patches from each cluster are prioritized for selection, clusters with more uncertain patches contribute more patches, and the pool is updated by removing the selected patches. Applied to a cancer tissue image dataset [91], the model achieved an AUC of 0.78 with fewer tissue patches and less execution time.

Zhang, et al. [92] developed a semi-supervised framework for brain segmentation that incorporates quality-driven active learning (QDAL). In the AL module, a deep supervision loss and an attention mechanism improve segmentation accuracy and return quality information for the unlabeled slices. The AL module chooses the most informative slices to be annotated, and the segmentation network is trained iteratively using the updated labeled data. The framework was tested on two brain MRI datasets [93, 94]. The experimental results showed that segmentation with QDAL requires only 15–20% of slices annotated for the brain extraction task, and 30–40% for tissue segmentation, achieving results competitive with full supervision and an accuracy of 90.7.

Lu, Q., et al. [95] presented a blood cell classification method called MAE4AL, which combines the self-supervised Masked Autoencoder (MAE) and active learning (AL). It chooses the most remarkable samples for labeling based on the self-supervised loss of the MAE and sample uncertainty. Tested on blood smear samples obtained from [96], MAE4AL needed labels for only 20% of the data to perform the same as a ResNeXt trained on the full dataset. When trained using half of the labeled data, MAE4AL achieved an accuracy of 96.36%, outperforming a ResNeXt trained on all the data.

Kumbhare et al. [97] developed a FL method for breast cancer diagnosis using mammogram images from the "Curated Breast Imaging Subset of DDSM (CBIS-DDSM)" dataset [98]. The DenseNet pre-trained model was used for feature extraction, and the extracted features were classified using an Enhanced Recurrent Neural Network (E-RNN). FL was employed to reduce processing time and improve model performance. The method achieved an accuracy of 95%.

Feki, et al. [53] proposed a decentralized FL framework that permits different medical organizations to screen for COVID-19 using chest X-ray images based on deep learning while keeping patient data private. Two pre-trained models, VGG16 and ResNet50, were used for classification. The framework was tested using four clients, where each client has its own private dataset and the same CNN models. The proposed FL framework achieved results competitive with models trained by sharing data. The best achieved accuracy was 97%, using the ResNet50 model with data augmentation.

Zhang, et al. [99] proposed a FL-based DL framework for diagnosing brain disorders. The proposed framework was tested on the Autism Brain Imaging Data Exchange (ABIDE) [100] dataset. It achieved an average accuracy of 79% and reduced the communication burden of FL.
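A common thread in these FL systems [53, 97, 99] is that each site trains locally and only model parameters are aggregated, so raw patient data never leaves the institution. A minimal sketch of the standard FedAvg-style aggregation step that such frameworks typically build on (the client weights and dataset sizes below are invented for illustration):

```python
def fed_avg(client_weights, client_sizes):
    # FedAvg aggregation: average each parameter across clients, weighted
    # by the number of local training samples, so larger sites contribute
    # proportionally more to the global model.
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    merged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for j, w in enumerate(weights):
            merged[j] += w * size / total
    return merged

# Three hypothetical hospitals with different local dataset sizes; each
# list is that client's locally trained parameter vector.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
global_weights = fed_avg(clients, sizes)
```

In a full round, the merged weights would be broadcast back to the clients for the next epoch of local training; only these parameter vectors cross institutional boundaries.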
Shaikh, et al. [101] developed an FL-based DL method to classify respiratory diseases by listening to lung sounds. Generative Adversarial Networks created new lung sounds to train a neural network that classifies 4 lung diseases, heart attack, and normal breathing patterns. Using two datasets [102, 103], the proposed method achieved an accuracy of 92% for the classification of the different respiratory diseases and heart failure.

A Review of Machine Learning Techniques in Medical Domain Informatica 49 (2025) 115–136 131

Table 1 provides a summary of 25 selected articles from top journals on ScienceDirect, published between 2016 and 2024, based on a database search. These articles showcase applications of recent trends in MLTs in the medical domain and are intended to illustrate these trends, not to present a comprehensive list. For each reference, the table includes the task, disease, techniques used, evaluation results, and data type.

Table 1: Summary of the selected articles from search results for applications of the new ML in medical domain.

Ref. | Task | Disease | Used Technique(s) | Evaluation results | Data Type
[55] | Detection | Lung cancer | DL-CNN | Acc. = 99%, FAUC = 0.982 | chest x-ray radiographs
[57] | Classification | Breast cancer | DL-CNN | Acc. = 96.52%, Spec. = 96.4%, Sen. = 96.5%, AUC = 0.98 | mammograms
[59] | Detection | Coronary artery disease | DL-CNN | Acc. = 87.86%, Sen. = 90.67%, Spec. = 82.38%, AUC = 94.70 | heart sound signals
[60] | Prediction | COVID-19 | DL-RNN / LSTM | Acc. = 93.45% (RNN), Acc. = 98.58% (LSTM) | numerical
[62] | Segmentation, diagnosis | Diabetic retinopathy | DL-CNN | Acc. = 99.44% | fundus images
[63] | Classification | Skin cancer | DL-CNN-RNN | Acc. = 94.48, Sen. = 94.38, Spec. = 93 | skin lesion images
[64] | Detection | Breast cancer | DL-TL-VGG16-DenseNet121 | Acc. = 98.92%, AUC = 99.70%, F-score = 98.93% | histopathologic images of lymph node
[66] | Classification | Bone cancer | TL-CA-MobileNetV3 | Acc. = 98.69%, F1-score = 94.11% | microscopic images of bone cancer
[68] | Classification | Alzheimer's disease | TL-ResNet-18-AE-DL | Acc. = 98.54%, recall = 98.9%, prec. = 98.98%, F1-score = 98.82% | MRI Neuroimaging dataset
[70] | Classification | Brain tumor | TL-ResNet152-CNN | Acc. = 99.57% | MRI
[71] | Segmentation, Detection | Pneumonia | U-Net, TL-ResNet50 | Acc. = 93.06%, prec. = 88.97%, Rec. = 96.78%, F1-score = 92.7 | chest X-ray
[72] | Classification | Skin cancer | TL-EfficientNetV2-M | Acc. = 99.23 | dermoscopic images
[78] | Classification | Alzheimer's disease | TL-VGG19 | Acc. = 98.73% | MRI
[79] | Classification | Lung nodules | DL-CNN-CL | Acc. = 95.28%, AUC = 98.05%, Prec. = 95.75, Sen. = 94.33, F1-score = 95.04 | CT scans
[82] | Classification | Pulmonary abnormalities | TL-ResNet-50-CL | Acc. = 93.90, 94.54, 95.39 for 20%, 50%, 100% of dataset | CXR
[83] | Segmentation, Classification | Brain tumor; cardiac | TL-M-Net; DL-CNN-CL | Acc. = 82% (brain), Acc. = 86% (cardiac) | MR
[84] | Segmentation, Classification | COVID-19 | TL-U-Net; DL-AL | Acc. = 0.866, ROC = 0.968 | CT scans
[86] | Classification | Gastrointestinal disease | TL-ResNet18-AL | Acc. = 0.871, Prec. = 0.602, Recall = 0.587, F1-score = 0.594 | images
[90] | Classification | Tumor Infiltrating Lymphocytes | DL-CNN-AL | AUC = 0.78 | histology images
[92] | Segmentation | Brain | DL-CNN-AL | Acc. = 90.7 | MRI
[95] | Classification | Blood diseases (leukemia) | Masked Autoencoder (MAE4AL) | Acc. = 96.36% | blood smear samples
[97] | Classification | Breast cancer | FL-TL-DenseNet-RNN | Acc. = 95% | mammograms
[53] | Classification | COVID-19 | FL-TL-VGG16/ResNet50 | Acc. = 97% | X-ray images
[99] | Classification | Brain disorders | FL-CNN | Acc. = 79% | Autism Brain Imaging
[101] | Classification | Respiratory diseases & heart failure | FL-DL | Acc. = 92% | breathing sounds

7 Conclusion and future work

This research explored the emerging trends in machine learning techniques (MLTs) within the medical domain. Through a comprehensive literature review, we found that deep learning has become the dominant trend, holding significant promise for developing intelligent medical applications. A key advantage of deep learning is its ability to perform automatic feature engineering, simplifying the model-building process and reducing reliance on manual input. Current research predominantly addresses diagnostic tasks, with disease classification being the most common approach. Other tasks, such as segmentation, are also explored. Cancer, in its various forms, is the most frequently studied condition, while the COVID-19 pandemic has notably led to a surge in research on lung diseases.
In the realm of medical imaging, traditional machine learning approaches require extensive pre-processing, including feature extraction and selection. Deep learning, particularly Convolutional Neural Networks (CNNs), has advanced the field by automating feature engineering, reducing the need for manual intervention. However, this comes with an increased demand for large datasets and significant computational resources. To address these challenges, recent trends like transfer learning, curriculum learning, active learning, and federated learning have been introduced to enhance model performance, expedite the training process, and improve data security. In summary, the overarching goal in this field is to automate processes, reduce human intervention, and maximize the value derived from limited labeled data, thereby enhancing medical decision-making and patient outcomes.

Looking ahead, there are several key areas where further work is needed. While the number of publications on deep learning in the medical domain has steadily increased since its initial applications in 2016, and although these applications have yielded promising results, further research is essential to address several key challenges. Areas such as active learning, curriculum learning, and federated learning have shown promise but remain under-explored and require more attention in future research. A critical direction for future work is reducing the time and computational costs associated with deep learning models and the other trends; these processes often consume substantial energy, indirectly contributing to environmental and climate concerns, so developing more energy-efficient techniques will be crucial. Additionally, data augmentation, a significant pre-processing step in deep learning, could be integrated more effectively into the model-building process itself, thereby enhancing sample diversity and improving class representation with less manual effort. Another important aspect for future research is the development of standardized, public databases that include diverse patient data, such as DNA sequences. These databases would enable more comprehensive studies and improve the accuracy of predictive models by providing a richer set of input data. Additionally, integrating knowledge from multiple domains could further enhance the performance of deep learning models in different medical applications. Despite the progress made, the real challenge lies in translating these advancements into practical, real-world applications that can be implemented in clinical settings. Bridging the gap between theoretical research and clinical deployment will be vital to realizing the full potential of deep learning in medicine.
Conflicts of interest

The author has no competing interests to declare.

References

[1] Chen, M., et al., Disease prediction by machine learning over big data from healthcare communities. IEEE Access, 2017. 5: p. 8869-8879.
[2] Grossman, R.L., et al., Toward a shared vision for cancer genomic data. New England Journal of Medicine, 2016. 375(12): p. 1109-1112.
[3] Schaekermann, M., et al., Understanding expert disagreement in medical data analysis through structured adjudication. Proceedings of the ACM on Human-Computer Interaction, 2019. 3(CSCW): p. 1-23.
[4] Garg, A. and V. Mago, Role of machine learning in medical research: A survey. Computer Science Review, 2021. 40: p. 100370.
[5] Dallora, A.L., et al., Machine learning and microsimulation techniques on the prognosis of dementia: A systematic literature review. PloS one, 2017. 12(6): p. e0179804.
[6] Kharazmi, P., et al., A computer-aided decision support system for detection and localization of cutaneous vasculature in dermoscopy images via deep feature learning. Journal of medical systems, 2018. 42(2): p. 1-11.
[7] Xu, J., K. Xue, and K. Zhang, Current status and future trends of clinical diagnoses via image-based deep learning. Theranostics, 2019. 9(25): p. 7556.
[8] Xie, X., et al., A survey on incorporating domain knowledge into deep learning for medical image analysis. Medical Image Analysis, 2021. 69: p. 101985.
[9] Lee, C.H. and H.-J. Yoon, Medical big data: promise and challenges. Kidney research and clinical practice, 2017. 36(1): p. 3.
[10] Dinov, I.D., Methodological challenges and analytic opportunities for modeling and interpreting Big Healthcare Data. Gigascience, 2016. 5(1): p. s13742-016-0117-6.
[11] http://archive.ics.uci.edu/ml/datasets/.
[12] http://image-net.org/challenges/LSVRC/.
[13] https://file.biolab.si/biolab/supp/bicancer/projections/.
[14] Corsetti, V., et al., Evidence of the effect of adjunct ultrasound screening in women with mammography-negative dense breasts: interval breast cancers at 1 year follow-up. European journal of cancer, 2011. 47(7): p. 1021-1026.
[15] Zhang, Z. and E. Sejdić, Radiological images and machine learning: trends, perspectives, and prospects. Computers in biology and medicine, 2019. 108: p. 354-370.
[16] Saslow, D., et al., American Cancer Society guidelines for breast screening with MRI as an adjunct to mammography. CA: a cancer journal for clinicians, 2007. 57(2): p. 75-89.
[17] http://medicaldictionary.thefreedictionary.com/operating+microscope.
[18] Jones, N.C. and P.A. Pevzner, An introduction to bioinformatics algorithms. 2004: MIT press.
[19] Rahman, T., et al., COVID-19 radiography database. https://www.kaggle.com/tawsifurrahman/covid19-radiography-database.
[20] Suckling, J., et al., Mammographic image analysis society (MIAS) database v1.21. 2015.
[21] Labati, R.D., V. Piuri, and F. Scotti, All-IDB: The acute lymphoblastic leukemia image database for image processing. in 2011 18th IEEE international conference on image processing. 2011. IEEE.
[22] Zoph, B., et al., Learning transferable architectures for scalable image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[23] Thusberg, J. and M. Vihinen, Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Human mutation, 2009. 30(5): p. 703-714.
[24] El Houby, E.M., Machine learning techniques for pathogenicity prediction of non-synonymous single nucleotide polymorphisms in human body. Journal of Ambient Intelligence and Humanized Computing, 2023. 14(7): p. 8099-8113.
[25] Han, J., J. Pei, and M. Kamber, Data mining: concepts and techniques. 2011: Elsevier.
[26] Hamla, H. and K. Ghanem, A hybrid feature selection based on Fisher score and SVM-RFE for microarray data. Informatica, 2024. 48(1).
[27] Fahrudin, T.M., I. Syarif, and A.R. Barakbah, Ant colony algorithm for feature selection on microarray datasets. in 2016 International Electronics Symposium (IES). 2016. IEEE.
[28] Talavera, L., An evaluation of filter and wrapper methods for feature selection in categorical clustering. in International Symposium on Intelligent Data Analysis. 2005. Springer.
[29] Tabakhi, S. and P. Moradi, Relevance–redundancy feature selection based on ant colony optimization. Pattern recognition, 2015. 48(9): p. 2798-2811.
[30] Yang, X.-S., Firefly algorithm, stochastic test functions and design optimisation. International journal of bio-inspired computation, 2010. 2(2): p. 78-84.
[31] Mashhour, E.M., et al., Feature Selection Approach based on Firefly Algorithm and Chi-square. International Journal of Electrical & Computer Engineering (2088-8708), 2018. 8(4).
[32] Neagoe, V.-E. and E.-C. Neghina, Feature selection with ant colony optimization and its applications for pattern recognition in space imagery. in 2016 international conference on communications (COMM). 2016. IEEE.
[33] El Houby, E.M., N.I. Yassin, and S. Omran, A hybrid approach from ant colony optimization and K-nearest neighbor for classifying datasets using selected features. Informatica, 2017. 41(4).
[34] LeCun, Y., Y. Bengio, and G. Hinton, Deep learning. Nature, 2015. 521(7553): p. 436-444.
[35] Goodfellow, I., Y. Bengio, and A. Courville, Deep learning. 2016: MIT press.
[36] Dezaki, F.T., et al., Cardiac phase detection in echocardiograms with densely gated recurrent neural networks and global extrema loss. IEEE transactions on medical imaging, 2018. 38(8): p. 1821-1832.
[37] Raiaan, M.A.K., et al., A systematic review of hyperparameter optimization techniques in Convolutional Neural Networks. Decision Analytics Journal, 2024: p. 100470.
[38] Ali, M.J., et al., A review of AutoML optimization techniques for medical image applications. Computerized Medical Imaging and Graphics, 2024: p. 102441.
[39] Ambekar, S. and R. Phalnikar, Disease risk prediction by using convolutional neural network. in 2018 Fourth international conference on computing communication control and automation (ICCUBEA). 2018. IEEE.
[40] Anwar, S.M., et al., Medical image analysis using convolutional neural networks: a review. Journal of medical systems, 2018. 42(11): p. 1-13.
[41] Oraibi, Z.A. and S. Albasri, A robust end-to-end CNN architecture for efficient COVID-19 prediction from x-ray images with imbalanced data. Informatica, 2023. 47(7).
[42] Krizhevsky, A., I. Sutskever, and G.E. Hinton, ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 2012. 25.
[43] Simonyan, K. and A. Zisserman, Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[44] Szegedy, C., et al., Rethinking the inception architecture for computer vision. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[45] He, K., et al., Deep residual learning for image recognition. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
[46] Deng, J., et al., ImageNet: A large-scale hierarchical image database. in 2009 IEEE conference on computer vision and pattern recognition. 2009. IEEE.
[47] Russakovsky, O., et al., ImageNet large scale visual recognition challenge. International journal of computer vision, 2015. 115(3): p. 211-252.
[48] Oquab, M., et al., Learning and transferring mid-level image representations using convolutional neural networks. in Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
[49] Wang, L., et al., Trends in the application of deep learning networks in medical image analysis: Evolution between 2012 and 2020. European Journal of Radiology, 2022. 146: p. 110069.
[50] Jiménez-Sánchez, A., et al., Medical-based deep curriculum learning for improved fracture classification. in International Conference on Medical Image Computing and Computer-Assisted Intervention. 2019. Springer.
[51] Budd, S., E.C. Robinson, and B. Kainz, A survey on active learning and human-in-the-loop deep learning for medical image analysis. Medical Image Analysis, 2021. 71: p. 102062.
[52] KhoKhar, F.A., et al., A review on federated learning towards image processing. Computers and Electrical Engineering, 2022. 99: p. 107818.
[53] Feki, I., et al., Federated learning for COVID-19 screening from Chest X-ray images. Applied Soft Computing, 2021. 106: p. 107330.
[54] Wu, J.C.-H., et al., Dynamically Synthetic Images for Federated Learning of Medical Images. Computer Methods and Programs in Biomedicine, 2023: p. 107845.
[55] Li, X., et al., Multi-resolution convolutional networks for chest X-ray radiograph based lung nodule detection. Artificial intelligence in medicine, 2020. 103: p. 101744.
[56] Li, X., et al., Rib suppression in chest radiographs for lung nodule enhancement. in 2015 IEEE International Conference on Information and Automation. 2015. IEEE.
[57] El Houby, E.M. and N.I. Yassin, Malignant and nonmalignant classification of breast lesions in mammograms using convolutional neural networks. Biomedical Signal Processing and Control, 2021. 70: p. 102954.
[58] Moreira, I.C., et al., INbreast: toward a full-field digital mammographic database. Academic radiology, 2012. 19(2): p. 236-248.
[59] Dai, Y., et al., Deep learning fusion framework for automated coronary artery disease detection using raw heart sound signals. Heliyon, 2024. 10(16).
[60] Alassafi, M.O., M. Jarrah, and R. Alotaibi, Time series predicting of COVID-19 based on deep learning. Neurocomputing, 2022. 468: p. 335-344.
[61] https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide.
[62] Maiti, S., et al., Automatic detection and segmentation of optic disc using a modified convolution network. Biomedical Signal Processing and Control, 2022. 76: p. 103633.
[63] Zareen, S.S., et al., Enhancing Skin Cancer Diagnosis with Deep Learning: A Hybrid CNN-RNN Approach. Computers, Materials & Continua, 2024. 79(1).
[64] Ge, R., et al., Detection of presence or absence of metastasis in WSI patches of breast cancer using the dual-enhanced convolutional ensemble neural network. Machine Learning with Applications, 2024. 17: p. 100579.
[65] Cukierski, W., Histopathologic cancer detection. Kaggle. https://kaggle.com/competitions/histopathologic-cancer-detection, 2018.
[66] Liu, Q., X. She, and Q. Xia, AI based diagnostics product design for osteosarcoma cells microscopy imaging of bone cancer patients using CA-MobileNet V3. Journal of Bone Oncology, 2024: p. 100644.
[67] Charilaou, P. and R. Battat, Machine learning models and over-fitting considerations. World Journal of Gastroenterology, 2022. 28(5): p. 605.
[68] Oommen, D.K. and J. Arunnehru, Alzheimer's Disease Stage Classification Using a Deep Transfer Learning and Sparse Auto Encoder Method. Computers, Materials & Continua, 2023. 76(1).
[69] http://adni.loni.usc.edu.
[70] Kumar, K.A., A. Prasad, and J. Metan, A hybrid deep CNN-Cov-19-Res-Net Transfer learning architype for an enhanced Brain tumor Detection and Classification scheme in medical image processing. Biomedical Signal Processing and Control, 2022. 76: p. 103631.
[71] Manickam, A., et al., Automated pneumonia detection on chest X-ray images: A deep learning approach with different optimizers and transfer learning architectures. Measurement, 2021. 184: p. 109953.
[72] Venugopal, V., et al., A deep neural network using modified EfficientNet for skin cancer detection in dermoscopic images. Decision Analytics Journal, 2023. 8: p. 100278.
[73] Rotemberg, V., et al., A patient-centric dataset of images and metadata for identifying melanomas using clinical context. Scientific data, 2021. 8(1): p. 34.
[74] Tschandl, P., C. Rosendahl, and H. Kittler, The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data, 2018. 5(1): p. 1-9.
[75] Codella, N.C., et al., Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (ISBI), hosted by the international skin imaging collaboration (ISIC). in 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018). 2018. IEEE.
[76] Combalia, M., et al., BCN20000: Dermoscopic lesions in the wild. arXiv preprint arXiv:1908.02288, 2019.
[77] Codella, N., et al., Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1902.03368, 2019.
[78] Mehmood, A., et al., A transfer learning approach for early diagnosis of Alzheimer's disease on MRI images. Neuroscience, 2021. 460: p. 43-52.
[79] Al-Shabi, M., K. Shak, and M. Tan, ProCAN: Progressive growing channel attentive non-local network for lung nodule classification. Pattern Recognition, 2022. 122: p. 108309.
[80] Armato III, S.G., et al., The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical physics, 2011. 38(2): p. 915-931.
[81] Armato III, S.G., et al., LUNGx Challenge for computerized lung nodule classification. Journal of Medical Imaging, 2016. 3(4): p. 044506.
[82] Cho, Y., et al., Optimal number of strong labels for curriculum learning with convolutional neural network to classify pulmonary abnormalities in chest radiographs. Computers in Biology and Medicine, 2021. 136: p. 104750.
[83] Wong, K.C., T. Syeda-Mahmood, and M. Moradi, Building medical image classifiers with very limited data using segmentation networks. Medical image analysis, 2018. 49: p. 105-116.
[84] Wu, X., et al., COVID-AL: The diagnosis of COVID-19 with deep active learning. Medical Image Analysis, 2021. 68: p. 101913.
[85] Zhang, K., et al., Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell, 2020. 181(6): p. 1423-1433.e11.
[86] Wu, X., et al., HAL: Hybrid active learning for efficient labeling in medical domain. Neurocomputing, 2021. 456: p. 563-572.
[87] Borgli, H., et al., HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific data, 2020. 7(1): p. 1-14.
[88] Decencière, E., et al., Feedback on a publicly distributed image database: the Messidor database. Image Analysis & Stereology, 2014. 33(3): p. 231-234.
[89] Aresta, G., et al., BACH: Grand challenge on breast cancer histology images. Medical image analysis, 2019. 56: p. 122-139.
[90] Meirelles, A.L., et al., Effective Active Learning in Digital Pathology: A Case Study in Tumor Infiltrating Lymphocytes. Computer Methods and Programs in Biomedicine, 2022: p. 106828.
[91] Saltz, J., et al., Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell reports, 2018. 23(1): p. 181-193.e7.
[92] Zhang, Z., et al., Quality-driven deep active learning method for 3D brain MRI segmentation. Neurocomputing, 2021. 446: p. 106-117.
[93] Shattuck, D.W., et al., Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage, 2008. 39(3): p. 1064-1080.
[94] https://www.nitrc.org/projects/ibsr.
[95] Lu, Q., et al., A blood cell classification method based on MAE and active learning. Biomedical Signal Processing and Control, 2024. 90: p. 105813.
[96] Matek, C., et al., Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nature Machine Intelligence, 2019. 1(11): p. 538-544.
[97] Kumbhare, S., A.B. Kathole, and S. Shinde, Federated learning aided breast cancer detection with intelligent Heuristic-based deep learning framework. Biomedical Signal Processing and Control, 2023. 86: p. 105080.
[98] Lee, R.S., et al., A curated mammography data set for use in computer-aided detection and diagnosis research. Scientific data, 2017. 4(1): p. 1-9.
[99] Zhang, C., et al., FedBrain: A robust multi-site brain network analysis framework based on federated learning for brain disease diagnosis. Neurocomputing, 2023. 559: p. 126791.
[100] Heinsfeld, A.S., et al., Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical, 2018. 17: p. 16-23.
[101] Shaikh, A.A.S. and M. Bhargavi, Weighted aggregation through probability based ranking: An optimized federated learning architecture to classify respiratory diseases. Computer Methods and Programs in Biomedicine, 2023. 242: p. 107821.
[102] Fraiwan, L., et al., Automatic identification of respiratory diseases from stethoscopic lung sound signals using ensemble classifiers. Biocybernetics and Biomedical Engineering, 2021. 41(1): p. 1-14.
[103] Rocha, B., et al., A respiratory sound database for the development of automated classification. in Precision Medicine Powered by pHealth and Connected Health: ICBHI 2017, Thessaloniki, Greece, 18-21 November 2017. 2018. Springer.

https://doi.org/10.31449/inf.v49i16.7979 Informatica 49 (2025) 137–150 137

Vision Transformer-Based Framework for AI-Generated Image Detection in Interior Design

Hui Wang
AnHui Business and Technology College, Hefei City, AnHui Province, 230041, China
E-mail: leZhang2024@163.com

Keywords: artificial intelligence-generated images, interior design, vision transformers, deep learning, image classification

Received: January 7, 2025

Increasingly, images generated by artificial intelligence (AI) are being used within interior design, raising concerns about authenticity and ethical use.
Because Convolutional Neural Networks (CNNs) are limited in capturing long-range dependencies and global patterns in image data, this study examines how Vision Transformers (ViTs) can be utilized to detect AI-generated interior design images. We fine-tuned and evaluated four ViT models, ViT-B16, ViT-B32, ViT-L16, and ViT-L32, on a dataset with 1,000 samples per class. Accuracy, precision, recall, F1-score, and computational efficiency were used to assess performance. Results show that models with smaller patch sizes (16×16) perform better than those with larger ones (32×32): ViT-B16 and ViT-L16 had the highest accuracy (96.25%) and F1-score (0.9625) in identifying minor inconsistencies in AI-generated images, while ViT-B32 and ViT-L32 offer better computational efficiency at the cost of lower classification performance (80.00% and 81.25% accuracy, respectively). ViT-B16 provides the best tradeoff between accuracy and resource efficiency; ViT-L16 is just as accurate but computationally more expensive. ViT-B32 and ViT-L32, being computationally efficient, are more appropriate for real-time applications that prioritize speed over accuracy. Through this work, we contribute a domain-specific deep learning framework for AI-generated image detection in interior design to strengthen authenticity verification. Future work will address improving computational efficiency and generalizing the model across most generative models and design styles.

Povzetek: Razvit je nov pristop za zaznavanje umetno ustvarjenih slik v notranjem oblikovanju z uporabo različnih konfiguracij vizualnih transformerjev, ter ugotovil optimalne modele glede na točnost in računsko učinkovitost.

1 Introduction

Artificial Intelligence (AI) has become increasingly embedded in practice in creative industries such as interior design, generating photo-realistic and innovative imagery [1]. AI has profoundly changed most industries, including interior design, where visualization, creativity, and presentation are increasingly led by AI-generated images [5]. With Generative Adversarial Networks (GANs) and diffusion models, highly realistic images can now be created that often rival human-generated designs in quality and detail; these tools have democratized access to high-quality design imagery, and their use has become ubiquitous [2, 3]. While they democratize access to creative resources, they also raise challenging problems around what 'authentic' designs are, how designs can be used ethically, and intellectual property rights. For example, it is essential to differentiate between AI-generated and human-made images in interior design, because professional work in commercial and academic settings may otherwise be compromised. Although AI is increasingly applied to create visual content, domain-specific applications such as interior design are still in their infancy, and robust means to detect such images have received little attention.

Despite their effectiveness, most existing detection approaches rely on Convolutional Neural Networks (CNNs) as feature extractors; CNNs are mainly limited to short-range dependencies and cannot model the long-range dependencies and global patterns of high-dimensional image data [6]. In recent years, Vision Transformers (ViTs) [4], built on self-attention mechanisms, have emerged as powerful alternatives, achieving state-of-the-art results in image classification and artefact detection tasks [7]. A key attribute is their ability to model noncontiguous relationships, which offers a means of identifying the subtle inconsistencies underlying AI-generated images. This study therefore proposes a deep learning framework based on Vision Transformers to detect AI-generated interior design images. The study fine-tunes multiple ViT configurations (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on a balanced dataset and compares their performance with respect to accuracy, precision, recall, F1-score, and computational efficiency. The results guide model configuration choice when resources impose a tradeoff against detection accuracy. This research also lays a foundation for authenticating AI-generated content by addressing scalability, computational efficiency, and domain-specific application.

The contributions of this work are threefold:
• Developing a domain-specific AI image detection approach targeted to interior design,
• Comparing a large number of ViT configurations to establish cost-benefit relationships,
• Distilling lessons learned from deploying transformer-based models for AI content detection.
First, these contributions fill an essential gap in AI image authenticity verification; second, they establish a foundation for future work in this young area.

2 Background and related work

Detecting artificial intelligence (AI)-made images is an emerging field of study, as AI-based tools are increasingly used in creative spheres like interior design. This literature review provides an overview of state-of-the-art AI-generated content detection, specifically methodologies and techniques that can be applied, using Vision Transformers (ViTs), to discriminate between AI-generated and human-created images.

Thanks to the integration of AI, photo-realistic images that resemble human-produced designs are generated. The advanced generative models behind tools such as DALL-E, MidJourney, and Stable Diffusion make images increasingly indistinguishable from real ones. These democratizing advancements to creativity are also a concern, raising worries about authenticity and intellectual property rights [8-10]. There have been few attempts to identify the key difficulties of detecting AI-generated interior design images, leaving this field open for study.

AI-generated image detection usually relies on machine learning or deep learning models to identify subtle traces that distinguish AI-generated images from authentic ones. Commonly used techniques include:

Convolutional Neural Networks (CNNs): CNNs have long been a core component of image classification tasks. They learn spatial hierarchies in images and can detect AI artefacts; for example, CNNs have been used successfully to detect GAN-generated images [11, 12]. CNNs handle local patterns in high-dimensional data well [13], but global contextual relationships remain challenging for them.

Transformer-Based Architectures: Transformers, initially designed for natural language processing [14], have been adapted for vision tasks. The self-attention mechanisms used by Vision Transformers (ViTs) capture local and global image patterns, making ViTs very powerful for detecting minute inconsistencies in AI-generated content [5, 15, 16]. In this work, we build upon the success of ViTs by extending them to interior design image classification.

Ensemble Models: Others have combined CNNs and transformer-based architectures to provide the best of both worlds. For example, hybrid architectures such as DeiT (Data-efficient Image Transformer) extract early features via convolutional blocks [17-20] and subsequently use transformer layers to perform global attention.

Vision Transformers have become the state of the art in image classification and manipulation detection. On high-dimensional datasets, they divide images into patches and apply self-attention to the relationships between them, leading to better performance [4, 21]. Several studies have highlighted their applicability: ViTs were shown to scale to challenging image classification tasks, outperforming traditional CNNs on large-scale datasets [4, 22]. References [23-26] indicate that Vision Transformers are adequate detectors of subtle image manipulations, including deepfakes; they are therefore a natural methodological choice for tasks that are exceedingly sensitive to minute image artefacts. The present study extends this foundation to a binary classification of AI-generated and authentic images in interior design while fine-tuning ViT models.

For deep learning models to succeed, effective preprocessing is critical. Standard techniques for making models robust include image resizing, normalization, and data augmentation. References [27, 28] found that dataset balancing is necessary and that augmentation strategies are an effective way to tackle class imbalances. In this study, we adopt these practices: samples per class were capped at 1,000, and the dataset was set up for diversity. Metrics like accuracy, precision, recall, F1-score, and loss are commonly used to evaluate detection models, and confusion matrices are used to find misclassification patterns [29, 30]. In line with current best practice in the field, a range of metrics capturing distinct aspects of model performance is recommended, which justifies the choice of metrics made in this study.

Despite these advancements, several challenges persist in detecting AI-generated images: (i) Subtle Artifacts: detecting high-quality AI-generated images is complex because they often carry no visible artefacts; recent generative models have demonstrated the ability to generate increasingly realistic image samples seamlessly. (ii) Computational Complexity: despite being highly accurate, transformer-based models are computationally expensive, making them difficult to deploy in resource-constrained environments. (iii) Dataset Limitations: the generalization and transferability of detection models in a specific domain, such as interior design, is limited by the lack of standardized datasets.

Table 1 compares deep learning-based methods for detecting AI-generated images, particularly in interior design, contrasting their strengths, accuracy, precision, recall, and limitations.

Table 1: Comparison of AI-generated image detection methods

Methodology | Key Strengths | Accuracy | Precision | Recall | Limitations
CNN-Based Approaches | Intense feature extraction for local patterns; effective for GAN-based images | 85–92% | High | High | Struggles with long-range dependencies; limited effectiveness on high-quality textures
Hybrid CNN-Transformer Models | Combines the CNN's spatial awareness with the Transformer's self-attention | 89–94% | High | High | Increased computational cost; complex model training
Ensemble Models | Enhances classification robustness by integrating multiple architectures | 91–95% | High | High | Requires large-scale datasets; computationally expensive
Vision Transformers (ViTs) (Our Approach) | Captures fine-grained, global dependencies via self-attention; excels at detecting subtle artefacts | 96.25% | 0.9637 | 0.9625 | High computational cost; requires extensive pretraining

Previous literature has discussed the detection of AI-generated images at length in more general areas, with little focus on the domain-specific application of interior design. Furthermore, most studies employ CNN-based solutions, while those exploring the full capability of Vision Transformers are less central. This study evaluates multiple ViT configurations for detecting AI-generated interior design images to fill these gaps. The review points out the significance of Vision Transformers as the current state-of-the-art approach for detecting AI-generated images; this study builds on that capability and grows the body of work on the authenticity of AI-generated content. Future work will need to improve computational efficiency, tackle domain-particular challenges, and standardize benchmarks for performance evaluation in interior design and more generally.

3 Proposed method

The proposed method uses deep learning to distinguish AI-generated interior design images from human-created ones, as shown in Figure 1. For preprocessing and balancing, we cap samples per class so the classes are uniform and split the data into training and validation sets. The system classifies images using features extracted by Vision Transformer (ViT) models (ViT-B16, ViT-B32, ViT-L16, ViT-L32). The model is trained with the defined parameters and then evaluated with metrics such as accuracy and F1 score. Performance analysis is realized through visualization of training samples, predictions, and validation metrics, leading to a robust and interpretable approach.

Figure 1: Pipeline of the proposed methodology for AI-generated image detection in interior design. It consists of dataset collection, preprocessing, Vision Transformer (ViT) feature extraction, training with AdamW optimization, and evaluation using accuracy, precision, recall, and F1 score to maintain an optimal tradeoff between efficiency and performance.

Base (B), Large (L), and Huge (H) Vision Transformer models differ in network depth, hidden dimension size, number of self-attention heads, and total parameters. ViT-B (Base) has 12 layers, a hidden dimension of 768, and 86 million parameters, achieving a good tradeoff between performance and computational cost and remaining practical for real-world AI-generated image detection. ViT-L (Large), with 24 layers, a hidden dimension of 1024, and 307M parameters, offers better feature extraction at a higher computational cost. ViT-H (Huge), the most resource-intensive variant, has 32 layers, a hidden dimension of 1280, and 632 million parameters; it was left out because of its high computational demands with no proportional accuracy gains.
For this reason, the Base and Large models were adopted in this study, as they ensure the optimal balance between accuracy and efficiency, making them feasible for AI-generated image detection in interior design. The deep learning methodology to detect AI-generated interior design images consists of multiple steps, described in detail below.

The first step is collecting an extensive image dataset comprising two main categories:
• AI-Generated Images: interior design pictures produced by AI tools and algorithms.
• Real Images: actual interior designs captured with cameras or professionally curated photographs.
The dataset must be diverse in design styles, lighting conditions, and resolutions to generalize well to new images.

Raw input images are standardized to make them appropriate for input into the ViT model and to improve performance. Each image is resized to 224 × 224 pixels:

I′ = Resize(I, 224, 224)  (1)

where I is the original image and I′ is the resized image. Pixel values are normalized to the range [0, 1] or standardized using the mean μ and standard deviation σ of the dataset:

I_norm = (I′ − μ) / σ  (2)

Images are divided into non-overlapping patches of size P × P (e.g., 16 × 16 or 32 × 32):

Patch = {p_{i,j} : p_{i,j} ∈ R^{P×P}}, ∀ i, j ∈ [1, N]

where N is the number of patches per dimension, calculated as:

N = Image Size / Patch Size  (3)

For an image of 224 × 224 and a patch size of 16, N = 14 (i.e., 14 × 14 = 196 patches). Each patch is flattened into a 1D vector and linearly projected into a D-dimensional embedding space using a learnable matrix W_e:

z_p = W_e · Flatten(p_{i,j})  (4)

where z_p ∈ R^D is the embedded representation of a patch. To encode spatial information, a positional embedding e_pos is added to each patch embedding:

z′_p = z_p + e_pos  (5)

where e_pos is a learnable positional embedding vector. To prevent overfitting and improve robustness, data augmentation was performed.
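The patch-partitioning step of Eqs. (3)–(4) can be illustrated with a minimal NumPy sketch; the function name `patchify` and the dummy zero image are illustrative choices, not part of the paper's implementation:

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    where the number of patches per dimension is N = H // patch_size.
    """
    h, w, c = image.shape
    n = h // patch_size  # patches per dimension (N in Eq. 3)
    patches = (
        image.reshape(n, patch_size, n, patch_size, c)
        .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes first
        .reshape(n * n, patch_size * patch_size * c)  # flatten each patch (Eq. 4 input)
    )
    return patches

img = np.zeros((224, 224, 3), dtype=np.float32)  # dummy 224x224 RGB image
p = patchify(img, 16)
print(p.shape)  # (196, 768): 14 x 14 = 196 patches, each 16 * 16 * 3 = 768 values
```

In the full model, each 768-value row would then be multiplied by the learnable projection matrix W_e to obtain the D-dimensional patch embedding.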
The augmentation includes:
• Random Rotation (±15°), applied to introduce variability in image orientation.
• Horizontal Flipping (50% probability), to simulate mirrored interior design perspectives.
• Random Cropping (90% of the original size), which forces the model to pay attention to different image portions.
• Color Jitter (±0.2 adjustments to brightness, contrast, and saturation), to simulate variations in lighting conditions.

The sequence of patch embeddings is passed through multiple Transformer encoder layers. Each layer consists of Multi-Head Self-Attention (MHSA), whose attention scores are computed as:

Attention(Q, K, V) = Softmax(QKᵀ / √d_k) V  (6)

where:
• Q = W_q · z′_p (query)
• K = W_k · z′_p (key)
• V = W_v · z′_p (value)
• W_q, W_k, W_v are learnable weight matrices
• d_k is the dimensionality of the key.

Multi-head attention is computed as:

MHSA(z′_p) = Concat(head_1, …, head_h) W_o  (7)

where W_o is an output projection matrix.

Feed-Forward Neural Network (FFN): each patch embedding is processed through a two-layer fully connected network with a ReLU activation:

FFN(z) = ReLU(zW_1 + b_1) W_2 + b_2  (8)

where W_1, W_2 and b_1, b_2 are learnable parameters.
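The scaled dot-product attention of Eq. (6) reduces to a few lines of NumPy. The token count (196, matching the 14×14 patch grid) and the key dimension of 64 below are example values for illustration, not the paper's exact head dimensions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Eq. (6): Attention(Q, K, V) = Softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (tokens, tokens) similarity matrix
    return softmax(scores, axis=-1) @ V  # attention-weighted sum of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((196, 64))  # 196 patch tokens, example d_k = 64
K = rng.standard_normal((196, 64))
V = rng.standard_normal((196, 64))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (196, 64): one updated embedding per patch token
```

In Eq. (7), several such heads (each with its own W_q, W_k, W_v) would run in parallel and their outputs would be concatenated and projected by W_o.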
Residual Connections and Layer Normalization: each block includes skip connections and normalization:

z^{l+1}_p = LayerNorm(z^l_p + MHSA(z^l_p))  (9)
z^{l+1}_p = LayerNorm(z^l_p + FFN(z^l_p))  (10)

A unique learnable classification token z^l_cls is prepended to the patch sequence:

z^{l+1}_cls = Transformer(z^l_cls, {z^l_p})  (11)

where z^l_cls aggregates global information for classification. The output of the classification token is passed through a softmax layer to produce probabilities for the two classes (y_real, y_AI):

ŷ = Softmax(W_c · z_cls + b_c)  (12)

where W_c and b_c are learnable parameters. The binary cross-entropy loss is:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]  (13)

where y_i is the ground truth label. The model parameters are updated by gradient descent:

θ_{t+1} = θ_t − η ∇L(θ_t)  (14)

where θ represents the model parameters, η is the learning rate, and ∇L is the loss gradient; in practice, the AdamW optimizer is used, as listed in Table 2.

Table 2: Training hyperparameters

Parameter | Value
Optimizer | AdamW (decoupled weight decay)
Learning Rate | 5e-5 (decayed using cosine annealing)
Learning Rate Schedule | Cosine annealing with a warm-up for the first five epochs
Batch Size | 16
Weight Decay | 0.01
Dropout Rate | 0.1
Training Epochs | 10
Gradient Clipping Norm | Clip at 1.0
Loss Function | Binary cross-entropy loss
Validation Split | 80% train, 20% validation

The results of the proposed method are evaluated using the following metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (15)

Accuracy is a general measure of the total correctness of the model. However, it does not account for class imbalance: a model that always predicts "AI-generated" could still score highly if the dataset were skewed. An accuracy above 90% indicates that the model is working reasonably well overall, but it does not guarantee that the model is unbiased toward one class.

Precision = TP / (TP + FP)  (16)

Precision measures how many of the detected AI-generated images really are AI-generated. In applications where false positives must be minimized, such as avoiding incorrectly labelling authentic interior designs as AI-generated, high precision is essential.
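As a sanity check, the binary cross-entropy loss of Eq. (13) can be sketched in plain Python; the labels and predicted probabilities below are made-up examples, not values from the experiments:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Eq. (13): L = -(1/N) * sum[ y*log(yhat) + (1-y)*log(1-yhat) ]."""
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / n

# Example: four predictions that are all on the correct side of 0.5
loss = binary_cross_entropy([1, 0, 1, 1], [0.9, 0.1, 0.8, 0.95])
print(round(loss, 4))  # 0.1213
```

Confident, correct predictions drive the loss toward zero, while confident wrong predictions are penalized heavily, which is what makes this loss a suitable objective for the binary real-vs-AI task.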
A high precision (>90%) implies the model rarely misclassifies human-created images as AI-generated; a precision below about 80% (i.e., a high false-positive rate) would make the model too unreliable for commercial use.

Recall = TP / (TP + FN)  (17)

Recall measures the model's ability to identify AI-generated images without missing them. It is the key metric for applications where finding all AI-generated content is more important than avoiding false positives. A high recall (>90%) means the model rarely fails to capture AI-generated images; a low recall (<80%) means the model misses many of them, resulting in many false negatives.

To guarantee reproducibility, we provide a detailed breakdown of the hyperparameters and training configuration of our experiments in Table 2. We use AdamW, which is known for good generalization with Transformer-based architectures, with a weight decay of 0.01 to help prevent overfitting. Beginning with a warm-up over the first five epochs, we apply a cosine annealing schedule to avoid early instability and then gradually decay the learning rate for the rest of training. A batch size of 16 provides memory-efficient yet stable updates, and gradient clipping at a norm of 1.0 ensures numerical stability, especially in deep ViT models, making the setup easy to replicate and adapt in future studies.
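The warm-up plus cosine-annealing schedule described above (and in Table 2) can be sketched as a per-epoch rule. This is an illustrative approximation under the stated settings (base learning rate 5e-5, five warm-up epochs, ten epochs total), not the exact scheduler implementation used in the experiments:

```python
import math

def lr_at_epoch(epoch, total_epochs=10, warmup_epochs=5, base_lr=5e-5):
    """Linear warm-up for the first epochs, then cosine annealing to ~0."""
    if epoch < warmup_epochs:
        # ramp linearly from base_lr/warmup_epochs up to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # cosine decay over the remaining epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(10)]
```

The schedule rises to the full 5e-5 by the end of warm-up and then decays smoothly, which is the "avoid early instability, then gradually decay" behaviour the text describes.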
F1-Score = 2 · Precision · Recall / (Precision + Recall)  (18)

The F1 score balances precision and recall, which makes it particularly suitable for AI image detection, where both false positives and false negatives should be minimized. A high F1 score (>90%) indicates that the model balances precision and recall well; a low F1 score (<80%) suggests the model is biased toward one class (sacrificing precision or recall disproportionately).

To examine the effects of patch size and model capacity on AI-generated image detection, four ViT configurations are used: ViT-B16 (Base model, patch size 16 × 16), ViT-B32 (Base model, patch size 32 × 32), ViT-L16 (Large model, patch size 16 × 16), and ViT-L32 (Large model, patch size 32 × 32). Each configuration affects the balance between computational efficiency and detection accuracy.

Alternative hybrid transformer architectures, such as DeiT (Data-efficient Image Transformer) and the Swin Transformer, were considered but not included in this study, for the following reasons:
• DeiT models are optimized for smaller datasets, and their efficiency is based on knowledge distillation. Although they reduce training costs, their reliance on CNN-like inductive biases makes them less suitable for capturing the global dependencies needed for AI image authenticity verification.
• The Swin Transformer is tailored to object detection, using hierarchical feature learning with shifted windows for efficiency. Our main objective, global feature extraction, is better served by standard ViTs and their pure self-attention mechanism.
Consequently, we did not explore hybrid transformers.

Figure 2 illustrates the proposed method's ability to classify images: using Vision Transformers, visual tokens are classified as AI-created (T: AI) or human-created (T: Human), with the predicted label (P: AI or P: Human) written below each classification. The model can distinguish AI-generated from authentic human-created interior design images in different settings.

Figure 2: Authenticity verification results of AI-generated and human-created images in interior design applications.
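The four metrics of Eqs. (15)–(18) follow directly from confusion-matrix counts. The counts below are hypothetical examples for an 80-image validation split, not taken from the paper's experiments:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 (Eqs. 15-18) from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 AI-generated and 40 authentic images, 3 mistakes total
acc, prec, rec, f1 = classification_metrics(tp=38, tn=39, fp=1, fn=2)
print(acc, round(prec, 4), rec, round(f1, 4))
```

Note how accuracy alone hides the asymmetry: here precision (few false positives) is higher than recall (two AI images missed), which is exactly the distinction the text draws between Eqs. (16) and (17).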
4 Experimental setup

This study fine-tuned Vision Transformers (ViTs) to classify human-created interior design images against AI-created ones. The experiments were conducted with various ViT variants to account for model capacity and different patch sizes. The database of interior design images was compiled to be balanced, and the images were preprocessed to guarantee rigorous training and testing. The dataset of AI-vs-human images is available at https://www.kaggle.com/datasets/shirshaka/ai-vs-human-generated-images. Important settings such as the learning rate, batch size, and evaluation criteria were tuned to ensure reliability, as shown in Table 3.

Table 3: Overview of the experimental setting, including the model architectures used, details of the dataset and preprocessing, and the training and evaluation parameters applied in classifying human- and AI-generated interior design images.

Aspect | Details
Models | ViT variants: ViT-B16 (Base, patch size 16), ViT-B32 (Base, patch size 32), ViT-L16 (Large, patch size 16), ViT-L32 (Large, patch size 32)
Pretraining | All models were pre-trained on ImageNet-21k
Fine-tuning Task | Binary classification; Class 0: human-generated images, Class 1: AI-generated images
Dataset | Custom dataset of interior design images categorized as authentic (human) or fake (AI)
Sample Limitation | 1,000 samples per class
Data Splitting | 80% training, 20% validation
Image Processing | Transformation pipeline: resize to 224×224 pixels, convert to tensor, normalize using the ImageNet mean and standard deviation
Optimizer | AdamW
Learning Rate | 5e-5
Batch Size | 16
Epochs | 10
Evaluation Metrics | Accuracy, precision, recall, and F1-score
Validation Strategy | Evaluation performed after each epoch

5 Results and analysis

For this task, we evaluate four Vision Transformer (ViT) models, ViT-B16, ViT-B32, ViT-L16, and ViT-L32, on distinguishing real interior design images from AI-generated ones. This section presents the validation results and analysis. The models were compared on essential metrics (loss, accuracy, F1 score, precision, recall, runtime, and computational efficiency) in Table 4 and Figures 3–6. The results quantify the tradeoff between accuracy and efficiency across the model configurations, with smaller patch sizes (16×16) achieving higher accuracy and F1 scores and larger patch sizes (32×32) higher computational throughput. The most appropriate model for this classification task is identified through a detailed comparison.

Table 4: Validation performance of the ViT models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) on AI-generated image classification.

Metric | ViT-B16 | ViT-B32 | ViT-L16 | ViT-L32
Accuracy | 96.25% | 80.00% | 96.25% | 81.25%
F1 Score | 0.9625 | 0.8000 | 0.9625 | 0.8118
Precision | 0.9637 | 0.8002 | 0.9637 | 0.8175
Recall | 0.9625 | 0.8000 | 0.9625 | 0.8125
Loss | 0.1154 | 0.4970 | 0.1206 | 0.4469
Runtime (s) | 15.7407 | 15.3469 | 18.4198 | 15.1096
Samples per Second | 10.165 | 10.426 | 8.686 | 10.589
Steps per Second | 0.635 | 0.652 | 0.543 | 0.662

Figure 3: ViT-B16 validation results over ten epochs, with a declining loss and accuracy, F1 score, precision, and recall converging around 96% by epoch 8.

Figure 6: Validation metrics of the ViT-L32 model over ten epochs; the loss declines to 0.44 by epoch 8, while accuracy, F1 score, precision, and recall stabilize around 81% by the final epoch.

The four ViT models (ViT-B16, ViT-B32, ViT-L16, and ViT-L32) were tested as detectors for determining whether an image is AI-generated or a traditional interior design.
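The per-class sample cap (1,000 images) and the 80/20 split from the experimental setup can be sketched as follows; the helper name and file names are hypothetical, not from the paper's code:

```python
import random

def cap_and_split(paths, label, cap=1000, train_frac=0.8, seed=42):
    """Cap samples for one class, then split into train/validation lists.

    `paths` is any list of image file paths for a single class; each returned
    entry is a (path, label) pair. Shuffling is seeded for reproducibility.
    """
    rng = random.Random(seed)
    sample = paths[:]
    rng.shuffle(sample)
    sample = sample[:cap]                       # enforce 1,000-per-class cap
    cut = int(len(sample) * train_frac)         # 80% training boundary
    train = [(p, label) for p in sample[:cut]]
    val = [(p, label) for p in sample[cut:]]
    return train, val

ai_paths = [f"ai_{i}.jpg" for i in range(1200)]  # hypothetical file names
train, val = cap_and_split(ai_paths, label=1)
print(len(train), len(val))  # 800 200
```

Running the same helper on the authentic class (label 0) and concatenating the lists would yield the balanced 80/20 dataset described in Table 3.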
Figure 4: Validation metrics of the ViT-B32 model over ten epochs; the loss has converged, and accuracy, F1 score, precision, and recall plateau at 80% around the final epoch.

Figure 5: ViT-L16 validation metrics over ten epochs, converging within three epochs, with a loss of around 0.12 and accuracy, F1 score, precision, and recall of around 96%.

The performance results (accuracy, F1 score, precision, recall, loss, runtime, and computational efficiency) of each examined model help identify which configurations are usable. A qualitative analysis follows, based on the results in Table 4 and the validation trends in Figures 3–6.

Models using 16×16 patches overwhelmingly outperformed those using 32×32 patches. ViT-B16 and ViT-L16 reached a validation accuracy of 96.25%, an F1 score of 0.9625, a precision of 0.9637, and a recall of 0.9625, demonstrating that these models can accurately discriminate between AI-generated and authentic images: the smaller patch size preserves finer details against which features can be extracted, enabling more accurate detection of subtle artefacts in AI-generated images. On the other hand, ViT-B32 and ViT-L32, using larger 32×32 patches, achieved significantly lower accuracy (80.00% and 81.25%) and F1 scores (0.8000 and 0.8118), suggesting that the coarser granularity limits their classification performance.

The validation graphs reveal interesting differences in how quickly and efficiently each model converges. By the end of epoch 8, ViT-B16 (Figure 3) steadily reduces its validation loss to 0.1154, with accuracy, precision, recall, and F1 score settling at around 96%, showing robust and efficient learning. ViT-L16 (Figure 5) converges even more quickly, reaching its validation loss (0.1206) as early as epoch 3; its performance metrics reach 96% by epoch three, affirming its capability to capture complex patterns in the data in fewer epochs, though at a higher computational price. ViT-B32 (Figure 4) and ViT-L32 (Figure 6) take longer to converge, with losses of 0.4970 and 0.4469 respectively; these models reach precision and recall of around 80–81%, whereas the smaller patch-size models plateau earlier and higher.

Conversely, the small patch-size models (ViT-B16, ViT-L16), although providing higher classification performance, incur higher computational costs. ViT-L16 has a runtime of 18.4198 seconds with the lowest throughput, 8.686 samples per second and 0.543 steps per second, reflecting its high computational complexity. ViT-B16 processes 10.165 samples per second at a runtime of 15.7407 seconds, a good balance between performance and efficiency. Among the large-patch models, ViT-L32 reaches the highest throughput (10.589 samples per second) at a runtime of 15.1096 seconds, making it the fastest, with ViT-B32 close behind at 10.426 samples per second. Nevertheless, the lower F1 scores and accuracy of ViT-B32 and ViT-L32 make them less appropriate for high-precision tasks; they take the path of efficiency over precision, being good candidates for real-time applications where speed is more important than classification accuracy. This comprehensive comparison makes clear the importance of matching the model configuration to the specific needs of the task.

The Area Under the Curve (AUC) is a standard evaluation measure for classification tasks that summarizes a model's performance across different thresholds in a single graph. It gives an overall score of model effectiveness by measuring the tradeoff between the True Positive Rate (sensitivity) and the False Positive Rate.
In the context of authenticity verification, we use evaluation accuracy as a proxy for AUC, allowing the performance of the models to be compared directly in Figure 7.

Further analysis of the precision and recall metrics highlights the tradeoffs between models. The precision and recall values of both ViT-B16 and ViT-L16 are in the 96% range, meaning they have a low risk of producing false positives and false negatives; they are ideal for tasks demanding high accuracy. ViT-B32 and ViT-L32, however, have precision and recall values in the 80–81% range. While their consistency is good, the lower precision implies less reliability in accurately identifying AI-generated images.

The validation metric trends provide additional clarity:
• ViT-B16 (Figure 3): shows steady improvement as epochs grow, with stable performance from epoch 8, an excellent balance between learning capacity and efficiency.
• ViT-L16 (Figure 5): converges remarkably fast, stabilizing by epoch 3, but at a higher computational cost, making it attractive when fast training is a top priority.
• ViT-B32 (Figure 4) and ViT-L32 (Figure 6): slow learning with limited ability to capture minute differences in the data; both exhibit gradual improvement over the ten epochs.

Figure 7: Comparison of AUC among the Vision Transformer models (ViT-B16, ViT-L16, ViT-L32, and ViT-B32) for detecting AI-generated images in interior design applications. The other models have comparable, though slightly worse, outcomes; ViT-B16 is the strongest.

This study's results show that Vision Transformers (ViTs) outperform conventional CNN-based methods in detecting AI-generated interior design imagery.
By learning with little ability to capture minute comparing the models, it is concluded that the best- differences in the data, all exhibit gradual performing model, ViT-B16, could perform at an accuracy improvement over ten epochs. of 96.25% and an F1 score of 0.9625, thus proving to The results reveal the tradeoff between accuracy and distinguish AI-generated images from the real ones. While computational efficiency. ViT-B16 is the most balanced these results are promising, it is necessary to contextualize model with reasonable throughput, runtime, and accuracy them by comparing them to prior AI-generated image (96.25%). Equally accurate, ViT-L16 is too detection in other fields, such as medical imaging, digital computationally intensive for use when accuracy isn't the art, and deepfake detection, as shown in Table 5. top concern. However, for those tasks that demand a higher level of computational efficiency (i.e., speed), ViT-B32 and ViT-L32 are favourable. Since the reduced accuracy Table 5: Contextual comparison of AI-generated image renders them unusable for high-precision calculation, the detection methods entire ViT family may be overkill for some applications. Domain Best Accur F1 Key ViT-B16 seems to be a better model for detecting AI- Model acy Sco Observati generated images in interior design than the rest, as its re ons tradeoff between accuracy and computational efficiency is Medical ViT-based 94.7% - ViTs better. While ViT-L16 has a higher computational cost, its Imaging Histopath effectivel fast convergence and high accuracy make it ideally suited ology y detect to scenarios seeking the highest precision, with a tradeoff Model synthetic 146 Informatica 49 (2025) 137–150 H. Wang (Arshed et medical ViT- 580 sec 10.2 GB 80.00% 0.8000 al., 2023) images B32 but ViT- 940 sec 16.8 GB 96.25% 0.9625 struggle L16 with ViT- 810 sec 14.3 GB 81.25% 0.8118 highly L32 high- resolution The ViT-B16 configuration achieves the best tradeoff textures. 
between accuracy and computational efficiency. ViT-L16 gets comparable accuracy but requires much more memory Digital GAN- 85– - CNNs are Art Based 92% effective and training time than Quilt. ViT-L16, ViT-B16, ViT-B32, Authentic CNN but prone and ViT-L32 require less computational load than larger ation Model to false patch sizes but offer lower accuracy. The results show that the most practical model for real-world AI-generated (Vivaldi positives image detection in interior design is ViT-B16; they are & Sutedja, due to accurate and come with reasonable training time and 2024) intricate artistic memory usage. patterns. We also performed additional experimental evaluations, using an imbalanced dataset and noisy inputs, Deepfake ViT- - 0.95 ViTs to test our models' robustness. In both tests, real-world Detection Based excel at samples are simulated, and ViTs are tested to see their Deepfake capturing stability in different data conditions. We had changed the Detector subtle class distributions (70% of AI-generated images, 30% (Zhao et inconsiste authentic images). ViT-B16 performance dropped slightly al., 2023) ncies in (Accuracy: 94.2%, F1 Score: 0.945). The model was AI- stable; thus, it was resilient to imbalanced data. We generated degraded the inputs using Gaussian noise (σ=0.05) and human random occlusions. However, ViT-B16 achieved high faces. accuracy (93.5%) while ViT-B32 and ViT-L32 decreased Interior ViT-B16 96.25 0.96 ViT-B16 below 75%. Self-attention in ViTs helps retain essential Design % 25 outperfor features; however, larger patch sizes suffer from losing (Our ms fine details in noisy conditions. Inference on challenging Study) existing conditions confirms that ViT-B16 is the most robust methods model. Further work will be pursued to enhance the model by resilience with adversarial training techniques. 
preserving fine- grained 6 Discussion textures Results from the experiment confirm the incredible and performance of Vision Transformers (ViTs) in capturing distinguishing AI-generated interior design images. For long- smaller patch sizes such as ViT-B16 and ViT-L16, we range achieve an impressive accuracy of 96.25% in identifying dependen subtle artefacts. This makes them an ideal choice for high- cies. precision authenticity verification. Similarly, Table 5 compares training time, memory usage, configurations with larger patch sizes, such as ViT-B32 and model performance to ensure the computational and ViT-L32, optimize for speed at the expense of some efficiency of different ViT configurations. The analysis accuracy. Real-time applications, or environments with must identify the most reasonable model for detecting AI- resource constraints, apply generously to these generated images in interior design concerning configurations. Our findings demonstrate that ViTs can be computation cost and accuracy. scalable for other creative fields, such as architecture and visual art. Future work will concentrate on designing Table 5: Computational dfficiency of ViT configurations hybrid architectures for optimal precision and efficiency. Model Training Memory Accuracy F1 This work has shown that ViTs can be a powerful tool Time Usage (%) Score for distinguishing AI-generated from human-generated (per (GB) images in interior design. Its results highlight the promise epoch, and pain of using them in this way, which can be extended sec) to many other application areas. Across four ViT ViT- 720 sec 12.5 GB 96.25% 0.9625 configurations (ViT-B16, ViT-B32, ViT-L16, and ViT- B16 L32), we summarize the findings regarding the tradeoffs Vision Transformer-Based Framework for AI-Generated Image… Informatica 49 (2025) 137–150 147 between model accuracy, computational efficiency, and lighting conditions and, thus, are better suited for more the nature of data representation. 
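The robustness evaluation reported above (a 70/30 class imbalance, Gaussian noise with σ = 0.05, and random occlusions) can be sketched with simple NumPy perturbation utilities. This is a generic illustration rather than the authors' code; the function names, the 8×8 occlusion patch, and the [0, 1] pixel range are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(images, sigma=0.05):
    """Corrupt images with pixel values in [0, 1] using Gaussian noise (the sigma=0.05 test)."""
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_random_occlusion(images, patch=8):
    """Zero out one random patch x patch square per image to simulate occlusion."""
    out = images.copy()
    n, h, w = images.shape[:3]
    for i in range(n):
        y = int(rng.integers(0, h - patch))
        x = int(rng.integers(0, w - patch))
        out[i, y:y + patch, x:x + patch] = 0.0
    return out

def make_imbalanced_split(x, y, ai_fraction=0.7):
    """Resample so that ai_fraction of the evaluation set is AI-generated (label 1)."""
    ai_idx = np.flatnonzero(y == 1)
    real_idx = np.flatnonzero(y == 0)
    n_ai = int(len(real_idx) * ai_fraction / (1.0 - ai_fraction))
    keep = np.concatenate([rng.choice(ai_idx, n_ai, replace=True), real_idx])
    return x[keep], y[keep]
```

A trained classifier would then simply be re-evaluated on `add_gaussian_noise(x_test)` or on the resampled split to reproduce this kind of stress test.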
Using the smaller patch size (16×16), ViT-B16 and ViT-L16 demonstrate superior performance across all metrics (accuracy, precision, recall, and F1 score), reaching values close to 96.25%. That is to say, those models are more capable of discerning the relatively subtle inconsistencies and artefacts typical of artificial images that are indistinguishable from real ones to the human eye. ViTs display robust ability in this binary classification problem by extracting detailed spatial and contextual features.

However, the computational demands of the ViTs become a significant consideration. ViT-L16 converged faster (within three epochs) than ViT-B16 and achieved high accuracy, but its computational overheads in runtime and throughput make it less practical for resource-constrained environments. ViT-B16 achieved comparable accuracy at relatively lower computational cost. For applications such as interactive design tools or automated verification systems that require real-time processing, the efficiency gains of models like ViT-B32 may be preferable, even though they are less accurate.

These results matter for real-world deployment in interior design and related fields. Integrating high-accuracy models such as ViT-B16 into quality assurance pipelines can assure the authenticity of design assets, verifying usage and preventing misrepresentation. The versatility of ViTs in processing diverse datasets shows that they adapt to diverse design styles and lighting conditions and are thus well suited to more generalized AI detection frameworks.

The observed trade-offs between accuracy and efficiency indicate, however, that task-specific model selection is critical. High-precision applications may benefit from smaller patch sizes and larger models; conversely, computationally efficient configurations may prove preferable where scalability and speed are paramount, as in large-scale design database audits. By demonstrating the effectiveness of ViTs in differentiating AI-generated from authentic images in interior design, this study lays the groundwork for developing more sophisticated AI authenticity verification algorithms. Through model configurations tailored to particular use cases, the trade-offs between accuracy and efficiency can be worked through effectively, enabling broad use in the creative domain and beyond.

Current AI-generated image detection techniques mainly depend on CNN-based models whose local receptive fields extract hierarchical spatial features. While such CNNs can identify the artefacts of traditional GAN, deepfake, or low-quality synthetic images, those artefacts are largely absent from high-resolution, photo-realistic synthetic interior design images, so the CNNs cannot find them. ViTs like ViT-B16, on the other hand, use self-attention mechanisms that operate across the entire image to find inconsistencies that CNNs would miss. The comparative performance of ViT-B16 against methods reported in previous literature is presented in Table 6.
Table 6: Comparative performance analysis of ViT-B16 vs CNN-based methods

| Model | Architecture | Accuracy | F1 Score | Key Strengths | Limitations |
|---|---|---|---|---|---|
| CNN-based methods | Convolutional feature extraction | 85–92% | 0.85–0.91 | Strong spatial feature learning, efficient on small-scale datasets | Struggles with long-range dependencies, poor generalization to high-quality AI-generated images |
| Hybrid CNN-Transformer | CNN for local features, Transformer for long-range context | 89–94% | 0.89–0.94 | Balances CNN efficiency with the Transformer's self-attention | Computationally expensive, complex training process |
| ViT-B16 (Our Model) | Vision Transformer with small patch size (16×16) | 96.25% | 0.9625 | Captures both local and global dependencies with high accuracy on high-quality AI images | Requires significant pretraining and higher computational resources |

We also observe that ViT performance depends on patch size. Our results show that models with smaller patch sizes, like ViT-B16 and ViT-L16, achieve significantly better accuracy than models with larger patch sizes (ViT-B32 and ViT-L32): accuracy drops to 80.00% for ViT-B32 and to 81.25% for ViT-L32, leaving these configurations considerably behind their small-patch counterparts. The discrepancy arises because smaller patches preserve fine-grained details, whereas tokenizing an image into larger patches can lose information through the aggregation of critical spatial cues such as subtle shading, textural variations, and delicate contours. Interior design images feature intricate patterns and highly detailed material textures, for which feature extraction is better maintained with small patch sizes.
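The token-count arithmetic behind this effect is simple and generic to ViTs (this is not code from the paper; the standard 224×224 input and a prepended class token are assumed):

```python
def vit_token_count(image_size=224, patch_size=16):
    """Patch tokens a ViT sees: (H/P) * (W/P), plus one prepended class token."""
    per_side = image_size // patch_size
    return per_side * per_side + 1

def attention_cost_ratio(small_patch=16, large_patch=32, image_size=224):
    """Self-attention cost scales with the square of the token count, so
    halving the patch side costs roughly 16x more attention FLOPs."""
    n_small = vit_token_count(image_size, small_patch)
    n_large = vit_token_count(image_size, large_patch)
    return (n_small ** 2) / (n_large ** 2)

# vit_token_count(224, 16) -> 197 tokens; vit_token_count(224, 32) -> 50 tokens.
```

Halving the patch side thus roughly quadruples the number of tokens the self-attention module receives, which is consistent with the accuracy and cost differences reported above.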
Furthermore, with larger patches the self-attention module receives fewer tokens to process, which can hamper the model in learning the distinction between authentic and AI-generated images. Smaller patch sizes lead to denser tokenization, so the ViT model retains more information and can better distinguish real-world from AI-generated designs.

The results show that ViTs outperform CNN-based models in detecting AI-generated images; however, several limitations should be considered. Even though the data is diverse, latent biases may remain in lighting and styles: through specific aesthetic design preferences, the model may learn to detect style incoherencies rather than actual AI artefacts. Future work will therefore need cross-domain validation on datasets generated by different AI models (e.g., GANs vs. diffusion models) to establish generalization. Moreover, although ViT-B16 reaches high accuracy, it still consumes ample computational resources (12.5 GB of memory per epoch); ViT-based detection systems deployed on edge devices or in real-time applications may become feasible with model compression techniques such as knowledge distillation or quantization. Finally, there is potential evasion by advanced AI models: as AI-generated images become more sophisticated, detection models must evolve, since AI images could be crafted with adversarial attacks to avoid detection, and model training would need continuous updating. These limitations point to future improvements towards scalable and adaptive AI-generated image detection.

7 Conclusion

This study shows the viability of Vision Transformers (ViTs) for differentiating AI-generated images from human-made designs in interior design. By fine-tuning multiple ViT configurations (ViT-B16, ViT-B32, ViT-L16, ViT-L32), we find a clear trade-off between accuracy and computational efficiency. Classifiers using smaller patches (patch size: 16×16) performed better, with ViT-B16 achieving 96.25% accuracy and an F1 score of 0.9625. The key outcome is that delicate feature extraction improves AI image detection, and ViT-B16 is the most appropriate model for real-world applications. Larger patch-size models (such as 32×32), while computationally cheaper, perform worse and are better suited to lower-precision applications. Through these findings on selecting models according to task requirements and on balancing accuracy, efficiency, and resource constraints, this research contributes to AI authenticity verification in interior design using transformer-based image classification. Future work will consider improving computational efficiency, enlarging the dataset with more diverse AI-generated photos, and combining convolutional and transformer-based models. Finally, we will investigate adversarial robustness to improve the model's resilience against evolving generative techniques. Such advances will further bolster AI image detection as it is deployed for digital content verification.

References

[1] J. Hutson, J. Lively, B. Robertson, P. Cotroneo, and M. Lang, Creative Convergence: The AI Renaissance in Art and Design. Springer Nature, pp. 1–19, Nov. 2023. https://doi.org/10.1007/978-3-031-45127-0_1
[2] D. Saxena and J. Cao, "Generative adversarial networks (GANs) challenges, solutions, and future directions," ACM Computing Surveys (CSUR), vol. 54, no. 3, pp. 1–42, 2021. https://doi.org/10.1145/3446374
[3] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, "Diffusion models in vision: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10850–10869, 2023. https://doi.org/10.1109/tpami.2023.3261988
[4] S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, "Transformers in vision: A survey," ACM Computing Surveys (CSUR), vol. 54, no. 10s, pp. 1–41, 2022. https://doi.org/10.1145/3505244
[5] N. Anantrasirichai, F. Zhang, and D. Bull, "Artificial Intelligence in Creative Industries: Advances Prior to 2025," arXiv preprint arXiv:2501.02725, 2025. https://doi.org/10.1007/s10462-021-10039-7
[6] M. A. Moharram and D. M. Sundaram, "Land use and land cover classification with hyperspectral data: A comprehensive review of methods, challenges and future directions," Neurocomputing, vol. 536, pp. 90–113, 2023. https://doi.org/10.1016/j.neucom.2023.03.025
[7] K. Han et al., "A survey on vision transformer," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 87–110, 2022.
[8] G. Bansal, A. Nawal, V. Chamola, and N. Herencsar, "Revolutionizing visuals: the role of generative AI in modern image generation," ACM Transactions on Multimedia Computing, Communications and Applications, vol. 20, no. 11, pp. 1–22, 2024. https://doi.org/10.1109/tpami.2022.3152247
[9] A. Kulkarni, A. Shivananda, A. Kulkarni, and D. Gudivada, "Diffusion Model and Generative AI for Images," in Applied Generative AI for Beginners: Practical Knowledge on Diffusion Models, ChatGPT, and Other LLMs. Springer, 2023, pp. 155–177. https://doi.org/10.1007/978-1-4842-9994-4_8
[10] S. Bengesi, H. El-Sayed, M. K. Sarker, Y. Houkpati, J. Irungu, and T. Oladunni, "Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers," IEEE Access, vol. 12, pp. 69812–69837, 2024. https://doi.org/10.1109/access.2024.3397775
[11] D. Gragnaniello, D. Cozzolino, F. Marra, G. Poggi, and L. Verdoliva, "Are GAN generated images easy to detect? A critical analysis of the state-of-the-art," in 2021 IEEE International Conference on Multimedia and Expo (ICME), 2021, pp. 1–6. https://doi.org/10.1109/icme51207.2021.9428429
[12] T. Arora and R. Soni, "A review of techniques to detect the GAN-generated fake images," in Generative Adversarial Networks for Image-to-Image Translation, pp. 125–159, 2021. https://doi.org/10.1016/b978-0-12-823519-5.00004-x
[13] A. Khan et al., "A survey of the vision transformers and their CNN-transformer based variants," Artificial Intelligence Review, vol. 56, no. Suppl 3, pp. 2917–2970, 2023. https://doi.org/10.1007/s10462-023-10595-0
[14] A. Rahali and M. A. Akhloufi, "End-to-end transformer-based models in textual-based NLP," AI, vol. 4, no. 1, pp. 54–110, 2023. https://doi.org/10.3390/ai4010004
[15] H. Bougueffa et al., "Advances in AI-Generated Images and Videos," International Journal of Interactive Multimedia & Artificial Intelligence, vol. 9, no. 1, 2024. https://doi.org/10.9781/ijimai.2024.11.003
[16] A. S. Paladugu, A. Deodeshmukh, A. R. Shekatkar, I. Kandasamy, and V. WB, "Detection of Artificially Generated Images Using Shifted Window Transformer with Explainable AI," Available at SSRN 5025934. https://doi.org/10.2139/ssrn.5025934
[17] L. Yin et al., "Convolution-Transformer for Image Feature Extraction," CMES-Computer Modeling in Engineering & Sciences, vol. 141, no. 1, 2024. https://doi.org/10.32604/cmes.2024.051083
[18] H. Tang, D. Liu, and C. Shen, "Data-efficient multi-scale fusion vision transformer," Pattern Recognition, vol. 161, p. 111305, 2025. https://doi.org/10.1016/j.patcog.2024.111305
[19] W. Zheng, S. Lu, Y. Yang, Z. Yin, and L. Yin, "Lightweight transformer image feature extraction network," PeerJ Computer Science, vol. 10, p. e1755, 2024. https://doi.org/10.7717/peerj-cs.1755
[20] L. Scabini, A. Sacilotti, K. M. Zielinski, L. C. Ribas, B. De Baets, and O. M. Bruno, "A Comparative Survey of Vision Transformers for Feature Extraction in Texture Analysis," arXiv preprint arXiv:2406.06136, 2024.
[21] D. Konstantinidis, I. Papastratis, K. Dimitropoulos, and P. Daras, "Multi-manifold attention for vision transformers," IEEE Access, vol. 11, pp. 123433–123444, 2023. https://doi.org/10.1109/access.2023.3329952
[22] J. Maurício, I. Domingues, and J. Bernardino, "Comparing vision transformers and convolutional neural networks for image classification: A literature review," Applied Sciences, vol. 13, no. 9, p. 5521, 2023. https://doi.org/10.3390/app13095521
[23] T. Walczyna, D. Jankowski, and Z. Piotrowski, "Enhancing Anomaly Detection Through Latent Space Manipulation in Autoencoders: A Comparative Analysis," Applied Sciences, vol. 15, no. 1, p. 286, 2024. https://doi.org/10.3390/app15010286
[24] D. H. Hagos, R. Battle, and D. B. Rawat, "Recent advances in generative AI and large language models: Current status, challenges, and perspectives," IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 5873–5893, Dec. 2024. https://doi.org/10.1109/tai.2024.3444742
[25] S. P. J. Christydass, N. Nurhayati, and S. Kannadhasan, Hybrid and Advanced Technologies: Proceedings of the International Conference on Hybrid and Advanced Technologies (ICHAT 2024), April 26–28, 2024, Ongole, Andhra Pradesh, India (Volume 2). CRC Press, 2025. https://doi.org/10.1201/9781003559115
[26] M. M. Meshry, "Neural rendering techniques for photo-realistic image generation and novel view synthesis," University of Maryland, College Park, 2022.
[27] S. Susan and A. Kumar, "The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art," Engineering Reports, vol. 3, no. 4, p. e12298, 2021. https://doi.org/10.1002/eng2.12298
[28] X. Jiang and Z. Ge, "Data augmentation classifier for imbalanced fault classification," IEEE Transactions on Automation Science and Engineering, vol. 18, no. 3, pp. 1206–1217, 2020. https://doi.org/10.1109/tase.2020.2998467
[29] O. Rainio, J. Teuho, and R. Klén, "Evaluation metrics and statistical tests for machine learning," Scientific Reports, vol. 14, no. 1, p. 6086, 2024. https://doi.org/10.1038/s41598-024-56706-x
[30] P. Fergus and C. Chalmers, "Performance evaluation metrics," in Applied Deep Learning: Tools, Techniques, and Implementation. Springer, 2022, pp. 115–138. https://doi.org/10.1007/978-3-031-04420-5_5
https://doi.org/10.31449/inf.v49i16.7839 Informatica 49 (2025) 151–170

Efficient Logistics Path Optimization and Scheduling Using Deep Reinforcement Learning and Convolutional Neural Networks

Yan Yang1,2,*, Kang Wang1,2
1School of Economics and Management, Jiaozuo University, Jiaozuo 454000, Henan, China
2Graduate School, University of the East, Manila 0900, Philippines
E-mail: yangyan_edu@outlook.com
*Corresponding author

Keywords: CNN, DRL, logistics path optimization, real-time scheduling, robustness scoring

Received: December 17, 2024

With the rapid development of e-commerce and online shopping, the logistics industry is facing unprecedented challenges. Traditional logistics path-planning methods, such as SPA, HA, and GA, struggle to cope with the complex and ever-changing logistics environment. To address this issue, this study proposes an innovative model that combines deep reinforcement learning (DRL) with a convolutional neural network (CNN) to achieve efficient logistics path optimization. A detailed analysis and pre-processing of two public datasets, the City Logistics Dataset (CLDS) and the Traffic Status Dataset (TSDS), were carried out to construct a model capable of effectively handling diverse logistics environments. Six baseline methods were selected for comparison: the classic shortest path algorithm (SPA), a heuristic algorithm (HA), a genetic algorithm (GA), a rule-based method (RBM), a traditional deep reinforcement learning method (TDRM), and a state-of-the-art deep learning method (ADLM). The experimental results indicate that the proposed model performs excellently across environments. In suburban areas, it achieves a path length of 180 kilometers, a completion time of 120 minutes, a punctuality rate of 92%, and a dispatch success rate of 95%. In urban settings, the path length is 200 kilometers, the completion time is 150 minutes, the punctuality rate is 90%, and the dispatch success rate is 93%. On highways, it reaches a path length of 170 kilometers, a completion time of 110 minutes, a punctuality rate of 93%, and a dispatch success rate of 95%. Compared with the baseline methods, the model shows significant improvements in key metrics such as path length, completion time, punctuality, and dispatch success rate. It also outperforms them in computation time and robustness scores, demonstrating great potential for practical applications.

Povzetek: Opisan je izvirni model za optimizacijo logističnih poti in sprotno razporejanje z združitvijo globokega utrjevalnega učenja (DRL) in konvolucijskih nevronskih mrež (CNN).

1 Introduction

With the advancement of global economic integration and the rapid development of e-commerce, the logistics industry is facing unprecedented challenges and opportunities. Efficient, fast and accurate delivery of goods has become one of the core elements of corporate competition. However, finding the optimal delivery path in a complex geographical environment, and achieving instant scheduling under dynamically changing conditions, has always been a difficult problem for logistics companies.
Although traditional mathematical programming-based methods perform well under static conditions, they have obvious limitations in dealing with real-time changes in traffic conditions and emergencies [1]. It is therefore particularly important to explore a new logistics path optimization and real-time scheduling solution that can adapt to complex environments and has self-learning capabilities [2].

Logistics path optimization is a core link in logistics management and is crucial to improving logistics service quality and reducing operating costs. Against the backdrop of the rapid development of artificial intelligence technology, scholars at home and abroad are actively exploring the application of AI to logistics path optimization, aiming to improve logistics efficiency through intelligent algorithms. Although traditional methods such as linear programming can provide effective solutions, they are powerless in the face of large-scale dynamic problems [3, 4]. In contrast, AI techniques such as genetic algorithms (GA) and ant colony algorithms (ACA) have shown stronger exploration capabilities and adaptability, especially in solving the traveling salesman problem (TSP) [5]. In addition, advances in deep learning, especially long short-term memory networks (LSTM), make it possible to predict traffic conditions and realize dynamic path planning, while reinforcement learning (RL) enables intelligent agents to make optimal decisions in a constantly changing environment by simulating the learning process. These technologies have been widely used in scenarios such as urban distribution, cross-border logistics, and cold chain logistics, helping to optimize delivery routes, predict customs clearance times, and monitor temperature changes [6, 7]. Despite this, the application of AI in logistics path optimization still faces challenges in data privacy protection, real-time performance, and robustness. With the advancement of technology and the evolution of social needs, more innovative solutions are expected to emerge, continuously promoting the intelligent development of the logistics industry [8].

In view of this background, this study aims to explore how to use neural network technology to improve existing logistics path optimization algorithms and to propose a set of real-time scheduling strategies suitable for dynamic environments. Specifically, we first analyze the main problems in logistics distribution and their causes, and then introduce the basic principles of neural networks and their advantages in solving these problems. We then design and implement a neural network-based path optimization model that can respond quickly to real-time data input and adjust the distribution plan. Finally, we verify the effectiveness of the model through experiments and explore its applicability and limitations in different application scenarios [9, 10].

Deep reinforcement learning (DRL) can continuously optimize strategies by interacting with the environment to cope with real-time changes; convolutional neural networks (CNN) can extract effective features from complex geographic and traffic data. Combining the two allows the model to better perceive real-time information and make reasonable decisions quickly. The use of such artificial intelligence methods is therefore key to solving real-time logistics problems: they make up for the shortcomings of traditional methods and improve the efficiency and flexibility of logistics scheduling.

Traditional mathematical programming, on the other hand, places extremely high requirements on data integrity and accuracy. Logistics data often suffer from missing values, errors, or outliers. For example, the cargo weights and order times in logistics distribution records may deviate because of recording errors or equipment failures, and the traffic volumes and average speeds in traffic status data may contain measurement errors. Traditional mathematical programming lacks effective means of handling such incomplete or inaccurate data, and using it directly may significantly reduce the reliability of model results. Moreover, when faced with large-scale, high-dimensional data, the computational complexity of traditional mathematical programming methods grows dramatically, solution times lengthen significantly, and some problems become intractable altogether, making it difficult to meet the needs of real-time logistics scheduling.

The novelty of combining DRL and CNN for logistics path optimization and real-time scheduling lies in their complementary advantages. Traditional methods find it difficult to handle both geospatial feature extraction and dynamic strategy adjustment. In this study, CNN's powerful spatial feature extraction accurately captures key information in the logistics geographical environment, such as the distribution of delivery points and the topology of the traffic network, while DRL dynamically adjusts strategies based on these features to adapt to an ever-changing logistics environment of real-time traffic conditions, order changes, and the like. Although similar combinations appear in other papers, this study focuses on complex logistics scenarios and integrates the two approaches deeply to achieve more efficient and intelligent path planning and scheduling decisions; this is its distinct contribution.

This study focuses on the key area of logistics path optimization and real-time scheduling.
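As a deliberately simplified, self-contained illustration of this division of labour (this is not the paper's model: the grid world, reward shape, and all hyperparameters below are our own assumptions), a CNN-style local averaging filter can summarize a congestion map while tabular Q-learning learns a routing policy over it:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv_features(congestion, kernel=3):
    """CNN-style feature map: mean congestion in each cell's kernel x kernel
    neighbourhood (a hand-rolled stand-in for learned convolutional features)."""
    n = congestion.shape[0]
    pad = kernel // 2
    padded = np.pad(congestion, pad, mode="edge")
    feat = np.empty_like(congestion)
    for i in range(n):
        for j in range(n):
            feat[i, j] = padded[i:i + kernel, j:j + kernel].mean()
    return feat

def train_route_policy(congestion, episodes=2000, alpha=0.5, gamma=0.95, eps=0.2):
    """Tabular Q-learning on an n x n grid: the agent moves from depot (0, 0)
    to the goal corner; each step costs 1 plus the local congestion."""
    n = congestion.shape[0]
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    q = np.zeros((n, n, 4))
    goal = (n - 1, n - 1)
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(4 * n * n):  # cap episode length
            a = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(q[s]))
            ni, nj = s[0] + moves[a][0], s[1] + moves[a][1]
            if not (0 <= ni < n and 0 <= nj < n):
                q[s][a] += alpha * (-5.0 - q[s][a])  # penalise leaving the grid
                continue
            reward = -1.0 - congestion[ni, nj]
            target = reward if (ni, nj) == goal else reward + gamma * q[ni, nj].max()
            q[s][a] += alpha * (target - q[s][a])
            s = (ni, nj)
            if s == goal:
                break
    return q

def greedy_path(q, n):
    """Roll out the greedy policy from the depot, skipping off-grid moves."""
    moves = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    s, path = (0, 0), [(0, 0)]
    for _ in range(4 * n * n):
        for a in np.argsort(q[s])[::-1]:  # best valid action first
            ni, nj = s[0] + moves[a][0], s[1] + moves[a][1]
            if 0 <= ni < n and 0 <= nj < n:
                s = (ni, nj)
                break
        path.append(s)
        if s == (n - 1, n - 1):
            break
    return path
```

In this sketch, `train_route_policy(conv_features(raw_map))` would plan against the smoothed congestion features; in the actual architecture, a trained CNN supplies the spatial features that the DRL policy network consumes.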
At present, traditional logistics scheduling methods have exposed many shortcomings when dealing with complex and changing logistics environments, and they struggle to meet the needs of efficient and accurate distribution. On this basis, we put forward the core research question: how can advanced neural network technology be used to deeply innovate existing logistics path optimization algorithms so as to achieve efficient planning and real-time dynamic scheduling of logistics paths?

Around this question, we put forward the following specific hypothesis: a model that innovatively integrates DRL and CNN can fully exploit the advantages of both and effectively handle complex geospatial information and dynamically changing logistics environments. Compared with traditional methods, this model is expected to shorten logistics distribution paths by an average of about 20% across scenarios; shorten delivery completion times by an average of 30 minutes; raise the punctuality rate to more than 95%; and increase the scheduling success rate to 92%. In terms of computing efficiency, model calculation time is to be kept within 15 seconds to ensure real-time performance, and, in the face of complex environmental disturbances, the robustness score is to be maintained above 8.5 points (out of 10), comprehensively improving the performance of the logistics scheduling system and providing strong technical support for the intelligent development of the logistics industry.

2 Theoretical basis and literature review

2.1 Basic concepts of logistics path optimization

Research in the field of logistics continues to develop and innovate, and many scholars have conducted in-depth discussions from different angles. Alkan and Kahraman (2023) used a multi-expert Fermatean fuzzy analytic hierarchy process in [9] to prioritize supply chain digital transformation strategies, providing a decision-making basis for the digital development of the logistics supply chain and helping logistics companies grasp key strategies and optimize operational processes in the digital wave. Lee et al. (2019) proposed an endosymbiotic evolutionary algorithm in [10] to solve an integrated model of vehicle routing and truck scheduling with a cross-dock system, providing new ideas and methods for path planning and scheduling in logistics distribution, which is of great significance for improving logistics efficiency and reducing costs.

Logistics path optimization refers to finding the best path from a starting point to an end point under a series of constraints so as to minimize transportation cost, time, or other specified objectives. This usually involves multi-objective optimization, which may include minimizing total mileage, reducing fuel consumption, and shortening delivery time. Logistics path optimization problems can be theoretically classified as combinatorial optimization problems; typical forms include the traveling salesman problem (TSP) and the vehicle routing problem (VRP) [11].
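A nearest-neighbour construction for the TSP, a textbook heuristic rather than an algorithm from the paper, shows the flavour of such approximate methods: it runs in O(n²) time and returns a usable, though generally sub-optimal, tour:

```python
import math

def nearest_neighbour_tour(points, start=0):
    """Greedy TSP heuristic: repeatedly visit the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(points, tour):
    """Total cycle length, returning to the starting city."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))
```

Meta-heuristics such as genetic or ant colony algorithms typically start from tours like this and improve them through iterative search.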
These problems become extremely demand fluctuations, and help companies prepare in complex when they are large in scale, and it is difficult to advance [14, 15]. Reinforcement learning (RL) can find the global optimal solution. Therefore, researchers dynamically adjust strategies based on historical have developed a variety of heuristic algorithms and meta- behaviors and reward signals to optimize the vehicle’s heuristic algorithms, such as genetic algorithms, simulated delivery path. annealing algorithms, ant colony algorithms, etc., to In recent years, researchers have also explored how to approximate solutions to such problems. These algorithms combine neural networks with other algorithms to solve try to find a satisfactory solution rather than an absolute more complex logistics problems. For example, some optimal solution through iterative search [12]. researchers combined genetic algorithms with neural Logistics path optimization is not limited to networks to form a hybrid model to solve multi-objective determining a single path, but also includes issues such as vehicle routing problems. The results showed that this multi-path selection and multi-vehicle scheduling. With method achieved a good balance between complexity and the growth of logistics business, how to efficiently allocate solution quality. In addition, graph neural networks resources in a large-scale network has become one of the (GNNs) are also used to analyze the topological structure key challenges. In order to meet this challenge, of logistics networks, predict traffic flow by learning the researchers have begun to explore new solutions, such as relationship between nodes, and then guide dynamic path introducing machine learning technology into path planning. planning, using historical data to predict future Neural networks are widely applied in multiple transportation demand, and thus Equationting more aspects of logistics management, including route reasonable distribution plans in advance. 
In addition, with optimization, demand forecasting, and inventory the development of Internet of Things (IoT) technology, management. Regarding route optimization, relevant the large amount of real-time data generated in logistics introductions have been provided. However, in the fields systems has also provided new possibilities for path of demand forecasting and inventory management, neural optimization [13]. networks also play significant roles. In demand Logistics path optimization refers to finding the best forecasting, recurrent neural networks (RNNs) or their path from the starting point to the destination under a variant, long - short - term memory networks (LSTMs), series of constraints, aiming to minimize transportation can be used to analyze historical order data. These costs, time, or other specific objectives. Previous networks can capture long - term dependencies in time - descriptions have mostly focused on time as a static series data to predict future order demands at different constraint. However, in real - world logistics scenarios, time intervals. For example, by analyzing sales data from real - time responsiveness plays a crucial role. The the past year, the order volume during upcoming holidays logistics environment is in a state of dynamic change. can be predicted to make advance inventory preparations Traffic conditions can change rapidly, such as sudden and logistics plans. In inventory management, traffic accidents or temporary road closures, which can autoencoders and other neural network structures can be render the originally planned path no longer optimal. Real used to detect inventory anomalies. By learning the data - time responsiveness is not just about time consideration features of normal inventory states, an alarm can be issued but also about the timely response to dynamic elements in in a timely manner when there are abnormal fluctuations the logistics environment. For example, by obtaining real in inventory levels. 
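The forecasting and anomaly-detection roles described above can be made concrete with simple statistical baselines. The sketch below is illustrative only: a moving average stands in for the LSTM forecaster and a z-score test stands in for the autoencoder anomaly detector (both neural variants require a deep-learning framework), and all data are hypothetical.

```python
import statistics

def moving_average_forecast(history, window=3):
    """Baseline demand forecast: mean of the last `window` periods
    (a simple statistical stand-in for the LSTM forecaster)."""
    return sum(history[-window:]) / window

def inventory_alarm(levels, new_level, threshold=3.0):
    """Flag an abnormal inventory fluctuation when the new reading deviates
    from the historical mean by more than `threshold` standard deviations
    (a simple statistical stand-in for the autoencoder detector)."""
    mu = statistics.mean(levels)
    sigma = statistics.pstdev(levels)
    return abs(new_level - mu) > threshold * sigma
```

For example, with four periods of demand `[100, 110, 120, 130]` the baseline forecast for the next period is the mean of the last three values; a sudden inventory reading far outside the historical spread triggers the alarm.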
In the logistics path optimization use-case of this study, CNNs can be used to extract features from geospatial data to help identify the logistics characteristics of different regions; LSTMs can combine time-series traffic data to predict the impact of future traffic conditions on paths; and reinforcement learning can be used to dynamically select the optimal path under different environmental states. The applications of these neural networks are therefore closely related to the research use-case.

154 Informatica 49 (2025) 151–170 Y. Yang et al.

2.3 Development history of real-time scheduling technology

Real-time scheduling technology refers to the ability to respond immediately when an event occurs. In the field of logistics, it is crucial for dealing with unforeseen situations, such as sudden traffic jams and road closures caused by weather changes. Early real-time scheduling systems mainly relied on simple rules and expert systems, but with the advancement of information technology, more data-driven methods have emerged. For example, technology based on model predictive control (MPC) can optimize operations over a short future period to ensure that the system is always in the best operating state [16].

With the enhancement of computing power and the development of big data technology, modern real-time scheduling systems are no longer limited to simple rule matching; they can predict future state changes by learning patterns in historical data and adjust scheduling strategies accordingly. For example, Wu et al. [17] used deep reinforcement learning to implement real-time scheduling that adaptively adjusts the paths of mobile robots in dynamic environments, thereby improving the flexibility and responsiveness of the system. In addition, cloud computing and edge computing technologies have made real-time scheduling more feasible, because they provide powerful computing resources to process massive data while keeping delays minimal.

This paper introduces real-time scheduling as a strategy for dealing with "unforeseen situations" such as traffic jams. However, real-time scheduling should not exist in isolation; it should be an important part of the overall logistics path optimization problem. In actual logistics operations, real-time scheduling and path optimization influence and promote each other. When unforeseen situations such as traffic jams are encountered, real-time scheduling needs to adjust dynamically on the foundation of path optimization: if the originally planned path cannot reach the destination on time because of a traffic jam, the real-time scheduling system should re-plan the optimal path according to the current traffic conditions and the remaining order information. At the same time, path optimization should take the possibility of real-time scheduling into account and reserve a certain degree of flexibility in path planning to enable rapid adjustment in emergencies.

Model predictive control (MPC) predicts the future behavior of a system based on a model and optimizes control strategies accordingly. In this study, MPC is closely related to real-time scheduling and path optimization: it can use real-time traffic data and the logistics system model to predict future traffic conditions and changes in logistics demand, and thereby adjust path planning and scheduling strategies in advance. For example, if MPC predicts that a certain road will experience severe congestion in the next hour, the system can plan a detour route in advance, avoiding getting vehicles stuck in the jam and improving the efficiency of logistics transportation.

2.4 Review and analysis of related research literature

In recent years, many studies have applied advanced computing technologies to logistics path optimization and real-time scheduling. For example, Ren et al. [18] proposed a hybrid method combining deep reinforcement learning and a genetic algorithm to solve the multi-objective vehicle routing problem; experiments show that this method not only handles multiple optimization objectives effectively but also achieves a good balance between complexity and solution quality. Yang et al. [19] used a graph neural network (GNN) to analyze the structure of an urban traffic network and proposed a dynamic path planning framework that continuously updates the optimal path under changing traffic conditions; studies show significant improvements in path update speed and path quality compared with traditional algorithms.

Although existing research has made significant progress, some challenges remain. The first is data privacy and security: since a large amount of sensitive information is involved in logistics systems, ensuring the secure transmission and storage of data is an important task. The second is the interpretability of the algorithms: although deep learning models perform well in many cases, they are often black-box models and lack transparency, which limits their application in certain industries (such as healthcare) [20, 21]. The third is the adoption of the technology: although academia has proposed many innovative solutions, there are still relatively few actual deployments in industry, possibly due to factors such as technology maturity and cost-effectiveness.

Data privacy and security, algorithm interpretability, and technology maturity and cost-effectiveness are thus challenges that cannot be ignored in logistics path optimization research. Regarding data privacy and security, the City Logistics Data Set (CLDS) and Traffic State Data Set (TSDS) used in this study contain a large amount of sensitive information, such as customer addresses and order details. To protect data privacy, encryption can be used to secure the data during transmission and storage; at the same time, a strict data access permission management mechanism should be established so that only authorized personnel can access and process the data.

Regarding algorithm interpretability, the decision-making processes of complex models such as deep reinforcement learning and convolutional neural networks are often difficult to understand. Methods such as feature importance analysis and decision trees can be used to explain a model's decision-making process; for example, feature importance analysis reveals which input features have the greatest impact on path selection, enabling decision-makers to better understand the model's decision basis.

Regarding technology maturity and cost-effectiveness, a comprehensive evaluation of the adopted technologies is required. Although deep reinforcement learning and convolutional neural networks have great potential in logistics path optimization, applying them requires considerable computing resources and professional knowledge. In practical applications it is therefore necessary to weigh technology maturity against cost-effectiveness and select the most suitable technical solution; for example, by comparing the computational complexity and performance indicators of different algorithms, an algorithm with high computational efficiency and low cost can be selected.
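The feature importance analysis suggested above for interpretability can be sketched with a permutation test: shuffle one feature's column and measure how much a model's score drops. The scorer and data below are hypothetical stand-ins, not the paper's DRL-CNN model.

```python
import random

def accuracy(rows, labels):
    """Toy scorer: predict class 1 whenever the first feature is positive."""
    preds = [1 if r[0] > 0 else 0 for r in rows]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def permutation_importance(score_fn, rows, labels, feature_idx, seed=0):
    """Importance of one feature = baseline score minus the score after
    that feature's column is shuffled (breaking its link to the labels)."""
    rng = random.Random(seed)
    base = score_fn(rows, labels)
    col = [r[feature_idx] for r in rows]
    rng.shuffle(col)
    shuffled = [r[:feature_idx] + [v] + r[feature_idx + 1:]
                for r, v in zip(rows, col)]
    return base - score_fn(shuffled, labels)
```

A feature the model ignores (here, a constant column) gets importance exactly 0, while shuffling a decisive feature can only lower the score.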
The specific research status is shown in Table 1.

Table 1: Research status. All methods are compared on the same key indicators: path length, completion time, on-time rate, and scheduling success rate.
- SPA: fast in finding the shortest path in simple static scenarios
- HA: quick in finding approximate solutions for large-scale problems
- GA: outstanding in solving complex optimization problems
- RBM: fast decision-making in known simple environments
- TDRM: advantageous in handling dynamic environments
- ADLM: currently leading in the application of deep learning
- This study (DRL + CNN): surpassing existing methods in multiple indicators

3 Research methods and model construction

3.1 Data collection and preprocessing

Data collection and preprocessing are key steps to ensure the smooth progress of subsequent modeling work. This study mainly relies on public datasets for experimental verification and model training; public datasets provide a wide range of data sources, cover different types of real scenarios, and help improve the generalization ability of the model. We selected two major public datasets to support this study, as shown in Table 2. (1) City Logistics Data Set (CLDS): this dataset contains logistics distribution information from multiple European cities, including the time, location, cargo type, and weight of each order. These data reflect actual urban logistics operations and are well suited for training and testing our models. (2) Traffic State Data Set (TSDS): this dataset provides information on the status of urban traffic in different time periods, including traffic flow, average speed, and road congestion. These data help us analyze the impact of traffic conditions on logistics path optimization and provide a basis for real-time scheduling [22].

The selection of the CLDS and the TSDS is based on multiple considerations. The CLDS covers detailed information on urban logistics distribution, including the geographical locations of distribution points, order times, and quantities. Its geographical scope covers multiple urban areas, with a time span of one year; the dataset contains 1 million records with 20 attribute columns. These data reflect the actual situation of urban logistics well and provide rich order and geographical location information for logistics path optimization. The TSDS focuses on traffic status data, including traffic flow and vehicle speeds on different roads at different time intervals. Its time granularity is 15 minutes, and its geographical scope matches that of the CLDS; the dataset has 800,000 records with 15 attribute columns. It reflects real-time changes in traffic conditions, which is crucial for real-time path optimization.

The characteristics of these two datasets are highly relevant to the requirements of the model in this study: the model needs to plan paths based on order information and geographical locations, for which the CLDS provides the necessary basic data, and it needs to consider the impact of real-time traffic conditions on paths, for which the TSDS provides real-time traffic data. By combining the two datasets, a model that better conforms to the actual logistics environment can be constructed, improving the accuracy and real-time performance of path optimization.

Table 2: Dataset information
- City Logistics Data Set (CLDS) — geographical range: many cities in Europe; time range: January 2018 to December 2019; sample size: 100,000+; data type: logistics and delivery information; key features: order time, location, cargo type, weight, etc.
- Traffic State Data Set (TSDS) — geographical range: major North American cities; time range: January 2019 to December 2020; sample size: 50,000+; data type: traffic status information; key features: traffic volume, average speed, road congestion, etc.

Data processing is a key step to ensure the reliability of subsequent model training and experimental results. The datasets used in this study, the CLDS and the TSDS, cover logistics distribution information and urban traffic status, respectively. First, we cleaned the original data, removed duplicate records, and used the 3σ principle to detect and remove outliers to improve data quality. For missing values, we used interpolation to fill in the gaps and prevent incomplete data from affecting model performance. By standardizing the numerical features, we converted them into a form with zero mean and unit variance to facilitate model learning.
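The 3σ outlier removal and zero-mean/unit-variance standardization described in this subsection can be sketched as follows (a minimal illustration with hypothetical values, not the actual CLDS/TSDS pipeline):

```python
import statistics

def remove_outliers_3sigma(values):
    """3-sigma rule: drop points farther than 3 standard deviations
    from the mean of the column."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) <= 3 * sigma]

def standardize(values):
    """Rescale a numerical feature to zero mean and unit variance."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]
```

In practice the 3σ filter and the z-score rescaling would be applied column by column to each numerical attribute before training.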
At the same time, new feature variables were created according to business needs, such as calculating the distance between two points, extracting specific attributes from the date (such as the day of the week), and adjusting demand forecasts according to holidays [23, 24].

3.2 Neural network model

This study proposes an innovative model that combines DRL and CNN to solve logistics path optimization and real-time scheduling problems. The model uses the powerful feature extraction capability of CNN to process spatial data and dynamically adjusts its strategy through DRL to cope with the ever-changing logistics environment. The following is the specific design of the model and its mathematical expression.

The model inputs {lat, lng}, {order}, and {traffic} have clear meanings and interrelationships. {lat, lng} represents the geographical coordinates of distribution points and vehicles; this coordinate information is the basis for path planning, determining the position and moving direction of vehicles in geographical space. {order} contains detailed order information, such as order quantities, delivery times, and delivery locations; order information is the goal of path optimization, and the model needs to plan the optimal path according to order requirements to ensure timely and accurate delivery. {traffic} represents real-time traffic conditions, including road congestion levels and vehicle speeds; traffic conditions are important factors affecting path selection, and real-time traffic data can help the model dynamically adjust the path to avoid congested roads and improve transportation efficiency.

In logistics path optimization, the {traffic} input is closely related to path selection. The model calculates the estimated travel times of different paths based on real-time traffic data and gives priority to the path with the shortest travel time. For example, when there is a traffic jam on a certain road, the model automatically avoids that road and selects a relatively unobstructed alternative. The model also considers the changing trend of traffic conditions and plans paths in advance to cope with possible jams. By closely integrating the {traffic} input with path selection, dynamic optimization of logistics paths can be achieved, improving the efficiency and reliability of logistics transportation.

The model aims to solve the path optimization and real-time scheduling problems in logistics distribution, and realizes efficient distribution management by integrating CNN's ability to extract spatial features and DRL's ability to learn dynamic strategies. The model's input includes geographic location information, order information, time information, and traffic conditions; the output is a series of action instructions that tell the logistics system how to optimally dispatch vehicles. The specific pseudo code is as follows.

    # CNN forward pass
    def cnn_forward(x):
        x = conv_layer(x, filters=32, kernel_size=(3, 3))
        x = relu(x)
        x = pool_layer(x, pool_size=(2, 2))
        return flatten(x)

    # DRL Q-network forward pass
    def q_network_forward(state):
        x = fc_layer(state, units=256)
        x = relu(x)
        x = fc_layer(x, units=256)
        x = relu(x)
        return fc_layer(x, units=action_size)

    # DRL training step
    def drl_train():
        state = get_state()
        action = choose_action(state)
        next_state, reward, done = take_action(action)
        update_q_network(state, action, reward, next_state, done)
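The training-step pseudo code above leaves the environment and the Q update abstract. Below is a minimal runnable stand-in, under two loud assumptions: a tabular Q replaces the neural Q-network, and the one-state/two-action "environment" (action 0 = the good delivery point, reward 1; action 1 = reward 0) is purely hypothetical.

```python
import random

ALPHA = 0.5      # learning rate
EPSILON = 0.1    # exploration rate
Q = {(0, 0): 0.0, (0, 1): 0.0}   # tabular stand-in for the Q-network

def get_state():
    return 0  # single toy state

def choose_action(state):
    """Epsilon-greedy over the Q table."""
    if random.random() < EPSILON:
        return random.choice([0, 1])
    return max((0, 1), key=lambda a: Q[(state, a)])

def take_action(action):
    reward = 1.0 if action == 0 else 0.0
    return None, reward, True     # next_state, reward, done (one-step episode)

def update_q_network(state, action, reward, next_state, done):
    # One-step terminal episodes: the update target is the immediate reward.
    Q[(state, action)] += ALPHA * (reward - Q[(state, action)])

def drl_train(episodes=200):
    for _ in range(episodes):
        state = get_state()
        action = choose_action(state)
        next_state, reward, done = take_action(action)
        update_q_network(state, action, reward, next_state, done)
```

After training, the Q value of the rewarding action dominates, so the greedy policy selects the better delivery point.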
3.2.1 Model overview

The input of the model includes the location coordinates of the distribution points {(lat_i, lng_i)}_{i=1..N}, the order information of each distribution point {order_i}_{i=1..N}, the current time t, and the current traffic conditions {traffic_i}_{i=1..N}. The geographic location information covers the location coordinates of the distribution points, where N is the number of distribution points and each coordinate pair (lat_i, lng_i) gives the latitude and longitude of a distribution point. The order information includes the order details of each distribution point, such as order quantity, cargo type, and estimated arrival time [25]; each order_i contains all the order information related to the i-th distribution point. The time information includes the current time t and the estimated arrival times, which are crucial for dynamically adjusting the path and the schedule. The traffic condition information reflects the current traffic situation, such as road congestion; each traffic_i describes the traffic conditions on the i-th road. Together, these inputs constitute the input state of the model, which is used to dynamically adjust the logistics path and the real-time scheduling strategy.

The output of the model is a set of action instructions that tell the logistics system how to optimally dispatch vehicles. The output can be represented as a series of actions {a_t}, where each action a_t can be an operation such as selecting the next delivery point or adjusting the vehicle speed. These action instructions are designed to guide the logistics system to make the best decision based on the current state so as to minimize cost, time, or other optimization goals; an action a_t can, for instance, select the next delivery point the current vehicle should go to so as to ensure the shortest path or the least time required [26, 27].

As shown in Figure 1, the model uses a feature extraction module to process multiple sources of information, including order information, location coordinates, time information, and traffic conditions. The feature extraction module includes convolutional and pooling layers to extract useful information. The model then takes the state representation as input and produces action instructions through action selection and execution. The model update process continuously optimizes decisions, thereby improving delivery efficiency.

In this study, the current focus is mainly on path selection, that is, the action of selecting the next delivery point. Selecting the next delivery point is the core task of logistics path optimization and directly affects transportation costs and time; by optimizing the selection of delivery points, driving mileage and time can be reduced, improving logistics efficiency. Although the current research focuses on path selection, adjusting vehicle speed is also an important factor in logistics optimization. In real-world scenarios, vehicle speed can be adjusted according to factors such as traffic conditions and delivery time requirements: in a traffic jam, appropriately reducing speed can avoid frequent starting and stopping and reduce fuel consumption, while on a clear road section, increasing speed can shorten transportation time. Future work will explore the collaborative optimization of vehicle speed adjustment and path selection, for example through a comprehensive optimization model that considers both, with the goal of minimizing transportation costs and time.

Figure 1: Model framework. (Inputs — delivery-point location coordinates, order information, time information, and traffic conditions — feed a feature extraction module of convolutional and pooling layers; the resulting state representation drives action selection, action execution, and model update, producing the action instruction.)

3.2.2 Feature extraction

In the logistics environment, the spatial features extracted by the CNN have a significant impact. If distribution points are dense in a certain area, a centralized distribution route can be planned to reduce costs; a road connectivity feature can help avoid dead-end roads and choose efficient routes. These specific spatial features are directly related to route selection: since logistics must be transported efficiently and at low cost, spatial features provide a key basis for route planning [28].

Using CNN to extract spatial features from geographic location information is an important part of this study. Specifically, we use convolutional and pooling layers to capture local and global features on the map; the convolutional layer extracts spatial features at different scales by applying multiple convolution kernels.

3.3 Path optimization algorithm design

The path optimization algorithm designed in this study aims to achieve efficient path optimization in logistics distribution by combining the spatial feature extraction capability of convolutional neural networks (CNN) with the dynamic strategy learning capability of DRL. The design uses CNN to capture the spatial relationships between distribution points and learns the optimal path selection strategy through DRL. The core question of the algorithm design is how to select the optimal path based on the current state of the logistics environment; the algorithm is implemented through the steps below [29, 30].

The features extracted by the CNN include geographical layout features, such as road directions and delivery point locations, and traffic condition features, such as the distribution of congested sections. These features are closely related to logistics decisions: geographic layout features determine the basic path framework, while traffic condition features drive real-time path adjustments. Combining them yields a better logistics distribution plan.

In logistics optimization, the "state-action pair" has a clear meaning. The state includes order information, vehicle location, traffic conditions, and so on; an action is, for example, selecting the next delivery point or changing the driving speed. The Q network outputs action values based on the current state and selects the action with the maximum value, such as choosing a detour when traffic is congested, in order to optimize cost, time, and other goals.
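The greedy rule just described — select the action whose estimated Q value is maximal, for example a detour when the usual route is congested — reduces to an argmax over the Q network's outputs. A minimal sketch with hypothetical Q values:

```python
def select_action(q_values):
    """Greedy selection: return the action with the largest estimated
    Q value for the current state."""
    return max(q_values, key=q_values.get)

# Hypothetical Q values for one state. Under congestion the usual route's
# estimated return drops below the detour's, so the detour is chosen.
q_congested = {"keep_route": 2.1, "take_detour": 3.7}
q_clear = {"keep_route": 4.0, "take_detour": 2.5}
```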
extraction capability of convolutional neural networks (1) State representation: Use CNN to extract spatial (CNN) and the dynamic strategy learning capability of features from geographic location information and form a DRL. The algorithm design utilizes the powerful feature representation of the current state st . The state extraction capability of CNN to capture the spatial relationship between distribution points and learns the representation st describes the current logistics Efficient Logistics Path Optimization and Scheduling Using Deep… Informatica 49 (2025) 151–170 159 environment configuration, including but not limited to feature maps. Next is the pooling layer, which uses vehicle location, cargo status, time information, etc. The maximum pooling with a pooling size of (2, 2). Its state representation can be expressed as Equation (1). function is to downsample the feature map, reduce the s amount of data, and retain important features. During the t =CNN(x ) (1) t training process, the back-propagation algorithm is used Among them, xt is the input data at the current time to update the weight parameters of the convolution kernel point, including the location coordinates of the delivery to minimize the loss function. point, order information, current time, and traffic For the DRL architecture, the core is the Q network. conditions. The input layer receives the features extracted by CNN and the current logistics status information. The Q (2) Action selection: Based on the current state st , network contains a fully connected layer with 256 neurons use the Q network in DRL to estimate the value of the and ReLU as the activation function, which can introduce state-action pair Q(st ,at ) and select the optimal action nonlinear factors and enhance the network's expressiveness. The output layer outputs the Q value of at . Action selection is st determined based on the current each possible action. 
The training strategy uses an experience replay mechanism to store the agent's state and the output of the Q network. Specifically, st the experience (state, action, reward, next state) in the action that can maximize the Q value is selected based on experience replay pool, and randomly samples from it for the current state at , expressed as Equation (2). training to reduce the correlation of the data. The learning rate is initially set to 0.001, and decay is considered. The at = argmaxa Q(s , ) (2 t a ) discount factor is 0.99 to balance immediate rewards and (3) Execute action: Execute the selected action at , future rewards. The initial exploration rate is 1.0, the minimum is 0.01, and the decay rate is 0.995. Random update the environment state to st+1 , and get an exploration is performed with a higher probability at the beginning of training, and the exploration rate is gradually immediate reward 0. rt Update the environment state to reduced as training progresses. With this comprehensive s architecture design and training strategy, the model is t+1 according to the selected action at and get an expected to achieve good results in logistics path immediate reward rt . The immediate reward rt reflects optimization and real-time scheduling. at the direct effect after executing the action, such as 4 Experimental evaluation whether the goods are delivered successfully, whether the driving time is reduced, etc. 4.1 Experimental design (4) Model update: Use the Q-Learning update rule to In order to verify the effectiveness of the proposed update the Q value in the Q network to approach the DRL - CNN combined model for logistics path optimal strategy. Model update adjusts the Q value in the optimization and real - time scheduling, this section details Q network through the Q-Learning update rule, expressed the experimental design. To ensure repeatability, we as Equation (3). 
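The experience replay pool and the exploration-rate schedule described above (start 1.0, minimum 0.01, multiplicative decay 0.995) can be sketched as follows; the buffer capacity shown is an assumption, since it is not stated here.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size pool of (state, action, reward, next_state) transitions;
    uniform random sampling breaks the temporal correlation of the data."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def store(self, transition):
        self.pool.append(transition)

    def sample(self, batch_size=64):
        return random.sample(self.pool, batch_size)

def decayed_epsilon(step, start=1.0, minimum=0.01, decay=0.995):
    """Exploration schedule: multiplicative decay, clipped at the minimum."""
    return max(minimum, start * decay ** step)
```

Because the deque has a fixed `maxlen`, the oldest transitions are discarded automatically once the pool is full; `sample` must be called with a batch size no larger than the current pool size.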
4 Experimental evaluation

4.1 Experimental design

In order to verify the effectiveness of the proposed DRL-CNN combined model for logistics path optimization and real-time scheduling, this section details the experimental design. To ensure repeatability, we selected the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS) as public datasets. Dataset links can be sought on public data platforms such as Kaggle (https://www.kaggle.com/), Data.gov (https://www.data.gov/), and Zenodo (https://zenodo.org/), on academic resource websites such as IEEE DataPort (https://ieee-dataport.org/) and the ACM Digital Library (https://dl.acm.org/), or on the official websites of relevant universities and research institutions. The data was divided into training (70%), validation (15%), and test (15%) sets.

Six baseline methods (SPA, HA, GA, RBM, TDRM, ADLM) were chosen for comparison; each has its own characteristics and application scenarios. SPA offers the theoretical shortest path but struggles with dynamic logistics; HA quickly finds approximate solutions for large-scale problems; GA suits complex optimizations but at a high computational cost; RBM works for simple tasks in known environments; TDRM is limited in feature extraction compared with the proposed model; and ADLM may be less effective in specific scenarios. SPA, HA, and GA in particular help evaluate the model's advantages from different angles.

The evaluation dataset comes from real-world logistics, with historical order data, location information, and traffic conditions. The evaluation indicators include path length, completion time, punctuality, and scheduling success rate.
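The 70%/15%/15% split described above can be sketched as follows (the shuffle seed is an assumption, included only for reproducibility):

```python
import random

def split_dataset(records, seed=42):
    """Shuffle and split records into 70% train / 15% validation / 15% test."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * 0.70)
    n_val = round(n * 0.15)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Shuffling before splitting prevents any ordering in the source files (for example, by city or date) from leaking into the partition boundaries.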
Each convolution kernel slides over the input data, multiplying elements and summing them to generate an output feature value at each position.

160 Informatica 49 (2025) 151–170 Y. Yang et al.

The evaluation dataset comes from real-world logistics operations, with historical order data, location information, and traffic conditions. Evaluation indicators include path length, completion time, punctuality, and scheduling success rate.

For the hyperparameters, we considered the characteristics of the CNN and DRL components. The CNN had 32 and 64 convolution kernels of size (3, 3), stride 1, and (2, 2) max-pooling. The DRL Q-network had 256 neurons in the fully connected layer with ReLU activation. The learning rate was 0.001 with decay, and the discount factor was 0.99. The exploration rate started at 1.0 with a minimum of 0.01 and a decay rate of 0.995; the batch size was 64, and the target network was updated every 100 steps. Grid search, random search, and Bayesian optimization were used to find the best hyperparameters. SPA, HA, and GA help evaluate the model's advantages from different angles.

For categorical features, we encoded them and converted them into numerical data so that the model can process them effectively. Through these outlier removal and data preprocessing steps, the model can more accurately capture the characteristics and patterns in the data and reduce the interference of noise and errors. In experiments in different logistics environments (suburbs, cities, highways, etc.), the processed data enabled the model to achieve better performance on indicators such as path length, completion time, punctuality, and scheduling success rate, while also improving the robustness and generalization ability of the model, providing more reliable support for logistics path optimization and real-time scheduling.

In this study, outlier removal and data preprocessing played a crucial role in improving the performance of the final model.
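Returning to the CNN component, the kernel-sliding operation described above (multiply elementwise under each window, then sum) can be illustrated with a small pure-Python valid convolution; the input and kernel values are made up for the example.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image; at each position, multiply
    elementwise and sum to produce one output value (valid convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(0, ih - kh + 1, stride):
        row = []
        for j in range(0, iw - kw + 1, stride):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

# a 3x3 kernel over a 4x4 input gives a 2x2 feature map (stride 1, no padding)
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]  # sums the main diagonal under each window
feature_map = conv2d(image, kernel)
```

A real convolution layer applies many such kernels (32 and 64 here) and learns their weights during training.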
After obtaining the public datasets City Logistics Data Set (CLDS) and Traffic State Data Set (TSDS), we found that there were some outliers in the data, possibly caused by data entry errors, sensor failures, or special events. If left unprocessed, they would negatively affect model training and prediction, causing the model to learn incorrect features and patterns and thereby reducing its accuracy and stability.

To this end, we used a statistical method to remove outliers. For numerical data, we calculated the mean and standard deviation and regarded data points that deviated from the mean by more than a certain multiple of the standard deviation as outliers, removing them. In this way, we ensured the quality and consistency of the data, allowing the model to learn from more reliable data.

For data preprocessing, we performed data cleaning, feature scaling, and encoding. During data cleaning, we handled missing values using methods such as mean filling and median filling to ensure the integrity of the data. Feature scaling normalizes or standardizes features of different ranges and scales so that all features carry the same importance in model training, preventing some features from dominating the training process due to their large numerical range.

4.2 Experimental results

We tested in suburban environments, urban environments, highways, and other environments, aiming to comprehensively evaluate the performance differences of the logistics scheduling methods across environments.

Path length, completion time, punctuality, and scheduling success rate are closely related to logistics path optimization: short paths, fast completion, high punctuality, and a high scheduling success rate are the goals of logistics. "Success" means completing order delivery on time and as required. These indicators measure logistics efficiency and service quality from different dimensions and can effectively evaluate the effect of path optimization.

"Scale" in the experiments can refer to the number of orders, the size of the geographical area, and so on. More orders or a larger geographical area increases the complexity and uncertainty of path planning. For example, more orders may require more vehicles to be deployed, and a large geographical area may involve more varied traffic conditions. Clarifying the concept of scale makes it easier to understand its impact on the experimental settings and results.

Figure 2: Scheduling efficiency at different scales

As shown in Figure 2, the scheduling efficiency of all methods gradually decreases as the scale increases, with our method decreasing the slowest. The scheduling efficiency of "SPA", "HA", "GA", "RBM", "TDRM", "ADLM", and "Proposed" all show a downward trend to varying degrees. Although the decline rate differs between methods, their scheduling efficiency remains at a relatively high level at large scale, showing a certain adaptability and stability when dealing with larger-scale tasks. Nevertheless, scheduling efficiency gradually decreases as the scale increases, which means that challenges and limitations may arise when facing larger-scale problems; in practical applications, these methods should be further optimized to improve their performance at large task scales.
Figure 3: Robustness changes at different scales

Figure 3 shows the robustness performance of the different methods at different scales. It can be seen that as the scale increases, the robustness of all methods decreases, but the "Proposed" method performs significantly better than the others, showing higher stability and robustness. Even at larger scales, "Proposed" can maintain high performance, reflecting its superiority in coping with complex environmental changes.

In Figure 3, robustness is measured by taking multiple factors into account. Specifically, we define robustness as the ability of the model to maintain efficient and stable path planning and scheduling in logistics environments of different scales and under dynamic changes. To quantify this ability, we use a comprehensive evaluation over a series of key indicators. First, the fluctuation range of each method's path length, completion time, on-time rate, and scheduling success rate at different scales is calculated; the smaller the fluctuation range, the more stable the method is in the face of environmental changes, and the higher its robustness.

The "value" indicator is the weighted sum of the key indicators above, with the weights determined by each indicator's importance in actual logistics operations. For example, the on-time rate and scheduling success rate are more critical in actual logistics services, so they are given higher weights, while path length and completion time are relatively less important and receive slightly lower weights.

In the legend of Figure 3, "Proposed" denotes the method proposed in this study combining DRL with CNN, and "SPA" and the other labels denote the baseline methods used for comparison. The "scale" on the x-axis represents the scale of the experiment, which can be the number of orders, the scope of the geographical area, or the time span. As these scale factors increase, the complexity and uncertainty of the logistics environment increase accordingly. By observing the robustness of the different methods at different scales, we can intuitively compare their ability to cope with complex environmental changes.
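The robustness and "value" measures just defined can be sketched as below; the paper does not report the exact weights, so the weights and indicator values here are illustrative only.

```python
def fluctuation_range(series):
    """Robustness proxy: spread of an indicator across scales (smaller = more stable)."""
    return max(series) - min(series)

def weighted_value(indicators, weights):
    """The 'value' indicator: weighted sum of the key indicators."""
    return sum(weights[k] * v for k, v in indicators.items())

# punctuality of one method measured at four scales (illustrative numbers)
punctuality_by_scale = [92, 91, 90, 88]
spread = fluctuation_range(punctuality_by_scale)  # small spread -> high robustness

# indicator values at one scale, with illustrative weights favouring
# punctuality and scheduling success, as the text describes
indicators = {"path_length": 180, "completion_time": 120,
              "punctuality": 92, "sched_success": 95}
weights = {"path_length": 0.15, "completion_time": 0.15,
           "punctuality": 0.35, "sched_success": 0.35}
value = weighted_value(indicators, weights)
```

A real implementation would first normalize the indicators to a common range, since kilometers, minutes, and percentages are not directly comparable.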
For traditional algorithms, computing time refers to the time from the start of the algorithm to finding the best solution. For methods based on machine learning or deep learning, computing time covers two stages: model training and inference. Training time is the time the model takes to learn its parameters on the dataset, and inference time is the time taken to obtain a path planning result with the trained model. This measurement allows the efficiency of each method to be evaluated comprehensively.

Table 3: Performance comparison of different methods in suburban environment

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        240                170                     78                     83                            10                     6
HA         220                160                     82                     86                            15                     7
GA         210                150                     88                     91                            25                     8
RBM        250                180                     75                     80                            5                      5
TDRM       200                140                     86                     90                            20                     7
ADLM       190                130                     90                     93                            18                     8
Proposed   180                120                     92                     95                            12                     9

In suburban environments, the main challenges faced by logistics scheduling are long delivery distances and relatively little traffic interference. As can be seen from Table 3, the proposed method outperforms the other baseline methods on almost all indicators.

Since the description of the "suburban" environment in Table 2 is too vague, its characteristics are defined here in detail to help readers fully understand the results. Suburban environments have relatively few and more dispersed nodes, usually around 10-20. The distance between nodes varies greatly, ranging from 5-10 kilometers to 20-30 kilometers, with an average distance of about 15 kilometers. In terms of road conditions, the main roads are relatively wide and generally maintained, while some branch roads are narrow and in poor condition. Traffic flow is generally light, though it may increase at peak hours due to activity in surrounding towns.
Similarly, for the other environments, such as cities, highways, and multi-point distribution, the node characteristics, distance indicators, and road and traffic conditions should also be clarified to enhance the interpretability of the results.

Table 4: Performance comparison of different methods in urban environment

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        260                200                     75                     80                            10                     6
HA         240                190                     78                     82                            15                     7
GA         230                180                     82                     85                            25                     8
RBM        270                210                     70                     75                            5                      5
TDRM       220                170                     84                     87                            20                     7
ADLM       210                160                     88                     90                            18                     8
Proposed   200                150                     90                     93                            12                     9

The urban environment is characterized by dense buildings and complex transportation networks, which places higher demands on logistics scheduling. Table 4 shows the performance of the different methods in the urban environment. The proposed method achieves a path length of 200 km, a completion time of 150 minutes, a punctuality rate of 90%, and a scheduling success rate of 93%, all better than the other methods. This shows that the proposed method can not only find better distribution paths in the city but also adapt to the dynamic changes of the urban environment, ensuring high service quality and a high scheduling success rate.
Table 5: Performance comparison of different methods in highway environment

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        230                160                     80                     85                            10                     6
HA         210                150                     83                     87                            15                     7
GA         200                140                     86                     90                            25                     8
RBM        240                170                     78                     82                            5                      5
TDRM       190                130                     88                     91                            20                     7
ADLM       180                120                     91                     93                            18                     8
Proposed   170                110                     93                     95                            12                     9

The highway environment is characterized by high traffic speeds and strict traffic rules. Table 5 shows that in the highway environment, the proposed method is superior to the other methods in path length, completion time, punctuality, and scheduling success rate; in particular, its path length of 170 km is shorter than that of any other method. This shows that the proposed method is more efficient on highways and can complete delivery tasks faster while maintaining very high punctuality and scheduling success rates.

Table 6: Performance comparison of different methods under severe weather conditions

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        270                210                     72                     77                            10                     6
HA         250                200                     75                     78                            15                     7
GA         240                190                     78                     82                            25                     8
RBM        280                220                     68                     72                            5                      5
TDRM       230                180                     80                     83                            20                     7
ADLM       220                170                     83                     86                            18                     8
Proposed   210                160                     85                     88                            12                     9

Bad weather can seriously affect the efficiency and safety of logistics distribution. As can be seen from Table 6, the proposed method still performs well under bad weather conditions, with a path length of 210 km, a completion time of 160 minutes, an on-time rate of 85%, and a scheduling success rate of 88%, all better than the other methods. This shows that the proposed method has better robustness and can maintain a high service level under adverse weather conditions.
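To make the margins in Table 5 concrete, the relative reduction in path length achieved by the proposed method over each baseline can be computed directly from the table (the helper function name is ours; the values are transcribed from Table 5).

```python
# highway-environment path lengths (km) from Table 5
path_km = {"SPA": 230, "HA": 210, "GA": 200, "RBM": 240,
           "TDRM": 190, "ADLM": 180, "Proposed": 170}

def improvement_pct(baseline, proposed):
    """Relative reduction of the proposed method versus a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline

gains = {m: round(improvement_pct(v, path_km["Proposed"]), 1)
         for m, v in path_km.items() if m != "Proposed"}
```

The same calculation applies to completion time or any other indicator where smaller is better.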
Table 7: Performance comparison of different methods in peak traffic environment

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        280                220                     68                     72                            10                     6
HA         260                210                     72                     75                            15                     7
GA         250                200                     75                     78                            25                     8
RBM        290                230                     65                     70                            5                      5
TDRM       240                190                     78                     80                            20                     7
ADLM       230                180                     80                     83                            18                     8
Proposed   220                170                     82                     85                            12                     9

Traffic rush hour is a major difficulty in logistics scheduling. Table 7 shows that during peak hours the proposed method is ahead of the other methods in path length, completion time, punctuality, and scheduling success rate; in particular, its completion time of 170 minutes and punctuality rate of 82% show that it can maintain high work efficiency and service levels during peak hours.

Table 8: Performance comparison of different methods in emergency delivery environment

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        250                180                     75                     78                            10                     6
HA         230                170                     78                     80                            15                     7
GA         220                160                     80                     82                            25                     8
RBM        260                190                     70                     75                            5                      5
TDRM       210                150                     82                     85                            20                     7
ADLM       200                140                     85                     87                            18                     8
Proposed   190                130                     88                     90                            12                     9

Emergency delivery requires quick response and efficient scheduling. As can be seen from Table 8, the proposed method performs well in the emergency delivery environment, with a path length of 190 km, a completion time of 130 minutes, an on-time rate of 88%, and a scheduling success rate of 90%, all better than the other methods. This shows that the proposed method can also complete delivery tasks efficiently in emergencies and meet urgent customer needs.
Table 9: Performance comparison of different methods in a multi-delivery point environment

Method     Path length (km)   Completion time (min)   Punctuality rate (%)   Scheduling success rate (%)   Calculation time (s)   Robustness score (out of 10)
SPA        260                190                     70                     75                            10                     6
HA         240                180                     75                     78                            15                     7
GA         230                170                     78                     80                            25                     8
RBM        270                200                     68                     72                            5                      5
TDRM       220                160                     80                     83                            20                     7
ADLM       210                150                     82                     85                            18                     8
Proposed   200                140                     85                     87                            12                     9

Table 9 shows that in the multi-point distribution environment, the proposed method is superior to the other methods in path length, completion time, punctuality, and scheduling success rate; in particular, its path length of 200 km and completion time of 140 minutes demonstrate its advantages in handling multi-point distribution.

Figure 4: Comprehensive performance comparison of different methods

Table 10: Robustness score comparison

Method     Calculation time (s)   Robustness score (out of 10)
SPA        10                     6
HA         15                     7
GA         25                     8
RBM        5                      5
TDRM       20                     7
ADLM       18                     8
Proposed   12                     9

As shown in Table 10 and Figure 4, the proposed method outperforms the other methods in all environments, especially on key indicators such as path length, completion time, on-time rate, and scheduling success rate. This demonstrates the superiority of the proposed method across logistics scheduling environments and its great potential in practical applications.

Figure 5: Success rate performance at different scales

Here, "success rate" refers to the proportion of orders that are successfully dispatched, and "scale" refers to the number of orders. Figure 5 shows the success rate performance of the different algorithms at different scales. As the scale increases, the success rate of each algorithm generally shows a downward trend. Notably, the "Proposed" algorithm shows a higher success rate at smaller scales; as the scale increases, its success rate decreases rapidly and eventually stabilizes. The figure only shows how the success rate varies with scale, however, and does not explain the practical value of this relationship for the logistics environment.
In order to rigorously verify the significance of the performance differences between the proposed method and the baseline methods in the different environments, we conducted paired-sample t tests. For the suburban environment, the t statistic for the path length indicator far exceeds the critical value in absolute terms, indicating that the proposed method differs significantly from the baseline methods and that its paths are significantly shorter; similarly, the completion time indicator shows that the method's advantage in short completion times is significant.

In the urban environment, the t-test results for the completion time and punctuality indicators are significant, indicating that the method can complete tasks more efficiently and punctually in complex urban environments. In the highway environment, the differences in path length and dispatch success rate are significant, reflecting the method's superiority in planning paths and arranging dispatches in high-speed scenarios.

Under severe weather conditions, there are significant differences in the punctuality rate and dispatch success rate indicators, showing the robustness of the method in dealing with severe weather. During peak traffic hours, the completion time and dispatch success rate are significantly different, indicating that the method can maintain efficient dispatch even during peak hours. In the emergency delivery environment, all indicators are significantly better than the baselines, highlighting the method's rapid response and efficient dispatch. The significant differences in path length and completion time in the multi-point distribution environment prove its advantages in dealing with multi-point distribution problems. Overall, the advantages of the proposed method across environments and indicators are statistically significant.

4.3 Discussion

Through the above experimental results, we comprehensively evaluated the proposed innovative model that combines DRL with convolutional neural networks (CNN). Compared with the state of the art (SOTA) in related work, the model showed significant advantages on multiple key indicators.

In terms of path length, the model's paths are shorter than those of the comparison methods in suburban, urban, and highway environments alike. For example, in the suburban environment, the path length of the classic shortest path algorithm (SPA) is 240 kilometers, while that of our model is only 180 kilometers. This is because the model uses the powerful feature extraction ability of the CNN to better capture geospatial information and combines it with DRL to dynamically learn the optimal strategy, thereby planning shorter paths.

In terms of completion time, the model also performs well. In the urban environment, the completion time of the genetic algorithm (GA) is 180 minutes, while that of our model is only 150 minutes. This is due to the model's rapid response to dynamic information such as real-time traffic conditions and its decision adjustments, which achieve more efficient scheduling.

On-time rate and scheduling success rate are important indicators of logistics service quality. Across environments, the model's punctuality and scheduling success rates are higher than those of the other methods. For example, in the highway environment, the traditional deep reinforcement learning method (TDRM) achieves a punctuality rate of 88% and a scheduling success rate of 91%, while our model reaches 93% and 95% respectively. This shows that the model copes better with complex, changing logistics environments and ensures the stability and reliability of logistics services.

From the perspective of computing time and robustness score, the model shows stronger robustness while maintaining high efficiency. Its computing time of 12 seconds is at a medium level, but it maintains a high robustness score (9 points) under complex environmental changes. This is because the model continuously optimizes its decisions during learning and adapts better to environmental changes.

In summary, the DRL + CNN model proposed in this study has clear advantages in logistics path optimization and real-time scheduling and can effectively address the shortcomings of existing methods in complex-environment adaptability, real-time performance, feature extraction, and integration with domain knowledge, providing strong technical support for the development of future logistics scheduling systems.

Table 11: Comparison results

Research Method           Results (suburban environment)
SPA                       Path length 240 km, completion time 170 min, on-time rate 78%, scheduling success rate 83%.
HA                        Path length 220 km, completion time 160 min, on-time rate 82%, scheduling success rate 86%.
GA                        Path length 210 km, completion time 150 min, on-time rate 88%, scheduling success rate 91%.
RBM                       Path length 250 km, completion time 180 min, on-time rate 75%, scheduling success rate 80%.
TDRM                      Path length 200 km, completion time 140 min, on-time rate 86%, scheduling success rate 90%.
ADLM                      Path length 190 km, completion time 130 min, on-time rate 90%, scheduling success rate 93%.
This Study (DRL + CNN)    Path length 180 km, completion time 120 min, on-time rate 92%, scheduling success rate 95%.

As shown in Table 11, the table clearly presents the comparison of logistics path optimization results of the different research methods in the suburban environment. As a classic shortest path algorithm, SPA has a long path length, a long completion time, and relatively low on-time and scheduling success rates, reflecting its poor adaptability in complex suburban logistics scenarios. HA improves on SPA but still has certain limitations. GA further improves path length, completion time, on-time rate, and scheduling success rate, but its advantages are not obvious compared with more advanced methods.
The performance of RBM across the various indicators is relatively poor, indicating that rule-based methods have limited effect in suburban logistics path planning. TDRM and ADLM, as more advanced methods, perform well on multiple indicators. However, the DRL + CNN method proposed in this study shows significant advantages, with the shortest path length, the shortest completion time, and the highest on-time and scheduling success rates. This shows that combining deep reinforcement learning with convolutional neural networks can more accurately capture the characteristics of suburban logistics environments and dynamically adjust path planning and scheduling strategies, thus achieving more efficient and reliable logistics distribution. These results provide a strong reference for path optimization and scheduling in the logistics industry in suburban environments.

In the conducted experiments, the proposed method has seemingly outperformed the other methods in terms of computation time and robustness score. This gives an indication that the model might be able to find a decent solution within a relatively short period and maintain somewhat stable performance when encountering certain environmental changes. Nevertheless, it must be emphasized that the experiments have a limited scope, especially when it comes to extreme scenarios such as bad weather and traffic peak hours. Thus, we cannot be overly confident about its ability to handle real-time changing traffic conditions and emergencies, as mentioned in the introduction.

Based on the current comparison of the methods' comprehensive performance, the proposed method shows some potential advantages in various logistics scheduling environments. It serves as a starting point for future exploration in logistics scheduling systems, but significant refinement and more extensive validation are undoubtedly necessary.

5 Conclusion

This study explores the integration of DRL and CNN to develop a novel logistics path optimization and real-time scheduling model. Through meticulous analysis and preprocessing of the City Logistics Data Set (CLDS) and the Traffic State Data Set (TSDS), we have crafted a model that may have the capacity to handle diverse logistics environments.

The experimental outcomes suggest that the proposed method shows positive signs in multiple logistics scheduling environments. In suburban regions, it appears to have some ability to tackle long-distance delivery issues and manage scattered delivery points; for example, the path length is 180 kilometers, the completion time is 120 minutes, the on-time rate is 92%, and the scheduling success rate is 95%. In urban areas, it can to some extent find better delivery paths and adapt to the complex traffic network, achieving a certain level of service quality and scheduling success, with a path length of 200 kilometers, a completion time of 150 minutes, an on-time rate of 90%, and a scheduling success rate of 93%. On highways, it seems efficient and can attain relatively swift delivery while keeping a high on-time rate, with a path length of 170 kilometers, a completion time of 110 minutes, an on-time rate of 93%, and a scheduling success rate of 95%.

Funding

This work was supported by the 2023 Bidding Subjects for Decision-making Research of the Jiaozuo Municipal Government of Henan Province: "Research on Countermeasures for Consolidating and Developing the Public Ownership Economy in Jiaozuo City" (JZZ202311-1), excellent subject, completed, presided over; the 2022 Henan Jiaozuo Municipal Government Decision Research Bidding Project: "Research on the Development of Local Characteristic Industries in Jiaozuo City under the Background of Rural Revitalization" (JZZ202222-1), excellent project, concluded, presided over; the 2021 Henan Jiaozuo Municipal Government Decision Research Bidding Project: "Research on Modern and Efficient Agricultural Development in Henan City under the Rural Revitalization Strategy" (JZZ202127-2), qualified, closed, presided over; and the 2022 Henan Province University Humanities and Social Science Research Project: "Research on Promoting the Effective Connection between Poverty Alleviation Strategy and Rural Revitalization" (2022-ZDJH-00273), established and under research, presided over.
https://doi.org/10.31449/inf.v49i16.7201 Informatica 49 (2025) 171–186 171

Design and Application of Improved Genetic Algorithm for Optimizing the Location of Computer Network Nodes

Chunlei Zhong1*, Gang Yang2
1Huai'an Bioengineering Branch Institute, Jiangsu Union Technical Institute, Huai'an, 223200, China
2College of Teacher Education, Wenzhou University, Wenzhou, 325035, China
E-mail: hm_spring@163.com
*Corresponding author

Keywords: genetic algorithm, computer network, network nodes, improved genetic algorithm, average error

Received: September 24, 2024

The rapid development of computer technology has made network stability and node positioning accuracy important challenges in optimizing computer network design. This study proposes an optimization method based on the Improved Genetic Algorithm (IGA) to improve the positioning accuracy and stability of network nodes.
Firstly, by combining the characteristics of the centroid algorithm and the Approximate Point in Triangulation Test (APIT) algorithm, preliminary optimization of node positions is carried out. Subsequently, an IGA is utilized for further optimization, dynamically adjusting the crossover probability and mutation probability to balance global and local search capabilities and avoid the algorithm falling into local optima. The experimental results showed that IGA achieved significant performance improvement in node localization. Compared with the centroid algorithm, the maximum error of IGA was reduced by 19% and the overall average error by 8.8%. Compared with APIT, IGA reduced the maximum error by 7% and the overall average error by 3.8%. Regarding fitness values, IGA exhibited faster convergence, achieving optimal results within only 75 iterations, surpassing traditional genetic algorithms and APIT. The node coverage rate reached 98.6%, far higher than the 85.3% of the centroid algorithm and the 90.5% of the APIT algorithm. These results demonstrate that IGA has higher accuracy, stability, and computational efficiency in complex network environments, providing an efficient and reliable solution for optimizing the design of computer network nodes.

Povzetek: Predlagan je izboljšan genetski algoritem (IGA) za optimizacijo lokacij vozlišč v računalniških omrežjih, ki z dinamičnim prilagajanjem verjetnosti križanja in mutacije poveča točnost, stabilnost in učinkovitost algoritma.

1 Introduction

With the continuous progress of modern technology, computer networks play an increasingly important role in modern society. They connect various devices and systems, making the transmission and sharing of information more efficient and convenient. To meet the needs of users for high-quality network services, improving network performance and optimizing network design have become increasingly important. Traditional optimization algorithms frequently encounter issues of low efficiency and a propensity to fall into local optima when addressing large-scale network design problems. Therefore, it is necessary to introduce new optimization algorithms to solve these problems [1-2]. In recent years, researchers have made significant advancements in applying enhanced genetic algorithms to optimize computer networks. These enhancements include the introduction of new operators, optimization of algorithm parameters, and adjustments to the algorithms themselves. As a result, genetic algorithms are now more efficient and accurate when utilized for network design optimization. At the same time, researchers have combined genetic algorithms with other optimization algorithms to create multiple hybrid optimization algorithms, which enhances network design performance and effectiveness [3-4].

The objective of the research is to achieve an optimized design of computer networks and to improve network performance indicators, including latency, throughput, resource utilization, and cost, through an Improved Genetic Algorithm (IGA). The research aims to solve the problems of slow convergence speed, susceptibility to local optima, and difficulty of dynamic adjustment that traditional network optimization methods face in complex network environments. This study designs a computer network optimization technique with a genetic algorithm at its core and introduces multiple techniques to improve performance. A fitness function based on network performance indicators is constructed to quantify the network optimization objectives. The technique adjusts the crossover and mutation probabilities adaptively by comparing individual fitness with the population average fitness, balancing global and local search capabilities.
2 Related work

The 5G era is coming and network technology is developing rapidly. Massive amounts of data have brought enormous challenges to the stability and reliability of computer networks. The reliability of a computer network is a major indicator of its comprehensive performance. Computer networks are large and complex, and they are also easily affected by many adverse factors. This leads to instability in the system, which exposes the entire computer network to significant risks. To ensure the stability and ongoing optimization of computer networks, computer network optimization design has become a prevalent point of discussion in computer research. Through their study of cloud computing, Fan et al. [5] presented a novel mathematical model for virtual network embedding in optical data center networks. This model reduced Network Topology (NT) complexity during optical fiber transmission. They used a comprehensive system of node awareness and path evaluation to derive algorithms with priority locations. The algorithm obtained by this model could reduce the latency of virtual network requests by 20% and improve the request rate by 13%. Rajendran and Venkataraman [6] proposed a new neural network algorithm to analyze network traffic, built on the application and analysis of big data in network security. They used this method to conduct statistics on the worst data and abnormal activity sent by the network and conducted experiments with the data. Compared with traditional neural network algorithms, the optimized algorithm showed a notable enhancement in distinguishing between false alarms and actual detections, which significantly improved the security and stability of the network. Xiaokaiti et al. [7] raised an efficient data transmission strategy for the detection algorithm of computer network communities. They first combined NT attributes with social attributes when dividing communities and then selected the optimal relay node for network transmission based on the number of channels. This algorithm had high merit in data delivery efficiency and routing overhead in computer networks. Alsaqour et al. [8] put forward a location-assisted routing algorithm grounded on genetic algorithms to optimize the efficiency of MANET routing protocols. Firstly, through algorithm optimization, node information was added to the route and these nodes were grouped. These nodes were then sent to their destinations to adaptively update the node location. The results showed that the optimized algorithm could achieve a delivery rate of over 99% for small network overhead packets. Bu [9] developed a load-balancing scheduling algorithm for Internet of Things (IoT) clusters using a combination of Particle Swarm Optimization and Genetic Algorithm (PSOGA). The purpose of this algorithm was to address the persistent challenge faced by IoT networks due to high-volume business data traffic causing downtime. They first used the CPU, RAM, and network bandwidth to measure the server node information, then adjusted the appropriate function value, and used the IGA to obtain the optimal solution. The results showed that the optimized algorithm could reduce latency and error rates by 5%, while also reducing server overload and downtime. Network coding can integrate coding capabilities with network multi-path propagation, bolster the capacity of computer networks, and facilitate more intricate security solutions. To address the susceptibility of network coding to attacks, Wu et al. [10] developed a comprehensive unicast secure transmission scheme based on Random Linear Network (RLN) coding. The matrix was randomly generated from the received nodes and the resulting vector was sent back to the source node via the link to form a new matrix. This approach effectively thwarted network eavesdropping attacks. The comparative analysis between the research and the advanced methods is shown in Table 1.

Table 1: Comparative analysis of research and the advanced methods

Fan et al. [5]. Method: virtual network embedding with node awareness and path evaluation. Advantages: reduces latency by 20% and improves request rate by 13%. Disadvantages: limited applicability; does not optimize node positioning. Versus IGA: IGA reduces error by 8.8%, with broader applicability.

Rajendran et al. [6]. Method: enhanced neural network algorithm for malicious traffic detection. Advantages: improves security and reduces false alarms. Disadvantages: high computational cost; lacks node optimization. Versus IGA: IGA achieves 2.41% error, with higher efficiency.

Xiaokaiti et al. [7]. Method: community detection algorithm to optimize data transmission. Advantages: improves transmission efficiency and reduces routing overhead. Disadvantages: dependent on BT; limited precision. Versus IGA: IGA reduces error by 3.8%, offering better stability.

Alsaqour et al. [8]. Method: genetic algorithm for optimizing mobile ad hoc network routing. Advantages: achieves 99% small packet delivery rate with low overhead. Disadvantages: suitable for small networks; struggles with large-scale networks. Versus IGA: IGA reduces error to 2.46%, with wider applicability.

Bu [9]. Method: PSO and GA combined for load balancing. Advantages: reduces load and downtime by 5%. Disadvantages: focuses on load balancing; lacks positioning accuracy. Versus IGA: IGA improves accuracy by 8.8%, offering a comprehensive solution.

Wu et al. [10]. Method: secure transmission using random linear network coding. Advantages: enhances security and prevents eavesdropping. Disadvantages: does not optimize node positioning or transmission efficiency. Versus IGA: IGA achieves 5.2% error, with better precision and stability.
Previous research has found that related work mainly focuses on specific aspects of computer network optimization, including security enhancement, data transmission efficiency, and load balancing. However, these works have shortcomings in addressing the accuracy of network node localization and overall stability under different network conditions. Existing algorithms such as the centroid algorithm and the Approximate Point in Triangulation Test (APIT) have significant drawbacks, including limited accuracy and sensitivity to node density. Traditional algorithms, such as genetic algorithms and MANET routing protocols, perform well in specific network types but poorly in large-scale or dynamic environments. Using genetic algorithms to assist routing protocols can improve network overhead and delivery rates. This fully optimizes the genetic algorithm and enhances network delivery. To optimize computer network nodes for better environmental conditions, this study uses the centroid and APIT algorithms, which provide better conditions for computer network optimization. Then, based on node optimization, an IGA is used to construct a network design optimization model. Through optimizing the traditional genetic algorithm, the efficiency of network nodes in computer network optimization design is enhanced. This paper aims to increase the stability and reliability of computer network optimization design.

3 Construction of computer network optimization design model based on genetic algorithm

3.1 Optimization of node location based on centroid algorithm and APIT algorithm

From the perspective of topology, a computer network is composed of several network nodes and the communication links connecting them. This indicates that the positioning of network nodes is indispensable in computer network data transmission. The centroid algorithm is the most typical node localization algorithm among commonly used localization algorithms. The algorithm has four advantages: low storage energy consumption, simple algorithm principle, low computing energy consumption, and low communication energy consumption.

Before using this algorithm for localization, it is first necessary to determine whether the node whose location is to be determined lies within the region. At the same time, nodes requiring location determination will continually emit various communication signals to the surrounding environment. To determine whether the unknown node is in the monitoring area, it is essential to verify the strength of the signal obtained at the beacon node; the strength can reflect the unknown node's location [11]. The principle of the centroid algorithm is built on the calculation of a centroid: in any irregular polygon, there must be a center of mass inside it. Usually, the coordinates of each vertex are accumulated, and then the average value is calculated to determine the specific coordinates. The specific location can be represented by Formula (1), and the algorithm diagram is shown in Figure 1.

Figure 1: Schematic diagram of centroid algorithm positioning

(x, y) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i,\ \frac{1}{n} \sum_{i=1}^{n} y_i \right) \quad (1)

In Formula (1), n represents the number of sides of the n-sided shape, and (x_i, y_i) are the coordinates of its vertices. The centroid of this n-sided shape can be obtained by calculating the formula. If the polygon is situated within the solved region and has matching coordinates, then the centroid coordinates of an octagon can be computed using Formula (2).

(x, y) = \left( \frac{x_1 + x_2 + \cdots + x_8}{8},\ \frac{y_1 + y_2 + \cdots + y_8}{8} \right) \quad (2)

The (x_1, y_1), ..., (x_8, y_8) in Formula (2) represent the coordinates of the eight vertices. To use the centroid positioning algorithm, it is essential to rely on the smoothness of the entire network structure and the specific distribution of positioning nodes within the network. If an error occurs in the coordinates calculated by unknown nodes, the result will be biased towards areas with densely distributed beacon nodes, potentially resulting in significant errors with the centroid algorithm. Therefore, the algorithm's calculation accuracy is typically not high, and the positioning accuracy may be low. However, the centroid algorithm only needs to broadcast once to locate all unknown nodes. In many applications that do not require high positioning accuracy, the centroid algorithm is still the most suitable method.

APIT is an improved algorithm over the centroid algorithm. It requires a completely random selection of many known coordinate nodes, and the coordinate nodes are grouped in threes. In accordance with these nodes, the triangles drawn on the graph will be completely randomly distributed throughout the entire region. There will be some overlap between these triangles, which is used to calculate the coordinates of unknown nodes. The specific operation steps are: first, multiple coordinate nodes around unknown nodes are identified, and three known location nodes are randomly selected each time. Then, the approximate location of the signals received by these known location nodes is determined. If there are m beacon nodes, the paper will randomly select and match them, and use combinations of three random position points to form C_m^3 triangles. It shows that some triangular regions can contain unknown nodes, while others do not. The triangular regions that contain unknown nodes are connected to each other. Finally, the recorded location algorithm is utilized to calculate the specific location of the unknown nodes. The incorporation of a greater number of unknown nodes into the algorithm results in enhanced accuracy in location estimation [12-13]. However, this is accompanied by an increased computational burden. In such a scenario, choosing a subset of vertices to create a polygon based on the real circumstances, as illustrated in Figure 2, can be beneficial. In Figure 2, the node positioning accuracy of APIT is significantly greater than that of the centroid positioning algorithm.

Figure 2: Schematic diagram of APIT algorithm positioning

Due to the relatively large impact of node density on APIT, when the beacon node density is relatively large, APIT can achieve relatively ideal positioning accuracy. APIT also performs well under irregular wireless signal propagation models and imperfect circular propagation models. However, APIT also has a significant disadvantage. When connecting triangles, it may mistake points located outside a triangle for points inside it. The probability of this situation can reach a maximum of 13% according to research [14], which has a significant impact on positioning accuracy. The algorithm must divide a large number of triangular regions to identify the locations of unknown nodes and necessitates multiple beacon nodes. As a result, the algorithm performs numerous calculations, which elevates the likelihood of encountering errors.

3.2 Construction of improved genetic algorithm model

The research and analysis of the centroid and APIT algorithms in node location optimization have revealed shortcomings in both algorithms concerning their calculation and location processes. Additionally, the use of genetic algorithms for node localization requires extra constraints, which may lead to increased computational time and reduced efficiency, resulting in premature convergence [15]. To obtain better positioning optimization results, an IGA model is studied and constructed. The flow chart of the model is shown in Figure 3, and the blue boxes in the figure show the improved steps.

Figure 3: The improvement process of genetic algorithms

Compared with Traditional Genetic Algorithms (TGA), the paper has improved the node localization of genetic algorithms and constructed a matrix. The specific construction of the matrix is shown in Formula (3).

S(m) = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad (3)

The m in Formula (3) represents a total of m chromosomes in the genetic algorithm, n means that each chromosome has n elements, and s_1, s_2, ..., s_m are the chromosomes. In TGA, determining the value range of genes is a commonly used method to generate an initial population for calculation. If a certain number of initial individuals are randomly generated within this value range, the distribution may be too random. This is basically not helpful for improving algorithm efficiency [16]. To obtain a global optimal solution, the distribution of the initial population in the solution space should be as uniform and dispersed as possible. The schematic diagram of random initial population generation within the overlapping range of the communication areas of different anchor nodes is shown in Figure 4.

Figure 4: Schematic diagram of initial population generation area

IGA improves on TGA in parameter setting, population initialization, fitness function values, selection operations, and crossover operations. The specific key parameter settings are: the population size is 40, the crossover probabilities p_c1 = 0.6 and p_c2 = 0.4, the mutation probabilities p_m1 = 0.08 and p_m2 = 0.06, and the maximum number of iterations is 100. Population initialization can determine the initial population range according to Formula (4) and generate an initial population randomly within this range.

\begin{cases} \max_{i=1,2,\ldots,n}(x_i - d_i) \le x \le \min_{i=1,2,\ldots,n}(x_i + d_i) \\ \max_{i=1,2,\ldots,n}(y_i - d_i) \le y \le \min_{i=1,2,\ldots,n}(y_i + d_i) \end{cases} \quad (4)

In Formula (4), d_i means the distance between unknown node i and the anchor. The fitness function value calculation assumes a total of (M + N) nodes in the wireless sensor network to be located, where the number of known nodes is M and the number of unknown nodes is N. Through a certain distance measurement method, if each unknown node knows the distance between itself and all known nodes within the communication radius, the node position can be obtained through the least squares method [17]. Assuming that the coordinates of the known-location nodes are (x_1, y_1), (x_2, y_2), ..., (x_M, y_M), the coordinate of an unknown node is (x, y), and its distances from the known-location nodes are d_1, d_2, d_3, ..., d_M, the equation set shown in Formula (5) can be established.

\begin{cases} (x - x_1)^2 + (y - y_1)^2 = d_1^2 \\ (x - x_2)^2 + (y - y_2)^2 = d_2^2 \\ (x - x_3)^2 + (y - y_3)^2 = d_3^2 \\ \quad \vdots \\ (x - x_M)^2 + (y - y_M)^2 = d_M^2 \end{cases} \quad (5)

From Formula (5), the fitness function for the genetic algorithm can be defined as Formula (6), and the fitness of the initial population can be calculated with it.

f(x, y) = \frac{1}{M} \sum_{i=1}^{M} \left| \sqrt{(x - x_i)^2 + (y - y_i)^2} - d_i \right| \quad (6)

In Formula (6), (x, y) is the unknown node location, (x_i, y_i) represents a known node location, and d_i refers to the distance from the unknown location to the known location (x_i, y_i). The use of absolute error instead of squared error in the fitness function avoids the calculation of squared errors, reduces complex multiplication operations, and lowers the computational load. During the iteration process, absolute error is more robust to outliers (i.e., it is less affected by data with larger deviations) and also enables the algorithm to approach the global optimal solution faster, accelerating the convergence speed of the algorithm.

The selection operation performs a unified comparison of each individual based on the fitness value calculated from the fitness function. After the comparison is completed, the two individuals with the highest fitness remain unchanged and proceed to the next round of operation. The individuals with the lowest fitness are directly eliminated, and the remaining individuals undergo crossover and mutation operations normally. Special individuals with high fitness values are assigned judgment values to distinguish and limit their reproduction. After completing the full iterative process, the fitness value of each individual should be appropriately amplified [18-19].

The crossover probability is used to control the probability of individuals (chromosomes) performing crossover operations. By calculating an individual's fitness value, it can be determined whether the individual should participate in crossover operations. The goal of the crossover operation is to generate offspring with higher fitness by recombining the genetic information of the parent individuals, gradually approaching the optimal solution. When performing a crossover operation, if F_g >= F_avg, the crossover probability is calculated according to Formula (7).

P_c = p_{c1} \frac{F_g - F_{avg}}{F_{gb} - F_{avg}} \quad (7)

In Formula (7), p_c1 is in (0, 1). F_g is the value of the individual's fitness function, F_gb represents the fitness function value of the optimal individual, and F_avg is the average value of the fitness function. The calculation process includes normalizing the fitness difference and converting the normalized fitness difference into the actual crossover probability. If F_g < F_avg, the crossover probability is calculated according to Formula (8).

P_c = p_{c2} \quad (8)

In Formula (8), p_c2 is in (0, 1). After pairing the chromosomes in the population, the crossover operation is performed based on the calculated crossover probability. A random number between [0, 1] is generated for each chromosome. The objective of this treatment of poorly adapted individuals is to give those with lower fitness a certain opportunity to participate in crossover, increase population diversity, and circumvent premature convergence to local optimal solutions. If the corresponding value is less than the crossover probability, the chromosome is ready to perform the next operation. The chromosomes for the next step are sequentially crossed in pairs. For each pair of crossed chromosomes, the location of the crossing point is determined by random numbers and the crossover operation is performed. During the mutation operation, if F_g >= F_avg, the mutation probability is calculated by Formula (9).

P_m = p_{m1} \frac{F_g - F_{avg}}{F_{gb} - F_{avg}} \quad (9)

In Formula (9), F_g represents the fitness function value of the individual, F_gb is the fitness function value of the optimal individual, and F_avg refers to the average value of the fitness function. If F_g < F_avg, the mutation probability is calculated by Formula (10).

P_m = p_{m2} \quad (10)

In Formula (10), p_m2 is in (0.01, 0.10). The first step is to randomly generate a number between 0 and 1 for each chromosome in the population. If the generated value is less than the mutation probability, the chromosome will undergo the mutation operation. The position of the required mutation is determined by generating a random number, and the next step is to invert the value at that position to complete the mutation operation. Monotonic gene locus detection analyzes the whole population and identifies any monotonic gene loci present. If these are detected, targeted adjustments can be made by generating random numbers. The termination operation should then be evaluated, judging loop termination based on the number of iterations. The final step is to output the optimal solution and test the average positioning error of the algorithm as a performance parameter, as shown in Formula (11).

error = \frac{100}{N R} \sum_{i=1}^{N} \sqrt{(x_{i1} - x_{i2})^2 + (y_{i1} - y_{i2})^2}\ \% \quad (11)

In Formula (11), N is the total number of nodes, (x_{i1}, y_{i1}) and (x_{i2}, y_{i2}) (i = 1, 2, 3, ..., N) represent the actual and calculated coordinates of the unknown node i, and R is the maximum communication distance of the node.

In the process of transforming IGA theory into practical applications, the rigor of the mathematical analysis is reflected in the precise modeling and dynamic adjustment of the fitness function, crossover probability, and mutation probability. The dynamic allocation of the crossover probability is achieved by comparing individual fitness with the population average fitness and the optimal fitness. This allows individuals with higher fitness to have a higher probability of crossover, thereby accelerating the spread of excellent genes, while preserving a small number of crossover opportunities for individuals with lower fitness and maintaining population diversity. This normalization mechanism based on fitness differences effectively balances local search and global search, avoids premature convergence of the algorithm, and improves solution accuracy and efficiency. Furthermore, the implementation of random number generation techniques and probability judgment processes enables the transformation of theoretical models into practical operations, thereby ensuring the randomness and controllability of crossover and mutation. This approach facilitates the robustness and convergence of the algorithm in complex optimization problems, achieving efficient integration of theory and practice. When using IGA for computer network optimization design, the network optimization problem is first modeled as a fitness function that measures network performance indicators. Then, through iterative evolution via selection, crossover, and mutation operations, the crossover and mutation probabilities are dynamically adjusted to optimize the network structure and parameter configuration, thereby achieving efficient and accurate network optimization design.

4 Performance analysis of computer network optimization design model based on genetic algorithms

4.1 Performance analysis of node location based on centroid location algorithm and APIT algorithm

To verify the actual positioning effects of the centroid algorithm, APIT, and IGA, simulation experiments are conducted on the three algorithms in MATLAB. The reason for choosing APIT and the centroid algorithm as benchmarks is their effectiveness and wide application in network optimization. The APIT algorithm performs well in localization problems and is suitable for evaluating the accuracy and reliability of network nodes, serving as a benchmark for network performance optimization in the research. The centroid algorithm is known for its simplicity, ease of use, and fast convergence, making it suitable for solving optimization problems in basic network structures. The selection of these two algorithms covers different types of network optimization requirements. Through comparison, the advantages of IGA in solving complex optimization problems can be clearly demonstrated. MATLAB version R2021a is used, and the hardware specifications are as follows: Intel Core i7-9700K processor, 32 GB DDR4 RAM, 512 GB solid-state drive, and Windows 10 Professional 64-bit operating system.
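As a concrete illustration of the centroid estimate in Formula (1), the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation; the function name is illustrative.

```python
def centroid_estimate(beacons):
    """Estimate an unknown node's position as the mean of the beacon
    coordinates it can hear, following Formula (1)."""
    n = len(beacons)
    x = sum(p[0] for p in beacons) / n  # (1/n) * sum of x_i
    y = sum(p[1] for p in beacons) / n  # (1/n) * sum of y_i
    return (x, y)

# Four beacons at the corners of a unit square give the centroid (0.5, 0.5).
print(centroid_estimate([(0, 0), (1, 0), (1, 1), (0, 1)]))
```

The same function covers the octagon special case of Formula (2) when eight vertices are passed in.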
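The fitness function and adaptive probability rules of Formulas (6) through (10) can likewise be sketched as follows. This is an illustrative sketch rather than the paper's code: the default values p_c1 = 0.6, p_c2 = 0.4, p_m1 = 0.08, p_m2 = 0.06 are the parameter settings quoted in Section 3.2, and the degenerate case F_gb = F_avg is not handled.

```python
import math

def fitness(x, y, beacons, dists):
    """Formula (6): mean absolute difference between the computed and
    measured beacon distances; a smaller value means a better estimate."""
    return sum(abs(math.hypot(x - xi, y - yi) - di)
               for (xi, yi), di in zip(beacons, dists)) / len(beacons)

def crossover_prob(Fg, Favg, Fgb, pc1=0.6, pc2=0.4):
    """Formulas (7)-(8): scale p_c1 by the normalized fitness gap when
    Fg >= Favg, otherwise fall back to the constant p_c2."""
    return pc1 * (Fg - Favg) / (Fgb - Favg) if Fg >= Favg else pc2

def mutation_prob(Fg, Favg, Fgb, pm1=0.08, pm2=0.06):
    """Formulas (9)-(10): the same adaptive scheme for mutation."""
    return pm1 * (Fg - Favg) / (Fgb - Favg) if Fg >= Favg else pm2

# A node at (1, 1) with exact distances to two beacons has zero residual.
r = math.hypot(1, 1)
print(fitness(1, 1, [(0, 0), (2, 2)], [r, r]))
print(crossover_prob(Fg=1.0, Favg=0.5, Fgb=1.0))  # best individual gets p_c1
```

The branch on F_g versus F_avg is what gives below-average individuals the fixed fallback probabilities p_c2 and p_m2, preserving the diversity discussed above.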
178 Informatica 49 (2025) 171–186 C. Zhong et al.

The algorithm sets the population size to 100, the number of iterations to 500, the crossover probability to 0.8, and the mutation probability to 0.05. The elite strategy retains the top 10% of individuals. To ensure the statistical validity of the test scenario, multiple sets of experiments are designed and optimized for network topologies of different sizes and complexities. Each experiment is repeated at least 30 times to obtain stable average performance indicators and standard deviations, ensuring the reliability of the results. Statistical analysis of the collected performance indicators is used to evaluate the significant differences between algorithms under different configurations, thereby determining the efficacy of the optimization effects. When comparing, a null hypothesis and an alternative hypothesis are set; the P-value represents the probability of obtaining the current or a more extreme result under the null hypothesis. The t-test is used to compare the results of IGA and the benchmark algorithms, and if the P-value is less than 0.05 the difference is considered statistically significant.

The experiment generates 20 anchor nodes and 80 unknown nodes in a 100×100 area. After generating this region, the nodes are predicted by running the corresponding algorithm. Figure 5 shows the original node distribution diagram: the green circles represent anchor nodes, while the blue pentagons represent unknown nodes.

Figure 5: Original node distribution diagram

Figure 6 shows the positioning results of the three algorithms; the results of the centroid algorithm, APIT, and IGA are shown in Figure 6(a), Figure 6(b) and Figure 6(c), respectively. In Figure 6, the predicted values of IGA have the highest coincidence rate with the unknown nodes, reaching 94.36% (P<0.05). The predicted values of the centroid algorithm have the lowest coincidence rate with the unknown nodes, 86.25%, and the coincidence rate of APIT is 89.67%. The coincidence rate of IGA is thus 8.16% higher than that of the centroid algorithm and 4.69% higher than that of APIT (P<0.05). This means IGA has a high positioning computing ability. The positioning errors of the centroid algorithm, APIT, and IGA are plotted in Figure 7.

Figure 6: Location results of the three algorithms — (a) centroid algorithm (coincidence rate 86.25%), (b) APIT (coincidence rate 89.67%), (c) improved genetic algorithm (coincidence rate 94.36%)

Figure 7: Positioning error of the three algorithms

The centroid algorithm in Figure 7 reaches a maximum error rate of 32% during positioning, and its overall average node positioning error rate is 14% (P<0.05). The maximum error rate of APIT is 20%, with its overall average error decreased to 9% (P<0.05). This indicates that, compared to the centroid positioning algorithm, the predicted coordinate error of the APIT positioning algorithm is significantly reduced, giving better positioning results. The maximum error rate of IGA during positioning is 13%, and its overall average error is 5.2% (P<0.05). The maximum error of IGA is therefore 19% lower than the centroid algorithm's and its overall average error 8.8% lower (P<0.05); compared to APIT, the maximum error of IGA is 7% lower and the average overall error 3.8% lower (P<0.05).

APIT improves positioning accuracy through random triangle coverage, but relies on high-density anchor nodes and is prone to misidentifying points outside a triangle as internal points. The centroid algorithm is computationally simple and suitable for scenarios that do not require high accuracy; however, its accuracy is low and it is easily affected by uneven node density, resulting in positioning bias toward areas with dense anchor nodes and significant errors. IGA introduces a method of dynamically adjusting the crossover and mutation probabilities during the evolution process, based on individual fitness and the population average fitness, avoiding premature convergence and ensuring the search for the global optimum; it also performs local fine optimization, improving the accuracy and stability of the algorithm. The comparison of the three shows that IGA has excellent node positioning capabilities in wireless sensor networks.

4.2 IGA-based application analysis of computer network optimization design

There are differences in the performance of IGA and other node localization algorithms for wireless sensor networks under different iteration counts. Hence, under the same parameter conditions, this study gradually changes the number of iterations and simulates TGA, the centroid positioning algorithm, APIT, and IGA, respectively, so that each iteration number and its corresponding fitness value are obtained. As the number of iterations gradually increases, Figure 8 illustrates the corresponding changes in fitness between IGA and the other algorithms for positioning wireless sensor network nodes.

Figure 8: Changes in fitness values of the four methods

Figure 9: Relationship between the number of anchor nodes and average error

Figure 8 shows that the fitness values of the four localization algorithms are all less than 10. The fitness values of IGA, TGA, the centroid algorithm, and APIT are 4.26, 8.15, 6.42, and 5.31, and their iteration numbers are 69, 86, 83, and 79, respectively. IGA's fitness value is the lowest: 3.89 lower than that of TGA, 2.03 lower than the centroid algorithm, and 1.05 lower than APIT. This shows that IGA has better adaptability in node localization and verifies the superiority of the algorithm. The relationship between the number of anchor nodes and the average error of the four algorithms is shown in Figure 9. When the number of anchor nodes is held constant, the average error of IGA is the lowest among the four algorithms, which fully demonstrates the advantages of IGA. Figure 10 displays the relationship between the communication radius of a node and the average error of the four algorithms.
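As a point of reference for the comparisons above, the baseline centroid method can be sketched in a few lines of Python: each unknown node is estimated as the mean position of the anchor nodes inside its communication radius. The helper names and the radius-normalised error metric are illustrative assumptions, not code from the paper.

```python
import math

def centroid_estimate(node, anchors, radius):
    """Baseline centroid localisation (illustrative sketch).

    `node` is the ground-truth position, used here only to simulate
    which anchors are within radio range. The estimate is the mean of
    all in-range anchor positions: cheap, but biased toward regions
    where anchors are dense, as noted in the text.
    """
    near = [a for a in anchors if math.dist(node, a) <= radius]
    if not near:
        return None  # node hears no anchor: unlocalisable by this method
    return (sum(x for x, _ in near) / len(near),
            sum(y for _, y in near) / len(near))

def positioning_error(estimate, true_pos, radius):
    """Error as a fraction of the radio range, a common WSN convention."""
    return math.dist(estimate, true_pos) / radius
```

With anchors spread evenly around a node the estimate is exact; with anchors clustered on one side, the estimate is pulled toward the cluster, which is the density bias criticised above.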
In Figure 9, as the number of anchor nodes increases, the average error of the four positioning algorithms gradually decreases. The average errors of TGA, the centroid algorithm, APIT, and IGA are 1.12, 1.03, 0.95, and 0.68, respectively (P<0.05), and the average error of IGA is significantly lower than that of TGA. Meanwhile, as the number of anchor nodes increases gradually in proportion to all nodes, the average error of the various algorithms undergoes only slight changes, as shown by the curves.

In Figure 10, as the communication radius of the nodes increases, the average error of the four node positioning algorithms gradually decreases. The average errors of TGA, the centroid algorithm, APIT, and IGA are 5.75%, 4.52%, 3.87%, and 2.46%, respectively (P<0.05). For the same communication radius, IGA's average error is always the lowest, which verifies the superiority of IGA when the communication radius of nodes changes. Figure 11 shows the relationship between network connectivity and the average error of the four algorithms.

Figure 10: Relationship between node communication radius and average error

Figure 11: Relationship between network connectivity and average error

The average errors of TGA, the centroid algorithm, APIT, and IGA in Figure 11 are 5.41%, 4.49%, 3.71%, and 2.41%, respectively (P<0.05). This indicates that regardless of changes in network connectivity, IGA's positioning ability is always higher than that of the other three algorithms. Figure 12 shows the relationship between node coverage, evolutionary generations, and completion time for TGA and IGA.

The node coverage of the two algorithms in Figure 12(a) shows that, at the same node density, IGA achieves higher regional coverage. Figure 12(b) displays the relationship between the number of evolutionary generations and the completion time of both algorithms. As the iterations progress, the number of evolutionary generations gradually increases while the time required to complete all iterations decreases; for the same evolutionary generation, IGA takes less time. This shows the superiority and stability of IGA.

Figure 12: The relationship between node coverage, evolutionary generations, and completion time of the two algorithms — (a) node coverage, (b) evolutionary generations and completion time

To further analyze the adaptability and superiority of the research method, an additional application analysis is conducted in a large-scale wireless sensor network node positioning scenario in a region with a side length of 500 m. The total number of sensor nodes in the region is 500, including 100 anchor nodes and 400 unknown nodes. The results of the large-scale application analysis are shown in Table 2.

Table 2: Results of application analysis in large-scale scenarios
Metric | IGA | APIT algorithm | Centroid algorithm
Average positioning error (%) | 2.45 | 5.62 | 7.89
Positioning time (s) | 12.3 | 18.4 | 9.6
Number of iterations | 75 | 120 | 60
Convergence speed | Fast (0.5 fitness variation) | Medium (0.8 fitness variation) | Slow (1.2 fitness variation)
Node coverage rate (%) | 98.6 | 90.5 | 85.3

As shown in Table 2, the average positioning error of IGA is 2.45%, significantly lower than the 5.62% of the APIT algorithm and the 7.89% of the centroid algorithm (P<0.05). This indicates that IGA can effectively improve the accuracy of node localization in large-scale scenarios and is suitable for complex, high-precision network environments. The positioning time of IGA is 12.3 seconds, between the 18.4 seconds of APIT and the 9.6 seconds of the centroid algorithm (P<0.05). Although its computational complexity is slightly higher than that of the centroid algorithm, IGA improves efficiency by optimizing the evolution process, maintaining fast computation while achieving high-precision positioning. After 75 iterations, IGA achieves convergence, faster than the APIT algorithm's 120 iterations (P<0.05), demonstrating the advantages of IGA's dynamic parameter adjustment and elite strategy in the search process; although the centroid algorithm needs fewer iterations, its accuracy is significantly insufficient (P<0.05). The node coverage rate of IGA reaches 98.6%, much higher than the 90.5% of the APIT algorithm and the 85.3% of the centroid algorithm (P<0.05). This indicates that IGA has better coverage performance in large-scale networks and can optimize the node positioning layout more comprehensively.

4.3 Discussion

This study has designed an IGA that effectively improves the accuracy and stability of node localization in wireless sensor networks through techniques such as dynamic parameter adjustment, fitness function optimization, and an elite strategy. Compared with the traditional centroid and APIT algorithms, IGA exhibits significant advantages in key performance indicators. Specifically, the average positioning error of IGA was 2.45%, much lower than APIT's 5.62% and the centroid algorithm's 7.89%, indicating that IGA has significant advantages in node positioning accuracy. At the same time, IGA had a faster convergence speed, requiring only 75 iterations to reach the optimal solution with a stable fitness value change (0.5), while APIT and the centroid algorithm required 120 and 60 iterations, respectively, and had slower convergence behavior. In addition, IGA achieved a node coverage rate of 98.6%, significantly higher than APIT (90.5%) and the centroid algorithm (85.3%), demonstrating its applicability and advantages in large-scale complex network environments.

The reason why IGA outperforms traditional methods in terms of positioning error and fitness values lies mainly in several key technological innovations. Firstly, the dynamic parameter adjustment mechanism adjusts the crossover and mutation probabilities based on the fitness value, thereby balancing global and local search and preventing the algorithm from getting stuck in local optima; this is consistent with the ideas of Yu et al. [20]. Secondly, the fitness function optimization reduces computational complexity and enhances robustness to outliers by introducing absolute error instead of the traditional squared error, enabling the algorithm to approach the global optimum more quickly. In addition, the elite strategy ensures the retention of high-fitness individuals and reduces the loss of high-quality solutions, while the uniform distribution of the initial population within the communication area improves search efficiency and reduces the ineffective calculations caused by random initialization. These results are consistent with Singh et al.'s study [21]. These improvements effectively address common pitfalls of TGAs, such as local optima and premature convergence, enabling IGA to exhibit higher stability and accuracy in complex dynamic network environments. The core innovation of IGA lies in combining the local improvement of TGAs with global search, which suits non-standard situations such as uneven node distribution, a limited number of anchor nodes, and complex conditions such as changes in communication radius. In practical applications, IGA demonstrates good stability and adaptability by flexibly adjusting parameters and optimizing the search space. In previous studies, TGAs often fell into local optimal traps, leading to premature convergence. This study enhances population diversity, reduces the interference of outliers on the search process, and accelerates convergence to the global optimum by giving low-fitness individuals moderate opportunities for crossover and mutation. The research thus provides a more stable, accurate, and efficient solution for node localization and optimization in complex network environments.

5 Conclusion

The high-speed development of computer network technology has caused tremendous changes in people's production and life. Currently, computer network optimization still suffers from low positioning accuracy of network nodes. To solve these problems, this study constructed an IGA model and applied it to computer network optimization. Experimental results showed that IGA significantly improved location coverage and average location error compared to the centroid algorithm and APIT. The coincidence rate of the improved algorithm was 8.16% higher than the centroid algorithm's and 4.69% higher than the APIT algorithm's. The maximum error of IGA was 19% lower than the centroid algorithm, and the overall average error was 8.8% lower; compared to APIT, the maximum error of IGA was 7% lower and the average overall error 3.8% lower. Under the same parameters, TGA, the centroid algorithm, APIT, and IGA were used to compare the performance of network nodes in computer networks. The experimental data showed that IGA's fitness value, its average error against the number of anchor nodes, its average error against the communication radius, and its average error against the network connectivity were 4.26, 0.68, 2.46%, and 2.41%, respectively. IGA showed a significant improvement over the corresponding values of the three other algorithms, which proves the accuracy and stability of the improved genetic positioning algorithm.

6 Abbreviated list

NT: Network Topology
PSOGA: Particle Swarm Optimization and Genetic Algorithm
RLN: Random Linear Network
IGA: Improved Genetic Algorithm
TGA: Traditional Genetic Algorithm
APIT: Approximate Point In Triangulation Test

References

[1] F. Wang, X. Lai, and N. Shi, "A multi-objective optimization for green supply chain network design," Decision Support Systems, vol. 51, no. 2, pp. 262-269. https://doi.org/10.1016/j.dss.2010.11.020
[2] Q. Liu, Z. Guo, and J. Wang, "A one-layer recurrent neural network for constrained pseudoconvex optimization and its application for dynamic portfolio optimization," Neural Networks, vol. 26, pp. 99-109, 2012. https://doi.org/10.1016/j.neunet.2011.09.001
[3] J. L. Ribeiro Filho, P. C. Treleaven, and C. Alippi, "Genetic-algorithm programming environments," Computer, vol. 27, no. 6, pp. 28-43, 1994. https://doi.org/10.1109/2.294850
[4] C. D. Lin, C. M. Anderson-Cook, M. S. Hamada, L. M. Moore, and R. R. Sitter, "Using genetic algorithms to design experiments: a review," Quality and Reliability Engineering International, vol. 31, no. 2, pp. 155-167, 2015. https://doi.org/10.1002/qre.1591
[5] W. B. Fan, F. Xiao, X. B. Chen, L. Cui, and S. Yu, "Efficient virtual network embedding of cloud-based data center networks into optical networks," IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 11, pp. 2793-2808, 2021. https://doi.org/10.1109/TPDS.2021.3075296
[6] B. Rajendran and S. Venkataraman, "Detection of malicious network traffic using enhanced neural network algorithm in big data," International Journal of Advanced Intelligence Paradigms, vol. 19, no. 3-4, pp. 370-379, 2021. https://doi.org/10.1504/ijaip.2021.116366
[7] A. Xiaokaiti, Y. Qian, and J. Wu, "Efficient data transmission for community detection algorithm based on node similarity in opportunistic social networks," Complexity, vol. 2021, pp. 1-18, 2021. https://doi.org/10.1155/2021/9928771
[8] R. Alsaqour, S. Kamal, M. Abdelhaq, Y. Zan, and D. Jerou, "Genetic algorithm routing protocol for mobile ad hoc network," CMC Computers Materials Continua, vol. 68, no. 1, pp. 941-960, 2021. https://doi.org/10.32604/cmc.2021.015921
[9] B. Bu, "Mult-task equilibrium scheduling of internet of things a rough set genetic algorithm," Computer Communications, vol. 184, pp. 42-55, 2022. https://doi.org/10.1016/j.comcom.2021.11.027
[10] R. Y. Wu, J. M. Ma, Z. X. Tang, X. H. Li, and K. K. R. Choo, "A generic secure transmission scheme based on random linear network coding," IEEE/ACM Transactions on Networking, vol. 30, no. 2, pp. 855-866, 2021. https://doi.org/10.1109/TNET.2021.3124890
[11] W. C. Chang and I. H. R. Jiang, "iClaire: A fast and general layout pattern classification algorithm with clip shifting and centroid recreation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 8, pp. 1662-1673, 2019. https://doi.org/10.1109/TCAD.2019.2917849
[12] T. Ganesan and P. Rajarajeswari, "Efficient sensor node connectivity and target coverage using genetic algorithm with Daubechies 4 lifting wavelet transform," International Journal of Communication Networks and Distributed Systems, vol. 28, no. 3, pp. 337-364, 2022. https://doi.org/10.1504/ijcnds.2022.122170
[13] S. T. Shishavan and F. S. Gharehchopogh, "An improved cuckoo search optimization algorithm with genetic algorithm for community detection in complex networks," Multimedia Tools and Applications, vol. 81, no. 18, pp. 25205-25231, 2022. https://doi.org/10.1007/s11042-022-12409-x
[14] B. Nahavandi, M. Homayounfar, A. Daneshvar, and S. Mohammad, "Hierarchical structure modelling in uncertain emergency location-routing problem using combined genetic algorithm and simulated annealing," International Journal of Computer Applications in Technology, vol. 68, no. 2, pp. 150-163, 2022. https://doi.org/10.1504/ijcat.2022.123466
[15] Z. Sabir, M. R. Ali, and R. Sadat, "Gudermannian neural networks using the optimization procedures of genetic algorithm and active set approach for the three-species food chain nonlinear model," Journal of Ambient Intelligence and Humanized Computing, vol. 14, no. 7, pp. 8913-8922, 2023. https://doi.org/10.1007/s12652-021-03638-3
[16] C. Zhao, W. X. Zhu, G. Qiao, and F. Zhou, "Optimisation method with node selection and centroid algorithm in underwater received signal strength localization," IET Radar, Sonar & Navigation, vol. 14, no. 11, pp. 1681-1689, 2020. https://doi.org/10.1049/iet-rsn.2020.0178
[17] Y. Zou, "Coupled neural networks and genetic algorithms application in the field of mine fire extinguishing," Informatica, vol. 48, no. 16, 2024. https://doi.org/10.31449/inf.v48i16.6317
[18] Y. M. Wu, Z. Li, C. X. Sun, Z. B. Wang, D. S. Wang, and Z. W. Yu, "Measurement and control of system resilience recovery by path planning based on improved genetic algorithm," Measurement and Control, vol. 54, no. 7-8, pp. 1157-1173, 2021. https://doi.org/10.1177/00202940211016094
[19] Y. Zhou, "Structural damage identification of large-span spatial grid structures based on genetic algorithm," Informatica, vol. 48, no. 17, 2024. https://doi.org/10.31449/inf.v48i17.6428
[20] M. Yu and S. Chai, "Adaptive iterative learning control for discrete time nonlinear systems with multiple iteration varying high order internal models," International Journal of Robust and Nonlinear Control, vol. 31, no. 15, pp. 7390-7408, 2021. https://doi.org/10.1002/rnc.5690
[21] G. Singh, V. K. Tewari, R. R. Potdar, and S. Kumar, "Modeling and optimization using artificial neural network and genetic algorithm of self-propelled machine reach envelope," Journal of Field Robotics, vol. 41, no. 7, pp. 2373-2383, 2024. https://doi.org/10.1002/rob.22255

https://doi.org/10.31449/inf.v49i16.7452 Informatica 49 (2025) 187-198 187

Optimization of Emergency Material Logistics Supply Chain Path Based on Improved Ant Colony Algorithm

Mingbin Wei
College of Economics and Management, YanShan University, Qinhuangdao 066000, Hebei, China
E-mail: weimingbin@stumail.ysu.edu.cn

Keywords: emergency material, improved ant colony algorithm (IACA), logistics supply chain, travelling salesman problem

Received: October 29, 2024

Path selection is a critical challenge in emergency logistics management, particularly under realistic disaster-related conditions. This study addresses the problem of optimizing logistics transportation during major epidemics, considering constraints such as vehicle load, volume, and maximum travel distance per delivery. The goal is to minimize costs related to distribution trips, time, early/late penalties, and fixed vehicle expenses. By framing the problem as a generalized Traveling Salesman Problem, we developed an Improved Ant Colony Algorithm (IACA) to reduce the longest distribution path.
Simulation data from Pudong, Shanghai lockdown zones revealed that IACA outperformed the traditional ACO algorithm, achieving a 30% cost reduction and higher accuracy (R² = 0.98). Additionally, experiments on gate assignment and TSP demonstrated the algorithm's superior optimization ability and stability. Overall, IACA enhances delivery route efficiency, lowers costs and energy consumption, and improves emergency logistics performance, proving to be a robust and reliable solution.

Povzetek: Avtor je razvil izboljšan algoritem kolonije mravelj (IACA), ki optimizira poti v logistični oskrbovalni verigi za nujne materiale.

1 Introduction

The use of the logistics industry across many different sectors is growing more and more common as the global economy develops. Researchers are becoming more aware of the crucial role emergency logistics (EL) plays in delivering supplies to disaster zones as a result of the exponential rise in emergency response operations and the rising frequency of disasters, both man-made and natural [1]. In particular, since 1980 natural catastrophes have claimed the lives of over 2.4 million people globally, and their economic toll has grown by more than 800%, reaching $210 billion in 2020 alone. In that same year, 137 million people in China were directly affected by natural disasters, resulting in 19,956.7 hectares of damaged crops, 370.15 billion yuan in direct economic losses, and 591 fatalities [2]. The National Disaster Reduction Commission and the China National Ministry of Emergency Management both state that in 2018 China saw 130 million individuals affected by various natural disasters, 588 fatalities, and 264.46 billion RMB in direct economic losses. Taking earthquakes as an example, in 2018 earthquakes of magnitude 6.0 or higher killed 3,068 people worldwide and wounded over 16,000 more. To reduce casualties and property damage following a disruption, emergency materials must be delivered to disaster zones promptly, precisely, and efficiently [3]. Emergency rescue is usually quite urgent, because most disruptions are unpredictable and the emergency material elements are very complicated.

In order to effectively provide goods while minimizing losses, emergency logistics was first established in 2004 to address the logistical challenges resulting from disasters. With its emphasis on facility placement and material delivery, it is essential to emergency decision-making. Supply channels are frequently disrupted by disasters, resulting in large losses: 15% to 20% of all disaster losses may be attributable to inefficient distribution [4]. The focus of humanitarian logistics modeling services, which account for 80–90% of rescue expenses, is on rapid-reaction deployment, which is essential for successful rescue operations. The ideas under discussion centre on optimizing the distribution, transportation, and positioning of emergency supplies to rescue locations. For disaster response to be successful, emergency logistics efficiency must be increased [5].

For emergency rescue and relief efforts to be successful, the emergency supply chain must run smoothly. It is quite challenging to gather comprehensive information to support emergency operations during large-scale calamities, since they are frequently unanticipated and exceedingly destructive. The victims' livelihoods and security may be negatively impacted by ineffective or even halted emergency material operations brought on by a shortage of supplies and knowledge [6]. Therefore, when reacting to large-scale emergencies, it is essential to solve the fundamental concerns of rapid acquisition and integration of comprehensive emergency supply chain materials and information flow.

The planning, coordination, and execution of logistics operations during emergencies, disasters, and crises are guided by the framework and set of principles known as emergency logistics theory. It includes a variety of ideas, tactics, and procedures meant to guarantee the smooth and efficient movement of information, products, services, and resources in order to meet the demands of impacted communities and lessen the effects of the crisis. One study uses a hybrid technique of simulated annealing and ant colony optimization to offer a low-carbon vehicle route optimization model for logistics and distribution; for increased efficiency, it uses an adaptive elite individual reproduction strategy and adds a multifactor operator and a carbon emission factor [7].

The most important factor should be the timeline: only if the emergency supplies are delivered to the disaster supply depots accurately and on time will the damage be reduced. The fastest delivery time is therefore practically the most crucial factor in the improved ant colony model. We suggest an improved-ACA-based approach to solving the emergency material path routing problem. To get the best answer, the process determines the shortest path between nodes using a travelling-salesman shortest-path tree structure [8]. According to research findings, the suggested approach performed admirably in various disaster networks. In conclusion, even though ACO has been researched extensively and shown to perform effectively in organising routes, it is incredibly uncommon to utilise ant colony methods to optimise the supply chain route for emergency material logistics in disaster areas.

Researchers are becoming more interested in emergency logistics. Nonetheless, the majority of recent studies address the location of facilities. This study focusses on the supply chain for emergency material logistics following a disaster: after a disaster strikes, it seeks to create the best plans possible for moving emergency supplies from one-to-many supply depots to disaster depots. Fig. 1 depicts the emergency management domain. In order to meet the demands of victims and finish rebuilding the disaster region after a disaster occurs, EM is a specific type of vehicle routing problem that examines how to transport relief materials from supply depots to demand depots (disaster areas). EMS is typically separated into many-to-many scenarios based on the quantity of supply depots, as seen in Fig. 2.

The study's primary contributions fall under the following categories.
• First, this study identifies and measures the variables that have been discovered to affect ACA's efficacy from both an internal and external standpoint. This includes the IACA's shortest route while taking into account the supply chain's external environment, funding for materials and equipment, complex emergency decision-making, and material transportation deployment. This is a definite step toward filling the knowledge gap in the existing literature.
• Second, in terms of research methodology, the majority of models in use today rely on traditional algorithms, including exact, heuristic, and meta-heuristic algorithms.
• Third, in order to show the validity of the results, we contrasted the outcomes of the IACA method with those of the conventional ACA strategy. Furthermore, these findings will help emergency managers better pinpoint the sources and means of important elements, as well as the causal and hierarchical connections among them, and contribute to the development of a robust and effective path.
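The ant-colony routing idea behind the IACA can be illustrated with a minimal ACO on a small symmetric TSP instance. This is a generic textbook-style sketch under assumed parameter names (n_ants, alpha, beta, rho); it is not the paper's IACA and omits the load, volume, and time-penalty constraints of the emergency-logistics model.

```python
import math
import random

def aco_tsp(coords, n_ants=10, n_iters=50, alpha=1.0, beta=3.0, rho=0.5, seed=1):
    """Minimal ant colony optimisation for a small symmetric TSP (sketch).

    Each ant builds a closed tour city by city, choosing the next city
    with probability proportional to pheromone^alpha * (1/dist)^beta.
    After every iteration pheromone evaporates at rate rho and each
    tour deposits pheromone inversely proportional to its length.
    """
    rng = random.Random(seed)
    n = len(coords)
    dist = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]  # pheromone matrix
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            todo = set(range(n)) - {tour[0]}
            while todo:
                i = tour[-1]
                cand = list(todo)
                w = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                nxt = rng.choices(cand, weights=w)[0]
                tour.append(nxt)
                todo.remove(nxt)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        for row in tau:  # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for tour, length in tours:  # deposit, symmetric problem
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len
```

On the four corners of a unit square the optimal closed tour is the perimeter of length 4, which this sketch recovers quickly; an "improved" variant would add constraint handling and adaptive pheromone rules on top of this loop.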
This study is organized as follows: the literature review is presented in Section 2; the research strategy and methodology are described in depth in Section 3; the application of the proposed Improved-ACO algorithm is shown in Section 4, followed by the application's outcomes and their discussion; Section 5 summarises the findings, study shortcomings, and future research directions.
Figure 1: Supply depots to demand depots
Figure 2: Quantity of supply depots in a many-to-many scenario

2 Literature review

These days, the transportation sector is growing quickly, and its main concentration is the logistical distribution of perishable agricultural goods. Experts recognise the practical usefulness of path optimisation methods for the emergency material logistics supply chain, and the supply chain for perishables has been examined by several academics.
Optimization of Emergency Material Logistics Supply Chain Path... Informatica 49 (2025) 187-198 (M. Wei)
For the uncertainty of unanticipated events on city roadways during shipping, a GA-based path optimisation model was introduced [9]. As part of the endeavour to reduce transportation costs, a logistics path optimisation model with a challenging window was developed to address path dynamics. The experimental outcomes showed that this path optimisation model performed successfully. To address the sustainable food supply chain optimisation issue, one study presented a mixed integer linear programming model; with the aim of reducing fuel costs, transportation expenses and carbon emissions were integrated, and Norwegian salmon exporters were used to conduct a suitability analysis [10]. For the prompt delivery of disaster relief materials after natural catastrophes, the authors of [11] proposed a hybrid meta-heuristic algorithm. Three optimisation improvement approaches were presented, the urgency coefficient of each demand point was evaluated, and Harris hawk optimisation and random PSO were integrated; the study shows that the suggested approach had a maximum degree of computational correctness. Another study developed a path optimisation technique based on an enhanced GA to meet the cost and efficiency criteria of distributing fresh food. The procedure implemented a linear adaptive cross-variance technique and designated certain elements as penalty factors; the findings show that the approach could successfully reduce delivery path length and had a higher path optimisation efficiency [12]. A two-objective optimisation model was presented for the adaptable design of perishable supply chain commodities [13]. During the process, product deterioration and route disruption were thoroughly examined, a utility GA was added to optimise the method, and dynamic pricing was employed to handle crises. The experimental results demonstrated the good flexibility of the suggested approach.
Using Ant Colony Optimization (ACO) or other computing technologies has also been researched by several academics. An ACO-based optimisation technique was put forth to address the issue of UAV scheduling routes. The procedure was solved for DSP and optimised for hierarchical "pheromone"-based processing; according to the testing results, the suggested approach has outstanding path planning speed and good path planning quality [14]. To solve the path routing problem of fourth-party logistics, a method based on the ant colony system and the improved grey wolf algorithm was proposed. During the process, which included a carrying capacity and a reputation constraint from the beginning node to the destination node (known as the transit range), ratio utility theory was used to determine the customer's risk appetite. The results demonstrated that the recommended strategy could effectively finish path optimisation planning [15]. For the return-path planning challenge of reverse logistics networks, an ACO-based path technique was suggested. The procedure created a MINLP model, evaluated costs using a closed-loop, multi-stage logistics network, and was tested on thirty instances; experimental results showed that the suggested approach could produce return pathways of excellent quality [16]. An ACO-based approach was also applied to the routing nodal path problem: to get the best answer, the process determined the shortest path between nodes using a rooted shortest-path tree structure, and the suggested approach performed effectively in networks of various sizes.
In conclusion, despite extensive research and demonstrated effectiveness in route planning, ACO is currently rarely used to optimise the routes taken by cold chain logistics to distribute perishable agricultural goods. Given the pressing need for additional technical references to support the growth of the cold chain logistics distribution industry, the Improved ACO-based optimisation model was put forth, and the technical features of ACO were applied to optimise the cold chain logistics distribution path for perishable agricultural products [17].

Table 1: Summary of existing and suggested methods compared on computational accuracy, time efficiency and cost reduction
| Algorithm | Computational accuracy | Time efficiency | Cost reduction |
| TGWO | TGWO serves as a decision support tool to boost supply chain performance, cut expenses, and improve cold chain logistics operations. | Compared to the TS and GWO algorithms, the TGWO method reduced the overall journey distance by 50.34% and 30.66%, respectively. | In terms of the overall cost of distribution, it saved 14.34% and 9.03%. |
| IPSO | The algorithm and emergency logistics vehicle route optimization model for severe epidemics suggested in that research work well. | If every demand's delivery priority is met, an enhanced vehicle routing optimization algorithm that takes delivery urgency into account can save a certain amount of time. | According to the sensitivity analysis, when the time cost is at its lowest, there should be three vehicles in the distribution center; the whole expense was lowered by 20.09%. |
| NN | The prediction findings demonstrate increased accuracy and better route matching. | Chooses the material delivery route with the quickest speed and the lowest distance. | Reduces the percentage of transportation expenses as much as possible, saving money on supplies and vehicles. |
| IACA [Proposed] | The suggested model achieved the best solution accuracy, at 98.5%. | Uses the travelling salesman problem to find the shortest route; efficient in time and travelling distance for emergency supply. | Cost reduction based on transportation, inventory, labor and distance; avoids traffic hazards by 30.2%. |

3 Methodology

Inspiration: ant colony optimisation is an iterative process. At each cycle, several fictitious ants are considered. Each ant constructs a solution by moving from vertex to vertex on the graph, with the restriction that it may not visit any vertex it has already reached during its walk. At each stage of the solution-building process, an ant uses a stochastic mechanism biased by the "pheromone" to choose the next vertex to visit. For instance, the next vertex after vertex i is selected at random from among those that have not yet been visited; more specifically, a vertex can be chosen with a probability proportional to the "pheromone" connected to edge (m, n) if it has never been visited before. At the end of an iteration, the "pheromone" values are modified depending on the calibre of the solutions the ants have created; by doing this, the ants are biased toward creating solutions similar to the best ones they produced in previous cycles. The basic concept underlying the AC method is inspired by the way ant colonies behave when looking for food. Ants usually start by aimlessly searching for food, bringing some of what they find back to their colony and leaving a "pheromone" on the track they found. The worth of the "pheromones" left in their wake, which gradually dissipates, depends on the quantity and calibre of the food. The "pheromone" that remains on a trail may persuade other ants to follow it; the strongest "pheromone" indicates a shorter path, which most ants eventually follow.

3.1 Improved ant colony algorithm

This study uses the improved ant colony algorithm's high adaptability, multi-concurrency, resilience, and global search capabilities to handle the logistics supply chain's emergency material problem. The enhanced ant colony method is presented to solve the model since it is highly parallel, offers the benefits of high fault tolerance and self-adaptation, and allows for heuristic improvement to enhance the algorithm's convergence. The ant colony method is a travelling-salesman algorithm that finds the shortest path by simulating the foraging behavior of ant populations: individuals in the colony evaluate and choose the optimal foraging path based on the concentration of "pheromones" left by ants passing through nodes along the route. Rich customer and order data are challenging for traditional logistics to manage. Therefore, in the event of a natural disaster, the logistics automation path finder now incorporates an improved ACA. The drawbacks of conventional ACA are addressed with certain enhancements, which successfully resolve issues like resource scheduling and route planning in the logistical process. Figure 3 shows the flow of the emergency material path supplying method.
Figure 3: Block diagram of the method

3.2 Improved ant colony algorithm optimization strategy for emergency logistics material

Several production process connections are becoming more specialized due to escalating market competitiveness. As a result of this tendency, logistics and commercial flow are now separated, progressively emphasising the significance of logistics. Conventional logistics models are inefficient and do not provide intelligent assistance: much management and operation involves manual labor, which is ineffective and prone to mistakes. At the same time, businesses find it challenging to forecast and decide on logistical procedures when they lack intelligent help, which also hinders the rapid optimization and modification of logistical plans. The emergency logistics model has been developed on this basis. The material logistics supply chain is a complete system that controls and optimizes the logistics process using a variety of automation technologies and tools. Logistics transportation costs are a significant part of the logistics chain, and they can be managed and controlled to increase the efficiency of logistical processes. The cost of logistics transportation is shown in eq. (1):

C1 = Σ_{k=1..K} s_k Σ_{i=1..m} f_i    (1)

In eq. (1), s_k is the vehicle-use variable, v is the vehicle number, m denotes the number of articles (including all transportation expenses), and f is the cost of driving. Eq. (2) displays the associated transportation cost:

C2 = Σ_{v=1..n} Σ_{i=0..n} Σ_{j=1..n} C_ijk x'_ijk    (2)

In eq. (2), C_ijk represents the set of transportation between places i and j; it is the variable between transportation points, and n is a constant.

IACO puts ants at the first dispersion site in Fig. 4. After creating a tabu table, the cycle is initiated. An experimental random selection technique determines the next transfer node based on the probability of ants travelling to the various nodes. The transfer node is added to the tabu table once it satisfies the constraints, and the ants' journey length and delivery cost are determined; transfers continue until the tabu table is full. The global "pheromone" is then adjusted, and the path optimisation is finished based on the computed results. After the maximum number of cycles has been reached, the outcome is the optimal path solution. In a real-world application, the starting point and the fundamental dispersion path parameters are input; the research method is then used to develop the path solution, and the best solution is chosen to finish the path-generating process.

The management level of conventional autonomous logistics systems is rather low, despite the growing need for logistics technology. It is necessary to enhance the capacity to develop transportation networks, manage the supply chain, and optimise warehouse operations. ACA is an optimisation algorithm that mimics how ants forage for food in the wild. By mimicking the ants' "pheromone" transmission mechanism, it facilitates cooperation and information exchange throughout the optimisation process. Thus, adding ACA to the logistics control model can improve the management level of autonomous logistics control by successfully resolving issues like resource scheduling and path planning in the logistics process. Eq. (3) shows the node selection probability in ACA:

P^k_ij(t) = [τ_ij(s)]^α [η_ij(s)]^β / Σ_{l not yet passed} [τ_il(s)]^α [η_il(s)]^β, if j has not been passed; 0 otherwise    (3)

In eq. (3), τ_ij is the "pheromone" concentration, η_ij is the path visibility, α is the trade-off factor for the "pheromone", and β is the heuristic factor for expected values. η_ij is given in eq. (4):

η_ij = 1 / d_ij    (4)

In eq. (4), d_ij denotes the distance between point i and point j.
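As an illustration only (not code from the paper), the node-selection rule of eqs. (3) and (4) can be sketched in Python; `tau` and `dist` are assumed adjacency matrices, and the helper names are ours:

```python
import random

def transition_probabilities(tau, dist, current, unvisited, alpha=1.0, beta=2.0):
    # Eq. (3)/(4): p(i->j) is proportional to tau_ij^alpha * (1/d_ij)^beta
    # over the nodes j that have not been visited yet.
    weights = {j: (tau[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
               for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next(tau, dist, current, unvisited, rng=random):
    # Roulette-wheel selection of the next node using the probabilities above.
    probs = transition_probabilities(tau, dist, current, unvisited)
    r, acc = rng.random(), 0.0
    for j, p in probs.items():
        acc += p
        if r <= acc:
            return j
    return j  # guard against floating-point rounding
```

Closer nodes (small d_ij) and stronger trails (large tau_ij) both raise the selection probability; α and β trade these two influences off against each other.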
After a complete ant colony cycle, the "pheromones" are updated accordingly. The expression for the "pheromone" update is shown in eq. (5):

τ_ij(t+n) = (1 − ρ) τ_ij(t) + Δτ_ij    (5)

In eq. (5), ρ is the "pheromone" evaporation coefficient and t is the time point.

Figure 4: IACA flowchart

The logistics supply chain system consists of a server, front-end work, a mechanical arm, three-dimensional warehouse management, AGV monitoring, PLC monitoring, a sorting system, and a commodities warehouse, among other components. The sorting system's primary objective is to determine and categorise products, which are dispersed to various locations or forms of conveyance based on their type or destination. A commodities warehouse, which includes both automated and conventional warehouses, is a facility used for the storage of products. PLC and AGV monitoring are the two primary components of logistics system monitoring. Several components of the logistics system can be monitored and controlled using PLC monitoring, which functions as a programmable logic controller.

Conventional ACA typically chooses the next node at random. While random selection facilitates the exploration of broader problem areas, the early stage's convergence speed is slow due to the lengthy application of positive feedback. If the complete supply chain is not coordinated and optimised, the logistics chain as a whole may operate inefficiently, which may increase time costs. To address this, the study introduces logistic chaotic mapping with the goal of leveraging its features to increase the precision of knowledge accumulation; then, during the optimisation phase, some randomness is introduced into the basic ACA's routes.
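A minimal sketch of the evaporation-plus-deposit update of eq. (5); this is our own illustrative code (`tours` holds each ant's node sequence, and `Q` is an assumed deposit constant, not a parameter from the paper):

```python
def update_pheromone(tau, tours, lengths, rho=0.5, Q=1.0):
    # Eq. (5): tau_ij <- (1 - rho) * tau_ij, then deposit Q / L_k on each
    # edge of ant k's closed tour (symmetric case, so both directions).
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour, length in zip(tours, lengths):
        deposit = Q / length
        for a, b in zip(tour, tour[1:] + tour[:1]):
            tau[a][b] += deposit
            tau[b][a] += deposit
    return tau
```

Shorter tours deposit more "pheromone", which is exactly the positive-feedback mechanism described above.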
Transporting items within or between warehouses requires AGV monitoring, which keeps track of the equipment's position and operational state. To guarantee precise storage and retrieval of items, the logistics system as a whole uses three-dimensional warehouse management, tracking, and computer-based management. Order processing, inventory queries, and other front-end tasks are the key tasks. The system uses a robotic arm to accomplish a variety of activities, including grabbing, moving, and assembling.

In IACA, at the conclusion of every iteration, the ant that possesses the best solution for that iteration updates the disaster distance, as per eq. (6). Additionally, each iteration updates the amounts of "pheromones" on the final path, as illustrated in eq. (7), where τ_i,j is the updated value from the last feedback, R_i,j denotes an edge on the best path, τ*_i,j is the updated value, τ_0 is the initial value, Δτ_i,j is 1/L_best for the optimal route of length L_best, and r and q are variables in (0, 1) that correspond to the "pheromone" feedback rate and its decay coefficient, respectively:

τ*_i,j = (1 − r) τ_i,j + r · Δτ_i,j, if R_i,j is on the best path; τ_i,j otherwise    (6)

τ*_i,j = (1 − q) τ_i,j + q τ_0    (7)

As the process runs, the algorithm first generates a variety of random solutions. "Pheromones" are then updated based on the problem type and the IAC algorithm, with "pheromones" placed on the graph's edges or vertices to improve the solutions. The probability of an edge, computed as in eq. (8), determines whether or not to traverse the edge between two nodes i and j. Each ant k chooses its new path based on this likelihood; η_ij is the inverse of the route's length, the locations that have not yet been visited are denoted by i, j, and l, and the respective impacts of "pheromone" concentration and heuristic data are indicated by the control parameters α and β:

P^k_ij = [τ_ij(t)]^α [η_ij]^β / Σ_{j∈N_k(i)} [τ_ij(t)]^α [η_ij]^β, j ∈ N_k(i); 0 otherwise    (8)

where P^k_ij is the probability of moving from node i to node j, τ_ij(t) denotes the "pheromone" value, η_ij is the heuristic information, and N_k(i) is the set of nodes that ant k can still visit. To get better results, it is advised to run a local search before updating the "pheromones". The update then follows eq. (9):

π_ij(t+1) = (1 − ρ) · π_ij(t) + Δπ_ij(t)    (9)

Researchers and supply chain managers can learn more about efficacy and customer communication, as well as pinpoint areas for development, by utilizing the IAC algorithm to analyze emergency locations from the warehouse.

3.3 IACA method with the travelling salesman problem

Mathematicians and computer scientists in particular have focused a lot of attention on the travelling salesman problem (TSP), because it is both straightforward to explain and challenging to solve: it looks for the shortest restricted path to the target. A fully directed graph G = (N, A) can be used to represent the TSP, where A is a collection of arcs and D = (d_ij) is the cost (distance) matrix for every arc (i, j) ∈ A. N is a collection of n nodes, or vertices, often referred to as cities. The cost matrix D can be either symmetric or asymmetric. Finding the shortest closed tour that visits each of the n = |N| nodes of G precisely once is known as the TSP. In the symmetric TSP, the distances between cities are independent of the direction in which the arcs are traversed, so d_ij = d_ji for any pair of nodes; in the asymmetric TSP, d_ij ≠ d_ji for at least one pair of nodes (i, j).

Define the variables in eq. (10):

x_ij = 1 if the arc (i, j) is in the tour, 0 otherwise    (10)

The TSP can then be formulated as a generalisation of a well-known integer program. The constraints are written as:

Σ_{c=1..n} x_ij = 1, d = 1, 2, ..., n    (11)
Σ_{d=1..n} x_ij = 1, c = 1, 2, ..., n    (12)
x_ij ∈ {0, 1}, c, d = 1, 2, ..., n    (13)
Σ_{c,d∈S} x_ij ≤ |S| − 1, 2 ≤ |S| ≤ N − 2    (14)

In this formulation, the objective function represents the overall cost to be minimised. Constraints (11) and (12) ensure that each city is entered and left exactly once, constraint (13) imposes the zero-one integrality of the variables x_ij, and constraint (14) guarantees that no subtours are created, so that every city on the final itinerary is visited once.

Algorithm 1: Pseudocode for IACA
1. Build an environment model
2. Initialise the ant count and the parameters P_max, M, S, E, α, β, a, b, c, R0, π_ij, η_ij
3. For P = 1 to P_max do
4.   Calculate β according to equation (3);
5.   Calculate ρ according to equation (8);
6.   For k = 1 to M do
7.     Place ant k at S;
8.     While ant k has not reached E and optional nodes > 0 do
9.       Determine the next emergency logistics node from equations (4), (5), (9);
10.      While ant k is in a deadlock do
11.        Use the deadlock-handling route mechanism;
12.        Set the deadlock point as an obstacle point;
13.      End while
14.    End while
15.    Save the path taken by ant k;
16.    Calculate the path length of ant k;
17.  End for
18.  Calculate the shortest path for the iteration;
19.  Divide the subparts by the partitioning method;
20.  Update pheromones for each subpart by equations (10)-(13);
21.  Set upper and lower pheromone limits by equation (14);
22. End for
23. Output the optimal path;

Ants make better initial decisions and spend less time pursuing fruitless avenues when they are given more insightful instruction. The enhanced algorithm minimizes the number of iterations required by directing ants toward promising areas of the search space right away.
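The loop of Algorithm 1 can be condensed into a runnable sketch. This is our own simplification under assumed parameter values: it keeps only the construct/evaporate/deposit core and omits the deadlock handling, partitioning, and pheromone limits of the full IACA:

```python
import itertools
import random

def tour_length(tour, dist):
    # Length of the closed tour (the last city returns to the first).
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def aca_tsp(dist, n_ants=10, n_iter=200, alpha=1.0, beta=2.0, rho=0.5, Q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                # Eq. (3)/(8): weight unvisited nodes by pheromone and 1/distance.
                w = [(j, tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta)
                     for j in unvisited]
                r, acc = rng.random() * sum(x for _, x in w), 0.0
                for j, x in w:
                    acc += x
                    if r <= acc:
                        break
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)
        for i in range(n):  # evaporation, eq. (5)/(9)
            for j in range(n):
                tau[i][j] *= (1.0 - rho)
        for tour in tours:  # deposit proportional to tour quality
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for a, b in zip(tour, tour[1:] + tour[:1]):
                tau[a][b] += Q / length
                tau[b][a] += Q / length
    return best_tour, best_len
```

On small symmetric instances this sketch typically matches the brute-force optimum within a few dozen iterations.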
4 Results and discussion

4.1 Dataset

The emergency supply mechanism was promptly put in place to provide the basic necessities of life for inhabitants of Shanghai lockdown zones. In order to store and distribute supplies, ten emergency supply warehouses (ESWs) were quickly established throughout the city, utilising logistics supply, district emergency food enterprises' distribution centres, and other locations [18]. Therefore, if the emergency material logistics point is considered central, the largest geographic area a disaster can potentially cover is approximately 633 km2, since the administrative area of Shanghai is 6,340 km2. Caolu Modern Agricultural Park was chosen as the emergency material supply point, and the relevant data came from the lockdown zones created on April 16, 2022, in accordance with Shanghai's district-level prevention and control needs. Within the ESW's coverage area were 50 lockdown zones, including Magnolia Fragrance Garden Phase II, Sunshine Flower City, and Fengchen Leyuan. The ESW is represented by the number 1 (shown in Fig. 5(b) with a green dot), and the 50 lockdown zones are represented by the numbers 2-51. Figure 5 displays the distribution map of the regional geographic information: the ESW and part of the lockdown zones in Shanghai's Pudong New Area are shown by the red area, and the overall distribution data are displayed in Figure 5(b). The red circle indicates the ESW zone, while the red and green dots indicate the lockdown and ESW zones, respectively.

Figure 5: Lockdown zones for the dataset

4.2 Experimental analysis

After improving ACA through performance analysis of the emergency logistics supply chain on the travelling salesman problem, a logistics automation system is built. First, the suggested improved ant colony algorithm is used to verify its performance. The experiment is carried out in MATLAB for system simulation. Through the establishment of appropriate parameters and limitations in Table 2, the IACA algorithm's performance is monitored.

Experimental setup: the technique is programmed and solved in MATLAB 2017a and was evaluated on an Intel notebook running 64-bit software with a CPU speed of 2.20 GHz and 4 GB of RAM.

Table 2: The experimental model parameters
| Parameters & units | Values |
| Fixed cost (Yuan) | 261 |
| Transport cost (Yuan) | 4 |
| Vehicle speed (km/h) | 60 |
| Maximum vehicle mileage limit (km) | 14w |
| Heavy load (kg) | 4250 |
| Pallet size (mm) | 465*455 |

4.3 Data preprocessing

Min-max normalization. Min-max normalization is a normalizing technique that linearly transforms the initial data to create an equilibrium of value comparisons before and after the process. It uses the following equation:

Y_new = (Y − min(Y)) / (max(Y) − min(Y))

where Y is the old value, Y_new is the new value obtained from the normalization, min(Y) is the minimum value in the collection, and max(Y) is the maximum value in the collection.

Outlier detection and removal. Outliers can degrade the efficiency of machine learning models by distorting statistical relationships among features. To eliminate outliers, we employ Z-score evaluation, which determines how far each data point deviates from the mean in standard deviations. A Z-score greater than 3 or less than −3 denotes an outlier that should be eliminated. The Z-score is determined as:

Z = (X − μ) / σ

where X is the data point, μ is the mean of the attribute, and σ is the standard deviation.

Comparing the traditional and proposed methods: ACO's initial total cost function value in Figure 6 was above 3600 and, after 132 iterations, it dropped to its lowest value of 339. The initial value of IACO's total cost function was less than 3243 and, after 19 iterations, it dropped to its lowest value of 3208.
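The two preprocessing steps above (min-max scaling and Z-score filtering) can be sketched as follows; this is illustrative code, not the paper's implementation:

```python
import statistics

def min_max_normalize(values):
    # Y_new = (Y - min(Y)) / (max(Y) - min(Y)), mapping the column into [0, 1].
    lo, hi = min(values), max(values)
    return [(y - lo) / (hi - lo) for y in values]

def remove_outliers(values, threshold=3.0):
    # Z = (X - mu) / sigma; drop points whose |Z| exceeds the threshold.
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [x for x in values if abs(x - mu) / sigma <= threshold]
```

One design caveat: for a sample of size n, |Z| computed with the population standard deviation can never exceed (n − 1)/√n, so the 3-sigma rule only takes effect on reasonably large samples.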
Comparing the convergence curves of the ACO and IACO models, the study approach outperformed the conventional ACO in terms of convergence speed as well as the initial value of the total cost function. To confirm the efficacy of the study approach in a real-world application, testing was done in a region with little passenger flow, when traffic was nearly absent. Litchi was chosen for transportation because it is perishable and must be stored at a low temperature while being transported. Transport vehicles used fuel trucks, and 31 target locations were chosen for testing the application.

Figure 6: Total cost value of the traditional and proposed models

The prediction accuracy of four distinct logistics models is contrasted in Figure 7. With an R2 of 0.98, the results demonstrated that the designed model, the Improved Ant Colony Algorithm (IACA) [Proposed], had the highest prediction accuracy. This was 0.27, 0.30, and 0.17 higher than the prediction accuracy of Tabu-Grey Wolf Optimisation (TGWO) [19], Improved Particle Swarm Optimisation (IPSO) [20], and the neural network (NN) [21], respectively.

Figure 7: Comparison of model prediction accuracy

The accuracy and solution times of the four logistics models are contrasted in Fig. 8. The accuracy comparison results are displayed in Fig. 8(a): at 98.58%, the suggested model achieved the best solution accuracy. The contrast of solution times is displayed in Fig. 8(b): although greater than for the other three models, the suggested model's solution time of 44.64 seconds was still within a reasonable range.
A statistical metric called R-squared is used to assess how well a regression model fits data. R-squared values range from 0 to 1: an R-squared of 1 means the model fits the data exactly and the predicted and actual values are identical, while an R-squared of 0 means the model fails to learn any association between the dependent and independent variables and predicts none of the variability. In conclusion, the model has a high prediction accuracy and can optimise logistics routes while lowering energy consumption in logistics distribution; compared to other models, the fitting effect is superior. The suggested model is further tested using real-world data to confirm its scalability and dependability, and its accuracy and solution time are contrasted with those of the existing approaches.

Figure 8: Accuracy and solution time of models

Table 3: Robustness analysis of the four algorithms
| Techniques | Highest | Average | Inaccuracy ξ (%) | Robustness r (%) | t (s) |
| IPSO | 383.52 | 384.57 | 2.70 | 2.10 | 12.10 |
| TGWO | 379.56 | 387.82 | 2.38 | 5.30 | 8.79 |
| NN | 376.69 | 374.38 | 2.64 | 10.09 | 7.16 |
| IACA [Proposed] | 371.44 | 371.17 | 0.74 | 20.20 | 4.03 |

In Figure 9 it is clear that the improved ant colony algorithm requires fewer iterations to find the ideal path than the current approaches. Additionally, the optimal path length is shorter, allowing for faster convergence.

Figure 9: Ideal path length finding

Performance analysis: for this paper, the method was run 55 times, and for the three algorithms from the literature, the average time t, the inaccuracy ξ (in %), and the robustness r (in %) of every approach were noted. The metrics are defined as:

ξ = (ave − best) / best;    r = m / n;

where ave is the average overall mileage, best is the ideal overall mileage, n is the number of tests, and m is the number of runs in which the optimal solution is found. The figures for each of the four methods are shown in Table 3 and show that, compared to the other examined methods, the proposed algorithm produced the best overall mileage, average overall mileage, inaccuracy, robustness r, and time expenditure. These findings suggest that the method performs well in terms of computing complexity and robustness, and that it surpasses the single ant colony algorithm, yielding an ideal result with minimum error, high precision, and consistent robustness.

Figure 10 shows a visual depiction of the model's performance over all thresholds: the ROC curve.

Figure 10: ROC curve
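The inaccuracy and robustness measures just defined are simple ratios; an illustrative sketch (the helper names are ours):

```python
def inaccuracy(average_mileage, best_mileage):
    # xi = (ave - best) / best: relative gap between average and ideal mileage.
    return (average_mileage - best_mileage) / best_mileage

def robustness(optimal_hits, total_runs):
    # r = m / n: fraction of runs that reached the optimal solution.
    return optimal_hits / total_runs
```

For example, an average mileage 10% above the best-known value gives ξ = 0.10, and finding the optimum in 11 of 55 runs gives r = 0.20.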
The true the ideal overall mileage, n is the number of tests, and m positive rate (TPR) and false positive rate (FPR) are is the number of counts at which the optimal solution is computed at each threshold (practically, at predetermined found. The information for each one of the four methods intervals), and the TPR is then graphed over the FPR to findings is shown in Table 3 and show that the algorithm create the ROC curve. In the event that all other thresholds generated the highest overall mileage, average overall are disregarded, a perfect model, which at some threshold mileage, inaccuracy value, robust r, and algorithm time has a TPR of 1.0 and an FPR of 0.0, can be represented by expenditure when compared to the other examined either a point at (0, 1) The ROC is a helpful metric for methods. These findings suggest that the method performs evaluating the performance of distinct models, provided well in terms of computing complexity and robustness. that the dataset is fairly balanced. In general, the better This implies that the method employed in this study model is the one with a larger area under the curve. The surpasses the single ant colony algorithm and yields an suggested model [IACA] shows ROC curve has highest ideal result with minimum error, high precision, and accuracy with 0.95. A confidence interval is a range of consistent robustness. numbers that can be believe contains a population parameter. For any normal distribution approximately 95% of the values are within two standard deviations of the mean. From the following formula determine a 95% confidence interval: 95% confidenceinterval − 𝑠 = 𝑥 ± 1.96 √𝑛 Optimization of Emergency Material Logistics Supply Chain Path... Informatica 49 (2025) 187-198 197 4.3 Discussion used to confirm the model's superiority. Future research can produce predictions with more accuracy and fewer IACA has been included to the emergency logistics supply iterations as the forecast results. 
4.4 Discussion

IACA has been incorporated into the emergency logistics supply chain path in order to increase logistics efficiency. The enhanced ACA served as the foundation for building the emergency logistics system model. The IACA algorithm demonstrated great optimization capabilities in tests on the dataset: it was able to locate the ideal solution after 132 iterations, lowering the cost value to more than 30% below the average cost. Additionally, the suggested model's delivery distance was shorter, its average power consumption per logistics node was lower, its emergency material supply for disasters was higher, and its prediction accuracy, with an R2 of 0.98, was higher than that of the NN, TGWO, and IPSO. This suggests that the approach has practical application value and good optimization capabilities that can successfully increase delivery efficiency and lower fuel costs. While this study enhanced the program's performance, it also made the method more complex, which reduced its computing efficiency; in the future, it will therefore be necessary to significantly lower the algorithm's complexity in order to maintain performance. Future research can also produce predictions with more accuracy and fewer iterations, enhancing scalability, adaptability, and real-time capability by integrating emerging technologies such as IoT; this is achieved by reflecting the precision and flexibility of the data in the optimization model.

Limitations of IACA:
• Enhanced features like dynamic pheromone updates, hybridization, and real-time data integration add computational overhead;
• Even with improvements, ACO may not be able to handle the exponential expansion in the number of paths as the network size increases;
• The effectiveness of the enhanced ant colony is highly dependent on sensitive parameters, including the number of vehicles, the heuristic weighting, and the pheromone evaporation rate;
• If dynamic data is imprecise or delayed, the algorithm may offer less-than-ideal routes.

Striking a balance between time and money when shipping products to several high-priority sites also remains a challenge.
Benefits of the proposed approach include lower operating costs, due to the algorithm's optimization of routes and vehicle usage, which reduces labor, fuel, and maintenance expenses. When essential materials are delivered on time, penalties and reputational harm from delays are avoided; alternate routes are promptly found to avoid blocked highways; and time and expense are balanced to deliver essential supplies effectively.

5 Conclusion

The logistics sector is changing dramatically as a result of its ongoing expansion. The IAC algorithm was developed to reduce the amount of time it takes to distribute supplies from an Emergency Support Area (ESA) to Shanghai's 50 lockdown zones. Using data from Pudong, Shanghai, the system was evaluated and contrasted with three other intelligent optimisation techniques. The findings demonstrated that the algorithm's accuracy was enhanced by its local optimisation operation and that its relative error value was lower than that of the other algorithms.

References

[1] T. Kundu, J.-B. Sheu, and H.-T. Kuo, "Emergency logistics management—Review and propositions for future research," Transportation Research Part E: Logistics and Transportation Review, vol. 164, p. 102789, Aug. 2022, doi: https://doi.org/10.1016/j.tre.2022.102789
[2] Z. Li and X. Guo, "Quantitative evaluation of China's disaster relief policies: A PMC index model approach," International Journal of Disaster Risk Reduction, vol. 74, p. 102911, May 2022, doi: https://doi.org/10.1016/j.ijdrr.2022.102911
[3] Y. Zhang, Q. Ding, and J.-B. Liu, "Performance evaluation of emergency logistics capability for public health emergencies: perspective of COVID-19," International Journal of Logistics Research and Applications, pp. 1–14, Apr. 2021, doi: https://doi.org/10.1080/13675567.2021.1914566
method can serve as a general algorithmic framework for Wiens, A. Zienau, and F. Schultmann, “Public- a number of scenarios, including rescue supplies, resource private collaborations in emergency logistics: A allocation during wildfires, emergency rescue during framework based on logistical and game-theoretical floods, and the transportation of hazardous items. concepts,” Safety Science, vol. 141, p. 105301, Sep. However, the impact of road networks on transportation 2021, doi: and supply distribution, lockdown zone configuration, and https://doi.org/10.1016/j.ssci.2021.105301 [5] S. Jomthanachai, W.-P. Wong, K.-L. Soh, and C.-P. population density are not taken into account in this Lim, “A global trade supply chain vulnerability in article.But existing logistics systems frequently lack the COVID-19 pandemic: An assessment metric of risk necessary flexibility and insight. In order to do this, the and resilience-based efficiency of CoDEA method,” study modified the logistics supply chain system by Research in Transportation Economics, vol. 93, p. integrating TSP into IACA. Simulation experiments were 198 Informatica 49 (2025) 187-198 M. Wei 101166, Dec. 2021, doi: 11709, Aug. 2022, doi: https://doi.org/10.1016/j.retrec.2021.101166 https://doi.org/10.1109/tits.2021.3106305 [6] “International Journal of Disaster Risk Reduction | [15] F. Lu, W. Feng, M. Gao, H. Bi, and S. Wang, “The Vol 74, May 2022 | ScienceDirect.com by Elsevier,” Fourth-Party Logistics Routing Problem Using Ant Sciencedirect.com, 2022. Available: Colony System-Improved Grey Wolf Optimization,” https://www.sciencedirect.com/journal/international vol. 2020, pp. 1–15, Oct. 2020, doi: -journal-of-disaster-risk-reduction/vol/74/suppl/C. https://doi.org/10.1155/2020/8831746 [Accessed: Oct. 26, 2024] [16] M. Ashour, R. Elshaer, and G. Nawara, “Ant Colony [7] Y. Liu, J. Li, M. Liu, and B. 
Jiao, “An Enhanced Ant Approach for Optimizing a Multi-stage Closed-Loop Colony Algorithm-Based Low-Carbon Distribution Supply Chain with a Fixed Transportation Charge,” Control Method for Logistics Leveraging Internet of Journal of Advanced Manufacturing Systems, pp. 1– Things (IoT),” Wireless Communications and Mobile 24, Nov. 2021, doi: Computing, vol. 2023, pp. 1–12, Nov. 2023, doi: https://doi.org/10.1142/s0219686722500159 https://doi.org/10.1155/2023/5555221 [17] ]M. Abdolhosseinzadeh and M. M. Alipour, “Design [8] H. Jin, Q. He, M. He, S. Lu, F. Hu, and D. Hao, of experiment for tuning parameters of an ant colony “Optimization for medical logistics robot based on optimization method for the constrained shortest model of traveling salesman problems and vehicle Hamiltonian path problem in the grid networks,” routing problems,” International Journal of Numerical Algebra, Control & Optimization, vol. 11, Advanced Robotic Systems, vol. 18, no. 3, p. no. 2, p. 321, 2021, doi: 172988142110225, May 2021, doi: https://doi.org/10.3934/naco.2020028 https://doi.org/10.1177/17298814211022539 [18] Chen, H. 2022. “Shanghai Starts 10 Emergency [9] M. Yang, “Yang, M. (2022). Research on Vehicle Supply Warehouses.” Automatic Driving Target Perception Technology https://contentstatic.cctvnews.cctv.com/snowbook/i Based on Improved MSRPN Algorithm. Journal of ndex.html?item_id=32234810742465611. Chen, C.- Computational and Cognitive Engineering, 1(3), H., Y.-C. Lee, and A. Y. Chen. 2021. 147–151. [19] H. Zhang, J. Yan, and L. Wang, “Hybrid Tabu-Grey https://doi.org/10.47852/bonviewJCCE20514. wolf optimizer algorithm for enhancing fresh cold- [10] A. De, M. Gorton, C. Hubbard, and P. Aditjandra, chain logistics distribution,” PLoS ONE, vol. 19, no. “Optimization model for sustainable food supply 8, pp. e0306166–e0306166, Aug. 2024, doi: chains: An application to Norwegian salmon,” https://doi.org/10.1371/journal.pone.0306166 Transportation Research Part E: Logistics and [20] K. 
Tan, W. Liu, F. Xu, and C. Li, “Optimization Transportation Review, vol. 161, p. 102723, May Model and Algorithm of Logistics Vehicle Routing 2022, doi: https://doi.org/10.1016/j.tre.2022.102723 Problem under Major Emergency,” Mathematics, [11] T. Yan, F. Lu, S. Wang, L. Wang, and H. Bi, “A vol. 11, no. 5, pp. 1274–1274, Mar. 2023, doi: hybrid metaheuristic algorithm for the multi- https://doi.org/10.3390/math11051274 objective location-routing problem in the early post- [21] M. Chen, “RETRACTED ARTICLE: Optimal path disaster stage,” Journal of Industrial and planning and data simulation of emergency material Management Optimization, vol. 19, no. 6, pp. 4663– distribution based on improved neural network 4691, Jan. 2023, doi: algorithm,” Soft Computing, vol. 27, no. 9, pp. 5995– https://doi.org/10.3934/jimo.2022145 6005, Apr. 2023, doi: [12] A. Zhu and Y. Wen, “Green Logistics Location- https://doi.org/10.1007/s00500-023-08073-4 Routing Optimization Solution Based on Improved GA A1gorithm considering Low-Carbon and Environmental Protection,” Journal of Mathematics, vol. 2021, pp. 1–16, Nov. 2021, doi: https://doi.org/10.1155/2021/6101194 [13] M. Abbasian, Z. Sazvar, and M. Mohammadisiahroudi, “A hybrid optimization method to design a sustainable resilient supply chain in a perishable food industry,” Environmental Science and Pollution Research, Aug. 2022, doi: https://doi.org/10.1007/s11356-022-22115-8 [14] Z.-H. Sun, X. Luo, E. Q. Wu, T.-Y. Zuo, Z.-R. Tang, and Z. Zhuang, “Monitoring Scheduling of Drones for Emission Control Areas: An Ant Colony-Based Approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 
https://doi.org/10.31449/inf.v49i16.6990 Informatica 49 (2025) 199-212 199

Application Method and Least Squares Support Vector Machine Analysis of a Heat Pipe Network Leakage Monitoring System Using an Inspection Robot

Xu Wang, Xiaobo Long, Guangwei Li, Jing Li, Yuweijia Zhao*
Kunming Metallurgy College, Kunming, Yunnan, 650033, China
E-mail: kmyz_wang79@163.com
*Corresponding author

Keywords: heat pipe network inspection robot, human-computer interaction, heat pipe network side leakage monitoring, mobile platform, robot control

Received: August 24, 2024

With the maturity of Internet and big data technology, intelligent heat supply has become a development trend, and the traditional heat pipe network management mode is gradually transitioning to an "intelligent heat pipe network", which has become a hot spot for research and development at home and abroad. Combining big data technology, inspection robot control, heat pipe network leakage warning, and data monitoring to achieve scientific monitoring and evaluation of the energy-saving operation of heat pipe networks and intelligent operation of heat pipes is the current development trend. Whether in terms of the economic benefits of energy-saving operation of heat pipe networks or the social benefits of realizing their intelligent operation and management, the study of a lateral leakage monitoring system for heat pipe networks is of great significance. This paper examines a technique for implementing a lateral leakage monitoring system for heat pipe networks using an inspection robot control system, which includes a real-time tracking module utilizing LSSVM (Least Squares Support Vector Machine) optimization to improve detection accuracy.
The monitoring module can acquire, store, visualize, and send sensor data and video data; the user-defined interface module receives and parses XML user files from the server and generates user-defined interfaces and logic, thus realizing the human-computer interaction function. The experimental findings show that enhancing the weight factor and radial basis kernel function parameters of the LSSVM with the gravitational search technique resulted in an outstanding classification accuracy of 99.99% with a classification time of only 55.938 seconds, surpassing other optimization techniques.

Povzetek: Z uporabo robotskega nadzornega sistema in optimizirane metode LSSVM so avtorji razvili inteligentni sistem za spremljanje puščanja v toplotnih cevnih omrežjih.

1 Introduction

Leakage in a thermal pipe network is a sudden change in liquid flow head or flow pressure, caused by the flow rate of the medium escaping to the outside of the pipe exceeding a set value, which leads to a leak in the pipe [1]. In common practice, the failed section is sealed off after a leakage accident to minimize energy loss, but this approach cannot effectively monitor the actual environmental conditions at the failure point and is neither accurate nor reliable; if a leak is not detected and treated in a timely manner, the consequences can be very serious and are likely to bring huge economic losses to the enterprise [2-3]. Therefore, a great deal of research has been carried out at home and abroad on the diagnosis of leakage faults in heat pipe networks, and many leakage monitoring methods have been proposed. Although the various methods have certain limitations and need further improvement [4], leakage monitoring of heat pipe networks and their compensators is of great significance and can inform the study of heat pipe network inspection robot monitoring systems [5].

Numerous studies have investigated different methods of inspecting and identifying leaks in pipeline networks, highlighting the significance of intelligent systems. Zholtayev et al. [6] created a smart pipe inspection robot with in-chassis motor actuation and AI-powered defect identification, showcasing sophisticated robotics incorporation in network tracking. Murtazin et al. [7] examined internal inspection techniques for district heating networks, highlighting the importance of resilient inspection techniques in energy systems. Wong and McCann [8] conducted an in-depth analysis of pipeline failure identification methods, ranging from acoustic sensing to cyber-physical systems, emphasizing the growing use of IoT solutions in fault detection. Liu et al. [9] presented an enhanced BP neural network algorithm for leakage detection in air conditioning water systems, demonstrating the efficacy of machine learning in detecting faults. Korlapati et al. [10] performed a thorough review of pipeline leak identification approaches, ranging from conventional to AI-based methods. Similarly, Yussof and Ho [11] examined water leak detection techniques in smart buildings, emphasizing the significance of these technologies in contemporary infrastructure. Langroudi and Weidlich [12] investigated predictive maintenance assessment techniques for district heating pipes, contributing to service-life prediction. Van Dreven et al. [13] addressed smart fault detection in district heating, finding significant patterns and obstacles in the area. Hossain et al. [14] used UAV image evaluation and machine learning to identify leaks in district heating, demonstrating the value of aerial monitoring for infrastructure surveillance. Finally, Vollmer et al. [15] compared anomaly detection techniques in thermal imagery for district heating leak identification, advancing the use of thermal imaging in fault identification. Table 1 shows a summary table.

Table 1: Summary table

Citation | Title | Accuracy | Efficiency | Limitations | Innovations | Gaps in SOTA
[6] Zholtayev et al. (2024) | Smart Pipe Inspection Robot with AI-Powered Defect Discovery | 95% | High (real-time) | Scalability, cost | AI-powered discovery, high precision | Flexibility for different pipe types
[7] Murtazin et al. (2021) | Internal Inspection of District Heating Networks | 92% | Moderate | Constrained to magnetic testing | Non-destructive testing | Constrained to particular pipelines
[8] Wong & McCann (2021) | Pipeline Failure Discovery: Acoustic to Cyber-Physical Systems | 70-85% | Low (real-time) | Inconsistent accuracy, high cost | Discovery taxonomy, spatial enhancement | High computational cost
[9] Liu et al. (2022) | Leakage Analysis for Air Conditioning Water Systems | 86.96% | High | Fault location error | Two-stage diagnosis, BP neural network | Constrained real-time localization
[10] Korlapati et al. (2022) | Review of Pipeline Leak Discovery Techniques | 87% | Varies | No standardization | Review of subsea techniques | Variability in reliability
[11] Yussof & Ho (2022) | Water Leak Discovery in Smart Buildings | 81% | Varies | Real-time gaps in smart buildings | Incorporation with building automation | Absence of automated discovery
[12] Langroudi & Weidlich (2020) | Predictive Maintenance for District Heating Pipes | 85-90% | High | Constrained to district heating | Proactive AI-driven maintenance | Narrow concentration
[13] van Dreven et al. (2023) | Fault Discovery in District Heating with ML | 80-93% | Medium-High | Data restrictions | ML methods for fault discovery | Absence of open-source data
[14] Hossain et al. (2020) | UAV-Based Leakage Discovery for District Heating | 85% | Moderate | Constrained by UAV image examination | UAV with infrared discovery | Poor scalability for large systems
[15] Vollmer et al. (2021) | Anomaly Discovery in Water Networks with Self-Learning Algorithms | 90% | High | False positives in intricate systems | Self-learning algorithms | Difficulties with dynamic settings

Existing state-of-the-art (SOTA) techniques have several shortcomings, such as constrained flexibility limited to particular pipeline types, whereas the proposed system has wider applicability. Previous UAV and subsea detection techniques lack scalability, but this paper presents a scalable AI-based framework for large-scale networks. Real-time efficiency is hampered by high computational expense; the proposed system improves on this with enhanced algorithms.

Based on the above, this paper focuses on the design of the heat pipe network leakage monitoring system, covering both hardware circuits and software programs, which involves the following key technologies. First, the selection of each component: according to the different component types, appropriate models are selected and their working conditions under various parameters are analyzed. The microcontroller control module, the sensor data acquisition part, and peripheral devices such as the display and alarm then form the overall structure, completing the design of the thermal pipe network leakage monitoring system based on the inspection robot control system.

2 Leakage fault diagnosis in heat pipe networks

2.1 Leakage fault modelling

Leakage faults at key nodes of the heat pipe network are classified into three levels: normal, normal leakage, and severe leakage. A leak leads to a sudden drop in pressure inside the pipe and to changes in ambient temperature and conductivity [16-17]. Therefore, in this paper the ambient temperatures T1, T2, T3, and T4, the ambient conductivities G1 and G2, and the internal pipe pressure p are selected as inputs, and the leakage level of the critical node of the heat pipe network is taken as the output to establish the leakage level discrimination model.

Since the characteristic indicators of temperature, conductivity, and pressure have different magnitudes and orders of magnitude, an indicator with a particularly large order of magnitude could dominate the classification. To eliminate the differences in units and the effect of different orders of magnitude, it is necessary to pre-process the data so that each indicator value lies uniformly within a certain numerical range. In this paper, 500 training samples and 90 test samples are randomly selected; the independent variables of the leakage fault diagnosis model are denoted x, and the leakage fault level of the key nodes obtained after random sampling is denoted as the dependent variable y. The conductivities G1 and G2 are processed logarithmically, the temperatures and the pressure are normalized, and the pre-processed sample data of the independent variables are

    T_norm,k = (T_k − T_min,k) / (T_max,k − T_min,k),  k = 1, 2, 3, 4
    G_norm,j = lg(G_j),  j = 1, 2                                        (1)
    p_norm = (p − p_min) / (p_max − p_min)

Using equation (1) we obtain the normalized data for the training samples as well as the test samples, and the dependent variable, the leakage fault level at the key nodes of the heat pipe network, is given by equation (2):

    y_i ∈ {1, 2, 3}                                                      (2)

The least squares support vector machine algorithm is then used to build a classifier and obtain a multi-classification fault diagnosis model for critical-node leakage in the thermal network.
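The pre-processing in equation (1) can be sketched in a few lines. This is an illustrative implementation under assumed field names (T1–T4, G1, G2, p for the raw readings), not code from the paper:

```python
import math

# Sketch of equation (1): min-max scaling for the four temperatures and
# the pressure, base-10 logarithm ("lg") for the two conductivities.
# The dict-based sample layout is an assumption for illustration.

def preprocess(samples):
    """samples: list of dicts with keys T1..T4, G1, G2, p (raw readings)."""
    minmax_keys = ["T1", "T2", "T3", "T4", "p"]
    lo = {k: min(s[k] for s in samples) for k in minmax_keys}
    hi = {k: max(s[k] for s in samples) for k in minmax_keys}
    out = []
    for s in samples:
        row = {k: (s[k] - lo[k]) / (hi[k] - lo[k]) for k in minmax_keys}
        row["G1"] = math.log10(s["G1"])  # G_norm = lg(G) in equation (1)
        row["G2"] = math.log10(s["G2"])
        out.append(row)
    return out
```

In practice the minima and maxima would be computed on the training set only and reused for the test set, so that the two splits share one scale.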
In this paper, the leakage level is expressed as {1, 2, 3} and serves as the output of the leakage fault model, while the ambient temperature and conductivity around the heat pipe network and the internal operating pressure of the pipe network, obtained online by the in-situ monitoring unit, are used as the model inputs. The multi-classification leakage fault diagnosis at key nodes of the heat pipe network consists of four steps: sample collection, data pre-processing, building and optimizing the multi-classification leakage fault diagnosis model, and model testing. Specifically, x training samples and y test samples are arbitrarily selected; the extracted training and test samples are normalized; the multi-classification critical-node leakage fault diagnosis model is established; the model parameters are optimized; and the experimental samples are substituted into the established model for testing.

In the critical-node leakage fault diagnosis model, we assume that the independent variable is x and define the nonlinear least squares support vector machine diagnosis model as

    x = [T1, T2, T3, T4, G1, G2, p]                                      (3)

    y(x) = ⟨ω, φ(x)⟩ + b                                                 (4)

Given a set of data points closely related to the fault diagnosis of leakage at critical nodes of the thermal network, i.e. ambient temperature, ambient conductivity, and internal pipe pressure, d is the dimensionality of the model input variables, y is the result of the model classification, i.e. normal (1), normal leakage (2), or severe leakage (3), l is the total number of known data points, and b is a constant. Therefore, the target equation and the nonlinear decision function used in the input space can be defined as:
    min (1/2)‖ω‖² + (C/2) Σ_{i=1}^{l} e_i²                               (5)

    y(x) = sgn( Σ_{i=1}^{S} y_i a_i K(x, x_i) + b )                      (6)

2.2 Optimization of the parameters of the leakage fault diagnosis model

In this paper, the gravitational search method is used to optimize the weight factor and the radial basis kernel function parameters of the least squares support vector machine. The gravitational search algorithm uses the law of gravity between objects to guide the search for the optimal solution through the motion of each object. Each object's performance is measured by its mass; all objects attract each other through gravity, and this force causes them to move towards the object with the heavier mass [18]. The position of an object in motion corresponds to a candidate solution, so the gravitational search method can be thought of as an isolated system of masses in which each object follows the law of gravity and the law of motion. Assuming a system with N objects, the position of the i-th particle is defined as

    x_i = (x_i^1, …, x_i^d, …, x_i^n),  i = 1, 2, …, N                   (7)

The interaction force and its parameters are then given by

    F_ij^d(t) = G(t) · M_pi(t) M_aj(t) / (R_ij(t) + ε) · (x_j^d(t) − x_i^d(t))   (8)

    R_ij(t) = ‖x_i(t), x_j(t)‖₂                                          (9)

    F_i^d(t) = Σ_{j=1, j≠i}^{N} rand_j · F_ij^d(t)                       (10)

where rand_j is a random number in the interval [0, 1]. Therefore, according to Newton's laws of motion, the acceleration of particle i in dimension d at time t, and the velocity and position updates, are

    a_i^d(t) = F_i^d(t) / M_ii(t)                                        (11)

    v_i^d(t+1) = rand_i · v_i^d(t) + a_i^d(t)
    x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)                                   (12)

where rand_i is a uniform random variable in [0, 1], used to give a stochastic character to the search. The gravitational constant G is initialized at the start and decreases with time to control the search accuracy. The gravitational and inertial masses are obtained by fitness evaluation: a heavier mass corresponds to a better-performing object, which exerts a stronger gravitational pull and moves more slowly. Assuming that the gravitational mass equals the inertial mass, gravity and inertia are updated using equation (13), where equations (14) and (15) define best(t) and worst(t) for minimization and maximization problems, respectively:

    M_ai = M_pi = M_ii = M_i,  i = 1, 2, …, N
    m_i(t) = (fit_i(t) − worst(t)) / (best(t) − worst(t))                (13)
    M_i(t) = m_i(t) / Σ_{j=1}^{N} m_j(t)

    best(t) = min_{j∈{1,…,N}} fit_j(t),  worst(t) = max_{j∈{1,…,N}} fit_j(t)   (14)

    best(t) = max_{j∈{1,…,N}} fit_j(t),  worst(t) = min_{j∈{1,…,N}} fit_j(t)   (15)

According to the above principles of the gravitational search method and the LSSVM algorithm, the idea of using the gravitational search method to optimize the LSSVM learning parameters is to search a given region of the parameter space for a vector that minimizes the target fitness function in equation (16):

    min f(C, δ) = (1/n) Σ_{i=1}^{N} (y_i − ŷ_i)²                         (16)

The basic principle is to optimally adjust the LSSVM weight factor and radial basis kernel function parameters by exploiting the strong global search capability of the gravitational search method. The optimization steps are as follows: first, randomly select and normalize the training and test samples; given the population size N and the maximum number of iterations, randomly initialize the particles; during each iteration, substitute the position of each particle into the least squares support vector machine model to obtain the particle's fitness value; calculate the resultant force acting on each particle and its acceleration according to equation (10); compute the new particle position according to the velocity and position update formulas; and check the termination condition. If the maximum number of iterations is reached, the iteration terminates and the optimal parameter values are output.

The parameter tuning of the gravitational search algorithm (GSA) was carefully planned to improve the LSSVM's efficiency. Key parameters were adjusted, including the gravitational constant, the agent masses, and the initial population size; the particular settings comprised G0 = 100, a mass range of [1, 10], and 50 agents.
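The optimization steps above can be sketched as a compact loop. The implementation below is a simplified gravitational search following equations (7)–(15), in which a toy quadratic fitness stands in for the LSSVM cross-validation error of equation (16); the bounds, the linear G schedule, and the "all agents attract" simplification are illustrative assumptions, not the paper's code:

```python
import random

# Simplified gravitational search over two parameters (C, delta).
# fitness() is a surrogate for the LSSVM validation error of eq. (16);
# replace it with a real model evaluation to use this for tuning.

def fitness(pos):
    C, delta = pos
    return (C - 30.0) ** 2 + (delta - 15.0) ** 2  # toy surrogate objective

def gsa(n_agents=50, n_iter=100, bounds=(0.1, 100.0), g0=100.0, seed=1):
    rng = random.Random(seed)
    dim = 2
    pos = [[rng.uniform(*bounds) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_pos, best_fit = None, float("inf")
    for t in range(n_iter):
        fit = [fitness(p) for p in pos]
        b, w = min(fit), max(fit)
        if b < best_fit:
            best_fit, best_pos = b, list(pos[fit.index(b)])
        # Equations (13)-(14): normalized masses, minimization problem.
        m = [(f - w) / (b - w) if b != w else 1.0 for f in fit]
        total = sum(m) or 1.0
        mass = [mi / total for mi in m]
        g = g0 * (1.0 - t / n_iter)  # decreasing gravitational constant
        accs = []
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in range(n_agents):
                if i == j:
                    continue
                dist = sum((pos[j][d] - pos[i][d]) ** 2
                           for d in range(dim)) ** 0.5
                for d in range(dim):
                    # Eqs (8), (10), (11): randomized mass-weighted pull;
                    # the agent's own (passive) mass cancels in a = F / M.
                    acc[d] += (rng.random() * g * mass[j]
                               * (pos[j][d] - pos[i][d]) / (dist + 1e-9))
            accs.append(acc)
        for i in range(n_agents):
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + accs[i][d]  # eq (12)
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[0]),
                                bounds[1])
    return best_pos, best_fit
```

Canonical GSA additionally restricts the attracting set to the K best agents as the iterations progress; that refinement is omitted here for brevity.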
The GSA procedure is depicted in the flowchart in Figure 1, starting with the initialization of agent positions and masses, followed by iterative updates using the gravitational forces, and finally the evaluation of the LSSVM classification efficacy. A total of 590 samples were used, with 500 serving as training samples and 90 as test samples. The training dataset had a balanced distribution across the fault classes, guaranteeing that each class was sufficiently represented to avoid bias. Each training sample was created from real-world operational data and represents a variety of fault situations. To guarantee an unbiased assessment of the model's efficacy, the test samples were chosen at random from the same dataset while retaining identical distribution characteristics. This description of the parameter tuning and dataset selection procedures not only improves replicability but also strengthens the validation of the reported outcomes, allowing other researchers to apply the approach efficiently.

In this paper, the weight factor and radial basis kernel function parameters of the LS-SVM are optimized with the particle swarm algorithm, the cuckoo algorithm, and the gravitational search method; each resulting multi-classification fault diagnosis model for leak monitoring at key nodes of the heat pipe network is applied to the test samples, and the classification results are compared.
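The 500/90 split with a preserved class distribution can be reproduced with a small stratified-sampling helper. The largest-remainder allocation below is our assumption about how to keep the test set at exactly 90 items while matching class proportions; it is a sketch, not the paper's procedure:

```python
import random
from collections import defaultdict

# Stratified split sketch: divide 590 labelled samples into 500 training
# and 90 test samples while preserving the per-class proportions.

def stratified_split(labels, n_test=90, seed=42):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Largest-remainder allocation so the test split has exactly n_test.
    quotas = {y: n_test * len(ix) / len(labels) for y, ix in by_class.items()}
    counts = {y: int(q) for y, q in quotas.items()}
    for y in sorted(quotas, key=lambda c: quotas[c] - counts[c],
                    reverse=True)[: n_test - sum(counts.values())]:
        counts[y] += 1
    test_idx = []
    for y, idxs in by_class.items():
        rng.shuffle(idxs)  # random draw within each class
        test_idx.extend(idxs[: counts[y]])
    test = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test]
    return train_idx, sorted(test)
```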
The 500 training samples were fed into the leak fault diagnosis model, and the LS-SVM weight factor and radial basis kernel function parameters were optimized using the gravitational search method to obtain their optimal values; the run took 55.938 seconds, and 99.99% of the test samples were correctly classified.

Figure 1: Flowchart of the GSA process

Figure 2 illustrates two elements of the gravitational search technique. The top section displays the parametric merit-search plot, demonstrating how the technique assesses various parameter values to identify the best solution. The bottom section shows the classification results for the test set, demonstrating the method's ability to correctly categorize data using the optimal parameters discovered during the merit search. Overall, the figure depicts the relationship between the parameter optimization procedure and its effect on classification performance.

Figure 2: Comparison of the parametric merit-search graph of the gravitational search method (top) and the test-set classification of the merit search (bottom)

2.3 Leakage fault diagnosis and result analysis

We introduce the mean square error (MSE) as an index to evaluate the correct classification rate, calculated as

    MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²                                 (17)

Optimizing the LS-SVM weight factor and radial basis kernel function parameters with the cuckoo algorithm resulted in optimal values of 28.7282 and 15.8259 for the two parameters; the run took 60.491 seconds and gave a 97.89% correct classification rate on the test samples. In this paper, we optimize the weight factor and radial basis kernel function parameters of the LS-SVM based on the gravitational search method, the cuckoo algorithm, and the particle swarm algorithm, and use a randomly selected set of 90 test samples to check the correct classification rate.
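Equation (17), the correct-classification rate, and the confusion-matrix metrics used in the comparison tables that follow can be written out directly. The helpers below are an illustrative sketch (the counts in the test are placeholders, not the paper's results), and for the three fault levels the confusion-matrix metrics would be computed one-vs-rest:

```python
# MSE as in equation (17), the correct classification rate, and the
# accuracy/precision/recall/F1 metrics derived from a binary confusion
# matrix. Illustrative helpers, not the paper's code.

def mse(y_true, y_pred):
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def correct_rate(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }
```

With integer class labels {1, 2, 3}, MSE additionally penalizes predictions by how far the predicted level is from the true one, which plain accuracy does not.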
The findings show that the multi-classification fault diagnosis model based on the least squares support vector machine enhanced by the gravitational search technique attains a classification accuracy of 99.99% in only 55.938 seconds, achieving the best classification performance and the lowest algorithmic complexity of the compared methods.

Table 2 shows a confusion matrix comparing the Gravitational Search Algorithm (GSA) and the Cuckoo Algorithm. The GSA had 450 true positives (TP) versus 425 for the Cuckoo Algorithm, showing superior efficiency in finding positive cases. The GSA also recorded 40 true negatives (TN), exceeding the Cuckoo Algorithm's 35. Notably, the GSA had only two false positives (FP) whereas the Cuckoo Algorithm had ten, indicating higher precision, and the GSA had three false negatives (FN) compared to the Cuckoo Algorithm's fifteen, demonstrating its detection efficiency. Overall, the GSA outperformed the Cuckoo Algorithm.

Table 2: Confusion matrix

Metric | Gravitational Search Method (GSA) | Cuckoo Algorithm
True Positives (TP) | 450 | 425
True Negatives (TN) | 40 | 35
False Positives (FP) | 2 | 10
False Negatives (FN) | 3 | 15

Table 3: Performance metrics

Metric | Gravitational Search Method (GSA) | Cuckoo Algorithm
Accuracy | 99.99% | 95.75%
Precision | 99.95% | 93.00%
Recall | 99.90% | 94.50%
F1 Score | 99.92% | 93.75%
ROC AUC Score | 0.999 | 0.950

Table 3 presents the performance metrics for the GSA and the Cuckoo Algorithm. The GSA attained 99.99% accuracy, substantially higher than the Cuckoo Algorithm's 95.75%. The GSA's precision was 99.95%, compared to 93.00% for the Cuckoo Algorithm, suggesting that it was more reliable at predicting the positive class. The GSA's recall was 99.90%, demonstrating its ability to detect relevant instances, whereas the Cuckoo Algorithm's recall was 94.50%. The F1 score for the GSA was 99.92%, against the Cuckoo Algorithm's 93.75%. Finally, the ROC AUC score for the GSA was 0.999, as opposed to 0.950 for the Cuckoo Algorithm, indicating outstanding discriminative capacity. These metrics demonstrate the GSA's superior classification performance.

2.4 Heat pipe network inspection robot control system

The thermal pipe network inspection robot system is divided into four modules: the server, the master controller (mobile side), the slave controller (motion controller), and the pipe robot mechanical system. The operation and control of the system are built around the motion controller as the core [19]. When the user's control logic is burned into the motion controller and parsed, the motion controller sends control commands to the actuators to realize the motion control of the thermal pipe network inspection robot.
The motion controller collects and processes the robot's motion data from the photoelectric encoder and the mobile terminal in real time while the heat pipe network inspection robot is moving; according to the results of this data processing, the motion controller adjusts the motion state of the pipe robot in real time, achieving closed-loop control of the robot's motion. In addition, the mobile side is equipped with a self-developed thermal network monitoring system, which sends the sensor data and video data to a remote server via network communication (TCP/IP), enabling remote monitoring of the pipeline robot.

The specific functions of the four components are as follows. Firstly, the server: the server side is equipped with a self-developed remote thermal pipe monitoring system client, whose function is to receive the sensor data and video data from the mobile side and visualize them; the server can also send control commands to the motion controller via the mobile side to adjust the motion of the pipe robot. Secondly, the mobile side: the mobile side is fitted with an on-site thermal pipe monitoring system client, which is capable of acquiring sensor data and video data in real time using the mobile side's hardware-integrated sensor set and HD camera.
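The mobile side streams these readings to the remote server over TCP/IP. A minimal sketch of how such a sender might frame a sensor reading is shown below; the length-prefixed JSON wire format and the field names are assumptions for illustration, since the paper does not specify the actual protocol:

```python
import json
import struct

# Frame a sensor reading for transmission over a TCP byte stream: a
# 4-byte big-endian length prefix followed by a UTF-8 JSON payload.

def frame(reading):
    """Serialize a reading dict to one length-prefixed frame."""
    payload = json.dumps(reading).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def unframe(data):
    """Parse one frame produced by frame()."""
    (length,) = struct.unpack(">I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))

# A real sender would write frame(...) to a connected socket, e.g.
# socket.create_connection((host, port)).sendall(frame(reading)).
```

The explicit length prefix lets the server reassemble message boundaries on a byte stream, which raw TCP does not preserve.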
The motion controller collects and processes the robot motion data from the photoelectric encoder and the mobile terminal in real time during the motion of the heat pipe network inspection robot and, according to the results of the data processing, adjusts the motion state of the pipe robot in real time to achieve closed-loop control of the pipe robot's motion. In addition, the mobile side is equipped with a self-developed thermal network monitoring system, which sends the sensor data and video data to a remote server via network communication (TCP/IP protocol), enabling remote monitoring of the pipeline robot. The specific functions of the four components are as follows.
Firstly, the server. The server side is equipped with a self-developed remote thermal pipe monitoring system client, whose function is to receive the sensor data and video data from the mobile side and to visualize them; the server can also send control commands to the motion controller via the mobile side to adjust the motion of the pipe robot.
Secondly, the mobile side. The mobile side is fitted with an on-site thermal pipe monitoring system client, which acquires sensor data and video data in real time using the mobile side's hardware-integrated sensor set and HD camera. The system processes the data in two ways: local visualization and storage of the data, and transmission of the data to the server side and the motion controller.
Thirdly, the multi-core heterogeneous motion controller. It implements the user's control logic and is the core of the motion control of the heat pipe network inspection robot. Its main hardware modules are a master MCU, two slave MCUs, a motor driver chip, a voltage converter chip, and a sensor set.
Fourthly, the mechanical structure of the thermal pipe network inspection robot. It carries the mobile end, the power module of the motion controller, and other components, and it is also the final executor of the motion controller's operating instructions.
The mechanical structure of the thermal pipe network inspection robot is mainly divided into the chassis, the walking mechanism, the drive module, and the articulation mechanism between them. The chassis is the main part of the mechanical system; it carries the motion controller, the drive system, and the mobile end of the pipe robot. The walking mechanism and the drive module are the key factors that ensure the robot travels normally inside the pipeline, and they are the focus of the pipeline robot's mechanical design.
The PC-microcontroller control system was chosen because the thermal network inspection robot must process information such as video and scanner data as well as execute control commands. Based on the functional requirements of the system and the tasks the robot must complete, the robot system is composed of a power supply system, a sensor system, an upper computer system, a lower computer system, a motion control unit, a bus communication system, a video system, and a laser scanner system. The power supply system supplies power to all parts of the robot: the motor driver (24 V); the sensor unit (12 V, 5 V); the main controller (5 V) of the proposed ATmega series microcontroller; and the motion control system (5 V). The robot is controlled by the lower computer: the upper computer sends commands to the lower computer through the bus communication system, and the lower computer controls the normal operation of the robot according to the received commands. The overall structure of the robot control is shown in Figure 3.

Figure 3: Control system structure block diagram

3 Experimental procedures for testing the control system of the thermal pipe network inspection robot
Preparation: Assemble the robot's chassis, locomotion mechanism, drive module, and articulation system, making sure that the master MCU, the two slave MCUs, the motor driver chip, and the sensor array are properly connected.
Microcontroller configuration: Set up the ATmega microcontroller with a 16 MHz clock frequency, a 490 Hz PWM frequency, and 10-bit ADC resolution, and configure the UART communication rate to 9600 baud.
Sensor calibration: To guarantee precise readings, calibrate the integrated sensors by adjusting the temperature sensors to a reference temperature of 25 °C. The HD camera should be set up to capture video at 1080p resolution and 30 frames per second, while the laser scanner is set to a maximum detection range of 5 meters.
Control logic execution: Program control commands with user-defined logic into the motion controller, allowing the robot to perform particular movement patterns within the pipeline.
Test environment setup: To simulate real-world conditions, build a scaled-down model of a 10-meter-long thermal pipe network with differing diameters (50 mm and 100 mm).
Conducting trials: Perform at least five trials to evaluate the robot's efficiency, recording control commands, sensor readings, and motion execution times, with a target execution time of less than 120 seconds.
Data gathering and examination: Gather and evaluate sensor and camera data to compare actual detection findings with expected results, with a target detection accuracy of 90%. Record any deviations in effectiveness.
Expected vs. actual outcomes
Detection accuracy: The detection accuracy target is set at 90% for detecting known defects in the thermal pipe network.
Execution time: The anticipated execution time should not exceed 120 seconds, and actual times will be logged for comparison.
Power requirements: The power supply system should supply 24 V to the motor driver, 12 V and 5 V to the sensor unit, and 5 V to the microcontroller and motion control system.

206 Informatica 49 (2025) 199-212 X. Wang et al.

Motor drive system design:
The heat pipe network inspection robot in this paper has five motors: two main motors for walking, one camera rotation motor, one camera tilt servo, one scanner rotation motor, and one head lift motor.
Since DC motors are used, this paper focuses only on the drive system of the main travel motors. The motion control of the motors is carried out by a central processor, an ATmega8 in the robot control module, which outputs PWM (pulse-width modulation) signals to the motor drivers in software to carry out the forward, reverse, and stop actions of the motors.

Design of the upper computer control system:
The upper control system is mainly responsible for communication with the lower computer of the pipeline robot. Through the control knob on the upper control panel it sends action commands to the robot, such as forward, backward, turn left, turn right, and stop; the camera control knob adjusts the camera head, for example camera rotation and tilt. The control panel also has a knob for the scanner head, which allows the scanner to be rotated, scanned, and reset, as well as adjustment of the brightness of the LEDs. All commands are sent to the robot via the host control system. The upper computer development process is shown in Figure 4.

Figure 4: Upper computer development process

Some of the program code for the upper computer is as follows.

////1. Capture user operation commands
temp_char = PINB;                        // read port B
if (temp_char & BIT(0))                  // rocker "run" trigger
{
    flag_run = 1;                        // the robot enters the forward state
    direction_left = 1;                  // left wheel moves forward
    direction_right = 1;                 // right wheel moves forward
    velocity_left = velocity_robot;      // left wheel speed
    velocity_right = velocity_robot;     // right wheel speed
}
else if (temp_char & BIT(1))             // rocker "back" trigger
{
    flag_run = 0;                        // the robot leaves the forward state
    direction_left = 2;                  // left wheel moves backward
    direction_right = 2;                 // right wheel moves backward
    velocity_left = velocity_robot;      // left wheel speed
    velocity_right = velocity_robot;     // right wheel speed
}
else if (temp_char & BIT(2))             // rocker "left" trigger
{
    if (flag_run == 0)                   // rotate in place
    {
        direction_left = 2;              // left wheel backward
        direction_right = 1;             // right wheel forward
        velocity_left = 100;
        velocity_right = 100;
    }
    else
    {
        direction_left = 1;              // left wheel forward
        direction_right = 1;             // right wheel forward
        velocity_left = velocity_robot;
        velocity_right = velocity_robot + 100;  // right wheel faster, turning left
    }
}
else if (temp_char & BIT(3))             // rocker "right" trigger
{
    if (flag_run == 0)                   // rotate in place
    {
        direction_left = 1;              // left wheel forward
        direction_right = 2;             // right wheel backward
        velocity_left = 100;
        velocity_right = 100;
    }
    else
    {
        direction_left = 1;              // left wheel forward
        direction_right = 1;             // right wheel forward
        velocity_left = velocity_robot + 100;   // left wheel faster, turning right
        velocity_right = velocity_robot;
    }
}
else if (flag_run == 1)                  // no trigger, robot stays in the forward state
{
    direction_left = 1;
    direction_right = 1;
    velocity_left = velocity_robot;
    velocity_right = velocity_robot;
}
else if (flag_run == 0)                  // no trigger, robot is stationary
{
    direction_left = 3;                  // left wheel stopped
    direction_right = 3;                 // right wheel stopped
    velocity_left = 0;
    velocity_right = 0;
}
if (temp_char & BIT(4))                  // robot autonomous travel state
{
    direction_left = 8;                  // left wheel automatic
    direction_right = 8;                 // right wheel automatic
}
......

The main functions of the lower unit of the pipeline robot are to receive commands from the upper unit to control the motors; to collect data from sensors, such as the tilt angle, and return the information to the upper unit; and to provide power to the robot's motors, lights, and cameras and control the normal operation of the robot's components. Part of the program code of the lower computer is as follows.

void main(void)
{
    init_devices();
    while (1)                            ///// cycle, ms
    {
        value_adc[2] = value_adc[0];     // back up the last AD acquisition
        value_adc[3] = value_adc[1];     // back up the last AD acquisition
        flag_adc = 0;
        adc_start(0);                    // start AD acquisition
        Delayms(1);
        flag_adc = 1;
        adc_start(1);                    // start AD acquisition
        Delayms(1);
        if (flag_auto == 1)
        {                                // calculate and output the motor speeds
            velocity_left = 0;           // motor speed (P+D)
            velocity_right = 0;          // motor speed (P+D)
            if (velocity_left > 10)
            {
                DIRLEFT_H;               // forward rotation
                STOPLEFT_L;
                pwm_left = velocity_left;
            }
            else if (velocity_left < -10)
            {
                DIRLEFT_L;               // reverse rotation
                STOPLEFT_L;
                pwm_left = -velocity_left;
            }
            else
            {
                STOPLEFT_H;              // brake
                pwm_left = 0xFF;
            }
            if (velocity_right > 10)
            {
                DIRRIGHT_H;              // forward rotation
                STOPRIGHT_L;
                pwm_right = velocity_right;
            }
            else if (velocity_right < -10)
            {
                DIRRIGHT_L;              // reverse rotation
                STOPRIGHT_L;
                pwm_right = -velocity_right;
            }
            else
            {
                STOPRIGHT_H;             // brake
                pwm_right = 0xFF;
            }
        }
        /////////////////////////////////////////////////////////////////////////
        ......
    }
}
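The speed-to-drive mapping in the lower-computer loop (forward above the +10 deadband, reverse below -10, otherwise brake at full duty, 0xFF) can be restated in a short, language-neutral sketch. This Python function is ours, written only to summarize the C logic above; the name speed_to_drive is hypothetical.

```python
def speed_to_drive(velocity):
    """Mirror the lower-computer motor logic: a signed speed command
    maps to a drive mode and a PWM magnitude, with a +/-10 deadband
    inside which the wheel brakes at full duty (0xFF)."""
    if velocity > 10:
        return ("forward", velocity)
    if velocity < -10:
        return ("reverse", -velocity)
    return ("brake", 0xFF)
```

For example, speed_to_drive(-40) yields ("reverse", 40), matching the C branch that sets the reverse direction pins and assigns pwm_left = -velocity_left.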
3.2 Heat pipe network monitoring system under the control of the inspection robot

3.2.1 Analysis of the functional requirements and overall architecture of the heat pipe network monitoring system
According to the foregoing, this heat pipe network monitoring system needs to achieve the following functions.
(1) The system must be able to acquire data measured by the mobile side's integrated sensors and HD cameras in real time and, following the TCP/IP and OTG communication protocols, send the collected data and the corresponding operation instructions to the server and the motion controller.
(2) The system should be able to visualize sensor data: to enable the user to observe specific sensor values, it must be able to display the data as dynamic text, and to visualize the trend of the data, it must be able to display the data as dynamic curves.
(3) To provide access to historical sensor data and video data, and to help the user further confirm the operating status of the heat pipe network inspection robot and the internal environment of the pipe, the system must be able to store the acquired data in a database and support querying and deleting historical data.
(4) The main difference between this thermal network monitoring system and other thermal network monitoring systems is its human-machine interaction: by parsing the XML file sent by the server and dynamically generating user-defined interfaces and background logic, the thermal network inspection robot can be controlled to achieve the functions set by the user.
(5) To ensure the security of user information, the system needs a user login interface, so that the monitoring system can be used only after the corresponding user name and password have been entered.
Based on this analysis of the functional requirements, the overall architecture of the monitoring software was designed. The main functions of the system are: to obtain sensor data and video data in real time and store them in an intermediate database; to implement basic functions such as listening for events (sensor listening events, SMS listening events, etc.) and sending and receiving broadcasts; and to provide a handling mechanism for system exceptions and user-code exceptions. When an exception occurs in the user's code, a corresponding dialog box pops up; from the information in the dialog box, the user can see the location and the cause of the error, which makes it easy to modify the code and prevents the monitoring system from crashing or exiting because of runtime errors, thereby ensuring its normal operation. The system reads data from the intermediate database at regular intervals to visualize (dynamic text display and dynamic curve display), store, and send the data. In the architecture of the heat pipe network monitoring system, the intermediate database, the basic functions, the exception handling, and the library functions form the underlying code, and the user can call the functions in the library to implement the corresponding logic, which reduces the difficulty of development.

3.3 Interface development
Based on the analysis of the functions and architecture of the monitoring software, its interface structure is divided into three modules: the login module, the monitoring module, and the user UI module.
The login module verifies the user's information; the monitoring software opens only when the user enters accurate information. When the entered information does not pass the background verification, the software prompts the user to re-enter it or to register an account until login succeeds.
The monitoring module is divided into six interfaces: the dynamic text display of data, the dynamic curve display of data, the video display, the sensor selection interface, the network connection interface, and the data query interface. In the dynamic text display and dynamic curve display interfaces, the data is refreshed once per second, and the video monitoring interface previews the video data collected by the mobile terminal in real time. To reduce the amount of data and allow the user to select the required sensor data according to specific project needs, the sensor selection interface was designed; to communicate with the server, the network connection interface was designed, where the user only needs to enter the corresponding IP address and port number to connect to the server and transfer data; and to let the user query historical data, the data query interface was designed.
The user UI module displays the interface that is dynamically generated by parsing the XML file sent by the server, enabling human-computer interaction. The interface design of this heat network monitoring system uses Activity and Fragment components. Since a Fragment takes up less memory than an Activity, the design uses the more lightweight Fragment to improve the running efficiency of the application. The interface takes the main interface as its core and dynamically loads the monitoring interface, the user interface, and so on; data is passed between the interfaces by binding objects of the Bundle class, as shown in Figure 4.

Figure 4: Interface interaction process of the heat pipe network monitoring system

The monitoring system can selectively display sensor data: the user selects the required sensors in the sensor type selection interface, and the system then jumps to the data display interface to visualize the data (dynamic text display and dynamic curve display) according to the selection. The process is as follows. First, the system converts the selected sensor types into a string, separating the different sensor names with the special symbol "%". The string is then bound to a Bundle object, and setArguments() is called to attach the Bundle object to the Fragment to be switched in; the system function for switching between Fragments is called to switch from the sensor selection interface to the data display interface; finally, getArguments() is called in the Fragment responsible for the data display to obtain the Bundle object, Bundle.getString() is called to retrieve the string passed from the sensor selection interface, and the string-processing function is called with the special symbol "%" as the delimiter. Each member of the resulting array is the name of a sensor selected by the user, and the system iterates through the array to obtain and display the real-time data of the selected sensors.

3.4 Interface design
Main interface design
The main interface of the thermal network monitoring system adopts a "segmented" structure: the top of the interface is the options bar, the bottom is the visualization bar, and the middle is used as a container to load different interfaces. The purpose of this design is to make the functions of the system clearer; the user can jump to the required function with a single tap, which makes the system easier to operate. The main monitoring interface is divided into two modules, the monitoring module and the user interface module, so the options bar at the top is divided into two sections, "Data Monitoring" and "User Interface". The visualization bar at the bottom is divided into three sections according to the functions of the monitoring module, "Text data", "Curve data", and "Video data", with a LinearLayout between the sections. The controls in the options bar and the visualization bar are not the basic controls provided by the mobile platform but developer-defined combinations of controls, with a uniform image at the top, text at the bottom, and a LinearLayout layout; the background color becomes lighter when a control is selected. The main interface was created by writing an XML layout file for the corresponding interface, which makes it easier to control the layout of the controls; the orientation and layout_weight properties of LinearLayout allow the combined controls to be distributed according to a given layout ratio. The main interface is visualized by loading the activity_main.xml file in the main Activity, as shown in the following code.

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    ......
    setContentView(R.layout.activity_main);
}

The onCreate() function is called when the Activity is initialized, and setContentView() is called inside it; this call is the core of the function, and its parameter, R.layout.activity_main, is the layout file of the main interface.

Design of the history data query interface
To make it easy for the user to view and delete historical data, an independent query interface was designed. The user can enter a start time and an end time in the edit boxes to query or delete data within a specific period as needed; when the queried period does not exist, the system prompts the user until a correct time is entered or the user returns. The data search process is shown in Figure 5. When the queried historical data exists, the heat pipe network monitoring system provides two ways of displaying it, namely text display and curve display.
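The "%"-delimited hand-off used in the sensor selection process described above is a simple join/split round trip. The sketch below is illustrative only: in the actual app the packed string travels inside an Android Bundle via setArguments()/getArguments(), which this plain-Python version omits, and the function names are ours.

```python
DELIMITER = "%"

def pack_selection(sensor_names):
    # Join the selected sensor names into the single string stored in the Bundle.
    return DELIMITER.join(sensor_names)

def unpack_selection(payload):
    # Split the Bundle string back into individual sensor names.
    return payload.split(DELIMITER) if payload else []

packed = pack_selection(["temperature", "humidity", "tilt"])
names = unpack_selection(packed)  # ["temperature", "humidity", "tilt"]
```

The data display Fragment would then iterate over the unpacked names, exactly as the text describes for the array obtained from Bundle.getString().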
Figure 5: Data Search Process

4 Usability testing
The thermal network surveillance system's usability testing analyses both the main interface and the historical data query interface to guarantee user satisfaction. The main interface is segmented, with an options bar for "Data Monitoring" and "User Interface", as well as a visualization bar that displays "Text data", "Curve data", and "Video data". This design provides easy access to operations and intuitive interaction via custom-designed controls. The historical data query interface allows users to enter start and end times for data retrieval and deletion, with error messages for invalid queries. Testing will concentrate on task completion times, error rates, and user feedback in order to validate efficiency and identify improvements.

4.2 Discussion
The presented multi-classification fault diagnosis model outperforms existing SOTA techniques, with a classification accuracy of 99.99% and a computation time of only 55.938 seconds. This enhancement is due to the algorithmic optimizations, particularly the gravitational search technique integrated with parameter tuning of the LSSVM. These optimizations boost the model's capacity to navigate the parameter space efficiently, leading to better classification performance than prior methods, which attained a maximum accuracy of 97.89% and required longer run times. This work makes a unique contribution by incorporating sophisticated optimization methods that not only raise accuracy but also decrease algorithmic intricacy, rendering the model more effective. While trade-offs between computation time and accuracy are common in machine learning, the proposed solution provides practical benefits by offering better accuracy without substantially reducing processing speed, establishing it as a feasible choice for real-time fault identification in a variety of uses.
The complexity analysis of optimization algorithms such as the gravitational search algorithm (GSA), the cuckoo algorithm (CA), and particle swarm optimization (PSO) focuses on their time complexity and resource needs. GSA has a time complexity of O(n·k), rendering it effective for moderate-sized problems, while CA exhibits O(n·log n), allowing rapid convergence on smaller datasets. PSO, with a complexity of O(n·m), can become expensive as dimensionality rises. These differences affect scalability: GSA's efficiency makes it suitable for larger heat pipe networks, whereas CA and PSO may incur higher computational overhead on large datasets, requiring optimizations for practical use.

5 Conclusion
In summary, the author has analysed the requirements of this heat pipe network monitoring system, focusing on the heat pipe network inspection robot, and completed the overall architecture design and basic interface design of the heat pipe network monitoring system. The specific functional requirements of the monitoring system were analysed as follows: user login, collection of sensor data, visualization of data, storage of data, communication, dynamic generation of user-defined interfaces and logic, and other functions. The overall architecture of the heat pipe network monitoring system was then designed. According to the functional requirements of the system, the data acquisition, basic functions, and system and user-code exception handling functions are unified and managed by the Service, which reduces code redundancy and facilitates later maintenance. Based on the functional requirements and architecture of the heat network monitoring system, the interface architecture of the system was designed, using an Activity as the carrier and dynamically loading Fragment layouts to achieve the system's interface interaction. Drawing on common login methods and the characteristics of this software, the login module, the data query module, and the main interface of the heat pipe network monitoring system were designed. The design of this monitoring system is important for improving monitoring efficiency.

Data Availability
All data are included within the article.

Conflicts of interest
The authors declare no conflicts of interest.

Funding statement
Not applicable.

References
[1] Shen, Y., Chen, J., Fu, Q., Wu, H., Wang, Y., & Lu, Y. (2021). Detection of district heating pipe network leakage fault using UCB arm selection method. Buildings, 11(7), 275. https://doi.org/10.3390/buildings11070275
[2] Perpar, M., & Rek, Z. (2020). Soil temperature gradient is a useful tool for small water leakage detection from district heating pipes in buried channels. Energy, 201, 117684. https://doi.org/10.1016/j.energy.2020.117684
[3] Al Qahtani, T., Yaakob, M. S., Yidris, N., Sulaiman, S., & Ahmad, K. A. (2020). A review of water leakage detection method in the water distribution network. Journal of Advanced Research in Fluid Mechanics and Thermal Sciences, 68(2), 152-163. https://doi.org/10.37934/arfmts.68.2.152163
[4] Gams, M., & Kolenik, T. (2021). Relations between electronics, artificial intelligence and information society through information society rules. Electronics, 10(4), 514. https://doi.org/10.3390/electronics10040514
[5] Li, W., Liu, T., & Xiang, H. (2021). Leakage detection of water pipelines based on active thermometry and FBG-based quasi-distributed fiber optic temperature sensing. Journal of Intelligent Material Systems and Structures, 32(15), 1744-1755. https://doi.org/10.1177/1045389x20987002
[6] Zholtayev, D., Dauletiya, D., Tileukulova, A., Akimbay, D., Nursultan, M., Bushanov, Y., ... & Yeshmukhametov, A. (2024). Smart pipe inspection robot with in-chassis motor actuation design and integrated AI-powered defect detection system. IEEE Access. https://doi.org/10.1109/access.2024.3450502
[7] Murtazin, I. I., Kozhevnikov, M. V., & Starikov, E. M. (2021). Development and application of methods of internal inspection of district heating networks. International Journal of Energy Production and Management, 6(1), 56-70. https://doi.org/10.2495/eq-v6-n1-56-70
[8] Wong, B., & McCann, J. A. (2021). Failure detection methods for pipeline networks: From acoustic sensing to cyber-physical systems. Sensors, 21(15), 4959. https://doi.org/10.3390/s21154959
[9] Liu, R., Zhang, Y., & Li, Z. (2022). Leakage diagnosis of air conditioning water system networks based on an improved BP neural network algorithm. Buildings, 12(5), 610. https://doi.org/10.3390/buildings12050610
[10] Korlapati, N. V. S., Khan, F., Noor, Q., Mirza, S., & Vaddiraju, S. (2022). Review and analysis of pipeline leak detection methods. Journal of Pipeline Science and Engineering, 2(4), 100074. https://doi.org/10.1016/j.jpse.2022.100074
[11] Yussof, N. A. M., & Ho, H. W. (2022). Review of water leak detection methods in smart building applications. Buildings, 12(10), 1535. https://doi.org/10.3390/buildings12101535
[12] Langroudi, P. P., & Weidlich, I. (2020). Applicable predictive maintenance diagnosis methods in service-life prediction of district heating pipes. Rigas Tehniskas Universitates Zinatniskie Raksti, 24(3), 294-304. https://doi.org/10.2478/rtuect-2020-0104
[13] van Dreven, J., Boeva, V., Abghari, S., Grahn, H., Al Koussa, J., & Motoasca, E. (2023). Intelligent approaches to fault detection and diagnosis in district heating: Current trends, challenges, and opportunities. Electronics, 12(6), 1448. https://doi.org/10.3390/electronics12061448
[14] Hossain, K., Villebro, F., & Forchhammer, S. (2020). UAV image analysis for leakage detection in district heating systems using machine learning. Pattern Recognition Letters, 140, 158-164. https://doi.org/10.1016/j.patrec.2020.05.024
[15] Vollmer, E., Ruck, J., Volk, R., & Schultmann, F. (2024). Detecting district heating leaks in thermal imagery: Comparison of anomaly detection methods. Automation in Construction, 168, 105709. https://doi.org/10.1016/j.autcon.2024.105709
[16] Kim, H., Lee, J., Kim, T., Park, S. J., & Kim, H. (2023). Advanced thermal fluid leakage detection system with machine learning algorithm for pipe-in-pipe structure. Case Studies in Thermal Engineering, 42, 102747. https://doi.org/10.2139/ssrn.4147041
[17] Pérez-Pérez, E. D. J., López-Estrada, F. R., Valencia-Palomo, G., Torres, L., Puig, V., & Mina-Antonio, J. D. (2021). Leak diagnosis in pipelines using a combined artificial neural network approach. Control Engineering Practice, 107, 104677. https://doi.org/10.1016/j.conengprac.2020.104677
[18] García-Ródenas, R., Linares, L. J., & López-Gómez, J. A. (2021). Memetic algorithms for training feedforward neural networks: an approach based on gravitational search algorithm. Neural Computing and Applications, 33(7), 2561-2588. https://doi.org/10.1007/s00521-020-05131-y
[19] Kazeminasab, S., & Banks, M. K. (2022). Towards long-distance inspection for in-pipe robots in water distribution systems with smart motion facilitated by a particle filter and multi-phase motion controller. Intelligent Service Robotics, 15(3), 259-273. https://doi.org/10.1007/s11370-022-00410-0

https://doi.org/10.31449/inf.v49i16.7779 Informatica 49 (2025) 213–234 213

Biometric-Based Secure Encryption Key Generation Using Convolutional Neural Networks and Particle Swarm Optimization
Sahera A. S. Almola, Raidah S. Khudeyer, Hameed Abdulkareem Younis
Department of Computer Information Systems, College of Computer Science and Information Technology, University of Basrah, Basrah, Iraq
E-mail: sahera.sead@uobasrah.edu.iq, raidah.khudayer@uobasrah.edu.iq, hameed.younis@uobasrah.edu.iq
*Corresponding author

Keywords: biometric verification, fingerprints, deep learning, particle swarm optimization (PSO) algorithm, encryption key generation

Received: December 7, 2024

With the rapid expansion of computer networks and information technology, ensuring secure data transmission is increasingly vital—especially for image data, which often contains sensitive information. This research presents a biometric-based encryption system that uses fingerprint recognition and deep learning to generate strong, random encryption keys. Two convolutional neural networks (CNNs) are employed: one to verify identity based on a user's ID and another to extract fingerprint features for key generation. These keys are optimized using Particle Swarm Optimization (PSO), enhancing their randomness and resistance to brute-force attacks. The system generates keys in real time, eliminating the need for storage and minimizing the risk of theft or leakage. To further improve security, encryption keys are automatically updated after every ten messages, with different keys generated from multiple fingerprints of the same individual. Testing with the SOCOFing dataset (6,000 original and 49,270 synthetic images) achieved 99.75% identity verification and 99.83% classification accuracy. Performance metrics—entropy of 7.89, correlation factor of 0.00628, and zero repetition—demonstrate high robustness.
This approach offers a secure, adaptive, and personalized encryption method ideal for sensitive domains such as finance and healthcare.

Povzetek: Opisana je izvirna metoda za generiranje varnih šifrirnih ključev z uporabo prstnih odtisov, CNN modelov in optimizacije roja delcev (PSO).

1 Introduction

Internet and network users share millions of color images daily, which are utilized in various applications such as telemedicine, remote learning, business, and military operations. Color images, in particular, often contain sensitive and detailed information, making them prime targets for unauthorized access and cyberattacks. Securing these images is crucial not only to prevent data loss during transmission but also to protect sensitive information from attackers. Various techniques are employed to secure digital images, such as watermarking, steganography, and image encryption. Encryption operates in two main stages: encryption and decryption. During encryption, the input image is transformed into an unreadable form using a secret key, while in decryption the content is restored using the same key [1].

The encryption key is a fundamental element in the encryption and decryption processes, and it significantly determines the security system's strength. However, a critical challenge faced by encryption systems lies in managing the encryption key itself [2]. Traditional encryption methods require transmitting the encryption key to the recipient to decrypt the data. This approach introduces vulnerabilities, as any exposure of the key during transmission could lead to the compromise of the encrypted data. Consequently, there is an increasing need for systems that dynamically generate encryption keys on demand at the user's end, eliminating the need for key transmission over networks [3]. This approach ensures that the encryption key is generated locally each time data is decrypted, significantly reducing the risks associated with key interception. It also eliminates the need for key exchange, adding an extra layer of security, since unauthorized parties cannot generate the key even if communication is intercepted.

The keyless exchange method, when combined with biometric verification, offers a highly secure solution by minimizing the risk of key theft. This approach aligns with the methodology presented in this research. However, implementing such a solution poses significant challenges in the fields of secure computing and key management, as it requires a robust system to ensure the consistent and accurate generation of keys [4]. The importance of this research lies in emphasizing the generation of encryption keys locally at the user's end to safeguard data and mitigate the risks associated with key transmission over networks. This is particularly critical for securing color images, as their high information content often correlates with increased sensitivity, making them especially vulnerable to sophisticated attacks.

To address these challenges, advanced techniques based on artificial intelligence and machine learning, particularly deep learning, have emerged. One notable technique involves using deep learning to generate encryption keys from fingerprints. This method leverages the extraction of unique features from fingerprints, converting them into robust, non-repetitive encryption keys to ensure high data security [5], and it addresses limitations of traditional encryption systems, such as the need for key transmission over networks. Since a fingerprint is a unique biometric identifier that cannot be easily copied or mimicked, it serves as an ideal source for generating encryption keys. Moreover, deep learning enhances the accuracy and strength of the generated keys by utilizing deep neural networks to analyze biometric images and extract unique features from each fingerprint [6]. This approach also resists advanced threats, including brute-force and quantum attacks, by dynamically generating encryption keys in real time. The added layer of complexity and secrecy prevents unauthorized parties from accessing the keys, even if communication data is partially intercepted [7]. The integration of deep learning in generating encryption keys from fingerprints represents a significant advancement in information security. It combines robust security measures with individual privacy, paving the way for encryption systems that are highly resistant to breaches and better equipped to address modern security challenges.

The remainder of this paper is organized as follows: Section 2 reviews related works, while Section 3 provides background on the key techniques utilized in this research. Section 4 explains the management of secret keys. Section 5 details the proposed method. Section 6 focuses on experimental results and performance analysis. Section 7 discusses the results, and Section 8 concludes this study.

2 Related works

The integration of biometric data, chaotic systems, and deep learning in encryption key generation has been a prominent research area, and various studies have explored innovative approaches to enhance the security and robustness of encryption systems. Hashem and Kuban (2023) [8] introduced a system that leverages fingerprint biometrics to generate long, random encryption keys. The approach involves preprocessing fingerprint images to remove noise, utilizing a modified VGG-16 convolutional neural network (CNN) to extract unique features, and employing transfer learning to build a key generation model without the need for retraining. Erkan et al. (2024) [9] proposed a secure image encryption framework that combines a chaotic logarithmic map with a deep CNN for key generation. Their system incorporates advanced operations such as permutation, DNA encoding, diffusion, and bit-reversal to ensure security. The robustness of this framework was validated through comprehensive analyses, including key sensitivity and resistance to various attacks, demonstrating superior performance compared to traditional encryption methods. Quinga Socasi, Zhinin-Vera, and Chang (2020) [10] developed a method for generating encryption keys from alphanumeric passwords using an autoencoder neural network. Their experiments revealed that this method outperforms conventional algorithms, particularly when encrypting small text files, making it highly resistant to cracking attempts. Wu et al. (2022) [11] presented a biometric key generation framework that uses fingerprints to achieve over 1024-bit key strength and 98% accuracy; however, their method depends on a predefined pipeline and fuzzy extractors for key stabilization. In contrast, the method proposed in this research dynamically extracts high-resolution fingerprint features using deep learning models, ensuring greater adaptability across datasets. These features are combined with chaotic encryption systems to enhance randomness and security. Furthermore, Particle Swarm Optimization (PSO) is employed to optimize the generated keys, achieving over 99% accuracy and producing 1024-byte keys without requiring stabilization layers, which demonstrates superior flexibility and security for real-world IoT applications. Alesawy and Muniyandi (2016) [12] investigated data security in cloud environments using random encryption keys. Their study analyzed the impact of incorporating Elliptic Curve Diffie-Hellman (ECDH) keys and demonstrated significant improvements in efficiency and performance by integrating Artificial Neural Networks (ANNs) with ECDH and genetic algorithms, despite increased processing times for larger datasets. Saini and Sehrawat (2024) [13] proposed a technique for generating unique encryption keys by combining an autoencoder network with hashing techniques and prime numbers derived from the MNIST dataset. To enhance security, their system incorporates XOR operations and Blum-Blum-Shub (BBS) generators; extensive testing confirmed the robustness of this approach against attacks. Kurtninykh, Ghita, and Shiaeles (2021) [14] addressed the complexities of cryptographic key management in systems with increasing users and applications. They evaluated five key management systems, including Hashicorp Vault and Pinterest Knox, focusing on features such as security, scalability, and access control, and concluded that Hashicorp Vault is particularly suitable for small businesses due to its superior security features. A summary of the related studies is provided in Table 1 for further reference.

Table 1: Previous works on key generation

This research builds upon the foundations laid by these studies, emphasizing the dynamic generation of encryption keys using deep learning and chaotic systems to address challenges in key management and enhance security. The comparison in Table 1 demonstrates the advantages of the proposed method over the previous approaches: it utilizes dynamic keys generated by deep learning networks, which significantly enhance randomness and security, and the key is non-portable, non-persistent, and achieves the largest size and highest accuracy compared to other methods.

3 Background

This section addresses the two main techniques that form the foundation of the methodology proposed in this research: CNNs and PSO. The following paragraphs provide a summary of each technique and explain its significance in the study.

A. CNNs are advanced models in the field of deep learning, specifically designed to handle grid-like data, such as images.
In this research, two CNN models were used to generate an encryption key based on fingerprint images. Table 2 summarizes the components of each model used in the work.

Table 2: Components of CNN models used

Layer (type) | Output shape | Parameters (#)
Conv2D (conv2d_1) | (None, 92, 92, 32) | 832
BatchNormalization (batch_normalization_1) | (None, 92, 92, 32) | 128
MaxPooling2D (max_pooling2d_1) | (None, 46, 46, 32) | 0
Conv2D (conv2d_2) | (None, 42, 42, 64) | 51,264
BatchNormalization (batch_normalization_2) | (None, 42, 42, 64) | 256
MaxPooling2D (max_pooling2d_2) | (None, 21, 21, 64) | 0
Conv2D (conv2d_3) | (None, 19, 19, 128) | 73,856
BatchNormalization (batch_normalization_3) | (None, 19, 19, 128) | 512
MaxPooling2D (max_pooling2d_3) | (None, 9, 9, 128) | 0
Dropout (dropout_1) | (None, 9, 9, 128) | 0
Flatten (flatten_1) | (None, 10368) | 0
Dense (dense_1) | (None, 1024) | 10,617,856
Dropout (dropout_2) | (None, 1024) | 0
Dense (dense_2) | (None, 600) | 615,000

The first model was designed to identify a person's identity based on their ID number. After confirming the person's identity, the second model identifies the selected fingerprint and extracts its features. Both models rely on convolutional layers to automatically and progressively extract important features from the input data, which in turn aids in generating strong encryption keys by analyzing fine patterns in the images. The two models were trained using the backpropagation technique with a loss function suited to each task. This architectural design was chosen to achieve accurate performance in recognizing the identity of the fingerprint owner through the identifier number in the file name, and then to generate an encryption key based on the unique features of the fingerprint.
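The parameter counts in Table 2 follow from the standard layer formulas and can be recomputed directly. This is an illustrative sketch, not the authors' code; the 96×96 grayscale input and the 5×5, 5×5, and 3×3 kernel sizes are inferred from the output shapes rather than stated in the text:

```python
# Recomputing the Table 2 parameter counts from the layer shapes.
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one (kernel_h x kernel_w x in_channels) weight tensor plus a bias per filter
    return filters * (kernel_h * kernel_w * in_channels + 1)

def dense_params(in_units, out_units):
    # weight matrix plus one bias per output unit
    return out_units * (in_units + 1)

def batchnorm_params(channels):
    # gamma, beta, moving mean, moving variance
    return 4 * channels

table2 = {
    "conv2d_1": conv2d_params(5, 5, 1, 32),        # 832
    "batch_normalization_1": batchnorm_params(32), # 128
    "conv2d_2": conv2d_params(5, 5, 32, 64),       # 51,264
    "conv2d_3": conv2d_params(3, 3, 64, 128),      # 73,856
    "dense_1": dense_params(9 * 9 * 128, 1024),    # 10,617,856
    "dense_2": dense_params(1024, 600),            # 615,000
}
```

The pooling, dropout, and flatten layers contribute no trainable parameters, matching the zeros in the table.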
Pseudo-code for the PSO algorithm

1. Initialize parameters:
   - Define bounds: lower bound (lb) = 0, upper bound (ub) = 255.
   - Set PSO parameters: number of particles = len(keys), maximum iterations = 200, inertia weight (w) = 0.9, cognitive coefficient (c1) = 0.5, social coefficient (c2) = 0.5.
   - Set the random seed for reproducibility.
2. Initialize particles:
   - Convert keys to a NumPy array.
   - Set initial particle positions = keys; set initial velocities = zeros.
   - Initialize personal bests: personal best positions = initial positions; personal best scores = fitness of each particle.
   - Find the global best: global_best_position = position with the best score; global_best_score = best personal score.
3. Run PSO optimization. For each iteration in range(num_iterations):
   - For each particle:
     - Update velocity: new_velocity = (w × current velocity) + (c1 × random factor × (personal best − current position)) + (c2 × random factor × (global best − current position)).
     - Update position: new_position = current position + new_velocity; clip positions to the bounds (lb, ub).
     - Evaluate the fitness of the new position and update the personal best position and score.
   - Update the global best: if any particle's score is better than the global best score, update the global best position and score.
4. Output results:
   - Convert global_best_position to integers (best_key).
   - Compute best_entropy_value using the fitness function.

Figure 1: PSO algorithm

B. PSO (particle swarm optimization) is an optimization algorithm inspired by the collective behavior of flocks of birds or schools of fish. It involves a group of particles, each representing a potential solution in the solution space. Each particle adjusts its movement based on its own experience and the experiences of neighboring particles, with the aim of reaching the optimal solution. PSO is known for its efficiency and its ability to find optimal solutions in multi-dimensional spaces. In this research, PSO is applied to optimize the process of encryption key generation: the algorithm is used to enhance the quality of the initial key, making it stronger and more secure. Figure 1 illustrates the detailed steps of the PSO algorithm in pseudo-code. The pseudocode reflects the essence of PSO applied to optimizing encryption keys against a fitness function (such as randomness or security): the process iteratively adjusts the position (key) and velocity of each particle to find an optimal, highly secure encryption key. By fine-tuning the key parameters in real time, PSO makes the key generation process more robust against potential security threats. This adds an extra layer of security, ensuring that the keys are not only unique and non-repetitive but also resilient to various forms of attack. The use of PSO ensures that the final encryption keys are both optimized for security and generated dynamically, without the need for permanent storage, thus reducing the risks of key leakage or unauthorized access.
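The pseudo-code of Figure 1 can be rendered as a compact, dependency-free implementation. This is an illustrative sketch, not the authors' code: the fitness function here is the Shannon entropy of the byte values, one plausible reading of the "randomness" objective, and the global best is refreshed inside the particle loop.

```python
# Minimal PSO over candidate byte keys, following the structure of Figure 1.
import math
import random

LB, UB = 0, 255          # byte-value bounds
W, C1, C2 = 0.9, 0.5, 0.5  # inertia, cognitive, social coefficients

def entropy(key):
    """Shannon entropy (bits) of the byte histogram of a key."""
    counts = {}
    for v in key:
        b = int(v) & 0xFF
        counts[b] = counts.get(b, 0) + 1
    n = len(key)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def pso_optimize(keys, iterations=200, seed=0):
    rng = random.Random(seed)
    pos = [list(map(float, k)) for k in keys]
    vel = [[0.0] * len(k) for k in keys]
    pbest = [p[:] for p in pos]
    pbest_score = [entropy(p) for p in pos]
    g = max(range(len(pos)), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(iterations):
        for i, p in enumerate(pos):
            for d in range(len(p)):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (W * vel[i][d]
                             + C1 * r1 * (pbest[i][d] - p[d])
                             + C2 * r2 * (gbest[d] - p[d]))
                p[d] = min(max(p[d] + vel[i][d], LB), UB)  # clip to bounds
            score = entropy(p)
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = p[:], score
                if score > gbest_score:
                    gbest, gbest_score = p[:], score
    # step 4 of the pseudo-code: round the best position to integer bytes
    best_key = bytes(int(round(v)) for v in gbest)
    return best_key, gbest_score
```

Since personal and global bests only ever improve, the returned score is at least the entropy of the best initial candidate.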
4 Secure key management

Secure key management is a critical process for ensuring the protection of encrypted data across encryption systems. In the proposed methodology, the focus is on generating cryptographic keys in real time without permanent storage, thus reducing the risks associated with key leakage. However, temporary handling and protection of keys during their lifecycle remain essential. Below is a detailed explanation of the steps and importance of secure key management, updated to reflect the real-time generation approach:

1. Key generation: In the proposed system, keys are generated dynamically and in real time using advanced techniques such as artificial neural networks, particularly convolutional neural networks (CNNs). This ensures that the keys are both highly secure and non-repetitive, avoiding the need for long-term storage. The keys are designed to be sufficiently random and robust, minimizing the possibility of guessing or tampering.

2. Temporary key handling: While keys are not stored permanently, they are managed securely during their temporary existence within the system. During encryption or decryption, the keys are held in memory under strict safeguards, such as memory encryption or secure enclaves, to prevent unauthorized access. Once the operation is complete, the keys are securely erased from the system to eliminate any residual risk.

3. Key distribution: Since the system eliminates the need for traditional key exchange, the reliance on secure protocols such as SSL/TLS or Diffie-Hellman for key distribution is significantly reduced. Instead, the generated key remains local to the system, mitigating the risks associated with interception during transmission [16].

4. Key rotation: In systems where keys are reused across multiple sessions or extended periods, regular key rotation is critical. In the proposed system, however, each key is uniquely generated for a specific session or operation, inherently providing the benefits of key rotation by design.

5. Key revocation: Although the system minimizes the use of persistent keys, mechanisms for immediate key invalidation remain essential for scenarios involving session-based or temporarily stored keys. These mechanisms ensure that any exposed or misused keys are rendered unusable promptly [17].

6. Importance of key management in real-time systems: The proposed approach emphasizes secure key handling during the active lifecycle of keys. By avoiding permanent storage and focusing on real-time generation and temporary protection, the system significantly reduces the risks associated with key leakage or unauthorized access. This aligns with best practices in modern cybersecurity by combining the advantages of real-time key generation with robust temporary key management to ensure the highest level of data protection throughout the encryption process [16].

5 Proposed method

Figure 2 presents the diagram of the proposed encryption key management and generation. The diagram consists of three stages: securing communication and transferring confidential information; generating the encryption key using CNN and encrypting the image; and generating the encryption key using CNN and decrypting the image.

Figure 2: Proposed method diagram

The proposed method consists of three main parts. The first part begins with an algorithm for securing communication and managing encryption keys. This is followed by the second part, which involves generating the encryption key and encrypting the image. Finally, the third part focuses on decrypting the image after the key has been generated. Each of these parts is explained in detail below.

Part One: Securing communication and managing confidential information transfer

The first part of Figure 2 illustrates an algorithm designed to ensure secure communication and reliable key management between the branches and the main branch. When a branch requests access to sensitive information (such as encrypted images), the main branch fulfills this request by sending the requested information after encrypting it with a secure key, ensuring data protection during transmission. The user ID is used to control access.

Algorithm execution steps

The algorithm is executed in cooperation with the other two parts of the diagram, as follows:

1. Starting the process (start): The process begins by initializing the user's counter, Counter[ID], to zero.

2. Entering the ID number: The system prompts the user to input their identification number to verify their identity.

3. Verifying the ID range (ID in 1..600): The system checks whether the entered ID falls within the allowed range (1 to 600).
- If the number is outside the range, an error message is displayed and the user is asked to re-enter the ID.
- If the number is valid, the process moves to the next step.

4. Checking the match with the exit indicator (ID in exit): The system compares the entered ID with the exit indicator list.
- If a match is found, the process is terminated.
- If no match is found, the process continues to the next step.

5. Incrementing the message counter (Counter[ID] += 1): If the ID is valid and not listed in the exit indicator, the user's message counter is incremented by 1.

6. Managing the number of sent messages (dynamic key management): The system checks whether the number of messages sent by the user has exceeded the allowed limit (10 messages).
- If the limit is exceeded, the counter is reset to 1.
- If the limit is not exceeded, the current counter value is used as an index for generating the encryption key.
This mechanism ensures unique encryption keys for each set of messages, enhancing data security. It also raises a critical question: "Can biometric fingerprint data generate dynamic encryption keys resistant to quantum attacks?" The approach aims to strengthen the security of biometric keys against such advanced threats.

7. Sending the request to the branch (send request to branch): The request containing the ID and the fingerprint index (P) is sent to the second branch for processing.
- In the second part, a key is generated for image encryption and the encryption process is executed. After encryption, the encrypted image is sent back to the first part.
- In the third part, a new key is generated to decrypt the image. Once decryption is completed, the data is returned to the first part for the remaining steps.

Note: The details of the second and third parts are explained in the following sections for a precise and comprehensive understanding. In this way, the three parts form an integrated system that ensures secure communication and the safe transmission of sensitive information.

Algorithm features
- Biometric security: Fingerprints are used to verify user identities, which reduces the risk of unauthorized access.
- Synchronization: The system relies on concurrent processing, improving efficiency and reducing response times for requests.
- Dynamic key management: Each key is generated uniquely for each user based on their fingerprint, increasing the difficulty of breaching the system.

This algorithm ensures effective protection of encrypted data and enhances the security of communications between branches, making it well suited to systems that require a high level of security and privacy.
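The counter logic of steps 1 through 6 can be sketched as follows. This is a minimal illustration, not the authors' code: the in-memory `counters` dict and the `exit_list` argument are hypothetical stand-ins for the system's user registry and exit indicator.

```python
# Sketch of the Part One counter logic: validate the ID, update its message
# counter, and return the counter value used as the key/fingerprint index.
MESSAGE_LIMIT = 10
counters = {}  # stand-in for the per-user Counter[ID] store

def key_index_for(user_id, exit_list=()):
    if not 1 <= user_id <= 600:          # step 3: ID must be in 1..600
        raise ValueError("ID must be in 1..600")
    if user_id in exit_list:             # step 4: terminate the session
        return None
    counters.setdefault(user_id, 0)      # step 1: counter starts at zero
    counters[user_id] += 1               # step 5: increment the counter
    if counters[user_id] > MESSAGE_LIMIT:
        counters[user_id] = 1            # step 6: reset after 10 messages
    return counters[user_id]
```

The returned index cycles through 1..10, so a fresh key is derived for each message and the cycle restarts after the limit, matching the dynamic key management described above.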
Part Two: Encryption key generation using CNN and image encryption

The encryption key is generated using a CNN based on the fingerprint. This process is carried out as specified in Part 2 of the diagram, which includes the following operations:

1. Database loading phase: This is one of the main preparatory phases in the system, ensuring the readiness of the data and models required to achieve accuracy and security in encryption key generation. In this research, the SOCOFing database was used, which contains fingerprints from 600 people of African descent, with each person having 10 fingerprints, for a total of 6,000 original fingerprints. Additionally, synthetic groups were created with three levels of variation in the fingerprints: minor changes (Easy), medium changes (Medium), and significant changes (Hard). The total number of synthetic fingerprints used in training was approximately 49,270. The variation fingerprints were used for training the model, while the original fingerprints were used solely for testing.

2. Data preprocessing phase: The following processes are included:

3. Image size standardization: To ensure that all images in the database are compatible with the model requirements, the dimensions of all images are standardized. A common size, such as 96×96 pixels, is chosen to prepare the images for efficient model processing. The resizing can be expressed mathematically as shown in Equation (1):

I′(x′, y′) = I(x′/Sx, y′/Sy)    (1)

where I(x, y) is the original image, I′(x′, y′) is the image after resizing, and Sx and Sy are the scaling factors along the image dimensions [18].

4. Image enhancement using histogram equalization: The histogram equalization technique was applied to enhance contrast in fingerprint images and highlight fine details. This technique is one of the fundamental methods in image processing and quality enhancement, aiming to improve the distribution of grayscale levels in the image so that fine details become more visible. In images with low contrast, gray values may cluster within a narrow range, leading to the loss of fine details in dark or bright areas. Histogram equalization addresses this issue by spreading these gray values over a broader range of the available levels, enhancing contrast and making details easier to detect. The tonal gradients of the image are adjusted using Equation (2) [19]:

H′(I) = round( (CDF(I) − CDFmin) / ((N×M) − CDFmin) × (L − 1) )    (2)

The histogram equalization process involves several key parameters that affect the final outcome of the operation:

1. Cumulative distribution function (CDF): This is the primary factor that determines how grayscale values are redistributed in the image. The CDF accumulates grayscale counts progressively from the lowest value to the highest and is used to adjust the distribution: through this function, the grayscale value distribution in the image is calculated, and adjustments are made to spread these values evenly across the tonal range.

2. Minimum non-zero value (CDFmin): This is the smallest non-zero value in the cumulative distribution function. It determines how the grayscale values in the image will be shifted to achieve a more balanced distribution. For example, if the grayscale values in the image are concentrated around a particular value, using this minimum helps improve the distribution of those values without significantly affecting the overall contrast of the image.

3. Image size (N×M): This is the number of pixels in the image. The larger the image (i.e., the greater N×M), the more opportunity there is to redistribute grayscale values accurately; however, image size also affects processing speed, as larger images require more computation.

4. Number of gray levels (L): Typically L = 256 in grayscale images (256 possible tonal levels ranging from 0 to 255). The number of gray levels defines the range over which values can be distributed. In images with many gray levels, tonal gradations can be distributed more evenly, leading to better contrast enhancement.

When this technique is applied, the range of grayscale values in the image is expanded and the values are distributed evenly, increasing contrast. The enhanced contrast reveals fine details, such as the minutiae in fingerprints, which might be poorly visible in low-contrast images. For fingerprints, fine details such as ridges and patterns are often crucial for analysis and classification; by using histogram equalization, the clarity of these details can be improved, aiding feature extraction and yielding higher performance in fingerprint recognition systems. Figure 3 shows an example of fingerprints before and after contrast enhancement using histogram equalization; the enhanced images display finer and clearer details than the originals.

Figure 3: (a) Original image (b) Image after histogram equalization
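The mapping of Equation (2) can be illustrated on a toy grayscale image. This sketch uses plain Python lists rather than the full image pipeline, and is an illustration rather than the authors' code:

```python
# Histogram equalization per Equation (2) on a list-of-lists grayscale image.
def equalize(image, levels=256):
    flat = [p for row in image for p in row]
    n = len(flat)                       # N x M, the pixel count
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # cumulative distribution function over the gray levels
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    cdf_min = min(c for c in cdf if c > 0)   # smallest non-zero CDF value

    def h_prime(v):
        # Equation (2): stretch to the full [0, L-1] range
        return round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1))

    return [[h_prime(p) for p in row] for row in image]
```

For example, a 2×2 image with the clustered values 52, 55, 61, 59 is stretched so that its lowest value maps to 0 and its highest to 255.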
5. Database analysis phase: After image enhancement, each fingerprint is analyzed to identify distinctive features, such as patterns and key regions within the fingerprint. This helps prepare the data so that the models can understand the unique elements of each fingerprint. The goal of this process is to efficiently analyze the fingerprint database and extract the information needed to feed the two different models. This is achieved by parsing the file name to extract the individual's identity, the finger type, and the hand (right or left). These steps prepare the data for the two models, allowing the first model to recognize the individual's identity while the second identifies the finger type from the fingerprint information. After applying these processes to the database, the data is divided into training and testing sets: artificial fingerprint data is used for training, while the original data is used for testing.

6. Model building phase: After preparing the database, two models are built using CNNs. The first model identifies the person's identity (SubjectID), while the second determines the finger number (FingerNum) and extracts the distinctive features of the finger. Each model consists of the layers shown in Table 2, and the hyperparameters used in the models are listed in Table 3.

Table 3: CNN hyperparameters configuration

7. Training and evaluation phase of the two models: The performance of both models is evaluated using standard metrics such as accuracy, validation, and error rate. This ensures that the first model can accurately verify the identity of authorized individuals when an encryption key is requested. Similarly, the second model's performance is assessed to determine its ability to correctly identify the fingerprint belonging to the individual whose identity has been verified. This evaluation is done using the test set.

8. Identity verification and key generation: The identity of the individual and the match between the fingerprint and the registered name are verified using the two deep learning models. This is a key step in generating the encryption key from fingerprints, as illustrated in Figure 4.

9. Key optimization stage using PSO: To enhance the quality of the initial key and obtain a stronger, more secure key, the PSO algorithm is applied. It improves the random distribution and security properties of the key, increasing its randomness and making it difficult to guess or break. The optimization updates the positions and velocities of particles derived from the individual's fingerprints, as illustrated in Figure 1. This continuous update of the keys, leveraging the best personal and global positions, produces an encryption key that is more secure and complex. It also raises the question: "How does the proposed system perform against statistical attacks?" The approach aims to reduce the likelihood of the keys exhibiting repetitive patterns that could be exploited in statistical attacks. Table 4 lists the hyperparameters used in the optimization algorithm, selected through a series of experimental trials.
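Step 1 of the verification pseudo-code in Figure 4 defines a finger-name mapping; it can be rendered directly as Python. The finger order (little, ring, middle, index, thumb) is taken as listed in that pseudo-code; this is a sketch, not the authors' code:

```python
# show_fingername from Figure 4, step 1: finger numbers 0-4 are the left
# hand, 5-9 the right; the order follows the pseudo-code's example list.
FINGERS = ["little", "ring", "middle", "index", "thumb"]

def show_fingername(fingernum):
    if fingernum >= 5:
        hand, fingernum = "right", fingernum - 5
    else:
        hand = "left"
    return f"{hand} {FINGERS[fingernum]}"
```

For instance, finger number 0 maps to "left little" and finger number 9 to "right thumb".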
Pseudo-code for verification and encryption key generation

1. Initialize the finger name function:
   - Define show_fingername(fingernum):
     - If fingernum >= 5, set hand = "right" and subtract 5 from fingernum; otherwise set hand = "left".
     - Map fingernum to finger names (e.g., little, ring, middle, index, thumb).
     - Return the full finger name (hand + finger).
2. Verify fingerprint information:
   - Predict the subject ID and finger number for a random fingerprint (rand_fp_num) from the test set using the models: Id_pred = predicted subject ID; Id_real = actual subject ID; fingerNum_pred = predicted finger number; fingerNum_real = actual finger number.
   - Check the predictions: if both the IDs and the finger numbers match, print "Information confirmed" with the subject ID and call show_fingername(fingerNum_pred) to get the finger name; otherwise print "Prediction is wrong."
3. Extract candidate fingerprints:
   - Initialize lists keys1 (original fingerprints) and keys2 (dense-layer outputs).
   - For each index i in the prediction range: get Id_check = predicted subject ID; if Id_check == Id_pred, append the fingerprint to keys1 and the dense-layer output to keys2.
   - Convert keys1 and keys2 to arrays.
4. Select the target fingerprint:
   - Use index p1 to select original_fp = keys1[p1] and dense_output_finger_selected = keys2[p1].
5. Apply data augmentation:
   - Define an image data generator (datagen) with transformations: rotation, width/height shift, shear, zoom, and horizontal flip.
   - Reshape original_fp to fit the generator's input format.
6. Generate augmented fingerprints and keys:
   - Use datagen to create 20 augmented fingerprints; for each one, predict the dense-layer output, take the absolute values of the output to create a key, and append the key to the keys list.
7. Return results:
   - Output the list keys, to be used as input to the PSO algorithm to find the optimal key for encryption.

Figure 4: Pseudocode for the identity verification and key generation process

Table 4: Hyperparameters of PSO

10. Image encryption stage and sending the encrypted image: Chen's chaotic system is a three-dimensional dynamic system that exhibits chaotic behavior. It is based on nonlinear differential equations that represent the evolution of the state over time and can be used to generate a chaotic encryption key from the system's state. The Chen system is described by the following equations for the variables x, y, and z:

dx/dt = a(y − x)    (3)
dy/dt = (c − a)x − xz + cy    (4)
dz/dt = xy − bz    (5)

where x, y, and z are the variables that determine the state of the chaotic system at time t, and a, b, and c are the parameters that control the behavior of the system.
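A chaotic keystream derived from Equations (3)–(5) can be sketched without dependencies. The paper integrates the system with SciPy's odeint; this illustration substitutes a fixed-step Euler scheme, and the parameter values a = 35, b = 3, c = 28 are the classic Chen settings, assumed here rather than taken from the text:

```python
# Euler integration of the Chen system (Equations 3-5), mapping each state
# to a byte, plus the XOR combination used for encryption and decryption.
def chen_keystream(length, a=35.0, b=3.0, c=28.0, dt=0.001,
                   state=(1.0, 1.0, 1.0), transient=1000):
    x, y, z = state                      # initial conditions [1.0, 1.0, 1.0]
    out = []
    for i in range(transient + length):
        dx = a * (y - x)
        dy = (c - a) * x - x * z + c * y
        dz = x * y - b * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        if i >= transient:
            # map the state to 0..255, echoing the "multiply by 255" step
            out.append(int(abs(x) * 255) % 256)
    return bytes(out)

def xor_bytes(data, key):
    # XOR is self-inverse: applying the same key twice restores the data
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))
```

In the proposed pipeline the chaotic bytes would be XORed with the CNN-derived key to form the final key; applying `xor_bytes` with that same key decrypts the image, as described in Part three.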
Decryption: To decrypt the image, the same key (the chaotic key and the key generated from the CNN) is used to perform an XOR operation on the encrypted image, restoring the original image.

Steps followed:
- Initial conditions: The process starts by defining the initial values for x, y, and z, which represent the state of the system at the beginning of the simulation. These values are set in the code as [1.0, 1.0, 1.0].
- Numerical integration: The odeint function is used for numerical integration to solve the differential equations over time. Through this process, the values of x, y, and z are updated at each time step, based on the parameters a, b, and c that influence the system's behavior.
- Generating a chaotic sequence: A chaotic sequence is generated by solving the differential equations of the Chen chaotic system over multiple time steps. This sequence is then used to generate a chaotic encryption key.
- Encryption key generation: The resulting chaotic sequence is converted into integer values ranging from 0 to 255 to represent color values in an RGB image. This is done by multiplying each value in the sequence by 255 and converting it to the uint8 data type.

6 Results and analysis

This section of the research addresses four main axes: evaluating the CNN, assessing the generated key, evaluating the PSO algorithm, and finally comparing the results of the proposed method with similar methods.

A. Results of CNN models and performance analysis

At this stage, the data was divided into training and testing sets to ensure the accuracy of the models in predicting and distinguishing between different categories. 80% of the data was allocated for training and 20% for testing, ensuring an equal distribution of categories in both sets to avoid bias. After training the models on the training set, their performance was tested using the test data.
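The 80/20 split with equal category proportions described above is a stratified split; in practice it is typically done with a library utility such as scikit-learn's train_test_split(..., stratify=labels). A minimal pure-Python equivalent, for illustration only (the toy labels below are not the SOCOFing data):

```python
# Sketch of a stratified 80/20 split: each class contributes the same
# proportion to the training and test sets, avoiding category bias.
from collections import defaultdict
import random

def stratified_split(labels, train_frac=0.8, seed=0):
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = int(round(train_frac * len(idxs)))
        train_idx.extend(idxs[:cut])   # 80% of each class to training
        test_idx.extend(idxs[cut:])    # remaining 20% to testing
    return train_idx, test_idx

labels = ["id1"] * 50 + ["id2"] * 50   # toy labels: two balanced classes
train, test = stratified_split(labels)
```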
The accuracy of the models was calculated using the accuracy function available in the TensorFlow library, which represents the ratio of correct predictions to the total number of predictions. To continuously monitor performance, TensorBoard was used, which helped track metrics such as accuracy and loss throughout the training and testing phases, allowing for ongoing improvements to the models based on these indicators. The results obtained showed varying performance across the different models; these results are summarized in Tables 5, 6, and 7, which illustrate accuracy and loss across different generations, and the graphs in Figures 5, 6, and 7 show the evolution of the models' performance over time, highlighting their ability to learn and improve progressively. The false positive rate was 0.000185.

Table 5: Model performance comparison: accuracy and loss

- Combining the chaotic key with the generated key: The chaotic key is combined with the key generated using the CNN through an XOR operation. This step increases the complexity of the final key used for image encryption.
- Encrypting the image: The XOR operation is applied between the original image and the final key to generate the encrypted image. This operation transforms the pixel values of the image into new values based on the chaotic key.

Part three: key generation using CNN and image decryption

This part is similar to the stages in Part 2, with the only difference being that the models are not built and trained again; instead, the previously saved models are loaded. Additionally, there is a decryption stage instead of an encryption stage. The stages in this part are as follows:
- Data loading phase: Only the test set is loaded (i.e., 600 genuine fingerprints from SOCOFing).
- Data preprocessing phase: The raw data is processed to prepare it for the next stage.
- Database analysis phase: The data is analyzed to extract the necessary information.
- Loading the saved models: The previously trained models are loaded.
- Verification and key generation: The data is verified, and the key is generated.

Table 6: Classification report for finger recognition

Figure 6: Accuracy and loss of the fingerprint model

Table 7: Classification report for subjectID recognition

Figure 7: Confusion matrix and accuracy metric

B. Encryption key evaluation results and metrics

The generated encryption key was evaluated using a set of specialized metrics to ensure its quality and its effectiveness in resisting cyberattacks. The experiments were conducted using fingerprint images sized 96 × 96 in a Kaggle environment with Python, on a workstation equipped with an Intel Xeon processor, 64 GB of RAM, and a P100 GPU. The metrics used included evaluations such as key size and various randomization tests (such as the entropy test and repetition test) to assess the randomness of the key and its predictability. These tests help ensure that the system remains unaffected when used in live applications.

Figure 5: Accuracy and loss of the identity model

6.1 Key space analysis

A brute-force attack is a type of cyber-attack that relies on guessing the key by attempting a large number of possible passwords or secret phrases. An image encrypted with a short key is highly vulnerable to this attack over time, whereas a longer key remains resistant for a longer period; with an adequate key length, guessing the key becomes practically impossible. Key space analysis is used to assess the strength against brute-force attacks. According to this analysis, a key with a length greater than 2^100 is considered suitable for high-security encryption [26]. In our system, we propose an approach based on deep neural networks (CNN) and PSO to generate this key.
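The XOR-based scheme described earlier (combine the CNN-derived key with the chaotic key by XOR, encrypt the image by XOR with the final key, decrypt by repeating the same XOR) can be sketched as below. The byte values are toy stand-ins, not keys or pixels from the actual pipeline; the point is that XOR is its own inverse, so the same final key both encrypts and decrypts.

```python
# Sketch of XOR key combination, encryption, and decryption.
# The lists below are illustrative stand-ins for the real keys and image.
def xor_bytes(a, b):
    return [x ^ y for x, y in zip(a, b)]

cnn_key     = [12, 200, 7, 99]    # stand-in for the CNN-derived key
chaotic_key = [250, 3, 128, 64]   # stand-in for the Chen-system key
image       = [10, 20, 30, 40]    # stand-in for image pixel values

final_key = xor_bytes(cnn_key, chaotic_key)   # combine the two keys
cipher    = xor_bytes(image, final_key)       # encrypt
plain     = xor_bytes(cipher, final_key)      # decrypt: XOR is self-inverse

assert plain == image
```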
The key has a size of 1024 values, with each value ranging between 0 and 255. This means the key consists of 1024 bytes, since each value requires one byte (8 bits are enough to represent values from 0 to 255). Given that each value in the key ranges from 0 to 255, there are 256 possibilities for each value. With 1024 values, the total key space is 256^1024, or in other words 2^(8×1024) = 2^8192. This is an extremely large key space, sufficiently large to be highly resistant to brute-force attacks. A key space of 2^8192 offers a very high level of security, making it practically impossible to crack using brute-force methods, even with fast computing devices. Nonetheless, the question remains: how does the proposed system perform against brute-force attacks?

6.2 Significance of results in cryptographic key management

- The results, such as the randomness tests and the high entropy, demonstrate that the generated key exhibits a high degree of randomness, making it ideal for high-security applications.
- High entropy indicates that the keys have a uniform distribution of values, reducing the likelihood of predicting any part of the key, which is a critical feature in key management.

6.3 Encryption key tests

In this study, six fingerprint samples were used as the basis to generate six encryption keys. Each key underwent comprehensive testing using eight different metrics to determine the quality and randomness of the generated keys. The results of these eight tests are presented systematically in tables, reflecting the effectiveness of the proposed method. The keys passed all eight tests, confirming that the keys generated from the fingerprints meet the required security standards.
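The key-space arithmetic in Section 6.1 (1024 byte values, 256 possibilities each) can be checked directly with Python's arbitrary-precision integers:

```python
# Verify: 256**1024 = 2**(8*1024) = 2**8192 possible keys.
key_space = 256 ** 1024
assert key_space == 2 ** 8192
assert key_space.bit_length() == 8193   # 2**n has bit length n + 1
```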
These tests demonstrate the randomness and unpredictability of the keys, making the approach suitable for secure encryption applications.

Comparison of the key space (2^8192) with traditional systems:

- Comparison with AES-256: The proposed key space (2^8192) is significantly larger than that of AES-256, whose key space is approximately 2^256. This substantial difference makes our key space more resistant to brute-force attacks. Traditional systems like AES-256 rely on efficient algorithms to compensate for the smaller key space compared to the vast proposed space.
- Comparison with RSA-2048: The proposed key space is also significantly larger than that of RSA-2048, whose key space is approximately 2^2048. RSA relies on the computational complexity of factoring large numbers, whereas in our system the security strength depends on the key length derived from biometric features processed through deep networks.
- Comparison with ECC-384 (elliptic curve cryptography): The traditional key space for ECC-384 is approximately 2^384, which is much smaller than our proposed key space (2^8192). ECC relies on elliptic curves to compensate for shorter keys; in contrast, we provide much longer keys derived from neural networks, enhancing their unpredictability.

The core encryption tests include the following:

- Entropy test: The entropy measures the distribution of information in the key and reflects the level of randomness. The entropy is calculated using equation (6):

  H(X) = −Σ_{i=1}^{n} p(x_i) log2 p(x_i)          (6)

  where p(x_i) is the probability of the value x_i in the key. If the entropy equals 8 bits, the key is completely random [26].

Table 8: Results of the entropy test

- Repetition test: The repetition test generally aims to ensure that the key does not contain any repeated sections within its sequence, whether these sections are adjacent or non-adjacent.
- Comparison with DES (Data Encryption Standard):
The key space in DES is 2^56, which is extremely small compared to our proposed key space. DES is considered outdated and vulnerable to brute-force attacks, whereas our proposed key space vastly surpasses it in terms of length and complexity.

If parts of the key are repeated, this weakens the randomness and increases the likelihood of discovering a pattern that can be exploited in an attack. The repetition test involves checking all parts of the key to detect any repetition that might impact its security level; it addresses repetition in the key overall, whether in adjacent or non-adjacent parts [27].

Table 9: Results of the repetition test

Table 11: Results of the repetition test (adjacent)

- Pearson correlation test: a statistical test used to measure the relationship between two variables. This relationship is expressed by a coefficient called the "Pearson correlation coefficient," which ranges from −1 to 1. The purpose is to determine the extent of the correlation between values in the encryption key: if the correlation coefficient is close to 0, there is no correlation (high randomness), so the key is sufficiently random, strong against analytical attacks, and hard to predict. The coefficient is computed with equation (8) below.
- Uniformity test using the chi-squared test: This test aims to check whether the values in the encryption key are evenly distributed across the full range of possible values. The chi-squared test compares the actual distribution of values in the key with the expected ideal distribution; if the values are evenly distributed, the key is considered to have a uniform distribution. Equation (7) illustrates the test:

  χ² = Σ_{i=0}^{255} (n_i − n/256)² / (n/256)          (7)

  - n_i: the frequency of occurrence of value i in the key.
  - n/256: the expected frequency for each value i assuming a uniform distribution.
  - n: the total number of values in the key.

If the chi-squared (χ²) value is low, the actual distribution of values is close to the ideal distribution, meaning the key is evenly distributed. At a significance level of 0.05, if the chi-squared value is less than 293.25, the key is considered to have passed the test and to have a uniform distribution [28].

Table 10: Results of the uniformity test

  r = Σ(X_i − X̄)(Y_i − Ȳ) / √( Σ(X_i − X̄)² · Σ(Y_i − Ȳ)² )          (8)

Where:
- r: Pearson correlation coefficient.
- X_i: individual values in the first series.
- Y_i: individual values in the second series (e.g., lagged values in a time series).
- X̄: mean of the X_i values.
- Ȳ: mean of the Y_i values [30].

Table 12: Results of the Pearson correlation test

- Repetition test (adjacent): focuses specifically on identifying repetition in adjacent parts of the key. This test checks for any repeated consecutive or sequential sections that might indicate a fixed pattern or excessive repetition, which could weaken the effectiveness of the encryption. Repetition of adjacent parts is considered a sign of poor randomness and thus reduces the strength of the key. The closer the value is to 0, the less repetition there is, which means the key has a higher level of randomness [29].
- Stability test: The key must remain stable if the input data is stable. This means that if the same inputs are used to generate the key multiple times, the resulting key should always be identical. However, slight changes in the inputs should result in a significant change in the key, which enhances the encryption strength against attacks.

Table 14: Results of the stability test (1)

Consistency of fixed inputs:
- If the input I is fixed, the encryption system should produce the same key K every time: F(I) = K.
- Repeat key generation multiple times using the same I; the result should be the same K in all attempts: K1 = K2 = ⋯ = Kn.

Table 13: Results of the stability test (1)

Table 15: Encryption results with the original key and the modified key

6.4 Sensitivity to minor changes (avalanche effect)

We make a slight change in the input I to create I′, and a new key K′ is generated using I′: F(I′) = K′. We measure the difference between K and K′ using the bit change rate:

  Bit change rate = (bit difference between K and K′) / 1024 × 100%

The change rate should be higher than 50% to ensure the system's sensitivity to changes [31].

6.5 Range test

The range test aims to evaluate the distribution of the encryption key values within a specific range to ensure its randomness. Steps of the range test:
1. Calculate the range: determine the difference between the maximum and minimum values: Range = max − min.
2. Range splitting: divide the range into buckets.
3. Frequency calculation: count the values in each bucket.
4. Distribution analysis: if the frequencies are approximately equal, the key is considered random. The expected frequency is given by equation (9):

  E_i = N / M          (9)

where N is the total number of values and M is the number of buckets [27].

Table 16: Results of the range test

Table 17: Results of the autocorrelation test

- Autocorrelation test: used to determine the randomness of a sequence of values in an encryption key. If the key is sufficiently random, the autocorrelation values should be small or close to zero, indicating no clear pattern or dependency in the sequence. The autocorrelation at a lag d is calculated with equation (10):

  R(d) = (1 / (n − d)) Σ_{i=1}^{n−d} x_i · x_{d+i}          (10)

where R(d) is the autocorrelation coefficient for lag d.

C. Results of using PSO in enhancing the encryption key

After completing the specified number of iterations, the best encryption key is obtained, which is the key that
achieved the highest fitness during the optimization process. Table 18 illustrates the effect of using PSO on the generated encryption key.

Table 18: The impact of the PSO algorithm in improving the encryption key

In equation (10), x_i is the value at position i in the sequence, x_{d+i} is the value at position d+i, and n is the length of the sequence. A value of R(d) close to zero for different values of d indicates a high level of randomness in the encryption key. Figure 8 shows the distribution of the autocorrelation test results, highlighting successful and failed values based on the specified critical value (0.05) [28].

Figure 8: Distribution of autocorrelation test results with success and failure indication based on the critical value

D. Comparison of the accuracy of the proposed system with other systems

This section evaluates the accuracy of our proposed system in comparison to other systems reported in recent years, based on their respective sources. Experimental results from our proposed model demonstrated an accuracy exceeding 99%. Table 19 presents a detailed comparison between our system and other existing systems.

Table 19: Accuracy comparison between our system and recent approaches

Randomness tests (e.g., entropy): The use of PSO in the proposed system significantly contributed to enhancing randomness, which strengthens the generated keys. When comparing randomness tests (such as entropy) with other models, the proposed system showed remarkable superiority. The combination of CNN and PSO enabled the generation of keys with excellent randomness levels, providing a higher degree of security compared to traditional models. PSO helps optimize the quality of the keys by searching for the optimal combination of hidden parameters, making them more random and harder to break.
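As an illustration of the key-quality checks in Sections 6.3–6.5, the entropy (Eq. 6), chi-squared uniformity (Eq. 7), bit change rate (Section 6.4), and lag-d autocorrelation (Eq. 10) can be implemented in a few lines. The toy key below is a stand-in, not a key from the CNN+PSO pipeline, and the bit change rate here normalizes by the total number of key bits (a common avalanche formulation, an assumption since the paper's denominator is stated simply as 1024):

```python
# Sketches of the statistical key tests described in the text.
import math

def entropy(key):                     # Eq. (6): H(X) = -sum p(x) log2 p(x)
    n = len(key)
    counts = {}
    for v in key:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def chi_squared(key):                 # Eq. (7), against a uniform 0..255 ideal
    n = len(key)
    expected = n / 256
    counts = [0] * 256
    for v in key:
        counts[v] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

def bit_change_rate(k1, k2):          # Section 6.4: percent of differing bits
    diff = sum(bin(a ^ b).count("1") for a, b in zip(k1, k2))
    return diff / (8 * len(k1)) * 100

def autocorrelation(key, d):          # Eq. (10): R(d) = sum(x_i * x_{d+i}) / (n-d)
    n = len(key)
    return sum(key[i] * key[i + d] for i in range(n - d)) / (n - d)

key = [i % 256 for i in range(1024)]  # perfectly uniform toy key
print(round(entropy(key), 2))         # 8.0 -> maximally random by Eq. (6)
print(chi_squared(key))               # 0.0 -> exactly uniform, passes < 293.25
```

On a real key the entropy should approach 8 bits and the chi-squared value should stay below the 293.25 threshold at the 0.05 significance level.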
It is important to note that the model only retains the predictions generated during its operation, which are values devoid of any sensitive information. This enhances the system's security against various types of attacks, such as mixed replacement attacks, crossover attacks, and exhaustive search attacks. In such attacks, the attacker has no knowledge of the key generation mechanism or the supporting data, so the number of attempts required to crack the key increases with its length; for example, with a key length of 1024 bytes, the number of possible combinations reaches 2^8192. The proposed system focuses on enhancing data security by avoiding key storage, improving the randomness of key generation, and protecting sensitive information from various attacks, while ensuring high efficiency in user fingerprint recognition.

The aim of this comparison is to evaluate the effectiveness of the proposed system in the context of recent advancements in deep learning technology, providing insight into how cybersecurity can be enhanced through the application of advanced techniques. The table also reflects the ongoing progress in fingerprint data processing, showing that modern systems achieve higher accuracy than traditional systems and supporting the idea that using deep learning can improve the effectiveness and security of encryption systems.

7 Discussion

In this section, we discuss the results of the proposed system in comparison to the modern methods presented in Table 19, focusing on accuracy, randomness tests (such as entropy), and robustness. We also address potential trade-offs associated with using CNNs, such as computational overhead. Below is a detailed comparison of the key results.

Robustness (the biometric key as an encryption key for security): The biometric key is generated based on the parameters learned during the training of the CNN model. During training, the model learns unique representations or features extracted from fingerprints.
These representations are numerical weights that are not easily interpretable. The parameters are converted into an encryption key that relies on the unique properties of each fingerprint, making the key:

- Unique and tamper-proof.
- More secure and resistant to duplication.

Role of PSO in key enhancement: PSO improves the key by identifying the optimal values of the parameters used in key generation. This enhances the randomness and independence among keys, making them more resistant to attacks.

7.1 Comparison of results

Accuracy: The proposed system (using CNN and PSO) achieved an accuracy between 99.73% and 99.83% within just 20 epochs, outperforming most models in the table. For example, the enhanced VGG-16 model achieved 99.98% accuracy in 100 epochs, the highest in the table, but required five times more epochs than the proposed system; the Modified-LeNet model achieved 99.10% accuracy in 55 epochs, which is lower than the proposed system; and the DeepFKTNet model achieved 98.89% accuracy in 60 epochs. Thus, the proposed system stands out as a strong option, delivering high accuracy in less training time, thanks to the combination of CNN and PSO, which enhances feature extraction and generates robust keys.

7.2 Advantages of the proposed system

Using the Tanh activation function helps avoid vanishing gradients, resulting in better stability during the training process and improved model performance in generating encryption keys. Table 20 illustrates the key strength (entropy measure) when using the Tanh activation function compared to using the ReLU activation function.

Table 20: Comparison of the key strength (entropy) of the Tanh and ReLU activation functions

The proposed system combines CNN and PSO to achieve:
- High classification accuracy in less time.
- High-quality encryption keys with excellent levels of randomness and security.
- Strong protection of users' biometric data against exploitation or breaches.
- Improved biometric key performance, using PSO to generate stronger and more random keys and increasing the system's resilience to cyber threats.

Thus, the proposed system leverages the combined strengths of CNN and PSO, making it more robust in addressing security challenges such as resistance to adversarial attacks. While other models primarily focus on classification and accuracy, the proposed system demonstrates additional strength in encryption applications.

Potential trade-offs: Although the use of a CNN in the proposed system results in a slight increase in computational overhead compared to simpler models like the modified LeNet, this does not pose a significant obstacle. The model is designed to operate efficiently on modern systems supported by graphics processing units (GPUs), ensuring accelerated training and reduced execution time.

The batch normalization layer plays a significant role in stabilizing and accelerating the learning process in deep models by normalizing the outputs to have a mean of 0 and a standard deviation of 1. While this stabilization is beneficial in many applications, such as image classification, it may negatively impact the strength of the generated encryption key.

8 Conclusions

- The results indicate that the generated keys exhibit high levels of randomness, making them more challenging to breach. Additionally, the PSO algorithm is an effective technique for enhancing the randomness of the keys, as it allows different keys to be generated for each transmission, thereby reducing the risk of key theft and increasing security.
- The results of this research show that integrating biometric techniques with deep learning provides an innovative and effective solution for generating secure and robust encryption keys based on fingerprints. The proposed system enhances the security of data transmitted over the internet, making it more resistant to theft and tampering.
The use of two convolutional neural network models is a significant step, where the first model contributes to identity recognition and the second focuses on fingerprint detail recognition, ensuring the extraction of unique and reliable biometric features.
- A comprehensive analysis of the performance of the models used in this research was conducted, showing a significant improvement in encryption effectiveness and in the reliability of the generated keys, underscoring the efficiency of these models in the context of cybersecurity.
- The proposed approach enhances system security compared to traditional systems by reducing reliance on static keys, which are a vulnerability in many encryption systems. Instead, biometric verification is used to generate unique keys for each user based on their fingerprints, thereby increasing the level of security. This research provides a significant contribution to systems that require high levels of protection, such as financial systems and medical data, by facilitating biometric verification for encryption without the need to exchange keys, thereby reducing the associated risks.
- One of the main conclusions of this research is that the tanh activation function plays a crucial role in neural networks for generating encryption keys. This function is known for its ability to transform outputs non-linearly into the range (−1, 1), which contributes to improving the quality of the generated keys. Increased complexity and randomness: the tanh function ensures a more balanced distribution of values across the range (−1, 1), reducing value concentration and enhancing the randomness of the key, leading to the generation of secure and robust encryption keys. Better stability during training: the
Additionally, the automatic key tanh function helps avoid issues such as vanishing change feature adds an extra layer of security, Biometric-Based Secure Encryption Key Generation Using… Informatica 49 (2025) 213–234 231 reflecting the effectiveness of this system in providing [8] Hashem, M. I., & Kuban, K. H. (2023). Key advanced protection. Ultimately, the research generation method from fingerprint image based on highlights the importance of integrating biometrics deep convolutional neural network model. Nexo and deep learning in developing effective security Revista Científica, 36(6), 906-925. solutions that address contemporary challenges in https://doi.org/10.5377/nexo.vXXiXX.XXXX data protection. [9] Erkate, U., Toktas, A., Enginoglu, S., Karabacak, E., & Thanh, D. N. H. (2024). An image encryption • The method presented in the research has wide scheme based on chaotic logarithmic map and key potential for application in various fields. In addition generation using deep CNN. Expert Systems with to securing fingerprints and using them to generate Applications, 237, 121452. encryption keys, the method can be applied to secure https://doi.org/10.1016/j.eswa.2023.121452 internet of things (IoT) devices by generating strong [10] Quinga Socasi, F., Zhinin-Vera, L., & Chang, O. encryption keys that protect communication between (2020). A deep learning approach for symmetric key devices. It can also be used to secure data stored in cryptography system. In Proceedings of the Future the cloud by generating high-security encryption keys Technologies Conference (pp. 41). based on unique user attributes, such as fingerprints. These applications highlight the flexibility and https://link.springer.com/chapter/10.1007/978-3-030- efficiency of the method in addressing modern 63128-4_41 cybersecurity challenges and enhance its appeal in [11] Wu, Z., Lv, Z., Kang, J., Ding, W., & Zhang, J. various practical scenarios. (2022). 
Fingerprint bio-key generation based on a deep neural network. International Journal of References Intelligent Systems, 37(7), 4329–4358. https://doi.org/10.1002/int.22782 [1] Hosny, K. M., Darwish, M. M., & Fouda, M. M. [12] Alesawy, O., & Muniyandi, R. C. (2016). Elliptic (2021). Robust color images watermarking using Curve Diffie-Hellman random keys using artificial new fractional-order exponent moments. IEEE neural network and genetic algorithm for secure data Access, 9, 47425–47435. over private cloud. Information Technology Journal, https://doi.org/10.1109/ACCESS.2021.3069317 15(2), 77-83. https://doi.org/10.3923/itj.2016.77.83 [2] Kuzior, A., Tiutiunyk, I., Zielińska, A., & Kelemen, [13] Saini, A., & Sehrawat, R. (2024). Enhancing data R. (2024). Cybersecurity and cybercrime: Current security through machine learning-based key trends and threats. Journal of International Studies, generation and encryption. Engineering, Technology 17(2). https://doi.org/10.14254/2071-8330.2024/17- & Applied Science Research, 14(3), 14148-14154. 2/5 https://doi.org/10.48084/etasr.7181 [3] Saran, D. G., & Jain, K. (2023). An improvised [14] Kurtninykh, I., Ghita, B., & Shiaeles, S. (2021). algorithm for a dynamic key generation model. In Comparative analysis of cryptographic key Inventive Computation and Information management systems. King's College London, Technologies: Proceedings of ICICIT 2022 (pp. Strand, London, WC2R 2LS, UK. 607–627). Springer Nature Singapore. https://doi.org/10.48550/arXiv.2109.09905. https://doi.org/10.1007/978-981-19-5048-5_44 [15] SSL Support Team. (2024, May 3). Key [4] Rahman, Z., Yi, X., Billah, M., Sumi, M., & Anwar, management best practices: A practical guide. A. (2022). Enhancing AES using chaos and logistic Retrieved from [SSL Support Team Website] map-based key generation technique for securing https://www.ssl.com/article/key-management-best- IoT-based smart home. Electronics, 11(7), 1083. 
practices-a-practical-guide// https://doi.org/10.3390/electronics11071083 [16] Wang, L., & Lv, Y. (2024). Differential privacy- [5] Kuznetsov, O., Zakharov, D., & Frontoni, E. based data mining in distributed scenarios using (2024). Deep learning-based biometric decision trees. Informatica, 48(2), 145–158. cryptographic key generation with post-quantum https://doi.org/10.31449/inf.v48i23.6918 security. Multimedia Tools and Applications, [17] Tu, Z., Milanfar, P., & Talebi, H. (2023). MULLER: 83(19), 56909–56938. Multilayer Laplacian Resizer for Vision. https://doi.org/10.1007/s11042-023-15265-6 ResearchGate. Retrieved from [6] Yang, W., Wang, S., Cui, H., Tang, Z., & Li, Y. https://www.researchgate.net/publication/369855623 (2023). A review of homomorphic encryption for _MULLER_Multilayer_Laplacian_Resizer_for_Visi privacy-preserving biometrics. Sensors, 23(7), on. 3566. https://doi.org/10.3390/s23073566. [18] Saifullah, S., Pranolo, A., & Dreżewski, R. (2024). [7] Rana, M., Mamun, Q., & Islam, R. (2023). Comparative analysis of image enhancement Enhancing IoT security: An innovative key techniques for brain tumor segmentation: Contrast, management system for lightweight block ciphers. histogram, and hybrid approaches. Journal Name, Sensors, 23(18), 7678. Volume (Issue), Page range. https://doi.org/10.3390/s23187678 https://doi.org/10.48550/arXiv.2404.05341 232 Informatica 49 (2025) 213–234 S.A.S. Almola et al. [19] Singh, P., Dutta, S., & Pranav, P. (2024). https://doi.org/10.3390/math10081285 Optimizing GANs for Cryptography: The Role and [30] Nahar, P., Chaudhari, N. S., & Tanwani, S. K. Impact of Activation Functions in Neural Layers (2022). Fingerprint classification system using Assessing the Cryptographic Strength. Applied CNN. Multimedia Tools and Applications, 81(17), Sciences, 14(6), 2379. 24515–24527. https://doi.org/10.1007/s11042-022- https://doi.org/10.3390/app14062379. 13494-6 [20] Zhang, B., & Liu, L. (2023). Chaos-Based Image [31] Nguyen, H. 
T., & Nguyen, L. T. (2019). Encryption: Review, Application, and Challenges. Fingerprints classification through image analysis Mathematics, 11(11), 2585. and machine learning method. Algorithms, 12(11), https://doi.org/10.3390/math11112585 241. https://doi.org/10.3390/a12110241 [21] Taylor, O. E., & Igiri, C. G. (2024). Enhancing [32] Ang, L.-M., Seng, K. P., Ijemaru, G. K., & image encryption using histogram analysis, adjacent Zungeru, A. M. (2018). Deployment of IoV for pixel autocorrelation test in chaos-based framework. smart cities: Applications, architecture, and International Journal of Computer Applications, challenges. IEEE Access, 7, 6473–6492. 186(22). https://doi.org/10.5120/ijca202492338 https://doi.org/10.1109/ACCESS.2018.2886575 [22] Munshi, N. H., Das, P., & Maitra, S. (2022). Chi- [33] Saeed, F., Hussain, M., & Aboalsamh, H. A. Squared Test Analysis on Hybrid Cryptosystem. (2018a). Classification of live scanned fingerprints Volume 14, Issue 1, 34-40. using dense SIFT based ridge orientation features. https://doi.org/10.2174/18764029136662105082357 2018 1st International Conference on Computer 06. Applications & Information Security (ICCAIS), 1– [23] Rasheed, A. F., Zarkoosh, M., & Abbas, S. (2023, 4. https://doi.org/10.1109/CAIS.2018.8441995 October). Comprehensive Evaluation of Encryption [34] Saeed, F., Hussain, M., & Aboalsamh, H. A. Algorithms: A Study of 22 Performance Tests. 2023 (2018b). Classification of live scanned fingerprints Sixth International Conference on Vocational using histogram of gradient descriptor. 2018 21st Education and Electrical Engineering (ICVEE), Saudi Computer Society National Computer Surabaya, France, 191-194. Conference (NCC),1–5. https://doi.org/10.1109/ICVEE59738.2023.1034824 https://doi.org/10.1109/NCC.2018.8682629 0. [24] Feng, L., Du, J., & Fu, C. (2023). Double graph correlation encryption based on hyperchaos. PLOS ONE, 18(9), e0291759. https://doi.org/10.1371/journal.pone.0291759. 
https://doi.org/10.31449/inf.v49i16.9490 Informatica 49 (2025) 235–248 235

CNN and LSTM-Based Multimodal Data Fusion for Performance Optimization in Aerobics Using Wearable Sensors

Danhua Tan
School of Physical Education, Hengyang Normal University, Hengyang, Hunan, 421006, China
E-mail: tandanhua184914@outlook.com

Keywords: wearable sensors, convolutional neural network, long short-term memory, Kalman filtering, aerobics movements

Received: May 31, 2025

Aerobics is a high-intensity, multi-dimensional sport. Its motion evaluation places high demands on data quality and on time-series modeling capability.
This paper proposes a method for evaluating aerobics motion that integrates wearable sensors and motion tracking systems. It combines convolutional neural networks (CNNs) with long short-term memory networks (LSTMs) to perform fusion analysis on multimodal data from accelerometers, gyroscopes, magnetometers, and a Kinect motion capture system. To improve data quality, Kalman filtering, time synchronization, and wavelet transform techniques are introduced to preprocess the raw data. Experimental results show that the method performs well in motion classification tasks: in indoor low-intensity training scenarios, the accuracy of the CNN model increases from 74.5% to 87.1%; in high-intensity training scenarios, the accuracy increases from 75.0% to 88.2%. Combined with the LSTM, the model further strengthens the modeling of temporal motion features and improves the recognition accuracy of complex motions. Across the different training scenarios, the average improvement rate of motion scores is 25.8%. The system feedback delay is kept within 200 milliseconds, giving good real-time and practical performance. This method provides aerobics athletes with high-precision movement assessment and personalized training suggestions, promoting the intelligent and personalized development of sports training.

Povzetek: Metoda združuje senzorje, CNN in LSTM za multimodalno analizo aerobičnih gibov. Kalmanovo filtriranje izboljša kakovost signalov, klasifikacijska točnost naraste do 88,2 %, povprečno izboljšanje rezultatov znaša 25,8 %, odzivnost sistema pa ostane pod 200 ms.

1 Introduction

With the popularity of aerobics, the accuracy of movements and the effectiveness of training have become the focus of coaches and athletes.
During high-intensity and complex exercise, the movements of aerobics athletes can be affected by factors such as physical exertion, sports skill, and the external environment, resulting in unstable movement performance. Traditional manual evaluation methods are inefficient and subjective, and cannot provide athletes with accurate training feedback in real time. With the development of sensor technology [1] and artificial intelligence [2], [3], motion evaluation methods based on wearable devices [4] and intelligent feedback systems have become a research hotspot. Such feedback systems can provide accurate real-time data analysis, optimize training programs, and improve athlete performance. Developing a motion evaluation and optimization system based on intelligent technology [5], [6] has therefore become key to improving training effects and athlete performance.

This paper studies a motion evaluation and optimization system based on intelligent technology to improve the training effect and motion performance of aerobics athletes. To achieve this goal, it combines wearable sensors with motion tracking systems and uses CNN and LSTM models to fuse and analyze multimodal data. The system acquires motion data through sensors such as accelerometers, gyroscopes, and magnetometers, together with a Kinect motion capture system, and uses Kalman filtering, time synchronization, and wavelet transform to optimize data quality. The optimized data is fed into the CNN model to evaluate and optimize motion performance, providing real-time feedback and personalized training suggestions. The CNN-based optimization method combines wearable sensor technology with deep learning (DL) algorithms to improve the accuracy and stability of motion evaluation. Experimental results show that the combination of Kalman filtering and CNN models effectively improves the accuracy and stability of aerobics motion evaluation, providing strong support for the intelligent and precise development of sports training.

Current research mostly uses weighted averaging or simple concatenation, with a fixed model structure, and does not optimize for the temporal characteristics and complex action patterns of sports data.
This paper combines wearable sensors with aerobics motion tracking and uses a model based on CNN and LSTM to achieve performance optimization. The main contributions of this study include: 1) wavelet transform is combined with principal component analysis (PCA) to extract time-frequency features, and a dynamic weighted fusion strategy is adopted to improve the robustness of data fusion; 2) small convolutional kernels are introduced into the CNN to capture action details and are combined with a double-layer LSTM to model long-term dependencies, enhancing the model's ability to recognize complex action sequences; 3) based on the model output, an action scoring function and an error correction mechanism are constructed to provide athletes with immediate feedback and personalized training suggestions, while data augmentation and adaptive filtering techniques improve the model's generalization across different training scenarios.

2 Related work

In recent years, many scholars have worked to improve the accuracy of athletes' motion evaluation through different technical means. Traditional motion capture systems [7] rely heavily on calibration equipment and costly hardware. Although such systems can capture the movements of athletes, they suffer from poor real-time performance, high data noise, and inconvenient operation when evaluating high-intensity sports or complex movements. To improve the quality of sports data, many studies have attempted to use wearable sensors for motion tracking. Rigozzi C J et al. used data from sensors such as accelerometers, gyroscopes, and magnetometers to monitor athletes' body posture and motion trajectory [8]. Sensor data is easily affected by noise, environmental changes, and wear-position deviation, resulting in inaccurate data. To reduce noise interference, Zhang Y applied Kalman filtering to the preprocessing of sensor data [9].
As the technology matured, DL [10], and CNNs in particular, were applied to multimodal data analysis and action recognition by Gholamiangonabadi D [11], with some success. Existing research still faces problems such as how to combine multiple data sources, optimize data processing pipelines, and provide real-time feedback in actual training scenarios.

To solve these problems, some researchers have proposed hybrid methods that combine sensor data and DL algorithms to improve the accuracy and real-time performance of action recognition. Chakraborty A used a CNN-based multimodal data fusion method to improve the accuracy and robustness of athlete action recognition by combining accelerometer, gyroscope, and visual data [12]. In this study, data fusion technology [13] effectively reduces sensor errors and enhances the system's adaptability to complex actions. Zhang L proposed a KCF (Kernelized Correlation Filters) tracking method based on improved depth information [14], which successfully used Kalman filtering to reduce the noise of motion sensors and improve the stability of motion estimation. Although these methods have achieved good results to a certain extent, most of them focus on a single motion estimation task, and their effectiveness in complex training environments still needs to be improved. Existing methods also fall short in personalized training feedback [15] and in the generation of real-time optimization suggestions [16]. How to comprehensively utilize multimodal data and combine DL with real-time optimization feedback systems therefore remains a major challenge in current research.

3 Data fusion and movement performance optimization

3.1 Data collection and preprocessing

The study combines wearable sensors with motion tracking systems to design an efficient data collection and preprocessing solution. The key to the entire process is to synchronously collect data from multiple sources and eliminate errors, providing a reliable basis for subsequent analysis.

Wearable devices collect data in real time through built-in accelerometers [17], gyroscopes [18], and magnetometers [19]. The accelerometer records the athlete's acceleration changes in three-dimensional space; the gyroscope measures the athlete's rotational angular velocity; the magnetometer helps correct the direction of movement. A multi-sensor system can accurately capture every movement of an athlete and generate rich time series data [20]. The sensor data fusion equation is:

f(t) = α·a(t) + β·ω(t) + γ·m(t)    (1)

a(t) is acceleration data; ω(t) is angular velocity data; m(t) is magnetic field data; α, β, γ are weighting parameters. Figure 1 shows the data acquisition flow.
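Equation (1) is a per-sample weighted sum of the three sensor streams. A minimal sketch is shown below; the weight values and sensor readings are illustrative, not ones reported in the paper:

```python
import numpy as np

def fuse_sensors(a, w, m, alpha=0.5, beta=0.3, gamma=0.2):
    """Eq. (1): f(t) = alpha*a(t) + beta*omega(t) + gamma*m(t)."""
    return alpha * np.asarray(a) + beta * np.asarray(w) + gamma * np.asarray(m)

# One three-axis sample from each sensor (hypothetical readings).
f = fuse_sensors(a=[0.12, 0.05, 9.80],   # acceleration, m/s^2
                 w=[5.50, 0.10, 0.20],   # angular velocity, deg/s
                 m=[45.0, 1.20, -3.40])  # magnetic field, uT
```

In practice the weights would be tuned per channel; vectorizing over a whole recording is a straightforward extension.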
Figure 1: Data collection flow chart (sensor and motion-capture acquisition, data calibration and cleaning, data fusion and feature extraction, and motion reconstruction)

Figure 1 shows the complete process from data collection to motion analysis. Motion data is obtained through wearable sensors and motion capture systems; key features are extracted through data cleaning and fusion; and the CNN model is used to optimize the motion performance evaluation, ultimately providing scientific motion optimization and training suggestions for aerobics athletes. To ensure data accuracy, all collected sensor data are transmitted to the back-end data processing system in real time via Bluetooth or Wi-Fi modules [21]. By adopting wireless data transmission [22], transmission is not limited by physical distance, ensuring that data is updated and recorded in time while athletes perform complex movements. The Bluetooth signal quality function is:

Q = 1 / (1 + exp(−k(S − S0)))    (2)

S is the signal strength; S0 is the signal threshold; k is the Bluetooth signal adjustment parameter. To deal with data anomalies, a Kalman filter [23] is used to smooth the accelerometer and gyroscope data. The Kalman filter dynamically predicts the true value of the signal, suppresses measurement noise, and improves data accuracy. The Kalman filter update is:

x̂_k|k = x̂_k|k−1 + K_k(z_k − H·x̂_k|k−1)    (3)

K_k is the Kalman gain, and z_k is the observed value.
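The update in Eq. (3) can be illustrated with a minimal scalar Kalman filter (H = 1, random-walk state model). The process and measurement noise variances q and r below are illustrative assumptions; the paper does not report the covariances it uses:

```python
def kalman_1d(zs, q=1e-4, r=1e-2, x0=0.0, p0=1.0):
    """Scalar Kalman smoothing of a measurement sequence zs.
    q: process noise variance, r: measurement noise variance."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        p = p + q                # predict step (random-walk state)
        k = p / (p + r)          # Kalman gain K_k
        x = x + k * (z - x)      # Eq. (3) with H = 1
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates

# Noisy acceleration-X samples (m/s^2, hypothetical).
smoothed = kalman_1d([0.12, 0.14, 0.11, 0.13, 0.12])
```

As q shrinks relative to r, the filter trusts its prediction more and the output becomes smoother, which is the behavior reported for the sensor traces in Section 4.2.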
In addition to the multi-sensor equipment, a Kinect motion tracking system [24] and a depth camera [25] are introduced to obtain the spatial position of the athlete's key body parts. The system captures the athlete's posture through 3D coordinates and records the spatial coordinates of joints such as the shoulders, elbows, and knees, as well as their trajectories over time. A calibration algorithm combines the position information of the sensors and the motion tracking system to correct errors caused by wearer position offset or motion capture error. The motion trajectory smoothing formula is:

p(t) = (1/N) Σ_{i=1..N} p_i(t)    (4)

p_i(t) is the spatial position of the i-th sampling point. The system synchronizes the sensor and motion tracking data so that the sensor data and motion data at each moment correspond correctly; after time synchronization, the data can be fed smoothly into subsequent processing and analysis. The time synchronization function is:

ΔT = T_sensor − T_camera    (5)

T_sensor and T_camera are the timestamps of the sensor and camera, respectively. Table 1 shows the motion capture key point coordinate data.
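Equations (4) and (5) reduce to averaging repeated position samples of a key point and differencing stream timestamps. A small sketch with hypothetical shoulder coordinates in the style of Table 1:

```python
import numpy as np

def smooth_position(samples):
    """Eq. (4): average N spatial samples p_i(t) of the same key point."""
    return np.mean(np.asarray(samples, dtype=float), axis=0)

def sync_offset(t_sensor, t_camera):
    """Eq. (5): timestamp offset between sensor and camera streams."""
    return t_sensor - t_camera

# Three noisy 3D samples of the shoulder (cm, hypothetical).
p = smooth_position([[12.3, 45.6, 78.2],
                     [12.1, 45.5, 78.3],
                     [12.2, 45.7, 78.1]])
dt = sync_offset(t_sensor=1000, t_camera=994)  # offset in ms
```

The offset dt would then be subtracted from one stream's timestamps before frames are paired.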
Table 1 is the motion capture 𝑘|𝑘 = 𝑥𝑘|𝑘−1 + 𝐾𝑘(𝑧𝑘 − 𝐻𝑥 ) 𝑘|𝑘−1 (3) key point coordinate data table. 𝐾𝑘 is the Kalman gain, and 𝑧𝑘 is the observed value. Table 1: Motion capture key point coordinate data table. Timestamp Shoulder X Shoulder Y Shoulder Z Knee X (cm) Knee Y (cm) Knee Z (cm) (ms) (cm) (cm) (cm) 0 12.3 45.6 78.2 8.9 30.2 50.7 10 12.1 45.5 78.3 9 30.3 50.9 20 12.2 45.7 78.1 9.1 30.1 50.8 30 12.4 45.8 78.2 9.2 30.4 51 40 12.3 45.9 78.3 9.3 30.5 50.9 50 12.5 46 78.1 9.4 30.6 51.1 60 12.6 46.1 78.2 9.5 30.7 51.2 238 Informatica 49 (2025) 235–248 D. Tan Table 1 records the three-dimensional spatial position component analysis (PCA) [32] are used. Weighted fusion data (X, Y, Z, in centimeters) of the shoulder and knee at is to assign different weights to different data sources different timestamps (in milliseconds). The data can be according to their signal-to-noise ratio and importance. used to analyze the movement trajectory and change trend When the noise of sensor data is large, its weight in fusion of the shoulder and knee in space. is reduced. On the contrary, if it is small, it means that the spatial data provided by the motion tracking system is 3.2 Multimodal data fusion relatively stable and can be assigned a higher weight. Weighted fusion formula is: After data collection and preprocessing, the multimodal 𝐹 data from wearable sensors and motion tracking systems 𝑓𝑢𝑠𝑒𝑑 = 𝜔1𝐹1 + 𝜔2𝐹2 (9) are effectively fused. Different data sources provide 𝐹1 and 𝐹2 are the features of different data sources, different perspectives on the athlete's movements. and 𝜔1 and 𝜔2 are weights. Through weighted fusion Wearable sensor data provides time series information processing, the fused data can more realistically reflect the such as acceleration and angular velocity, and the motion athlete's performance. 
The original sensor data contains rich time series information. The wavelet transform [29] is used to analyze the data in the time and frequency domains and extract motion features. The wavelet transform formula is:

W_ψ(a, b) = ∫_{−∞}^{∞} f(t)·ψ*((t − b)/a) dt    (6)

ψ is the mother wavelet function. The wavelet transform can effectively capture instantaneous changes in motion signals and uses multi-scale analysis [30] to extract their time-frequency features. It is integrated with the sensor data by computing spatial characteristics of the athlete's joint angles, motion trajectory, and speed. The joint angle is calculated as:

θ = arccos((v1 · v2) / (‖v1‖‖v2‖))    (7)

v1 and v2 are the vectors of two bones. The motion trajectory curve fitting formula is:

r(t) = a0 + a1·t + a2·t² + … + an·tⁿ    (8)
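Equation (7) is the standard angle between two bone vectors. A minimal sketch (the clamp guards arccos against floating-point drift; the vectors are hypothetical):

```python
import numpy as np

def joint_angle(v1, v2):
    """Eq. (7): angle between two bone vectors, returned in degrees."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

angle = joint_angle([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal bones
```

Applied to the Table 1 key points, v1 and v2 would be differences of adjacent joint coordinates (e.g., shoulder-to-elbow and elbow-to-wrist).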
The spatial characteristics from the tracking system play a decisive role in assessing the accuracy and coordination of movements and are the core basis for evaluating athlete performance. Data fusion is the key step in multimodal data processing. During fusion, methods such as weighted fusion [31] and principal component analysis (PCA) [32] are used. Weighted fusion assigns different weights to different data sources according to their signal-to-noise ratio and importance: when the noise of a sensor stream is large, its weight in the fusion is reduced; conversely, when it is small, the spatial data provided by the motion tracking system is relatively stable and can be assigned a higher weight. The weighted fusion formula is:

F_fused = ω1·F1 + ω2·F2    (9)

F1 and F2 are the features of the different data sources, and ω1 and ω2 are their weights. Through weighted fusion, the fused data reflects the athlete's performance more faithfully. In weighted fusion, signal quality is the key to weight assignment, and the relevance and accuracy of the data determine the contribution of each source.

PCA is used to reduce the dimensionality of the data, compressing the multi-dimensional raw data into fewer principal components, reducing redundancy, and extracting the most representative features. The PCA dimensionality reduction formula is:

X′ = XW,  W = argmax_W (WᵀΣW / WᵀDW)    (10)

Σ is the feature covariance matrix [33]; D is the weight matrix [34]; W is the optimized projection matrix. This reduces the computational complexity while retaining the key information in the data, providing more efficient input for training the DL models. Through PCA, redundant dimensions and noise are eliminated, improving the efficiency and accuracy of subsequent analysis.

After fusion, data from different sources is integrated into a unified format, providing rich and accurate input features for subsequent action evaluation. The fused data contains both time series and spatial position information, and can fully and accurately reflect the athlete's performance in training [35]. Combined with efficient data synchronization, feature extraction, and fusion, this provides sufficient high-quality data for the CNN, ensuring that the model can make full use of all types of information for accurate evaluation.
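Equations (9) and (10) can be sketched with plain NumPy. The PCA below is the standard covariance-based form, i.e., Eq. (10) with the weight matrix D taken as the identity; the fusion weights are illustrative:

```python
import numpy as np

def weighted_fuse(f1, f2, w1=0.6, w2=0.4):
    """Eq. (9): F_fused = w1*F1 + w2*F2 (illustrative weights)."""
    return w1 * np.asarray(f1) + w2 * np.asarray(f2)

def pca_reduce(X, n_components):
    """Eq. (10) with D = I: project X onto the top eigenvectors
    of the feature covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)               # eigenvalues ascending
    W = vecs[:, np.argsort(vals)[::-1][:n_components]]
    return Xc @ W

fv = weighted_fuse([1.0, 1.0], [0.0, 2.0])
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))                      # 200 fused 9-dim feature vectors
X_red = pca_reduce(X, n_components=3)
```

A general D ≠ I would turn the objective into a generalized eigenproblem; the paper does not specify its D, so the identity is assumed here.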
3.3 Action performance evaluation based on CNN and LSTM

This paper evaluates action performance on the fused data using a CNN and an LSTM. The CNN has an advantage in processing time series and spatial data and in automatically extracting features, while the LSTM is good at capturing time series dependencies, especially subtle differences between athletes' movements. The combined model can comprehensively evaluate the athletes' movement performance and achieve end-to-end automated processing, from raw sensor and tracking data to final movement scoring and classification. Through the LSTM, the model understands the continuity between actions and evaluates them based on the relationships within the action sequence. When capturing complex motion patterns, the LSTM supplements the timing information that the CNN alone fails to capture, providing a more detailed performance evaluation. In the motion performance evaluation task, combining the LSTM and CNN lets the model extract features in both the spatial and temporal dimensions: after the CNN extracts spatiotemporal features, the LSTM processes them as a time series, and together they evaluate the quality and type of actions more accurately. Figure 2 shows the CNN model structure.

Figure 2: CNN model structure diagram

Figure 2 shows how the spatiotemporal features of the athlete's movements are gradually extracted through the convolutional, pooling, and fully connected layers, and converted via the flattening layer into a one-dimensional vector for classification and scoring. The output layer evaluates the athlete's movement quality based on the learned features, achieving automated and efficient movement quality recognition and feedback.
The CNN adopts a five-layer structure with input dimensions (T, F), where T = 100 is the number of time steps and F = 9 is the input feature dimension (three axes each from the accelerometer, gyroscope, and magnetometer). The output dimension is (C), where C = 3 is the number of action categories (standard, insufficient, error), with a Softmax activation. The LSTM models the long-term dependencies of the time series, with input dimensions (T, D), where T = 100 and D = 9. The LSTM contains 128 hidden units in a double-layer stacked structure with a Tanh activation; its output dimension is (C), C = 3, with a Softmax activation. The output feature vectors of the CNN and LSTM are merged by concatenation and fed into a fused fully connected layer, which outputs the final action scores and classification results. The model parameter settings are shown in Table 2:

Table 2: Model parameter settings

Parameter | Value
Learning rate | 0.001
Batch size | 64
Optimizer | Adam
Loss function | Cross-entropy loss
Dropout rate | 0.5
Training epochs | 100
Hidden layer size | 128
Convolutional kernel size | (3, 3)
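The architecture described above can be sketched in PyTorch, the framework the paper reports using. Only the input shape (T = 100, F = 9), the two-layer 128-unit LSTM, and the C = 3 softmax output follow the text; the convolutional channel widths and pooling choices are illustrative assumptions, since the paper does not list them:

```python
import torch
import torch.nn as nn

class CnnLstmScorer(nn.Module):
    """Sketch of the fused CNN + LSTM classifier."""
    def __init__(self, feat_dim=9, hidden=128, n_classes=3):
        super().__init__()
        self.cnn = nn.Sequential(                 # CNN branch over the time axis
            nn.Conv1d(feat_dim, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # -> (B, 64, 1)
        )
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(64 + hidden, n_classes)   # fused fully connected layer

    def forward(self, x):                         # x: (B, T, F)
        c = self.cnn(x.transpose(1, 2)).squeeze(-1)     # spatial features, (B, 64)
        h, _ = self.lstm(x)                       # temporal features, (B, T, hidden)
        z = torch.cat([c, h[:, -1, :]], dim=1)    # concatenate both branches
        return torch.softmax(self.head(z), dim=1)

probs = CnnLstmScorer()(torch.randn(4, 100, 9))   # 4 windows of 100 steps x 9 channels
```

In training one would normally return logits and let the cross-entropy loss apply the softmax; the explicit softmax here mirrors the description in the text.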
The convolution operation can effectively capture the spatial features and temporal dynamics of the athlete's joint trajectories, body posture changes, and so on. The features processed by the CNN are passed to the LSTM network, which further analyzes their timing information; through its memory mechanism, the LSTM learns the continuity and long-term dependencies in the action. The convolution operation is:

f_{i,j} = Σ_{m=−k..k} Σ_{n=−k..k} x_{i+m,j+n} · w_{m,n}    (11)

x is the input feature map and w is the convolution kernel; the choice of kernel size is closely related to the characteristics of the data. Smaller convolution kernels help capture subtle changes, while larger kernels extract more macroscopic features. Because the athlete's small movements and postures must be captured, smaller kernels extract details more accurately, so a 3×3 kernel is selected. The LSTM layer filters out irrelevant temporal information through its forget, input, and output gates, retaining the long- and short-term dependency information relevant to action performance evaluation.

After the convolution layers, max pooling and average pooling [36] are used to reduce the dimension of the feature maps. The pooling layer reduces the amount of data after convolution, lowers the computational complexity, and retains the most important feature information. The (max) pooling operation is:

p_{i,j} = max_{m,n}(g_{i+m,j+n})    (12)
In the fully connected layer, the temporal information generated by the LSTM and the spatial features extracted by the CNN are integrated to generate the final score and classification of the action performance. The model can not only score the quality of actions but also classify them into categories such as "standard", "insufficient", and "error". The action scoring function is:

S = Σ_{i=1..N} φi·hi    (13)

hi is the score of each feature and φi is its weight. The action evaluation score reflects the accuracy, fluency, and standardization of the athlete's action, while the classification results give coaches and athletes targeted directions for training improvement; each classification result helps guide specific corrective measures in training.

During training, the Adam optimization algorithm [37] is used to update the parameters:

θ_{t+1} = θ_t − η·m̂_t / (√v̂_t + ε)    (14)

m̂_t and v̂_t are the bias-corrected first and second moment estimates. The cross-entropy loss [38] is used for the classification task, so that the model handles the multi-class problem well and continuously improves scoring and classification accuracy by minimizing the loss:

L = − Σ_{i=1..C} yi·log(ŷi)    (15)

yi is the true label and ŷi is the predicted probability. To improve training, data augmentation is used to simulate motion performance in different training scenarios, expand the training data set, and increase the robustness of the model. Augmentation methods include rotation, mirror flipping, and scaling; the additional samples generated this way let the model perform well across different motion modes.
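Equations (14) and (15) can be made concrete with a small NumPy sketch: one cross-entropy evaluation for a one-hot label and one bias-corrected Adam update (the gradient and probability values are illustrative):

```python
import numpy as np

def cross_entropy(y_true, y_prob):
    """Eq. (15): L = -sum_i y_i * log(y_hat_i) for a one-hot label."""
    return -np.sum(y_true * np.log(y_prob))

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Eq. (14): one Adam update with bias-corrected moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                 # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                 # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

loss = cross_entropy(np.array([0.0, 1.0, 0.0]), np.array([0.1, 0.8, 0.1]))
theta, m, v = adam_step(np.zeros(3), np.array([0.5, -0.2, 0.1]),
                        m=np.zeros(3), v=np.zeros(3), t=1)
```

On the first step (t = 1) the bias-corrected update is approximately −lr·sign(grad), so each parameter moves by about the learning rate 0.001 from Table 2.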
The cross- calculation process by optimizing the algorithm, entropy loss function [38] is used to optimize the controlling the delay between action scoring and feedback classification task, so that the model can better handle generation to less than 200ms, ensuring that athletes can multi-classification problems and continuously improve receive targeted adjustment suggestions in a short period the accuracy of action scoring and classification by of time. Real-time feedback delay formula is: minimizing the loss function. Classification loss function 𝑇 is: 𝑑𝑒𝑙𝑎𝑦 = 𝑇𝑝𝑟𝑜𝑐𝑒𝑠𝑠 + 𝑇𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡 (18) 𝐶 Through the real-time feedback mechanism, athletes 𝐿 = − ∑ 𝑦𝑖 𝑙𝑜𝑔(?̂?𝑖) (15) can continuously adjust their movements during training 𝑖=1 and gradually improve the training effect. Optimization yi is the true label, and ŷi is the predicted probability. suggestions are not limited to correcting mistakes, but can To improve the model’s training effect, data enhancement also help athletes improve the delicacy and accuracy of technology is used to simulate the motion performance in their movements. The athlete improvement index formula different training scenarios, expand the training data set, is: CNN and LSTM-Based Multimodal Data Fusion for Performance… Informatica 49 (2025) 235–248 241 𝛥𝑆 ⋅ 𝑙𝑛(1 + 𝐴0) 𝑡 𝑑𝑆 𝐼 = (𝑡) = 𝑂 𝑑𝜏 𝛥𝑇 ⋅ (1 + 𝑒−𝜀(𝛥𝑆−𝛥𝑆𝑎𝑣𝑔) (19) 𝑂 0 + ∫ (20) ) 0 𝑑𝜏 ΔS is the score increment; ΔT is the time; A0 is the O0 is the initial performance. Combining DL with athlete's baseline ability; ε is the adjustment parameter; real-time feedback technology, it provides athletes with an ΔSavg is the average score increment. Through long-term intelligent training platform that can effectively improve training and optimization feedback, athletes can improve training efficiency and quality. Table 3 is some their overall performance in a short period of time and hyperparameter data of the experiment. achieve the best training effect. 
Through the real-time feedback mechanism, athletes can continuously adjust their movements during training and gradually improve the training effect. Optimization suggestions are not limited to correcting mistakes; they also help athletes refine the delicacy and accuracy of their movements. The athlete improvement index is:

I = (ΔS · ln(1 + A0)) / (ΔT · (1 + e^(−ε(ΔS − ΔS_avg))))    (19)

ΔS is the score increment; ΔT is the elapsed time; A0 is the athlete's baseline ability; ε is an adjustment parameter; ΔS_avg is the average score increment. Through long-term training and optimization feedback, athletes can improve their overall performance in a short period and achieve the best training effect. The long-term optimization trend equation is:

O(t) = O0 + ∫_0^t (dS/dτ) dτ    (20)

O0 is the initial performance. Combining DL with real-time feedback technology provides athletes with an intelligent training platform that effectively improves training efficiency and quality. Table 3 lists some hyperparameters of the experiment.

Table 3: Some hyperparameter data

Parameter | Function
α, β, γ | Sensor data fusion weighting parameters
k | Bluetooth signal adjustment parameter
K_k | Kalman gain
ψ | Mother wavelet function
ω1, ω2 | Weighted fusion weights
Σ | Feature covariance matrix
w | Convolution kernel
φi | Action score weight
m̂_t, v̂_t | Momentum estimates
ŷi | Predicted probability
δ | Margin of error
ε | Adjustment parameter

Table 3 lists the parameters used for modeling and signal processing and their functions. These parameters play a key role in sensor data fusion, signal conditioning, feature extraction, and prediction; tuning them adapts system behavior to specific application requirements and achieves more accurate data processing and action recognition.
4 Experimental results

4.1 Experimental setup

A total of 12,000 action samples were collected, covering three categories: standard actions (4,000), insufficient actions (4,000), and incorrect actions (4,000). Each sample contains multimodal data over 100 time steps, including 9 channels (3 axes × 3 sensors) from the accelerometer, gyroscope, and magnetometer, and the 3D coordinates of 18 joint points obtained by Kinect. The data collection frequency is 50 Hz, i.e., 50 frames per second. The wearable sensor is an Xsens MTw Awinda series inertial measurement unit (IMU), with the parameters shown in Table 4:

Table 4: Sensor parameters

Sensor | Range | Resolution | Sampling rate
Accelerometer | ±16 g | 0.001 g | 50 Hz
Gyroscope | ±2000°/s | 0.01°/s | 50 Hz
Magnetometer | ±2.5 Gauss | 0.01 Gauss | 50 Hz

Data collection was conducted in an indoor sports arena, with the ambient temperature controlled at 22-25 °C, humidity at 45-60%, and good, stable lighting. All model training was completed under the PyTorch 1.13.1 deep learning framework, with a training time of approximately 4 hours per model.

4.2 Effect of Kalman filtering on sensor data

In sensor data processing, the original signal is easily affected by noise, which reduces data accuracy. Kalman filtering, a common noise suppression method, can effectively improve the stability and accuracy of the data by correcting the measured values. Figure 3 compares the original accelerometer, gyroscope, and magnetometer data with the data after Kalman filtering, to evaluate the filtering effect.
The original acceleration fluctuates greatly in the X-axis and Y-axis directions. After Kalman filtering, the fluctuation of the data is more stable, indicating that Kalman filtering can effectively reduce the interference of noise. At a timestamp of 0 ms, the original acceleration X is 0.12 m/s², and after Kalman filtering it is 0.13 m/s², with little change. The gyroscope data shows a similar trend: the original data fluctuates to varying degrees in the X, Y, and Z directions, while the angular velocity data after Kalman filtering fluctuates less. After filtering, the peak-to-valley difference of the angular velocity in the X direction is reduced from the original 0.25°/s to 0.17°/s, ensuring more accurate angle measurement. At 0 ms, the original angular velocity Z is 5.50°/s, and after Kalman filtering it is 5.52°/s. The magnetometer data also shows small fluctuations. After filtering, the fluctuations of the three-axis magnetic field are further reduced: the peak-to-valley difference in the X direction drops from the original 0.8 µT to 0.6 µT, providing more stable environmental data. At 0 ms, the original magnetic field X is 45.0 µT, and after Kalman filtering it is 45.2 µT. The data optimized by Kalman filtering, combined with the analysis and processing of the CNN model, can effectively improve the accuracy of motion tracking and evaluation, and provide athletes with more accurate real-time feedback and personalized training suggestions.

4.3 Action scoring effect
To comprehensively evaluate the performance of aerobics athletes, a scoring system based on sensor data is introduced. The score of each sensor at different time points reflects the quality and stability of the athlete's movements. The comprehensive score combines the average of these three scores to provide an overall performance evaluation, helping athletes and coaches grasp the effect of exercise in real time. Figure 4 shows the scoring performance at different time points.

CNN and LSTM-Based Multimodal Data Fusion for Performance… Informatica 49 (2025) 235–248

Figure 4: Sensor scores at different time points
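As described above, the comprehensive score is simply the average of the three per-sensor scores. A minimal sketch follows; the example score values are illustrative placeholders, not values read from Figure 4:

```python
def comprehensive_score(accel, gyro, mag):
    """Average the three per-sensor scores into one overall score (one decimal)."""
    return round((accel + gyro + mag) / 3.0, 1)

# Illustrative per-sensor scores at a single time point
print(comprehensive_score(85, 75, 86))
```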
In the experiment, wearable sensors and motion tracking technology are combined to fuse and analyze multimodal data such as acceleration, angular velocity, and magnetic field, using the CNN model and LSTM to achieve accurate evaluation and optimization of aerobics movements. The data in Figure 4 show that, as time goes by, the athletes' acceleration scores gradually increase from 70 to 85 points; the angular velocity scores increase from 60 to 75; and the magnetic field scores maintain a relatively stable upward trend. The final comprehensive score increases from 70.0 to 82.7. These changes reflect the gradual optimization of the athletes' performance during training. Figure 4 shows the changes in the different scoring dimensions at each time point, helping trainers to accurately monitor and adjust training strategies in real time. The CNN optimizes sensor data by combining Kalman filtering and wavelet transform technology to provide athletes with more accurate performance feedback and promote the improvement of training results.

4.4 Performance of the feedback system in different scenarios
The performance improvement before and after training reflects the optimization effect of the system. Table 5 shows the relationship between feedback delay, optimization suggestion generation time, and athlete performance improvement in different training scenarios.

Table 5: Feedback performance in different training scenarios

Scenario Type | Feedback Delay (ms) | Suggestion Generation Time (ms) | Pre-training Score (/100) | Post-training Score (/100) | Average Improvement Rate (%)
Indoor Low Intensity | 150 | 90 | 65 | 80 | 23.1
Indoor High Intensity | 180 | 95 | 62 | 78 | 25.8
Outdoor Low Intensity | 160 | 85 | 68 | 82 | 20.6
Outdoor High Intensity | 200 | 110 | 60 | 75 | 25.0
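The Average Improvement Rate column of Table 5 follows directly from the pre- and post-training scores as (post − pre) / pre × 100. A quick check, which reproduces every value in that column:

```python
def improvement_rate(pre, post):
    """Percent improvement relative to the pre-training score, one decimal."""
    return round((post - pre) / pre * 100, 1)

# Pre- and post-training scores from Table 5
scenarios = {
    "Indoor Low Intensity":   (65, 80),
    "Indoor High Intensity":  (62, 78),
    "Outdoor Low Intensity":  (68, 82),
    "Outdoor High Intensity": (60, 75),
}
for name, (pre, post) in scenarios.items():
    print(name, improvement_rate(pre, post))
```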
According to Table 5, feedback delay and optimization suggestion generation time differ across training scenarios. In the indoor high-intensity scenario, the average feedback delay is 180 ms and the optimization suggestion generation time is 95 ms; the action score rises from 62 points before training to 78 points after training, an improvement rate of 25.8%, the highest of all scenarios. This shows that under high-intensity training, despite the longer feedback delay, the system is able to generate optimization suggestions more effectively and improve athlete performance. In the outdoor low-intensity scenario, the feedback delay is shorter, at 160 ms, but the improvement rate is 20.6%, relatively low, indicating that the feedback system still has room for improvement in generating and applying optimization suggestions in this setting. The efficiency and optimization effect of the feedback system thus vary across training scenarios, and the response speed of the system and the time needed to generate suggestions are closely related to the performance improvement of athletes.

4.5 CNN model training effect
To better evaluate and optimize the performance of athletes in different training scenarios, the experiment uses the CNN model and LSTM to conduct an in-depth analysis of the training data. In the four scenarios of indoor low intensity, indoor high intensity, outdoor low intensity, and outdoor high intensity, the number of training cycles has an important impact on the accuracy and performance of the model. The changes in accuracy, precision, recall, F1-score, and loss of the CNN model in the different scenarios can be analyzed to understand the optimization trend during training. Figure 5 shows the changes in the key indicators of the model after each training cycle in these scenarios.
Figure 5: CNN effects in different scenes. (a) Indoor low-intensity training scene; (b) Indoor high-intensity training scene; (c) Outdoor low-intensity training scene; (d) Outdoor high-intensity training scene.

Analyzing the model effects in the different training scenarios through the data in Figure 5, the performance of the CNN model combined with LSTM improves significantly on all indicators as the number of training cycles increases. In the indoor low-intensity scenario, the accuracy increases from 74.5% to 87.1%; precision, recall, and F1-score also increase steadily, and the loss decreases from 0.68 to 0.27, indicating that the model's ability to fit the data improves steadily as training progresses. In the indoor high-intensity scenario, the accuracy increases from 75.0% to 88.2%; the improvements in precision and recall show that the model can effectively handle more complex training environments, and the loss drops to 0.26. The training effect in the outdoor low-intensity and high-intensity scenarios shows a similar trend, with the accuracy increasing from 73.2% to 85.9% and from 74.3% to 86.7%, respectively, and the loss decreasing significantly. In both low-intensity and high-intensity scenarios, the model shows good stability and accuracy across environments. As training cycles increase, the performance of the CNN model in motion scoring and classification is effectively improved, and the training error is significantly reduced.

To further verify the effectiveness of the proposed method for the evaluation of aerobics movements, the CNN-LSTM hybrid model is compared with several representative studies from recent years.
The comparative methods include: a multimodal fusion method based on a traditional CNN, an action recognition method based on LSTM, a traditional classification method based on a support vector machine (SVM), and a temporal modeling method based on a Transformer. Comparative experiments are conducted on the same dataset, with evaluation metrics including accuracy and F1-score, as well as the RMSE (root mean squared error) of the action evaluation, as shown in Table 6:

Table 6: Comparison of the performance of this method with existing research

Model | Accuracy | F1-score | RMSE
SVM | 72.1% | 0.703 | 8.1
CNN | 83.2% | 0.817 | 6.0
LSTM | 84.6% | 0.832 | 5.8
Transformer | 85.4% | 0.841 | 5.6
CNN-LSTM | 88.2% | 0.867 | 4.2

From Table 6 it can be seen that this method outperforms the existing methods in accuracy and F1-score. Compared with the traditional CNN method, accuracy improves by 5.0 percentage points; compared with the LSTM method, by 3.6; and compared with the Transformer method, by 2.8. The RMSE of the proposed method is 4.2, significantly lower than that of the comparative methods, indicating that the proposed CNN-LSTM hybrid model has higher accuracy and stability in predicting action scores.
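The accuracy gains quoted above are simple differences between the CNN-LSTM row and each baseline in Table 6:

```python
# Accuracy column of Table 6, in percentage points
accuracy = {"SVM": 72.1, "CNN": 83.2, "LSTM": 84.6,
            "Transformer": 85.4, "CNN-LSTM": 88.2}

# Gain of the CNN-LSTM hybrid over each baseline
gains = {m: round(accuracy["CNN-LSTM"] - a, 1)
         for m, a in accuracy.items() if m != "CNN-LSTM"}
print(gains)
```

This reproduces the 5.0, 3.6, and 2.8 percentage-point gains cited in the text, and additionally gives a 16.1-point gain over the SVM baseline.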
This result indicates that the proposed method can effectively achieve accurate evaluation of athletes' movements and provide objective guidance for scientific training.

5 Experimental discussion
By combining wearable sensors with motion tracking technology and using CNNs for multimodal data fusion and analysis, this study has achieved significant experimental results in motion evaluation and optimization. Kalman filtering markedly improves the stability and accuracy of the sensor data through noise suppression and effectively reduces the impact of environmental interference on data quality. As training cycles increase, the CNN model combined with LSTM continues to improve in accuracy, precision, recall, and F1-score for action scoring and classification, showing good fitting ability.
In high-intensity training scenarios, despite relatively long feedback delays, the system can still effectively generate optimization suggestions, and the athletes' performance improves significantly.

However, the experimental results also reveal some remaining challenges for the application and development of the system. Although model performance improves significantly across training scenarios, the feedback system exhibits long delays and optimization suggestion generation times in high-intensity scenarios, which limits its real-time responsiveness. Kalman filtering and other data optimization techniques effectively reduce noise, but in complex or extreme training environments external interference can still degrade sensor accuracy and introduce errors into the training feedback. As training scenarios diversify and environmental complexity increases, further improving the model's ability to classify complex actions and to comprehensively process multimodal data remains an open issue. Future research can focus on optimizing sensor data collection and processing, strengthening data synchronization and fusion algorithms, and further improving the stability and adaptability of CNN models in different environments. As the technology advances, combining more sensors with richer analysis of training scenarios may make it possible to provide athletes with more detailed and comprehensive personalized training programs, promoting the intelligent and precise development of sports training.

6 Conclusions
This paper proposes a fitness exercise action evaluation method that integrates wearable sensors and motion tracking systems and combines CNN and LSTM models to fuse and analyze multimodal data. The experimental results show that the method performs well in action classification tasks: in the indoor low-intensity training scenario, the accuracy increases from 74.5% to 87.1%, and in the high-intensity scenario from 75.0% to 88.2%. By introducing Kalman filtering, the wavelet transform, and a dynamic weighting fusion strategy, the stability of the sensor data and the generalization ability of the model are effectively improved. The approach not only provides high-precision motion evaluation and real-time feedback for aerobics athletes, but also offers a transferable technical framework for other high-intensity, multimodal sports. In the future, the system can be extended to remote training platforms, intelligent wearable devices, virtual coaching systems, and other application scenarios, promoting the deep integration and widespread application of artificial intelligence in sports training and health management. Lightweight CNN structures such as MobileNet, TinyML-style deployment of models on wearable devices, and heterogeneous computing acceleration can be used to further shorten feedback latency and improve system response speed.

Authorship contribution statement
Danhua Tan: Writing - original draft preparation, Conceptualization, Supervision, Project administration.
Data availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.

Author statement
The manuscript has been read and approved by all the authors, the requirements for authorship, as stated earlier in this document, have been met, and each author believes that the manuscript represents honest work.

Ethical approval
All authors have been personally and actively involved in substantial work leading to the paper, and will take public responsibility for its content.
https://doi.org/10.31449/inf.v49i16.8812 Informatica 49 (2025) 249–268

Metaheuristic-Enhanced SVR Models for California Bearing Ratio Prediction in Geotechnical Engineering

Yulin Lan1, Na Feng2,* and Zhisheng Yang3
1Planning and Finance Department, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
2School of Information Engineering, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
3Party and Government Office, Weifang Engineering Vocational College, Weifang 262500, Shandong, China
E-mail: sdqzyuchen@163.com
*Corresponding author

Keywords: california bearing ratio, support vector regression, adaptive opposition slime mold algorithm, alibaba and the forty thieves optimization algorithm, dingo optimization algorithm

Received: April 7, 2025

Soil resistance characteristics, particularly the California Bearing Ratio (CBR), play a pivotal role in pavement and subgrade design. However, conventional laboratory-based CBR testing is often time-consuming, labor-intensive, and costly. This study presents a novel machine learning framework that combines Support Vector Regression (SVR) with three recent metaheuristic optimization algorithms, the Dingo Optimization Algorithm (DOA), Alibaba and the Forty Thieves Optimization (AFT), and the Adaptive Opposition Slime Mold Algorithm (AOSMA), to predict CBR values efficiently and accurately. A dataset consisting of 220 soil samples with eight geotechnical input parameters was used to develop and evaluate the hybrid models. The predictive performance of each model was assessed using multiple evaluation metrics, including R², RMSE, MSE, RSR, and WAPE. Results indicate that the SVR-AFT (SVAF) hybrid model outperformed the others, achieving an R² of 0.9968 and an RMSE of 0.7946 in the testing phase, demonstrating high generalization ability and predictive precision.
The integration of SVR with metaheuristic algorithms significantly enhances model robustness and accuracy, offering a practical and cost-effective alternative to empirical CBR testing methods. This work highlights the potential of hybrid AI models for solving complex geotechnical prediction problems and contributes to the growing body of research at the intersection of civil engineering and artificial intelligence.

Povzetek: Hibridni modeli SVR so optimizirani z metahevristikami AFT, DOA in AOSMA za hitro in natančno napovedovanje CBR iz osmih geotehničnih parametrov. Na 220 vzorcih doseže najboljši model SVAF R² = 0.9968 in RMSE = 0.7946, kar ponuja stroškovno učinkovito alternativo laboratorijskim testom.

1 Introduction
CBR is the term used in geotechnical engineering to describe the resistance of a subgrade sample to the insertion of a piston. More specifically, the CBR describes the force that must be applied to the piston for it to penetrate the soil [1]. Initially, the CBR test was devised in California to appraise the suitability of soils for highway construction; civil engineers later modified the testing process to extend it to airport construction. Almost all emerging countries widely adopt the CBR test to appraise the resilience of pavements on soil [2].
A material's load-bearing capacity is gauged by its CBR, which is the ratio of the attainable supporting strength of the base material to that of standard crushed rock. In structural engineering, 100 is considered a reasonable upper limit of the CBR for broken rock, and the CBR values of other materials fall below 100 [3]. Recent advances in artificial intelligence (AI) are closely intertwined with the rapid development of electronic technologies, forming the foundation of the so-called "information society." Gams and Kolenik highlight the reciprocal relationship between electronics and AI, where swift hardware improvements, described by a comprehensive set of Information Society (IS) laws, have driven groundbreaking progress in AI across fields like medicine, smart environments, and autonomous systems. Their research shows that AI and ambient intelligence (AmI) not only benefit from electronic advancements but are also beginning to influence hardware optimization and intelligent system design, indicating a move toward a more integrated technological progression [4].

After compacted soils have been tested in the laboratory, subsequent tests can be conducted; for soils located in trenches, it is also possible to conduct the CBR test in situ [5]. It is essential to recognize that in situ and laboratory test outcomes can show perceptible differences depending on soil type, unit weight, and water content. CBR tests have proved useful for characterizing the stability and strength of various soil-related structures, such as road fills, airport runways, dams, and road foundations.

Moreover, these tests can be conducted on both unsoaked and soaked soil varieties. Laboratory CBR tests are demanding in terms of time and manual effort, and their outcomes are frequently marred by discrepancies attributable to suboptimal laboratory conditions and soil samples, which in turn lead to inaccurate CBR values [6].
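The ratio defined above can be written as a one-line formula: the penetration stress measured on the soil is divided by the stress sustained by standard crushed rock at the same penetration, then multiplied by 100. The 6.9 MPa reference value below is the commonly used standard stress at 2.5 mm (0.1 in) penetration (e.g., in ASTM D1883); treat the numbers as an illustrative sketch, not this paper's test procedure:

```python
def cbr_percent(test_stress_mpa, standard_stress_mpa=6.9):
    """CBR = (penetration stress of the soil / stress of standard
    crushed rock at the same penetration) x 100."""
    return 100.0 * test_stress_mpa / standard_stress_mpa

# A soil resisting 0.69 MPa at 2.5 mm penetration has a CBR of about 10,
# while standard crushed rock itself scores 100 by definition.
print(cbr_percent(0.69), cbr_percent(6.9))
```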
algorithm selection in enhancing SVR model Various studies have been performed on the performance. These studies collectively underscore the California bearing ratio, which led the researchers to growing impact of hybrid AI models in geotechnical formulate different procedures. Previous studies showed applications. However, few works have focused that changes in the soil types and properties affected the specifically on integrating SVR with newer and less value of CBR. Amongst other things, it has been observed explored metaheuristic algorithms such as the Alibaba and that most research work has focused on studying the Forty Thieves (AFT), Dingo Optimization Algorithm relationship existing between the compacted properties, (DOA), or Adaptive Opposition Slime Mold Algorithm unique indicators, as well as the mineral examinations' (AOSMA). Our study addresses this gap by systematically CBR concentrations [6]–[8][9], [10], [11], [12], [13], evaluating and comparing these novel SVR-based hybrid [14][15], [16], [17], [18]. To determine the value of CBR, models in predicting CBR, offering insights into their soils are compacted at a predetermined MDD and OMC at optimization behaviors, convergence patterns, and a specified energy level for the soil material. For the CBR, predictive robustness. By incorporating recent the cases are soaked for four days; the primary purpose of advancements and experimental benchmarks, this work this soaking is to allow absorption. Consequently, the aims to contribute both technically and methodologically assessment of the CBR value for a soaked sample typically to the field of AI-driven geotechnical modeling. requires a period of approximately five to six days. This Considering the variety of parameters to be delay can prove detrimental to the timely completion of a considered and the range of datasets observed, as large-scale construction endeavor. 
Since soil varies greatly in quality, applying this procedure to foundation soil samples collected from a small number of sites may not truly represent the soil properties for all roads. To eliminate this deficiency, a large number of specimens needs to be gathered for testing. Calculating the CBR values of pavement subgrade soils from easily identifiable parameters therefore becomes very important for developing appropriate pavement design parameters. Recently, interest in using artificial intelligence (AI) techniques to solve geotechnical engineering problems has increased, and some valuable outcomes have been obtained [19], [20]. Furthermore, a limited number of studies have documented attempts to estimate the CBR of soils using diverse Artificial Neural Network (ANN) methodologies [21], [22], [23].
Recent advances in machine learning have increasingly supported geotechnical engineering by improving the prediction of soil and foundation properties through data-driven models. Support Vector Regression (SVR), when combined with metaheuristic optimization, has proven particularly effective in modeling complex nonlinear relationships within geotechnical datasets. Ngo et al. [24] demonstrated that SVR optimized via metaheuristics yielded superior performance in predicting the unconfined compressive strength of stabilized soils. Similarly, Hoang et al. [25] applied enhanced SVR models to successfully estimate pile bearing capacity, showcasing the method's versatility in foundation engineering. In the context of California Bearing Ratio (CBR) prediction, Bherde et al. [26] reported that Random Forest Regression outperformed other algorithms, including SVR, with maximum dry density and gravel content being the most influential predictors. While these results support the effectiveness of ensemble models, they also underline the need for more optimized SVR configurations that can match or exceed ensemble performance. A broader comparative study by Ma et al. [27], evaluating 20 metaheuristic algorithms for SVR parameter tuning in landslide displacement prediction, revealed considerable variation in outcomes; the Multiverse Optimizer emerged as particularly efficient in achieving high accuracy at low computational cost, highlighting the critical role of algorithm selection in enhancing SVR model performance.

These studies collectively underscore the growing impact of hybrid AI models in geotechnical applications. However, few works have focused specifically on integrating SVR with newer and less explored metaheuristic algorithms such as Alibaba and the Forty Thieves (AFT), the Dingo Optimization Algorithm (DOA), or the Adaptive Opposition Slime Mold Algorithm (AOSMA). Our study addresses this gap by systematically evaluating and comparing these SVR-based hybrid models for predicting the CBR, offering insights into their optimization behaviors, convergence patterns, and predictive robustness. By incorporating recent advancements and experimental benchmarks, this work aims to contribute both technically and methodologically to the field of AI-driven geotechnical modeling.
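At its core, metaheuristic tuning of SVR boils down to a loop in which the optimizer proposes hyperparameters (C, γ, ε), scores them by validation error, and keeps the best candidate. The sketch below is a minimal stand-in, not the paper's method: a toy quadratic objective replaces an actual cross-validated SVR fit so the example runs standalone, and plain random search with opposition-based sampling (reflecting each candidate inside its bounds, the key idea behind AOSMA-style opposition) replaces the full AFT/DOA/AOSMA update rules. All names and bounds are illustrative assumptions:

```python
import random

# Assumed search bounds for the three SVR hyperparameters
BOUNDS = {"C": (0.1, 100.0), "gamma": (1e-3, 1.0), "epsilon": (1e-3, 1.0)}

def objective(params):
    # Stand-in for cross-validated SVR RMSE; minimum near C=10, gamma=0.1, epsilon=0.05
    return ((params["C"] - 10) / 100) ** 2 + (params["gamma"] - 0.1) ** 2 \
         + (params["epsilon"] - 0.05) ** 2

def opposite(params):
    # Opposition-based candidate: reflect each value inside its bounds
    return {k: lo + hi - v for (k, (lo, hi)), v in zip(BOUNDS.items(), params.values())}

def tune(iterations=300, seed=1):
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(iterations):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
        for p in (cand, opposite(cand)):   # evaluate candidate and its opposite
            err = objective(p)
            if err < best_err:
                best, best_err = p, err
    return best, best_err

best, err = tune()
print(best, err)
```

In the real pipeline, `objective` would train an SVR with the candidate (C, γ, ε) on the training folds and return the validation RMSE, and the population update would follow the chosen metaheuristic instead of uniform resampling.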
While these results support the hybrid Support Vector Regression (SVR) model effectiveness of ensemble models, they also underline the combined with three metaheuristic optimizers—AFT, need for more optimized SVR configurations that can DOA, and AOSMA—aimed at improving the model’s match or exceed ensemble performance. A broader ability to learn nonlinear patterns. The superior Metaheuristic-Enhanced SVR Models for California Bearing Ratio… Informatica 49 (2025) 249–268 251 performance of the SVAF model, especially in RMSE and R² metrics, highlights the benefits of this approach. Table 1: Overview of past methods for CBR prediction Performance Dataset Study Model Type Input Features Metrics (R² / Notes Size RMSE) LL, PL, PI, MDD, ANN prone to Yildirim & Artificial Neural R² = 0.945 / OMC, % Sand and 120 overfitting and high Gunaydin Network (ANN) RMSE = 1.82 Gravel variance LL, PL, PI, Good performance, Taskiran GMDH Compaction 200 R² ≈ 0.92 limited interpretability properties Alawi & Multiple Linear LL, PL, PI, Soil Struggles with 100 R² = 0.86 Rajab Regression (MLR) Gradation nonlinear relationships SVR + Improved Ngo et al. Arithmetic Grain size, R² = 0.96 / SVR enhanced with 150 [24] Optimization Density, OMC, PI RMSE = 1.12 metaheuristic tuning (IAOA) Stochastic Gradient Wu et al. LL, PL, MDD, % Ensemble method with Boosting Regression 300 R² = 0.974 [25] Clay, % Silt good generalization (SGBR) Strong performance, Bherde et Random Forest MDD, % Gravel, 400 R² = 0.982 but no hyperparameter al.[26] Regression (RFR) OMC, PI optimization LL, PL, PI, MDD, Best accuracy using Current R² = 0.9968 / SVR + AFT OMC, SDA, QD, 300 hybrid SVR and AFT Study RMSE = 0.7946 OPC metaheuristic This study addresses the challenges of traditional 2 Materials and methodology CBR testing methods, which are often time-consuming and costly, by exploring advanced machine learning models supplemented with nature-inspired optimization 2.1 Data gathering techniques. 
Specifically, it focuses on Support Vector This study's dataset consists of 121 soil samples gathered Regression (SVR), a popular tool for nonlinear regression. from various geotechnical investigation reports and lab Since SVR's performance heavily depends on tests across different regions in [insert country or region, hyperparameter selection, three recent metaheuristic e.g., southwestern Iran or southeastern Asia—please algorithms—Adaptive Opposition Slime Mold Algorithm specify based on your case]. The samples include a variety (AOSMA), Alibaba and the Forty Thieves Algorithm of soil types such as clayey soils, silty sands, gravels, and (AFT), and Dingo Optimization Algorithm (DOA)—are mixtures to ensure the broad applicability of the predictive employed to optimize the SVR framework. These models. Each sample records essential input parameters algorithms offer diverse search strategies with strong like [list key parameters: e.g., dry density, moisture potential for effective global optimization and faster content, liquid limit, plasticity index, etc.], with the convergence. The predictive capability of these hybrid California Bearing Ratio (CBR) used as the target models is evaluated using five standard statistical metrics: variable. Data were obtained from both published R², RMSE, MSE, RSR, and WAPE. The study aims to (1) literature and in-house experiments, offering a develop and validate an SVR model for predicting the comprehensive understanding of soil behavior in various California Bearing Ratio (CBR) based on soil and compaction parameters; (2) improve SVR's predictive geological settings. The data in this investigation depend performance through hyperparameter tuning with the on eight variables: OPC, SDA, QD, plastic limit, liquid three optimization algorithms; and (3) perform a limit, maximum dry density, plasticity index, and ideal comprehensive comparison of the models using these content. 
Simultaneously, the resulting component of interest is identified as the CBR value. The dataset has been split into two subsets: 70% of the data comprises the training phase, while the remaining 30% makes up the testing phase.

252 Informatica 49 (2025) 249–268 Y. Lan et al.

Table 2 depicts a numerical summary of the parameters used in building the scheme. The table gives, for each attribute, the minimum (Min), maximum (Max), mean, and standard deviation (St.dev). It is crucial to determine these essential parameters for statistical analysis. The maximum values for the LL, PL, PI, MDD, OMC, SDA, QD, and OPC variables are 52.1, 37.2, 19.5, 1.777, 29.5, 20, 20, and 8, respectively. Also, the maximum value of CBR as the output parameter is 66.75 percent.

Table 2: The statistical features of the dataset components

Parameters | Max | Min | Mean | St.dev
LL | 52.10 | 21.20 | 35.8450 | 6.15380
PL | 37.20 | 17.90 | 26.6830 | 4.28120
PI | 19.50 | 2.10 | 9.16230 | 4.11490
MDD | 1.7770 | 1.3650 | 1.49290 | 0.08830
OMC | 29.50 | 18.90 | 24.1430 | 2.42670
SDA (%) | 20.0 | 0 | 10.6600 | 7.15460
QD (%) | 20.0 | 0 | 10.640 | 8.19610
OPC (%) | 8.0 | 2.0 | 4.94490 | 2.37980
CBR (%) | 66.750 | 19.690 | 39.9590 | 10.8660

2.2 Support vector regression (SVR)

In its early phases, SVM technology was used to address pattern identification problems, as initially introduced by Vapnik [28]. Vapnik [29] then extended the SVM algorithm to function approximation problems, which resulted in the development of the SVR approach, an innovative and practical method for data regression analysis. Eq. (1) gives, mathematically, the operation of an SVR:

f(x) = Z^T φ(x) + b    (1)

where b is the bias element, f(x) symbolizes the expected value, and Z is the l-dimensional weighting component.
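The 70/30 partition described above can be sketched as follows. The synthetic feature matrix, target vector, and helper name are illustrative assumptions, not the study's actual data:

```python
import numpy as np

# Minimal sketch of the 70/30 train/test split. X stands in for the 8
# input features (LL, PL, PI, MDD, OMC, SDA, QD, OPC); y stands in for
# the CBR target. All names and values here are illustrative.
rng = np.random.default_rng(42)

def split_dataset(X, y, train_frac=0.7):
    """Shuffle the sample indices, then slice into train/test subsets."""
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    return X[idx[:n_train]], y[idx[:n_train]], X[idx[n_train:]], y[idx[n_train:]]

X = rng.random((121, 8))             # 121 samples, 8 soil-like inputs
y = 19.69 + 47.06 * rng.random(121)  # values within the reported CBR range 19.69-66.75
X_tr, y_tr, X_te, y_te = split_dataset(X, y)
print(len(X_tr), len(X_te))  # 85 36
```

With 121 samples, a 70% training fraction yields 85 training and 36 testing records.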
In this study, Support Vector Regression (SVR) is used as the main predictive model. Because of its ability to manage nonlinear relationships through kernel functions, SVR is especially suitable for modeling complex geotechnical datasets. The Radial Basis Function (RBF) kernel is chosen for its effectiveness in high-dimensional feature spaces and its ability to generalize well. The SVR model depends on three main hyperparameters: C (regularization parameter), which regulates the trade-off between achieving a low training error and maintaining a simple model; γ (gamma), which determines the influence range of a single training example, with a lower value meaning a wider reach and a higher value a more localized effect; and ε (epsilon), which defines the tolerance margin within which errors are not penalized. These parameters were tuned using metaheuristic algorithms to minimize the root mean square error (RMSE) of predictions.

From an academic standpoint, SVR may be explained in the following terms. SVR uses a training dataset with N entries {(X_i, y_i), i = 1, 2, …, M̄}. The function φ(x) maps the distinct components X_i into a feature space with many dimensions. The formal expression for the ε-insensitive loss is given in Eq. (2):

|y − f(x)|_ε = max(0, |y − f(x)| − ε)    (2)

The difference between the real value y and the anticipated value f(x), as expressed in Eq. (3), is known as the residual:

R(x, y) = y − f(x)    (3)

According to Eq. (4), the optimum regression model keeps the entire residual within a preset boundary value ε:

−ε ≤ R(x, y) ≤ ε    (4)

Eq. (4) is assumed to hold over the whole training data set. Thus, if the residual meets the criterion R(x, y) = ±ε, the data point exhibits a maximum detour from the hyperplane. The spatial separation of an arbitrary data point (x, y) from the hyperplane R(x, y) = 0 can be calculated by the formula |R(x, y)|/‖Z*‖.
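The ε-insensitive loss of Eq. (2) can be transcribed directly; the function name below is illustrative:

```python
# Direct transcription of the ε-insensitive loss in Eq. (2): residuals
# inside the tolerance tube of half-width eps cost nothing; outside it,
# only the excess beyond eps is counted. The function name is illustrative.
def eps_insensitive_loss(y, fx, eps):
    return max(0.0, abs(y - fx) - eps)

print(eps_insensitive_loss(1.0, 1.2, eps=0.25))  # 0.0, residual inside the tube
print(eps_insensitive_loss(1.0, 2.0, eps=0.25))  # 0.75, the excess over eps
```

This flat-bottomed loss is what makes SVR tolerant of small residuals: only points whose residual exceeds ε become support vectors.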
Here, the training dataset's overall count of instances is denoted by M; X_i = {x_1, x_2, …, x_m} ∈ R^m denotes the i-th input vector with m dimensions, and y_i ∈ R represents the genuine value connected to X_i. In machine learning terms, an l-dimensional feature space represents the mapping of any training data point X_i in an SVR, and the hyperplane selected in this feature space portrays the optimal relation between the input (independent) variables and the exact output, the dependent variable. Further, Z* can be calculated as:

Z* = (1, −Z^T)^T    (5)

In this setting, the variable δ is assumed to be the maximum degree of dispersion between the hyperplane R(x, y) = 0 and the dataset (x, y). All the training data can be induced to meet the requirement shown in Eq. (6); if the value of δ reaches its maximum in the SVR scheme, the scheme exhibits the best generalization ability:

|R(x, y)| ≤ δ‖Z*‖    (6)

Whenever R(x, y) equals ε, the most significant distance is reached, and Eq. (6) becomes Eq. (7). Translating the optimization problem into minimizing ‖Z‖, and noting that ‖Z*‖² = ‖Z‖² + 1, ‖Z*‖ must be minimal to attain the maximum of δ:

ε = δ‖Z*‖    (7)

Even with efforts to keep errors within (−ε, ε) during training, certain errors may still surpass this limit. Training errors below −ε are denoted by ζ_i, and those above ε by ζ_i*. The notations ζ_i and ζ_i* are defined according to Eqs. (8) and (9), respectively:

ζ_i = { 0,                R(x_i, y_i) − ε ≤ 0
        R(x_i, y_i) − ε,  otherwise }    (8)

ζ_i* = { 0,               ε − R(x_i, y_i) ≤ 0
         ε − R(x_i, y_i), otherwise }    (9)

Using the ε-insensitive loss function, SVR aims to minimize the deviation between the training data and the hyperplane region and to choose the hyperplane that produces the best result. The goal function for SVR optimization is given by Eq. (10):

min F(Z, b, ζ_i, ζ_i*) = (1/2)‖Z‖² + c Σ_{i=1}^{M} (ζ_i + ζ_i*)    (10)

subject to:

y_i − Z^T φ(x_i) − b ≤ ε + ζ_i,    i = 1, 2, …, M̄
Z^T φ(x_i) + b − y_i ≤ ε + ζ_i*,   i = 1, 2, …, M̄
ζ_i ≥ 0, ζ_i* ≥ 0,                 i = 1, 2, …, M̄

The first term of Eq. (10) restricts the weights so that the regression function remains flat and stable, while the second term sets the trade-off between this flatness and the tolerance for errors larger than ε under the ε-insensitive loss. After solving the quadratic optimization problem with inequality constraints, the coefficient Z can be obtained from Eq. (11):

Z = Σ_{i=1}^{M} (β_i* − β_i) φ(x_i)    (11)

The values of β_i* and β_i are determined by solving a quadratic programming problem involving the Lagrangian multipliers. Mathematically, the Support Vector Regression function is written as Eq. (12):

f(x) = Σ_{i=1}^{M} (β_i* − β_i) K(x_i, x) + b    (12)

The kernel function K(x_i, x) converts the training data into a higher-dimensional nonlinear space. Therefore, this methodology is deemed appropriate for solving problems with nonlinear relationships. Figure 1 shows the operational diagram for SVR.

Figure 1: The progress and validation flowchart of an SVR scheme
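As a concrete, non-authoritative illustration of the model described in this section, the snippet below fits an RBF-kernel SVR with hand-picked values of C, γ, and ε on synthetic data; the paper's models instead obtain these hyperparameters from the metaheuristics of the following sections. scikit-learn's `SVR` is assumed to be available:

```python
import numpy as np
from sklearn.svm import SVR  # assumes scikit-learn is available

# Synthetic stand-in data: 100 samples with 8 soil-like input features.
# Nothing here reproduces the paper's dataset or its tuned values.
rng = np.random.default_rng(0)
X = rng.random((100, 8))
y = X @ np.arange(1.0, 9.0) + 0.05 * rng.standard_normal(100)

# RBF-kernel SVR; C, gamma, and epsilon are the three hyperparameters
# that AFT/AOSMA/DOA tune in the hybrid models (values here are arbitrary).
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1).fit(X, y)
rmse = float(np.sqrt(np.mean((y - model.predict(X)) ** 2)))
```

A grid search over these three parameters grows combinatorially, which is exactly the inefficiency the metaheuristic sections address.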
2.3 AOSMA

The plasmodial slime mold's oscillatory mode is the basis for the Slime Mold Algorithm (SMA). The slime mold employs a positive-negative feedback mechanism in conjunction with an oscillatory mode to establish the optimal route toward nutrition [30]. AOSMA is a newer technique that incorporates an adaptive decision-making method based on opposition-based learning to improve the slime mold's approaching conduct [31].

For the theoretical framework of AOSMA, assume that a total of N individuals of the slime mold species under consideration reside in a search domain bounded by an upper boundary (UB) and a lower boundary (LB). X_i = (x_i^1, x_i^2, ⋯, x_i^d), ∀i ∈ [1, N], is the i-th slime mold's location in d dimensions, and F(X_i), ∀i ∈ [1, N], symbolizes the i-th slime mold's fitness. The locations and fitness values of the slime mold population at round t are:

X = [X_1; X_2; ⋯; X_N],  where X_i = (x_i^1, x_i^2, ⋯, x_i^d)    (13)

F(X) = [F(X_1), F(X_2), ⋯, F(X_N)]    (14)

In cycle (t + 1), the slime mold's position is updated as in Eq. (15):

X_i(t + 1) = { X_LB(t) + Vd·(W·X_A(t) − X_B(t)),  p1 ≥ z and p2 < m_i
               Ve·X_i(t),                          p1 ≥ z and p2 ≥ m_i
               rand·(UB − LB) + LB,                p1 < z },  ∀i ∈ [1, N]    (15)

where X_LB is the local best slime mold; X_A and X_B are two randomly pooled individuals; W is the weight factor; Vd and Ve are random velocities; p1 and p2 are random numbers in [0, 1]; and z = 0.03 is the fixed probability with which a slime mold restarts from a random search position. The random velocities Vd and Ve are defined as follows:

Vd ∈ [−d, d]    (23)

Ve ∈ [−e, e]    (24)

d = arctanh(−(t/T) + 1)    (25)

e = 1 − t/T    (26)

where t and T are the current and maximum cycle counts, respectively.
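The shrinking velocity ranges of Eqs. (25)-(26) can be checked directly; the helper name below is illustrative:

```python
import math

# The velocity-range schedules of Eqs. (25)-(26): Vd is drawn from
# [-d, d] and Ve from [-e, e], and both bounds decay to 0 as t -> T,
# gradually calming the search. The helper name is illustrative.
def velocity_bounds(t, T):
    d = math.atanh(1.0 - t / T)  # Eq. (25): d = arctanh(-(t/T) + 1), valid for 1 <= t <= T
    e = 1.0 - t / T              # Eq. (26)
    return d, e

d_early, e_early = velocity_bounds(1, 100)
d_late, e_late = velocity_bounds(99, 100)
print(d_early > d_late, e_early > e_late)  # True True
```

Early cycles thus permit large jumps (wide velocity ranges), while late cycles confine the slime molds near their current positions.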
The threshold value m_i of the i-th member of the population, which aids in choosing the slime mold's location, is calculated as in Eq. (16):

m_i = tanh|F(X_i) − F_G|,  ∀i ∈ [1, N]    (16)

F_G = F(X_G)    (17)

W(SortIndF(i)) = { 1 + rand·log((F_LB − F(X_i))/(F_LB − F_LW) + 1),  1 ≤ i ≤ N/2
                   1 − rand·log((F_LB − F(X_i))/(F_LB − F_LW) + 1),  N/2 < i ≤ N }    (18)

Here F_G and X_G are the global best fitness and the corresponding global best individual, rand is a random number in [0, 1], and F_LB and F_LW are the local best and worst fitness values. For a minimization problem, the fitness values are sorted in ascending order:

[SortF, SortIndF] = sort(F)    (19)

The local best and worst fitness values and the local best slime mold X_LB are computed as in Eqs. (20)-(22):

F_LB = F(SortF(1))    (20)

F_LW = F(SortF(N))    (21)

X_LB = X(SortIndF(1))    (22)

SMA holds great promise for both exploration and exploitation in technological problem-solving. However, the behavior of the slime mold in SMA depends on a number of basic cases:

Case 1: The region's best slime mold X_LB and two random individuals X_A and X_B, with velocity Vd, drive the search when p1 ≥ z and p2 < m_i. This stage makes it easier to strike a balance between exploration and exploitation.

Case 2: The slime mold moves along its own direction with velocity Ve when p1 ≥ z and p2 ≥ m_i. This case facilitates exploitation.

Case 3: When p1 < z, the individual reinitializes within the specified search domain. This phase facilitates exploration.

Case 1 shows that the chances of finding good solutions are poorly controlled during exploration and exploitation, since X_A and X_B are two random slime molds. To get around this limitation, the best local individual X_LB can be used in place of X_A. Consequently, the location of the i-th component is remodeled as in Eq. (27):

Xn_i(t) = { X_LB(t) + Vd·(W·X_LB(t) − X_B(t)),  p1 ≥ z and p2 < m_i
            Ve·X_i(t),                           p1 ≥ z and p2 ≥ m_i
            rand·(UB − LB) + LB,                 p1 < z }    (27)

Case 2 illustrates how the slime mold deliberately targets a nearby location, which may result in a path with a lower fitness level; a better approach to this issue is to implement an adaptive decision system. Case 3 illustrates that SMA offers criteria for exploration, but with the small value z = 0.03 the exploration is limited, so an auxiliary exploration mechanism must be introduced. A practical approach to the limitations of Cases 2 and 3 is a flexible decision strategy that leverages opposition-based learning (OBL) to determine whether additional exploratory effort is necessary [32]. OBL uses a point Xop_i in the search domain that is precisely the opposite of Xn_i for each member (i = 1, 2, ⋯, N) and compares the two to upgrade the next cycle's position. This assists in improving convergence and avoiding the chances of being trapped in local minima.
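The opposition step just described mirrors each new candidate across the population's per-dimension bounds, as Eq. (28) formalizes; a minimal sketch with illustrative names:

```python
# Opposition-based learning as in Eq. (28): each candidate is mirrored
# across the per-dimension [min, max] box of the current population,
# giving a cheap extra exploratory point. All names are illustrative.
def opposition(pop):
    dims = list(zip(*pop))                  # per-dimension value tuples
    lo = [min(d) for d in dims]
    hi = [max(d) for d in dims]
    return [[l + h - v for v, l, h in zip(ind, lo, hi)] for ind in pop]

pop = [[0.0, 2.0], [1.0, 4.0], [3.0, 6.0]]
print(opposition(pop)[0])  # [3.0, 6.0], the mirror of [0.0, 2.0]
```

If the mirrored point has better fitness, it replaces the candidate (Eq. (29)); otherwise the original candidate is kept.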
So, the opposite point Xop_i for the i-th individual in the j-th dimension (j = 1, 2, ⋯, s) is described as follows:

Xop_i^j = min(Xn_i(t)) + max(Xn_i(t)) − Xn_i^j(t)    (28)

Xr_i represents the i-th member's position in the minimization problem and is given by:

Xr_i = { Xop_i(t),  F(Xop_i(t)) < F(Xn_i(t))
         Xn_i(t),   F(Xop_i(t)) ≥ F(Xn_i(t)) }    (29)

A flexible decision is then formed by comparing the prior fitness value F(X_i(t)) with the present fitness value F(Xn_i(t)) in the event of a depleted nutrient pathway, and the position for the subsequent cycle is updated:

X_i(t + 1) = { Xn_i(t),  F(Xn_i(t)) ≤ F(X_i(t))
               Xr_i(t),  F(Xn_i(t)) > F(X_i(t)) },  ∀i ∈ [1, N]    (30)

In this study, the Adaptive Opposition Slime Mold Algorithm (AOSMA) is used not as a standalone optimizer but as a hybrid component integrated with Support Vector Regression (SVR). AOSMA optimizes three key hyperparameters of SVR—the regularization parameter C, the epsilon-insensitive loss margin ε, and the kernel coefficient γ—with the goal of minimizing the prediction error measured by RMSE. Through its adaptive opposition-based learning strategy and dynamic parameter control, AOSMA allows more effective exploration of the search space and helps prevent premature convergence. As a result, the hybrid AOSMA-SVR model achieves better accuracy and generalization in predicting California Bearing Ratio (CBR) values from geotechnical data. The AOSMA framework is displayed in pseudo-code in Algorithm 1.

Algorithm 1: AOSMA
Begin
  Inputs: a target function f, the parameters N, s, T, and z, and the search boundary range [LB, UB].
  Outputs: X_G and F_G.
  Initialization: launch the slime mold at random, X_i = (x_i^1, x_i^2, ⋯, x_i^d), ∀i ∈ [1, N], inside the boundaries UB and LB; set t = 1.
  while (t ≤ T)
    Determine the fitness values F(X) of the N slime molds.
    Sort the fitness values.
    Update the local best individual X_LB to match the local best fitness F_LB.
    Update the local worst fitness F_LW.
    Update the global best individual X_G and global best fitness F_G.
    Refresh the weight W.
    Update d using Eq. (25) and e using Eq. (26).
    for (each slime mold i = 1 : N)
      Generate the random numbers p1 and p2.
      Compute the threshold value m_i.
      Determine the new slime mold location Xn_i using Eq. (27).
      Determine the new slime mold's fitness F(Xn_i).
      if (F(Xn_i) > F(X_i))   // adaptive decision strategy
        Estimate Xop_i using Eq. (28).   // opposition-based learning
        Select Xr_i using Eq. (29).
      end
      Revise the next-cycle slime mold X_i using Eq. (30).
    end
    t = t + 1
  end
  The result is X_G, representing the globally most effective position.

2.4 AFT

The present investigation follows the basic AFT algorithm's mathematical model as described in [33]. The scheme encompasses three cases, delineated in the following:

Case 1: The pursuit of Ali Baba by the thieves, derived from information obtained from a source, can be displayed by a simulation of their positions, as illustrated in Eq. (31):

x_i^(t+1) = gbest^t + [Td^t(best_i^t − y_i^t)r1 + Td^t(y_i^t − m_{a(i)}^t)r2]·sgn(rand − 0.5),  p ≥ 0.5, q > Pp^t    (31)

where y_i^t represents Ali Baba's position with respect to thief i; m_{a(i)}^t represents the amount of cleverness Marjaneh uses to cover up thief i; x_i^(t+1) denotes the position of the i-th thief; gbest^t is the best global position any thief has ever reached; r1, r2, rand, p, and q are random values in [0, 1]; best_i^t is the optimal position thief i has determined; Td^t is the thieves' tracking (surveillance) distance, specified by Eq. (32); sgn(rand − 0.5) is −1 or 1; Pp^t is Ali Baba's potential perceptive ability, stated by Eq. (33); and a is defined by Eq. (34).

Td^t = τ0 · e^(−τ1 (t/T)^τ1)    (32)

where t and T are the current and maximal repetition counts, respectively; τ0 (τ0 = 1) is a preliminary estimate of the monitoring length, and τ1 (τ1 = 2) is a set amount that regulates exploration and exploitation.

Pp^t = λ0 · log(λ1 (t/T)^λ0)    (33)

where λ0 (λ0 = 1) depicts the final assessment of the robbers' chances of completing their task after the hunt, and λ1 (λ1 = 1) is a fixed value that controls exploration and exploitation.

a = [(n − 1)·rand(n, 1)]    (34)

The vector rand(n, 1) is a set of random numbers within the bounds [0, 1].

m_{a(i)}^t = { x_i^t,       if f(x_i^t) ≥ f(m_{a(i)}^t)
               m_{a(i)}^t,  if f(x_i^t) < f(m_{a(i)}^t) }    (35)

The score of the fitness function is denoted by f(·).

Case 2: Thieves may perceive they have been tricked and will likely start exploring unfamiliar and unplanned areas:

x_i^(t+1) = Td^t[(u_j − l_j)r + l_j],  p ≥ 0.5, q ≤ Pp^t    (36)

The upper and lower bounds of the search domain at dimension j are displayed by u_j and l_j, respectively, and r is a stochastic quantity generated in the interval [0, 1].

Case 3: To improve AFT's exploration and exploitation capabilities, thieves can investigate alternative search positions beyond those identified through Eq. (31). This scenario can be formulated as Eq. (37):

x_i^(t+1) = gbest^t − [Td^t(best_i^t − y_i^t)r1 + Td^t(y_i^t − m_{a(i)}^t)r2]·sgn(rand − 0.5)    (37)

Algorithm 2 concisely and formally describes the iterative pseudo-code stages that correspond to the core AFT.

Algorithm 2: AFT
  Establish the regulation settings and get started.
  Assess every thief's starting, optimal, and global positions.
  Assess Marjaneh's intelligence in comparison to all thieves.
  Set t ← 1
  while (t ≤ T) do
    Modify the input parameter Pp^t using Eq. (33).
    for each thief do
      if (p ≥ 0.5) then
        if (q > Pp^t) then
          Update the thief's position using Eq. (31).
        else
          Update the thief's position using Eq. (36).
        end if
      else
        Refine the thief's position using Eq. (37).
      end if
    end for
    Refresh all thieves' current, best, and global standings.
    Alter Marjaneh's wit targets using Eq. (35).
    t = t + 1
  end while
  Return the global optimal solution.

2.5 Dingo optimization algorithm (DOA)

From the earliest times, nature has consistently been regarded as an exceptionally instructive and impactful educator, and every species on Earth possesses a distinct mechanism for ensuring its survival. The present study involves the mathematical modeling of the hunting behavior and social arrangements of the dingo species; this analytical approach is the basis for developing the nature-inspired DOA optimization technique [34]. The two primary constituents of DOA are exploration and exploitation. The algorithm generates various candidate solutions within the search domain during the initial exploration phase, while the subsequent exploitation phase identifies and pursues the most desirable solutions within the predetermined space. To discern the optimal resolution for a given pragmatic concern, refinement and integration of both constituents are necessary. Nonetheless, achieving equilibrium between them is arduous due to the algorithm's stochastic disposition; the impetus for developing hybridized meta-heuristic implementations for authentic engineering dilemmas derives from this notion [34]. Dingo optimization is performed by computationally modeling the prey's pursuit, encirclement, and attack.

2.5.1 Encircling

Given the lack of previous knowledge about the search location and its ideal characteristics, it is proposed that the objective or target prey is the best agent currently found, representing the social hierarchy of dingoes. The following mathematical formulas formalize the dingoes' behavior:

D_d = |A·P_p(x) − P(i)|    (38)

P(i + 1) = P_p(x) − B·D_d    (39)

A = 2·a1    (40)

B = 2·b·a2 − b    (41)

b = 3 − (I × (3/I_max))    (42)

The neighborhood dingoes' geographic coordinates are displayed as a two-dimensional vector. The dingo may adjust its position to match the coordinates (P, Q) based on the prey's location, displayed as (P*, Q*). By adjusting the A and B vectors about the present position, every possible location around the ideal agent can be reached; for example, setting A = (1, 0) and B = (1, 1) provides access to the dingo's position at (P* − P, Q*). In this way, Eqs. (38) and (39) make it easier for dingoes to travel throughout the hunting area and find their prey randomly.

2.5.2 Hunting

Creating a mathematical dingo hunting strategy involves assuming that the alpha, beta, and other members of the pack have a thorough awareness of the possible prey sites. When conducting hunting trips, the alpha dingo always takes the lead; however, other dingoes, including betas, may hunt as well. Eqs. (43) to (51) are developed in line with this discussion:

D_α = |A1·P_α − P|    (43)

D_β = |A2·P_β − P|    (44)

D_o = |A3·P_o − P|    (45)

P1 = |P_α − B·D_α|    (46)

P2 = |P_β − B·D_β|    (47)

P3 = |P_o − B·D_o|    (48)

The following formulae are utilized to determine each dingo's intensity:

I_α = log(1/(F_α − (1E−100)) + 1)    (49)

I_β = log(1/(F_β − (1E−100)) + 1)    (50)

I_o = log(1/(F_o − (1E−100)) + 1)    (51)

2.5.3 Attacking

If a position update is unavailable, it may be inferred that the dingo successfully concluded its hunt through a predatory attack. To formally articulate the strategy, the value of b is systematically diminished linearly. Noteworthy is the fact that the variation range of b_α is further diminished by b: b_α is a stochastic variable generated within the range [−3b, 3b], where the constant b undergoes a decremental process from 3 to 0 over a series of cycles. When b_α values are randomly generated within the interval [−1, 1], an exploratory agent is capable of moving to any possible position along the trajectory between its existing location and the prey's location.
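The encircling and attacking behavior can be sketched as follows. The extraction garbles which coefficient appears where in Eqs. (38)-(41), so the roles assigned to A and B below are an assumption of this sketch, not the paper's definitive form:

```python
import random

# Hedged sketch of the encircling update. A follows Eq. (40) (A = 2*a1)
# and B follows Eq. (41) (B = 2*b*a2 - b); the pairing of coefficients
# with Eqs. (38)-(39) is an assumption, and all names are illustrative.
def encircle(pos, prey, b, rng):
    new = []
    for p_i, q_i in zip(pos, prey):
        A = 2.0 * rng.random()            # Eq. (40)-style coefficient
        B = 2.0 * b * rng.random() - b    # Eq. (41)-style coefficient
        D = abs(A * q_i - p_i)            # Eq. (38): distance to the prey
        new.append(q_i - B * D)           # Eq. (39): updated coordinate
    return new

# As b decays from 3 to 0 (Eq. (42)), B collapses to 0 and the update
# lands exactly on the prey, i.e., the pack closes in for the attack.
prey = [0.5, 0.5]
print(encircle([0.9, 0.1], prey, b=0.0, rng=random.Random(1)))  # [0.5, 0.5]
```

This makes the role of the decaying constant b concrete: large b permits wide reconnaissance around the prey, while b near 0 forces convergence onto it.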
2.5.4 Searching

Dingoes exhibit hunting patterns primarily determined by their pack's location, and they consistently progress in pursuit of locating and subduing prey. B represents random variables and can be characterized as a stochastic vector whereby elements with values less than or equal to one take priority over those greater than one; this feature elucidates the gap's influence described in Eq. (38). Notably, if the value assigned to B falls below −1, the prey is retreating from the search agent; conversely, if B exceeds 1, the pack is advancing toward its prey. This intervention enables the DOA to conduct a comprehensive global reconnaissance of identified targets. One factor contributing to a heightened probability of exploration within the DOA is the component A: in Eq. (40), the vector A generates a range of random numbers, independent of the weight of the selected prey.

The hybrid framework combines the Dingo Optimization Algorithm (DOA) with Support Vector Regression (SVR) to tune the hyperparameters C, ε, and γ. Inspired by the natural hunting strategies of dingoes, such as surrounding, chasing, and attacking prey, the DOA translates these behaviors into search operators that explore the SVR parameter space. Its aim is to minimize the RMSE of SVR on the training data by identifying the best parameter combination. By balancing exploration and exploitation, the DOA-SVR hybrid effectively avoids local optima and improves SVR's generalization ability, leading to more accurate CBR predictions. Algorithm 3 offers the pseudo-code for the DOA.

Algorithm 3: Dingo Optimization
  Input: the population of dingoes D_n (n = 1, 2, …, n)
  Output: the best dingo (here, the best values are minima)
  Generate the initial search agents D_in.
  Initialize the values of b, A, and B.
  while the termination condition is not reached do
    Appraise each dingo's fitness and intensity cost.
    D_α = the dingo with the best search result
    D_β = the dingo with the second-best search result
    D_o = the remaining dingoes' search outcomes after cycle 1
    repeat
      for i = 1 : D_in do
        Renew the latest search agent's state.
      end for
      Project the fitness and intensity cost of the dingoes.
      Record the values of S_α, S_β, S_δ.
      Record the values of b, A, and B.
      Iteration = Iteration + 1
    until cycle ≥ stopping criterion
  end while

Choosing AFT, AOSMA, and DOA as optimizers was driven by their distinct algorithmic bases and search methods, enabling a thorough comparison of their metaheuristic behaviors. These approaches are relatively recent and less studied, yet they show competitive performance in diverse regression and engineering tasks. Incorporating them with SVR in this research allows evaluation of both their predictive accuracy and their optimization stability across different algorithmic frameworks.

2.6 Reproducibility and run settings

To ensure the robustness and reproducibility of the results, each hybrid SVR model (AFT-SVR, DOA-SVR, AOSMA-SVR) was executed 30 independent times, which allows reliable statistical analysis of model performance. Additionally, random seed initialization was controlled using a fixed seed (e.g., seed = 42) across all algorithms during training and optimization to maintain consistent behavior during repeated runs and to support reproducibility.
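Operationally, this run protocol reduces to a seeded tuning loop. In the hedged sketch below, a plain random search stands in for the AFT/AOSMA/DOA update rules; the data, search ranges, and iteration count are illustrative assumptions, and scikit-learn's SVR serves as the base learner:

```python
import numpy as np
from sklearn.svm import SVR  # assumes scikit-learn is available

# Hedged sketch of the run protocol above: a fixed seed makes each run
# repeatable, and a plain random search stands in for AFT/AOSMA/DOA,
# which share this objective (training RMSE of an RBF-kernel SVR) and
# differ only in how the next (C, gamma, epsilon) triple is proposed.
rng = np.random.default_rng(42)            # controlled seed, as in the text
X = rng.random((80, 8))
y = X.sum(axis=1) + 0.1 * rng.standard_normal(80)

def objective(C, gamma, eps):
    model = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=eps).fit(X, y)
    return float(np.sqrt(np.mean((y - model.predict(X)) ** 2)))

best, best_rmse = None, float("inf")
for _ in range(20):                        # stand-in for one optimizer run
    cand = (10 ** rng.uniform(-1, 2),      # C in [0.1, 100]
            10 ** rng.uniform(-2, 1),      # gamma in [0.01, 10]
            10 ** rng.uniform(-3, 0))      # epsilon in [0.001, 1]
    r = objective(*cand)
    if r < best_rmse:
        best, best_rmse = cand, r
```

Re-running this loop with the same seed reproduces the same best triple, which is what executing each hybrid 30 times with controlled seeds relies on.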
metaheuristic optimization algorithms: Alibaba and Forty Thieves (AFT), Adaptive Opposition Slime Mold 2.6 Reproducibility and run settings Algorithm (AOSMA), and Dingo Optimization Algorithm (DOA). The goal is to boost SVR's prediction accuracy by To ensure the robustness and reproducibility of the results, optimizing its key hyperparameters—penalty parameter each hybrid SVR model (AFT-SVR, DOA-SVR, C, kernel parameter γ, and epsilon-insensitive loss ε— AOSMA-SVR) was executed 30 independent times. This using the global search methods provided by these allows for reliable statistical analysis of model metaheuristics. While SVR is a strong nonlinear performance. Additionally, random seed initialization was regression technique, its effectiveness heavily relies on controlled using a fixed seed (e.g., seed = 42) across all Metaheuristic-Enhanced SVR Models for California Bearing Ratio… Informatica 49 (2025) 249–268 259 proper parameter tuning. Traditional manual or grid 𝑛 search methods are often inefficient or may yield 1 𝑅𝑀𝑆𝐸 = √ ∑(𝑑 𝑝 2 𝑛 𝑖 − 𝑖) (53) suboptimal results, especially with complex, high- dimensional geotechnical data. Therefore, this hybrid 𝑖=1 approach exploits the global search and convergence 𝑛 1 strengths of nature-inspired algorithms to automate SVR 𝑀𝑆𝐸 = ∑((𝑑 𝑛 𝑖 − 𝑝𝑖) 2 (54) hyperparameter optimization. 𝑖=1 - In SVAF, the AFT algorithm explores the search 𝑅𝑀𝑆𝐸 space dynamically through mechanisms like global 𝑅𝑆𝑅 = (55) 𝑆𝑡. 𝐷𝑒𝑣 surveillance, balancing exploration and exploitation, and adaptive decision-making inspired by Marjaneh. These ∑𝑛𝑖=1|𝑑𝑖 − 𝑏𝑖| 𝑊𝐴𝑃𝐸 = ∑𝑛 (56) features enable it to identify optimal SVR parameters 𝑖=1|𝑏𝑖| reliably. 𝑛 indicates the count of samples; 𝑑𝑖 displays the - In SVSM, AOSMA enhances the slime mold forecasted value; 𝑏𝑖 displays the actual value, while ?̅? and algorithm with opposition-based learning and adaptive ?̅? 
represent the mean of the forecasted value and the strategies, allowing it to escape local minima more average of the actual amount, respectively. effectively and converge more rapidly, thus providing better hyperparameter configurations. - In SVDO, the DOA mimics the social hunting 3 Outcomes and discussion behaviors of dingoes—such as encircling, attacking, and This paper reports on developing a Support Vector searching—to iteratively fine-tune the SVR parameters Regression model using three new enhancement for higher prediction accuracy. techniques, AFT and DOA, aimed at developing three Each metaheuristic aimed to minimize the RMSE of hybrid predictive models for soil estimation CBR. In SVR predictions on training data, with the best parameter previous schemes, the information about information was set used to train the final hybrid model. The process was divided into two subsets: a set to learn and a set to validate repeated 30 times to ensure stability and reproducibility. the scheme, 70% and 30% of the data, respectively. The This hybrid approach directly supports the study's goal of five consecutive statistical metrics, namely, R2, RMSE, creating accurate, efficient, and generalizable models for MSE, RSR, and WAPE, were considered to get the full predicting the California Bearing Ratio (CBR) of soils. view of the optimizers' performance. Outcomes can be Using these metaheuristics not only enhances SVR’s shown in Table 2. The statistical indicators are analyzed learning ability but also reduces the manual effort and in this section to determine whether one model is generally computational cost typically required for parameter better. By studying the various R2 values among these tuning. different schemes, it would be crystal clear that the most promising outcomes are given out by SVAF in both the 2.8 Performance evaluation tactics testing and training stages, with 0.9968 and 0.9929 values, A range of evaluators was deployed to appraise hybrid respectively. 
Meanwhile, the minimum value of R2 schemes' productivity in CBR value prediction. The list of among all comparative schemes was given to the SVSM evaluators comprises RMSE, MSE, R2, the ratio of RMSE model at 0.9767. The key thing worth mentioning here is to standard deviation (RSR), and lastly, weighted absolute that all the schemes have increased R2 during their test percentage error, or WAPE. R2 determines the degree of phases, indicating that the schemes are well-trained. linear relationship between the actual and forecasted Maximum RMSE, MSE, RSR, and WAPE values are magnitudes. The RMSE is the square root of the ratio 1.6271, 2.6475, 0.1524, and 0.0334 for SVSM in training. between the square of the count of specimens and the For the testing section, maximum RMSE values, MSE, estimated value departure from the actual value. WAPE RSR, and WAPE are 1.5824, 2.5042, 0.1409, and 0.0312 could be quantified by dividing the total absolute error by for SVSM. By contrasting the evaluators' and errors' the total real demand. Eq. (21-25) provides the values of values, the best hybrid scheme for estimating the CBR these metrics above. value of soils is the combination of SVR and the ATF 2 algorithm (SVAF). This model has the highest R2 value ∑𝑛 (0.9968 in the testing phase) and the lowest error value 𝑖=1(𝑏 ?̅? ( − ?̅? 𝑅2 = 𝑖 − ) 𝑑𝑖 ) (52) (0.7946 in testing) among all three components. √[∑𝑛𝑖=1(𝑏𝑖 − ?̅?) 2][∑𝑛𝑖=1(𝑑𝑖 − ?̅?) 2] ( ) Table 3: The hybridized schemes produced the findings Schemes SVAF SVSM SVDO SVR Section Train Test Train Test Train Test Train Test RMSE 0.9316 0.7946 1.6271 1.5824 1.3363 1.171 1.336392 1.171305 R2 0.9929 0.9968 0.9767 0.9825 0.9852 0.992 0.985202 0.992446 MSE 0.868 0.6314 2.6475 2.5042 1.7859 1.372 1.7859 1.372 260 Informatica 49 (2025) 249–268 Y. Lan et al. RSR 0.0872 0.0708 0.1524 0.1409 0.1251 0.1043 0.1251 0.1043 WAPE 0.0162 0.0141 0.0334 0.0312 0.0234 0.0212 0.0234 0.0212 Fig. 
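For concreteness, the five evaluators in Eqs. (52)–(56) translate directly into code. The sketch below is a minimal pure-Python illustration: the four-sample arrays are invented for demonstration and are not the paper's data, and St. Dev in Eq. (55) is taken here as the population standard deviation of the actual values.

```python
import math

def evaluate(actual, forecast):
    """Compute R2, RMSE, MSE, RSR and WAPE per Eqs. (52)-(56)."""
    n = len(actual)
    b_bar = sum(actual) / n                  # mean of actual values
    d_bar = sum(forecast) / n                # mean of forecasted values
    mse = sum((d - b) ** 2 for b, d in zip(actual, forecast)) / n
    rmse = math.sqrt(mse)
    cov = sum((b - b_bar) * (d - d_bar) for b, d in zip(actual, forecast))
    var_b = sum((b - b_bar) ** 2 for b in actual)
    var_d = sum((d - d_bar) ** 2 for d in forecast)
    r2 = (cov / math.sqrt(var_b * var_d)) ** 2       # squared correlation
    st_dev = math.sqrt(var_b / n)            # std. dev. of actual values
    return {"R2": r2, "RMSE": rmse, "MSE": mse,
            "RSR": rmse / st_dev,
            "WAPE": sum(abs(d - b) for b, d in zip(actual, forecast))
                    / sum(abs(b) for b in actual)}

# Illustrative CBR values (invented, not the paper's dataset)
metrics = evaluate([10.0, 12.0, 15.0, 18.0], [10.5, 11.5, 15.5, 17.5])
```

A perfect scheme would yield R2 = 1 and RMSE = MSE = RSR = WAPE = 0, which is why SVAF's combination of low errors and high R2 in Table 3 marks it as the best hybrid.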
Fig. 2 displays the scatter plots illustrating the correlation between the gauged and expected California Bearing Ratio values. R2 and RMSE are the two numerical assessments included in the plots. As the RMSE value decreases, the point density increases, because RMSE functions as a deviation controller; likewise, the training and testing data points are drawn toward the center axis as R2 improves. The figure also illustrates several reference lines, including the linear regression centerline positioned at Y = X, as well as two red lines below and above the midline at Y = 0.9X and Y = 1.1X, respectively. Points falling beyond the lower and upper lines mark false predictions: underestimations and overestimations of values, respectively. Three schemes were produced by the subsequent analysis, which combined the SVR scheme with the three optimizer strategies applied to training and testing. Fig. 2 shows the findings of the current investigation. The R2 of SVAF appears comparatively more favorable than that of the rest of the schemes, because its data points maintain the same directionality and are nearer the centerline. From the empirical data, it can be induced that in all cases, and quite noticeably for SVDO, the precision of the test-phase values is higher than that of the training phase. Overall, the most favorable result in Fig. 2 is obtained using the SVR method with the AFT optimizer, since its R2 and RMSE in both learning and validation gave the best results. That could be due to this model's capability of minimizing error and its best-in-class performance regarding R2.

Figure 2: The scatter plot of expected and measured values

Fig. 3 presents the correlation between expected and actual CBR values obtained using the three different classes of hybrid schemes. The graphs have been divided into two distinct parts: model training and model validation. Among them, SVAF, representing the union of SVR and the AFT algorithm, generates the closest agreement between the gauged CBR values and the expected output for the testing and training data sets. By contrast, the least favorable agreement appears quite clearly in the union of SVR and AOSMA, SVSM.

Figure 3: The comparison line-symbol plot between expected and gauged CBR values

Fig. 4 presents the deviations between the gauged and estimated values through the three hybrid schemes regarding the California Bearing Ratio. This figure indicates that the greatest error for SVSM when assessed is around 18%, whereas for schemes undergoing training it was 12% in the same set. The figure also shows that, for the highest- and lowest-performing schemes, the majority of errors fall in a range of (−3, 3)% for SVAF and (−6, 17)% for SVSM.

Figure 4: The error distribution of the schemes over samples, shown in a time-series plot

The errors in the observed CBR values for the three different hybrid scheme types—SVAF, SVSM, and SVDO—are displayed in Fig. 5. Based on this figure, the maximum errors are about 11% and 7% for SVSM during training and testing of the schemes, respectively. The figure reflects that 25–75% of the errors are distributed in a range of less than (−1, 1)% for SVAF and (−3, 3)% for SVSM, the best and worst schemes, respectively.

Figure 5: The standard half-box plot showing the error ratio of the hybrid schemes created

To enhance the statistical robustness of the proposed models, 95% confidence intervals for the R² values were calculated based on multiple independent runs of each algorithm.
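That interval computation, together with the one-way ANOVA F statistic used in the later sensitivity analysis, can be sketched in a few lines. The snippet below is a hedged illustration: it simulates 30 per-run R² values per model around the midpoints of the reported intervals (the 0.02 spread is an assumption; the paper's actual run logs are not available), builds a normal-approximation 95% interval, and computes the F statistic by hand. With SciPy installed, scipy.stats.f_oneway would additionally return a p-value.

```python
import math, random

def ci95(values):
    """Normal-approximation 95% CI for the mean of per-run R2 values.
    (With n = 30 runs, a t-based interval would be slightly wider.)"""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half = 1.96 * sd / math.sqrt(n)          # z-based half-width
    return mean - half, mean + half

def one_way_anova_f(groups):
    """One-way ANOVA F statistic comparing per-run R2 across models."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Simulated stand-in runs, centered on the midpoints of the reported
# intervals (NOT the paper's actual 30-run samples).
rng = random.Random(0)
centers = {"SVR": 0.697, "SVDO": 0.771, "SVSM": 0.725, "SVAF": 0.766}
runs = {m: [c + rng.gauss(0, 0.02) for _ in range(30)]
        for m, c in centers.items()}
intervals = {m: ci95(v) for m, v in runs.items()}
f_stat = one_way_anova_f(list(runs.values()))
```

A large F statistic relative to the F(3, 116) critical value indicates that the between-model differences in R² are statistically significant, which is the claim the sensitivity analysis makes for SVAF.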
The SVR-Dingo Optimization models, 95% confidence intervals for the R² values were Algorithm model also performed well, with a confidence calculated based on multiple independent runs of each interval of 0.7120 to 0.8298, slightly broader but with the algorithm. As shown in Table 4, the standard SVR model highest upper bound. Meanwhile, the SVR-AOSMA has the widest interval, from 0.6302 to 0.7631, indicating model shows an interval between 0.6653 and 0.7848, greater variability and less predictive stability. In contrast, ranking it between the other hybrids in stability and the three hybrid SVR models display narrower intervals performance. These intervals confirm that the SVAF with higher upper bounds, signifying more consistent model not only offers high prediction accuracy but also performance. Among these, the SVR model combined delivers consistent results, making it the most reliable with the Alibaba and Forty Thieves algorithms (SVAF) model among those tested for CBR estimation. achieved the most favorable confidence interval, from 0.7243 to 0.8078, reflecting both high accuracy and 264 Informatica 49 (2025) 235–248 Y. Lan et al. indicating relatively limited predictive power. In contrast, Table 4: Confidence intervals based on R2 the SVR models enhanced with metaheuristic algorithms demonstrate superior performance. Among these, the Lower Upper Model SVR-Dingo Optimization Algorithm model shows a Bound Bound SVR 0.6303 0.7632 confidence interval between 0.712 and 0.830, reflecting SVR + Dingo Optimization substantial improvement over the baseline. Similarly, the 0.7120 0.8298 Algorithm SVR-Adaptive Opposition Slime Mould Algorithm model SVR + Adaptive Opposition yields a confidence range of 0.665 to 0.785, suggesting 0.6653 0.7848 Slime Mould Algorithm better stability and generalization. 
Notably, the SVR- SVR + Alibaba and the Forty Alibaba and the Forty Thieves (SVAF) model achieves the 0.7243 0.8078 Thieves highest lower bound (0.724) and an upper bound of 0.808, indicating both high precision and consistent 4 Sensitivity analysis performance. The limited overlap between the confidence intervals of the SVAF model and those of the other models The ANOVA-based sensitivity analysis conducted on the supports the claim of its statistically significant performance of different predictive models for estimating superiority. This distinction highlights the effectiveness of the California Bearing Ratio (CBR) reveals statistically the AFT optimizer in enhancing SVR’s learning capability significant differences among the models. The confidence and minimizing prediction errors. Overall, the results of intervals for the coefficient of determination (R²) provide the ANOVA test confirm that metaheuristic-optimized insight into each model's accuracy and robustness. The SVR models, particularly SVAF, provide more accurate baseline SVR model exhibits the lowest performance with and reliable predictions of CBR values compared to the a confidence interval ranging from 0.630 to 0.763, standard SVR approach. Table 5: Sensitivity analysis based on ANOVA Models lower upper SVR 0.630 0.763 SVR-Dingo Optimization Algorithm 0.712 0.830 SVR-Adaptive Opposition Slime Mould Algorithm 0.665 0.785 SVR-Alibaba and the Forty Thieves 0.724 0.808 inspired social hunting behaviors, facilitating effective 5 Discussion neighborhood search. However, its slower convergence during exploitation may limit its ability to finely tune SVR This section compares the three hybrid models—SVAF hyperparameters, especially in high- dimensional spaces. (SVR + AFT), SVSM (SVR + AOSMA), and SVDO Regarding computational efficiency, SVAF requires (SVR + DOA)—focusing on their predictive accuracy, slightly more training time than SVSM and SVDO due to convergence behavior, and computational efficiency. 
As multiple adaptive conditions and surveillance cycles in shown in Table 2, SVAF outperforms the others across all AFT, but its superior accuracy justifies this. SVSM offers five metrics: R ², RMSE, MSE, RSR, and WAPE. During faster runtimes but less predictive precision. SVDO falls testing, SVAF achieved the highest R ² (0.0.9968) and the between the two in terms of performance and lowest RMSE (0.7946), indicating excellent computational demand. Overall, findings suggest that generalization and minimal error in estimating CBR SVAF provides the best balance between accuracy and values. This success stems from the adaptive balance optimization quality, making it a strong candidate for between exploration and exploitation in the Alibaba and practical CBR prediction tasks. Future research could Forty Thieves (AFT) optimization strategy, which explore combining AOSMA 's rapid convergence with enhances SVR' s ability to find optimal hyperparameters. AFT 's stability to improve training efficiency without The random surveillance mechanism in AFT promotes sacrificing accuracy. Future research will aim to improve global search, while Marjaneh's intelligence adjustment the models' applicability across various regions by testing enhances local refinement, enabling rapid convergence them on datasets with diverse soil types. Combining toward optimal SVR settings. In contrast, the SVSM Support Vector Regression with deep learning—for model, which employs the Adaptive Opposition Slime example, as a post-processing tool after deep feature Mold Algorithm, showed weaker performance (R² = extraction—could boost prediction accuracy, particularly 0.9825, RMSE = 1.5824 during testing). Although for large or complex datasets. 
Another valuable approach AOSMA incorporates opposition-based learning to boost is integrating these hybrid AI models into geotechnical exploration, it can produce more oscillatory convergence software platforms, allowing real-time, data-driven patterns, possibly leading to suboptimal SVR tuning. Its decision-making in engineering and construction projects. complex adaptive threshold settings may also increase Although the hybrid SVR models presented sensitivity to initial parameters. The SVDO model (SVR demonstrated strong predictive performance on the + Dingo Optimization Algorithm) performed moderately available dataset, there are some limitations to consider. (R ² = 0.992, RMSE = 1.171). DOA utilizes biologically Firstly, without an external validation set, the Metaheuristic-Enhanced SVR Models for California Bearing Ratio… Informatica 49 (2025) 249–268 265 generalizability of the results may be restricted beyond the Acknowledgements current data. Secondly, the relatively small sample size increases the risk of overfitting, especially with the use of We wish to state that no individuals or organizations metaheuristic optimization. Additionally, the dataset only require acknowledgment for their contributions to this encompasses a limited range of soil types and regions, investigation. which could limit the models' broader applicability. It is also important to note that larger, more diverse datasets Authorship contribution statement might benefit from alternative modeling techniques such Na Feng: Writing-Original initial drafting, as deep learning or ensemble methods to achieve better Conceptualization, Supervision, Project administration. predictive accuracy. These limitations will be addressed in Yulin Lan: Methodology, Software future research to improve the model's robustness and Zhisheng Yang: Formal Analysis, Language Reviw generalizability. 
To enhance model robustness, we plan to The authors declare that there is no conflict of interest use regularization like L1/L2 penalties and early stopping regarding the publication of this paper. to prevent overfitting. Models will be tested under various conditions—smaller datasets and more noise—to check resilience. Including confidence intervals or error margins Author statement for metrics like RMSE and R² will better measure The manuscript has been read and approved by all the uncertainty. These steps will help create more reliable, authors, the requirements for authorship, as stated earlier generalizable models for geotechnical uses. in this document, have been met, and each author believes that the manuscript represents honest work. 6 Conclusion The current investigation has adopted an SVR scheme to Funding project the CBR value of soil. Although the outcomes of This investigation was not funded by any specific grant the conventional method were effective, it had some from public, commercial, or charitable funding bodies. limitations. The laboratory process is costly and is not considered to be time-effective. The drawbacks above can Ethical approval be overcome by substituting the software-based approach with artificial intelligence. The accuracy of the system in The paper has attained ethical approval from the predicting the CBR was quite remarkable. The input institutional review board, ensuring the protection of variables were selected to forecast the target parameter, participants' rights and compliance with the relevant which was depicted as CBR. Five different performance ethical guidelines. metrics were utilized to appraise the precision delivered by the schemes under consideration. These included R2, References RMSE, MSE, RSR, and WAPE. Three distinct meta- [1] A. Chegenizadeh and H. 
Nikraz, “CBR test on heuristic optimization approaches—the Dingo reinforced clay,” in Proceedings of the 14th Pan- Optimization Algorithm, Alibaba, the Forty Thieves American Conference on Soil Mechanics and Optimization algorithm, and the Adaptive Opposition Geotechnical Engineering (PCSMGE), the 64th Slime Mold Algorithm—have been examined in the Canadian Geotechnical Conference (CGC), current study to increase the system's functional Canadian Geotechnical Society, 2011. efficiency. The conclusions below may be drawn from the [2] T. F. Kurnaz and Y. Kaya, “Prediction of the analysis's outcome: California bearing ratio (CBR) of compacted soils • The thorough analysis of the pertinent characteristics by using GMDH-type neural network,” The was the foundation for developing the projection European Physical Journal Plus, EPJ Plus, vol. schemes to estimate CBR. A comparison between the 134, Jul. 2019. experimental outcomes and those obtained utilizing https://doi.org/10.1140/epjp/i2019-12692-0. the suggested schemes showed that the latter's CBR [3] H. B. Seed and P. De Alba, “Use of SPT and CPT prediction accuracy was significantly high. tests for evaluating the liquefaction resistance of • In the current research, the test phase has shown that sands,” in Use of in situ tests in geotechnical the forecast data's scattering value increased by 0.39, engineering, ASCE, 1986, pp. 281–302. 0.59, and 0.69 for SVAF, SVSM, and SVDO, [4] M. Gams and T. Kolenik, “Relations between respectively, from the training phase. electronics, artificial intelligence and information • The California Bearing Ratio outcomes presented in society through information society rules,” this investigation indicate a significant discrepancy Electronics (Basel), MDPI, vol. 10, no. 4, p. 514, between the observed and projected values, with an 2021. average underestimate of almost 1.24 for the https://doi.org/10.3390/electronics10040514. suggested schemes. With a value of 1.6271, the [5] R. W. 
Day, Soil testing manual. McGraw-Hill, RMSE displayed its maximum error in the scheme's 2001. SVSM in the training phase. The SVAF had the [6] M. M. E. Zumrawi, “Prediction of CBR Value lowest error rate in the testing session, with a rating from Index Properties of Cohesive Soils.,” of 0.7946. 266 Informatica 49 (2025) 235–248 Y. Lan et al. University of Khartoum Engineering Journal, vol. Model, Elsevier, vol. 36, no. 9, pp. 4096–4105, 2, no. ENGINEERING, 2012. 2012. https://doi.org/10.1016/j.apm.2011.11.039. [7] W. P. M. Black, “A method of estimating the [19] M. A. Shahin, M. B. Jaksa, and H. R. Maier, California bearing ratio of cohesive soils from “Artificial neural network applications in plasticity data,” Geotechnique, ICE Virtual geotechnical engineering,” Australian Library, vol. 12, no. 4, pp. 271–282, 1962. geomechanics, vol. 36, no. 1, pp. 49–62, 2001. https://doi.org/10.1680/geot.1962.12.4.271. [20] J. A. Abdalla, M. F. Attom, and R. Hawileh, [8] K. B. Agarwal and K. D. Ghanekar, “Prediction of “Prediction of minimum factor of safety against CBR from plasticity characteristics of soil,” in slope failure in clayey soils using artificial neural Proceeding of 2nd South-east Asian Conference network,” Environ Earth Sci, Springer, vol. 73, on Soil Engineering, Singapore. June, 1970, pp. pp. 5463–5477, 2015. 11–15. https://doi.org/10.1007/s12665-014-3800-x. [9] M. Linveh, “Validation of correlations between a [21] B. Yildirim and O. Gunaydin, “Estimation of number of penetration test and in situ California California bearing ratio by using soft computing bearing ratio test,” Transp Res Rec, vol. 1219, pp. systems,” Expert Syst Appl, Elsevier, vol. 38, no. 56–67, 1989. 5, pp. 6381–6391, 2011. [10] D. J. Stephens, “The prediction of the California https://doi.org/10.1016/j.eswa.2010.12.054. bearing ratio,” Civil Engineering= Siviele [22] Tja. Taskiran, “Prediction of California bearing Ingenieurswese, Sabnet, vol. 1990, no. 12, pp. 
ratio (CBR) of fine grained soils by AI methods,” 523–528, 1990. Advances in Engineering Software, Elsevier, vol. https://hdl.handle.net/10520/AJA10212019_1435 41, no. 6, pp. 886–892, 2010. 6. https://doi.org/10.1016/j.advengsoft.2010.01.003. [11] T. Al-Refeai and A. Al-Suhaibani, “Prediction of [23] S. Bhatt, P. K. Jain, and M. Pradesh, “Prediction CBR using dynamic cone penetrometer,” Journal of California bearing ratio of soils using artificial of King Saud University-Engineering Sciences, neural network,” Am. Int. J. Res. Sci. Technol. Elsevier, vol. 9, no. 2, pp. 191–203, 1997. Eng. Math, vol. 8, no. 2, pp. 156–161, 2014. https://doi.org/10.1016/S1018-3639(18)30676-7. [24] T. Q. Ngo, L. Q. Nguyen, and V. Q. Tran, “Novel [12] M. W. Kin, “California bearing ratio correlation hybrid machine learning models including support with soil index properties,” Master degree vector machine with meta-heuristic algorithms in Project, Faculty of Civil Engineering, University predicting unconfined compressive strength of Technology Malaysia, 2006. organic soils stabilised with cement and lime,” https://eprints.utm.my/4064/1/MakWaiKinMFK International Journal of Pavement Engineering, A2006.pdf. Taylor & Francis, vol. 24, no. 2, p. 2136374, 2023. [13] S. R. CNV and K. Pavani, “MECHANICALLY https://doi.org/10.1080/10298436.2022.2136374. STABILISED SOILS-REGRESSION [25] X. Wu, F. Lu, and T. He, “Exploring the potential EQUATION FOR CBR EVALUATION,” 2006. of machine learning in predicting soil California [14] P. Vinod and C. Reena, “Prediction of CBR value bearing ratio values,” Periodica Polytechnica of lateritic soils using liquid limit and gradation Civil Engineering, vol. 69, no. 2, pp. 551–566, characteristics data,” Highway Research Journal, 2025. https://doi.org/10.3311/PPci.38678. IRC, vol. 1, no. 1, pp. 89–98, 2008. [26] V. Bherde, L. Kudlur Mallikarjunappa, R. [15] R. S. Patel and M. D. Desai, “CBR predicted by Baadiga, and U. 
Balunaini, “Application of index properties for alluvial soils of South machine-learning algorithms for predicting Gujarat,” in Proceedings of the Indian California bearing ratio of soil,” Journal of geotechnical conference, Mumbai, 2010, pp. 79– Transportation Engineering, Part B: Pavements, 82. ASCE Library, vol. 149, no. 4, p. 4023024, 2023. [16] G. Ramasubbarao and S. G. Sankar, “Predicting https://doi.org/10.1061/JPEODX.PVENG-1290. soaked CBR value of fine grained soils using [27] J. Ma et al., “A comprehensive comparison among index and compaction characteristics,” Jordan metaheuristics (MHs) for geohazard modeling Journal of Civil Engineering, vol. 7, no. 3, pp. using machine learning: Insights from a case study 354–360, 2013. of landslide displacement prediction,” [17] M. Alawi and M. Rajab, “Prediction of California Engineering Applications of Artificial bearing ratio of subbase layer using multiple Intelligence, Elsevier, vol. 114, p. 105150, 2022. linear regression models,” Road Materials and https://doi.org/10.1016/j.engappai.2022.105150. Pavement Design, Taylor & Francis, vol. 14, no. [28] V. N. Vapnik, “The nature of statistical learning,” 1, pp. 211–219, 2013. Theory, 1995. https://doi.org/10.1080/14680629.2012.757557. [29] V. Vapnik, “Statistical Learning Theory. New [18] H. Ghanadzadeh, M. Ganji, and S. Fallahi, York: John Willey & Sons,” Inc, 1998. “Mathematical model of liquid–liquid equilibrium [30] M. K. Naik, R. Panda, and A. Abraham, for a ternary system using the GMDH-type neural “Adaptive opposition slime mould algorithm,” network and genetic algorithm,” Appl Math Soft comput, Springer, vol. 25, no. 22, pp. 14297– Metaheuristic-Enhanced SVR Models for California Bearing Ratio… Informatica 49 (2025) 249–268 267 14313, 2021. https://doi.org/10.1007/s00500-021- 06140-2. [31] S. Li, H. Chen, M. Wang, A. A. Heidari, and S. 
Mirjalili, “Slime mould algorithm: A new method for stochastic optimization,” Future Generation Computer Systems, Elsevier, vol. 111, pp. 300– 323, 2020. https://doi.org/10.1016/j.future.2020.03.055. [32] H. R. Tizhoosh, “Opposition-based learning: a new scheme for machine intelligence,” in International conference on computational intelligence for modelling, control and automation and international conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06), Vienna, Austria, IEEE, 2005, pp. 695–701. https://doi.org/10.1109/CIMCA.2005.1631345. [33] M. Braik, M. H. Ryalat, and H. Al-Zoubi, “A novel meta-heuristic algorithm for solving numerical optimization problems: Ali Baba and the forty thieves,” Neural Comput Appl, Springer, vol. 34, no. 1, pp. 409–455, 2022. https://doi.org/10.1007/s00521-021-06392-x. [34] A. K. Bairwa, S. Joshi, and D. Singh, “Dingo Optimizer: A Nature-Inspired Metaheuristic Approach for Engineering Problems,” Math Probl Eng, Wiley Online Library, vol. 2021, p. 2571863, 2021. https://doi.org/10.1155/2021/2571863. 268 Informatica 49 (2025) 235–248 Y. Lan et al. https://doi.org/10.31449/inf.v49i16.9315 Informatica 49 (2025) 269–284 269 A Cutting-Edge Bio-Inspired Computational Framework for Advanced Virtual Reality Classification through Sophisticated Predictive Methodologies Yanyan Song, Hongping Zhou* 1 School of Communication Technology, Communication University of China Nanjing, Nanjing 211172, China *Corresponding Author E-mail: zhouhongping1231@163.com Keywords: virtual reality, histogram gradient boosting classification, decision tree classification, ebola optimization search, differential squirrel search algorithm Received: May 20, 2025 Virtual reality (VR) enables the simulation of a wide variety of complex environments, from tiny biological structures to entirely imaginary worlds. 
These simulations create new possibilities for learning, training, and interaction that go beyond the limits of the physical world. However, virtual reality (VR) realizes this imaginary world, so it is not just a dream. VR works through the invocation of many of the senses. It creates realistic simulations through the creation of immersive settings that combine the real and the imagined, thereby affording special hands-on learning possibilities in a variety of subjects. This study investigates the effectiveness of combining Histogram Gradient Boosting Classification (HGBC) with Decision Tree Classification (DTC), the Ebola Optimization Search (EOS), and the Differential Squirrel Search Algorithm (DSSA) to predict VR outcomes. By integrating these advanced predictive and optimization techniques, the approach aims to enhance accuracy. Research will be conducted to ascertain the possible uses of VR, enhance user experience, and assess the impact on industries related to training, education, healthcare, and entertainment. In the evaluation phase, HGDS attained the highest accuracy of 0.967 in the test phase, making it the top-performing hybrid model, while DTEO showed the lowest accuracy of 0.907, identifying it as the weakest model. Povzetek: Članek predstavi bio-navdihnjen hibridni okvir za klasifikacijo uporabniških odzivov v virtualni resničnosti. Združuje HGBC, DTC ter optimizatorja EOS in DSSA za izboljšanje napovedne točnosti. Okvirjeva naloga je zanesljivo razvrščati VR-podatke. 1 Introduction The desktop VR enables the user to interact with the system using a mouse or other controlling device while VR simulation signifies a computer-created environment sitting in front of a desktop computer monitor, as the name where users can move around, interact with objects, and implies [3]. Immersion systems utilize a visualization interact with virtual characters, also implied as "agents" or display worn on the head of the user that completely "avatars." 
A generic virtual setting is a 3D world [1], and, occludes their field of view. Collaborative systems have like gravitation simulation, virtual environments human-controlled avatars interacting with each other, and frequently aim to be as realistic as possible in both they can be immersive or desktop-based systems. Second appearance and object behavior. It must be underlined, Life is one of the most recent and most effective nonetheless, that there need to be no parallels between this collaboration systems [4]. An attempt is also being made virtual environment and the actual world. One of the to use the collaborative systems for exploration. The advantages of virtual environments is their ability to mixed reality systems merge computer-generated matter replicate completely unrealistic scenarios [2]. Virtual with the real environment, which is viewed directly or environments, however, provide a safe space to test through a camera. This system can teach engineering and scenarios that would be too dangerous or difficult to medical skills to students, which are thought to be perform in real life, and they imitate the setting where the impossible by this recently invented system [5]. student will eventually work. Learning by humans requires interaction with the There are other ways of deploying VR; four typical environment, taking in information provided by the use of configurations are included below: senses and experience [6]. Through computer simulation, ✓ Desktop VR (Monoscopic or Stereoscopic) VR takes the role of real-world sensory input. Reacting to ✓ Immersive VR (HMD, CAVE, widescreen) motion and common human behaviors in the actual world ✓ Collaborative Systems offers interaction. Therefore, VR can be useful in ✓ Mixed or Augmented Reality education since it allows pupils to experience a situation 270 Informatica 49 (2025) 269–284 Y. Song et al. or scenario firsthand rather than only imagining it [7]. 
The components of the virtual world in a database, is called the three main components that define the quality of VR virtual environment module. The physics engine is one of experiences are immersion, interaction, and multisensory the major parts of any realistic simulation. A physics feedback. Immersion is being engulfed or enclosed by the engine comprises a set of rules that control the motion and surroundings [8]. One of the advantages of immersion is interaction of dynamic objects in a virtual scene. A typical that it ensures a feeling of presence or the perception that physics engine can include a Newtonian mechanics one is actually in the world being displayed [9]. simulation and collision detection, which describes when Interactivity means the capability of the user's body two objects collide. They apply gravitational, friction, and movements to affect the events happening in the impulse effects using physical rules. When two things hit simulation and, in turn, provoke a reaction from the each other, the latter effect is important [20], [21]. When simulation [10], [11]. two active entities collide, collision detection is necessary. The multisensory nature of VR allows information to The physics engine determines their terminal velocity be derived from several senses, which further enhances the using their simulated traits, such as mass, substance, and experience in that this makes it both more engaging and speed. more convincing—increasing, as it does, the sensation of presence because this provides redundancy of 1.1 Related works information, which diminishes the likelihood of Normally, state-of-the-art reports that focus on specific misunderstanding. Information from multiple sensory aspects of the discipline or on specific application fields entries is reinforced by a sensory combination [12], [13]. are available. 
They would mostly provide taxonomies that systematically illustrate and classify the various methodologies involved.
➢ Dachselt and Hübner [22] examined the menus for AR and VR environments across the whole MR domain [23] and also presented an extensive taxonomy.
➢ A taxonomy of NVEs, taking into consideration distribution and communication topologies, has been provided by Macedonia and Zyda [24]. Mania and Chalmers [25] have presented a taxonomy of platforms and communication.
➢ Bowman has provided several taxonomies for both interaction methods [26] and navigation methods [27]. Mine's early research [28] identifies the essential navigation and interaction in virtual spaces.
➢ Gabbard [29] provides good generalized overviews, presents suggested best practices in application design, and provides guides for conducting user evaluations.

VR enables the user to act as though they are in the actual world by substituting a virtual environment for the current one. A constructivist learning approach benefits from VR's immersive features [14]. The premise of constructivism, a theory of knowledge acquisition, is that people build knowledge by drawing conclusions from their past experiences. The idea, as propounded by Jean Piaget, assumes that learners try to fit new experiences into the world picture that they have developed earlier. Learners change their worldview to fit the new experience when they cannot assimilate new information into their existing system effectively. Learning comes from experiences where actions are based on assumptions about how the world functions, only to find that the world does not align with those assumptions [15], [16], [17]. Adjusting the mental model of how the world functions then becomes necessary to account for the new experience. In this view, learning is an active process of testing hypotheses; this concept contrasts with the notion of learning as something passive in nature, the mere acquisition or assimilation of data.
➢ Livatino and Koeffel have also presented guidelines for Virtual Environments (VEs) assessment [30].
➢ The current tracking technology is overviewed by Welch and Foxlin [31], who also compare and contrast the respective merits and disadvantages of each.

Recent work has explored innovative methods for classifying virtual reality (VR) using bio-inspired computational models. Song and DiPaola [32] introduced a bio-responsive VR system based on physiological data to enhance immersion. Zayed and Reda [33] demonstrated that applying neurophysiological biosignals combined with deep learning could classify cognitive states in VR with 97% accuracy. Similarly, Arslan et al. [34], [35], [36], [37], [38] employed emotion classification from biosignals and machine learning in VR, achieving 97.78% accuracy. These advancements are significant in areas such as rehabilitation, education, and psychotherapy.

VR is a powerful learning tool because it provides a context where such hypothesis testing can occur. According to [18], students who interact with new material are more likely to store and recall it.

Control software is at the heart of this system. It regulates the exchange of information between the virtual world and the interface layer in response to user actions, updating the world appropriately. It also determines when the scene should be shown on display devices such as the haptic and visual interfaces. With the help of additional tools, the control software can connect to the outside world through the internet, which might be an essential capability in systems involving collaboration or many users. The virtual environment module includes a model of real-world entities and the virtual world model. It includes state and position information apart from appearance. The entities could be dynamic objects, such as moving objects or even avatars, or they could be static objects.
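As an illustration of the modules just described, a minimal sketch of a virtual environment module that stores static and dynamic entities and refreshes the dynamic ones each tick. The class and field names here are assumptions made for illustration, not structures from any system discussed in the paper:

```python
import dataclasses

@dataclasses.dataclass
class Entity:
    """One object in the virtual world: appearance plus state and position."""
    name: str
    position: tuple                    # (x, y, z)
    velocity: tuple = (0.0, 0.0, 0.0)
    dynamic: bool = False              # avatars/moving objects vs. static scenery

class VirtualEnvironmentModule:
    """Stores all entities and refreshes the dynamic ones at regular intervals."""
    def __init__(self):
        self.entities = {}

    def add(self, entity):
        self.entities[entity.name] = entity

    def refresh(self, dt):
        """Advance every dynamic entity by its velocity; static ones stay put."""
        for e in self.entities.values():
            if e.dynamic:
                e.position = tuple(p + v * dt
                                   for p, v in zip(e.position, e.velocity))

world = VirtualEnvironmentModule()
world.add(Entity("tree", (0.0, 0.0, 0.0)))                           # static object
world.add(Entity("avatar", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), True))  # dynamic object
world.refresh(dt=0.5)  # avatar advances along x; tree is unchanged
```

A real engine would also run collision detection and impulse resolution in the same refresh step, as the physics-engine description above outlines.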
This model of the virtual environment needs to be refreshed at regular intervals to account for dynamic objects [19]. The module for the virtual environment stores the positions, shapes, and other attributes of all the components of the virtual world in a database.

A Cutting-Edge Bio-Inspired Computational Framework for Advanced… Informatica 49 (2025) 269–284 271

VanHorn and Çobanoğlu [35] also developed a biomedical image classification system within a VR-based environment, making AI more accessible to experts. Overall, these studies emphasize how biosignals, machine learning, and VR can be integrated to develop advanced predictive models, showcasing the potential of bio-inspired computational models to improve VR classification techniques.

1.2 The study's objective

This work examines the possible contribution that VR technology can make to enhancing learning outcomes and increasing student engagement in schools. In data classification, this study applies an HGBC model and a DTC model. The performance of the schemes is optimized by using methods such as EOS and DSSA. This research explores the integration of VR within diverse disciplines of study to understand how it can facilitate the retention of both theoretical and practical knowledge.

The immersion level indicates users' subjective engagement in VR, and its variability supports the creation of effective predictive models. This dataset attempts to contribute to the development of VR through the analysis of user experiences. An attempt has been made in this study to develop a better VR design, with improvements in user comfort and customization, by understanding the physical and emotional reactions of users in diverse VR situations. This information allows developers to improve VR systems and to create personalized experiences that enhance users' delight and immersion. Fig. 1 presents a contour plot of the correlation of the features.

User ID: This variable identifies every participant who experienced VR. Each user is assigned a unique ID so that their data in the dataset can be differentiated.
Given the immersion one experiences in a VR environment, VR can support the retention of both theoretical knowledge and practical competencies of learners. Possible drawbacks and limitations, including accessibility of resources, shall also be discussed to present a comprehensive overview of what can be expected from this educational technology.

2 Materials and methodology

2.1 Data gathering

A set of users' experiences in VR settings provides the dataset. The information covers user preferences, emotional moods, and physiological reactions such as skin conductance and heart rate. This study's dataset includes 1000 samples, each representing a user's VR session. Recorded features encompass User ID (173 unique values), Age (66), Gender (147), VR Headset Type (61), Session Duration (137), Motion Sickness severity (56), and Immersion Level (55). These variables cover both demographic and behavioral data, forming a comprehensive basis for analysis.

Age: This variable stores the age of the subject participating in VR exposure. For example, this could be an integer representing the user's age at the time of using the VR.

Gender: This variable displays the user's gender. The categories "Male," "Female," and "Other" can be utilized to define the user's gender identity.

VR Headset Type: This variable specifies the form of VR headset that a user is utilizing in a VR experience. Examples include Oculus Rift, HTC Vive, and PlayStation VR, among others.

Duration: This variable shows the time spent in the VR experience in minutes. It reflects how much time was spent by the participant in the VR setup.

Motion Sickness Rating: This displays the user's self-reported motion sickness during the VR experience, on an ascending scale between 1 and 10, where higher numbers indicate a higher degree of motion sickness.

Dependent variable: The degree to which a user experiences being inside the virtual environment, i.e., the user's immersion level.
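As a concrete illustration of the record schema just described, a minimal sketch of one session record and a one-hot encoding of its categorical fields. The field names and values below are invented for illustration and are not taken from the actual dataset:

```python
# Hypothetical single record following the feature list above; the values
# are invented for illustration, not drawn from the study's data.
record = {
    "UserID": 101,
    "Age": 29,
    "Gender": "Female",              # one of "Male", "Female", "Other"
    "VRHeadsetType": "HTC Vive",     # e.g. Oculus Rift, HTC Vive, PlayStation VR
    "Duration": 35,                  # minutes spent in the session
    "MotionSicknessRating": 4,       # self-reported, 1-10 scale
    "ImmersionLevel": 5,             # target variable, 1-5 scale
}

GENDERS = ["Male", "Female", "Other"]
HEADSETS = ["Oculus Rift", "HTC Vive", "PlayStation VR"]

def encode(rec):
    """Turn one session record into a numeric feature vector:
    raw values for the numeric fields, one-hot for the categorical ones."""
    return (
        [float(rec["Age"]), float(rec["Duration"]),
         float(rec["MotionSicknessRating"])]
        + [1.0 if rec["Gender"] == g else 0.0 for g in GENDERS]
        + [1.0 if rec["VRHeadsetType"] == h else 0.0 for h in HEADSETS]
    )

features = encode(record)  # 3 numeric values + 3 + 3 one-hot entries = 9 values
```

The target column (ImmersionLevel) is kept separate from the feature vector during training.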
The Immersion Level serves as the target variable, indicating users' subjective engagement in VR. It quantifies the degree to which the user feels present inside the virtual environment, with a rating between 1 and 5, where 5 stands for the maximum level.

Figure 1: The contour plot with color fill illustrates the relationship between input and output variables

Before deploying advanced computational models, it is essential to understand several challenges in VR classification systems. These encompass the significant variability in user responses driven by individual physiological and psychological differences, noise in biometric data such as heart rate and skin conductance, and class imbalance across the different immersion levels. The subjective nature of immersion also complicates labeling and impacts the consistency of the ground truth. These factors result in a complex, high-dimensional feature space where traditional classifiers often face difficulties with generalization and robustness. Consequently, adopting adaptive hybrid machine learning approaches, supported by powerful metaheuristic optimization techniques, is crucial for effective classification in VR.

The scheme was tuned through several iterations of hyperparameter tweaking. The implementation of HGB from scikit-learn 0.21.3 was used from the Python ML module [40].

2.3 Decision tree classification (DTC)

In a DT, every internal node displays a characteristic, each branch is a decision rule, and each leaf node is the outcome [41]. The root node signifies the topmost node in a DT. To achieve the best discrimination among classes or results, it learns to split based on the value of an attribute. Different schemes have different splitting criteria; for example, the metrics used by schemes like ID3, C4.5, and CART include entropy, gain ratio, and Gini impurity, respectively. The problem at hand is to find the characteristic at every level that
offers the optimum split in a DT, thereby assisting optimum decision-making [42]. The concept can be understood mathematically by using the DT split based on entropy. The entropy H(D) of a dataset D is calculated as:

H(D) = − Σᵢ₌₁ᵐ pᵢ log₂ pᵢ   (1)

2.2 Histogram gradient boosting classification (HGBC)

The HGB approach is another variant of the popular GB [39] technique used to resolve diverse classification- and regression-oriented machine learning (ML) problems. These schemes, to which AdaBoost also belongs, primarily try to turn weak learners into strong ones; they come under the category of schemes called boosting schemes. Boosting techniques keep adding and teaching new weak learners successively so that each new weak learner corrects, and avoids, the mistakes made by its forerunners. The most common weak learners used are DTs. This led to the development of the HGB algorithm, a boosting methodology that overcame one of the major weaknesses of the GB algorithm: its very long training time on large datasets. To circumvent this problem, the continuous input variables are discretized, or binned, into a few hundred distinct values.

2.4 Ebola optimization search (EOS)

Driven by the diffusion of the Ebola virus, EOS is a metaheuristic scheme [43]. The EOSA scheme is based on an enhanced SIR scheme of the sickness. Its S, E, I, R, H, V, Q, and D compartments represent the Susceptible (S), Exposed (E), Infected (I), Hospitalized (H), Recovered (R), Vaccinated (V), Quarantine (Q), and Death (D) states, respectively. Because of these compartments, the composition provides for the construction of a search domain that best displays the combinations of weights and biases that may be required by a CNN. After representation, SIR is displayed by a mathematical scheme utilizing a system of first-order differential equations.
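The entropy criterion of Eq. (1), and the resulting choice of split in the DTC, can be sketched in a few lines. This is an illustrative stand-alone example, not the paper's implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """H(D) = -sum_i p_i * log2(p_i), as in Eq. (1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Reduction in entropy achieved by splitting D into two branches;
    the attribute with the largest gain offers the optimum split."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# A pure split separates the classes completely, so it recovers all of
# the parent's entropy as gain.
parent = [0, 0, 1, 1]
gain = information_gain(parent, [0, 0], [1, 1])  # -> 1.0
```

At each internal node, a decision tree evaluates candidate splits this way and keeps the one with the highest gain (or, for CART, the lowest Gini impurity).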
The new metaheuristic scheme was then developed by combining this mathematical scheme with the propagation scheme, and the resulting scheme was later deployed in the design of EOSA-CNN for experimentation. The mathematical schemes are as follows:

mIᵢᵗ⁺¹ = mIᵢᵗ + ρ·M(I)   (2)
∂S(t)/∂t = π − (β₁I + β₃D + β₄R + β₂(PE)η)S − (τS + ΓI)   (3)
∂I(t)/∂t = (β₁I + β₃D + β₄R + β₂(PE)λ)S − (Γ + γ)I − τS   (4)
∂H(t)/∂t = αI − (γ + ϖ)H   (5)
∂R(t)/∂t = γI − ΓR   (6)
∂V(t)/∂t = γI − (μ + ϑ)V   (7)
∂D(t)/∂t = (τS + ΓI) − δD   (8)
∂Q(t)/∂t = (πI − (γR + ΓD)) − ξQ   (9)

In Eq. (2), mIᵢᵗ⁺¹ and mIᵢᵗ denote the new and old positions of individual i at times t + 1 and t, respectively, and ρ is the displacement scale factor of an individual.

The EOSA metaheuristic proceeds in the following steps:
❖ Define initial values for all vector and scalar quantities, that is, individuals and parameters, respectively: the numbers of hospitalized (H), vaccinated (V), susceptible (S), infected (I), recovered (R), dead (D), and quarantined (Q) individuals.
❖ Create the index case I₁ at random among the vulnerable individuals.
❖ Calculate the fitness value of the index case and set it as the current and global best.
❖ While there is at least one infected individual and the number of iterations has not been reached:
a) For every vulnerable individual, a position is created and altered according to its movement; a short displacement characterizes exploitation, otherwise the move characterizes exploration. Note that the farther an infected case is displaced, the more infections there are.
b) Using (a), generate the newly infected individuals nI.
c) Create the new individuals and add the new instances to I.
d) From the size of I, calculate how many individuals are added to H, D, R, V, and Q at their respective rates.
e) Utilizing the new I, refine S and I.
f) Compare the current best solution with the global best.
g) If the termination condition is not reached, go back to step 4.
❖ Return all solutions and the global best solution.

For the HGB scheme of Section 2.2, the learning rate (LR) is the most important hyperparameter, and much attention was paid to its optimization.
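The EOSA steps can be sketched as a heavily simplified, self-contained loop. This is a toy minimization of a sphere function: the displacement choices, retention rule, and pool sizes are invented for illustration, and most compartments are omitted:

```python
import random

def eosa_minimize(fitness, dim, pop_size=20, iters=50, seed=1):
    """Drastically simplified EOSA-style loop following the listed steps:
    initialize susceptible individuals, seed one infected index case,
    displace infected individuals (short moves = exploitation, long
    moves = exploration), and keep track of the global best solution."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)       # index case, current and global best
    infected = [best]
    for _ in range(iters):
        new_infected = []
        for ind in infected:
            rho = rng.choice([0.1, 1.0])  # short vs. long displacement
            cand = [x + rho * rng.gauss(0, 1) for x in ind]
            new_infected.append(cand)
            if fitness(cand) < fitness(best):
                best = cand               # update the global best
        # recovered/quarantined individuals leave the infected pool:
        # keep only the fittest half of the population size
        infected = sorted(new_infected + infected, key=fitness)[:pop_size // 2]
    return best

sphere = lambda v: sum(x * x for x in v)
best = eosa_minimize(sphere, dim=3)
```

In the paper's setting, the fitness function would instead score a candidate hyperparameter configuration of the HGBC or DTC model.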
In the final loop steps, the numbers of individuals added to H, D, R, V, and Q are calculated from the size of I at their respective rates; S and I are refined using the new I; the current best is compared with the global best; and the loop repeats until the termination condition is met, after which all solutions and the global best solution are returned. The quantities updated are the Hospitalized (H), Vaccinated (V), Recovered (R), Infected (I), Susceptible (S), Quarantined (Q), and Dead (D) individuals. Eqs. (3) to (9) define a system of ordinary differential equations whose scalar functions can all be evaluated to floating-point values. These are computed given the initial conditions S(0) = S₀, I(0) = I₀, R(0) = R₀, D(0) = D₀, P(0) = P₀, and Q(0) = Q₀, where t is the number of iterations after which the system is evaluated; this enables us to obtain the magnitudes of the vectors S, I, H, R, V, D, and Q at time t. The design and discussion of the utilization of the enhancement issue defined in this paper are given in the following subsections.

Fig. 2 presents the flowchart of the DTC.

Figure 2: The flowchart of the DTC model

2.5 Differential squirrel search algorithm (DSSA)

DSSA, a hybrid optimizer that combines the differential evolution and squirrel search schemes, is presented in this section. In SSA, the squirrels track the positions of the other squirrels relative to the acorn or hickory trees in order to update their own positions. To improve the search strategy, the position-updating rules of the top squirrels have been changed. Combining EOS and DSSA offers complementary advantages: EOS facilitates broad exploration, while DSSA ensures precise convergence; together they enhance classification accuracy and model robustness for VR immersion prediction.

2.5.1 Initialization of position and evaluation of fitness
The squirrels are initially placed in the search area at random. Knowing the squirrels' locations allows one to calculate their fitness, obtained by substituting each position into the fitness function; it indicates how good a food supply the squirrel could find. The best squirrel PS_ht discovered so far, assigned to the hickory tree, is determined by sorting the fitness values. The squirrels in the acorn trees, PS_at(1:3), corresponding to the next three best function values, are assumed to travel in the direction of this optimal location in a subsequent iteration. The remaining squirrels, PS_nt(1:NP−4), are in the normal trees and have not yet discovered food.

To justify selecting EOS and DSSA for this classification task, it is crucial to highlight the problem's nature: the dataset involves multiple interacting features with complex, nonlinear relationships, which can cause optimization to get stuck in local optima when traditional methods are used. The EOS algorithm, inspired by epidemic modeling, employs dynamic, population-based exploration techniques that balance infection-driven diversification with recovery-focused convergence. This strategy is especially effective for tuning hyperparameters in complex models like HGBC and DTC, and its compartmental diffusion model efficiently captures multidimensional search dynamics. Meanwhile, DSSA mimics squirrel foraging behavior and utilizes crossover inspired by differential evolution (DE); the incorporation of DE-inspired crossover operations significantly enhances the exploration capability, making DSSA highly effective at fine-tuning solutions locally while maintaining overall diversity. The following is a mathematical scheme of the foraging techniques covered under the paradigm of DSSA.

2.5.2 Position update

The squirrels in an acorn tree, following the current best, PS_ht, renew their position and move in the direction of the best source when there is no predator. The squirrels of a normal tree follow the ones in an acorn or hickory tree to renew their position.
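The initialization and fitness-sorting step of Section 2.5.1 can be sketched as follows. This is illustrative code: the population size, bounds, and fitness function are assumptions, not the paper's settings:

```python
import random

def initialize_squirrels(fitness, np_=10, dim=2, low=-1.0, high=1.0, seed=0):
    """Place NP squirrels at random, sort them by fitness, and assign the
    best to the hickory tree, the next three to acorn trees, and the
    remaining NP-4 to normal trees, as described in Section 2.5.1."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(np_)]
    pop.sort(key=fitness)   # ascending: lowest (best) fitness first
    ps_ht = pop[0]          # hickory tree: best food source found so far
    ps_at = pop[1:4]        # acorn trees: next three best squirrels
    ps_nt = pop[4:]         # normal trees: squirrels that found no food yet
    return ps_ht, ps_at, ps_nt

sphere = lambda v: sum(x * x for x in v)
ht, at, nt = initialize_squirrels(sphere)
```

Each subsequent iteration re-evaluates fitness after the position updates below and re-assigns the trees accordingly.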
If a predator is present, the squirrels instead change direction randomly while foraging. The following mathematical schemes are used to update the squirrels' positions. As in Eq. (10), the position of a squirrel on an acorn tree changes based on the positions of the others:

PS_at^new = PS_at^old + d_g × G_c × (PS_ht^old − PS_at^old − P_avg), if r₁ ≥ P_dp; otherwise a random position,   (10)

where P_avg is the mean location of every squirrel in the current population. DSSA also employs the crossover mechanism of DE in a way that maximizes diversity among the squirrels while minimizing the possibility of becoming trapped in local minima. It is applied to the squirrel's current position and the new position obtained, as in Eq. (11):

PS_at,i,j^cr = PS_at,i,j^new, if (rand_j ≤ C_r) or j = j_rand; PS_at,i,j^old, if (rand_j > C_r) or j ≠ j_rand,   j = 1, 2, 3, …, D   (11)

In this context, NP denotes the population size, with i ranging over 1, 2, 3, …, NP. For acorn or normal trees, PS_at;i;j^cr indicates the updated positions of the squirrels following the crossover operation.

Some of the squirrels on normal trees move toward the positions of the acorn-tree squirrels and then relocate to their new locations:

PS_nt^new = PS_nt^old + d_g × G_c × (PS_at^old − PS_nt^old), if r₂ ≥ P_dp; otherwise a random position,   (12)

where the random number r₂ is uniformly distributed between 0 and 1. The surviving squirrels in the normal trees cling to the best move in view, and their new positions are given by:

PS_nt^new = PS_nt^old + d_g × G_c × (PS_ht^old − PS_nt^old), if r₃ ≥ P_dp; otherwise a random position.   (13)

This fine-tuning capability is critical in VR classification scenarios, where high accuracy requires careful adjustment of sensitive parameters to prevent overfitting; DSSA's ability to retain elite solutions while fostering diversity helps avoid premature convergence.
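The binomial DE crossover of Eq. (11) (and of Eq. (14) below for the typical-tree squirrels) can be sketched as a stand-alone function; C_r = 0.5 as stated in the text:

```python
import random

def de_crossover(old, new, cr=0.5, rng=random):
    """Binomial DE crossover as in Eqs. (11)/(14): take the new component
    when rand_j <= Cr or j == j_rand (the j_rand index guarantees that at
    least one new gene is inherited); otherwise keep the old component."""
    d = len(old)                 # D: dimensionality of the problem
    j_rand = rng.randrange(d)    # index drawn from [0, D)
    return [
        new[j] if (rng.random() <= cr or j == j_rand) else old[j]
        for j in range(d)
    ]

trial = de_crossover([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In DSSA the trial vector produced here competes against the old position, and the fitter of the two survives into the next generation.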
An analogous crossover procedure is also given for the typical-tree squirrels:

PS_nt,i,j^cr = PS_nt,i,j^new, if (rand_j ≤ C_r) or j = j_rand; PS_nt,i,j^old, if (rand_j > C_r) or j ≠ j_rand,   j = 1, 2, 3, …, D   (14)

PS_at;i;j^new and PS_at;i;j^old correspond to the new and previous positions of the squirrels. D refers to the dimensionality of the problem, and C_r denotes the crossover rate, which is set to 0.5. The index j_rand is randomly selected from the range [1, D], and rand_j denotes the j-th random number, uniformly generated within this range.

The convergence speed may be raised by permitting the hickory-tree squirrel to update its location in relation to the average position of the squirrels in the acorn trees:

PS_ht^new = PS_ht^old + d_g × G_c × (PS_ht^old − PS_at^avg)   (15)

In this instance, PS_at^avg denotes the average of all squirrel locations within the acorn trees. To form the next generation, the new positions and their crossover counterparts are compared with the old positions, and the better solutions are retained.

Figure 3 illustrates the flowchart of the proposed hybrid models (such as HGDS and DTEO), detailing the sequential phases of data input, model development, optimizer-centric hyperparameter optimization, training, and final assessment. This diagram delineates the interaction between the machine learning models and the metaheuristic optimizers within the hybrid structure.

Figure 3: The process flowchart of the proposed hybrid models

2.6 Performance evaluators

Accuracy depends on how many correctly projected positive and negative instances there are out of the total, defined in terms of True Positives (TP) and True Negatives (TN), the correctly projected positive and negative cases, and False Positives (FP) and False Negatives (FN), the cases incorrectly projected as positive and as negative, respectively.

The HGEO and HGDS models, based on the HGB algorithm, have specified values for learning_rate, max_leaf_nodes, max_depth, min_samples_leaf, and max_bins, with max_leaf_nodes listed separately for the different model types.
For example, the HGEO model has a learning_rate of 0.709, max_leaf_nodes of 278, max_depth of 100, min_samples_leaf of 10, and max_bins of 27. In the HGDS model, these values are a learning_rate of 0.148, max_leaf_nodes of 557, max_depth of 893, min_samples_leaf of 7, and max_bins of 102. Conversely, the DTEO and DTDS models, which are based on decision tree algorithms, do not include values for learning_rate, max_leaf_nodes, or max_bins in the first part of the table; however, they include defined values for max_depth, min_samples_leaf, min_samples_split, and max_leaf_nodes in the second part. For instance, the DTEO model has a max_depth of 741, min_samples_leaf of 0.00025, min_samples_split of 0.0275, and max_leaf_nodes of 2710. Similarly, the DTDS model features a max_depth of 597, min_samples_leaf of 0.00025, min_samples_split of 0.0005, and max_leaf_nodes of 1789. Overall, the table indicates that hyperparameters are selectively tuned for each model based on its structure, with parameter values chosen according to each model's specific characteristics and requirements.

Using TP and FP as the relevant measures, precision gauges the percentage of TP projections out of all the positive projections the model has made; fewer false positives imply higher precision. Recall measures the share of TP projections among all real positive instances, using True Positives and False Negatives; it indicates how well the model detects all relevant positive cases, and the fewer false negatives there are, the higher the recall. A simple statistic that balances the trade-off between precision and recall is the F1-score, which combines the two:

Accuracy = (TP + TN) / (TP + FP + FN + TN)   (16)
Precision = TP / (TP + FP)   (17)
Recall = TP / (TP + FN)   (18)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)   (19)
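Eqs. (16)–(19) can be computed directly from confusion-matrix counts; a short stand-alone sketch with invented example counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts,
    following Eqs. (16)-(19)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 90 true positives, 10 false positives, 10 false negatives and
# 90 true negatives give 0.9 for every metric.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=10, tn=90)
```

For the five-class immersion-level task, the same formulas are applied per level (one-vs-rest) and then reported level by level, as in Table 2.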
The F1-score is a single measure that balances precision and recall; it is the harmonic mean of both. It is very useful when false negatives and false positives both matter. The greater the F1-score, the better balanced the precision and recall.

3 Results and discussion

3.1 Hyperparameter tuning and convergence curve analysis

Table 1 displays the tuned hyperparameters for the four hybrid models: HGEO, HGDS, DTEO, and DTDS. Seven key hyperparameters were considered to optimize these models' performance: learning_rate, max_leaf_nodes, max_depth, min_samples_leaf, max_bins, min_samples_split, and a second instance of max_leaf_nodes, listed separately for the different model types.

Fig. 4 displays a 3D waterfall plot illustrating the convergence curves of the four hybrid schemes: HGDS, HGEO, DTDS, and DTEO. The plot visualizes the different convergence rates and final performance levels of the schemes, demonstrating their varying degrees of effectiveness in the optimization process. This comparison emphasizes the significance of the number of iterations and of the initial accuracy in determining the overall success of each hybrid model. The HGDS model starts with an accuracy of 0.6 and gradually improves over 200 iterations, ultimately reaching a peak accuracy of 0.967, making it the highest-performing model among the four. The other three schemes begin with a lower accuracy of 0.4 and converge more quickly than HGDS, reaching their final accuracy in fewer iterations. Among these schemes, DTEO is the weakest hybrid model, with a final accuracy of 0.908 after its iterations.
Table 1: Hyperparameter tuning for four models

Hyperparameter | HGEO | HGDS | DTEO | DTDS
learning_rate | 0.709 | 0.148 | – | –
max_leaf_nodes | 278 | 557 | – | –
max_depth | 100 | 893 | 741 | 597
min_samples_leaf | 10 | 7 | 0.00025 | 0.00025
max_bins | 27 | 102 | – | –
min_samples_split | – | – | 0.0275 | 0.0005
max_leaf_nodes | – | – | 2710 | 1789

Figure 4: 3D waterfall plot for the convergence curves of the hybrid schemes

3.2 Schemes performance comparison

Fig. 5 presents a doughnut plot, providing an intuitive representation of the schemes' performance and facilitating a clearer comparison across the different evaluation metrics. It presents the performance results of the six hybrid schemes evaluated using accuracy, precision, recall, and F1 scores across the training, testing, and overall sections. Among these, HGDS emerges as the best-performing model, with an impressive accuracy of 0.967 in the test section. Conversely, DTEO, with an accuracy of 0.907, is the weakest model. HGDS outperforms HGEO by 0.17 in accuracy, establishing itself as the top model. Nevertheless, HGEO still demonstrates strong performance, securing the second-best position overall. This comparison underscores the varying strengths of each model, with HGDS leading in accuracy and other performance metrics, while HGEO, despite its lower accuracy, remains a competitive alternative. The results emphasize that even schemes with slightly lower accuracy can still offer valuable performance in certain contexts.

Figure 5: A connected doughnut plot employed for the visual evaluation of the schemes' performance

Additionally, Table 2 provides a summary of the performance of the six schemes across five immersion levels regarding precision, recall, and F1 score.
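For reproducibility, the tuned values reported in Table 1 can be written down directly as configuration dictionaries. The keys mirror scikit-learn's parameter naming, and hyperparameters marked "–" in the table are simply omitted:

```python
# Tuned hyperparameters transcribed from Table 1; keys follow scikit-learn
# conventions, and parameters not tuned for a given model are left out.
TUNED = {
    "HGEO": {"learning_rate": 0.709, "max_leaf_nodes": 278, "max_depth": 100,
             "min_samples_leaf": 10, "max_bins": 27},
    "HGDS": {"learning_rate": 0.148, "max_leaf_nodes": 557, "max_depth": 893,
             "min_samples_leaf": 7, "max_bins": 102},
    "DTEO": {"max_depth": 741, "min_samples_leaf": 0.00025,
             "min_samples_split": 0.0275, "max_leaf_nodes": 2710},
    "DTDS": {"max_depth": 597, "min_samples_leaf": 0.00025,
             "min_samples_split": 0.0005, "max_leaf_nodes": 1789},
}
```

A dictionary like `TUNED["HGDS"]` could then be unpacked into the corresponding estimator's constructor via `**kwargs`.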
The hybrid model HGDS stands out at level 1, achieving the highest precision of 0.990. HGDS also excels in both recall and F1-score, outperforming all other schemes and demonstrating its overall robustness. In contrast, DTEO shows weaker recall performance compared with the other schemes, although it surpasses DTC in this metric. Regarding the F1-score, DTEO records a value of 0.922, which is lower than those of the top-performing schemes. Nonetheless, it outperforms both DTDS and DTC by margins of 0.013 and 0.010, respectively. While DTEO's F1-score may not be the highest, it still demonstrates competitive performance relative to the other schemes. These findings indicate that HGDS is the most well-rounded and effective model overall, while DTEO, despite its limitations in recall and F1 score, delivers superior performance in specific areas.

Table 2: Schemes' evaluation results through different immersion levels

Evaluator | Scheme | Level 1 | Level 2 | Level 3 | Level 4 | Level 5
Precision | HGBC | 0.946 | 0.946 | 0.909 | 0.925 | 0.973
Precision | HGEO | 0.941 | 0.995 | 0.907 | 0.951 | 0.974
Precision | HGDS | 0.974 | 0.990 | 0.981 | 0.945 | 0.943
Precision | DTC | 0.988 | 0.938 | 0.825 | 0.909 | 0.822
Precision | DTEO | 0.973 | 0.929 | 0.847 | 0.923 | 0.878
Precision | DTDS | 0.906 | 0.939 | 0.873 | 0.914 | 0.983
Recall | HGBC | 0.951 | 0.933 | 0.928 | 0.952 | 0.932
Recall | HGEO | 0.946 | 0.947 | 0.959 | 0.947 | 0.969
Recall | HGDS | 0.960 | 0.971 | 0.985 | 0.956 | 0.963
Recall | DTC | 0.847 | 0.875 | 0.953 | 0.869 | 0.916
Recall | DTEO | 0.876 | 0.885 | 0.948 | 0.927 | 0.906
Recall | DTDS | 0.911 | 0.894 | 0.964 | 0.927 | 0.911
F1-score | HGBC | 0.948 | 0.94 | 0.918 | 0.938 | 0.952
F1-score | HGEO | 0.943 | 0.97 | 0.932 | 0.949 | 0.971
F1-score | HGDS | 0.975 | 0.976 | 0.965 | 0.949 | 0.971
F1-score | DTC | 0.912 | 0.906 | 0.885 | 0.888 | 0.866
F1-score | DTEO | 0.922 | 0.906 | 0.895 | 0.925 | 0.892
F1-score | DTDS | 0.909 | 0.916 | 0.916 | 0.921 | 0.946

Fig. 6 displays the ROC (Receiver Operating Characteristic) curves of the hybrid model across the five immersion levels.
The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various thresholds, offering a visual assessment of the model's ability to distinguish between classes. A higher Area Under the Curve (AUC) signifies better performance. Among the five levels, Level 1 has the highest AUC, indicating greater confidence and fewer classification uncertainties at this stage. Conversely, Level 5 shows the weakest ROC performance, likely due to increased data overlap and less feature separation at higher immersion ratings. This suggests that, as responses become more subtle at deeper immersion levels, the model's ability to differentiate between classes slightly diminishes, resulting in more false positives and a lower true positive rate. These differences illustrate the model's changing confidence in classification across the varying immersion levels. Level 1 is the best projection level, characterized by the highest true positive rate and the lowest false positive rate: its true positive rate rises from 0.0 to 1.0 while the false positive rate rises only from 0.0 to 0.1. On the other hand, Level 5 displays the worst projection performance; although its true positive rate also reaches 1.0, it does so at a higher false positive rate, indicating a decrease in projection accuracy. This shows a decline in overall predictive quality as the level increases.

Figure 6: ROC curves for the hybrid classification model across five immersion levels

3.3 Comparison of the measured and projected values

Fig. 7 displays a 3D bar plot illustrating the correlation between observed and projected values across the five levels, highlighting each model's predictive accuracy. The high correlation between observed and projected values underscores HGDS's strong overall reliability. Conversely, the DTEO model shows the weakest performance, with only 177 accurate projections, making it the least effective model overall.
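The ROC/AUC construction described above for Fig. 6 can be sketched directly: sweep a decision threshold over the scores, collect (FPR, TPR) points, and integrate with the trapezoidal rule. This is illustrative code on invented scores, not the paper's implementation, and it does not handle tied scores:

```python
def roc_auc(labels, scores):
    """Build ROC points by sweeping a threshold over the scores (higher
    score = more positive) and return them with the trapezoidal AUC."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score; lowering the threshold adds one point at a time.
    order = sorted(range(len(labels)), key=lambda i: -scores[i])
    tpr, fpr, points = 0.0, 0.0, [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tpr += 1 / pos   # one more true positive captured
        else:
            fpr += 1 / neg   # one more false positive admitted
        points.append((fpr, tpr))
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc

# A perfect ranking of positives above negatives yields AUC = 1.0.
_, auc = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

For the five immersion levels, one such one-vs-rest curve is computed per level, which is what Fig. 6 overlays.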
Among the models, HGDS stands out with the best performance, particularly in level 1, where it achieves 194 accurate projections, establishing it as the top-performing model. While certain schemes may perform poorly only in specific conditions, DTEO consistently underperforms across all levels, indicating significant limitations in its predictive accuracy.

Figure 7: A 3D bar plot depicting the correlation between observed and projected values

Fig. 8 shows the projection errors across the six schemes, focusing on correct projections versus mistakes. Among these, HGDS stands out for its higher accuracy. In level 1, it correctly projected 192 out of 194 cases, resulting in only two errors. Similarly, in level 2, HGDS achieved 198 correct projections out of 202, with just four mistakes. This accuracy highlights its strong performance in comparison with the other schemes. In contrast, the DTEO model demonstrates weaker predictive accuracy. In level 1, it recorded five errors out of 177 projections. Its performance was similarly low in level 2, where it made 14 mistakes out of 184 projections. This high error rate marks DTEO as the least effective among the schemes analyzed. Overall, while HGDS exhibits consistent accuracy in both levels, DTEO's elevated error rate suggests limitations in its predictive reliability.

Figure 8: Confusion matrix illustrating the accuracy of the schemes under four specified conditions

• Sensitivity analysis

Table 3 displays the results of a sensitivity analysis using one-way ANOVA to determine whether model performance differences across the various VR immersion levels are statistically significant. The F-value indicates the ratio of between-group to within-group variance, while the P-value shows the likelihood that the observed differences are due to chance. A P-value below 0.05 is generally considered significant. Of the six models evaluated, the DTC model had the highest F-value, 2.923, with a P-value of 0.088. Although close to significance, this result remains statistically non-significant, implying only marginal performance differences that do not meet the 95% confidence threshold. The HGBC, HGEO, HGDS, DTEO, and DTDS models recorded much lower F-values of 0.021, 0.006, 0.031, 1.015, and 0.074, with P-values of 0.886, 0.937, 0.861, 0.314, and 0.786, respectively.
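The one-way ANOVA F-value reported in Table 3 is the ratio of the between-group mean square to the within-group mean square; a minimal sketch on toy data (not the paper's measurements):

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)
    for a list of groups of observations."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Identical group means leave no between-group variance, so F = 0,
# mirroring the very low F-values (and high P-values) in Table 3.
f_zero = one_way_anova_f([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```

The corresponding P-value would then be read from the F-distribution with (k−1, n−k) degrees of freedom, e.g. via `scipy.stats.f_oneway` in practice.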
• Sensitivity analysis

Table 3 displays the results of a sensitivity analysis using one-way ANOVA to determine whether model performance differences across the VR immersion levels are statistically significant. The F-value indicates the ratio of between-group variance to within-group variance, while the P-value shows the likelihood that the observed differences are due to chance; a P-value below 0.05 is generally considered significant. Of the six models evaluated, the DTC model had the highest F-value of 2.923 and a P-value of 0.088. Although close to significance, this result remains statistically non-significant, implying only marginal performance differences that do not meet the 95% confidence threshold. The HGBC, HGEO, HGDS, DTEO, and DTDS models recorded much lower F-values (0.021, 0.006, 0.031, 1.015, and 0.074), with P-values of 0.886, 0.937, 0.861, 0.314, and 0.786, respectively. These findings indicate no statistically significant performance differences across immersion levels. Notably, the HGDS model, identified earlier as the most accurate with a test accuracy of 0.967, showed a low F-value of 0.031 and a high P-value of 0.861, confirming its stable performance across all conditions. Overall, the ANOVA results suggest that none of the models exhibit statistically significant performance variations across immersion levels, highlighting the robustness of the proposed models and particularly validating the consistent performance of HGDS under different experimental scenarios.

Table 3: Sensitivity analysis based on ANOVA

Model   F-value   P-value
HGBC    0.021     0.886
HGEO    0.006     0.937
HGDS    0.031     0.861
DTC     2.923     0.088
DTEO    1.015     0.314
DTDS    0.074     0.786
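The F statistic reported in Table 3 is the standard one-way ANOVA ratio. A hand-rolled sketch on hypothetical per-level accuracy samples (three levels, four runs each; not the paper's data) shows the computation; the P-value additionally requires the F-distribution tail (e.g. `scipy.stats.f.sf`) and is omitted here:

```python
# One-way ANOVA F value: between-group mean square / within-group mean square.

def f_oneway(groups):
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Sum of squares between groups (weighted squared deviations of group means).
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups (deviations from each group's own mean).
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

level1 = [0.96, 0.97, 0.95, 0.96]   # hypothetical accuracies per immersion level
level2 = [0.95, 0.96, 0.96, 0.97]
level3 = [0.96, 0.95, 0.97, 0.96]
print(round(f_oneway([level1, level2, level3]), 3))
```

Near-identical group means yield an F value near zero, mirroring the "no significant difference" pattern in Table 3.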
3.4 Limitations and directions for future research

While the hybrid classification framework demonstrates encouraging results in predicting VR immersion levels, there are some limitations to address. First, the dataset is relatively small and was collected in a controlled experimental setting, raising questions about how well the models will perform in real-world or commercial VR environments with more diverse users. Second, the computational cost of metaheuristic algorithms like EOS and DSSA can increase substantially with larger dataset dimensions, which may affect real-time or low-latency applications. More research is needed to evaluate their scalability and efficiency in live systems. Third, although the models were optimized for accuracy, aspects like interpretability and user feedback were not thoroughly explored. Transparency could be especially important for applications in education or healthcare. Future research will focus on: (1) expanding the dataset to include multimodal user feedback (e.g., eye tracking, EEG), (2) comparing our framework with common models such as SVM, Random Forest, and Neural Networks, and (3) creating lightweight or approximate versions of EOS and DSSA suitable for real-time immersive use. Additionally, we aim to test the models across various VR fields, including rehabilitation, industrial training, and personalized learning, to ensure their robustness in different operational contexts. Finally, although this study concentrated on the new EOS and DSSA algorithms because of their innovative hybrid search abilities, future research will include implementing and comparing more traditional and popular optimizers like Particle Swarm Optimization (PSO), Genetic Algorithms (GA), and Bayesian Optimization. This will enable a more comprehensive assessment of optimization efficiency and adaptability across different learning scenarios.

4 Conclusion

VR simulation immerses users in a dynamic, visually engaging virtual environment where they can navigate, manipulate virtual objects, and interact with digital agents. A defining feature of VR worlds is their three-dimensional nature, often coupled with realistic elements, not only in their visual representation but also in how objects behave. For instance, VR simulations may include natural forces like gravity. These environments are not always designed to mirror the real world; in fact, they often present fantastical or even impossible scenarios. This unique capability allows VR to simulate complex or hazardous situations safely, making it especially useful in training and educational contexts. In such settings, VR can expose learners to potentially risky situations they might encounter in reality, allowing them to experience and practice without the associated risks. Advancements in technology have greatly enhanced the capabilities of VR, allowing for more immersive and realistic simulations. Additionally, the integration of sophisticated VR classification schemes, such as DTC and HGBC, is transforming digital experiences. These schemes, along with optimizations from techniques like the EOS and DSSA, contribute to the improvement of VR systems. In testing, the hybrid HGDS approach has proven to be highly effective, achieving an accuracy rate of 0.967 and making it the top performer among the various schemes. On the other hand, the DTEO approach, with an accuracy of 0.907, was identified as the least effective. Although this study concentrated on hybrid variants within our optimization framework, future research will include benchmarking with models like Random Forest, Support Vector Machines, XGBoost, and Neural Networks. This will contextualize our models' performance against recognized standards and strengthen the validation of our methodology. Despite this, the hybrid approach often outperformed both DTC and DTDS in certain metrics, demonstrating the potential of combining these innovative techniques for enhancing VR-based applications.

Declarations
Funding
This investigation was not funded by any specific grant from public, commercial, or charitable funding bodies.

Authors' contributions
YS performed data collection, modeling, and appraisal. HZ reviewed the initial draft of the manuscript and contributed to editing and writing.

Acknowledgements
This exploration was backed by the project of the sixth phase of the "333 High-level Talent Cultivation Project" in Jiangsu Province.

Ethical approval
The exploration has received ethics approval from the IRB, guaranteeing the protection of participants' rights and compliance with the related ethics norms.
https://doi.org/10.31449/inf.v49i16.9788 Informatica 49 (2025) 269–290 269

GWO-RF: A Grey Wolf Optimized Random Forest Model for Predicting Employee Turnover

Hongtao Zhang
Henan Medical Biological Testing Co., Ltd, Zhengzhou 450000, China
E-mail: HongtaoZhang8103@163.com

Keywords: human resources, prediction, employee turnover, computational model

Received: June 19, 2025

This study proposes an employee turnover prediction model (GWO-RF) that combines the Grey Wolf Optimization (GWO) algorithm with an improved random forest (LPRF). The model optimizes the node splitting strategy by combining the C4.5 information gain rate and the CART Gini coefficient (under the constraint α + β = 1) through linear programming. The model is based on 12,365 employee records (15 features, including structured indicators such as workload and salary-to-position ratio), and uses a 7:2:1 data split and SMOTE to handle class imbalance. Its key parameters include a GWO population size of 50, 100 iterations, 50-200 random forest decision trees, and a maximum depth of 5-15. The test set results show that the model achieves an AUC of 0.923±0.008 and an F1-score of 0.871.
At the business level, the retention rate of high-risk employees increases by 41.9% (p<0.01), and the cost of a single intervention decreases by 54.3%. The innovation of the model is that the LPR node splitting algorithm alleviates the overfitting problem of traditional random forests (increasing validation-set accuracy by 12.6%), but its prediction accuracy for new employees who have been employed for less than 3 months is lower (AUC 0.782). Future work therefore needs to enhance real-time time-series modeling capabilities.

Povzetek: The study presents the GWO-RF model, which combines grey wolf optimization and an improved random forest to predict employee turnover. The model improves node splitting and increases the retention of at-risk employees.

1 Introduction

In today's highly competitive business environment, employee turnover has become an important management challenge for enterprises. With the rising cost of human resources and the increasing mobility of knowledge workers, employee turnover not only brings direct recruitment and training costs, but also damages team stability, drains organizational knowledge and harms corporate reputation. In the education and training, retail and Internet industries in particular, the employee turnover rate generally exceeds 20%, and the turnover rate of core employees in some enterprises is as high as 30%, which makes the development of an accurate turnover prediction model an urgent need for enterprise human resource management [1].

The current mainstream prediction models can be divided into two categories. One is the rule-based method, which mainly relies on expert experience to build judgment rules; although interpretable, it covers limited scenarios. The other is the machine learning-based method, which automatically identifies churn features by analyzing historical data; typical algorithms include random forest, XGBoost and deep neural networks. The latest research shows that cluster analysis and behavioral feature modeling can effectively improve prediction accuracy and realize quantitative loss prediction. Therefore, some enterprises have begun to integrate multi-source data (including employee satisfaction surveys, social network activities, etc.) to build hybrid models [2].

It is of great strategic value and management necessity to construct an effective computational model to predict employee turnover. First, employee turnover brings significant economic losses to enterprises, including recruitment costs, training costs and tacit knowledge loss. Second, a high turnover rate destroys team stability and affects organizational performance; studies show that when the team turnover rate exceeds 15%, overall productivity decreases by 25-40% [3]. More importantly, an effective prediction model can identify high-risk employees 6-12 months in advance, enabling enterprises to take targeted interventions that increase the retention rate of core employees by 35-50% [4]. Furthermore, by analyzing turnover drivers, the model can optimize human resource management strategies and improve overall employee satisfaction by 10-15 percentage points. In the context of digital transformation, such models have become a core tool for corporate talent strategies. They are of particular significance to knowledge-intensive and service industries, where they can effectively reduce human capital risks and enhance organizational competitiveness [5].

270 Informatica 49 (2025) 269–290 H. Zhang

However, existing models still have significant limitations. First, they depend heavily on data quality, and many enterprises lack systematic employee behavior records, which makes feature engineering difficult. Second, model interpretability is insufficient: black-box characteristics make it difficult for human resource managers to understand the prediction logic. Third, cross-industry generalization is weak, and the driving factors of turnover in the education and training industry differ essentially from those in retail. Finally, existing studies focus on prediction accuracy while ignoring the guiding value of interventions, such as cost-benefit analysis of salary adjustment and training investment. Therefore, future research needs to strengthen the application of time-series behavior analysis and causal reasoning frameworks and establish a closed-loop management system of prediction-intervention-evaluation. The purpose of this study is to develop an intelligent early warning model based on GWO-RF to improve the accuracy of high-risk employee identification and intervention efficiency.

2 Related work

(1) Development context and theoretical framework of traditional employee turnover prediction models

The development of traditional prediction models can be divided into three main stages: an early statistical modeling stage (1990-2005), a machine learning enhancement stage (2005-2015), and a survival analysis deepening stage (2010-2015). In the statistical modeling stage, researchers mainly used parametric methods such as multiple linear regression and logistic regression to analyze the correlation between observable variables and turnover intention by constructing generalized linear models (GLM). This research laid the theoretical foundation of employee turnover prediction and confirmed the explanatory power of core influencing factors such as salary fairness and career development opportunities. However, it is difficult to capture interaction effects between variables due to the linear assumptions [6].

The introduction of machine learning technology marked a new stage in predictive models. Decision tree algorithms (such as ID3 and C4.5) construct classification rules through the information gain ratio, which can automatically discover high-risk combination features such as "performance evaluation period > 6 months and training participation times < 2". Ensemble learning methods (such as random forest) further improve model robustness and effectively reduce the risk of overfitting through Bootstrap resampling and random feature selection. During this period, models began to integrate structured data from HR information systems, including behavioral indicators such as attendance records and project participation, so that the prediction accuracy rate was improved to the interval of 65%-75% [7].

The cross-application of survival analysis methods addresses the shortcomings of traditional classification models in time-series prediction. The Cox proportional hazards model regards employee on-the-job status as a time-dependent variable and quantifies the influence of different factors on retention through the hazard function. Its semi-parametric character allows it to retain the interpretability of parametric models while adapting to data with non-proportional risks. Research shows a nonlinear positive correlation between the duration of promotion delay and turnover risk, with the risk coefficient increasing exponentially once the delay exceeds a critical value (about 18 months). This kind of model shifts the prediction dimension from static cross-sectional analysis to dynamic process analysis [8].

The core value of the traditional model lies in its white-box characteristics: through coefficient significance tests and variable importance ranking, managers can intuitively understand the decision-making logic. However, it has three fundamental limitations. First, feature selection relies on domain knowledge, making it difficult to automatically extract implicit features. Second, the model architecture lacks a memory mechanism and cannot handle the continuous evolution of employee status. Third, it makes insufficient use of unstructured data (such as communication texts and collaboration networks) [9]. These shortcomings have prompted researchers to turn to more complex intelligent modeling methods.

(2) Technological breakthrough and paradigm innovation of intelligent prediction models

The application of deep learning has enabled prediction models to achieve a qualitative leap, mainly reflected in four dimensions: time-series modeling ability, small-sample learning efficiency, multi-modal fusion depth and dynamic decision optimization. In terms of time-series modeling, long short-term memory networks (LSTM) capture long-term dependencies in employee behavior sequences through gating mechanisms, such as continuous quarterly performance fluctuation patterns or trends in communication frequency. The bidirectional LSTM architecture further integrates historical and future context information, extending the early warning window to 9-12 months [10].

Transfer learning effectively alleviates the problem of data scarcity. Through the pre-training and fine-tuning paradigm, a model can migrate feature representations learned in a data-rich domain to the target domain. Domain adaptation reduces the distribution difference between source and target domains and improves cross-industry prediction performance by 15%-25%. In addition, knowledge distillation compresses the knowledge of a complex teacher model into a lightweight student model, reducing computational overhead by 70% while maintaining 90% of prediction accuracy [11].

GWO-RF: A Grey Wolf Optimized Random Forest Model for… Informatica 49 (2025) 269–290 271

Multi-modal fusion architectures break through the limitation of a single data type. Modern prediction systems typically integrate three types of heterogeneous data: textual data, behavioral data, and physiological data. An attention mechanism automatically weights the contribution of the different modalities [12].

Reinforcement learning frameworks integrate prediction and intervention into a unified system. The model learns the optimal retention strategy by interacting with the environment, and the Q-learning algorithm evaluates the long-term benefits of different interventions (such as the magnitude of salary adjustment and training intensity). In addition, policy gradient methods can handle continuous action spaces and dynamically adjust intervention strength. Such systems achieve a leap from passive prediction to active management, but require a carefully designed reward function to avoid short-sighted behavior [13].

Although intelligent models have made remarkable progress, they face new challenges. In terms of data privacy, the EU's General Data Protection Regulation (GDPR) requires models to support the "right to be forgotten", and differential privacy training mechanisms need to be developed. In terms of algorithmic fairness, it is necessary to prevent models from amplifying the discriminatory influence of sensitive attributes such as gender and age. In terms of computational efficiency, real-time prediction requires model inference latency to be controlled within 200 ms, which poses a severe test for complex neural networks [14].

(3) Systematic analysis of existing problems and future research directions

The core contradictions faced by current research can be summarized as conflicts on three levels: technical feasibility, ethical compliance and economic applicability. In the technical dimension, there is a fundamental tension between model complexity and interpretability [15]. Although post-hoc interpretation methods such as LIME and SHAP can generate local feature importances, they cannot provide a global causal chain, which leads managers to be cautious about the prediction results [16]. In terms of ethics, the breadth of data collection conflicts with personal privacy rights. In particular, the application boundaries of sensitive
technologies such as emotion recognition [17] and social network analysis [18] urgently need to be defined by law. In terms of economics, there is a gap between the need for model generalization and industry specificity. Traditional solutions adapt to different scenarios through feature engineering, but the adjustment cost is high [19].

The following Table 1 summarizes the current status of relevant research:

Table 1: Summary of research status

Method category | Representative algorithm | Common datasets | Typical indicators | Core limitations
Traditional statistical models | Logistic regression, Cox proportional hazards model | Structured data of enterprise HR systems (salary, attendance, etc.) | Accuracy 65-75%, significant risk coefficients | Linear assumptions limit the capture of interaction effects and cannot handle unstructured data, resulting in weak temporal prediction ability
Classic machine learning | Random forest, XGBoost | Employee satisfaction survey + behavior records (about 10-20 characteristics) | AUC 0.78-0.85, F1-score 0.72 | Feature engineering relies on domain knowledge; AUC of only 0.65-0.70 for newly hired employees (<3 months); lacks a dynamic adjustment mechanism
Deep learning methods | LSTM, Transformer | Multimodal data (text communication records, collaborative network logs, etc.) | AUC 0.88-0.91, recall rate 82-85% | Requires training with over 10,000 samples, high computational cost (GPU hourly cost of $5-8) and poor interpretability (SHAP value consistency of only 60-70%)
Hybrid optimization model | GWO-RF (this study) | 12,365 records of listed companies (15 structured indicators) | AUC 0.923±0.008, F1 0.871 | Predicted AUC of 0.782 for employees employed less than 3 months, requiring real-time data stream supplementation; linear programming node splitting increases training time by 15%

The current trend of employee turnover prediction technology is an evolution from traditional statistical models to intelligent hybrid models. Traditional methods, such as logistic regression, rely on structured data and reach an accuracy of only 65-75%. Machine learning (such as random forest) improves this to an AUC of 0.78-0.85, but suffers from strong dependence on feature engineering and poor prediction performance for new employees (AUC < 0.7). Although deep learning methods such as LSTM achieve an AUC of 0.88-0.91, they have high computational costs and weak interpretability. The GWO-RF hybrid model proposed in this study achieved an AUC of 0.923 ± 0.008 on 12,365 data points by optimizing parameters using the grey wolf algorithm and integrating C4.5 and CART splitting strategies through linear programming.
This resulted in a 41.9% increase in the retention rate of high-risk employees, although temporal modeling capabilities still need to be enhanced for new employees (<3 months).

Future breakthroughs should focus on three key paths. In terms of architecture design, it is necessary to develop a lightweight Transformer-based time-series model and build explainable reasoning paths in combination with knowledge graphs. In terms of data governance, it is necessary to establish a federated learning framework to implement a collaborative training mode in which the data does not move but the model does, and to use homomorphic encryption to protect data sovereignty. In terms of the evaluation system, it is necessary to build multi-dimensional indicators covering prediction accuracy (such as AUC-ROC), explanation quality (such as a logical consistency score) and compliance (such as a deviation detection rate). Only by achieving a balance between technological innovation and ethical constraints can the employee turnover prediction model truly become the intelligent decision-making center of the organization's talent strategy.

3 Algorithm model construction

3.1 Employee turnover prediction index system and model construction

Starting from the random forest model, this paper improves the random forest, constructs the employee turnover prediction model, and uses the grey wolf algorithm to optimize the model parameters. When measuring the structural factors, the workload factor is calculated as shown in the following formula [20]:

press = \frac{totalovertime}{months} \quad (1)

Among them, totalovertime represents an employee's total overtime hours and months represents the statistical time window; this paper uses the overtime situation over the past year, so months is taken as 12.

The average hourly wage is calculated as follows:

hourlywage = \frac{totalwage}{hours} \quad (2)

Among them, totalwage represents the total salary obtained by front-line workers, and hours represents the number of paid working hours of front-line workers.

The compensation position is calculated as follows:

wage\_pos = \frac{wage}{avgwage} \quad (3)

Among them, wage represents the monthly salary of a front-line worker, and avgwage represents the average monthly salary for the same position in the region where the enterprise is located.

Based on the Price-Mueller model, combined with the characteristics of small and medium-sized enterprises and with reference to the relevant literature, this paper constructs a total of 15 indicators covering individual factors, environmental factors and structural factors for subsequent employee turnover prediction.
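Equations (1)-(3) are simple ratios and translate directly into feature-engineering code. A minimal sketch with a hypothetical employee record (field names and sample values are illustrative, not taken from the paper's dataset):

```python
# Structural-factor features from Eqs. (1)-(3): workload pressure,
# average hourly wage, and compensation position. Values are hypothetical.

def workload_pressure(total_overtime_hours, months=12):
    """Eq. (1): press = totalovertime / months (past year, so months = 12)."""
    return total_overtime_hours / months

def hourly_wage(total_wage, hours):
    """Eq. (2): hourlywage = totalwage / hours."""
    return total_wage / hours

def compensation_position(wage, avg_wage):
    """Eq. (3): wage_pos = wage / avgwage (regional average for the position)."""
    return wage / avg_wage

employee = {"total_overtime_hours": 180.0, "total_wage": 54000.0,
            "hours": 1800.0, "wage": 4500.0, "avg_wage": 5000.0}

features = [
    workload_pressure(employee["total_overtime_hours"]),
    hourly_wage(employee["total_wage"], employee["hours"]),
    compensation_position(employee["wage"], employee["avg_wage"]),
]
print(features)  # [15.0, 30.0, 0.9]
```

In a full pipeline these three values would be appended to the other 12 indicators to form the 15-dimensional feature vector fed to the random forest.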
3.2 Improvement of the random forest model based on node splitting optimization

In this paper, the random forest algorithm is further improved to raise its performance in employee turnover prediction.

The base learner of a random forest is the decision tree. Commonly used node splitting algorithms in decision trees include the ID3 algorithm based on information gain (Gain), the C4.5 algorithm based on the information gain rate, and the CART algorithm based on the Gini coefficient (Gini), as follows.

(1) ID3 algorithm

If we assume that the data set D includes K different types of samples C_k (k = 1, 2, ..., K), the entropy can be calculated using the following formula [21]:

H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|} \quad (4)

Among them, |D| represents the total number of samples and |C_k| the number of samples belonging to class k. The n different values of attribute A in D are denoted A_i (i = 1, 2, ..., n); D is divided into n subsets D_i according to A, and the samples in D_i belonging to class C_k are denoted D_{ik}. The entropy after selecting node A for splitting is then:

H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \sum_{k=1}^{K} \frac{|D_{ik}|}{|D_i|} \log_2 \frac{|D_{ik}|}{|D_i|} \quad (5)

Among them, |D_i| represents the number of samples belonging to subset D_i, and |D_{ik}| the number of samples of category C_k in D_i. Information gain is defined relative to an attribute; in data set D, the information gain of attribute A is calculated as follows [22]:

Gain_A(D) = H(D) - H(D|A) \quad (6)

Information gain reflects the contribution of a given attribute to the classification task. The core idea of information gain is to measure the information change in the classification process caused by the presence or absence of the attribute. This information change is the so-called amount of information, which can also be called entropy.
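Equations (4)-(6) can be sketched directly in code. This is a toy illustration on a tiny binary-labeled sample (turnover = 1, stay = 0), not the paper's implementation:

```python
# Entropy, conditional entropy, and information gain from Eqs. (4)-(6).
from collections import Counter
from math import log2

def entropy(labels):
    """Eq. (4): H(D) = -sum_k (|C_k|/|D|) * log2(|C_k|/|D|)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def conditional_entropy(values, labels):
    """Eq. (5): H(D|A) = sum_i (|D_i|/|D|) * H(D_i), splitting on attribute values."""
    total = len(labels)
    h = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        h += (len(subset) / total) * entropy(subset)
    return h

def information_gain(values, labels):
    """Eq. (6): Gain_A(D) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(values, labels)

# An attribute that separates the labels perfectly: gain equals H(D) = 1 bit.
attr = ["high", "high", "low", "low"]
y = [1, 1, 0, 0]
print(information_gain(attr, y))  # 1.0
```

ID3 chooses the attribute maximizing this gain at each node; C4.5 and CART refine the criterion as described next.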
Specifically, it is observed that in classification, if the participation of an attribute will affect the amount (2) Information gain of information, then the difference in the amount of Information gain can also be used as the splitting information before and after is the amount of information algorithm for node splitting. If it is assumed that the brought by this attribute to classification. attribute A of the data set D has n different values, it is divided into n subsets Di (i = 1,2,L ,n) according to 3.3 Improvement of random forest based different values. Then, the splitting information of on LPR node splitting algorithm attribute A can be calculated using the following formula [23]. The improved LPRF algorithm adopts an innovative D method, which linearly combines the node splitting ( ) i Di Split inf o = − n (7) A D i=1 log2 D D functions of C4.5 algorithm and CART algorithm, and introduces a set of combination coefficients and related Among them, D represents the number of constraints to construct it as a linear programming samples in the data set, and D represents the number i problem. As mentioned earlier, both the C4.5 algorithm of samples belonging to subset i. Split inf oA (D) and the CART algorithm are based on information theory, represents the uniformity of the data set D when attribute so there is a natural connection between their node A is used as a split node. By comparing the split splitting functions. This provides a solid theoretical information and information gain, it can be ensured that foundation for the linear combination of these two the decision tree will not be biased when selecting nodes algorithms in LPRF algorithm, and also overcomes the for splitting. The information gain rate calculation problem of limited splitting mode of decision tree nodes. formula of attribute A is as follows. 
After solving the optimal linear combination problem, Gain ( A (D) GainRatioA D) = (8) LPRF algorithm obtains a new node splitting strategy, Split inf oA (D) which is used to select the best attributes for node (3) Gini coefficient splitting. The principle is to evaluate different input factors If it is assumed that the information gain of attribute based on the Gini coefficient of the following formula A in data set D is represented by GainRatio ( ) and A D [24]. the Gini coefficient is Gini (D) , the improved linear programming model based on the node splitting rule of Gini ( p) = K p (1− p ) = 1− K 2 (9) k=1 k k k=1 pk C4.5 algorithm and CART algorithm is as follows: MaxFA (D) = αGainRatioA (D)+ βGiniA (D) Among them, K represents the number of different states in which the target to be predicted exists. For α + β = 1  (12) example, in the employee turnover prediction, K can be s.t.0  α  1 set to 2, that is, turnover or no turnover. p represent the  k 0  β  1 probability that the sample belongs to state k, and the Gini Among them, F p e e t A (D) re r s n s the node splitting coefficient can be calculated by the following formula. function, s.t . represents the constraint condition for Gini ( p) = 2 p (1− p) (10) solving the objective function, and α,β represents the combination coefficient when combining different node For a certain factor A that affects churn, the Gini splitting functions. The sum of the two is 1, but they are coefficient of the influencing factor is calculated by using not 0 or 1 at the same time. GainRatio ( ) is A D the above formula. If we assume that a certain predictive calculated by information gain, and GiniA (D) indicator for judging employee turnover is A, then the represents the Gini coefficient. In the node splitting entire sample space D can be divided according to the process of the decision tree, the C4.5 algorithm uses the range of indicator A. 
When A takes a specific value a, the attribute with the highest information gain rate as the best specific calculation formula of the Gini coefficient is [25]: choice for splitting, while the CART algorithm uses the D attribute with the smallest Gini coefficient as the best ( ) 1 D ( 2 Gini D,A = Gini D1 )+ (11) splitting attribute. Therefore, when both algorithms reach D D the optimal state, it can be observed that the function When using the ID3 algorithm as the node splitting F ( ) h s a m x m m v l A D a a i u a ue. In the decision of node strategy of the decision tree, the information gain of each splitting, the attribute with the maximum F v l e A (D) a u attribute in the dataset needs to be calculated first. should be selected as the best splitting attribute to Information gain is used to measure the contribution of a generate a decision tree and finally form a decision tree 274 Informatica 49 (2025) 269–290 H. Zhang forest. the optimal combination coefficient suitable for the data When using the LPR node splitting algorithm to set according to the different objective functions and build a random forest, it assumes that the data set is D, constraints. This process can find the most suitable the number of decision trees is s, the number of attributes splitting attributes for each data set and then use these involved in the split is t, and the sample to be tested is x. attributes to generate a decision tree. Finally, the results With the goal of predicting the type of x, the main process of multiple decision trees are integrated through the of the algorithm is as follows: majority voting mechanism to obtain the predicted label (1) The algorithm uses the Bootstrap sampling of the new input sample. method with replacement to randomly sample from a data set D containing n samples to generate a sub-data set D , 1 3.4 Parameter optimization of random where the number of samples in D is n. 
(2) The algorithm randomly selects t attributes from the m available attributes to participate in node splitting, where t ≤ m and t is constant.

(3) The algorithm uses the linear programming model to calculate the F_A(D_1) value of each attribute in the current data set, takes the attribute with the maximum F_A(D_1) value as the split node, and creates the node.

(4) According to the attribute of the split node, the algorithm divides the current data set into 2 subsets, denoted as D_11 and D_12, and removes the current attribute from both subsets.

(5) The algorithm recursively executes steps 3 and 4 until all samples in the current data set belong to the same category and a leaf node is generated. At this point, the decision tree model h_1(x) based on the sub-data set D_1 is complete.

(6) The algorithm recursively executes steps 1 to 5 to generate s decision tree models h_i (i = 1, 2, …, s) corresponding to D_i (i = 1, 2, …, s).

(7) After a new sample x is input, the algorithm uses the majority voting mechanism to aggregate the prediction results of the s decision trees and obtain the predicted label of sample x.

The LPRF algorithm thus adopts an innovative method based on decision tree node splitting. It combines the characteristics of the C4.5 algorithm and the CART algorithm, and resolves the limitations of the traditional random forest's node splitting rules by constructing a linear programming model. The core idea is to introduce the combination coefficients α and β, combining the information gain rate and the Gini coefficient into a new objective function F_A(D). Solving this objective function means finding its maximum and determining the values of α and β, so that the node splitting of the random forest is more adaptive and no longer bound by fixed rules. For different data sets, the LPRF algorithm can find the optimal combination coefficients according to the different objective functions and constraints. This process finds the most suitable splitting attributes for each data set and then uses them to generate a decision tree; finally, the results of the multiple decision trees are integrated through the majority voting mechanism to obtain the predicted label of a new input sample.

3.4 Parameter optimization of the random forest model based on the gray wolf optimization algorithm

GWO simulates the hunting behavior of gray wolf packs (surrounding, tracking and attacking prey) to achieve an efficient global search of the parameter space, avoiding the tendency of traditional grid search to fall into local optima. Compared with genetic algorithms, which require tuning of the crossover/mutation rates, GWO only needs the population size to be set, which makes it more suitable for optimizing discrete RF parameters such as the number of trees (50-200) and the leaf node settings. GWO needs only 23 rounds of iteration to optimize the RF parameters, saving 37% of the computational cost compared with genetic algorithms (37 rounds) and meeting the real-time requirements of HR scenarios.

The RF optimized by GWO retains the white-box characteristics of the decision tree, whereas black-box models such as neural networks cannot provide such insights. In response to the imbalance of positive and negative samples in employee turnover prediction (the turnover rate is usually <20%), GWO strengthens its attention to minority samples through the alpha/beta/delta three-level leadership mechanism.

Genetic algorithms tend to converge prematurely and are sensitive to the crossover/mutation rates, while particle swarm optimization tends to oscillate in high-dimensional parameter spaces; Bayesian optimization, in turn, handles discrete parameters poorly and has high hyperparameter tuning costs. In this study, the gray wolf optimization algorithm is therefore used to optimize the parameters: compared with these alternatives, it is more efficient and less likely to be trapped in a local optimum. Figure 1 shows the process of optimizing each parameter of the random forest with the gray wolf optimization algorithm.
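Stepping back to the splitting criteria, the quantities of formulas (4)-(12) can be sketched in plain Python. This is an illustrative re-implementation, not the authors' code: the toy overtime attribute and turnover labels are invented, and formula (12) is reproduced exactly as stated (an α-weighted gain ratio plus a β-weighted Gini term).

```python
from collections import Counter
from math import log2

def entropy(labels):                                  # formula (4)
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def split_entropy(values, labels):                    # formula (5): H(D|A)
    n = len(labels)
    h = 0.0
    for v in set(values):
        sub = [l for x, l in zip(values, labels) if x == v]
        h += len(sub) / n * entropy(sub)
    return h

def gain(values, labels):                             # formula (6)
    return entropy(labels) - split_entropy(values, labels)

def split_info(values):                               # formula (7)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def gain_ratio(values, labels):                       # formula (8)
    si = split_info(values)
    return gain(values, labels) / si if si > 0 else 0.0

def gini(labels):                                     # formula (9)
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(values, labels):                       # formula (11): Gini(D, A)
    n = len(labels)
    total = 0.0
    for v in set(values):
        sub = [l for x, l in zip(values, labels) if x == v]
        total += len(sub) / n * gini(sub)
    return total

def lpr_score(values, labels, alpha=0.5, beta=0.5):   # formula (12)
    return alpha * gain_ratio(values, labels) + beta * gini_index(values, labels)

# Toy attribute/label columns: 1 = turnover, 0 = stay
overtime = ["high", "high", "low", "low", "high", "low"]
left = [1, 1, 0, 0, 1, 0]
print(round(lpr_score(overtime, left), 3))  # 0.5: gain ratio 1.0, Gini index 0.0
```

On this perfectly separating toy attribute the gain ratio is 1.0 and the post-split Gini index is 0.0, so the combined score is simply α; real attributes fall between these extremes.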
Figure 1: Process of optimizing the parameters of the random forest model with the gray wolf algorithm

The optimization range of the grey wolf algorithm includes: the number of decision trees (50-500), the maximum depth (5-30), the minimum number of leaf samples (1-20), and the linear programming coefficient α (0.3-0.7). The objective function is to maximize AUC-ROC, and the iteration stop condition is a continuous improvement of <0.001 over 20 generations.

As shown in Figure 1, first, several parameters of the gray wolf algorithm, such as the number of wolves, are determined according to the sample to be optimized. Second, the prediction effect corresponding to each parameter set is calculated and measured by AUC. Third, the three parameter sets with the best effect are selected, and the one with the highest AUC is taken as the head wolf. Fourth, the positions of the gray wolves are updated. Fifth, it is determined whether the iteration has reached the maximum, or whether the gray wolf optimization has reached a given threshold; if the condition is met, the optimal parameters are returned, otherwise the algorithm continues to iterate.

3.5 Employee turnover prediction process based on the optimized random forest model

When using the optimized random forest model to predict employee turnover, this paper mainly adopts the process shown in Figure 2.

Figure 2: Employee turnover prediction process

As shown in Figure 2, the training samples of employee turnover are established, and the gray wolf algorithm is used to determine each group of parameters of the optimized random forest model. Then, the out-of-sample effect of employee turnover prediction is verified on the test samples, and the model can be officially put into operation once the prediction requirements are met according to the analysis and evaluation of the indicators.
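The prediction step at the heart of this process, the majority vote of step (7) in Section 3.3, can be sketched as follows. The three stub "trees" and their thresholds are invented for illustration; real trees would be grown with the LPR splitting described above.

```python
from collections import Counter

def forest_predict(trees, x):
    # Step (7): each tree votes a label; the majority label wins
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical stub "trees" (1 = turnover, 0 = stay)
trees = [
    lambda x: 1 if x["press"] > 10 else 0,        # heavy workload
    lambda x: 1 if x["satisfaction"] < 3 else 0,  # low satisfaction
    lambda x: 0,                                  # always votes "stay"
]
print(forest_predict(trees, {"press": 15, "satisfaction": 2}))  # 1
```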
The full process framework of the employee turnover prediction model is shown in Figure 3. This framework constructs an end-to-end prediction system from data collection to management intervention, and its core innovation is the deep coupling of algorithm optimization with HR management scenarios. At the data layer, multiple heterogeneous data sources such as salary, performance and organizational behavior are integrated; through industry benchmark data filling and temporal alignment processing (such as formulas (1), (2) and (3) for calculating workload and salary competitiveness), the data fragmentation problem of traditional models is solved. The introduction of derived features such as social network centrality in the feature engineering stage, combined with the weighted screening mechanism of the Grey Wolf Optimization (GWO) algorithm, significantly enhances the causal correlation between the features and churn risk. The model optimization stage adopts a dynamic parameter space design (decision tree depth ∈ [3,15], forest size ∈ [50,200]), with AUC-ROC plus an interpretability score as a dual objective function, balancing the requirements for prediction accuracy and interpretability. The prediction application layer analyzes the driving factors through SHAP values and generates executable solutions such as a salary adjustment simulator and career path planning. The entire process ensures that the model dynamically adapts to organizational changes through real-time data streams (red arrows) and manual review nodes (gray dashed boxes), and its AB testing mechanism and cost-benefit analysis module directly support HR strategic decision-making.
Figure 3: Full process framework of the employee turnover prediction model

The main code of the algorithm model in this article is as follows:

    def gwo_optimize(self, X, y):
        # Initialize wolf positions (RF hyperparameters)
        wolves = np.random.uniform(
            low=[50, 5, 2],      # n_estimators, max_depth, min_samples_split
            high=[200, 30, 10],
            size=(self.n_wolves, 3))

        for it in range(20):  # GWO iterations
            # Evaluate each wolf's fitness
            fitness = [self._evaluate(X, y, wolf) for wolf in wolves]

            # Update alpha, beta, delta wolves (three best solutions)
            sorted_idx = np.argsort(fitness)[::-1]
            alpha, beta, delta = wolves[sorted_idx[:3]]

            # Update positions (GWO hunting mechanism)
            a = 2 - it * (2 / 20)  # decreases linearly from 2 towards 0
            for i in range(self.n_wolves):
                r1, r2 = np.random.rand(2)
                A = 2 * a * r1 - a
                C = 2 * r2
                D_alpha = abs(C * alpha - wolves[i])
                X1 = alpha - A * D_alpha
                # Similar updates for beta and delta give X2 and X3 (omitted)
                wolves[i] = (X1 + X2 + X3) / 3  # position update

        # Train final model with optimized parameters
        self.alpha_wolf = RandomForestClassifier(
            n_estimators=int(alpha[0]),
            max_depth=int(alpha[1]),
            min_samples_split=int(alpha[2]),
            splitter=lpr_split  # custom LPRF splitting (non-standard argument)
        )
        self.alpha_wolf.fit(X, y)

    def _evaluate(self, X, y, params):
        # 5-fold cross-validation
        kf = KFold(n_splits=5)
        scores = []
        for train_idx, val_idx in kf.split(X):
            clf = RandomForestClassifier(
                n_estimators=int(params[0]),
                max_depth=int(params[1]),
                min_samples_split=int(params[2]),
                splitter=lpr_split  # custom LPRF splitting
            )
            clf.fit(X[train_idx], y[train_idx])
            scores.append(clf.score(X[val_idx], y[val_idx]))
        return np.mean(scores)

4 Evaluation of model prediction effect

4.1 Evaluation criteria

The core assumption of this study is that a hybrid model combining the Grey Wolf Optimization (GWO) algorithm with the improved random forest (LPRF node partitioning) can significantly improve the accuracy of employee turnover prediction and the efficiency of intervention. The specific research questions are decomposed into: how to optimize the node partitioning strategy by combining the C4.5 information gain rate and the CART Gini coefficient through linear programming (Equation 12); how to balance global exploration and local exploitation in the hyperparameter search using the GWO algorithm; and whether the model can achieve the goals of increasing the retention rate of high-risk employees by over 40% and reducing the misjudgment rate by over 50% in AB testing.

In order to evaluate the performance of the model and compare different models, a set of evaluation criteria needs to be established. This study employs a confusion matrix to evaluate the model's prediction accuracy for employee turnover: employees are divided into two groups, retained employees and turnover employees, and the confusion matrix is filled according to the prediction results of the model, as shown in Table 2. The confusion matrix gives a clear view of model performance and provides a powerful tool for further model comparison.

Table 2: Confusion matrix

Predicted \ Actual | Actual resignation (positive) | Actual employment (negative) | Total
Predicted resignation (positive) | TP (True Positive) | FP (False Positive) | TP+FP
Predicted employment (negative) | FN (False Negative) | TN (True Negative) | FN+TN
Total | TP+FN | FP+TN | N

From the values in Table 2, the following indicators for comparing employee turnover prediction models are calculated:

Precision = TP / (TP + FP)   (13)

Recall = TP / (TP + FN)   (14)

Accuracy = (TP + TN) / (TP + TN + FN + FP)   (15)

Fixed random seeds (such as np.random.seed(42)) ensure the reproducibility of the Bootstrap sampling and of the random attribute selection.
True negative rate = TN / (TN + FP)   (16)

Among them, the precision rate is the proportion of samples predicted as turnover that are actually turnover; recall is the proportion of actual turnover samples that are correctly predicted as turnover; the accuracy rate measures the proportion of employees whose predicted status is consistent with their actual status; and the true negative rate is the proportion of actually retained samples that are correctly predicted as retained. The data partitioning adopts stratified sampling (training set 70% / validation set 15% / test set 15%), retaining the original loss ratio.

The model is expected to be applicable as follows. Industry scope: knowledge-intensive (IT/finance) and high-mobility industries; the AUC on Internet enterprises (the data source) has been verified to be 0.923±0.008. Enterprise scale: it is optimized for medium-sized enterprises with 500-5,000 employees, relying on 15 structured indicators (such as the salary-to-job ratio and workload). Restrictions: at least 12 months of employee behavior data are required, and predictions for new employees (<3 months) need to be supplemented with real-time behavior stream data.

Data preparation stage: this paper uses a multi-source heterogeneous data set, including structured and unstructured data, sets a time sliding window (12 months) to capture dynamic behavioral characteristics, and divides the training set and the test set 7:3 to define a 15-dimensional feature vector, which includes the following features. Basic attributes: length of service, rank, commuting distance. Behavioral indicators: monthly overtime hours, project participation. Psychological factors: satisfaction survey scores (using a 5-point Likert scale).

Gray wolf algorithm parameters: the population size is 50, the number of iterations is 100, and the convergence factor a decreases linearly (2→0). In addition, a dynamic weight adjustment mechanism is set to balance global search and local exploitation. Random forest hyperparameter space: the number of decision trees ranges over [100,500], the maximum depth over [5,15], and the minimum number of leaf samples over [1,10].

The control measures are as follows. Double-blind design: the HR execution team is unaware of the group assignment, and the model prediction results are transmitted through a neutral interface. Mixed control: six baseline differences, including salary level and performance rating, were controlled for through covariate adjustment (ANCOVA). Standardized intervention: the experimental group adopted a unified intervention protocol (such as a salary adjustment of +8% and a training duration of 20 hours per quarter), while the control group maintained routine management.

The external validity guarantees are as follows. Scenario coverage: three typical departments (sales, research and development, and operations) were selected, accounting for 72% of the sample size. Time span: both industry peak and off-peak seasons (Q2-Q3) are included to avoid cyclical deviation. Cross-enterprise validation: repeated experiments were conducted with three companies in the same industry during the same period, with a difference in effect size of less than 15%.

The deviation prevention and control mechanisms are as follows. Loss definition: unified use of the dual confirmation "30 consecutive days of absence + HR system resignation status". Confusion control: bias is reduced through the double-blind design (HR and employees are unaware of the group assignment) and covariate adjustment (matching of length of service and position level). Competitive risk management: competitive events such as promotion and job transfer are modeled separately. Sensitivity analysis: an E-value test shows that an unmeasured confounding OR ≥ 2.1 would be required to overturn the conclusion.

The benchmark models selected in this experiment are the traditional random forest (grid search optimization), the XGBoost classifier, and the logistic regression model. By fusing the gray wolf optimization algorithm and the random forest model, 12,365 employee records of a listed company from 2019 to 2024 are used to construct the prediction system, and a 6-month AB control experiment is carried out. Based on an a priori power analysis with an effect size of 0.35, α=0.05, and β=0.2, it was determined that the experimental group (GWO-RF intervention group) and the control group (traditional method group) each required 600 employees. Ultimately, 12,365 employee records were included (6,182 in the experimental group and 6,183 in the control group), ensuring a statistical power of 92.7%.
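The four indicators defined in formulas (13)-(16) follow directly from the confusion-matrix counts of Table 2. A minimal sketch; the counts below are hypothetical, chosen only to exercise the formulas.

```python
def turnover_metrics(tp, fp, fn, tn):
    """Indicators of formulas (13)-(16) from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),                  # (13)
        "recall": tp / (tp + fn),                     # (14)
        "accuracy": (tp + tn) / (tp + tn + fn + fp),  # (15)
        "true_negative_rate": tn / (tn + fp),         # (16)
    }

# Hypothetical counts for illustration
m = turnover_metrics(tp=80, fp=10, fn=20, tn=90)
print({k: round(v, 3) for k, v in m.items()})
```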
4.2 Test results

The model accuracy comparison results are shown in Table 3 below:

Table 3: Model accuracy comparison

Index | Logistic regression model | Traditional random forest | XGBoost classifier | GWO-RF model
Accuracy (%) | 68.2±2.1 | 72.3±1.8 | 75.6±2.1 | 83.7±1.2
F1-score | 0.642 | 0.681 | 0.713 | 0.802
AUC-ROC | 0.704 | 0.761 | 0.789 | 0.851
Recall rate (%) | 65.8 | 70.4 | 73.9 | 81.6
Precision (%) | 66.3 | 71.2 | 74.5 | 82.1

The calculation efficiency comparison results are shown in Table 4:

Table 4: Comparison of calculation efficiency

Index | Logistic regression model | Traditional random forest | XGBoost classifier | GWO-RF model
Training time (s) | 8.5 | 42 | 89 | 218
Single-sample prediction delay (ms) | 2.1±0.3 | 5.7±0.5 | 6.9±0.6 | 8.3±0.7
Peak memory footprint (GB) | 0.4 | 1.2 | 1.5 | 1.7

The parameter optimization effect is shown in Table 5:

Table 5: Parameter optimization effect

Parameter type | Traditional random forest initial value | GWO-RF optimized value | Optimization amplitude
Number of decision trees | 200 | 387 | +93.5%
Maximum depth | 8 | 12 | +50%
Minimum number of leaf samples | 5 | 3 | -40%
Feature sampling ratio | 0.7 | 0.82 | +17%

In the feature engineering practice of human resource prediction models, manually created features mainly fall into three types: first, derived features based on domain knowledge; second, data preprocessing operations; and third, model adaptation and transformation. Automated tools such as Featuretools can generate features through deep feature synthesis and the automatic application of primitives. Such an automation framework can significantly improve efficiency, but its limitations should be noted: initialization requires 1-2 hours to define the entity sets, about 20% of the time still has to be spent on manual feature selection, and special business indicators still need to be added by hand. It is therefore recommended to adopt a mixed strategy of "80% automatic generation + 20% manual optimization". For example, an originally 45-minute task can be reconstructed into a combined process of 10 minutes of automatic generation, 15 minutes of verification, and 5 minutes of business feature addition, which is particularly suitable for multi-table association scenarios. If such a tool were introduced into the employee turnover prediction model of this paper, it could improve the generation efficiency of structured features such as the workload calculation.

The performance of the business indicators is shown in Table 6.

Table 6: Performance of business indicators

Scene | Logistic regression model | Traditional random forest | XGBoost classifier | GWO-RF model
Recognition rate of high-risk employees (%) | 63.7 | 76.5 | 79.8 | 91.2
False positive rate (%) | 28.6 | 21.8 | 18.3 | 9.1
Feature engineering time (min) | 15 | 32 | 38 | 45

The comparison of the key ROC indicators is shown in Table 7 below, and the ROC curves are shown in Figure 4:

Table 7: Comparison of key indicators of ROC

Models | AUC value | Optimal threshold | TPR @ FPR=0.1 | FPR @ TPR=0.9
Logistic regression | 0.704 | 0.42 | 0.58 | 0.35
Traditional random forest | 0.761 | 0.38 | 0.72 | 0.22
XGBoost | 0.789 | 0.35 | 0.81 | 0.18
GWO-RF | 0.851 | 0.31 | 0.89 | 0.12

Figure 4: ROC curves (TPR vs. FPR) for logistic regression, traditional random forest, XGBoost and GWO-RF
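As a small illustration of the first feature type discussed above (derived features based on domain knowledge), the workload indicator of formula (1) could be generated as follows; the record layout and field names are hypothetical.

```python
def workload(total_overtime_hours, months=12):
    # Formula (1): press = totalovertime / months, 12-month statistical window
    return total_overtime_hours / months

# Hypothetical employee records
employees = [{"id": 1, "total_overtime_hours": 240},
             {"id": 2, "total_overtime_hours": 36}]
for e in employees:
    e["press"] = workload(e["total_overtime_hours"])
print([e["press"] for e in employees])  # [20.0, 3.0]
```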
In the key indicator comparison test, the control group adopts the employee management mechanism currently used by the enterprise, i.e. the current mechanism; this group serves as the benchmark for comparison with the experimental group. The experimental group uses the GWO-RF solution for employee management. In both groups, key indicator data are collected and recorded, including the high-risk employee retention rate, single-case intervention cost, employee satisfaction, misjudgment rate, and model iteration cycle. The data of the control group and the experimental group are then compared to analyze the performance of the GWO-RF solution on each indicator; by calculating the improvement or reduction, the effect of the GWO-RF solution relative to the current mechanism is quantified. The comparison of key indicators between the control group and the experimental group is shown in Table 8 below.

Table 8: Comparison of key indicators between the control group and the experimental group

Evaluation dimension | Current mechanism (control group) | GWO-RF protocol (experimental group) | Improvement range
High-risk employee retention rate | 63.20% | 89.70% | ↑ +41.9%
Single intervention cost (yuan) | 2,450 | 1,120 | ↓ -54.3%
Employee satisfaction | 68.5 | 82.3 | ↑ +13.8
False positive rate | 22.70% | 9.10% | ↓ -59.9%
Model iteration cycle | 12 months | 3 months | ↓ -75.0%
The statistical parameters of satisfaction, iteration cycle, and retention rate were analyzed, as shown in Table 9 below:

Table 9: Statistical significance analysis for satisfaction, iteration cycle, and retention rate

Index | Experimental group (n=612) | Control group (n=608) | Difference | 95% CI | P value | Effect size
Satisfaction rating | 4.2±0.6 | 3.1±0.8 | +1.1 | (0.8 to 1.4) | <0.001 | d=1.56
Iteration cycle | 2.3±0.9 | 9.2±2.1 | -6.9 | (-7.5 to -6.3) | <0.001 | η²=0.72
Retention rate | 41.9% | 28.5% | +13.4% | (11.2% to 15.6%) | 0.002 | OR=1.84

Table 10 shows the experimental results verifying the contribution of LPRF node splitting within the GWO-RF model.

Table 10: Experimental results of LPRF node splitting contribution verification in the GWO-RF model

Evaluation dimension | Complete GWO-RF (including LPRF) | GWO-RF with LPRF removed | Traditional random forest | Increase amplitude
Prediction accuracy: AUC-ROC | 0.872±0.011 | 0.843±0.014 | 0.801±0.018 | +3.4% (vs non-LPRF)
Prediction accuracy: TOP10% high-risk employee hit rate | 89.2% | 69.5% | 62.1% | +19.7 percentage points
Prediction accuracy: promotion delay group recall rate | 78.6% | 55.2% | 48.9% | +23.4 percentage points
Explanatory nature: SHAP feature overlap | 82.3% | 67.5% | 53.8% | +14.8 percentage points
Explanatory nature: proportion of structural factor selection | 68.2% | 54.7% | 42.3% | +13.5 percentage points
Calculation efficiency: single tree training time | 1.86 s | 1.57 s | 1.42 s | time consumption +18.6%
Calculation efficiency: convergence iteration times | 23 rounds | 37 rounds | 41 rounds | -37%
Business value: improvement in retention rate after intervention | 41.9% | 32.7% | 28.5% | +9.2 percentage points
Business value: single intervention cost | ¥1,243 | ¥1,815 | ¥2,130 | -31.4%
Significance test | P=0.008 (overall) | P=0.152 (subgroup with less than 3 years of work experience) | - | passes the 95% confidence test

On the basis of the original business KPI evaluation, double verification with the McNemar test and the chi-square test is added. The experimental group (GWO-RF intervention group) and the control group (traditional method group) each had 6,182 people, and the data collection period covered Q2-Q3 of 2023. The constructed confusion matrix cross-tabulation is shown in Table 11 below:

Table 11: Confusion matrix cross-tabulation

Forecast results | Actual loss | Actual retention | Total
GWO-RF predicted loss | 412 | 158 | 570
Traditional model predicted loss | 297 | 273 | 570
Total | 709 | 431 | 1140

The comparative classification performance data are shown in Table 12:

Table 12: Classification performance comparison data

Evaluation dimension | GWO-RF group | Traditional group | Significant difference (p)
Accuracy | 86.70% | 81.20% | <0.001
Recall | 89.20% | 76.50% | <0.001
Error rate | 13.30% | 18.80% | 0.002
F1-score | 0.841 | 0.792 | 0.008

A further experiment verifies the performance degradation of the LPRF node splitting algorithm (Formula 12) in the employee population with less than 1 year of service, and quantifies the sensitivity of the model to small-sample data. The key focuses are: the degree to which feature sparsity damages the linear combination of the Gini coefficient and the information gain rate (Equation 12); and the distribution bias of Bootstrap sampling (algorithm step 1) when n<100.

The test set is divided by length of service: Group A (0-3 months), sample size n=30; Group B (3-6 months), n=50; Group C (6-12 months), n=80; control group (work experience >1 year), n=200.
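The McNemar verification mentioned above operates on the two discordant cells of a paired table. A minimal sketch with the continuity-corrected statistic; the discordant counts b and c below are hypothetical, since Table 11 does not report them directly, and the df=1 chi-square p-value is obtained via the complementary error function.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """Continuity-corrected McNemar statistic on the discordant cells
    b and c of a paired table, with the df=1 chi-square p-value."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # chi-square survival function, df = 1
    return chi2, p

# Hypothetical discordant counts: b = cases only the traditional model
# got wrong, c = cases only GWO-RF got wrong
chi2, p = mcnemar(b=40, c=15)
print(round(chi2, 2), p < 0.01)
```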
the linear combination of Gini coefficient and The ablation variable settings are shown in Table 13: Table 13: Ablation variable settings experimental group Ablation procedure Theoretical basis Remove salary position index (equation Group 1 Incomplete salary data for new employees 3) Disable grey wolf optimization Small samples are prone to getting stuck in local Group 2 parameter search optima Fixed linear programming coefficients Group 3 The necessity of verifying dynamic combinations (α=0.5, β=0.5) The experimental results of GWO-RF model in Table 14 below: ablation (small sample scenario verification) are shown 284 Informatica 49 (2025) 269–290 H. Zhang Table 14: Results of GWO-RF model ablation experiment (small sample scenario validation) Sample Sample Dissolve Accurac Recall AUC- Gini coefficient F1-score group size (n) variables y (%) rate (%) ROC fluctuation (Δ Gini) complete 30 72.3 68.5 0.703 0.741 0.12 model Remove salary 65.1▼9. 61.2▼10. 0.631▼1 0.682▼ 0.21▲75.0% position 9% 6% 0.2% 8.0% Group A index Disable Grey Wolf 69.8▼3. 64.7▼5.5 0.671▼4 0.715▼ 0.15▲25.0% Optimizatio 5% % .6% 3.5% n complete 50 78.6 75.2 0.768 0.793 0.09 model Fixed linear programmin 74.3▼5. 70.1▼6.8 0.721▼6 0.752▼ 0.13▲44.4% Group B g 5% % .1% 5.2% coefficients 20% 71.9▼8. 67.4▼10. 
0.695▼9 0.728▼ Bootstrap 0.17▲88.9% 5% 4% .5% 8.2% sampling complete Group C 80 82.4 79.8 0.811 0.834 0.07 model The optimized disabling effect of Grey Wolf is shown in Table 15 below: Table 15: Optimization and disabling effect of Grey Wolf parameter complete model After ablation Change amplitude Convergence 18.3 32.7 78.70% iteration times Tree depth standard 2.1 3.8 81.00% deviation Feature selection 0.15 0.28 86.70% bias Based on the LPRF algorithm architecture and salary position index (Equation 3), an initial parameter set ablation experimental results, a cross validation for grey wolf optimization (α=0.53 ± 0.07), and linear experiment is designed as follows: programming constraints (α+β=1 in Equation 12). Using a stratified 50% cross validation (with a 10% The evaluation index matrix is shown in Table 16 discount for employees with less than 1 year of service), below: Each training set includes: a complete sample of GWO-RF: A Grey Wolf Optimized Random Forest Model for… Informatica 49 (2025) 269–290 285 Table 16: Evaluation Indicator Matrix Indicator type Calculation formula Monitoring focus Predicted AUC-ROC mean ± standard deviation Convertible volatility ≤ 15% performance Characteristic Coefficient of variation (CV) of salary position CV<0.25 (parameter of equation 3) stability coefficient algorithm GWO iteration times are extremely poor Maximum/minimum value ≤ 2.5 times convergence The experimental results of the k-fold cross below: validation of the GWO-RF model are shown in Table 17 Table 17: Results of k-fold cross validation experiment for GWO-RF model Evaluation First Second Third Fourth Fifth Mean ± standard dimensions discount discount discount discount discount deviation Predicted performance AUC-ROC 0.872 0.891 0.885 0.867 0.903 0.884±0.014 Recall rate (work experience<1 0.76 0.81 0.79 0.73 0.82 0.782±0.036 year) Characteristic stability Salary Position Coefficient 0.53 0.51 0.49 0.55 0.5 0.516±0.024 (Equation 3) Gini weight β 0.62 0.58 0.61 
4.3 Analysis and discussion

The experimental data from Tables 2-5 show that the GWO-RF model is significantly better than the traditional random forest, XGBoost and logistic regression models in prediction accuracy (accuracy 83.7%, F1-score 0.802) and business indicators (high-risk employee recognition rate 91.2%), although the computational cost (training time 218 seconds) increases accordingly. This advantage mainly stems from the dynamic parameter optimization mechanism of the grey wolf algorithm: 1) the number of decision trees is increased by 93.5% through the nonlinear search strategy, effectively reducing OOB errors; 2) the feature sampling ratio is optimized to 0.82, enhancing the generalization ability of the model; 3) XGBoost outperforms in the accuracy-efficiency balance (accuracy 75.6%, training time 89 seconds), while logistic regression retains the advantage of the lowest prediction delay (2.1 ms). This difference essentially reflects the trade-off of algorithm design concepts: metaheuristic algorithms accept higher computational complexity in exchange for globally better solutions, while gradient boosting frameworks prioritize iterative efficiency. It is therefore advisable to select a model based on hardware conditions at deployment time: XGBoost for real-time systems, GWO-RF for high-precision scenarios.

In the field of human resource technology, a single prediction delay of 8.3 ms is practical for employee turnover prediction systems. Although this delay is higher than the microsecond-level standard for industrial-grade real-time systems, it is well below the 200 ms threshold for general AI systems and fully meets the 50-200 ms response requirement of human resource management systems. This latency is acceptable in batch prediction scenarios and also supports a smooth user experience in real-time interaction (theoretically about 120 QPS). The results show that the intelligent warning model based on GWO-RF, by integrating the grey wolf optimization algorithm with random forest, effectively improves the accuracy of identifying high-risk employees, increasing the retention rate by 41.9% and reducing the misjudgment rate by 59.9%. The delay may only become a bottleneck in large-scale real-time data stream processing, where performance can be further improved through lightweight models and prediction-result caching. Overall, an 8.3 ms delay is within a reasonable range for human resources technology and does not undermine the system's claim to real-time operation, especially since the management benefits brought by the model far exceed the marginal benefit of microsecond-level delay optimization.
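The 8.3 ms / 120 QPS relationship above is simple arithmetic, and per-prediction latency can be measured with wall-clock timing. A minimal sketch, where the `predict` callable is a stand-in for any trained model (not code from the paper):

```python
import time

def measure_latency_ms(predict, batch, repeats=200):
    """Average wall-clock latency per single prediction, in milliseconds."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        for x in batch:
            predict(x)
    elapsed_ms = (time.perf_counter() - t0) * 1000.0
    return elapsed_ms / (repeats * len(batch))

def theoretical_qps(latency_ms):
    """Sequential throughput ceiling implied by a per-call latency:
    8.3 ms per call gives 1000 / 8.3, i.e. about the 120 QPS cited."""
    return 1000.0 / latency_ms
```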
In Table 7, the AUC of the GWO-RF model is 18.6%-21.0% ahead of the other models, and its TPR reaches 0.89 at FPR = 0.1, significantly better than the 0.817 of XGBoost. The grey wolf algorithm optimizes the subtree depth and feature sampling rate of the random forest and enhances recognition of minority classes, which explains this performance. The slope of the XGBoost curve is largest in the middle, indicating that its discrimination is strongest in the medium-risk threshold range; its loss function, based on a second-order Taylor expansion, models feature interactions more accurately. The curve of the traditional random forest rises in a step-like manner, reflecting the voting mechanism of multiple decision trees; there is also an over-smoothing phenomenon under default parameters, and sharpness needs to be improved by tuning max_features. The curve of the logistic regression model is close to the diagonal, as the linear decision boundary limits its ability to capture nonlinear patterns, but its FPR is the lowest (0.35) at threshold = 0.42, making it suitable for scenarios that prioritize low false positives.

In Table 8, the performance improvement of the GWO-RF scheme (experimental group) over the current mechanism (control group) differs across evaluation dimensions. The cost of single-case intervention decreased significantly, from 2,450 yuan to 1,120 yuan, a decrease of 54.3%, showing that the GWO-RF scheme performs well in reducing intervention costs, possibly due to process optimization or more economical intervention measures. At the same time, employee satisfaction increased from 68.5 to 82.3, a gain of 13.8 points, indicating a significant effect on satisfaction, perhaps because the scheme better meets employees' needs and expectations. In addition, the misjudgment rate decreased markedly, from 22.7% to 9.1%, a drop of 59.9%, reflecting a significant improvement in accuracy, which may be due to model optimization or improved data quality. Finally, the model iteration cycle was shortened from 12 months to 3 months, a decrease of 75%, showing that the GWO-RF scheme is more efficient in model updating and optimization. Overall, the GWO-RF scheme showed significant advantages in all aspects, especially high-risk employee retention, single-case intervention cost, employee satisfaction, false positive rate, and model iteration cycle, and is worth further promotion and application.

In Table 9, the experimental group scored 4.2 ± 0.6 while the control group scored 3.1 ± 0.8, a difference of +1.1 points with a 95% confidence interval of (0.8, 1.4). The p-value was less than 0.001, indicating a highly significant difference, and the effect size d = 1.56 indicates a substantial increase in satisfaction in the experimental group. The experimental group has an iteration cycle of 2.3 ± 0.9, while the control group's is 9.2 ± 2.1; the experimental group is 6.9 units shorter, with a 95% confidence interval of (-7.5, -6.3) and a p-value < 0.001, again highly significant, with η² = 0.7 indicating a large effect. The retention rate of the experimental group was 41.9%, versus 28.5% for the control group, an improvement of 13.4 percentage points with a 95% confidence interval of (11.2%, 15.6%), a p-value of 0.002 and OR = 1.84, indicating that the retention rate of the experimental group has significantly improved.

The advantages of the GWO-RF model stem from its innovative algorithm architecture and optimized business adaptability:

(1) Improvement of the node splitting mechanism: traditional random forests use a single splitting criterion, while GWO-RF combines the C4.5 information gain rate and the CART Gini coefficient through linear programming, dynamically adapting to scenarios with a mixture of discrete and continuous features and overcoming traditional models' preference for specific data types.
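Item (1) can be made concrete: score each candidate split by a convex combination of the C4.5 gain ratio and the CART Gini impurity decrease under the constraint α + β = 1. The sketch below is an illustrative reconstruction, not the paper's implementation (which solves for the weights by linear programming); α = 0.53 follows the grey-wolf-optimized value reported in the cross-validation experiment.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def combined_split_score(left, right, alpha=0.53, beta=0.47):
    """Score one candidate binary split (child label lists `left`/`right`)
    by alpha * C4.5 gain ratio + beta * CART Gini impurity decrease."""
    parent = left + right
    n, nl, nr = len(parent), len(left), len(right)
    info_gain = entropy(parent) - (nl / n) * entropy(left) - (nr / n) * entropy(right)
    split_info = -sum((m / n) * log2(m / n) for m in (nl, nr))
    gain_ratio = info_gain / split_info if split_info else 0.0
    gini_decrease = gini(parent) - (nl / n) * gini(left) - (nr / n) * gini(right)
    return alpha * gain_ratio + beta * gini_decrease
```

A split that separates the classes perfectly maximizes both components, while an uninformative split scores zero under either criterion, so the combined score preserves the ranking behavior of each.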
Compared to black-box models such as LSTM, its splitting process is highly interpretable and can output feature weights, directly guiding human resource intervention measures.

(2) Parameter optimization efficiency: the grey wolf algorithm searches hyperparameters globally, reducing model training time by 75% compared to grid search. Traditional logistic regression requires manual feature engineering, while deep learning relies on GPU computing power and has high inference latency (>200 ms).

(3) Data adaptability: to cope with the limited structured data of small and medium-sized enterprises, the model improves small-sample robustness through Bootstrap resampling and random feature selection, achieving an AUC of 0.923 ± 0.008 on 12,365 data points, 8.6 percentage points higher than the benchmark random forest (AUC 0.85).

(4) Cost control: the splitting strategy under linear programming constraints reduces overfitting, yielding a 59.9% decrease in misjudgment rate and a 54.3% decrease in single-case intervention cost, whereas traditional methods such as Cox models incur high intervention lag costs due to their static analysis characteristics. These innovations let GWO-RF combine predictive accuracy with practical feasibility, but further integration of real-time data stream processing is needed to enhance predictive capability for new employees (<3 months).

In Table 9, the GWO-RF model proposed in this article demonstrates significant advantages in predicting employee turnover. First, in terms of prediction accuracy, integrating the C4.5 and CART splitting criteria through LPRF linear programming improves AUC-ROC to 0.872 (3.4% higher than the non-LPRF version) and raises the hit rate of high-risk employee identification by 19.7 percentage points. Second, in terms of interpretability, the SHAP feature overlap reached 82.3%, and 68.2% of split choices focused on structural factors such as salary competitiveness, which is highly consistent with HR management theory. Third, although computation per split increased by 18.6%, the improved split quality reduced overall training iterations by 37%. Finally, in actual business operations, the employee retention rate increased by 9.2 percentage points and intervention costs fell by 31.4%. The model innovatively optimizes random forest parameters through the grey wolf algorithm and dynamically adjusts node splitting rules, but its improvement for employees with less than 3 years of service is limited and needs to be complemented with a time series model.
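The Bootstrap resampling named in item (3), and stress-tested at a 20% ratio in the ablation experiments, can be illustrated with a small stdlib-only sketch showing how the resampling ratio controls the spread of a resampled statistic. Function names are ours, not the paper's:

```python
import random
import statistics

def bootstrap_spread(data, stat, ratio=1.0, n_boot=500, seed=7):
    """Standard deviation of `stat` across bootstrap resamples drawn with
    replacement at `ratio` times the original sample size. Shrinking the
    ratio (e.g. to 0.2, as in the ablation) enlarges the spread, i.e.
    widens the implied confidence interval of the estimated statistic."""
    rng = random.Random(seed)
    m = max(1, int(ratio * len(data)))
    resampled = [stat([rng.choice(data) for _ in range(m)]) for _ in range(n_boot)]
    return statistics.stdev(resampled)
```

Running it on any numeric column shows the 20%-ratio spread exceeding the full-ratio spread, mirroring the reported widening of the information gain rate's confidence interval.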
Table 12 compares the performance of the GWO-RF model and the traditional model in predicting employee turnover. The data show that the GWO-RF group is significantly better than the traditional group on key indicators such as accuracy (86.7% vs 81.2%), recall (89.2% vs 76.5%), and F1 score (0.841 vs 0.792) (p<0.001), while the misjudgment rate is reduced to 13.3% (18.8% in the traditional group). These improvements are statistically significant (McNemar test, χ² = 43.21, p < 0.001), with effect sizes Cohen's d > 0.5 reaching a moderate or higher level. Sensitivity analysis (E-value test, OR ≥ 2.3) confirms the robustness of the results, indicating that the GWO-RF algorithm achieves a comprehensive improvement in predictive performance through LPRF node splitting and grey wolf optimization.

The ablation experiments in Tables 14 and 15 validated the performance degradation pattern of the GWO-RF model in small-sample scenarios: when the sample size n < 50, removing the salary position index (a key feature for employees with less than 1 year of service) caused a 9.9% drop in accuracy and a 75% increase in Gini coefficient fluctuation, showing this feature's sensitivity to sparse data. After disabling grey wolf optimization, the number of iterations to model convergence increased by 78.7% and the standard deviation of decision tree depth increased by 81%, highlighting the importance of parameter search for small-sample stability. When the Bootstrap sampling ratio was reduced to 20%, the confidence interval of the information gain rate expanded by 43% and the failure rate of the linear programming solution rose from 1.2% to 7.9%, confirming that data distribution bias can undermine the robustness of the LPRF node splitting algorithm (Equation 12). The experiments show that the model needs optimized feature selection strategies and dynamic weighting mechanisms for small samples.

According to the 5-fold cross validation results of the GWO-RF model (Table 17), the model demonstrates strong robustness and practicality in predicting employee turnover. In predictive performance, the average AUC-ROC is 0.884 ± 0.014, indicating stable discriminative ability for identifying high-risk employees; however, the fluctuation of the recall rate (range 0.09) in the group with less than 1 year of work experience suggests the need for stronger small-sample feature enhancement strategies. In feature stability, the coefficient of variation of the salary position coefficient (Equation 3) is only 4.7%, verifying the rationality of the indicator design in Section 3.1, and the Gini weight β (Equation 12) satisfies the constraint |α − β| ≤ 0.2 in all folds, indicating the effectiveness of the linear programming combination coefficients. In algorithm efficiency, the range of GWO iteration counts across folds is 24 and the LPRF solving delay is ≤ 53 ms, meeting the response requirements of real-time warning systems. Overall, cross validation confirms the advantages of the GWO-RF model in integrating grey wolf optimization with the improved random forest (LPRF algorithm), but feature engineering for data stratified by seniority still needs optimization to further enhance generalization ability.

In the development of employee turnover prediction models, the issues of model fairness and bias do require special attention, especially in sensitive human resource scenarios involving protected attributes such as gender and age. According to the appendix, although the paper does not directly discuss bias analysis, the GWO-RF hybrid model optimizes random forest parameters through the grey wolf algorithm, which objectively alleviates some bias problems of traditional machine learning models: the ensemble nature of random forests reduces the overfitting risk of any single decision tree, and the LPRF node splitting algorithm based on the Gini coefficient and information gain rate weighs the contributions of features more evenly through its linear programming combination.

However, the model may still introduce bias indirectly through proxy variables such as salary position (Equation 3) and promotion delay duration; for example, female employees may have their retention probability underestimated because of historical promotion data bias. Three dimensions of fairness testing are therefore recommended. First, feature importance analysis should verify that protected attributes do not carry dominant weight. Second, adversarial debiasing techniques should incorporate fairness constraints into the loss function. Finally, differential impact tests should be established to ensure that the model's predictive performance does not differ by more than 15% across populations. These measures can help meet EU GDPR compliance requirements for algorithmic fairness and prevent the model from amplifying existing structural biases in the organization.
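The differential impact test recommended above (performance gap of at most 15% across protected groups) reduces to comparing a per-group metric. A minimal sketch with illustrative names; accuracy is used as the metric here as an assumption, since the text does not prescribe a specific one:

```python
def group_metric_gap(y_true, y_pred, groups, tolerance=0.15):
    """Per-group accuracy, the max-min gap across groups, and whether the
    gap satisfies the differential impact criterion (gap <= tolerance)."""
    acc = {}
    for g in set(groups):
        pairs = [(t, p) for t, p, gg in zip(y_true, y_pred, groups) if gg == g]
        acc[g] = sum(t == p for t, p in pairs) / len(pairs)
    gap = max(acc.values()) - min(acc.values())
    return acc, gap, gap <= tolerance
```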
The GWO-LPRF employee turnover prediction model proposed in this study significantly improves prediction performance by integrating the grey wolf optimization algorithm with an improved random forest. Specifically, the model adopts the Price-Mueller theoretical framework to construct an evaluation system of 15 indicators covering individual factors (such as age and education level), environmental factors (industry type), and structural factors (workload, salary position, etc.). The key technological breakthrough lies in innovatively combining the information gain rate of the C4.5 algorithm with the Gini coefficient of the CART algorithm through linear programming (Equation 12) to form the LPRF node splitting strategy, making the selection of splitting attributes for decision trees more accurate. The model is validated on data from 12,365 employees of a listed company. The results show significant effects in A/B testing, increasing the retention rate of high-risk employees by 41.9% and reducing intervention costs by 54.3%. After parameter optimization with the grey wolf algorithm, the model iteration cycle was shortened by 75%. This achievement provides human resource management with an intelligent decision-making tool that combines predictive accuracy and interpretability.

Taken together, the GWO-RF model showed significant advantages in the employee management experiment: it optimizes random forest parameters through the grey wolf algorithm, achieving a 41.9% increase in the retention rate of high-risk employees, a 54.3% reduction in intervention costs, and a 13.8-point increase in satisfaction, while reducing the misjudgment rate by 59.9% and shortening the model iteration cycle by 75%. Its core advantages lie in dynamic optimization capability and efficient feature engineering, but it depends strongly on the quality of historical data and generalizes insufficiently in small-sample scenarios. Subsequent improvements should focus on three aspects: ① introducing transfer learning to enhance small-sample adaptability, ② developing real-time data cleaning modules to improve input quality, and ③ building a hybrid model architecture (such as fusing an LSTM) to capture time-series behavioral characteristics.

5 Conclusion

By comparing the performance of the GWO-RF model and the traditional management mechanism in employee management, this study draws the following conclusions. The GWO-RF model shows significant advantages on multiple key indicators. First, the model raises the retention rate of high-risk employees to 89.7%, 41.9 percentage points higher than the current mechanism, proving its excellent effect on talent retention. Second, intervention cost is significantly reduced through algorithm optimization, and employee satisfaction increases by 13.8 points, verifying the economic and humanistic value of the model. Third, the model keeps the misjudgment rate at 9.1%, 59.9% lower than the control group, and shortens the iteration cycle to 3 months, reflecting the unique advantages of intelligent algorithms in accurate prediction and rapid response. These improvements result from the grey wolf algorithm's dynamic optimization of random forest parameters and feature engineering's accurate capture of management pain points.

However, the model still has three limitations. First, it does not adapt well to small samples and to data on new employees. Second, its real-time data cleaning mechanism needs improvement. Third, its ability to model the time series of complex behavioral characteristics is limited. Subsequent research will therefore focus on developing transfer learning modules to enhance generalization, building an automated data quality monitoring system, and introducing time series neural networks to build a hybrid model architecture.
https://doi.org/10.31449/inf.v49i16.9243 Informatica 49 (2025) 291–302 291

Comparative Analysis of Machine Learning Models for Water Quality Prediction Using Regional Monitoring Data

Ying Xiong
Chongqing Water Resources and Electric Engineering College, Chongqing 402160, China
E-mail: xiong-ying188@hotmail.com

Keywords: water quality prediction, machine learning, decision tree, SVM, random forest, neural network

Received: May 15, 2025

This study investigates the comparative performance of four classical machine learning algorithms—Decision Tree, Support Vector Machine (SVM), Random Forest, and Neural Network—on water quality prediction tasks using a dataset comprising 1,000 real-time sensor data points from five distinct geographic regions. The dataset includes critical water parameters such as pH, ammonia nitrogen, dissolved oxygen, total phosphorus, COD, and BOD. Preprocessing steps include missing value imputation, outlier removal using boxplot analysis, normalization, and correlation-based feature selection.
Each model is tuned through grid search for optimal performance. Experimental results show that the Neural Network achieved the lowest mean squared error (MSE = 0.047) and highest coefficient of determination (R² = 0.976), outperforming the other models. The Random Forest showed superior robustness to overfitting, while SVM offered strong results on high-dimensional subsets. Decision Trees, although less accurate (MSE = 0.130), provided high interpretability. This comparison provides practical guidance for selecting machine learning models in environmental monitoring systems, where trade-offs between accuracy, interpretability, and computational cost are essential.

Povzetek: A comparison of several methods is made: decision tree, SVM, random forest and neural network for predicting water quality from five regions. The neural network performs best, while the random forest is the most stable, SVM reliable, and the decision tree the most interpretable.

1 Introduction

Water pollution affects human health and the stability of ecosystems. As industrialization and urbanization accelerate, the pollution of water sources is becoming more and more serious. Traditional water quality monitoring methods rely on manual sampling and laboratory analysis, which are inefficient, slow, and incapable of real-time monitoring. With the development of artificial intelligence technology, machine learning, as an efficient data analysis tool, can learn from and forecast large volumes of water quality data to provide real-time and accurate water quality early warning.

Research in water quality prediction and monitoring has developed rapidly in recent years, and machine learning has been widely applied to water quality data analysis. Eyring et al. explored the potential of combining climate modeling with machine learning, arguing that machine learning could drive innovation in environmental data processing [1]. Bren and Ryan used machine learning to analyze water quality monitoring data when studying water quality in streams of the eastern Highlands; their models could accurately capture nonlinear relationships in water quality changes, and the study highlights the application potential of machine learning in complex water quality data analysis [2]. Li et al. studied the impact of climate change on river water quality using machine learning for data analysis, finding that machine learning can cope with water quality prediction under changes in multiple variables and complex environmental factors [3]. Aalipour et al. analyzed the impact of landscape changes on river water quality; machine learning models were able to process complex environmental data and provide accurate water quality predictions [4]. Stevens et al. reviewed the application of machine learning in electronic health record screening, suggesting the potential of integrated machine learning approaches in several fields [5]. Zou et al. summarized the application of machine learning in precision medicine therapy, noting that machine learning can process complex multidimensional data and extract key influencing factors [6]. Zainurin et al. reviewed in detail the progress of water quality monitoring based on various sensor technologies and emphasized the role of machine learning in real-time processing of water quality data [7].

292 Informatica 49 (2025) 291–302 Y. Xiong

Recent years have seen an increasing number of studies applying machine learning techniques to water quality prediction in diverse regional and environmental contexts. Quiroz-Martinez et al. [8] proposed a big-data-driven architecture for aquaculture water quality prediction, focusing on real-time integration and scalability; their system emphasizes the structural design of prediction frameworks rather than algorithm benchmarking. In northeastern Thailand, Uypatchawong and Chanamarn [9] demonstrated improved prediction efficiency using machine learning models such as Random Forest and Support Vector Machines, underscoring the significance of regional hydrological features and data preprocessing in boosting model performance. In a complex environmental scenario, Huang et al. [10] developed a water quality prediction model for the downstream Dongjiang River Basin, incorporating the joint impacts of water intakes, pollution sources, and climate variability, and utilized spatial-temporal data fusion and ensemble learning to capture dynamic interactions across multiple influencing factors. Wu and Zhang [11] focused on the Yangtze River Delta, applying machine learning within the governance framework of China's River Chief System; their study highlights policy-driven data availability and found that SVM and ANN models are particularly effective in capturing variations in high-density industrial and urban runoff areas. Despite the growing body of literature, most existing studies focus either on a single prediction model or on narrowly scoped geographical settings. Few works offer a controlled, algorithm-level comparative analysis using standardized metrics across classical models such as Decision Tree, SVM, Random Forest, and Neural Network on multi-parametric datasets. This study addresses that gap by benchmarking these models on a five-region dataset using consistent preprocessing, hyperparameter tuning, and evaluation standards.

This study fills a methodological gap in the current literature by providing a standardized comparison of four classical machine learning algorithms on a uniform, multi-regional dataset. Most prior research focuses either on a single water parameter or uses proprietary datasets lacking reproducibility. By comparing model interpretability, error profiles, and training costs across diverse indicators (e.g., DO, COD, NH₃-N), this work contributes practical insights for regional water monitoring deployment.

Table 1 summarizes representative studies that applied machine learning to water quality or similar environmental data prediction tasks, outlining the datasets used, applied models, key evaluation metrics, and findings. This comparison reveals that while some studies employ modern deep learning models or domain-specific architectures, limited work provides a direct comparative evaluation of classical ML models using diverse yet small-scale environmental datasets—precisely the focus of our study.

This study analyzes the application of machine learning algorithms in water quality prediction, compares the performance of different algorithms, and identifies the best water quality prediction model. Machine learning algorithms are used to analyze and model water quality data: water quality data are collected from different regions and preprocessed; a variety of machine learning algorithms are selected; and models are designed, trained, and evaluated for water quality prediction. Indexes such as mean squared error (MSE) and the coefficient of determination (R²) are used to evaluate model performance, compare algorithms, analyze their advantages and disadvantages, and select the most suitable algorithm for water quality prediction. The adaptability of each algorithm to different water quality parameters is also studied, and optimization paths for water quality prediction are explored. This work enriches theoretical research in the field of water quality monitoring, provides a technical scheme for practical application, and has high social value and application prospects.
in the field of water quality monitoring, provides a This study fills a methodological gap in the current technical scheme for practical application, and has high literature by providing a standardized comparison of four social value and application prospect. Table 1: Summary of previous research on ML in water quality prediction Study Dataset Description Models Used Evaluation Metrics Key Findings ML models captured Stream water Bren & Ryan [2] SVM, k-NN Accuracy, RMSE nonlinearity in stream (regional, 500 pts) pollution River systems with ML effective in multi- Li et al. [3] RF, ANN R², RMSE climate inputs variable prediction Landscape shape River data with land Aalipour et al. [4] RF, SVM MAE, R² significantly affects patches prediction Neural network Five zones (urban to This Study DT, SVM, RF, NN MSE, R² superior in nonlinear industrial), 1000 pts prediction Comparative Analysis of Machine Learning Models for Water… Informatica 49 (2025) 291–302 293 Table 2: Source of water quality data and sample overview Region Sample Size Water Quality Parameters Data Source pH, Dissolved Oxygen, Ammonia Water Quality Area A 200 Nitrogen, Total Phosphorus Monitoring Station Environmental Area B 200 pH, COD, BOD, Ammonia Nitrogen Protection Department Dissolved Oxygen, pH, Total Phosphorus, Area C 200 Water Affairs Company COD Dissolved Oxygen, Ammonia Nitrogen, Water Quality Testing Area D 200 pH, BOD Platform pH, Ammonia Nitrogen, Total Environmental Area E 200 Phosphorus, COD Monitoring Center This study aims to address the following research enhance model robustness and cross-context validity. question: Which labelical machine learning algorithm offers the best trade-off between predictive accuracy and 2.1.2 Data preprocessing computational efficiency for small-scale, region-specific After data collection, pre-processing is performed. water quality datasets? 
By formulating and evaluating Processing missing values, for a small amount of missing models under consistent conditions, the study data, use the mean filling method and interpolation hypothesizes that deep neural networks will provide method to fill; For variables with more missing data, the superior performance in accuracy, while ensemble features are removed to ensure the integrity of the data methods like Random Forest may offer better set. The identification and processing of outliers adopt the generalization with moderate cost. method based on box diagram, set reasonable upper and lower limits, and correct or delete the data that exceeds 2 Materials and methods the range [13]. In view of the dimensionality inconsistency of different water quality parameters, standardized treatment was used to scale the numerical 2.1 Data collection and sample selection range of each feature to a unified scale, so as to avoid the 2.1.1 Data source deviation of the training results of the model due to This study uses water quality data from five different dimensional differences. In terms of feature selection, the regions, covering a variety of environmental types method based on correlation analysis is used to calculate including urban, rural and industrial areas. It is divided the Pearson correlation coefficient between various water into zones A, B, C, D and E, covering different water quality parameters and select the features with strong quality monitoring points to ensure the diversity and correlation with target variables (such as water quality representativeness of data. For example, pH value, changes). The features are screened by Chi-square test dissolved oxygen, ammonia nitrogen, total phosphorus, and information gain, and redundant or irrelevant chemical oxygen demand (COD), biochemical oxygen variables are removed to improve the accuracy and demand (BOD), etc., the specific data amount is 200 for training efficiency of the model. 
Feature selection was each region, a total of 1000 data [12]. The data is conducted using both chi-square testing and Pearson provided by local water quality monitoring agencies and correlation filtering. The chi-square test evaluated environmental protection departments and collected in statistical independence between discrete features and real time through sensor systems. As shown in Table 1, categorical target representations, with features showing these data reflect the water quality changes in different p-values greater than 0.05 removed. Pearson correlation regions in different time periods, and provide effective coefficients below 0.3 with the output variable indicated training samples for the construction of water quality weak linear relevance and were also excluded. Based on prediction models. these criteria, features such as conductivity and total The dataset employed in this study consists of 1,000 nitrogen were eliminated. The final set of retained samples sourced from five regions, which, while diverse, features included pH, ammonia nitrogen, dissolved constitutes a relatively limited dataset. This limitation oxygen, COD, and total phosphorus. potentially impacts the generalizability of the model. To address this, future work will consider the integration of 2.1.3 Data division synthetic data generation techniques (e.g., SMOTE or GAN-based augmentation) or the inclusion of additional The data set is divided into training set, verification set datasets from broader spatial or temporal domains to and test set in proportion, as shown in Table 2 below, with 294 Informatica 49 (2025) 291–302 Y. Xiong training set accounting for 60%, verification set 2.2.2 Model architecture design accounting for 20%, and test set accounting for 20%. 
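The standardization and correlation-screening steps described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' code: the function names and the toy data below are ours, and only the 0.3 Pearson threshold comes from the text.

```python
import statistics

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length, non-constant sequences.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def standardize(column):
    # Scale a feature to zero mean and unit variance (z-score).
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]

def select_features(features, target, threshold=0.3):
    # Keep features whose |Pearson r| with the target meets the threshold,
    # mirroring the 0.3 cut-off used in the paper.
    return {name: col for name, col in features.items()
            if abs(pearson(col, target)) >= threshold}
```

For example, a feature varying linearly with the target is retained, while an uncorrelated one is dropped; the chi-square screening of discrete features would be applied analogously before training.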
2.1.3 Data division

The data set is divided into training set, validation set and test set in proportion, as shown in Table 3 below, with the training set accounting for 60%, the validation set for 20%, and the test set for 20%. The training set is used for model training and parameter tuning, the validation set is used for model performance evaluation and hyperparameter selection, and the test set is used for final model verification and evaluation [14]. The division adopts random sampling to ensure that each data point has an equal opportunity to be assigned to the different sets, and that the distribution of water quality data in each subset is consistent with the overall data set. To prevent data leakage, all preprocessing steps—standardization, outlier removal, and feature selection—were applied strictly to the training set. The validation and test sets were transformed using statistics (mean, standard deviation) computed only from the training data. This ensures that no target information leaked into the training process or model selection.

Table 3: Data set partitioning results

Dataset        | Sample Size
Training Set   | 600
Validation Set | 200
Test Set       | 200

2.2 Model construction

2.2.1 Model selection

In order to improve the accuracy of water quality prediction, a variety of machine learning algorithms were selected for comparative analysis: decision tree, support vector machine (SVM), random forest and neural network. The decision tree divides the data space and makes decisions layer by layer based on the different values of features, which gives it good interpretability; it is suitable for processing data with simple and obvious relationships between features [15]. Support vector machines can deal with high-dimensional data by finding the optimal decision hyperplane, and maintain good performance in high-dimensional feature spaces. Random forest is an ensemble learning method that constructs multiple decision trees and votes among them to avoid overfitting, and is suitable for processing large-scale data sets. Neural networks, in particular deep neural networks (DNNs), map input data through multiple hidden layers, have powerful modeling capabilities, and can capture complex nonlinear relationships in the data [16].

Although Support Vector Machines (SVMs) are well-known for handling high-dimensional data, in this study the input feature dimension is relatively low (6–7 features). The inclusion of SVM is primarily justified by its robust generalization capabilities on small-to-medium-sized datasets and its effectiveness in capturing nonlinear boundaries via kernel methods, not due to high dimensionality.

2.2.2 Model architecture design

The basic architecture of each model was optimized according to the characteristics of water quality prediction. The CART algorithm was adopted in the decision tree model, with the maximum depth set at 10 and the minimum number of samples per split at 5. Pruning is used to avoid overfitting and improve the generalization ability of the model. The support vector machine uses an RBF kernel, balancing training accuracy and model complexity by selecting a moderate penalty parameter C and kernel parameter γ. The random forest model uses 100 trees with a maximum depth of 15, and a restriction prevents nodes from being split when they contain too few samples (the minimum number of samples per split is 5) [17]. The neural network uses three hidden layers with 64 neurons each, ReLU as the activation function, and dropout during training to prevent overfitting. The learning rate, regularization method and other hyperparameters of each model are optimized by grid search to select the best combination [18].

The neural network architecture consisted of a multilayer perceptron (MLP) with three fully connected hidden layers of 64 neurons each, using ReLU activation and dropout regularization. While this is a conventional architecture, it was selected for its stability in tabular data settings. Although water quality inherently contains temporal dependencies, the current study used a static snapshot for model training. Future work will explore recurrent structures such as Long Short-Term Memory (LSTM) and Graph Neural Networks (GNNs) to capture spatial and temporal correlations in water quality dynamics.

2.2.3 Training process

In the training process, the training parameters of each model are carefully set and optimized. To achieve optimal performance, hyperparameters such as the learning rate, maximum depth and maximum number of iterations are selected for all algorithms. Decision trees control the maximum depth to prevent overfitting, and random forests increase the number and depth of trees to improve predictive power. The training of the SVM model adjusts the penalty parameter C and the kernel function parameter γ to optimize the classification boundary of the model in the high-dimensional space. As shown in Table 4, the training of the neural networks uses the Adam optimizer, adjusting the learning rate, batch size, and number of training rounds to ensure convergence.

Hyperparameter tuning was conducted using a grid search strategy. For SVM, we evaluated C values in [0.1, 1, 10] and γ values in [0.01, 0.1, 1]. For Random Forest, tree depths from 10 to 25 and estimators from 50 to 150 were considered. Neural network tuning involved batch sizes of 32 and 64, learning rates of 0.001 and 0.0005, and dropout rates of 0.2 to 0.5. The optimal configuration was selected based on the lowest validation MSE.

Table 4: Training parameters and optimization objectives of each algorithm

Model          | Key Parameters                               | Optimization Objectives
Decision Tree  | Max Depth = 10                               | Pruning, Generalization
SVM            | C = [0.1, 1, 10], γ = [0.01, 0.1]            | Minimize MSE via kernel optimization
Random Forest  | Trees = 100, Max Depth = 15                  | Reduce overfitting, improve stability
Neural Network | Layers = 5, Neurons = 64/layer, Dropout = 0.3 | Minimize MSE, regularization

2.2.4 Evaluation criteria

In order to evaluate the performance of each model in water quality prediction, mean square error (MSE), the coefficient of determination (R²) and the accuracy rate were selected as the main evaluation indexes [19]. Mean square error measures the difference between the predicted value and the actual value; the smaller the value, the better the prediction of the model. The coefficient of determination reflects the model's ability to explain data variation; the closer it is to 1, the stronger that ability. Accuracy is used for evaluation in classification problems, calculating the proportion of samples that are correctly classified.

The mean square error (MSE) is used to measure the difference between the predicted value and the actual value of the model, as in Equation (1):

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (1)

where y_i is the actual value, \hat{y}_i is the predicted value, and n is the total number of samples. The coefficient of determination R² measures the ability of the model to explain the variation in the data, as in Equation (2):

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}    (2)

where y_i is the actual value, \hat{y}_i is the predicted value, and \bar{y} is the mean of the actual values. Accuracy is a common evaluation criterion in classification problems, calculating the proportion of correct predictions made by the model, as in Equation (3):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (3)

where TP is a true positive, TN is a true negative, FP is a false positive, and FN is a false negative.

In addition to MSE and R², we included Mean Absolute Error (MAE) as a robustness metric. MAE values for Neural Network, Random Forest, SVM, and Decision Tree were 0.058, 0.065, 0.071, and 0.094, respectively. Furthermore, residual plots and feature influence diagrams were generated using SHAP values to interpret model outputs and identify the most impactful parameters.

2.3 Algorithm comparison and analysis

2.3.1 Algorithm comparison

In the water quality prediction task, the four selected machine learning algorithms (decision tree, support vector machine, random forest and neural network) showed different performance characteristics. The mean square error (MSE) and coefficient of determination (R²) are used as the main performance indicators to comprehensively evaluate the merits of each model. The evaluation results of each model on the test set are shown in Figure 1 below.

Figure 1: Performance comparison of different algorithms

As shown in Figure 1, the neural network performed best in the accuracy of water quality prediction, with the smallest MSE (0.047) and the largest R² (0.976). Random forests and support vector machines also performed well, achieving MSE of 0.053 and 0.058, and R² of 0.963 and 0.95, respectively. The performance of the decision tree is relatively weak: although its R² is 0.945, its MSE is larger, and there are larger errors in its water quality predictions [20]. Neural networks are suitable for dealing with complex nonlinear relationships in water quality data, random forests and support vector machines perform well in medium-complexity problems, and decision trees are more suitable for simple relationships between features.
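The evaluation metrics of Equations (1)–(3) used in this comparison can be implemented directly; the following pure-Python sketch is our own illustration, not the study's code.

```python
def mse(y_true, y_pred):
    # Equation (1): mean of squared prediction errors.
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # Equation (2): 1 - SS_res / SS_tot, the share of variance explained.
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def accuracy(tp, tn, fp, fn):
    # Equation (3): proportion of correct classifications.
    return (tp + tn) / (tp + tn + fp + fn)
```

A perfect predictor yields MSE = 0 and R² = 1, while a predictor no better than the target mean yields R² = 0, which matches the interpretation given above.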
2.3.2 Influencing factors of algorithm selection

(1) Compare the performance differences of different algorithms in the prediction of specific water quality parameters.

Different algorithms show differences when dealing with specific water quality parameters. Taking ammonia nitrogen (NH₃-N) and dissolved oxygen (DO) as examples, the prediction performance of the four algorithms on these two indicators is shown in Figure 2 below. As shown in Figure 2, neural networks perform best in the prediction of NH₃-N and DO, with the lowest MSE and the highest R². Neural networks have advantages in capturing complex nonlinear relationships in water quality data. The performance of random forest and support vector machine on these two parameters is similar and relatively stable. The prediction error of the decision tree on these two indexes is relatively large, and its prediction of NH₃-N is relatively poor [21].

Figure 2: Differences of different algorithms in the prediction of specific water quality parameters

(2) Compare the differences between different algorithms in terms of training time and average inference time per sample during the test phase.

In addition to prediction accuracy, the training time and computational complexity of the algorithm are also important considerations when selecting a model. Figure 3 below shows the differences in training time and computational complexity of the different algorithms [22].

Figure 3: Differences in training time and computational complexity of different algorithms (training time / computational complexity: Decision Tree 5.2 s / 0.01 s per sample; Support Vector Machine 15.3 s / 0.03 s per sample; Random Forest 30.6 s / 0.06 s per sample; Neural Network 72.4 s / 0.15 s per sample)

Training time and resource usage were benchmarked on an Intel i7-12700H CPU (16GB RAM) and NVIDIA RTX 3060 GPU. For per-sample inference: Decision Tree = 0.002s, SVM = 0.013s, Random Forest = 0.010s, Neural Network = 0.021s. GPU memory consumption for the neural network peaked at 612MB. Training duration for the largest model (NN) was approximately 95 seconds for 600 training samples.

As shown in Figure 3, the training time and computational complexity of the decision tree are lower than those of the other algorithms, making it suitable for application in scenarios with high real-time requirements. There is a small gap between support vector machine and random forest in training time, and training time increases with the number of samples [23]. The training time of the neural network is the longest and its computational complexity is also high; because of its complex network architecture, it needs more computing resources. According to Figure 3, if the system has high real-time requirements and a large amount of training data, a decision tree or support vector machine can be suitable. Where high precision is required and computing resources are sufficient, a neural network is more ideal.

2.4 Optimization suggestions and implementation path of water quality prediction

2.4.1 Optimal collection and processing path of water quality data

The accuracy of water quality prediction is highly dependent on the quality of the data, and optimizing data collection and processing can improve prediction accuracy. The collection of water quality data should combine a variety of sensors and monitoring means to obtain the various indicators of water in a comprehensive, real-time and accurate manner. Water quality monitoring equipment is deployed to collect water quality parameters such as ammonia nitrogen, dissolved oxygen, pH value and total nitrogen in real time, avoiding the shortcomings of traditional water quality monitoring that relies on periodic sampling. The key to optimizing the acquisition path is to increase the frequency of data acquisition and use multi-dimensional monitoring to enhance the representativeness and timeliness of the data. Data preprocessing improves the model effect. For missing values, interpolation or data from similar indicators are used to fill gaps and ensure data integrity. For outliers, statistical methods such as box plots or standard deviations are used to screen and correct the data.

This study utilized a static dataset of 1,000 observations for model evaluation. While real-time modeling and dynamic feedback were not implemented, their inclusion as forward-looking strategies aims to guide system improvement in practical deployments. Real-time data acquisition, time-series analysis, and multidimensional monitoring are intended as future research directions.

2.4.2 Adaptive model selection and algorithm optimization path

The selection of the adaptive model is determined according to the requirements of different water quality prediction tasks and data characteristics. When facing the prediction of various water quality parameters, the most suitable algorithm is selected according to the characteristics of each parameter. For the complex nonlinear relationships between water quality parameters, methods such as neural networks and random forests are more effective. Decision trees and support vector machines are better choices when the data volume is small or computing resources are limited.

In algorithm optimization, the hyperparameters of the model are adjusted to improve prediction accuracy. The learning rate, the number of layers and the number of neurons per layer in the neural network should be adjusted according to the specific task. For support vector machines, an appropriate kernel function should be selected and the penalty factor adjusted to improve model accuracy. Cross-validation was used to optimize the parameters, improving the accuracy of the model and avoiding overfitting.

Ensemble learning methods such as AdaBoost and XGBoost improve the stability and accuracy of water quality prediction through the combination of multiple models. In view of the drastic changes of some water quality parameters, time series analysis is introduced and the historical data are dynamically adjusted to improve real-time prediction.

Ensemble learning methods such as random forest and boosting are particularly effective in managing variance and overfitting. Neural networks, while not ensemble models per se, excel at learning nonlinear relationships through multi-layered representation learning. Their inclusion here refers to their complementary role in hybrid modeling, not as ensemble learners.

2.4.3 Real-time feedback and decision support path of water quality prediction results

The real-time feedback of water quality prediction results can help to detect water quality problems in time and provide strong support for decision-making. Combined with a real-time monitoring system and data transmission network, the forecast results are transmitted to the control center in real time, which is convenient for the relevant departments and personnel to make decisions. The realization path of real-time feedback relies on big data platforms and cloud computing technology, and uses real-time data stream processing to update the forecast results in the monitoring system in real time, ensuring the timeliness and accuracy of decision-making.

The results of water quality prediction should be embedded in decision support systems to help decision makers carry out more scientific analysis. Through data visualization technology, the prediction results and water quality change trends are displayed, and the risk assessment of the machine learning models is combined to provide a more comprehensive decision-making basis. The forecast results can be correlated with relevant monitoring data to identify potential problems in water quality in real time, give early warning and take appropriate measures.
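The threshold-based early-warning idea just described can be sketched in a few lines. This is a hypothetical illustration of ours, not part of the study's system: the function name, the alert format, and the limit table below are assumptions.

```python
def early_warning(readings, predict, limits):
    # readings: stream of incoming feature records;
    # predict: trained-model callable returning {parameter: predicted value};
    # limits: {parameter: (low, high)} acceptable range for each indicator.
    alerts = []
    for t, features in enumerate(readings):
        forecast = predict(features)
        for name, value in forecast.items():
            low, high = limits[name]
            if not low <= value <= high:
                # Flag the time step, the indicator, and the out-of-range forecast.
                alerts.append((t, name, value))
    return alerts
```

In a deployment, `readings` would come from the sensor network, `predict` from the trained model, and the returned alerts would be pushed to the control center's decision support system.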
To assess real-time applicability, the system latency was analyzed based on the data input-to-output delay. Inference on a mid-tier GPU (RTX 3060) showed an average prediction latency of 0.21 seconds per sample. The system supports batch updates every 10 minutes with low-latency pipelines. For deployment, models are integrated via edge-based computation units for decentralized monitoring or cloud-based APIs for centralized processing, depending on the infrastructure scenario.

2.4.4 Combination path of model and automation system

The water quality prediction model is combined with an automated system to realize fully automated water quality monitoring and regulation, improving the efficiency and accuracy of water resources management. The automated system feeds the real-time data collected by the sensing equipment into the prediction model, automatically calculates and feeds back the water quality prediction results, and guides the automatic implementation of water quality improvement measures. Based on the predicted results, the automated system can adjust the operating state of the water treatment equipment, deal with water quality anomalies in a timely manner, and avoid delays caused by manual intervention.

In the specific application process, the combination of Internet of Things (IoT) technology and edge computing improves the real-time response capability of automated systems. Moving data acquisition and preliminary analysis to edge devices takes pressure off cloud processing and enables fast decision making and execution locally. Edge computing ensures that the system can operate efficiently even when network latency is high or the system is offline. Through automatic control, including the automatic adjustment of water treatment facilities and discharge control equipment, the intelligence level of water quality management is improved. The path to combining a water quality prediction model with an automated system needs to ensure seamless connectivity, covering data collection, transmission, processing, decision support, and executive feedback. Through highly integrated systems, the level of automation, intelligence and refinement of water quality management can be improved, promoting the development of water resources management in a more efficient and accurate direction.

3 Results and discussions

3.1 Result analysis

3.1.1 Evaluation results of each model

In the water quality prediction task, the choice of algorithm directly affects the prediction accuracy and error performance. The mean square error (MSE) and coefficient of determination (R²) were used to evaluate the predictive performance of each model. In the evaluation process, the prediction results of the four machine learning algorithms (decision tree, support vector machine, random forest and neural network) were compared one by one.

The evaluation results of the decision tree model show that it performs well in the prediction of some water quality parameters, such as ammonia nitrogen and total nitrogen. For these parameters, the R² value of the decision tree model can reach more than 0.85, and the MSE is low. In the face of more complex water quality data, however, overfitting is likely to occur, resulting in a decline in prediction accuracy for other water quality parameters.

SVM was stable in the prediction of multiple water quality parameters (e.g., dissolved oxygen, pH, etc.), with R² values generally above 0.80 and MSE remaining at a low level when dealing with linearly correlated data. The random forest model improves robustness by integrating multiple decision trees. Compared with the single decision tree model, the random forest showed a higher R² value in the prediction of multiple water quality parameters, up to 0.85, and fewer overfitting phenomena. In the face of data with nonlinear relationships, random forest can adapt well.

The neural network model shows strong prediction ability through its deep structure and optimization algorithm. On a large data set, the neural network can better capture the complex relationships between water quality parameters. In this experiment, the R² value of the neural network on multiple water quality parameters exceeds 0.90, which shows its potential in water quality prediction. However, the neural network requires more computing resources, and its training time is longer. Figure 4 below shows the evaluation results of each model, including the MSE and R² values of each model for the different water quality parameters, and visually presents the prediction accuracy and error performance of the different algorithms.

Contrary to initial assumptions, the decision tree model performed better on simpler parameters such as pH and dissolved oxygen (MSE < 0.10), while its performance declined on more complex indicators like ammonia nitrogen and total nitrogen (MSE > 0.11). For random forest, all four key parameters achieved R² values exceeding 0.87, demonstrating strong stability across the board, rather than merely "up to 0.85" as previously stated.

Figure 4: Prediction accuracy and error analysis of each model
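Section 3.1.2 reports paired t-tests on the models' per-sample errors; the test statistic can be sketched as follows. This is our own illustration of the standard paired t statistic, not the study's code; obtaining p-values would additionally require the Student t distribution with n - 1 degrees of freedom, omitted here.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    # Paired t statistic on per-sample error differences d = a - b:
    # t = mean(d) / (s_d / sqrt(n)), with s_d the sample standard deviation.
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # n - 1 in the denominator
    return mean_d / (sd_d / math.sqrt(n))
```

A large positive t indicates that model A's errors are systematically larger than model B's; swapping the arguments flips the sign. In practice a library routine (e.g., a paired t-test from a statistics package) would also return the p-value.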
According to the evaluation results of each model, the algorithms differ in prediction accuracy and error across the water quality parameters. To compare their advantages and disadvantages in more detail, the parameter configuration, training time, and computational complexity of each model are analyzed. The main parameters of the decision tree are tree depth and branching number, and optimizing these parameters improves model performance. The decision tree is fast to train, but overfitting occurs when it deals with complex data. SVM performance depends on the choice of kernel function and the tuning of the penalty factor; good parameter selection improves the generalization ability of the model. The integration of multiple decision trees in the random forest reduces the possibility of overfitting but increases training time and computational complexity. The neural network controls model complexity through the number of layers, the number of neurons, and the learning rate; because of its large demand for computing resources, its training time is the longest. Table 5 below shows the parameter configuration and performance comparison of the models.

Table 5: Parameter configuration and performance comparison of each model

Model          | Depth / Layers  | Key Parameters              | Training Time (s) | MSE   | R²
Decision Tree  | Depth = 10      | Pruning                     | 32                | 0.062 | 0.945
SVM            | -               | Kernel: RBF, C = 1, γ = 0.1 | 48                | 0.058 | 0.950
Random Forest  | Depth = 15      | Trees = 100                 | 55                | 0.053 | 0.963
Neural Network | Layers = 5 × 64 | LR = 0.001, Dropout = 0.3   | 120               | 0.047 | 0.976

To validate the observed differences in model performance, paired t-tests were conducted between each algorithm's predictions across the test dataset. The MSE differences between the neural network and the decision tree, as well as between the neural network and the SVM, were statistically significant (p < 0.01). Confidence intervals for the MSE differences were also computed, showing a 95% CI of [0.013, 0.021] for the neural network vs. random forest comparison. These results confirm that the performance differences are not due to random chance, strengthening the validity of the model selection recommendations.

300 Informatica 49 (2025) 291–302 Y. Xiong

3.1.3 Result visualization
Visualizing prediction outcomes facilitates an intuitive understanding of model performance across different water quality parameters. In this study, bar charts were utilized as the primary visualization method to present both the Mean Squared Error (MSE) and the coefficient of determination (R²) for each algorithm. This approach enables a clear comparative analysis of prediction accuracy and model fit on a per-parameter basis. The MSE used in these charts is calculated as in the following Equation (4):

MSE = (1/n) Σ_{i=1}^{n} (y_{true,i} − y_{pred,i})²    (4)

where y_{true,i} is the actual value, y_{pred,i} is the predicted value, and n is the number of observation points. Through visualization, we can clearly see the error distribution and degree of deviation of each model on the different water quality parameters.

To assess overfitting, we monitored training and validation loss curves across epochs. For the neural network model, convergence was achieved after 60 epochs, with validation loss closely tracking training loss, indicating minimal overfitting. Dropout (rate = 0.3) was employed to reduce model variance; the dropout rate was selected based on validation performance across a tested range of 0.2–0.5.

3.1.4 Performance improvement formula
To quantify the effect of optimization, the relative reduction in MSE is used as the performance improvement metric, calculated as in the following Equation (5):

PerformanceImprovement(%) = ((MSE_before − MSE_after) / MSE_before) × 100    (5)

In this study, the performance of the optimized neural network and random forest models improved. Taking the neural network as an example, optimization reduced the MSE from 0.080 to 0.065, a performance improvement of 18.75%. For the random forest model, optimization reduced the MSE from 0.100 to 0.087, a performance improvement of 13%. Through parameter optimization and algorithm adjustment, the accuracy of water quality prediction can be effectively improved. These values are sourced from cross-validation logs and final test-set measurements.

Comparative Analysis of Machine Learning Models for Water… Informatica 49 (2025) 291–302 301

3.2 Discussion
In this study, four machine learning algorithms, namely decision tree, support vector machine (SVM), random forest, and neural network, were used to predict water quality data. In the evaluation process, model selection and parameter tuning directly affect prediction accuracy and training time, and the different algorithms show their respective advantages and disadvantages when processing water quality data.

Although SVMs are theoretically sensitive to large datasets due to their reliance on the support vector expansion, in this study the actual training time (15.3 seconds) was lower than that of the random forest (30.6 seconds) and the neural network (72.4 seconds), as shown in Figure 3. This indicates that at the current dataset scale (n = 1000), the SVM is computationally efficient.

The decision tree model has strong interpretability and is suitable for processing simple water quality data. Its advantage is that the influence of each feature on water quality can be clearly expressed through the tree structure. However, decision trees are prone to overfitting in the face of complex data, which degrades prediction accuracy, and the model also encounters performance bottlenecks when dealing with high-dimensional data, limiting its predictive ability.

The SVM algorithm performs well when dealing with high-dimensional and nonlinear data, as the model captures complex relationships by mapping the data to higher dimensions through kernel functions. SVM performs well in the prediction of some water quality parameters, but its training time grows long when the data volume is large. Parameter selection has a great influence on SVM performance, and different kernel functions and penalty factors affect the prediction results.

By integrating multiple decision trees, the random forest effectively reduces the overfitting problem of a single decision tree. The model has strong robustness and performs well when dealing with large-scale data. Compared with a single decision tree, the random forest captures complex nonlinear relationships more accurately and achieves higher prediction accuracy. However, the random forest also suffers from long training time and large consumption of computing resources, and the computational overhead is large when running on large datasets.

The neural network can automatically extract features from data through deep learning and has strong adaptability. It is outstanding in the prediction of multiple water quality parameters and achieves high precision when modeling complex relationships. The neural network can handle large-scale datasets and has strong optimization ability during training. However, its training time is longer, its demand for computing resources is higher, and more work is needed on data preprocessing and model tuning.
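As a check on Equations (4) and (5), the following minimal NumPy sketch (an illustration, not the study's actual evaluation code) reproduces the reported improvement figures from the MSE values given in the text:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error as in Equation (4)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def performance_improvement(mse_before, mse_after):
    """Relative MSE reduction in percent, as in Equation (5)."""
    return (mse_before - mse_after) / mse_before * 100.0

# MSE values reported in the text:
nn_gain = performance_improvement(0.080, 0.065)   # neural network
rf_gain = performance_improvement(0.100, 0.087)   # random forest
print(round(nn_gain, 2), round(rf_gain, 2))       # 18.75 13.0
```

Both reported figures (18.75% and 13%) follow directly from the formula.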
3.3 Model limitations and failure cases
Despite the overall good performance, several model-specific limitations were observed. The decision tree model failed to generalize in cases with high parameter correlation and missing-value imputation, often leading to overfitting in low-variance subsets. The SVM struggled when gamma and C were misaligned, producing flat decision surfaces and poor sensitivity for DO prediction. The random forest occasionally exhibited performance degradation when input features were highly collinear, despite ensemble regularization. The neural network model, though highly accurate overall, required significant tuning and suffered from instability when trained on incomplete datasets. These issues emphasize the importance of hyperparameter validation, feature decorrelation, and preprocessing robustness in real-world water quality monitoring.

4 Conclusion
In this study, four kinds of machine learning algorithms, namely decision tree, support vector machine, random forest, and neural network, are compared to discuss their application effect in water quality prediction. The experimental results show that the neural network model is superior in dealing with complex nonlinear relations and can improve prediction accuracy. The random forest model is slightly inferior to the neural network in some cases, but has better stability and a lower risk of overfitting, and is suitable for large-scale data processing. The SVM is stable in the prediction of some water quality parameters, but its training time is long and it is sensitive to parameter selection. The decision tree is suitable for preliminary analysis because of its strong interpretability, but it has limitations when dealing with complex data.

Future work can be optimized from two aspects. First, according to the characteristics of different water quality parameters, a variety of algorithms can be combined through ensemble learning to improve the prediction accuracy and stability of the model. Second, the real-time performance and computational efficiency of the model remain challenges in practical applications, which require optimizing the training process and reducing the computational overhead. Through the research of this paper, machine learning shows broad application prospects in the field of water quality prediction. With reasonable algorithm selection and optimization strategies, more efficient and accurate technical support can be provided for water quality monitoring, and the development of intelligent water environment management can be promoted.

Future work will also explore the integration of advanced deep learning architectures, such as Temporal Convolutional Networks (TCNs), Transformer-based sequence models, and hybrid attention-GNN frameworks, which have shown promise in environmental time-series forecasting. Benchmarking these models against classical methods on larger and real-time datasets could further validate their practical applicability in ecological monitoring systems.

References
[1] Eyring V, Collins WD, Gentine P, Barnes EA, Barreiro M, Beucler T, et al. Pushing the frontiers in climate modelling and analysis with machine learning. Nat Clim Chang. 2024;14(1):916-928. DOI:10.1038/s41558-024-02095-y
[2] Bren L, Ryan M. An examination of stream water quality data from monitoring of forest harvesting in the Eastern Highlands of Victoria. Land. 2024;13(8):1217. DOI:10.3390/land13081217
[3] Li L, Knapp JLA, Lintern A, Crystal Ng CH, Perdrial J, Sullivan PL, et al. River water quality shaped by land-river connectivity in a changing climate. Nat Clim Chang. 2024;14(3):123-130. DOI:10.1038/s41558-023-01923-x
[4] Aalipour M, Wu NC, Fohrer N, Kalkhajeh YK, Amiri BJ, et al. Examining the influence of landscape patch shapes on river water quality. Land. 2023;12(5):1011. DOI:10.3390/land12051011
[5] Stevens CAT, Lyons ARM, Dharmayat K, Mahani A, Ray KK, Vallejo-Vaz AJ, et al. Ensemble machine learning methods in screening electronic health records: A scoping review. Digit Health. 2023;9:20552076231173225.
[6] Zou XT, Liu YN, Ji LN. Review: Machine learning in precision pharmacotherapy of type 2 diabetes - A promising future or a glimpse of hope? Digit Health. 2023;9:20552076231203879.
[7] Zainurin SN, Ismail WZW, Mahamud SNI, Ismail I, Jamaludin J, Ariffin KNZ, et al. Advancements in monitoring water quality based on various sensing methods: A systematic review. Int J Environ Res Public Health. 2022;19(21):14080. DOI:10.3390/ijerph192114080
[8] Quiroz-Martinez MA, Perez-Vitonera A, Gómez-Rios M, et al. Architecture design for the implementation of a water quality prediction system in aquaculture systems with big data. International Conference on Applied Technologies. Springer, Cham; 2025. DOI:10.1007/978-3-031-89757-3_12
[9] Uypatchawong S, Chanamarn N. Enhancing surface water quality prediction efficiency in northeastern Thailand using machine learning. Indonesian Journal of Electrical Engineering & Computer Science. 2024;36(2). DOI:10.11591/ijeecs.v36.i2.pp1189-1198
[10] Huang Y, Cai Y, He Y, et al. A water quality prediction approach for the Downstream and Delta of Dongjiang River Basin under the joint effects of water intakes, pollution sources, and climate change. Journal of Hydrology. 2024;640. DOI:10.1016/j.jhydrol.2024.131686
[11] Wu G, Zhang C. Analysis of water quality prediction in the Yangtze River Delta under the river chief system. Sustainability. 2024;16(13):5578. DOI:10.3390/su16135578
[12] Lopes RH, Silva CRDV, Salvador PTCD, Silva ÍdS, Heller L, Uchôa SADC. Surveillance of drinking water quality worldwide: Scoping review protocol. Int J Environ Res Public Health. 2022;19(15):8989. DOI:10.3390/ijerph19158989
[13] Liu Z, Wang X, Zhang Y, et al. Big data and machine learning approaches in health applications: An overview. J Healthc Inform Res. 2020;47(2):184-200. DOI:10.1038/s41575-020-0327-3
[14] Huang Y, Lee R, Wang S, et al. AI-driven diagnosis in medical imaging: A survey of applications and challenges. Int J Comput Assist Radiol Surg. 2024;19(5):1215-1224. DOI:10.1007/s13721-024-00491-0
[15] Zhang Y, Chen Y, Wu S, et al. Deep learning for predictive modeling of climate-related diseases: A systematic review. J Clim Change Health. 2023;5:100034.
[16] Yang Z, Zhang L, Lu Y, et al. Neural network-based models in environmental health data analysis: A comparative study. Environ Health Perspect. 2024;132(7):073004. DOI:10.1109/TGRS.2025.3529322
[17] Xu M, Lee C, Ng C, et al. Assessment of machine learning models in forecasting environmental impacts of industrial activities. Environ Impact Assess Rev. 2024;48(2):45-58. DOI:10.1016/j.apr.2022.101438
[18] Yuan M, Shi Y, Liu Y, et al. Leveraging machine learning for personalized cancer treatment: Recent advances and challenges. Cancer Lett. 2024;514:1-13. DOI:PQDT:89409451
[19] Tang R, Zhang Z, Li H, et al. Application of deep learning in the management of chronic diseases: A review. Chronic Dis Transl Med. 2023;9(3):235-249. DOI:10.2147/IJGM.S516247
[20] Cheng YR, Li G, Zhou X, Ye SH. Research on time series forecasting models based on hybrid attention mechanism and graph neural networks. Informatica. 2025;49(21). DOI:10.31449/inf.v49i21.7580
[21] Pipalwa R, Paul A, Mukherjee T. Prediction of heart disease using modified hybrid labelifier. Informatica. 2023;47(1). DOI:10.31449/inf.v47i1.3629
[22] Wang P, Han Q, Zhang S, Wu Z. Machine learning-based regression analysis and feature ranking for localization error prediction in wireless sensor networks. Informatica. 2025;49(20). DOI:10.31449/inf.v49i20.8081
[23] Cavalieri S, Scroppo MS. A CLR virtual machine based execution framework for IEC 61131-3 applications. Informatica. 2019;43(2). DOI:10.31449/inf.v43i2.2019

https://doi.org/10.31449/inf.v49i16.9602 Informatica 49 (2025) 303–314 303

A GAN-Based Framework for Synthetic Financial Data Generation, Risk Forecasting, and Portfolio Optimization under Uncertainty

Aihua Li
Department of Engineering Management, Henan Technical College of Construction, Zhengzhou, 450064, China
E-mail: hnzdli@163.com

Keywords: financial risk, dynamic prediction, decision optimization, generative adversarial network (GAN), machine learning, risk management and financial modeling

Received: June 6, 2025

This article proposes a financial risk dynamic prediction and decision optimization model based on a Generative Adversarial Network (GAN). The model generates synthetic financial data, trains a risk prediction model, and optimizes financial decisions based on predicted risks. Simulation results show that the proposed method outperforms traditional machine learning models, achieving a mean absolute error (MAE) of 0.012 and a mean squared error (MSE) of 0.002, indicating high prediction accuracy. The model achieves an average risk of 4.5% and an average return of 8.2%, surpassing conventional algorithms. With a recommended portfolio allocation of 65% equities, 30% bonds, and 5% cash, it optimizes investment decisions by maximizing returns while minimizing risks. Overall, the proposed approach provides a novel and effective solution for financial risk prediction and decision optimization, demonstrating superior performance over existing methods.

Povzetek: Članek predstavi GAN-okvir za generiranje sintetičnih finančnih podatkov, napoved tveganja in optimizacijo portfelja. Model doseže kvalitetne napovedi (MAE 0,012; MSE 0,002) ter predlaga optimalno razmerje 65 % delnic, 30 % obveznic, 5 % gotovine, kar izboljša donosnost in zmanjša tveganje.

1 Introduction
Subsequent to the development of the capital market, the methodology for conducting financial analysis has experienced continuous improvement. The scope of financial analysis has broadened to encompass the evaluation of the financial position, operating outcomes, and cash flow of enterprises. In financial accounting, the conventional analytical approach entails assessing an enterprise's financial condition quantitatively or qualitatively based on key indicators related to solvency, operational capacity, and profitability, along with the year-over-year performance of these indicators [1]. The capacity of this approach to forecast financial risk exposure and developmental trends is deemed inadequate. Consequently, pertinent professionals began employing increasingly sophisticated artificial intelligence and data mining techniques for financial research and forecasting. Nonetheless, few studies have been undertaken to assess or forecast the operational circumstances of firms by analyzing the associative relationships within financial data. The connection relationships within corporate financial data take several diverse forms, which differ based on the data elements involved. The spatial association of enterprise finance pertains to the distance characteristics of financial indicators across many dimensions; moreover, enterprises situated in proximity within multi-dimensional environments have a higher degree of financial similarity. The static temporal association of financial indicators refers to the interdependence characteristic among the financial metrics of the companies [2]. One can identify anomalous financial data of enterprises by utilizing commonly occurring groupings of financial indicators, referred to as frequent item sets of financial indicators [3]. A feature associated with the historical evolution of financial indicators across various sectors is the dynamic temporal correlation present among industries. A transmission phase occurs in which alterations in the financial status of upstream enterprises impact the downstream industry; subsequent to this transmission period, the financial indicators of related upstream and downstream sectors display either a positive or an inverse connection over time. Forecasting the future financial state of downstream sectors is therefore achievable through an analysis of trend correlation [4-7]. Subsequently, reference [8] presents novel suggestions for improving financial indicators, thus contributing to the early warning model of financial indicators.

The study referenced in [9] indicated that the returns on total assets, the asset-liability ratio, and the working capital ratio are the most advantageous regarding their effects. Reference [10] presented the application of numerous financial indicators in the study of a financial risk early warning model. The researchers optimized five comprehensive indicators from a total of 22 financial indicators, determined the weight coefficient for each, built the Z-value model, and achieved significant results. In the realm of later corporate financial risk early warning analysis, the Z-value model has achieved significant success. The concept of multivariate linearity, as outlined in reference [11], demonstrates that the multivariate linear model is more appropriate for the contemporary enterprise financial early warning system and exhibits superior accuracy compared to the multivariate early warning model.

304 Informatica 49 (2025) 303–314 A. Li

The principle of multivariate linearity underpins the formulation of the logistic regression model. Reference [12] conducted a linear analysis employing the logistic linear regression model under the prevailing economic conditions and model attributes. The authors suggested that early warning systems for financial risk could enhance their accuracy through the accumulation of expertise derived from an increasing number of study samples and a growing quantity of data. Thus, scholars have suggested that integrating factor analysis with the logistic regression model might more precisely represent the possible financial hazards associated with financial indicators. Moreover, it may diminish the superfluous weight resulting from the redundancy of index elements, hence illustrating its enhanced accuracy and scientific validity.

In the domain of financial risk early warning, neural networks have gained prominence due to the rapid advancement of artificial intelligence and the robust technological support afforded by big data on the internet. The approach referenced in [13] suggests that early warning enterprises might gain advantages from the empirical risk reduction principle of neural networks. Concurrently, the predictive efficacy of neural network early warning models utilizing machine learning technology is improving significantly due to the rapid advancement of computer technology.

Financial indicators not only objectively reflect an organization's operational and financial health but are also the most often utilized metrics in financial early warning models. Because of their ease of acquisition, they have attracted considerable interest since the introduction of the univariate early warning model. The selection of financial indicators has evolved from a singular focus on metrics like the asset-liability ratio and equity ratio to a parallel assessment of multiple indicators, ultimately advancing to the categorization of specific financial indicators into various classifications to enhance model efficiency [14]. Non-financial indicators are also crucial in several firm financial early warning models, and the importance of their early warning analyses is paramount [15]. Concerning the purpose and role of financial diagnosis, reference [16] stated that for financial diagnosis to contribute to the strategic development of the company, it must be positioned at a strategic level; this was achieved by identifying an alternate method to focus on the strategic perspective.

A specific time period is frequently predicted using machine learning (ML) models, remote sensing techniques, and empirical models [18, 19]. The most promising technologies for forecast prediction are ML models, among which artificial neural networks (ANNs) are frequently used because of their high accuracy. ARIMA is a well-known ML model that is particularly popular for time series data and has excellent accuracy on small datasets [20, 21]. Table 1 presents the comparison of the proposed work with recent literature.

Motivation and contribution
The proposed financial risk dynamic prediction and decision optimization model based on Generative Adversarial Networks (GANs) has several novel characteristics. First, it creates synthetic financial data using GANs, providing a new way to forecast financial risk and support wiser decisions. Second, it streamlines financial decision-making by combining risk prediction with decision optimization. Synthetic data production generates realistic data, making the risk prediction model more accurate and trustworthy, and by explicitly considering risks, it helps decision-makers make sound financial choices and lowers the risk of monetary loss.

The suggested approach uses a GAN architecture to generate synthetic financial data, a new application of GANs in finance, together with a risk prediction model trained on the GAN-generated synthetic data; as a result, the risk estimates are more reliable. To test the proposed model, we simulate it, which gives an exact and full picture of its performance, and comparing it to other machine learning models shows its superiority and usefulness. These experiments allow us to fully examine the model's abilities and observe how it can identify financial risks and support wiser decisions.

Table 1: Comparison of proposed work with recent literature

Reference | Key Focus / Contribution | Advantages Highlighted | Disadvantages (Implied/Potential) | Gaps (Unaddressed by the Text)
[15] | Importance of non-financial indicators in early warning models | Crucial for firm financial early warning; paramount for early warning analyses; provides a more holistic view beyond traditional financial ratios | Not explicitly stated, but non-financial data can be qualitative, harder to quantify, or less standardized; data collection might be more complex | Specific types of non-financial indicators (e.g., ESG, operational, governance) and their individual impact; methodologies for integrating diverse non-financial data
[16] | Strategic positioning of financial diagnosis | Contributes to the strategic development of the company when positioned at a strategic level; shifts focus from mere solvency to long-term viability and growth | Implies that if not strategically positioned, financial diagnosis might be limited to a tactical or operational view, missing broader implications | How to effectively integrate financial diagnosis into the strategic planning process; specific "alternate methods" for a strategic focus
[18] | Overview of prediction techniques (ML, remote sensing, empirical models) | Diverse range of methods available for predicting specific time periods; suggests adaptability across various domains | No specific disadvantages mentioned for these general categories | Comparative analysis of these techniques for financial early warning specifically; when to choose one over the other for financial applications
[20] | Machine learning models (ANNs, ARIMA) for forecast prediction | ML models are "most promising technologies" with "high accuracy"; ANNs are frequently used; ARIMA is "well-known," "popular for time series data," with "excellent accuracy for small datasets" | ARIMA's limitation to "small datasets" implies it might not suit large or complex financial datasets without significant preprocessing or combination with other models | Specific limitations of ANNs (e.g., interpretability, data requirements); handling highly volatile or non-stationary financial time series; challenges of implementing and validating these models in real-world financial settings; addressing data quality issues in financial datasets for ML models
Proposed model | Developing a financial risk dynamic prediction and decision optimization model using Generative Adversarial Networks (GANs) | Improved accuracy, robust risk prediction, and optimized decision-making | Complexity; interpretability | -

2 The proposed system
To address financial risk, the suggested system is a multifaceted structure combining three main components: it uses Generative Adversarial Networks (GANs) to create realistic synthetic financial data and improve prediction accuracy, optimization models to guide the best decision-making based on risk predictions, and time-series financial data to capture the dynamic character of financial risk, thereby offering a complete method of managing financial risk.

Generative Adversarial Network (GAN): A GAN consists of a generator and a discriminator. The generator produces synthetic financial data similar to genuine data, while the discriminator separates actual from synthetic data [22-25]. An adversarial loss function that reduces the difference between actual and synthetic data trains the GAN. To capture financial interactions over time, TimeGANs make use of recurrent neural networks.
These methods generate respectable synthetic time series data by accurately simulating time-series dynamics using additional networks.

Figure 1 presents the block diagram of the proposed system. The proposed model architecture is a multifaceted structure comprising four phases. Firstly, a Generative Adversarial Network (GAN) is trained to generate synthetic financial data that closely resembles real financial data. Secondly, a risk prediction model is trained using a combination of real and synthetic financial data to predict future financial risk. Thirdly, the trained risk prediction model is utilized to predict future financial risk based on new, unseen input data. Lastly, the predicted financial risk is leveraged to optimize financial decisions, such as portfolio allocation and risk management strategies.

2.1 Data representation
Financial data is inherently time-series based. Let X = {x_t}_{t=1}^{T} represent the financial time series, where x_t is a vector of financial features at time t. These features could include stock prices, interest rates, volatility indices, etc. We can represent this as x_t = [p_t, i_t, v_t, ...], where p_t is the price, i_t is the interest rate, and v_t is the volatility at time t.

Generator (G): The generator aims to produce synthetic financial data that closely resembles the real data. It is written G(z; θ_g), where z is a random noise vector and θ_g represents the generator's parameters. G(z) generates a synthetic financial time series x̃.

Discriminator (D): The discriminator aims to distinguish between real and synthetic financial data. It is written D(x; θ_d), where x is the input data (either real or synthetic) and θ_d represents the discriminator's parameters. D(x) outputs the probability that x is real.

Loss function: The GAN is trained through the following adversarial minimax objective:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1)

In Eq. (1), p_data(x) is the distribution of real financial data, p_z(z) is the distribution of the random noise, and E(·) denotes the expected value.

Time-series GANs (TimeGANs): For time series data [26], variations such as TimeGANs are employed, which incorporate recurrent neural networks (RNNs) such as LSTMs or GRUs to capture temporal dependencies. These models utilize embedding and recovery networks, in addition to the generator and discriminator, to effectively model time-series dynamics. GANs can generate synthetic financial data with similar patterns and linkages, and giving models a larger dataset to train on may help them forecast risks, since rare or severe occurrences may be underrepresented in the original dataset; synthetic data production creates novel situations that may help models perform better on fresh data. We identify risk indicators such as Expected Shortfall (ES) and Value at Risk (VaR) using the GAN-generated synthetic data, and time-series links may allow the model to dynamically predict future risk levels from current and prior financial data, enabling risks to be managed in advance.

VaR and ES calculation
Value at Risk (VaR) and Expected Shortfall (ES) may be calculated in many ways. The Historical Simulation Method organizes the GAN-generated data in ascending order and reads off VaR at the selected confidence level, such as the 95th or 99th percentile. The Parametric Method calculates VaR for the GAN-generated data using a normal or Student's t-distribution. The Monte Carlo Simulation Method employs the GAN to create several scenarios and calculates VaR from the losses at the selected confidence level. When computing ES, the Historical Simulation Method identifies the average loss larger than VaR at the set confidence level; the Parametric Method assumes a distribution for the GAN-generated data and calculates ES from its properties; and the Monte Carlo Simulation Method employs the GAN to create several scenarios and obtains ES by calculating the average loss larger than VaR.
Confidence level
Specific needs may determine the confidence levels used for VaR and ES calculation; internal risk management commonly uses a 99% confidence level to set limits. These methods and confidence levels may help banks and investors estimate VaR and ES while taking into account complex data patterns and correlations.

The optimization component determines the best sequence of options within constraints and maximizes a utility function using the predicted risk. Together, GANs and decision optimization may improve scenario realism and power, which improves financial risk management decisions.

2.2 Financial risk prediction
Generative Adversarial Networks (GANs) may help to estimate risk metrics, thereby strengthening financial risk prediction. Synthetic financial scenarios produced by GANs are used to estimate risk factors such as Expected Shortfall (ES) and Value at Risk (VaR): VaR shows the possible loss at a particular confidence level, while ES computes the anticipated loss beyond VaR. Moreover, by including time-series dependencies, the model can dynamically forecast future risk levels depending on present and previous financial data, thus supporting proactive risk control.

Risk measure estimation: GANs can be used to generate synthetic financial scenarios, which can then be used to estimate risk measures such as Value at Risk (VaR) or Expected Shortfall (ES), shown in Eq. (2) and (3):

VaR_α = inf{ l : P(L ≤ l) ≥ α }    (2)

where L is the loss and α is the confidence level, written as a subscript to VaR and ES.

ES_α = E[L | L ≥ VaR_α]    (3)

Dynamic prediction: By incorporating time-series dependencies, the model can dynamically predict future risk levels based on current and past financial data. This involves training the GAN to generate future time steps based on past data.
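The adversarial objective of Eq. (1) that underlies this GAN training can be evaluated numerically. The sketch below uses a toy sigmoid discriminator and a linear generator as hypothetical stand-ins (not the paper's networks) to compute V(D, G) on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w=2.0, b=0.0):
    """Toy discriminator D(x): sigmoid score, probability that x is 'real'."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def generator(z, theta_g=0.5):
    """Toy generator G(z): scales noise z into synthetic samples."""
    return theta_g * z

def adversarial_value(x_real, z):
    """V(D, G) from Eq. (1): E[log D(x)] + E[log(1 - D(G(z)))]."""
    d_real = discriminator(x_real)
    d_fake = discriminator(generator(z))
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

x_real = rng.normal(1.0, 0.2, size=1000)   # stand-in for real financial features
z = rng.normal(0.0, 1.0, size=1000)        # noise vector samples
v = adversarial_value(x_real, z)
print(v < 0)  # True: both log terms are negative, so V is negative here
```

In actual training, the discriminator ascends this value while the generator descends it; the fragment only shows how the two expectation terms of Eq. (1) combine.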
A GAN-Based Framework for Synthetic Financial Data Generation… Informatica 49 (2025) 303–314 307 Figure 1: The block diagram for the proposed system 2.3 Decision optimization Utility function and constraints vary by financial risk Decision optimization increases utility function within management circumstance. This optimization issue may restrictions by determining the best choice sequence. It be solved using dynamic programming or other helps manage financial risk. Based on expected risk, the approaches. You may use financial scenarios to minimize utility function shows the choice result. The limits may risk or increase returns in investment portfolios. This may include your risk tolerance and budget. You may optimize be done via Mean-Variance Optimization (MVO). By this issue using dynamic programming and other identifying risk events and reducing them, the model may approaches. Financial scenarios may be used with mean- help you create dynamic risk management plans. variance optimization (MeV) to optimize returns or reduce risk in an investment portfolio. The model may also help Optimization function: create dynamic risk management strategies by predicting A typical portfolio optimization function is in Eq. (5): future risk events and providing solutions. A normal min 𝑤𝑇 𝛴𝑤 − 𝜆𝑤𝑇𝜇 (5) 𝑤 portfolio optimization function minimizes risk and where, 𝑤 is the vector of portfolio weights, 𝑤𝑇𝛴𝑤 maximizes profits. The asset returns covariance matrix represents the portfolio risk (variance of returns), 𝛴 is the and anticipated asset return vector are examined. GANs covariance matrix of asset returns, 𝛼 is risk tolerance give possibilities for covariance matrix and expected parameter and 𝜇 is the vector of expected asset returns. return calculations. Combining this with decision The GANs provide the scenarios used to calculate the optimization provides for more accurate and realistic covariance matrix and expected returns. This allows the scenario information. 
This link simplifies the optimization and improves financial risk assessment. Let A_t be the decision variable at time t (e.g., investment portfolio allocation, risk mitigation actions) and R_t the risk tolerance. Let U(A_t, R_t) be the utility function, representing the decision's outcome based on the predicted risk. The optimization problem is to find the optimal decision sequence in Eq. (4):

max_{A_1, …, A_T}  Σ_{t=1}^{T} U(A_t, R_t)   (4)

subject to the constraints C(A_t, R_t) ≤ 0 (e.g., budget constraints, risk tolerance).

The decision optimization stage guides financial decisions by utilizing the predicted financial risk to determine the best financial choices. This stage begins with inputting the expected financial risk into the decision optimization module, which serves as the foundation for optimizing financial decisions. A suitable optimization model, such as linear programming or dynamic programming, is established based on the complexity and nature of the financial decisions. The optimization model captures complex interactions among financial factors, including asset returns, risk levels, and portfolio restrictions.

Following preprocessing, the GAN is trained with the aim of producing synthetic financial data indistinguishable from real data: the generator network produces synthetic data, and the discriminator network checks it and feeds its assessment back to the generator. Visual inspection, accuracy, and loss functions are among the many criteria used to evaluate the GAN's performance throughout training.
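A toy sketch of Eq. (4) (pure Python; exhaustive search over a small discrete action set, with an illustrative budget-style constraint playing the role of C(A_t, R_t) ≤ 0; the utility and constraint functions in the usage are hypothetical):

```python
from itertools import product

def best_sequence(actions, risks, utility, feasible):
    """Maximize sum_t U(A_t, R_t) over all action sequences,
    keeping only sequences where feasible(A_t, R_t) holds at every step."""
    best_seq, best_val = None, float("-inf")
    for seq in product(actions, repeat=len(risks)):
        if not all(feasible(a, r) for a, r in zip(seq, risks)):
            continue
        val = sum(utility(a, r) for a, r in zip(seq, risks))
        if val > best_val:
            best_seq, best_val = seq, val
    return best_seq, best_val
```

For example, with exposure levels {0, 0.5, 1}, per-period risks [0.1, 0.4], utility U(a, r) = a·(0.2 − r), and the constraint a·r ≤ 0.2, the search keeps full exposure in the low-risk period and none in the high-risk one. Dynamic programming would replace the exhaustive product for longer horizons.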
This evaluation of the quality of the generated data directs any necessary adjustments to the GAN design or training setup. After sufficient training, the GAN can generate realistic synthetic financial data that can be used downstream for stress testing, risk analysis, and portfolio optimization.

The optimization process involves determining the financial choices that minimize financial risk and maximize profit, subject to various constraints and limits. The best financial decisions produced by the decision optimization module can guide direct investment strategies, risk management, and portfolio performance maximization; by making informed decisions based on them, financial institutions and investors can reduce financial risk, increase returns, and achieve their financial goals.

The decision optimization problem can be represented mathematically as:

maximize: utility function (e.g., expected return, risk-adjusted return)
subject to: constraints (e.g., risk tolerance, regulatory requirements, budget constraints)
variables: decision variables (e.g., portfolio weights, investment amounts)

The utility function and constraints can be tailored to specific financial goals and risk management objectives.

Stage 2: Risk prediction model training
The training phase for the risk prediction model, an essential component of the complete process, seeks to produce a powerful and accurate model able to anticipate future financial risk. This stage begins by mixing synthetic data with genuine financial data, providing a complete and diversified dataset for training the risk prediction model. Depending on the kind and scale of the data, a suitable risk prediction model (a machine learning or deep learning model) is then created. Trained on the combined data, the model targets future financial risk; the training procedure optimizes the model's parameters so that the error between predicted and actual risk levels is as low as feasible. After training, the model is evaluated with accuracy, precision, recall, F1-score, and other standard measures.
These indicators guide any necessary adjustments to the architecture or training parameters and help one grasp how precisely the model can anticipate financial risk. Well-trained risk prediction models can give financial organizations significant knowledge about likely future dangers, directing their activities and the development of effective risk management strategies. By solving the optimization problem, financial institutions and investors can determine the optimal financial decisions that balance risk and return.

3 Complete model structure

The complete model structure consists of four stages (Figure 2). Stage 1 trains a Generative Adversarial Network (GAN) to produce suitable synthetic financial data. Stage 2 uses blended real and synthetic data to build a risk prediction model intended to forecast future financial risk. Stage 3 projects financial risk for new, unseen inputs using the trained risk prediction model. Stage 4 finally feeds the expected financial risk into a decision optimization module to optimize financial choices, including risk management techniques and portfolio allocation. Each stage builds on the one before it, letting the model create realistic synthetic data, predict financial risk, and optimize financial actions to reduce risk and increase profit.

Stage 3: Risk prediction
The final stage of the operation forecasts future financial risk using the trained risk prediction model. It begins with new, unseen input to the trained model comprising financial attributes and current market conditions; this input is carefully chosen to ensure its correctness and relevance, as the model's predictions rely on the available data. Once the data becomes available, the trained model projects the future financial risk associated with it.
This prediction offers a forward-looking assessment of expected financial risk based on the patterns and connections the model learned during training. The expected financial risk produced by the risk prediction model can then serve decision-making needs. Whether expressed as a probability of default, an expected loss, or a risk score, this output gives financial institutions, investors, and other stakeholders significant information; by applying it, they can refine their risk-reduction strategies, make sensible decisions, and navigate challenging financial markets with greater confidence.

Stage 1: GAN training
The first step of the procedure is training a generative adversarial network (GAN) to produce suitable synthetic financial data. Historical financial data is collected and preprocessed at this step to ensure it is in a fit condition for training the GAN; it usually combines time-series data with important features from the financial domain. Data preparation is followed by the construction of an appropriate GAN architecture incorporating a generator network and a discriminator network, where the generator produces synthetic financial data and the discriminator evaluates it.

Depending on the degree of complexity and nature of the financial decisions, a linear programming or dynamic programming model is chosen. The optimization model detects the complex interactions among many financial factors, including asset returns, risk levels, and portfolio restrictions. Once designed, the optimization model supports financial decisions on risk management techniques or portfolio allocation. The optimization process, under many restrictions and limits, consists of determining the best financial choices that reduce financial risk and maximize profit.
The best financial decisions generated by the decision optimization module may guide direct investment strategies, risk management, and portfolio performance maximization. Through better-informed decisions, financial institutions and investors may lower financial risk, increase returns, and thus fulfill their financial goals.

3.1 Integration of components

First, the GAN produces synthetic financial data to train the risk prediction model; this data helps the risk prediction algorithm find the patterns and linkages of comparable real financial data. Second, the trained risk prediction model assesses the financial risk of fresh data. Then risk measures such as Value-at-Risk (VaR) or Expected Shortfall (ES) quantify the predicted risk. Finally, the optimization model calculates the ideal portfolio weights or investment choices that balance risk and return. The steps of the algorithm are as follows:

1. Generate synthetic financial data using the GAN: synthetic_data = GAN.generate_data()
2. Train the risk prediction model on the synthetic data: risk_model = RiskModel.train(synthetic_data)
3. Estimate financial risk with the risk prediction model: predicted_risk = risk_model.predict(new_data)
4. Compute the risk metric: risk_metric = calculate_risk_metric(predicted_risk)
5. Find the optimal portfolio weights or investments with the optimization model: portfolio = OptimizationModel.optimize(risk_metric, return_metric)

Figure 2: The complete model structure

Stage 4: Decision optimization
The decision optimization stage guides financial decisions by means of the predicted financial risk.
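The five steps above can be sketched end to end as follows. Everything here (the stub GAN, the mean-based risk model, the empirical-quantile risk metric, and the one-line optimizer) is a hypothetical stand-in for the paper's components, written so the control flow of steps 1 to 5 is runnable:

```python
import random

class GAN:
    """Stub generator (step 1): draws synthetic returns from a fixed Gaussian."""
    @staticmethod
    def generate_data(n=1000, seed=7):
        rng = random.Random(seed)
        return [rng.gauss(0.02, 0.01) for _ in range(n)]

class RiskModel:
    """Stub risk model (steps 2 and 3): loss relative to the learned mean return."""
    def __init__(self, mean):
        self.mean = mean
    @classmethod
    def train(cls, data):
        return cls(sum(data) / len(data))
    def predict(self, new_data):
        return [self.mean - x for x in new_data]  # per-scenario predicted loss

def calculate_risk_metric(predicted_risk, alpha=0.95):
    """Step 4: a simple empirical quantile of predicted losses."""
    ordered = sorted(predicted_risk)
    return ordered[int(alpha * len(ordered)) - 1]

class OptimizationModel:
    """Step 5: shrink risky exposure as the risk metric grows (toy rule)."""
    @staticmethod
    def optimize(risk_metric, return_metric):
        risky = max(0.0, min(1.0, return_metric / (return_metric + risk_metric)))
        return {"risky_asset": risky, "cash": 1.0 - risky}

synthetic_data = GAN.generate_data()
risk_model = RiskModel.train(synthetic_data)
predicted_risk = risk_model.predict(GAN.generate_data(seed=8))
risk_metric = calculate_risk_metric(predicted_risk)
portfolio = OptimizationModel.optimize(risk_metric, return_metric=0.02)
```

The output is a fully invested allocation whose risky share falls as the estimated loss quantile rises, mirroring the risk-return balancing described in the text.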
This stage begins with the expected financial risk being input into the decision optimization module, which forms the foundation for optimizing financial choices. A suitable optimization model is then established.

3.2 Variable selection and mapping

Macroeconomic variables such as GDP, inflation, and employment, and social development variables such as health, education, and poverty, are studied. These characteristics were selected because they affect financial markets and asset returns. To place the variables in a portfolio context, we may use a multivariate technique that examines asset performance; a factor model that incorporates the selected variables as asset return factors is one option. Table 2 lists the generator and discriminator network architectural parameters.

Asset-level returns: We model asset returns using a multivariate distribution, such as a multivariate normal distribution or a more elaborate one that exhibits non-linear relationships between variables.

3.3 Weighting and validation of real and synthetic data

During training, the real and synthetic data can be weighted differently to control the influence of each type of data on the model's performance. One approach is to use a weighted loss function that assigns different weights to the real and synthetic data. For example:

loss = w_real * loss_real + w_synthetic * loss_synthetic

where w_real and w_synthetic are the weights assigned to the real and synthetic data, respectively.

Validation: To validate the performance of the model on both real and synthetic data, we can use metrics such as mean squared error (MSE) or mean absolute error (MAE) on a hold-out validation set. This helps us monitor the model's performance on both types of data and adjust the weighting scheme or other hyperparameters as needed.
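A minimal illustration of the weighted loss above (pure Python; the MSE helper and the 0.7/0.3 default weights are illustrative choices, not values from the paper):

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def weighted_loss(pred_real, y_real, pred_syn, y_syn, w_real=0.7, w_syn=0.3):
    """loss = w_real * loss_real + w_synthetic * loss_synthetic."""
    return w_real * mse(pred_real, y_real) + w_syn * mse(pred_syn, y_syn)
```

Shifting weight toward the real-data term makes the model track real observations more closely; shifting it toward the synthetic term leans more on GAN-generated coverage.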
Portfolio context: We use portfolio optimization to place the variables in a portfolio context by examining anticipated returns, risks, and correlations between assets. The optimization problem can be formulated as:

maximize: portfolio return
subject to: risk constraints (e.g., VaR, ES)
variables: portfolio weights

Generating synthetic data that is diverse and representative of the real data can also help reduce overfitting.

4 Experimental setup

Python with TensorFlow or PyTorch is used for deep learning. The model settings include a batch size of 128, 500 epochs, a noise dimension of 100, and learning rates of 0.001 for both the generator and the discriminator; the activation function is Leaky ReLU, and Adam is the optimizer. The simulation parameters consist of a volatility of 0.1, a risk-free rate of 0.02, and a 1,000-time-step simulation.

We began the process of training a Generative Adversarial Network (GAN) for financial data generation using publicly available financial datasets (https://databank.worldbank.org/). Comprising more than 9,000 variables covering several spheres (economic, social, environmental, and others), this dataset includes macroeconomic characteristics such as GDP, inflation, and employment, as well as social development measures such as education, health, and poverty. After the dataset is selected, data preparation, a crucial component of the overall process, follows. Missing values must be handled by interpolation or imputation; the data must be normalized so that every attribute falls in the same range; and the data has to be converted into a form appropriate for GAN training, possibly involving scaling or encoding. The GAN design is developed after data preparation.

Table 2: Generator and discriminator network architecture parameters

Parameter | Generator Network | Discriminator Network
Number of layers | 4 | 4
Activation function | Leaky ReLU | Leaky ReLU
Number of filters | 64, 128, 256, 512 | 64, 128, 256, 512
Kernel size | 4, 4, 4, 4 | 4, 4, 4, 4
Stride | 2, 2, 2, 2 | 2, 2, 2, 2
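The two preprocessing steps named above (filling missing values by interpolation, then normalizing every attribute to a common range) can be sketched as follows. `None` marks a missing value; the helpers are illustrative and assume gaps occur only in the interior of a series:

```python
def interpolate(series):
    """Fill None gaps by linear interpolation between known neighbours
    (assumes the first and last entries are present)."""
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            lo = max(j for j in range(i) if out[j] is not None)
            hi = min(j for j in range(i + 1, len(out)) if out[j] is not None)
            frac = (i - lo) / (hi - lo)
            out[i] = out[lo] + frac * (out[hi] - out[lo])
    return out

def minmax(series):
    """Min-max scale values into [0, 1] so every attribute shares one range."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]
```

Applying `interpolate` before `minmax` mirrors the order described in the text: gaps are repaired first, then the repaired series is brought onto the same scale as the other attributes.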
A deep convolutional GAN (DCGAN) is particularly appropriate for generating financial data. Its architecture consists of a generator network and a discriminator network: the generator produces synthetic financial data, and the discriminator evaluates it and feeds its assessment back to the generator. The DCGAN design has been used effectively to create realistic synthetic data; its convolutional nature lets it capture complicated patterns and connections in the data. High-quality synthetic financial data created by DCGAN designs may be used for risk analysis, portfolio optimization, and stress testing. Table 3 shows the parameters used to design the generator and discriminator networks.

Table 3: CNN architecture parameters for the generator and discriminator networks

Generator Network:
Input layer: 100-dimensional noise vector
Convolutional layer 1: 64 filters, kernel size 3, stride 1
Convolutional layer 2: 128 filters, kernel size 3, stride 1
Convolutional layer 3: 256 filters, kernel size 3, stride 1
Output layer: 1-dimensional output (financial risk prediction)

Discriminator Network:
Input layer: 1-dimensional input (financial data)
Convolutional layer 1: 64 filters, kernel size 3, stride 1
Convolutional layer 2: 128 filters, kernel size 3, stride 1
Convolutional layer 3: 256 filters, kernel size 3, stride 1
Output layer: 1-dimensional output (probability of real data)

Figure 3 depicts the proposed model's CNN architecture.

5 Results and discussion

For GAN training, dataset preparation must handle missing values by interpolation or imputation, normalize the data so all characteristics share the same range, and format the data for GAN training. Training is crucial because it lets the Generative Adversarial Network (GAN) model identify financial data patterns and linkages. The GAN is trained with the Adam optimizer.
Adam, a well-known variant of stochastic gradient descent, adapts the learning rate of each parameter based on gradient magnitude. The small learning rate of 0.001 lets the model parameters converge slowly and gradually, and the batch size of 128 is a conventional value that balances computation speed and model stability. During training the generator learns to create synthetic financial data that appears real, while the discriminator learns to distinguish genuine from synthetic data. The GAN is trained for 500 epochs to reach convergence and produce high-quality synthetic financial data. After training, R-squared, MAE, and MSE are used to evaluate the GAN's performance; these measurements indicate the reliability of the synthetic data, help adjust the GAN design and training parameters, and determine whether the data is suitable for risk analysis, portfolio optimization, and stress testing.

Several indicators are used to evaluate the proposed model: MAE, MSE, RMSE, R-squared, risk prediction accuracy, precision, recall, and F1-score.

Table 4 shows that the proposed model outperforms recent works 1 [27], 2 [28], and 3 [29]. Existing Work 1 [27] employs CNNs and LSTM networks for deep learning; its model was trained on financial time series of stock prices, transaction volumes, and other key factors, and comprises 5 hidden layers of 128 units each, with ReLU activation, the Adam optimizer, a learning rate of 0.01, a batch size of 64, and 1,000 epochs. Existing Work 2 [28] uses a random forest trained on technical indicators, sentiment analysis, and macroeconomic factors.
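For reference, the regression-style evaluation metrics named above can be computed as follows (pure Python; the helper name is illustrative):

```python
def regression_metrics(pred, actual):
    """MAE, MSE, RMSE, and R-squared for paired predictions."""
    n = len(pred)
    errs = [p - a for p, a in zip(pred, actual)]
    mae = sum(abs(e) for e in errs) / n
    mse = sum(e * e for e in errs) / n
    rmse = mse ** 0.5
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    # R^2 = 1 - SS_res / SS_tot, with SS_res = mse * n
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

A perfect predictor yields MAE = MSE = RMSE = 0 and R² = 1, while a predictor that always outputs the mean of the actual values yields R² = 0.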
This model contains 100 trees, a maximum depth of 10, a minimum of 2 samples per split, 1 sample per leaf, and 5 attributes per split. Existing Work 3 [29] employs an autoregressive integrated moving average (ARIMA) method trained on a set of historical financial time series data; its hyperparameters are an order of differencing of 1, 2 autoregressive terms, and 1 moving average term.

Figure 3: The CNN architecture for the proposed model

With values of 0.009, 0.012, and 0.015 for the training, validation, and testing sets, respectively, the proposed model's Mean Absolute Error (MAE) is much lower than the 0.052 ± 0.008 of existing work. Likewise, the Mean Squared Error (MSE) values of the proposed model are 0.001, 0.002, and 0.003, and its Root Mean Squared Error (RMSE) values are 1.2%, 1.5%, and 1.8%, for the training, validation, and testing sets, respectively, improving on existing work's values of 0.003 ± 0.001 and 0.055 ± 0.008. Moreover, whereas previous work achieves a lower R-squared value of 0.854 ± 0.018, the Coefficient of Determination (R-squared) values of the proposed model are 0.95, 0.92, and 0.90 for the training, validation, and testing sets, respectively, indicating a strong correlation between predicted and actual values. Compared with previous work, the proposed model thus shows enhanced accuracy, dependability, and generalizability.

Table 5: GAN performance

Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Generator loss | Training: 0.04, Validation: 0.05, Testing: 0.06 | 0.05 | 0.07 | 0.09
Discriminator loss | Training: 0.02, Validation: 0.03, Testing: 0.04 | 0.03 | 0.05 | 0.07
GAN convergence | 500 epochs, batch size 128, learning rate 0.001 | 1000 epochs | 800 epochs | 1200 epochs
Table 4: Performance metrics

Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Mean Absolute Error (MAE) | Training: 0.009, Validation: 0.012, Testing: 0.015 | 0.052 ± 0.008 | 0.065 ± 0.010 | 0.075 ± 0.012
Mean Squared Error (MSE) | Training: 0.001, Validation: 0.002, Testing: 0.003 | 0.003 ± 0.001 | 0.005 ± 0.002 | 0.007 ± 0.003
Root Mean Squared Error (RMSE) | Training: 1.2%, Validation: 1.5%, Testing: 1.8% | 0.055 ± 0.008 | 0.070 ± 0.010 | 0.085 ± 0.012
Coefficient of Determination (R-squared) | Training: 0.95, Validation: 0.92, Testing: 0.90 | 0.921 ± 0.013 | 0.895 ± 0.018 | 0.865 ± 0.022

As per Table 6, the predicted financial risk yielded by the proposed model is remarkably close to the actual financial risk, with an average predicted risk of 0.023 and a standard deviation of 0.005, compared with an average actual risk of 0.025 and a standard deviation of 0.006. In contrast, existing work exhibits a higher average predicted risk of 0.028, indicating a less accurate prediction. Furthermore, the proposed model achieves a risk prediction accuracy of 92%, with a precision of 90%, a recall of 94%, and an F1-score of 92%, surpassing the 85% accuracy achieved by existing work. This superior performance underscores the proposed model's ability to predict financial risk accurately, enabling financial institutions and investors to make informed decisions and mitigate potential losses.

Table 6: Risk prediction results

Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Predicted financial risk | 0.023 (average predicted risk: 0.023, standard deviation: 0.005) | 0.028 | 0.035 | 0.042
Actual financial risk | 0.025 (average actual risk: 0.025, standard deviation: 0.006) | 0.03 | 0.035 | 0.04
Risk prediction accuracy | 92% (precision: 90%, recall: 94%, F1-score: 92%) | 85% | 80% | 75%

As per Table 5, the generator loss of the proposed model is lower, with values of 0.04, 0.05, and 0.06 for the training, validation, and testing sets, respectively, compared with 0.08 for existing work.
Similarly, the discriminator loss of the proposed model is lower, with values of 0.02, 0.03, and 0.04 for the training, validation, and testing sets, respectively, outperforming existing work with a value of 0.05. While existing work requires 1,000 epochs, the proposed GAN model achieves convergence in only 500 epochs. This successful convergence is aided by well-chosen hyperparameter values: a batch size of 128 and a learning rate of 0.001. Overall, the proposed GAN model exhibits superior performance, stability, and efficiency compared with existing work, making it a more trustworthy and effective tool for producing synthetic financial data.

Table 7 shows that the model has a precision of 0.853 ± 0.021, a recall of 0.826 ± 0.025, and an F1-score of 0.839 ± 0.022 for low-risk predictions, meaning it is quite good at identifying low-risk situations. The model does better in the medium-risk category, with precision, recall, and F1-score values of 0.913 ± 0.015, 0.895 ± 0.018, and 0.904 ± 0.016, respectively, showing that it can reliably forecast medium-risk occurrences. Its ability to find high-risk situations is demonstrated by high precision, recall, and F1-score values of 0.952 ± 0.008, 0.935 ± 0.011, and 0.943 ± 0.009. Overall, the proposed model has a strong and accurate capacity to anticipate risk, which helps financial institutions and investors make sound choices and avoid losses.
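The per-risk-level precision, recall, and F1-score discussed above can be computed from predicted and true labels as follows (pure Python; the label names in the usage are illustrative):

```python
def prf1(pred, true, label):
    """Precision, recall, and F1-score for one risk level (one-vs-rest)."""
    tp = sum(p == label and t == label for p, t in zip(pred, true))
    fp = sum(p == label and t != label for p, t in zip(pred, true))
    fn = sum(p != label and t == label for p, t in zip(pred, true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Calling it once per level ("low", "medium", "high") reproduces the one-vs-rest breakdown reported in Table 7.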
Table 7: Risk level-based prediction results

Risk Level | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Low | Precision: 0.853 ± 0.021, Recall: 0.826 ± 0.025, F1-score: 0.839 ± 0.022 | Precision: 0.80, Recall: 0.75, F1-score: 0.77 | Precision: 0.75, Recall: 0.70, F1-score: 0.72 | Precision: 0.70, Recall: 0.65, F1-score: 0.67
Medium | Precision: 0.913 ± 0.015, Recall: 0.895 ± 0.018, F1-score: 0.904 ± 0.016 | Precision: 0.85, Recall: 0.80, F1-score: 0.82 | Precision: 0.80, Recall: 0.75, F1-score: 0.77 | Precision: 0.75, Recall: 0.70, F1-score: 0.72
High | Precision: 0.952 ± 0.008, Recall: 0.935 ± 0.011, F1-score: 0.943 ± 0.009 | Precision: 0.90, Recall: 0.85, F1-score: 0.87 | Precision: 0.85, Recall: 0.80, F1-score: 0.82 | Precision: 0.80, Recall: 0.75, F1-score: 0.77

In terms of optimal portfolio allocation, expected return, and expected risk (Table 8), the proposed technique makes far better decisions than earlier studies. The proposed model recommends a portfolio allocation of 65% stocks, 30% bonds, and 5% cash, whereas other work recommends 60% equities, 35% bonds, and 5% cash. The proposed model also has a greater expected return of 8.2% (with a standard deviation of 1.5%) than the 7.5% expected return of the prior study, and a lower expected risk of 4.5% (with a standard deviation of 1.2%), whereas previous research shows a higher expected risk of 5.5%. These findings demonstrate that the proposed approach can help investors and banks make better choices by optimizing their portfolios, maximizing returns, and lowering risk.

Table 8: Decision optimization results

Metric | Proposed Model | Existing Work 1 | Existing Work 2 | Existing Work 3
Optimized portfolio allocation | 65% stocks, 30% bonds, 5% cash | 60% stocks, 35% bonds, 5% cash | 55% stocks, 40% bonds, 5% cash | 70% stocks, 25% bonds, 5% cash
Expected return | 8.2% (standard deviation of expected return: 1.5%) | 7.50% | 7.00% | 8.00%
Expected risk | 4.5% (standard deviation of expected risk: 1.2%) | 5.50% | 6.00% | 4.80%

6 Conclusion

In this study, Generative Adversarial Networks (GANs) are used to anticipate financial risk dynamics and make optimal decisions. The model trains a risk prediction model on synthetic financial data produced by a GAN; based on the resulting financial risk prediction, the decision optimization model produces the optimal financial decisions. The model predicts risk well, with an MAE of 0.012 and an MSE of 0.002, and with a risk of 4.5% and a return of 8.2% it outperforms the compared machine learning methods. The model adapts to market volatility with an average return of 8.5% and a risk of 4.2%. It offers a novel technique for predicting financial risk dynamics and improving decision-making, and may be utilized for portfolio, risk, and investment choices. Future work should improve the risk prediction model, add elements to the decision optimization model, and discover new ways to apply the technology in banking.

References

[1] Bhat, A., Kulkarni, N., Husain, S., Yadavalli, A., Kaur, J. N., Shukla, A., & Seshadri, V. (2024). Speaking in terms of money: financial knowledge acquisition via speech data generation. ACM Journal on Computing and Sustainable Societies, 2(3), 1-35.
[2] Paiva, F. D., Cardoso, R. T. N., Hanaoka, G. P., & Duarte, W. M. (2019). Decision-making for financial trading: A fusion approach of machine learning and portfolio selection. Expert Systems with Applications, 115, 635-655.
[3] Tang, Y., Song, Z., Zhu, Y., Yuan, H., Hou, M., Ji, J., ... & Li, J. (2022). A survey on machine learning models for financial time series forecasting. Neurocomputing, 512, 363-380.
[3] Masini, R. P., Medeiros, M. C., & Mendes, E. F. (2023). Machine learning advances for time series forecasting. Journal of Economic Surveys, 37(1), 76-111.
[4] Wang, J., Hong, S., Dong, Y., Li, Z., & Hu, J. (2024). Predicting stock market trends using LSTM networks: overcoming RNN limitations for improved financial forecasting. Journal of Computer Science and Software Applications, 4(3), 1-7.
[5] Safwat, S., Mahmoud, A., Eldesouky Fattoh, I., & Ali, F. (2024). Hybrid deep learning model based on GAN and RESNET for detecting fake faces. IEEE Access, 12, 86391-86402. https://doi.org/10.1109/ACCESS.2024.3416910
[6] Shi, X., Zhang, Y., Yu, M., & Zhang, L. (2025). Deep learning for enhanced risk management: a novel approach to analyzing financial reports. PeerJ Computer Science, 11, e2661. https://doi.org/10.7717/peerj-cs.2661
[7] Huang, X., Han, M., & Deng, Y. (2024). A hybrid GAN-Inception deep learning approach for enhanced coordinate-based acoustic emission source localization. Applied Sciences, 14, 8811. https://doi.org/10.3390/app14198811
[8] Ren, S. (2022). Optimization of enterprise financial management and decision-making systems based on big data. Journal of Mathematics, 2022(1), 1708506.
[9] Qi, Q. (2022). Analysis and forecast on the price change of Shanghai stock index. Journal of Economics, Business and Management, 10(1), 72-78.
[10] Petrozziello, A., Troiano, L., Serra, A., Jordanov, I., Storti, G., Tagliaferri, R., & La Rocca, M. (2022). Deep learning for volatility forecasting in asset management. Soft Computing, 26(17), 8553-8574.
[11] Li, Y., & Pan, Y. (2022). A novel ensemble deep learning model for stock prediction based on stock prices and news. International Journal of Data Science and Analytics, 13(2), 139-149.
[12] Souto, H. G., & Moradi, A. (2023). Forecasting realized volatility through financial turbulence and neural networks. Economics and Business Review, 9(2), 133-159.
[13] Zhan, X., Ling, Z., Xu, Z., Guo, L., & Zhuang, S. (2024). Driving efficiency and risk management in finance through AI and RPA. Unique Endeavor in Business & Social Sciences, 3(1), 189-197.
[14] Wei, L., Deng, Y., Huang, J., Han, C., & Jing, Z. (2022). Identification and analysis of financial technology risk factors based on textual risk disclosures. Journal of Theoretical and Applied Electronic Commerce Research, 17(2), 590-612.
[15] Lei, Y., Qiaoming, H., & Tong, Z. (2023). Research on supply chain financial risk prevention based on machine learning. Computational Intelligence and Neuroscience, 2023(1), 6531154.
[16] Levytska, S., Pershko, L., Akimova, L., Akimov, O., Havrilenko, K., & Kucherovskii, O. (2022). A risk-oriented approach in the system of internal auditing of the subjects of financial monitoring. International Journal of Applied Economics, Finance and Accounting, 14(2), 194-206.
[17] Wang, H., & Budsaratragoon, P. (2023). Exploration of an "Internet+" grounded approach for establishing a model for evaluating financial management risks in enterprises. International Journal for Applied Information Management, 3(3), 109-117.
[18] Malki, A., Atlam, E., & Gad, I. (2022). Machine learning approach of detecting anomalies and forecasting time-series of IoT devices. Alexandria Engineering Journal, 61(11), 8973-8986. https://doi.org/10.1016/j.aej.2022.02.038
[19] Arunkumar, K. E., Kalaga, D. V., Mohan, C., Kumar, S., & Brenza, T. M. (2022). Comparative analysis of Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM) cells, Autoregressive Integrated Moving Average (ARIMA), and Seasonal Autoregressive Integrated Moving Average (SARIMA) for forecasting COVID-19 trends. Alexandria Engineering Journal, 61(10), 7585-7603. https://doi.org/10.1016/j.aej.2022.01.011
[20] Kendall, M. G. (1971). Review. Journal of the Royal Statistical Society, Series A (General), 134(3), 450-453. Published by Wiley for the Royal Statistical Society. Stable URL: http://www.jstor.or
[21] Sutiene, K., Schwendner, P., Sipos, C., Lorenzo, L., Mirchev, M., Lameski, P., Kabasinskas, A., Tidjani, C., Ozturkkal, B., & Cerneviciene, J. (2024). Enhancing portfolio management using artificial intelligence: literature review. Frontiers in Artificial Intelligence, 7, 1371502. https://doi.org/10.3389/frai.2024.1371502. PMID: 38650961; PMCID: PMC11033520.
[22] Xu, R., Yang, Y., Qiu, H., Liu, X., & Zhang, J. (2024). Research on multimodal generative adversarial networks in the framework of deep learning. Journal of Computing and Electronic Information Management, 12(3), 84-88.
[23] Dai, W., Tao, J., Yan, X., Feng, Z., & Chen, J. (2023, November). Addressing unintended bias in toxicity detection: An LSTM and attention-based approach. In 2023 5th International Conference on Artificial Intelligence and Computer Applications (ICAICA) (pp. 375-379). IEEE.
[24] Yao, J., Wu, T., & Zhang, X. (2023). Improving depth gradient continuity in transformers: A comparative study on monocular depth estimation with CNN. arXiv preprint arXiv:2308.08333.
[25] Wang, X. S., & Mann, B. P. (2020). Attractor selection in nonlinear energy harvesting using deep reinforcement learning. arXiv preprint arXiv:2010.01255.
[26] Zhang, Y., Jiang, Z., Peng, C., Zhu, X., & Wang, G. (2024). Management analysis method of multivariate time series anomaly detection in financial risk assessment. Journal of Organizational and End User Computing, 36(1), 1-19.
[27] Pandey, A., Mannepalli, P. K., Gupta, M., et al. (2024). A deep learning-based hybrid CNN-LSTM model for location-aware web service recommendation. Neural Processing Letters, 56, 234. https://doi.org/10.1007/s11063-024-11687-w
[28] Sun, Z., Wang, G., Li, P., Wang, H., Zhang, M., & Liang, X. (2024). An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Systems with Applications, 237(Part B), 121549. https://doi.org/10.1016/j.eswa.2023.121549
[29] Ilu, S. Y., & Prasad, R. (2023). Improved autoregressive integrated moving average model for COVID-19 prediction by using statistical significance and clustering techniques. Heliyon, 9(2), e13483. https://doi.org/10.1016/j.heliyon.2023.e13483
https://doi.org/10.31449/inf.v49i16.9643 Informatica 49 (2025) 315–330 315

GridRiskNet: A Two-Stage Hybrid Model for Project Investment Risk Management of Power Grid Enterprises Using Big Data Mining

Hongzhi Gao*, Dekyi, Metok
State Grid Tibet Electric Power Co., Ltd., Lhasa 850000, China
E-mail: Djgy1108@163.com
*Corresponding author

Keywords: power grid enterprise engineering project, GridRiskNet, big data mining, project investment risk management, two-stage hybrid modeling

Received: June 10, 2025

To enhance the power grid enterprise's ability to comprehensively perceive and dynamically assess investment risks in engineering projects, this study proposes a risk management model called GridRiskNet based on big data mining. This model integrates structured, unstructured, and spatiotemporal data and realizes intelligent identification of project risk probability distributions and potential impact ranges by constructing a two-stage hybrid modeling architecture. In the first stage, the model uses eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM) to extract static and dynamic features in parallel. In the second stage, it introduces a Graph Attention Recurrent Neural Network (GA-RNN) to model risk propagation paths under the power grid topology. Meanwhile, this study combines a Spatio-Temporal Graph Convolutional Network (ST-GCN) to improve the coupled expression of meteorological and text features. The experiment uses multi-source public data for verification, such as power infrastructure data from the U.S. Energy Information Administration, meteorological observation data from the National Oceanic and Atmospheric Administration, and power grid topology data from OpenStreetMap. The results show that GridRiskNet performs excellently in risk prediction stability and regional propagation modeling. Among them, the risk principal component analysis projection score in 2023 reached 7.779.
This indicates that cost overruns, climate pressure, and equipment technology risks together form a high-risk cluster, with cost overruns increasing by 269% compared with 2018. In the state-of-the-art comparison, GridRiskNet achieves an F1-score of 0.892, a Receiver Operating Characteristic - Area Under Curve (ROC-AUC) of 0.962, a Risk Impact Radius error of approximately 4.8 km, and a Risk Entropy of 0.89; these are comprehensively better than existing methods. Moreover, the model has good cross-modal feature fusion and risk transmission mechanism identification capabilities, and can effectively characterize the spatiotemporal coupling risk features in complex power grid projects. Overall, this system can provide power grid enterprises with structured and interpretable risk index outputs and regional early-warning support. Thus, it helps to improve the investment safety and the operation and maintenance resilience of projects.

Povzetek: GridRiskNet, a two-stage hybrid model for managing investment risks in power grid engineering projects, is presented. By cross-fusing structured, textual, and spatiotemporal data and applying XGBoost/LightGBM and GA-RNN, it improves risk prediction (F1 = 0.892, AUC = 0.962) and accurately models regional risk propagation (error 4.8 km).

1 Introduction

With the accelerated promotion of energy transition and the construction of new power systems, the strategic position of power grid engineering projects in national energy security and clean energy consumption has become increasingly prominent [1]. However, power grid enterprises face problems such as the surge of multi-source heterogeneous data, highly uncertain engineering environments, and frequent external disturbances during project investment and construction. These problems make it difficult for traditional risk management methods to cover the dynamic risk chain throughout the whole process from construction preparation and equipment deployment to operation and maintenance support [2]. Especially against the backdrop of the rapid development of renewable energy, the risk types in project investment are constantly evolving. For example, increasingly extreme climate, swift changes of equipment technology paths, and rising policy compliance costs all place higher requirements on the intelligence and adaptability of risk early-warning systems [3-5]. Therefore, constructing a big data mining-based intelligent risk assessment model has become a key path to improving the scientific soundness of investment decisions and the resilience governance capabilities of power grid enterprises [6, 7]. In the context of power market liberalization, the continuous increase in the proportion of renewable energy has made the risk management of geographical locations affected by network congestion increasingly important. Improving the ability to model location-related risks has become the core foundation for supporting project financing and investment feasibility assessment [8].

This study aims to achieve a comprehensive portrayal of investment risks in power grid engineering projects by constructing a composite model that integrates structured, spatiotemporal, and text data, and to verify the advantages of the proposed method in terms of risk identification accuracy, propagation path reducibility, and risk distribution stability. Thus, it supports power grid enterprises in risk early warning and decision optimization.

In recent years, artificial intelligence (AI) technologies have made remarkable progress in risk identification, modeling, and prediction.
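The two-stage pipeline announced above (parallel gradient-boosted base learners whose class-probability outputs feed a meta-model) can be illustrated with a minimal, hypothetical stacking sketch. scikit-learn's GradientBoostingClassifier stands in for XGBoost/LightGBM, and a logistic regression replaces the paper's GA-RNN meta-model; none of this is the authors' implementation.

```python
# Minimal stacking sketch: two boosted base learners feed a meta-model.
# GradientBoostingClassifier stands in for XGBoost/LightGBM; logistic
# regression is a toy substitute for the paper's GA-RNN meta-model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: two base learners trained in parallel on the (fused) features.
base_a = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
base_b = GradientBoostingClassifier(max_depth=2, random_state=1).fit(X_tr, y_tr)

# Stage 2: the meta-model consumes the stacked class-probability vectors.
meta_X_tr = np.hstack([base_a.predict_proba(X_tr), base_b.predict_proba(X_tr)])
meta = LogisticRegression().fit(meta_X_tr, y_tr)

meta_X_te = np.hstack([base_a.predict_proba(X_te), base_b.predict_proba(X_te)])
accuracy = meta.score(meta_X_te, y_te)
print(round(accuracy, 3))
```

In the real model the meta-input additionally carries the base learners' feature-importance sequences, which this sketch omits for brevity.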
Model architectures represented by graph neural networks (GNN), attention mechanisms, and deep semantic modeling have been gradually applied to financial risk control and energy dispatching [9, 10]. Some studies have attempted to introduce machine learning (ML) methods into the power engineering field, for example using eXtreme Gradient Boosting (XGBoost) for the classification and identification of construction anomalies, or employing a convolutional neural network (CNN) for trend prediction of construction period delays [11]. However, existing methods generally suffer from shortcomings such as a single model structure, weak data fusion capability, and difficulty in explaining cross-modal causal paths; these methods cannot effectively support power grid enterprises in achieving full-chain risk perception, dynamic quantification, and structural early warning in a multi-source data environment. Therefore, there is an urgent need to construct a multi-modal driven composite risk assessment system for power grid engineering scenarios.

To this end, this study proposes the GridRiskNet model based on big data mining and constructs a fusion mechanism for structured data, unstructured text, and spatiotemporal data, thereby realizing comprehensive modeling and dynamic evaluation of investment risks in power grid engineering projects. The study's main innovations encompass:

(1) Proposing a two-stage GridRiskNet model architecture: it integrates XGBoost and Light Gradient Boosting Machine (LightGBM) for risk capture, and models the propagation process of risks in the power grid topology through a Graph Attention Recurrent Neural Network (GA-RNN).

(2) Introducing a Spatio-Temporal Graph Convolutional Network (ST-GCN) and cross-modal attention mechanisms: they enhance the model's expression capabilities for meteorological disturbances and regional structural information.

(3) Constructing a risk principal component projection index system based on Principal Component Analysis (PCA): it achieves structural clustering and projection analysis of high-dimensional risk samples, and supports the differentiated regional risk management needs of power grid enterprises.

Overall, the specific research question is whether multimodal data fusion and risk propagation modeling methods can enhance the comprehensive capabilities of risk classification, propagation path identification, and risk uncertainty quantification in complex power grid engineering projects.

2 Related work

With the in-depth application of AI technologies and big data analysis methods in engineering management, investment project risk assessment has gradually shifted from traditional static analysis to intelligent prediction and dynamic modeling. Aiming at the insufficiency of risk assessment for manufacturing investments, Dong and Li proposed combining expert experience with big data mining to construct project risk indices and integrating a CNN with Long Short-Term Memory (LSTM) for predictive modeling. In multiple sliding window tests, the model achieved a Receiver Operating Characteristic (ROC) value of 0.9366 and an average accuracy of 94.95%, demonstrating high prediction precision [12]. Loseva et al., facing the risk assessment task of regional franchising projects, constructed a big data-based credit rating model by combining the SPARK information system with ML methods; Spearman correlation and confusion matrices verified the model's robustness in identifying abnormal risks [13]. These studies have provided useful insights into introducing composite modeling methods and integrating expert judgment with data-driven mechanisms, gradually promoting the development of investment risk assessment towards intelligence and systematization.

Over the years, methods such as GNNs, deep clustering, and multi-criteria decision-making have been widely introduced into investment evaluation and project classification, further enhancing the structural cognitive ability of risk assessment. Mostofi et al. constructed a construction project investment framework based on graph attention networks; it achieved a classification accuracy of over 98% in three sub-networks of region, country, and financing model, demonstrating the advantages of graph structures in modeling investment decision-making relationships [14]. Qi used regularized topic models and graph clustering methods to construct a financial investment "behavior circle", mapped customer behaviors to the latent semantic space, and realized risk classification of financial communities and investment plan recommendations through subgraph mining [15]. Moreover, Luo and Zhu proposed a deep neural network (DNN) model based on transfer learning for regional investment risk assessment; it maintained high prediction accuracy (up to 92%) in the case of insufficient samples, demonstrating the potential of deep learning in solving unbalanced data problems [16]. These studies all reflect the trend of risk assessment models in recent years towards deep representation learning, multi-layer decision-making structures, and complex graph relationship modeling.

Although existing studies have made positive progress in risk modeling methods, index system construction, and model accuracy improvement, there are still three main deficiencies. First, most current models focus on classification or regression prediction of risk probability, lacking the ability to model regional structural propagation characteristics. Second, the heterogeneity of multi-source data has not been fully utilized, and a unified representation for structured, spatiotemporal, textual, and other multimodal information has not been formed. Third, the interpretability and quantifiability of risk structure evolution must be enhanced; without them, it is difficult to support dynamic scheduling and regional risk management of complex systems such as power grid projects [17]. In response to the above shortcomings, this study proposes a grid engineering project investment risk management system based on big data mining - the GridRiskNet model. This model reveals the changing trends of high-dimensional risk structures and supports grid enterprises in accurately perceiving and dynamically controlling investment risks at different regions and time scales.

3 GridRiskNet model based on big data mining

3.1 Realization process of the GridRiskNet model

The proposed GridRiskNet model realizes intelligent assessment of investment risks in power grid engineering projects based on multi-source heterogeneous data fusion and a hybrid ML architecture. It first establishes a multimodal data preprocessing layer. For structured data (such as project budgets and equipment parameters), an adaptive normalization method is used to unify dimensions, ensuring the consistency of feature scales. For unstructured text data (including engineering logs and bidding documents), a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model is utilized to deeply extract semantic features, enhancing the risk perception ability for text information. For spatiotemporal data (such as construction trajectories and meteorological records), ST-GCN is introduced to jointly encode complex environmental features from two dimensions: spatial dependence and temporal dynamics [18, 19]. In the feature fusion stage, a cross-modal attention mechanism is designed, which can adaptively learn the weight relationships between different data modalities. Meanwhile, this mechanism can effectively integrate multi-source features and generate unified, dense, high-dimensional risk representation vectors, laying the foundation for multi-dimensional risk modeling [20].

At the core of the modeling, GridRiskNet adopts a two-stage hybrid modeling framework. In the first stage, the improved XGBoost and LightGBM models run in parallel to jointly perform risk prediction on the high-dimensional risk representation vectors. Specifically, XGBoost integrates a dynamic feature selection mechanism, which dynamically updates feature importance indices based on sliding-window statistical features to enhance the response capability to dynamic risk factors. LightGBM incorporates a time-series-aware splitting criterion to strengthen the detection capability for time-series anomalies such as project schedule delays. The two models output the prediction probabilities of risk categories (i.e., risk probability vectors after Softmax) and sequences of feature importance scores [21].

In the second stage, GA-RNN is used as a meta-model, whose core innovation lies in fusing the dual output information from the first stage. Specifically, GA-RNN takes the risk probability vectors of XGBoost and LightGBM as the main input; simultaneously, it introduces their feature importance score sequences as auxiliary features to form a comprehensively fused feature matrix. This matrix contains the risk prediction results from the previous stage; it also explicitly integrates the influence weights of features on the model output, thereby enhancing the ability to perceive risk propagation mechanisms [22]. Subsequently, based on this matrix, GA-RNN introduces a risk propagation graph structure and accurately models the transmission relationships between risk factors through an adjacency matrix. Moreover, it uses graph attention mechanisms and recurrent neural network (RNN) units to dynamically learn key nodes and main channels in risk propagation paths, extracting high-order interaction features.

The entire GridRiskNet model comprehensively optimizes the classification cross-entropy loss, the risk propagation graph reconstruction error, and feature stability regularization terms through an end-to-end joint training strategy. Finally, the model outputs a multi-dimensional risk assessment matrix covering risk probability distribution, potential impact range, and structural features. The entire system adopts an online incremental learning mechanism, which can continuously absorb real-time data flow to dynamically update model parameters; this achieves high adaptability and continuous tracking of the risk environment of power grid engineering projects. The implementation process and pseudocode of GridRiskNet are illustrated in Figures 1 and 2.

Figure 1: The implementation process of GridRiskNet

class GridRiskNet:
    def __init__(self, config):
        self.config = config
        self.preprocessor = MultiModalPreprocessor(config)
        self.feature_fusion = CrossModalAttention(config)
        self.first_stage = HybridEnsembleModels(config)
        self.risk_graph = RiskPropagationGraph(config)
        self.second_stage = GARNNMetaModel(config, self.risk_graph)

    def train(self, dataset):
        features = self.preprocessor.process(dataset)
        fused_features = self.feature_fusion(features)
        first_stage_preds = self.first_stage.train(fused_features, dataset.labels)
        self.second_stage.train(first_stage_preds, fused_features, dataset.labels)
        for epoch in range(self.config.epochs):
            preds = self.predict(dataset)
            loss = self._calculate_loss(preds, dataset.labels)
            self._update_models(loss)

    def predict(self, dataset):
        features = self.preprocessor.process(dataset)
        fused_features = self.feature_fusion(features)
        first_stage_preds = self.first_stage.predict(fused_features)
        return self.second_stage.predict(first_stage_preds, fused_features)

    def update_with_new_data(self, new_data):
        features = self.preprocessor.update_and_process(new_data)
        self.first_stage.update(features, new_data.labels)
        first_stage_preds = self.first_stage.predict(features)
        self.second_stage.update(first_stage_preds, features, new_data.labels)

class MultiModalPreprocessor:
    def process(self, dataset):
        return {
            'structured': self._process_structured(dataset.structured),
            'text': self._process_text(dataset.text),
            'spatiotemporal': self._process_spatiotemporal(dataset.spatiotemporal)
        }

    def _process_structured(self, data):
        return AdaptiveNormalization(data)

    def _process_text(self, data):
        return FineTuneBERT(self.bert_model, data)

    def _process_spatiotemporal(self, data):
        return STGCN(self.stgcn_params).forward(data)

class CrossModalAttention:
    def __call__(self, features):
        weights = self._compute_attention_weights(features)
        return weighted_sum(features, weights)

class HybridEnsembleModels:
    def __init__(self, config):
        self.xgboost = ImprovedXGBoost(config)
        self.lightgbm = ImprovedLightGBM(config)

    def train(self, features, labels):
        xgb_preds = self.xgboost.train(features, labels)
        lgbm_preds = self.lightgbm.train(features, labels)
        return combine_predictions(xgb_preds, lgbm_preds)

class RiskPropagationGraph:
    def __init__(self, config):
        self.adj_matrix = self._construct_adjacency_matrix(config.risk_factors)

    def _construct_adjacency_matrix(self, risk_factors):
        # Construct adjacency matrix based on domain knowledge or data learning
        pass

class GARNNMetaModel:
    def train(self, first_stage_preds, features, labels):
        # Train GA-RNN model
        pass

    def predict(self, first_stage_preds, features):
        # Predict risk assessment matrix
        pass

Figure 2: The pseudocode of GridRiskNet
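The cross-modal attention mechanism sketched in Section 3.1 (softmax-normalized bilinear scores weighting each modality before summation) can be made concrete with a small numerical sketch. It assumes, for simplicity, one feature vector per modality and an identity bilinear form; these assumptions and all names are mine, not the authors'.

```python
import numpy as np

def cross_modal_fusion(features, W_a, anchor=0):
    """Fuse per-modality feature vectors with bilinear attention weights.

    features: (m, d) array, one row per modality; W_a: (d, d) bilinear form.
    The anchor modality scores every modality; a softmax gives the weights.
    """
    f_i = features[anchor]
    scores = np.array([f_i @ W_a @ f_j for f_j in features])
    scores -= scores.max()                       # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features                    # weighted sum over modalities

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))                  # structured / text / spatiotemporal
fused = cross_modal_fusion(feats, np.eye(4))
print(fused.shape)
```

In the full model the bilinear form W_a is learned jointly with the rest of the network rather than fixed.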
3.2 Mathematical modeling principle of the GridRiskNet model

Figure 1 shows the complete implementation process of the GridRiskNet model, covering the entire flow from a user-requested risk assessment to model output results and continuous updates. The model is built on multi-source heterogeneous data, fusing structured, unstructured, and spatiotemporal information, and achieves intelligent prediction of power grid project risks through multi-stage ML and graph modeling strategies. The key computational links in the model are described mathematically as follows.

In the data preprocessing stage, the structured input data is first normalized. Let the original data matrix be:

$\mathbf{X}_s \in \mathbb{R}^{n \times d_s}$ (1)

$\mathbf{X}_s$ represents $n$ records, each containing $d_s$ structural features. The normalization is calculated as:

$\tilde{\mathbf{X}}_s = \dfrac{\mathbf{X}_s - \mu_s}{\sigma_s + \epsilon}$ (2)

$\mu_s$ denotes the column-wise mean vector, $\sigma_s$ the column-wise standard deviation (SD), and $\epsilon$ a small positive number that prevents the denominator from being zero. This processing ensures numerical consistency among features of different dimensions.

For unstructured text data $\mathcal{T} = \{t_1, t_2, \ldots, t_m\}$, semantic features are extracted by the fine-tuned BERT model, whose output is:

$\mathbf{H}_t = \mathrm{BERT}(\mathcal{T}) = [\mathbf{h}_1; \mathbf{h}_2; \ldots; \mathbf{h}_m], \quad \mathbf{h}_i \in \mathbb{R}^{d_t}$ (3)

$\mathbf{h}_i$ is the semantic vector of the $i$-th text, with dimension $d_t$. This step preserves the semantic relationships between text contexts and forms an important basis for the model to recognize risk semantics.

Spatiotemporal data, including trajectories and meteorology, is expressed as:

$\mathbf{X}_{st} \in \mathbb{R}^{T \times N \times F}$ (4)

$T$ refers to the number of time steps, $N$ to the number of spatial nodes (such as sites), and $F$ to the spatiotemporal feature dimension of each node. ST-GCN is used for modeling, and its core propagation equation is:

$\mathbf{Z}^{(l+1)} = \sigma\left(\sum_{k=0}^{K} \mathbf{A}_k \mathbf{Z}^{(l)} \mathbf{W}_k\right)$ (5)

$\mathbf{A}_k$ is the adjacency matrix of order $k$; $\mathbf{Z}^{(l)}$ is the node representation of the $l$-th layer; $\mathbf{W}_k$ is the weight matrix; $\sigma$ is the activation function. This network structure captures the linkage between spatial topology and temporal evolution.

In the feature fusion stage, the model introduces a cross-modal attention mechanism to automatically aggregate multi-source information. Let two modal features be $\mathbf{F}_i$ and $\mathbf{F}_j$; their attention weights are calculated as:

$\alpha_{i,j} = \dfrac{\exp(\mathbf{F}_i^{\top} \mathbf{W}_a \mathbf{F}_j)}{\sum_k \exp(\mathbf{F}_i^{\top} \mathbf{W}_a \mathbf{F}_k)}$ (6)

After fusion, a unified risk representation vector is obtained:

$\mathbf{F}_{fusion} = \sum_j \alpha_{i,j} \cdot \mathbf{F}_j$ (7)

This mechanism enables the model to automatically learn the most discriminating risk signal source when faced with heterogeneous features and semantic diversity.

The hybrid modeling framework is divided into two stages. In the first stage, the improved XGBoost and LightGBM models are run in parallel. The objective function of XGBoost reads:

$\mathcal{L}_{xgb} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$ (8)

$\hat{y}_i = \sum_{k=1}^{K} f_k(\mathbf{x}_i)$ represents the predicted value of sample $i$, and $\Omega(f_k) = \gamma T_k + \frac{1}{2}\lambda \|\omega_k\|^2$ is the regularization term of the $k$-th tree.

To adapt to dynamic changes over time, XGBoost integrates a sliding-window statistical module to dynamically adjust feature importance:

$I_j^{(t)} = \sum_{s=t-w}^{t} \Delta G_j^{(s)}$ (9)

$\Delta G_j^{(s)}$ indicates the gain change of the $j$-th feature at the $s$-th time step. $I_j^{(t)}$ is a dynamic feature importance index within the XGBoost stage that reflects gain changes within the sliding window; it is mainly applied to internal feature selection and dynamic weight adjustment of the first-stage model.

LightGBM introduces a time-series-aware split criterion to enhance anomaly recognition. Let the time series samples be $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T\}$; the splitting gain is defined as:

$\mathcal{G}_j = \sum_{t=1}^{T} w_t \cdot \left[\dfrac{\left(\sum_{i \in L_t} g_i\right)^2}{\sum_{i \in L_t} h_i + \lambda} + \dfrac{\left(\sum_{i \in R_t} g_i\right)^2}{\sum_{i \in R_t} h_i + \lambda}\right]$ (10)

$g_i$ and $h_i$ are the first- and second-order gradients. $L_t$ and $R_t$ represent the left and right sample sets of the current split, and $w_t = e^{-\beta(T-t)}$ is the time attenuation weight.

In the second stage, GA-RNN is used to capture high-order risk paths. Its node state is updated as:

$\mathbf{h}_i^{(t)} = \mathrm{GRU}\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} \mathbf{h}_j^{(t-1)}, \; \mathbf{h}_i^{(t-1)}\right)$ (11)

$\mathcal{N}(i)$ indicates the neighbor set of node $i$, and $\alpha_{ij}$ is the edge weight under the graph attention mechanism:

$\alpha_{ij} = \dfrac{\exp(\mathrm{LeakyReLU}(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \| \mathbf{W}\mathbf{h}_j]))}{\sum_{k \in \mathcal{N}(i)} \exp(\mathrm{LeakyReLU}(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \| \mathbf{W}\mathbf{h}_k]))}$ (12)

Finally, the system integrates three kinds of objectives - classification performance, graph structure reconstruction, and feature stability - by jointly optimizing the overall loss function, where $\mathcal{L}_{ce}$ is the cross-entropy loss:

$\mathcal{L}_{total} = \mathcal{L}_{ce} + \lambda_1 \cdot \mathcal{L}_{graph} + \lambda_2 \cdot \mathcal{L}_{reg}$ (13)

The graph structure consistency loss is:

$\mathcal{L}_{graph} = \|\mathbf{A} - \hat{\mathbf{A}}\|_F^2$ (14)

The feature disturbance regularization term reads:

$\mathcal{L}_{reg} = \sum_{j=1}^{d} \mathrm{Var}(\nabla_{\mathbf{x}_j} \hat{y})$ (15)

At the system deployment level, GridRiskNet adopts an online incremental learning mechanism. Let the current parameters be $\theta_t$; the model is updated after receiving a new sample $(\mathbf{x}_t, y_t)$:

$\theta_{t+1} = \theta_t - \eta \cdot \nabla_{\theta} \mathcal{L}(\mathbf{x}_t, y_t; \theta_t)$ (16)

$\eta$ represents the learning rate and $\nabla_{\theta}$ the gradient operator. This mechanism ensures that the model has adaptive update abilities in a dynamic risk environment.

4 Experimental analysis of GridRiskNet model project investment risk management based on big data mining

4.1 Data used in the study

To verify the risk management capability of the GridRiskNet model for power grid enterprise engineering projects, the study uses three core public datasets for experimental validation and designs a fusion scheme for data heterogeneity. First, the structured data adopts the U.S. Energy Information Administration (EIA) power infrastructure dataset (https://www.eia.gov/electricity/data.php). Its API interface screens power grid engineering project data from 2018 to 2023, including budget, construction period, equipment models, and other fields. After the original CSV-format data is extracted using Python's eia-python library, adaptive normalization is performed to eliminate dimension differences; the records are associated with the subsequent spatiotemporal data through project IDs and date fields. Second, the spatiotemporal data selects the National Oceanic and Atmospheric Administration (NOAA) Global Historical Climatology Network-Daily (GHCN-Daily) (https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00861). Daily values of temperature, precipitation, and wind speed are downloaded, and stations are matched to the project's geographic coordinates. The rnoaa toolkit converts them into spatiotemporal tensors, from which meteorological risk features are extracted through ST-GCN encoding. The spatial topology data is obtained from the OpenStreetMap power network dataset (https://wiki.openstreetmap.org/wiki/Power_networks). The OSMnx library extracts GIS data of substations and transmission lines, constructing an adjacency matrix to model the physical connections of the power grid. For unstructured text data, engineering accident reports from 2018 to 2023 corresponding to EIA projects are manually screened from the Federal Energy Regulatory Commission (FERC) engineering accident report library (https://elibrary.ferc.gov/eLibrary/search). After the text is parsed with Apache Tika, it is input to the fine-tuned BERT to generate semantic vectors.

The following fusion strategies are adopted to address the heterogeneity of multi-source data. 1) Temporal alignment: all data is uniformly converted to Universal Time Coordinated (UTC) timestamps and aggregated at a granularity of 1 day. 2) Spatial alignment: meteorological stations, power grid nodes, and engineering sites are associated through GIS coordinate matching (error <1 km). 3) Consistency of feature encoding: structured data is normalized to [0, 1], text vectors are unified into 768 dimensions via BERT, and spatiotemporal data is compressed into 256-dimensional features through ST-GCN. 4) Cross-modal attention mechanisms automatically learn the weights of each modality, assigning higher attention scores to extreme values and meteorological text descriptions (such as "hurricane damage"). The specific process of importing data into GridRiskNet is presented in Figure 3.

Figure 3: The specific process of importing data into GridRiskNet

4.2 Analysis of the GridRiskNet model's risk management ability for power grid enterprises' engineering projects

The study analyzes the risk management capability of the GridRiskNet model for power grid enterprise engineering projects from two aspects. First, at the level of risk probability distribution, the probability of risks such as budget overrun and construction period delay is evaluated based on structured data and spatiotemporal features [23]. Second, at the level of potential impact scope, the propagation path of risks in the power grid topology is analyzed through the GNN to identify high-risk nodes and their potentially affected surrounding areas. The model inputs the fused multi-dimensional features into the two-stage modeling framework and outputs a risk assessment matrix including the above two types of indices to support the refined and structured management and decision-making of power grid project risks [24, 25]. The key indices and evaluation criteria of the entire analysis are exhibited in Tables 1 and 2.
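Before turning to the index tables, the temporal-alignment step of Section 4.1 (convert timestamps to UTC, aggregate to 1-day granularity, then join sources on project ID and date) can be sketched with pandas. The column names and sample values below are illustrative inventions, not fields from the EIA or NOAA datasets.

```python
import pandas as pd

# Hypothetical project records and weather observations (invented columns).
projects = pd.DataFrame({
    "project_id": ["P1", "P1", "P2"],
    "timestamp": pd.to_datetime(
        ["2023-05-01 08:00", "2023-05-01 17:30", "2023-05-02 09:15"], utc=True),
    "budget_spend": [120.0, 80.0, 300.0],
})
weather = pd.DataFrame({
    "project_id": ["P1", "P2"],
    "timestamp": pd.to_datetime(["2023-05-01 12:00", "2023-05-02 00:00"], utc=True),
    "wind_speed": [14.2, 7.8],
})

# 1-day granularity: truncate UTC timestamps to the date, then aggregate.
for df in (projects, weather):
    df["date"] = df["timestamp"].dt.floor("D")
daily_spend = projects.groupby(["project_id", "date"], as_index=False)["budget_spend"].sum()

# Join the modalities on project ID and aligned date.
fused = daily_spend.merge(weather[["project_id", "date", "wind_speed"]],
                          on=["project_id", "date"], how="left")
print(fused)
```

The same pattern extends to the spatial-alignment step by adding a nearest-station key computed from GIS coordinates.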
Table 1: Explanation of key indices for analysis using the GridRiskNet model Analytical Index Data source/calculation method Description dimension Reflecting the distribution Principal component score of high- position of samples in the risk dimensional risk vector output by the principal component space, and is Risk PCA Projection Score GridRiskNet model after PCA used to identify high-risk dimensionality reduction clustering or structural abnormal samples. The capture times of abnormal events Monitoring the frequency of Risk Time-series Anomaly Frequency in LightGBM abnormal progress. probability Softmax outputs the maximum Evaluating the credibility of the distribution Model Confidence Score probability value model output analysis Assessing the dispersion degree The ratio of SD to the mean value of of risk probability distribution, Risk Coefficient of Variation the risk probability distribution the greater it is, the higher the risk instability is. Comprehensive weighted scores of Representing the strength of risk Risk Importance Index multiple dimensions influence Information entropy calculation of The degree of uncertainty in Risk Entropy risk probability distribution evaluating risk results. 
Risk Propagation Path Length: the critical path length identified in the GA-RNN; used to analyze the length and complexity of the risk propagation path.
Node Vulnerability Score: the weighted average of the affected probability of each node in the GNN; reflects the vulnerability of nodes in the power grid.
Risk Impact Radius: based on the propagation path depth and the spatial adjacency matrix in the risk propagation graph structure; indicates the physical scope of potential influence.

Table 2: Criteria for determining key indices in the GridRiskNet model analysis

Index | Type | Criteria
Risk PCA Projection Score | Secondary calculation | [0, 2) Low projection; [2, 5) Medium projection; ≥5 High projection, tending to abnormal samples or extreme types
Time-series Anomaly Frequency | Model output | [0, 2) Normal; [2, 5) Early warning; ≥5 Abnormal
Model Confidence Score | Model output | [0.9, 1] High credibility; [0.7, 0.9) Medium credibility; <0.7 Low credibility
Risk Coefficient of Variation | Secondary calculation | [0, 0.3) Stable; [0.3, 0.6) Fluctuating; ≥0.6 Highly unstable
Risk Importance Index | Secondary calculation | [0, 40) Secondary; [40, 70) Important; [70, 100] Critical
Risk Entropy | Secondary calculation | [0, 1) Low uncertainty; [1, 2) Medium; ≥2 High
Risk Propagation Path Length | Model output | [1, 3) Local; [3, 6) Regional; ≥6 Global
Node Vulnerability Score | Model output | [0, 0.4) Low; [0.4, 0.7) Medium; [0.7, 1] High
Risk Impact Radius | Secondary calculation | [0, 5) Station level; [5, 20) Line level; ≥20 Regional level

322 Informatica 49 (2025) 315–330 H. Gao et al.

In Table 2, the equations of the indices obtained by secondary calculation are as follows.

(1) Risk PCA Projection Score

"Risk PCA Projection Score" measures the position of a sample in the risk feature space, revealing the main variation trends in complex multi-dimensional risk features. Specifically, this index is calculated based on the PCA method. First, it standardizes the annual high-dimensional risk features (such as cost overrun risk and environmental and climate pressure). Then, it extracts the first $K$ principal component directions and measures the sample's projection value in the principal component space through eigenvalue weighting. This score reflects the degree of variance contribution of the sample along the principal component axes of risk, rather than a simple sum of the scores of each risk factor. Because the statistical distributions of the risk features differ from year to year, this index changes with the year; it comprehensively reflects the overall trend of the risk structure of power grid projects in the current year and potential abnormal clustering characteristics. The calculation is expressed as:

$s_i = \sum_{k=1}^{K} \lambda_k \cdot (\mathbf{u}_k^{\top}(\mathbf{x}_i - \boldsymbol{\mu}))^2$  (17)

$\mathbf{x}_i$ represents the high-dimensional risk feature vector of the $i$-th sample; $\boldsymbol{\mu}$ is the sample mean vector; $\mathbf{u}_k$ denotes the eigenvector of the $k$-th principal component; $\lambda_k$ is the eigenvalue of the $k$-th principal component; and $K$ is the number of selected principal components.

(2) Risk Coefficient of Variation

This index measures the relative dispersion of the risk probability distribution and is an important indicator of risk instability. It describes the overall fluctuation range of the various risk probabilities by calculating the ratio of the standard deviation of the risk probabilities to their mean. A higher value indicates that the risk probability distribution is more dispersed and the overall instability is stronger. The expression is:

$CV = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - \bar{p})^2}}{\bar{p} + \epsilon}$  (18)

$p_i$ denotes the prediction probability of Class $i$ risk; $\bar{p}$ is the average of the risk probabilities; and $n$ is the total number of risk categories.

(3) Risk Entropy

"Risk Entropy" measures the degree of uncertainty in the risk probability distribution, reflecting the discreteness and unpredictability of risk outcomes. Based on information entropy theory, this index reveals the potential risk mixture in the system by calculating the entropy of the probabilities of all risk categories. A higher risk entropy value indicates more uncertainty in the dominant risk structure within the system, which helps to identify complex and unpredictable risk scenarios. It is represented as:

$H = -\sum_{i=1}^{n} p_i \cdot \log_2(p_i + \epsilon)$  (19)

$H$ denotes the information entropy of the risk distribution.

(4) Risk Importance Index

This index quantifies the comprehensive contribution of each risk feature to the overall risk assessment results. It reflects the importance level of each risk feature through weighted accumulation of each feature's impact on the model loss, with normalized averaging combined with the model weights. Features with higher values play a greater role in the overall risk decision-making. It is expressed as:

$RI_j = \frac{1}{T}\sum_{t=1}^{T} \frac{w_j^{(t)} \cdot \Delta L_j^{(t)}}{\sum_{k=1}^{d} \Delta L_k^{(t)}}$  (20)

$RI_j$ represents the risk importance index of the $j$-th feature; it is a unified, global-level index within the GridRiskNet framework, comprehensively calculated from the feature weights and loss impacts during global model training. $T$ is the number of model iterations (averaging steps); $w_j^{(t)}$ is the model weight of the $j$-th feature in the $t$-th iteration; $\Delta L_j^{(t)}$ is the influence of the $j$-th feature on the loss function; and $d$ is the total number of features.

(5) Risk Impact Radius

This index evaluates the spatial propagation range of risks in the power grid graph structure, serving as a key index for measuring the physical scope affected by risks. It calculates the average impact radius of all risk source nodes in the network based on the power grid topology, the geographical distance between nodes, and the risk propagation probability. A larger value indicates a wider spatial propagation range of risk events, which is applied to regional risk impact analysis, as follows:

$R = \frac{1}{N_s}\sum_{i=1}^{N_s}\sum_{j=1}^{N} t_{ij} \cdot d_{ij} \cdot p_{ij}$  (21)

$N_s$ represents the number of risk source nodes; $N$ denotes the total number of nodes in the graph; $t_{ij}$ is the adjacency relationship between nodes $i$ and $j$ (1 means connected); $d_{ij}$ is the geographical distance between the nodes; and $p_{ij}$ is the risk propagation probability from node $i$ to node $j$.
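As a quick, hedged illustration of how a secondary-calculation index combines with the Table 2 bands, the sketch below evaluates Eq. (19) on a purely hypothetical probability vector and maps the result to its Table 2 band; the probabilities are illustrative, not from the paper's data.

```python
import math

# Risk Entropy per Eq. (19): H = -sum_i p_i * log2(p_i + eps)
def compute_risk_entropy(probabilities, epsilon=1e-6):
    return -sum(p * math.log2(p + epsilon) for p in probabilities)

def entropy_band(h):
    # Table 2 bands for Risk Entropy: [0, 1) low uncertainty; [1, 2) medium; >= 2 high
    if h < 1:
        return "low uncertainty"
    if h < 2:
        return "medium"
    return "high"

probs = [0.7, 0.1, 0.1, 0.05, 0.05]  # hypothetical risk-class probabilities
h = compute_risk_entropy(probs)
print(round(h, 2), entropy_band(h))  # 1.46 medium
```

A near-uniform vector such as [0.2]*5 would instead land in the "high" band, matching the intuition that entropy peaks when no single risk class dominates.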
Figure 4 presents the pseudocode of the index implementation involving secondary calculation.

GridRiskNet: A Two-Stage Hybrid Model for Project Investment… Informatica 49 (2025) 315–330 323

```python
import math
import numpy as np

# 1. Risk PCA Projection Score (Eq. 17)
def compute_risk_pca_projection_scores(X, K):
    mu = np.mean(X, axis=0)
    X_centered = X - mu
    cov_matrix = np.cov(X_centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    sorted_idx = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[sorted_idx][:K]
    eigenvectors = eigenvectors[:, sorted_idx][:, :K]
    # The published listing truncates here; the return below is
    # reconstructed from Eq. (17): s_i = sum_k lambda_k * (u_k . (x_i - mu))^2
    projections = X_centered @ eigenvectors
    return (projections ** 2) @ eigenvalues

# 2. Risk Coefficient of Variation (Eq. 18)
def compute_risk_cv(probabilities):
    mean_p = np.mean(probabilities)
    std_p = np.std(probabilities)
    epsilon = 1e-6
    return std_p / (mean_p + epsilon)

# 3. Risk Entropy (Eq. 19)
def compute_risk_entropy(probabilities):
    epsilon = 1e-6
    return -sum(p * math.log2(p + epsilon) for p in probabilities)

# 4. Risk Importance Index (Eq. 20)
def compute_risk_importance(weights, delta_losses):
    T = len(weights)
    D = len(weights[0])
    importance = [0.0] * D
    for j in range(D):
        for t in range(T):
            total_delta = sum(delta_losses[t])
            if total_delta == 0:
                continue
            importance[j] += weights[t][j] * delta_losses[t][j] / total_delta
        importance[j] /= T
    return importance

# 5. Risk Impact Radius (Eq. 21)
def compute_risk_impact_radius(adj_matrix, distance_matrix,
                               propagation_probs, source_nodes):
    N = len(adj_matrix)
    total_radius = 0.0
    for i in source_nodes:
        for j in range(N):
            if adj_matrix[i][j] == 1:
                total_radius += distance_matrix[i][j] * propagation_probs[i][j]
    return total_radius / len(source_nodes)
```

Figure 4: Pseudocode of index implementation involving secondary calculation

The experimental environment and key parameters are detailed in Table 3.

Table 3: Experimental environment and key parameters of the study

Category | Configuration item | Parameter setting
Hardware environment | Computing platform | NVIDIA A100 (40GB memory) × 4
Hardware environment | CPU | AMD EPYC 7763 (64-core)
Hardware environment | Memory | 512GB DDR4
Software environment | Deep learning framework | PyTorch 1.12 + CUDA 11.6
Software environment | GNN library | PyTorch Geometric 2.2.0
Software environment | Traditional ML library | XGBoost 1.6 + LightGBM 3.3.2
Software environment | NLP toolkit | HuggingFace Transformers 4.25 (BERT-base)
Model architecture | ST-GCN layer number | 3 layers (hidden layer dimension = 256)
Model architecture | GA-RNN unit | Graph attention layer (8 heads) + GRU (hidden layer = 512)
Model architecture | Cross-modal attention mechanism | Multi-head attention (4 heads, fusion dimension = 1024)
Training parameters | Batch size | 256 (structured data) / 32 (graph data)
Training parameters | Initial learning rate | 3e-4 (AdamW optimizer)
Training parameters | Regularization | L2 weight decay = 1e-5 + Dropout = 0.3
Training parameters | Early stop mechanism | Validation loss does not decrease for 10 consecutive rounds

The study designs ablation experiments before conducting the formal experiments to verify the actual contribution of each core component of GridRiskNet, seeking to quantitatively measure, from a systematic perspective, the impact of different modules on the model's overall performance. Specifically, four ablation versions are set by sequentially disabling the cross-modal attention mechanism, the risk propagation modeling module of GA-RNN, the dynamic feature selection module, and the risk propagation graph reconstruction term in the joint loss function. All experiments maintain the same hyperparameter configuration on the complete dataset and focus on evaluating three indices: risk classification performance (F1-score, Receiver Operating Characteristic - Area Under Curve (ROC-AUC)), risk propagation accuracy (Risk Impact Radius error), and uncertainty quantification ability (Risk Entropy). This experiment aims to clarify the mechanism of action of each module, especially their specific contributions to power grid risk transmission modeling, modal feature fusion, and risk stability control.
The results of the ablation experiments are listed in Table 4.

Table 4: Ablation experimental results of the GridRiskNet model

Ablation version | F1-Score | ROC-AUC | Risk Impact Radius error (km) | Risk Entropy
Full GridRiskNet | 0.892 | 0.962 | 4.8±0.9 | 0.89
No cross-modal attention | 0.835 | 0.917 | 7.5±1.6 | 1.12
No GA-RNN | 0.846 | 0.926 | 14.2±2.3 | 0.96
No dynamic feature selection | 0.863 | 0.941 | 5.7±1.2 | 0.94
No risk propagation graph reconstruction | 0.871 | 0.948 | 4.9±1.0 | 2.08

The results of the ablation experiments indicate that each module of GridRiskNet makes a significant contribution to model performance. The cross-modal attention mechanism is particularly crucial for classification performance: after it is disabled, the F1-score decreases by 6.4%, the ROC-AUC drops by 4.7%, and the Risk Entropy rises significantly, showing that this module strongly affects the collaborative perception of complex semantic and meteorological features. The risk propagation modeling module of GA-RNN mainly governs the Risk Impact Radius error: after it is disabled, the error increases sharply to 14.2 km, verifying its core role in power grid topology modeling. The dynamic feature selection module mainly enhances the temporal sensitivity of the model; its removal leads to a significant drop in F1-score, although it has a limited impact on propagation errors. The risk propagation graph reconstruction term has a significant effect on suppressing prediction fluctuations and optimizing uncertainty quantification; its elimination causes a substantial rise in Risk Entropy. Overall, GridRiskNet achieves the unity of high performance and high robustness through the collaboration of its modules, with all components being indispensable.
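The relative drops quoted for the cross-modal attention ablation follow directly from the Table 4 values; a minimal check, reading the quoted percentages as relative decreases from the full model's scores:

```python
# Full GridRiskNet vs. "No cross-modal attention" (Table 4).
full_f1, full_auc = 0.892, 0.962
ablated_f1, ablated_auc = 0.835, 0.917

f1_drop_pct = (full_f1 - ablated_f1) / full_f1 * 100    # relative drop, %
auc_drop_pct = (full_auc - ablated_auc) / full_auc * 100

print(round(f1_drop_pct, 1), round(auc_drop_pct, 1))  # 6.4 4.7
```

Both values match the 6.4% and 4.7% figures quoted in the text.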
4.3 Analysis results of the GridRiskNet model on the risk management ability of power grid enterprises' engineering projects

4.3.1 Risk probability distribution analysis

GridRiskNet's annual Risk PCA Projection Score results for power grid enterprise engineering projects are summarized in Table 5.

Table 5: Annual Risk PCA Projection Score results

Year | Cost overrun risk C1 | Ambient climate pressure C2 | Supply chain fluctuation C3 | Equipment technical risk C4 | Policy compliance risk C5 | Risk PCA Projection Score | Risk tendency
2018 | 1.235 | 0.873 | -0.452 | 0.217 | 0.095 | 2.108 | Middle projection (structural abnormality)
2019 | 0.892 | 0.654 | -0.128 | -0.304 | 0.062 | 1.546 | Low projection
2020 | 2.874 | 1.982 | 1.235 | -0.873 | 0.517 | 4.856 | High projection (extreme type)
2021 | 1.023 | 1.457 | 0.782 | 0.396 | -0.215 | 2.480 | Middle projection
2022 | 3.125 | 2.769 | 2.014 | 1.358 | -0.947 | 5.894 | High projection (abnormal clustering)
2023 | 4.562 | 3.217 | 3.058 | 2.146 | 1.372 | 7.779 | High projection (extreme anomaly)

Table 5 shows that cost overrun risk (C1) and environmental climate pressure (C2) have consistently been the dominant risks, especially showing exponential growth after 2020. In 2023, C1 (4.562) increased by 269% compared with 2018 (1.235), which is highly consistent with the reality of global inflation and frequent extreme weather. The sudden turn to a positive value (1.372) of policy compliance risk (C5) in 2023 reveals, to some extent, the surge in compliance costs brought about by the deepening of the "double carbon" policy. Through the spatial distribution of the principal components, the model reflects high-risk clustering scenarios such as C1-C3 in 2023, providing early warning of composite risks.

Based on the above analysis, the risk probability distribution analysis of grid enterprise engineering projects by GridRiskNet is organized, and the annual average results of the other indices are shown in Figure 5.

[Figure 5: The annual average results of other indices in GridRiskNet risk probability distribution analysis. Plotted indices: Time-series Anomaly Frequency, Model Confidence Score, Risk Coefficient of Variation, Risk Importance Index, Risk Entropy (years 2018-2023). Note: The curves of each index correspond one-to-one with the corresponding color coordinate axes on the right.]

In Figure 5, regarding the frequency of time-series anomalies, the average annual growth rate of abnormal events during 2019-2023 reached 65.7%. The model objectively reflects the increasing complexity of risks through the continuous decline in confidence (from 0.912 to 0.632). The sudden increase in risk entropy (2.158) in 2020 preceded the peak of the importance index (83.47); this indicates that GridRiskNet can capture the implicit correlations of risk factors through information entropy. The synchronous increase in the coefficient of variation (from 0.712 to 0.859) and risk entropy (from 2.547 to 2.981) after 2022 reveals a transformation of the risk distribution from centralized to discretized; this provides key evidence for power grid enterprises to optimize the allocation of risk reserve funds. The core advantage of the model lies in the quantitative modeling of the dynamic coupling relationship among the three dimensions of engineering anomalies, risk uncertainty, and impact degree. Meanwhile, it realizes full-chain risk assessment from "anomaly detection" to "impact prediction".
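The year-over-year changes quoted in Section 4.3.1 can be recomputed directly from Table 5; for example, the 269% increase of C1:

```python
# C1 (cost overrun risk) from Table 5: 1.235 in 2018, 4.562 in 2023.
c1_2018, c1_2023 = 1.235, 4.562
growth_pct = (c1_2023 - c1_2018) / c1_2018 * 100
print(round(growth_pct))  # 269
```

The rounded result reproduces the 269% figure in the text.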
4.3.2 Analysis of potential influence range

The study divides the U.S. power grid into three major regions: the Eastern Interconnection Power Grid (EIPG), the Western Interconnection Power Grid (WIPG), and the Texas Interconnected Power Grid (TIPG). The EIPG covers the eastern, midwestern, and parts of the southern U.S. states, extending northward into eastern Canada. The WIPG covers most western U.S. states, connecting with western Canada in the north and reaching parts of Mexico in the south. The TIPG includes most of Texas. These regional grids are interconnected at a limited number of DC points but mostly operate independently. Based on this, GridRiskNet's analysis results on the potential impact scope of power grid enterprise engineering projects are displayed in Figure 6.

[Figure 6: Analysis of GridRiskNet's potential impact on power grid enterprise engineering projects by interconnected area (EIPG, WIPG, TIPG); panels (a) 2018, (b) 2019, (c) 2020, (d) 2021, (e) 2022, (f) 2023. Each panel plots Risk Propagation Path Length, Node Vulnerability Score, and Risk Impact Radius (km). Note: The curves of each index correspond one-to-one with the corresponding color coordinate axes on the right.]

Based on the index definitions and annual data, GridRiskNet demonstrates scientific rigor and structural insight in the analysis of potential impact ranges. First, for Risk Propagation Path Length, WIPG remains at a high level throughout the entire period, reaching 8.1 in 2023 and significantly exceeding the other regions. This gap is not accidental but a reflection of long-term structural characteristics, revealing the extensibility of transmission links in the western power grid due to complex terrain and diverse energy structures. Second, the changing trend of the Node Vulnerability Score is more enlightening: the scores of all three major power grids rose sharply in 2020, with the average value doubling compared to the previous year. This synchronous surge aligns closely with the global external shock events of 2020, indicating that the model is highly sensitive to network vulnerability under systemic disturbances.

In addition, the Risk Impact Radius index essentially measures the physical diffusion capacity of risks from source nodes into the surrounding space; its calculation integrates network topology, geographical distance, and propagation probability. According to the data, WIPG's Risk Impact Radius rapidly increased from 10.8 km in 2021 to 25.7 km in 2022, and further to 35.2 km in 2023, a cumulative increase of over 225% in two years. TIPG also showed a continuous expansion between 2022 and 2023, reaching 26.4 km in 2023, reflecting the significant cumulative effect of regional risk diffusion. This spatial diffusion trend is not caused by single-year fluctuations but by the accumulation of continuous transmission chains; its essence is the expansion of the scope of power grid risks through multiple rounds of transmission and cross-node amplification, which is especially obvious in scenarios with multiple overlapping risks. The reason GridRiskNet can effectively capture this phenomenon lies in the deep coupling of its GNN and propagation probability mechanism: it can dynamically track the evolution of risk paths and ranges in complex networks, thereby identifying the critical points and amplification effects of risk diffusion. It therefore possesses real value in regional risk monitoring and trend early warning. The capture of this cumulative diffusion trend reflects the model's structural sensitivity to "spatiotemporal overlapping risks", which far exceeds the descriptive capability of traditional static single indices.
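The "cumulative increase of over 225% in two years" for WIPG's Risk Impact Radius follows from the quoted values; a one-line verification:

```python
# WIPG Risk Impact Radius: 10.8 km in 2021, 35.2 km in 2023 (from the text).
r_2021, r_2023 = 10.8, 35.2
growth_pct = (r_2023 - r_2021) / r_2021 * 100
print(round(growth_pct, 1))  # 225.9
```

At 225.9%, the figure is indeed "over 225%" as stated.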
4.3.3 Comparative analysis of GridRiskNet and other models

To comprehensively evaluate the GridRiskNet model's effectiveness in investment risk management of power grid engineering projects, this study designs two types of comparative experiments. The first is a horizontal comparison with existing State-of-the-Art (SOTA) models. It selects representative recent models in risk assessment, regional propagation modeling, and uncertainty quantification, including methods such as CNN-LSTM, to ensure fair comparison under a unified dataset and the same task indices. The comparison covers risk classification performance (F1-Score, ROC-AUC), regional propagation accuracy (Risk Impact Radius error), and uncertainty quantification ability (Risk Entropy) to reflect the model's comprehensive capabilities. The second is a detailed comparison with classic baseline models, namely individual methods such as XGBoost, LightGBM, ST-GCN, and BERT-BiLSTM. It focuses on examining the model's performance in robustness, spatiotemporal feature extraction, and anomaly detection, and highlights the advantages of GridRiskNet in multimodal data fusion, dynamic feature learning, and risk path modeling. The results of the two types of comparisons are exhibited in Tables 6 and 7.

Table 6: Comparison of the performance of GridRiskNet and SOTA models on the same dataset

Model | Researchers | F1-Score↑ | ROC-AUC↑ | Risk Impact Radius error±σ (km)↓ | Risk Entropy↓
CNN-LSTM | Dong and Li (2025) | 0.724 | 0.892 | 28.3±4.1 | 1.87
Investment framework based on graph attention networks | Mostofi et al. (2025) | 0.781 | 0.903 | 22.6±3.8 | 1.52
Topic model clustering | Qi (2025) | 0.698 | 0.841 | - | 2.03
DNN based on transfer learning | Luo and Zhu (2024) | 0.763 | 0.885 | - | 1.68
GridRiskNet | The proposed model | 0.892 | 0.962 | 4.8±0.9 | 0.89

Table 7: Robustness comparison results of GridRiskNet and baseline models

Model | F1-Score | Risk Impact Radius error±σ (km) | Recall for delay anomaly detection
XGBoost | 0.712 | 32.5±6.2 | 0.683
LightGBM | 0.735 | 29.8±5.4 | 0.721
ST-GCN | 0.683 | 18.7±3.5 | 0.592
BERT-BiLSTM | 0.698 | - | 0.654
GridRiskNet | 0.892 | 4.8±0.9 | 0.937

The SOTA comparison experiment reveals that GridRiskNet achieves a considerable lead in risk classification, propagation modeling, and uncertainty quantification. Although the GAT investment framework performs well in traditional graph learning tasks, it cannot deeply integrate complex semantic features and meteorological data, leading to an underestimation of risks in some catastrophic events. In contrast, GridRiskNet fully captures the coupling relationship between accident texts and meteorological variables through cross-modal attention mechanisms and dynamic feature fusion, and is significantly superior to the other models in F1-score and ROC-AUC. Meanwhile, its GA-RNN structure can accurately model risk transmission paths under the power grid topology, greatly reducing the Risk Impact Radius error; this verifies its high fitting ability to the physical characteristics of power grids. Regarding uncertainty control, GridRiskNet effectively suppresses prediction fluctuations in high-risk scenarios through the risk propagation graph reconstruction mechanism in the joint loss function, minimizing Risk Entropy and showing stronger stability of the risk distribution.

In the comparison with baseline models, GridRiskNet also demonstrates excellent robustness and overall advantages. Compared with XGBoost and LightGBM, GridRiskNet achieves a much higher F1-score, showing strong adaptability in complex dynamic data environments. Concerning regional propagation accuracy, the Risk Impact Radius error of GridRiskNet fluctuates very little; it is far better than ST-GCN, which considers only spatiotemporal features, proving the effectiveness of its spatial topology and semantic information fusion strategy. Regarding time-series anomaly detection, GridRiskNet combines dynamic feature selection and time-series-aware splitting strategies, notably improving recall and detecting potential abnormal risks earlier. Overall, GridRiskNet outperforms existing mainstream methods across multi-dimensional tasks with high accuracy and robustness, and it is well suited to key links of power grid engineering risk management such as risk transmission, modal coupling, and dynamic prediction.
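The margins over the strongest SOTA row in Table 6 (the GAT-based investment framework of Mostofi et al., 2025) can be computed directly from the tabulated values; a brief sketch:

```python
# Strongest SOTA row vs. GridRiskNet, from Table 6.
sota_f1, sota_err = 0.781, 22.6
ours_f1, ours_err = 0.892, 4.8

f1_gain = ours_f1 - sota_f1                                  # absolute F1 gain
err_reduction_pct = (sota_err - ours_err) / sota_err * 100   # radius-error cut

print(round(f1_gain, 3), round(err_reduction_pct, 1))  # 0.111 78.8
```

That is an 0.111 absolute F1 improvement and roughly a 79% reduction in Risk Impact Radius error relative to the best competing model.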
4.3.4 GridRiskNet training cost and efficiency analysis

Tests on computing cost and efficiency are conducted to evaluate the engineering practicality of GridRiskNet. Training efficiency in a complete production environment is tested on an NVIDIA A100×4 cluster, recording: (1) average convergence time in the training phase (in hours (h)); (2) maximum inference delay per sample in the inference phase (in milliseconds (ms)); (3) peak memory consumption (in gigabytes (GB)); and (4) training time per 0.01 F1-Score (in h). Under the condition of meeting the needs of offline batch processing and periodic risk monitoring in power grids, the practical controllability of GridRiskNet is measured. The analysis results are presented in Table 8.
Table 8: Analysis results of the training cost and efficiency of GridRiskNet and baseline models under the same dataset

Model | Convergence time (h) | Maximum inference delay per sample (ms) | Peak memory consumption (GB) | Training time per 0.01 F1-Score (h)
XGBoost | 1.2 | 0.09 | 1.5 | 0.17
LightGBM | 1.0 | 0.07 | 1.2 | 0.14
ST-GCN | 8.5 | 0.36 | 5.1 | 1.25
BERT-BiLSTM | 12.3 | 0.45 | 6.4 | 1.77
GridRiskNet | 17.8 | 0.63 | 9.8 | 2.00

According to the results in Table 8, although GridRiskNet has a longer absolute training time (17.8 h) and a higher single-sample inference delay (0.63 ms) than the other models, its key index "training time per 0.01 F1-Score" is 2.00 h, of the same order as that of BERT-BiLSTM (1.77 h) while reaching a far higher final F1-score. This indicates that its high complexity effectively "exchanges cost for performance", with clearly non-linear returns. Moreover, the inference delay of 0.63 ms is still far below the acceptable threshold in offline power grid risk prediction (usually at the second level), making it suitable for daily or even hourly scheduling scenarios. The memory consumption of GridRiskNet matches the typical Graphics Processing Unit configuration of power enterprises (<10 GB), making deployment feasible. Overall, although GridRiskNet has a higher training cost, it offers high performance returns, controllable inference latency, and affordable resource use, making it feasible for practical engineering applications.
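The last column of Table 8 is consistent, to within about 0.01 h, with reading "training time per 0.01 F1-Score" as convergence time divided by (10 × F1), using each model's F1-score from Tables 6 and 7. This reading is an assumption on our part, not stated explicitly in the paper:

```python
# name: (convergence time in h, F1-score, reported h per 0.01 F1)
models = {
    "XGBoost":     (1.2,  0.712, 0.17),
    "LightGBM":    (1.0,  0.735, 0.14),
    "ST-GCN":      (8.5,  0.683, 1.25),
    "BERT-BiLSTM": (12.3, 0.698, 1.77),
    "GridRiskNet": (17.8, 0.892, 2.00),
}
for name, (hours, f1, reported) in models.items():
    implied = hours / (10 * f1)  # hours per 0.01 of F1 under this reading
    print(f"{name}: implied {implied:.3f} h vs reported {reported} h")
```

Under this reading, all five implied values land within 0.01 h of the tabulated figures.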
Overall, although regarding risk classification, GridRiskNet introduces a GridRiskNet has a higher training cost, it has the cross-modal attention mechanism to deeply explore the advantages of high performance returns, controllable coupling relationship between accident texts and inference, and resource affordability, thus making the meteorological features. It effectively makes up for the feasibility for practical engineering applications. perception defects of traditional single-modal models in complex scenarios. This enables its F1-score and ROC- 4.4 Discussion AUC to be significantly better than those of models such as GAT. Second, in regional propagation modeling, It should be explained that the experimental data of GridRiskNet is based on the GA-RNN structure and this study are based on U.S. sources (EIA, NOAA, OSM). embeds a risk propagation graph reconstruction However, the research on investment risk issues of power mechanism. It can dynamically identify key transmission grid engineering projects has a high degree of paths in the power grid topology and accurately capture commonality and structural consistency. The core lies in the risk diffusion process. Thus, it can minimize the Risk the complexity of the investment process, construction Impact Radius error and demonstrate high fitting ability environment, and risk chain of power grid projects, not to the physical structure of the power grid. Third, for limited to specific countries. Cost overrun, climate uncertainty quantification, the joint loss function design pressure, equipment technical failure, supply chain of GridRiskNet integrates classification error, graph fluctuation, and policy compliance risks (C1-C5) are five reconstruction error, and feature stability regularization key risks commonly faced by global power grid projects. terms. 
This helps to control prediction fluctuations in high-risk scenarios and reduces risk entropy to the lowest level among the compared models. Compared with SOTA models that mainly rely on traditional graph networks or single deep models, GridRiskNet realizes the collaborative optimization of structured, spatiotemporal, and semantic data. Its core innovation lies in the deep integration of three mechanisms: dynamic feature learning, propagation path modeling, and risk distribution stability. This not only improves model performance but also achieves a balance between the complexity of risk perception, path interpretability, and prediction stability, giving it high practical value and theoretical promotion potential.

5 Conclusion

This study constructs the GridRiskNet risk management system based on big data mining around the intelligent management needs of investment risks in power grid enterprise engineering projects. It realizes the fusion modeling and dynamic evaluation of structured, unstructured, and spatiotemporal data. Through the two-stage modeling architecture, the model performs well in risk probability distribution identification and regional propagation path modeling. The experimental results show that GridRiskNet has strong risk structure identification and regional difference perception abilities across multiple indices. From 2020 to 2023, the Risk PCA Projection Score climbed significantly, revealing the dominant position of cost overrun, climate pressure, and equipment risk in the evolution of engineering risks. At the same time, the model effectively captures the changing trends of risk path length and impact radius in the analysis of the potential impact scope of each power grid region. Moreover, it can identify the propagation characteristics of the structural vulnerability of the western power grid and the high impact radius of the Texas power grid, providing quantitative support for regional risk management.

Although GridRiskNet shows strong comprehensive performance in the experiments, there is still room for further optimization. The current model relies on a fixed attention mechanism when fusing different data modalities, which struggles to fully characterize the dynamic coupling between heterogeneous features across time and place. In addition, no physical constraint mechanism is introduced into the risk propagation modeling, so the mapping accuracy with respect to the actual operating state of the power grid still has room for improvement. Follow-up research can introduce reinforcement learning and physical graph embedding methods to improve the model's adaptability to dynamic environmental changes. Furthermore, extending the model to broader scenarios such as new energy access and emergency dispatching would support the intelligent transformation of investment risk management of power grid enterprises in a pluralistic and complex environment.

References

[1] Varbella A, Gjorgiev B, Sartore F, Zio E, Sansavini G. Goal-oriented graph generation for transmission expansion planning. Engineering Applications of Artificial Intelligence, 2025, 149(4): 110350. https://doi.org/10.1016/j.engappai.2025.110350
[2] Silvester B R. Hesitation at increasing integration: The feasibility of Norway expanding cross-border renewable electricity interconnection to support European decarbonisation. Technological Forecasting and Social Change, 2025, 213(3): 123917. https://doi.org/10.1016/j.techfore.2024.123917
[3] Yu Z, Guo L I, Wen T. Design management of clean energy projects from the perspective of partnering. Journal of Tsinghua University (Science and Technology), 2025, 65(1): 115-124. https://doi.org/10.16511/j.cnki.qhdxxb.2024.22.042
[4] Nyangon J. Climate-proofing critical energy infrastructure: Smart grids, artificial intelligence, and machine learning for power system resilience against extreme weather events. Journal of Infrastructure Systems, 2024, 30(1): 03124001. https://doi.org/10.1061/JITSE4.ISENG-2375
[5] Sun B, Zhang Y, Fan B, Xie P. An optimal sequential investment decision model for generation-side energy storage projects in China considering policy uncertainty. Journal of Energy Storage, 2024, 83(11): 110748. https://doi.org/10.1016/j.est.2024.110748
[6] Sun P, Yuan C, Li X, Di J. Big data analytics, firm risk and corporate policies: Evidence from China. Research in International Business and Finance, 2024, 70(23): 102371. https://doi.org/10.1016/j.ribaf.2024.102371
[7] Hammouri Q, Alfraheed M, Al-Wadi B M. Influence of information technology on project risk management: The mediating role of risk identification. Journal of Project Management, 2025, 10(1): 143-150. https://doi.org/10.5267/j.jpm.2024.10.001
[8] Risanger S, Mays J. Congestion risk, transmission rights, and investment equilibria in electricity markets. The Energy Journal, 2024, 45(1): 173-200. https://doi.org/10.5547/01956574.45.1.sris
[9] Khanna K, Govindarasu M. Resiliency-driven cyber-physical risk assessment and investment planning for power substations. IEEE Transactions on Control Systems Technology, 2024, 7(3): 21. https://doi.org/10.1109/TCST.2024.3378990
[10] Liu H, Li X, Zhang Y. Investment risk assessment based on improved BP neural network. International Journal of Automation and Control, 2024, 18(6): 636-654. https://doi.org/10.1504/IJAAC.2024.142093
[11] Bussmann N, Giudici P, Tanda A, Yu P Y. Explainable machine learning to predict the cost of capital. Frontiers in Artificial Intelligence, 2025, 8(1): 1578190. https://doi.org/10.3389/frai.2025.1578190
[12] Dong S, Li A. The application of deep learning models in investment risk analysis of intelligent manufacturing projects. Intelligent Decision Technologies, 2025, 3(1): 14. https://doi.org/10.1177/18724981251325923
[13] Loseva O V, Munerman I V, Fedotova M A. Assessment and classification models of regional investment projects implemented through concession agreements. Economy of Regions, 2024, 20(1): 276-292. https://doi.org/10.17059/ekon.reg.2024-1-19
[14] Mostofi F, Bahadır Ü, Tokdemir O B, Toğan V, Yepes V. Enhancing strategic investment in construction engineering projects: A novel graph attention network decision-support model. Computers & Industrial Engineering, 2025, 203(2): 111033. https://doi.org/10.1016/j.cie.2025.111033
[15] Qi Y. Multi modal graph search: intelligent massive-scale subgraph discovery for multi-category financial pattern mining. IEEE Access, 2025, 1(1): 331. https://doi.org/10.1109/ACCESS.2025.3553560
[16] Luo S, Zhu X. Regional investment risk evaluation based on compound risk correlation coefficient and migration learning approach. Journal of Computational Methods in Science and Engineering, 2024, 24(1): 327-342. https://doi.org/10.3233/JCM-237045
[17] Gao C, Wang X, Li D, Han C, You W, Zhao Y. A novel hybrid power-grid investment optimization model with collaborative consideration of risk and benefit. Energies, 2023, 16(20): 7215. https://doi.org/10.3390/en16207215
[18] Oikonomou K, Maloney P R, Bhattacharya S, et al. Energy storage planning for enhanced resilience of power systems against wildfires and heatwaves. Journal of Energy Storage, 2025, 119(1): 116074. https://doi.org/10.1016/j.est.2025.116074
[19] Tavakoli M, Chandra R, Tian F, Bravo C. Multi-modal deep learning for credit rating prediction using text and numerical data streams. Applied Soft Computing, 2025, 2(4): 112771. https://doi.org/10.1016/j.asoc.2025.112771
[20] Liu K, Liu M, Tang M, Zhang C, Zhu J. XGBoost-based power grid fault prediction with feature enhancement: application to meteorology. Computers, Materials & Continua, 2025, 82(2): 7. https://doi.org/10.32604/cmc.2024.057074
[21] Zhou X, Li J. Risk assessment of high-voltage power grid under typhoon disaster based on model-driven and data-driven methods. Energies, 2025, 18(4): 809. https://doi.org/10.3390/en18040809
[22] Sari R P, Febriyanto F, Adi A C. Analysis implementation of the ensemble algorithm in predicting customer churn in telco data: A comparative study. Informatica, 2023, 47(7): 22-26. https://doi.org/10.31449/inf.v47i7.4797
[23] Tikhomirova T, Tikhomirov N. Methods for assessing low profitability risks of an investment project in conditions of uncertainty. Revista Gestão & Tecnologia, 2024, 24(2): 244-257. https://doi.org/10.20397/2177-6652/2024.v24i2.2845
[24] Li L. Dynamic cost estimation of reconstruction project based on particle swarm optimization algorithm. Informatica, 2023, 47(2): 16-21. https://doi.org/10.31449/inf.v47i2.4026
[25] Feng J. Multi-attribute perceptual fuzzy information decision-making technology in investment risk assessment of green finance projects. Journal of Intelligent Systems, 2024, 33(1): 20230189. https://doi.org/10.1515/jisys-2023-0189

https://doi.org/10.31449/inf.v49i16.9600 Informatica 49 (2025) 331–350 331

Real-Time Motion Recognition in Special Training Systems Based on the Optimized BBO-KNN Method of Motion Morphology

Yin Xu
School of Physical Education, Henan Kaifeng College of Science Technology and Communication, Kaifeng 475001, China
E-mail: xumeili2025@163.com

Keywords: KNN dynamic weight, sports, special training

Received: June 6, 2025

The traditional sports training boxing system has problems of insufficient accuracy and poor real-time performance in classifying highly similar actions, and lacks adaptability to individual action differences.
This article constructs a sports training system based on a dynamically weight-optimized KNN (BBO-KNN), aiming to improve the accuracy and real-time performance of complex action recognition and to provide technical support for personalized training. In response to the problems of insufficient accuracy (high FP rate), poor real-time performance (delay > 1 s), and lack of individual adaptability in high-similarity action classification in traditional sports training systems, this study proposes a KNN model based on dynamic weight optimization (BBO-KNN). Model performance is optimized by fusing proprietary datasets with public datasets and using 5-fold cross-validation (training/testing ratio 7:3). The experimental results show that BBO-KNN significantly outperforms benchmark models such as LSTM (94.50%) and SVM (89.30%) in accuracy (96.20% ± 0.3%). The FP rate for highly similar actions such as running ↔ jumping has decreased to 1.6%, the global FP rate is 1.39%, and robustness is high (fluctuation of ±1.2% under noise interference). The classification error distribution shows the model's stability advantage, and the confusion matrix highlights the accurate recognition of highly similar actions (such as running → jumping). The research shows that the BBO-KNN model effectively solves the real-time and robustness problems of motion recognition through dynamic weight optimization. In the future, it can be extended to complex movements such as gymnastics by incorporating visual data and adapting to individual style differences through incremental learning.

Povzetek: The article presents a sports training system that uses a dynamically weighted BBO-KNN for better movement recognition.

1 Introduction

Sports special training is undergoing a profound change from a traditional, experience-oriented model to a data-driven one. This transformation presents multi-dimensional technical characteristics and systematic development bottlenecks. From a macro perspective, the digital penetration of modern sports training systems has reached a considerable scale. According to authoritative data from the General Administration of Sport of China in 2024, more than three-quarters of professional sports teams have deployed wearable devices for training data collection. This proportion has nearly doubled compared with five years ago, indicating that a fundamental paradigm shift is taking place in sports training methodology.

However, there is a sharp intergenerational gap between the rapid popularization of hardware and the intelligence level of software systems. The widespread deployment of data acquisition equipment has not brought a corresponding improvement in training efficiency; instead, it has exposed structural defects in data processing capabilities. The specific deficiency is the core contradiction of insufficient data utilization. Currently, only 42% of sports teams have established a complete analysis system, which means that more than half of the training data lies dormant and cannot be converted into an effective basis for training decisions. This deficiency in data value mining stems from multiple technical obstacles, including but not limited to imperfect feature engineering, inefficient data cleaning processes, and insufficient adaptability of analysis models. More prominent is the static nature of evaluation indicators: up to 91% of training systems still adopt a fixed-weight scoring mechanism [1]. This rigid evaluation system cannot adapt to dynamic changes in athletes' physiological parameters, resulting in systematic deviation between training programs and actual needs. In addition, the feedback delay problem further amplifies this mismatch: a decision lag of 2.3 training cycles on average means that training adjustments always lag behind the actual state changes of athletes, resulting in a loss of training effect [2].

A deeper analysis of the technical essence behind these phenomena shows that the fundamental reason for the homogeneity of training programs lies in the uniformity of feature extraction dimensions and the lack of personalized modeling, which reflects the fundamental contradiction between the traditional batch computing model and real-time decision-making needs [3]. Solving these systemic defects therefore requires the introduction of innovative algorithm architectures and technical paradigms. Two key optimization spaces remain in current technology. The first is the balance between computing resource consumption and real-time requirements, especially the control of computational complexity when processing high-dimensional features. The second is the model's generalization ability in small-sample scenarios and its adaptive performance when facing new athletes or rare training situations [4].

The core innovation of KNN dynamic weight optimization technology lies in building a four-dimensional optimization space. In the time dimension, it realizes minute-level weight updates and compresses the data processing delay to 1/60 of the traditional method through a sliding time window mechanism and an incremental learning algorithm. In the feature dimension, it completes multimodal data fusion, integrating multi-source information such as biomechanical, physiological-biochemical, and environmental parameters [5, 6]. In the individual dimension, it establishes an athlete-specific model and achieves efficient matching of similar samples through dynamic neighborhood search. In the environmental dimension, it integrates venue and equipment parameters to build a complete training-situation perception system. This multi-dimensional optimization architecture enables the system to process nonlinear and non-stationary training data, effectively solving the response hysteresis problem of traditional systems [7].

Traditional sports training classification systems suffer from insufficient accuracy and poor real-time performance in high-similarity action classification, and lack adaptability to individual motion variations. This paper therefore constructs a sports training system based on Biogeography-Based Optimization KNN (BBO-KNN), aiming to improve the accuracy and real-time performance of complex action recognition and to provide technical support for personalized training.

This study investigates the performance limits of the BBO-KNN algorithm for high-similarity action recognition through a specific research question. The hypothesis is that BBO-KNN can reduce the false positive (FP) rate to below 2% while maintaining a stable end-to-end processing latency below 20 ms and a classification accuracy above 95%. This goal is aimed directly at the core defects of traditional systems (such as LSTM and SVM) in classifying highly similar actions (such as running and jumping), namely FP rate > 4.2% and delay > 200 ms. To achieve this, the system uses the BBO algorithm to optimize the feature weight vector to enhance local feature sensitivity, combines K-Means clustering to compress the dataset size, and designs a lightweight edge architecture for real-time processing.

The implementation of minute-level weight updates through sliding time windows and incremental learning relies on a triple mechanism:
(1) the 200 ms sensor window slides in 10 ms steps to ensure real-time feature extraction;
(2) incremental learning only updates cluster centers (not feature weights), adjusting secondary cluster points every 5 days with new data (as mentioned in the conclusion);
(3) the feature weight WK3 remains static; its "dynamic" effect comes from the weight distribution optimized by BBO, while window sliding allows the model to continuously capture temporal features.

2 Related work

2.1 Research status of sports special training system

Rodriguez et al. [8] developed a multi-sensor fusion wearable system. It integrates IMU, sEMG and heart-rate monitoring modules, increasing the data collection dimension to 23 physiological indicators, but suffers from a 15% sensor signal interference problem. The 4D optical capture solution proposed by Cizmic et al. [9] improves motion analysis accuracy to 0.3 mm, but the system construction cost is as high as 2 million yuan, making it difficult to popularize. At present, non-contact monitoring technology based on millimeter-wave radar can capture micro-motions within a range of 5 m, but the sampling rate is limited to 120 Hz.

The BP neural network evaluation model constructed by Balkhi et al. [10] improves the accuracy of technical action scoring to 89% in sports events, but requires more than 800 hours of labeled training data. Calderón-Díaz et al. [11] introduced transfer learning, which enables personalized modeling of new athletes with only 200 samples, but the cross-event transfer error still reaches 28%. Notably, the digital twin evaluation system developed by Iduh et al. [12] keeps the sports action prediction error within 1.2° through real-time physical simulation, but requires the support of a supercomputing center.

From the above research, current sports special training systems generally face three major challenges: (1) the asynchrony of multi-source data leads to 27% information loss; (2) the lack of model interpretability leads to a trust crisis among coaches; (3) the contradiction between hardware portability and accuracy is prominent. It is particularly noteworthy that 82% of commercial systems still use static evaluation algorithms, which cannot adapt to dynamic changes in athletes' status.

2.2 Application of optimization algorithm in training system

Chen et al. [13] introduced a genetic algorithm into sports special training cycle planning and improved the matching degree of training schemes by 31% through an adaptive cross-mutation strategy, but iterative convergence is slow (14 hours on average). The improved particle swarm optimization algorithm of Taborri et al. [14] optimizes the load distribution of strength training, increasing athletes' maximum strength growth rate by 22%, but the algorithm parameters are highly sensitive and require repeated tuning.

The LSTM-ATT hybrid model developed by Hanif et al. [15] achieves 92% accuracy in the evaluation of sports-specific actions, but needs 150,000 labeled samples for training. AshokKumar et al. [16] applied reinforcement learning to optimize sports-specific strategies, increasing athletes' scoring rate by 29%, but training costs are high (200 hours of simulated adversarial data are required). Meta-learning can shorten the model adaptation cycle for new athletes from 14 days to 5 days, but its demand for computing resources is huge, requiring 4 A100 graphics cards.

The Pareto frontier algorithm proposed by Kumar et al. [17] balances technical improvement and injury risk in sports training, optimizing the training benefit-risk ratio by 37%, but the complexity of the algorithm reduces real-time performance, with delays of up to 11 minutes. The NSGA-III algorithm developed by Molavian et al. [18] realizes multi-objective optimization of sports specialties and improves competition performance by 0.8%, but requires accurate biomechanical modeling. Malamatinos et al. [19] applied fuzzy logic to optimize sports posture, improving movement completion by 19%, but building the rule base relies on a large amount of expert knowledge.

From the above research, current work mainly faces three bottlenecks: (1) the contradiction between real-time performance and accuracy, with even the best systems still showing a delay of 5-8 minutes; (2) insufficient model explainability, with 68% of AI decisions unable to provide reasonable explanations; (3) weak cross-project transfer capability, with an average error of 37%. In particular, 82% of commercial training systems (2024 market research) still adopt static optimization strategies, which struggle to adapt to dynamic changes in athletes' status.

2.3 Research status of application of KNN optimization algorithm in sports training

The weighted dynamic KNN model proposed by Merzah et al. [20] improves accuracy to 94% in sports action recognition, but the real-time computation delay is still 1.2 seconds. The quantized distance calculation method developed by Bunker et al. [21] can speed up athletes' posture analysis by 100 times, but needs dedicated quantum computing equipment. The improved KNN scheme combined with SHAP value interpretation proposed by Teixeira et al. [22] reduces the error of sports action evaluation from 3.2° to 0.8°, but triples the complexity of feature engineering.

The sliding-window incremental learning system applied by Woltmann et al. [23] shortens the update cycle of the training model to 15 minutes, but the memory footprint is still as high as 32 GB. The multi-modal distance measurement method proposed by Sonalcan et al. [24] combines electromyographic and mechanical characteristics in sports events, so that the prediction error of the action angle is < 0.5°, but 17 sensor data streams need to be synchronized.

Table 1 below summarizes the related work.
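The 200 ms / 10 ms sliding-window mechanism described in the introduction can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the single-channel stream are assumptions, and at the paper's 100 Hz sampling rate a 200 ms window holds 20 samples while a 10 ms step advances by exactly one sample.

```python
def sliding_windows(samples, window_ms=200, step_ms=10, rate_hz=100):
    """Yield fixed-length windows over a 1-D sample stream.

    At 100 Hz, a 200 ms window is 20 samples and a 10 ms step is 1 sample,
    matching the window mechanism described in the introduction.
    """
    win = int(window_ms * rate_hz / 1000)          # samples per window
    step = max(1, int(step_ms * rate_hz / 1000))   # samples per step
    for start in range(0, len(samples) - win + 1, step):
        yield samples[start:start + win]

# Example: 0.5 s of signal (50 samples) yields 31 overlapping windows.
stream = list(range(50))
windows = list(sliding_windows(stream))
print(len(windows), len(windows[0]))  # 31 20
```

Each window would then be passed to feature extraction; overlapping windows are what allow the model to capture temporal features continuously.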
Table 1: Summary of related work

Technical direction: Multi-sensor fusion system
  Method and key results: multi-sensor wearable system (IMU/sEMG/heart rate); 23 physiological indicators collected, data dimension increased by 300%
  Limitations and bottlenecks: 15% signal interference; asynchronous multi-source data leads to 27% information loss, and there is a contradiction between portability and accuracy

Technical direction: 4D optical capture scheme
  Method and key results: high-precision optical marker tracking; motion capture accuracy of 0.3 mm
  Limitations and bottlenecks: dependency on a supercomputing center (2-million-yuan cost); high hardware deployment costs, difficult to popularize

Technical direction: BP neural network model
  Method and key results: multi-layer backpropagation network; technical action scoring accuracy of 89%
  Limitations and bottlenecks: 800 hours of annotated training data required; high data dependency, long model update cycle (>2 weeks)

Technical direction: Transfer learning program
  Method and key results: cross-athlete feature transfer; modeling a new athlete requires only 200 samples
  Limitations and bottlenecks: cross-project migration error of 28%; weak domain adaptability, insufficient generalization ability

Technical direction: Dynamic KNN algorithm
  Method and key results: weighted neighbor classification; action recognition accuracy of 94%
  Limitations and bottlenecks: real-time latency of 1.2 seconds; low computational efficiency, unable to meet real-time requirements of <100 ms

Technical direction: Quantum distance calculation
  Method and key results: quantized feature similarity measurement; posture analysis speed increased by 100 times
  Limitations and bottlenecks: requires specialized quantum devices; strong hardware dependency and extremely high commercialization costs

Technical direction: Multimodal KNN optimization
  Method and key results: fusion of electromyographic and mechanical features; action angle prediction error < 0.5°
  Limitations and bottlenecks: 17 sensors need to be synchronized; high system integration complexity, difficult engineering implementation

From the above research, the application of KNN optimization algorithms in sports training faces three core challenges: (1) the contradiction between real-time requirements and computational accuracy, with even the best systems still showing a delay of 8-15 seconds; (2) the asynchrony of multi-source data, which leads to a loss of 27% of feature information; (3) the excessive cost of personalized adaptation, with 14-20 days needed to build a model for a single athlete. In particular, 83% of existing systems (2025 market research) still adopt a static K-value strategy, which struggles to adapt to dynamic changes in training intensity.

3 Sports special training system based on KNN dynamic weight optimization

3.1 Design of improved nearest neighbor classification algorithm

The KNN algorithm generally uses the majority-voting method. Assume there are N labeled samples T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i ∈ R^n is a sample with n-dimensional features and y_i ∈ {c_1, c_2, ..., c_L} is the label of x_i, i = 1, 2, ..., N. The label y of a sample x to be tested is obtained by the classification rule, as shown in the following formula [25]:

    y = arg max_{c_j} Σ_{x_i ∈ N_k(x)} H(y_i, c_j),  i = 1, 2, ..., N,  j = 1, 2, ..., L    (1)

    H(y_i, c_j) = 1 if y_i = c_j, and 0 otherwise    (2)

where N_k(x) is the set of the K nearest neighbor samples of x.

This paper improves the KNN classifier, which performs well in feature-engineered settings, and proposes a KNN classification algorithm based on the K-means clustering algorithm.
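The majority-vote rule of Eqs. (1)-(2) can be sketched in a few lines. This is a minimal illustration under assumed toy data (the function name and labels are not from the paper): among the K nearest labeled samples, H(y_i, c_j) is counted per class and the class with the largest count wins.

```python
import math

def knn_majority_vote(x, samples, k=4):
    """Classify x by the majority-vote rule of Eqs. (1)-(2)."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbors = sorted(samples, key=lambda s: dist(x, s[0]))[:k]  # N_k(x)
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1   # running sum of H(y_i, c_j)
    return max(votes, key=votes.get)             # arg max over classes c_j

# Toy labeled set T = {(x_i, y_i)} with two well-separated classes.
T = [((0.0, 0.0), 'walk'), ((0.1, 0.2), 'walk'), ((0.2, 0.1), 'walk'),
     ((3.0, 3.0), 'run'), ((3.1, 2.9), 'run')]
print(knn_majority_vote((0.05, 0.1), T, k=3))  # walk
```

With K = 3 all three nearest neighbors of the query are 'walk' samples, so the vote is unanimous; ties at larger K are broken arbitrarily in this sketch.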
This paper combines the univariate feature selection method with the BGWOPSO algorithm to search for the optimal feature set, and selects the BBO algorithm as the weight optimization module of the subsequent human motion intention recognition model, yielding a model that can use fewer features to identify multiple motion patterns with higher classification accuracy.

3.1.1 Comparison of feature normalization methods

When a sample includes multiple eigenvalues, features with larger magnitudes will overwhelm features with smaller magnitudes and affect the accuracy of the KNN classifier. The data therefore needs to be normalized; the commonly used methods are extreme-value (maximum) normalization and mean-variance normalization.

Extreme-value normalization uses the maximum and minimum values of a variable's range to scale the original data proportionally into the [0, 1] interval, eliminating the impact of dimension. Since this method depends only on the two extreme values, the scaling of each variable is overly dependent on them. The conversion function is [26]:

    x_scale1 = (x − x_min) / (x_max − x_min)    (3)

Mean-variance normalization uses the mean and standard deviation of the original data. Although all data information is used in the dimensionless process, the importance of each variable is not treated equally, and variables with large variances carry relatively large analysis weight. The conversion function is:

    x_scale2 = (x − x̄) / σ    (4)

The two normalization methods are applied to the post-FC mixed dataset respectively; the mean and standard deviation are extracted as eigenvalues and input into the KNN classifier, and the classification accuracies of the two are compared. With the nearest-neighbor value K taken from 1 to 15, the 5-fold cross-validation accuracy of the KNN classifier under maximum normalization and under mean-variance normalization is compared; the results are shown in Figure 1(a).

Figure 1: Comparison of classifier accuracy (using the post-FC mixed dataset, action fragment sampling rate 100 Hz). (a) Accuracy with maximum normalization versus mean-variance normalization; (b) accuracy with different distance measurement formulas.

As Figure 1(a) shows, the two normalization methods yield similar KNN accuracy on the post-FC mixed dataset, with no obvious pattern across K. At the commonly used nearest-neighbor values K = 3 and K = 4, the data processed by mean-variance normalization achieves higher classification accuracy, so mean-variance normalization is used to normalize the data in subsequent experiments.
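The two normalization methods of Eqs. (3)-(4) can be compared on a small column of values. This is a minimal sketch (function names are illustrative); note that the z-score version here uses the population standard deviation.

```python
def minmax_scale(col):
    """Eq. (3): scale values into [0, 1] using the two extremes."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def zscore_scale(col):
    """Eq. (4): mean-variance (z-score) normalization, population std."""
    n = len(col)
    mean = sum(col) / n
    std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / std for v in col]

col = [2.0, 4.0, 6.0, 8.0]
print(minmax_scale(col))   # endpoints map to 0.0 and 1.0
print(zscore_scale(col))   # mean 0, symmetric around it
```

Min-max output depends only on the two extremes, so a single outlier compresses all other values; the z-score output is centered at 0, which is why the text notes that high-variance variables keep a larger analysis weight.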
3.1.2 Comparison of distance measurement formulas

Commonly used distance measures in the KNN algorithm include the Manhattan distance, Euclidean distance, Chebyshev distance, Minkowski distance and Mahalanobis distance. The formulas are [27]:

    L1(x_i, x_j) = Σ_{l=1}^{n} |x_i^(l) − x_j^(l)|    (5)

    L2(x_i, x_j) = ( Σ_{l=1}^{n} (x_i^(l) − x_j^(l))^2 )^{1/2}    (6)

    L3(x_i, x_j) = max_l |x_i^(l) − x_j^(l)|    (7)

    L4(x_i, x_j) = ( Σ_{l=1}^{n} |x_i^(l) − x_j^(l)|^p )^{1/p}    (8)

    L5(x_i, x_j) = ( (x_i − x_j)^T Σ^{-1} (x_i − x_j) )^{1/2}    (9)

where the feature space is an n-dimensional real vector space R^n, x_i, x_j ∈ R^n, and Σ is the covariance matrix of the multidimensional random variables. When the data in each dimension are independent and identically distributed, the Mahalanobis distance reduces to the Euclidean distance.

The post-FC mixed dataset is normalized with mean-variance normalization, and the mean and standard deviation are extracted as feature values and input into the KNN classifier. The dataset is the post-FC hybrid dataset (feature dimensions: mean and standard deviation), the K value ranges from 1 to 15 (full-range validation), and 5-fold cross-validation is used (accuracy computed independently for each fold). The results are shown in Table 2.

Table 2: Classification accuracy (%) of the KNN classifier using different distance metrics

    K value:              1      2      3      4      5      6      7      8
    Manhattan distance    92.57  92.03  91.66  91.58  91.12  91.49  91.27  92.12
    Euclidean distance    92.12  92.67  92.40  92.67  91.39  91.75  90.48  91.02
    Chebyshev distance    90.48  90.59  90.20  89.89  89.01  89.83  88.37  88.09
    Minkowski distance    92.03  92.57  91.58  91.48  91.26  91.57  90.13  90.60
    Mahalanobis distance  90.14  89.60  90.87  89.87  89.31  89.78  89.90  90.24

    K value:              9      10     11     12     13     14     15
    Manhattan distance    90.89  90.93  91.58  90.93  91.30  90.00  90.59
    Euclidean distance    91.49  90.20  90.24  90.29  89.92  90.10  89.01
    Chebyshev distance    87.55  87.45  87.55  87.27  87.19  87.36  86.35
    Minkowski distance    90.57  90.57  90.02  89.38  90.57  89.47  88.82
    Mahalanobis distance  86.45  89.25  89.09  86.81  88.23  86.25  86.08

As Figure 1(b) and Table 2 show, the Euclidean and Manhattan distances give the algorithm high accuracy, but the Manhattan distance imposes a serious computational burden on this implementation and the prediction time is too long. Considering both accuracy and running time, the Euclidean distance is selected. The Euclidean distance reaches an accuracy of 92.67% at K = 4, better than the Chebyshev distance (89.89%), and has better hardware adaptability. Hardware-level optimization: the native multiply-accumulate (MAC) instruction of the ARM7-M FPU handles the sum-of-squares operation, so a 24-dimensional feature calculation takes only 4.2 µs, 38% faster than the Manhattan distance, and in particular avoids the branch penalty of the Manhattan distance's absolute-value computation (the accuracy curve in Figure 1(b) supports this choice).

3.1.3 Selection of nearest neighbor value

Choosing an appropriate nearest-neighbor value K is also critical to the accuracy of the KNN classifier. The smaller K is, the easier it is for the model to overfit. When K = 1, prediction is based only on the single point nearest to the target; if that point is noise, an error occurs. When K is larger, points farther from the target also participate in the prediction, leading to underfitting. When K equals the total number of sample points, the prediction is simply the most frequent label among all samples, and the classification model is completely invalid. Common methods for selecting K include empirical judgment and optimization algorithms.

As shown in Figure 1(a) and (b), for K from 1 to 15 the classifier achieves relatively high accuracy at K = 2 and K = 4; as K increases further, the prediction accuracy gradually decreases. At K = 2 the K value is small and the probability of overfitting is greater, so the nearest-neighbor value is set to K = 4.

3.1.4 Dataset size reduction based on K-means clustering algorithm

Real-time implementation of a KNN classifier on an intelligent powered knee prosthesis is difficult. To solve this problem, a combination of the KNN algorithm and the K-Means clustering algorithm is proposed. To ensure the accuracy of the experiments, several trials are performed to determine the cluster centers.

In the post-FC hybrid dataset, each motion state contains 120 sets of motion data, and even after feature extraction the data volume remains large. The K-Means clustering algorithm can significantly reduce the size of the dataset and remove most of the similar sample points.

To reduce the computational complexity of KNN, hierarchical K-Means clustering is employed to compress each class of action data independently. K-Means clustering is performed on the samples of each action class, and the set of primary cluster centers is represented as KI = {KI_1, KI_2, ..., KI_l}, where l is the number of classes; the corresponding primary cluster centers are generated. Within the same action class, secondary K-Means clustering is performed to obtain M secondary cluster points (M < 120), giving the set of secondary cluster points KS = {KS_1^1, KS_2^1, ..., KS_M^1, ..., KS_M^l}. These two sets are saved as new datasets.

With KI completely replacing the original data, a compressed table collection (not an index tag) is formed. KNN operates directly on the compressed set, eliminating the need to trace back to the original data. KI and KS constitute completely independent compressed table collections, which are used directly as the operational objects for KNN inference. This "hierarchical representation + geometric constraints" architecture retains key motion features while completely avoiding the computational burden of the original data, and it also supports the reduction of computational burden in subsequent experiments.

3.1.5 Improvement of classification decision rules based on triangular inequality

Let the test sample point be x, the class primary center point c, and a secondary center point s. They satisfy the basic property of a metric space:

    d(x, c) − d(c, s) ≤ d(x, s) ≤ d(x, c) + d(c, s)

The basic principle of the triangle inequality is that the sum of any two sides of a triangle is greater than the third side, which can be associated with the distance relationships among three sample points, as shown in Figure 2. In the figure, the unlabeled sample point T is a green circle.
This inequality defines a spherical screening region centered at x whose radius bounds how close any secondary point s of class c can be, so whole classes can be skipped without computing every distance.

Figure 2: Schematic diagram of the triangle inequality method

The steps of the improved KNN algorithm are as follows:
Step 1: The K-Means algorithm is used as a preprocessing step to reduce the size of the dataset; one initial center is clustered for each class, and M secondary cluster points are clustered within each class.
Step 2: The initial centers of the first K classes with the smallest distances to the unlabeled sample are selected.
Step 3: Among the selected top K classes, the distances from each class's secondary cluster points to the unlabeled sample are calculated; the K smallest distance values are selected and their mean is computed. The class label with the smallest distance mean is assigned to the unlabeled sample.

The triangle inequality accelerates computation, narrows the search space, and reduces the omission rate of key neighbors through threshold conditions and geometric constraints (Figure 2), coupling the spherical filter domain with the cluster distribution. This mathematical framework provides a theoretical basis for the high accuracy and low latency of KNN in sports action recognition.

The algorithm pseudocode is as follows:

    # Training stage: K-Means clustering compression
    def train_KMeans_compress(DataSet, Kc_main=1, Kc_sub=15):
        compressed_set = {}
        for class_label in unique_labels:          # traverse each action category
            class_data = DataSet[class_label]      # all samples of the current class
            # Main cluster center (captures the core features of the class)
            main_centers = KMeans(n_clusters=Kc_main).fit(class_data).cluster_centers_
            # Secondary cluster points (cover intra-class variation)
            sub_centers = KMeans(n_clusters=Kc_sub).fit(class_data).cluster_centers_
            compressed_set[class_label] = {
                'KI': main_centers,   # class primary center set
                'KS': sub_centers     # secondary point set
            }
        return compressed_set

    # Prediction stage: improved KNN inference
    def enhanced_KNN_predict(sample, compressed_set, K=4):
        # Step 1: distance to each class's main centers
        main_distances = []
        for label, centers in compressed_set.items():
            dist = min([euclidean(sample, center) for center in centers['KI']])
            main_distances.append((label, dist))
        # Step 2: select the top K nearest classes
        top_classes = sorted(main_distances, key=lambda x: x[1])[:K]
        # Step 3: triangle-inequality screening over secondary points
        # (the original listing is cut off by a page break here; the rest
        #  is reconstructed to follow Steps 1-3 described above)
        min_avg_dist = float('inf')
        predicted_label = None
        for class_label, main_dist in top_classes:
            sub_points = compressed_set[class_label]['KS']  # sub points of this class
            # Triangle-inequality filter: keep only points that may be closer
            candidate_dists = []
            for point in sub_points:
                d = euclidean(point, sample)
                if d <= main_dist + min_avg_dist:           # geometric pruning bound
                    candidate_dists.append(d)
            if candidate_dists:
                k_used = min(K, len(candidate_dists))
                avg_dist = sum(sorted(candidate_dists)[:k_used]) / k_used
                if avg_dist < min_avg_dist:
                    min_avg_dist = avg_dist
                    predicted_label = class_label
        return predicted_label

3.2 Construction of human motion intention recognition model

This paper proposes a human motion intention recognition optimization system, as shown in Figure 3. When the subject wears an intelligent powered knee prosthesis, the 6-axis IMU sensor, uniaxial pressure sensor and knee encoder acquire raw data at a sampling frequency of 100 Hz. When the foot touches the ground, the 8-channel sensor data of the knee prosthesis is collected within 200 ms, and the BGWOPSO algorithm is used for feature selection. By comparing feature-weight optimization with three methods including the BBO algorithm, the classification accuracy of the KNN classifier is improved and the weight optimization method used in this system is determined.

The feature weights (WK3) optimized by the BBO algorithm remain static during the inference phase; their function is to enhance sensitivity to key motion attributes through pre-set feature importance.
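The role of the BBO-optimized weight vector (such as WK3) can be illustrated with a weighted Euclidean distance: each feature's squared difference is scaled by a fixed weight before summation, so features with larger weights dominate neighbor selection. The weights below are placeholders, not values from the paper, and the full BBO search loop that would produce them is outside the scope of this sketch.

```python
import math

def weighted_euclidean(a, b, w):
    """Euclidean distance with per-feature weights; at inference time the
    weight vector stays fixed, as the static WK3 weights described above."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

# Placeholder weights emphasising the first feature four-fold.
w = [4.0, 1.0]
print(weighted_euclidean((0.0, 0.0), (1.0, 1.0), w))  # sqrt(4 + 1) = 2.236...
```

With uniform weights this reduces to Eq. (6); an optimizer such as BBO would search the weight vector that maximizes cross-validated accuracy and then freeze it, matching the "static weights, dynamic neighborhood" design described in the text.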
Dynamics sub_points = compressed_set[class_label]['KS'] are mainly reflected in two aspects: Neighbor dynamic screening: Real time selection of #Triangle inequality filtering (only calculates points relevant samples based on triangular inequality (Figure 2); that may be closer) Incremental model update: Adjusting cluster centers to candidate_points = [] adapt to individual differences through new data for point in sub_points: Metaheuristics can reduce computational burden. In If Euclidean (point, sample)200ms >150ms >100ms - latency measurement Fluctuation ± 1.2% Monte Carlo Noise Fluctuations Fluctuations ± 3.5% (signal loss test of 0.003* simulation robustness ± 2.8% ± 4.1% fluctuation 15%) (1000 times) 24 dimensional 150000 Feature Learning Training data 80000 features+incremental annotated engineering - curve requirements samples learning data dependency analysis Online Model 5 days (new athlete Time cost 14 days updates are 10 days <0.001* update cycle adaptation) tracking not supported The quantitative delay comparison results with SOTA model are shown in Table 5 below: Table 5: Quantitative delay comparison results Model Delay (end-to-end) Hardware dependency Input sensitivity Low (Feature Dimension Universal sensor (low- BBO-KNN <20ms Compression Buffer Input cost) Fluctuations) High (complete sequence LSTM >200ms GPU Accelerator required for temporal modeling) Medium (kernel function SVM >150ms CPU cluster calculation burden) High (uncompressed KNN 1.2 seconds No special requirements sample size) The results of parameter sensitivity verification are shown in Table 6 below: Real-Time Motion Recognition in Special Training Systems Based… Informatica 49 (2025) 331–350 345 Table 6: Parameter sensitivity verification Convergence Parameter Accuracy FP rate algebraic Key conclusions perturbation fluctuation fluctuation variation Population size ± 30% Convergence 15 Insufficient population leads to local ▶ 105 (-30%) -0.70% 0.90% generations optima (WK3 weight 
imbalance) ahead of schedule Delay 8th Revenue does not offset calculation ▶ 195 (+30%) 0.20% -0.10% generation costs (delay ↑ 23%) convergence Iteration times ± 20% Not reaching the convergence saturation ▶ 40 (-20%) -0.40% 0.60% - point (K=4 curve in Figure 1a) Diminishing marginal benefits ▶ 60 (+20%) 0.10% -0.05% - (resource waste ↑ 35%) By quantifying the contributions of each module ablation experiment are shown in Table 7 below: using the variable control method, the results of the Table 7: Results of ablation test Ablation component Accuracy variation FP rate change Key Function Complete BBO-KNN 96.20% 1.60% - Remove BBO weight Decreased feature 94.17% (↓2.03%) 1.90% optimization sensitivity Remove context fusion Increased confusion in 92.50% (↓3.70%) 3.20% highly similar actions Remove feature selection Noise characteristics 90.10% (↓6.10%) 5.80% interfere with decision- making Only using single center Loss of intra class clustering 89.42% (↓6.78%) 6.50% diversity (comparison in Table 6) To further verify the universality of the model in this on the Berkeley MHAD dataset. This dataset contains 12 article, Berkeley MHAD (international dataset: basic actions with a balanced sample size (approximately https://tele-immersion.citris-uc.org/berkeley_mhad) was 150 samples per class), using the same 5-fold cross used to validate the universality of basic actions. Table 8 validation method as the document (training/testing ratio shows the performance comparison results of the model 7:3). The evaluation indicators include accuracy, recall, 346 Informatica 49 (2025) 331–350 Y. Xu Jaccard coefficient, and F1 score. 
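The triangle-inequality screening that Step 3 relies on can be illustrated with a small self-contained sketch (plain Python with synthetic 2-D points; the data and the `nn_with_triangle_pruning` helper are illustrative, not the paper's implementation). For a query q, a cluster pivot c, and a point p with precomputed d(p, c), the bound |d(q, c) - d(p, c)| <= d(q, p) lets the search skip p whenever that lower bound already exceeds the current best distance:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_with_triangle_pruning(query, pivot, points):
    """Nearest-neighbour search that skips points via the triangle
    inequality |d(q,c) - d(p,c)| <= d(q,p), with c a cluster pivot."""
    d_qc = dist(query, pivot)
    # distances from every point to the pivot (computed once, offline)
    d_pc = [dist(p, pivot) for p in points]
    best, best_d, skipped = None, float('inf'), 0
    for p, dpc in zip(points, d_pc):
        lower_bound = abs(d_qc - dpc)
        if lower_bound >= best_d:   # cannot beat the current best: prune
            skipped += 1
            continue
        d = dist(query, p)          # full distance only for survivors
        if d < best_d:
            best, best_d = p, d
    return best, best_d, skipped

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0), (10.0, 0.0)]
best, best_d, skipped = nn_with_triangle_pruning((0.5, 0.0), (0.0, 0.0), points)
```

With the pivot at the origin, four of the five candidates are rejected by the bound alone, which is the effect the text describes as narrowing the search space without missing key neighbors.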
Table 8: Results of the universality validation
Model | Accuracy | Recall | Jaccard | F1 value
BBO-KNN | 95.50%±0.4% | 95.10%±0.5% | 92.80%±0.6% | 95.30%±0.4%
LSTM | 94.00%±0.7% | 93.60%±0.8% | 91.50%±0.9% | 93.90%±0.7%
SVM | 88.80%±1.3% | 88.20%±1.5% | 85.60%±1.4% | 88.50%±1.3%
Random forest | 92.20%±0.8% | 91.50%±1.0% | 89.40%±1.1% | 91.80%±0.9%

4.3 Analysis and discussion

In Table 3, the BBO-KNN model performs well on all evaluation indicators. In particular, its F1 value reaches 96.0%, the best among the four models. The LSTM model ranks second: each indicator is relatively high, but it is slightly inferior to BBO-KNN on every evaluation indicator. The accuracy, recall and F1 value of the random forest model are higher than those of SVM, but its overall performance is still not as good as BBO-KNN and LSTM. The SVM model performs the worst on all indicators, which is related to its weak ability to process sequence data.
The BBO-KNN model performs well in sports action recognition tasks (F1 value 96.0%), and its performance advantage can be attributed to the following core improvement strategies and technical characteristics:
(1) Design of a KNN algorithm with dynamic weight optimization. The classification effect of the traditional KNN algorithm is limited by the fixed number of neighbors (the K value) and uniform distance-weight allocation. By introducing a dynamic weight strategy, BBO-KNN adaptively adjusts the contribution of nearest-neighbor samples according to the local characteristics of the sensor data. For example, during the sprint acceleration phase, the sensitivity of the BBO-optimized feature weights (Y-axis acceleration weight 0.21) to high acceleration means that relevant samples are easily selected into the candidate set.
(2) Context feature fusion. BBO-KNN integrates the contextual information of motion intention, which makes up for the shortcoming of traditional KNN that it relies only on static feature similarity. In long jump recognition, the model enhances the robustness of movement segmentation by analyzing the timing relationship between the change of knee joint angle before take-off and the inertial measurement unit (IMU) signal during take-off. This mechanism is well matched to the needs of complex time-series data modeling, and is similar to the advantage of KNN in processing high-dimensional grayscale data in image recognition.
(3) Adaptability to multi-modal sensor data. The multimodal fusion mechanism of BBO-KNN achieves action understanding through spatiotemporally aligned, collaborative sensor perception. Physical-layer correlation: the pressure sensor captures the plantar contact force (a vertical dynamic index), and the IMU analyzes the joint angular velocity (the kinematic trajectory); the fusion of the two is similar to the biological perception mechanism that combines tactile feedback and visual trajectory (a non-image-pixel analogy). Technical advantage: as shown in the confusion matrix in Figure 5(a), the precise distinction between running and jumping (FP rate of 1.6%) is due to the complementarity of pressure and IMU data (jump pressure distribution vs change in aerial angular velocity). This fusion logic is similar to the probabilistic interpretability of the Gaussian mixture model (GMM) in multi-source signal separation (a non-background-modeling analogy). The weight vector optimized by BBO directly quantifies the contribution of each sensor, and newly added data only updates the cluster centers (no black-box parameters); athlete style adaptation records are retained as an independent KS subset.
(4) Robustness enhancement and noise suppression. BBO-KNN effectively reduces the influence of sensor noise on classification results by integrating filtering algorithms and outlier detection modules. For example, when the foot touches the ground during sprinting, the model can filter out the interference of instantaneous vibration signals on the acceleration data. This is similar to the idea of suppressing dynamic noise in background modeling with a Gaussian mixture model (GMM), but BBO-KNN meets real-time requirements through lighter calculations.
The excellent performance of BBO-KNN stems from its comprehensive design of dynamic weight optimization, context feature fusion, multi-modal data adaptability and a noise suppression mechanism. These improvements not only inherit the intuitiveness and efficiency of the traditional KNN algorithm, but also make up for its shortcomings in temporal modeling and noise sensitivity. The model is therefore especially suitable for scenarios such as sports actions, which must balance real-time performance and classification accuracy.
In Figure 4, the classification error of BBO-KNN is 3.8%. Weight optimization reduces sensitivity to K values and improves the recognition accuracy of action boundaries through local feature adaptation; for example, in knee prosthesis movement, dynamically adjusting the neighbor weights can avoid misclassification during gait phase switching. The error of LSTM is 5.5%: although it is good at time-series modeling, it is not as flexible as BBO-KNN in capturing short-term motion features, and when the action segment is short, the LSTM may lose key frame information. The classification error of random forest is 7.3%: due to the hard-boundary characteristics of ensemble decision trees, the gradual features of continuous motion intention are insufficiently fitted. The classification error of SVM is 10.7%: kernel selection is difficult for high-dimensional IMU data, and it is sensitive to unbalanced training data. The low error of BBO-KNN verifies its advantage in motion intent recognition tasks; its core is to solve the bottlenecks of traditional methods in real-time performance and noise robustness through dynamic neighbor selection and context fusion.
In Figure 5(a), the diagonal accuracy of the BBO-KNN confusion matrix is high: the classification accuracy of the running and swimming categories reaches 98.4% and 99.0% respectively, which benefits from the dynamic weight strategy's ability to capture local motion features. Moreover, only 3 cases of jumping movements were misclassified as running, reflecting its optimized sensitivity to changes in knee joint angles.
In Figure 5(b), the LSTM confusion matrix shows that the proportion of running misjudged as jumping is 3.8%, which is related to the inertial signal delay in the action-switching stage. Its swimming recognition accuracy of 95.8% is better than its short-term action classification, showing a strong advantage on long-duration actions.
In Figure 5(c), the SVM confusion matrix shows that the FP rate for the other action categories reaches 15.3%, because the RBF kernel is sensitive to the data distribution. At the same time, 9 cases are misjudged as jumps, which is related to the similarity of action amplitudes.
In Figure 5(d), the random forest confusion matrix shows a training-set accuracy of 98.2% and a test-set FN rate of 4.7% for the "jumping" category, caused by the sensitivity of deep tree structures to noise.
In Table 4, the M-KNN model exhibits statistically significant advantages on key performance indicators: its classification accuracy of 96.20%±0.3% (t=7.32, df=8, p<0.001) significantly outperforms LSTM (94.50%±0.8%) and SVM (89.30%±1.2%). The core breakthrough lies in dynamic weight optimization (the WK3 vector), which compresses the FP rate of highly similar actions to 1.6% (Fisher's test, p<0.001). Specifically, there are only 3 cases of running-jump misjudgment (compared to 7 cases for LSTM), as clearly presented in the confusion matrix of Figure 5a. At the same time, its lightweight architecture achieves an end-to-end latency of <20 ms (more than 10 times faster than LSTM's >200 ms), attributed to K-Means clustering reducing the computational load by 87% (from the original 120 groups per class to 1 center point + 15 key points). In terms of robustness, BBO-KNN fluctuated by only ±1.2% (Monte Carlo simulation, p=0.003) in the noise test with a sensor signal loss of 15%, significantly better than LSTM's ±2.8%, confirming the strong anti-interference ability of sliding-window filtering (error distribution verification in Figure 4). In addition, BBO weight optimization compresses the feature dimension from 24 to 7 (Equation 10), shortening the construction cycle of new-athlete models to 5 days (t-test, p<0.001) and resolving the bottleneck of 28% cross-item error in traditional transfer learning. These quantitative results rigorously validate the comprehensive innovation of the dynamic weight architecture in terms of accuracy, real-time performance, and adaptability.
Table 5 shows that edge deployment avoids data transmission overhead. The latency fluctuation in the noise test is ±1 ms, associated with an accuracy fluctuation of ±1.2%; this is indirectly supported by the error distribution in Figure 4 and is significantly better than the latency fluctuation of ±10 ms in LSTM (whose recurrent structure amplifies the noise effect).
The population size (150) and iteration count (50) of the BBO algorithm are configured to balance feature-space complexity and convergence efficiency. BGWOPSO feature selection compresses the feature space from 24 dimensions to 7, and BBO weight optimization assigns differentiated weights to each feature on this 7-dimensional subspace; to avoid high GPU cost, a final population size of 150 is set to ensure weight diversity. The iteration count of 50 is based on the saturation point of the convergence curve (the K=4 curve in Figure 1a shows an accuracy improvement of less than 0.1% after 40 generations), so the global optimum is approached under the constraint of computational resources.
Verification shows that when the population size is reduced by 30% to 105, the weight vector WK3 becomes imbalanced due to insufficient exploration of the high-dimensional space (the 7-dimensional feature combinations are reduced to an equivalent coverage of 4.9 dimensions), resulting in a 0.7% decrease in accuracy and a 0.9% increase in FP rate (3 new misclassifications in the confusion matrix). When the number of iterations is reduced by 20% to 40, the convergence saturation point is not reached (Figure 1a shows that 0.4% of optimization headroom remains for K=4 at the 40th generation), so the feature weights are insufficiently optimized (e.g., the acceleration mean weight drops from 0.21 to 0.18), directly causing the FP rate to increase by 0.6% (reaching 2.2% and breaking the target threshold). Conversely, excessive parameter increases (population 195 / 60 iterations) sharply reduce the marginal benefit: expanding the population by 30% improves accuracy by only 0.2% but increases computational latency by 23% (beyond the 20 ms real-time constraint), and the fitness gain after 60 iterations is less than 0.05%, which violates the principle of lightweight design.
The significant advantage of BBO-KNN over the existing SOTA (96.20% accuracy vs LSTM's 94.50% and SVM's 89.30%) lies in its fusion of a dynamic weight architecture with a lightweight data-processing paradigm. In highly similar action scenarios (such as running → jumping), traditional KNN suffers boundary blurring (FP rate > 4.2%) due to fixed K values, while BBO-KNN compresses the misjudgment rate to 1.6% through the BBO-optimized dynamic weight vector WK3 (Equation 14) combined with local feature weighting, thanks to its enhanced adaptive sensitivity to biomechanical action features. Compared with LSTM and other time-series models, BBO-KNN abandons the redundant recurrent structure and adopts K-Means clustering compression with edge-computing deployment, reducing the delay from LSTM's >200 ms to <20 ms while maintaining accuracy, thereby breaking through the real-time bottleneck. The lightweight design also resolves the cost contradiction: compared to an optical capture scheme (2 million yuan) or a quantum computing scheme (dependent on specialized equipment), BBO-KNN achieves a 90% reduction in hardware costs through universal sensors (IMU/pressure). In terms of individual adaptability, traditional transfer learning faces a 28% cross-item error, while the incremental learning mechanism of BBO-KNN compresses the modeling cycle for new athletes from 14-20 days to 5 days, filling the technical gap in personalized training. These breakthroughs validate the core value of dynamic weight optimization in addressing static-algorithm rigidity (82% of system defects) and high-dimensional data noise sensitivity (sensor interference fluctuations of ±1.2% vs LSTM's ±2.8%).
The lightweight nature of the BBO-KNN architecture is empirically supported by three core optimizations. At the memory level, K-Means clustering compression reduces each class of action samples from 120 groups to 1 main center + 15 key points, reducing memory usage to 3.62 KB (96.1% lower than traditional KNN) and meeting the SRAM constraints of embedded devices such as smart prosthetics (typically ≥ 64 KB); this compression strategy was validated in Section 3.1.4 with a data refinement rate of 87.5%. In terms of computational performance, the BBO algorithm compresses the feature dimension from 24 to 7 (Equation 10) and, combined with triangle-inequality filtering (principle shown in Figure 2), eliminates 85% of invalid calculations, yielding a stable end-to-end delay of less than 20 ms (Table 5 shows a 66.7-fold acceleration); the measured power consumption on an ARM Cortex-M7 chip is only 0.12 W, 89.3% lower than the LSTM scheme. In terms of resource robustness, under noise interference testing (sensor signal loss of 15%), the delay fluctuation is only ±1.2%, memory usage is <5 KB, and power consumption is <0.13 W (Table 6), verifying suitability for edge deployment. These optimizations (storage compression, computation simplification, and energy-efficiency management) are rigorously supported by 5-fold cross-validation (Table 3) and real-time benchmark testing, addressing the high resource dependency of traditional systems (such as LSTM's >200 ms latency and GPU requirement) and providing an efficient solution for medical wearable devices.
In Table 7, the ablation study deconstructs the core contributions of BBO-KNN. Removing BBO weight optimization causes a 2.03% drop in accuracy (96.20% → 94.17%) and a 1.9% FP rate, highlighting the critical role of dynamic weights in feature sensitivity. Disabling context fusion lowers accuracy by 3.70% (to 92.50%) and significantly increases confusion between highly similar actions (running → jumping misjudgment rate +3.2%), validating its effectiveness in resolving boundary blurring. Removing feature selection leads to a 6.10% accuracy loss (to 90.10%) and a 5.8% FP rate, exposing the interference of noisy features. Single-center clustering causes a 6.78% drop in accuracy (to 89.42%) through the loss of intra-class diversity, supporting the necessity of the hierarchical structure. There is also strong collaboration between components: the linkage of BBO and feature selection triples the convergence speed, while the collaboration of context fusion and the triangle inequality reduces computational complexity by 65%, jointly supporting the system's comprehensive breakthroughs in accuracy (↑35.8%), real-time performance (delay ↓98.7%), and robustness (noise fluctuation ±1.2%).
In this study, there are three main reasons why data imbalance is not a problem. (1) Inherent balance of the dataset: sample-size balance was explicitly designed and validated (class differences <14.3%), and distribution consistency was maintained through stratified cross-validation. (2) Implicit robustness of the model: K-Means clustering, BBO dynamic weights, and triangle-inequality decision-making all implicitly enhance tolerance to imbalance without explicit processing. (3) Empirical support: high precision, a low FP rate, and a uniform error distribution confirm that performance is not affected by minority classes. It is therefore reasonable that the methods section did not separately discuss imbalance handling. If future research involves genuinely imbalanced data (such as rare actions), oversampling or cost-sensitive learning may be considered, but the balanced dataset used in this study already meets the requirements.
Based on the analysis of model architecture and performance, the BBO-KNN model exhibits significant advantages in scalability and edge deployment:
(1) Lightweight architecture and computational optimization: BBO-KNN adaptively adjusts feature importance through dynamic weight optimization (the BBO algorithm) and significantly reduces computational complexity by combining K-Means clustering to compress the feature dimensions. Its parameter count is only 1/5 of traditional deep learning models, and its memory usage is kept within 50 MB, meeting the resource constraints of wearable devices.
(2) Feasibility of edge deployment: in real-time detection scenarios such as mango grading, BBO-KNN achieves an inference delay of less than 8 ms and an accuracy of 98% on embedded devices such as the Jetson Nano, verifying its efficiency in resource-constrained environments. The noise robustness test shows that the performance fluctuation under sensor noise is less than 1.2%, ensuring stability for medical and other application fields.
(3) Real-time guarantee mechanism: dynamic feature selection, in which the BBO algorithm filters redundant features in real time (for example, retaining only key biomechanical indicators such as knee joint angle in motion recognition), reduces computational complexity by 30%; hardware co-optimization supports INT8 quantization and hardware-accelerated instruction sets, consuming only 22 mW on an ARM Cortex-M7 processor and enabling 24/7 real-time monitoring.
In summary, BBO-KNN resolves the computing, energy-consumption and real-time bottlenecks of edge devices through algorithm-hardware co-design, providing a reliable technical foundation for wearable health monitoring and intelligent prosthetics.

5 Conclusion

This study verified the superiority of the BBO-KNN model on sports datasets through comparative experiments. The results show that the model significantly improves the classification accuracy of highly similar actions through its dynamic weight strategy and local feature optimization: the FP rate for highly similar actions such as running ↔ jumping has decreased to 1.6%, and the global FP rate is 1.39%. At the same time, the system has low latency (<20 ms) and strong anti-interference characteristics, and outperforms traditional models such as LSTM and SVM in real-time performance and robustness.
The BBO-KNN model promotes intelligent sports training through three technological innovations. First, dynamic weight optimization (the BBO algorithm) reduces the false alarm rate for highly similar movements to 1.6% (Table 4). Second, combined with hierarchical clustering compression (K-Means dual centers), the model achieves a memory footprint of <5 KB (96.1% compression rate) and end-to-end latency of <20 ms (Table 5). Third, its physically interpretable architecture (transparency of the WK3 weight vector plus triangle-inequality decision paths) enables precise training control, supporting personalized style adaptation within 5 days (traditionally requiring 14 days); it significantly improved take-off accuracy in practice for a provincial track and field team (take-off angle error reduced from 3.2°±1.1° to 0.8°±0.3°, p<0.01). In the future, we will integrate multimodal inertial and visual data to overcome the bottleneck of real-time evaluation of complex movements such as gymnastics.

https://doi.org/10.31449/inf.v49i16.9709  Informatica 49 (2025) 351–360  351

Robust Cascaded Clutter Suppression and Deep Integration of Spatiotemporal Point Networks for Enhanced Mmwave Radar Motion Capture in Snowsports

Yulun Liu
Sport Institute, Henan University, Kaifeng 475001, China
E-mail: lunzi2323@163.com

Keywords: millimeter wave radar, anti-interference algorithm, clutter suppression, joint positioning RMSE

Received: June 13, 2025

In snow sports motion capture, mmWave radar signals suffer from multipath reflections and frequency offsets due to snowflake scattering and temperature variations, severely degrading pose estimation accuracy.
To address this, we propose a cascaded anti-interference framework composed of adaptive MTI filtering, genetic sparse array optimization, and hybrid carrier tracking. These physical-layer enhancements are followed by a spatiotemporal 3D CNN–LSTM network for motion decoding and a multimodal Kalman-particle filter for trajectory fusion. Experimental validation in both simulation and real-world snow environments confirms the framework's robustness. Compared to baseline systems, the proposed method reduces the joint positioning root mean square error (RMSE) by up to 72%, enhances angular velocity tracking precision by 72%, and improves the signal-to-noise ratio (SNR) by 24.3 dB. The end-to-end processing delay remains under 26 ms, ensuring real-time deployment. These results demonstrate significant improvements in accuracy, robustness, and real-time performance under harsh environmental interference, offering a viable solution for mmWave-based motion capture in snowy sports scenarios.

Povzetek: A new radar-based sensing system for snowy conditions is developed. It combines advanced suppression of snow-induced interference, optimized radar antennas, and deep spatiotemporal networks for more accurate 3D motion capture.

1 Introduction

In an ice and snow environment, factors such as the snow-particle multipath effect and low-temperature frequency offset cause significant interference to millimeter-wave radar signals, affecting their capture accuracy and real-time performance. This paper proposes an anti-interference algorithm optimization scheme based on a signal-feature-trajectory three-level processing pipeline, aiming to improve the accuracy and real-time performance of millimeter-wave radar in the real-time motion capture of ice and snow sports. It suppresses the multipath effect of snow particles and low-temperature frequency deviation by improving MTI filtering, sparse array reconstruction and carrier tracking loop technology; constructs a 3D convolution-LSTM spatiotemporal hybrid network to decouple joint motion characteristics; and adopts an extended Kalman-particle filter hybrid architecture to fuse multimodal data and improve the physical plausibility of trajectory prediction.
The paper first introduces the research background, the contributions of this paper and the structure of the paper; summarizes the current research status of motion capture technology at home and abroad [1]; then presents the specific implementation of the proposed anti-interference algorithm optimization scheme; then verifies the performance of the proposed scheme through experiments; and finally summarizes the whole paper and looks forward to future research directions.
Compared with prior motion capture systems that largely depend on single-layer improvements in either signal preprocessing or neural architectures, the proposed framework introduces a novel end-to-end anti-interference pipeline that systematically bridges physical-layer signal enhancement, spatiotemporal feature modeling, and decision-layer physical fusion. This integration is not merely a technical stacking of modules, but reflects a methodology-level shift: instead of treating radar noise and biomechanical trajectory estimation as isolated challenges, we co-model them through a unified cross-domain optimization approach. The use of genetic sparse array reconstruction, hybrid deep filters, and interpretable multimodal distillation in a real-time snow environment has not been previously reported. This work thus represents not only a novel system architecture, but also a replicable methodology for robust radar-based motion capture in hostile conditions.
2 Related work

Researchers are committed to improving the accuracy and efficiency of motion capture through various technical means to meet application requirements in different scenarios. Addressing the inaccurate capture and low computational efficiency of existing motion capture technology, which limit its performance in real-time applications, Zhang and Qiu [2] introduced a Levenberg-Marquardt algorithm for skeleton-point coordinate fitting and optimized it with a particle swarm algorithm; a dynamic time warping algorithm was then used to capture and evaluate human motion in real time. The results show that the algorithm achieves a capture accuracy of up to 99.23% for the shoulder lateral raise, significantly better than the comparison algorithms. Li et al. [3] used a markerless motion capture system and a marker-based system (Vicon) in the Huawei Sports Health Laboratory to collect human marker trajectory data during unloaded squats. The squat was divided into three stages: descent, squat hold, and ascent. The kinematic data were imported into OpenSim, and the knee-joint degrees of freedom of the musculoskeletal model were increased to include adduction/abduction and internal/external rotation. Inverse kinematics and body-segment kinematics were computed, and the key-point data were used to develop an algorithm for calculating the foot orientation angle. Li et al. [4] analyzed the application of virtual reality technology in motion capture, evaluated its potential for improving the accuracy of sports training, and provided athletes and coaches with more accurate training feedback and analysis tools. Chen et al. [5] proposed a nonlinear method to segment long motion sequences into atomic motion fragments while applying dimensionality reduction for effective retrieval and segmentation of motion data in professional sports training scenarios. Teer [6] analyzed infrared motion capture data using a random forest algorithm, optimized model parameters, and evaluated model performance; data were collected using both optical markers and IMU sensors.

Existing research has achieved remarkable results in the accuracy, efficiency, and application scope of motion capture technology, but its application in complex environments such as ice and snow sports still faces challenges. This paper focuses on the application of millimeter-wave radar in real-time motion capture for ice and snow sports and proposes an anti-interference algorithm optimization scheme. By analyzing the characteristics of ice and snow sports, the stability and reliability of the motion capture system are improved, providing more accurate technical support for the training and analysis of ice and snow sports.

3 Method

3.1 Overall framework design

The anti-interference algorithm optimization framework proposed in this study adopts a three-level signal-feature-trajectory processing pipeline architecture and realizes high-precision real-time motion capture in ice and snow environments through a cross-layer collaborative mechanism, as shown in Figure 1.

Figure 1: Processing architecture

At the physical layer, the cascaded clutter suppression module preprocesses the snow-particle multipath effect and low-temperature frequency deviation to provide high-quality signal input for subsequent processing; the feature layer decouples joint motion characteristics through a spatiotemporal hybrid neural network to solve the spatiotemporal coupling problem in dynamic target tracking; and the decision layer uses a hybrid filtering architecture that fuses kinematic constraints to improve the physical plausibility of trajectory prediction. Real-time performance is guaranteed through parallel pipeline design and hardware acceleration, using FPGA to accelerate the 3D convolution operations and CUDA to parallelize the particle filtering. The system obtains the raw millimeter-wave radar signal via zero-copy transmission; after second-order suppression and differential preprocessing, the stream is split into two paths: one outputs joint features through the spatiotemporal hybrid network (3D convolution feature extraction, LSTM time-series modeling, self-attention fusion, and knowledge distillation); the other is processed by improved MTI filtering, sparse array reconstruction, and adaptive carrier tracking. The joint features are used for semantic motion detection and are then fed, together with the physical-layer results, into a hybrid filter (fusing extended Kalman filtering, particle filtering, and rigid-body constraints) to achieve target tracking. Zero-copy data transmission between the three processing levels is achieved through a shared memory pool, and a timestamp alignment module eliminates cross-layer delays, forming a complete processing chain from raw signals to semantic understanding.

3.2 Physical layer

To address the unique signal degradation issues in snowy environments, the physical layer employs a cascaded clutter suppression mechanism consisting of adaptive MTI filtering, sparse array reconstruction, and carrier tracking loop stabilization. This section outlines both the theoretical modeling and the algorithmic implementation to ensure clarity and reproducibility.

3.2.1 Radar signal modeling

The transmitted signal is modeled as a linear FMCW chirp:

s_{tx}(t) = \cos\left(2\pi\left(f_c t + \frac{B}{2T} t^2\right)\right)   (1)

where f_c is the carrier frequency, B is the bandwidth, and T is the chirp duration. The received baseband signal is:

s_{IF}(t) = \sum_{k=1}^{K} A_k \cos\left[2\pi (f_{b,k} + f_{d,k}) t + \phi_k\right] + n(t)   (2)

where f_{b,k} \approx \frac{2 R_k B}{cT} is the beat frequency, f_{d,k} = \frac{2 v_k f_c}{c} is the Doppler shift, R_k and v_k are the target range and velocity, and n(t) is additive noise. Snowflake motion typically causes Doppler spreads >500 Hz, enabling the MTI filters to adaptively suppress snow clutter.

3.2.2 Sparse array optimization via genetic algorithm

To emulate a 64-element full array with 32 physical elements, we employ a genetic algorithm (GA) that maximizes the following fitness function:

F = \alpha \cdot \frac{1}{\mathrm{PSLL}} + \beta \cdot \frac{1}{\mathrm{MBW}}, \quad \alpha = 0.6, \ \beta = 0.4   (3)

Formally, the optimization problem is:

\max_{\mathbf{X}' \subseteq \mathbf{X},\, |\mathbf{X}'| = 32} F(\mathbf{X}') = \alpha \cdot \frac{1}{\mathrm{PSLL}(\mathbf{X}')} + \beta \cdot \frac{1}{\mathrm{MBW}(\mathbf{X}')}   (4)

Here, X is the candidate element position set and X′ is the selected sparse subset. Missing elements in the covariance matrix are reconstructed using nuclear-norm minimization to restore virtual-aperture beamforming performance. Such fitness-driven selection and elite-preservation schemes have shown robust convergence in sparse synthesis problems using enhanced genetic strategies [7, 8].

Algorithm 1: Genetic algorithm for sparse array optimization
Input: position set X, population size M = 50, generations G = 200, crossover rate Pc = 0.7, mutation rate Pm = 0.01
Output: optimal layout X′, fitness score F
(1) Initialize a random population of sparse layouts
(2) For generation g = 1 to G:
  a. Evaluate the fitness F of each layout
  b. Select parents via tournament selection
  c. Apply crossover with probability Pc
  d. Apply Gaussian mutation with probability Pm
  e. Preserve the top performers (elitism)
  f. If the best fitness changes by <1% over 5 generations, terminate
(3) Return the best layout X′

The convergence threshold was empirically set to 1% over 5 generations, ensuring both global search stability and computational efficiency across the tested scenarios. The resulting layout is combined with virtual aperture reconstruction to simulate full-array resolution [9]. The specific processing architecture is illustrated in Figure 2, which shows the full cascade including adaptive MTI filtering, sparse array optimization, hybrid carrier tracking, and deep spatiotemporal decoding.

Figure 2: Flowchart of the genetic algorithm

3.2.3 Carrier tracking loop stabilization

A hybrid digital-analog phase-locked loop (PLL) is used to ensure carrier stability under extreme temperatures [10]. The analog part uses a temperature-compensated crystal oscillator (TCXO, ±0.5 ppm), and the digital part features a high-resolution (0.01 rad) phase detector. Predistortion compensation is applied via a LUT-based correction. An adaptive loop bandwidth (10–100 kHz) ensures phase noise <0.5 MHz and frequency deviation <±200 Hz at 77 GHz.

3.2.4 Cascaded clutter suppression pipeline

The clutter suppression pipeline employs a hybrid feedforward–feedback structure, in which Doppler-based MTI filtering eliminates low-velocity clutter while cross-correlation feedback adaptively tunes the cutoff frequency according to environmental dynamics. Real-time continuity is maintained through fixed-depth data buffers (512 samples).

3.2.5 Quantitative results summary

To quantitatively evaluate the effectiveness of the proposed cascaded architecture, a series of simulations was conducted under realistic snow-interference conditions. Specifically, 1000 Monte Carlo trials were performed at a 1 GHz sampling rate to measure improvements in signal quality, latency, and noise rejection across the processing stages. The results are summarized in Table 1.

Table 1: Signal interference suppression performance in a snowy environment

Processing Stage Configuration | SNR Improvement (dB) | BER | Processing Delay (ms) | Noise Stripping Gain (dB)
No suppression (baseline) | 0.0 | 1.0 × 10⁻³ | 5.0 | 0.0
MTI filtering only | 8.5 | 5.0 × 10⁻⁴ | 5.5 | 7.2
MTI + sparse array reconstruction | 14.2 | 1.2 × 10⁻⁴ | 6.8 | 12.5
MTI + sparse array + carrier tracking | 18.7 | 3.0 × 10⁻⁵ | 7.5 | 16.8
Full cascade system (all three stages) | 24.3 | 5.0 × 10⁻⁶ | 8.0 | 22.0

As the results show, each stage contributes significantly to overall performance.
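The stage-wise deltas in Table 1 can be checked with a few lines (values transcribed from the table; the script is purely illustrative):

```python
import math

# Table 1 rows: (configuration, SNR improvement dB, BER, delay ms, noise-stripping gain dB)
table1 = [
    ("No suppression (baseline)",              0.0, 1.0e-3, 5.0,  0.0),
    ("MTI filtering only",                     8.5, 5.0e-4, 5.5,  7.2),
    ("MTI + sparse array reconstruction",     14.2, 1.2e-4, 6.8, 12.5),
    ("MTI + sparse array + carrier tracking", 18.7, 3.0e-5, 7.5, 16.8),
    ("Full cascade system",                   24.3, 5.0e-6, 8.0, 22.0),
]

# BER reduction from baseline to the full cascade, in orders of magnitude
ber_orders = math.log10(table1[0][2] / table1[-1][2])

# per-stage SNR increments (dB): the marginal contribution of each added stage
snr = [row[1] for row in table1]
increments = [round(b - a, 1) for a, b in zip(snr, snr[1:])]
```

The increments show that no single stage dominates: MTI contributes 8.5 dB, and the three later additions contribute 5.7, 4.5, and 5.6 dB respectively, while the baseline-to-full-cascade BER drop is about 2.3 orders of magnitude.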
While MTI filtering alone improves the signal-to-noise ratio by 8.5 dB, the addition of sparse array reconstruction and carrier tracking boosts the SNR improvement to 24.3 dB and reduces the BER by nearly two orders of magnitude. The full cascade system also maintains low latency (8 ms), making it suitable for real-time applications in snow-covered environments and demonstrating enhanced robustness to noise [11].

3.2.6 Benchmark comparison and robustness test

To assess the relative effectiveness of the proposed genetic sparse array reconstruction, we benchmarked it against two conventional methods: (a) uniform linear thinning (ULT) and (b) random sparse layouts (RSL). All methods were evaluated under identical snow-interference conditions. Key results:
(1) The proposed GA-based layout achieved a 24.3 dB SNR improvement, outperforming ULT (16.1 dB) and RSL (12.4 dB);
(2) The full system reduced the BER by nearly 10× over ULT and 30× over RSL;
(3) Under low-SNR boundary tests (<5 dB), our method maintained a BER below 1.2 × 10⁻⁴, while the other layouts degraded sharply.
These results confirm the stability and generalization capability of the GA-optimized array, particularly under challenging conditions. This aligns with prior comparative findings on the strengths and trade-offs of radar-based versus vision-based motion capture systems [12].
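The selection-crossover-mutation-elitism loop of Algorithm 1 can be sketched end to end at toy scale. The sketch below is illustrative, not the paper's implementation: it selects 8 of 16 half-wavelength grid positions (the paper selects 32 of 64) and estimates PSLL and the −3 dB main-beam width from a simple 1-D array factor rather than the full covariance-reconstruction pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

N_FULL, N_SEL = 16, 8            # toy scale; the paper selects 32 of 64 positions
POS = np.arange(N_FULL) * 0.5    # candidate positions on a half-wavelength grid
ANGLES = np.linspace(-np.pi / 2, np.pi / 2, 721)

def array_factor(active):
    """Normalized magnitude pattern of the selected element subset."""
    phase = 2j * np.pi * np.outer(np.sin(ANGLES), POS[active])
    af = np.abs(np.exp(phase).sum(axis=1))
    return af / af.max()

def fitness(active, alpha=0.6, beta=0.4):
    """F = alpha/PSLL + beta/MBW (Eq. 3): reward low sidelobes and a narrow beam."""
    af = array_factor(active)
    peak = int(np.argmax(af))
    lo = peak                      # walk outward to the first nulls bounding the main lobe
    while lo > 0 and af[lo - 1] < af[lo]:
        lo -= 1
    hi = peak
    while hi < len(af) - 1 and af[hi + 1] < af[hi]:
        hi += 1
    side = np.r_[af[:lo], af[hi + 1:]]
    psll = side.max() if side.size else 1.0          # peak sidelobe level (linear)
    half = af[lo:hi + 1] >= np.sqrt(0.5)             # -3 dB region of the main lobe
    mbw = (ANGLES[1] - ANGLES[0]) * max(int(half.sum()), 1)
    return alpha / psll + beta / mbw

def tournament(pop, scores, k=2):
    idx = rng.choice(len(pop), k, replace=False)
    return pop[idx[int(np.argmax(scores[idx]))]]

def optimize(pop_size=30, generations=40, pc=0.7, pm=0.1):
    pop = [rng.choice(N_FULL, N_SEL, replace=False) for _ in range(pop_size)]
    best, best_f = pop[0], -np.inf
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        g = int(np.argmax(scores))
        if scores[g] > best_f:
            best, best_f = pop[g].copy(), float(scores[g])
        children = [best.copy()]                      # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < pc:                     # crossover: resample from parent union
                child = rng.choice(np.union1d(p1, p2), N_SEL, replace=False)
            else:
                child = p1.copy()
            if rng.random() < pm:                     # mutation: swap in an unused position
                idle = np.setdiff1d(np.arange(N_FULL), child)
                child[rng.integers(N_SEL)] = rng.choice(idle)
            children.append(child)
        pop = children
    return np.sort(best), best_f

layout, score = optimize()       # best sparse layout found and its fitness
```

On this toy problem the loop converges within a few dozen generations; the paper's early-stopping rule (fitness change <1% over 5 generations) could be added to the loop in the same way.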
3.3 Feature layer

The 3D convolution-LSTM spatiotemporal hybrid network constructed at the feature layer provides a systematic solution to feature extraction from millimeter-wave point clouds in motion capture [13]. At the network input, a dynamic voxelization method based on motion compensation converts the sparse millimeter-wave radar point cloud (typical density 0.3 points/cm³) into a regular dense tensor representation. The voxel grid size is set to 2 cm³, and missing voxels are filled by trilinear interpolation while retaining the geometric features of the original point cloud:

V(x, y, z) = \sum_{i=0}^{1} \sum_{j=0}^{1} \sum_{k=0}^{1} V_{i,j,k} \cdot (1 - |x - x_i|) \cdot (1 - |y - y_j|) \cdot (1 - |z - z_k|)   (5)

where V(x, y, z) is the value of the target voxel, V_{i,j,k} are the values of the surrounding known voxels, (x_i, y_j, z_k) are their coordinates, and (x, y, z) is the coordinate of the target voxel.

The spatial feature extraction part uses 5×5×5 3D convolution kernels for multi-scale feature learning and gradually expands the receptive field through hierarchical dilated convolution (dilation rates 1, 2, and 4), ensuring that spatial features from local joints to complete limb movements can be captured:

F_{out} = W * F_{in} + b   (6)

where F_{out} is the output feature map, W is the convolution kernel weight, F_{in} is the input feature map, and b is the bias term.

For temporal modeling, a two-layer bidirectional LSTM (hidden dimension 256) is adopted. Peephole connections are introduced inside each LSTM unit to enhance temporal memory, and a zoneout mechanism (probability 0.2) prevents overfitting while effectively modeling the continuity and periodicity of human motion.

The improved PointNet++ architecture adopts an importance-sampling strategy based on motion energy in the point-cloud sampling stage, giving higher sampling weights to high-speed joint areas. The feature extraction layer introduces a multi-head self-attention mechanism (4 heads, each with a QKV dimension of 64) and realizes cross-part feature interaction and enhancement by calculating the correlation matrix between joint points. Specifically, after sampling at each level, local features are first extracted through an MLP, the spatial dependency of the joint points is then computed by the self-attention module, and the global feature representation is finally obtained through max pooling. This design enables the network to adaptively focus on key joint areas while maintaining the ability to perceive the overall posture.

The knowledge distillation system constructs a hierarchical multimodal teacher network. The teacher not only contains a ResNet-152 backbone pre-trained on high-precision optical motion capture data (sampling rate 200 Hz) but also integrates a motion prior knowledge base built from a biomechanical simulation model. During distillation, a progressive temperature scheduling strategy is adopted: a higher temperature (T = 5) is set initially to learn the overall feature distribution of the teacher network, and the temperature is gradually reduced (finally T = 1) to focus on transferring fine-grained motion features:

L_{KD} = T^2 \sum_i \mathrm{KL}(P_i \,\|\, Q_i)   (7)

where L_{KD} is the knowledge distillation loss, P_i is the probability distribution of the i-th output of the teacher network, Q_i is the corresponding output distribution of the student network, and T is the temperature parameter.

Although ResNet-152 is used in this study for its proven feature extraction capability, the framework remains modular and can accommodate alternative backbones (e.g., MobileNet, EfficientNet, or ViT) with minor architectural adjustments. In particular, transformer architectures may be integrated in the future to better capture long-range temporal dependencies and improve robustness under occlusion [14].

3.4 Decision layer

Inspired by hybrid sensor fusion frameworks such as [15], this study designs an extended Kalman-particle filter (EKF-PF) hybrid architecture that achieves robust trajectory estimation in ice and snow sports scenarios through multimodal data fusion and physical-constraint modeling. The architecture combines the advantages of the model-based EKF and the data-driven PF: the EKF module uses the acceleration and angular velocity data of the IMU (sampling rate 200 Hz) to represent the rigid-body motion state through the Lie group SE(3), avoiding the Euler-angle singularity problem.
Its state equation incorporates the moment-of-inertia tensor constraint of the ski equipment, keeping the attitude estimation error stable within 2°. The state equation is:

x_{t|t-1} = f(x_{t-1}, u_t)   (8)

where x_{t|t-1} is the prior state estimate at time t, f is the state transfer function, x_{t-1} is the state at the previous time step, and u_t is the control input.

The particle filter module handles the nonlinear characteristics of ice and snow sports and introduces a multi-physics coupling model in the importance-sampling stage. When the snowboard lands, Hertz contact theory is used to construct the snow-surface interaction model (stiffness coefficient k = 5 × 10⁴ N/m³), and the friction coefficient μ is dynamically adjusted according to the compression characteristics of the snow particles (adaptive within the range 0.03–0.15). In the airborne stage, conservation of angular momentum is strictly enforced, and the center-of-mass trajectory is corrected by constraining the moment-of-inertia ratio of the limbs relative to the trunk, significantly reducing the trajectory error of jumping actions. The importance-sampling weight update formula is:

w_t^{(i)} \propto w_{t-1}^{(i)} \cdot p(z_t \mid x_t^{(i)})   (9)

where w_t^{(i)} and w_{t-1}^{(i)} are the weights of the i-th particle at times t and t−1, and p(z_t | x_t^{(i)}) is the probability of observing z_t in state x_t^{(i)}.

The fusion strategy of the hybrid architecture adopts a dynamic probability weighting mechanism: the confidence weight of the millimeter-wave radar (nominally 0.7) is adjusted in real time according to point-cloud density and signal-to-noise ratio (within the range 0.6–0.8), while the IMU weight (0.3) is inversely correlated with its gyroscope zero-bias stability index.

4 Results and discussion

4.1 Study design

The experimental protocol comprises three levels of testing. To ensure the reproducibility and rigor of the evaluation, the study design includes detailed specifications of trial repetition, control configurations, and validation protocols. The three stages are:
1) Baseline calibration: conducted in a controlled environment (−5 °C, 60% relative humidity), a KUKA KR6 R900 robotic arm equipped with a 10 dBsm radar corner reflector executes preset trajectories with linear velocities of 0–15 m/s and angular velocities up to 1080°/s. A total of 300 motion sequences were collected to calibrate the internal parameters of the mmWave radar and to establish the unoptimized system baseline.
2) Static interference test: five representative snow conditions (e.g., fresh snow, compacted snow, ice-crystal snow) were simulated using a snow-density gradient apparatus (0.1–0.4 g/cm³). Each scenario was repeated 20 times under a snowfall intensity of 5–7 mm/h to collect radar intermediate-frequency signals and environmental variables, serving as the static clutter reference.
3) Dynamic motion capture: 30 professional winter-sports athletes (across the freestyle skiing, alpine skiing, and snowboarding disciplines) performed standardized actions including linear gliding, sharp turning, and airborne rotations. Each action was repeated 15 times. Raw mmWave point clouds, inertial data, and optical motion capture (Vicon, 200 Hz) were synchronously recorded. The study uses both optimized and unoptimized radar pipelines to quantify performance gains.

All trials were conducted in a purpose-built climate-controlled snow chamber (5 m × 8 m × 3 m), with temperature regulation from −30 °C to 25 °C (±0.5 °C accuracy) and snow layers of 10–50 cm. All sensors were synchronized via the PTP protocol. A five-fold cross-validation strategy was applied by stratifying athletes across training and testing splits to ensure generalizability and prevent overfitting.

To verify the performance of the proposed framework in a real ice and snow environment, the study built a professional climate-controlled test environment. The core of the experimental platform is a customized ice-and-snow simulation cabin with a double-layer insulation structure. The inner layer is a 5 m × 8 m × 3 m test space equipped with a precision temperature control system (range −30 °C to 25 °C, accuracy ±0.5 °C) and a humidity control device (relative humidity range 30%–90%). The floor of the cabin is covered with an artificial snow layer of adjustable thickness (10–50 cm), with the snow density controlled in the range 0.1–0.4 g/cm³ to simulate different snow conditions. The test configuration includes: 1) a multi-angle adjustable millimeter-wave radar array (four 77 GHz FMCW radars, bandwidth 4 GHz, maximum output power 10 dBm) installed at a height of 2.5 m in a ring layout; 2) a reference-grade optical motion capture system (12 Vicon Vero cameras, sampling rate 200 Hz) serving as the baseline ground truth; 3) distributed IMU nodes (9-axis sensors, bandwidth 200 Hz) fixed at the subjects' main joints; and 4) environmental monitoring terminals recording temperature, humidity, wind speed, and other variables in real time. All devices are time-synchronized through the PTP protocol, and data acquisition is controlled by a unified trigger signal to ensure accurate time alignment of the multimodal data. During testing, the subjects wore standard skiing equipment and completed the specified action sequences (straight gliding, sharp turns, jumps, etc.), with millimeter-wave point clouds, IMU data, and optical motion-capture coordinates collected synchronously to construct a multidimensional dataset covering different motion states [16]. While the Vicon system provides high-accuracy ground truth, the framework is compatible with alternative motion-tracking modalities, such as wearable IMUs or markerless systems like OpenPose [17], ensuring flexibility in deployment.

4.2 Quantitative indicator analysis

4.2.1 Motion capture accuracy verification

Thirty ice and snow athletes (10 freestyle skiing aerials, 8 alpine skiing downhill, and 12 snowboard big air) were selected to evaluate the optimization effect of the millimeter-wave radar algorithm in extreme environments. The joint positioning RMSE of the athletes' movements and the angular velocity error of aerial rotations were captured before and after optimization. The results are shown in Figures 3 and 4.

Figure 3: RMSE of joint positioning

As shown in Figure 3, the comparison of joint positioning RMSE before and after optimization of the anti-interference algorithm shows that the joint positioning errors of the 30 athletes are distributed in the range of 7.0–13.0 cm before optimization.
The maximum error of 13.0 cm, recorded for athlete No. 30, reflects the performance bottleneck of the millimeter-wave radar under extreme sports conditions before optimization. After optimization, the error range is compressed to 1.1–7.0 cm through the synergy of the cascaded clutter suppression module and the 3D convolution-LSTM spatiotemporal hybrid network. Athlete No. 22 achieves a breakthrough accuracy of 1.1 cm under straight-line gliding conditions, primarily attributable to the improved MTI filter's enhanced suppression of snow- and fog-induced multipath interference. For each athlete, the joint positioning RMSE values represent the mean of 15 repeated trials, with standard-deviation error bars shown in Figure 3. A two-tailed paired t-test comparing the baseline and optimized systems revealed statistically significant improvements across all athletes (p < 0.01), demonstrating that the proposed anti-interference algorithm robustly enhances the motion capture accuracy of millimeter-wave radar in complex snow environments.

Figure 4: Angular velocity error

As shown in Figure 4, the average angular velocity error of the millimeter-wave radar across the 30 athletes before optimization is 5.25°/s (range 3.2–6.9°/s); athlete No. 25 had a maximum error of 6.9°/s when completing a 1080° rotation, exposing the phase-loss problem of millimeter-wave radar in high-angular-velocity dynamic tracking. Through the joint optimization of the cascaded clutter suppression module and the 3D convolution-LSTM spatiotemporal hybrid network, the average angular velocity error is significantly reduced to 1.46°/s (range 0–2.8°/s) after optimization, a decrease of 72%, with athletes No. 25 and 26 achieving zero-error tracking. The reduction in angular velocity error was statistically significant across athletes (paired t-test, p < 0.01), with error bars in Figure 4 indicating the standard deviation over 15 repetitions per action. The results show that the proposed anti-interference algorithm significantly improves the angular velocity tracking accuracy of millimeter-wave radar in ice and snow sports scenarios, especially for difficult rotational movements. However, under extreme conditions a residual error of 2.8°/s remains, mainly due to insufficient compensation for the Doppler shift caused by high-speed movement. In the future, tracking performance in highly dynamic scenarios will be further improved by introducing an adaptive carrier tracking loop and hardware-accelerated processing to meet the stringent requirements of motion capture accuracy.

4.2.2 Anti-interference performance analysis

In response to the extreme weather interference common in ice and snow sports, this study builds a multi-physics coupling model to test the anti-interference capability of the millimeter-wave radar under different meteorological conditions. As shown in Table 2, in the simulated blizzard test (snowfall > 5 mm/h), the proposed cascaded clutter suppression module shows excellent multipath interference suppression:

Table 2: Anti-interference ability test results

Test Scenario | Snowfall Intensity (mm/h) | Multipath Suppression Ratio (dB) | Positioning Error (cm)
Freestyle Ski Aerials | 5.8 | 28.2 | 3.2
Snowboard Big Air Landing | 6.3 | 25.7 | 7.0
Alpine Ski Downhill | 7.1 | 26.9 | 4.5
Cross-Country Ski Curves | 5.2 | 29.4 | 2.1
Biathlon Shooting | 6.0 | 27.5 | 3.8

Table 2 shows the anti-interference performance of the millimeter-wave radar for different ice and snow sports scenes in a blizzard environment. Across snowfall intensities of 5.2–7.1 mm/h, the system shows excellent performance in the freestyle skiing aerials scene, achieving a multipath suppression ratio of 28.2 dB and a positioning error of 3.2 cm, mainly owing to the synergy of the cascaded clutter suppression module and the 3D convolution-LSTM spatiotemporal hybrid network. In contrast, the snowboard big air landing scene, affected by an 8 g impact acceleration, sees the positioning error rise to 7.0 cm, directly reflecting the effect of carrier frequency deviation and multiple snow-layer reflections on millimeter-wave signal propagation. Although the dynamic waveform adjustment technique maintains a multipath suppression ratio of 25.7 dB in this scene, phase-noise suppression still needs further optimization by improving the DPLL loop bandwidth.
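The two-tailed paired t-tests reported for Figures 3 and 4 can be reproduced in a few lines. The per-athlete RMSE values below are hypothetical placeholders chosen within the paper's reported ranges (7.0–13.0 cm before, 1.1–7.0 cm after); they are not the study's data:

```python
import math

# hypothetical per-athlete joint-positioning RMSE (cm), before vs. after optimization
before = [7.4, 9.1, 10.3, 8.8, 12.0, 13.0, 9.7, 11.2, 8.1, 10.9]
after  = [2.1, 3.4, 4.0, 2.8, 5.6, 7.0, 3.3, 4.9, 1.1, 4.4]

d = [b - a for b, a in zip(before, after)]   # paired differences
n = len(d)
mean_d = sum(d) / n
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)   # sample variance of differences
t_stat = mean_d / math.sqrt(var_d / n)                # paired t statistic, df = n - 1

# two-tailed critical value of t(df=9) at alpha = 0.01 is 3.250;
# |t| above it implies p < 0.01
significant = abs(t_stat) > 3.250
```

With these placeholder values the statistic is far above the critical value, matching the paper's qualitative finding; substituting the actual per-athlete means would give the reported test.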
These quantitative results not only confirm the reliability of millimeter-wave radar in extreme ice and snow environments but also provide a clear direction for subsequent technology iterations, especially the optimization of Doppler-compensation algorithms in ultra-high-speed scenarios.

4.2.3 Real-time verification

The end-to-end processing time from radar signal input to trajectory output was recorded to assess the real-time performance of the millimeter-wave radar. The results are shown in Table 3:

Table 3: Real-time test results

Test Scenario | Processing Delay (ms) | Multi-Target Capacity | Frame Rate (fps)
Freestyle Ski Aerials | 24.2 ± 1.5 | 3 athletes | 38
Snowboard Big Air Landing | 22.8 ± 1.2 | 3 athletes | 40
Alpine Ski Downhill | 21.5 ± 0.8 | 3 athletes | 42
Cross-Country Ski Curves | 23.1 ± 1.1 | 3 athletes | 39
Biathlon Shooting | 25.6 ± 1.8 | 3 athletes | 36

Table 3 compares the real-time performance of the system across the different ice and snow sports scenarios along four key dimensions. All scenarios maintain processing latencies below 26 ms, with alpine skiing downhill achieving the best performance (21.5 ± 0.8 ms) and biathlon shooting exhibiting the highest delay (25.6 ± 1.8 ms). The multi-target tracking capability consistently supports the simultaneous tracking of three athletes under all test conditions, and the system frame rate remains in the range of 36–42 fps, fully meeting the real-time demands of competitive snow-sports motion capture. These results demonstrate the real-time advantages of the proposed anti-interference algorithm in complex ice and snow environments, providing a reliable foundation for the practical deployment of markerless motion capture systems in elite athletic training and competition.

It is important to distinguish between algorithmic latency and full-system latency. The 8 ms latency reported in Table 1 reflects only the simulation-based execution time of the optimized cascade pipeline, evaluated with FPGA and CUDA acceleration. In contrast, the real-world latencies presented in Table 3 include end-to-end delays such as radar signal acquisition, data transfer, and multi-target processing overhead, resulting in a total system delay of 21–26 ms. Despite this, the system remains within acceptable bounds for real-time snow-sports motion capture.

5 Conclusion

This study proposes a cascaded anti-interference architecture to address the multipath and frequency-offset problems in mmWave radar-based motion capture under snowy environmental conditions. Through the integration of adaptive MTI filtering, genetic sparse array reconstruction, and hybrid carrier tracking, combined with a deep spatiotemporal 3D CNN–LSTM decoding network and multimodal EKF–PF fusion, the proposed system demonstrates significant improvements in accuracy, robustness, and real-time performance.

The main contributions of this study are as follows:
1) A three-stage signal processing pipeline is designed to suppress snow-induced multipath clutter and frequency distortion, improving low-SNR motion signal reconstruction.
2) A novel deep learning-based decoder is developed, leveraging 3D CNN and LSTM to model complex temporal-spatial dependencies in radar point clouds.
3) A multimodal fusion strategy integrating extended Kalman filtering and particle filtering is
Despite this, the system remains within introduced for robust trajectory estimation in dynamic, 360 Informatica 49 (2025) 351–360 Y. Liu cluttered environments. from random projections: Universal encoding The proposed method is validated on both simulated strategies?” IEEE Trans. Inf. Theory, vol. 52, no. 12, and real-world datasets involving elite snow sport pp. 5406–5425, Dec. 2006. athletes, showing that the system achieves sub-centimeter https://doi.org/10.1109/TIT.2006.885507 RMSE accuracy and end-to-end latency below 26 ms [9] Y. Zhang, X. Liu, and J. Wang, “Genetic sparse array across diverse scenarios. In future work, we plan to optimization for millimeter-wave radar in snow enhance tracking under extreme dynamics by interference environments,” IEEE Trans. Geosci. incorporating event-based vision sensors (e.g., DVS), Remote Sens., vol. 61, pp. 1–12, 2023. which can further reduce motion blur and improve delay [10] N. Kumar and R. Patel, “Temperature-compensated robustness in high-speed actions [18]. Additionally, PLL design for FMCW radar in harsh environments,” integrating edge computing and hardware acceleration IEEE Trans. Circuits Syst. I, vol. 68, no. 5, pp. (e.g., FPGA optimization) will be explored to further 2065–2077, 2021. optimize latency for large-scale deployment. [11] E. Baccarelli and M. Scarpiniti, “Robust deep filtering architectures for noisy radar environments,” IEEE Access, vol. 11, pp. 22256–22267, 2023. Funding [12] X. Tang and Q. Song, “Comparative evaluation of radar-based and vision-based human motion capture This study is supported by "Research on the Upgrading systems,” Meas. Sci. Technol., vol. 34, no. 2, Art. no. and Development of China's Sports Industry Driven by 025109, 2023. New Productive Forces" (No.SKL-2025-1135). [13] J. Wu, S. Zhao, and Y. Liu, “Deep spatiotemporal modeling with CNN-LSTM for real-time radar- References based motion capture,” Pattern Recognit., vol. 131, Art. no. 108885, 2022. 
https://doi.org/10.13718/j.cnki.xdzk.2024.05.016 [1] H. Li, S. Qiu, and Y. Ma, “A survey on human [14] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention activity recognition using millimeter-wave radar,” is all you need,” in Adv. Neural Inf. Process. Syst., ACM Comput. Surv., vol. 56, no. 4, pp. 1–36, 2023. vol. 30, pp. 5998–6008, 2017. [2] X. Y. Zhang and G. P. Qiu, “Research on human https://proceedings.neurips.cc/paper/2017/hash/3f5 motion capture based on improved LM algorithm ee243547dee91fbd053c1c4a845aa-Abstract.html and dynamic time warping algorithm,” J. Southwest [15] T. Chen, R. Zhang, and K. Huang, “Hybrid Kalman- Univ. (Nat. Sci. Ed.), vol. 46, no. 5, pp. 175–185, particle filtering for multimodal sensor fusion,” 2024. IEEE Sens. J., vol. 22, no. 9, pp. 8654–8664, 2022. https://doi.org/10.13718/j.cnki.xdzk.2024.05.016 [16] S. Ahmed, S. Kim, and M. Park, “Snow Sense: A [3] S. F. Li, X. S. Zhang, Y. Guo, X. C. Li, L. Shi, and radar-based dataset for motion capture in snowy T. H. Zhan, “Biomechanical study of markerless conditions,” Sensors, vol. 22, no. 3, Art. no. 1011, motion capture technology in FMS squat action,” 2022. Med. Biomech., vol. 39, no. S01, p. 513, 2024. [17] Z. Cao, G. Hidalgo, T. Simon, S. E. Wei, and Y. [4] X. H. Li, D. F. Fan, J. J. Feng, Y. Lei, C. Cheng, and Sheikh, “Open Pose: Realtime multi-person 2D pose X. N. Li, “Systematic review of motion capture in estimation using part affinity fields,” IEEE Trans. virtual reality: Enhancing the precision of sports Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172– training,” J. Ambient Intell. Smart Environ., vol. 17, 186, 2019. no. 1, pp. 5–27, 2025. https://doi.org/10.3233/AIS- https://doi.org/10.48550/arXiv.1812.08008 230 [18] G. Gallego, T. Delbrück, and D. Scaramuzza, [5] H. Chen, “Human motion capture data retrieval and “Event-based vision: A survey,” IEEE Trans. Pattern segmentation technology for professional sports Anal. Mach. Intell., vol. 44, no. 1, pp. 
154–180, training,” J. Mobile Multimedia, vol. 19, no. 2, pp. 2022. 419–436, 2023. https://doi.org/10.13052/jmm1550- https://doi.org/10.1109/TPAMI.2020.3008413. 4646.1923 [6] B. Teer, “Performance analysis of sports training based on random forest algorithm and infrared motion capture,” J. Intell. Fuzzy Syst., vol. 40, no. 4, pp. 6853–6863, 2021. https://doi.org/10.3233/JIFS- 189517 [7] T. Alam, M. Benaida, “Smart Curriculum Mapping and Its Role in Outcome-based Education”, Informatica, vol. 46, no. 4. https://doi.org/10.31449/inf.v46i4.3717 [8] E. Candes and T. Tao, “Near-optimal signal recovery https://doi.org/10.31449/inf.v49i16.10164 Informatica 49 (2025) 361–372 361 Improved DenseNet-DCGAN for Enhanced Digital Restoration of Embroidery Cultural Heritage Guiying Dong1*, Qian Mao2 1College of Art and Design, Communication University of China Nanjing, Nanjing 210000, China 2Library, Nanjing University, Nanjing 210000, China E-mail: adong118@126.com; maoqian328@163.com *Corresponding author Keywords: DCGAN, DenseNet, embroidery, image classification, image restoration Received: July 14, 2025 At present, embroidery image restoration technology still has deficiencies in terms of color uniformity and detail restoration. To address these issues, the study improves the densely connected convolutional network and the deep convolutional generative adversarial network through spatial pyramid pooling, and proposes a novel method for embroidery image classification and restoration. The experimental results showed that the research method largely restored the details and colors of the original image and effectively addressed the uneven color issue. The average prediction accuracy, recall rate, and specificity of the image classification model on Suzhou embroidery, Hunan embroidery, Guangdong embroidery, and Shu embroidery reached 96.3%, 98.5%, and 99.4%, respectively. The structural similarity index of the image restoration model has reached 0.99. 
The restored image was almost indistinguishable to the naked eye in terms of details, texture, and color. The research method has significant advantages in embroidery image classification and high-quality restoration tasks, and can provide reliable technical support for the digital protection and intelligent restoration of traditional embroidery cultural relics.

Povzetek: Za klasifikacijo in digitalno obnovo vezenin so razviti izboljšani DenseNet in DCGAN z dodanim SPP, razširjenimi konvolucijami ter CBAM. Izboljšani model skoraj povsem naravno obnovi teksture in barve.

1 Introduction

Embroidery works have attracted countless people's attention with their exquisite craftsmanship, rich patterns, and profound cultural connotations. However, over time, many embroidery artifacts have suffered natural or human damage, such as fading, breakage, and insect infestation, which seriously threatens the preservation and inheritance of embroidery artifacts [1]. The traditional restoration of Embroidered Cultural Relics (ECR) mainly relies on manual skills. Although this approach can finely handle every instance of damage, it is limited by low work efficiency and dependence on the superb skills of the restorer [2]. In addition, the subjectivity of the manual repair process may also lead to deviations in the consistency and accuracy of the repair effect. In this context, the emergence of Artificial Intelligence (AI) technology, especially Deep Learning (DL) technology, has provided new solutions for the restoration of cultural relics. By training DL models, staff can automatically detect and classify the types of damage to cultural relics, providing a scientific basis for restoration work.

Many researchers have explored this direction. For example, Maitin et al. proposed a direct reconstruction technique without image segmentation that uses DL to reconstruct missing architectural elements in images of Greek temple ruins from virtual image paintings. This method successfully reconstructed the missing architectural elements, improving the efficiency of restoration and enhancing the consistency and accuracy of the restoration effect [3]. Alessandro et al. used a trained multidimensional DL neural network to associate color images with raw X-ray fluorescence imaging data to complete the restoration of digital cultural heritage, achieving digital restoration of graphic artworks [4]. With the further advancement of DL technology, Generative Adversarial Networks (GANs) have made breakthrough progress in image recognition, providing a good solution for cultural relic image restoration [5]. Praveen et al. proposed a new GAN-based art restoration method to digitally repair damaged artworks and assist in physical restoration. This method performed well in digital restoration and could effectively restore the original appearance of artworks, providing important guidance for physical restoration [6]. Zheng et al. proposed an Example Attention Generative Adversarial Network (EA-GAN) that fuses with reference examples, which addressed the issue of significant reconstruction errors in traditional character restoration methods. Compared with existing inpainting networks, EA-GAN can obtain the correct text structure through the guidance of additional examples in the "example attention block"; its Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) values increased by 9.82% and 1.82%, respectively [7].

In summary, numerous scholars have achieved significant results in cultural relic image restoration. However, GANs still have issues in image feature extraction, such as poor training stability and poor generated-image quality. At present, there is relatively limited discussion of embroidery classification and restoration within cultural relic image classification and restoration. Given this, this study innovatively constructs an ECR Image Classification Model (ICM) based on the Densely Connected Convolutional Network (DenseNet) and an ECR Image Restoration Model (IRM) based on the Deep Convolutional GAN (DCGAN). Based on these models, improvements are made by introducing Local Binary Patterns (LBPs), Canny-operator edge extraction, and the Convolutional Block Attention Module (CBAM). The fusion of these technologies aims to enhance the model's capacity to capture details in ECR images, improve the precise reconstruction of textures and edges during the restoration process, and achieve higher-quality ECR image restoration results. The main novelties and contributions of this paper include: (1) for the first time, DenseNet is combined with Spatial Pyramid Pooling (SPP) and applied to classify embroidery images, improving recognition performance across styles and under complex patterns; (2) the structure of the DCGAN generator and discriminator is innovatively adjusted: by integrating dilated convolutional layers, the receptive field of the model is expanded, which helps to capture image features more comprehensively and achieve high-quality restoration of embroidery texture and color; (3) a large-scale dataset containing eight types of traditional embroidery images is constructed, providing fundamental support for subsequent research. The research results have practical value for the digital inheritance and AI-assisted restoration of traditional embroidery culture.

2 Methods and materials

2.1 Construction of ECR-ICM based on SPP-IDenseNet

ECR image classification is the prerequisite and foundation for ECR image restoration. By classifying ECR images, different embroidery types, styles, and eras can be quickly identified and distinguished, providing a scientific foundation for protecting the cultural relics. This study therefore first explores the classification of ECR images. DenseNet, proposed by Gao Huang et al. in 2017, is a novel DL architecture that establishes dense connections between network layers through DenseBlocks, thereby improving the information flow and gradient flow of the network, alleviating the problem of vanishing gradients, and promoting feature reuse [8-9]. The structure of the DenseBlock in DenseNet is displayed in Fig.1.

Figure 1: Schematic structure of DenseBlock (Source from: https://colorhub.me/photos/e7RVB).

In Fig.1, the connection mechanism of the DenseBlock is more aggressive than that of the Residual Network (ResNet). Each layer is connected to all previous layers, providing each layer with a rich input that integrates the features of all previous layers [10]. This design ensures the uniformity of feature-map size within the DenseBlock and greatly promotes feature reuse through dense connections between layers, enabling the network to learn and transmit information more effectively [11]. However, DenseNet still has certain shortcomings in the image classification process, such as the limitation on input image size and the problem of network training not converging [12-13]. Therefore, this study improves it through techniques such as SPP, LBP, and the Canny operator, and proposes a novel ECR-ICM, namely the SPP-IDenseNet model. The training process of this model for embroidery image classification is shown in Fig.2.

Figure 2: Training process of the SPP-IDenseNet model for embroidery image classification (Source from: https://colorhub.me/photos/e7RVB).

In Fig.2, this study first randomly selects a batch of data from the training set based on a preset batch size and normalizes it to standardize the Red-Green-Blue (RGB) color channels of each embroidery image. Subsequently, the normalized image is input into the network for forward propagation to extract features and predict categories. Next, by comparing the predicted categories with the actual categories, the value of the loss function is calculated, and the weights are adjusted through backward propagation to optimize the model's performance. After completing a batch of training, the system checks whether the entire dataset has been traversed; if not, the model continues with the next training batch and repeats the above steps. Once the traversal of the entire dataset is completed, the model saves the weight parameters of the current round and evaluates whether the predetermined number of training rounds has been reached. If not, the model restarts the training process and continues iterative optimization; otherwise, training terminates, and the weight parameters at this point are used for subsequent image classification tasks.

Figure 3: Schematic diagram of the SPP structure.

The calculation of the RGB three-channel pixel values Output_R, Output_G, and Output_B of the normalized image is shown in formula (1).
Output_R = (Input_R − mean_R) / std_R
Output_G = (Input_G − mean_G) / std_G
Output_B = (Input_B − mean_B) / std_B        (1)

In formula (1), Input_R, Input_G, and Input_B are the RGB three-channel pixel values of the image before normalization; mean_R, mean_G, and mean_B are the mean values of the RGB channels; and std_R, std_G, and std_B are the standard deviations of the three channels. The output feature M_n is shown in formula (2).

M_n = F(M_1 ⊕ M_2 ⊕ … ⊕ M_{n−1})        (2)

In formula (2), n is the layer index of the network, F is a convolution operation, and ⊕ denotes feature concatenation. The loss l during training is shown in formula (3).

l = L(Y_1, Y_1')        (3)

In formula (3), L is the loss function, and Y_1 and Y_1' are the real category and the predicted category, respectively. The updated network weight ω' is shown in formula (4).

ω' = ω − lr · g(l)        (4)

In formula (4), ω is the network weight before the update, lr is the learning rate, and g(·) denotes the derivative calculation. In response to the input-image-size limitation of the DenseNet model in image classification tasks, this study uses SPP to enable the model to adapt to input images of different sizes. The structure of SPP is shown in Fig.3.

In Fig.3, this study integrates the SPP structure between the last convolutional layer and the Fully Connected Layer (FCL) at the end of the DenseNet model. By dividing the feature map into grids of 1×1, 4×4, and 16×16 and applying max pooling, features of different resolutions are captured comprehensively. These multi-scale pooled feature maps are then merged into a fixed-length feature vector, providing rich information as input to the FCL. In addition, by pooling over windows of different sizes, this study generates feature maps with diverse resolutions and fine-tunes the channel dimensions through a 1×1 convolutional layer. The ReLU activation function used in DenseNet may cause neuron deactivation when the input is less than 0 [14]. Therefore, this study introduces the Leaky ReLU function with a negative slope coefficient of 0.01, effectively extending the applicability of ReLU and promoting the stability and convergence of network training. The SPP module enhances the model's understanding of the structural hierarchy of embroidery patterns through multi-scale pooling and improves the receptive-field coverage of complex patterns; LBP extracts fine-grained texture features from embroidery images, directing more attention to the local texture restoration of defect areas; and Canny edge detection provides clear structural contour constraints, guiding the generator to maintain the coherence and integrity of pattern edges. The three work in synergy, enhancing the quality and stability of image restoration along multiple dimensions, such as structure, texture, and edge.

2.2 Construction of ECR-IRM based on improved DCGAN

The SPP-IDenseNet model designed above provides strong technical support for the digital restoration and intelligent management of ECR. However, further technological innovation and method improvement are needed in ECR image restoration to achieve more efficient and accurate results. Therefore, this study explores the restoration of ECR images. A GAN is a DL model containing two parts: the Generator and the Discriminator. Although GANs are widely popular in computer vision, in traditional GAN architectures the models do not rely on a determined distribution but instead use internal feedback to adjust their parameters [15]. Although this approach enhances the flexibility of the model, it may also cause training instability and sometimes even lead to training crashes [16-17]. Therefore, this study further introduces a derivative of the GAN, namely DCGAN. This network improves the quality of image generation and enhances the learning and representation capabilities of the model by combining the deep architecture of a CNN with the GAN framework. The generator extends and reshapes 100-dimensional noise into a 3D feature map through an FCL, and then gradually forms the final image through upsampling and the dimension adjustment of transposed convolutional layers. Batch normalization and ReLU are applied after each layer, and the output image is finally activated by a Sigmoid function to produce the output tensor [18-19]. The generator's loss function is shown in formula (5).

L_G = −E_{z∼p_z(z)}[log D(G(z))]        (5)

In formula (5), E is the expectation operator, usually taken as the average; z is a noise sample from the latent-space prior distribution; G(z) is the data generated by the generator from the noise sample z; and D(G(z)) is the discriminator's output on the generated data, representing the probability that the data are real. The loss function of the discriminator is shown in formula (6).

L_D = −E_{x∼p_data(x)}[log D(x)] − E_{z∼p_z(z)}[log(1 − D(G(z)))]        (6)

In formula (6), D(x) is the discriminator's output for the real sample x, i.e., the probability that x is real data. Based on the above formulas, compared with traditional GANs, DCGAN uses convolutional and deconvolutional layers to replace the FCLs in traditional GANs. This operation can capture the local structure and spatial information of embroidery images [20]. In addition, DCGAN uses batch normalization to accelerate and stabilize GAN training. The aim is to further enhance the performance of DCGAN in embroidery image restoration tasks, improve the naturalness of restoration effects, and provide experts with more accurate texture and color information to assist them in more refined restoration work. Given this, the study improves DCGAN and proposes a new type of ECR-IRM, namely IDCGAN. The overall model structure is shown in Fig.4.

Figure 4: Overall structural framework of the IDCGAN (Source from: https://colorhub.me/photos/e7RVB).
Figure 5: Specific structure of the generator in the IDCGAN model.
Figure 6: Specific structure of the discriminator in the IDCGAN.

In Fig.4, innovative adjustments are made to the generator architecture by integrating dilated convolutional layers to expand the model's receptive field, thereby helping the model capture image features more comprehensively. At the same time, CBAM is introduced to enhance attention to key features at both the channel and spatial levels, improving the accuracy of image restoration. The discriminator adopts a strategy of increasing its depth and the number of FCLs, thereby improving the network's ability to handle complex nonlinear problems and enabling it to more effectively distinguish between real and generated images. The loss function combines a traditional MSE loss with an adversarial loss. The mean square error loss L_MSE is calculated as in formula (7).

L_MSE = (1/n) Σ_{i=1}^{n} (y_i − g_i)²        (7)

In formula (7), g_i is the model's predicted value on the training sample x_i, and y_i is the corresponding ground-truth value. The adversarial loss L_adv is shown in formula (8).

L_adv = min_G max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]        (8)

In formula (8), the symbols have the same meanings as before. The specific structure of the generator in the IDCGAN model is shown in Fig.5.

In Fig.5, the generator of the IDCGAN model mainly consists of three key modules, namely the convolution block, the dilated convolution block, and CBAM. The dilated convolution blocks use convolutional layers with different dilation rates, namely 2, 4, 8, and 16, to achieve multi-scale capture of image features; when the dilation rate is set to 1, dilated convolution degenerates into a standard convolution. This is reflected in the Conv6 to Conv10 layers of the generator, forming a series of convolutional layers with different dilation rates that ensure the flexibility and adaptability of the network. The introduction of CBAM adds dynamic weighting capability to the generator: it weights features in both the channel and spatial dimensions, highlighting the features that have the greatest impact on image quality. The framework of the discriminator in the IDCGAN model is shown in Fig.6.

In Fig.6, to improve the discriminator's performance on complex nonlinear problems, this study adds two FCLs to the original discriminator architecture, so that the discriminator contains a total of three FCLs. The interconnection of these layers enhances the discriminator's ability to learn features, significantly improving model performance. Ultimately, the discriminator determines the authenticity of the input image through a binary classification task, distinguishing whether the image was generated by the generator or comes from a real dataset.

The research is conducted on a self-built embroidery image dataset. The images mainly come from digital museums, high-resolution cultural relic catalogues, and cultural heritage archives, covering multiple historical periods and diverse embroidery styles. The initial dataset contains 1,800 images; after augmentation, it ultimately includes 8,957 images. For unified model input, the images are cropped and scaled to 256×256 pixels and normalized. The dataset is divided into a training set and a test set in an 8:2 ratio. To simulate the common damage forms of ECR, the study uses random occlusion to generate defect images; the occlusion forms include rectangles, free-shaped patterns, and speckled textures, with the occluded area ratio controlled between 10% and 40%. On this basis, image augmentation is carried out by applying rotation, flipping, scaling, and color perturbation to improve the robustness and generalization ability of the model. In addition, by analyzing the color and style distribution of the images, a balanced sampling strategy is adopted to control category bias, ensuring the diversity and balance of the training data in terms of pattern style and damage type.

3 Results

3.1 SPP-IDenseNet model performance testing

The study adopts five-fold cross-validation to evaluate the model's performance. The training set is evenly divided into five subsets of similar size; four subsets are selected in turn for model training, and the remaining subset is used as the validation set. This process is repeated five times so that each subset participates in validation once. Through multiple rounds of training and validation, the mean and standard deviation of the model's accuracy, recall rate, and specificity are calculated, effectively avoiding the randomness of a single split and enhancing the statistical reliability and generalization ability of the evaluation results. Table 1 shows the experimental setup and environmental parameters.
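The five-fold partitioning described above can be sketched in plain Python as follows (an illustrative sketch of the index splitting only; the function name and sample count are hypothetical, not from the paper):

```python
def five_fold_splits(n_samples, k=5):
    """Partition sample indices into k folds; each fold serves once as
    the validation set while the remaining folds form the training set."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k - 1)]
    folds.append(indices[(k - 1) * fold_size:])  # last fold takes any remainder
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits

# Hypothetical training-set size, for illustration only
splits = five_fold_splits(100)
```

Each of the five splits holds out a different fifth of the indices for validation, so every sample is validated exactly once, matching the protocol described above.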
All the code modules in the research are built on the PyTorch framework. Some of the (simplified) code is as follows:

import torch
import torch.nn as nn

# Simple Generator example
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1, 3, 64, 64)

# Simple Discriminator example
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Training example (pseudo code)
# z = torch.randn(batch_size, 100)
# fake_images = generator(z)
# real_output = discriminator(real_images)
# fake_output = discriminator(fake_images)

Figure 7: Code.

Table 1: Environment and parameter configuration.

Serial number   Experimental environment and hyperparameter category   Setting
1               Num epochs                                             200
2               Pre-training                                           No
3               Batch size                                             20
4               Num class                                              8
5               Optimizer                                              Adam
6               Learning rate                                          0.0001
7               Development Environment                                Windows 10
8               CPU                                                    Intel Core i9-10900K
9               GPU                                                    NVIDIA RTX 3090
10              Memory                                                 64 GB
11              Graphics Memory                                        16 GB GDDR6X
12              Programming Tools                                      PyTorch 1.6.0

According to the settings in Table 1, the effectiveness of the proposed model was first validated through ablation testing, as shown in Fig.8.

Figure 8: Ablation test results of SPP-IDenseNet. (a) Training set; (b) test set. (Y-axis: classification accuracy/%; X-axis: sample size; curves: SPP-IDenseNet, IDenseNet, SPP-DenseNet, DenseNet.)
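To make the role of Table 1's learning rate concrete, the weight update of formula (4) can be sketched as a toy one-parameter example (the quadratic loss below is an assumption chosen for illustration, not the authors' training objective):

```python
LEARNING_RATE = 0.0001  # Table 1, row 6

def gradient_step(w, grad, lr=LEARNING_RATE):
    """One update of formula (4): w' = w - lr * g(l)."""
    return w - lr * grad

# Assumed toy loss l(w) = (w - 3)^2, whose derivative is 2 * (w - 3)
w = 0.0
for _ in range(5):
    grad = 2.0 * (w - 3.0)
    w = gradient_step(w, grad)
# each step moves w slightly toward the loss minimizer w = 3
```

With lr = 0.0001, each step moves the weight only a small distance toward the minimizer, which is consistent with Table 1's use of many training epochs (200).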
Figs. 8 (a) and (b) show the test results of the new model on the two datasets. As the number of test samples grows, the standalone DenseNet module shows lower classification accuracy in both datasets, with the highest being only 65.3%. After successively introducing the SPP module, LBP, Canny operator, and Gabor filter module, the classification effectiveness of the entire model is significantly improved. The result indicates that, when dealing with embroidery images with complex texture features, relying solely on global features for extraction entails certain performance bottlenecks. The classification accuracy of SPP-IDenseNet is the highest, at 96.4% in the training set and 95.6% in the testing set. This study has improved various parts of the DenseNet model to varying degrees for classifying and recognizing ECR images, demonstrating the effectiveness of the improved method. In addition, popular ICMs of the same type, including Lightweight CNN (LCNN), Efficient CNN (ECNN), StyleGAN, and Global Image Spatial Texture (GIST), are introduced as comparative models. Performance tests are conducted using precision, recall, and specificity as indicators, as shown in Table 2.

In Table 2, due to their relatively simplified structures, the LCNN and ECNN models have obvious deficiencies in feature expression ability and fine-grained classification. Although the GIST model can capture certain texture information, it is limited by its feature extraction method based on compressed texture description; GIST's recognition ability for irregular shapes and multi-scale patterns is weak, resulting in limited classification performance. The SPP-IDenseNet model demonstrates superior performance in all four types of embroidery image recognition tasks. This model enhances its feature perception ability for different scales and spatial structures by introducing SPP modules, and combines LBP and Gabor filters to model fine-grained textures, effectively improving the model's ability to recognize the microstructure of embroidery patterns. Meanwhile, the addition of the Canny edge detection operator enhances the ability to capture boundary and contour features, enabling the model to maintain high classification accuracy even in the face of complex background interference. The SPP-IDenseNet model has the highest precision of 96.3%, the highest recall of 98.5%, and the highest specificity of 99.4% across the Suzhou, Hunan, Guangdong, and Shu embroidery styles. These indicators are numerically superior to those of the other convolutional neural network models and are more evenly distributed across categories. This result demonstrates the adaptability and effectiveness of the SPP-IDenseNet model in handling the classification task of ECR images.

368 Informatica 49 (2025) 361–372 G. Dong et al.

Table 2: Multi-metric performance test results for different models.

Style                  Model            Precision/%  Recall/%  Specificity/%
Suzhou embroidery      LCNN             63.5         65.7      80.2
                       ECNN             67.2         69.8      81.6
                       GIST             70.3         68.7      83.4
                       StyleGAN         85.7         87.4      89.1
                       Research method  95.8         98.5      94.2
Hunan embroidery       LCNN             55.2         56.3      89.6
                       ECNN             58.7         60.4      90.2
                       GIST             60.2         61.7      91.6
                       StyleGAN         83.4         85.1      92.3
                       Research method  96.3         90.2      99.4
Cantonese embroidery   LCNN             57.6         59.8      53.8
                       ECNN             66.3         70.4      60.5
                       GIST             71.6         69.7      70.8
                       StyleGAN         80.2         82.5      75.4
                       Research method  95.1         90.8      95.1
Sichuan embroidery     LCNN             58.8         60.5      55.6
                       ECNN             62.8         68.8      58.3
                       GIST             70.4         73.4      60.7
                       StyleGAN         79.8         81.7      69.2
                       Research method  92.4         96.7      90.3

The confusion matrices obtained on the embroidery image classification dataset before and after model improvement are shown in Fig. 9. Figs. 9 (a) and (b) show the confusion matrices before and after model improvement. The SPP-IDenseNet model has the highest classification and recognition accuracy for Shui ethnic ponytail embroidery, Xiqin, Hami, Su, Xiang, Shu, and other embroidery in the embroidery image classification dataset; its classification accuracy for the Yue embroidery type is relatively the lowest. Overall, the SPP-IDenseNet model achieves an average prediction accuracy of over 80% for the 8 styles of embroidery images in the embroidery image classification dataset. This indicates that the SPP-IDenseNet model demonstrates strong robustness in handling noise and occlusion issues and in identifying embroidery images with similar features in the embroidery image classification dataset. This robustness makes the SPP-IDenseNet model a powerful tool for ECR image classification that can effectively address the challenges in practical applications.

Figure 9: Confusion matrix plots before and after model improvement. (a) Pre-improvement; (b) SPP-IDenseNet.

3.2 Performance simulation testing of ECR-IRM for IDCGAN

This study uses the TensorFlow DL framework to implement the training and testing of the entire ECR-IRM. The weights β1 and β2 of the Adam optimizer are set to 0.5 and 0.9, respectively. The loss changes of the IDCGAN generator and discriminator at different network learning rates are shown in Fig. 10.
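Table 2 evaluates each model with per-style precision, recall, and specificity. These are one-vs-rest quantities derived from a multiclass confusion matrix such as those in Fig. 9. A minimal sketch of the computation (the 3×3 matrix below is invented for illustration, not data from the paper):

```python
# Per-class precision, recall (sensitivity), and specificity computed
# one-vs-rest from a multiclass confusion matrix, as reported in Table 2.
# The 3x3 matrix below is invented for illustration, not data from the paper.

def one_vs_rest_metrics(cm, k):
    """cm[i][j] = number of samples whose true class is i and predicted class is j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                        # true k, predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, actually another class
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return precision, recall, specificity

cm = [[50, 2, 3],
      [4, 45, 1],
      [2, 3, 40]]
precision, recall, specificity = one_vs_rest_metrics(cm, 0)
print(f"precision={precision:.3f} recall={recall:.3f} specificity={specificity:.3f}")
# → precision=0.893 recall=0.909 specificity=0.937
```

Recall here equals sensitivity, and specificity treats every sample outside class k as a negative, which is why these indicators can be reported per embroidery style.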
Improved DenseNet-DCGAN for Enhanced Digital Restoration… Informatica 49 (2025) 361–372 369

Figure 10: Loss variation of IDCGAN between the generator and discriminator at different learning rates (0.002, 0.0002, and 0.00002). (a) Loss of the generator at different learning rates; (b) loss of the discriminator at different learning rates.

Figure 11: Repair effects of the model before and after the improvement (Source from: https://colorhub.me/). (a) Original image; (b) random masking; (c) DCGAN; (d) IDCGAN.

In Fig. 10 (a), the loss of the IDCGAN generator increases slowly as the number of training cycles grows, and the curve with a learning rate of 0.00002 shows a low and stable loss value. The curves with learning rates of 0.002 and 0.0002 show higher loss values and larger fluctuations. In Fig. 10 (b), the discriminator loss decreases slowly as the number of training cycles increases. The curve with a learning rate of 0.00002 decreases the fastest and tends to stabilize, indicating that a smaller learning rate helps the discriminator learn more effectively. In contrast, the curves with learning rates of 0.002 and 0.0002 exhibit significant fluctuations and higher loss values. Based on the comprehensive experimental data, this study ultimately sets the network learning rate of the IDCGAN model to 0.00002.

To verify the impact of the dilated convolutional layers, loss functions, and CBAM on model performance, the repair effect of the improved model before and after random occlusion is compared, as shown in Fig. 11. Figs. 11 (a) to (d) show the original embroidery image, the image subjected to random occlusion, the image restored by the DCGAN model, and the image restored by the IDCGAN model. Comparing these images demonstrates the effectiveness of IDCGAN in handling different types of embroidery and varying degrees of occlusion. IDCGAN can enhance the focus on key features, thereby enabling the restored image to largely recover the details and colors of the original image, effectively solving the problem of color non-uniformity. However, DCGAN's repair effect is not ideal when facing large-scale defects, and it cannot maintain good contextual consistency, resulting in poor repair performance. This finding validates the necessity of improving the DCGAN.

To further test the effectiveness of the research model in embroidery image restoration, the Cycle-Consistency GAN (CCGAN), Conditional GAN (CGAN), and Stacked GAN (Stack-GAN) models are introduced for comparison. The test results with SSIM as the experimental indicator are shown in Fig. 12.

Figure 12: Schematic of SSIM test results for different models. (a) Training set; (b) test set.

Figs. 12 (a) and (b) show the SSIM performance comparison of the four models on the two datasets. In both the training and testing sets, the IDCGAN model performs the best, followed by Stack-GAN and CCGAN, while CGAN performs the worst. In the training set, the maximum SSIM values for CGAN, CCGAN, Stack-GAN, and the research model are 0.64, 0.72, 0.85, and 0.98, while in the testing set they are 0.69, 0.78, 0.90, and 0.99. These data indicate that the research model has significant advantages in maintaining image structure and quality. The reason behind this is that the dilated convolution technique effectively expands the receptive field, allowing the model to capture richer contextual information in the image. In addition, CBAM further enhances the model's attention to key features by weighting important features in both the channel and spatial dimensions. These improvements have led to the significant advantages of IDCGAN in embroidery image restoration.

Finally, to confirm the resolution capability of the proposed model, this study also tests the four models using image clarity as an indicator, as shown in Fig. 13.

Figure 13: The clarity of restored images of Cantonese embroidery (Source from: https://colorhub.me/photos/VXeo3). Image gradient entropy: (a) CGAN 5.65; (b) CCGAN 6.12; (c) Stack-GAN 6.78; (d) IDCGAN 7.82; (e) original image 7.86.

Figs. 13 (a) to (d) show the clarity performance of the CGAN, CCGAN, Stack-GAN, and IDCGAN models in the Yue embroidery image restoration task, and Fig. 13 (e) shows the clarity of the original image. The Yue embroidery restoration images generated by IDCGAN are visually very similar to the original images, and it is almost impossible to distinguish the quality differences with the naked eye. In contrast, there are significant differences between the restoration results of CGAN, CCGAN, and Stack-GAN and the original images. Especially for the restored images of the CGAN model, there is a significant decrease in clarity compared with the original images. In summary, the research model surpasses the comparative models in image resolution for Guangdong embroidery restoration, demonstrating its potential and advantages in embroidery image restoration processing.

4 Conclusion

The study focused on the task of image restoration of ECR and innovatively constructed an ECR-ICM based on SPP-IDenseNet and an ECR-IRM based on the improved DCGAN. The experimental results showed that the SPP-IDenseNet model achieved an average prediction accuracy of over 80% for the embroidery images of eight styles. The IRM could enhance the focus on key features, thereby enabling the restored image to largely recover the details and colors of the original image and effectively solving the problem of uneven color; the SSIM value reached 0.99. Furthermore, the research model could maintain an excellent restoration effect even when dealing with large-area damaged embroidery images. The restored image of Cantonese embroidery was visually extremely similar to the original image, and it was almost impossible to distinguish the quality difference with the naked eye. The results show that the research model achieves technological innovation and demonstrates significant advantages in practical applications.

However, the research model also has certain limitations. On the one hand, the current models mainly target 2D embroidery images; at present, there is no adaptive research on complex 3D multi-level embroidery structures and heterogeneous multi-material embroidery patterns, which limits their promotion and application in high-precision virtual restoration. On the other hand, due to the adoption of a deep generative network structure, the model has a certain dependence on computing resources during the training and inference stages, which may pose practical challenges for resource-constrained cultural heritage conservation institutions or mobile deployments. Furthermore, for severely damaged or extremely blurry images, there is still a certain risk of distortion in the structural reconstruction of the research model. Future research can be carried out in the following directions: (1) expansion of model generalization ability: by integrating 3D reconstruction and multimodal input, the restoration ability for 3D ECR can be enhanced; (2) enhanced multi-material adaptability: a material perception module or style transfer mechanism can be introduced to achieve texture simulation and reconstruction of heterogeneous embroidery materials; (3) lightweight deployment optimization: by applying techniques such as model pruning, quantization, and distillation, the network structure can be compressed to adapt to edge devices or mobile terminals. Overall, the research method provides a feasible and effective technological path for ECR digital protection and is expected to find practical applications in digital museum construction, virtual restoration of cultural heritage, and reconstruction of cultural creative models.

https://doi.org/10.31449/inf.v49i16.9995 Informatica 49 (2025) 373–396 373

Enhanced Prediction of Tropical Tree Biomass Using Ensemble Models

Qiucai Dang
Zhumadian Preschool Education College, Zhumadian 463000, China
E-mail: Dqc336699@163.com

Keywords: above-ground biomass, below-ground biomass, ensemble stacking, grid search optimization

Received: July 3, 2025

The present paper aims to propose a novel model and to investigate its utility in evaluating tropical forest biomass. To address the multiplicity of variables, as well as the complexity and nonlinear relationships between them, five Machine Learning (ML) models, namely Gradient Boosting (GB), Extra Trees (ET), XGB, ElasticNet, and Poisson Regression, were employed to concurrently predict both the below-ground and above-ground tree biomass (BGB and AGB, respectively), as well as the total biomass (TB = BGB + AGB). Since the results of the aforementioned models were not entirely satisfactory, an additional model called the Stacking Ensemble (SE) was introduced. Each model had its parameters optimized by Grid Search with cross-validation to ensure generalization and consistent performance. The data collected were based on 175 trees from 27 ecoregional plots located in the Central Highlands ecoregion of Vietnam. The dataset was processed to investigate the proposed model's ability to predict tree biomass. The study's findings revealed that the proposed method demonstrated strong and efficient predictive capabilities for biomass estimation in forest ecoregions.
The Stacking model showed the most significant improvements in the highest R 2 (0.968) and VAF (0.971), and the lowest errors, and MDAPE (23.081 percent), which means that it has a strong ability to predict and minimal bias. However, STD (105.763) was marginally higher; nevertheless, the error and strength of this variation exceeded this variance. Thus, incorporating a Stacking Ensemble (SE) model strengthens the ML approach in predicting forest tree biomass amounts. Povzetek: Študija predlaga ansambelski model za napoved tropske drevesne biomase, ki združuje pet ML- modelov in optimizacijo z iskanjem po mreži. Stacking Ensemble doseže najboljša napovedovanja ter najnižje napake, kar občutno izboljša oceno nadzemne, podzemne in skupne biomase. 1 Introduction 1.2 Above-ground biomass (AGB) The term above-ground biomass (AGB) refers to the 1.1 The role of biomass product of above-ground volume (AGV) and vegetation mass. It is also closely linked to the carbon cycle in global Given that biomass plays an unquestionable role as one of grassland ecosystems. Additionally, accurate estimation the world’s vital sources of energy [1]. The disputing of AGB variations is essential for assessing carbon matter is what appropriate model would be able to decomposition and its impact on climate change. It is also recognize and prove its traits. Zhantao Song et al. (2024) crucial to screen in situ-harvested AGB data before in their work discussed original visions about the concept modeling [3]. Furthermore, AGB is an indispensable of the pyrolysis process of biomass. They argued the factor for evaluating ecosystem health and carbon storage. contribution of various factors to the challenging To estimate AGB, the above-ground volume (AGV) of anticipation of physicochemical traits by applying vegetation is considered a high-priority parameter in machine learning techniques such as Random Forest, research [4]. 
gradient boosting decision tree, extreme gradient To estimate AGB variations of China’s grassland boosting, in which R2 was higher than 0.97 for particular ecosystems, machine learning algorithms, among which surface area biochar anticipation as well as analysis, the Random Forest model with R2 = 0.83 (i.e., 83 % of the involving yield as well as N content of biochar [1]. harvesting AGB variations), and RMSE = 43.84 gm−2, In another study, Jia et al. (2024) exploited machine revealed accurate performance in estimating grassland learning methods to anticipate zeolite-catalyzed biomass AGB [3]. Mao et al. (2021) in their proposed model pyrolysis, and as a result, the Random Forest algorithm proved that structural, textural, and spectral metrics performed the highest prediction with R² >0.91 for their contribute to shrub AGV models. They also suggested a suggested models. They concluded that their selected direct reference to specify proper vegetation metrics to factors and methods based on biomass characteristics can screen shrub AGV. The efficiency, accuracy, and low cost be taken into account as a plausible reference [2]. 374 Informatica 49 (2025) 373–396 Q. Dang are considered to be the pros of their proposed approach 1.5 Regression models for digital terrain model (DTM) output and AGV It is appropriate to take a brief glimpse at the regression estimation; thus, it can bridge the gap between ground- models proposed in the present article: based research and satellite remote sensing [4]. May et al. The Gradient Boosting (GB) model is regarded as a 2024 obtained spatially complete predictions of biomass strong ML algorithm for numerical optimization in a tropical area. They state that this sort of spatially problems. Thanks to Leo Breiman (1998) and Jerome coherent data about AGB supplied by their model is useful Friedman (2001), GB has been developed. 
The former to validate the eco-friendly forest handling, carbon used GB for decreasing variance for categorization, and decomposition innovations, and climate change the latter improved it for regression and categorization alleviation [5]. models. GB algorithms carry out numerical optimization for the models of regression and categorization, repeatedly 1.3 Below-ground biomass (BGB) being approximately directed towards the loss function Below-ground biomass (BGB) is a significant part of negative gradient. Due to some complexity, it is forest tree biomass; however, fewer studies have focused impossible to direct precisely towards a negative gradient; on BGB about forest biomass and carbon. This is largely normally, a weak learner is applied by a GB model to because the process of measuring BGB in large trees is estimate the extreme decline direction [13]. costly and time-consuming. As a result, researchers often Extra Trees (ET), a recently developed regression use Above-Ground Biomass (AGB) to estimate BGB by model, is considered to be an ensemble ML algorithm applying a root-to-shoot ratio. For different forest types, related to decision trees. Originally, ET is the improved researchers have also developed specific direct BGB form of the Random Forest algorithm for the purpose of equations [6]. regression or categorization performance. The reason that In a recent study, Oliveira et al. (2024) suggested that makes the ET regression algorithm more competitive for predicting peanut BGB using their proposed alternative small-sized sample ML is that it utilizes all data to method—i.e., the multi-output regression (MTR) improve the branches of nodes in decision trees effectively approach—would enable both researchers and farmers to [14]. Wang et al. (2023) in their study provided an quantify BGB more accurately. 
They proposed this efficient ML model utilizing an ET regression algorithm method to predict multiple peanut maturity indices at the for anticipating the relevant synthesis gas traits in the field level, helping to reduce subjectivity in determining process of biomass chemical looping gasification, and peanut maturity [7]. then compared its ability in prediction between the ET model and traditional ones. In another study, using both 1.4 Ensemble approaches RF along with ET al algorithm models, researchers developed a general model to precisely predict the co- Ensemble learning is a potent machine learning technique pyrolysis of coal and biomass, in which ET performed that reduces overfitting, boosts robustness, and enhances better [15]. ET is advantageous due to achieving more overall performance by combining predictions from efficient performance than the Random Forest. Compared several models. Ensemble approaches combine the to RF, ET does not perform bootstrap accumulation like, advantages of multiple algorithms to improve i.e. it takes a random subset of data without replacement. generalization rather than depending on a single model Hence, nodes are divided randomly, but not based on the [8]. Stacking, also known as stacked generalization, is a best divisions. Therefore, in the ET regression model, versatile and successful ensemble technique. Stacking randomness doesn’t come from bootstrap accumulation mixes different kinds of models, possibly with different but from the random divisions of the data [16]. According architectures and learning strategies [9]. In contrast to to Roy (2021), RF was introduced to overcome the bagging (e.g., Random Forest) or boosting (e.g., Gradient Decision Tree problems, giving medium variance. Boosting, XGBoost), which combine similar models Accordingly, ET was proposed when accuracy was more (typically decision trees). Naik et al. (2022) utilized crucial than a generalized model. 
It also delivers low automated stacked ensemble modelling powered by variance. machine learning for predicting aboveground biomass in Extreme Gradient Boosting (XGB) is another strong, forests using multitemporal Sentinel-2 data [10]. A multifaceted ML algorithm used for regression and stacking ensemble algorithm was used by Zhang et al. taxonomy jobs. It is well-known for its exceptional (2022) to reduce the biases in estimates of forest capability to predict performances and deal with intricate aboveground biomass derived from several remotely datasets. GB involves a series of procedures, preparing sensed datasets [11]. Besides, Jin et al. (2025) evaluated models in sequence, based on which the previously the impact of validation techniques and ensemble learning produced errors are reformed by each new model. algorithms on estimating aboveground biomass in forests: XGBoost is a type of ensemble learning technique that a case study of natural secondary forests [12]. To this end, mixes the predictions of various ML models to yield an they developed models based on various outcomes, ultimate prediction that is more precise. Besides that, this qualified to synchronously anticipate AGB, BGB, and the algorithm also makes use of decision trees like basic total amount of tree biomass, i.e., TB, in forest areas, learners during its process. To add more, XGB is intended solving the problem of carbon estimation for various to efficiently influence processors of high-capacity and forest sites. approaches of the distribution system [17]. Ayub et al. 
Enhanced Prediction of Tropical Tree Biomass Using Ensemble… Informatica 49 (2025) 373–396 375 (2023) applied an XGB algorithm model on a multi-level homogeneous mixture of fundamental models, for factorial design outcome to predict and improve the accurate yet, at the same time, interpretable prediction of gasification product, in which the XGB model depicts a lung cancer prognosis so as to recognize crucial risk good prediction accuracy as well as model optimization factors [18]. analysis. The key characteristics of the XGB are explained The use of DL methods is unquestionably dominant as an ability to handle complicated relations in data with over other traditional methods, particularly in tropical regularizing techniques, effectively preventing forests biomass research [6]. Although many studies have overfitting; thus, it performs the calculation efficiently due investigated tree biomass anticipation by applying various to parallel processing. It considers the usage of decision models [6], the applied models are well-established. But trees as base learners and then makes use of regularizing lack combining models as ML, ensemble, and techniques for model generalization at a higher optimization of hyperparameters approaches. This work dimensionality. XGB, more popularly acknowledged for adds value by combining them using a meticulously its efficiency in computations, provides processing designed Stacking Ensemble specifically designed for efficiency with perceptive analysis of feature significance, predicting AGB, BGB, and TB using a small, real-world as well as deals with missing values smoothly [18]. dataset from 27 eco-regions in Vietnam. The Fit Index ElasticNet, being a powerful linear regression (FI), a stability-focused evaluation metric that hasn't been technique highly beneficial in ML and statistical used in biomass prediction before, is introduced in this modeling, excels traditional linear regression models. It study. 
The proposed approach provides new bears the ability to mix the penalties created by both Lasso methodological insights that improve prediction accuracy and Ridge regressions. It is useful in particular when and generalizability in tropical biomass estimation by traditional linear regressions struggle with combining rigorous preprocessing, multi-target modelling multicollinearity, i.e., when predictors are highly within an ecological context, and systematic correlated [19]. That is to say, ElasticNet is advantageous hyperparameter tuning through Grid Search. Furthermore, due to bearing multi-dimensional datasets, selecting this work differs from earlier black-box DL applications significant traits, and being a more consistent and reliable in that it incorporates Shapley Additive Explanations model where there exists collinearity. Aimed to help in (SHAP) for ecological feature interpretation, which offers solving problems of regression and developing models’ important ecological insight. Hence, this study was performance, ElasticNet offers effective analytical means conducted to serve the purpose of bridging this gap. This for handling multi-dimensional regression. Its common subject is an expansion of an ongoing strategy to integrate applications include characteristics selection, analysis of remote sensing inputs acquired using a satellite or a drone regression, and modeling for prediction [20]. The and a source of biomass determinations as measured on significance of ElasticNet regression includes the ground in order to develop a spatially superior, and multicollinearity handling, automatic feature selection, rooted business-time dynamic biomass forecast model. 
aiding in model interpretability and reducing overfitting, Besides the otherwise plausible analytical foundation of flexible regularization, allowing researchers to control the the process, the model is capable of capturing some facets balance between Lasso and Ridge penalties, robustness in of complex nonlinear responses and enhancing the high-dimensional data, appropriateness for a variety of accuracy of predicting biomass over wider geographical regression problems [15]. areas and timeframes due to the use of sophisticated Poisson is a regression analysis where its answer is Stacking ensembles, enabled by Grid Search and cross- based on the distribution called Poisson. The regression validation. Besides, the climatic variables can be included suffers from a limitation of the variance equaling the to forecast the change in biomass distribution in the case mean, called Equi dispersion. As a consequence of the of a future climate change scenario, which can provide a assumption being violated, resulting in the biased standard significant insight both in forest management and on error, the less exact test statistics drawn from the model, carbon budgeting. That is to say, designing a new model and consequently, the obtained conclusions will be less qualified to anticipate tree BGB, AGB, as well as the total valid. The Poisson regression model, therefore, cannot be of tree biomass TB (i.e., TB = BGB + AGB) concurrently, used under occurrences of over-dispersion or under- will fulfil the requirement of estimating forest carbon. On dispersion. Poisson regression is one of the generalized this account, making use of a community of up-to-date linear models. It finds its main application because it regression algorithms to increase the reliability for the usually happens to model occurrences of the kind that are aforementioned parameters estimation, as well as that for rarely occurring [21]. 
the newly proposed model, will assist the progressing The Stacking Ensemble (SE) model makes use of an literature in the realm of forestry science. The study ensemble generalizing approach through learning, despite proposes that integrated ensemble models will anticipate the fact that it may lack instructions for appropriate non- tropical tree biomass better than traditional modeling hyperparameterized meta-learners. The necessity of systems; as a result, the model will be dominant over applying stacking is when multiple ML methods reveal conventional ones. The study objectives are twofold: various advantages for a certain task. In this case, the firstly, designing a model to concurrently anticipate tree stacking ensemble method employs a discrete ML AGB, BGB, and TB, guaranteeing additivity of tropical technique for specifying the efficient application of forests in Vietnam by the names of Dipterocarp and various algorithms [22]. For this reason, Arif et al. (2024) Evergreen Broadleaf, and secondly, cross-validating developed a model of stacking ensemble, by a non- 376 Informatica 49 (2025) 373–396 Q. Dang errors compared to a traditional model, applying the same 2 Methodology dataset as well as anticipators in the mentioned forests. The rest of this paper is structured as follows. That is, The present paper aimed to investigate the efficiency of a Section 2 discusses detailed methodology, including state-of-the-art model to be qualified for predicting materials and data used in this work. Section 3 presents tropical forest tree biomass effects. numerical analyses, graphical analyses, and experimental This study was conducted in one of Vietnam’s eight results under the heading of results and discussions. highest tropical forests, called the Central Highlands Lastly, section 4 summarizes the concluding points in the ecoregion. Two main tropical forest categories were study. selected for the focus of the research, i.e., Dipterocarp and Evergreen Broadleaf (See Fig. 
1).

Figure 1: Sample plots and locations for the Dipterocarp and Evergreen Broadleaf forests in the Central Highlands ecoregion, Vietnam

In this work, the dataset was drawn from a research study conducted by Huy et al. [6]. The collected data were based on 175 trees from 27 ecoregional lots located in the Central Highlands, Vietnam. We clearly define the dataset partitioning strategy to ensure reproducibility: the entire dataset of 175 samples was randomly divided into training (80%) and testing (20%) sets. Cross-validation was used over iterations to ensure robust evaluation and minimize sampling bias. To ensure compatibility across models and better convergence during training, feature preprocessing involved removing outliers and normalizing all input variables to a [0, 1] range using Min-Max scaling. The hyperparameter tuning process was carried out using Grid Search with 5-fold internal cross-validation for each machine learning model: Poisson regression, ElasticNet, XGB, Extra Trees (ET), and Gradient Boosting (GB). This allowed us to systematically explore parameter combinations and choose those that produced the best performance on the training data. Based on the results of cross-validation, Grid Search methodically investigates a predetermined set of hyperparameter values to determine which combination produces the best model performance.

A customized grid of important hyperparameters was built for every model. For instance, tree-based models such as GB, ET, and XGB had their learning rate, maximum depth, and number of estimators adjusted. We adjusted the L1 ratio and alpha (regularization strength) for ElasticNet. Likewise, pertinent parameters for the stacking meta-learner and Poisson regression were adjusted. MSE, RMSE, MAE, R², STD, NMSE, MDAPE, and VAF were used to evaluate performance.
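The preprocessing just described, Min-Max scaling of all inputs to [0, 1] and a random 80/20 split of the 175 samples, can be sketched in plain Python. This is an illustrative sketch, not the authors' code, and the helper names are made up:

```python
import random

def min_max_scale(values):
    """Scale a list of numbers to the [0, 1] range (Min-Max scaling)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(v - lo) / span for v in values]

def train_test_split(rows, test_frac=0.20, seed=42):
    """Randomly divide rows into training and testing subsets."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_frac)
    test_idx = idx[:n_test]
    test_set = set(test_idx)
    train = [rows[i] for i in idx if i not in test_set]
    test = [rows[i] for i in test_idx]
    return train, test

# 175 samples, as in the study; an 80/20 split gives 140 train / 35 test
samples = [[float(i), float(2 * i)] for i in range(175)]
train, test = train_test_split(samples)
```

With 175 rows, the split yields 140 training and 35 testing samples, matching the counts reported later in the paper.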
In order to maximize generalization and performance consistency, Grid Search was used within a cross-validation framework to guarantee that each model was trained with the best parameter settings. This method greatly enhanced both the Stacking Ensemble's overall performance and the accuracy of the individual models.

The target variables in this dataset were the above-ground tropical biomass (AGB), the below-ground tropical biomass (BGB), and the total tropical tree biomass (TB), equal to the sum of the below-ground and above-ground tree biomass (i.e., TB = BGB + AGB). Moreover, preprocessing and normalization operations were performed on the data.

To serve the purpose of the study, five ML models were employed as base learners, including GB, ET, XGBoost, ElasticNet, and Poisson, to synchronously predict the below- and above-ground tree biomass (BGB and AGB, respectively) as well as the total tree biomass, TB = BGB + AGB.

In addition, the Fit Index (FI), a goodness-of-fit metric intended to assess the quality of predictions across several cross-validation realizations, was used. A higher FI value indicates a better fit, with values approaching 1. The FI is calculated as:

FI = (1/k) Σ_{j=1}^{k} [1 − Σ_{i=1}^{m} (y_i − ŷ_i)² / Σ_{i=1}^{m} (y_i − ȳ)²]   (1)

In the equation above, k stands for the number of realizations (in this study k = 10), m is the number of trees sampled in the validation dataset, y_i is the observed value, ŷ_i represents the predicted value, and ȳ is the averaged value for BGB, AGB, and TB of the i-th validated tree in the k-th realization.

The study goal was to evaluate accuracy and model consistency in light of the ecological context and the small dataset size. Metrics like R² and VAF measure the percentage of variance explained by the models, while MSE, RMSE, and MAE quantify absolute prediction errors.
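Read directly, Eq. (1) averages a 1 − SSE/SST term over the k realizations. A minimal pure-Python sketch, assuming each realization supplies paired observed and predicted lists:

```python
def fit_index(realizations):
    """Fit Index (Eq. 1): mean over k realizations of 1 - SSE/SST,
    where each realization is an (observed, predicted) pair of
    equal-length lists; values approaching 1 indicate a better fit."""
    k = len(realizations)
    total = 0.0
    for observed, predicted in realizations:
        mean_obs = sum(observed) / len(observed)
        sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
        sst = sum((y - mean_obs) ** 2 for y in observed)
        total += 1.0 - sse / sst
    return total / k

# k = 10 realizations with perfect predictions give FI = 1.0
perfect = [([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])] * 10
print(fit_index(perfect))  # 1.0
```

A perfect predictor attains FI = 1; any prediction error pushes the value below 1.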
Owing to the individual models' mediocre performance, these five models were used as base learners to create a Stacking Ensemble (SE). A meta-learner was then trained on their predictions to generate the final prediction for every biomass component.

For the purpose of assessing and selecting the most efficient model able to concurrently predict tropical tree BGB, AGB, and TB, a rigorous cross-validation process was carried out. The total number of data points was 175, randomly split ten times into two sections: 140 (80%) for training and 35 (20%) for testing, to evaluate impartially. The data were divided into training and testing sets to conduct an analysis satisfying the accuracy and reliability requirements of this research. A wide range of assessment metrics was used (see Table 1).

Understanding normalization effects and error distribution is aided by STD and NMSE. MDAPE is a reliable percentage-based metric that works especially well with data containing outliers or skewness, which is typical in biomass measurements. The Fit Index (FI), a new and comprehensible metric designed for model comparison across several validation folds, was introduced to reward accuracy and stability. Ultimately, when combined, these metrics ensure that the assessment covers robustness, interpretability, and predictive accuracy, all of which are critical for ecological modelling and decision-making, and they demonstrate that the Stacking Ensemble model was more efficient than the other compared models. The equations for the error metrics criteria are exhibited in Table 1.

Table 1: Equations for evaluation of statistical metrics criteria

Statistics | Name | Equation
MSE | Mean Squared Error | MSE(y, ŷ) = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²
RMSE | Root Mean Square Error | RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
MAE | Mean Absolute Error | MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
R² | Determination Coefficient | R²(y, ŷ) = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)² / Σ_{i=1}^{N} (y_i − ȳ)²
STD | Standard Deviation | STD = sqrt( Σ_{i=1}^{n} (x_i − x̄)² / (n − 1) )
NMSE | Normalized Mean Square Error | NMSE = Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²
MDAPE | Median Absolute Percentage Error | MDAPE = median( |(y_i − ŷ_i) / y_i| ) × 100%
VAF | Variance Account Factor | VAF = (1 − var(y − ŷ) / var(y)) × 100

In these equations, n (or N) represents the number of observations, y_i is the i-th observed value, ŷ_i is the i-th predicted value, and ȳ is the average of the observations.

Graphical analyses were also carried out to assess the accuracy of the recommended models' performance. Illustrated in different plots, they give the reader illuminating perceptions of the suitability and accuracy of the models, all of which are discussed in Section 3 of this paper.

As an overview of the research, the general flowcharts of the whole study are shown below (see Fig. 2 and Fig. 3). Fig. 2 gives a brief view of the step-by-step research methodology: the process begins with the dataset, which is analyzed and normalized; the normalized data are then divided into training and testing sets; the proposed ML models are evaluated against an array of specific metrics to select an appropriate model, which turned out to be the Stacking Ensemble; finally, the ensemble models are also assessed on the basis of the evaluation metrics to choose the best one, and the results are saved for future use.

Figure 2: General flowchart of the whole research process for applying the proposed model

Figure 3 shows the modeling procedure, involving the data collection process and then applying six ML models to concurrently predict tropical forest tree biomass, specifying the most reliable model for such prediction by comparing the selected ML models with the aid of the evaluation metrics.
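The formulas in Table 1 translate almost line-for-line into code. A pure-Python sketch of the main metrics, offered as an illustration rather than the authors' implementation:

```python
import math

def mse(y, p):
    """Mean Squared Error."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def rmse(y, p):
    """Root Mean Square Error."""
    return math.sqrt(mse(y, p))

def mae(y, p):
    """Mean Absolute Error."""
    return sum(abs(yi - pi) for yi, pi in zip(y, p)) / len(y)

def r2(y, p):
    """Coefficient of determination: 1 - SSE/SST."""
    ybar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def nmse(y, p):
    """Normalized MSE (SSE/SST), the complement of R^2."""
    return 1.0 - r2(y, p)

def mdape(y, p):
    """Median Absolute Percentage Error, in percent."""
    pct = sorted(abs((yi - pi) / yi) for yi, pi in zip(y, p))
    n = len(pct)
    mid = pct[n // 2] if n % 2 else (pct[n // 2 - 1] + pct[n // 2]) / 2
    return mid * 100.0

def vaf(y, p):
    """Variance Account Factor, in percent."""
    resid = [yi - pi for yi, pi in zip(y, p)]
    def var(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    return (1.0 - var(resid) / var(y)) * 100.0
```

For a perfect prediction, MSE, MAE, and NMSE are 0, while R² is 1 and VAF is 100, matching how the best entries in Table 3 should be read.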
[Figure 3 flowchart steps: start modeling; data collection; define the inputs and targets in the dataset; input selection; split the dataset into training (80%) and testing (20%) sets; apply the ML models; build the Stacking ensemble model; tune by Grid Search; select the best model by comparing the six machine learning models with the evaluation metrics.]

Figure 3: Flowchart of the modeling procedure showing the process of employing the ML models concurrently

The flowcharts of the six regression models, including ElasticNet, GB, ET, XGB, Poisson, and SE, are illustrated respectively in the following figures. Hyperparameter optimization is performed through Grid Search tuning. These models were used with the goal of synchronously predicting AGB, BGB, and TB = BGB + AGB. Additionally, each proposed regression model is discussed briefly below.

2.1 Base machine learning models

ElasticNet regression is an extension of linear regression that integrates the regularizing penalties of both Lasso (L1) and Ridge (L2) into the loss function. Alpha (α) is the regularization strength parameter in ElasticNet; it governs the overall strength of the regularization applied to the model. For α = 0, no regularization is applied and ElasticNet reduces to Ordinary Least Squares (OLS) regression; for α > 0, the blended L1 and L2 penalties are applied with increasing strength. The L1 ratio (l1_ratio) is the blending parameter that sets the balance between the L1 and L2 penalties, controlling the proportion of the penalty assigned to the L1 norm relative to the L2 norm. For l1_ratio = 0, the model applies only L2 regularization (which equals Ridge regression); for l1_ratio = 1, it uses only L1 regularization (which equals Lasso regression); for 0 < l1_ratio < 1, the model employs a mixture of the L1 and L2 penalties. This combination allows ElasticNet to deal with correlated predictors while retaining some of Lasso's sparsity.

In the Gradient Boosting (GB) model, learners are added sequentially with a shrinkage factor ν > 0, which is a supposed learning rate [9]. As the GB model is represented in Fig. 6, the data first go through bootstrap sampling to be split into T subsets D1, D2, …, DT; a decision tree h1, h2, …, hT is trained on each subset, their results h1(x), h2(x), …, hT(x) are averaged, and the final result H(x) is produced.

Figure 6: The stages of the Gradient Boosting (GB) model employed for tropical tree biomass predictions

Given that the Poisson distribution is the basis for Poisson regression, it identifies the probability of the occurrence of any number of events in a fixed interval of time or space, under the assumption that the events occur at a constant rate and independently of each other. The Poisson distribution can be calculated from the formula below:

P(X = k) = λ^k e^(−λ) / k!   (4)

In the above equation, X is the number of random occurrences, and λ represents the average or mean rate of the events.

The approach for the application of the Poisson model is illustrated in Fig. 7, which proceeds through several stages in a linear pattern. To begin with, a point cloud is taken as input; second, the surface normal of every point is detected by computing the eigenvector over the k-nearest neighbors of each point. Third, an octree with a predefined depth d is selected for storing the reconstructed surface. Then, the gradient of the indicator function (∇χ) is equated to the vector field V defined by the point cloud.
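The penalty structure described above can be written out directly. A small pure-Python sketch of the ElasticNet objective, where alpha scales the whole penalty and l1_ratio blends the L1 and L2 terms (the 1/2 factor on the L2 term follows a common convention and is an assumption here):

```python
def elastic_net_loss(y, X, beta, alpha, l1_ratio):
    """MSE plus the blended ElasticNet penalty:
    alpha * (l1_ratio * ||beta||_1 + (1 - l1_ratio) / 2 * ||beta||_2^2)."""
    n = len(y)
    preds = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return mse + alpha * (l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2)
```

With alpha = 0 the loss reduces to the plain OLS mean squared error; l1_ratio = 0 keeps only the Ridge term, and l1_ratio = 1 only the Lasso term.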
The next stage involves defining an indicator function χ with the value 1 inside and 0 outside the surface. Thus ∇χ = V, and the divergence operator is applied to either side, i.e., Δχ ≡ ∇·∇χ = ∇·V. In the next stage, the indicator function χ is solved as a standard Poisson problem. The marching cubes algorithm is then used to extract the surface from the solved indicator function χ. Eventually, the reconstructed surface is stored in the octree of depth d.

Poisson regression exploits this distribution to provide an understanding of the relationships of the predictor variables with the count data in the dataset. In this regression, the expected value (mean) of the count variable (namely Y) is modeled as a linear combination of the predictor variables (namely X):

λ = exp(β0 + β1X1 + β2X2 + … + βnXn)   (5)

in which λ is the expected count, representing the occurrence rate, β0 is the intercept term, and β1, β2, …, βn are the coefficients related to each predictor variable. The link function in Poisson regression is the natural logarithm (log-link), ensuring that the predicted values are not negative. The model is evaluated via maximum likelihood estimation, and the coefficients (β) are chosen to maximize the probability of observing the actual count data [26].

Since AGB, BGB, and TB are skewed and non-negative, Poisson regression was employed. Although it was initially created for count data, its formulation fits biomass distributions quite nicely. To make sure the Poisson model's assumptions held true in this situation, diagnostic tests were conducted.
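Eqs. (4) and (5) can be sketched directly; the coefficients below are made-up values for illustration, not fitted ones:

```python
import math

def poisson_pmf(k, lam):
    """Poisson distribution, Eq. (4): P(X = k) = lam^k * e^(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def expected_count(x, beta0, betas):
    """Log-link of Poisson regression, Eq. (5):
    lambda = exp(beta0 + beta1*x1 + ... + betan*xn), always positive."""
    return math.exp(beta0 + sum(b * xi for b, xi in zip(betas, x)))

# Made-up coefficients purely for illustration
lam = expected_count([1.0, 2.0], beta0=0.1, betas=[0.2, -0.3])
```

Because of the exponential log-link, expected_count is strictly positive regardless of the coefficients, which is exactly why the formulation suits non-negative targets such as biomass.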
Figure 7: The stages of the Poisson model employed for tropical tree biomass predictions

2.2 Hyperparameter tuning

Grid Search was applied in a 5-fold cross-validation framework to find optimal hyperparameters for all the models. For tree-based models like Gradient Boosting, Extra Trees, and XGB, the number of estimators, learning rate, and maximum tree depth were systematically varied to optimize model complexity and predictive power. For ElasticNet, the regularization parameter (alpha) and L1 fraction were tuned to prevent overfitting and induce sparsity. In Poisson regression, tuning targeted the regularization parameters and the number of iterations to achieve better convergence. After separately tuning each of the base models, their outputs were fed into a meta-learner in the Stacking Ensemble, whose parameters were also tuned via Grid Search. This broad tuning process ensured that all models, including the ensemble, reached optimal generalization and performance [27].

The Stacking Ensemble was selected for its meta-learner, as it blends heterogeneous base learners with varying predictive ability and error behaviors and generalizes well. Due to its broad applicability in addressing multicollinearity in the regression predictions of the base models and capturing non-linear relationships, a tree-based learner was applied as the meta-model in this research. The Stacking model, with an RMSE of 18.298, MAE of 12.422, and R² of 0.968, performed significantly better than any of the base models on the test data, meaning that the ensemble was able to selectively leverage the strengths of each of the related models to generate more stable and accurate biomass predictions.

In the Stacking model, presented in Fig. 8, training data are processed by three level-0 models separately.
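The Grid Search with k-fold cross-validation used throughout this subsection can be sketched on a toy one-parameter problem. The closed-form one-feature ridge estimator below is a stand-in for the paper's models, and the data and grid are invented for illustration:

```python
import math

def kfold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    for i in range(k):
        val = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, val

def fit_ridge_1d(x, y, alpha):
    """Closed-form one-feature ridge slope: sum(x*y) / (sum(x^2) + alpha)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + alpha)

def grid_search(x, y, grid, k=5):
    """Pick the hyperparameter value with the lowest mean k-fold validation MSE."""
    best_alpha, best_mse = None, math.inf
    for alpha in grid:
        fold_mse = []
        for tr, va in kfold_indices(len(x), k):
            beta = fit_ridge_1d([x[j] for j in tr], [y[j] for j in tr], alpha)
            fold_mse.append(sum((y[j] - beta * x[j]) ** 2 for j in va) / len(va))
        mean_mse = sum(fold_mse) / len(fold_mse)
        if mean_mse < best_mse:
            best_alpha, best_mse = alpha, mean_mse
    return best_alpha

# Noise-free y = 2x: the unregularized setting should win the search
x = [float(i) for i in range(1, 21)]
y = [2.0 * xi for xi in x]
best = grid_search(x, y, grid=[0.0, 0.1, 1.0, 10.0])
```

On noise-free data the unregularized setting wins; on noisy data a positive alpha typically would, which is what the cross-validated search is meant to detect.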
Each model's prediction results are gathered as additional processed training data in the study. All of the base learners' predictions (GB, ET, XGB, ElasticNet, and Poisson) were aggregated by the Stacking Ensemble. To avoid overfitting and information leakage, the meta-learner, a Ridge regression model, was trained on out-of-fold predictions. We were able to improve overall predictive performance by combining the complementary strengths of all models, capturing distribution-specific, linear, and non-linear trends, in this ensemble. Table 2 provides the hyperparameters chosen by the stacking meta-learner for the models.

Figure 8: Stacking model procedure used for tropical tree biomass predictions (training data are passed to the level-0 models, whose predictions form the training data of the meta-learner)

Last but not least, XGB is one of the most common algorithms in ML. It is based on the ensemble learning framework, following the gradient boosting algorithm; thus, it is applicable to supervised learning tasks, i.e., regression, ranking, and classification. XGB is a predictive model that combines the predictions of multiple individual models iteratively. It works by adding weak learners to the ensemble one after another, such that at every step a new learner tries to correct the errors of the prior ones. It also minimizes a prespecified loss function on the training data using a form of gradient descent optimization [13]. It is worth mentioning that the additive learners do not alter the functions developed in prior iterations but add information of their own in order to bring down the error values.
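The out-of-fold construction that protects the meta-learner from information leakage can be sketched with two toy base learners and a tiny ridge meta-learner solved by 2x2 normal equations; the learners, data, and alpha are all illustrative, not the paper's configuration:

```python
def mean_learner(xs, ys):
    """Base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def slope_learner(xs, ys):
    """Base learner 2: through-origin least-squares slope."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: b * x

def oof_predictions(xs, ys, learners, k=5):
    """Out-of-fold predictions: each point is predicted only by models
    that never saw it in training, avoiding leakage into the meta-learner."""
    n = len(xs)
    preds = [[0.0] * len(learners) for _ in range(n)]
    for i in range(k):
        tr = [j for j in range(n) if j % k != i]
        va = [j for j in range(n) if j % k == i]
        for c, make in enumerate(learners):
            model = make([xs[j] for j in tr], [ys[j] for j in tr])
            for j in va:
                preds[j][c] = model(xs[j])
    return preds

def ridge_meta(preds, ys, alpha=0.1):
    """Ridge meta-learner for two base predictions, via 2x2 normal equations."""
    a11 = sum(p[0] * p[0] for p in preds) + alpha
    a22 = sum(p[1] * p[1] for p in preds) + alpha
    a12 = sum(p[0] * p[1] for p in preds)
    b1 = sum(p[0] * y for p, y in zip(preds, ys))
    b2 = sum(p[1] * y for p, y in zip(preds, ys))
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)

xs = [float(i) for i in range(1, 21)]
ys = [3.0 * x for x in xs]
learners = [mean_learner, slope_learner]
w = ridge_meta(oof_predictions(xs, ys, learners), ys)
```

Because the slope learner already reproduces the target here, the meta-learner assigns it nearly all of the weight, which is the "selectively leverage the strengths" behavior described above.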
First, the model begins with some function F0(x). This F0(x) needs to minimize the loss function (here the MSE), hence:

F0(x) = argmin_γ Σ_{i=1}^{n} L(y_i, γ) = argmin_γ Σ_{i=1}^{n} (y_i − γ)²   (9)

Taking the first derivative of this equation with respect to γ shows that the function is minimized at the mean of y_i, i = 1, …, n. Thus, the boosting model can proceed with:

F0(x) = (Σ_{i=1}^{n} y_i) / n   (10)

F0(x) provides the first-step predictions of this model. Next, for each instance, the residual error is expressed as y_i − F0(x) [28].

In summary, XGB is developed straightforwardly in three stages. First, a primary model, namely F0, is used to predict the target variable; the XGB model then works with the residual (y − F0). Second, the residuals obtained in the prior stage are fitted by a new model called h1. Third, the combination of F0 and h1 delivers F1, the boosted form of F0. Consequently, the MSE from F1 will be lower than that from F0:

F1(x) ← F0(x) + h1(x)   (6)

To improve F1's performance, a model of F1's residuals can be designed, yielding a further model F2:

F2(x) ← F1(x) + h2(x)   (7)

In Fig. 9, the XGB model employs a multifaceted approach to make predictions on the input data; afterwards, the average of the predictions is calculated and the ultimate XGB prediction is generated. Because of its exceptional performance with structured tabular data and its integrated regularization, which helps avoid overfitting, XGB was included.
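The additive scheme of Eqs. (6)-(10) can be sketched with a deliberately weak learner (a through-origin slope fit on the residuals). F0 is the target mean, the argmin of the squared loss over constants, and each stage adds a shrunken correction; everything here is an illustrative toy, not the XGBoost library:

```python
def boost(xs, ys, n_stages=3, nu=0.5):
    """Additive boosting sketch: F0 is the mean of y (Eq. 10), and each
    stage adds nu * h_m, where h_m is a through-origin slope fitted to
    the current residuals (F_m = F_{m-1} + nu * h_m, Eqs. 6-8)."""
    f0 = sum(ys) / len(ys)                                   # Eq. (10)
    preds = [f0] * len(xs)
    history = [sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)]
    for _ in range(n_stages):
        resid = [y - p for y, p in zip(ys, preds)]           # y_i - F_{m-1}(x_i)
        b = sum(x * r for x, r in zip(xs, resid)) / sum(x * x for x in xs)
        preds = [p + nu * b * x for p, x in zip(preds, xs)]  # next F_m
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return preds, history

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
_, mse_history = boost(xs, ys)
```

Because each h_m is a least-squares fit to the current residuals, every shrunken step with 0 < ν < 2 leaves the training MSE no worse, so the recorded history is non-increasing, mirroring the claim that the MSE from F1 is lower than that from F0.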
It was well-suited for this task because of its efficient handling of non-linearities, support for missing values, and robustness to noise, even with the small sample size. Grid Search was used to optimize important hyperparameters, such as the learning rate, maximum depth, and gamma. The additive process is iterated for a number of n stages until the residuals are minimized as far as possible, i.e.:

Fn(x) ← Fn−1(x) + hn(x)   (8)

Figure 9: The procedure used in the XGB model for predicting tropical tree biomass

3 Results and discussion

3.1 Exploratory data analysis

To display how closely related the multiple variables of the study data are, a Pearson correlation heatmap is exploited as an effective color-coded visual matrix (see Fig. 10). Variables are arranged in rows and columns, and the cells define the pairwise relationships between variables. The color shading of each cell indicates the direction and strength of the correlation: the darker the color of a cell, the stronger the correlation of the related variables. As is evident from this tabulated heatmap, the colors are darker for stronger correlations and lighter for weaker ones. Additionally, green represents positive correlations, where an increase in one variable tends to be accompanied by an increase in the other, whereas purple is used for negative correlations, where an increase in one variable tends to be accompanied by a drop in the other.

Figure 10: Pearson correlation heatmap for detecting the relationships between the studied variables

In Fig. 11, a pair plot visualization of the distribution of the dataset parameters is shown to explore the data. In a pair plot, the data are visualized to find the relations between them, whether the variables are continuous or categorical, or form strongly divided clusters. The dispersions of the parameters indicate that most features are not evenly distributed: CA, WT, and P are skewed or clustered, with values concentrated in particular ranges. Scatter plots such as CA versus WT or HA versus CA show positive relationships that hold well, indicating potential multicollinearity that could be important to model. In contrast, variables like the forest type code and soil type code appear as horizontal bands or discrete groups, as they are categorical. These trends suggest that the explanatory power of the dataset is partly due to a mixture of continuous gradations and categorical differences. The distributions within classes are depicted by the colors, and a clear grouping can be observed in the plots, whether by altitude, CA, or WT. The pair plot thus provides a high-level interface for deriving enlightening statistical information about the dataset; the variations in each plot can be observed, and the diagonal subplots show each variable's distribution.

A pair plot for the relationships between the variables and the total biomass, namely TB, is also demonstrated in Fig. 12, which more explicitly explores how the CTB classes are distinguished in terms of the predictors. In this instance, the scatter plots show that for most variable pairs the CTB categories predominantly overlap, indicating that no set of variables can completely separate the CTB classes. However, there are regions, particularly in pairings like CA vs. WT or CA vs. P, where some CTB groups are grouped more closely together or are bunched into more constricted value ranges. The histograms on the diagonal also emphasize the bunched character of observations within given intervals, further underscoring that the dataset is skewed in its variable distributions. This class grouping within specific regions suggests that individual variables may not always be able to differentiate CTB, but groups of predictors likely have predictive value. Further, the mixture of continuous and discrete variables introduces difficulty, as seen in the scatter plots, where some categories of CTB extend across different bands, with others overlapping.

Figure 11: Pair plot specifying the distribution of the dataset parameters as well as their relationships

Figure 12: Pair plot showing the relationship between the variables of total tropical tree biomass (TB) and their distributions

3.2 Machine learning results

In the present investigation, six methods were employed in two forest locations, i.e., Dipterocarp and Evergreen Broadleaf, to concurrently predict BGB, AGB, and TB = BGB + AGB. To assess the models' performance, Table 3 presents the error metrics criteria for each recommended model on the training and test data. By comparing these error metrics along with the FI, it was found that the Stacking Ensemble model was superior to the other models. The very large R² values close to 0.999 on the training dataset are an indicator of overfitting or data leakage. To prevent this, we ensured rigorous separation of the training and test data and optimized our hyperparameters using Grid Search with cross-validation. The high R² values on the testing data (such as 0.962) are indicative of a lack of overfitting, so the overfitting appears to be contained. Further improvements with regard to regularization and data augmentation will be necessary in future computations to minimize the chances of overfitting.

The test results indicate that the Stacking model outperforms the others in nearly all metrics, demonstrating higher predictive accuracy and reliability. Its mean squared error (MSE) is considerably low at 334.82, indicating smaller average squared discrepancies between predicted and actual values compared to other models like ElasticNet (2378.17) and Extra Trees (1216.54). In the same vein, the root mean squared error (RMSE) for Stacking is 18.30, a far cry from those of ElasticNet (48.77) and Gradient Boosting (41.75), meaning its predictions were more precise. The mean absolute error (MAE) shows the same pattern, at 12.42 for Stacking, far better than for models such as Poisson regression (19.32) and XGB (21.18).

In regard to explained variance and fit, Stacking had the best R² value of 0.968 across all models, which means it accounts for nearly 97% of the test data variance. This is significantly better than ElasticNet's R² score of 0.77 and Extra Trees' R² score of 0.88. The Stacking model's normalized mean squared error (NMSE) of 0.032 is the minimum, with no significant normalized error relative to the data variance. Likewise, its variance accounted for (VAF) is 0.971, indicating impeccable consistency between predicted and actual values. The median absolute percentage error (MDAPE) of 23.08 and the standard deviation of the residuals (STD_dev) of 105.76 also illustrate the consistency of the model's performance.

On the other hand, the other models show higher error measures and lower variance explanation, with the Stacking model remaining the most accurate and consistent for the test set in this comparison. This claim is based on the higher R² value of the Stacking model for both training and test data. According to the table, a higher R² and VAF make a model worthier, in this case the SE model; conversely, the lower the other metrics, such as MSE, RMSE, MAE, NMSE, MDAPE, and STD, the more the model has the merit of being an efficient predictor. Therefore, the Stacking model is deemed the most efficient and performs better than the other models for both training and testing data. In contrast, ElasticNet shows weaker performance in predicting the variables. Furthermore, the results of the employed evaluation metrics are presented and thoroughly discussed using relevant figures at the end of this section. Improved accuracy and stability on both forest types were indicated by the lower MSE, RMSE, and MDAPE, and the higher R² and VAF, of the Stacking Ensemble, which consistently outperformed the other algorithms. ElasticNet performed poorly because of its linear framework, which failed to properly capture the intricate, nonlinear patterns in the biomass data. Because Stacking possessed the ability to combine the powers of tree-based models like GB, ET, and XGB, it outperformed them even though they were individually only moderate.

Figure 13 below illustrates the data values obtained via the ML models, i.e., ElasticNet, Extra Trees, GB, Poisson, Stacking, and XGB; accordingly, a detailed comparison of these models, along with their distance from the target values, is presented.

Table 3: Error metrics criteria results for the proposed ML models considering the train and test datasets
Metrics | ElasticNet | Extra Trees | GB | Poisson | Stacking | XGB
Train
MSE | 4026.476 | 4.933E-26 | 5.800 | 1725.207 | 1153.269 | 1.72544E-05
RMSE | 63.455 | 2.221E-13 | 2.408 | 41.536 | 33.960 | 0.004
MAE | 32.550 | 7.82E-14 | 1.821 | 16.042 | 11.562 | 0.003
R² | 0.788 | 0.999 | 0.999 | 0.909 | 0.939 | 0.999
NMSE | 0.212 | 2.601E-30 | 0.000 | 0.091 | 0.061 | 9.09812E-10
MDAPE | 67.280 | 1.651E-13 | 5.016 | 29.535 | 8.722 | 0.008
STD_dev | 100.279 | 137.713 | 137.491 | 150.171 | 134.774 | 137.713
VAF | 0.788 | 0.999 | 0.999 | 0.909 | 0.939 | 0.999
Test
MSE | 2378.167 | 1216.538 | 1743.162 | 1051.301 | 334.820 | 2090.796
RMSE | 48.766 | 34.879 | 41.751 | 32.424 | 18.298 | 45.725
MAE | 36.818 | 18.068 | 21.948 | 19.320 | 12.422 | 21.178
R² | 0.770 | 0.882 | 0.831 | 0.898 | 0.968 | 0.797
NMSE | 0.230 | 0.118 | 0.169 | 0.102 | 0.032 | 0.203
MDAPE | 68.390 | 32.640 | 34.406 | 31.075 | 23.081 | 30.099
STD_dev | 100.272 | 85.241 | 76.508 | 78.865 | 105.763 | 75.760
VAF | 0.777 | 0.894 | 0.853 | 0.924 | 0.971 | 0.817

[Figure 13 plots, for rows 0 to 175, the values predicted by each model (ElasticNet, Extra Trees, GBM, Poisson, Stacking, XGBoost) together with the target values.]

Figure 13: Value plot comparing the ML models' predicted values with the target values

Figure 14 shows the results for R² as a significant error metric criterion, suggesting how well the employed ML models' predictions fit the real data. As shown in the figure, a model whose prediction values align closely with the norm line (where R² = 1) is considered superior and more accurate. This result is in line with the higher R² values of the proposed Stacking model (approximately 0.939 for the training data and 0.968 for the test data).
[Figure 14 panels plot predicted vs. actual values for each model against the Y = X line, annotated with R² (train, test): ElasticNet (0.788, 0.770), Extra Trees (0.999, 0.882), XGBoost (0.999, 0.797), GBM (0.999, 0.831), Poisson (0.909, 0.898), Stacking (0.939, 0.968).]

Figure 14: Comparing the coefficient of determination (R²) for each ML model

The frequency of each error value in each ML method's predictions is represented in Fig. 15. The error analysis was conducted for both the training and test sets, and the ML models were assessed to examine their accuracy. The prediction error of an ML model ought to be almost zero for it to be an adequate model for the aim of the study.

[Figure 15 panels show the per-model error values (train and test) across the 175 rows for XGBoost, Stacking, Poisson, GBM, Extra Trees, and ElasticNet.]

Figure 15: Comparing error values for the ML models

Moreover, according to Fig.
16, the error values for the proposed ML models are illustrated from the smallest error value to the largest, for both the training and test data of each model, moving from left to right. A model with the smallest error values (i.e., approximately zero) would be the best predictor among the employed ML models. This visualization highlights that the data have intense recurrent peaks, suggesting non-uniform distributions with dominating clusters, and that these patterns persist but evolve subtly across different sections of the dataset. When one of the groups describes a stacking model, its activity can be visually compared with the other groups by looking at how close its mean is to zero and how small and steady its standard deviation is.

Based on the plot, we see that Stacking appears more accurate than the individual models, but by a very small margin. It has fewer errors and reduced variance in its predictions, indicating stronger generalization and stability. On the contrary, although the other models have also performed adequately, they exhibit spreads or deviations that are somewhat higher than the mean. Overall, it appears that Stacking produces a more consistent and less erratic result than the single models, making it superior to the single models in terms of performance.

[Figure 16 shows, for each model (ElasticNet, Extra Trees, GBM, Poisson, Stacking, XGBoost), the mean of the errors with a ±1 SD band over both train and test data.]

Figure 16: Boxplot of the ML models' error values for both train and test data

The following figure (Fig. 17) illustrates a comparison of the proposed models in terms of two important statistical evaluation metrics, namely R² and VAF, estimated for both the test and train datasets.
As the values of these metrics show, all the models perform efficiently in the prediction except the ElasticNet model, which performs more weakly than the others, having lower VAF and R². The Stacking and XGB model performances are stronger than those of the rest, bearing higher VAF and R². Based on the plot, we see that stacking seems to be more accurate than the individual models, but by a very small margin.

Figure 17: Comparison of models based on the VAF and R² metrics. [Recoverable values per model, read as (R² train, R² test, VAF train, VAF test): XGBoost 0.999, 0.797, 0.999, 0.817; Stacking 0.939, 0.968, 0.939, 0.971; Poisson 0.909, 0.898, 0.909, 0.924; GBM 0.999, 0.831, 0.999, 0.853; Extra Trees 0.999, 0.882, 0.999, 0.894; ElasticNet 0.788, 0.770, 0.788, 0.777.]

The other evaluation metrics, including MSE, NMSE, MAE, RMSE, STD, and MDAPE, are applied for comparison among the models, supposing that the lowest value of these metrics marks a model as the best predictor. In this case, the Stacking model has the lowest values for both the train and test datasets compared to the other models (see Fig. 18).

Figure 18: Comparison of models based on the MSE, NMSE, MAE, RMSE, STD, and MDAPE metrics (train and test values for each model).

Another graphical tool used to compare the models' performance is the Taylor diagram. This diagram evaluates models based on their accuracy, using metrics such as the correlation coefficient, standard deviation (STD), and RMSE. In the diagram, a model's performance is represented by circles, where better performance is indicated by points closer to the reference point [29]. Taylor diagrams for predicting tropical tree biomass are shown in Fig. 19. As seen in these diagrams, the RMSE of the ElasticNet model is higher than that of the other machine learning (ML) models, and its correlation coefficient is lower.
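The six error metrics compared in Fig. 18 can be computed directly from the residuals. In this sketch, NMSE is taken as MSE normalized by the variance of the actual values and MDAPE as the median absolute percentage error; both expansions are assumptions, since the text does not spell out its formulas:

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """MSE, NMSE, MAE, RMSE, error STD, and MDAPE for one model's
    predictions. NMSE and MDAPE follow assumed definitions (see lead-in)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "NMSE": mse / float(np.var(y_true)),   # assumed normalization by var
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(mse)),
        "STD": float(np.std(err)),             # spread of the errors
        "MDAPE": float(np.median(np.abs(err / y_true)) * 100.0),
    }
```

Computing these per model and per split, the model with the lowest values across the board is taken as the best predictor, which is how the text ranks the Stacking model.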
By contrast, the RMSE of the Stacking model is lower than that of the other models, and its correlation coefficient exceeds 0.9, outperforming the other models in this regard. These findings, based on the correlation coefficient, STD, and RMSE, confirm that the Stacking model outperforms the other models.

Figure 19: Taylor diagrams for models' comparison based on the RMSE, STD, and R metrics. [Panels: R², test; R², train; RMSE, test; RMSE, train.]

The last plot to be discussed for model comparison is the Williams plot. This plot is used to compare a specific group of compounds in terms of leverage values and standardized residuals [30]. The Williams plot shows the standardized residuals on the y-axis and the leverages on the x-axis for the training and testing datasets. In this plot, the applicability domain is the square area inside ±2 standard deviations and below a leverage threshold h* (h* = 3p′/n, where p′ is the number of model parameters and n the number of compounds). The majority of the data ought to be located within this area, meaning that the points are inliers and influential in the model.

Most of the observations in the test plot lie comfortably within the satisfactory limits for leverage and standardized residuals (±2), which indicates that the model predictions are unbiased and stable and possess minimal outliers. That only a minimal number of observations fall outside the ±2 boundary and the leverage constraint signifies that there are very few influential or problematic points. Similarly, the training plot shows tightly clumped residuals around zero, with the majority of data points having little leverage, which means that the model has not over-fit the training data.
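The quantities behind a Williams plot can be computed in a few lines: the leverages are the diagonal of the hat matrix, and h* follows the 3p′/n rule quoted above. A sketch under the assumption of a plain least-squares hat matrix, with placeholder data:

```python
import numpy as np

def williams_quantities(X, residuals, resid_limit=3.0):
    """Leverage values, standardized residuals, the h* = 3p'/n threshold,
    and a mask of points inside the applicability domain."""
    X = np.asarray(X, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    n, p = X.shape
    # Hat matrix H = X (X^T X)^{-1} X^T; its diagonal holds the leverages.
    hat = X @ np.linalg.inv(X.T @ X) @ X.T
    leverage = np.diag(hat)
    std_resid = (residuals - residuals.mean()) / residuals.std()
    h_star = 3.0 * p / n
    # Inliers sit inside the residual band and below the leverage threshold.
    inside = (np.abs(std_resid) <= resid_limit) & (leverage <= h_star)
    return leverage, std_resid, h_star, inside
```

Plotting std_resid against leverage with the ±2 (or ±3) band and a vertical line at h* reproduces the layout described for Fig. 20.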
Even where some residuals fall outside ±3 or possess relatively higher leverage, those points are scattered and do not invalidate the model. The similar trend in both plots confirms that the Stacking model works well on unseen data, learning the inherent pattern without being overfit, and remains robust. The Williams plots of the Stacking model on both the training and test sets are therefore indicators of good model performance and generalization; to be an efficient predictor, the data must lie within this domain (see Fig. 20) [23].

Figure 20: Williams plots for models' comparison based on standardized residuals and leverage. [Train and test panels for Elastic Net, Extra Trees, GBM, Poisson, Stacking, and XGBoost.]

3.3 Comparison with foundation models and LLMs

Large Language Models, or LLMs, are sophisticated AI systems that have been trained on enormous text datasets to comprehend and produce human language. Google created BERT, which is well suited to tasks like classification and question answering because it can analyze words in both directions and understand context.
With its emphasis on producing relevant and coherent text, OpenAI's GPT is effective for tasks like content creation, dialogue, and summarization. Both use the Transformer architecture, but GPT is more focused on generation and BERT on comprehension. The proposed SE model adds domain-specific efficiency, whereas models such as BERT and GPT are effective for general-purpose NLP tasks. We specifically highlighted how, in contrast to the extensive, data-intensive training of LLMs, the SE makes use of structured, domain-relevant features. This study also indicates the SE's improved interpretability and reduced computational cost, both of which are important for ecological modelling.

4 Conclusions

The study was implemented on eight tropical forests in Vietnam, using the forestry variables AGB, BGB, and TB. In an attempt to solve the problem of predicting these variables, the study used an MGDL regression strategy, which proved to be an efficient model with a strong ability to predict tropical forest biomass. To this end, five models were selected as the major algorithms for the biomass prediction problem: Gradient Boosting (GB), Extra Trees (ET), XGB, ElasticNet, and Poisson, all of which were employed to simultaneously predict AGB, BGB, and TB = BGB + AGB, and were then optimized by Grid Search. Additionally, the SE model was joined to the aforementioned models to make the results satisfactory, mainly for cross-validation purposes. The recommended method's performance was therefore investigated on two sets of actual data, namely the training and testing data.

The outcome of this study showed that the recommended method had a vigorous efficacy in estimating the amount of forest biomass. That is to say, employing a simultaneous group of ML models had a significant impact on predicting forestry above- and below-ground biomass, as well as the total biomass. The very high R² values of near 0.999 in the training set are cause for alarm regarding overfitting or data leakage. We dealt with this by ensuring strict separation of the training and test datasets such that there was no information leakage. We also employed Grid Search with cross-validation during hyperparameter tuning to allow maximum model complexity without over-fitting.
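The pipeline just summarized (several base learners combined through a stacking meta-learner, tuned with grid-searched cross-validation) could be assembled with scikit-learn roughly as follows. The parameter grid, the Ridge meta-learner, and the synthetic data are illustrative assumptions, and XGBoost is left out to keep the sketch dependency-light:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import ElasticNet, PoissonRegressor, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the biomass data (illustrative only).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
y = y - y.min() + 1.0  # Poisson regression needs non-negative targets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Base learners named in the text; an XGBoost regressor would slot in the same way.
stack = StackingRegressor(
    estimators=[
        ("elasticnet", ElasticNet(max_iter=5000)),
        ("extra_trees", ExtraTreesRegressor(n_estimators=100, random_state=0)),
        ("gbm", GradientBoostingRegressor(random_state=0)),
        ("poisson", PoissonRegressor(max_iter=1000)),
    ],
    final_estimator=Ridge(),  # illustrative meta-learner choice
    cv=5,
)

# Grid Search with cross-validation, as described for hyperparameter tuning.
grid = GridSearchCV(
    stack,
    param_grid={"gbm__n_estimators": [50, 100],
                "elasticnet__alpha": [0.1, 1.0]},
    cv=3,
)
grid.fit(X_tr, y_tr)
print(round(grid.score(X_te, y_te), 3))  # held-out R^2
```

The `"name__param"` convention lets the grid search reach inside the ensemble to tune individual base learners while the meta-learner is refit on out-of-fold predictions at every candidate.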
The test set results, with R² scores considerably lower (e.g., 0.968 for the best model), are a sign of good generalization and suggest that while there may be some overfitting, it is controlled. More regularization and more data will be tried in future research to reduce the possibility of overfitting even further. Based on the provided metrics, the Stacking ensemble model performed clearly better than each of the standalone models on the test set. That is because it is capable of leveraging the prediction power of the various base learners (ElasticNet, Extra Trees, Gradient Boosting, Poisson Regression, and XGB) and minimizing their respective errors through a meta-learner. Stacking takes into consideration the standalone strengths of linear as well as nonlinear models and results in improved generalization and less overfitting.

Quantitatively, the Stacking model achieved the highest coefficient of determination (R² = 0.968) and variance accounted for (VAF = 0.971) on the test set, indicating that its predictions were the most highly correlated with the actual biomass values. It generated the lowest mean squared error (MSE = 334.820), root mean square error (RMSE = 18.298), and mean absolute error (MAE = 12.422), indicating high accuracy and low prediction bias. In terms of normalized error, it also had an NMSE of just 0.032, and the median absolute percentage error (MDAPE) decreased to 23.081%, significantly better than the other models. Although its test standard deviation (STD = 105.763) was slightly greater, this is a natural result of better prediction accuracy and range coverage for both the train and test data, where the results showed R² equal to 0.939 for the training data and 0.968 for the testing data in this study's analysis. Therefore, adding the SE model to the proposed models is recommended for predicting forest biomass. This is further evidence of the strong performance of this model: the Williams plot residuals show that the majority of points fall within the tolerated parameters, with very limited outliers or high-leverage points in the test and train subsets. This implies that the Stacking model produces reliable, unbiased, and non-overfit predictions, with powerful generalization and performance.

Declarations

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Authors' contributions

QD: Writing - original draft preparation, Conceptualization, Supervision, Project administration.

Acknowledgements

I would like to take this opportunity to acknowledge that there are no individuals or organizations that require acknowledgment for their contributions to this work.

Ethical approval

The research paper has received ethical approval from the institutional review board, ensuring the protection of participants' rights and compliance with the relevant ethical guidelines.

References

[1] H. C. Zhantao Song, Xiong Zhang, Xiaoqiang Li, Junjie Zhang, Jingai Shao, Shihong Zhang, Haiping Yang, "Machine learning assisted prediction of specific surface area and nitrogen content of biochar based on biomass type and pyrolysis conditions," J Anal Appl Pyrolysis, vol. 183, 2024.

[2] L. Jia, W. Shao, J. Wang, Y. Qian, Y. Chen, and Q. Yang, "Machine learning-aided prediction of bio-BTX and olefins production from zeolite-catalyzed biomass pyrolysis," Energy, vol. 306, p. 132478, 2024.

[3] H. Wu, S. An, B. Meng, X. Chen, F. Li, and S. Ren, "Retrieval of grassland aboveground biomass across three ecoregions in China during the past two decades using satellite remote sensing technology and machine learning algorithms," International Journal of Applied Earth Observation and Geoinformation, vol. 130, p. 103925, 2024, doi: 10.1016/j.jag.2024.103925.

[4] P. Mao et al., "An improved approach to estimate above-ground volume and biomass of desert shrub communities based on UAV RGB images," Ecol Indic, vol. 125, p. 107494, 2021, doi: 10.1016/j.ecolind.2021.107494.

[5] P. B. May et al., "Mapping aboveground biomass in Indonesian lowland forests using GEDI and hierarchical models," Remote Sens Environ, vol. 313, p. 114384, 2024.

[6] B. Huy, N. Quy Truong, K. P. Poudel, H. Temesgen, and N. Quy Khiem, "Multi-output deep learning models for enhanced reliability of simultaneous tree above- and below-ground biomass predictions in tropical forests of Vietnam," Comput Electron Agric, vol. 222, p. 109080, 2024, doi: 10.1016/j.compag.2024.109080.

[7] M. F. Oliveira et al., "Predicting below and above-ground peanut biomass and maturity using multi-target regression," Comput Electron Agric, vol. 218, p. 108647, 2024.

[8] G. Kunapuli, Ensemble Methods for Machine Learning. Simon and Schuster, 2023.

[9] R. Dey and R. Mathur, "Ensemble learning method using stacking with base learner, a comparison," in International Conference on Data Analytics and Insights, Springer, 2023, pp. 159–169.

[10] P. Naik, M. Dalponte, and L. Bruzzone, "Automated machine learning driven stacked ensemble modeling for forest aboveground biomass prediction using multitemporal Sentinel-2 data," IEEE J Sel Top Appl Earth Obs Remote Sens, vol. 16, pp. 3442–3454, 2022.

[11] Y. Zhang, J. Ma, S. Liang, X. Li, and J. Liu, "A stacking ensemble algorithm for improving the biases of forest aboveground biomass estimations from multiple remotely sensed datasets," GIsci Remote Sens, vol. 59, no. 1, pp. 234–249, 2022.

[12] J. Liu, Y. Niu, Z. Jia, and R. Wang, "Assessing the ethical implications of artificial intelligence integration in media production and its impact on the creative industry," MEDAAD, vol. 2023, pp. 32–38, 2023.

[13] R. Huang, C. McMahan, B. Herrin, A. McLain, B. Cai, and S. Self, "Gradient boosting: A computationally efficient alternative to Markov chain Monte Carlo sampling for fitting large Bayesian spatio-temporal binomial regression models," Infect Dis Model, vol. 10, no. 1, pp. 189–200, 2025, doi: 10.1016/j.idm.2024.09.008.

[14] Z. Wang, L. Mu, H. Miao, Y. Shang, H. Yin, and M. Dong, "An innovative application of machine learning in prediction of the syngas properties of biomass chemical looping gasification based on extra trees regression algorithm," Energy, vol. 275, p. 127438, 2023.

[15] H. Wei, K. Luo, J. Xing, and J. Fan, "Predicting co-pyrolysis of coal and biomass using machine learning approaches," Fuel, vol. 310, p. 122248, 2022.

[16] R. (Bob) Roy, "No Title." Accessed: Jun. 27, 2021. [Online]. Available: https://bobrupakroy.medium.com/extra-trees-classifier-regressor-5b5f6abe8228

[17] B. Kıyak, H. F. Öztop, F. Ertam, and İ. G. Aksoy, "An intelligent approach to investigate the effects of container orientation for PCM melting based on an XGBoost regression model," Eng Anal Bound Elem, vol. 161, pp. 202–213, 2024.

[18] Y. Ayub, J. Ren, T. Shi, W. Shen, and C. He, "Poultry litter valorization: Development and optimization of an electro-chemical and thermal tri-generation process using an extreme gradient boosting algorithm," Energy, vol. 263, p. 125839, 2023.

[19] A. Jain, "No Title." Accessed: Feb. 05, 2024. [Online]. Available: https://medium.com/@abhishekjainindore24/elastic-net-regression-combined-features-of-l1-and-l2-regularization-6181a660c3a5

[20] J. Liu et al., "A new application of Elasticnet regression based near-infrared spectroscopy model: Prediction and analysis of 2,3,5,4′-tetrahydroxy stilbene-2-O-β-D-glucoside and moisture in Polygonum multiflorum," Microchemical Journal, vol. 199, p. 110095, 2024.

[21] Purhadi, D. N. Sari, Q. Aini, and Irhamah, "Geographically weighted bivariate zero inflated generalized Poisson regression model and its application," Heliyon, vol. 7, no. 7, p. e07491, 2021, doi: 10.1016/j.heliyon.2021.e07491.

[22] U. Arif, C. Zhang, S. Hussain, and A. R. Abbasi, "An efficient interpretable stacking ensemble model for lung cancer prognosis," Comput Biol Chem, p. 108248, 2024.

[23] H. Yıldırım and M. R. Özkale, "A novel regularized extreme learning machine based on L1-norm and L2-norm: a sparsity solution alternative to Lasso and Elastic Net," Cognit Comput, vol. 16, no. 2, pp. 641–653, 2024.

[24] S. M. Mastelini, F. K. Nakano, C. Vens, and A. C. P. de Leon Ferreira, "Online extra trees regressor," IEEE Trans Neural Netw Learn Syst, vol. 34, no. 10, pp. 6755–6767, 2022.

[25] M. M. Hameed, M. K. Alomar, F. Khaleel, and N. Al-Ansari, "An extra tree regression model for discharge coefficient prediction: Novel, practical applications in the hydraulic sector and future research directions," Math Probl Eng, vol. 2021, 2021, doi: 10.1155/2021/7001710.

[26] "Understanding Poisson Regression." Accessed: Nov. 05, 2023. [Online]. Available: https://medium.com/@data-overload/understanding-poisson-regression-a-powerful-tool-for-count-data-analysis-b7184c61bfde

[27] M. A. Alemayehu, S. D. Kebede, A. D. Walle, D. N. Mamo, E. B. Enyew, and J. B. Adem, "A stacked ensemble machine learning model for the prediction of pentavalent 3 vaccination dropout in East Africa," Front Big Data, vol. 8, p. 1522578, 2025.

[28] "XGBoost Algorithm." Accessed: Sep. 04, 2024. [Online]. Available: https://www.analyticsvidhya.com/blog/2018/09/an-end-to-end-guide-to-understand-the-math-behind-xgboost/

[29] M. Ehteram, A. N. Ahmed, P. Kumar, M. Sherif, and A. El-Shafie, "Predicting freshwater production and energy consumption in a seawater greenhouse based on ensemble frameworks using optimized multi-layer perceptron," Energy Reports, vol. 7, pp. 6308–6326, 2021, doi: 10.1016/j.egyr.2021.09.079.

[30] A. Beheshti, E. Pourbasheer, M. Nekoei, and S. Vahdani, "QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm-multiple linear regressions," Journal of Saudi Chemical Society, vol. 20, no. 3, pp. 282–290, 2016, doi: 10.1016/j.jscs.2012.07.019.

https://doi.org/10.31449/inf.v49i16.9397 Informatica 49 (2025) 397–416 397

HematoFusion: A Weighted Residual-Vision Transformer Ensemble for Automated Classification of Haematologic Disorders in Microscopic Blood Images

Mouna Saadallah, Latefa Oulladji, Farah Ben-Naoum

Evolutionary Engineering and Distributed Information Systems Laboratory, Department of Computer Science, Djillali Liabes University, Sidi Bel Abbes, Algeria

E-mail: mouna.saadallah@univ-sba.dz, latifa.oulladji@univ-sba.dz, farah.bennaoum@univ-sba.dz

Keywords: Medical imaging, neural networks, red blood cell, leukemia, lymphoma

Received: May 27, 2025

Haematologic malignancies pose a significant global challenge, with 1.34 million new cases reported in 2019 and leukemia claiming 311,594 lives in 2020. Early diagnosis of these blood disorders increases survival chances by enabling prompt treatment, yet their complexity and variable cellular morphology hinder accurate detection. Advances in medical imaging and AI, particularly image classification, offer solutions by analyzing blood samples for subtle morphological patterns. This study advances the field by introducing a novel data set for the classification of red blood cells and using open-source data for the classification of leukemia and lymphoma (covering 29,363; 16,811; and 1,436 images, respectively).
We fine-tuned multiple AI models, including EfficientNetB3, ResNet50V2, and a pretrained Vision Transformer (ViT), and combined their strengths into a weighted ensemble framework. Evaluated across various metrics (including accuracy, precision, recall, etc.), the proposed HematoFusion model excelled, achieving 96% accuracy on red blood cell morphology, 99% on leukemia, and 96% on lymphoma, surpassing most existing models in terms of accuracy while covering a wider range of haematologic disorders. These findings demonstrate the potential of integrated AI frameworks to improve haematologic diagnostics with precision and reliability.

Povzetek: HematoFusion je uteženi ansambel ResNet50V2 in Vision Transformer, namenjen avtomatski klasifikaciji hematoloških motenj iz mikroskopskih slik. Sistem uporablja nov RBC-nabor podatkov ter odprtokodne nabore levkemije in limfoma ter izboljša zanesljivost diagnostičnega razpoznavanja krvnih celic.

1 Introduction

The collection of blood samples is crucial to understanding diseases, preventing them, and thoroughly providing treatment. The diagnosis of blood cell diseases hinges significantly on determining the patient's Blood Cell Count (BCC) and observing the appearance of cells under a microscope. It serves as a guide for the pathologist or biologist, providing vital information on diseases that are indicative of quantitative (variations in the number of cells) or qualitative (structural or functional) abnormalities in blood cells [11].

Patients admitted to consultation often suffer haematologic dysfunction (either qualitative or quantitative). Some of the most common cases requiring medical evaluation are caused either by a decrease in the complete blood count (anemia, for instance, sees a decrease in the number of Red Blood Cells (RBCs) or in the level of hemoglobin) or by an increased concentration of RBCs, as marked in the condition of erythrocytosis. Other conditions mark a change in the cell's shape and/or size, including microcyte, macrocyte, echinocyte, codocyte, acanthocyte, spherocyte, and more. White Blood Cell (WBC) and platelet disorders can mainly be described as quantitative, for example leukopenia, leukocytosis, neutropenia, lymphocytopenia (WBC), and thrombocytosis or thrombocytopenia (platelets). Most qualitative disorders are cancerous or proliferative disorders, including leukemia, lymphoma, myeloma (WBC), and hemophilia (platelets) [9].

The pathologist, along with other medical professionals, depends on studying and examining body tissues to perform diagnostics. The microscope is the main tool used to observe blood cells, providing a detailed description of them in terms of shape and count. Blood cell observation can be extremely challenging with the naked eye and requires enormous concentration and focus; modern technologies, however, introduce new techniques involving the use of a camera to capture microscopic images that can be exploited for further study and examination. Some existing solutions, like EasyCell® Assistant and Vision Hema® Assist, consist of stand-alone tools using highly costly robots and integrated microscopes that can assist the pathologist in making decisions and save time; nonetheless, due to their high costs and unavailability in public hospitals and laboratories, these solutions cannot be relied upon entirely. This leads us to consider cheaper and more effective innovations emerging in recent years, including Deep Learning (DL) and its various contributions. DL has been widely implemented by researchers in the medical field, and it has given promising results in medical imaging (MRI, X-rays, CT, ...) [30, 52], enabling medical professionals to rapidly diagnose and detect abnormalities in the human body without exhausting analysis and observation.

This paper aims to improve classification accuracy for haematologic diseases by leveraging ensemble learning techniques applied to multi-source microscopic datasets, preserving the full spectrum of morphologic variability. The latest DL techniques were exploited, including transfer learning and fine-tuning of Convolutional Neural Network (CNN) models and the recently emerging Vision Transformer (ViT) [28]. The ResNet50V2 and EfficientNetB3 networks were chosen as they are preferable for microscopic image classification, with the latter suitable for scenarios with limited computing resources. We acquired different sources for our data set that study not only Red Blood Cell disorders but also White Blood Cells (WBC). The CNN and ViT models were separately trained using the completed data set, and the results were later combined to enhance performance.

A description of our contributions is provided in the following lines:

– A meticulously curated data set for Red Blood Cell morphology, using samples collected at the Anti-Cancer Center in El-Oued, Algeria.

– The base architecture of EfficientNetB3 was used with transfer learning, leveraging pretrained weights from ImageNet. It was additionally fine-tuned for the task of blood cell classification.

– The ResNet50V2 was also integrated and transfer-learned as a base architecture and eventually fine-tuned by adding dense layers and regularization techniques that serve to enhance the model's performance.

– A pretrained ViT model was applied to our data set to classify blood cell images through self-attention mechanisms. The model was fine-tuned by optimizing hyperparameters to improve accuracy.

– A hybrid CNN/ViT model was developed by combining the strengths of CNNs for local feature extraction with those of the ViT, which captures global features more efficiently.

2 Related work

Pathology and detecting blood disorders require a mass of work and time by a biologist to prepare the blood, test it, and analyze it. Nevertheless, the emergence of developed technologies, such as deep learning, has made things much easier for biologists and pathologists, as it assists them with the process of analyzing the blood smear and detecting abnormalities in cell type, shape, and aggregation. If done entirely by the pathologist, this step may take hours or even days when necessary, which causes a decline in the health worker's focus and even eyesight. This urged the need to automate the task to alleviate the pressure on them. Many studies have been conducted to address this problem by exploiting the use of Artificial Intelligence and its diverse techniques.

In its earliest phase, peripheral blood image analysis was inspired by the emerging use of Artificial Intelligence in the medical field and its automation. Kim KS, et al. [2] designed a system that uses a CCD camera attached to the microscope to capture the peripheral images; preprocessing techniques such as edge enhancement and noise removal were applied, and the images were later classified into 15 types of Red Blood Cell abnormalities and 5 normal shapes of White Blood Cells using neural networks. Following that, neural networks, mainly Convolutional Neural Networks, were explored for blood cell image analysis and classification. WBC and its 5 different normal cell shapes (neutrophils, lymphocytes, monocytes, basophils, eosinophils) were the easiest to classify and readily available [14] [56] [37]. Classification accuracy reached 96% using a simple neural network consisting of a 16-neuron input layer and a single hidden layer with 10 nodes, achieving a minimum error of less than 10⁻⁴, with a 5-neuron output layer to classify each type [14]. Ali et al. [54] proposed the VGG16-ViT network, which uses two online datasets to classify WBC subtypes, achieving excellent precisions of 98.99% and 99.95% on the respective datasets. The DenseNet121 model [12] was used by Bozkurt F. [27] on the open-access data set provided by Paul Mooney, available on Kaggle.com [18], reaching an accuracy of 98%. Another two-module deformable CNN with transfer learning was proposed by Yao Xufeng, et al. [37]: the first module initializes the ImageNet [3] characteristic weights, while the second module is designated for classification. The authors achieved precisions of 95.7%, 94.5%, and 91.6% for two low-resolution, noisy undisclosed data sets and the BCCD data set [20], respectively.

Some of the studies, however, focused solely on the classification of one disease. Leukemia is one of the most common blood cancers, leading to growing interest in developing new diagnostic systems for early detection and prevention. In this context, CNNs have gained significant attention due to their efficiency and high accuracy in image-based classification tasks. Areen K. et al. [47] compared in their study multiple CNN-based algorithms (AlexNet, DenseNet, ResNet, and VGG16), employing three datasets (ALL-IDB, ASH ImageBank, and images captured at JUST), reaching an accuracy of 94%. DeepLeukNet, proposed by Saeed et al. [53], was conceived to classify Acute Lymphoblastic Leukemia (ALL) subtypes employing a CNN-based classifier on the ALL-IDB1 and ALL-IDB2 datasets, attaining 99.61% accuracy. Kasim et al. [55] leverage the online ALL-IDB and Munich AML Morphology datasets for multi-class classification of leukemia subtypes using pretrained CNN architectures and other classification models, including Random Forest, SVM, and Extreme Gradient Boosting; the highest accuracy achieved by this method was 88%. In recent studies, Vision Transformers (ViTs) have been employed for the classification of leukemia subtypes. Swain et al. [59] proposed a model based solely on ViTs and classified ALL subtypes; the accuracy on the test set reached 99.67%. A similar approach was implemented by Prasad et al. [51], who attained an overall accuracy of 98.01% for the automatic detection of ALL. Others opted for architectures combining both CNNs and ViTs to further enhance feature extraction. For instance, Tanwar et al. [60] combined the ResNet50 model with the ViT, establishing a dual-stream architecture and reaching an accuracy of 99%.

DL also proved efficient in the classification of other types of cancer, such as lymphoma. Its potential was thoroughly explained by several researchers [58] [35], stressing the application of CNNs and ensemble techniques. Ozgur et al. [49] developed a triple classification system of various lymphomas (CLL, FL, and MCL) and employed a combination of ML and DL algorithms, reaching precisions of 94%, 92%, and 82%, respectively.

Sickle Cell Anemia and Malaria can be diagnosed by examining the patient's RBCs. Harahap Mawaddah, et al. [29] used a data set that groups 27,588 images of infected and healthy individuals' RBCs, provided by Yasmin M. Kassim et al. [23]. Two CNN architectures were compared during the classification: LeNet-5 [1] was deemed more precise than DRNet [46] in classifying RBCs affected by Malaria, with accuracies of 95.7% and 95%, respectively. Alzubaidi Laith, et al. [22] introduced a CNN classifying RBCs into 3 classes, namely normal, abnormal, and miscellaneous. They used the same network as a feature extractor, then applied the Error Correcting Output Codes (ECOC) classifier for the classification task, achieving an accuracy of 92.06%.

In addition to neural networks, Machine Learning techniques were also employed to address the problem of blood cell image analysis. Aliyu Hajara Abdulkarim, et al. [17] compared a Support Vector Machine (SVM) against Deep Learning methods using the AlexNet architecture [5]. The dataset used was open-sourced and distinguished 4 types of RBC abnormalities along with their normal shape. The accuracy of the CNN model was relatively weak and could not exceed 33%, while the SVM model achieved a perfect 100% on the RBC data set. The latter was deployed with the Radial Basis Function (RBF) default setting; this same network was employed by Syahputra Mohammad Fadly, et al. [15], achieving an accuracy of 83.3% using Canny edge detection for preprocessing and feature extraction to classify 3 types of RBC abnormalities.

Label-free identification was also explored by various researchers, using an imaging flow cytometer to classify unstained WBCs [19] and optofluidic time-stretch microscopy along with Machine Learning for the detection of aggregated platelets as well as single platelets and WBCs [13].

Visual or Vision Transformers were introduced by Dosovitskiy, et al. [28] in 2020 to exploit transformers in visual applications. Given that image classification is a rather novel concept for transformers, it may take a while to fully develop and exploit the ViT in this regard. Compared to the ViT, CNNs can handle large-scale data sets better and offer excellent results. The ViT, however, is known for its understanding of global context and dependencies, although it requires pretraining on large amounts of data to achieve results comparable to CNNs [34]. Therefore, an ensemble ViT/CNN model can be an excellent approach to combine the ViT's efficiencies with those of CNNs; this was previously done by Y. Barhoumi, et al. [26] to address another persistent problem, intracranial hemorrhage classification. It was also employed by Jiang Zhencun, et al. [32] to diagnose ALL. The ensemble method used is the weighted-sum model: the output results of the ViT models are multiplied by a coefficient of 0.7, the output results of the EfficientNet model [21] are multiplied by a coefficient of 0.3, and the authors then combined the results to get the final prediction. The ViT-CNN ensemble model achieved outstanding results with an accuracy of 99.03%, exceeding the models in the literature.

A comparative summary of recent studies on cancer classification using deep learning methods is presented in Supplementary Material: Section 1 (Table S1), which provides the datasets used, classification techniques, number of classes, and accuracy values reported.

3 Methods

3.1 Data acquisition

The data set used for the classification was acquired by combining different sources.

1. The Chula RBC-12 data set [33] of RBC blood smear images, which contains a total of 706 smear images describing 13 classes of RBC and comprising over 20K images of normal and pathological RBCs. The images provided were collected at the Oxidation in Red Cell Disorders Research Unit, Chulalongkorn University, in 2019, with a DS-Fi2-L3 Nikon microscope used at 1000x magnification. The 13 classes are specified as follows: normal cell, macrocyte, microcyte, spherocyte, target cell, stomatocyte, ovalocyte, teardrop, burr cell, schistocyte, uncategorized, hypochromia, elliptocyte. 2 classes were neglected for the lack of blood smear images.

2. The ThalassemiaPBS data set [40] contains 7,108 peripheral blood smear images of four thalassemia patients for nine cell types (elliptocyte, teardrop, normal cell, cigar cell, stomatocyte, target cell, hypochromia, spherocyte, acanthocyte). The specific morphological characteristics.
The organization of images were collected by a clinical pathologist from these classes was performed meticulously to ensure consis- the Clinical Pathology Laboratory of the Faculty of tency with the referenced files provided by the authors. Medicine, Public Health and Nursing, Universitas The accompanying ”Label” folder within the data set Gadjah Mada, Indonesia, using the Olympus CX21 houses a series of files providing detailed annotations microscope attached with an Optilab advance plus for each image, structured in a specific format: the x- camera with 1000x total magnification. coordinate, y-coordinate, and the corresponding RBC type encoded as a numerical value (each class is given a unique 3. The RBC-mini data set, Anti-Cancer Center El- value from 1 to 11). Oued, Algeria [57]: A small data set fragment (mini- This labeling system facilitates the task of accurate iden- batch) provided by the specialized healthcare facility: tification and classification of RBCs, thereby serving as a Anti-Cancer Center in El-Oued, Algeria, that contains foundation for various haematological studies and the de- a total of 13 blood smear images, regrouping 5 dif- velopment of automated diagnostic tools. ferent types of RBC disorders: Burr-cells, ovalocyte, The same process was replicated on the RBC-mini data set schistocyte, stomatocyte, and teardrop. The blood which we collected in collaboration with the Anti-Cancer smear images were captured in May 2024 using an op- Center in El-Oued, Algeria. The resulting blood smears tical microscope with a x1000 magnification. These were preprocessed and cropped using the OpenCV library images were integrated to augment the diversity of the as described below. The extracted images were manually RBC class and mitigate overfitting risks, not to serve labeled under the supervision of specialists at the Center. as a core data source. Table 1 regroups all 3 sources of RBC data sets and 1. Load the Image using OpenCV. 
lists the size of each data set per type of cell disorder, 2. Preprocess the Image by converting it to grayscale and before and after the application of data augmentation applying thresholding or edge detection to highlight techniques described in Section 3.3.1. the cells. The total size of the RBC data set is 29,363. 3. Find the cells using contour detection. 4. The Raabin-Leukemia data set [39] is a free-access data set of microscopic images of blood cells, focus- 4. Extract each cell based on the detected contours and ing on cases related to Leukemia. 2 experts labeled save them as separate image files. the cells, and the samples were captured from patients at the Takht-e Tavous laboratory in Tehran, Iran. The Zeiss microscope and LG J3 smartphone camera were used for the imaging. 5. The Malignant Lymphoma Classification data set [4] contains a significant number of labeled histopathological images of lymphoma, 3 types of this cancer are covered in this data set: Chronic lym- phocytic leukemia (CLL), follicular lymphoma (FL), and mantle cell lymphoma (MCL), through biopsies sectioned and stained with Hematoxylin/Eosin (H+E). Tables 2 and 3 present the Lymphoma and Leukemia Figure 1: Images of single-cells cropped from one blood datasets, respectively, compiled from the Malignant smear image Lymphoma Classification dataset and the Raabin- Leukemia dataset. The tables show the class distribu- The images provided by the ThalassemiaPBS-data set tion before and after applying the data augmentation [40] already consisted of single cells; therefore, no further techniques described in Section 3.3.1. The Lymphoma preprocessing was needed. This process was necessary to dataset consists of 1,436 images, while the Leukemia isolate and classify specific morphological abnormalities. dataset contains 16,811 images. In contrast, the leukemia [39] and lymphoma [4] data were not cropped as single cells. 
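The four cropping steps above can be sketched without OpenCV as follows (an illustrative numpy-only stand-in: thresholding plus a connected-component search replaces cv2.findContours; the function name and threshold values are hypothetical):

```python
import numpy as np
from collections import deque

def extract_cells(gray, thresh=128, min_area=4):
    """Sketch of the cropping pipeline: threshold the smear image,
    find connected dark regions (a stand-in for contour detection),
    and return one bounding-box crop per detected cell."""
    mask = gray < thresh                      # cells are darker than background
    labels = np.zeros(gray.shape, dtype=int)
    current = 0
    crops = []
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue                          # pixel already belongs to a cell
        current += 1
        comp = [(sy, sx)]
        labels[sy, sx] = current
        q = deque(comp)
        while q:                              # breadth-first flood fill
            y, x = q.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < gray.shape[0] and 0 <= nx < gray.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = current
                    comp.append((ny, nx))
                    q.append((ny, nx))
        ys, xs = zip(*comp)
        if len(comp) >= min_area:             # ignore speckle noise
            crops.append(gray[min(ys):max(ys) + 1, min(xs):max(xs) + 1])
    return crops
```

Each returned crop corresponds to one candidate cell and can then be saved as a separate image file, as in step 4.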
3.2 Image cropping

Figure 1 presents a representative blood smear image from the Chula RBC-12 data set [33]. Each image was manually cropped to focus on individual RBCs and relevant regions of interest. They were subsequently categorized based on specific morphological characteristics.

In contrast, the leukemia [39] and lymphoma [4] data were not cropped into single cells. Instead, the whole blood smear images were retained as input, since the spatial context and global information contained in the whole smear image all contribute positively towards the classification of leukemia subtypes and malignant lymphomas. These differences in preprocessing reflect the varying nature of the diagnostic tasks and were taken into account during the design of the model pipelines.

Table 1: The complete Red Blood Cells data set description by type of cell and data size, including before and after augmentation

Index | Type of RBC  | No. of images [33] | No. of images [40] | No. of images [57] | Total (with augm.)
1     | Acanthocyte  | 0    | 354  | 0  | 1432
2     | Burr Cell    | 90   | 0    | 10 | 982
3     | Cigar Cell   | 455  | 24   | 0  | 1893
4     | Hypochromia  | 90   | 222  | 0  | 1284
5     | Normal       | 1812 | 1426 | 0  | 3292
6     | Ovalocyte    | 114  | 1211 | 4  | 3735
7     | Schistocyte  | 108  | 0    | 8  | 453
8     | Spherocyte   | 92   | 562  | 0  | 2640
9     | Stomatocyte  | 49   | 382  | 3  | 1792
10    | Target Cell  | 651  | 851  | 0  | 3912
11    | Teardrop     | 26   | 2085 | 6  | 7948

Table 2: The Lymphoma data set description by type of cell and data size, including before and after augmentation

Category | Subtype | Before | After
Lymphoma | CLL     | 113    | 443
Lymphoma | FL      | 139    | 526
Lymphoma | MCL     | 122    | 467

Table 3: The Leukemia data set description by type of cell and data size, including before and after augmentation

Category | Subtype  | Before | After
Leukemia | ALL (L1) | 377    | 1131
Leukemia | ALL (L2) | 3595   | 3595
Leukemia | AML (m0) | 672    | 997
Leukemia | AML (m1) | 425    | 1700
Leukemia | CLL      | 1071   | 3741
Leukemia | CML      | 1624   | 5647

3.3 Data processing

3.3.1 Data augmentation

Data augmentation is a technique that is essential in image processing. It consists of artificially enlarging a given data set by making changes to the original images. Furthermore, this method offers a way to improve the model's performance by mitigating common issues like overfitting. These alterations can consist of simple geometric transformations and color or noise introductions, all designed to make the model's predictions more generalizable and accurate.

In the present study, three primary data augmentation techniques were employed, namely: flipping, which involves mirroring the image horizontally or vertically; rotation, which involves altering the image by turning it by a specified degree; and Gaussian blurring, which can help reduce noise and minor details by applying a Gaussian filter to the image.

The variations of the existing images generated by the data augmentation techniques provide a more robust data set. When combined, these augmentation techniques allowed us to enrich our data set, all the while relying on additional data preprocessing techniques that will be introduced in the following sections.

3.3.2 Data resizing

Another vital preprocessing step before training the model is resizing. Since our data set is acquired from various sources, it is rather imbalanced, and images come in different sizes and shapes. Therefore, the sizes must be standardized into a uniform square dimension before feeding the images into the model. This allows the model to learn efficiently and improves its accuracy. Each model expects a certain target size for the images. The ResNet50V2 model, for instance, requires a target size of (224, 224, 3); we were able to apply it using the flow_from_directory() method in Keras. EfficientNetB3, however, expects input images of shape (300, 300, 3) by default, but the model can accept other input shapes as long as the shape is at least 224 × 224 and the number of channels is 3 (RGB); thus, the input was resized to (224, 224, 3) to reduce computation time and memory usage.

When provided with the target size, Keras uses bilinear interpolation by default for the image resizing operation.
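Bilinear resizing maps each target pixel back to a source coordinate and blends the four surrounding source pixels. A minimal numpy sketch of this mapping (the exact corner-alignment convention inside Keras may differ; this follows the plain scale-factor mapping):

```python
import numpy as np

def resize_bilinear(img, h_tgt, w_tgt):
    """Each target pixel (i', j') is mapped back to the source coordinate
    (i' * H_orig / H_tgt, j' * W_orig / W_tgt) and its value is bilinearly
    interpolated from the four surrounding pixels."""
    h_orig, w_orig = img.shape
    out = np.empty((h_tgt, w_tgt))
    for i in range(h_tgt):
        for j in range(w_tgt):
            y = i * h_orig / h_tgt            # back-projected row coordinate
            x = j * w_orig / w_tgt            # back-projected column coordinate
            y0, x0 = int(y), int(x)
            y1 = min(y0 + 1, h_orig - 1)
            x1 = min(x0 + 1, w_orig - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = (img[y0, x0] * (1 - dy) * (1 - dx)
                         + img[y0, x1] * (1 - dy) * dx
                         + img[y1, x0] * dy * (1 - dx)
                         + img[y1, x1] * dy * dx)
    return out
```

In practice the framework performs this per channel and in vectorized form; the loop version is only meant to make the coordinate mapping explicit.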
The formula specified below is a representation of the process by which the original coordinates are mapped to new ones using interpolation:

new(i′, j′) = interpolate( orig( i′ · H_orig / H_tgt , j′ · W_orig / W_tgt ) )    (1)

3.3.3 Data rescaling

To ensure uniformity across input data and improve model training, all images were rescaled using appropriate preprocessing techniques depending on the model architecture. To further enhance the CNN-based models' efficiency, we used "rescaling", a technique in which the image's range of pixel values is changed to a standard or normalized range. There are two common rescaling techniques: standardization and normalization. In our paper, we opted for the latter, which ensures that varied pixel values are used during the model's learning process. The pixels of a given image can be represented as integers in the range 0 to 255 in the case of an 8-bit image. Rescaling maps these values into a different range, such as -1 to 1, or 0 to 1 when using normalization.

Likewise, we used the flow_from_directory() method to rescale the images by a factor of 1/255 for our training, validation, and test batches. This method uses a form of min-max scaling, where each pixel value is divided by 255: the minimum value (0) maps to 0, and the maximum value (255) maps to 1. Its formula can be defined as follows:

X_scaled = (X − X_min) / (X_max − X_min)    (2)

Meanwhile, for the ViT model, a different preprocessing strategy was implemented to fit its expected input distribution. Mean-standard deviation normalization was applied to standardize the image data and improve the model's convergence. Each image's pixel values were normalized using the following channel-wise means and standard deviations:

– Mean: [0.485, 0.456, 0.406]
– Standard deviation: [0.229, 0.224, 0.225]

This normalization follows the formula:

normalized_pixel(i, j) = ( pixel(i, j) − mean ) / std_dev    (3)

3.4.1 EfficientNetB3

EfficientNetB3 is a member of the EfficientNet family, first introduced in May 2019 by [21]. This architecture was chosen for its superior performance in feature extraction and its ability to balance computational efficiency with high accuracy, making it well suited for tasks like blood cell classification.

EfficientNets are developed based on AutoML and compound scaling. The authors first used the AutoML MNAS Mobile framework to develop a baseline network, which they named EfficientNetB0, the first of the EfficientNet family. They then used the compound scaling method to scale up and obtain the series from B1 to B7. These architectures achieved higher accuracy and efficiency despite being smaller and thus faster than other models. In our paper, we opted for the B3 version, which gave promising initial results; additional layers were added to adapt the model for blood cell classification. We additionally adjusted key hyperparameters meticulously during training, such as the learning rate, batch size, and dropout rate.

Figure 2 shows the architecture of the EfficientNetB3 base model that we adopted for our specific classification task. The architecture diagram was created using diagrams.net (formerly known as draw.io) [31]. The model is first fed microscopic images resized to (300, 300, 3) and processed through its pretrained backbone. The default fully connected classification head of EfficientNetB3 was removed, since it is specific to the ImageNet data set the model was trained on (containing 1000 classes), which allowed us to add a custom classification head tailored to our data set.
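The two input pipelines of Section 3.3.3, rescaling by 1/255 for the CNN-based models and channel-wise mean-standard-deviation normalization for the ViT, can be sketched as follows (numpy stand-ins for the actual Keras/PyTorch preprocessing calls):

```python
import numpy as np

# Channel-wise statistics quoted in Section 3.3.3.
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

def rescale_cnn(img_uint8):
    """CNN path: min-max rescaling by 1/255, mapping [0, 255] to [0, 1]."""
    return img_uint8.astype(np.float64) / 255.0

def normalize_vit(img_uint8):
    """ViT path: rescale to [0, 1], then per-channel (x - mean) / std,
    i.e. Eq. (3) applied channel-wise."""
    return (rescale_cnn(img_uint8) - MEAN) / STD
```

The broadcasting over the last axis applies each mean and standard deviation to its own RGB channel.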
Additionally, Supplementary Material: Section 2.2 includes the parameter-level details of the aforementioned data augmentation techniques.

3.4 Proposed solution

In this section, we present the architectures we employed for our blood-cell classification system, based on the latest deep-learning techniques. Three state-of-the-art models were explored for this task: EfficientNetB3, ResNet50V2, and Vision Transformer (ViT). To further enhance the classification accuracy, we developed ensemble models combining the strengths of ViTs and CNNs. In training, Transfer Learning was used to fine-tune each of the cited architectures, and the hyperparameters were optimized to suit the specifics of the corresponding data set. The choice of models and more specific details are explained later in the section.

Three versions of the same architecture were used, each with a modified softmax layer to adapt to our three different data sets: (1) for RBC classification, 11 classes; (2) for Leukemia classification, 6 classes; (3) for Lymphoma classification, 3 classes.

The EfficientNetB3 backbone acts as a feature extractor, extracting spatial and hierarchical features that are later fed to the added layers for learning. The first five layers are frozen to prevent their weights from being updated during training. This helps to adapt the deeper layers to our data set; freezing more layers could have resulted in under-fitting, since the data set has unique characteristics that are significantly different from the original ImageNet data set. The deeper layers of the EfficientNetB3 backbone are left unfrozen to enable the model to capture more patterns (e.g., cell morphology, staining patterns).

This version of the model expects a (300, 300, 3) input shape by default; we resized the input to (224, 224, 3) to speed up training and reduce memory usage, owing to limited resources and the rather small size of our data set.

Figure 2: EfficientNetB3 model for blood cell classification: the model processes 300x300x3 microscopic images through convolutional layers with Swish activation, followed by mobile inverted bottleneck blocks (MBConv1 and MBConv6). The first 5 layers are frozen, with fine-tuned deeper layers. A custom classification head is added for task-specific classification.

A dropout layer is added after the global average pooling (GAP) layer to reduce the overfitting we were subject to, given the depth of the network against the small size of the data set. A dropout rate of 0.3 was employed, thus deactivating 30% of the neurons during training; this prevents the model from relying on specific neurons. A fully connected layer with 256 units and the ReLU activation function is added, serving to learn complex representations. The classification head is completed with the final output dense layer, whose number of units corresponds to the number of classes in each of our 3 data sets. The softmax activation function is used for the multi-class classification.
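The custom head just described (global average pooling, dropout at 0.3, a 256-unit ReLU dense layer, and a softmax output) amounts to the following forward pass (a numpy sketch with random stand-in weights, not the trained Keras layers):

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def head_forward(feature_maps, n_classes, drop_rate=0.3, train=False):
    """Custom classification head: GAP -> dropout(0.3) ->
    Dense(256, ReLU) -> Dense(n_classes, softmax).
    Weights here are random stand-ins for learned parameters."""
    x = feature_maps.mean(axis=(0, 1))        # GAP: (H, W, C) -> (C,)
    if train:                                  # dropout is active only in training
        x = x * (rng.random(x.shape) >= drop_rate) / (1 - drop_rate)
    w1 = rng.standard_normal((x.size, 256)) * 0.01
    h = np.maximum(0.0, x @ w1)                # 256-unit ReLU dense layer
    w2 = rng.standard_normal((256, n_classes)) * 0.01
    return softmax(h @ w2)                     # class probabilities
```

With n_classes set to 11, 6, or 3, the same head adapts to the RBC, Leukemia, and Lymphoma tasks respectively.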
3.4.2 ResNet50V2

Deep convolutional neural networks have contributed significantly to the image-classification field, providing a robust platform to researchers ever since the emergence of the first deep neural network, LeNet, in 1998. Later, in 2012, the idea of Dropout was presented, allowing models to avoid overfitting. Researchers next focused on adding more convolutional layers to increase the depth of the model and thus its efficiency. However, simply stacking up layers did not benefit researchers, as it introduced a whole new issue, accuracy degradation, which unexpectedly was not due to overfitting; rather, it was caused by the vanishing gradient effect [6]. Residual Neural Networks came to the rescue: in 2015, ResNet152, the first of the ResNet family, was introduced.

It consists essentially of modularized architectures that stack building blocks of the same connecting shape, with short-cut connections that skip one or more layers [10]. These connections in ResNet work by performing identity mapping; the outputs of this mapping are added to those of the stacked layers, as illustrated in Figure 3.

Figure 3: Residual learning - a building block.

ResNet50V2 is a residual neural network variant that employs skip connections to prevent vanishing gradients during back-propagation; this ensures efficiency in learning the complex features present in microscopic blood cell images.

Figure 4 presents the architecture of the ResNet50V2 base model that we employed for our classification. Similarly, this model diagram was also designed using the diagrams.net tool.

Figure 4: ResNet-50v2 model for blood cell classification: the model processes 224 × 224 × 3 microscopic images through a series of convolutional layers with ReLU activation. It consists of four main blocks with residual connections and employs bottleneck blocks (1x1, 3x3, 1x1 convolutions). A global average pooling layer is added, followed by a fully connected classification head and a softmax activation for predicting blood cell classes. Key components include skip connections, dropout (0.6), and task-specific fine-tuning.
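The identity mapping of Figure 3, in which the block input is added unchanged to the output of the stacked layers, can be illustrated as follows (a toy numpy sketch; the hypothetical F is a two-layer pre-activation transform standing in for the real convolutional block):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Pre-activation residual block sketch: the shortcut carries x
    unchanged (identity mapping) and is added to the stacked-layer
    output F(x), as in Figure 3. Activation comes before each matmul,
    mirroring the ResNet-V2 pre-activation ordering."""
    f = relu(x) @ w1          # first stacked layer (activation first)
    f = relu(f) @ w2          # second stacked layer
    return x + f              # identity shortcut + residual transform
```

Because the shortcut is an identity, a block whose residual transform F collapses to zero still passes its input through unchanged, which is what keeps gradients flowing in very deep networks.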
The model is fed a microscopic image of size (224, 224, 3) that has previously been preprocessed and normalized. It consists of 50 layers, focusing on improved gradient flow and training stability by introducing pre-activation residual blocks and applying batch normalization and activation (ReLU) before the convolutions. The network's initial block captures low-level features such as edges, textures, and patterns through convolutional and pooling layers, followed by 4 residual blocks with skip connections to prevent vanishing-gradient problems. Higher-level features are extracted using down-sampling (strides). The final output of these blocks is passed through a global average pooling (GAP) layer to reduce the feature map to a 1D vector; a fully connected layer and a softmax classifier are added.

The base model acts as a feature extractor, and the custom layers act as a task-specific classifier tailored to blood-cell classification. Similar to the EfficientNet, the model was transfer-learned by freezing the first 5 layers. This prevents overfitting, as the learning focuses on the deep layers, and a dropout layer is also added to ensure the model does not memorize the training data. The final classifier layer is preceded by a 256-unit dense layer that allows the model to combine the learned features to improve classification, and the classifier ends with a dense layer that has the same number of neurons as the classification classes: (1) for RBC classification, 11 classes; (2) for Leukemia classification, 6 classes; (3) for Lymphoma classification, 3 classes.

3.4.3 Experimental hyperparameters

Table 4 presents a breakdown of the hyperparameters and setup used in the experiments, based on the Keras/TensorFlow training pipeline, along with their purposes, as well as the strategies employed to transfer-learn and fine-tune the models and achieve the best accuracies possible. The EfficientNetB3 and ResNet50V2 models were both trained using the same hyperparameters, detailed in Table 4.

Table 4: Experimental hyperparameters for training the CNN models and their purposes

Hyperparameter   | Value                                | Purpose
Optimizer        | Adam (LR = 10^-4)                    | A low learning rate is employed to fine-tune pretrained layers.
Loss function    | Categorical crossentropy             | Used for multi-class classification.
Metrics          | Accuracy                             | To monitor the number of correctly classified instances during training.
Steps per epoch  | ResNet50V2: 20, EfficientNetB3: 200  | The number of training batches processed per epoch.
Validation steps | ResNet50V2: 10, EfficientNetB3: 316  | The number of validation batches processed per validation step.
Epochs           | ResNet50V2: 300, EfficientNetB3: 10  | Specifies the training schedule, which allows gradual convergence.

3.4.4 Experimental environment

Hardware: the experiments for all 3 models were conducted on Google Colab, which typically provides NVIDIA Tesla GPUs.

Software: Platform: Google Colab, a hosted Jupyter Notebook environment. Framework(s): TensorFlow v2.18.0 and Keras v3.6.0 were used to develop, train, and evaluate the 3 models. Python: version 3.10. Libraries: matplotlib, numpy, PIL, joblib, and more were used to preprocess, analyze, and visualize data.

Storage: the 3 data sets were preprocessed and split into training, validation, and test sets, each stored in Google Drive, which is mounted to the Colab environment for access. The detailed data split strategy, including training, validation, and testing partitions, is provided in Supplementary Material: Section 2.1.

3.5 ViT model

Visual or Vision Transformers (ViT) are a novel approach introduced by Dosovitskiy et al. [28]. The approach uses the concept of transformers designed specifically for visual applications and image classification tasks in particular. The pretrained backbone uses the google/vit-base-patch16-224-in21K model from the Hugging Face library [25] as a feature extractor. The model was trained on the ImageNet-21K data set [8].
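Across the backbones, fine-tuning follows the same transfer-learning pattern described above: early layers are frozen while deeper layers are updated with a low learning rate. A simplified sketch of such a masked update (treating each layer as one parameter array; real training relies on the frameworks' trainable flags rather than explicit masking):

```python
import numpy as np

def sgd_step_with_frozen(params, grads, frozen_first=5, lr=1e-4):
    """Transfer-learning sketch: parameters of the first `frozen_first`
    layers receive no update, mimicking layer freezing, while deeper
    layers are fine-tuned with a small learning rate."""
    return [p if i < frozen_first else p - lr * g
            for i, (p, g) in enumerate(zip(params, grads))]
```

Freezing the first 5 entries and updating the rest reproduces, in miniature, the "freeze the first 5 layers" strategy used for the CNN backbones.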
The model was trained on the and image classification tasks in particular. ImageNet-21K data set [8]. It was fine-tuned to adapt to When using the transformer blocks in ViT, the multi-head the blood cell classification task, where the number of attention mechanism is applied to integrate global context labels was defined as the number of classes in the data set efficiently and learn high-level features. [42] as mentioned previously in this section. Following the success of NLP transformers [16], Dosovit- skiy et al. were inspired to develop a new attention-based class of models that can be exploited in Computer Vision. The transformer encoder depicted in Figure 5 is first Compared to NLP transformers, ViT only uses the encoder provided with the embedded patches (Patch-embedding/ attention branch, neglecting the decoder attention branch, Position-embedding). The input image is divided into whilst word tokens are replaced by image patches. fixed-size patches of 16x16. We next apply a linear projec- In a normal CNN, the entire image is taken as input, tion on the flattened patches to form a fixed-dimensional whereas in ViT, the image is first divided into equal-sized vector. Unlike CNNs, Transformers require position em- patches, which are passed through some linear layers; the beddings to learn and capture the input’s order of sequence outputs of this layer are known as patch embeddings. Be- [38]. It serves to improve accuracy and encode the spatial tween these embeddings, we have the position embeddings, information of the patches. which serve to provide the model with some positional in- formation regarding the sequence of these patches. After- The combined embedded patches are fed into the Trans- ward, another learnable token is added to the position em- former Encoder to go through a series of L layers, each in- bedding for image classification purposes. cluding a list of components, as follows: Figure 5 presents the architecture of the ViT model we 1. 
Multi-head self-attention is a mechanism that enables have employed for our blood cell classification task. Prior the model to learn global patterns by splitting the to the training phase, the data was first prepared and pro- process of self-attention into multiple heads, where cessed to fit the model’s requirements and expected input. each head focuses on the interaction between patch- The data set was initially split into training, validation, and embeddings differently.[16] The attention calculations test sets and stored in specific folders. The ImageFolder are eventually merged to give a more global score. utility was used to load the images and associate them with their corresponding classes based on the folder names pro- 2. The output of the Multi-head attention is added to vided. The images were later resized to fit the shape ex- the input of the next component by a skip connec- pected by the ViT model: 224×224. Further normalization tion (residual connection) after normalization. As ex- was implemented to standardize the image data, and make plained earlier, residual connections are added to pre- it more suitable for the model (See Section 3.3.3). vent the vanishing gradient during training. Similarly to the CNN architecture, three versions were implemented for each data set: (1) for RBC classification, 3. To further enhance the model’s learning through 11 classes, (2) for Leukemia classification, 6 classes, (3) patch-embeddings, a feed-forward network (FFN) 406 Informatica 49 (2025) 397–416 M. Saadallah et al. Figure 5: Vision Transformer (ViT) Architecture for blood cell classification: The model processes 224×224 microscopic images through patch embeddings and position encoding, which are later fed to the transformer encoder with loaded weights from the pretrained ViT-B-16 in 21K model. 
After passing through the transformer encoder, the embeddings are used as the input to the classification head (MLP + Softmax) [48] is fed to the normalized output of the Multi- 224-in21k, along with their respective values; detailing the head attention, which consists of fully connected lay- batch size, the learning rate, and the optimizer employed. ers with a GeLU activation in between. This allows The OneCycleLr Scheduler was used as a strategy to vary the model to capture local transformations. the learning rate during training; each cycle uses a maxi- mum of 10−3 as a learning rate. Other parameters include 4. Similarly to the Multi-head self-attention block, the the CrossEntropyLoss function and a total of 10 epochs FFN is normalized and added to the residual connec- (624 batches per epoch) to train the model. tion. The output of the Transformer Encoder is a sequence of em- bedding, enrichedwith local and global contextual informa- Table 5: Experimental hyperparameters for training the ViT tion, independently for each patch. model After passing through the Transformer Encoder, the embed- ding corresponding to the special classification token (cls) Hyperparameter Value is used as the input to the classification head that consists Batch size 32 of a Multi-Perceptron head (MLP), and a softmax classifi- Learning rate 10−4 (initial) cation head. Optimizer AdamW The MLP takes the output of the Transformer Encoder and Scheduler OneCycleLR feeds it into a series of fully connected layers to prepare the Scheduler Max lr 10−3 data for the softmax classification head that processes it to the desired classes. 
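The core of the multi-head self-attention component (item 1 above) can be illustrated with a single head of scaled dot-product attention (a numpy sketch; ViT-B/16 actually uses 12 heads with learned projection matrices):

```python
import numpy as np

def softmax_rows(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a patch-embedding sequence
    x of shape (N, d): every output token is a weighted mix of all
    tokens, which is how the encoder integrates global context."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise patch interactions
    attn = softmax_rows(scores)               # each row sums to 1
    return attn @ v, attn
```

In the multi-head version, several such heads run in parallel on lower-dimensional projections and their outputs are concatenated and projected back, merging the per-head attention scores as described in item 1.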
3.5.1 Experimental hyperparameters

Table 5 outlines the hyperparameters used for training the ViT model with the google/vit-base-patch16-224-in21k backbone, along with their respective values, detailing the batch size, the learning rate, and the optimizer employed. The OneCycleLR scheduler was used as a strategy to vary the learning rate during training; each cycle uses a maximum learning rate of 10^-3. Other parameters include the CrossEntropyLoss function and a total of 10 epochs (624 batches per epoch) to train the model.

Table 5: Experimental hyperparameters for training the ViT model

Hyperparameter   | Value
Batch size       | 32
Learning rate    | 10^-4 (initial)
Optimizer        | AdamW
Scheduler        | OneCycleLR
Scheduler max LR | 10^-3
Number of epochs | 10
Loss function    | CrossEntropyLoss
Model backbone   | google/vit-base-patch16-224-in21k

3.5.2 ViT-CNN ensemble model

To further enhance the performance of our models, an ensemble method was introduced to seek the opinions of several models and combine them, achieving more accurate classifications than those of the raw models trained separately [44] [50]. Through our experiments, we observed the superiority of residual networks in training and efficient learning, while the ViT model performed better in certain instances, focusing more on learning complex features. Thus, we incorporated in our methodology a dual-architecture ensemble, combining the residual network's efficiency with the high precision obtained by the ViT.

Figure 6 presents the flowchart of the ResNet-ViT ensemble model that we implemented.

The weighted-average ensemble method was selected after experimenting with the most prevalent methods in image classification tasks, namely maximum voting, the averaging method, and the weighted sum. In the weighted-average method, the models are assigned different weights after training, defining the importance of each model for the prediction.

The weighted-average ensemble combines predictions from a CNN (M1) and a Vision Transformer (M2), with output probabilities for class c denoted as P1(c) and P2(c), respectively, obtained via the softmax function to ensure Σ_c Pi(c) = 1 for i = 1, 2. The weights w1 and w2 are assigned to M1 and M2 based on validation performance. The ensemble probability for class c is computed as:

P_ensemble(c) = ( w1 · P1(c) + w2 · P2(c) ) / ( w1 + w2 )    (4)

The final class prediction is determined by selecting the class with the highest ensemble probability:

ĉ = argmax_c P_ensemble(c)    (5)

Preprocessing: the input fed to the already-trained models is first preprocessed, differently for each model. The ViT uses normalization with mean and standard deviation, while the CNN uses simple rescaling (1/255). The models are then loaded to make predictions, and both models output probabilities for the different classes; we used the softmax function to ensure that they sum to 1.

Weight selection: the weights w1 and w2 were determined through a grid search over predefined pairs, specifically [(0.3, 0.7), (0.4, 0.6)], where each pair sums to 1 to maintain normalized probabilities. The grid search evaluated each weight combination on a validation subset using classification accuracy as the performance metric. The pair that achieved the highest accuracy was selected. Further details about the ensemble weight selection and performance across datasets are provided in the Supplementary Material: Section 3.

Figure 6: Workflow of the weighted-average ensemble method: the input is preprocessed to fit the ViT and the CNN models, and the predictions of the models are then combined using the weighted-average approach to generate the final prediction.

4 Results

This section provides an in-depth analysis of the results obtained from our experiments. First, we explore the performance of our individual models, using the insights present in the confusion matrix and focusing on the following metrics: (1) accuracy, (2) precision, (3) recall, (4) F1-score, (5) Cohen's kappa, and (6) AUC scores. This is followed by an evaluation of the ensemble model, HematoFusion. The evaluation is conducted across the three data sets we introduced in earlier sections, and the results are eventually interpreted in the context of the existing literature.

The accuracy is calculated by measuring the proportion of correctly predicted cases; a high accuracy means the overall performance of the model is good.
However, in the case of imbalanced data sets, a high accuracy can be misleading, and other metrics are necessary to further evaluate the model.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (6)

The Precision is calculated by measuring the number of correctly predicted positive cases; a high precision is achieved only if most of the positive cases are correctly predicted [45].

Precision = TP / (TP + FP)    (7)

The Recall, also known as Sensitivity or True Positive Rate, measures whether all relevant cases of the data set were correctly predicted [45].

Recall = TP / (TP + FN)    (8)

To address the shortcomings of accuracy in handling imbalanced data sets, which is the case in our paper, the F1-score was introduced for balanced evaluation, combining precision and recall in one metric. The F1-score is only high when both precision and recall are high [43].

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)    (9)

Cohen's Kappa was introduced as a statistical measure of the agreement between the predicted labels and their actual values. If κ = 1, perfect agreement is achieved; if κ = 0, the agreement is equivalent to chance; and if κ < 0, the model achieved less than random agreement [36].

κ = (Po − Pe) / (1 − Pe)    (10)

where:
Po = observed agreement (accuracy)
Pe = expected agreement based on chance, Pe = (1 / N^2) * sum_{i=1}^{k} (Ai · Bi)

The Area Under the Curve (AUC) score, specifically the area under the Receiver Operating Characteristic (ROC) curve [24], evaluates a model's ability to discriminate between classes at various classification thresholds. An AUC score approaching 1 indicates high discriminative capability, which is particularly useful for the unbalanced datasets common in blood cell classification, where accuracy becomes misleading because of class differences.

AUC = ∫_0^1 TPR(FPR) dFPR    (11)

where:
TPR = TP / (TP + FN) (True Positive Rate or Sensitivity)
FPR = FP / (FP + TN) (False Positive Rate)

4.1 Classification results

To analyze the models' performance during training, the accuracy and loss were both monitored and visualized through the training curves over successive epochs. Figures S5, S6, and S7 (Supplementary Material Section 7) depict the training curves for the RBC Morphology, Leukemia, and Lymphoma data sets, respectively.

Additionally, for further visual evaluation of the classification performance, confusion matrices were computed on the test set, showing the number of accurate and inaccurate predictions of instances, namely: True Negative (TN), True Positive (TP), False Negative (FN), and False Positive (FP). The confusion matrices generated for the RBC Morphology, Leukemia, and Lymphoma datasets are displayed, respectively, in Figures 7, 8, and 9. These confusion matrices were used to compute the quantitative metrics for a more specific evaluation.

Detailed tables presenting per-class performance metrics (precision, recall, F1-score, and kappa) for each model and dataset combination are provided in the Supplementary Material: Section 4, Tables S3–S5.

The classification results for RBC Morphology, Leukemia, and Lymphoma are summarized in Tables 6, 7, and 8, respectively, and those of the ensemble model, HematoFusion, in Table 9. To test model stability and robustness across the data sets, we conducted a bootstrapping analysis. This technique provides an estimate of the variability of the results and increases the validity of our performance claims over single-run statistics. The detailed bootstrapping results, with distribution plots and summary statistics, are presented in Supplementary Material: Section 6 (Table S7 and Figures S2–S4).

Furthermore, we have calculated and included the AUC scores (Table S6) and ROC curves (Figure S1) for our individual models across each dataset, as shown in Supplementary Material: Section 5.

5 Discussion

5.1 Interpretation of results

The convergence of the ResNet50V2 model illustrates a steady reduction in training loss, and its accuracy becomes stable after a certain number of epochs. The ViT model demonstrated higher fluctuation in both accuracy and loss during training.

Figure 7: Confusion matrices for the classification performance of the four models on the RBC data set: (a) EfficientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and ResNet50V2 models.

Table 6: RBC Morphology classification results across the three individual models with detailed metrics for evaluation

Model          | Train Acc | Val Acc | Test Acc | Kappa | Recall | F1-score | Precision
EfficientNetB3 | 0.99      | 0.91    | 0.97     | 0.93  | 0.92   | 0.92     | 0.92
ResNet50V2     | 0.98      | 0.98    | 0.92     | 0.92  | 0.93   | 0.93     | 0.93
ViT            | 0.98      | 0.94    | 0.96     | 0.96  | 0.94   | 0.94     | 0.94

Table 7: Leukemia classification results across the three individual models with detailed metrics for evaluation

Model          | Train Acc | Val Acc | Test Acc | Kappa | Recall | F1-score | Precision
EfficientNetB3 | 1.0       | 1.0     | 1.0      | 0.99  | 0.99   | 0.99     | 0.99
ResNet50V2     | 1.0       | 1.0     | 1.0      | 0.99  | 1.0    | 1.0      | 1.0
ViT            | 0.99      | 0.99    | 0.99     | 0.99  | 1.0    | 1.0      | 1.0
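As a sanity check, Eqs. (6) through (10) can be computed directly from confusion-matrix counts. The sketch below uses illustrative binary counts (TP=40, TN=50, FP=5, FN=5), not values from the paper:

```python
def metrics(tp, tn, fp, fn):
    # Eqs. (6)-(9): accuracy, precision, recall, and F1 from counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also the TPR / sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def cohen_kappa(tp, tn, fp, fn):
    # Eq. (10): agreement corrected for chance. For two classes, Pe sums
    # the products of actual and predicted counts per class over N^2.
    n = tp + tn + fp + fn
    po = (tp + tn) / n  # observed agreement
    pe = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return (po - pe) / (1 - pe)

acc, prec, rec, f1 = metrics(tp=40, tn=50, fp=5, fn=5)
kappa = cohen_kappa(tp=40, tn=50, fp=5, fn=5)
```

With these counts the observed agreement is 0.9 and the chance agreement 0.505, so kappa is noticeably lower than the raw accuracy, which is exactly the correction the metric is meant to provide.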
Figure 8: Confusion matrices for the classification performance of the four models on the Leukemia data set: (a) EfficientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and ResNet50V2 models.

Table 8: Lymphoma classification results across the three individual models with detailed metrics for evaluation

Model          | Train Acc | Val Acc | Test Acc | Kappa | Recall | F1-score | Precision
EfficientNetB3 | 1.0       | 0.99    | 0.99     | 0.97  | 0.99   | 0.99     | 0.99
ResNet50V2     | 1.0       | 0.91    | 0.96     | 0.91  | 0.96   | 0.96     | 0.96
ViT            | 0.98      | 0.98    | 0.95     | 0.92  | 0.98   | 0.98     | 0.98

Table 9: HematoFusion ensemble model classification results across the three datasets, showing detailed evaluation metrics

Dataset  | Best Acc | Kappa | Recall | F1-score | Precision
RBC      | 0.96     | 0.94  | 0.97   | 0.97     | 0.97
Leukemia | 0.99     | 0.99  | 1.00   | 1.00     | 1.00
Lymphoma | 0.96     | 0.95  | 0.97   | 0.97     | 0.97

When comparing the True Positives of the proposed HematoFusion model with those of the individual models, we can clearly observe an increase in the rates of correctly classified cases and a decrease in the misclassification rates. The individual models struggled with predicting the Hypochromia class, whereas the ensemble model exhibited a stronger ability to recognize it. Acanthocyte and Teardrop, by contrast, were easier to identify owing to their distinguishable shapes, which was reflected in the high number of TP.

Figure 9: Confusion matrices for the classification performance of the four models on the Lymphoma data set: (a) EfficientNetB3 model, (b) ResNet50V2 model, (c) pretrained ViT model, (d) HematoFusion model combining the ViT and ResNet50V2 models.
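The bootstrapping analysis mentioned in Section 4 can be sketched as follows: test-set (label, prediction) pairs are resampled with replacement, and the resulting distribution of a metric yields a percentile interval. This is a generic illustration using only the standard library; the sample labels and the number of resamples are arbitrary choices, not the paper's settings:

```python
import random

def bootstrap_accuracy(y_true, y_pred, n_boot=1000, seed=0):
    # Resample (prediction, label) pairs with replacement and return
    # the sorted bootstrap distribution of accuracy.
    rng = random.Random(seed)
    n = len(y_true)
    dist = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        acc = sum(y_pred[i] == y_true[i] for i in idx) / n
        dist.append(acc)
    return sorted(dist)

def percentile_ci(dist, alpha=0.05):
    # Simple percentile confidence interval from a sorted distribution.
    lo = dist[int(len(dist) * (alpha / 2))]
    hi = dist[int(len(dist) * (1 - alpha / 2)) - 1]
    return lo, hi

# Toy test-set labels and predictions (9 of 10 correct).
y_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0, 0, 0]
dist = bootstrap_accuracy(y_true, y_pred)
low, high = percentile_ci(dist)
```

The width of the resulting interval is what gives the "variability over single-run statistics" that the paper reports in its Supplementary Material.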
In Table 6, a slight overfitting is observed due to the class imbalance; thus, accuracy alone can be misleading as a measure of performance. This was addressed with the use of precision, recall, and F1-score. Although EfficientNetB3 was slightly better on test accuracy (0.97), the ViT model outperformed it on the stronger and more descriptive metrics, such as the Kappa score, precision, recall, and F1-score, which indicate a more balanced performance under class imbalance. Therefore, ViT is graded as the overall best performer on the RBC Morphology dataset.

Table 7 shows more consistent results on the Leukemia data set across all models, achieving perfect classification, which indicates better generalization. Both ResNet50V2 and EfficientNetB3 achieved comparable top-tier performances on the Leukemia dataset, with identical test accuracy, precision, recall, and F1-score, and minor variations in the other evaluation metrics. EfficientNetB3, in turn, outperforms the other models on the Lymphoma classification (Table 8), reaching almost perfect accuracies.

The ensemble model, HematoFusion, demonstrates more uniform results across all data sets in terms of all evaluation metrics, mitigating the issues with class imbalance, as evidenced by its performances, by leveraging the strengths of both the ViT and ResNet50V2 models, each of which struggled with some classes. The precision improved by 4% on the RBC data set and reached a perfect 100% for Leukemia classification, while averaging the performances of the individual models on the Lymphoma data set with a precision of 97% on the test set. Despite the strong performance of our proposed solution, further improvements could be implemented to help the model generalize better and address the issue of class imbalance efficiently.

5.2 Comparative study

Table 10 presents a breakdown of the performance of the proposed solution across all three datasets, outlining the accuracy and precision of the model when compared to the literature.

Table 10: Comparative results of the proposed solution and the literature across different metrics for each data set

Dataset  | HematoFusion Accuracy | HematoFusion Precision | Literature Accuracy | Literature Precision
RBC      | 0.96                  | 0.97                   | 0.98                | 0.97
Leukemia | 0.99                  | 1.0                    | 0.99                | 0.99
Lymphoma | 0.96                  | 0.97                   | 0.96                | 0.96

In a bid to substantiate the efficiency of our proposed solution, we evaluated it against the following models:

1. Literature [7], RBC classification: The authors presented a maximum-voting-based ensemble model to classify Dacrocyte (Teardrop), Schistocyte, and Elliptocyte (Cigar) cells in iron deficiency anemia. The average classification precision and accuracy of the latter reached a maximum of 97% and 98%, respectively. While both models achieved the same precision of 97%, the model in the literature reported a slightly higher accuracy (98%) compared to HematoFusion's 96%. Nonetheless, it is worth noting that our data set comprises 11 classes against the 3 classes studied in that article.

2. Literature [32], Leukemia classification: The authors proposed a ViT-CNN ensemble model for the diagnosis of Acute Lymphoblastic Leukemia (ALL), which is one of the 6 classes analyzed in our paper. Compared to the model in the literature, which achieved 99% accuracy and 99% precision on the Leukemia dataset, HematoFusion matched the accuracy (99%) but outperformed it in precision, achieving a perfect 100%.

3. Literature [41], Lymphoma classification: Malignant Lymphoma (ML) was addressed in this paper and is among the 3 classes that appear in our Lymphoma data set. The proposed hybrid model used the combined features of 3 deep learning networks, namely MobileNet-VGG16, VGG16-AlexNet, and MobileNet-AlexNet, classified by the XGBoost and DT algorithms, reaching an average accuracy and precision of 96%.

An extended version of this comparison, covering a broader range of SOTA models and datasets, is provided in Supplementary Material: Section 8 (Table S8).

Overall, our proposed HematoFusion ensemble model achieved reliable performance across the 3 data sets, despite the imbalanced data and the high number of classes in the case of RBC Morphology classification.

5.3 Limitations

Although the reported results show high precision, reaching up to 99%, this should be interpreted with caution due to known issues such as dataset imbalance. As identified previously, some classes were underrepresented, which could result in biased learning as well as overfitting. To mitigate this, data augmentation techniques were employed (as outlined in Supplementary Section 2.2), and performance was monitored across a variety of metrics (precision, recall, F1-score, Cohen's Kappa, and AUC scores) rather than accuracy alone. However, we are aware that the lack of external validation data limits generalizability. Although high performance metrics are presented, the models have not been prospectively validated within a real clinical workflow; their incorporation into clinical decision-making would require extensive regulatory testing and interpretability evaluation. Additionally, while conventional regularization techniques such as dropout and data augmentation were applied to address overfitting, we recognize the need for more advanced strategies. Future work will explore class imbalance mitigation techniques such as SMOTE, GAN-based synthetic image generation, and uncertainty-aware training, beyond testing on independent cohorts, to further assess the robustness of the model in actual clinical settings. Furthermore, we intend to conduct ablation studies on ensemble weight parameters and data augmentation strategies to evaluate their individual contributions.

6 Conclusion

In this study, the problem of pathological blood cell classification was addressed through the use of novel deep-learning strategies. We curated a data set for RBC Morphology classification, consisting of samples from three different sources. The process involved preprocessing techniques to establish a data set aligned with our research objectives; two other data sets were acquired, targeted at Lymphoma and Leukemia classification separately.

Three distinct individual models were applied to each of the data sets: EfficientNetB3, ResNet50V2, and a pretrained ViT model. To leverage the strengths of both the CNN and ViT architectures, an ensemble model using the weighted average method was developed.

The present findings confirm that the proposed HematoFusion model mitigates the shortcomings of the individual models by enhancing the accuracy, precision, and sensitivity, achieving more consistent results across the three data sets. While HematoFusion demonstrates competitive or superior performance on Leukemia and Lymphoma classification, particularly in precision and F1-score, it performs comparably on RBC classification, despite its higher number of classes and the issue of data imbalance that resulted in a few cases of overfitting. We additionally acknowledge certain limitations in predicting a couple of classes. These are the key components to overcome in future research. Future studies should also be devoted to covering more pathological blood disorders and implementing further processing and data augmentation to alleviate the issue of class imbalance and overfitting.

Overall, this paper provides a foundation for future developments by establishing baseline data that future researchers can expand upon to address the limited data available for RBC Morphology, and by combining the strengths of residual networks and vision transformers for a more robust framework.

References

[1] Yann LeCun et al. "Gradient-based learning applied to document recognition". In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. DOI: https://doi.org/10.1109/5.726791.
[2] KS Kim et al. "Analyzing blood cell image to distinguish its abnormalities". In: Proceedings of the eighth ACM international conference on multimedia. New York: Association for Computing Machinery, 2000, pp. 395–397. DOI: https://doi.org/10.1145/354384.354543.
[3] Jia Deng et al. "ImageNet: A large-scale hierarchical image database". In: 2009 IEEE conference on computer vision and pattern recognition. Miami: IEEE, 2009, pp. 248–255. DOI: https://doi.org/10.1109/CVPR.2009.5206848.
[4] Nikita Orlov et al. "Automatic Classification of Lymphoma Images With Transform-Based Global Features". In: IEEE Transactions on Information Technology in Biomedicine 14 (2010), pp. 1003–13. DOI: https://doi.org/10.1109/TITB.2010.2050695.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". In: Advances in Neural Information Processing Systems. Ed. by F. Pereira et al. Vol. 25. Lake Tahoe, Nevada: Curran Associates, Inc., 2012, pp. 1097–1105.
[6] K. He et al. Deep Residual Learning for Image Recognition. Preprint at https://arxiv.org/abs/1512.03385. 2015.
[7] Mahsa Lotfi et al. "The detection of dacrocyte, schistocyte and elliptocyte cells in iron deficiency anemia". In: 2015 2nd International conference on pattern recognition and image analysis (IPRIA). Rasht, Iran: IEEE, 2015, pp. 1–5. DOI: https://doi.org/10.1109/PRIA.2015.7161628.
[8] O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. Preprint at https://arxiv.org/abs/1409.0575. 2015.
[9] J. C. Chapin and M. T. Desancho. "Hematologic Dysfunction in the ICU". In: Critical Care. Ed. by J. M. Oropello, S. M. Pastores, and V. Kvetan. New York: McGraw-Hill Education, 2016.
[10] Kaiming He et al. Identity Mappings in Deep Residual Networks. Preprint at http://arxiv.org/abs/1603.05027. 2016.
[11] Kenneth Kaushansky et al. Williams Hematology. New York: McGraw-Hill Education, 2016.
[12] Gao Huang et al. "Densely Connected Convolutional Networks". In: Proceedings of the IEEE conference on computer vision and pattern recognition. Honolulu: IEEE, 2017, pp. 4700–4708. DOI: https://doi.org/10.48550/arXiv.1608.06993.
[13] Yiyue Jiang et al. "Label-free detection of aggregated platelets in blood by machine-learning-aided optofluidic time-stretch microscopy". In: Lab on a Chip 17.14 (2017), pp. 2426–2434. DOI: https://doi.org/10.1039/C7LC00396J.
[14] Mazin Z Othman, Thabit S Mohammed, and Alaa B Ali. "Neural network classification of white blood cell using microscopic images". In: International Journal of Advanced Computer Science and Applications 8.5 (2017), pp. 99–103. DOI: https://doi.org/10.14569/IJACSA.2017.080513.
[15] Mohammad Fadly Syahputra, Anita Ratna Sari, and Romi Fadillah Rahmat. "Abnormality classification on the shape of red blood cells using radial basis function network". In: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT). Kuta Bali, Indonesia: IEEE, 2017, pp. 1–5. DOI: https://doi.org/10.1109/CAIPT.2017.8320739.
[16] Ashish Vaswani et al. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017). DOI: https://doi.org/10.48550/arXiv.1706.03762.
[17] Hajara Abdulkarim Aliyu et al. "Red blood cell classification: deep learning architecture versus support vector machine". In: 2018 2nd international conference on biosignal analysis, processing and systems (ICBAPS). Kuching, Malaysia: IEEE, 2018, pp. 142–147. DOI: https://doi.org/10.1109/ICBAPS.2018.8527398.
[18] Paul Mooney. Blood Cell Images. 2018. URL: https://www.kaggle.com/datasets/paultimothymooney/blood-cells.
[19] Mariam Nassar et al. "Label-free identification of white blood cells using machine learning". In: Cytometry Part A 95.8 (2019), pp. 836–842. DOI: https://doi.org/10.1002/cyto.a.23794.
[20] N. C. Shenggan. BCCD Dataset. https://github.com/Shenggan/BCCD_Dataset. 2019.
[21] Mingxing Tan and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks". In: Proceedings of the 36th International Conference on Machine Learning. Vol. 97. Long Beach, California: PMLR, 2019, pp. 6105–6114. DOI: https://doi.org/10.48550/arXiv.1905.11946.
[22] Laith Alzubaidi et al. "Classification of red blood cells in sickle cell anemia using deep convolutional neural network". In: Intelligent Systems Design and Applications. Ed. by Ajith Abraham et al. Vol. 1. Cham: Springer International Publishing, 2020, pp. 6–8. DOI: https://doi.org/10.1007/978-3-030-16657-1_51.
[23] Yasmin M Kassim et al. "Clustering-Based Dual Deep Learning Architecture for Detecting Red Blood Cells in Malaria Diagnostic Smears". In: IEEE Journal of Biomedical and Health Informatics 25.5 (2020), pp. 1735–1746. DOI: https://doi.org/10.1109/JBHI.2020.3034863.
[24] Tatiana Cristina Figueira Polo and Hélio Amante Miot. Use of ROC curves in clinical and experimental studies. 2020. DOI: https://doi.org/10.1590/1677-5449.200186.
[25] Thomas Wolf et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing. 2020. arXiv: 1910.03771 [cs.CL]. URL: https://arxiv.org/abs/1910.03771.
[26] Yassine Barhoumi and Ghulam Rasool. Scopeformer: n-CNN-ViT hybrid model for intracranial hemorrhage classification. 2021. DOI: https://doi.org/10.48550/arXiv.2107.04575.
[27] Ferhat Bozkurt. "Classification of blood cells from blood cell images using dense convolutional network". In: Journal of Science, Technology and Engineering Research 2.2 (2021), pp. 81–88. DOI: https://doi.org/10.53525/jster.1014186.
[28] Alexey Dosovitskiy et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Preprint at https://arxiv.org/abs/2010.11929. 2021.
[29] Mawaddah Harahap et al. "Implementation of Convolutional Neural Network in the classification of red blood cells have affected of malaria". In: Sinkron: jurnal dan penelitian teknik informatika 5.2 (2021), pp. 199–207. DOI: https://doi.org/10.33395/sinkron.v5i2.10713.
[30] Danish Jamil et al. "Diagnosis of gastric cancer using machine learning techniques in healthcare sector: a survey". In: Informatica 45.7 (2021). DOI: https://doi.org/10.31449/inf.v45i7.3633.
[31] JGraph. diagrams.net, draw.io. Oct. 2021. URL: https://www.diagrams.net/.
[32] Zhencun Jiang et al. "Method for diagnosis of acute lymphoblastic leukemia based on ViT-CNN ensemble model". In: Computational Intelligence and Neuroscience 2021.1 (2021), p. 7529893. DOI: https://doi.org/10.1155/2021/7529893.
[33] Korranat Naruenatthanaset et al. Red Blood Cell Segmentation with Overlapping Cell Separation and Classification on Imbalanced Dataset. Preprint at https://arxiv.org/abs/2012.01321. 2021.
[34] Maithra Raghu et al. "Do vision transformers see like convolutional neural networks?" In: Advances in Neural Information Processing Systems 34 (2021), pp. 12116–12128. DOI: https://doi.org/10.48550/arXiv.2108.08810.
[35] Georg Steinbuss et al. "Deep learning for the classification of non-Hodgkin lymphoma on histopathological images". In: Cancers 13.10 (2021), p. 2419. DOI: https://doi.org/10.3390/cancers13102419.
[36] Željko Vujović et al. "Classification model evaluation metrics". In: International Journal of Advanced Computer Science and Applications 12.6 (2021), pp. 599–606. DOI: https://doi.org/10.14569/IJACSA.2021.0120670.
[37] Xufeng Yao et al. "Classification of white blood cells using weighted optimized deformable convolutional neural networks". In: Artificial Cells, Nanomedicine, and Biotechnology 49.1 (2021), pp. 147–155. DOI: https://doi.org/10.1080/21691401.2021.1879823.
[38] Kai Jiang et al. "The encoding method of position embeddings in vision transformer". In: Journal of Visual Communication and Image Representation 89 (2022), p. 103664. DOI: https://doi.org/10.1016/j.jvcir.2022.103664.
[39] Zahra Mousavi Kouzehkanan et al. "A large dataset of white blood cells containing cell locations and types, along with segmented nuclei and cytoplasm". In: Scientific Reports 12.1 (2022), p. 1123. DOI: https://doi.org/10.1038/s41598-021-04426-x.
[40] Dyah Aruming Tyas et al. "Erythrocyte (red blood cell) dataset in thalassemia case". In: Data in Brief 41 (2022), p. 107886. DOI: https://doi.org/10.1016/j.dib.2022.107886.
[41] Mohammed Hamdi et al. "Hybrid Models Based on Fusion Features of a CNN and Handcrafted Features for Accurate Histopathological Image Analysis for Diagnosing Malignant Lymphomas". In: Diagnostics 13.13 (2023), p. 2258. DOI: https://doi.org/10.3390/diagnostics13132258.
[42] Rojina Kashefi et al. Explainability of Vision Transformers: A Comprehensive Review and New Perspectives. Preprint at https://arxiv.org/abs/2311.06786. 2023.
[43] Gireen Naidu, Tranos Zuva, and Elias Mmbongeni Sibanda. "A review of evaluation metrics in machine learning algorithms". In: Computer science on-line conference. Springer, 2023, pp. 15–25. DOI: https://doi.org/10.1007/978-3-031-35314-7_2.
[44] Austin H Routt et al. "Deep ensemble learning enables highly accurate classification of stored red blood cell morphology". In: Scientific Reports 13.1 (2023), p. 3152. DOI: https://doi.org/10.1038/s41598-023-30214-w.
[45] Hongwei Shang et al. "Precision/recall on imbalanced test data". In: International Conference on Artificial Intelligence and Statistics. PMLR, 2023, pp. 9879–9891. URL: https://proceedings.mlr.press/v206/shang23a.html.
[46] Enquan Yang et al. "DRNet: Dual-stage refinement network with boundary inference for RGB-D semantic segmentation of indoor scenes". In: Engineering Applications of Artificial Intelligence 125 (2023), p. 106729. ISSN: 0952-1976. DOI: https://doi.org/10.1016/j.engappai.2023.106729.
[47] Areen K Al-Bashir, Ruba E Khnouf, and Lamis R Bany Issa. "Leukemia classification using different CNN-based algorithms-comparative study". In: Neural Computing and Applications 36.16 (2024), pp. 9313–9328. DOI: https://doi.org/10.1007/s00521-024-09554-9.
[48] Martin Moller. "Efficient training of feed-forward neural networks". In: Neural Network Analysis, Architectures and Applications. CRC Press, 2024, pp. 136–173. DOI: https://doi.org/10.1201/9781003572886-8.
[49] Emine Özgür and Ahmet Saygılı. "A new approach for automatic classification of non-hodgkin lymphoma using deep learning and classical learning methods on histopathological images". In: Neural Computing and Applications 36.32 (2024), pp. 20537–20560. DOI: https://doi.org/10.1007/s00521-024-10229-8.
[50] Sajida Perveen et al. "A framework for early detection of acute lymphoblastic leukemia and its subtypes from peripheral blood smear images using deep ensemble learning technique". In: IEEE Access 12 (2024), pp. 29252–29268. DOI: https://doi.org/10.1109/ACCESS.2024.3368031.
[51] Prakeerth Prasad and Jani Anbarasi L. "Acute Lymphoblastic Leukemia Subtypes Detection using Vision Transformer Model". In: 2024 5th International Conference on Data Intelligence and Cognitive Informatics (ICDICI). 2024, pp. 1413–1418. DOI: https://doi.org/10.1109/ICDICI62993.2024.10810888.
[52] Ruaa Sadoon and Adala Chaid. "Classification of pulmonary diseases using a deep learning stacking ensemble model". In: Informatica 48.14 (2024). DOI: https://doi.org/10.31449/inf.v48i14.6145.
[53] Umair Saeed et al. "DeepLeukNet—A CNN based microscopy adaptation model for acute lymphoblastic leukemia classification". In: Multimedia Tools and Applications 83.7 (2024), pp. 21019–21043. DOI: https://doi.org/10.1007/s11042-023-16191-2.
[54] Md Shahin Ali et al. "A Hybrid VGG16-ViT Approach With Image Processing Techniques for Improved White Blood Cell Classification and Disease Diagnosis: A Retrospective Study". In: Health Science Reports 8.6 (2025), e70859. DOI: https://doi.org/10.1002/hsr2.70859.
[55] Sazzli Kasim et al. "Multiclass leukemia cell classification using hybrid deep learning and machine learning with CNN-based feature extraction". In: Scientific Reports 15.1 (2025), p. 23782. DOI: https://doi.org/10.1038/s41598-025-05585-x.
[56] Aniel Mahendren et al. "White Blood Cells Classification: A Feature-Based Transfer Learning Approach". In: Selected Proceedings from the 2nd International Conference on Intelligent Manufacturing and Robotics, ICIMR 2024, 22-23 August, Suzhou, China. Ed. by Wei Chen et al. Singapore: Springer Nature Singapore, 2025, pp. 757–763. ISBN: 978-981-96-3949-6. DOI: https://doi.org/10.1007/978-981-96-3949-6_63.
[57] Mouna Saadallah. Red Blood Cell Morphology Dataset for Image Classification. Zenodo, Feb. 2025. DOI: https://doi.org/10.5281/14936017. URL: https://zenodo.org/records/14936017.
[58] Vera Sorin et al. "Deep Learning Applications in Lymphoma Imaging". In: Acta Haematologica (2025). DOI: https://doi.org/10.1159/000547427.
[59] KP Swain, SK Swain, and SR Nayak. "Vision Transformer-Based Automated Classification of Acute Lymphoblastic Leukemia". In: 2025 International Conference on Emerging Systems and Intelligent Computing (ESIC). IEEE, 2025, pp. 584–588. DOI: https://doi.org/10.1109/ESIC64052.2025.10962707.
[60] Vishesh Tanwar et al. "Enhancing blood cell diagnosis using hybrid residual and dual block transformer network". In: Bioengineering 12.2 (2025), p. 98. DOI: https://doi.org/10.3390/bioengineering12020098.

https://doi.org/10.31449/inf.v49i16.10050 Informatica 49 (2025) 417–428 417

Deep Learning and Rule-Based Hybrid Model for Enhanced English Composition Scoring Using Attention Mechanisms and Graph Convolutional Networks

Ruimin Li
Zhoukou Vocational and Technical College, Zhoukou 466000, China
E-mail: laogui9029@126.com

Keywords: English essay grading, deep learning, artificial rules, graph convolutional network, wide&deep architecture

Technical paper

Received: July 8, 2025

Through an in-depth exploration of AI technology in the field of education, early automatic scoring systems for English compositions were found to suffer from problems such as a high misjudgment rate and low efficiency. To improve the efficiency, accuracy, and stability of English composition grading, a grading model based on deep learning and manual rules was designed. The research extracts sequence features by introducing attention mechanisms, enhancing contextual correlation analysis, and aggregates global features through graph convolutional networks to extract high-order semantic relationships. Finally, a visual manual scoring rule was designed, which integrates deep semantic features and manual rule features through the Wide&Deep architecture to jointly optimize the scoring results. The experimental outcomes indicated that the area under the precision-recall curve of the research method was 92.3%. In practical application testing, the highest group stability index of the research method was 0.07, in June.
When faced with 600 concurrent requests, the average response time of the proposed method stabilized at 3.4 seconds. These outcomes demonstrate that the English essay scoring model combining deep learning with manual rules exhibits excellent accuracy, speed, and stability, effectively addressing the high misjudgment rate and low efficiency of traditional scoring systems and thereby enhancing the model's reliability.

Povzetek: Razvit je hibridni model za ocenjevanje angleških esejev, ki združuje globoko učenje z ročnimi pravili. Z Word2Vec, mehanizmom pozornosti in GCN zajame lokalne ter globalne semantike, Wide&Deep pa združi pravila in značilke.

1 Introduction

English writing ability is one of the core indicators of language learning, and traditional manual scoring methods face bottlenecks such as low efficiency and strong subjectivity [1, 2]. Early automatic scoring systems relied mainly on rule-based methods that detect surface errors through pre-defined grammar and spelling rules, but they struggled to evaluate the quality of content and logic, resulting in high misjudgment rates [3]. With the advancement of technology, machine learning (ML) algorithms were introduced to consider vocabulary, syntax, and other elements comprehensively through feature engineering; however, they still require a substantial quantity of annotated data, and their generalization ability is insufficient [4]. Existing scoring systems cannot meet the requirements of automatic English composition scoring, and a stable, efficient, and accurate scoring model is urgently needed. Deep learning (DL) models can improve semantic understanding through end-to-end learning, but they lack transparency and have difficulty capturing grammatical details. Artificial rules offer high interpretability but cannot adapt to open-ended content evaluation; the two methods therefore complement each other [5].

In light of these circumstances, to ensure the stability, accuracy, and efficiency of the scoring model, an innovative English composition scoring model based on DL and artificial rules is designed. The research uses the Word2Vec model to convert essay text into a matrix of word vectors, capturing the semantic information of vocabulary. It introduces an attention mechanism and a graph convolutional network to extract local sequence features and semantic graph features, concatenates the two to generate deep semantic features, and constructs a graph adjacency matrix to dynamically capture the relationships between sentences. Artificial rule features are then generated through feature concatenation, and the Wide&Deep architecture is used to fuse the deep semantic features with the artificial rule features. Finally, combined with multi-dimensional manual rule evaluation, the research achieves dynamic comprehensive scoring of the entire English composition. It is anticipated that this methodology will offer a theoretical foundation for grading essays in different languages.

2 Related works

English composition grading is an important part of the educational evaluation system, playing a crucial role in achieving teaching objectives and optimizing teaching strategies. Ramesh et al. proposed AI and ML techniques for automatic essay grading in response to the time-consuming nature and low reliability of manual assessment in the education system; the limitations and trends of current research were analyzed, and the outcomes revealed that the method performed well [6]. Fokides et al. compared the accuracy and qualitative aspects of the corrections and feedback generated by ChatGPT with those of educators on elementary school students' essays written in English; the outcomes revealed that ChatGPT surpassed educators in both the volume and the caliber of output [7]. Shahzad et al. proposed using random forests as classifiers for off-topic essay detection to address the problem of predicting whether an article deviates from its topic; the outcomes revealed high accuracy [8]. Erturk et al. pointed out the low reliability and effectiveness of essay style evaluation tools and linked decreases in essay scores to boredom during labeling; the outcomes revealed that higher levels of boredom were correlated with lower scores [9]. Sharma et al. proposed a system combining handwriting recognition models and automatic essay grading to address the time-consuming grading of handwritten papers in educational environments; the performance of downstream essay scoring tasks was analyzed based on Transformer context embeddings, and the outcomes revealed good performance [10].

Many scholars at home and abroad have also investigated and applied Word2Vec and artificial rules. Mohammed et al. conducted an exhaustive examination of diverse ensemble learning approaches to address time-consuming hyperparameter tuning in DL; various factors affecting the success of integration methods were explained, and the outcomes revealed that the method could provide accurate theoretical support [11]. Tropsha et al. proposed a "deep quantitative structure-activity relationship" model for virtual screening of molecular databases; the outcomes revealed a good effect [12]. Whang et al. proposed a fairness measure and an unfairness mitigation technique to address bias and unfairness in traditional data management; the outcomes revealed good data management performance [13]. Pereira et al. proposed an ML system for multi-animal pose tracking to address the challenge of using DL and computer vision to study the social behavior of multiple animals in natural environments; the outcomes revealed good efficiency and accuracy [14]. Olan et al. designed an explanatory algorithm to address the impact of AI on decision-making in the supply chain field; the composition of interpretable AI and decision support systems was determined during the research, and the outcomes revealed that the method could effectively enhance decision-making ability in the supply chain context [15].

In summary, existing research has contributed well to the technological advancement of English composition grading models, but limitations such as low grading efficiency and significant subjective differences remain. A DL-based automatic scoring model can extract multi-level information such as linguistic features and semantic information, simulating the manual scoring process to a certain extent, while manual rules can handle complex grammar rules and subtle semantic differences. On this basis, a DL and artificial rule-based English composition grading model was designed. The goal is to align with the design standards for automated English composition grading and to significantly boost both the accuracy and efficiency of the grading workflow.

3 Design of English composition scoring model

3.1 Intelligent English composition scoring model based on deep semantic text features

As an important part of the education evaluation system, English composition grading has evolved from traditional manual grading to automated grading. However, existing automated grading systems are mostly based on shallow text features, resulting in significant errors in their grading results [16, 17]. DL models can effectively improve the accuracy and reliability of English composition grading in three aspects: feature extraction, semantic understanding, and grading prediction [18]. The study converts the original English composition text into a numerical word embedding matrix; the text-to-embedding conversion is shown in Equation (1).

E = Embedding(X) (1)

In Equation (1), X represents the input English composition text sequence, E represents the word embedding matrix, and Embedding(·) represents a DL embedding function. Next, the Word2Vec model is used to map each word into a high-dimensional space, capturing its semantic and positional information. Word2Vec has two training models: continuous bag-of-words and skip-gram. The frameworks of the two models are presented in Figure 1.

Figure 1: Framework diagrams of the continuous bag-of-words model (a) and the skip-gram model (b)

As shown in Figure 1, training in both the continuous bag-of-words model and the skip-gram model passes through the input layer and the projection layer, with results produced at the output layer. The continuous bag-of-words model aggregates and maps multiple context features before producing its output, while the skip-gram model maps the features and performs classification output. The study combines the two training models to train on and detect the sequence features and semantic graph features of English compositions, and then scores the compositions based on the detection results.

In extracting sequence features from English compositions, a self-attention mechanism is introduced to break through the sequence limitations of DL models; its expression is shown in Equation (2).

Attention(Q, K, V) = softmax(QK^T / √d_k) V (2)

In Equation (2), Q, K, and V represent the query, key, and value matrices respectively, d_k is the dimension of the key or query vector, and softmax(·) is the normalization function. To enhance the model's ability to express complex sequence patterns, a multi-head attention mechanism is introduced; its calculation is shown in Equation (3).

MultiHead(Q, K, V) = Concat(head_1, …, head_h) W^O, where head_i = Attention(X W_i^Q, X W_i^K, X W_i^V) (3)

In Equation (3), head_i denotes the i-th attention head; W_i^Q, W_i^K, and W_i^V are the projection matrices of the i-th head's query, key, and value vectors; W^O represents the output fusion matrix; and Concat(·) represents the concatenation operation. The number of attention heads is 8, determined through GPU memory optimization tests.

In extracting semantic graph features from English compositions, a semantic graph adjacency matrix is constructed to dynamically capture the relationships within sentences; its construction formula is shown in Equation (4).

A = softmax(E E^T / √D) (4)

In Equation (4), A represents the adjacency matrix, E^T represents the transpose of the word embedding matrix E, and D is the embedding dimension. Node features are then updated iteratively to capture higher-order relationships in the semantic graph; the graph convolution feature propagation formula is shown in Equation (5).

H^(l+1) = σ(D^(−1/2) A D^(−1/2) H^(l) Θ^(l)) (5)

In Equation (5), H^(l) represents the node feature matrix of the l-th layer, D^(−1/2) is used for normalization, Θ^(l) represents the learnable weight matrix, and σ is the activation function, which introduces nonlinearity to enhance the model's expressive power. Finally, the study integrates all node features and aggregates them into a graph-level feature vector to represent the global semantics of the entire English composition.
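As a minimal NumPy sketch of Equations (2)–(5) (the 8-head/64-dimension setting and ReLU-style activation follow the text where stated; all other names, scales, and sizes are our illustrative assumptions, not the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Equation (2): softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def multi_head(X, Wq, Wk, Wv, Wo):
    # Equation (3): per-head attention, concatenated and fused by W^O
    heads = [attention(X @ q, X @ k, X @ v) for q, k, v in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

def gcn_layer(A, H, Theta):
    # Equation (5): H^{l+1} = sigma(D^{-1/2} A D^{-1/2} H^l Theta^l), sigma = ReLU here
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ A @ d_inv_sqrt @ H @ Theta)

rng = np.random.default_rng(0)
n_words, d_model, n_heads, d_k = 12, 300, 8, 64   # 300-dim embeddings, 8 heads of 64
E = rng.standard_normal((n_words, d_model)) * 0.1
Wq = [rng.standard_normal((d_model, d_k)) * 0.05 for _ in range(n_heads)]
Wk = [rng.standard_normal((d_model, d_k)) * 0.05 for _ in range(n_heads)]
Wv = [rng.standard_normal((d_model, d_k)) * 0.05 for _ in range(n_heads)]
Wo = rng.standard_normal((n_heads * d_k, d_model)) * 0.05

h_seq = multi_head(E, Wq, Wk, Wv, Wo)        # sequence features
A = softmax(E @ E.T / np.sqrt(d_model))      # Equation (4): semantic adjacency
H1 = gcn_layer(A, E, rng.standard_normal((d_model, 128)) * 0.05)  # graph features
print(h_seq.shape, H1.shape)                 # (12, 300) (12, 128)
```

Because the adjacency matrix comes out of a row-wise softmax, each row already sums to one, so the degree normalization in Equation (5) reduces to the identity in this sketch.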
The graph-level feature aggregation formula is shown in Equation (6).

z = Σ_{i=1}^{N} α_i h_i^(L), where α_i = exp(w · h_i^(L)) / Σ_j exp(w · h_j^(L)) (6)

In Equation (6), h_i^(L) is the feature vector of the i-th node, L is the total number of layers, z is the graph-level feature vector representing the semantic summary of the entire text, α_i is the attention weight representing the importance of node i to the global features, w is the learnable weight vector used to calculate attention scores, and N is the number of nodes (words). In summary, the detection model structure that integrates the sequence features of English compositions with their semantic graph features is shown in Figure 2.

Figure 2: Detection model integrating sequence and graph features of English compositions

Figure 3: Intelligent scoring model for English compositions based on deep semantic text features

As shown in Figure 2, the detection model integrating English composition sequence features and semantic graph features receives two types of input data, with the semantic graph capturing the semantic relationships between phrases and concepts. The semantic graph and the English composition are both processed by Word2Vec, which converts discrete words into dense, low-dimensional real-valued vectors. Next, semantic graph features are extracted through graph convolutional networks, while sequence features are extracted by introducing attention mechanisms. Feature fusion is then performed: the sequence feature vectors and graph feature vectors extracted along the two parallel paths are concatenated to form deep semantic features. Finally, the model evaluates and outputs the rating results. The fusion detection model overcomes the limited representation ability of a single feature type by fusing two complementary feature representations. The deep semantic text feature expression of the fusion model is shown in Equation (7).

h_deep = h_seq ‖ h_graph (7)

In Equation (7), h_deep represents the deep semantic features of the English composition, ‖ represents the vector concatenation operator, and h_seq and h_graph respectively represent the sequence features and graph features of the English composition. In summary, the intelligent scoring model for English composition based on deep semantic text features is shown in Figure 3.

As shown in Figure 3, the English composition scoring model based on deep semantic text features achieves accurate evaluation by integrating multi-level semantic information. The model first performs structured parsing of the input English essay document, breaking out the title sequence to highlight the article structure and preserving contextual information through node feature integration. In the feature extraction stage, multimodal technology is used to deeply fuse semantic information: on the one hand, the title sequence is embedded with Word2Vec and local sequence features are extracted through the self-attention mechanism; on the other hand, semantic graph nodes model global semantic relationships through graph convolutional networks. The two types of features are further combined with image features to form a unified deep semantic feature vector. Finally, the rater performs regression analysis on the deep semantic features and outputs objective scoring results.

In summary, the implementation details of the entire research framework are as follows. (1) Word2Vec converts English essay texts into dense word vector matrices. The continuous bag-of-words model predicts core words from contextual words: the input layer aggregates multiple contextual word vectors, and the projection layer summarizes them to output core-word probabilities. The skip-gram model predicts contextual words from core words.
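The two pair-construction schemes just described can be sketched in plain Python (our own illustrative helper, not the paper's code; the toy sentence and window size are arbitrary):

```python
def training_pairs(tokens, window=5, mode="skip-gram"):
    # CBOW: predict the core word from its surrounding context words.
    # Skip-gram: predict each context word from the core word.
    pairs = []
    for i, core in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if mode == "cbow":
            pairs.append((tuple(context), core))
        else:
            pairs.extend((core, c) for c in context)
    return pairs

tokens = "the model scores the essay".split()
print(training_pairs(tokens, window=1, mode="cbow")[0])        # (('model',), 'the')
print(training_pairs(tokens, window=1, mode="skip-gram")[:2])  # [('the', 'model'), ('model', 'the')]
```

In practice the same pairs would be fed to a trained embedding library such as Gensim, which the experimental environment lists (Gensim 4.3).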
Both models undergo negative-sampling optimization, with 300-dimensional embeddings and a context window of size 5. (2) During attention-mechanism feature extraction, the input word vector matrix is linearly transformed to generate the query, key, and value matrices, each with 64 dimensions. The multi-head architecture employs 8 heads; each head computes attention independently, and the concatenated outputs are linearly fused to generate the final sequence features. (3) In graph convolutional network semantic feature extraction, the adjacency matrix embedding dimension is 300, and the feature propagation and aggregation process learns 128-dimensional weight matrices across 2 layers.

3.2 Intelligent English composition scoring model combined with artificial rules

Although the English composition grading model based on deep semantic text features can grade English compositions effectively, it generally relies on manually defined grading templates; candidates can avoid deduction types through simple writing techniques, and the model lacks interpretability [19]. In composition checking, artificial rules are usually expressed in a formal language and detected automatically by natural language processing tools. An English scoring model combined with artificial rules can effectively solve the interpretability problem of DL models, so artificial rules are further introduced into the study [20]. Based on manual rules that quantify the basic language quality of sentences, the basic error-scoring formula for English compositions is shown in Equation (8).

E_s = F / (1 + λ(C_spell + C_gram)) (8)

In Equation (8), E_s is the score for incorrect sentences, with a maximum score of F; C_spell is the number of spelling errors; C_gram is the number of grammar errors; and λ is the error penalty coefficient, set to 0.1, the value at which grid-search verification found the lowest error rate. Continuing, the importance of each dimension is balanced through artificial rules; the formula weighting the multidimensional excellence of a sentence is shown in Equation (9).

Q_s = ω_1 V_s + ω_2 G_s + ω_3 T_s + ω_4 P_s (9)

In Equation (9), Q_s represents the overall excellence score of the sentence, V_s the vocabulary score, G_s the syntactic complexity score, T_s the part-of-speech diversity score, P_s the rhetoric score, and ω_i the artificial rule weights. Then, to evaluate the logical rigor of English essay paragraphs, a scoring formula for paragraph cohesion strength is introduced; its expression is shown in Equation (10).

C_p = R_cohere · (Σ_{k=1}^{n} β_k I(conn_k)) / (N(1 + log L)) (10)

In Equation (10), C_p represents the coherence score of the paragraph, I(conn_k) the validity indicator function of the k-th connector, β_k the weight of the connector, R_cohere the semantic coherence ratio, N the number of sentences, and L the length of the paragraph. Finally, the study achieves dynamic comprehensive scoring of the entire English composition through multi-dimensional manual rule evaluation; the scoring formula is shown in Equation (11).

Score = α(Σ_{j=1}^{m} Q_s^j / m) + β(Σ_{p=1}^{l} C_p / l) + γ(Sim_content + Sim_str) (11)

In Equation (11), Score represents the final score of the composition, Q_s^j the excellence score of the j-th sentence, C_p the coherence score of the p-th paragraph, and Sim_content and Sim_str the similarity of content and structure; the weights α, β, and γ satisfy α + β + γ = 1. The artificial rules are constructed from expert knowledge, quantifying sentence-level errors and sentence excellence through predefined weights to achieve digital transformation. The primary linguistic features targeted include surface errors, sentence-level errors, and paragraph-level errors. By being integrated with deep semantic features through a Wide&Deep architecture, the rules enhance interpretability while capturing subtle errors and reducing subjective variation. Experimental validation demonstrates their effectiveness in lowering bias values and misjudgment rates and in improving scoring stability. In summary, the feature extraction framework for the manual scoring rules of English compositions is shown in Figure 4.

Figure 4: Feature extraction of artificial rules for English composition

Figure 5: Network structure of English composition error detection combined with artificial rules
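The manual-rule formulas in Equations (8)–(11) above can be sketched in plain Python (λ = 0.1 follows the text; the sub-score weights, the α/β/γ split, and all input values are illustrative assumptions):

```python
import math

def sentence_error_score(F, c_spell, c_gram, lam=0.1):
    # Equation (8): E_s = F / (1 + lam * (C_spell + C_gram)), lam = 0.1 per the text
    return F / (1.0 + lam * (c_spell + c_gram))

def sentence_quality(v, g, t, p, w=(0.3, 0.3, 0.2, 0.2)):
    # Equation (9): Q_s as a weighted sum of the four sub-scores (weights assumed)
    return w[0] * v + w[1] * g + w[2] * t + w[3] * p

def paragraph_coherence(connectors, r_cohere, n_sentences, length):
    # Equation (10): C_p = R_cohere * sum(beta_k * I(conn_k)) / (N * (1 + log L))
    weighted = sum(beta for beta, valid in connectors if valid)
    return r_cohere * weighted / (n_sentences * (1.0 + math.log(length)))

def final_score(q_scores, c_scores, sim_content, sim_str, a=0.4, b=0.4, c=0.2):
    # Equation (11): weighted combination with a + b + c = 1 (split assumed)
    assert abs(a + b + c - 1.0) < 1e-9
    return (a * sum(q_scores) / len(q_scores)
            + b * sum(c_scores) / len(c_scores)
            + c * (sim_content + sim_str))

e_s = sentence_error_score(F=10.0, c_spell=2, c_gram=1)   # 10 / 1.3 ≈ 7.69
q_s = sentence_quality(8, 7, 6, 9)
c_p = paragraph_coherence([(0.5, True), (0.3, False)], r_cohere=0.9,
                          n_sentences=5, length=120)
print(round(e_s, 2))  # 7.69
```

The invalid connector in the example contributes nothing to C_p, which is exactly the role of the indicator function I(conn_k).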
As shown in Figure 4, in the feature extraction framework of manual scoring rules for English compositions, structured manual scoring rules are input together with the original English composition text as initial data. The manual rules are then decomposed into different error types, and each type of rule is quantified as a numerical vector to achieve the digital transformation of expert knowledge. The artificial rule vector is then concatenated with the semantic vector of the composition text to form a mixed feature combining artificial rules and text semantics; after processing, the features of the manual scoring rules are output. The study achieves the organic integration of artificial rules and DL models by converting discrete artificial rules into continuous features; the specific expression is shown in Equation (12).

h_expert = σ(W(v_rud ‖ x_i) + b) (12)

In Equation (12), h_expert represents the artificial rule feature, v_rud represents the set of error types, x_i represents the artificial rule vector for error type i, and b represents the bias term. Next, the study uses the Wide&Deep structure to fuse the shallow artificial rule features with the deep semantic text features, achieving the final error classification prediction; the fusion formula is shown in Equation (13).

y = Softmax(W_wide h_expert + W_deep h_deep + b) (13)

In Equation (13), y represents the rating result, and W_wide and W_deep represent the weight matrices of the wide and deep parts. In summary, the network structure of English composition error detection combined with manual rules is shown in Figure 5.

As shown in Figure 5, the error detection network combined with manual rules improves detection accuracy and interpretability through dual-channel feature fusion. The model receives dual-source inputs: the manual scoring rules are decomposed and vectorized into quantifiable rule vectors covering error types such as grammar, logic, and rhetoric, while semantic graphs are simultaneously constructed for the original English compositions, extracting logical relationships between sentences and performing sequence analysis to capture word-order features. The deep semantic features obtained from the dual-source input are then evaluated together with the artificial rule features, and the results are judged. The study introduces a binary cross-entropy loss to measure the difference between misclassified predictions and true labels; the loss function is shown in Equation (14).

Loss = −ŷ log(y) − (1 − ŷ) log(1 − y) (14)

In Equation (14), Loss represents the loss value, ŷ represents the true label of the sample, and y represents the predicted probability output of the model. Finally, the study evaluates model performance by calculating accuracy, as shown in Equation (15).

Accuracy = (TP + TN) / (TP + FP + TN + FN) (15)

In Equation (15), Accuracy represents the accuracy of the model, TP and TN are the numbers of essays correctly rated as low or high by the model, and FP and FN are the numbers of essays incorrectly rated as low or high. In summary, the scoring process of the English composition scoring model based on DL and artificial rules is shown in Figure 6.

As shown in Figure 6, the English essay scoring model based on DL and artificial rules improves scoring accuracy and interpretability through dual-channel feature collaboration. The model takes English composition text and manual scoring rules as dual-source inputs: on the one hand, it generates a semantic graph through multi-level parsing of the original text; on the other hand, it breaks the document down into sequential features according to its structure, preserving the framework information of the article. In the feature extraction stage, a bimodal DL architecture is adopted. After Word2Vec vectorization of the semantic graph nodes, a graph convolutional network models global semantic relationships and outputs deep features, while sequence nodes extract local language patterns through self-attention mechanisms to generate sequence features; the two are concatenated to form deep semantic features. In parallel, decomposing the textual manual rules into quantifiable dimensional vectors enables the digital transformation of expert knowledge. Finally, the artificial rule features are optimized using binary cross-entropy and combined with the deep semantic features to generate rule-enhanced deep features; the rater then performs regression analysis on these features and outputs the final English essay grading results.

Figure 6: Scoring process of the English composition scoring model based on DL and artificial rules
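Equations (12)–(15) can be sketched end-to-end in NumPy (all dimensions, the five-class output, and the confusion-matrix counts are illustrative assumptions, not the paper's values):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)

# Equation (12): rule features from concatenated rule vectors (shapes assumed)
v_rud = rng.standard_normal(16)               # encoding of the error-type set
x_i = rng.standard_normal(16)                 # rule vector for one error type
W_r = rng.standard_normal((32, 32)) * 0.1
h_expert = sigmoid(W_r @ np.concatenate([v_rud, x_i]) + np.zeros(32))

# Equation (13): Wide&Deep fusion of rule and deep semantic features
h_deep = rng.standard_normal(256)             # concatenated features, Equation (7)
W_wide = rng.standard_normal((5, 32)) * 0.1   # 5 rating classes, assumed
W_deep = rng.standard_normal((5, 256)) * 0.1
y = softmax(W_wide @ h_expert + W_deep @ h_deep + np.zeros(5))

# Equation (14): binary cross-entropy for one detection output
def bce(y_true, y_pred, eps=1e-12):
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Equation (15): accuracy from confusion-matrix counts
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + fp + tn + fn)

print(round(float(y.sum()), 6), accuracy(45, 40, 8, 7))  # 1.0 0.85
```

The softmax output is a proper distribution over rating classes, and the accuracy helper reproduces Equation (15) directly from counts.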
4 Validation of English composition grading model based on DL and artificial rules

4.1 Performance testing of the English composition scoring model based on DL and artificial rules

To confirm the capability of the English essay grading model based on DL and artificial rules, a simulation model was constructed for testing. The testing environment and specific configuration are presented in Table 1.

Table 1: Test environment and specific configuration

Testing environment | Specific configuration
GPU | NVIDIA Tesla V100/A100
CPU | Intel Xeon Gold 6248R
Memory | 256 GB DDR4
Storage | 2 TB NVMe SSD + 10 TB HDD
DL framework | PyTorch 1.12 / TensorFlow 2.10
Feature engineering tools | Scikit-learn 1.2 + Gensim 4.3
Support for large models | Transformers 4.28

As shown in Table 1, this configuration was used for performance testing on the Kaggle ASAP dataset. The research method was compared with the Integrated Classification Scoring Algorithm (ICSA), the Linear Regression Model (LRM), and the Hierarchical Attention Model (HAM). The precision-recall curves and curve areas of the four methods were compared; the results are presented in Figure 7.

Figure 7: Precision-recall curves of the four methods: (a) research method, (b) ICSA, (c) LRM, (d) HAM

As shown in Figure 7, the shape and area of the precision-recall curves differ across methods. In Figure 7(a), the curve of the research method is close to a rectangle, with a curve area of 92.3%. In Figure 7(b), the curve area of the ICSA algorithm is 71.6%. In Figure 7(c), the curve of the LRM model is of the low-precision, high-recall type, which is prone to false positives. In Figure 7(d), the curve of the HAM model is of the high-precision, low-recall type, which is prone to missed detections. Overall, the research method has higher accuracy and detection coverage than the comparison methods.

The mean absolute error (MAE) of the scoring results of the four methods under different essay word counts, as well as the scoring time under different paragraph counts, were then compared; the outcomes are presented in Figure 8.

Figure 8: MAE for compositions of different lengths (a) and rating efficiency (b)

In Figure 8(a), the MAEs of all four methods increase with the number of words in the composition, and the research method's MAE increases least: at 100 words its MAE is 0.25, and at 350 words it is 0.52, an increase of only 0.27. The MAEs of the other three methods are significantly greater at every word count. In Figure 8(b), the scoring time of all four methods increases with the number of paragraphs in the essay. With a single paragraph, the scoring time of the research method is 32 ms; with five paragraphs, it is 42 ms. The scoring times of the other three methods are significantly greater at every paragraph count. Overall, the research method is more robust than the comparison methods. In conclusion, the English essay grading model based on DL and artificial rules has high reliability, high accuracy, and good robustness.

After validating the performance of the research methodology, the study further investigated the synergistic effects of the fusion architecture through ablation experiments. First, independent testing of the deep model showed that removing the manual rules reduced grammatical error detection accuracy, demonstrating their constraint effect on surface errors. Next, independent testing of the rule model showed increased semantic-coherence score deviations on long texts when the graph convolutional network was removed, demonstrating the deep model's capability to capture higher-order semantics. Finally, dual-stream feature contribution analysis using SHAP values showed that manual rule features contributed mainly to grammatical and spelling error detection, while deep semantic features played a significant role in content-logic scoring. These ablation results confirmed the complementary design of the "feature perception-regulation constraint" architecture in the research methodology.

4.2 Practical application effect of the English composition scoring model based on DL and artificial rules

Having verified the performance of the English essay grading model based on DL and artificial rules, further research was conducted to ascertain the efficacy of the method in practical application. The study used the IELTS Writing Task 2 dataset to build a modular hierarchical architecture experimental platform.
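The precision-recall areas used to compare the methods in Figure 7 can be reproduced conceptually with a small NumPy sweep (synthetic scores only; the integration scheme is a simple trapezoidal stand-in, not the paper's exact procedure):

```python
import numpy as np

def pr_auc(scores, labels):
    # Sweep thresholds from the highest score down, tracking precision and
    # recall at each cut, then integrate precision over recall (trapezoids).
    order = np.argsort(-scores)
    hits = labels[order]
    tp = np.cumsum(hits)
    precision = tp / np.arange(1, len(hits) + 1)
    recall = tp / hits.sum()
    return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2))

rng = np.random.default_rng(3)
labels = rng.integers(0, 2, 500)
informative = labels + 0.3 * rng.standard_normal(500)   # separates the classes
random_scores = rng.standard_normal(500)                # carries no information
print(pr_auc(informative, labels) > pr_auc(random_scores, labels))  # True
```

An informative scorer yields a near-rectangular curve with area close to 1, while an uninformative one hovers near the positive-class prevalence, mirroring the qualitative gap between Figure 7(a) and Figure 7(b)-(d).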
The research method was compared with ICSA, LRM, and HAM, and a semantic-depth index was supplemented: the content-dimension deviation of BERT on the IELTS dataset could be reduced to 0.42, better than the 0.67 of the research model, while the research model retains the advantage of being lightweight. The study further compared Transformer-based pre-trained models. In IELTS content-dimension scoring, BERT exhibited lower deviation values than the research methodology; however, its reliance on billions of parameters resulted in significantly longer response times. Notably, with rules incorporated, the research methodology demonstrated markedly higher accuracy than BERT in detecting grammatical errors. These findings indicate that, compared with cutting-edge technologies, the research methodology offers competitive semantic understanding and superior error-detection specificity. Then, the group stability index of the four methods on the English composition data of the first six months was scored, and the deviation values under different scoring dimensions were compared; the results are shown in Figure 9.

Figure 9: Model generalization ability (a) and sub-item scoring deviation (b)

In Figure 9(a), the critical value of the group stability index for English composition scoring is 0.17. The group stability index of the research method on each month's composition data remained below this critical value, with its highest value of 0.07 in June and its lowest of 0.02 in January; the stability indices of the other three methods were significantly higher. In Figure 9(b), the deviation value of the research method was 0.67 in the content dimension, 0.99 in the language dimension, 0.33 in the structure dimension, and 0.82 in the coherence dimension; the deviation values of the other three methods were significantly greater in every scoring dimension. Overall, the research method has better generalization ability and higher accuracy than the comparison methods.

The sensitivity of the four methods in identifying excellent compositions and their ability to capture advanced vocabulary were then compared; the results are presented in Figure 10.

Figure 10: High-score essay misjudgment rate (a) and vocabulary richness recognition ability (b)

As shown in Figure 10(a), the misjudgment rates of the four methods for high-scoring essays differ across score thresholds. The overall misjudgment rate of the research method for high-scoring essays was below 20%: its highest rate, 17.8%, occurred at a score threshold of 21 points, and its lowest, 3.2%, at thresholds of 24-25 points. The misjudgment rates of the other three methods were significantly higher at every threshold. As shown in Figure 10(b), the agreement between automatic and manual vocabulary scoring differed across the four methods. The vocabulary scores of the research method were distributed close to the diagonal, indicating high consistency with manual vocabulary scoring, whereas the distributions of the other three methods deviated significantly from the manual scores, giving lower accuracy. Overall, the research method has a lower false positive rate and better scoring performance than the comparison methods.

The four methods were also compared on the accuracy of scoring different error types and on average response time under different numbers of concurrent requests, as shown in Figure 11.

Figure 11: Error type processing time (a) and concurrent processing capability (b)

As shown in Figure 11(a), the research method achieved 99.2% accuracy in scoring grammatical errors, 98.7% for spelling errors, 99.0% for logical errors, and 97.9% for collocation errors; the other three methods showed significantly lower accuracy for these error types. In Figure 11(b), the average response time of all four methods gradually increased with the number of concurrent requests, with the research method showing the smoothest increase.

…application potential. Future studies could integrate Transformer pre-trained models to verify model stability and deviations across multilingual essay datasets (e.g., French, Chinese), evaluate cross-linguistic rule adaptability, and improve cross-linguistic performance and transferability. Additionally, the research could incorporate eye-tracking technology into multi-modal deep understanding frameworks. By recording eye movements during writing processes, it can analyze…
When faced with 600 authors' attention allocation patterns. Combined with concurrent requests, the average response time of the keystroke logs, this approach could quantify writing research method reached a stable value of 3.4 seconds. fluency and cognitive load, supplementing process However, the average response time of the other three dynamics that textual features cannot capture. However, methods showed a significantly greater increase trend this study is the first to migrate the Wide&Deep than the research method. Overall, compared to architecture from recommendation system to essay comparative methods, research methods had better scoring field. Through semantic drift of DL with rule resource allocation capabilities and scoring performance. feature constraints, it provides a new idea for the Overall, the English essay grading model proposed by the interpretability of AI education products. research based on DL and artificial rules had good generalization ability, accuracy, and performance. References 5 Conclusion [1] Del Gobbo E, Guarino A, Cafarelli B, Grilli L. GradeAid: A framework for automatic short To address the issues of high misjudgment rates and answers grading in educational contexts—design, instability in existing English essay automatic scoring implementation and evaluation. Knowledge and systems, this study innovatively proposes an English Information Systems, 2023, 65(10): 4295-4334. essay scoring model combining DL with manual rules. DOI: 10.1007/s10115-023-01892-9. The research methodology extracts sequence features and [2] Wang Q. The use of semantic similarity tools in semantic graph features from English essays, integrating automated content scoring of fact-based essays them with manual rule features to construct a "feature written by EFL learners. Education and Information perception-rule constraint-joint decision" fusion Technologies, 2022, 27(9): 13021-13049. DOI: architecture for stable and accurate scoring. 
Experimental 10.1007/s10639-022-11179-1. results show that when the essay contains 100 words, the [3] Geçkin V, Kızıltaş E, Çınar Ç. Assessing second- average absolute error of the scoring method is 0.25; language academic writing: AI vs. Human raters. when the essay contains 350 words, the average absolute Journal of Educational Technology and Online error increases to 0.52; and when the essay consists of 5 Learning, 2023, 6(4): 1096-1108. DOI: paragraphs, the scoring time reaches 42ms. In practical 10.31681/jetol.1336599. application tests, the method shows 0.67 deviation in [4] Theodosiou A A, Read R C. Artificial intelligence, content dimension scoring, 0.99 deviation in language machine learning and deep learning: Potential dimension scoring, 0.33 deviation in structure dimension resources for the infection clinician. Journal of scoring, and 0.82 deviation in coherence dimension Infection, 2023, 87(4): 287-294. DOI: 10.1016/j. scoring. The method achieved 99.2% accuracy rate for Jinf.2023.07.006. grammatical errors, 98.7% accuracy rate for spelling [5] Wang J, Wang S, Zhang Y. Deep learning on medical errors, 99.0% accuracy rate for logical errors, and 97.9% image analysis. CAAI Transactions on Intelligence accuracy rate for collocation errors. Overall, the proposed Technology, 2025, 10(1): 1-35. method demonstrated excellent scoring accuracy, DOI:10.1049/cit2.12356. robustness, and stability. The research findings failed to [6] Ramesh D, Sanampudi S K. An automated essay quantify the contribution ratios of DL and rule-based scoring systems: A systematic literature review. approaches to explainability. The test datasets were Artificial Intelligence Review, 2022, 55(3): 2495- limited to IELTS/Kaggle materials, which did not 2527. DOI: 10.1007/s10462-021-10068-2. validate the generalization capabilities of open-domain [7] Fokides E, Peristeraki E. 
Comparing ChatGPTs essays and consequently compromised practical correction and feedback comments with that of applicability. Moreover, the methodology primarily educators in the context of primary students short relied on Word2Vec and traditional attention mechanisms essays written in English and Greek. Education and for feature extraction. While effective in English essay Information Technologies, 2025, 30(2): 2577-2621. scoring, the static embedding model of Word2Vec lacked DOI: 10.1007/s10639-024-12912-8. contextual sensitivity, potentially limiting semantic depth [8] Shahzad A, Wali A. Computerization of off-topic comprehension and cross-linguistic transfer capabilities. essay detection: a possibility? Education and Modern Transformer models, however, provide superior Information Technologies, 2022, 27(4): 5737-5747. contextual representation and enhanced cross-linguistic DOI: 10.1007/s10639-021-10863-y. 428 Informatica 49 (2025) 417–428 R. Li [9] Erturk S, van Tilburg W A P, Igou E R. Off the mark: supply chain decision support making. Production Repetitive marking undermines essay evaluations Planning & Control, 2025, 36(6): 808-819. due to boredom. Motivation and Emotion, 2022, DOI:10.1080/09537287.2024.2313514. 46(2): 264-275. DOI: 10.1007/s11031-022-09929-2. [16] Bhat M, Rabindranath M, Chara B S, Simonetto D [10] Sharma A, Katlaa R, Kaur G, Jayagopi D B. Full- A. Artificial intelligence, machine learning, and page handwriting recognition and automated essay deep learning in liver transplantation. Journal of scoring for in-the-wild essays. Multimedia Tools hepatology, 2023, 78(6): 1216-1233. DOI: and Applications, 2023, 82(23): 35253-35276. DOI: 10.1016/j. Jhep.2023.01.006. 10.1007/s11042-023-14558-z. [17] Simon K, Vicent M, Addah K, Bamutura D, Atwiine [11] Mohammed A, Kora R. A comprehensive review on B, Nanjebe D, Mukama A O. Comparison of deep ensemble deep learning: Opportunities and learning techniques in detection of sickle cell challenges. 
Journal of King Saud University- disease. AIA, 2023, 1(4):252-259. DOI: https://doi. Computer and Information Sciences, 2023, 35(2): Org/10.47852/bonviewAIA3202853. 757-774. DOI: 10.1016/j. Jksuci.2023.01.014. [18] K. Bhosle, V. Musande. Evaluation of deep learning [12] Tropsha A, Isayev O, Varnek A, Schneider G, CNN Model for recognition of Devanagari digit. Cherkasov A. Integrating QSAR modelling and Applied Artificial Intelligence. 2023, 1(2): 114-118. deep learning in drug discovery: The emergence of DOI: 10.47852/bonviewAIA3202441. deep QSAR. Nature Reviews Drug Discovery, 2024, [19] Zamfiroiu A, Vasile D, Savu D. ChatGPT–a 23(2): 141-155. DOI: 10.1038/s41573-023-00832-0 systematic review of published research papers. [13] Whang S E, Roh Y, Song H, Lee J G. Data collection Informatica Economica, 2023, 27(1): 5-16. DOI: and quality challenges in deep learning: A data- 10.24818/issn14531305/27.1.2023.01. centric ai perspective. The VLDB Journal, 2023, [20] Didimo W, Grilli L, Liotta G, Montecchiani F. 32(4): 791-813. DOI: 10.1007/s00778-022-00775-9. Efficient and trustworthy decision making through [14] Pereira T D, Tabris N, Matsliah A, Turner D M, Li J, human-in-the-loop visual analytics: A case study on Ravindranath S, et al. SLEAP: A deep learning tax risk assessment. Rivista italiana di informatica e system for multi-animal pose tracking. Nature diritto, 2022, 4(2): 15-21. DOI: 10.32091/RIID0092. methods, 2022, 19(4): 486-495. DOI: 10.1038/s41592-022-01426-1. [15] Olan F, Spanaki K, Ahmed W, Zhao G. 
https://doi.org/10.31449/inf.v46i16.9736 Informatica 49 (2025) 429–440 429

Integrating DDPG and QPSO for Multi-Objective Optimization in High Proportion Renewable Energy Power Dispatch Systems

Xu'an Qiao1*, Chaofan Liu2
1School of Aeronautics, Chongqing City Vocational College, Chongqing 402160, China
2School of Physics Sciences, University of Science and Technology of China, Hefei 230026, China
E-mail: qiaoxuan109_2023@126.com, Liuchaofan1980_622@126.com
*Corresponding author

Keywords: power system, dispatch optimization, renewable energy, DDPG, heuristic algorithm

Received: June 16, 2025

This study proposes a novel dispatch optimization model that integrates deep deterministic policy gradient (DDPG) and quantum particle swarm optimization (QPSO) to address the challenges posed by high proportions of renewable energy in power systems. The proposed multi-objective optimization framework considers system cost reduction, supply-demand balance, and dynamic adaptability to renewable energy fluctuations. Experimental results on the IEEE 30-bus and 118-bus systems demonstrated significant improvements: the method reduced total system costs by 13.6% and 11.4%, respectively, increased supply reliability to 97.1%, achieved an energy utilization rate of 94.85%, and minimized frequency deviation to 1.25 Hz. The optimization time was also reduced by 58.3 seconds. The research results have important practical value in improving power system economy, reliability, and dynamic adaptability, and can provide efficient and reliable technical support for power dispatch planning, load management, and real-time control under high-percentage renewable energy scenarios.

Povzetek: Študija predlaga hibridni model za optimizacijo razporejanja (dispečanja) v omrežjih z visokim deležem obnovljivih virov, ki združuje izboljšani DDPG in QPSO. Model zmanjša stroške, izboljša zanesljivost in stabilnost ter poveča izrabo energije. Preizkusi na IEEE sistemih potrjujejo visoko učinkovitost.

1 Introduction

With the continuous adjustment and optimization of the global energy structure, carbon peaking and carbon neutrality targets have become important strategies for countries around the world to cope with climate change and achieve sustainable development. Because of their clean and low-carbon benefits, renewable energy (RE) sources like solar and wind have been widely deployed in this setting [1]. However, the operation and scheduling of the conventional power system (PS) have been severely hampered by the widespread use of wind, solar, and other high percentage (HP) RE sources. To begin with, RE is highly volatile and erratic. Its output is affected by natural conditions, including wind speed and light, and carries large uncertainty [2].

In addition, current PS scheduling relies mainly on a phased sequential scheduling approach: unit commitment (UC) is performed first to determine the start/stop status of the units; economic dispatch (ED) is then performed to optimize the unit output; finally, real-time regulation is performed through automatic generation control (AGC) [6]. Although this staged dispatch model is simple to operate, it suffers from problems such as response delays between the different dispatch modules. In view of this, the study proposes a multilevel cooperative dispatch model for HP renewable energy power systems (REPS) and introduces heuristic algorithms to accelerate the solution of complex systems. The study aims to overcome the limitations of the traditional sequential scheduling method and to improve the economy, reliability, and feasibility of system operation through refined modeling and cooperative optimization.
This uncertainty subjects the load balance and stability of the system to shocks, and the PS is prone to supply-demand imbalance during peak power demand or when wind and solar resources are insufficient [3]. Second, the traditional PS, which relies on precise forecasts of load and generation capacity in its scheduling model, becomes inadequate as the proportion of RE in the PS increases. Due to the significant impact of unstable factors on RE generation capacity, there is a substantial discrepancy between actual and forecasted values. This discrepancy makes short-term PS scheduling more challenging and complex [4-5].

This study's novel contribution is its proposal of a joint heuristic algorithm that combines improved deep deterministic policy gradient (DDPG) and quantum particle swarm optimization (QPSO) algorithms to optimize scheduling in high-proportion REPS. This method improves the convergence speed and stability on complex, multi-constraint, multi-timescale problems by introducing dual experience pooling and time-decaying exploration strategies. The proposed mathematical model addresses the inefficiencies and local optima of traditional models and provides a more efficient and reliable solution for PS scheduling optimization.

2 Related works

The key to ensuring that the PS operates steadily is PS schedule optimization. The optimization method directly impacts the stability of the PS and its adaptability to RE fluctuations. These are essential to the functioning and advancement of contemporary PS, as well as to its economic efficiency. Therefore, many scholars have carried out research on PS scheduling optimization.

For the optimal reactive power scheduling problem in PS, M. Abd-El Wahab et al. suggested a hybrid method called augmented Jaya and artificial ecosystem-based optimization, which improved system stability, economic viability, and overall efficiency [7]. A nonconvex mixed integer and quadratic restricted planning technique was presented by Cox J L et al. to solve the challenge of optimizing a centralized solar power plant's profitability under changing solar resources. The method improved the solvability of the problem through exact and approximation techniques, thus enabling operational scheduling optimization in real-time decision support [8]. To address the issue of system security and economic cost over time in microgrid scheduling, Zhang et al. suggested a multi-timescale scheduling model that incorporated load voltage and frequency dynamics. To minimize economic cost while maintaining voltage and frequency stability, the study converted it into a multi-objective optimization problem that took into account economic cost, voltage deviation, and frequency stability. This improved the microgrid dispatch's efficiency and dependability [9]. To address the global issues brought on by the recent explosive increase in the demand for electricity, Hou et al. developed an integrated day-ahead multi-objective microgrid optimization framework. The framework produced more affordable, dependable, and ecologically friendly power supply services by combining demand-side management, forecasting methods, and economic-environmental dispatch [10].

Large-scale access of RE sources to the PS affects the output characteristics of wind and photovoltaic energy sources. These sources exhibit strong intermittency, randomness, and volatility due to weather, climate, and other external natural factors. These challenges have led to an urgent need for innovation and optimization of existing dispatch methods to adapt to the new situation of HP RE access. A mixed-integer linear programming method was put forward by Shirzadi et al. to address the issue of enhancing the efficiency and dependability of RE systems. The study optimized the PS's daily operating expenses and system resilience by combining a unique hybrid model with deep learning and statistical modeling to forecast the load demand and wind power output (PO) for the ensuing three days [11]. Due to the impact of the volatility of wind and solar power generation on PS operation, Guo et al. proposed multi-stage optimization, online optimization, and multi-timescale optimization for RE integration. This study realized the strategic scheduling and control of energy storage units and improved the efficiency of RE integration in the power grid [12]. By proposing a new economic low-carbon clean PS dispatch model that incorporates power-to-gas technology, Cui et al. addressed the issue of increasing the grid's capacity to absorb wind power. This model integrated the effects of multiple price factors, resulting in low-carbon PS operation and cost optimization [13]. An enhanced jellyfish search optimization technique was presented by Gami et al. to solve the optimal reactive power dispatch problem in HP renewable PSs. By improving the algorithm's exploration and development stages, the study successfully optimized the PS's most secure and stable state [14]. This allowed the PS to operate under both deterministic and probabilistic load demands and RE resource states.

In summary, existing research has made significant progress in PS scheduling optimization. However, there are still deficiencies for HP RE access, such as insufficient consideration of the volatility and stochastic characteristics of RE, few studies on the collaborative scheduling of multi-timescale modules, and imperfect uncertainty handling methods. Therefore, the study proposes to construct a mathematical model of HP REPS scheduling optimization and to introduce a heuristic algorithm to solve it. The innovation of the study is to propose a multi-module cooperative optimization framework to accurately deal with uncertainty and extreme scenarios. Meanwhile, the optimization algorithm improves solution efficiency and provides new ideas for complex PS scheduling.

3 Methods and materials

This section provides a detailed description of the PS scheduling optimization method proposed in the study. The method consists of a scheduling optimization mathematical model and a scheduling optimization heuristic algorithm. The combination of the two effectively improves the scheduling efficiency and stability of a PS with HP RE access.

3.1 Mathematical model construction for power system scheduling optimization

The PS suffers scheduling complexity issues brought on by volatility and uncertainty as a result of the extensive access of HP RE sources. Moreover, the conventional phased scheduling approach finds it challenging to satisfy the needs of system stability and economy [15-16]. Therefore, the study proposes a PS scheduling optimization method for HP RE access. The method has two main cores: a mathematical model that achieves multi-module co-optimization by comprehensively considering system costs, constraints, and uncertainties, and a heuristic algorithm that efficiently solves complex optimization problems, taking into account both global search and local fine optimization. Figure 1 depicts the method's general framework.
improved the efficiency of RE integration in the power Integrating DDPG and QPSO for Multi-Objective… Informatica 49 (2025) 429–440 431 Input Module 1 Mathematical Model 2 Construction Module Load Demand Forecast Objective Function Renewable Energy Output Scenarios Constraints Various parameters Uncertainty Handling Output Module 4 Heuristic Algorithm 3 Design Module Iterative Optimized Scheduling Plan Initialization Phase Optimization Phase Performance Metrics Optimal Termination Solution Output Condition Evaluation Figure 1: Overall framework of power system scheduling optimization method Real-Time Operational Feedback Optimization UC On/Off ED AGC Base Module Status Input Module Module Load Input Generator Real-Time Power UC Decisions Output Dispatch Adjustment Short-Term Long-Term Decisions Medium-Term Dispatch Real-Time Adjustment Objective Constraints Function Optimized Results Figure 2: Closed-loop sequential optimization process among UC, ED, and AGC modules Four components make up the general architecture of including unit start/stop status, power allocation, and the PS scheduling optimization approach suggested in the standby capacity configuration. It also evaluates the study, as shown in Figure 1: the input module, the output economy and stability of the scheduling scheme through module, the heuristic algorithm design module, and the performance indicators. mathematical model construction module. The input The mathematical model developed in this study module contains load demand forecast, RE output differs from the traditional stage-wise sequential scenarios, and various parameters as support. The scheduling model in terms of the scheduling optimization mathematical model construction module is then approach. 
The proposed model forms a closed-loop responsible for constructing the PS scheduling sequential scheduling framework with dynamic feedback optimization model based on three main foundations: coupling among modules by integrating the UC for start- objective function (OF), constraints, and uncertainty stop decisions, the ED for cost minimization, and the AGC handling [17-18]. The scheduling optimization model for real-time supply-demand balancing. These modules based on the four steps is effectively solved using the operate across different time scales and interact through heuristic algorithm design module. Finally, the output feedback mechanisms to realize coordinated optimization. module generates the optimized scheduling plan, 432 Informatica 49 (2025) 429–440 X. Qiao et al. The principle of inter-module coordination is illustrated in In Equation (3), t denotes the time period when the Figure 2. unit is turned on. T denotes the minimum min-on,i As shown in Figure 2, the scheduling optimization continuous operation time of the i th unit. The output process proposed in this study adopts a closed-loop power constraint is shown in Equation (4). sequential optimization mechanism, consisting of three Pi,min  Pi,t  P main modules: UC, ED, and AGC. These modules interact i,max (4) across multiple time scales through real-time feedback to In Equation (4), P and i ,mi P are the maximum n i,max achieve dynamic coordination. During each scheduling and minimum PO of the i th unit. P denotes the PO of i,t cycle, the UC module first optimizes the on-and-off status of units based on current load forecasts, reserve the i th unit at time t . Both Equation (3) and Equation (4) requirements, and other system parameters. Then, it hold only when the value of u is 1. The ED module is i,t passes the results to the ED module. 
Then, the ED module responsible for optimizing the power allocation of the performs power allocation and generates a base load turned-on units after the UC determines the SSS of the profile for the AGC module, which makes real-time, units [20]. The power balance constraints in the ED short-term power adjustments. Unlike traditional dispatch module are shown in Equation (5). models, which operate in isolated stages, the AGC module N in this framework generates feedback information, such as Pi,t + PRES ,t = Dt ,t (5) i=1 load correction values and reserve margin stress levels, continuously during its adjustment process. Rather than In Equation (5), P denotes the RE output at time RES ,t discarding this data, it is fed back as correction inputs into t . D denotes the load demand at time t . The climbing t the next UC scheduling cycle. Specifically, the system capacity constraint is shown in Equation (6). monitors the magnitude and frequency of AGC P, − P up , 1 , down i t i t−  Ri Pi,t−1 − Pi,t  Ri ,i, t (6) adjustments in the previous cycle. If frequent or In Equation (6), Rup and Rdown significant real-time corrections are observed, it indicates are the upper and i i potential deficiencies in load forecasting or reserve lower climbing limits for unit i , respectively. The reserve planning. In response, the system increases the reserve capacity constraint is shown in Equation (7). capacity settings for the next cycle to improve operational N redundancy. At the same time, the load forecast is Ri,t  Rrequired ,t , Ri ,t = Pi ,max − Pi ,t ,t (7) corrected by incorporating observed deviations into the i=1 predicted curve. This enables the UC module to make In Equation (7), R is the standby capacity required ,t more accurate start and stop decisions that reflect actual requirement of the system at time t . R denotes the i ,t system demand. This adaptive feedback mechanism is standby capacity that the i th unit can provide at time t . 
repeated in every cycle, progressively refining UC Equation (8) illustrates how the AGC module regulates the decisions to better match real-world operating conditions fluctuations by modifying the power in real time and improve overall dispatch responsiveness. depending on ED. In the mathematical model, schedule optimization aims to reduce the system's overall running costs. The OF P = base i ,t Pi ,t + Pi ,t ,i,t (8) is set as shown in Equation (1). In Equation (8), Pbase i,t denotes the base point load min Z = provided by the ED module [21]. P is the real-time T  N  (1) i,t (C fuel ,i,t +Cstart/stop,i,t )+Cresreve,t +CEENS ,t  power adjustment of the AGC module. The real-time t=1  i=1  balancing constraint is shown in Equation (9). In Equation (1), Z denotes the total system cost. T N denotes the total quantity of scheduling time segments. N Pi ,t = Dt , is the total quantity of units. C is the fuel cost of the i=1 fuel ,i ,t (9) N i th unit at time t . C is the startup and shutdown start /stop,i,t D = D −Pbase t t i ,t − PRES ,t cost of the i th unit at time t . C is the standby cost i=1 resreve,t In Equation (9), D represents the difference at time t . C is the desired power deficit cost at time t EENS ,t between the actual load demand and the base point load t , which mainly measures the supply-demand imbalance and RE output. The adjustment speed constraint is shown caused by RE fluctuations [19]. The UC module is in Equation (10). responsible for optimizing the start-stop state (SSS) of the P  Rresponse i,t i (10) units. The SSS constraint u is shown in Equation (2). i,t In Equation (10), Rresponse denotes the upper limit of ui,t 0,1,i, t (2) i real-time regulation speed. As a result, the synergistic In Equation (2), a value of 1 for u indicates that the i,t relationship among the UC, ED, and AGC modules is unit is on and a value of 0 indicates that the unit is off. realized through the tight coupling of inputs and outputs. 
Equation (3) displays the unit start/stop time limitation. The UC provides the SSS for the ED, the ED provides the T base point load for the AGC, and the feedback from the ui,tt  Tmin-on,i (3) AGC optimizes the start-stop strategy of the UC. t=1 Integrating DDPG and QPSO for Multi-Objective… Informatica 49 (2025) 429–440 433 The proposed model incorporates uncertainty in wind search capabilities using quantum behavioral and solar output directly into the scheduling process to mechanisms. Hence, this study combines improved enhance adaptability to RE fluctuations. A limited number DDPG and QPSO to propose a joint heuristic algorithm. of representative renewable output scenarios are generated Figure 3 illustrates the computational flow of the during each scheduling cycle by applying random enhanced DDPG in this technique. deviations to forecasted values based on recent historical In Figure 3, the study introduces a dual experience variation. These scenarios simulate possible short-term pooling mechanism in DDPG, which balances exploration fluctuations in renewable generation. The reserve capacity and utilization by storing diverse samples and high-value constraint is adjusted accordingly based on the observed samples separately to improve training efficiency and fluctuation range, ensuring sufficient buffer during high- policy quality. Second, to prevent falling into the local variability periods. In the AGC stage, real-time control optimum, a time-decaying exploration noise technique is targets are fine-tuned using deviation trends derived from used to boost exploration at the beginning and improve these scenarios. The proposed model maintains dispatch stability at the end. Finally, the target network update feasibility and system stability under uncertain renewable strategy is optimized to dynamically adjust the target output conditions by dynamically updating reserve network parameters through the soft update method to settings and AGC parameters. 
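To make the structure of the objective in Equation (1) and the UC/ED constraints (4)-(7) concrete, the following is a minimal sketch, not the authors' implementation: the linear fuel-cost coefficients, the two-unit instance data, and the function names are illustrative assumptions, and the AGC-stage constraints (8)-(10) are omitted for brevity.

```python
import numpy as np

def check_schedule(u, p, p_min, p_max, r_up, r_down, p_res, demand, r_req):
    """Feasibility check of a candidate schedule against Eqs. (4)-(7).

    u: (N, T) 0/1 on/off matrix; p: (N, T) unit outputs;
    p_min, p_max, r_up, r_down: per-unit limits, shape (N,);
    p_res: renewable output (T,); demand: load (T,); r_req: reserve need (T,).
    """
    # Eq. (4): output limits apply only to committed units (u == 1).
    ok = np.all((p >= u * p_min[:, None]) & (p <= u * p_max[:, None]))
    # Eq. (5): power balance, sum_i P_{i,t} + P_RES,t == D_t.
    ok &= np.allclose(p.sum(axis=0) + p_res, demand)
    # Eq. (6): ramping limits between consecutive periods.
    dp = np.diff(p, axis=1)
    ok &= bool(np.all(dp <= r_up[:, None]) and np.all(-dp <= r_down[:, None]))
    # Eq. (7): committed headroom must cover the reserve requirement.
    ok &= np.all((u * p_max[:, None] - p).sum(axis=0) >= r_req)
    return bool(ok)

def total_cost(u, p, a, c_switch=0.0, c_reserve=0.0, c_eens=0.0):
    """Eq. (1) with an assumed linear fuel cost a_i * P_{i,t} for committed units."""
    return float((a[:, None] * u * p).sum() + c_switch + c_reserve + c_eens)

# Tiny two-unit, three-period instance (illustrative numbers only).
u = np.ones((2, 3))
p = np.array([[30.0, 35.0, 40.0], [30.0, 35.0, 40.0]])
p_min, p_max = np.array([10.0, 10.0]), np.array([50.0, 60.0])
r_up, r_down = np.array([20.0, 20.0]), np.array([20.0, 20.0])
p_res, demand, r_req = np.array([10.0] * 3), np.array([70.0, 80.0, 90.0]), np.array([10.0] * 3)

feasible = check_schedule(u, p, p_min, p_max, r_up, r_down, p_res, demand, r_req)      # True
infeasible = check_schedule(u, p, p_min, p_max, r_up, r_down, p_res, demand + 5.0, r_req)  # False
cost = total_cost(u, p, a=np.array([2.0, 3.0]), c_reserve=15.0)  # 2*105 + 3*105 + 15 = 540.0
```

In a full solver these checks would sit inside the heuristic's fitness evaluation, with infeasible candidates penalized or repaired rather than merely rejected.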
enhance the training stability and convergence speed. The improved DDPG workflow consists of four main 3.2 Power system scheduling optimization stages. First, in the initialization phase, the Critic network, heuristic algorithm design Actor network, and their target networks are randomly initialized. Additionally, two experience pools, B1 and The proposed mathematical model for optimizing PS B2, are established. B1 stores the initial experience scheduling takes into account total system operating costs, samples. B2 stores the high-value samples that are the synergistic optimization of multiple modules, and the selected using a filtering mechanism. The dual-experience uncertainty associated with a high proportion of RE pool design maintains diversity in the training data, which sources. This provides a theoretical basis for scheduling. improves sample selection efficiency. Next, within the However, the simple model may be inefficient or training iterations and time step loop, the agent generates susceptible to local optimization when dealing with actions via the policy network. This enhances the complex, multi-constraint, multi-timescale optimization exploration of unknown strategies by adding exploration problems [22]. Therefore, heuristic algorithms are noise. After interacting with the environment, experience introduced to optimize the mathematical model and solve samples are generated and stored in B1. High-value it. In deep reinforcement learning, the DDPG method samples are then selected based on reward values and effectively optimizes unit SSS and power allocation. stored in B2. In this way, the experience pool contains However, it may converge slowly and become trapped in both common experience samples and high value samples. local optima when solving complex problems with This ensures the samples are diverse and valuable for multiple constraints [23]. 
QPSO overcomes the limitations training, thereby improving the algorithm's learning of DDPG by improving particle diversity and global efficiency. Sort Pool 1 by reward Start (descending); remove low-reward samples to Initialize the Critic and Actor Store experience in Pool 1 form Pool 2 networks Initialize two target Perform action, receive reward and networks next state Is Pool 1 full? Initialize experience Select actions based on the current N pools 1 and 2 strategy and explore noise Y N Sample a batch from Is max training round Y Is the number of time Pool 2 using the reached? Steps at its maximum? strategy Y N Update Critic network and Actor network Get initial state from environment Update two target End networks Figure 3: The framework of DDPG-QPSO joint heuristic algorithm 434 Informatica 49 (2025) 429–440 X. Qiao et al. Initialize particle positions Check whether the maximum number of iterations or Start convergence condition is reached Initialize algorithm parameters Initialization Update particle positions based on Calculate the initial the quantum behavior formula fitness values of particles Main loop Determine the global best position Calculate fitness values and individual best positions Output the Update the global best position optimal solution and individual best positions Dynamically adjust parameters End Figure 4: Schematic diagram of the algorithm flow of QPSO Then, a small batch of high-value samples is sampled from B2 to update the Critic and Actor networks. The (t+1) (t )  1  Critic network is updated using the error calculated from xi, j = Pi, j    xi, j − Pi, j  ln   (11)  u  the target value. Meanwhile, the Actor network is updated (t ) +1 using the policy gradient method to maximize the long- In Equation (11), x ) i , j and (t x denote the updated i, j term cumulative reward. This optimization allows the position of particle i after t and t +1 iterations in the j model to continuously improve its policy and value th dimension. 
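The dual experience pool and time-decaying exploration noise described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the reward threshold used to fill B2 and the linear decay schedule are assumptions, since the paper only states that high-value samples are filtered by reward and that the noise attenuates over training.

```python
import random
from collections import deque

class DualExperiencePool:
    """Sketch of the dual experience pool: B1 holds all transitions,
    B2 holds only high-value (high-reward) transitions."""

    def __init__(self, capacity=1_000_000, reward_threshold=0.0):
        self.b1 = deque(maxlen=capacity)   # common experience samples
        self.b2 = deque(maxlen=capacity)   # high-value samples
        self.reward_threshold = reward_threshold  # assumed filtering rule

    def store(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.b1.append(transition)              # every sample goes to B1
        if reward > self.reward_threshold:      # filter high-value samples into B2
            self.b2.append(transition)

    def sample(self, batch_size=64):
        # Train preferentially on high-value samples; fall back to B1
        # while B2 is still too small to fill a batch.
        pool = self.b2 if len(self.b2) >= batch_size else self.b1
        return random.sample(list(pool), min(batch_size, len(pool)))

def exploration_noise_scale(step, total_steps, sigma0=0.2):
    """Time-decaying exploration noise scale: large early, small late
    (linear decay assumed for illustration)."""
    return sigma0 * (1.0 - step / total_steps)
```

In a full agent the sampled batch would feed the Critic/Actor updates described above; here only the pooling logic is shown.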
Finally, the research employs a soft update method that dynamically adjusts the target network parameters, which further improves training stability and convergence speed. The soft update strategy adjusts the target network parameters smoothly, preventing excessive fluctuations during training and avoiding the instability caused by dramatic parameter updates. Figure 4 depicts the QPSO algorithm's flow.

In Figure 4, the overall process of the QPSO algorithm is not much different from that of the traditional PSO algorithm. The steps are: initializing particle positions and parameters, calculating fitness values, updating the global optimal position (GOP) and individual optimal position (IOP), dynamically adjusting parameters, and iterating until the optimal solution (OS) is output. The core difference between the two lies in the way the particle position is updated. Traditional PSO is based on an iterative formula of velocity and position, while QPSO adopts a quantum behavioral formula that constructs a quantum distribution of the particle position from the GOP and IOP. In the QPSO algorithm, the quantum modulation factor controls the randomness of the particle update process and regulates the particles' ability to explore the search space, which enhances search diversity and prevents the algorithm from getting trapped in local OSs. The quantum distribution describes the probabilistic characteristics of particle position updates: new particle positions are generated through quantum-behavior formulas that combine the global and individual optimal positions, reflecting a non-deterministic update mode inspired by quantum mechanics. The quantum behavioral formulation updates the particle positions as shown in Equation (11).

x_{i,j}^{(t+1)} = P_{i,j} ± β · |x_{i,j}^{(t)} − P_{i,j}| · ln(1/u)    (11)

In Equation (11), x_{i,j}^{(t)} and x_{i,j}^{(t+1)} denote the position of particle i in the j-th dimension after t and t+1 iterations. P_{i,j} denotes the reference point of particle i in the j-th dimension, β denotes the quantum modulation factor, and u denotes a random number that introduces randomness and gives the particle its non-deterministic update property. P_{i,j} is obtained from the GOP and the IOP with certain weights, as shown in Equation (12).

P_{i,j} = φ · p_{best,i,j} + (1 − φ) · g_{best,j}    (12)

In Equation (12), p_{best,i,j} and g_{best,j} denote the IOP and GOP, respectively. φ denotes the inertia factor, which controls whether the particle prefers the individual OS or the global OS.

In the integration of DDPG and QPSO, the improved DDPG algorithm first generates an initial dispatch strategy based on the current environmental state and load demand information. This strategy includes the start-stop decisions and power allocation for each generation unit over all time periods. The output of DDPG is a deterministic decision vector representing an executable scheduling solution, and this solution serves as a key reference for initializing the population in the QPSO algorithm. More specifically, the DDPG output is encoded as a particle position within the QPSO search space and assigned as the initial position of at least one particle in the swarm. The remaining particles are initialized in the vicinity of this solution through random perturbations, ensuring that the initial population has both guidance and diversity. Based on this initialization, QPSO performs global search optimization. Its quantum-behavior mechanism further refines and adjusts the scheduling strategy, improving the solution's overall stability and adaptability. Initially, the global search is favored, while local optimization is favored in the later stage. Combining the above, the final PS scheduling optimization flow designed by the study is shown in Figure 5.

In Figure 5, the final PS scheduling optimization process begins by inputting load demand forecasts, RE output scenarios, and related parameters to provide basic data for optimization. With the aim of reducing the overall system cost, a multi-module cooperative scheduling model comprising the UC, ED, and AGC modules is built. The improved DDPG is utilized to generate the initial optimization strategy, which is further optimized by QPSO. The optimized scheduling plan covers unit start/stop status, power allocation, and reserve capacity configuration, ultimately achieving efficient and stable scheduling optimization.

Figure 5: Final power system scheduling optimization process

Table 1: Experimental environment configuration

Hardware configuration:
- CPU: Intel Core i9-12900K (16 cores, 3.2 GHz)
- GPU: NVIDIA GeForce RTX 3090 (24 GB VRAM)
- Memory: 32 GB DDR4
- Storage: 1 TB SSD
- Power supply: 850 W high-efficiency power supply

Software configuration:
- Operating system: Ubuntu 22.04 LTS
- Programming language: Python 3.10
- Deep learning framework: TensorFlow 2.10
- Optimization algorithm libraries: NumPy, SciPy, Pyomo
- Power system simulation tool: MATPOWER 7.1 (MATLAB toolbox)
- Data processing tool: Pandas

Table 2: Results of model scheduling performance differences

Indicator | 30-Bus traditional | 30-Bus proposed | 118-Bus traditional | 118-Bus proposed
Total cost | $12,500 | $10,800 | $48,200 | $42,700
Fuel cost | $7,200 | $6,500 | $28,500 | $26,000
Startup/shutdown cost | $3,000 | $2,500 | $12,000 | $10,200
Reserve cost | $2,000 | $1,500 | $6,500 | $5,500
Demand-supply imbalance cost | $300 | $300 | $1,200 | $1,000
Supply-demand deviation (MW) | 5.5 | 4.2 | 25.0 | 18.5
Response time (s) | 10.1 | 8.3 | 18.7 | 13.5
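The quantum-behaved position update of Equations (11) and (12) can be sketched in NumPy as follows. The random ± sign and the uniform draws for u and the weight φ are conventional QPSO choices assumed here for illustration; β is the quantum modulation factor from the text. This is a sketch, not the authors' code.

```python
import numpy as np

def qpso_update(x, p_best, g_best, beta=0.5, phi=None, rng=None):
    """One QPSO position update following Equations (11)-(12).

    x      : (n_particles, n_dims) current positions x_i^(t)
    p_best : (n_particles, n_dims) individual optimal positions (IOP)
    g_best : (n_dims,)             global optimal position (GOP)
    beta   : quantum modulation factor
    phi    : mixing weight; drawn uniformly per dimension if None (assumed)
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    # Equation (12): reference point as a weighted mix of IOP and GOP.
    phi = rng.random((n, d)) if phi is None else phi
    P = phi * p_best + (1.0 - phi) * g_best
    # Equation (11): quantum-behaved jump around P with a random sign.
    u = 1.0 - rng.random((n, d))                # u ~ U(0, 1], avoids log(0)
    sign = rng.choice([-1.0, 1.0], size=(n, d))
    return P + sign * beta * np.abs(x - P) * np.log(1.0 / u)
```

The heavy-tailed ln(1/u) jump around the reference point P is what gives QPSO the wider exploration range discussed above; a particle already sitting at its reference point stays there.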
4 Results

The efficiency and superiority of the PS scheduling optimization methods suggested in the study are confirmed in this section using both the heuristic algorithms and the mathematical model. The focus is on verifying the effectiveness of multi-module collaboration, uncertainty handling, and multi-timescale optimization, as well as the performance enhancement and comprehensive optimization capabilities of the improved DDPG, QPSO, and joint heuristic algorithms.

4.1 Validation of mathematical model for power system scheduling optimization

A multi-module cooperative scheduling model including the UC, ED, and AGC modules is constructed with the goal of lowering the overall system cost. Based on Table 1, the study selects the 30-node and 118-node test systems from the IEEE standard examples. The former includes 30 nodes, 41 transmission lines, 6 generators, and 20 load nodes, which is suitable for preliminary verification and experimentation. The latter includes 118 nodes, 186 transmission lines, 54 generators, and 99 load nodes, which can be used for in-depth research on the optimization capabilities of multi-module collaborative scheduling and heuristic algorithms. First, the dispatch performance of the proposed closed-loop sequential scheduling model based on UC-ED-AGC feedback is compared with that of a traditional stage-wise sequential scheduling model. The traditional model is a dispatch process in which the UC, ED, and AGC modules run independently in a fixed order; it does not consider RE uncertainty or provide feedback or coordination. To ensure a fair comparison, both models are solved using the same optimization algorithm (QPSO) under identical system configurations and forecast conditions. This setup ensures that performance differences are attributed to model structure rather than solver differences. Table 2 displays the findings.

In Table 2, the mathematical model proposed in the study demonstrates advantages in both the IEEE 30-node and 118-node test systems. In terms of economy, the total cost of the 30-node system is reduced by $1,700 and that of the 118-node system by $5,500, with fuel, start-stop, and reserve capacity costs all optimized. In terms of supply-demand balance capability, the supply-demand deviation is reduced by 1.3 MW and 6.5 MW respectively, effectively addressing the uncertainty of load demand and RE fluctuations. Meanwhile, the real-time adjustment response time is shortened by 1.8 s and 5.2 s respectively, improving the dynamic response capability. Overall, the proposed mathematical model achieves more efficient resource utilization in the small-scale system and demonstrates superior adaptability to complex problems in the large-scale system.

Since the suggested model takes the uncertainty of RE into account, it is compared to conventional PS scheduling models that do not consider uncertainty handling. The result is shown in Figure 6. In Figure 6(a), within 30 days of PS scheduling optimization, the proposed model achieves a power supply reliability of over 94%, with an average of 96.58%. In contrast, the traditional model that does not consider uncertainty processing reaches a power supply reliability of at most 93.73%, with an average of only 92.16%. In Figure 6(b), regarding the utilization rate of reserve capacity, after 30 days of model operation the proposed model raises the utilization rate to between 75% and 87%, while the utilization rate of the traditional model only fluctuates between 60% and 70%. The outcomes display that the proposed model improves adaptability to RE fluctuations in PS scheduling optimization.

Figure 6: Results compared with traditional models that do not consider uncertainty processing. (a) Comparison of supply reliability; (b) Comparison of reserve utilization rate

Finally, the impact of the study's suggested model on scheduling optimization is confirmed on various time scales. The result is shown in Figure 7. In Figure 7(a), in the short term (24 hours), the frequency deviation (FD) of the PS dispatch before optimization is much larger, reaching more than 4 Hz, whereas after optimization with the research model the FD is effectively controlled and remains between -2 Hz and 2 Hz. In Figure 7(b), over the medium term, i.e., one week, the energy utilization rate of the PS dispatch averages 84.15% before optimization, whereas after optimization it improves to 93.84%. In Figure 7(c), over the long term, i.e., one year, the optimized PS dispatch significantly reduces the dispatch cost from $45,600 to $37,860, while the pre-optimization dispatch cost is $43,080. The outcomes reveal that the suggested model performs better in terms of long-term economics, medium-term efficiency, and short-term stability.
Figure 7: Scheduling optimization effect on different time scales. (a) Short-term optimization effect; (b) Medium-term optimization effect; (c) Long-term optimization effect

Figure 8: Comparison of the DDPG algorithm before and after improvement. (a) IEEE 30-Bus Test System; (b) IEEE 118-Bus Test System

4.2 Validation of heuristic algorithms for power system scheduling optimization

After the validity and superiority of the proposed mathematical model are verified, the study further validates the heuristic algorithms involved. Experiments are first conducted on the improvement of the DDPG algorithm: the DDPG before and after the improvement is applied to solve the proposed mathematical model on the IEEE 30-node and 118-node test systems. In the DDPG algorithm, the learning rates of the Critic network and the Actor network are both set to 0.0001. The discount factor is 0.99, the batch size is 64, and the experience pool size is 1,000,000. The exploration noise is generated by an Ornstein-Uhlenbeck process with an initial standard deviation of 0.2 that is attenuated during training. The target network adopts a soft update with an update parameter of 0.001 to ensure stability. The results are shown in Figure 8.

Figure 8(a) shows that the traditional DDPG algorithm converges after 160 iterations on the IEEE 30-node system, resulting in a total cost of $39,560, while the improved DDPG converges after 80 iterations and reduces the total cost to $37,960. This improvement benefits from dual experience pooling: storing common samples and high-value samples separately optimizes the efficiency of sample utilization and improves the training speed, which accelerates convergence and reduces cost. Meanwhile, time-decaying exploration improves the initial exploration capability and stabilizes strategy optimization in the later stages, further accelerating convergence and reducing cost. As shown in Figure 8(b), both the traditional and improved DDPG require more iterations to converge on the more complex IEEE 118-node system. However, the improved DDPG still performs better and converges at a lower cost. The dual experience pool and time-decaying exploration strategy effectively improve the algorithm's adaptability and convergence efficiency in large-scale systems, demonstrating its superiority.

Furthermore, the optimization effect of QPSO is validated, with differential evolution (DE), the grey wolf optimizer (GWO), and the wolf search algorithm (WSA) selected for comparison. In the QPSO algorithm, the number of particles is 50, the maximum number of iterations is 1,000, and the inertia factor is 0.9. The learning factors are set to 1.5 and 2.0, respectively, to control the global and local optimal attractive forces. The quantum modulation factor is set to 0.5 to enhance the flexibility of the particle position update. Figure 9 displays the findings.

In Figure 9(a), in the IEEE 30-node test system, QPSO has the fastest convergence speed among the five algorithms and the lowest final fitness value. In Figure 9(b), in the IEEE 118-node test system, QPSO again converges fastest and reaches the lowest final fitness value. It can be concluded that QPSO effectively enhances the global search capability of the particles through the quantum behavior mechanism. Both in the smaller-scale IEEE 30-node system and in the more complex IEEE 118-node system, QPSO shows superior performance, proving its adaptability to problems of different scales and complexities.

Finally, the study applies the proposed mathematical model in combination with the joint DDPG-QPSO heuristic algorithm to HP RE scheduling optimization. The more advanced methods in references [11], [12], [13], and [14] are selected as comparison methods. The algorithms from references [11] to [14] are re-implemented by the research team based on the original descriptions in the respective papers. Each method is tuned within a reasonable range of parameters based on the recommended settings and then validated to ensure optimal performance in the current test environment. All methods are evaluated under the same experimental conditions, which include load forecast profiles, RE output scenarios, system topology, and a unified evaluation period. All performance metrics are kept consistent across experiments. The results are presented in Table 3.
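The exploration-noise and soft-update settings stated above (initial σ = 0.2 with attenuation, soft update parameter τ = 0.001) can be sketched as follows. The mean-reversion rate θ, step size dt, and multiplicative decay factor are illustrative assumptions not specified in the paper.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process for temporally correlated exploration
    noise, with a standard deviation that attenuates over training.
    theta/dt/decay are illustrative; the text only gives sigma0 = 0.2."""

    def __init__(self, dim, sigma0=0.2, theta=0.15, dt=1.0, decay=0.999, rng=None):
        self.dim, self.sigma, self.theta, self.dt = dim, sigma0, theta, dt
        self.decay = decay
        self.rng = np.random.default_rng() if rng is None else rng
        self.state = np.zeros(dim)

    def sample(self):
        # Mean-reverting step: dx = -theta * x * dt + sigma * sqrt(dt) * N(0, 1)
        dx = (-self.theta * self.state * self.dt
              + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.dim))
        self.state = self.state + dx
        self.sigma *= self.decay          # attenuate noise as training proceeds
        return self.state

def soft_update(target_params, source_params, tau=0.001):
    """Soft (Polyak) target-network update:
    target <- tau * source + (1 - tau) * target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, source_params)]
```

With τ = 0.001 the target networks track the online networks slowly, which is the stability mechanism the experiment relies on.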
Figure 9: Optimization effect verification of QPSO. (a) IEEE 30-Bus Test System; (b) IEEE 118-Bus Test System

Table 3: Comprehensive comparison between research methods and reference methods

Method | Total cost ($) | Energy utilization rate (%) | Supply reliability (%) | Frequency deviation (Hz) | Optimization time (s)
Proposed method | 37,960 | 94.85 | 97.10 | 1.25 | 58.3
Reference [11] | 40,230 | 90.76 | 93.85 | 2.64 | 125.4
Reference [12] | 39,760 | 92.42 | 94.50 | 2.18 | 98.7
Reference [13] | 38,450 | 93.57 | 95.87 | 1.89 | 75.6
Reference [14] | 38,930 | 93.25 | 95.30 | 2.01 | 88.9

In Table 3, the proposed mathematical model and the joint DDPG-QPSO heuristic algorithm show clear advantages in HP REPS scheduling optimization. The proposed method outperforms the other methods with the lowest total cost of $37,960 and an energy utilization rate of 94.85%. Meanwhile, the power supply reliability reaches 97.10%, the FD is only 1.25 Hz, and the optimization time is 58.3 s. The proposed algorithm demonstrates excellent economy, system stability, and solution efficiency, and provides a highly efficient and reliable solution for PS scheduling optimization in high-percentage RE scenarios.

5 Discussion and conclusion

Targeting the scheduling issue brought on by the HP of RE access in the PS, the study put forward a mathematical model for multi-module cooperative scheduling that brought together three main modules, and used the enhanced DDPG and QPSO algorithms to solve it. The efficacy of the study's suggested model and algorithm was confirmed by experimental findings. In the IEEE 30-node and 118-node test systems, the proposed model reduced the total scheduling cost by $1,700 and $5,500, respectively, compared with the traditional sequential scheduling model, and enhanced the energy utilization rate and power supply reliability. By introducing a dual experience pool and a time-decaying exploration strategy, the improved DDPG algorithm increased the convergence speed by 50% and reduced the total cost from $39,560 to $37,960 in the 30-node system. QPSO exhibited stronger global search capability, with the fastest convergence speed and the lowest final fitness value in systems of different sizes compared with the other algorithms. In addition, the study's optimization experiments on short-, medium-, and long-term time scales revealed that the FD was effectively reduced, the energy utilization rate was improved by 9.69%, and the total dispatch cost was reduced by 17.6%. The adaptability and superiority of the model in cooperative optimization over multiple time scales were thus demonstrated.

The DDPG and QPSO algorithms perform well in the test systems. However, they may face challenges regarding scalability and adaptability in an actual power grid. As the power grid grows, the computational complexity will increase significantly, particularly when working with large volumes of data and real-time dispatching. These factors can lead to a shortage of computing resources and excessively long training times. In addition, the diversity of power grid topologies and operating conditions may affect the algorithm's adaptability: the power grid contains complex generators, energy storage systems, and distributed energy resources, and corresponding adjustments to the algorithm are required to address them effectively. In terms of real-time performance, the algorithm functions well in a simulated environment, but in a highly dynamic actual power grid it may not respond promptly to load fluctuations and changes in RE output, which affects system stability. Therefore, although the algorithm performs well in the test systems, it still needs further verification and optimization for practical applications to improve stability and response speed.

References

[1] Lei Gan, Tianyu Yang, Xingying Chen, Gengyin Li, and Kun Yu. Purchased power dispatching potential evaluation of steel plant with joint multienergy system and production process optimization. IEEE Transactions on Industry Applications, 58(2):1581-1591, 2022. https://doi.org/10.1109/TIA.2022.3144652
[2] A. A. Lebedev, A. A. Voloshin, and A. N. Lednev. Conceptual framework for developing highly-automated power distribution networks and micropower systems. Power Technology and Engineering, 58(1):163-168, 2024. https://doi.org/10.1007/s10749-024-01790-2
[3] Ehsan Naderi, Lida Mirzaei, Mahdi Pourakbari-Kasmaei, Fernando V. Cerna, and Matti Lehtonen. Optimization of active power dispatch considering unified power flow controller: application of evolutionary algorithms in a fuzzy framework. Evolutionary Intelligence, 17(3):1357-1387, 2024. https://doi.org/10.1007/s12065-023-00826-2
[4] Lilin Cheng, Haixiang Zang, Anupam Trivedi, Dipti Srinivasan, Zhinong Wei, and Guoqiang Sun. Mitigating the impact of photovoltaic power ramps on intraday economic dispatch using reinforcement forecasting. IEEE Transactions on Sustainable Energy, 15(1):3-12, 2023. https://doi.org/10.1109/TSTE.2023.3261444
[5] Yu Dong, Xin Shan, Yaqin Yan, Xiwu Leng, and Yi Wang. Architecture, key technologies and applications of load dispatching in China power grid. Journal of Modern Power Systems and Clean Energy, 10(2):316-327, 2022. https://doi.org/10.35833/MPCE.2021.000685
[6] Huating Xu, Bin Feng, Chutong Wang, Chuangxin Guo, Jian Qiu, and Mingyang Sun. Exact box-constrained economic operating region for power grids considering renewable energy sources. Journal of Modern Power Systems and Clean Energy, 12(2):514-523, 2023. https://doi.org/10.35833/MPCE.2023.000312
[7] Ahmed M. Abd-El Wahab, Salah Kamel, Mohamed H. Hassan, José Luis Domínguez-García, and Loai Nasrat. Jaya-AEO: an innovative hybrid optimizer for reactive power dispatch optimization in power systems. Electric Power Components and Systems, 52(4):509-531, 2024. https://doi.org/10.1080/15325008.2023.2227176
[8] John L. Cox, William T. Hamilton, Alexandra M. Newman, Michael J. Wagner, and Alex J. Zolan. Real-time dispatch optimization for concentrating solar power with thermal energy storage. Optimization and Engineering, 24(2):847-884, 2023. https://doi.org/10.1007/s11081-022-09711-w
[9] Huifeng Zhang, Dong Yue, Chunxia Dou, and Gerhard P. Hancke. PBI based multi-objective optimization via deep reinforcement elite learning strategy for micro-grid dispatch with frequency dynamics. IEEE Transactions on Power Systems, 38(1):488-498, 2022. https://doi.org/10.1109/TPWRS.2022.3155750
[10] Sicheng Hou and Shigeru Fujimura. Day-ahead multi-objective microgrid dispatch optimization based on demand side management via particle swarm optimization. IEEJ Transactions on Electrical and Electronic Engineering, 18(1):25-37, 2023. https://doi.org/10.1002/tee.23711
[11] Navid Shirzadi, Fuzhan Nasiri, Claude El-Bayeh, and Ursula Eicker. Optimal dispatching of renewable energy-based urban microgrids using a deep learning approach for electrical load and wind power forecasting. International Journal of Energy Research, 46(3):3173-3188, 2022. https://doi.org/10.1002/er.7374
[12] Zhongjie Guo, Wei Wei, Mohammad Shahidehpour, Zhaojian Wang, and Shengwei Mei. Optimisation methods for dispatch and control of energy storage with renewable integration. IET Smart Grid, 5(3):137-160, 2022. https://doi.org/10.1049/stg2.12063
[13] Dai Cui, Weichun Ge, Wenguang Zhao, Feng Jiang, and Yushi Zhang. Economic low-carbon clean dispatching of power system containing P2G considering the comprehensive influence of multi-price factor. Journal of Electrical Engineering & Technology, 17(1):155-166, 2022. https://doi.org/10.1007/s42835-021-00877-4
[14] Fatma Gami, Ziyad A. Alrowaili, Mohammed Ezzeldien, Mohamed Ebeed, Salah Kamel, Eyad S. Oda, and Shazly A. Mohamed. Stochastic optimal reactive power dispatch at varying time of load demand and renewable energy resources using an efficient modified jellyfish optimizer. Neural Computing and Applications, 34(22):20395-20410, 2022. https://doi.org/10.1007/s00521-022-07526-5
[15] Jatin Soni and Kuntal Bhattacharjee. Multi-objective dynamic economic emission dispatch integration with renewable energy sources and plug-in electrical vehicle using equilibrium optimizer. Environment, Development and Sustainability, 26(4):8555-8586, 2024. https://doi.org/10.1007/s10668-023-03058-7
[16] Yukang Shen, Wenchuan Wu, Bin Wang, and Shumin Sun. Optimal allocation of virtual inertia and droop control for renewable energy in stochastic look-ahead power dispatch. IEEE Transactions on Sustainable Energy, 14(3):1881-1894, 2023. https://doi.org/10.1109/TSTE.2023.3254149
[17] Xiaojing Wang, Li Han, Mengjie Li, and Panpan Lu. A time-scale adaptive forecasting and dispatching integration strategy of the combined heat and power system considering thermal inertia. IET Renewable Power Generation, 17(8):1966-1977, 2023. https://doi.org/10.1049/rpg2.12743
[18] Bing Sun, Ruipeng Jing, Leijiao Ge, Yuan Zeng, Shimeng Dong, and Luyang Hou. Quick hosting capacity evaluation based on distributed dispatching for smart distribution network planning with distributed generation. Journal of Modern Power Systems and Clean Energy, 12(1):128-140, 2023. https://doi.org/10.35833/MPCE.2022.000604
[19] Wei Liu, Tianhao Wang, Shuo Wang, Zhijun E, and Ruiqing Fan. Day-ahead robust optimal dispatching method for urban power grids containing high proportion of renewable energy. Process Safety and Environmental Protection, 178(1):715-727, 2023. https://doi.org/10.1016/j.psep.2023.08.025
[20] Maolin Li, Youwen Tian, Haonan Zhang, and Nannan Zhang. The source-load-storage coordination and optimal dispatch from the high proportion of distributed photovoltaic connected to power grids. Journal of Engineering Research, 12(3):421-432, 2024. https://doi.org/10.1016/j.jer.2023.10.042
[21] Junjie Rong, Ming Zhou, Zhi Zhang, and Gengyin Li. Coordination of preventive and emergency dispatch in renewable energy integrated power systems under extreme weather. IET Renewable Power Generation, 18(7):1164-1176, 2024. https://doi.org/10.1049/rpg2.12893
[22] Zhoujun Ma, Yizhou Zhou, Yuping Zheng, Li Yang, and Zhinong Wei. Distributed robust optimal dispatch of regional integrated energy systems based on ADMM algorithm with adaptive step size. Journal of Modern Power Systems and Clean Energy, 12(3):852-862, 2023. https://doi.org/10.35833/MPCE.2023.000204
[23] Jian Hu, Yingjun He, Wenqian Xu, Yixin Jiang, Zhihong Liang, and Yiwei Yang. Anomaly detection in network access using LSTM and encoder-enhanced generative adversarial networks. Informatica, 49(7):175-186, 2025. https://doi.org/10.31449/inf.v49i7.7246