Volume 47 Number 1 March 2023 ISSN 0350-5596 An International Journal of Computing and Informatics Editorial Boards Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author’s country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations. Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author’s country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors. The coordination necessary is made through the Executive Editors who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Higher Education, Science and Technology. Each author is guaranteed to receive the reviews of his article. When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article. Executive Editor – Editor in Chief Matjaž Gams Jamova 39, 1000 Ljubljana, Slovenia Phone: +386 1 4773 900, Fax: +386 1 251 93 85 matjaz.gams@ijs.si http://dis.ijs.si/mezi Editor Emeritus Anton P. Železnikar Volaričeva 8, Ljubljana, Slovenia s51em@lea.hamradio.si http://lea.hamradio.si/˜s51em/ Executive Associate Editor - Deputy Managing Editor Mitja Luštrek, Jožef Stefan Institute mitja.lustrek@ijs.si Executive Associate Editor - Technical Editor Drago Torkar, Jožef Stefan Institute Jamova 39, 1000 Ljubljana, Slovenia Phone: +386 1 4773 900, Fax: +386 1 251 93 85 drago.torkar@ijs.si Executive Associate Editor - Deputy Technical Editor Tine Kolenik, Jožef Stefan Institute tine.kolenik@ijs.si Editorial Board Juan Carlos Augusto (Argentina) Vladimir Batagelj (Slovenia) Francesco Bergadano (Italy) Marco Botta (Italy) Pavel Brazdil (Portugal) Andrej Brodnik (Slovenia) Ivan Bruha (Canada) Wray Buntine (Finland) Zhihua Cui (China) Aleksander Denisiuk (Poland) Hubert L. Dreyfus (USA) Jozo Dujmovic´ (USA) Johann Eder (Austria) George Eleftherakis (Greece) Ling Feng (China) Vladimir A. Fomichov (Russia) Maria Ganzha (Poland) Sumit Goyal (India) Marjan Gušev (Macedonia) N. Jaisankar (India) Dariusz Jacek Jakóbczak (Poland) Dimitris Kanellopoulos (Greece) Samee Ullah Khan (USA) Hiroaki Kitano (Japan) Igor Kononenko (Slovenia) Miroslav Kubat (USA) Ante Lauc (Croatia) Jadran Lenarčič (Slovenia) Shiguo Lian (China) Suzana Loskovska (Macedonia) Ramon L. 
de Mantaras (Spain) Natividad Martínez Madrid (Germany) Sanda Martinčić-Ipišić (Croatia) Angelo Montanari (Italy) Pavol Návrat (Slovakia) Jerzy R. Nawrocki (Poland) Nadia Nedjah (Brasil) Franc Novak (Slovenia) Marcin Paprzycki (USA/Poland) Wiesław Pawłowski (Poland) Ivana Podnar Žarko (Croatia) Karl H. Pribram (USA) Luc De Raedt (Belgium) Shahram Rahimi (USA) Dejan Rakovic´ (Serbia) Jean Ramaekers (Belgium) Wilhelm Rossak (Germany) Ivan Rozman (Slovenia) Sugata Sanyal (India) Walter Schempp (Germany) Johannes Schwinn (Germany) Zhongzhi Shi (China) Oliviero Stock (Italy) Robert Trappl (Austria) Terry Winograd (USA) Stefan Wrobel (Germany) Konrad Wrona (France) Xindong Wu (USA) Yudong Zhang (China) Rushan Ziatdinov (Russia & Turkey) Honorary Editors Hubert L. Dreyfus (United States) https://doi.org/10.31449/inf.v47i1.4463 Informatica 47 (2023) 1–10 1 Enhancement of NTSA Secure Communication with One-Time Pad (OTP) in IoT Ali Hasan Aidaros Alattas1 , Mahmood A. Al-Shareeda1 , Selvakumar Manickam1,∗ and Murtaja Ali Saare2 1 National Advanced IPv6 Centre, Universiti Sains Malaysia, 11800, Penang, Malaysia 2 Department of Computer Technology Engineering, Shatt Al-Arab University College, Basrah, Iraq E-mail: alshareeda022@usm.my, selva@usm.my, murtaja.a.sari@sa-uc.edu.iq ∗ Corresponding author Keywords: Internet of Things (IoT), NTSA, One-Time Pad (OTP), lightweight cryptographic algorithms Received: October 24, 2022 Internet of Things (IoT) systems use interconnected devices with limited processing, memory, storage, and power availability. Designing the IoT system requires careful consideration of data security. IoT networks are used to collect, process, and transport data; as a result, it needs to be encrypted and secured. To ensure that the data of IoT systems are protected, a variety of lightweight encryption techniques have been developed. These algorithms are unable to carry out complicated or extensive computations. The current challenge facing lightweight cryptographic algorithms, such as NTSA, is how to combine the highest level of security with the least amount of negative influence on runtime speed and space. By applying the One-Time Pad (OTP) technique, the proposed mechanism can raise the security level and effectiveness of NTSA. The proposed mechanism must be put into practise and put to the test in order to demonstrate its effectiveness and capacity to satisfy the needs of the resource-constrained devices. Due to the benefits of the OTP, this suggested method would be beneficial for devices with minimal resources. The proposed technique offers a greater security level, 2134, than NTSA, 2128, after examining and evaluating the experimental data noticed throughout the tests. NTSA is slower than the suggested approach by 70% in terms of runtime speed. While NTSA uses 16% of SRAM, the proposed algorithm only uses 12%. NTSA uses 70% more energy than the suggested algorithm, with higher energy consumption results of 0.000388 Joules for the proposed algorithm and 0.001295 Joules for NTSA. Povzetek: Predstavljena je nova metoda za šifriranje in varnostna vprašanja IoT omrežij, ki dosega boljše rezultate kot NTSA. 1 Introduction The Internet is a system architecture that has allowed communications to advance to connect devices via different networks all over the world. Any individual object that connects to one of its networks can access the Internet for nearly any purpose that requires information (1; 2). 
It enables access to digital information through human or machine-to-machine (M2M) communications (3). Each connected object in the Internet of Things has a unique identity and can connect to other connected objects (4). Medical equipment, monitoring equipment, machinery, automobiles, and buildings will all be upgraded to become intelligent objects that can interact with people or other IoT devices (5; 6). The digital transformation of many industries is what fuels the IoT’s growth. IoT connections will increase from fifteen billion in 2015 to seventy-five billion by 2025, as stated in (7), see Figure 1. The security issue is an afterthought because the resource-constrained networked device is meant to consume a little power to give all essential capabilities (8; 9). There are problems with IoT hardware, including the possibility of an attack on the device’s encrypted data since some Figure 1: IoT connected devices in number. IoT devices are too small to support asymmetric cryptography algorithms. A gadget transmits or receives data that needs to be encrypted (10; 11). However, using cryptographic methods on devices with limited resources is difficult. The device itself, such as an 8-bit microcontroller with a 2KB RAM limit, performs the encryption operation (12; 13). 2 Informatica 47 (2023) 1–10 Traditional cryptography techniques cannot be implemented on such devices since they are expensive and inefficient. In order to address the security concerns on nodes with limited resources, lightweight ciphers have been developed. They are made to achieve cryptographic computational operation while adhering to the restrictions of microcontrollers, small-size RAM, and low power consumption (14; 15; 16). By exploiting the benefits of the OTP technique, the suggested mechanism introduces a solution for both high security and higher performance. This paper focuses on symmetric encryption ciphers and the OTP approach as a foundation for lightweight cryptography. The advantages of block ciphers, which are simple to implement, the OTP technique, high security, and high performance can result in a dependable and robust system. The following are some of the research’s contributions: – This study will make it easier to deploy OTP in all areas of life and execute its secrecy into sensitive applications that require a high level of security with high performance because the OTP approach has shortcomings that have limited its adoption. – One of OTP’s flaws is the key exchange procedure. Therefore, this study will address this problem by creating a simple protocol for parties to exchange keys. The rest of this paper is organized as follows. Section 2 reviews some related work. Section 3 introduces the background of this paper. Section 4 describes the general proposed mechanism’s architecture. Section 5 and Section 6 provide a security analysis and results for the proposed mechanism, respectively. Lastly, Section 7 shows the conclusion and future work in this work. 2 Related work Advance Encryption Standard (AES) was proven to be the best trusted and researched block cipher and still has to be subjected to more study to make it acceptable for resourceconstrained devices, as indicated in (17; 18). While some lightweight cryptographic algorithms, like G-TBSA, are adequate for some factors like processing power and energy, they are not resistant to all types of attacks. 
Like G-TBSA, a number of lightweight cryptographic algorithms are adequate in some respects, such as computational power and energy, but are not resistant to other attacks. None of the prominent modern lightweight block and stream ciphers typically offers the security, affordability, and performance needed for IoT devices with limited resources at the same time (19). It has been noted that the advancement of lightweight cryptography is still ongoing (20; 21). However, developing an algorithm that satisfies the needs of lightweight cryptography for IoT devices is a considerable task. To accommodate various IoT device memory limits, the authors of (20) devised a simple encryption method that employs variable-sized keys and data blocks; this scheme makes use of DNA sequences to produce random keys. Many current LWC algorithms, according to (22), concentrate on lowering the cost of memory, computational power, physical area, and energy consumption and on enhancing throughput and latency, without paying attention to security vulnerabilities. In addition, the author claims that a successful encryption algorithm must strike a balance between the three LWC design objectives (security, performance, cost). Banani et al. (23) employed the standard performance measures (memory occupied, execution time, and power consumption) to trade off among the various algorithms, including TEA, by referring to the security and performance evaluation criteria. The avalanche effect attribute was utilised by the author of (24) to illustrate the security metrics. To summarize the limitations of existing works, we list the algorithms and the attacks reported against them in Table 1. Based on this gap, we enhance NTSA secure communication with OTP in IoT in order to raise the security level and effectiveness of NTSA. The proposed mechanism must be implemented and tested in order to demonstrate its effectiveness and its capacity to satisfy the needs of resource-constrained devices (13).
3 Background
3.1 One-Time Pad technique
A One-Time Pad (OTP) is similar to a stream cipher, except that it does not rely on a pseudo-random key generator. It is a safe method for encrypting a message so that a cryptanalyst cannot recover the message from the intercepted information (25). When encrypting and decrypting data, a random key produced by a genuine random generator must have a length equal to or greater than the message length. The key is then deleted, so that a fresh random key is used for the subsequent encryption and decryption procedures (26; 27). OTP typically employs the XOR operation to encrypt the plaintext by fusing the message and key bits, which is fast and appropriate for IoT devices. This increases security and makes OTP uncrackable under the following circumstances: (I) the key is unpredictable; (II) the length of the key is at least as long as the plaintext; (III) the key is used just once; and (IV) the key has a very high level of confidentiality (28; 29; 30).
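As a minimal illustration of the XOR-based OTP operation and of conditions (I)–(IV) above (this is not the proposed mechanism itself), the following sketch assumes Python byte strings and uses the standard secrets module as the random source:

```python
import secrets

def otp_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt with a fresh random key as long as the message (conditions I and II)."""
    key = secrets.token_bytes(len(plaintext))            # unpredictable, message-length key
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return ciphertext, key                                # key must be used once, then discarded

def otp_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so decryption reuses the same operation."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

msg = b"temperature=23C"
ct, k = otp_encrypt(msg)
assert otp_decrypt(ct, k) == msg
```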
3.2 Lightweight Cryptographic Algorithm (LWC)
Designed for devices with limited resources, Lightweight Cryptography (LWC) is a branch of cryptography that seeks to offer suitable solutions (31). The NIST started a lightweight cryptography project in 2013 to investigate how well the NIST-approved cryptographic standards function on restricted devices and to determine the demand for specific lightweight cryptography standards. The literature goes into detail about how lightweight encryption algorithms have been designed to meet the capabilities of resource-constrained devices, providing both a high level of security and high performance by minimizing the runtime and space complexities as much as feasible (32).
Table 1: Different Attacks on Some Lightweight Cryptosystems in Related Work
Algorithm | Attack | Cipher | Key Size | Structure
TEA | Related-key attack | Block | 128 bits | Feistel
XTEA | Related-key attack | Block | 128 bits | Feistel
HB-2 | Related-key attack | Hybrid | 128 bits | Hybrid
PRINTcipher | Related-key attack | Block | 80, 160 bits | SPN
PRESENT | Related-key attack | Block | 80, 128 bits | SPN
XXTEA | Chosen-plaintext attack | Block | 128 bits | Feistel
AES | Biclique cryptanalysis | Block | 128, 192, 256 bits | SPN
LED | Biclique cryptanalysis | Block | 64, 80, 96, 128 bits | SPN
PRESENT | Biclique cryptanalysis | Block | 80, 128 bits | SPN
Grain | Key recovery attack | Stream | 80 bits | Stream
MICKEY | Differential fault attack | Stream | 80 bits | Stream
SIMON | Differential fault attack | Block | 64, 72, 96, 128, 144, 192, 256 bits | Feistel
SPECK | Differential fault attack | Block | 64, 72, 96, 128, 144, 192, 256 bits | ARX
PRESENT | Differential fault attack | Block | 80, 128 bits | SPN
PRESENT | Truncated differential attack | Block | 80, 128 bits | SPN
ChaCha | Truncated differential attack | Stream | 256 bits | ARX
3.3 TEA and NTSA algorithms
TEA uses 64 rounds spread over 32 cycles. A 128-bit key is divided into four 32-bit subkeys (k0, k1, k2, and k3), and a 64-bit plaintext block is split into two blocks of 32 bits. Each round uses ADD, XOR, and left and right shift operations. NTSA, which is an upgrade to TEA, generates dynamically changing subkeys derived from the 128-bit key in order to increase confusion during all rounds of encrypting a 64-bit plaintext block (33; 34).
4 General proposed mechanism's architecture
The TEA algorithm performs well in LWC and is simple to implement in both hardware and software. It also uses little memory. However, it is susceptible to related-key attacks and has a flaw in the mixing of its round function. Based on the findings of the comparative analysis between TEA and NTSA (12), NTSA resolved the primary key scheduling issue (35). It turns out that developing a system based on the NTSA and OTP will offer a reliable and lightweight cryptosystem for IoT devices with limited resources. As shown in Figure 2, the system uses block ciphers and OTP techniques along with two different forms of symmetric-key primitives (36).
Figure 2: The proposed scheme's mechanism.
4.1 Random keys generation
To prevent noise bias between the sensor axes used as the randomness source, the Von Neumann extractor method extracts two bits from each axis. The desired value is then generated by combining these bits into a random byte according to Equation (1):
RandByte = (x ≪ 6) ⊕ (z ≪ 4) ⊕ (y ≪ 2) ⊕ x ⊕ (z ≪ 2)    (1)
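A minimal sketch of this combination step, assuming x, y and z are the already debiased 2-bit samples taken from the sensor axes (the sensor read and the Von Neumann step are not shown, and reading the operator in Equation (1) as a left shift is an assumption based on the surrounding text):

```python
def rand_byte(x: int, y: int, z: int) -> int:
    """Combine three debiased 2-bit samples (x, y, z) into one byte per Equation (1)."""
    return ((x << 6) ^ (z << 4) ^ (y << 2) ^ x ^ (z << 2)) & 0xFF

# Arbitrary 2-bit samples; real values would come from the accelerometer axes
# after Von Neumann debiasing (not shown here).
print(rand_byte(0b10, 0b01, 0b11))
```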
Additionally, XORing independent binary variables always reduces bias, as the piling-up lemma in (37) shows. Let the random byte be made up of the values x, y, and z that are retrieved from the x, y, and z axes, respectively. The flowchart for the procedure that produces a random byte is shown in Figure 3.
Figure 3: RNG flowchart.
4.2 Proposed mechanism
We examine three different instances of data transmission by Internet of Things devices:
– Devices send data periodically, with a transmission interval determined by the application domain.
– Devices deliver data periodically with a fixed transmission interval.
– Devices send data only when it changes.
This approach is intended for the third scenario, for example passing information from a temperature sensor to an air conditioner. In the OTP approach, a key must be used once and then discarded, and a new key generated for the following encryption procedure. Therefore, sending data on a regular basis is not recommended, especially if the interval is small, such as every second or even every hour (16).
4.2.1 Encryption algorithm
Regarding the first research issue, Figure 4 illustrates the suggested algorithm. The proposed method uses the Feistel structure with a round function, since it employs the same round function as NTSA and TEA. A round function accepts two inputs, a data block P and a subkey ki, and returns one result. The following conditions can be observed in the encryption process's algorithm:
– a single-use key;
– the key's confidentiality;
– true random secret keys for each encryption procedure, where the high level of security provided by random keys contributes to increasing security.
Figure 4: Encryption flowchart.
4.2.2 Decryption algorithm
Figure 5 depicts the entire decryption process.
Figure 5: The entire decryption process.
4.3 Key padding protocol
When a new key is generated and utilised in each encryption procedure, the problem of key exchange between a sender and a recipient arises: the sender and the recipient need to exchange this key. The problem must be handled in a way that minimises the need for computationally intensive operations, as is the case with traditional cryptography such as RSA. The complexity of the key exchange problem grows as a result of the connectivity between IoT devices and machine-to-machine communication. The approach suggests padding the key used for message encryption and decryption into the ciphertext after it has been encrypted using the previous key and the XOR operation, as shown in Figure 6.
Figure 6: Process of encrypted key padding.
4.4 Key extraction protocol
Using (16), the intermediate ciphertext's bytes are first extracted from the final ciphertext C during the decryption process, D(cn, kn−1). Next, the bytes of the encrypted key ke are collected from the final ciphertext, as shown in Figure 7. Then kn is obtained by performing an XOR operation between kn−1 and ke in order to decrypt the message and obtain the following new key.
Figure 7: Final ciphertext bits.
A short sketch of the padding and extraction steps is given below.
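The following is only an illustration of the protocol logic of Sections 4.3 and 4.4 under stated assumptions: the NTSA block cipher is replaced by an XOR placeholder, the exact bit layout of Figure 7 is not reproduced, and all function names are hypothetical.

```python
import secrets

def block_encrypt(data: bytes, key: bytes) -> bytes:
    """Placeholder for the NTSA block encryption; a real implementation would run
    the Feistel rounds of Section 3.3. XOR is used here only so that the key
    padding/extraction protocol around it can be shown end to end."""
    return bytes(d ^ k for d, k in zip(data, key * (len(data) // len(key) + 1)))

block_decrypt = block_encrypt  # the XOR placeholder is its own inverse

def send(message: bytes, prev_key: bytes) -> tuple[bytes, bytes]:
    """Encrypt the message with a fresh key, then pad that key (XORed with the
    previous key) onto the ciphertext, as in Section 4.3."""
    new_key = secrets.token_bytes(len(prev_key))
    intermediate = block_encrypt(message, new_key)
    encrypted_key = bytes(n ^ p for n, p in zip(new_key, prev_key))
    return intermediate + encrypted_key, new_key          # sender keeps new_key for the next message

def receive(final_ct: bytes, prev_key: bytes) -> tuple[bytes, bytes]:
    """Split the padded ciphertext, recover the new key using the previous key,
    then decrypt the message (Section 4.4)."""
    intermediate, encrypted_key = final_ct[:-len(prev_key)], final_ct[-len(prev_key):]
    new_key = bytes(e ^ p for e, p in zip(encrypted_key, prev_key))
    return block_decrypt(intermediate, new_key), new_key

k0 = secrets.token_bytes(16)                  # pre-shared initial 128-bit key
ct, _ = send(b"temperature=23C", k0)
message, k1 = receive(ct, k0)                 # receiver now holds k1 for the next message
assert message == b"temperature=23C"
```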
4.5 Discussion
OTP is the suggested remedy, as discussed in the preceding sections, to simplify the design complexity of lightweight encryption methods. If all of its requirements are met, it operates at a high performance level with strong security. NTSA was chosen for this work because it fixed a flaw in the TEA algorithm, the most desirable lightweight encryption technique. However, NTSA continues to employ the same key, which is vulnerable to attack, across all encryption processes. The proposed technique can be used by the limited-resource devices of IoT systems. The simplest and fastest computational operations, XOR, left and right shifts, and modular addition arithmetic, all of which rely on bitwise operations, have been adopted as an effective mechanism for implementing the key exchange procedure.
5 Security analysis
Shift registers, Feistel structures, and substitution-permutation networks are a few examples of the specific structures on which certain ciphers are based. The most frequent threats to Feistel-structure block ciphers, on which this paper depends, are covered below.
5.0.1 Ciphertext-only attack
In this type of attack, the attacker can capture ciphertext and attempt to decrypt it in order to learn more about the plaintext and, if possible, the key. To examine and decrypt ciphertexts, an attacker needs n of them. These attacks have not been successful against current ciphers.
5.0.2 Known-plaintext attack
The plaintext of some ciphertext is known to the attacker. The goal of this attack is to reveal and decrypt the remaining ciphertext blocks using the already-known information, which may reveal the key.
5.0.3 Chosen-plaintext attack
Despite being the least realistic, this form of attack is potent. With this technique, the key used to encrypt data is determined by measuring changes in the ciphertext for chosen plaintexts.
5.0.4 Chosen-ciphertext attack
The chosen-ciphertext attack also includes a chosen-plaintext element, in which ciphertexts are decrypted with a particular key. Even when this type of attack is combined with a chosen-plaintext attack, it is not very practical.
5.0.5 Differential cryptanalysis
This is the typical technique for attacking cryptographic algorithms. Linear cryptanalysis is thought to be more practical in everyday life, since it uses a known-plaintext attack instead of the chosen-plaintext attack used by the usual differential cryptanalysis method. Differential cryptanalysis, in particular, examines pairs of ciphertexts whose plaintexts have distinct differences, and studies how these differences change as the plaintexts move through the rounds of the encryption algorithm when they are encrypted with the same key. As long as the two plaintexts satisfy a specific (fixed) difference, they can be selected at random. Various probabilities are then assigned to candidate keys based on the variations in the generated ciphertexts, and one key becomes more and more obvious as the most likely correct key as more and more ciphertexts are studied.
5.0.6 Related-key attack
This attack is comparable to differential cryptanalysis, but it is focused on key differences. Without knowing the actual keys, the approach focuses on the relationship between a pair of keys. It uses plaintext encryption with both the real key K and some derived keys, as well as a straightforward link between subkeys in neighbouring rounds. The method for changing the keys must be specified; it may involve flipping key bits while concealing the true key. The TEA's issue, which is caused by weak key scheduling and a weak mixing component of the round function, is resolved by the NTSA. The TEA technique can be broken by a related-key attack using 2^23 chosen plaintexts, especially if the key is weak (38). How the NTSA defends against the related-key attack becomes clear from the way it was designed.
5.0.7 Keys equivalence
If two keys, k1 and k2, produce the same ciphertext after encrypting the same plaintext, then they are equivalent. This relationship is written Ek1(P) = Ek2(P), where E is the encryption function, P is the plaintext, and k1, k2 ∈ K, with K the key space. The relation partitions K into classes such that equivalent keys k1 and k2 are members of the same class. To make this argument more concrete, mathematical equations can be used to demonstrate TEA's susceptibility, as shown in (39).
5.0.8 Resistance of NTSA against related-key attack
The NTSA uses the same round function as the TEA algorithm, with one modification to improve the key schedule procedure. In the NTSA, the extract() function is called after each round, and it dynamically returns a value from an array. Thirty-three separate 32-bit values obtained from the 128-bit key fill up this array.
5.0.9 The proposed mechanism security analysis against related-key attack
A great security level and high performance in terms of space and time complexity are coupled in the suggested method by incorporating the OTP technique. As long as its requirements are met, the OTP, a conventional but nonetheless powerful cipher, can withstand quantum computers (40). Despite using the same round function as the TEA, the suggested approach is more secure than NTSA.
6 Results
6.1 Execution time
This analysis shows the encryption and decryption execution times for both algorithms, measured in milliseconds as a function of the number of cycles. The number of bits encrypted and decrypted using the 128-bit key serves as a measure of the data size, and each cycle consists of two rounds. Tables and bar charts are used to show the results. With data blocks of 64, 128, 192, and 256 bits, the encryption and decryption functions are executed for 8, 16, and 32 cycles to obtain the execution-time tests. These categories illustrate how the performance of the suggested mechanism and of NTSA is affected by the number of rounds and the size of the data block.
6.1.1 Execution time of encryption process
The encryption function in the suggested technique requires three inputs: a 64-bit block of data, a previous key with a 128-bit size, and a 128-bit fresh key that is generated before each new encryption begins. In NTSA, it takes two parameters: a 64-bit plaintext block and a 128-bit key. For the same number of rounds (16 rounds) and various block sizes, Figure 8 and Table 2 show that the execution time of NTSA increases by roughly 0.828 ms per block-size step, whereas the suggested algorithm runs more quickly, its runtime increasing by about 0.544 ms for every increase in block size. In other words, the suggested algorithm outperforms the NTSA by about 50%.
Figure 8: Encryption time for 8 cycles in ms.
Table 2: 8-cycle encryption process results in milliseconds
Algorithm | 64 Bits | 128 Bits | 192 Bits | 256 Bits
NTSA | 0.802 | 1.640 | 2.462 | 3.285
Proposed | 0.530 | 1.081 | 1.617 | 2.161
6.2 Execution time of decryption process
While the proposed algorithm's ciphertext contains both the encrypted data and the new key, the NTSA's decryption function requires 64-bit ciphertext and 128-bit key parameters. The execution time increment rate for both algorithms to complete the decryption function in 8 cycles, or 16 rounds, is shown in Figure 9 and Table 3.
The proposed technique and the NTSA have slightly different runtimes for the encryption and decryption operations under the eight-cycle class.
Figure 9: Decryption time for 8 cycles in ms.
Table 3: 8-cycle decryption process results in milliseconds
Algorithm | 64 Bits | 128 Bits | 192 Bits | 256 Bits
NTSA | 0.805 | 1.634 | 2.445 | 3.261
Proposed | 0.558 | 1.024 | 1.533 | 2.045
6.3 Memory occupation
Memory occupation is reported in bytes. In this paper, memory usage is calculated using the SRAM memory occupied at execution time and the flash memory used for storing code. Figures 10 and 11 show that the NTSA uses 7% of flash memory to store the algorithm, which is 2546 B, and 16%, or 340 bytes of the 2 KB available, to store the global variables in SRAM for encrypting and decrypting a 64-bit plaintext with a 128-bit key. Because the NTSA's code file has two routines, encryption and decryption, as well as one function to retrieve the array's contents, it uses less flash memory than the suggested approach; in contrast, the code file for the proposed approach contains the encryption and decryption procedures as well as the key generation function. In comparison, the NTSA employs an array holding 33 32-bit subkeys during runtime, which requires more SRAM capacity, whereas the suggested approach employs an array holding the six 32-bit values that constitute the final ciphertext.
Figure 10: Memory occupations.
Figure 11: Global variable memory occupation.
6.4 Energy consumption
A device with limited resources and low energy usage lasts longer on its battery. As shown in Figure 12, the following equipment should be available to conduct this experiment and assess the power consumption of an 8-bit microcontroller Arduino Uno board (MCU) running the proposed algorithm and its NTSA equivalent: (1) a multimeter to measure the voltage and the current; (2) jumper wires; (3) a banana plug to crocodile clip cable; (4) a DC barrel jack adapter (female to screw terminals); and (5) a power source, either a 9 V battery with a 9 V battery connector to the Arduino DC jack or a wall power supply (5 V, 2 A).
Figure 12: Tools used.
The procedures that follow explain how to set up the necessary equipment for this experiment:
– Prepare the multimeter by inserting the red probe of the banana-crocodile cable into the mAV port to measure the voltage and the black probe of the cable into the COM port to measure the current.
– The multimeter's dial should be set to A for current and V for voltage.
– Connect the red probe from the multimeter to the (+) end of the power supply and the black probe to the Vin port on the Arduino Uno board using the DC barrel jack adapter. Lastly, attach the (−) end of the power supply to the GND port on the Arduino Uno board. This circuit has a series connection.
In this experiment, the Arduino Uno board is powered by a 9-volt battery to measure the energy required by both algorithms. Using the multimeter, the reading of the current flowing through the Arduino Uno board was captured. The voltage is 5 V, and the current is 20 mA (0.02 A). Table 4 shows the energy usage for the encryption process for various categories of data sizes with fewer than 64 rounds; a short worked check of how such energy figures follow from the measured voltage, current, and runtime is given below.
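For orientation, the joule figures in Table 4 are consistent with treating energy as power × time, with power = measured voltage × measured current; interpreting the measurement this way is an assumption based on the setup described above. A minimal check:

```python
VOLTAGE = 5.0    # volts, as measured on the Arduino Uno supply
CURRENT = 0.020  # amperes (the measured 20 mA)

def energy_joules(runtime_s: float) -> float:
    """Energy = power x time, with power = voltage x current for this series circuit."""
    return VOLTAGE * CURRENT * runtime_s

# Example: a runtime of about 12.95 ms at 0.1 W gives roughly the 0.001295 J
# reported for NTSA on 256-bit data.
print(energy_joules(0.01295))
```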
The results in Table 4 demonstrate that the suggested method provides significant optimization in terms of power usage compared to the NTSA.
Table 4: The energy consumption for the encryption process (in Joules)
Algorithm | 64 Bits | 128 Bits | 192 Bits | 256 Bits
NTSA | 0.0003192 | 0.00065 | 0.000972 | 0.001295
Proposed | 0.000096 | 0.000193 | 0.00029 | 0.000388
7 Conclusion and future work
Traditional cryptographic algorithms are not suitable for IoT devices due to their inherent limitations in terms of processing power, memory, storage, and energy. However, the ongoing development of lightweight cryptography will continue to produce suitable lightweight cryptographic mechanisms to meet these requirements. Consequently, this research suggests a mechanism that incorporates the OTP technique into the NTSA in order to take advantage of the high security level and high performance offered by the OTP and the easy implementation offered by the block cipher, combining them into one lightweight cryptographic algorithm that can be implemented on IoT devices easily and effectively. The first research goal was accomplished by integrating the OTP technique into NTSA in order to increase security. The data was encrypted using new random keys generated by the MPU6050 sensor, and the final ciphertext was created by padding in the key bits in order to share the newly generated key. The experiments reported in the results section demonstrate that the suggested mechanism offers a greater security level and higher performance in terms of speed, reduced memory utilization, and lower energy consumption, which is relevant to the second study objective. The encryption and decryption runtimes show that NTSA is 70% slower than the suggested technique. NTSA uses 16% of SRAM, compared to 12% for the suggested method. In terms of security, the proposed technique offers a security complexity of 2^134, compared to the 2^128 security complexity offered by NTSA. The proposed algorithm uses 0.000388 Joules of energy, whereas NTSA uses about 0.0013 Joules, i.e., the proposed approach consumes roughly 70% less energy than NTSA.
References
[1] S. S. Oyewobi, K. Djouani, and A. M. Kurien, "Visible light communications for internet of things: Prospects and approaches, challenges, solutions and future directions," Technologies, vol. 10, no. 1, p. 28, 2022. [Online]. Available: https://doi.org/10.3390/technologies10010028
[2] M. A. Al-Shareeda, M. Anbar, I. H. Hasbullah, and S. Manickam, "Survey of authentication and privacy schemes in vehicular ad hoc networks," IEEE Sensors Journal, vol. 21, no. 2, pp. 2422–2433, 2020. [Online]. Available: https://doi.org/10.1109/JSEN.2020.3021731
[3] I. H. Sarker, A. I. Khan, Y. B. Abushark, and F. Alsolami, "Internet of things (iot) security intelligence: a comprehensive overview, machine learning solutions and research directions," Mobile Networks and Applications, pp. 1–17, 2022. [Online]. Available: https://doi.org/10.1007/s11036-022-01937-3
[4] A. E. Omolara, A. Alabdulatif, O. I. Abiodun, M. Alawida, A. Alabdulatif, H. Arshad et al., "The internet of things security: A survey encompassing unexplored areas and new insights," Computers & Security, vol. 112, p. 102494, 2022. [Online]. Available: https://doi.org/10.1016/j.cose.2021.102494
[5] I. Ashraf, Y. Park, S. Hur, S. W. Kim, R. Alroobaea, Y. B. Zikria, and S. Nosheen, "A survey on cyber security threats in iot-enabled maritime industry," IEEE Transactions on Intelligent Transportation Systems, 2022. [Online]. Available: https://doi.org/10.1109/TITS.2022.3164678
[6] M. A.
Al-Shareeda, M. Anbar, S. Manickam, and A. A. Yassin, “Vppcs: Vanet-based privacypreserving communication scheme,” IEEE Access, vol. 8, pp. 150 914–150 928, 2020. [Online]. Available: https://doi.org/10.1109/ACCESS. 2020.3017018 [7] T. Alam, “A reliable communication framework and its use in internet of things (iot),” CSEIT1835111| Received, vol. 10, pp. 450–456, 2018. [Online]. Available: https://doi.org/10.36227/ TECHRXIV.12657158.V1 [8] M. A. Al-shareeda, M. Anbar, S. Manickam, I. H. Hasbullah, N. Abdullah, M. M. Hamdi, and A. S. Al-Hiti, “Ne-cppa: A new and efficient conditional privacy-preserving authentication scheme for vehicular ad hoc networks (vanets),” Appl. Math, vol. 14, no. 6, pp. 1–10, 2020. [Online]. Available: https://doi.org/10.3390/s21248206 [9] M. Salimitari, M. Chatterjee, and Y. P. Fallah, “A survey on consensus methods in blockchain Enhancement of NTSA Secure Communication… for resource-constrained iot networks,” Internet of Things, vol. 11, p. 100212, 2020. [Online]. Available: https://doi.org/10.36227/techrxiv.12152142 Informatica 47 (2023) 1–10 9 IEEE, 2019, pp. 0475–0481. [Online]. Available: https://doi.org/10.1109/CCWC.2019.8666557 [10] S. Misra, A. Mukherjee, A. Roy, N. Saurabh, Y. Rahulamathavan, and M. Rajarajan, “Blockchain at the edge: Performance of resource-constrained iot networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 1, pp. 174–183, 2020. [Online]. Available: https://doi.org/10.1109/ TPDS.2020.3013892 [18] M. A. Al-Shareeda, S. Manickam, B. A. Mohammed, Z. G. Al-Mekhlafi, A. Qtaish, A. J. Alzahrani, G. Alshammari, A. A. Sallam, and K. Almekhlafi, “Provably secure with efficient data sharing scheme for fifth-generation (5g)-enabled vehicular networks without road-side unit (rsu),” Sustainability, vol. 14, no. 16, p. 9961, 2022. [Online]. Available: https: //doi.org/10.3390/su14169961 [11] V. Tambe, G. Bansod, S. Khurana, and S. Khandekar, “Reliability and availability of iot devices in resource constrained environments,” International Journal of Quality & Reliability Management, 2022. [Online]. Available: https: //doi.org/10.1108/IJQRM-09-2021-0334 [19] M. Rana, Q. Mamun, and R. Islam, “Current lightweight cryptography in iot security: A survey,” in Extended Abstracts. Charles Sturt University, 2020, p. 27. [Online]. Available: https: //researchoutput.csu.edu.au/ws/portalfiles/portal/ 100690557/SCM_HDR_Booklet_2020.pdf#page=27 [12] M. A. Al-Shareeda, M. Anbar, S. Manickam, and I. H. Hasbullah, “Password-guessing attack-aware authentication scheme based on chinese remainder theorem for 5g-enabled vehicular networks,” Applied Sciences, vol. 12, no. 3, p. 1383, 2022. [Online]. Available: https://doi.org/10.3390/app12031383 [20] M. A. F. Al-Husainy, B. Al-Shargabi, and S. Aljawarneh, “Lightweight cryptography system for iot devices using dna,” Computers and Electrical Engineering, vol. 95, p. 107418, 2021. [Online]. Available: https://doi.org/10.1016/j.compeleceng.2021.107418 [13] M. A. Al-shareeda, M. A. Alazzawi, M. Anbar, S. Manickam, and A. K. Al-Ani, “A comprehensive survey on vehicular ad hoc networks (vanets),” in 2021 International Conference on Advanced Computer Applications (ACA). IEEE, 2021, pp. 156–160. [Online]. Available: http://doi.org/10.1109/ ACA52198.2021.9626779 [14] P. Panahi, C. Bayılmış, U. Çavuşoğlu, and S. Kaçar, “Performance evaluation of lightweight encryption algorithms for iot-based applications,” Arabian Journal for Science and Engineering, vol. 46, no. 4, pp. 4015–4037, 2021. [Online]. 
Available: https://doi.org/10.1007/s13369-021-05358-4 [15] M. A. Al-Shareeda, M. Anbar, I. H. Hasbullah, S. Manickam, and S. M. Hanshi, “Efficient conditional privacy preservation with mutual authentication in vehicular ad hoc networks,” IEEE Access, vol. 8, pp. 144 957–144 968, 2020. [Online]. Available: https://doi.org/10.3390/su14169961 [16] M. A. Al-Shareeda, S. Manickam, M. A. Saare, and N. C. Arjuman, “Proposed security mechanism for preventing fake router advertisement attack in ipv6 link-local network,” Indones. J. Electr. Eng. Comput. Sci, vol. 2023, no. 29, pp. 518–526, 2023. [Online]. Available: https://doi.org/10.11591/ijeecs. v29.i1.pp518-526 [17] I. K. Dutta, B. Ghosh, and M. Bayoumi, “Lightweight cryptography for internet of insecure things: A survey,” in 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC). [21] M. A. Al-Shareeda and S. Manickam, “Man-in-themiddle attacks in mobile ad hoc networks (manets): Analysis and evaluation,” Symmetry, vol. 14, no. 8, p. 1543, 2022. [Online]. Available: https://doi.org/ 10.3390/sym14081543 [22] V. A. Thakor, M. A. Razzaque, and M. R. Khandaker, “Lightweight cryptography algorithms for resourceconstrained iot devices: A review, comparison and research opportunities,” IEEE Access, vol. 9, pp. 28 177–28 193, 2021. [Online]. Available: https: //doi.org/10.1109/ACCESS.2021.3052867 [23] S. Banani, S. Thiemjarus, K. Wongthavarawat, and N. Ounanong, “A dynamic light-weight symmetric encryption algorithm for secure data transmission via ble beacons,” Journal of Sensor and Actuator Networks, vol. 11, no. 1, p. 2, 2021. [Online]. Available: https://doi.org/10.3390/jsan11010002 [24] W. Diaztary, D. Atmajaya, F. Umar, S. M. Abdullah et al., “Tiny encryption algorithm on discrete cosine transform watermarking,” in 2021 3rd East Indonesia Conference on Computer and Information Technology (EIConCIT). IEEE, 2021, pp. 415–420. [Online]. Available: https://doi.org/10.1109/EIConCIT50028. 2021.9431930 [25] F. Ramadhani, U. Ramadhani, and L. Basit, “Combination of hybrid cryptography in one time pad (otp) algorithm and keyed-hash message authentication code (hmac) in securing the whatsapp communication application,” Journal of Computer Science, Information Technology and Telecommunication Engineering, vol. 1, no. 1, pp. 31–36, 2020. [Online]. Available: https://doi.org/10.30596/jcositte.v1i1.4359 10 Informatica 47 (2023) 1–10 [26] A. Sarkar, S. R. Chatterjee, and M. Chakraborty, “Role of cryptography in network security,” in The” Essence” of Network Security: An Endto-End Panorama. Springer, 2021, pp. 103– 143. [Online]. Available: https://doi.org/10.1007/ 978-981-15-9317-8_5 [27] V. B. Savant and R. D. Kasar, “A review on network security and cryptography,” Research Journal of Engineering and Technology, vol. 12, no. 4, pp. 110–114, 2021. [Online]. Available: https://doi.org/ 10.12691/iteces-3-1-1 [28] S. Bourougaa-Tria, F. Mokhati, H. Tria, and O. Bouziane, “Spubbin: Smart public bin based on deep learning waste classification an iot system for smart environment in algeria,” Informatica, vol. 46, no. 8, 2022. [Online]. Available: https://doi.org/10.31449/inf.v46i8.4331 [29] M. A. Al-Shareeda, S. Manickam, B. A. Mohammed, Z. G. Al-Mekhlafi, A. Qtaish, A. J. Alzahrani, G. Alshammari, A. A. Sallam, and K. Almekhlafi, “Cm-cppa: Chaotic map-based conditional privacy-preserving authentication scheme in 5g-enabled vehicular networks,” Sensors, vol. 22, no. 13, p. 5026, 2022. [Online]. 
Available: https://doi.org/10.3390/s22135026 [30] H. Kaur and A. Kaur, “An empirical study of aging related bug prediction using cross project in cloud oriented software,” Informatica, vol. 46, no. 8, 2022. [Online]. Available: https://doi.org/10.31449/ inf.v46i8.4197 [31] K. McKay, L. Bassham, M. Sönmez Turan, and N. Mouha, “Report on lightweight cryptography,” National Institute of Standards and Technology, Tech. Rep., 2016. [Online]. Available: https://nvlpubs.nist. gov/nistpubs/ir/2017/NIST.IR.8114.pdf [32] M. A. Al-Shareeda, S. Manickam, B. A. Mohammed, Z. G. Al-Mekhlafi, A. Qtaish, A. J. Alzahrani, G. Alshammari, A. A. Sallam, and K. Almekhlafi, “Chebyshev polynomial-based scheme for resisting side-channel attacks in 5genabled vehicular networks,” Applied Sciences, vol. 12, no. 12, p. 5939, 2022. [Online]. Available: https://doi.org/10.3390/app12125939 [33] S. Rajesh, V. Paul, V. G. Menon, and M. R. Khosravi, “A secure and efficient lightweight symmetric encryption scheme for transfer of text files between embedded iot devices,” Symmetry, vol. 11, no. 2, p. 293, 2019. [Online]. Available: https://doi.org/10.3390/sym11020293 [34] M. A. Al-Shareeda, S. Manickam, B. A. Mohammed, Z. G. Al-Mekhlafi, A. Qtaish, A. J. Alzahrani, G. Alshammari, A. A. Sallam, A.H.A. Alattas et al. and K. Almekhlafi, “Cm-cppa: Chaotic mapbased conditional privacy-preserving authentication scheme in 5g-enabled vehicular networks,” Sensors, vol. 22, no. 13, 2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/13/5026 [35] H. Ran, “Methodology for interval-valued intuitionistic fuzzy multiple attribute decision making and applications to performance evaluation of sustainable microfinance groups lending,” Informatica, vol. 46, no. 8, 2022. [Online]. Available: https://doi.org/10.31449/inf.v46i8.4355 [36] S. Nie, “Evaluation of innovative design of clothing image elements using image processing,” Informatica, vol. 46, no. 8, 2022. [Online]. Available: https://doi.org/10.31449/inf.v46i8.4250 [37] M. Matsui, “Linear cryptanalysis method for des cipher,” in Workshop on the Theory and Application of of Cryptographic Techniques. Springer, 1993, pp. 386–397. [Online]. Available: https://doi.org/10. 1007/3-540-48285-7_33 [38] M. Shoeb and V. K. Gupta, “A crypt analysis of the tiny encryption algorithm in key generation,” International Journal of communication and computer Technologies, vol. 1, no. 1, pp. 9–9, 2019. [Online]. Available: https://www.bibliomed.org/?mno=302643835 [39] R. Beaulieu, D. Shors, J. Smith, S. TreatmanClark, B. Weeks, and L. Wingers, “Simon and speck: Block ciphers for the internet of things,” Cryptology ePrint Archive, 2015. [Online]. Available: https://csrc.nist.gov/csrc/media/ events/lightweight-cryptography-workshop-2015/ documents/papers/session1-shors-paper.pdf [40] B. Singh, G. Athithan, and R. Pillai, “On extensions of the one-time-pad,” Cryptology ePrint Archive, 2021. [Online]. Available: https://eprint.iacr.org/2021/298. pdf https://doi.org/10.31449/inf.v47i1.4519 Informatica 47 (2023) 11–20 11 Predicting Students Performance Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection Sadri Alija 1, Edmond Beqiri 2*, Alaa Sahl Gaafar 3, Alaa Khalaf Hamoud 4 1 Faculty of Business and Economics, South East European University, North Macedonia. 2 University of Peja “Haxhi Zeka” – Peja, Kosovo. 3 Department of Educational Planning, Directorate of Education in Basrah, Iraq. 4 Department of Computer Information Systems, University of Basrah, Iraq. 
Email: s.aliji@seeu.edu.mk, edmond.beqiri@unhz.eu, alaasy.2040@gmail.com, alaa.hamoud@uobasrah.edu.iq. Keywords: supervised machine learning, feature selection, wrapper, particle swarm optimization, info gain, SMOTE Received: November 14, 2022 For learning environments like schools and colleges, predicting the performance of students is one of the most crucial topics, since it aids in the creation of practical systems that, among other things, promote academic performance and prevent dropout. The decision-makers and stakeholders in educational institutions always seek tools that help in predicting the number of failed courses for their students; such tools can help in finding and investigating the factors that led to this failure. In this paper, several supervised machine learning algorithms are investigated in order to find the optimal algorithm for predicting the number of failed courses of students. The imbalanced dataset is handled with the Synthetic Minority Over-sampling TEchnique (SMOTE) to obtain an equal representation of the final class. Two feature selection approaches are implemented to find the approach that produces the most accurate prediction: a wrapper with Particle Swarm Optimization (PSO) is applied to find the optimal subset of features, and Info Gain with a ranker is applied to obtain the individual features most correlated with the final class. Several supervised algorithms are implemented, namely Naïve Bayes, Random Forest, Random Tree, C4.5, LMT, Logistic, and the Sequential Minimal Optimization algorithm (SMO). The findings show that the wrapper filter with PSO combined with SMOTE outperforms the Info-Gain filter with SMOTE and improves the performance of the algorithms. Random Forest outperforms the other supervised machine learning algorithms with 85.6% in average TP rate and Recall, and 96.7% in the ROC curve. Povzetek: Opisana je metoda za napovedovanje uspeha študentov s pomočjo strojnega učenja. 1 Introduction High-quality universities always require a great record of their students, and the students are their main resource. The main concern for universities is the performance of their students, which is the cornerstone for producing the top-rate graduates and postgraduate students who will be the leaders of nations and take responsibility for the economic and social growth of society. Moreover, the main concern for market employers is the performance of universities and the students' academic performance, due to its direct effect on the employment process and, in turn, on employee productivity. The employers' demands are therefore met by graduates who exert effort in their academic journey. Student performance is measured by learning assessment and the curriculum, according to Usamah et al. [1]. It is frequently important to be able to predict the behavior of future students in order to enhance the design of the curriculum and prepare interventions for academic guidance and support. Machine learning (ML) is useful in this situation. ML approaches examine datasets, extract information, and then organize that information for eventual use. The primary goals of ML are to identify and extract patterns from recorded data by using a variety of techniques and algorithms [2]. Numerous algorithms exist and are used with educational data, including Decision Tree (DT), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Neural Network (NN) classifiers.
Such algorithms forecast patterns, upcoming trends, and behaviors, enabling institutions to make informed, proactive decisions. This paper's major goal is to predict student performance using supervised ML based on an imbalanced dataset and wrapper feature selection. The following section sheds light on related previous studies, followed by the methodology, the concluding points, and future work.
2 Literature review
The concept of data mining techniques can be implemented and applied in the educational field to improve our comprehension of the learning process, with a particular emphasis on the identification, extraction, and evaluation of factors linked to students' learning processes [3]. ML algorithms enable users to categorize and summarize associations discovered throughout the mining process as well as examine data from different perspectives. Bhardwaj and Pal [4] explore the performance of students by taking a sample of 300 undergraduate students' raw records from the department of computer applications of different institutions in Dr. R. M. L. Awadh University, India. Bayesian classifiers are utilized on 17 features, and the researchers found a strong correlation between student performance and other factors such as living location, the academic background of the mother, the senior secondary exam, the status, and the annual income of the student's family. Next, in the same university, Pandey and Pal [5] selected 600 students to implement a model based on a Bayes classifier to classify background qualification, category, and language. Hijazi and Naqvi [6] selected 300 students (75 female and 225 male) from different colleges of Punjab University, Pakistan, to explore and investigate student performance. Based on linear regression, they found many factors that affected the students' performance, such as the attitude toward the class they attend, the time spent studying after college, the mothers' ages, the income of their families, and the educational level of their mothers (by which performance is strongly affected). Khan [7] explored performance by building a model based on a clustering approach using 400 rows of student data from Aligarh Muslim University's senior secondary school in Aligarh, India. The main goal of the study was to determine the predictive value of different measures, such as personality, cognition, and demographic variables, that affect success at the higher secondary level. The study found that females with high socioeconomic status scored higher performance, whereas males with low socioeconomic status had higher performance in the science stream. In the next case study [8], Kovacic implemented a data mining model on educational enrollment data in New Zealand to predict the performance of students. Chi-square automatic interaction detection (CHAID) and Classification and Regression Tree (CART) algorithms are utilized to categorize successful and failed students.
The algorithms did not produce promising accuracies, predicting the results with accuracies of 59.4% and 60.5%, respectively. Another case study was carried out by Galit [9], where learning behavior was examined to predict students' outcomes and to alert students to a critical status before the final exam. The final study [10] was proposed by Al-Radaideh, where a model was implemented to predict students' final grades in a C++ course for students enrolled at Yarmouk University, Jordan, in 2005. NB and DT (ID3 and C4.5) classifiers were utilized to predict the grades, and the DT outperformed the NB in prediction. In our proposed model, the problem of the imbalanced dataset is handled, and the effect of handling this problem is observed by implementing different machine learning algorithms (supervised and unsupervised). The effect of handling the imbalanced dataset is also observed under feature selection, which has a direct effect on the resulting accuracies.
3 Methodology
The model implementation framework is depicted in Figure 1; it consists of five steps, starting with data preprocessing and ending with model evaluation. The attribute feature selection (FS) step is implemented with both a single-attribute FS and a subset FS to find the effect of each on the resulting accuracies. The SMOTE filter is then applied, followed by the implementation of the supervised ML algorithms.
Figure 1: Model framework
3.1 Dataset reliability
A questionnaire was adopted in this study to build the model; Google Forms was used to create the questionnaire and collect undergraduate students' answers from both the Faculty of Contemporary Sciences and Technologies (CST) and the Faculty of Business and Economics (FBE) at South East European University (SEEU) in North Macedonia (RNM). The aim of this study is to find the optimal DT in predicting student performance based on the conceptual framework implemented by the researchers in [11]. The aim of that framework is to find the hidden patterns that may affect and correlate with the performance of the students and to provide suggestions to enhance and improve the performance. The questionnaire contains questions related to many factors, such as academic behavior, health, finance, time planning, self-development, social relationships, and achieving goals. The questionnaire in [11] lists the factors and the questions related to each factor, where the answer to most of the questions is on a 5-point Likert scale (from 1 to 5) representing the formal answers (from "Strongly Disagree" to "Strongly Agree"). The dataset of the questionnaire involves 141 rows of respondents. Dataset reliability is required to measure the overall consistency of the dataset; a measure of reliability, which describes consistency, can be confirmed to be at a high level if it produces similar results under consistent conditions. The most frequent such measure in statistics is the coefficient alpha, which is used to calculate the internal consistency of the independent variables of the study. The coefficient alpha for the dataset is 0.93. This value indicates an excellent internal consistency of the dataset [12][13]. The applied tool for this model is Weka 3.8.5, and the system specifications are 8 GB RAM, 35.5 GB free disk space, and Windows 7 Pro.
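The coefficient alpha of 0.93 reported above can be computed directly from the respondents-by-items answer matrix. A minimal sketch of the standard coefficient (Cronbach's) alpha formula; the random matrix below only demonstrates the call and does not reproduce the questionnaire data or the exact tool used in the paper:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient (Cronbach's) alpha for a respondents x items matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Demonstration call on a random 141 x 58 matrix of 5-point Likert answers;
# random answers give a low alpha, whereas the real questionnaire yielded 0.93.
rng = np.random.default_rng(0)
print(round(cronbach_alpha(rng.integers(1, 6, size=(141, 58)).astype(float)), 2))
```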
Table 1: Dataset reliability
Number of Respondents | Number of Features | Coefficient's Alpha | % of Respondents
141 | 58 | 0.93 | 100%
3.2 Feature selection (FS)
The FS approach can be considered a form of data reduction, where features are reduced and only the correlated features remain. The main goal of FS methods is to find the optimal subset of features, or the highly correlated features, that have a direct effect on (or may affect) the final class(es). Due to the number of attributes in our dataset (57), it is required to find the most correlated attributes or features that can be utilized in the next steps to get more accurate classification results [14]. Two approaches are applied in our model: the wrapper with Particle Swarm Optimization (PSO), and the Info-Gain attribute evaluator.
• Wrapper method
The wrapper method evaluates a subset of attributes according to the classifier performance, for both supervised algorithms (such as DT, SVM, and NB) and unsupervised algorithms (such as clustering). For each subset, the evaluation process is repeated, while the search strategy determines the subset generation. The wrapper method is slower than the filter approach in finding good subsets because it depends on the resource demands of the modeling algorithm. Because it uses real modeling algorithms, the wrapper method has been proven empirically to produce better feature subsets [15].
• Particle swarm optimization (PSO)
Kennedy and Eberhart proposed PSO in 1995 as one of the evolutionary computation techniques based on social behavior such as fish schooling and bird flocking. The basic idea behind PSO is that the population's social interaction optimizes knowledge, where the thinking is both personal and social. Solutions are represented by particles, and particles are represented by vectors that have positions in the search space; each vector is x_i = (x_i1, x_i2, ..., x_iD), where D is the search space dimensionality. To search for the optimal solutions, the particles move in the search space. Accordingly, each particle has a velocity v_i = (v_i1, v_i2, ..., v_iD). The particle updates its location and velocity during the movement, and this update is performed according to its neighbors and its own experience. Two positions are recorded: pbest, the best previous personal position of the particle, and gbest, the best position obtained by the population. The following equations are used to update the position and velocity:
x_id^(t+1) = x_id^t + v_id^(t+1)    (1)
v_id^(t+1) = w * v_id^t + c1 * r1 * (p_id − x_id^t) + c2 * r2 * (p_gd − x_id^t)    (2)
where t is the t-th iteration in the evolutionary process and d represents the d-th dimension in the search space, with d ∈ {1, ..., D}. The inertia weight w controls the impact of the previous velocity on the current velocity. c1 and c2 are acceleration constants, r1 and r2 are random values in the range (0, 1), and p_id and p_gd represent the elements of pbest and gbest, respectively, in dimension d. vmax is the maximum velocity, with v_id^(t+1) ∈ [−vmax, vmax]. The algorithm stops when the predefined fitness criterion is met with a good fitness value or after a predefined maximum number of iterations [16][17].
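A minimal sketch of one particle update per Equations (1) and (2); the inertia weight, acceleration constants, and vmax values shown are illustrative and are not the settings used in the paper's Weka wrapper search:

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, vmax=4.0, rng=None):
    """One velocity/position update for all particles.
    x, v, pbest: arrays of shape (n_particles, n_dims); gbest: shape (n_dims,)."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Equation (2)
    v = np.clip(v, -vmax, vmax)                                  # keep v in [-vmax, vmax]
    return x + v, v                                              # Equation (1)
```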
• Info gain

In this feature selection evaluator, the information of each class is estimated in order to evaluate the attribute. The method used in this evaluator is minimum-description-length-based discretization, where the attributes are binarized or discretized. In this method, missing values are either regarded as separate values or distributed among the other values according to their frequencies. When the value of the feature is absent, the decrease in entropy is measured. For multiclass attributes, the InfoGain evaluator has reported the best performance. The generalized form of the nominal values is taken from the nominal attribute. Info gain is measured by the decrease of the entropy of X caused by Y, which is represented by:

IG(X|Y) = H(X) - H(X|Y)    (3)

According to this measurement, feature Y is considered more correlated to feature X than to feature Z if IG(X|Y) > IG(Z|Y). IG normalizes the values to fall within the range (0, 1), where a value of 1 indicates that the predicted value is completely correct and a value of 0 indicates that feature X is independent of feature Y. For nominal and continuous features, the entropy can be applied in order to determine the correlation between continuous and nominal features [18][19][20][21]. A small computational sketch of this measure is given at the end of this subsection.

The Wrapper filter with PSO is applied to find and explore the most correlated subsets of features, i.e., those that yield the most accurate results for each supervised algorithm. Wrapper, as a subset-of-attributes evaluator, is applied for each supervised classifier individually. In this step, different subsets of features are found for each classifier, where PSO is selected as the search method to improve the speed of the search for feature subsets. In order to observe the effect of the wrapper evaluator, the Info Gain evaluator is also applied to find the features with high correlations with the final class and to find how wrapper and Info Gain affect the resulting accuracies of the algorithms. Table 2 shows the most correlated features (subsets) after applying wrapper with PSO for each algorithm, and Info Gain with Ranker.

Table 2: Selection of attributes
Feature evaluator                     Attributes
Wrapper (Random Forest) with PSO      1,5,6,7,8,9,10,12,13,14,16,17,18,27,33,36,44,49,52,53,56
Wrapper (NB) with PSO                 5,8,14,18,25,31,42,48
Wrapper (Logistic) with PSO           2,4,5,6,11,13,17,31,35,48,51,52,53,54,57
Wrapper (Simple Logistic) with PSO    1,4,5,6,8,9,11,15,17,23,26,27,28,31,32,34,42,44,46,50,52,53,55
Wrapper (SMO) with PSO                4,5,14,15,17,24,31,32,35,42,45,47,54,55,56,57
Wrapper (LMT) with PSO                1,2,4,5,6,7,8,9,11,14,15,17,19,20,21,23,25,26,27,28,32,34,41,42,44,49,52,53,55
Wrapper (J48) with PSO                5,7,13,22,23,24,26,31,35,42,45,46,52
Wrapper (Random Tree) with PSO        5,15,27,33,35,43,44,45,46,48,49
Info Gain with Ranker                 5,57,19,18,21,17,20,22,15,23,26,25,24,16,14,28,7,4,3,2,6
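The sketch below illustrates Eq. (3) for nominal features. It is a minimal stand-in for the idea behind Weka's Info Gain evaluator, not the tool used in the study, and it omits the discretization of continuous attributes.

import numpy as np
import pandas as pd

def entropy(x: pd.Series) -> float:
    p = x.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def info_gain(x: pd.Series, y: pd.Series) -> float:
    """IG(X|Y) = H(X) - H(X|Y): the reduction in the entropy of X once Y is known."""
    h_x_given_y = sum((len(g) / len(x)) * entropy(g) for _, g in x.groupby(y))
    return entropy(x) - h_x_given_y

# Ranking features against the final class, analogous to Info Gain with Ranker
# (hypothetical column names):
# scores = {col: info_gain(df["final_class"], df[col]) for col in feature_cols}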
3.3 Synthetic minority over-sampling technique (SMOTE)

A dataset is said to be imbalanced if the classes of the final class attribute are not equally represented [22]. If the final class has the values (1, 2, and 3) and the representations of the classes are 10% for 1, 15% for 2, and 75% for 3, then the dataset is imbalanced. Imbalanced datasets are found in almost all sectors, from the medical sector [23] and telecommunications management [24] to fraudulent telephone calls [25] and text classification [26].

The SMOTE approach oversamples the minority classes by creating "synthetic" examples rather than by replicating existing samples. The approach was inspired by, and proved its success in, the recognition of handwritten characters [27]. The generation of synthetic examples operates in the feature space rather than in the data space. The oversampling is performed by taking each sample of a minority class of the final class attribute and introducing new (synthetic) examples along the line segments joining it to its k nearest minority-class neighbors. The neighbors are selected randomly from the k nearest neighbors, according to the amount of oversampling required. A synthetic sample is generated by taking the difference between a sample and one of its neighbors, multiplying this difference by a random number between 0 and 1, and adding the result to the feature vector of the original sample. This process effectively forces the decision region of the minority class to become more general, see Figure 2 [28]. A small sketch of this generation step is given at the end of this subsection.

Figure 2: Comparison of the number of minority correct for replicated oversampling and SMOTE for a dataset [28].

In our imbalanced dataset, the percentage of class representation is shown in Table 3. Class 3 takes only 4.3% of the overall dataset, followed by classes 1 and 2 with 21.3% and 21.9%, respectively. The SMOTE filter in our model is applied to classes 1, 2, and 3 to balance the dataset and to obtain reliable performances of the algorithms. The SMOTE filter is applied to get equal representations of all classes.

Table 3: Classes representation
Class   Number of Rows   % of Representation
0       74               52.5%
1       30               21.3%
2       31               21.9%
3       6                4.3%
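The core of the SMOTE generation step can be sketched in a few lines for numeric feature vectors. This is only an illustration of the mechanism described above, not the Weka SMOTE filter actually used in the model.

import numpy as np

def smote_samples(minority: np.ndarray, n_new: int, k: int = 5,
                  rng=np.random.default_rng(0)):
    """Create n_new synthetic minority samples: pick a minority sample, pick one
    of its k nearest minority neighbours, scale their difference by a random
    factor in [0, 1], and add it to the original sample."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        dist = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)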
3.4 Supervised machine learning (ML)

In the proposed model, many supervised ML algorithms have been implemented to find an accurate algorithm for predicting the number of failed courses of the students. The algorithms fall into the following approaches: decision tree (DT) (Random Forest, Random Tree, LMT, and J48), naïve Bayes (Naïve Bayes and Bayes Net), logistic (Logistic and Simple Logistic), and Support Vector Machine (SMO).

DT is one of the supervised ML approaches that aims to build a training model to be used in predicting the final class attribute [29]. DT classifiers are widely used in different sectors and have proved their accuracy in the fields of education [11], [30][31], healthcare [32], wireless sensor networks [33], image processing [34][35], and disaster management [36][37]. There are many types of DT algorithms; the most used are Random Forest, CART, Iterative Dichotomiser 3 (ID3), and the successor of ID3 (C4.5 or J48) [38][39]. DT is used both for classification (predicting categorical values) and for regression (predicting continuous values) [40]. Random Forest, proposed by L. Breiman in 2001, is a general-purpose regression and classification approach that works on the principle of aggregating predictions by averaging them, and it shows excellent performance when the number of variables is larger than the number of observations [41]. In logistic model trees (LMT), logistic regression is utilized to select the attributes in a natural way by using stage-wise fitting. The logistic model in this approach is built on the leaves by refining them incrementally at higher levels of the tree [42].

SVM is a supervised ML algorithm [43]; it is one of the data-based algorithms used to solve classification problems and is considered one of the most important algorithms for that task [44]. Support Vector Machine follows a support-vector processing approach in which many questions are answered depending on the understanding and knowledge of the problem and how to design it. In the real world, the Support Vector Machine algorithm has been used to solve many problems, including face recognition, detection, hand-line recognition, and others [45]. In order to understand the SVM algorithm, it is necessary to understand its main terminology: the maximum-margin hyperplane, the separating hyperplane, the soft margin, and the kernel function [43]. SVM can be classified into two types: linear SVM and non-linear SVM. Linear SVM is used when the data can be separated into two groups by a straight line; such data are called linearly separable, and the classifier is described as a linear SVM classifier. Non-linear SVM is used when the data cannot be separated linearly, so a straight line cannot be used to separate the data into two categories. To compensate for this, the kernel trick is used, through which the data are mapped into a higher dimension so that they can be separated using some mathematical functions.

Regression is considered a simple type of supervised ML algorithm. These algorithms are widely used to find a relationship between continuous predictor and response variables; regression is a way to measure the relationships between response variables and continuous predictors [46]. An example is the linear regression algorithm, a supervised learning algorithm that models the mathematical relationship between variables. It attempts to find relationships between independent variables (input data) and dependent variables (result and forecast). It predicts continuous or numerical variables by assuming that the relationships between the predictor variables and the target to be reached are linear, as with sales, age, and product price. The regression may be linear or curvilinear; it should pass as close as possible to all data points, so that if the distance between the data points and the regression line is measured, the result is minimal.

To solve classification problems, the logistic regression algorithm was built; it is one of the supervised learning algorithms, where the result is always binary, taking one of two values: either 1 or 0, success or failure, rain or no rain. Its working principle is probability. Logistic regression is used in the analysis of binary outcomes, also described as two-level outcomes or outcomes whose levels are opposite [47]. A characteristic of logistic regression is that its predictions are deterministic and can adapt to multiple predictors. This is necessary for the analysis of observational data when adjustment is useful to avoid differences between the groups to be compared [48]. Logistic regression is used to determine the highest weighting of a variable when there is more than one variable. Thus, it is similar to multiple linear regression, differing in that the response variable is binomial; as a result, each variable is considered to have an impact on the likelihood ratio of any expected event. Hence, it has the advantage that it can avoid confounding influences by analyzing the correlations of all variables at the same time [49].
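To make the working principle concrete, the sketch below shows how a fitted logistic model turns a linear combination of predictors into a probability. The weights are assumed to have been estimated elsewhere (for example, by Weka's Logistic classifier), so this is an illustration, not a training routine.

import numpy as np

def logistic_probability(x: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Probability of the positive class: the linear predictor is squashed into (0, 1)."""
    z = x @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

# A binary prediction (e.g. fail vs. pass) is obtained by thresholding at 0.5:
# y_pred = (logistic_probability(x, w, b) >= 0.5).astype(int)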
NB is considered one of the supervised learning algorithms; it is based on Bayes' rule together with the strong assumption that the attributes are categorically and conditionally independent [50]. It is used for solving classification problems. The algorithm assumes conditional independence of the attributes, which is rarely true in the real world; this has made the competitive performance of the algorithm surprising and has attracted a lot of attention [51]. The Naïve Bayes algorithm is used in a wide range of applications, including article classification and spam filtering. The Naïve Bayes classifier is able to build an ML model that gives fast predictions. The hypothesis states that there is independence between every two features, so the Naïve Bayes classifier computes the probability that a particular instance belongs to a particular class as a product of simple probabilities resulting from the assumed naïve independence. If we assume that the instance is described by a vector x of attributes and the target class is y, then the conditional probability p(y|x) can be expressed as the product of the simple probabilities resulting from the assumed naïve Bayes independence [52].

Bayesian networks are probabilistic models based on directed acyclic graphs. These models encode causal relationships between their variables, and their structure represents the combination of prior knowledge and observed data. They are also called belief networks, as they belong to the probabilistic graphical models, and knowledge in uncertain domains can be represented through their graphical structures. In their graphs, nodes represent random variables, while arrows between nodes (variables) represent probabilistic dependencies. In most cases, generally accepted statistical methods are used to estimate these conditional dependencies. Hence, we can say that Bayesian networks combine graphs and statistics as well as computer science and probability theory [53]. Bayesian networks are also used to perform causal reasoning and to predict risks, and they have many advantages compared with regression methods [54]. One of the products of Bayesian networks is a modeling language together with the inference algorithms associated with random domains. Experiments have shown much success when they are used in medium-sized applications, but if Bayesian networks are used in relatively complex or large domains, the modeling task becomes somewhat similar to programming with logic circuits [55].

3.5 Model evaluation

The evaluation of the algorithms is performed based on the confusion matrix, see Figure 3. The class value True Negative (TN) is the class predicted as (NO) when it is (NO), while False Positive (FP) is the class predicted as (YES) when it is (NO). The False Negative (FN) class value is the class predicted as (NO) when it is (YES), and the True Positive (TP) class value is the class predicted as (YES) when it is (YES).

Figure 3: Confusion matrix.

Based on the above matrix, the performance criteria are:

Sensitivity or TP rate = TP / (TP + FN)    (4)

Specificity or FP rate = FP / (FP + TN)    (5)

Precision = TP / (TP + FP)    (6)

Recall = TP / (TP + FN)    (7)

The sensitivity or recall measures the truly predicted cases and relates TP to FN; the higher the TP rate, the more accurate the predicted cases and the more accurate the classification algorithm. The specificity or FP rate is the false-alarm rate, which measures the incorrectly predicted cases; the higher the FP rate, the more incorrect cases are predicted. The precision represents the relevant cases among the predicted cases [29]–[31].
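The criteria (4)–(7) are simple ratios of the confusion-matrix counts, as the following sketch shows. For the multi-valued final class, Weka reports these measures as weighted averages over the classes, which the sketch does not reproduce.

def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Performance criteria of Eq. (4)-(7) from the confusion-matrix counts."""
    return {
        "tp_rate": tp / (tp + fn),    # sensitivity, identical to recall (7)
        "fp_rate": fp / (fp + tn),    # the false-alarm rate of Eq. (5)
        "precision": tp / (tp + fp),
    }

# classification_metrics(tp=85, fp=5, fn=15, tn=95)
# -> {'tp_rate': 0.85, 'fp_rate': 0.05, 'precision': 0.944...}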
Table 4 lists the performance evaluation of the supervised algorithms after implementing Wrapper with PSO. The algorithms are implemented after removing the uncorrelated features, where the Wrapper base classifier is the supervised algorithm itself; the SMOTE filter is then applied to get equal representations of the classes of the final class. The RF algorithm outperforms the other supervised algorithms with 85.6% in TP rate and Recall, 4.9% in FP rate, and 85.7% in Precision. The C4.5 (J48) algorithm comes in second with 79.6% in TP rate and Recall, 6.7% in FP rate, and 79.6% in Precision. NB comes last with 71.7% in TP rate and Recall, 9.4% in FP rate, and 71.1% in Precision.

Table 4: Algorithms performance after wrapper with PSO
Algorithm         TP Rate   FP Rate   Precision   Recall
LMT               0.766     0.078     0.762       0.766
Random Forest     0.856     0.049     0.857       0.856
Random Tree       0.697     0.100     0.695       0.697
NB                0.717     0.094     0.711       0.717
Logistic          0.727     0.091     0.729       0.727
Simple Logistic   0.773     0.075     0.770       0.773
SMO               0.757     0.081     0.752       0.757
J48               0.796     0.067     0.796       0.796

Table 5 depicts the performance criteria of the supervised ML algorithms after implementing Info Gain. The algorithms are implemented after removing the uncorrelated features (36 features), and the SMOTE filter is then applied to get equal representations of the classes of the final class. The RF algorithm outperforms the other supervised algorithms with 83.6% in TP rate and Recall, 5.4% in FP rate, and 83.5% in Precision. The C4.5 (J48) algorithm comes in second with 75.3% in TP rate and Recall, 8.2% in FP rate, and 75% in Precision. NB comes last with 67.8% in TP rate and Recall, 10.7% in FP rate, and 67.8% in Precision.

Table 5: Algorithms performance after the Info Gain evaluator
Algorithm         TP Rate   FP Rate   Precision   Recall
LMT               0.750     0.083     0.749       0.750
Random Forest     0.836     0.054     0.835       0.836
Random Tree       0.701     0.099     0.696       0.701
NB                0.678     0.107     0.678       0.678
Logistic          0.737     0.087     0.735       0.737
Simple Logistic   0.707     0.097     0.701       0.707
SMO               0.734     0.088     0.730       0.734
J48               0.753     0.082     0.750       0.753
Bayes Net         0.750     0.083     0.753       0.750

One of the performance criteria that determine the optimal classifiers is the Receiver Operating Characteristic (ROC) curve; ROC is one of the standard techniques that summarize classifier performance over a range of tradeoffs between TP and FP error rates [32][28]. The closer the ROC is to 1, the more accurate the classifier. Based on Figure 4, the RF classifier is the optimal classifier among all classifiers, with 96.7% ROC when the wrapper with PSO is implemented; the ROC is 96.1% for the same classifier when Info Gain is implemented. The figure shows that the ROCs of all algorithms are enhanced after implementing the wrapper evaluator with PSO; NB is the only exception, with an ROC of 89.1% when the wrapper is implemented and 89.5% with the Info Gain evaluator.

Figure 4: ROC of algorithms with wrapper and info gain.
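The ROC values of Figure 4 can be recomputed from exported class-probability predictions with a standard library call, for example as below. The variable names are hypothetical, and the weighted one-vs-rest averaging is an assumption about how a single multi-class ROC figure would be obtained, not a description of the authors' procedure.

from sklearn.metrics import roc_auc_score

# y_true: observed labels of the final class; y_prob: per-class probabilities
# (shape: n_samples x n_classes) exported from the trained classifier.
def weighted_roc_auc(y_true, y_prob) -> float:
    return roc_auc_score(y_true, y_prob, multi_class="ovr", average="weighted")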
4 Conclusions and future works

Many techniques and approaches have been proposed to handle the minority and majority class problems of imbalanced datasets with respect to the final class. In our model, the imbalanced dataset has multiple values in the final class, and this problem is handled using the SMOTE filter. The feature selection step is performed in two ways: first, by applying the wrapper evaluator with PSO as a search method to find subsets of attributes that may affect and be correlated with the final class, and second, by applying Info Gain as an evaluator with Ranker as a search method to find the features with the highest correlation with the final class. After finding the most correlated features or feature subsets using the evaluators, the uncorrelated features are removed and the SMOTE filter is applied to produce a balanced dataset and to make the multi-valued classes equally represented. Many supervised ML algorithms are applied, such as NB, RF, Random Tree, LMT, J48, Logistic, Simple Logistic, and SMO. The performance evaluation of the algorithms shows that using the wrapper with the classifiers and PSO as a search method outperforms the Info-Gain evaluator. The RF algorithm outperforms the other algorithms in predicting students' performance and the number of failed courses. The model can be updated to predict the students' status, whether they will fail or pass, as the final class. The features will be further explored and investigated using different filters and classifiers to find the features with the highest correlation with students' failure.

References

[1] U. Bin Mat, N. Buniyamin, P. M. Arsad, and R. A. Kassim, "An overview of using academic analytics to predict and improve students' achievement: A proposed proactive intelligent intervention," in 2013 IEEE 5th International Conference on Engineering Education: Aligning Engineering Education with Industrial Needs for Nation Development, ICEED 2013, 126-130, 2014. https://doi.org/10.1109/iceed.2013.6908316
[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery in databases," AI Mag., vol. 17, no. 3, 37-37, 1996.
[3] A. El-Halees, "Mining Students Data To Analyze Learning Behavior: a Case Study Educational Systems," Work, 2008.
[4] A. B. E. D. Ahmed and I. S. Elaraby, "Data Mining: A prediction for performance improvement using classification," World J. Comput. Appl. Technol., vol. 2, no. 2, 2014. https://doi.org/10.13189/wjcat.2014.020203
[5] U. K. Pandey and S. Pal, "Data Mining: A prediction of performer or underperformer using classification," arXiv Prepr. arXiv1104.4163, 2011.
[6] S. M. M. Syed Tahir Hijazi & Raza Naqvi, "Factors affecting students' performance: A case of private colleges," Bangladesh e-Journal Sociol., vol. 3, no. 1, pp. 1–10, 2006.
[7] Z. N. Khan, "Scholastic Achievement of Higher Secondary Students in Science Stream," J. Soc. Sci., vol. 1, no. 2, 2005. https://doi.org/10.3844/jssp.2005.84.87
[8] Z. J. Kovacic, "Early Prediction of Student Success: Mining Students Enrolment Data," in Proceedings of the 2010 InSITE Conference, 2010. https://doi.org/10.28945/1281
[9] G. Ben-Zadok, R. Mintz, A. Hershkovitz, and R. Nachmias, "Examining online learning processes based on log files analysis: A case study," Res. Reflections Innov. Integr. ICT Educ., Proc. Fifth International Conf. Multimedia ICT Educ., no. 2, 2009.
[10] Q. A. Al-Radaideh, E. M. Al-Shawakfa, and M. I. Al-Najjar, "Mining student data using decision trees," in International Arab Conference on Information Technology (ACIT'2006), Yarmouk University, Jordan, 2006.
[11] A. K. Hamoud, A. S. Hashim, and W. A. Awadh, "Predicting Student Performance in Higher Education Institutions Using Decision Tree Analysis," Int. J. Interact. Multimed. Artif. Intell., 2018. https://doi.org/10.9781/ijimai.2018.02.004
[12] B. Carson, "The transformative power of action learning," Chief Learn. Off. Retrieved, 2017.
[13] U. Sekaran and R. Bougie, Research methods for business: A skill building approach. John Wiley & Sons, 2016.
[14] B. Remeseiro and V. Bolon-Canedo, "A review of feature selection methods in medical applications," Computers in Biology and Medicine, vol. 112, 2019. https://doi.org/10.1016/j.compbiomed.2019.103375
[15] Y. Kim, W. N. Street, and F. Menczer, "Evolutionary model selection in unsupervised learning," Intell. Data Anal., vol. 6, no. 6, 2002. https://doi.org/10.3233/ida-2002-6605
[16] B. Xue, M. Zhang, and W. N. Browne, "Particle swarm optimization for feature selection in classification: A multi-objective approach," IEEE Trans. Cybern., vol. 43, no. 6, 2013. https://doi.org/10.1109/tsmcb.2012.2227469
[17] Y. Shi and R. Eberhart, "Modified particle swarm optimizer," in Proceedings of the IEEE Conference on Evolutionary Computation, ICEC, 1998. https://doi.org/10.1109/icec.1998.699146
[18] L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," in Proceedings, Twentieth International Conference on Machine Learning, 2003, vol. 2.
[19] E. Frank, M. A. Hall, and I. H. Witten, "The WEKA Workbench Data Mining: Practical Machine Learning Tools and Techniques," Morgan Kaufmann, Fourth Ed., 2016. https://doi.org/10.1016/b978-0-12-374856-0.000109
[20] U. M. Fayyad and K. B. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning," in Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993.
[21] H. Liu, F. Hussain, C. L. Tan, and M. Dash, "Discretization: An enabling technique," Data Min. Knowl. Discov., vol. 6, no. 4, 2002.
[22] F. Provost and T. Fawcett, "Robust classification for imprecise environments," Mach. Learn., vol. 42, no. 3, 2001.
[23] A. S. Desuky, A. H. Omar, and N. M. Mostafa, "Boosting with crossover for improving imbalanced medical datasets classification," Bull. Electr. Eng. Informatics, vol. 10, no. 5, 2021. https://doi.org/10.11591/eei.v10i5.3121
[24] J. Xiao, L. Xie, C. He, and X. Jiang, "Dynamic classifier ensemble model for customer classification with imbalanced class distribution," Expert Syst. Appl., vol. 39, no. 3, 2012. https://doi.org/10.1016/j.eswa.2011.09.059
[25] C. Lu, S. Lin, X. Liu, and H. Shi, "Telecom fraud identification based on ADASYN and random forest," in 2020 5th International Conference on Computer and Communication Systems, ICCCS 2020, 2020. https://doi.org/10.1109/icccs49078.2020.9118521
[26] C. Padurariu and M. E. Breaban, "Dealing with data imbalance in text classification," in Procedia Computer Science, 2019, vol. 159. https://doi.org/10.1016/j.procs.2019.09.229
[27] T. M. Ha and H. Bunke, "Off-line, handwritten numeral recognition by perturbation method," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 5, 1997. https://doi.org/10.1109/34.589216
[28] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002. https://doi.org/10.1613/jair.953
[29] M. A. Kumar and A. J. Laxmi, "Machine Learning Based Intentional Islanding Algorithm for DERs in Disaster Management," IEEE Access, vol. 9, 2021. https://doi.org/10.1109/access.2021.3087914
[30] A. K. Hamoud, "Selection of Best Decision Tree Algorithm for Prediction and Classification of Students' Action," Am. Int. J. Res. Sci. Technol. Eng. Math., vol. 16, no. 1, pp. 26–32, 2016.
[31] A. S. Hashim, W. A. Awadh, and A. K. Hamoud, "Student performance prediction model based on supervised machine learning algorithms," in IOP Conference Series: Materials Science and Engineering, 2020, vol. 928, no. 3, p. 32019. https://doi.org/10.1088/1757-899x/928/3/032019
[32] T. Saba, I. Abunadi, M. N. Shahzad, and A. R. Khan, "Machine learning techniques to detect and forecast the daily total COVID-19 infected and deaths cases under different lockdown types," Microsc. Res. Tech., vol. 84, no. 7, 2021. https://doi.org/10.1002/jemt.23702
[33] I. A. Najm, A. K. Hamoud, J. Lloret, and I. Bosch, "Machine Learning Prediction Approach to Enhance Congestion Control in 5G IoT Environment," Electronics, vol. 8, no. 6, p. 607, May 2019. https://doi.org/10.3390/electronics8060607
[34] J. Chen, Y. Lian, and Y. Li, "Real-time grain impurity sensing for rice combine harvesters using image processing and decision-tree algorithm," Comput. Electron. Agric., vol. 175, 2020. https://doi.org/10.1016/j.compag.2020.105591
[35] I. S. Masad, A. Al-Fahoum, and I. Abu-Qasmieh, "Automated measurements of lumbar lordosis in T2-MR images using decision tree classifier and morphological image processing," Eng. Sci. Technol. an Int. J., vol. 22, no. 4, 2019. https://doi.org/10.1016/j.jestch.2019.03.002
[36] S. Khatoon et al., "Development of social media analytics system for emergency event detection and crisis management," Comput. Mater. Contin., vol. 68, no. 3, 2021. https://doi.org/10.32604/cmc.2021.017371
[37] H. Li, D. Caragea, C. Caragea, and N. Herndon, "Disaster response aided by tweet classification with a domain adaptation approach," J. Contingencies Cris. Manag., vol. 26, no. 1, 2018. https://doi.org/10.1111/1468-5973.12194
[38] Y. Y. Song and Y. Lu, "Decision tree methods: applications for classification and prediction," Shanghai Arch. Psychiatry, vol. 27, no. 2, 2015.
[39] N. Mahdi Abdulkareem and A. Mohsin Abdulazeez, "Machine Learning Classification Based on Radom Forest Algorithm: A Review," Int. J. Sci. Bus., vol. 5, no. 2, 2021.
[40] S. M. Rasoolimanesh, M. Wang, J. L. Roldán, and P. Kunasekaran, "Are we in right path for mediation analysis? Reviewing the literature and proposing robust guidelines," J. Hosp. Tour. Manag., vol. 48, 2021. https://doi.org/10.1016/j.jhtm.2021.07.013
[41] G. Biau and E. Scornet, "A random forest guided tour," Test, vol. 25, no. 2, 2016. https://doi.org/10.1007/s11749-016-0481-7
[42] N. Landwehr, M. Hall, and E. Frank, "Logistic Model Trees," Mach. Learn., vol. 59, no. 1, pp. 161–205, 2005. https://doi.org/10.1007/s10994-005-0466-3
[43] W. S. Noble, "What is a support vector machine?" Nature Biotechnology, vol. 24, no. 12, 2006. https://doi.org/10.1038/nbt1206-1565
[44] T. Joachims, "Svmlight: Support vector machine," SVM-Light Support Vector Mach. http//svmlight.joachims.org/, Univ. Dortmund, vol. 19, no. 4, 1999.
[45] S. Ghosh, A. Dasgupta, and A. Swetapadma, "A study on support vector machine based linear and non-linear pattern classification," in Proceedings of the International Conference on Intelligent Sustainable Systems, ICISS 2019, 2019. https://doi.org/10.1109/iss1.2019.8908018
[46] K. Park, R. Rothfeder, S. Petheram, F. Buaku, R. Ewing, and W. H. Greene, "Linear regression," in Basic Quantitative Research Methods for Urban Planners, 2020. https://doi.org/10.4324/9780429325021-12
[47] A. J. Scott, D. W. Hosmer, and S. Lemeshow, "Applied Logistic Regression," Biometrics, vol. 47, no. 4, 1991. https://doi.org/10.2307/2532419
[48] B. R. Kirkwood and J. A. C. Sterne, Essential Medical Statistics, 2003.
[49] S. Sperandei, "Understanding logistic regression analysis," Biochem. Medica, vol. 24, no. 1, 2014. https://doi.org/10.11613/bm.2014.003
[50] G. I. Webb, E. Keogh, and R. Miikkulainen, "Naïve Bayes," Encycl. Mach. Learn., vol. 15, pp. 713–714, 2010. https://doi.org/10.1007/978-0-387-30164-8_576
[51] H. Zhang, "The optimality of naive Bayes," Aa, vol. 1, no. 2, p. 3, 2004.
[52] W. Lou, X. Wang, F. Chen, Y. Chen, B. Jiang, and H. Zhang, "Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naive Bayes," PLoS One, vol. 9, no. 1, p. e86703, 2014. https://doi.org/10.1371/journal.pone.0086703
[53] J. Pearl, "Bayesian networks," 2011.
[54] P. Arora, D. Boyne, J. J. Slater, A. Gupta, D. R. Brenner, and M. J. Druzdzel, "Bayesian networks for risk prediction using real-world data: a tool for precision medicine," Value Heal., vol. 22, no. 4, pp. 439–445, 2019. https://doi.org/10.1016/j.jval.2019.01.006
[55] D. Koller and A. Pfeffer, "Object-oriented Bayesian networks," arXiv Prepr. arXiv1302.1554, 2013.
[56] A. Khalaf et al., "Supervised Learning Algorithms in Educational Data Mining: A Systematic Review," Southeast Eur. J. Soft Comput., vol. 10, no. 1, pp. 55–70, 2021.

https://doi.org/10.31449/inf.v47i1.4421 Informatica 47 (2023) 21–42

On Integrating Multiple Restriction Domains to Automatically Generate Test Cases of Model Transformations

Thi-Hanh Nguyen and Duc-Hanh Dang∗
Department of Software Engineering, VNU University of Engineering and Technology, Hanoi, Vietnam
E-mail: hanhit@hnue.edu.vn, hanhdd@vnu.edu.vn
∗ Corresponding author

Keywords: transformation testing, black-box testing, classifying term, model finding, OCL

Received: September 24, 2022

Testing model transformations poses several challenges, one of which is how to automatically generate effective test suites. A promising approach for this is to employ equivalence partitioning, a well-known technique for software testing. Specifically, in order to generate effective test suites, current works in the literature often focus on exploiting either the structural aspects of models or transformation contracts for partition analysis. However, for this aim, they focus on only a single restriction source, such as metamodels, contracts of the transformation, or domain-expert knowledge. To increase the effectiveness of generated test suites, partitioning techniques should be performed on a combination of various restriction sources. This paper introduces a method to generate test models on such a multi-domain of restrictions.
The method also allows the tester to flexibly select and combine constraints to create a unified restriction for different strategies and objectives in model transformation testing. We developed a support tool based on the UML-based Specification Environment (USE) and performed experiments on several transformations to point out the effectiveness of our method.

Povzetek: Opisana je metoda preverjanja programske kode na osnovi multi-modalnih omejitev posameznih delov.

1 Introduction

Model transformations are the pillars of Model-Driven Engineering (MDE). Testing has been an effective technique to ensure the quality of model transformations, which is the key to successfully realizing MDE in practice. This discipline consists of the following main tasks: synthesizing models as test data, referred to as test models, performing the transformation, and verifying the output results. Until now, it has remained challenging to automatically and effectively synthesize test models for model transformations. Test model generation is the synthesis of models from different restriction sources, including the syntactic and semantic domains of the source and target models. Such restriction domains often have complex structures and semantics, which makes it difficult to automate the generation. To the best of our knowledge, the typical restriction domains in the context of MDE are as follows. First, for so-called source metamodel coverage, as explained in [1, 2, 3, 4, 5, 6], test models could be generated by applying the well-known testing technique of equivalence partitioning, which splits the input metamodel into equivalence partitions for selecting representative test models. Second, for so-called transformation specification coverage, as proposed in [7, 8, 11], additional restrictions on source models could be derived from a transformation specification and taken as input contracts to generate test models. Within these works, input contracts of the transformation specification are often expressed as OCL conditions. Third, following the white-box testing approach, the works in [12, 13, 14, 15] focus on analyzing a model transformation implementation to build test suites using the notion of transformation implementation coverage. In addition, in interactive approaches, domain knowledge can support the test model selection. For example, based on the test objective, domain experts could choose representative values for the partition testing technique [1, 4, 16], or directly create examples for test models within test-driven development approaches, as explained in [18, 19].

Generating test models based on the analysis and synthesis of each single restriction domain can lead to a large duplication of test models, wasting testing time and effort. This highlights the need to generate test models from multiple restriction domains. However, realizing this need presents several challenges: (1) Constraints from multiple domains expressed in heterogeneous formalisms need to be translated into a consistent and unified formalism to enable model synthesis. (2) The partition analysis technique is often employed to obtain representative test models, since exhaustive testing is a non-trivial task, but defining a suitable partition on multiple restriction domains for different test strategies can be challenging.
(3) The automatic generation of test models often requires manually defining parameters for the testing environment (as input of the solver) as well as other configuration information. It is challenging to automate this task.

This paper proposes a mechanism based on an integration of multiple restriction domains for a black-box testing approach to the automatic generation of test models. Specifically, multi-domain restrictions that include (1) conditions for partitioning the metamodel and (2) transformation contracts are first translated into OCL conditions and then taken as the input of a constraint solver for generating test models. For each common test strategy, a mechanism for combining OCL conditions should be established to define combinatorial partitions using logical operators. Moreover, a scope-value searching method needs to be incorporated to solve constraints so that the set of generated test models has a reasonable size. The main contributions of this paper are summarized as follows:

– A method to automatically generate test models with multi-domain restrictions for effective model transformation testing.
– A mechanism to define suitable partitions for different test strategies.
– An OCL-based support tool and experimental results to show the effectiveness of the proposed method.

The rest of this paper is organized as follows. Section 2 surveys related works. Section 3 motivates this work with a transformation example. Section 4 outlines our approach. Section 5 explains restriction domains as a basis for a partition analysis and automatic generation of test models. Section 6 introduces several strategies to combine partitions in order to generate test models for different test objectives. Section 7 shows our tool support. Section 8 illustrates our testing method with several transformations and points out the effectiveness of our method. Section 9 explains threats to the validity of this work and discusses the results. The paper closes with a conclusion and a discussion of future work.

2 Related work

In this section, we provide an overview of black-box testing approaches for model transformations and address the following research questions: (1) how to automatically generate test models using the partition analysis technique in a black-box testing approach; (2) how to construct test oracles that check test outputs to ensure quality properties; and (3) how to evaluate the quality of test suites in terms of one or more test objectives.

First, a common basic idea of black-box testing approaches for transformations is to use the metamodel and the requirements specification as the test basis, i.e., they are independent of transformation implementations. Within these approaches, the well-known testing technique of equivalence partitioning [20] is often used to split the input data domain into equivalence partitions based on the test basis analysis and then to select a representative model for each partition. Fleurey et al. [1, 24] have proposed a partitioning technique based on the datatype of class attributes and the association end multiplicity within a UML class diagram representing the metamodel. Several other partitioning techniques for generating test models that conform to a metamodel are introduced in [2, 21, 4, 23, 5, 22, 6, 39, 9].
One of the main limitations of the metamodel-based partitioning approach is that the technique often generates a large number of test models, and the generated test models tend to correspond to only a subfragment of the source metamodel instead of the whole metamodel. To overcome this limitation, Fleurey introduces the notion of a so-called effective metamodel as the fragment of the source metamodel that is actually manipulated by the transformation. An effective metamodel can be defined either by examining the specification of transformations, as explained in [1, 3], or by statically analyzing their implementation, as shown in [21]. The partitioning technique, as shown in [7, 10, 8, 9], can also be applied to the requirements specification of model transformations. Following this research line, the works in [9, 39, 8] propose to derive partitioning conditions from a contract-based specification of the transformation. The specification of transformations within these approaches often includes preconditions and invariants as contracts on the input data domain of the transformation (i.e., corresponding to restrictions on source models). Partitioning conditions are then translated into either OCL constraints [7] or other first-order logic languages like Alloy [4, 23] for the automatic generation of test models using the model finding technique. For different test objectives, the works in [1, 6, 8, 4, 36] have proposed suitable techniques to improve the quality of test cases. Fleurey et al. [1] proposed the use of the Bacteriologic algorithm to optimize test suites. Building on metamodel-based partitioning, Fleurey et al. [1], Jahanbin et al. [6], Gogolla et al. [8], and Sen et al. [4] proposed using representative values provided by domain experts or testers. Similarly, Sen et al. [4] proposed combining different knowledge domains and uniformly representing them as constraints in Alloy. Cabot et al. [36] proposed a similar technique in which combinatorial partitioning conditions are represented in OCL.

Second, another major challenge for model transformation testing is how to predict the desired expected outputs [1]. This research line can be divided into two groups: the first aims to predict the whole output model, i.e., making use of a complete oracle function, and the other aims to predict just part of the desired target model, i.e., using a partial oracle function. The first approach (with complete oracle functions) takes the expected output model as a reference model and checks whether the actual output model conforms to this reference model, e.g., using model comparison as regarded in [32, 15, 17, 30, 34, 35]. For this aim, Addazi et al. [35] employ EMFCompare, whereas the works in [17, 15] design specific algorithms to compare models. Besides, Kolovos [33] employs the Epsilon Comparison Language (ECL), a task-specific model management language, in order to define language-specific algorithms for model matching. The second approach (with partial oracle functions) aims to ensure certain properties of a transformation using the partial oracle functions. There is no need to manually define a whole expected target model within this approach.
The works in [8, 28, 7] employ OCL contracts or OCL assertions as partial test oracles to express the expected properties of generated models and to automatically verify them. Contracts and assertions can also be represented in the form of visual graph patterns, as explained in [7, 31]. Testing is an informal approach for verifying the quality properties of model transformations. Depending on a given test objective, partial oracle functions aim to check whether the functional behavior of a transformation system fulfills the following properties: (1) confluence, applicability, and termination, which are called general properties; (2) correctness, including syntactic and semantic correctness (for both information preservation and behavior preservation), and completeness, which are called specific properties. A specific property is often specific to a certain transformation specification. The analysis of general properties such as termination, determinism, rule independence, rule applicability, or reachability of system states has to be performed on a set of given transformation rules; this task is out of the scope of this paper. The works in [28, 15, 7, 17, 1] propose to verify the syntactic correctness of transformations using test oracles captured from the target model's contracts. To check the source-target correspondence property [29], also known as the information preservation property, current approaches often employ source-target contracts represented either by OCL conditions [8, 7] or graph patterns [36] to consistently specify input test conditions and output test oracles. This work focuses on analyzing the impact of constraints used for test model generation on different transformation properties.

Table 1: Black-box approaches for test model generation (MP = Metamodel-based Partitioning; SP = Specification-based Partitioning; MF = Model Finding; AL = Algorithm). The table compares the approaches of Fleurey et al. [1], Wang et al. [2, 21], Wu et al. [5, 22], Lamari [3], Jahanbin et al. [6], Sen et al. [4, 23], Guerra et al. [7], Burgueño et al. [9, 39], and Gogolla et al. [10, 27] along the dimensions of test model generation (partitioning technique, pattern representation, generation by algorithm or model finding), exploited restriction domains (metamodel, specification, transformation implementation, domain expert), and test adequacy (metamodel coverage, specification coverage).

Test adequacy criteria measure the quality of a test suite with regard to several objectives; they help define the testing goals to be achieved. In transformation testing, test adequacy criteria can be based on how well the test basis (e.g., the input metamodel and the transformation specification) is covered by the test models, or on how effective the oracle functions are at identifying synthetic bugs (so-called mutants) injected into the transformation under test. As shown in the two last columns of Table 1, coverage-based approaches propose to measure the effectiveness of a black-box testing approach by evaluating how the input/output metamodels and/or the transformation specification are covered by the testing technique. Fleurey et al. [1] propose to measure the quality of a set of test models by measuring how much they cover the input metamodel; a measurement technique is defined in terms of class coverage, attribute coverage, and association coverage. Metamodel coverage or effective-metamodel coverage is also introduced in several other works [2, 21, 5, 22, 6, 9, 39]. The notion of transformation specification coverage is introduced in [7, 8].
Within contract-based specifications, transformation contracts can be analyzed to define test conditions. For example, Guerra et al. [7] take preconditions and invariants as transformation properties and define test criteria that cover these properties for generating test models. Test criteria could also be defined based on the combination of these properties within a combined testing strategy such as t-way testing. Additionally, mutation analysis approaches aim to measure the effectiveness of test cases based on their ability to detect bugs. Mottu et al. [50] propose exploring mutation analysis for model transformations: they study potential bugs that developers may introduce into model transformations in order to define a set of generic mutation operators for model transformations. The mutation analysis technique is commonly used in the literature to show the effectiveness of the test cases generated by the proposed methods [24, 7, 15].

Table 2: Approaches for oracle function definition (PO = Partial Oracle; CO = Complete Oracle; SC = Type Correctness/Syntactic Correctness; IP = Information Preservation). The table compares the approaches of Guerra et al. [7], Fleurey et al. [1], Mottu et al. [31], Wieber et al. [15], Lano et al. [28], Hilken et al. [8], Lin et al. [17], Troya et al. [30], and Orejas et al. [34] by oracle type, representation of the expected outputs (patterns, OCL, or reference models), automation, and the transformation properties covered.

3 Running example

This section motivates our work with the CD2RDBM model transformation between class diagrams (CD) and relational database models (RDBM). This transformation example is introduced in [46]. This paper focuses on its simplified version for common transformation situations as regarded in [25]. Metamodels specifying the input and output modeling spaces of the CD2RDBM transformation are shown in Fig. 1 and Fig. 2, respectively.

Figure 1: The simplified metamodel of class diagrams.

Figure 2: The simplified metamodel of relational database schema.

Requirements of the CD2RDBM transformation contain constraints as restrictions on the input/output models and on the relationship between pairs of them. At the specification level, the requirements are independent of the implementation language and are often specified in the form of transformation contracts. A transformation contract allows a designer to specify what a transformation does, under which conditions it can be applied to a model, and what its expected result is. Such information is also helpful for choosing and applying the proper transformation in the context of off-the-shelf transformations. A contract-based model transformation specification typically consists of three sets of constraints corresponding to preconditions, postconditions, and invariants.

First, Preconditions include constraints defining a set of models, each of which is a candidate source model. Positive preconditions state the expected properties of valid source models; negative preconditions define source models that fulfill certain forbidden properties, i.e., such source models are invalid. For example, the CD2RDBM transformation includes the following precondition constraints:

– A class does not inherit from itself;
– The name of a class is unique;
– Attributes of a class must have distinct names;
– A child class does not redefine attributes of its parent class;
– The name of an association does not coincide with a class's name.
Second, Postconditions define a set of models produced by the transformation: positive postconditions state the expected properties of valid target models; negative postconditions define target models that satisfy certain forbidden properties, i.e., such target models are invalid. For example, the CD2RDBM transformation includes the following postconditions:

– A table name is unique;
– Two columns of a table must have distinct names;
– A table cannot have more than one primary key column.

Third, Invariants specify the correspondence between a pair of source and target models, denoted by ps =⇒ pt. A positive (negative) invariant has the following structure: if the source model satisfies the property ps, then the target model satisfies (does not satisfy) the property pt. As discussed in [26, 27, 7], the structure of each transformation rule can also be represented by a positive invariant that must hold between the source and target models to satisfy the transformation definition. The CD2RDBM transformation specification contains the following negative invariants:

– If the CD model has two classes with an inheritance relationship, the corresponding RDBM model cannot have two distinct tables mapping to these classes.
– If the CD model has two mutually inherited classes, then the corresponding RDBM model cannot have only the table mapped to the parent class while there is no table mapped to the child class.
– If the CD model has a class, the corresponding RDBM model cannot have two distinct tables mapping to this class.

In addition, the CD2RDBM transformation has six mapping rules that define how a CD model is mapped to a corresponding RDBM model:

– A class must be mapped to a table with the same name;
– The name and data type of a non-primary attribute coincide with those of the corresponding column;
– A primary attribute is mapped to a column playing the role of primary key;
– A multi-valued aggregation or association between two classes is mapped to a new associative table that relates the two corresponding tables;
– An aggregation/association relationship between two classes that is characterized by a single-valued end and a multi-valued end (0..*, 1..*) is mapped to a foreign key that relates the two corresponding tables;
– A child class and its parent class are mapped to the same table.

Testing is required to find out whether a model transformation is implemented and executed as expected for all possible inputs, or whether there are bugs in the transformation leading to unintended output models for certain input models [29]. This model transformation can be realized using different transformation implementation languages. To test the quality of a model transformation captured from multiple restriction domains, a black-box testing approach is often employed. Since exhaustive testing is impossible, testing criteria are proposed to select representative test models to achieve source metamodel coverage and specification coverage. Depending on the test objective, either the positive testing strategy or the negative testing strategy is used to navigate the test case design and test execution process. The analysis of the information in a test basis allows testers to determine test conditions in both the negative and positive testing strategies.

4 Overview of the approach

Figure 3 overviews our approach to testing model transformations. The basic idea is to synthesize test models based on an integration of multiple restriction domains.

Figure 3: An integration of multiple restriction domains for model transformation testing.

First, the partitioning technique is employed to define test models that cover the source metamodel.
The partitioning criteria, which are either restrictions on the source metamodel or contracts of the transformation specification, are expressed in the form of boolean OCL expressions, referred to as classifying terms (CTs) [8]. In this way, the underlying conditions used to characterize test models can also be flexibly combined to generate effective test suites. Second, test criteria could be defined for both positive and negative testing strategies. To generate test models that satisfy a test criterion, input test conditions captured from each restriction domain are expressed by classifying terms. The classifying terms are then combined and taken as the input of constraint solvers, including a SAT solver, in order to automatically generate test models. Finally, to check test oracles, classifying terms derived from the target metamodel and the transformation specification are defined to ensure that (1) the output model conforms to the target metamodel, (2) the output model satisfies the postconditions, and (3) the output model also complies with the invariants that describe the transformation relationship between valid or invalid pairs of source and target models. Such test output evaluation conditions are then combined to evaluate the expected properties of the output model using model validator tools, including OCL tools like USE.

5 Synthesizing test models from restriction domains

Test model selection involves finding valid and invalid input models within the positive and negative testing strategies, respectively. Test models are generated by synthesizing models from different restriction sources. This section explains how knowledge sources are combined to generate valid and invalid models within the positive and negative test strategies based on the proposed coverage criteria.

5.1 Metamodel coverage

In MDE, a metamodel is often represented in the form of a UML class diagram with the key meta-concepts of MOF [37], including classes, attributes, generalizations, and associations. Therefore, test models that conform to the source metamodel can be defined by an equivalence partitioning of the class diagram [38] of the source metamodel with the two following criteria [1]:

– AEM (Association End Multiplicities): For each association end, each representative multiplicity must be covered. For instance, if an association end has the multiplicity [0..*], then it should be instantiated with the multiplicities 0, 1, and N (N greater than 1).
– CA (Class Attribute): For each attribute, each representative value must be covered. For instance, the representative values of a boolean attribute are true and false, which define two corresponding partitions.

These criteria AEM and CA, as illustrated in Table 3, can be expressed in terms of representative values [1, 21, 4]. Representative multiplicity pairs can then be computed for an association by taking the Cartesian product of the possible multiplicities of each of its two ends, as sketched below. The representative values of each attribute can be computed from the typical data types of class attributes, such as Integer, String, and Boolean.
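As a concrete illustration of the AEM analysis, the small sketch below enumerates representative multiplicity pairs by a Cartesian product. It is only illustrative and not the implementation used in the supporting tool; in particular, the open-ended values of Table 3 (such as ">1") are collapsed here to fixed numbers.

from itertools import product

# Representative values following Table 3; open-ended entries such as ">1" are
# written as a concrete value purely for illustration.
REPRESENTATIVE = {
    "0..1": [0, 1],
    "1":    [1],
    "0..*": [0, 1, 2],
    "1..*": [1, 2, 3],
}

def multiplicity_pairs(end1: str, end2: str):
    """Representative multiplicity pairs of one association, i.e. the Cartesian
    product of the representative values of its two ends."""
    return list(product(REPRESENTATIVE[end1], REPRESENTATIVE[end2]))

# multiplicity_pairs("1", "0..*") -> [(1, 0), (1, 1), (1, 2)]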
Table 3: Representative values for multiplicities
Multiplicity property   Representative values
0..1                    0, 1
1                       1
0..*                    0, 1, [>1]
1..*                    1, 2, [>2]
N..*                    N, N+1, [>(N+1)]
N..M                    N, N+1, M-1, M

Boolean classifying terms (CTs) [39] are used to represent equivalence partitions for test models as follows. For each direction of an association between two classes, the name of the first class, the role name of the second class, and the multiplicity of the association end at the second class are parameterized by the variables fClass, dClassRole, and sizeNumber, respectively. Note that sizeNumber corresponds to the representative multiplicity value (as depicted in Table 3) at the second class. The parameter fClass1 defines an arbitrary instance of the first class. Using these parameters as input to the following OCL template, boolean CTs are generated from the metamodel:

fClass.allInstances -> exists( fClass1 | fClass1.dClassRole -> size() = sizeNumber )

Figure 4 shows a set of CTs for the simplified class diagram metamodel, and Figure 5 demonstrates the partition analysis based on CTs captured from multiplicity values. Test suites with test models generated by this CT set would satisfy the association coverage.

Figure 4: CTs covering representative values of associations' multiplicities.

Figure 5: Partition analysis based on CTs captured from multiplicity values.

This partitioning approach also includes the restriction on the data types of class attributes. Thus, the generated test models can ensure the attribute coverage criterion [44, 2, 4, 3], i.e., each representative value of an attribute must be covered in at least one test model. The following example illustrates how representative values can be defined by analyzing the data range of primitive data types:

– The representative values for Boolean attributes are {true, false};
– The representative values for String attributes are {null, '', 'something'};
– The representative values for Integer attributes are {0, 1, >1}.

The following OCL template is proposed to generate CTs for the attribute coverage criterion:

clsName.allInstances -> exists( varCls | varCls.attrName = rprValue )

In this OCL template, the parameter attrName defines the attribute name, clsName defines the class name, rprValue defines the chosen representative value for the attribute's data type, and varCls defines an arbitrary instance of the class. Figure 6 demonstrates the partition analysis based on CTs captured from the attribute's data type.

Figure 6: Partition analysis based on CTs captured from the attribute's datatype.

There are two basic approaches to selecting representative values of equivalence partitions. The first, default partitioning, chooses representative values using boundary value analysis or random data generation. In the second, knowledge-based partitioning, representative values are provided by domain experts for various test objectives. This technique also allows the tester to flexibly adjust the configuration to narrow the search space of constraint solving for better test model generation. Figure 7 shows a configuration file defined by the domain expert for the CD2RDBM transformation. Figure 8 shows classifying terms that partition source models based on the properties of the class Class; both OCL templates above can be instantiated mechanically, as sketched below.

Figure 7: The configuration file for constructing source CTs.

Figure 8: Some source CTs generated from the partition analysis on the class Class.
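A mechanical instantiation of the two OCL templates can be sketched as plain string templating. The concrete class, role, and attribute names below are hypothetical and only illustrate how one CT string is produced per representative value; this is not the generator of the actual support tool.

def association_ct(f_class: str, d_class_role: str, size_number: int) -> str:
    # Association-coverage template: fClass.allInstances -> exists(...)
    return (f"{f_class}.allInstances -> exists( fClass1 | "
            f"fClass1.{d_class_role} -> size() = {size_number} )")

def attribute_ct(cls_name: str, attr_name: str, rpr_value: str) -> str:
    # Attribute-coverage template with a chosen representative value.
    return (f"{cls_name}.allInstances -> exists( varCls | "
            f"varCls.{attr_name} = {rpr_value} )")

# association_ct("Class", "attributes", 0)
#   -> "Class.allInstances -> exists( fClass1 | fClass1.attributes -> size() = 0 )"
# attribute_ct("Attribute", "isPrimary", "true")
#   -> "Attribute.allInstances -> exists( varCls | varCls.isPrimary = true )"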
Figure 7: The configuration file for constructing source CTs.

5.2 Transformation specification coverage

A contract-based specification of a model transformation brings benefits for debugging, testing, and, more generally, quality assurance. The partition analysis technique can be applied to the contracts of a transformation specification to generate test models that cover the transformation specification's requirements, as discussed in [8, 36, 7]. This section explains a partition analysis technique based on classifying terms for model transformations. First, the underlying transformation is captured by our TC4MT specification language [40]. The language TC4MT employs typed graph patterns in the form of UML object diagrams enriched with OCL constraints to express transformation contracts. Transformation contracts can be either positive or negative. Figure 9 shows a negative precondition specified in the TC4MT language. The precondition states that the CD2RDBM transformation rejects any input model in which a child class redefines an attribute of the parent class. Figure 10 shows a negative postcondition for the generated RDBM models. The example postcondition states a restriction on the output models that the column names of a table must be distinct. Invariants within a transformation contract state how certain structures of an input model should be transformed. An invariant often consists of a source graph, a target graph, and an optional correspondence graph connecting them. A positive invariant that holds on a pair of source and target models ensures that there exists a target graph for each given source graph. With negative invariants, such a target graph must not be found in the target model domain. Figures 11 and 12 show a positive invariant and a negative invariant of the CD2RDBM transformation, respectively.

Considering each input condition on the input modeling space as a testing property, representative values of the property are defined for a testing partition. Then, graph patterns representing input conditions are translated into boolean OCL expressions using the template illustrated in Fig. 13. To translate graph patterns into OCL constraints, this schema iterates over all objects of each contract (lines 2, 3). In the case of negative contracts, i.e., when the attribute status of all objects equals −2, the negation operator not appears on the first line of the schema. The function conditions checks a constraint on the underlying objects and their properties. If there exist two objects oi and oj with the same type (type(oi) = type(oj)), then the condition oi <> oj is added. The association between two objects is translated into a corresponding condition, either oi.rolej -> includes(oj) or oj.rolei -> includes(oi). The conditions function omits checking attributes with undefined values. Other OCL constraints of the graph pattern are included in the function conditions. Figure 14 shows an OCL condition translated from the precondition shown in Fig. 9: there does not exist any redefined attribute in the child class. A boolean OCL expression can be assigned one of the two values {true, false} to specify a corresponding equivalence partition of the input model set. Models that violate a negative precondition belong to an invalid equivalence partition of the input model set, while the other models belong to the remaining partition.
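The following Python sketch is a loose rendering of the compilation schema of Fig. 13: it walks over the objects of a pattern, adds distinctness conditions for same-typed objects, translates links into includes() conditions, and negates the whole expression for negative contracts. It is not the TC4MT tool's implementation, and the role and attribute names in the example (parentClass, ownedAtt) are assumptions taken from the model fragments shown later in the paper.

```python
def pattern_to_ocl(objects, links, attr_conditions, negative):
    """Translate a typed graph pattern into a boolean OCL expression,
    loosely following the compilation schema of Fig. 13.

    objects:         dict  var -> type,      e.g. {"c1": "Class", "c2": "Class"}
    links:           list  (var, role, var), e.g. [("c2", "parentClass", "c1")]
    attr_conditions: list of OCL strings,    e.g. ["c1.name = c2.name"]
    negative:        True for a negative contract (whole expression negated)
    """
    conds = []
    vars_ = list(objects)
    # Distinct-object conditions for objects of the same type.
    for i, vi in enumerate(vars_):
        for vj in vars_[i + 1:]:
            if objects[vi] == objects[vj]:
                conds.append(f"{vi} <> {vj}")
    # One includes() condition per link of the pattern, plus attribute conditions.
    conds += [f"{src}.{role} -> includes({dst})" for src, role, dst in links]
    conds += attr_conditions
    body = " and ".join(conds) if conds else "true"
    expr = "".join(
        f"{objects[v]}.allInstances -> exists( {v} | " for v in vars_
    ) + body + " )" * len(vars_)
    return f"not ( {expr} )" if negative else expr

# Roughly the negative precondition of Fig. 9: a child class must not
# redefine an attribute of its parent class.
print(pattern_to_ocl(
    {"p": "Class", "c": "Class", "a1": "Attribute", "a2": "Attribute"},
    [("c", "parentClass", "p"), ("p", "ownedAtt", "a1"), ("c", "ownedAtt", "a2")],
    ["a1.name = a2.name"],
    negative=True))
```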
Figure 15 illustrates the result of the partition analysis on preconditions. Similarly, postcondition contracts as well as invariant contracts could also be translated into boolean OCL expressions to partition the output model set. These OCL expressions will be taken as OCL assertions playing the oracle function to verify actual output models. 6 Generating test models in different test strategies Model transformation testing aims to ensure a transformation fulfills its requirements (i.e., validation testing) and to discover defects in the transformation (i.e., defect testing). For a certain test objective, the tester would follow a suitable test strategy. This section explains how test models are generated in such different test strategies. Figure 16 depicts the workflow for test model generation. First, a test basic, including a transformation specification and a configuration of the test model domain, is analyzed and translated into boolean OCL expressions as classifying terms [8] to define partitioning information sets. Second, depending on different testing strategies, partitioning information sets and test criteria describe how the partitions are combined and selected to design test cases. Here, composite partitions are built according to certain specification coverage criteria. The test conditions in both test strategies are defined by combining single partitions using the relational operators {and, or, not}. Finally, these partition combinations are then taken as the input of an SAT solver [41] to automatically generate test models. For a particular OCL condition, the solver might not find any valid model since the given scope is too narrow, or there is inconsistency in the specification. In such cases, the search scope can be extended interactively by adjusting the solver parameters. There are two main test strategies for model transformations: (1) A positive testing strategy aims to ensure correctness. This strategy focuses on generating valid input models. The tester could combine restriction domains cor- On Integrating Multiple Restriction Domains… Informatica 47 (2023) 21–42 29 Figure 8: Some source CTs generated from the partition analysis on the class Class. Figure 10: A negative postcondition of the CD2RDBM transformation. Figure 9: A negative precondition of the CD2RDBM transformation. responding to different aspects of the correctness property (including syntax correctness, semantic correctness, information preservation, and behavior preservation) to select relevant input models together with OCL assertions. (2) A negative testing strategy is applied to ensure safety and reliability. The strategy focuses on generating invalid input models so that transformation’s defects might be detected. 6.1 Negative testing Negative testing ensures that a model transformation can gracefully handle invalid input or unexpected execution scenarios. An input model is invalid if it violates at least one negative precondition. The equivalence partition tech- nique is applied to preconditions of the transformation to identify various invalid partitions of input models. To illustrate the negative testing approach, testers focus on the following typical situation. From a given negative precondition, two equivalence partitions are defined: a set of invalid input models that violate this precondition (f alse) and a set of the remaining models that fulfill this precondition (true). A model of the second set can be valid or invalid due to other remaining constraint conditions. 
Such a negative test case aims to discover defects during robustness testing. By combining several negative preconditions, a smaller partition of invalid input models can be defined. To automate the generation of test inputs, a combination strategy is defined that describes how values (true or false) for negative preconditions are selected such that the underlying coverage criterion is satisfied. The t-wise coverage criterion tends to be chosen for the negative testing approach. The coverage criterion is satisfied if any value combination of t parameters, i.e., negative preconditions in this case, appears in at least one test input. As special cases, the following criteria are determined: each choice (t = 1), pair-wise (t = 2), and exhaustive (t = n). Based on combinatorial testing with negative test cases for software testing, as explained in [42], different levels of specification coverage are defined for negative test case generation as follows (see Fig. 17 for an illustration).

– NP coverage: For each negative precondition (the t-wise coverage with t = 1), at least one input model is selected.
– 2NP coverage: For each negative precondition (t = 1) and each pair of negative preconditions (t = 2), at least one input model is selected.
– Combinatorial NP coverage: For each combination of t negative preconditions (t >= 1), at least one test input model is selected. For instance, with t = 4, test input models would be generated for each negative precondition and each combination of 2 to 4 negative preconditions.

Figure 11: A positive invariant of the CD2RDBM transformation.

Figure 12: A negative invariant of the CD2RDBM transformation.

Figure 13: The OCL schema for the precondition compilation.

Figure 14: The OCL expression translated from the negative precondition shown in Figure 9.

Figure 15: Partition analysis based on CTs captured from preconditions.

Figure 18 shows four test models generated by solving source classifying terms of the CD2RDBM transformation. These classifying terms are defined as combinations of negative preconditions. The first test model (M1) plays the role of a negative test case violating the negative precondition NoSelfInheritance. The remaining test models (M2, M3, and M4) are generated by the classifying term defined by combining the two negative preconditions NoSelfInheritance and NoDuplicateClassName.

Linking negative test cases with test oracles. Negative testing ensures that a model transformation can gracefully handle invalid input data or unexpected user behavior. The purpose of negative testing is to prevent the system from crashing due to negative inputs and to improve its quality and stability. The completeness property requires that the transformation refuse invalid input data and not contain any incomplete execution. The syntactical correctness property requires that any output model produced from an invalid input model must be invalid, i.e., it violates at least one negative postcondition. The completeness of a transformation can be checked by performing negative test cases and manually observing the execution process. The expected output of the execution is either an invalid-input warning or a non-terminating state of the transformation system. Similarly, the syntactical correctness of the transformation can also be checked. The output model is then checked using the oracle function as shown in Fig. 19.

Figure 16: A test model generation process in different test strategies.

Figure 17: Partition analysis based on CTs captured from preconditions.

Figure 18: Four negative test cases generated from source CTs.

6.2 Positive testing

Positive testing verifies how the application behaves for a positive set of data. In positive transformation testing, single partitions are combined to select valid input models, representing composite partitions of the input model domain. Because valid input models must not violate any negative precondition, all classifying terms translated from negative preconditions are pushed into the SAT solver when solving constraints to generate valid input models. A strategy to combine partition information in terms of classifying terms is defined to avoid duplication and reduce the number of test models. This strategy also ensures both the source metamodel coverage and the transformation specification coverage. The concept Range denotes a set of equivalent values of a class property, and a Partition contains one or more Ranges. The following coverage criteria are proposed for the positive testing approach.

The allRanges coverage. The representative value of each Range must be implemented in at least one test model. The following examples are model fragments (MF) of the metamodel.

MF{Class.allInstances → exists(c | c.name='')}
MF{Class.allInstances → exists(c | c.name='var')}
MF{Class.allInstances → exists(c | c.childClass → size()=0)}

The allPartitions coverage. The set of representative values of each Partition must appear in at least one test model. The following example model fragments are generated from this partitioning criterion.

MF{Class.allInstances → exists(c | c.name='') ∧ Class.allInstances → exists(c | c.name='var')}
MF{Class.allInstances → exists(c | c.childClass → size()=0) ∧ Class.allInstances → exists(c | c.childClass → size()=1) ∧ Class.allInstances → exists(c | c.childClass → size()>1)}

A test model based on this coverage criterion can represent more constructs to be tested in the source metamodel than one based on the allRanges criterion. If a property's value domain is divided into three ranges, the tester can create a single test model that corresponds to three instances of the test model set in the allRanges criterion. Therefore, a suitable allPartitions coverage can reduce the test suite size while still ensuring the metamodel coverage criterion.

The allClassProperties coverage. Each value combination representing the partitions of a class's attribute values must be implemented in at least one test model. The following example model fragments are generated from this criterion.

MF{Class.allInstances → exists(c | c.name='var' ∧ c.childClass → size()=0 ∧ c.parentClass → size()=0 ∧ c.ownedAtt → size()=0 ∧ c.srcAss → size()=0 ∧ c.destAss → size()=0)}
MF{Attribute.allInstances → exists(a1 | a1.name = 'attname') ∧ Attribute.allInstances → exists(a1 | a1.datatype = 'atttype') ∧ Attribute.allInstances → exists(a1 | a1.isPrimary = false)}

While the test coverage criteria allRanges, allPartitions, and allClassProperties allow us to achieve source metamodel coverage, specification-based test coverage criteria aim at requirement coverage. A valid input model needs to fulfill all negative preconditions; therefore, test models generated by the positive testing strategy ensure the precondition coverage.
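The two combination strategies described above can be illustrated with a short Python sketch: for negative testing, t-wise combinations of violated negative preconditions form composite partitions; for positive testing, all negative preconditions are kept satisfied and conjoined with one metamodel-coverage fragment. The OCL formulations of NoSelfInheritance and NoDuplicateClassName below are assumptions (the paper specifies these preconditions as graph patterns in Figures 9 and 18, not as these expressions).

```python
from itertools import combinations

def negate(ct):
    return f"not ( {ct} )"

def negative_suite_conditions(neg_preconditions, t):
    """t-wise combinations of violated negative preconditions (NP, 2NP, ...):
    each selected precondition is violated, giving one composite partition
    per combination (a sketch of the combination strategy, not the tool)."""
    conds = []
    for k in range(1, t + 1):
        for combo in combinations(neg_preconditions, k):
            conds.append(" and ".join(negate(ct) for ct in combo))
    return conds

def positive_condition(neg_preconditions, metamodel_fragment):
    """A valid input model satisfies every negative precondition and, in
    addition, one model fragment chosen from a metamodel coverage criterion."""
    return " and ".join(list(neg_preconditions) + [metamodel_fragment])

NO_SELF_INHERITANCE = "Class.allInstances -> forAll( c | c.parentClass -> excludes(c) )"
NO_DUP_CLASS_NAME = "Class.allInstances -> forAll( c1, c2 | c1 <> c2 implies c1.name <> c2.name )"

# 2NP coverage: every precondition violated alone, and every pair violated together.
for cond in negative_suite_conditions([NO_SELF_INHERITANCE, NO_DUP_CLASS_NAME], t=2):
    print(cond)

# One allRanges fragment combined with all preconditions for positive testing.
print(positive_condition([NO_SELF_INHERITANCE, NO_DUP_CLASS_NAME],
                         "Class.allInstances -> exists( c | c.name = 'var' )"))
```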
In order to achieve the invariant coverage and to guide the selection of test input models, the following test coverage criterion is defined.

The invariant coverage. Each source pattern of the invariants (comprising both negative and positive invariants) must be implemented in at least one test model. Positive and negative invariants describe valid and invalid pairs of source and target models, respectively. Therefore, the source graphs that are part of an invariant can be used as templates, i.e., positive patterns, to generate test models in the positive testing strategy. The invariant coverage criterion requires that each source graph of the invariants (including transformation rules) appears as a restriction on at least one test model. Considering the CD2RDBM transformation, with three negative invariants and six transformation rules, nine test models are required to satisfy the invariant coverage criterion.

Linking positive test cases with test oracles. The completeness property of a model transformation requires that any valid input model is also accepted as input data and then transformed into an output model. If all generated output models are valid target models satisfying all negative postconditions, the syntactical correctness property of the model transformation is ensured. A model transformation is correct only if both input and output models are valid. In other words, the output model must preserve the information as well as the behavior of the input model through the transformation program. The correspondence between source (input) models and target (output) models can be captured by invariants; therefore, invariants are effective knowledge sources for checking information preservation. Thus, positive test cases containing valid input models are used as test data for checking the syntactical correctness and the information preservation, as shown in Fig. 20 and Fig. 21.

Figure 19: Oracle function for the syntactical correctness in the negative testing strategy.

Figure 20: Oracle function for the syntactical correctness in the positive testing strategy.

7 Tool support

USE (UML-based Specification Environment) [43] is the execution environment of the support tool. The tool includes three main functional components: (1) the TC4MT specification tool; (2) the test generator; and (3) the test bench. As shown in Fig. 22, the first component allows building the TC4MT transformation specification using the USE editor. In this component, the metamodel of a transformation specification is represented by a UML class diagram enriched with OCL constraints. Patterns of a specification are represented by object diagrams conforming to the metamodel. For example, the class diagram in Fig. 22 shows the metamodel of the CD2RDBM transformation specification, while preconditions, postconditions, invariants, and transformation rules are represented by object diagrams created using the graphical window interface or the SOIL scripting language of the USE editor. The second component is a USE plugin that performs the specification analysis to define test conditions. Figure 23 shows the GUI of this component. The plugin is activated by loading a triple-type graph. The window MetamodelAnalysis (red label 1) is used to automatically generate source classifying terms from the partition analysis on the source metamodel.
Figure 21: Oracle function for the information preservation in the positive testing strategy.

An optional configuration file containing information provided by the domain expert can be loaded to increase the expressiveness of the source CTs. The window SpecificationAnalysis (red label 2) is used to load patterns of preconditions, postconditions, and invariants (including transformation rules playing the role of positive invariants) and translate them into CTs using the OCL schemes introduced in Section 5.2.

The last component, as shown in Fig. 24, is also a USE plugin, serving as the test bench. The test bench tasks are to use the Kodkod engine to solve OCL constraints and find model instances that serve as test models, to invoke the system under test (SUT) with the test models, and to pass the resulting output models to the oracle function for evaluation. This plugin is activated by loading the source metamodel. It takes as input the specification files of the metamodels, the transformation definition, the source CTs, the target CTs, and the Model Validator configuration, all of which are plain text files. The transformation definition, consisting of a set of TGG-based rules, is written in the RTL language [45] and can run on the USE tool. The configuration file (including value options for links, attributes, and the size of elements) is required to restrict object models. The source CTs file is used to generate the input models of the test cases, while the target CTs file is used to validate the output models. The mapping file contains a list of patterns in the format "sourceCTs --> targetCTs", in which each side specifies a list of expected Boolean values of CTs corresponding to passed test cases. The source CTs are combined to represent the input test specification based on the selected test coverage criteria, while the corresponding target CTs are combined to represent the expected output property that defines the partial oracle function.

Figure 22: A transformation specification in the language TC4MT.

The test report obtained from the test bench is shown in Fig. 25. Finally, as shown in Fig. 25, when executing the test suite, each solution generated by the SAT solver is taken as input for the model transformation. The tool then reports whether the output model satisfies the predefined OCL assertions. The partition information of each solution is presented in the panel Source Classifying Terms. The validation result of the corresponding output model against the OCL assertions is shown in the panel Target Classifying Terms. The oracle function predefined in the mapping file checks whether the test case passes. The test result is depicted in the panel Validation result. The transformation execution script is shown in the panel Executed transformations. The debugging of transformation execution scenarios is performed by invoking each rule application step by step, and the current state of the transformation system after each transformation step can be inspected.

8 Experimental results

In this section, several experiments are performed to evaluate the effectiveness of the generated input models for detecting transformation failures. The objective of the experiments is to evaluate the error detection ability of the designed test cases in both the positive and negative testing strategies.
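Before turning to the experimental setup, the following minimal Python sketch illustrates how the mapping file described in Section 7 could drive the partial oracle: an entry relates expected Boolean values of source CTs to expected Boolean values of target CTs. The concrete syntax of the tool's plain-text files is not given in the paper, so the "CT=value --> CT=value" format and the CT name DistinctColumnNames are assumptions made for illustration only.

```python
def parse_mapping_line(line):
    """Parse one line of the mapping file, assumed to look like
    'CT1=true,CT2=false --> CT3=true' (the real file syntax may differ)."""
    source, target = line.split("-->")
    def side(s):
        return {name.strip(): value.strip() == "true"
                for name, value in (item.split("=") for item in s.split(","))}
    return side(source), side(target)

def oracle(source_expected, target_expected, source_actual, target_actual):
    """Partial oracle: if the input model falls into the partition described by
    the source CTs, the output model must satisfy the expected target CTs."""
    if all(source_actual.get(ct) == v for ct, v in source_expected.items()):
        return all(target_actual.get(ct) == v for ct, v in target_expected.items())
    return True  # this mapping entry does not apply to the test case

src_exp, tgt_exp = parse_mapping_line(
    "NoSelfInheritance=false --> DistinctColumnNames=false")
print(oracle(src_exp, tgt_exp,
             source_actual={"NoSelfInheritance": False},
             target_actual={"DistinctColumnNames": False}))  # True: test passes
```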
8.1 Tested setup For the evaluation, the paper focuses on four model transformations written in the RTL implementation language, the Restricted graph Transformations Language, as proposed in [45]. The purpose of the transformation examples is as follows. C2R. The CD2RDBM transformation [46] is implemented for the running example which includes six rules, five negative preconditions, three negative invariants, and three negative postconditions; B2D. The BibTeX2DocBook transformation [47] transforms the BibTeX model into the XML-based format for document composition DocBook. However, in this paper, we are only interested in converting the information about proceedings of conferences presented in BibTeX models into corresponding information presented in DocBook models. The version of the transformation BibTeX2DocBook includes six rules, four On Integrating Multiple Restriction Domains… Informatica 47 (2023) 21–42 35 Figure 23: The GUI for the function to analyze transformation specifications. negative preconditions, two negative invariants, and three negative postconditions; F2P. The Families2Persons transformation [48] is part of the ATL transformation zoo and was created as part of the “Usine Logicielle” project. This transformation includes four rules, two negative preconditions, two negative postconditions, and no negative invariant; B2P. The BPMN2PetriNet transformation [49] transforms BPMN models at the Computational Independent Model (CIM) level to PetriNet models at the Platform Independent Model (PIM) level. This model transformation includes twelve rules, five negative preconditions, two negative postconditions, and no negative invariant. It is important to note here that the specification language TC4MT is independent of transformation implementation platforms. The transformation implementations need to conform to the transformation specification but are not derived from the TC4MT specifications automatically. Generated test suites can be used for verification and validation model transformation implementations using the black-box testing approach. Table 4 gathers the number of contracts of each transformation example, as well as the size of its input metamodel. In our specification-based testing approach, we focus on negative preconditions, negative postconditions as well as negative invariants. Besides, transformation rules specify expected corresponding mappings between source models and target models so they can be considered as positive invariants of the source-target contracts. The BPMN2PetriNet transformation is the most complex in terms of the size of specification as well as the input metamodel. The Families2Persons transformation is a simple transformation with few invariants, rules and classes. 8.2 Test suite generation From the TC4MT specification, a test suite is derived based on each selected coverage criterion. The test suites were automatically generated by using the tool presented in Sect. 7. The numbers in side cell of Table 5 show the number of generated test models corresponding to a particular coverage criterion. In general, the larger the size of the specification is specified, the larger the test size is generated. 8.3 Efficacy of generated test suites To measure the effectiveness of a test suite and help improve it, the common technique mutation testing [50] is employed. In mutation testing, faults are injected into a program to produce erroneous versions of it, which are called mutants. Then each mutant is tested with the test suite. 
If the test suite detects the error, the mutant is killed; otherwise, the mutant remains alive. The mutation score, which is the number of killed mutants divided by the total number of mutants, gives a measure of the quality of the test suite. The mutation testing technique is performed as follows. First, mutants of each transformation implementation are created manually by injecting faults using the systematic classification of mutation operators for model transformations given in [50].

Figure 24: The GUI for the functions: test implementation, execution, and reporting.

Table 4: Test setup

                         Specification                Source metamodel
Examples            Precond  Postcond  Inv+Rule   Class  Assoc  Inheritance
CD2RDBM                5        3          9         3      4        0
BibTeX2DocBook         4        3          8         5      3        2
Families2Persons       2        2          4         2      4        0
BPMN2PetriNet          5        2         12        15      2       14

Figure 25: Test report with some partition information.

Navigation. The model is navigated via the relations defined in its metamodel, and a set of elements is obtained. Navigation mutations therefore replace the navigation towards one class with the navigation towards another, remove the last step of a navigation chain, or add a step at the end of a navigation chain.

Filtering. A rule application is usually performed on a limited set of input and output model elements described by filter conditions. Filtering mutations introduce disturbances in the filters of a collection, either by modifying the attributes used in the filter or by selecting only some instance types when the collection is defined with a generic class.

Creation. Output model elements are created by the execution of transformation rules. Creation mutations replace the creation of an object with another compatible type, delete the creation of a relation between two objects, or add a useless relation between two objects in the transformation rules of a transformation implementation.

Figure 26 shows an example of a mutant. Here, the rule is specified in the RTL transformation language, and the injected fault is highlighted in a colored square. In particular, the class Column is related to the class Table by two associations corresponding to the role names ownPkey and ownCol. This mutant replaces c.ownPkey of the column c with c.ownCol, so that the cardinality is modified.

Table 6 shows the mutation operators used to create the mutants in the experiment, which together cover all mutation types (navigation, filtering, and creation). Each mutant was created by applying a mutation operator to the original transformation once. Thus, each cell in the table corresponds to the number of mutants created using a particular mutation operator, and the last column summarizes the number of mutants created for each transformation. Table 7 shows the number of mutants created from each transformation as well as the mutation scores of the test suites generated with the negative testing strategy. Table 8 shows the corresponding mutation scores of the test suites generated with the positive testing strategy.
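The mutation score reported in Tables 7 and 8 is simply the fraction of mutants killed by at least one test case. The following Python sketch computes it from a kill matrix; the mutant names and kill data are illustrative only, not the paper's measurements.

```python
def mutation_score(kill_matrix):
    """kill_matrix[mutant][test] is True when the test detects (kills) the
    mutant; the score is the number of killed mutants / total mutants."""
    killed = sum(1 for tests in kill_matrix.values() if any(tests.values()))
    return killed / len(kill_matrix)

# Toy example with three mutants and two test models (illustrative data).
kill_matrix = {
    "nav_ownPkey_to_ownCol": {"M1": True,  "M2": False},
    "filter_drop_isPrimary": {"M1": False, "M2": True},
    "create_wrong_type":     {"M1": False, "M2": False},
}
print(f"mutation score = {mutation_score(kill_matrix):.2f}")  # 0.67
```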
Table 5: The number of test models in test suites for coverage criteria

                        Negative testing            Positive testing
Examples             1NP   2NP   3NP    Ranges  Partitions  ClassProperties  Invariants
CD2RDBM               5    35    63       34        14             3              9
BibTeX2DocBook        4    22    50       24        12             5              8
Families2Persons      2     5     –       10         2             2              4
BPMN2PetriNet         5    35    63       53        30            15             12

Table 6: Number of mutants on the transformations CD2RDBM (C2R), BibTeX2DocBook (B2D), Families2Persons (F2P), and BPMN2PetriNet (B2P)

        Navigation  Filtering  Creation  Total
C2R          9         28         21       58
B2D          6         12          7       25
F2P         12         16          4       32
B2P         13         36         20       69

Table 7: Mutation scores of the generated test suites in the negative testing strategy

        Mutants   1NP    2NP    3NP
C2R       58      0.90   0.90   0.90
B2D       25      0.84   0.84   0.88
F2P       32      0.81   0.81    –
B2P       69      0.62   0.62   0.81

Table 8: Mutation scores of the generated test suites in the positive testing strategy

                    Mutants  allRanges  allPartitions  allClassProperties  allInvariants
CD2RDBM                58      0.90         0.90             0.90              0.90
BibTeX2DocBook         25      0.72         0.80             0.80              0.72
Families2Persons       32      0.75         0.75             0.75              0.75
BPMN2PetriNet          69      0.65         0.70             0.70              0.65

Figure 26: An example for the mutation operator navigation.

9 Threats to Validity

Although we performed the experiments with utmost care, some underlying parameters potentially threaten the validity of the obtained results:

i) We experimented with common transformation examples that are available in related works. However, we only specify and implement these transformation examples with simplified requirements and particular fragments of the input/output metamodels instead of whole metamodels. For example, in the BibTeX2DocBook transformation, we only work on BibTeX files representing conference proceedings. Mutation scores of generated test suites generally depend on specific factors such as the way mutants are created, the size of the test suites, and the quality of the transformation implementation under test. Therefore, the obtained experimental results only indicate the error-detection efficiency of the generated test suites for typical semantic faults of transformations. Our mutation-based evaluation method is not applicable to other specific kinds of faults.

ii) We empirically evaluate transformation examples realized by the RTL bidirectional transformation language, designed based on the integration of TGG and OCL [45]. Because of the flexibility of the OCL language, there can be different implementations of a transformation from one specification. Therefore, the numbers of mutants created for the different implementations do not coincide. Note that the RTL implementation currently is not derived automatically from the TC4MT specification, although it also has TGG-based semantics. Even in the case of an automatically generated implementation, testing such an implementation would affect the evaluation results. This makes RTL implementations … more objective assessment.

iii) In the workflow of our proposed approach, several steps are still performed manually and interactively, such as creating the configuration files containing representative values of partitions, creating the mutant sets, and selecting the solver configuration. Therefore, the quality of the tester's work and decisions has an impact on the experimental results.

Discussion. As surveyed in Sect. 2, current black-box testing approaches often employ metamodel coverage criteria to ensure that the set of generated input models contains at least one instance of any class or association of the metamodel.
They also refer to extreme values for the attributes. However, a limitation of the approaches is that a very large number of test models, including unrelated or duplicated test models, are generated, and the completely generated test model is often not related to the test output evaluation. Several testing approaches focus on contractbased model transformation specification analysis to generate smaller test model sets using the specification-based coverage criteria. An advantage of these approaches is that the test models remain intentional: They are generated for testing a particular combination of transformation requirements so that they can be checked by the oracle function more efficiently. In this paper, a testing approach is proposed that combines different knowledge sources to generate smaller, more efficient test model suites with different test objectives. This combination reduces test model duplication while still ensuring efficient metamodel coverage and specification coverage. In our approach, the use of environment configuration parameters provided by domain knowledge makes generated test suites more efficient. Test models are designed by using both negative testing and positive testing strategies. The approach allows us to verify further quality properties of a model transformation. Some test oracle functions also are defined for verifying quality properties against appropriate test suites generated by different testing strategies. To show the effectiveness of the generated test cases in detecting common semantic errors, some experiments are performed on different transformation examples as regarded in Sect. 8. On Integrating Multiple Restriction Domains… 10 Conclusion and future work This paper proposes a specification-driven testing approach for test data selection. The basic idea is to leverage different sources of knowledge that can be produced during the transformation specification development and to utilize them for automatic generation of test suites. Different sources of knowledge as restriction domains are translated into OCL conditions to facilitate the partitioning testing. The experiments show that boolean OCL expressions could be combined to synthesize test models. Based on the characteristics of knowledge sources and selected testing strategies, input model conditions would be linked with output model assertions to check different quality properties. The proposal testing framework, named TC4MT, employs SAT solver for finding test models automatically. The TC4MT framework is installed to support automated testing of RTL transformation implementation on USE environments. Several experiments are conducted in which test suites are automatically generated from several transformation specifications. We then measured the efficacy of the generated tests using the mutation analysis. The quality of the generated test set highly relies on how complete a specification is. If a specification only covers part of the transformation requirements, then the generated models may not enable the testing of the underspecified parts. The performance and scope of test model searching remain a challenge for the proposed approach, we plan to conduct further experiments to improve performance and test coverage. Acknowledgements This work has been supported by Vietnam National University, Hanoi under Project No. QG.20.54. We wish to thank the anonymous reviewers for numerous insightful feedback on the first version of this paper. References [1] F. Fleurey, J. Steel, and B. Baudry (2004). 
Validation in Model-Driven Engineering: Testing Model Transformation. First International Workshop on Model, Design and Validation, IEEE, pp. 29-40, https://doi.org/10.1109/modeva.2004.1425846 [2] J. Wang, S.K. Kim, and D. Carrington (2006). Verifying metamodel coverage of model transformations. In Proc. of Australian Software Engineering Conference, IEEE, pp. 270–282, https://doi.org/10.1109/aswec.2006.55 [3] M. Lamari (2007). Towards an automated test generation for the verification of model transformations. ACM Symposium on Applied Computing (SAC), ACM, pp. 998–1005, https://doi.org/10.1145/1244002.1244220 Informatica 47 (2023) 21–42 39 [4] S. Sen, B. Baudry, and J.M. Mottu (2008). On combining multi-formalism knowledge to select models for model transformation testing. Software Testing, Verification and Validation (ICST), IEEE, pp. 328– 337, https://doi.org/10.1109/icst.2008.62 [5] H. Wu, R. Monahan, and J. F. Power (2013). Exploiting Attributed Type Graphs to Generate Metamodel Instances Using an SMT Solver. In Proc. of TASE, pp. 175–182, https://doi.org/10.1109/tase.2013.31 [6] S. Jahanbin and B. Zamani (2018). Test Model Generation Using Equivalence Partitioning. In Proc. of ICCKE, pp. 98–103, https://doi.org/10.1109/iccke.2018.8566335 [7] E. Guerra (2012). Specification-driven test generation for model transformations. In Proc. of ICMT , pp. 40– 55, https://doi.org/10.1007/978-3-642-30476-7_3 [8] F. Hilken, M. Gogolla, L. Burgueño, and A. Vallecillo (2018). Testing models and model transformations using classifying terms. Software and Systems Modeling, pp. 885-912, https://doi.org/10.1007/s10270-016-0568-3 [9] L. Burgueño, J. Cabot, R. Clarisó, and M. Gogolla (2019). A Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas. In Proc. of ER, pp. 513–521, https://doi.org/10.1007/978-3-030-33223-5_42 [10] M. Gogolla, J. Bohling, and M. Richters (2005). Validating UML and OCL models in USE by automatic snapshot generation. Software and Systems Modeling, pp. 386–398, https://doi.org/10.1007/s10270-005-0089-y [11] Stephan Hildebrandt, Leen Lambers, Holger Giese (2013). Complete Specification Coverage in Automatically Generated Conformance Test Cases for TGG Implementations. In Proc. of ICMT , pp. 174188, https://doi.org/10.1007/978-3-642-38883-5_16 [12] J.M. Küster and M. Abd-El-Razik (2006). Validation of Model Transformations - First Experiences Using a White Box Approach. Models in Software Engineering, Workshops and Symposia at MoDELS 2006, pp. 193-204, https://doi.org/10.1007/9783-540-69489-2_24 [13] C.A. Gonzalez and J. Cabot (2012). Atltest: A whitebox test generation approach for ATL transformations. In Proc. of Model Driven Engineering Languages and Systems, pp. 449–464, https://doi.org/10.1007/978-3642-33666-9_29 40 Informatica 47 (2023) 21–42 [14] D. Calegari and A. Delgado (2013). Rule Chains Coverage for Testing QVT-Relations Transformations. In Proc. of the Second Workshop on the Analysis of Model Transformations (AMT 2013), pp. 449-464. [15] M. Wieber, A. Anjorin, and A. Schürr (2014). On the Usage of TGGs for Automated Model Transformation Testing. Theory and Practice of Model Transformations, pp. 1-16, https://doi.org/10.1007/978-3-31908789-4_1 [16] B. Alkhazi, C. Abid, M. Kessentini, D. Leroy, and M. Wimmer (2020). Multi-criteria test cases selection for model transformations. Autom. Softw. Eng, 27(1): pp. 91-118, https://doi.org/10.1007/s10515-020-00271-w N. Hanh et al. Computer Science, Springer, pp. 
5-399, https://doi.org/10.1007/978-3-662-47980-3 [26] J. Cabot, R. Clarisó, E. Guerra, J.D. Lara (2010). Verification and validation of declarative model-to-model transformations through invariants. J. Syst. Softw. 83(2), pp. 283-302, https://doi.org/10.1016/j.jss.2009.08.012 [27] A. Vallecillo, M. Gogolla, L. Burgueño, M. Wimmer, and Lars Hamann (2012). Formal Specification and Testing of Model Transformations. Formal Methods for Model-Driven Engineering, Springer, pp. 399437, https://doi.org/10.1007/978-3-642-30982-3_11 [17] Y. Lin, J. Zhang, and J. Gray (2005). A testing framework for model transformations. In Model Driven Software Development, pp. 219–236, https://doi.org/10.1007/3-540-28554-7_10 [28] K. Lano, S. Fang, and S.K. Rahimi (2020). Model Transformation Specification and Verification. In Proc. of International Conference on Quality Software, pp. 45-54, https://doi.org/10.1002/9780470522622.ch14 [18] L. Lengyel and H. Charaf (2015). Test-driven verification/validation of model transformations. Frontiers of Information Technology and Electronic Engineering, pp. 85-97, https://doi.org/10.1631/fitee.1400111 [29] A.R. Lukman and J. Whittle (2013). A survey of approaches for verifying model transformations. Software and Systems Modeling, pp. 1003-1028, https://doi.org/10.1007/s10270-013-0358-0 [19] J.S. Cuadrado (2020). Towards Interactive, Testdriven Development of Model Transformations. Journal of Object Technology, 19(1), pp. 1-22, https://doi.org/10.5381/jot.2020.19.3.a18 [30] J. Troya, S. Segura, and A. Ruiz-Cortés (2018). Automated inference of likely metamorphic relations for model transformations. Journal of Systems and Software (2018), pp. 188–208, https://doi.org/10.1016/j.jss.2017.05.043 [20] T. J. Ostrand, M. J. Balcer (1988). The categorypartition method for specifying and generating functional tests. Communications of the ACM, pp. 676686, https://doi.org/10.1145/62959.62964 [31] M. Mottu, S. Sen, M. Tisi, and J. Cabot (2012). Static Analysis of Model Transformations for Effective Test Generation. In Proc. of ISSRE, pp. 291–300, https://doi.org/10.1109/issre.2012.7 [21] J. Wang, S.K. Kim, and D. Carrington (2008). Automatic generation of test models for model transformations. In Proc. of Australian Conf. on Software Engineering, IEEE, pp. 432–440, https://doi.org/10.1109/aswec.2008.4483232 [32] S. Mazanek and C. Rutetzki (2011). On the importance of model comparison tools for the automatic evaluation of the correctness of model transformations. In Proc. of IWMCP, pp. 12–15, https://doi.org/10.1145/2000410.2000413 [22] H. Wu (2016). Generating metamodel instances satisfying coverage criteria via SMT solving. In Proc. of MODELSWARD, pp. 40–51, https://doi.org/10.5220/0005650000400051 [33] D. S. Kolovos (2009). Establishing Correspondences between Models with the Epsilon Comparison Language. In Proc. of ECMDA-FA, pp. 146–157, https://doi.org/10.1007/978-3-642-02674-4_11 [23] S. Sen, B. Baudry, and J.M. Mottu (2009). Automatic Model Generation Strategies for Model Transformation Testing. In Proc. of ICMT, pp. 148–164, https://doi.org/10.1007/978-3-642-02408-5_11 [34] F. Orejas and M. Wirsing (2009). On the specification and verification of model transformations. In: Palsberg, Semantics and Algebraic Specification, vol. 5700 of Lecture Notes in Computer Science, pp. 140– 161, https://doi.org/10.1007/978-3-642-04164-8_8 [24] F. Fleurey, B. Baudry, P. Muller, and Y. Le Traon (2009). Qualifying input test data for model transformations. Software and Systems Modeling, pp. 
185– 203, https://doi.org/10.1007/s10270-007-0074-8 [25] H. Ehrig, C. Ermel, U. Golas, and F. Hermann (2015). Graph and Model Transformation - General Framework and Applications. Monographs in Theoretical [35] L. Addazi, A. Cicchetti, J. D. Rocco, D. D. Ruscio, L. Iovino, and A. Pierantonio (2016). Semantic-based Model Matching with EMFCompare. In Proc. of ME, pp. 40–49. On Integrating Multiple Restriction Domains… [36] A.C. Carlos and J. Cabot (2014). Test Data Generation for Model Transformations Combining Partition and Constraint Analysis. In Proc. of International Conference on Model Transformation, pp. 2541, https://doi.org/10.1007/978-3-319-08789-4_3 [37] http://www.omg.org/spec/MOF/ [38] A.A. Andrews, R.B. France, S. Ghosh, and G. Craig (2003). Test adequacy criteria for UML design models. Softw. Test. Verification Reliab, 13(2), pp. 95-127, https://doi.org/10.1002/stvr.270 [39] L. Burgueño, F. Hilken, A. Vallecillo, and M. Gogolla (2016). Generating effective test suites for model transformations using classifying terms. In Proc. of PAME/VOLT, pp. 48–57, https://doi.org/10.1007/s10270-016-0568-3 [40] T.H. Nguyen and D.H. Dang (2021). A Graph Analysis Based Approach for Specification-Driven Testing of Model Transformations. NAFOSTED Conference on Information and Computer Science, pp. 224-229, https://doi.org/10.1109/nics54270.2021.9701514 [41] E. Torlak, and D. Jackson (2007). Kodkod: A Relational Model Finder. In Proc. of International Conference on Tools and Algorithms for Construction and Analysis of Systems, pp. 632-647, https://doi.org/10.1007/978-3-540-71209-1_49 [42] K. Fögen, H. Lichter (2019). Combinatorial Robustness Testing with Negative Test Cases. In Proc. of International Conference on Software Quality, Reliability and Security, pp. 34-45, https://doi.org/10.1109/qrs.2019.00018 [43] M. Gogolla, F. Büttner, and M. Richters (2007). USE: A UML-based specification environment for validating UML and OCL. Science Informatica 47 (2023) 21–42 41 of Computer Programming, 69(1), pp. 27-34, https://doi.org/10.1016/j.scico.2007.01.013 [44] E. Brottier, F. Fleurey, J. Steel, B. Baudry, and L.T. Yves (2006). Metamodel-based Test Generation for Model Transformations: an Algorithm and a Tool. Symposium on Software Reliability Engineering, IEEE, pp. 85-94, https://doi.org/10.1109/issre.2006.27 [45] D.H. Dang (2009). On integrating triple graph grammars and OCL for model-driven development. University of Bremen, Ph.D. thesis, 2009, https://doi.org/10.1007/978-3-642-01648-6_14 [46] https://www.eclipse.org/atl/atlTransformations/ [47] A.G. Domínguez and G. Hinkel (2019). The TTC 2019 Live Case: BibTeX to DocBook. In Proc. of the 12th Transformation Tool Contest, co-located with the 2019 Software Technologies: Applications and Foundation, pp. 61-65. [48] A. Anjorin, T. Buchmann, and B. Westfechtel (2017). The Families to Persons Case. In Proc. of the 10th Transformation Tool Contest (TTC 2017), pp. 27-34. [49] Z. Li, X. Zhou, Z. Ye (2019). A Formalization Model Transformation Approach on Workflow Automatic Execution from CIM Level to PIM Level. International Journal of Software Engineering and Knowledge Engineering, pp. 1179-1217, https://doi.org/10.1142/s0218194019500372 [50] J.M. Mottu, B. Baudry, and Y. L. Traon (2006). Mutation analysis testing for model transformations. In Proc. of Model Driven Architecture - Foundations and Applications, 2nd European Conference, pp. 376– 390, https://doi.org/10.1007/11787044_28 42 Informatica 47 (2023) 21–42 N. Hanh et al. 
https://doi.org/10.31449/inf.v47i1.4429 Informatica 47 (2023) 43–50 43 Implementation of Multiple CNN Architectures to Classify the Sea Coral Images Zainab N. Nemer1, Wala'a N. Jasim 2 and Esra'a J. Harfash1 1 College of Computer Science and Information Technology, University of Basrah, Iraq 2 Department of Pharmacognosy, College of Pharmacy, University of Basrah, Iraq E-mail: zainab.nemer@uobasrah.edu.iq, walaa.jasim@uobasrah.edu.iq , esra.harfash@uobasrah.edu.iq Keywords: corals, deep learning, classification images, image processing, coral sea identification CNN, AlexNet, SqueezeNet, GoogLeNet, Inception-v3, coral classification. Received: September 29, 2022 Image processing and computer vision have a major role in addressing many problems, where images and techniques that are dealt with them contribute greatly to finding solutions to many topics and in different directions. Classification techniques have a large and important role in this field, through which it is possible to recognize and classify images in a way that helps in solving a specific problem. Among the most prominent models that are distinguished for their ability and accuracy in distinguishing is the CNN model. In this research, we have introduced a system to classify the sea coral images because sea coral and its classes have many benefits in many aspects of our lives. The important thing in this work is to study four CNN architectures model (i.e., AlexNet, SqueezeNet, GoogLeNet/ Inception-v1, google Inception-v3) to determine the accuracy and efficiency of these architectures and determine the best of them with coral image data, and we are shown the details in the research paragraphs. The results showed 83.33% accuracy for AlexNet, 80.85% SqueezeNet, 90.5% GoogLeNet and 93.17% for Inception-v3. Povzetek: Predstavljena je uporaba arhitektur konvolucijskih nevronskih mrež (CNN) za razvrščanje slik morskih koral. 1 Introduction There is a growing scientific consensus that earth systems are under unprecedented stress.The human and economic development model developed during the recent industrial revolutions has had a significant impact on our planet. For 10,000 years, the Earth’s relative stability has allowed civilizations to flourish. Over time, industrialization has jeopardized this stability. The United Nations Sustainable Development Goals are another lens to see the challenges facing humanity. Six of the 17 goals are directly related to the environment and human influence: combating climate change, wisely using oceans and marine resources, managing forests, combating desertification, islands reverse land degradation and sustainable development. [1] Effective management depends on ecosystem monitoring, and prompt reporting is necessary to offer timely advice. At the same time, the procedure of gathering underwater data for following the communities which exist under the benthic is greatly aided by digital images. Recent years have seen a tremendous advancement in image recognition technology within artificial intelligence and its various uses in modern society, opening up new technologies and avenues to enhance coral reef monitoring capabilities. Coral reef monitoring is expensive because it requires specialized techniques. Furthermore, due to the remoteness of reefs and diving requirements, longterm data sets are often scattered or spatially constrained. 
The monitoring method has increased the usage of digital underwater photography over small spatial scales in order to keep costs down [2].In the last year, with the rapid developments in the identification of digital contents, the process of automatic image classification has become the most challenging task in computer vision. In comparison with human vision, the process of comprehending and automatically analyzing images is challenging [3], and as computer vision is a combination of pattern recognition and image processing, the process’ output is image understanding [4]. One of the models that have demonstrated excellent performance in computer vision problems, particularly image classification is the Convolutional neural networks CNNs [5]. Currently, CNN has become one of the most attractive methods, and it is now considered as a final factor in many modern, diverse and challenging applications of machine learning applications, for example: ImageNet object detection challenge, image classification, face recognition. A typical CNN Consist of one or more blocks of sampling layers, then it is followed by one or more fully connected layers (FCL) and an output layer, as in Figure (1). 44 Informatica 47 (2023) 43–50 Z.N. Nemer et al. Figure 2: AlexNet arhitecture Figure 1: Convolutional Neural Network The CNN’s central parts are the convolutional layer (conv layer). The Images are static typically in nature. That is, the formation of any one part of the image is the same as the formation of any other part. Then, a feature learned in one region may match a similar pattern in another [6]. The CNN model has several architectures, and below we talk about some of them that were used in this work. The AlexNet is a deep CNN. It is used to successfully outperform the classical image object recognition procedures. Rather than a Sigmoid or Tanh, which represented function and were formerly the accepted standards for traditional CNNs, the AlexNet uses ReLu (Rectified Linear Unit) for the non-linear part. ReLu is given by: f(x) = max (0, x) SqueezeNet can be defined as one of the CNN architectures that has 50 times less parameters compared to AlexNet while maintaining accuracy on par with AlexNet. Also, this work demonstrated the model’s architecture and its application to the ImageNet dataset. The SqueezeNet model employs the following techniques to cut the bulk of parameters: reducing the number of input channels to 3x3 filters, substituting 1x1 filters for 3x3 filters, and down-sampling the network later. Figure (3) shows how the fire module’s convolution filters are organized, with a squeeze convolution layer—which has just 1x1 filters—feeding into an expand layer—which has a combination of 3x3 and 1x1 convolution filters [9,10]. Three FCLs are placed after five convolutional layers with reducing filter sizes which are connected (sequentially). AlexNet could quickly down sample the intermediate representations with the use of strided convolutions and max-pooling layers. Vectorized convolutional maps are utilized as inputs to a sequence of two FCLs, as depicted in Figure (2) [7,8]. Figure 3: Organization of convolution filters in the fire module. Informatica 47 (2023) 43–50 45 Implementation of Multiple CNN Architectures to Classify… The GoogLeNet is based on the Inception architecture. It is a system that repeats an inception module. 
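The fire module described above (a 1x1 "squeeze" convolution feeding an "expand" layer that mixes 1x1 and 3x3 filters, with ReLU non-linearities) can be written compactly. The paper does not state which framework was used; the following is a minimal PyTorch sketch of a single fire module, with channel sizes taken from the standard SqueezeNet configuration rather than from the paper.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet fire module: a 1x1 'squeeze' convolution feeding an 'expand'
    layer that concatenates 1x1 and 3x3 convolutions, with ReLU activations."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# A feature map of 96 channels at 55x55 (after SqueezeNet's first conv/pool).
out = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(out.shape)  # torch.Size([1, 128, 55, 55])
```

The squeeze layer reduces the number of channels reaching the 3x3 filters, which is where most of the parameter savings over AlexNet come from.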
From the network’s architecture in Figure 4, it is indicated that there are certain skip connections that, in essence, constitute a mini-module that is replicated across the network. This module was known as an “inception module” by Google. Pooling procedures, spatial convolution, and multiple channel reprojection are all included in each module. Larger convolutional operations (nxn) are split into two convolutional operations with n x1 and n x1 filter sizes. The parameter space is shrunk by two orders of magnitude as a result [11,12,13] Figure 4: An illustration of the layers of GoogLeNet. The Inception-v3 CNN architecture uses Factorized 7 x 7 convolutions, Label Smoothing, and use the auxiliary classifier to transfer label information lower down the network, among other advances (along with using batch normalization for layers in the side head). After that, an FCL is developed on top of the Inception-V3 architecture as a platform for optimizing the process of classification. Convolution layers can learn enough on their own convolution kernel to create the tensor outputs during the model-building process. Additionally, prior to the classification stage, our custom model is concatenated with the individually acquired segmented features. Then it is considered the base of any model because of its capability to get important features that can be utilized in the process of image classification. Figure 5 show the general architecture [14,15]. Figure 5: complete architecture of Inception-v3 The four learning transfer architectures have been trained in this study to test their capacity for identifying images of sea coral, and the accuracy results were provided. The rest of this work is structured as follows: Section 1 presents the introduction, section 2 presents the related works, section 3 presents the working system’s description, section 4 presents experimental results thoroughly, and section 5 presents the discussions and conclusions. 2 Related work Convolutional neural network models can be applied to many topics for the purpose of classification. There are many types of CNN models that can be used for each specific topic, and the following is a set of research in this direction. This study by Sumit Sharan et al. is only based on the challenging but significant Scleractinian (Stony) corals. Further research is done on a suggested method using structural levels like branching corals. The results of the verification show that the testing and training data are nearly identical, demonstrating the capability of the suggested method to accurately predict and learn [16]. S. M. Jaisakthi et al. efforts to automatically recognize and label several types of a benthic substrate using bounding boxes in a given image are introduced as work to monitor coral reefs. In order to recognize and detect various kinds of benthic substrates, an approach based on CNN is given in this research. Since this technique is quicker and more accurate at recognizing objects, they adopted a faster RCNN structure for substrate detection [17]. The classification approach for coral reef images was demonstrated by Zvy Dubinsky et al., and it may be altered to fit other dataset features (number of classes, the size of the dataset, class types, etc.). Also, the study compared several CNN architectures, such as ResNet-50 and VGG-16, and applied transfer learning to the results. There were eleven classes of coral species represented by 5500 images in the ResNet-50 dataset. 
Here the use of DL is to find out which coral species were most common in the Gulf of Eilat and then link those findings to other ecological factors like water depth or anthropogenic disturbance [18]. Szegedy et al. utilized seven GoogLeNet models in their study. The initialization (and even initial weights, due to oversight) and learning rate policies used for training such models were the same. The main differences between them were the sampling methods they used and the randomness of the input images. The ILSVRC 2014 classification challenge involves placing an image into one of 1000 leaf-node categories in the ImageNet hierarchy. There are around 50,000 validation images, 1.2 million training images, and 100,000 testing images [19]. The purpose of this work, led by Eduardo Tusa and colleagues, is to construct a supervised machine learning-based vision system for coral detections. A bank of Gabor Wavelet filters have been used for extracting texture feature descriptors, and learning classifiers from the OpenCV library have been used to distinguish between non-coral and coral reef. The database 46 Informatica 47 (2023) 43–50 of 621 images (created for this purpose) that depicts Belize’s coral reef: Choose the Decision Trees approach since it performs the most quickly and accurately (110 for training the classifiers, 511 for testing the coral detector) [20]. CNNs, a supervised deep learning technique, are used by Mohamed Elsayed Elawady to offer an effective sparse classification for coral species. Additionally, the researchers experiment with cutting-edge underwater image enhancement, color conversion, and color normalization algorithms while computing Phase Congruency (PC), Weber Local Descriptor (WLD), and Zero Component Analysis (ZCA) Whitening to extract shape and texture feature descriptors that are used as supplementary channels (feature- based maps) with the input coral image’s basic spatial color channels (spatialbased maps).[21] The classification of radiography images using 11 CNN architectures (VGG-19, GoogLeNet, SqueezeNet, AlexNet, Inception-v3, ResNet-18, VGG-16, ResNet50, DenseNet-201, ResNet-101, and Inception-ResNetv2) is presented by Ananda Ananda et al. With the use of CNNs, two classes—normal and abnormal—of wrist radiographs from the Stanford Musculoskeletal Radiographs (MURA) dataset were identified. Different hyper-parameters against accuracy and Cohen’s kappa coefficient were used to compare the architectures. [22] In order to establish a simpler, more effective, and quicker way to automate the classification of corals, the fundamental analysis was explored in the work of Sumit Sharan and colleagues with the use of approaches like CNN and DL. Only the challenging but significant Scleractinian (Stony) corals are used as a basis for this article. Further research is done on a suggested method using structural levels like branching corals. The results of the verification show that the testing and data are nearly identical, demonstrating the capability of the suggested method to accurately predict and learn [23]. In this article, Nurbaity Sabri and colleagues offer a study that contrasts the leaf recognition abilities of basic CNN and pre-trained models AlexNet and GoogLeNet. The use of such classification models has greatly advanced computer vision. This study uses MalayaKew for detecting leaf recognition performance. GoogLeNet exceeds both standard CNN and Alex Net, achieving a flawless accuracy rate of 100%. 
Because of the several layers in its architecture, GoogLeNet’s processing time is longer than that of the other models [24].

The accuracy of a technique developed by Hopkinson B.M. and colleagues to automatically classify 3D reconstructions of reef sections was evaluated. Locations on the 3D reconstruction were mapped back into the original images to extract various views of each location and produce a 3D classified map. CNNs were utilized in each method examined for classifying or extracting characteristics from images; however, the methods tested differed in how they combined information from different views of a point into a single classification. Probability averaging, voting, and a layer of a learned neural network were the methods for combining information [25]-[27].

3 Description of work system

The field of artificial intelligence and computer vision has witnessed tremendous developments in recent years with regard to digital image processing in various disciplines, and this development has had a major role in addressing many issues in which images are central to the solution, including medical, industrial, educational, and other issues. In any direction, many factors control the quality of the results, including the amount of data, the method used for processing, and the methods of extracting the final results from the analyzed images. In this research, we turned to processing pictures of sea coral and trying to classify them using the CNN method. The following is a review of the most important steps that were followed in this research to read, process, and classify the sea corals.

There are many types of coral around the world; some species thrive in warm shallow waters close to beaches and coasts, and some are located in the depths of the cold, dark sea. Corals therefore differ in their characteristics, and in general coral is classified as either hard or soft; there are many known types of hard and soft coral. Soft corals are easily distinguished because they are similar to plants, live in colonies, and have a distinctive appearance. For the experiments, we dealt with ten classes of sea corals: Great Star Coral, Brain Coral, Table Coral, Pillar Coral, Staghorn Coral, Bubble Coral, Sea Pens, Toadstool Coral, Carnation Coral, and Gorgonian (Sea Fans). Each class has 50 images. Five of these classes are hard coral, namely Great Star Coral, Brain Coral, Table Coral, Pillar Coral, and Staghorn Coral, and the other five are soft coral, namely Bubble Coral, Sea Pens, Toadstool Coral, Carnation Coral, and Gorgonian. This dataset was compiled carefully, according to precise image specifications, from different sites on the Internet. Figure 6 shows samples from each class of the adopted coral database.

3.1 The CNN structure of sea coral

In this work, we tested four different CNN networks (AlexNet, SqueezeNet, GoogLeNet, and Inception-v3) in order to test the efficiency of each network in terms of its ability to classify the sea coral data. The input images are of size 250×250×3 and are then cropped to the size appropriate for each network model and what it requires. The following is a description of each network used here in this classification problem.

AlexNet: The architecture of AlexNet consists of 25 layers:
• Input data size is [227, 227, 3].
• There are five convolutional layers.
• To extract the most appropriate features, there are three max-pooling layers.
• Then there are two consecutive fully connected layers (FCLs).
• Softmax is used as the activation in the last network layer for predictions.
• The ReLU activation function is used (ReLU is the default activation function).
• The stochastic gradient descent with momentum (SGD) solver is used.

SqueezeNet: This model is very common in image classification problems because it gives great accuracy in classification. The SqueezeNet architecture consists of 68 layers:
• The input size here is 227×227×3.
• A single convolutional layer at the input and at the output.
• Three 3×3 max-pooling layers with stride 2.
• The activation depends on the ReLU activation function, implemented between the squeeze and expand layers.
• Eight fire modules.
• Softmax and the SGD optimizer are used here.

GoogLeNet: GoogLeNet is one of the important models because it is trained faster. The architecture of this net consists of 144 layers:
• Input images of size 224×224×3.
• Three 3×3 max-pooling layers with stride 2.
• Nine inception modules.
• The ReLU activation function is implemented.
• SGD optimizers are used.
• Finally, a fully connected layer and softmax.

Inception-v3: In the Inception-v3 architecture there are 315 layers, and in this network the convolution comes first, followed by batch normalization and ReLU. The following are some of the properties that apply in this network:
• Input images of size 299×299×3.
• Four 3×3 max-pooling layers with stride 2.
• Nine inception modules.
• Two grid size reductions.
• The ReLU activation function is implemented.
• SGD optimizers are used.
• Finally, a fully connected layer.
• Then prediction with softmax.

Figure 6: Samples of images of the coral dataset.

4 Discussion and experimental results

The purpose of implementing several CNN architectures is to know and measure their efficiency in classification problems, especially on sea coral images, and to determine the most efficient ones. We have trained these nets according to the specifications described above. The CNN architectures (AlexNet, GoogLeNet, SqueezeNet, Inception-v3) were trained on ten classes of coral images. The results obtained with these four CNN models are very encouraging, and the overall accuracy across all ten coral classes after 30 epochs is shown in Table 1.

Table 1: Pre-trained deep learning models.
Network        Validation accuracy (Top-1, %)
AlexNet        83.33
SqueezeNet     80.85
GoogLeNet      90.5
Inception-v3   93.17

These conventional accuracies represent Top-1 accuracy, meaning the expected answer is the one with the highest probability. All the architectures show important accuracies, but Inception-v3 and GoogLeNet achieved higher average accuracy than AlexNet and SqueezeNet. The elapsed training time of each net was calculated and is shown in Figure 7. As we can see from the figure, there is a clear difference in the time that each network spends in the training phase for the same number of epochs. Note that Inception-v3 had the highest training time, although it also had the highest accuracy.

Figure 7: The total training time of the architectures.

For every architecture that is trained, we measure the accuracy of each of the ten categories in order to determine the success rate for each type of coral; here the accuracy was measured using Top-5 accuracy (the highest-probability answers, one of which should match the expected answer). Table 2 shows the details of the accuracy of each class with each architecture.
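For clarity, the Top-k accuracy used in this evaluation can be computed directly from the predicted class probabilities. The following is a minimal NumPy sketch with illustrative variable names; it is not the authors’ code.

```python
# Minimal sketch of Top-1 / Top-5 accuracy from predicted class probabilities.
import numpy as np

def top_k_accuracy(probs, labels, k=5):
    """probs: (n_samples, n_classes) predicted probabilities;
    labels: (n_samples,) integer ground-truth classes."""
    top_k = np.argsort(probs, axis=1)[:, -k:]         # indices of the k highest scores
    hits = np.any(top_k == labels[:, None], axis=1)   # is the true label among them?
    return hits.mean()

# Example with 4 samples and 10 coral classes (random scores for illustration)
rng = np.random.default_rng(0)
probs = rng.random((4, 10))
labels = np.array([3, 1, 7, 0])
print(top_k_accuracy(probs, labels, k=1))  # Top-1 accuracy
print(top_k_accuracy(probs, labels, k=5))  # Top-5 accuracy
```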
As is known, the Top-5 method always gives a higher accuracy estimate, as is evident in Table 2, but on careful observation we find that the Great Star and Sea Pens corals score among the best with every architecture.

Table 2: Accuracy of each coral class with each network.
Name of coral             AlexNet   SqueezeNet   GoogLeNet   Inception-v3
Great Star Coral (hard)   0.9583    0.9916       0.9360      0.9498
Brain Coral (hard)        0.9000    0.9429       0.9513      0.9352
Table Coral (hard)        0.9250    0.9958       0.9017      0.9345
Pillar Coral (hard)       0.9333    0.8941       0.9584      0.9301
Staghorn Coral (hard)     0.8333    0.9428       0.9527      0.9445
Bubble Coral (soft)       0.8667    0.9306       0.9962      0.9299
Sea Pens (soft)           0.9167    0.9958       0.9399      0.9358
Toadstool Coral (soft)    0.9083    0.8857       0.9656      0.9257
Carnation Coral (soft)    0.8750    0.9875       0.9855      0.9076
Gorgonian (soft)          0.8833    0.9428       0.9236      0.9327

For another test, we trained the architectures separately on each type of coral, i.e., hard and soft. This experiment aims to measure each architecture's efficiency in identifying the classes of each type. Table 3 shows the overall results in this case. It is noticeable here that the accuracy of identifying the classes of each type (hard and soft) was better, and the accuracy for the soft type is a certain percentage higher than for the hard type in all the architectures.

Table 3: Accuracy of each type of coral (%).
Network        Hard coral   Soft coral
AlexNet        89.33        90
SqueezeNet     86.83        88.33
GoogLeNet      93.33        95
Inception-v3   96.0         96.67

5 Conclusion

In this research, we have introduced work with multiple CNN architectures (AlexNet, SqueezeNet, GoogLeNet, Inception-v3) to classify sea coral images. The point of view of this work is to know and study the ability of each architecture in the classification problem, especially with this type of image. In this work, we want to know the possibility of classifying sea coral images by adopting these classification models. We hope at the same time that this work will have a role in clarifying the efficiency and ability of each of these CNN architectures, to make it easier to choose among them according to the data being processed. What distinguishes this work is the in-depth investigation to reach results that support decisions in two directions: first, determining the efficiency level of the various CNN architectures, each separately; and second, classifying marine coral and obtaining the best results, as clarified in the previous paragraph and also in this part. This system adopts ten types of sea coral, five of which are of the hard coral type and the other five of the soft coral type. Two tests were carried out.
In the first test, each net was trained (each one separately) on all ten coral classes, and the final results indicate the high efficiency of all the architectures in classifying the images as coral, with GoogLeNet and Inception-v3 generally recording better results. The accuracy with GoogLeNet is 90.5% and with Inception-v3 93.17%. This is because GoogLeNet and Inception-v3 have distinct architectures in terms of design compared with the rest; they are deeper networks, so their results are generally more accurate. In the second test, we trained the four nets on each type of coral separately, that is, hard and soft coral, and the results obtained from this test indicated the high efficiency of the four architectures in classification. GoogLeNet and Inception-v3 were again distinguished by relatively higher results than AlexNet and SqueezeNet: the accuracy for the hard type was 93.33% with GoogLeNet and 96% with Inception-v3, and for the soft type 95% with GoogLeNet and 96.67% with Inception-v3.

Although the results presented in this paper are very impressive and are sufficient for what we were aiming at in this research, some issues may hinder obtaining higher results in this work, including the limited number of images adopted. We believe that if the number of coral images were much greater, the accuracy of the results would have been much higher. Also, GoogLeNet and Inception-v3 take a longer time compared to the other models, AlexNet and SqueezeNet, because the number of layers in their architectures is high, especially with Inception-v3. Finally, we have tried to highlight the power of CNN models in recognizing coral images by choosing these four different architectures. Although all these nets take execution time on the CPU (especially Inception-v3), and of course this time increases with the number of cycles, they are very powerful discrimination models.

References

[1] Celine Herweijer, Dominic Waughray, "Harnessing Artificial Intelligence for the Earth", PwC and Stanford Woods Institute for the Environment, January 2018.
[2] Manuel González-Rivero, Oscar Beijbom, Alberto Rodriguez-Ramirez and Dominic E., "Monitoring of Coral Reefs Using Artificial Intelligence: A Feasible and Cost-Effective Approach", Remote Sensing, Volume 12, Issue 3, p. 489, 2020, https://doi.org/10.3390/rs12030489.
[3] Muthukrishnan Ramprasath, "Image Classification using Convolutional Neural Networks", International Journal of Pure and Applied Mathematics, Volume 119, No. 17, pp. 1307-1319, 2018.
[4] Wiley Victor and Thomas Lucas, "Computer vision and image processing: a paper review", International Journal of Artificial Intelligence Research 2.1, pp. 29-36, 2018, https://doi.org/10.29099/ijair.v2i1.42.
[5] Wu, Jianxin, "Introduction to convolutional neural networks", National Key Lab for Novel Software Technology, Nanjing University, China, Vol. 5, no. 23, p. 495, 2017.
[6] F. Sultana, A. Sufian, P. Dutta, "Advancements in image classification using convolutional neural network", in 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), IEEE, pp. 122-129, November 2018, https://doi.org/10.1109/ICRCICN.2018.8718718.
[7] Grm, Klemen, Vitomir Struc, Anais Artiges, Matthieu Caron, and Hazım K. Ekenel, "Strengths and weaknesses of deep learning models for face recognition against image degradations", IET Biometrics, vol. 7, no. 1, pp. 81-89, 2018, https://doi.org/10.1049/iet-bmt.2017.0083.
[8] Shadman Q.
Salih, Hawre Kh. Abdulla, "Modified AlexNet Convolution Neural Network for Covid-19 Detection Using Chest X-ray Images", Kurdistan Journal of Applied Research (KJAR), Vol. 5, No. 1, pp. 119-130, 2020, https://doi.org/10.24017/covid.14.
[9] Forrest N. Iandola, Song Han and Matthew W. Moskewicz, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size", Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence, arXiv preprint arXiv:1602.07360, 2016.
[10] Ali Ahmed, "Pre-trained CNNs Models for Content based Image Retrieval", International Journal of Advanced Computer Science and Applications, Vol. 12, No. 7, pp. 200-206, 2021, https://doi.org/10.14569/ijacsa.2021.0120723.
[11] Gomez-Ríos A., Tabik S., Luengo J., Shihavuddin A.S.M., Krawczyk B. and Herrera F., "Towards highly accurate coral texture images classification using deep convolutional neural networks and data augmentation", Expert Systems with Applications, 118, pp. 315-328, 2018, https://doi.org/10.1016/j.eswa.2018.10.010.
[12] Nur Azida Muhammad, Amelina Ab Nasir and Zaidah Ibrahim, "Evaluation of CNN, AlexNet and GoogLeNet for Fruit Recognition", Indonesian Journal of Electrical Engineering and Computer Science, Vol. 12, No. 2, pp. 468-475, 2018, https://doi.org/10.11591/IJEECS.V12.I2.PP468-475.
[13] Sa Inkyu, Zongyuan Ge, Feras Dayoub, Ben Upcroft, Tristan Perez, and Chris McCool, "DeepFruits: A fruit detection system using deep neural networks", Sensors, Vol. 16, no. 8, p. 1222, 2016, https://doi.org/10.3390/s16081222.
[14] Nivrito, A. K. M., Md Wahed, and Rayed Bin, "Comparative analysis between Inception-v3 and other learning systems using facial expressions detection", PhD diss., BRAC University, 2016.
[15] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z., "Rethinking the inception architecture for computer vision", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016, https://doi.org/10.1109/CVPR.2016.308.
[16] Sharan, S., Harsh, H., Kininmonth, S., & Mehta, U., "Automated CNN based coral reef classification using image augmentation and deep learning", International Journal of Engineering Intelligent Systems, Vol. 29, no. 4, pp. 253-261, 2021.
[17] Jaisakthi S.M., Mirunalini P., Aravindan C., "Coral Reef Annotation and Localization using Faster R-CNN", in CLEF (Working Notes), January 2019.
[18] Raphael, A., Dubinsky, Z., Netanyahu, N. S., & Iluz, D., "Deep Neural Network Analysis for Environmental Study of Coral Reefs in the Gulf of Eilat (Aqaba)", Big Data and Cognitive Computing, Vol. 5, no. 2, p. 19, 2021, https://doi.org/10.3390/BDCC5020019.
[19] Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., ... & Rabinovich, A., "Going deeper with convolutions", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015, https://doi.org/10.1109/CVPR.2015.7298594.
[20] Tusa Eduardo, Alan Reynolds, David M. Lane, Neil M. Robertson, Hyxia V., and Antonio Bosnjak, "Implementation of a fast coral detector using a supervised machine learning and Gabor wavelet feature descriptors", in 2014 IEEE Sensor Systems for a Changing Ocean (SSCO), pp. 1-6, IEEE, 2014, https://doi.org/10.1109/SSCO.2014.7000371.
[21] Elawady M., "Sparse coral classification using deep convolutional neural networks", MSc thesis, Erasmus Mundus in Vision and Robotics (VIBOT), Department of Computer Architecture and Technology, University of Girona, 2014.
[22] Ananda A., Ngan K.H., Karabag C., Ter-Sarkisov A., Alonso E. and Reyes-Aldasoro C.C., "Classification and visualisation of normal and abnormal radiographs; a comparison between eleven convolutional neural network architectures", Sensors, Vol. 21, no. 16, p. 5381, 2021, https://doi.org/10.1101/2021.06.16.21259014.
[23] Sharan S., Harsh H., Kininmonth S., & Mehta, U., "Automated CNN based coral reef classification using image augmentation and deep learning", International Journal of Engineering Intelligent Systems, Vol. 29, no. 4, pp. 253-261, 2021.
[24] Sabri N., Aziz Z.A., Ibrahim Z., Rasydan M.A. and Hafiz A., "Comparing convolution neural network models for leaf recognition", International Journal of Engineering & Technology, 7(3.15), pp. 141-144, 2018, https://doi.org/10.14419/IJET.V7I3.15.17518.
[25] Hopkinson B.M., King A.C., Owen D.P., Johnson-Roberson M., Long M.H. and Bhandarkar S.M., "Automated classification of three-dimensional reconstructions of coral reefs using convolutional neural networks", PLoS ONE, Vol. 15, no. 3, p. e0230671, 2020, https://doi.org/10.1371/journal.pone.0230671.
[26] Wala'a, N. J., & Rana, J. M., "A Survey on Segmentation Techniques for Image Processing", Iraqi Journal for Electrical and Electronic Engineering, vol. 17, pp. 73-79, 2021, https://doi.org/10.37917/ijeee.17.2.10.
[27] Nemer, Z.N., "Hand Gestures Detecting Using Radon and Fan Beam Projection Features", Informatica, 46(5), 2022, https://doi.org/10.31449/inf.v46i5.3744.

https://doi.org/10.31449/inf.v47i1.4526 Informatica 47 (2023) 51–64 51

Threat Model and Risk Management for a Smart Home IoT System

Ahmed Redha Mahlous1*
1 Prince Sultan University, Saudi Arabia
E-mail: armahlous@psu.edu.sa
* Corresponding author

Keywords: STRIDE, DREAD, smart homes, IoT, security risk assessment

Received: November 21, 2022

The emergence of smart homes, driven by the rapid growth and development of technology, has brought numerous benefits to human life, including convenience and improved wellbeing. However, the incorporation of IoT devices into smart homes and their connection to the Internet have created new security and privacy challenges that affect the confidentiality, integrity, and availability of data collected and exchanged by these devices. Such challenges have led to security threats that render IoT devices in smart homes vulnerable to various attack vectors. To provide a comprehensive picture of the security of smart homes, this paper applies the STRIDE [1] threat model to identify potential threats at different layers, namely the IoT device, communication, and application layers. Additionally, a risk-rating security threat model, DREAD, is used to assess the risks of these threats. Finally, this paper proposes a risk mitigation strategy to respond to the rated risks and improve the security of smart home IoT systems. The primary aim of this paper is to enhance the understanding of the various security threats in smart homes and provide a security baseline to enhance the security of smart home IoT systems.

Povzetek: V članku je predstavljena uporaba modela STRIDE na IoT napravah pametnega doma za prepoznavanje potencialnih groženj na različnih ravneh.

1 Introduction

Smart home, or home automation, is a term used for homes that have certain devices that sense, control, and regulate the attributes of the house. These might include attributes such as temperature, power consumption, and entertainment systems, and might include security features such as camera surveillance and smart door locking.
Smart home devices create a lot of convenience and additional control features that are extremely attractive to ordinary homeowners, especially when offered at a very competitive price. Benefits include remote control over home features from inside or outside the home itself, a decrease in power consumption that creates significant savings for the homeowner, and smart security monitoring that gives homeowners a sense of security and privacy as well. The market for smart homes and home automation has been growing dramatically due to the convenience it brings, the ease of use and setup, and the recent decrease in prices caused by intense competition. The global home automation market is reaching a size of 100 billion dollars, with more than 250 million homes using such technologies, which represents around 12% of homes worldwide [2].

The competitive nature of such a growing market has also created many flaws, together with risks and technical issues that are growing as well. Issues and risks include platform fragmentation [3], a term used when many devices with different, incompatible software are connected. The lack of technical standards in many of these devices creates further risks that may undermine the devices' security and privacy promises. Moreover, the usage of different communication standards also creates many complications when it comes to the security of the systems. Finally, the usage of insecure operating systems, such as old versions of Android chosen for their low technical requirements and ease of development, imposes huge risks on the security of the systems; studies show that more than 80% of running Android devices are not secure [4] and may have at least one critical vulnerability.

Smart home devices may carry many security risks. These include easier home intrusions, which may happen if the home security system has weak security that allows hackers and thieves to break into the system and disable its features, or even open the door for them. There are also targeted attacks against the smart home device that aim to find and collect data about the user, including name, phone number, main email account, passwords (if not encrypted), and possibly credit card details as well. Moreover, a breach of privacy may happen if an attacker gains access to previous or even live recordings from any internal camera or microphone, which the attacker may later use against the victim at any time, for blackmail and more. Smart home devices carry so many kinds of risks because of the number of attack points that exist by their very nature: most of them use unprotected communication protocols that are mainly wireless, most of them use unprotected controlling software, many of them use very weak security policies and controls, and many of them are IoT devices connected to the Internet, which is yet another point of attack with many kinds of attacks as well.

The motivation for writing this paper is the rapid increase in smart home device usage in recent years; we wanted to explore the different potential threats that can be used against IoT systems in a smart home. The contribution of this study is the result of the risk assessment model, which can be used to plan a successful strategy to mitigate risks and contribute to the development of secure IoT devices for smart homes.
We believe that it is important to make users and designers of smart homes more aware of security and privacy breaches against such devices. The rest of the paper is structured as follows: section 2 presents the literature review, section 3 presents a scenario and requirements, section 4 reviews the security objectives of the system, section 5 presents the risk assessment approach, and finally a conclusion is presented in section 6.

2 Literature review

Authors in [5] presented a review study of the different face detection approaches in the IoT domain and their application in smart home IoT systems. Authors in [6] surveyed the security of the smart home and the privacy of the people living in it. They analyzed the security and risks faced by smart homes and identified a set of vulnerabilities that can be exploited to gain unauthorized access. The security problems related to the usage of smartphones in smart homes were the subject of [7]; the authors listed problems such as power and Internet malfunction, software failure, confidential data leakage and eavesdropping attacks. Many studies emphasized the challenges, risks and difficulties that smart home owners and designers face in securing their IoT systems [8]. Authors in [9] mentioned the example of the DDoS attack that happened in November 2016 in two buildings in Finland, when most of the automated systems controlling the thermostats were shut down. Data privacy drew the attention of the authors in [10]; they highlighted the legal issues related to data privacy and storage in IoT systems in smart homes. Authors in [11] tried to fill the gap related to the role of privacy in smart homes and address the question of to what extent users' concerns for information privacy influence intended smart home usage. A multi-theoretical model using Smart PLS 3.2.8 was tested, and the findings derived from the empirical study emphasize the importance of addressing privacy concerns because they can influence the intended usage of the smart home. Authors in [12] deduced that users assume their privacy is protected while using IoT devices but are often unaware of the potential leakage of sensitive information. In another study [13], the authors concluded that users' security risk perception has an effect on their intention to use smart home devices, while the authors in [14] stated that users transfer the responsibility for their privacy protection while using IoT devices to the manufacturers. Authors in [15] provided an overview of users' perception of security while using IoT devices. They developed a model and tested it with multiple linear regression. Using a survey, they concluded that users' awareness of many threats has an effect on the perceived importance of IoT security. On the other hand, most users do not check their security settings and feel safe while using IoT devices.

3 Scenario and requirements

To better understand the different assets and threats that might exist in a smart home system, we present the following scenario. The smart home has a surface of 200 m2 and consists of a two-storey building and an attic, as shown in Figure 1.

Figure 1: Smart home.

The house contains the smart devices (camera and smart door) and some of the controlling devices (tablet); outside of the house there are other controlling devices, such as a smartphone, all connected to the Internet, with an API communicating between the device interface and the user interface.
The user of the smart home system is away most of the time and needs to have the safest house possible. We define the following requirements:
• The user of the smart home system wants to be able to access and monitor the following IoT systems remotely when he is away:
  o Climate control
  o Smoke / fire
  o Temperature issues out of the normal range
  o Door and window locks
  o Lawn watering
  o Local alarm and emergency department messaging
• The user of the smart home system needs to have control over the system locally and through the cloud, which means he should be able to access the controller remotely using his smartphone or locally via a web browser.
• The sensors should send their collected data to the system, and different actions should be taken based on the sensors' input. For instance, if the temperature goes above a certain threshold, it probably means that the AC is not working properly, and a notification should be sent to someone without any delay. Also, in the presence of smoke, the smoke detector should sound, and an alert should be sent to the owner as well as to the fire brigade (a minimal sketch of this kind of trigger logic is given after this list).
• The system should allow the user of the smart home system to change the threshold values that trigger different actuators and events as necessary, either locally or through the mobile app. The triggers and behaviors, data analytics, and remote-control access are all available through a home automation cloud application service that the system will interface with.
• The actuators in the smart home should be controlled by the cloud application, while the IoT systems should have the ability to upload their read data to the cloud. In terms of machine-to-machine (M2M) communication, only allowed machine access is permitted.
• The accounts used to access the system should be protected by strong passwords.
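The following is a minimal sketch of the threshold-trigger behaviour described in the requirements above. The sensor names, threshold values, and the notify() helper are illustrative assumptions only, not part of the system specified in this paper.

```python
# Illustrative sketch: threshold checks over sensor readings with alerts,
# using hypothetical sensor names, thresholds, and a placeholder notifier.
DEFAULT_THRESHOLDS = {"temperature_c": 30.0, "smoke_ppm": 300.0}

def evaluate_readings(readings, thresholds=DEFAULT_THRESHOLDS):
    """Return (recipient, message) alerts for readings that exceed thresholds."""
    alerts = []
    if readings.get("temperature_c", 0.0) > thresholds["temperature_c"]:
        alerts.append(("owner", "Temperature out of normal range - check the AC"))
    if readings.get("smoke_ppm", 0.0) > thresholds["smoke_ppm"]:
        alerts.append(("owner", "Smoke detected"))
        alerts.append(("fire_brigade", "Smoke detected at the property"))
    return alerts

def notify(recipient, message):
    # Placeholder: a real system would call the cloud application's messaging API.
    print(f"ALERT to {recipient}: {message}")

for recipient, message in evaluate_readings({"temperature_c": 34.2, "smoke_ppm": 120.0}):
    notify(recipient, message)
```

Per the requirements above, the user would also be able to change the entries of such a threshold table locally or through the mobile app.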
4 The security objectives of the system

In the smart home we have many different IoT devices, ranging from locks, cameras, and climate controllers to smoke and fire detectors and lawn watering; each may keep certain logs that store information about its activity or previous recordings. For example, cameras and microphones have previous recordings in the form of video and voice files, while climate controllers and door locks may have logs of previous activities. All stored information, recordings, and activity logs can be used to a hacker's advantage, through reconnaissance and data analysis, to find out more about the homeowner. All of these kinds of data should have clear policies regarding their storage and access capabilities to eliminate such risks. Thus, it is imperative for any system like this to define the associated security needs and objectives. Taking into account the requirements mentioned in section 3 above, we present in Table 1 below the categories, the risk of breaching them, and their associated security needs.

Table 1: Categories, risk and security needs.
Identity (access and authorization controls should be in place to document who is accessing the IoT system)
  Risk: unauthorized access to the smart home system from stolen username and password credentials.
  Security needs: each person who accesses the smart home should have a separate username and password; all access events should be logged in the cloud and retained for a period of time.
Financial (a financial loss due to system failure)
  Risk: substantial cost may be incurred due to a malfunction or system failure; for instance, if the climate control fails, the heating or cooling system may run unnecessarily.
  Security needs: document the financial losses that could occur due to a failure of the system or of system components.
Privacy and Regulation
  Risk: personal financial, health, and other information stored on devices on the network could be stolen.
  Security needs: document the impact of any privacy concerns as well as the regulation requirements for this system; identify any data that could cause privacy concerns for the owner of the smart home system.
Reputation (the customer's reputation might be affected by a breach)
  Risk: in the event of any security breach, confidential information such as credit card numbers could be stolen; consequently, the customer's reputation may be damaged.
  Security needs: document any possible negative impact on the customer's reputation if the IoT system is attacked.
Availability guarantees
  Risk: if the system is down, a negative impact on the life of the people using the system will be incurred.
  Security needs: no downtime is acceptable; the system should have maximum uptime.
Safety (ensure the safety of the people using the smart home as well as the safety of the property)
  Risk: significant financial loss to the property, and loss of life, is possible if the physical welfare of the people using the system is compromised.
  Security needs: document the potential impacts on the physical welfare of people and the physical damage to equipment and facilities.

5 Risk assessment approach

There are many threats documented by well-known organizations that list the vulnerabilities of such devices. Some of the vulnerabilities are recurring, such as improper authentication techniques. Most vulnerabilities are threats to the confidentiality of the data saved by the smart home system, which violates the confidentiality attribute of the CIA model. This attribute in particular is the most important, due to the huge number of privacy concerns and threats generated by such vulnerabilities in this domain. Cyber attackers today are becoming more and more clever in launching cyberattacks against smart home IoT systems because of the many kinds of vulnerabilities that exist in smart home devices, from authentication problems [16] used to obtain an admin account, to insecure storage configurations that allow attackers to gain access [17], overflow bugs [18], and listening on open TCP ports to fetch admin passwords [19]. These vulnerabilities pose a potential threat to confidentiality, which is the most important aspect of these systems, and much more. Thus, it is imperative for smart home designers to be aware of the different threats that might target smart home IoT systems.

5.1 Threat model

In this paper we used the STRIDE framework to identify threats and to prioritize and mitigate them.
STRIDE is an acronym for the threat categories it deals with: Spoofing, Tampering, Repudiation, Information disclosure, Denial of Service, and Elevation of privilege. It was created in 1999 by Microsoft [20]. We created a detailed threat model for the smart home system. For each layer of the attack surface (IoT device layer, communication layer and application layer), we identified the asset types used in the smart home and the threats corresponding to each STRIDE category, as shown in Table 2, Table 3 and Table 4 respectively.

Table 2: Threat model at the device level.
(S)poofing – can an attacker pretend to be someone he is not, or falsify data?
  Sensors: access to the wireless network through password cracking; a man-in-the-middle attack can result in fake data being injected using bogus devices; false sensors can be added to the smart home IoT system.
  Actuators: spoofing the identity of the actuator, thus issuing false control actions.
(T)ampering – can an attacker successfully inject falsified data into the system?
  Sensors: malware may create false firmware; credentials might be stolen if access to the terminal is achieved; theft of sensors; disconnecting sensors from power; buffer overflow.
  Actuators: access code theft; theft of actuators; disconnecting actuators from power; buffer overflow.
(R)epudiation – can a user pretend that a transaction did not happen?
  Sensors: –
  Actuators: –
(I)nformation Disclosure – can the device leak confidential data to unauthorized parties?
  Sensors: open ports may lead to access to the smart sensor shell; encryption keys and credentials might be disclosed.
  Actuators: see above.
(D)enial of Service – can the device be shut down or made unavailable maliciously?
  Sensors: the power source can be disconnected or batteries run out; sensors can be stolen or damaged.
  Actuators: actuators stolen or damaged.
(E)scalation of Privilege – can users get access to privileged resources meant only for admins or superusers?
  Sensors: theft of passwords or keys through access to firmware or binaries on the device.
  Actuators: see above.
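One way to make such a per-layer STRIDE catalogue machine-readable, for example so that each threat can later be graded with the DREAD factors used in Section 5.3, is a simple nested mapping. The sketch below is illustrative only and is not part of the authors' system.

```python
# Illustrative sketch: part of the device-layer STRIDE catalogue (Table 2)
# encoded as a nested mapping, so threats can later be scored and sorted.
device_layer_threats = {
    "Spoofing": {
        "Sensors": [
            "Access to the wireless network through password cracking",
            "Man-in-the-middle attack injecting fake data via bogus devices",
            "False sensors added to the smart home IoT system",
        ],
        "Actuators": ["Spoofed actuator identity issuing false control actions"],
    },
    "Denial of Service": {
        "Sensors": ["Power source disconnected or batteries run out",
                    "Sensors stolen or damaged"],
        "Actuators": ["Actuators stolen or damaged"],
    },
}

for category, assets in device_layer_threats.items():
    for asset, threats in assets.items():
        for threat in threats:
            print(f"{category:<18} | {asset:<9} | {threat}")
```

The same structure extends naturally to the communication-layer and application-layer catalogues in Tables 3 and 4.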
Table 3: Threat model at the communication layer.
Spoofing (sensor-actuator network, Wi-Fi network, cell phone, tablet, IoT gateway): man-in-the-middle attacks; implementation of weak passwords in 802.15.4 security suites; interception and decoding of traffic by a false access point; same threats as for Wi-Fi; use of social engineering to trick users into giving up passwords; a lost, unsecured device allows strangers to access the network.
Tampering (sensor-actuator network, Wi-Fi network, cell phone, tablet, IoT gateway): weak or default credentials allow access to logs and locally stored sensor data; a fake device can join the network and submit false data; lack of message or payload authentication enables false data to be sent on the network; wireless protocol security can be hacked, so a false user joins the network and injects false data; time stamping can be tampered with, damaging the credibility of logging.
Repudiation (sensor-actuator network, Wi-Fi network, cell phone, tablet, IoT gateway): logs of cellular communication are not available because of privacy laws; damage or destruction of any logs on the gateway.
Denial of Service (sensor-actuator network, Wi-Fi network, cell phone, tablet, IoT gateway): a rogue device broadcasts on the network, keeps devices awake and depletes power; wireless signal jamming; a replay attack ties up network resources or depletes sensor device battery power; outdoor APs could be damaged or stolen; a hacker can use a jamming attack, which causes legitimate users' packets to be dropped; various IP and TCP DoS attacks; ICMP DoS ping attack from outside the IP network; use of vulnerable UDP services.
Escalation of Privilege (sensor-actuator network, Wi-Fi network, cell phone, tablet, IoT gateway): interception of weak credentials gains unauthorized access to the network; a cracked password allows a user to gain access; a weak password on the AP allows access to network information and control; weak passwords on lost or stolen devices allow thieves access to the device and to credentials configured for other networks; same threats as for the phone; weak or default passwords.

5.2 Applications used in the application layer

Before we define the threats at the application layer, it is essential to know what applications are needed at this layer. The smart home contains a number of applications that help the user understand what is happening in the IoT system using dashboards and that send information about the system. These applications are accessed through the Internet via a web portal and are usually part of a cloud service. Control applications enable interaction with the system, either through direct control of actuators from the application interface, or through software that automates the operation of the system through code that reads sensor values and triggers actuators. We also find embedded applications in some IoT systems that can be accessed over the network using HTTP interfaces. Figure 2 shows the applications, how they can be accessed and their purpose.

Figure 2: Applications used in the smart home.
Table 4: Threat model at the application layer.
Spoofing (local, mobile, cloud applications): Wi-Fi man-in-the-middle, packet capture and decryption; a false access point enables packet capture; a stolen phone allows an attacker to impersonate a legitimate user; poorly built mobile apps could use insecure communications; mobile apps could steal data or be vulnerable to malware; password cracking at the web login.
Tampering (local, mobile, cloud applications): hardcoded credentials, encryption keys, and certificates can be stolen from decompiled firmware and used to submit false data; unencrypted data may be stored by a mobile app and could be edited; unsecured messaging protocols (MQTT) could allow false data to be submitted into the system; UPnP opens ports in the firewall.
Repudiation (local, mobile, cloud applications): no logging or transaction tracking; insufficient or difficult-to-access logging of mobile app data; insufficient logging, log file corruption or destruction, timestamp tampering; logging not available or not configured; unreliable logging mechanism.
Denial of Service (local, mobile, cloud applications): unchanged default passwords enable making IoT devices into bots that work in DDoS attacks; multiple failed attempts to log on to a device can result in lockout or destroyed data; repeated brute-force attacks intentionally lock out legitimate users; DoS attacks against the web portal or cloud service.
Escalation of Privilege (local, mobile, cloud applications): default user accounts and passwords on embedded device apps allow successful logins by unknown users; weak or default passwords can enable unauthorized users to access a lost or stolen phone and control the system; use on unsecured public Wi-Fi networks may allow hackers to steal credentials and other information; SQL injection can provide access to user account information; weak or default user credentials at the web portal allow access to the app across the Internet.

5.3 DREAD risk assessment model

The risk assessment model we adopted in our paper is DREAD [20],[21]. Like the STRIDE model, it was created by Microsoft, and it helps to rate, compare and prioritize the severity of the risk presented by each threat that was classified using STRIDE, as defined earlier in this paper. DREAD is an acronym representing the following risk factors: Damage, Reproducibility, Exploitability, Affected users and Discoverability. An overall score is obtained by combining the ratings of the five factors; in the original Microsoft model each factor is rated 0-10 and the scores are averaged, while in this paper each factor is scored 1-3 and the five scores are summed. The higher the number, the more serious the risk; such a threat is given a higher priority and should therefore be given attention first. Table 5 describes each of the DREAD factors.

Table 5: DREAD factors.
Damage: defines the level of damage that could be done to users and the organization if an attack were to succeed.
Reproducibility: a measure of how easy it is to reproduce a particular attack. For instance, an attack that can be reproduced reliably would be rated higher than one that is statistically unlikely to be exploited or that cannot be reproduced consistently.
Exploitability: describes how difficult it is to exploit the vulnerability.
Affected users: represents the percentage of users that will be affected by a particular threat. The greater the number of users who may potentially be affected, the higher this risk factor should be rated.
Discoverability: signifies how easy it is to learn about the vulnerability.
In this section, we consider the risk metric for some of the relevant threats that have been identified previously. The following assumptions are made:
• All members of the family that live in the home will be affected by any exploit.
• The Reproducibility and Discoverability metrics are always rated as high (a score of 3 for all types of vulnerabilities).

Table 6: DREAD factor scores.
Damage: 1 = low impact, 3 = high impact
Reproducibility: always 3 (easy)
Exploitability: 1 = difficult, 3 = easy
Affected users: 1 = few, 3 = many
Discoverability: always 3 (easy)

Based on the scoring described in Table 6, a grade is assigned to some of the previously identified threats from each layer, as shown in Table 7.

Table 7: Threat grades (Damage, Reproducibility, Exploitability, Affected users, Discoverability, Total).
physical device - firmware can be decompiled and the file system and files inspected for credentials or keys: 2, 3, 1, 3, 3 - total 12
physical device - power source can be disconnected, batteries run out: 1, 3, 1, 3, 3 - total 11
communications - lack of message or payload authentication enables false data to be sent on the network: 2, 3, 2, 3, 3 - total 13
communications - ICMP DoS ping attack from outside the IP network: 1, 3, 1, 3, 3 - total 11
application - unchanged default passwords enable making IoT devices into bots that work in DDoS attacks: 3, 3, 3, 3, 3 - total 15
application - weak or default passwords can enable unauthorized users to access a lost or stolen phone and control the system: 3, 3, 3, 3, 3 - total 15
physical device - data can be faked by bogus devices or injected by man-in-the-middle attacks: 1, 3, 1, 3, 3 - total 11

Once the scoring is defined, we put the risks in order from the highest to the lowest DREAD total and estimate the likelihood that each risk will occur. The likelihood is scored 1 for unlikely and 3 for very likely, as shown in Table 8.

Table 8: Threat likelihood scores (Total, Likelihood).
application - unchanged default passwords enable making IoT devices into bots that work in DDoS attacks: 15, 2
application - weak or default passwords can enable unauthorized users to access a lost or stolen phone and control the system: 15, 2
communications - lack of message or payload authentication enables false data to be sent on the network: 13, 1
physical device - firmware can be decompiled and the file system and files inspected for credentials or keys: 12, 1
physical device - power source can be disconnected, batteries run out: 11, 3
physical device - data can be faked by bogus devices or injected by man-in-the-middle attacks: 11, 1
communications - ICMP DoS ping attack from outside the IP network: 11, 1
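As an illustration of how the grading and ordering above can be computed, the following is a minimal sketch that sums the five DREAD factor scores per threat and sorts the threats by total. The dictionary reproduces a few rows of Table 7 and is not the authors' tooling.

```python
# Minimal sketch of the DREAD grading used above: five factors scored 1-3
# (Reproducibility and Discoverability fixed at 3 per the stated assumptions),
# summed per threat and sorted from highest to lowest total.
dread_scores = {  # (Damage, Reproducibility, Exploitability, Affected users, Discoverability)
    "Firmware decompiled, files inspected for credentials or keys": (2, 3, 1, 3, 3),
    "Power source disconnected, batteries run out":                 (1, 3, 1, 3, 3),
    "No message/payload authentication, false data on the network": (2, 3, 2, 3, 3),
    "ICMP DoS ping attack from outside the IP network":             (1, 3, 1, 3, 3),
    "Unchanged default passwords turn IoT devices into DDoS bots":  (3, 3, 3, 3, 3),
}

totals = {threat: sum(scores) for threat, scores in dread_scores.items()}
for threat, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{total:>2}  {threat}")
```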
5.4 Risk response for the rated risks

Once we have identified, categorized, and prioritized the threats to the smart home, we provide approaches that document how we want to respond to each threat. As a response to a security risk, we can tolerate the risk, transfer the risk to another party, treat the risk, or terminate the risk, as shown in Figure 3. The detection of threats has value only if responses are available, and plans for the responses to various attacks should be made in advance. Table 9 is the result of applying one of the responses to each of the identified threats.

Figure 3: Risk treatment.

Table 9: Risk response.
physical device - power source can be disconnected, batteries run out: Treat
application - weak or default passwords can enable unauthorized users to access a lost or stolen phone and control the system: Treat
communications - ICMP ping DoS attack from outside the IP network: Tolerate
physical device - firmware can be decompiled and the file system and files inspected for credentials or keys: Tolerate
physical device - data can be faked by bogus devices or injected by man-in-the-middle attacks: Tolerate
communications - lack of message or payload authentication enables false data to be sent on the network: Tolerate
application - unchanged default passwords enable making IoT devices into bots that work in DDoS attacks: Treat

5.5 Risk mitigation strategies

Finally, any risks that have been identified with a "treat" response need to be mitigated. Table 10 shows a sample mitigation strategy for the concerned threats.

Table 10: Mitigation strategy.
physical device - power source can be disconnected, batteries run out (Treat): because this is a home installation, everyone who lives in the home can be informed that the IoT devices should not be unplugged. For any devices that run on batteries, establish a regular day during the year to replace the batteries.
application - weak or default passwords can enable unauthorized users to access a lost or stolen phone and control the system (Treat): use strong passwords. Inform anyone who has the controller phone app to use strong passwords to protect access to the phone, to prevent someone from taking control of the actuators in the house or stealing other information if the phone is lost.
application - unchanged default passwords enable making IoT devices into bots that work in DDoS attacks (Treat): change any weak or default passwords. In the design and implementation of this system, the company should enforce a policy that these passwords are changed prior to deployment at the customer site.

6 Conclusion

Smart home devices are great; they give homeowners a sense of security. Yet they need constant enhancement of their security measures, as many types of security threats exist nowadays through many types of entry points. These threats can be addressed with a more standardized way of building these devices and by giving them well-designed software that was designed with security in mind. With the current devices on the market, we can see that smart home devices are the weakest link in the chain of devices, so more focus should be put into making them more secure.

Acknowledgment

The author would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.

References
[1] https://learn.microsoft.com/en-us/previous-versions/commerce-server/ee823878(v=cs.20)?redirectedfrom=MSDN
[2] https://www.statista.com/topics/2430/smart-homes/#dossierKeyfigures
[3] https://www.mobileworldlive.com/mwc16-articles/iot-experts-fret-over-fragmentation/
[4] https://www.zdnet.com/article/android-security-a-market-for-lemons-that-leaves-87-percent-insecure/
[5] Fatima, Saman, Aslam, Naila, Tariq, Iqra, and Ali, Nouman (2020). Home Security and Automation Based on Internet of Things: A Comprehensive Review. IOP Conference Series: Materials Science and Engineering, 899, 012011, https://doi.org/10.1088/1757-899X/899/1/012011.
[6] Mada Albany, Enas Alsahafi, Itidal Alruwili, Salim Elkhediri, A review: Secure Internet of thing System for Smart Houses, Procedia Computer Science, Volume 201, 2022, Pages 437-444, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2022.03.057.
[7] Karimi, Khaoula, and Salahddine Krit. "Smart Home-Smartphone Systems: Threats, Security Requirements and Open Research Challenges." 2019 International Conference of Computer Science and Renewable Energies (ICCSRE), July 2019. https://doi.org/10.1109/iccsre.2019.8807756.
[8] Arabo, Abdullahi, Ian Brown, and Fadi El-Moussa. "Privacy in the Age of Mobility and Smart Devices in Smart Homes." 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, September 2012. https://doi.org/10.1109/socialcom-passat.2012.108.
[9] Huraj, Ladislav, Marek Šimon, and Tibor Horák. "Resistance of IoT Sensors against DDoS Attack in Smart Home Environment." Sensors 20, no. 18 (September 16, 2020): 5298. https://doi.org/10.3390/s20185298.
[10] Sanchez, Veralia, Carlos Pfeiffer, and Nils-Olav Skeie. "A Review of Smart House Analysis Methods for Assisting Older People Living Alone." Journal of Sensor and Actuator Networks 6, no. 3 (July 21, 2017): 11. https://doi.org/10.3390/jsan6030011.
[11] Guhr, Nadine, Oliver Werth, Philip Peter Hermann Blacha, and Michael H. Breitner. "Privacy Concerns in the Smart Home Context." SN Applied Sciences 2, no. 2 (January 21, 2020). https://doi.org/10.1007/s42452-020-2025-8.
[12] Zheng, Serena, Noah Apthorpe, Marshini Chetty, and Nick Feamster. "User Perceptions of Smart Home IoT Privacy." Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (November 2018): 1-20. https://doi.org/10.1145/3274469.
[13] Klobas, Jane E., Tanya McGill, and Xuequn Wang. "How Perceived Security Risk Affects Intention to Use Smart Home Devices: A Reasoned Action Explanation." Computers & Security 87 (November 2019): 101571. https://doi.org/10.1016/j.cose.2019.101571.
[14] Haney, J., Acar, Y., Furman, S. "It's the Company, the Government, You and I": User Perceptions of Responsibility for Smart Home Privacy and Security. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Online, 11-13 August 2021.
[15] Nemec Zlatolas, Lili, Nataša Feher, and Marko Hölbl. "Security Perception of IoT Devices in Smart Homes." Journal of Cybersecurity and Privacy 2, no. 1 (February 14, 2022): 65-74. https://doi.org/10.3390/jcp2010005.
[16] https://www.cvedetails.com/cve/CVE-2018-9162/
[17] https://www.cvedetails.com/cve/CVE-2018-15123/
[18] https://www.cvedetails.com/cve/CVE-2018-20299/
[19] https://www.cvedetails.com/cve/CVE-2017-11634/
[20] https://www.microsoft.com/en-us/securityengineering/sdl/threatmodeling
[21] https://learn.microsoft.com/en-us/windows-hardware/drivers/driversecurity/threat-modeling-for-drivers

https://doi.org/10.31449/inf.v47i1.3629 Informatica 47 (2023) 65–72 65

Prediction of Heart Disease Using Modified Hybrid Classifier

Rishabh Pipalwa1, Abhijit Paul2*, Tamoghna Mukherjee3*
1 Department of Information Technology, Amity University, Kolkata, India
2 Department of Information Technology, Amity University, Kolkata, India
3 Department of CSE, Amity University, Kolkata, India
Emails: rishabhpipalwa@gmail.com, a_paul84@rediffmail.com, tamoghna.9@gmail.com
* Corresponding author

Keywords: heart disease, cardiovascular disease, clinical diagnostic system, modified hybrid classifier, health care system

Received: July 5, 2021

This paper proposes a machine learning (ML)-based strategy to accurately identify a possible heart disease patient. Unlike traditional diagnostic systems, which are time-consuming and involve human error in caring for and diagnosing patients, the proposed system identifies whether the patient will face this kind of disease in the near future or not. The system is developed based on machine learning techniques such as naive Bayes, the XGBoost gradient classifier, the support vector machine, and the decision tree. Some external factors that may lead to heart disease in the future were also considered. Furthermore, an integrated web application has been developed that alerts the user and gives a user-friendly interface for recognition and prediction. Thirteen diagnostic factors and five environmental factors are analyzed. The proposed diagnosis system attained good precision compared to previously recommended methods. In addition, the system can easily be deployed in the public domain to spread awareness regarding heart disease, and it also indicates the possibility of heart disease in the near future.

Povzetek: Predstavljeni sistem zazna morebitno srčno bolezen iz trinajstih diagnostičnih in petih okoljskih dejavnikov z uporabo algoritmov strojnega učenja.

1 Introduction

Cardiovascular diseases, popularly known as heart disease, can lead to a heart attack. In 2018 heart attacks killed nearly 17.9 million humans all over the world. Heart disease is found in 3 out of 5 patients in the critical care unit. The complexity of this disease lies in the fact that it can suddenly impair a person's functioning; a standard operating plan (SOP) is then required, and if it is not provided on time, the patient's life is in danger. A proper healthcare system takes time to detect the cause and effectively start the diagnosis, whereas our proposed system efficiently and accurately tells the client whether a patient has heart disease or not.

The heart has an essential and critical role in the body, as it controls the flow of blood to different parts of the body, which ensures that adequate oxygen and nutritious elements are supplied where required. Life depends entirely on a proper flow of blood, and in humans the heart is the pump of that blood. Any disturbance in the flow or the function of the heart may lead to death within seconds [1]. According to the World Health Organization, 17,000,000 people die from heart disease every year, among them 3,000,000 before the age of 60. In 2019, the percentage of sudden deaths from heart disease ranged from 4% in high-income countries to 42% in low-income countries [2]. When the heart receives limited blood for a longer period of time, the condition is called ischemic heart disease. Such conditions develop over a course of time and can be periodically monitored and cured with the help of expert supervision. There comes a time when ischemic heart patients have a heart attack, and after that the chance of survival is reduced, as the disease has been developing over a long period of time and the heart has become accustomed to limited blood flow. For such situations, early predictions or alertness help in the long run.
Diagnosis of heart disease is usually done by reading the patient's medical history and medical examination report and by evaluating symptoms together with a medical doctor. However, research has found this diagnostic method to be less accurate in identifying a heart disease patient. In addition, it is expensive, and it is computationally challenging to analyze [3]. We have proposed a machine-based diagnostic method. In this study, the machine learning prediction models include Naive Bayes, support vector machine (SVM), decision tree, and the XGBoost gradient classifier. The standard configuration of these models has been maintained for analysis purposes. The Statlog, Hungarian, Switzerland, Long Beach VA, and Cleveland datasets were used in combination in this article. We have designed a web-based application that exposes the model for general public use. This article addresses the problem of predicting the possibility of heart disease using machine learning (ML) techniques. With standard feature extraction and sound classifier algorithms, appropriate features were extracted and analyzed with guidance from medical experts, which produced good results in the analysis and accuracy of the proposed algorithm. The system then predicts the future possibility of heart disease by taking into account environmental factors and common habits which may lead to heart disease. Finally, all the modules are combined into a single Python-based framework known as Flask, which provides the model with a front end. This web-based application presents heart disease possibilities simply, so that any non-technical person or layman can easily detect heart disease. The organization of the following sections is explained below. Section 2 provides a literature review on relevant heart disease factors and their identification techniques. Section 3 introduces the proposed system model. Section 4 introduces the web-based design of the proposed model. Section 5 presents results and discussion, where performance is analyzed and training and testing results are shown.
2 Related work
The researchers in this case analyze several machine-learning-based diagnosis strategies for finding heart illness. The analysis covers a few machine-learning-based methods that make it easier to understand the suggested approach. The method of Detrano et al. [4] for classifying heart illness using machine learning approaches produced a precise end result with an accuracy of 77.00 percent. The dataset was utilized to extract features for the system's multi-layer architecture. Another group, Gudadhe et al. [5], developed a diagnosis method for heart disease labeling utilizing a multi-layer architecture and an SVM classifier and achieved a precision of 80.41 percent. A categorization algorithm for cardiac disease was created by Kahramanli et al. [6] using a neural network and fuzzy logic. The categorization algorithm achieved a precision of 87.40%. An ANN ensemble-based method of heart disease detection was developed by Li et al. [7]. In addition to using a numerical measurement method, it achieved 89.01 percent precision. A machine-learning-based approach for identifying heart disease was developed by McKinley et al. [8]. The ANN-DBP system, in conjunction with the FS algorithm, proved worthwhile. A professional health diagnosis approach for heart disease identification was proposed by Palaniappan et al. [9].
The prognostic ML models Decision Tree (DT), Naive Bayes (NB), and Neural Networks were utilized to improve the system. The Decision Tree algorithm acquired a precision of 80.40 percent, the ANN an accuracy of 88.12 percent, and Naive Bayes attained a precision of 86.12 percent. Olaniyi et al. [10] developed a 3-layer algorithm for heart disease prediction based on neural network technology and achieved 88.89 percent accuracy. A classification scheme for heart disease employing restraint and stringent set procedures was suggested by Liu et al. [11]. The approach has a 92 percent accuracy rate. For the purpose of identifying cardiac illness, Samuel et al. [12] developed an integrated medical aid system based on Fuzzy AHP and an artificial neural network. The precision achieved by the suggested approach is 91%. Cross-machine learning approaches were employed in one of the research publications by Singh et al. [13], designed as a heart disease forecasting tool. They also suggested a novel technique for comprehensive feature selection from the data for effective ML classifier training and testing. They reported 88.07 percent accuracy. A selection and classification algorithm, the Sequential Backward Selection technique for feature selection, has been proposed in [14]. The suggested strategy achieved excellent levels of accuracy. Geweid et al. analyzed sophisticated Support Vector Machine-based dichotomy optimization algorithms for heart disease identification [15]. Prior attempts to diagnose heart disease had certain limitations, and the results have been compiled to help readers better appreciate the significance of our suggested method. Among all the available techniques, many are utilized to spot coronary heart disease early on. Reduced accuracy and lengthy computation times in those earlier solutions are major problems, and it is possible that these are related to the use of datasets with the wrong features. The prediction must be enhanced for increased detection accuracy, and effective and accurate early detection needs to be developed for better treatment and healing. Ahammad et al. [16] proposed an approach for designing a healthcare social media platform for service provisioning and consumption, enabling patients to find an alternative source of healthcare advice and building a collaborative health community for all kinds of people. Gadiparthi et al. [17] proposed a model for predicting ill effects; it predicts the effects of human exposure to social networks in the near future. Milioris et al. [18] investigated and implemented a technique to assess health professionals' views on the adoption and value of health information systems and to assess their usage. Jasim et al. [19] implemented a CNN-based model for building a system to recognize diseases occurring in citrus. Considering the significant research gap and the difficulty of improving forecast accuracy, new methods are used in our paper to precisely locate coronary heart disease and address these problems.
3 Proposed system model
The proposed system model uses a hybrid classifier, meaning that the system is a composite mapping of four algorithms (Naive Bayes, decision tree, support vector machine, and the XGBoost gradient classifier). The mapping refers to the design of the system in an additive form, such that the accuracy of the system increases and the error rate is reduced because several systems are run against each input.
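The paper does not list its implementation. As a minimal, hedged sketch, assuming scikit-learn and the xgboost package are available, and using a hypothetical CSV file name and target column, the four base classifiers named above could be instantiated and evaluated side by side as follows.

```python
# Hypothetical sketch of the four base classifiers described above,
# assuming scikit-learn and the xgboost package are installed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

# "heart.csv" and the column name "target" are placeholders for the
# combined UCI heart-disease data described in Section 3.1.
data = pd.read_csv("heart.csv")
X, y = data.drop(columns=["target"]), data["target"]
# 60/40 train/test split, matching the ratio reported in the paper.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

base_models = {
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(probability=True, random_state=0),
    "xgboost": XGBClassifier(eval_metric="logloss", random_state=0),
}

for name, model in base_models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```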
3.1 Data set
Every dataset used can be found under the index of heart disease datasets in the UCI Machine Learning Repository at the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/. The Statlog, Hungarian, Switzerland, Long Beach VA, and Cleveland datasets were used in combination in this article, featuring the variables described below. The size of the dataset is 1023. For training purposes, 648 records are used and for testing purposes, 412 records have been used. The same training and testing ratio has been maintained for all the classifiers to get the optimal result. The dataset consists of 13 features plus one output label; the output label has 2 possibilities, one being the presence of heart disease and the other being the absence of heart disease. Table 1 gives the description of the 13 features of the dataset with the feature code. Table 2 shows various external factors, and their descriptions, which may result in future heart disease.

Table 1: Features of the data set with their description.
Sl. No. | Feature name | Feature code | Description
1 | Age | AGE | Age in years; avg = 54.38
2 | Sex | SEX | Male = 1, Female = 0; ratio = 70:30 (male to female)
3 | Chest pain | CPT | Atypical angina = 1, typical angina = 2, asymptomatic = 3, non-anginal pain = 4
4 | Resting blood pressure | RBP | In mm Hg
5 | Serum cholesterol | SCH | In mg/dl
6 | Fasting blood sugar > 120 mg/dl | FBS | True = 1, False = 0
7 | Resting electrocardiogram | RES | Normal = 0, ST-T = 1, hypertrophy = 2
8 | Maximum heart rate | MHR | Numeric
9 | Exercise induced angina | EIA | Yes = 1, No = 0
10 | Old peak = ST depression induced by exercise relative to rest | OPK | Numeric
11 | Slope of peak exercise ST segment | PES | Up sloping = 1, flat = 2, down sloping = 3
12 | No. of major vessels colored by fluoroscopy | VCA | 0-3
13 | Thallium scan | THA | Normal = 3, fixed defect = 6, reversible defect = 7
14 | Label | LB | Patient has heart disease = 1, healthy person = 0

Table 2: External factors with their description.
Factor | Feature code | Description
Body mass index | BMI | True = higher BMI, False = normal BMI
History of diseases | Phist | Yes = factor present, No = factor not present
Family diseases history | Fhist | Yes = factor present, No = factor not present
Alcohol | Alchol | Yes = factor present, No = factor not present

Four suitable and efficient classifier techniques are briefly described in Table 3. In Table 4, a comparison of these four models with the proposed model is also shown.

Table 3: Classifier algorithms with their description.
Classifier | Description
Naive Bayes | Used for classification problems. The training data set is used by the algorithm to compute the conditional probability of a vector for a given class. The conditional probability value is evaluated for each vector, and then the class of a new vector is determined from its conditional probability.
Support vector machine | A supervised learning model with associated learning algorithms that analyze data for classification and regression analysis. The SVM algorithm is mostly used for classification problems because of its excellent performance in various applications.
Decision tree | Shaped just like a tree consisting of leaf and internal nodes. A decision tree has internal and external nodes linked to each other. The decision-making part of the internal node takes the decision and informs the child node to visit next.
eXtreme Gradient Boosting | XGBClassifier provides a wrapper class for the gradient boosting algorithm that allows models to be treated like a classifier or regressor.

Table 4: Classifier algorithms with their limitations, advantages, and accuracy.
Classifiers | Limitation | Advantage | Accuracy (%)
Heart disease diagnosis using a single machine learning classifier | Accuracies are very low and computation errors can occur very easily | System is less complex | 70 to 80%
Decision tree + SVM | More computation time is required to generate the result | Accuracy is comparatively high | 82.01%
SVM + kNN + k-Means | Computationally complex and execution time is very high | Accuracy is high | 87.4%
Naive Bayes + ANN, Decision tree + ANN | Computationally complex and performance is low | Systems based on Naive Bayes and decision tree achieved high performance in terms of accuracy | 84.33%
Random forest + XGBoost + Decision tree | Random forest showed less accuracy in comparison to other classifiers | XGBoost showed high accuracy | 88.21%
Naive Bayes + Decision tree + Support vector machine + XGBoost (proposed) | More execution time is required to generate results | Performance is high and accurate; it maintains high performance in extreme situations | 98.73%

The heatmap shown in Figure 1 clearly reflects the variable weights, which helps in understanding the relevance of each variable.
Figure 1: Heatmap of dataset reflecting the weight of each variable.

3.2 The modified classifier used in the model
The modified hybrid classifier is a mechanism that uses multiple classifiers with different strategies to get the desired output with high accuracy.

3.2.1 Naive Bayes
This classifier uses standard arguments in a modified manner to get the desired outcomes. Here the probability P is calculated from the likelihood of heart disease and the class prior probability:

$$P(\text{heart disease}) = \frac{P(\text{likelihood of heart disease}) \times \text{class prior probability}}{\text{predictor prior probability}}$$

3.2.2 Decision tree
This classifier is an application for choosing among several different extracted features in a modified manner to get the desired outcomes. A function that calculates the degree of randomness, also called entropy, is defined as

$$f(s) = -P_{+}\log(P_{+}) - P_{-}\log(P_{-})$$

where $P_{+}$ is the probability of the patient having heart disease and $P_{-}$ is the probability of the patient not having heart disease.

3.2.3 Support vector machine
This classifier partitions the training set into two classes. It maximizes the distance between two parallel hyperplanes of the two classes and minimizes the sum of classification errors. Here $f$ is the optimal function that minimizes the total risk:

$$\min f = \frac{1}{2}\lVert w \rVert^{2} + c\sum_{i=1}^{m}\rho_{i}$$

where $\lVert w \rVert$ is the distance between the two hyperplanes and $\rho_{i}$ is the deviation of misclassified objects. The first term of the objective function is the structural risk and the second term is the empirical risk.

3.2.4 XGBoost
This classifier uses fast learning through parallel and distributed computing and offers efficient memory usage. It uses bagging and boosting. The boosting technique makes use of trees with fewer splits:

$$f_{m}(x) = f_{m-1}(x) + h_{m}(x)$$

where $m$ denotes the iteration, continued until the residuals have been minimized as much as possible, $f$ predicts the target ($f_{m}$ is the current model and $f_{m-1}$ is the previous model), and $h_{m}$ denotes the learner fit to the residuals from the previous step.
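The boosting update above can be made concrete with a small, generic illustration. This is not the XGBoost implementation itself, only a toy residual-fitting loop on synthetic regression data, written to show how each new learner $h_m$ is fit to what $f_{m-1}$ still gets wrong.

```python
# Toy illustration of the boosting update f_m(x) = f_{m-1}(x) + h_m(x)
# from Section 3.2.4, fitting each new learner h_m to the residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

prediction = np.zeros_like(y)            # f_0(x) = 0
learning_rate = 0.1
for m in range(50):                      # m indexes the boosting iterations
    residuals = y - prediction           # what f_{m-1} still gets wrong
    h_m = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * h_m.predict(X)   # f_m = f_{m-1} + h_m

print("final training MSE:", np.mean((y - prediction) ** 2))
```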
3.2.5 Modified hybrid classifier
Here we have taken four independent classifiers based on their accuracy on our dataset. We have experimented and found high accuracy using Naive Bayes, decision tree, SVM, and XGBoost. We propose that each new data point go through all the classifiers, producing individual results. These results are then cross-validated against each other to check for any ambiguity. In case of ambiguity, we go with XGBoost, as it resulted in the highest accuracy on the dataset used. If no ambiguity arises in the results, we proceed to result analysis. If the result is positive, meaning the patient has heart disease, we display the message accordingly. If the patient does not have heart disease, the future possibilities are evaluated and a message is displayed according to the external factors. The strategies used in the modified hybrid classifier are shown below.
Step-1. We first take the 13 features from the user interface and run them through all 4 classifiers.
Step-2. If the same value is found (all of them predicting the same disease analysis), we display the message accordingly.
Step-3. In case of ambiguity (different values from the classifiers), we take the value estimated by XGBoost, as it resulted in the highest accuracy on the dataset used, and display the message of XGBoost on the screen.
Step-4. If we see that the person does not have heart disease, we check the 5 future-possibility variables and display a warning message accordingly.
3.3 Algorithm for the proposed model
Step-1: The process starts with the training of the dataset, in which 624 records are used to train each algorithm classifier.
a. There are 4 algorithm classifiers in the model, namely decision tree, Naive Bayes, support vector machine, and XGBoost.
b. A decision tree is a graphical representation for obtaining all possible solutions to a decision-making situation under a given condition. It follows the supervised learning technique, where internal nodes represent the features of a data set, branches represent the decision rules, and each leaf node represents a possible outcome.
c. Naive Bayes is a probabilistic classifier that predicts the possibilities given by the probability of an object. It applies Bayes' law, which is based on the probability of a hypothesis given prior knowledge.
d. In the support vector machine, we plot each data item as a point in an n-dimensional space, where n is the number of features available in the data set. Classification is then performed with the hyperplane that best separates the two classes.
e. XGBoost, or extreme gradient boosting, is an advanced version of gradient boosting classifiers. The major difference lies in the fact that XGBoost is a regularized model, formalized to control overfitting, which gives better performance.
Step-2: The extracted features are computed after training the data set for every algorithm classifier, upon which each variable can be used in the model.
Step-3: All the extracted features are sent to the 4 different ML algorithms and a resultant output is obtained without any ambiguity.
Step-4: If there is any ambiguity between the four different algorithms, the system falls back on its reserved feature of taking the most accurate method among all.
Step-5: The resulting output is checked against 420 records kept for testing purposes and for verifying the reliability of the proposed model.
Step-6: The resulting output is converted into a model which separates the prediction of heart disease and the future possibilities of heart disease.
Step-7: When a user enters new data, it follows a certain pattern to label it into categories.
Step-8: Features are extracted from the new data. Then it is passed to the proposed model.
Step-9: A prediction is then made about the possibility of having heart disease or not. If a person does not have heart disease at present, then future possibilities are also examined.
Step-10: The user gets a message about the present condition and consultations for the future.
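The decision logic of the steps above is not published as code. A minimal, hedged sketch, assuming the four fitted models from the earlier example and hypothetical external-factor flags, could look as follows; names such as `predict_with_hybrid` are illustrative only.

```python
# Hypothetical sketch of the prediction flow described above: run the four
# trained classifiers on one new record, fall back to XGBoost on
# disagreement, and check external factors only for negative results.
def predict_with_hybrid(models, x_clinical, external_factors):
    """models: dict of fitted classifiers; x_clinical: 2D array of shape (1, 13);
    external_factors: dict such as {"BMI": True, "Phist": False, ...}."""
    votes = {name: int(m.predict(x_clinical)[0]) for name, m in models.items()}

    if len(set(votes.values())) == 1:      # Step-2: all classifiers agree
        label = next(iter(votes.values()))
    else:                                  # Step-3: ambiguity -> trust XGBoost
        label = votes["xgboost"]

    if label == 1:
        return "Heart disease predicted; consult a specialist."
    # Step-4: no disease now, so warn about future risk from external factors
    risky = [k for k, v in external_factors.items() if v]
    if risky:
        return f"No heart disease predicted, but risk factors present: {risky}"
    return "No heart disease predicted."

# Example call (assumes `base_models` were fitted as in the earlier sketch):
# print(predict_with_hybrid(base_models, X_test.iloc[[0]].to_numpy(),
#                           {"BMI": True, "Phist": False, "Fhist": True, "Alchol": False}))
```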
Figure 2(a): Flow diagram of the proposed algorithm.
Figure 2(b): Flow diagram of the proposed algorithm with new input arrival.
Figure 2(a) shows the flow diagram and Figure 2(b) shows how prediction is done on each new input arrival. Each new input is passed to the individual classifiers, namely Naive Bayes, decision tree, SVM, and XGBoost. Each classifier produces an individual output, which is verified for any ambiguous data or system error and then combined into the model. In model verification and validation, the results are calculated and then categorized into positive and negative. If a patient has a negative result, the system redirects to take the external factors and then makes a prediction according to them, whereas for positive results a warning message is shown and consultation with a specialist is advised.
4 System U/I design
We have designed a webpage using Flask for the implementation of the proposed model, shown in Figure 3 (a-d) and its subparts. The four figures depict the input and result pattern for a heart disease patient.
Figure 3(a): Input of heart disease symptoms.
Figure 3(b): Positive result of heart disease symptoms.
Figure 3(c): Input of external factors for non-heart-disease symptoms.
Figure 3(d): Result of negative heart disease but with external factors positive.
5 Result and discussion
Table 5 clearly shows that all 4 methods used have produced very accurate results in the training and testing stages. The training and testing are done at a ratio of 60 to 40 on the same dataset. Some data are kept reserved for model evaluation and application testing at a later stage to get a proper idea of the system errors. While testing at this later stage, no error was found either at the system end or at the web end. The resultant accuracy, calculated using Table 6, of the proposed algorithm (modified hybrid classifier) is 98.73%, which is comparatively far better in the context of previous research.

Table 5: Training and testing results.
Classifier | Training accuracy (%) | Testing accuracy (%)
Naive Bayes | 85 | 78
Decision tree | 100 | 96
Support vector machine | 100 | 84
eXtreme Gradient Boosting | 100 | 97
Modified hybrid classifier (proposed) | 100 | 98.73

Table 6: Confusion matrix of the modified hybrid classifier.
  | Positive | Negative
Positive | 760 | 8
Negative | 7 | 300

Figure 4: Representation of performance parameters.
Figure 5: Output of tested results for various parameters.
The graph shown in Figure 5 presents the various features entered in line-graph form, and the blue dots represent the patients' results: a blue dot at 0 means heart disease was not found and a blue dot at 1 means heart disease was detected. (Note that this is sample testing done on 216 records, intended to give a better understanding of the system results and an overall view.) The graph in Figure 6 shows various algorithms present in the industry and their accuracy against the proposed method. The blue dotted line indicates the industry-standard level of accuracy.
Figure 6: Comparative analysis of algorithms.
The purpose of this experiment was to analyze and predict the possibility of heart disease with high precision, which benefits human society directly. The results of the prediction process show high accuracy and few system failures. The on-ground implementation of the project has been successfully deployed with accurate precision. At no point in time has a conclusive system error occurred, either at the system end or at the web application end.
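For readers who want to relate Table 6 to the reported accuracy, the usual metrics can be derived from a 2x2 confusion matrix as sketched below. The assumption that rows are predictions and columns are actual labels is ours, since the paper does not state the orientation.

```python
# Hedged illustration: deriving accuracy, precision, and recall from a
# 2x2 confusion matrix laid out as in Table 6 (rows assumed to be predicted,
# columns assumed to be actual labels).
def confusion_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Counts taken from Table 6 under the orientation assumed above.
acc, prec, rec = confusion_metrics(tp=760, fp=8, fn=7, tn=300)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```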
6 Conclusion and future scope
Our proposed system achieved an accuracy of 98.73%. The model accepts 13 clinical inputs and 5 environmental inputs, and it is trained using backpropagation algorithms to read them and analyze the presence or absence of heart disease in a patient. We also presented a user-friendly web application which helps a patient easily access his/her present condition and act accordingly. Integrated multiple-disease prediction models could be designed so that a user can analyze any condition according to their choice. A market review could also be done in order to launch the prototype for medical and general public use. All of this may help society come closer in the fight against modern-day diseases and their detection.
Acknowledgment
We would like to thank Dr. Mohit Chowdhary (MBBS from Kolkata Medical College, Junior Resident, Department of Medicine, All India Institute of Medical Sciences, New Delhi) for the technical analysis of the diseases and for giving the medical point of view on the paper.
Author's Contributions
All the authors have contributed equally to this paper.
References
[1] Medline Plus: heart diseases, 2021. http://www.nlm.nih.gov/medlineplus/heartdiseases.html (Accessed on April 22, 2021)
[2] Mohamed, Mohamed M. G., Mohammed Osman, Babikir Kheiri, Maryam Saleem, Alexandre Lacasse, and Mohamad Alkhouli. "Polypill for cardiovascular disease prevention: systematic review and meta-analysis of randomized controlled trials." International Journal of Cardiology (2022). https://doi.org/10.1016/j.ijcard.2022.04.085
[3] Tsanas, Athanasios, Max A. Little, Patrick E. McSharry, and Lorraine O. Ramig. "Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson's disease symptom severity." Journal of the Royal Society Interface 8, no. 59 (2011): 842-855. https://doi.org/10.1098/rsif.2010.0456
[4] Detrano, Robert, Andras Janosi, Walter Steinbrunn, Matthias Pfisterer, Johann-Jakob Schmid, Sarbjit Sandhu, Kern H. Guppy, Stella Lee, and Victor Froelicher. "International application of a new probability algorithm for the diagnosis of coronary artery disease." The American Journal of Cardiology 64, no. 5 (1989): 304-310. https://doi.org/10.1016/0002-9149(89)90524-9
[5] Gudadhe, Mrudula, Kapil Wankhade, and Snehlata Dongre. "Decision support system for heart disease based on support vector machine and artificial neural network." In 2010 International Conference on Computer and Communication Technology (ICCCT), pp. 741-745. IEEE, 2010. https://doi.org/10.1109/ICCCT.2010.5640377
[6] Kahramanli, Humar, and Novruz Allahverdi. "Design of a hybrid system for the diabetes and heart diseases." Expert Systems with Applications 35, no. 1-2 (2008): 82-89.
https://doi.org/10.1016/j.eswa.2007.06.004
[7] Li, Yanping, Tianyi Huang, Yan Zheng, Tauland Muka, Jenna Troup, and Frank B. Hu. "Folic acid supplementation and the risk of cardiovascular diseases: a meta-analysis of randomized controlled trials." Journal of the American Heart Association 5, no. 8 (2016): e003768. https://doi.org/10.1161/JAHA.116.003768
[8] McKinley, DeAngelo, Pamela Moye-Dickerson, Shondria Davis, and Ayman Akil. "Impact of a pharmacist-led intervention on 30-day readmission and assessment of factors predictive of readmission in African American men with heart failure." American Journal of Men's Health 13, no. 1 (2019): 1557988318814295. https://doi.org/10.1177/1557988318814295
[9] Palaniappan, Sellappan, and Rafiah Awang. "Intelligent heart disease prediction system using data mining techniques." In 2008 IEEE/ACS International Conference on Computer Systems and Applications, pp. 108-115. IEEE, 2008. https://doi.org/10.1109/AICCSA.2008.4493524
[10] Olaniyi, Ebenezer Obaloluwa, Oyebade Kayode Oyedotun, and Khashman Adnan. "Heart diseases diagnosis using neural networks arbitration." International Journal of Intelligent Systems and Applications 7, no. 12 (2015): 72. DOI: 10.5815/ijisa.2015.12.08
[11] Liu, Peter Y., Alison K. Death, and David J. Handelsman. "Androgens and cardiovascular disease." Endocrine Reviews 24, no. 3 (2003): 313-340. https://doi.org/10.1210/er.2003-0005
[12] Samuel, Oluwarotimi Williams, Grace Mojisola Asogbon, Arun Kumar Sangaiah, Peng Fang, and Guanglin Li. "An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction." Expert Systems with Applications 68 (2017): 163-172. https://doi.org/10.1016/j.eswa.2016.10.020
[13] Singh, Archana, and Rakesh Kumar. "Heart disease prediction using machine learning algorithms." In 2020 International Conference on Electrical and Electronics Engineering (ICE3), pp. 452-457. IEEE, 2020. https://doi.org/10.1109/ICE348803.2020.9122958
[14] Zhou, Joey Tianyi, Hao Zhang, Di Jin, Xi Peng, Yang Xiao, and Zhiguo Cao. "Roseq: Robust sequence labeling." IEEE Transactions on Neural Networks and Learning Systems 31, no. 7 (2019): 2304-2314. https://doi.org/10.1109/TNNLS.2019.2911236
[15] Geweid, Gamal GN, and Mahmoud A. Abdallah. "A new automatic identification method of heart failure using improved support vector machine based on duality optimization technique." IEEE Access 7 (2019): 149595-149611. https://doi.org/10.1109/ACCESS.2019.2945527
[16] Ahammad, Tanvir, Tamanna Yesmin, Md Mahmudul Hasan, Sudipta Kumar Mondal, and Selina Sharmin. "An approach for collaboration between different stakeholders to strengthen the public health system." Informatica 46, no. 7 (2022). https://doi.org/10.31449/inf.v46i7.3986
[17] Gadiparthi, Manjunath, and E. Srinivasa Reddy. "Optimizing the Quality of Predicting the ill effects of Intensive Human Exposure to Social Networks using Ensemble Method." Informatica 46, no. 7 (2022). https://doi.org/10.31449/inf.v46i7.4212
[18] Milioris, Konstantinos, Charalampos Konstantopoulos, and Konstantinos Papageorgiou. "Perceptions and needs of health professionals concerning health information systems." Informatica 46, no. 7 (2022). https://doi.org/10.31449/inf.v46i7.3974
[19] Almola, Sahera Abued Sead, Mohammed H. Haloob Alabiech, and Esraa Jasem Harfash. "Citrus Diseases Recognition by Using CNN." Informatica 46, no. 7 (2022).
https://doi.org/10.31449/inf.v46i7.4284
https://doi.org/10.31449/inf.v47i1.4055 Informatica 47 (2023) 73-82 73
Sentiment Analysis and Machine Learning Classification of COVID-19 Vaccine Tweets: Vaccination in the Shadow of the Fear-Trust Dilemma
Samet Tüzemen1, Özge Barış-Tüzemen2 and Ali Kemal Çelik*1
1 Department of Business Administration, Ardahan University, Çamlıçatak, Ardahan, Türkiye
2 Department of Econometrics, Karadeniz Technical University, Kalkınma, Trabzon, Türkiye
E-mail: samettuzemen@ardahan.edu.tr, ozgebariss@gmail.com, alikemalcelik@ardahan.edu.tr
* Corresponding author
Keywords: COVID-19 vaccine, sentiment analysis, machine learning, text mining, Twitter
Received: March 4, 2022
In addition to infecting millions of people and causing hundreds of thousands of deaths, COVID-19 has also caused psychological and economic devastation. Studies on the vaccine, which is considered to be the only way to eliminate this pandemic, were rapidly completed, and more than 10 vaccines had begun to be administered worldwide by 2021. One of the biggest obstacles in the fight against COVID-19 is hesitation about the vaccine. The fear factor, fed by incomplete and false information spreading rapidly through social media applications such as Twitter, is thought to be the main reason for this hesitation. In this study, the general sentiment towards the COVID-19 vaccine is analyzed. For this, in the first week of January 2021, more than 8000 tweets were extracted with the R statistical software and the Twitter API, and appropriate sentiment analysis methods were applied. In addition, accuracy values are obtained by applying Logistic Regression and Naïve Bayes, which are effective and widely used supervised machine learning methods, for sentiment classification. Although the results indicate that there is a positive attitude about the vaccine, it is remarkable that the rate of negative sentiment is relatively high (30%). Trust is the dominant sentiment on the positive side, while fear is the dominant sentiment on the negative side. According to the results of the classification methods, accuracy values are close to 90%.
Povzetek: Študija obravnava splošno razpoloženje glede cepiva za COVID-19 na Twitterju.
1 Introduction
COVID-19 is a disease caused by the virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is transmitted from person to person and affects the respiratory tract. This disease, which emerged with the detection of the first case in Wuhan province of China in December 2019 and spread all over the world within a few months, was declared a pandemic by the World Health Organization (WHO) on March 11th, 2020, and has been the most important item on the world's agenda ever since. COVID-19 is transmitted by inhaling droplets emitted by sick individuals during speech or sneezing/coughing. For this reason, it is recommended to pay attention to social distance, use of masks, and cleanliness as methods of protection from the disease. Since these recommendations were found to be insufficient to contain the spread of the disease, governments have implemented various advanced measures such as restrictions and closures. The measures taken by many countries around the world, in the form of travel and gathering bans and lockdowns, have contributed to overcoming the periods in which the spread of the epidemic accelerated, called waves, and ensured the control of the epidemic to a certain extent, as seen in Figure 1.
However, the restriction of economic activity and social life on this scale has created great pressure on individuals and economies. Individuals who were already trying to cope with the shock effect of a worldwide epidemic have also faced the loss of social interaction. As a result, individuals have started to experience disorders such as stress, anxiety, and depression [1].
Figure 1: Daily new confirmed COVID-19 cases (world).
On the other hand, these measures had a great impact on macroeconomic indicators such as unemployment and the budget balance. This situation caused a sudden decline in the Gross Domestic Product of the countries, with almost all of them experiencing a large decrease. As seen in Figure 2, the contraction in GDP averaged 4.4%. In addition, there have also been dramatic increases in unemployment rates. All these economic difficulties ultimately caused the psychological condition of individuals to worsen even further [3].
Figure 2: Real GDP growth (annual %).
Simultaneously with the emergence of the disease, scientists in many parts of the world began working to develop a vaccine that would be effective against the virus. It is known that years, not months, are normally needed for an effective and safe vaccine to emerge after all procedures are completed. Despite this, a great effort was made to produce an effective vaccine that would end the COVID-19 pandemic, and the development of at least 3 vaccines was completed before the end of 2020. In some countries, such as China, the United Kingdom, and Russia, permission was even granted to use these vaccines in emergencies. As of January 2021, 10 vaccines have been used by various countries and over 70 million people have been vaccinated. Figure 3 shows the course of vaccination in the 10 countries where the most doses have been administered.
Figure 3: Cumulative COVID-19 vaccine doses administered (highest 10 countries).
Considering the psychological and economic destruction of this pandemic, it would be expected that the beginning of the vaccination process, which is likely to end the epidemic, would have a positive effect on people. This virus has infected 100 million people worldwide and caused the death of 2 million people as of January 2021. Despite this, the positive attitude towards the vaccine is not as high as expected. Although it varies across countries, a remarkable rate of skepticism against the vaccine is observed in society [5]. The most important factor triggering this attitude, which is an important obstacle to an effective fight against the pandemic, is the rapid spread of misleading information based on conspiracy theories. Social media has become the primary communication tool that enables information to spread rapidly around the world. However, the accuracy of this information cannot always be guaranteed, and this causes information corruption. This situation makes it difficult to manage the perception of society in such an important period. As a result, even the vaccine, which is the world's only hope to end this global crisis, has faced a significant negative response. This study aims to reveal the public sentiment towards the COVID-19 vaccine as of the first week of 2021 by examining the posts (tweets) from Twitter, which is an important social media tool with a large user base, while vaccination activities are ongoing.
For this purpose, using the R statistical software, Twitter posts are compiled, and sentiment analysis is performed on the data cleaned with appropriate methods. Then, the efficiency of the established models is examined by applying the Logistic Regression and Naïve Bayes classification methods, which are among the most frequently used machine learning methods.
2 Background and literature review
Studies on the COVID-19 disease increased dramatically after it spread around the world and was declared a global pandemic by the WHO. These studies examine not only the medical effects of COVID-19 on sick people, but also the psychological and behavioral effects on the whole of society, and even the socio-economic effects on countries. Considering the scope and impact of the disease, evaluating these studies independently of each other would prevent one from understanding the real dimension of each effect. For this reason, some of the wide-ranging research is referred to here and its contribution to this study is examined.
As in every large socio-economic incident, the primary impact of the COVID-19 outbreak has been on the psychology of individuals. In particular, the increasing number of cases and deaths and the restrictions imposed by governments started to create increasing pressure on individuals in society. Determinants such as education, age, gender, and social status are thought to contribute significantly to the extent of this pressure. Accordingly, certain groups are experiencing unemployment and cost-of-living pressure, while others must cope with concerns such as education and socialization. Psychological problems such as stress, anxiety disorder, and depression accompany this pressure. With the rapid spread of the disease itself, the spread of true and false information about it on social media has deepened individual traumas. Although efforts are made to alleviate social trauma through various methods such as free online group therapies, some researchers argue that the effects of this trauma will extend into the post-pandemic period and only then will its profound effects be understood [1, 6, 7, 8, 9, 10, 11].
Today, people prefer to share their feelings on social media. For this reason, social media applications such as Twitter have become a very large and important data source for measuring the feelings of individuals and societies in the face of certain events. In this process, many studies have been conducted using tweets to investigate how people feel about the COVID-19 outbreak. In these studies, which are called sentiment analysis, researchers have applied various machine learning classification methods such as Logistic Regression, Naïve Bayes, Support Vector Machine, and Recurrent Neural Network, which are widely used today. The results vary according to the demographic structure and the measures taken by governments. However, it is seen that, given the high polarity present in relatively homogeneous groups, the reaction of individuals to an event is generally directly related to their characteristics [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22].
The impact of COVID-19 on the world economy has been devastating. In addition to the restriction of international transportation and trade, as a result of the lockdown implementations and the dramatic decline in commercial activity, economies around the world rapidly entered a severe recession.
As a natural consequence of this, unemployment rose to record levels and financial difficulties increased significantly. Although developed and rich countries have taken some measures to mitigate the impact of the pandemic on the general economy and on individuals, the crisis has deepened in countries that were already experiencing difficulties. Another driver of the psychological disorders that accompanied the COVID-19 pandemic is the aforementioned unemployment. Most research shows that unemployment is an important trigger of stress and depression, and it also emphasizes that this situation causes an increase in suicide cases. Researchers have also concluded that groups considered minorities are more vulnerable when the factors of race, gender, and age are considered alongside the factors above [23, 3, 24, 25, 26, 27, 28, 29].
Experts point out the need for herd immunity to end this pandemic and thereby stop the material and moral losses. Accordingly, in order for the epidemic to slow down and disappear, at least 60% of the population must be immunized [30]. This can happen in two ways. First, this proportion of people could get sick. Second, people could be vaccinated at this rate. Some countries tried the first method at the early stage of the pandemic. However, with the realization that the cost of this is too high to bear, the second method has become the only hope for the whole world. The shock wave experienced at the beginning of the pandemic largely provided support for vaccination studies. With the availability of at least a few different vaccines in the last months of 2020, conspiracy theories spread rapidly on social media. These conspiracy theories have created an environment of distrust around the vaccine. In this context, hesitation about the COVID-19 vaccine is being studied extensively. [31] argue that the current hesitation against vaccination will not disappear in a short time, even with a devastating pandemic such as COVID-19, and that it should be tackled at the local level. [32] found that 71.5% of the participants were willing to get the COVID-19 vaccine in their June 2020 survey of 13,426 people from 19 countries. [33] measured vaccine literacy and attitudes towards a possible COVID-19 vaccine in their survey study for Italy. The results of this study, also carried out in June 2020, show that there is an 80-90% positive attitude towards the vaccine. Examining the extent of hesitation against vaccines in China in May 2020, [34] found that 95% of the participants trust a vaccine developed in the country and 83% want to get the COVID-19 vaccine when it is ready. [35] conducted a similar study for the USA in May 2020, and as a result a 69% positive response was obtained for the COVID-19 vaccine. On the other hand, [36] revealed in the survey they conducted for the UK in September 2020 that 54% of the participants had a positive approach to the COVID-19 vaccine. [37], who examined the rate of refusal of the COVID-19 vaccine in 5 consecutive survey studies in France between May and October 2020, found that this rate gradually increased. According to the findings of [38], who conducted similar research for Italy, the situation was in line with the results of the previous study.
Researchers have stated that there was a decrease in the intention to be vaccinated between two stages of the epidemic in Italy and that the proportion of people who intend to be vaccinated is not enough to end the epidemic (pp. 786-787). Many similar studies have been conducted for countries such as Finland, Israel, Pakistan, and Indonesia. According to the findings and the joint opinion of the researchers, the skeptical attitude towards the COVID-19 vaccine is alarming and urgent action should be taken against it [39, 40, 41, 5, 42, 43]. On the other hand, in a survey study conducted among nursing students, who are the healthcare professionals of the future, it was revealed that 63% of the participating students intend to get the COVID-19 vaccine [44]. This rate clearly shows that even young people with health education have a skeptical attitude towards vaccination. When the studies are evaluated as a whole, it is seen that the extent of the hesitation against the vaccine is at a worrying level. Almost all researchers agree that a proactive method should be followed in solving this problem.
3 Data set and methodology
In this study, public sentiment about the COVID-19 vaccine is examined. The data set used for this purpose was extracted from Twitter with the keyword "coronavirus vaccine" on January 9th, 2021, using the R statistical software and the "rtweet" library. More than 8000 tweets were converted to lowercase letters, freed from repetitions, punctuation, numbers, stop words, URLs, and non-ASCII words, and finally lemmatized in order to make them ready for analysis. The remaining 7935 tweets were then converted into a term-document matrix.
To analyze the general attitude towards COVID-19 vaccines, the sentiment analysis method, which is a frequently used and effective method in big data analytics, is applied. Sentiment analysis is defined as the classification of the main idea in a text with the applications of natural language processing and text analytics. Sentiment analysis aims to understand the attitude of the author by detecting the emotional polarization in a text and classifying it as positive, negative, or neutral ([45] pp. 53-54). For this, a dictionary-based emotion score is determined for each word in the text using flexible and open-source programming languages such as R and Python and their related packages. This score, determined at the level of words, is then aggregated for the whole text. As a result, the text is classified as positive, negative, or neutral [45]. Despite the important advantage of being less complex, the score calculation method is not efficient enough in some cases: a positive sentence containing negatively scored words will be evaluated as negative by this method. On the other hand, the machine learning approach, which automates the process to a greater degree, is widely used in sentiment classification. In particular, a sentiment classification model is created by training the available data with supervised machine learning methods, and the obtained accuracy values are compared. This comparison is used to determine whether the model has been set up correctly, or whether there is an overfitting or underfitting issue. Logistic Regression and Naïve Bayes, which are among these supervised machine learning methods, are used in this study.
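The study performs the cleaning and scoring steps in R with the rtweet package and a sentiment dictionary that is not listed in the text. Purely as an illustrative parallel, the same idea can be sketched in Python, with toy tweets and a toy lexicon standing in for the real data and dictionary.

```python
# Parallel Python sketch of the cleaning and dictionary-scoring steps the
# authors describe in R; the tweets and the tiny lexicon are placeholders.
import re

tweets = [
    "The vaccine is SAFE and gives hope! https://example.org",
    "I fear this vaccine, it could kill 1000s...",
]

def clean(text):
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)      # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation, numbers, non-ASCII
    return text.split()

# Toy sentiment lexicon standing in for the dictionary used in the study.
lexicon = {"safe": 1, "hope": 1, "trust": 1, "fear": -1, "kill": -1}

for t in tweets:
    score = sum(lexicon.get(w, 0) for w in clean(t))
    polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{polarity:8s} score={score:+d}  {t}")
```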
Logistic Regression is a common type of generalized linear model and models the probability of some event occurring as a linear function of a set of predictor values. In other words, the Logistic Regression method tries to estimate the probability of the dependent variable having a certain value instead of estimating the value of the dependent variable itself. For example, instead of guessing whether a soccer team will win the round it plays, it tries to predict the probability of the team passing the round. The actual state of the dependent variable is determined by looking at the estimated probability: if the predicted probability is greater than 0.50, the estimate is closer to YES (i.e., passing the round); otherwise, failure to pass the round is more probable. Logistic Regression is used only when the dependent variable is a categorical binary (0 or 1, YES or NO, etc.). In this case, the two possibilities are calculated as $P(y_j = 0) = 1 - p_j$ and $P(y_j = 1) = p_j$ from the available data, and the linear logistic (logit) model is established as follows ([46] pp. 157-158):

$$\log\left(\frac{p_j}{1 - p_j}\right) = \alpha + \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + \cdots + \beta_n X_{nj}$$

Naïve Bayes is a simple but effective machine learning classification method that uses the Bayes rule based on the assumption of conditional independence of variables. Bayes' theorem is a method of calculating the probability of event A occurring given event B. It is basically formulated as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where $P(A)$ and $P(B)$ are the probabilities of occurrence of events A and B, respectively, $P(B \mid A)$ is the conditional probability of event B given event A, and $P(A \mid B)$ is the conditional probability of event A given event B. Based on this, the Naïve Bayes classification equation is simply written as:

$$P(y \mid x_1, \ldots, x_j) = \frac{P(x_1, \ldots, x_j \mid y)\,P(y)}{P(x_1, \ldots, x_j)}$$

where $P(y \mid x_1, \ldots, x_j)$ is the posterior conditional probability of the class $y$ given the observed values $x_n$, and $P(x_1, \ldots, x_j \mid y)$ is the conditional probability of the observed values given the class. Finally, $P(y)$ is the prior probability of the class and $P(x_1, \ldots, x_j)$ is called the marginal probability ([47] pp. 279-280).
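As a hedged, parallel illustration of this supervised step (the authors work in R), the following Python/scikit-learn sketch labels a few placeholder texts, builds a term-document matrix, performs an 80%/20% split, and compares Logistic Regression with Naïve Bayes in the spirit of the comparison reported in the findings. The toy texts and labels stand in for the 7935 preprocessed tweets.

```python
# Placeholder comparison of Logistic Regression and Naive Bayes on a
# bag-of-words matrix, mirroring the 80/20 split described in the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

texts = ["vaccine gives hope", "trust the scientists", "i fear the vaccine",
         "this could kill people", "safe and approved", "stop this now"]
labels = [1, 1, 0, 0, 1, 0]   # 1 = positive sentiment score, 0 = negative

X = CountVectorizer().fit_transform(texts)       # term-document matrix
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("naive bayes", MultinomialNB())]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, "accuracy:", accuracy_score(y_te, pred))
    print(confusion_matrix(y_te, pred))
```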
4 Findings
In this part of the study, the findings obtained from the sentiment analysis of the COVID-19 vaccine tweets are presented. The frequency distribution of words appearing 200 or more times in the extracted and cleaned tweets is presented in Figure 4.
Figure 4: Word frequency of COVID-19 vaccine tweets.
As seen in Figure 4, four words, namely free, give, scientist, and health, are used more than 300 times. Another important result seen from the figure is that there are words with positive meanings, such as good, safe, and approve, among the words used more than 200 times, and an absence of words with negative meanings. All the words in the tweets used in the study are also presented in Figure 5 in the form of a word cloud.
Figure 5: Word cloud of COVID-19 vaccine tweets.
Although with lower frequency, there are also negative words such as stop, kill, opposition, and death in the tweets, as seen in Figure 5. When evaluated as a whole, the distribution of sentiments in the tweets of this study is shown in Figure 6.
Figure 6: Sentiment frequency distribution of COVID-19 vaccine tweets.
As seen in Figure 6, the strongest sentiment towards the COVID-19 vaccine as of the first week of January 2021 is positive. Positive sentiments are about twice as frequent as negative sentiments. With a simple approach, it can be said that approximately 66% of the tweets about the COVID-19 vaccine are positive and approximately 33% are negative. These results coincide with the findings of [35], [36], and [44] examined in the literature section of the study. Although these rates give the impression at first glance that there is a positive approach to the vaccine, the positive rate is critically close to the 60% immunity rate, also known as herd immunity, required to end the pandemic. Therefore, it is not wrong to say that the rate of negative sentiment towards the vaccine is high. On the other hand, it is seen that feelings of trust and hope are dominant on the positive side, and fear is dominant on the negative side.
In order to classify the tweets with supervised machine learning methods, the tweets with a positive emotion score are labeled 1 and the ones with negative scores are labeled 0. With this labeling, the Logistic Regression model is established and applied to the training and test data sets, split 80%-20%. The obtained confusion matrices and accuracy values are presented in Table 1.

Table 1: Confusion matrix and accuracy values for Logistic Regression.
Train data (accuracy: 0.8937)
Prediction \ Actual | Negative | Positive
Negative | 509 | 161
Positive | 388 | 4109
Test data (accuracy: 0.9039)
Prediction \ Actual | Negative | Positive
Negative | 129 | 38
Positive | 83 | 1009

As seen in Table 1, the Logistic Regression model has classified the sentiment of the tweets as negative or positive with very high accuracy. In addition, the sensitivity value giving the true positive rate is calculated as 0.9637 for the training data set, while the specificity value giving the true negative rate is calculated as 0.6085. On the other hand, the results of the Naïve Bayes model created for sentiment classification of the tweets about the COVID-19 vaccine are presented in Table 2.

Table 2: Confusion matrix and accuracy values for Naïve Bayes.
Train data (accuracy: 0.8825)
Prediction \ Actual | Negative | Positive
Negative | 544 | 254
Positive | 353 | 4016
Test data (accuracy: 0.8848)
Prediction \ Actual | Negative | Positive
Negative | 132 | 65
Positive | 80 | 982

As seen in Table 2, where the results of the Naïve Bayes classification model are presented, the accuracy values are close to the results of Logistic Regression. The sensitivity value for the training data set is 0.9405 and the specificity value is 0.6065. When the findings are compared with the results obtained from the Logistic Regression model, it is seen that the Logistic Regression classification model is slightly more effective.
5 Conclusion
The aim of this study was to measure and evaluate the attitude towards the COVID-19 vaccine using social media, which has become the most important communication tool today. For this purpose, tweets about the vaccine were extracted and sentiment analysis about the vaccine was performed with various classification methods. To this end, on January 9th, 2021, more than 8000 tweets belonging to the previous week were extracted via the R statistical software and the Twitter API, and the obtained data set was cleaned through appropriate libraries and made ready for analysis. The results of the sentiment analysis and machine learning classification are shared in the findings section of this study.
When the results of the study are evaluated, it is seen that positive sentiments about the COVID-19 vaccine outnumber negative ones. Therefore, the vaccine, which is seen as the best possible solution to the major problems caused by the pandemic, is generally accepted. On the other hand, the high rate of negative sentiment is worrisome. Similar to the study conducted by [48], this rate (more than 30%) is an indication that hesitation against vaccination should be evaluated carefully. The results obtained with the classification of sentiments reveal that the most dominant sentiment among the negative sentiments is fear. Thus, in order to ensure that the fight against the COVID-19 pandemic is not interrupted and the desired level of immunity is achieved, those in public decision-making positions must take strategic steps to combat fear and the underlying uncertainty. The most basic way of doing this is to fight the misinformation that spreads rapidly, especially on social media, by sharing effective and accurate information.
The Logistic Regression and Naïve Bayes supervised machine learning methods were applied to classify tweets about the COVID-19 vaccine, and their effectiveness was determined and compared. According to the findings, both methods have very high classification efficiency. However, positive sentiment classification is more successful than negative sentiment classification in both methods. It can be thought that the reason for this is the way negative feelings are expressed (using "not good" instead of "bad").
In this study, which tries to take a snapshot of the attitude towards the vaccine, it is recommended that future studies on this subject examine the sentiment locally and investigate how specific practices or developments affect the attitude towards the vaccine in real time.
References
[1] Pillay, A. L., & Barnes, B. R. Psychology and COVID-19: impacts, themes and way forward. South African Journal of Psychology, 50(2), 148–153, 2020. https://doi.org/10.1177/0081246320937684
[2] Ourworldindata.org. Retrieved from https://ourworldindata.org/covidcases?country=~OWID_WRL (Accessed: 27.01.2021)
[3] Achdut, N., & Refaeli, T. Unemployment and Psychological Distress among Young People during the COVID-19 Pandemic: Psychological Resources and Risk Factors. International Journal of Environmental Research and Public Health, 17(19), 7163, 2020. http://dx.doi.org/10.3390/ijerph17197163
[4] IMF World Economic Outlook (October 2020). Retrieved from https://www.imf.org/external/datamapper/NGDP_RPCH@WEO/WEOWORLD (Accessed: 21.01.2021)
[5] Sallam, M. COVID-19 vaccine hesitancy worldwide: a systematic review of vaccine acceptance rates. medRxiv, 2021. https://doi.org/10.1101/2020.12.28.20248950
[6] Marmarosh, C. L., Forsyth, D. R., Strauss, B., & Burlingame, G. M. The psychology of the COVID-19 pandemic: A group-level perspective. Group Dynamics: Theory, Research, and Practice, 24(3), 122-138, 2020. http://dx.doi.org/10.1037/gdn0000142
[7] Atalan, A. Is the lockdown important to prevent the COVID-19 pandemic? Effects on psychology, environment and economy-perspective. Annals of Medicine and Surgery, 56, 38-42, 2020. https://doi.org/10.1016/j.amsu.2020.06.010
[8] Akat, M., & Karatas, K. Psychological Effects of COVID-19 Pandemic on Society and Its Reflections on Education. Turkish Studies, 15, 1-13, 2020.
https://doi.org/10.7827/TurkishStudies.44336
[9] Li, S., Wang, Y., Xue, J., Zhao, N., & Zhu, T. The Impact of COVID-19 Epidemic Declaration on Psychological Consequences: A Study on Active Weibo Users. International Journal of Environmental Research and Public Health, 17(6), 2032, 2020. https://doi.org/10.3390/ijerph17062032
[10] Aiello, L. M., Quercia, D., Zhou, K., Constantinides, M., Scepanovic, S., & Joglekar, S. How Epidemic Psychology Works on Social Media: Evolution of Responses to the COVID-19 Pandemic. ArXiv, abs/2007.13169, 2020.
[11] Susič, D., Tomšič, J., & Gams, M. Ranking Effectiveness of Non-Pharmaceutical Interventions Against COVID-19: A Review. Informatica, 46(4), 2022. https://doi.org/10.31449/inf.v46i4.4181
[12] Tiwari, P., Pandey, H. M., Khamparia, A., & Kumar, S. Twitter-based Opinion Mining for Flight Service Utilizing Machine Learning. Informatica, 43(3), 381-386, 2019. https://doi.org/10.31449/inf.v43i3.2615
[13] Chakraborty, K., Bhatia, S., Bhattacharyya, S., Platos, J., Bag, R., & Hassanien, A. E. Sentiment Analysis of COVID-19 tweets by Deep Learning Classifiers—A study to show how popularity is affecting accuracy in social media. Applied Soft Computing, 97, Part A, 2020. https://doi.org/10.1016/j.asoc.2020.106754
[14] de Las Heras-Pedrosa, C., Sánchez-Núñez, P., & Peláez, J. I. Sentiment Analysis and Emotion Understanding during the COVID-19 Pandemic in Spain and Its Impact on Digital Ecosystems. International Journal of Environmental Research and Public Health, 17(15), 5542, 2020. https://doi.org/10.3390/ijerph17155542
[15] Imran, A. S., Daudpota, S. M., Kastrati, Z., & Batra, R. Cross-Cultural Polarity and Emotion Detection Using Sentiment Analysis and Deep Learning on COVID-19 Related Tweets. IEEE Access, 8, 181074-181090, 2020. https://doi.org/10.1109/ACCESS.2020.3027350
[16] Barkur, G., Vibha, & Kamath, G. B. Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: Evidence from India. Asian Journal of Psychiatry, 51, 102089, 2020. https://doi.org/10.1016/j.ajp.2020.102089
[17] Samuel, J., Ali, G. G. M. N., Rahman, M. M., Esawi, E., & Samuel, Y. COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification. Information, 11(6), 314, 2020. http://dx.doi.org/10.3390/info11060314
[18] Nemes, L., & Kiss, A. Social media sentiment analysis based on COVID-19. Journal of Information and Telecommunication, 2020. http://dx.doi.org/10.1080/24751839.2020.1790793
[19] Zhou, J., Yang, S., Xiao, C., & Chen, F. Examination of community sentiment dynamics due to COVID-19 pandemic: a case study from Australia. ArXiv, abs/2006.12185, 2020.
[20] Abd-Alrazaq, A., Alhuwail, D., Househ, M., Hamdi, M., & Shah, Z. Top Concerns of Tweeters During the COVID-19 Pandemic: Infoveillance Study. Journal of Medical Internet Research, 22(4), e19016, 2020. https://doi.org/10.2196/19016
[21] Samant, S. S., Murthy, N. B., & Malapati, A. Categorization of event clusters from twitter using term weighting schemes. Informatica, 45(3), 2021. https://doi.org/10.31449/inf.v45i3.3063
[22] Chawla, S., & Mehrotra, M. Impact of emotions in social media content diffusion. Informatica, 45(6), 2021. https://doi.org/10.31449/inf.v45i6.3575
[23] OECD (2020). Economic Outlook No 108. Retrieved from https://stats.oecd.org/Index.aspx?DataSetCode=EO# (Accessed: 17.01.2021)
[24] Kong, E., & Prinz, D. Disentangling policy effects using proxy data: Which shutdown policies affected unemployment during the COVID-19 pandemic? Journal of Public Economics, 189, 104257, 2020. https://doi.org/10.1016/j.jpubeco.2020.104257
[25] Kawohl, W., & Nordt, C. COVID-19, unemployment, and suicide. The Lancet Psychiatry, 7(5), 389-390, 2020. https://doi.org/10.1016/S2215-0366(20)30141-3
[26] Bauer, A., & Weber, E. COVID-19: how much unemployment was caused by the shutdown in Germany? Applied Economics Letters, 2020. https://doi.org/10.1080/13504851.2020.1789544
[27] Fairlie, R. W., Couch, K., & Xu, H. The Impacts of COVID-19 on Minority Unemployment: First Evidence from April 2020 CPS Microdata. National Bureau of Economic Research Working Paper Series, 27246, 2020. https://doi.org/10.3386/w27246
[28] Raifman, J., Bor, J., & Venkataramani, A. Unemployment insurance and food insecurity among people who lost employment in the wake of COVID-19. medRxiv: the preprint server for health sciences, 2020.07.28.20163618, 2020. https://doi.org/10.1101/2020.07.28.20163618
[29] Blustein, D. L., Duffy, R., Ferreira, J. A., Cohen-Scali, V., Cinamon, R. G., & Allan, B. A. Unemployment in the time of COVID-19: A research agenda. Journal of Vocational Behavior, 119, 103436, 2020. https://doi.org/10.1016/j.jvb.2020.103436
[30] Randolph, H. E., & Barreiro, L. B. Herd Immunity: Understanding COVID-19. Immunity, 52(5), 737-741, 2020. https://doi.org/10.1016/j.immuni.2020.04.012
[31] Dubé, E., & MacDonald, N. E. How can a global pandemic affect vaccine hesitancy? Expert Review of Vaccines, 19(10), 899-901, 2020. https://doi.org/10.1080/14760584.2020.1825944
[32] Lazarus, J. V., Ratzan, S. C., Palayew, A., Gostin, L. O., Larson, H. J., Rabin, K., Kimball, S., & El-Mohandes, A. A global survey of potential acceptance of a COVID-19 vaccine. Nature Medicine, 1–4, 2020. https://doi.org/10.1038/s41591-020-1124-9
[33] Biasio, L. R., Bonaccorsi, G., Lorini, C., & Pecorelli, S. Assessing COVID-19 vaccine literacy: a preliminary online survey. Human Vaccines & Immunotherapeutics, 2020. https://doi.org/10.1080/21645515.2020.1829315
[34] Lin, Y., Hu, Z., Zhao, Q., Alias, H., Danaee, M., & Wong, L. P. Understanding COVID-19 vaccine demand and hesitancy: A nationwide online survey in China. PLoS Negl Trop Dis, 14(12), e0008961, 2020. https://doi.org/10.1371/journal.pntd.0008961
[35] Reiter, P. L., Pennell, M. L., & Katz, M. L. Acceptability of a COVID-19 vaccine among adults in the United States: How many people would get vaccinated? Vaccine, 38(42), 6500-6507, 2020. https://doi.org/10.1016/j.vaccine.2020.08.043
[36] Loomba, S., de Figueiredo, A., Piatek, S., de Graaf, K., & Larson, H. J. Measuring the Impact of Exposure to COVID-19 Vaccine Misinformation on Vaccine Intent in the UK and US. medRxiv, 2020. https://doi.org/10.1101/2020.10.22.20217513
[37] Hacquin, A. S., Altay, S., de Araujo, E., Chevallier, C., & Mercier, H. Sharp rise in vaccine hesitancy in a large and representative sample of the French population: reasons for vaccine hesitancy. PsyArXiv, 2020. https://doi.org/10.31234/osf.io/r8h6z
[38] Palamenghi, L., Barello, S., Boccia, S., & Graffigna, G. Mistrust in biomedical research and vaccine hesitancy: the forefront challenge in the battle against COVID-19 in Italy. European Journal of Epidemiology, 35(8), 785–788, 2020. https://doi.org/10.1007/s10654-020-00675-8
[39] Karlsson, L. C., Soveri, A., Lewandowsky, S., Karlsson, L., Karlsson, H., Nolvi, S., ... & Antfolk, J. Fearing the disease or the vaccine: The case of COVID-19. Personality and Individual Differences, 172, 110590, 2021. https://doi.org/10.1016/j.paid.2020.110590
[40] Khan, Y. H., Mallhi, T. H., Alotaibi, N. H., Alzarea, A. I., Alanazi, A. S., Tanveer, N., & Hashmi, F. K. Threat of COVID-19 vaccine hesitancy in Pakistan: the need for measures to neutralize misleading narratives. The American Journal of Tropical Medicine and Hygiene, 103(2), 603-604, 2020. https://doi.org/10.4269/ajtmh.20-0654
[41] Harapan, H., Wagner, A. L., Yufika, A., Winardi, W., Anwar, S., Gan, A. K., Setiawan, A. M., Rajamoorthy, Y., Sofyan, H., & Mudatsir, M. Acceptance of a COVID-19 Vaccine in Southeast Asia: A Cross-Sectional Study in Indonesia. Frontiers in Public Health, 8, 381, 2020. https://doi.org/10.3389/fpubh.2020.00381
[42] Dror, A. A., Eisenbach, N., Taiber, S., Morozov, N. G., Mizrachi, M., Zigron, A., ... & Sela, E. Vaccine hesitancy: the next challenge in the fight against COVID-19. European Journal of Epidemiology, 35(8), 775–779, 2020. https://doi.org/10.1007/s10654-020-00671-y
[43] Al Awaidy, S. T., & Khamis, F. Preparing the Community for a Vaccine Against COVID-19. Oman Medical Journal, 35(6), e193, 2020. https://doi.org/10.5001/omj.2020.130
[44] Kwok, K., Li, K. K., Wei, W., Tang, A., Wong, S., & Lee, S. Influenza vaccine uptake, COVID-19 vaccination intention and vaccine hesitancy among nurses: A survey. International Journal of Nursing Studies, 114, 103854, 2021. https://doi.org/10.1016/j.ijnurstu.2020.103854
[45] Luo, T., Chen, S., Xu, G., & Zhou, J. Sentiment Analysis. In: Trust-based Collective View Prediction. Springer, New York, NY, 2013. https://doi.org/10.1007/978-1-4614-7202-5_4
[46] Kantardzic, M. Data Mining: Concepts, Models, Methods, and Algorithms (3rd ed.). Hoboken, NJ: Wiley-IEEE Press, 2019.
[47] Albon, C. Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning. Sebastopol, CA: O'Reilly Media, 2018.
[48] Callaghan, T., Moghtaderi, A., Lueck, J. A., Hotez, P., Strych, U., Dor, A., Fowler, E. F., & Motta, M. Correlates and disparities of intention to vaccinate against COVID-19. Social Science & Medicine, 113638, 2021. https://doi.org/10.1016/j.socscimed.2020.113638

https://doi.org/10.31449/inf.v47i1.4497 Informatica 47 (2023) 83–96 83
Learning the Structure of Bayesian Networks from Incomplete Data Using a Mixture Model
Issam Salman1, Jiří Vomlel2
1 Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Trojanova 13, 120 01, Prague, CZ
2 Institute of Information Theory and Automation of the CAS, Pod Vodárenskou věží 4, 182 00, Prague, CZ
E-mail: Issam.Salman@fjfi.cvut.cz, vomlel@utia.cas.cz
Keywords: Bayesian network, belief-noisy-OR, structure learning, incomplete data, EM-mixture
Received: November 8, 2022
In this paper, we provide an approach to learning optimal Bayesian network (BN) structures from incomplete data based on the BIC score function, using a mixture model to handle missing values. We have compared the proposed approach with other methods. Our experiments have been conducted on different models, some of them Belief Noisy-Or (BNO) ones. We have performed experiments using datasets with values missing completely at random, having different missingness rates and data sizes. We have analyzed the significance of differences between the algorithm performance levels using the Wilcoxon test.
The new approach typically learns additional edges in the case of Belief Noisy-Or models. We have analyzed this issue using the chi-square test of independence between the variables in the true models; this analysis reveals that the additional edges can be explained by strong dependence in the generated data. An important property of our new method for learning BNs from incomplete data is that it can learn not only optimal general BNs but also specific Belief Noisy-Or models, which are used in many applications, such as medical ones. Povzetek: Razvita je metoda za določitev optimalne Bayesove mreže ob nepopolnih podatkih.

1 Introduction

Bayesian networks (BNs) have been used in a variety of applications. The challenge of learning a BN can be categorized into two parts: (1) structural learning, which involves identifying the topology of the BN; and (2) parametric learning, which involves estimating the conditional probabilities for a given network. The challenge of learning the structure of a BN is by far more difficult than the other one. Most methods, such as [1] and [2], require complete data, while in practical applications we are often confronted with values missing from the dataset; this problem regards both parts (1 and 2) mentioned above and affects the performance of the model learning. In such methods, a record with a missing value has to be omitted from the dataset. An earlier work [3] studied the impact of learning the parameters and the structure of a BN using hard EM and soft EM with a comprehensive simulation study covering incomplete data. In this paper, we study the problem of learning the optimal BN structure from incomplete data, adopting a new approach of using product distribution mixture models to handle missing values; the latter will be used with [2] to estimate the missing values and learn the optimal structure. In addition, we show in this paper that our new approach is able to learn the structure of a Belief Noisy-OR (BNO) [4] model from incomplete data.

2 Bayesian network

A Bayesian network encodes a joint probability distribution over a set of random variables U = {X1, X2, . . . , Xm}. We consider only discrete variables in this work, which is the most common current usage of BNs. A finite set of states of a variable Xi will be denoted by Xi. Conditional probability distributions (CPDs) are attached to each variable in the network. Their purpose is to quantify the strength of the relationships depicted in the BN through its structure: these CPDs mathematically describe the behavior of that variable under every possible value assignment of its parents. Since to specify this behavior one needs a number of parameters exponential in the number of parents, and since this number is typically smaller than the number of variables in the domain, this approach results in exponential savings in space and time. Formally, a Bayesian network for U is a pair B = ⟨G, Θ⟩. Its first component, G, is a directed acyclic graph whose vertices correspond to the random variables U, and whose edges represent direct dependencies between these variables. The graph G encodes independence assumptions: each variable Xi is independent of its non-descendants given its parents in G. The second component of the pair, namely Θ, represents the set of parameters that quantify the network. It contains a parameter θxi|πxi = f(xi|πxi) for each possible value xi of Xi and each configuration πxi of ΠXi, where ΠXi denotes the set of parents of Xi in G.
Accordingly, a Bayesian network B defines a unique joint probability distribution over U given by:

F(X1 = x1, . . . , Xm = xm) = ∏_{i=1}^{m} F(Xi = xi | ΠXi = πxi) = ∏_{i=1}^{m} θxi|πxi

where πxi denotes the configuration of ΠXi determined by (x1, . . . , xm).

2.1 Structure learning of BN

Note that a BN can be viewed from two perspectives: as an effective coding of an independence relationship on the one hand, and as an effective encoding of a high-dimensional distribution of probabilities on the other hand. One option of learning the structure is to rely on the specialists in the field through a conscious and meticulous process of knowledge gathering. This involves training experts in probabilistic graphical modeling, validating expert opinions, and extracting and testing information. This process all too often leads to disagreements among experts and a lack of reliability pertaining to the model. Nonetheless, in many fields, where data is scarce, this is one of the key approaches to model building. Another mechanism is the automatic derivation of the model based on a data set. It is this machine learning (ML) approach that we follow here (so that we avoid the very rich field of human knowledge acquisition). For a data set D = {u1, u2, . . . , un}, where ui is an instantiation of all variables in U, BN structure learning translates to the problem of learning a network structure from D. Suppose each ui is complete and discrete. Consequently, finding the optimal Bayesian network is reduced to finding the optimal structure. The optimal structure can be learned by three approaches coming from the area of ML. The first is the constraint-based approach to structure learning, which takes advantage of the first perspective and attempts to reconstruct a Bayesian network by analyzing data independence. These algorithms require an infinite amount of data to learn independence with certainty; high-order independence tests can be unreliable unless the sample size is truly huge [5]. The second is the score-based approach, which invests in the second perspective and looks for Bayesian networks that adequately describe the available data with the best score. The core of this approach is to assign a score value s(G) to each acyclic directed graph G. The score function defines an overall order (up to equivalences) over the structures in such a way that a structure with a better description of the data is assigned a higher value. The last approach is a hybrid approach, which mixes the two previous approaches together.

2.2 Score-based

Score-based learning is a technique frequently used for determining the optimal structure. In this process, each candidate is assigned a BN score to measure the goodness-of-fit of a structure to the data. The goal of the learning problem is then to find the optimal scoring structure. The score usually measures how well this BN describes the data set D.

Definition (1): Let B = ⟨G, Θ⟩ be a Bayesian network, and let D = {u1, . . . , un} be a training set, where each ui assigns a value to all variables in U. The MDL scoring function of a network B given a training data set D, written MDL(B|D), is given by:

MDL(G|D) = LL(G|D) − (log n / 2) |G|

where |G| is the number of parameters in the network. The first term represents the log-likelihood, i.e., it measures the model fit. The second term penalizes the model complexity. The penalty term for MDL is greater than that for most other evaluation functions, since optimal networks with the MDL are usually sparser than optimal networks with other scoring functions.
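As an illustration of this kind of decomposable score (a sketch under assumptions, not the authors' implementation), the contribution of one variable given a candidate parent set can be computed directly from a table of discrete data. The pandas column names below are hypothetical, and the parameter count uses only parent configurations observed in the data, which is a simplification.

```python
import numpy as np
import pandas as pd

def family_bic(data: pd.DataFrame, child: str, parents: list) -> float:
    """BIC/MDL-style score of `child` given `parents`: log-likelihood minus (log n / 2) * #parameters."""
    n = len(data)
    r_child = data[child].nunique()                       # number of states of the child
    if parents:
        grouped = data.groupby(parents, observed=True)[child]
        q = grouped.ngroups                               # parent configurations observed in the data
        ll = 0.0
        for _, column in grouped:
            counts = column.value_counts().to_numpy(dtype=float)
            probs = counts / counts.sum()                  # maximum-likelihood CPT entries
            ll += float(np.sum(counts * np.log(probs)))
    else:
        counts = data[child].value_counts().to_numpy(dtype=float)
        probs = counts / counts.sum()
        ll = float(np.sum(counts * np.log(probs)))
        q = 1
    n_params = q * (r_child - 1)                           # free parameters of the CPT
    return ll - 0.5 * np.log(n) * n_params

# toy usage with hypothetical binary columns
df = pd.DataFrame({"X1": [0, 0, 1, 1, 1, 0], "X2": [0, 1, 1, 1, 0, 0]})
print(family_bic(df, "X2", ["X1"]), family_bic(df, "X2", []))
```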
As its name suggests, an optimal network with MDL minimizes the scoring function rather than maximizing it. The Bayesian information criterion (BIC) [6] is a scoring function whose calculation is equivalent to MDL for Bayesian networks, but it is derived on the basis of the models' asymptotic behavior. Where the score is decomposable, it can be written as a sum of the scores of each variable and its parent set:

BIC(G|D) = ∑_{i=1}^{m} BIC(Xi|ΠXi) = ∑_{i=1}^{m} { LL(Xi|ΠXi) − Penalty(Xi|ΠXi) }

The score-based algorithms' aim is to optimize this score and return the structure G that maximizes it. As the space of all possible structures is at least exponential in the number of variables m, this presents a number of problems. There are m(m − 1)/2 possible undirected edges and 2^{m(m−1)/2} possible structures, one for every subset of these edges. Moreover, there may be more than one orientation of the edges for each such choice. One popular choice is hill-climbing [7].

2.3 Structural learning with pruning

Statistical testing is a method of reducing the set of potential DAGs. Another approach to reducing this set is to use constraints provided by experts. Besides that, we can use structural constraints similar to those in [2]. The structural constraints can be applied locally as long as they include only one node and its parents. Algorithm 1 represents an approach to learning the optimal structure of a BN using the constraint rules and a decomposable score [8]. The main function of the algorithm is to compute a collection of candidate parent sets for each variable. Then we optimize across this collection by selecting one parent set for each variable, without creating directed cycles while maximizing the total score. The following lemma can be used to reduce the number of candidate parent sets.

Lemma 2.1. Let Xi be a variable and Π′ be a candidate parent set for Xi. Suppose that BIC(Xi|Π′) < BIC(Xi|{}). Then Π′ can be safely ignored from the candidate parent sets.

Proof. The proof uses the decomposability of the BIC score. Let G and G′ be DAGs that differ only in the parent set of Xi, where Π is the parent set of Xi in G and Π′ is the parent set of Xi in G′. Suppose Π ⊂ Π′. Then, if G′ does not contain directed cycles, G cannot contain them either. This fact, together with BIC(Xi|Π) > BIC(Xi|Π′), implies that G′ is not BIC optimal. This statement also holds if the candidate subset is the empty set Π = {}.
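Before the formal listing in Algorithm 1 below, here is a small sketch of how this pruning rule can be applied in practice. It is an illustration only: `score` stands for any decomposable family score (for instance the BIC sketch above), and the enumeration bound is an arbitrary choice.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

def prune_candidate_parent_sets(
    child: str,
    others: Iterable[str],
    score: Callable[[str, tuple], float],
    max_size: int = 2,
) -> List[Tuple[str, ...]]:
    """Enumerate parent sets up to max_size and keep only those not dominated by the empty set."""
    baseline = score(child, ())                      # BIC(X_i | {})
    kept = [()]
    for k in range(1, max_size + 1):
        for parents in combinations(sorted(others), k):
            # Lemma 2.1: a candidate set scoring below the empty set can be safely ignored
            if score(child, parents) > baseline:
                kept.append(parents)
    return kept

# toy usage with a stub score that favours the single parent "A"
stub = lambda child, parents: 1.0 if parents == ("A",) else (0.0 if parents == () else -1.0)
print(prune_candidate_parent_sets("Y", ["A", "B"], stub))   # [(), ('A',)]
```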
Algorithm 1: Parent sets evaluation for the BN structure learning algorithm
Input: D: a data set; m: an integer representing the number of variables in D
Output: accepted sets of parents for each node
Phase 1: initialize the parameters
  gi = (V, E)  // a DAG containing a node and its candidate parent set
  Si: BIC score of gi
  Qi: priority queue of triples (Xi, ΠXi, Si) ordered by Si
Phase 2: mscour(Xi, D) — function to find the min(BIC) score
  Si = the BIC score of gi where only Xi is included
  return Si
Phase 3: find the accepted Qi for Xi
  Qi is empty
  S* = mscour(Xi, D)
  ΠXi is a parent set for Xi
  add (Xi, ΠXi, S*) to Qi
  for each Xk, k ∈ {1, . . . , m} do:
    add Xk to ΠXi
    Ski = the BIC score of the updated ΠXi
    if Ski > S* then
      add (Xi, ΠXi, Ski) to Qk
      for each Xj, j ∈ {1, . . . , m}, i ≠ j ≠ k, do:
        add Xj to ΠXi
        Ski = the BIC score of the updated ΠXi
        if Ski > S* then
          add (Xi, ΠXi, Ski) to Qk
        else
          delete Xj from ΠXi
      end for
    else
      delete Xk from ΠXi
  end for

The gi = (V, E) in Phase 1 of Algorithm 1 is a DAG containing the set of nodes V = {Xi, ΠXi} and the set of arcs E = {(Xo, Xi), ∀Xo ∈ ΠXi}. Algorithm 1 considers all possible parent sets that can lead to an optimal BN. Its implementation is based on [8]. After Phase 3, we find a DAG with the highest BIC from among the variables given the candidate parent sets of each variable. That is done using the GOBNILP [9] tool (https://www.cs.york.ac.uk/aig/sw/gobnilp/), which is a smart algorithm using integer linear programming. We will refer to this algorithm as A1. Let us also note that, if a dataset is generated from a BN having the empty graph as its structure and this dataset is large enough, then, for any parent set ΠXi ≠ ∅, it holds that BIC(Xi|{}) > BIC(Xi|ΠXi). This is because the variables are independent and the penalty for larger parent sets makes the BIC value worse for all nonempty parent sets. One of the axioms of the pruning rules stated in the literature says that if a candidate set has a better score than another candidate set and the first candidate set is a subset of the second candidate set, it is safe to disregard that second candidate set due to the decomposability of score functions. We have applied the pruning rule as formalized in Lemma 2.1 in Algorithm 1. That algorithm will reduce the collection of accepted parent sets for each node by discarding all parent sets which do not meet the criteria.

3 Incomplete datasets

One of the widespread problems in data mining and machine learning is incomplete data. Values may be missing even from training instances. Nowadays more and more datasets are available, but most of them are incomplete. Therefore, machine learning must cope with this problem. Normally, to learn the BN structure using the A1 algorithm [2], we need complete data, such that all instances ui ∈ D, i ∈ {1, . . . , n} are complete and do not have any missing values. In the case of incomplete data and an instance which has a missing value, A1 does not use this instance in the BN structure learning.

3.1 Product distribution mixtures to handle incomplete data

Because of incomplete data, most methods in machine learning cannot be applied. An easy way to deal with this problem is completing the data by simply omitting the incomplete vectors or removing the incomplete variables. But this approach has a weakness: we may lose a massive part of the available information. Another alternative is to use an estimation to replace the missing values [10] (i.e., put in estimates of the missing values). However, for certain reasons, the estimated values have to be typical, and the natural variability of the data will be partially restricted. For that, the product mixture model gives us a better way to directly apply the EM algorithm to complete the dataset [11]. We will refer to this approach as EM-Mixture. Considering finite mixtures we assume that:

P(X) = ∑_{j=1}^{r} wj F(X|j),   (1)

F(X|j) = ∏_{i=1}^{m} Fi(Xi|j),  ∑_{j=1}^{r} wj = 1   (2)

where wj > 0 is a probabilistic weight of the j-th mixture component, Fi is the conditional distribution of the variable Xi, i ∈ 1, . . . , m, and r is the number of components. Note that the product components do not imply that the involved variables are independent.
In this sense, the mixture model (1) is not restrictive [12]. It is easy to verify that, by increasing the number of components r, we can describe any discrete probability distribution in the form (1). In our experiments, r was selected based on the number of variables in a dataset. To estimate the mixture parameters, we maximize the log-likelihood function:

LL = ∑_{k=1}^{n} log P(u^(k))

where n is the number of records in the dataset D and u^(k) is the k-th data vector from D. We will use the EM algorithm to maximize the log-likelihood function.

Next, we explain how the learned product mixture model will be used to fill in the missing values. Let C = {i1, i2, . . . , ik} be a subset of M = {1, 2, . . . , m} such that the corresponding sub-vector uC = (xi1, xi2, . . . , xik) is complete. Then, under the product mixture model, we can compute the marginal probability of uC as

PC(uC) = ∑_{j=1}^{r} wj FC(uC|j)   (3)

FC(uC|j) = ∏_{i∈C} Fi(xi|j)   (4)

Let z be an index of a variable unobserved in u, i.e., z ∈ M \ C. Under the product mixture model, we can compute the conditional distribution of the missing value uz given the complete part uC with PC(uC) > 0 as

Pz|C(uz|uC) = Pz,C(uz, uC) / PC(uC) = ∑_{j=1}^{r} Wj(uC) Fz(uz|j)

where Wj(uC) are the conditional component weights:

Wj(uC) = wj FC(uC|j) / ∑_{j=1}^{r} wj FC(uC|j)

We thus compute the probability distribution Pz|C(uz|uC) for each missing value of each data vector u ∈ D with a missing value. There are several ways of using this probability distribution to fill in the missing value of Xz in u – in this paper, we select the value uz maximizing Pz|C(uz|uC) over all values of Xz.

The last step of our presentation is the description of adapting the EM algorithm for learning product mixture models such that it is applicable to incomplete data. Given a data vector u ∈ D and a variable Xi with index i ∈ {1, 2, . . . , n}, let N(u) be the subset of indices of the variables available (i.e., observed) in u, and D(i) ⊂ D be the subset of vectors with observed values of variable Xi:

N(u) = {v ∈ {1, 2, . . . , n} : uv observed in u}
D(i) = {u ∈ D : i ∈ N(u)}

In Algorithm 2, we present the modification of the EM algorithm for the product mixture model for incomplete data. For xv ∈ Xv, v ∈ {1, 2, . . . , n}, and j = 1, . . . , r, we use Fv(xv|j) to denote the conditional probability of observing value xv of variable Xv given the component j. The initialization of the EM-Mixture algorithm (presented in Algorithm 2) is performed using the partitions obtained from agglomerative hierarchical clustering implemented in the function hc of the R package mclust [13]. In our algorithm, the symbol δ(x, y) denotes the standard delta function equal to one if x = y and equal to zero otherwise.

Algorithm 2: EM-Mixture
Input: D is a data set
Output: a completed data set
Phase 1: initializing:
  wj, j = 1, . . . , r
  Fv(xv|j), for xv ∈ Xv, v ∈ {1, 2, . . . , n}, and j = 1, . . . , r
  L = −∞
Phase 2: repeat
  E-Step:
    q(j|u) = wj ∏_{v∈N(u)} Fv(uv|j) / ∑_{l=1}^{r} wl ∏_{v∈N(u)} Fv(uv|l), for u ∈ D, j = 1, . . . , r
  M-Step:
    wj = (1/|D|) ∑_{u∈D} q(j|u), for j = 1, . . . , r
    Fv(xv|j) = ∑_{u∈D(v)} δ(xv, uv) · q(j|u) / ∑_{u∈D(v)} q(j|u), for xv ∈ Xv, v ∈ {1, 2, . . . , n}, and j = 1, . . . , r
    L′ = ∑_{u∈D} log [ ∑_{j=1}^{r} wj ∏_{v∈N(u)} Fv(uv|j) ]
    Q = L′ − L
    L = L′
until Q ≤ ε
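A compact sketch of this EM-Mixture procedure is given below. It is our reading of Algorithm 2, not the authors' implementation: records are integer-coded categories with missing entries marked as -1, each component is a product of categorical distributions, the initialization is random rather than mclust-based, and each missing entry is finally filled with the value maximizing its conditional distribution given the observed part of the record.

```python
import numpy as np

def em_mixture_impute(X, n_states, r=3, eps=0.005, seed=None):
    """X: (n, m) int array with missing entries coded as -1. Returns a completed copy of X."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    obs = X >= 0                                            # mask of observed entries
    w = np.full(r, 1.0 / r)                                 # component weights
    F = [rng.dirichlet(np.ones(n_states[v]), size=r) for v in range(m)]  # F[v][j, s] = P(X_v = s | j)
    prev_ll = -np.inf
    while True:
        # E-step: responsibilities q(j | u) computed over observed entries only
        logp = np.tile(np.log(w), (n, 1))                   # (n, r)
        for v in range(m):
            rows = np.where(obs[:, v])[0]
            logp[rows] += np.log(F[v][:, X[rows, v]]).T     # add log F_v(u_v | j)
        ll = np.logaddexp.reduce(logp, axis=1)              # per-record log-likelihood
        q = np.exp(logp - ll[:, None])
        # M-step: update weights and per-variable categorical distributions
        w = q.mean(axis=0)
        for v in range(m):
            rows = np.where(obs[:, v])[0]
            counts = np.zeros((r, n_states[v]))
            np.add.at(counts.T, X[rows, v], q[rows])        # soft counts over D(v)
            F[v] = (counts + 1e-9) / (counts + 1e-9).sum(axis=1, keepdims=True)
        new_ll = ll.sum()
        if new_ll - prev_ll <= eps:                         # Q = L' - L <= eps
            break
        prev_ll = new_ll
    # Imputation: pick the value maximizing sum_j W_j(u_C) F_v(x | j); q[i] plays the role of W_j(u_C)
    Xc = X.copy()
    for i in range(n):
        for v in np.where(~obs[i])[0]:
            Xc[i, v] = int(np.argmax(q[i] @ F[v]))
    return Xc

# toy usage: three binary variables, a few entries missing completely at random
X = np.array([[0, 1, -1], [1, -1, 1], [0, 1, 0], [1, 0, 1]])
print(em_mixture_impute(X, n_states=[2, 2, 2], r=2, seed=0))
```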
The EM algorithm converges monotonically to a local or global maximum or a saddle point of the log-likelihood function L in the sense that the sequence {Lt}_{t=0}^{∞} does not decrease. The presence of a local maximum makes the starting point of the procedure influential; hence it is selected at random. We use the value ε = 0.005 to terminate the main loop of the algorithm. The sequence of log-likelihood values generated by the E-Step and M-Step is non-decreasing [11] (i.e., L ≤ L′). We adapt the BN structure learning algorithm A1 so that it can learn from incomplete data. We use the EM-Mixture algorithm, i.e., Algorithm 2, to make the incomplete data complete in Phase 3. The whole algorithm will be referred to as A2.

3.2 Experiments

The experiments have been repeated ten times on ten different subsets for each MCAR rate on different models, using the datasets generated from the true models summarized in Table 8 in Appendix A. We have compared our approach, denoted as A2, with three other methods. By A1 we denote the BIC optimal learning from complete data created by omitting all rows containing a missing value. In [3], the authors proposed the soft and hard EM algorithms to fill in the missing values and learn an optimal BN structure from the completed data by Tabu search [14], which we refer to as A3 and A4, respectively.

The test scenarios, which include more than 700 incomplete datasets, are summarized in Figure 1. The resulting BNs of the simulations within each scenario are shown in Tables 1, 2, and 3. The decision tree shown in Figure 1 is intended to guide practitioners as to which imputation algorithm appears to perform the best, depending on the characteristics of their problem with incomplete data. Each leaf of the decision tree corresponds to a subset of the scenarios that we studied, grouped according to the values of the experimental factors, to recommend which algorithm has the best average Structure Hamming Distance [15] (SHD) values between the essential graph of the learned model and the essential graph of the true model. The dominance of the algorithms has been tested using the Wilcoxon test [16]. We say that an algorithm is better than another if it has a lower average SHD and their confidence intervals do not overlap, i.e., the p-value of the Wilcoxon test is lower than 5%. In the results based on the SHD, A2 has scored the best results.

Figure 1: The decision tree for different test scenarios.

Table 1: Recommended algorithm by decision tree leaf where MCAR rate in [5 - 10] - Group 1.
Leaf A, size > 5000: Weather: A1, A2, A3, A4; M1: A2, A3, A4
Leaf B, size in [3000 - 5000]: Weather: A2, A3, A4; M1: A2, A3, A4; M2: A2; Child: A2, A3, A4
Leaf C, size in [1500 - 2500]: Weather: A2, A3, A4; M1: A2, A3; M2: A2, A3, A4; Child: A2
Leaf D, size < 1000: Weather: A2, A3, A4; M1: A2, A3; M2: A2; Child: A2, A3
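The incomplete datasets behind these scenarios are obtained by deleting values completely at random from complete samples of the true models. A generic sketch of such MCAR injection follows; the rate, the coding of missing values, and the seed are arbitrary illustration choices, not the authors' generator.

```python
import numpy as np

def inject_mcar(X: np.ndarray, rate: float, missing_value: int = -1, seed=None) -> np.ndarray:
    """Return a copy of X where each entry is independently replaced by missing_value with probability rate."""
    rng = np.random.default_rng(seed)
    Xm = X.copy()
    Xm[rng.random(X.shape) < rate] = missing_value
    return Xm

# e.g. a 10% MCAR version of a complete integer-coded dataset with 1,000 records and 6 binary variables
complete = np.random.default_rng(1).integers(0, 2, size=(1000, 6))
incomplete = inject_mcar(complete, rate=0.10, seed=2)
```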
For the results based on the SHD and the Wilcoxon test, we have observed some important general trends:
– A1 is significantly worse than the other algorithms in all scenarios where the data size is smaller than 5,000.
– A2 appears to be a good algorithm in all scenarios.
– A2 is significantly better than the other algorithms for Model M2 in Leaves B and G.
– A2 is significantly better than the other algorithms for the model Child in Leaf C.
– A2 and A3 are significantly better than A1 and A4 for Model M1 in Leaves C, D, P, and K.
Figure 2 represents the algorithm results of all models with all dataset sizes and all MCAR rates.

Table 2: Recommended algorithm by decision tree leaf where MCAR rate in [15 - 25] - Group 2.
Leaf E, size > 5000: Weather: A1, A2, A3, A4; M1: A2, A3, A4
Leaf F, size in [3000 - 5000]: Weather: A2, A3, A4; M1: A2, A3, A4; M2: A2, A3; Child: A2, A3
Leaf G, size in [1500 - 2500]: Weather: A2, A3, A4; M1: A2, A3, A4; M2: A2; Child: A2, A3
Leaf P, size < 1000: Weather: A2, A3, A4; M1: A2, A3; M2: A2; Child: A2, A3

Table 3: Recommended algorithm by decision tree leaf where MCAR rate in [35 - 50] - Group 3.
Leaf H, size > 5000: Weather: A1, A2, A3, A4; M1: A2, A3, A4
Leaf G, size in [3000 - 5000]: Weather: A2, A3, A4; M1: A2, A3
Leaf K, size in [1500 - 2500]: Weather: A2, A3, A4; M1: A2, A3
Leaf M, size < 1000: Weather: A2, A3, A4; M1: A2

4 Belief Noisy-Or model

The Belief Noisy-Or (BNO) model is suitable for describing a specific class of uncertain relationships in Bayesian networks [4] common in several practical applications of BNs. As an example, let us mention the QMR-DT network [17]. In Figure 3 we present the structure of a CPT F(Y|X1, . . . , Xn) where auxiliary nodes X1′, . . . , Xn′ are added to explicitly separate the noisy relations from the logical OR relation. For a CPT with multiple parent variables X1, . . . , Xn the noisy-or is defined as follows (in the case of one parent variable, we use probability values as specified in Table 4):

F(Xi′ = 0 | Xi = 0) = 1 − α,   F(Xi′ = 1 | Xi = 0) = α
F(Xi′ = 0 | Xi = 1) = pi,      F(Xi′ = 1 | Xi = 1) = 1 − pi

where i ∈ {1, . . . , n} and pi ∈ [0, 1] is the parameter which defines the probability that the positive value xi of variable Xi is inhibited – it is referred to as the inhibition probability – and the parameter α specifies the possibility of a positive value even if the value of the corresponding parent variable is negative. In most experiments, we will set α = 0. The CPT of F(Y|X1′, . . . , Xn′) represents the deterministic logical OR function, i.e., F(Y = 0 | X1′ = x1′, . . . , Xn′ = xn′) equals 1 if x1′ = 0, . . . , xn′ = 0, and 0 otherwise. Consequently, the CPT of F(Y|X1, . . . , Xn), which represents the noisy-or function, is computed as follows:

F(Y = 0 | X1 = x1, . . . , Xn = xn) = ∏_{i=1}^{n} F(Xi′ = 0 | Xi = xi) = ∏_{i=1}^{n} (pi)^{xi} (1 − α)^{1−xi}

F(Y = 1 | X1 = x1, . . . , Xn = xn) = 1 − ∏_{i=1}^{n} (pi)^{xi} (1 − α)^{1−xi}
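To make the construction concrete, the following sketch tabulates the noisy-OR CPT F(Y | X1, . . . , Xn) from given inhibition probabilities pi and leak parameter α. It is an illustration of the formulas above (with the pi = 0.2 value implied by Table 4), not code from the paper.

```python
import itertools
import numpy as np

def noisy_or_cpt(p, alpha=0.0):
    """Return {parent configuration (x1..xn): P(Y = 1 | x)} for a noisy-OR with inhibition probs p and leak alpha."""
    cpt = {}
    for x in itertools.product([0, 1], repeat=len(p)):
        # P(Y = 0 | x) = prod_i p_i^{x_i} * (1 - alpha)^{1 - x_i}
        p_y0 = np.prod([pi ** xi * (1 - alpha) ** (1 - xi) for pi, xi in zip(p, x)])
        cpt[x] = 1.0 - p_y0
    return cpt

# two parents with inhibition probability 0.2 each and no leak (alpha = 0, as in most experiments)
print(noisy_or_cpt([0.2, 0.2], alpha=0.0))   # {(0,0): 0.0, (0,1): 0.8, (1,0): 0.8, (1,1): 0.96}
```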
4.1 Analysis of BNO models

In this Section, we analyze the BNO models represented in Table 9 in Appendix A where α = 0. Tables 5, 6, and 7 show the marginal probability distributions (MPD) of the variables in the BNO models N1, N2, and BN2O, respectively; see Figures 11 and 12. The Tables illustrate the decrease of the marginal probability values F(Ci = 0) in the case of a node having more than one parent (see Table 7). This decrease is due to the properties of the product of probabilities in (5). On the other hand, they also illustrate the increase of that marginal probability with a higher number of predecessors in previous layers; that increase depends on the number of layers above and also on the numbers of the edges in those layers. See Table 5 and Table 6.

Table 4: F(Xi′ | Xi)
Xi = 0: F(Xi′ = 0 | Xi = 0) = 1 − α, F(Xi′ = 1 | Xi = 0) = α
Xi = 1: F(Xi′ = 0 | Xi = 1) = 0.2, F(Xi′ = 1 | Xi = 1) = 0.8

Table 5: N1 (Figure 11): Marginal probability distributions
           C1   C2   C3    C4    C5    C6
F(Ci = 0): .5   .6   .68   .744  .795  .837
F(Ci = 1): .5   .4   .32   .256  .205  .163

Table 6: N2 (Figure 11): Marginal probability distributions
           C1   C2   C3    C4    C5    C6
F(Ci = 0): .5   .6   .536  .539  .716  .707
F(Ci = 1): .5   .4   .464  .461  .284  .293

Table 7: BN2O (Figure 12): Marginal probability distributions
           C1   C2   C3   C4   C5   C6    C7   C8
F(Ci = 0): .5   .5   .5   .5   .5   .129  .36  .36
F(Ci = 1): .5   .5   .5   .5   .5   .871  .64  .64

Figure 2: The Structural Hamming Distance to the true models from the resulting models of the structure learning algorithms using data generated from all models, summarized in Table 8, averaged over all data sizes and all MCAR rates.

Figure 3: Noisy-or.

Using the conditional probability distributions of the variables given their parents, we can easily calculate joint probability distributions F(U) using formula (1) and conditional probability distributions (CPD) F(XA|XB), where XA ⊆ U and XB ⊆ U \ XA. Recall that a CPD for a particular configuration xB of parent nodes XB can be computed as (note that all BNs considered in this paper satisfy the condition F(XB = xB) > 0 for all xB):

F(XA | XB = xB) = F(XA, XB = xB) / F(XB = xB)   (5)

The Kullback-Leibler Distance (KLD) of two conditional probability distributions F(XA|XB) and G(XA|XB) defined on the same state space is computed as the weighted average KLD of F(XA|XB = xB) and G(XA|XB = xB) over all configurations xB:

D(F(XA|XB) || G(XA|XB)) = ∑_{xB} F(XB = xB) ∑_{xA} F(XA = xA | XB = xB) log [ F(XA = xA | XB = xB) / G(XA = xA | XB = xB) ]
                        = ∑_{xA,xB} F(XA = xA, XB = xB) log [ F(XA = xA | XB = xB) / G(XA = xA | XB = xB) ]

We will use the KLD of conditional probability distributions estimated from the true data to support our arguments when we explain the results.

4.2 Experiments

We have performed experiments on different Belief Noisy-Or (BNO) models with their CPTs defined in Table 4, where α ∈ {0, 0.2}, and the CPT of a node which has no parent is uniform, i.e., F(Xi = 1|{}) = 0.5, F(Xi = 0|{}) = 0.5. The experiments have been repeated ten times on ten different datasets generated from the BNO models with different MCAR rates as specified in Table 9 in Appendix A. In all Figures, we will denote additional edges by blue dashed lines, missing edges by red lines, and edges with different arrows by orange lines.

4.2.1 Model N1

The true N1 model is shown in Figure 11 in Appendix A. We use this model as an example of a simple model with a chain structure. This model is motivated by some applications, e.g., from telecommunications. Let us summarize the results for this model:
– All algorithms learn the true structure when α ≠ 0, for all data sizes and all MCAR rates.
– The algorithms A2, A3, and A4 learn structures different from the true model in some cases with α = 0, MCAR rate 15%, and data size of 1,000. For example, A3 and A4 learn an additional edge C2 → C4; also, A2 learns C4 → C6 instead of C5 → C6.
– Using equation (5), we calculate F(C6|C5) and F(C6|C4) from the true model N1. We have found that their KLD value (computed using equation (4.1)) is very small, it is only 0.001.
Also, the chi-square test of independence, whose p-value is smaller than 0.0001, reveals that there is a strong dependence between C6 and C4 in addition to the relationship between C6 and C5, already explicitly present in the true model. Also, the BIC of the learned structure is -2,252.93 and the BIC of the true model from the same dataset is -2,255.64 (we report the BIC value for one of the ten datasets since the results for the remaining nine are similar). This can be explained by the deterministic conditional distribution F(C6|C5 = 0). For these reasons, we can accept that A2 has learned C4 → C6 instead of C5 → C6.

4.2.2 Model N2

The true N2 model is shown on the right-hand side of Figure 11 in Appendix A. We use this model as an example of a model more complicated than the previous model N1. This model is motivated by some applications, e.g., by computer networks. We summarize the results of the experiments performed with this model:
– Figure 4 represents the Structure Hamming Distance (SHD) for all tested MCAR rates and models with α = 0. We can observe that, as expected, the algorithms' performance improves with increasing data size.
– We can see that A2 on average has a smaller SHD to the true model than the other algorithms.
– In Figure 5, we compare the models learned from the datasets of size 5,000 with MCAR rate 10% using all four algorithms. We can see that A2 and A3 have the same SHD, but they differ in that A3 has a missing edge C4 → C5 while A2 has an additional edge C3 → C6. This additional edge can be explained by observing that there is a chain of nodes C3 → C4 → C6 through which the state 0 is propagated because α = 0.
The results show that A2 has better results on average (i.e., the distance to the true model is smaller) than other algorithms. Next we discuss in more detail one simpler example of a BN2O. The structure of this model is shown in Figure 12 in Appendix A. – Figure 6 represents the SHD of all learned models grouped by MCAR rates with models where α = 0. – The learned models from the dataset of size 5,000, MCAR rate 10% and α = 0 using all algorithms are shown in Figure 7. We can see that A2 performs better (i.e., the SHD distance to the true model is smaller) than other algorithms. – Note the additional edge C7 → C8 learned by A2 for most datasets. The argument supporting this additional edge is similar to that valid for the additional edge in the N2 model. Again, we can see that KLD of F(C7,C8|C2,C5) and the product F(C7|C2,C5).F(C8|C2,C5) is very small; it is only 0.002. Also, the chi-square test of independence of C7 Learning the Structure of Bayesian… Informatica 47 (2023) 83–96 and C8 has the p-value smaller than 0.0001 and there is always only a very small difference between BIC of the model with the extra edge and the true model; for example„ BIC of the model with the extra edge is -7,331.8 while the BIC of the true model is -7,338.5 for one of the en generated datasets. 3.0 2.0 o o +* * * + * o +* * 1000 2000 A1 A2 A3 A4 o +* * 0.0 1.0 Value o + ** 3000 4000 91 – In the experiments with models having α > 0 no additional edge has been learned and the true model is learned successfully when the data size is 2,500 or larger for all MCAR rates. 5000 4.2.4 A large BN2O model 2.0 o o o +* * +* * o 1.0 Value 3.0 Dataset Size o * A1 A2 A3 A4 * + * 0.0 * + * 1000 + * 2000 3000 4000 5000 2.0 0.0 1.0 Value 3.0 Dataset Size o o o +* * +* * o * + * 1000 o * + * * + * A1 A2 A3 A4 2000 3000 4000 5000 Dataset Size Figure 4: The Structural Hamming Distance of the resulting models of the structure learning algorithms to the true model (with α = 0) using the data generated from the N2 model (the true model is presented in Figure 11) using the average over ten experiments for different data sizes and for the MCAR rates of 5%, 10%, and 15%, respectively. We have performed experiments with a model shown in Figure 13 in Appendix A. This model consists of 25 variables; 14 in the first layer and 11 in the second layer. All algorithms required a data size of more than 3,000 to give a good performance. With the data size of 3,000 (and the MCAR rate of 10%) the recorded SHD of algorithms A1, A2, A3, and A4 still have not been very good – namely, 14.6, 10.2, 10, and 9.8, respectively. With the data size of 5000 and 7500 (and the MCAR rate of 10%) the recorded average SHD of A1, A2, A3, and A4 are already much better – namely, 7.2, 4.2, 4.3, and 5.1, respectively. See Figure 9 for the learned models. With the data size of 10,000 we already get the true models except for the additional edges in the case of A2, as discussed in Section 4.2.3. 92 Informatica 47 (2023) 83–96 I. Salman et al. C3 C3 C3 C3 C1 C2 C1 C2 C1 C2 C1 C2 C4 C6 C4 C6 C4 C6 C4 C6 C5 C5 C5 C5 Figure 5: Models learned by A1, A2, A3, and A4, respectively, for most of ten datasets generated from the true N2 model (presented in Figure 11) (for α = 0) with the MCAR rate 10 and the data size of 5,000. 
Figure 6: The Structural Hamming Distance to the true models of the resulting models of the structure learning algorithms using data generated from the BN2O model (the true model is presented in Figure 12) (with α = 0), averaged over all data sizes for MCAR rates of 5%, 10%, and 15%, respectively.

Figure 7: Models learned by A1, A2, A3, and A4, respectively, using data generated for most of ten datasets generated from the true BN2O model (presented in Figure 12) (for α = 0) with the MCAR rate 10% and the data size of 5,000.

Figure 8: Results of the structure learning algorithms using data generated from the BN2O model (with α = 0) with the data size of 3,000 and averaged over all tested MCAR rates. The plot on the LHS displays the average number of additional edges and the plot on the RHS displays the average number of missing edges.

Figure 9: Models learned by A1, A2, A3, and A4, respectively, using the data generated from the large BN2O model consisting of 25 variables (for α = 0) with the MCAR rate of 10% and the data size of 7,500 (true model is presented in Figure 13).

5 Conclusion

In this paper, we provide an approach to learning the optimal BN structure from incomplete data by adapting the considerations of [8]. This adaptation imputes missing values using product mixtures learned by the EM algorithm [11]. We have shown that the sequence of log-likelihood values generated by the E-Step and M-Step of the EM algorithm is non-decreasing and the algorithm converges. Lemma 2.1 helps us reduce the collection of candidate parent sets for a variable, which can speed up the learning algorithm. We have performed experiments on incomplete data generated from different types of BN models to compare the proposed Algorithm A2 with other algorithms, namely with A1 [8] and soft and hard EM [3], referred to as A3 and A4, respectively. In our comparisons, we use the Structure Hamming Distance of CPDAGs of learned DAGs to CPDAGs of the original models. Such comparisons have been undertaken on (a) general Bayesian networks and (b) Belief Noisy-Or [4] (BNO) models with partially deterministic and nondeterministic conditional probability distributions. The experiments with models of type (b) are motivated by uncertain relationships in Bayesian networks, which are common in practical applications of BNs. We have obtained the following results in detailed simulation studies.

(a) General BN models:
– The A2 algorithm appears to be the best choice from among the tested algorithms for learning the structure of BNs from any incomplete data, whatever the data size and the MCAR rate are.
– In most scenarios corresponding to different data sizes and MCAR rates, Algorithm A2 is significantly better than the other algorithms, and in no scenario is it significantly worse than any other algorithm according to the Wilcoxon test.
(b) BNO models:
– A2 is able to recover all true edges in the tested models except for the N1 model (shown in Figure 11) at size 1,000 and a missing rate of 15%. The different learned structure of the model N1 is acceptable because the chi-square (X²) test and the Kullback-Leibler distance (KLD) between the related conditional probabilities suggest there is a high degree of relationship between the connected variables.
– A2 has learned an additional edge in the case of Models N2 (shown in Figure 11) and BN2O (shown in Figure 12). The additional edge is acceptable since the X² test and KLD suggest there is a high degree of relationship between these variables. We have seen that the BIC of the learned structure is almost equal to the BIC of the true model. For example, the BIC of the model learned using A2 (shown in Figure 7) is -7,331.8 and the BIC of the true model is -7,338.5. Similar behavior has been observed in other BNO models.
– A2 is always able to recover all edges while the other algorithms are not.
– For large BN2O models, all algorithms require data sizes larger than 3,000 to have a good performance; e.g., for the BN2O with 25 variables, A2 needs at least 10,000 data records to learn the correct model (with the exception of additional edges as discussed in Section 4.2.3).

We have empirically shown that our Algorithm A2 behaves better than the other tested algorithms on several studied BNs and in different scenarios. Based on these experiments, we can recommend this algorithm for practitioners that use BNs or BNOs with incomplete data, especially in the medical domain, where BNO could be used to study the hidden relationship between symptoms and diseases. An interesting topic for future research might be learning the structure of large BN2O networks from incomplete data and optimizing the number of components in the EM-Mixture.

Acknowledgement
This work was supported by Student Grant CTU SGS20/132/OHK4/2T/14.

References
[1] Nir Friedman, Dan Geiger, and Moises Goldszmidt. Bayesian network classifiers. Machine Learning, 20(2-3):131-163, 1997. https://doi.org/10.1023/a:1007465528199
[2] Cassio P de Campos, Mauro Scanagatta, Giorgio Corani, and Marco Zaffalon. Entropy-based pruning for learning Bayesian networks using BIC. Artificial Intelligence, 260:42-50, 2018. https://doi.org/10.1016/j.artint.2018.04.002
[3] Andrea Ruggieri, Francesco Stranieri, Fabio Stella, and Marco Scutari. Hard and soft EM in Bayesian network learning from incomplete data. Algorithms, 13(12):329, 2020. https://doi.org/10.3390/a13120329
[4] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
[5] Nir Friedman and Moises Goldszmidt. Learning Bayesian networks with local structure. In Learning in graphical models, pages 421-459. Springer, 1998. https://doi.org/10.1007/978-94-011-5014-9_15
[6] Zhifa Liu, Brandon Malone, and Changhe Yuan. Empirical evaluation of scoring functions for Bayesian network model selection. In Proceedings of the Ninth Annual MCBIOS Conference, Dealing with the Omics Data Deluge, Oxford, MS, USA, 2012. BMC Bioinformatics. https://doi.org/10.1186/1471-2105-13-S15-S14
[7] Poh Choo Song, Hui Yee Chong, Hong Choon Ong, and Sing Yan Looi. A model of Bayesian network analysis of the factors affecting student's higher level study decision: The private institution case.
Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 8(2):105-109, 2016.
[8] Cassio P de Campos, Zhi Zeng, and Qiang Ji. Structure learning of Bayesian networks using constraints. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pages 113-120, New York, NY, USA, 2009. Association for Computing Machinery. https://doi.org/10.1145/1553374.1553389
[9] James Cussens. Bayesian network learning with cutting planes. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 153-160, Arlington, Virginia, USA, 2011. AUAI Press.
[10] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39:1-38, 1977. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
[11] Jirí Grim, Jan Hora, Pavel Boček, Petr Somol, and Pavel Pudil. Statistical model of the 2001 Czech census for interactive presentation. Journal of Official Statistics, 26(4):673-694, 2010.
[12] J. Grim and P. Boček. Statistical model of Prague households for interactive presentation of census data. In SoftStat 95. Advances in Statistical Software 5. Conference on the Scientific Use of Statistical Software, Heidelberg, DE, 1996.
[13] Luca Scrucca, Michael Fop, T. Brendan Murphy, and Adrian E. Raftery. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1):289-317, 2016.
[14] Fred Glover. Tabu search - part I. ORSA Journal on Computing, 1(3):190-206, 1989.
[15] Marco Scutari and Jean-Baptiste Denis. Bayesian Networks: with Examples in R. Chapman & Hall, Boca Raton, 2014. https://doi.org/10.1111/biom.12856
[16] M Neuhäuser. Wilcoxon-Mann-Whitney Test. In International Encyclopedia of Statistical Science. Springer Berlin Heidelberg, 2011.
[17] Michael A Shwe, Blackford Middleton, David E Heckerman, Max Henrion, Eric J Horvitz, Harold P Lehmann, and Gregory F Cooper. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. Methods of Information in Medicine, 30(04):241-255, 1991. https://doi.org/10.1055/s-0038-1634846
[18] B Abramson, J Brown, W Edwards, Allan Murphy, and Robert L Winkler. Hailfinder: A Bayesian system for forecasting severe weather. International Journal of Forecasting, 12(1):57-71, 1996 (Probability Judgmental Forecasting). https://doi.org/10.1016/0169-2070(95)00664-8
[19] A Philip Dawid. Prequential analysis, stochastic complexity and Bayesian inference. Bayesian Statistics, 4:109-125, 1992.

A Appendix A. Simulation Scenarios

This Appendix provides an inclusive list of all experiments in the simulation study described in Sections 3.2 and 4.2, organized by their main characteristics in Tables 8 and 9, respectively. The number of components in each experiment was selected based on the number of variables in the datasets. The true models mentioned in Table 8 are shown in Figure 10. The true models mentioned in Table 9 are shown in Figures 11 and 12.

Table 8: Description of the key factors of all BN experiments in the simulation study.
Weather [18]: MCAR rate 10, 10 replicates, sample sizes 100, 500, 1000, 5000, 10000
Weather [18]: MCAR rate 25, 10 replicates, sample sizes 100, 500, 1000, 5000, 10000
Weather [18]: MCAR rate 50, 10 replicates, sample sizes 100, 500, 1000, 5000, 10000, 13000
Child [19]: MCAR rate 10, 10 replicates, sample sizes 1000, 2000, 3000, 5000
Child [19]: MCAR rate 15, 10 replicates, sample sizes 1000, 2000, 3000, 5000
Child [19]: MCAR rate 50, 10 replicates, sample sizes 1000, 2000, 3000, 5000
M2 (Figure 10): MCAR rates 5, 10, 15, 25, 10 replicates each, sample sizes 500, 1000, 1500, 2500, 5000
M1 (Figure 10): MCAR rates 10, 20, 35, 50, 10 replicates each, sample sizes 500, 1500, 2500, 5000, 10000, 13000

Table 9: Description of the key factors of all Belief Noisy-OR experiments in the simulation study (true models are presented in Figures 11 and 12).
BN2O: MCAR rates 5, 10, 15, 10 replicates each, sample sizes 1000, 1500, 2500, 5000
N1: MCAR rates 5, 10, 15, 10 replicates each, sample sizes 1000, 1500, 2500, 5000
N2: MCAR rates 5, 10, 15, 10 replicates each, sample sizes 1000, 1500, 2500, 5000
large BN2O: MCAR rate 10, 10 replicates, sample sizes 5000, 7500

Figure 10: M1 and M2 true models, respectively.

Figure 11: N1 and N2 true models, respectively. Their marginal probability distributions are summarized in Tables 5 and 6.

Figure 12: BN2O true model. Its marginal probability distributions are summarized in Table 7.

Figure 13: Example of a large BN2O model with 25 variables (whose learned models are presented in Figure 9).

https://doi.org/10.31449/inf.v47i1.4297 Informatica 47 (2023) 97-108 97 A Prediction Model for Student Academic Performance Using Machine Learning Harjinder Kaur1, Tarandeep Kaur1, Rachit Garg2 1 School of Computer Applications, Lovely Professional University, Phagwara, 144401, India 2 COS, School of Computer Science and Engineering, Lovely Professional University, Phagwara, 144401, India. E-mail: Harjinder.12962@lpu.co.in, Tarandeep.24836@lpu.co.in, rachit.garg@lpu.co.in Keywords: academic performance, decision tree, education data mining, ensemble model, naïve bayes, performance prediction Received: July 15, 2022 Abstract: Academic data mining impacts a large number of educational institutions significantly, playing a prime role in accumulating, studying, and analyzing academic data. The accumulated academic data can be processed and analyzed for various purposes. It can be used for predicting student academic performance and thereby broadening the retention rate of academic institutions. The prediction of students' academic performance at the initial stage helps the students to identify their weak subjects so that they can focus more on these deficient subjects and improve their academic performance. Currently, numerous machine learning techniques are being used by academic institutions to extract, analyze and predict students' academic performance and identify fast and slow learners. This paper proposes an ensemble model, using the voting method, for early prediction of student academic performance. The predicted results are further utilized by the poor performers to concentrate more on their deficit courses. Accordingly, the instructors can focus on creating and implementing novel strategies or amending the existing pedagogical tools and approaches to aid the slow learners in improving their performance.
The proposed model has been tested on the academic data of an educational institution using the RapidMiner tool. The results depict how the number of E grades proportionally affects the academic performance of the students. The proposed ensemble model generates the predicted results with an accuracy of 90.83%. Povzetek: Predstavljena je metoda strojnega učenja za napovedovanje učnega uspeha.

1 Introduction

Academic Data Mining (ADM) has attracted considerable interest in recent years. The need for the analysis and assessment of the factors impacting the academic performance of students has amplified the demand for Academic Data Mining (ADM) or Educational Data Mining (EDM) [1]. Significantly, such factors can include student academic performance measured in terms of final grades obtained, course attendance, mid-assessment marks, etc. [2]. ADM plays a pivotal role in analyzing student performance based on the above-said factors and thereby classifying students into fast and slow learners. Additionally, ADM can also aid in providing subtle suggestions and recommendations for both the instructors and the students for improving their performance. This can involve processes such as academic performance prediction and academic performance recommendation. Both processes are essential for every educational institution, as their reputation is centered upon the academic accomplishments of students [3]. The primary goal of academic performance prediction of learners is the identification of students at risk at the initial stage of their career. This identification helps the instructor to analyze the factors affecting the performance such that corrective actions can be taken for the students at risk of lower achievement levels. Moreover, the timely analysis of weak performers benefits the academic institutions in increasing their retention rate [4]. The academic performance of students is predicted using different supervised learning techniques such as classification and prediction. Learning Analytics (LA) plays a very significant role in the field of education. The motivation for using LA by academic institutions is to analyze the patterns obtained from the educational data after prediction. So, after the academic performance prediction, LA in association with ADM is used to generate effective results that lead to the categorization of different types of students [5]. This research proposes a model that serves as an alarming structure for educational organizations. The proposed model can be used by the students to discover and concentrate on their disconcerting subjects, while the faculties can focus on improving their learning strategies towards such students. Currently, many machine learning algorithms are available for envisaging student educational performance and ADM [6, 7]. The proposed model is also an ensemble machine learning-based model that predicts the student's academic performance using an ensemble of machine learning algorithms: Decision Tree, Naïve Bayes, and K-Nearest Neighbor. For performance prediction, the records are collected from the academic institution and then preprocessed to eliminate anomalies, so that only data useful for the analysis remains. The cleaned data is then fed to the model, which produces the predicted results.
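As an illustration of the kind of voting ensemble described here, scikit-learn's VotingClassifier can combine the three named base learners. This is a sketch only: the attribute names and the synthetic records below are stand-ins, not the study's actual data or pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for preprocessed academic records (hypothetical attributes).
rng = np.random.default_rng(0)
n = 400
records = pd.DataFrame({
    "attendance": rng.uniform(40, 100, n),
    "mid_term_marks": rng.uniform(0, 50, n),
    "e_grade_count": rng.integers(0, 6, n),
})
# A crude label: a student is flagged "at risk" (slow learner) when E grades pile up and marks are low.
records["at_risk"] = ((records["e_grade_count"] >= 3) & (records["mid_term_marks"] < 25)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    records.drop(columns="at_risk"), records["at_risk"], test_size=0.2, random_state=42
)

# Majority-vote ensemble of the three base learners named in the text.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```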
1.1 Motivation for the work Currently, the majority of the academic institutions face challenges related to the decreasing student academic performance and thereby rising student dropout ratio. This poses an alarming and stakecompromising situation for the academic institutions. They consistently struggle to maintain the retention rate of the students. Similarly, the decline in the student academic performance impacts a student physiologically, economically and socially. Some students get demotivated and resultantly think of discontinuing their degree. This leads to the increase in the dropout rate for the academic institution. Such circumstances are challenging for the teaching fraternity as well since the failure or decrease in the student academic performance puts a question mark on overall conduct of the teacher. It raises concerns on the teaching capabilities and pedagogical approach followed by him/her. The proposed ensemble prediction model has been developed considering such circumstances. It helps in the reduction of drop-out rates and results in improving the retention rate of students. It provides the solution for the increase drop out issue faced by institutions by predicting the academic performance of the students precisely and proficiently. The proposed model has been trained using the historic data of students and then tested using the testing dataset. The predicted results classify the students into slow learners and fast learners. The proposed model serves as an alarming system for slow learners, the students who are at academic risk at the early stage of their carrier along with the courses affecting their performance. The early identification of students at academic risk helps the instructors to create new pedagogies, strategies and special academic counselling sessions for the weak students. Additionally, such initiatives helps the slow learners to concentrate more on their weak areas so that they can perform well in their academics and thereby improvising their performance. The improvement in the academic performance at early stage helps the slow learners to complete their degree on time that further improves the retention rate which further improves the repute of academic institutions. Overall, the proposed model is useful for academic stakeholders including learners, instructors/ teachers and H. Kaur et al. educational institutions. It benefits the learners in their self-assessment on academic background by providing the reasons which are responsible for their academic downfall. The model assist the instructors to keep track of the academic growth of the students and helps them to provide special attention towards the slow learners. The predicted results of the proposed model helps the educational institutions to devise new strategies and steps for promoting and educating slow learners for their performance improvement thereby increasing the retention rate of the institutions. The rest of the paper has been divided into 5 sections. Section 2 lists a tabular representation of the existing techniques used for predicting students’ academic performance. The proposed model has been elaborately discussed along with its structure and working in Section 3. Section 4 covers the empirical analysis of the proposed model on the collected data. The last fragment in the paper concludes with a brief description of why the ensemble approach has been preferred for predicting students’ academic performance. 
It also provides an insight into the future extensions that can be made to the proposed model.

2 Literature review

The existing educational research shows that the intersection of academic data and machine learning techniques is advantageous for carrying out interdisciplinary work [8]. Research on educational data helps in the identification and selection of the various factors behind observed academic results. The implementation of various machine learning techniques on collected academic records can help in developing dynamic alarming systems. Such systems are beneficial for both instructors/tutors and learners to work on their weak areas [9, 10]. Subsequently, the learners can improve their academic performance based on the feedback of the predicted results of such alarming systems, so that they can complete their respective degrees on time and with minimum dropouts or backlogs. Table 1 presents the review of the literature along with the techniques used and the objective of each model, and Figure 1 shows the categorization of the different prediction models based on machine learning.

Table 1: Existing academic performance prediction models (reference — machine learning technique(s) used — core objective).
[1] Decision Trees, Support Vector Machines, Naive Bayes, Bagged Trees, and Boosted Trees — Early segmentation of students based upon their performance in the first year, which helps in achieving better results during course completion.
[3] Decision Trees — To categorize students based upon their performance.
[4] Logistic Regression, Neural Networks, Random Forests — To identify the various challenges faced by students in their first educational year based upon student registration data.
[5] Decision Trees, Rule and Fuzzy Rule Induction Methods, and Neural Networks — To predict the marks of university students in their final exams.
[11] Logistic/Linear Regression, Matrix Factorization — To use educational data for an intelligent tutoring system.
[12] Linear Regression, Neural Networks, Support Vector Machines — To predict the student score based upon mid-term marks.
[13] Neural Networks, Random Forests, and Decision Tree — To predict the academic performance of first-year students.
[14] Linear Regression, Neural Networks, Support Vector Machines, Decision Trees, Naive Bayes, K-Nearest Neighbor — To recommend courses based upon existing data, which helps in improving the academic performance of a student.
[15] Decision Tree, Gradient Boosting, and Naïve Bayes — To identify weak students and provide special counselling for their betterment.
[16] SVM and Naïve Bayes — To predict students' academic performance using Naïve Bayes and compare the predicted results with the results generated by SVM.
[17] K-Nearest Neighbor, Naïve Bayes, Decision Tree, and Logistic Regression — To predict students' academic performance along with the factors affecting their performance.
[18, 19] Decision Tree — To assess students' academic performance using a decision tree; the predicted results were used to provide recommendations to weak students so that they can improve their performance, which lowers the failure rate.
[20] Naïve Bayes, Neural Network, and Decision Tree — To use various data mining techniques to predict and analyse the academic performance of students from academic data made available by a participating forum.
[21, 22] Random Forest, Neural Networks, SVMs, and Regression Techniques — EDM was used to identify weak students based upon their performance and to identify the various factors responsible for deteriorating students' academic performance.

Figure 1: Categorization of existing prediction models based on the machine learning techniques used (decision trees, support vector machines, naive Bayes, neural networks, random forests, bagged and boosted trees, linear regression, k-nearest neighbour, and logistic regression).

3 Proposed ensemble model

The primary goal of creating an ensemble model is to produce more accurate results than those produced by an individual classifier. The proposed model uses an ensemble of heterogeneous classifiers. The ensemble model proposed here accepts the output from multiple classifiers, namely a decision tree, Naïve Bayes, and K-NN. The proposed ensemble combines the output of the heterogeneous classifiers using a voting approach, which produces the final prediction results. The ensemble approach is effective only if the selected classifiers are complementary, i.e., they do not always agree on the same decision. Figure 2 depicts the flow of the ensemble method.

Figure 2: Basic ensemble approach for prediction.

The proposed ensemble model classifies students based on their academic performance, considering their marks in the courses as well as their attendance in each course. The data for classification has been collected from sources such as a Google form and a designed interface. Certain attributes contain irrelevant values, such as incomplete data, duplicate data, and naming/identification problems, and hence have no role in the classification process. Thus, such irrelevant attributes were removed from the classification process; otherwise, their use could have increased the classification errors and the complexity of the selected algorithms. Conclusively, this helped in making the predictions more accurate. The proposed ensemble model has been designed to predict student academic performance using an ensemble of machine learning algorithms. The primary objective of designing an ensemble model is that every selected classifier must be complementary to the others in the context of a judgment, so that further accuracy can be achieved [23]. The model intends to compute the student academic performance (in terms of cumulative grade points) and achieve an early separation of learners, segregating them into slow and fast learners based upon their educational performance.
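As an illustration of the voting ensemble just described, the following sketch builds a heterogeneous majority-vote ensemble of Decision Tree, Naïve Bayes, and K-NN classifiers with scikit-learn; the synthetic data and the 70/30 split stand in for the institutional records and the RapidMiner operators actually used in the paper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for 400 student records with academic attributes (marks, attendance, E grades, ...).
# y is the performance class: 0 = fast learner, 1 = slow learner.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Hard (majority) voting over three complementary base classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier()),
        ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Test accuracy:", ensemble.score(X_test, y_test))
```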
3.1 Working of the proposed ensemble model

When it comes to predicting student academic performance, a single classification model might not produce the appropriate outcome. Moreover, single classification models suffer from high variance [24, 25]. In the proposed ensemble approach, the output of multiple models is combined, which further enhances the overall accuracy of the prediction results. There are several ensemble approaches, such as bagging, boosting, stacking, and voting, each having its pros and cons. In the proposed model, the voting technique has been used, and the prediction results are produced by combining the output of multiple classifiers. The results generated by the voting approach are better than those of a single classifier because, in voting, the decision depends upon the majority vote [26]. The voting approach was also chosen because it produces predicted results with lower variance than a single classification model [27, 28].

The students are the key component of the proposed ensemble model, as they provide their academic details as input. The academic details comprise their courses, marks/grades in each course, and attendance in the individual courses, as these academic parameters are considered the crucial factors for measuring the academic performance of students. An interface has been designed to collect the academic details of the students used for model testing. The interface supports heterogeneous devices: learners can provide their academic inputs using their smartphones, laptops, or desktops. The students input their educational details through the designed student interface. This student academic data is stored in an academic database and is the core asset for the prediction process. The stored data forms the student records and is pre-processed before being used to train the proposed model. During the pre-processing stage, the academic records are integrated, followed by checks for inconsistencies such as duplicates and missing values. Consequently, the pre-processing stage generates the refined data, which is then used to train the proposed ensemble model. In the proposed ensemble model, the training dataset is used for the generation of rules, which are used for prediction, as shown in Figure 3. The testing dataset is applied to the constructed ensemble model to obtain the predicted academic performance based upon the rules generated from the training dataset.

Figure 3: Proposed ensemble model.

The predicted results of the model are beneficial for both the instructor and the learner. They enable the instructor to scrutinize the students' academic results and derive their performance from them, which can further be used to take novel strategic actions for improving the performance of slow learners. Concomitantly, this helps in recognizing students at academic risk at an early stage, which helps in augmenting the student retention rate and the on-time completion of the degree. Also, the predicted academic performance is used as feedback by the students.

3.2 Mathematical formulation and proposed algorithm

The proposed model is based on certain assumptions, which are as follows:
• The results of the proposed ensemble model are used by 2nd-semester students for further course recommendations, because in the majority of universities the course-selection option starts from the second year onwards.
• The number of subjects considered for the calculation of the CGPA is 8.
• The total marks of each course are inclusive of the attendance marks.
• For predicting the student academic performance, the grades considered range from A+ to E.
Analytically, the proposed algorithm helps to categorize learners into strong and weak learners. The differentiation identifies the weak learners and also the courses in which they have underperformed. Subsequently, this helps the weak learners to concentrate more on the subjects in which they are lagging and thereby improve their performance. Identification of weak performers at an early stage guides them to perform well in their end-term exams. Mathematically, in order to categorize the students, their CGPA is calculated by considering their grade points and the credit for each course. For calculating the CGPA, the student's grade points are first computed from the marks obtained in each course, as shown in Table 2.

Table 2: Grade as per marks range.
Range of Marks | Grade Point | Grade
90–100 | 9.0–10.0 | A+
80–89 | 8.0–8.9 | A
70–79 | 7.0–7.9 | B+
60–69 | 6.0–6.9 | B
50–59 | 5.0–5.9 | C
40–49 | 4.0–4.9 | D
< 40 | 0.0–3.9 | E

The CGPA of a student is calculated by considering the grade points of each course; in the proposed model, grade points are on a 10-point scale.

Objective function: Map(Stu_i, Cou_j, MC_ij) → cgpa_i    (1)

where:
Stu: students;
Cou: courses;
MC_ij: marks obtained by the i-th student in the j-th course;
CGPA: Cumulative Grade Point Average;
i: index of students, i ∈ S, where S = {1 ≤ i ≤ n} is the set of students and n is the maximum number of students;
j: index of courses, j ∈ R, where R = {1 ≤ j ≤ m} is the set of courses and m is the maximum number of courses.

For accomplishing the objective function, a map function has been devised. The mapping function predicts the performance of the students by calculating their CGPA from the academic details given by the students. Here, Map is the function that maps the i-th student to the corresponding CGPA by considering the courses and the marks in each course. The general formula for the calculation of the CGPA is given in Eq. (2):

CGPA = Σ(G × CR) / Σ CR    (2)

where CGPA is the Cumulative Grade Point Average, CR is the credit of a course, and G is the grade point obtained by the student in that course.

The proposed model is composed of a set Stu = {stu_1, stu_2, ..., stu_n} of n students, such that Stu = {stu_i | 1 ≤ i ≤ n}, and a set Cou = {cou_1, cou_2, ..., cou_m} of m different courses, such that Cou = {cou_j | 1 ≤ j ≤ m}. Let g_ij denote the grade points obtained by the i-th student in the j-th course. If cgpa_i is the CGPA of the i-th student, it can be obtained by the matrix formulation in Eq. (3):

[cgpa_1, cgpa_2, ..., cgpa_n]^T = (1 / Σ_{j=1}^{m} cr_j) · [g_11 g_12 ... g_1m; g_21 g_22 ... g_2m; ...; g_n1 g_n2 ... g_nm] · [cr_1, cr_2, ..., cr_m]^T    (3)

where cr_j denotes the credits corresponding to the j-th course, for all 1 ≤ j ≤ m.
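As a worked illustration of Table 2 and Eqs. (1)–(3), the following sketch maps marks to grade points on the 10-point scale and computes the credit-weighted CGPA of Eq. (2); the course names, marks, and credits below are made up:

```python
def grade_and_point(marks: float):
    """Map marks (0-100) to (grade, grade point) following Table 2 (10-point scale)."""
    bands = [(90, "A+"), (80, "A"), (70, "B+"), (60, "B"), (50, "C"), (40, "D"), (0, "E")]
    grade = next(g for lo, g in bands if marks >= lo)
    return grade, round(marks / 10.0, 1)       # e.g. 87 -> ("A", 8.7)

def cgpa(marks_by_course, credits_by_course):
    """Eq. (2): CGPA = sum(G * CR) / sum(CR), i.e. the credit-weighted grade-point average."""
    total = sum(grade_and_point(marks_by_course[c])[1] * credits_by_course[c]
                for c in marks_by_course)
    return total / sum(credits_by_course.values())

# Hypothetical record of one student across the 8 courses assumed by the model.
marks = {"C1": 92, "C2": 78, "C3": 65, "C4": 85, "C5": 55, "C6": 71, "C7": 38, "C8": 80}
credits = {c: 4 for c in marks}
print(round(cgpa(marks, credits), 2))          # 7.05; compared against a threshold to label
                                               # the student as a fast or slow learner
```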
The proposed algorithm is as follows:

Objective function: mapping of students to their CGPA by considering their program courses and the marks in each course, which affect the student's academic performance; it consists of {Students, Courses, Marks in each course}.
Input: student academic details.
Output: student categorization into weak and strong learners; special inputs to weak students for improving their performance.
1. Perform preprocessing of the collected data.
2. Use the pre-processed data as the training dataset.
3. The training dataset is used to train the model and generate the rules.
4. The testing data is used for performance prediction using the trained model; {stu_i, cou_j, MC_ij} is mapped to cgpa_i using Eq. (1).
5. (a) Eq. (2) specifies the general formula for the calculation of the CGPA. (b) cgpa_i, the CGPA of an individual student, is computed using Eq. (3).
6. The calculated CGPA helps in the identification of weak and strong learners.
7. The predicted results are used by the learners (to improve their performance) and by the instructors (to provide suggestive measures to poor performers).

4 Results

The experimental results have been obtained using data from the department of computer science of an academic institution. The dataset contains 400 records of current students belonging to different sections of the computer science department. The dataset has been divided using the split operator, where 70% of the data is used for training the model and the remaining 30% is used for testing the ensemble model. The major attributes considered for analyzing the performance are the attendance in each course, the grade obtained in each course, the overall CGPA of the student, and the number of pending E grades. Figure 4 shows the results generated by the proposed ensemble model.

Figure 4: Results generated by the ensemble method (predicted performance by the proposed ensemble model: pred. 0/true 0 = 82, pred. 0/true 1 = 10, pred. 1/true 0 = 1, pred. 1/true 1 = 27).

The performance vector shown in Table 3 proves the accuracy of the ensemble method using a vote operator that takes the majority vote from the base learners for predicting the results. The ensemble method has shown an accuracy of 90.83%. In the confusion matrix, 0 represents good performers and 1 denotes poor performers. For fast learners, 82 instances are correctly identified whereas 10 are incorrectly identified. Similarly, for poor performers, 27 instances are correctly identified whereas 1 is incorrectly identified.

Table 3: Performance vector (ensemble method).
 | true 0 | true 1 | class precision
pred. 0 | 82 | 10 | 89.13%
pred. 1 | 1 | 27 | 96.43%
class recall | 98.80% | 72.97% |

Figure 5: Relationship between actual and predicted performance.

The scattered 3D plot view of the relationship between the actual and predicted results generated by the proposed ensemble model is illustrated in Figure 5, where the x-axis represents the RegdNo and the value column signifies performance in terms of slow (1) and fast (0) learners.

Figure 6: Actual and predicted results based on E grades.

Figure 7: Predicted performance based upon pending E grades.

The actual and predicted results based on E grades are depicted in Figures 6 and 7.
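The figures reported in Table 3 can be reproduced directly from the confusion-matrix counts of Figure 4 (82, 10, 1 and 27 over the 120 test records); a quick verification:

```python
# Confusion-matrix counts from Figure 4 (class 0 = fast learner, class 1 = slow learner).
pred0_true0, pred0_true1 = 82, 10
pred1_true0, pred1_true1 = 1, 27
total = pred0_true0 + pred0_true1 + pred1_true0 + pred1_true1   # 120 test records (30% of 400)

accuracy    = (pred0_true0 + pred1_true1) / total                # 109/120 -> 90.83%
precision_0 = pred0_true0 / (pred0_true0 + pred0_true1)          # 82/92   -> 89.13%
precision_1 = pred1_true1 / (pred1_true0 + pred1_true1)          # 27/28   -> 96.43%
recall_0    = pred0_true0 / (pred0_true0 + pred1_true0)          # 82/83   -> 98.80%
recall_1    = pred1_true1 / (pred0_true1 + pred1_true1)          # 27/37   -> 72.97%
print(f"{accuracy:.2%} {precision_0:.2%} {precision_1:.2%} {recall_0:.2%} {recall_1:.2%}")
```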
Figure 8 illustrates the registration-wise predicted performance of students after the re-appear exams have been taken. The blue circles indicate good performance, signified by 0, whereas the green circles represent poor performance, denoted by 1. The results show that the more re-appears a student has, the more likely the student is to fall under the category of a poor performer. Therefore, corrective actions for such students need to be taken on time, by the student as well as by the instructor.

Figure 8: Predicted results considering the re-appear exams given.

5 Conclusion and future directions

Presently, academic institutions face difficulty in sustaining the retention rate of students. The task of maintaining the retention rate can only be achieved by reducing the drop-out ratio of students. A high student retention rate depends significantly on student academic performance. It therefore becomes highly important for academic institutions to predict student performance for subsequent sessions, so that the retention rate can be maintained and student performance can be improved. Also, the prediction of student academic performance at an early phase of the degree helps students to self-assess the reasons for their downfall, so that they can take corrective actions to improve on time. The model is also helpful for instructors, who can verify and revise their pedagogical approaches if required. A lot of research is being carried out to develop models for predicting student academic performance using academic data mining strategies. Various machine learning techniques have been used to develop such prediction models, which act as an aid for academic institutions. This paper proposes an ensemble model based on the Decision Tree, Naïve Bayes, and K-NN classification algorithms, catering to such problems. It helps in identifying weak learners by predicting their performance based upon historical academic data. The model has been implemented on the gathered dataset and achieves an accuracy of 90.83%. The research work presented in this paper can be further extended to develop a recommender system that uses the performance prediction results and subsequently recommends course-specific elective courses to students. Such recommendations tend to augment student skills depending on their performance. Additionally, a recommender system can be developed that offers students interest-oriented or choice-driven suggestions regarding course selection, considering and mapping the student's previous performance along with the student's choice. Most research on the academic performance prediction of students considers direct factors (such as courses, marks in each course, attendance, and grades). The incorporation of indirect factors (such as physiological, behavioral, economic, and social factors) that affect student academic performance can be carried out further. Recently, several edtech companies have emerged during the COVID-19 era. Such companies are engaged in incorporating Information Technology (IT) and digital tools for student learning and engagement. The edtech companies are now using predictive analytics for mining student academic records, enrollment, attendance, class engagement, etc.
The edtech companies can use the prediction as well as recommendation models to help the students by suggesting the appropriate course based upon their predicted performance. Acknowledgement: Mohamed Alwanin would like to thank Deanship of Scientific Research at Majmaah University for supporting this work under Project No. R2022-###. The authors deeply acknowledge the Researchers Supporting Program (TUMA-Project-202114), AlMaarefa University, Riyadh, Saudi Arabia for supporting steps of this work. Funding Statement: Mohamed Alwanin like to thank Deanship of Scientific Research at Majmaah University for supporting this work under Project No. R-2022-###. This research was supported by Researchers Supporting Program (TUMA-Project-2021-14), AlMaarefa University, Riyadh, Saudi Arabia. Conflicts of Interest: Authors declare that there is no conflict of interest associated with this study. A Prediction Model for Student Academic Performance… References [1] V.L. Miguéi, A. Freitas, P.J.V. Garcia and A. Silva, "Early segmentation of students according to their academic performance: A predictive modelling approach," Decision Support System,vol. 6, no. 5, pp. 65-78, 2018. [2] S. J. Lakshmi and M. Thangaraj, "Recommender system for stimulating the learning skill of slow learner in higher educational institution using EDM," International Journal on Recent Technolofical Engineering, vol. 5, pp. 98-109, 2019. [3] D. T. Ha, P. T. T. Loan, C. N. Giap and N. T. L. Huong, "An empirical study for student academic performance prediction using machine learning techniques," International Journal of Computer Science and Information Security (IJCSIS), vol. 18, no. 3, pp. 75-82, 2020 [4] R. Umer, T. Susnjak, A. Mathrani and S. Suriadi,"On predicting academic performance with process mining in learning analytics," Journal of Resource Innovation and Teach Learnearning, vol. 78, pp. 155-168, 2017. [5] O.H.T. Lu, A.Y.Q. Huang, J.C.H. Huang, A.J.Q Lin, H. Ogata et al., "Applying learning analytics for the early prediction of students’ academic performance in blended learning," Educational Technological Socoety,vol. 55, pp. 111-123, 2018. [6] O. Viberg, M. Hatakka, O. Bälter, A. Mavroudi,"The current landscape of learning analytics in higher education,"Computers in Human Behavior, vol. 18, pp. 1001-1222, 2018. [7] M. S. B. M. Azmi and I. H. B. M. Paris, “Academic performance prediction based on voting technique,” in 2011 IEEE 3rd International Conference on Communication Software and Networks , Calcuta, India, pp. 24-27, 2011. [8] Tarandeep Kaur, Harjinder Kaur, "Machine Learning: An Internal Review", Journal of Emerging Technologies and Innovative Research, 5, no. 11, 6, 2018. [9] C. Romero, P.G. Espejo, A. Zafra, J.R. Romero and S. Ventura, "Web usage mining for predicting final marks of students that use Moodle courses," Computer Application in Engineering and Education,vol. 65, pp. 555-578, 2013. [10] A. M. Shahiri and W. Husain, "A review on predicting student's performance using data mining techniques," Procedia Computer Science, vol. 72, pp. 414-422, 2015. [11] N. Thai-Nghe, L. Drumond, A. Krohn-Grimberghe and L. Schmidt-Thieme, "Recommender system for predicting student performance," Procedia Computer Science, vol. 20, pp. 55-65, 2010. [12] S. Huang and N. Fang, "Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive Informatica 47 (2023) 97–108 107 mathematical models,"Comput Education, vol. 55, no. 6, pp. 33-42, 2013. [13] M. Imran, S. Latif, D. 
Mehmood and M. S. Shah,"Student academic performance prediction using supervised learning techniques," International Journal on Emerging Technologies in Learning, vol. 77, pp. 102-120, 2019. [14] P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira and R. Abreu,"A Comparative Study of Classification and Regression Algorithms for Modelling Students’ Academic Performance,", in Proc. ICEDM, Noida, India, pp. 55-64, 2015. [15] P. Kamal and S. Ahuja,"An ensemble-based model for prediction of academic performance of students in undergrad professional course," Journal of Engineering Design and Technology, vol. 98, pp. 654-672, 2019. [16] V. Skrbinjek and V. Dermol, "Predicting students’ satisfaction using a decision tree,"Tert Education and Management,vol. 64, pp. 210-218, 2019. [17] Dr. Antino Marelino. (2014). Customer Satisfaction Analysis based on Customer Relationship Management. International Journal of New Practices in Management and Engineering, 3(01), 07 12. Retrieved from http://ijnpme.org/index.php/IJNPME/article/view/2 6 [18] Dr. Sandip Kadam. (2014). An Experimental Analysis on performance of Content Management Tools in an Organization. International Journal of New Practices in Management and Engineering, 3(02), 01 07. Retrieved from http://ijnpme.org/index.php/IJNPME/article/view/2 7 [19] Ms. Nora Zilam Runera. (2014). Performance Analysis on Knowledge Management System on Project Management. International Journal of New Practices in Management and Engineering, 3(02), 08 - 13. Retrieved from http://ijnpme.org/index.php/IJNPME/article/view/2 8 [20] Mrs. Leena Rathi. (2014). Ancient Vedic Multiplication Based Optimized High Speed Arithmetic Logic. International Journal of New Practices in Management and Engineering, 3(03), 01 06. Retrieved from http://ijnpme.org/index.php/IJNPME/article/view/2 9Kaur H, Kushwaha AS., “An elicit elucidation on the process of education data mining” , International Conference on Intelligent Computing and Control Systems, ICCS 2019. [21] S. Roy and A. Garg, "Predicting academic performance of student using classification techniques," in 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON), Korat, Thailand, pp. 568-572, 2017. [22] S. Poonam, S. Ahuja, V. Jaitly and S. Jain, “A framework to alleviate common problems from 108 Informatica 47 (2023) 97–108 [23] [24] [25] [26] [27] recommender system,"A case study for technical course recommendation," Journal of Discrete Mathematical Sciences and Cryptography, vol. 23, no.2, pp. 451-460, 2020. A. Rajak, A. K. Shrivastava and V. Vidushi, “Applying and comparing machine learning classification algorithms for predicting the results of students,” Journal of Discrete Mathematical Sciences and Cryptography, vol. 23, no.2, pp. 419427, 2020. H. Guruler, A. Istanbullu and M. Karahasan, "A new student performance analysing system using knowledge discovery in higher educational databases," Computer Education, vol. 6, no. 5, pp. 125-138, 2010. A. Rajak, A. K. Shrivastava and V. Vidushi, “Applying and comparing machine learning classification algorithms for predicting the results of students,” Journal of Discrete Mathematical Sciences and Cryptography, vol. 23, no.2, pp. 419427, 2020. A. Siddique, A. Jan, F. Majeed, A.I. Qahmash, N.N. Quadri et al., “Predicting Academic Performance Using an Efficient Model Based on Fusion of Classifiers,” Applied Sciences, vol. 11, no. 24, pp. 11845, 2021. A. S. Hoffait and M. 
Schyns,"Early detection of university students with potential difficulties," Decision Support System, vol. 9, no. 5, pp. 5-20, 2017. H. Kaur et al. https://doi.org/10.31449/inf.v47i1.3510 Informatica 47 (2023) 109–114 109 A Multi-channel Convolutional Neural Network for Multilabel Sentiment Classification Using Abilify Oral User Reviews Tina Esther Trueman1 , Ashok Kumar Jayaraman∗,2 , Jasmine S2 , Gayathri Ananthakrishnan3 and Narayanasamy P4 1 Department of Computer Science, University of the People, Pasadena, United States 2 Department of Information Science and Technology, Anna University, Chennai, India 3 Department of Information Technology, Vellore Institute of Technology, Vellore, India 4 Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore, India E-mail: tina.trueman@uopeople.edu, jashokkumar83@auist.net, jasminemtech7@gmail.com, gayathri.a@vit.ac.in, drpnsam@gmail.com ∗ Corresponding author Keywords: Multilabel classification, sentiment classification, multichannel convolutional neural network, abilify user reviews Received: April 13, 2021 Nowadays, patients and caregivers have become very active in social media. They are sharing a lot of information about their medication and drugs in terms of posts or comments. Therefore, sentiment analysis plays an active role to compute those posts or comments. However, each post is associated with multilabel such as ease of use, effectiveness, and satisfaction. To solve this kind of problem, we propose a multichannel convolution neural network for multilabel sentiment classification using Abilify oral user comments. The multichannel represents the multiple versions of the standard model with different strides. Specifically, we use the pre-trained model to generate word vectors. The proposed model is evaluated with multilabel metrics. The results indicate that the proposed multichannel convolutional network model outperforms the traditional machine learning algorithms. Povzetek: Razvita je konvolucijska mreža za preučevanje izmenjav mnenj o boleznih na socialnih omrežjih. 1 Introduction Social media has become an active part of drugs and medication users. They share the advantage or disadvantages of their medication and drugs. This information may give some insightful information about the reaction of the drug. Therefore, sentiment analysis plays a wide role to compute the opinions of drug users and caregivers. The sentiment analysis can be performed at the document level, sentence level, or aspect level [1, 2]. The document and sentence level computes the overall opinion. But, the aspect level computes opinion at a specific target or an entity. In this paper, we aim to focus on aspect level sentiment. A comment may be associated with a single label or multilabel [3]. The single label problem has only one label. However, It has two classification methods namely, binary classification or multiclass classification [4]. The binary classification problem belongs to a binary set such as true and false or positive and negative. The multiclass classification problem belongs to a set of more than two elements such as positive, neutral, and negative. In these problems, algorithms assign only one label to comment or instance. Multilabel classification problem belongs to a set of multiple target labels where each label maybe belongs to a binary class or multiclass. Traditionally, the multilabel classification problems are solved using problem transformation, adapted algorithms, and ensemble learnings [3]. 
The problem transformation problem is further solved using the binary relevance, classifier chain, and label powerset methods [5, 6]. However, these methods use the traditional bag of words (BoW) method to represent features. These features fail to represent semantic meaning between words. Therefore, deep learning models are proposed to capture the semantic meaning between words in the input sequence. It is also proven that they outperform in many tasks such as image classification, text classification, etc [7, 8, 9]. In this paper, we propose a multichannel convolution neural network for multilabel sentiment classification using Abilify oral user comments. The multichannel model represents the multiple versions of the standard model with different strides. Particularly, we use the GloVe pre-trained model [10] to generate word vectors. We then evaluate the proposed multilabel metrics. This paper is organized as follows. Section 2 briefly describes the related works. The proposed multichannel convolutional neural network for multilabel sentiment classification is presented in Section 3. In Section 4, the results and their comparison is presented. Finally, Section 5 concludes the paper. 110 2 Informatica 47 (2023) 109–114 Related works In recent years, researchers widely studied clinical text and user text using natural language processing (NLP). They used both machine learning and deep learning to solve their problems. In this paper, we present the existing works on biomedical texts. Baumel et al. [11] investigated four models such as SVM, CNN, CBOW, and hierarchical attentionbased recurrent neural network models for the extreme multilabel task using the MIMIC datasets. The authors indicated that the hierarchical attention-based recurrent neural network model achieves a 55.86% F1 score. Wang et al. [12] developed a rule-based algorithm to generate labels that are weakly supervised. Then, the authors used the pretrained word embeddings to represent deep features. They employed SVM, random forest, MLPNN, and CNN algorithms. Their study indicated that the CNN model achieves the best performance score. Singh et al. [13] developed an attentive neural tree decoding model for tagging structured bio-medical texts with multilabel. This method decodes an input sequence into a tree of labels. The authors suggested that the proposed model outperforms on SOTA (sate-of-the art) approaches with biomedical abstracts. Citrome [14] reviewed the treatment of Abilify oral users with bipolar I disorder and schizophrenia. The author indicated that the tolerability of Abilify with schizophrenia appears superior to haloperidol, risperidone, and perphenazine. Rios et al. [15] demonstrated the biomedical text classification task using CNN. They indicated that they achieved a 3% improvement over the SOTA results. Moreover, Gargiulo et al. [16] presented a deep neural network (DNN) for extreme multilabel and multiclass text classification tasks. The authors used two models: the first one uses a word embedding with two dense layers, and the second uses the convolution, word embedding, and the dense layers. Kolesov et al. [17] performed multilabel classification on incompletely labeled biomedical texts using the SVM and RF. They used soft supervised learning and weighted k-nearest neighbor algorithms for modifying the training set. Their study indicated that both algorithms perform better. Parwez et al. [18] presented the CNN model for multilabel text classification. 
The authors used domain-specific and generic pre-trained models to predict class labels. In summary, the above authors used SVM, NB, RF, and CNN to perform multilabel classification tasks on various biomedical texts (Table 1). In this paper, we propose a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments.

3 The proposed method

In this section, we present a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments. The system architecture is shown in Fig. 1. It includes data pre-processing, word embedding, a multichannel CNN, a merge layer, a fully connected layer, and an output layer. Each of these components is explained as follows.

3.1 Abilify oral dataset

We obtained the Abilify oral dataset from the IEEE Dataport [19, 20]. It contains 1722 user comments with their age group, gender, treatment condition, patient type, treatment duration, and labeled sentiment on satisfaction, effectiveness, and ease of use.

3.2 Pre-processing

The dataset is converted from upper case to lower case; punctuation and stop words are removed; and numbers are retained where they describe drug doses in grams. Then, each instance is split into separate words using the tokenization method.

3.3 Multichannel convolutional neural network

The multichannel convolutional neural network represents multiple versions of the standard convolutional neural network model with different kernel sizes. This representation allows the instance or document to be processed with different n-grams, such as 4-grams, 6-grams, and 8-grams, at the same time [22]. In particular, we define the standard convolutional neural network model with a word embedding layer, a one-dimensional convolutional layer, a dropout layer, max-pooling, and a flatten layer. This standard version is defined with three channels for the different n-grams. Each component of the channel is explained as follows.

3.3.1 Word embedding

In NLP, word embedding is a feature learning technique that maps the vocabulary of words or phrases into a vector space. Specifically, we use the GloVe word embedding [10] technique to generate word vectors of a fixed dimension that capture the semantic relationships between words.

3.3.2 Convolutional layer

Convolutional neural networks perform well in image classification and computer vision-related tasks. The convolutional layer is an important part of the convolutional neural network. It slides over an input sequence with a fixed kernel size to generate feature maps [15, 16, 18, 22, 23]. In this work, we use one-dimensional convolutional layers that move the kernel in one direction; this type of layer is mostly used for NLP tasks. The input and output of the 1D convolutional layer are 2D. The convoluted feature maps are then reduced to their maximum, minimum, or average values using pooling layers.
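A minimal sketch of the pre-processing and embedding steps of Sections 3.2 and 3.3.1, using the Keras tokenizer and a locally available GloVe file; the example comments, the file name, and the 100-dimensional/150-token setup (taken from the experimental section) are assumptions rather than the authors' exact code:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical, already-cleaned Abilify user comments.
reviews = ["took 5 mg for two weeks and felt much calmer",
           "stopped the medication because of the side effects"]

# Tokenize the comments and pad them to a fixed input length of 150.
tokenizer = Tokenizer(lower=True)
tokenizer.fit_on_texts(reviews)
X = pad_sequences(tokenizer.texts_to_sequences(reviews), maxlen=150)

# Build the embedding matrix from pre-trained 100-dimensional GloVe vectors.
embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as fh:
    for line in fh:
        word, *vec = line.split()
        embeddings[word] = np.asarray(vec, dtype="float32")

vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, 100))
for word, idx in tokenizer.word_index.items():
    if word in embeddings:
        embedding_matrix[idx] = embeddings[word]
```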
Table 1: Summary of the related works.
Authors | Dataset | Model | Accuracy | Key findings
Baumel et al. [11] | MIMIC datasets | HA-GRU | 55.86% | Classification of patient notes on ICD code assignment.
Wang et al. [12] | Mayo Clinic smoking status | CNN | 92.00% | A rule-based algorithm to generate labels that are weakly supervised.
Singh et al. [13] | Articles describing randomized controlled trials | NTD-s | 32.70% | An attentive neural tree decoding model for tagging structured biomedical texts with multiple labels.
Rios et al. [15] | MEDLINE citations | CNN-Vote2 | 64.69% | Biomedical text classification.
Gargiulo et al. [16] | PubMed dataset | CNN-Dense | 20.15% | Extreme multilabel and multiclass text classification tasks.
Kolesov et al. [17] | AgingPortfolio dataset | SVM | 30.59% | Multilabel classification on incompletely labeled biomedical texts.
Parwez et al. [18] | Tweets dataset | CNN-PubMed | 94.12% | Domain-specific and generic pre-trained models to predict class labels.

Figure 1: A multichannel convolutional neural network model (Abilify reviews → pre-processing → word embedding → CNN channels 1–3 → merge layer → dense layer → batch-normalization layer → sigmoid output).

3.3.3 Dropout layer

This layer is used to regularize the neural network against overfitting and underfitting. Specifically, it ignores some of the outputs in the neural network during the training process.

3.3.4 Max-pooling

The max-pooling layer is applied over each feature map to select the maximum value based on the filter size. It is smaller in size than the feature map. The output of this layer contains the most important feature values of the previous feature map [15, 16, 18, 22].

3.3.5 Flatten layer

The flatten layer converts the pooled feature map into a single column or one-dimensional array. This result is passed to a merge layer.

3.4 Merge layer

The merge or concatenate layer combines the output of each channel. These combined results are passed to a fully connected or dense layer.

3.5 Fully connected layer

A fully connected or dense layer connects the output of the flatten layer to all units of the next layer. It works the same way as a feed-forward neural network.

3.6 Batch normalization layer

The batch normalization layer allows all layers of a network to learn more independently. Specifically, it standardizes or normalizes the result of the previous layers. Also, this layer acts as a regularization parameter to avoid overfitting.

3.7 Sigmoid output layer

The sigmoid output function predicts the probability-based output for each label, as shown in equation (1). It is successfully applied in multilabel classification problems [24].

f(x) = 1 / (1 + e^(-x))    (1)
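Under the hyperparameters reported in Section 4 (input length 150, 100-dimensional embeddings, kernel sizes 4, 6 and 8, ReLU activation, dropout 0.8, pooling size 2, a 10-unit dense layer, and a three-unit sigmoid output trained with Adam and binary cross-entropy), the three-channel architecture of Figure 1 can be sketched as follows; the number of convolution filters is not reported in the paper and is set to 32 here, so this is an illustrative reconstruction rather than the authors' implementation:

```python
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import (BatchNormalization, Concatenate, Conv1D, Dense,
                                     Dropout, Embedding, Flatten, Input, MaxPooling1D)
from tensorflow.keras.models import Model

def channel(inputs, kernel_size, vocab_size, embedding_matrix):
    """One channel: embedding -> 1D convolution -> dropout -> max-pooling -> flatten."""
    x = Embedding(vocab_size, 100,
                  embeddings_initializer=Constant(embedding_matrix),
                  trainable=False)(inputs)
    x = Conv1D(filters=32, kernel_size=kernel_size, activation="relu")(x)  # filters assumed
    x = Dropout(0.8)(x)
    x = MaxPooling1D(pool_size=2)(x)
    return Flatten()(x)

def build_model(vocab_size, embedding_matrix):
    inputs = Input(shape=(150,), dtype="int32")
    # Three versions of the standard model with kernel sizes 4, 6 and 8 (4-, 6- and 8-grams).
    merged = Concatenate()([channel(inputs, k, vocab_size, embedding_matrix)
                            for k in (4, 6, 8)])
    x = Dense(10, activation="relu")(merged)
    x = BatchNormalization()(x)
    outputs = Dense(3, activation="sigmoid")(x)  # ease of use, effectiveness, satisfaction
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

Training would then follow the reported setting, e.g. model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val)).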
4 Results and discussion

We implemented the proposed multilabel multichannel model on the Abilify oral dataset. This dataset contains 1722 instances, each associated with a set of labels, namely ease of use, satisfaction, and effectiveness. We split the dataset into training (1394), validation (155), and testing (173) sets. The data instances are preprocessed with various tasks such as removing punctuation and stopwords, converting upper case to lower case, and tokenization. Then, word vectors are generated for the input sequences using the GloVe word embedding model. The proposed multichannel convolutional network model was applied to this dataset to perform the multilabel classification task. This model represents three versions of the standard convolutional neural network with different kernel sizes. The standard CNN model consists of a word embedding layer, a 1D convolutional layer, a dropout layer, a max-pooling layer, and a flatten layer. The output of each channel is combined through a merge layer and passed to a dense layer, a batch normalization layer, and the sigmoid output layer. Specifically, we fixed the following hyperparameters using a random approach: an input length of 150 units, an embedding dimension of 100, three kernel sizes (4, 6, and 8), ReLU activation, 0.8 dropout, a pooling size of 2, 10 units in the fully connected layer, 20 epochs, and the Adam optimizer with a binary cross-entropy loss function. The proposed multichannel CNN model for multilabel classification is evaluated using various multilabel metrics, namely accuracy (exact match), Hamming loss, F1 micro-average score, and accuracy per label [3, 5, 20, 21]. Table 2 shows the performance of the proposed multichannel CNN model for multilabel classification.

Table 2: Performance of the proposed multichannel CNN model.
Data | Accuracy score | Hamming loss | F1 micro score | Accuracy per label (0 / 1 / 2)
Validation | 0.548 | 0.275 | 0.839 | 0.820 / 0.931 / 0.750
Testing | 0.538 | 0.303 | 0.820 | 0.815 / 0.912 / 0.715

This result is compared with the problem transformation approaches, namely binary relevance (BR), classifier chains (CC), and label powerset (LP) with NB, DT, and SVM [20], as shown in Table 3.

Table 3: Model comparison.
Model | Accuracy score | Hamming loss | F1 micro score | Accuracy per label (0 / 1 / 2)
BR_NB | 55.2 | 0.379 | 71.5 | 60.5 / 68.9 / 57.0
BR_DT | 59.6 | 0.343 | 73.9 | 61.9 / 79.7 / 55.3
BR_SVM | 60.7 | 0.338 | 75.2 | 62.6 / 78.3 / 57.8
CC_NB | 55.0 | 0.378 | 71.5 | 60.5 / 68.7 / 57.5
CC_DT | 61.0 | 0.347 | 74.6 | 61.9 / 78.3 / 51.5
CC_SVM | 60.5 | 0.346 | 75.0 | 62.6 / 77.5 / 55.7
LP_NB | 54.1 | 0.385 | 69.4 | 56.6 / 72.0 / 55.7
LP_DT | 60.3 | 0.354 | 74.5 | 60.9 / 77.7 / 55.1
LP_SVM | 62.8 | 0.334 | 76.5 | 63.4 / 79.8 / 56.4
Proposed | 53.8 | 0.303 | 82.0 | 81.5 / 91.2 / 71.5

The existing researchers listed in Table 1 have addressed multilabel classification using different biomedical texts. In this work, we used a dataset of patients' and caregivers' opinions on drugs and medications. In particular, we have compared the results of our proposed method with various baselines, as shown in Table 3. The proposed multichannel CNN model achieves better results in terms of Hamming loss (30.3%), F1 micro score (82.0%), and accuracy per label (81.5%, 91.2%, 71.5%).

5 Conclusion

In this paper, we proposed a multichannel convolutional neural network for multilabel sentiment classification using Abilify oral user comments. A pre-trained model was used to generate word vectors. Then, the proposed model was evaluated with multilabel classification metrics. The results showed that the proposed multichannel CNN model achieves better results in terms of Hamming loss, F1 micro score, and accuracy per label than the problem transformation approaches. In future work, we will study the trend of drugs and medications in different age groups using patient and caregiver reviews.

Acknowledgement

We thank the Department of Information Science and Technology, Anna University, Chennai for the facility provided during this work.

References

[1] Ronen Feldman. Techniques and applications for sentiment analysis. Communications of the ACM, 56(4):82-89, 2013. https://doi.org/10.1145/2436256.2436274
[2] Bing Liu. Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press, 2020. https://www.cs.uic.edu/~liub/FBS/sentiment-analysis-tutorial-2012.pdf
[3] Grigorios Tsoumakas and Ioannis Katakis. Multilabel classification: An overview. International Journal of Data Warehousing and Mining (IJDWM), 3(3):1-13, 2007.
https://doi.org/10.4018/ 978-1-59904-951-9.ch006 [4] Sadri Alija, Edmond Beqiri, Alaa Sahl Gaafar, Alaa Khalaf Hamoud. Predicting Students Performance A multi-channel convolutional neural network for… Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection. Informatica, 47(1), 2022. https://doi.org/10.31449/ inf.v47i1.4519 [5] Read J, Pfahringer B, Holmes G, and Frank E. Classifier chains for multi-label classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 254-269, 2009. Springer, Berlin, Heidelberg. https://doi.org/ 10.1007/978-3-642-04174-7_17 [6] Tsoumakas G, Katakis I, and Vlahavas I. Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23(7):1079-1089, 2010. https://doi.org/ 10.1109/TKDE.2010.164 [7] LeCun Y, Bengio Y, and Hinton G. Deep learning. Nature, 521(7553):436-444, 2015. https://doi. org/10.1038/nature14539 [8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. Cambridge: MIT press, 1(2), 2016. http://www.deeplearningbook.org [9] Patel R, Tanwani S, and Patidar C. Relation Extraction Between Medical Entities Using Deep Learning Approach. Informatica, 45(3), 2021. https://doi. org/10.31449/inf.v45i3.3056 [10] Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532-1543, 2014. https://doi.org/ 10.3115/v1/d14-1162 [11] Baumel T, Nassour-Kassis J, Cohen R, Elhadad M, and Elhadad N. Multi-label classification of patient notes a case study on ICD code assignment. 2017. https://arxiv.org/abs/1709.09587 [12] Wang Y, Sohn S, and Liu S et al. A clinical text classification paradigm using weak supervision and deep representation. BMC medical informatics and decision making, 19(1):1, 2019. https://doi.org/10. 1186/s12911-018-0723-6 [13] Singh G, Thomas J, Marshall IJ, Shawe-Taylor J, and Wallace BC. Structured multi-label biomedical text tagging via attentive neural tree decoding. 2018. https://arxiv.org/abs/1810.01468 [14] Leslie Citrome. A review of aripiprazole in the treatment of patients with schizophrenia or bipolar I disorder. Neuropsychiatric Disease and Treatment, 2(4):427, 2006. https://doi.org/10.2147/ nedt.2006.2.4.427 Informatica 47 (2023) 109–114 113 [15] Rios A and Kavuluru R. Convolutional neural networks for biomedical text classification: application in indexing biomedical articles. In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, 258267, 2015. https://doi.org/10.1145/2808719. 2808746 [16] Gargiulo F, Silvestri S, and Ciampi M. Deep Convolution Neural Network for Extreme Multi-label Text Classification. In HEALTHINF, 641-650, 2018. [17] Kolesov A, Kamyshenkov D, Litovchenko M, Smekalova E, Golovizin A, and Zhavoronkov A. On multilabel classification methods of incompletely labeled biomedical text data. Computational and mathematical methods in medicine, 2014. https://doi.org/10.1155/2014/781807 [18] Parwez MA and Abulaish M. Multi-label classification of microblogging texts using convolution neural network. IEEE Access, 7:68678-68691, 2019. https: //doi.org/10.1109/ACCESS.2019.2919494 [19] Ashok Kumar J, Abirami S, and Tina Esther Trueman. Abilify Oral user reviews. IEEE Dataport, 2020. https://dx.doi.org/10.21227/p1jp-2m84 [20] Kumar JA, Abirami S, and Trueman TE. 
Multilabel Aspect-Based Sentiment Classification for Abilify Drug User Review. In 2019 11th International Conference on Advanced Computing (ICoAC), IEEE, 376-380, 2019. https://doi.org/ 10.1109/ICoAC48765.2019.246871 [21] Baadel S, Thabtah F, Lu J, and Harguem S. OMCOKE: A Machine Learning Outlier-based Overlapping Clustering Technique for Multi-Label Data Analysis. Informatica, 46(4), 2022. https://doi.org/ 10.31449/inf.v46i4.3476 [22] Yoon Kim. Convolutional neural networks for sentence classification. 2014. https://arxiv.org/ abs/1510.03820 [23] Oo SH, Hung ND, and Theeramunkong T. Justifying convolutional neural network with argumentation for explainability. Informatica, 46(9), 2023. https: //doi.org/10.31449/inf.v46i9.4359 [24] Burkhardt S and Kramer S. Online multi-label dependency topic models for text classification. Machine Learning, 107(5):859-886, 2018. https:// doi.org/10.1007/s10994-017-5689-6 114 Informatica 47 (2023) 109–114 T.E. Trueman et al. https://doi.org/10.31449/inf.v47i1.4478 Informatica 47 (2023) 107–114 107 Augmented Reality in Sports Education and Training for Children With an Autism Spectrum Disorder Adel Fridhi, naila Bali, zied Hassen Research Laboratory on Disability and Social Unsuitability, LR13AS01, ISES, UMA, Tunisia Higher Institute of Special Education, Tunisia E-mail : adel.fridhi2013@gmail.com, naila.bali2020@gmail.com, zied.hassen2020@gmail.com Keywords: augmented reality, autism spectrum disorders, adapted physical activity, avatar, daily environment Received: November 1, 2022 3D modeling and augmented reality (AR) offer innovative perspectives for training in sports activity for children with autism spectrum disorders (ASD). The objective of this article is to offer a reflection on the design and learning methodology in the field of adapted physical activities (APA), with the aim of improving its credibility towards children with ASD. We present an original experience of development by augmented reality in team sport: an ergonomic approach to activity in a natural situation makes it possible to model the decision-making of children with ASD; this is used to guide children with ASD to follow the existing avatar in the scene in a daily environment (DE) using augmented reality (AR). Povzetek: Predstavljen je sistem za razvoj športnih sposobnosti avtističnih učencev. 1 Introduction The term "autism spectrum disorder" (ASD) corresponds to a disorder may be mental which attracts the intense attention of psychologists, neurophysiologists. Due to the development of new technologies the construction of developed versions of virtual reality (AR) has become possible. AR is simulated using modern computer techniques and an everyday environment (including a variety of interacting avatars), which is perceived by the subject through certain interface. The applications of AR may include entertainment (eg, adapted physical activity) and educational purposes, the scope of its application is currently being considerably broadened. AR is an interactive experience of a real environment where certain things residing in the real world (objects, people, events, etc.) are enhanced by computer-generated perceptual information. AR can be defined as a threefunctional system: it combines real and virtual, it provides real-time interaction, and it enables accurate 3D recording of real and virtual objects. [1] Direct physical activities with the teacher applied to children with ASD remain a classic method. 
However, the ability of such activities to replicate real-life tasks is obviously questionable. At this level, AR seems to be much more useful because it increases the capacity of memory, gaze, objective and exact control of the different behaviors of ASDs. The evaluation of the attentional, emotional and executive functions of a subject by the creation of AR has above all aroused our interest. The potential contribution of AR in APA to the assessment of children with ASD appears promising, and these children may benefit from this technological innovation. As it was mentioned, the condition of children with ASD is usually assessed by classical methods, which are now traditional at the same time, several 3D tasks have also been digitally reproduced using new technologies. However, "Would AR be a more appropriate modality to help children with ASD perform and successfully learn physical activities?" is an unresolved question. The objective of our paper is to provide an answer to this problem by proposing a new method of learning APA using AR for this population. First, the cognitive functioning of children with ASD will be discussed according to different explanatory hypotheses, then an evaluation of the emotional and attentional advantages via the educational scenes in APA will be discussed and their use in relation to children with ASD, on the one hand, as a means of intervention for this type of population and, on the other hand, as a means of evaluation in APA. Finally, the discussion will focus on whether AR would be a more appropriate way to assess and improve the behavior of children with ASD. The query is as follows: "How the use of AR concerning children with ASD can contribute to the improvement of the observation, the memorization of the APA realized in 3D." To address this problem, we will focus on two dimensions: the memory test and the repetition of APAs in children with ASD. 108 Informatica 47 (2023) 107–114 2 Increased reality in adapted physical education processes for asd AR is a new technology that has added the most interest in the field of adapted physical education [2], as it allows real-time interaction and immersion between real and virtual space [3]. AR comprises different stages which increase the complexity of the effect they produce and require highly developed computer equipment, a marker, a camera and then virtual elements (avatars) are added to the real space [4]. This techno-pedagogical tool used in physical education allows a huge number of possibilities that promote meaningful learning for children with ASD through the approach of classroom contexts with which they will be in full immersion and interaction with avatars [5, 6]. The strength of AR's success lies in the large number of benefits it brings to physical education adapted to children with ASD [7]. From our practical tests we noticed that it considerably increased the motivation of children with ASD, because it allowed to present the content of the motor actions existing in the scene projected on the screen in an attractive way, arousing curiosity for the process of learning [8-9]. Methodological trends in physical education aim to increase motivation at the highest level [10]. Our experiments show that AR contributes to the promotion of the contents of the scene in an autonomous way and from a constructivist approach, produced by learning by discovery [11]. 
It also facilitates collaborative sequencing as another potential of AR is its adaptation to the different stages of adapted physical education [12]. It is a very versatile tool that can be adapted to the profiles of children with ASD allowing it to be used effectively [13]. In addition to all of these factors, AR contributes to sports education by improving digital skills and bringing virtual content into the try-out room, which allows ASDs to contextualize their new adapted physical activity learning method. and to facilitate the understanding of different stages [14,15]. The end result is a great improvement in academic performance in the showroom [16]. The physical education of ASDs is a pedagogical subject that greatly benefits from the versatility of AR, because it allows this population to increase their performance, it facilitates the understanding of the practical tests of its content [17] and accelerates the development of complex motor skills [18] and improves spatial orientation and interpersonal skills in ASD [19]. AR can be an excellent weapon for children with ASD if used to model a learning environment mixed with physical training that encompasses all knowledge in adapted physical education [20]. The great value of AR as a tool in the field of physical education adapted for ASD is becoming very evident [21]. 3 Methodology A. Fridhi et al. a. Memory test We started by using AR to measure the memory of children with ASD to focus on faces and avatars. The work consisted of memorizing a number of color sets and avatars displayed virtually with a head-mounted display and they had to indicate whether they are identical or not (Fig. 01). The results showed that children with ASD are highly motivated to recognize objects. This review therefore reflects what has been concluded in the scientific literature: children with ASD memorize a smaller number of faces. Figure 1: Presentation of color game to test memory. Finally, AR has been used as part of the assessment or verification of ASDs due to its scientific advantages that produce faces or situations of adapted physical activity easily reflecting the real environment. The results mentioned above are consistent and very interesting with what is included in the scientific analysis regarding ASD. AR actions prove important eye contact, very easy social decision making, great respect for social conventions and great attention to memorizing faces. AR would then prove to be a valuable assessment and verification tool for ASDs because it offers to objectively monitor the behaviors associated with this diagnosis. AR technology aims to change the approach to assessment and memorization of ASD-related processes. The next phase of this progression will allow access to remote, ARbased environments (cases of covid-19 during containment). Finally, before reaching this goal, several questions must be addressed and resolved [22]. Applications modeled and created by AR are used successfully to train children with ASD to overcome memorization difficulties (when adding an avatar located in the same scene) (Fig. 02). Augmented Reality in Sports Education and Training for… 47 (2023) 107–114 109 When we talk about augmented reality, we think of a new technology in its development and use. But as surprising as it may seem augmented reality has not been used in the setting of physical activity suitable for children with ASD, our goal is to help this population as best as possible. 
Augmented reality is a technology of the future, but we like to think of it as an innovation of the present. In this article we will go through some examples of the use of AR in the context of adapted physical activities more specifically towards children with ASD. The example below (Fig. 03) is the first use of augmented reality in the context of adapted physical activities for children with ASD. AR was used using a marker located in an exhibit hall, children could see color avatars appear. When you rotate the marker you can look at the virtual object modeled in 3D (Fig. 03). Figure 2: Addition of another color to test the memory of the TSA. b. Repetition test for sports activities Figure 3: First try. Overall, this step has been very well accepted by ASDs. We noticed an improvement in the eyes, that's why we considered it a very interesting step and it can be improved in the following steps. Anyway, after the success of AR technology, this system was further improved in order to be the most stable, the most real and also to allow its use by this subject, the camera at the top. That allows coverage of the space used is visible by TSAs in the showroom (Fig. 04). 110 Informatica 47 (2023) 107–114 A. Fridhi et al. Figure 4: Second try. These experiences have given us the ability to know memory capacity and even see real actions by ASDs. Our research laboratory Handicap and social maladjustment RLHSM (Research Laboratory on Disability and Social Unsuitability therefore released an application that allowed children with ASD to "memorize, follow the adapted physical activities and apply the existing phenomenon in the existing scene" through the RA. Benefit of augmented reality Build a training environment Virtual stadium in development Ease of learning 4 Results The following table (01) shows the benefits of AR in sports for children with autism spectrum disorder, and then you will see an experimental study (fig.03, 04, 05, 06). Results -The AR solution enables the creation of a personalized training environment for ASD child that displays forecast information for athletes. Users can practice playing directly with virtual opponents created by computers before participating in real matches. Players using AR sports solution can experience this type anywhere. Additionally, players can interact directly within the app. It is something special in this type of technology that virtual reality (VR) does not have. Fans can participate in home game commentary. Augmented reality in sports provides extremely professional graphical analysis. AR allows them to easily connect to stadiums in the virtual world. Sports companies incorporate this model with the aim of helping users visualize ball curves/trajectories and other important live match details. - it is important first of all to master this new learning method. This being in order to be able to take full advantage of all that it has to offer in the field of sport. - Once you understand how augmented reality works, you can then learn much more easily by having a much more precise and concrete view of the subject. Augmented Reality in Sports Education and Training for… Practice anytime/anywhere Link building A more real feeling Major discoveries 47 (2023) 107–114 111 AR used in sports training can help ASD child, as well as their coaches, set new skill goals, as the sports AR app will display information to users over time. Real about every shot, throw and mile traveled. The practice session will help the players to know better techniques to improve. 
Moreover, coaches or sports professionals can evaluate training sessions with AR to make the right decision for their athletes. -When a fact becomes more real, links are created more simply and this is a very great advantage of augmented reality, including the world of ASD children's sports without any limits. - Augmented reality brings a much more real feeling when it is requested. For this, she uses all our senses to help us ASD children. Augmented reality sometimes gives us the ability to see what has never been seen before, especially the ocean floor. Because of all this, discoveries and new studies continue to increase every day so that we know better and better the world in which we live. Table 1: Benefit and results of augmented reality. AR technology has the capacity to meet all the requirements formulated [23]. Reality AR makes it possible to simulate virtual sports environments, in a very ecological way (Fig. 05), in particular inserted in a demonstrative situation (in a classroom). Augmented Reality (AR) offers the possibility of managing the control of distracting components, the complexity of stimuli and their alteration. Dynamically variable, in response to the actions of children with autism. Other technical specifications of the responses (precision, rhythm) can be collected to allow a more precise analysis. This new methodology can increase the reliability of the assessment by reducing the variability due to the differences between the sports educators, the virtual testing environment and the efficiency of the modeled avatars. Finally, it can improve its validity by allowing more detailed and specialized behavioral investigations and by increasing the ecological characteristics of what is measured with children with ASD [04]. A large number of experiments were done for the present investigation and a group of ASDs was also made available for the development of the experiment (Fig. 05 & 06). A new innovative methodology, since the process of education was applied to it through the use of AR. The objective was to find out if there were other significant ones in each of the dimensions in the experimental group. That is to say whether the method applied in one group and another influenced each of the dimensions of the study. Figure 5 : Adaptation of TSA children with the scenes presented. 112 Informatica 47 (2023) 107–114 A. Fridhi et al. Figure 6 : Confirmation of the adaptation of children with ASD with the scenes presented. 5 Discussion The integration of new technologies in exhibition spaces is becoming an increasingly common practice, more precisely, in the field of physical education, more and more training is betting on these tools in the exercise of their practice. Educational, because of the benefits and emotional enhancement they bring to children with ASD [25]. As a result of this new learning method, the present study aimed to analyze from a practical perspective how the application of AR affected huge several skill variables in ASD. The results obtained in the analyzes after carrying out the experiment show how the experiment with AR improved all of the variables evaluated. We continue on a line of research that consolidates the idea of the increased motivation caused by practical trials with the use of this new technology, [26]. Also, making contact with this technology allowed ASDs to improve their perception of physical education and aroused their interests and knowledge. 
Finally, and on the basis of the few questions asked at the beginning of this paper, the results obtained provided motivating answers to go very far with this new learning method in adapted physical education. 6 Conclusion The techniques of AR are constantly changing the approach to the assessment and rehabilitation of ASDrelated processes. The next phase of this progress will allow one to provide access via the Internet to environments based on virtual or augmented reality. However, before this vision is realized, several questions will have to be addressed and resolved [27]. Applications created by AR are used to train children with learning difficulties, such as children with ASD, on certain safety gestures. This paper is a review, and thus the correspondence of the study to the existing ethical standards does not need special confirmation. At the same time, some studies of the authors of the review have been mentioned in the latter; in all these studies involving human subjects, the existing international ethical norms have been strictly observed [28]. 7 References [1] Fridhi, A., Bali, N., Rebai, N., et al. Geospatial Virtual/Augmented Environment: Applications for Children with Pervasive Developmental Disorders. Neurophysiology, 2020, p. 1-8. [2] López-Faican, L.; Jaén, J. EmoFindAR: Evaluation of a mobile multiplayer augmented reality game for primary school children. Comput. Educ. 2020, 149, 103814. [3] Madanipour, P.; Cohrssen, C. Augmented reality as a form of digital technology in early childhood education. Australas. J. Early Child. 2020, 45, 5–13. [4] Lester, S.; Hofmann, J. Some pedagogical observations on using augmented reality in a Augmented Reality in Sports Education and Training for… vocational practicum. Br. J. Educ. Technol. 2020, 51, 607–866. [5] Lee, I.J. Kinect-for-windows with augmented reality in an interactive roleplay system for children with an autism spectrum disorder. Interact. Learn. Environ. 2020, 1–17. [6] El Kabtane, H.; El Adanani, M.; Sadgal, M.; Mourdi, Y. Virtual reality and augmented reality at the service of increasing interactivity in MOOCs. Educ. Inf. Technol. 2020, 1–27. [7] Salar, R.; Arici, F.; Caliklar, S.; Yilmaz, R.M. A Model for Augmented Reality Immersion Experiences of University Students Studying in Science Education. J. Sci. Educ. Technol. 2020, 1– 15. [8] Rivadulla, J.C.; Rodríguez, M. Incorporation of augmented reality in science classroom. Contextos educativos. Rev. Educ. 2020, 25, 237–255. [9] Cabero, J.; Roig, R. The motivation of technological scenarios in augmented reality (AR): Results of di_erent experiments. Appl. Sci. 2019, 9, 2907 [10] Demitriadou, E.; Stavroulia, K.E.; Lanitis, A. Comparative evaluation of virtual and augmented reality for teaching mathematics in primary education. Educ. Inf. Technol. 2020, 25, 381–401. [11] De Almeida, G.N.; Cabero, J. Aid-augmented reality for reinforced concrete class: Students’ perception. Alteridad 2020, 15, 12–24 [12] Rodríguez, A.M. ; Hinojo, F.J. ; Ágreda, M. Diseño e implementación de una experiencia para trabajar la interculturalidad en Educación Infantil a través de realidad aumentada y códigos QR. Educar 2019, 55, 59–77. [13] Sahin, D.; Yilmaz, R.M. The e_ect of Augmented Reality Technology on middle school students’ achievements and attitudes towards science education. Comput. Educ. 2020, 144, 103710. [14] Habig, S. Who can benefit from augmented reality in chemistry? Sex di_erences in solving stereochemistry problems using augmented reality. Br. J. Educ. Technol. 
2019, 51, 629–644. [15] Arici, F.; Yildirim, P.; Caliklar, S.; Yilmaz, R.M. Research trends in the use of augmented reality in science education: Content and bibliometric mapping analysis. Comput. Educ. 2019, 142, 103647. 47 (2023) 107–114 113 [16] Fidan, M.; Tuncel, M. Integrating augmented reality into problem-based learning: The e_ects on learning achievement and attitude in physics education. Comput. Educ. 2019, 142, 103635. [17] Hsiao, K.F. Using augmented reality for student’s health—Case of combining educational learning with standard fitness. Multimed. Tools Appl. 2013, 64, 407–421. [18] Chang, K.E.; Zhang, J.; Huang, Y.S.; Liu, T.C.; Sung, Y.T. Applying augmented reality in physical education on motor skills learning. Interact. Learn. Environ. 2019, 1–13. [19] Gallego-Lema, V.; Muñoz-Cristobal, J.A.; ArribasCubero, H.F.; Rubia-Avi, B. Orienteering in the natural environment: Ubiquitous learning through the use of technology. Movimento 2017, 23, 755– 770. [20] Aznar-Díaz, I.; Cáceres-Reche, M.P.; TrujilloTorres, J.M.; Romero-Rodríguez, J.M. Mobile learning y tecnologíasmóviles emergentes en Educación Infantil: Percepciones de los maestros en formación. Rev. Espac. 2019, 40, 14–21. [21] Moreno-Guerrero, A.J. ; Rodríguez-Jiménez, C. ; Ramos, M. ; Sola-Reche, J.M. Interés y Motivación del Estudiantado de Educación Secundaria en el uso de Aurasma en el Aula de Educación Física. Retos 2020, 38, 333–340. [22] Fridhi, A., & Bali, N. (2021). Science Education and Augmented Reality: Interaction of students with Avatars Modeled in Augmented Reality. International Journal of Environmental Science, 6. [23] A. A. Rizzo, M. T. Schultheis, K. A. Kerns, and C. Mateer, “Analysis of assets for virtual reality applications in neuropsychology,” Neuropsychol. Rehab., 14, 207-239 (2004). [24] Fridhi, A., Benzarti, F., Frihida, A., & Amiri, H. (2018). Application of Virtual Reality and Augmented Reality in Psychiatry and Neuropsychology, in Particular in the Case of Autistic Spectrum Disorder (ASD). Neurophysiology, 50(3), 222-228. [25] Aznar-Díaz, I. ; Trujillo-Torres, J.M. ; RomeroRodríguez, J.M. Estudio bibliométrico sobre la realidad virtual aplicada a la neurorrehabilitación y su influencia en la literatura científica. Revista Cubana De Información En Ciencias De La Salud 2018, 29, 1–11. 114 Informatica 47 (2023) 107–114 [26] Gómez-García, G. ; Rodríguez-Jiménez, C. ; MarínMarín, J.A. La trascendencia de la Realidad Aumentada en la motivación estudiantil. Una revisión sistemática y meta-análisis. Alteridad 2020, 15, 36–46. [27] Fridhi, A., Benzarti, F., Frihida, A., & Amiri, H. (2018). Application of virtual reality and augmented reality in psychiatry and neuropsychology, in particular in the case of autistic spectrum disorder (ASD). Neurophysiology, 50(3), 222-228. [28] Fridhi, A., Benzarti, F., Frihida, A., & Amiri, H. (2018). Application of virtual reality and augmented reality in psychiatry and neuropsychology, in particular in the case of autistic spectrum disorder (ASD). Neurophysiology, 50(3), 222-228. A. Fridhi et al. https://doi.org/10.31449/inf.v47i1.4433 Informatica 47 (2023) 115–130 115 A Novel Method for Human MRI Based Pancreatic Cancer Prediction Using Integration of Harris Hawks Varients & VGG16: A Deep Learning Approach Rama Prakasha Reddy Chegireddy1*, A Sri Nagesh 2 1 Research Scholar, Department of Computer Science and Engineering, D.r. Y.S.R ANJU College of Engineering and Technology, Acharya Nagarjuna University, Guntur-Andhra Pradesh, India. 
2 Professor, Department of Computer Science and Engineering, RVR&JC College of Engineering, Guntur, Andhra Pradesh, India. E-mail: reddysinfo@gmail.com, asrinagesh@gmail.com * Corresponding author Received: October 4, 2022 Keywords: BADF, classification, CLAHE, deep learning, pancreatic cancer, segmentation, UNET, medical image processing, image segmentation Among all cancers, pancreatic cancer has a very poor prognosis. Early diagnosis, as well as successful treatment, are difficult to achieve. As the death rate is increasing at a rapid rate (47,050 out of 57650 cases), it is of utmost importance for medical experts to diagnose PC at earlier stages. The application of Deep Learning (DL) techniques in the medical field has revolutionized so much in this era of technological advancement. An analysis of clinical proteomic tumor data provided by the Clinical Proteome Tumor Analysis Consortium Pancreatic Ductal Adenocarcinoma (CPTAC-PDA) at the National Cancer Institute was used to demonstrate an innovative deep learning approach in this study. This includes a) collection of data b) preprocessed using CLAHE and BADF techniques for noise removal and image enhancement, c) segmentation using UNet++ for segmenting regions of interest of cancer. Followed by, d) feature extraction using HHO based on CNN and e) feature selection using HHO based on BOVW for extracting and selecting features from the images. Finally, these are subject to the f) classification stage for better analysis using the VGG16 network. Experimental results are carried out using various state-of-art models over various measures in which the proposed model outperforms with better accuracy:0.96, sensitivity:0.97, specificity:0.98, and detection rate:0.95. Povzetek: Opisana je metoda globokega učenja za napovedovanje raka na ledvicah. 1 Introduction The death rate from pancreatic cancer (PC) in the United States is among the highest of all cancers. Despite aggressive treatment approaches and combination modalities, the 5-year survival rate remains 5%. According to 2017’s SEER data [1], Pancreatic ductal adenocarcinoma accounted for 47,050 deaths and new cases of 57,600 were reported. In 2030, PDAC is expected to overtake cancer as the 2nd largest cause of mortality [2]. Only 15 to 20% of sufferers are qualified for a potentially curative surgery because of non-specific indications and late discovery [3]. Whipple surgery left pancreatectomy and complete pancreatectomy+ are the three surgical options for pancreatic cancer treatment. By analyzing the resection tissues, it will be possible to determine whether or not lymph nodes are metastasizing from the tumor, as well as whether there is pre-invasive pancreatic intraepithelial neoplasia. Further therapeutic management will be based on pathological results [4]. It is important to identify neoplastic cells from benign or inflammatory cells to have a clear picture of the tumor. Because of the tremendous heterogeneity between and within tumors in growth pattern, cytology, and stroma (figure 1), this can be a daunting task. A fibrotic and inflammatory microenvironment contributes to the heterogeneity and complex growth pattern of tumors, with the latter constituting most of the tumor mass [5]. On microscopic examination, PDAC is primarily glandular, with extensive desmoplastic stroma formation. However, other structures can also be observed, including (micro-)papillae, solid nests, cribriform, or small, single-cell tumors [6]. 
There are several molecular factors associated with the development of non-glandular, histologically poorly differentiated tumor growth patterns, such as mesenchymal phenotypes, proteases, and neutrophil infiltrates [7,8]. PDAC grows in a dispersed pattern. It is in these cases that the tumor cells are not usually grouped, but are instead found in cellular clusters which encroach on the surrounding tissues, nerve 116 Informatica 47 (2023) 115–130 sheaths, and vascular networks [9]. A PanIN (Pancreatic Intraepithelial Neoplasia) is the precursor lesion of PDAC (Figure 2), and it is analogous to ductal epithelial carcinomas in colon cancers, in which ductal cells proliferate to become cancerinvasive. R.P.R. Chegireddy et al. could be reduced from being the leading cause of death. One of the most difficult tasks completed by the radiologist up to this point has been identifying the nodules in the stomach wall. Nodules of the pancreas have diverse shapes and sizes, which makes it difficult to identify small nodules. While segmenting the tumor region, difficulties such as over-segmentation and under-segmentation can develop. While there are many imaging modalities available, using the more reliable and convenient modality is important for early tumor detection. To identify and characterize the tumor's location, scientists have recommended some procedures. The contrast of MRI for soft tissues is better than CT, and it can differentiate fat, water, muscles, and other soft tissues more easily than CT. Additionally, MRI has a higher sensitivity (33%) for detecting tumors than CT (11%). The primary goal of this research is to suggest a better framework that will detect and classify pancreatic cancer from MRI images to support radiologists in making diagnostic decisions. 1.2 Key highlights Figure 1: Pancreatic Cancer: MRI image of risk patients. A healthy pancreas and chronic pancreatitis have glandular and ductal features grouped in an organoidlobular configuration, while a malignant pancreas has tumor glands that are dispersed throughout the stroma, distorted, and display solitary cells [10,11]. Chronic pancreatitis is characterized by fibrosis, ductal tissue loss, and acinar thinning, all of which were linked to an increased risk of invasive carcinoma [12]. PDAC review time for slides with histological microarchitecture, distributed development, varied microclimates, preinvasive lesions, inflammatory tissue, and sealed anatomical tissue is predicted to be 1 to 2 minutes per slide [13]. The time variable is significant for diagnosing, even if the accuracy of diagnosis is high, and it will become even more significant as the overall number of specialist pathologists’ declines, and as the general demand for information and specialization increases, as well as the number of patients [14,15]. Techniques which enable and promote morphological-based tissue slide evaluation and flag crucial regions for further study by professional pathologists are thus necessary. Digital pathology has evolved as a means for evaluating histopathology slides, supporting routine diagnostics and research, as well as ensuring quality control. Reproducible tissue categories are very important in spatial tissue studies. Deep learning methods have previously been demonstrated to be effective in determining lymph node metastases and classifying tumor subsets [16]. 
1.1 Research gap By identifying the onset period, pancreatic disease This article aims to optimize methods and propose a framework for detecting and classifying pancreatic cancer using deep learning and image processing techniques. The primary objectives of this article are as follows: •To suggest a framework based on MRI images to detect and classify pancreatic cancer. •To improve the MRI image quality using Boosted Anisotropic Diffusion Filter (BADF) and contrasted limited adaptive histogram equalization (CLAHE) algorithms. •To use the UNet++ architecture to create a Computer-Aided Detection method (CAD) for the early identification of pancreatic cancers. The pancreatic region associated with a lesion is precisely separated from the MRI image by segmentation using the UNet++. •To extract the best subset of texture features to enhance classification accuracy and to create a classification system based on these texture features using HHO-based CNNs and HHO-based Bags of visual terms. •To distinguish different levels of malignancy in an MRI image by developing a classifier based on the VGG 16 model. •To perform quantitative analysis for various tumor classes and the accuracy of the proposed classifier is assessed against the state-of-the-art work’s performance. Organization of the paper: As we already came across the overview of PDAC and its respective areas in Section 1, part 2 discusses the literature review, third part illustrates the overall methodology adopted. The fourth part presents the performance analysis, and the fifth section summarizes the conclusion. Informatica 47 (2023) 115–130 117 A Novel Method for Human Mri Based Pancreatic Cancer… 2 Literature review Tonozoko et al. (2021) [17] developed a ComputerAided Diagnostics (CAD) approach that used deep learning assessment of EUS pictures (EUS CAD) to distinguish between persons with chronic pancreatitis and those with Pancreatic Ductal Carcinoma (PDAC). Liu et al. (2020) [18] used a CNN to determine whether patches were carcinogenic. According to the fraction of patches designated as carcinogenic by the CNN and the trained and validation datasets, a criterion for identifying pancreatic cancer was created. Researchers utilized a localized test group (101 pancreatic cancer patients and 88 controls, local test group 2) in addition to data from the United States (281 pancreatic cancer patients and 82 controls). In this study, EM algorithms and Gaussian Mixture models were integrated to highlight the most necessary properties of the CT scan, and threshold values were used to determine the percentage of tumors present in the pancreas. Vaiyapuri et al. (2022) [19] introduce an intelligent deep-learning-enabled decision-making medical system for pancreatic tumor classification (IDLDMS-PTC) using CT images. The IDLDMSPTC model derives an emperor penguin optimizer (EPO) with multilevel thresholding (EPO-MLT) technique for pancreatic tumor segmentation. A MobileNet model is applied as a feature extractor with Author Tonozoko et al. (2021) [17] Algorithm AlexNet Fu et al. (2021) [19] optimal autoencoder (AE) for pancreatic tumor classification. To optimally adjust the weight and bias values of the AE technique, the multileader optimization (MLO) technique was utilized. Abbas et al. (2021) [20] suggest a Computer Aided Diagnosis (CAD) system that uses Synergic Inception ResNet-V2, a deep convolutional neural network architecture, to identify PC cases from publicly available CT images. 
This system could extract PC graphical functionality to include clinical diagnosis before the pathogenic examination, freeing up valuable time for disease prevention. To demonstrate the relatively encouraging outcomes in terms of accuracy in recognizing BC-infected patients, simulation results using MATLAB are provided in the study. The suggested deep learning approach achieves an accuracy of 99.23%. Li et al. (2022) [21] offer a deep-learning segmentation technique for pancreatic cancer based on a dual meta-learning framework. This can combine generic tumor data from idle MRIs with prominent tumor information from Ct scan images to improve the discrimination of high-level features. To provide rich intermediate explanations for a meta-learning technique that would follow, the randomized intermediary modality between CTs and MRIs was originally developed to fill in visual gaps. Metrics AUROC – 0.924 Sensitivity – 90.2 Specificity – 74.9 Strength Higher-resolution EUS images are used. Higher sensitivity. Weakness Risks and feasibility of EUS imaging. Inception V3 Accuracy - 0.953 Patch-level and WSI-level approach improves the overall classification accuracy The algorithm recognizes cancer cells mainly from nuclear features. Hence prone to false positive results. Liu et al. (2020) [18] VGG-16 Sensitivity - 0·973, Specificity - 1·000, and Accuracy 0·986 Achieved an accuracy approaching 99% and missed fewer tumors compared with that of radiologists. Uses CT scans which show less tumor detection sensitivity of 11% compared to MRI (33%). Abbas et al. (2021) [20] ResNet Accuracy - 99.23 The isolateral filter enhances the quality of poor images during preprocessing. Uses CT scans which show less tumor detection sensitivity of 11% compared to MRI (33%). Li et al. (2022) [21] GoogleNet Dice score - 64.94 Dual meta-learning framework for pancreatic cancer using MRI as well as CT. Outperforms stateof-the-art methods based on CT imaging. NA Table 1: Summary of literature review. 118 Informatica 47 (2023) 115–130 3 Methodology This section outlines a novel approach for classifying pancreatic cancers based on the Pancreatic Ductal Adenocarcinoma cohort of the Clinical Proteomic Tumor Analysis Consortium (CPTAC-PDA) dataset. While there are various imaging techniques available, MRI demonstrates improved tumor detection sensitivity, which aids in discovering smaller tumors (Grade I). The novelty of this study is the application of image-enhancing methods and optimization strategies to MRI images to increase the classification accuracy when compared to the state-of-the-art research under discussion. The overall design of the proposed framework is shown in Figure 3, with the steps outlined below. During the pre-processing step, CLAHE and BADF are used to enhance the images obtained from the publically available MRI image collection CPTACPDA. A source image is divided into non-overlapping contextual components known as sub-images, tiles, or blocks by the CLAHE method. To balance each contextual area, the CLAHE approach uses histogram equalization. The cropped pixels are then redistributed throughout the grey levels after the original histogram is cropped. While traditional histograms, redistributing histograms cap pixel intensities at a maximum value. By including a Partial Differential Equation (PDE) after it generates the diffuse image, the suggested BADF improves on the existing anisotropic diffusion filter. It's a sophisticated unsupervised machine learning-based image enhancement tool. 
It's also feasible to smooth details with a diffusion process that is weak at the edges and borders of the images, so that it not only smooths out the image but also preserves important characteristics such as edges and patterns. Excellent results were achieved when the number of iterations was set to 20, based on extensive testing. Once the images are preprocessed, segmentation is carried out; this is a crucial part of an image classification method, where the MRI image is segmented to isolate the nodules. In this work, the UNet++ architecture is used for the segmentation of MRI images. Once the segmented regions are obtained, features are extracted and selected using HHO-based CNN and HHO-based BOVW. After segmentation and feature extraction, the segmented tumor is identified using texture features. Finally, the VGG-16 model is used to distinguish between normal and tumor grades from the MRI images. The Convolutional Neural Network (CNN) architecture VGG-16 is one of the best models for image classification and allows transfer learning. Transfer learning is the process of applying the knowledge gained from one problem to another related problem for further improvement.

3.1 Data collection

A dataset of CPTAC-PDA pancreatic ductal adenocarcinomas from the National Cancer Institute is used here. Proteogenomics, a large-scale method of studying cancer genetics, is the goal of CPTAC [22].

Figure 2: The overall architecture of the proposed framework.

The Cancer Imaging Archive is collecting radiology and pathology images from CPTAC patients to provide researchers with access to these images so they can investigate cancer phenotypes and correlate them with proteomic, genomic, and clinical findings. There is a TCIA collection for each type of cancer, called CPTAC-<cancer type>, which stores the images for that type. Radiology pictures are compiled from routine imaging conducted on patients immediately before pathology diagnosis, as well as follow-up scans where available. As a result, the radiology image sets are varied in terms of scanner modalities, vendors, and acquisition processes. The CPTAC qualification method includes collecting pathology images (Figure 4). The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium Pancreatic Ductal Adenocarcinoma (CPTAC-PDA) collection contains 45,786 pancreatic images from CPTAC third-phase patients. A total of 45 radiology subjects and 77 pathology subjects [23] are included. The dataset includes samples from CT, CR, and MRI scans. The pictures are of various sizes, but they were resized to 128 × 128 in the current work. Using multiple modalities in the training step increases the flexibility of the model with respect to the diverse characteristics of the various imaging techniques.
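To make the data-handling step concrete, the following is a minimal sketch of how DICOM slices from a TCIA-style collection could be read and resized to 128 × 128 before preprocessing. It assumes the pydicom and OpenCV libraries; the directory layout, file paths, and function names are hypothetical and are not taken from the original study.

```python
import os
import glob
import numpy as np
import pydicom
import cv2

def load_and_resize_slices(dicom_dir, size=128):
    """Read every DICOM slice under dicom_dir and resize it to size x size.

    Returns a float32 array of shape (n_slices, size, size) scaled to [0, 1].
    """
    slices = []
    for path in sorted(glob.glob(os.path.join(dicom_dir, "**", "*.dcm"), recursive=True)):
        ds = pydicom.dcmread(path)
        img = ds.pixel_array.astype(np.float32)
        # Normalize each slice to [0, 1] before resizing.
        img -= img.min()
        if img.max() > 0:
            img /= img.max()
        slices.append(cv2.resize(img, (size, size), interpolation=cv2.INTER_AREA))
    return np.stack(slices) if slices else np.empty((0, size, size), np.float32)

# Example (hypothetical path to a downloaded CPTAC-PDA study):
# volume = load_and_resize_slices("CPTAC-PDA/patient_001/MR_series")
# print(volume.shape)
```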
3.2 Preprocessing

Preprocessing is carried out to remove noise and anomalies and thereby enhance the images for better prediction. Here we use both CLAHE and BADF. They are compared against AHE and ADF in Table 2 in terms of PSNR, SSIM, and MSE; higher PSNR and SSIM together with lower MSE indicate more accurate enhancement.

3.2.1 CLAHE

Because the pancreas is related to other organs such as the duodenum and gallbladder, the input volume was enhanced to make the pancreas more visible. To begin, we modified the MRI images by applying a window center of 60 and a window width of 400 to make the abdomen visible. The basic dataset was then constructed by boosting the contrast of the pancreatic region with contrast-limited adaptive histogram equalization (CLAHE) [24-27]. With the dynamic histogram equalization method, each pixel is mapped according to its grayscale neighbours. Because the approach is applied as many times as there are pixels in the region, it consumes a lot of processing resources. CLAHE addresses this by establishing a clip criterion: if part of the picture's grey levels surpasses the threshold, the surplus is redistributed equally among all grey levels. As a result of this processing the image is not over-enhanced and the issue of noise amplification is minimized.

Algorithm 1: CLAHE

Figure 4: Pathology-confirmed pancreatic ductal adenocarcinoma in an elderly female patient. (A) Fat-suppressed LAVA T1-weighted and (B) T2-weighted imaging, (C) MRI cholangio-pancreatography (MRCP), (D) gadolinium-enhanced images in the arterial, (E) portal, and (F) delayed phases.

3.2.2 BADF

The Perona-Malik diffusion process is another name for the anisotropic diffusion filter, named after the people who devised it. It focuses primarily on eliminating noise while maintaining fine features in the image. In general, these filters employ the same methodology as edge detection. The anisotropic diffusion filtering process can be described using multiple blurred pictures generated by the diffusion process. The proposed BADF improves on the previous anisotropic diffusion filter by adding a Partial Differential Equation (PDE) after creating the diffused image. Diffusion, which is weak at the edges and boundaries, can be utilized to smooth the surface [28]. Four conduction operators are then used to attenuate the high-frequency elements in each direction:

$g_N = \frac{1}{1 + (\nabla_N I_{i,j}/k)^2}$   (1)

$g_S = \frac{1}{1 + (\nabla_S I_{i,j}/k)^2}$   (2)

$g_E = \frac{1}{1 + (\nabla_E I_{i,j}/k)^2}$   (3)

$g_W = \frac{1}{1 + (\nabla_W I_{i,j}/k)^2}$   (4)

Here k is a scalar that controls the level of smoothness; it must satisfy k > 1, because a higher value of k results in smoother outcomes. In a standard anisotropic diffusion filter, k is set to 7. In this investigation, k is calculated automatically from local statistics using Equation (5) [29]:

$k = 2\left|\frac{\operatorname{mean}(f_{i,j})}{0.75\,\sigma(f_{i,j})}\right|$   (5)

where the standard deviation is denoted by σ. The smoothed image is then obtained with Equation (6):

$I_{i,j} = I_{i,j} + 0.25\left[g_N \nabla_N I_{i,j} + g_S \nabla_S I_{i,j} + g_E \nabla_E I_{i,j} + g_W \nabla_W I_{i,j}\right]$   (6)

where $I_{i,j}$ is the smoothed image.

Algorithm 2: BADF
Step 1: Double the size of the input image.
Step 2: Initialize diff_im, the image to which the PDE (partial differential equation) is applied.
Step 3: Set the pixel distances in the centre: dx = 1; dy = 1.
Step 4: Define four 2D convolution masks (N, S, E, W): hN = [0 1 0; 0 -1 0; 0 0 0]; hS = [0 0 0; 0 -1 0; 0 1 0]; hE = [0 0 0; 0 -1 1; 0 0 0]; hW = [0 0 0; 1 -1 0; 0 0 0].
Step 5: Before evaluating the diffusion function, compute the finite differences.
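For readers who want to reproduce the general idea of this preprocessing stage, the following is a minimal sketch combining OpenCV's CLAHE with a plain Perona-Malik diffusion step, i.e. Equations (1)-(6) without the boosting PDE that defines BADF. Parameter values such as the clip limit, tile size, and k are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import cv2

def clahe_enhance(img_u8, clip_limit=2.0, tile=(8, 8)):
    """Contrast-limited adaptive histogram equalization on an 8-bit image."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    return clahe.apply(img_u8)

def perona_malik(img, n_iter=20, k=7.0, lam=0.25):
    """Standard anisotropic diffusion (Perona-Malik), cf. Equations (1)-(6)."""
    I = img.astype(np.float32)
    for _ in range(n_iter):
        # Finite differences towards the four neighbours (N, S, E, W).
        dN = np.roll(I, 1, axis=0) - I
        dS = np.roll(I, -1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I, 1, axis=1) - I
        # Conduction coefficients, Equations (1)-(4).
        gN = 1.0 / (1.0 + (dN / k) ** 2)
        gS = 1.0 / (1.0 + (dS / k) ** 2)
        gE = 1.0 / (1.0 + (dE / k) ** 2)
        gW = 1.0 / (1.0 + (dW / k) ** 2)
        # Update step, Equation (6).
        I = I + lam * (gN * dN + gS * dS + gE * dE + gW * dW)
    return I

# Example usage on a single resized slice (values assumed to be in [0, 1]):
# slice_u8 = (volume[0] * 255).astype(np.uint8)
# enhanced = clahe_enhance(slice_u8)
# smoothed = perona_malik(enhanced, n_iter=20)
```

The adaptive choice of k in Equation (5) could be added by recomputing k from the local mean and standard deviation at each iteration.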
Table 2: Overall analysis of the preprocessing models under PSNR, SSIM, and MSE.

Image    Model   PSNR    SSIM    MSE
Image 1  AHE     23.5    0.24    3.2
         ADF     22.8    0.33    4.7
         CLAHE   43.9    0.72    8.4
         BADF    46.2    0.85    9.3
Image 2  AHE     23.59   0.245   3.3
         ADF     22.83   0.333   4.75
         CLAHE   43.95   0.726   8.49
         BADF    46.27   0.851   9.36
Image 3  AHE     23.6    0.25    3.32
         ADF     22.9    0.34    4.79
         CLAHE   44.1    0.73    8.5
         BADF    46.3    0.86    9.4
Image 4  AHE     23.62   0.257   3.37
         ADF     22.93   0.345   4.8
         CLAHE   44.12   0.738   8.53
         BADF    46.36   0.862   9.46
Image 5  AHE     23.68   0.26    3.4
         ADF     23.2    0.35    4.83
         CLAHE   44.18   0.74    8.6
         BADF    46.4    0.87    9.5

3.3 Segmentation

The proposed design is depicted in Figure 5a from a high-level perspective. UNet++ is based on an encoder sub-network followed by a decoder sub-network. The skip pathways (shown in green and blue) connecting the two sub-networks have been redesigned, and deep supervision (shown in red) distinguishes UNet++ from U-Net [30,31].

Figure 5: (a) In UNet++, an encoder and a decoder are linked via dense convolutional blocks. UNet++ primarily focuses on bridging the semantic gap between encoder and decoder feature maps before fusion. Black blocks correspond to the original U-Net, green and blue indicate dense convolution blocks on the skip pathways, and red indicates deep supervision. (b) Detailed analysis of UNet++'s first skip pathway. (c) If UNet++ is trained with deep supervision, it can be pruned at inference time. (Colour figure from [33].)

3.3.1 Redesigned skip pathways

The communication between the encoder and decoder sub-networks is improved by the redesigned skip pathways. In U-Net, the features retrieved from the encoder are passed directly to the decoder; the UNet++ method, however, uses dense convolution blocks whose number is determined by the pyramid level. Convolution blocks X^{0,0} and X^{1,3}, for example, contain three convolution layers. Because of the concatenation, the result of each convolution layer is merged with the output of the corresponding dense block below. Through deep convolution, the features extracted from the encoder are transformed into feature maps that the decoder can decode; the idea is that the optimization problem becomes simpler when the feature maps extracted by the encoder and the corresponding decoder feature maps are semantically similar. The skip pathway can be summarized as follows. Let x^{i,j} denote the output of node X^{i,j}, where i indexes the down-sampling level along the encoder and j indexes the convolution layer of the dense block along the skip pathway. The stack of extracted feature maps denoted by x^{i,j} is computed as

$x^{i,j} = \begin{cases} H\left(x^{i-1,j}\right), & j = 0 \\ H\left(\left[\left[x^{i,k}\right]_{k=0}^{j-1},\ U\left(x^{i+1,j-1}\right)\right]\right), & j > 0 \end{cases}$   (7)

where H(·) denotes a convolution followed by an activation function and U(·) denotes an up-sampling layer. A node at level j = 0 receives only one input, from the previous layer of the encoder; a node at level j = 1 receives two inputs, both from the encoder sub-network at two consecutive levels; and a node at level j > 1 receives j + 1 inputs, of which j are the outputs of the previous nodes on the same skip pathway and the last is the up-sampled output of the lower skip pathway. Because each skip pathway employs a dense convolution block, all previously extracted features accumulate and reach the current node. Figure 5b illustrates how the feature maps flow through UNet++'s top skip pathway, which further clarifies Eq. (7).
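To make Equation (7) more concrete, here is a minimal PyTorch-style sketch of one nested skip node: it concatenates the outputs of the previous nodes on the same pathway with the up-sampled output from the pathway below, then applies a convolution block. It is an illustrative reading of the UNet++ formulation with hypothetical channel sizes, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """H(.): two 3x3 convolutions with ReLU activations."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

def nested_skip_node(conv_block, same_level_outputs, below_output, upsample):
    """Equation (7) for j > 0: x^{i,j} = H([x^{i,0..j-1}, U(x^{i+1,j-1})])."""
    features = same_level_outputs + [upsample(below_output)]
    return conv_block(torch.cat(features, dim=1))

# Example for node x^{0,1} with assumed channel counts:
up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
h01 = ConvBlock(in_ch=32 + 64, out_ch=32)   # 32 channels from x^{0,0}, 64 from U(x^{1,0})
x00 = torch.randn(1, 32, 128, 128)          # output of x^{0,0}
x10 = torch.randn(1, 64, 64, 64)            # output of x^{1,0}
x01 = nested_skip_node(h01, [x00], x10, up)
print(x01.shape)                             # torch.Size([1, 32, 128, 128])
```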
3.3.2 Deep supervision

Deep supervision is provided in UNet++ [30,31] so that the model can run in two modes: (1) an accurate mode, in which the outputs of all segmentation branches are averaged, and (2) a fast mode, in which one of the segmentation branches is used as the final segmentation map, the choice determining the amount of model pruning and the speed gain. In fast mode, selecting a segmentation branch yields architectures of variable complexity, as seen in Figure 5c. Because UNet++ stacks skip pathways that produce full-resolution feature maps at multiple semantic levels, namely x^{0,j} for j ∈ {1, 2, 3, 4}, deep supervision can be applied to them. Each semantic level is assigned a loss function combining binary cross-entropy and the Dice coefficient:

$\mathcal{L}(Y, \hat{Y}) = -\frac{1}{N}\sum_{b=1}^{N}\left(\frac{1}{2}\, Y_b \log \hat{Y}_b + \frac{2\, Y_b \hat{Y}_b}{Y_b + \hat{Y}_b}\right)$   (8)

where N is the batch size, and Ŷ_b and Y_b are the flattened predicted probabilities and the flattened ground truth of the b-th image, respectively. The differences between UNet++ and U-Net, shown in Figure 5a, are: (1) convolution layers on the skip pathways (green), which bridge the semantic gap between encoder and decoder feature maps; (2) dense skip connections on the skip pathways (blue), which improve gradient flow; and (3) deep supervision (red), which enables model pruning and, in the worst case, achieves performance comparable to using only one loss layer.

3.4 Feature extraction

The HHO algorithm is a recent metaheuristic stochastic approach inspired by the behaviour of Harris hawks. This behaviour is defined by their ability to track, encircle, and approach potential prey (usually rabbits) and then attack it with excellent synchronization; the surprise pounce is a cooperative hunting technique. Like earlier metaheuristic algorithms [34, 35], the HHO technique includes exploratory and exploitative phases. During the exploration phase, Harris hawks pursue prey randomly, according to the equation

$X(t+1) = \begin{cases} X_{rand}(t) - r_1 \left|X_{rand}(t) - 2 r_2 X(t)\right|, & q \geq 0.5 \\ \left(X_{rabbit}(t) - X_m(t)\right) - r_3\left(LB + r_4 (UB - LB)\right), & q < 0.5 \end{cases}$   (9)

where X(t+1) is the position of the hawks in the next iteration, X_rabbit(t) is the position of the rabbit (the prey), r_1 to r_4 and q are random numbers in (0, 1), X_rand(t) is a randomly selected hawk at a random location, and X_m(t) denotes the average position of the current hawk population, computed by Equation (10):

$X_m(t) = \frac{1}{N}\sum_{i=1}^{N} X_i(t)$   (10)

where X_i(t) is the position of each hawk in iteration t and N is the total number of hawks. When the exploration step is finished, a transition occurs between the exploration and exploitation periods. During this transition the escaping energy of the rabbit is modelled according to Equation (11):

$E = 2 E_0 \left(1 - \frac{t}{T}\right)$   (11)

where E represents the rabbit's escaping energy, E_0 represents its initial energy state, and T represents the maximum number of iterations. Depending on the physical condition of the prey, E_0 varies between -1 and 1; when E_0 approaches -1, the prey is losing energy, and vice versa. During the last stages of the algorithm, the Harris hawks suddenly attack the prey. Four attack strategies are available, where r is the probability that the prey escapes. Harris hawks use a soft besiege strategy to slowly encircle the target when E ≥ 0.5 and r ≥ 0.5; the mathematical model is

$X_i^{t+1} = \Delta X_i^t - E\left|J X_{prey} - X_i^t\right|, \qquad \Delta X_i^t = X_{prey} - X_i^t$   (12)

where J represents the strength of the prey's jumping during the escape and takes a random value between 0 and 2, and ΔX_i^t is the distance between the individual and the prey. The prey cannot escape when E < 0.5 and r ≥ 0.5, due to insufficient escaping energy, and the Harris hawks' position is then written as

$X_i^{t+1} = X_{prey} - E\left|\Delta X_i^t\right|$   (13)

When E ≥ 0.5 and r < 0.5, the prey still has enough energy to escape, and the Harris hawks perform a soft besiege with progressive rapid dives to confuse it. This can be expressed as

$X_i^{t+1} = \begin{cases} Y = X_{prey} - E\left|J X_{prey} - X_i^t\right|, & \text{if } f(Y) < f(X_i^t) \\ Z = Y + S \times Levy(d), & \text{if } f(Z) < f(X_i^t) \end{cases}$   (14)

where S is a 1 × d random vector and d is the problem dimension. When E < 0.5 and r < 0.5, the prey has insufficient escaping energy, and following the Lévy flight function the Harris hawks attack it as

$X_i^{t+1} = \begin{cases} Y = X_{prey} - E\left|J X_{prey} - X_m^t\right|, & \text{if } f(Y) < f(X_i^t) \\ Z = Y + S \times Levy(d), & \text{if } f(Z) < f(X_i^t) \end{cases}$   (15)

Figure 6: HHO-based flowchart for feature extraction.

After using HHO (Figure 6) for extraction, a CNN is added at the end. Suppose the large original picture of size l × h is denoted x in the convolutional layer. We begin by training sparse coding to extract small patches from the large picture. The feature f = (w·x_s + b) is computed from the activation function together with the weights and biases between the visible and hidden layer units. For each small patch we obtain the corresponding value f' = (w·x_s' + b'), and the convolution of these f' values yields the convolved feature matrix. These features must then be classified after they have been obtained by convolution.
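The following is a compact, generic sketch of the HHO update rules in Equations (9)-(15), written as a standalone optimizer over an arbitrary fitness function. It is not the authors' feature-extraction wrapper; population size, iteration count, and the Lévy-flight constants are illustrative defaults.

```python
import math
import numpy as np

def levy(dim, beta=1.5, rng=None):
    """Levy flight step (Mantegna's algorithm)."""
    rng = rng or np.random.default_rng()
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)

def hho(fitness, dim, lb, ub, n_hawks=20, max_iter=100, seed=0):
    """Harris Hawks Optimization, cf. Equations (9)-(15). Returns (best_x, best_f)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n_hawks, dim))
    prey, prey_fit = None, np.inf
    for t in range(max_iter):
        fits = np.array([fitness(x) for x in X])
        if fits.min() < prey_fit:                          # track the best hawk (the "prey" position)
            prey_fit, prey = fits.min(), X[fits.argmin()].copy()
        E1 = 2 * (1 - t / max_iter)                        # decaying factor of Eq. (11)
        for i in range(n_hawks):
            E = E1 * rng.uniform(-1, 1)                    # escaping energy, Eq. (11)
            if abs(E) >= 1:                                # exploration, Eq. (9)
                if rng.random() >= 0.5:
                    Xr = X[rng.integers(n_hawks)]
                    X[i] = Xr - rng.random() * np.abs(Xr - 2 * rng.random() * X[i])
                else:
                    X[i] = (prey - X.mean(axis=0)) - rng.random() * (lb + rng.random() * (ub - lb))
            else:                                          # exploitation
                r, J = rng.random(), 2 * (1 - rng.random())
                if r >= 0.5 and abs(E) >= 0.5:             # soft besiege, Eq. (12)
                    X[i] = (prey - X[i]) - E * np.abs(J * prey - X[i])
                elif r >= 0.5:                             # hard besiege, Eq. (13)
                    X[i] = prey - E * np.abs(prey - X[i])
                else:                                      # progressive rapid dives, Eqs. (14)-(15)
                    base = X[i] if abs(E) >= 0.5 else X.mean(axis=0)
                    Y = prey - E * np.abs(J * prey - base)
                    Z = Y + rng.random(dim) * levy(dim, rng=rng)
                    if fitness(Y) < fits[i]:
                        X[i] = Y
                    elif fitness(Z) < fits[i]:
                        X[i] = Z
            X[i] = np.clip(X[i], lb, ub)
    return prey, prey_fit

# Example: minimize a sphere function in 5 dimensions.
# best_x, best_f = hho(lambda x: float(np.sum(x ** 2)), dim=5, lb=-10, ub=10)
```

In a feature-selection setting, the fitness function would typically score a candidate feature subset (for example, by a quick classifier evaluation), which is the role HHO plays in this framework.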
3.5 Feature selection

The BoW model is built in four steps. First, each image of the given image collection is sampled for patches represented by local descriptors. Second, a clustering algorithm generates a visual vocabulary, with each cluster centre corresponding to a visual word. Third, the local characteristics of a new image can be quantified using the visual vocabulary gathered earlier. Lastly, a BoW histogram is produced for image representation [36,37,38] by collecting the frequency of each visual word in the image. In this way, the set of local descriptors extracted from an image is mapped to a new feature space with k dimensions, where k is the number of k-means centroids. Hard-assignment coding was employed to encode the features in this study. Given the visual words W_i of a vocabulary, the BoW image representation is

$X(W_i) = \frac{1}{n}\sum_{c=1}^{n} \begin{cases} 1, & \text{if } i = \arg\min_j \lVert W_j - P_c \rVert \\ 0, & \text{otherwise} \end{cases}$   (16)

where n stands for the number of patches in the image and P_c stands for patch c. Following that, the pictorial representation constructed with the BoW paradigm is viewed as a "bag" of visual words.

At the start, intensity profiles are employed to capture the intensity difference between the tumor and the surrounding area. An intensity profile is a vector of image intensity values calculated by analyzing the brightness of pixels along the cancer border, sampled from the centre of the tumor towards the border of the cancerous area. The intensity profile is created as follows. Gaussian kernels smooth the points on the tumor border to prevent them from being affected by noise, which may cause the boundary normal to shift. The Gaussian kernel in one dimension is

$G_{1D}(X; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-X^2 / 2\sigma^2}$   (17)

To convolve with the points on the cancer boundary, the first derivative of G_{1D}(X; σ) is employed, where σ is the standard deviation. In an image, b(x, y) represents the coordinates of the tumor boundary. Convolution yields the coordinates of the smoothed boundary points:

$B(X_1, Y_1) = b(x, y) * G_{1D}(X; \sigma)'$   (18)

The angles of the boundary normals are calculated using

$\theta = \arctan\left(\frac{y'}{x'}\right)$   (19)

For all the locations associated with an intensity profile, the angle θ is used to compute the coordinates:

$X_i = x_i + l \cdot \cos\theta_i, \qquad Y_i = y_i + l \cdot \sin\theta_i$   (20)

where l is the distance between the point on the border and the sampled location along the boundary normal. The resulting coordinates (X_i, Y_i) may not correspond to exact pixel positions, so the intensity values are obtained by linear interpolation.

Two crucial steps in building a BoW model are patch sampling and the choice of local descriptors. To simplify the subsequent computation, each raw patch is flattened into a one-dimensional feature vector. SIFT descriptors, which are scale and rotation invariant, are a better alternative to raw patches. Two visual vocabularies are created from precompiled patches of the cancer region and the cancer-margin region, respectively. As a result of this process, the formed vocabularies become more locally distinctive: a visual representation based on a region-specific vocabulary is more meaningful than a representation based on a universal vocabulary that uses all of the image's data. Patches collected in the margin zone, together with its four subregions, are mapped to the margin region's vocabulary to generate the image representation for the margin zone. The BoW representation for the margin region is constructed by concatenating the BoW histograms of each area; if the vocabulary of the margin region contains k1 words, the BoW description of the margin area is a vector with 5·k1 dimensions. As a result, the picture now has two BoW histograms: one for the cancer zone and one for the cancer margin. Finally, the recommended region-specific BoW characterization of the malignancy on a pancreatic cancer image is created by joining these two BoW histograms together.
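A minimal sketch of the vocabulary-building and hard-assignment steps behind Equation (16) is given below, assuming scikit-learn's KMeans and SIFT-like descriptors as placeholders. The region-specific refinement (separate cancer and margin vocabularies, subregion concatenation) would be layered on top of these two functions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, k=100, seed=0):
    """Cluster local descriptors (n_patches x d) into k visual words."""
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(descriptors)

def bow_histogram(descriptors, vocabulary):
    """Hard-assignment BoW histogram, cf. Equation (16)."""
    words = vocabulary.predict(descriptors)              # nearest centroid per patch
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float64)
    return hist / max(len(descriptors), 1)               # normalize by patch count n

# Example with random stand-ins for 128-D SIFT-like patch descriptors:
# train_desc = np.random.rand(5000, 128)
# vocab = build_vocabulary(train_desc, k=100)
# image_desc = np.random.rand(300, 128)
# h = bow_histogram(image_desc, vocab)                   # 100-D region histogram
```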
The mini-batch cost is used to estimate the development costs, and stochastic gradient descent is used to lower the cost function 𝐶 over 𝑁 mini-batches. The weights are then modified in the next iteration as follows, with 𝑊𝑙𝑡 denoting the weights at iteration t for convolutional layer 𝑙 and 𝐶̂ denoting the mini-batch cost: 𝑡𝑁 𝛾 𝑡 = 𝛾 [ ⁄𝑚] 𝑉𝑙𝑡+1 = 𝜇𝑉𝑙𝑡 − 𝛾 𝑡 ∝1 𝑊𝑙𝑡+1 =𝑊𝑙𝑡 + 𝑉𝑙𝑡+1 𝜕∁ 𝜕𝑊𝑙 (25) Where is the layer l learning rate, 𝛾 is the scheduled rate that decreases the initial training rate 𝛼 after a certain number of epochs, and 𝜇 is the momentum that determines the effects of earlier modified weights in the most recent edition. Every iteration of training updates the weights of the CNN layers using equation (6). There are 16 layers and 138 million weights that can be learned using the VGG16 framework. Overfitting in the training and development of such deep networks can be caused by the enormous local minima in equation (5). As a result, we needed to use the pre-trained VGG16 dataset to create the weights. For limited datasets, however, determining the right local minima for the cost function in equation (5) is particularly challenging, resulting in the overfitting of the network. In this case, 124 Informatica 47 (2023) 115–130 weights were pre-trained on the VGG16 model [39,40]. VGG16 was fine-tuned on the PDAC dataset after the weights were transferred. This design is discussed in Figure 7, which illustrates the VGG16's thirteen convolutional layers and three fully linked layers. If we use the layer-by-layer fine-tuning technique, adding one layer at a time will result in nineteen layers. It will be essential to use 95 VGG16 designs to fine-tune five-fold cross-validation. If the training duration for each structure is roughly thirty minutes, fine-tuning the VGG16 layer-by-layer will take more than a week. Determining the appropriate parameters for layer-wise fine-tuning will take a similar length of time. The findings were slightly improved with a layer-by-layer fine-tuning method. R.P.R. Chegireddy et al. hardware specifications like Ryzen 5/7 series CPU, NV GPU, 1 TB HDD, and Windows 10 OS and software specifications like PyTorch, an open-source python library for developing deep learning models, and Google Collaboratory, an open-source Google environment for building the model. Experimental evaluation is carried over models like Alexnet, Googlenet, Inception v3, VGG19, and Resnet50 over measures like accuracy, sensitivity, specificity, recall, precision, F1-score, detection rate, TPR, FPR, and computation time. Table 2 depicts the overall analysis of various models over 5 image instances under accuracy, sensitivity, and specificity. Figure 9 depicts the graphical representation of various models over the accuracy, sensitivity, and specificity. Table 3: Overall analysis under accuracy, sensitivity, specificity. 
4 Performance analysis

The proposed model was trained on 70% of the dataset and tested on the remaining 30%, with 10 epochs and a learning rate of 0.09. The model was implemented using hardware comprising a Ryzen 5/7 series CPU, an NVIDIA GPU, a 1 TB HDD, and the Windows 10 operating system, together with software comprising PyTorch, an open-source Python library for developing deep learning models, and Google Colaboratory, an online Google environment for building the model. The experimental evaluation compares models such as Alexnet, Googlenet, Inception v3, VGG19, and Resnet50 against the proposed approach over measures including accuracy, sensitivity, specificity, recall, precision, F1-score, detection rate, TPR, FPR, and computation time. Table 3 presents the overall analysis of the various models over five image instances under accuracy, sensitivity, and specificity; Figure 9 depicts the corresponding graphical representation.

Table 3: Overall analysis under accuracy, sensitivity, and specificity.

Image    Model         Accuracy  Sensitivity  Specificity
Image 1  Alexnet       81        85           87
         Googlenet     84        89           91
         Inception v3  88        91           93
         VGG19         87        92           95
         Resnet50      76        81           84
         VGG16         96        97           98
Image 2  Alexnet       81.3      85.4         87.1
         Googlenet     84.6      89.1         91.4
         Inception v3  88.2      91.4         93.3
         VGG19         87.4      92.5         95.2
         Resnet50      76.2      81.4         84.4
         VGG16         96.3      97.2         98.2
Image 3  Alexnet       81.5      85.7         87.3
         Googlenet     84.7      89.4         91.5
         Inception v3  88.4      91.7         93.6
         VGG19         87.6      92.7         95.4
         Resnet50      76.4      81.8         84.7
         VGG16         96.5      97.5         98.5
Image 4  Alexnet       81.8      85.8         87.6
         Googlenet     84.8      89.6         91.7
         Inception v3  88.6      91.8         93.8
         VGG19         87.7      92.9         95.6
         Resnet50      76.7      81.9         84.8
         VGG16         96.7      97.7         98.7
Image 5  Alexnet       82        86           87.8
         Googlenet     85        90           92
         Inception v3  89        92           94
         VGG19         87.9      93           95.7
         Resnet50      76.8      82           84.9
         VGG16         96.9      97.8         98.8

Figure 9: Models vs. measures, overall analysis under accuracy, sensitivity, and specificity.

Table 4 presents the overall analysis of the various models under precision, recall, and F1-score, and Figure 10 illustrates the corresponding graphical representation.

Table 4: Overall analysis under precision, recall, and F1-score.

Image    Model         Precision  Recall  F1-score
Image 1  Alexnet       83         74      83
         Googlenet     82         78      86
         Inception v3  87         82      81
         VGG19         85         84      87
         Resnet50      79         68      71
         VGG16         93         86      89
Image 2  Alexnet       83.4       74.2    83.4
         Googlenet     82.5       78.5    86.1
         Inception v3  87.2       82.1    81.4
         VGG19         85.3       84.3    87.5
         Resnet50      79.1       68.2    71.4
         VGG16         93.3       86.3    89.2
Image 3  Alexnet       83.6       74.4    83.6
         Googlenet     82.7       78.7    86.4
         Inception v3  87.5       82.6    81.6
         VGG19         85.5       84.4    87.7
         Resnet50      79.3       68.5    71.5
         VGG16         93.6       86.7    89.5
Image 4  Alexnet       83.7       74.7    83.7
         Googlenet     82.8       78.8    86.6
         Inception v3  87.8       82.8    81.7
         VGG19         85.7       84.7    87.8
         Resnet50      79.7       68.7    71.7
         VGG16         93.7       86.8    89.8
Image 5  Alexnet       84         75      84
         Googlenet     83         79      87
         Inception v3  87.9       83      82
         VGG19         86         85      88
         Resnet50      80         69      72
         VGG16         94         87      90

Figure 10: Models vs. measures, overall analysis under precision, recall, and F1-score.

Table 5 presents the overall analysis of the various models under detection rate, TPR, and FPR; Figure 11 depicts the corresponding graphical representation, in which the proposed model outperforms the others at a greater rate. Figure 12 depicts the computation time of the various models during the training period, and Figure 13 depicts output instances of the segmentation.

Table 5: Overall analysis under detection rate, TPR, and FPR.

Image    Model         Detection rate  TPR  FPR
Image 1  Alexnet       85              82   18
         Googlenet     83              81   19
         Inception v3  90              87   13
         VGG19         86              83   17
         Resnet50      78              73   27
         VGG16         95              92   8

Figure 11: Models vs. measures, overall analysis under detection rate, TPR, and FPR.

Figure 12: Models vs. computation time during the training period.

Figure 13:
5 Discussion

The purpose of this study is to demonstrate the effectiveness of MRI image analysis using the VGG16 model with the Harris hawks optimization (HHO) algorithm for segmentation and feature selection in pancreatic cancer classification from MRIs. Because MRI provides better contrast between fat, water, muscle and other soft tissues than CT, it generally offers good spatial resolution compared to other modalities. Reviews of previous studies [41] show that conventional MRI has a high degree of sensitivity and specificity for the detection of pancreatic tumors when the presence of the tumor is already suspected. The sensitivity of our proposed framework for the detection of pancreatic cancer was 96.34 on the test data set, and its precision, recall and F1-score were also high compared to the other approaches in the literature discussed in Table 2. Our framework is therefore comparable to the human ability to recognize such images. In an analysis of 225 asymptomatic patients at high risk of pancreatic cancer, Canto et al. (2012) [42] found that EUS (endoscopic ultrasound) had the highest tumor detection rate (42%), compared with CT (11%) and MRI (33%). For tumor detection, Tonozuka et al. (2021) [17] used EUS imaging, which yields a higher sensitivity and can detect smaller (Grade I) tumors. However, owing to the risks and convenience issues associated with EUS, MRI appears to be the better method; hence our model uses MRI images for the detection of pancreatic cancer. The classification accuracy of the proposed method is 93.52, compared with 87.52 for the VGG-19 image classification model of [18]. According to Fu et al. (2021) [19], the Inception model uses nuclear features that lead to false-positive results, which can be avoided by optimizing the selection of features. In our proposed model we use a VGG-16 framework with HHO-based CNNs and an HHO-based bag of visual words for feature extraction and selection, which improves accuracy even with fewer convolutional layers than the VGG-19 model [18]. In general, unlike computers, the human brain does not perform at its best when fatigued, stressed or limited in experience, which can result in misdiagnosis or in overlooking a lesion on an MRI. Artificial intelligence, on the other hand, can consistently provide reliable performance within a very short time, thereby compensating for the limitations of human capability and helping to prevent human error in clinical practice. Our framework can therefore be useful both for beginners learning MRI and for fatigued experts, or as a guard against the carelessness that can follow a large number of screenings. Additionally, the data set for this study included a variety of images, including those with hazy borders and unclear content, which are frequently seen in clinical exams. These images were enhanced using the Boosted Anisotropic Diffusion Filter (BADF) and Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithms to improve image quality and thus accuracy. Therefore, we believe that our system can detect diverse tumors by learning from the images and through the use of better image-enhancement techniques and optimal feature-selection strategies.
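As a concrete illustration of the CLAHE enhancement step mentioned in the discussion, the following minimal Python/OpenCV sketch applies CLAHE to a greyscale MRI slice. The clip limit, tile size and file names are assumptions for illustration only, and the BADF filter used in the paper is not reproduced here.

```python
# Minimal CLAHE sketch (assumed parameters, hypothetical file names).
import cv2

image = cv2.imread("mri_slice.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input slice
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)) # contrast-limited, tiled equalization
enhanced = clahe.apply(image)                                # locally equalized slice
cv2.imwrite("mri_slice_clahe.png", enhanced)
```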
6 Conclusion This paper brings an effective yet novel approach for pancreatic cancer detection at an earlier stage using deep learning. For this, initially, MRI data are collected from the popular repository CTAC-PDAC and with the help of CLAHE and BADF, preprocessing is done and then proceeded to segment cancer regions using UNet++. Further, for extracting quintessential features along with selection, the use of both HHO-based CNN and BOVW is done. Finally for effective use of transfer learning VGG16 is performed for detection. The proposed model outperforms better with 0.96% accuracy over state-of-the-art models under various measures. This paper will be helpful for another research specialist to dig deep and get to understand the stages and come up with better integrated and advanced models. References [1] National Cancer Institute (28 February 2021). Surveillance, Epidemiology, and End Results (SEER) Cancer Stat Facts: Pancreatic Cancer. Available online: https://seer.cancer.gov/statfacts/html/pancreas.htm l. [2] Siegel, R.L.; Miller, K.D.; Jemal, A. (2020) Cancer statistics, CA Cancer J. Clinicians. 2020, 70, 7–30. https://doi.org/10.3322/caac.21590. [3] Andersson.R., Vagianos.C.E.; Williamson.R.C. N (2004). Preoperative staging, and evaluation of resectability in pancreatic ductal adenocarcinoma. HPB (Oxford) 2004, 6, 5–12. https://doi.org/10.1080/13651820310017093 [4] Jonathan D Mizrahi, Rishi Surana, Juan W Valle, Rachna T Shroff (2020). Pancreatic cancer. The Lancet, 395, 2008–2020. https://doi.org/10.1016/s0140-6736(20)30974-0. [5] Giada.M.M.(2020). The ambiguous role of the inflammatory micromilieu in solid tumors. Pathology.41,118–123. https://doi.org/10.1007/s00292-020-00837-1. [6] Mayer, P.; Dinkic, C.; Jesenofsky, R.; Klauss, M.; Schirmacher, P.; Dapunt, U.; Hackert, T.; Uhle, F.; Hansch, G.M.; Gaida, M.M. (2018). Changes in the microarchitecture of the pancreatic cancer stroma are linked to neutrophil-dependent reprogramming of stellate cells and reflected by diffusion-weighted magnetic resonance imaging. Theranostics, 8, 13–30. https://doi.org/10.7150/thno.21089. [7] Grosse-Steffen, T.; Giese, T.; Giese, N.; Longerich, T.; Schirmacher, P.; Hansch, G.M.; Gaida, M.M. (2012). Epithelial-to-mesenchymal transition in pancreatic ductal adenocarcinoma and pancreatic tumor cell lines: The role of neutrophils and neutrophil-derived elastase. Clinical and Developmental Immunology. 1-12. https://doi.org/10.1155/2012/720768 [8] Gaida, M.M.; Steffen, T.G.; Gunther, F.; Tschaharganeh, D.F.; Felix, K.; Bergmann, F.; Schirmacher, P.; Hansch, G.M. (2012). Polymerphotonuclear neutrophils promote dyshesion of tumor cells and elastase-mediated degradation of Ecadherin in pancreatic tumors. Eur. J. Immunol. 42, 3369–3380. https://doi.org/10.1002/eji.201242628. [9] Verbeke, C.(2016). Morphological heterogeneity in ductal adenocarcinoma of the pancreas—Does it matter? Pancreatology. 16, 295–301. https://doi.org/10.1016/j.pan.2016.02.004. 128 Informatica 47 (2023) 115–130 [10] Hruban, R.H.; Adsay, N.V.; Albores-Saavedra, J.; Compton, C.; Garrett, E.S.; Goodman, S.N.; Kern, S.E.; Klimstra, D.S.; Kloppel, G.; Longnecker, D.S.; (2001). Pancreatic intraepithelial neoplasia: (A new nomenclature and classification system for pancreatic duct lesions). Am. J. Surg. Pathol.,25, 579–586. https://doi.org/10.1097/00000478200105000-00003. [11] Ren, B.; Liu, X.; Suriawinata, A.A. (2019) Pancreatic Ductal Adenocarcinoma and Its Precursor Lesions: Histopathology, Cytopathology, and Molecular Pathology. Am. J. 
Pathol.,189, 9–21. https://doi.org/10.1016/j.ajpath.2018.10.004. [12] Esposito, I.; Hruban, R.H.; Verbeke, C.; Terris, B.; Zamboni, G.; Scarpa, A.; Morohoshi, T.; Suda, K.; Luchini, C.; Klimstra, D.S.; et al. (2020). Guidelines on the histopathology of chronic pancreatitis. Recommendations from the working group for the international consensus guidelines for chronic pancreatitis in collaboration with the International Association of Pancreatology, the American Pancreatic Association, the Japan Pancreas Society, and the European Pancreatic Club.Pancreatology,20, 586–593. https://doi.org/10.1016/j.pan.2020.04.009. [13] Hanna, M.G.; Reuter, V.E.; Hameed, M.R.; Tan, L.K.; Chiang, S.; Sigel, C.; Hollmann, T.; Giri, D.; Samboy, J.; Model, C.; et al. (2019). Whole slide imaging equivalency and efficiency study: Experience at a large academic center. Modern Pathology.32, 916–928. https://doi.org/10.1038/s41379-019-0205-0 [14] Markl, B.; Fuzesi, L.; Huss, R.; Bauer, S.; Schaller, T. (2020). Number of pathologists in Germany: Comparison with European countries, USA, and Canada. Virchows Arch.478,2,335-341. https://doi.org/10.1007/s00428-020-02894-6 [15] Metter, D.M.; Colgan, T.J.; Leung, S.T.; Timmons, C.F.; Park, J.Y. (2019). Trends in the US and Canadian Pathologist Workforces From 2007 to 2017. JAMA Network Open, 2, e194337. https://doi.org/10.1001/jamanetworkopen.2019.433 7 [16] Jiang, Y.; Yang, M.; Wang, S.; Li, X.; Sun, Y. (2020). Emerging role of deep learning-based artificial intelligence in tumour pathology. Cancer Commun. (Lond.), 40, 154–166. https://doi.org/10.1002/cac2.12012 [17] Tonozuka, R., Itoi, T., Nagata, N., Kojima, H., Sofuni, A., Tsuchiya, T., ... & Mukai, S. (2021). Deep learning analysis for the detection of pancreatic cancer on endosonographic images: A pilot study. Journal of Hepato‐Biliary‐Pancreatic Sciences, 28(1), 95-104. https://doi.org/10.1002/jhbp.825. [18] Liu, K. L., Wu, T., Chen, P. T., Tsai, Y. M., Roth, H., Wu, M. S., ... & Wang, W. (2020). Deep learning to distinguish pancreatic cancer tissue from non-cancerous pancreatic tissue: a retrospective study with cross-racial external validation. The Lancet Digital Health, 2(6), e303-e313. R.P.R. Chegireddy et al. https://doi.org/10.1016/s2589-7500(20)30078-9 [19] Fu, Hao, et al. (2021). "Automatic pancreatic ductal adenocarcinoma detection in whole slide images using deep convolutional neural networks." Frontiers in oncology. 11. 2464. https://doi.org/10.3389/fonc.2021.665929 [20] Abbas, Sabah Khudhair, and Rusul Sabah Obied. (2021). "Novel Computer Aided Diagnostic System Using Synergic Deep Learning Technique for Early Detection of Pancreatic Cancer." Webology 18. Special Issue on Information Retrieval and Web Search. 367-379. https://doi.org/10.14704/web/v18si02/web18105 [21] Li, J., Qi, L., Chen, Q., Zhang, Y. D., & Qian, X. (2022). A Dual Meta-Learning Framework based on Idle Data for Enhancing Segmentation of Pancreatic Cancer. Medical Image Analysis,78. 102342. https://doi.org/10.1016/j.media.2021.102342 [22] Online source: https://www.google.com/url?sa=t&source=web&r ct=j&url=https://wiki.cancerimagingarchive.net/pl ugins/servlet/mobile%3FcontentId%3D21267608 %23content/view/21267608&ved=2ahUKEwjR0t C2KH3AhVUS2wGHcYYCBAQFnoECA4QAQ&u sg=AOvVaw0pVpGZHP1Z40YdRfI5vBnt. [23] Suman, G., Patra, A., Korfiatis, P., Majumder, S., Chari, S. T., Truty, M. J., ... & Goenka, A. H. (2021). Quality gaps in public pancreas imaging datasets: implications & challenges for AI applications. Pancreatology, 21(5), 1001-1008. 
https://doi.org/10.1016/j.pan.2021.03.016 [24] Rao, K., Bansal, M., & Kaur, G. (2022). RetinexCentred Contrast Enhancement Method for Histopathology Images with Weighted CLAHE. Arabian Journal for Science and Engineering, 47(11),13781-13798. https://doi.org/10.1007/s13369-021-06421-w [25] Rodríguez-Pena, A., Uranga-Solchaga, J., Ortiz-deSolórzano, C., & Cortés-Domínguez, I. (2020). Spheroscope: A custom-made miniaturized microscope for tracking tumour spheroids in microfluidic devices. Scientific Reports, 10(1), 112. https://doi.org/10.1038/s41598-020-59673-1 [26] Sawssen, B., Okba, T., & Noureeddine, L. (2022). A mammographic image classification technique via the Gaussian Radial Basis Kernel ELM and KPCA. Conference:International conference on Mathematics and Computers in Science and Engineering, at H10,spain. [27] Uplaonkar, D. S., & Patil, N. (2021). Ultrasound liver tumor segmentation using adaptively regularized kernel-based fuzzy C means with the enhanced level set algorithm. International Journal of Intelligent Computing and Cybernetics. 15(3),438-453.https://doi.org/10.1108/ijicc-102021-0223 [28] Iima, M., & Le Bihan, D. (2016). Clinical intravoxel incoherent motion and diffusion MR imaging: past, present, and future. Radiology, A Novel Method for Human Mri Based Pancreatic Cancer… 278(1), 13-32. https://doi.org/10.1148/radiol.2015150244 [29] Goyal, B., Dogra, A., Agrawal, S., Sohi, B. S., & Sharma, A. (2020). Image denoising review: From classical to state-of-the-art approaches. Information fusion, 55, 220-244. https://doi.org/10.1016/j.inffus.2019.09.003 [30] Long, J., Shelhamer, E., Darrell, T. (2015): Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440. https://doi.org/10.1109/cvpr.2015.7298965 [31] Ronneberger, O., Fischer, P., Brox, T. (2015): UNet: convolutional networks for biomedical image segmentation. LNCS, vol. 9351, pp. 234–241. Springer, Cham. https://doi.org/10.1007/978-3-31924574-4_28 [32] Zhang, L., Shi, Y., Yao, J., Bian, Y., Cao, K., Jin, D., ... & Lu, L. (2020). Robust pancreatic ductal adenocarcinoma segmentation with multiinstitutional multi-phase partially-annotated CT scans. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, 491-500 https://doi.org/10.1007/978-3-030-59719-1_48 [33] Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., & Liang, J. (2018). Unet++: A nested u-net architecture for medical image segmentation. Deep learning in medical image analysis and multimodal learning for clinical decision support (pp. 3-11). https://doi.org/10.1007/978-3-03000889-5_1 [34] Basha, J., Bacanin, N., Vukobrat, N., Zivkovic, M., Venkatachalam, K., Hubálovský, S., & Trojovský, P.(2021). Chaotic Harris hawks optimization with quasi-reflection-based learning: An application to enhance CNN design. Sensors, 21(19), 6654. https://doi.org/10.3390/s21196654 [35] Thaher, T., Heidari, A. A., Mafarja, M., Dong, J. S., & Mirjalili, S. (2020). Binary Harris Hawks optimizer for high-dimensional, low sample size feature selection. Algorithms for Intelligent Systems.251-272. https://doi.org/10.1007/978-98132-9990-0_12 [36] Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., & Greenspan, H. (2018). Chest pathology identification using deep feature selection with non-medical training. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 6(3), 259263. 
https://doi.org/10.1080/21681163.2016.1138324 Informatica 47 (2023) 115–130 129 [37] Chaib, S., Gu, Y., & Yao, H. (2015). An informative feature selection method based on sparse PCA for VHR scene classification. IEEE Geoscience and Remote Sensing Letters, 13(2), 147-151. https://doi.org/10.1109/lgrs.2015.2501383 [38] Huang, M., Yang, W., Yu, M., Lu, Z., Feng, Q., & Chen, W. (2012). Retrieval of brain tumors with region-specific bag-of-visual-words representations in contrast-enhanced MRI images. Computational and mathematical methods in medicine.1-17. https://doi.org/10.1155/2012/280538 [39] Baldota, S., Sharma, S., & Malathy, C. (2021, July). Deep Transfer Learning for Pancreatic Cancer Detection. In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT) 1-7. https://doi.org/10.1109/icccnt51525.2021.9580000 [40] Sehmi, M. N. M., Fauzi, M. F. A., Ahmad, W. S. H. M. W., & Chan, E. W. L. (2021). Pancreatic cancer grading in pathological images using deep learning convolutional neural networks. F1000Research, 10(1057), 1057. https://doi.org/10.12688/f1000research.73161.1 [41] Costache, M. I., et al (2017). "Which is the best imaging method in pancreatic adenocarcinoma diagnosis and staging-CT, MRI or EUS?" Current health sciences journal 43(2).132-136. http://dx.doi.org/10.12865/CHSJ.43.02.05 [42] Canto, Marcia Irene, et al. (2012). "Frequent detection of pancreatic lesions in asymptomatic high-risk individuals." Gastroenterology 142(4) 796-804. https://doi.org/10.1053/j.gastro.2012.02.029 130 Informatica 47 (2023) 115–130 R.P.R. Chegireddy et al. Informatica 47 (2023) 131–140 131 https://doi.org/10.31449/inf.v47i1.3902 Assessing Mental Health Crisis in Pandemic Situation with Computational Intelligence Megha Rathi, Adwitiya Sinha*, Siddhant Tulsyan, Avishka Agarwal, Anushka Srivastava Dept. of Comp. Sc. & Engineering and Information Technology Jaypee Institute of Information Technology, India E-mail: megha.rathi@jiit.ac.in, mailtoadwitiya@gmail.com, siddhant05tulsyan@gmail.com, avishka2404@gmail.com, anushka.srivastava2398@gmail.com *Corresponding author Keywords: mental health crisis, healthcare information management, computational intelligence, machine learning, synthetic minority oversampling, covid-19, biomedical informatics. Received: January 6, 2022 The coronavirus pandemic has created huge emotional distress and increased the risk of psychiatric problems. This happened owing to imposition of necessary stringent healthcare measures that infringed personal space, emotional freedom, and caused financial loss. Our physical well-being is directly associated with mental fitness and health. From analysis it has been found that feature like struggling in concentration and memory, visionary issues, and arthritis are customary symptoms in patients suffering from mental crises. Our proposed research work aims to find out the reasons behind mental illness and ways to improve mental disorders using supervised approach. The main focus is to develop a smart computationally intelligent model to assist healthcare practitioners in analysing and diagnosing severe mental illness. Our proposed model assists in analysing causes of mental disorder and aids in reducing total medicinal cost along with reduced mental illness rate. Additionally, a recommendation system is also developed for diagnosing depressive patients. Povzetek: Opisana je inteligentna metoda za pomoč pri mentalnih boleznih, povezanih s pandemijami. 
1 Introduction

The public health emergencies imposed during the Covid-19 pandemic caused distress in communities at large. The mandating of sudden and unfamiliar public safety norms caused emotional distress among people [1]. As the normal course of living was severely encroached upon by home confinement and social distancing, many cases of mental health crisis started to erupt. Moreover, people who suffered from recurrent ailments during the pandemic became even more vulnerable to psychiatric problems and other severe health problems. As a result, the yearly number of medical cases for mental disorders started increasing globally, and it became essential to reveal the root causes of mental disorders, including anxiety, depression and many other adverse psychosocial conditions [2]. Moreover, the total expenditure for treating patients also increased, including the restorative cost of treatment. In order to understand these ballooning costs, several large-scale epidemiological studies are being conducted to provide information on the health of United States citizens. One such study, the Behavioural Risk Factor Surveillance System (BRFSS), conducts surveys to collect uniform data on health risk behaviours, chronic diseases, access to healthcare information and the use of preventative medical services in the United States [3]. This survey provides valuable information on behavioural patterns which, coupled with current big data and machine learning techniques, may help to provide valuable insights into persons at risk of mental health crises. By targeting and understanding these populations, preventative health measures could be put into place to ultimately help lower health care costs in the United States. Adults with depression and anxiety are significantly more likely to smoke, to be obese, to be physically inactive, to binge drink, and to drink more heavily than those who do not display any symptoms of depression and anxiety. Additionally, a dose-dependent relation exists between the severity of depression and smoking intensity, obesity and physical inactivity, with more depressed individuals engaging more heavily in such behaviours. A study of the 2012 Behavioural Risk Factor Surveillance System (BRFSS) data found significant relationships between depression and childhood mental illness, limited usual activity, and abuse [4]. In the proposed research work a J48 classification tree is used to predict depression with 82% accuracy, using these predictive attributes. Our research aims to create a solid foundation for the use of machine learning in helping to predict mental crises from multiple health attributes.

2 Background study

Extensive research and case studies have been conducted on assessing the acuteness of emotional distress and forecasting mental health crises. The authors in [5] thoroughly discuss the major stressors caused by quarantine and isolation measures and different ways to reduce their impact. In [6], several tools and measures are suggested for measuring the psychological impact of the Covid-19 pandemic. Moreover, technicians often face scarcity and imbalance in healthcare data, which poses a major challenge for training models and supervised learning. This is taken up by the authors in [7], who deal with the development of classifiers from imbalanced datasets. A dataset is considered imbalanced when the classes are not represented in roughly similar proportions.
Real-world data sets are frequently composed predominantly of normal examples, with only a small proportion of abnormal or interesting ones. It is also often the case that the cost of misclassifying an abnormal (interesting) example as a normal one is much higher than the cost of the reverse error. The authors demonstrated that over-sampling the minority (abnormal) class combined with under-sampling the majority (normal) class can achieve better classifier performance in Receiver Operating Characteristic (ROC) space than under-sampling the majority class alone. In another novel work, SMOTE-RSB, which combines the Synthetic Minority Oversampling Technique (SMOTE) with Rough Set Theory (RST), is proposed, based on oversampling and undersampling for highly imbalanced data sets [8]. SMOTE-RSB is a hybrid data pre-processing approach that handles imbalanced data sets through the generation of new samples, using SMOTE together with an editing method based on RST and the lower approximation of a subset. The proposed technique was validated by an experimental study showing good results with C4.5 as the learning algorithm. Multi-label learning has become an increasingly active area in the machine learning community, since a wide variety of real-world problems are naturally multi-labelled. SMOTE is an oversampling technique that has been applied successfully for balancing single-labelled data sets, but it had not previously been used in multi-label settings. In this regard, the authors in [9] propose and compare several strategies for generating synthetic examples to balance data sets in the training of multi-label algorithms; the results show that a correct selection of seed samples for oversampling improves the classification performance of multi-label algorithms. In yet another work [10], the authors examined the overall healthcare costs associated with depression and anxiety among primary care patients. Out of 2,110 consecutive primary care patients in a health maintenance organization, 1,962 people were screened with the 12-item General Health Questionnaire; 615 patients were selected for diagnostic assessment, the Composite International Diagnostic Interview was administered to 373 patients, and 328 were re-examined 12 months later. Electronic cost records were used to compute total healthcare costs for the six-month period surrounding the baseline evaluation and a comparable period surrounding the follow-up assessment. The cost differences reflected higher use of general medical services rather than higher mental health treatment costs. In [11], the computerized record systems of a large staff-model health maintenance organization (HMO) were used to identify consecutive primary care patients with visit diagnoses of depression and a comparison sample of primary care patients with no depression diagnosis. Similar cost differences were observed for each of the subgroups examined (treatment with antidepressants, treatment without antidepressants, and patients diagnosed at routine physical examination visits).
Pharmacy records showed greater chronic medical illness in the diagnosed depression group, but large cost differences remained after adjustment ($3,971 versus $2,644). Two-fold cost differences persisted for at least a year after the start of treatment. The authors concluded that a diagnosis of depression is associated with a generalized increase in the use of health services that is only partly explained by comorbid illnesses. The authors of [12] administered a questionnaire to 367 patients with type-1 and type-2 diabetes from the primary care clinics of two healthcare information management organizations, to obtain information on socioeconomics, depressive symptoms, diabetes knowledge, functioning, and diabetes self-care. Based on computerized data, they quantified medical comorbidity, healthcare costs, glycosylated haemoglobin (HbA1c) levels, and oral hypoglycaemic prescription refills. Using tertiles of depressive symptom severity (low, mid-range, or high), they performed regression analyses to determine the effect of depressive symptoms on adherence to diabetes self-care and oral hypoglycaemic regimens, HbA1c levels, functional impairment, and healthcare costs. Compared with patients in the low-severity depressive symptom tertile, those in the medium- and high-severity tertiles were significantly less adherent to dietary recommendations. Further studies testing the effectiveness and cost-effectiveness of enhanced models of care for diabetic patients with depression are required. In another contribution in the field of mental illness, the authors address imbalanced learning problems, which involve an unequal distribution of data samples among the classes and pose a challenge to any classifier, as it becomes difficult to learn the minority class samples [13]. The paper observes that most existing oversampling techniques may generate wrong synthetic minority samples in certain situations and make the learning task harder. To overcome this, the Majority Weighted Minority Oversampling Technique (MWMOTE) is introduced for efficiently handling such learning problems. MWMOTE first identifies the hard-to-learn, informative minority class samples and assigns them weights according to their Euclidean distance from the nearest majority class samples. In another contribution, the authors present a novel Cluster-Based Synthetic Oversampling (CBSO) algorithm [14]. CBSO takes its basic idea from existing synthetic oversampling methods and incorporates unsupervised clustering in its synthetic data generation scheme. One of the core machine learning algorithms that has achieved success in health analytics is the Support Vector Machine (SVM); its statistical properties make it suitable for handling many types of medical datasets. In numerous settings there is also the option of using pool-based active learning. Active learning with support vectors, i.e., a procedure for choosing which examples to query next, is examined in [15].

Table 1: Summarized application of machine learning techniques in mental health analysis.

S.No. Author, Year | Objective | Approach | Results
O. Oyebode, F. Alqahtani and R. Orji, 2020 [24]. In this recent study the authors analyzed mental health apps.
They have evaluated online available 104 mental health apps and perform sentiment analysis on reviews. Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), and Random Forest (RF). 2. Ela Gore, Sheetal Rathi, 2019 [25]. In this work, authors surveyed researches done for the applicability of machine learning for mental heal analysis. This paper surveyed numerous machine and deep learning models as SVM, K-Nearest Neighbor (KNN), Random Tree, Convolution Neural Network (CNN), Recurrent Neural Network (RNN) etc. F1 Score and accuracy is compared and it is found that SGD achieved the best overall F1 score of 89.42 then followed by SVM, and LR. From the survey it is concluded that SVM with their different kernels and CNN models utilized in many of the research work. They also give better results in terms of parameters like accuracy, etc. 3. Sabourin, A. A., Prater, J. C., & Mason, N. A., 2019 [26]. 4. Hou, Y., Xu, J., Huang, Y., & Ma, X. ,2016 [27]. 1. In today’s competitive era students are at high mental health risk. Authors compared the mental health status of pharmacy students to other university students. This one is another significant work done for analyzing mental health profile of students. It targets to find association between reading habits of students and depression induced due to reading Computational techniques like SVM, Naive Bayes (NB), KNN, and Random Forest (RF) used. RF achieves precision approximately equal to 83.33%, NB 71.42%, SVM 85.71% and KNN 55.55%. Compare algorithms like SVM, KNN, Decision Tree DT, Artificial Neural Network (ANN) , and Bayesian Classifier. Most Accurate classifier is SVM with 82% accuracy. Gokten, E. S., & Uyulan, C. ,2021 [28]. Advanced machine learning techniques are applied to predict psychiatric disorders Random Forest is used and applied on a record of 482 children. Following results were obtained for kids with mental disorder: Accuracy= 72%, F1-Score=71%, Precision= 72%, and Recall= 71%. 6. Xin, Y., Ren, X, 2022 [29]. Purpose of this research work is to forecast the psychiatric illness amongst old age people from the aspects like health profile, relationship with family, social behaviour, demographic location, and behaviour of health. This paper used the random forest classifier to predict the depression of old age people. The psychiatric disorder of rural old age grouped was 57.67%, and that of urban was 44.59%. 7. Srividya, M., Mohanavalli, S., & Bhalaji, N., 2018 [30]. Application of numerous machine learning techniques to identify mental health is the main objective of this work. Logistic Regression (LR), SVM, NB, DT, KNN, RF, and Bagging. Highest Accuracy achieved by ensemble approach Bagging (90%) and RF (90%) followed by SVM (89%) and KNN (89%). 8. Tate, A. E., McCabe, R. C., Larsson, H., Lundström, S., Lichtenstein, P., & Kuja-Halkola, R., 2020 [31]. A Machine Learning Model is developed and compared for predicting mental illness in adolescence. All techniques are explored based on statistical evaluation parameters. RF, XGBoost, Neural Network (NN), logistic regression (LR), neural network and SVM. Models compared using Area under Curve (AUC) and it is noticed that SVM and RF had highest AUC’s equals to 0.754. 5. 134 Informatica 47 (2023) 131–140 9. Reddy, U. S., Thota, A. V., & Dharun, A., 2018 [32]. Stress patterns are analyzed in working professionals using machine learning techniques in order to highlight the factors that strongly affect the stress level. 
LR, KNN, DT, Boosting, Bagging and RF were compared; from the results it is concluded that the ensemble boosting approach achieves the highest accuracy of 75.13%.

In another work, a comparative analysis of various computational intelligence techniques for the diagnosis of diseases such as heart disease, diabetes, dengue and hepatitis is presented [16]. The main emphasis of this review is to highlight the importance of machine learning techniques for decision support systems and diagnostics. In yet another work, the authors highlight the major mental health issues and explore treatment coverage country by country [17]. In another survey [18], the author discusses the importance and significance of smart devices for assessing anxiety, stress and depression. Various work has been done in the area of health informatics on finding and extracting valuable insights using machine learning techniques [19, 20]. From these studies it can be concluded that machine learning plays a significant role in extracting and predicting health outcomes [21-23, 41-43]. In our research initiative, a supervised machine learning approach is used to build a computationally efficient model that addresses the mental health crisis in society. Our proposed model ensures biomedical applicability by aiding doctors in providing reliable healthcare service delivery to patients with mental health issues. The related work in the domain of analysing mental illness is summarised in Table 1.

3 Proposed methodological framework

In our research, the BRFSS dataset was considered, which required downstream analysis. This called for data scrubbing and pre-processing techniques to clean and prepare the data for experimentation. Various machine learning algorithms were then applied to the cleaned data set and their accuracies evaluated. A recommendation system was built on top of this model using a Shiny web application [33, 34].

Figure 1: Structural flow of the proposed framework.

3.1. Data collection

The Behavioural Risk Factor Surveillance System (BRFSS) is a random annual phone-based survey that tracks health risk behaviours, chronic diseases, access to health care, and the use of preventative healthcare services in the United States, and is freely available [4]. The most recent data year (2016) was used for this project, containing 450 attributes and 486,303 records. All questions asked in the survey (the attributes) are available in [3]. Mental illness was characterized by individuals who had current anxiety and depression, a lifetime depression diagnosis, and/or a lifetime anxiety diagnosis, and the class attribute (Mental Crisis) was compiled from these answers.

3.2. Data processing & scrubbing

Data scrubbing is the action required to remove repeated, incorrect and improperly formatted data from the dataset [35-37]. We renamed the data frame to prevent overwriting the original file and identified the column names. The attributes of the original data were written in short forms that were not easy to comprehend; these attribute names were expanded to make more sense of the data, which helped in reading the data and in connecting a patient's different habits with their mental status. Since there were 450 attributes, those that were not needed were removed: attributes with no relevant meaning or practical significance, such as telephone number, address and number of family members, amounting to 60 columns, were removed. The record identification column was also removed from the database, as it is unnecessary for downstream analysis. The dataset contained 6,17,07,536 (about 61.7 million) NA values; this number was large enough to interfere with the various machine learning algorithms. The survey contained answer choices such as none (88), do-not-know (7) and refused (9), which were replaced with NA as they do not contribute to prediction. To normalize the data set, all NA values were then replaced by the means of their respective columns. Several attributes were explored. The counts of no and yes in the output column (depressive) were checked to determine the proportion of no to yes, which came out to be 1:4; because of the low count of no, model prediction was not very accurate. Since the data were very large, the data set was, owing to computational limitations, sub-sampled to 10% of the original, and we made sure the ratio of no to yes did not change in the sub-sampled data, so that the smaller data set remains representative of the whole. Data scrubbing also included removing incomplete attributes (i.e., those with more than 25% unanswered responses) and transforming attributes for downstream processing.
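The scrubbing steps described above can be illustrated with a short pandas sketch. This is not the authors' code: the file name, the placeholder column names and the class-attribute name MentalCrisis are assumptions, and in the real BRFSS data the missing-value codes are question-specific rather than global.

```python
# Illustrative scrubbing of a BRFSS-style data frame (assumed names and paths).
import numpy as np
import pandas as pd

df = pd.read_csv("brfss_2016.csv")                        # hypothetical export of the survey

# Drop attributes with no analytical meaning (telephone number, address, ...).
irrelevant = ["TELNUM", "ADDRESS", "NUMHHOL2"]            # placeholder column names
df = df.drop(columns=[c for c in irrelevant if c in df.columns])

# Codes such as 88 (none), 7 (don't know) and 9 (refused) carry no signal;
# here they are recoded globally for simplicity.
df = df.replace({88: np.nan, 7: np.nan, 9: np.nan})

# Remove incomplete attributes (>25% unanswered), then fill the remaining
# NA values with the column means, as described in the text.
df = df.loc[:, df.isna().mean() <= 0.25]
df = df.fillna(df.mean(numeric_only=True))

# Sub-sample 10% of the records while preserving the yes/no class ratio.
subsample = (df.groupby("MentalCrisis", group_keys=False)  # assumed class column
               .apply(lambda g: g.sample(frac=0.10, random_state=42)))
```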
Data pre-processing is applied to transform the raw data into a format that is easily understandable and to improve classifier performance [38]. The Synthetic Minority Oversampling Technique (SMOTE) was used to combat the imbalanced class distribution and to adjust the yes-to-no ratio in the sub-sampled dataset. Figure 2 shows the number of instances in each class (yes and no) before and after SMOTE. This strategy allows us to adjust the class distribution, removing any bias that might spoil our downstream analyses. Imbalanced classification problems cause difficulties for many learning algorithms; they are characterized by an uneven proportion of the cases available for each class, and SMOTE is a well-known algorithm for tackling this problem. In addition, the majority-class examples are under-sampled, leading to a more balanced dataset. The parameters perc.over and perc.under control the amount of over-sampling of the minority class and under-sampling of the majority class, respectively. perc.over is typically a number above 100; with such values, new cases of the minority class are created for each case in the dataset belonging to that class. If perc.over is below 100, a single case is generated for a randomly selected proportion (given by perc.over/100) of the cases belonging to the minority class in the original dataset. The parameter perc.under controls the proportion of majority-class cases that are randomly selected for the final balanced dataset; this proportion is calculated with respect to the number of newly generated minority-class cases. The parameter k controls the way the new examples are created: they are generated using information from the k nearest neighbours of each minority-class case, and k determines how many of these neighbours are used. This produces a random set of minority-class observations, using bootstrapping and the k nearest neighbours of each data point. It reduces the bias towards the majority class while ensuring that the new minority-class examples are representative of the existing ones. In this work k was set to 5 and perc.over to 110. As Figure 2 shows, the initial number of no instances was 10,000, while that of yes was 40,000; after applying SMOTE, the number of no instances increased to 18,000 and the number of yes instances decreased to 12,000.

Figure 2: Comparison of the number of instances per class (yes/no) (a) before and (b) after applying SMOTE.
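For illustration, the following Python sketch performs the same kind of minority-class oversampling with imbalanced-learn's SMOTE on a toy imbalanced data set. The paper itself uses the R implementation with perc.over = 110, perc.under and k = 5; k_neighbors is the closest equivalent parameter here, and the toy data, ratios and variable names are ours.

```python
# Illustrative SMOTE balancing on synthetic data (assumed stand-in for the BRFSS features).
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data with roughly a 4:1 class ratio, mirroring the no/yes imbalance.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.8, 0.2], random_state=42)
print("before:", Counter(y))

smote = SMOTE(k_neighbors=5, random_state=42)   # k = 5 neighbours, as in the text
X_res, y_res = smote.fit_resample(X, y)         # synthetic minority examples added
print("after :", Counter(y_res))
```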
To further clean the dataset, a Pearson correlation test was used to determine the correlation between each feature and the class attribute; attributes with less than 10% correlation were discarded from downstream analysis. The Pearson correlation coefficient (r) is an estimate of the strength of the association between two given variables and takes values in [-1, 1]. If both variables increase and decrease together, the correlation is positive; if the value of one variable decreases as the other increases, or vice versa, the correlation is negative.

3.3. Data classification

After the dataset had been pre-processed and cleaned, machine learning algorithms were applied and their accuracy examined. Supervised algorithms such as k-nearest neighbour, Random Forest, Decision Tree and SVM were applied. SVM gave an accuracy of 65%, while KNN gave a precision of 70.16%. Random Forest achieved an average accuracy of 80% when n_tree was set to 100. The best result was achieved with the Decision Tree, which gave 81.19% precision. The corresponding confusion matrix is given in Table 3. The decision tree was built using 10-fold cross-validation. The chosen algorithm, C4.5 (J48), was built using a multistep process on the attributes presented in Table 2. First, the single variable that best splits the data into two groups was found. Second, the data were split, and the procedure was repeated recursively until the subgroups either reached a maximum size of 5 or no further improvement was possible. This approach uses a splitting criterion known as the gain ratio, and the tree was pruned using a bottom-up technique known as error-based pruning. Finally, precision and the Area Under the Curve (AUC) were assessed to determine the reliability of the final tree and model. The AUC of the Receiver Operating Characteristic (ROC) curve is a good measure of the performance of a model; it ranges from 0.5 (the model performs no better than random chance) to 1 (the model fully explains the response within the test set).

3.4. Building recommendation model

A recommendation system was compiled to provide a user-interface program for use by doctors when their patients are in the examination room. We developed this interface as a Shiny web application. The visualization gives insights into how habits such as smoking, sleeping and remembering can affect mental health. All responses of the user are recorded and scaled. We selected the 6 questions with the highest gain ratio in our decision tree model. These questions are as follows: Have you visited a doctor for routine check-ups in the last 6 months?
Do you have memory loss issues, Concentration Issues, or Trouble in finalizing decisions? Do you have diabetes? Medical history of disease like: arthritis, lupus, fibromyalgia, or gout. Do you have any visionary impairment? • Details of health policies of patient. Whether person is under health cover or not? These questions were clustered further with other six questions whose correlation coefficient came out to be more that 10%. Response to every question was grouped with these questions to give an average depressive score. If the average depressive score is more than 50% then it represents that population in this cluster is more likely to be depressive. Table 2: Class-view & multiple attributes in mental health dataset. S.No. Attribute 1. General Health 2. Multiple Healthcare Professionals 3. 5. 6. 7. Cost prohibiting seeing a doctor Participate in physical activities or exercise in past month Having disease Asthma Having disease COPD Having disease Arthritis 8. Time of last visit to dentist/ dental clinic 9. Number of permanent teeth removed 10. Gender of Respondent 11. Marital status 12. Education level 13. Own/Rented home 14. Employment status 15. 16. 17. 18. 19. 20. Blind/difficulty in seeing Difficulty in remembering/concentrating Difficulty walking/climbing stairs Difficulty dressing/bathing Difficulty doing errands alone Smoked at least 100 cigarettes in entire life 21. Frequency of days currently smoking in month 4. Values 1. Excellent; 2. Very Good; 3.Good; 4. Fair; 5. Poor 1. Only one; 2.More than one; 3. None 1. Yes; 2. No Correlation with class attribute -0.295607016 0.103815018 0.165653515 1. Yes; 2. No -0.158701513 1. Yes; 2. No 0.100175029 0.186787737 0.237111272 1. Yes; 2. No 1. Yes; 2. No 1. within the year; 2.within past 2 years; 3. within past 5 years; 4. five or more years ago 1. 1-5; 2. 6 or more; All; 4. None 1. Male; 2. Female 1. Married; 2. Divorced; 3. Widowed; 4. Separated; 5. Never Married 1. Never attended; 2. Elementary; 3. Some High School; 4. High School Graduate; 5. Some College/ Technical; 6. College Graduate 1. Own; 2. Rented; 3. Other Arrangement 1. Employed; 2. Self-Employed; 3. Out of work for more than one year; 4. Out of work for less than one year; 5. Home maker; 6. Student 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 1. Every day; 2. Some days; 3. Not all days -0.149358224 0.139458736 -0.248546199 -0.136709092 0.118850168 -0.251743782 -0.38432588 0.251269243 0.442148749 0.215583219 0.124759584 0.245702066 0.109147887 0.133169638 Assessing Mental Health Crisis in Pandemic Situation with… 22. 23. 24. 25. 26. 4 Have delayed getting medical care Been without healthcare services in past 12 months Activity has been limited due to health problems Having health problems that require special equipment Been diagnosed with depressive disorder (class attribute) Informatica 47 (2023) 131–140 137 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 1. Yes; 2. No 0.16235243 0.184359727 0.191887956 0.1665181842 1 Results & discussions The correlation values between various attributes and ’depressive’ show common symptoms that a patient might be dealing with in mental crises during the pandemic lockdown [39]. The results for symptoms, including difficulty in concentrating or remembering, blindness and arthritis is shown in figures 5-7. These symptoms are quite common in a person suffering from a mental crisis. Correlation coefficients of these attributes were 0.442148749, 0.251269243 and 0.215583219 respectively. 
Further, Figure 8 illustrates the relationship between the depressive attribute and health coverage. We have also compared our results with other existing work in the same domain, listed in Table 1. The findings of this model support the results of previous studies, strongly connecting depressive disorders with the degree of periodontal disease, and suggesting that a negative association between tooth brushing and dental check-ups and depression may exist. No other existing work derives the correlations among attributes as we do in the proposed work, which is quite effective in highlighting the positive and negative features that directly or adversely impact the outcome. The best accuracy was achieved by the Decision Tree, which gave 81.19% precision. The proportion of true positives came out to be 34.9%, while that of true negatives was 46.2%. The low FN rate is essential in a working model, as the cost of misclassifying a mental disease is much higher than the cost of misclassifying a non-mental disease. The features with the highest information gain give interesting insights into the respondents' behaviours in this study.

Figure 5: Relationship between depression and difficulty in concentration.
Figure 6: Relationship between depression and blindness.
Figure 7: Relationship between depression and arthritis.
Figure 8: Relationship between depression and health coverage.

A confusion matrix summarizes the performance of the model [40]. The confusion matrix for the Decision Tree is presented in Table 3, and the model accuracy (calculated as true observations / all observations) was 81.07%.

Table 3: Confusion matrix for the proposed mental health model (values in %).

        yes    no
yes     34.9   3.8
no      15.1   46.2

Table 4 presents all the precision scores in descending order; it is clear from the table that the Decision Tree outperforms all the other algorithms.

Table 4: Comparative analysis of precision scores.

Algorithm       Precision score
Decision Tree   81.1935%
Random Forest   80.1265%
KNN             70.6312%
SVM             65.3542%

For our proposed model, the AUC was 0.83, as shown in Figure 9. The initially flat line depicts poor precision of the model; once the specificity reaches a value of around 0.6, the curve rises to a maximum value of 0.83, showing that the model has reached its maximum accuracy, after which the curve remains constant. The decision tree of the observed parameters is shown in Figure 10.

Figure 9: AUC of the receiver operating characteristic.
Figure 10: Truncated decision tree outcome.

5 Conclusion & future research directions

Our research initiative addresses the ever-increasing crisis arising from mental-health-related ailments, especially in the Covid-19 pandemic situation. A set of supervised algorithms, including k-nearest Neighbour, Random Forest, Decision Tree and SVM, was applied. Our proposed framework based on this model can help biomedical specialists rapidly identify at-risk patients, leading both to higher rates of preventive healthcare and to earlier intervention, ultimately lowering the healthcare costs associated with treating depression and anxiety in the country. Future work should concentrate on increasing the overall accuracy of the model to guarantee reliability while giving practitioners guidance with respect to their mental-health patients.

6 References

[1] Pfefferbaum, Betty, and Carol S. North.
"Mental health and the Covid-19 pandemic." New England Journal of Medicine 383, no. 6 (2020): 510-512. [2] Schäfer, Sarah K., M. Roxanne Sopp, Christian G. Schanz, Marlene Staginnus, Anja S. Göritz, and Tanja Michael. "Impact of COVID-19 on public mental health and the buffering effect of a sense of coherence." Psychotherapy and Psychosomatics 89, no. 6 (2020): 386-392. [3] Bish, Connie L., Heidi MichelsBlanck, Mary K. Serdula, Michele Marcus, Harold W. Kohl III, and Laura Kettel Khan. "Diet and physical activity behaviors among Americans trying to lose weight: 2000 Behavioral Risk Factor Surveillance System." Obesity research 13, no. 3 (2005): 596-607. [4] Centers for Disease Control and Prevention. "Behavioral risk factor surveillance system questionnaire." System 83, no. 12 (2011): 76. [5] Brooks, Samantha K., Rebecca K. Webster, Louise E. Smith, Lisa Woodland, Simon Wessely, Neil Greenberg, and Gideon James Rubin. "The psychological impact of quarantine and how to reduce it: rapid review of the evidence." The Lancet 395, no. 10227 (2020): 912-920. [6] Cortez, Pedro Afonso, Shijo John Joseph, Nileswar Das, Samrat Singh Bhandari, and Sheikh Shoib. "Tools to measure the psychological impact of the COVID-19 pandemic: What do we have in the platter?" Asian Journal of Psychiatry 53 (2020): 102371. [7] Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. "SMOTE: synthetic minority over-sampling technique." Journal of Artificial Intelligence Research 16 (2002): 321-357. [8] Ramentol, Enislay, Yailé Caballero, Rafael Bello, and Francisco Herrera. "SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and under sampling for high imbalanced data-sets using SMOTE and rough sets theory." Knowledge and information systems 33, no. 2 (2012): 245-265. [9] Giraldo-Forero, Andrés Felipe, Jorge Alberto Jaramillo-Garzón, José Francisco Ruiz-Muñoz, and César Germán Castellanos-Domínguez. "Managing imbalanced data sets in multi-label problems: a case study with the SMOTE algorithm." In Iberoamerican Congress on Pattern Recognition, pp. 334-342. Springer, Berlin, Heidelberg, 2013. [10] Simon, Gregory, Johan Ormel, Michael Von Korff, and William Barlow. "Health care costs associated with depressive and anxiety disorders in primary care." American Journal of Psychiatry 152, no. 3 (1995): 352-357. [11] Simon, Gregory E., Michael Von Korff, and William Barlow. "Health care costs of primary care patients with recognized depression." Archives of general psychiatry 52, no. 10 (1995): 850-856. Informatica 47 (2023) 131–140 139 [12] Ciechanowski, Paul S., Wayne J. Katon, and Joan E. Russo. "Depression and diabetes: impact of depressive symptoms on adherence, function, and costs." Archives of internal medicine 160, no. 21 (2000): 3278-3285. [13] Barua, Sukarna, Md Monirul Islam, Xin Yao, and Kazuyuki Murase. "MWMOTE--majority weighted minority oversampling technique for imbalanced data set learning." IEEE Transactions on knowledge and data engineering 26, no. 2 (2012): 405-425. [14] Barua, Sukarna, Md Monirul Islam, and Kazuyuki Murase. "A novel synthetic minority oversampling technique for imbalanced data set learning." In International Conference on Neural Information Processing, pp. 735-744. Springer, Berlin, Heidelberg, 2011. [15] Tong, Simon, and Daphne Koller. "Support vector machine active learning with applications to text classification." Journal of machine learning research 2, no. Nov (2001): 45-66. [16] Fatima, Meherwar, and Maruf Pasha. 
"Survey of machine learning algorithms for disease diagnostic." Journal of Intelligent Learning Systems and Applications 9, no. 01 (2017): 1. [17] T. Kolenik and M. Gams, "Persuasive Technology for Mental Health: One Step Closer to (Mental Health Care) Equality?" in IEEE Technology and Society Magazine, vol. 40, no. 1, pp. 80-86, March 2021, doi: 10.1109/MTS.2021.3056288. [18] Kolenik, T. (2022). Methods in digital mental health: smartphone-based assessment and intervention for stress, anxiety, and depression. In Integrating Artificial Intelligence and IoT for Advanced Health Informatics (pp. 105-128). Springer, Cham. [19] K. Nigam, K. Godani, D. Sharma, S. Khandelwal and M. Rathi, Personalised Heart Monitoring and Reporting System. 2020 Research, Innovation, Knowledge Management and Technology Application for Business Sustainability (INBUSH), 2020, pp. 68-73, doi: 10.1109/INBUSH46973.2020.9392184. [20] Rathi, M., Sahu, S., Goel, A., & Gupta, P. (2022). Personalized Health Framework for Visually Impaired. Informatica, 46(1). [21] Gautam, A., Chauhan, A. S., Srivastava, A., Jadon, C., & Rathi, M. (2019). Major Histocompatibility Complex Binding and Various Health Parameters Analysis. In Smart Healthcare Systems (pp. 151164). CRC Press. [22] Rathi, M., Mittal, A., & Agarwal, D. (2020, February). Prediction of Thorax Diseases Using Deep and Transfer Learning. In 2020 Research, Innovation, Knowledge Management and Technology Application for Business Sustainability (INBUSH) (pp. 236-240). IEEE. [23] Rathi, M., & Pareek, V. (2016). Disease prediction tool: an integrated hybrid data mining approach for healthcare. IRACST Int J Comput Sci Inf Technol Secur (IJCSITS), 6(6), 32-40. 140 Informatica 47 (2023) 131–140 [24] O. Oyebode, F. Alqahtani and R. Orji, "Using Machine Learning and Thematic Analysis Methods to Evaluate Mental Health Apps Based on User Reviews," in IEEE Access, vol. 8, pp. 111141-111158, 2020, doi: 10.1109/ACCESS.2020.3002176. [25] E. Gore and S. Rathi, "Surveying Machine Learning Algorithms On Eeg Signals Data For Mental Health Assessment," 2019 IEEE Pune Section International Conference (PuneCon), 2019, pp. 1-6, doi: 10.1109/PuneCon46936.2019.9105749. [26] Sabourin, A. A., Prater, J. C., & Mason, N. A. (2019). Assessment of mental health in doctor of pharmacy students. Currents in Pharmacy Teaching and Learning, 11(3), 243-250. [27] Hou, Y., Xu, J., Huang, Y., & Ma, X. (2016, November). A big data application to predict depression in the university based on the reading habits. In 2016 3rd International Conference on Systems and Informatics (ICSAI) (pp. 1085-1089). IEEE. [28] Gokten, E. S., & Uyulan, C. (2021). Prediction of the development of depression and post-traumatic stress disorder in sexually abused children using a random forest classifier. Journal of Affective Disorders, 279, 256-265. [29] Xin, Y., Ren, X. Predicting depression among rural and urban disabled elderly in China using a random forest classifier. BMC Psychiatry 22, 118 (2022). https://doi.org/10.1186/s12888-02203742-4. [30] Srividya, M., Mohanavalli, S., & Bhalaji, N. (2018). Behavioral modeling for mental health using machine learning algorithms. Journal of medical systems, 42(5), 1-12. [31] Tate, A. E., McCabe, R. C., Larsson, H., Lundström, S., Lichtenstein, P., & Kuja-Halkola, R. (2020). Predicting mental health problems in adolescence using machine learning techniques. PloS one, 15(4), e0230389. [32] Reddy, U. S., Thota, A. V., & Dharun, A. (2018). 
Machine learning techniques for stress prediction in working employees. In 2018 IEEE International [33] Conference on Computational Intelligence and Computing Research (ICCIC) (pp. 1-4). IEEE. [34] Potter, G., Wong, J., Alcaraz, I., & Chi, P. (2016). Web application teaching tools for statistics using R and shiny. Technology Innovations in Statistics Education, 9(1). [35] Conway, Jake R., Alexander Lex, and Nils Gehlenborg. "UpSetR: an R package for the visualization of intersecting sets and their properties." Bioinformatics 33, no. 18 (2017): 2938-2940. [36] Sinha, A., & Rathi, M. (2022). Advanced Computational Techniques for Sustainable Computing. ISBN 9781003046431, Taylor & Francis, CRC Press, Chapman & Hall, pp. 1-338 [37] Adwitiya Sinha, “PSIR: A Novel Phase-wise Diffusion Model for Lockdown Analysis of M. Rathi et al. [38] [39] [40] [41] [42] [43] [44] COVID-19 Pandemic in India,” System Assurance Engineering & Management, Springer, pp. 1-17, October 2021 Ramanna, Sheela, and Lakhmi C. Jain. Emerging paradigms in machine learning. Edited by Robert J. Howlett. Heidelberg: Springer, 2013. Sinha, A., & Rathi, M. (2021). COVID-19 prediction using AI analytics for South Korea. Applied Intelligence, 51(12), 8579-8597. Sinha, A. (2021). PSIR: a novel phase-wise diffusion model for lockdown analysis of COVID19 pandemic in India. International Journal of System Assurance Engineering and Management, Springer, 1-14. Saxena, N., Chahal, E. S., Sinha, A., & Chand, S. (2021). Coronavirus Infection Segmentation & Detection Using UNET Deep Learning Architecture. In 2021 IEEE 18th India Council International Conference (INDICON), pp. 1-6. Gjoreski, M., Mitrevski, B., Luštrek, M., & Gams, M. (2018). An inter-domain study for arousal recognition from physiological signals. Informatica, 42(1). Peng, X. (2021). Research on Emotion Recognition Based on Deep Learning for Mental Health. Informatica, 45(1). Adeniji, O. D., Adeyemi, S. O., & Ajagbe, S. A. (2022). An Improved Bagging Ensemble in Predicting Mental Disorder using Hybridized Random Forest-Artificial Neural Network Model. Informatica, 46(4). Informatica 47 (2023) 141–142 141 JOŽEF STEFAN INSTITUTE Jožef Stefan (1835-1893) was one of the most prominent physicists of the 19th century. Born to Slovene parents, he obtained his Ph.D. at Vienna University, where he was later Director of the Physics Institute, VicePresident of the Vienna Academy of Sciences and a member of several sci- entific institutions in Europe. Stefan explored many areas in hydrodynamics, optics, acoustics, electricity, magnetism and the kinetic theory of gases. Among other things, he originated the law that the total radiation from a black body is proportional to the 4th power of its absolute tem- perature, known as the Stefan–Boltzmann law. The Jožef Stefan Institute (JSI) is the leading independent scientific research institution in Slovenia, covering a broad spectrum of fundamental and applied research in the fields of physics, chemistry and biochemistry, electronics and information science, nuclear science technology, en- ergy research and environmental science. The Jožef Stefan Institute (JSI) is a research organisation for pure and applied research in the natural sciences and technology. Both are closely interconnected in research de- partments composed of different task teams. 
Emphasis in basic research is given to the development and education of young scientists, while applied research and development serve for the transfer of advanced knowledge, contributing to the development of the national economy and society in general.

At present the Institute, with a total of about 900 staff, has 700 researchers, about 250 of whom are postgraduates, around 500 of whom have doctorates (Ph.D.), and around 200 of whom have permanent professorships or temporary teaching assignments at the Universities. In view of its activities and status, the JSI plays the role of a national institute, complementing the role of the universities and bridging the gap between basic science and applications.

Research at the JSI includes the following major fields: physics; chemistry; electronics, informatics and computer sciences; biochemistry; ecology; reactor technology; applied mathematics. Most of the activities are more or less closely connected to information sciences, in particular computer sciences, artificial intelligence, language and speech technologies, computer-aided design, computer architectures, biocybernetics and robotics, computer automation and control, professional electronics, digital communications and networks, and applied mathematics.

The Institute is located in Ljubljana, the capital of the independent state of Slovenia (or S♡nia). The capital today is considered a crossroad between East, West and Mediterranean Europe, offering excellent productive capabilities and solid business opportunities, with strong international connections. Ljubljana is connected to important centers such as Prague, Budapest, Vienna, Zagreb, Milan, Rome, Monaco, Nice, Bern and Munich, all within a radius of 600 km.

From the Jožef Stefan Institute, the Technology park “Ljubljana” has been proposed as part of the national strategy for technological development to foster synergies between research and industry, to promote joint ventures between university bodies, research institutes and innovative industry, to act as an incubator for high-tech initiatives and to accelerate the development cycle of innovative products.

Part of the Institute was reorganized into several high-tech units supported by and connected within the Technology park at the Jožef Stefan Institute, established as the beginning of a regional Technology park "Ljubljana". The project was developed at a particularly historical moment, characterized by the process of state reorganisation, privatisation and private initiative. The national Technology Park is a shareholding company hosting an independent venture-capital institution.

The promoters and operational entities of the project are the Republic of Slovenia, Ministry of Higher Education, Science and Technology and the Jožef Stefan Institute. The framework of the operation also includes the University of Ljubljana, the National Institute of Chemistry, the Institute for Electronics and Vacuum Technology and the Institute for Materials and Construction Research among others. In addition, the project is supported by the Ministry of the Economy, the National Chamber of Economy and the City of Ljubljana.
Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Tel.: +386 1 4773 900, Fax: +386 1 251 93 85
WWW: http://www.ijs.si
E-mail: matjaz.gams@ijs.si
Public relations: Polona Strnad

INFORMATICA
AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS

INVITATION, COOPERATION

Submissions and Refereeing

Please register as an author and submit a manuscript at: http://www.informatica.si. At least two referees outside the author’s country will examine it, and they are invited to make as many remarks as possible, from typing errors to global philosophical disagreements. The chosen editor will send the author the obtained reviews. If the paper is accepted, the editor will also send an email to the managing editor. The executive board will inform the author that the paper has been accepted, and the author will send the paper to the managing editor. The paper will be published within one year of receipt of email with the text in Informatica MS Word format or Informatica LaTeX format and figures in .eps format. Style and examples of papers can be obtained from http://www.informatica.si. Opinions, news, calls for conferences, calls for papers, etc. should be sent directly to the managing editor.

SUBSCRIPTION

Please complete the order form and send it to Dr. Drago Torkar, Informatica, Institut Jožef Stefan, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: drago.torkar@ijs.si

Since 1977, Informatica has been a major Slovenian scientific journal of computing and informatics, including telecommunications, automation and other related areas. In its 16th year (more than twenty-eight years ago) it became truly international, although it still remains connected to Central Europe. The basic aim of Informatica is to impose intellectual values (science, engineering) in a distributed organisation.

Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author’s country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.

Editing and refereeing are distributed. Each editor can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author’s country. If new referees are appointed, their names will appear in the Refereeing Board.

Informatica web edition is free of charge and accessible at http://www.informatica.si. Informatica print edition is free of charge for major scientific, educational and governmental institutions. Others should subscribe.

Informatica
An International Journal of Computing and Informatics

Web edition of Informatica may be accessed at: http://www.informatica.si.

Subscription Information

Informatica (ISSN 0350-5596) is published four times a year in Spring, Summer, Autumn, and Winter (4 issues per year) by the Slovene Society Informatika, Litostrojska cesta 54, 1000 Ljubljana, Slovenia.
The subscription rate for 2022 (Volume 46) is 60 EUR for institutions, 30 EUR for individuals, and 15 EUR for students. Claims for missing issues will be honored free of charge within six months after the publication date of the issue.

Typesetting: Blaž Mahnič, Gašper Slapničar; gasper.slapnicar@ijs.si
Printing: ABO grafika d.o.o., Ob železnici 16, 1000 Ljubljana.

Orders may be placed by email (drago.torkar@ijs.si), telephone (+386 1 477 3900) or fax (+386 1 251 93 85). The payment should be made to our bank account no.: 02083-0013014662 at NLB d.d., 1520 Ljubljana, Trg republike 2, Slovenija, IBAN no.: SI56020830013014662, SWIFT Code: LJBASI2X.

Informatica is published by Slovene Society Informatika (president Niko Schlamberger) in cooperation with the following societies (and contact persons):
Slovene Society for Pattern Recognition (Vitomir Štruc)
Slovenian Artificial Intelligence Society (Sašo Džeroski)
Cognitive Science Society (Olga Markič)
Slovenian Society of Mathematicians, Physicists and Astronomers (Dragan Mihailović)
Automatic Control Society of Slovenia (Giovanni Godena)
Slovenian Association of Technical and Natural Sciences / Engineering Academy of Slovenia (Mark Pleško)
ACM Slovenia (Nikolaj Zimic)

Informatica is financially supported by the Slovenian research agency from the Call for co-financing of scientific periodical publications.

Informatica is surveyed by: ACM Digital Library, Citeseer, COBISS, Compendex, Computer & Information Systems Abstracts, Computer Database, Computer Science Index, Current Mathematical Publications, DBLP Computer Science Bibliography, Directory of Open Access Journals, InfoTrac OneFile, Inspec, Linguistic and Language Behaviour Abstracts, Mathematical Reviews, MatSciNet, MatSci on SilverPlatter, Scopus, Zentralblatt Math

Volume 47 Number 1 March 2023 ISSN 0350-5596

Enhancement of NTSA Secure Communication with One-Time Pad (OTP) in IoT / A. H. A. Alattas, M. A. Al-Shareeda, S. Manickam, M. A. Saare / 1
Predicting Students Performance Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection / S. Alija, E. Beqiri, A. S. Gaafar, A. K. Hamoud / 11
On Integrating Multiple Restriction Domains to Automatically Generate Test Cases of Model Transformations / T-H. Nguyen, D-H. Dang / 21
Implementation of Multiple CNN Architectures to Classify the Sea Coral Images / Z. N. Nemer, W. N. Jasim, E. J. Harfash / 43
Threat Model and Risk Management for a Smart Home IoT System / A. R. Mahlous / 51
Prediction of Heart Disease Using Modified Hybrid Classifier / R. Pipalwa, A. Paul, T. Mukherjee / 65
Sentiment Analysis and Machine Learning Classification of COVID-19 Vaccine Tweets: Vaccination in the shadow of fear-trust dilemma / S. Tüzemen, Ö. Barış-Tüzemen, A. K. Çelik / 73
Learning the Structure of Bayesian Networks from Incomplete Data Using a Mixture Model / I. Salman, J. Vomlel / 83
A Prediction Model for Student Academic Performance Using Machine Learning / H. Kaur, T. Kaur, R. Garg / 97
A Multi-channel Convolutional Neural Network for Multi-label Sentiment Classification Using Abilify Oral User Reviews / T. E. Trueman, A. K. Jayaraman, Jasmine S, G. A. Narayanasamy / 109
A Novel Method for Human MRI Based Pancreatic Cancer Prediction Using Integration of Harris Hawks Variants & VGG16: A Deep Learning Approach / R. P. R. Chegireddy, A. Sri Nagesh / 115
Assessing Mental Health Crisis in Pandemic Situation with Computational Intelligence / M. Rathi, A. Sinha, S. Tulsyan, A. Agarwal, A. Srivastava / 131

Informatica 47 (2023) Number 1, pp. 1–142