https://doi.org/10.31449/inf.v45i7.3633 Informatica 45 (2021) 147–166 147 
 
Diagnosis of Gastric Cancer Using Machine Learning Techniques in 
Healthcare Sector: A Survey 
Danish Jamil, Sellappan Palaniappan and Asiah Lokman  
Department of Information Technology, Malaysia University of Science and Technology, Kuala Lumpur, Malaysia 
E-mail: danish.jamil@phd.must.edu.my, sell@must.edu.my and asiah@must.edu.my 
 
Danish Jamil, Muhammad Naseem and Syed Saood Zia 
Department of Software Engineering, Sir Syed University of Engineering & Technology, Karachi, Pakistan  
E-mail: djamil@ssuet.edu.pk, mnaseem@ssuet.edu.pk, szia@ssuet.edu.pk  
 
Keywords: data mining, machine learning, artificial intelligence, early gastric cancer, gastric cancer, decision support system, clinical decision support system, knowledge discovery in databases, deep learning, big data, Helicobacter pylori, stomach cancer
Received: July 7, 2021 
Many researchers are working to reduce the incidence of cancers, particularly gastric cancer (GC), which is among the deadliest cancers. For GC, the five-year survival rate is generally 5–25%, whereas for early gastric cancer (EGC) it is almost 90%. It is difficult for doctors to assess the threat GC poses to patients, as this requires years of medical practice and rigorous testing. The healthcare sector has benefitted from artificial intelligence (AI) for the early diagnosis and classification of GC; however, current AI-based techniques need further improvement in clinical testing. Because GC is heterogeneous in type and severity, more optimized methods are needed for its early detection. Hence, it is essential to investigate this area further and develop more optimized approaches for early diagnosis, since early detection increases the chances of successful treatment. In this study, we conduct a literature survey detailing the role of AI in the healthcare sector for GC diagnosis. We discuss the basic principles, advantages and disadvantages, training and testing of data, and integration of applications such as decision support systems (DSS), clinical decision support systems (CDSS), knowledge discovery in databases (KDD), machine learning (ML), data mining (DM), big data (BD), and deep learning (DL), and their relevance to the healthcare industry. The study focuses on ML techniques used in the diagnosis of GC. This review also introduces DM techniques, their application in the healthcare industry, and their limitations, roles, and operational challenges. These techniques can help minimize pathologists' workload while increasing diagnostic accuracy, and can further assist medical practitioners in their decision-making process.
Povzetek (summary): A study of the use of machine learning techniques in the diagnosis of stomach cancer in the healthcare sector.
 
1 Introduction
According to the American Cancer Society, approximately 28,000 people were diagnosed with GC in 2019, accounting for 1.7% of all new cancer cases, while 10,960 people died from GC [1]. Most parts of Africa have a low risk of GC [2]. Although the incidence of GC has fallen in recent decades, it remains the world's third leading cause of cancer deaths, after lung cancer and colorectal cancer. GC is particularly common in East Asia, especially in Japan, where its estimated incidence is about 60 per 100,000 men and women. In the US, an estimated 1,688,780 new cancer cases and 600,920 cancer deaths were expected in 2019, which corresponds to almost 4,630 new cases and 1,650 deaths per day; [3] also reports on cancer trends between 1991 and 2014. Of the nearly 28,000 GC cases in 2019, 17,750 were in males and 10,250 in females, and of the approximately 10,960 GC deaths, 6,720 were males and 4,240 were females. Between 70% and 90% of all GCs start with Helicobacter pylori (H. pylori) infection, which can be acquired through uncooked or unwashed food. Salty foods also increase the risk of GC, which can develop into a tumour. Around 30 out of every 100,000 people in Japan are diagnosed with GC over their lifetime. GC is difficult to detect early: by the time a doctor finds that a patient has severe symptoms, the GC has usually developed into a tumour. Surgery, chemotherapy, radiation therapy, and other therapies are the main treatments, and physicians usually recommend a combination of two or more of these approaches. The Japanese government's initiative to diagnose GC in its early stages is commendable, because diagnosing GC early and starting treatment promptly reduces mortality and increases life expectancy. Staging is also very important in the diagnosis of GC. Patients with more risk factors are more likely to develop GC, and patients in whom tumours are found tend to have more risk factors. GC is associated with many risk factors, including gender, age, ethnicity, geography, and H. pylori infection, and smoking and diet also appear to increase the risk [4].
Medical practitioners cannot readily interpret the many dimensions of the data produced by the healthcare sector, even though the primary purpose of this data is to improve the efficiency of medical procedures and treatment strategies [5]. Many hospitals produce large amounts of redundant data, much of which is ambiguous and of low quality because of missing values. This heterogeneity calls for a comprehensive review of the data to determine its value and recognize its potential issues. Because the data are complex, they are difficult to evaluate or analyze with standard tools and techniques [6]. Despite tremendous improvements and innovations in healthcare services that enable more precise and accurate diagnoses, GC remains one of the most lethal malignancies in the world, although its incidence shows a decreasing trend [2][7][1]. Excessive eating and drinking is one of the most important factors found in GC diagnoses: drinking alcoholic beverages and eating salty foods are two of the most dangerous causes of GC, and smoking also increases the risk. Some cells in the human body can allow GC to spread uncontrollably throughout the body [8].
ML is a branch of computer science concerned with building systems that become capable of performing a specific task. The primary goal of ML is for computers to achieve human-level intelligence. Machine learning (ML) and deep learning (DL), a subfield of ML, are two interrelated disciplines within AI whose purpose is to identify patterns and learn from prior occurrences in data. ML benefits people in various ways, including identifying cancerous cells, recognizing fraudulent or criminal patterns in massive volumes of financial transactions, performing speech and video recognition, and developing chatbots that understand and respond to human speech. Many ML techniques are available, including supervised, semi-supervised, unsupervised, reinforcement, and evolutionary learning, and they can be used to classify the records in a dataset [9]. In clinical practice, it remains a major challenge to detect the presence of GC in time for an accurate prognosis, and doctors must know the details of the patient's results from the physical examination. Well-designed computer-based decision support systems (DSS) may therefore help diagnose GC in patients in a very cost-efficient way. The healthcare industry generates a large volume of patient assessment reports, diagnostic reports, and results from different types of tests, and orchestrating them properly is a challenging job [10][11].
The main challenge in this area is the poor handling of data, which causes quality problems when the data are organized into the proper format. Processing a large volume of data requires ML techniques that collect and process the data accurately and adequately. ML algorithms have been developed and applied to medical datasets for various cancers, such as GC, and ML offers a variety of methods for effective data processing. The digital revolution has made data acquisition economical and readily available, making it possible to capture and store vast amounts of data at low cost [12]. Hospitals install the latest and most advanced machines to gather and process data, making healthcare facilities more efficient through easy and rapid data gathering and retrieval. ML techniques help analyze medical data and are valuable in the medical domain because of the variety of problems they can solve. Furthermore, many applications for medical diagnosis have appeared since the emergence of large-scale specialized approaches, including ML, which is well suited to analyzing small, modified datasets [13][14].
The remainder of the paper is structured as follows. Section 2 discusses the background of the study: it describes the fundamental principles, benefits and drawbacks, training and testing of data, and ties to the healthcare industry of big data (BD), DM, ML, DL, KDD, DSS, and CDSS. Section 3 presents related surveys on the diagnosis of GC. Section 4 describes the nature of cancer and its impact on GC, along with operational challenges and limitations. Section 5 contains a detailed discussion of the findings of the analysis of various types of cancer. Section 6 discusses future directions, and Section 7 concludes the paper.

Method | Benefit | Drawback
Supervised learning | Output labels are known during the learning process. It can perform both classification and regression. The learned mapping can be applied to measure or transform new samples. | A labeled dataset is usually required in the initial phase. It requires a training phase.
Unsupervised learning | Grouping the data is a straightforward process. No labeled training data is needed. Automatic labeling of the data saves much of the time otherwise spent on manual classification. | There is no notion of output during learning. It does not allow the computation or analysis of new sample data, which is a limitation. The findings may be significantly affected by outliers. It can only be used for tasks that involve grouping (classifying) data.
Table 1: The two most popular types of ML approaches, with their benefits and drawbacks [15].
1.1 Our contribution and effort in organizing the paper
The purpose of this research is to promote the consistent and effective use of ML techniques in the healthcare industry. ML-based techniques that guide physicians and medical practitioners may represent a considerable step forward in detecting and curing GC. Furthermore, clinical trials of AI and ML are the next wave in GC diagnosis and treatment, allowing treatment strategies to be mapped more rapidly to a patient's specific needs. Finally, the advantages and limitations of clinical AI applications are highlighted. This study contributes a new point of view on how AI technologies may help improve GC diagnosis and prognosis and advance human health.
• This article covers the essential principles and potential advantages of AI, BD, ML, and DL, and their implications. In addition, we explore the pertinent issues and the probable consequences and problems for healthcare experts and physicians.
• The challenges and future commitments of doctors and experts in the era of AI are identified and debated. As AI, particularly ML and DL, has gained popularity in clinical cancer research in recent years, cancer prediction performance has improved significantly.
• We review the use of AI in cancer diagnosis and prognosis to help readers better understand these strategies and how they contribute to the field's evolution.
2 Background of the study 
This section gives a general overview of the DM paradigm, which is linked to several other fields of study, to help readers better grasp the complexity of DM in healthcare. It briefly discusses how techniques such as BD, DM, ML, and DL are integrated, along with their pros and cons. It also discusses the integration of DSS, CDSS, KDD, and DM tools, techniques, and approaches, and their ties with the healthcare industry.
2.1 Knowledge Discovery in Databases (KDD)
Recent innovations in technology and computerization within the healthcare sector show that development and innovation have increased rapidly in recent years. The accelerated development and implementation of new technologies in computerized systems, together with the rapid rise in the number of transactions performed every day, have led to an enormous quantity of data being produced and gathered. This vast quantity of data must be refined into actionable and relevant information that helps organizations make better decisions. Moreover, knowledge must be extracted from the increasing amount of digital data generated by modern technology; this field is referred to as KDD. The first step of the KDD process is to identify the source information to be extracted, which can range from datasets that are subsets of variables up to large amounts of data [16][7]. Improving data quality is an essential step for obtaining better outcomes, because it ensures that higher-quality patterns are discovered.
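The KDD steps can be made concrete with a small sketch. The Python code below walks toy records through selection, preprocessing, transformation, data mining, and interpretation. It is an illustration under stated assumptions, not the procedure of [16] or [17]: the records and attribute names are hypothetical (not patient data), mean imputation and the 60-year age cutoff are arbitrary choices, and the mining step is only a comparison of outcome rates.

```python
from statistics import mean

# Hypothetical toy records (not patient data); None marks a missing value.
RAW_RECORDS = [
    {"id": 1, "age": 67, "h_pylori": 1, "salty_diet": 1, "gc": 1},
    {"id": 2, "age": 45, "h_pylori": 0, "salty_diet": 0, "gc": 0},
    {"id": 3, "age": None, "h_pylori": 1, "salty_diet": 1, "gc": 1},
    {"id": 4, "age": 52, "h_pylori": 1, "salty_diet": 1, "gc": 0},
    {"id": 5, "age": 38, "h_pylori": 0, "salty_diet": 1, "gc": 0},
    {"id": 6, "age": 71, "h_pylori": 0, "salty_diet": 0, "gc": 1},
    {"id": 7, "age": 58, "h_pylori": 1, "salty_diet": 1, "gc": 1},
    {"id": 8, "age": 49, "h_pylori": 0, "salty_diet": 0, "gc": 0},
]


def select(records, attributes):
    """Selection: keep only the attributes relevant to the question."""
    return [{a: r[a] for a in attributes} for r in records]


def preprocess(records, field):
    """Preprocessing: fill missing values of `field` with the mean of observed values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def transform(records, cutoff=60):
    """Transformation: derive a binary 'older' attribute (the cutoff is an assumption)."""
    return [{**r, "older": int(r["age"] >= cutoff)} for r in records]


def mine(records, feature, target="gc"):
    """Data mining: target rate among records with and without a binary feature."""
    def rate(group):
        return sum(r[target] for r in group) / len(group) if group else 0.0

    with_f = [r for r in records if r[feature] == 1]
    without_f = [r for r in records if r[feature] == 0]
    return rate(with_f), rate(without_f)


def interpret(records, features):
    """Interpretation: rank features by the difference between the two rates."""
    patterns = {f: mine(records, f) for f in features}
    return sorted(patterns.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True)


if __name__ == "__main__":
    data = transform(preprocess(select(RAW_RECORDS, ["age", "h_pylori", "salty_diet", "gc"]), "age"))
    for feature, (with_f, without_f) in interpret(data, ["h_pylori", "salty_diet", "older"]):
        print(f"{feature}: GC rate {with_f:.2f} with vs {without_f:.2f} without")
```

Each function corresponds to one step, so a real project could swap in stronger methods (for example, a proper classifier in `mine`) without changing the overall flow.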
2.2 Decision Support Systems (DSS) & Clinical Decision Support Systems (CDSS) in the gastric cancer diagnosis domain
A DSS is a computer-based application whose function is to resolve issues that arise during decision-making. It can control and monitor all phases of the decisions made by decision-makers or experts. A DSS supports the decision-making process by seeking helpful information through DM techniques, which are used to analyze and explore data; their primary purpose is to find patterns that could be useful for decision-making [17,18]. When a DSS is integrated into healthcare organizations such as hospitals, it forms a CDSS, which assists healthcare professionals in making more effective decisions during the clinical decision-making process. First, a CDSS informs healthcare professionals or medical experts about errors or inconsistencies as the process progresses. It also alerts organizations to critical tasks that must be completed throughout the process. In this way, a CDSS guides healthcare professionals toward the best medical care and helps reduce the likelihood of GC [19,20,21]. Because many patients see doctors only when they already have symptoms indicative of advanced GC, the range of viable treatments is limited, with a detrimental effect on the patient's life expectancy. People are often too afraid to share their concerns and seek advice from healthcare professionals when experiencing symptoms such as discomfort or the flu. It has therefore been suggested that various tests be performed to aid the early detection of GC and to identify potential diseases in the patient [20].

Figure 1: The KDD process phases [17].
The spread of GC in the human body and the overall prognosis remain dire. Survival is highly dependent on the severity of the condition, and survival rates are often poor when the illness is identified at an advanced stage. A 5-year survival rate of 95% is possible if the cancer is detected early and is limited to the inner lining of the stomach wall [21]. Upper endoscopy is an effective method for diagnosing GC; however, it is not inexpensive. Consequently, there is a need for a decision support system for early diagnosis that provides extra information to a specialist when deciding whether to perform an endoscopy in a particular case.
A CDSS aims to help health professionals make efficient clinical decisions by integrating clinical data. An effective CDSS combines patient data with medical expertise and applies heuristics to facilitate clinical decision-making. Decision support systems are designed to serve three clinical functions: automating data input and retrieval tasks, providing fast responses such as medical alerts and reminders, and providing individualized advice. The systems presented in this area are all knowledge-based systems, a subset of CDSS that make decisions at the level of a domain expert. These CDSSs make explicit use of a knowledge base to represent knowledge and provide facts about the issue at hand. They use a logical inference mechanism to analyze the facts and derive logical conclusions, often using if-then-else rules [22]. Typically, the inference process follows a match–resolve–execute cycle: all rules relating to the input data are matched, the order of rule execution is determined (including conflict resolution), and finally the selected rule is executed. Clinical decision-making is a complex problem, and this kind of knowledge has qualities that make it suitable for solving it. Separating facts from rules simplifies maintenance by removing hardcoded rules from a program [23][108].
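The match–resolve–execute cycle and the separation of facts from rules can be illustrated with a tiny forward-chaining engine in Python. This is a hypothetical sketch, not a system from [22] or [23]: the rule names, facts such as `h_pylori_positive` and `recommend_endoscopy`, and the priorities are invented for illustration and have no clinical validity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: frozenset  # facts that must all hold (the "if" part)
    conclusion: str        # fact asserted when the rule fires (the "then" part)
    priority: int = 0      # used during conflict resolution


# Hypothetical knowledge base, kept separate from the inference engine below.
RULES = [
    Rule("infection_risk", frozenset({"h_pylori_positive"}), "elevated_risk", priority=1),
    Rule("alarm_symptoms", frozenset({"weight_loss", "dysphagia"}), "alarm_features", priority=2),
    Rule("refer", frozenset({"elevated_risk", "alarm_features"}), "recommend_endoscopy", priority=3),
]


def infer(facts, rules):
    """Forward chaining: repeat match -> resolve -> execute until no rule can fire."""
    facts = set(facts)
    fired = []
    while True:
        # Match: rules whose conditions all hold and whose conclusion is not yet known.
        agenda = [r for r in rules if r.conditions <= facts and r.conclusion not in facts]
        if not agenda:
            return facts, fired
        # Resolve: pick the highest-priority rule (ties broken by rule name).
        chosen = max(agenda, key=lambda r: (r.priority, r.name))
        # Execute: assert the rule's conclusion as a new fact.
        facts.add(chosen.conclusion)
        fired.append(chosen.name)


if __name__ == "__main__":
    final_facts, trace = infer({"h_pylori_positive", "weight_loss", "dysphagia"}, RULES)
    print(trace)
    print("recommend_endoscopy" in final_facts)
```

Because the rules live in a plain list, they can be edited without touching `infer`, which is the maintenance benefit described above. Using priority for conflict resolution is one assumption among several possible strategies; recency or specificity are common alternatives.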
2.3 Data mining implications in the 
healthcare sector 
In this era, large quantities of data are processed daily in industries such as healthcare. Medical data have grown at an unprecedented rate over the years because of the vast number of transactions every day. Consequently, DM has emerged to turn this enormous amount of data into usable and meaningful knowledge for hospitals [24]. Despite their benefits, DM techniques also have some drawbacks, particularly for the healthcare industry. Access to healthcare data is limited: because medical data are dispersed across various systems, they must be collected and combined before the DM process. In addition, ethical and legal issues may arise if the hospital does not adequately protect and secure the data. DM uses ML algorithms to apply statistical and computational functions to data and retrieve useful information in a form the user can readily understand. It can identify trends and relationships in large amounts of data obtained from single or multiple sources, and its results can take different forms of representation, such as equations, trees, graphs, patterns, or correlations [25]. Increasing amounts of medical data are collected, analyzed, and stored nearly every day, resulting in an ever-increasing amount of information and expertise for researchers and clinicians. This emerging technology, which is now in use, is described as BD. BD starts with disparate data sources of diverse scale, configuration, and structure, and the aim is to observe how these data are linked, evolve, and can be integrated; this requires multiple-volume, decentralized controls. Business Intelligence (BI) is the method of finding trends in datasets that are useful for decision-making, for example in diagnosing GC. Useful trends allow us to make nontrivial predictions about new data and give insight into already-observed data in many significant ways. However, these trends are not always simple to identify [26][27], and more sophisticated means of representing such systemic trends in the results are needed.
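As one example of the statistical functions DM can apply to find relationships between attributes, the sketch below computes Pearson's correlation coefficient in plain Python. It is an illustrative assumption rather than a technique prescribed by [25], and the values in the usage example are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient r for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy values: r close to +1 means both attributes rise together.
    print(round(pearson([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8]), 3))
```

A correlation only describes a linear association in the data at hand; it does not by itself establish a causal risk factor.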
2.4 Data mining tools and techniques for 
gastric cancer diagnosis 
Many tools are used to help decision-makers or experts make better decisions, especially in hospital environments, to enhance services for patients. To better understand these tools, the CRISP-DM framework is suggested for executing a DM project. In the modelling phase, the Waikato Environment for Knowledge Analysis (WEKA), an ML platform, is used to analyze and explore the data and to develop the desired models [28]. Table 2 summarizes some of the merits and demerits of each algorithm. Interestingly, various ML approaches have proven beneficial in their respective application areas. Stating the procedures explicitly is a necessary component of both experimental design and implementation design [29].
2.5 Overview of data mining application in 
the research areas 
In this part, we look at how DM techniques have been used in academic research projects. First, we categorize DM applications in the healthcare domain according to the type of DM method employed in each study. Next, we focus on DM applications in classification and clustering. Classification is a widely used DM technique: it is the process of finding a set of models that recognize and distinguish the classes in the training data. The goal of classification is to determine the category of new data objects based on data previously obtained from the dataset. In classification, the model usually learns from a training dataset, and the knowledge gained is then validated on a testing dataset [30]. Cluster analysis is a technique for learning about and understanding data. It organizes the data into categories (or clusters), and proximity metrics are essential for determining the degree of similarity between two items during this grouping process. Before a supervised learning technique is applied to a dataset, pre-processing issues such as coping with sparse data, exploiting feature correlation, and balancing the scales of distinct features must be addressed. Clustering is a critical approach used in various disciplines, including image classification, text analytics, competitive analysis, and economics. It divides a set of data points into distinct groups (clusters) so as to maximize intra-cluster similarity.
As a result, similar points are grouped together, while the clusters themselves remain dissimilar from one another. This partitioning is implemented using specific proximity, density, or other such factors. Unlike classification, which needs class labels to recognize patterns in a given dataset, clustering does not require any class labels. Collecting class label information for a dataset (such as images and web documents) is usually difficult or costly. Clustering algorithms are categorized as hierarchical, partitioning, density-based, grid-based, and model-based. K-means is a widely used partitioning clustering approach that searches for the optimal partitioning of a particular dataset by minimizing the sum of squared distances between the data points and the centroids of their clusters. In this partitioning, each data point is uniquely associated with a single cluster [31]. Two of the most commonly used DM tools are Orange and WEKA. In [32], they were used to identify significant risk factors related to patients before surgery: Orange was used for statistical tests such as the χ2 test and for visual programming and data visualization, while WEKA was used for algorithm-based analysis, finding associations between variables with the Apriori algorithm [32].

Method | Benefit | Drawback
K-means clustering | A fundamental clustering method on which many effective clustering techniques build. | The number of clusters must be chosen. Categorical attributes are difficult to handle. Results differ substantially when outliers are present.
Support vector machine (SVM) | Better accuracy than other classification algorithms. Overfitting is less severe than in other methods. | Requires a considerable amount of computing effort. Training is more time-consuming than in other techniques.
ID3 | No domain-specific prerequisites. Exact-valued results for different actions reduce the uncertainty associated with complex decisions. Its classifiers and outputs are concise. High-dimensional databases are processed more readily. | Sensitive to unstructured data. Requires substantial computing resources to perform the tests and high storage capacity for complex projects.
KNN | Simple to implement. Training cost is low because there is essentially no training phase. | Sensitive to unstructured data. Requires substantial computing resources to perform the tests and high storage capacity for complex projects.
Naïve Bayes / Bayesian networks | Straightforward to execute. Performs much better on large, multidimensional datasets. | Accuracy is poor when the features are interdependent.
Linear regression | Surpasses the other classifiers in accuracy. The underlying relationship between dependent and independent variables is easy to identify. | Results differ substantially when outliers are present. Training is more time-consuming than in other methods. Performance depends on the kind of dataset, which makes it unpredictable. The output is always numerical.
Logistic regression | Better accuracy than other classifiers. The underlying relationship between dependent and independent variables is easy to determine. | Results vary significantly when outliers are present. Training is more time-consuming than in other methods. Effectiveness depends on the kind of dataset, which makes it unpredictable. The output is not categorical.
Neural network | Meaningful interconnections between dependent and independent variables are identified straightforwardly. Can handle databases with a high level of noise. No prior feature extraction step is required. | High likelihood of local minima. Significant likelihood of overfitting. Classifiers may be hard to interpret. With many layers, considerable computing time is needed. Decisions may come without any rationale (a "black box").
Table 2: Benefits and drawbacks of the described algorithms [29].
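The K-means partitioning described in this subsection can be sketched with Lloyd's algorithm, which alternates an assignment step and an update step. This plain-Python version assumes numeric points, a user-chosen k given as initial centroids, and Euclidean distance; the sample points are hypothetical, and a library implementation would normally be preferred in practice.

```python
from math import dist  # Python 3.8+


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point belongs to exactly one (the nearest) cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if empty).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


def sse(centroids, clusters):
    """The quantity K-means minimizes: sum of squared point-to-centroid distances."""
    return sum(dist(p, c) ** 2 for c, pts in zip(centroids, clusters) for p in pts)


if __name__ == "__main__":
    # Hypothetical 2-D points forming two obvious groups; k = 2 is an assumption.
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    cents, groups = kmeans(pts, initial_centroids=[(1, 1), (8, 8)])
    print(cents, [len(g) for g in groups], round(sse(cents, groups), 3))
```

The result depends on the initial centroids, which is one reason the outlier sensitivity and cluster-count choice listed for K-means in Table 2 matter in practice.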
2.6 The impact of medically generated big data on GC
The essence of BD is the re-examination of the fundamental value of data in the age of the information explosion. Healthcare information platforms, such as hospitals' health management information systems and digital medical equipment, are rapidly expanding and generating massive amounts of clinical data in the process. The government has issued a set of rules and regulations to speed up the development of medical devices. Driven by strong promotion at the national level, local governments, healthcare organizations, and associated corporations have decided to connect various links, generate medical BD, and actively explore associated economic applications. The present consumption of data from regional medical information platforms falls into two broad categories: direct usage and indirect usage. Direct usage includes information sharing, intelligent prompting and diagnostic aids in contemporary health care, and other commercial cooperation services based on information sharing. Indirect usage is mainly based on health management and primary management data and the analysis of overall performance.

Clinical BD refers to the enormous amount of clinical data created, which comes from three primary sources. The first is clinical health records. Databases of existing medical care are growing at a breakneck pace, and clinical medicine requires a growing quantity of information, from electrocardiograms to CT images to entire medical files. The second is pharmaceutical research data and data from the biological sciences. Drug development requires understanding pharmacological effects and fundamental drug interactions; it is a time-consuming operation that generates vast volumes of data. As more is learned about genes, data such as gene sequences and personal gene maps will become available to the general public. The third is private health information; at the moment, most people learn about their physical health only through yearly physical exams [34,35,36].
2.7 The role of machine learning for GC 
diagnosis 
To tackle biological research challenges, several researchers use ML algorithms. Supervised ML approaches comprise three phases: the learning phase, the training phase, and the testing phase. In the learning phase, the ML algorithm is constructed. In the training phase, a large amount of data is provided to the ML model so that it can derive general rules from it. Finally, in the testing phase, new data are input to test the accuracy of the model's predictions. In unsupervised ML, by contrast, data points are given without class labels, and the difficulty lies in splitting them so that relevance is maximized and redundancy is minimized. Identifying unknown dependencies requires two steps: first, estimating them from a dataset (inputs), and then using the estimated dependencies to produce the outputs of a new system. In this subsection, we examine the strategies used in both phases and compare their effectiveness. The input to an ML algorithm is a series of data instances, each of which must be classified or grouped. Each data instance is an individual example of the concept to be learned and is characterized by a fixed set of attributes, such as age, race, gender, and education, together with a class attribute. In a database, each dataset is represented as a matrix of instances and attributes; a flat file (a single relation) is such an entity with many dimensions [34].
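The learning, training, and testing phases can be illustrated with a k-nearest-neighbours classifier, one of the algorithms listed in Table 2. In this Python sketch, constructing the classifier stands in for the learning phase, `fit` for the training phase, and accuracy on held-out instances for the testing phase. The two-attribute instances, labels, k = 3, and the 25% test fraction are hypothetical choices, not values from the cited studies.

```python
import random
from collections import Counter
from math import dist  # Python 3.8+


class KNNClassifier:
    """k-nearest-neighbours classifier; constructing it corresponds to the learning phase."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, X, y):
        """Training phase: KNN simply memorizes the labelled instances."""
        self.examples = list(zip(X, y))
        return self

    def predict_one(self, x):
        # Majority vote among the k training instances closest to x.
        neighbours = sorted(self.examples, key=lambda ex: dist(ex[0], x))[: self.k]
        return Counter(label for _, label in neighbours).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction of instances for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(X) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def accuracy(y_true, y_pred):
    """Testing phase metric: fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical two-attribute instances with binary class labels (not patient data).
X = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (1, 1.5),
     (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5), (9, 8.5)]
y = [0] * 6 + [1] * 6

if __name__ == "__main__":
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    model = KNNClassifier(k=3).fit(X_train, y_train)
    print("test accuracy:", accuracy(y_test, model.predict(X_test)))
```

Keeping the test instances out of `fit` is what makes the reported accuracy an estimate of performance on new data rather than a measure of memorization.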
The most commonly used types of ML training approaches are supervised and unsupervised learning. In unsupervised learning, unlabeled or novel examples are given and there is no notion of the outcome. The technique aims to find categories or groups that help organize the information. In supervised learning, labelled data is used to estimate or map the model output [35]. A limitation of supervised learning is the amount of labelling it requires. This limitation can be addressed by Active Learning (AL), which learns incrementally. It begins with a few labelled examples and, in each iteration, asks the medical expert to label only the instances that the algorithm judges to be the most informative. Compared with labelling the complete data set, such techniques can reduce the amount of labelling needed by around 30–40% [39,40]. ML techniques are widely used in the medical industry, and supervised learning covers both classification and regression. The critical problems of a classification method include determining the number of classes, describing the characteristics of the records, and learning from them, so that each new sample can be assigned to one of the existing classes. In regression, the learning technique maps the raw data to a real-valued variable in the model. That variable can then be used to calculate the predicted value for each new sample.
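The active-learning loop described above can be sketched with uncertainty sampling. This is one common query strategy and is an assumption here, since the cited works may use others. The toy setting is hypothetical: a one-dimensional pool of integer feature values, a threshold classifier, and an `oracle` function that plays the medical expert who labels only the queried instances.

```python
def active_learning_threshold(pool, oracle, budget=20):
    """Uncertainty sampling for a 1-D threshold classifier (illustrative sketch).

    Assumes the smallest pool value is negative (0), the largest is positive (1),
    and the true labels are monotone in the feature value.
    """
    pool = sorted(pool)
    labeled = {pool[0]: oracle(pool[0]), pool[-1]: oracle(pool[-1])}
    queries = []
    for _ in range(budget):
        lo = max(x for x, y in labeled.items() if y == 0)  # largest known negative
        hi = min(x for x, y in labeled.items() if y == 1)  # smallest known positive
        boundary = (lo + hi) / 2
        # Points strictly between lo and hi are still unlabeled and uncertain.
        uncertain = [x for x in pool if lo < x < hi]
        if not uncertain:
            break
        # Query the most uncertain point: the one closest to the current boundary.
        x = min(uncertain, key=lambda p: abs(p - boundary))
        labeled[x] = oracle(x)  # only this instance is sent to the expert
        queries.append(x)
    lo = max(x for x, y in labeled.items() if y == 0)
    hi = min(x for x, y in labeled.items() if y == 1)
    return (lo + hi) / 2, queries

def oracle(x):
    # Hypothetical expert: instances with feature value >= 37 are positive.
    return int(x >= 37)

pool = list(range(101))
threshold, queries = active_learning_threshold(pool, oracle)
print(threshold, len(queries))  # 36.5 7
```

In this toy case, seven expert labels (plus the two seed labels) are enough to classify all 101 pool instances correctly, instead of labelling every one.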
Classification and regression are thus the two tasks of supervised learning. In classification, the learning mechanism assigns data to a finite number of groups, and each new sample is classified into one of the existing groups. In regression, the mechanism maps data to a real-valued variable, and the predicted value of that variable is calculated for each new sample [37]. Achieving the best GC diagnostic outcomes from the wide range of available medical data requires better data extraction and treatment methods, accurate ML-based diagnosis of GC, and prediction of patients with GC or lesions. It also requires a balance between domain-expert knowledge in the associated areas and professional data analysis and processing capability. Medical decision support systems combine ML and medical care as the fundamental technology of intelligent medicine, with enormous advantages for the early prevention of GC and for patient care [38].
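The sketch below makes the distinction between the two supervised tasks concrete. It uses the same k-nearest-neighbour (KNN) search for both. A majority vote yields a class from a finite set (classification), while averaging the neighbours' targets yields a real value (regression). The data, the labels, and k = 3 are hypothetical choices for illustration.

```python
from collections import Counter

def k_nearest(train, query, k):
    """The k (features, target) records closest to `query` by squared Euclidean distance."""
    return sorted(train, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], query)))[:k]

def knn_classify(train, query, k=3):
    # Classification: output one of a finite set of groups (majority vote).
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    # Regression: output a real value (mean of the neighbours' targets).
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)

# Hypothetical toy data.
tissue = [((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
          ((4.0, 4.2), "tumour"), ((4.1, 3.9), "tumour"), ((3.8, 4.0), "tumour")]
growth = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((10.0,), 20.0)]

print(knn_classify(tissue, (1.1, 1.0)))  # normal
print(knn_regress(growth, (2.0,)))       # 4.0
```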
2.8 The role of deep learning for GC diagnosis
The domain of DL offers an effective platform for supervised learning. Standard ML models such as SVM, ID3, KNN, Naive Bayes, and linear and logistic regression have shallow architectures, and DL is changing this. DL goes a step beyond shallow artificial neural networks by incorporating additional layers that allow higher levels of abstraction and more accurate predictions from data [39]. DL models are trained using a variety of approaches and algorithms. A DL architecture is a multilayer stack of simple modules, all (or most) of which are subject to learning and many of which compute nonlinear input-output mappings. Each module in the stack transforms its input to increase both the selectivity and the invariance of the representation. A deep network can therefore represent functions of increasing complexity by adding more layers or more units within a layer [40]. DL techniques are already used in several medical applications. Because DL is a type of ML, we first introduce the basic concepts of ML. ML is an approach to data analysis that detects patterns in data and then uses these patterns to predict future outcomes. For medical datasets, there are two main types of ML techniques: supervised learning and unsupervised learning [41].
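A tiny Python forward pass can show the idea of a deep network as a stack of modules with nonlinear input-output mappings. The weights below are hand-picked, not learned, so that a two-layer network with a ReLU hidden layer computes XOR. No single linear layer can represent XOR. In practice such weights would be learned from data, typically by backpropagation.

```python
def relu(v):
    """Element-wise nonlinearity applied between layers."""
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    """One linear module: out[j] = sum_i weights[j][i] * v[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, biases)]

def forward(x, layers):
    """Feed x through a stack of (weights, biases) modules, with ReLU after each hidden one."""
    h = x
    for depth, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases)
        if depth < len(layers) - 1:
            h = relu(h)
    return h

# Hand-picked (not learned) weights for a 2-2-1 network computing XOR.
LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -2.0]], [0.0]),                    # output layer
]

for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(a, b, forward([a, b], LAYERS)[0])
```

Removing the ReLU collapses the two layers into a single linear map. That map cannot separate the XOR cases, which is why the nonlinearity between modules matters.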
The first approach to ML is supervised learning, which learns a mapping from inputs to outputs based on labels assigned to input-output pairs. This mapping is a classification task when the outputs are categorical. For example, a supervised learning problem arises when an ML agent learns to differentiate between normal and tumour tissue on pathologic slides based on labels such as "normal" or "tumour". If, on the other hand, the outputs are real-valued scalars, the task is a regression problem [42]. The second approach to ML is unsupervised learning, which involves learning without explicit labels. Clustering is one of the most common examples of unsupervised learning. Consider a situation in which thousands of images of nuclei are obtained from diverse cells on histopathological slides. These images can be grouped automatically into a set number of groups based on the ML agent's similarity measure. Because no labels are involved, this task is regarded as unsupervised learning. Taken together, these two basic forms of ML allow medical data sets to be analyzed. Which one is more appropriate depends on the questions asked and the properties of the data [43]. AI-based deep learning is particularly well suited to examining unstructured data and addressing the classification problems associated with it. DL is a field of ML: a mathematical tool used in applications such as healthcare image analysis, object detection, speech analysis, and natural language processing [44]. DL has gained significant interest over the last six years as computational power has risen, system costs have dropped, and many
new datasets have been generated. DL algorithms are particularly effective in diagnosing and classifying GC and its many subtypes and in segmenting tumours. DL techniques can provide superior information about specific types of cancer, including their symptoms, locations, stages, aggressiveness, and metastases. DL approaches can benefit physicians by offering a second opinion and by highlighting relevant regions in images. Furthermore, a single DL model has been shown to be helpful in diagnosis when compared with conventional medical procedures [45]. Table 3 summarizes some of the benefits and drawbacks of two commonly mentioned techniques, ML and DL [46].

| Method | Benefit | Drawback |
|---|---|---|
| ML | These algorithms are often straightforward to implement. They are sufficiently adaptive to complex situations involving a large number of interdependent variables. | Variations in the input and output may be seen. In high-dimensional databases, complex relationships between dependent and independent variables are challenging to determine. Its computational cost is very high. |
| DL | In high-dimensional databases, complex interactions between dependent and independent variables are effectively recognized. It can handle databases with a high level of noise. | Both the input and output are identical. The risk of overfitting is very significant. Implementation is more complex than for machine learning. The training phase requires much more computing power than machine learning. |

Table 3: Benefits & Drawbacks of ML and DL.
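The unsupervised example in this subsection, grouping images of nuclei by similarity without labels, can be sketched with k-means clustering. k-means is one common choice among many. Two assumptions apply. First, each image has already been reduced to a small hypothetical feature vector, for instance two normalised measurements per nucleus. Second, the first k points serve as initial centroids for simplicity, whereas practical implementations use better initialisation such as k-means++.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's k-means; the first k points are used as initial centroids (simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new_centroids = [[sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
                         for j, members in enumerate(clusters)]
        if new_centroids == centroids:  # converged: assignments will no longer change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D feature vectors (e.g. normalised nucleus size and stain intensity).
nuclei = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (0.8, 1.1), (5.1, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(nuclei, k=2)
for centre, members in zip(centroids, clusters):
    print([round(c, 2) for c in centre], members)
```

Note that k, the number of groups, must be fixed in advance, matching the "set number of groups" in the example above.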
One of the primary problems of ML is predicting results for new data. A model is first fitted to training data. A training data set is a set of input variables with their corresponding outcomes, selected at random for each case and including both positive and negative examples. A separate subset of the initial data set, referred to as the testing set, is often used to evaluate the model's consistency [47]. Estimation errors arise during the training phase. An algorithm performs well on testing data when it both achieves a small training error and keeps the gap between training and testing errors small. These two requirements correspond to the problems of under-fitting and overfitting in ML, and failing either one degrades the algorithm's output. Overfitting occurs when a supervised learner models the random fluctuations in the training data as if they were part of the underlying relationship, instead of treating them as error. Under-fitting occurs when the algorithm fails to capture the underlying structure of the observed data. Overfitting and under-fitting are both minimized when a model fits the training set and also generalizes to the testing set. This is possible because the training and testing data are distinct samples from the same underlying distribution, and in this case the model's capacity for fitting the training samples is close to 100% [48]. Several criteria exist for preventing and detecting overfitting. The most popular remedy for overfitting is to restrict the model's complexity rather than merely minimizing the number of features. The most common remedy for under-fitting is to increase the number of features. At the same time, the training and testing sets should be large enough to obtain reliable results, although this is not always practical. Approaches for evaluating algorithm performance on both large and small data sets are outlined in [49].
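The two requirements above can be observed numerically. The sketch below uses synthetic data, y = 2x plus Gaussian noise, which is an assumption for illustration. It contrasts two models. A 1-nearest-neighbour regressor memorises the training set and overfits. A constant mean predictor ignores the input and under-fits. Training and testing sets are drawn from the same distribution, as the text assumes.

```python
import random

def mse(model, data):
    """Mean squared error of `model` over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def fit_1nn(train):
    # High capacity: memorises the training set (predicts the y of the closest training x).
    return lambda x: min(train, key=lambda rec: abs(rec[0] - x))[1]

def fit_mean(train):
    # Very low capacity: ignores x and always predicts the training mean.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

rng = random.Random(0)

def sample(n):
    """Synthetic data from one distribution: y = 2x + Gaussian noise (an assumption)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

train, test = sample(50), sample(50)
for name, fit in [("1-NN", fit_1nn), ("mean", fit_mean)]:
    model = fit(train)
    print(f"{name}: train MSE = {mse(model, train):.2f}, test MSE = {mse(model, test):.2f}")
```

The 1-NN model reaches zero training error, yet its testing error is clearly positive: the gap between the two errors signals overfitting. The mean predictor has a large error even on the training set, which signals under-fitting.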
The development of Electronic Health Records (EHRs), which are permanent records of individuals' health information, is a priority for healthcare systems worldwide. The exponential growth in the quantity of digitized clinical data has facilitated the development of data-driven healthcare. Data-driven healthcare integrates intelligent data analytics to improve decision-making and individualized treatment based on complex, diverse, and restricted data. Such analysis requires ML and DL. ML combines methods from AI that allow a machine to learn from data and uncover complex, hidden patterns automatically. In healthcare, ML models are currently being used to evaluate data. In general, predictive analytics uses supervised ML algorithms, which either categorize data into discrete categories (classification methods) or predict a value (regression methods) [50][51]. The DL approach progressively extracts higher-level information from the input image using several processing layers of linear and nonlinear transformations. Most DL techniques are built on a neural network framework, and the word "deep" refers to the multiple hidden layers inside such networks. DL methods learn features directly from the data, eliminating the need for manual feature extraction. In healthcare, DL was initially applied mainly to medical images. However, novel applications in EHR and bio-signal research have emerged recently.
AI undoubtedly has a wide range of applications in clinical practice. By drawing on a range of diverse clinical characteristics, it can address the current lack of objectivity and universality in expert systems, improving usability for patients and helping with existing subjectivity problems [52]. In addition, ML may help healthcare institutions train young physicians in clinical diagnosis and decision-making. A growing number of research publications report the outstanding diagnostic and prognostic performance of ML-based computer systems. In particular, DL algorithms are revolutionizing our capacity to analyze imaging data. These advances may increase sensitivity and help radiologists reduce false positives. However, such models risk overfitting the training data, which can lead to brittle performance that degrades in some scenarios. As a result, ML often entails a trade-off between accuracy and intelligibility. More accurate models, such as boosted trees, random forests, and neural networks, are often incomprehensible. Understandable models, such as logistic regression, Naive Bayes, and single decision trees, often perform worse [53].
3 Related work 
The author of [54] summarizes the epidemiology and management of GC and gastroesophageal junction cancers (GEJC) and estimates their global economic and humanistic burdens. GC has a significant impact on patient health because it is associated with severe symptoms, a shortage of effective treatments, and economic costs that are expected to keep rising worldwide over the next decade. Nevertheless, there is still significant room for advancement in early detection and intervention and in the discovery of new life-extending medications. In addition, predictive biomarkers may be used to identify the patients who are most likely to benefit from a particular medicine, which can help maximize treatment efficiency and patient success. Although several studies have examined the epidemiology and treatment of GC and GEJC, to our knowledge no worldwide estimates of their economic impact had been published.
In this study, the author of [59] highlights how ML may support cancer detection and therapy via supervised, unsupervised, and DL techniques. Current technological approaches are grouped into clusters and compared on accuracy, sensitivity, specificity, and false-positive metrics against benchmark data sets.
In [55], the author proposed methods for predicting survival, distant metastases, and peritoneal metastases in GC using Gaussian Naive Bayes (GNB), XGBoost, and random forest algorithms. The study identified the most successful models for overall survival (OS) prediction and for distant metastases. For peritoneal metastases, the best model was GNB, with 81% accuracy.
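GNB is the best-performing model reported above for peritoneal metastases, so a from-scratch sketch of how Gaussian Naive Bayes classifies a record may help readers unfamiliar with it. This is a generic textbook version, not the implementation used in [55]. The two-feature records and the labels are hypothetical.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Estimate class priors and per-feature Gaussian means/variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A tiny constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9 for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # "Naive" assumption: features are conditionally independent given the class,
        # so the per-feature Gaussian log-likelihoods simply add up.
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

# Hypothetical two-feature records with a binary outcome.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = ["no_metastasis"] * 3 + ["metastasis"] * 3
model = fit_gnb(X, y)
print(predict_gnb(model, [1.1, 2.0]), predict_gnb(model, [5.1, 6.1]))
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.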
In this study, the author of [56] discusses how ML with supervised, unsupervised, and DL techniques can help with cancer diagnosis and treatment. Many state-of-the-art methods are grouped into the same clusters by accuracy, sensitivity, specificity, and false-positive metrics, and the results are compared on benchmark datasets. The study also examines and categorizes current methods for distinct forms of cancer and exposes their limitations. In addition, it identifies several obstacles for prospective future work. Its primary goal is to give new researchers an intellectual foundation from which to begin their research in medicine. The challenges it lists for cancer detection and treatment include the following:
- redesigning the research pipeline;
- understanding the phenomena of cancer development;
- developing preclinical models;
- accurately managing complex cures;
- treating earlier;
- developing and delivering innovative clinical trial methods;
- improving trial accuracy.
According to the author of [57], many classifiers have been used in cancer diagnosis, with decision trees, neural networks, and support vector machines outperforming the others. Although these classifiers have been the most accurate at prediction, their findings are based on datasets that differ. Real-time GC detection requires high prediction accuracy with short training time. In addition, adopting optimization algorithms to improve the network architecture based on the available dataset can improve the detection rate.
In this research, the author of [58] states that GC is almost entirely asymptomatic in its early stages. As a result, it is difficult to diagnose the exact form of cancer at the earliest point. Endoscopy is a very accurate and precise diagnostic technique. Nevertheless, even when the procedure is performed under a physician's supervision, malignant patches may be missed or not diagnosed effectively. If the malignant spot is not completely identified, cancer may reappear after invasive surgery. To reduce this problem, a computerized decision support system (CDS) was developed with the help of expert physicians and image processing techniques. The CDS approach adopted in this study serves as a guide for gastrointestinal physicians. It helps them identify cancerous patches in endoscopy images, collect samples from these spots, and reach a more accurate diagnosis. The region is then determined using the biopsy samples taken from the patient, and the approach is therefore regarded as a model. In this way, the study aims to prevent the deterioration of patients' mental health, the complications associated with multiple biopsies, and the resulting loss of faith in clinicians.
Despite the decline in the prevalence of GC, the author of [59] notes that it remains a fatal disease, with more than a quarter of a million individuals dying of it each year. H. pylori is the primary cause, accounting for 60–70% of all cases. In a clinical investigation conducted in China, GC incidence fell by up to 39% over 15 years following H. pylori eradication. Combining H. pylori eradication with endoscopic screening has shown promise in reducing GC incidence in high-risk groups. In Japan, this risk-based approach to GC prevention may pave the way for eliminating this especially deadly disease.
In this study, the author of [60] explains recent advances in GC prevention measures. The researchers recommend population-based Helicobacter pylori treatment programs as the most evidence-based strategy currently available for GC prevention. These programs would be achieved through coordinated national and international cooperation and introduced as demonstration projects in selected populations, to be scaled up later. Such prompt action is needed to substantially decrease the enormous loss of life and productivity caused by this avoidable malignancy.
4 Nature of cancer and its impact 
Before implementing an application, let us review some background information on the classification, grouping, and impact of cancer. Cancer is caused by the growth of abnormal cells in the human body. It generally takes the form of a tumour, characterized by its size, shape, type, pattern of growth, and location [61]. Tumours are divided into three categories: benign, premalignant, and malignant. Benign tumours pose no danger to the patient's health (i.e., they do not invade the surrounding tissue and do not spread to other parts of the human body). Premalignant tumours are not yet cancerous but have been shown to be able to develop into cancer, so they require close surveillance of the patient. Malignant tumours are the ones that ultimately harm the human body: their cells divide rapidly and spread to distant sites, causing metastases and invasion of other organs [62].
Various therapies, including surgery, immunotherapy, and radiation therapy, are used because of the unparalleled complexity of cancer. Surgical procedures are frequently performed to prevent tumours from spreading further into the human body. Chemotherapy is a drug-based method of treating tumours, including hormone-related tumours.
4.1 Contribution of data mining in cancer domain
In the medical domain, healthcare-associated DM is one of the most valuable and demanding applications of KDD. The difficulty stems from data sets characterized by volume, value, velocity, variety, and veracity. Furthermore, medical records are stored and distributed across multiple locations, so various sources must be integrated. Data miners also face political, legal, and social problems with sensitive medical data. In practice, data analysts often lack domain knowledge, so active collaboration between domain experts and data miners is required [63]. Various DM techniques are used to identify issues in the medical domain, and the problems involved are described below. The term DM means different things to different people. Its basic definition, however, is the analysis of large amounts of data to predict future events. DM currently plays a significant role in the health industry by improving its efficiency, and it addresses several real-life issues. Although DM transforms raw data into more meaningful information [3], critics point to its lack of compliance with all statistical requirements [64]. For example, many data extraction tools use the same sample for training and testing [65]. Despite this criticism, DM may remain a valuable medical resource. DM may help physicians identify cancer by giving them a better perspective on the disorder. It also generates a wealth of information that can be examined across many medical areas, where vast quantities of data may overlap [65]. The statistical accuracy of the models involved is just one indication of the significance of DM in the medical field.
4.1.1 Heterogeneity of data 
In the medical domain, data sets are heterogeneous in data type and very large in volume. Clinical data are collected from images, patient interviews, physician notes, and explanations. These sources are critical for DM scholars because doctors rely on them to identify the best treatment for patients. Physicians produce unstructured free text in English, images, and other clinical data that must be normalized and processed by software. In contrast to other branches of science, the underlying medical data lack a formal structure, so research must first organize the information gathered [63].
4.1.2 Moral and social issues 
Because medical data mining involves human data, it raises legal, ethical, and social concerns, and patient data and other sensitive information must be handled very carefully. The range of human medical data available for DM is enormous. Data-ownership issues can block attempts to acquire the necessary data or to link other data sets. There are also serious disputes concerning the possession of patient data, including ongoing high-profile hearings and court investigations. Concerns about privacy and security are another distinguishing feature of medical data, particularly in diabetes. Although the physician has easy access to the patient's data, there is no need for such data to be published, and data security is a concern when data are transferred to other servers [63].
4.2 Operational challenges in the cancer domain
A cancer forecasting support system enables doctors to make appropriate, accurate, and timely decisions that reduce overall treatment costs. Different classifiers were used in the reported experiments. The KNN classifier identified the most predictive variables with the highest classification accuracy. The summary results showed the following accuracies [64]:
- decision tree induction (C5): 93.6%;
- the second most efficient model: 91.2%;
- logistic regression, the worst-performing model: 89.2%.
With so many associated risk factors, a robust model is required to estimate the likelihood of cancer. Time is precious, and a correct risk assessment can save the lives of many patients [109][65]. ML algorithms have played a critical role in classifying cervical cancer for early diagnosis [4]. ML is now one of the most promising and fastest-growing areas of medical data diagnostics. The different ML algorithms, functions, data sets, and accuracies used have been tracked, and machine learning has also been applied to predict the depth of cervical cancer [4].
Over the last decade, physicians have been unable to tell which of their patients would develop GC, and most patients do not know how long they have had abnormalities in their stomachs. Doctors make decisions based on their experience and skills rather than on the rich data in the patient database. This practice introduces undesirable biases, errors, and high medical costs that affect the quality of patient service. Integrating a clinical decision support system (CDSS) with the patient database might lead to intelligent decision-making. In this context, DM is an excellent approach to making better clinical decisions [66]. Early diagnosis of cancer prolongs human lives and is vital in fighting GC. Medical imaging data is another factor in the early detection of GC. Despite the increase in medical imaging data, interpreting it in time, given the speed at which GC progresses, is time-consuming and difficult. In addition, if physicians misinterpret the data when detecting GC, diagnostic accuracy declines sharply. ML, a sub-branch of artificial intelligence, is widely used in medical image processing for cancer detection, classification, and tumour segmentation [67]. Other prominent applications of ML in healthcare include breast cancer, heart disease, cervical cancer, and tuberculosis. These applications usually apply ML methods to images of suspect organs or tissues [56]. Because of their success with image classification problems in general, ML models have been proposed in multiple research works to detect cancer from tissue images.
4.3 Limitations of data mining in healthcare sector
Although DM provides information and support to paramedical staff by identifying patterns hidden in a dataset, its capabilities remain limited. Not every hidden pattern in a dataset can be uncovered by DM techniques alone, and a pattern is only interesting if it is reasonable and feasible to act on. DM therefore requires manual intervention for the extracted knowledge to be beneficial. For instance, DM could help with diagnosis and prescription, or supplement the intuition and skills of the doctor [68]. It has also been argued that medical data is often composed of heterogeneous variables such as ethnicity, family history of cancer, medicines, allergic responses, metabolic problems, and imaging tests. Each of these gives only a partial representation of a patient's health. Moreover, the statistical properties of these sources are intrinsically distinct. When scholars and doctors analyze such data, they face two challenges. The first is the curse of dimensionality: the volume of the feature space, and hence the number of samples needed to cover it, grows exponentially with the number of dimensions. The second is the variability of the feature sources and their statistical properties. These factors contribute to delays and inconsistencies in GC diagnosis, preventing patients from receiving timely treatment [110]. There is therefore a strong need for a comprehensive approach that enables early GC diagnosis and can support physicians' decisions. Physicians and advanced statistical domain experts face the challenge of identifying innovative techniques for predicting GC prognosis and diagnosis, because existing paradigms cannot handle all of this knowledge. This requirement is closely associated with innovations in other sectors, such as BD, DM, and AI [10]. An increasing amount of information is available in today's healthcare sector. Even so, doctors and nursing staff find it difficult to perform time-consuming manual data analysis to make the best medical decisions while reducing uncertainty, patient risks, and costs. This has the unintended effect of substandard patient care, which is unacceptable. According to one study, 44,000 to 98,000 people die in the United States each year due to preventable medical errors (which account for 2–4% of all medical fatalities). Separate research found that up to 40% of patients were not receiving the appropriate treatment for chronic or acute illnesses [26].
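The curse of dimensionality mentioned above can be illustrated with two quick calculations in Python, using synthetic data for illustration only. First, the number of cells needed to cover the unit hypercube at a resolution of 0.1 per axis grows as 10^d. Second, for uniformly random points, the relative contrast between the nearest and farthest neighbour of a query shrinks as d grows. This shrinking contrast makes distance-based similarity less informative.

```python
import math
import random

def cells_needed(d, resolution=0.1):
    """Grid cells required to cover [0, 1]^d at the given resolution per axis."""
    return round(1 / resolution) ** d

def distance_contrast(d, n_points=200, seed=0):
    """(max - min) / min distance from a random query to n random points in [0, 1]^d."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    dists = [math.dist(query, [rng.random() for _ in range(d)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for d in (1, 2, 3, 6):
    print(f"d={d}: {cells_needed(d):,} cells")
for d in (2, 10, 100, 1000):
    print(f"d={d}: contrast={distance_contrast(d):.2f}")
```

The sketch requires Python 3.8 or later for `math.dist`.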
5 Result analysis and discussion of the findings
The author of [69] describes four ML algorithms used to identify GC patients. The research sought to find patterns using customized ML algorithms to assess gastrointestinal problems and mobile algorithms to predict GC. The findings showed an improved performance of 95% on the data set with one of the algorithms. The studies in [5] show that hybrid ML models may improve sensitivity and overall accuracy. In those studies, a neural network classified cancer cells with 100 percent sensitivity and 99.66 percent accuracy.
The study in [70] evaluated medical information using hybrid ML algorithms. After addressing the constraints of prior architectures, the authors implemented a new computational intelligence architecture that uses the SVM to process the medical data set. The researchers also pointed out that integrating the SVM with a feature selection algorithm, the Genetic Algorithm (GA), improved its results. The combined model outperformed the other classification algorithms, attaining greater accuracy and sensitivity.
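A wrapper that couples a genetic algorithm with a classifier, as in the GA-SVM combination above, searches over feature subsets encoded as bit strings. The sketch below shows only the GA part. Its fitness function is a hypothetical stand-in: a real wrapper would train an SVM on the selected features and return, for example, its cross-validated accuracy. The population size, mutation rate, and number of generations are arbitrary assumptions.

```python
import random

def ga_feature_selection(n_features, fitness, pop_size=20, generations=40,
                         mutation_rate=0.1, seed=0):
    """Search feature subsets (bit masks) with a minimal genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best subsets survive unchanged
        parents = ranked[: pop_size // 2]  # truncation selection: top half may breed
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Hypothetical stand-in for "SVM accuracy on the selected features".
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    # Reward informative features, lightly penalise irrelevant ones.
    return len(chosen & INFORMATIVE) - 0.1 * len(chosen - INFORMATIVE)

best = ga_feature_selection(n_features=8, fitness=toy_fitness)
print(best, toy_fitness(best))
```

Because of elitism, the best subset found never gets worse from one generation to the next.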
The goal of the author of [71] is to address hybrid approaches, specifically combinations of DM and optimization techniques. The author explores and applies several classification and prediction strategies to diagnose heart disease at earlier stages, and builds a framework for decision-making and prediction. According to the author of [72], cervical cancer is the fourth leading cause of cancer death in women. The author proposes three SVM-based approaches for diagnosing cervical cancer: a standard SVM and two improved variants. The variants are support vector machine-recursive feature elimination (SVM-RFE) and support vector machine-principal component analysis (SVM-PCA).
The study in [73] tested several hybrid ML models and
found that a hybrid of SVM and simulated annealing
achieved 96% predictive accuracy when classifying
hepatitis patients. These studies inspired later work on
hybrid combinations of ML algorithms, such as
classification algorithms integrated with feature selection
algorithms, with error-minimizing optimization algorithms
then used to tune hyper-parameters. Removing unnecessary
data from a dataset improves prediction model performance
and prevents the algorithm from being misled [74].
Table 4 shows that several past studies have applied DM
techniques to various diseases, including gastric, breast,
and cervical cancer as well as heart disease. DM
techniques such as SVM (Support Vector Machine), KNN,
Decision Tree, and Naive Bayes have achieved the highest
accuracy compared with other techniques. GC and
different DM methods have been the subject of extensive
research over the last several decades, and many newer
ML techniques have been applied, including Artificial
Neural Networks (ANN), Bayesian networks (BN), Random
Forest (RF), Support Vector Machines (SVM), Decision
Trees (DT), and the multilayer perceptron (MLP).
Although ML algorithms have been widely used in GC and
have delivered high classification accuracy, an
appropriate level of validation is required before these
methods can be adopted in daily clinical practice.
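To make the hybrid idea above concrete, the following is a minimal sketch, under stated assumptions, of a simulated-annealing search over feature subsets wrapped around a generic scoring function. It is not the method of [73]: the names `anneal_feature_subset` and `toy_score` are hypothetical, `toy_score` stands in for what would in practice be something like the cross-validated accuracy of an SVM trained on the selected features, and the cooling schedule and step count are arbitrary choices.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple


def anneal_feature_subset(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    steps: int = 500,
    t_start: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Hypothetical simulated-annealing search over feature subsets (higher score is better)."""
    rng = random.Random(seed)
    current: FrozenSet[int] = frozenset(range(n_features))  # start with all features
    current_score = score(current)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour move: toggle one randomly chosen feature in or out.
        candidate = current ^ {rng.randrange(n_features)}
        if not candidate:
            continue  # never evaluate an empty feature set
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with the
        # Metropolis probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Hypothetical toy score: features 0 and 2 are informative, and every other
# selected feature costs a small penalty (a stand-in for CV accuracy).
def toy_score(subset: FrozenSet[int]) -> float:
    informative = {0, 2}
    return len(subset & informative) - 0.1 * len(subset - informative)


best, best_score = anneal_feature_subset(5, toy_score, seed=42)
print(sorted(best), best_score)
```

Because the search keeps the best subset seen rather than only its final state, the result is never worse than the all-features starting point.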
Globally, GC remains a leading cause of cancer death,
largely because the vast majority of GC patients are
diagnosed at an advanced stage, when the prognosis is
bleak and treatment options are minimal. Some cancers,
such as GC, are hard to identify early because their
symptoms are non-specific and their early signs are
ambiguous. As a result, improved prediction models based
on multivariate data, together with high-resolution
diagnostic tools, are vital in clinical cancer research.
Because the stomach is a large, curved organ with blind
spots, it is challenging to inspect the whole stomach
thoroughly [75]. If AI can identify the anatomical parts
of the stomach, it can help confirm that the whole
stomach has been examined. Because GC arises in
individuals with chronic gastritis, some EGC lesions
resemble gastritis and are therefore difficult to
distinguish from it. The prognosis of GC depends on the
stage at diagnosis: it is very poor when GC is detected at
an advanced stage, whereas the 5-year survival rate for
early gastric cancer approaches 90 percent. Gastric cancer
diagnosis is an interdisciplinary field closely tied to
endoscopy [76]. Misdiagnosis of cancer is responsible for
many deaths each year; large caseloads and short patient
medical histories have contributed to fatal errors. ML is
less susceptible to these factors, and ML algorithms can
predict and diagnose cancers faster than medical experts,
which is a significant advantage. The importance of early
diagnosis and prognosis for increasing the survival rate of
GC patients has been well documented [77]. In addition,
the tumour in GC patients directly affects digestion and
nutrient absorption, as do the adverse effects of
chemotherapy on the GI tract. GC can become severe if
not treated promptly, and reduced gastric digestive
function can harm patients' nutritional condition. Cancer
is a chronic disease that affects its site of origin and can
spread to other sites, resulting in a cascade of adverse
effects on the patient's health and nutritional status.
Malnutrition may be the first symptom of this cancer, and
the adverse effects of cancer treatment further affect
patients' nutritional and health conditions [78][79].
Integrating genetic and histological information has
advanced our knowledge of gastric cancer at the
pathological level, and pathological diagnosis is one area
that could benefit greatly from methods that support DL
integration. Given the current momentum, many patients
are hopeful about how their diagnosis may be affected,
and DL may help widen the range of treatment choices
and methods. Among cancers, stomach cancer has an
advantage in that its particular causes are known in more
detail, whereas the origins of many other cancers are not
yet fully understood [80][81].
Many technical difficulties must still be overcome before
AI can significantly impact the medical profession.
Because these approaches rely heavily on massive amounts
of high-quality data, it is essential to ensure that the
training set contains enough of it. Different healthcare
systems may collect data with their own biases and noise,
so a model trained to perform well in one environment
may not work well elsewhere [111]. When a diagnostic task
shows poor inter-expert agreement, the labels are noisy,
and careful curation of the training data may improve the
performance of machine learning models trained on it;
comprehensive data curation is also needed to handle a
range of data sources. Many successful machine-learning
models cannot be understood by those who were not
involved in their development. Even when these models
perform better than humans, it is difficult to articulate the
concepts they encode, pinpoint their weaknesses, or
uncover new biological insights from these computational
"black boxes" [82][83].
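Inter-expert agreement of the kind mentioned above is commonly quantified with Cohen's kappa, which corrects the raw agreement rate p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The minimal sketch below computes it for two raters; the function name and the "EGC"/"gastritis" labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two endoscopists for six lesions.
expert_1 = ["EGC", "EGC", "gastritis", "gastritis", "EGC", "gastritis"]
expert_2 = ["EGC", "gastritis", "gastritis", "gastritis", "EGC", "EGC"]
print(round(cohens_kappa(expert_1, expert_2), 3))  # 0.333
```

Here the experts agree on four of six lesions (67%), yet kappa is only about 0.33 because half of that agreement would be expected by chance.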
Problems with collecting, storing, and sharing healthcare
data persist with electronic health records (EHRs).
Privacy-preserving techniques (such as third-party-hosted
computing environments) make safe data exchange via
cloud services possible. Making this infrastructure widely
available requires interoperable apps that satisfy clinical
information standards. Integration of health-related
information across healthcare apps and locales remains
limited, slow, and inconsistent [84].
To sum up, current research on AI in stomach cancer
focuses on diagnosis. At this point, the demonstrated
benefits of AI in diagnosis are more impressive than its
broadly applicable therapeutic benefits, which may explain
why there are so few AI researchers in the medical field.
It may be feasible to extend the fundamental concepts of
minimally invasive laparoscopic surgery (as they now
exist) to the development of new computer-aided
techniques, such as AI; the clinical issues we see
approaching are, on the surface, quite similar to AI-based
problems [85]. Uneven access to medical resources
worldwide may slow the adoption of AI, whereas
well-resourced areas may advance AI development faster.
Clinicians with AI expertise may be the crucial figures in
advancing AI. AI-assisted assimilation and dissemination
of standardized treatment practice may reduce the
negative effects of adverse incidents, an approach with a
sizeable degree of acceptability and feasibility. Medical
professionals in today's healthcare system have limited
time to keep up with the newest advances in digital
patient care. As a result, health expenditures remain high,
and, regrettably, a significant portion of the population
does not have access to quality medical treatment [86].
Physicians commonly make extensive use of clinical trial
data, drawing on vast amounts of patient data to grow and
enhance their practice. ML may lead to clinical discoveries
by uncovering previously unnoticed patterns in large data
sets. AI is clearly part of this area, but training and
cooperation between computer science experts and
medical professionals are even more critical. Medical
personnel must be able to implement this new technology
affordably before it can affect patient care [84]. AI may
affect many aspects of cancer care, including prediction,
screening, the analysis of large data sets, and the
interpretation of imaging tests in the clinic. Early
detection of tumours in both healthy and high-risk
populations offers the potential to find cancer before it
spreads, giving patients a chance of successful treatment
and a more rapid recovery. Progress in AI, ML, and DL is
accelerating, and these advances may soon transform
cancer screening and diagnosis. While we are eager to use
cutting-edge AI technology to enhance cancer prediction,
we must also teach these technologies about the nature of
early-stage cancer. Although AI applications are currently
limited, the potential for AI to play a significant role in the
early detection of cancer is enormous. Information
extracted from test results can be used to determine
diagnosis, prognosis, and therapy response [85][57].
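Accuracy alone can hide poor detection of rare positive cases, so screening models are usually also judged by sensitivity (the share of cancers detected) and specificity (the share of healthy people correctly cleared). The minimal sketch below computes all three from binary labels, assuming 1 = cancer and 0 = no cancer; the function name and the cohort of 1,000 people with 20 cancers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    accuracy: float
    sensitivity: float  # true-positive rate: cancers correctly detected
    specificity: float  # true-negative rate: healthy people correctly cleared


def screening_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> ScreeningMetrics:
    """Compute metrics from binary labels, assuming 1 = cancer and 0 = no cancer."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return ScreeningMetrics(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=tp / (tp + fn) if tp + fn else float("nan"),
        specificity=tn / (tn + fp) if tn + fp else float("nan"),
    )


# Hypothetical cohort: 20 cancers among 1,000 screened people.
y_true = [1] * 20 + [0] * 980
always_negative = [0] * 1000
m = screening_metrics(y_true, always_negative)
print(m)  # accuracy=0.98, sensitivity=0.0, specificity=1.0
```

A model that predicts "no cancer" for everyone reaches 98% accuracy here while detecting none of the cancers, which is why sensitivity matters for early detection.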
5.1 Applied excellence in machine learning 
and deep learning 
Medical experts find ML a helpful asset in patient care
and in the prevention and identification of infectious
diseases, yet effective use of these strategies is almost
entirely missing in the hospital environment. A natural
way to think about ML is as algorithms that are trained on
a task and thereby learn to do that job. This holds even
when the training data set was manually selected and
labelled under the supervision of someone who chose the
techniques, parameters, and processes used to construct
the task. Through their interpretation, analysis, and
optimization components, ML algorithms leverage the
availability of massive quantities of data and more
powerful computational architectures to represent more
multivariate analysis processes than traditional
approaches; DL facilitates the detection of previously
hidden patterns, extrapolation of trends, and prediction of
outcomes in a wide variety of problems, all while seeking
to "learn". ML algorithms are now being introduced into
patients' medical records, with the aim of determining, for
example, which patients are more likely to be readmitted
to hospital or to be unable to adhere to prescribed
medications. The applications in diagnosis, testing, drug
development, and clinical trials are extensive [102].
Despite the abundance of digitized evidence, predictive
models built from hospital records remain predominantly
linear and rarely use more than 20 or 30 variables.
However, one tangible advantage of ML is that experts are
not required to determine which parameters to consider
and in what variations.

| Type | Machine learning technique | References | Accuracy |
|---|---|---|---|
| Breast cancer | Neural Network (NN), DBN, and back-propagation | Abdel-Zaher and Eldeib (2016) [5] | 99.68% |
| | Support Vector Machine, K-Nearest Neighbors, multilayer perceptron, Decision Trees, Random Forest, Logistic Regression, Adaptive Boosting, Gradient Boosting | Turgut et al. (2018) [87] | 95% |
| | SVM, Decision Tree, C4.5, and Naive Bayes classification algorithms | Pritom et al. (2016) [74] | 76.26% |
| | Comparison of LR and RF | R. Kannan et al. (2018) [88] | 87% |
| | Decision Tree | Rajesh Jangade et al. (2018) [89] | 75.10% |
| | Genetic Algorithm (GA) application | Kaan Uyar (2017) [90] | 97.78% |
| Heart attack | Random Forest, Decision Tree, and Naïve Bayes | H. Benjamin et al. (2018) [91] | 81% |
| | Decision Tree (DT), Genetic Algorithm (GA), Artificial Neural Network (ANN), and naive Bayes algorithms | Hilal et al. (2017) [92] | 69.5% |
| | Neural Network, Naïve Bayes classifier, and J48 decision tree | Noreen Akhtar et al. (2018) [93] | 80% |
| | Principal Component Analysis, decision tree, and SVM | Dey, A et al. (2016) [94] | 70% |
| | Coactive neuro-fuzzy inference system (CANFIS) and GA | Parthiban L et al. (2018) [95] | - |
| | Hybrid DM model | Shrivastava A et al. (2016) [71]; Zriqat I et al. (2016) [96] | 99.0% |
| | Support Vector Machine (found to be attractive, with high model accuracy) | Assari R et al. (2017) [6] | 84.33% |
| Cervical cancer | Support Vector Machine (SVM) with Genetic Algorithm (GA) | Kalantari et al. (2017) [70] | 97.88% |
| | Support Vector Machine (SVM) with PCA | Wu and Zhou (2017) [72] | 93.97% |
| | Radial kernel support vector classifier (SVM Radial), Bayesian optimization, and GB machines | Nishio et al. (2018) [97] | - |
| | Grid optimization algorithm and SVM | Zhao et al. (2018) [98] | 85.56% |
| Gastric cancer | Apriori, CN2 Rules, C4.5, and Naive Bayes (NB) | Mahmoodi et al. (2016) [99] | 87.2% |
| | Logistic regression (LR), C5.0, Decision Tree (DT), multilayer perceptron (MLP), and tree augmented naive Bayesian network | Liu, M et al. (2018) [100] | 77.84% |
| | Support vector machine (SVM), decision tree (DT), naïve Bayesian model, and k-nearest neighbour (KNN) | Asghar Mortezagholi et al. (2019) [69] | 90.08% |
| | Logistic Regression, Genetic Algorithm, and MICE algorithm | Ladan Goshayeshi et al. (2017) [66] | 72.57% |
| | K-Nearest Neighbour, XGBoost, and LightGBM | Amirgaliyev Y et al. (2019) [101] | 95% |

Table 4: DM techniques and their uses in various cancers.

An essential aspect to remember
when implementing ML in medical care is the accuracy and
consistency of evidence from different sources. Each
healthcare system may collect patient information in its
own way, even when pursuing the same objectives, so the
data must be harmonized before ML is implemented. This
helps avoid overfitting to a single data set, which would
make it more challenging to extend the methodology to
other data sets. The issue of racial bias is also often
crucial; it arises when the training data cover minority
groups inadequately or inaccurately. In medical care, it is
therefore valuable to have a variety of broad datasets that
reflect the distinctive features of each patient group. The
intelligibility of the algorithm is also of great importance,
and a balance must be maintained between performance
and intelligibility. Because higher-performance models
(for example, DL) are highly complex, they are much more
difficult to interpret, whereas simpler models such as
regression models or decision trees are considerably more
transparent [10]. Finally, ML has shown great promise in
using the data held in electronic health records (EHRs),
which modern medical organizations use widely and which
contain many heterogeneous data components [103].
These include patient demographics, diagnoses,
laboratory test results, medications the patient has already
taken, and medical records, as well as imaging, sensor, and
text data [103].
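One simple way to check for the subgroup-coverage problem described above is to report performance separately for each patient group, alongside the number of samples in it. The minimal sketch below does this for accuracy; the function name, the group names "A" and "B", and the labels are hypothetical, and a real audit would use the clinically relevant subgroups and metrics of the study at hand.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def accuracy_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, Tuple[int, float]]:
    """Return {group: (number of samples, accuracy)} for each subgroup."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("all sequences must have the same length")
    totals: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        correct[g] += int(t == p)
    return {g: (totals[g], correct[g] / totals[g]) for g in totals}


# Hypothetical evaluation set: group "A" is well represented, group "B" is not.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 8 + ["B"] * 2
print(accuracy_by_group(y_true, y_pred, groups))  # {'A': (8, 1.0), 'B': (2, 0.5)}
```

Pooled accuracy in this example is 90%, yet the small group B sees only 50%, a gap that a single overall figure would hide.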
Although it was previously believed that having access to
more information on each patient would lead to
better-informed medical choices, this has yet to be proven.
The belief reflects the fact that medical practitioners are
constantly bombarded with information, but it glosses over
the massive amount of data now available to medical
specialists. This section has highlighted applications of ML
algorithms in several research areas, including predicting
individual patient responses to cancer medicines, diabetes
research, retinopathy detection, and cancer detection [10].
6 Future research and 
developmental challenges 
Marketing and manufacturing are only two of the industries
that have made extensive use of ML, and its use in the
healthcare sector is gaining traction. Difficulties such as
irrelevant features, uncertainty, computational complexity,
the dynamic nature of data, and computation time are being
studied more widely as recent research into ML complexity
increases. This section discusses potential research
directions, such as personalized treatment, information loss
during pre-processing, clinical data collection for scientific
purposes, automation for junior expert users, collaborative
study and domain expertise, integration into the healthcare
sector, and DM-specific prediction. The challenges of
diagnosing GC with ML techniques are outlined below
[104].
6.1 Information loss during data pre-
processing 
Information gathering, or data pre-processing, is the most
time-consuming and costly component of DM. In one
study, missing values accounted for roughly 46.5 percent
of the data and affected 363 out of 410 attributes. A large
amount of information is lost when instances with missing
values and outliers are filtered out, so future studies should
develop more accurate ways of estimating missing values
rather than removing them. New or modified data
gathering procedures could also help to circumvent this
problem. Alongside handling missing data, a common
strategy is to remove outliers; however, as shown in one of
the studies we reviewed, outliers may be used to learn
about uncommon disorders. Instead of leaving out these
oddities, future studies should seek them out and discover
what they can teach [105].
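As a minimal illustration of keeping, rather than discarding, incomplete or unusual records, the sketch below fills missing values with the column median (a simple stand-in for more accurate imputation methods, such as the MICE algorithm listed in Table 4) and flags, rather than deletes, outliers using Tukey's interquartile-range rule. The function names and the laboratory values are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers_iqr(values: Sequence[float], k: float = 1.5) -> List[bool]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] instead of dropping them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical laboratory column with two missing values and one extreme reading.
column = [13.1, None, 12.8, 14.0, None, 13.5, 6.2]
filled = impute_median(column)
flags = flag_outliers_iqr(filled)
print(filled)  # [13.1, 13.1, 12.8, 14.0, 13.1, 13.5, 6.2]
print(flags)   # only the 6.2 reading is flagged
```

Flagged records stay in the dataset, so they can later be reviewed as possible rare disorders instead of being silently discarded.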
6.2 DM process automation for junior 
expert users 
Physicians, nurses, and other paramedical staff, many of
whom have insufficient data analytics knowledge or
training, are the beneficiaries of DM in the healthcare
sector. One way to address this issue is to establish an
automated (i.e., not supervised by humans) system for
end-users. A cloud-based framework for preventing
medical errors has also been designed, although such work
is difficult because of the variety of application areas and
because no single algorithm is equally accurate for all
applications [30].
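Since no single algorithm is equally accurate for all applications, an automated system for non-specialist users would typically compare several candidate models and pick one by cross-validation. The sketch below illustrates that idea with three deliberately simple classifiers written from scratch (majority class, nearest centroid, and 1-nearest-neighbour). It is a hypothetical illustration, not the framework of [30]: every function name is an assumption, and a real system would use stronger models, shuffled or stratified folds, and clinically appropriate metrics.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Predictor = Callable[[Point], int]
# A "fitter" trains on (X, y) and returns a prediction function.
Fitter = Callable[[List[Point], List[int]], Predictor]


def fit_majority(X: List[Point], y: List[int]) -> Predictor:
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(X: List[Point], y: List[int]) -> Predictor:
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def fit_1nn(X: List[Point], y: List[int]) -> Predictor:
    return lambda x: y[min(range(len(X)), key=lambda i: math.dist(x, X[i]))]


def cv_accuracy(fit: Fitter, X: Sequence[Point], y: Sequence[int], k: int = 5) -> float:
    """k-fold cross-validated accuracy using interleaved (non-shuffled) folds."""
    n, correct = len(X), 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_train = [X[i] for i in range(n) if i not in test_idx]
        y_train = [y[i] for i in range(n) if i not in test_idx]
        predict = fit(X_train, y_train)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / n


def auto_select(candidates: Dict[str, Fitter], X, y, k: int = 5):
    """Score every candidate and return (best name, all scores)."""
    scores = {name: cv_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-feature data with two well-separated classes.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y = [0] * 5 + [1] * 5
candidates = {"majority": fit_majority,
              "nearest_centroid": fit_nearest_centroid,
              "1nn": fit_1nn}
best, scores = auto_select(candidates, X, y)
print(best, scores)
```

Ties are broken by the order in which candidates are listed, so listing simpler, more interpretable models first is one reasonable default.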
6.3 The study's interdisciplinary approach 
and domain technical expertise 
Health informatics is a field of study that encompasses
multiple disciplines. In certain healthcare domains, DM is
used in conjunction with expert opinion (for example, from
oncologists in cancer studies). Approximately 32% of the
analytics publications did not include any professional
guidance. Deeper analysis should involve representatives
from a variety of fields, including healthcare [106].
6.4 Incorporated into the healthcare sector 
Several of the reviewed articles attempted to incorporate
DM techniques into the decision-making process itself.
The effect of knowledge discovery via DM on the
workload and time of medical practitioners remains
uncertain, and future research should investigate system
integration and its influence on professional environments
[103]. Research findings provide valuable information on
the nutritional state of GC patients, giving clinicians and
nutritionists a solid foundation for building treatments that
help patients. Nausea and vomiting caused by cancer
treatment may prevent patients from meeting their
nutritional needs. When attempting to improve a patient's
general health, especially those with
GC, it is critical to ensure that the patient consumes food
before, during, and after treatment. In light of these facts,
researchers conducted a study to determine whether there
was a link between dietary circumstances and the quality
of life of GC patients. The nutritional status of GC
survivors affects their quality of life and correlates
strongly with obesity, overweight, and underweight.
Receiving regular medication has a significant impact on
quality of life, so patients should receive it. When patients
with eating disorders are identified, they also receive
dietary guidance to help develop appropriate treatment
plans [107]. Finally, many challenges arise during the
diagnosis of GC, and they may be difficult to resolve.
Although it is still too early to establish general guidelines
for dealing with them, we believe it is feasible to
concentrate on the following topics, which we encourage
future researchers to consider; this remains an open area
for exploration.
• Identifying the critical risk factors, such as nutrients,
that contribute to the development of GC.
• Determining interventions for the factors that contribute
to GC.
• Using different ML techniques to detect the presence of
GC and obtain an accurate prognosis.
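For the first topic, a common starting point for quantifying a candidate risk factor (for example, a dietary exposure) is the odds ratio from a case-control 2x2 table, with an approximate 95% confidence interval from Woolf's log method. The minimal sketch below assumes all four cells are non-zero; the function name is hypothetical, and the counts in the example are invented for illustration and are not taken from any study cited here.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    exposed_cases: int,
    exposed_controls: int,
    unexposed_cases: int,
    unexposed_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval; all cells must be > 0."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for this simple estimator")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Hypothetical counts: 30 of 100 cases and 20 of 100 controls were exposed.
or_, low, high = odds_ratio_ci(30, 20, 70, 80)
print(f"OR = {or_:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

In this hypothetical example the odds ratio is about 1.71, but the interval includes 1, so the association would not be considered statistically significant at the 5% level.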
7 Conclusion 
The role of BD in the healthcare industry is to enable more
detailed health profiles of patients and predictive models
for individuals that improve patient care and prognosis in
GC. In this paper, we cover BD with an emphasis on its use
in the healthcare sector. The primary challenge with BD in
healthcare is making the data simple for medical
practitioners to interpret, and ML offers a solution to this
challenge because it can detect significant patterns in
complex data. The main focus of big data analytics in
healthcare is integrating and analyzing massive volumes
of complex and heterogeneous data from various sources,
including biomedical data and electronic health
information. ML assists physicians in diagnosing
effectively and predicting prognostic outcomes through
multiple techniques that benefit the patient's health.
Furthermore, ML promotes the integration of models by
using various algorithms to support and predict factors
with high precision.
Rapid implementation in the medical industry and the use
of BD remain challenging tasks for researchers in this
domain. This article discusses the ML, DL, and BD
paradigms, their different applications, and their
associated limitations, challenges, advantages, and
drawbacks in GC. In addition, it incorporates the KDD
process, which shows how medical practitioners can
extract knowledge from datasets and integrate a decision
support system (DSS) with medical data sets to produce
outcomes using DM algorithms. This paper also gives
information about the DM classification algorithms that
have been applied to various cancers.
References 
[1] M. Mahmood, B. Al-Khateeb, and W. M. Alwash, 
“A review on neural networks approach on 
classifying cancers,” Int J Artif Intell, vol. 9, no. 2, 
pp. 317–326, 2020. 
https://doi.org/10.11591/ijai.v9.i2.pp317-326 
[2] A. M. Brushfield, T. T. Luu, B. D. Callahan, and P. 
E. Gilbert, “A comparison of discrimination and 
reversal learning for olfactory and visual stimuli in 
aged rats.,” Behav. Neurosci., vol. 122, no. 1, p. 54, 
2008. https://doi.org/10.1037/0735-7044.122.1.54 
[3] R. A. Smith et al., “Cancer screening in the United 
States, 2019: A review of current American Cancer 
Society guidelines and current issues in cancer 
screening,” CA. Cancer J. Clin., vol. 69, no. 3, pp. 
184–210, 2019. https://doi.org/10.3322/caac.21557   
[4] A. Shetty and V. Shah, “Survey of Cervical Cancer 
Prediction Using Machine Learning: A Comparative 
Approach,” in 2018 9th International Conference on 
Computing, Communication and Networking 
Technologies (ICCCNT), 2018, pp. 1–6. 
https://doi.org/10.1109/icccnt.2018.8494169 
[5] A. M. Abdel-Zaher and A. M. Eldeib, “Breast cancer 
classification using deep belief networks,” Expert 
Syst. Appl., vol. 46, pp. 139–144, 2016. 
https://doi.org/10.1016/j.eswa.2015.10.015 
[6] R. Assari, P. Azimi, and M. R. Taghva, “Heart 
Disease Diagnosis Using Data Mining Techniques,” 
Int. J. Econ. Manag. Sci., vol. 6, no. 3, 2017. 
https://doi.org/10.4172/2162-6359.1000415 
[7] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, 
“From data mining to knowledge discovery in 
databases,” AI Mag., vol. 17, no. 3, p. 37, 1996. 
https://doi.org/10.1145/240455.240464 
[8] P. Bertuccio et al., “Citrus fruit intake and gastric 
cancer: The stomach cancer pooling (StoP) project 
consortium,” Int. J. cancer, vol. 144, no. 12, pp. 
2936–2944, 2019.  
https://doi.org/10.1002/ijc.32046 
[9] S. Secinaro, D. Calandra, A. Secinaro, V. 
Muthurangu, and P. Biancone, “The role of artificial 
intelligence in healthcare: a structured literature 
review,” BMC Med. Inform. Decis. Mak., vol. 21, 
no. 1, pp. 1–23, 2021. 
https://doi.org/10.1186/s12911-021-01488-9 
[10] C. J. Kelly, A. Karthikesalingam, M. Suleyman, G. 
Corrado, and D. King, “Key challenges for 
delivering clinical impact with artificial 
intelligence,” BMC Med., vol. 17, no. 1, pp. 1–9, 
2019.  
https://doi.org/10.1186/s12916-019-1426-2 
[11] S. M. Jameel, M. A. Hashmani, M. Rehman, and A. 
Budiman, “An adaptive deep learning framework for 
dynamic image classification in the internet of things 
environment,” Sensors, vol. 20, no. 20, p. 5811, 
2020.        
[12] S. van Baalen, M. Boon, and P. Verhoef, “From 
clinical decision support to clinical reasoning 
support systems,” Authorea Prepr., 2020.
https://doi.org/10.22541/au.159986468.80473725. 
[13] S. Akundi, R. Soujanya, and P. M. Madhuri, “Big 
Data Analytics in Healthcare Using Machine 
Learning Algorithms: A Comparative Study,” 2020. 
https://doi.org/10.3991/ijoe.v16i13.18609. 
[14] M. A. Hashmani, S. M. Jameel, H. Al-Hussain, M. 
Rehman, and A. Budiman, “Accuracy performance 
degradation in image classification models due to 
concept drift,” Int. J. Adv. Comput. Sci. Appl, vol. 
10, 2019. 
https://doi.org/10.14569/ijacsa.2019.0100552. 
[15] P. Acharya and M. Mathur, “Artificial intelligence in 
dermatology: the ‘unsupervised’ learning,” Br. J.
Dermatol., vol. 182, no. 6, pp. 1507–1508, 2020.  
https://doi.org/10.1111/bjd.18955 
[16] T. Silwattananusarn and K. Tuamsuk, “Data mining 
and its applications for knowledge management: a 
literature review from 2007 to 2012,” arXiv Prepr. 
arXiv1210.2872, 2012.  
https://doi.org/10.5121/ijdkp.2012.2502  
[17] S. S. Zia, P. Akhtar, and T. J. A. Mughal,
“Case Retrieval Process of CBR Technique
Implements on Knowledge-Based Clinical Decision
Support Systems (KBCDSS) for Diagnosis of Breast
Cancer Disease,” Sindh Univ. Res. Journal-SURJ
(Science Ser.), vol. 47, no. 2, 2015.
https://doi.org/10.26692/sujo/2019.01.22  
[18] A. Karahoca, Advances in data mining knowledge 
discovery and applications. BoD–Books on
Demand, 2012. https://doi.org/10.5772/3349  
[19] P. E. Beeler, D. W. Bates, and B. L. Hug, “Clinical 
decision support systems,” Swiss Med. Wkly., vol. 
144, p. w14073, 2014.  
https://doi.org/10.4414/smw.2014.14073  
[20] C. Schuh, J. S. de Bruin, and W. Seeling, “Clinical 
decision support systems at the Vienna General 
Hospital using Arden Syntax: Design, 
implementation, and integration,” Artif. Intell. Med., 
vol. 92, pp. 24–33, 2018.  
https://doi.org/10.1016/j.artmed.2015.11.002  
[21] J. Ferlay et al., “Cancer statistics for the year 2020: 
An overview,” Int. J. Cancer, 2021.
https://doi.org/10.1002/ijc.33588  
[22] S. S. Zia, P. Akhtar, and T. J. A. Mughal,
“Schematic Cycle of Case-Based Reasoning
Technique Implements in Clinical Decision Support
Systems Used for Diagnosis of Liver Disease,”
Sindh Univ. Res. Journal-SURJ (Science Ser.), vol.
47, no. 2, 2015.
[23] T. Lysaght, H. Y. Lim, V. Xafis, and K. Y. Ngiam, 
“AI-assisted decision-making in healthcare,” Asian 
Bioeth. Rev., vol. 11, no. 3, pp. 299–314, 2019. 
https://doi.org/10.1007/s41649-019-00096-0  
[24] H. C. Koh, G. Tan, and others, “Data mining 
applications in healthcare,” J. Healthc. Inf. Manag., 
vol. 19, no. 2, p. 65, 2011.  
https://doi.org/10.1109/icdmw.2011.202  
[25] I. Kavakiotis, O. Tsave, A. Salifoglou, N. 
Maglaveras, I. Vlahavas, and I. Chouvarda, 
“Machine learning and data mining methods in 
diabetes research,” Comput. Struct. Biotechnol. J., 
vol. 15, pp. 104–116, 2017.  
https://doi.org/10.1016/j.csbj.2016.12.005  
[26] C. Neto, M. Brito, V. Lopes, H. Peixoto, A. Abelha, 
and J. Machado, “Application of data mining for the 
prediction of mortality and occurrence of 
complications for gastric cancer patients,” Entropy, 
vol. 21, no. 12, p. 1163, 2019.
https://doi.org/10.3390/e21121163
[27] S. M. Jameel, M. A. Hashmani, H. Alhussain, M. 
Rehman, and A. Budiman, “An optimized deep 
convolutional neural network architecture for 
concept drifted image classification,” in Proceedings 
of SAI Intelligent Systems Conference, 2019, pp. 
932–942. https://doi.org/10.1007/978-3-030-29516-5_70
[28] F. M. Couto, Data and text processing for health and 
life sciences. Springer Nature, 2019.  
https://doi.org/10.1007/978-3-030-13845-5  
[29] T. Panch, P. Szolovits, and R. Atun, “Artificial 
intelligence, machine learning and health systems,” 
J. Glob. Health, vol. 8, no. 2, 2018.  
https://doi.org/10.7189/jogh.08.020303  
[30] S. R. Kumar, N. Gayathri, S. Muthuramalingam, B. 
Balamurugan, C. Ramesh, and M. K. Nallakaruppan, 
“Medical big data mining and processing in e-
healthcare,” in Internet of Things in Biomedical 
Engineering, Elsevier, 2019, pp. 323–339. 
https://doi.org/10.1016/b978-0-12-817356-5.00016-4
[31] M. Mittal, L. M. Goyal, D. J. Hemanth, and J. K. 
Sethi, “Clustering approaches for high-dimensional 
databases: A review,” Wiley Interdiscip. Rev. Data 
Min. Knowl. Discov., vol. 9, no. 3, p. e1300, 2019.  
https://doi.org/10.1002/widm.1300  
[32] I. H. Witten and E. Frank, “Data mining: Practical
machine learning tools and techniques.” Morgan 
Kaufmann Publishers, 2005.  
https://doi.org/10.1145/2020976.2021004  
[33] L. Wang and C. A. Alexander, “Big data analytics in 
medical engineering and healthcare: methods, 
advances and challenges,” J. Med. Eng. & Technol.,
vol. 44, no. 6, pp. 267–283, 2020.  
https://doi.org/10.1080/03091902.2020.1769758  
[34] I. H. Witten, E. Frank, and M. A. Hall, Data Mining 
Practical Machine Learning Tools and Techniques 
Third Edition. Morgan Kaufmann, 2017.  
https://doi.org/10.1016/b978-0-12-374856-0.00015-8
[35] K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V 
Karamouzis, and D. I. Fotiadis, “Machine learning 
applications in cancer prognosis and prediction,” 
Comput. Struct. Biotechnol. J., vol. 13, pp. 8–17, 
2015. https://doi.org/10.1016/j.csbj.2014.11.005  
[36] N. Nissim et al., “Improving condition severity 
classification with an efficient active learning based 
framework,” J. Biomed. Inform., vol. 61, pp. 44–54, 
2016. https://doi.org/10.1016/j.jbi.2016.03.016  
[37] N. Iqbal and M. Islam, “Machine learning for dengue 
outbreak prediction: A performance evaluation of 
different prominent classifiers,” Informatica, vol. 43, 
no. 3, 2019. https://doi.org/10.31449/inf.v43i3.1548  
[38] N. Nissim, Y. Shahar, Y. Elovici, G. Hripcsak, and 
R. Moskovitch, “Inter-labeler and intra-labeler 
variability of condition severity classification 
models using active and passive learning methods,” 
Artif. Intell. Med., vol. 81, pp. 12–32, 2017.  
https://doi.org/10.1016/j.artmed.2017.03.003  
[39] S. K. Zhou et al., “A review of deep learning in 
medical imaging: Image traits, technology trends, 
case studies with progress highlights, and future 
promises,” arXiv Prepr. arXiv2008.09104, 2020. 
https://doi.org/10.1109/jproc.2021.3054390  
[40] T. Lu, Y. Du, L. Ouyang, Q. Chen, and X. Wang, 
“Android malware detection based on a hybrid deep 
learning model,” Secur. Commun. Networks, vol. 
2020, 2020. https://doi.org/10.1155/2020/8863617  
[41] T. J. Saleem and M. A. Chishti, “Exploring the 
applications of Machine Learning in Healthcare,” 
Int. J. Sensors Wirel. Commun. Control, vol. 10, no. 
4, pp. 458–472, 2020.  
https://doi.org/10.2174/2210327910666191220103417
[42] S. Mittal and Y. Hasija, “Applications of deep 
learning in healthcare and biomedicine,” in Deep 
Learning Techniques for Biomedical and Health 
Informatics, Springer, 2020, pp. 57–77.  
https://doi.org/10.1007/978-3-030-33966-1_4  
[43] Y.-W. Chen and L. C. Jain, “Deep Learning in 
Healthcare.” Springer, 2020.  
[44] S. Pitoglou, “Machine Learning in Healthcare: 
Introduction and Real-World Application
 Considerations,” in Quality Assurance in the Era of 
Individualized Medicine, IGI Global, 2020, pp. 92–
109.  
https://doi.org/10.4018/978-1-7998-2390-2.ch004  
[45] A. Mustafa and M. Rahimi Azghadi, “Automated 
Machine Learning for Healthcare and Clinical Notes 
Analysis,” Computers, vol. 10, no. 2, p. 24, 2021.  
https://doi.org/10.3390/computers10020024  
[46] H. Wang and B. Raj, “A survey: Time travel in deep 
learning space: An introduction to deep learning 
models and how deep learning models evolved from 
the initial ideas,” arXiv Prepr. arXiv:1510.04781, 
2015. 
https://doi.org/10.1007/978-1-4842-2766-4_14  
[47] A. Crippa et al., “Use of machine learning to identify 
children with autism and their motor abnormalities,” 
J. Autism Dev. Disord., vol. 45, no. 7, pp. 2146–
2156, 2015. https://doi.org/10.1007/s10803-015-2379-8 
[48] A. D. Gavrilov, A. Jordache, M. Vasdani, and J. 
Deng, “Preventing model overfitting and 
underfitting in convolutional neural networks,” Int. 
J. Softw. Sci. Comput. Intell., vol. 10, no. 4, pp. 19–
28, 2018.  https://doi.org/10.4018/ijssci.2018100102  
[49] P. Samui, Handbook of research on advanced 
computational techniques for simulation-based 
engineering. IGI Global, 2015.  
https://doi.org/10.4018/978-1-4666-9479-8  
[50] X.-Y. Wang and J. M. Garibaldi, “Simulated 
annealing fuzzy clustering in cancer diagnosis,” 
Informatica, vol. 29, no. 1, 2005. 
[51] S. Sunarti, F. F. Rahman, M. Naufal, M. Risky, K. 
Febriyanto, and R. Masnina, “Artificial intelligence 
in healthcare: opportunities and risk for future,” Gac. 
Sanit., vol. 35, pp. S67–S70, 2021. 
https://doi.org/10.1016/j.gaceta.2020.12.019  
[52] M. Masmoudi, B. Jarboui, and P. Siarry, “Artificial 
Intelligence and Data Mining in Healthcare.” 
Springer, 2020. https://doi.org/10.1007/978-3-030-45240-7 
[53] S. Shamshirband, M. Fathi, A. Dehzangi, A. T. 
Chronopoulos, and H. Alinejad-Rokny, “A Review 
on Deep Learning Approaches in Healthcare 
Systems: Taxonomies, Challenges, and Open 
Issues,” J. Biomed. Inform., p. 103627, 2020. 
https://doi.org/10.1016/j.jbi.2020.103627  
[54] M. Casamayor, R. Morlock, H. Maeda, and J. Ajani, 
“Targeted literature review of the global burden of 
gastric cancer,” Ecancermedicalscience, vol. 12, 
2018.  https://doi.org/10.3332/ecancer.2018.883  
[55] M. Akcay, D. Etiz, and O. Celik, “Prediction of 
Survival and Recurrence Patterns by Machine 
Learning in Gastric Cancer Cases Undergoing 
Radiation Therapy and Chemotherapy,” Adv. 
Radiat. Oncol., vol. 5, no. 6, pp. 1179–1187, 2020. 
https://doi.org/10.1016/j.adro.2020.07.007  
[56] T. Saba, “Recent advancement in cancer detection 
using machine learning: Systematic survey of 
decades, comparisons and challenges,” J. Infect. 
Public Health, vol. 13, no. 9, pp. 1274–1289, 2020. 
https://doi.org/10.1016/j.jiph.2020.06.033  
[57] S.-L. Zhu, J. Dong, C. Zhang, Y.-B. Huang, and W. 
Pan, “Application of machine learning in the 
diagnosis of gastric cancer based on noninvasive 
characteristics,” PLoS One, vol. 15, no. 12, p. 
e0244869, 2020.  
https://doi.org/10.1371/journal.pone.0244869  
[58] A. N. Richter and T. M. Khoshgoftaar, “A review of 
statistical and machine learning methods for 
modeling cancer risk using structured clinical data,” 
Artif. Intell. Med., vol. 90, pp. 1–14, 2018.  
https://doi.org/10.1016/j.artmed.2018.06.002  
[59] A. Yasar, I. Saritas, and H. Korkmaz, “Computer-
aided diagnosis system for detection of stomach 
cancer with image processing techniques,” J. Med. 
Syst., vol. 43, no. 4, pp. 1–11, 2019.  
https://doi.org/10.1007/s10916-019-1203-y  
[60] J. Y. Park and R. Herrero, “Recent progress in gastric 
cancer prevention,” Best Pract. & Res. Clin. 
Gastroenterol., p. 101733, 2021.  
https://doi.org/10.1016/j.bpg.2021.101733  
[61] A. Onasanya and M. Elshakankiri, “Smart integrated 
IoT healthcare system for cancer care,” Wirel. 
Networks, pp. 1–16, 2019.  
https://doi.org/10.1007/s11276-018-01932-1  
[62] M. D. Islam, W. A. Kaplan, D. Trachtenberg, R. 
Thrasher, K. P. Gallagher, and V. J. Wirtz, “Impacts 
of intellectual property provisions in trade treaties on 
access to medicine in low and middle income 
countries: a systematic review,” Global. Health, vol. 
15, no. 1, p. 88, 2019.  
https://doi.org/10.1186/s12992-019-0528-0  
[63] K. J. Cios, B. Krawczyk, J. Cios, and K. J. Staley, 
“Uniqueness of Medical Data Mining: How the new 
technologies and data they generate are transforming 
medicine,” arXiv Prepr. arXiv:1905.09203, 2019. 
https://doi.org/10.1016/s0933-3657(02)00049-0  
[64] M. Kumari and V. Singh, “Breast cancer prediction 
system,” Procedia Comput. Sci., vol. 132, pp. 371–
376, 2018.  
https://doi.org/10.1016/j.procs.2018.05.197  
[65] G. Purusothaman and P. Krishnakumari, “A survey 
of data mining techniques on risk prediction: Heart 
disease,” Indian J. Sci. Technol., vol. 8, no. 12, p. 1, 
2015. 
https://doi.org/10.17485/ijst/2015/v8i12/58385  
[66] L. Goshayeshi et al., “Predictive model for survival 
in patients with gastric cancer,” Electron. Physician, 
vol. 9, no. 12, p. 6035, 2017.  
https://doi.org/10.19082/6035  
[67] N. C. Caballé, J. L. Castillo-Sequera, J. A. Gómez-
Pulido, and M. L. Polo-Luque, “Machine learning 
applied to diagnosis of human diseases: A systematic 
review,” Appl. Sci., vol. 10, no. 15, p. 5135, 2020. https://doi.org/10.3390/app10155135 
[68] World Health Organization, “Cancer. 2018,” 
Available: http://www.who.int/mediacentre/factsheets/fs297/en, 2017. 
https://doi.org/10.23846/ow3.ie71  
[69] A. Mortezagholi, O. Khosravizadeh, M. B. 
Menhaj, Y. Shafigh, and R. Kalhor, “Make 
intelligent of gastric cancer diagnosis error in 
Qazvin’s medical centers: Using data mining 
method,” Asian Pacific J. Cancer Prev., vol. 20, no. 
9, pp. 2607–2610, 2019. 
https://doi.org/10.31557/apjcp.2019.20.9.2607  
[70] A. Kalantari, A. Kamsin, S. Shamshirband, A. Gani, 
H. Alinejad-Rokny, and A. T. Chronopoulos, 
“Computational intelligence approaches for 
classification of medical data: State-of-the-art, future 
challenges and research directions,”  
Neurocomputing, vol. 276, pp. 2–22, 2018. 
https://doi.org/10.1016/j.neucom.2017.01.126  
[71] A. Shrivastava and S. S. Tomar, “A hybrid 
framework for heart disease prediction: review and 
analysis,” Int. J. Adv. Technol. Eng. Explor., vol. 3, 
no. 15, p. 21, 2016. 
https://doi.org/10.19101/ijatee.2016.315003  
[72] W. Wu and H. Zhou, “Data-driven diagnosis of 
cervical cancer with support vector machine-based 
approaches,” IEEE Access, vol. 5, pp. 25189–25195, 
2017.  https://doi.org/10.1109/access.2017.2763984  
[73] J. S. Sartakhti, M. H. Zangooei, and K. Mozafari, 
“Hepatitis disease diagnosis using a novel hybrid 
method based on support vector machine and 
simulated annealing (SVM-SA),” Comput. Methods 
Programs Biomed., vol. 108, no. 2, pp. 570–579, 
2012. https://doi.org/10.1016/j.cmpb.2011.08.003  
[74] A. I. Pritom, M. A. R. Munshi, S. A. Sabab, and S. 
Shihab, “Predicting breast cancer recurrence using 
effective classification and feature selection 
technique,” in 2016 19th International Conference 
on Computer and Information Technology (ICCIT), 
2016, pp. 310–314.  
https://doi.org/10.1109/iccitechn.2016.7860215  
[75] M. A. de Brito, C. Neto, A. Abelha, and J. Machado, 
“Prediction of mortality and occurrence of 
complications for gastric cancer patients,” in 2019 
International Conference in Engineering 
Applications (ICEA), 2019, pp. 1–6.  
https://doi.org/10.1109/ceap.2019.8883494  
[76] P. Jin et al., “Artificial intelligence in gastric cancer: 
a systematic review,” J. Cancer Res. Clin. Oncol., pp. 
1–12, 2020.  
https://doi.org/10.1007/s00432-020-03304-9  
[77] H. Nakashima, H. Kawahira, H. Kawachi, and N. 
Sakaki, “Artificial intelligence diagnosis of 
Helicobacter pylori infection using blue laser 
imaging-bright and linked color imaging: a single-
center prospective study,” Ann. Gastroenterol., vol. 
31, no. 4, p. 462, 2018.  
https://doi.org/10.20524/aog.2018.0269  
[78] M. Venerito, A. C. Ford, T. Rokkas, and P. 
Malfertheiner, “Prevention and management of 
gastric cancer,” Helicobacter, vol. 25, p. e12740, 
2020.  https://doi.org/10.1111/hel.12740  
[79] B. Kramer et al., “Long-term quality of life and 
nutritional status of patients with head and neck 
cancer,” Nutr. Cancer, vol. 71, no. 3, pp. 424–437, 
2019. 
https://doi.org/10.1080/01635581.2018.1506492  
[80] K. Togashi, “Applications of artificial intelligence to 
endoscopy practice: The view from Japan Digestive 
Disease Week 2018.” Wiley Online Library, 2019.  
https://doi.org/10.1111/den.13354  
[81] S. Yalcin et al., “Nutritional aspect of cancer care in 
medical oncology patients,” Clin. Ther., vol. 41, no. 
11, pp. 2382–2396, 2019.  
https://doi.org/10.1016/j.clinthera.2019.09.006 
[82] S. Vollmer et al., “Machine learning and artificial 
intelligence research for patient benefit: 20 critical 
questions on transparency, replicability, ethics, and 
effectiveness,” BMJ, vol. 368, 2020. 
https://doi.org/10.1136/bmj.m1312  
[83] S. Huang, J. Yang, S. Fong, and Q. Zhao, “Artificial 
intelligence in cancer diagnosis and prognosis: 
Opportunities and challenges,” Cancer Lett., vol. 
471, pp. 61–71, 2020.  
https://doi.org/10.1016/j.canlet.2019.12.007  
[84] A. S. Ahuja, “The impact of artificial intelligence in 
medicine on the future role of the physician,” PeerJ, 
vol. 7, p. e7702, 2019.  
https://doi.org/10.7717/peerj.7702  
[85] T. M. Noguerol, F. Paulano-Godino, M. T. 
Martín-Valdivia, C. O. Menias, and A. Luna, “Strengths, 
weaknesses, opportunities, and threats analysis of 
artificial intelligence and machine learning 
applications in radiology,” J. Am. Coll. Radiol., vol. 
16, no. 9, pp. 1239–1247, 2019.  
https://doi.org/10.1016/j.jacr.2019.05.047  
[86] S. Hamid, “The opportunities and risks of artificial 
intelligence in medicine and healthcare,” 2016. 
https://doi.org/10.1201/b19187-4  
[87] S. Turgut, M. Dağtekin, and T. Ensari, 
“Microarray breast cancer data classification using 
machine learning methods,” in 2018 Electric 
Electronics, Computer Science, Biomedical 
Engineerings’ Meeting (EBBT), 2018, pp. 1–3.  
https://doi.org/10.1109/ebbt.2018.8391468  
[88] R. Kannan and V. Vasanthi, “Machine learning 
algorithms with ROC curve for predicting and 
diagnosing the heart disease,” in Soft Computing and 
Medical Bioinformatics, Springer, 2019, pp. 63–72.  
https://doi.org/10.1007/978-981-13-0059-2_8 
[89] R. Chauhan, R. Jangade, and R. Rekapally, 
“Classification model for prediction of heart 
disease,” in Soft Computing: Theories and 
Applications, Springer, 2018, pp. 707–714.  
https://doi.org/10.1007/978-981-10-5699-4_67  
[90] K. Uyar and A. İlhan, “Diagnosis of heart disease 
using genetic algorithm based trained recurrent fuzzy 
neural networks,” Procedia Comput. Sci., vol. 120, 
pp. 588–593, 2017.  
https://doi.org/10.1016/j.procs.2017.11.283  
[91] H. David and S. A. Belcy, “Heart disease 
prediction using data mining techniques,” 
ICTACT J. Soft Comput., vol. 9, 
no. 1, 2018. https://doi.org/10.21917/ijsc.2017.0202  
[92] H. Almarabeh and E. Amer, “A study of data mining 
techniques accuracy for healthcare,” Int. J. Comput. 
Appl., vol. 168, no. 3, pp. 12–17, 2017.  
https://doi.org/10.5120/ijca2017914338  
[93] N. Akhtar, M. R. Talib, and N. Kanwal, “Data 
Mining Techniques to Construct a Model: Cardiac 
Diseases,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 
1, 2018. 
https://doi.org/10.14569/ijacsa.2018.090173  
[94] A. Dey, J. Singh, and N. Singh, “Analysis of 
supervised machine learning algorithms for heart 
disease prediction with reduced number of attributes 
using principal component analysis,” Int. J. Comput. 
Appl., vol. 140, no. 2, pp. 27–31, 2016.  
https://doi.org/10.5120/ijca2016909231  
L. Parthiban and R. Subramanian, “Intelligent heart 
disease prediction system using CANFIS and genetic 
algorithm,” Int. J. Biol. Biomed. Med. Sci., vol. 3, 
no. 3, 2008. 
https://doi.org/10.1109/iama.2009.5228016  
[95] I. A. Zriqat, A. M. Altamimi, and M. Azzeh, “A 
comparative study for predicting heart diseases using 
data mining classification methods,” arXiv Prepr. 
arXiv:1704.02799, 2017.  
https://doi.org/10.21884/ijmter.2017.4211.vxayk  
[96] M. Nishio et al., “Computer-aided diagnosis of lung 
nodule using gradient tree boosting and Bayesian 
optimization,” PLoS One, vol. 13, no. 4, p. 
e0195875, 2018. 
https://doi.org/10.1371/journal.pone.0195875  
[97] Y. Zhao, Y. Liu, and W. Huang, “Prediction model 
of HBV reactivation in primary liver cancer: Based 
on NCA feature selection and SVM classifier with 
Bayesian and grid optimization,” in 2018 IEEE 3rd 
International Conference on Cloud Computing and 
Big Data Analysis (ICCCBDA), 2018, pp. 547–551. 
https://doi.org/10.1109/icccbda.2018.8386576  
[98] S. A. Mahmoodi, K. Mirzaie, and S. M. Mahmoudi, 
“A new algorithm to extract hidden rules of gastric 
cancer data based on ontology,” Springerplus, vol. 5, 
no. 1, p. 312, 2016. https://doi.org/10.1186/s40064-016-1943-9 
[99] M.-M. Liu, L. Wen, Y.-J. Liu, Q. Cai, L.-T. Li, and 
Y.-M. Cai, “Application of data mining methods to 
improve screening for the risk of early gastric 
cancer,” BMC Med. Inform. Decis. Mak., vol. 18, 
no. 5, p. 121, 2018. https://doi.org/10.1186/s12911-018-0689-4 
[100] Y. Amirgaliyev, S. Shamiluulu, T. Merembayev, 
and D. Yedilkhan, “Using Machine Learning 
Algorithm for Diagnosis of Stomach Disorders,” in 
International Conference on Mathematical 
Optimization Theory and Operations Research, 
2019, pp. 343–355. https://doi.org/10.1007/978-3-030-33394-2_27 
[101] A. Rajkomar et al., “Scalable and accurate deep 
learning with electronic health records,” NPJ Digit. 
Med., vol. 1, no. 1, pp. 1–10, 2018.  
https://doi.org/10.3410/f.733181042.793560090  
[102] P. Yadav, M. Steinbach, V. Kumar, and G. Simon, 
“Mining electronic health records (EHRs): A 
survey,” ACM Comput. Surv., vol. 50, no. 6, pp. 1–
40, 2018.  https://doi.org/10.1145/3127881  
[103] S. Shirazi, H. Baziyad, and H. Karimi, “An 
Application-Based Review of Recent Advances of 
Data Mining in Healthcare,” J. Biostat. Epidemiol., 
2019. https://doi.org/10.18502/jbe.v5i4.3864  
[104] V. V Petrov, O. P. Mintser, A. A. Kryuchyn, and 
Y. A. Kryuchyna, “Big Data in medicine: promise 
and challenges,” 2019. https://doi.org/10.11603/mie.1996-1960.2019.3.10429 
[105] D. Cirillo and A. Valencia, “Big data analytics for 
personalized medicine,” Curr. Opin. Biotechnol., 
vol. 58, pp. 161–167, 2019.  
https://doi.org/10.1016/j.copbio.2019.03.004 
[106] G. Torbahn, T. Strauss, C. C. Sieber, E. 
Kiesswetter, and D. Volkert, “Nutritional status 
according to the mini nutritional assessment 
(MNA)® as potential prognostic factor for health and 
treatment outcomes in patients with cancer – a 
systematic review,” BMC Cancer, vol. 20, no. 1, pp. 
1–18, 2020. https://doi.org/10.1186/s12885-020-07052-4 
[107] K. Farooq, B. S. Khan, M. A. Niazi, S. J. Leslie, and 
A. Hussain, “Clinical decision support systems: A 
visual survey,” arXiv Prepr. arXiv:1708.09734, 2017. 
https://doi.org/10.31449/inf.v42i4.1571 
[108] G. Veselov, A. Tselykh, A. Sharma, and R. Huang, 
“Applications of Artificial Intelligence in Evolution 
of Smart Cities and Societies,” Informatica, vol. 45, 
no. 5, 2021.   
https://doi.org/10.31449/inf.v45i5.3600 
[109] M. Možina, “Arguments in interactive machine 
learning,” Informatica, vol. 42, no. 1, 2018. 
[110] A. A. Abaker and F. A. Saeed, “A Comparative 
Analysis of Machine Learning Algorithms to Build a 
Predictive Model for Detecting Diabetes   
Complications,” Informatica, vol. 45, no. 1, 2021. 
https://doi.org/10.31449/inf.v45i1.3111