https://doi.org/10.31449/inf.v47i9.4964
Informatica 47 (2023) 81–90

Optimizing Sequential Forward Selection on Classification Using Genetic Algorithm

Knitchepon Chotchantarakun
E-mail: knitchepon@go.buu.ac.th
Department of Information Studies, Faculty of Humanities and Social Science, Burapha University, 169 Longhaad Bangsaen Rd, Saensuk, Mueang, Chonburi, 20131, Thailand

Keywords: classification accuracy, data mining, genetic algorithm, optimization, sequential feature selection

Received: June 22, 2023

With the digital transformation of modern technologies, the amount of data increases significantly, driving novel knowledge discovery techniques in Data Analytics and Data Mining. These data usually contain noise or non-informative features that affect the analysis results. Feature-eliminating approaches, known as feature selection, have been studied extensively in the past few decades. Feature selection is a significant preprocessing step of the mining process, which selects only the informative features from the original feature set. These selected features improve the efficiency of the learning model. This study proposes a forward sequential feature selection method called Forward Selection with Genetic Algorithm (FS-GA). FS-GA consists of three major steps. First, it creates the preliminarily selected subsets. Second, it improves the previous subsets. Third, it optimizes the selected subset using the genetic algorithm. Hence, it maximizes the classification accuracy during feature addition. We performed experiments on ten standard UCI datasets using three popular classification models: the Decision Tree, Naive Bayes, and K-Nearest Neighbor classifiers. The results are compared with the state-of-the-art methods. FS-GA shows the best results against the other sequential forward selection methods for all the tested datasets with O(n^2) time complexity.

Povzetek: Genetski algoritem je bil uporabljen za iskanje najboljšega zmanjšanega nabora atributov za namene strojnega učenja. FS-GA dosega boljše rezultate s tremi algoritmi na 10 UCI domenah.

1 Introduction

With the development of the fields of Computer Science and Information Technology, data collection is a significant aspect that needs to be considered because of the hidden information behind these data. The amount of data is growing rapidly along with modern technologies, both in dimension and in volume. Usually, these data form a high-dimensional dataset containing an excessively large number of variables or features. Some of these features may not contain valuable information and are regarded as noise. In the early stage of Data Mining (DM), the data preparation step, it is essential to eliminate these non-informative features from the large feature set. Their removal increases the learning performance and the computational efficiency of the DM process. This dimensionality reduction technique is called feature selection.

The process of feature selection minimizes the redundant features and maximizes the relevant features by identifying a feature subset consisting of only informative features. It offers advantages such as reducing the requirement for computer storage, enhancing data visualization, improving model prediction, and reducing training times [1]. To guarantee the optimal feature subset, an exhaustive feature selection method requires O(2^n) time, where n is the number of features in the input data.
Exhaustive searches such as branch and bound [2, 3] result in exponential growth of computing time. Even though such a method ensures the optimal feature subset, it is not feasible, particularly for a considerably large n. This NP-hard problem requires an enormous execution time. Several search strategies have therefore been studied to obtain suboptimal solutions with a reduced time complexity.

Feature selection algorithms are classified into different approaches. One standard approach categorizes them into three types: the filter method, the wrapper method, and the embedded method [4, 5]. The filter method concerns the relationship between a feature and the class label. It uses measurements such as similarity or distance to rank features from the highest to the lowest score. It considers only the criterion value of individual features and neglects the relationship between the feature and the classifier. Some examples of filter methods are the Chi-square Test, Euclidean Distance, and Information Gain. The advantage of the filter method is its small computing time. The wrapper method determines the goodness of each feature according to the classification accuracy, that is, the selected feature subset depends directly on the classification algorithm. The computing time is much higher than that of the filter method because the data mining algorithm is applied to each candidate subset during the searching process. This is the main disadvantage of the wrapper method compared with the filter method. Some examples of wrapper methods are Sequential Feature Search and Heuristic Search. Regarding the quality of the selected feature subset, the wrapper method provides better performance since it is related to the classifiers used in classification. The embedded or hybrid method combines feature selection with model training to reduce computing time. It returns the learned model and the feature subset concurrently. The embedded method applies a filter and a wrapper to select candidate feature subsets and then evaluates these subsets to find the best subset using classification. This method reduces the number of features in the dataset along with the computing time and leads to better performance.

Another categorization, related to the labeling of the training data before the learning process, distinguishes supervised and unsupervised feature selection algorithms [6]. In the supervised method, the learning data are labeled with classes. On the other hand, unlabeled data are handled by unsupervised algorithms, which arrange similar features into classes using some given criterion. The unsupervised method is usually more complicated than the supervised method.

The objective of this study is to explore an effective way to select a suboptimal feature subset by maximizing the selection performance in terms of classification accuracy for forward sequential feature selection. We discuss related work in Section 2. Section 3 presents the proposed method. Section 4 provides the experimental details of this study. Results and discussion are examined in Section 5. The conclusion is stated in Section 6.

2 Related work

2.1 Feature selection process

The very high dimension of collected data includes both relevant and irrelevant features, which has resulted in a vast number of feature selection techniques proposed in the literature. These dimensionality reduction techniques are significant because they remove the irrelevant features and keep the relevant ones.
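To make the filter approach concrete, the following is a minimal sketch (not part of the original paper) of filter-style ranking using scikit-learn's mutual-information and chi-square scores; the choice of the Wine dataset and of k = 5 are illustrative assumptions.

# Minimal sketch of a filter-style feature ranking, assuming scikit-learn is available.
# The Wine dataset and k=5 are illustrative choices, not taken from the paper.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_wine(return_X_y=True)

# Rank features by a filter criterion (here: mutual information with the class label).
mi_scores = mutual_info_classif(X, y, random_state=0)
ranking = sorted(enumerate(mi_scores), key=lambda t: t[1], reverse=True)
print("Features ranked by mutual information:", [i for i, _ in ranking])

# Alternatively, keep the k best features according to the chi-square test
# (chi2 requires non-negative inputs, which holds for the Wine measurements).
selector = SelectKBest(score_func=chi2, k=5)
X_reduced = selector.fit_transform(X, y)
print("Selected feature indices:", selector.get_support(indices=True))

A wrapper method would instead score each candidate subset with the classifier itself, which is slower but, as noted above, usually more accurate.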
Thus, feature selection is a necessary prior step for reducing the computation time and improving the DM performance. Feature selection consists of three essential steps, namely searching, evaluation, and stopping. It not only improves the classification accuracy by reducing irrelevant features but also reduces the computing time.

Figure 1: Feature selection process.

Figure 1 shows the feature selection process, which first produces candidate feature subsets. Each candidate subset is then evaluated using the chosen strategy. This subset evaluation step can follow either a filter or a wrapper approach. The selection process continues until the stopping criterion is met, which can be a specific number of selected features, a predefined number of iterations, or the absence of a better solution when adding a newly selected feature. Finally, the results are compared using the chosen classifiers in the result validation step.

The search procedures can be categorized into three strategies [7]. The first strategy is the exponential algorithm, in which the number of examined subsets grows exponentially. Some examples of this strategy are the Exhaustive search and the Branch-and-bound algorithm. The second strategy is the sequential algorithm, such as Sequential Forward Floating Selection (SFFS) [8]. This searching method adds or removes features from the current feature subset. The size of the subset can be increased or decreased depending on which direction produces a higher criterion value. The third strategy is the random algorithm, which commonly has a linear time complexity. It is designed to maximize the solutions and avoid local optima. The drawback of this strategy is the difficulty of choosing effective parameters. This heuristic search optimizes the solution by incorporating randomness into the search process. Evolutionary algorithms, as part of the random algorithms, are becoming an increasingly attractive field of study. Some remarkable examples include GA, Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO). GA is developed from the Darwinian principle relating to the survival of living things. PSO is more efficient than GA but requires numerous mathematical calculations and user-defined parameters, which makes it difficult to handle in an experiment. ACO deals with the shortest paths discovered by real ants while searching for food. Even though the PSO and ACO algorithms produce results similar to GA, researchers take more interest in GA due to its simplicity and efficiency of implementation [9].

2.2 Sequential feature selection

Sequential Feature Selection iteratively selects features and adds them to an active subset one at a time. One of the earliest sequential selection methods is Sequential Forward Selection (SFS) [10]. It searches for the best feature from the remaining feature set in the forward direction; hence, the selected subset gradually increases in size. The added feature must maximize the performance validation by raising the learning accuracy of the active subset. The same operation is repeated until the specified subset size is reached. SFS is the basic subset construction step for other, more complicated selection algorithms. The SFFS algorithm combines the forward SFS with the Sequential Backward Selection (SBS). The addition of SBS provides a better search than SFS by introducing a backtracking step in the backward direction. This backtracking step helps to eliminate a feature that degrades learning efficiency.
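As an illustration of the SFS step just described, here is a minimal wrapper-style sketch (not the author's code); the KNN classifier, the 5-fold cross-validation scorer, and the stopping size d are assumptions chosen to mirror the setting used later in the paper.

# Minimal sketch of wrapper-based Sequential Forward Selection (SFS).
# The KNN classifier and 5-fold cross-validation are assumptions for illustration.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

def fitness(X, y, subset):
    """Criterion F: mean 5-fold CV accuracy of the classifier on the chosen columns."""
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, subset], y, cv=5).mean()

def sfs(X, y, d):
    """Grow the active subset one feature at a time until it reaches size d."""
    remaining, selected = list(range(X.shape[1])), []
    while len(selected) < d:
        best_f, best_score = None, -np.inf
        for f in remaining:                      # try every remaining feature
            score = fitness(X, y, selected + [f])
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)                  # keep the feature that maximizes F
        remaining.remove(best_f)
        print(f"k={len(selected)}: {selected} -> {best_score:.4f}")
    return selected

X, y = load_wine(return_X_y=True)
sfs(X, y, d=4)

Because each added feature is never reconsidered, SFS suffers from the nesting effect; the floating methods discussed next address exactly this weakness.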
It is a conditional step in which the improvement occurs during the selection process. SFFS is widely used in several applications and is regarded as a state-of-the-art (SOTA) method. The SFFS algorithm consists of three simple steps: the inclusion step (SFS step), the backtracking step (SBS step), and the stopping criterion checking step.

Improved versions of SFFS have appeared continuously, starting with the Adaptive SFFS (ASFFS) [11]. ASFFS used SFFS as a baseline and improved the searching process by adaptively looking ahead, both forward and backward, for some specified number of features. It provided a higher opportunity to find a better subset, but the additional work in the adaptive step requires a significantly longer computing time. A remarkable improvement of SFFS is the Improved Forward Floating Selection (IFFS) [12]. The idea of IFFS is to remove the weak feature in the selected subset and add a feature from the remaining set. Because the best (k−1)-subset is not necessarily a building block of the best k-subset, IFFS is able to fix the nesting problem. IFFS is not only capable of solving the nesting effect of SFFS but also provides a simpler algorithm than ASFFS. IFFS returns a better solution than SFFS and ASFFS with a slightly longer computing time than SFS. Similar to ASFFS, the Sequential Deep Floating Forward Search (SDFFS) [13] performs a deep search in both directions to check whether a result better than the current subset exists. That study compared SDFFS against SFS, SFFS, IFFS, and Plus-L-Minus-R (PlMr) [14]. The results showed that SDFFS returns the highest accuracy for the majority of the tested datasets. However, SDFFS requires massive computing time for a small chance of finding a better subset.

Table 1: Summary of related methods on Sequential Feature Selection.

Study | Technique | Dataset | Performance Metric
[10] | SFS | Signal classification systems | Classification accuracy
[8] | SFFS | Not indicated | Mahalanobis distance
[11] | ASFFS | Mammogram and Sonar datasets | Bhattacharyya distance
[12] | IFFS | Mammogram, Sonar, Musk, and one-versus-six numeral recognition datasets | Classification accuracy, Mahalanobis distance, Divergence distance
[13] | SDFFS | Colon, DLBCL, SRBCT, Lung, Musk1, Synthetic, Mfeat, and Arrhythmia datasets | Classification accuracy
[15] | MLFI | Wine, Online Shopper, Lymphography, Crowdsourced, Ionosphere, Soybean, Spectf Heart, and Sonar datasets | Classification accuracy
[16] | OFMB | Wine, Thoracic Surgery, Online Shopper, Lymphography, Image Segmentation, Crowdsourced, Breast Cancer, Ionosphere, Soybean, Spambase, Sonar, and Urban Land Cover datasets | Classification accuracy

Recently, two novel algorithms for sequential search were presented. The Multi-levels Forward Inclusion (MLFI) [15] method improves the search by applying an adaptive multi-level forward search without any backtracking step, whereas the One-level Forward Multi-level Backward Selection (OFMB) [16] focuses on an adaptive backtracking search direction. Most of the tested results from both algorithms showed higher accuracy than the previous standard methods. These two techniques are deterministic algorithms. Table 1 summarizes some popular related sequential feature selection methods. The SOTA methods are indicated in bold; the other techniques are extensions of the SOTA methods.
The performance metrics applied to these algorithms are usually either some kind of distance measure or the classification accuracy. Many of them employ open datasets, which are freely available and complete, to test the goodness of their performance.

2.3 Genetic algorithm in feature selection

A deterministic approach to feature selection, such as a floating search, may be trapped in a suboptimal solution. Recently, many researchers have explored new feature selection techniques that are capable of escaping suboptimal solutions by using evolutionary techniques as part of a traditional method to improve performance. These hybrid approaches take advantage of the distinctive properties of at least two methods to optimize the selection efficiency. The various hybrid methods optimize the solutions based on evolutionary computation, which is related to biological evolution. One of the most well-known evolutionary algorithms is GA [17]. The idea is to imitate the natural evolution process for survival, based on the Darwinian principle, in order to solve complicated computational problems. The GA process is motivated by natural evolution, including inheritance, selection, crossover, and mutation. The GA concept has been implemented in various areas of computer science. One of them is feature selection based on natural genetics, which provides powerful search proficiency in large datasets.

GA operates iteratively on a set of population members represented by chromosomes. In computational terms, these chromosomes are binary strings that are randomly generated via an encoding mechanism. In a binary string representation, a chromosome contains bits 1 or 0, which indicate the presence or absence of a particular feature in the active subset. The number of features under consideration determines the length of the chromosome. Each chromosome has a fitness value indicating its quality, which is used to rank the chromosomes in descending order. In the feature selection problem, the criterion function assigned as the fitness function is the classification accuracy. In each iteration, the fitness value is calculated for each candidate solution. Selecting individuals of higher fitness gives more chances of obtaining favorable solutions. The selected individuals become the parents of the next generation. Crossover and mutation operators are applied to the parent chromosomes, producing new populations in the search space. Since 'good' parents produce 'good' offspring, there is a strong possibility that the GA solutions converge toward the global optimum over time as the evolution process runs for several generations.

Tournament Selection is one of the most common selection methods due to its simplicity and efficiency. It randomly selects a set of k individuals and compares their fitness; the fittest among them is assigned to the next generation. In the crossover operation, two parent chromosomes are crossbred to construct two new chromosomes. One-point, two-point, or k-point crossover defines the random cut points used to construct the children's chromosomes. The new chromosomes have a great opportunity to gain the dominant genes from the parents, generating higher fitness. This crossover operation provides a good capability to escape the local optimum problem. The other critical aspect of evolution is mutation, since it also mitigates the risk of the search falling into a local optimum. This operation is applied in the evolutionary step in order to introduce new genes into the children's chromosomes. Mutation randomly changes the value of a gene by inverting a chosen pair of bits to the opposite values. Flipping a pair of bits is essential to preserve the number of features in the active subset.
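To make the binary encoding and the standard GA operators concrete, the following is a minimal, generic sketch (not the author's implementation); the classifier, the population handling, and the tournament size k = 3 are illustrative assumptions.

# Generic GA building blocks for feature selection, as a minimal sketch.
# The KNN classifier and the tournament size are illustrative assumptions.
import random
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def random_chromosome(n_features, k):
    """Binary string with exactly k ones: 1 = feature present, 0 = absent."""
    chrom = [0] * n_features
    for i in random.sample(range(n_features), k):
        chrom[i] = 1
    return chrom

def fitness(chrom, X, y):
    """Fitness = classification accuracy of the encoded subset (5-fold CV)."""
    cols = [i for i, bit in enumerate(chrom) if bit == 1]
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, cols], y, cv=5).mean()

def tournament_select(population, scores, k=3):
    """Pick k individuals at random and return the fittest of them."""
    contestants = random.sample(range(len(population)), k)
    best = max(contestants, key=lambda i: scores[i])
    return population[best]

FS-GA itself, described in Section 3, works with a very small pool (two parents and two children per generation) and keeps only the best two by rank; the sketch above merely illustrates the standard operators named in this section.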
Several researchers have integrated the GA technique with other methods when dealing with feature selection. In [18], a hybrid of GA and PSO with a random forest (RF) selected the best feature subset to predict the risk of heart disease. In [19], a floating search technique combined GA with SFFS to improve performance. It used Mutual Information (MI) as the criterion function to create a candidate subset during the inclusion step. In the GA step, it assigned a population size of 4-100 individuals, a mutation rate of 0.01, and a single-point crossover. The number of generations was 500. The experiments showed that GA improved the overall performance for most of the tested datasets. However, the inclusion of a backtracking step limited the search space. In [20], GA-based feature selection was combined with machine learning (ML) models to construct a fraud detection engine on credit card data. The proposed method was tested using various classifiers, including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), and Artificial Neural Network (ANN). The results showed that the proposed method outperformed previous works. The study in [21] presented a disease detection model using hybrid feature selection based on Honeybee-SMOTE and the C4.5 algorithm. Other related research areas are the applications of feature selection techniques in education mining, text mining, and medical science [22-24].

2.4 Data mining in real-world applications

Data mining is an essential part of data analytics. Its techniques are used in many different fields such as business decision-making, bank insurance tasks, manufacturing, education, medicine, and public health. In the telecommunications business, results from data mining can improve customer services by focusing on finding patterns in customer behavior. In bank insurance, data mining can address risk management or customer exit problems. In manufacturing, data mining is used to predict phenomena that may occur in production and to find solutions. In educational services, data mining algorithms are used to identify factors affecting student achievement and to design policies to support them. In the medical and public health fields, data mining techniques are also used to solve various problems, such as predicting disease from patient symptoms or analyzing the risk factors of serious diseases. It can be seen that data mining techniques are used across a wide scope of applications. Our proposed method is one of many feature selection techniques and is part of the data mining process; thus, it can be applied to any of the real-world applications mentioned above.

3 Proposed method

This study takes a wrapper-based approach that aims to optimize sequential forward feature selection. The possible feature subsets are generated and machine learning models are used to determine the goodness of each subset. The number of selected features in the active subset gradually increases, or remains unchanged, until the specified subset size is reached. In a traditional floating search, the evaluation process runs in both the forward and backward directions, and the newly selected subset is a result of the previously selected subset. There is a possibility that the solution may be trapped in a local optimum.
Moreover, the backtracking step also leads to a longer search time. The proposed method tries to remove these weaknesses of the floating search algorithms by applying a randomized technique to avoid the local optimum and by maximizing the results while focusing only on the forward direction. A maximum k-subset is carefully selected from a large search space without decreasing the current feature size. Subsequently, a newly selected subset with a higher classification accuracy is explored. We attempt to build an optimized sequential forward selection that outperforms the standard methods. The combination of forward feature selection, weak feature replacement, and the GA technique results in a novel forward feature selection method named Forward Selection with Genetic Algorithms (FS-GA). FS-GA removes the backward step, explores further in the forward direction, and escapes local optimal solutions. FS-GA considers a larger search space, which leads to a more exhaustive search. Hence, it increases the opportunity to optimize the active feature subset with respect to the fitness value.

The notation is as follows. Assume there is a feature set Z = {z1, z2, z3, ..., zD}, where D is the number of input features of the dataset. The subset Sk = {sj | j = 1, 2, ..., k; sj ∈ Z} is the active subset, where 0 ≤ k ≤ D, and d is the target subset size. Initialize S0 = {} and k = 0. Let s+ be the selected feature included in the active subset, where s+ = arg max F(Sk ∪ {s}) over s ∈ Z − Sk, and F is the fitness value.

Algorithm 1: Forward Selection with Genetic Algorithms (FS-GA)

Input: A set of input features Z; a fitness function F that determines the quality of a chromosome; the target size d; the population, which is a set of selected chromosomes.
Output: A feature subset max(Sk) with maximum classification accuracy.
Initialize: S0 = {}; k = 0; generation = 1.

(1) Feature Inclusion Step
    # Apply SFS to search for s+ and max(Sk)
    s+ = arg max F(Sk ∪ {s}), where s ∈ Z − Sk
    Sk+1 = Sk + s+; k = k + 1
    update max(Sk) = Sk+1

(2) Feature Improvement Step
    # Improve the selected subset Sk iteratively
    repeat
        for sj in Sk: (j = 1 to k)
            Sk−1 = Sk − sj
            for si in Z − Sk−1: (i = 1 to D − (k−1))
                si = arg max F(Sk−1 ∪ {si})
                if F(Sk−1 + si) > F(Sk):
                    Sk = Sk−1 + si
                    update max(Sk) = Sk
    until F(Sk) is maximal

(3) Evolutionary Step
    repeat
        (3.1) Initialize Population
            # Apply SFS to create a subset of size 2k
            length(individual) = 2k
            for a from k+1 to 2k:
                s+ = arg max F(Sa ∪ {s}), where s ∈ Z − Sa
                Sa+1 = Sa + s+; a = a + 1
                update max(Sa) = Sa+1
            Randomly assign "1" to k genes in the chromosome and "0" to the other genes, giving a binary string in which both bit values have equal size k.
            (This step applies only at the beginning of the process, for the initial two chromosomes that represent the population.)
        (3.2) Selection Operation
            # Apply the Rank Selection technique to select the best two chromosomes from the population according to the fitness value F of each candidate chromosome
            sort(population by F)
            parent1 = select_parents(population, F)
            parent2 = select_parents(population, F)
        (3.3) Crossover Operation
            # Perform the One-Point Crossover technique: randomly choose a point on the chromosome for swapping the genetic material
            point = random_point(individual)
            child1 = parent1[0:point] + parent2[point:]
            child2 = parent2[0:point] + parent1[point:]
            population = {parent1, parent2, child1, child2}
            Control the number of 1-bits so that it equals k
        (3.4) Mutation Operation
            # Randomly swap a pair of bits in each chromosome: a bit 0 is exchanged with a bit 1 or vice versa
            for individual in population:
                index1 = random_index(individual)
                index2 = random_index(individual)
                # index1 and index2 must point to different bit values
                temp = individual[index1]
                individual[index1] = individual[index2]
                individual[index2] = temp
        generation = generation + 1
    until generation > 100

(4) Termination Condition
    # Stop the execution when k = d
    if k < d:
        return the maximum subset max(Sk)
        continue steps (1) to (4)
End

Algorithm 1 is the pseudocode of the FS-GA algorithm, which illustrates the selection process for the candidate feature subset and is explained below.

Step 1: In the beginning, select a feature from the remaining feature set and add it to the active feature subset. Move on to Step 2 with the feature subset Sk and increase k by 1.

Step 2: Continue from the k-subset Sk and remove one feature at a time (excluding the most recently added feature) to obtain k−1 subsets of size k−1 (Sk−1). Then, apply SFS to select a feature from the remaining set (Z − Sk−1) and add the newly selected feature to obtain a size-k subset for each of the k−1 candidates. Evaluate all candidate subsets to find an improvement. If an improved subset is found, replace the current subset with it and repeat Step 2. Otherwise, continue to Step 3 with the maximum subset Sk.

Step 3: Apply GA to the selected subsets. From the currently selected subset, apply SFS to create a subset of size 2k. Then, randomly generate two individuals by assigning bit 1 to k genes and bit 0 to the other genes in the chromosomes. These two individuals, or chromosomes, represent the initial population as the starting point of the evolutionary step. This technique is used only in the first generation. Process the crossover and mutation to obtain four individuals consisting of two parent and two child chromosomes. From the second generation onwards, apply the selection operation by ranking the four individuals according to their fitnesses and selecting only the best two individuals to be the parent chromosomes. If the two selected chromosomes have higher fitness values than the previous chromosome of size k with maximum fitness, then replace the previous chromosome with the newly selected one, which is the best k-subset found so far. At this point, there are two parents with the highest fitness values. Apply the crossover operation to the selected parents to produce two child chromosomes, and adjust the number of 1-bits in the offspring to preserve the subset size. Now, there are four individuals in the population pool. Apply the mutation operation to all individuals: randomly select a first gene and flip it to the opposite bit, then randomly select a second gene with the opposite bit value to the first gene and flip it. Repeat Step 3 for 100 generations, then go to Step 4.

Step 4: Repeat Steps 1 to 4 until k = d.
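As a reading aid for Steps 1 and 2, here is a minimal Python sketch of the inclusion and improvement steps (not the author's implementation); fitness() stands for the classification accuracy of a list of feature indices and is assumed to be defined as in the earlier SFS sketch.

# Sketch of FS-GA Steps 1 and 2, assuming a fitness(subset) function that returns
# the classification accuracy of the given feature indices (e.g. via 5-fold CV).

def inclusion_step(selected, remaining, fitness):
    """Step 1: add the single feature that maximizes the fitness of the active subset."""
    best_f = max(remaining, key=lambda f: fitness(selected + [f]))
    return selected + [best_f], [f for f in remaining if f != best_f]

def improvement_step(selected, remaining, fitness):
    """Step 2: repeatedly replace a weak feature (except the newest one) if that
    raises the fitness; stop when no replacement improves the subset."""
    improved = True
    while improved:
        improved = False
        current_score = fitness(selected)
        for weak in selected[:-1]:                      # keep the most recent feature
            reduced = [f for f in selected if f != weak]
            for cand in remaining:
                candidate = reduced + [cand]
                if fitness(candidate) > current_score:
                    remaining = [f for f in remaining if f != cand] + [weak]
                    selected, current_score = candidate, fitness(candidate)
                    improved = True
                    break
            if improved:
                break
    return selected, remaining

Step 3, the evolutionary step, then operates on a binary encoding of a doubled subset; its operators are sketched after Section 3.1.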
3.1 The application of the genetic algorithm

The application of GA optimizes the search algorithm to discover the most desirable solutions. Based on natural selection, genes are constructed to form chromosomes, and these chromosomes represent a population in the pool. The basic GA operations include selection, crossover, and mutation.

Selection: The selection operator selects potentially useful solutions to recombine, using techniques such as Tournament Selection, Rank Selection, or Random Selection. Tournament selection is applied in our study. All individuals in the population pool are measured using the fitness function and ranked in descending order. The two individuals with the highest fitness are selected for calculation in the next generation.

Crossover: The crossover operation is the mating of two parent chromosomes to produce offspring chromosomes. This operation can be classified as a single-point, two-point, or uniform crossover. The crossover operator splits the chromosome pairs at randomly chosen points and recombines the parts to form a pair of offspring chromosomes. In our study, the one-point crossover is applied since it is the most well-known operation. The selected parent chromosomes are divided into two parts at a randomly selected point, namely the crossover point. Hence, the information from one chromosome to the other is interchangeable. The genetic information from the two parents is exchanged to produce two child chromosomes, as shown in Figure 2.

Figure 2: Crossover operation.

The idea behind the crossover operation is that the child chromosomes have a better chance of carrying the good characteristics of their parents. In other words, subsets of higher fitness are likely to be discovered.

Mutation: The mutation operation introduces a slight variation in the chromosome by flipping genes in the bit string. Practically, the mutation is done by randomly swapping bits of the selected individuals. The mutation operator randomly selects genes and converts them to the opposite bit values in the binary string representation. This process changes the genetic sequence, introducing a new chromosome into the potential search space, and it is a controlled cause of variation in the genetic population.

Figure 3: Mutation operation.

This study applies the interchanging mutation technique, illustrated in Figure 3 and sketched in the code below. Two positions are selected randomly from the bit string, and the bit values of the two genes are switched. To stabilize the subset size, if the first randomly selected bit is 1, the second bit must be 0. Therefore, the size of the solution is preserved in every generation.
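The following is a minimal sketch of the two operators as described above: a one-point crossover with a repair step that keeps exactly k ones, and the pair-swap (interchanging) mutation. It is an illustrative reading of the paper's description, not the author's code; the helper names are hypothetical.

import random

def one_point_crossover(parent1, parent2, k):
    """One-point crossover; if a child ends up with more or fewer than k ones,
    random bits are flipped back so the subset size k is preserved."""
    point = random.randrange(1, len(parent1))          # cut point between two bits
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return repair(child1, k), repair(child2, k)

def repair(chrom, k):
    """Flip randomly chosen bits until the chromosome contains exactly k ones."""
    ones = [i for i, b in enumerate(chrom) if b == 1]
    zeros = [i for i, b in enumerate(chrom) if b == 0]
    while len(ones) > k:                               # too many features: drop one
        i = ones.pop(random.randrange(len(ones)))
        chrom[i] = 0
        zeros.append(i)
    while len(ones) < k:                               # too few features: add one
        i = zeros.pop(random.randrange(len(zeros)))
        chrom[i] = 1
        ones.append(i)
    return chrom

def pair_swap_mutation(chrom):
    """Interchanging mutation: swap one randomly chosen 1-bit with one 0-bit,
    so the number of selected features stays the same."""
    ones = [i for i, b in enumerate(chrom) if b == 1]
    zeros = [i for i, b in enumerate(chrom) if b == 0]
    i, j = random.choice(ones), random.choice(zeros)
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

# Example with the two parents used in the Wine demonstration of Section 3.2:
p1 = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]
p2 = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
c1, c2 = one_point_crossover(p1[:], p2[:], k=5)
print(pair_swap_mutation(c1), pair_swap_mutation(c2))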
3.2 Demonstration using the Wine dataset

The Wine dataset is one of the standard datasets from the UCI repository [25]. We use this dataset to demonstrate the calculation of the FS-GA algorithm because of its simplicity and small size. At the beginning, we have Z = {f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12} with D = 13 features, and the target size is set to d = 13. We set the number of generations to 100.

Feature Inclusion: Initially, assume that the algorithm applies forward selection for the first four features, that is, k = 1, 2, 3, and 4. Then S1 = {f6}, S2 = {f6, f10}, S3 = {f6, f10, f2}, and S4 = {f6, f10, f2, f7}, respectively. At this point, for k = 4, S4 = {f6, f10, f2, f7} is the best 4-subset found so far.

Feature Improvement: In this step, the subset size is k = 4, that is, S4 = {f6, f10, f2, f7} with 89.98% accuracy. Remove one feature from the subset, except for s4 = f7, to form three smaller subsets. The candidate subsets are therefore {f6, f10, f7}, {f6, f2, f7}, and {f10, f2, f7}. Select the best feature from the remaining set (Z − S4) and add it to each of the three candidate subsets above. At this point, there are three subsets of size 4 for comparison. After the F-value calculation, the algorithm finds that the subset {f6, f10, f7} combined with the feature f9 gives the highest fitness of 92.75%, that is, F({f10, f6, f7, f9}) = 92.75%. Therefore, {f6, f10, f2, f7} is replaced with {f10, f6, f7, f9}. This process is repeated with S4 = {f10, f6, f7, f9}. No better solution is found, so the algorithm moves on to the evolutionary step with S4 = {f10, f6, f7, f9}, which the GA technique then optimizes further.

Evolutionary Selection: Assume the algorithm continues until k = 5, that is, S5 = {f10, f6, f7, f9, f5} with 91.63% accuracy. A larger mating pool is created by selecting more features until the length of the feature subset doubles; thus, the subset size must be 2k = 2×5 = 10 features. Applying SFS gives the size-10 subset S10 = {f10, f6, f7, f9, f5, f0, f1, f2, f8, f11}. The feature subset is then transformed into a binary string where 0 and 1 indicate the absence and presence of the i-th feature in the solution. This step applies a random mechanism to create a wider solution region in the search space by generating 2 parents in a mating pool. Individual representation using a random binary string occurs only in the first iteration. From the second iteration onward, there are 4 individuals, namely 2 parents and 2 children. These 4 individuals represent the 4 chromosomes in the population set for the selection operation.

Figure 4: Two-parent selection.

1) Selection Operation: For the selection operation, there are 4 individuals in the mating pool in every iteration. The best 2 individuals are chosen according to the fitness values calculated by the selected classifier. Assume that the best 2 individuals are [0, 0, 0, 1, 1, 1, 0, 1, 1, 0] and [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]. Figure 4 shows the representation of the parent sets {f9, f5, f0, f2, f8} and {f10, f9, f5, f0, f2} for the crossover operation.

2) Crossover Operation: In this step, the pair of parent chromosomes {f9, f5, f0, f2, f8} = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0] and {f10, f9, f5, f0, f2} = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0] performs the crossover operation to produce a pair of child chromosomes. First, a one-point crossover is computed by randomly choosing a crossover point in the parents. This point is located between two consecutive bits and partitions each chromosome into two sections. Assume the crossover point is 4. Swapping the opposite parts of the two chromosomes then forms the two new chromosomes shown in Figure 5. The child chromosomes may have more or fewer 1-bits than their parents; in this case, random bits are automatically flipped to maintain the size of their parents, which is a subset size of 5.

Figure 5: One-point crossover.

3) Mutation Operation: The mutation operation is applied to the child chromosomes from the previous step. It randomly flips bits in the new chromosomes by turning a 1 into a 0 and vice versa.

Figure 6: Classification accuracy of the Wine dataset.

The evolutionary step continues for several iterations, whose number is the number of generations. The chromosome with the best fitness among the reproduced subsets is the one that survives for selecting the next feature.
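For readers who want to reproduce the fitness computation in this demonstration, the following hedged sketch decodes a chromosome over S10 back to Wine feature indices and scores it. The KNN classifier with 5-fold cross-validation is an assumption mirroring the experimental setup in Section 4; the exact percentages reported above depend on the author's preprocessing and may not be reproduced exactly.

# Decode a chromosome over S10 back to Wine feature indices and score it.
# KNN with 5-fold CV is an assumption; the reported accuracies may differ slightly.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

S10 = [10, 6, 7, 9, 5, 0, 1, 2, 8, 11]               # ordered features of S10
chromosome = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]           # first parent from Figure 4

subset = [S10[i] for i, bit in enumerate(chromosome) if bit == 1]   # -> f9, f5, f0, f2, f8
X, y = load_wine(return_X_y=True)
score = cross_val_score(KNeighborsClassifier(n_neighbors=5), X[:, subset], y, cv=5).mean()
print(subset, f"fitness = {score:.4f}")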
Figure 6 shows that S5 from the FS-GA algorithm produces the highest classification accuracy of 93.3%, whereas the other sequential searching techniques cannot reach this accuracy value. To explore even further, we can increase the number of generations and include more features in the gene representation, which results in a more thorough search. Our study explores the potential subsets for 100 generations and then keeps the maximum solution as S5. The algorithm then returns to Step 1 and repeats the process for k = 6.

Termination Condition: The proposed algorithm proceeds sequentially by adding one feature in each iteration. The subset size grows from k to d, with the best performance for each subset size indicated by Sk. The algorithm stops when the required subset size is reached.

This method applies the concept of the evolutionary searching technique GA to optimize the solutions in terms of classification accuracy using various machine learning models. The FS-GA algorithm gives a higher chance of exploring deeper to find potential subsets that have not been encountered before and cannot be found by the earlier methods. There is a high possibility that better subsets with better accuracy are discovered; therefore, the results move closer to the optimal solutions. The proposed algorithm is thus a sequential forward selection that uses an evolutionary method to improve performance. Due to its randomized behavior, it is capable of avoiding the trap of the local optimum and provides suboptimal solutions that outperform the SOTA methods.

4 Experimental setup

The proposed algorithm was developed in the Python programming language with the Jupyter Notebook editor to perform the experiments. We used ten standard datasets from the UCI repository, listed in Table 2. These datasets are open data with adequate information for carrying out the experiments, and some of them were also used in previous works on sequential feature selection. Other datasets in CSV format are also applicable to the proposed algorithm.

Table 2: Tested datasets.

Dataset Name | Instances | Dimensions
Wine | 173 | 13
Thoracic Surgery | 470 | 17
Lymphography | 142 | 18
Image Segmentation | 2310 | 19
Breast Cancer | 569 | 32
Ionosphere | 351 | 34
Soybean | 307 | 35
Spambase | 4601 | 57
Sonar | 208 | 60
Urban Land Cover | 675 | 147

We compared our results with SFS, SFFS, and IFFS using the DT, NB, and K-Nearest Neighbor (KNN) classifiers for performance validation. These popular classifiers are implemented in many applications due to their dominant characteristics, including robustness, effectiveness, and ease of implementation. FS-GA is a general selection method, similar to the other previous methods, which means it can be applied to other kinds of datasets that contain structured information on both features and class. We applied the three classifiers to all four methods for the comparison. The evaluation was based on data normalization and 5-fold cross-validation. Noisy data such as missing values or outliers were eliminated where necessary.
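As a rough illustration of this evaluation protocol (not the author's code), the sketch below normalizes a dataset and scores a given feature subset with the three classifiers under 5-fold cross-validation; the min-max scaler and the default hyperparameters are assumptions, since the paper does not specify them.

# Sketch of the evaluation protocol: normalization + 5-fold CV with DT, NB, and KNN.
# The MinMax scaler and default hyperparameters are assumptions not stated in the paper.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subset(X, y, subset):
    """Return the 5-fold CV accuracy of a feature subset for each classifier."""
    classifiers = {
        "DT": DecisionTreeClassifier(random_state=0),
        "NB": GaussianNB(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    results = {}
    for name, clf in classifiers.items():
        model = make_pipeline(MinMaxScaler(), clf)      # normalize, then classify
        results[name] = cross_val_score(model, X[:, subset], y, cv=5).mean()
    return results

X, y = load_wine(return_X_y=True)
print(evaluate_subset(X, y, subset=[10, 6, 7, 9, 5]))   # the S5 subset from Section 3.2

Placing the scaler inside the pipeline ensures that normalization is fitted on each training fold only, which avoids leaking information from the validation fold.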
5 Results and discussion

5.1 Comparison of FS-GA with the standard methods using DT

In Table 3, the experimental results of our proposed method are compared to the other popular sequential search methods. The comparison reports the maximum accuracy in percentage (%), with the number of selected features in brackets. The highest accuracy is highlighted in bold. The best solutions are obtained by the FS-GA algorithm based on the DT classifier. SFS and SFFS cannot reach the maximum classification accuracy on any of the tested datasets. IFFS generates feature subsets identical to the proposed method only on the Wine and Lymphography datasets. As the dimension increases, FS-GA performs best among the compared methods on all remaining datasets, i.e., all except the Wine and Lymphography sets. The experiments demonstrate that our proposed method is more effective than the other standard forward feature selection methods, owing to the application of weak feature replacement and genetic optimization.

Table 3: Classification Accuracy using DT.

Dataset Name | SFS | SFFS | IFFS | FS-GA
Wine (13) | 92.76 (3) | 93.32 (6) | 93.89 (4) | 93.89 (4)
Thoracic Surgery (17) | 80.64 (4) | 81.28 (6) | 81.06 (6) | 81.70 (6)
Lymphography (18) | 85.94 (4) | 86.63 (5) | 86.65 (6) | 86.65 (6)
Image Segmentation (19) | 84.29 (12) | 85.24 (9) | 87.14 (8) | 87.62 (10)
Breast Cancer (32) | 95.60 (5) | 95.60 (5) | 96.31 (7) | 96.49 (10)
Ionosphere (34) | 91.75 (20) | 92.03 (15) | 93.45 (7) | 94.02 (20)
Soybean (35) | 89.84 (20) | 90.60 (13) | 90.98 (15) | 92.11 (20)
Spambase (57) | 90.33 (17) | 90.87 (16) | 91.00 (19) | 91.07 (18)
Sonar (60) | 77.05 (6) | 81.35 (10) | 87.51 (10) | 87.99 (9)
Urban Land Cover (147) | 79.41 (20) | 77.63 (17) | 80.15 (13) | 81.19 (19)

5.2 Comparison of FS-GA with the standard methods using NB

Table 4 shows that the classification accuracy is increased by the proposed method based on the NB classifier. Only the Wine dataset gives the same results for all compared methods. Excluding the Wine dataset, IFFS produces the same maximum accuracy as the proposed method on the Image Segmentation, Breast Cancer, Ionosphere, and Soybean datasets. On the Ionosphere dataset, although IFFS has the same accuracy as the FS-GA algorithm, it requires a larger number of features (14 versus 10); consequently, the proposed method is considered to have the best performance for this particular dataset. On the remaining tested datasets, the proposed method achieves better results. Accordingly, FS-GA generates the minimal feature subset with the highest classification accuracy on most of the tested datasets.

Table 4: Classification Accuracy using NB.

Dataset Name | SFS | SFFS | IFFS | FS-GA
Wine (13) | 93.30 (5) | 93.30 (5) | 93.30 (5) | 93.30 (5)
Thoracic Surgery (17) | 85.11 (1) | 85.11 (1) | 85.11 (1) | 85.32 (5)
Lymphography (18) | 86.58 (8) | 87.24 (10) | 87.32 (9) | 88.03 (8)
Image Segmentation (19) | 81.90 (5) | 81.90 (5) | 82.86 (5) | 82.86 (5)
Breast Cancer (32) | 95.43 (8) | 95.43 (8) | 96.14 (6) | 96.14 (6)
Ionosphere (34) | 92.58 (14) | 93.44 (11) | 93.72 (14) | 93.72 (10)
Soybean (35) | 83.86 (20) | 84.61 (15) | 91.73 (12) | 91.73 (12)
Spambase (57) | 72.55 (4) | 74.33 (13) | 76.83 (11) | 81.03 (16)
Sonar (60) | 78.85 (19) | 82.20 (17) | 81.32 (9) | 82.71 (12)
Urban Land Cover (147) | 72.44 (15) | 74.22 (14) | 75.70 (14) | 76.00 (19)

5.3 Comparison of FS-GA with the standard methods using KNN

As shown in Table 5, the classification accuracy is clearly enhanced by the GA step based on the KNN classifier. FS-GA shows the maximum performance on almost all the tested datasets. On the Wine dataset, FS-GA produces a solution equal to SFFS and IFFS because the dataset is small with limited subset combinations. On the Thoracic Surgery dataset, even though IFFS has the same maximum accuracy as SFFS and FS-GA, it has a higher number of selected features; thus, SFFS and FS-GA perform better in this case. On the Lymphography dataset, SFFS and FS-GA are the best methods. On the Image Segmentation and Ionosphere datasets, IFFS performs quite well and has the same maximum accuracy as our method.
For the rest of the results, as the feature size increases, the FS-GA algorithm produces the highest performance among the compared techniques.

Table 5: Classification Accuracy using KNN.

Dataset Name | SFS | SFFS | IFFS | FS-GA
Wine (13) | 92.21 (6) | 93.33 (7) | 93.33 (7) | 93.33 (7)
Thoracic Surgery (17) | 84.89 (5) | 85.96 (9) | 85.96 (10) | 85.96 (9)
Lymphography (18) | 86.55 (3) | 90.12 (11) | 88.72 (12) | 90.12 (11)
Image Segmentation (19) | 80.95 (10) | 80.95 (7) | 81.43 (8) | 81.43 (8)
Breast Cancer (32) | 88.09 (20) | 88.66 (20) | 88.80 (20) | 88.97 (20)
Ionosphere (34) | 95.43 (18) | 95.43 (8) | 95.43 (6) | 95.43 (6)
Soybean (35) | 93.45 (5) | 94.02 (12) | 94.59 (12) | 94.87 (9)
Spambase (57) | 89.10 (18) | 88.73 (16) | 90.99 (20) | 91.35 (20)
Sonar (60) | 88.92 (20) | 91.26 (18) | 92.09 (19) | 92.13 (20)
Urban Land Cover (147) | 79.36 (12) | 79.84 (11) | 79.34 (17) | 84.08 (17)

5.4 Comparison of FS-GA using different classifiers

Table 6 presents the comparison of FS-GA with different criterion functions using DT, NB, and KNN. DT gives the highest accuracy on only three of the tested datasets: Wine, Image Segmentation, and Breast Cancer. Most of the highest accuracies come from the KNN classifier, on the seven remaining sample datasets. On the other hand, the NB classifier does not provide the best results in most cases. Accordingly, different classifiers affect the algorithm's performance: they return different results since each has unique characteristics that influence the feature subset selection. Moreover, KNN returns the highest accuracy compared with the other two classifiers. Hence, KNN is the preferred classifier for capturing the best solutions.

Table 6: Classification Accuracy of FS-GA using DT, NB, and KNN.

Dataset Name | DT | NB | KNN
Wine (13) | 93.89 (4) | 93.30 (5) | 93.33 (7)
Thoracic Surgery (17) | 81.70 (6) | 85.32 (5) | 85.96 (9)
Lymphography (18) | 86.65 (6) | 88.03 (8) | 90.12 (11)
Image Segmentation (19) | 87.62 (10) | 82.86 (5) | 81.43 (8)
Breast Cancer (32) | 96.49 (10) | 96.14 (6) | 88.97 (20)
Ionosphere (34) | 94.02 (20) | 93.72 (10) | 95.43 (6)
Soybean (35) | 92.11 (20) | 91.73 (12) | 94.87 (9)
Spambase (57) | 91.07 (18) | 81.03 (16) | 91.35 (20)
Sonar (60) | 87.99 (9) | 82.71 (12) | 92.13 (20)
Urban Land Cover (147) | 81.19 (19) | 76.00 (19) | 84.08 (17)

The FS-GA algorithm provides an evolutionary optimization that results in a small, effective feature subset. The improvement is reinforced by the application of GA, whose deeper search gives a higher probability of discovering the optimal solution. Similar to other methods, FS-GA can also be applied to larger datasets at the cost of a longer computing time. The proposed algorithm identifies more relevant and informative features from the initial dataset by incorporating the feature improvement step with the evolutionary step using GA. The significant characteristic of GA, as a randomized algorithm, is that it avoids the situation where the solutions are trapped in a local optimum. Moreover, GA does not learn a pattern, that is, each execution may give different results. Hence, the algorithm may need to be applied a few times to obtain the best performance.

5.5 Time complexity

The time complexity of the FS-GA algorithm derives from two critical steps. The first one is the feature improvement step, which removes a feature from the active subset and adds another one to the same set. At most, there are n features from the remaining set to be selected, and this same operation repeats for n iterations.
Thus, the feature improvement step requires at most n^2 evaluations. The second one is the evolutionary step, whose computing time is directly related to the number of generations. In our setting, the number of generations assigned to FS-GA is 100; consequently, the computing time is 100k, where k is the size of the chromosome. The number of generations can be varied depending on our decision. Apart from the two steps mentioned above, the other actions take constant time and can be ignored. Therefore, the combined time of the two critical steps is O(n^2 + 100k). Consequently, FS-GA requires a slightly higher computation time than IFFS, since IFFS requires at most O(n^2) time due to its weak feature replacement step. SFS simply adds one feature at a time, which leads to only O(n). SFFS adds or removes one feature while the subset grows up to n features, with backward tracking at most k times; hence, SFFS is bounded by O(kn), which is roughly O(n) time complexity.

6 Conclusion

This study proposed a novel sequential forward selection approach based on GA, named Forward Selection with Genetic Algorithms (FS-GA). The aim is to maximize the classification accuracy and outperform various standard methods. The proposed method is a searching technique in the forward direction that improves the performance of SFS. FS-GA incorporates a feature improvement step with GA to find the optimal subsets. The algorithm utilizes an evolutionary technique to optimize the solutions by adjusting the number of reproductions and the population size. GA improves the results not only by maximizing the classification accuracy but also by helping the searching process escape from local optimum traps. Therefore, with this powerful technique, the proposed method can search through the possible feature subsets more exhaustively. Consequently, there is a greater chance of discovering a better subset from the candidate subsets. We applied the DT, NB, and KNN classifiers in our experiments and compared the proposed method with the standard SFS, SFFS, and IFFS methods. The results showed that FS-GA performed the best for all the sample datasets with O(n^2) time complexity. The limitation of FS-GA is similar to that of other evolutionary algorithms: due to their random nature, the best result for the newly discovered subset is not guaranteed. However, we can adjust parameters such as the number of genes or generations to improve the results. Future directions can focus on modifying the number of generations or populations and altering the size of the individual.

References

[1] Zeng, Z., Zhang, H., Zhang, R. and Zhang, Y. (2014). Hybrid Feature Selection Method based on Rough Conditional Mutual Information and Naïve Bayesian Classifier. ISRN Applied Mathematics, Hindawi Publishing Corporation. https://doi.org/10.1155/2014/382738

[2] Somol, P., Pudil, P. and Kittler, J. (2004). Fast Branch & Bound Algorithms for Optimal Feature Selection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7), pp. 900-912. https://doi.org/10.1109/tpami.2004.28

[3] Nakariyakul, S. and Casasent, D. P. (2007). Adaptive branch and bound algorithm for selecting optimal features. Pattern Recognition Letters, 28, pp. 1415-1427. https://doi.org/10.1016/j.patrec.2007.02.015

[4] Cai, J., Luo, J., Wang, S. and Yang, S. (2018). Feature selection in machine learning: A new perspective. Neurocomputing, pp. 70-79.
https://doi.org/10.1016/j.neucom.2017.11.077

[5] Chandrashekar, G. and Sahin, F. (2014). A survey on feature selection methods. Computers and Electrical Engineering, 40, pp. 16-28.

[6] Sutha, K. and Tamilselvi, J. J. (2015). A Review of Feature Selection Algorithms for Data Mining Techniques. International Journal on Computer Science and Engineering (IJCSE), pp. 63-67.

[7] Jovic, A., Brkic, K. and Bogunovic, N. (2015). A review of feature selection methods with applications. International Convention on Information and Communication Technology.

[8] Pudil, P., Novovicova, J. and Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, pp. 1119-1125. https://doi.org/10.1016/0167-8655(94)90127-9

[9] Pavya, K. and Srinivasan, B. (2017). Feature Selection Techniques in Data Mining: A Study. International Journal of Scientific Development and Research (IJSDR), 2(6), pp. 594-598.

[10] Whitney, A. W. (1971). A Direct Method of Nonparametric Measurement Selection. IEEE Transactions on Computers, pp. 1100-1103. https://doi.org/10.1109/t-c.1971.223410

[11] Somol, P., Pudil, P., Novovicova, J. and Paclik, P. (1999). Adaptive floating search methods in feature selection. Pattern Recognition Letters, pp. 1157-1163. https://doi.org/10.1016/s0167-8655(99)00083-5

[12] Nakariyakul, S. and Casasent, D. P. (2009). An improvement on floating search algorithms for feature subset selection. Pattern Recognition, pp. 1932-1940. https://doi.org/10.1016/j.patcog.2008.11.018

[13] Lv, J., Peng, Q. and Sun, Z. (2015). A modified sequential deep floating search algorithm for feature selection. International Conference on Information and Automation, pp. 2988-2933. https://doi.org/10.1109/icinfa.2015.7279800

[14] Pudil, P., Ferri, F. J., Novovicova, J. and Kittler, J. (1994). Floating Search Methods for Feature Selection with Nonmonotonic Criterion Functions. Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 279-283. https://doi.org/10.1109/icpr.1994.576920

[15] Chotchantarakun, K. and Sornil, O. (2021). An Adaptive Multi-levels Sequential Feature Selection. International Journal of Computer Information Systems and Industrial Management Applications (IJCISIM), 13, pp. 010-019.

[16] Chotchantarakun, K. and Sornil, O. (2021). Adaptive Multi-level Backward Tracking for Sequential Feature Selection. Journal of ICT Research and Applications, 15, pp. 1-20. https://doi.org/10.5614/itbj.ict.res.appl.2021.15.1.1

[17] Holland, J. H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press: Cambridge, MA. https://doi.org/10.7551/mitpress/1090.003.0016

[18] El-Shafiey, M. G., Hagag, A., El-Dahshan, E. A. and Ismail, M. A. (2022). A hybrid GA and PSO optimized approach for heart-disease prediction based on random forest. Multimedia Tools and Applications, 81, pp. 18155-18179. https://doi.org/10.1007/s11042-022-12425-x

[19] Homsapaya, K. and Sornil, O. (2017). Improving Floating Search Feature Selection using Genetic Algorithm. Journal of ICT Research and Applications, 11(3), pp. 299-317. https://doi.org/10.5614/itbj.ict.res.appl.2017.11.3.6

[20] Ileberi, E., Sun, Y. and Wang, Z. (2022). A machine learning based credit card fraud detection using the GA algorithm for feature selection. Journal of Big Data, 9(24). https://doi.org/10.1186/s40537-022-00573-8

[21] Aswal, S., Jyothi, A. and Mehra, R. (2023).
Feature Selection Method Based on Honeybee-SMOTE for Medical Data Classification. Informatica, 46(9), pp. 111-118. https://doi.org/10.31449/inf.v46i9.4098

[22] Alija, S., Beqiri, E., Gaafar, A. S. and Hamoud, A. K. (2023). Predicting Students Performance Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection. Informatica, 47(1), pp. 11-20. https://doi.org/10.31449/inf.v47i1.4519

[23] Al-jadir, I., Wong, K. W., Fung, C. C. and Xie, H. (2017). Text Document Clustering Using Memetic Feature Selection. Proceedings of the 9th International Conference on Machine Learning and Computing (ICMLC), pp. 415-420. https://doi.org/10.1145/3055635.3056603

[24] Panda, D., Panda, D., Dash, S. R. and Parida, S. (2021). Extreme Learning Machines with Feature Selection Using GA for Effective Prediction of Fetal Heart Disease: A Novel Approach. Informatica, 45(3), pp. 381-392. https://doi.org/10.31449/inf.v45i3.3223

[25] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. https://archive.ics.uci.edu/datasets