https://doi.org/10.31449/inf.v47i9.5148 Informatica 47 (2023) 133–144

Hyperparameter Optimization for Convolutional Neural Networks using the Salp Swarm Algorithm

Entesar H. Abdulsaed 1, Maytham Alabbas 1*, Raidah S. Khudeyer 2
1 Department of Computer Science, College of Computer Science and Information Technology, University of Basrah, Basrah, Iraq.
2 Department of Computer Information Systems, College of Computer Science and Information Technology, University of Basrah, Basrah, Iraq.
Email: hopidhadha@gmail.com, ma@uobasrah.edu.iq, raidah.khudayer@uobasrah.edu.iq
* Correspondence

Keywords: deep learning, convolutional neural networks, salp swarm algorithm, hyperparameter optimization

Received: November 22, 2023

Convolutional neural networks (CNNs) have performed exceptionally well across various computer vision tasks. However, their effectiveness depends heavily on the careful selection of hyperparameters. Optimizing these hyperparameters can be challenging and time-consuming, especially when working with large datasets and complex network architectures. In response, we propose a novel approach for hyperparameter optimization in CNNs using the Salp Swarm Algorithm (SSA). Inspired by the natural swarming behavior of salps, SSA mimics the collective intelligence that governs their feeding and navigation. Taking advantage of SSA's unique properties, our research thoroughly explores the hyperparameter space. This exploration aims to identify the configuration that maximizes CNN performance. This paper presents the architecture of the SSA-based framework for hyperparameter optimization and compares it to other established optimization techniques, such as Particle Swarm Optimization (PSO) and Genetic Algorithm (GA). We also present experimental results using the MNIST and Fashion-MNIST datasets, achieving an impressive classification accuracy of 99.46% for MNIST and 94.53% for Fashion-MNIST. This case study not only contributes to the fields of deep learning and hyperparameter optimization by demonstrating the effectiveness of SSA in optimizing CNNs, but it also provides benefits to researchers and practitioners who are looking for optimal hyperparameter configurations for CNNs in a variety of computer vision applications. We also evaluate the scalability and robustness of our proposed method in the context of different CNN structures. The insights we gained highlight SSA's potential for addressing challenges related to hyperparameter optimization.

Povzetek: Članek predstavlja optimizacijo hiperparametrov v konvolucijskih nevronskih mrežah s pomočjo algoritma Salp Swarm, ki izboljša učinkovitost in natančnost.

1 Introduction

Deep learning has emerged as a powerful and versatile field within the broader domain of machine learning [1]. It has revolutionized various domains, such as computer vision, natural language processing, and speech recognition [2, 3]. One of the fundamental techniques used in deep learning is convolutional neural networks (CNNs), which have demonstrated exceptional performance in image recognition, object detection, and classification tasks. CNNs are designed to process grid-like data, such as images, by capturing spatial and hierarchical relationships between different features. Their architecture consists of multiple layers, including convolutional (Conv.), pooling, and fully connected (FC) layers, allowing them to automatically extract meaningful features from input data [4].
This inherent capability makes CNNs highly effective in analyzing visual data and extracting intricate patterns that may not be discernible to the human eye [5]. While CNNs offer numerous advantages, including their ability to handle large amounts of data, learn complex representations, and achieve state-of-the-art performance in various tasks, they also possess specific weaknesses [6]. One of the critical challenges in utilizing CNNs effectively is selecting appropriate hyperparameters [7]. Hyperparameters are the configuration settings that control the behavior and performance of a CNN model, such as the learning rate, batch size, dropout rate, and kernel size [8]. The optimal selection of hyperparameters significantly impacts CNN models' performance and convergence speed. However, choosing the right combination of hyperparameters is a challenging and time-consuming task. Traditional methods, such as grid search, random search, and Bayesian optimization, suffer from computational inefficiency and may not explore the entire hyperparameter space effectively. Therefore, there is a need for advanced techniques that can efficiently and effectively optimize hyperparameters for CNNs [9].

In recent years, metaheuristic optimization algorithms have gained significant attention in deep learning for hyperparameter optimization. One such algorithm is the salp swarm algorithm (SSA), inspired by the collective behavior of natural salp swarms. SSA is a population-based metaheuristic algorithm that mimics the social behavior of salps to search for the global optimum in a given search space. It has shown promising results in solving various optimization problems since its proposal, including feature selection, neural network training, clustering analysis, image processing, engineering optimization, and financial portfolio optimization.

The current work is a step forward in this regard. It focuses on investigating the effectiveness of SSA in optimizing hyperparameters for CNNs. We conduct experiments using the MNIST [10] and Fashion-MNIST [11] datasets and compare the performance of SSA with other methods. SSA is chosen for its ability to approximate optimal solutions with satisfactory convergence rates and solution space coverage. Its straightforward mathematical framework makes it simpler to comprehend and implement [12]. The results obtained provide valuable insights into the efficiency and effectiveness of SSA for hyperparameter optimization in CNNs and contribute to the ongoing research efforts in improving the performance of deep learning models.

This paper is structured as follows: Section 2 delves into prior research focused on enhancing CNNs. Moving to Section 3, we explore the components and tools integral to CNNs. Our proposed work is introduced in Section 4, while Section 5 is dedicated to presenting our experimental evaluation. The outcomes and analysis of our experiments are detailed in Section 6, and in Section 7, we conclude the paper while also addressing potential avenues for future research.

2 Related work

Many researchers have used different techniques to automatically set hyperparameters and choose the structure of CNNs. This is done to avoid the additional effort and time required for manual network construction and to improve the performance of CNNs. In this section, we review the state-of-the-art studies in this field and summarize them in Table 1.
Some researchers use particle swarm optimization (PSO) to improve the accuracy of CNNs. In [13], PSO was recommended for use in CNNs. To increase accuracy, PSO is used in the training phase to optimize the results of the solution vectors on CNNs. In [14], CNNs were optimized using microcanonical annealing. As the authors suggest, the performance of the original CNNs may be significantly improved using this method. In [15], a genetic algorithm (GA) was used to optimize multiple parameters of CNNs at once. Various types and ranges of GA parameters were also used. After a lengthy process, an approximation to the global optimal solution emerged; however, training a large amount of data at once did not result in high precision. In [16], the authors suggest using distributed particle swarm optimization (DPSO) to improve the hyperparameters of CNNs for image classification tasks. The DPSO method employs a mixed-variable encoding strategy and associated update operations for each particle to encode CNNs, enabling automatic and global search for the best CNN model. The method also employs a distributed framework to reduce execution time and accelerate optimization. However, one possible drawback is the requirement for many particles to achieve good results, which can increase computational complexity. In [17], the authors proposed an automatic method incorporating enhanced metaheuristic algorithms (the tree growth and firefly algorithms) for optimizing hyperparameters and designing structures. However, the proposed methods had a higher computational cost, restricting the inclusion of more datasets in the study. In [18], the researchers proposed optimizing CNN hyperparameters using linearly decreasing weight particle swarm optimization (LDWPSO). The architecture of this model is LeNet-5. They mentioned the need for additional research and testing to validate the method's effectiveness. Additionally, they did not discuss potential limitations or challenges associated with the proposed method, such as computational complexity or convergence issues. The work in [19] discusses a method for image classification that utilizes histogram of oriented gradients (HOG) features, gray-level co-occurrence matrix (GLCM) features, and a support vector machine (SVM) for classification. The authors used the Fashion-MNIST dataset to examine the accuracy of this method, and they obtained an accuracy of 91.59%. The work in [20] focuses on the swarm intelligence component of the OpenNAS system for neural architecture search (NAS). The authors use PSO and ACO swarm algorithms with transfer learning for feature extraction (VGG16). The study in [21] introduces the competitive activation function (CAF) concept and derives the parameter-free rectified exponential unit (PFREU) as a particular kind of CAF. The authors use two architectures for classification: LeNet-5 on Fashion-MNIST and ResNet-110 on CIFAR-10. In [22], the authors presented IntelliSwAS, a method for optimizing deep neural network architectures for classification and regression tasks. They used DAGRNN [23] to improve the search technique. IntelliSwAS effectively located high-quality CNN cells, but these cells had to be manually incorporated into larger CNN architectures. The study in [24] proposed a hybrid particle swarm optimization and grey wolf optimization (HPSGW) algorithm to optimize CNN hyperparameters and enhance the accuracy of the CNN model.
In [25], the authors proposed a simple deterministic selection genetic algorithm (SDSGA) to optimize the hyperparameters of two well-known machine learning models: CNNs and the random forest (RF) algorithm. In [26], the authors use a multiple convolutional neural network (MCNN15) with 15 convolutional layers. The work in [27] compared the performance of CNNs and ANNs for image classification on the Fashion-MNIST apparel dataset using various optimizers. In [28], the study employed large-scale deep learning networks such as VGG16 and ResNet to enhance classification accuracy and an approximate dynamic learning rate update algorithm to ensure rapid convergence and reduced training time. In [29], the authors utilized the local autonomous competitive harmony search (LACHS) algorithm to achieve the highest classification accuracy on the Fashion-MNIST and CIFAR-10 datasets. The VGGNet was the main network used for experimental research in this work.

Ref. | Year | Method | Parameters for optimization | Limitations | Dataset | Accuracy %
[13] | 2016 | CNN-PSO | kernel size, pool size, learning rate | CNN-PSO consumes a longer time than CNN; CNN-PSO accuracy is slightly lower than CNN optimized with simulated annealing | MNIST | 95.08
[14] | 2017 | CNN-MAA | kernel size, pool size | MA needs more processing time | MNIST | 98.75
[15] | 2019 | GA | learning rate, dropout, batch size, no. of layers | the experiment took a long time because of the large dataset | MNIST | 99.4
[16] | 2020 | PSO (DPSO) | kernel size, type of pooling, activation function in FC, dropout, learning rate | many particles are required for good results, which can increase computational complexity | MNIST, Fashion-MNIST | 99.3, 92.92
[17] | 2020 | SI (tree growth & firefly) algorithms | no. of conv. layers, no. of FC layers, kernel size, no. of kernels per conv. layer, FC-layer size | a single dataset is used to evaluate the accuracy of the method | MNIST | 99.18
[18] | 2020 | PSO (LDWPSO) | no. of kernels, kernel size, activation function, no. of neurons, batch size, optimizer | uses a simple, basic CNN architecture (LeNet-5); lacks comparison with other optimization methods or CNN architectures | MNIST | 98.95
[20] | 2020 | PSO & ACO | no. of kernels, kernel size, dropout rate | no comparison with other neural architecture search methods and no performance-efficiency trade-off analysis | Fashion-MNIST | 94.5
[21] | 2021 | CAF | activation function | no comparison to other state-of-the-art activation functions; no theoretical analysis of CAF; no investigation of hyperparameter impact on performance | Fashion-MNIST | 91.21
[22] | 2022 | PSO (IntelliSwAS) | convolution, depthwise-separable convolution, dilated convolution | found high-quality CNN cells but required manual incorporation into larger CNN architectures | MNIST | 95
[24] | 2022 | PSO & GWO (HPSGW) | no. of kernels, kernel size, batch size, no. of epochs | can optimize only a few CNN hyperparameters; high computational cost | MNIST | 99.4
[25] | 2022 | GA (SDSGA) | learning rate, batch size | selection may reduce diversity, and a fixed mutation rate may not work for all problems | MNIST | 99.2
[26] | 2022 | MCNN15 | no. of kernels, kernel size, batch size, no. of neurons | no evaluation of model performance or comparison to the state of the art | Fashion-MNIST | 94.04
[27] | 2022 | ANN and CNN | optimizer | difficulty handling complex or novel images; sensitivity to hyperparameter choices; high computational cost and time consumption | Fashion-MNIST | 91
[28] | 2023 | VGG16, ResNet, and approximate dynamic learning rate update algorithm | learning rate | deep network hierarchies and complex parameters can overfit, limiting training, especially with small samples | Fashion-MNIST | 93
[29] | 2023 | LACHS | no. of kernels, kernel size, activation function in conv. layers, no. of neurons, learning rate, batch size, momentum | no comparison to other hyperparameter optimization techniques, which makes it hard to evaluate the usefulness and superiority of LACHS | Fashion-MNIST | 93.34

Table 1: Summary of the related works.

3 Tools

In this section, we look at the necessary background for the two main techniques used in our proposed approach: CNNs and SSA.

3.1 Convolutional neural networks (CNNs)

CNNs represent a category of deep neural networks that have gained significant prominence in computer vision applications. These networks have revolutionized the field by achieving cutting-edge results across various tasks. They have proven their mettle in diverse domains, such as handwriting recognition [30], automotive safety [31], video surveillance [32], face detection [33], semantic segmentation [34], and speech recognition [35]. This versatility has rendered them indispensable in modern computing systems. Explicitly designed for data with a grid-like structure, such as images, CNNs have surged in importance due to their capacity to automate the once manual and time-intensive feature extraction process. At the heart of CNNs lies a pivotal feature: weight sharing. By sharing weights, these networks reduce the number of trainable parameters, a feat that enhances generalization capabilities and curbs overfitting issues. Unlike traditional neural networks, CNNs capitalize on the intrinsic spatial organization present in images. This enables them to capture local relationships and learn hierarchical representations. The architecture of CNNs encompasses a multi-stage structure, integrating both linear and nonlinear operations to undertake feature extraction and classification. The initial feature extraction stage encompasses a sequence of primary layers, including the Conv. layer housing an activation function (Act. Fun.) and a subsequent pooling layer. Conversely, the classification stage contains numerous FC layers [36]. It is important to note that this architecture necessitates substantial data for training, demanding a significant time investment and considerable expertise for manual construction. To address this, many optimization techniques have been deployed to fine-tune hyperparameters and structures [37]. Researchers have engineered several CNN models, training them on specific problem areas using varied datasets and achieving impressive results within these domains. Leveraging a pre-trained network involves tailoring it to a particular task. Commonly referred to as "transfer learning," this approach enables the classification of images across a vast array of 1,000 distinct categories, avoiding the need to build CNNs from scratch. The fine-tuning of hyperparameters through transfer learning can involve freezing or unfreezing layers [38]. An array of pre-trained models, including DenseNet [25], EfficientNet [26], MobileNetV3 [27], and more, offer additional options for developers and researchers in this realm.
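The snippet below is a minimal sketch of the freezing and unfreezing just mentioned, written with Keras/TensorFlow for illustration; the library choice, the MobileNetV3Small backbone, and the layer counts and learning rates are our own assumptions rather than configurations used in this paper. A pre-trained backbone is first frozen while a new classification head is trained, and then only its top layers are unfrozen for fine-tuning at a lower learning rate.

```python
import tensorflow as tf

# Load a backbone pre-trained on ImageNet, without its 1,000-class classifier head.
base = tf.keras.applications.MobileNetV3Small(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained feature extractor

# Attach a new classification head for the target task (10 classes here).
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) would first train only the new head.

# Fine-tuning: unfreeze the top of the backbone and recompile with a lower rate.
base.trainable = True
for layer in base.layers[:-20]:  # keep all but the last 20 layers frozen (illustrative)
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
```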
The overall structure of the network is built as layers stacked on top of each other to process the incoming data, extract features from it, and classify it according to the problem at hand. These layers are:

3.2 Input layer

The input layer, which is the leftmost layer, represents the input image to the CNN.

3.3 Convolutional layers

The convolutional layers are the foundation of CNNs. They contain the learned kernels (filters, weights) that are used to extract features from images. This is done by convolving the input image with a stack of kernels. Each kernel extracts a specific feature, such as edges, textures, or object parts [39]. By using multiple kernels, CNNs can capture a variety of spatial patterns. This allows them to automatically learn relevant visual features without requiring manual feature engineering. Each convolutional layer has a set of hyperparameters that are initialized before the layer is used. These hyperparameters determine the number of connections and the output size of the feature maps. The hyperparameters are:
• Number of filters: determines the depth of the resulting feature maps.
• Filter size: gives the spatial dimensions of the filters.
• Padding: determines the amount of zero-padding applied to the input data.
• Stride: specifies the step size for shifting the filters over the input.

After all layers with weights (also known as trainable layers, such as Conv. layers and FC layers) in a CNN architecture, nonlinear activation layers are used. The nonlinearity of the activation layers means that the mapping from input to output is nonlinear. This allows CNNs to learn complex representations [9]. CNNs commonly use the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent (tanh) activation functions. ReLU is the most popular Act. Fun. because of its simplicity and ability to mitigate the vanishing gradient problem.

3.4 Pooling layers

Pooling layers are commonly used in CNNs to reduce the spatial dimensions of feature maps while preserving their most essential features. This helps to reduce the amount of computing power required to process data and makes the model more robust to small spatial variations. There are two main types of pooling layers: max pooling and average pooling. In max pooling, the maximum value in each region of the feature map is selected. In average pooling, the average value in each region is computed. The pooling size and stride are two hyperparameters that need to be specified for each pooling layer. The pooling size determines the size of the pooled region, and the stride determines the step size used to move the pooling region over the feature map. The features extracted by the pooling layers are then passed to the next layer in the CNN, which is typically an FC layer. The FC layer takes the features from the pooling layers and combines them to make a prediction about the input image. Prior to the first FC layer, the previous layer's output must be transformed into a one-dimensional vector. This flattening process converts multidimensional feature maps into a format compatible with fully connected layers.
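As a concrete illustration of the convolution and pooling hyperparameters listed above (number of filters, filter size, padding, stride, and pool size), the following sketch builds a single convolution-and-pooling block and prints the shape of the resulting feature maps. Keras/TensorFlow and the specific values (64 filters, 3×3 kernels, 2×2 pooling) are illustrative assumptions of ours, not settings taken from this paper.

```python
import tensorflow as tf

# One convolution + pooling block with the hyperparameters made explicit.
conv = tf.keras.layers.Conv2D(
    filters=64,          # number of filters -> depth of the output feature maps
    kernel_size=(3, 3),  # spatial size of each filter
    strides=(1, 1),      # step size when sliding the filter over the input
    padding="same",      # zero-padding so the spatial size is preserved
    activation="relu",   # nonlinearity applied after the convolution
)
pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))

x = tf.random.normal((1, 28, 28, 1))  # a dummy 28x28 grayscale image
print(pool(conv(x)).shape)            # (1, 14, 14, 64): halved spatially, 64 channels deep
```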
3.5 Fully connected layers

The FC layers are positioned just before a CNN's output layer. They are responsible for converting the learned features into class probabilities or regression values. FC layers connect every neuron in the previous layer to every neuron in the next layer. This allows them to combine high-level features and make accurate predictions. However, they also introduce many parameters, which can lead to overfitting if not carefully regularized. The number of neurons in an FC layer determines the layer's output size. The number of neurons required varies depending on the specific task being performed. For example, in image classification, the output layer may contain neurons representing different classes, while in object detection, it may contain neurons for bounding box coordinates and class probabilities. The following hyperparameters need to be tuned for FC layers to give the best results in CNNs:
• Number of hidden layers: adding hidden layers can improve performance, but it can also lead to overfitting. A good starting point is to use two or three hidden layers.
• Number of neurons: the number of neurons in each hidden layer should be related to the complexity of the task; more demanding tasks generally require more neurons.
• Activation function: the activation function determines how the output of each neuron is transformed. A commonly used Act. Fun. is the rectified linear unit (ReLU).
• Weight initialization: the weights of the FC layers are initialized randomly. A suitable initialization method can help to prevent the network from converging to a suboptimal solution.
• Regularization: regularization techniques can help to prevent overfitting. A commonly used regularization technique is dropout.

3.6 Output layer

The output layer is the final layer of neurons in a CNN. It generates the network output, typically a class prediction or a regression value. The output layer is typically an FC layer, which means that each neuron in the output layer is connected to all of the neurons in the previous layer. This allows the output layer to combine the features extracted by the earlier layers and make a prediction about the input data. The choice of activation function in the output layer is determined by the specific task being performed by the CNN. For classification tasks, a commonly used activation function is the softmax function. The softmax function takes a vector of outputs from the previous layer and transforms it into a vector of probabilities, where each probability represents the likelihood that the input data belongs to a particular class. Figure 1 shows the general architecture of CNNs.

Figure 1: Simple structure of CNNs.

CNNs are trained using backpropagation, which involves iteratively passing data through the network (forward propagation) and then adjusting the network weights based on the error (backpropagation). The number of Conv., pooling, and FC layers can vary depending on the task and the available computational resources. CNNs have several strengths, including understanding spatial hierarchies of features, handling large amounts of data, and generalizing well to new data. However, they also have limitations, such as the need for extensive training data and the computational expense.
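Putting the layers described in this section together, the sketch below defines and compiles a small end-to-end CNN: two convolution/pooling blocks, a flatten step, one hidden FC layer with dropout, and a softmax output. It assumes Keras/TensorFlow and 28×28 grayscale inputs; the layer sizes are illustrative and are not the architecture optimized later in this paper.

```python
import tensorflow as tf

# A small CNN for 10-class classification of 28x28 grayscale images.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),                        # feature maps -> 1-D vector
    tf.keras.layers.Dense(128, activation="relu"),    # hidden FC layer
    tf.keras.layers.Dropout(0.3),                     # regularization
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # expects one-hot labels
              metrics=["accuracy"])
model.summary()
```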
3.7 Salp swarm algorithm (SSA)

SSA is a population-based metaheuristic algorithm inspired by the chain formation behavior of salps, which are gelatinous marine organisms [40]. The SSA algorithm maintains a population of solutions, each representing a salp. The salps move towards better solutions by adjusting their positions and velocities. The movement of a salp is influenced by three main factors: the current location of the best solution, the position of the best solution in its neighborhood, and a randomization factor [41]. The SSA algorithm has been shown to be effective for various optimization problems, including function optimization, engineering design, and parameter estimation. The SSA algorithm has several advantages over other metaheuristic algorithms, such as PSO, GA, and differential evolution (DE). These advantages include [42]:
• Simple to implement: the SSA algorithm is relatively simple to implement, making it easy to understand and use.
• Fewer parameters: the SSA algorithm has fewer parameters than other metaheuristic algorithms, making tuning easier.
• Robust: the SSA algorithm is robust to noise and outliers, making it a good choice for problems with noisy data.
• Efficient: the SSA algorithm is efficient, making it a good choice for large-scale optimization problems.

The pseudocode of the SSA algorithm is shown in Figure 2.

Initialize the salp population $x_i$ (i = 1, 2, ..., n) considering the upper bound ub and lower bound lb
While (end condition is not satisfied)
    Calculate the fitness of each search agent (salp)
    F = the best search agent
    Update $c_1$ using $c_1 = 2e^{-(4l/L)^2}$, where $l$ is the current iteration and $L$ is the maximum number of iterations
    For each salp $x_i$
        if (i == 1)
            Update the position of the leading salp using
            $x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0 \end{cases}$
        else
            Update the position of the follower salp using $x_j^i = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$
        endif
    endfor
    Amend the salps based on the upper and lower bounds of the variables
endwhile
return F

Figure 2: Pseudocode of the SSA algorithm.
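For readers who prefer code to pseudocode, the following is a minimal NumPy sketch of the update rules in Figure 2. The function names are ours, and the 0.5 threshold on c3 follows a common implementation convention for the leader update; this is an illustrative sketch under those assumptions, not the exact implementation used in this paper.

```python
import numpy as np

def ssa_minimize(fitness, lb, ub, n_salps=15, max_iter=100, seed=0):
    """Minimal Salp Swarm Algorithm sketch for continuous minimization."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size

    # Initialize the salp chain uniformly inside the bounds and pick the food source F.
    salps = rng.uniform(lb, ub, size=(n_salps, dim))
    costs = np.array([fitness(s) for s in salps])
    best = costs.argmin()
    food, food_cost = salps[best].copy(), costs[best]

    for l in range(1, max_iter + 1):
        c1 = 2.0 * np.exp(-(4.0 * l / max_iter) ** 2)  # decays to favor exploitation
        for i in range(n_salps):
            if i == 0:  # leader: move around the food source
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * ((ub - lb) * c2 + lb)
                salps[i] = np.where(c3 < 0.5, food - step, food + step)
            else:       # follower: move toward the salp in front of it
                salps[i] = 0.5 * (salps[i] + salps[i - 1])
        salps = np.clip(salps, lb, ub)  # amend salps that left the search space
        for s in salps:                 # update the best solution found so far
            c = fitness(s)
            if c < food_cost:
                food, food_cost = s.copy(), c
    return food, food_cost

# Example: minimize the 6-dimensional sphere function.
best_pos, best_cost = ssa_minimize(lambda x: float(np.sum(x ** 2)),
                                   lb=[-5.0] * 6, ub=[5.0] * 6)
```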
4 Current work

In the present work, we use the SSA to identify the optimal hyperparameter settings for enhancing the performance of CNNs. We strategically select six key hyperparameters: the number of kernels, kernel size, pool size, dropout rate, hidden units, and learning rate. The learning rate is particularly influential, as it profoundly impacts the network's operation. The number of available network weights limits the scale of the candidate solution population. We address this challenge by creating multiple variants of the CNN architecture: we create distinct versions of the CNN by assigning various values to each of the six identified hyperparameters. We then use the SSA to individually train each version, using a diverse set of candidate solutions for training. This diversified approach to hyperparameters primarily aims to maximize the classification accuracy of CNNs. By systematically exploring and optimizing the hyperparameters, we aim to extract the best possible performance from CNNs, pushing the boundaries of their classification capabilities.

4.1 Individual representation

To scrutinize the scalability and robustness of the SSA-CNNs method, we undertook extensive training and evaluation across multiple CNN architectures, datasets, and a range of hyperparameter configurations. This iterative process led us to the current set of hyperparameters and settings. Each individual is characterized by a 6-dimensional vector corresponding to the following hyperparameters for CNNs. Each of them is defined within a specific range, with lower and upper boundaries, as follows:
• Number of kernels: 32, 64, 128, 256, 512, or 1024
• Kernel size: 3, 5, or 7
• Pool size: 3, 5, or 7
• Dropout rate: 0.2, 0.3, or 0.4
• Learning rate: 0.001, 0.0001, or 0.00001
• Hidden units: 64, 128, 256, 512, or 1024

Figure 3 shows a visual representation of an individual in the context of this work.

Number of kernels | Kernel size | Pool size | Dropout rate | Learning rate | Hidden units

Figure 3: Representation of an SSA individual.

4.2 Fitness evaluation

Our method's goal, consistent with the principles of all metaheuristic algorithms, is to quickly identify an individual with superior accuracy (or minimized error). In our current approach, fitness evaluation involves assessing the accuracy of individual CNNs, each of which is represented as an independent entity. The architectural details of each salp are stored in the population. These details are then transferred to a CNN with the corresponding architecture. The CNN model is then trained on the supplied training data for multiple epochs to evaluate the "salp." Figure 4 depicts the flowchart of the present method.

Figure 4: Flowchart of the current work.
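To make the fitness evaluation concrete, the sketch below shows one way to decode a 6-dimensional salp position into the hyperparameters of Section 4.1, build the corresponding CNN, train it, and return an error value for the SSA to minimize. It assumes Keras/TensorFlow, one-hot labels, 28×28 grayscale inputs, and a simplified network with a single hidden FC layer; the decoding scheme and function names are our own illustration, not the exact code of this work.

```python
import tensorflow as tf

# Candidate values for each gene of a salp (Section 4.1).
CHOICES = [
    [32, 64, 128, 256, 512, 1024],  # number of kernels
    [3, 5, 7],                      # kernel size
    [3, 5, 7],                      # pool size
    [0.2, 0.3, 0.4],                # dropout rate
    [1e-3, 1e-4, 1e-5],             # learning rate
    [64, 128, 256, 512, 1024],      # hidden units
]

def decode(position):
    """Map a 6-dimensional position with values in [0, 1] to concrete hyperparameters."""
    return [c[min(int(p * len(c)), len(c) - 1)] for p, c in zip(position, CHOICES)]

def fitness(position, x_train, y_train, x_val, y_val, epochs=10):
    """Build, train, and score a CNN for one salp; return 1 - accuracy (lower is better)."""
    n_kernels, k_size, p_size, dropout, lr, hidden = decode(position)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(n_kernels, k_size, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(p_size, padding="same"),
        tf.keras.layers.Conv2D(n_kernels, k_size, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(p_size, padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(hidden, activation="relu"),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=128, epochs=epochs, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)
    return 1.0 - acc
```

In combination with the SSA sketch shown after Figure 2, a call such as `ssa_minimize(lambda p: fitness(p, x_train, y_train, x_val, y_val), lb=[0.0] * 6, ub=[1.0] * 6)` would drive the hyperparameter search.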
5 Experimental evaluation

• Dataset

To assess the effectiveness of our approach, we selected the MNIST and Fashion-MNIST datasets for their widespread use in the deep learning community, manageable dataset sizes, and diverse content, enabling us to effectively assess the generalizability of the current work across different domains.
▪ The MNIST dataset. This dataset is a widely used benchmark for image classification tasks, and it consists of 70,000 grayscale images of handwritten digits. The images are 28 × 28 pixels in size [17], and they have been preprocessed, normalized, and formatted to improve their consistency [43]. The MNIST dataset is divided into two sets: 60,000 images for training and 10,000 images for testing. The training set is used to train the image recognition model, and the test set is used to evaluate the model's performance. The MNIST dataset is valuable for developing and evaluating image recognition models. It is a standardized benchmark that allows researchers to compare their results with those of other researchers.
▪ The Fashion-MNIST dataset comprises 70,000 grayscale images of fashion items from 10 categories, each having 7,000 images. The images are of size 28 × 28 pixels. The dataset has a training set of 60,000 images and a test set of 10,000 images. This dataset is intended to replace the original MNIST dataset, as it has the same image dimensions, data format, and training/testing split structure [11].

• Parameter settings

The parameters used in this study are listed in Table 2. The table has two categories of parameters: CNN training and SSA. The first category defines the basic CNN architecture. These include the number of convolutional layers, the number of pooling layers, the activation function for convolutional layers, the stride, the padding configuration, the specifications for hidden layers, the activation function for FC layers, the activation function for the output layer, the chosen loss function, the optimizer selection, the metrics used, the designated epochs, and the batch size. The second category controls the behavior of the SSA and has two variables: the number of salps and the maximum number of generations.

Table 2: Parameters used for evaluating the current work.
CNN parameters | Values
No. of convolutional layers | 2
No. of pooling layers | 2
Activation function for convolutional layers | ReLU
Stride | 1
Padding | same
Hidden layers | 3
Activation function for FC layers | ReLU
Activation function for the output layer | softmax
Loss function | categorical crossentropy
Optimizer | Adam
Metrics | accuracy
Epochs | 10
Batch size | 128
SSA parameters | Values
Number of salps | 15
Max. generations | 100

We utilized Google Colab, which offers a cost-free environment and hardware acceleration for Python 3 programming, equipped with a T4 GPU.

6 Results and analysis

We present the results of our approach and compare it to other methods in terms of accuracy on the MNIST and Fashion-MNIST datasets. We also show the optimal architectures found in this work. Tables 3 and 4 show the experimental results of our approach compared to other methods.

Table 3: Comparison of the accuracy of the current work and different models on the MNIST dataset.
Ref. | Method | Acc. %
[13] | CNN-PSO | 95.08
[14] | CNN-MAA | 98.75
[15] | GA | 99.4
[16] | PSO (DPSO) | 99.3
[17] | SI (tree growth & firefly) algorithms | 99.18
[18] | PSO (LDWPSO) | 98.95
[22] | PSO (IntelliSwAS) | 95
[24] | PSO & GWO (HPSGW) | 99.4
[25] | GA (SDSGA) | 99.2
— | Our method (SSA-CNNs) | 99.46

Table 4: Comparison of the accuracy of the current work and different models on the Fashion-MNIST dataset.
Ref. | Method | Acc. %
[16] | PSO (DPSO) | 92.92
[20] | PSO & ACO | 94.5
[21] | CAF | 91.21
[26] | MCNN15 | 94.04
[27] | Multiple optimizers | 91
[28] | Approximate dynamic learning rate update algorithm | 93
[29] | LACHS | 93.34
— | Our method (SSA-CNNs) | 94.53

Table 3 illustrates that the SSA-CNNs method (99.46%) outperforms most other techniques on the MNIST dataset. Specifically, SSA-CNNs achieved higher accuracy than CNN-PSO (95.08%), CNN-MAA (98.75%), PSO (DPSO) (99.3%), PSO (LDWPSO) (98.95%), PSO (IntelliSwAS) (95%), and the SI (tree growth & firefly) algorithms (99.18%). It also performed on par with the best-performing techniques, GA (99.4%), PSO & GWO (HPSGW) (99.4%), and GA (SDSGA) (99.2%). These results suggest that the SSA-CNNs method is highly competitive and may offer superior accuracy compared to other optimization techniques when applied to the MNIST dataset. It showcases the effectiveness of SSA in enhancing CNNs for image classification tasks. Table 4 shows that the SSA-CNNs method achieves the best accuracy on the Fashion-MNIST dataset, with an accuracy of 94.53%. This is higher than the accuracy of any of the other techniques listed, including PSO (DPSO) (92.92%), PSO & ACO (94.5%), CAF (91.21%), MCNN15 (94.04%), multiple optimizers (91%), the approximate dynamic learning rate update algorithm (93%), and LACHS (93.34%). This suggests that the SSA-CNNs method is a promising approach for image classification tasks and may be particularly well-suited for more challenging datasets such as Fashion-MNIST. Overall, the SSA-CNNs technique has proven to be a highly effective method for image classification on both the MNIST and Fashion-MNIST datasets. With a remarkable 99.46% accuracy on MNIST and a competitive 94.53% accuracy on Fashion-MNIST, SSA-CNNs showcases its versatility and robustness. This approach, which integrates SSA with CNNs, offers a promising path for optimizing image classification tasks, consistently delivering outstanding results. The top-performing individuals achieving the highest accuracy on MNIST and Fashion-MNIST are depicted in Figures 5 and 6, respectively.
Number of kernels | Kernel size | Pool size | Dropout rate | Learning rate | Hidden units
512 | 3×3 | 3×3 | 0.3 | 0.001 | 512

Figure 5: Best individual for the MNIST dataset.

Figure 5 presents the best individual found for MNIST; the corresponding layered architecture is shown in Figure 7.

Number of kernels | Kernel size | Pool size | Dropout rate | Learning rate | Hidden units
512 | 5×5 | 5×5 | 0.3 | 0.0001 | 128

Figure 6: Best individual for the Fashion-MNIST dataset.

Figure 7: Layered architecture produced by the present method for the MNIST dataset.

7 Conclusion

We present a new approach to optimizing CNNs using the SSA method. This approach has several advantages. It balances accuracy, computational efficiency, and training time well. It also achieves exceptional classification accuracy on the MNIST and Fashion-MNIST datasets. This SSA-based optimization method outperforms other algorithms that require significant computational resources and time, making it a promising candidate for practical applications. The proposed method allows CNNs to be seamlessly integrated into real-world scenarios, particularly in resource-constrained and time-sensitive settings. Future research could explore the adaptability of the SSA-based optimization technique to other deep-learning architectures and tasks beyond computer vision. Additionally, delving into the theoretical underpinnings of the SSA algorithm and refining parameter tuning strategies could help broaden its adoption in optimization and machine learning. Our plan to improve the SSA-based hyperparameter optimization framework involves four main goals. Firstly, we will test the effectiveness of the SSA-based framework on different CNN architectures and datasets. Secondly, we intend to create new or improved SSA variants for hyperparameter optimization. Thirdly, we will integrate the SSA-based framework with other hyperparameter optimization techniques to develop a hybrid approach. Lastly, we will apply the SSA-based framework to other machine learning tasks, like natural language processing and computer vision. By pursuing these goals, we aim to make essential contributions to hyperparameter optimization.

References
[1] Gadri, S., Developing an efficient predictive model based on ML and DL approaches to detect diabetes. Informatica, 2021. 45(3). http://dx.doi.org/10.31449/inf.v45i3.3041
[2] Abdulla, M. and A. Marhoon, Agriculture based on Internet of Things and Deep Learning. Iraqi Journal for Electrical and Electronic Engineering, 2022. 18(2): p. 1-8. http://dx.doi.org/10.37917/ijeee.18.2.1
[3] Xu, Y., et al., Batch normalization with enhanced linear transformation. arXiv preprint arXiv:2011.14150, 2020. https://doi.org/10.48550/arXiv.2011.14150
[4] Shrestha, A. and A. Mahmood, Review of deep learning algorithms and architectures. IEEE Access, 2019. 7: p. 53040-53065. http://dx.doi.org/10.1109/access.2019.2912200
[5] Hassan, N.F.A., A.A. Abed, and T.Y. Abdalla, Face mask detection using deep learning on NVIDIA Jetson Nano. International Journal of Electrical & Computer Engineering (2088-8708), 2022. 12(5). http://dx.doi.org/10.11591/ijece.v12i5.pp5427-5434
[6] Gaafar, A.S., J.M. Dahr, and A.K. Hamoud, Comparative Analysis of Performance of Deep Learning Classification Approach based on LSTM-RNN for Textual and Image Datasets. Informatica, 2022. 46(5). http://dx.doi.org/10.31449/inf.v46i5.3872
[7] Wang, Y., H. Zhang, and G. Zhang, cPSO-CNN: An efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks. Swarm and Evolutionary Computation, 2019. 49: p. 114-123. http://dx.doi.org/10.1016/j.swevo.2019.06.002
[8] Darwish, A., D. Ezzat, and A.E. Hassanien, An optimized model based on convolutional neural networks and orthogonal learning particle swarm optimization algorithm for plant diseases diagnosis. Swarm and Evolutionary Computation, 2020. 52: p. 100616. http://dx.doi.org/10.1016/j.swevo.2019.100616
[9] Alzubaidi, L., et al., Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data, 2021. 8(1): p. 53. DOI: 10.1186/s40537-021-00444-8
[10] LeCun, Y., The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.
[11] Xiao, H., K. Rasul, and R. Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017. https://doi.org/10.48550/arXiv.1708.07747
[12] Zhang, H., et al., Differential evolution-assisted salp swarm algorithm with chaotic structure for real-world problems. Eng Comput, 2022. 39(3): p. 1735-1769. DOI: 10.1007/s00366-021-01545-x
[13] Syulistyo, A.R., et al., Particle swarm optimization (PSO) for training optimization on convolutional neural network (CNN). Jurnal Ilmu Komputer dan Informasi, 2016. 9(1): p. 52-58. http://dx.doi.org/10.21609/jiki.v9i1.366
[14] Ayumi, V., et al., Optimization of convolutional neural network using microcanonical annealing algorithm. In 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), 2016. IEEE. http://dx.doi.org/10.1109/icacsis.2016.7872787
[15] Yoo, J.-H., et al., Optimization of hyper-parameter for CNN model using genetic algorithm. In 2019 1st International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), 2019. IEEE. http://dx.doi.org/10.1109/icecie47765.2019.8974762
[16] Guo, Y., J.-Y. Li, and Z.-H. Zhan, Efficient hyperparameter optimization for convolution neural networks in deep learning: A distributed particle swarm optimization approach. Cybernetics and Systems, 2020. 52(1): p. 36-57. http://dx.doi.org/10.1080/01969722.2020.1827797
[17] Bacanin, N., et al., Optimizing Convolutional Neural Network Hyperparameters by Enhanced Swarm Intelligence Metaheuristics. Algorithms, 2020. 13(3). DOI: 10.3390/a13030067
[18] Serizawa, T. and H. Fujita, Optimization of convolutional neural network using the linearly decreasing weight particle swarm optimization. arXiv preprint arXiv:2001.05670, 2020. https://doi.org/10.48550/arXiv.2001.05670
[19] Greeshma, K. and J.V. Gripsy, Image classification using HOG and LBP feature descriptors with SVM and CNN. Int J Eng Res Technol, 2020. 8(4): p. 1-4. DOI: 10.17577/IJERTCONV8IS04021
[20] Lankford, S. and D. Grimes, Neural architecture search using particle swarm and ant colony optimization. 2020.
[21] Ying, Y., et al., Improving convolutional neural networks with competitive activation function. Security and Communication Networks, 2021. 2021: p. 1-9.
[22] Nistor, S.C. and G. Czibula, IntelliSwAS: Optimizing deep neural network architectures using a particle swarm-based approach. Expert Systems with Applications, 2022. 187: p. 115945. http://dx.doi.org/10.1016/j.eswa.2021.115945
[23] Moodie, E.E. and D.A. Stephens, Comment: Clarifying endogeneous data structures and consequent modelling choices using causal graphs. 2020. http://dx.doi.org/10.1214/20-sts777
[24] Challapalli, J.R. and N. Devarakonda, A novel approach for optimization of convolution neural network with hybrid particle swarm and grey wolf algorithm for classification of Indian classical dances. Knowledge and Information Systems, 2022. 64(9): p. 2411-2434. http://dx.doi.org/10.1007/s10115-022-01707-3
[25] Raji, I.D., et al., Simple deterministic selection-based genetic algorithm for hyperparameter tuning of machine learning models. Applied Sciences, 2022. 12(3): p. 1186. http://dx.doi.org/10.3390/app12031186
[26] Nocentini, O., et al., Image classification using multiple convolutional neural networks on the fashion-MNIST dataset. Sensors, 2022. 22(23): p. 9544. http://dx.doi.org/10.3390/s22239544
[27] Sumera, S.R., N. Anjum, and K. Vaidehi, Implementation of CNN and ANN for Fashion-MNIST-Dataset using Different Optimizers. Indian Journal of Science and Technology, 2022. 15(47): p. 2639-2645. http://dx.doi.org/10.17485/ijst/v15i47.1821
[28] Shin, S.-Y., G. Jo, and G. Wang, A Novel Method for Fashion Clothing Image Classification Based on Deep Learning. Journal of Information and Communication Technology, 2023. 22(1): p. 127-148. http://dx.doi.org/10.32890/jict2023.22.1.6
[29] Liu, D., et al., Hyperparameters Optimization of Convolutional Neural Network Based on Local Autonomous Competition Harmony Search Algorithm. Journal of Computational Design and Engineering, 2023: p. qwad050. http://dx.doi.org/10.1093/jcde/qwad050
[30] Altwaijry, N. and I. Al-Turaiki, Arabic handwriting recognition system using convolutional neural network. Neural Computing and Applications, 2021. 33(7): p. 2249-2261. http://dx.doi.org/10.1007/s00521-020-05070-8
[31] Ren, L., et al., A data-driven auto-CNN-LSTM prediction model for lithium-ion battery remaining useful life. IEEE Transactions on Industrial Informatics, 2020. 17(5): p. 3478-3487. http://dx.doi.org/10.1109/tii.2020.3008223
[32] Ashraf, A.H., et al., Weapons detection for security and video surveillance using CNN and YOLO-v5s. CMC-Comput. Mater. Contin., 2022. 70: p. 2761-2775. http://dx.doi.org/10.32604/cmc.2022.018785
[33] Zamir, M., et al., Face Detection & Recognition from Images & Videos Based on CNN & Raspberry Pi. Computation, 2022. 10(9): p. 148. http://dx.doi.org/10.3390/computation10090148
[34] Li, C., et al., Segmenting objects in day and night: Edge-conditioned CNN for thermal image semantic segmentation. IEEE Transactions on Neural Networks and Learning Systems, 2020. 32(7): p. 3069-3082. http://dx.doi.org/10.1109/tnnls.2020.3009373
[35] Haque, M.A., et al., Experimental evaluation of CNN architecture for speech recognition. In First International Conference on Sustainable Technologies for Computational Intelligence: Proceedings of ICTSCI 2019, 2020. Springer. http://dx.doi.org/10.1007/978-981-15-0029-9_40
[36] Khudeyer, R.S. and N.M. Almoosawi, Combination of machine learning algorithms and Resnet50 for Arabic Handwritten Classification. Informatica, 2023. 46(9). http://dx.doi.org/10.31449/inf.v46i9.4375
[37] Fregoso, J., C.I. Gonzalez, and G.E. Martinez, Optimization of convolutional neural networks architectures using PSO for sign language recognition. Axioms, 2021. 10(3): p. 139. http://dx.doi.org/10.3390/axioms10030139
[38] Alhijaj, J.A. and R.S. Khudeyer, Integration of EfficientNetB0 and Machine Learning for Fingerprint Classification. Informatica, 2023. 47(5). http://dx.doi.org/10.31449/inf.v47i5.4724
[39] Al, N.M.A.-M.M. and R.S. Khudeyer, ResNet-34/DR: a residual convolutional neural network for the diagnosis of diabetic retinopathy. Informatica, 2021. 45(7). http://dx.doi.org/10.31449/inf.v45i7.3774
[40] Mirjalili, S., et al., Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems. Advances in Engineering Software, 2017. 114: p. 163-191. http://dx.doi.org/10.1016/j.advengsoft.2017.07.002
[41] Duan, Q., et al., Improved salp swarm algorithm with simulated annealing for solving engineering optimization problems. Symmetry, 2021. 13(6): p. 1092. http://dx.doi.org/10.3390/sym13061092
[42] Faris, H., et al., Salp swarm algorithm: theory, literature review, and application in extreme learning machines. Nature-Inspired Optimizers: Theories, Literature Reviews and Applications, 2020: p. 185-199. http://dx.doi.org/10.1007/978-3-030-12127-3_11
[43] Wu, H., CNN-Based Recognition of Handwritten Digits in MNIST Database. Research School of Computer Science, The Australian National University, Canberra, 2018.