https://doi.org/10.31449/inf.v46i5.3872 Informatica 46 (2022) 21–28

Comparative Analysis of Performance of Deep Learning Classification Approach based on LSTM-RNN for Textual and Image Datasets

Alaa Sahl Gaafar and Jasim Mohammed Dahr
Directorate of Education in Basrah, Basrah, Iraq
E-mail: alaasy.2040@gmail.com and Jmd20586@gmail.com

Alaa Khalaf Hamoud
Department of Computer Information Systems, University of Basrah, Basrah, Iraq
E-mail: alaa.hamoud@uobasrah.edu.iq

Keywords: deep learning, neural networks, LSTM, RNN, accuracy, speed, text, image, dataset

Received: December 14, 2021

Deep learning approaches can be applied to large amounts of data in order to simplify and improve the engineering practice of automated decision-making, rather than relying on human-encoded heuristics. The need for faster and more effective decisions about systems, processes, and applications gave rise to many artificial intelligence-motivated approaches such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and fuzzy analytics. Deep learning deploys multiple layers of cascaded processing elements to enable feature extraction and transformation; these layers provide multiple levels of representation corresponding to distinct abstraction levels. Applications of deep learning algorithms include weather forecasting, object recognition, stock market performance forecasts, medical diagnosis, and emergency warning systems. This paper investigates the performance of the deep learning approach with respect to processing components, data representation, and data types. To achieve this, a deep learning algorithm based on a long short-term memory recurrent neural network (LSTM-RNN) was utilized to learn hidden patterns and features in a textual and an image dataset respectively. The outcomes reveal that the image-based deep learning model was faster, owing to well-defined patterns of data representation, than the sentiment-based deep learning model, at 3.49 mins against 18.25 mins. The LSTM-RNN on images also offered better classification accuracy, at 96.50% against 85.69%, a consequence of the network architecture, the processing elements, and the features of the underlying datasets.

1 Introduction

The concept of transfer learning, or knowledge transfer, is a machine learning technique that allows a model pre-trained in a source domain to be reused as a point of reference in a dissimilar but related target domain. Machine learning uses an algorithm to perform specific functions or tasks, and transfer learning extends this capability to fresh tasks. Deep learning and machine learning are data-driven models which can be trained on an end-to-end basis to accomplish tasks such as feature extraction, feature selection, malignancy forecasting, and optimization [1]. The CNN is a feature-representation model that uses a collection of convolutional kernels in its convolution layer to extract features from high-dimensional structured data, especially multi-channel inputs such as colour images. Many variants of the CNN structure have evolved over the years, including ResNet, VGG, GoogleNet, AlexNet, and DenseNet for computer vision and image processing tasks [2]. CNN models perform computations using numerous processing layers to generate features inside raw data at multiple levels of representation and hierarchical abstraction [1].
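As a concrete illustration of the transfer-learning idea above, the following is a minimal Keras sketch, not taken from this paper: the choice of VGG16, the ImageNet weights, the input shape, and the 10-class head are all illustrative assumptions.

```python
import tensorflow as tf

# A CNN pre-trained on the source domain (ImageNet) is reused as a frozen
# feature extractor; only a fresh head is trained on the target task.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep the transferred knowledge fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # hypothetical 10-class target task
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```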
The effectiveness of the CNN depends on exploiting domain knowledge about feature invariances inside its structure, and it has been deployed on diverse recognition tasks such as image processing. This puts the CNN above standard fully connected deep neural networks (DNNs) [3]. Content-based learning methods have been applied to textual information for the purpose of revealing hidden structures and meanings represented in linguistic and writing styles. Natural language processing enables multiple levels of language, represented in different hand-crafted features, to be extracted before classification operations on sentences, phrases, and other syntactic features. Machine learning and deep learning techniques have been utilized to extract features [4] within news content in order to classify fake news and rhetoric in cases of political bias; classical approaches rely on SVM, random forest, vector space models, and bag-of-words features. CNNs and RNNs have recently been adopted for extracting the explicit and latent multi-modal features within news contents [5]. The concept of fake news gave rise to machine learning based on enormous annotated data for the English language and a few others. This article investigates the effectiveness of deep learning models in two cases of data representation, images and text, with LSTM-RNN classification.

2 Literature review

2.1 Overview of Convolutional Neural Network

The CNN has its origin in the biological science domain and is used for classification tasks [6]. It is considered a special neural network because of its arrangement in terms of full weight sharing. In this arrangement, the foremost layer comprises various feature maps (the convolution layer) [3,7]. The convolution layer consists of neurons that accept input from a local receptive field matching features of a limited frequency range. The sets of neurons belonging to the same feature map share the same weights, known as kernels (or filters), which accept distinct frequency-shifted inputs. Consequently, the convolution layer performs convolution of the kernels with the lower-layer activations. The pooling layer is located on top of the convolution layer and computes a lower-resolution depiction of the convolution layer activations by means of sub-sampling. The pooling function generates certain statistics of the activation, applied to the neurons using a frequency-band window realized from the same feature map in the convolution layer; the max-pooling function computes the feature with the maximum value over the matching frequency bands. In particular, max pooling and weight sharing are significant for attaining invariance to small frequency shifts and for reducing over-fitting through the reduced number of trainable parameters. Higher-level features can be realized by stacking pairs of convolution and pooling layers; standard fully connected layers are then introduced to aggregate the diverse bands of features. In image processing, the CNN is considered potent because its weight-sharing approach enables similar image patterns to be detected at any point in the image. The limited weight-sharing approach is also appropriate, in which a different (unshared) kernel is applied at each frequency window inside the convolution layer; it processes only a limited range of input bands and produces an output band in the pooling layer.
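To make the convolution-pooling-dense stacking described in this subsection concrete, here is a minimal Keras sketch; the layer counts, filter sizes, and the 28 x 28 grayscale input are illustrative assumptions rather than the configuration used in this study.

```python
import tensorflow as tf

# Convolution layers apply shared kernels (filters) over local receptive fields;
# pooling layers sub-sample the activations to a lower-resolution depiction;
# stacked convolution-pooling pairs yield higher-level features, which the
# fully connected layers then aggregate for classification.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),  # max pooling keeps the strongest response
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),    # fully connected aggregation
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```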
The basic architecture of the CNN is depicted in Figure 1 [8,9]. CNNs are categorized according to the number of layers in their architectures, including shallow-layer CNNs, deep-layer CNNs, feature extraction from pre-trained CNNs, and fine-tuned pre-trained CNNs, which are implemented for many image classification tasks [9].

Figure 1: The basic structure of CNN (input layer, convolution layer, pooling layer, other fully connected hidden layers, output layer).

2.2 Model of CNNs

The CNN model is similar in structure to the artificial neural network (ANN) in that both consist of neurons that improve the results through learning. In both the ANN and the CNN, a neuron receives input and performs an operation to adjust the weights assigned to each connection between neurons. The loss functions associated with the last layer of the ANN are the same in the CNN. The CNN is often used in pattern recognition on images: the features of the image are encoded into the network architecture to sharpen the focus and to reduce the parameters required in the model set-up step [10]. The basic neuron models that constitute the ANN and the CNN are [11]:

a. Single-input neuron. In this model, shown in Figure 2, the scalar input p and the scalar weight w are multiplied to produce wp, which is sent to the summation function. The other input, the constant 1, is multiplied by the bias b and likewise passed to the summation function. The transfer function receives the output n of the summation function and produces the scalar output of the neuron.

Figure 2: Single input neuron.

b. Multiple-input neuron. The neuron in this model has more than one input. A weight element is assigned to each individual input in the network; the weight matrix begins with w_{1,1}, w_{1,2}, ..., w_{1,R}. The bias within the neuron is summed with the weighted inputs to produce the net input as in equation (1):

n = w_{1,1}p_1 + w_{1,2}p_2 + ... + w_{1,R}p_R + b    (1)

The output is then calculated as in equation (2):

a = f(Wp + b)    (2)

The recurrent neural network (RNN) is a class of powerful deep neural networks that uses its internal memory, with loops, to deal with sequence data; the RNN architecture is also the basic structure of the LSTM [12]. A hidden layer in an RNN receives an input vector and generates an output vector. RNNs exhibit a superior capability of adapting themselves to predict nonlinear time series problems. However, RNNs are prone to the vanishing gradient problem during backpropagation learning, making them unsuitable for learning over long time lags and unable to account for long-term dependencies. This limited the widespread usage of RNNs and gave rise to improved approaches, including the long short-term memory (LSTM) and gated recurrent unit (GRU) architectures. In recent applications, LSTMs have shown promise on sequence-based computations with long-term dependencies; the GRU is an abridged LSTM architecture, a relative innovation used in models such as SemSeq4FD [13].
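Before moving on, a small NumPy check of the neuron computation in equations (1) and (2); the weights, bias, inputs, and the tanh transfer function are arbitrary illustrative choices.

```python
import numpy as np

def neuron(W, p, b, f=np.tanh):
    """Multiple-input neuron: a = f(Wp + b), as in equations (1) and (2)."""
    n = W @ p + b  # net input: w_{1,1}p_1 + w_{1,2}p_2 + ... + w_{1,R}p_R + b
    return f(n)

W = np.array([[0.5, -1.0, 2.0]])  # 1 x R weight matrix
p = np.array([1.0, 0.5, -0.25])   # R scalar inputs
b = 0.1                           # bias, multiplied by the constant input 1
print(neuron(W, p, b))            # scalar output a of the neuron
```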
2.3 The Real-valued CNNs

In computer vision, CNNs are considered the most effective models for performing tasks. After the huge success of LeNet in the domain of digit recognition, AlexNet was developed as the foremost deep CNN; it greatly outperformed previous image classification models [14]. Thereafter, several models with complicated, deep structures were developed, including VGG [15] and ResNet [16], which performed incredibly well in the ILSVRC. CNN models have also been extended to low-level vision tasks, as in the case of SRCNN for image super-resolution, and have been applied to denoising and inpainting with encouraging outcomes. Recently, there have been a number of efforts to extend real-valued neural networks to other number fields [1]. In the meantime, complex-valued neural networks have been developed and shown to offer better generalization capability [17], though not without optimization difficulties. In particular, audio signals have complex-number representations, which makes complex CNNs ideal for these tasks rather than real-valued CNNs; there is evidence that deep complex networks offer improved outcomes compared with real-valued models on audio-based tasks [18]. Another deep quaternion network was put forward by [19], in which quaternion convolution substitutes the real multiplications and the quaternion kernel is not parameterized further.

2.4 Deep Sentiment with CNNs

The concept of sentiment analysis became an interesting area of research after the breakout of social networks and e-commerce websites. The foremost meaning of sentiment analysis is to understand public views and opinions concerning a particular topic, trend, or product by means of tweets or reviews; it was later extended to the extraction of political and social trends through the annotation of opinions. Recently, machine learning algorithms have been deployed to automatically generate annotations from diverse points of view, forming a real-world knowledge base that maps various scenarios to associated emotions for the purpose of extracting the overall sentiment of a statement. The advent of computer vision and natural language processing offered the opportunity to combine enormous labeled datasets, which makes deep learning models suitable. There is a shift from the traditional approach of extracting hand-crafted features from images and texts and supplying them to a classification model; at present, it is possible to build end-to-end models that simultaneously learn the feature representation and classify accordingly. Deep-learning-inspired models are capable of undertaking many tasks, including machine translation, object detection, word embedding, sentiment analysis, named entity recognition in NLP, question answering, image classification, image generation, and unsupervised feature learning in computer vision. Conducting sentiment analysis involves progressive training on labeled data, known as transfer learning, with models such as CNNs and LSTMs [20].

3 Related works

3.1 Image classification

A hybrid transfer learning approach based on the AlexNet and VGG-16 CNNs was developed for detecting cervical cancer. The technique is based on reducing the filters needed for the task, especially in the feature selection process, by means of the convolutional layers of the pre-trained networks considered [9]. A fresh image dataset, together with expert-annotated diagnoses, for image-based cervical disease classification tasks was proposed by [21]; a fine-tuned CNN model extracts the features in the images before classification with classifiers such as ensemble-tree models and multi-layer perceptrons.
The effectiveness of randomized and optimized weights of the CNN model was investigated by [22]. These models were applied to linear and nonlinear data types, with more desirable outcomes for the CNN with optimized weights. In a separate work, the effectiveness of the CNN was tested in classifying habitus images for the purpose of recognizing insects, reaching 74.9% according to [4]. The work in [23] sought optimal deep learning schemes for point-of-care ultrasound (POCUS) cardiac image classification tasks: six pretrained CNNs, namely AlexNet, Inception-v4, VGG-16, VGG-19, ResNet50, and DenseNet201, were trained with five samples of POCUS images, generating accuracies between 85.6% and 96%; VGG-16 was the best and DenseNet201 the least performing classifier. A multi-object-based CNN (OCNN) was proposed by [24] to improve the performance of large-area classification across diverse landscapes; it is composed of image segmentation, skeleton-driven object analysis, and a final CNN classification. The overall results showed 87.2% for remotely sensed images. The peculiarity of hyperspectral images (HSI) makes them challenging for classification tasks due to the enormous number of bands with close correlation in the spatial and spectral domains; the problem of small training samples is even more difficult, which led to the use of deep CNNs for extracting spatial features in HSI. In addition, the AdaMax, RMSprop, SGD, Adadelta, and Adagrad optimizers were compared for optimizing the deep CNN model; the outcomes revealed that the deep CNN with the Adam optimizer produced the best performance at 98.97 ± 0.81% according to [25]. Another similar work on HSI classification utilized a CNN with a Markov Random Field (MRF) in order to enhance effectiveness [26]. Again, in [27], the authors investigated a combination of CNNs for effectively learning joint spatial-spectral features across multiple scales in HSI, which produced 66.73% accuracy. Machine learning techniques have also been adopted for detecting the early stages and severity of diabetic retinopathy based on colour fundus images: a hyperparameter-tuned Inception-v4 (HPTI-v4) model was utilized by [28] for detection and classification tasks with 99.49% accuracy. In [29], the authors proposed an enhanced feature-selection-based model for medical image classification; an opposition-based crow search algorithm was used to optimize the DL model, reaching 95.22% accuracy.

3.2 Textual classification

An ensemble model founded on CNN and LSTM for extracting temporal information and the local structure of textual datasets was developed by [20]; the proposed model outperforms the individual models and previous studies. A fake news classification model based on reinforcement learning over textual content was proposed by [30]. It is made up of an annotator, a reinforced selector, and a fake news detector for automatically generating labels for unlabeled news items, selecting high-quality samples for training, and detecting fake or genuine news items. A fusion method of multi-modal feature extraction with CNN, Inception, Multilingual-BERT, XLNet, XLM-Roberta, and VGG16 models over image and textual datasets was used to investigate social media memes by [31]. Sentiment analysis of textual items based on lexicon-integrated two-channel CNN-LSTM family models was proposed by [32]; it is a deep learning approach that combines CNN and LSTM/BiLSTM channels concurrently, with encouraging results on the different datasets considered.
Another study on the multimodal fusion of linguistic and speech signals for the detection of depression using a gated CNN, LSTM, and CNN was carried out by [33]. The hybrid model, with audio features extracted by the GCNN-LSTM model and text features acquired from the CNN-LSTM architecture, was better in both the CCC and RMSE parameters on the development and test datasets. An automatic approach for labeling on-topic social media posts composed of visual-textual features was experimented with by [7]: textual and visual features were extracted by a word-embedding CNN and the Inception-V3 CNN, and the classification process utilized the concatenated features for effective and timely disaster reporting and mitigation. A model based on a deep CNN and LSTM was developed to enhance the accuracy of image captioning [34]. A multimodal approach based on a gated CNN and BERT-CNN was used for depression detection in the work by [33]: the speech modality features were extracted from a trained VGG-16 network and then passed through gated CNN and LSTM layers, while BERT embedding features were extracted from the text and then passed through CNN and LSTM layers; together these provided Patient Health Questionnaire score estimates. A texture recognition task based on an optimized CNN was developed by [6]; it optimized the filter values, weights, and bias values in the convolution and fully connected layers by means of a whale optimization algorithm. The Hierarchical Graph Attention Network (HGAT) makes use of a hierarchical attention architecture, with schema-level and node-level attention, to recognize fake news in online news articles; according to [35], this is effective for heterogeneous networks and requires no previous knowledge. A deep CNN known as FNDNet was developed to identify fake news across online social media by automatically learning the discriminatory features; this model extracts diverse features of news articles at each layer, with outcomes superior to available models according to [36].

4 Research methodology

This section presents the conceptual models for image and textual data feature extraction and classification. The effectiveness of the models is validated accordingly.

4.1 Description of the proposed model

The complete structure of the proposed model is composed of input, process, and output components, as shown in Figure 3. The classification model utilizes an image and a textual dataset, which are received at the input component of the model. The distinct datasets are preprocessed through the removal of noise and redundancy for the purpose of obtaining enhanced input. In the case of the image dataset, the preprocessing and augmentation involve whitening the image samples and, after up-sampling, selecting a 32 by 32 crop size [37]. The textual dataset, during preprocessing, was grouped into similar clusters [38]; textual information was converted into vectors of numeric values, and infrequent classes were removed using the semantic and syntactic association of words by means of natural language processing (NLP) [39]. The process component undertakes feature extraction, feature selection, and classification after the training procedures. These are achieved with the LSTM-RNN deep learning algorithms situated within the process component of the model. The output component provides the results of the evaluation carried out by the distinct LSTM-RNN deep classification models using a test dataset, that is, a portion of the input dataset. The results are expressed as error rates and percentage accuracy.

Figure 3: Conceptual structure of the proposed deep learning models (input component receiving the image and textual datasets, process component, output component).
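A sketch of the kind of preprocessing just described for the two branches is given below; the vocabulary cap of 1000 (matching the vocabulary size reported later in Table 4), the sequence length, and the simple pixel scaling used in place of full whitening are assumptions for illustration.

```python
import tensorflow as tf

# Textual branch: convert raw review text into vectors of numeric values (NLP).
vectorize = tf.keras.layers.TextVectorization(max_tokens=1000,
                                              output_sequence_length=100)
vectorize.adapt(["the movie was great", "the plot was weak"])  # toy corpus

# Image branch: scale grayscale pixels to [0, 1] as a simple stand-in for whitening.
def preprocess_image(image, label):
    return tf.cast(image, tf.float32) / 255.0, label
```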
4.2 Simulation settings

The minimum specifications for experimenting with the concept of textual and image-based classification with the LSTM-RNN model are presented in Table 1.

Hyperparameter training for the LSTM-RNN for text: choosing hyperparameters is a major aspect of most deep learning approaches and can be done manually or automatically; the goal is to minimize the cost and memory of execution. The learning algorithm uses the hyperparameter settings provided in Table 2 for training the context-specific dataset on the model.

Hyperparameter training for the LSTM-RNN for images: the complete hyperparameter values for the deep learning approach were set manually in order to reduce the memory and cost of execution. The settings used by the learning algorithm for training the context-specific dataset on the classifier are given in Table 3.

The steps for the proof of concept in both RNN-based classification cases are as follows:
i. The input data are collected from the standard textual and image repository (TensorFlow).
ii. The text and image datasets are preprocessed by removing noise and irrelevant items.
iii. The minimal experimental parameters are used to construct the two RNN models.
iv. The textual and image features are extracted using the deep neural networks built in step (iii) to generate the training sets.
v. The textual and visual features are used to train the RNN classifiers in both cases.
vi. The classification outcomes are generated and analyzed.

4.3 Dataset Description

The image dataset is the large Fashion MNIST image set of 221.83 MiB, comprising 70000 items with ten labels, namely T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot, downloaded from the TensorFlow image catalogue. A similar dataset was utilized in the work by [40]. The textual dataset is the large IMDB movie review collection from TensorFlow, of 32.06 MiB size, composed of customers' reviews with positive and negative sentiments; it was loaded with a buffer size of 10000 and a batch size of 64 and is categorized into texts and labels. A similar textual dataset was used to measure the performance of models in previous studies [20,36]. In both cases, the datasets were split into 70% for training and 30% for testing of the classification models.

Evaluation parameters: the performance of the two classification approaches is computed using the accuracy and elapsed-time metrics represented in equations (3) and (4) [29]:

Accuracy = (TN + TP) / (FN + FP + TN + TP)    (3)

Elapsed Time = ET − ST    (4)

where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, and ET and ST denote the execution end time and execution start time, respectively.
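For reference, a hedged sketch of loading the two datasets from the TensorFlow catalogue with the 70/30 split and the buffer and batch sizes of Section 4.3; realizing the split with percentage slices of the train partition is an assumption about how the data were divided.

```python
import tensorflow_datasets as tfds

# Fashion MNIST images: 70% of the items for training, 30% for testing.
img_train, img_test = tfds.load("fashion_mnist",
                                split=["train[:70%]", "train[70%:]"],
                                as_supervised=True)

# IMDB movie reviews: the same 70/30 split, then shuffled and batched
# with the buffer size of 10000 and batch size of 64 from Section 4.3.
txt_train, txt_test = tfds.load("imdb_reviews",
                                split=["train[:70%]", "train[70%:]"],
                                as_supervised=True)
txt_train = txt_train.shuffle(10000).batch(64)
txt_test = txt_test.batch(64)
```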
System   | Requirement              | Property
Hardware | Hard Disk Drive          | 68.35 GB
Hardware | RAM                      | 12.69 GB
Software | Operating system         | Python 3 Google Compute Engine backend (GPU)
Software | Simulator                | Google Colaboratory (Colab)
Software | Classification algorithm | LSTM-RNN
Software | Input types              | Text, Image
Software | Browser type             | Google Chrome, Version 91.0.4472.124 (Official Build) (64-bit)

Table 1: Simulation settings.

Hyperparameter           | Value
Network model            | Keras-LSTM
Number of layers         | 32
Embedding dimension      | 64
Number of dense layer(s) | 1
Max number of epochs     | 10
Gradient threshold       | 1
Dropout                  | 0.5
Activation               | relu
Optimizer                | Adam
Metrics                  | Accuracy
Loss function            | BinaryCrossentropy
Input type               | Textual

Table 2: Minimal parameters for the LSTM-RNN model for text.

Hyperparameter      | Value
Number of inputs    | 28
Number of steps     | 28
Classifier          | RNN-LSTM
Activation function | ReLU
Loss function       | CrossEntropyLoss
Number of neurons   | 150
Number of epochs    | 10
Number of outputs   | 10
Optimizer           | Adam
Learning rate       | 0.001
Metrics             | Accuracy
Input               | Grayscale images

Table 3: Minimal parameters for the LSTM-RNN for images.

4.4 Results and Discussion

The progress of the training and validation accuracy for the long short-term memory (LSTM) network structure of the RNN is shown in Figure 4. There is a strong correlation between the training accuracy and the validation accuracy over epochs 1 to 10, and the loss and validation loss over the textual data likewise diminish from epoch 1 to 10.

Figure 4: The training process for the accuracy and loss functions of textual learning.

The summary of the LSTM-RNN deep learning outcomes after training and validation on the textual data is presented in Table 4. From Table 4, the LSTM classified 1000 words into 5 classes after training and identified the hidden structures with considerable effectiveness at 85.69% accuracy. However, the time for training and validation of the deep learning model provided by the RNN is relatively ineffective at 18.25 mins.

S/N | Parameter              | Value
1.  | Dimension of embedding | 171
2.  | Number of classes      | 5
3.  | Number of vocabularies | 1000
4.  | Accuracy               | 0.8569
5.  | Loss                   | 0.3169
6.  | Elapsed Time           | 18.25 mins

Table 4: Textual deep learning outcomes.

Similarly, the outcomes of the LSTM-RNN deep learning model used for training on the selected image features are contained in Table 5. From Table 5, the process of classifying 60000 image features with 9 classes provided 88.46% accuracy during training, over a period of 3.49 mins. This model performs the complete training and validation procedure in a relatively faster elapsed time due to improved feature selection and well-defined hidden data patterns in the images.

S/N | Parameter                | Value
1.  | Number of image features | 60000
2.  | Number of classes        | 9
3.  | Accuracy                 | 0.9650
4.  | Loss                     | 0.1250
5.  | Elapsed Time             | 3.49 mins

Table 5: Image deep learning outcomes.

The comparison with existing deep learning models for textual and image classification is presented in Table 6.

Classification model | Accuracy | Elapsed time | Dataset    | Data type
LSTM-RNN             | 96.50%   | 3.49 mins    | TensorFlow | Image
LSTM-RNN             | 85.10%   | 18.25 mins   | TensorFlow | Text
Modified CNN         | 74.6%    | Unspecified  | TensorFlow | Image
CNN                  | 99.98%   | Unspecified  | TensorFlow | Image
LSTM                 | 80.5%    | Unspecified  | Kaggle     | Text
Deep CNN (FNDNet)    | 98.36%   | Unspecified  | Kaggle     | Text

Table 6: Comparisons of classification models for textual and image data.

From Table 6, the nature of the dataset and the network architecture strongly impact deep learning classification accuracy and speed. The LSTM-RNN on images therefore offers better speed and accuracy of classification than on text, owing to the complicated structure of the network and the data formation in the spectral and spatial domains. When compared to existing approaches using the same datasets, text mostly attains higher accuracy than image classification due to the domain of application, the complexity of the hidden patterns in the media, and the preprocessing performed prior to the classification tasks.
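To make the reported configuration concrete, the following is a minimal sketch of a Keras LSTM text classifier assembled from Table 2, together with the elapsed-time measurement of equation (4). This is a reconstruction under assumptions, not the authors' released code: reading Table 2's layer count as the LSTM unit width, the sigmoid output for binary sentiment, and the reuse of txt_train from the earlier loading sketch are all assumptions.

```python
import time
import tensorflow as tf

# Text-side model following Table 2: embedding dimension 64, one dense layer,
# dropout 0.5, Adam optimizer, binary cross-entropy loss, accuracy metric.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=64),  # 1000-word vocabulary
    tf.keras.layers.LSTM(64, dropout=0.5),           # unit width is an assumption
    tf.keras.layers.Dense(1, activation="sigmoid"),  # positive vs. negative review
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=["accuracy"])

ST = time.time()                   # execution start time
# model.fit(txt_train, epochs=10)  # 10 epochs per Table 2 (txt_train must be vectorized first)
ET = time.time()                   # execution end time
print(f"Elapsed time: {(ET - ST) / 60:.2f} mins")  # equation (4)
```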
5 Conclusions and future works

This paper demonstrated the effectiveness of the LSTM-RNN deep learning approach, a member of the recurrent neural network family, for image and textual dataset classification tasks. In both cases, the LSTM-RNN was used to extract information about the local structure of the data by applying multiple filters of distinct dimensions to the images and text before classification. However, the process of appropriately extracting the temporal correlations and dependencies in a text snippet through word encoding differs from that for images. Image classification outperformed textual classification in terms of accuracy, at 96.50% against 85.69%, due to the quicker learning of the patterns, knowledge, and information contained in the dataset. Moreover, the LSTM-RNN on images performed better in terms of classification speed, at 3.49 mins against 18.25 mins. In future work, more classification models should be utilized to demonstrate the concept studied in this article, and various data representations in textual and other media should be classified effectively at higher levels of accuracy and speed. This work can be extended to other high-performance deep learning algorithms and datasets (such as video, audio, and non-English text).

References

[1] X. Zhu, Y. Xu, H. Xu, and C. Chen, "Quaternion convolutional neural networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 631–647.
[2] L. Zou, S. Yu, T. Meng, Z. Zhang, X. Liang, and Y. Xie, "A technical review of convolutional neural network-based mammographic breast cancer diagnosis," Comput. Math. Methods Med., vol. 2019, 2019.
[3] O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring convolutional neural network structures and optimization techniques for speech recognition," in Interspeech, 2013, vol. 11, pp. 73–75.
[4] O. L. P. Hansen et al., "Species-level image classification with convolutional neural network enables insect identification from habitus images," Ecol. Evol., vol. 10, no. 2, pp. 737–747, 2020.
[5] V. Suryanarayanan, B. Patra, P. Bhattacharya, C. Fufa, and C. Lee, "ScopeIt: Scoping task relevant sentences in documents," arXiv preprint arXiv:2003.04988, 2020.
[6] U. Dixit, A. Mishra, A. Shukla, and R. Tiwari, "Texture classification using convolutional neural network optimized with whale optimization algorithm," SN Appl. Sci., vol. 1, no. 6, pp. 1–11, 2019.
[7] X. Huang, Z. Li, C. Wang, and H. Ning, "Identifying disaster related social media for rapid response: a visual-textual fused CNN architecture," Int. J. Digit. Earth, 2019.
[8] F. E. F. Junior and G. G. Yen, "Particle swarm optimization of deep neural networks architectures for image classification," Swarm Evol. Comput., vol. 49, pp. 62–74, 2019.
[9] V. Kudva, K. Prasad, and S. Guruvare, "Hybrid transfer learning for classification of uterine cervix images for cervical cancer screening," J. Digit. Imaging, vol. 33, no. 3, pp. 619–631, 2020.
[10] K. O'Shea and R.
Nash, "An introduction to convolutional neural networks," arXiv preprint arXiv:1511.08458, 2015.
[11] A. Hamoud and A. Humadi, "Student's success prediction model based on artificial neural networks (ANN) and a combination of feature selection methods," J. Southwest Jiaotong Univ., vol. 54, no. 3, 2019.
[12] S. Sokolov, S. Vlaev, and M. Chalashkanov, "Technique for storing and automated processing of weather station data in cloud platforms," in IOP Conference Series: Materials Science and Engineering, 2021, vol. 1032, no. 1, p. 012021.
[13] Y. Wang, L. Wang, Y. Yang, and T. Lian, "SemSeq4FD: Integrating global semantic relationship and local sequential order to enhance text representation for fake news detection," Expert Syst. Appl., vol. 166, p. 114090, 2021.
[14] M. Thoma, "Analysis and optimization of convolutional neural network architectures," arXiv preprint arXiv:1707.09725, 2017.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[17] A. Hirose and S. Yoshida, "Generalization characteristics of complex-valued feedforward neural networks in relation to signal coherence," IEEE Trans. Neural Networks Learn. Syst., vol. 23, no. 4, pp. 541–551, 2012.
[18] C. Trabelsi et al., "Deep complex networks," arXiv preprint arXiv:1705.09792, 2017.
[19] C. J. Gaudet and A. S. Maida, "Deep quaternion networks," in 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8.
[20] S. Minaee, E. Azimi, and A. Abdolrashidi, "Deep-sentiment: Sentiment analysis using ensemble of CNN and Bi-LSTM models," arXiv preprint arXiv:1904.04206, 2019.
[21] T. Xu et al., "Multi-feature based benchmark for cervical dysplasia classification evaluation," Pattern Recognit., vol. 63, pp. 468–475, 2017.
[22] W. Yu and M. Pacheco, "Impact of random weights on nonlinear system identification using convolutional neural networks," Inf. Sci., vol. 477, pp. 1–14, 2019.
[23] M. Blaivas and L. Blaivas, "Are all deep learning architectures alike for point-of-care ultrasound?: Evidence from a cardiac image classification model suggests otherwise," J. Ultrasound Med., vol. 39, no. 6, pp. 1187–1194, 2020.
[24] V. S. Martins, A. L. Kaleita, B. K. Gelder, H. L. F. da Silveira, and C. A. Abe, "Exploring multiscale object-based convolutional neural network (multi-OCNN) for remote sensing image classification at high spatial resolution," ISPRS J. Photogramm. Remote Sens., vol. 168, pp. 56–73, 2020.
[25] S. Bera and V. K. Shrivastava, "Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification," Int. J. Remote Sens., vol. 41, no. 7, pp. 2664–2683, 2020.
[26] X. Cao, J. Yao, Z. Xu, and D. Meng, "Hyperspectral image classification with convolutional neural network and active learning," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 7, pp. 4604–4616, 2020.
[27] K. Safari, S. Prasad, and D. Labate, "A multiscale deep learning approach for high-resolution hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 18, no. 1, pp. 167–171, 2020.
[28] K. Shankar, Y. Zhang, Y. Liu, L. Wu, and C.-H. Chen, "Hyperparameter tuning deep learning for diabetic retinopathy fundus image classification," IEEE Access, vol. 8, pp. 118164–118173, 2020.
[29] R. J. S. Raj, S.
J. Shobana, I. V. Pustokhina, D. A. Pustokhin, D. Gupta, and K. Shankar, "Optimal feature selection-based medical image classification using deep learning model in internet of medical things," IEEE Access, vol. 8, pp. 58006–58017, 2020.
[30] Y. Wang et al., "Weak supervision for fake news detection via reinforcement learning," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 01, pp. 516–523.
[31] E. Hossain, O. Sharif, and M. M. Hoque, "NLP-CUET@DravidianLangTech-EACL2021: Investigating visual and textual features to identify trolls from multimodal social media memes," arXiv preprint arXiv:2103.00466, 2021.
[32] W. Li, L. Zhu, Y. Shi, K. Guo, and E. Cambria, "User reviews: Sentiment analysis using lexicon integrated two-channel CNN-LSTM family models," Appl. Soft Comput., vol. 94, p. 106435, 2020.
[33] M. Rodrigues Makiuchi, T. Warnita, K. Uto, and K. Shinoda, "Multimodal fusion of BERT-CNN and gated CNN representations for depression detection," in Proceedings of the 9th International Audio/Visual Emotion Challenge and Workshop, 2019, pp. 55–63.
[34] N. Gupta and A. S. Jalal, "Integration of textual cues for fine-grained image captioning using deep CNN and LSTM," Neural Comput. Appl., vol. 32, no. 24, pp. 17899–17908, 2020.
[35] Y. Ren and J. Zhang, "Fake news detection on news-oriented heterogeneous information networks through hierarchical graph attention," arXiv preprint arXiv:2002.04397, 2020.
[36] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, "FNDNet: A deep convolutional neural network for fake news detection," Cogn. Syst. Res., vol. 61, pp. 32–44, 2020.
[37] B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning," arXiv preprint arXiv:1611.01578, 2016.
[38] F. A. Ozbay and B. Alatas, "Fake news detection within online social media using supervised artificial intelligence algorithms," Phys. A Stat. Mech. Appl., vol. 540, p. 123174, 2020.
[39] Q. Umer, H. Liu, and I. Illahi, "CNN-based automatic prioritization of bug reports," IEEE Trans. Reliab., vol. 69, no. 4, pp. 1341–1354, 2019.
[40] P. Yuan and R. Huang, "Integrating the device-to-device communication technology into edge computing: A case study," Peer-to-Peer Netw. Appl., vol. 14, no. 2, pp. 599–608, 2021.