https://doi.org/10.31449/inf.v49i14.7272 Informatica 49 (2025) 157–170

Self-Learning Model for Pattern Recognition in Vision System Based on Adaptive Kernel

Aradea*, Rianto, Nina Herlina, Irani Hoeronis
Department of Informatics, Faculty of Engineering, Siliwangi University, Postal Code 46115, Tasikmalaya, Indonesia
E-mail: aradea@unsil.ac.id, rianto@unsil.ac.id, ninaherlina@unsil.ac.id, iranihoeronis@unsil.ac.id
*Corresponding author

Keywords: neural network, pattern recognition, self-adaptation, self-learning, vision system

Received: October 4, 2024

Recently, the solution for recognizing and understanding an object based on visuals has been to integrate the adaptation function (a continuous machine-driven process) into the system update function involving humans (a continuous human-driven process). However, this has created a gap between the adaptation function and the system. This situation requires understanding the system as a dynamic composition of the learning process. This research introduces a self-learning model in the form of an adaptive kernel equipped with the SpinalNet architecture; the goal of the study is to increase the accuracy of the Convolutional Neural Network (CNN). The model consists of a domain model, contextual knowledge, and an adaptive learner developed on the basis of a CNN with SpinalNet. The combination of the Adaptive Kernel and SpinalNet in this CNN has a significant impact, allowing the model to adjust the selection of subsequent kernels based on the optimal input from the previous kernel. Moreover, this combination results in lower memory usage during training. The evaluation results show that our proposed model provides better classification accuracy than the SpinalNet model without the Adaptive Kernel. Furthermore, in terms of inference speed, our model outperforms SpinalNet, as evidenced by its use of fewer parameters.

Povzetek (translated): An adaptive, kernel-based self-learning model improves pattern recognition in vision systems by integrating a CNN with SpinalNet to improve classification accuracy, optimize kernel selection, and reduce memory usage during training.

1 Introduction

Computer vision is a field of artificial intelligence that trains computer machines to interpret and understand (recognize) the visual world through deep learning models. The goal is for the machine to accurately identify and classify real-world objects and then react to what it sees. At present, the recognition and reaction capabilities of a system are associated with the complexity of a highly dynamic, unpredictable, and uncertain environment [1]. In addition, the involvement of various elements of the real world interacting with the system requires adaptability from the system. This ability determines the success or failure of a system in recognizing and acting on what occurs in its environmental context [2]. In fact, [3] states that the development of systems has entered the wave of learning from experience, namely the deployment of machine learning techniques, which support various system functions to create an adaptive system, including one capable of operating under conditions of uncertainty while guaranteeing that its main properties function optimally. Therefore, a vision system for the current world requires a pattern recognition model possessing adaptability and a reliable level of optimization. Adaptability in a system aims at realizing adaptive system behavior built on specific requirements [4].
This situation, among others, requires a system to recognize changes in its application domain and to change itself to produce alternative behaviors [5]. Further, [3], in his latest review of long-term challenges that could trigger a new wave of research in the field of self-adaptation, raises an interesting question: to what extent should systems be developed to handle conditions that were not (fully) anticipated at the time the system was built? Researchers have proposed various approaches to fostering adaptability in a system based on their respective problem domains. As a result, neither a definition nor a specification of a system's adaptability has yet been widely agreed upon [6], and the same applies to the specification of adaptability in vision systems. As an example, there is a need for deep meta-learning applied to image recognition problems [7]. This problem can be resolved by understanding the system as a dynamic composition of the learning process, namely by enhancing the system with self-learning abilities [3].

The perspective of growing adaptability is grounded in the self-learning model. The idea is to overcome the gap in the traditional perspective, in which the adaptation function (a continuous machine-driven process) must be integrated into the system update function involving humans (a continuous human-driven process); consequently, the system can only run for a short cycle, since it has to wait for updates to deploy. Researchers generally develop adaptability for pattern recognition in vision systems by expanding various features to complement machine learning's ability to recognize visual cues. The available approaches and techniques can generally be grouped into three categories, namely feature-based, template-matching, and image-based [8]. Image-based techniques are one of the concerns of this study because they can utilize all parts of an image; as a result, the detection process does not depend on the characteristics of an image and does not focus on matching small parts of the image to a template [9], [10]. We therefore expect our research to be more flexible in developing a generic model that captures and recognizes objects holistically, with an optimal level of accuracy and self-learning capabilities.

The main problem in applying machine learning to vision systems is determining the most optimal algorithm. Researchers have conducted miscellaneous empirical investigations of the existing algorithms; one of these is the neural network. Nowadays, neural networks have become a machine learning method with great success, including in object detection research [11], which initially had difficulties in its development. With various extensions of existing neural network-based methods, the development process has become easier [12]. One of these extensions is the deep neural network, a neural network architecture for processing image data in depth. In the context of a vision system, object detection is performed by training a computer to interpret and understand the visual world through a deep learning model, so that the machine can accurately identify and classify objects and react to what it sees. Therefore, the vision system requires a pattern recognition model that reaches a reliable level of optimization and adaptation. There are myriad neural network algorithms.
One of the developments (types) is the Convolutional Neural Network (hereafter, CNN) algorithm. CNN is a variation of the Multilayer Perceptron (hereafter, MLP) designed to process two-dimensional data. On the one hand, MLP is not suitable for image classification, since it does not preserve the spatial information of image data and treats each pixel as an independent feature, yielding poor results [13]. On the other hand, CNN is a type of deep neural network designed to process two-dimensional data with a high network depth and is widely applied to image data [14]. Based on research [15], CNN has shortcomings in terms of its long model-training process. Therefore, a plethora of studies has developed the CNN algorithm to improve its results and performance, especially the level of accuracy. One of these developments is the use of optimization algorithms, several of which are minibatch-based adaptive algorithms or variants of the gradient descent optimization algorithm. These works allow us to extend the self-learning model based on neural network theory. A neural network, as a fundamental primitive, can provide flexibility in designing an architecture that focuses on adaptability; however, its impact on computational complexity should also be noted. Furthermore, the various existing research results mainly accentuate the level of accuracy of the pattern recognition process; only a tiny proportion pays attention to the adaptability of the learning process. One of the reasons for this is the lack of a good representation for meta-learning [7].

This study introduces a self-learning model for pattern recognition in the vision system by bringing an adaptability function into the learning process. The model consists of a CNN algorithm optimized with an adaptive kernel, so that the CNN can adapt its model parameters during the learning process. The rest of this study consists of the second part discussing relevant studies, the third part describing the proposed model, and the fourth part eliciting the application of the model; in particular, it covers the experiments and a discussion of the evaluation results. Finally, the fifth part concludes all the work and discusses future research opportunities.

2 Related work

There have been various empirical results relevant to machine learning for pattern recognition needs in vision systems. [16] compared the results of applying various optimization algorithms in deep learning, namely CNN, with three different CNN architectures. The study deployed two machine learning models, namely supervised and unsupervised learning. Ten algorithms were compared, including minibatch-based adaptive algorithms and algorithms belonging to the gradient descent family: Stochastic Gradient Descent (SGD), SGD-Momentum, SGD-Nesterov, AdaGrad, AdaDelta, RMSProp, Adaptive Momentum, AdaMax, Nadam, and AMSGrad. Four datasets were utilized: MNIST, CIFAR-10, LFW, and Kaggle Flowers.
One of the results of this study was that the Adaptive Momentum optimization algorithm worked optimally; in other words, it reached the highest level of accuracy when applied to the first and third CNN architectures with the LFW dataset. Besides, [17] also compared the performance of CNNs and found that the Adaptive Momentum optimization algorithm had the highest level of accuracy. That study applied the Adaptive Momentum algorithm to three different CNN architectures, namely ShallowNet, LeNet, and AlexNet, and reported that the best way to increase the accuracy of photosynthetic pigment prediction on plant digital images was to deploy the adaptive momentum algorithm combined with the LeNet architecture.

Currently, the use of CNN architecture has reached a higher level by adding adaptive schemes to the training process. The research in [18] introduced an adaptive learning rate rule in CNN training by integrating the Egret Swarm Optimization Algorithm (ESOA) and quadratic interpolation (QIESOA) to improve prediction accuracy. Adapting the learning rate improved CNN's weaknesses in multi-domain image classification tasks, achieving the highest accuracy of 97.15% on the test dataset. Luo and Hu [19] developed Adaptive Attention ResNet (AA-ResNet), which addresses overfitting and training errors in CNNs with deeper networks. Feature extraction became a primary focus of their research, using residual modules and adaptive attention to enhance feature representation. The developed model demonstrated high performance on the CIFAR-10, Caltech-101, and Caltech-256 datasets. The research by Jiang et al. [20] discussed the role of activation functions in Convolutional Neural Networks (CNNs) and introduced the Adaptive Offset Activation Function (AOAF) as a solution to improve image classification accuracy. AOAF is a new parametric activation function that connects negative and positive values by adding an adaptive parameter (the average of the input feature tensor) [20]. The results showed that AOAF significantly improved accuracy, especially on datasets with high feature complexity. Wu and Pan [21] introduced an adaptive modular convolutional neural network (CNN) model design to improve efficiency and accuracy in image recognition tasks. Through a gate unit based on attention mechanisms, the model adaptively selects the optimal network structure based on learning. The results showed high accuracy on three Kaggle datasets (Cats-vs.-Dogs, 10-Monkey Species, Birds-400). The research by Guo et al. [22] focused on developing an Adaptive Pooling Network (APN) based on memristor arrays to improve the performance and resilience of CNNs in managing information loss during pooling. The results demonstrated that APN enhanced CNN performance in terms of both accuracy and robustness on the MNIST and CAPTCHA datasets. To clarify these research results and identify gaps in the state of the art concerning adaptability in vision systems, especially CNNs, we have summarized the findings in Table 1.

Table 1: State-of-the-art

Research | Proposed Method | Problem | Contribution | Result | Weakness
Wei et al. [18] | CNN + QIESOA | Slow convergence of traditional CNNs | Adaptive learning rate update with ESOA and quadratic interpolation | 91.25% (CIFAR-10), 88.66% (EMNIST), 95.87% (EuroSAT), 88.66% (Fashion-MNIST), 97.15% (Rice Image) | Adaptation to datasets with high dynamics or specialized domains has not been discussed.
Luo and Hu [19] | AA-ResNet | Overfitting due to network depth | Adaptive attention, multitask loss function | 92.43% (CIFAR-10), 69.61% (Caltech-101), 52.29% (Caltech-256) | Adaptation to large-scale datasets or new domains has not been tested.
Jiang et al. [20] | AOAF | Low performance of the ReLU function | Using negative values in feature extraction | Accuracy increased by 4.8% compared to ReLU | Not tested on datasets with high noise or different distributions.
Wu and Pan [21] | Adaptive Modular CNN Model | Overfitting, large parameter counts | Parallel modules and submodules, adaptive reduction of FLOPs | 99.3% (Cats-vs-Dogs), 99.26% (10-Monkey Species), 99.13% (Birds-400) | Not evaluated on datasets with noise or extreme variations.
Guo et al. [22] | APN (memristor-based) | Information loss in CNN pooling | Adaptive pooling without backpropagation | 99.3% (MNIST), 92.6% (CAPTCHA) | Difficult to adapt to systems without memristors and to large datasets.

Self-learning capabilities for vision systems have also been developed in [7], which proposed a framework consisting of three main modules: the concept generator, meta-learners, and concept discriminators. This framework integrates the representational power of deep learning into meta-learning. The results substantially improved vanilla meta-learning, demonstrated on various few-shot image recognition problems. Other researchers, including [23], employed a new structure and concept called SpinalNet. SpinalNet is an amalgamation of a DNN and a gradual input implementation. The study highlighted the shortcomings of DNNs related to computational intensity due to the size of the input network; therefore, it applied gradual input, i.e., feeding the input gradually, to reduce the burden of the calculation process. The results of the study indicated that SpinalNet was able to increase the accuracy of the usual DNN.

We had studied model adaptability before, starting with integrating the self-adaptation approach into requirements modeling [24]. As an illustration, we introduced a self-adaptation approach embedded into the primitive system requirements specification. Furthermore, in the study [25], we added a contextual-requirements approach to the adaptation pattern of the primitive system requirements. The goal was to capture the relevant context attributes so that the adaptive behavior of the system would match the prevailing context. In another study [1], we developed a pattern of adaptation to deal with the variability of system services; in this case, our primitive system requirements map to the various service levels of the system. In the study [2], we introduced the adaptation requirements for adaptive systems (ARAS) framework, extending the system modeling language with control loop patterns and context inheritance hierarchies. Technically, both were mapped into a graph network (a Bayesian network). We have defined several formalizations for adaptability in a graph; however, results specific to the requirements of vision systems have not yet been attained. More recently, in [26], [27], we attempted to apply the adaptability of this graph network to the needs of Internet of Things (IoT) network systems. Based on these related studies and the work described above, we identified research opportunities.
In this case, a study can be performed to improve the adaptability of the learning process in pattern recognition for vision systems. One example is extending the development of the CNN model; more technically, adding the SpinalNet architecture to the CNN model that we have developed offers the opportunity to increase the adaptability and optimization of the learning process. Meanwhile, the related studies explain that CNNs fit image data and have been widely applied to it [8]. Consequently, image-based techniques were also our concern when formulating the needs of this research. Additionally, [7] contends that there is a lack of a good representation for meta-learning, in which a meta-learner learns the learning algorithm across many related tasks. These statements and facts motivated us to develop a new model with self-learning capabilities for pattern recognition in vision systems.

3 Proposed method

The perspective used in developing the proposed model was inspired by [3], who notes that the long-term challenge triggering a new wave of research in the field of self-adaptation is to understand the system as a dynamic composition of the learning process. The idea is to enhance a system with self-learning capabilities: the system is allowed to learn from the variety of data it collects and to autonomously develop its learning process under changing and unpredictable conditions. In the context of the vision system, the work of [7] applied this perspective by proposing deep meta-learning and demonstrated its usefulness in image recognition problems. This work inspired us to propose a new self-learning model for vision systems. Our model consists of three main components, namely the domain model, contextual knowledge, and adaptive learner, as presented in Figure 1:
a. The domain model is a domain model in the form of a graph network structure that captures high-level visual signal representations.
b. Contextual knowledge represents the context attributes in the domain model relevant to the current dynamic visual cue context.
c. The adaptive learner consists of a utility function and a learner function that carry out learning and recognize visual cue representations based on the prevailing context.

Figure 1: Self-learning model for pattern recognition in the vision system

Domain model

In previous studies [1], [2], we defined every element in the domain model as indicating a dependency relationship. Furthermore, the model was regarded as dynamic in nature, to be monitored based on certain parameter values. In this study, we developed it into specific representations for monitoring and capturing high-level visual cues. More specifically, the model deploys the SpinalNet structure developed by [23], which takes its inspiration from the human somatosensory system, as presented in Figure 2. Following the way the human spinal network works, SpinalNet utilizes gradual input: all the layers contained in the model contribute to the main output X in the same way that reflexes work, and the modular input is sent to the main output X, similar to how the brain works.
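To make the gradual-input scheme concrete, the following is a minimal sketch of a SpinalNet-style fully connected block (assuming PyTorch; the class name is ours, and the dimensions, which mirror the six-layer, two-half configuration described later in Section 4, are purely illustrative rather than the authors' exact code):

```python
import torch
import torch.nn as nn

class SpinalFC(nn.Module):
    """Sketch of a SpinalNet-style block: the flattened input is split into
    two halves, each sub-layer receives one half (alternating) concatenated
    with the previous sub-layer's output, and every sub-layer contributes
    to the main output X, mirroring the reflex-like gradual input."""

    def __init__(self, in_features=500, width=250, n_layers=6, n_classes=10):
        super().__init__()
        self.half = in_features // 2
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Linear(self.half if i == 0 else self.half + width, width),
                nn.ReLU(),
            )
            for i in range(n_layers)
        )
        self.out = nn.Linear(width * n_layers, n_classes)  # main output X

    def forward(self, x):                       # x: (batch, in_features)
        outputs, prev = [], None
        for i, layer in enumerate(self.layers):
            half = x[:, :self.half] if i % 2 == 0 else x[:, self.half:]
            prev = layer(half if prev is None else torch.cat([half, prev], 1))
            outputs.append(prev)
        return self.out(torch.cat(outputs, 1))  # all sub-layers feed X
```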
Figure 2: SpinalNet model (adapted from [23])

In a model like the one illustrated in Figure 3, the first layer uses a simple linear function and receives only the weighted sum (weights w) of x1–x5. The second layer receives the weighted sum of x6–x10 as one input and the output of layer 1 as the other. Briefly stated, the definition can be formulated as follows:
a. Each layer $x_i \in \{x_1, x_2, \ldots, x_n\}$ contributes to the main output layer $X$.
b. Each input $N_i \in \{N_1, N_2, \ldots, N_n\}$ can be modularized into its corresponding layer $x_i$ and becomes an input for layer $x_{i+1}$.

Figure 3: Simplified SpinalNet as a single hidden layer (adapted from [23])

Contextual knowledge

Contextual knowledge is a representation of dynamic properties in the domain model [2]. It refers to the abstraction of domain properties relevant to the expected system behavior, covering the specific context in which this expected behavior applies [28], [29], [30]. In this investigation, contextual knowledge was specified for the context attributes related to visual cues in the domain model. The attribute applied as contextual knowledge in this research was the kernel dimension, which determines the size of the matrix used to perform convolution and input shifts. The kernel in convolution is formulated as follows:
a. $F(x) \times F(y)$ is the dimension of the kernel matrix.
b. $N(x) \times N(y)$ is the dimension of the input matrix.
c. The output dimension of the convolution is $(N(x) - F(x) + 1) \times (N(y) - F(y) + 1)$; for example, convolving a 28×28 input with a 5×5 kernel yields a 24×24 output.
d. Convolving the kernel $Q_{u,v}$ with the activation function tanh yields the weight $K_{u,v}$.

Adaptive learner

The adaptive learner is a module that can automatically make adjustments in response to changing and growing needs. The main purpose of this module is to model the system dynamically; in particular, the module learns to recognize, at run time, every need existing in the domain model and the contextual knowledge. The main problem to be handled relates to variables with varying, different, and flexible properties. This module provides two functions: the utility function, which sorts or defines alternatives according to their usefulness for individual visual cues, and the learner function, which carries out learning and recognition to obtain the most optimal results. The new kernel function is obtained from the convolution of each input with $Q_{(u,v)}$, as in the following equation:

$\sigma_{u,v} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} Q_{(u,v)}^{i,j}\, x_{i,j}$    (1)

The new kernel $K_{(u,v)}$ can then be deployed to perform convolution on the input image to produce $S$, which is subsequently applied as the output kernel, as in the following calculation:

$S = \sum_{u,v} x_{u,v}\, K\!\left(\sum_{i,j} Q_{u,v}^{i,j}\, x_{i,j}\right)$    (2)

$f = \tanh(S)$    (3)

4 Experiment

This section describes the evaluation of our proposed model for recognizing visual cue patterns, particularly handwriting patterns. In this experiment, we deployed the MNIST dataset sourced from [31]. This dataset is a collection of handwritten images of the digits 0–9 consisting of 60,000 training images and 10,000 test images. The images are grayscale, each 28×28 pixels.
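For reference, the dataset can be loaded as follows (a sketch; torchvision is an assumption, since the paper does not name its data pipeline):

```python
from torchvision import datasets, transforms

# Sketch of loading MNIST as described above: 60,000 training and 10,000
# test images of handwritten digits 0-9, each a 28x28 grayscale image.
# Equivalent torchvision loaders exist for the KMNIST, QMNIST,
# Fashion-MNIST, and EMNIST variants used later in this section.
tfm = transforms.ToTensor()
train_set = datasets.MNIST("./data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("./data", train=False, download=True, transform=tfm)

print(len(train_set), len(test_set))   # 60000 10000
print(train_set[0][0].shape)           # torch.Size([1, 28, 28])
```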
The use of the MNIST dataset with the CNN method was demonstrated by Saqib et al. [32], whose study succeeded in building a model that recognizes and classifies handwritten digit images; their experimental results showed that the CNN model attained its highest classification accuracy for a certain number of hidden-layer neurons. Another study was conducted by Anwar et al. [33], involving the MNIST dataset as the classification object of a CNN. In addition to MNIST, we also applied other datasets, namely KMNIST, QMNIST, Fashion-MNIST, and EMNIST, to strengthen the validation of the model we developed.

Preparation of model application

The network architecture developed in this experiment was inspired by the SpinalNet architectural model, with several expansions, namely combining it with the Convolutional Neural Network architecture through an adaptive kernel on the convolution layer. The experimental mechanism was applied to the MNIST dataset with several models: the conventional CNN model, the CNN model combined with the Adaptive Kernel, the SpinalNet model, and the model developed by the authors. Contextual knowledge elicitation was conducted to identify the relevant context attributes in the domain model related to the dynamic context of visual cues. This provides dynamic parameter updates during training on the adaptive kernel. The adaptive kernel parameters are iteratively updated during the backpropagation process; the kernel adapts by minimizing the cross-entropy loss through the gradient descent algorithm. The utility function calculates the optimal kernel value based on the contextual knowledge that has been learned. This mechanism allows the kernel to dynamically shift its focus and optimize the most relevant features for visual signals. Unlike non-adaptive kernels, which rely on static parameters, adaptive kernels dynamically adjust their parameters during training. For instance, after each convolution operation, the kernel dimensions are updated to optimize weight alignment in subsequent layers. This flexibility results in higher accuracy and efficiency, as demonstrated in our model. Figure 4 shows the distinction between the adaptive and non-adaptive kernel processes to clarify the differences. In this experiment, we identified the data collected from pre-processing and model preparation as contextual knowledge, namely a kernel that can change according to the given input.

Pattern recognition implementation and operation

Our proposed model comprises three main parts, namely the Adaptive Kernel, the Convolutional Layer, and the Fully Connected Layer, as shown in Figure 5; a code sketch of the adaptive kernel is given after this description. First, the Adaptive Kernel is a convolutional layer with an adaptive scheme in its kernel parameters; the kernel is determined based on the optimal input of the previously applied convolution. Second, in the convolutional stage, both the Adaptive Kernel and the Convolutional Layer apply max pooling and the ReLU activation function. By applying the Spinal Layer in the Fully Connected Layer section, the input parameters are smaller; as a result, memory usage during model training can be kept to a minimum. Third, the Spinal Layer divides the input into several parts and then processes them with a linear function. In our model, the input is divided into two equal parts and processed linearly in six layers. In the final stage of the fully connected layer, a linear function combines the applied Spinal Layers.
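The paper does not publish source code, but the Adaptive Kernel of Eqs. (1)-(3) can be read as a convolution whose weights are generated from the input itself. The following is a minimal sketch of one such realization (assuming PyTorch; the class name, the shapes, and the spatial averaging used to collapse Eq. (1) into per-weight values are our assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveKernelConv(nn.Module):
    """Sketch of an input-conditioned ('adaptive') kernel in the spirit of
    Eqs. (1)-(3): a learned tensor Q produces responses sigma from the input
    (Eq. 1), tanh squashes them into a kernel K (Eq. 3), and K is then used
    to convolve that same input (Eq. 2). Shapes follow Figure 7 (25 maps)."""

    def __init__(self, k=5, out_channels=25):
        super().__init__()
        self.k, self.out_channels = k, out_channels
        # Q: one learned response map per generated kernel weight (Eq. 1)
        self.q = nn.Conv2d(1, out_channels * k * k, kernel_size=k)

    def forward(self, x):                       # x: (B, 1, H, W)
        b, _, h, w = x.shape
        sigma = self.q(x).mean(dim=(2, 3))      # Eq. (1), pooled per weight
        kernels = torch.tanh(sigma)             # Eq. (3)-style squashing -> K
        kernels = kernels.view(b * self.out_channels, 1, self.k, self.k)
        # Eq. (2): convolve each image with its own generated kernel K,
        # using grouped convolution to keep batch elements separate
        out = F.conv2d(x.view(1, b, h, w), kernels,
                       groups=b, padding=self.k // 2)
        return out.view(b, self.out_channels, h, w)
```

Because K is recomputed from each input in the forward pass while Q is updated by backpropagation, the kernel parameters adapt both during training and at inference time, which is one way to obtain the behavior the paper attributes to the Dynamic Kernel.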
The SpinalNet structure used is shown in Figure 6.

Figure 4: Differences between the adaptive kernel approach and non-adaptive methods

Figure 5: Self-learning model architecture

Figure 6: SpinalNet architecture in the fully connected layer

More specifically, the implementation of the integration between the adaptive kernel and SpinalNet is shown in Figure 7, which illustrates the workflow of the proposed model. The model designed in Figure 7 processes a 28×28 grayscale input image through a series of steps, starting from the dynamic kernel and ending at the fully connected layer. In the dynamic kernel, the kernel weights are adaptively adjusted during training, resulting in 25 feature maps (28×28×25). This output is then passed to the dynamic layer, where the results from multiple kernels are combined and the channels are reduced to 6 (28×28×6). Next, the Conv2D layer extracts deeper features, followed by the max pooling layer for downsampling, producing an output of 12×12×20. A dropout layer is applied to prevent overfitting without altering the data dimensions. The data is then flattened through the flatten layer into a 1D vector (500 elements), which is processed progressively by the SpinalNet layers by splitting the vector into six segments and generating a combined representation with a total of 1500 elements. Finally, the fully connected layer processes this representation into logits for 10 classes to generate probabilities, determining the final class prediction. Combining the Adaptive Kernel, Convolutional Layer, and SpinalNet ensures computational efficiency and model adaptability in handling visual data.

Figure 7: Proposed method framework

The integration of SpinalNet and the Adaptive Kernel was trained on various datasets for classification tasks. This experiment has two training scenarios: one where the model is trained with the additional VGG-5 network [34] and another where the model is trained without that additional network. The model with the added VGG-5 was trained for 100 epochs, using a batch size of 128 and a learning rate of 5×10⁻³. In contrast, the model without the VGG-5 addition was trained for eight epochs, using a batch size of 128 and a learning rate of 1×10⁻². The hyperparameters differ between the scenarios to suit the needs of each model being trained and to maximize the potential of the training results. Additionally, both training scenarios were optimized using Stochastic Gradient Descent (SGD) with the same momentum value of 0.9. These hyperparameters were determined through a systematic evaluation of several hyperparameter choices using a grid search approach, sketched below. The evaluation was based on validation accuracy across the various configurations, while also monitoring the stability of the loss function and the efficiency of the number of parameters in the model.
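A sketch of this grid search follows (the ranges mirror the hyperparameter tables below for the no-VGG-5 scenario; `train_and_validate` is a hypothetical helper, not from the paper):

```python
from itertools import product

def train_and_validate(cfg):
    """Hypothetical helper (not from the paper): trains the model under one
    hyperparameter configuration and returns its validation accuracy."""
    raise NotImplementedError

# Search ranges follow the hyperparameter tables below (no-VGG-5 scenario).
grid = {
    "learning_rate": [0.001, 0.005, 0.01],
    "batch_size": [32, 64, 128],
    "hidden_layers": [4, 6, 8],
    "neurons_per_layer": [125, 250, 500],
    "momentum": [0.5, 0.7, 0.9],
}

best_acc, best_cfg = 0.0, None
for values in product(*grid.values()):
    cfg = dict(zip(grid, values))
    acc = train_and_validate(cfg)
    if acc > best_acc:
        best_acc, best_cfg = acc, cfg
```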
The evaluation results for each hyperparameter choice are shown in Tables 2 and 3.

Table 2: Hyperparameter testing for the proposed model with the added VGG-5

Hyperparameter | Range | Optimal Value
Learning Rate | [0.001, 0.005, 0.01] | 0.005
Batch Size | [32, 64, 128] | 128
Hidden Layers in SpinalNet | [64, 128, 256] | 128
Neurons per Layer in SpinalNet | [64, 128, 256] | 128
Momentum | [0.5, 0.7, 0.9] | 0.9

Table 3: Hyperparameter testing for the proposed model without VGG-5

Hyperparameter | Range | Optimal Value
Learning Rate | [0.001, 0.005, 0.01] | 0.01
Batch Size | [32, 64, 128] | 128
Hidden Layers in SpinalNet | [4, 6, 8] | 8
Neurons per Layer in SpinalNet | [125, 250, 500] | 250
Momentum | [0.5, 0.7, 0.9] | 0.9

From the tests in Table 2, the optimal configuration for the proposed model with the added VGG-5 included a learning rate of 0.005, a batch size of 128, 128 hidden layers in SpinalNet, 128 neurons per layer, and a momentum value of 0.9 for SGD. Meanwhile, the optimal performance for the proposed model without the VGG-5 addition was achieved with a learning rate of 0.01, a batch size of 128, 8 hidden layers in SpinalNet, 250 neurons per layer, and a momentum value of 0.9. These configurations provided the highest validation accuracy, maintained a stable loss curve throughout training, and showed a balance between performance and computational efficiency.

Before training, we performed data preprocessing on all datasets used, applying the same steps to each: converting the images to tensors and normalizing the values. In the tensor conversion, the pixel values were changed from the original range (0 to 255) to the range [0.0, 1.0] by dividing each pixel value by 255. Afterward, the converted pixel values were normalized using the Z-score normalization method. After this preprocessing stage, model training is expected to be faster and more stable, accelerating convergence and reducing imbalance.

Model evaluation and comparison

To validate the proposed model, we compared it with the original SpinalNet model. The comparison covered accuracy, the number of parameters used, and the inference speed of the model on each test dataset. Tables 4 and 5 compare the evaluation results between our model and SpinalNet.

Table 4: Comparison of Adaptive-SpinalNet and SpinalNet with the added VGG-5

Dataset | Adaptive-SpinalNet Accuracy | Adaptive-SpinalNet Inference Time | SpinalNet [23] Accuracy | SpinalNet [23] Inference Time
MNIST | 99.78% | 5.21 s | 99.72% | 5.33 s
KMNIST | 99.24% | 5.25 s | 99.15% | 6.12 s
QMNIST | 99.54% | 16.77 s | 99.68% | 16.92 s
Fashion-MNIST | 95.21% | 5.70 s | 94.68% | 6.43 s
EMNIST (Digits) | 99.74% | 12.57 s | 99.82% | 13.03 s
EMNIST (Letters) | 94.69% | 8.68 s | 95.88% | 9.17 s

Table 4 highlights the comparison between VGG-5 + Adaptive-SpinalNet and VGG-5 + SpinalNet in terms of accuracy and inference time. The inference speed of our model is consistently better across all datasets, and Adaptive-SpinalNet likewise shows a speed advantage over the original SpinalNet model. The adaptive kernel dynamically adjusts its weights based on the input it receives, enabling a focus on the features most relevant to the classification task and thereby reducing processing time for less significant information. Additionally, parameter efficiency is achieved by minimizing redundancy in kernel weights, resulting in an optimal representation without excess parameters that could slow the inference process. The comparison of parameter reduction is illustrated in Figure 8.
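Parameter counts and per-dataset inference times of the kind reported here are typically measured as follows (a sketch, assuming PyTorch; the function names are ours):

```python
import time
import torch

def count_parameters(model):
    # Trainable parameter count, as compared in Figure 8 and Table 6
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def inference_time(model, loader, device="cpu"):
    # Wall-clock time for one pass over the full test set
    model.to(device).eval()
    start = time.perf_counter()
    for images, _ in loader:
        model(images.to(device))
    return time.perf_counter() - start
```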
Table 5: Comparison between Adaptive-SpinalNet and SpinalNet

Dataset | Adaptive-SpinalNet Accuracy | Adaptive-SpinalNet Inference Time | SpinalNet [23] Accuracy | SpinalNet [23] Inference Time
MNIST | 98.93% | 3.42 s | 98.48% | 3.61 s
KMNIST | 92.52% | 3.81 s | 88.25% | 4.08 s
QMNIST | 98.47% | 12.85 s | 98.07% | 13.03 s
Fashion-MNIST | 87.92% | 3.54 s | 86.61% | 3.90 s
EMNIST (Digits) | 99.35% | 9.09 s | 99.16% | 9.29 s
EMNIST (Letters) | 91.43% | 5.24 s | 90.23% | 5.97 s

In addition to its positive impact on parameter reduction, the SpinalNet architecture combined with the Adaptive Kernel generally enhances accuracy across all datasets. This is particularly evident in Table 5, which demonstrates that directly applying the Adaptive Kernel to SpinalNet improves the model's accuracy on all test datasets. This indicates that our model, tested on various datasets (including MNIST, Fashion-MNIST, KMNIST, and EMNIST), can generalize across different data distributions. The experimental results reveal that the Adaptive-SpinalNet model consistently achieves competitive performance, even on datasets with visual patterns that differ significantly from MNIST, highlighting the model's ability to adapt to diverse data distributions. This adaptability is further reinforced by the Dynamic Kernel mechanism, which dynamically adjusts kernel weights based on input patterns during inference, allowing the model to capture relevant features under varying data conditions. Furthermore, the SpinalNet architecture processes features in independent segments, offering additional flexibility in handling shifts in data distribution. To provide stronger validation, we also compared the model's performance with related studies using the same dataset benchmarks; this comparison is presented in Table 6.

Figure 8: Comparison of the number of parameters between Adaptive-SpinalNet and SpinalNet: (a) with VGG-5, (b) without VGG-5

Table 6: Comparison of Adaptive-SpinalNet with related studies

Model | Accuracy (MNIST) | Accuracy (KMNIST) | Accuracy (QMNIST) | Accuracy (Fashion-MNIST) | Number of Parameters
SpinalNet [23] | 98.48% | 88.25% | 98.07% | 86.61% | 16K
VGG-5 + SpinalNet [23] | 99.72% | 99.15% | 99.68% | 94.68% | 3.6M
CNN + QIESOA [18] | - | - | - | 97.15% | Not mentioned
APN (memristor-based) [22] | 99.3% | - | - | - | Not mentioned
R-ExplaiNet26-64 [35] | 99.70% | 98.66% | - | 93.03% | 0.89M
Improved Efficient CapsNet [36] | - | 98.43% | - | - | 0.58M
PMM [37] | 97.38% | - | - | 88.58% | 4.9K (MNIST), 16.7K (Fashion-MNIST)
ConvPMM [37] | 99.10% | - | - | 90.94% | 0.13M (MNIST), 0.28M (Fashion-MNIST)
Adaptive-SpinalNet | 98.93% | 92.52% | 98.47% | 87.92% | 15.9K
VGG-5 + Adaptive-SpinalNet | 99.78% | 99.24% | 99.54% | 95.21% | 1.1M

Table 6 demonstrates that the VGG-5 + Adaptive-SpinalNet model outperforms all other models in terms of accuracy on the MNIST and KMNIST datasets. Although its accuracy on the QMNIST and Fashion-MNIST datasets remains slightly below the VGG-5 + SpinalNet and CNN + QIESOA models, the differences are insignificant, indicating that our model handles data variability well. The Adaptive-SpinalNet model has the fewest parameters of all compared models, except for the PMM model on the MNIST dataset. This demonstrates the effectiveness of the Adaptive Kernel in reducing computational complexity in the SpinalNet model with minimal accuracy trade-offs. This performance is achieved through the Dynamic Kernel, which dynamically adjusts weights to extract relevant features, while SpinalNet processes features in independent segments to enhance flexibility and computational efficiency.
With a low parameter count, Adaptive-SpinalNet demonstrates strong generalization across various datasets, making it suitable for real-world applications involving diverse data. In addition to the appropriate selection of hyperparameters, the performance achieved by Adaptive-SpinalNet is also attributable to the optimal sizing of the Dynamic Kernel. The kernel size significantly affects the model's adaptability. To clarify this, Table 7 presents the performance of the model trained on the MNIST dataset using different Dynamic Kernel sizes.

Table 7: Comparison of Adaptive-SpinalNet model performance on the MNIST dataset by kernel size

Kernel Size | Recall | Precision | F1-Score | Accuracy
3×3 | 98.91% | 98.91% | 98.91% | 98.91%
5×5 | 98.93% | 98.93% | 98.93% | 98.93%
7×7 | 98.80% | 98.80% | 98.80% | 98.80%
9×9 | 98.38% | 98.38% | 98.38% | 98.38%

The results in Table 7 show a performance improvement when the kernel size is increased from 3×3 to 5×5, suggesting that enlarging the Dynamic Kernel can enhance performance. However, when the kernel size is further increased to 9×9, performance decreases. A larger Dynamic Kernel does not necessarily guarantee an improvement in model performance, as a very large kernel tends to aggregate information over a larger area, potentially overlooking important small or local patterns. Another interesting observation from Table 7 is the consistency between precision, recall, F1-score, and accuracy: identical values indicate that our model works effectively, achieves an optimal balance, and handles the class distribution well. This demonstrates that our model performs well on the MNIST dataset.

Another option that could serve as an adaptation method for the SpinalNet model is Reinforcement Learning (RL)-based adaptivity, which can select or adjust kernels based on feedback from the environment to optimize performance. While this method has the potential to adjust kernels based on experience, weaknesses such as computational overhead, dependence on reward design, and stability issues make it less ideal for high-efficiency real-time applications. The performance comparison between the Adaptive Kernel and RL methods in Table 8 demonstrates this.

Table 8: Performance comparison of the Adaptive Kernel and RL methods on the SpinalNet model using the MNIST dataset

Method | Epochs | Accuracy (%) | Inference Time (s) | Domain-Shift Accuracy (%)
Adaptive Kernel | 5 | 97.85 | 3.42 | 88.97
Reinforcement Learning-Based Adaptivity | 5 | 96.67 | 5.65 | 85.74

The comparison results in Table 8 show that the Adaptive Kernel method has a significant advantage over the Reinforcement Learning-based adaptivity approach in terms of accuracy, inference-time efficiency, and handling of domain shift. Both methods were tested with five training epochs, with the Adaptive Kernel method achieving an accuracy of 97.85%, higher than the RL-based method, which only reached 96.67%. Furthermore, the inference time of the Adaptive Kernel is much faster, at 3.42 seconds, compared to 5.65 seconds for the RL method. This indicates that the Adaptive Kernel method is more efficient for real-time applications than the RL-based adaptivity method. To further test adaptability, we performed data augmentation for domain shift, which included random image rotation of up to 30°, brightness variation, and contrast changes.
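A sketch of such a domain-shift pipeline (assuming torchvision; the jitter strengths and normalization statistics are illustrative assumptions, since the paper only specifies the 30° rotation bound):

```python
from torchvision import transforms

# Domain-shift augmentation as described above: random rotation of up to
# 30 degrees plus brightness and contrast variation. The 0.5 jitter factors
# are illustrative assumptions; the paper does not state exact values.
domain_shift = transforms.Compose([
    transforms.RandomRotation(degrees=30),
    transforms.ColorJitter(brightness=0.5, contrast=0.5),
    transforms.ToTensor(),  # scales pixels to [0.0, 1.0], as in Section 4
    transforms.Normalize((0.1307,), (0.3081,)),  # common MNIST statistics (assumed)
])
```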
The evaluation results showed that the Adaptive Kernel's adaptation to domain shift was also superior, with an accuracy of 88.97% compared to 85.74% for the RL-based method. These results confirm that the direct adaptation mechanism of the Adaptive Kernel is more effective and efficient than RL-based exploration, making it more suitable for adaptive vision systems that require high performance and resilience to changes in data distribution. Another advantage is shown in the loss values generated at each epoch: although the Adaptive Kernel method has a higher loss value than the RL-based adaptivity method in the first epoch, in subsequent epochs the loss values of the proposed method stay consistently lower than those of the RL method. The comparison of loss values is shown in Figure 9.

Figure 9: Comparison of loss values between the Adaptive Kernel and RL-based adaptivity: (a) original MNIST test data, (b) augmented MNIST test data

Threats to validity

a. Preliminaries validity
Validity in the preliminaries stage is measured by examining the problem domain, understood as the clarity of the data rows, the datasets, and the pre-processing. The transformation from non-linear to linearly separable data lifts the data to a higher level by adding features using kernel functions. The raw data consists of various variants of the MNIST dataset, including MNIST itself, KMNIST, QMNIST, and Fashion-MNIST. Data identification is carried out to ensure that the pre-processing in the model preparation stage is performed correctly; this is a preparation stage for implementing the model as contextual knowledge. The data used is not too large; this was done to see how the model performs with a limited amount of data while maintaining high accuracy.

b. Fitting validity
In the evaluation of our research, the model is selected using cross-validation and a confusion matrix. Validation is done by estimating the error and how well our model can accommodate unseen data. K-fold cross-validation is used to reduce the parts that cause underfitting: by reducing training data, it is possible to lose trends in the data set, increasing the error caused by bias. The validation used is cross-validation, which generalizes to independent/unseen data sets. At the validation stage, our self-learning model ensures that each process is carried out with attention to metric evaluation, the handling of overfitting, and processes that reduce bias.

c. Bias validity
The accuracy measurement of each model is optimized with Stochastic Gradient Descent (SGD) and the cross-entropy loss. To reduce the bias of gradient estimates, SGD is used to lower the cost of each iteration; the computing cost of each iteration drops from O(n) to O(1). In setting the SGD variables, the learning rate resolves the conflicting goals by being reduced dynamically as optimization progresses. Cross entropy is chosen to define the loss function, and optimization proceeds by minimizing it; minimizing the cross entropy is equivalent to minimizing the divergence between the predicted and true distributions, as long as the entropy of the data is constant. A sketch of this optimization setup is given below.
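A minimal sketch of that optimization setup (assuming PyTorch; the epoch count and learning rate follow the no-VGG-5 scenario described in Section 4):

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=8, lr=1e-2, momentum=0.9, device="cpu"):
    """Sketch of the setup described above: SGD with momentum minimizing
    the cross-entropy loss; the adaptive kernel's parameters are updated
    by the same backpropagation step as the rest of the network."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```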
5 Conclusion and further studies

This study introduces a self-learning model for pattern recognition in vision systems. The model is developed through a self-adaptation approach in which the system is regarded as a dynamic composition of the learning process. The goal is to enhance the system with self-learning capabilities, enabling it to learn from collected visual data and to develop its learning process autonomously. Our model encompasses three main components, namely (a) a domain model to capture high-level representations of visual cues, (b) contextual knowledge representing context attributes relevant to the current dynamic context of visual cues, and (c) an adaptive learner performing learning and recognizing visual cue representations based on the prevailing context. The model is realized by a formulation combining the adaptive kernel method with the CNN architecture and by utilizing SpinalNet in the fully connected layer of the CNN. The validity of the proposed model was evaluated using cross-validation with several testing schemes. In addition to comparing the evaluation results with the original SpinalNet model, we also validated the model by comparing its performance with methods used in related studies, by varying kernel sizes, and by comparing it with other adaptation methods. The evaluation results indicate that the proposed model performs very well in terms of accuracy and computational complexity. The results of this work pave the way for future studies, which can include developing and expanding our proposed model for other domains (e.g., audio recognition, machine translation, and so on).

References

[1] A. Aradea, I. Supriana, and K. Surendro, "Self-adaptive model based on goal-oriented requirements engineering for handling service variability," Journal of Information and Communication Technology, vol. 19, no. 2, pp. 225–250, 2020, doi: 10.32890/jict2020.19.2.4.
[2] Aradea, I. Supriana, and K. Surendro, "ARAS: adaptation requirements for adaptive systems," Automated Software Engineering, vol. 30, no. 1, p. 2, 2022, doi: 10.1007/s10515-022-00369-3.
[3] D. Weyns, "Wave VII: Learning from Experience," in An Introduction to Self-adaptive Systems: A Contemporary Software Engineering Perspective, 2020, pp. 201–226, doi: 10.1002/9781119574910.ch10.
[4] A. Mollajan, A. Shahdadi, A. Ashofteh, F. Hamedani-KarAzmoudehFar, and S. H. Iranmanesh, "System Adaptability Enhancement Based on Improving the System Reconfigurability by Modularization of the System Architecture," 2023, doi: 10.2139/ssrn.4519777.
[5] M. Bhadra, D. S. Lopera, R. Kunzelmann, and W. Ecker, "A Model-Driven Architecture Approach to Accelerate Software Code Generation," in 2024 7th International Conference on Software and System Engineering (ICoSSE), Los Alamitos, CA, USA: IEEE Computer Society, Apr. 2024, pp. 23–30, doi: 10.1109/ICoSSE62619.2024.00012.
[6] M. Huisman, J. N. van Rijn, and A. Plaat, "A survey of deep meta-learning," Artificial Intelligence Review, vol. 54, no. 6, pp. 4483–4541, Aug. 2021, doi: 10.1007/s10462-021-10004-4.
[7] T. Gong, X. Zheng, and X. Lu, "Meta Self-Supervised Learning for Distribution Shifted Few-Shot Scene Classification," IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2022.3174277.
[8] P. Terhörst, M. Huber, N. Damer, F. Kirchbuchner, K. Raja, and A. Kuijper, "Pixel-Level Face Image Quality Assessment for Explainable Face Recognition," IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 5, no. 2, pp. 288–297, 2023, doi: 10.1109/TBIOM.2023.3263186.
[9] S. Malakar, W. Chiracharit, and K. Chamnongthai, "Masked Face Recognition With Generated Occluded Part Using Image Augmentation and CNN Maintaining Face Identity," IEEE Access, vol. 12, pp. 126356–126375, 2024, doi: 10.1109/ACCESS.2024.3446652.
[10] H.-I. Kim, K. Yun, and Y. M. Ro, "Face Shape-Guided Deep Feature Alignment for Face Recognition Robust to Face Misalignment," IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 4, no. 4, pp. 556–569, 2022, doi: 10.1109/TBIOM.2022.3213845.
[11] D. Reis, J. Kupec, J. Hong, and A. Daoudi, "Real-Time Flying Object Detection with YOLOv8," May 2023, doi: 10.48550/arXiv.2305.09972.
[12] I. H. Sarker, "Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions," Springer, Nov. 2021, doi: 10.1007/s42979-021-00815-1.
[13] C. Xu, "Applying MLP and CNN on Handwriting Images for Image Classification Task," in 2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), 2022, pp. 830–835, doi: 10.1109/AEMCSE55572.2022.00167.
[14] L. Kurniasari and A. Setyanto, "Sentiment Analysis using Recurrent Neural Network," in Journal of Physics: Conference Series, Institute of Physics Publishing, Mar. 2020, doi: 10.1088/1742-6596/1471/1/012018.
[15] G. Priyadharshini and D. R. Judie Dolly, "Comparative Investigations on Tomato Leaf Disease Detection and Classification Using CNN, R-CNN, Fast R-CNN and Faster R-CNN," in 2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS), 2023, pp. 1540–1545, doi: 10.1109/ICACCS57279.2023.10112860.
[16] D. Soydaner, "A Comparison of Optimization Algorithms for Deep Learning," International Journal of Pattern Recognition and Artificial Intelligence, vol. 34, no. 13, p. 2052013, Dec. 2020, doi: 10.1142/S0218001420520138.
[17] G. L. Sree and R. Baskar, "Performance Analysis of CNN Algorithm in Comparison with LR algorithm for Face Recognition in Smart-Lock," in 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies, 2024, pp. 1–5, doi: 10.1109/TQCEBT59414.2024.10545038.
[18] P. Wei, M. Shang, J. Zhou, and X. Shi, "Efficient adaptive learning rate for convolutional neural network based on quadratic interpolation egret swarm optimization algorithm," Heliyon, vol. 10, no. 18, Sep. 2024, doi: 10.1016/j.heliyon.2024.e37814.
[19] J. Luo and D. Hu, "An Image Classification Method Based on Adaptive Attention Mechanism and Feature Extraction Network," Computational Intelligence and Neuroscience, vol. 2023, no. 1, Jan. 2023, doi: 10.1155/2023/4305594.
[20] Y. Jiang, J. Xie, and D. Zhang, "An Adaptive Offset Activation Function for CNN Image Classification Tasks," Electronics (Switzerland), vol. 11, no. 22, Nov. 2022, doi: 10.3390/electronics11223799.
[21] W. Wu and Y. Pan, "Adaptive Modular Convolutional Neural Network for Image Recognition," Sensors, vol. 22, no. 15, Aug. 2022, doi: 10.3390/s22155488.
[22] W. Guo et al., "A Memristor-Based Adaptive Pooling Network for CNN Optimization," 2023, doi: 10.2139/ssrn.4648000.
[23] H. M. D. Kabir et al., "SpinalNet: Deep Neural Network With Gradual Input," IEEE Transactions on Artificial Intelligence, vol. 4, no. 5, pp. 1165–1177, Oct. 2023, doi: 10.1109/TAI.2022.3185179.
[24] Aradea, I. Supriana, K. Surendro, and I. Darmawan, "Integration of Self-adaptation Approach on Requirements Modeling," in Recent Advances on Soft Computing and Data Mining, T. Herawan, R. Ghazali, N. M. Nawi, and M. M. Deris, Eds., Cham: Springer International Publishing, 2017, pp. 233–243, doi: 10.1007/978-3-319-51281-5_24.
[25] Aradea, I. Supriana, and K. Surendro, "Self-adaptive software modeling based on contextual requirements," Telkomnika (Telecommunication Computing Electronics and Control), vol. 16, no. 3, pp. 1276–1288, 2018, doi: 10.12928/TELKOMNIKA.v16i3.7032.
[26] A. Aradea, R. Rianto, and H. Mubarok, "Inference Model for Self-Adaptive IoT Service Systems," International Journal of Intelligent Engineering and Systems, vol. 14, no. 4, pp. 337–349, 2021, doi: 10.22266/ijies2021.0831.30.
[27] Aradea, Rianto, and H. Mubarok, "Cultivating Service Knowledge Models for IoT-Based Systems Adaptability," Informatica (Slovenia), vol. 46, no. 5, pp. 115–122, 2022, doi: 10.31449/inf.v46i5.3874.
[28] M. Acheli, D. Grigori, and M. Weidlich, "Discovering and Analyzing Contextual Behavioral Patterns From Event Logs," IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5708–5721, 2022, doi: 10.1109/TKDE.2021.3077653.
[29] B. Yang, W. Wu, Y. Liu, and H. Liu, "A Novel Sleep Stage Contextual Refinement Algorithm Leveraging Conditional Random Fields," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–13, 2022, doi: 10.1109/TIM.2022.3154838.
[30] W. Zhao, S. Peng, J. Chen, and R. Peng, "Contextual-Aware Land Cover Classification With U-Shaped Object Graph Neural Network," IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2022.3177778.
[31] L. Deng, "The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012, doi: 10.1109/MSP.2012.2211477.
[32] N. Saqib, K. F. Haque, V. P. Yanambaka, and A. Abdelgawad, "Convolutional-Neural-Network-Based Handwritten Character Recognition: An Approach with Massive Multisource Data," Algorithms, vol. 15, no. 4, Apr. 2022, doi: 10.3390/a15040129.
[33] M. Anwar, H. M. Ali, M. A. Hossain, and A. Mohon, "Recognition of Handwritten Digit using Convolutional Neural Network (CNN)," Global Journal of Computer Science and Technology, 2019, doi: 10.34257/gjcstdvol19is2pg27.
[34] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in 3rd International Conference on Learning Representations (ICLR 2015), San Diego: Computational and Biological Learning Society, 2015, pp. 1–14, doi: 10.48550/arXiv.1409.1556.
[35] P. I. Kaplanoglou and K. Diamantaras, "Learning local discrete features in explainable-by-design convolutional neural networks," arXiv preprint arXiv:2411.00139, Oct. 2024, doi: 10.48550/arXiv.2411.00139.
[36] M. Bukowski, I. Antoniuk, and J. Kurek, "Improved efficient capsule network for Kuzushiji-MNIST benchmark dataset classification," Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 71, no. 6, 2023, doi: 10.24425/bpasts.2023.147338.
[37] P. Cook, D. Jammooa, M. Hjorth-Jensen, D. D. Lee, and D. Lee, "Parametric Matrix Models," arXiv preprint arXiv:2401.11694, Jan. 2024, doi: 10.48550/arXiv.2401.11694.