https://doi.org/10.31449/inf.v48i11.5851                                                                                           Informatica 48 (2024) 113–124     113 
 
Application of New Feature Techniques for Multimedia Analysis in 
Artificial Neural Networks by Using Image Processing 
 
Lianqiu Liu¹*, Yongping Yang², Hong shun Chen²

¹ School of Artificial Intelligence and Big Data, Chongqing College of Finance and Economics, Chongqing, China, 402160
² Department of Quality Monitoring and Development Planning, Hunan University of Humanities, Science and Technology, Loudi, Hunan, 417000, China
E-mail: cqcfe007@163.com
* Corresponding author
 
Keywords: kernel principal component analysis (KPCA), multimedia analysis, artificial neural networks (ANN), image 
processing 
 
Received: March 6, 2024
To evaluate and extract information from multimedia material, including photos, videos, and audio, kernel principal component analysis (KPCA) is combined with image processing and computer vision methods. The objective is to create a system that automatically identifies relevant aspects of multimedia data and makes them available for analysis and decision-making. By merging several processing methods, the fusion of image processing and multimedia analysis can improve analysis efficiency. In this paper, we propose kernel principal component analysis with artificial neural networks (KPCA-ANN) for multimedia analysis combined with image processing, and we use a CT scan dataset from Kaggle to train the artificial neural networks (ANN). Integrating image processing into multimedia analysis holds considerable promise for processing, analyzing, and understanding multimedia data: it can improve decision-making, deepen our understanding of complex processes, and provide more effective means of information exchange. The experimental results show that the proposed strategy achieves a mean absolute error (MAE) of 7% and a structural similarity index measure (SSIM) of 86%.
Summary: The proposed method based on KPCA-ANN improves the analysis of multimedia data by extracting hidden features and making accurate predictions, which increases the efficiency and quality of decision-making.
 
1   Introduction 
In recent years, multimodal image fusion has risen to prominence in medical technology. A single piece of medical equipment can produce only single-modal images, which carry limited information. To make an accurate diagnosis, doctors frequently require many multimodal images. Using multimodal images directly increases the burden of diagnosis as well as the likelihood of interference and misdiagnosis. Medical professionals have found that fusion algorithms are an effective way to combine the large amounts of data contained in multimodal images.
Multimodal medical imaging is increasingly used, with magnetic resonance imaging (MRI) and computed tomography (CT) being the most common examples [1]. Martial arts reflect the evolution of culture and civilization and have an illustrious history. As part of a culture, they are handed down from generation to generation; in addition to being a form of athletic competition, they are also a symbol of traditional culture. When communicating with the international community, a nation needs to project a positive image of itself, and every nation typically has some form of culture that represents it in international interactions [2]. According to statistics, the yearly death and incidence rate of brain cancer is a third higher than that of stomach cancer and lung cancer combined. Each year, over 80,000 people die of brain cancer, which accounts for approximately 22.5% of all brain cancer cases worldwide. The number of people diagnosed with brain cancer has risen steadily over the past few decades, particularly in Asian nations [3]. Because of limitations in imaging technology, it is impossible to obtain a picture in which all subjects are sharply focused at the same time. More specifically, with an optical or convex lens at a fixed focal setting, only objects that fall within the narrow depth of field (DOF) are captured in focus, while objects outside the DOF are blurred [4]. Unlike English, written Chinese has no natural, explicit word boundaries or separators between words. One approach to entity recognition in Mandarin therefore first segments sentences into word sequences and then uses those sequences as the basis for entity recognition [5]. Medical imaging is another promising field in which neural networks can play an important role in addressing problems and offering solutions, and it has considerable potential for further development [6] [7]. With the
ongoing advancement of both information processing technology and medical imaging technology, a wide variety of medical images is now available for clinical diagnosis. Medical imaging comprises various modalities, each created with different imaging principles and technologies. These images convey different information about the same human tissues and organs, although all of them are created for medical purposes. Computed tomography (CT) scans, which have a spatial resolution of less than one millimeter and are used extensively in a wide range of clinical assessments and computer-aided diagnoses, provide clear anatomical information about human bone structure [8]. Image enhancement refers to increasing the visibility of specific information in an image according to a particular requirement, while suppressing information regarded as superfluous and improving the image's overall quality. The goal of image enhancement is to draw attention to particular aspects of an image so that the processed image is better suited to both human visual perception and computer analysis, enabling more sophisticated image processing and analysis [9]. A robust artificial neural network (ANN) must be trained on a broad dataset to ensure that it can properly analyze all kinds of images. In general, although ANNs have the potential to significantly enhance brain CT image processing, the possible drawbacks and limitations of the technology must be carefully assessed [10].
The goal of data hiding in [11] is to embed a significant amount of sensitive information, including text, audio, still and moving images, and motion, in visual media. Image-based data hiding has been a crucial subject in penetration testing, and active deep image embedding techniques have been proposed for concealing data. To embed a hidden signal in the original image, least significant bit (LSB) steganography, a data encryption technique, is recommended. The paper [12]
separates the images in interactive multimedia embedded equipment into high-energy and low-frequency components using principal component analysis, and then develops a representation for multimodal sequencing methods based on the Gabor filter. Classification performs best when four levels of wavelet decomposition are used, and with a cubic B-spline wavelet the prediction accuracy is 89.08%. The work in [13] aims to improve pattern recognition for the analysis of multimedia data. Several pattern identification methods can be used to put this into practice. One of them applies latent semantic indexing (LSI) and a Euclidean-distance vector space model (VSM) to textual data, depending on the type of query syntax. The alternative approach retrieves similar features using wavelet-based segmentation and statistical measures such as the mean, sample variance, and signal energy. The paper [14] provides a thorough overview of the
techniques used in visual recognition, highlighting the most significant developments and offering insight into potential future research. By examining the connection between pattern recognition and image processing problems further, we may learn more about what makes deep learning effective and develop new deep models and training techniques. According to [15], image-based individual works have been used extensively in computer vision and image processing. The study in [16] evaluates three major image detection algorithms: Single Shot Detection (SSD), You Only Look Once (YOLO), and Faster Region-based Convolutional Neural Networks (Faster R-CNN). It finds that YOLO and SSD perform best of the three on the evaluated metrics. The paper [17] examines important characteristics of sensory perception that are necessary for intelligent multimedia processing. Its goal is to demonstrate that neural networks are an essential component of the following multimedia functionalities: effective generalization of audio/visual information, detection and classification methods, fusion of multimodal signals, and multimodal regeneration and synchronization. It also shows how adaptive neural network technology offers a uniform solution for a wide range of multimedia applications, and it presents relevant examples of neural networks applied effectively to intelligent multimedia processing. This research
[18] offers a neural network-based method for multimedia assurance. Artificial neural networks are well suited to this task because they offer strong security, low variance, and the ability to operate with asymmetric input parameters, which matters because encryption is a one-way function. The approach has applications in battlefield image database sharing, medical imaging, and private video streaming, among other fields. The study in [19] proposes a novel data-augmented transfer learning framework consisting of a pre-trained VGG16 model tuned with a support vector machine (SVM) and enriched training data. Two Twitter image collections serve as the initial use cases for testing and fine-tuning. In the findings, VGG16-SVM achieves the highest overall accuracy and recall. The paper [20] notes that early tampering detectors assumed an image had undergone a single manual manipulation, whereas in practice a real image may be altered using a variety of manipulation methods. Detection becomes harder when several tampering operations are applied to an image and post-processing is used to remove the traces they leave behind. Newer approaches for detecting image alteration rely on convolutional neural networks.
Table 1 summarizes the related research papers.
Table 1: Summary of related research
| Reference | Key metrics | Method | Findings |
|---|---|---|---|
| [11] | Accuracy | Least significant bit (LSB) steganography is used to embed a secret message in an original image. | The experiment demonstrates the model's resilience and efficiency compared with other state-of-the-art systems, with an accuracy of 95.1%. |
| [12] | Accuracy | A four-layer wavelet transform separates the image into high-energy and low-frequency components. | Four layers of wavelet decomposition yield the best classification result, with an accuracy of 89.08%. |
| [13] | Training time and precision curve | An ANN is used to improve a multi-faceted data retrieval model. | In the training phase, the curvelet-based neural network takes longer to train than neural networks based on DWT or histogram features. |
| [16] | Accuracy, F1 score, and precision | Single Shot Detection (SSD), You Only Look Once (YOLO), and Faster Region-based Convolutional Neural Networks (Faster R-CNN). | YOLO and SSD perform best of the three algorithms. |
| [18] | Encryption and decryption | A neural-network-based approach to multimedia assurance, with a neural-network cryptosystem. | A training approach for multi-layer neural networks is available for keeping multimedia data secure. |
| [19] | Accuracy and recall | A data-augmented transfer learning architecture based on a pre-trained VGG16 model, tuned with an SVM and additional training data. | VGG16-SVM gives the best overall accuracy (94%) and recall (96%). |
2   Materials and methods 
In this study, we propose kernel principal component analysis with artificial neural networks (KPCA-ANN) for multimedia analysis using image processing. A large dataset collected for the TianChi contest was used to train the ANNs for multimedia analysis. KPCA is a popular method for extracting features from images. The research procedure is shown in Figure 1.
 
2.1 Dataset 
We collect the CT scan dataset from Kaggle (https://www.kaggle.com/datasets/trainingdatapro/computed-tomography-ct-of-the-brain). We use 80% of the data for training and 20% for testing. The best models attained high accuracy rates and showed promise for improving the early identification and diagnosis of brain disorders.
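The 80/20 split can be reproduced with a seeded random permutation. The paper does not say how the split was drawn (random, stratified, or by patient), so this NumPy sketch assumes a simple random split by image; the function name `split_indices` and the seed value are illustrative.

```python
import numpy as np


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Randomly partition sample indices into a training set and a test set.

    Assumption: images are split independently. If several slices come from the
    same patient, splitting by patient instead avoids train/test leakage.
    """
    rng = np.random.default_rng(seed)          # fixed seed -> reproducible split
    order = rng.permutation(n_samples)         # shuffle all indices once
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```

Because every index appears exactly once in the permutation, the two sets are disjoint and together cover the whole dataset.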
 
 
Figure 1: KPCA-ANN flow diagram 
 
 
2.2 Pre-processing  
In multimedia analysis, particularly in image processing, data normalization is a crucial pre-processing step. Normalization scales or transforms data into a common range or format so that it is easier to study and compare. In image processing, normalization can be used to adjust an image's brightness, contrast, and color levels, which improves its visual quality and makes features easier to extract. Min-max normalization is widely used in image processing and multimedia analysis. It maps the pixel values of an image to a new range, typically [0, 1]: the minimum pixel value is subtracted from each pixel, and the result is divided by the range of pixel values (the maximum minus the minimum).
Min-max normalization ensures that all images in a dataset share the same range of pixel values, which is useful for many kinds of multimedia research. For instance, when photos are analyzed with machine learning algorithms, normalizing the input data is crucial for removing potential sources of error. Min-max normalization can also stretch an image's contrast and dynamic range, making critical details easier to identify.
The min-max normalization formula is as follows:
$$\text{pixel\_new} = \frac{\text{pixel\_old} - \text{min\_pixel\_value}}{\text{max\_pixel\_value} - \text{min\_pixel\_value}} \tag{1}$$
where pixel_old is the original pixel value, min_pixel_value and max_pixel_value are the minimum and maximum pixel values of the image, and pixel_new is the new, normalized pixel value. Applying this equation to every pixel produces the normalized image.
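Equation (1) translates into a few lines of NumPy. This is a minimal sketch; the function name is illustrative, and returning all zeros for a constant image (where the denominator of Eq. (1) is zero) is an assumption, since the paper does not cover that case.

```python
import numpy as np


def min_max_normalize(image):
    """Apply Eq. (1) to every pixel, mapping the image into [0, 1]."""
    # Cast first: subtracting in uint8 would wrap around instead of going negative.
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: Eq. (1) would divide by zero, so return zeros (assumption).
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


ct_slice = np.array([[0, 50], [100, 200]], dtype=np.uint8)
print(min_max_normalize(ct_slice))
```

The explicit cast to floating point matters for 8-bit CT slices: without it, `img - lo` would be computed in unsigned integer arithmetic.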
2.3 Feature extraction using KPCA 
The fundamental idea of KPCA is simple. PCA gives accurate results when the variation in a data set is essentially linear. When the variation is nonlinear, the data can be mapped into a higher-dimensional feature space in which its structure becomes linear. Cover's theorem states that a nonlinear mapping into a high-dimensional space is more likely to produce linearly structured (separable) data than the original input space. By defining this nonlinear mapping from the input space to the feature space implicitly through a simple kernel function, KPCA provides a computationally tractable solution; in effect, it performs a nonlinear PCA on the input space.
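The intuition behind Cover's theorem can be illustrated with two concentric rings: no straight line in the plane separates them, but after the explicit map Φ(x1, x2) = (x1, x2, x1² + x2²) a single threshold on the third coordinate does. This is a toy example with a hand-picked map and synthetic data, not part of the paper's pipeline; KPCA avoids computing such a map explicitly by working with a kernel.

```python
import numpy as np


def lift(X):
    """Illustrative explicit map Phi(x1, x2) = (x1, x2, x1^2 + x2^2)."""
    return np.column_stack([X[:, 0], X[:, 1], (X ** 2).sum(axis=1)])


angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([1.0 * ring, 3.0 * ring])   # inner ring (class 0), outer ring (class 1)
y = np.array([0] * 40 + [1] * 40)

# In the plane the outer ring surrounds the inner one, so no straight line separates
# them. In the lifted space the third coordinate is 1 for class 0 and 9 for class 1,
# so the plane z = 5 separates the classes perfectly.
Z = lift(X)
pred = (Z[:, 2] > 5.0).astype(int)
print((pred == y).mean())  # 1.0
```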
To apply KPCA to image processing, each image must be represented as a high-dimensional feature vector. Consider a set of samples $x_k \in \mathbb{R}^m$, $k = 1, \dots, N$. To decouple the nonlinear correlations among them, PCA diagonalizes the covariance matrix not in the input space but in a linear feature space F, where it is expressed as
$$C^F = \frac{1}{N}\sum_{j=1}^{N}\Phi(x_j)\Phi(x_j)^{T} \tag{2}$$
where the input vectors are assumed to be mapped from the input space to F by the nonlinear mapping function Φ(·) and to be centered, i.e., $\sum_{k=1}^{N}\Phi(x_k) = 0$. Diagonalizing the covariance matrix requires solving the eigenvalue problem in feature space:
$$\lambda v = C^F v \tag{3}$$
where $\lambda \ge 0$ are the eigenvalues and $v \in F \setminus \{0\}$ are the corresponding eigenvectors. The first principal component (PC) in F is the eigenvector v with the largest eigenvalue in equation (3), and the last PC is the eigenvector with the smallest eigenvalue. $C^F v$ may be written as
$$C^F v = \left(\frac{1}{N}\sum_{j=1}^{N}\Phi(x_j)\Phi(x_j)^{T}\right)v = \frac{1}{N}\sum_{j=1}^{N}\langle\Phi(x_j), v\rangle\,\Phi(x_j) \tag{4}$$
where ⟨x, y⟩ denotes the dot product of x and y. Equation (4) implies that every solution v with λ ≠ 0 lies in the span of Φ(x_1), …, Φ(x_N). The eigenvalue problem $\lambda v = C^F v$ is therefore equivalent to
$$\lambda\langle\Phi(x_k), v\rangle = \langle\Phi(x_k), C^F v\rangle, \quad k = 1, \dots, N \tag{5}$$
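Furthermore, there exist coefficients $\alpha_i$ ($i = 1, \dots, N$) such that
$$v = \sum_{i=1}^{N}\alpha_i\Phi(x_i) \tag{6}$$
Combining equations (5) and (6) gives
$$\lambda\sum_{i=1}^{N}\alpha_i\langle\Phi(x_k),\Phi(x_i)\rangle = \frac{1}{N}\sum_{i=1}^{N}\alpha_i\sum_{j=1}^{N}\langle\Phi(x_k),\Phi(x_j)\rangle\langle\Phi(x_j),\Phi(x_i)\rangle \tag{7}$$
The eigenvalue problem in equation (7) involves only dot products of mapped vectors in feature space, for all k = 1, …, N. The mapping Φ(·) exists, but evaluating it explicitly is often computationally intractable. Because only dot products appear, they can be replaced by a kernel function $k(x_i, x_j) = \langle\Phi(x_i), \Phi(x_j)\rangle$. Defining the N × N kernel matrix $K_{ij} = k(x_i, x_j)$ and $\alpha = (\alpha_1, \dots, \alpha_N)^T$, equation (7) becomes $N\lambda K\alpha = K^2\alpha$, whose relevant solutions satisfy $N\lambda\alpha = K\alpha$.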
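In practice, KPCA is computed from the N × N kernel matrix with entries k(x_i, x_j): after the matrix is centered in feature space, its eigenvectors give the expansion coefficients α_i of each eigenvector v, and the projection of a sample onto v is a weighted sum of kernel values. The NumPy sketch below assumes a Gaussian (RBF) kernel with an illustrative gamma; the paper does not name its kernel or hyperparameters, and the function names are hypothetical. A linear kernel is included as a sanity check, because with Φ equal to the identity KPCA must reproduce ordinary PCA up to the sign of each component.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) for all row pairs."""
    d2 = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def linear_kernel(X, Y):
    """k(x, y) = <x, y>: Phi is the identity, so KPCA reduces to ordinary PCA."""
    return X @ Y.T


def kernel_pca(X, n_components, kernel):
    """Project the rows of X onto the leading kernel principal components.

    Returns the projections (N x n_components) and the eigenvalues lambda of C^F.
    """
    N = X.shape[0]
    K = kernel(X, X)
    # Centering K in feature space enforces sum_k Phi(x_k) = 0.
    ones = np.full((N, N), 1.0 / N)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones
    # Eigenproblem Kc alpha = (N * lambda) alpha; eigh sorts eigenvalues ascending.
    mu, a = np.linalg.eigh(Kc)
    top = np.argsort(mu)[::-1][:n_components]
    mu, a = mu[top], a[:, top]
    # Rescale so that v = sum_i alpha_i Phi(x_i) has unit length in F.
    alpha = a / np.sqrt(np.maximum(mu, 1e-12))
    # Projection of x_k onto v is sum_i alpha_i <Phi(x_k), Phi(x_i)> = (Kc @ alpha)[k].
    return Kc @ alpha, mu / N


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3)) * np.array([3.0, 1.0, 0.2])
Z_lin, lam_lin = kernel_pca(X, 2, linear_kernel)
Z_rbf, lam_rbf = kernel_pca(X, 2, lambda A, B: rbf_kernel(A, B, gamma=0.1))
print(Z_lin.shape, Z_rbf.shape)
```

The division by the square root of each eigenvalue is what turns the unit-norm eigenvectors returned by `eigh` into coefficients α for unit-length eigenvectors in feature space.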
2.4 Classification using ANN 
ANNs are computational models inspired by the biological neural networks of the brain. They consist of stacked layers of interconnected processing nodes called artificial neurons, or simply neurons. Each neuron receives signals from other neurons, applies a mathematical function to them, and passes the result on to neurons in the next layer or, in the output layer, emits it as the network's output. Algorithm 1 shows the ANN used for image processing. In the pseudocode, numInputNeurons, numHiddenNeurons, and numOutputNeurons are the numbers of neurons in the input, hidden, and output layers, respectively. trainingData is a collection of annotated images used to train the network, and the training procedure is repeated for numEpochs epochs. For each input image, desiredOutput specifies the output the network should produce. createLayer() creates a new layer with the given number of neurons, and ConnectLayers() establishes weighted connections between all neurons of two layers. SetInputValues() loads an input image into the input layer, and ForwardPropagate() computes the network's output values. SetOutputError() computes the error of each neuron in the output layer, backwardPropagate() propagates this error backward through the network, and updateWeights() adjusts the connection weights. After an input image has been processed, getOutputValues() returns the values of the output layer.
Algorithm 1: Artificial neural network 
// Initialize the neural network
inputLayer = createLayer(numInputNeurons)
hiddenLayer = createLayer(numHiddenNeurons)
outputLayer = createLayer(numOutputNeurons)
ConnectLayers(inputLayer, hiddenLayer)
ConnectLayers(hiddenLayer, outputLayer)
// Train the neural network
For each epoch in numEpochs:
    For each (image, desiredOutput) in trainingData:
        // Forward propagation
        SetInputValues(inputLayer, image)
        ForwardPropagate(inputLayer)
        // Backpropagation
        SetOutputError(outputLayer, desiredOutput)
        backwardPropagate(outputLayer)
        // Update weights
        updateWeights(hiddenLayer)
        updateWeights(outputLayer)
// Process an image with the trained network
SetInputValues(inputLayer, image)
ForwardPropagate(inputLayer)
result = getOutputValues(outputLayer)
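Algorithm 1 can be turned into a small runnable NumPy sketch. The paper does not specify activation functions, loss, learning rate, or initialization, so this version assumes a sigmoid hidden layer, a softmax output trained with cross-entropy, per-image stochastic gradient descent, and toy 2-D vectors standing in for image features; the class name `SimpleANN` is hypothetical. Comments map each step to the corresponding function in Algorithm 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


class SimpleANN:
    """One hidden layer; mirrors createLayer/ConnectLayers in Algorithm 1."""

    def __init__(self, num_input, num_hidden, num_output, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # ConnectLayers: fully connected weights between consecutive layers.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(num_input), (num_hidden, num_input))
        self.b1 = np.zeros(num_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(num_hidden), (num_output, num_hidden))
        self.b2 = np.zeros(num_output)
        self.lr = lr

    def forward(self, x):
        # SetInputValues + ForwardPropagate
        self.h = sigmoid(self.W1 @ x + self.b1)
        return softmax(self.W2 @ self.h + self.b2)

    def train_step(self, x, desired_output):
        y = self.forward(x)
        # SetOutputError: cross-entropy gradient w.r.t. the output pre-activations
        delta_out = y - desired_output
        # backwardPropagate: send the error back through W2 to the hidden layer
        delta_hidden = (self.W2.T @ delta_out) * self.h * (1.0 - self.h)
        # updateWeights for the output and hidden layers (plain SGD)
        self.W2 -= self.lr * np.outer(delta_out, self.h)
        self.b2 -= self.lr * delta_out
        self.W1 -= self.lr * np.outer(delta_hidden, x)
        self.b1 -= self.lr * delta_hidden

    def fit(self, images, desired_outputs, num_epochs):
        for _ in range(num_epochs):
            for x, d in zip(images, desired_outputs):
                self.train_step(x, d)
        return self

    def predict(self, images):
        # getOutputValues, reduced to the index of the most probable class
        return np.array([int(np.argmax(self.forward(x))) for x in images])


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
labels = np.array([0] * 100 + [1] * 100)
order = rng.permutation(200)
X, labels = X[order], labels[order]
net = SimpleANN(num_input=2, num_hidden=8, num_output=2).fit(X, np.eye(2)[labels], num_epochs=20)
print((net.predict(X) == labels).mean())
```

The hidden-layer error is computed before W2 is updated, so backpropagation uses the same weights as the forward pass.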
2.5 Hybrid of KPCA-ANN 
The hybrid model that combines artificial neural networks (ANN) with kernel principal component analysis (KPCA) offers a strong foundation for feature extraction and training. The feature extraction stage of this architecture uses KPCA to capture nonlinear relationships in the data efficiently: KPCA maps the input data into a high-dimensional feature space, and only the most informative principal components are retained. These extracted features are then passed to the ANN stage, which normally consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the reduced-dimensional features generated by KPCA as the network's input. Hidden layers, for example convolutional layers and fully connected layers, learn intricate patterns and representations from the data. By combining KPCA for efficient feature extraction with ANN for consistent training, this hybrid approach provides an adaptable structure for problems ranging from dimensionality reduction to regression and classification in a number of domains.
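The two stages can be chained as follows. This NumPy sketch is an illustration under stated assumptions, not the authors' implementation: it assumes an RBF kernel with gamma = 0.1, five retained components, and a binary task learned by a network with one tanh hidden layer and a sigmoid output trained by full-batch gradient descent. The names `KernelPCAFeatures`, `train_mlp`, and `predict_mlp` are hypothetical. The key detail is that KPCA is fitted on training data only, and new samples are projected by centering their kernel rows with the training statistics.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


class KernelPCAFeatures:
    """KPCA stage: keep only the leading n_components nonlinear components."""

    def __init__(self, n_components, gamma):
        self.n_components, self.gamma = n_components, gamma

    def fit(self, X):
        self.X_fit = X
        K = rbf_kernel(X, X, self.gamma)
        self.col_mean, self.total_mean = K.mean(axis=0), K.mean()
        Kc = K - self.col_mean[None, :] - self.col_mean[:, None] + self.total_mean
        mu, a = np.linalg.eigh(Kc)                        # ascending eigenvalues
        top = np.argsort(mu)[::-1][: self.n_components]
        self.alpha = a[:, top] / np.sqrt(np.maximum(mu[top], 1e-12))
        return self

    def transform(self, X_new):
        # Center the new kernel rows with the training statistics, then project.
        Kn = rbf_kernel(X_new, self.X_fit, self.gamma)
        Knc = Kn - Kn.mean(axis=1)[:, None] - self.col_mean[None, :] + self.total_mean
        return Knc @ self.alpha


def train_mlp(Z, y, hidden=8, lr=0.5, epochs=500, seed=0):
    """ANN stage: one tanh hidden layer, sigmoid output, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(Z.shape[1]), (Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ w2 + b2)))
        g = (p - y) / len(y)                    # gradient of mean cross-entropy w.r.t. logits
        gH = np.outer(g, w2) * (1.0 - H ** 2)   # backpropagated to the hidden layer
        w2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (Z.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, w2, b2


def predict_mlp(params, Z):
    W1, b1, w2, b2 = params
    return ((np.tanh(Z @ W1 + b1) @ w2 + b2) > 0.0).astype(int)


rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
order = rng.permutation(200)
X, y = X[order], y[order]

kpca = KernelPCAFeatures(n_components=5, gamma=0.1).fit(X[:150])   # training data only
Z_train, Z_test = kpca.transform(X[:150]), kpca.transform(X[150:])
params = train_mlp(Z_train, y[:150])
test_accuracy = (predict_mlp(params, Z_test) == y[150:]).mean()
print(Z_train.shape, test_accuracy)
```

Fitting KPCA on the test images as well would leak test information into the features, which is why the transform reuses the training kernel's means.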
 
 
3   Result and discussion 
The quality of the proposed KPCA-ANN strategy is investigated by comparing and evaluating the outcomes. To show its effectiveness, the precision and effectiveness of the proposed method are compared with those of contemporary methods: the weighted feature fusion of convolutional neural network and graph attention network (WFCG) [23], a convolutional neural network (CNN) [21], and the artificial intelligence fusion model for colorectal cancer (AIFM-CRC) [22]. The results report the root mean square error (RMSE), peak signal-to-noise ratio (PSNR), mean absolute error (MAE), structural similarity index measure (SSIM), and entropy of each approach.
Peak signal-to-noise ratio (PSNR) measures the similarity between the reference image and the fused image and is defined as
$$PSNR = 10 \times \log_{10}\left(\frac{255^2}{MSE}\right) = 20 \times \log_{10}\left(\frac{255}{RMSE}\right) \tag{8}$$
where RMSE is the root mean square error between the reference and fused images (MSE = RMSE²). A higher PSNR indicates a better fused image. The results are shown in Figure 2.
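A sketch of the PSNR computation for 8-bit images (MAX = 255), using the standard decibel definition PSNR = 20·log10(MAX/RMSE), which equals 10·log10(MAX²/MSE). The function name and the choice to return infinity for identical images are assumptions for illustration.

```python
import numpy as np


def psnr(reference, fused, max_val=255.0):
    """PSNR in dB: 20*log10(MAX / RMSE), equivalently 10*log10(MAX**2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    img = np.asarray(fused, dtype=np.float64)
    rmse = np.sqrt(np.mean((ref - img) ** 2))
    if rmse == 0.0:
        return float("inf")  # identical images: no noise at all
    return float(20.0 * np.log10(max_val / rmse))


print(psnr(np.zeros((4, 4)), np.ones((4, 4))))  # RMSE = 1, so PSNR = 20*log10(255)
```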
 
 
Figure 2: Peak Signal-to-noise ratio 
The proposed method achieves a PSNR of 92%, whereas CNN obtains 86%, AIFM-CRC 84%, and WFCG 88%. As Table 2 shows, the proposed strategy achieves the highest PSNR of the compared methods.
Table 2: Peak signal-to-noise ratio
| Method | PSNR (%) |
|---|---|
| CNN | 86 |
| AIFM-CRC | 84 |
| WFCG | 88 |
| KPCA-ANN [Proposed] | 92 |
 
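Root mean square error (RMSE) is the square root of the average squared difference between the predicted and observed values:
$$RMSE = \sqrt{\frac{1}{b}\sum_{u=1}^{b}\left(W_{o,u} - W_{i,u}\right)^2} \tag{9}$$
where $W_{o,u}$ and $W_{i,u}$ are the observed and estimated values of sample u, and b is the number of samples.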
 
 
Figure 3: Root mean square error 
Figure 3 shows the RMSE of the existing systems and the proposed system. In the last row of Table 3, CNN has an RMSE of 17%, AIFM-CRC 19%, and WFCG 20%, whereas the proposed system attains 12%. Because a lower RMSE indicates a smaller error, the proposed approach is more effective than the existing ones, as shown in Table 3.
Table 3: Root mean square error (values in %)
| No. | CNN | AIFM-CRC | WFCG | KPCA-ANN [Proposed] |
|---|---|---|---|---|
| 1 | 5 | 15 | 22 | 5 |
| 2 | 7 | 10 | 25 | 9 |
| 3 | 9 | 12 | 21 | 10 |
| 4 | 11 | 13 | 27 | 7 |
| 5 | 17 | 19 | 20 | 12 |
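Equation (9) is a one-liner in NumPy. This is a minimal sketch with an illustrative function name; `observed` and `estimated` play the roles of W_o and W_i.

```python
import numpy as np


def rmse(observed, estimated):
    """Eq. (9): square root of the mean squared difference between W_o and W_i."""
    w_o = np.asarray(observed, dtype=np.float64)
    w_i = np.asarray(estimated, dtype=np.float64)
    return float(np.sqrt(np.mean((w_o - w_i) ** 2)))


print(rmse([1, 2, 3], [1, 2, 5]))  # sqrt(4 / 3)
```

Because the errors are squared before averaging, a single large deviation (here 2) dominates the result more than it would in the MAE.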
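Entropy (E) measures the amount of information in an image and is defined as
$$E = -\sum_{j=0}^{L-1} p_j \log_2 p_j \tag{10}$$
where L is the number of gray levels and $p_j$ is the probability (normalized histogram count) of gray level j.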
 
 
Figure 4: Entropy 
Figure 4 shows the entropy results: the proposed method attains 85%, whereas CNN obtains 76%, AIFM-CRC 80%, and WFCG 77%. As Table 4 shows, the proposed strategy achieves the highest entropy of the compared methods.
Table 4: Entropy
| Method | Entropy |
|---|---|
| CNN | 76 |
| AIFM-CRC | 80 |
| WFCG | 77 |
| KPCA-ANN [Proposed] | 85 |
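Image entropy is computed from the normalized gray-level histogram. This sketch assumes an 8-bit image (L = 256 levels) with non-negative integer pixel values; the function name is illustrative, and empty histogram bins are skipped because 0·log2(0) is taken as 0.

```python
import numpy as np


def image_entropy(image, levels=256):
    """E = -sum_j p_j * log2(p_j) over the gray levels j = 0..levels-1."""
    # Assumes non-negative integer gray levels (e.g. uint8), as bincount requires.
    gray = np.asarray(image).astype(np.int64).ravel()
    counts = np.bincount(gray, minlength=levels)
    p = counts / counts.sum()          # normalized histogram p_j
    p = p[p > 0]                       # 0 * log2(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))


print(image_entropy(np.arange(256, dtype=np.uint8)))  # every level once -> 8 bits
```

A constant image has entropy 0, and an image in which all 256 levels are equally frequent reaches the maximum of 8 bits.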
 
Mean absolute error (MAE) is the average absolute difference between the classifier's predicted values and the actual values:
$$MAE = \frac{1}{b}\sum_{u=1}^{b}\left|W_{o,u} - W_{i,u}\right| \tag{11}$$
 
 
 
Figure 5: Mean absolute error 
Figure 5 shows the MAE of the existing systems and the proposed system. In equation (11), $W_i$ is the classifier's estimated value for a sample in the supplied data, $W_o$ is the observed value, and b is the number of samples. CNN has an MAE of 15%, AIFM-CRC 18%, and WFCG 10%, whereas the proposed system attains 7%. As Table 5 shows, the proposed approach has the lowest MAE of the compared methods.
Table 5: Mean absolute error
| Method | MAE (%) |
|---|---|
| CNN | 15 |
| AIFM-CRC | 18 |
| WFCG | 10 |
| KPCA-ANN [Proposed] | 7 |
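A sketch of the MAE with an illustrative function name. The absolute value is taken per sample before averaging; taking it only after summing would let positive and negative errors cancel, as the second line of the usage example shows.

```python
import numpy as np


def mae(observed, estimated):
    """Mean of |W_o - W_i| over the b samples (absolute value taken per sample)."""
    w_o = np.asarray(observed, dtype=np.float64)
    w_i = np.asarray(estimated, dtype=np.float64)
    return float(np.mean(np.abs(w_o - w_i)))


# Taking the absolute value only after summing would let errors cancel:
errors_cancel = abs(np.mean(np.array([0.0, 0.0]) - np.array([1.0, -1.0])))
print(mae([0, 0], [1, -1]), errors_cancel)  # 1.0 versus 0.0
```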
 
The Structural Similarity Index Measure (SSIM) is a popular metric in image processing for determining how similar two images are:
$$SSIM(a, b) = l(a, b) \cdot c(a, b) \cdot s(a, b) \tag{12}$$
where l(a, b) is the luminance term, c(a, b) is the contrast term, s(a, b) is the structural term, and a and b are the two images being compared.
 
 
Figure 6: Structural similarity index measure 
Figure 6 shows the SSIM results: the proposed method achieves 86%, whereas CNN attains 73%, AIFM-CRC 79%, and WFCG 74%. As Table 6 shows, the proposed approach achieves the highest structural similarity of the compared methods.
Table 6: Structural similarity index measure
| Method | SSIM (%) |
|---|---|
| CNN | 73 |
| AIFM-CRC | 79 |
| WFCG | 74 |
| KPCA-ANN [Proposed] | 86 |
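A sketch of Eq. (12), assuming the standard SSIM component definitions with constants C1 = (0.01·L)², C2 = (0.03·L)², and C3 = C2/2 for dynamic range L = 255. For brevity it computes the statistics once over the whole image; the widely used formulation instead averages SSIM over local (typically Gaussian-weighted) windows, and the paper does not state which variant it uses. The function name is illustrative.

```python
import numpy as np


def global_ssim(a, b, data_range=255.0, k1=0.01, k2=0.03):
    """SSIM(a, b) = l(a, b) * c(a, b) * s(a, b), computed once over the whole image."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    c3 = c2 / 2.0
    mu_a, mu_b = a.mean(), b.mean()
    sd_a, sd_b = a.std(), b.std()
    cov_ab = np.mean((a - mu_a) * (b - mu_b))
    luminance = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    contrast = (2 * sd_a * sd_b + c2) / (sd_a ** 2 + sd_b ** 2 + c2)
    structure = (cov_ab + c3) / (sd_a * sd_b + c3)
    return float(luminance * contrast * structure)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
noisy = np.clip(img + rng.normal(0.0, 30.0, img.shape), 0, 255)
print(global_ssim(img, img), global_ssim(img, noisy))
```

Each of the three terms is at most 1 and equals 1 only for identical statistics, so SSIM reaches 1 exactly when an image is compared with itself.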
 
3.1 Discussion 
We compared the proposed method with existing methods, including CNN, AIFM-CRC, and WFCG. With CNN, it can be difficult to reach an exact decision because the model lacks clear interpretability. CNN attained PSNR (86%), RMSE (17%), entropy (76%), MAE (15%), and SSIM (73%). The drawbacks of AIFM-CRC include its limited ability to generalize to a broad spectrum of colorectal cancer (CRC) patients and its reliance on the diversity and quality of the input data, which could affect how well the model performs across patient populations or disease stages. AIFM-CRC achieved PSNR (84%), RMSE (19%), entropy (80%), MAE (18%), and SSIM (79%). The processing constraints of WFCG and its potential difficulty in handling complex hyperspectral data limit its scalability and real-time use. WFCG achieved PSNR (88%), RMSE (20%), entropy (77%), MAE (10%), and SSIM (74%). Our proposed method overcomes these limitations and achieves the best result on every metric: PSNR (92%), RMSE (12%), entropy (85%), MAE (7%), and SSIM (86%).
 
3.2 Practical implication 
The results have important implications for medical imaging, making improved diagnostic effectiveness and precision possible in hospitals. Promising practical uses include tailored treatment planning, disease tracking, and computer-aided diagnosis, all of which could eventually lead to better patient outcomes and healthcare provision.
4   Conclusion 
In this study, we propose kernel principal component analysis with artificial neural networks (KPCA-ANN) for multimedia analysis combined with image processing. The aim of image analysis is to extract useful data from images by breaking them down into their constituent parts. Images are a crucial information carrier because of their intuitive visuals and rich content, and they play a crucial role in conveying meaning. Image processing is now a staple of multimedia processing software. The ANN for multimedia analysis was trained on a large dataset collected from the TianChi contest. The experimental results show that the proposed strategy outperforms the existing approaches on PSNR, RMSE, entropy, MAE, and SSIM. However, the computational cost of integrating KPCA and ANN and the sensitivity to the choice of kernel function may limit scalability and the appropriate representation of information, which could restrict the generalization and interpretability of the proposed multimedia analysis system. To manage larger and more diverse multimedia datasets efficiently, future research could focus on strengthening the adaptability and generalization of the KPCA-ANN approach, as well as its robustness to variability in data features.
 
References 
[1] Su, Y., Tian, J. and Zan, X., 2022. The research of 
Chinese martial arts cross-media communication 
system based on deep neural network. 
Computational Intelligence and Neuroscience, 
2022. 
[2] Wan, Z., Dong, Y., Yu, Z., Lv, H. and Lv, Z., 2021. 
Semi-supervised support vector machine for digital 
twins-based brain image fusion. Frontiers in 
Neuroscience, 15, p.705323. 
[3] Li, J., Guo, X., Lu, G., Zhang, B., Xu, Y., Wu, F. 
and Zhang, D., 2020. DRPL: Deep regression pair 
learning for multi-focus image fusion. IEEE 
Transactions on Image Processing, 29,
pp.4816-4831.
[4] Gu, R., Wang, T., Deng, J. and Cheng, L., 2023. 
Improving Chinese Named Entity Recognition by 
Interactive Fusion of Contextual Representation 
and Glyph Representation. Applied Sciences, 
13(7), p.4299. 
[5] Xu, Q., Zeng, Y., Tang, W., Peng, W., Xia, T., Li, 
Z., Teng, F., Li, W. and Guo, J., 2020. Multi-task 
joint learning model for segmenting and 
classifying tongue images using a deep neural 
network. IEEE Journal of Biomedical and Health
Informatics, 24(9), pp.2481-2489.
[6] Hu, M., Zhong, Y., Xie, S., Lv, H. and Lv, Z., 
2021. Fuzzy system based medical image 
processing for brain disease prediction. Frontiers 
in Neuroscience, 15, p.714318. 
[7] Gai, D., Shen, X., Chen, H., Xie, Z. and Su, P., 
2020. Medical image fusion using the PCNN based 
on IQPSO in NSST domain. IET Image Processing,
14(9), pp.1870-1880. 
[8] Fu, J., Li, W., Du, J. and Xiao, B., 2020. 
Multimodal medical image fusion via Laplacian
pyramid and convolutional neural network 
reconstruction with local gradient energy strategy. 
Computers in Biology and Medicine, 126, 
p.104048. 
[9] Qiu, T., Wen, C., Xie, K., Wen, F.Q., Sheng, G.Q. 
and Tang, X.G., 2019. Efficient medical image 
enhancement based on CNN‐FBB model. IET 
Image Processing, 13(10), pp.1736-1744. 
[10] Yu, Q., Shi, Y., Sun, J., Gao, Y., Zhu, J. and Dai, 
Y., 2019. Crossbar-net: A novel convolutional 
neural network for kidney tumor segmentation in 
CT images. IEEE Transactions on Image Processing,
28(8), pp.4060-4074.
[11] Khan, A.A., Shaikh, A.A., Cheikhrouhou, O., 
Laghari, A.A., Rashid, M., Shafiq, M. and Hamam, 
H., 2022. IMG‐forensics: Multimedia‐enabled 
information hiding investigation using 
convolutional neural network. IET Image 
Processing, 16(11), pp.2854-2862. 
[12] Sui, K. and Kim, H.G., 2019. Research on 
application of multimedia image processing 
technology based on wavelet transform. EURASIP 
Journal on Image and Video Processing, 2019(1), 
pp.1-9. 
[13] Mahmood, M., Al-Kubaisy, W.J. and Al-Khateeb, 
B., 2023. Multimedia information retrieval using 
artificial neural network. IAES International 
Journal of Artificial Intelligence, 12(1), p.146. 
[14] Jiao, L. and Zhao, J., 2019. A survey on the new 
generation of deep learning in image processing. 
IEEE Access, 7, pp.172231-172263. 
[15] Xiang, H., Zou, Q., Nawaz, M.A., Huang, X., 
Zhang, F. and Yu, H., 2023. Deep learning for 
image inpainting: A survey. Pattern Recognition, 
134, p.109046. 
[16] Srivastava, S., Divekar, A.V., Anilkumar, C., Naik, 
I., Kulkarni, V. and Pattabiraman, V., 2021. 
Comparative analysis of deep learning image 
detection algorithms. Journal of Big Data, 8(1),
p.66.  
[17] Kung, S.Y. and Hwang, J.N., 1998. Neural 
networks for intelligent multimedia processing. 
Proceedings of the IEEE, 86(6), pp.1244-1272. 
[18] Kumar, S.N., 2014. Technique for security of 
multimedia using neural network. Paper id-
IJRETM-2014-02-05-020, IJRETM, 2(05), pp.1-7. 
[19] Jiang, Z., Zaheer, W., Wali, A. and Gilani, S.A.M., 
2024. Visual sentiment analysis using data-
augmented deep transfer learning techniques. 
Multimedia Tools and Applications, 83(6), 
pp.17233-17249. 
[20] Thakur, R. and Rohilla, R., 2020. Recent advances 
in digital image manipulation detection techniques: 
A brief review. Forensic Science International,
312, p.110311.
[21] Bayar, B. and Stamm, M.C., 2016, June. A deep 
learning approach to universal image manipulation 
detection using a new convolutional layer. In 
Proceedings of the 4th ACM workshop on 
information hiding and multimedia security (pp.5-10).
[22] Mansour, R.F., Alfar, N.M., Abdel‐Khalek, S., 
Abdelhaq, M., Saeed, R.A. and Alsaqour, R., 2022. 
Optimal deep learning based fusion model for 
biomedical image classification. Expert Systems, 
39(3), p.e12764. 
[23] Dong, Y., Liu, Q., Du, B. and Zhang, L., 2022. 
Weighted feature fusion of convolutional neural 
network and graph attention network for 
hyperspectral image classification. IEEE 
Transactions on Image Processing, 31, pp.1559-
1572. 