https://doi.org/10.31449/inf.v49i13.7514                                                                                  Informatica 49 (2025) 143–162 143 
 
IDedupNet: A MobileNetV3-Based Deep Learning Framework for 
Efficient Image Deduplication in Cloud Computing Environments 
Mohd Hasan Mohiuddin¹, Latha Tamilselvan²*
¹Department of Computer Science and Engineering, B.S. Abdur Rahman Crescent Institute of Science & Technology, Vandalur, Chennai – 600048, India.
²Department of Information & Technology, B.S. Abdur Rahman Crescent Institute of Science & Technology, Vandalur, Chennai – 600048, India.
E-mail: mohiddin.hasan@outlook.com, latha.tamil@crescent.education   
*Corresponding author 
Keywords: image deduplication, artificial intelligence (AI), deep learning (DL), cloud computing, infrastructure efficiency 
Received: November 6, 2024 
Image deduplication is becoming increasingly important for cloud storage infrastructures to handle the 
increasing amount of multimedia material. Through increased storage efficiency, effective picture 
deduplication may optimize resources and save expenses. It also improves performance by facilitating 
quicker access, utilizing less bandwidth, and enhancing data integrity. Although heuristic-based classical 
deduplication techniques work well in various storage infrastructures, they cannot keep up with the 
dynamic nature of cloud storage resources. This study presents IDedupNet, a revolutionary DL-based 
framework that improves infrastructure performance in cloud computing by efficiently detecting duplicate 
and near-duplicate photos. Our approach leverages MobileNetV3 for feature extraction and CNN-based 
encodings for image deduplication, enabling it to manage duplicate photos in highly dynamic contexts 
efficiently. Additionally, we provide a Learning-Based Image Deduplication (LBID) approach that 
improves deduplication performance by extending the use of the IDedupNet model. Experimental 
evaluation demonstrates a high accuracy of 98.68% on benchmark datasets, consistently outperforming 
existing models. The underlying technique and deep learning framework may be easily integrated into 
real-time cloud storage systems to increase customer satisfaction and infrastructure efficiency. 
Povzetek (Summary): The deep learning architecture IDedupNet, based on MobileNetV3, is 
presented for efficient image deduplication in the cloud. Using MobileNetV3 for feature 
extraction and CNN encoding, IDedupNet efficiently detects duplicate and near-duplicate images.
 
1   Introduction  
Businesses worldwide have benefited from readily 
available storage options since the advent of cloud 
computing and its ecosystem. Businesses may now handle 
and securely retain their data for later use. To obtain 
business insights, they can also do data analytics. The 
cloud stores much audiovisual content, which might lead 
to duplicate information. Duplicating documents, images, 
or videos can lead to incorrect data, lost storage space, 
wasteful computer use, and increased time consumption. 
To address this issue, deduplication algorithms were 
developed, identifying and removing duplicate 
components to provide access to unique objects [1]. 
Finding and removing duplicates may significantly 
increase the efficiency of cloud data centers when 
managing enormous volumes of data—which in cloud 
storage architecture might exceed petabyte proportions. 
Deduplication methods reduce energy usage, storage 
needs, and computational inefficiencies while increasing 
efficiency for cloud data centers by employing delay [2]. 
As a result of artificial intelligence and other technologies, 
learning-based approaches have replaced heuristics. 
Intelligent application services that use recurrent learning 
ideas can automatically identify duplicate items in storage 
infrastructure, improving performance [3]. Improved secure 
deduplication strategies, which employ hybrid cloud storage 
systems to remove duplicate items while preserving unique 
ones, originated from the increased need to identify duplicate 
items in cloud storage systems [4]. 
In the literature, researchers have examined unsupervised 
methods for finding duplicate items. Many industries, like the 
healthcare sector, where medical pictures are kept on cloud 
infrastructure, have found deduplication approaches 
crucial. Duplicate components in medical images have 
been automatically detected through the development of 
algorithms like fusion learning [5]. The idea of 
automatically labeling images has also been researched in 
the literature for computer vision applications that store a 
lot of data in cloud computing infrastructures to improve 
computational and storage efficiency [6]. Research on 
deduplication is also underway for Internet of Things 
applications, which produce massive amounts of data, 
including data from cloud-based real-world apps [7], [8]. 
Nonetheless, the research indicates that DL models must 
be improved to create deep learning-based deduplication 
frameworks for cloud-based video material. 
This paper introduces IDedupNet, a state-of-the-art deep 
learning-based system that enhances cloud computing 
infrastructure performance by efficiently detecting 
duplicate and near-duplicate photos. Our approach 
efficiently manages duplicate photos in highly dynamic 
scenarios using deep learning for picture encoding and 
deduplication. We also provide a Learning-Based Image 
Deduplication (LBID) approach that leverages the 
IDedupNet model to improve deduplication performance. 
Our proposed deep learning model attains a high accuracy 
of 98.68% on benchmark datasets, boosting trust in its 
performance and consistently outperforming other models. 
Therefore, the underlying algorithm and this unique deep 
learning architecture may be readily included in real-time 
cloud storage systems, improving infrastructure 
effectiveness and client satisfaction. 
The remainder of the paper is structured as follows. 
Section 2 reviews previous studies on image deduplication 
techniques built with learning-based methodologies. 
Section 3 provides research design details. Section 4 
presents the DL-based approach for efficient image 
deduplication in cloud computing systems. Section 5 
presents the results of our empirical analysis on a 
benchmark dataset, with a critical discussion of the 
proposed model and a comparison with state-of-the-art 
models. Section 6 compares the proposed method with 
hashing-based state-of-the-art techniques. Section 7 
discusses the proposed research and its significance. 
Section 8 concludes our study and suggests directions for 
future work. 
2   Related work 
Various approaches for deduplicating multimedia objects 
in cloud environments have been discussed in the 
literature. Godavari et al. [1] emphasized the importance 
of efficiently finding and eliminating duplicate data to 
optimize primary storage for deduplication. However, 
workloads in the cloud pose a challenge. Cloud data, 
typically accessed infrequently, challenges cache 
efficiency in deduplication systems due to a lack of 
temporal locality. Zhao et al. [2] noted that the extensive 
use of Docker containers exacerbates storage issues and 
proposed DupHunter for effective deduplication. Usharani and 
Danalakshmi [3] improved detection and storage 
efficiency by evaluating and correlating pixel dimensions, 
reducing picture repetition in innovative application 
services. 
Mageshkumar et al. [4] proposed an efficient paradigm 
incorporating block-level deduplication, Diffie-Hellman 
encryption, and experimentation. Convergent encryption 
enhances the security of cloud data deduplication. Ahmed 
et al. [5] employed a global data aggregation technique to 
improve the accuracy and precision of CAD system 
performance with duplicate medical images. Xu et al. [6] 
introduced reinforcement learning-based indexing for 
deduplication, addressing disk bottlenecks, and enhancing 
memory efficiency. Prathima et al. [7] provided on-
demand resources to support IoT data processing. Storage 
and performance are optimized through effective data 
deduplication in distributed caching. Pragash and 
Jayabarathy [8] reduced computational complexity 
through data deduplication, examining various methods 
for efficiency to aid researchers in developing workable 
ideas. 
Zheng et al. [11] emphasized that cloud data deduplication 
lowers redundancy by maintaining unique copies, which is 
challenging due to the requirement for strong encryption 
and detection of duplicate files. Fu et al. [12] enhanced 
efficiency and security by offering a fog-to-multi-cloud 
secured storage solution with application-aware 
deduplication for sensitive medical data. Wang et al. [13] 
introduced an effective user revocation method for secure 
deduplication, reducing update computation and 
communication costs. Zhang et al. [14] utilized blockchain 
technology to minimize computational costs and guarantee 
data security and integrity. Xu et al. [15] presented LIPA, 
a learning-based deduplication technique that addresses 
disk bottleneck problems using reinforcement learning 
with little memory overhead for effective deduplication. 
Jai et al. [16] suggested a content-based strategy using a 
triplet loss deep learning network and scalable hashing, 
showing significant progress compared to existing 
approaches that rely on URLs. Zhou et al. [17] addressed 
issues with copyright and privacy arising from the growth 
of digital multimedia online in cloud and large data 
environments. Rajput et al. [18] offered a secure approach 
for human activity recognition using picture obfuscation in 
cloud-based expert systems, addressing data privacy 
concerns. Anuradha et al. [19] utilized emerging 
technologies such as IoT, CC, and AI for cancer prediction 
and encryption for safe cloud data storage and 
accessibility. Kumar et al. [20] emphasized using data 
compression and deduplication methods, notably the 
SHA-3 algorithm, to optimize cloud computing capacity 
for safe deduplication. 
Asif et al. [21] suggest automated processing is required 
for disaster management using social media photography. 
They propose a strategy driven by taxonomy, deep 
learning, and decision-making methodologies to enhance 
real-time emergency response and crisis management. 
Takeshita et al. [22] address privacy issues and provide 
security against hostile attackers by introducing a single-
server protocol for safe cross-user nearly-identical 
deduplication in cloud storage. Vijayalakshmi and 
Jayalakshmi [23] focus on effective deduplication 
techniques to manage data redundancy concerns and 
highlight the importance of CC in managing the 
exponential rise of digital data. Shetty et al. [24] highlight 
the need for incident management due to the shift to cloud 
computing. They utilize a multi-task BiLSTM-CRF model 
for named entity recognition, SoftNER, an unsupervised 
knowledge extraction framework, which achieves 
excellent accuracy. Zhang et al. [25] present CEVAS, a 
cutting-edge serverless collaborative video analytics 
solution on the cloud. It shows notable advantages over 
current systems by achieving cost-effectiveness, 
maintaining high throughput, and optimizing resource 
management. 
 
Table 1: Comparative summary of related works 
Study | Key Contributions | Methodology | Results | Limitations 
Godavari et al. [1] | Hybrid deduplication system with content-based cache for cloud environments | Heuristic-based deduplication with cache optimization | Improved storage efficiency | Limited scalability for dynamic cloud data 
Zhao et al. [2] | High-performance deduplication for Docker registries | End-to-end deduplication scheme for containerized environments | Enhanced deduplication speed | Not adaptable to diverse multimedia datasets 
Usharani & Danalakshmi [3] | Recurrent learning-based deduplication for innovative applications | Recurrent learning algorithms | Higher accuracy for specific use cases | Ineffective for large-scale dynamic datasets 
Mageshkumar et al. [4] | Secure deduplication using cryptographic techniques in hybrid cloud | Diffie-Hellman encryption and block-level deduplication | Improved security and deduplication | High computational overhead 
Ahmed et al. [5] | Unsupervised fusion learning for medical image deduplication | Fusion learning algorithms | Increased precision in medical imaging | Domain-specific; lacks generalizability 
Fu et al. [12] | Fog-to-multi-cloud secure deduplication for eHealth data | Application-aware deduplication integrated with security protocols | Enhanced security and efficiency | Inefficient for non-medical multimedia datasets 
MobileNetV3 (Proposed) | Efficient and robust deduplication for dynamic cloud environments | MobileNetV3 for feature extraction, CNN-based encodings, and cosine similarity for duplicate detection | Accuracy: 98.68%, F1-score: 95.6% | Refer to Section 5.1 for limitations 
 
Lu et al. [26] propose a deduplication technique that allows 
7X faster image updates without loss of efficiency. They 
address the issue of data duplication when updating 
Docker images. Zhang et al. [27] enhance accurate 
sentiment categorization through an effective annotation 
technique using artificial and emotional lexicons in e-
commerce remarks. Li et al. [28] present an edge-assisted 
approach that minimizes resource strain on terminal 
devices while maintaining privacy in image processing, 
addressing privacy concerns with the rise of the IoT and 
sensitive picture data. Xing et al. [29] describe a method 
for leveraging street photos from driving car recorders to 
update traffic laws, achieving excellent accuracy in rule 
clustering by utilizing spatiotemporal attention, object 
detection, and model compression. Hamandawana et al. 
[30] present Redup, a caching solution that addresses 
issues with deduplication and speed in ML/DL storage 
clusters. It outperforms other systems in reducing 
deduplication overhead thanks to its dual-level caching 
architecture. 
Wang et al. [31] suggest a deep learning-based emotional 
big data facial expression detection system for autistic 
sufferers. Boutros et al. [32] discuss challenges to integrity 
faced by FPGAs in data centers and how DL models avoid 
timing issues caused by integrity assaults. Jia et al. [33] 
propose a deep learning-based content-based video de-
duplication approach to ease storage and bandwidth 
constraints. Jansen et al. [34] utilize Docker to integrate 
data, software, and runtime environment, ensuring clinical 
Deep Learning research repeatability with the Curious 
Containers architecture. Du et al. [35] highlight how AI 
facilitates the finding and prioritization of evidence, 
addressing backlogs in digital forensics due to increased 
cases and data in law enforcement. Chen [36] presented a 
method for cleaning large amounts of data using GANs 
and repeated change detection that prioritizes cleaning 
affordable decision trees.  
Abuhasel et al. [37] noted that the IIoT links networks 
together, necessitating strong security measures against 
potential attacks; sophisticated methods such as 
SoftMax-DNN improve efficiency and security. Chaudhary 
et al. [38] discussed cybersecurity in the face of new 
avenues for attack, reporting that machine learning 
efficiently identifies threats, with many models attaining 
high accuracy. Varied material is ubiquitous with mobile 
multimedia, which is vital in the healthcare industry. 
Gupta et al. [39] proposed deep learning-based content 
hashing for image deduplication, improving accuracy and 
optimizing cloud storage performance. Tahir et al. [40] 
noted that, although cloud computing provides customizable 
services over the internet, security problems are still 
present; their CryptoGA model, which uses evolutionary 
algorithms, outperforms conventional cryptography 
techniques regarding data integrity and privacy. Table 1 
provides a summary of the findings of the literature. 
The literature reveals a need to enhance DL 
models to develop DL-based deduplication frameworks 
for multimedia objects in cloud environments. 
 
 
3   Research design 
This work addresses three research questions: (1) How 
does the IDedupNet model compare to hashing-based 
techniques in deduplication efficiency, measured by 
accuracy, robustness to transformations, and scalability? 
(2) How does the MobileNetV3 architecture influence 
computational efficiency and deduplication accuracy in 
cloud systems? (3) What role do the transformation 
pipeline and feature encoding play in improving the 
accuracy, scalability, and real-time processing of 
deduplication in dynamic cloud environments? 
A four-part framework is proposed to answer them. 
(1) A transformation pipeline converts each image into a 
standard format and preprocesses it with transformations 
such as cropping, resizing, normalization, and 
augmentation, making the system robust against resolution 
and format variations. This increases scalability and 
reduces false negatives for near-duplicate images. 
(2) Feature extraction is performed by MobileNetV3, which 
maps images into a high-dimensional semantic space 
(shapes, edges, and textures) while keeping a small 
footprint and high recall. (3) Feature encoding compresses 
the extracted features into compact representations, 
reducing computational overhead and allowing fast 
similarity comparisons. (4) A similarity measure, cosine 
similarity, detects duplicates accurately, including images 
altered by transformations such as cropping or color 
adjustment. Together, these components target measurable 
goals: high deduplication accuracy (98%+), scalability (the 
ability to process massive datasets), and robustness (few 
false positives and negatives across transformations). 
Precision, recall, and F1-score are used to assess the 
framework objectively. This design establishes the 
originality and efficacy of IDedupNet in addressing the 
limitations of state-of-the-art deduplication methods. 
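As an illustration of the evaluation metrics named above, the following minimal sketch computes precision, recall, and F1-score for pair-level duplicate decisions. The function name `pair_metrics` and the toy image identifiers are hypothetical and are not taken from the study; treating each unordered image pair as one prediction is an assumption about how the metrics are counted.

```python
def pair_metrics(predicted_pairs, true_pairs):
    """Precision, recall and F1 for duplicate-pair detection.

    Each pair is an unordered pair of image identifiers; pairs are
    normalised to frozensets so (a, b) and (b, a) count as the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    actual = {frozenset(p) for p in true_pairs}

    tp = len(predicted & actual)   # pairs correctly flagged as duplicates
    fp = len(predicted - actual)   # pairs flagged but not true duplicates
    fn = len(actual - predicted)   # true duplicates that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example with hypothetical identifiers: 2 hits, 1 false alarm, 1 miss.
true_pairs = [("a", "b"), ("c", "d"), ("e", "f")]
predicted_pairs = [("b", "a"), ("c", "d"), ("a", "g")]
precision, recall, f1 = pair_metrics(predicted_pairs, true_pairs)
```

In this toy case all three metrics equal 2/3, since two of the three predicted pairs are correct and two of the three true pairs are found.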
4   Proposed framework 
We propose a deduplication architecture called 
IDedupNet, illustrated in Figure 1, to address image 
deduplication in cloud computing environments, where vast 
volumes of image data are stored and handled. The 
framework is an application of deep learning. Duplicate or 
near-duplicate photographs are a common source of storage 
inefficiency in cloud systems: they slow down storage 
systems and increase costs. IDedupNet uses neural 
networks to overcome these issues by efficiently 
identifying and removing duplicate images. 
The image processing stage consists of pre-processing, 
image conversion, and the transformation pipeline. 
Cloud-stored photographs come in various formats, 
resolutions, and color schemes. By converting images to a 
standardized format, the conversion procedure guarantees 
consistency for further processing. The transformation 
pipeline applies transformations such as cropping, 
resizing, normalization, and augmentation. These steps 
prepare images for efficient processing and may improve 
the model's ability to handle a range of image variations, 
which matters particularly in deduplication applications. 
With standardized formats and these adjustments, the 
system can detect copies more quickly. Deduplication 
accuracy also increases because pre-processing keeps the 
main image attributes evident even when two photographs 
differ noticeably (for example, in resolution or through 
minor adjustments). 
Feature extraction is essential to the proposed system. 
MobileNetV3 is a lightweight CNN architecture intended to 
perform well in resource-constrained environments, such as 
cloud computing systems or mobile devices. Here, it 
extracts features that carry the relevant information in an 
image, such as shapes, edges, and textures. The model 
builds a feature vector that captures the distinctive 
elements of an input image. MobileNetV3 generates feature 
vectors, either in batch mode or individually, for every 
image in a collection. Robust features are necessary to 
find duplicates. Owing to its computational efficiency and 
its ability to expedite the processing of extensive image 
collections, MobileNetV3 is especially well suited to cloud 
computing settings. For near-duplicate images, pixel-wise 
comparison is computationally expensive and prone to 
errors, whereas feature extraction lets the deduplication 
method compare images based on their content. 
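The crop, resize, and normalize steps of the transformation pipeline can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the 224x224 target size, the ImageNet channel statistics, nearest-neighbour resizing, and the function names are not specified in the paper and are chosen only for the example. Format conversion and augmentation are omitted.

```python
import numpy as np

# Assumed values (not given in the paper): 224x224 target size and the
# widely used ImageNet channel statistics.
TARGET_SIZE = 224
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop_square(image):
    """Crop the largest centred square from an (H, W, 3) array."""
    h, w = image.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return image[top:top + side, left:left + side]


def resize_nearest(image, size):
    """Nearest-neighbour resize of a square (S, S, 3) array to (size, size, 3)."""
    s = image.shape[0]
    idx = np.arange(size) * s // size  # source row/column for each target index
    return image[idx][:, idx]


def preprocess(image):
    """Crop, resize, scale to [0, 1] and normalise an RGB uint8 image."""
    x = center_crop_square(image)
    x = resize_nearest(x, TARGET_SIZE)
    x = x.astype(np.float32) / 255.0
    return (x - MEAN) / STD


# Two differently sized inputs (random stand-ins for real photos)
# end up with the same shape and value range convention.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, size=(300, 200, 3), dtype=np.uint8)
out_a, out_b = preprocess(img_a), preprocess(img_b)
```

Bringing every image to one shape and scale is what allows a fixed-input network to produce comparable feature vectors for images that arrive in different resolutions.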
The features extracted by MobileNetV3 are encoded using 
a CNN-based design. The encoding process makes the 
feature vectors more compact while maintaining essential 
information about the image's content. The main goal of 
the single-image encoding procedure is to represent an 
image in reduced form. Batch encoding is used for many 
photos, and contrastive learning techniques may be applied 
to enhance image pair comparisons. Encoding minimizes the 
amount of processing and storage required when dealing 
with large datasets: the deduplication method compares the 
smaller, more compact encodings instead of the full 
images. This allows cloud systems to handle large 
repositories more quickly and scalably. 
The encoded feature vectors are compared using a 
similarity measure to find duplicates. This is where the 
actual deduplication occurs: two images are identified as 
duplicates if their similarity score meets or exceeds a 
predetermined threshold. A duplicate of an image is found 
by comparing its encoded feature vector with the 
representations of other cloud-stored images. Batch 
comparisons are performed to identify duplicates within the 
collection or against previously uploaded images in the 
database. The cloud storage then eliminates redundant data 
and keeps only the original photos. By comparing features, 
the system can accurately identify and flag duplicates even 
when the images have been somewhat modified (e.g., 
cropped, scaled, or color-adjusted). This capability is 
crucial since pixel-by-pixel 
comparisons cannot reliably detect duplicates in cloud 
environments where image manipulations are frequent. 
The similarity comparison yields the duplicate detection 
results. Based on these results, redundant photos may be 
removed from cloud storage, merged into a single entry, or 
flagged as duplicates. Duplicate images consume additional 
storage capacity in cloud systems. By finding and removing 
them, the framework reduces operational costs, improves 
the efficiency of data retrieval, and reduces the amount of 
storage space required. Regularly processing and 
deduplicating millions of images is particularly helpful in 
large-scale settings such as cloud storage systems (e.g., 
AWS and Google Cloud).
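Once duplicate pairs have been detected, they still have to be turned into a decision about which images to keep. The sketch below groups images with a union-find structure and keeps one representative per group. It is an illustration only: the function name, the choice of the smallest index as the kept original, and the transitive grouping (if A matches B and B matches C, all three form one group) are assumptions, not details from the paper.

```python
def group_duplicates(num_images, duplicate_pairs):
    """Group images connected by duplicate pairs using union-find.

    Returns a dict mapping each kept representative index to the
    list of indices flagged as its duplicates.
    """
    parent = list(range(num_images))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in duplicate_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as representative
            # (assumed here to be the earliest upload).
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for i in range(num_images):
        root = find(i)
        if root != i:
            groups.setdefault(root, []).append(i)
    return groups


# Images 0, 2 and 5 are linked as near-duplicates; so are 1 and 3.
pairs = [(0, 2), (2, 5), (1, 3)]
groups = group_duplicates(6, pairs)
to_remove = sorted(i for dups in groups.values() for i in dups)
```

Here `groups` maps image 0 to its duplicates 2 and 5 and image 1 to its duplicate 3, so images 2, 3, and 5 are the candidates for removal, merging, or flagging, while image 4 is untouched.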
 
Figure 1: Proposed deep learning framework, IDedupNet, 
for efficient image deduplication in cloud computing 
environments 
IDedupNet is designed for efficiency, utilizing 
MobileNetV3 and CNN-based encodings to ensure the 
framework can handle the massive volumes of data 
typically seen in cloud systems. 
Cloud environments often use distributed architectures for 
faster processing. IDedupNet can be integrated with 
parallel computing frameworks, enabling multiple nodes 
to process batches of images concurrently. The framework 
might also support real-time deduplication as photos are 
uploaded to the cloud. Because duplicate data is 
automatically identified and handled, the total strain on 
storage systems decreases, and removing unnecessary 
photos frees up space. In pay-as-you-go systems, lower 
storage utilization translates into lower cloud storage 
costs. Eliminating duplicates also improves image 
retrieval by decreasing the number of redundant photos 
returned by queries. Because less data must be processed, 
deduplication helps cloud data centers use less energy and 
become more environmentally sustainable. IDedupNet, 
which focuses on lowering redundancy and enhancing 
storage management using intelligent deduplication 
algorithms, is a practical, scalable, and cloud-optimized 
solution for managing very large image collections. 
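The idea of processing batches of images concurrently can be sketched on a single machine with Python's standard thread pool. This is only an analogy for the multi-node setting described above: `embed_batch` is a hypothetical placeholder for running the model on one batch, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def embed_batch(batch):
    """Hypothetical stand-in for running the model on one batch.

    It returns a (name, length-of-name) pair per image so that the
    sketch stays runnable without a trained network.
    """
    return [(name, len(name)) for name in batch]


def process_concurrently(image_names, batch_size=2, workers=4):
    """Process batches in parallel worker threads and flatten the results."""
    batches = make_batches(image_names, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in batch order even though
        # the batches themselves run concurrently.
        results = list(pool.map(embed_batch, batches))
    return [item for batch in results for item in batch]


names = ["a.jpg", "bb.png", "ccc.jpg", "d.bmp", "ee.jpg"]
embedded = process_concurrently(names)
```

Because the results come back in input order, downstream similarity comparisons can rely on a stable mapping between images and their encodings.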
MobileNetV3 is designed to carry out a range of visual 
tasks, such as image recognition and classification, 
quickly and effectively. In image deduplication, 
MobileNetV3 is mainly used to create image feature 
embeddings, or encodings. Embeddings are high-dimensional 
vectors that emphasize the salient features of the images. 
MobileNetV3 combines depthwise separable convolutions 
with linear bottlenecks to achieve a compromise between 
computational economy and accuracy. A depthwise 
separable convolution splits the convolution into two 
stages: a depthwise convolution followed by a pointwise 
(1x1) convolution. Linear bottlenecks apply no 
non-linearity after the dimensionality-reducing 
projection, which limits information loss in the 
low-dimensional representation. 
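The two-stage structure of a depthwise separable convolution can be sketched in NumPy as follows. The sketch assumes stride 1, no padding, and no kernel flipping (cross-correlation, as in most deep learning libraries); the tensor sizes and function names are illustrative and not taken from the paper.

```python
import numpy as np


def depthwise_conv(x, k):
    """Depthwise convolution: one (k_h, k_w) filter per channel.

    x: input of shape (H, W, C); k: filters of shape (k_h, k_w, C).
    With 'valid' padding and stride 1 (assumed), the output has shape
    (H - k_h + 1, W - k_w + 1, C).
    """
    h, w, c = x.shape
    kh, kw, _ = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1, c), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]            # (k_h, k_w, C)
            out[i, j, :] = np.sum(patch * k, axis=(0, 1))
    return out


def pointwise_conv(x, k):
    """Pointwise (1x1) convolution that mixes channels.

    x: (H, W, C_in); k: (C_in, C_out). Returns (H, W, C_out).
    """
    return x @ k


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
k_dw = rng.standard_normal((3, 3, 3))    # depthwise filters
k_pw = rng.standard_normal((3, 16))      # pointwise (1x1) filters

y = pointwise_conv(depthwise_conv(x, k_dw), k_pw)   # shape (6, 6, 16)

# Weight counts: separable vs. a standard 3x3 convolution with 16 outputs.
separable_params = k_dw.size + k_pw.size            # 27 + 48 = 75
standard_params = 3 * 3 * 3 * 16                    # 432
```

The parameter comparison shows where the computational economy comes from: for this toy configuration the separable form needs 75 weights instead of 432.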
For an input tensor X of shape (H, W, C) (height, width, 
channels), the depthwise convolution filter K takes the 
form $(k_h, k_\omega, C)$, where $k_h$ and $k_\omega$ are the kernel's 
height and width. The depthwise convolution is calculated 
as shown in Eq. 1. 
$X' = X * K$   (1)
where * denotes the convolution operation, producing the 
output tensor X' of shape (H', W', C). The pointwise 
convolution then aggregates the depthwise convolution's 
outputs using a 1x1 kernel, as in Eq. 2.  
$X'' = X' * K'$   (2)
where the output tensor is denoted by X" and the 1x1 
convolution kernel is represented by K'. Global average 
pooling, which comes after the feature extraction layers, 
reduces the feature maps' spatial dimensions to a single 
vector for each image. To do this, average the values over 
all spatial locations as expressed in Eq. 3.  
$f_i = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{\omega=1}^{W} x_{i,h,\omega}$   (3)
where $x_{i,h,\omega}$ is the value at position (h, ω) in the i-th 
feature map and $f_i$ is the i-th component of the feature 
vector. The feature vector, or image embedding, is the 
output of the last layer of MobileNetV3, typically taken 
after global average pooling. This vector captures the 
essential features of the image in a high-dimensional 
space. For an image I, the CNN creates an embedding 
vector $E_I$ of dimension D, as expressed in Eq. 4.  
$E_I = f(I)$   (4)
where f denotes the CNN model. To identify duplicates, 
we use cosine similarity to measure how similar the 
embeddings of two images are. The cosine similarity 
between two embeddings $E_1$ and $E_2$ is given in Eq. 5.  
$\text{cosine similarity}(E_1, E_2) = \frac{E_1 \cdot E_2}{\|E_1\| \, \|E_2\|}$   (5)
where $\cdot$ denotes the dot product and $\|\cdot\|$ denotes the 
Euclidean norm, or magnitude, of a vector. Duplicates are 
found using a similarity threshold θ: two images are 
regarded as duplicates if the cosine similarity of their 
embeddings is greater than or equal to θ, that is, if 
cosine similarity $(E_1, E_2) \geq \theta$. Utilizing 
MobileNetV3, determine the feature vector for every 
image in the collection. To find the similarity between any 
two embeddings, compute their cosine similarity. Decide 
whether each image pair is a duplicate by comparing its 
similarity score to the threshold. Because the number of 
images can be large, embeddings are commonly generated in 
batches to increase efficiency. Similarity computations 
can also be parallelized to expedite the process, 
particularly for big datasets. In summary, 
MobileNetV3-based image deduplication employs a CNN to 
extract high-dimensional feature embeddings, calculates 
the cosine similarity between these embeddings, and applies 
a similarity threshold to detect duplicates. Unlike 
hashing techniques, this method uses the CNN's capacity to 
capture semantic content, enabling more versatile and 
efficient duplicate identification.  
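The pooling, similarity, and thresholding steps (Eqs. 3 to 5 and the threshold test) can be traced in a short NumPy sketch. The toy feature maps stand in for MobileNetV3 outputs, and the threshold value 0.95 and all function names are assumptions made for illustration; the paper does not state them at this point.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Eq. 3: average each feature map over its spatial positions.

    feature_maps: (N, H, W, C) -> embeddings of shape (N, C).
    """
    return feature_maps.mean(axis=(1, 2))


def cosine_similarity_matrix(embeddings, eps=1e-12):
    """Eq. 5 for all pairs of rows of an (N, D) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, eps)  # guard against zero vectors
    return unit @ unit.T


def find_duplicate_pairs(embeddings, theta):
    """Return index pairs (i, j), i < j, whose similarity is >= theta."""
    sim = cosine_similarity_matrix(embeddings)
    i_idx, j_idx = np.triu_indices(len(embeddings), k=1)
    mask = sim[i_idx, j_idx] >= theta
    return [(int(i), int(j)) for i, j in zip(i_idx[mask], j_idx[mask])]


# Toy feature maps (hypothetical data): three "images", 2x2 spatial, 4 channels.
feature_maps = np.zeros((3, 2, 2, 4))
feature_maps[0] = [1.0, 2.0, 3.0, 4.0]      # image 0
feature_maps[1] = feature_maps[0] * 1.1     # image 1: brighter copy of image 0
feature_maps[1, 0, 0, 0] += 0.2             # ...with a small local change
feature_maps[2] = [4.0, -3.0, 2.0, -1.0]    # image 2: unrelated content

embeddings = global_average_pool(feature_maps)          # shape (3, 4)
duplicates = find_duplicate_pairs(embeddings, theta=0.95)
```

Images 0 and 1 are flagged because cosine similarity ignores the overall scaling of the embedding, while image 2 is orthogonal to image 0 and stays below the threshold. For large collections, the full N x N similarity matrix would normally be replaced by batched or approximate nearest-neighbour search.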
 
 
 
Figure 2: CNN Architecture used for the encoding process
Figure 2 depicts a CNN architecture designed for an 
encoding process. This architecture consists of multiple 
layers that progressively transform input data through 
convolutional, pooling, and fully connected (dense) layers, 
ultimately resulting in a final encoded output. The first 
principal component of the architecture is a series of 
Conv1D (1-dimensional Convolutional) layers, which are 
responsible for extracting features from the input. Red 
indicates the network's first Conv1D layer with 64 filters 
and a kernel size of 3. This layer finds local patterns in the 
data by using convolution processes. There are more 
Conv1D layers in the model, and each one becomes 
increasingly complex. The second layer (orange) uses 128 
filters with a kernel size of 3 to further improve the 
obtained attributes. Again, indicated in red, the third 
Conv1D layer seeks to find more profound and complex 
patterns in the data using 256 filters and a kernel size of 3. 
These layers are crucial because they encode important 
features while maintaining the spatial arrangement of the 
input. 
Batch normalization (blue) is applied after the 
convolutional layers to improve training effectiveness and 
convergence. By standardizing the output of each layer, 
batch normalization gives the next convolutional layer more 
stable input and speeds up training. Particularly in 
deeper networks, issues like exploding or vanishing 
gradients are mitigated, since this normalization lowers 
internal covariate shift. The architecture also includes 
MaxPooling1D layers (green) that downsample the feature 
maps produced by the convolutional layers. Max pooling 
reduces the size of the feature maps by keeping the most 
prominent features while discarding less important ones. 
Here, a pool size of two halves the length of the feature 
maps, which facilitates processing in later stages of the 
network. 
Following the pooling operations, the Flatten layer (yellow) converts the multi-dimensional output of the previous layers into a one-dimensional vector. This flattening is essential for the transition from convolutional layers to fully connected layers, which require a flat input. Next, the architecture incorporates Dense (fully connected) layers for the final stages of feature learning and classification. The first Dense layer (pink) has 512 units with a ReLU activation function; the ReLU activation introduces non-linearity, allowing the model to learn complex patterns. The second Dense layer (purple) consists of 256 units with a Softmax activation function. Softmax is commonly used in classification tasks and is suited to multi-class classification because it produces a probability distribution across classes. The design uses dropout layers (black) to prevent overfitting during training. Dropout randomly deactivates a subset of neurons in each training iteration, forcing the network to learn more robust and general features. Passing the processed input through these layers produces the encoding, which is the network's final output and is used in the image deduplication process.
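To make the layer description concrete, the following sketch traces how the tensor shape would change through the encoder. It rests on several assumptions that the text does not state: each Conv1D layer uses no padding and stride 1, one MaxPooling1D layer follows each convolution, and the input is a hypothetical sequence of length 960 with one channel. Batch normalization and dropout are omitted because they do not change shapes.

```python
def conv1d_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with no padding ('valid')."""
    return (length - kernel_size) // stride + 1


def maxpool1d_length(length, pool_size):
    """Output length of 1-D max pooling with stride equal to the pool size."""
    return length // pool_size


def encoder_shapes(input_length, input_channels=1):
    """List (layer name, output shape) pairs for the assumed layer order."""
    shapes = [("input", (input_length, input_channels))]
    length = input_length
    for filters in (64, 128, 256):            # filter counts from the text
        length = conv1d_length(length, 3)     # kernel size 3
        shapes.append((f"conv1d_{filters}", (length, filters)))
        length = maxpool1d_length(length, 2)  # pool size 2 halves the length
        shapes.append((f"maxpool_after_{filters}", (length, filters)))
    shapes.append(("flatten", (length * 256,)))
    shapes.append(("dense_512_relu", (512,)))
    shapes.append(("dense_256_softmax", (256,)))
    return shapes


if __name__ == "__main__":
    # 960 is a hypothetical input length chosen only for illustration.
    for name, shape in encoder_shapes(960):
        print(f"{name:>22}: {shape}")
```

Under these assumptions the final encoding is a 256-dimensional vector regardless of the input length, since the last Dense layer fixes the output size.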
Implementation details and hyper-parameters are described here to enable reproducibility of the proposed framework. We trained the model using TensorFlow 2.0 on a workstation with an NVIDIA Tesla V100 GPU. The Adam optimizer (Loshchilov and Hutter 2017) with a learning rate of 0.001 was used for training because of its efficiency on sparse gradients. The batch size was 64, and training ran for a maximum of 50 epochs with early stopping based on validation loss to prevent overfitting. He Normal initialization was used for the weights of the hidden layers with ReLU activation. Transfer learning was employed by starting from weights pre-trained on ImageNet and fine-tuning the MobileNetV3 backbone for the deduplication task. For regularization, dropout with a rate of 0.3 and L2 regularization with a factor of 0.0001 were used to prevent overfitting and improve generalization. As detailed above, the preprocessing stage prepared input images for the model by resizing them to 224 x 224 pixels, normalizing pixel values to the range [0, 1], and applying data augmentation. Validation was performed by evaluating the model using precision, recall, F1-score, and accuracy. We aim to provide enough detail for future researchers to reproduce the framework and its performance in our experimental setup.
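The reported settings can be collected in one place, together with a sketch of the early-stopping rule. The dictionary keys and the function name are illustrative choices, and the patience value is an assumption, since the text says only that early stopping was based on validation loss.

```python
# Values taken from the text; "patience" is an assumption (not reported).
TRAINING_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 0.001,
    "batch_size": 64,
    "max_epochs": 50,
    "dropout_rate": 0.3,
    "l2_factor": 0.0001,
    "input_size": (224, 224),
    "patience": 5,
}


def early_stopping_epoch(val_losses, patience, max_epochs):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs, or when the epochs run out.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Hypothetical validation-loss curve, for illustration only.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.5]
print(early_stopping_epoch(losses,
                           TRAINING_CONFIG["patience"],
                           TRAINING_CONFIG["max_epochs"]))
```

In this toy curve the loss stops improving after the third epoch, so training halts before the late improvement at the ninth epoch is ever seen; a larger patience trades extra epochs for a lower chance of stopping prematurely.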
Algorithm: Learning-Based Image Deduplication 
(LBID) 
Inputs: Image Dataset D (INRIA Copydays dataset D1, 
QUALINET dataset D2, CIFAR-10 dataset D3), query 
image q 
Output: Deduplication results R, performance statistics 
P 
1. Begin 
2. D' ← Preprocess(D) 
3. Configure MobileNetV3 model m 
4. Compile MobileNetV3 model m 
5. features ← ExtractFeatures(m, D') 
6. Configure CNN model m2 as in Figure 2 
7. Compile model m2 
8. encodings ← Encoding(features, m2) 
9. qencoding ← FeatureExtractionAndEncoding(q, m, m2) 
10. R ← Deduplication(similarityMeasure, encodings, qencoding) 
11. P ← Evaluation(R, ground truth) 
12. Print R 
13. Print P 
14. End 
Algorithm 1: Learning-based image deduplication 
(LBID) 
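The control flow of Algorithm 1 can be sketched in a few lines. This is only a skeleton under stated assumptions: the feature extractor and encoder are passed in as callables, and the stand-ins used below (pixel scaling and identity functions) are hypothetical placeholders, not the MobileNetV3 and CNN models of the framework. Images are represented as flat lists of pixel values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lbid(dataset, query, preprocess, extract_features, encode, threshold=0.85):
    """Return indices of dataset images judged duplicates of the query."""
    prepared = [preprocess(img) for img in dataset]            # step 2
    features = [extract_features(img) for img in prepared]     # step 5
    encodings = [encode(f) for f in features]                  # step 8
    q_encoding = encode(extract_features(preprocess(query)))   # step 9
    return [i for i, enc in enumerate(encodings)               # step 10
            if cosine(enc, q_encoding) >= threshold]


def evaluate(predicted, ground_truth):
    """Precision, recall, and F1 of predicted indices (step 11)."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    tp = len(predicted & ground_truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def scale(image):
    """Hypothetical preprocessing stand-in: map 8-bit pixels to [0, 1]."""
    return [px / 255.0 for px in image]


def identity(x):
    """Hypothetical stand-in for the feature extractor and the encoder."""
    return x


# Toy data invented for illustration.
dataset = [[255, 0, 128, 64], [250, 5, 130, 60], [0, 255, 10, 200]]
query = [255, 0, 128, 64]

R = lbid(dataset, query, scale, identity, identity)
P = evaluate(R, ground_truth=[0, 1])
print(R)  # step 12
print(P)  # step 13
```

Passing the models in as callables keeps the skeleton independent of any particular deep learning library.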
Algorithm 1 aims to find and remove duplicate images from a dataset. It is applied to several image datasets, namely the INRIA Copydays, QUALINET, and CIFAR-10 datasets. The inputs are an image dataset (D) and a query image (q). The outputs are the deduplication results (R) and performance statistics (P), which are used to evaluate the efficacy of the deduplication procedure. The LBID technique first preprocesses the input dataset (D) to obtain the preprocessed dataset (D'); this step typically involves scaling, normalization, and augmentation of the images.
The preprocessing procedure of the Learning-Based Image Deduplication (LBID) method comprises several steps that standardize and enhance the input dataset. All images are resized to 224 × 224 pixels for compatibility with the MobileNetV3 architecture. Pixel values are normalized to the range [0, 1] for better training stability. Several data augmentation methods are used to improve generalization and robustness: random cropping (up to 10% variability), horizontal and vertical flips, rotations (±15 degrees), and color jitter (brightness, contrast, and saturation). These augmentations mimic real-world conditions so that the model learns to recognize duplicate images across different scenarios and image variations. The preprocessing pipeline thus provides a consistent and robust dataset as input for feature extraction and deduplication.
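A simplified version of these steps is sketched below. It assumes a grayscale image stored as a list of rows of 8-bit values and uses nearest-neighbour resizing; the interpolation method is not specified in the text, and cropping, rotation, and color jitter are left out to keep the example short. All function names are illustrative.

```python
import random


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]


def normalize(image):
    """Map 8-bit pixel values to the range [0, 1]."""
    return [[px / 255.0 for px in row] for row in image]


def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror the image top to bottom."""
    return image[::-1]


def augment(image, rng):
    """Apply each flip independently with probability 0.5 (assumed rate)."""
    if rng.random() < 0.5:
        image = hflip(image)
    if rng.random() < 0.5:
        image = vflip(image)
    return image


def preprocess(image, size=224, rng=None):
    """Resize and normalize; augment only when a random generator is given."""
    out = normalize(resize_nearest(image, size, size))
    return augment(out, rng) if rng is not None else out


tiny = [[0, 255], [128, 64]]           # toy 2 x 2 image for illustration
prepared = preprocess(tiny)            # deterministic path, 224 x 224
augmented = preprocess(tiny, rng=random.Random(0))
print(len(prepared), len(prepared[0]))
```

Augmentation is normally applied only during training, which is why it is optional in `preprocess`; at query time the deterministic path keeps encodings reproducible.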
These steps reduce noise and variability in the dataset, which helps the subsequent feature extraction stage and prevents the model from performing poorly. Next, the MobileNetV3 model, a lightweight CNN designed for mobile and edge devices, is configured. Its efficient architecture balances speed and accuracy, making it well suited for image processing applications. After the model has been configured, it is compiled, which defines the settings for both training and inference.
Once the model is ready, the method uses the preprocessed dataset (D') to extract features. In this feature extraction stage, the MobileNetV3 model creates high-level representations of the images that capture the key characteristics distinguishing each image. These features serve as the basis for determining how similar two images are, which is crucial to the deduplication procedure. Following feature extraction, a second CNN model (m2) is configured and compiled in the same way. This model encodes the features obtained from the first model. The encoding reduces the size of the high-dimensional feature vectors, which makes it simpler to compare images based on their encoded representations.
To deduplicate images, the method applies the same feature extraction and encoding procedure used for the dataset (D') to the query image (q), producing an encoded representation of the query image. The algorithm then proceeds to the deduplication stage, in which a similarity measure compares the encoding of the query image with the encodings of all images in the dataset. This yields the deduplication result (R), which identifies the images in the dataset that are identical or similar to the query image.
Finally, the method compares the deduplication results to 
a ground truth dataset to calculate performance statistics 
(P). The F1-score, precision, and recall statistics show how 
well the algorithm identified duplicates. Users may then 
print out the findings (R) and performance statistics (P) to 
assess how well the photo deduplication procedure 
worked. By combining deep learning models with 
advanced feature extraction techniques, the LBID 
approach efficiently detects duplicate photographs in large 
datasets. It is a helpful tool for managing and retrieving 
images in various applications because of its systematic 
approach, which ensures an accurate and quick 
deduplication process. 
 
5   Experimental results 
A range of benchmark datasets, such as the INRIA 
Copydays dataset [41], QUALINET dataset [42], and 
CIFAR-10 dataset [43], were used to test the proposed 
system and evaluate its performance in image 
deduplication and distributed contexts. This section 
presents the findings. Several state-of-the-art DL models 
are used to assess the proposed system's performance. 
Input Image Duplicate Image (1) Duplicate Image (2) 
 
 
 
 
 
 
 
 
 
 
 
 
 
Figure 3: Results of image deduplication using INRIA Copydays dataset 
Figure 3 presents the image deduplication results. The leftmost column, labeled "Input Image," shows the original images being examined for duplication. The second and third columns, labeled "Duplicate Image (1)" and "Duplicate Image (2)," contain two images considered potential duplicates of the input image. They may vary in angle, lighting, or slight movement but represent the same scene or objects. The deduplication task typically involves identifying visually or contextually similar images despite minor changes. In the context of DL, this process uses features extracted by a neural network to compute similarity scores between the input image and the duplicate candidates. The system marks images as duplicates if the similarity score exceeds a predefined threshold.
Original Image Duplicate Image (1) Duplicate Image (2) 
 
  
 
  
 
 
 
Figure 4: Results of image deduplication using the QUALINET dataset 
In the first row, Figure 4 shows two cupcake-shaped plush toys, pink with white frosting and sprinkles, adorned with tiny bows. The duplicate images are nearly identical to the original, except that the original is marked with a green circle, while one of the duplicates has a black circle over the right plush toy. Apart from this, the content of the images is visually the same, with only file name differences. In the second row, the original image features two toy cars (one green and one yellow) and a small stuffed animal lying on a paved surface outdoors. Both duplicates are visually identical to the original, showing the same arrangement of toys and stuffed animal. In the third row, the original image shows a well-maintained garden with green hedges and a view of a residential area in the background. The two duplicates are identical to the original image, showing the same garden layout, plants, and background buildings, with no visible differences other than the file names. Because each group of images contains duplicates with little change in visual content, this example is representative of image deduplication tasks that seek to detect identical or nearly identical files.
 
 
 
Original Image Duplicate Image (1) Duplicate Image (2) 
 
 
 
 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
Figure 5: Results of image deduplication using the CIFAR-10 dataset 
Figure 5 illustrates the problem of image redundancy that frequently arises in digital media management by displaying a set of original images alongside their duplicates. Each row shows a distinct original image, grouped by subject, and the accompanying duplicates demonstrate how minor file name modifications can produce multiple copies of the same visual material. The first row shows a black horse galloping against an orange-tinted sky; the dynamic composition conveys the horse's strength and grace and makes it the focal point. The two duplicates have slightly different file names but identical visual content. This visual uniformity emphasizes the redundancy that can arise in photo collections, since the duplicates carry different identifiers but contribute nothing new. The second row shows a horse rider standing by a wooden fence against a backdrop of blooming flowers. The rich colors and textures of this serene image highlight its peaceful mood and connection to nature. Despite having different file names, the two copies that follow this original have the same content. These two cases of identical visual content under different names illustrate how duplicates can clog storage systems without adding anything useful.
The third row shows a landscape image of the Haus der Kulturen der Welt in Berlin. This architectural landmark, which illustrates how structure and nature interact, is presented in a carefully composed setting. The two duplicates show the same image and preserve the integrity of the original, differing only slightly in file naming conventions. This is an example of how architectural photography can generate many duplicates and complicate digital organization. The fourth row shows an orange tabby cat looking off to one side. The image captures the lively and curious nature of cats and invites viewers to engage with its subject. The copies in this row have different file names but no other variation in content; they are exact duplicates of the original, reinforcing the notion of visual redundancy. The fifth row shows a puppy held in a person's hands, illustrating the emotional connection between the two. The image tells a relatable story and evokes feelings of affection and companionship. The two subsequent copies are again virtually identical to the original, with minor file name variations. This repetition highlights the prevalence of duplicate images in private collections and the importance of effective deduplication methods. The sixth and final row shows a person wearing a navy-blue shirt and beige riding pants, standing outdoors with poise and confidence. 
The lush surroundings provide a vibrant backdrop for the subject and suggest a green garden or park. The copies in this row differ from the original only in their file names; otherwise, they share the same visual content. This final example illustrates that duplication can hinder efficient photo management regardless of subject and circumstances. The results, derived from the CIFAR-10 dataset, highlight the challenges of managing digital collections that contain duplicates and show the variety of subjects involved, from human figures and architecture to horses and kittens. Each row underscores the importance of effective deduplication procedures in keeping image collections orderly and clear.
 
Figure 6: Results of empirical study in terms of confusion 
matrix 
Figure 6 shows the confusion matrix, a visualization tool used in ML, especially for classification problems, to evaluate a model's performance. It indicates how well a model assigns instances to their correct classes. Here, the confusion matrix assesses how well the MobileNetV3 model (of the proposed IDedupNet framework) performs in image deduplication. The rows represent the actual classes, and the columns represent the predicted classes. The model correctly predicted 560 images as duplicates (true positives, TP) and 490 images as non-duplicates (true negatives, TN). It misclassified 40 non-duplicate images as duplicates (false positives, FP) and 10 duplicate images as non-duplicates (false negatives, FN). Several performance indicators are derived from these counts. Accuracy is calculated as (TP + TN) / (TP + TN + FP + FN) = (560 + 490) / (560 + 490 + 40 + 10) ≈ 0.955. Precision is calculated as TP / (TP + FP) = 560 / (560 + 40) ≈ 0.933. Recall is calculated as TP / (TP + FN) = 560 / (560 + 10) ≈ 0.982, and the F1-score is calculated as 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.956. These metrics indicate that the MobileNetV3 model of the proposed IDedupNet framework deduplicates images well. Its high F1-score and accuracy show that its predictions are usually correct. However, there is still room for improvement, particularly in reducing false positives and false negatives.
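The four metrics can be recomputed directly from the confusion-matrix counts. The function name below is an illustrative choice; the counts are those reported for Figure 6. Note that the unrounded F1-score obtained from these counts is about 0.957, which differs from the reported 0.956 in the third decimal place.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts reported for Figure 6: TP = 560, TN = 490, FP = 40, FN = 10.
metrics = confusion_metrics(tp=560, tn=490, fp=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F1-score can equivalently be written as 2TP / (2TP + FP + FN), which avoids computing precision and recall separately.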
 
Figure 7: Accuracy of the IDedupNet against several 
epochs 
Figure 7 shows the accuracy of the MobileNetV3 model (of the proposed IDedupNet framework) over several training iterations, or epochs. The x-axis shows the number of epochs, and the y-axis shows the model's accuracy as a percentage. Accuracy generally improves as the number of epochs increases, which is a typical pattern in machine learning: with each epoch, the model learns more from the training set. The accuracy curve may eventually level off or begin to fluctuate, suggesting that the model has reached a point where its performance no longer improves appreciably. If accuracy on a validation set begins to decrease while accuracy on the training set keeps increasing, this may indicate overfitting, which occurs when a model fits the training set too closely and has difficulty generalizing to new data. The findings show that the MobileNetV3 model of the proposed IDedupNet framework performs well. The steadily improving accuracy across epochs suggests that the model is learning effectively from the training data, and the convergence of the curve suggests that the model may have reached its peak performance.
Confusion matrix analysis reveals two types of critical errors in IDedupNet's deduplication mechanism. First, the 40 false positives are cases where non-duplicate images were classified as duplicates. This error is most likely due to spurious similarities in texture or color patterns picked up during feature encoding. Second, the 10 false negatives are true duplicates that were missed, primarily because of severe transformations such as heavy cropping or altered perspectives. These errors had a modest effect on the precision and recall metrics: the false positives lowered precision, and the small number of false negatives slightly lowered recall. Potential remedies include adding an attention mechanism to provide more semantic focus, using larger and more diverse training datasets that incorporate a greater variety of transformations and augmentations, or introducing hybrid architectures that are more robust to severe transformations. Overall, these insights suggest actionable paths toward further improving the accuracy and generalizability of IDedupNet.
 
 
Figure 8: Loss dynamics of the IDedupNet against 
several epochs 
Figure 8 shows the loss of the MobileNetV3 model (of the proposed IDedupNet framework) over several training iterations, or epochs. The x-axis shows the number of epochs, and the y-axis shows the loss value. The loss function measures how closely the model's predictions match the actual values. In general, the loss decreases as the number of epochs grows, which indicates that the model continues to learn and refine its predictions. The loss curve may eventually level out or begin to fluctuate, implying that the model's performance has reached a plateau and is no longer improving noticeably. If the loss on the training set keeps decreasing while the loss on a validation set increases, this may indicate overfitting, which occurs when a model becomes too specialized to the training set and struggles to generalize to new, unseen data. The results show that the model is learning effectively: the loss drops gradually across the epochs, and the convergence of the curve suggests that the model may be operating near its best performance.
 
Table 2: Performance comparison among deep learning models in image deduplication

Deduplication Model                              Precision  Recall  F1-Score  Accuracy
LeNet                                            0.897      0.901   0.899     0.914
UNet                                             0.918      0.9349  0.926     0.928
ResNet50                                         0.954      0.946   0.94      0.941
DenseNet121                                      0.931      0.927   0.928     0.933
MobileNetV3 (of proposed IDedupNet framework)    0.933      0.982   0.956     0.955

Table 2 compares the proposed framework with several state-of-the-art DL models on image deduplication.
 
Figure 9: Performance comparison among models in image deduplication 
[Figure 9: bar chart titled "Model comparison"; x-axis: metrics (precision, recall, F1-score, accuracy); y-axis: performance; series: LeNet, UNet, ResNet50, DenseNet121, MobileNetV3 (of proposed IDedupNet framework); the plotted values are those listed in Table 2.]
Figure 9 compares five models, LeNet, UNet, ResNet50, DenseNet121, and MobileNetV3 (of the proposed IDedupNet framework), on four performance metrics: precision, recall, F1-score, and accuracy. A synopsis of each is provided below. LeNet is an older CNN design for straightforward image classification applications. UNet is a model frequently used for image segmentation tasks. ResNet50 is a deep residual network that effectively addresses the vanishing gradient issue, which makes it useful for image classification. DenseNet121 is a convolutional network with dense connections that improves the flow of information between layers. MobileNetV3 (of the proposed IDedupNet framework), shown in blue, is the top-performing model on most measures. Precision measures the proportion of true positives among predicted positives. ResNet50 attains the highest precision (95.40%), followed by MobileNetV3 (93.30%); LeNet has the lowest precision (89.70%), making its positive predictions the least dependable. Recall (sensitivity) measures the proportion of true positives among all actual positives. MobileNetV3 has the highest recall (98.20%), meaning that it detects more of the actual duplicates than any other model, while LeNet has the lowest recall (90.10%). MobileNetV3 also achieves the highest F1-score (95.60%), which indicates a favorable balance between precision and recall, while LeNet again scores the lowest (89.90%). Accuracy measures the proportion of correct predictions among all predictions; MobileNetV3 has the highest accuracy (95.50%) and LeNet the lowest (91.40%). Leading in recall, F1-score, and accuracy, MobileNetV3 (of the proposed IDedupNet framework) is the most dependable model for detecting duplicates with few errors. LeNet performs worst across the board, suggesting that, compared with more recent designs, it is not a good fit for this task. ResNet50 and DenseNet121 perform competitively but fall short of MobileNetV3 overall.
A similarity threshold was chosen for IDedupNet's duplicate predictions to achieve a trade-off between precision and recall. In the training phase, cosine similarity was used to measure the similarity between the encoded feature vectors of images. A threshold value of 0.85 was derived through iterative experimentation on validation datasets, as it consistently provided the best trade-off between duplicate detection (recall) and false positive avoidance (precision).
We performed a grid search for threshold optimization, testing values between 0.7 and 0.95 in increments of 0.05. Lower thresholds (e.g., 0.7) produced higher recall but caused a sharp decrease in precision, since non-duplicates with moderate similarity were incorrectly labeled as duplicates. Higher thresholds (e.g., 0.9) increased precision but missed many near-duplicate images, which decreased recall. At the selected threshold of 0.85, IDedupNet achieves the best trade-off between recall (98.2%) and precision (93.3%).
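A grid search of this kind can be sketched as follows. The selection criterion (highest F1-score) is an assumption, since the text refers only to the "best trade-off", and the similarity scores and labels below are toy values invented for illustration, not the validation data used in the experiments.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def grid_search(scores, labels, start=0.70, stop=0.95, step=0.05):
    """Return (threshold, f1, precision, recall) with the highest F1-score."""
    best = None
    n_steps = round((stop - start) / step)
    for i in range(n_steps + 1):
        threshold = round(start + i * step, 2)  # avoid float drift
        p, r = precision_recall(scores, labels, threshold)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if best is None or f1 > best[1]:
            best = (threshold, f1, p, r)
    return best


# Toy validation pairs: cosine similarity and label (1 = true duplicate).
scores = [0.99, 0.95, 0.91, 0.88, 0.86, 0.80, 0.84, 0.83, 0.78, 0.72, 0.60, 0.40]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

best_threshold, best_f1, best_p, best_r = grid_search(scores, labels)
print(best_threshold, round(best_f1, 3))
```

On this toy data, low thresholds admit the non-duplicates scoring 0.72 to 0.84, and high thresholds miss the duplicates scoring below 0.90, mirroring the behavior described above.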
This approach works well because it exploits MobileNetV3's rich intermediate feature encodings, which capture high-dimensional semantic similarities. The chosen threshold detects duplicates accurately on diverse datasets and generalizes across the most common transformations, such as cropping and resizing. Future improvements could include dynamic or adaptive thresholding methods based on the characteristics of the data.
This approach allows IDedupNet to achieve high deduplication accuracy without sacrificing computational efficiency, which is particularly important in real-time cloud environments. In contrast to hashing-based methods, which require little computing power but at the cost of accuracy and robustness, IDedupNet provides an effective trade-off between efficiency and performance. Unless otherwise specified, all experiments were performed on a platform with an NVIDIA Tesla V100 GPU, where IDedupNet achieved an average processing time of around 12 milliseconds per image, demonstrating the feasibility of processing large-scale datasets. Processing a batch of 1,000 images end to end, including preprocessing, feature extraction, encoding, and similarity computation, takes 13 seconds on average. Resource usage analysis reflects the modest memory and processing requirements of MobileNetV3's lightweight architecture: batch processing requires 2.3 GB of GPU memory, considerably less than heavier architectures (4.8 GB for ResNet50 and 5.2 GB for DenseNet121).
Hashing-based methods (such as perceptual hashing), on the other hand, are significantly faster, processing images in less than 2 ms each on average. However, they are not robust to image transformations such as resizing and cropping, which leads to a higher error rate. Although IDedupNet incurs a slight computational cost, this overhead is justified by its significantly better precision, recall, and F1 scores and by its scalability. Additionally, the depthwise separable convolutions in MobileNetV3 reduce the number of trainable parameters per convolution layer, which shortens processing time without affecting performance. Together, these metrics demonstrate that IDedupNet is well suited for integration into dynamic cloud systems, where accuracy and efficiency are critical.
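The parameter saving from depthwise separable convolutions can be checked with a short calculation. The sketch counts weights only (biases are ignored), and the layer sizes in the example are hypothetical rather than taken from the MobileNetV3 configuration.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2-D convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per channel
    pointwise = in_channels * out_channels               # 1 x 1 channel mixing
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard
print(standard, separable, round(ratio, 4))
```

The ratio equals 1/out_channels + 1/kernel_size², so for 3 × 3 kernels and wide layers the separable form needs roughly one ninth of the weights.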
6   Comparison with hash-based image 
deduplication  
The method proposed in this paper performs image deduplication using CNNs, specifically MobileNetV3, to generate image embeddings for identifying duplicates. An alternative way of achieving image deduplication in the cloud is based on hashing. CNN-based deduplication relies on encoding generation, which extracts high-dimensional feature vectors (embeddings) from images using a pre-trained CNN. These embeddings capture the semantic content of the images, allowing comparisons based on visual similarity. Duplicate detection is done by calculating the cosine similarity between these embeddings. Images are deemed duplicates if their similarity score is high (above a threshold). 
Cosine similarity ranges from -1 to 1, so the choice of threshold determines how strict the detection is. A threshold below 0 means that even negatively correlated images (images with opposite features) would be considered duplicates, which is undesirable in most practical scenarios because dissimilar images would be flagged. A threshold between 0 and 1 is the typical range for practical use. A threshold closer to 1 implies stricter duplicate detection, meaning that only very similar (almost identical) images are flagged as duplicates, whereas a threshold closer to 0 is more lenient and allows images with only some similarity to be considered duplicates. Setting the threshold to 1 means that only images that are identical in the feature space (exact matches) are regarded as duplicates; this stringent criterion might miss slight variations that are visually still duplicates. A threshold of -1 treats all images as potential duplicates regardless of their similarity, which would make the duplicate detection mechanism meaningless because every pair of images would be flagged as duplicates.
Hash-based deduplication involves computing hash values for the image data. Identical images produce identical hash values, and images with the same hash value are considered duplicates. Concerning the handling of visual similarity, the proposed CNN-based approach can detect duplicates based on visual content, even if images are resized, cropped, or slightly altered; it identifies images that are visually similar but not necessarily identical in pixel data. Hash-based methods, on the other hand, detect only exact duplicates and fail to identify visually identical images with slight variations or transformations. The proposed CNN-based approach is robust to various transformations and distortions (e.g., changes in lighting, angle, or compression artifacts) because it captures semantic features rather than raw pixel values. In contrast, hash-based approaches are sensitive to any change in the image content, including minor modifications or different formats; even a single-pixel change results in a different hash. Concerning computational overhead, the CNN-based approach generally requires more computational resources: generating embeddings and computing similarities can be resource-intensive and require significant processing power, especially for large datasets. Hash-based methods, by comparison, are computationally inexpensive and quick, as hashing algorithms are fast and require minimal processing compared with deep learning models. Concerning scalability, CNN-based deduplication can be scaled with distributed computing or GPUs but might require optimization for large datasets; it benefits from parallelization, particularly in feature extraction and similarity calculations. Hash-based methods are highly scalable and efficient for large data volumes, as they involve simple comparison operations.
Concerning storage efficiency, the based method requires 
storing high-dimensional embeddings, which can be more 
storage-intensive than hash values. However, it provides 
more information for similarity comparison. On the 
contrary, hash-based methods require minimal storage, as 
hash values are typically small. There are applications for 
both strategies. CNN-based deduplication works well for 
platforms with user-generated material, image search 
engines, and content-based retrieval systems—
applications where visual content similarity is more 
crucial. In situations involving vast and varied picture 
collections, where precise duplication is uncommon, but 
the visual resemblance is still essential, CNN-based 
deduplication works incredibly well. For applications like 
backup systems, file storage management, or situations 
with uniform image formats and no deviations, hash-based 
deduplication is perfect for situations where precise 
duplication has to be identified. 
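The sensitivity of exact hashing to a single changed value can be illustrated with the Python standard library. This is a minimal sketch: SHA-256 is an assumption (the text does not name a specific hash function), and the byte strings are stand-ins for image files rather than real images.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()

def find_exact_duplicates(files: dict) -> dict:
    """Group file names by digest; keep groups with more than one member."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(content_hash(data), []).append(name)
    return {h: names for h, names in groups.items() if len(names) > 1}

# Stand-ins for image files (hypothetical bytes, not real images).
original = bytes(range(256)) * 4
copy = bytes(original)
edited = bytearray(original)
edited[10] ^= 1  # flip one bit, mimicking a single-pixel change

files = {"a.png": original, "b.png": copy, "c.png": bytes(edited)}
duplicates = find_exact_duplicates(files)
```

Here only `a.png` and `b.png` are grouped; the one-bit edit in `c.png` yields an unrelated digest. On the storage side, a SHA-256 digest occupies 32 bytes, whereas a d-dimensional float32 embedding occupies 4d bytes.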
 
Table 3: Performance comparison with the state-of-the-art hashing-based deduplication methods 
| Aspect/Scenario | CNN-Based | Perceptual Hashing (PHash) | Difference Hashing (DHash) | Wavelet Hashing (WHash) | Average Hashing (AHash) |
|---|---|---|---|---|---|
| Algorithm Type | Deep Learning (Feature extraction using CNN layers) | Perceptual Hash (Uses frequency domain information) | Difference Hash (Edge detection and pixel difference) | Wavelet-based Hash (Utilizes discrete wavelet transform) | Average Hash (Simpler pixel value comparison) |
| Complexity | High (Requires a pre-trained model and significant computation) | Medium (Moderate computational cost) | Low (Simple and fast pixel difference comparison) | Medium (Moderate computational cost using wavelet transform) | Low (Simple averaging of pixel values) |
| Accuracy in Complex Cases | Very High (Handles complex transformations like rotations, color changes, etc.) | High (Good for slight changes like resizing, compression) | Medium (Best for near-identical images with minor modifications) | Medium-High (Handles slight transformations well) | Medium (Good for exact duplicates or small changes) |
| Speed | Low (Slower due to the CNN model's complexity and large dataset) | Medium (Faster compared to CNN but slower than DHash) | Very High (Fastest among all due to simplicity) | Medium (Slower than DHash but faster than PHash) | High (Faster but simpler analysis) |
| Sensitivity to Noise | Low (Can filter out noise due to feature extraction) | Medium (Resistant to small changes and noise) | High (Sensitive to even minor pixel differences) | Medium (Handles noise better than DHash, comparable to PHash) | High (Sensitive to noise and minor pixel differences) |
| Resistance to Resizing | High (Handles different resolutions well) | High (Good for resized images) | Medium (Not as effective for resized images) | High (Can manage resized images with some transformation) | Medium (Struggles with resized images) |
| Resistance to Rotations | High (Rotation invariant depending on the CNN architecture) | Low (Sensitive to rotations) | Low (Highly sensitive to rotations) | Medium (Wavelet transform adds some resistance to rotations) | Low (Highly sensitive to rotations) |
| Memory Requirements | High (CNN models are memory-intensive) | Medium (Requires more space for frequency data) | Low (Efficient in terms of memory usage) | Medium (Moderate memory requirements for wavelet coefficients) | Low (Minimal memory usage) |
| Handling Color Changes | High (CNN can account for various color shifts or alterations) | Medium (Perceptual hash can handle slight color changes) | Low (Highly sensitive to color differences) | Medium (Performs better with small color changes) | Low (Sensitive to color variations) |
| Handling Cropped Images | Medium-High (CNN can often recognize partial images) | Medium (Can handle small crops but not extreme cases) | Low (Highly sensitive to cropping) | Medium (Resistant to small amounts of cropping) | Low (Sensitive to cropping) |
| Use Case Scenario | Best for complex deduplication (e.g., detecting copies with transformations like filters, text, etc.) | Ideal for detecting visually similar images (e.g., resized, compressed) | Best for detecting pixel-perfect duplicates or slight pixel-level differences | Suitable for detecting duplicates where slight transformations occur (e.g., resizing, cropping) | Best for exact duplicate detection, simple cases |
| Scalability | Low (Due to the computational cost of CNNs for large datasets) | Medium (Better scalability for large datasets compared to CNN) | High (Very scalable for large image sets) | Medium (Can scale, but slower than DHash) | High (Very scalable due to simplicity) |
| Training Requirements | Requires pre-trained model (unless using a custom model) | No training required | No training required | No training required | No training required |
 
Table 3 illustrates the importance of the CNN-based 
deduplication technique in situations where recognizing and 
comprehending visual content similarity is essential. It 
offers more flexible and subtle identification of duplicates 
that are not identical but appear similar. For exact 
duplicate identification, hashing-based techniques, on 
the other hand, are more straightforward and quicker, but 
they cannot handle changes and transformations in 
picture data. Which of these approaches is best in a cloud 
computing environment depends on the application's 
particular needs, such as whether visual similarity 
detection or exact duplicate detection is required, and on 
storage and processing capacity limitations. 
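To make the "Algorithm Type" row of the comparison concrete, the following is a simplified sketch of average hashing and difference hashing on an already-downscaled grayscale grid. It is a textbook-style illustration, not the implementation evaluated in this paper; the grid values are hypothetical, and the bit convention for the difference hash (left pixel brighter than right) is one of several in use.

```python
def average_hash(gray):
    """aHash: one bit per pixel, set when the pixel is above the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def difference_hash(gray):
    """dHash: one bit per horizontal neighbor pair, set when left > right."""
    return [1 if row[i] > row[i + 1] else 0
            for row in gray for i in range(len(row) - 1)]

def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

# Tiny 4x4 grayscale grid (hypothetical values). Real use would first
# downscale the image, commonly to 8x8 (aHash) or 9x8 (dHash).
img = [[10, 20, 30, 40],
       [40, 30, 20, 10],
       [10, 10, 200, 200],
       [200, 200, 10, 10]]

brighter = [[p + 5 for p in row] for row in img]  # uniform brightness shift
rotated = [list(r) for r in zip(*img[::-1])]      # 90-degree rotation

shift_distance = hamming(average_hash(img), average_hash(brighter))
rotation_distance = hamming(average_hash(img), average_hash(rotated))
```

In this toy case the uniform brightness shift leaves both hashes unchanged (Hamming distance 0), while the rotation changes several bits of the average hash, matching the low rotation resistance listed for these methods.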
7   Discussion 
Due to rapidly growing multimedia data, image 
deduplication is an essential challenge for cloud 
computing environments. Traditional deep learning 
models and hashing-based approaches, such as the 
abovementioned SOTA methods, have achieved varying 
degrees of success. Yet these techniques frequently 
struggle with scalability, resilience against image 
transformations, and adaptation to changing 
cloud environments. Hashing-based methods are sensitive 
to pixel-level changes and fail to capture semantic 
similarities, while classical deep learning models are 
inefficient for large-scale datasets. 
We present IDedupNet, a new deep-learning framework 
that integrates MobileNetV3-based feature 
extraction with CNN-based encodings for image 
deduplication to bridge these gaps. MobileNetV3 achieves 
high accuracy at low computational cost thanks to 
depthwise separable convolutions and linear bottlenecks. 
These architectural features give IDedupNet high 
robustness for duplicate and near-duplicate detection in the 
face of significant image changes (for example, resizing, 
cropping, or color transformations). 
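The saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below counts weights (biases omitted) for a standard convolution and for its depthwise separable counterpart; the layer sizes are illustrative and are not taken from the MobileNetV3 configuration used in this work.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128  # illustrative layer sizes
std = standard_conv_params(k, c_in, c_out)        # 73728 weights
sep = depthwise_separable_params(k, c_in, c_out)  # 8768 weights
ratio = sep / std  # equals 1/c_out + 1/k**2, about 0.119 here
```

For this example layer the separable form needs roughly 12% of the weights of the standard convolution, and the ratio 1/c_out + 1/k^2 holds for any layer size.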
Based on the experimental results, IDedupNet achieved 
98.68% accuracy, 93.3% precision, 98.2% recall, and an 
F1-score of 95.6% on standard benchmark datasets, thus 
validating its performance. While SOTA models 
such as ResNet50, DenseNet121, and UNet struggle 
with large-scale datasets for deduplication, IDedupNet 
performed significantly better. This work underlines 
the limitations of SOTA methods, which include 
sensitivity to transformations and high computational 
overhead, and highlights the framework's usefulness in 
real-time cloud storage systems. These results indicate the 
framework's potential for storage and retrieval in current 
cloud paradigms and justify continued work 
at scale. The new method lays the groundwork for 
future exploration of hybrid models for further scalability.  
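As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch uses the rounded values stated above, so a small deviation from the reported figure is expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the benchmark datasets.
f1 = f1_score(0.933, 0.982)
f1_percent = round(f1 * 100, 1)
```

The rounded inputs give about 95.7%, which agrees with the reported 95.6% to within the rounding of precision and recall.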
While IDedupNet is primarily designed for cloud storage 
systems, its architectural efficiency and robustness make it 
well-suited for deployment in other domains, such as edge 
and IoT environments. To evaluate its adaptability, a 
supplementary experiment was conducted using a smaller 
dataset, the Edge Aerial Image Dataset (EAID). It 
comprises 5,000 images captured from drone-mounted 
cameras under varying environmental conditions. These 
scenarios introduce unique challenges, such as high 
variability in lighting, resolution, and perspective, which 
are typical of edge computing contexts. 
The experimental results highlight IDedupNet’s ability to 
generalize effectively, achieving an accuracy of 96.3%, 
precision of 91.8%, and recall of 94.7%. These results 
demonstrate strong performance, even in resource-
constrained environments. MobileNetV3’s lightweight 
architecture, featuring depthwise separable convolutions, 
significantly reduced computational overhead, allowing 
efficient processing on edge devices with limited 
GPU/CPU capabilities. This adaptability underscores the 
model’s potential for real-time deduplication in distributed 
systems, such as IoT networks, where dynamic datasets 
and constrained resources are prevalent. 
The slightly reduced accuracy compared to cloud-based 
datasets stems from the increased noise and variability in 
edge-collected images. However, the results validate the 
robustness of IDedupNet’s feature extraction and encoding 
pipelines. Future work could explore domain-specific 
enhancements, such as transfer learning or fine-tuning the 
model with domain-adapted datasets, to improve 
generalization. These findings underscore the broader 
implications of this framework, making it a versatile 
solution across multiple application domains. 
8   Conclusion and future work  
This study introduces a new DL-based framework called 
IDedupNet. By detecting duplicate and near-duplicate photos 
effectively, it enhances the performance of cloud computing 
environments. Efficiency is a top priority for IDedupNet, 
which handles the massive amounts of data shared in cloud 
systems using CNN-based encodings and MobileNetV3. 
Distributed architectures in cloud environments can 
further accelerate processing: several nodes may 
process picture batches concurrently when IDedupNet is 
coupled with parallel computing frameworks. The 
architecture could provide deduplication in real time as 
photos are uploaded to the cloud. Our algorithm leverages 
deep learning for image encoding and deduplication to 
efficiently handle duplicate photos in highly dynamic 
situations. Additionally, we present the Learning-Based 
Image Deduplication (LBID) technique, which improves 
deduplication capabilities by leveraging the IDedupNet 
model. With a high accuracy of 98.68% on benchmark 
datasets, consistently outperforming existing models, 
our suggested deep learning model offers substantial 
advantages and builds confidence in its performance. 
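The idea of processing picture batches concurrently can be sketched with the standard library. This is only an illustration under stated assumptions: worker threads in a single process stand in for cloud nodes, and `encode_batch` is a hypothetical placeholder where a real worker would run the encoder.

```python
from concurrent.futures import ThreadPoolExecutor

def batches(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def encode_batch(batch):
    """Hypothetical stand-in for model inference on one batch.

    Returns one placeholder feature (the byte length) per image.
    """
    return [len(image) for image in batch]

def encode_all(images, batch_size=2, workers=4):
    """Encode batches concurrently and return features in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves the order of the submitted batches.
        results = list(pool.map(encode_batch, batches(images, batch_size)))
    return [feature for batch in results for feature in batch]

images = [b"aa", b"bbb", b"c", b"dddd", b"ee"]  # stand-ins for image bytes
features = encode_all(images)
```

Because results come back in submission order, each feature can be matched to its source image without extra bookkeeping; a distributed framework would replace the thread pool with remote workers.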
Several potential improvements could be implemented in 
IDedupNet to improve its versatility and efficiency in 
backend operation. For example, feature extraction could 
be enhanced with lightweight transformer architectures 
such as MobileViT or TinyBERT, which use efficient 
attention mechanisms. For edge and IoT applications, 
quantization and pruning could reduce memory consumption, 
making the model suitable for low-power devices. Learning 
parameters from the data rather than tuning them by hand 
could further enhance accuracy in complex environments. 
The framework's applicability could be extended to domains 
of interest, such as medical imaging or satellite data, by 
fine-tuning the model on domain-specific datasets. 
A hybrid approach that combines heuristic pre-filtering 
with deep learning could satisfy speed constraints while 
preserving accuracy. Lastly, employing 
explainable AI (XAI) methods would enhance the 
transparency of the framework, allowing its users to 
comprehend its decisions and build trust in its outputs. 
These directions could make IDedupNet a 
stronger solution to numerous real-world problems. 
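One possible shape of such a hybrid pipeline is sketched below: a cheap exact-hash pre-filter removes byte-identical files before any model inference, and only the survivors are compared by embedding similarity. Everything here is an assumption for illustration, including the use of SHA-256, the cosine threshold of 0.95, and the `toy_embed` function that stands in for a learned encoder.

```python
import hashlib
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def hybrid_dedup(images, embed, threshold=0.95):
    """Return names of images kept after two-stage deduplication.

    Stage 1 drops byte-identical files via SHA-256 (cheap heuristic).
    Stage 2 drops survivors whose embedding is too similar to a kept image.
    """
    seen_digests = set()
    kept = []  # list of (name, embedding)
    for name, data in images.items():
        digest = hashlib.sha256(data).digest()
        if digest in seen_digests:
            continue  # exact duplicate: no model inference needed
        seen_digests.add(digest)
        vector = embed(data)
        if any(cosine(vector, other) >= threshold for _, other in kept):
            continue  # near-duplicate of an image already kept
        kept.append((name, vector))
    return [name for name, _ in kept]

def toy_embed(data):
    """Hypothetical encoder stand-in: counts of dark and bright bytes."""
    dark = sum(1 for b in data if b < 128)
    return [float(dark), float(len(data) - dark)]

images = {
    "a.jpg": bytes([10, 20, 30, 40]),
    "a_copy.jpg": bytes([10, 20, 30, 40]),  # exact duplicate
    "a_edit.jpg": bytes([10, 20, 30, 41]),  # near-duplicate
    "b.jpg": bytes([200, 200]),             # different image
}
kept = hybrid_dedup(images, toy_embed)
```

In this toy run the exact copy is discarded by the hash stage, the edited copy by the similarity stage, and `a.jpg` and `b.jpg` are kept. The pairwise comparison in stage 2 is quadratic in the number of kept images, so a real system would likely pair it with an approximate nearest-neighbor index.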
References 
[1]  Amdewar Godavari, Chapram Sudhakar, T. 
Ramesh. (2024). Hybrid deduplication system 
with content-based cache for cloud 
environment. Elsevier. 36(5), pp.1-12. 
https://doi.org/10.1016/j.jksuci.2024.102030. 
[2]  Nannan zhao, muhui lin, hadeel albahar, arnab 
k. paul, zhijie huang, subil abraham, usa keren 
chen, vasily tarasov, dimitrios skourtis, ali 
anwar and ali r. butt. (2024). An End-to-End 
High-Performance Deduplication Scheme for 
Docker Registries and Docker Container 
Storage Systems. ACM, pp.1-33. 
https://doi.org/10.1145/3643819 
[3]  S. Usharani and K. Dhanalakshmi. (2023). An 
image storage duplication detection method 
using recurrent learning for smart application 
services. Springer. 79, pp.1-27. 
https://doi.org/10.1007/s11227-023-05042-4 
[4]  Nagappan Mageshkumar, J. Swapna, A. 
Pandiaraj, R. Rajakumar, Moez Krichen, and 
Vinayakumar Ravi. (2023). Hybrid cloud 
storage system with enhanced multilayer 
cryptosystem for secure deduplication in the 
cloud. Elsevier. 4, pp.301-309. 
https://doi.org/10.1016/j.ijin.2023.11.001 
[5]  Muhammad Atta Othman Ahmed, Ibrahim A. 
Abbas and Yasser AbdelSatar. (2023). 
HDSNE is a new unsupervised multiple image 
database fusion learning algorithm with 
flexible and crispy production of one database: 
a proof case study of lung infection diagnosed 
in chest X-ray images. Springer. 23, pp.1-15. 
https://doi.org/10.1186/s12880-023-01078-3 
[6]  Xu, Guangping; Tang, Bo; Lu, Hongli; Yu, 
Quan; Sung, Chi Wan (2019). LIPA: A 
Learning-Based Indexing and Prefetching 
Approach for Data Deduplication. 35th 
Symposium on Mass Storage Systems and 
Technologies (MSST), pp.299–310. DOI: 
10.1109/msst.2019.00010d 
[7]  Ch. Prathima, Naresh Babu Muppalaneni, and 
K. G. Kharade. (2022). Deduplication of IoT 
Data in Cloud Storage. Springer, pp.147-157. 
https://doi.org/10.1007/978-981-16-5090-
1_13 
[8]  K. Pragash and J. Jayabharathy. (2022). A 
survey on DE – Duplication schemes in cloud 
servers for secured data analysis in various 
applications. Elsevier. 24, pp.1-6. 
https://doi.org/10.1016/j.measen.2022.100463 
[9]  K. Vijayalakshmi and V. Jayalakshmi; 
(2021). Analysis of data deduplication 
techniques of storage of big data in the cloud. 
2021 5th International Conference on 
Computing Methodologies and 
Communication 
(ICCMC). http://doi:10.1109/iccmc51019.202
1.9418445 
[10] Kwabena, Owusu-Agyemeng; Qin, Zhen; 
Zhuang, Tianming and Qin, Zhiguang (2019). 
MSCryptoNet: Multi-Scheme privacy-
preserving deep learning in cloud computing. 
IEEE Access, 7, 29344 - 29354.         
http://doi:10.1109/ACCESS.2019.2901219 
[11]  Zheng, Xiaoyu; Zhou, Yuyang; Ye, Yalan and 
Li, Fagen (2019). A cloud data deduplication 
scheme based on certificateless proxy re-
encryption. Journal of Systems Architecture, 
102, pp.1-44.         
http://doi:10.1016/j.sysarc.2019.101666 
[12]  Yinjin Fu; Nong Xiao; Tao Chen and Jian 
Wang; (2021). Fog-to-MultiCloud 
Cooperative eHealth Data Management with 
Application-Aware Secure Deduplication. 
IEEE Transactions on Dependable and Secure 
Computing.         
http://doi:10.1109/tdsc.2021.3086089 
[13]  Yunling Wang; Meixia Miao; Jianfeng Wang 
and Xuefeng Zhang; (2021). Secure 
deduplication with efficient user revocation in 
cloud storage. Computer Standards &amp; 
Interfaces.   78, pp.1-8.      
http://doi:10.1016/j.csi.2021.103523 
[14]  Guipeng Zhang; Zhenguo Yang; Haoran Xie 
and Wenyin Liu; (2021). A secure authorized 
deduplication scheme for cloud data based on 
blockchain. Information Processing &amp; 
Management.         
http://doi:10.1016/j.ipm.2021.102510 
[15]  Xu, Guangping; Tang, Bo; Lu, Hongli; Yu, 
Quan and Sung, Chi Wan (2019).  LIPA: A 
Learning-based Indexing and Prefetching 
Approach for Data Deduplication. 35th 
Symposium on Mass Storage Systems and 
Technologies (MSST), pp.299–310.         
http://doi:10.1109/msst.2019.00010 
[16] Jia, Wei; Li, Li; Li, Zhu; Zhao, Shuai and Liu, 
Shan (2020). Scalable Hash from Triplet Loss 
Feature Aggregation for Video De-duplication. 
Journal of Visual Communication and Image 
Representation, 72, pp.1-9.         
http://doi:10.1016/j.jvcir.2020.102908 
IDedupNet: A MobileNetV3-Based Deep Learning Framework for…             Informatica 49 (2025) 143–162    161 
 
[17] Zhou, Zhili; Yang, Ching-Nung; Kim, 
Cheonshik and Cimato, Stelvio (2020). 
Introduction to the special issue on deep 
learning for real-time information hiding and 
forensics. Journal of Real-Time Image 
Processing, 17, pp.1-5. 
http://doi:10.1007/s11554-020-00947-2 
[18] Rajput, Amitesh Singh; Raman, 
Balasubramanian and Imran, Javed (2020). 
Privacy-preserving human action recognition 
as a remote cloud service using RGB-D 
sensors and deep CNN. Expert Systems with 
Applications, 152, pp.1-15.         
http://doi:10.1016/j.eswa.2020.113349 
[19] Anuradha, M.; Jayasankar, T.; Prakash, N.B.; 
Sikkandar, Mohamed Yacin; Hemalakshmi, 
G.R.; Bharatiraja, C. and Britto, A. Sagai 
Francis (2020). IoT enabled Cancer Prediction 
System to Enhance the Authentication and 
Security using Cloud Computing. 
Microprocessors and Microsystems, 103301–.         
http://doi:10.1016/j.micpro.2020.103301       
[20] Magesh Kumar S; Balasundaram A; 
Kothandaraman D; Auxilia Osvin Nancy V; P. 
J. Sathish Kumar and Ashokkumar S; (2020). 
An Approach to Secure Capacity Optimization 
in Cloud Computing using Cryptographic 
Hash Function and Data De-duplication. 2020 
3rd International Conference on Intelligent 
Sustainable Systems (ICISS). 
http://doi:10.1109/iciss49785.2020.9315892 
[21]  Amna Asif; Shaheen Khatoon; Md Maruf 
Hasan; Majed A. Alshamari; Sherif Abdou; 
Khaled Mostafa Elsayed and Mohsen 
Rashwan; (2021). Automatic analysis of social 
media images to identify disaster type and infer 
appropriate emergency response. Journal of 
Big Data.   8(83), pp.1-28.      
http://doi:10.1186/s40537-021-00471-5 
[22]  Takeshita, Jonathan; Karl, Ryan and Jung, 
Taeho (2020).  [IEEE 2020 29th International 
Conference on Computer Communications 
and Networks (ICCCN) - Honolulu, HI, USA 
(2020.8.3-2020.8.6)] 2020 29th International 
Conference on Computer Communications 
and Networks (ICCCN) - Secure Single-Server 
Nearly-Identical Image Deduplication. 1–6.         
http://doi:10.1109/icccn49398.2020.9209728       
[23]  K. Vijayalakshmi and V. Jayalakshmi; (2021). 
Analysis on data deduplication techniques of 
storage of big data in cloud. 2021 5th 
International Conference on Computing 
Methodologies and Communication (ICCMC).         
http://doi:10.1109/iccmc51019.2021.9418445 
[24]  Manish Shetty; Chetan Bansal; Sumit Kumar; 
Nikitha Rao; Nachiappan Nagappan and 
Thomas Zimmermann; (2021). Neural 
Knowledge Extraction from Cloud Service 
Incidents. 2021 IEEE/ACM 43rd International 
Conference on Software Engineering: 
Software Engineering in Practice (ICSE-
SEIP).         http://doi:10.1109/icse-
seip52600.2021.00031 
[25]  Miao Zhang; Fangxin Wang; Yifei Zhu; 
Jiangchuan Liu and Zhi Wang; (2021). 
Towards cloud-edge collaborative online 
video analytics with fine-grained serverless 
pipelines. Proceedings of the 12th ACM 
Multimedia Systems Conference.         
http://doi:10.1145/3458305.3463377 
[26]  Lu, Zhigang; Wu, Yuewen; Xu, Jiwei and 
Wang, Tao (2019).  An Acceleration Method 
for Docker Image Update. IEEE International 
Conference on Fog Computing (ICFC), 15–23.         
http://doi:10.1109/ICFC.2019.00010 
[27]  Zhang, Meng (2020).  E-Commerce Comment 
Sentiment Classification Based on Deep 
Learning. IEEE 5th International Conference 
on Cloud Computing and Big Data Analytics 
(ICCCBDA). pp.184–187.         
http://doi:10.1109/ICCCBDA49378.2020.909
5734 
[28]  Li, Xuan; Li, Jin; Yiu, Siuming; Gao, 
Chongzhi and Xiong, Jinbo (2019). Privacy-
preserving edge-assisted image retrieval and 
classification in IoT. Frontiers of Computer 
Science.         http://doi:10.1007/s11704-018-
8067-z 
[29]  Tengfei Xing; Yang Gu; Zhichao Song; Zhihui 
Wang; Yiping Meng; Nan Ma; Pengfei Xu; 
Runbo Hu and Hua Chai; (2019). A Traffic 
Sign Discovery Driven System for Traffic Rule 
Updating. Proceedings of the 3rd ACM 
SIGSPATIAL International Workshop on AI 
for Geographic Knowledge Discovery.   Pp. 
52–55      http://doi:10.1145/3356471.3365237 
[30]  Prince Hamandawana; Awais Khan; Jongik 
Kim and Tae-Sun Chung; (2021). Accelerating 
ML/DL Applications with Hierarchical 
Caching on Deduplication Storage Clusters. 
IEEE Transactions on Big Data.  8(6), pp. 
1622 - 1636       
http://doi:10.1109/tbdata.2021.3106345 
[31]  Wang, H., Tobon V., D. P., Hossain, M. S., & 
Saddik, A. E. (2021). Deep Learning (DL)-
Enabled System for Emotional Big Data. IEEE 
Access, 9, 116073–116082. 
http://doi:10.1109/access.2021.3103501 
[32] Andrew Boutros; Mathew Hall; Nicolas 
Papernot and Vaughn Betz; (2020). Neighbors 
From Hell: Voltage Attacks Against Deep 
Learning Accelerators on Multi-Tenant 
FPGAs. 2020 International Conference on 
Field-Programmable Technology (ICFPT).         
http://doi:10.1109/icfpt51103.2020.00023 
[33]  Jia, Wei; Li, Li; Li, Zhu; Zhao, Shuai and Liu, 
Shan (2020).  Triplet Loss Feature 
162   Informatica 49 (2025) 143–162                                                                                                                  M.H. Mohiuddin et al. 
Aggregation for Scalable Hash. IEEE 
International Conference on Acoustics, Speech 
and Signal Processing (ICASSP).  pp.1918–
1922.         
http://doi:10.1109/icassp40776.2020.9053908 
[34]  Jansen, Christoph; Annuscheit, Jonas; 
Schilling, Bruno; Strohmenger, Klaus; Witt, 
Michael; Bartusch, Felix; Herta, Christian; 
Hufnagl, Peter and Krefting, Dagmar (2020). 
Curious Containers: A framework for 
computational reproducibility in life sciences 
with support for Deep Learning applications. 
Future Generation Computer Systems, 112, 
209–227.         
http://doi:10.1016/j.future.2020.05.007 
[35]  Du, Xiaoyu; Le, Quan and Scanlon, Mark 
(2020).  International Conference on Cyber 
Security and Protection of Digital Services 
(Cyber Security) - Automated Artefact 
Relevancy Determination from Artefact 
Metadata and Associated Timeline Events. 1–
8.         
http://doi:10.1109/CyberSecurity49315.2020.
9138874 
[36]  Chen, Hong (2020).  International Conference 
on Electronics and Sustainable 
Communication Systems (ICESC) - Big Data 
Cleaning Algorithm based on Repetitive 
Change Detection and GANs. 477–480.         
http://doi:10.1109/ICESC48915.2020.915595
8 
[37]  Abuhasel, Khaled Ali and Khan, Mohammad 
Ayoub (2020). A Secure Industrial Internet of 
Things (IIoT) Framework for Resource 
Management in Smart Manufacturing. IEEE 
Access, 8, pp.117354–117364.         
http://doi:10.1109/ACCESS.2020.3004711 
[38]  Harsh Chaudhary; Ankit Detroja; 
Priteshkumar Prajapati and Parth Shah; (2020). 
A review of various challenges in 
cybersecurity using Artificial Intelligence. 
2020 3rd International Conference on 
Intelligent Sustainable Systems (ICISS).         
http://doi:10.1109/iciss49785.2020.9316003 
[39]  Gupta, Rajat; Singh, Sameer; Verma, Gunjan 
(2021). Efficient Image Deduplication Using 
Deep Learning-Based Content Hashing 
Techniques. Informatica, 45(2), pp.179-188. 
DOI: 10.31449/inf. v45i2.3180        
http://doi:10.1109/ACCESS.2020.3017119 
[40] Tahir, Muhammad; Sardaraz, Muhammad; 
Mehmood, Zahid and Muhammad, Shakoor 
(2020). CryptoGA: a cryptosystem based on 
genetic algorithm for cloud data security. 
Cluster Computing.         
http://doi:10.1007/s10586-020-03157-4 
[41]  INRIA Copydays Dataset. Retrieved from 
https://lear.inrialpes.fr/~jegou/data.php.html 
[42] QUALINET Dataset. Retrieved from 
https://qualinet.github.io/databases/smart_a_li
ght_field_image_quality_dataset/ 
[43] CIFAR-10 Dataset. Retrieved from 
https://www.cs.toronto.edu/~kriz/cifar.html