https://doi.org/10.31449/inf.v47i7.4745 Informatica 47 (2023) 121–132 121 
 
Locality Improvement Scheme Based on QR Code Technique Within 
Inverted Index 
Aya A. Alyousif
*
, Ali A. Yassin 
E-mail:  pgs.aya.alyousif@uobasrah.edu.iq, ali.yassin@uobasrah.edu.iq 
Department of Computer Science, Education College for Pure Sciences, University of Basrah, Basrah, Iraq 
*
Corresponding author 
Keywords: advanced encryption standard, locality, qr code, searchable encryption  
Received: March 16, 2023 
Searchable symmetric encryption is one of the most important modern technologies that allow the owner 
to store private data on an unreliable server and search for the data securely while preserving the data’s 
confidentiality and privacy. This field has several schemes, but these schemes suffered from slower data 
retrieval in the case of large database sizes owing to the poor locality. Hence, the server visits several 
memory locations for a single query. Other studies focused on improving the locality, but the result is 
either increased storage capacity or decreased efficiency of data reading. In the present study, we present 
a secure, searchable scheme that overcomes the abovementioned issues and works to improve the locality 
by exploiting the QR code technique and the Advanced Encryption Standard algorithm. Furthermore, our 
work maintains read efficiency, helps reduce the risk of data breaches, and protects sensitive information 
from being accessed by unauthorized individuals. Moreover, the proposed scheme can resist cyber 
security attacks, such as frequency analysis attacks and keyword guessing attacks. Additionally, we used 
real-world data in our experiments and demonstrated that our proposed scheme is secure and practically 
efficient and holds high accuracy. 
Povzetek: Predstavljana je varnostna iskalna shema za izboljšanje šifriranja, ki izboljšuje lokalnost, 
ohranja učinkovitost branja in je odporna na kibernetske napade. 
 
1  Introduction 
Cloud storage outsourcing is a service in which a company 
outsources the storage of its data to a cloud storage 
provider. Cloud storage outsourcing can provide several 
benefits, including cost savings, increased scalability and 
flexibility, and improved data security and availability. 
Cloud storage providers typically have strict security 
measures to protect customer data, including measures 
such as encryption, secure data centers, and access 
controls. However, the data owner (𝐷𝑊 ) or company is 
still responsible for ensuring that its sensitive data are 
protected and in compliance with relevant regulations or 
industry standards.  
     Searchable symmetric encryption (𝑆𝑆𝐸 ) is one of the 
most important operations that can be applied on sensitive 
data in the outsourcing cloud [1]. Using searchable 
encryption to store sensitive data in the cloud can help to 
protect the data from unauthorized access while still 
allowing it to be searched and used in a controlled manner. 
However, carefully evaluating the security and 
performance tradeoffs of different searchable encryption 
approaches and ensuring that the chosen approach meets 
the 𝐷𝑊 ’s needs and security requirements are essential. In 
more detail, 𝑆𝑆𝐸 is done by the 𝐷𝑊 as a first step to 
encrypt the data using a secret key, and then, it is sent to  
 
 
the server. After, the 𝐷𝑊 creates a secure index based on 
its own database. The encrypted data and a secure index  
are sent to the cloud storage server, which could be the 
same untrusted server or another server chosen by the 𝐷𝑊 
as a third party. The 𝐷𝑊 generates a search token used to  
retrieve information based on a secure index file to 
perform a secure search.  
     SSE schemes in the literature review have several 
syntaxes, which are divided into two types. Each type 
relies on the interaction between the server and the 𝐷𝑊 in 
each request query about data. The first type is a single-
round interaction where the server also decrypts the data 
and sends the result to the 𝐷𝑊 (therefore, the server learns 
the output). Additionally, the second type uses more than 
one round of interaction (where the server learns no 
information about the output) [2]. We use the second type 
in our scheme for the security of the exchanged data over 
the communication channel between server and client. 
Therefore, the server cannot decrypt the data and preserve 
the privacy of the client’s data. 𝑆𝑆𝐸 schemes face several 
challenges as follows: 
 
• Key management 
In the 𝑆𝑆𝐸 scheme, the same secret key is used for 
encrypting and decrypting the data. This case implies that 
the key must be safely distributed to all parties who need 
to be able to look for the encrypted data. This can pose a 
122 Informatica 47 (2023) 121–132 A.A. Alyusif et al. 
difficulty, particularly in large organizations with many 
users [3]. 
 
• Security  
Although symmetric searchable encryption schemes are 
generally secure, they are vulnerable to key compromise. 
If the secret key is compromised, an attacker could decrypt 
the encrypted data and potentially perform unauthorized 
searches. 
 
 
• Key size 
 To ensure the security of the encrypted data, the secret 
key used in a symmetric searchable encryption scheme 
must be long enough to withstand brute-force attacks. This 
case can increase the size of the key, which can be a 
problem in systems with limited storage or bandwidth. 
 
• Performance 
Symmetric searchable encryption schemes can be slower 
than other encryption schemes because they require the 
use of a single shared key for encryption and decryption. 
This case can be a problem when searching large datasets 
as the search may take a long time to complete. Another 
issue related to the slow searching in large datasets is poor 
locality. Thus, locality is defined as the number of times a 
server accesses multiple memory locations separately  
in response to a single request from the user [1], [2], [4]–
[6].  
     The improvement in locality always leads to a negative 
impact on other characteristics, such as the efficiency of 
reading or a significant increase in storage capacity. 
Hence, finding a scheme that combines the best locality 
and the efficiency of reading and storage is difficult. 
Our contribution 
 In this study, we propose a good locality secure 
searchable encryption based on the QR code technology 
and symmetric encryption method to solve the problem of 
searchable encryption in large datasets. Additionally, we 
use a new inverted index to improve locality that 
contributes to preventing leakage data and preserves the 
privacy data of 𝐷𝑊 . Provided with a comprehensive 
description of our scheme and a discussion of its resistance 
against several of the most famous attacks on SSE. We will 
briefly summarize our contributions as follows: 
• Optimization locality by 800 times for small 
databases that contain less than a thousand identifiers 
and by 400 times for large databases. Furthermore, we 
applied our proposed scheme in real-world data, and 
the experimental results denote that our work 
achieves the best results in performance, read 
efficacy, and resisting malicious attacks compared 
with previous works. 
 
• Our proposed scheme does not excessively affect 
storage and maintains storage without resulting in a 
large increase due to locality improvement. 
Furthermore, it maintains read efficiency at 𝑂 (1). 
 
Character Description 
𝑉 Number of users 
𝑈 𝑖 User, where 𝑖 ∈ 𝑉 
𝑤 𝑖 Word 
𝑚 Number of words 
𝑊 Words in 𝐷𝐵 , 𝑊 =
{𝑤 𝑖 , . . . , 𝑤 𝑚 } 
𝑖𝑑 Identifier 
𝑛 Total of identifiers 𝐷𝐵 
𝑛 𝑖 Total of identifiers 𝑤 𝑖 
𝑁 ∑ |𝐷𝐵 (𝑤 𝑖 )|
𝑖 =1
𝑚 where 
𝐷𝐵 (𝑤 𝑖 ) = {𝑖𝑑
1
, . . , 𝑖𝑑
𝑛 𝑖 } 
𝑃𝑅𝐹 Pseudo-random function 
𝐻 𝑇 Hash table representing data 
structures used to store and 
retrieve data and consist of 
a pair of algorithms Add 
and Get [6] 
𝐴𝑑𝑑 Algorithm adds pairs of 
(𝑘𝑒𝑦 , 𝑣𝑎𝑙𝑢𝑒 )  to 𝐻 𝑇 
𝐺𝑒𝑡 value=Get(key) 
𝑆 String 
Ŝ Encrypted string 
𝑙𝑎 Label is used to store and 
retrieve Ŝ in 𝐻 𝑇 , 
𝐴𝑑𝑑 (𝑙𝑎 , Ŝ), Ŝ = 𝐺𝑒𝑡 (𝑙𝑎 ) 
𝐸𝑛𝑐 Function to encryption 𝑆 
𝐷𝑒𝑐 Function to decryption Ŝ 
𝑘 1
 Derivative key to create   𝑙𝑎 
𝑘 2
 Derivative key to encrypted 
and decrypted 
𝑏 Block of word identifiers 
𝑄 _𝑖𝑚𝑔 QR code  image 
𝐶𝑟𝑒𝑎𝑡 _𝑞 Create QR code function 
𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑠 _𝑞 Convert string to  𝑄 _𝑖𝑚𝑔 
function 
𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 Convert 𝑄 _𝑖𝑚𝑔 to string 
function 
Ť Encryption 𝑇 
𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 A list used by the CS to store 
the values encoded in it to 
respond to the user query 
𝑟𝑒𝑎𝑑 Read 𝑄 _𝑖𝑚𝑔 function 
𝑘 𝑢 Key used to secure 𝑇 and 
𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 exchanged 
Table 1:  List of symbols 
A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 123 
 
 
between major 
components 𝐶𝑆 and 𝑈 𝑖 
𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 Encryption 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 
 
2 General background 
2.1 𝑺𝑺𝑬 algorithms 
• 𝑘 ← 𝐺𝑒𝑛 (1
λ
): Is a key generation algorithm that is 
run by the 𝐷𝑊 token as a security parameter 1
λ
 
(input) and a secret key 𝑘 used for 
encrypting/decrypting database 𝐷𝐵 . 
• 𝑆𝐼 ← 𝐸𝑛𝑐 (𝑘 , 𝐷𝐵 ): Is used for building a secure 
index file (𝑆𝐼 ) based on 𝑘 and 𝐷𝐵 . 
• 𝑇 ← 𝑇𝑟𝑝𝑑𝑟 (𝑘 , 𝑤 𝑖 ): This algorithm is more 
concerned with preventing the disclosure of 
information stored on the server as an encrypted list 
of keywords (𝑆𝐼 ). As a result, the server responds to 
the users’ request (𝑇 ) in a safe manner. 
• 𝑑 ← 𝑆 𝑒 𝑎𝑟𝑐 ℎ(𝑆𝐼 , 𝑇 ): Is a deterministic algorithm run 
by the server to search for the data d through a 
trapdoor 𝑇 in the secure index 𝑆𝐼 . If d is encrypted, 
then we will need a resolve algorithm. 
•  𝑅 ← 𝑅𝑒𝑠𝑜𝑙𝑣𝑒 (𝑘 , 𝑑 ): This algorithm is performed 
by the 𝐷𝑊 to recover an identifiers for the keyword. 
It takes a secret key 𝑘 and a data point d as inputs and 
outputs of the final result 𝑅 . 
2.2 Locality and read efficiency  
We must get acquainted with the concept of read patterns 
to understand locality in more detail. The search function 
for any 𝑆𝑆𝐸 scheme by the server 𝑑 ← 𝑆𝑒𝑎𝑟𝑐 ℎ(𝑆𝐼 , 𝑇 ) 
can be analyzed into a series of intervals 
 [𝑎 1
, 𝑏 1
]... [𝑎 𝑣 , 𝑏 𝑣 ], where the server starts from the first 
interval [𝑎 1
, 𝑏 1
] to the last interval [𝑎 𝑣 , 𝑏 𝑣 ] depending on 
𝑇 in 𝑆𝐼 . Moreover, we can express these intervals as a read 
pattern function 𝑅𝑑𝑃𝑎𝑡 (𝑆𝐼 , 𝑇 ). When only one interval is 
obtained, the scheme has the best possible locality [4], [6]. 
 
Deﬁnition (Locality 𝑳 ) An 𝑆𝑆𝐸 scheme 𝛱 has locality (𝐿 ) 
with each security parameter (1
λ
, 𝐷𝐵 , and 𝑤 𝑖 ∈ N). We 
use 𝑅𝑒𝑎𝑑𝑃𝑎𝑡 (𝑆𝐼 , 𝑇 ) consisting of 𝐿 intervals with 
probability 1, when 𝑇 and 𝑆𝐼 are computed as follows: 
 𝑘 ← 𝐺𝑒𝑛 (1
λ
), 𝑆𝐼 ← 𝐸𝑛𝑐 (𝑘 , 𝐷𝐵 )  and 𝑇 ←
 𝑇 𝑟𝑝𝑑𝑟 (𝑘 , 𝑤 𝑖 ). Furthermore, if 𝐿 = 1, the 𝑆𝑆𝐸 scheme 
(𝛱 ) has perfect locality. However, the perfect locality is 
insufficient because we can simply transform any scheme 
into a perfect locality by the poor read efficiency, which is 
defined as reading only the required data from the server 
for each query[4]. 
 
Deﬁnition  (Read Eﬃciency 𝑬 ) An 𝑆𝑆𝐸 scheme 𝛱 is 𝐸 -
read eﬃcient with each main parameters 1
λ
, DB, and 𝑤 𝑖 ∈
 N. We have 𝑅𝑒𝑎𝑑𝑃𝑎𝑡 (𝑆𝐼 , 𝑇 ) that consists of intervals of 
total length at most 𝐸 · |𝐵𝑖𝑛𝐸𝑛𝑐 (𝐷𝐵 (𝑤 𝑖 ))| bits, where 
𝐵𝑖𝑛𝐸𝑛𝑐 (𝐷𝐵 (𝑤 𝑖 )) represents binary coding of 𝐷𝐵 (𝑤 𝑖 ). 
The concatenation of all keywords’ identiﬁers signified as 
bit strings [4]. A perfect locality can also be obtained by 
violating storage efficiency by creating a perfect locality 
scheme but  with  excessive storage space overhead (the 
encrypted 𝐷𝐵 should not be much larger than the original 
𝐷𝐵 ).  
2.3 Related works 
In 2000, Song et al. defined SSE and provided efficient 
constructions [7]. These mechanisms securely store data 
on an untrusted server but cannot retrieve client's data. 
Many subsequent studies focused on this field, achieving 
SSE's basic principle. However, practical experiments 
with large databases revealed poor performance and 
degradation with increasing size. The reason is that they 
often suffer from a bottleneck problem [8]. The literature 
found that the bottleneck in these schemes was not caused 
by encryption but by the lower-level memory access 
issues in more specifically poor locality. The known 
constructions can be broadly categorized into two.  
     The first approach is characterized by  linear space and 
constant read efficiency but poor locality in [8], [9]. An 
array of size 𝑁 is allocated, and 𝑁 elements of the database 
are uniformly mapped into the array. To recover a list of 
documents that contain a given keyword, each document 
identifier is stored in the array together with a pointer to 
the next document identifier in the list. Additionally, the 
first approach requires the server to access random 
locations in the array with the number of identifiers that 
the word appears in. This case is inefficient because of 
poor performance resulting from moving to a large 
number of different locations.  
     The second approach has optimal read efficiency and 
locality but at the cost of substantial space overhead [10], 
[12]. The strategy behind this approach is to allocate a 
sufficiently large array and uniformly map the list of word 
identifiers into a contiguous interval in the array by the 
length of word identifiers, without any overlaps among 
different lists. To efficiently retrieve a list for a given 
keyword, the server needs to access only a single random 
location and read all consecutive identifier entries, thereby 
resulting in optimal read efficiency and locality. However, 
the locations of the lists in the array reveal information 
about the structure of the underlying database. Therefore, 
padding must be applied to conceal information about the 
lengths of the lists, leading to a polynomial space 
overhead. Hence, creating a scheme with the best locality, 
storage, and read efficiency is a challenge, as David Cash 
and  Tessaro in 2014 [4] proved that it is impossible for this 
to happen. They also set a lower bound on the tradeoff 
among these three criteria. In addition to their 
improvement on the locality by creating a scheme with a 
logarithmic locality (𝑙𝑜𝑔 𝑁 ), the storage space is not good 
enough 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ).  
     In 2016, Gilad Asharov et al. [2], in their third scheme, 
updated the locality of David cash and Tessaro scheme to 
become 𝑂 (1) with the same storage space.  
     In 2017, Demertzis and Papamanthou [5] created two 
schemes, the first with optimal locality and space 𝑂 (𝑁 𝑆 𝑙 ). 
𝑆 𝑙 is the number of levels used to store data, but there was 
124 Informatica 47 (2023) 121–132 A.A. Alyusif et al. 
an effect on read efficiency by a small percentage, and the 
storage space is still large. As the second scheme worked 
in the same storage space as the first scheme, it achieves 
to a tunable locality, which is chosen as a parameter by 
𝐷𝑊 during the setup phase. 
     In 2021, Asharov et al. [6] significantly strengthened 
the lower bound of Cash and Tessaro by creating two 
general frameworks, the first pad-and-split framework and 
the second statistical-independence framework. 
     From 2021 to 2023, various research studies in SSE 
across different domains have emerged, highlighting 
numerous benefits. However, all of them still lack good 
locality, as evidenced by references [14]–[19]. 
Table 2 displays previous works in terms of three 
critical features: locality, reading efficiency, and storage 
space. It is worth noting that none of the works satisfy all 
three characteristics simultaneously. Certain searches 
exhibit poor locality, such as [8], [9], [13], where a search 
word containing 2,000 identifiers 𝑛 𝑖 = 2000, would 
require the cloud server to traverse through 2,000 distinct 
positions to fulfill the user's request. 
As a result, the searchable symmetric encryption's overall 
performance is hampered. In certain prior research, the 
locality has been favorable, as in [2], [4]and [10]. 
In cases where the locality is 𝑂 (1), the cloud server can 
fulfill the user's request by moving to just one location. If 
𝑁 = 1200 and the locality is 𝑂 (𝑙𝑜𝑔 𝑁 ), it would be 10, 
implying that the cloud server would need to traverse 10 
distinct positions to fulfill the user's request. Despite the 
favorable locality in some prior research [2], [4] and [10], 
they encountered issues with large storage space. For 
instance, if 𝑁 = 1200 and the storage space is 
𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ), the storage space will expand tenfold 
compared to its original size, such as in [2] [4]. 
 
Additionally, in other cases, the word that belongs to 
the most extensive set of identifiers can impact the storage 
space where, the storage space O(( Max|DB(w
i
)|)𝑛 ) . If 
this word corresponds to all the identifiers in the database,   
that is, 𝑛 = 𝑛 𝑖 the size of the encrypted index can become 
extremely large, as in [10]. 
 
3 Discussion 
In this section, we will present our work's position in 
relation to locality, and storage space. By leveraging QR 
code technology, we have successfully improved the 
locality to 𝑂 (𝑛𝑞 ) by grouping a set of identifiers together 
in a single 𝑄 _𝑖𝑚𝑔 . This has led to a significant reduction 
in the number of cloud server readings required. This has 
the added benefit of reducing the number of times the user 
needs to perform decryption operations, resulting in a 
more efficient and streamlined experience. 
Regarding storage space, since  = ∑ |𝐷𝐵 (𝑤 𝑖 )|
𝑖 =1
𝑚 , 
according to the previous example, the value of 𝑁 =
 1700, and this is its value in general. As for our work, 𝑁 
depends mainly  ∑ 𝑛𝑞 for all words, and in previous 
example ∑ 𝑛𝑞 = 5. Each one of these five 𝑄 _𝑖𝑚𝑔  will be 
stored with a size that is similar to the storage size required 
to store 400 identifiers, this implies that N's value will be 
around 2000, which is in proximity to the desired value. 
 
 Related works Space Locality Read efficiency 
Curtmola et al. [13] 𝑂 (𝑁 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) 
Kamara et al. [9] 𝑂 (𝑁 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) 
David cash et al. [8] 𝑂 (𝑁 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) 
Chase and Kamara [10] O(( Max|DB(w
i
)|)𝑛 ) 𝑂 (1) 𝑂 (1) 
P. van Liesdonk et al.[11] 𝑂 (𝑚𝑛 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) 
Kamara and Papamanthou [12] 𝑂 (𝑚𝑛 ) 𝑂 (𝑛 𝑖 𝑙𝑜𝑔 𝑛 ) 𝑂 (𝑛 𝑙𝑜𝑔 𝑛 ) 
David cash et al. [4] 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ) 𝑂 (𝑙𝑜𝑔 𝑁 ) 𝑂 (1) 
Asharov et al. (Scheme 3) [2] 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ) 𝑂 (1) 𝑂 (1) 
Demertzis and Papamanthou [5] 𝑂 (𝑁 𝑆 𝑙 ) 𝑂 (𝐿 𝑑 ) 
Where 𝐿 𝑑 is a tunable 
locality 
𝑂 (
𝑁 1
𝑠 𝐿 ) 
Asharov et al. (Pad-and-split 
scheme) [6] 
𝑂 (𝑁 𝑙𝑜𝑔 𝑁 / 𝑙𝑜𝑔 𝐿 ) 
𝑂 (𝐿 𝑑 ) 
Where 𝐿 𝑑 depends on the 
scheme in which it is 
implemented within its 
framework 
𝑂 (1) 
Our work 𝑁 = ∑ (∑ 𝑆 𝑗 =1
𝑛𝑞
)
𝑖 =1
𝑚 
𝑂 (𝑛𝑞 ) 𝑂 (1) 
Table 2: Comparison with previous schemes based on space, locality, and read efficiency 
 
A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 125 
 
 
3.1 Primitive tools 
▪ QR codes  
A QR code is a two-dimensional code containing 
information in all directions, which is a digital image that 
can be easily scanned. Once scanned, it will quickly direct 
to the information embedded in the code, and the person 
who looks at the QR code is unable to identify it because 
its content is only machine-readable [20].  
     The QR code includes good storage capacity, fast 
readability, error correction, and support for more 
languages. It can hold 7,089 numeric characters and 4,296 
alphanumeric characters [21]. 
 
▪ Advanced Encryption Standard (𝑨𝑬𝑺 )  
The 𝐴𝐸𝑆 algorithm is a widely used and recognized 
symmetric block cipher used for encrypting and 
decrypting sensitive data. It is implemented in hardware 
and software and is considered highly secure, making it 
difficult for hackers to access the original data. There is 
currently no known method for breaking 𝐴𝐸𝑆 encryption. 
𝐴𝐸𝑆 offers the flexibility to use different key sizes, 
including 128, 192, and 256 bits, with a fixed block size 
of 128 bits.  
     AES is a block cipher algorithm that uses substitution 
and permutation network techniques to encrypt and 
decrypt data. It operates on a fixed plaintext block size of 
128 bits (16 bytes) represented as a 4×4 matrix. The 
number of rounds in AES depends on the key size, with 
10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 
14 rounds for 256-bit keys [22]. 
4 Proposed scheme 
In this section, we pay more attention to our proposed 
scheme, which consists of five phases as follows: key 
generation, setup, token generator, secure search, and 
resolve. There are three major components, 𝐷𝑊 , cloud 
server (𝐶𝑆 ), and user (𝑈 𝑖 ) to manage the environment of 
our work. Hence, 𝑖 ∈ 𝑉 , 𝑉 represents the number of users. 
The following construction explains the proposed scheme. 
 
4.1  Our scheme construction with more 
detail 
 
▪ Key generation phase  
In this phase, 𝐷𝑊 creates a secret key 𝑘 by PRF which it 
can be explained as follows: 
Pseudo-Random Functions (𝑃𝑅𝐹 ): A PRF function is 
𝐹 : {0,1}
∗
 𝑥 {0,1}
∗
 → {0,1}
∗
. Thus, it receives two inputs 
a key and input. Therefore, 𝐹 cannot be distinguished from 
a truly random function only with negligible probability in 
1
𝜆 , denoted as 𝑛𝑒𝑔𝑙 (1
𝜆 ) [23]. It is to be used in setup and 
CONSTRUCTION 1. (QR code based scheme) 
 Let 𝐷𝐵 = {𝐷𝐵 (𝑤 1
), . . , 𝐷𝐵 (𝑤 𝑚 )},  𝑊 =
Words in 𝐷𝐵 , 𝑊 = {𝑤 𝑖 , . . . , 𝑤 𝑚 } 
 For 𝑤 𝑖 ∈ 𝑊 let 𝐷𝐵 (𝑤 𝑖 ) = {𝑖𝑑
1
, . . , 𝑖𝑑
𝑛 𝑖 }. 
Key generation phase 𝒌 &  𝒌 𝒖 ← 𝑮𝒆𝒏 (𝟏 𝝀 ): 
1. Input Security parameter 1
𝜆 
2. Output 𝑘  and  𝑘 𝑢 , where 𝑘 &  𝑘 𝑢 𝜖 𝑍 
3. Compute both keys based on 𝑃𝑅𝐹 
Setup and Secure phase 𝑺𝑰 ← 𝑬𝒏𝒄 (𝒌 , 𝑫𝑩 ): 
1. Input 𝑘 and 𝐷𝐵 
2. Output 𝑆𝐼 = 𝐻 𝑇 
3. 𝑏𝑠 = 𝐹𝑏𝑠 (𝑛 ) 
4. Initialize empty  𝐻 𝑇 
5. For every 𝑤 𝑖 𝜖 𝑊 
 𝑛𝑞 = 𝑛 𝑖 /𝑏𝑠 
 Compute 𝑘 1
= 𝑃𝑅𝐹 𝑘 (1 ‖  𝑤 𝑖 ) and 
 𝑘 2
= 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) 
 Initialize counter 𝑖 = 0 
 For every 𝑏 𝜖 𝑛𝑞 
 𝑄 _𝑖𝑚𝑔 = 𝐶𝑟𝑒𝑎𝑡𝑒 _𝑞 (𝑏 ) 
 𝑆 = 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 (𝑄 _𝑖𝑚𝑔 ) 
 Delete 𝑄 _𝑖𝑚𝑔 
 Ŝ = 𝐸𝑛𝑐 𝑘 2
(𝑆 ) by AES256 
 Compute 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1
 (𝑖 ) 
 Add (𝑙𝑎 , Ŝ) to 𝐻 𝑇 
 𝑖 = 𝑖 + 1 
6. Upload 𝐻 𝑇 to 𝐶𝑆 
Token generator phase Ť ← 𝑻𝒓𝒑𝒅𝒓 (𝒌 𝒖 , 𝒌 , 𝒘 𝒊 ): 
1. Input 𝑘 𝑢 , 𝑘 and 𝑤 𝑖 
2. Output Ť 
3. Compute 𝑇 = 𝑃𝑅𝐹 𝑘 (1 ‖  𝑤 𝑖 ) = 𝑘 1
 
4. Ť = 𝐸𝑛𝑐 𝑘 𝑢 ( 𝑇 )  
5. Send Ť to 𝐶𝑆 
Secure Search phase 𝑬 _𝑶𝒖𝒕𝒑𝒖 𝒕 𝒍𝒊𝒔𝒕 ←
𝑺𝒆𝒂𝒓𝒄𝒉 (𝒌 𝒖 , Ť , 𝑺𝑰 ):  
1. Input 𝑘 𝑢 , Ť  and 𝑆𝐼 
2. Output 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 
3. Initialize counter 𝑖 = 0 and 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 
4. 𝑇 = 𝐷𝑒𝑐 𝑘 𝑢 ( Ť) = 𝑘 1
 
5. While true 
 Compute 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1
 (𝑖 ) 
 Ŝ = 𝐺𝑒𝑡 (𝑙𝑎 ) 
 Add(Ŝ) to 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡  
 𝑖 = 𝑖 + 1 
End While 
6. 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐸𝑛𝑐 𝑘 𝑢 (𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) 
7. Send to 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 𝑈 𝑖 
 
 
126 Informatica 47 (2023) 121–132 A.A. Alyusif et al. 
secure phase 𝑆𝐼 ← 𝐸𝑛𝑐 (𝑘 , 𝐷𝐵 ) and send it to all trusted 
users. 
 𝑈 𝑖 should use secret key 𝑘 in the next phases, such as 
token generator phase Ť ← 𝑇𝑟𝑝𝑑𝑟 (𝑘 𝑢 , 𝑘 , 𝑤 𝑖 ) and resolve 
phase  𝑅 ← 𝑅𝑒𝑠𝑜𝑙𝑣𝑒 ( 𝑘 𝑢 , 𝑘 , 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ). In addition, 
a second key was created in the same way, called  𝑘 𝑢 , and 
it is used to secure 𝑇 and 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 exchanged between 
major components 𝐶𝑆 and 𝑈 𝑖 . 
 
▪ Setup and secure phase  
The 𝐷𝑊 configures secure index 𝑆𝐼 ,  which will be 
uploaded to 𝐶𝑆 after completing this phase,  where it uses 
𝑘 and 𝐷𝐵 as main inputs and computes SI as output. The 
following steps explain the mechanism working of the 
current phase: 
• Configure empty hash table (𝐻 𝑇 ) that will be 
depended on to save secure parameters and then used 
as a secure index file (𝑆𝐼 ) for retrieving and 
processing the users’ requests. 
• Determine the size of the block 𝑏𝑠 = Fbs (n) where 
𝑛 represents the number of identifiers of the database 
(𝐷𝐵 ) and 𝑏𝑠 is the number of identifiers that can be 
stored together as a single block allowed to 
create 𝑄 _𝑖𝑚𝑔 . From experimental results of our 
work, we discover a significant relationship between 
the number of identifiers of database 𝑛 and the size of 
𝑄 _𝑖𝑚𝑔 . We found that the 𝑄 _𝑖𝑚𝑔 has the flexibility 
to save maximum numbers of identifiers, whereas the 
value of 𝑛 is low based on the number of 𝑛 ’s ranking) 
• For every keyword taken from 𝑊 where 𝑤 𝑖 𝜖 𝑊 , we 
carried out several successive steps as follows: 
 
− Compute the numbers of 𝑄 _𝑖𝑚𝑔 𝑛𝑞 that 𝑤 𝑖 will 
be needed by 𝑛 𝑖 /𝑏𝑠 , it is necessary to know the 
number of 𝑄 _𝑖𝑚𝑔 for each word based on the 
number of identifiers 𝑛 𝑖 of words after dividing it 
by 𝑏𝑠 . 
− Derivation of two keys from 𝑤 𝑖 . The first one 𝑘 1
 
is computed using 𝑃𝑅𝐹 , which takes 1 and 𝑤 𝑖 as 
its inputs as follows: 𝑘 1
= 𝑃𝑅𝐹 𝑘 (1 ‖  𝑤 𝑖 ). The 
label 𝑙𝑎 has been created by 𝑘 1
. The second key  
 𝑘 2
 creates in the same way 𝑘 2
= 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) . 
− Initialize counter 𝑖 = 0. 
− For every block 𝑏 𝜖 𝑛𝑞 do the following: 
− Convert this 𝑏 to QR code image 𝑄 _𝑖𝑚𝑔 
using the 𝐶𝑟𝑒𝑎𝑡𝑒 _𝑞 (𝑏 ) function, which 
receives 𝑏 as input and converts 𝑏 to 
𝑄 _𝑖𝑚𝑔 . 
− Convert the 𝑄 _𝑖𝑚𝑔 to a string 𝑆 by 
using 𝑆 = 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 (𝑄 _𝑖𝑚𝑔 ) 
function. 
− Delete 𝑄 _𝑖𝑚𝑔 
− Encrypt  𝑆  using Ŝ = 𝐸𝑛𝑐 𝑘 2
(𝑆 ), Ŝ 
represents an encrypted string. 
− Derives 𝑙𝑎 from  𝑤 𝑖 by 𝑙𝑎 =
 𝑃𝑅𝐹 𝑘 1
 (𝑖 ), which receives 𝑘 1
 and 𝑖 as 
input to get a different 𝑙𝑎 each time. 
− Add the 𝑙𝑎 and Ŝ pair (𝑙𝑎 , Ŝ) to the 
𝐻 𝑇 as a (key, value). 
− Increase the value of 𝑖 . 
 
 
▪ Trapdoor phase  
The current phase is implemented by 𝑈 𝑖 to search for a 
specific word 𝑤 𝑖 , the 𝑇𝑟𝑝𝑑𝑟 receives 𝑤 𝑖 , 𝑘 𝑢 and 𝑘 as 
input parameters and gives Ť as an output, which will 
compute 𝑇 and then secure it using 𝑘 𝑢 , Ť = 𝐸𝑛𝑐 𝑘 𝑢 (𝑇 ) to 
send it to 𝐶𝑆 more securely. 
 
 
 
 
Resolve phase 𝑹 ←
 𝑹𝒆𝒔𝒐𝒍𝒗𝒆 ( 𝒌 𝒖 , 𝒌 , 𝑬 _𝑶𝒖𝒕𝒑𝒖 𝒕 𝒍𝒊𝒔𝒕 ): 
1. 1.Input 𝑘 , 𝑘 𝑢 and 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 
2. 2.Output 𝑅 = 𝑖𝑑𝑒𝑛𝑡𝑖𝑓𝑖𝑒𝑟𝑠 
3. 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐷𝑒𝑐 𝑘 𝑢 (𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) 
4. Restore  𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) = 𝑘 2
 
5. For every Ŝ 𝜖 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 
 𝑆 = 𝐷𝑒𝑐 𝑘 2
(Ŝ) 
 𝑄 _𝑖𝑚𝑔 = 𝑐𝑜𝑛𝑣𝑒𝑟𝑡 _𝑠 _𝑞 (𝑆 ) 
 𝑅 = 𝑟𝑒𝑎𝑑 (𝑄 _𝑖𝑚𝑔 ) 
 Delete 𝑄 _𝑖𝑚𝑔 
 END CONSTRUCTION  
 
Figure 1: Key generation phase and setup and 
secure phase. 
A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 127 
 
 
▪ Secure search phase 
𝐶𝑆 responds to a user’s query and returns an encrypted list 
of secure values (𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) in 𝑆𝐼 . CS performs the 
following steps: 
− 𝑇 = 𝐷𝑒𝑐 𝑘 𝑢 ( Ť) = 𝑘 1
 
− Initialize counter 𝑖 = 0 and empty 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = [].  
While True 
         Let 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1
 (𝑖 ) 
          Retrieve 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 [𝑖 ] = Ŝ = 𝐺𝑒𝑡 (𝑙𝑎 ) 
         𝑖 = 𝑖 + 1 
      End 
−  𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐸𝑛𝑐 𝑘 𝑢 (𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) 
− CS resubmits 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 to 𝑈 𝑖 . 
 
▪ Resolve phase   
 𝑈 𝑖  receives 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 from CS and performs the following: 
− 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐷𝑒𝑐 𝑘 𝑢 (𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) 
− 𝑘 2
= 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 )  
− For every Ŝ 𝜖  𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 
− 𝑆 = 𝐷𝑒𝑐 𝑘 2
(Ŝ) 
− Convert 𝑆 to 𝑄 _𝑖𝑚𝑔 using the 
𝑐𝑜𝑛𝑣𝑒𝑟𝑡 _𝑠 _𝑞 (𝑆 ) function. 
− Read the 𝑄 _𝑖𝑚𝑔 with the read function and 
get the identifiers stored in 𝑄 _𝑖𝑚𝑔 . 
5 Security analysis 
5.1 Server information leakage  
Leakage functions Ł, the amount of information the server 
can learn about the stored data. It can be divided into three 
types, the first two are  Ł
𝑚𝑎𝑥
 and Ł
𝑚𝑖𝑛 associated by 
single-round interaction SSE, and third one is Ł
𝑠𝑖𝑧𝑒 , which 
is in more than one round of 𝑆𝑆𝐸 interaction. Then, all 
types take 𝐷𝐵 and 𝑊 as input parameters [4], [6]. 
 Ł
𝑚𝑎𝑥
 (𝐷𝐵 , 𝑊 ): output 
(𝑁 , {𝐷𝐵 (𝑤 𝑖 )}
𝑤 𝑖 ∈𝑊 , 𝑚 , 𝑛 , 𝑛 𝑖 , Max(𝐷𝐵 (𝑤 𝑖 ))
𝑤 𝑖 ∈𝑊 ). It 
reflects the maximum cases of leakage. 
Ł
𝑚𝑖𝑛 (𝐷𝐵 , 𝑊 ): output (𝑁 , {𝐷𝐵 (𝑤 𝑖 )}
𝑤 𝑖 ∈𝑊 ). It is 
considered a somewhat acceptable leak. The server must 
know 𝑁 in all cases. In a single-round interaction, the 
server will be able to view the query results because it can 
decrypt the result. 
Ł
𝑠𝑖𝑧𝑒 (𝐷𝐵 , 𝑊 ): output (𝑁 , {|𝐷𝐵 (𝑤 𝑖 )|}
𝑤 𝑖 ∈𝑊 ) in more than 
one round of interaction where the server cannot decrypt 
data.  Thus, when the response to the user’s request is 
encrypted, the server can learn only the output size, which 
is one of the best cases of leakage. In addition, detecting 
the size does not mean that the server has the ability to 
reveal the main information about our proposed scheme. 
Therefore, our work is resisting all types of leakage. 
In our work, we go into deep details for security 
analysis. We noticed from practical experiments of our 
scheme that the shape of the 𝑄 _𝑖𝑚𝑔 is simple and less 
complicated if the number of word identifiers in the input 
𝑏 of the function is small 𝑄 _𝑖𝑚𝑔 = 𝐶𝑟𝑒𝑎𝑡𝑒 _𝑞 (𝑏 ). This 
case in turn will affect the next step, that is, the process of 
converting this 𝑄 _𝑖𝑚𝑔 into a string 𝑆 =
 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 (𝑄 _𝑖𝑚𝑔 ), which leads to a simple and less 
complex 𝑆 . Its size differs from the size of another 𝑆 
coming from a block full of identifiers. This difference in 
size is considered a security vulnerability. However, we 
made sure to address it by making all strings have very 
close sizes. By increasing the complexity of the 𝑄 _𝑖𝑚𝑔 if 
the number of block identifiers is small, this 𝑄 _𝑖𝑚𝑔 will 
lead to a larger string size. Thus, the sum of 𝑆 does not 
reflect the real size of the database because they are all 
close in size, even if they are the result of preparing 
different identifiers. In addition, our work is more than one 
round of interaction. Furthermore, the server responds to 
the user’s request without decrypting the data. From this 
standpoint, we can say that our construction has only Ł
𝑠𝑖𝑧 𝑒 
leakage. 
5.2 Resisting attacks  
In this part, we discussed the resistance of our scheme, the 
most famous types of attacks on 𝑆𝑆𝐸 .  
 
▪ Frequency analysis attack  
A frequency analysis attack is one of the known ciphertext 
attacks. It is built on studying the frequency of letters or 
collections of letters in a ciphertext [24]. Hence, it takes 
advantage of the frequency of encrypted data uploaded to 
𝑆𝐼 , which is either term frequency 𝑇𝐹 or term frequency-
inverse 𝑇𝐹 _𝐼𝐷𝐹 . 𝑇𝐹 is defined as the number of times 
𝑤 𝑖 looks in a document 𝑖𝑑 , and TF-IDF is the 
multiplication of term frequency 𝑇𝐹 and 𝐼𝐷𝐹 values. 𝐼𝐷𝐹 
is defined as a value of 𝑤 𝑖 that could be obtained by 
Fig. 2: Trapdoor, secure search and resolve 
phases. 
128 Informatica 47 (2023) 121–132 A.A. Alyusif et al. 
dividing 𝑛  by the document frequency number of 
documents in which such a word appeared. If the 𝐶𝑆 can 
access this important information, then it can carry out this 
attack and know the keyword that is being searched for. 
      Based on the above, we can say that our work is safe 
against the current attack because the values stored in 𝐶𝑆 
are encrypted and do not reflect the identifiers originally, 
but rather an obscure text that helps the 𝑈 𝑖 to access 
identifiers later.  
 
▪ 𝑰 𝑲𝑲 attack  
IKK attack uses the disclosed partly information to find 
out what plaintext words the 𝑈 𝑖 searched trapdoors [25]. 
Hence, this attack relies mainly on the leaking of the 
access pattern information, which is defined as the result 
of the search for 𝑇 in 𝑆𝐼 by 𝐶𝑆 .  
     For example, suppose our 𝐷𝐵 is about computer 
science, and the user requests three queries as trapdoors: 
𝑇 1
, 𝑇 2 
, and 𝑇 3
, which represent the words “network,” 
“computer,” and “information,” respectively. After 
completing the communication between 𝑈 𝑖 and 𝐶𝑆 , the 𝐶𝑆 
will look at the obtained results, which is the set of 
identifiers corresponding to the trapdoors. The 𝐶 𝑆 can 
then compute the probability of appearance of any two of 
these keywords in any document by observing the number 
of the same documents reverted for those corresponding 
trapdoors. By continuing the search and leaking the access 
pattern, thereby obtaining more probabilities, the server 
will be able to access the keywords that correspond to the 
trapdoors [24]. After a modest clarification of this attack, 
we can say that our scheme resists the  IKK attack because 
the search result by the CS is encrypted, and the access 
pattern does not leak any important information. That is, 
the CS will not be able to access the identifiers 
{𝑖𝑑
1
, . . , 𝑖𝑑
𝑛 𝑖 } corresponding to the trapdoors [25]. 
 
▪ Keyword guessing attack (𝐊𝐆𝐀 )  
KGA attack is an attack on the encrypted index stored in 
the server, where the attacker tries to guess the keyword 
that is being searched to use it in the search later and find 
its identifiers [26]. This attack is resisted by various 
precautionary measures, such as encrypting the words  
themselves and keeping the key secret and secure. Both 
have been worked out in our scheme. Hence, we can say 
that our work resists KGA attacks. 
 
▪ Man-in-the-middle attack 𝑴𝑰𝑻𝑴  
This type of attack occurs when the communication 
channel between the 𝑈 𝑖 and the 𝐶𝑆 is not secured, as the 
attacker will occupy the identity of one of the parties 
(𝑈 𝑖 , 𝐶𝑆 ) [27]. We took the resistance to this attack into 
account in our work, where the communication channel 
was secured by encrypting the exchanged data (𝑇 and 
𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) between the 𝐶𝑆 and 𝑈 𝑖 based on 𝑘 𝑢 key, in 
addition to mutual authentication between the two parties 
with the same key as the following construction: 
6 Experimental results   
In this section, we evaluated the performance of our 
proposed scheme using a real-data collection of Wikipedia 
articles. We selected a total of 250 articles, along with 
their eight corresponding historical versions, resulting in a 
collection of 2000 files. Our collection contained a total of 
117000 keywords. As the total number of identifiers 𝑛 for 
𝐷𝐵 is 2000, the absorption of the QR code will be 
approximately 400 identifiers. In addition, the 
probabilities of nq values for each word will be from 1 to 
5, which means that the maximum locality will be only 5. 
The experiments were conducted on a PC with a 2.6 GH 
Intel Core i5 CPU and 8 GB of RAM running on 64-bit 
Windows 10. The code was implemented in python 
because of its many known features, and it supports the 
creation of QR codes technologies. 
 
▪ Retrieval time  
We conducted an experiment to find out the time taken to 
retrieve identifiers for three words that differ in the 
number of their identifiers 𝑛 𝑖 
and the number of QR code 
𝑛𝑞 . Moreover, the first word 𝑤 1
 contains 340 identifiers 
𝑛𝑞 = 1, and the second word 𝑤 2
 contains 1172 identifiers 
1172/400 = 2.93. That is, 3 𝑛𝑞 , and the third 𝑤 3
 appears 
in all files 2000/400 = 5 𝑛𝑞 . Evidently, the retrieval time 
grows linearly with the increase in 𝑛𝑞 values. We notice 
that the retrieval time is the time of each of the secure 
search and resolve phases. 
 
CONSTRUCTION 2. 
In Trapdoor Phase: 
Choose a random identifier 𝑢 𝐼𝐷
 by 𝑈 𝑖 
ℎ𝑢 𝐼𝐷
= 𝐻 𝐾 𝑢 (𝑢 𝐼𝐷
) by MD5 
Send (ℎ𝑢 𝐼𝐷
 ||  𝑢 𝐼𝐷
|| Ť) to 𝐶𝑆 
In Secure Search Phase: 
Choose a random identifier 𝑐𝑠
𝐼𝐷
 by 𝐶𝑆 
If ℎ𝑢 𝐼𝐷
=𝑣𝑟𝑓𝑦 𝐾 𝑢 (𝑢 𝐼𝐷
) → 𝑈 𝑖 authentication 
Before send 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡  
ℎ𝑐𝑠
𝐼𝐷
= 𝐻 𝐾 𝑢 (𝑐𝑠
𝐼𝐷
) by MD5 
send (ℎ𝑐𝑠
𝐼𝐷
 ||  𝑐𝑠
𝐼𝐷
 || 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) to 𝑈 𝑖 
In Resolve Phase: 
If ℎ𝑐𝑠
𝐼𝐷
=𝑣𝑟𝑓𝑦 𝐾 𝑢 (𝑐𝑠
𝐼𝐷
) → 𝐶𝑆 authentication 
Where 𝐻 is hash function, 𝑣𝑟𝑓𝑦 is verification 
function, ℎ𝑢 𝐼𝐷
 represent user hash value and ℎ𝑐𝑠
𝐼𝐷
 
represent cloud server hash value 
A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 129 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
▪ Comparison with previous schemes  
To ensure the efficiency of our scheme, we implemented 
some of the previous schemes as a first step. These 
schemes are Cash et al. [28], Cash and Tessaro [4], 
Asharov et al. (Scheme 3) [2], and Demertzis and 
Papamanthou [5]. As a second step, we perform the same 
experiment on our scheme and these four schemes on the 
same database based on the AES256 encryption 
algorithm, which are the same words to know the time 
required to secure the search of the word identifiers. We 
chose two words: the first word 𝑤 1
 with the largest 
number of identifiers and that appears in all database 
identifiers, that is, appears in 2000 files, and the second 
𝑤 2
 appears in 1000 files. For the comparison to be fair and 
as we used more than one round of interaction in our work 
to increase security, the search time was very short 
because 𝐶𝑆 does not decrypt and process the data. Hence, 
we calculated the retrieval time for our work  to find out 
the total processing time to obtain the identifiers. 
 
 
 
 
 
 
 
Figure 3: The retrieval time for three words. 
Figure 4: Comparison with previous schemes. 
 
130 Informatica 47 (2023) 121–132 A.A. Alyusif et al. 
7 Conclusions 
In this study, we discuss the problem of slow retrieval of 
encrypted data owing to poor locality and present a new 
scheme that helps solve this problem by adding the QR 
code technique to the inverted index. This case mainly 
improves the locality. Finally, our scheme has less leakage 
and high security and is resistant to most common SSE 
attacks. Additionally, we used real-world data in our 
experiments and demonstrated that our proposed scheme 
is secure and practically efficient and holds high accuracy. 
References 
 
[1] G. Sen Poh, J.-J. Chin, W.-C. Yau, K.-K. R. Choo, 
and M. S. Mohamad, “Searchable symmetric 
encryption: designs and challenges,” ACM 
Comput. Surv., vol. 50, no. 3, pp. 1–37, 2017, doi: 
{10.1145/3064005}. 
[2] G. Asharov, M. Naor, G. Segev, and I. Shahaf, 
“Searchable symmetric encryption: optimal 
locality in linear space via two-dimensional 
balanced allocations,” in Proceedings of the forty-
eighth annual ACM symposium on Theory of 
Computing, 2016, pp. 1101–1114. doi: 
{10.1145/2897518.2897562}. 
[3] E. Damiani, S. D. C. Di Vimercati, S. Foresti, S. 
Jajodia, S. Paraboschi, and P. Samarati, “Key 
management for multi-user encrypted databases,” 
in Proceedings of the 2005 ACM workshop on 
Storage security and survivability, 2005, pp. 74–
83. doi: {10.1145/1103780.1103792}. 
[4] D. Cash and S. Tessaro, “The locality of 
searchable symmetric encryption,” in Annual 
international conference on the theory and 
applications of cryptographic techniques, 2014, 
pp. 351–368. 
[5] I. Demertzis and C. Papamanthou, “Fast 
searchable encryption with tunable locality,” in 
Proceedings of the 2017 ACM International 
Conference on Management of Data, 2017, pp. 
1053–1067. doi: {10.1145/3035918.3064057}. 
[6] G. Asharov, G. Segev, and I. Shahaf, “Tight 
tradeoffs in searchable symmetric encryption,” J. 
Cryptol., vol. 34, no. 2, pp. 1–37, 2021, doi: 
https://doi.org/10.1007/s00145-020-09370-z. 
[7] D. X. Song, D. Wagner, and A. Perrig, “Practical 
techniques for searches on encrypted data,” in 
Proceeding 2000 IEEE symposium on security 
and privacy. S&P 2000, 2000, pp. 44–55. doi: 
10.1109/SECPRI.2000.848445. 
[8] D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. 
Roşu, and M. Steiner, “Highly-scalable 
searchable symmetric encryption with support for 
boolean queries,” in Annual cryptology 
conference, 2013, pp. 353–373. 
[9] S. Kamara, C. Papamanthou, and T. Roeder, 
“Dynamic searchable symmetric encryption,” in 
Proceedings of the 2012 ACM conference on 
Computer and communications security, 2012, 
pp. 965–976. doi: {10.1145/2382196.2382298}. 
[10] M. Chase and S. Kamara, “Structured encryption 
and controlled disclosure,” in International 
conference on the theory and application of 
cryptology and information security, 2010, pp. 
577–594. doi: https://doi.org/10.1007/978-3-642-
17373-8_33. 
[11] P. Van Liesdonk, S. Sedghi, J. Doumen, P. Hartel, 
and W. Jonker, “Computationally efficient 
searchable symmetric encryption,” in Workshop 
on Secure Data Management, 2010, pp. 87–100. 
doi: https://doi.org/10.1007/978-3-642-15546-
8_7. 
[12] S. Kamara and C. Papamanthou, “Parallel and 
dynamic searchable symmetric encryption,” in 
International conference on financial 
cryptography and data security, 2013, pp. 258–
274. 
[13] R. Curtmola, J. Garay, S. Kamara, and R. 
Ostrovsky, “Searchable symmetric encryption: 
improved definitions and efficient constructions,” 
in Proceedings of the 13th ACM conference on 
Computer and communications security, 2006, 
pp. 79–88. doi: {10.1145/1180405.1180417}. 
[14] H. M. Mohammed and A. I. Abdulsada, “Secure 
Multi-keyword Similarity Search Over Encrypted 
Data With Security Improvement.,” Iraqi J. 
Electr. Electron. Eng., vol. 17, no. 2, 2021. 
[15] H. M. Mohammed and A. I. Abdulsada, “Multi-
keyword search over encrypted data with security 
proof,” J. Basrah Res., vol. 47, no. 1, 2021. 
[16] C. Guo, W. Li, X. Tang, K.-K. R. Choo, and Y. 
Liu, “Forward Private Verifiable Dynamic 
Searchable Symmetric Encryption with Efficient 
Conjunctive Query,” IEEE Trans. Dependable 
Secur. Comput., 2023, doi: 
10.1109/TDSC.2023.3262060. 
[17] Z. A. Abduljabbar, A. Ibrahim, M. A. Al Sibahee, 
S. Lu, and S. M. Umran, “Lightweight Privacy-
Preserving Similar Documents Retrieval over 
Encrypted Data,” in 2021 IEEE 45th Annual 
Computers, Software, and Applications 
Conference (COMPSAC), 2021, pp. 1397–1398. 
doi: 10.1109/COMPSAC51774.2021.00202. 
[18] M. A. Al Sibahee, A. I. Abdulsada, Z. A. 
Abduljabbar, J. Ma, V. O. Nyangaresi, and S. M. 
A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 131 
 
 
Umran, “Lightweight, Secure, Similar-Document 
Retrieval over Encrypted Data,” Appl. Sci., vol. 
11, no. 24, p. 12040, 2021, doi: 
https://doi.org/10.3390/app112412040. 
[19] M. A. Hussain et al., “Provably throttling SQLI 
using an enciphering query and secure matching,” 
Egypt. Informatics J., vol. 23, no. 4, pp. 145–162, 
2022, doi: 
https://doi.org/10.1016/j.eij.2022.10.001. 
[20] S. Singh, “QR code analysis,” Int. J. Adv. Res. 
Comput. Sci. Softw. Eng., vol. 6, no. 5, 2016. 
[21] A. S. Narayanan, “QR codes and security 
solutions,” Int. J. Comput. Sci. Telecommun., vol. 
3, no. 7, pp. 69–72, 2012. 
[22] A. M. Abdullah, “Advanced encryption standard 
(AES) algorithm to encrypt and decrypt data,” 
Cryptogr. Netw. Secur., vol. 16, pp. 1–11, 2017. 
[23] J. Katz and Y. Lindell, Introduction to modern 
cryptography. CRC press, 2020. 
[24] D. V. N. Siva Kumar and P. Santhi Thilagam, 
“Searchable encryption approaches: attacks and 
challenges,” Knowl. Inf. Syst., vol. 61, no. 3, pp. 
1179–1207, 2019. 
[25] D. Cash, P. Grubbs, J. Perry, and T. Ristenpart, 
“Leakage-abuse attacks against searchable 
encryption,” in Proceedings of the 22nd ACM 
SIGSAC conference on computer and 
communications security, 2015, pp. 668–679. doi: 
{10.1145/2810103.2813700}. 
[26] Y. Miao, Q. Tong, R. H. Deng, K.-K. R. Choo, X. 
Liu, and H. Li, “Verifiable searchable encryption 
framework against insider keyword-guessing 
attack in cloud storage,” IEEE Trans. Cloud 
Comput., vol. 10, no. 2, pp. 835–848, 2020, doi: 
10.1109/TCC.2020.2989296. 
[27] S. Gangan, “A review of man-in-the-middle 
attacks,” arXiv Prepr. arXiv1504.02115, 2015, 
doi: https://doi.org/10.48550/arXiv.1504.02115. 
[28] D. Cash et al., “Dynamic searchable encryption in 
very-large databases: Data structures and 
implementation,” Cryptol. ePrint Arch., 2014, 
doi: 10.14722/ndss.2014.23264. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
132 Informatica 47 (2023) 121–132 A.A. Alyusif et al.