https://doi.org/10.31449/inf.v47i7.4745 Informatica 47 (2023) 121–132 121 Locality Improvement Scheme Based on QR Code Technique Within Inverted Index Aya A. Alyousif * , Ali A. Yassin E-mail: pgs.aya.alyousif@uobasrah.edu.iq, ali.yassin@uobasrah.edu.iq Department of Computer Science, Education College for Pure Sciences, University of Basrah, Basrah, Iraq * Corresponding author Keywords: advanced encryption standard, locality, qr code, searchable encryption Received: March 16, 2023 Searchable symmetric encryption is one of the most important modern technologies that allow the owner to store private data on an unreliable server and search for the data securely while preserving the data’s confidentiality and privacy. This field has several schemes, but these schemes suffered from slower data retrieval in the case of large database sizes owing to the poor locality. Hence, the server visits several memory locations for a single query. Other studies focused on improving the locality, but the result is either increased storage capacity or decreased efficiency of data reading. In the present study, we present a secure, searchable scheme that overcomes the abovementioned issues and works to improve the locality by exploiting the QR code technique and the Advanced Encryption Standard algorithm. Furthermore, our work maintains read efficiency, helps reduce the risk of data breaches, and protects sensitive information from being accessed by unauthorized individuals. Moreover, the proposed scheme can resist cyber security attacks, such as frequency analysis attacks and keyword guessing attacks. Additionally, we used real-world data in our experiments and demonstrated that our proposed scheme is secure and practically efficient and holds high accuracy. Povzetek: Predstavljana je varnostna iskalna shema za izboljšanje šifriranja, ki izboljšuje lokalnost, ohranja učinkovitost branja in je odporna na kibernetske napade. 1 Introduction Cloud storage outsourcing is a service in which a company outsources the storage of its data to a cloud storage provider. Cloud storage outsourcing can provide several benefits, including cost savings, increased scalability and flexibility, and improved data security and availability. Cloud storage providers typically have strict security measures to protect customer data, including measures such as encryption, secure data centers, and access controls. However, the data owner (𝐷𝑊 ) or company is still responsible for ensuring that its sensitive data are protected and in compliance with relevant regulations or industry standards. Searchable symmetric encryption (𝑆𝑆𝐸 ) is one of the most important operations that can be applied on sensitive data in the outsourcing cloud [1]. Using searchable encryption to store sensitive data in the cloud can help to protect the data from unauthorized access while still allowing it to be searched and used in a controlled manner. However, carefully evaluating the security and performance tradeoffs of different searchable encryption approaches and ensuring that the chosen approach meets the 𝐷𝑊 ’s needs and security requirements are essential. In more detail, 𝑆𝑆𝐸 is done by the 𝐷𝑊 as a first step to encrypt the data using a secret key, and then, it is sent to the server. After, the 𝐷𝑊 creates a secure index based on its own database. The encrypted data and a secure index are sent to the cloud storage server, which could be the same untrusted server or another server chosen by the 𝐷𝑊 as a third party. The 𝐷𝑊 generates a search token used to retrieve information based on a secure index file to perform a secure search. SSE schemes in the literature review have several syntaxes, which are divided into two types. Each type relies on the interaction between the server and the 𝐷𝑊 in each request query about data. The first type is a single- round interaction where the server also decrypts the data and sends the result to the 𝐷𝑊 (therefore, the server learns the output). Additionally, the second type uses more than one round of interaction (where the server learns no information about the output) [2]. We use the second type in our scheme for the security of the exchanged data over the communication channel between server and client. Therefore, the server cannot decrypt the data and preserve the privacy of the client’s data. 𝑆𝑆𝐸 schemes face several challenges as follows: • Key management In the 𝑆𝑆𝐸 scheme, the same secret key is used for encrypting and decrypting the data. This case implies that the key must be safely distributed to all parties who need to be able to look for the encrypted data. This can pose a 122 Informatica 47 (2023) 121–132 A.A. Alyusif et al. difficulty, particularly in large organizations with many users [3]. • Security Although symmetric searchable encryption schemes are generally secure, they are vulnerable to key compromise. If the secret key is compromised, an attacker could decrypt the encrypted data and potentially perform unauthorized searches. • Key size To ensure the security of the encrypted data, the secret key used in a symmetric searchable encryption scheme must be long enough to withstand brute-force attacks. This case can increase the size of the key, which can be a problem in systems with limited storage or bandwidth. • Performance Symmetric searchable encryption schemes can be slower than other encryption schemes because they require the use of a single shared key for encryption and decryption. This case can be a problem when searching large datasets as the search may take a long time to complete. Another issue related to the slow searching in large datasets is poor locality. Thus, locality is defined as the number of times a server accesses multiple memory locations separately in response to a single request from the user [1], [2], [4]– [6]. The improvement in locality always leads to a negative impact on other characteristics, such as the efficiency of reading or a significant increase in storage capacity. Hence, finding a scheme that combines the best locality and the efficiency of reading and storage is difficult. Our contribution In this study, we propose a good locality secure searchable encryption based on the QR code technology and symmetric encryption method to solve the problem of searchable encryption in large datasets. Additionally, we use a new inverted index to improve locality that contributes to preventing leakage data and preserves the privacy data of 𝐷𝑊 . Provided with a comprehensive description of our scheme and a discussion of its resistance against several of the most famous attacks on SSE. We will briefly summarize our contributions as follows: • Optimization locality by 800 times for small databases that contain less than a thousand identifiers and by 400 times for large databases. Furthermore, we applied our proposed scheme in real-world data, and the experimental results denote that our work achieves the best results in performance, read efficacy, and resisting malicious attacks compared with previous works. • Our proposed scheme does not excessively affect storage and maintains storage without resulting in a large increase due to locality improvement. Furthermore, it maintains read efficiency at 𝑂 (1). Character Description 𝑉 Number of users 𝑈 𝑖 User, where 𝑖 ∈ 𝑉 𝑤 𝑖 Word 𝑚 Number of words 𝑊 Words in 𝐷𝐵 , 𝑊 = {𝑤 𝑖 , . . . , 𝑤 𝑚 } 𝑖𝑑 Identifier 𝑛 Total of identifiers 𝐷𝐵 𝑛 𝑖 Total of identifiers 𝑤 𝑖 𝑁 ∑ |𝐷𝐵 (𝑤 𝑖 )| 𝑖 =1 𝑚 where 𝐷𝐵 (𝑤 𝑖 ) = {𝑖𝑑 1 , . . , 𝑖𝑑 𝑛 𝑖 } 𝑃𝑅𝐹 Pseudo-random function 𝐻 𝑇 Hash table representing data structures used to store and retrieve data and consist of a pair of algorithms Add and Get [6] 𝐴𝑑𝑑 Algorithm adds pairs of (𝑘𝑒𝑦 , 𝑣𝑎𝑙𝑢𝑒 ) to 𝐻 𝑇 𝐺𝑒𝑡 value=Get(key) 𝑆 String Ŝ Encrypted string 𝑙𝑎 Label is used to store and retrieve Ŝ in 𝐻 𝑇 , 𝐴𝑑𝑑 (𝑙𝑎 , Ŝ), Ŝ = 𝐺𝑒𝑡 (𝑙𝑎 ) 𝐸𝑛𝑐 Function to encryption 𝑆 𝐷𝑒𝑐 Function to decryption Ŝ 𝑘 1 Derivative key to create 𝑙𝑎 𝑘 2 Derivative key to encrypted and decrypted 𝑏 Block of word identifiers 𝑄 _𝑖𝑚𝑔 QR code image 𝐶𝑟𝑒𝑎𝑡 _𝑞 Create QR code function 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑠 _𝑞 Convert string to 𝑄 _𝑖𝑚𝑔 function 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 Convert 𝑄 _𝑖𝑚𝑔 to string function Ť Encryption 𝑇 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 A list used by the CS to store the values encoded in it to respond to the user query 𝑟𝑒𝑎𝑑 Read 𝑄 _𝑖𝑚𝑔 function 𝑘 𝑢 Key used to secure 𝑇 and 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 exchanged Table 1: List of symbols A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 123 between major components 𝐶𝑆 and 𝑈 𝑖 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 Encryption 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 2 General background 2.1 𝑺𝑺𝑬 algorithms • 𝑘 ← 𝐺𝑒𝑛 (1 λ ): Is a key generation algorithm that is run by the 𝐷𝑊 token as a security parameter 1 λ (input) and a secret key 𝑘 used for encrypting/decrypting database 𝐷𝐵 . • 𝑆𝐼 ← 𝐸𝑛𝑐 (𝑘 , 𝐷𝐵 ): Is used for building a secure index file (𝑆𝐼 ) based on 𝑘 and 𝐷𝐵 . • 𝑇 ← 𝑇𝑟𝑝𝑑𝑟 (𝑘 , 𝑤 𝑖 ): This algorithm is more concerned with preventing the disclosure of information stored on the server as an encrypted list of keywords (𝑆𝐼 ). As a result, the server responds to the users’ request (𝑇 ) in a safe manner. • 𝑑 ← 𝑆 𝑒 𝑎𝑟𝑐 ℎ(𝑆𝐼 , 𝑇 ): Is a deterministic algorithm run by the server to search for the data d through a trapdoor 𝑇 in the secure index 𝑆𝐼 . If d is encrypted, then we will need a resolve algorithm. • 𝑅 ← 𝑅𝑒𝑠𝑜𝑙𝑣𝑒 (𝑘 , 𝑑 ): This algorithm is performed by the 𝐷𝑊 to recover an identifiers for the keyword. It takes a secret key 𝑘 and a data point d as inputs and outputs of the final result 𝑅 . 2.2 Locality and read efficiency We must get acquainted with the concept of read patterns to understand locality in more detail. The search function for any 𝑆𝑆𝐸 scheme by the server 𝑑 ← 𝑆𝑒𝑎𝑟𝑐 ℎ(𝑆𝐼 , 𝑇 ) can be analyzed into a series of intervals [𝑎 1 , 𝑏 1 ]... [𝑎 𝑣 , 𝑏 𝑣 ], where the server starts from the first interval [𝑎 1 , 𝑏 1 ] to the last interval [𝑎 𝑣 , 𝑏 𝑣 ] depending on 𝑇 in 𝑆𝐼 . Moreover, we can express these intervals as a read pattern function 𝑅𝑑𝑃𝑎𝑡 (𝑆𝐼 , 𝑇 ). When only one interval is obtained, the scheme has the best possible locality [4], [6]. Definition (Locality 𝑳 ) An 𝑆𝑆𝐸 scheme 𝛱 has locality (𝐿 ) with each security parameter (1 λ , 𝐷𝐵 , and 𝑤 𝑖 ∈ N). We use 𝑅𝑒𝑎𝑑𝑃𝑎𝑡 (𝑆𝐼 , 𝑇 ) consisting of 𝐿 intervals with probability 1, when 𝑇 and 𝑆𝐼 are computed as follows: 𝑘 ← 𝐺𝑒𝑛 (1 λ ), 𝑆𝐼 ← 𝐸𝑛𝑐 (𝑘 , 𝐷𝐵 ) and 𝑇 ← 𝑇 𝑟𝑝𝑑𝑟 (𝑘 , 𝑤 𝑖 ). Furthermore, if 𝐿 = 1, the 𝑆𝑆𝐸 scheme (𝛱 ) has perfect locality. However, the perfect locality is insufficient because we can simply transform any scheme into a perfect locality by the poor read efficiency, which is defined as reading only the required data from the server for each query[4]. Definition (Read Efficiency 𝑬 ) An 𝑆𝑆𝐸 scheme 𝛱 is 𝐸 - read efficient with each main parameters 1 λ , DB, and 𝑤 𝑖 ∈ N. We have 𝑅𝑒𝑎𝑑𝑃𝑎𝑡 (𝑆𝐼 , 𝑇 ) that consists of intervals of total length at most 𝐸 · |𝐵𝑖𝑛𝐸𝑛𝑐 (𝐷𝐵 (𝑤 𝑖 ))| bits, where 𝐵𝑖𝑛𝐸𝑛𝑐 (𝐷𝐵 (𝑤 𝑖 )) represents binary coding of 𝐷𝐵 (𝑤 𝑖 ). The concatenation of all keywords’ identifiers signified as bit strings [4]. A perfect locality can also be obtained by violating storage efficiency by creating a perfect locality scheme but with excessive storage space overhead (the encrypted 𝐷𝐵 should not be much larger than the original 𝐷𝐵 ). 2.3 Related works In 2000, Song et al. defined SSE and provided efficient constructions [7]. These mechanisms securely store data on an untrusted server but cannot retrieve client's data. Many subsequent studies focused on this field, achieving SSE's basic principle. However, practical experiments with large databases revealed poor performance and degradation with increasing size. The reason is that they often suffer from a bottleneck problem [8]. The literature found that the bottleneck in these schemes was not caused by encryption but by the lower-level memory access issues in more specifically poor locality. The known constructions can be broadly categorized into two. The first approach is characterized by linear space and constant read efficiency but poor locality in [8], [9]. An array of size 𝑁 is allocated, and 𝑁 elements of the database are uniformly mapped into the array. To recover a list of documents that contain a given keyword, each document identifier is stored in the array together with a pointer to the next document identifier in the list. Additionally, the first approach requires the server to access random locations in the array with the number of identifiers that the word appears in. This case is inefficient because of poor performance resulting from moving to a large number of different locations. The second approach has optimal read efficiency and locality but at the cost of substantial space overhead [10], [12]. The strategy behind this approach is to allocate a sufficiently large array and uniformly map the list of word identifiers into a contiguous interval in the array by the length of word identifiers, without any overlaps among different lists. To efficiently retrieve a list for a given keyword, the server needs to access only a single random location and read all consecutive identifier entries, thereby resulting in optimal read efficiency and locality. However, the locations of the lists in the array reveal information about the structure of the underlying database. Therefore, padding must be applied to conceal information about the lengths of the lists, leading to a polynomial space overhead. Hence, creating a scheme with the best locality, storage, and read efficiency is a challenge, as David Cash and Tessaro in 2014 [4] proved that it is impossible for this to happen. They also set a lower bound on the tradeoff among these three criteria. In addition to their improvement on the locality by creating a scheme with a logarithmic locality (𝑙𝑜𝑔 𝑁 ), the storage space is not good enough 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ). In 2016, Gilad Asharov et al. [2], in their third scheme, updated the locality of David cash and Tessaro scheme to become 𝑂 (1) with the same storage space. In 2017, Demertzis and Papamanthou [5] created two schemes, the first with optimal locality and space 𝑂 (𝑁 𝑆 𝑙 ). 𝑆 𝑙 is the number of levels used to store data, but there was 124 Informatica 47 (2023) 121–132 A.A. Alyusif et al. an effect on read efficiency by a small percentage, and the storage space is still large. As the second scheme worked in the same storage space as the first scheme, it achieves to a tunable locality, which is chosen as a parameter by 𝐷𝑊 during the setup phase. In 2021, Asharov et al. [6] significantly strengthened the lower bound of Cash and Tessaro by creating two general frameworks, the first pad-and-split framework and the second statistical-independence framework. From 2021 to 2023, various research studies in SSE across different domains have emerged, highlighting numerous benefits. However, all of them still lack good locality, as evidenced by references [14]–[19]. Table 2 displays previous works in terms of three critical features: locality, reading efficiency, and storage space. It is worth noting that none of the works satisfy all three characteristics simultaneously. Certain searches exhibit poor locality, such as [8], [9], [13], where a search word containing 2,000 identifiers 𝑛 𝑖 = 2000, would require the cloud server to traverse through 2,000 distinct positions to fulfill the user's request. As a result, the searchable symmetric encryption's overall performance is hampered. In certain prior research, the locality has been favorable, as in [2], [4]and [10]. In cases where the locality is 𝑂 (1), the cloud server can fulfill the user's request by moving to just one location. If 𝑁 = 1200 and the locality is 𝑂 (𝑙𝑜𝑔 𝑁 ), it would be 10, implying that the cloud server would need to traverse 10 distinct positions to fulfill the user's request. Despite the favorable locality in some prior research [2], [4] and [10], they encountered issues with large storage space. For instance, if 𝑁 = 1200 and the storage space is 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ), the storage space will expand tenfold compared to its original size, such as in [2] [4]. Additionally, in other cases, the word that belongs to the most extensive set of identifiers can impact the storage space where, the storage space O(( Max|DB(w i )|)𝑛 ) . If this word corresponds to all the identifiers in the database, that is, 𝑛 = 𝑛 𝑖 the size of the encrypted index can become extremely large, as in [10]. 3 Discussion In this section, we will present our work's position in relation to locality, and storage space. By leveraging QR code technology, we have successfully improved the locality to 𝑂 (𝑛𝑞 ) by grouping a set of identifiers together in a single 𝑄 _𝑖𝑚𝑔 . This has led to a significant reduction in the number of cloud server readings required. This has the added benefit of reducing the number of times the user needs to perform decryption operations, resulting in a more efficient and streamlined experience. Regarding storage space, since = ∑ |𝐷𝐵 (𝑤 𝑖 )| 𝑖 =1 𝑚 , according to the previous example, the value of 𝑁 = 1700, and this is its value in general. As for our work, 𝑁 depends mainly ∑ 𝑛𝑞 for all words, and in previous example ∑ 𝑛𝑞 = 5. Each one of these five 𝑄 _𝑖𝑚𝑔 will be stored with a size that is similar to the storage size required to store 400 identifiers, this implies that N's value will be around 2000, which is in proximity to the desired value. Related works Space Locality Read efficiency Curtmola et al. [13] 𝑂 (𝑁 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) Kamara et al. [9] 𝑂 (𝑁 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) David cash et al. [8] 𝑂 (𝑁 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) Chase and Kamara [10] O(( Max|DB(w i )|)𝑛 ) 𝑂 (1) 𝑂 (1) P. van Liesdonk et al.[11] 𝑂 (𝑚𝑛 ) 𝑂 (𝑛 𝑖 ) 𝑂 (1) Kamara and Papamanthou [12] 𝑂 (𝑚𝑛 ) 𝑂 (𝑛 𝑖 𝑙𝑜𝑔 𝑛 ) 𝑂 (𝑛 𝑙𝑜𝑔 𝑛 ) David cash et al. [4] 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ) 𝑂 (𝑙𝑜𝑔 𝑁 ) 𝑂 (1) Asharov et al. (Scheme 3) [2] 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 ) 𝑂 (1) 𝑂 (1) Demertzis and Papamanthou [5] 𝑂 (𝑁 𝑆 𝑙 ) 𝑂 (𝐿 𝑑 ) Where 𝐿 𝑑 is a tunable locality 𝑂 ( 𝑁 1 𝑠 𝐿 ) Asharov et al. (Pad-and-split scheme) [6] 𝑂 (𝑁 𝑙𝑜𝑔 𝑁 / 𝑙𝑜𝑔 𝐿 ) 𝑂 (𝐿 𝑑 ) Where 𝐿 𝑑 depends on the scheme in which it is implemented within its framework 𝑂 (1) Our work 𝑁 = ∑ (∑ 𝑆 𝑗 =1 𝑛𝑞 ) 𝑖 =1 𝑚 𝑂 (𝑛𝑞 ) 𝑂 (1) Table 2: Comparison with previous schemes based on space, locality, and read efficiency A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 125 3.1 Primitive tools ▪ QR codes A QR code is a two-dimensional code containing information in all directions, which is a digital image that can be easily scanned. Once scanned, it will quickly direct to the information embedded in the code, and the person who looks at the QR code is unable to identify it because its content is only machine-readable [20]. The QR code includes good storage capacity, fast readability, error correction, and support for more languages. It can hold 7,089 numeric characters and 4,296 alphanumeric characters [21]. ▪ Advanced Encryption Standard (𝑨𝑬𝑺 ) The 𝐴𝐸𝑆 algorithm is a widely used and recognized symmetric block cipher used for encrypting and decrypting sensitive data. It is implemented in hardware and software and is considered highly secure, making it difficult for hackers to access the original data. There is currently no known method for breaking 𝐴𝐸𝑆 encryption. 𝐴𝐸𝑆 offers the flexibility to use different key sizes, including 128, 192, and 256 bits, with a fixed block size of 128 bits. AES is a block cipher algorithm that uses substitution and permutation network techniques to encrypt and decrypt data. It operates on a fixed plaintext block size of 128 bits (16 bytes) represented as a 4×4 matrix. The number of rounds in AES depends on the key size, with 10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys [22]. 4 Proposed scheme In this section, we pay more attention to our proposed scheme, which consists of five phases as follows: key generation, setup, token generator, secure search, and resolve. There are three major components, 𝐷𝑊 , cloud server (𝐶𝑆 ), and user (𝑈 𝑖 ) to manage the environment of our work. Hence, 𝑖 ∈ 𝑉 , 𝑉 represents the number of users. The following construction explains the proposed scheme. 4.1 Our scheme construction with more detail ▪ Key generation phase In this phase, 𝐷𝑊 creates a secret key 𝑘 by PRF which it can be explained as follows: Pseudo-Random Functions (𝑃𝑅𝐹 ): A PRF function is 𝐹 : {0,1} ∗ 𝑥 {0,1} ∗ → {0,1} ∗ . Thus, it receives two inputs a key and input. Therefore, 𝐹 cannot be distinguished from a truly random function only with negligible probability in 1 𝜆 , denoted as 𝑛𝑒𝑔𝑙 (1 𝜆 ) [23]. It is to be used in setup and CONSTRUCTION 1. (QR code based scheme) Let 𝐷𝐵 = {𝐷𝐵 (𝑤 1 ), . . , 𝐷𝐵 (𝑤 𝑚 )}, 𝑊 = Words in 𝐷𝐵 , 𝑊 = {𝑤 𝑖 , . . . , 𝑤 𝑚 } For 𝑤 𝑖 ∈ 𝑊 let 𝐷𝐵 (𝑤 𝑖 ) = {𝑖𝑑 1 , . . , 𝑖𝑑 𝑛 𝑖 }. Key generation phase 𝒌 & 𝒌 𝒖 ← 𝑮𝒆𝒏 (𝟏 𝝀 ): 1. Input Security parameter 1 𝜆 2. Output 𝑘 and 𝑘 𝑢 , where 𝑘 & 𝑘 𝑢 𝜖 𝑍 3. Compute both keys based on 𝑃𝑅𝐹 Setup and Secure phase 𝑺𝑰 ← 𝑬𝒏𝒄 (𝒌 , 𝑫𝑩 ): 1. Input 𝑘 and 𝐷𝐵 2. Output 𝑆𝐼 = 𝐻 𝑇 3. 𝑏𝑠 = 𝐹𝑏𝑠 (𝑛 ) 4. Initialize empty 𝐻 𝑇 5. For every 𝑤 𝑖 𝜖 𝑊 𝑛𝑞 = 𝑛 𝑖 /𝑏𝑠 Compute 𝑘 1 = 𝑃𝑅𝐹 𝑘 (1 ‖ 𝑤 𝑖 ) and 𝑘 2 = 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) Initialize counter 𝑖 = 0 For every 𝑏 𝜖 𝑛𝑞 𝑄 _𝑖𝑚𝑔 = 𝐶𝑟𝑒𝑎𝑡𝑒 _𝑞 (𝑏 ) 𝑆 = 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 (𝑄 _𝑖𝑚𝑔 ) Delete 𝑄 _𝑖𝑚𝑔 Ŝ = 𝐸𝑛𝑐 𝑘 2 (𝑆 ) by AES256 Compute 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1 (𝑖 ) Add (𝑙𝑎 , Ŝ) to 𝐻 𝑇 𝑖 = 𝑖 + 1 6. Upload 𝐻 𝑇 to 𝐶𝑆 Token generator phase Ť ← 𝑻𝒓𝒑𝒅𝒓 (𝒌 𝒖 , 𝒌 , 𝒘 𝒊 ): 1. Input 𝑘 𝑢 , 𝑘 and 𝑤 𝑖 2. Output Ť 3. Compute 𝑇 = 𝑃𝑅𝐹 𝑘 (1 ‖ 𝑤 𝑖 ) = 𝑘 1 4. Ť = 𝐸𝑛𝑐 𝑘 𝑢 ( 𝑇 ) 5. Send Ť to 𝐶𝑆 Secure Search phase 𝑬 _𝑶𝒖𝒕𝒑𝒖 𝒕 𝒍𝒊𝒔𝒕 ← 𝑺𝒆𝒂𝒓𝒄𝒉 (𝒌 𝒖 , Ť , 𝑺𝑰 ): 1. Input 𝑘 𝑢 , Ť and 𝑆𝐼 2. Output 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 3. Initialize counter 𝑖 = 0 and 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 4. 𝑇 = 𝐷𝑒𝑐 𝑘 𝑢 ( Ť) = 𝑘 1 5. While true Compute 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1 (𝑖 ) Ŝ = 𝐺𝑒𝑡 (𝑙𝑎 ) Add(Ŝ) to 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 𝑖 = 𝑖 + 1 End While 6. 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐸𝑛𝑐 𝑘 𝑢 (𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) 7. Send to 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 𝑈 𝑖 126 Informatica 47 (2023) 121–132 A.A. Alyusif et al. secure phase 𝑆𝐼 ← 𝐸𝑛𝑐 (𝑘 , 𝐷𝐵 ) and send it to all trusted users. 𝑈 𝑖 should use secret key 𝑘 in the next phases, such as token generator phase Ť ← 𝑇𝑟𝑝𝑑𝑟 (𝑘 𝑢 , 𝑘 , 𝑤 𝑖 ) and resolve phase 𝑅 ← 𝑅𝑒𝑠𝑜𝑙𝑣𝑒 ( 𝑘 𝑢 , 𝑘 , 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ). In addition, a second key was created in the same way, called 𝑘 𝑢 , and it is used to secure 𝑇 and 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 exchanged between major components 𝐶𝑆 and 𝑈 𝑖 . ▪ Setup and secure phase The 𝐷𝑊 configures secure index 𝑆𝐼 , which will be uploaded to 𝐶𝑆 after completing this phase, where it uses 𝑘 and 𝐷𝐵 as main inputs and computes SI as output. The following steps explain the mechanism working of the current phase: • Configure empty hash table (𝐻 𝑇 ) that will be depended on to save secure parameters and then used as a secure index file (𝑆𝐼 ) for retrieving and processing the users’ requests. • Determine the size of the block 𝑏𝑠 = Fbs (n) where 𝑛 represents the number of identifiers of the database (𝐷𝐵 ) and 𝑏𝑠 is the number of identifiers that can be stored together as a single block allowed to create 𝑄 _𝑖𝑚𝑔 . From experimental results of our work, we discover a significant relationship between the number of identifiers of database 𝑛 and the size of 𝑄 _𝑖𝑚𝑔 . We found that the 𝑄 _𝑖𝑚𝑔 has the flexibility to save maximum numbers of identifiers, whereas the value of 𝑛 is low based on the number of 𝑛 ’s ranking) • For every keyword taken from 𝑊 where 𝑤 𝑖 𝜖 𝑊 , we carried out several successive steps as follows: − Compute the numbers of 𝑄 _𝑖𝑚𝑔 𝑛𝑞 that 𝑤 𝑖 will be needed by 𝑛 𝑖 /𝑏𝑠 , it is necessary to know the number of 𝑄 _𝑖𝑚𝑔 for each word based on the number of identifiers 𝑛 𝑖 of words after dividing it by 𝑏𝑠 . − Derivation of two keys from 𝑤 𝑖 . The first one 𝑘 1 is computed using 𝑃𝑅𝐹 , which takes 1 and 𝑤 𝑖 as its inputs as follows: 𝑘 1 = 𝑃𝑅𝐹 𝑘 (1 ‖ 𝑤 𝑖 ). The label 𝑙𝑎 has been created by 𝑘 1 . The second key 𝑘 2 creates in the same way 𝑘 2 = 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) . − Initialize counter 𝑖 = 0. − For every block 𝑏 𝜖 𝑛𝑞 do the following: − Convert this 𝑏 to QR code image 𝑄 _𝑖𝑚𝑔 using the 𝐶𝑟𝑒𝑎𝑡𝑒 _𝑞 (𝑏 ) function, which receives 𝑏 as input and converts 𝑏 to 𝑄 _𝑖𝑚𝑔 . − Convert the 𝑄 _𝑖𝑚𝑔 to a string 𝑆 by using 𝑆 = 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 (𝑄 _𝑖𝑚𝑔 ) function. − Delete 𝑄 _𝑖𝑚𝑔 − Encrypt 𝑆 using Ŝ = 𝐸𝑛𝑐 𝑘 2 (𝑆 ), Ŝ represents an encrypted string. − Derives 𝑙𝑎 from 𝑤 𝑖 by 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1 (𝑖 ), which receives 𝑘 1 and 𝑖 as input to get a different 𝑙𝑎 each time. − Add the 𝑙𝑎 and Ŝ pair (𝑙𝑎 , Ŝ) to the 𝐻 𝑇 as a (key, value). − Increase the value of 𝑖 . ▪ Trapdoor phase The current phase is implemented by 𝑈 𝑖 to search for a specific word 𝑤 𝑖 , the 𝑇𝑟𝑝𝑑𝑟 receives 𝑤 𝑖 , 𝑘 𝑢 and 𝑘 as input parameters and gives Ť as an output, which will compute 𝑇 and then secure it using 𝑘 𝑢 , Ť = 𝐸𝑛𝑐 𝑘 𝑢 (𝑇 ) to send it to 𝐶𝑆 more securely. Resolve phase 𝑹 ← 𝑹𝒆𝒔𝒐𝒍𝒗𝒆 ( 𝒌 𝒖 , 𝒌 , 𝑬 _𝑶𝒖𝒕𝒑𝒖 𝒕 𝒍𝒊𝒔𝒕 ): 1. 1.Input 𝑘 , 𝑘 𝑢 and 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 2. 2.Output 𝑅 = 𝑖𝑑𝑒𝑛𝑡𝑖𝑓𝑖𝑒𝑟𝑠 3. 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐷𝑒𝑐 𝑘 𝑢 (𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) 4. Restore 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) = 𝑘 2 5. For every Ŝ 𝜖 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 𝑆 = 𝐷𝑒𝑐 𝑘 2 (Ŝ) 𝑄 _𝑖𝑚𝑔 = 𝑐𝑜𝑛𝑣𝑒𝑟𝑡 _𝑠 _𝑞 (𝑆 ) 𝑅 = 𝑟𝑒𝑎𝑑 (𝑄 _𝑖𝑚𝑔 ) Delete 𝑄 _𝑖𝑚𝑔 END CONSTRUCTION Figure 1: Key generation phase and setup and secure phase. A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 127 ▪ Secure search phase 𝐶𝑆 responds to a user’s query and returns an encrypted list of secure values (𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) in 𝑆𝐼 . CS performs the following steps: − 𝑇 = 𝐷𝑒𝑐 𝑘 𝑢 ( Ť) = 𝑘 1 − Initialize counter 𝑖 = 0 and empty 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = []. While True Let 𝑙𝑎 = 𝑃𝑅𝐹 𝑘 1 (𝑖 ) Retrieve 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 [𝑖 ] = Ŝ = 𝐺𝑒𝑡 (𝑙𝑎 ) 𝑖 = 𝑖 + 1 End − 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐸𝑛𝑐 𝑘 𝑢 (𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) − CS resubmits 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 to 𝑈 𝑖 . ▪ Resolve phase 𝑈 𝑖 receives 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 from CS and performs the following: − 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 = 𝐷𝑒𝑐 𝑘 𝑢 (𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) − 𝑘 2 = 𝑃𝑅𝐹 𝑘 (2 ‖ 𝑤 𝑖 ) − For every Ŝ 𝜖 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 − 𝑆 = 𝐷𝑒𝑐 𝑘 2 (Ŝ) − Convert 𝑆 to 𝑄 _𝑖𝑚𝑔 using the 𝑐𝑜𝑛𝑣𝑒𝑟𝑡 _𝑠 _𝑞 (𝑆 ) function. − Read the 𝑄 _𝑖𝑚𝑔 with the read function and get the identifiers stored in 𝑄 _𝑖𝑚𝑔 . 5 Security analysis 5.1 Server information leakage Leakage functions Ł, the amount of information the server can learn about the stored data. It can be divided into three types, the first two are Ł 𝑚𝑎𝑥 and Ł 𝑚𝑖𝑛 associated by single-round interaction SSE, and third one is Ł 𝑠𝑖𝑧𝑒 , which is in more than one round of 𝑆𝑆𝐸 interaction. Then, all types take 𝐷𝐵 and 𝑊 as input parameters [4], [6]. Ł 𝑚𝑎𝑥 (𝐷𝐵 , 𝑊 ): output (𝑁 , {𝐷𝐵 (𝑤 𝑖 )} 𝑤 𝑖 ∈𝑊 , 𝑚 , 𝑛 , 𝑛 𝑖 , Max(𝐷𝐵 (𝑤 𝑖 )) 𝑤 𝑖 ∈𝑊 ). It reflects the maximum cases of leakage. Ł 𝑚𝑖𝑛 (𝐷𝐵 , 𝑊 ): output (𝑁 , {𝐷𝐵 (𝑤 𝑖 )} 𝑤 𝑖 ∈𝑊 ). It is considered a somewhat acceptable leak. The server must know 𝑁 in all cases. In a single-round interaction, the server will be able to view the query results because it can decrypt the result. Ł 𝑠𝑖𝑧𝑒 (𝐷𝐵 , 𝑊 ): output (𝑁 , {|𝐷𝐵 (𝑤 𝑖 )|} 𝑤 𝑖 ∈𝑊 ) in more than one round of interaction where the server cannot decrypt data. Thus, when the response to the user’s request is encrypted, the server can learn only the output size, which is one of the best cases of leakage. In addition, detecting the size does not mean that the server has the ability to reveal the main information about our proposed scheme. Therefore, our work is resisting all types of leakage. In our work, we go into deep details for security analysis. We noticed from practical experiments of our scheme that the shape of the 𝑄 _𝑖𝑚𝑔 is simple and less complicated if the number of word identifiers in the input 𝑏 of the function is small 𝑄 _𝑖𝑚𝑔 = 𝐶𝑟𝑒𝑎𝑡𝑒 _𝑞 (𝑏 ). This case in turn will affect the next step, that is, the process of converting this 𝑄 _𝑖𝑚𝑔 into a string 𝑆 = 𝐶𝑜𝑛𝑣𝑒𝑟𝑡 _𝑞 _𝑠 (𝑄 _𝑖𝑚𝑔 ), which leads to a simple and less complex 𝑆 . Its size differs from the size of another 𝑆 coming from a block full of identifiers. This difference in size is considered a security vulnerability. However, we made sure to address it by making all strings have very close sizes. By increasing the complexity of the 𝑄 _𝑖𝑚𝑔 if the number of block identifiers is small, this 𝑄 _𝑖𝑚𝑔 will lead to a larger string size. Thus, the sum of 𝑆 does not reflect the real size of the database because they are all close in size, even if they are the result of preparing different identifiers. In addition, our work is more than one round of interaction. Furthermore, the server responds to the user’s request without decrypting the data. From this standpoint, we can say that our construction has only Ł 𝑠𝑖𝑧 𝑒 leakage. 5.2 Resisting attacks In this part, we discussed the resistance of our scheme, the most famous types of attacks on 𝑆𝑆𝐸 . ▪ Frequency analysis attack A frequency analysis attack is one of the known ciphertext attacks. It is built on studying the frequency of letters or collections of letters in a ciphertext [24]. Hence, it takes advantage of the frequency of encrypted data uploaded to 𝑆𝐼 , which is either term frequency 𝑇𝐹 or term frequency- inverse 𝑇𝐹 _𝐼𝐷𝐹 . 𝑇𝐹 is defined as the number of times 𝑤 𝑖 looks in a document 𝑖𝑑 , and TF-IDF is the multiplication of term frequency 𝑇𝐹 and 𝐼𝐷𝐹 values. 𝐼𝐷𝐹 is defined as a value of 𝑤 𝑖 that could be obtained by Fig. 2: Trapdoor, secure search and resolve phases. 128 Informatica 47 (2023) 121–132 A.A. Alyusif et al. dividing 𝑛 by the document frequency number of documents in which such a word appeared. If the 𝐶𝑆 can access this important information, then it can carry out this attack and know the keyword that is being searched for. Based on the above, we can say that our work is safe against the current attack because the values stored in 𝐶𝑆 are encrypted and do not reflect the identifiers originally, but rather an obscure text that helps the 𝑈 𝑖 to access identifiers later. ▪ 𝑰 𝑲𝑲 attack IKK attack uses the disclosed partly information to find out what plaintext words the 𝑈 𝑖 searched trapdoors [25]. Hence, this attack relies mainly on the leaking of the access pattern information, which is defined as the result of the search for 𝑇 in 𝑆𝐼 by 𝐶𝑆 . For example, suppose our 𝐷𝐵 is about computer science, and the user requests three queries as trapdoors: 𝑇 1 , 𝑇 2 , and 𝑇 3 , which represent the words “network,” “computer,” and “information,” respectively. After completing the communication between 𝑈 𝑖 and 𝐶𝑆 , the 𝐶𝑆 will look at the obtained results, which is the set of identifiers corresponding to the trapdoors. The 𝐶 𝑆 can then compute the probability of appearance of any two of these keywords in any document by observing the number of the same documents reverted for those corresponding trapdoors. By continuing the search and leaking the access pattern, thereby obtaining more probabilities, the server will be able to access the keywords that correspond to the trapdoors [24]. After a modest clarification of this attack, we can say that our scheme resists the IKK attack because the search result by the CS is encrypted, and the access pattern does not leak any important information. That is, the CS will not be able to access the identifiers {𝑖𝑑 1 , . . , 𝑖𝑑 𝑛 𝑖 } corresponding to the trapdoors [25]. ▪ Keyword guessing attack (𝐊𝐆𝐀 ) KGA attack is an attack on the encrypted index stored in the server, where the attacker tries to guess the keyword that is being searched to use it in the search later and find its identifiers [26]. This attack is resisted by various precautionary measures, such as encrypting the words themselves and keeping the key secret and secure. Both have been worked out in our scheme. Hence, we can say that our work resists KGA attacks. ▪ Man-in-the-middle attack 𝑴𝑰𝑻𝑴 This type of attack occurs when the communication channel between the 𝑈 𝑖 and the 𝐶𝑆 is not secured, as the attacker will occupy the identity of one of the parties (𝑈 𝑖 , 𝐶𝑆 ) [27]. We took the resistance to this attack into account in our work, where the communication channel was secured by encrypting the exchanged data (𝑇 and 𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) between the 𝐶𝑆 and 𝑈 𝑖 based on 𝑘 𝑢 key, in addition to mutual authentication between the two parties with the same key as the following construction: 6 Experimental results In this section, we evaluated the performance of our proposed scheme using a real-data collection of Wikipedia articles. We selected a total of 250 articles, along with their eight corresponding historical versions, resulting in a collection of 2000 files. Our collection contained a total of 117000 keywords. As the total number of identifiers 𝑛 for 𝐷𝐵 is 2000, the absorption of the QR code will be approximately 400 identifiers. In addition, the probabilities of nq values for each word will be from 1 to 5, which means that the maximum locality will be only 5. The experiments were conducted on a PC with a 2.6 GH Intel Core i5 CPU and 8 GB of RAM running on 64-bit Windows 10. The code was implemented in python because of its many known features, and it supports the creation of QR codes technologies. ▪ Retrieval time We conducted an experiment to find out the time taken to retrieve identifiers for three words that differ in the number of their identifiers 𝑛 𝑖 and the number of QR code 𝑛𝑞 . Moreover, the first word 𝑤 1 contains 340 identifiers 𝑛𝑞 = 1, and the second word 𝑤 2 contains 1172 identifiers 1172/400 = 2.93. That is, 3 𝑛𝑞 , and the third 𝑤 3 appears in all files 2000/400 = 5 𝑛𝑞 . Evidently, the retrieval time grows linearly with the increase in 𝑛𝑞 values. We notice that the retrieval time is the time of each of the secure search and resolve phases. CONSTRUCTION 2. In Trapdoor Phase: Choose a random identifier 𝑢 𝐼𝐷 by 𝑈 𝑖 ℎ𝑢 𝐼𝐷 = 𝐻 𝐾 𝑢 (𝑢 𝐼𝐷 ) by MD5 Send (ℎ𝑢 𝐼𝐷 || 𝑢 𝐼𝐷 || Ť) to 𝐶𝑆 In Secure Search Phase: Choose a random identifier 𝑐𝑠 𝐼𝐷 by 𝐶𝑆 If ℎ𝑢 𝐼𝐷 =𝑣𝑟𝑓𝑦 𝐾 𝑢 (𝑢 𝐼𝐷 ) → 𝑈 𝑖 authentication Before send 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ℎ𝑐𝑠 𝐼𝐷 = 𝐻 𝐾 𝑢 (𝑐𝑠 𝐼𝐷 ) by MD5 send (ℎ𝑐𝑠 𝐼𝐷 || 𝑐𝑠 𝐼𝐷 || 𝐸 _𝑂𝑢𝑡𝑝𝑢 𝑡 𝑙𝑖𝑠𝑡 ) to 𝑈 𝑖 In Resolve Phase: If ℎ𝑐𝑠 𝐼𝐷 =𝑣𝑟𝑓𝑦 𝐾 𝑢 (𝑐𝑠 𝐼𝐷 ) → 𝐶𝑆 authentication Where 𝐻 is hash function, 𝑣𝑟𝑓𝑦 is verification function, ℎ𝑢 𝐼𝐷 represent user hash value and ℎ𝑐𝑠 𝐼𝐷 represent cloud server hash value A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 129 ▪ Comparison with previous schemes To ensure the efficiency of our scheme, we implemented some of the previous schemes as a first step. These schemes are Cash et al. [28], Cash and Tessaro [4], Asharov et al. (Scheme 3) [2], and Demertzis and Papamanthou [5]. As a second step, we perform the same experiment on our scheme and these four schemes on the same database based on the AES256 encryption algorithm, which are the same words to know the time required to secure the search of the word identifiers. We chose two words: the first word 𝑤 1 with the largest number of identifiers and that appears in all database identifiers, that is, appears in 2000 files, and the second 𝑤 2 appears in 1000 files. For the comparison to be fair and as we used more than one round of interaction in our work to increase security, the search time was very short because 𝐶𝑆 does not decrypt and process the data. Hence, we calculated the retrieval time for our work to find out the total processing time to obtain the identifiers. Figure 3: The retrieval time for three words. Figure 4: Comparison with previous schemes. 130 Informatica 47 (2023) 121–132 A.A. Alyusif et al. 7 Conclusions In this study, we discuss the problem of slow retrieval of encrypted data owing to poor locality and present a new scheme that helps solve this problem by adding the QR code technique to the inverted index. This case mainly improves the locality. Finally, our scheme has less leakage and high security and is resistant to most common SSE attacks. Additionally, we used real-world data in our experiments and demonstrated that our proposed scheme is secure and practically efficient and holds high accuracy. References [1] G. Sen Poh, J.-J. Chin, W.-C. Yau, K.-K. R. Choo, and M. S. Mohamad, “Searchable symmetric encryption: designs and challenges,” ACM Comput. Surv., vol. 50, no. 3, pp. 1–37, 2017, doi: {10.1145/3064005}. [2] G. Asharov, M. Naor, G. Segev, and I. Shahaf, “Searchable symmetric encryption: optimal locality in linear space via two-dimensional balanced allocations,” in Proceedings of the forty- eighth annual ACM symposium on Theory of Computing, 2016, pp. 1101–1114. doi: {10.1145/2897518.2897562}. [3] E. Damiani, S. D. C. Di Vimercati, S. Foresti, S. Jajodia, S. Paraboschi, and P. Samarati, “Key management for multi-user encrypted databases,” in Proceedings of the 2005 ACM workshop on Storage security and survivability, 2005, pp. 74– 83. doi: {10.1145/1103780.1103792}. [4] D. Cash and S. Tessaro, “The locality of searchable symmetric encryption,” in Annual international conference on the theory and applications of cryptographic techniques, 2014, pp. 351–368. [5] I. Demertzis and C. Papamanthou, “Fast searchable encryption with tunable locality,” in Proceedings of the 2017 ACM International Conference on Management of Data, 2017, pp. 1053–1067. doi: {10.1145/3035918.3064057}. [6] G. Asharov, G. Segev, and I. Shahaf, “Tight tradeoffs in searchable symmetric encryption,” J. Cryptol., vol. 34, no. 2, pp. 1–37, 2021, doi: https://doi.org/10.1007/s00145-020-09370-z. [7] D. X. Song, D. Wagner, and A. Perrig, “Practical techniques for searches on encrypted data,” in Proceeding 2000 IEEE symposium on security and privacy. S&P 2000, 2000, pp. 44–55. doi: 10.1109/SECPRI.2000.848445. [8] D. Cash, S. Jarecki, C. Jutla, H. Krawczyk, M.-C. Roşu, and M. Steiner, “Highly-scalable searchable symmetric encryption with support for boolean queries,” in Annual cryptology conference, 2013, pp. 353–373. [9] S. Kamara, C. Papamanthou, and T. Roeder, “Dynamic searchable symmetric encryption,” in Proceedings of the 2012 ACM conference on Computer and communications security, 2012, pp. 965–976. doi: {10.1145/2382196.2382298}. [10] M. Chase and S. Kamara, “Structured encryption and controlled disclosure,” in International conference on the theory and application of cryptology and information security, 2010, pp. 577–594. doi: https://doi.org/10.1007/978-3-642- 17373-8_33. [11] P. Van Liesdonk, S. Sedghi, J. Doumen, P. Hartel, and W. Jonker, “Computationally efficient searchable symmetric encryption,” in Workshop on Secure Data Management, 2010, pp. 87–100. doi: https://doi.org/10.1007/978-3-642-15546- 8_7. [12] S. Kamara and C. Papamanthou, “Parallel and dynamic searchable symmetric encryption,” in International conference on financial cryptography and data security, 2013, pp. 258– 274. [13] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient constructions,” in Proceedings of the 13th ACM conference on Computer and communications security, 2006, pp. 79–88. doi: {10.1145/1180405.1180417}. [14] H. M. Mohammed and A. I. Abdulsada, “Secure Multi-keyword Similarity Search Over Encrypted Data With Security Improvement.,” Iraqi J. Electr. Electron. Eng., vol. 17, no. 2, 2021. [15] H. M. Mohammed and A. I. Abdulsada, “Multi- keyword search over encrypted data with security proof,” J. Basrah Res., vol. 47, no. 1, 2021. [16] C. Guo, W. Li, X. Tang, K.-K. R. Choo, and Y. Liu, “Forward Private Verifiable Dynamic Searchable Symmetric Encryption with Efficient Conjunctive Query,” IEEE Trans. Dependable Secur. Comput., 2023, doi: 10.1109/TDSC.2023.3262060. [17] Z. A. Abduljabbar, A. Ibrahim, M. A. Al Sibahee, S. Lu, and S. M. Umran, “Lightweight Privacy- Preserving Similar Documents Retrieval over Encrypted Data,” in 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), 2021, pp. 1397–1398. doi: 10.1109/COMPSAC51774.2021.00202. [18] M. A. Al Sibahee, A. I. Abdulsada, Z. A. Abduljabbar, J. Ma, V. O. Nyangaresi, and S. M. A New Approach Based on Intelligent Method to Classify… Informatica 47 (2023) 121–132 131 Umran, “Lightweight, Secure, Similar-Document Retrieval over Encrypted Data,” Appl. Sci., vol. 11, no. 24, p. 12040, 2021, doi: https://doi.org/10.3390/app112412040. [19] M. A. Hussain et al., “Provably throttling SQLI using an enciphering query and secure matching,” Egypt. Informatics J., vol. 23, no. 4, pp. 145–162, 2022, doi: https://doi.org/10.1016/j.eij.2022.10.001. [20] S. Singh, “QR code analysis,” Int. J. Adv. Res. Comput. Sci. Softw. Eng., vol. 6, no. 5, 2016. [21] A. S. Narayanan, “QR codes and security solutions,” Int. J. Comput. Sci. Telecommun., vol. 3, no. 7, pp. 69–72, 2012. [22] A. M. Abdullah, “Advanced encryption standard (AES) algorithm to encrypt and decrypt data,” Cryptogr. Netw. Secur., vol. 16, pp. 1–11, 2017. [23] J. Katz and Y. Lindell, Introduction to modern cryptography. CRC press, 2020. [24] D. V. N. Siva Kumar and P. Santhi Thilagam, “Searchable encryption approaches: attacks and challenges,” Knowl. Inf. Syst., vol. 61, no. 3, pp. 1179–1207, 2019. [25] D. Cash, P. Grubbs, J. Perry, and T. Ristenpart, “Leakage-abuse attacks against searchable encryption,” in Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, 2015, pp. 668–679. doi: {10.1145/2810103.2813700}. [26] Y. Miao, Q. Tong, R. H. Deng, K.-K. R. Choo, X. Liu, and H. Li, “Verifiable searchable encryption framework against insider keyword-guessing attack in cloud storage,” IEEE Trans. Cloud Comput., vol. 10, no. 2, pp. 835–848, 2020, doi: 10.1109/TCC.2020.2989296. [27] S. Gangan, “A review of man-in-the-middle attacks,” arXiv Prepr. arXiv1504.02115, 2015, doi: https://doi.org/10.48550/arXiv.1504.02115. [28] D. Cash et al., “Dynamic searchable encryption in very-large databases: Data structures and implementation,” Cryptol. ePrint Arch., 2014, doi: 10.14722/ndss.2014.23264. 132 Informatica 47 (2023) 121–132 A.A. Alyusif et al.