Editorial Boards

Informatica is a journal primarily covering intelligent systems in the European computer science, informatics and cognitive community; scientific and educational as well as technical, commercial and industrial. Its basic aim is to enhance communications between different European structures on the basis of equal rights and international refereeing. It publishes scientific papers accepted by at least two referees outside the author's country. In addition, it contains information about conferences, opinions, critical examinations of existing publications and news. Finally, major practical achievements and innovations in the computer and information industry are presented through commercial publications as well as through independent evaluations.

Editing and refereeing are distributed. Each editor from the Editorial Board can conduct the refereeing process by appointing two new referees or referees from the Board of Referees or Editorial Board. Referees should not be from the author's country. If new referees are appointed, their names will appear in the list of referees. Each paper bears the name of the editor who appointed the referees. Each editor can propose new members for the Editorial Board or referees. Editors and referees inactive for a longer period can be automatically replaced. Changes in the Editorial Board are confirmed by the Executive Editors.

The coordination necessary is made through the Executive Editors, who examine the reviews, sort the accepted articles and maintain appropriate international distribution. The Executive Board is appointed by the Society Informatika. Informatica is partially supported by the Slovenian Ministry of Higher Education, Science and Technology.

Each author is guaranteed to receive the reviews of his article. When accepted, publication in Informatica is guaranteed in less than one year after the Executive Editors receive the corrected version of the article.

Executive Editor – Editor in Chief
Matjaž Gams
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 2519 385
matjaz.gams@ijs.si
http://dis.ijs.si/mezi/matjaz.html

Editor Emeritus
Anton P. Železnikar
Volaričeva 8, Ljubljana, Slovenia
s51em@lea.hamradio.si
http://lea.hamradio.si/~s51em/

Executive Associate Editor – Deputy Managing Editor
Mitja Luštrek, Jožef Stefan Institute
mitja.lustrek@ijs.si

Executive Associate Editor – Technical Editor
Drago Torkar, Jožef Stefan Institute
Jamova 39, 1000 Ljubljana, Slovenia
Phone: +386 1 4773 900, Fax: +386 1 2519 385
drago.torkar@ijs.si

Executive Associate Editor – Deputy Technical Editor
Tine Kolenik, Jožef Stefan Institute
tine.kolenik@ijs.si

Editorial Board
Juan Carlos Augusto (Argentina), Vladimir Batagelj (Slovenia), Francesco Bergadano (Italy), Marco Botta (Italy), Pavel Brazdil (Portugal), Andrej Brodnik (Slovenia), Ivan Bruha (Canada), Wray Buntine (Finland), Zhihua Cui (China), Aleksander Denisiuk (Poland), Hubert L. Dreyfus (USA), Jozo Dujmović (USA), Johann Eder (Austria), George Eleftherakis (Greece), Ling Feng (China), Vladimir A. Fomichov (Russia), Maria Ganzha (Poland), Sumit Goyal (India), Marjan Gušev (Macedonia), N. Jaisankar (India), Dariusz Jacek Jakóbczak (Poland), Dimitris Kanellopoulos (Greece), Samee Ullah Khan (USA), Hiroaki Kitano (Japan), Igor Kononenko (Slovenia), Miroslav Kubat (USA), Ante Lauc (Croatia), Jadran Lenarčič (Slovenia), Shiguo Lian (China), Suzana Loskovska (Macedonia), Ramon L. de Mantaras (Spain), Natividad Martínez Madrid (Germany), Sanda Martinčić-Ipšić (Croatia), Angelo Montanari (Italy), Pavol Návrat (Slovakia), Jerzy R.
Nawrocki (Poland), Nadia Nedjah (Brazil), Franc Novak (Slovenia), Marcin Paprzycki (USA/Poland), Wiesław Pawłowski (Poland), Ivana Podnar Žarko (Croatia), Karl H. Pribram (USA), Luc De Raedt (Belgium), Shahram Rahimi (USA), Dejan Raković (Serbia), Jean Ramaekers (Belgium), Wilhelm Rossak (Germany), Ivan Rozman (Slovenia), Sugata Sanyal (India), Walter Schempp (Germany), Johannes Schwinn (Germany), Zhongzhi Shi (China), Oliviero Stock (Italy), Robert Trappl (Austria), Terry Winograd (USA), Stefan Wrobel (Germany), Konrad Wrona (France), Xindong Wu (USA), Yudong Zhang (China), Rushan Ziatdinov (Russia & Turkey)

Possibilities for Applying Blockchain Technology – a Survey

Mimoza Mijoska and Blagoj Ristevski
Faculty of Information and Communication Technologies – Bitola, University "St. Kliment Ohridski" – Bitola
Partizanska bb, 7000 Bitola, Republic of Macedonia
E-mail: mijoska.mimoza@uklo.edu.mk, blagoj.ristevski@uklo.edu.mk

Keywords: blockchain technology, healthcare, decentralized personal data protection, digital property, Internet of things

Received: July 24, 2020

Blockchain technology has the potential to be applied in a variety of areas of our daily life. From the original Bitcoin cryptocurrency to the current smart contracts, blockchain has been applied to many domains. Its numerous applications result in much ongoing research in different practical and scientific areas. This new technology is seen as a revolutionary solution in finance, decentralization, trust, identity, data ownership and data-driven decisions. This paper presents novel solutions associated with some of the big data areas that can be empowered by blockchain technology, such as healthcare, decentralized personal data protection, digital property, Internet of Things, digital identity, financial services and infrastructure, e-commerce, educational records, the educational system, knowledge sharing, insurance, the food industry, accounting, auditing and e-voting. Blockchain technology could be used in electronic health records, in the establishment and maintenance of registers of births, deaths, marriages and business activities, but also in the organization of elections. The features of this technology could lead to the redefinition of the Internet (Internet 3.0), defined as a new type of decentralized infrastructure or network of networks.

Povzetek: This review paper presents the possibilities of applying blockchain technologies.

1 Introduction

In 2008, powerful American financial institutions and insurance companies were on the edge of bankruptcy. These circumstances called for immediate intervention by the federal government to avoid domestic and possibly global financial collapse. These events illustrated the dangers of living in a digital, interconnected world that depends on transactional intermediaries and leaves people vulnerable to digital exploitation, greed, and crime. Blockchain technology is a relatively new concept and a fast-growing part of the Internet and cloud computing. Similar data structures existed long before the famous bitcoin was conceived, but the main theories about the blockchain architectures used today were originally defined in the original article on bitcoin written and published by a person (or a group of people) under the pseudonym Satoshi Nakamoto in 2008 [1]. The first innovative application of blockchain was Bitcoin.
Beyond their use in the economic domain, Bitcoin and blockchain technology as articulated by Nakamoto solve an important computer science problem that had been a barrier to a functional digital monetary system for years: the double-spending problem. The double-spending problem is the challenge of ensuring that the same unit of digital money can be spent only once. The first Bitcoin transactions occurred in January 2009. With the first "release" of the Bitcoin network on January 3, 2009, the first Bitcoins (cryptocurrency units) were created. Nakamoto developed the software and was its lead developer until mid-2010. Bitcoin was the first cryptocurrency created on the basis of blockchain technology in 2009. Later, it was noticed that blockchain as a technology could be used for other purposes than the realization of cryptocurrencies. Since 2015, many international financial organizations have planned to further develop the blockchain system. In 2014, a consortium called R3 was established to start research and development of blockchain technology. In March 2017, this group counted about 75 companies and reached 200 in March 2018, including Bank of America Merrill Lynch, UniCredit Group and many other companies, with the goal of advancing education, law and technology development related to blockchain [2].

This paper describes interesting implementations of blockchain technology in various domains such as healthcare, including all healthcare stakeholders such as hospitals, healthcare facilities and physicians. Then, a proposed solution for protecting data privacy is described, followed by a solution for the digital property register. Next, a secure and distributed blockchain technology system is presented that can serve as an IoT platform in which devices are connected reasonably and securely. The potential of this technology to replace all existing physical identities such as passports, driver's licenses and ID cards and to move them to a digital platform is presented. It is described how the application of this technology can improve financial services and payment opportunities and provide a clear and transparent supply chain management system in the retail industry. By using this technology, student records, as well as completed courses, test results, diplomas and more, can be stored in the form of a digital record. Also, the attendance of online classes and the realization of online teaching during the COVID-19 pandemic can be stored in a digital form that cannot be changed. Moreover, this paper depicts the decentralization of Wikipedia's knowledge base. This is followed by applications of blockchain technology in the insurance sector, the food industry, accounting, auditing and election management.

The remainder of the paper is organized as follows. Section 2 highlights the principles of blockchain technology. The application of blockchain technology is described in the subsequent section. Section 4 describes the key challenges in the implementation of blockchain technology. Finally, concluding remarks and directions for further work are given in the last section.

2 Blockchain technology

"Blockchain" is a coined word, composed of the words "block" and "chain". Blockchain is a distributed replicated database organized in the form of a singly linked list (a chain), whose nodes are blocks of transaction data. These blocks, after grouping, are protected by using cryptographic methods. Blockchain technology enables the accomplishment of digital transactions without intermediaries.
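To make this block-and-chain structure concrete, the following minimal Python sketch (illustrative only, not tied to Bitcoin or any particular blockchain implementation; all names and values are hypothetical) builds blocks whose headers commit to the previous block's hash and to the transactions via SHA-256, together with a toy proof-of-work nonce. Tampering with an earlier block is immediately detected, which is the property elaborated in the next paragraphs.

```python
import hashlib
import json
import time


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, the hash function assumed throughout this sketch."""
    return hashlib.sha256(data).hexdigest()


def tx_root(transactions: list) -> str:
    """Simplified commitment to the block's transactions (a real chain would use a Merkle root)."""
    return sha256_hex(json.dumps(transactions, sort_keys=True).encode())


def make_block(prev_hash: str, transactions: list, difficulty: int = 3) -> dict:
    """Build a block whose header links to the previous block and whose nonce is
    searched (toy proof-of-work) until the block hash starts with `difficulty` zeros."""
    header = {"prev_hash": prev_hash, "tx_root": tx_root(transactions),
              "timestamp": time.time(), "nonce": 0}
    while True:
        digest = sha256_hex(json.dumps(header, sort_keys=True).encode())
        if digest.startswith("0" * difficulty):
            return {"header": header, "hash": digest, "transactions": transactions}
        header["nonce"] += 1


def chain_is_valid(chain: list) -> bool:
    """Recompute every commitment; any change to an earlier block breaks the chain."""
    for i, block in enumerate(chain):
        if block["header"]["tx_root"] != tx_root(block["transactions"]):
            return False
        if block["hash"] != sha256_hex(json.dumps(block["header"], sort_keys=True).encode()):
            return False
        if i > 0 and block["header"]["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block("0" * 64, [{"from": "alice", "to": "bob", "amount": 5}])
chain = [genesis, make_block(genesis["hash"], [{"from": "bob", "to": "carol", "amount": 2}])]
print(chain_is_valid(chain))                 # True
chain[0]["transactions"][0]["amount"] = 500  # tamper with an already published block
print(chain_is_valid(chain))                 # False: the recorded commitments no longer match
```

To hide such a change, an attacker would have to redo the proof-of-work for the altered block and for every later block on a majority of nodes, which is exactly what makes the stored history hard to rewrite.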
Blocks are linked using a cryptographic hash function, in a way that makes it impossible to change the content of a block without changing the contents of all subsequent blocks. This is a very important feature of the blockchain, as it ensures the unchangeability of the data it contains. The SHA-256 hash function always produces the same fixed-length output for the same input data. Regarding blockchain technology, the benefit of using cryptographic hash functions is that they are cryptographically strong, with requirements such as second pre-image resistance, and can serve as a digital fingerprint. Blockchain technology is based on a P2P architecture, where the nodes involved in the implementation of the service have a copy of all records and constantly communicate with each other and synchronize new records [41]. Blockchain technology covers techniques from several disciplines: cryptography, mathematics, algorithms and economic theory. It combines P2P networks with a distributed consensus algorithm to solve the synchronization problem of traditional distributed databases. Blockchains are integrated multi-user databases that form a type of decentralized infrastructure. Blockchain technology is characterized by the following six key features [7][10]:

Decentralization. The basic feature of blockchain: the blockchain does not have to rely on a centralized node anymore; the data can be entered, stored and updated in a distributed manner.

Transparency. The data recorded by the blockchain system are transparent to each node, as are data updates, which makes the blockchain a trustworthy technology.

Open source. Most blockchain systems are open to everyone. The records can be checked publicly, and people can also use blockchain technologies to create any desired application.

Autonomy. Because of the principle of consensus, every node on the blockchain system can transfer or update data safely. The idea is to distrust individual network participants and to trust the integrity of the whole blockchain.

Immutability. Any record is preserved forever and cannot be changed unless someone takes control of more than 51% of the peer network nodes at the same time.

Anonymity. Blockchain technologies solve the trust problem between nodes, so data transfers and even transactions can be anonymous; only the counterparty's blockchain address needs to be known.

The blockchain database is managed autonomously using a P2P network and a distributed timestamping server. It is empowered by mass collaboration driven by collective self-interest. The result is a robust workflow, where participants' uncertainty about data security is marginal. Blockchain removes the possibility of endless duplication of digital assets. It forms a consensus system which ensures that each unit of value is transferred only once, solving the longstanding problem of double spending, and a blockchain-based value exchange system can be faster, safer and cheaper than traditional systems.

2.1 Decentralization of the blockchain

Blockchain (distributed ledger technology) is a network software protocol that enables the secure transfer of money, assets, and information through the Internet, without a third-party organization as an intermediary [3]. It can safely store transactions such as digital cryptocurrencies or data/information about debt, copyrights, equity, and digital assets.
The stored information cannot be easily forged and tampered because it requires individual approval of all distributed nodes. This significantly reduces the cost of trusting and accounting that commonly exist in non-digital economies and other social activities. Blockchain has four components [43]: (1) hash, which uses one-way mathematical functions to assign unique indexes; (2) a digital signature, which is implemented as a public cryptographic key; (3) P2P network, which serves as a routing structure for nodes to use the distributed hash; and (4) consensus mechanism, which is a set of digital procedures designed to ensure the accuracy and consistency of the stored information across the participating nodes. The blockchain data structure is depicted in Fig.1 [4]. In the Blockchain Body, the bottom is a part of Merkle Hash Tree which can be either a binary tree or a multi-tree in the data structure. Specifically, data or information is recorded as the hash value stored in the Blockchain Body, and the generated Merkle root through Merkle tree's hash process will be recorded in Blockchain Head [42]. The blockchain technology platform is gradually shaping up into three directions: (1) underlying infrastructure which includes facilities for mining and manufacturing of specialized computer hardware to perform blockchain-related tasks; (2) middle layer between the blockchain platform and client application services, including smart contracts, a blockchain platform, financial software, and other services; and (3) hotspot distributed applications in various industries, including finance (e.g., cross-border payments, liquidation, financial services, and asset digitization), cybersecurity (e.g., identity protection, data authenticity protection, and critical infrastructure protection), and supply chain management (e.g., logistics tracking and digital works tracking). This distributed general ledger is replicated to thousands of computer nodes around the world and is publicly available. Despite its openness, it is also confidential and reliable. This is achieved through a mathematical puzzle and computer power embedded in its "consensus mechanism" -the process in which the nodes agree on how to update the blockchain with each transaction of moving the value from one person to another. Users use public and private keys to digitally sign and make transactions in the system in a secure way. Blockchain users can solve puzzles using cryptographic hash methods hoping to be rewarded with a fixed amount of cryptocurrency [5]. Blockchain systems seem very complex. However, they can be easily understood by examining each technology component individually. At a high level, blockchains utilize well-known computer science mechanisms (linked lists, distributed networking) as well as cryptographic primitives (hashing, digital signatures, public/private keys) mixed with financial concepts (such as ledgers) [6]. A ledger is a collection of transactions. Ledgers are often stored digitally in large databases owned and operated solely by centralized "trusted" third parties. However, we must trust the third party that the data are backed up, transactions are validated and complete, and the history is not altered. A ledger implemented using a blockchain can mitigate these issues through the use of a distributed consensus method. One of the aspects is that the blockchain ledger will be copied and distributed amongst every node within the system. 
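The Merkle root mentioned in the description of the block structure above is what ties the block body to the block header. The following simplified Python sketch shows how such a root can be computed: hashes are combined pairwise, duplicating the last hash when a level has an odd number of elements (the convention used by Bitcoin, which in practice also applies double SHA-256 to transaction IDs); the transactions shown are purely illustrative.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list) -> bytes:
    """Combine transaction hashes pairwise until a single root hash remains."""
    if not transactions:
        return sha256(b"")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
root = merkle_root(txs)
print(root.hex())

# Changing any single transaction changes the root, so a block header that
# commits to the root implicitly commits to every transaction in the block.
txs[1] = b"bob->carol:200"
print(merkle_root(txs) == root)   # False
```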
When new transactions are submitted to a node, the rest of the network is alerted that a new transaction has arrived; at this point, it is a pending transaction. Eventually, one of the nodes will include this new transaction within a block and complete the system's required consensus method. This new block will be distributed across the network and all ledgers will be updated to include the new transaction. When new users join the system, they receive a full copy of the blockchain, making loss or destruction of the ledger difficult.

Each transaction submitted to the network passes through several steps to be included and published in a block of the blockchain. A transaction is a record of a transfer of assets (digital currency, units of inventory, etc.) between involved parties. For each input transaction A, an output hash value #A is created using a cryptographic function. Hashing is a method of calculating a relatively unique fixed-size output for an input of nearly any size (e.g., a file, some text, or an image). Even the smallest change of the input will result in a completely different output digest. Hash functions are designed to be one-way: it is computationally infeasible to find any input that maps to a pre-specified output. If a particular output is desired, many inputs must be tried by passing them through the hash function until an input is found that gives the desired result. Moreover, hash algorithms are designed to be collision-resistant: it is computationally infeasible to find two or more inputs that produce the same output. A commonly used hashing algorithm in many blockchain technologies is the Secure Hash Algorithm (SHA) with an output size of 256 bits (SHA-256).

Each block in a blockchain contains multiple transactions, which are grouped in sets. Hash values are further combined in a structure called a Merkle tree [5]. A Merkle tree is a data structure in which the data are hashed and combined until there is a single root hash that represents the entire structure. The root is an efficient mechanism used to summarize the transactions in a block and verify the presence of a transaction within a block. This structure ensures that the data sent in a distributed network are valid, since any alteration to the underlying data would be detected and can be discarded. The result of all the hashing then goes into the block's header, where it is combined with the hash of the previous block's header and a timestamp. This combination becomes a part of the cryptographic puzzle. The solution to the puzzle is to find a nonce value. The nonce is a number manipulated by the mining node to solve the hash puzzle, which gives it the right to publish the block [5]. After creation, each block is hashed, thereby creating a digest that represents the block. The change of even a single bit in the block would completely change the hash value. The block's hash digest is used to prevent the block from being changed, since all nodes have a copy of the block's hash and can then check that the block has not been changed.

An additional feature of blockchain systems is that they can run so-called smart contracts [6], which are self-executing pieces of code that fire once certain conditions occur. A smart contract is a computer protocol or a collection of code and data which runs automatically under defined criteria when deployed on the blockchain. The contract executes the appropriate method with the user-provided data to perform a service.
The code, being on the blockchain, is immutable and therefore can be used (among other purposes) as a trusted third party for financial transactions that are more complex than simply sending money between accounts. A smart contract can perform calculations, store information, and automatically send money to other accounts. It does not necessarily even have to perform a financial function. Instead of relying on a central authority to deal securely with other users, blockchain uses an innovative node consensus protocol to verify transactions and record data seamlessly. Because blockchain is a distributed ledger, similar to a database, the data stored must be authentic and accurate. Separate nodes in the network constantly confirm the authenticity of the entries in the chain and reject proposed data blocks if they do not pass verification. Taking control of a large number of nodes in the network (more than 50%) would be extremely complicated and expensive for attackers and is much more difficult than compromising any centralized service. Therefore, this approach to data storage is considered far more secure compared to centralized databases. Theoretically, it is possible to break a blockchain if the attacker controls more than half of the total network power, in which case it is called a 51% attack. But if someone succeeded in changing the blockchain, this would destroy it and hence it would have no value.

2.2 Types of blockchain systems

Blockchain is a technology that is constantly evolving. Due to its basic technological features, new applications are constantly being developed using its framework. The most common types of blockchains are:
- public blockchain;
- private blockchain and
- hybrid blockchain.

2.2.1 Public blockchain

A public blockchain is one that contains absolutely no restrictions. Public blockchains allow anyone to submit data to the general ledger, with all participants holding an identical copy of the general ledger. As there is no single owner of the general ledger, this methodology is more suitable for anti-censorship applications (for instance, Bitcoin). The public blockchain has the following features:
• Anyone can access it, i.e. see all the transactions that appear on the block. Several services allow viewing the public blockchain (known as block explorers), and the most famous is blockchain.info, which can track the Bitcoin blockchain.
• Anyone can make transactions. It is enough to download a mobile or desktop wallet, or to use one of the online wallets, and to carry out transactions freely.
• Anyone can participate in the creation of blocks and in the distribution of the reward that follows for adding blocks. Alternatively stated, anyone can be a "miner".
• Everyone can have a say in deciding whether to amend the cryptocurrency protocol. In some cryptocurrencies, miners make decisions, but there are cryptocurrencies in which other participants have a controlling stake.
• The protocol that controls the system is in the form of open source. Anyone can view this code and anyone can suggest changes to it. If the majority accepts the proposed changes, those changes become an integral part of the protocol. It is also very common to take the protocol from one cryptocurrency, modify it slightly and then launch it as a new cryptocurrency.

The full openness of public blockchains reveals some of their advantages and disadvantages. The main advantages of public blockchains are:
• Blockchain is resistant to potential attacks.
• Because anyone can be a node in a P2P network, the number of these nodes is very large and therefore it is more difficult to have more than 50% "unfair players", which is very expensive. The main disadvantages of public blockchains are: • The capacity of the blockchain is very limited, both in terms of the number of transactions that can be processed per unit time and the amount of data that can be stored in the blockchain. For more people to participate in network maintenance, the requirements must be relatively modest, both in the amount of hard disk space occupied by the blockchain and in the speed of the Internet connection. The Bitcoin network can currently process only a few transactions per second, and by comparison, Visa can process tens of thousands of transactions per second 0. Management mode is inefficient. To implement any, even the slightest changes of the system, it is necessary for the majority of the members of the network to agree to this change. In comparison, this would be roughly how a referendum would be organized for each decision in the country 0. 2.2.2 Private blockchain In private blockchains, only invited participants are allowed to join the network. These networks are controlled by one or more designated network administrators. Private blockchains allow distributed duplicates of the registry, but only to a limited number of trusted users. Because the network may have an owner, this methodology is better for applications that require simplicity, speed, and greater transparency. What companies do not like is the transparency and the fact that the system is accessible to everyone. That is why the idea of a private blockchain has emerged that will retain most of the benefits of the public blockchain, but will also eliminate the disadvantages that do not suit companies 0. 2.2.3 Hybrid blockchain Hybrid blockchains, called consortium blockchains, are considered as semi-decentralized and use the characteristics of public and private blockchains. Hybrid blockchains contain licensing groups similar to private blockchains, but instead of one controlling organization, they are controlled by a group of contracted organizations to ensure the validation of transactions [42]. The consortium blockchain allows a group of individual organizations to validate blocks, instead of having everyone participate in the process, or having only a single entity to decide the validation process. Hyperledger Fabric [38] and Hyperledger Burrow [39] are examples of consortium blockchain frameworks. For the consensus mechanism, the consortium blockchain uses consensus algorithms to validate transactions, such as Byzantine fault tolerance (BFT) and practical BFT (PBFT) consensus through the Tendermint algorithms, which are not expensive computationally. 2.3 Smart contracts A smart contract is a code in a programming language that facilitates the exchange of money, real estate, shares, or any value [39]. Smart contracts are computer programs that automatically execute the terms parties have agreed on, to regulate their relations. This code can be written to a blockchain and executed on any computer in a distributed network. The smart contract is automatically executed when specific conditions are fulfilled. Because the code of the smart contract is written on the blockchain, the execution takes place without any possibility of censorship, interruptions, fraud or intrusion by third parties. We can say that the block on which smart deals are stored is a distributed operating system. 
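To illustrate the idea of code that executes automatically once agreed conditions are met, here is a deliberately simplified Python sketch. It is not Solidity and does not touch a real chain; the flight-delay policy (similar in spirit to the parametric insurance example discussed later in Section 3.11) and all names in it are purely illustrative, and the delay report stands in for data that a trusted oracle would feed onto the blockchain.

```python
from dataclasses import dataclass, field


@dataclass
class FlightDelayPolicy:
    """Toy 'smart contract': pays out automatically if the agreed condition
    (a flight delay of at least `threshold_minutes`) is reported."""
    passenger: str
    premium: float
    payout: float
    threshold_minutes: int
    settled: bool = False
    ledger: list = field(default_factory=list)

    def report_delay(self, delay_minutes: int) -> None:
        # In a real deployment this value would come from an oracle feeding
        # external flight data onto the chain; here it is passed in directly.
        if self.settled:
            return
        if delay_minutes >= self.threshold_minutes:
            self.ledger.append(("payout", self.passenger, self.payout))
        else:
            self.ledger.append(("premium_kept", self.passenger, self.premium))
        self.settled = True


policy = FlightDelayPolicy(passenger="alice", premium=10.0, payout=150.0, threshold_minutes=120)
policy.report_delay(135)
print(policy.ledger)   # [('payout', 'alice', 150.0)] -- settled without any intermediary
```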
In the context of blockchain, smart contracts existed before Ethereum. The American computer scientist Nick Szabo defined the concept back in 1996 in his work Smart Contracts: Building Blocks for Digital Markets. He wrote that the basic idea of smart contracts is that many types of contract clauses can be embedded in the hardware and software we communicate with, in such a way that breach of contract is costly for the offender [6].

3 Application of blockchain technology

Besides the first application of blockchain technology in the creation of cryptocurrencies such as bitcoin and many other altcoins, it is widely used in different domains of daily living, where new innovative ways of working are introduced.

3.1 Healthcare system

One challenging implementation of blockchain technology is in healthcare, involving all stakeholders such as hospitals, healthcare authorities and clinicians, satisfying patient needs and protecting patients' privacy, for example by using blockchain to pay fees with bitcoin [8]. Until now, doctors have faced many obstacles when it comes to digital storage and information sharing for patients. The authors in [10] identified three major barriers to effective digital record management:
- easy access to medical records;
- maintaining the patient's privacy and
- ensuring the sharing of records between different platforms without losing their meaning and proving their authenticity and credibility.

With blockchain technology, all three obstacles are reduced to an acceptable level or eliminated. Firstly, decentralized blockchain databases can be accessed by authorized persons, whether they are healthcare providers, insurance companies or patients, anywhere, anytime, and in a format that can be used by all parties. An additional advantage of such a system would be the elimination of centralized data controllers. Secondly, blockchain technology provides security measures that are not available for other digital sharing methods, making it easier to address privacy concerns. Finally, blockchain technology reduces the number of agents that manage each entry and maintains a consistent, timely record of each transaction, reducing the possibility of error while providing a high level of transparency and trust.

In the current system, where full records are kept on paper, if information seekers need to see a personal health record (PHR), they must complete a request form and send it to the registration office for approval. Upon approval, the information requester pays for a copy at the cashier and receives a payment confirmation receipt. The information seeker then shows the receipt to the registration office to obtain a copy of the PHR. However, the PHR may be lost or duplicated for illegal purposes. When the information seeker sends a PHR request to the publisher (a hospital or healthcare institution) and the publisher agrees with the information requestor, a Bitcoin payment is placed. Before a PHR is sent to the information seeker, the approval of the family doctor and the patient is required, so that only the specific anonymized records required for the patient are sent [8].

Rapke, in his approach to healthcare, argued that perhaps changes should come from a point where people own and can access data about their health [10]. With this working concept, blockchain has a way of bringing this consumer-oriented approach to the healthcare sector. Data from results and procedures can be stored on the blockchain, which will not rely on a single central storage facility.
This will help governments and other companies to be held accountable for these data. At the same time, the data will reside on the latest secure technology using cryptography. As data owners, patients will have the authority to decide with whom they share their data. Healthcare will be more patient-oriented, while still remaining in balance with the other important players in the healthcare system.

In [12] the authors explained how blockchain can enable the interoperable and secure exchange of electronic health records (EHR) in which health consumers are the ultimate owners. The proposed scenario is to store only health and medical metadata on the blockchain. Otherwise, the blockchain's infrastructure storage capacity would have to be massively expanded to support full health records. So, metadata such as patient identity, visitor ID, provider ID, customer ID, etc. can be stored on the blockchain, but the actual records should be stored in a separate universal health cloud, as shown in Fig. 2.

Figure 2: Overview of the Blockchain Healthcare System [13].

For instance, if a patient visits two hospitals today, they will store their data in two databases that the patient does not own. If the hospitals need to communicate, they will use a standardized communication mediator such as web services, e-mail, or a shared file repository. In a scenario where blockchain is applied, the first hospital creates a record in the universal health cloud. The hospital then creates a transaction in the blockchain with metadata for the visit and a URL (Uniform Resource Locator) for the cloud entry. The patient signs this transaction with his/her key. When the patient now visits a second hospital, he/she must provide his/her key for reading the blockchain transactions. Only those authorized by the patient's key can decrypt the transactions. So this is an example of how people can own data and approve access. Even smart contracts can be encoded in blocks to carry insurance instructions, emergency contacts, wills etc. These smart contracts will be activated by events that a blockchain can read from another web service.

Transactions in blocks should contain a unique user identifier, a coded link to the health record, and a timestamp for the time when the transaction was made. The transaction may also contain the type of stored data. Depending on the implementation, this can help search and process the access data. This blockchain will contain a history of all medical data, including formal medical records, as well as data from mobile applications and various sensors being carried. Keeping data away from the blockchain in a cloud pool can be a good basis for exploration, mining, analytics and machine learning. This type of analysis does not have to affect the privacy of each patient. These data have to be encrypted and digitally signed to ensure the privacy and authenticity of the information. The user will have the option to assign a set of access permissions and specify who can request and write data to their blockchain. Further development of user interfaces for patients to review their healthcare data and manage access privileges is entirely possible [14].

Another blockchain case study in healthcare uses Ethereum's smart contracts to create representations of existing medical records [15]. These contracts are stored directly in separate nodes on the network. The proposed solution, called MedRec, structures large amounts of data into three types of contracts. The first one is the Registrar Contract.
It holds the identity of the participants with all the necessary details and, of course, the public keys. This type of identity registration can be restricted to authorized institutions only. The second contract covers the relationship between the healthcare provider and the patient. Its main use is when there is a smart agreement between the care provider and the patient. A third type is a contract that helps the patient to locate his/her medical history. As a result of this agreement, all previous and current engagements with other nodes in the system are listed. MedRec also proposes a data mining model that incorporates the entire healthcare community into data mining. Medical researchers and healthcare stakeholders can work in the network.

Big data in healthcare come from a variety of sources, such as clinical trials, EHRs, patient databases, medical measurements and imaging. All these data come in a wide range of formats and from different data streams. The data should be evaluated and interpreted temporally to benefit patients. But doctors need new tools to monitor, track, and provide quick feedback to individual patients. Appropriate data management will also help with prediction strategies, interventions, healthcare services and healthcare policies. Because medicine is always on the cutting edge of technology, moving forward with many innovations, healthcare data need a crucial transformation, as do big data.

3.2 Decentralized protection of personal data

Social networks constantly collect personal data, activities and behaviours of users, without the users being aware of the privacy issues. Users still do not have a clear idea of precisely what data are being collected and for what purpose; they lose all control over what happens with the data afterwards and cannot withdraw the permissions. Concerns about data privacy grow when faced with the consequences of what others have seen or learned about us.

Zyskind in [16] has proposed a solution that is an access control management system, mainly focused on mobile platforms and the inability of the user to revoke authorized access to private data. By installing a mobile application, permissions are granted indefinitely, and the users must uninstall the application and stop using the services if they want to revoke the access. The purpose of the new solution is for the user to be able to control and revise what data are stored and how they are used. The idea is to keep the personal data access policies on the blockchain and then have the nodes in the blockchain control access to the DHT (distributed hash table). The solution consists of three entities: the user, the company providing the service and the blockchain. When a user wants to allow or block access to his/her data, the blockchain comes into play as an intermediary. Here the blockchain supports two types of transactions: access transactions and data transactions. These types of transactions allow access control management, data storage, and data extraction. When the user installs a new application, a shared identity is created and sent to the blockchain along with the permissions configured at the user's request. All assigned permissions are listed in the so-called policy. The shared keys, the public key of the user and the public key of the service, and the policy are sent through an access transaction in the blockchain. A new complex identity is introduced in the proposed system.
Complex identity is a shared identity between the user and the service. The user is the owner of the key, and the service is a guest. The complex key is composed of signing key pairs for both sides, so that the data are protected from all other parties in the system except the owner and all his/her guests. Sensitive user data are encrypted with the shared encryption key and sent with a data transaction for storage. The blockchain sends the data to a DHT and retains only the hash value as a pointer to the data. The value stored in the DHT is encrypted with the complex key. The value pointer is known to the user and the service. The DHT only performs the already approved read and write operations. Both the user and the service can look up the data using the data pointer. Each time the service accesses the data, its permissions are checked against the last access transaction. The user can revoke permissions at any time or modify them by initiating a new access transaction. To track this, it is possible to easily develop a web table showing the user's current permissions. An overview of the decentralized permission system using blockchain is shown in Fig. 3.

Another study, based on social media controlled by users on the blockchain, is described in [17] by Ushare. Ushare's vision is that users should control their presence on the Internet by controlling the posts they share and the ability to re-share them. Using P2P capabilities, Ushare created a decentralized content distribution network. The resource that the blockchain manages, in this case, is the data that users publish. Ushare's proposed solution consists of a user-encrypted hash table, a system for controlling the maximum number of shares performed by user circles, a personal certificate authority (PCA) that manages user circles, and the blockchain. When a user shares a post with his/her circle, his/her PCA encrypts the data with the public key of the circle. The encrypted data are stored in a DHT. This table has three columns that allow users to re-share already published data they see. Each time a user shares a post, the first column stores the hashed encrypted data of the post that is being viewed and shared. The second column records the hash of the data encrypted with the public key of the user's circle. The third column stores the encrypted data item. The reason for using a DHT in this second solution is the same: large data such as documents, images and videos should be stored in a decentralized manner. The blockchain only stores transactions for sharing user posts. The actual data cannot be stored because downloading the entire chain to all nodes would create limitations in computer calculations and time constraints. When a user creates a post, he/she sends a new transaction to the blockchain with his/her identity, the hash key of the encrypted data and a token in which he/she states the allowed number of shares. Afterwards, the user sends a separate transaction to each member of his/her circle with the data encryption key. If another user who has received the post wants to share it, he/she sends a new transaction with his/her identity and the data key encrypted with the key of this new user's circle. Again, more new transactions are sent to the subsequent users who can review the re-shared transaction. The number of tokens decreases with each action. All the efforts to create these two blockchain solutions stem from the fact that personal and sensitive data should not be entrusted to third parties.
Because users create and publish data, they need to remain the main owners of the data. Regarding the monitoring done by following the procedures and interests of the users, the users should at least know that. A blockchain can be a filter for permissions to access private data or it can implement a fully decentralized social network, as shown in the second solution. 3.3 Digital property The World Wide Web (www) started with simple links and laid the groundwork for making it visible what is original and what is being copied. After that, people thought of a way to preserve authorship by mentioning the author in references. But this system is far from perfect. People can always find a way to copy things, and the author will not know and will not be notified. Or, people may assign the copyright, but it's in one direction, so the author still won't know that someone has addressed his work. Or even worse, people may cite someone else who is not the original author. It is concluded that the main tool for digital content and digital sharing (www) ignores the need for digital ownership. But at the beginning of the development of the service www was different. The Xanadu Project was the first hypertext project, founded in 1960 by Ted Nelson [18]. Even then, the problem of digital ownership was approached by introducing a copyright scheme based on copyright, in a system that would provide storage and publishing services. The granting of copyright is built into this system. Two-way connections should be set up automatically each time someone uses another user's data. It was concluded that it was a complicated, unfeasible technology and the project was closed. With blockchain technology, Ascribe [19] tries to achieve Xanadu's goals by finding a solution for the digital property registry and copy visibility. In terms of visibility, they try to find all the copies of the protected content that exist on the Internet. This can be done by searching the entire Internet and performing a match to match the content of the creator. This problem would be solved by machine learning techniques. When the copies are found, the system performs two-way automatic connections. The author must then decide whether to apply for a license or perhaps a revocation request. When it comes to selling intellectual property of digital art, it is not just selling a copy, it is selling property and the right to use, modify or resell the content. To make this type of sale of a property, it is necessary to conclude a legal agreement, hire a lawyer, etc. The idea of using blockchain to store and sell digital data ownership will be as simple as sending a signature e-mail that the user transfers ownership of its content. The terms of service provided by Ascribe are made in consultation with specialized lawyers. The complexity of legal licensing and ownership processes is achieved by accepting the terms of service. Blockchain is a publicly trusted book and will provide copyright to all users. Transaction time stamps can be used as evidence in court in the event of a property dispute. Regarding blockchain implementation, Ascribe has created its protocol called SPOOL -Secure Public Online Ownership Ledger [20]. This protocol is made specifically for documenting digital ownership transactions. Ascribe allows the artist to set a fixed number of editions of the work that can then be transferred, ensuring that each edition is authentic to the artist. 
So, when a transfer transaction is made, the user can transfer ownership of one or more editions. The editions belong to a work preserved in BigchainDB [21] for use in the blockchain. When transferring ownership, the user creates a transaction for one of his/her works that includes the hash value, the edition and the new owner. The user signs the transaction and sends it. Because Ascribe uses the Bitcoin blockchain, anyone who knows the hash of the work can publicly track its ownership and find all the addresses that each edition has.

Another blockchain implementation is Monegraph, which is proof that a new, modern, digital marketplace that is easy for users can be built. From the users' point of view, with Monegraph it is easy to buy and sell fully licensed digital media directly, with terms, rights and prices controlled by the authors. Monegraph facilitates the process of licensing and receiving revenue from art in many digital forms created by photographers, designers, illustrators and other media creators [24]. Monegraph allows authors to create and customize a license agreement that sets out the usage parameters for their media. There are four types of licenses:
- artwork license, for non-commercial usage;
- news photo license, for editorial usage;
- product image license, a commercial license and
- situational image license, which grants all rights.

The authors have a public catalogue as a portfolio of their work that is publicly available and for sale. The blockchain keeps the history of ownership. But storing only ownership information may not be enough in the market. There are a lot of data in sales contracts and product metadata that are just as important as the product itself. Due to their size, these data cannot be included in the blockchain. That is why Monegraph sees the need for a blockchain system that can be implemented in other digital markets. What is needed for the solution is integration with other services. So, ownership data are stored in the blockchain to remain confidential, traceable and irreversible. But other documents related to the product can be stored in a database such as MongoDB or CouchDB. Documents can be public, and encrypted or not. The digital art itself can be stored in a document repository that can be accessed with HTTP or P2P. For example, the file can be stored in the Amazon Simple Storage Service (S3). The ecosystem, shown in Fig. 3, needs a way to connect the blockchain, the document repository, and the digital art repository.

The Bitmark property transfer system allows the transfer of both digital and physical objects [22]. When storing information about physical objects as an abstract property, a problem arises. A digital fingerprint can easily be computed for digital works using a hash algorithm, but when it comes to physical objects, there is a new solution called ObjectMinutiae [24]. It can be concluded that, so far, the Internet has left a lot of material with lost authorship. But looking to the future, blockchain can provide reliable proof of ownership by preserving the hash value of digital art in a timestamped transaction. The concept is the same in all examples: the owner has the private key and the original copy of the hashed art. No one can prove otherwise and no institution can change the data. The author can sell the digital work, and the new owner will then have the private key to the digital work.
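As a rough illustration of how such an ownership register can work, the following Python sketch keeps an append-only list of edition transfers keyed by the work's hash and replays it to answer who currently owns an edition. It is only loosely inspired by the ideas behind SPOOL and Bitmark described above and does not implement either protocol; the class, its methods and all names are hypothetical.

```python
import hashlib
import time


def fingerprint(artwork_bytes: bytes) -> str:
    """Digital fingerprint of the work: its SHA-256 digest."""
    return hashlib.sha256(artwork_bytes).hexdigest()


class OwnershipRegistry:
    """Toy ownership ledger: an append-only list of edition transfer records."""

    def __init__(self):
        self.transfers = []

    def register(self, work_hash: str, artist: str, editions: int) -> None:
        # The artist declares a fixed number of editions of the work.
        for edition in range(1, editions + 1):
            self.transfers.append({"work": work_hash, "edition": edition,
                                   "from": None, "to": artist, "time": time.time()})

    def transfer(self, work_hash: str, edition: int, seller: str, buyer: str) -> None:
        if self.owner_of(work_hash, edition) != seller:
            raise ValueError("seller does not own this edition")
        self.transfers.append({"work": work_hash, "edition": edition,
                               "from": seller, "to": buyer, "time": time.time()})

    def owner_of(self, work_hash: str, edition: int):
        current = None
        for t in self.transfers:   # replay the history to find the current owner
            if t["work"] == work_hash and t["edition"] == edition:
                current = t["to"]
        return current


registry = OwnershipRegistry()
art = fingerprint(b"...bytes of the digital artwork...")
registry.register(art, artist="ana", editions=3)
registry.transfer(art, edition=1, seller="ana", buyer="marko")
print(registry.owner_of(art, edition=1))   # marko
```

On a real blockchain, each entry in `transfers` would be a signed, timestamped transaction, which is what makes the ownership history usable as evidence.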
3.4 Internet of Things (IoT) Current IoT systems rely on a centralized client-server architecture. All devices are identified, authenticated and connected via cloud servers. The connection between the devices goes through the Internet. Even if this solution works well, for now, it may not be able to meet the needs of larger IoT systems in the future [25]. Blockchain technology can become an ideal component and a basic element for tracking billions of connected devices, processing transactions, and coordinating devices. Because blockchain is a decentralized mechanism, it allows the distribution of digital information between different nodes without copying. This can create a new kind of Internet that is safe and resistant to unauthorized modification. A secure and distributed blockchain system can serve as a platform for the IoT to connect devices seamlessly and securely. Blockchain will allow P2P messaging, file distribution, and autonomous coordination between devices without the need for a centralized cloud. Each device will have its role and will manage its behaviour in the new Internet in a decentralized and autonomous way [26]. The user controls IoT devices from a central point. The central point may be the user's mobile device. All activities, commands and rules are set by the user. Although this is good for personal control, it is not automated in many ways. The secure model of blockchain technology allows easy human interaction with devices without the need for a central cloud-based central system, which is usually more expensive. Additionally, because there will be no central control, such as a cloud, the possibility of damaging or stopping the entire IoT system is negligible. This ensures continuity, ease of operation, robustness, scalability and IoT security at a very low cost. A real revolution can happen if all devices are controlled by blockchain instead of direct user control. This is possible by using smart contracts. A smart contract is a code which enforces conditions and business rules that must be met before a transaction can be included in the blockchain. A transaction written in blockchain can be more complex than just transferring ownership. Smart contracts have an integrated mechanism for implementing different types of agreements between nodes. The smart deal is also autonomous and technically it is a computer code that can be self-sustaining and self-executing. Once in force, no human factor is needed to control it [27]. Execution of smart contracts is made possible by Ethereum, which is a platform for creating blockchain systems. Ethereum has its network of nodes and miners, just like Bitcoin. Slock.it is the first implementation of IoT and blockchain using the Ethereum platform [28]. The so-called Slocks are real physical objects that can be controlled by blockchain. They use a computer, Ethereum, which is a piece of electronics that brings blockchain technology into the whole home, allowing to rent access to any compatible smart object and accept payments without intermediaries. Slock.it allows anyone to rent, sell or share anything -without intermediaries. It works as follows: the owner of a smart object (Slock) creates a smart agreement for its use by setting the price and the deposit. Users can find Slock and then pay using Ethereum blockchain, which will allow them to open or close that Slock, which means using it as agreed. The smart contract is implemented automatically, and the deposit is returned to the user minus the rental price. 
In practice, this means that by installing a smart door lock for the apartment, users can rent an apartment on the blockchain. The smart contract will unlock it and make it available, as stipulated in the contract. In addition to smart doors, this system allows you to rent, sell or share any smart item that has built-in Slock.it technology. Bicycles, cars and any item that can be provided with a physical lock is a case of potential use. Due to the shortcomings of existing IoT architectures, a new secure, private and easy-to-use IoT architecture for a smart home based on the blockchain technology is being proposed [29]. Here, block mining is considered the first problem because IoT devices are devices with limited resources and cannot perform such an operation. IoT devices must also operate at the same time for a detected or assigned command. The time required for mining in blocks, in most cases, may not be acceptable. The proposed solution [29] contains three levels: local network, folding network and cloud storage. Informatica 45 (2021) 319–333 327 The local network contains all the smart home objects and a local computer that acts as a local blockchain that is constantly on the Internet. This local blockchain is centrally managed by its owner. When there is a new smart device in the home, the user adds it to the blockchain. All transactions related to a particular device are linked to a chain. The folding network is a P2P network that connects multiple smart homes and users. This network manages public keys, allowing users access to smart home data and public keys to smart homes that provide accessible data. Cloud storage is included as a solution for devices that may want to store data in the cloud so that a third party can access the data and provide certain smart services. Using blockchain, the IoT can switch to a network of devices that can communicate with each other and the environment without human intervention. The devices will also make smart decisions, so that many workflows will be automated in new ways, achieving significant time and cost savings. 3.5 Digital identity Due to a security mechanism that protects against disruption, blockchain can play a key role in securing digital identities 0. Blockchain can protect identity by encrypting. Also, blockchain can be used to build a very strong, secure and impenetrable identity system, which can prevent unauthorized activity. Blockchain technology has the potential to replace all existing physical identities and move them to a digital platform. Identity ranges, such as passports, driver's licenses, ID cards, and even votes election, can be digitally generated using blockchain technology [30]. It is also possible for all identities to be kept together and secured with a blockchain. The use of blockchain, unauthorized modification of various certificates, such as educational, marital, death certificates or registry books, cannot be performed, thus preventing unauthorized and malicious modification. 3.6 Financial services and infrastructure Blockchain technology can provide a platform for better financial services and payment opportunities. The use of cryptocurrencies such as bitcoin can restore existing payment systems and other financial services [31]. For example, if one person sends money to others in another country, possible transfer funds are banks, payment applications (such as PayPal) or other intermediary organizations. But their service costs are high, even for small transactions. 
All of these intermediaries can be eliminated and the money can be transferred directly from the sender to the recipient using cryptocurrencies such as bitcoin, without the involvement of an intermediary. Tracking transactions and property rights can also be implemented in the financial sector using blockchain. The use of blockchain technology in the financial sector will not only provide a free payment system but will also provide a secure way to conduct online transactions.

3.7 E-commerce

If implemented properly, blockchain technology can massively support e-commerce and retail in terms of growth, sales and marketing. Retail has already begun to witness growth and profit from the sale of consumer goods and services using blockchain technology. The Internet acts as a great platform for promoting local businesses and other content, but there is always a risk that the content will be used without proper permission. Plagiarism can be limited with blockchain timestamping techniques, thus preserving the originality of any content. Implementing blockchain technology in the retail industry will also provide a clear and transparent supply chain management system that will allow users to gain insight into the origins of their food and other products. E-commerce websites and other companies such as OpenBazaar, Provenance, Everledger, Ascribe, and BlockVerify are some of the blockchain-supported business activities involved in the retail industry [30]. This eliminates brokers and commissions and can set up a direct transaction channel between buyers and sellers. In this way, online sales will promote retail business transactions and the economy.

3.8 Educational records

Sony Corporation, in partnership with Sony Global Education, recently applied for a patent for a blockchain-based repository that would include student records, including completed courses, test scores, diplomas, and more, in the form of a digital record. Such a system can allow teachers and students to access relevant data while maintaining privacy. It can also provide potential educational institutions and potential employers with a transparent, secure place to acquire applicants' credentials.

3.9 Education system

In this time of pandemic, when teaching in schools takes place online, many problems arise. Classes are shortened and the teacher has to spend valuable time constantly checking which students are present in the class. Also, the principal of each school wants to know whether the teaching staff hold their classes regularly and on time. These problems could be overcome if the data from each class were recorded on a blockchain. In that way, no one would be able to manipulate those data. It could be checked at any time which teacher taught and when, and which students attended the class. If students knew that everything was recorded and the data could not be changed, they would surely attend classes more regularly, and that is the most important condition for them to improve their success. Improved student achievement is the best satisfaction for teachers. Parents would also be more satisfied with such behaviour of their children. The whole society would benefit the most from a good education, because knowledge is strength and power.

3.10 Sharing knowledge

Everipedia, the world's first blockchain encyclopedia, has announced plans to build a new open-source wiki network that decentralizes the Wikipedia knowledge base, allowing any editor to become a webmaster.
According to Larry Sanger, one of the founders of Wikipedia and now chief information officer at Everipedia, the platform will rely on a blockchain program that allows users to contribute more responsibly. Editors will start earning "tokens" based on their "IQ" (points earned for useful contributions) that will represent virtual shares of the platform [35]. Theodor Forselius, co-founder and CEO of Everipedia, noted that it is an exciting idea that individuals can become stakeholders in the encyclopedia they edit and in turn gain real monetary value. Contributors will have to pay a deposit before making changes. If their changes are judged incorrect, they lose the tokens; those whose changes are correct receive their original deposit back and additional tokens as a reward. Forselius also listed two additional benefits of moving Everipedia to the blockchain. The first benefit is that the data will no longer be stored on a centralized server, which means that it will survive even if the central organization, Everipedia, ceases to exist. The second benefit is that it will be impossible to censor the data, which means that governments that currently censor Wikipedia will not be able to prevent users from contributing to the platform.

3.11 Insurance

Blockchain technology has many applications in the insurance sector, because the sector is based on agreements and trust between two parties. The use of blockchain in this regard would mean that both the insurance contract and the consumer's personal information can be stored in the distributed ledger, while the consumer controls who has access. The data remains on the user's device, and this can eliminate the need for brokers and other intermediaries between insurance companies and consumers. The presence of smart contracts is already being felt in the insurance sector. The insurance company AXA launched a service called Fizzy, which insured consumers against flights delayed by two or more hours 0. Fizzy recorded airline ticket purchases on the Ethereum blockchain and linked the resulting smart contracts to global air traffic databases. If a sufficient delay was observed, the compensation was paid automatically. Another example of using smart contracts in the insurance sector is compensating farmers in the event of a drought or other disaster that harms their property. A further application of blockchain in the insurance industry is its role in potentially reducing fraud 0. In this case, the blockchain records all policies and all claims in a single distributed ledger.

3.12 Food industry

This technology is potentially beneficial for everyone in the food industry. When a foodborne illness occurs, restaurants serving food or grocery stores often have difficulty discovering the source of the contamination. Monitoring with blockchain will help to immediately trace affected items and their origins and to quickly locate the problem, so that contaminated products can be removed from menus, shelves and supply chains 0. The standards and reputation of blockchain-based vendors will ensure the integrity of marketing claims. Certificates and audit reports for the facilities involved will be registered on the ledger to substantiate such claims. If all suppliers in the supply chain follow these rules and record data on the origin of food in decentralized monitoring systems, those who make false claims or misrepresent the origin of their products will be exposed.
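To make the traceability argument above concrete, the following sketch chains hashed origin records so that altering an earlier record breaks every later link. It is a minimal illustration under assumed record contents: the producers, batches and helper functions (record_hash, append_record, verify) are hypothetical and do not correspond to any specific vendor system mentioned in this survey.

```python
import hashlib
import json

def record_hash(record):
    # Deterministic hash of one provenance record (producer, batch, processing step, ...).
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def append_record(chain, data):
    # Each new record stores the hash of the previous one, forming a tamper-evident chain.
    prev = chain[-1]["hash"] if chain else "0" * 64
    record = {"data": data, "prev_hash": prev}
    record["hash"] = record_hash({"data": data, "prev_hash": prev})
    chain.append(record)
    return chain

def verify(chain):
    # Recompute every link; any edited record breaks the chain from that point on.
    prev = "0" * 64
    for rec in chain:
        if rec["prev_hash"] != prev:
            return False
        if rec["hash"] != record_hash({"data": rec["data"], "prev_hash": rec["prev_hash"]}):
            return False
        prev = rec["hash"]
    return True

chain = []
append_record(chain, {"step": "harvest", "producer": "Farm A", "batch": "B-17"})
append_record(chain, {"step": "processing", "plant": "Plant X", "batch": "B-17"})
print(verify(chain))                          # True: untouched history
chain[0]["data"]["producer"] = "Farm Z"       # a falsified origin claim...
print(verify(chain))                          # ...is immediately detectable: False
```

Verification fails as soon as any recorded claim is edited, which is the property that makes false origin claims detectable along the supply chain.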
Blockchain technology allows farmers and producers real-time access to commodity prices and market data. In this way, farmers are better informed about the market and can be more competitive and more productive. More and more people want to know what the products they consume contain. They want to be able to make reliable food choices for themselves, their families, and their communities. Blockchain technology will help build that trust.

3.13 Accounting and audit

Because changing transactions or whole blocks is almost impossible in a blockchain, the use of blockchain technology makes it easy to prove the integrity of electronic files. One approach is to generate hash values for existing evidence, such as invoices. Such a hash value is a digital fingerprint of the file. This fingerprint is immutable and is recorded on the blockchain through a transaction. At any later time, it is possible to prove the integrity of the file by re-creating the digital fingerprint and comparing it to the fingerprint stored in the block. If the fingerprints are identical, it is proved that the document has remained unchanged since it was first anchored in the blockchain [34]; the provenance sketch above illustrates the same hash-and-verify principle. Companies will benefit from this type of blockchain implementation. Standardization will allow financial auditors to automatically verify much of the most important financial reporting data. The cost and time required to carry out an audit will be significantly reduced, and the time freed up can be spent on checking very complex transactions or internal control mechanisms.

3.14 Cross-border payment

Cross-border payments generally refer to the transnational and transregional transfer of funds between two or more countries or territories through international trade, international investment, and other international claims and debts, using certain settlement instruments and payment systems. Traditional cross-border payment is based on the banking system and is time-consuming and costly, ties up funds, and offers comparatively low security. However, all these bottlenecks can be effectively overcome by applying blockchain to reconstruct the credit system and expand the payment boundary. Researchers have pointed out that applying blockchain technology to cross-border payment has a high potential effect. Holotiuk et al. [45] stated that blockchain technology will improve the payment system by providing a solid structure for cross-border transactions, removing expensive intermediary costs, and gradually weakening or altering the business models of the existing payment industries. Yao and Zhu have proposed that blockchain technology can be adopted for cross-border payment, based on the VISA and SWIFT blockchain applications. R3 has been working with 22 of its member banks to build a real-time, cross-border payments solution on Corda, the consortium's "blockchain-inspired" distributed ledger [46].

3.15 SARS-CoV-2 virus vs. the Red Cross: better solutions via blockchain and artificial intelligence

Beijing has ordered all public donations for the Wuhan crisis to be funnelled to five government-backed charity organizations. This is a throwback to pre-2016 China, before the Charity Law of China was introduced to enable the establishment of private charities. The Charity Law was intended to develop the charity field and protect the interests of relevant stakeholders.
Although all charities in China are required to have in place sound internal governance structures, the funnelling order implicitly assumes that the five government-backed charities are fit for purpose and better able to manage the current crisis. That assumption may be at odds with historical and more recent evidence suggesting organizations responsible for responding to crises appear to struggle to manage their core responsibilities. And if Beijing’s implicit assumption is wrong then the centralizing effect produced by funnelling merely serves to compound the problem [46]. In this instance, and not for the first time, the Red Cross in China is in the crosshairs of public anger. ‘One of the lessons learned was that emergency response must be better developed at the local level’.This is what the Red Cross said in 2017 on the 10th anniversary of the deadly Wenchuan earthquake in Sichuan province in western China. Billions of dollars had been donated following the Sichuan earthquake but had been ‘mishandled’. What has been learned? The public in China has again been angered by the mishandling of donations, and this impacts on the willingness to donate, which retards the objective of addressing a problem. Blockchain and AI are now in frequent use by global technology companies and represent tools that can be used to better manage crises. A private blockchain network would enable the recording and tracking of anything that is donated, from donation to N95 masks. It also creates clear points at which it is possible to hold a person or organization to account, from the loading of donations for delivery through to its final end-use. Importantly, the blockchain can also be given public visibility, providing transparency to all stakeholders -donors and donees, as well as public oversight bodies. Anyone could track the progress and use of their donation. 3.16 E-voting The use of blockchain technology would prevent any participant from fraud, from the voters themselves to the vote counters in conducting the election. Blockchain technology would make sure that the individual could not vote several times because there is an unchanging record of their voice and identity. Also, no one could delete votes because, as has been said, blockchain is immutable. Those responsible for counting votes will have a final record of the number of votes that regulators or auditors can control at any one time. The results can be encrypted, which would enhance transparency while maintaining a key sense of privacy. The results entered and saved in the blockchain are not only immutable and transparent but also available. This means that voting with blockchain is more efficient than traditional voting 0. Blockchain technology can also be used to improve voting processes in public and private companies and organizations. 4 Key challenges in the implementation of blockchain technology Implementation of blockchain in all industrial sectors will potentially lead to several issues as well as new dependencies [44]: Awareness and understanding. The principal challenge associated with blockchain is a lack of awareness of the technology, especially in sectors other than banking, and a widespread lack of understanding of how it works. This is hampering investment and the exploration of ideas. Organization. The blockchain creates the most value for organizations when they work together on areas of shared pain or shared opportunity – especially problems particular to each industry sector. 
The problem with many current approaches, though, is that they remain stove-piped: organizations are developing their blockchains and applications to run on top of them. In any one industry sector, many different chains are therefore being developed by many different organizations to many different standards. This defeats the purpose of distributed ledgers, fails to harness network effects and can be less efficient than current approaches. Culture. A blockchain represents a total shift away from the traditional ways of doing things – even for industries that have already seen a significant transformation from digital technologies. It places trust and authority in a decentralised network rather than in a powerful central institution. And for most, this loss of control can be deeply unsettling. Cost and efficiency. The speed and effectiveness with which blockchain networks can execute P2P transactions come at a high aggregate cost, which is greater for some types of blockchain than others. This inefficiency arises because each node performs the same tasks as every other node on its copy of the data in an attempt to be the first to find a solution. Regulation and governance. Regulations have always struggled to keep up with advances in technology. M. Mijoska et al. Indeed, some technologies like the Bitcoin blockchain bypass regulation completely to tackle inefficiencies in conventional intermediated payment networks. One of the other challenges of the blockchain approach, which was also one of its original motivations, is that it reduces oversight. Security and privacy. While cryptocurrencies like Bitcoin offer pseudonymity (Bitcoin transactions are tied to ‘wallets’ rather than to individuals), many potential applications of the blockchain require smart transactions and contracts to be indisputably linked to known identities and thus raise important questions about privacy and the security of the data stored and accessible on the shared ledger. 5 Conclusion and further work Blockchain is as much a political and economic hypothesis as a technological one. Blockchain technology provides a new way to think about how we agree on things. For the first time, multiple untrusted parties can create and agree on a single source of truth, without the use of a middleman. The advance of blockchain technology claims and has the potential to revolutionize many areas of human activity, especially the financial and business worlds. Other applications of blockchain technology discussed previously are equally important. Some of the displayed applications are already implemented, and some can be very easily implemented soon. A major area of future work will focus on implementing blockchain technology for use in EHRs. It will consider how an external stakeholder can use or request a patient’s health records from the hospital or health authority without violating patient privacy. The technology can also be used to maintain birth registers for births, deaths, marriages, business registrations, but also to enable the right to vote in elections without the physical presence of the polling station and to prevent fraud during the voting process. With the application of decentralization enabled by Blockchain technology, stock market fraud would no longer be possible, preventing the emergence of issuing more shares than the presented number, and companies will not be able to hide the profit. Large companies have a monopoly on electricity sales. 
But with the help of blockchain, everyone can be involved in the production of electricity through the possibility of micro-transactions. The surplus electricity produced by solar panels in an individual house can be sold to someone who needs it, and the transaction can be done through a crypto-wallet. The application of blockchain in smart contracts allows executing certain simple contracts when all the conditions are met. This innovation can have enormous implications for any kind of agreement that has the potential to digitize output. Without control and supervision, smart contracts can revolutionize the work done by lawyers. Crowdfunding is a very interesting concept that shows that people want to invest in new products and ideas and contribute to their development. Using smart contracts would make it impossible to sell the same things over and over again, which would give a new dimension to this type of financing. Another application of blockchain technology is the platform for predicting market movements. When asking a particular question, many people answer it. It is known that 100 thousand average intelligent people will give a better answer to the question than one very intelligent person on the same topic. Such systems provide accuracy and reward if you have given an answer that turned out to be accurate. Such applications already exist -the Augur platform, on which in just a few dollars, a question can be asked, and there is no cost for agency intermediaries, and the answer is always correct and unchanging 0. With the numerous blockchain technology applications, it should be noted that it requires a lot of resources and is currently not the most cost-effective option for achieving most of these technological advances. It is about spending huge amounts of computing power to implement and verify the information. Blockchain technology is trying to enter revolutionizing many of today's systems, enabling the development of many previously unimaginable technologies. An interesting offspring of this technology can be the redefinition of Internet 3.0 defined as a new type of decentralized infrastructure or network of networks. References [1] Satoshi Nakamoto (2008). “Bitcoin: A Peer-to-Peer Electronic Cash System”, https://doi.org/10.2139/ssrn.3440802 [2] Dragana Tadic Živkovic (2018). “Blockchain technology: opportunity or a threat to the future development of banking”, Proceedings of Ekonbiz, http://www.ekonbiz.ues.rs.ba/ojs/article/view/139.h tml (last accessed 06/10/2021) [3] Swan M. (2015). “Blockchain: Blueprint for a new economy”, https://www.goodreads.com/work/best_book/44338 116-blockchain-blueprint-for-a-new-economy (last accessed 06/10/2021). [4] Li Zhang, Yongping Xie, Yang Zheng, Wei Xue, Xianrong Zheng, Xiaobo Xu (2020). “The challenges and countermeasures of blockchain in finance and economics”, John Wiley & Sons, Ltd. https://doi.org/10.1088/1755-1315/825/1/012017 https://doi.org/10.6028/NIST.IR.8202 [5] Julija Basheska, Vladimir Trajkovik (2018). “Blockchain based Transformation in government: review of case studies”, ETAI 2018, http://www.etai.org.mk. (last accessed 06.10.2021) [6] Nick Szabo (1997). “The idea of smart contracts”. Nick Szabo’s Papers and Concise Tutorials. https://fon.hum.uva.nl/rob/Courses/InformationInSp eech/CDROM/Literature/LOTwinterschool2006/sza bo.best.vwh.net/idea.html (last accessed 06.10.2021) [7] J. Garay, A. Kiayias, and N. Leonardos (2015). “The Bitcoin Backbone Protocol: Analysis and Applications”, pp. 281-310, Springer Berlin Heidelberg. 
Informatica 45 (2021) 319–333 331 https://eprint.iacr.org/2014/765.pdf (last accessed 06.10.2021) [8] A. Gervais, G. O. Karame, V. Capkun, and S. Capkun (2014). “Is bitcoin a decentralized currency?”, IEEE, Security Privacy, vol. 12, pp. 54­60. https://doi.org/10.1109/MSP.2014.49. [9] Pinyaphat Tasatanattakool, Chian Techapanupreeda (2018). “Blockchain: Challenges and Applications”, 978-1-5386-2290-2/18/$31.00 ©IEEE. https://doi.org/10.1109/ICOIN.2018.8343163. [10] Roderick Neame (2013). “Effective Sharing of Records and Maintaining Privacy”, Online Journal of Public Health Informatics, https://doi.org/10.5210/ojphi.v5i2.4344. [11] Tal Rapke, MD (2016). “Blockchain Technology & the Potential for Its Use in Healthcare. https://oncprojectracking.healthit.gov/wiki/downloa d/attachments/14582699/24­ Blockchain%20Technology_Tal%20Rapke%20MD .pdf?version=1&modificationDate=1474475152000 &api=v2 (last accessed 06/10/2021). [12] Nitesh Gupta, Anand Jha, and Purna Roy (2016). “Adopting Blockchain Technology for Electronic Health Record Interoperability”. https://doi.org/10.1109/ColComCon.2018.8466733. [13] Elena Karafiloski (2017). “Blockchain Solutions for Big Data Challenges”, IEEE EUROCON. https://doi.org/10.1109/EUROCON.2017.8011213. [14] Laure A. Linn, Martha B. Koo, M.D. (2016). “BlockchainForHealthDataandItsPotential Use in Health IT and Health Care Related Research”. https://www.healthit.gov/sites/default/files/11-74­ ablockchainforhealthcare.pdf. (last accessed 06/10/2021). [15] Ariel Ekblaw, Asaph Azaria, John D. Halamka, MD, Andrew Lippman (2016). “A Case Study for Blockchain in Healthcare: “MedRec” prototype for electronic healthrecordsandmedical researchdata”. https://doi.org/10.1109/ICOIN.2018.834316310.11 09/ICOIN.2018.8343163 [16] GuyZyskind, Oz Nathan andAlex ’Sandy’ Pentland (2015). “Decentralizing Privacy: Using Blockchain to Protect Personal Data”, Security and Privacy Workshops (SPW), IEEE. https://doi.org/10.1109/SPW.2015.27 [17] Antorweep Chakravorty and Chunming Rong (2017). “Ushare: user controlled social media based on blockchain”, International Conference on Ubiquitous Information Management and Communication. https://doi.org/10.1145/3022227.3022325 [18] Roy Rosenzweig (2001). “The Road to Xanadu: Public and Private Pathways on the History Web”, The Journal of American History, 88.2. https://chnm.gmu.edu/digitalhistory/links/cached/int roduction/link0.27a.pathwaysonhistweb.html (last accessed 06.10.2021) [19] Trent McConaghy and David Holtzman (2015). “Towards an Ownership Layer for the Internet”, ascribe GmbH. http://trent.st/content/2015-06­ 24%20ascribe%20whitepaper.pdf (last accessed 06.10.2021) [20] Dimitri de Jonghe (2016). “SPOOL Protocol”, https://github.com/ascribe/spool. (last accessed 06.10.2021) [21] Trent McConaghy, Rodolphe Marques, Andreas Muller, Dimitri De Jonghe, T. Troy McConaghy, Greg McMullen, Ryan Henderson, Sylvain Bellemare, and Alberto Granzotto (2016). “BigchainDB: A Scalable Blockchain Database”, ascribe GmbH, Berlin, Germany. http://blockchain.jetzt/wp­ content/uploads/2016/02/bigchaindb­ whitepaper.pdf (last accessed 06.10.2021). [22] The Guardian (2015). “PRS for Music takes legal action against SoundCloud streaming service” https://www.theguardian.com/technology/2015/aug/ 27/prs-for-music-takes-legal-action-against­ soundcloud. (last accessed 06.10.2021) [23] Christopher Hall, Casey Alt, Lê Quý Qu.c Cu.ng, and Sean Moss-Pultz (2016).“Bitmark: The property system for the digital environment”. 
https://docs.bitmark.com/assets/pdf/bitmark­technical-white-paper.pdf (last accessed 06.10.2021) [24] Tzu-Yun Lin, Yu-Chiang Frank Wang, Sean Moss-Pultz (2015). “ObjectMinutiae: Fingerprinting for Object Authentication”. https://doi.org/10.1145/2733373.2807989. [25] FTC Staff Report (2015). “Internet of Things: Privacy & Security in a Connected World”, https://www.ftc.gov/system/files/documents/reports /federal-trade-commission-staff-report-november­ 2013-workshop-entitled-internet-things­ privacy/150127iotrpt.pdf. (last accessed 06.10.2021) [26] Executive Report (2015). “Device democracy ­Saving the future of the Internet of Things”, IBM Institute for Business Value. https://www.ibm.com/downloads/cas/Y5ONA8EV (last accessed 06.10.2021) [27] Vitalik Buterin (2016). “A next generation smart contract & decentralized application platform”, Ethereum White Paper. https://www.weusecoins.com/assets/pdf/library/Eth ereum_white_paper-a_next_generation_smart_contract_and_decentraliz ed_application_platform-vitalik-buterin.pdf (last accessed 06.10.2021) [28] Christoph Jentzsch (2015). “Decentralized Autonomous Organization to Automate Governance”. http://cryptochainuni.com/wp­ content/uploads/Decentralized-Autonomous­ Organization-To-Automate-Governance.pdf. (last accessed 06.10.2021) [29] Ali Dorri, Salil S. Kanhere, and Raja Jurdak (2016). “Blockchain in Internet of Things: Challenges and Solutions”. https://arxiv.org/abs/1608.05187, (last accessed 06.10.2021) M. Mijoska et al. https://en.wikipedia.org/wiki/Digital_identity, (last accessed 06.10.2021). [30] TechGenix (2018). “Blockchain technology: Why it will change the world”, [Internet], URL: https://techgenix.com/blockchain-technology, (last accessed 06.10.2021). [31] Quora (2017). “How will blockchain impact accounting, auditing & finance?”, https://www.quora.com/How-will-blockchain­ impact-accounting-auditing-finance, (last accessed 06.10.2021). [32] Object Computing (2017). “8 ways blockchain is changing the world”, https://objectcomputing.com/news/2017/12/20/8­ ways-blockchain-changing-world, (last accessed 06.10.2021). [33] Blockchain Expo (2018). “How will blockchain impact the insurance sector?”, https://www.blockchain­ expo.com/2018/02/blockchain/blockchain-insurance, (last accessed 06.10.2021). [34] Forbes (2018). “3 Innovative Ways Blockchain Will Build Trust In The FoodIndustry”, https://www.forbes.com/sites/samantharadocchia/20 18/04/26/3-innovative-ways-blockchain-will-build­ trust-in-the-food-industry/#285bfa832afc, (last accessed 06.10.2021). [35] Hacker Noon (2018). “How Blockchain Will Make Electronic Voting More Secure”, https://www.bitcoininsider.org/article/27800/how­blockchain-will-make-electronic-voting-more-secure, (last accessed 06.10.2021). https://www.augur.markets (last accessed 06.10.2021). https://www.axa.com/en/magazine/axa-goes-blockchain-with-fizzy, (last accessed 06.10.2021). [36] Aleksandar Matanovic, “Osnove kriptovaluta i blokcein tehnologije”, http://fzp.singidunum.ac.rs/demo/wp­ content/uploads/Osnove-kriptovaluta-i­ blok%C4%8Dein-tehnologije.pdf, (last accessed 06.10.2021). [37] Gu, J.; Sun, B.; Du, X.; Wang, J.; Zhuang, Y.; Wang, Z. (2018). “Consortium blockchain-based malware detection inmobile devices”, IEEE. https://doi.org/10.1088/1742-6596/1693/1/012025 [38] Androulaki, E.; Barger, A.; Bortnikov, V.; Cachin, C.; Christidis, K.; De Caro, A.; Enyeart, D.; Ferris, C.;Laventman, G.; Manevich, Y.; et al. (2018). 
“Hyperledger fabric: A distributed operating system forpermissionedblockchains”.InProceedingsofthe Thirteenth EuroSys Conference ACM, Porto, Portugal, 23–26 April;pp. 1–15. https://doi.org/10.1145/3190508.3190538 [39] Hyperledger Burrow -Hyperledger. Available online: https://www.hyperledger.org/projects/hyperledger-burrow (last accessed 06.10.2021). [40] Domina Hozjan (2017). “Blockchain”, Sveucilište u Zagrebu Prirodoslovno–Matematicki Fakultet matematicki Odsjek, Zagreb. https://repozitorij.pmf.unizg.hr/islandora/object/pmf :779 (last accessed 06.10.2021). [41] Miroslav Minovic (2017). “Blockchain technology: usage beside cripto currencies”. Available online: https://www.researchgate.net/publication/31872273 [42] Dejan Vujicic, Dijana Jagodic, Siniša Randic(2018). “Blockchain technology, bitcoin, and Ethereum: A brief overview”, Available online: https://doi.org/10.1109/INFOTEH.2018.8345547 [43] Mijoska M. and Ristevski B. (2020). “Blockchain Technology and its Application in the Finance and Economics”, International Conference on 10th Applied Information and Internet Technologies – AIIT 2020, October, Zrenjanin, Serbia. http://www.tfzr.uns.ac.rs/aiit/files/AIIT2020%20ePr oceedings.pdf. P.197-202 (last accessed 06.10.2021). [44] https://www2.deloitte.com/content/dam/Deloitte/uk/ Documents/Innovation/deloitte-uk-blockchain-key­challenges.pdf, (last accessed 06.10.2021). [45] Holotiuk, F., Pisani, F., & Moormann, J. (2017). The impact of blockchain technology on business models in the payments industry. Wirtschaftsinformatik 2017 Proceedings. Retrieved https://wi2017.ch/images/wi2017-0263.pdf (last accessed 06.10.2021). [46] Crosman, P. (2017). “R3 to take on Ripple with cross-border payments blockchain”. American Banker; New York, N.Y. Retrieved from https://www.bitcoinisle.com/2017/10/31/r3-to-take-on-ripple-with-cross-border-payments-blockchain/ (last accessed 06.10.2021). [47] Syren Johnstone (2020). A Viral Warning for Change. COVID-19 Versus the Red Cross: Better Solutions Via Blockchain and Artificial Intelligence, University of Hong Kong Faculty of Law Research Paper No. 2020/005 http://researchblog.law.hku.hk/2020/02/syren­ johnstone-on-wuhan-coronavirus.html, (last accessed 06.10.2021). Cybersecurity Awareness: A Critical Analysis of Education and Law Enforcement Methods Said Baadel Canadian University Dubai, Dubai, UAE E-mail: s.baadel@gmail.com Fadi Thabtah ASDTests, Auckland, New Zealand E-mail: fadi@asdtests.co.nz Joan Lu University of Huddersfield, Huddersfield, UK E-mail: j.lu@hud.ac.uk Keywords: anti-phishing, cyber security, embedded training, law enforcement, spear phishing Received: October 7, 2020 According to the international Anti-Phishing Work Group (APWG), phishing activities have abruptly risen over the last few years, and users are becoming more susceptible to online and mobile fraud. Machine Learning techniques have potential for building technical anti-phishing models, with a handful already implemented in the real time environment. However, majority of them have yet to be applied in a real time environment and require domain experts to interpret the results. This gives conventional techniques a vital role as supportive tools for a wider audience, especially novice users. This paper reviews in-depth, common, phishing countermeasures including legislation, law enforcement, hands-on training, and education among others. 
A complete prevention layer based on the aforementioned approaches is suggested to increase awareness and report phishing to different stakeholders, including organizations, novice users, researchers, and computer security experts. Therefore, these stakeholders can understand the upsides and downsides of the current conventional approaches and the ways forward for improving them. Povzetek: Prispevek preucuje ukrepe proti ribarjenju vkljucno z izobraževanjem in prakticnim usposabljanjem.

1 Introduction

Phishing is an attempt to gain sensitive personal and financial information (such as usernames and passwords, account details, and social security numbers) with malicious intent via online deception [1][2][3]. Phishing typically employs identity theft and social engineering techniques, such as creating websites that replicate existing authentic ones. Through a seemingly legitimate email that contains a hyperlink, potential users are redirected to the malicious website in order to divulge their private information and credentials [4]. Phishing techniques include spear phishing, a directed attack where emails that appear legitimate are sent to employees of a certain company in an attempt to access the company's computer system and hence gain their sensitive credentials, and whaling, which targets senior corporate executives [5]. These attacks require a proper understanding of the organisational structure in order for the phishing attack to be placed in its proper context. Advancements in computer networks and cloud technology in recent years have resulted in an exponential growth of online and mobile commerce, where customers perform substantial online purchases [6]. This online growth has led to phishing activities reaching unprecedented levels in recent months. The Anti-Phishing Work Group (APWG), which aims to minimize online threats (including pharming, spoofing, phishing, malware, etc.), has published its Q4 2019 report on phishing activities [7]. The report showed that there were approximately 162,155 unique phishing websites detected in the fourth quarter of 2019, with industries providing software as a service (SaaS) and Webmail, followed by payments and financial institutions, as the most targeted ones. As more and more users become prone to information breaches and identity theft, their trust in e-commerce or mobile commerce platforms will deteriorate, resulting in a huge loss of financial gains [8]. So, why is there an alarming increase in phishing activities, with more users becoming susceptible to phishing scams? The answer can be attributed to inexperienced users and limited knowledge about the severity of phishing. Since phishing can be seen partly as a social problem, software tools are not able to provide a permanent solution to it. The problem can be minimised by addressing it in three ways: educating the public on identifying fraudulent phishing websites, enforcing the law to punish scammers, and developing more intelligent intervention techniques. There are claims that anti-phishing solutions that adopt Machine Learning (ML) tend to be more practical and effective in combating phishing [9][10]. Nevertheless, the majority of ML solutions deal with phishing as a static problem in which they only produce the classification decision from a historical dataset [11]. Moreover, the dynamic nature of phishing, which involves users browsing in real time, necessitates decisions made on the fly, which makes ML approaches not fully suitable despite being around for the last decade.
There is also a need to educate the online community, especially novice users, about phishing, as well as to revise existing legislation. Existing reviews on website and email phishing, such as [10] [12] [13] [14] [15] [16], have dealt with the problem from a technological-solution perspective. Their reviews focused on broad anti-phishing techniques based on data mining, ML, databases, and toolbars, and only briefly discuss solutions such as law enforcement, awareness programs, user education, and training, among others. To be clear, there are few discussions and critical analyses of the benefits gained by legislative law and simulated training to combat phishing. Other reviews have dismissed conventional solutions and only reviewed ML solutions [9] [17]. Therefore, the key objective of this paper is to reveal the benefits and drawbacks of the classic anti-phishing countermeasures and to provide an in-depth discussion of legislation, law enforcement, and user training. The remainder of this paper is organized as follows: Section 2 briefly outlines the phishing attack procedure. In Section 3, adding an anti-phishing preventive layer that includes some of the conventional countermeasures is discussed. In Section 4, some of the pros and cons of each of the countermeasures are analyzed. Section 5 then looks into an emerging phishing threat. Finally, a brief summary and conclusion are provided in Section 6.

2 Phishing attacks procedure

Phishing attacks are often initiated when an attacker sends an email to potential victims with a link that can direct them to a phony website that resembles a legitimate one. Other initiation channels include online blogs, short message services (SMS), social media websites using web 2.0 services (such as Facebook and Twitter), peer-to-peer (P2P) file sharing services, and Voice over IP (VoIP) systems where spoofed caller IDs are used by attackers [18]. Each of these phishing methods has a slight variation in how the procedure is carried out, all with the goal of defrauding the unsuspecting user. To see how phishers design their scheme, Figure 1 below shows an example of the life cycle of a phishing attack by email, where the phisher uses a common technique of adding a hyperlink to route unsuspecting users to a phony website. The procedure can be summarized as follows: 1) Phishers set up a phony website resembling a legitimate one. 2) A hypertext link is sent via email to potential victims asking them to take immediate action, such as updating their account information, resetting their password, etc. Urgency is a vital element in such an email in order to bait unsuspecting users. 3) Once the link is clicked, this action routes users to the fraudulent phishing website. 4) The fraudulent website collects vital sensitive information such as the user name and password. Embezzled information can be used to access other platforms such as e-banks, emails, etc., for financial gain, identity theft, or other cybercrimes.

Conventional Anti-Phishing Prevention Layer

Due to the broad nature and severity of phishing scams for individuals, businesses, government entities, and non-profit organisations, different methods have been proposed in the literature to combat phishing. Among these are technical solutions that address the role of ML techniques in identifying phishing features [3] [18] [19] [20] [21] [22] [23] [24] [25]. ML approaches can be seen as the first layer of prevention addressing the menace of phishing attacks; the short sketch below illustrates the kind of URL red flags that such automated checks, as well as trained users, look for.
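To illustrate the kind of automated first-layer screening referred to above, the following sketch flags a few widely cited URL red flags (a raw IP address as the host, an '@' symbol, unusually deep subdomain nesting, a hyphenated look-alike domain). It is a simplified, rule-based sketch with assumed thresholds and hypothetical example URLs, not a trained ML model and not any specific tool reviewed in this paper.

```python
import re
from urllib.parse import urlparse

def url_red_flags(url):
    """Return a list of simple red flags for a URL; an empty list means none were triggered."""
    flags = []
    host = urlparse(url).netloc.lower()
    if "@" in url:
        flags.append("'@' symbol can hide the real destination")
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?", host):
        flags.append("raw IP address instead of a domain name")
    if host.count(".") >= 4:
        flags.append("unusually deep subdomain nesting")
    if "-" in host.split(".")[0]:
        flags.append("hyphenated look-alike domain")
    return flags

# Hypothetical example URLs for illustration only.
print(url_red_flags("http://192.0.2.10/login"))
print(url_red_flags("https://secure-payments.example.com.evil.example/verify"))
```

In practice, such rules would only complement the ML models, user education and legal measures discussed in the remainder of the paper.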
However, ML solutions alone cannot eradicate the problem due to the dynamic nature of it as well as the complexity of the outcomes that ML techniques offer to the end-user. Usually ML technique outcomes are hard to interpret by novice users, and thus are rarely applied when phishing attacks are occurring in real time. These two drawbacks limit the use of ML in commercial anti-phishing tools. There is a need to address other social approaches, such as user training and education, to raise awareness among different types of users. These conventional approaches provide an additional layer in combating phishing, as shown in figures 2 and 3. Furthermore, developing social online communities’ enables rapid data growth through users reporting their phishing experience, and thus similarities between new deceptive scams can be shared by the different stakeholders. Lastly, new legislation that can introduce harsher jail time for cybercrimes can help in reducing phishing attacks. While there is a push by government entities and academic institutions to educate the public and raise awareness about security issues, little research has been done to educate them on how to protect themselves from phishing attacks [26]. These conventional approaches provide an additional layer in combating phishing, as shown in figures 2 & 3. In the next section, how government and law enforcement have come around with other conventional anti-phishing approaches are examined. 2.1 Legislation and law enforcement Phishing scammers have the potential to target individual users and businesses, therefore legislative bills must be designed to protect different stakeholders. In the United States of America (USA) and Canada, a joint task force was formed between the U.S Department of Justice (DoJ) and the Public Safety and Emergency Preparedness Canada (PSEPC) in 2004. The primary purposes of the task force were to define the nature and scope of phishing, its impact on cross-border criminality, and to provide the public with information about common phishing techniques [27]. Later that year, the US Senate introduced a bill known as the Anti-Phishing Act of 2004 in order to have legislation at the federal level tackling phishing. After failing to make it through the senate’s calendar that year, the bill was re-introduced as the Anti-Phishing Act of 2005. The aim of the bill was to amend the federal criminal code to include phishing and impose an imprisonment of up to five years for anyone found guilty of phishing. However, this bill died in the senate sub­committee reviews and never made it into law. While this legislation tried to address phishing specifically, there are other laws such as “18U.S.C. section 1028” which do not mention phishing specifically but covers topics such as fraudulency, identity theft, and organized crime, which can be used to address phishing scams [28][29]. Adopting Informatica 45 (2021) 335–345 337 organized crime laws to combat cybercrimes may give law enforcement enhanced investigative powers [30][31]. At the State level, California was able to enact the Anti-Phishing Act of 2005 (named after the failed senate bill) that criminalizes phishing attacks. Businesses under this law are able to sue phishing scammers for financial damages. Individuals can claim the greatest of three times of the actual damages or five thousand dollars for every violation cited. Many other states (such as Arizona, Florida, Connecticut, Michigan, Texas, etc.) 
followed California’s lead and enacted their own cybercrime legislations. The United Kingdom (UK) strengthened its legal system against cybercrimes, including fraud and identity theft, by introducing a new law in 2006 called the Fraud Act. The act increased prison sentences (up to ten-years) for online fraud offences that included phishing [29][32]. The government also set up Action Fraud, a website dedicated to national fraud and cybercrime where users can find educational materials on different cybercrimes and have a forum for reporting any suspicious activities. Other countries such as Canada passed a broad Anti-Spam law in December of 2010 that included phishing among other cybercrimes. The law allowed three government agencies (Canadian Radio-Television and Telecom Commission, the Competition Bureau, and The Office of the Privacy Commission) to enforce it, and even allowed the agencies to share information with other foreign states if such information is relevant to an investigation. The government of Canada has also posted information online to educate the public on the different cybercrimes, and also encourages them to report any fraud through their website. Many other countries have enacted similar laws for combatting phishing and other cybercrimes. According to [33] [34], legislation should be designed to provide large-scale damage against individual phishers or secondary liability against Internet Service Providers (ISPs) in hopes that ISPs will be motivated to play their role in fighting phishing. The authors suggested that it can be done under the auspices of intellectual property or unfair competition laws. However, cybercrime is mostly done cross-border, and many phishing attacks have a short life-span. This brings us to two main challenges: locating the phisher and obtaining jurisdiction to enforce the law. a) Finding and locating the phishing source: 1) Online scammers hide their identities and use secure servers in their activities. Back-tracing the IP of phishers becomes very difficult over the network. 2) Many use fake emails and register malicious domain names for their activities. There are no authentication requirements for any user when opening an email account to verify their identities. Since the internet allows a user to communicate anonymously, it is virtually impossible to locate them. 3) Even when the source is located, it has become increasingly difficult and time consuming for law enforcement to find evidence from their computer systems due to data encryption. According to the latest APWG report, more than 195,000 domains were used for phishing in 2016, of which more than half (95,424) were registered maliciously by phishers with 75% of them having top level domains (TLDs) from the Cocos Islands (.cc), pacific island of Palau (.pw), and Tokelau (.tk)[35]. The report also found that many phishing attacks originate from countries such as China and North Korea. b) Obtaining jurisdiction in order to enforce the law: 1) Many online phishers tend to conduct their activities in countries that have weak cyber laws and law enforcement, and a foreign state may not have jurisdiction over those countries. In the same APWG report, more than half of those phishing domains were registered in China. A country may have strict laws on phishing and other cybercrimes yet enforcing that law will become very difficult if the cybercrime is crossing borders where they do not have any jurisdiction. 
2) Phishers and other cyber criminals can simply declare bankruptcy, appeal any conviction, or deny any criminal engagement if the host country is not able to prove so or does not have cybercrime laws. 3) Requesting cyber criminals to be extradited in order to face trial is a lengthy and expensive process that could require years. Since these countries have their own sovereign jurisdiction and borders, internal investigation and prosecution is essential in order to find the culprit guilty of cybercrimes before they can even engage in the extraditing process. These hurdles have allowed phishers to thrive under the cover of cyber networks while government agencies and law enforcement officials find it difficult to locate and prosecute the perpetrators of cybercrimes. It is therefore imperative that users are better equipped with information and hands-on training to make them aware of this problem. 2.2 Simulated training, visual cues, and user education Phishers prey on the lack of security knowledge and computer self-efficacy of users. User education, therefore, refers to raising awareness to keep users from becoming victims of phishing attacks [36]. This can be done for instance by providing materials (online, mobile, or hard copy) on how phishing attacks occur, especially during regular work activities. On the other hand, simulated training involves techniques that are used where researchers or organizations simulate real-world phishing scenarios on their users in an experimental, safe, environment in order to track their susceptibility to phishing [37]. Research works by [38] [39] [40] [41] utilize the Elaboration Likelihood Model (ELM), designed by Petty and Cacioppo [42], which suggests that user’s cognitive processing is a key reason why many fall victims to deception. How a user pays attention to cues in emails (i.e. initially noticing something fishy, what ELM classifies as “attention”) and thus consequently digs deeper to search for more cues (what ELM classifies as “elaboration process”), are key factors for identifying a fraudulent website. In the next sub-section, simulated training, visual cause, and user education techniques are critically analyzed. S. Baadel et al. a) Simulated Training A number of research studies have been conducted on simulated user training for phishing awareness [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53]. These studies involved either sending users an email with links and monitoring how they responded or making the participants aware that a simulated phishing experiment was to be conducted and are gauged on their abilities to correctly identify phishing emails. At the end of the training, users were normally given the materials and informed about their vulnerability to phishing. Harrison, et al. [52], conducted a study at North­eastern University in the USA, where he exposed students to real-world phishing attacks in a safe simulated environment. The study used five measures: elaborations, attention, subjective email knowledge and experience, objective phishing knowledge, and individual personality-based technological innovativeness. After two weeks of students registering their email addresses, they received an email message with a hyperlink that routed them to the phishy webpage where they were asked to log in using their university credentials. Those who did not respond to the initial email were sent a second reminder to participate. 
The authors wanted to experiment how message factors, elaboration, and attention predicted the participant’s susceptibility to phishing attacks. The authors concluded that anti-phishing efforts should focus on refining the quality of initial attention to the e-mail. They suggested that this can be attained and enhanced through educating users to pay attention to just a few key elements in the message, such as noticing red flag elements such as a hyperlink and knowing where to find the actual address that the e-mail was sent from. Jensen, et al.[53], designed a study at a midwestern university in the USA. A generic and customized phishing message was distributed to students and staff at the university after they were asked to participate in the study. The authors created a fictitious employee email account for the study and sent two types of emails: a generic one that asked the participants to log in and try a new web portal, and a customized one with a similar message but containing the university mascot, displaying a local phone number, etc. A URL in the email directed the participants to a fictitious website where the participant’s credentials were collected. The study showed that most of those who fell victim to the attack did so in the first 24 hours of the experiment. The study also concluded that brief online training was effective, and that it should be included as part of a layered set of defences to accompany automated intelligent tools in fighting phishing attacks. An earlier study by [45], of 921 students from the University of Indiana revealed that students who received an email that was perceived to be from a friend clicked on the link 72% of the time compared to 16% when it was from an unknown address. A similar pilot study was conducted by [43], using embedded training methodology to measure phishing awareness at a university. Malicious emails were urging users to click on a link that would redirect them to a phoney website where they would input their login credentials. During the experiment, the users were interrupted immediately when they clicked the link and were subsequently provided with training material on phishing. [47] [48] conducted a study using an embedded training methodology where users were immediately alerted and trained after they fell victim to a simulated phishing attack. The authors argued that users become more motivated to learn about phishing attacks once they have realized that they are victims of such attacks. The authors also wanted to see how effective such a methodology was for user knowledge retention. They concluded that users will be better equipped and can learn more effectively in embedded training simulation as opposed to training sent via regular emails. In their 2009 study using PhishGuru, the authors found that this method allowed users to retain and transfer their knowledge better than with non-embedded training. An example of a real-life application of simulation training is PhishMe Simulator [54] that was designed to enhance employee awareness and equip them with the proper tools to recognize and report phishing emails by immersing them in simulated phishing scenarios. The tool allows real-time educational opportunities the moment users take the phishing bait. b) Visual Cues and User Education In the Elaboration Likelihood Model (ELM), initial notice of something fishy is a crucial first step in how a user pays attention to other cues. 
Visual cues tend to mimic an alert system where a red flag is raised and a user who picks up on that red flag may dig deeper to search for more cues and potentially identify threats. One of the classical suggestions of human interactive proof (HIP), where online users are required to identify and verify visual cues and contents, was proposed by [55] and is known as dynamic security skins (DSS). In DSS a random image is displayed that is personal to the user prior to the user entering their password. This image can be overlaid on top of the password textboxes, making sure the user sees it and thus making it difficult for phishers to spoof the password entry. [56] queried users to analyse some emails and gauge whether their understanding of virus attacks gave them a better understanding of web threats. The results indicated that their knowledge of negative consequences resulting from computer related crimes did not prevent the users from being vulnerable to phishing attacks. It was concluded that more specific training should be conducted, focusing on phishing attacks as opposed to providing warnings. [57] proposed a system that embeds key information on the clients’ side for the usertoenter, which can then be verified at the server side. The authors introduced what is known as the completely automated public turing test to tell computers and humans apart (CAPTCHA). [58] extended this concept by adding an additional security layer consisting of a time-sensitive restriction of one-time-password (OTP). [59] developed a game-based tutorial called Anti-phishing Phil that trained users on how to avoid phishing scams. The interactive game showed users how to identify phishing URLs, identify other cues on the browser, and how to distinguish between legitimate and fraudulent Informatica 45 (2021) 335–345 339 sites. The study concluded that users who played the game were better equipped to identify a phishing attack. A later study by [60] investigated whether an interactive mobile platform is more effective in educating users in contrast to traditional security training. A comparison of users’ responsiveness to phishing was conducted, using a mobile game developed by [61] versus training through a website designed by APWG. Results indicated that users trained through the mobile application had a higher success rate of identifying phishing sites compared to their counterparts who only used the APWG website. In their recent study on phishing threat avoidance using games, [51] concluded that all their participants were convinced that the mobile game was somewhat effective compared to articles and lecture notes for enhancing their avoidance behaviour through motivation to protect themselves from phishing threats. The participants argued that mobile game-based education was fun and gave them immediate feedback, whereas lecture notes or articles provided them with little practical experience. [50] conducted a study based on the conceptual model of Theoretical Threat Avoidance Theory (TTAT) by [62] to assess the level of computer user’s knowledge on how to thwart phishing attacks. Participants were given 5 phishing URL’s to assess their procedural knowledge (identify if the given URL is legitimate or suspicious) and another 5 phishing URL’s to assess their conceptual knowledge (identify which part of the URL is suspicious). The results of the study concluded that the combination of procedural and conceptual knowledge positively affected self-efficacy, which enhanced the user’s avoidance behaviour. 
A study by [63], using improved browser security indicators and visual cues to attract users' attention in order to identify phishing websites, found that there was a correlation between users gazing at the visual cues and detecting phishing sites. The study showed that users who paid attention to the visual cues had a 53% greater chance of identifying phishing websites. These visual cues rely solely on human intervention and users' ability to utilize them at the right time. This poses a major challenge, as many users tend to ignore the visual cues on the toolbars or fail to interpret some of the security cues appropriately [64][65][66]. Moreover, based on the majority of training and education research, most users are unaware of how phishing attacks start or how to visually recognise and differentiate between a fraudulent and a legitimate website [22][64][67]. Many educational and training materials let users become aware of the threats, but do not necessarily provide them with the necessary skills or knowledge for protecting themselves or their organisations from such attacks [9]. While many of the educational materials used to train users on web attacks are readily available online, the vast majority of users ignore them. Some argue that, of those who actually train themselves on security and cyber threats, many tend to develop a fear of doing online commerce as opposed to learning how to protect themselves and engage in it [36].

2.3 Online user communities

As users become more aware and are able to identify online scams, or fall victim to phishing attacks, they may report their experience in order to prevent others from suffering similar attacks. Users can report fraudulent websites or URL links, which can then be stored in online databases. Such accumulated resources can also be used by researchers to study phishing scammers and their evolving ways of devising their scams. These online communities can also be a vital source of information regarding the different types of phishing attacks that exist and their potential threats to individuals and organizations. For example, Figure 4 below, from the user community website Cofense, reveals that more than 90% of IT executives in the US worry about email-related phishing [54].

Figure 4: USA IT Executives Concerns Over Phishing Threats [54].

Individuals who recognise phishing activity may report it via public anti-phishing communities. This collection of previously identified and detected phishing domain names, or URLs, is commonly referred to as a "blacklist". A blacklisted website significantly loses its user traffic and any potential revenues. However, according to [59], the effectiveness of blacklists depends on: a) the frequency of database updates, and b) an accurate phishing detection rate (i.e. correctly detecting phishing instances, also known as the True Positive (TP) rate). Google and Microsoft blacklists are commonly used by marketers, users, and businesses because of their lower false positive (FP) rates (legitimate instances that are incorrectly identified as phishing) compared to other publicly available blacklists, and due to their frequency of database updates. Microsoft's blacklist is updated between every nine hours and six days, whereas Google's blacklist gets updated between every twenty hours and twelve days [10]. This is a definite limitation of the blacklist approach, as phishing campaigns take significantly less time to carry out their attacks than it takes for them to be detected and blocked [18][59][68]. A minimal sketch of a blacklist lookup and this update-lag problem is given below.
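The following minimal sketch (hypothetical URLs; the BLACKLIST set stands in for a periodically published snapshot) shows the essence of a blacklist lookup and why the update lag discussed above matters: a campaign launched after the last refresh passes the check until the next update cycle.

```python
# A toy blacklist snapshot (hypothetical URLs); real lists such as Google's or
# Microsoft's are refreshed only every several hours to days, as reported above.
BLACKLIST = {
    "http://phish.example/login",
    "http://fake-bank.example/verify",
}

def is_blocked(url):
    # A blacklist check is just set membership against the last published snapshot.
    return url in BLACKLIST

print(is_blocked("http://phish.example/login"))        # True: previously reported and blocked
print(is_blocked("http://brand-new-phish.example/"))   # False: a zero-hour campaign slips through
```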
The online communities play an important role in raising anti-phishing awareness and keeping the conversation progressing. However, the vital part of these databases is the ability of the users to identify the fraudulent website before it could be blacklisted. Thus, users are potentially vulnerable until the URL is reported. S. Baadel et al. This also highlights the importance of proactive user education and training. 3 Analysis and discussions Table 1 below provides a brief summary of the pros and cons of the different approaches identified and discussed in section 3. A thorough analysis and discussion of the table is presented below. Some recommendations are provided as a way forward and are given in the sub­sequent sections. 3.1 Legislation and law enforcement Many developed countries have adjusted their criminal laws to include online computer fraud, such as phishing. One of the major benefits of legislation and law enforcement is that when phishing activities are criminalized; it brings this problem to the forefront of the public eye as a criminal activity. This in turn facilitates the other two approaches discussed in this paper. Users are therefore able to engage in training in order to become aware of this criminal activity and may participate in reporting any phishing scams to the government run databases or commercial ones. Businesses that suffer from spear phishing may conduct their internal investigations, and if they are able find the perpetrator seek compensation, retribution, or protection of their brands by filing a law suit when such laws exist. Harsh jail terms and steep fines are crucial for deterring potential phishers from initiating an attack. However, some of the negatives to this approach is that many phishers are smart enough to hide their identities by using secure servers. There are no specific laws or requirements that check and verify a users’ identity and details when opening anemail account or registering a website [69]. Phishers therefore tend to register their websites maliciously and use fake email accounts, making it difficult to locate them. Since many attacks have a short life span, phishers can successfully defraud users and quickly shut down their activities and disappear before law enforcement is able to even begin investigating a phishing attack. While any law enforcement cannot begin before the perpetrator is caught, which as indicated can be very difficult, other issues may arise such as jurisdiction to even implement the law. If such laws cannot be enforced, then they will have little deterrent effect. It can be seen that according to the APWG, many phishing attacks originate from countries that have very lenient or weak cyber laws. Extraditing such criminals would thus be virtually impossible when such treaties do not exist between foreign states. Countries that do not have cybercrime laws need to act and enact legislation that will criminalize these activities. A globally harmonized policy will be required in order to have a uniform definition of what amounts to cybercrime which can be implemented across all countries with similar legislations. Extradition treaties that can be enforced through law enforcement agents such as the International Police organization (INTERPOL) should then be encouraged among member countries. 
It is quite obvious that extradition is time-consuming, not cost-effective, and may require a lengthy court process in the native countries (even for crimes where the suspects have physical addresses or businesses); nonetheless, it is a necessary first step toward combating this menace on a global scale. Information sharing among countries is also critical to fighting cyber criminals. The following are a few examples of coordinated efforts by different law enforcement agencies, covering different jurisdictions, to indict cyber criminals and phishers. i) A Florida man was indicted in Pennsylvania for a phishing scam pretending to be a legitimate Hurricane Katrina relief website [70]. ii) A collaboration between the US FBI and law enforcement in Egypt netted around 80 phishers working together in an elaborate banking scam. The FBI made about 33 arrests of phishers based in Southern California, North Carolina, and Nevada [71]. iii) Indian police arrested a ring leader and mastermind of phishers who impersonated agents from the Internal Revenue Service (IRS) and US Citizenship and Immigration Services (USCIS). Following his arrest, the Department of Justice (DoJ), together with the IRS and the Department of Homeland Security, announced the arrest of 20 individuals in the United States in connection with the same scam and proceeded with extradition requests for the Indian arrests to be charged in the US [72].

3.2 Simulated training and user education

There are many pros as well as cons to user education and training. User training enables the user to identify phishing attacks in a simulated experiment. When users are well trained, they are better prepared and are aware of phishing scams and other cybercrimes. Users should be trained on specifics, such as phishing attacks and how they work, as opposed to general knowledge of the negative consequences of cybercrimes such as identity theft. Specific training will raise awareness and understanding of phishing and in turn minimise users' vulnerability to phishing attacks. One of the cons of this approach is that, in general, many non-technical users will resist training and learning. Researchers who utilised the Elaboration Likelihood Model (ELM) suggested that attention and elaboration are critical to identifying a fraudulent website. While these are cognitive processes, adopting the two strategies requires behavioural adjustment if a new user is not to fall victim repeatedly to phishing attacks. Changing users' online interactive behaviour is not an easy feat. Users tend to ignore or pay less attention to training materials and visual cues, and many have difficulty visually recognizing and distinguishing fraudulent sites from legitimate ones. Awareness can be raised, and users trained on how to identify and appropriately deal with phishing scams, in three ways: through a traditional medium such as schools and universities, through the enforcement and participation of Internet Service Providers, or through a mobile game platform. i) The traditional medium may raise awareness through the introduction of cybercrime into high school and university curricula, or even through short courses offered to the community at large.
Legislation and Law Enforcement [30][32][33][34]
  Pros: incriminates phishers; harsh jail terms, penalties and fines act as a deterrent.
  Cons: difficulty in locating phishers; jurisdiction issues when trying to implement the law.

User Education and Training [11][51][52][53][63]
  Pros: minimizes susceptibility to phishing attacks; raises awareness and understanding of phishing attacks and other cybercrimes; shares information among organisations, employees, and other stakeholders.
  Cons: users do not pay attention to visual cues; users treat the training as any other annual training; users ignore training materials; users do not learn skills on how to combat phishing attacks.

Online Social Communities Approach [10][18][35]
  Pros: prevents users from falling victim to identified phishing URLs; information is shared on a single platform; the collected data can be used for further analysis to understand the causes of phishing; past experiences are helpful for other users.
  Cons: a reactive approach, since phishers are only blacklisted after an attack has already occurred; lack of a real-time blacklist update mechanism; the phishing detection rate is not highly accurate; requires user intervention to work (users may or may not have proper education and training on phishing attacks).

Table 1: Summary of Anti-Phishing Countermeasures Pros and Cons.

While this can be a daunting task and requires educational institutions to adopt and adjust their curricula, the strategy has proven to be effective. This approach was partially experimented with by [73] in introductory computing courses taken by students not pursuing a computer science education. The authors concluded from the class assessment that students had an increased level of awareness and were better able to recognise phishing scams. [74] also concluded that user education is crucial for elevating defences against phishing attacks. ii) Internet Service Providers (ISPs) are in a position to play a larger role in the prevention of phishing attacks. By putting some of the liability on ISPs, [33][34] suggest that this may put pressure on ISPs to take a more proactive stance in training their employees and users and may require them to cascade such knowledge to the companies using their services. This can take the form of embedded training, where employees continuously learn as they conduct their daily work activities. Such training materials can be placed in emails, on company intranet sites, or in simulated text messaging over regular social media platforms. iii) Mobile game platforms bring an interactive and fun approach to education and training and are somewhat more effective than traditional articles or lectures. Users who participated in mobile game studies reported that mobile game-based education was fun and gave them immediate feedback, so that they were better equipped to identify a phishing attack after completing the game [59][60]. Users trained through mobile applications had a higher success rate in identifying phishing sites than their counterparts who used traditional mediums [61].

3.3 Online communities

The online communities' database approach helps prevent users from falling victim to previously blacklisted sites. This strategy can reduce the number of people being defrauded by phishers and cut their potential revenues. However, the drawback of this approach is that it does not protect against zero-hour phishing. New phishing attacks need to be detected first and then blacklisted.
This process takes time, and many of the well-known databases have a slow update rate. The lack of real-time blacklist updating is a major drawback of this approach [10][18][59][68]. This lag is enough for phishers to complete their attacks and move on to something else, as the phishing life span is very short. Accuracy in phishing detection is critical, and failures may result in legitimate sites being blacklisted. These online communities play an important role in raising anti-phishing awareness. They serve the online community in two ways: i) the accumulated resources can be used by researchers to study phishing scammers and their evolving ways of devising their scams; ii) they provide a platform for novice users to share experiences and keep the conversation about phishing and other cybercrime progressing.

4 Conclusions and future work

This paper investigated common conventional anti-phishing prevention techniques, including law enforcement, legislative bills, education, simulated training, and online communities. While many countries, such as the USA, Canada, and the UK, have taken the lead in criminalising phishing attacks and have introduced harsh legislation, it is still difficult to locate attackers. This is because phishing attacks have a short life span, allowing attackers to change identity or move on before law enforcement agencies can locate them. Despite these limitations, it is still vital that governments and other enforcement agencies improve their services to reduce phishing rates by sharing information and removing jurisdiction barriers. User training and visual cues partially improve users' ability to identify phishing. However, many novice users still do not pay enough attention to visual cues when browsing websites, making them vulnerable to phishing attacks. Users need to be exposed, in a repetitive manner, to training about phishing, since phishers continuously change their deception techniques. This approach to preventing phishing is useful for novice users, but it has proved not to be cost-effective. Online phishing communities accumulate data repositories that allow users to share useful information about phishing incidents, such as blacklisted URLs and phishing experiences. This does create a knowledge base for users' online communities, but it requires some computer literacy as well as awareness of security indicators. In addition, due to the nature of phishing attacks, these blacklists frequently become outdated, as updates are only performed periodically rather than in real time. While each of the conventional methods has its own deficiencies, as a whole they reinforce each other and provide an additional layer of protection against phishing scams. Novice users can benefit tremendously by combining some of the approaches discussed in order to improve their effectiveness in identifying phishing attacks, and they should not rely solely on a single method. This paper also provides a clear and thorough analysis and discussion of each of the countermeasures proposed as a preventive layer, to better equip companies, security experts, and researchers in selecting what can work well, and to equip individuals with knowledge and skills that may prevent phishing attacks in a wider context within the community.
In future work, it is planned to present an anti-phishing framework in the context of IoT that integrates automated knowledge produced by computational intelligence in visual cues besides using human expert knowledge as a base. References [1] Aaron, G., and Rasmussen, R. (2010). Global phishing survey: trends and domain name use in 2H 2009. Lexington, MA: Anti-Phishing Working Group (APWG). [2] Ramanathan, V., and Wechsler, H. (2013). Phishing detection and impersonated entity discovery using conditional random field and latent Dirichlet allocation. Computers & Security, 34, 123-139. [3] Abdehamid, N. (2015). Multi-label rules for phishing classification. Applied Computing and Informatics 11 (1), 29-46. [4] Atkins, B., and Huang, W. (2013). A study of social engineering in online frauds. Open J Soc Sci, 1(03):23-32. [5] Afroz, A., and Greenstadt, R. (2011). PhishZoo: Detecting Phishing Websites by Looking at Them. Proceedings of the Fifth International Conference on Semantic Computing. Palo Alto, California, USA. IEEE. [6] Abdelhamid, N., and Thabtah F. (2014). Associative Classification Approaches: Review and Comparison. Journal of Information and Knowledge Management (JIKM), 13(3). [7] Aaron, G., and Manning, R. (2020). APWG Phishing Activity Trends Reports. https://docs.apwg.org/reports/apwg_trends_report_q 4_2019.pdf [Accessed March 10th 2020]. [8] Nguyen, L., To, B., and Nguyen H. (2015). An Efficient Approach for Phishing Detection Using Neuro-Fuzzy Model. Journal of Automation and Control Engineering, 3(6). [9] Abdelhamid, N., Thabtah, F., and Abdeljaber, H. (2017). Phishing detection: A recent intelligent machine learning comparison based on models content and features. Proceedings of IEEE International Conference on Intelligence and Security Informatics (ISI), China. IEEE. [10] Mohammad, R., Thabtah, F., and McCluskey, L. (2015). Tutorial and critical analysis of phishing websites methods. Computer Science Review Journal, 17, 1–24. [11] Baadel, S., Thabtah, F., Majeed, A. (2018). Avoiding the Phishing Bait: The Need for Conventional Countermeasures for Mobile Users. Proceedings of 9th the IEEE Annual Information Technology, Electronics and Mobile Communication Conference. Vancouver, Canada. [12] Khonji, M., Iraqi, Y., and Jones, A. (2013). Phishing Detection: A Literature Survey. IEEE Surveys and Tutorials, 15(4). [13] Purkait, S. (2012). Phishing counter measures and their effectiveness – literature review. Information Management & Computer Security, 20(5): 382-420. [14] Aleroud, A., and Zhou, L. (2017). Phishing environments, techniques, and countermeasures: A survey. Computer and Security, 68: 160-196. [15] Baadel, S., Lu, J. (2019). Data Analytics: intelligent anti-phishing techniques based on Machine Learning. Journal of Knowledge and Information Management. 18 (1) 1950005. [16] Jain, A., Gupta, B (2017). Phishing Detection: Analysis of Visual Similarity Based Approaches. Security and Communication Networks, Volume 2017, pp. 1-20. Informatica 45 (2021) 335–345 343 [17] Varshney, G., Misra, M., and Atrey, P. 2016. A survey and classification of web phishing detection schemes. Security and Communication Networks, 6266–6284. [18] Abdelhamid, N., Thabtah, F., Ayesh, A. (2014). Phishing detection based associative classification data mining. Expert systems with Applications Journal, 41, 5948–5959. [19] Aburrous, M., Hossain, M., Dahal, K., and Thabtah, F. (2010). Experimental Case Studies for Investigating E-Banking Phishing Techniques and Attack Strategies. 
Journal of Cognitive Computation, 2(3): 242-253. [20] Medvet, E., Kirda, E., and Kruegel, C. (2008). Visual-similarity-based phishing detection. Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, pp. 22:1-22:6 [21] Ma, J., Saul, L., Savage, S., and Voelker, G. (2009). Beyond blacklists: Learning to detect malicious web sites from suspicious urls. Proceedings of the 15th ACM SIGKDD, 2009, pp. 1245-1254. [22] Mohammad, R., Thabtah, F., and McCluskey L. (2014). Predicting Phishing Websites based on Self-Structuring Neural Network. Journal of Neural Computing and Applications, 25 (2): 443-458. [23] Qabajeh, I., Thabtah, F., Chiclana, F. (2015). Dynamic Classification Rules Data Mining Method. Journal of Management Analytics, 2(3):233-253. [24] Thabtah, F., Mohammad, R., and McCluskey, L. (2016). A Dynamic Self-Structuring Neural Network Model to Combat Phishing. Proceedings of the 2016 IEEE World Congress on Computational Intelligence. Vancouver, Canada. [25] Marchal, S., Saari, K., Singh, N., and Asokan, N. (2016). Know your phish: Novel techniques for detecting phishing sites and their targets. Proceedings of the IEEE 36th International Conference on Distributed Computing Systems (ICDCS). [26] Kirlappos, I., and Sasse, M. (2012). Security education against phishing: a modest proposal for a major rethink. Security & Privacy, 10: 24-32. [27] Department of Justice (2004). Report on Phishing. United States Dept. of Justice, p. 3. https://www.justice.gov/sites/default/files/opa/legacy /2006/11/21/report_on_phishing.pdf [Accessed August. 22, 2020]. [28] Pike, G. (2006). Lost data: The legal challenges. Information Today, 23 (10): 1–3. [29] Granova, A., and Eloff, J. (2005). A legal overview of phishing. Computer Fraud & Security, Vol. 20(7):6­11. [30] Leukfeldt, E., Lavorgna, A., and Kleemans, E. (2017). Organised Cybercrime or Cybercrime that is Organised? An Assessment of the Conceptualisation of Financial Cybercrime as Organised Crime. European Journal on Criminal Policy and Research, 23(3):287-300. [31] Calman, C. (2006). Bigger phish to fry: California's antiphishing statute and its potential imposition of secondary liability on internet service providers. Richmond Journal of Law & Technology,13(1): 1-24. [32] Bainbridge, D. (2007). Criminal law tackles computer fraud and misuse. Computer Law & Security Review, 23(3):276-281. [33] Larson, J. (2010). Enforcing intellectual property rights to deter phishing. Intellectual Property & Technology Law Journal, 22(1):1-8. [34] Cassim, F. (2014). Addressing the Spectre of Phishing: Are Adequate Measures in Place to Protect Victims of Phishing. The Comparative and International Law Journal of Southern Africa, 47(3):401-428. [35] Aaron, G., and Manning, R. (2020). APWG Phishing Activity Trends Reports. https://docs.apwg.org/reports/apwg_trends_report_q 4_2019.pdf [Accessed March 10th 2020]. [36] Arachchilage, N., Love, S., and Beznosov, K. (2016). Phishing threat avoidance behaviour: an empirical investigation. Computers in Human Behaviour, 60: 185–197. [37] Hadnagy, C. (2015). Phishing-as-a-service (PHaas) used to increase corporate security awareness. U.S. Patent Application 14/704, 148. [38] Harrison, B., Vishwanath, A., Yu, J., Ng, and Rao, R. (2015). Examining the impact of presence on individual phishing victimization. 48th Hawaii International Conference on System Sciences (HICSS), pp. 3483-3489. [39] Vishwanath, A., Harrison, B., and Ng, Y. (2015). 
Suspicion, cognition, automaticity model (SCAM) of phishing susceptibility. Proceedings of the Annual Meeting of 65th International Communication Association Conference, San Juan. [40] Vishwanath, A., Herath, T., Chen, R., Wang, J., and Rao, H. (2011). Why do people get phished? Testing individual differences in phishing vulnerability within an integrated information processing model. Decision Support Systems, 51(3): 576-586. [41] Workman, M. (2008). A test of intervention for security threats from social engineering. Information Management & Computer Security, 16(5): 463-483. [42] Petty, R., and Cacioppo, J. (1986). The elaboration likelihood model of persuasion. L. (Ed.), Advances in Experimental Social Psychology, Vol 19. New York: Academic Press, 123-205. [43] Arachchilage, N., Rhee, Y., Sheng, S., Hasan, S., Acquisti, A., Cranor, L., and Hong, J. (2007). Getting users to pay attention to anti-phishing education: evaluation of retention and transfer. Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit. Pittsburgh, PA, USA. ACM. [44] Ronald, D., Curtis, C., and Aaron, F. (2007). Phishing for user security awareness. Computers & Security, 26(1): 73-80. [45] Jagatic, T., Johnson, N., Jakobsson, M., and Menczer, F. (2007). Social phishing. Communications of the ACM, 50(10):94-100. [46] Downs, J., Holbrook, M., and Cranor, L. (2007). Behavioral response to phishing risk. 2nd annual eCrime researcher’s summit. Pittsburgh, PA. USA. S. Baadel et al. [47] Kumaraguru, P., Rhee, Y., Sheng, S., Hasan, S., Acquisti, A., Cranor, L., et al. (2007). Getting users to pay attention to anti-phishing education: evaluation of retention and transfer. 2nd annual eCrime researchers summit, Pittsburgh, PA. USA. [48] Kumaraguru, P., Cranshaw, J., et al. (2009). School of phish: a real-world evaluation of anti-phishing training. Proceedings of the 5th Symposium on Usable Privacy and Security Article No. 3, ACM. [49] Aburrous, M., Hossain, M., Dahal, K., and Thabtah, F. (2010). Experimental Case Studies for Investigating E-Banking Phishing Techniques and Attack Strategies. Journal of Cognitive Computation, 2(3): 242-253. [50] Arachchilage, N., and Love, S. (2014). Security awareness of computer users: A phishing threat avoidance perspective. Computers in Human Behaviour, 38: 304-312. [51] Arachchilage, N., Love, S., and Beznosov, K. (2016). Phishing threat avoidance behaviour: an empirical investigation. Computers in Human Behaviour, 60: 185–197. [52] Harrison, B., Svetieva, E., and Vishwanath, A. (2016). Individual processing of phishing emails. Online Information Review, 40(2):265-281. [53] Jensen, M., Dinger, M., Wright, R., Thatcher, J. (2017). Training to Mitigate Phishing Attacks Using Mindfulness Techniques. Journal of Management Information Systems 34(2):597-626. [54] Cofense Report (2019). 5 Uncomfortable Truths About Phishing Defense. https://cofense.com/ [Accessed March 10th, 2020] [55] Dhamija, R., and Tygar, J. (2005). The battle against phishing: dynamic security skins. Symposium on Usable Privacy and Security (SOUPS) Pittsburgh, PA, USA, pp. 77-88. [56] Downs, J., Holbrook, M., and Cranor, L. (2007). Behavioral response to phishing risk. 2nd annual eCrime researcher’s summit. Pittsburgh, PA. USA. [57] Saklikar, S., and Saha, S. (2008). Public key-embedded graphic CAPTCHAs. Proceedings of the Consumer Communications and Networking Conference (CCNC 2008), pp. 262-6. [58] Leung, C. (2009). Depress phishing by CAPTCHA with OTP. 
Proceedings of the 3rd International Conference on Anti-counterfeiting, Security, and Identification inCommunication. Pp. 187-92. [59] Sheng, S., Magnien, B., Kumaraguru, P., Acquisti, A., et al. (2007). Anti-Phishing Phil: The Design and Evaluation of a Game That Teaches People Not to Fall for Phish. Proceedings of the 2007 Symposium On Usable Privacy and Security, Pittsburgh, PA. [60] Arachchilage, N., and Love, S. (2013). A game design framework for avoiding phishing attacks. Computers in Human Behavior, 29(3): 706-714. [61] Arachchilage, N., and Cole, M. (2011). Design a mobile game for home computer users to prevent from “phishing attacks”. International Conference on Information Society (i-Society), 485-489. [62] Liang, X., and Xue, Y. (2010). Understanding security behaviours in personal computer usage: A threat avoidance perspective. Association for Information Systems, 11(7):394-413. [63] Alsharnouby, M., Alaca, F., and Chiasson, S. (2015). Why phishing still works: User strategies for combating phishing attacks. International Journal of Human-Computer Studies, 82: 69-82. [64] Huang, H., Tan J., and Liu, L. (2009). Countermeasure techniques for deceptive phishing attack. International Conference on New Trends in Information and Service Sciences. Pg 636-641. [65] Wu, L., Du, X., and Wu, J. (2016). Effective Defense Schemes for Phishing Attacks on Mobile Computing Platforms. IEEE Transactions on Vehicular Technology, Vol. 65, Issue: 8. IEEE. [66] Wu, M., Miller, R., and Garfinkel, S. (2006). Do security toolbars actually prevent phishing attacks? Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’06, pp. 601– 610. [67] Yue, C., and Wang, H. (2008). Anti-phishing in offense and defense. Proceedings of the Annual Computer Security Applications Conference (ACSAC), pp. 345-54. [68] Sheng S., Holbrook M., Arachchilage, N., Cranor, L., and Downs, J. (2010). Who falls for phish? a demographic analysis of phishing susceptibility and effectiveness of interventions. Proceedings of the 28th international conference on Human factors in Informatica 45 (2021) 335–345 345 computing systems. New York, NY, USA, 2010. ACM. [69] Stevenson, R. (2005). Plugging the “Phishing” Hole: Legislation versus Technology. Duke Law and Technology Review, 2005(6). [70] Leyden, J. (2006). Florida Man Indicted over Katrina Phishing Scam. The Register (U.K.), http://www.theregister.com/2006/08/18/hurricane_k _phishing_scam/ [Accessed Oct 10, 2019] [71] Associated Press (2009). Dozens Charged in Phishing Scam. http://www.independent.co.uk/life-style/gadgets-and-tech/news/dozens-charged-in­phishing-scam-1799482.html [Accessed Oct 8, 2019] [72] Phillips, K. (2017). Police Arrest Millennial Behind Multi-Million Dollar IRS Phone Scam. Forbes (US). https://www.forbes.com/sites/kellyphillipserb/2017/ 04/10/police-arrest-millennial-behind-multi-million­dollar-irs-phone-scam/#1bb604206ffc [Accessed Oct 6, 2019] [73] Robila, S., and Ragucci, J. (2006). Don't be a phish: steps in user education. Proceedings of the 11th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education. ACM Press, New York, NY, pp. 237-41. [74] Lungu, I., and Tabusca, A. (2010). Optimising anti-phishing solutions based on user awareness, education and the use of the latest web security solutions. Informatica Economica Journal,14(2): 27­36. 
Machine Learning with Remote Sensing Image Datasets

Biserka Petrovska
Ministry of Defense, Republic of North Macedonia
E-mail: biserka.petrovska@morm.gov.mk

Tatjana Atanasova-Pacemska, Natasa Stojkovik, Aleksandra Stojanova and Mirjana Kocaleva
Faculty of Computer Science, University "Goce Delcev," Republic of North Macedonia
E-mail: tatjana.pacemska@ugd.edu.mk; natasa.stojkovik@ugd.edu.mk; aleksandra.stojanova@ugd.edu.mk; mirjana.kocaleva@ugd.edu.mk

Keywords: machine learning, remote sensing, convolutional neural networks, transfer learning, feature extraction, fine-tuning

Received: September 1, 2020

Computer vision, as a part of machine learning, gains significant attention from researchers nowadays. Aerial scene classification is a prominent area of computer vision with vast applications: military, surveillance and security, environment monitoring, detection of geospatial objects, etc. There are several publicly available remote sensing image datasets, which enable the deployment of various aerial scene classification algorithms. In our article, we use transfer learning from pre-trained deep Convolutional Neural Networks (CNN) for remote sensing image classification. The neural networks utilized in our research are high-dimensional CNNs previously trained on the ImageNet dataset. Transfer learning can be performed through feature extraction or fine-tuning. We propose a two-stream feature extraction method followed by image classification with a handcrafted classifier. Fine-tuning was performed with adaptive learning rates and label smoothing as a regularization method. The proposed transfer learning techniques were validated on two remote sensing image datasets: the WHU-RS dataset and the AID dataset. Our proposed method obtained competitive results compared to state-of-the-art methods.

Povzetek: Metoda prenesenega učenja je uporabljena za analizo posnetkov iz zraka na nekaj referenčnih bazah.

1 Introduction

Scene classification is the process of assigning a semantic label to remote sensing (RS) images [1, 2]. It is one of the crucial tasks in aerial image understanding. Aerial scene classification is possible due to the existence of several RS image datasets collected from satellites, aerial systems, and unmanned aerial vehicles (UAV). Remote sensing image classification has found use in many fields: military applications, traffic observation, and disaster monitoring [3, 4]. The problem of aerial scene classification is complex because remote sensing images have a compound composition and are rich in spatial and textural features. This is the reason for developing numerous scene classification methods. Remote sensing image classification methods that rely on feature extraction can be categorized into one of the following groups: methods that use low-level image features, methods that use mid-level image features, and methods that utilize high-level image representations. Methods using low-level image features perform aerial scene classification with low-level visual descriptors: spectral, textural, structural, etc. The Scale Invariant Feature Transform (SIFT) is a local descriptor that captures local fluctuations of structures in remote sensing images [5]. Other descriptors rely on the statistical and global distribution of certain image characteristics, such as color [6] and texture [7]. Different color and texture descriptors, like color histograms and local binary pattern (LBP) descriptors, are comparatively analyzed in [8].
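As a small, concrete illustration of one of these low-level texture descriptors, the sketch below computes a uniform LBP histogram with scikit-image; the image array and the (P, R) neighbourhood settings are illustrative assumptions, not values taken from the studies cited above.

```python
import numpy as np
from skimage.feature import local_binary_pattern

# Synthetic stand-in for a grey-scale aerial image; replace with a real image array.
image = np.random.rand(128, 128)

# Illustrative settings: 8 sampling points on a radius-1 circle, "uniform" patterns.
P, R = 8, 1
lbp = local_binary_pattern(image, P, R, method="uniform")   # per-pixel LBP codes

# A normalised histogram of the codes serves as a global texture descriptor.
n_bins = P + 2                                              # uniform LBP yields P + 2 codes
hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
print(hist)
```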
Remote sensing classification in [9] is performed with compound features built from six different types of descriptors: SIFT, radiometric features, the Grey Level Co-Occurrence Matrix (GLCM), Gaussian wavelet features, shape features, and Gabor filters, with varying spatial resolution. Other descriptors used by researchers are the orientation difference descriptor [10] and the Enhanced Gabor Texture Descriptor (EGTD) [11]. For aerial scene classification, the authors in [12] use completed local binary patterns with multiple scales (MS-CLBP) and achieved state-of-the-art results compared to other methods based on low-level image features. Mid-level feature methods try to represent aerial images with a higher-level statistical representation obtained from the extracted local image features. The first step in these methods is to extract local image features from local patches using descriptors like SIFT or color histograms. The second step is to encode those features to obtain a mid-level representation of the remote sensing images. A widely used mid-level method is bag-of-visual-words (BoVW) [13]. The first step of BoVW is to extract features with SIFT from local image patches [14], and afterward to learn a so-called visual dictionary or visual codebook, that is, a vocabulary of visual words. In aerial scene classification tasks, the basic BoVW technique can be combined with various local descriptors [14]: GIST, SIFT, color histograms, LBP. Another mid-level method relies on sparse coding [15], where extracted low-level features such as structural, spectral, and textural ones are encoded. Improvements in classification accuracy can be obtained with Principal Component Analysis (PCA), which enables dimensionality reduction of the extracted features before fusing them into compound representations, or with methods such as the Improved Fisher Vector (IFK) [16] and Vectors of Locally Aggregated Tensors (VLAT) [17]. Improved BoVW models can be found in the literature, such as the spatial pyramid co-occurrence kernel (SPCK) [18], which integrates absolute and relative spatial information. This method combines the concepts of the spatial pyramid match kernel (SPM) [19] and the spatial co-occurrence kernel (SCK) [13]. In [20], a pyramid-of-spatial-relations (PSR) model is presented, which includes absolute and relative spatial relations of local low-level features. The third group of feature extraction methods for image classification relies on high-level vision information. The latest techniques, which include high-level features based on CNN learning, have shown significant improvements in classification accuracy compared to older low-level and mid-level feature methods. High-level methods can acquire more abstract and discriminative semantic representations, which leads to improved classification performance. Feature extraction with deep neural networks previously trained on the ImageNet data set [21] results in significant performance for aerial scene classification [22]. The remote sensing image classification accuracy achieved with GoogLeNet [23] can be improved with a multi-ratio input strategy for multi-view CNN learning. Multi-scale image features are extracted from the last convolutional layer of a CNN [24] and then encoded with BoVW [25], the Vector of Locally Aggregated Descriptors (VLAD) [26], and the Improved Fisher Kernel (IFK) [16] to compose the final image representation. Nogueira et al. [27] extracted global features from CNN architectures and fed them to a classifier.
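The codebook-and-histogram encoding behind BoVW, described above, can be sketched in a few lines of Python; the descriptors here are random stand-ins for real local features such as SIFT vectors, and the vocabulary size is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for local descriptors (e.g. SIFT vectors) pooled from training images;
# in practice these would come from a keypoint detector, not random numbers.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(2000, 128))     # 2000 descriptors, 128-D like SIFT

k = 64                                               # size of the visual vocabulary
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

def bovw_histogram(image_descriptors: np.ndarray) -> np.ndarray:
    """Encode one image as a normalised histogram of visual-word assignments."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

# One image's descriptors (again synthetic) -> its mid-level BoVW representation.
image_descriptors = rng.normal(size=(350, 128))
print(bovw_histogram(image_descriptors).shape)       # (64,)
```

A classifier such as an SVM can then be trained on these histograms.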
In all of the examples mentioned above, the global or local extracted features were obtained from CNNs previously trained on massive data sets like ImageNet, formed of natural images. The extracted features were then utilized for remote sensing image classification. Another method of transfer learning is the fine-tuning of CNN weights. It is a technique in which the original classification layer of the pre-trained CNN (usually a softmax layer) is replaced with a new one that contains a number of nodes equal to the number of classes of the target dataset. The altered CNN is trained with a random initialization of the new layers, while the remaining layers start from the pre-trained weights. Compared to training a neural network with random weight initialization, fine-tuning reaches a better minimum of the loss function. The authors in [28] achieved significant performance improvement by fine-tuning a pre-trained CNN. They experimented with AlexNet [29] and obtained a better outcome for semantic segmentation. Also, there are several papers in the remote sensing community [30], [31] that surveyed the benefits of fine-tuning CNNs. A comparison between a CNN trained from scratch and a fine-tuned one showed the advantages of fine-tuning on aerial scene data [31]. The fine-tuning method can also be useful for the classification of hyperspectral images [30]. Fine-tuning the weights of pre-trained CNNs results in the extraction of better scene features [32]. This transfer learning technique, performed on neural networks previously trained on the ImageNet dataset, results in good classification accuracy on remote sensing image data sets [24], [27]. Our previous work, [53], [54], showed that transfer learning techniques, feature extraction as well as fine-tuning, are superb methods for aerial scene classification. Besides the two transfer learning methods described above, another alternative is to train a CNN from scratch, i.e., with random initialization of the network weights. This solution shows low classification accuracy for small-scale aerial scene datasets [27]. Full network training of CaffeNet and GoogLeNet resulted in poor classification results for the UC-Merced dataset [13] and the WHU-RS19 dataset [33]. However, full CNN training using large-scale datasets like AID [34] and NWPU-RESISC45 [35] has obtained good results. In this paper, we evaluate various CNN models on the task of high-resolution aerial scene classification. We utilize convolutional neural networks pre-trained on the ImageNet data set with a twofold purpose: as feature extractors and for fine-tuning on particular remote sensing datasets. When we use pre-trained CNNs as feature extractors, we try to form better features for aerial imagery. Thus, we acquire activations from different CNN layers: the average pooling layer, the last convolutional layer, and one of the intermediate convolutional layers. In order to enable the fusion of features from convolutional layers with those from the average pooling layer, a feature dimensionality reduction method is applied to the convolutional-layer features. The compound features of the image scenes are processed by a linear classifier to determine the image classes. In the second experimental setup, we explore the fine-tuning of network weights on remote sensing imagery. We trained the CNNs with adaptive learning rates, a linear decay schedule and cyclical learning rates, and assessed whether they are appropriate for fine-tuning pre-trained CNNs on aerial scene imagery.
In order to achieve classification accuracy comparable to state-of-the-art methods, we included label smoothing as a regularization technique and assessed its impact on the experimental results. The main contributions of this paper are (1) an evaluation of transfer learning techniques with various CNN models on two remote sensing image datasets, (2) an analysis of the impact of fused features, obtained by concatenating activations from different pre-trained CNN layers, on classification accuracy, (3) an assessment of the influence of adaptive learning rates on the fine-tuning method from the aspect of classification accuracy, and (4) a comparison of the proposed transfer learning techniques with state-of-the-art methods, providing a baseline for aerial imagery classification.

The remainder of this article is organized as follows. In Section 2, the methodologies used for transfer learning from CNNs are presented. Experimental results obtained with the examined remote sensing image classification methods are presented in Section 3. A discussion of the factors influencing our method's results, as well as a summary and the conclusions of the paper, is given in Section 4.

2 Materials and methods

This section of the article gives a short description of the pre-trained CNNs used for transfer learning: InceptionV3, ResNet50, Xception, and DenseNet121. Following that, we introduce PCA for dimensionality reduction, and the linear decay scheduler and cyclical learning rates as methods used in transfer learning. Next, we present the two publicly available data sets, WHU-RS19 and AID, included in our experiments. Finally, the experimental setup and the evaluation metrics are given.

2.1 Convolutional neural networks

ResNet won the classification task of ILSVRC-2015. ResNet is a deep CNN that can have up to 152 layers [49]. It is similar to the VGG model because it contains mostly 3x3 filters, but the number of filters is smaller and the network is simpler [49]. Deep learning architectures can suffer from high training error and the vanishing gradient problem. The solution to the vanishing gradient problem is to include a residual module in the neural network. The residual module, as shown in Figure 1, has a shortcut connection between the input and the output.

The first inception-based network was named Inception-v1 or GoogLeNet [50]. The GoogLeNet architecture includes inception modules, which decrease the number of learnable parameters. The original inception module (Figure 2) has a pooling layer and convolutional layers with kernel sizes 1x1, 3x3, and 5x5. The module output is obtained by concatenating the outputs of these layers. Inception-based networks rely on the fact that correlations between image pixels are local. The number of learnable parameters is reduced by exploiting these local correlations. Inception-v3 [37] is the third iteration of the inception-based networks. It contains three different types of inception modules: type 1, obtained by factorizing into smaller convolutions; type 2, obtained by factorizing into asymmetric convolutions; and type 3, included to improve high-dimensional representations.

Another deep CNN, similar to Inception, is Xception. In Xception, the inception module is replaced with depthwise separable convolutional layers [51]. This CNN is a stack of depthwise separable convolutional layers with shortcut connections.
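The shortcut idea behind ResNet's residual module, and the residual connections in Xception mentioned above, can be sketched in a few lines of Keras; this is a minimal identity block with illustrative sizes, not the exact bottleneck design used in ResNet50.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters: int):
    """Minimal identity residual block: two 3x3 convolutions plus a shortcut."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])       # the shortcut connection between input and output
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(56, 56, 64))     # illustrative feature-map size
outputs = residual_block(inputs, filters=64)    # filters must match the input channels here
model = tf.keras.Model(inputs, outputs)
```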
A depthwise separable convolution is separated into two phases. The first phase is a spatial convolution applied separately to each input channel, the so-called depthwise convolution. After that, a pointwise convolution follows: a 1x1 convolution that projects the output channels of the depthwise convolution onto a new channel space.

The Dense Convolutional Network (DenseNet) [52] maximizes the information flow between network layers by connecting layers to each other in a feed-forward manner. The only condition for such connections is that the layers have matching feature-map dimensions. The input to each layer consists of the feature maps of the preceding layers, and its own feature maps are passed on to all subsequent layers as their input. In contrast to ResNets, the authors [52] fuse features by concatenation rather than by adding them together before passing them to the following layer. The network takes its name from this dense connectivity pattern. The dense pattern means that there is no need to relearn redundant feature maps, which leads DenseNet to have a smaller number of parameters than other deep CNNs.

Figure 2: The architecture of a basic inception module.

2.2 Principal Component Analysis

In our experiments, we used Principal Component Analysis (PCA) as a dimensionality-reduction technique. It establishes a new set of basis vectors and then projects the data from the original representation to a representation with fewer dimensions. The new dimensions are orthogonal to each other, independent, and ordered by the variance of the data they contain. The first principal component is the one with the highest variance. The new data matrix consists of n data points with k features each:

X'[n x k] = X[n x d] * W[d x k]    (1)

where X is the original data matrix with d features per data point and W holds the k leading principal components. The covariance matrix is symmetric: the variance of every dimension is on the main diagonal, and the covariances between dimensions are placed elsewhere. PCA is a dimensionality reduction method that captures most of the variance of the data within a smaller number of dimensions.

2.3 Adaptive learning rates

The most crucial hyperparameters for CNN training are the initial learning rate, the number of training epochs, the learning rate schedule, and the regularization method (L2, dropout). A constant learning rate might be a reasonable choice in some instances, but more often an adaptive learning rate is more beneficial. When training a CNN, we are trying to find a global minimum, a local minimum, or at least a part of the loss surface with adequately low loss. If we train the network with a constant but large learning rate, we cannot reach the desired valley of the loss terrain. But if we adapt (decrease) our learning rate, the neural network can descend into more optimal parts of the loss landscape. In our proposed fine-tuning method, we use a linear decay schedule, which decays the learning rate to zero at the end of network training. The learning rate a in every training epoch is given by

a = a_I * (1 - E / E_max)    (2)

where a_I is the learning rate at the beginning of training, E is the number of the current epoch, and E_max is the overall number of epochs. Cyclical Learning Rates (CLR) are another form of adaptive learning rates. In this case, there is no need to determine the optimal initial learning rate and schedule for the learning rate when we train the CNN [36].
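A minimal Python sketch of the two adaptive policies used for fine-tuning, the linear decay of Eq. (2) and the triangular cyclical policy of [36] described next, is given below; the numeric values are illustrative defaults rather than the settings used in the experiments.

```python
def linear_decay(epoch: int, initial_lr: float = 1e-3, max_epochs: int = 50) -> float:
    """Eq. (2): a = a_I * (1 - E / E_max), decaying linearly to zero."""
    return initial_lr * (1.0 - epoch / max_epochs)

def triangular_clr(iteration: int, base_lr: float = 1e-4,
                   max_lr: float = 1e-3, step_size: int = 2000) -> float:
    """Triangular CLR policy [36]: oscillate linearly between base_lr and max_lr."""
    cycle = 1 + iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Either schedule can drive Keras training, e.g. via
# tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: linear_decay(epoch)).
```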
When the network is trained with learning rate schedules, the learning rate is continuously reduced, but CLR allows the learning rate to oscillate between pre-defined limits. Network training with CLR converges faster with fewer hyperparameter updates. The authors in [36] define a few CLR policies: the triangular policy, shown in Figure 3, the triangular2 policy, and the exponential range policy. The triangular policy, as can be seen in Figure 3, is a triangular pattern: the learning rate oscillates linearly between a fixed lower limit and an upper limit. The triangular2 policy is similar to the triangular policy, except that the upper limit of the learning rate is halved after every cycle. As a result, training with the triangular2 policy is more stable. The exponential range policy applies an exponential decline to the maximum learning rate.

Figure 3: Cyclical learning rate with a triangular policy model.

Figure 4: Some images of different classes from (i) the WHU-RS and (ii) the AID data set.

2.4 Remote sensing datasets

We test our proposed transfer learning techniques on two common aerial scene data sets: the WHU-RS data set [43] and the aerial image dataset (AID) [34]. The WHU-RS data set [43] is selected from Google Earth imagery, and the images are collected from all over the world. There are 19 image classes, with at least 50 images per class and 1005 images in total. Image dimensions are 600x600 pixels. The WHU-RS data set has been extensively used in experimental studies of remote sensing classification tasks. The image classes in the WHU-RS data set are airport, beach, bridge, commercial, desert, farmland, football field, forest, industrial, meadow, mountain, park, parking, pond, port, railway station, residential, river, and viaduct. The aerial image dataset (AID) has approximately 10,000 remote sensing images assigned to 30 classes: airport, bare land, baseball field, beach, bridge, center, church, commercial, dense residential, desert, farmland, forest, industrial, meadow, medium residential, mountain, park, parking, playground, pond, port, railway station, resort, river, school, sparse residential, square, stadium, storage tanks, and viaduct. Image dimensions are 600x600 pixels with a pixel resolution of half a meter. The images are obtained from Google Earth imagery. They are picked from various world regions, at different times of the year and under different climate conditions: mostly from China, Japan, Europe, and the United States.

2.5 Experimental setup and evaluation metrics

The first proposed transfer learning method was based on feature extraction. We used four different pre-trained CNNs and extracted features from three different layers of each of them. For ResNet50 the layers were bn4f_branch2c, the last convolutional layer, and the average pooling layer; for InceptionV3: mixed_8, mixed_10, and the average pooling layer; for Xception: block14_sepconv1_act, block14_sepconv2_act, and the average pooling layer; and for DenseNet121: conv4_block24_concat, the last convolutional layer, and the average pooling layer. Before feature extraction, the data set images were resized and pre-processed according to the requirements of each pre-trained CNN. Data augmentation was applied to the images of the training set: five patches of each training image were made with rotation, shifting, shearing, zooming, and flipping. The feature extraction was performed for the WHU-RS data set, for 60%/40% and 40%/60% training/test split ratios. The splits are random and without stratification.
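Putting the extraction just described together with the fusion and SVM training detailed in the next paragraphs, a rough Keras/scikit-learn sketch of the first method might look as follows; the images and labels are synthetic stand-ins, the PCA size is an illustrative choice, and in current Keras releases the InceptionV3 layer is named "mixed8" (the mixed_8 layer referred to in the text), so check model.summary() if your version differs.

```python
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC

# Two pre-trained streams: an intermediate InceptionV3 layer and the ResNet50
# average-pooling output.
inc = tf.keras.applications.InceptionV3(weights="imagenet", include_top=False,
                                        input_shape=(299, 299, 3))
res = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                     pooling="avg", input_shape=(224, 224, 3))
inc_mixed8 = tf.keras.Model(inc.input, inc.get_layer("mixed8").output)

# Synthetic stand-ins for the resized (and augmented) training images and labels.
x299 = np.random.rand(32, 299, 299, 3).astype("float32") * 255
x224 = np.random.rand(32, 224, 224, 3).astype("float32") * 255
y = np.random.randint(0, 4, size=32)

f_conv = inc_mixed8.predict(tf.keras.applications.inception_v3.preprocess_input(x299))
f_pool = res.predict(tf.keras.applications.resnet50.preprocess_input(x224))
f_conv = f_conv.reshape(len(f_conv), -1)            # flatten the convolutional activations

pca = PCA(n_components=16)                          # illustrative size; tune on real data
fused = np.hstack([normalize(pca.fit_transform(f_conv)),   # L2-normalised PCA features
                   normalize(f_pool)])                      # L2-normalised pooling features
clf = LinearSVC(C=1.0).fit(fused, y)                # linear SVM on the compound features
```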
We included feature fusion to improve the classification performance of the proposed method. First, image features were extracted from two different layers of two CNNs, such that one layer was the average pooling layer and the other was one of the aforementioned convolutional layers. Then, we applied a PCA transformation to the convolutional-layer features. Before the features were concatenated, we performed L2 normalization on the PCA-transformed convolutional-layer features and on the average-pooling-layer features. After the feature fusion, a linear Support Vector Machine (SVM) classifier is trained on the compound features. The SVM is a classifier described by a separating hyperplane. An SVM model has four hyperparameters: the type of kernel, the strength of regularization, gamma, and the margin. The kernel can be linear, exponential, polynomial, etc. We used a linear kernel in our experimental setup. The prediction of a classifier with a linear kernel is given by

f(x) = B(0) + sum_i a_i * (x . x_i)    (3)

The output of the classifier is acquired from the dot products of the input (x) with each of the support vectors (x_i). The model computes the inner product of each input vector (x) with all support vectors from the training images. The learning algorithm determines the coefficients B(0) and a_i from the training data. In our proposed feature extraction method, we tuned the regularization parameter. During SVM optimization, the regularization parameter regulates to what extent the misclassification of each training image is taken into consideration.

Our proposed fine-tuning method for aerial scene classification, as a form of transfer learning, is carried out with adaptive learning rates as well as label smoothing. Label smoothing is a regularization method that fights against overfitting and leads to a better generalization of the CNN model. Our models can be expected to overfit because we use high-dimensional pre-trained CNNs and fine-tune them on a data set that has only a couple of thousand images. Label smoothing [37] improves classification accuracy by evaluating the cost function with "soft" labels (a weighted sum of the labels and a uniform distribution) instead of "hard" labels. When we apply label smoothing with parameter a, we minimize the cost function between the "smoothed" labels y_k^LS and the network output p_k; the smoothed labels are given by

y_k^LS = y_k * (1 - a) + a / K    (4)

where y_k are the real labels and K is the number of classes. Label smoothing was applied only to the training images. In-place data augmentation was used for the training images as well. In the simulation scenario, we included four pre-trained CNNs: ResNet50, InceptionV3, Xception, and DenseNet121, and the images of the target data set were resized according to the requirements of each CNN. The fine-tuning method was applied to the AID data set, and the experiments were performed with 50%/50% and 20%/80% train/test data split ratios. To prepare the pre-trained CNNs for fine-tuning, we removed from each network the layers after the average pooling layer. On top of this, a new CNN head was constructed by adding a fully connected layer, a dropout layer, and a softmax classifier. We started fine-tuning by warming up the new CNN head: the new network layers were trained with an RMSprop optimizer and a constant learning rate. The fine-tuning then continued with Stochastic Gradient Descent (SGD), and training was performed on all network layers.
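A condensed Keras sketch of this head replacement and two-phase fine-tuning is given below, using ResNet50 and the 30 AID classes; the head width, dropout rate, smoothing factor, and learning rates are illustrative assumptions rather than the exact experimental settings, and the fit calls are left commented out because they need the prepared, augmented training batches.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 30                                     # AID has 30 scene classes

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      pooling="avg", input_shape=(224, 224, 3))

# New head on top of the average pooling layer: fully connected, dropout, softmax.
x = layers.Dense(256, activation="relu")(base.output)     # head width is an assumption
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)

# Label smoothing softens the one-hot targets, as in Eq. (4); expects one-hot labels.
loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

# Phase 1: warm up only the new head with RMSprop and a constant learning rate.
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3), loss=loss,
              metrics=["accuracy"])
# model.fit(train_batches, epochs=5)

# Phase 2: unfreeze all layers and continue with SGD and an adaptive learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.SGD(1e-4, momentum=0.9), loss=loss,
              metrics=["accuracy"])
# model.fit(train_batches, epochs=50,
#           callbacks=[tf.keras.callbacks.LearningRateScheduler(linear_decay)])
```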
Different simulation scenarios were carried out with the linear decay schedule and with cyclical learning rates with the triangular policy. For the linear decay schedule, the initial learning rate was selected to be relatively small, 1-2 orders of magnitude lower than the initial learning rate of the originally trained CNN. The cyclical learning rates oscillated between a maximum and a minimum bound, with the optimal learning rate somewhere in between. The step size was equal to 4 or 8 times the number of training iterations in an epoch, and the number of epochs was chosen to contain an integer number of cycles.

In our paper, we use the following evaluation metrics: Overall Accuracy (OA, i.e., classification accuracy) and the confusion matrix. OA is the ratio between the number of correctly classified test images and the total number of test images; it is at most 1 (100%). The confusion matrix is a table that represents the per-class accuracy of each image class. This display shows the errors of every single class and the confusion between classes: the columns correspond to the predicted classes, and the rows to the real classes. Better classification accuracy leads to higher values on the main diagonal of the confusion matrix and lower values for the other entries. To check the reliability of the results, all cases are repeated ten times (five times for the fine-tuning method). After that, the mean value and the standard deviation (SD) of each experiment are calculated.

3 Results

3.1 Classification of WHU-RS dataset

The feature extraction transfer learning method was evaluated on the WHU-RS data set. The accuracy of SVM classification of the compound features, formed from the average-pooling-layer features and the PCA-transformed convolutional-layer features, is shown in Table 1. Table 2 presents a comparative analysis of the proposed feature extraction method against competitive classification methods. It can be concluded that feature fusion with PCA transformation is a technique that achieves state-of-the-art classification accuracies. Under a training ratio of 40% of the WHU-RS data set, this method outperforms all the other classification methods. Figure 5 and Figure 6 show the confusion matrices without normalization obtained from the classification of the WHU-RS data set under 60% training data with InceptionV3 mixed_8 (PCA) and ResNet50 average pooling, and under 40% training data with DenseNet121 conv5_block16_concat (PCA) and ResNet50 average pooling.

3.2 Classification of AID dataset

The experimental results of the fine-tuning method for classification of the AID dataset are displayed in Table 3. As can be seen from Table 3, the linear decay scheduler gives better classification results for a 50%/50% train/test split ratio for ResNet50 and InceptionV3, while the cyclical learning rate works better for Xception and DenseNet121. For the 20%/80% train/test split ratio, the linear decay scheduler is a better option for ResNet50, Xception, and DenseNet121, and cyclical learning rates for InceptionV3. Table 4 is a comparative display of our fine-tuning method against other state-of-the-art techniques.
Method                                                              60% training ratio   40% training ratio
ResNet50 last conv layer (PCA) and InceptionV3 average pool                98.26                95.02
ResNet50 last conv layer (PCA) and Xception average pool                   97.62                96.52
InceptionV3 mixed_10 (PCA) and ResNet50 average pool                       96.27                95.85
InceptionV3 mixed_8 (PCA) and ResNet50 average pool                        98.01                98.67
InceptionV3 mixed_10 (PCA) and Xception average pool                       96.77                96.02
InceptionV3 mixed_8 (PCA) and Xception average pool                        98.01                96.35
DenseNet121 conv5_block16_concat (PCA) and ResNet50 average pool           98.76                98.34
DenseNet121 conv4_block24_concat (PCA) and ResNet50 average pool           96.77                96.52

Table 1: Classification accuracy (%) of the feature extraction method with the WHU-RS data set.

Method                                                                    60% training set    40% training set
Bag of SIFT [20]                                                            85.52 ± 1.23            /
Multi Scale Completed LBP + BoVW [44]                                       89.29 ± 1.30            /
GoogLeNet [34]                                                              94.71 ± 1.33       93.12 ± 0.82
VGG-VD-16 [34]                                                              96.05 ± 0.91       95.44 ± 0.60
CaffeNet [34]                                                               96.24 ± 0.56       95.11 ± 1.20
salM3LBP-CLM [45]                                                           96.38 ± 0.82       95.35 ± 0.76
TEX-Network-LF [46]                                                         96.62 ± 0.49       95.89 ± 0.37
InceptionV3 mixed_8 (PCA) and ResNet50 average pool (ours)                  98.13 ± 0.51            /
DCA by concatenation [47]                                                   98.70 ± 0.22       97.61 ± 0.36
Addition with saliency detection [48]                                       98.92 ± 0.52       98.23 ± 0.56
DenseNet121 conv5_block16_concat (PCA) and ResNet50 average pool (ours)          /            98.26 ± 0.40

Table 2: Classification accuracy (%) and standard deviation of the state-of-the-art methods with the WHU-RS data set (60% and 40% of the data used as the training set).

Figure 5: Confusion matrix of the feature extraction method under 60% training data of the WHU-RS data set.

Figure 6: Confusion matrix of the feature extraction method under 40% training data of the WHU-RS data set.

Our method achieved its best classification results on the AID dataset for InceptionV3, under 50% training data with the linear decay scheduler and under 20% training data with cyclical learning rates. As can be concluded from Table 4, some methods outperformed the proposed fine-tuning technique, such as EfficientNet-B3-aux [38]. The authors in [38] fine-tuned the EfficientNet-B3 network with an auxiliary classifier. The explanation for their better classification results might be that this network has achieved better top-1 classification accuracy on the ImageNet data set than the CNNs utilized in our experimental setup. Figure 7 displays the confusion matrix of the AID dataset classification for the proposed fine-tuning method with a 20%/80% train/test split ratio, ResNet50, cyclical learning rates, and a softmax classifier. The main diagonal shows the number of correctly predicted test images; the other elements give the misclassified test images.

Method                                20% training ratio   50% training ratio
ResNet50, linear decay scheduler         93.06 ± 0.16         95.62 ± 0.15
ResNet50, cyclical learning rate         92.91 ± 0.35         95.52 ± 0.28
InceptionV3, linear decay scheduler      93.7 ± 0.33          96.41 ± 0.23
InceptionV3, cyclical learning rate      93.79 ± 0.24         95.95 ± 0.2
Xception, linear decay scheduler         93.67 ± 0.18         96.14 ± 0.12
Xception, cyclical learning rate         93.44 ± 0.10         96.15 ± 0.17
DenseNet121, linear decay scheduler      93.74 ± 0.24         96.03 ± 0.16
DenseNet121, cyclical learning rate      93.54 ± 0.15         96.21 ± 0.19

Table 3: Overall accuracy (%) and standard deviation of the fine-tuning method with the AID data set.
Method                                           20% training ratio   50% training ratio
CaffeNet [34]                                       86.86 ± 0.46         89.53 ± 0.31
MCNNs [55]                                               /              91.80 ± 0.22
Fusion by concatenation [39]                             /              91.87 ± 0.36
TEX-Net-LF [46]                                     90.87 ± 0.11         92.96 ± 0.18
VGG-16 (fine-tuning) [40]                           89.49 ± 0.34         93.60 ± 0.64
Multilevel fusion [56]                                   /              95.36 ± 0.22
GBNet + global feature [40]                         92.20 ± 0.23         95.48 ± 0.25
InceptionV3-CapsNet [41]                            93.79 ± 0.13         96.32 ± 0.12
InceptionV3 with linear decay scheduler (ours)      93.7 ± 0.33          96.41 ± 0.23
InceptionV3 with cyclical learning rate (ours)      93.79 ± 0.24         95.95 ± 0.2
EfficientNet-B3-aux [38]                            94.19 ± 0.15         96.56 ± 0.14
GCFs + LOFs [42]                                    92.48 ± 0.38         96.85 ± 0.23

Table 4: Overall accuracy (%) and standard deviation of the fine-tuning method compared to state-of-the-art methods for the AID data set.

Figure 7: Confusion matrix of the fine-tuning technique under 20% training data of the AID dataset for ResNet50, cyclical learning rates, and softmax classifier.

Figure 8: Training plot of the fine-tuning technique under 50% training data of the AID data set for InceptionV3, cyclical learning rate, and softmax classifier.

Figure 8 shows the training plot for the fine-tuning of InceptionV3 with a 50%/50% train/test split ratio of the AID data set, with cyclical learning rates and a softmax classifier. The plot shows the fine-tuning when all layers are "trainable", with an SGD optimizer. The plot has a characteristic shape for training with cyclical learning rates: the training and validation loss lines are "wavy". Because we fine-tuned the network with smoothed training labels, the training loss is higher than the validation loss.

4 Discussion and conclusion

From the completed simulations and the obtained results, the following valuable conclusions can be summed up:
- When it comes to the feature extraction method, InceptionV3 and DenseNet121 are the pre-trained CNNs that give the highest classification accuracies. As presented in Table 1, the best experimental results on the WHU-RS dataset are obtained when features from these networks' layers are fused. From Table 3 it is evident that InceptionV3 outperforms the other pre-trained CNNs in transfer learning through fine-tuning on the AID data set;
- The most suitable layer for feature extraction is mixed_8 from InceptionV3. It gives good classification results with the ResNet50, as well as the DenseNet121, average pooling layer. The ResNet50 average pooling layer also gives significant classification results when it is combined with DenseNet121 convolutional layers, either the last or the intermediate ones;
- For the fine-tuning method, under a 50% training data ratio the linear learning rate decay scheduler gives better classification results for the ResNet50 and InceptionV3 pre-trained networks, and cyclical learning rates are a better choice for Xception and DenseNet121. Under a 20% training data ratio, the learning rate decay scheduler works better for ResNet50, Xception, and DenseNet121, and cyclical learning rates are a better choice for InceptionV3;
- Our proposed transfer learning methods give classification accuracies comparable to state-of-the-art techniques. The feature fusion method with PCA transformation gives a classification accuracy of 98.26 ± 0.40 under a 40% training ratio of the WHU-RS dataset, which outperforms the other methods in the literature. For the fine-tuning method applied to the AID dataset, some methods obtain better experimental results compared to ours, such as EfficientNet-B3-aux [38], and the reason for their better classification accuracy might be the type of pre-trained CNN utilized in the scenario.

In our paper, we proposed two distinct transfer learning techniques for remote sensing image classification.
In our paper, we proposed two distinct transfer learning techniques for remote sensing image classification. The feature extraction method utilizes the concatenation of extracted features from different CNNs' layers with a prior PCA transformation. The fine-tuning method includes adaptive learning rates and label smoothing. With both transfer learning methods, we have achieved significant classification results on the two data sets. The proposed feature extraction technique can be further explored with feature extraction from lower layers of pre-trained CNNs, as well as with stratification of the train/test data split. For future development of the fine-tuning method, we suggest including different types of pre-trained CNNs apart from the ones used in this article, such as EfficientNets, and involving a learning rate finder [36] to discover optimal values for the initial learning rate or the limits for cyclical learning rates.

References

[1] A. Qayyum, A. S. Malik, N. M. Saad, M. Iqbal, M. F. Abdullah, and W. Rasheed, Scene classification for aerial images based on CNN using sparse coding technique. International Journal of Remote Sensing, vol. 38, pp. 2662–2685, 2017.
[2] J. Gan, Q. Li, Z. Zhang, and J. Wang, Two-level feature representation for aerial scene classification. IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 11, pp. 1626–1630, 2016.
[3] W. Yang, X. Yin, and G. S. Xia, Learning high-level features for satellite image classification with limited labeled samples. IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 8, pp. 4472–4482, 2015.
[4] F. Huang and L. Yan, Hull vector-based incremental learning of hyperspectral remote sensing images. Journal of Applied Remote Sensing, vol. 9, no. 1, Article ID 096022, 2015.
[5] D. G. Lowe, Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[6] M. J. Swain and D. H. Ballard, Color indexing. International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, 1991.
[7] V. Risojevic and Z. Babic, Aerial image classification using structural texture similarity. IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2011, pp. 190–195.
[8] J. A. dos Santos, O. A. B. Penatti, and R. da Silva Torres, Evaluating the potential of texture and color descriptors for remote sensing image retrieval and classification. In VISAPP (2), 2010, pp. 203–208.
[9] B. Luo, S. Jiang, and L. Zhang, Indexing of remote sensing images with different resolutions by multiple features. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 4, pp. 1899–1912, 2013.
[10] V. Risojevic and Z. Babic, Orientation difference descriptor for aerial image classification. International Conference on Systems, Signals, and Image Processing (IWSSIP). IEEE, 2012, pp. 150–153.
[11] V. Risojevic and Z. Babic, Fusion of global and local descriptors for remote sensing image classification. IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 4, pp. 836–840, 2013.
[12] C. Chen, B. Zhang, H. Su, W. Li, and L. Wang, Land-use scene classification using multi-scale completed local binary patterns. Signal, Image, and Video Processing, pp. 1–8, 2015.
[13] Y. Yang and S. Newsam, Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2010, pp. 270–279.
[14] L. Chen, W. Yang, K. Xu, and T. Xu, Evaluation of local features for scene classification using VHR satellite images. Joint Urban Remote Sensing Event (JURSE). IEEE, 2011, pp. 385–388.
[15] G. Sheng, W. Yang, T. Xu, and H. Sun, High-resolution satellite scene classification using a sparse coding based multiple feature combination. International Journal of Remote Sensing, vol. 33, no. 8, pp. 2395–2412, 2012.
[16] F. Perronnin, J. Sanchez, and T. Mensink, Improving the Fisher kernel for large-scale image classification. Proc. European Conference on Computer Vision, 2010, pp. 143–156.
[17] R. Negrel, D. Picard, and P.-H. Gosselin, Evaluation of second-order visual features for land-use classification. International Workshop on Content-Based Multimedia Indexing (CBMI). IEEE, 2014, pp. 1–5.
[18] Y. Yang and S. Newsam, Spatial pyramid co-occurrence for image classification. IEEE International Conference on Computer Vision (ICCV). IEEE, 2011, pp. 1465–1472.
[19] S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. Proc. IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2169–2178.
[20] S. Chen and Y. Tian, Pyramid of spatial relations for scene-level land-use classification. IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 4, pp. 1947–1957, 2015.
[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, pp. 1–42, April 2015.
[22] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'15), pp. 44–51, IEEE, Boston, Mass, USA, June 2015.
[23] F. P. S. Luus, B. P. Salmon, F. Van Den Bergh, and B. T. J. Maharaj, Multi-view deep learning for land-use classification. IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 12, pp. 2448–2452, 2015.
[24] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sensing, vol. 7, no. 11, pp. 14680–14707, 2015.
[25] J. Sivic and A. Zisserman, Video Google: A text retrieval approach to object matching in videos, in Proc. IEEE International Conference on Computer Vision, 2003, pp. 1470–1477.
[26] H. Jégou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid, Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1704–1716, 2012.
[27] K. Nogueira, O. A. B. Penatti, and J. A. dos Santos, Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, vol. 61, pp. 539–556, 2017.
[28] R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation. Computer Vision and Pattern Recognition, IEEE, 2014, pp. 580–587.
[29] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems, 2012, pp. 1106–1114.
[30] J. Yue, W. Zhao, S. Mao, and H. Liu, Spectral–spatial classification of hyperspectral images using deep convolutional neural networks. Remote Sensing Letters, vol. 6, no. 6, pp. 468–477, 2015.
[31] M. Xie, N. Jean, M. Burke, D. Lobell, and S. Ermon, Transfer learning from deep features for remote sensing and poverty mapping.
arXiv preprint arXiv:1510.00098.
[32] M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva, Land use classification in remote sensing images by convolutional neural networks. arXiv preprint arXiv:1508.00092, 2015.
[33] G.-S. Xia, W. Yang, J. Delon, Y. Gousseau, H. Sun, and H. Maître, Structural high-resolution satellite image indexing, in ISPRS TC VII Symposium – 100 Years ISPRS, vol. 38, 2010, pp. 298–303.
[34] G.-S. Xia, J. Hu, F. Hu, B. Shi, AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification, IEEE Transactions on Geoscience and Remote Sensing, vol. 55, 2017, pp. 3965–3981.
[35] G. Cheng, J. Han, X. Lu, Remote Sensing Image Classification: Benchmark and State of the Art, Proceedings of the IEEE, vol. 105, 2017, pp. 1865–1883.
[36] L. Smith, Cyclical Learning Rates for Training Neural Networks. arXiv:1506.01186v6, 2017.
[37] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the Inception Architecture for Computer Vision. arXiv:1512.00567v3, 2015.
[38] Y. Bazi, M. M. Al Rahhal, H. Alhichri, and N. Alajlan, Simple Yet Effective Fine-Tuning of Deep CNNs Using an Auxiliary Classification Loss for Remote Sensing Scene Classification. Remote Sens. 2019, 11(24), 2908; https://doi.org/10.3390/rs11242908
[39] G. Wang, B. Fan, S. Xiang, and C. Pan, Aggregating Rich Hierarchical Features for Scene Classification in Remote Sensing Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4104–4115.
[40] H. Sun, S. Li, X. Zheng, and X. Lu, Remote Sensing Scene Classification by Gated Bidirectional Network. IEEE Trans. Geosci. Remote Sens. 2019, 1–15.
[41] W. Zhang, P. Tang, and L. Zhao, Remote Sensing Image Scene Classification Using CNN-CapsNet. Remote Sens. 2019, 11, 494.
[42] D. Zeng, S. Chen, B. Chen, and S. Li, Improving remote sensing scene classification by integrating global-context and local-object features. Remote Sens. 2018, 10, 734.
[43] G.-S. Xia, W. Yang, J. Delon, Y. Gousseau, H. Sun, and H. Maître, Structural high-resolution satellite image indexing, in ISPRS TC VII Symposium – 100 Years ISPRS, vol. 38, 2010, pp. 298–303.
[44] L. Huang, C. Chen, W. Li, and Q. Du, Remote sensing image scene classification using multi-scale completed local binary patterns and Fisher vectors, Remote Sensing, vol. 8, no. 6, article no. 483, 2016.
[45] X. Bian, C. Chen, L. Tian, Q. Du, Fusing local and global features for high-resolution scene classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 2889–2901.
[46] R. M. Anwer, F. S. Khan, J. van de Weijer, M. Monlinier, J. Laaksonen, Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification. arXiv 2017, arXiv:1706.01171.
[47] S. Chaib, H. Liu, Y. Gu, H. Yao, Deep feature fusion for VHR remote sensing scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4775–4784.
[48] Y. Yu, F. Liu, A two-stream deep fusion framework for high-resolution aerial scene classification. Comput. Intell. Neurosci. 2018, 2018, 8639367.
[49] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385v1, 10 Dec 2015.
[50] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, IEEE Conf. on Comput. Vision and Pattern Recognition, Boston, MA, June 2015, pp. 1–9.
[51] F. Chollet, Xception: Deep Learning with Depthwise Separable Convolutions, arXiv:1610.02357v3, 4 Apr 2017.
[52] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely Connected Convolutional Networks, arXiv:1608.06993v5, 28 Jan 2018.
[53] B. Petrovska, E. Zdravevski, P. Lameski, R. Corizzo, I. Stajduhar, J. Lerga, Deep Learning for Feature Extraction in Remote Sensing: A Case-study of Aerial Scene Classification. Sensors 2020, 14, 3906.
[54] B. Petrovska, T. Atanasova-Pacemska, R. Corizzo, P. Mignone, P. Lameski, E. Zdravevski, Aerial Scene Classification through Fine-Tuning with Adaptive Learning Rates and Label Smoothing. Appl. Sci. 2020, 10, 5792.
[55] Y. Liu, C. Huang, Scene classification via triplet networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 220–237.
[56] Y. Yu, F. Liu, Aerial Scene Classification via Multilevel Fusion Based on Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 287–291.

Relation Extraction Between Medical Entities Using Deep Learning Approach

Ruchi Patel
Department of Computer Science Engineering, Medi-Caps University, Indore (M.P.), India
E-mail: ruchipatel294@gmail.com

Sanjay Tanwani
School of Computer Science & IT, DAVV, Indore (M.P.), India
E-mail: sanjay_tanwani@hotmail.com

Chhaya Patidar
Department of Computer Science Engineering
E-mail: chhayapatidar377@gmail.com

Keywords: convolution neural network, feature extraction, relation classification, word embedding

Received: February 10, 2020

Medical discharge summaries or patient prescriptions contain a variety of medical terms. The semantic relation extraction between medical terms is essential for the discovery of significant medical knowledge. Relation classification is one of the imperative tasks of biomedical information extraction. The automatic identification of relations between medical diseases, tests, and treatments can improve the quality of patient care. This paper presents the proposed deep learning based system for relation extraction between medical entities. In this paper, a convolution neural network is used for relation classification. The system is divided into four modules: word embedding, feature extraction, convolution, and softmax classifier. The output contains classified relations between medical entities. In this work, the data set provided by the i2b2 2010 challenge is used for relation detection; it consists of 9070 relations in the test data and 5262 relations in the train data set. The performance evaluation of the relation extraction task is done using precision and recall. The system achieved an average of 75% precision and 72% recall. The performance of the system is compared with the awarded i2b2 participating systems.

Povzetek: Metoda globokega ucenja je uporabljena na iskanju relacij med zdravstvenimi entitetami.

1 Introduction

Relation extraction is an essential task of biomedical text mining. How is any medical difficulty related to symptoms, syndrome, and treatment, and which tests will be required for disease diagnosis? These types of information are required in health care and clinical procedures. Relation extraction is the task of classification in which a pair of relations between medical entities can be identified.
It is the core clinical information identification problem that identifies semantic relations between the medical concepts problem, test, and treatment in discharge summaries [13]. It is one of the challenging tasks of the i2b2 2010 NLP challenges. Relation extraction is divided into various types according to their usage: for treatment with problem, TrIP (treatment improves the medical problem), TrWP (treatment worsens the medical problem), TrCP (treatment causes the medical problem), TrAP (treatment is administered for the medical problem) and TrNAP (treatment is not administered because of the medical problem); for test with problem, TeRP (test reveals the medical problem) and TeCP (test conducted to investigate the medical problem); and for problem with problem, PIP (problem indicates problem) [1]. Examples of relations are: c="pacemaker" || r="TrAP" || c="sinus node dysfunction", c="an angiography" || r="TeRP" || c="bleeding in two vessels", etc. Medical relations are classified into the categories shown in Table 1.

The remainder of this paper is organized as follows: Section 2 presents the review of papers related to medical relation extraction, Section 3 describes the proposed method and dataset, Section 4 gives experimental results and discussion, and Section 5 concludes the results of the proposed approach and gives some directions for further research work.

Type of Relation | Categories of Relations
Medical Problem–Treatment | TrIP (Treatment improves medical problem); TrWP (Treatment worsens medical problem); TrCP (Treatment causes medical problem); TrAP (Treatment is administered for medical problem); TrNAP (Treatment is not administered because of medical problem); a relation between Treatment and Medical problem does not exist other than the above types
Medical Problem–Test | TeRP (Test reveals medical problem); TeCP (Test conducted to investigate medical problem); a relation between Test and Medical problem does not exist other than the above types
Medical Problem–Problem | PIP (Medical problem indicates medical problem); a relation between Medical problem and Medical problem does not exist other than PIP

Table 1: Relation categories provided in the i2b2 2010 challenge [22].

2 Literature review

2.1 Review of i2b2 NLP challenge work

In the 2010 i2b2 challenge, enormous work was done by authors in the field of relation extraction for medical text. In this NLP challenge several supervised and unsupervised machine learning classifiers were used. Various effective relation classification systems have been used, such as CRF classifiers, semi-Markov models, SVM classifiers, Naive Bayes classifiers and SSVM (structural support vector machine) [2-5]. Some authors have used machine learning based modules for the train data followed by post-processing rules. A maximum entropy (ME) classifier is trained using semantic and syntactic features in [6]. In this paper, the results given by ME classifiers are relatively less memory demanding compared to a kernel based k-nearest neighbour (kNN) classifier and less computationally expensive than a support vector machine (SVM) classifier. Other than these given reasons, the authors did not detect a major difference in performance between these classifiers. This system attained the top rank in the i2b2 NLP challenge. An SVM classifier is trained on the dataset using words as features in [7] and classifies the data into different relation types. Syntactic features are also taken as important features in that paper, but recall could be improved using a full parse tree implementation. In [8], the system combines a supervised classifier with a rule based method. Entity and relation extraction in a joint framework is proposed using a card-pyramid parsing approach in [9]. Syntactic analysis of sentences is done by using the concept of bottom-up parsing with SVM relation classification [10].
A few authors had used the SVM classification algorithm with various features such as syntactic, lexical, medical semantic and sentence level context information [11][12][13]. The ConText algorithm is also designed for contextual feature creation [14]; it recognizes the context of a patient's medical condition such as family history, previous record of disease or disease related treatment and symptoms, etc. Relation extraction between treatment and problem concepts is explored in [15], and the SemRep method is used for detection of semantic and lexical features associated with concepts [16]. But SemRep is trained for semantic representation of entities in general English text, so it is unable to extract treatment concepts in the clinical domain precisely. Semantic relation discovery on clinical records is presented in [17], in which SVM is used for classification of disease–test, disease–treatment, and symptom–treatment relation types. The performance of this work is dependent on the semantic types of a particular domain. REMed is a learning-based approach introduced for automatic relation discovery in the work presented in paper [18]. Relation extraction from clinical notes is also given by the usage of parse tree enhancement with semantic features [19]. The performance of a relation extraction system can be improved by the integration of semantic features into parse trees. In [20], the authors used three different classifiers to classify problem–test, problem–treatment and problem–problem relations respectively. A maximum entropy framework is used as the classifier and implemented in the maximum entropy OpenNLP toolkit. In paper [1], a hybrid approach is explored which integrates linguistic pattern matching with an SVM classifier. The SVM classifier is trained with the libsvm tool and linguistic patterns are created manually. It is found that the usage of patterns with the SVM classifier improves the relation extraction. I2b2 challenge participants had used different sets of features for relation classification. Features such as context features, semantic features, concept co-occurrence, N-gram sequential features and parser output are used for medical relation extraction in previous papers [4, 5, 20, 21]. The relationship between Chronic Obstructive Pulmonary Disease and cardiovascular diseases is detected through various machine learning classifiers in paper [22]. Diabetes complications are also detected by using various machine learning classifiers in paper [23].

2.2 Performance of existing systems

Performance evaluation of relation extraction for every relation category is done using recall, precision and F-score. Table 2 presents the F-scores of various relation extraction systems that contributed to the 2010 challenge [24]. Table 3 presents the recall, precision and F-score for each relation class label reported by Patrick et al. [4], in which the TeRP, TrAP and PIP relations got the highest F-scores, while the TrWP relation got a very low F-score.
Authors of Relation Detection Systems | Methods | F-Score (%)
Roberts | Supervised method | 73.7
DeBruijn | Semi-supervised method | 73
Grouin | Hybrid method | 71
Patrick | Supervised method | 70.2
Jonnalagadda and Gonzalez | Supervised method | 69.7
Divita | Supervised method | 69.5
Solt | Supervised method | 67
Demner-Fushman | Supervised method | 66.6
Anik | Supervised method | 66
Cohen | Supervised method | 65.6

Table 2: Performance evaluation of relation detection systems in the i2b2 challenge (2010) [24].

Relation Type | Training Data (count) | Testing Data (count) | Recall (%) (test / train) | Precision (%) (test / train) | F-Score (%) (test / train)
PIP Relation | 1239 | 1986 | 62.5 / 64 | 67.7 / 73 | 65 / 68
TrWP Relation | 56 | 143 | 2.8 / 3.7 | 80 / 100 | 5.4 / 7
TrAP Relation | 1422 | 2487 | 72 / 78 | 70 / 68.4 | 71 / 72.8
TrNAP Relation | 106 | 191 | 13 / 26.4 | 55.5 / 70 | 21 / 38
TrCP Relation | 296 | 444 | 48 / 44.9 | 49.5 / 63.6 | 48 / 52
TrIP Relation | 107 | 198 | 15.7 / 23.3 | 86 / 69 | 26.5 / 35
TeCP Relation | 303 | 588 | 43 / 47.8 | 61 / 77 | 50 / 59
TeRP Relation | 1733 | 3033 | 84 / 87 | 84 / 82.3 | 84 / 84.6
Overall Result | 5262 | 9070 | 67.5 / 70.9 | 73 / 74.5 | 70 / 72.6

Table 3: Evaluation of the relation extraction system [2].

2.3 Summary and research gaps

An extensive review has been done in the field of medical relation detection. In the i2b2 challenge, participants have used machine learning methods with feature engineering modules. Machine learning methods performed well, but the feature engineering module is time-consuming and requires domain knowledge. The importance of feature design and the usefulness of rich features influence the results. The SVM classifier is used in relation extraction, and it is observed that the use of patterns with the SVM classifier improves the relation extraction. Medical relations in intra-sentences are extracted accurately in existing systems, but relations in inter-sentences require more attention.

Relation Type | Training Data (count) | Testing Data (count)
PIP Relation | 1239 | 1986
TrWP Relation | 56 | 143
TrAP Relation | 1422 | 2487
TrNAP Relation | 106 | 191
TrCP Relation | 296 | 444
TrIP Relation | 107 | 198
TeCP Relation | 303 | 588
TeRP Relation | 1733 | 3033
Overall Result | 5262 | 9070

Table 4: Summary of relation types of train and test data [1].

3 Proposed methodology

An extensive discussion on existing work in clinical relation extraction was given in the related work. Different tools, techniques, and methods were discussed for the medical domain. The proposed system for medical relation extraction is based on the concept of deep learning.

3.1 Dataset

The i2b2 2010 challenge organizers provided the data set for relation classification, which consists of 9070 relations in the test data and 5262 relations in the train data set. The summary of each relation type for the train and test data is shown in Table 4. An example of the annotated dataset for relation extraction is shown in Table 5.
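The annotation lines in Table 5 follow a simple pattern: a concept string with line:token start and end offsets, the relation type, and the second concept. A small parser along the following lines can turn each line into a structured record; the regular expression is only a sketch that assumes exactly the layout shown in Table 5, and the Relation record type is an illustrative name, not part of the i2b2 tooling.

```python
# Sketch: parsing i2b2-style relation annotation lines such as
#   c="pacemaker" 115:4 115:7||r="TrAP"||c="sinus node dysfunction" 115:0 115:2
import re
from dataclasses import dataclass

# Concept pattern: quoted text followed by start and end line:token offsets.
CONCEPT = r'c="(?P<text>[^"]*)"\s+(?P<l1>\d+):(?P<t1>\d+)\s+(?P<l2>\d+):(?P<t2>\d+)'
LINE_RE = re.compile(CONCEPT.replace("P<", "P<a_") + r'\|\|r="(?P<rel>[^"]*)"\|\|' +
                     CONCEPT.replace("P<", "P<b_"))

@dataclass
class Relation:
    concept1: str
    span1: tuple          # ((line, token) of start, (line, token) of end)
    rel_type: str         # e.g. TrAP, TeRP, PIP ...
    concept2: str
    span2: tuple

def parse_relation(line: str) -> Relation:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unexpected annotation format: {line!r}")
    g = m.groupdict()
    return Relation(
        g["a_text"], ((int(g["a_l1"]), int(g["a_t1"])), (int(g["a_l2"]), int(g["a_t2"]))),
        g["rel"],
        g["b_text"], ((int(g["b_l1"]), int(g["b_t1"])), (int(g["b_l2"]), int(g["b_t2"]))),
    )

print(parse_relation('c="pacemaker" 115:4 115:7||r="TrAP"||'
                     'c="sinus node dysfunction" 115:0 115:2'))
```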
3.2 Proposed deep learning based relation extraction system

The extraction of semantic relations is essential for the discovery of significant medical knowledge. Relation extraction is an important task in the field of biomedical text mining. For extracting the relations between medical entities, a deep learning based method, a CNN (convolution neural network) with word2vec, is used [25]. The method is divided into four steps: word embedding, feature extraction, convolution, and softmax classifier.

The word embedding model takes word tokens as input, so initially the sentences are divided into word tokens. Then the word tokens are converted into vectors using word embedding. In this work, a new word embedding model is trained using the clinical 2010 i2b2 dataset. Then lexical and sentence level feature vectors are created separately and concatenated into the final feature vector. In progression, the final feature vector is fed into the softmax classifier for the relation classification. The dimension of the output vector is equal to the number of predefined relation types. Figure 1 shows the architecture of the proposed deep learning based relation extraction system. The description of each component is given below.

c="coronary artery bypass graft" 115:4 115:7||r="TrAP"||c="coronary artery disease" 115:0 115:2
c="a amiodarone gtt" 75:11 75:13||r="TrAP"||c="burst of atrial fibrillation" 75:3 75:6
c="antibiotics" 80:15 80:15||r="TrAP"||c="left arm phlebitis" 80:8 80:10
c="creams" 124:1 124:1||r="TrNAP"||c="incisions" 124:10 124:10
c="cath" 19:14 19:14||r="TeCP"||c="abnormal ett" 19:9 19:10
c="powders" 124:5 124:5||r="TrNAP"||c="incisions" 124:10 124:10
c="lotions" 124:3 124:3||r="TrNAP"||c="incisions" 124:10 124:10
c="ointments" 124:8 124:8||r="TrNAP"||c="incisions" 124:10 124:10
c="oxycodone -acetaminophen" 92:1 92:3||r="TrAP"||c="pain" 92:21 92:21
c="drugs" 12:8 12:8||r="TrCP"||c="known allergies" 12:5 12:6
c="cath" 20:0 20:0||r="TeRP"||c="severe 3 vessel disease" 20:2 20:5
c="cxr" 56:0 56:0||r="TeRP"||c="left lower lobe atelectasis" 56:3 56:6
c="cabg" 28:8 28:8||r="TrAP"||c="mi" 28:2 28:2
c="po amiodarone" 79:9 79:10||r="TrIP"||c="further episodes of afib" 79:3 79:6
c="overall left ventricular systolic function" 44:0 44:4||r="TeRP"||c="mildly depressed" 44:6 44:7
c="wounds" 121:1 121:1||r="PIP"||c="infection" 121:3 121:3
c="wounds" 121:1 121:1||r="PIP"||c="redness" 121:5 121:5
c="wounds" 121:1 121:1||r="PIP"||c="drainage" 121:7 121:7

Table 5: Example of the annotated relation corpus.

3.2.1 Word embedding

Word embedding is a word representation method which gives words with the same context a similar representation [26]. Word2Vec is a technique that implements the word embedding concept. In this component, Word2Vec takes word tokens as input and generates the vectors of the words as output. It constructs a vocabulary from the training text data and then creates vectors of contextually similar words. There are many pre-trained word embedding models available which can be used directly on the data set [27], but these models are trained on general text. In the proposed work, a new word embedding model is generated using the clinical i2b2 data set. The size of the vocabulary of the i2b2 test data set is 9060. Word2vec results do not depend entirely on the corpus but also on the parameters used. The basic parameters for training the model (a short sketch of this setup follows the list) are:
• CBOW (continuous bag of words) and SG (skip-gram) vector model architectures
• Dimensionality of the vector space, such as 200, 300, 500 and 800
• Word window size, such as 5, 10 and 20
• min_count (ignores all words with total frequency lower than this)
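A minimal sketch of this embedding setup with gensim (4.x API) is given below; the toy sentences and the chosen vector size, window, min_count and skip-gram switch are illustrative values, not the tuned configuration used in the paper.

```python
# Sketch: training a domain word embedding model on tokenized clinical sentences.
from gensim.models import Word2Vec

# In practice, sentences would be the tokenized i2b2 corpus; two toy examples here.
sentences = [
    ["the", "patient", "was", "started", "on", "amiodarone", "for", "atrial", "fibrillation"],
    ["cxr", "showed", "left", "lower", "lobe", "atelectasis"],
]

model = Word2Vec(
    sentences,
    vector_size=300,   # dimensionality of the vector space (200/300/500/800 were tried)
    window=10,         # word window size (5/10/20 were tried)
    min_count=1,       # ignore words with total frequency lower than this
    sg=1,              # 1 = skip-gram (SG), 0 = CBOW
    workers=4,
)

vec = model.wv["amiodarone"]                      # 300-dimensional word vector
print(model.wv.most_similar("amiodarone", topn=3))
```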
3.2.2 Feature extraction

In this component, lexical level and sentence level features are identified. Lexical level features are an important indication for relation identification. In the proposed system, lexical features are identified by using the word embedding method. For lexical features, clinical entities are also important; they are identified by the existing deep learning method proposed in [28]. Window and semantic features are identified by the word2vec method. For medical semantic type mapping, UMLS (Unified Medical Language System) is used [29]. These features are concatenated into the lexical feature vector. Here the lexical feature vector is denoted as L = {<l_1, w_1>, <l_2, w_2>, ...}, where l_i is a semantic concept in the UMLS and w_i is a weight that symbolizes the importance of the clinical text associated with l_i. The lexical feature vector is obtained using the semantic type mapping with UMLS from the given clinical text.

This component has several layers: input layer, convolution layers, pooling layers and hidden layers.

Input layer – The input layer converts the clinical text into a matrix of embeddings, indicated as W ∈ R^((k+n)×m), where k and n are the number of semantic types of a word and the maximum number of words, respectively, and m is the dimension of the word embedding. W is obtained by concatenating the embeddings of words and semantic types together: W = W_w ⊕ W_l. Here W_l and W_w are the embeddings of the semantic types and the words, respectively, and the concatenation operation is denoted as ⊕.

Convolution layer – In the neural network model, the convolution approach is used to merge all the features. The convolution layer uses filters to create feature maps. For predicting relation types, features are identified globally on the complete data. The convolution layer is used to find high level features from the input layer. To find different varieties of features, filters with different sizes are applied. The ReLU function is used in the convolution layers as the non-linear function. The filter is applied to all possible windows of words and semantic types in W and produces a feature map s ∈ R^(n+k−h+1), where h is the filter size.

Pooling layer – The best feature can be extracted through the max pooling operation over the features. The pooling layer is used to further abstract the features by aggregating the score for each filter produced by the convolution layer. In this work, a max-over-time pooling operation is applied over each feature map. Important features are identified by selecting the highest value on each dimension of the vector. Pooling layers are used to induce a fixed-length vector from the feature maps.

Hidden layer – The hidden layer is used to combine the different features obtained from the pooling layers. In the present work, tanh is used as the activation function.

The sentence level feature vector is generated in the same way as the lexical level feature vector. It also consists of several layers: input layer, convolution layers, pooling layers and hidden layers. The input of this component is the word and its position in the sentence. Again W is obtained by concatenating the embeddings of the words and positions together: W = W_w ⊕ W_p. Here, W_w and W_p are the embeddings of the words and the positions of the words, respectively. Finally, the output vectors of the lexical level features and sentence level features are combined by concatenating them.

Softmax classifier – It is used as the final layer of the neural network. It gives the confidence of each relation type. In this work, the softmax classifier is used for multi-class identification of relations. The output layer is applied on the combined vector to transform the output values into probabilities for relation detection. It returns the probabilities of each relation, and the target relation is the one with the highest probability.
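The convolution–pooling–softmax pipeline described above can be sketched compactly in Keras as follows. The vocabulary size, sequence length, filter widths, number of relation classes and the simplified lexical-feature input are assumptions made for illustration, not the exact architecture of the proposed system.

```python
# Sketch: CNN relation classifier in the spirit of the pipeline above
# (embedded tokens -> convolutions -> max-over-time pooling -> tanh hidden -> softmax).
from tensorflow.keras import layers, models

VOCAB_SIZE, EMB_DIM, MAX_LEN = 9060, 300, 100
NUM_RELATIONS = 9        # illustrative: the 8 i2b2 relation types plus "no relation"
LEXICAL_DIM = 50         # illustrative size of the concatenated lexical feature vector

tokens = layers.Input(shape=(MAX_LEN,), name="word_ids")
lexical = layers.Input(shape=(LEXICAL_DIM,), name="lexical_features")

emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(tokens)

# Parallel convolutions with different filter widths, ReLU, max-over-time pooling.
pooled = []
for width in (3, 4, 5):
    c = layers.Conv1D(filters=100, kernel_size=width, activation="relu")(emb)
    pooled.append(layers.GlobalMaxPooling1D()(c))

sentence_features = layers.Dense(128, activation="tanh")(layers.Concatenate()(pooled))
combined = layers.Concatenate()([sentence_features, lexical])
outputs = layers.Dense(NUM_RELATIONS, activation="softmax")(combined)

model = models.Model([tokens, lexical], outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```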
Training – In the present work, the parameters are trained as a set T, in which the training data set is denoted as M and the class labels as N. For each m ∈ M, the component computes a score s(n; m, T) for each class n ∈ N. The softmax operation is used in the output layer to transform the scores into a conditional probability distribution over all n ∈ N, as shown in Eq. (1):

p(n | m, T) = exp(s(n; m, T)) / Σ_{n' ∈ N} exp(s(n'; m, T))     (1)

The training target of the model is to maximize the log-likelihood over the training set with respect to T, as shown in Eq. (2), where n_m denotes the correct class label of m:

J(T) = Σ_{m ∈ M} log p(n_m | m, T)     (2)

The architecture of the sentence level feature generation is shown in Figure 2. The first component is the feature extraction of word features and position features. Word features are representations of contextually similar words associated with the index of the word in a sentence. The whole sentence is represented as a list of word vectors with their ranking. Pair-wise ranking is used to train the word embedding model. Position features are the relative distances (d1 and d2) of the current word from the left and right words, w1 and w2. The combination of the word and position feature vectors is fed into the convolution component for extracting sentence level features. Table 6 shows the different hyperparameters with their values which are tuned for the convolutional neural network for relation extraction. The results use 10-fold cross validation, in which the model is trained 10 times.

Figure 2: Architecture of the sentence level feature component.

Description | Values
Activation function | ReLU, Sigmoid, Softmax, Tanh
Feature maps | 50, 100, 128
Dropout rate | 0.5, 0.7, 0.9
Pooling | 1-Max

Table 6: Setup of hyperparameters.

4 Results and discussions

Relation Types | Precision (%) | Recall (%)
PIP | 90 | 95
TrWP | 10 | 5
TrAP | 73 | 85
TrNAP | 59 | 10
TrCP | 60 | 30
TrIP | 50 | 20
TeCP | 70 | 35
TeRP | 82 | 86
Overall Result | 75 | 72

Table 7: Summary of results for the medical relation types.

Authors of the Systems | Methods | F-Score (%)
Roberts et al. [7] | Supervised method | 73.7
DeBruijn et al. [3] | Semi-supervised method | 73.1
Grouin et al. [8] | Hybrid method (machine learning and linguistic pattern matching) | 71
Proposed System | Convolution neural network | 74

Table 8: Comparison of performance of the proposed system with awarded i2b2 participating systems.

Performance evaluation of relation detection is done using recall and precision for every relation category. Table 7 presents the results for the relation types, in which the TeRP, TrAP and PIP relations got the highest precision and recall, while the precision and recall of the TrWP relation are very low, because the training data contains fewer TrWP (treatment worsens problem) relations. Using the convolution neural network, the performance of the system has improved for a few relations. The F-score of the PIP relation increased from 68% to 92% and that of the TrAP relation increased from 73% to 79%. It is observed that relation types which are better represented in the training data gave the best results. Table 8 shows the comparison of the proposed system with existing i2b2 challenge systems.

5 Conclusion

Biomedical information is necessary for doctors, health care professionals, and clinical researchers. This information is growing exponentially and is scattered across published literature and patient health records. The need is to identify appropriate tools and techniques for extracting knowledge from medical text. In this paper, medical relations are extracted between clinical concepts using word embedding and a CNN based deep learning method. The system is trained using a word embedding model with lexical and sentence level features. The performance of the system is compared with existing relation extraction systems. Medical relations in intra-sentences are extracted accurately in existing systems, but relation extraction in inter-sentences has more scope for future work.
6 Acknowledgement Thanks to the 2010 i2b2/VA challenge organizers for the development of training and test corpora. I also thank U.S. National Library of Medicine to provide UMLS for research work. References [1] A.-L. Minard, A.-L. Ligozat, A. Ben Abacha, D. Bernhard, B. Cartoni, L. Deléger, et al., "Hybrid methods for improving information access in clinical documents: concept, assertion, and relation identification," Journal of the American Medical Informatics Association, vol. 18, p. 588, 2011. https://doi.org/10.1136/amiajnl-2011-000154 [2] N. Kang, R. J. Barendse, Z. Afzal, B. Singh, M. J. Schuemie, E. M. van Mulligen, et al., "Erasmus MC approaches to the i2b2 Challenge," in Proceedings of the 2010 i2b2/VA workshop on challenges in natural language processing for clinical data. Boston, MA, USA: i2b2, 2010. [3] B. deBruijn, C. Cherry, S. Kiritchenko, J. Martin, and X. Zhu, "NRC at i2b2: one challenge, three practical tasks, nine statistical systems, hundreds of clinical records, millions of useful features," in Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2, 2010. [4] . D. Patrick, D. H. M. Nguyen, Y. Wang, and M. Li, "A knowledge discovery and reuse pipeline for information extraction in clinical notes," Journal of the American Medical Informatics Association, vol. 18, pp. 574-579, 2011. https://doi.org/10.1136/amiajnl-2011-000302 [5] I. Solt, F. P. Szidarovszky, and D. Tikk, "Concept, Assertion and Relation Extraction at the 2010 i2b2 Relation Extraction Challenge using parsing information and dictionaries," Proc. of i2b2/VA Shared-Task. Washington, DC, 2010. [6] X. Zhu, C. Cherry, S. Kiritchenko, J. Martin, and B. De Bruijn, "Detecting concept relations in clinical text: Insights from a state-of-the-art model," Journal of biomedical informatics, vol. 46, pp. 275-285, 2013. https://doi.org/10.1016/j.jbi.2012.11.006 [7] K. Roberts, B. Rink, and S. Harabagiu, "Extraction of medical concepts, assertions, and relations from discharge summaries for the fourth i2b2/VA shared task," in Proceedings of the 2010 i2b2/VA Workshop on Challenges in Natural Language Processing for Clinical Data. Boston, MA, USA: i2b2, 2010. [8] C. Grouin, A. B. Abacha, D. Bernhard, B. Cartoni, L. Deleger, B. Grau, et al., "CARAMBA: concept, assertion, and relation annotation using machine-learning based approaches," in i2b2 Medication Extraction Challenge Workshop, 2010, pp. -. [9] R. J. Kate and R. J. Mooney, "Joint entity and relation extraction using card-pyramid parsing," in Proceedings of the Fourteenth Conference on Computational Natural Language Learning, 2010, pp. 203-212. [10] M. Liu, L. Jiang, and H. Hu, "Automatic extraction and visualization of semantic relations between medical entities from medicine instructions," Multimedia Tools and Applications, vol. 76, pp. 10555-10573, 2017. https://doi.org/10.1007/s11042-015-3093-4 [11] O. Frunza and D. Inkpen, "Extracting relations between diseases, treatments, and tests from clinical data," in Canadian Conference on Artificial Intelligence, 2011, pp. 140-145. https://doi.org/10.1007/978-3-642-21043-3_17 [12] O. Frunza, D. Inkpen, and T. Tran, "A machine learning approach for identifying disease-treatment relations in short texts," IEEE transactions on knowledge and data engineering, vol. 23, pp. 801­814, 2011. 10.1109/TKDE.2010.152 [13] C. Giuliano, A. Lavelli, and L. 
Romano, "Exploiting shallow linguistic information for relation extraction from biomedical literature," in 11th Conference of the European Chapter of the Association for Computational Linguistics, 2006. [14] W. W. Chapman, D. Chu, and J. N. Dowling, "ConText: An algorithm for identifying contextual features from clinical text," in Proceedings of the workshop on BioNLP 2007: biological, translational, and clinical language processing, 2007, pp. 81-88. [15] C. A. Bejan and J. C. Denny, "Learning to identify treatment relations in clinical text," in AMIA Annual Symposium Proceedings, 2014, p. 282. [16] D. Hristovski, C. Friedman, T. C. Rindflesch, and B. Peterlin, "Exploiting semantic relations for literature-based discovery," AMIA ... Annual Informatica 45 (2021) 359–366 365 Symposium proceedings. AMIA Symposium, vol. 2006, pp. 349-353, 2006. [17] O. Uzuner, J. Mailoa, R. Ryan, and T. Sibanda, "Semantic relations for problem-oriented medical records," Artificial intelligence in medicine, vol. 50, pp. 63-73, 2010. https://doi.org/10.1016/j.artmed.2010.05.006 [18] M. Porumb, I. Barbantan, C. Lemnaru, and R. Potolea, "REMed: automatic relation extraction from medical documents," presented at the Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, Brussels, Belgium, 2015. https://doi.org/10.1145/2837185.2837239 [19] J. Kim, Y. Choe, and K. Mueller, "Extracting Clinical Relations in Electronic Health Records Using Enriched Parse Trees," Procedia Computer Science, vol. 53, pp. 274-283, 2015/01/01/ 2015. https://doi.org/10.1016/j.procs.2015.07.304 [20] B. de Bruijn, C. Cherry, S. Kiritchenko, J. Martin, and X. Zhu, "Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010," Journal of the American Medical Informatics Association : JAMIA, vol. 18, pp. 557-562, Sep-Oct 2011. https://doi.org/10.1136/amiajnl-2011-000150 [21] Y. Xu, K. Hong, J. Tsujii, and E. I. C. Chang, "Feature engineering combined with machine learning and rule-based methods for structured information extraction from narrative clinical discharge summaries," Journal of the American Medical Informatics Association : JAMIA, vol. 19, pp. 824-832, Sep-Oct 2012. https://doi.org/10.1136/amiajnl-2011-000776 [22] Debjani Panda, Satya Ranjan Dash, Ratula Ray, Shantipriya Parida, “Predicting The Causal Effect Relationship Between Copd And Cardio Vascular Diseases”, Informatica, vol 44, no 4, 2020. https://doi.org/10.31449/inf.v44i4.3088 [23] Ali A. Abaker, Fakhreldeen A. Saeed, “A Comparative Analysis Of Machine Learning Algorithms To Build A Predictive Model For Detecting Diabetes Complications”, Informatica, Vol 45, No 1, 20201. https://doi.org/10.31449/inf.v45i1.3111 [24] Ö. Uzuner, B. R. South, S. Shen, and S. L. DuVall, "2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text," Journal of the American Medical Informatics Association : JAMIA, vol. 18, pp. 552-556, Sep-Oct 2011. https://doi.org/10.1136/amiajnl-2011-000203 [25] D. Zeng, K. Liu, S. Lai, G. Zhou, and J. Zhao, "Relation classification via convolutional deep neural network," 2014. [26] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in neural information processing systems, 2013, pp. 3111-3119. [27] Q. Le and T. Mikolov, "Distributed representations of sentences and documents," in International conference on machine learning, 2014, pp. 
[28] Y. Wu, M. Jiang, J. Xu, D. Zhi, and H. Xu, "Clinical Named Entity Recognition Using Deep Learning Models," AMIA Annual Symposium proceedings. AMIA Symposium, vol. 2017, pp. 1812-1819, 2018.
[29] O. Bodenreider, "The Unified Medical Language System (UMLS): integrating biomedical terminology," Nucleic Acids Research, vol. 32, pp. D267-D270, 2004. https://doi.org/10.1093/nar/gkh061

A Classifier Ensemble Approach for Prediction of Rice Yield Based on Climatic Variability for the Coastal Odisha Region of India

Subhadra Mishra
Department of Computer Science and Application, CPGS, Odisha University of Agriculture and Technology, Bhubaneswar, Odisha, India
E-mail: mishra.subhadra@gmail.com

Debahuti Mishra
Department of Computer Science and Engineering, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar, Odisha, India
E-mail: debahutimishra@soa.ac.in

Pradeep Kumar Mallick
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
E-mail: pradeep.mallickfcs@kiit.ac.in

Gour Hari Santra
Department of Soil Science and Agricultural Chemistry, IAS, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar, Odisha, India
E-mail: santragh@gmail.com

Sachin Kumar (Corresponding Author)
Department of Computer Science, South Ural State University, Chelyabinsk, Russia
E-mail: sachinagnihotri16@gmail.com

Keywords: crop prediction, classifier ensemble, support vector machine, k-nearest neighbour, naive Bayesian, decision tree, linear discriminant analysis

Received: February 23, 2021

Agriculture is the backbone of the Indian economy, especially rice production, but due to several reasons the expected rice yields are not produced. Rice production mainly depends on climatic parameters such as rainfall, temperature, humidity, wind speed, etc. If the farmers can get timely advice on the variation of climatic conditions, they can take appropriate action to increase rice production. This factor motivated us to prepare a computational model for the farmers and ultimately for society. The main contribution of this work is to present a classifier ensemble based prediction model by considering the original rice yield and climatic datasets of the coastal districts of Odisha, namely Balasore, Cuttack and Puri, for the period from 1983 to 2014 for the Rabi and Kharif seasons. This ensemble method uses five diversified classifiers: Support Vector Machine, k-Nearest Neighbour, Naive Bayesian, Decision Tree, and Linear Discriminant Analysis. This is an iterative approach, where at each iteration one classifier acts as the main classifier and the other four classifiers are used as base classifiers whose output is considered after taking the majority vote. The performance measures increase from 95.38% to 98.10% and from 95.38% to 98.10% for specificity, from 88.48% to 96.25% and from 83.60% to 94.81% for both sensitivity and precision, and from 91.78% to 97.17% and from 74.48% to 88.59% for AUC for the Rabi and Kharif season datasets of Balasore district, with the same improvement in Puri and Cuttack districts. Thus the average classification accuracy is found to be above 96%.

Povzetek: Opisana je ansambelska metoda napovedovanja pridelka riža v Indiji.

1 Introduction

Agriculture is the pivot of the Indian economy. Around 58% of rural households are dependent on agriculture as their major means of livelihood. However, the share of agriculture has changed considerably in the past 50 years. In 1950, 55% of GDP came from agriculture, while in 2009 it was 18.5% and during the financial year 2015-2016 it was 16.85% [1].
Indian agriculture has made great progress in ensuring food security to its huge population with its food grains produc­tion reaching a record level of 236 million ton in 2013­2014. While the required amount for 2030 and 2050 are 345 and 494 million ton respectively. In India rice is grown in different agro climatic zones and altitudes. Rice grown inIndiahasextendedfrom8to35°N latitudeandfromsea level to 3000 meter. Rice required a hot and humid climate and well suited to the areas which have high humidity, long sun shine and suffcient water supply. The average temper­ature required for the crop is 21 to 36°C. It is predicted that the demand for the rice will grow further than other crops. There are various challenges to achieve higher productiv­ity with respect to climate change and its repercussions. In tropical area higher temperature is one of the important en-vironmentalfactors which limit rice production. Different parts of the country have variable impacts due to climate change. For example by the year of 2080 the numbers of rain days are to be decreased along with narrow rise of 7­10%annualrainfallwhichwillleadtohighintensity storm. Moreover, on one hand when monsoon rain fall over the country is expected to rise by 10-15%, on the other hand the winter rainfallisexpectedto reduceby5-26%and sea­sonal variability would be further compounded [2]. Then, cereal production is expected to be reduced by 10-40% by 2100 due to rise in temperature, rising water scarcity and decrease in number of rain days. Higher loss is predicted in Rabi crops [3]. Rice productivity may declineby6percent for every 10C rising temperature [4]. In general changing climate trends will leadtooverall decline agricultural yield. The simulation analysis projected that on all India basis, the consequent of climate change on productivity in 2030s ranges from -2.5 to -12% for crops such as rice, wheat, maize, sorghum, mustard and potato [5, 6]. Climate is the sumof totalvariationin temperature, humidity,rainfalland other metrologicalfactorsina particular area fora period of at least 25 years [1]. Odisha’s climate has also under gone appreciable changesasaresultofvariousfactors. The previous six seasons of the year has changed into basically two mainly summer and rainy. The deviation in day tem­perature and annual precipitation is mainly restricted to4 months in a year and number of rain days decreased from 120 to 90 days apart from being abnormal. In addition, the mean temperature is increasing and minimum temper­ature has increased about 25% [2, 3, 4, 5]. Such climate change related adversity is affecting adversely productivity and production of food grains. Agriculture is the backbone of Indian economy. Butduetoseveral reasonstheexpected crop yields are not produced. The production mainly de­pends on climatic parameters such as rainfall, temperature, humidity, wind speed etc. So thefarmer should know the timely variation in climatic condition. If they can get the timely advice then they can increase the production. Be­fore development of the technology the farmers can pre­dict the production just by seeing the previous experience on a particular crop. But gradually the data increases and due to the environmentalfactors the weather changes. So we can use this vast amount of data for prediction of rice production. 
For a uniform growth and development assur­ance in agriculture (the current rate is 2.8% per annum), anexhaustive appraisalofthe accountabilityofthe agricul­ture production owing to predicted type of weather trans­form is necessary.In this paper the main aim is to create an ensemble model for prediction of climatic variability on rice yield for coastal Odisha. The weather parameters such as rainfall, temperature and humidity etc. are considered S. Mishra et al. because theyaffect the 95% production of rice crop. Ad­ditionally, the classifer’s accuracy validity has been mea­sured using specifcity, sensitivity/recall, precision, Neg­ative Predictive Value (NPV), False Positive Rate (FPR), False Negative Rate (FNP), False Discovery Rate (FDR) and the probabilistic measures such as; F-Score, G-Mean, Matthews Correlation Coeffcient (MCC) and J-Statistics. This paper is organized as follows; section2describes the related works, materials and methods or approaches used for experimentation are described in section 3. The frame­work of the proposed prediction model is given in section 4, section5 deals withexperimentation and modelevalu­ation. The result analysis, discussion and conclusion are givenin section6,7and8respectively. 2 Related work While undertaking thiswork, theexisting literature that has been followed during every phase of the entire research work with the intention of clear representation of the ma­chine learning based prediction models. The various ap­proaches are explored and have been addressed to design the ensemble based rice production model based on cli­matic variability. This section describes few recent works on this are which motivated us to develop an ensemble based model. Narayan Balkrishnan [7] proposed an en­semble model AdaSVM and AdaNaive which is used to project the crop production. Authors compared their pro­posed model among the Support Vector Machine (SVM) and Naïve Bayes (NB) methods. For prediction of out­put, two parameters are used such as accuracy and the classifcation error and it has been observed that AdaSVM and AdaNaive are better than SVM and NB.BNarayanan [8][8] compared the SVM and NB with AdaSVM and AdaNaive and conclude that the later one is better than frst twomethods. SadeghBafandeh [9] studied the detailed his­torical background and different applications of the method invarious areas.Ifthe distributionofthedataisnotknown then the k-Nearest Neighbour(K-NN) method can be ap­plied for classifcation technique [10, 11, 12]. In the feature space objects can be classifed on the basis of closest train­ing examples. It is one of the instance–based learning or lazy learning where computation is done until classifcation and function is approximated locally [13, 14]. ABayesian network or Bayes network or belief network or Bayesian model or probabilistic directed acyclic graphical models a typeof statistical model.Abelief networkto assesstheef­fectof climatechangeonpotato productionwas formulated by yiqun Gu et. al. [15]. Authors have shown a belief net­work combining the uncertainty of future climate change, considering the variability of current weather parameters such as temperature, radiation, rainfall and the knowledge about potato development. They thought that their net­work give support for policymakers in agriculture. They test their model by using synthetic weather scenarios and then the results are compared with the conventional math­ematical model and conclude that the effciency is more for the belief network. 
There arevariousfactors infuenc­ing the prediction. UnoY et al. [16] used agronomic vari­ables, nitrogen application and weed control using the ma­chine learning algorithm such as artifcial neural network and DecisionTree (DT) to develop the yield mapping and to forecast yield. They have concluded that high predic­tion accuracies are obtained by using ANNs. Veenadhari Set al. [17] described the soybean productivity modelling using DT algorithms. Authors have collected the climate data of Bhopal district for the period 1984-2003. They considered the climaticfactors such asevaporation, maxi­mum temperature, maximum relativehumidity,rainfall and the crop was soybean yield and applied the Interactive Di­chotomizer3 algorithm which is information based method and based on two assumptions. Using the induction tree analysis it was found that the relative humidity is a ma­jor infuencing parameter on the soybean crop yield. DT formed for infuence of climaticfactors on soybean yield. Using the if-then-else rules the DT is formulated to classi­fcation rules. Relative humidity affects much on the pro­duction of soybean and some rules generated which help to in the low and high prediction of soybean. One of the drawbacks was only the low or high yield can be predicted but the amount of yield production cannot be predicted. Due to the diversity of climate in India, agriculture crops are poorly impressed in terms of their achievement from past two decades. Forecasting of crop production and ad-vancedyieldmightbe helpfultopolicyinventorandfarm­ers to take convenient decision. The forecasting also helps for planning in the industries and they can coordinate their business on account of the component of the climate. A software tool titled ‘Crop Advisor’ has been developed by Veenadhari et al. [18] which is a client friendly and can forecast the crop yields with the effect of weather parame­ters.C4.5 algorithm is applied ascertain the most effective climatic parameter on the crop yields of specifed crops in preferred district of Madhya Pradesh. The software will be helpful for advice the effect of various weather parame­ters on the crop yield. Other agro –input parameters liable for crop yield are not accommodating in this tool, since the application of these input parameters differ with indi­vidual felds in space and time. Alexander Brenning et al. [19] compared all the classifer including Linear Discrimi­nant Analysis (LDA)for crop identifcation based on multi-temporal land dataset and concluded that stabilized LDA performed well mainly in feld wise classifcation. Ming-gang Du et al. [20] used the method LDAfor plant classi­fcation and conclude that LDAwith Principal Component Analysis is effective and feasible for plant classifcation. Renrang Liao [21] classifed fruit tree crops using penal­ized LDAand found that the LDAmay not be able to deal with collinear high dimensional data. It has been observed that, mostof literature areusing single classifcation model to predict the crop yield leading to increase in misclassif­cation by data biasing, therefore we have been motivated to formulate a multiclassifer based model known as clas­sifer ensemble [22]. This ensemble technique helps to re­duce the classifcation error by considering the outputs of different classifers by taking the majority of right outputs [23, 24]. 
In this paper we have tried to consider the colli­sion of the weather transform scenario of Odisha context of thefarmingyieldofthe one mainfasten food riceusingthe machine learning methods such as SVM, K-NN, NB, DT and LDA[25, 26]. 3 Materials and methods This section briefy describes the machine learning tech­niques and tools used to develop the ensemble based crop prediction model. 3.1 SupportVector Machine (SVM) TSVM is one of the supervised machine learning tech­niques and also known as support vector networks. It anal­yses data mainly for classifcation and regression analysis. Aset of labelled training data it produces by using input-output mapping functions [27]. For both classifcation of linear and non linear dataset, SVM method can be used. The original training data transformed a higher dimension by SVM using non linear mapping. Then for the linear op­timal separatinghyper plane, the new dimension searched by SVM. Thus, a decision boundary formed which sepa­rates the different classes from one another [28]. When the SVM is used for the prediction of the crop yield then it is known as support vector regression. The main objective of theSVMistofnd non-linear functionbytheuseofkernel that is a linear on polynomial function [29, 30, 30]. The radial basis function and the polynomial function are the widely usedkernel functions. In case large input samples space the diffculty of using linear function can be avoided by using SVM. Due to optimization the complex problem can be converted into simple linear function optimization [32]. 3.2 K-Nearest Neighbour(K-NN) K-NN [33] is one of the simplest supervised learning meth­ods used for both classifcation and prediction techniques [34, 35]. By using K-NN the unknown sample can be clas­sifed to predefned classes, based on the training data. It requires more computation than other techniques. But it is better for dynamic numbers that change or updated quickly. For new sample classifcation the K-NN process the de­tachment among the entire sample in the training data. The Euclidian distance is used for distance measurement. The samples with the smallest distance to the new sample are known as K-nearest neighbours [36]. The main idea be­hind theK-NN is to estimate on a fxed number of obser­vations those are closest to the desired output. It can be used for both in discrete and continuous decision making such as classifcation and regression. In case of classif­cation most frequent neighbours are selected and in case of prediction or regression the average of k-neighbours are calculated. Besides the Euclidean distance, Manhattan dis­tance and Minkowski distance are used in K-NN [37]. 3.3 Naïve Bayesian Classifer (NB) The NB classifcation technique is developed on the ba­sis of Bayesian theorem. This technique is most suitable when the input value is very high that when the dataset is very high we can use the Naïve Bayes technique. The other names of Bayes classifers are simple Bayes or idiot Bayes [38]. Naïve Bayes classiferisa simple probabilistic classi­fer with strong independence assumptions. The classifer can be trained on the nature of the probability model. It can work well in many complex real world situations. It requiresa little quantityof training datato calculatethepa­rameter essentialforthe classifcationanditisthe mainad-vantage of Naïve Bayes classifer. Bayes theorem is based on probabilistic belief. It is based on conditional proba­bility on mathematical manipulation. 
3.4 Decision Tree (DT)

DT is a very promising technique for automating much of the data mining and predictive modelling process. DTs embed automated solutions to problems such as overfitting and handling missing data. The models built by DTs can easily be viewed as a tree of simple decisions and provide well-integrated solutions with high accuracy. A DT, also known as a classification tree, is a tree-like structure which recursively partitions the dataset in terms of its features; each interior node of such a tree is labelled with a test function. The best known DT algorithms are C4.5 and ID3 [40]. Figure 1 illustrates an example of a DT together with its IF ... THEN ... ELSE ... rule form.

Figure 1: Decision Tree with IF ... THEN ... ELSE ... rule form.

3.5 Linear Discriminant Analysis (LDA)

Discriminant analysis is a multivariate classification method. It is similar to regression analysis, except that the dependent variable is categorical rather than continuous; the intent is to predict the class membership of individual observations based on a set of predictor variables. LDA attempts to find linear combinations of the predictor variables that best separate the groups of observations; these combinations are called discriminant functions. LDA is also one of the dimensionality reduction methods used for preprocessing in pattern classification and machine learning applications. To avoid overfitting, LDA can be applied to the dataset to obtain good class separability with reduced computational cost [41]. Linear combinations of the predictors are used by LDA to model the degree to which an observation belongs to each class; a discriminant function is used and a threshold is applied for classification [42].

3.6 Majority voting

Majority voting is one of the ensemble learning algorithms and is a voting-based method. Majority voting is appropriate when each classifier cl can produce class-probability estimates rather than a simple classification decision. A class-probability estimate for a data point x is the probability that the true class is m: P(f(x) = m | cl), for m = 1, ..., M. The class probabilities of all the hypotheses can be combined so that the class probability of the ensemble is obtained [43]. Sarwesh Site et al. described the improved performance obtained by merging two or more classifiers through voting on their outputs, which is known as an ensemble classifier, and presented various ensemble classifier techniques for both binary and multi-class classification [44]. Xueyi Wang et al. prepared a model to find the accuracies of majority voting ensembles by taking UCI repository data and experimenting on 32 datasets. They divided the data into different subsets, such as core, outlier and boundary, and found that for a better ensemble method, or to achieve high accuracy, the weak individual classifiers should be partly diverse [45].
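To illustrate plain majority voting over the five classifiers described in this section, the following sketch combines them with scikit-learn's VotingClassifier; this is a simplified illustration, not the authors' MATLAB-based ensemble, which is described in Section 4. The commented lines assume the hypothetical data split from the earlier sketch.

# Illustrative sketch of plain majority voting over the five classifiers
# described above (scikit-learn assumed).
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

voters = [
    ("svm", SVC(kernel="rbf", probability=True)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier()),
    ("lda", LinearDiscriminantAnalysis()),
]

# voting="soft" averages the class-probability estimates P(f(x) = m | cl) over
# the classifiers; voting="hard" takes the majority of the predicted labels.
ensemble = VotingClassifier(estimators=voters, voting="soft")
# ensemble.fit(X_train, y_train)                       # split from earlier sketch
# print("ensemble accuracy:", ensemble.score(X_test, y_test))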
3.7 Performance measures

This section discusses the basics of specificity, sensitivity/recall, precision, NPV, FPR, FNR, FDR, F-Score, G-mean, MCC and the J-statistic. These measures express the extent to which a test measures what it is supposed to measure, in other words the accuracy or validity of the test, and they are computed from a confusion matrix, i.e. a two-by-two matrix. A confusion matrix has four elements: True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN), represented by the a, b, c and d cells of the matrix, respectively.

Specificity is computed as d(TN) / (b(FP) + d(TN)) and sensitivity as a(TP) / (a(TP) + c(FN)). Sensitivity and specificity are inversely proportional: as the sensitivity increases, the specificity decreases and vice versa. Precision tells how many of the test positives are true positives; if this number is high, or close to 100%, the new test is doing as well as the defined standard. It is computed as a(TP) / (a(TP) + b(FP)). NPV tells how many of the test negatives are true negatives; the desired value is close to 100%, in which case the new test is doing as well as the defined standard. It is computed as d(TN) / (c(FN) + d(TN)). Assuming all other factors remain constant, the PPV increases with increasing prevalence, while the NPV decreases with increasing prevalence.

A false positive error, or fall-out, is a result that indicates that a given condition has been fulfilled when it actually has not, i.e. a positive effect has been erroneously assumed. In other words, it is the proportion of all negatives that still yield positive test outcomes, i.e. the conditional probability of a positive test result given that the event was not present; it is computed as b(FP) / (b(FP) + d(TN)), or 1 − Specificity. An FNR is a test result that indicates that a condition failed while it actually held, i.e. no effect has been erroneously assumed. In other words, it is the proportion of events being tested for which the test yields negative outcomes, i.e. the conditional probability of a negative test result given that the event being looked for has taken place; it is computed as c(FN) / (a(TP) + c(FN)), or 1 − Sensitivity. FDR is a way of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of rejected null hypotheses that were incorrect rejections, or false discoveries; it is computed as b(FP) / (b(FP) + a(TP)), or 1 − PPV.

The F-Score considers both the precision and the recall of the test. It can be interpreted as a weighted average of precision and recall, reaching its best value at 1 and its worst at 0, and is computed as 2 × (Precision × Recall) / (Precision + Recall). MCC is used to measure the quality of binary classification. It takes into account true and false positives and negatives, is generally regarded as a balanced measure, and can be used for imbalanced datasets. It is a correlation coefficient between the observed and predicted binary classifications. While there is no perfect way of describing the confusion matrix of true and false positives and negatives by a single number, MCC is generally regarded as one of the best such measures and is computed as (a × d − b × c) / √((a + b) × (a + c) × (d + b) × (d + c)).

The accuracy of a classifier may not be an adequate performance measure when the number of negative cases is much greater than the number of positive cases, i.e. when the classes are imbalanced. Suppose there are 1000 cases, 995 of which are negative and 5 of which are positive. If the system classifies them all as negative, the accuracy is 99.5% even though the classifier missed all positive cases; in such cases the G-mean comes into action. The G-mean has its maximum value when sensitivity and specificity are equal and is computed as √(Precision × Recall). Youden's J statistic is a way of summarizing the performance of a diagnostic test. For a test with poor diagnostic accuracy Youden's index equals 0, and for a perfect test it equals 1. The index gives equal weight to false positive and false negative values, so all tests with the same value of the index give the same proportion of total misclassified results. It is computed as Sensitivity + Specificity − 1.
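The following is a minimal sketch showing how the measures defined above can be computed directly from the four confusion-matrix cells a = TP, b = FP, c = FN and d = TN; the example counts at the end are hypothetical.

# Minimal sketch: computing the measures defined above from the confusion-matrix
# cells a = TP, b = FP, c = FN, d = TN (binary case).
import math

def binary_metrics(a: int, b: int, c: int, d: int) -> dict:
    sensitivity = a / (a + c)                    # recall / true positive rate
    specificity = d / (b + d)
    precision   = a / (a + b)                    # PPV
    npv         = d / (c + d)
    fpr         = b / (b + d)                    # 1 - specificity
    fnr         = c / (a + c)                    # 1 - sensitivity
    fdr         = b / (a + b)                    # 1 - PPV
    f_score     = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy    = (a + d) / (a + b + c + d)
    g_mean      = math.sqrt(precision * sensitivity)   # as defined in the text
    mcc         = (a * d - b * c) / math.sqrt(
        (a + b) * (a + c) * (d + b) * (d + c))
    j_statistic = sensitivity + specificity - 1        # Youden's J
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "NPV": npv, "FPR": fpr, "FNR": fnr,
            "FDR": fdr, "F-score": f_score, "accuracy": accuracy,
            "G-mean": g_mean, "MCC": mcc, "J": j_statistic}

# Hypothetical counts for an imbalanced dataset (cf. the 995-versus-5 example):
print(binary_metrics(a=4, b=10, c=1, d=985))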
4 Structural and functional representation of the proposed ensemble-based prediction model

The schematic representation of the proposed model is shown in Figure 2. First, the datasets for three coastal districts of Odisha and the different parameters are collected from the Odisha Agriculture Statistics, Director of Agriculture and Food Production, Govt. of Odisha, Bhubaneswar, and then the datasets are pre-processed. The proposed methodology is based on the classifier ensemble method. The intention is to predict the rice yield for the two seasons, Rabi and Kharif, with respect to the climatic variability of coastal Odisha. The model uses five classifiers, where four act as base classifiers and one acts as the main classifier. The classifiers used are SVM, K-NN, DT, NB and LDA. Experiments are conducted by considering each classifier once as the main classifier and the remaining four as base classifiers, using MATLAB 10 on the Windows OS. In this way, five different predicted outputs for rice production are obtained. Each classifier is built according to the basic algorithm defined in the literature [26, 31, 36, 38, 40, 43].

Let B = {b1, ..., b4} be the four base classifiers and C = {c1, ..., c4} be the outputs of those four base classifiers. The output of each classifier is passed through a conversion function f to retrieve the production, denoted as Ŝ, which acts as input to the main classifier:

Ŝi = f(ci)    (1)

where f can be computed using equation (2):

f(ci) = N / |N|    (2)

where N is the sum of the Si that belong to class ci. Hence, the main classifier receives the input vector D = {dataset, Ŝ1, Ŝ2, Ŝ3, Ŝ4}. The result obtained after processing D by the main classifier is compared with the expected output (y). Equation (2) is again used to compute the production based upon the predicted class labels. The final prediction is made by applying majority voting to the class labels predicted by each classifier acting as the main classifier (Figure 3). Throughout the paper, the # symbol is used before a classifier to differentiate the main classifier from the base classifiers.
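The rotation scheme described above can be summarized in the following sketch. It is an illustration only, assuming scikit-learn rather than the authors' MATLAB implementation; the function names (rotation_predict, majority_vote) are our own, and the conversion function f of equation (2) is replaced by a simple placeholder that appends the predicted class label as an extra feature.

# Minimal sketch of the rotation scheme described in this section.
import numpy as np
from sklearn.base import clone
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

CLASSIFIERS = {
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
}

def majority_vote(predictions):
    # predictions has shape (n_rotations, n_samples);
    # return the most frequent label in each column.
    voted = []
    for column in predictions.T:
        values, counts = np.unique(column, return_counts=True)
        voted.append(values[np.argmax(counts)])
    return np.array(voted)

def rotation_predict(X_train, y_train, X_test):
    """Use each classifier once as the #main classifier with the other four as
    base classifiers, append the base outputs as extra features, and combine
    the five resulting predictions by majority voting."""
    per_rotation = []
    for main_name, main_clf in CLASSIFIERS.items():
        base_train, base_test = [], []
        for name, clf in CLASSIFIERS.items():
            if name == main_name:
                continue
            fitted = clone(clf).fit(X_train, y_train)
            # Placeholder for the conversion function f(c_i) of equation (2):
            # here the predicted class label itself is used as the extra feature.
            base_train.append(fitted.predict(X_train))
            base_test.append(fitted.predict(X_test))
        D_train = np.column_stack([X_train] + base_train)
        D_test = np.column_stack([X_test] + base_test)
        per_rotation.append(clone(main_clf).fit(D_train, y_train).predict(D_test))
    return majority_vote(np.vstack(per_rotation))

# Usage (with the hypothetical split from the earlier sketch):
# y_pred = rotation_predict(X_train, y_train, X_test)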
5 Experimentation and model evaluation

This section elaborates the experimentation process, starting from the chosen datasets and their description, a step-wise representation of the working principle of the proposed method, and the analysis of the results with respect to the average classification accuracy and the predictive performance measures used to validate the model.

5.1 Dataset description

The real dataset D is collected from three coastal regions of Odisha: the Balasore, Puri and Cuttack districts. Let di ∈ D, i = 1, ..., 31, where |di| = 25 is the number of attributes of the dataset. The different parameters are collected from the Odisha Agriculture Statistics, Director of Agriculture and Food Production, Govt. of Odisha, Bhubaneswar [46]; they include p = {max temperature, min temperature, rainfall, humidity}, which affect rice production. Since there are two rice production seasons, Rabi and Kharif, produced between the months January-June and July-December respectively, each pi is collected over a range of six months, resulting in 24 attributes; the 25th attribute is the production (in hectares) of the crop for the particular year. The rice production for the three coastal areas of Odisha over the years 1983-2014 is shown in Figure 4a and Figure 4b for the Rabi and Kharif seasons, respectively. A detailed description of the datasets, with the standard deviation (Std. Dev) for the three areas, is given in Table 1.

5.2 Construction of the dataset for classification

The raw data collected contain some missing values and no class labels. One way to deal with a missing value is to simply replace it with a negligibly small positive real number. For classification, D must be in the form D = {d, y}, where di refers to the features and yi refers to the class label. In order to predict the production of the rice crop, the class label needs to be properly defined. One way is to use clustering and allocate to each feature vector a class label equal to its cluster number; however, the randomly formed cluster indices make it difficult to build a common class label for the features. Hence, in our work we have proposed a range-based class label formation. Let S denote the production column vector of dataset D; then yi can be formulated using equation (3).

u = si
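Since equation (3) is not fully reproduced here, the following is a hypothetical sketch of one way a range-based class label could be derived from the production column: missing values are replaced by a tiny positive number and each production value is mapped to the range (bin) it falls into. The thresholds are illustrative equal-width bin edges, not the authors' exact definition, and the example figures are invented.

# Hypothetical sketch of range-based class-label construction (equation (3)
# itself is not reproduced above, so the bin edges below are illustrative).
import numpy as np

def build_labels(S: np.ndarray, n_classes: int = 4) -> np.ndarray:
    """Map each production value s_i to a class label y_i in {1, ..., n_classes}
    according to the range (bin) it falls into."""
    S = np.where(np.isnan(S), np.finfo(float).tiny, S)    # replace missing values
    edges = np.linspace(S.min(), S.max(), n_classes + 1)  # range boundaries
    y = np.digitize(S, edges[1:-1], right=True) + 1       # classes start at 1
    return y

# Example with invented production figures:
S = np.array([1200.0, np.nan, 950.0, 1810.0, 2450.0, 1400.0])
print(build_labels(S))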