https://doi.org/10.31449/inf.v48i15.4646
Informatica 48 (2024) 191–206

Intrusion Detection System for 5G Device-to-Device Communication Technology in Internet of Things

Ola Malkawi 1, Wesam Almobaideen 2, Nadeem Obaid 3, Bassam Hammo 3
1 Amman Arab University, Jordan
2 University of Jordan, Rochester Institute of Technology, Jordan
3 University of Jordan, Jordan
E-mail: o.malkawi@aau.edu.jo, wxacad@rit.edu, obein@ju.edu.jo, b.hammo@ju.edu.jo

Keywords: device to device communication, intrusion detection system, machine learning, classification, 5G cellular communications

Received: Feb 1, 2023

The emergence of the Internet of Things (IoT) has raised the need for high quality communications and high performance networks. 5G cellular communication technology exhibits the readiness to provide such high quality communication channels by using various advanced technologies. Device to device (D2D) communication is one of multiple technologies that have been suggested in 5G. With this technology, mobile devices can communicate with each other without the involvement of a base station (BS). This can eliminate congestion, expand the coverage area and increase throughput. Communicating devices set up a multi-hop path using nearby devices which act as relaying elements, or routers. However, the self-organizing nature and the lack of centralized control of D2D make it easier to launch multiple types of attacks. In this paper, an intrusion detection system (IDS) is proposed using machine learning techniques. Eight types of attacks are considered to train the system for intrusion detection, then multiple classification algorithms are compared. Finally, a multi-objective model is designed based on the results of the comparison to secure the communication process under D2D technology. The dataset used is generated using Network Simulator NS-2.
Summary: The article presents an intrusion detection system (IDS) for device-to-device (D2D) communication in 5G technology, which uses machine learning to recognize several types of attacks.

1 Introduction

The massive growth in wireless communications poses many challenges in meeting users' requirements. These requirements include the transmission of large data volumes, reliable communications and short response time. The need to meet these requirements increases dramatically, especially with the existence of the Internet of Things (IoT) [27]. The result of the large number of communicating mobile devices is a fully overloaded, low performance or even dysfunctional cellular network [21]. The next generation of cellular networks, i.e. 5G, is a promising solution for the growing demand for high performance networks [9], as it utilizes a number of technologies including multiple-input multiple-output (MIMO), millimeter waves, small cells, beamforming, full-duplex and device to device (D2D) communications [15], [4]. These technologies have come to fulfill the 5G promises.

Device to device communications can provide an efficient use of millimeter waves and better utilization of the available bandwidth. With D2D, the communication between two devices can be accomplished without the involvement of a BS, which may involve long distance communications. Any two devices can communicate over multiple small hops instead of two long hops, one from the sender to the BS and one from the BS to the receiver. Therefore, a User Equipment (UE) can either help other UEs to communicate without the need to contact a BS, as Figure 1 shows in the communication between devices (B) and (C), or assist another UE to communicate with the BS, as depicted in Figure 1 between device (A) and (BS), even in the case where a UE is located out of the transmission range of the BS [34].
We notice that there is a lack of research investigating security in D2D cellular networks. That is, to the best of our knowledge, no research work has considered security attacks resulting from the self-organizing nature of D2D devices, where no centralized point is responsible for controlling the communication process. Nevertheless, a number of studies have considered other security problems, such as [14], where a new key management approach is proposed to secure the communication process between devices in D2D technology. However, because there are many similarities between D2D technology and the wireless ad hoc paradigm, security studies on ad hoc networks can be applied to D2D communications.

Figure 1: Communication in D2D technology

Intrusion detection in wireless network environments becomes a very challenging task, especially with the emergence of modern technologies where normal users can initiate the cellular communication process using their ordinary user equipment. That is, any user can advertise any piece of information to other users within the communication process regardless of the degree of authenticity or honesty of that user. This allows a large number of illegal actions to arise and corrupt the functionality of such systems. Traditional systems, which depend on pre-established rules to classify users' actions as normal or malicious, may be unable to perform efficiently as new attacks arise constantly. A more suitable choice is the use of data mining and machine learning techniques [11], where data can be collected and used to train a system to discriminate the normal behavior of a network from behavior under malicious actions.
In this paper, we suggest that a moderate database be established in each base station, where the traffic is collected and analyzed based on a specially designed model to classify the network behavior as either normal or malicious. The appropriate procedure can then be taken to secure the network. We have used NS-2 to simulate the D2D environment in order to create the dataset, which contains normal network behavior as well as the behavior of eight attacks. Five classification algorithms have been compared to select the best classifier, including random forest, artificial neural networks, support vector machines, decision trees and Naïve Bayes. The suggested features are ordered by importance and have been tested to select the optimal subset of features for the final model. After the optimal classification algorithm, random forest, was selected, it was applied to design a general model to classify new types of attacks which have not been seen in the test dataset. Based on the classification results, the proposed model is presented and discussed, and is shown to provide a highly secured system.

The contribution of this work is summarized as follows:

1. A new dataset is generated using Network Simulator 2 (NS-2). The dataset consists of 4200 instances; each instance represents either normal network traffic or attacked network traffic for a number of nodes within two minutes.
2. Multiple classification algorithms are tested to select the most appropriate classifier for the proposed IDS.
3. A complete intrusion detection model is proposed based on the selected classifiers.
4. The proposed model is tested and shown to be efficient in detecting both seen and unseen attacks.

This paper is organized as follows. A background on D2D technology, possible attacks, the data mining field and classification algorithms is provided in Section 2. In Section 3 we discuss our methodology and experiment environment.
Results are presented and discussed in Section 4. Section 5 presents the proposed intrusion detection model. Finally, the conclusion is drawn in Section 6.

2 Background

In this section, a brief background is provided on D2D communication technology, its relation to ad hoc networks, the security of D2D devices and the types of possible attacks on the D2D communication process.

2.1 D2D communications

D2D was first proposed in 3GPP Release 12 [35] under the title ProSe, which stands for Proximity-based Services, and it was limited to adjacent devices with only one hop. Afterward, the concept of D2D evolved with 4G (LTE) for emergency services [19]. D2D can offer many advantages in cellular systems. These advantages include:

1. The ability to access communication services in emergency or disaster situations [21], where nodes can relay information without connecting to the dysfunctional cellular network.
2. Better utilization of the spectrum, by using millimeter waves and unlicensed spectrum [35], [31], [21], [34].
3. Network coverage expansion, where the communication range can be expanded without adding additional BSs [35], [21].
4. Optimized power consumption [15], [35], [31], [21], [34].
5. Economic benefits [15], by reducing the cost per bit and increasing revenue for operators [30], [13].
6. Flexibility for traffic offloading [15].
7. Better exploitation of device proximity [21].
8. Eliminating interference [35], [25] due to the high path loss, which is defined as the attenuation of electromagnetic waves during propagation through space. Path loss is considered an advantage in D2D communications because concurrent communications can be carried out without interfering [4].
9. Eliminating congestion [35], because the traffic is distributed rather than accumulated around the BS.
10. Diminishing data loss and the need for re-transmission, which saves bandwidth [35].

These advantages imply a higher network performance in terms of throughput [35], [31], latency [15], system capacity [12] and quality of service (QoS) [30], which can provide a significant advancement to a wide variety of uses.

To encourage cellular network users to participate in D2D technology, relaying D2D devices may be compensated either with a financial incentive or by provisioning services such as security during the communication operation [30].

2.2 D2D communications and ad hoc networks

By studying the distinctions between D2D and ad hoc networks, we can conclude that D2D can easily operate in ad hoc mode. The most significant difference between D2D and ad hoc networks is that D2D can ask for assistance from the BS in some situations, such as control, synchronization, path discovery [16] and resource allocation [18], while in ad hoc networks there is no such centralized assistance. Thus, a D2D communication operation can be either controlled by a BS, or uncontrolled, where each device performs peer discovery.

In the literature, the suitability of applying ad hoc routing to D2D is studied in [21] by implementing both AODV and DSDV in D2D communications. The results in [21] show that using ad hoc routing protocols with D2D is a promising approach for cellular communications. Additionally, AODV is shown to be a convenient candidate for D2D. AODV has also shown better energy consumption for large scale D2D networks [26], and it has been suggested for D2D communications in [1] and [16]. In this paper, we have adopted the AODV routing protocol to simulate the D2D environment and create our own dataset for training and testing the proposed model.
2.3 Security in D2D communications

There is a lack of existing research on the security of D2D communication technology. To the best of our knowledge, no research has considered possible attacks when applying D2D technology in cellular communications. Nevertheless, we cannot overlook security studies on ad hoc network attacks, which are closely linked to possible attacks on D2D. In this section we look over a number of existing studies related to IDSs in ad hoc networks.

In [3], the state of the art of IDSs in ad hoc networks is summarized. Multiple types of IDS have been designed, such as fuzzy logic based systems and cross layer acknowledgement based systems. One major IDS type is the classification based IDS, which depends on machine learning techniques. We consider this type as it is the most related to our approach. In [3], multiple classifiers are mentioned, such as SVM, NN and NB, which are reported to be the most efficient classifiers.

In [7], different IDSs for ad hoc networks are investigated and classified into multiple types. The machine learning based IDS is discussed; the most common models for this type are Bayesian networks, fuzzy logic, NN and GA. In this research, we apply these classifiers and compare them to select the most appropriate one for our model.

In [32], an IDS is proposed for wireless mesh networks (WMN), which are a type of ad hoc network. A dataset is generated using NS3. Five attacks are targeted by the proposed IDS. Genetic algorithms are used for feature selection; the main idea in [32] is that a different set of features might be beneficial for each attack. However, the proposed IDS is limited to the specified attacks, and the work does not discuss whether it could deal with further or unseen attacks.

In [20], the notion of cooperative IDSs is discussed.
An optimization problem is presented for how long an IDS needs to remain active in a mobile ad hoc network to achieve higher protection while saving battery life.

In [29], an IDS is proposed based on clustering, where a cluster head is responsible for monitoring to detect attacks, rather than continuous monitoring by individual nodes, which can lead to high depletion of a node's battery life.

In [23], a number of attacks are considered based on the notion of an adaptive response mechanism, motivated by the fact that fixed response mechanisms have many deficiencies related to detection ability and power consumption.

The use of machine learning in intrusion detection systems is also adopted in [5], where multiple classifiers are compared to construct an IDS for a small smart home network with eight connected devices. The proposed IDS has shown promising results for a small network, but it needs to be extended to a wider network to prove the feasibility of using machine learning for such networks.

Despite the fact that the communication style of ad hoc networks is similar to D2D in many aspects, all IDSs for ad hoc networks depend on the fact that there is no centralized point to monitor traffic, except cluster heads in cluster based ad hoc networks. Even in this type, cluster heads are often ordinary nodes with limited capabilities. Most studies depend on host based detection, which relies on the wireless node's capabilities and traffic. In D2D, however, the main difference is the presence of the base station, which can act as a centralized point and be utilized to monitor and analyze the overall traffic and to identify attacks if they occur.

Table 1 summarizes the state of the art for security in the literature and highlights its main limitations. In this paper, we have considered these limitations.
We have started from the security aspect by considering the most likely attacks and network variations as our first priority, then we consider the performance of machine learning algorithms and power consumption. We have also investigated the ability of the proposed IDS to detect new attacks. Moreover, we have considered moderate to large network sizes.

3 Methodology

In this section, we discuss the methodology of this research and show the steps that we have gone through to develop our IDS. Figure 2 shows the block diagram of our methodology. As Figure 2 shows, our methodology consists of five stages: problem understanding, data generation, data preparation, modeling, and evaluation. In the following subsections we provide a description of these stages.

3.1 Problem understanding

The Internet of Things is gaining more and more acceptance and popularity among different categories of users. The open nature of IoT opens the door to a wide variety of attacks and makes them very easy to launch [28]. In cellular communication, and particularly with D2D, it is essential to detect these attacks with a robust and efficient IDS in order to keep the communication process functioning efficiently. To achieve this objective, this work proposes and designs a data mining model to detect attacks as they occur in the network. Based on the detection outcomes, the system follows a predefined plan to counter the malicious node.

3.2 Dataset generation

As D2D is a technology still under development, to the best of our knowledge no real dataset of actual traffic is available. Therefore, we have generated this dataset using Network Simulator 2. In our simulation experiments, we have selected the AODV routing protocol [2], as it has been reported to be the most efficient routing protocol for D2D technology in [21].

We have conducted 4200 simulation scenario experiments, each lasting two minutes.
50-100 nodes are deployed within a 1000 m x 1000 m terrain area. Mobility speed has been varied from 0 m/s, which denotes static, immovable nodes, to 12 m/s, which denotes a node moving at about 43 km/h, equivalent to the speed of driving a car in a residential quarter. Table 2 illustrates the details of the simulation environment.

Figure 2: Main Steps of Proposed Methodology

Each instance of the generated dataset represents two minutes of traffic of either 50 nodes or 100 nodes moving at (0-3) m/s, (3-6) m/s, (6-9) m/s or (9-12) m/s. Part of this dataset represents normal network behaviour; the remaining instances represent the traffic with the aforementioned attacks launched. After the simulation was conducted for the 4200 instances, performance was measured in terms of predefined metrics which are considered as dataset features. In the next section we briefly discuss the proposed features that have been considered as inputs to the classification algorithm.

3.3 Feature extraction

After the simulation was conducted, we proposed a number of traffic properties to be considered as input features for the classification algorithms. Table 3 shows our proposed features with a brief description of each feature. These features can be measured from the trace files that result from the simulation. Trace files act as detailed log files for all communications in a given scenario. Trace files are analyzed to calculate the aforementioned features.

Figure 3 depicts a part of the final dataset. In Figure 3 we can see the aforementioned features of Table 3 as well as two class labels. We have considered two class labels since we will produce two models, as we will discuss later.

3.4 Data preparation

As the dataset has been generated from a simulation tool, we were able to control the data format by building output features as needed.
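As an illustration of how some of the Table 3 features follow from raw trace-file counts, the sketch below derives LOST, PDR and THRPUT from sent and received packet totals, using the definitions in Table 3 and the 120 second duration in Table 2. The function names `derive_features` and `is_clean` are hypothetical, and the paper does not publish its trace-analysis scripts, so this is only a minimal sketch under those assumptions. The `is_clean` check mirrors the removal of instances with null or infinite values.

```python
import math

def derive_features(sent, received, duration_s=120.0):
    """Derive LOST, PDR and THRPUT from raw packet counts.

    Hypothetical helper: definitions follow Table 3, duration follows Table 2.
    """
    lost = sent - received                            # packets never delivered
    pdr = received / sent if sent > 0 else math.nan   # delivery ratio, undefined if nothing sent
    thrput = received / duration_s                    # delivered packets per second
    return {"SENT": sent, "RCVD": received, "LOST": lost,
            "PDR": pdr, "THRPUT": thrput}

def is_clean(instance):
    """Return True when no feature value is null, NaN or infinite."""
    return all(v is not None and math.isfinite(v) for v in instance.values())

example = derive_features(sent=200, received=150)
print(example["LOST"], example["PDR"], example["THRPUT"])
```

Because LOST and PDR are computed directly from SENT and RCVD, the features are not statistically independent of one another.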
The only pre-processing needed was dealing with missing values by deleting instances which contain null or infinite values. After this step, the final version of our dataset is ready to be used as input to the classification algorithms. The class distribution of the final dataset is shown in Figure 4. The next step is to perform feature selection. The main target of applying feature selection here is to use as few features as possible to reduce computational cost. We have adopted a simple feature selection approach using the R language, a programming language used for statistical computations [8]. We have used R to order the proposed 14 features according to feature importance in the random forest classifier, using the R importance function. Figure ?? shows the feature ranking produced by the R importance function for the random forest classifier, which will be discussed in detail later.

Table 1: Security in literature

Paper | Target network type | Goals | Limitations
(Alnaghes et al.) | Ad hoc | Comparison between existing classifiers | Does not consider security aspects (ML enhancement only)
(Vijayanand et al.) | Wireless mesh networks | Feature selection | Limited to 5 attacks; does not consider unseen attacks
(Marchang et al.) | Ad hoc | Studying how long an IDS needs to be active | Limited to battery life aspects and activation time of the IDS
(Subba et al.) | Clustered WSNs | Battery life; IDS based on cluster heads | Cluster heads are normal nodes, so monitoring can lead to battery depletion
(Nadeem et al.) | Ad hoc | Adaptive IDS to save battery | Battery life is the main consideration rather than security itself
(Anthi et al.) | Ad hoc | Comparison of classifier performance within a smart home environment | Limited to small networks (8 devices only)

Table 2: Simulation Environment

Simulation parameter | Value
Simulator | NS2
Routing protocol | AODV
Transport layer protocol | TCP
Simulation duration | 120 seconds
Number of UEs | 50, 100 nodes
Mobility speed | (0-3, 3-6, 6-9, 9-12)
Terrain area | 1500 x 1500 m2

3.5 Modeling

Once the dataset is ready for the classification process, we have to select the most appropriate classification algorithm. We decided to compare multiple classifiers in order to select the best one based on classification outcomes. The compared classifiers have been chosen based on previously designed IDSs. As noted before, multiple classifiers are implemented in the literature, such as support vector machines, K-nearest neighbors, artificial neural networks, decision trees and Naive Bayes, and they have been shown to be efficient in developing IDSs [3]. Therefore, we have selected these classifiers for comparison, and we add random forest as it is considered a promising classifier, especially in intrusion detection systems [24], [22]. The target of this step is to find the best prediction model to obtain a high performance IDS. The Waikato Environment for Knowledge Analysis (WEKA), version 3.8.3, has been used to apply the aforementioned classification algorithms and compare the results objectively.

In this research, our objective is to build two IDS models. The first is a binary classification model, which classifies network traffic as either normal or abnormal, where abnormal denotes the occurrence of an attack. The target of the second model is to specify the attack name and type. In this paper, we run our experiments on each of these two models separately. Finally, we integrate the two models into one multi-objective IDS.

WEKA is a comprehensive collection of machine learning algorithms and data pre-processing tools. It is one of the most common data mining tools [10].
The main advantages of using WEKA are that it contains a wide variety of algorithms, provides the most necessary performance measurements, and has a simple graphical user interface, which makes it easy to use in data mining research.

Table 3: Description of dataset features.

Abbreviation | Term | Description | Range
E2E | End to end delay | Time elapsed between sending and receiving a packet | 0 to infinity
DUP | Duplicated packets | Number of packets that have been sent more than once | 0 to infinity
OH | Overhead | Number of control packets sent | 0 to infinity
SENT | Sent packets | Number of packets initiated by all sources | 0 to infinity
RCVD | Received packets | Number of packets received by their intended destinations | 0 to infinity
LOST | Lost packets | Number of packets sent from their source and not delivered to the final destination | 0 to infinity
FWD | Forwards | Number of forwards for all transmitted packets by all intermediate nodes | 0 to infinity
THRPUT | Throughput | Number of delivered packets per second | 0 to infinity
RET | Re-transmissions | Number of packets that have been re-transmitted because of an error | 0 to infinity
PDR | Delivery ratio | Received packets / sent packets | 0 to 1
PATH | Path length | Average number of hops from source to destination for all transmitted packets | 0 to the total number of nodes
TIME | Time | Time when the last packet was sent or received | 0 to 120
SPEED | Mobility speed | Maximum mobility speed of the mobile nodes | 0 to 12
DENS | Density | Number of nodes per 1000 m x 1000 m | 50, 100

3.6 Evaluation

To evaluate the performance of the classifiers selected for comparison, we have considered the most common evaluation parameters for classification algorithms. These performance metrics include classification accuracy, sensitivity, specificity, G-mean and AUC. The first four measurements are based on the confusion matrix.
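The four confusion-matrix metrics can be sketched as follows. The paper does not spell out its formulas, so this assumes the standard definitions, with "attack" as the positive class and G-mean taken as the geometric mean of sensitivity and specificity. The function name `binary_metrics` and the counts in the usage line are illustrative only and are not results from the paper.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity and G-mean.

    tp/fn: attack instances classified as attack/normal.
    fp/tn: normal instances classified as attack/normal.
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)          # recall on the attack class
    specificity = tn / (tn + fp)          # recall on the normal class
    g_mean = math.sqrt(sensitivity * specificity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "g_mean": g_mean}

# Illustrative counts only (not taken from the paper's experiments).
print(binary_metrics(tp=90, fn=10, fp=20, tn=80))
```

G-mean is useful when the classes are imbalanced, because a classifier that ignores the minority class gets a G-mean near zero even when its accuracy is high.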
4 Experiments and results

In this section, we discuss the experiments that we have conducted based on the methodology described in Figure 2 and using the generated dataset. We consider the following experiment scenarios.

Figure 3: A shot of the generated data set

Figure 4: Class distribution of the generated data set

– Scenario 1: The standard classifiers k-NN, DT, NB, NN, RF and SVM are applied to the entire dataset without feature reduction; that is, the 14 features are considered as inputs to all classification algorithms. Accordingly, multiple performance metrics are measured. Figure 5 depicts accuracy, recall, G-mean, F-measure and AUC for the aforementioned classifiers. From Figure 5, we notice that random forest does well in terms of all performance metrics and achieves the highest level of performance. WEKA version 3.8.3 has been used for all comparison experiments. For Naive Bayes, random forest, support vector machine and J48 decision trees, we have used their Java implementations in WEKA. For k-NN, K is set to 1, as this value produced the best output. For artificial neural networks, the number of hidden layers is set to 2, which is considered sufficient for simple datasets.

– Scenario 2: After the initial evaluation of the classification algorithms to be used in our IDS, we concluded that random forest is the most appropriate classifier, so we used the R random forest importance function to rank the dataset features. The outcomes of the ranking process are presented in Figure ??. We have used this ranking to apply feature selection. We simply removed the x least important features and observed the performance of the target classifiers. x has been varied to find the optimal number of features to remove in order to get the best performance.
We started by removing 4 features and keeping 10, then we removed 8 and 12 features to keep 6 and 2 features, respectively.

In scenario 2, the same classifiers applied in scenario 1 are applied using the same instances and parameter settings, while varying the removed features. The target of the second scenario is to quantify the improvement in the performance of each classifier when the number of features is reduced. The main target is then to use as few features as possible while obtaining the highest performance, to optimize the proposed IDS in terms of computational cost. Figure 6 shows the steps of scenario 2.

4.1 Experimental setup

In this section we discuss the experiments which have been conducted to design the final model. The aforementioned classification algorithms, namely k-NN, DT, NB, NN, SVM and RF, are trained and tested using the 10-fold cross validation technique. In 10-fold cross validation, the dataset is divided into 10 equal parts; training is carried out on nine parts and testing on the remaining part. Training and testing are repeated ten times, such that the test part changes each time. Finally, the average of all test results is reported.

When 10-fold cross validation is used, we can guarantee that the entire dataset is eventually used for both training and testing. Moreover, we ensure that stratified sampling is achieved by creating the 10 folds such that in each fold the class distribution is as close as possible to the dataset distribution. Here, stratified sampling is very important to get better results in terms of bias and variance [17].

4.2 Experiment I: binary classification with all features dataset

In this experiment, our selected classification algorithms are applied to the generated dataset without removing any feature. Our target here is to evaluate the performance of all classifiers in determining whether there is an attack or not, using all proposed features.
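The stratified 10-fold procedure described in Section 4.1 can be sketched in plain Python as follows. The paper ran its experiments in WEKA, so this is not the authors' code. The names `stratified_folds`, `cross_validate` and the `evaluate` callback are hypothetical, and the round-robin assignment is one simple way, assumed here, to keep each fold's class distribution close to that of the dataset.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds with near-equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share of it.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def cross_validate(labels, evaluate, k=10, seed=0):
    """Average evaluate(train_idx, test_idx) over k train/test rounds."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

Here `evaluate` stands for any function that trains a classifier on the training indices and returns a score on the test indices. Each instance is used for testing exactly once and for training in the other nine rounds.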
Results are depicted in the column chart of Figure 5. By examining the results, we notice that a very high classification accuracy can be achieved with most classifiers. We also notice that the lowest performing classifier is the NB classification algorithm. This is due to the fact that the proposed features are not independent; most of them depend on one another. Delivery ratio, for example, is derived from two other features, namely sent and received packets. As another example, lost packets is the result of subtracting received packets from sent packets. In NB, the classification is built on the assumption that features are independent of each other [6]. As we can see, our generated dataset violates this assumption, so the NB classifier has the worst performance. In the next experiment we compare all classifiers in terms of a number of performance metrics; however, we apply feature selection based on the ranking shown in Figure ?? in order to optimize performance as well as computational cost.

Figure 5: Binary classification performance of classification algorithms for the entire dataset

4.3 Experiment II: binary classification with feature selection

This experiment targets the process of determining whether there is an attack or not. In this experiment, to optimize the performance and communication cost of the final classification model for our IDS, we first applied the R importance function to order the suggested 14 attributes. The result of the ordering process for binary classification is shown in the left-hand table of Figure ??. Then we tried removing the least important features, since unnecessary features increase computational cost and time and may hamper the classification process, which limits performance. To identify the optimal number of features to remove, we tried different removal ratios to get the best model.
We examined the performance of the selected classifiers by training them on datasets with different feature sets, starting with the full feature dataset, which produced the results shown in Figure 5. Then we removed the lowest 4, 8 and 12 features, keeping the 10, 6 and 2 most important features, respectively. We measured accuracy, root mean square error, area under curve, recall, F-measure and G-mean. Figures 7 to 12 show the evaluation results of this experiment.

In Figure 7, we can see that feature reduction did not affect classification accuracy significantly for any of the classifiers except NB. NB is the only classifier whose accuracy improved, by approximately 40%, which was achieved by classifying with only the two most important features, duplicated and lost packets.

The same behavior is observed in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 for recall, F-measure, area under curve, G-mean and root mean square error, respectively. The results in these figures indicate that it is still easy for the random forest, decision tree and K-nearest neighbor classifiers to identify attacks, which represent the majority class in the dataset. On the other hand, Naive Bayes and support vector machine have the worst classification performance [33].

We noticed that the random forest classifier achieved the best performance in F-measure (97%), accuracy (95%) and AUC (98%), while it was the second best classifier in terms of G-mean (89%) and mean square error (10%). We conclude that removing the least important features did not significantly affect classification performance, while it is guaranteed to limit computation cost.

Figure 6: Steps of feature selection
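The removal step used in this experiment can be sketched as below: features are ranked by an importance score and the `n_remove` lowest-ranked ones are dropped. The scores in `IMPORTANCE` are placeholders invented for illustration. Only the top two positions (DUP and LOST) reflect the text above; the order of the other twelve is arbitrary, since the actual ranking is in a figure that is not reproduced here. The function name `keep_top_features` is hypothetical.

```python
# Placeholder ranking: only DUP and LOST in the top two follows the paper;
# the order of the remaining features is arbitrary and for illustration only.
FEATURES_BY_RANK = ["DUP", "LOST", "E2E", "OH", "SENT", "RCVD", "FWD",
                    "THRPUT", "RET", "PDR", "PATH", "TIME", "SPEED", "DENS"]
IMPORTANCE = {name: len(FEATURES_BY_RANK) - i
              for i, name in enumerate(FEATURES_BY_RANK)}

def keep_top_features(importance, n_remove):
    """Drop the n_remove least important features; return the rest, best first."""
    if not 0 <= n_remove < len(importance):
        raise ValueError("n_remove must leave at least one feature")
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:len(ranked) - n_remove]

# Removing 4, 8 and 12 features keeps 10, 6 and 2, as in the experiment.
for n_remove in (4, 8, 12):
    kept = keep_top_features(IMPORTANCE, n_remove)
    print(n_remove, len(kept), kept)
```

Each reduced feature list would then be used to retrain and re-evaluate every classifier with the same instances and parameter settings.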
The aforementioned performance of random forest indicates that it can identify an attack with a ratio of 95%. As a conclusion of this experiment, the random forest model is the best model for intrusion detection. This conclusion is based on the values of accuracy, which is referred to as detection rate and is considered the most important metric in intrusion detection systems. This conclusion will be adopted in our final model.

4.4 Experiment III: attack based classification with feature selection

In this experiment our target is to specify exactly the type of attack launched, so we repeated the steps of Experiment II in order to optimize the performance and communication cost of the final classification model for our IDS. We started by applying the R performance function to order the 14 attributes. The output of the ordering process for attack classification is shown in the right-hand table of Figure ??.

Next, we tried to remove the least important features. As in Experiment II, we removed different numbers of features and measured the performance, starting with the full-feature dataset, whose results are presented in Figure 13. Then, we removed the lowest 4, 8 and 12 features, keeping the 10, 6 and 2 highest-importance features, respectively. Classification is applied to the remaining features to identify the type of the attack. We measured accuracy, root mean square error, area under curve, recall, F-measure and G-mean. Figure 14 through Figure 19 show the evaluation results of this experiment.

In Figure 14, we can see that feature reduction did not considerably affect classification accuracy for the three best classifiers (RF, KNN and J48). These three classifiers showed the best accuracy, between 75% and 85%. The remaining classification algorithms showed a significantly lower accuracy for attack specification (lower than 55%).
Similar performance is observed in Figure 15, Figure 16, Figure 17 and Figure 18 for recall, F-measure, area under curve and G-mean, respectively. The dominant observation for these metrics is that RF, KNN and J48 have the highest performance. Random forest is the best of them, with 80%, 82%, 98% and 90% for recall, F-measure, area under curve and G-mean, respectively.

The next observation is that classification with all features included has the best performance. Finally, the last observation relates to NN, which shows an improvement in performance with feature reduction until we remove 8 features and keep 6. With fewer than 6 features, we notice a significant drop in performance metrics, for NN in particular.

Figure 7: Accuracy of classification algorithms for the different datasets
Figure 8: Recall of classification algorithms for the different datasets
Figure 9: F-measure of classification algorithms for the different datasets
Figure 10: Area under curve of classification algorithms for the different datasets
Figure 11: G-mean of classification algorithms for the different datasets
Figure 12: Root mean square error of classification algorithms for the different datasets
Figure 13: Attack classification performance of classification algorithms for the entire dataset
Figure 14: Accuracy of classification algorithms for the different datasets
Figure 15: Recall of classification algorithms for the different datasets
Figure 19 shows the root mean square error for the evaluated classification algorithms. Results appear different for this metric: there is no significant improvement as features are eliminated; moreover, the random forest classifier achieved the lowest root mean square error, 16%, using all features.

In this experiment, we can conclude that random forest is the most suitable classifier to be considered in our proposed IDS. In the next experiment, we will integrate the results and conclusions of these three experiments to provide the final model of the proposed IDS.

Figure 16: F-measure of classification algorithms for the different datasets
Figure 17: Area under curve of classification algorithms for the different datasets
Figure 18: G-mean of classification algorithms for the different datasets
Figure 19: Root mean square error of classification algorithms for the different datasets

4.5 Experiment IV: detecting unseen attacks

As the number of users increases and new technologies emerge with the Internet of Things, new attacks continuously occur. In this section, we test the ability of the random forest classifier to detect new attacks that were not included in the training dataset. We selected random forest based on the previous experiments, which showed that it is the most appropriate classification algorithm. We divided our dataset into two parts, training and testing datasets. In the testing dataset, we included instances of only two attacks, A and B, as well as normal instances. The training dataset, on the other hand, includes all attacks except A and B. The target is to measure the ability of the classifier to recognize new attacks that it has not been trained on. We ran this experiment 4 times with different values of A and B to cover all attacks.
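The hold-out scheme just described can be sketched as below. Several details are assumptions made for illustration: the text does not state how normal instances are shared between the two parts, so the sketch simply sends every second normal record to the test set; the record layout, the label spellings and the helper name `unseen_attack_split` are hypothetical. The four attack pairs follow the pairs reported for the four runs.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[tuple, str]  # (feature vector, label); layout is hypothetical
NORMAL = "normal"

ATTACKS = ["rushing", "wormhole", "blackhole", "cache_poisoning",
           "hello_flooding", "jellyfish", "cooperative_blackhole", "greyhole"]

# One (A, B) pair per run, so that the four runs cover all eight attacks.
PAIRS = [("rushing", "wormhole"), ("blackhole", "cache_poisoning"),
         ("hello_flooding", "jellyfish"), ("cooperative_blackhole", "greyhole")]


def unseen_attack_split(records: Iterable[Record], held_out: Set[str],
                        normal_test_every: int = 2
                        ) -> Tuple[List[Record], List[Record]]:
    """Train on all attacks except held_out; test on held_out plus normal."""
    train: List[Record] = []
    test: List[Record] = []
    normal_seen = 0
    for features, label in records:
        if label == NORMAL:
            # Assumption: normal traffic appears in both parts.
            normal_seen += 1
            target = test if normal_seen % normal_test_every == 0 else train
            target.append((features, label))
        elif label in held_out:
            test.append((features, label))   # attack never seen in training
        else:
            train.append((features, label))
    return train, test


# Toy dataset: two records per attack plus four normal records.
records = [((i,), label) for i, label in enumerate(ATTACKS * 2 + [NORMAL] * 4)]

for pair in PAIRS:
    train, test = unseen_attack_split(records, set(pair))
    print(pair, "train:", len(train), "test:", len(test))
```

For the binary task, the labels would afterwards be collapsed to attack versus normal, so that recall on the test part measures how many instances of the unseen attacks are still flagged.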
Table 4 shows the performance of the random forest classifier over the four conducted experiments in terms of accuracy, recall, F-measure and area under curve. From the table, we can see that random forest is able to detect 86% of unseen attacks, as represented by the average recall. The detection ratio is also near 86%, which represents the classifier's ability to distinguish normal network behaviour from attacks. F-measure and area under curve reached 84% and 79%, respectively, which is considered acceptable detection for unseen and emerging attacks.

5 Intrusion detection model for D2D communications

In this section we integrate the outcomes of the conducted experiments to provide a design for a complete intrusion detection system for D2D communications. The accompanying figure shows the IDS design for a cellular network. This model suggests adding a spatio-temporal database system, which is used in wireless communication networks, only for a short time span within a geographic region. By adding the spatio-temporal database to the cellular system, the traffic of all nodes connecting to a base station is temporarily stored in this database. From this database, we can extract the features of our proposed dataset.

The classification algorithm selected in this research, random forest, can be applied periodically, e.g. every two minutes. In the initial step, random forest performs binary classification to determine whether there is an attack. If no attack is detected, nothing needs to be done; otherwise, a second classification is applied to determine the attack type. Once the attack is specified, the appropriate response is determined, either by disabling D2D communication and returning to the usual cellular communication paradigm, or by enabling one of the detection or mitigation techniques. The response to a detected attack represents a separate and complementary part of our designed IDS.
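The two-step decision flow of this design can be sketched as follows. The two rule functions are placeholders standing in for the trained random forest models, and their thresholds, the feature names in the dictionary and the function names are all hypothetical; only the control flow (binary detection first, attack typing only when an attack is flagged) reflects the description above.

```python
from typing import Callable, Dict, Optional

Features = Dict[str, float]  # features extracted for one time window


def two_stage_detect(window: Features,
                     is_attack: Callable[[Features], bool],
                     attack_type: Callable[[Features], str]) -> Optional[str]:
    """Return None for normal traffic, otherwise the predicted attack type."""
    if not is_attack(window):       # stage 1: binary classification
        return None                 # no attack detected, nothing to do
    return attack_type(window)      # stage 2: identify the attack type


# Placeholder rules standing in for the two trained models (hypothetical).
def toy_is_attack(window: Features) -> bool:
    return window["lost"] > 10


def toy_attack_type(window: Features) -> str:
    return "blackhole" if window["delivery_ratio"] < 0.1 else "greyhole"


# In the design this check would run periodically over features extracted
# from the spatio-temporal database; here two windows are checked by hand.
for window in ({"lost": 0.0, "delivery_ratio": 1.0},
               {"lost": 50.0, "delivery_ratio": 0.05}):
    print(two_stage_detect(window, toy_is_attack, toy_attack_type))
```

Keeping the two stages as separate callables means the cheaper binary check runs on every window, while the multi-class model is only invoked for windows already flagged as attacks.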
Figure 20: Intrusion Detection Model for a cellular network with D2D communications

Table 4: Performance results of unseen attacks

Exp No.   Unseen Attacks                 Accuracy   Recall   F-Measure   AUC
1         Rushing + Wormhole             86.30%     0.863    0.866       0.898
2         Blackhole + Cache poisoning    84.10%     0.841    0.864       0.631
3         Hello flooding + Jellyfish     99.60%     0.997    0.991       1
4         Cooperative BH + Greyhole      75.20%     0.752    0.646       0.66
Average                                  86.30%     86.33%   84.18%      79.73%

6 Conclusion

Owing to the increasing number of internet users and the emergence of new technologies, the number of cybersecurity attacks is increasing. The existence of an intrusion detection system has become a necessity. Detection and mitigation techniques are often provided at the expense of some cost, such as delay, additional equipment, overhead, and so on. Employing machine learning techniques helps to detect intrusions based on existing knowledge and data, without adding any extra cost.

In this research, the target was to use classification algorithms in intrusion detection for D2D communications. First, we generated our own dataset using the NS-2 simulator; then, we compared multiple classification algorithms to select the most appropriate classifier to be used in our IDS. We also applied a simple feature selection based on feature importance estimated using the R language. Figure 21 depicts the main contribution of this paper as compared to previous research in the state of the art (SOTA).

Figure 21: Contribution of this work as compared to SOTA

Experiments indicated that random forest is the most appropriate classification algorithm to be used for our IDS. It achieved a 97% detection rate for binary classification, and 85% accuracy in attack type identification. Finally, we have provided a suggested design for an IDS of a cellular network.

References

[1] S. A. Abd, S. Manjunath, and S. Abdulhayan.
“Direct Device-to-Device communication in 5G Networks”. In: Computation System and Information Technology for Sustainable Solutions (CSITSS), International Conference on. doi: 10.1109/CSITSS.2016.7779425. IEEE. 2016, pp. 216–219.
[2] W. Almobaideen and D. AlKhateeb. “CSPDA: Contention and stability aware partially disjoint AOMDV routing protocol”. In: Cross validation2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT). doi: 10.1109/AEECT.2015.7360548. IEEE. 2015, pp. 1–6.
[3] M. S. Alnaghes and F. Gebali. “A Survey on Some Currently Existing Intrusion Detection Systems for Mobile Ad Hoc Networks”. In: The Second International Conference on Electrical and Electronics Engineering, Clean Energy and Green Computing (EEECEGC2015). Vol. 12. 2015. URL: https://api.semanticscholar.org/CorpusID:54682924.
[4] R. I. Ansari et al. “5G D2D networks: Techniques, challenges, and future prospects”. In: IEEE Systems Journal (2017). doi: 10.1109/JSYST.2017.2773633.
[5] E. Anthi et al. “A supervised intrusion detection system for smart home IoT devices”. In: IEEE Internet of Things Journal 6.5 (2019). doi: 10.1109/JIOT.2019.2926365, pp. 9042–9053.
[6] M. Bramer. Principles of data mining. Vol. 180. doi: 10.2165/00002018-200730070-00010. Springer, 2007.
[7] I. Butun, S. D. Morgera, and R. Sankar. “A survey of intrusion detection systems in wireless sensor networks”. In: IEEE communications surveys & tutorials 16.1 (2014). doi: 10.1109/SURV.2013.050113.00191, pp. 266–282.
[8] M. J. Crawley. The R book. doi: 10.1002/9781118448908. John Wiley & Sons, 2012.
[9] A. Habbal, S. I. Goudar, and S. Hassan. “A Context-aware Radio Access Technology selection mechanism in 5G mobile network for smart city applications”. In: Journal of Network and Computer Applications 135 (2019). doi: 10.1016/j.jnca.2019.02.019, pp. 97–107.
[10] M. Hall et al. “The WEKA data mining software: an update”. In: ACM SIGKDD explorations newsletter 11.1 (2009). doi: 10.1145/1656274.1656278, pp. 10–18.
[11] K. M. Harahsheh and C.-H. Chen. “A survey of using machine learning in IoT security and the challenges faced by researchers”. In: Informatica 47.6 (2023). doi: 10.31449/inf.v47i6.4635.
[12] Z. Hashim and N. Gupta. “Futuristic device-to-device communication paradigm in vehicular ad-hoc network”. In: Information Technology (InCITe)-The Next Generation IT Summit on the Theme-Internet of Things: Connect your Worlds, International Conference on. doi: 10.1109/INCITE.2016.7857618. IEEE. 2016, pp. 209–214.
[13] Y. Jung, E. Festijo, and M. Peradilla. “Joint operation of routing control and group key management for 5G ad hoc D2D networks”. In: Privacy and Security in Mobile Systems (PRISMS), 2014 International Conference on. doi: 10.1109/PRISMS.2014.6970602. IEEE. 2014, pp. 1–8.
[14] M. A. Kandi et al. “A versatile Key Management protocol for secure Group and Device-to-Device Communication in the Internet of Things”. In: Journal of Network and Computer Applications 150 (2020). doi: 10.1016/j.jnca.2019.102480, p. 102480.
[15] U. N. Kar and D. K. Sanyal. “An overview of device-to-device communication in cellular networks”. In: ICT Express (2017). doi: 10.1016/j.icte.2017.08.002.
[16] B. Kaufman and B. Aazhang. “Cellular networks with an overlaid device to device network”. In: Signals, Systems and Computers, 2008 42nd Asilomar Conference on. doi: 10.1109/ACSSC.2008.5074679. IEEE. 2008, pp. 1537–1541.
[17] R. Kohavi et al. “A study of cross-validation and bootstrap for accuracy estimation and model selection”. In: Cross validation. Vol. 14. Montreal, Canada. 1995, pp. 1137–1145. URL: https://www.ijcai.org/Proceedings/95-2/Papers/016.pdf.
[18] X. Lin et al. “An overview of 3GPP device-to-device proximity services”. In: IEEE Communications Magazine 52.4 (2014). doi: 10.1109/MCOM.2014.6807945, pp. 40–48.
[19] J. Liu et al. “Device-to-device communication in LTE-advanced networks: A survey”. In: IEEE Communications Surveys & Tutorials 17.4 (2015). doi: 10.1109/COMST.2014.2375934, pp. 1923–1940.
[20] N. Marchang, R. Datta, and S. K. Das. “A novel approach for efficient usage of intrusion detection system in mobile Ad Hoc networks”. In: IEEE Trans. Vehicular Technology 66.2 (2017). doi: 10.1109/TVT.2016.2557808, pp. 1684–1695.
[21] P. Masek, A. Muthanna, and J. Hosek. “Suitability of MANET routing protocols for the next-generation national security and public safety systems”. In: Conference on Smart Spaces. doi: 10.1007/978-3-319-23126-6_22. Springer. 2015, pp. 242–253.
[22] Y. Meidan et al. “Detection of unauthorized iot devices using machine learning techniques”. In: arXiv preprint arXiv:1709.04647 (2017). doi: 10.48550/arXiv.1709.04647.
[23] A. Nadeem and M. P. Howarth. “An intrusion detection & adaptive response mechanism for MANETs”. In: Ad Hoc Networks 13 (2014). doi: 10.1016/j.adhoc.2013.08.017, pp. 368–380.
[24] F. A. Narudin et al. “Evaluation of machine learning classifiers for mobile malware detection”. In: Soft Computing 20.1 (2016). doi: 10.1007/s00500-014-1511-6, pp. 343–357.
[25] J. Qiao et al. “Enabling device-to-device communications in millimeter-wave 5G cellular networks”. In: IEEE Communications Magazine 53.1 (2015). doi: 10.1109/MCOM.2015.7010536, pp. 209–215.
[26] S. Riaz, H. K. Qureshi, and M. Saleem. “Performance evaluation of routing protocols in energy harvesting D2D network”. In: Computing, Electronic and Electrical Engineering (ICE Cube), 2016 International Conference on. doi: 10.1109/ICECUBE.2016.7495233. IEEE. 2016, pp. 251–255.
[27] H. Saadeh et al. “Hybrid SDN-ICN Architecture Design for the Internet of Things”. In: 2019 Sixth International Conference on Software Defined Systems (SDS). doi: 10.1109/SDS.2019.8768582. IEEE. 2019, pp. 96–101.
[28] J. Sengupta, S. Ruj, and S. D. Bit. “A Comprehensive survey on attacks, security issues and blockchain solutions for IoT and IIoT”. In: Journal of Network and Computer Applications 149 (2020). doi: 10.1016/j.jnca.2019.102481, p. 102481.
[29] B. Subba, S. Biswas, and S. Karmakar. “Intrusion detection in Mobile Ad-hoc Networks: Bayesian game formulation”. In: Engineering Science and Technology, an International Journal 19.2 (2016). doi: 10.1016/j.jestch.2015.11.001, pp. 782–799.
[30] M. N. Tehrani, M. Uysal, and H. Yanikomeroglu. “Device-to-device communication in 5G cellular networks: challenges, solutions, and future directions”. In: IEEE Communications Magazine 52.5 (2014). doi: 10.1109/MCOM.2014.6815897, pp. 86–92.
[31] M. Usman et al. “A software-defined device-to-device communication architecture for public safety applications in 5G networks”. In: IEEE Access 3 (2015). doi: 10.1109/ACCESS.2015.2479855, pp. 1649–1654.
[32] R. Vijayanand, D. Devaraj, and B. Kannapiran. “Intrusion detection system for wireless mesh network using multiple support vector machine classifiers with genetic-algorithm-based feature selection”. In: Computers & Security 77 (2018). doi: 10.1016/j.cose.2018.04.010, pp. 304–314.
[33] D. Wang and G. Xu. “Research on the detection of network intrusion prevention with SVM based optimization algorithm”. In: Informatica 44.2 (2020). doi: 10.31449/inf.v44i2.3195.
[34] L. Wei et al. “Energy efficiency and spectrum efficiency of multihop device-to-device communications underlaying cellular networks”. In: IEEE Transactions on Vehicular Technology 65.1 (2016). doi: 10.1109/TVT.2015.2389823, pp. 367–380.
[35] V. Yazıcı, U. C. Kozat, and M. O. Sunay. “A new control plane for 5G network architecture with a case study on unified handoff, mobility, and routing management”. In: IEEE communications magazine 52.11 (2014). doi: 10.1109/MCOM.2014.6957146, pp. 76–85.