https://doi.org/10.31449/inf.v47i6.4691
Informatica 47 (2023) 173–190

Performance Evaluation of Machine Learning Models for Cyber Threat Detection and Prevention in Mobile Money Services

Bodunde Odunola Akinyemi 1*, Dauda Akinwuyi Olalere 2, Mistura Laide Sanni 1, Emmanuel Ajayi Olajubu 1, Ganiyu Adesola Aderounmu 1, Isa Ali Ibrahim 3
1 Obafemi Awolowo University, Ile-Ife, Nigeria
2 MTN, Nigeria
3 Research and Development Department, Federal Ministry of Communications and Digital Economy
E-mail: bakinyemi@oauife.edu.ng, DaudaO@mtnnigeria.net, msanni@oauife.edu.ng, emmolajubu@oauife.edu.ng, gaderoun@oauife.edu.ng, isaaliibrahim@hotmail.com

Keywords: machine learning, SMOTE, mobile money, cyber threats, evaluation, predictive models

Received: February 20, 2023

This paper evaluates the effectiveness of different classifiers for predicting the probability that an applicant poses a cyber-threat or has fraudulent intent during the Mobile Money Service on-boarding or service activation process, with the goal of determining the best machine learning model for the predictive solution. Experimental work was carried out by formulating cyber threat predictive models using six supervised machine learning algorithms: Logistic Regression (LR), Naïve Bayes, Shallow Neural Network (SNN), Deep Neural Network (DNN), Classification and Regression Trees (CART) and Random Forest (RF), in different configurations. Each model was simulated both with the Synthetic Minority Oversampling Technique (SMOTE) and without it (No-SMOTE) on 25,000 records of mobile money applicants. Twenty-four (24) different configurations of the formulated predictive models were simulated and evaluated using the Python programming language. Simulation results showed that the Random Forest multiclass configuration with the SMOTE dataset outperformed all other configurations.
The results also showed that the multiclass experiments with SMOTE had better performance than the binary configurations with NO-SMOTE in the predictive models. The study concluded that using the Random Forest-based predictive machine learning model will increase the security level of the Mobile Money solution by detecting and preventing anomalous customer registrations during the unbanked onboarding process. Povzetek: Napovedovanje kibernetskih groženj mobilnega denarja z uporabo algoritmov strojnega učenja. 1 Introduction Modern economies are today inspired mostly by digital currency, and the widespread usage of mobile devices has opened up a new market for digital financial services in emerging countries [1]. These developments have made it easier for the underprivileged people in these nations to access financial services [2–5]. In Africa, there is currently a great deal of demand for promoting financial inclusion, premised on the willingness of the nations to adopt financial inclusion action plans in order to eradicate poverty and boost their economies [6–8]. Unbanked financial services such as Mobile Money Services (MMS) typically function using smartphone applications that are backed by mobile operators or banking institutions. Despite the mobile money sector's expansion and its enormous prospects, research indicates that the adoption of Mobile Financial Services (MFS) is still low in sub-Saharan Africa [8]. The widespread use of mobile devices has significantly increased the number of people who have access to the Internet. As mobile money adoption continues to gain ground, fraudsters are now focusing on this new money transfer route [9–10]. As a result, this advancement has inadvertently ushered in a brand-new age of crime: cybercrime. Financial fraud has evolved and become more complex in recent years as a result of the widespread use of advanced technology. 
Consumers now accept mobile money as one of the latest means of getting access to financial services that offer quality, affordability, and ease of use. Meanwhile, criminals have discovered new ways to move their illicit funds or fund criminal activities covertly. Therefore, it is commonly acknowledged that the frequency of economically motivated crime in many societies poses a serious danger to the growth and stability of the global economy. Fraud is a global financial concern that endangers the viability of MMS. The likelihood of cybercrime in MMS is rising and becoming more pervasive [11]. If financial crime aimed at various stakeholders, mobile money agents, and Mobile Network Operator (MNO) systems is not properly addressed, it may deter people from using MMS, potentially undoing years of progress towards financial inclusion [12]. It was observed that, due to the lack of inclusive development, inadequate security standards by service providers, and the resulting restrictions and behaviour of mobile end users, Africa as a continent lags behind all the other continents in financial inclusion [13]. With the increasing usage of MMS in these countries, it is critical to develop a comprehensive scheme for mobile money security that would alleviate security vulnerabilities and mitigate fraud, as several mobile money service providers have suffered huge losses in revenues due to this emerging threat. It is impossible to overstate the importance of humans in successful cyberattacks against MFS transactions. They could be the attack's instigator, medium, or real perpetrator [13]. The administration of human stakeholders is therefore essential to the mobile money security system, from platform to platform, across internal and external users, and especially in the customer management strategies or practices that operators employ to set up, update, and activate users.
As a result, the risks posed by the human factor in the intensification of cybercrime on mobile money initiatives must be predicted and avoided through robust and intelligent countermeasures [14]. The existing methodology employs rule-based algorithms and manual eyeballing for the identification and blocking of fraudulent customer registrations [13]. This methodology is frequently time-consuming for the agents, uneconomical, and ineffective for detecting cyber-threats. Fraudsters are encouraged by the MMT services' quick proliferation, and MNOs that offer these services are required to identify ML activity. It is crucial that the tools for detection be effective at detecting threats and simple to use [15]. There have previously been a variety of methods used to address financial transaction fraud. These techniques included rule-based and related statistical techniques. The rule-based technique has a high proportion of false- positive outcomes and is time-consuming and expensive. However, these techniques are gradually losing their effectiveness as criminal behaviour patterns and operating procedures get more sophisticated [16]. The emphasis has shifted away from conventional, rule-based approaches to more advanced computational methods. Applications of Artificial Intelligence (AI), data mining, and Machine Learning (ML) models have been discovered to reduce fraud in high-risk mobile payments and decrease false declines. Researchers have demonstrated the effectiveness of these methods in predicting the cyber threat to MMS and financial crimes [17]. ML addresses the problems associated with conventional approaches by allowing computers to adapt to data and generate predictions. When incorporated into MMS, ML is utilised to deliver automatic detection of potentially fraudulent activities. A collection of transactions that have been presumed to be fraudulent would be used to train an ML algorithm. 
The algorithm could be adjusted to identify impending fraudulent transactions based on learned experience by identifying patterns that match those in the training data. ML algorithms proactively detect suspicious transactions in real-time, swiftly identify and block transactions that may be fraudulent, minimise the number of fraudulent transactions, and consequently eliminate the need for significant human engagement. Various pre-processing procedures or data transformation methods have been employed to enhance the data quality and, subsequently, the classification accuracy of the Financial Inclusion dataset [18]. ML algorithms are increasingly being used to predict fraudulent transactions. These algorithms, whether supervised or unsupervised, including logistic regression (LR), K-nearest neighbor (KNN), Support Vector Machines (SVM) and Naive Bayes, are trained with datasets and utilised to categorise and classify mobile financial transactions into valid and suspicious ones. Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs), among other deep learning techniques, have additionally been utilised to find anomalies in financial transactions. A significant degree of prediction accuracy has been exhibited by these deep learning and ML systems [19]. All indications point to the conclusion that ML models are useful for automating and modelling cyber-risk assessment in MMS. However, there is a need to investigate the performance of various machine learning classification models that can anticipate malicious customers having cyber-threat risks during the on- boarding procedures for MMS in the developing world. The main objective of this investigation is to evaluate the effectiveness of several classifiers that can predict an applicant's likelihood of being a cyber-threat or having fraudulent intentions during the MMS onboarding or service activation process in order to discover the most accurate predictive ML model. 
2 Related works
A significant portion of past research in the field of mobile money focused on the best way to use MMS effectively while reducing fraud and financial concerns. These studies examine the variables that influence the successful application of mobile money. Most research on fraud prediction and detection using AI, data mining, and other statistical techniques in the domain of finance has focused on credit card fraud detection. A state-of-the-art survey on the security issues of MMS identified some conventional methods that have been employed to improve the security of MMS [20–22]. These techniques include biometric methods [23], quantitative analysis of subject matter [24], two-factor authentication [25], structural equation modelling [26], case-based reasoning [27], and a variety of others. It was, however, noted that these conventional methods for combating fraud in MMS are ineffective due to the problems of cybercrime [28]. There are many difficulties with the procedures, rules, and measures for MMS to offer tools for curbing cybercrime threats because no practical solution has been offered, particularly in the context of developing countries, as demonstrated by the survey conducted in [29]. Investigation and research into various models and methodologies have become necessary because of the need to build a plan for managing the significant risk of mobile money fraud. Any dataset on financial transactions, like MMS, has a relatively small fraction of transactions that are fraudulent (positive class) as opposed to valid (negative class). Because of this, the datasets are highly imbalanced [30], and ML algorithms that use this data to make predictions are biased in favour of valid transactions, which makes predictions based on this data potentially unreliable.
Credit card transactions have a very imbalanced class distribution because fraud typically accounts for less than 1% of total transactions. Different computational sampling approaches have been used to address the issue of imbalanced data, such as K-means clustering and genetic algorithms [31], a hybrid model based on genetic algorithms [32], and kernel principal component analysis [33], which were used as feature selection methods with some chosen ML algorithms to detect fraud. Despite being a straightforward solution to the issue of data skewness, random undersampling or oversampling still introduces uninformative or unhelpful sub-structures in datasets. The suitability of several machine learning models for fraud detection and classification techniques has been examined [34–35]. Methods of supervised learning are widely applied in the investigation of fraud. These models were used to predict the likelihood of credit card fraud based on a certain number of transactions. Some of the experimental works are SVM and Back Propagation Networks [36]; Weighted Support Vector Machine [37]; Naive Bayes, LR, SVM, and KNN [38]; KNN, Random Forest (RF), LR, Decision Tree, and Naive Bayes classifiers [39]; and comparison of various machine learning models for binary categorization of imbalanced credit card fraud data [34–35]. These applications of these approaches were assessed based on their accuracy, precision, specificity, and sensitivity. The results provide optimal accuracy for the classifiers supported by LR, SVM, Naive Bayes, and KNN, as shown in the summary in Table 1. Results from databases of credit card transactions demonstrate the effectiveness and efficiency of these ML algorithms in the fight against financial transaction fraud. 
However, the majority of supervised learning techniques for fraud detection have typically been established with the presumption that the mobile money ecosystem is relatively harmless, i.e., that there are no adversaries attempting to defeat MMS. In practice, MMS is now beset by attacks. Given this situation, potential fraudster behaviours were taken into account in MMS using ML techniques [19, 28, 40–41]. Graph-theoretical methods have also been used to identify fraud schemes that result in long-term changes in the typical behaviour of MMS customers [42]. Additionally, ML models were employed to anticipate the adoption of mobile money [43–44]. Recently, a prediction model using an LR classifier was developed and assessed in order to identify and mitigate suspicious clients with the ability to commit cybercrime during the onboarding processes for MMS in emerging regions. Employing binary and multiclass setups, with or without the Synthetic Minority Oversampling Technique (SMOTE or No-SMOTE), the model's performance in identifying and categorising fraudulent MMS application intentions was examined [13]. Among the different configurations of the experiments using LR, the results showed that the LR classifier with the SMOTE application achieved the highest classification accuracy. Also, in order to categorise and forecast fraud in mobile money transactions, investigations were carried out on how well the LR classifier performed by experimenting with various undersampling, weighting, and oversampling strategies [19]. The findings demonstrated that manually adjusting the class weights for false positives and false negatives gave the most effective model in these tests. In most of the investigations conducted, the LR classifier and the random forest model have been recognised as having the best performance across all measures, while other classifiers were also useful in predicting suspicious transactions.
Among all the classifiers, these two models were the most reliable and effective because they could be tuned to reach high precision and could successfully learn from data with multiple features. Although LR and random forest classifiers are effective ML techniques for detecting fraud, more research is still required to examine how well other ML classification models perform in predicting suspicious customers with the potential for cyber threats during the on-boarding process for MMS in developing countries, and so to produce a more conclusive result.

3 Methodology
The goal of this investigation is to develop and evaluate a reliable model for predicting fraudulent mobile money transactions. The work employed supervised learning algorithms to construct an effective prediction system for MMS, using known normal and fraud cases to train the models and uncover their properties. In this study, machine learning models for cyber threat detection and prevention were developed. Analytical models were employed to ascertain the validity of incoming registration or activation record details from Mobile Money applicants. Six (6) supervised machine learning algorithms were used to model the prediction of cyber-threats during MMS activation via customer on-boarding or SIM registration processes, namely LR, SNN, DNN, Naive Bayes, Decision Trees (CART, Classification and Regression Trees), and RF. To address class imbalance, the positive (minority) class was oversampled with synthetic data using the Synthetic Minority Oversampling Technique (SMOTE). The algorithms were developed in a range of configurations determined by two choices: first, whether or not the dataset was balanced using SMOTE, and second, whether the algorithm was run in a binary or a multiclass configuration, leveraging the experimental work done in [13].
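The oversampling step mentioned above can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic record is placed at a random position on the line segment between a minority-class record and one of its nearest minority-class neighbours. This is not the authors' implementation; the function name `smote`, the toy two-feature records, and the values of `k` and `n_new` are illustrative assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # gap in [0, 1) sets the position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class records with two numeric features each.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points), "synthetic records generated")
```

Because every synthetic point is a convex combination of two real minority records, it always lies within the range already spanned by the minority class, which is what distinguishes this approach from simply duplicating records.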
Combining these two choices gave each of the six (6) supervised learning algorithms four variants, for a total of twenty-four (24) model configurations, as shown in Table 2. The historical SIM registration dataset for new applications and current customers served as the training dataset for the models. Training was repeated over numerous iterations.

Table 1: Literature review summary table

Research work | Problems addressed and techniques used | Dataset distribution | Feature classification | Results
[13] | Using logistic regression to create a prediction model to identify suspicious customers with potential cyber-threats | SMOTE | Binary and multiclass | LR gives good results
[33] | Analysed the effectiveness of Naive Bayes, KNN, and LR on highly imbalanced credit card fraud data | Oversampling and under-sampling | Binary | KNN performs better
[34] | Compared LR, RF, Naive Bayes and Multilayer Perceptron models for detection of fraud data | SMOTE | Binary | RF algorithm gives the best results
[35] | Investigated SVM-S and Back Propagation Networks (BPN) for building models representing normal and abnormal customer behaviour | Random under-sampling | Binary | SVM-S has better prediction performance than BPN
[36] | Judged the veracity of the LR, SVM, and RF algorithms in credit card fraud detection | Random under-sampling | Binary | A weighted SVM model performs best
[37] | Examined highly skewed data on credit card fraud using SVM, Naive Bayes, LR, and KNN | Random under-sampling | Binary | LR was the most accurate
[38] | Explored the use of KNN, Naive Bayes, Decision Trees, LR, and RF models to forecast the likelihood that a fraudulent credit card transaction would occur | Imbalanced dataset | Binary classification | Decision Tree model is the best approach

Table 2: Rules for classifying records of mobile money applicants

No. | Algorithm | Description
A | Logistic Regression |
1 | LR Binary-No SMOTE | Logistic Regression with Binary feature configuration and No-SMOTE application to Dataset
2 | LR Binary-SMOTE | Logistic Regression with Binary feature configuration and with SMOTE application to Dataset
3 | LR Multiclass-No SMOTE | Logistic Regression with Multiclass feature configuration and No-SMOTE application to Dataset
4 | LR Multiclass-SMOTE | Logistic Regression with Multiclass feature configuration and with SMOTE application to Dataset
B | Shallow Neural Network |
5 | SNN Binary-No SMOTE | Shallow Neural Network with Binary feature configuration and No-SMOTE application to Dataset
6 | SNN Binary-SMOTE | Shallow Neural Network with Binary feature configuration and with SMOTE application to Dataset
7 | SNN Multiclass-No SMOTE | Shallow Neural Network with Multiclass feature configuration and No-SMOTE application to Dataset
8 | SNN Multiclass-SMOTE | Shallow Neural Network with Multiclass feature configuration and with SMOTE application to Dataset
C | Deep Neural Network |
9 | DNN Binary-No SMOTE | Deep Neural Network with Binary feature configuration and No-SMOTE application to Dataset
10 | DNN Binary-SMOTE | Deep Neural Network with Binary feature configuration and with SMOTE application to Dataset
11 | DNN Multiclass-No SMOTE | Deep Neural Network with Multiclass feature configuration and No-SMOTE application to Dataset
12 | DNN Multiclass-SMOTE | Deep Neural Network with Multiclass feature configuration and with SMOTE application to Dataset
D | Naïve Bayes (NB) |
13 | NB Binary-No SMOTE | Naïve Bayes (NB) with Binary feature configuration and No-SMOTE application to Dataset
14 | NB Binary-SMOTE | Naïve Bayes (NB) with Binary feature configuration and with SMOTE application to Dataset
15 | NB Multiclass-No SMOTE | Naïve Bayes (NB) with Multiclass feature configuration and No-SMOTE application to Dataset
16 | NB Multiclass-SMOTE | Naïve Bayes (NB) with Multiclass feature configuration and with SMOTE application to Dataset
E | Decision Tree (CART) |
17 | CART Binary-No SMOTE | Decision Tree (CART) with Binary feature configuration and No-SMOTE application to Dataset
18 | CART Binary-SMOTE | Decision Tree (CART) with Binary feature configuration and with SMOTE application to Dataset
19 | CART Multiclass-No SMOTE | Decision Tree (CART) with Multiclass feature configuration and No-SMOTE application to Dataset
20 | CART Multiclass-SMOTE | Decision Tree (CART) with Multiclass feature configuration and with SMOTE application to Dataset
F | Random Forest (RF) |
21 | RF Binary-No SMOTE | Random Forest (RF) with Binary feature configuration and No-SMOTE application to Dataset

4 Results and discussions
Python 3.7, which supports ML methods as well as data conversion and transformation, was used for the data analysis. The predictive models for detecting and preventing mobile money cyber-attacks were simulated on the applicants' biodata registration details, predicting cyber-threat intent at registration time so that applicants with malicious intentions can be identified during on-boarding and the best-performing machine learning model can be chosen. Performance evaluation parameters and metrics were also defined.

4.1 Result analysis by experiment grouping
The experiments were performed according to [13]. They were grouped into two for each algorithm, with two experiments per group, making a total of four experiments per algorithm. The experiments were run with and without rebalancing of the imbalanced dataset using SMOTE, with the aim of finding the best-performing algorithm for classifying applicants' details in the Mobile Money on-boarding process into multiple classes: compliant (class 0), low risk (class 1), and high risk (class 2).
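The experimental design described here (six algorithms, each run in binary and multiclass mode, with and without SMOTE) can be enumerated mechanically. The short sketch below reproduces the twenty-four configuration labels in the order used in Table 2; the variable names are illustrative and not taken from the paper.

```python
from itertools import product

algorithms = ["LR", "SNN", "DNN", "NB", "CART", "RF"]
class_modes = ["Binary", "Multiclass"]   # Group I and Group II
balancing = ["No SMOTE", "SMOTE"]        # Experiment I and Experiment II

# product() varies the last factor fastest, which matches the
# numbering 1..24 used in Table 2.
configurations = [
    f"{algo} {mode}-{bal}"
    for algo, mode, bal in product(algorithms, class_modes, balancing)
]

for number, name in enumerate(configurations, start=1):
    print(number, name)
```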
For Group I experiments, the classifiers were tested with their default binary classification capability to assess their ability to categorise the applicants' records into compliant (0) and the two illegitimate registration categories: low risk (1) and high risk (2). Before running the algorithms, string variables were pre-processed with bag of words, and categorical variables with label and one-hot encoding. Because the dataset was unbalanced in the distribution of target classes, each algorithm was run on it in two ways: as-is (No-SMOTE) and after rebalancing with SMOTE. The non-application of SMOTE constitutes Experiment I, while the application of SMOTE before running the algorithm constitutes Experiment II of Group I. The dataset was divided into a 70% training set and a 30% testing set. The results obtained for the Group I simulation experiments are shown in Table 3. Overall, when the two scenarios were compared across the twelve experiments conducted in the group for the different variants of the six algorithms, the balance of the dataset had a significant impact on the findings. Also, runs on the dataset with SMOTE performed better than runs where SMOTE was not applied before running the algorithm, except in the case of LR, where the binary No-SMOTE configuration (accuracy = 0.72, MCC = 0.16) performed better than the one with SMOTE (accuracy = 0.42, MCC = 0.15), as in Table 3. However, a closer look at the confusion matrix showed that LR had only completed a two-class classification and had not separated the records into the multiple classes of compliant (0), low risk (1), and high risk (2). The Group II experiments, shown in Table 4, test the multiclass performance of the algorithms on the dataset with No-SMOTE and after SMOTE was applied to the unbalanced dataset for rebalancing.
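The pre-processing and splitting steps described above can be sketched in plain Python as follows. This is an illustration only: the helper functions, the toy applicant records, the field names and the random seed are all hypothetical, and the sketch does not reproduce the authors' pipeline or their dataset.

```python
import random


def bag_of_words(texts):
    """Return the vocabulary and one word-count vector per text."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return vocab, [[t.lower().split().count(w) for w in vocab] for t in texts]


def one_hot(values):
    """Return the categories and one indicator vector per value."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows and hold out test_fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical applicant records: (name, id_type, risk class 0/1/2).
records = [
    ("john doe", "passport", 0), ("mary ann doe", "national_id", 0),
    ("test test", "none", 2), ("ade bello", "passport", 0),
    ("aaa bbb", "none", 2), ("grace obi", "voter_card", 0),
    ("john john", "national_id", 1), ("musa ali", "passport", 0),
    ("chi obi", "voter_card", 0), ("xxx", "none", 1),
]
names, id_types, labels = zip(*records)
vocab, name_vectors = bag_of_words(names)
categories, id_vectors = one_hot(id_types)

# Each row is: word counts + one-hot id type + class label.
rows = [nv + iv + [y] for nv, iv, y in zip(name_vectors, id_vectors, labels)]
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "testing rows")
```

In a setup like this, any rebalancing such as SMOTE would normally be applied to the training portion only, so that the test set keeps the original class distribution; the paper does not state which order was used.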
Again, the multiclass algorithms that ran on the balanced dataset after the SMOTE application performed better than those with No-SMOTE. In the Group II experiments, Random Forest had the best performance indicators: an MCC of 0.88, an accuracy of 0.91, and a misclassification rate of 0.09 (Table 4). Thus, the overall performance of the algorithms was better with the multiclass feature enabled and when SMOTE was applied to the dataset before algorithm training and testing. The ability of the classifiers to classify the dataset showed the reliability of the pre-processing steps, namely the application of bag of words to the string features in the dataset and the application of SMOTE. Given the underperformance of the binary configurations, the multiclass feature of the algorithms also contributes to better performance.

4.2 Result analysis by individual predictive machine learning algorithm models
Each classifier was trained in different configurations, binary or multiclass, with SMOTE and with No-SMOTE, for MM applicant fraudulent intent detection and classification. For each algorithm, the classifier was trained to build the analytical model, and the results for each are discussed in turn. Each classifier was run five times, and the average of the accuracy metrics was taken.

A. Logistic regression (LR) experiments
Evaluating the classification capability of LR in terms of its accuracy and the Matthews Correlation Coefficient (MCC) with unbalanced (No-SMOTE) and balanced (SMOTE) datasets shows marked differences. The results are described as follows:
(i.)
Accuracy and MCC: With the unbalanced dataset, a deceptively high prediction accuracy of 0.72 was observed with the default binary classification feature of the algorithm when classifying the applicants' dataset into compliant (class 0) and the two categories of cyber-threat risk, low risk (class 1) and high risk (class 2), in Experiment I of Group I. With the balanced dataset, the accuracy dropped to 0.42 in Experiment II of Group I, thus showing the true classification performance of the algorithm. This showed that the default behaviour of LR is binary classification and that it is not a good multi-class classifier in that configuration: the confusion matrix revealed that the run on the unbalanced dataset with 0.72 accuracy classified the records into only two classes, class 0 and class 2. However, when the multi-class configuration was used with the LR classifier, the performance was better with both SMOTE and No-SMOTE, as the datasets were classified into the three classes: class 0, class 1, and class 2. The classification accuracy was high for both No-SMOTE (0.69) and SMOTE (0.72).

Table 2 (continued)

No. | Algorithm | Description
22 | RF Binary-SMOTE | Random Forest (RF) with Binary feature configuration and with SMOTE application to Dataset
23 | RF Multiclass-No SMOTE | Random Forest (RF) with Multiclass feature configuration and No-SMOTE application to Dataset
24 | RF Multiclass-SMOTE | Random Forest (RF) with Multiclass feature configuration and with SMOTE application to Dataset
Table 3: Machine learning algorithm binary features for cyber threat prediction experiments

Experiment | Configuration | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | AUC | TNR (Specificity) | FPR | TPR (Sensitivity) | Runtime (min)
Group I Experiment II | LR Binary-SMOTE | 0.15 | 0.42 | 0.47 | 0.59 | 0.57 | 0.64 | 0.71 | 0.29 | 0.42 | 0.0256
Group I Experiment II | SNN Binary-SMOTE | 0.53 | 0.67 | 0.69 | 0.77 | 0.33 | 0.87 | 0.83 | 0.17 | 0.67 | 0.0117
Group I Experiment II | DNN Binary-SMOTE | 0.19 | 0.43 | 0.49 | 0.73 | 0.57 | 0.64 | 0.72 | 0.28 | 0.43 | 0.03
Group I Experiment II | NB Binary-SMOTE | 0.31 | 0.54 | 0.53 | 0.55 | 0.46 | 0.72 | 0.77 | 0.23 | 0.54 | 0.01
Group I Experiment II | CART Binary-SMOTE | 0.53 | 0.68 | 0.69 | 0.71 | 0.32 | 0.77 | 0.84 | 0.16 | 0.68 | 0.02
Group I Experiment II | RF Binary-SMOTE | 0.86 | 0.90 | 0.90 | 0.92 | 0.10 | 0.98 | 0.95 | 0.05 | 0.90 | 0.18
Group I Experiment I | LR Binary-No SMOTE | 0.16 | 0.72 | 0.79 | 0.92 | 0.29 | 0.62 | 0.76 | 0.24 | 0.48 | 0.0115
Group I Experiment I | SNN Binary-No SMOTE | 0.25 | 0.62 | 0.63 | 0.7 | 0.39 | 0.69 | 0.76 | 0.24 | 0.49 | 0.2444
Group I Experiment I | DNN Binary-No SMOTE | 0.2 | 0.71 | 0.81 | 0.96 | 0.29 | 0.56 | 0.69 | 0.31 | 0.39 | 0.1400
Group I Experiment I | NB Binary-No SMOTE | 0.18 | 0.69 | 0.75 | 0.83 | 0.31 | 0.64 | 0.71 | 0.29 | 0.41 | 0.0100
Group I Experiment I | CART Binary-No SMOTE | 0.34 | 0.74 | 0.78 | 0.85 | 0.26 | 0.64 | 0.75 | 0.25 | 0.51 | 0.3900
Group I Experiment I | RF Binary-No SMOTE | 0.50 | 0.79 | 0.86 | 0.96 | 0.21 | 0.78 | 0.77 | 0.23 | 0.56 | 0.5300

Table 4: Machine learning algorithm multiclass features for cyber threat prediction experiments

Experiment | Configuration | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | AUC | TNR (Specificity) | FPR | TPR (Sensitivity) | Runtime (min)
Group II Experiment II | LR Multiclass-SMOTE | 0.58 | 0.72 | 0.72 | 0.72 | 0.28 | 0.84 | 0.86 | 0.14 | 0.72 | 0.0857
Group II Experiment II | SNN Multiclass-SMOTE | 0.59 | 0.72 | 0.72 | 0.73 | 0.28 | 0.87 | 0.86 | 0.14 | 0.72 | 1.8200
Group II Experiment II | DNN Multiclass-SMOTE | 0.43 | 0.61 | 0.65 | 0.76 | 0.39 | 0.87 | 0.80 | 0.20 | 0.61 | 1.3100
Group II Experiment II | NB Multiclass-SMOTE | 0.34 | 0.56 | 0.56 | 0.58 | 0.44 | 0.74 | 0.78 | 0.22 | 0.56 | 0.0200
Group II Experiment II | CART Multiclass-SMOTE | 0.82 | 0.88 | 0.88 | 0.88 | 0.12 | 0.89 | 0.94 | 0.06 | 0.88 | 0.3300
Group II Experiment II | RF Multiclass-SMOTE | 0.88 | 0.91 | 0.91 | 0.93 | 0.09 | 0.99 | 0.95 | 0.05 | 0.90 | 0.400
Group II Experiment I | LR Multiclass-No SMOTE | 0.27 | 0.69 | 0.71 | 0.74 | 0.31 | 0.71 | 0.75 | 0.25 | 0.48 | 0.0156
Group II Experiment I | SNN Multiclass-No SMOTE | 0.3 | 0.72 | 0.76 | 0.84 | 0.28 | 0.69 | 0.74 | 0.26 | 0.47 | 1.0100
Group II Experiment I | DNN Multiclass-No SMOTE | 0.3 | 0.73 | 0.81 | 0.92 | 0.27 | 0.69 | 0.72 | 0.28 | 0.45 | 0.5400
Group II Experiment I | NB Multiclass-No SMOTE | 0.24 | 0.72 | 0.78 | 0.88 | 0.28 | 0.66 | 0.72 | 0.28 | 0.42 | 0.0100
Group II Experiment I | CART Multiclass-No SMOTE | 0.392 | 0.733 | 0.763 | 0.807 | 0.267 | 0.63 | 0.79 | 0.21 | 0.59 | 1.7400
Group II Experiment I | RF Multiclass-No SMOTE | 0.51 | 0.79 | 0.86 | 0.96 | 0.21 | 0.78 | 0.77 | 0.23 | 0.56 | 1.7000

When SMOTE was included, the accuracy (0.72) was the same as that obtained without SMOTE using the binary logistic feature, but with SMOTE the classifier did classify the records into three distinct classes, which made the performance better in the context of the multi-classification of cyber threat risks. For a clear performance evaluation, MCC was also used to substantiate the evaluation. The MCC gave a much clearer distinction between the configurations; hence, among the four LR experiments performed, the best classifier configuration was Multiclass with SMOTE, which had the highest MCC of 0.58 compared with MCCs of 0.27, 0.15, and 0.16 for the other experiments, as shown in Table 5 and Figure 1.
(ii.) Precision, recall and F1-score: If the dataset were fairly balanced, accuracy as an evaluation metric would suffice for a sound conclusion; however, precision, recall, and F1 score are better suited to evaluating an imbalanced dataset. The fraud detection dataset was unbalanced; hence, to evaluate the effectiveness of such a model, examining the precision and recall is very important. As presented in Table 5, the specificity for the LR binary with No-SMOTE classification experiment was 0.76 and the recall (sensitivity) was 0.48, compared with a specificity of 0.86 and a sensitivity of 0.72 for the multiclass LR model with SMOTE.
(iii.) ROC and predicted probabilities: The Receiver Operating Characteristics (ROC) Area Under Curve (AUC) of 0.84 for multiclass LR with SMOTE was also higher than for all other LR experiments.
This further supports the finding that the LR classifier with the SMOTE application provides the best classification performance among the LR configurations tested. The ROC AUC values are presented in Table 5 and Figure 2. Thus, SMOTE improves the performance of the LR classifiers, although the multiclass LR configuration gave the best performance.

B. Shallow neural network (SNN) experiments

Evaluating the classification capability of the SNN in terms of its accuracy and Matthews Correlation Coefficient (MCC) with the unbalanced (No-SMOTE) and balanced (SMOTE) datasets shows marked differences, as presented in Table 6 and Figures 3 and 4. The results are described as follows:

(i.) Accuracy and MCC: Both multiclass experiments had the highest accuracy among the four SNN experiments, equal at 0.72; however, the MCC identified the best configuration, with a value of 0.59 for the multiclass configuration with SMOTE against 0.3 for the multiclass configuration with No-SMOTE. The MCC thus revealed the configuration with the best performance among the SNN experiments for classification into classes 0, 1, and 2.

(ii.) Precision, recall and F1-score: The multiclass configuration with SMOTE had the highest specificity (0.86) and sensitivity (0.72) among the SNN experiments, with an F1-score of 0.72, which implies that it performed best among the SNN configurations.

(iii.) ROC and predicted probabilities: The Receiver Operating Characteristics (ROC) Area Under Curve (AUC) of 0.87 for the multiclass SNN with SMOTE was the highest, and the binary configuration with SMOTE reached the same value. Both exceeded the AUC of 0.69 obtained in the No-SMOTE experiments.

Figure 1: Different LR configuration results by performance parameters.
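The gap between accuracy and MCC discussed above can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical (they are not taken from the study's dataset), and the helper name `binary_metrics` is an assumption introduced for illustration; the formulas are the standard definitions of the metrics reported in the tables.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall, TPR
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # TNR
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    # Matthews Correlation Coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1.0 - accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical imbalanced sample: 20 threat cases and 80 legitimate cases.
metrics = binary_metrics(tp=5, fn=15, fp=10, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts the accuracy is 0.75 while the MCC is only about 0.14, because most of the minority (threat) cases are missed even though the majority class is classified well.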
Figure 2: LR performance results for different configurations.

Table 5: Performance metrics for LR with (SMOTE)/without SMOTE (No-SMOTE)

| Logistic Regression | MCC (-1 to +1) | Accuracy | F1-Score | Precision | Misclassification Rate | ROC AUC | Specificity (TNR) | FPR | Sensitivity (TPR) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR Binary-No SMOTE | 0.16 | 0.72 | 0.79 | 0.92 | 0.29 | 0.62 | 0.76 | 0.24 | 0.48 | 0.52 | 0.0115 |
| LR Binary-SMOTE | 0.15 | 0.42 | 0.47 | 0.59 | 0.57 | 0.64 | 0.71 | 0.29 | 0.42 | 0.58 | 0.0256 |
| LR Multiclass-No SMOTE | 0.27 | 0.69 | 0.71 | 0.74 | 0.31 | 0.71 | 0.75 | 0.25 | 0.48 | 0.52 | 0.0156 |
| LR Multiclass-SMOTE | 0.58 | 0.72 | 0.72 | 0.72 | 0.28 | 0.84 | 0.86 | 0.14 | 0.72 | 0.28 | 0.0857 |

Table 6: Performance metrics for SNN with (SMOTE)/without SMOTE (No-SMOTE)

| Shallow Neural Network (SNN) | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | AUC | Specificity (TNR) | FPR | Sensitivity (TPR) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SNN Binary-No SMOTE | 0.25 | 0.62 | 0.63 | 0.7 | 0.39 | 0.69 | 0.76 | 0.24 | 0.49 | 0.51 | 0.2444 |
| SNN Binary-SMOTE | 0.53 | 0.67 | 0.69 | 0.77 | 0.33 | 0.87 | 0.83 | 0.17 | 0.67 | 0.33 | 0.0117 |
| SNN Multiclass-No SMOTE | 0.3 | 0.72 | 0.76 | 0.84 | 0.28 | 0.69 | 0.74 | 0.26 | 0.47 | 0.53 | 1.0100 |
| SNN Multiclass-SMOTE | 0.59 | 0.72 | 0.72 | 0.73 | 0.28 | 0.87 | 0.86 | 0.14 | 0.72 | 0.28 | 1.8200 |

Figure 3: Different Shallow Neural Network (SNN) configuration results by performance parameters.

Figure 4: Shallow Neural Network (SNN) performance results for different configurations.

C. Deep neural network (DNN) experiments

Evaluating the classification capability of the DNN in terms of its accuracy and the Matthews Correlation Coefficient (MCC) with unbalanced (No-SMOTE) and balanced (SMOTE) datasets shows marked differences, as presented in Table 7 and Figures 5 and 6. The results are described as follows:

(i.) Accuracy and MCC: For the DNN, the classifier's performance was generally poorer than that of the SNN across the scenarios tested.
This was evident from the accuracy metrics: the binary No-SMOTE configuration reached an accuracy of 0.71, but the algorithm classified the dataset into only two of the three classes. The multiclass configuration with SMOTE performed best among the DNN experiments, as revealed by its MCC of 0.43 (with an accuracy of 0.61), the highest among the DNN experiments. However, the overall performance was weak.

(ii.) Precision, recall and F1-score: The DNN also performed worse than the SNN on these parameters. The multiclass configuration with SMOTE had a specificity of 0.80, a sensitivity of 0.61, and an F1-score of 0.65, all lower than the corresponding SNN values (0.86, 0.72, and 0.72).

(iii.) ROC and predicted probabilities: The Receiver Operating Characteristics (ROC) Area Under Curve (AUC) of 0.87 for the multiclass DNN with SMOTE was the same as that of the SNN in the multiclass experiment with SMOTE.

D. Naïve Bayes experiments

Evaluating the classification capability of Naïve Bayes in terms of its accuracy and the Matthews Correlation Coefficient (MCC) with unbalanced (No-SMOTE) and balanced (SMOTE) datasets shows marked differences, as presented in Table 8 and Figures 7 and 8. The results are described as follows:

(i.) Accuracy and MCC: The No-SMOTE Naïve Bayes experiments performed better than those with SMOTE in terms of accuracy (0.69 and 0.72 for the binary and multiclass No-SMOTE configurations, respectively, as listed in Table 8); however, the MCC (0.31 and 0.34 for the binary and multiclass configurations with SMOTE, respectively) and the confusion matrix distributions showed that the multiclass configuration with SMOTE performed better. This again reinforces the fact that accuracy may not always be the best performance metric for evaluation.
In general, Naïve Bayes performed poorly on the performance metrics; however, it was the fastest algorithm in all the experiments performed, taking less than 2 seconds to run.

(ii.) Precision, recall and F1-score: The specificity of 0.78 and the sensitivity of 0.56 reinforce the fact that the multiclass configuration with SMOTE had the best performance of all the Naïve Bayes experiments.

(iii.) ROC and predicted probabilities: The Receiver Operating Characteristics (ROC) Area Under Curve (AUC) of 0.74 for the multiclass Naïve Bayes with SMOTE, the highest among the Naïve Bayes experiments, also indicates that this configuration performed best.

E. Decision tree (classification and regression trees, CART) experiments

CART performed second best overall in the simulation for MMS cyber threat detection. Evaluating the capability of the CART algorithm to classify mobile money applicants, with SMOTE and No-SMOTE applied to the dataset, showed marked differences in its performance metrics, as presented in Table 9 and Figures 9 and 10. The results are described as follows:

(i.) Accuracy and MCC: The decision tree algorithm performed very well when SMOTE was applied to the dataset, with an accuracy of 0.68 for the binary and 0.88 for the multiclass configuration (Table 9), within a very reasonable time. The multiclass configuration with SMOTE performed best among the four CART experiments, with an MCC of 0.82, far above that of any other CART experiment. The MCC thus further confirms the strength of the algorithm's performance.

(ii.) Precision, recall and F1-score: The specificity and sensitivity also showed CART to be a high-performing classifier in the research modelling scenario for the multiclass configuration with SMOTE.
The specificity was 0.94 and the sensitivity was 0.88, the highest among the decision tree (CART) experiments performed.

(iii.) ROC and predicted probabilities: The ROC AUC of 0.89 for multiclass CART also confirms a relatively high performance for decision trees.

Figure 5: Different deep neural network (DNN) configuration results by performance parameters.

Figure 6: Deep neural network (DNN) performance results for different configurations.

Table 7: Performance metrics for deep neural network (DNN) with (SMOTE)/without SMOTE (No-SMOTE)

| Deep Neural Network | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | AUC | Specificity (TNR) | FPR | TPR (Sensitivity) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DNN Binary-No SMOTE | 0.2 | 0.71 | 0.81 | 0.96 | 0.29 | 0.56 | 0.69 | 0.31 | 0.39 | 0.61 | 0.14 |
| DNN Binary-SMOTE | 0.19 | 0.43 | 0.49 | 0.73 | 0.57 | 0.64 | 0.72 | 0.28 | 0.43 | 0.57 | 0.03 |
| DNN Multiclass-No SMOTE | 0.3 | 0.73 | 0.81 | 0.92 | 0.27 | 0.69 | 0.72 | 0.28 | 0.45 | 0.55 | 0.54 |
| DNN Multiclass-SMOTE | 0.43 | 0.61 | 0.65 | 0.76 | 0.39 | 0.87 | 0.80 | 0.20 | 0.61 | 0.40 | 1.31 |

Table 8: Performance metrics for Naïve Bayes (NB) with (SMOTE)/without SMOTE (No-SMOTE)

| Naïve Bayes (NB) | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | ROC AUC | TNR (Specificity) | FPR | TPR (Sensitivity) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NB Binary-No SMOTE | 0.18 | 0.69 | 0.75 | 0.83 | 0.31 | 0.64 | 0.71 | 0.29 | 0.41 | 0.59 | 0.01 |
| NB Binary-SMOTE | 0.31 | 0.54 | 0.53 | 0.55 | 0.46 | 0.72 | 0.77 | 0.23 | 0.54 | 0.46 | 0.01 |
| NB Multiclass-No SMOTE | 0.24 | 0.72 | 0.78 | 0.88 | 0.28 | 0.66 | 0.72 | 0.28 | 0.42 | 0.58 | 0.01 |
| NB Multiclass-SMOTE | 0.34 | 0.56 | 0.56 | 0.58 | 0.44 | 0.74 | 0.78 | 0.22 | 0.56 | 0.44 | 0.02 |

Figure 7: Different Naïve Bayes algorithm configuration results by performance parameters.

Figure 8: Naïve Bayes performance results for different configurations.

Table 9: Performance metrics for decision trees (CART) with (SMOTE)/without SMOTE (No-SMOTE)

| Decision Tree (CART) | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | AUC | TNR (Specificity) | FPR | TPR (Sensitivity) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CART Binary-No SMOTE | 0.34 | 0.74 | 0.78 | 0.85 | 0.26 | 0.64 | 0.75 | 0.25 | 0.51 | 0.49 | 0.39 |
| CART Binary-SMOTE | 0.53 | 0.68 | 0.69 | 0.71 | 0.32 | 0.77 | 0.84 | 0.16 | 0.68 | 0.32 | 0.02 |
| CART Multiclass-No SMOTE | 0.39 | 0.73 | 0.763 | 0.807 | 0.267 | 0.63 | 0.79 | 0.21 | 0.59 | 0.41 | 1.74 |
| CART Multiclass-SMOTE | 0.82 | 0.88 | 0.88 | 0.88 | 0.12 | 0.89 | 0.94 | 0.06 | 0.88 | 0.12 | 0.33 |

Figure 9: Different decision trees (CART) algorithm configuration results by performance parameters.

Figure 10: Decision trees (CART) performance results for different configurations.

F. Random forest (RF) experiments

Random Forest performed the best overall for the predictive model. The performance evaluation of the ensemble classifier, Random Forest (RF), with SMOTE and No-SMOTE on the dataset was an interesting experiment, as it delivered on its promise for the research under study. The algorithm showed the best performance in both the binary and multiclass configurations for the classification problem at hand. It was the best overall experiment for the cyber threat predictive model for detecting and preventing MMS applicant cyber threats or fraudulent intent. The results are shown in Table 10 and Figures 11 and 12 and are described as follows:

(i.) Accuracy and MCC: RF performed the best overall in all the simulation experiments and is recommended for mobile money cyber threat detection. The algorithm had an accuracy of 0.91 and an MCC of 0.88 for the multiclass configuration with SMOTE, with a minimal classification error rate of 0.09. These are the best values in the overall simulation experiments conducted.

(ii.) Precision, recall and F1-score: The high performance of the RF algorithm for the predictive model was also shown by the multiclass configuration's sensitivity and specificity of 0.90 and 0.95, respectively, as well as its F1-score of 0.91.

(iii.) ROC and predicted probabilities: The ROC AUC for RF was 0.99 for the multiclass configuration with SMOTE. This also confirms that the best algorithm for the Mobile Money customer onboarding predictive model is the RF classifier.

4.3 Result analysis by algorithm performance comparison

Comparing all the predictive models and simulation experiments in order of performance metrics, as shown in Table 11 and Figures 13 and 14, the best algorithm was Random Forest. The multiclass configuration of the algorithm with SMOTE performed best overall, while the binary configuration with SMOTE came second. The multiclass RF with SMOTE had the highest overall MCC of 0.88, an accuracy of 0.91, a precision of 0.93, the lowest classification error of 0.09, a ROC AUC of 0.99, a specificity or true negative rate (TNR) of 0.95 (95%), and a sensitivity or recall (TPR) of 0.90 (90%), while the binary configuration with SMOTE had an MCC of 0.86, an accuracy of 0.90, a precision of 0.92, a classification error of 0.10, a ROC AUC of 0.98, a specificity (TNR) of 0.95 (95%), and a sensitivity or recall (TPR) of 0.90 (90%). However, the binary configuration ran faster, with a total time of 0.18 min against 0.40 min for the multiclass configuration. The narrow differences in performance metrics between the multiclass and binary configurations with SMOTE indicate that Random Forest (RF) is inherently a multiclass classifier, as it was able to classify the cyber threat risk levels into multiple classes according to the dataset labels.
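The ranking described in this section can be reproduced with a short sketch. It uses a subset of the values reported in Table 11; the tuple layout and the tie-breaking rule (shorter runtime first when MCC values are equal) are assumptions made for illustration, not the authors' stated procedure.

```python
# (configuration, MCC, accuracy, runtime in minutes), values as reported in Table 11
results = [
    ("LR Multiclass-SMOTE", 0.58, 0.72, 0.09),
    ("RF Binary-SMOTE", 0.86, 0.90, 0.18),
    ("SNN Binary-SMOTE", 0.53, 0.67, 0.01),
    ("CART Multiclass-SMOTE", 0.82, 0.88, 0.33),
    ("RF Multiclass-SMOTE", 0.88, 0.91, 0.40),
    ("CART Binary-SMOTE", 0.53, 0.68, 0.02),
    ("SNN Multiclass-SMOTE", 0.59, 0.72, 1.82),
]

# Sort by MCC (descending); break ties by runtime (ascending). Assumed rule.
ranked = sorted(results, key=lambda row: (-row[1], row[3]))

for rank, (name, mcc, accuracy, runtime) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: MCC={mcc:.2f}, accuracy={accuracy:.2f}, "
          f"runtime={runtime:.2f} min")
```

Ranking on MCC rather than accuracy keeps the ordering robust to class imbalance. Under the assumed tie-break, the two configurations with an MCC of 0.53 appear in the opposite order to Table 11, which shows that the order of tied entries depends on the secondary criterion chosen.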
5 Conclusion

Machine learning (ML) and artificial intelligence (AI) are increasingly accepted as essential for preventing mobile money fraud and supporting anti-money laundering compliance. The fight against financial crime has always involved computational technology, but the development of ML and AI has given law enforcement a potent new weapon in the fight against mobile money fraud. Financial institutions can better understand their customers' demands and risk profiles by using AI to spot and highlight problematic conduct, such as large or unexpected transactions. By leveraging the power of AI, they may greatly enhance their capacity to prevent mobile money fraud and handle money laundering issues. This study attempts to develop a fraud detection model that identifies warning signs of fraud and money laundering in mobile money transfers using ML algorithms. More specifically, a collection of risk-based indicators was employed to forecast the likelihood that a transaction would be fraudulent. SMOTE techniques were used to create artificial minority class samples in order to prevent dataset sub-structures that were either uninformative or poorly informative. This work contributes in a number of ways to the body of knowledge on how to detect suspicious activity in mobile money transfers. Theoretically, machine learning algorithms can get around the difficulties that the more traditional rule-based benchmark methodology encounters when trying to identify illicit transactions. The traditional rule-based benchmark technique uses established criteria based on mathematical conditions to identify illicit transactions.
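The SMOTE step mentioned above can be sketched as follows. This is a minimal, pure-Python illustration of the core interpolation idea (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours); it is not the implementation used in the study, and the function name `smote_sketch`, the parameter `k`, and the toy feature vectors are assumptions introduced here.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority-class feature vectors (hypothetical, two features each).
minority_samples = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_samples = smote_sketch(minority_samples, n_synthetic=6)
print(len(new_samples), "synthetic samples generated")
```

Because each synthetic point is a convex combination of two existing minority samples, it stays within the region already occupied by the minority class rather than duplicating records exactly.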
Table 10: Performance metrics for Random Forest with (SMOTE)/without SMOTE (No-SMOTE)

| Random Forest (RF) | MCC | Accuracy | F1-Score | Precision | Misclassification Rate | AUC | TNR (Specificity) | FPR | TPR (Sensitivity) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RF Binary-No SMOTE | 0.50 | 0.79 | 0.86 | 0.96 | 0.21 | 0.78 | 0.77 | 0.23 | 0.56 | 0.44 | 0.53 |
| RF Binary-SMOTE | 0.86 | 0.90 | 0.90 | 0.92 | 0.10 | 0.98 | 0.95 | 0.05 | 0.90 | 0.10 | 0.18 |
| RF Multiclass-No SMOTE | 0.51 | 0.79 | 0.86 | 0.96 | 0.21 | 0.78 | 0.77 | 0.23 | 0.56 | 0.44 | 1.7 |
| RF Multiclass-SMOTE | 0.88 | 0.91 | 0.91 | 0.93 | 0.09 | 0.99 | 0.95 | 0.05 | 0.90 | 0.10 | 0.4 |

Figure 11: Different random forest algorithm configuration results by performance parameters.

Figure 12: Random forest performance results for different configurations.

Table 11: Ranking algorithm experimental performance with (SMOTE)/without SMOTE (No-SMOTE)

| Algorithm | MCC | Accuracy | Precision | Misclassification Rate | AUC | TNR (Specificity) | FPR | TPR (Sensitivity) | FNR | Runtime (min) |
|---|---|---|---|---|---|---|---|---|---|---|
| RF Multiclass-SMOTE | 0.88 | 0.91 | 0.93 | 0.09 | 0.99 | 0.95 | 0.05 | 0.90 | 0.10 | 0.40 |
| RF Binary-SMOTE | 0.86 | 0.90 | 0.92 | 0.10 | 0.98 | 0.95 | 0.05 | 0.90 | 0.10 | 0.18 |
| CART Multiclass-SMOTE | 0.82 | 0.88 | 0.88 | 0.12 | 0.89 | 0.94 | 0.06 | 0.88 | 0.12 | 0.33 |
| SNN Multiclass-SMOTE | 0.59 | 0.72 | 0.73 | 0.28 | 0.87 | 0.86 | 0.14 | 0.72 | 0.28 | 1.82 |
| LR Multiclass-SMOTE | 0.58 | 0.72 | 0.72 | 0.28 | 0.84 | 0.86 | 0.14 | 0.72 | 0.28 | 0.09 |
| CART Binary-SMOTE | 0.53 | 0.68 | 0.71 | 0.32 | 0.77 | 0.84 | 0.16 | 0.68 | 0.32 | 0.02 |
| SNN Binary-SMOTE | 0.53 | 0.67 | 0.77 | 0.33 | 0.87 | 0.83 | 0.17 | 0.67 | 0.33 | 0.01 |
| RF Multiclass-No SMOTE | 0.51 | 0.79 | 0.96 | 0.21 | 0.78 | 0.77 | 0.23 | 0.56 | 0.44 | 1.70 |
| RF Binary-No SMOTE | 0.50 | 0.79 | 0.96 | 0.21 | 0.78 | 0.77 | 0.23 | 0.56 | 0.44 | 0.53 |
| DNN Multiclass-SMOTE | 0.43 | 0.61 | 0.76 | 0.39 | 0.87 | 0.80 | 0.20 | 0.61 | 0.39 | 1.31 |
| CART Multiclass-No SMOTE | 0.392 | 0.733 | 0.807 | 0.267 | 0.63 | 0.79 | 0.21 | 0.59 | 0.41 | 1.74 |
| CART Binary-No SMOTE | 0.34 | 0.74 | 0.85 | 0.26 | 0.64 | 0.75 | 0.25 | 0.51 | 0.49 | 0.39 |
| NB Multiclass-SMOTE | 0.34 | 0.56 | 0.58 | 0.44 | 0.74 | 0.78 | 0.22 | 0.56 | 0.44 | 0.02 |
| NB Binary-SMOTE | 0.31 | 0.54 | 0.55 | 0.46 | 0.72 | 0.77 | 0.23 | 0.54 | 0.46 | 0.01 |
| DNN Multiclass-No SMOTE | 0.3 | 0.73 | 0.92 | 0.27 | 0.69 | 0.72 | 0.28 | 0.45 | 0.55 | 0.54 |
| SNN Multiclass-No SMOTE | 0.3 | 0.72 | 0.84 | 0.28 | 0.69 | 0.74 | 0.26 | 0.47 | 0.53 | 1.01 |
| LR Multiclass-No SMOTE | 0.27 | 0.69 | 0.74 | 0.31 | 0.71 | 0.75 | 0.25 | 0.48 | 0.52 | 0.02 |
| SNN Binary-No SMOTE | 0.25 | 0.62 | 0.7 | 0.39 | 0.69 | 0.76 | 0.24 | 0.49 | 0.51 | 0.24 |
| NB Multiclass-No SMOTE | 0.24 | 0.72 | 0.88 | 0.28 | 0.66 | 0.72 | 0.28 | 0.42 | 0.58 | 0.01 |
| DNN Binary-No SMOTE | 0.2 | 0.71 | 0.96 | 0.29 | 0.56 | 0.69 | 0.31 | 0.39 | 0.61 | 0.14 |
| DNN Binary-SMOTE | 0.19 | 0.43 | 0.73 | 0.57 | 0.64 | 0.72 | 0.28 | 0.43 | 0.57 | 0.03 |
| NB Binary-No SMOTE | 0.18 | 0.69 | 0.83 | 0.31 | 0.64 | 0.71 | 0.29 | 0.41 | 0.59 | 0.01 |
| LR Binary-No SMOTE | 0.16 | 0.72 | 0.92 | 0.29 | 0.62 | 0.76 | 0.24 | 0.48 | 0.52 | 0.01 |
| LR Binary-SMOTE | 0.15 | 0.42 | 0.59 | 0.57 | 0.64 | 0.71 | 0.29 | 0.42 | 0.58 | 0.03 |

Figure 13: Different experimented algorithm configuration results by performance parameters.

Figure 14: All algorithm performance results for different configurations.

Acknowledgment

This research was funded by the TETFund Research Fund and Africa Centre of Excellence OAK-Park.

References

[1] Sam Castle, Pervaiz Fahad, Cassebeer Weld Galen, Roesner Franziska and Richard J. Anderson. Let's Talk Money: Evaluating the Security Challenges of Mobile Money in the Developing World. In Proceedings of the 7th Annual Symposium on Computing for Development (ACM DEV '16), 18–20 November 2016, Nairobi, Kenya, 1-10, 2016. https://doi.org/10.1145/3001913.3001919 [2] Alex Bara.
Mobile money for financial inclusion: policy and regulatory perspective in Zimbabwe, African Journal of Science, Technology, Innovation and Development, 5(5): 345-354, 2013. https://doi.org/10.1080/20421338.2013.829287 [3] Sandra L. Suárez. Poor people’s money: the politics of mobile money in Mexico and Kenya, Telecommunications Policy, 40 (10/11): 945-955, 2016. https://doi.org/10.1016/j.telpol.2016.03.001 Frederick Kanobe, Patricia M. Alexander, and Kelvin J. Bwalya. Policies, regulations and procedures and their effects on mobile money systems in Uganda, The Electronic Journal of Information Systems in Developing Countries, 83(1):1-15, 2017. https://doi.org/10.1002/j.1681-4835.2017.tb00615.x [4] Isaac Akomea-Frimpong, Charles Andoh, Agnes Akomea-Frimpong, Yvonne Dwomoh-Okudzeto. Control of fraud on mobile money services in Ghana: an exploratory study, Journal of Money Laundering Control, 22(2):300-317, 2019. https://doi.org/10.1108/JMLC-03-2018-0023 [5] Claire Célerier and Adrien Matray. Bank-branch supply, financial inclusion, and wealth accumulation. Rev. Finan. Stud, 32(12):4767–4809, 2019. https://doi.org/10.1093/rfs/hhz046 [6] M. Mostak Ahamed and Mallick Sushanta. Is financial inclusion good for bank stability? Journal of Economic Behavior & Organization, 157(1): 403–427, 2019. https://doi.org/10.1016/j.jebo.2017.07.027 [7] Joyce Koi Akrofi. Mobile Money Adoption in Africa: A Literature-Based Analysis, Texila International Journal of Management, 8(2): 1-12, 2022. https://doi.org/10.21522/TIJMG.2015.08.02.Art014 [8] Pierre-Laurent Chatain, Andrew Zerzan, Wameek Noor, Najah Dannaoui, and Louis de Koker. Protecting Mobile Money against Financial Crimes: Global Policy Challenges and Solutions, World Bank Publications, New York, NY, 2011. https://doi.org/10.1596/978-0-8213-8669-9 [9] Mercy W. Buku and Rafe Mazer. Fraud in Mobile Financial Services: Protecting Consumers, Providers, and the System.
World bank publications, 2017. https://documents1.worldbank.org/curated/en/24 9151504766545101/pdf/119208-BRI-PUBLIC- Brief-Fraud-in-Mobile-Financial-Services- April-2017.pdf [10] Hakeem J. Pallangyo. Cyber Security Challenges, its Emerging Trends on Latest Information and Communication Technology and Cyber Crime in Mobile Money Transaction Services, Tanzania Journal of Engineering and Technology, 41(2):189-204, 2022. http://dx.doi.org/10.52339/tjet.v41i2.792. [11] Denish Azamuke, Marriette Katarahweire and Engineer Bainomugisha. Scenario-based Synthetic Dataset Generation for Mobile Money Transactions. In Proceedings of the Federated Africa and Middle East Conference on Software Engineering, 64–72, 2022. https://doi.org/10.1145/3531056.3542774 [12] Mistura Laide Sanni., Bodunde Odunola Akinyemi, Dauda Akinwuyi Olalere, Emmanuel Adebayo Olajubu and Ganiyu Adesola Aderounmu. A Predictive Cyber Threat Model for Mobile Money Service. Annals of Emerging Technologies in Computing, 7(1): 40-60, 2023. https://doi.org/10.33166/AETiC.2023.01.004 [13] Stephen Ambore, Christopher Richardson, Huseyin Dogan, Edward Apeh and David Osselton. A resilient cybersecurity framework for Mobile Financial Services (MFS). Journal of Cyber Security Technology, 1(3-4): 202-224, 2017. https://doi.org/10.1080/23742917.2017.1386483 [14] Maria Zhdanova, Jürgen Repp, Roland Rieke, Chrystel Gaber and Baptiste Hemery. No Smurfs: Revealing Fraud Chains in Mobile Money Transfers. In proceedings of the International Conference on Availability, Reliability and Security (ARES), Switzerland, 10, 2014. https://doi.org/10.1109/ARES.2014.10 [15] Francis Effirim Botchey, Zhen Qin, and Kwesi Hughes-Lartey. Mobile Money Fraud Prediction—A Cross-Case Analysis on the Efficiency of Support Vector Machines, Gradient Boosted Decision Trees, and Naïve Bayes Algorithms. Information, 11(8): 383, 2020. https://doi.org/10.3390/info11080383 [16] Ibukun Eweoya, Ayodele Adebiyi, Ambrose Azeta, Okesola Olatunji. 
Fraud Prediction in Bank Credit Administration: A Systematic Literature Review. Journal of Theoretical and Applied Information Technology, 97 (11): 3147- 3169, 2019. [17] Boluwaji A. Akinnuwesi, Stephen G. Fashoto, Andile S. Metfula and Adetutu N. Akinnuwesi. Experimental Application of Machine Learning on Financial Inclusion Data for Governance in Eswatini. In: Hattingh, M., Matthee, M., Smuts, H., Pappas, I., Dwivedi, Y.K., Mäntymäki, M. (eds) Responsible Design, Implementation and Use of Information and Communication Technology. I3E 2020. Lecture Notes in Computer Science, 12067. Springer, Cham, 2020. https://doi.org/10.1007/978-3-030-45002- 1_36 [18] Francis Effirim Botchey, Zhen Qin, Kwesi Hughes-Lartey and Ernest Kwame Ampomah. Predicting Fraud in Mobile Money Transactions using Machine Learning: The Effects of Sampling Techniques on the Imbalanced Dataset. Informatica, 45: 45–56, 2021. https://doi.org/10.31449/inf.v45i7.3179. [19] Majda Omer Albasheer and Eihab B. M. Bashier. Enhanced Model for PKI Certificate Validation in the Mobile Banking. in Proceedings of the 2013 International Conference on Computing, Electrical and Electronics Engineering (ICCEEE), 26-28 August 2013, Khartoum, Sudan, 470–476, 2013. https://doi.org/10.1109/ICCEEE.2013.6633984 [20] Shaik Shakeel Ahamad, V. N. Sastry, and Madhusoodhnan Nair. Biometric Based Secure Mobile Payment Framework. in Proceedings 2013 4th International Conference on Computer and Communication Technology (ICCCT), 20- 22 September 2013, Allahabad, India, 239-246, 2013. https://doi.org/10.1109/ICCCT.2013.6749634 [21] C. Narendiran, S. Albert Rabara, and N. Rajendran. Public Key Infrastructure for Mobile Banking Security. in Proceedings of the 2009 Global Mobile Congress, 12-14 October 2009, Shanghai, China, 1–6, 2009. https://doi.org/10.1109/GMC.2009.5295898 [22] Mangala Belkhede, Veena Gulhane, and Preeti Bajaj. 
Biometric Mechanism for Enhanced Security of Online Transaction on Android System: A Design Approach, in Proceedings of the 2012 14th International Conference on Advanced Communication Technology (ICACT), 19-22 February 2012, PyeongChang, South Korea, 1193 – 1197, 2012. [23] Min Hee Yeon, Park Jin Hyunga and Kim In Seok. Outlier Detection Method for Mobile Banking with User Input Pattern and E-finance Transaction Pattern. Journal of Internet Computing and Services, 15(1):157–170, 2014. https://doi.org/10.7472/JKSII.2014.15.1.157 [24] Adam B. Mtaho. Improving Mobile Money Security with Two-Factor Authentication. International Journal of Computer Applications, 109(7): 9-15, 2015, https://doi.org/10.5120/19198-0826 [25] Peter Tobbin and John K.M. Kuwornu. Adoption of Mobile Money Transfer Technology: Structural Equation Modelling Approach. European Journal of Business and Management, 3(7):59–77. 2011. [26] Adeyinka Adedoyin, Stelios Kapetanakis, Georgios Samakovitis and Miltos Petridis. Predicting fraud in mobile money transfer using case-based reasoning. In proceedings of the Artificial Intelligence XXXIV: 37th SGAI International Conference on Artificial Intelligence, AI 2017, Cambridge, UK, December 12-14, 2017. https://doi.org/10.1007/978-3-319-71078-5_28. [27] Simon Delecourt and Li Guo. Building a Robust Mobile Payment Fraud Detection System with Adversarial Examples. In proceedings of the 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 03-05 June 2019. https://doi.org/10.1109/AIKE.2019.00026. [28] Md. Alamgir Hossain. Security Perception in the Adoption of Mobile Payment and the Moderating Effect of Gender, PSU Research Review, 3(3):179-190, 2019. https://doi.org/10.1108/PRR-03-2019-0006 [29] Suleiman Ali Alsaif and Adel Hidri.
Impact of data balancing during training for best predictions, Informatica, 45(2): 223–230, 2021. https://doi.org/10.31449/inf.v45i2.3479. [30] Ibtissam Benchaji, Samira Douzi, and Bouabid ElOuahidi. Using Genetic Algorithm to Improve Classification of Imbalanced Datasets for Credit Card Fraud Detection, in Proceedings of the 2018 2nd Cyber Security in Networking Conference (CSNet), Paris, France, 1-5, 2018. https://doi.org/10.1109/CSNET.2018.8602972 [31] Bashir S., and Ghous H., Detecting Mobile Money Laundering Using Genetic Algorithm as Feature Selection Method with Classification Method. LC International Journal of STEM, 1(4):121-129, 2021. https://doi.org/10.5281/zenodo.5149794 [32] Shamila Bashir, Dr. Hamid ur Rehman. Detecting Mobile Money Laundering Using KPCA as Feature Selection Method. LC International Journal of STEM, 2(3):1-8, 2021. https://doi.org/10.5281/zenodo.5751721 [33] John O. Awoyemi; Adebayo O. Adetunmbi; Samuel A. Oluwadare. Credit card fraud detection using machine learning techniques: A comparative analysis, in proceedings of the IEEE 2017 International Conference on Computing Networking and Informatics (ICCNI). 1–9, 2017. https://doi.org/10.1109/ICCNI.2017.8123782 [34] Varmedja, D., Karanovic, M. Sladojevic, S., Arsenovic, M., and Anderla, A. Credit card fraud detection-machine learning methods, In proceedings of the IEEE 2019 18 th International Symposium INFOTEH-JAHORINA (INFOTEH), 1–5, 2019. https://doi.org/10.1109/infoteh.2019.8717766 [35] Nana Kwame Gyamfi, and Jamal-Deen Abdulai. Bank Fraud Detection Using Support Vector Machine, in proceedings of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 37-41, 2018 https://doi.org/10.1109/IEMCON.2018.8614994 . [36] Dongfang Zhang, Basu Bhandari, and Dennis Black. Credit Card Fraud Detection Using Weighted Support Vector Machine. Applied Mathematics, 11, 1275-1291, 2020. 
https://doi.org/10.4236/am.2020.1112087 [37] Olawale Adepoju, Julius Wosowei, Shiwani lawte and Hemaint Jaiman. Comparative Evaluation of Credit Card Fraud Detection Using Machine Learning Techniques, In proceedings of the 2019 Global Conference for Advancement in Technology (GCAT), Bangalore, India, 1-6, 2019. https://doi.org/10.1109/GCAT47503.2019.8978372. [38] Samidha Khatri, Aishwarya Arora, and Arun Prakash Agrawal. Supervised Machine Learning Algorithms for Credit Card Fraud Detection: A Comparison, In proceedings of the 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 680-683, 2020. https://doi.org/10.1109/Confluence47617.2020.9057851. [39] Iddi S. Mambina, Jema D. Ndibwile, and Kisangiri F. Michael. Classifying Swahili Smishing Attacks for Mobile Money Users: A Machine-Learning Approach, IEEE Access, 10, 83061-83074, 2022. https://doi.org/10.1109/ACCESS.2022.3196464 [40] N. NishaBalani, MeherBhawnani M., and AnkitaKamle M. Implementation and Design on Fraud Detection and Prediction of Mobile Money Transaction Using ML Techniques. Annals of the Romanian Society for Cell Biology, 24(2):261 – 269, 2021. [41] Evgenia Novikova and Igor Kotenko. Visual Analytics for Detecting Anomalous Activity in Mobile Money Transfer Services. In Proceedings of the International Cross-Domain Conference and Workshop on Availability, Reliability, and Security (CD-ARES), Sep, Fribourg, Switzerland. 63-78, 2014. https://doi.org/10.1007/978-3-319-10975-6_5 [42] Muhammad R. Khan and Joshua E. Blumenstock. Predictors without borders: behavioral modeling of product adoption in three developing countries. In Proceedings of the ACM 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 145–154, 2016.
http://dx.doi.org/10.1145/2939672.2939710 [43] Simone Centellegher, Giovanna Miritello, Daniel Villatoro, Devyani Parameshwar, Bruno Lepri and Nuria Oliver. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 2(7):1–18, 2018. https://doi.org/10.1145/3287035.