https://doi.org/10.31449/inf.v47i5.4467
Informatica 47 (2023) 57–62

Prediction of Heart Diseases Using Data Mining Algorithms

Karrar AL-Jammali
Faculty of Pharmacy, University of Kufa, Najaf, Iraq
E-mail: karrara.aljammali@uokufa.edu.iq

Keywords: artificial neural networks (ANN), support vector machine (SVM), decision tree, heart disease, data mining

Received: October 24, 2022

Following its success in highly visible areas such as e-commerce and marketing, data mining has been adopted in numerous businesses and sectors. Healthcare is one of the more recent of these. The healthcare sector remains "information-rich": healthcare systems have access to a multitude of datasets and can use them to find hidden links and trends in data. There are not enough efficient analysis tools, though. In this study, a heart disease dataset is analyzed using several machine learning algorithms, namely decision trees, neural networks, and support vector machines. The paper also aims to present an overview of recent methods for knowledge discovery in databases. Data mining is a technique used in modern medical research, especially to predict heart disease, which is the primary cause of a significant portion of deaths worldwide. Several experiments on the dataset were carried out to compare the performance of predictive data mining techniques. The results show that SVM performs better than the other predictive techniques, ANN and the decision tree, with the decision tree performing worst. We recommend testing more classifiers so that the results can be compared with other algorithms, and extending the system with more features. This will help the system predict and diagnose people with heart disease more accurately.

Povzetek (summary): The main goal of this study was to predict a person's condition and whether or not they have heart disease.
1 Introduction

Data mining technology offers a user-focused method for discovering new and hidden patterns in medical data sets. These patterns can be used for clinical diagnosis, where medical data mining has considerable potential. However, the readily accessible raw medical data is dispersed, diverse, and substantial. This data must be gathered in a structured manner. A hospital information system can then be created by integrating the acquired data.

According to the World Health Organization, heart disease kills 12 million people every year. Cardiovascular illnesses are responsible for half of all fatalities in the United States and other affluent nations, and they are the main cause of death in several other countries [1]. Heart disease is the main cause of death worldwide. In the US, a person dies of heart disease every 34 seconds. There are several types of cardiac disorders, including cardiomyopathy, coronary heart disease, and cardiovascular disease. The term "cardiovascular disease" refers to a broad spectrum of disorders that affect the heart, the blood vessels, and how the body pumps and circulates blood. Cardiovascular disease (CVD) causes a variety of ailments, disabilities, and fatalities.

Disease diagnosis is a crucial and complex task in medicine [2]. Medical diagnosis is regarded as a significant but challenging duty that must be carried out precisely and effectively, and it would benefit greatly from automation. Clinical tests can be conducted at a lower cost with the help of suitable computer-based information and/or decision support systems. A comparative study of the available methodologies is necessary for the effective and precise implementation of automated systems. This research examines different ways of using predictive and descriptive data mining to diagnose heart disease [3].
2 Technology for data mining

An artificial neural network (ANN) is a mathematical or computational model based on the structural and functional characteristics of biological neural networks. ANNs draw their inspiration from the type of computation carried out by the human brain. An ANN is a network of artificial neurons that uses a connectionist method of computation to analyze input. According to the basic connectionist principle, mental processes can be modelled as networks of simple, typically uniform units that are interconnected. During the learning phase, an ANN acts as an adaptive system, changing its structure in response to external or internal data. Modern neural networks are frequently used to model complicated relationships between inputs and outputs in order to find patterns in sets of data [4].

An ANN is regarded as a nonlinear statistical data modelling tool. It is made up of numerous highly interconnected small processing units (artificial neurons), and it processes its input in a way modelled on the human brain. Because training an ANN is an iterative process, an extensive training set is required. Its particular strength is the ability to extract patterns and trends from complicated data that are too difficult for humans or other computational techniques to identify [5].

In the medical field, artificial neural networks can monitor medical devices that continuously update many readings, such as heart rate and blood pressure. Neural networks can also be trained to learn a classification task and to predict diseases [6, 7].
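The description of an ANN as a network of interconnected artificial neurons can be illustrated with a minimal sketch of a forward pass through one hidden layer. The layer sizes, the random weights, the sigmoid activation, and the placeholder input vector are all assumptions chosen for illustration; this is not the configuration used in the experiments, and the network shown is untrained.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: one hidden layer of sigmoid units, one sigmoid output."""
    # Each hidden neuron computes a weighted sum of all inputs plus a bias.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output neuron combines the hidden activations in the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


rng = random.Random(0)
n_inputs, n_hidden = 13, 5  # assumed sizes: 13 input attributes, 5 hidden neurons

hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_biases = [0.0] * n_hidden
output_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]

patient = [0.5] * n_inputs  # placeholder normalized feature vector, not real data
score = forward(patient, hidden_weights, hidden_biases, output_weights, 0.0)
print(f"untrained network output: {score:.3f}")
```

Training would adjust the weights and biases iteratively so that the output moves toward the known class of each training example, which is why a sufficiently large training set matters.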
3 Decision tree

Data mining software is essential to the knowledge discovery process: it uncovers important hidden information, and it can process vast data collections to reveal new patterns. Decision trees are used in many fields, including machine learning, information extraction, biomedical applications, and classification research in science. Systems that produce classifiers are among the most widely used data mining techniques. Data classification algorithms in data mining can process a large volume of data or knowledge [8]. Classification can be applied to predict categorical class labels, to categorize information according to a training set and its class labels, and to categorize newly available data. Machine learning offers multiple algorithms for data classification; this work used the general decision tree algorithm [9].

A decision tree can process nominal and numerical data simultaneously, can be explained and analyzed visually, and makes it easy to extract rules. When the data set is tested, the decision tree's size is independent of the database size, its running speed is relatively quick, and it can be extended to large databases. It requires little domain expertise, and it is fast and simple to understand. Decision trees can handle a variety of data types, including binary, real, ordinal, and nominal values [10].

4 Support vector machine

The support vector machine (SVM) is a supervised machine learning method that functions as both a predictor and a classifier. For classification, it locates a hyperplane in the feature space that separates the classes [11]. Test data points are then mapped into the same space and classified according to which side of the wide margin they fall on [12].

5 Heart disease data

We use a dataset downloaded from the UCI Machine Learning Repository [13]. The 13 attributes in this database were extracted from a larger set of 75.
The dataset, which includes 13 variables related to heart disease, was created using data from 270 individuals, some of whom were diagnosed with heart disease while others were not. The 14th attribute is the class label. The aim of the data analysis is to determine whether or not heart disease is present (1 = absent, 2 = present). Three classifiers were used to identify the condition of a new suspected patient.

6 Results

The classification models were applied in the RapidMiner framework, using split validation to separate training and test data. For the ANN classifier, 90% of the data is used for training and 10% for testing. The model is then tuned for maximum performance, and the ANN's class detection accuracy is evaluated and improved using a confusion matrix. We first run the default configuration to obtain the accuracy and then adjust the configuration in steps. The first step is to use only one hidden layer and increase its neuron count. The second step is to enable shuffling of the data. We also normalize the values. Similar steps are applied to the other models, SVM and Decision Tree. Table 1 shows the confusion matrix for the ANN, Table 2 for the support vector machine, and Table 3 for the Decision Tree.

Table 1: Confusion Matrix for ANN

| | Sick | Normal | Class Precision |
|---|---|---|---|
| Prediction: Sick | 30 | 2.5 | 80.56% |
| Prediction: Normal | 34 | 3.0 | 84.44% |
| Class Recall | 80.86% | 84.44% | |

Table 2: Confusion Matrix for SVM

| | Sick | Normal | Class Precision |
|---|---|---|---|
| Prediction: Sick | 10 | 1 | 90.91% |
| Prediction: Normal | 2 | 14 | 88.25% |
| Class Recall | 83.33% | 93.33% | |

Table 3: Confusion Matrix of the Decision Tree

| | Sick | Normal | Class Precision |
|---|---|---|---|
| Prediction: Sick | 9 | 2 | 81.82% |
| Prediction: Normal | 3 | 13 | 81.25% |
| Class Recall | 75.00% | 86.67% | |

Figure 1 shows the confusion matrices of the ANN, SVM, and Decision Tree.

Figure 1: Confusion Matrix of ANN, SVM, and Decision Tree.
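The class precision, class recall, and accuracy figures follow directly from the four cell counts of a confusion matrix. The sketch below recomputes them for the SVM and Decision Tree matrices, reading the columns as the actual classes (an assumption that is consistent with the printed recall values) and treating "Sick" as the positive class. The function name `confusion_metrics` is hypothetical. The ANN matrix is left out because its cell values are not integer counts as printed.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return per-class precision and recall plus overall accuracy (as fractions).

    With "Sick" as the positive class:
      tp = predicted Sick,   actually Sick
      fp = predicted Sick,   actually Normal
      fn = predicted Normal, actually Sick
      tn = predicted Normal, actually Normal
    """
    total = tp + fp + fn + tn
    return {
        "precision_sick": tp / (tp + fp),
        "precision_normal": tn / (tn + fn),
        "recall_sick": tp / (tp + fn),
        "recall_normal": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }


# Cell counts taken from the SVM and Decision Tree confusion matrices.
svm = confusion_metrics(tp=10, fp=1, fn=2, tn=14)
tree = confusion_metrics(tp=9, fp=2, fn=3, tn=13)

for name, metrics in (("SVM", svm), ("Decision Tree", tree)):
    print(name, {key: f"{value:.2%}" for key, value in metrics.items()})
```

With these counts the accuracy is 24/27 = 88.89% for the SVM and 22/27 = 81.48% for the Decision Tree, matching the reported model accuracies. The counts give 14/16 = 87.50% for the SVM's Normal-class precision, which differs slightly from the printed 88.25%.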
As shown in Figure 2, the diagnosis model has four stages. The first stage is pre-processing the data that the SVM will use to make a diagnosis. The second stage is configuring the SVM algorithm so that it gives the best results. The third stage is using the SVM to diagnose a new case: once the details of a new case are entered, the SVM uses the training data to decide how to classify it. In the last stage, a medical expert checks the results to make sure they are correct.

The new case data is then added to the training data to make the model more accurate. Over time, the amount of training data grows, and it comes to contain two types of records: the original data, which was collected earlier but was not checked by a doctor, and the records that have been verified one by one. More verified records make the model more accurate, and adding verified patient data reduces the number of mistakes in the training data. This patient information is therefore classified by both the SVM and doctors.

Figure 2: Diagnoses Model

Figure 3: Accuracy of the three models

7 Analyze the results

The results of the SVM classification algorithm are much better than those of ANN classification, with a clear difference in accuracy. When we go beyond the training data, Table 3 shows that SVM is better at diagnosing heart disease than the ANN and the Decision Tree. This happens because the ANN model was trained with only a small number of examples, which suggests that ANN models are better suited to large datasets, as in deep learning.
Regarding accuracy, the results of the two model-based training methods were the same, while the decision tree model was less accurate than the SVM. The experiment in Table 3 likewise shows that the SVM classifier gives the best results. Comparing the tables shows that SVM is a better model than the other algorithms for classifying heart disease, as shown in Table 4. We therefore ran tests to see how well, and how practically, different classification algorithms make predictions about heart patients, as shown in Figure 3. In the medical field, accuracy is very important. Figure 2 shows the diagnosis model.

8 Compare the results

We describe three key classification models (decision trees, artificial neural networks, and support vector machines) and take overfitting and hyperparameters into account when using them to forecast and identify disease. The SVM classifier gives the best result, with an accuracy of 88.89%. It is followed by the ANN, whose greatest accuracy was 82.72%, and the decision tree at 81.48%. The best model for this task is therefore the SVM.

Table 4: Model accuracy.

| Model | Accuracy result |
|---|---|
| SVM | 88.89% |
| ANN | 82.72% |
| Decision Tree | 81.48% |

9 Conclusion

This project studied one of the most well-known data mining tasks. The main objective was to assess whether or not a person has heart disease by comparing three classification algorithms: decision trees, artificial neural networks, and support vector machines. Since more informative models produce more accurate results, we use SVM, which is more accurate than the ANN or decision trees. Overfitting can result from laborious configuration operations, such as setting parameters.
Additionally, our experimental results showed that the choice of training and test sets determined model performance and accuracy. To evaluate the model's correctness, we employed a confusion matrix. The same factors can therefore be used to diagnose a condition. The experiments in this study use the 270 instances of the dataset, are carried out in RapidMiner, and are validated using split validation. We conclude from our experiments that, for the classification problem in the heart disease data analysis task, the SVM classification model, which uses sequential minimal optimization, performs more accurately than the ANN and decision trees. We conducted these tests to make predictions about a person's health and whether or not they have heart disease. According to machine learning theory, the system can predict new, unclassified cases after learning from previously classified data.

References

[1] Ramalingam, V. V., Ayantan Dandapath, and M. Karthik Raja. "Heart disease prediction using machine learning techniques: a survey." International Journal of Engineering & Technology 7.2.8 (2018): 684-687. [Online]. Available: https://doi.org/10.14419/ijet.v7i2.8.10557
[2] Palaniappan, Sellappan, and Rafiah Awang. "Intelligent heart disease prediction system using data mining techniques." 2008 IEEE/ACS International Conference on Computer Systems and Applications. IEEE, 2008. [Online]. Available: https://doi.org/10.1109/aiccsa.2008.4493524
[3] Dangare, Chaitrali S., and Sulabha S. Apte. "Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques." International Journal of Computer Applications 47.10 (2012): 44-48. [Online]. Available: https://doi.org/10.5120/7228-0076
[4] Montana, D. J., and L. Davis. "Training Feedforward Neural Networks Using Genetic Algorithms." IJCAI, 1989. [Online]. Available: https://www.ijcai.org/Proceedings/89-1/Papers/122.pdf
[5] Krizhevsky, Alex, I. Sutskever, and G.
Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." (2012). [Online]. Available: https://doi.org/10.1145/3065386
[6] Schmidhuber, J. "An Overview of Deep Learning in Neural Networks." Neural Networks 61 (2015): 80-115. [Online]. Available: https://doi.org/10.1016/j.neunet.2014.09.003
[7] Katarya, Rahul, and Sunit Kumar Meena. "Machine learning techniques for heart disease prediction: a comparative study and analysis." 87-97. [Online]. Available: https://doi.org/10.1007/s12553-020-00505-7
[8] Sabarinathan, V., and V. Sugumaran. "Diagnosis of heart disease using decision tree." International Journal of Research in Computer Applications & Information Technology 2.6 (2014): 74-79. [Online]. Available: https://www.researchgate.net/publication/298181341
[9] Najjar, Noor. "Analyzing Data Mining Statistical Models of Bio Medical." (2018). [Online]. Available: https://edit.elte.hu/xmlui/handle/10831/41034?key=Noor
[10] Singla, Anshu, Swarnajyoti Patra, and Lorenzo Bruzzone. "A novel classification technique based on progressive transductive SVM learning." Pattern Recognition Letters 42 (2014): 101-106. [Online]. Available: https://doi.org/10.1016/j.patrec.2014.02.03
[11] Zhang, Ying, et al. "Sample-specific SVM learning for person re-identification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. [Online]. Available: https://doi.org/10.1109/cvpr.2016.143
[12] Wang, Shengzheng, Dacheng Tao, and Jie Yang. "Relative attribute SVM+ learning for age estimation." IEEE Transactions on Cybernetics 46.3 (2015): 825-835. [Online]. Available: https://doi.org/10.1109/tcyb.2015.2416321
[13] Lichtman, M. UCI Machine Learning Repository [http://archive.ics.uci.edu/ml], Irvine, University of California, Irvine, School of Information and Computer Sciences (2013). [Online].
Available: http://archive.ics.uci.edu/ml/datasets/Heart+Disease