https://doi.org/10.31449/inf.v45i3.3223
Informatica 45 (2021) 381–392

Extreme Learning Machines with Feature Selection Using GA for Effective Prediction of Fetal Heart Disease: A Novel Approach

Debjani Panda
KIIT University, Bhubaneswar, India
E-mail: pandad@indianoil.in

Divyajyoti Panda
National Institute of Technology, Rourkela, India
E-mail: pandadivya02@gmail.com

Satya Ranjan Dash
KIIT University, Bhubaneswar, India
E-mail: sdashfca@kiit.ac.in

Shantipriya Parida (corresponding author)
Idiap Research Institute, Martigny, Switzerland
E-mail: shantipriya.parida@idiap.ch

Keywords: extreme learning machine, ga, feature selection, linear regression, ridge, lasso, heart disease

Received: July 1, 2020

Heart disease is considered to be the most life-threatening ailment in the entire world and has been a major concern of developing countries. Heart disease also affects the fetus, which can be detected by cardiotocography tests conducted on the mother during her pregnancy. This paper analyses the presence of heart disease in the foetus by optimizing the Extreme Learning Machine with a novel activation function (roots). The accuracy of predicting the heart condition of the foetus is measured and compared with other activation functions like sigmoid, Fourier, tan hyperbolic, and a user-defined function, called “roots”. The best features from the Cardiotocography data set are selected by applying the Genetic Algorithm (GA). ELM with activation functions sigmoid, Fourier, tan hyperbolic, and roots (a novel function), have been measured and compared on accuracy, sensitivity, specificity, precision, F-score, area under the curve (AUC), and computation time metrics. The GA uses three types of regression: linear, lasso, and ridge, for cross-validation of the features. ELM with user-defined activation function shows comparable performance with sigmoid and hyperbolic tangent functions.
Features selected from linear and lasso produce better results in ELM than those selected from the ridge. ELM with the roots activation function gives an accuracy of 96.45%, as compared to 94.56% and 94.56% respectively, with the best features selected from both linear and lasso. The roots activation function also takes 2.50 seconds of computation time versus 3.27 seconds and 2.67 seconds for sigmoid and hyperbolic tangent respectively, and scores better on all other metrics in designing an efficient model to classify fetal heart disease.

Povzetek (Summary): Heart disease in fetuses is analysed using machine learning methods and genetic algorithms.

1 Introduction

Cardiovascular disease is growing at a very fast rate and, as per WHO, 30% of deaths worldwide occur due to cardiovascular diseases, and 23.6 million are expected to be affected by this disease by 2030 [3]. Cardiac disease is not only present in adults but can also be present as a birth anomaly in a newborn child and cause neonatal fatalities. The heart health of the fetus can be monitored to detect abnormal heartbeats and predict diseases affecting the fetus. Thus, predicting the cardiac health of a fetus is the need of the hour.

Cardiotocography is one of the most commonly used Nonstress Tests, which helps in determining the fetus’s well-being in the womb and during labor. Cardiotocography records uterine contractions and fetal heart rate. Fetal heart rate includes attributes like baseline heart rate, variations in baseline heart rate, accelerations, decelerations, and uterine contractions. This test is very useful in studying the base heart rate and uterine contraction pattern and is a vital tool for medical experts to know when a fetus is suffering from an inadequate supply of blood or oxygen to the body or any of its parts.
As per the National Institute of Child Health and Human Development (NICHD), baseline heart rate and its variability, accelerations, decelerations, and the Nonstress test (NST) are important factors to be considered while examining the well-being of the fetus [24]. The cardiotocography test is carried out by a device called Electronic Fetal Monitor [27], which gives two signals: fetal heart rate (FHR) and uterine contractions (UC). NST and contraction stress test (CST) are two main components of a CTG [8]. The NST determines whether the fetus is distressed and the CST determines the placenta’s respiratory function.

The normal range of FHR baseline lies between 110 bpm and 160 bpm. If the FHR baseline is higher than 160 bpm for more than 10 minutes, the fetus is considered to be suffering from tachycardia. On the other hand, an FHR baseline of less than 110 bpm for more than 10 minutes is called bradycardia [6]. Both tachycardia and bradycardia are signs of fetal distress. The conditions are found out from NST, which determines the fetal reactivity, i.e. the interaction between the sympathetic and parasympathetic autonomic nervous systems of the fetus.

Recently, machine learning with the use of artificial intelligence has become an important and powerful tool for predicting the heart health of patients. Such models are effective in both binary and multi-class classification and in predicting cardiac disease. One of the effective tools used for training single hidden layer feedforward neural networks (SLFNs) is the extreme learning machine (ELM) [2]. The prime benefit of ELM is that the hidden layer of SLFNs does not require tuning and it also has a fast rate of convergence [13]. The learning speed of ELM is considered to be thousands of times faster than that of traditional feed-forward network learning algorithms [11].
Our study mainly focuses on using GA for feature selection and studying the accuracy of ELM using different activation functions.

The following section describes the details of the data set and the implementation of ELM as a classifier that uses the best features identified by the Genetic Algorithm. The cross-validation methods used for obtaining the best features are studied thoroughly to assess their impact on ELM with four activation functions. The purpose is to study the effectiveness of the novel activation function by comparing it with existing activation functions.

2 Methods

2.1 Workflow diagram

The process flow of our proposed model is described below in Figure 1. The data set is considered with output class NSP and is pre-processed to remove duplicate entries. Using GA for obtaining the best features, the model is cross-validated with 3 regression models, and the performance of ELM is studied before and after feature selection with the existing and novel activation functions.

2.2 Dataset details

The Cardiotocography Data Set, obtained from the UCI repository [9], has been used for our study and experimentation. The data set originally has 2126 instances with 23 attributes. The CTGs were also classified by three expert obstetricians according to two types of classes: the class pattern (1-10) and the fetal state class (N = Normal, S = Suspect, P = Pathologic). The data set thus has 21 input attributes and two output classes. Similar to other studies conducted on this data set, our experiment considers 22 attributes, where 21 attributes are inputs and the 22nd attribute is the output class “NSP”. We have not considered the other output class “CLASS” for our study. The 21 attributes, with NSP as the output class, are described in Table 1.
2.3 Data pre-processing and splitting of data sets for model training

Other than the aforementioned 21 features and the output columns, ‘CLASS’ and ‘NSP’, the original database has 23 other columns, which were removed. Thereafter, the data set, named ‘DT’, was split into two subsets, ‘DT_CLASS’ and ‘DT_NSP’, containing ‘CLASS’ and ‘NSP’ respectively. 12 duplicate rows were deleted, and the last four rows containing null values were also removed. The DT_NSP data set was split in an 80:20 ratio to train the classifiers on 80% of the data and perform the testing on the remaining 20% of the data.

2.4 Feature Selection and classification

Feature selection is an important part of designing a predictive model, both to remove unwanted features and to reduce the training time of classifiers. In this paper, the important features are identified by using the Genetic Algorithm. The training data set was given as input to ELM with different activation functions and the resulting accuracy was studied. Linear, lasso, and ridge regression models have been used for cross-validation of candidate feature subsets generated by GA. The attributes selected are considered the best features, and the performance of the classification algorithms has been tabulated.

2.4.1 Genetic Algorithm (GA)

The genetic algorithm is a simple evolutionary search heuristic that randomly generates a new population. Its basic objective is to find the attributes with maximum fitness value in the population [14]. Based on the Darwinian principle, it tries to find the fittest individuals. The entire set of candidate solutions is called a population and each solution is called an individual. Our genetic algorithm searches for the solution which gives the minimum cross-validation error through linear, lasso, and ridge regression models.
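A minimal sketch of such a search loop is given here for illustration only; it is not the implementation used in this study. It assumes that each individual is a list of booleans (one per input attribute), that the fitter half of the population survives each generation, and that children are produced by single-point crossover followed by bit-flip mutation. The population size, mutation rate, and the `cv_error` function (a toy stand-in for the cross-validation error of a regression model, built around a made-up set of "useful" features) are hypothetical.

```python
import random

N_FEATURES = 21       # input attributes in the data set
N_GENERATIONS = 20    # number of generations
POP_SIZE = 20         # assumed population size (hypothetical)
MUTATION_RATE = 0.05  # assumed per-gene mutation probability (hypothetical)

# Hypothetical stand-in for the cross-validation error: a made-up set of
# "useful" feature indices replaces a real regression model evaluation.
USEFUL = {0, 3, 7, 11}


def cv_error(mask):
    """Toy error: useful features left out plus irrelevant features kept."""
    return sum(1 for i, keep in enumerate(mask) if keep != (i in USEFUL))


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two parents."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(mask, rng, rate=MUTATION_RATE):
    """Flip each gene independently with probability `rate`."""
    return [(not gene) if rng.random() < rate else gene for gene in mask]


def run_ga(seed=0):
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(N_FEATURES)]
                  for _ in range(POP_SIZE)]
    history = []  # best error seen at the start of each generation
    for _ in range(N_GENERATIONS):
        population.sort(key=cv_error)          # lowest error first
        history.append(cv_error(population[0]))
        parents = population[:POP_SIZE // 2]   # fitter half survives
        children = []
        while len(parents) + len(children) < POP_SIZE:
            a, b = rng.sample(parents, 2)      # two parents per child
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=cv_error), history


best_mask, best_history = run_ga(seed=0)
selected = [i for i, keep in enumerate(best_mask) if keep]
print("selected feature indices:", selected)
```

In an actual experiment, `cv_error` would be replaced by the cross-validation error of a linear, lasso, or ridge regression model fitted on the columns selected by the mask. Because the fitter half of the population is carried over unchanged, the best error in this sketch never increases from one generation to the next.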
The chromosomes are generated with a true or false value for each attribute, and after iterating for the total number of generations, the features best fit to predict the outcome are determined. GA depends upon the number of generations, number of chromosomes, number of children created during the crossover, and best chromosomes. Depending upon the best fitness values, parents are selected for mating [1]. Crossover has been carried out with 2 parents, and the offspring were mutated to generate the new population. The process was repeated for 20 generations, after which the fitness value of features remained constant. Finally, the features with the best fitness values are obtained.

Figure 1: Process flow diagram to study impact of feature selection on ELM with various activation functions.

Regression models: Regression is a supervised machine learning method that expresses the dependent variable in terms of the independent variables. It is effectively used for dimensionality reduction of collinear or multi-collinear variables. The following regression models are used in GA:

Linear Regression: The equation can be written as shown in Equation 1.

$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} \tag{1}$$

Ridge: This method uses L2 regularization, where L2 is the penalty equivalent [23] to the sum of the squared magnitudes of the coefficients. This type of regression [22] helps in dealing with the variance that results from multi-collinearity of variables. It helps in reducing the variance that results from non-linear relationships between two independent variables.

Lasso: This model is based on L1 regularization, in which the coefficients of the least related variables are set to zero. So, it helps eliminate irrelevant features. It adds a penalty to the loss of a model, where L1 is the penalty equal to the sum of the absolute values of the coefficients.
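The difference between the two penalties can be illustrated with a small sketch. It assumes the simplest possible setting, a single coefficient whose unpenalized estimate is `b`, and the objective scalings stated in the docstrings; the function names and the example values are hypothetical and are not taken from the experiments.

```python
def l1_penalty(coefs):
    """Lasso penalty: sum of absolute values of the coefficients."""
    return sum(abs(b) for b in coefs)


def l2_penalty(coefs):
    """Ridge penalty: sum of squared coefficients."""
    return sum(b * b for b in coefs)


def lasso_shrink(b, lam):
    """Minimiser of 0.5*(b - beta)**2 + lam*abs(beta) (soft-thresholding)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b, lam):
    """Minimiser of 0.5*(b - beta)**2 + 0.5*lam*beta**2."""
    return b / (1.0 + lam)  # shrunk towards zero, but never exactly zero


coefs = [3.0, 0.4, -1.5]  # made-up unpenalized coefficients
lam = 0.5                 # made-up penalty strength
print("L1 penalty:", l1_penalty(coefs))
print("L2 penalty:", l2_penalty(coefs))
print("lasso:", [lasso_shrink(b, lam) for b in coefs])
print("ridge:", [ridge_shrink(b, lam) for b in coefs])
```

In this toy setting the lasso rule maps the small coefficient 0.4 to exactly zero, which is why lasso can act as a feature selector, whereas the ridge rule only scales every coefficient down.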
For the objective function (Equation 2),

$$\frac{1}{N}\sum_{i=1}^{N} f(x_i, y_i, \alpha, \beta) \tag{2}$$

the lasso regularized version of the estimator will be the solution to Equation 3.

$$\min_{\alpha,\beta} \frac{1}{N}\sum_{i=1}^{N} f(x_i, y_i, \alpha, \beta)$$

subject to $\|\beta\|_1$