https://doi.org/10.31449/inf.v45i2.3407 Informatica 45 (2021) 243–256 243

Stock Market Prediction with Gaussian Naïve Bayes Machine Learning Algorithm

Ernest Kwame Ampomah
School of Information & Software Engineering, University of Electronic Science and Technology of China, China
E-mail: ampomahke@gmail.com

Gabriel Nyame
Department of Information Technology Education, Akenten Appiah-Menka University of Skills Training and Entrepreneurial Development, Kumasi-Ghana
E-mail: kwakuasane1972@gmail.com

Zhiguang Qin
School of Information & Software Engineering, University of Electronic Science and Technology of China, China
E-mail: qinzg@uestc.edu.cn

Prince Clement Addo
School of Management and Economics, University of Electronic Science and Technology of China, China
E-mail: prince@std.uestc.edu.cn

Enoch Opanin Gyamfi
School of Information & Software Engineering, University of Electronic Science and Technology of China, China
E-mail: enochopaningyamfi@outlook.com

Michael Gyan
Department of Physics Education, University of Education, Winneba-Ghana
E-mail: mgyan173@gmail.com

Keywords: machine learning, gaussian naïve Bayes, stock price, feature extraction, scaling

Received: January 8, 2021

The stock market is one of the key sectors of a country's economy. It provides investors with an opportunity to invest and gain returns on their investment. Predicting the stock market is a very challenging task and has attracted serious interest from researchers in many fields such as statistics, artificial intelligence, economics, and finance. An accurate prediction of the stock market reduces investment risk in the market. Different approaches have been used to predict the stock market. The performance of machine learning (ML) models is typically superior to that of statistical and econometric models. The ability of the Gaussian Naïve Bayes (GNB) ML algorithm to predict stock price movement has not been addressed properly in the existing literature; hence this study attempts to fill that gap by evaluating the performance of the GNB algorithm combined with different feature scaling and feature extraction techniques in stock price movement prediction. The GNB models set up were ranked using Kendall's test of concordance for the various evaluation metrics used. The results indicated that the predictive model based on the integration of the GNB algorithm and Linear Discriminant Analysis (GNB_LDA) outperformed all the other GNB models considered in three of the four evaluation metrics (i.e., accuracy, F1-score, and AUC). Similarly, the predictive model based on the GNB algorithm, Min-Max scaling, and PCA produced the best rank using the specificity results. In addition, GNB produced better performance with the Min-Max scaling technique than with the standardization scaling technique.

Povzetek: A Gaussian naïve Bayes method for stock market prediction is presented.

1 Introduction
The stock market is one of the key sectors of a country's economy. It provides investors with an opportunity to invest and gain returns on their investment. Predicting the stock market has attracted serious interest from researchers in many fields such as statistics, artificial intelligence, economics, and finance. An accurate prediction of the stock market reduces investment risk in the market. Different opinions exist regarding the predictability of the stock market.
The efficient market hypothesis (EMH) states that all available information is immediately and fully incorporated into the current market price; therefore, changes in stock prices are the result of new information [1]. The EMH implies that stock prices follow a random walk, and hence the stock market cannot be forecast from past data to make any meaningful returns [2]. However, numerous studies conducted since the beginning of the 21st century contradict the EMH and show that the stock market can be predicted to some extent [3-5]. Many prediction algorithms have been explored for stock market forecasting and have shown that the behavior of stock prices can be forecast [6]. Predicting stock market behavior is nevertheless a very difficult task, since the market is complex, non-linear, and evolutionary. The market is influenced by factors such as investors' sentiments, political events, and overall economic conditions [7]. Three main approaches are used to forecast the stock market: fundamental analysis, technical analysis, and machine learning (ML). In fundamental analysis, the value of a stock is derived from general economic and financial factors such as inflation, return on equity (ROE), price to earnings (PE) ratios, and debt levels. In the technical analysis approach, technicians use charts and market statistics computed from historical price data to identify market trends and patterns so that they can make fairly accurate forecasts of the trajectory of the stock market [8]. The machine learning approach gives systems the ability to learn and improve automatically from massive amounts of historical data without being explicitly programmed. Machine learning models have been shown in the literature to perform better than both fundamental and technical analyses [9-11]. Distributional assumptions are not required by ML models. Also, ML models are able to find hidden patterns in time series data [12-13]. Several machine learning algorithms exist, but the focus of this study is on the Gaussian Naïve Bayes (GNB) algorithm. GNB is a probabilistic classifier based on Bayes' theorem with the assumption of strong (naïve) independence between the features [14]. The GNB algorithm is simple, easy to implement, and does not require much training data. It is highly scalable (it scales linearly with the number of features and data points), is not sensitive to irrelevant features, and is able to deal with missing data effectively. A major weakness of the GNB algorithm is the assumption of independence between predictors: GNB assumes that all predictors are mutually independent. This assumption is hardly ever true in real life, especially with financial data. However, it can be approximately satisfied by applying feature extraction techniques to derive independent predictors from the given data. Many feature extraction techniques are available in the literature which can be used to achieve this goal. Hence, this work assesses the performance of GNB with different feature scaling and feature extraction techniques in predicting the direction of movement of stock prices.

2 Related studies
Many ML algorithms have been used in the literature to forecast the direction of stock prices. A review of some of these works is provided below.
Ampomah et al. (2020) [14] studied the effectiveness of tree-based AdaBoost ensemble ML models (namely, AdaBoost-DecisionTree (Ada-DT), AdaBoost-RandomForest (Ada-RF), AdaBoost-Bagging (Ada-BAG), and Bagging-ExtraTrees (Bag-ET)) in predicting stock prices. The experimental results showed that the AdaBoost-ExtraTree (Ada-ET) model generated the highest performance among the tree-based AdaBoost ensemble models studied. Kumar and Thenmozhi (2006) [15] carried out a study to forecast the direction of the S&P CNX NIFTY Market Index of the National Stock Exchange (NSE). Random forest, linear discriminant analysis, artificial neural network, logit, and SVM machine learning algorithms were used by the researchers. The experimental results indicated that SVM was the best performer among the classification algorithms used. Ou and Wang (2009) [16] studied and applied ten different data mining techniques to forecast stock price movement of the Hang Seng index of the Hong Kong stock market. The techniques included neural network, linear discriminant analysis (LDA), logit model, quadratic discriminant analysis (QDA), K-nearest neighbor classification, Naïve Bayes based on kernel estimation, Bayesian classification with Gaussian process, tree-based classification, SVM, and least squares support vector machine (LS-SVM). The empirical results presented indicate that the performance of the SVM and LS-SVM models was superior to that of the other models. Subha and Nambi (2012) [17] examined the predictability of the movement of the BSE-SENSEX and NSE-NIFTY stock indices of the Indian stock market by using the k-Nearest Neighbours algorithm (k-NN) and a logistic regression model to predict the daily movement of the indices. Data for the period from January 2006 to May 2011 were used. The research outcome shows that the k-NN classifier performed better than the logistic regression model on all the model evaluation metrics used. Saifan et al. (2020) [18] applied the Quantopian algorithmic stock market trading simulator to evaluate the performance of ensemble models in daily prediction and trading. The ensemble models used were Extremely Randomized Trees, Random Forest, and Gradient Boosting. The models were trained using multiple technical indicators and automatic stock selection. The results showed significant returns relative to the benchmark, and large values of alpha were generated by all models. A study to verify whether a modified SVM classifier can be applied successfully to the prediction of short-term trends in the stock market was undertaken by Zikowski (2015) [19]. The author computed and used several technical indicators and statistical measures as input features. Fisher's method was applied to perform feature selection. The study outcome shows that using the modified SVM in conjunction with feature selection significantly enhances the trading strategy results in terms of the total rate of return, as well as the maximum drawdown during a trading period. Patel et al. (2015) [20] compared the performance of artificial neural network (ANN), support vector machine (SVM), random forest, and Naive Bayes with two different approaches for providing input data to the models in forecasting the direction of movement of stocks and a stock price index. The first approach computed ten technical indicators from the stock trading data (open, high, low, and close prices), and the second approach represented the technical indicators as trend deterministic data.
They evaluated the models with 10 years of historical stock data, from 2003 to 2012, of Reliance Industries, Infosys Ltd, CNX Nifty, and the S&P Bombay Stock Exchange (BSE) Sensex. The outcome of the study shows that, for the first approach, random forest outperforms the other three prediction models on overall performance. Also, the performance of all the prediction models improved when the technical indicators were represented as trend deterministic data. Sun et al. (2018) [21] proposed a hybrid ensemble learning model combining AdaBoost and an LSTM network to predict financial time series. Daily datasets of two major exchange rates and two stock market indices were used for evaluating the model. The experimental outcome shows that the AdaBoost-LSTM ensemble model outperformed the other single forecasting models and ensemble models that were compared with it. Khan et al. (2020) [22] assessed the impact of social media and financial news data on stock market prediction accuracy. The authors performed feature selection and spam tweet reduction on the data sets. Experiments were performed to find stock markets that are difficult to predict and those that are more influenced by social media and financial news. They compared the results of different algorithms to find a consistent classifier. Deep learning and some ensemble classifiers were used. The experimental results indicated that the highest prediction accuracies of 80.53% and 75.16% were achieved using social media and financial news, respectively. Also, the New York and Red Hat stock markets were difficult to predict; New York and IBM stocks were more influenced by social media, while London and Microsoft stocks were more influenced by financial news. The random forest classifier was found to be consistent, and the highest accuracy of 83.22% was achieved by its ensemble. Bhandare et al. (2020) [23] used the Naive Bayes classifier to analyse and quantify the performance of stock market analysts by providing ratings. The recommendations given by the analysts were analysed, and factors relevant to the success or failure of the recommendations were extracted. The Naive Bayes classifier was used to provide a rating based on the factors thus extracted. The results indicated that the system efficiently analyses the performance of an analyst from their past records by matching the recommendations with the actual stock prices and provides a rating for the analyst using the Naive Bayes classifier. The performance of the system was optimal when the Gaussian Naive Bayes classifier was used. From the above discussion, and to the best of our knowledge, the ability of Gaussian Naïve Bayes to predict stock price movement has not been addressed properly in the existing literature. Hence, this study aims to fill that gap by evaluating the impact of feature scaling and feature extraction techniques on the GNB algorithm in the prediction of stock price movement.

3 Method
3.1 Experimental design
The stock data sets used for the study were gathered randomly from three different stock markets (NYSE, NASDAQ, and NSE) through the Yahoo Finance application programming interface (API). Daily data of seven stocks were gathered. Details of the stock data used are given in Table A1 in the Appendix. Forty (40) technical indicators were computed from the raw stock data, which comprise the open price, low price, high price, close price, and volume. The computed technical indicators were used as input features for the GNB models. Details of these technical indicators are presented in Tables A2–A4 in the Appendix.
Each data set was split into a training and a test set. The initial seventy percent (70%) of the data was used as the training set, and the final thirty percent (30%) was used as the test set. In this work, the ability of the GNB algorithm, in combination with different feature scaling techniques (i.e., standardization scaling and Min-Max scaling) and different feature extraction techniques (i.e., PCA, LDA, and FA), to forecast stock price movement was evaluated. The following GNB models were evaluated and compared:
(i) GNB model,
(ii) integrated model based on GNB algorithm and standardization scaling (GNB_Z-Score),
(iii) integrated model based on GNB algorithm and Min-Max normalization (GNB_Min-Max),
(iv) integrated model based on GNB algorithm and principal component analysis (GNB_PCA),
(v) integrated model based on GNB algorithm and factor analysis (GNB_FA),
(vi) integrated model based on GNB algorithm and linear discriminant analysis (GNB_LDA),
(vii) integrated model based on GNB algorithm, standardization scaling, and principal component analysis (GNB_Z-Score_PCA),
(viii) integrated model based on GNB algorithm, standardization scaling, and factor analysis (GNB_Z-Score_FA),
(ix) integrated model based on GNB algorithm, Min-Max normalization, and principal component analysis (GNB_Min-Max_PCA),
(x) integrated model based on GNB algorithm, Min-Max normalization, and factor analysis (GNB_Min-Max_FA).
The GNB model applies the GNB algorithm to the raw stock data, without any feature scaling or feature extraction, to make predictions. The GNB_Z-Score model first used the standardization scaling technique to scale the data, and then the GNB algorithm was applied to forecast the movement of the stock price. The GNB_Min-Max model applied the Min-Max scaling technique to scale the data before the GNB algorithm was applied to the scaled data to make predictions. With the GNB_PCA model, PCA was first applied to the unscaled stock data to extract important features, and then the GNB algorithm was applied to the extracted data to make predictions. The GNB_FA model initially applied the FA technique to the unscaled stock data to extract relevant features from the original data, and then applied the GNB algorithm to the extracted data to make predictions. The GNB_LDA model first used the LDA technique to extract relevant features from the initial input data, and then applied the GNB algorithm to the extracted data. The GNB_Z-Score_PCA model first applied the standardization scaling technique to the initial data to scale it, then applied PCA to extract important features from the scaled data, and finally applied GNB to the extracted scaled stock data to make predictions. The GNB_Z-Score_FA model initially used the standardization scaling technique to scale the data, then applied the FA technique to extract relevant features from the scaled data, and the GNB algorithm was applied to the extracted scaled data to make predictions. The GNB_Min-Max_PCA model initially scaled the input data with the Min-Max scaling technique, then applied PCA to extract important features from the scaled data, and then applied the GNB algorithm to the extracted scaled data to make predictions. The GNB_Min-Max_FA model first scaled the data with Min-Max, then applied the FA technique to extract relevant features from the scaled data, and finally applied the GNB algorithm to make predictions.
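For illustration only, the integrated models described above can be sketched as scikit-learn pipelines; this is not the implementation used in the paper, and the feature matrix X (the 40 technical indicators), the binary up/down label y, and the number of retained components are placeholders that the paper does not specify.

```python
# Minimal sketch (not the authors' code): composing the integrated GNB models
# with scikit-learn pipelines and a chronological 70/30 split.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

def chronological_split(X, y, train_frac=0.70):
    """First 70% of the (time-ordered) samples for training, final 30% for testing."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

N_COMP = 10  # hypothetical number of retained components; not reported in the paper

models = {
    "GNB":             Pipeline([("gnb", GaussianNB())]),
    "GNB_Z-Score":     Pipeline([("scale", StandardScaler()), ("gnb", GaussianNB())]),
    "GNB_Min-Max":     Pipeline([("scale", MinMaxScaler()), ("gnb", GaussianNB())]),
    "GNB_LDA":         Pipeline([("extract", LinearDiscriminantAnalysis(n_components=1)),
                                 ("gnb", GaussianNB())]),
    "GNB_Z-Score_PCA": Pipeline([("scale", StandardScaler()),
                                 ("extract", PCA(n_components=N_COMP)),
                                 ("gnb", GaussianNB())]),
    "GNB_Min-Max_FA":  Pipeline([("scale", MinMaxScaler()),
                                 ("extract", FactorAnalysis(n_components=N_COMP)),
                                 ("gnb", GaussianNB())]),
}

# Usage (X: array of the 40 indicators, y: 0/1 next-day movement):
# X_tr, X_te, y_tr, y_te = chronological_split(X, y)
# for name, model in models.items():
#     model.fit(X_tr, y_tr)
#     print(name, model.score(X_te, y_te))
```

Note that LDA is supervised (it uses the class label and, for a binary target, yields a single discriminant component), whereas PCA and FA are unsupervised; fitting each pipeline on the training split only keeps the scaler and extractor from seeing test-period data.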
3.2 Feature scaling techniques
Feature scaling is a way of standardizing the independent features present in the data within a fixed range. The two most widely used feature scaling techniques are standardization scaling and Min-Max normalization.

3.2.1 Standardization scaling
Standardization scaling (Z-score) is a scaling method that centres the values around the mean with a unit standard deviation, bringing the data onto a common scale and enabling a thorough analysis. The variables are rescaled to have a mean of zero, and the resulting distributions have a unit standard deviation. Standardization scaling is expressed by the formula below:

X' = (X − μ) / σ    (1)

where μ is the mean of the feature values and σ is the standard deviation of the feature values.

3.2.2 Min-max normalization
Min-Max normalization (Min-Max) is a scaling approach in which features are re-scaled so that the data fall in the range of zero to one. It performs a linear transformation of the original data [24]. In Min-Max scaling, the minimum value of every feature is converted to zero, and the maximum value of each feature is converted to one. The formula below expresses how the normalized form of each feature is computed:

X' = (X − X_min) / (X_max − X_min)    (2)

where X_min is the minimum value of the feature and X_max is the maximum value of the feature.

3.3 Feature extraction techniques
Feature extraction is a dimensionality reduction process that extracts the important features or attributes of the data in order to reduce the initial data set and generate a more concise description of the data for processing. There are many feature extraction techniques in the existing literature; in this study, principal component analysis (PCA), linear discriminant analysis (LDA), and factor analysis (FA) were applied.

3.3.1 Principal component analysis
Principal component analysis (PCA) is a dimensionality-reduction technique that transforms a high-dimensional data set into a lower-dimensional one. It transforms a data set of interrelated features into a new set of uncorrelated features called principal components (PCs), where the first few PCs hold most of the variation present in the entire data set [25]. The PCs are linear combinations of the original features, constructed so that the first PC captures the largest amount of variation, and the second PC is orthogonal to the first PC and has the most variance among the remaining PCs; the subsequent PCs follow in that order. The underlying assumption in PCA is that coordinates with large variance capture the differences between sample points, while coordinates with small variance may be a source of noise, which must be ignored or suppressed. Correlation between two dimensions denotes redundant information, which should not be carried over; this is why PCA requires each subsequent coordinate to be orthogonal to the previous coordinates [26]. PCA is sensitive to scaling.
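As a small illustration of Eqs. (1)–(2) and the PCA step, the sketch below implements the two scaling formulas with NumPy and applies scikit-learn's PCA to the scaled features; the array X and the number of retained components are hypothetical placeholders, not values taken from the paper.

```python
# Illustrative sketch of Eqs. (1)-(2) and the PCA step; X is assumed to be a
# (samples x features) array of technical-indicator values.
import numpy as np
from sklearn.decomposition import PCA

def z_score(X):
    # Eq. (1): centre each feature on its mean and scale to unit standard deviation.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    # Eq. (2): rescale each feature linearly into the range [0, 1].
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def pca_extract(X_scaled, n_components=10):
    # Project the (already scaled) features onto the leading principal components;
    # the component count here is a placeholder, not a value reported in the paper.
    return PCA(n_components=n_components).fit_transform(X_scaled)

# In practice the scaler and PCA should be fitted on the training split only and
# then applied to the test split to avoid look-ahead bias.
```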
3.3.2 Linear discriminant analysis
Linear discriminant analysis (LDA) is a supervised linear transformation technique that computes the linear discriminants (directions) representing the axes which maximize the separation between multiple classes. The objective of the technique is to maximize the ratio of the between-group variance to the within-group variance. When this ratio is at its maximum, the instances within each group have the least possible scatter and the groups are separated from each other the most. LDA maps features from a higher-dimensional space into a lower-dimensional space while keeping the class-discriminatory information [27]. LDA is not sensitive to scaling; hence, the performance of LDA remains the same with or without scaling. LDA uses two criteria to generate a new axis: (i) maximize the distance between the means of the two classes, and (ii) minimize the variation within each class.

3.3.3 Factor analysis
Factor analysis (FA) is a data reduction technique that describes the variability among observed, correlated features in terms of a potentially smaller number of unobserved (latent) features called factors. The observed features are modeled as linear combinations of the factors plus an error term. FA extracts the maximum common variance from all features and places it under a common score; this score, as an index of all features, can be used for further analysis. FA evaluates how much of the variability in the data is the result of common factors. The main goals of FA are to display multidimensional data in a lower-dimensional space with minimum loss of information and to extract the independent latent factors of the data [28]. The FA technique makes the following assumptions: a linear relationship exists between the observed features and the common factors, no multicollinearity is present, only relevant features are included in the analysis, and there is a true correlation between features and factors.

3.4 Evaluation metrics
The performances of the models were evaluated using the following evaluation metrics:
Accuracy: the percentage of all instances rightly predicted by the model.

accuracy = (tp + tn) / (tp + tn + fp + fn)    (3)

F1-score: the harmonic mean of precision and recall.

F1-score = 2 × (precision × recall) / (precision + recall)    (4)

Specificity: the proportion of negative instances rightly predicted by the classifier out of the total instances that are actually negative. This shows a model's ability to classify true negative instances as negative.

specificity = tn / (tn + fp)    (5)

Area Under the Receiver Operating Characteristic Curve (AUC): measures the ability of the classifier to distinguish between the positive and negative classes. A perfect classifier has an AUC of one. AUC measures the trade-off between specificity and recall.
Kendall's coefficient of concordance (W): a metric that uses ranks to establish agreement among raters. It measures the agreement among different raters who are evaluating a given set of objects [29]. Depending on the area where it is used, the raters can be variables, characters, and so on. The raters are the different data sets in this work.
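To illustrate Section 3.4, the following sketch computes the four classification metrics with scikit-learn and Kendall's W (with its chi-square test) from a matrix of metric values; the variable names (y_true, y_pred, y_score, scores) are placeholders, and the W computation shown omits the correction for ties.

```python
# Illustrative sketch of the evaluation metrics and of Kendall's W; not the authors' code.
import numpy as np
from scipy.stats import rankdata, chi2
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, confusion_matrix

def classification_metrics(y_true, y_pred, y_score):
    """y_score is the predicted probability of the positive (price-up) class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy":    accuracy_score(y_true, y_pred),   # Eq. (3)
        "f1":          f1_score(y_true, y_pred),          # Eq. (4)
        "specificity": tn / (tn + fp),                    # Eq. (5)
        "auc":         roc_auc_score(y_true, y_score),
    }

def kendalls_w(scores):
    """scores: (m data sets x n models) matrix of one metric's values.
    Each data set (rater) ranks the models; ties receive average ranks.
    Returns W, the chi-square statistic m*(n-1)*W, its p-value, and the mean ranks."""
    m, n = scores.shape
    ranks = np.apply_along_axis(rankdata, 1, scores)   # higher value -> higher rank
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    w = 12.0 * s / (m ** 2 * (n ** 3 - n))             # no tie correction applied
    chi_sq = m * (n - 1) * w
    p_value = chi2.sf(chi_sq, df=n - 1)
    return w, chi_sq, p_value, ranks.mean(axis=0)
```

With m = 7 data sets and n = 10 models, the chi-square test has n − 1 = 9 degrees of freedom, which is the source of the critical value χ² > 16.919 at the 0.05 level used below.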
4 Experimental results
Table 1 provides the accuracy results generated by the various GNB models on the different stock data sets used. The GNB_LDA model produced accuracy results that were better than those of all the other GNB models on each of the stock data sets used. The highest accuracy value recorded by any of the models is 0.9142, generated by the GNB_LDA model on the TATASTEEL data. The lowest accuracy value recorded by any of the models is 0.5012, by the GNB_PCA model on the TATASTEEL stock data. The mean accuracy value of the GNB_LDA model (0.8815) was the highest mean accuracy value, and GNB_PCA produced the lowest mean accuracy value (0.5211). Figure 1 provides the column chart of the accuracy values produced by the GNB models on the different stock data.

Figure 1: Accuracy results of the GNB models on the different stock data sets.

DataSets GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
AAPL 0.5361 0.6241 0.6241 0.5342 0.6444
ABT 0.5713 0.6954 0.6954 0.5361 0.8639
KMX 0.5583 0.6982 0.6982 0.5111 0.8509
S&P_500 0.5722 0.6704 0.6704 0.5472 0.7444
TATASTEEL 0.5461 0.7232 0.7232 0.5012 0.8713
HPCL 0.5197 0.6085 0.6085 0.5126 0.6953
BAC 0.5472 0.7111 0.7111 0.5056 0.8333
Mean 0.5501 0.6758 0.6758 0.5211 0.7862

DataSets GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
AAPL 0.7241 0.6370 0.7111 0.7314 0.8769
ABT 0.8398 0.8407 0.8462 0.8528 0.8861
KMX 0.8648 0.7963 0.7907 0.8176 0.8870
S&P_500 0.7546 0.4537 0.7269 0.7583 0.8259
TATASTEEL 0.8809 0.8701 0.8616 0.8637 0.9142
HPCL 0.7548 0.5832 0.6700 0.6700 0.9092
BAC 0.8435 0.8509 0.8407 0.8454 0.8713
Mean 0.8089 0.7188 0.7782 0.7913 0.8815

Table 1: Accuracy results recorded by the GNB models.

Table 2 shows the F1-score results of the GNB models on the different stock data sets used. The F1-score of the GNB_LDA model was better than that of all the other GNB models on each of the stock data sets. The highest F1-score recorded by any of the models was 0.9167, generated by the GNB_LDA model on the TATASTEEL data. The lowest F1-score recorded by any of the models was 0.4166, produced by GNB_PCA on the TATASTEEL stock data. The mean F1-score of the GNB_LDA model (0.8815) was the highest mean F1-score among the GNB models, and the mean F1-score of the GNB model (0.6244) was the lowest among the various models. Figure 2 represents the column chart of the F1-score results for the GNB models on the different stock data.

Figure 2: F1-score of the GNB models on the different stock data sets.

DataSets GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
AAPL 0.6962 0.5365 0.5365 0.6964 0.5362
ABT 0.6662 0.6803 0.6803 0.6980 0.8740
KMX 0.6175 0.6766 0.6766 0.5629 0.8540
S&P_500 0.6583 0.6139 0.6139 0.7074 0.7058
TATASTEEL 0.4860 0.7461 0.7461 0.4166 0.8747
HPCL 0.6161 0.7074 0.7074 0.6777 0.7659
BAC 0.6304 0.7342 0.7342 0.6454 0.8454
Mean 0.6244 0.6707 0.6707 0.6292 0.7794

DataSets GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
AAPL 0.6740 0.5220 0.6494 0.6875 0.8783
ABT 0.8443 0.8590 0.8534 0.8635 0.8976
KMX 0.8653 0.7835 0.7839 0.8108 0.8891
S&P_500 0.7237 0.3114 0.7053 0.7398 0.8175
TATASTEEL 0.8760 0.8762 0.8626 0.8648 0.9167
HPCL 0.7990 0.7031 0.7532 0.7532 0.9119
BAC 0.8516 0.8546 0.8473 0.8542 0.8790
Mean 0.8048 0.7014 0.7793 0.7963 0.8843

Table 2: F1-scores recorded by the GNB models.

Table 3 presents the specificity results of the models on the different stock data sets used. The GNB_Z-Score_FA model outperformed the other models on AAPL. The GNB_Min-Max_PCA model performed better than the rest of the models on the ABT and TATASTEEL stock data. The GNB_LDA model recorded better specificity results than the rest of the models on the KMX and HPCL stock data.
The GNB_FA model produced better specificity results than the other models on the S&P_500 and BAC stock data. The highest specificity value recorded by any of the GNB models was 0.9800, generated by the GNB_FA model on the S&P_500 data. The lowest specificity value recorded by any of the models was 0.1030, produced by GNB_PCA on the ABT stock data. The mean specificity of the GNB_LDA model (0.8921) was the highest mean specificity among the GNB models, and the mean specificity of the GNB_PCA model (0.2532) was the lowest. Figure 3 presents the column chart of the specificity results of the GNB models on the different stock data.

Figure 3: Specificity of the GNB models on the different stock data sets.

DataSets GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
AAPL 0.1099 0.8728 0.8728 0.1200 0.9423
ABT 0.3094 0.8004 0.8004 0.1030 0.8443
KMX 0.4184 0.7927 0.7927 0.4069 0.8599
S&P_500 0.3538 0.9018 0.9018 0.2131 0.9672
TATASTEEL 0.6717 0.6413 0.6413 0.6544 0.8544
HPCL 0.2754 0.2774 0.2774 0.1621 0.4037
BAC 0.3283 0.6359 0.6359 0.1132 0.7698
Mean 0.3524 0.7032 0.7032 0.2532 0.8059

DataSets GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
AAPL 0.9423 0.9424 0.9523 0.9363 0.9284
ABT 0.8743 0.7665 0.8603 0.8343 0.8343
KMX 0.8925 0.8868 0.8522 0.8848 0.9002
S&P_500 0.9571 0.9800 0.8834 0.9161 0.9632
TATASTEEL 0.9326 0.8326 0.8652 0.8674 0.8957
HPCL 0.5487 0.1843 0.3416 0.3416 0.9006
BAC 0.8038 0.8415 0.8132 0.8000 0.8226
Mean 0.8502 0.7763 0.7955 0.7972 0.8921

Table 3: Specificity values recorded by the GNB models.

Table 4 provides the AUC results of the GNB models on the different stock data sets used. The performance of the GNB_LDA model on each of the stock data sets was better than that of the rest of the GNB models. In general, the highest AUC value recorded by any of the models was 0.9743, generated by the GNB_LDA model on the TATASTEEL data. The smallest AUC value recorded by any of the models was 0.4649, by GNB_PCA on the S&P_500 stock data. The mean AUC value of the GNB_LDA model (0.9563) was the highest mean AUC recorded among the GNB models, and the mean AUC of the GNB_PCA model (0.5197) was the lowest among the various models. Figure 4 represents the column chart of the AUC results for the GNB models on the different stock data.

Figure 4: AUC values of the GNB models on the different stock data sets.
DataSets GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
AAPL 0.5883 0.7216 0.7216 0.5487 0.7438
ABT 0.6046 0.7736 0.7736 0.5517 0.9246
KMX 0.5661 0.7958 0.7958 0.5251 0.9238
S&P_500 0.5830 0.7927 0.7927 0.4649 0.8764
TATASTEEL 0.5688 0.8115 0.8115 0.5022 0.9540
HPCL 0.5085 0.6725 0.6708 0.5172 0.7225
BAC 0.5819 0.8037 0.8037 0.5281 0.9291
Mean 0.5716 0.7673 0.7671 0.5197 0.8677

DataSets GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
AAPL 0.8421 0.6986 0.7982 0.8416 0.9586
ABT 0.9211 0.9232 0.9296 0.9329 0.9603
KMX 0.9447 0.8850 0.8817 0.8994 0.9581
S&P_500 0.8927 0.5914 0.8382 0.8663 0.9282
TATASTEEL 0.9483 0.9421 0.9507 0.9511 0.9743
HPCL 0.8067 0.5757 0.7726 0.7726 0.9617
BAC 0.9364 0.9251 0.9268 0.9305 0.9532
Mean 0.8989 0.7916 0.8711 0.8849 0.9563

Table 4: AUC values recorded by the GNB models.

The ROC curves of the various GNB models on the AAPL, ABT, KMX, S&P_500, TATASTEEL, HPCL, and BAC stock data sets are presented in Figure 5 to Figure 11, respectively. Table 5 to Table 8 present the Kendall's coefficient of concordance rankings of the GNB models using the accuracy, F1-score, specificity, and AUC evaluation results, respectively. The study used a cutoff value of 0.05; the Kendall's coefficient is considered significant and able to assign ranks to the models when p < 0.05 and χ² > 16.919.

Figure 5: ROC curves of the GNB models on the AAPL stock data set.
Figure 6: ROC curves of the GNB models on the ABT stock data set.
Figure 7: ROC curves of the GNB models on the KMX stock data set.
Figure 8: ROC curves of the GNB models on the S&P_500 index stock data set.
Figure 9: ROC curves of the GNB models on the TATASTEEL stock data set.
Figure 10: ROC curves of the GNB models on the HPCL stock data set.
Figure 11: ROC curves of the GNB models on the BAC stock data set.

From Table 5, the Kendall's coefficient was significant for ranking the GNB models using the accuracy results. The GNB_LDA model attained the highest rank. The overall ranking of the models was:
GNB_LDA > GNB_Min-Max_PCA > GNB_Min-Max_FA > GNB_Z-Score_PCA > GNB_Z-Score_FA > GNB_FA > GNB_Z-Score = GNB_Min-Max > GNB > GNB_PCA

Metric: Accuracy, W = 0.84, χ² = 53.07, p = 0.00
Technique GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
Mean Rank 2.14 3.79 3.79 1.14 7.29
Technique GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
Mean Rank 7.86 5.29 6.07 7.50 10.00

Table 5: Kendall's W test of concordance rankings of the GNB models using the accuracy metric.
Table 6 shows that Kendall's coefficient was significant for ranking the GNB models using the F1-score metric. The GNB_LDA model generated the highest rank. The overall ranking was:
GNB_LDA > GNB_Min-Max_PCA > GNB_Min-Max_FA > GNB_Z-Score_PCA > GNB_Z-Score_FA > GNB_FA > GNB_Z-Score = GNB_Min-Max > GNB_PCA > GNB

Metric: F1-Score, W = 0.62, χ² = 39.22, p = 0.00
Technique GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
Mean Rank 2.71 3.36 3.36 3.14 6.43
Technique GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
Mean Rank 7.43 5.00 6.07 7.36 10.00

Table 6: Kendall's W test of concordance rankings of the GNB models using the F1-score metric.

Table 7 indicates that Kendall's coefficient was significant for ranking the GNB models using the specificity metric. The GNB_Min-Max_PCA model produced the highest rank. The overall ranking of the models was:
GNB_Min-Max_PCA > GNB_LDA > GNB_Z-Score_PCA > GNB_Z-Score_FA > GNB_Min-Max_FA > GNB_FA > GNB_Z-Score = GNB_Min-Max > GNB > GNB_PCA

Metric: Specificity, W = 0.70, χ² = 43.76, p = 0.00
Technique GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
Mean Rank 2.29 3.64 3.64 1.43 7.07
Technique GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
Mean Rank 8.50 6.71 6.93 6.57 8.21

Table 7: Kendall's W test of concordance rankings of the GNB models using the specificity metric.

Table 8 shows that Kendall's coefficient was significant for ranking the GNB models using the AUC metric. GNB_LDA generated the highest rank. The overall ranking of the GNB models was:
GNB_LDA > GNB_Min-Max_PCA > GNB_Min-Max_FA > GNB_Z-Score_PCA > GNB_Z-Score_FA > GNB_FA > GNB_Z-Score > GNB_Min-Max > GNB > GNB_PCA

Metric: AUC, W = 0.90, χ² = 56.95, p = 0.00
Technique GNB GNB_Z-Score GNB_Min-Max GNB_PCA GNB_Z-Score_PCA
Mean Rank 1.86 4.00 3.86 1.14 7.29
Technique GNB_Min-Max_PCA GNB_FA GNB_Z-Score_FA GNB_Min-Max_FA GNB_LDA
Mean Rank 8.00 4.43 6.64 7.79 10.00

Table 8: Kendall's W test of concordance rankings of the GNB models using the AUC metric.

5 Conclusion
This study assessed how the GNB algorithm performed with different feature scaling techniques (i.e., standardization scaling and Min-Max scaling) and feature extraction techniques (i.e., PCA, LDA, and FA) in predicting the direction of movement of stock prices, using stock data randomly collected from different stock markets. The performance of the various GNB models was evaluated using the accuracy, F1-score, specificity, and AUC evaluation metrics. Kendall's W test of concordance was used to generate ranks for the GNB models from these evaluation metrics. The experimental results indicated that the application of scaling techniques improved the performance of the GNB model.
Models based on the integration of the GNB algorithm, a feature scaling technique, and a feature extraction technique generated results that were superior to those produced by models based on either the GNB algorithm and a feature scaling technique alone or the GNB algorithm and a feature extraction technique alone, with the exception of GNB_LDA. In general, the model based on the integration of the GNB algorithm and linear discriminant analysis (GNB_LDA) outperformed all the other GNB models considered in three of the four evaluation metrics (i.e., accuracy, F1-score, and AUC). Similarly, the predictive model based on the GNB algorithm, Min-Max scaling, and PCA produced the best rank using the specificity results. In addition, GNB produced better performance with the Min-Max scaling technique than with the standardization scaling technique.

Acknowledgement
This work was supported by the NSFC-Guangdong Joint Fund (Grant No. U1401257), the National Natural Science Foundation of China (Grant Nos. 61300090, 61133016, and 61272527), the Science and Technology Plan Projects in Sichuan Province (Grant No. 2014JY0172), and the Opening Project of the Guangdong Provincial Key Laboratory of Electronic Information Products Reliability Technology (Grant No. 2013A061401003).

References
[1] Fama, E. F., Fisher, L., Jensen, M., & Roll, R. (1969). The adjustment of stock prices to new information. International Economic Review, 10(1), 1–21.
[2] Yeh, I.-C., & Hsu, T.-K. (2014). Exploring the dynamic model of the returns from value stocks and growth stocks using time series mining. Expert Systems with Applications, 41, 7730–7743. https://doi.org/10.1016/j.eswa.2014.06.036
[3] Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2, 1–8. https://doi.org/10.1016/j.jocs.2010.12.007
[4] Smith, V. L. (2003). Constructivist and ecological rationality in economics. American Economic Review, 93, 465–508. https://doi.org/10.1257/000282803322156954
[5] Gandhmal, D. P., & Kumar, K. (2019). Systematic analysis and review of stock market prediction techniques. Computer Science Review, 34, 100190. https://doi.org/10.1016/j.cosrev.2019.08.001
[6] Huang, C.-J., Yang, D.-X., & Chuang, Y.-T. (2008). Application of wrapper approach and composite classifier to the stock trend prediction. Expert Systems with Applications, 34(4), 2870–2878. https://doi.org/10.1016/j.eswa.2007.05.035
[7] Huang, W., Nakamori, Y., & Wang, S.-Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10), 2513–2522. https://doi.org/10.1016/j.cor.2004.03.016
[8] Maragoudakis, M., & Serpanos, D. (2015). Exploiting financial news and social media opinions for stock market analysis using MCMC Bayesian inference. Computational Economics. https://doi.org/10.1007/s10614-015-9492-9
[9] Iqbal, N., & Islam, M. (2019). Machine learning for dengue outbreak prediction: A performance evaluation of different prominent classifiers. Informatica, 43(3), 361–371. https://doi.org/10.31449/inf.v43i3.1548
[10] Abaker, A. A., & Saeed, F. A. (2021). A comparative analysis of machine learning algorithms to build a predictive model for detecting diabetes complications. Informatica, 45(1), 117–125. https://doi.org/10.31449/inf.v45i1.3111
[11] Zhang, Y., & Wu, L. (2009). Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Systems with Applications, 36(5), 8849–8854.
https://doi.org/10.1016/j.eswa.2008.11.028
[12] Meesad, P., & Rasel, R. I. (2013). Predicting stock market price using support vector regression. In 2013 International Conference on Informatics, Electronics and Vision (ICIEV) (pp. 1–6). IEEE. https://doi.org/10.1109/iciev.2013.6572570
[13] Zhou, Z., Gao, M., Liu, Q., & Xiao, H. (2020). Forecasting stock price movements with multiple data sources: Evidence from stock market in China. Physica A: Statistical Mechanics and its Applications, 542, 123389. https://doi.org/10.1016/j.physa.2019.123389
[14] Ampomah, E. K., Qin, Z., Nyame, G., & Botchey, F. E. (2021). Stock market decision support modeling with tree-based AdaBoost ensemble machine learning models. Informatica, 44(4), 363–375. https://doi.org/10.31449/inf.v44i4.3159
[15] Kumar, M., & Thenmozhi, M. (2006). Forecasting stock index movement: A comparison of support vector machines and random forest. SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, January 24, 2006. https://doi.org/10.2139/ssrn.876544
[16] Ou, P., & Wang, H. (2009). Prediction of stock market index movement by ten data mining techniques. Modern Applied Science, 3(12), 28. https://doi.org/10.5539/mas.v3n12p28
[17] Subha, M. V., & Nambi, S. T. (2012). Classification of stock index movement using k-nearest neighbours (k-NN) algorithm. WSEAS Transactions on Information Science & Applications, 9(9), 261–270.
[18] Saifan, R., Sharif, K., Abu-Ghazaleh, M., & Abdel-Majeed, M. (2020). Investigating algorithmic stock market trading using ensemble machine learning methods. Informatica, 44(3), 311–325. https://doi.org/10.31449/inf.v44i3.2904
[19] Zikowski, K. (2015). Using volume weighted support vector machines with walk forward testing and feature selection for the purpose of creating stock trading strategy. Expert Systems with Applications, 42, 1797–1805. https://doi.org/10.1016/j.eswa.2014.10.001
[20] Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42, 259–268. https://doi.org/10.1016/j.eswa.2014.07.040
[21] Sun, S., Wei, Y., & Wang, S. (2018). AdaBoost-LSTM ensemble learning for financial time series forecasting. Computational Science – ICCS 2018, 590–597. https://doi.org/10.1007/978-3-319-93713-7_55
[22] Khan, W., Ghazanfar, M. A., Azam, M. A., et al. (2020). Stock market prediction using machine learning classifiers and social media, news. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-020-01839-w
[23] Bhandare, Y., Bharsawade, S., Nayyar, D., Phadtare, O., & Gore, D. (2020). SMART: Stock Market Analyst Rating Technique using Naive Bayes classifier. 2020 International Conference for Emerging Technology (INCET). https://doi.org/10.1109/incet49848.2020.9154002
[24] Saranya, C., & Manikandan, G. (2013). A study on normalization techniques for privacy preserving data mining. International Journal of Engineering and Technology (IJET), 5(3), 2701–2714.
[25] Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459. https://doi.org/10.1002/wics.101
[26] Bro, R., & Smilde, A. K. (2014). Principal component analysis. Analytical Methods, 6(9), 2812–2831.
[27] Tharwat, A., Gaber, T., Ibrahim, A., & Hassanien, A. E. (2017). Linear discriminant analysis: A detailed tutorial. AI Communications, 30(2), 169–190. https://doi.org/10.3233/aic-170729
[28] Maskey, R., Fei, J., & Nguyen, H. O.
(2018). Use of exploratory factor analysis in maritime research. The Asian Journal of Shipping and Logistics, 34(2), 91–111. https://doi.org/10.1016/j.ajsl.2018.06.006
[29] Kendall, M. G., & Babington Smith, B. (1939). The problem of m rankings. The Annals of Mathematical Statistics, 10, 275–287.

Appendix

Data Set, Stock Market, Time Frame, Number of Samples
AAPL, NASDAQ, 2005-01-01 to 2019-12-30, 3773
ABT, NYSE, 2005-01-01 to 2019-12-30, 3773
BAC, NYSE, 2005-01-01 to 2019-12-30, 3773
S&P_500, INDEXSP, 2005-01-01 to 2019-12-30, 3773
HPCL, NSE, 2005-01-01 to 2019-12-30, 3278
KMX, NYSE, 2005-01-01 to 2019-12-30, 3773
TATASTEEL, NSE, 2005-01-01 to 2019-12-30, 3476

Table A1: Details of the stock data sets used.

Volume indicators and their descriptions:
Chaikin A/D Line (ADL) – Estimates the advance/decline of the market.
Chaikin A/D Oscillator (ADOSC) – An indicator of another indicator; it is created by applying MACD to the Chaikin A/D Line.
On Balance Volume (OBV) – Uses volume flow to forecast changes in the price of a stock.

Table A2: Description of volume indicators used in the study.

Price transform functions and their descriptions:
Median Price (MEDPRICE) – Measures the mid-point of each day's high and low prices.
Typical Price (TYPPRICE) – Measures the average of each day's price.
Weighted Close Price (WCLPRICE) – Average of each day's price with extra weight given to the closing price.

Table 3: Description of price transform functions.

Overlap studies indicators and their descriptions:
Bollinger Bands (BBANDS) – Describe the different highs and lows of a financial instrument over a particular duration.
Weighted Moving Average (WMA) – Moving average that assigns a greater weight to more recent data points than to past data points.
Exponential Moving Average (EMA) – Weighted moving average that puts greater weight and importance on current data points; however, the rate of decrease between a price and its preceding price is not consistent.
Double Exponential Moving Average (DEMA) – Based on the EMA; attempts to provide a smoothed average with less lag than the EMA.
Kaufman Adaptive Moving Average (KAMA) – Moving average designed to be responsive to market trends and volatility.
MESA Adaptive Moving Average (MAMA) – Adjusts to movement in price based on the rate of change of phase as determined by the Hilbert transform discriminator.
Midpoint Price over period (MIDPRICE) – Average of the highest close minus the lowest close within the look-back period.
Parabolic SAR (SAR) – Highlights potential reversals in the direction of the market price of securities.
Simple Moving Average (SMA) – Arithmetic moving average computed by averaging prices over a given time period.
Triple Exponential Moving Average (T3) – A triple-smoothed combination of the DEMA and EMA.
Triple Exponential Moving Average (TEMA) – An indicator used for smoothing price fluctuations and filtering out volatility; provides a moving average with less lag than the classical exponential moving average.
Triangular Moving Average (TRIMA) – Moving average that is double smoothed (averaged twice).

Table A3: Description of overlap studies indicators used in the study.

Momentum indicators and their descriptions:
Average Directional Movement Index (ADX) – Measures how strong or weak (the strength of) a trend is over time.
Average Directional Movement Index Rating (ADXR) – Estimates momentum change in the ADX.
Absolute Price Oscillator (APO) – Computes the difference between two moving averages.
Aroon – Used to find changes in trends in the price of an asset.
Aroon Oscillator (AROONOSC) – Used to estimate the strength of a trend.
Balance of Power (BOP) – Measures the strength of buyers and sellers in moving stock prices to the extremes.
Commodity Channel Index (CCI) – Determines the current price level relative to an average price level over a period of time.
Chande Momentum Oscillator (CMO) – Estimated by computing the difference between the sum of recent gains and the sum of recent losses.
Directional Movement Index (DMI) – Indicates the direction of movement of the price of an asset.
Moving Average Convergence/Divergence (MACD) – Uses moving averages to estimate the momentum of a security asset.
Money Flow Index (MFI) – Utilizes price and volume to identify buying and selling pressures.
Minus Directional Indicator (MINUS_DI) – Component of the ADX; used to identify the presence of a downtrend.
Momentum (MOM) – Measurement of price changes of a financial instrument over a period of time.
Plus Directional Indicator (PLUS_DI) – Component of the ADX; used to identify the presence of an uptrend.
Log Return – The log return for a period of time is the sum of the log returns of the partitions of that period; it assumes that returns are compounded continuously rather than across sub-periods.
Percentage Price Oscillator (PPO) – Computes the difference between two moving averages as a percentage of the larger moving average.
Rate of Change (ROC) – Measure of the percentage change of the current price with respect to the closing price n periods ago.
Relative Strength Index (RSI) – Determines the strength of the current price in relation to the preceding price.
Stochastic (STOCH) – Measures momentum by comparing the closing price of a security with its earlier trading range over a specific period of time.
Stochastic Relative Strength Index (STOCHRSI) – Used to estimate whether a security is overbought or oversold; it measures RSI over its own high/low range over a specified period.
Ultimate Oscillator (ULTOSC) – Estimates the price momentum of a security asset across different time frames.
Williams' %R (WILLR) – Indicates the position of the last closing price relative to the highest and lowest prices over a time period.

Table A4: Description of momentum indicators used in the study.