https://doi.org/10.31449/inf.v47i1.4297                                                                                           Informatica 47 (2023) 97–108   97 
A Prediction Model for Student Academic Performance Using 
Machine Learning  
Harjinder Kaur¹, Tarandeep Kaur¹, Rachit Garg²
¹ School of Computer Applications, Lovely Professional University, Phagwara, 144401, India
² COS, School of Computer Science and Engineering, Lovely Professional University, Phagwara, 144401, India
E-mail: Harjinder.12962@lpu.co.in, Tarandeep.24836@lpu.co.in, rachit.garg@lpu.co.in 
Keywords: academic performance, decision tree, educational data mining, ensemble model, naïve Bayes, performance prediction
Received: July 15, 2022 
Abstract: Academic data mining significantly impacts a large number of educational institutions and plays
a prime role in accumulating, studying, and analyzing academic data. The accumulated academic
data can be processed and analyzed for various purposes. It can be used to predict student
academic performance and thereby improve the retention rate of academic institutions. Predicting
students’ academic performance at an early stage helps students identify their weak subjects
so that they can focus on them and improve their academic performance.
Currently, numerous machine learning techniques are used by academic institutions to extract,
analyze, and predict students’ academic performance and to identify fast and slow learners. This
paper proposes an ensemble model that uses the voting method for early prediction of student
academic performance. The predicted results can be used by poor performers to
concentrate more on their weak courses. Accordingly, instructors can focus on creating and
implementing novel strategies or amending existing pedagogical tools and approaches to help slow
learners improve their performance. The proposed model has been tested on the academic data of
an educational institution using the RapidMiner tool. The results show how the number of E grades
affects the academic performance of students. The proposed ensemble model
generates predictions with an accuracy of 90.83%.
Povzetek (Summary): A machine learning method for predicting academic performance is presented.
 
1   Introduction 
Academic Data Mining (ADM) has attracted considerable
interest in recent years. The need to analyze and assess
the factors that affect the academic performance of
students has increased the demand for Academic Data
Mining (ADM), also called Educational Data Mining
(EDM) [1]. Such factors include the final grades
obtained, course attendance, mid-assessment marks,
etc. [2]. ADM plays a pivotal role in analyzing student
performance based on these factors and thereby
classifying students into fast and slow learners.
Additionally, ADM can provide suggestions and
recommendations that help both instructors and students
improve their performance. This can involve processes
such as academic performance prediction and academic
performance recommendation. Both processes are
essential for every educational institution, as its
reputation depends on the academic accomplishments of
its students [3]. The primary goal of academic
performance prediction is to identify students at risk at
an early stage of their academic career. This
identification helps the instructor analyze the factors
affecting performance so that corrective actions can be
taken for students at risk of low achievement. Moreover,
the timely analysis of weak performers helps academic
institutions increase their retention rate [4].
The academic performance of students is predicted
using different supervised learning techniques such as
classification and prediction. Learning Analytics (LA)
plays a very significant role in the field of education.
Academic institutions use LA to analyze the patterns
obtained from educational data after prediction. After
academic performance prediction, LA is used in
association with ADM to generate results that lead to the
categorization of different types of students [5].
This research proposes a model that serves as an
early-warning system for educational organizations.
Students can use the proposed model to discover and
concentrate on their weak subjects, while faculty
members can focus on improving their teaching
strategies for such students. Currently, many machine
learning algorithms are available for predicting student
educational performance and for ADM [6, 7]. The
proposed model is an ensemble machine learning model
that predicts students’ academic performance using an
ensemble of three machine learning algorithms:
Decision Tree, Naïve Bayes, and K-Nearest Neighbor.
For performance prediction, the records are collected
from the academic institution and then pre-processed to
eliminate anomalies, so that only anomaly-free data is
used for the analysis. The cleaned data is then applied to
the model, which produces the predicted results.
 
1.1   Motivation for the work 
Currently, the majority of academic institutions
face challenges related to declining student academic
performance and the resulting rise in the student
dropout ratio. This is an alarming situation that puts the
interests of academic institutions at stake. They
consistently struggle to maintain the retention rate of
students. Similarly, a decline in academic performance
affects a student psychologically, economically, and
socially. Some students get demotivated and
consequently think of discontinuing their degree, which
increases the dropout rate of the academic institution.
Such circumstances are also challenging for teachers,
since failure or a decline in student academic
performance raises questions about the overall conduct
of the teacher, including his/her teaching capabilities
and pedagogical approach.
The proposed ensemble prediction model has been
developed with these circumstances in mind. It aims to
reduce drop-out rates and improve the retention rate of
students. It addresses the rising drop-out problem faced
by institutions by predicting the academic performance
of students accurately and efficiently. The proposed
model has been trained using historical student data and
then tested using a testing dataset. The predicted results
classify students into slow learners and fast learners.
The proposed model serves as an early-warning system
for slow learners, that is, students who are at academic
risk at an early stage of their career, and it also identifies
the courses affecting their performance. The early
identification of students at academic risk helps
instructors create new pedagogies, strategies, and special
academic counselling sessions for weak students.
Additionally, such initiatives help slow learners
concentrate more on their weak areas so that they can
improve their academic performance. Improving
academic performance at an early stage helps slow
learners complete their degree on time, which improves
the retention rate and, in turn, the reputation of academic
institutions.
Overall, the proposed model is useful for academic
stakeholders, including learners, instructors/teachers, and
educational institutions. It helps learners assess their
own academic standing by indicating the reasons
responsible for their academic decline. The model
assists instructors in keeping track of the academic
growth of students and helps them give special attention
to slow learners. The predicted results of the proposed
model help educational institutions devise new
strategies and steps for supporting and educating slow
learners so that their performance improves, thereby
increasing the retention rate of the institutions.
The rest of the paper is divided into four
sections. Section 2 presents a tabular summary of the
existing techniques used for predicting students’
academic performance. The proposed model is
discussed in detail, along with its structure and
working, in Section 3. Section 4 covers the empirical
analysis of the proposed model on the collected data.
Section 5 concludes the paper with a brief description of
why the ensemble approach has been preferred for
predicting students’ academic performance and gives an
insight into future extensions of the proposed model.
2    Literature review 
Existing educational research shows that the
intersection of academic data and machine learning
techniques is advantageous for carrying out
interdisciplinary work [8]. Research on educational data
helps in identifying and selecting the various factors
that yield empirically supported academic results.
Applying machine learning techniques to collected
academic records can help in developing dynamic
early-warning systems. Such systems are beneficial for
both instructors/tutors and learners, who can work on
their weak areas [9, 10].
Subsequently, learners can improve their
academic performance based on the feedback from the
predicted results of such systems, so that they can
complete their respective degrees on time and with
minimum dropouts or backlogs. Table 1 reviews the
literature along with the techniques used and the
objective of each model, and Figure 1 shows the
categorization of the different prediction models based on
the machine learning techniques used.
 
 
 
 
 
 
Table 1: Existing academic performance prediction models.
| Prediction Models | Machine Learning Technique(s) Used | Core Objective |
|---|---|---|
| [1] | Decision Trees, Support Vector Machines, Naive Bayes, Bagged Trees, and Boosted Trees | Early segmentation of students based upon their performance in the first year, which helps in achieving better results by course completion. |
| [3] | Decision Trees | To categorize the students based upon their performance. |
| [4] | Logistic Regression, Neural Networks, Random Forests | To identify the various challenges faced by students in their first educational year based upon student registration data. |
| [5] | Decision Trees, Rule and Fuzzy Rule Induction Methods, and Neural Networks | To predict the marks of university students in their final exams. |
| [11] | Logistic/Linear Regression, Matrix Factorization | To use educational data for an intelligent tutoring system. |
| [12] | Linear Regression, Neural Networks, Support Vector Machines | To predict the student score based upon mid-term marks. |
| [13] | Neural Networks, Random Forests, and Decision Tree | To predict the academic performance of first-year students. |
| [14] | Linear Regression, Neural Networks, Support Vector Machines, Decision Trees, Naive Bayes, K-Nearest Neighbor | To provide various courses based upon the existing data, which helps in improving the academic performance of a student. |
| [15] | Decision Tree, Gradient Boost algorithm, and Naïve Bayes | To identify the weak students and provide special counselling for their betterment. |
| [16] | SVM and Naïve Bayes | To predict the student’s academic performance using Naïve Bayes and compare the predicted results with the results generated by SVM. |
| [17] | K-Nearest Neighbor, Naïve Bayes, Decision Tree, and Logistic Regression | To predict the student’s academic performance along with the factors affecting it. |
| [18,19] | Decision Tree | To assess the student’s academic performance using the decision tree. The predicted results were used to provide recommendations to weak students so that they can improve their performance, which lowers the failure rate. |
| [20] | Naïve Bayes, Neural Network, and Decision Tree | To use various data mining techniques to predict and analyse the academic performance of students from the academic data made available by a participating forum. |
| [21,22] | Random Forest, Neural Networks, SVMs, and Regression Techniques | EDM was used to identify the weak students based upon their performance. It also helps in identifying the various factors responsible for affecting and deteriorating the academic performance of the students. |
 
 
Figure 1: Categorization of existing prediction models based on the machine learning techniques used. 
 
3   Proposed ensemble model 
The primary goal of creating an ensemble model
is to produce more accurate results than those produced
by an individual classifier. The proposed model uses an
ensemble of heterogeneous classifiers: it accepts the
outputs of multiple classifiers, namely Decision Tree,
Naïve Bayes, and K-NN, and combines them using a
voting approach to produce the final prediction. An
ensemble is beneficial only when the selected classifiers
are diverse, that is, when they can produce different
class labels rather than always agreeing on the same
decision. Figure 2 depicts the flow of the ensemble
method.
 
Figure 2: Basic ensemble approach for prediction. 
 
The proposed ensemble model classifies
students based on their academic performance,
considering their marks in each course inclusive of their
attendance in that course. The data for classification has
been collected using a Google Form and a designed
interface. Certain attributes contain irrelevant values,
such as incomplete data, duplicate data, and naming or
identification problems, and hence do not contribute to
the classification process. Such attributes were therefore
removed; otherwise they could have increased the
classification errors and the complexity of the selected
algorithm. Removing them helped make the predictions
more accurate.
The proposed ensemble model has been designed
to predict student academic performance using an
ensemble of machine learning algorithms. When
designing an ensemble model, the selected classifiers
should complement each other in their judgments so
that higher accuracy can be achieved [23]. The model
computes the student academic performance (in terms of
Cumulative Grade Points) and achieves an early
separation of learners, segregating them into slow and
fast learners based upon their educational performance.

Figure 1 content (technique used: cited works)
Decision Tree: [1], [9], [13], [14], [15], [16], [18], [19], [20], [26]
Support Vector Machines: [1], [12], [14], [17], [21]
Naive Bayes: [1], [14], [15], [17], [18], [20]
Neural Networks: [9], [12], [13], [14], [22], [27]
Random Forest: [13], [22], [27]
Bagged Trees and Boosted Trees: [1], [15]
Linear Regression: [11], [12], [14], [22]
K-Nearest Neighbour: [14], [18]
Logistic Regression: [18], [28], [27]
3.1 Working of the proposed ensemble model
When it comes to predicting student academic
performance, a single classification model might not
produce an appropriate outcome. Moreover, single
classification models suffer from high variance [24,
25]. In the proposed ensemble approach, the outputs of
multiple models are combined, which enhances the
overall accuracy of the prediction results. Several
ensemble approaches exist, such as bagging, boosting,
stacking, and voting, each with its pros and cons.
The proposed model uses the voting technique, in
which the prediction results are produced by combining
the outputs of multiple classifiers. The results generated
by the voting approach are better than those of a single
classifier because the decision depends on the majority
vote [26]. The voting approach has been chosen because
it produces predictions with lower variance than a
single classification model [27, 28].
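The majority-vote rule described above can be illustrated with a minimal Python sketch. The three prediction lists are hypothetical values made up for illustration, not outputs of the paper's RapidMiner process, and the tie-breaking rule is an arbitrary assumption (with three voters and two classes no tie can occur).

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by most base classifiers.

    Ties are broken in favour of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of the three base classifiers for four students
# (0 = fast learner, 1 = slow learner).
decision_tree = [0, 1, 1, 0]
naive_bayes = [0, 1, 0, 0]
knn = [1, 1, 0, 0]

# Combine the three votes student by student.
ensemble = [majority_vote(votes) for votes in zip(decision_tree, naive_bayes, knn)]
print(ensemble)  # [0, 1, 0, 0]
```

A single classifier that errs on one student (for example, K-NN on the first student) is outvoted by the other two, which is the intuition behind the lower variance of the combined prediction.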
Students are the key component of the proposed
ensemble model, as they provide their academic details
as input. The academic details comprise their courses,
the marks/grade in each course, and the attendance in
each course, as these parameters are considered crucial
for measuring the academic performance of students.
An interface has been designed to collect the academic
details of the students that are used for model testing.
The interface supports heterogeneous devices: learners
can provide their academic inputs using smartphones,
laptops, or desktops.
The students enter their educational details through
the designed student interface. This academic data is
stored in an academic database and is the core asset for
the prediction process. The stored data forms the student
records, which are pre-processed and then used to train
the proposed model. During the pre-processing stage, the
academic records are integrated and then checked for
inconsistencies such as duplicates, missing values, etc.
The pre-processing stage thus generates the refined data
that is used to train the proposed ensemble model.
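As an illustration of the pre-processing checks mentioned above (duplicates and missing values), the following Python sketch filters a small list of records. The field names `reg_no`, `course`, `marks`, and `attendance` and the sample records are hypothetical; the paper does not list the actual schema or the exact cleaning rules.

```python
def preprocess(records, required=("reg_no", "course", "marks", "attendance")):
    """Drop records with missing values and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for rec in records:
        # Skip records where any required field is absent or None.
        if any(rec.get(field) is None for field in required):
            continue
        # Skip records already seen with identical field values.
        key = tuple(rec[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Hypothetical raw records collected through the student interface.
raw = [
    {"reg_no": 101, "course": "CS101", "marks": 72, "attendance": 88},
    {"reg_no": 101, "course": "CS101", "marks": 72, "attendance": 88},    # duplicate
    {"reg_no": 102, "course": "CS101", "marks": None, "attendance": 91},  # missing marks
    {"reg_no": 103, "course": "CS101", "marks": 35, "attendance": 60},
]

clean = preprocess(raw)
print(len(clean))  # 2
```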
In the proposed ensemble model, the training
dataset is used to generate the rules that are used for
prediction, as shown in Figure 3. The testing dataset is
then applied to the constructed ensemble model to
obtain the predicted academic performance based upon
the rules generated from the training dataset.
 
Figure 3: Proposed ensemble model. 
 
The predicted results of the model are beneficial for
both the instructor and the learner. They enable the
instructor to scrutinize the students' academic results
and derive their performance from them, which can then
be used to take new strategic actions for improving the
performance of slow learners. At the same time, this
helps in recognizing students at academic risk at an
early stage of their academics, which helps increase the
student retention rate and the completion of degrees on
time. The predicted academic performance is also used
as feedback by the students.
3.2 Mathematical formulation and proposed algorithm
Analytically, the proposed algorithm categorizes
learners into strong and weak learners. The
differentiation identifies the weak learners and also the
courses in which they have underperformed. This helps
the weak learners concentrate more on the subjects in
which they are lagging and consequently improve their
performance. Identifying weak performers at an early
stage guides them to perform well in their end-term
exams. Mathematically, in order to categorize the
students, their CGPA is calculated by considering their
grade points and the credits of each course. To calculate
the CGPA, the student’s grade points are first computed
from the marks obtained in each course, as shown in
Table 2.
The proposed model is based on the following
assumptions:
• The CGPA of students is calculated by
considering the grade points of each course. In the
proposed model, grade points are on a 10-point
scale.
• The results of the proposed ensemble model for
2nd-semester students can further be used to
recommend courses, because in the majority of
universities course selection starts from the second
year onwards.
• The number of subjects considered for the
calculation of CGPA is 8.
• The total marks of each course are inclusive of
attendance marks.
• For predicting student academic performance,
the grades considered range from A to E.
The following table shows the grade points and
grades corresponding to each range of marks:
 
Table 2: Grade as per marks range.
| Range of Marks | Grade Point | Grade |
|---|---|---|
| 90 - 100 | 9.0 - 10.0 | A+ |
| 80 - 89 | 8.0 - 8.9 | A |
| 70 - 79 | 7.0 - 7.9 | B+ |
| 60 - 69 | 6.0 - 6.9 | B |
| 50 - 59 | 5.0 - 5.9 | C |
| 40 - 49 | 4.0 - 4.9 | D |
| < 40 | 0.0 - 3.9 | E |
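The mapping in Table 2 can be written as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the rule that the grade point equals the marks divided by 10 is an assumption that is consistent with every row of Table 2 but is not stated explicitly in the paper.

```python
# Lower bound of each marks band and its grade, from Table 2 (highest first).
GRADE_BANDS = [(90, "A+"), (80, "A"), (70, "B+"), (60, "B"), (50, "C"), (40, "D")]

def grade_for(marks):
    """Return the letter grade for marks in the range 0..100."""
    if not 0 <= marks <= 100:
        raise ValueError("marks must be between 0 and 100")
    for lower, grade in GRADE_BANDS:
        if marks >= lower:
            return grade
    return "E"  # anything below 40

def grade_point_for(marks):
    """Assumed rule: grade point = marks / 10 on the 10-point scale."""
    if not 0 <= marks <= 100:
        raise ValueError("marks must be between 0 and 100")
    return round(marks / 10, 1)

print(grade_for(85), grade_point_for(85))  # A 8.5
print(grade_for(39), grade_point_for(39))  # E 3.9
```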
 
Objective Function: $\mathrm{Map}(Stu_i, Cou_j, MC_{ij}) \xrightarrow{\text{yields}} cgpa$   (1)
Where:
$Stu$: Students
$Cou$: Courses
$MC_{ij}$: Marks obtained by the $i^{th}$ student in the $j^{th}$ course, such that $i \in S$ and $j \in R$.
$CGPA$: Cumulative Grade Point Average.
$i$: Index of students, $i \in S$, where $S = \{1 \le i \le n\}$ is the set of students and $n$ is the maximum number of students.
$j$: Index of courses, $j \in R$, where $R = \{1 \le j \le m\}$ is the set of courses and $m$ is the maximum number of courses.
 
To accomplish the objective function, a map
function has been devised. The mapping function
predicts the performance of the students by calculating
their CGPA based upon the academic details given by
the students. Here, Map is the function that maps the
$i^{th}$ student to the corresponding CGPA by considering
the student's courses and the marks in each course. The
general formula for the calculation of CGPA is given in
Eq. (2).
\[ CGPA = \frac{\sum (G \times CR)}{\sum CR} \qquad (2) \]

Where:
$CGPA$: Cumulative Grade Point Average
$CR$: Represents the credit score of a course
$G$: Represents the grade points obtained by the student in a course.
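Eq. (2) is a credit-weighted average of grade points. As a minimal sketch, the Python function below evaluates it for one student; the grade points and credits in the example are made-up values, and the function name is hypothetical.

```python
def cgpa(grade_points, credits):
    """Credit-weighted average of grade points, as in Eq. (2)."""
    if len(grade_points) != len(credits) or not credits:
        raise ValueError("need one credit value per grade point")
    total_credits = sum(credits)
    if total_credits <= 0:
        raise ValueError("total credits must be positive")
    # Numerator: sum of G * CR; denominator: sum of CR.
    weighted = sum(g * cr for g, cr in zip(grade_points, credits))
    return weighted / total_credits

# Hypothetical student with three courses.
G = [8.0, 6.0, 9.0]   # grade points per course
CR = [4, 3, 3]        # credits per course

# (8.0*4 + 6.0*3 + 9.0*3) / (4 + 3 + 3) = 77 / 10
print(cgpa(G, CR))
```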
The proposed model is composed of a set $Stu = \{stu_1, stu_2, stu_3, \ldots, stu_n\}$ of $n$ students such that
$Stu = \{stu_i \mid 1 \le i \le n\}$ specifies the number of students; a set $Cou = \{cou_1, cou_2, cou_3, \ldots, cou_m\}$
represents the $m$ different subjects such that $Cou = \{cou_j \mid 1 \le j \le m\}$.
Let $g_{ij}$ denote the grade points obtained by the $i^{th}$ student in the $j^{th}$ course. If $cgpa_i$ is the CGPA of the $i^{th}$
student, then it can be obtained by the matrix form specified in Eq. (3):

\[
Cgpa_i =
\begin{bmatrix} cgpa_1 \\ cgpa_2 \\ \vdots \\ cgpa_n \end{bmatrix}
= \frac{1}{\sum_{j=1}^{m} cr_j}
\begin{bmatrix}
g_{11} & g_{12} & \cdots & g_{1m} \\
g_{21} & g_{22} & \cdots & g_{2m} \\
\vdots & \vdots & & \vdots \\
g_{n1} & g_{n2} & \cdots & g_{nm}
\end{bmatrix}
\begin{bmatrix} cr_1 \\ cr_2 \\ \vdots \\ cr_m \end{bmatrix}
\qquad (3)
\]

Where $cr_j$ denotes the credits corresponding to the $j^{th}$ course, $\forall\, 1 \le j \le m$, and the proposed algorithm is shown as follows:
Objective Function: Map each student to a CGPA by considering the program courses and the marks in each individual
course, which affect the student’s academic performance. The input consists of {Students, Courses, Marks in each course}.
Input: Student academic details
Output: Student categorization into weak and strong learners; special inputs to weak students for improving their
performance.
1. Perform pre-processing of the collected data.
2. Use the pre-processed data as the training dataset.
3. Use the training dataset to train the model for the generation of rules.
4. Use the testing data to predict performance with the trained model;
 $\{stu_i, cou_j, MC_{ij}\}$ is applied to Map to get $cgpa_i$, using Eq. (1).
5. (a) Eq. (2) specifies the general formula for the calculation of CGPA.
(b) $cgpa_i$ is computed using Eq. (3), where $cgpa_i$ is the CGPA of an individual student.
6. The calculated CGPA helps in the identification of weak and strong learners.
7. The predicted results are used by:
Learners (to improve their performance).
Instructors (to provide suggestive measures to poor performers).
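Steps 5 and 6 of the algorithm can be sketched in Python as a matrix-vector product followed by a simple categorization. This is only an illustration under stated assumptions: the grade-point matrix and credits are made up, the CGPA cut-off of 6.0 is hypothetical (the paper does not state one, and its actual categorization is produced by the trained ensemble), and flagging courses with grade points below 4.0 follows the E-grade band of Table 2.

```python
def cgpa_vector(G, cr):
    """Eq. (3): row i of G times the credit vector, divided by total credits."""
    total = sum(cr)
    return [sum(g * c for g, c in zip(row, cr)) / total for row in G]

def categorize(cgpas, threshold=6.0):
    """Label students as weak or strong; the threshold is a hypothetical cut-off."""
    return ["weak" if value < threshold else "strong" for value in cgpas]

def e_grade_courses(row, e_limit=4.0):
    """Indices of courses whose grade point falls in the E band of Table 2."""
    return [j for j, g in enumerate(row) if g < e_limit]

# Hypothetical grade-point matrix: 2 students (rows) x 3 courses (columns).
G = [[9.0, 8.0, 7.0],
     [4.0, 5.0, 3.0]]
cr = [4, 3, 3]  # credits per course

cgpas = cgpa_vector(G, cr)
labels = categorize(cgpas)
print(labels)                 # ['strong', 'weak']
print(e_grade_courses(G[1]))  # [2]
```

The second student is labelled weak, and the course at index 2 is flagged as the one pulling the CGPA down, which mirrors the kind of feedback the algorithm is meant to give to learners and instructors.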
 
4   Results 
The experimental results have been obtained using
data from the department of computer science of an
academic institution. The dataset contains 400 records of
current students belonging to different sections of the
computer science department. The dataset has been
divided using the split operator: 70% of the data is used
for training the model and the remaining 30% is used
for testing the ensemble model. The major attributes
considered for analyzing the performance are the
attendance in each course, the grade obtained in each
course, the overall CGPA of the student, and the number
of pending E grades. Figure 4 shows the results
generated by the proposed ensemble model.
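For readers who want to reproduce the 70/30 partition outside RapidMiner, a plain-Python stand-in might look as follows. It is not the RapidMiner split operator: the shuffling strategy and the seed are assumptions, and integer placeholders are used instead of real student records.

```python
import random

def split_dataset(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 400 placeholder records standing in for the student dataset.
records = list(range(400))
train, test = split_dataset(records)
print(len(train), len(test))  # 280 120
```

With 400 records this yields 280 training and 120 testing instances; the 120 test instances match the total of the confusion matrix in Table 3.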
 
Figure 4: Results generated by ensemble method. 
 
The performance vector in Table 3 reports the
accuracy of the ensemble method using a vote operator,
which takes the majority vote of the base learners to
predict the results. The ensemble method achieves an
accuracy of 90.83%. In the confusion matrix, 0
represents good performers (fast learners) and 1 denotes
poor performers (slow learners). Of the 92 instances
predicted as good performers, 82 are correctly identified
whereas 10 are incorrectly identified. Similarly, of the 28
instances predicted as poor performers, 27 are correctly
identified whereas 1 is incorrectly identified.
 
 
Table 3: Performance vector (ensemble method).
| | true 0 | true 1 | class precision |
|---|---|---|---|
| pred. 0 | 82 | 10 | 89.13% |
| pred. 1 | 1 | 27 | 96.43% |
| class recall | 98.80% | 72.97% | |
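The figures in Table 3 can be recomputed directly from the four confusion-matrix counts. The short Python sketch below does so; only the counts come from the paper, while the variable names and layout are illustrative.

```python
# cm[p][t] = number of instances predicted as class p whose true class is t
# (rows = predicted class, columns = true class, as in Table 3).
cm = [[82, 10],
      [1, 27]]

total = sum(sum(row) for row in cm)

# Accuracy: correctly classified instances over all instances.
accuracy = (cm[0][0] + cm[1][1]) / total

# Class precision: correct predictions of class c over all predictions of c.
precision = [cm[c][c] / sum(cm[c]) for c in (0, 1)]

# Class recall: correct predictions of class c over all true instances of c.
recall = [cm[c][c] / (cm[0][c] + cm[1][c]) for c in (0, 1)]

print(f"accuracy  = {accuracy:.2%}")                        # 90.83%
print("precision =", [f"{p:.2%}" for p in precision])       # 89.13%, 96.43%
print("recall    =", [f"{r:.2%}" for r in recall])          # 98.80%, 72.97%
```

The lower recall for class 1 (72.97%) reflects the 10 slow learners that the ensemble predicted as fast learners.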
 
 
 
Figure 5: Relationship between actual and predicted performance. 
 
 
 
[Chart: Predicted performance by proposed ensemble model. pred. 0: true 0 = 82, true 1 = 10; pred. 1: true 0 = 1, true 1 = 27.]
The 3D scatter plot of the relationship between the
actual and predicted results generated by the proposed
ensemble model is illustrated in Figure 5, where the x axis
represents the RegdNo and the value column signifies
performance in terms of slow (1) and fast (0) learners.
 
 
 
 
Figure 6: Actual and predicted results based on E grades.
 
 
 
 
Figure 7: Predicted performance based upon pending E grades. 
 
 
The actual and predicted results based on E grades are
depicted in Figures 6 and 7. Figure 8 illustrates the
registration-wise predicted performance of students after
the re-appear exam has been taken. The blue circles
indicate good performance, signified by 0, whereas the
green circles represent poor performance, denoted by 1.
The results show that the more re-appear exams a
student has, the more likely the student is to be placed in
the category of poor performers. Therefore, corrective
actions for such students need to be taken on time by
both the student and the instructor.
 
Figure 8: Predicted results considering reappear exam given. 
 
5    Conclusion and future directions 
Presently, academic institutions are facing
difficulty in sustaining their student retention rate. The
retention rate can only be maintained by reducing the
drop-out ratio of students. A high student retention rate
depends significantly on student academic performance.
It is therefore highly important for academic institutions
to predict student performance for subsequent sessions,
so that the retention rate can be maintained and student
performance can be improved. Also, predicting student
academic performance at an early phase of the degree
helps students assess the causes of their decline, so that
they can take corrective actions to improve their
performance on time. The model is also helpful for
instructors, who can verify and revise their pedagogical
approaches if required.
A lot of research is being carried out to develop models
for predicting student academic performance using
academic data mining strategies. Various machine
learning techniques have been used to develop such
prediction models, which act as an aid for academic
institutions. This paper proposes an ensemble model
based on the Decision Tree, Naïve Bayes, and K-NN
classification algorithms to address such problems. It
helps in identifying weak learners by predicting their
performance based upon historical academic data. The
model has been implemented on a gathered dataset and
achieves an accuracy of 90.83%.
The research work presented in this paper can be
further extended to develop a recommender system that
uses the performance prediction results to recommend
course-specific electives to students. Such
recommendations tend to augment student skills
depending on their performance. Additionally, a
recommender system can be developed that offers
students interest-oriented or choice-driven suggestions
regarding course selection by mapping the student’s
previous performance to the student's choices. Most
research on academic performance prediction considers
direct factors (such as courses, marks in each course,
attendance, and grades). Indirect factors (such as
psychological, behavioral, economic, and social factors)
that affect student academic performance can be
incorporated in future work.
Recently, several edtech companies have emerged
during the COVID-19 era. Such companies incorporate
Information Technology (IT) and digital tools into
student learning and engagement. Edtech companies are
now using predictive analytics for mining student
academic records, enrollment, attendance, class
engagement, etc. They can use prediction as well as
recommendation models to help students by suggesting
appropriate courses based upon their predicted
performance.
Acknowledgement: Mohamed Alwanin would like to 
thank the Deanship of Scientific Research at Majmaah 
University for supporting this work under Project No. 
R-2022-###. The authors deeply acknowledge the 
Researchers Supporting Program 
(TUMA-Project-2021-14), AlMaarefa University, Riyadh, Saudi Arabia for 
supporting the steps of this work. 
Funding Statement: Mohamed Alwanin would like to thank 
the Deanship of Scientific Research at Majmaah University 
for supporting this work under Project No. R-2022-###. 
This research was supported by Researchers Supporting 
Program (TUMA-Project-2021-14), AlMaarefa 
University, Riyadh, Saudi Arabia. 
Conflicts of Interest: The authors declare that there is no 
conflict of interest associated with this study. 
References 
[1] V.L. Miguéis, A. Freitas, P.J.V. Garcia and A. Silva, 
"Early segmentation of students according to their 
academic performance: A predictive modelling 
approach," Decision Support Systems, vol. 6, no. 5, 
pp. 65-78, 2018. 
[2] S. J. Lakshmi and M. Thangaraj, "Recommender 
system for stimulating the learning skill of slow 
learner in higher educational institution using 
EDM," International Journal on Recent 
Technological Engineering, vol. 5, pp. 98-109, 
2019. 
[3] D. T. Ha, P. T. T. Loan, C. N. Giap and N. T. L. 
Huong, "An empirical study for student academic 
performance prediction using machine learning 
techniques," International Journal of Computer 
Science and Information Security (IJCSIS), vol. 18, 
no. 3, pp. 75-82, 2020. 
[4] R. Umer, T. Susnjak, A. Mathrani and S. 
Suriadi, "On predicting academic performance with 
process mining in learning analytics," Journal of 
Resource Innovation and Teach Learning, vol. 
78, pp. 155-168, 2017. 
[5] O.H.T. Lu, A.Y.Q. Huang, J.C.H. Huang, A.J.Q. 
Lin, H. Ogata et al., "Applying learning analytics 
for the early prediction of students’ academic 
performance in blended learning," Educational 
Technological Society, vol. 55, pp. 111-123, 2018. 
[6] O. Viberg, M. Hatakka, O. Bälter and A. 
Mavroudi, "The current landscape of learning 
analytics in higher education," Computers in 
Human Behavior, vol. 18, pp. 1001-1222, 2018. 
[7] M. S. B. M. Azmi and I. H. B. M. Paris, “Academic 
performance prediction based on voting 
technique,” in 2011 IEEE 3rd International 
Conference on Communication Software and 
Networks, Calcutta, India, pp. 24-27, 2011. 
[8] Tarandeep Kaur, Harjinder Kaur, "Machine 
Learning: An Internal Review", Journal of 
Emerging Technologies and Innovative Research, 
5, no. 11, 6, 2018. 
[9] C. Romero, P.G. Espejo, A. Zafra, J.R. Romero and 
S. Ventura, "Web usage mining for predicting final 
marks of students that use Moodle courses," 
Computer Application in Engineering and 
Education, vol. 65, pp. 555-578, 2013. 
[10] A. M. Shahiri and W. Husain, "A review on 
predicting student's performance using data mining 
techniques," Procedia Computer Science, vol. 72, 
pp. 414-422, 2015. 
[11] N. Thai-Nghe, L. Drumond, A. Krohn-Grimberghe 
and L. Schmidt-Thieme, "Recommender system for 
predicting student performance," Procedia 
Computer Science, vol. 20, pp. 55-65, 2010. 
[12] S. Huang and N. Fang, "Predicting student 
academic performance in an engineering dynamics 
course: A comparison of four types of predictive 
mathematical models," Comput Education, vol. 55, 
no. 6, pp. 33-42, 2013. 
[13] M. Imran, S. Latif, D. Mehmood and M. S. 
Shah, "Student academic performance prediction 
using supervised learning techniques," 
International Journal on Emerging Technologies in 
Learning, vol. 77, pp. 102-120, 2019. 
[14] P. Strecht, L. Cruz, C. Soares, J. Mendes-Moreira 
and R. Abreu, "A Comparative Study of 
Classification and Regression Algorithms for 
Modelling Students’ Academic Performance," in 
Proc. ICEDM, Noida, India, pp. 55-64, 2015. 
[15] P. Kamal and S. Ahuja, "An ensemble-based model 
for prediction of academic performance of students 
in undergrad professional course," Journal of 
Engineering Design and Technology, vol. 98, pp. 
654-672, 2019. 
[16] V. Skrbinjek and V. Dermol, "Predicting students’ 
satisfaction using a decision tree," Tert Education 
and Management, vol. 64, pp. 210-218, 2019. 
[17] Dr. Antino Marelino. (2014). Customer Satisfaction 
Analysis based on Customer Relationship 
Management. International Journal of New 
Practices in Management and Engineering, 3(01), 07 
- 12. Retrieved from 
http://ijnpme.org/index.php/IJNPME/article/view/26 
[18] Dr. Sandip Kadam. (2014). An Experimental 
Analysis on performance of Content Management 
Tools in an Organization. International Journal of 
New Practices in Management and Engineering, 
3(02), 01 - 07. Retrieved from 
http://ijnpme.org/index.php/IJNPME/article/view/27 
[19] Ms. Nora Zilam Runera. (2014). Performance 
Analysis on Knowledge Management System on 
Project Management. International Journal of New 
Practices in Management and Engineering, 3(02), 
08 - 13. Retrieved from 
http://ijnpme.org/index.php/IJNPME/article/view/28 
[20] Mrs. Leena Rathi. (2014). Ancient Vedic 
Multiplication Based Optimized High Speed 
Arithmetic Logic. International Journal of New 
Practices in Management and Engineering, 3(03), 01 
- 06. Retrieved from 
http://ijnpme.org/index.php/IJNPME/article/view/29 
H. Kaur and A. S. Kushwaha, “An elicit elucidation on 
the process of education data mining,” International 
Conference on Intelligent Computing and Control 
Systems, ICCS 2019.  
[21] S. Roy and A. Garg, "Predicting academic 
performance of student using classification 
techniques," in 2017 4th IEEE Uttar Pradesh 
Section International Conference on Electrical, 
Computer and Electronics (UPCON), Korat, 
Thailand, pp. 568-572, 2017. 
[22] S. Poonam, S. Ahuja, V. Jaitly and S. Jain, “A 
framework to alleviate common problems from 
recommender system: A case study for technical 
course recommendation,” Journal of Discrete 
Mathematical Sciences and Cryptography, vol. 23, 
no. 2, pp. 451-460, 2020. 
[23] A. Rajak, A. K. Shrivastava and V. Vidushi, 
“Applying and comparing machine learning 
classification algorithms for predicting the results 
of students,” Journal of Discrete Mathematical 
Sciences and Cryptography, vol. 23, no.2, pp. 419-
427, 2020. 
[24] H. Guruler, A. Istanbullu and M. Karahasan, "A 
new student performance analysing system using 
knowledge discovery in higher educational 
databases," Computer Education, vol. 6, no. 5, pp. 
125-138, 2010. 
[25] A. Rajak, A. K. Shrivastava and V. Vidushi, 
“Applying and comparing machine learning 
classification algorithms for predicting the results 
of students,” Journal of Discrete Mathematical 
Sciences and Cryptography, vol. 23, no.2, pp. 419-
427, 2020. 
[26] A. Siddique, A. Jan, F. Majeed, A.I. Qahmash, 
N.N. Quadri et al., “Predicting Academic 
Performance Using an Efficient Model Based on 
Fusion of Classifiers,” Applied Sciences, vol. 11, 
no. 24, pp. 11845, 2021. 
[27] A. S. Hoffait and M. Schyns, "Early detection of 
university students with potential difficulties," 
Decision Support Systems, vol. 9, no. 5, pp. 5-20, 
2017.