Informatica 40 (2016) 409-414

Performance Comparison of Featured Neural Network Trained with Backpropagation and Delta Rule Techniques for Movie Rating Prediction in Multi-criteria Recommender Systems

Mohammed Hassan
University of Aizu, Aizuwakamatsu, Fukushima, Japan
E-mail: d8171104@u-aizu.ac.jp

Mohamed Hamada
University of Aizu, Aizuwakamatsu, Fukushima, Japan
E-mail: hamada@u-aizu.ac.jp

Keywords: multi-criteria recommender systems, artificial neural network, prediction accuracy, backpropagation, delta rule

Received: November 22, 2016

Recommender systems are software tools that are widely used to recommend valuable items to users. They have the capacity to support and enhance the quality of the decisions people make when finding and selecting items online. How well such systems work depends on the technique used to estimate users' preferences for new items that might be useful to them. Traditionally, the most common techniques used by existing recommendation systems are collaborative filtering, content-based, knowledge-based, and hybrid approaches that combine two or more techniques in different ways. The multi-criteria recommendation technique is a newer technique that recommends items to users based on ratings given to multiple attributes of items. It has been used and shown by researchers in industry and academic institutions to provide more accurate predictions than the traditional techniques. However, what is still not clear is the role that machine learning algorithms such as the artificial neural network can play in improving its prediction accuracy. This paper proposes using a feedforward neural network to model user preferences in multi-criteria recommender systems. The results of experiments for training and testing the network using two training algorithms and a Yahoo!Movies dataset are also presented.

Povzetek (Slovenian abstract): A comparison of several methods, including a neural network, for predicting movie ratings with a multi-criteria recommender system is described.

This paper is based on: Mohammed Hassan & Mohamed Hamada, Rating Prediction Operation of Multi-criteria Recommender Systems Based on Feedforward Network, published in the Proceedings of the 2nd International Conference on Applications in Information Technology (ICAIT-2016).

1 Introduction

Recommender systems are intelligent systems that play an important role in suggesting valuable items to users. The kinds of suggestions given by these systems take different forms depending on the recommendation domain. For example, in a movie recommendation service such as Netflix (https://www.netflix.com/), the system can suggest movies to watch. Similarly, music can be recommended to users by a music recommender system like Pandora (www.pandora.com), items to buy can be recommended on Amazon (https://www.amazon.com/), and personalized online news recommender systems like Google News (https://news.google.com/) can recommend news for users to read [1, 2, 3, 4]. Recommender systems are classified based on the technique used during their design and implementation. Traditionally, collaborative filtering, content-based, knowledge-based, and hybrid filtering are the techniques most commonly used to design recommender systems. Therefore, knowing these recommendation techniques is at the heart of our understanding of recommender systems.
These techniques are sometimes called traditional techniques and are increasingly popular ways of building recommender systems [5]. However, despite their popularity and their ability to provide considerable prediction and recommendation accuracy, they suffer from major drawbacks [6, 7, 8] because they work with just a single rating, whereas the acceptability of a recommended item often depends on several of the item's attributes [9]. Researchers have suggested that if the ratings given to those several characteristics of an item were considered during the prediction and recommendation process, the quality of recommendations could be enhanced, since more complex user opinions would be captured from the various attributes of the item. Recent developments in this field have led to a new recommendation technique, known as the multi-criteria recommendation technique [6, 9], that exploits multiple criteria ratings on various item characteristics to make recommendations. This technique has been used in a wide range of recommendation applications, such as recommending products to customers [11, 10], hotel recommendations for travel and tourism [12], and so on. Nevertheless, having considered multi-criteria techniques as an answer to some of the limitations of traditional techniques, it is also logical to look at various ways of modeling the multiple ratings to further enhance prediction accuracy and recommendation quality. However, few researchers have carried out systematic research into improving the prediction accuracy [13]. In addition, no previous research has investigated the effect of using artificial neural networks to model users' preferences in order to improve the prediction operation of multi-criteria recommender systems [9]. Therefore, as an attempt to investigate the effectiveness of applying neural network techniques to improve the prediction accuracy of multi-criteria recommender systems, this study examines the performance of the backpropagation and delta rule algorithms for training such a network, using a multi-criteria rating dataset in which movies are recommended to users based on four attributes of the movies.

This paper is divided into five sections, including this introduction. The second section gives a brief literature review. The experimental methodology is contained in the third section, the fourth section presents the results and discussion, and the final section gives the conclusion and future research work.

2 Related background

2.1 Multi-criteria recommender systems

To understand the concept of recommender systems, we introduce the mathematical notations U, I, δ, and ψ to represent the set of users, the set of items, a numerical rating, and a utility function, respectively. The rating δ measures the degree to which a user µ ∈ U will like an item ι ∈ I, while the utility function ψ is a mapping from a (µ, ι) pair to a rating δ, written as ψ : µ × ι → δ. The value of δ is a number within a specifically defined interval, such as between 1 and 5 or between 1 and 13, or it can be represented using non-numerical values such as "like", "don't like", ..., "strongly like", or true or false [14]. A recommender system therefore tries to predict the value of δ for all items ι ∈ I that have not been seen by µ and recommends those with a high predicted value of δ.
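The following is a hypothetical sketch, in Java, of the role the utility function ψ plays in this setting: ψ maps a (user, item) pair to a predicted rating δ, and the recommender simply ranks the user's unseen items by that predicted value. The interface and class names are illustrative and not taken from any existing library.

  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.List;

  // Sketch of the utility function psi: it maps a (user, item) pair to a predicted rating delta.
  interface UtilityFunction {
      double predict(int userId, int itemId);   // psi : (mu, iota) -> delta
  }

  class TopNRecommender {
      // Rank a user's unseen items by descending predicted rating delta.
      static List<Integer> recommend(UtilityFunction psi, int userId, List<Integer> unseenItems) {
          List<Integer> ranked = new ArrayList<>(unseenItems);
          ranked.sort(Comparator.comparingDouble((Integer item) -> psi.predict(userId, item)).reversed());
          return ranked;
      }
  }

Items at the front of the returned list are those with the highest predicted δ and would be the first to be recommended.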
The prediction and recommendation method explained in the paragraph above is the mechanism followed by traditional recommendation techniques. A similar approach is followed by the multi-criteria recommendation technique, with the distinction that it uses multiple values of δ for each (µ, ι) pair. In the multi-criteria technique, the utility function ψ can be defined in general using the relation in equation 1.

    \psi : \mu \times \iota \mapsto \delta_0 \times \delta_1 \times \delta_2 \times \dots \times \delta_n    (1)

It is important to note that there are n criteria ratings in the above equation, together with an additional rating δ₀, called the overall rating, which needs to be computed from the other n values as in equation 2.

    \delta_0 = f(\delta_1, \delta_2, \delta_3, \dots, \delta_n)    (2)

The technique can also work without taking δ₀ into account, so that there is no overall rating and only the ratings of the other attributes are used in the prediction process. However, evidence reported by many researchers confirms that considering the overall rating is more effective than ignoring it [6]. The two common approaches used to model multi-criteria rating recommenders are the heuristic-based approach, which uses certain heuristic assumptions to estimate the rating of an individual item for a user, and the model-based approach, which learns a model to predict the utility and recommends unknown items. This classification groups multi-criteria rating algorithms into model-based and heuristic-based algorithms. For this experiment we only need to understand one model-based approach, known as the aggregation function model; for a detailed explanation of the two categories of multi-criteria rating algorithms, readers can refer to [8, 9].

The aggregation function approach starts by selecting and training a function or model (such as a neural network) to learn how to predict the overall rating from the criteria ratings. Secondly, the multi-criteria problem is decomposed into traditional recommendation problems so that the missing ratings for each criterion can be treated as a single-rating problem. Finally, the system uses the trained model together with the single-rating recommenders to predict the overall rating as in equation 2.

2.2 Artificial neural network model

An artificial neural network is one of the most powerful classes of machine learning models; it can learn complicated functions from data to solve many optimization problems. It aims to mimic biological neurons, which receive, integrate, and communicate incoming signals to other parts of the body [15]. Similarly, an artificial neural network contains sets of connected neurons arranged in layers (see Figure 1), where the input layer consists of neurons that receive input from the external environment, and the output-layer neuron receives the weighted sum of the products of the input values and their corresponding weights from the previous layer and sends its computed result to the outside environment. The features x₁, x₂, x₃, and x₄ in Figure 1 are the inputs presented to the input layer, and the parameters ω₀, ω₁, ω₂, ω₃, and ω₄ are the synaptic weights of the links between the input and output neurons. ∑ is the weighted sum of ωᵢxᵢ for i ∈ [0, 4], x₀ is a bias, and f is an activation function that estimates the output Y, written as Y = f(∑ᵢ ωᵢxᵢ) for i = 0, ..., 4.

Figure 1: A single-layer neural network.
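To make the computation described above concrete, the following is a minimal Java sketch, not the authors' implementation, of the forward pass of the single-layer network in Figure 1. It assumes the identity (linear) activation for f, as used later for the Adaline output neuron; the class name is illustrative.

  // Forward pass of a single-layer network: Y = f(sum_{i=0..4} w_i * x_i), with x_0 = 1 as the bias input.
  public class SingleLayerNetwork {
      private final double[] weights;   // weights[0] is the bias weight, weights[1..4] the input weights

      public SingleLayerNetwork(double[] weights) {
          this.weights = weights;
      }

      // Weighted sum of the bias and the four criteria inputs x = {x1, x2, x3, x4}.
      private double weightedSum(double[] x) {
          double sum = weights[0];              // w_0 * x_0 with the bias input x_0 = 1
          for (int i = 0; i < x.length; i++) {
              sum += weights[i + 1] * x[i];
          }
          return sum;
      }

      // Output Y = f(weighted sum); here f is the identity (linear) activation.
      public double output(double[] x) {
          return weightedSum(x);
      }
  }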
A feedforward network may contain more than two layers, where one or more hidden layers are added between the input and output layers. The second experiment uses an extended version of the network presented in Figure 1, obtained by adding one hidden layer between the input and output layers. This is only a brief explanation of how the neural network behaves; details of its learning process can be found in several machine learning books and articles [15, 16].

3 Experiments

The experiment was performed using the Yahoo!Movies (https://www.yahoo.com/movies/) dataset obtained from [10] for a multi-criteria movie recommendation system in which movies are recommended to users based on four characteristics of a movie, namely the action, story, direction, and visual effects of the movie, represented as c1, c2, c3, and c4, respectively. In addition to those four criteria, an additional rating co, called the overall rating criterion, was used to represent the user's final preference for a movie. The criteria values (ratings) in the dataset were originally given on a 13-level scale from A+ to F, representing the highest and the lowest preference of the user, respectively. We therefore converted the ratings to numerical form (13 down to 1 instead of A+ down to F).

Table 1 consists of three parts, namely the original, modified, and normalized datasets: the first part displays a sample of the original extracted dataset, and the second part of the table is the same sample of the dataset modified into numerical ratings. Finally, for the network models to work faster and more efficiently, the numerically transformed dataset was normalized to real numbers between 0 and 1 by dividing each of the modified ratings by 13 (since 13 is the highest rating), as displayed in the last part of the same table.

Table 1: Sample of the extracted and modified dataset

  UserID  MovieID   Action    Story     Direction  Visual    Overall
                    c1        c2        c3         c4        co
  Original dataset
  459               B         A−        A          A−        B+
  554               A−        A         A          A         A
  554               A−        A−        A          A+        A−
  Modified dataset
  459               9         11        12         11        10
  554               11        12        12         12        12
  554               11        11        12         13        11
  Normalized dataset
  459               0.692...  0.846...  0.923...   0.846...  0.769...
  554               0.846...  0.923...  0.923...   0.923...  0.923...
  554               0.846...  0.846...  0.923...   1.000     0.846...

The dataset was carefully cleaned to remove incomplete entries, in which ratings for some of the criteria were missing, as well as users who rated only a few movies (fewer than five). Movies rated by only a small number of users were also removed completely from the dataset. This data cleaning process reduced the dataset to a total of approximately 63,000 rating sets, which were divided into training and test data in a ratio of 75:25 for the two experiments. The goal of the study was to use a feedforward network to learn how to estimate co from c1, c2, c3, and c4.

Two feedforward networks were developed in Java using object-oriented programming techniques [18], one trained with the delta rule and the other with backpropagation. The Adaline network consists of an input and an output layer, as in Figure 1, with the input layer containing four neurons and a bias for passing the data to the output layer. A linear activation function f was used in the output neuron to process the weighted sum of the inputs xi received from the input layer. For backpropagation training, in addition to the two layers of the Adaline, a network containing an extra hidden layer with the same number of neurons as the input layer was used, with an additional activation function g (a sigmoid) that receives the weighted sum from the input layer and sends the result of its computation to the output neuron.

For measuring the training and test errors, the mean square error in equation 3, computed between the network output oⱼ (where oⱼ = f(∑ᵢ xᵢωᵢ)) and the desired output yⱼ, was used. The Pearson correlation coefficient (PCC) presented in equation 4 was also used as a metric for measuring the relationship between the actual and estimated outputs on the test data.

    \mathrm{MSE} = \frac{1}{2N} \sum_{j=1}^{N} (y_j - o_j)^2    (3)

    \mathrm{PCC} = \frac{\sum_{j}(y_j - \bar{y})(o_j - \bar{o})}{\sqrt{\sum_{j}(y_j - \bar{y})^2}\,\sqrt{\sum_{j}(o_j - \bar{o})^2}}    (4)
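As a concrete companion to equations 3 and 4, the following is a minimal Java sketch, illustrative rather than the authors' code, of how the two error metrics can be computed from an array of desired outputs y and an array of network outputs o; the class and method names are assumptions made for this example.

  // Error metrics used in the experiments: MSE (equation 3) and PCC (equation 4).
  public final class Metrics {

      // MSE = (1 / 2N) * sum_j (y_j - o_j)^2, following the 1/(2N) convention of equation 3.
      static double meanSquareError(double[] y, double[] o) {
          double sum = 0.0;
          for (int j = 0; j < y.length; j++) {
              double diff = y[j] - o[j];
              sum += diff * diff;
          }
          return sum / (2.0 * y.length);
      }

      // Pearson correlation coefficient between the desired outputs y and the network outputs o.
      static double pearsonCorrelation(double[] y, double[] o) {
          double meanY = mean(y);
          double meanO = mean(o);
          double cov = 0.0, varY = 0.0, varO = 0.0;
          for (int j = 0; j < y.length; j++) {
              double dy = y[j] - meanY;
              double dOut = o[j] - meanO;
              cov += dy * dOut;
              varY += dy * dy;
              varO += dOut * dOut;
          }
          return cov / (Math.sqrt(varY) * Math.sqrt(varO));
      }

      private static double mean(double[] values) {
          double sum = 0.0;
          for (double v : values) sum += v;
          return sum / values.length;
      }
  }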
4 Results and discussion

In each of the two algorithms, the neuron weights ωᵢ were initially generated at random, and the network computed the outputs and the corresponding errors (as ½(yⱼ − oⱼ)²). Iteratively, the algorithms search for a set of weights ωᵢ, i = 0, 1, ..., 4, that minimizes the error. Since the two algorithms are based on gradient descent, training begins at some point on the error function shown in equation 3, defined by the initial ωᵢ, and tries to move to the optimal solution (global minimum) of the function. The rate of this movement is determined by a parameter known as the learning rate, denoted by α, which controls how much the ωᵢ can be changed with respect to the observed training errors (the weight update applied at each step is sketched at the end of this section). Choosing the correct α is therefore paramount, since it can greatly influence the accuracy of the models. The best value of α is not always obvious at the beginning of the experiment, so the study began by testing various values between 0.1 and 0.001 to find the one that produced the relatively smallest error. The entire experiment was then carried out using α = 0.007, which produced the lowest error.

The adaptive linear neuron (Adaline) network trained using the delta rule converged quickly, within a small number of iterations (about 10), with very good performance. The backpropagation algorithm, on the other hand, required a much longer learning process; a large number of training cycles (epochs) was used to monitor its performance, and the result is presented in Figure 2. The figure shows the average MSE for various numbers of training cycles and indicates that convergence can only be attained after a very high number of iterations. For the purpose of comparison, however, the number of training cycles was set to 10,000 (epochs = 10,000); the training errors and the correlations between the actual and estimated outputs of the test set for the two algorithms are shown in Table 2.

Figure 2: Average training MSE for backpropagation.

Table 2: Performance statistics

  Algorithm   Number of iterations   Average training MSE (×10⁻³)   Percentage PCC
  Adaline     10                     5.34                           94.4%
  BPA         10,000                 7.30                           90.0%

Furthermore, to illustrate the agreement between the test results of each of the two models and the actual values from the dataset, Figure 3 shows three curves: one for the actual values from the dataset, and the other two for the corresponding values predicted by the adaptive linear neuron (Adaline)- and backpropagation (BPA)-based networks. The figure confirms the higher accuracy of the Adaline network over the backpropagation-based network.

Figure 3: Curves of the actual values and some of the testing results.
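The gradient descent step discussed above can be illustrated with a short Java sketch of one delta rule training epoch for the Adaline network. This is a minimal illustration under the assumptions stated in Section 3 (a linear output neuron with a bias input fixed to 1), not the authors' implementation, and the class and method names are invented for this example. For every training example, the output o = ∑ᵢ ωᵢxᵢ is computed and each weight is adjusted by α(y − o)xᵢ.

  // One delta rule (Widrow-Hoff) training epoch for a linear, Adaline-style neuron.
  public final class DeltaRuleTrainer {

      // weights[0] is the bias weight; each row of inputs has length weights.length - 1.
      static double trainEpoch(double[] weights, double[][] inputs, double[] targets, double alpha) {
          double squaredErrorSum = 0.0;
          for (int j = 0; j < inputs.length; j++) {
              double[] x = inputs[j];

              // Forward pass with a linear activation: o = w_0 + sum_i w_{i+1} * x_i.
              double o = weights[0];
              for (int i = 0; i < x.length; i++) {
                  o += weights[i + 1] * x[i];
              }

              // Delta rule update: w_i <- w_i + alpha * (y - o) * x_i, with x_0 = 1 for the bias.
              double error = targets[j] - o;
              weights[0] += alpha * error;
              for (int i = 0; i < x.length; i++) {
                  weights[i + 1] += alpha * error * x[i];
              }
              squaredErrorSum += error * error;
          }
          // Average squared error observed during the epoch, in the 1/(2N) form of equation 3.
          return squaredErrorSum / (2.0 * inputs.length);
      }
  }

Repeatedly calling trainEpoch until the returned error stops decreasing corresponds to the iterative search for the weights ωᵢ described above.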
5 Conclusion and future work

This study was carried out to investigate the relative performance of single-layer and multilayer feedforward networks trained using the delta rule and the backpropagation algorithm, respectively. The performance of each model was measured using the MSE on the training data, and the agreement between the predicted and actual ratings on the test data was evaluated using the Pearson correlation coefficient. From Figure 2 and Table 2, it can be seen that the backpropagation algorithm requires far more training cycles to converge. Moreover, the results indicate that the one-layer network trained using the delta rule algorithm is more efficient than the two-layer network, which supports the traditional belief that a single-layer network produces less error than a multilayered network [17]. Up to our last experiment with 10,000 epochs, backpropagation did not completely converge; further investigation is therefore recommended to estimate the approximate number of epochs the algorithm requires to converge and to determine whether it would then produce a better result than Adaline. The study confirmed the usefulness of training a neural network on input features obtained from several characteristics of items for predicting user preferences in multi-criteria recommender systems.

Future studies on this topic are recommended to investigate the performance of more sophisticated neural network architectures and algorithms, such as the restricted Boltzmann machine, deep neural networks, convolutional neural networks, and other similar models. Moreover, as the results of this study give a hint about the best network architecture and the appropriate training algorithm to use, further work is required to extend this research by integrating the model with popular collaborative filtering algorithms, such as the matrix factorization algorithm, which can work on a single rating to predict the individual criterion ratings, in order to develop a complete multi-criteria recommender based on Adaline. Furthermore, as the scope of recommender systems covers many application domains, such as technology-enhanced learning and e-commerce, investigating the effect of neural networks on improving their accuracy is a good direction for future research.

References

[1] Ricci, F., Rokach, L., & Shapira, B. (2015). Recommender systems: introduction and challenges. In Recommender Systems Handbook (pp. 1-34). Springer US.

[2] Mahmood, T., & Ricci, F. (2009, June). Improving recommender systems with adaptive conversational strategies. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (pp. 73-82).

[3] Park, D. H., Kim, H. K., Choi, I. Y., & Kim, J. K. (2012). A literature review and classification of recommender systems research. Expert Systems with Applications, 39(11), 10059-10072.

[4] Beam, M. A. (2014). Automating the news: How personalized news recommender system design choices impact news reception. Communication Research, 41(8), 1019-1041. Sage Publications.

[5] Hassan, M., & Hamada, M. (2016). Recommending learning peers for collaborative learning through social network sites. IEEE ISMS, Intelligent Systems, Modeling and Simulation.

[6] Adomavicius, G., & Kwon, Y. (2007). New recommendation techniques for multicriteria rating systems. IEEE Intelligent Systems, 22(3), 48-55.

[7] Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions.
IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749.

[8] Manouselis, N., & Costopoulou, C. (2007). Analysis and classification of multi-criteria recommender systems. World Wide Web, 10(4), 415-441.

[9] Adomavicius, G., Manouselis, N., & Kwon, Y. (2015). Multi-criteria recommender systems. In Recommender Systems Handbook (pp. 854-887). Springer US.

[10] Lakiotaki, K., Matsatsinis, N. F., & Tsoukias, A. (2011). Multicriteria user modeling in recommender systems. IEEE Intelligent Systems, 26(2), 64-76.

[11] Palanivel, K., & Sivakumar, R. (2011). A study on collaborative recommender system using fuzzy-multicriteria approaches. International Journal of Business Information Systems, 7(4), 419-439.

[12] Jannach, D., Gedikli, F., Karakaya, Z., & Juwig, O. (2012). Recommending hotels based on multi-dimensional customer ratings.

[13] Jannach, D., Karakaya, Z., & Gedikli, F. (2012, June). Accuracy improvements for multi-criteria recommender systems. In Proceedings of the 13th ACM Conference on Electronic Commerce (pp. 674-689). ACM.

[14] Ning, X., Desrosiers, C., & Karypis, G. (2015). A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook (pp. 37-76). Springer US.

[15] Graupe, D. (2013). Principles of Artificial Neural Networks (Vol. 7). World Scientific.

[16] Wasserman, P. D. (1993). Advanced Methods in Neural Computing. John Wiley & Sons, Inc.

[17] Souza, A. M., & Soares, F. M. (2016). Neural Network Programming with Java. Packt Publishing Ltd.

[18] Kendal, S. (2009). Object Oriented Programming Using Java. Bookboon.