https://doi.org/10.31449/inf.v45i1.3258 Informatica 45 (2021) 13–31 13
A Novel Borda Count Based Feature Ranking and Feature Fusion Strategy to
Attain Effective Climatic Features for Rice Yield Prediction
Subhadra Mishra
Department of Computer Science and Application, CPGS
Odisha University of Agriculture and Technology, Bhubaneswar, Odisha, India
E-mail: mishra.subhadra@gmail.com
Debahuti Mishra
Department of Computer Science and Engineering,
Siksha ’O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India
E-mail: debahutimishra@soa.ac.in
Pradeep Kumar Mallick
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
E-mail: pradeep.mallickfcs@kiit.ac.in
Gour Hari Santra
Department of Soil Science and Agricultural Chemistry, IAS
Siksha ’O’ Anusandhan Deemed to be University, Bhubaneswar, Odisha, India
E-mail: santragh@gmail.com
Sachin Kumar (Corresponding Author)
Department of Computer Science, South Ural State University, Chelyabinsk, Russia
E-mail: sachinagnihotri16@gmail.com
Keywords: rice crop yield prediction, climatic variability, extreme learning machine, feature ranking, feature fusion
Received: July 29, 2020
An attempt has been made in the agricultural field to predict the effect of climatic variability based on
rice crop production and climatic features of three coastal regions of Odisha, a state of India. The novelty
of this work is a Borda Count based fusion strategy applied to the ranked features obtained from various
ranking methodologies. The proposed prediction model works in three phases. In the first phase, three
feature ranking approaches, namely Random Forest, Support Vector Regression-Recursive Feature
Elimination (SVR-RFE) and F-Test, are applied individually to the two datasets of the three coastal areas,
and the features are ranked by each algorithm. In the second phase, Borda Count is applied as a fusion
method to those ranked features to obtain the top five features. In the third phase, a multiquadratic
activation function based Extreme Learning Machine (ELM) is used to predict the rice crop yield from the
features obtained by the fusion based ranking strategy, and the number of features that gives prediction
accuracy above 99% is determined. Finally, the statistical paired T-test is used to evaluate and validate
the significance of the proposed fusion based ranking prediction model. This prediction model not only
predicts the rice yield per hectare but also identifies the most significant features during the Rabi and
Kharif seasons. The experiments show that relative humidity plays a vital role, along with minimum and
maximum temperature, in rice crop yield during the Rabi and Kharif seasons.
Povzetek (Summary): The article describes an original approach to finding patterns of weather variability
using methods for attribute selection and attribute fusion.
1 Introduction
Agriculture is the major source of livelihood for people in Odisha as well as India, but here it is said
that 'Agriculture is the gamble of the monsoon'. Due to climatic changes, the production of major crops
is reduced in the Kharif season. While Kharif rainfall over the country might increase by 10-15%, winter
rainfall is expected to decrease by 5-25%, and seasonal variability would be further compounded [1].
It is highlighted that, owing to high temperature, water shortage and the distribution of rainy days, the
maximum loss is expected in Rabi crops, whose productivity is projected to decrease by 10% to 40% by
2100 [2]. Rice yield is expected to decline by 6% for every 10°C rise in temperature [3]. Scientists and
policy makers have accepted the susceptibility of agricultural crops to climate change and have
questioned the capability of farmers to adapt, because of the direct and strong dependence of crop
agriculture on climate [4]. Different forecasting methodologies are available and have been evaluated by
researchers all over the world in the field of agriculture. On an all-India basis, a simulation study shows
that the yield of rice crop is affected by weather change by 2.5 to 12% [5]. Rice is the main food in
eastern India, specifically in the states of Odisha, West Bengal, Jharkhand and Bihar. In India, the green
revolution mainly concerned wheat, and the contributing states were mainly Punjab, Haryana and UP. So,
the Government of India is expecting the 2nd green revolution from eastern India. The amount of data in
Indian agriculture is very large. Earlier, before the advent of computers, models were built from datasets
only manually. With the advancement of computer technology, the collection, classification and storage
of huge data have increased. This has brought enormous improvement in pattern recognition. In this
paper, the main focus is to develop a user friendly network for farmers which provides a study of rice
production on the basis of important climatic parameters.
The current age is the age of data. Since we use a large dataset to obtain accurate results, feature
selection becomes a prerequisite for modeling the dataset [6, 7]. To increase the correctness of the
experiment, we have to increase the attributes of the training examples, that is, of the dataset
[8, 9, 10]. As knowledge discovery means finding knowledge in a vast amount of data, it is challenging
to carry out further research for solving real world problems. Ranking is a method of ordering all the
features according to their importance. Selecting a small number of features produces a simple model,
which takes less computation time and can be understood easily. A simpler model also requires fewer
resources, which makes it affordable. Now the question is how we can rank the features or variables
[9, 10, 11, 12, 13, 14]. There are many algorithms in machine learning for finding the significant
variables. Thus, the concept of feature selection or variable selection arises. It is the selection of a
subset of the variables, and this technique does not change the original representation of the variables.
When the various feature ranking techniques are applied to the dataset, small subsets are generated in
each iteration. For each feature, the rank order resulting from each run is united with those of the
earlier runs to form an ensemble [15, 16]. The Monte Carlo algorithm states that a conclusion can be
reached by combining consecutive random rough calculations of the same result [17]. This method inspired
the ensemble method.
As agriculture is the backbone of the Indian economy and rice is the main staple food, the prediction of
rice yield and timely advice to farmers on variations in climatic conditions are required. This factor
motivated us to prepare a computational model for the farmers and ultimately for society. The main aim of
this work is to prepare a computational model to find the features that most affect rice production. Here
we have used three different feature ranking methods for regression: Random Forest [18, 19, 20, 21, 22, 23],
SVR-RFE [24, 25, 26] and F-Test [27, 28]. These are mainly used for ranking genes in gene expression
datasets. The same methods are used here to rank the features of rice crop prediction datasets. The three
ranking algorithms give three different ranks to each feature of the dataset. Then, a feature fusion
method is proposed to evaluate the final rank of each feature, and these newly ranked features are
evaluated by an Extreme Learning Machine (ELM) [29, 30, 31, 32] based regressor to measure the importance
of each feature. The accuracy of the ELM regressor is calculated while removing the features from the
dataset one by one. Finally, the proposed fusion based ranking strategy is compared with the non fusion
based ranking strategies to obtain the number of significant features contributing to the maximum
accuracy of the regressor. These features decide the importance of climatic parameters in rice crop
production for both the Rabi season and the Kharif season in the selected districts, namely Balasore,
Cuttack and Puri. Thus, the important finding of the study is that temperature and humidity have the
greatest effect on crop production in the coastal districts of Odisha.
1.1 Study area
Figure 1 shows the three districts, Balasore, Puri and Cuttack, from which the rice crop production
dataset was collected [33]. Rice is produced mainly in two seasons, Rabi and Kharif. Different features
are considered for this production, such as rainfall, minimum and maximum temperature, and relative
humidity in the morning and afternoon hours. There are various methods for missing value imputation [36]
to avoid inconsistency in the dataset. In this paper, the mean value is used to solve the missing value
problem.
Figure 1: Complete area of Odisha, a state of India, taken from Google Maps [34]
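The mean value imputation mentioned in this section can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `impute_mean`, the use of `None` to mark a missing reading, and the sample rainfall values are all assumptions made for the example.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if value is None else value for value in column]


# Hypothetical monthly rainfall readings with one missing month.
rainfall = [10.0, None, 30.0, 20.0]
filled = impute_mean(rainfall)
```

Imputing each attribute with its own column mean keeps the number of yearly records unchanged, which matters for a dataset with only a few dozen rows.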
1.2 Goal
Considering the typical data described in the section above, data mining or machine learning strategies
should be able to produce a natural decision for crop production based on the important or significant
climatic parameters which affect the yield of rice during both the Rabi and Kharif seasons. This paper
mainly focuses on the capabilities of ranking and fusion strategies in two aspects: feature ranking and
fusion of those ranked features. Specifically, the goals of this study can be outlined as follows:
(a) Collection of climatic data of rice yield for both the
Rabi season and the Kharif season of three coastal ar-
eas of Odisha, a state of India.
(b) Feature importance evaluation and selection;
(i) Ranking of features by applying various ranking
strategies.
(ii) Fusion of those ranked features.
(c) Selection of important climatic features derived from the
ranked and fused features.
(d) Model tuning or searching for appropriate algorithm
parameters for better performance.
(e) Model evaluation and validation through performance
comparisons and statistical validation.
1.3 Paper layout
The rest of the paper is outlined as follows; the related work
in this ﬁeld is discussed in Section 2. The diagrammatic
representation of proposed regressor has been detailed in
Section 3. The methodologies such as Random Forest,
SVR-RFE, F-Test and ELM regressor and various fusion
strategies are discussed in Section 4. The experimentation
and model evaluation is discussed in Section 5 and Section
6 discusses the principal ﬁndings obtained from this study.
Finally Section 7 concludes the paper with future scope of
this work.
2 Literature survey
To contextualize the goals set out in Section 1.2 for rice yield modeling, many papers based on machine
learning or data mining techniques were selected for review, in the following order: (a) ranking of
features based on Random Forest, F-Test and SVR-RFE; (b) fusion strategies for feature selection; and
(c) model evaluation and validation for proper classification. This section explores the various works on
prediction in the agricultural field based on random forest, F-Test, SVR-RFE, etc. SML Venkata et al. [35]
used a dataset consisting of rainfall, precipitation and temperature. They applied random forest, which
is a collection of decision trees, to two-thirds of the records, applied the resulting decision trees to
the remaining records and, lastly, for the prediction of the crop data, applied the resulting model to
the test data based on the input attributes. They used R Studio and evaluated their results using other
performance measures. Evathia E et al. [18] modified the structure and selection mechanism of the random
forest algorithm to improve the prediction performance. The authors verified all the evaluation measures
and carried out the voting procedure on the basis of feature selection, clustering, etc. The main
objective of their work was to combine the construction and voting methods of the random forest
algorithm. Using 24 datasets, they found a positive effect on performance. Hari Dahal et al. [36] took six
soil variables together with crop yield data to find the level of crop productivity. They found that some
of the soil variables were highly correlated. To estimate the strength of the relationship, they
developed multiple regression models and applied the F-Test to determine which variables are most
significant; they found that total nitrogen, organic matter and phosphorus affect the yield of paddy.
J. P. Powell et al. [37] analysed the effect of various weather events on winter wheat, using farm level
data from 334 farms over 12 years. They used the F-Test to find the significance of weather events in the
model. They concluded that the effect of weather events on yield is time specific and that high
temperature and precipitation events significantly decrease yields.
Ke Yan et al. [24] studied both the linear and non-linear SVM-RFE algorithms. They analyzed the
correlation bias and proposed a new algorithm, SVM-RFE+CBR, which they applied to a synthetic dataset and
then evaluated for accuracy. Meng-Dar Shieh et al. [25] proposed a method to address the problem of
choosing the feature subset. Shruti Mishra et al. [26] proposed an extended variant of SVM-RFE and
SVM-T-RFE. They obtained the maximum classification accuracy with smaller subsets of genes on high
dimensional data. They also compared it with two other methods, SVM-T-RFE and SVM-RFE, and concluded that
the proposed stepwise method is 40% better than SVM-RFE and 25% better than SVM-T-RFE. The ranking
strategies adopted by the above mentioned authors have motivated us to carry forward our research on
agricultural and climatic datasets.
3 Schematic representation of
proposed method
Feature ranking methods are mainly used to rank features. This study introduces a novel approach based on
feature ranking methods to find the significant climatic features which most affect the yield of rice in
the three coastal districts of Odisha for both seasons, Rabi and Kharif. This empirical study mainly
focuses on the selection of significant features through feature ranking and feature fusion based
strategies. It works in three phases. In the first phase, known as feature ranking, Random Forest,
SVR-RFE and F-Test based regression methods are explored to rank all the features of the datasets. In the
second phase, new ranks are evaluated by considering all the ranked features from the above mentioned
ranking techniques. Finally, an ELM based regressor is used to empirically evaluate and validate the
yield modeling. Figure 2 illustrates the flow of implementation of the proposed ELM based regressor model
to obtain the important features that contribute to the yield of rice in the coastal areas of the state
of Odisha.
3.1 Data set description
The dataset $D$ is composed of the Odisha districts of India (Figure 1). Let $d_i \in D$,
$\forall i = 1, \dots, 31$, denote the records, that is, 31 years of data, where $|d_i| = 25$ is the
number of attributes of each record. The climatic parameters that affect rice production are
$p = \{\text{max temperature}, \text{min temperature}, \text{rainfall}, \text{humidity}\}$. Since there
are two rice production seasons, Rabi and Kharif, falling in the months 'January–May' and
'June–December' respectively, each parameter $p_i$ is collected over a range of six months, resulting in
24 attributes; the 25th attribute is the crop production per hectare for the particular year.
The rice production graphs for the three coastal areas of Odisha from 1983 to 2014 are shown in
Figure 3(a) and Figure 3(b) for the Rabi and Kharif seasons respectively. A detailed description of the
datasets, with standard deviation (Std. Dev.), for the three areas is given in Table 1.
The range and average values of the parameters, namely rainfall in mm/hectare, maximum and minimum
temperature in °C, and mean relative humidity at both 8.30 am and 5.30 pm, for all three datasets of the
three coastal districts are shown in Table 2 for the Rabi and Kharif seasons.
3.2 Study procedures
This section presents a usable scheme to predict the effect of climatic parameters on rice yield in the
coastal areas of Odisha, a state of India, during both the Rabi and the Kharif season. The steps are as
follows:
• Collecting the raw data, including climatological characteristics and rice production per hectare.
• Calculating the range and average of the parameters of those datasets to gain proper knowledge of the
  features.
• Defining the attributes affecting the rice yield.
• Redefining the datasets and constructing the database of all tuples according to the selected
  attributes.
• Dividing the raw data into training and testing datasets.
• Designing the feature ranking models to rank all the features of the individual datasets for further
  processing.
• Designing a feature level fusion model using Borda Count to generate a new set of ranked features from
  the ranked features of all three feature ranking strategies for further analysis.
• Designing an ELM based regressor to model the datasets with the newly ranked features and measure the
  importance of each feature.
• Calculating the accuracy of the ELM regressor using the R2 score while removing the features from the
  datasets one by one.
• Finally, with respect to maximum accuracy, selecting the top 5 ranked features, which decide the
  importance of climatic parameters in rice crop production for both Rabi and Kharif in the three
  districts.
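The last steps of this procedure (score the regressor, drop the lowest ranked feature, repeat, keep the best subset) can be sketched as below. This is an illustrative outline under stated assumptions rather than the authors' implementation: `evaluate_by_elimination`, `score_fn`, the feature names and the toy scoring function are hypothetical, and in the study the score would be the R2 of the ELM regressor trained on the given subset.

```python
def evaluate_by_elimination(ranked_features, score_fn):
    """Score nested feature subsets, dropping the lowest ranked feature each round.

    ranked_features: feature identifiers ordered from most to least important.
    score_fn: callable taking a list of features and returning a score
              (higher is better), e.g. the R2 of a regressor trained on them.
    Returns (best_subset, best_score, history).
    """
    history = []
    subset = list(ranked_features)
    while subset:
        history.append((list(subset), score_fn(subset)))
        subset.pop()  # remove the currently lowest ranked feature
    best_subset, best_score = max(history, key=lambda item: item[1])
    return best_subset, best_score, history


# Toy stand-in for "train the regressor and return R2" (purely illustrative):
# it rewards informative features and applies a small penalty per feature used.
informative = {"rh_morning", "t_min", "t_max"}


def toy_score(features):
    hits = sum(1 for f in features if f in informative)
    return hits / len(informative) - 0.01 * len(features)


ranking = ["rh_morning", "t_min", "t_max", "rainfall", "rh_afternoon"]
best, score, history = evaluate_by_elimination(ranking, toy_score)
```

Passing the scorer in as a callable keeps the loop independent of the regressor, so the same routine could compare a fused ranking against each individual ranking.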
4 Methodologies adopted for
experimentation
This section discusses the methodologies used: random forest, F-Test and SVR-RFE for feature reduction,
and ELM for classification.
4.1 Random forest
Random Forest is one of the most important and popular supervised learning algorithms. It can be used for
both classification and regression tasks. In this method, multiple trees are grown. To classify a new
object based on its attributes, each tree gives a classification, that is, the tree 'votes' for that
class. For classification, the class with the most votes over all the trees in the forest is chosen; for
regression, the average of the outputs of the different trees is taken. Random forest is an ensemble
method of decision trees. Breiman proposed random forest, which adds an extra layer of randomness to
bagging [19]. Random forest has a vast number of applications due to its good stability and simplicity
[19, 20, 21, 22, 23].
4.2 F-Test for regression
The F-Test for linear regression is one of the methods used to determine the significance of the
independent variables in a multiple linear regression. For a multiple regression model with intercept,
the F-Test tests the following null hypothesis [27, 28]:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0 \tag{1}$$
Figure 2: Graphical abstract of proposed model.
Table 1: Description of real datasets collected over the period 1983-2014 for Rabi and Kharif production.

District | Rabi: Dimension | Rabi: Mean | Rabi: Std. Dev. | Kharif: Dimension | Kharif: Mean | Kharif: Std. Dev.
Balasore | 31 × 25         | 47.8386    | 20.84           | 31 × 35           | 81.6430      | 43.7791
Cuttack  | 31 × 25         | 44.7391    | 18.43           | 31 × 35           | 80.6577      | 50.6339
Puri     | 31 × 25         | 47.6373    | 25.77           | 31 × 35           | 78.9684      | 44.2095
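The alternative hypothesis is

$$H_1: \beta_i \neq 0 \ \text{for at least one value of } i \tag{2}$$

Then, assuming the null hypothesis is true, the test statistic is computed as

$$F = \frac{MSM}{MSE} = \frac{\text{Explained Variance}}{\text{Unexplained Variance}} \tag{3}$$

where $MSM = \frac{SSM}{DFM}$ and $MSE = \frac{SSE}{DFE}$, with
MSM = Mean Squares for Model
MSE = Mean Squares for Error
SSM = Corrected Sum of Squares for Model
SSE = Sum of Squares for Error
DFM = Corrected Degrees of Freedom for Model
DFE = Degrees of Freedom for Error
Then, using an F-table or statistical software, the critical value of F for the degrees of freedom
(DFM, DFE) is found at the chosen confidence level.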
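As a hedged illustration of Equation (3), the sketch below computes the F statistic for a simple linear regression of the yield on one feature at a time (so $p = 2$, DFM = 1 and DFE = $n - 2$) and ranks the features by that statistic. Scoring each feature separately is an assumption about how an F-Test is turned into a ranking; the function name `f_statistic`, the feature names and the numbers are invented for the example.

```python
def f_statistic(x, y):
    """F = MSM / MSE for a simple linear regression of y on a single feature x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    dfm = 1       # p - 1, with p = 2 coefficients (intercept and slope)
    dfe = n - 2   # n - p
    if sse == 0:
        return float("inf")  # perfect fit
    return (ssm / dfm) / (sse / dfe)


# Hypothetical data: four yearly yields and two candidate features.
yield_values = [1.0, 3.0, 2.0, 4.0]
features = {
    "t_max": [1.0, 2.0, 3.0, 4.0],
    "rainfall": [1.0, 1.0, 2.0, 2.0],
}
scores = {name: f_statistic(values, yield_values) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A larger F means the feature explains more of the variation in yield relative to the residual error, so the feature with the largest F is ranked first.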
4.3 Support vector regressor-recursive
feature elimination (SVR-RFE)
SVR-RFE is a variable selection or feature selection method. It is an optimization method for finding the
best performing feature set. It repeatedly builds a model on the current feature subset, sets aside the
weakest feature and repeats with the remaining features; finally, it ranks the features by their order of
elimination [24-26]. First, an SVM with a linear kernel is trained, and then the features are removed
recursively using the smallest ranking criterion. To generate a rank, the weight vector is calculated as
given in Equation (4).

$$W = \sum_{i=1}^{n} \alpha_i x_i y_i \tag{4}$$

where $i$ is the index of the training samples, ranging from 1 to $n$; $\alpha_i$ is the Lagrangian
multiplier estimated from the training set;
Figure 3: Graphical representation of rice production of three regions for (a) Rabi and (b) Kharif seasons.
Table 2: Range and average values of the parameters in datasets.

District | Parameter                              | Range (Rabi) | Range (Kharif) | Average (Rabi) | Average (Kharif) | Average Rice Production (Rabi) | Average Rice Production (Kharif)
Balasore | Rainfall (mm/hectare)                  | 0.0 - 431.2  | 0.0 - 696.5    | 55.2           | 280.3            | 2261.50                        | 1243.8
         | Max Temperature (°C)                   | 25 - 42      | 13 - 37.4      | 33             | 32               |                                |
         | Min Temperature (°C)                   | 9.8 - 32     | 11.9 - 28      | 21             | 25               |                                |
         | Mean Relative Humidity at 8.30AM (%)   | 53 - 81      | 35 - 88        | 68             | 79               |                                |
         | Mean Relative Humidity at 5.30PM (%)   | 45 - 87      | 34 - 89        | 66             | 78               |                                |
Cuttack  | Rainfall (mm/hectare)                  | 0.0 - 477.8  | 0 - 752.8      | 36.42          | 268.2            | 2064.71                        | 1472.5
         | Max Temperature (°C)                   | 26 - 40      | 26.8 - 38      | 31.76          | 32               |                                |
         | Min Temperature (°C)                   | 11 - 32      | 15 - 33        | 20.92          | 25               |                                |
         | Mean Relative Humidity at 8.30AM (%)   | 58 - 95.5    | 67 - 95.4      | 84.33          | 87               |                                |
         | Mean Relative Humidity at 5.30PM (%)   | 29.3 - 89    | 12 - 90        | 50.27          | 73               |                                |
Puri     | Rainfall (mm/hectare)                  | 0.0 - 735.5  | 0.0 - 826.5    | 27.13          | 247              | 2053                           | 1240
         | Max Temperature (°C)                   | 25 - 35.3    | 20.8 - 40.8    | 30.43          | 32               |                                |
         | Min Temperature (°C)                   | 12 - 29      | 15.2 - 29      | 23.49          | 26               |                                |
         | Mean Relative Humidity at 8.30AM (%)   | 70 - 92      | 66 - 92        | 80.74          | 83               |                                |
         | Mean Relative Humidity at 5.30PM (%)   | 64 - 90      | 17 - 91        | 78.87          | 81               |                                |
$x_i$ is the feature vector (the gene expression vector in the original formulation) for sample $i$ and
$y_i$ is the class label of sample $i$ ($y_i \in [-1, +1]$).
4.4 Extreme Learning Machine (ELM)
Artificial Neural Network (ANN) is one of the best examples of a classification and regression technique,
and it works on the back-propagation method, in which the weights are adjusted by trial and error.
However, ANN has various disadvantages, such as local minima, over-fitting and long training time
[38-40]. To overcome these problems and the problem of memory requirements, Huang et al. [29] proposed a
new method for classification and regression problems based on the least squares algorithm, known as
ELM. ELM has a unique minimum solution, with both the smallest training error and the smallest weight
norm, and does not need a stopping criterion.
ELM is a neural learning algorithm introduced to improve the efficiency of the Single Layer Feed Forward
Neural Network (SLFN). This section will briefly explain the
Algorithm 1: SVR-RFE [21, 22, 23]
Input: Initial feature subset, F = {1, 2, ..., n}
Output: Rank list according to smallest weight criterion, R.
1 Set R = {}
2 Repeat steps 3-8 until F is empty
3   Train the SVM using F.
4   Compute the weight vector using (4)
5   Compute the ranking criteria, Rank = W^2
6   Rank the features in sorted order, NewRank = Sort(Rank)
7   Update the feature rank list: R = R + F(NewRank)
8   Eliminate the feature with the smallest rank: F = F - F(NewRank)
working principle of ELM [30, 31, 32]. Let the training set consist of $N$ samples
$(X_j, Y_j) \in R^n \times R^m$, $j = 1, 2, \dots, N$, and let the number of hidden nodes be $M$. The
output of the SLFN is formulated in (5).

$$\mathrm{output}_k = \sum_{j=1}^{M} \beta_j f_j(X_k) = \sum_{j=1}^{M} \beta_j f(X_k; a_j, b_j), \quad k = 1, 2, \dots, N \tag{5}$$

where $\mathrm{output}_k$ is the output vector with respect to the input sample $X_k$ and
$f(X_k; a_j, b_j)$ is the activation function. $a_j$ and $b_j$ are the randomly generated learning
parameters of the $j$-th hidden node, $\beta_j$ is the output weight of that node, and (5) can be
compactly written as

$$H \beta = \mathrm{CalculatedOutput} \tag{6}$$

Here,

$$H = \begin{bmatrix} f(a_1 \cdot x_1 + b_1) & \cdots & f(a_M \cdot x_1 + b_M) \\ \vdots & \ddots & \vdots \\ f(a_1 \cdot x_N + b_1) & \cdots & f(a_M \cdot x_N + b_M) \end{bmatrix}_{N \times M}, \quad
\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_M^T \end{bmatrix}_{M \times 1}, \quad
\mathrm{CalculatedOutput} = \begin{bmatrix} \mathrm{Output}_1^T \\ \vdots \\ \mathrm{Output}_N^T \end{bmatrix}_{N \times 1}$$

where $H$ is the hidden layer output matrix. Equation (6) is a linear system, so the output weights can
be determined analytically by finding its least squares solution, which is defined in (7).

$$\hat{\beta} = \mathrm{inv}(H' H)\, H'\, \mathrm{trainoutput} \tag{7}$$

where trainoutput is the output of the training data and $H'$ is the transpose of $H$. The benefit of ELM
is that the output weights are calculated analytically by a mathematical transformation, which avoids a
lengthy training process, and no iterative adjustment of the training parameters is required.
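The training rule in Equations (5) to (7) can be sketched in a few lines. The code below is a dependency-free illustration and not the authors' implementation: the names `train_elm`, `predict_elm` and `solve` are hypothetical, the multiquadric activation is assumed to be $f(z) = \sqrt{1 + z^2}$ applied to $z = a_j \cdot x + b_j$, the hidden parameters are drawn uniformly from $[-1, 1]$, and the toy data are invented. It solves the normal equations $(H'H)\beta = H' \cdot \mathrm{trainoutput}$ by Gaussian elimination, which is equivalent to Equation (7) when $H'H$ is invertible.

```python
import math
import random


def multiquadric(z):
    """Multiquadric activation, assumed here to be sqrt(1 + z^2)."""
    return math.sqrt(1.0 + z * z)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def hidden_matrix(xs, a, b):
    """H[i][j] = f(a_j . x_i + b_j), an N x M matrix as in Equation (6)."""
    return [[multiquadric(sum(w * v for w, v in zip(a_j, x)) + b_j)
             for a_j, b_j in zip(a, b)] for x in xs]


def train_elm(xs, ys, n_hidden, seed=0):
    """Draw random hidden parameters, then solve for the output weights beta."""
    rng = random.Random(seed)
    n_inputs = len(xs[0])
    a = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = hidden_matrix(xs, a, b)
    # Normal equations (H'H) beta = H' y, i.e. Equation (7) without an explicit inverse.
    hth = [[sum(row[i] * row[j] for row in h) for j in range(n_hidden)]
           for i in range(n_hidden)]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve(hth, hty)
    return a, b, beta


def predict_elm(xs, model):
    a, b, beta = model
    return [sum(bj * hj for bj, hj in zip(beta, row))
            for row in hidden_matrix(xs, a, b)]


# Toy data (invented): one input feature, six samples, three hidden nodes.
xs = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
ys = [1.0, 1.2, 1.5, 1.9, 2.4, 3.0]
model = train_elm(xs, ys, n_hidden=3, seed=0)
preds = predict_elm(xs, model)
sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
```

The sketch keeps the number of hidden nodes below the number of samples so that $H'H$ can be inverted; in practice a pseudo-inverse or a library least squares solver would be a more numerically robust choice than explicit normal equations.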
4.5 Fusion strategies
The Borda Count [41, 42] is a well regarded voting system. The voters rank the candidates in order of
preference, and points are then assigned from the rankings: the candidate ranked last scores one point,
the candidate ranked next-to-last scores two points, and so on. The candidate who secures the most points
is declared the winner. There are various other standard voting systems, such as the alternative vote and
the single transferable vote, but the advantages of Borda Count are that all the MPs have the support of
a majority of their voters and that the parties nominate good candidates. This method is a kind of group
consensus function which maps the individual input rankings to a combined ranking, leading to a most
appropriate and relevant decision making process. With respect to machine learning, the Borda Count for a
class is defined as the sum of the number of classes ranked below that class by each classifier. The
magnitude of the Borda Count reflects the level of agreement that the input pattern belongs to the
considered class. The main advantage of this method is that it is simple to implement and does not
require any training.
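A minimal sketch of Borda Count fusion over feature rankings follows. It uses the voting convention described above (last place scores one point, first place scores as many points as there are features); the function name `borda_fuse`, the alphabetical tie-break and the three example rankings are assumptions made for illustration and are not taken from the paper's results.

```python
def borda_fuse(rankings):
    """Fuse several rankings (best feature first) into one ranking by Borda Count.

    In a ranking of n features, the last feature scores 1 point and the first
    scores n points; points are summed over all rankings.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Highest total first; ties are broken alphabetically (an assumption).
    fused = sorted(scores, key=lambda f: (-scores[f], f))
    return fused, scores


# Hypothetical rankings from three methods, best feature first.
random_forest_rank = ["rh_am", "t_min", "t_max", "rain"]
svr_rfe_rank = ["t_min", "rh_am", "rain", "t_max"]
f_test_rank = ["rh_am", "t_max", "t_min", "rain"]

fused, scores = borda_fuse([random_forest_rank, svr_rfe_rank, f_test_rank])
top_two = fused[:2]
```

Because only rank positions are used, the three methods can be fused even though their raw importance scores are on different scales.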
4.6 Validation strategies adopted
$R^2$ is a statistical measure of how well the regression line fits the data [43]. It provides some
knowledge of the goodness of fit of a model [35, 36]. $R^2$ is the proportion of the variation in the
response variable that is explained by a linear model, and its value lies between 0 and 100%, or 0 and 1.
A value of 0% (or 0) indicates that the model explains none of the variability of the response data
around its mean, and a value of 100% (or 1) indicates that the model explains all of that variability.
The statistic thus measures how well the regression predictions approximate the real data points; an
$R^2$ of 100% (or 1) indicates that the regression predictions fit the data perfectly.
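For reference, the usual definition is $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared differences between the actual and predicted values and $SS_{tot}$ is the sum of squared differences between the actual values and their mean. The short sketch below computes it directly; the function name `r2_score` and the sample numbers are illustrative assumptions.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented example: four observed yields and the predictions of a fitted line.
actual = [1.0, 3.0, 2.0, 4.0]
predicted = [1.3, 2.1, 2.9, 3.7]
r2 = r2_score(actual, predicted)

# Two reference points: a perfect prediction and always predicting the mean.
r2_perfect = r2_score(actual, actual)
r2_mean = r2_score(actual, [2.5, 2.5, 2.5, 2.5])
```

Predicting the mean for every sample gives a score of 0, and a perfect prediction gives 1, which matches the interpretation given above.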
5 Experimentation and model
evaluation
5.1 Experimental setup
In this work, all the implementations have been carried out in the Python programming environment on the
Linux operating system, with a minimum hardware configuration of 4GB RAM and a 100GB hard disk. First,
different activation functions are tested for suitability to our problem domain. Then, different feature
ranking strategies are tested with ELM. Finally, the proposed fusion of feature rankings is tested. The
parameters used for experimentation are listed in Table 3.
5.2 Parameters used
Table 3 gives the details of the parameters used for the
implementation.
5.3 Feature ranking methods
Here, three different feature ranking methods, namely Random Forest, SVR-RFE and F-Test, have been
applied for regression. In the literature, these methods are mainly used for ranking genes in gene
expression datasets; in this study, the same methods are used to rank the features of rice crop
prediction datasets. The methodology works in three steps: (a) first, the three ranking algorithms output
three different ranks for each feature of the dataset; (b) secondly, a feature fusion method based on
Borda Count is used to evaluate the final rank of each feature; and (c) finally, these newly ranked
features are evaluated by the ELM based regressor to measure the importance of each feature. The accuracy
of the ELM regressor is calculated while removing the features from the datasets one by one. Finally,
with respect to maximum accuracy, the top five ranked features are selected, which decide the importance
of climatic parameters in rice crop production for both the Rabi and the Kharif season in all the
districts taken for the analysis. Figure 4 and Figure 5 show the features arranged in descending order of
their $R^2$ scores, which measure the importance of the features, after applying the Random Forest
feature ranking method to the Rabi and Kharif seasons respectively for the Balasore, Cuttack and Puri
districts. From Figure 4, for the Rabi season, it can be observed that for Balasore district features 21,
18, 13 and 11 have approximate importance scores from 0 to 13, whereas features 7 and 12 have very low
importance scores and the rest are moderate. For Cuttack district, features 0 (first feature) and 7 have
approximate importance scores from 0 to 14, whereas features 3, 17 and 9 have very low importance scores.
Similarly, for Puri district, feature 21 has very high importance and features 19, 17, 23, 20, 8, 15, 22,
18 and 7 have moderate scores. The remaining features can be ignored because of their very low importance
scores.
Similarly, for the Kharif season, Figure 5 shows that, for Balasore district, feature 5 has the highest importance score of 8, feature 5 [sic] has the lowest importance score, and the rest lie within the range of 2 to 6. For Cuttack district, features 1, 24, 8, 9, 23, 14 and 7 have importance scores in the approximate range 0 to 7, and the remaining features have very low importance scores. Similarly, for Puri district, features 8 and [...] have very high importance, with scores from 0 to 16, and features 4, 10 and 9 have moderate scores. The remaining features can be ignored because their importance scores are very low.
Figure 6 and Figure 7 show the features with respect to their $R^2$ scores, which measure the importance of the features after applying the SVR-RFE feature ranking method to the Rabi and Kharif seasons, respectively, for the Balasore, Cuttack and Puri districts. From Figure 6, for the Rabi season, feature 23 has the 1st rank, features 15, 9, 21 and 14 follow with better ranks, a few more show moderate ranks, and feature 4 has the lowest rank, which makes it a non-significant feature. In Cuttack district, feature 7 has the highest rank and feature 17 the lowest. Similarly, in Puri district, feature 19 has a very high rank, features 17, 11, 15 and 23 have better ranks, and feature 4 has low importance. In Figure 7, for Balasore district, feature 27 has the highest rank, features 25 and 9 are next best, and feature 0 (the first feature) has a low rank and hence little impact. For Cuttack district, feature 16 is of great importance, whereas feature 34 is of little or no importance and can therefore be ignored. In Puri district, feature 29 shows the highest rank and features 23, 9, 8 and 20 also show better scores, but feature 33 has the lowest rank.
The importance of features for the Rabi and Kharif seasons using the F-Test for regression is plotted in Figure 8 and Figure 9, respectively. For the Rabi season (Figure 8), in Balasore district feature 21 has the highest score, features 22, 24, 8, 7, 5 and 0 have the lowest scores, features 4, 13 and 19 have negligible scores, and the rest have moderate scores. For Cuttack district, feature 6 has the highest score, while features 1, 5, 8, 9, 13, 17 and 18 are of no importance and do not contribute to further processing. Similarly, for Puri district, features 17 and 21 have the highest scores, features 1, 4, 5, 13 and 23 have the lowest scores, and the remaining features also do not show good scores. From Figure 9, for the Kharif season, features 16, 1 and 8 have the highest importance for the Balasore, Cuttack and Puri districts, respectively. Features 7, 8, 12 and 31 for Balasore, features 2, 5, 6, 10 and 12 for Cuttack, and features 6, 15, 20, 24, 28 and 31 for Puri show the lowest importance scores.
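The univariate F-Test ranking discussed above can be illustrated with a minimal sketch. It assumes that the F statistic of each feature is derived from its Pearson correlation with the yield, which is a common way to compute a single-predictor regression F-test. The function names `f_test_scores` and `rank_features`, the synthetic data and the 0-based feature indices are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def f_test_scores(X, y):
    """Univariate F statistic of each column of X against the target y.

    For a single predictor, F = r^2 / (1 - r^2) * (n - 2), where r is the
    Pearson correlation between the feature and the target.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    r2 = np.clip(r ** 2, 0.0, 1.0 - 1e-12)  # avoid division by zero when |r| = 1
    return r2 / (1.0 - r2) * (n - 2)


def rank_features(scores):
    """Feature indices ordered from most to least important."""
    return [int(i) for i in np.argsort(-np.asarray(scores), kind="stable")]


# Synthetic stand-in for one district dataset: 30 samples x 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=30)

scores = f_test_scores(X, y)
ranking = rank_features(scores)
print(ranking)
```

In this synthetic example the target depends most strongly on feature 2, so that feature should come first in `ranking`; with real climatic data the scores would be read the same way as the bars in Figure 8 and Figure 9.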
5.4 Fusion of feature ranking methods
Here, a multiple-ranking fusion scheme is proposed. In this scheme, individual rankings are first obtained using different ranking methods, and those ranked features are then combined to obtain the final ranking of features. The fusion method used here is the Borda count, a popular and effective method for combining rankings.
Mathematically, the fusion strategy can be stated as follows. Let the dataset be defined as $DS = \{x_1, x_2, x_3, \ldots, x_n\}$, where $x_1, x_2, x_3, \ldots, x_n$ represent the $n$ features of the dataset, and let $r_1$, $r_2$ and $r_3$ be the three ranking methods used. The proposed fusion of ranking strategy can then be described as shown in Figure 10. The importance of features for both Rabi and Kharif seasons using fusion of ranking strategy for regression has
Figure 4: Feature ranking using Random Forest for Rabi season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.
Figure 5: Feature ranking using Random Forest for Kharif season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.
Figure 6: Feature ranking using SVR-RFE for Rabi season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.
Figure 7: Feature ranking using SVR-RFE for Kharif season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.
Table 3: Parameter set up for ranking methods.
Techniques | Parameters
Random Forest for feature ranking | No. of estimators = 1000, criterion = mean square error
SVR-RFE for feature ranking | C = 1.0 (penalty parameter), base estimator = SVR, kernel = linear, no. of features to select = 1, step = 1
F-Test for feature ranking | Score_function = Ftest, no. of features = 1
Extreme Learning Machine | No. of hidden layers = 500, activation function = multiquadric
Figure 8: Feature ranking using F-Test for Regression for Rabi season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.
been plotted in Figure 11 and Figure 12 respectively and
the ﬁve top ranked features obtained are listed in Table 4.
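A minimal sketch of Borda count fusion over three rankings is given here. It assumes the standard Borda rule, in which a feature at 0-based position p in a ranking of n features receives n - p - 1 points and the fused ranking orders the features by total points. The exact point scheme and tie-breaking used in the paper are not spelled out in this section, so breaking ties by feature index is an assumption, and the three example rankings `r1`, `r2` and `r3` are hypothetical.

```python
def borda_fuse(rankings):
    """Fuse several rankings (best feature first) with the Borda count.

    Returns the fused ranking and the total points per feature.
    """
    n = len(rankings[0])
    points = {feature: 0 for feature in rankings[0]}
    for ranking in rankings:
        if sorted(ranking) != sorted(points):
            raise ValueError("every ranking must order the same features")
        for position, feature in enumerate(ranking):
            points[feature] += n - position - 1
    # Highest total first; ties broken by feature index (an assumption).
    fused = sorted(points, key=lambda f: (-points[f], f))
    return fused, points


# Hypothetical rankings of five features from three ranking methods.
r1 = [3, 0, 2, 1, 4]
r2 = [0, 3, 1, 2, 4]
r3 = [3, 2, 0, 4, 1]

fused, points = borda_fuse([r1, r2, r3])
top_k = fused[:3]
print(fused, points, top_k)
```

Here feature 3 collects 4 + 3 + 4 = 11 points and is ranked first. Taking `fused[:5]` on the real rankings would correspond to the top five features reported per district and season.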
5.5 Extreme learning machine regressor
In this work, all variants of the ELM regressor were first evaluated with different activation functions: tanh, sine, tribas, inv-tribas, sigmoid, hardlim, softlim, gaussian, multiquadric and inv-multiquadric. Among these, the tribas, inv-tribas, hardlim, softlim and gaussian functions give negative or low $R^2$ scores, whereas the scores of the tanh, sine, sigmoid, multiquadric and inv-multiquadric functions are found to be about 98%. Figure 13 and Figure 14 plot, and Table 5 and Table 6 list, the $R^2$ scores of the different activation functions for ELM when predicting the Rabi and Kharif rice crops, respectively. Of these ten activation functions, multiquadric has the highest $R^2$ score across all districts for both the Rabi and Kharif seasons. Hence, the multiquadric function is used in the experiments.
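To make the role of the activation function concrete, the following is a minimal sketch of an ELM regressor with a multiquadric hidden layer. It assumes the multiquadric activation takes the form sqrt(1 + z^2), that the hidden weights are drawn from a standard normal distribution, and that the value 500 in Table 3 refers to the number of hidden units; the class `ELMRegressor`, the helper `r2_score` and the synthetic data are illustrative and are not the authors' implementation.

```python
import numpy as np


def multiquadric(z):
    """Multiquadric activation, assumed here to be sqrt(1 + z^2)."""
    return np.sqrt(1.0 + z ** 2)


class ELMRegressor:
    """Single-hidden-layer ELM: random hidden weights, least-squares output."""

    def __init__(self, n_hidden=500, activation=multiquadric, seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return self.activation(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Hidden weights and biases are random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights from the Moore-Penrose pseudo-inverse of H.
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; negative when worse than the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic regression problem: 40 samples x 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=40)

model = ELMRegressor(n_hidden=500).fit(X, y)
train_r2 = r2_score(y, model.predict(X))
print(train_r2)
```

The $R^2$ here is computed on the training data only, so it is optimistic. The definition also shows why some activation functions can produce negative scores: whenever the residual sum of squares exceeds the total sum of squares, $R^2$ drops below zero.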
5.6 ELM-Regressor for varying number of features
Once the newly ranked features have been obtained from the proposed feature fusion strategy and the activation function (multiquadric) to be used by ELM has been identified, the accuracy of the ELM regressor is calculated while removing features from the datasets one at a time, as shown in Figure 15 and Figure 16.
Table 7 and Table 8 give the prediction accuracy obtained by the multiquadric-based ELM regressor for the Rabi and Kharif seasons, respectively, for all three coastal regions as the features are removed one by one. The feature counts that give above 99% accuracy are coded in red, green and blue for the Balasore, Cuttack and Puri districts, respectively, for easier visualization. From Table 7, it is evident that, for Balasore, reducing the number of features from 25 to 20, 15, 14, 13, 12, 8, 6 and even 3 gives above 99% prediction accuracy; for Cuttack, the feature counts giving above 99% prediction accuracy are 20, 10, 9 and 5; and, similarly, for Puri, 18, 15, 11, 10, 7, 3 and 2 features give prediction accuracy above 99%.
Similarly, from Table 8, it can be observed that, for Balasore, reducing the number of features from 35 to 34, 33, 30, 26, 23, 22, 20, 18, 17, 16 and 15 gives above 99% prediction accuracy; for Cuttack, only 26, 18, 11, 6, 4, 3, 2 and 1 features give below 99% prediction accuracy and the rest give above 99%; and, similarly, for Puri, 33, 30, 27, 26, 24, 23, 18, 16, 11, 10, 9, 8, 7, 4, 3, 2 and 1 features give below 99% prediction accuracy. From these two tables and figures, it can be concluded that fewer features are needed to predict the crop yield for the Rabi season than for the Kharif season.
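The threshold reading of Table 7 can be checked with a short sketch. The scores below are transcribed from the Table 7 rows for 1 to 10 features (Rabi season); the helper names `counts_above` and `smallest_count_above` and the strict "greater than 0.99" comparison are assumptions made for illustration.

```python
# Scores from Table 7 for 10 down to 1 features (Rabi season).
TABLE7_EXCERPT = {
    "Balasore": [0.8894488099, 0.9765177632, 0.9990643405, 0.9735100076,
                 0.9968633499, 0.9135013909, 0.9815795296, 0.9992149872,
                 0.9183391785, 0.3091946087],
    "Cuttack": [0.9943953688, 0.999323231, 0.9784069021, 0.9850878134,
                0.9728457206, 0.9969514165, 0.836152001, 0.9126616196,
                0.9892897973, 0.7773622128],
    "Puri": [0.992055051, 0.971105448, 0.9623709905, 0.9978247397,
             0.9757838604, 0.9785948706, 0.9720037388, 0.998992344,
             0.9945608993, 0.5590549638],
}


def counts_above(scores_desc, threshold=0.99, max_features=10):
    """Feature counts whose score exceeds the threshold, largest first.

    scores_desc[i] is the score obtained with (max_features - i) features.
    """
    return [max_features - i
            for i, score in enumerate(scores_desc)
            if score > threshold]


def smallest_count_above(scores_desc, threshold=0.99, max_features=10):
    """Smallest feature count that still exceeds the threshold, or None."""
    hits = counts_above(scores_desc, threshold, max_features)
    return min(hits) if hits else None


above = {d: counts_above(s) for d, s in TABLE7_EXCERPT.items()}
minimum = {d: smallest_count_above(s) for d, s in TABLE7_EXCERPT.items()}
print(above)
print(minimum)
```

For this excerpt the smallest qualifying counts are 3, 5 and 2 for Balasore, Cuttack and Puri, which agrees with the fusion-strategy column of Table 9.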
5.7 Result analysis
After obtaining the top five ranked features and the feature counts that give above 99% prediction accuracy for both seasons, this section validates the proposed fusion of feature ranking strategy against Random Forest, SVR-RFE and F-Test, each combined with the multiquadric-based ELM. The aim is to assess the impact of the fusion-based strategy relative to the non-fusion ranking strategies in terms of the number of features needed to achieve 99% prediction accuracy, as shown in Table 9 and Table 10 for the Rabi and Kharif season crops.
Figure 9: Feature ranking using F-Test for Regression for Kharif season in three districts: (a) Balasore, (b) Cuttack, (c) Puri.
Table 4: Five top ranked features extracted using feature ranking based on Borda Count feature fusion strategy of three districts of Rabi and Kharif season.
Season | Balasore Feature No | Balasore Feature Name | Cuttack Feature No | Cuttack Feature Name | Puri Feature No | Puri Feature Name
Rabi | 23 | May RH-8:30 AM | 12 | Mar Min Temp | 22 | Apr RH-5:30 PM
Rabi | 15 | Jan RH-8:30 AM | 21 | Apr RH-8:30 AM | 24 | May RH-5:30 PM
Rabi | 9 | May Max Temp | 7 | Mar Max Temp | 11 | Feb Min Temp
Rabi | 11 | Feb Min Temp | 23 | May RH-8:30 AM | 15 | Jan RH-8:30 AM
Rabi | 14 | May Min Temp | 16 | Jan RH-5:30 PM | 23 | May RH-8:30 AM
Kharif | 27 | Sep RH-8:30 AM | 31 | Nov RH-8:30 AM | 12 | Nov Max Temp
Kharif | 25 | Aug RH-8:30 AM | 21 | June RH-8:30 AM | 19 | Nov Min Temp
Kharif | 9 | Aug Max Temp | 23 | July RH-8:30 AM | 9 | Aug Max Temp
Kharif | 21 | June RH-8:30 AM | 20 | Dec Min Temp | 25 | Aug RH-8:30 AM
Kharif | 26 | Aug RH-5.30 AM | 16 | Aug Min Temp | 20 | Dec Min Temp
Table 5: $R^2$ score of all activation functions of ELM for Rabi seasons.
ELM Activation Function | Balasore | Cuttack | Puri
TANH | 0.998093743193 | 0.994257107884 | 0.9989356101
SINE | 0.996717695318 | 0.99942453749 | 0.983896079504
SIGMOID | 0.987233114958 | 0.998330563403 | 0.999426924698
MULTIQUADRIC | 0.999957522834 | 0.999818219303 | 0.999726152755
INV-MULTIQUADRIC | 0.958613787069 | 0.935259681129 | 0.966708028068
TRIBAS | -12.4064777143 | -11.4144046567 | -9.03257377773
INV-TRIBAS | 0.0 | -2.22044604925e-16 | -2.22044604925e-16
HARDLIM | 0.0 | -2.22044604925e-16 | -2.22044604925e-16
SOFTLIM | 0.0 | -2.22044604925e-16 | -2.22044604925e-16
GAUSSIAN | -1.13177301381 | -0.59754939441 | -0.0727264050254
Table 6: $R^2$ score of all activation functions of ELM for Kharif seasons.
ELM Activation Function | Balasore | Cuttack | Puri
TANH | 0.999802124998 | 0.900092462207 | 0.941367554859
SINE | 0.981838265602 | 0.983905092261 | 0.854629459493
SIGMOID | 0.993512967504 | 0.936873558516 | 0.964438947941
MULTIQUADRIC | 0.999993905565 | 0.999624110222 | 0.991070886794
INV-MULTIQUADRIC | 0.979667648088 | 0.999512861069 | 0.960984673885
TRIBAS | -21.7579913615 | -37.3921891299 | -8.74341923183
INV-TRIBAS | 0.103557054691 | -4.4408920985e-16 | 0.0669379515163
HARDLIM | 0.0 | -4.4408920985e-16 | -8.881784197e-16
SOFTLIM | 0.103557054691 | -4.4408920985e-16 | 0.0669379515163
GAUSSIAN | -0.308828188795 | -1.77193827293 | 0.549504155107
Figure 10: Fusion of feature ranking strategy.
For the Rabi season crop, Table 9 compares the proposed fusion-based ranking strategy with the non-fusion strategies. The number of top-ranked features needed for a predictive accuracy above 99% is 7, 10 and 6 for ELM with Random Forest, 5, 9 and 4 for ELM with SVR-RFE, and 9, 11 and 8 for ELM with F-Test, whereas the proposed fusion strategy needs only 3, 5 and 2 features for the Balasore, Cuttack and Puri districts, respectively. From Table 4, which lists the top five ranked features extracted by the fusion strategy, it can be concluded that the crop yield for Balasore district in the Rabi season can be accurately predicted from only three features out of RH at 5.30 PM of March, April and May, and RH of February at 8.30 AM and 5.30 PM, because these affect the rice crop yield the most. The five features that affect the rice yield during the Rabi season for Cuttack district are RH of March, April and May, and the minimum and maximum temperature of May. Similarly, the two features that affect the crop yield of Puri district during the Rabi season are drawn from five features: RH of March and May, and the minimum temperature of March and May. From this observation, it can be said that the features containing RH at 8.30 AM
Table 7: Performance of ELM with varying number of features for Rabi crop prediction.
No. of Features | Balasore | Cuttack | Puri
25 | 0.9721452943 | 0.983009018 | 0.8918569778
24 | 0.8820921735 | 0.9211651663 | 0.899522749
23 | 0.9723284733 | 0.9717225517 | 0.8644802695
22 | 0.9844701984 | 0.9800205232 | 0.9897965576
21 | 0.9668404406 | 0.9551503977 | 0.9665234947
20 | 0.9996177026 | 0.9999348622 | 0.9869710443
19 | 0.9592805399 | 0.9127841794 | 0.9398241394
18 | 0.8356003816 | 0.9081500374 | 0.9942342785
17 | 0.9511241577 | 0.9780000307 | 0.9288099884
16 | 0.9354752886 | 0.9358388115 | 0.9363188192
15 | 0.9930751632 | 0.9172893274 | 0.9928662122
14 | 0.9901838183 | 0.9617978239 | 0.9512236619
13 | 0.9999946834 | 0.9896162607 | 0.9585303661
12 | 0.9934721511 | 0.9027066465 | 0.9372090401
11 | 0.9594424161 | 0.9510466357 | 0.9919344792
10 | 0.8894488099 | 0.9943953688 | 0.992055051
9 | 0.9765177632 | 0.999323231 | 0.971105448
8 | 0.9990643405 | 0.9784069021 | 0.9623709905
7 | 0.9735100076 | 0.9850878134 | 0.9978247397
6 | 0.9968633499 | 0.9728457206 | 0.9757838604
5 | 0.9135013909 | 0.9969514165 | 0.9785948706
4 | 0.9815795296 | 0.836152001 | 0.9720037388
3 | 0.9992149872 | 0.9126616196 | 0.998992344
2 | 0.9183391785 | 0.9892897973 | 0.9945608993
1 | 0.3091946087 | 0.7773622128 | 0.5590549638
and 5.30 PM most strongly affect the rice crop yield in all three districts for the Rabi season crop.
Similarly, analysis of Table 10 for the Kharif season across all district datasets shows that Kharif season crops need more parameters or features to be considered than Rabi season crops, which is evident from Table 8 and Table 10. The top 15, 5 and 5 ranked features are needed to accurately predict the rice yield during this season for the Balasore, Cuttack and Puri districts, respectively. From Table 4, it can be established that, for Balasore district, 15 features are affecting
Figure 11: Feature ranking based on Borda Count based feature fusion strategy for Rabi season in three districts (panels (a), (b) and (c)).
Figure 12: Feature ranking based on Borda Count based feature fusion strategy for Kharif season in three districts (panels (a), (b) and (c)).
Figure 13: Performance comparison of different activation functions for ELM for Rabi crop prediction in three different districts.
Figure 14: Performance comparison of different activation functions for ELM for Kharif crop prediction in three different districts.
Figure 15: Performance comparison of ELM based Regressor for rice crop prediction (Rabi season) with varying number
of features.
Figure 16: Performance comparison of ELM based Regressor for rice crop prediction (Kharif season) with varying number
of features.
Table 8: Performance of ELM with varying number of features for Kharif crop prediction.
No. of Features | Balasore | Cuttack | Puri
35 | 0.9846787896 | 0.9957370161 | 0.9951327585
34 | 0.9998355712 | 0.9999942814 | 0.9968069715
33 | 0.9988511032 | 0.9971118552 | 0.9882099882
32 | 0.834652697 | 0.9997670449 | 0.9970224703
31 | 0.9644203951 | 0.9999974004 | 0.9995567779
30 | 0.9985549641 | 0.998758502 | 0.9855062635
29 | 0.9361953426 | 0.9990206928 | 0.9973169474
28 | 0.9134833525 | 0.9982784062 | 0.9993592848
27 | 0.9687297599 | 0.9997739351 | 0.9323478922
26 | 0.9935929806 | 0.9691971505 | 0.8970090564
25 | 0.9303826554 | 0.9983009945 | 0.9976403045
24 | 0.9738004793 | 0.9999393614 | 0.8613974874
23 | 0.9931366576 | 0.9993454567 | 0.9194678778
22 | 0.9998339948 | 0.9970860103 | 0.9953840259
21 | 0.9838072021 | 0.9938437106 | 0.9934978926
20 | 0.9998996937 | 0.9997478388 | 0.9923063525
19 | 0.985075577 | 0.9925039823 | 0.9966797975
18 | 0.9999812113 | 0.9850548875 | 0.9875049342
17 | 0.991885116 | 0.9977728244 | 0.9982482335
16 | 0.9937114339 | 0.9956994688 | 0.971840716746
15 | 0.9975411687 | 0.9987271376 | 0.9994908157
14 | 0.9855317118 | 0.9991787244 | 0.9958694953
13 | 0.8924205418 | 0.9997424156 | 0.9996766792
12 | 0.8646878928 | 0.9986409272 | 0.9999242476
11 | 0.8996188387 | 0.9760167761 | 0.9766086922
10 | 0.7759922185 | 0.9961673342 | 0.9402894268
9 | 0.7333665426 | 0.9999887705 | 0.9782569996
8 | 0.8055817301 | 0.9990313839 | 0.9314026123
7 | 0.6255842427 | 0.9922110063 | 0.9736563656
6 | 0.7024204135 | 0.9695098805 | 0.9997970624
5 | 0.5146431718 | 0.9919755462 | 0.9917328613
4 | 0.6254793909 | 0.9292500719 | 0.9863601838
3 | 0.7288521573 | 0.9570653424 | 0.9075062814
2 | 0.5854590008 | 0.83996043 | 0.8564995075
1 | 0.4545559232 | 0.7583238157 | 0.67564097
the crop yield, of which only the top five, such as RH of October, November and December at 8.30 AM and 5.30 PM, are shown owing to limited space. The features affecting the rice yield of Cuttack district are RH of July, September and October at 8.30 AM and 5.30 PM, and the minimum temperature during September and November. For Puri district, the five features that affect the rice yield are RH of June, August, September and December, mostly at 5.30 PM and at 8.30 AM only in December, and the minimum temperature during October. From this, it can be concluded that the features that most affect rice yield during the Kharif season are RH at 8.30 AM and 5.30 PM for all three districts, as in the Rabi season.
5.8 Statistical validation
The paired T-test is one method for assessing the significance of the proposed fusion of feature ranking approach. The outcomes produced by ELM-SVR-RFE were compared with those of the proposed approach over five independent runs using the top five ranked features. Only ELM-SVR-RFE is considered for the paired test because it gives better results than the other basic feature ranking based methods. The null hypothesis was that there is no difference between the outcomes of the two methods. The results for the Rabi and Kharif seasons are shown in Table 11 and Table 12, respectively. From these tables, the null hypothesis is rejected, with average p-values of 0.0023, 0.0021 and 0.0044 for the Balasore, Puri and Cuttack districts in the Rabi season and 0.0335, 0.0221 and 0.0450 for the same three districts in the Kharif season. Because these values are close to zero, they strengthen the argument that the proposed fusion of feature ranking approach performs better than the methods based on a single feature ranking.
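The paired test described above can be sketched in plain Python as follows. The two score lists are hypothetical stand-ins for five runs of each method, not the paper's results, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule, which is an implementation choice made here to keep the example free of external libraries.

```python
import math


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # sample variance
    # Note: var_d must be non-zero, i.e. the differences must not all be equal.
    return mean_d / math.sqrt(var_d / n), n - 1


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |t|]."""
    t = abs(t)
    h = t / steps
    # Composite Simpson's rule; steps must be even.
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


# Hypothetical R^2 scores over five runs (illustrative values only).
fusion_scores = [0.995, 0.997, 0.994, 0.998, 0.996]
single_scores = [0.991, 0.992, 0.990, 0.993, 0.991]

t_stat, df = paired_t_statistic(fusion_scores, single_scores)
p_value = two_sided_p(t_stat, df)
reject_null = p_value < 0.05  # 1 in the "Hypothesis Test" column would mean rejection
print(t_stat, df, p_value, reject_null)
```

The 0.05 significance level is an assumption, since the section does not state the level used. Under that assumption, a value of 1 in the hypothesis-test columns of Table 11 and Table 12 would correspond to `reject_null` being true.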
6 Discussion on principal ﬁndings
The principal aim of the present study is to discover the features that play an important role in, or most affect, rice crop production in both the Rabi and Kharif seasons of Balasore, Cuttack and Puri. To obtain the desired result, a fusion strategy based on feature ranking methods has been proposed and explored. This methodology works in three computational phases and not only finds the most significant features contributing to rice yield but also shows prediction accuracy of 99% and above. Based on the results obtained, the following observations are made:
• First, the raw data, including climatological characteristics and rice production per hectare, are collected for three districts and two seasons, and the range and average of the parameters of those datasets are computed to give greater insight into the features.
• The importance of the features is evaluated, and features are selected for rice yield prediction by ranking them with the Random Forest, SVR-RFE and F-Test ranking strategies. These feature ranking models rank all the features of the individual datasets for further processing.
• A feature-level fusion model using Borda Count is explored to generate a new set of ranked features from the ranked features of all three feature ranking strategies for further analysis. From this, the top five ranked features contributing most to rice yield are listed in Table 4.
• Multiquadric activation is selected from ten activation functions, based on $R^2$ score, for use by the ELM regressor to obtain rice yield predictions with above 99% predictive accuracy while removing features one by one for the two seasons and three district datasets; the results are shown in Table 7 and Table 8.
Table 9: Performance comparison of proposed feature ranking based fusion strategy with feature ranking based methods for Rabi crop prediction (number of top ranked features required to achieve a threshold accuracy of 99%).
Districts | ELM with Random Forest | ELM with SVR-RFE | ELM with F-Test | ELM with Proposed Fusion Strategy
Balasore | 7 | 5 | 9 | 3
Cuttack | 10 | 9 | 11 | 5
Puri | 6 | 4 | 8 | 2
Table 10: Performance comparison of proposed feature ranking based fusion strategy with feature ranking based methods for Kharif crop prediction (number of top ranked features required to achieve a threshold accuracy of 99%).
Districts | ELM with Random Forest | ELM with SVR-RFE | ELM with F-Test | ELM with Proposed Fusion Strategy
Balasore | 21 | 15 | 24 | 15
Cuttack | 17 | 10 | 21 | 5
Puri | 17 | 12 | 22 | 5
Table 11: Paired T-test of Rabi season datasets (all three districts) for the ELM-SVR-RFE approach and proposed Fusion based feature ranking strategy.
Runs | Balasore Hypothesis Test | Balasore p-Value | Puri Hypothesis Test | Puri p-Value | Cuttack Hypothesis Test | Cuttack p-Value
1 | 1 | 0.002374351 | 1 | 0.002145848 | 1 | 0.00442727
2 | 1 | 0.002376581 | 1 | 0.002763544 | 1 | 0.00423645
3 | 1 | 0.002432856 | 1 | 0.002658974 | 1 | 0.00445726
4 | 1 | 0.002743567 | 1 | 0.002738465 | 1 | 0.00465187
5 | 1 | 0.002267655 | 1 | 0.002748983 | 1 | 0.00435478
Table 12: Paired T-test of Kharif season datasets (all three districts) for the ELM-SVR-RFE approach and proposed Fusion based feature ranking strategy.
Runs | Balasore Hypothesis Test | Balasore p-Value | Puri Hypothesis Test | Puri p-Value | Cuttack Hypothesis Test | Cuttack p-Value
1 | 1 | 0.03316396 | 1 | 0.022158879 | 1 | 0.045061108
2 | 1 | 0.03426353 | 1 | 0.022165374 | 1 | 0.045182873
3 | 1 | 0.03326354 | 1 | 0.022263667 | 1 | 0.044762783
4 | 1 | 0.03387623 | 1 | 0.021773664 | 1 | 0.045002388
5 | 1 | 0.03316538 | 1 | 0.022377488 | 1 | 0.045288384
• Again, the proposed feature ranking based fusion strategy is compared with the individual feature ranking based methods for Rabi and Kharif season crop prediction to obtain the minimum number of features contributing to rice crop yield, as shown in Table 9 and Table 10. From those tables, it can be concluded that the features most affecting rice yield are RH at 8.30 AM and 5.30 PM for all three districts during both the Rabi and Kharif seasons, and that the minimum temperature also plays a vital role.
• The paired T-test was used to assess the significance of the proposed fusion of feature ranking approach. The outcomes obtained by ELM-SVR-RFE were compared with those of the proposed approach over five independent runs using the top five ranked features. Only ELM-SVR-RFE is considered for the paired test because it gives better results than the other basic feature ranking based methods.
• It can be observed from Table 11 and Table 12 that the null hypothesis is rejected for all three districts (Balasore, Puri and Cuttack) in both the Rabi and Kharif seasons, as the p-values are close to zero. This strengthens the argument that the proposed fusion of feature ranking approach performs better than the methods based on a single feature ranking.
7 Conclusion and future scope
In this study, an attempt has been made to determine the climatic effect on rice yield in the coastal areas of Odisha. The fusion-based strategy is the novelty of this work. The prediction model not only predicts the rice yield per hectare but is also able to identify the most significant features during the Rabi and Kharif seasons. The methodology works in three phases. In the first phase, three feature ranking approaches, Random Forest, SVR-RFE and F-Test, are applied to the two datasets of three coastal areas, and the features are ranked by each algorithm. In the second phase, Borda Count is applied as a fusion method to the ranked features from the first phase to obtain the top five features. In the third phase, the multiquadric-based ELM is used to predict the rice crop yield using the ranked features obtained from the fusion-based ranking strategy of the second phase. After applying ELM with the fusion strategy, it is seen that taking at least 3 features for Balasore, 5 features for Cuttack and 2 features for Puri gives an accuracy of 99%, whereas each individual ranking method with ELM requires more features. Finally, the statistical paired T-test is used to evaluate and validate the significance of the proposed fusion-based ranking prediction model. From the observations made during experimentation, it has been found that relative humidity, and in some cases temperature, plays a vital role in rice crop production in both the Rabi and Kharif seasons. In future, the unrelated or inconsequential factors can be dealt with by working on optimized strategies.
Acknowledgement
This work is ﬁnancially supported by the Ministry of
Science and Higher Education of the Russian Federation
(Government Order FENU-2020-0022).
References
[1] Central Soil and water Conservation Research
& Training Institute (CSWCR & TI), Vision
2030, http://www.cswcrtiweb.org/. (Ac-
cessed on 17/10/2014).
[2] Venkateswarlu, B. (2010). The 21st Dr. SP Raychaud-
huri Memorial Lecture-Climate change: Adaptation
and mitigation strategies in rainfed agriculture. Jour-
nal of the Indian Society of Soil Science, 58, S27-S35.
[3] Saseendran, S. A., Singh, K. K., Rathore, L. S.,
Singh, S. V ., & Sinha, S. K. (2000). Effects of climate
change on rice production in the tropical humid cli-
mate of Kerala, India. Climatic Change, 44(4), 495-
514.
[4] Sarker, M. A. R., Alam, K., & Gow, J. (2012). Ex-
ploring the relationship between climate change and
rice yield in Bangladesh: An analysis of time series
data. Agricultural Systems, 112, 11-16.
[5] Soora, N. K., Aggarwal, P. K., Saxena, R., Rani, S.,
Jain, S., & Chauhan, N. (2013). An assessment of re-
gional vulnerability of rice to climate change in India.
Climatic Change, 118(3-4), 683-699.
[6] Bocca, F. F., & Rodrigues, L. H. A. (2016). The effect
of tuning, feature engineering, and feature selection in
data mining applied to rainfed sugarcane yield mod-
elling. Computers and electronics in agriculture, 128,
67-76.
[7] Gilbertson, J. K., & Van Niekerk, A. (2017). Value of
dimensionality reduction for crop differentiation with
multi-temporal imagery and machine learning. Com-
puters and Electronics in Agriculture, 142, 50-58.
[8] Ma, C., Zhang, H. H., & Wang, X. (2014). Machine
learning for Big Data analytics in plants. Trends in
plant science, 19(12), 798-808.
[9] Hancer, E., Xue, B., & Zhang, M. (2018). Differential
evolution for ﬁlter feature selection based on infor-
mation theory and feature ranking. Knowledge-Based
Systems, 140, 103-119.
[10] Razmjoo, A., Xanthopoulos, P., & Zheng, Q. P.
(2017). Online feature importance ranking based on
sensitivity analysis. Expert Systems with Applica-
tions, 85, 397-406.
[11] Teisseyre, P. (2016). Feature ranking for multi-label
classiﬁcation using Markov networks. Neurocomput-
ing, 205, 439-454.
30 Informatica 45 (2021) 13–31 S. Mishra et al.
[12] Lee, J., & Kim, D. W. (2015). Fast multi-label fea-
ture selection based on information-theoretic feature
ranking. Pattern Recognition, 48(9), 2761-2771.
[13] Fakhraei, S., Soltanian-Zadeh, H., & Fotouhi, F.
(2014). Bias and stability of single variable classiﬁers
for feature ranking and selection. Expert systems with
applications, 41(15), 6945-6958.
[14] Hall, M. A., & Holmes, G. (2003). Benchmarking
attribute selection techniques for discrete class data
mining. IEEE Transactions on Knowledge and Data
engineering, 15(6), 1437-1447.
[15] Wei, C. C. (2013). Soft computing techniques in en-
semble precipitation nowcast. Applied Soft Comput-
ing, 13(2), 793-805.
[16] Cruz, R. M., Sabourin, R., & Cavalcanti, G. D.
(2017). META-DES. Oracle: Meta-learning and fea-
ture selection for dynamic ensemble selection. Infor-
mation fusion, 38, 84-103.
[17] Drami´ nski, M., Rada-Iglesias, A., Enroth, S.,
Wadelius, C., Koronacki, J., & Komorowski, J.
(2008). Monte Carlo feature selection for supervised
classiﬁcation. Bioinformatics, 24(1), 110-117.
[18] Tripoliti, E. E., Fotiadis, D. I., & Manis, G. (2013).
Modiﬁcations of the construction and voting mech-
anisms of the random forests algorithm. Data &
Knowledge Engineering, 87, 41-65.
[19] Breiman, L. (2001). Random forests. Machine learn-
ing, 45(1), 5-32.
[20] Zhang, H. R., & Min, F. (2016). Three-way
recommender systems based on random forests.
Knowledge-Based Systems, 91, 275-286.
[21] Wu, Q., Ye, Y ., Zhang, H., Ng, M. K., & Ho, S. S.
(2014). ForesTexter: an efﬁcient random forest algo-
rithm for imbalanced text categorization. Knowledge-
Based Systems, 67, 105-116.
[22] Yeh, C. C., Lin, F., & Hsu, C. Y . (2012). A hybrid
KMV model, random forests and rough set theory ap-
proach for credit rating. Knowledge-Based Systems,
33, 166-172.
[23] Liaw, A., & Wiener, M. (2002). Classiﬁcation and re-
gression by randomForest. R news, 2(3), 18-22.
[24] Yan, K., & Zhang, D. (2015). Feature selection and
analysis on correlated gas sensor data with recursive
feature elimination. Sensors and Actuators B: Chem-
ical, 212, 353-363.
[25] Shieh, M. D., & Yang, C. C. (2008). Multiclass SVM-
RFE for product form feature selection. Expert Sys-
tems with Applications, 35(1-2), 531-541.
[26] Mishra, S., & Mishra, D. (2015). SVM-BT-RFE: An
improved gene selection framework using Bayesian
T-test embedded in support vector machine (recursive
feature elimination) algorithm. Karbala International
Journal of Modern Science, 1(2), 86-96.
[27] Xu, Q., Kamel, M., & Salama, M. M. (2004,
September). Signiﬁcance test for feature subset selec-
tion on image recognition. In International Confer-
ence Image Analysis and Recognition (pp. 244-252).
Springer, Berlin, Heidelberg.
[28] Golugula, A., Lee, G., & Madabhushi, A. (2011, Au-
gust). Evaluating feature selection strategies for high
dimensional, small sample size datasets.In 2011 An-
nual International conference of the IEEE engineer-
ing in medicine and biology society (pp. 949-952).
IEEE.
[29] Huang, G. B., Zhu, Q. Y ., & Siew, C. K. (2006).
Extreme learning machine: theory and applications.
Neurocomputing, 70(1-3), 489-501.
[30] Das, S. R., Mishra, D., & Rout, M. (2019). A hy-
bridized ELM using self-adaptive multi-population-
based Jaya algorithm for currency exchange predic-
tion: an empirical assessment. Neural Computing and
Applications, 31(11), 7071-7094.
[31] Li, X., Xie, H., Wang, R., Cai, Y ., Cao, J., Wang, F.,
Min,H. & Deng, X. (2016). Empirical analysis: stock
market prediction via extreme learning machine. Neu-
ral Computing and Applications, 27(1), 67-78.
[32] Balasundaram, S., & Gupta, D. (2016). Knowledge-
based extreme learning machines. Neural Computing
and Applications, 27(6), 1629-1641.
[33] Orissa Agricultural Statistics Year Book, (1983-
2013). Directorate of Agriculture and Food Produc-
tion, Govt. of Odisha, Bhubaneswar.
[34] https://www.google.co.in/images
[35] Narasimhamurthy, V ., & Kumar, P. (2017). Rice Crop
Yield Forecasting Using Random Forest Algorithm.
Int. J. Res. Appl. Sci. Eng. Technol. IJRASET, 5,
1220-1225.
[36] Dahal, H., & Routray, J. K. (2011). Identifying asso-
ciations between soil and production variables using
linear multiple regression models. Journal of Agricul-
ture and Environment, 12, 27-37.
[37] Powell, J. P., & Reinhard, S. (2016). Measuring the
effects of extreme weather events on yields. Weather
and Climate extremes, 12, 69-79.
[38] Yusof, M. F., Azamathulla, H. M., & Abdullah, R.
(2014). Prediction of soil erodibility factor for Penin-
sular Malaysia soil series using ANN. Neural Com-
puting and Applications, 24(2), 383-389.
A Novel Borda Count Based Feature Ranking and. . . Informatica 45 (2021) 13–31 31
[39] Erdil, A., & Arcaklioglu, E. (2013). The prediction
of meteorological variables using artiﬁcial neural net-
work. Neural Computing and Applications, 22(7-8),
1677-1683.
[40] Anitha, A., & Acharjya, D. P. (2018). Crop suitabil-
ity prediction in Vellore District using rough set on
fuzzy approximation space and neural network. Neu-
ral Computing and Applications, 30(12), 3633-3650.
[41] Zahid, M. A., & De Swart, H. (2015). The borda ma-
jority count. Information Sciences, 295, 429-440.
[42] García-Lapresta, J. L., Martínez-Panero, M., &
Meneses, L. C. (2009). Deﬁning the Borda count in a
linguistic decision making context. Information Sci-
ences, 179(14), 2309-2316.
[43] https://www.casact.org/pubs/forum/
98wforum/98wf055.pdf
[44] Hirai, GI., Chiyo, H., Tanka, O., Hikano, T., & Oan-
otri, M. (1993). Studies on the effect of relative hu-
midity of atmosphere on growth and physiology of
rice plants. VIII effect of ambient humidity on dry
matter production and nitrogen absorption at vari-
ous temperatures, Japanese Journal of Crop Science,
62(3), 395-400.
[45] Sunil, K. M. (2000). Crops weather relationship
in rice (Doctoral dissertation, Department of Agri-
cultural Meteorology, College of Horticulture, Vel-
lanikkara).
[46] Vijayakumar, CM. (1996). Hybrid rice seed produc-
tion technology- theory and practice. Directorate of
rice research, Hyderabad, 52-55.
[47] Gridyal, B. P., & Jana, R. K. (1997). Agrometerol-
ogycal environmental affecting rice yield. Agronomy
Journal, 59, 286-287.
[48] Narayanan, A. L. (2004). Relative inﬂuence of
weather parameters on rice hybrid and variety and
validation of CERES- Rice model for staggered
weeks of transplanting. PhD Thesis, Tamilnadu Agri-
cultural University, Coimbatore.
[49] Shi, C. H., & Shen, Z. T. (1990). Effect of high hu-
midity and low temperature on spikelet fertility in
indica rice. International Rice Research Newsletter,
15(3), 10-11.
[50] Morita, S., Wada, H., & Matsue, Y . (2016). Counter-
measures for heat damage in rice grain quality under
climate change. Plant Production Science, 19(1), 1-
11.
32 Informatica 45 (2021) 13–31 S. Mishra et al.