https://doi.org/10.31449/inf.v42i4.2003 Informatica 42 (2018) 587–594

Integrated Speaker and Speech Recognition for Wheel Chair Movement using Artificial Intelligence

Gurpreet Kaur
Research Scholar, I.K. Gujral Punjab Technical University, Kapurthala-144603, India
Assistant Professor, University Institute of Engineering & Technology, Panjab University, Chandigarh-160025, India
E-mail: regs4gurpreet@yahoo.co.in

Mohit Srivastava
Professor, Chandigarh Engineering College, Landran, Mohali-140307, India
E-mail: mohitsrivastava.78@gmail.com

Amod Kumar
Scientist, Central Scientific Instruments Organisation, Chandigarh-160030, India
E-mail: csioamod@yahoo.com

Keywords: speaker recognition, speech recognition, mel frequency cepstral coefficients, artificial bee colony algorithm, feed forward back propagation neural network

Received: November 10, 2017

Abstract: A speech signal is the result of constrictions of the vocal tract, and different sounds are generated by different vocal tract constrictions. A speech signal carries two things: the speaker's identity and the meaning of what is said. For specific applications of speaker and speech recognition, such as a voice operated wheel chair, both the speaker and the speech must be recognized before the wheel chair moves. Automation of the wheelchair is today's requirement, as the number of people with disabilities such as spinal injuries, amputation and hand impairments is increasing, and they need assistance to move their wheel chair. A voice operated wheel chair is one of the solutions. The intention of this study is to use a speaker and speech dependent system to control the wheelchair and minimize the risk of unwanted accidents. We have proposed a system in which both the speaker (patient) and the speech (commands) are recognized based on acoustic features, namely Mel Frequency Cepstral Coefficients (MFCC). The features are optimized using the Artificial Bee Colony (ABC) algorithm to gain good accuracy with an artificial intelligence technique as a classifier. We have tested our system on a standard dataset (TIDIGITS) and on our own prepared dataset. The proposed work is also validated by generating control signals to actuate the wheel chair in a real time scenario.

Povzetek (Slovenian abstract): Control of a wheelchair using speech and artificial intelligence methods is presented.

1 Introduction

In recent years, speaker and speech recognition has become a major domain of research because of its many real-world applications, from the home to health care services. Speaker and speech recognition are used jointly in a voice operated wheel chair: the wheel chair should move only when a specific person (speaker recognition) gives a valid command (speech recognition). To guarantee good accuracy and a short learning time, the feature extraction process is crucial for any recognition system. We have carried out a detailed analysis of various feature extraction methods [1-2]. MFCC are the most widely used features for speaker and speech recognition systems. Various optimization algorithms, such as genetic algorithms combined with convolutional neural networks, have been applied to tasks like human action recognition [3-4], where the network weights can be optimized within a Genetic Algorithm (GA) framework. Dervis Karaboga proposed the ABC algorithm for optimizing numerical problems [5]. This algorithm can be used in various fields such as data mining, filter design and speech/speaker recognition [6]. Initially, DNN-based systems were used for phone recognition tasks.
Later, DNNs were used at large scale for large-vocabulary continuous speech [7-10]. When DNN-based systems are compared with systems based on dynamic time warping (DTW), Hidden Markov Models (HMM) or Gaussian mixture models (GMM), the recognition accuracy of the DNN-based systems is higher [11]. Fast learning in deep neural networks can be achieved by parameterizing the weight matrix with periodic functions; in this way training time is reduced and classification accuracy improves. Different configurations of layers in a neural network yield different accuracies once the complexity and memory requirements of the system are taken into account [12-13]. The feed forward back propagation network (FFBPN) and the recurrent neural network (RNN) are the most commonly used network types [14-15]. This paper presents an integrated speaker and speech recognition system with MFCC features optimized using the ABC algorithm and an FFBPN used for classification.

The main aim of the proposed work is to combine the features of the speech with those of the speaker, creating a speech and speaker dependent system that recognizes the command given by the speaker to operate the wheelchair. If the wheelchair were operated on commands alone, without recognizing the speaker, anybody could give wrong commands. So we need to combine speech with speaker recognition to move the wheel chair without error. Operating the wheelchair according to the speaker's command is a demanding task, and the feature extraction technique plays an important role: if the extracted features of the speech signal are not unique, the accuracy of the recognition system is unacceptable and the chances of error increase. The selection of a feature extraction technique is therefore a major issue in a speech recognition system. MFCC is a good option for extracting features from speech signals, but unwanted features may still remain, so optimization is needed; we have used the artificial bee colony algorithm with a novel objective function. In the proposed work, an efficient speech and speaker recognition system is thus presented using an artificial intelligence technique together with the ABC optimization algorithm.

2 Related work

There are many existing speech and speaker recognition systems using different types of techniques. Many researchers have developed smart wheel chairs with voice commands using commercially available recognition systems. Simpson et al. [16] developed a wheelchair with both a joystick and a voice controlled module; they used the commercially available Verbex Speech Commander recognition system with nine commands. With the development of many algorithms in the artificial intelligence domain, researchers have moved to this field. Pacnik et al. [17] proposed a voice operated intelligent wheel chair (VOIC) using LPC cepstral analysis with a neural network, showing that speech recognition with a neural network is possible with a 4% error rate. Jabardi [18] developed a wheelchair based on an artificial neural network with 86% accuracy for known speakers and 76% accuracy for unknown speakers. The speech recognition field has advanced further in terms of recognition rate, environmental conditions, speaker variability etc. with the development of deep neural networks [19-24].
Based on the above survey, we decided to present an integrated speaker and speech recognition system using the ABC algorithm with an artificial intelligence technique for the movement of a wheel chair.

Figure 1: Integrated Speaker and Speech Recognition System for movement of wheel chair.

3 Proposed system

We have proposed an integrated speaker and speech recognition system as shown in figure 1, which gives the flow diagram of the system based on the artificial intelligence concept. In the speaker and speech recognition system, features are first extracted from the speech signal. These features should be robust to noise and discriminative enough that the classifier can distinguish between speakers and words. We have used MFCC features and, to gain accuracy, these features are optimized with the ABC algorithm. Classification is then done with a feed forward back propagation neural network. The FFBPN output is the recognition result, i.e. who is speaking and what he or she is saying. The command word is passed through the RF transceiver to the microcontroller. The MCU (ATMEGA 8) interprets the received commands and accordingly the motor is controlled through the driver circuit (L293D) to move the wheel chair.

3.1 Feature extraction using MFCC algorithm

In this section we describe the MFCC algorithm, which is used to extract the feature set from the speech signal with respect to the speaker. The algorithm of MFCC is given below. First, the parameters are initialized:

Tw = 25 (analysis frame duration (ms))
Ts = 10 (analysis frame shift (ms))
Alpha = 0.97 (pre-emphasis coefficient)
R = [300 3700] (frequency range to consider)
M = 20 (number of filter bank channels)
C = 13 (number of cepstral coefficients)
L = 22 (cepstral sine lifter parameter)
Hamming = @(N)(0.54 - 0.46*cos(2*pi*[0:N-1].'/(N-1)))

MFCC_features{i} = \sum_{i=1}^{n} MFCC(Signal, fs, Tw, Ts, Alpha, Hamming, R, M, C, L)

where Signal is the speech data uploaded by the user and MFCC_features is the feature set extracted from the uploaded speech data.

3.2 Optimization with ABC algorithm

Extracted features are likely to contain unwanted components, and these unwanted components degrade the accuracy of the work. We therefore enhance the feature set by removing the unwanted components with the artificial bee colony algorithm as an optimization technique. To optimize the feature set, we have defined a novel objective and fitness function for the ABC algorithm, shown in equation (1):

ABC_{ff} = \begin{cases} Bee_{current} & \text{if } Bee_{current} > Bee_{onlooker} \\ Bee_{onlooker} & \text{otherwise} \end{cases}    (1)

where ABC_{ff} is the output of the fitness function, Bee_{current} is the current bee (an MFCC feature value) and Bee_{onlooker} is the threshold value of the feature set.
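As an illustration, the following minimal Python sketch applies the thresholding rule of equation (1) element-wise to an MFCC feature matrix. The matrix, its dimensions and the choice of the per-coefficient mean as the threshold Bee_onlooker are assumptions made here for demonstration; the paper only states that Bee_onlooker is a threshold on the feature set.

    import numpy as np

    def abc_fitness(bee_current, bee_onlooker):
        # Equation (1): keep the current bee's value where it exceeds the
        # onlooker (threshold) value, otherwise fall back to the threshold.
        return np.where(bee_current > bee_onlooker, bee_current, bee_onlooker)

    # Hypothetical MFCC feature matrix: C = 13 coefficients x 120 frames.
    rng = np.random.default_rng(0)
    mfcc_features = rng.standard_normal((13, 120))

    # Assumed threshold: the per-coefficient mean over all frames.
    threshold = mfcc_features.mean(axis=1, keepdims=True)

    optimized_mfcc = abc_fitness(mfcc_features, threshold)

In a full ABC loop this rule would decide whether an employed bee's candidate replaces the onlooker's reference value before scout bees re-initialize exhausted sources.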
The steps of the ABC algorithm are given below:

Upload dataset for training
Select case (B, F, L, R and S)  (Backward, Forward, Left, Right, Stop)
Choose noise type:
  A: Without noise
  B: White Gaussian Noise (WGN)
  C: Adaptive WGN (AWGN)
if user = 1 (without noise)
  Speech_signal = load(Speech Data)
  Speech_MFCC_features{i} = \sum_{i=1}^{n} mfcc(Speech_signal)
  Initialize ABC algorithm
  Define employed bees, onlooker bees and scout bees
  Set objective function as in equation (1)
  Optimized_MFCC{i} = \sum_{i=1}^{r} \sum_{j=1}^{c} ABC(MFCC_features, fs, ft)
  where fs is the selected value and ft is the threshold value
else if user = 2 (WGN)
  Speech_signal_WGN = load(Speech Data)
  Speech_WGN_MFCC_features{i} = \sum_{i=1}^{n} mfcc(Speech_signal_WGN)
  Optimized_WGN_MFCC{i} = \sum_{i=1}^{r} \sum_{j=1}^{c} ABC(MFCC_features, fs, ft)
  where fs is the selected value and ft is the threshold value
else if user = 3 (AWGN)
  Speech_signal_AWGN = load(Speech Data)
  Speech_AWGN_MFCC_features{i} = \sum_{i=1}^{n} mfcc(Speech_signal_AWGN)
  Optimized_AWGN_MFCC{i} = \sum_{i=1}^{r} \sum_{j=1}^{c} ABC(MFCC_features, fs, ft)
  where fs is the selected value (the current bee) and ft is the threshold value (the onlooker bee); the best optimized value is the scout bee, i.e. the optimal feature from the MFCC feature sets
end

Here Speech_signal is the speech data uploaded by the user, and Optimized_MFCC (and its WGN/AWGN variants) is the optimized feature set used as the input of the FFBPN algorithm when training the proposed system.

3.3 FFBPN algorithm

A feed forward back propagation neural network (FFBPN) is a powerful machine learning technique from the field of deep learning. FFBPNs are trained on large collections of optimized feature sets, from which they can learn rich feature representations for a wide range of features. The FFBPN algorithm used is given as:

Load Optimized_MFCC_Data
Trainingdata = Optimized_MFCC_Data
Initialize FFBPN
Generate group of data = group
Set iterations = 1000
for i = 1:iterations
  Weight = Optimized_MFCC_Data(i)
  Hidden_Layer = [25, 25, 25] (tansig)
  Net_algo = trainrp
  Generate net structure of FFBPN (net)
  Net_Output = train(net, Trainingdata, group)
end

We save Net_Output as the trained network, simulate it with test data, and compute the results with the feed forward back propagation neural network. Net_Output depends on the training data of the network and contains the categories of data used in the classification stage of the proposed work. In the training phase we have considered 25 neurons in each hidden layer with the tan-sigmoid transfer function, which carries the signal from one layer to the next in the FFBPN. Each layer of the FFBPN produces a response, or activation, to an input feature; however, only a few layers within an FFBPN are suitable for feature training. We set 1000 iterations for training the input data based on the performance criteria of the FFBPN. In each iteration the FFBPN adjusts the weights of the input features and creates an output structure according to the group defined at the time of network initialization.
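A minimal training sketch of this configuration is shown below, using Python's scikit-learn as a stand-in for the MATLAB-style pseudocode above. The random features and labels are placeholders, and the default 'adam' solver substitutes for resilient back-propagation (trainrp), which scikit-learn does not provide; only the [25, 25, 25] tanh ("tansig") topology and the 1000-iteration budget come from the paper.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Placeholder data: one row of 13 optimized MFCCs per utterance; labels
    # encode (speaker, command) pairs so one classifier covers both tasks.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 13))      # stand-in for Optimized_MFCC_Data
    y = rng.choice(["F1_forward", "F1_stop", "M1_left"], size=400)

    # Three hidden layers of 25 tanh units, trained for up to 1000
    # iterations, mirroring Hidden_Layer = [25, 25, 25] above.
    net = MLPClassifier(hidden_layer_sizes=(25, 25, 25),
                        activation="tanh",
                        max_iter=1000)
    net.fit(X, y)                           # analogue of train(net, ...)
    print(net.predict(X[:5]))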
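Once the network labels an utterance, the recognized command must still reach the wheel chair. The sketch below shows how a recognized (speaker, command) label might be turned into a motor command and sent over the serial RF link before the experiments that follow. Everything here is an assumption for illustration: the single-character command codes, the authorized-speaker check, the port name and the use of Python's pyserial in place of the paper's MATLAB-to-RF-modem link.

    import serial  # pyserial; stands in for the MATLAB-to-RF-modem link

    # Hypothetical one-byte codes the MCU firmware is assumed to expect.
    COMMAND_CODES = {"forward": b"F", "backward": b"B",
                     "left": b"L", "right": b"R", "stop": b"S"}

    def dispatch(label, port="/dev/ttyUSB0", baud=9600, authorized="F1"):
        # Actuate the chair only when the recognized speaker is the patient.
        speaker, command = label.split("_")   # e.g. "F1_forward"
        if speaker != authorized:
            return                            # reject other speakers
        with serial.Serial(port, baud, timeout=1) as link:
            link.write(COMMAND_CODES[command])

    # Example usage (requires the RF modem on the assumed port):
    dispatch("F1_forward")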
4 Experiments and results

In this section, the simulation results and analysis of the proposed work are described. The speech was acquired with a sound recorder through a headphone microphone at a 16 kHz sampling frequency, in mono, in a room environment. In our case, the database was prepared for four speakers aged 27-34: two females (F1, F2) and two males (M1, M2). The words recorded are 'Forward, Backward, Left, Right, Stop'. Each word was recorded 80 times, hence 400 words were recorded for each speaker, creating a database of 1600 words. It is much more difficult to recognize speech in the presence of noise, so the proposed work was tested with various types of noise such as White Gaussian Noise (WGN) and Adaptive White Gaussian Noise (AWGN). We have tested our system on the TIDIGITS database and on our own created database. For all signal types we extracted the features and then optimized them to enhance the feature set. After optimization, we trained the features with the FFBPN. In the training phase we used 25 neurons in each hidden layer with the tan-sigmoid transfer function to train the input feature data. After training, we tested the simulation with a test speech signal, and the process was repeated for the testing phase.

Figure 2: ROC Curve for Proposed Work.

Figure 2 shows the receiver operating characteristic (ROC) curve of the proposed speech and speaker recognition system. It is a graphical method for comparing two empirical distributions, where the x-axis denotes the false positive rate and the y-axis denotes the true positive rate.
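As an illustration, such a curve and its area can be computed with scikit-learn as sketched below; the trial scores and match labels are synthetic placeholders, not the paper's data.

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    # Synthetic trial scores: a higher score means more confidence that the
    # trial is a genuine (speaker, command) match; y_true marks genuine trials.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=200)
    scores = 0.6 * y_true + 0.7 * rng.random(200)

    fpr, tpr, _ = roc_curve(y_true, scores)   # false/true positive rates
    print("AUC =", auc(fpr, tpr))             # the paper reports 0.8993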
On the basis of the ROC curve we calculated the probability of recognition accuracy using the area under the curve (AUC), which varies from 0 to 1. From figure 2, the AUC value is 0.8993, which indicates that the system is well trained and therefore gives a better classification rate. Table 1 shows the recognition accuracy for four persons (2 men and 2 women); the results were compiled using the TIDIGITS dataset.

Speech Signal (Words) | Man 1 | Man 2 | Woman 1 | Woman 2
One    | 96.88 | 97.11 | 97.35 | 96.20
Two    | 95.38 | 94.27 | 95.00 | 96.24
Three  | 99.42 | 98.30 | 97.91 | 99.35
Four   | 98.32 | 99.14 | 98.10 | 98.65
Five   | 99.16 | 97.88 | 99.02 | 99.35
Six    | 96.88 | 97.11 | 97.35 | 96.26
Seven  | 95.38 | 94.27 | 95.01 | 96.28
Eight  | 99.42 | 98.30 | 97.91 | 99.35
Nine   | 98.32 | 99.14 | 98.13 | 98.65
Zero   | 99.16 | 97.88 | 99.02 | 99.35

Table 1: Accuracy (%) of integrated speaker and speech recognition for different isolated words (clean environment).

Figure 3 shows the accuracy of the integrated speaker and speech recognition work for the digit database in a clean environment. The average accuracy is more than 97% in the clean environment.

Figure 3: Accuracy of integrated speaker and speech recognition in clean environment.

Speech Signal (Words) | Man 1 | Man 2 | Woman 1 | Woman 2
One    | 88.76 | 90.93 | 91.17 | 90.08
Two    | 89.26 | 88.09 | 88.82 | 90.03
Three  | 93.26 | 92.12 | 91.79 | 93.17
Four   | 92.19 | 92.96 | 91.92 | 92.47
Five   | 92.91 | 91.74 | 92.85 | 93.84
Six    | 90.75 | 90.93 | 91.19 | 90.56
Seven  | 89.21 | 88.09 | 88.87 | 90.07
Eight  | 93.24 | 92.15 | 91.72 | 93.17
Nine   | 92.14 | 92.99 | 91.92 | 92.45
Zero   | 92.98 | 91.72 | 92.84 | 93.17

Table 2: Accuracy (%) of integrated speaker and speech recognition for different isolated words (noisy environment).

Figure 4: Accuracy of integrated speaker and speech recognition in noisy environment.

Figure 4 and table 2 show the accuracy achieved in a noisy environment: the speech signal was corrupted by adding White Gaussian Noise and the accuracy was then measured. The average accuracy is more than 91% in the noisy environment.

Table 3 compares two methods, one with MFCC only and the other with MFCC plus the ABC algorithm, on our own created database using the FFBPN as a classifier.

No. of Iterations | Proposed work using MFCC | Proposed work using MFCC with ABC Algorithm
1           | 92.85 | 97.64
2           | 94.06 | 98.94
3           | 90.68 | 97.47
4           | 89.84 | 95.94
5           | 91.64 | 97.69
Average (%) | 91.82 | 97.53

Table 3: Accuracy of integrated speaker and speech recognition for our own created database.

Figure 5: Comparison of accuracy.

Figure 5 compares the accuracy of the proposed work with and without optimization. The accuracy is better in the optimization case, so in integrated speaker and speech recognition, optimization is a valuable tool for creating a unique feature set. Further, we have tested our system in a real time scenario for the movement of the wheelchair. The command from the MATLAB software is received using an RF data modem, which works at 2.4 GHz with adjustable baud rates of 9600/115200 for direct interfacing with the MCU. The MCU (ATMEGA 8) interprets the received commands and accordingly the motor is controlled through the driver circuit (L293D). Programming of the MCU (ATMEGA 8) was done with the Arduino compiler. We achieved an average accuracy of 87.4% for the five isolated words in different environments such as a lab, a canteen and an office.

5 Conclusion

In the proposed work, we have shown that a combined speaker and speech recognition system using MFCC, ABC and FFBPN is helpful in achieving higher accuracy.
To be specific, we found that optimization and feature extraction are very important, as well as difficult, steps in any pattern recognition system. In the proposed work, a useful feature set is extracted from the speech signal using the MFCC technique, the features are optimized using the ABC optimization algorithm, and the FFBPN is used for training and classification of the data. The experimental results show that the proposed method using MFCC with the ABC algorithm provides good results with 97% accuracy, which is 6% more than without the optimization technique. In the real time scenario, the average accuracy achieved is 87.4%.

6 References

[1] Cutajar M., Micallef J., Casha O., Grech I., and Gatt E. Comparative study of automatic speech recognition techniques, IET Signal Processing, 7(1): 25–46, 2013. http://dx.doi.org/10.1049/iet-spr.2012.0151
[2] Kaur G., Srivastava M., and Kumar A. Analysis of feature extraction methods for speaker dependent speech recognition, International Journal of Engineering and Technology Innovation, 7(2): 78–88, 2017.
[3] Ijjina E. P. and Mohan C. K. Human action recognition using genetic algorithms and convolutional neural networks, Pattern Recognition, 59: 199–212, 2016. http://dx.doi.org/10.1016/j.patcog.2016.01.012
[4] Ijjina E. P. and Mohan C. K. Hybrid deep neural network model for human action recognition, Applied Soft Computing, 46: 936–952, 2015. https://doi.org/10.1016/j.asoc.2015.08.025
[5] Karaboga D. and Akay B. A comparative study of Artificial Bee Colony algorithm, Applied Mathematics and Computation, 214(1): 108–132, 2009. https://doi.org/10.1016/j.amc.2009.03.090
[6] Bolaji A. L., Khader A. T., Al-Betar M. A., and Awadallah M. A. Artificial bee colony algorithm, its variants and applications: A survey, Journal of Theoretical and Applied Information Technology, 47(2): 434–459, 2013. https://doi.org/10.1504/IJAIP.2013.054681
[7] Chandra B. and Sharma R. K. Fast learning in deep neural networks, Neurocomputing, 171: 1205–1215, 2016. https://doi.org/10.1016/j.neucom.2015.07.093
[8] Li K., Wu X., and Meng H. Intonation classification for L2 English speech using multi-distribution deep neural networks, Computer Speech & Language, 43: 18–33, 2017. https://doi.org/10.1016/j.csl.2016.11.006
[9] Richardson F., Reynolds D., and Dehak N. Deep neural network approaches to speaker and language recognition, IEEE Signal Processing Letters, 22(10): 1671–1675, 2015. https://doi.org/10.1109/LSP.2015.2420092
[10] Dahl G. E., Yu D., Deng L., and Acero A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Transactions on Audio, Speech, and Language Processing, 20(1): 30–42, 2012. https://doi.org/10.1109/TASL.2011.2134090
[11] Solera-Urena R. and Garcia-Moral A. I. Real-time robust automatic speech recognition using compact support vector machines, IEEE Transactions on Audio, Speech, and Language Processing, 20(4): 1347–1361, 2012. https://doi.org/10.1109/TASL.2011.2178597
[12] Mohamad D. and Salleh S. Malay isolated speech recognition using neural network: a work in finding number of hidden nodes and learning parameters, International Arab Journal of Information Technology, 8(4): 364–371, 2011.
[13] Desai Vijayendra A. and Thakar V. K. Neural network based Gujarati speech recognition for dataset collected by in-ear microphone, Procedia Computer Science, 93: 668–675, 2016. https://doi.org/10.1016/j.procs.2016.07.259
[14] Abdalla O. A., Zakaria M. N., Sulaiman S., and Ahmad W. F. W.
A comparison of feed-forward back-propagation and radial basis artificial neural networks: A Monte Carlo study, Proceedings of the 2010 International Symposium on Information Technology, 2: 994–998, 2010. https://doi.org/10.1109/ITSIM.2010.5561599
[15] Chen X., Liu X., Wang Y., Gales M. J. F., and Woodland P. C. Efficient training and evaluation of recurrent neural network language models for automatic speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(11): 2146–2157, 2016. https://doi.org/10.1109/TASLP.2016.2598304
[16] Simpson R. C. et al. NavChair: An assistive wheelchair navigation system with automatic adaptation, Assistive Technology and Artificial Intelligence, 1458: 235–255, 1998.
[17] Pacnik G., Benkic K., and Brecko B. Voice operated intelligent wheelchair - VOIC, IEEE International Symposium on Industrial Electronics, 1221–1226, 2005. https://doi.org/10.1109/ISIE.2005.1529099
[18] Jabardi M. H. Voice controlled smart electric-powered wheelchair based on artificial neural network, International Journal of Advanced Research in Computer Science, 8(5): 31–37, 2017. https://doi.org/10.26483/ijarcs.v8i5.3650
[19] Siniscalchi S. M., Svendsen T., and Lee C.-H. An artificial neural network approach to automatic speech processing, Neurocomputing, 140: 326–338, 2014. https://doi.org/10.1016/j.neucom.2014.03.005
[20] Hossain A., Rahman M., Prodhan U. K., and Khan F. Implementation of back-propagation neural network for isolated Bangla speech recognition, International Journal of Information Sciences and Techniques, 3(4): 1–9, 2013.
[21] Mansour A. H., Zen G., Salh A., Hayder H., and Alabdeen Z. Voice recognition using back propagation algorithm in neural networks, International Journal of Computer Trends and Technology, 23(3): 132–139, 2015.
[22] Qian Y., Tan T., and Yu D. Neural network based multi-factor aware joint training for robust speech recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12): 2231–2240, 2016. https://doi.org/10.1109/TASLP.2016.2598308
[23] Dede G. and Sazlı M. H. Speech recognition with artificial neural networks, Digital Signal Processing, 20(3): 763–768, 2010. https://doi.org/10.1016/j.dsp.2009.10.004
[24] Shahamiri S. R. and Binti Salim S. S. Real-time frequency-based noise-robust automatic speech recognition using multi-nets artificial neural networks: A multi-views multi-learners approach, Neurocomputing, 129(5): 1053–1063, 2014. https://doi.org/10.1016/j.neucom.2013.09.040