Informatica 41 (2017) 133–148 133
Improvement of Person Tracking Accuracy in Camera Network by Fusing
WiFi and Visual Information
Thi Thanh Thuy Pham
Academy of People Security, Hanoi, Vietnam
E-mail: thanh-thuy.pham@mica.edu.vn
Thi-Lan Le and Trung-Kien Dao
MICA International Research Institute, Hanoi University of Science and Technology
(HUST - CNRS/UMI-2954 - Grenoble INP), Hanoi, Vietnam
E-mail: {thi-lan.le, trung-kien.dao}@mica.edu.vn
Keywords: camera, WiFi, fusion method, person tracking by identification
Received: March 29, 2017
Person tracking in camera networks remains an open problem. The main challenge is to link individual trajectories correctly when people move within one camera FOV (Field of View) or switch to another. This amounts to solving the person re-identification (Re-ID) problem during tracking. A popular method assigns the current position to the previous one based on the minimum distance between them; this is called person identification by tracking. In this work, we approach tracking by identification, which means that trajectory assignment is done by the person identity (ID) determined at each video frame. In order to improve the accuracy of vision-based person tracking, we focus on enhancing person identification by adding the ID of the WiFi-enabled device held by each person. A fusion scheme of WiFi and visual signals is proposed for person tracking. An optimal assignment and a Kalman filter are used in this combination to associate the position observations and predicted states from the camera and WiFi systems. The correction step of the Kalman filter is then applied to each tracker to produce state estimates of locations. The fusion method allows tracking by identification across non-overlapping cameras, with clear identity information taken from the WiFi adapter. The evaluation on a multi-modal dataset shows that the proposed fusion method outperforms the vision-only method.
Summary (Povzetek): A method for tracking people across cameras by means of data fusion is described.
1 Introduction
There have been several attempts to combine camera and
WiFi systems for indoor person tracking. A multi-modal
system is reported in [1] using WiFi-based localization
and tracking by stationary cameras. The combined sys-
tem focuses on improving the positioning accuracy and
confidence at room level. According to the authors’ as-
sessments, camera-based localization achieves higher posi-
tioning accuracy than WiFi-based system. However, blind
points, occlusions and person identification are much more
challenging for camera systems. WiFi systems give clearer
identity information because each mobile device has a
unique MAC address, but considered targets are required
to hold mobile devices during tracking. In that work, the RSSI property and the fingerprinting method are used in the WiFi system to locate mobile targets. In the camera-based system, foreground segmentation is done by the GMM (Gaussian Mixture Model) method. The region which contains the person's feet is then extracted from the foreground and projected onto the floor plane. Gaussian kernels are used to model the foot region. Each module is executed depending on the availability of its sensor information. When both are available, a combined Bayes model with the corresponding confidence weights is applied.
The authors in [2] reported another approach for object
localization fusing images and WiFi signals. The system
can be deployed in both indoor and outdoor environments.
The algorithm of PlaceEngine [3] and the modified ver-
sion of the Centroid algorithm [4] are used in this work
for WiFi-based localization. A mixture observation model based on a particle filter allows targets to be tracked continuously even when they are occluded by other objects or temporarily disappear while moving in blind areas among disjoint cameras.
In [5], the authors proposed to combine RGB data with
wireless signals emitted from a person’s cell phone to lo-
cate and track individuals. The authors considered a unique
MAC address of mobile device as a reliable cue of person’s
ID. Wireless data is efficiently embedded in RGB data as a
ring image, which captures radius estimation, error bounds,
and confidence level (noise detection) for each antenna. In order to improve the tracking algorithm, each MAC address is assigned to an observed tracklet, and a bipartite graph is used for the data association problem. The test results showed that the performance of person localization and tracking can be improved by fusing RGB and wireless data.
In this paper, we propose a fusion method of WiFi and camera for person localization and Re-ID in a camera network. It improves vision-based person tracking not only within one camera FOV, but also across different camera FOVs by using the unique ID information from WiFi hardware.
The rest of the paper is organized as follows. Section 2 presents a framework for multi-modal person tracking by fusion of WiFi and camera. Sections 3 and 4 describe the person localization systems based on visual and WiFi signals, respectively. The combined method of WiFi and camera is discussed in Section 5. Comparative evaluations are shown in Section 6. Conclusions and future directions are given in the last section.
2 Framework
Figure 1 shows the fusion framework for person localiza-
tion and Re-ID in non-overlapping camera networks. The
combined model is processed in the real scenario of a fully-
automated person surveillance system, which is reported in
our previous work [6].
In this system, the camera FOVs are covered by the WiFi range. This means that WiFi signals are always available for person localization, whereas the disjoint camera coverage areas cause intermittent positioning for the vision-based system. In each camera FOV, person localization is done in three phases, i.e., human detection, tracking and localization, to output the ID of person j seen by camera C ($ID^C_j$) and the corresponding position ($P^C_j$). Because the WiFi range covers the camera FOVs, in each camera FOV the vision-based positioning result of person j is combined with the WiFi-based localization result of person i ($P^W_i$, $ID^W_i$) by a fusion algorithm in order to make effective decisions about the position and identity of each person in the environment. When people switch from one camera FOV to another, they are re-identified to update the ID for each individual trajectory. The trajectories across the cameras are also linked to show the entire route in the environment. Additionally, in the fusion model, WiFi-based localization results are used to activate the cameras that are in the positioning range returned by the WiFi-based system. The proposed mixture model allows persons moving in non-overlapping camera networks to be localized and identified continuously.
In the proposed system, the positioning processes are
executed independently from each single model. The lo-
cations calculated from both models of WiFi and camera
are shown on the uniform coordinate system of a 2D floor
map. A fusion algorithm for person localization and Re-ID
is proposed. It is based on Kalman filter model, together
with an optimal assignment of estimated and observed lo-
cations from both models. The details for each single per-
son localization system and the proposed fusion algorithm
will be shown in the next sections.
3 Vision-based person localization
and Re-ID
Camera-based person localization and Re-ID is a process of
finding the positions and the corresponding ID of a person
when he/she moves in one camera FOV or switches from
one camera FOV to others in camera networks. It refers to
linking person trajectories in the frame sequences captured
from multiple cameras. These trajectories are then trans-
formed to real-world coordinate system by a process called
3D localization.
3.1 Person localization
A camera-based person localization system includes three
main steps of human detection, tracking and 3D localiza-
tion. For each camera FOV, human detection is executed at
each frame to output the human ROI (Region of Interest),
which is presented by a rectangular bounding box contain-
ing the person. The person position on image is defined
in this work as a middle point of the rectangle’s bottom
edge which has contact with the floor plane (see Figure
2). It is called a FootPoint position. Human tracking in
a frame sequence captured from a camera FOV is consid-
ered as FootPoint tracking. In case of multi-person track-
ing, each detected FootPoint has to be assigned with the
corresponding ID. 3D person localization is done by trans-
forming FootPoint positions to real world locations on a
predefined 2D coordinate system of the floor plane where
the person moves.
First, a combination of HOG-SVM and GMM back-
ground subtraction techniques [6] is applied for human de-
tection. In order to improve the performance of human de-
tection, shadow removal method in [6] is used as a post-
processing step for human detection.
Second, in each camera FOV, based on the detection re-
sults, FootPoint tracking is done by utilizing Kalman Filter
and Hungarian data association algorithm [7] to improve
the performance of track association. For each camera, a
grid of the floor plane where people move in the camera
FOV, namely detection grid (see Figure 3), is defined as a
function G(x, y):
$$G(x, y) = \begin{cases} 1 & \text{if } (x, y) \in C_T; \\ 0 & \text{otherwise,} \end{cases}$$
where $C_T$ is a threshold region bounded by a contour line that is the border of the camera FOV on the floor plane. Since each detected person is represented by a FootPoint position, a FootPoint position can belong to one of the positions of the detection grid where $G(x, y) = 1$. Let $(px_t, py_t)$ denote the pixel coordinates of a FootPoint position at time t in the grid, $(mx_t, my_t)$ the pixel coordinates of a measurement in the grid, so that $G(mx_t, my_t) = 1$, and $(vx_t, vy_t)$ the velocity values at time t in the x and y directions.
Figure 1: Framework of person localization and Re-ID using the combined system of WiFi and camera.
Figure 2: Examples of tracking lines which are formed by linking trajectories of corresponding FootPoint positions.
The state vector $x_t$ of a user at time frame t, characterized by the corresponding FootPoint location, and the measurement vector $z_t$ are defined as:
$$x_t = (px_t, py_t, vx_t, vy_t) \tag{1}$$
$$z_t = (mx_t, my_t) \tag{2}$$
Using the state and measurement update equations of
Kalman filter, in conjunction with the initial conditions, at
each time frame, the state vector and its covariance matrix
are estimated. The 2D spatial coordinates of an estimated
state (p̂x, p̂y) (an estimated FootPoint position) refer to the
position p of the user u.
In multi-person tracking, a separate Kalman filter is initialized to model each person's trajectory. A set $U_t$ of individuals and a set $M_t$ of measurements at time frame t are defined as:
$$U_t = \{u_1, u_2, \ldots, u_N\} \tag{3}$$
$$M_t = \{m_1, m_2, \ldots, m_L\} \tag{4}$$
where N is the number of people to be tracked (the trackers), and L is the number of available measurements at time t. In order to assign a person i to a measurement j, the Hungarian method is used.
Figure 3: Example of a grid map and threshold region bounded by a contour line.
Third, in order to locate people in the real-world coordinate system, we define a 2D map of the floor plane on which people move. This map contains all considered camera FOVs on the floor plane. We then calculate the coordinates of each FootPoint position on the 2D map on the basis of camera calibration and a homography transform [8]. The trajectories of each person across cameras are then linked by a method of wrapping multiple camera FOVs using a stereo calibration technique [9].
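As an illustration of the FootPoint-to-floor-map step, the following minimal Python sketch takes the middle of the bottom edge of a bounding box and projects it through a 3×3 homography. The matrix values, the bounding-box layout (left, top, width, height) and the function names are hypothetical and chosen only for the example; in the described system the homography would be obtained from camera calibration.

```python
def foot_point(bbox):
    """Middle of the bottom edge of a box given as (left, top, width, height).

    Image coordinates are assumed to grow downward, so the bottom edge
    is at top + height.
    """
    left, top, width, height = bbox
    return (left + width / 2.0, top + height)


def apply_homography(h, point):
    """Map an image point (u, v) to floor-map coordinates with a 3x3 matrix h."""
    u, v = point
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (x / w, y / w)


# Hypothetical homography: scale pixels by 0.01 and shift the map origin.
H_EXAMPLE = [
    [0.01, 0.0, 1.0],
    [0.0, 0.01, 2.0],
    [0.0, 0.0, 1.0],
]

fp = foot_point((100, 50, 40, 150))
floor_xy = apply_homography(H_EXAMPLE, fp)
```

With these illustrative values the FootPoint is (120, 200) in pixels and lands at approximately (2.2, 4.0) on the floor map.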
3.2 Person re-ID
In this paper, the person Re-ID problem is solved in the sce-
nario of tracking by identification. This means that at each
detected FootPoint position, we extract the human ROI, and
a feature descriptor is built on this region. In this work, a
robust KDES descriptor (Kernel Descriptor) which is pro-
posed in our previous work [6], and an SVM classifier are
used for person Re-ID in camera networks. The basic idea
of KDES descriptor is to compute the approximate explicit
feature map for kernel match function (see Figure 4). In
other words, the kernel match functions are approximated
by explicit feature maps. This enables efficient learning
methods for linear kernels to be applied to the non-linear
kernels. Given a match kernel function $k(x, y)$, the feature map $\phi(\cdot)$ for the kernel $k(x, y)$ is a function mapping a vector x into a feature space so that $k(x, y) = \phi(x)^\top \phi(y)$. Given a set of basis vectors $B = \{\phi(v_i)\}_{i=1}^{D}$, the approximation of the feature map $\phi(x)$ can be:
$$\varphi(x) = G k_B(x) \tag{5}$$
where $G^\top G = K_{BB}^{-1}$, $K_{BB}$ is a $D \times D$ matrix with $\{K_{BB}\}_{ij} = k(v_i, v_j)$, and $k_B$ is a $D \times 1$ vector with $\{k_B\}_i = k(x, v_i)$.
Figure 4: The basic idea of representation based on kernel
methods.
Similar to [10], three match kernel functions for gradi-
ent, color and shape are built from different pixel attributes
of gradient, color and local binary pattern (LBP). For each
match kernel, feature extraction is done at three levels:
pixel, patch and whole detected human region.
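The approximation in Eq. 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Gaussian kernel stands in for the gradient, color and shape match kernels, the basis vectors are arbitrary, and G is taken as the inverse of the Cholesky factor of $K_{BB}$, which is one valid choice satisfying $G^\top G = K_{BB}^{-1}$. All function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Illustrative match kernel k(x, y); a stand-in, not the KDES kernels."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def cholesky(k):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(k[i][i] - s)
            else:
                low[i][j] = (k[i][j] - s) / low[j][j]
    return low


def forward_substitute(low, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        s = sum(low[i][m] * y[m] for m in range(i))
        y.append((b[i] - s) / low[i][i])
    return y


def make_feature_map(basis, kernel):
    """Return the map x -> G k_B(x) with G = L^{-1}, so G^T G = K_BB^{-1}."""
    k_bb = [[kernel(vi, vj) for vj in basis] for vi in basis]
    low = cholesky(k_bb)

    def feature_map(x):
        k_b = [kernel(x, vi) for vi in basis]
        return forward_substitute(low, k_b)  # equals L^{-1} k_B(x)

    return feature_map


basis = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
feature_map = make_feature_map(basis, gaussian_kernel)
approx = sum(a * b for a, b in zip(feature_map(basis[0]), feature_map(basis[1])))
exact = gaussian_kernel(basis[0], basis[1])
```

On the basis vectors themselves the inner product of the explicit features reproduces the kernel value up to rounding, which is a quick sanity check of the construction; for other inputs it is only an approximation.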
4 WiFi-based person localization
For WiFi, RSSI is the most popular attribute used in localization. However, the localization performance depends heavily on how well the relationship between RSSI and distance can be modeled. Two main approaches have been proposed to solve this: the path-loss/radio propagation model [12, 13] and the fingerprinting method [14]. The first one is still an open subject, because it is not easy to obtain an optimal model for the relationship between RSSI and distance. The second one is time- and labor-consuming, but it is effective for localization, especially when probabilistic methods are applied.
In this work, both the radio propagation model and the fingerprinting method are used for WiFi-based localization. A probabilistic propagation model (PPM) from [11] is used together with a newly defined radio map in the fingerprinting database. The radio propagation model reflects the complex nature of indoor environments by taking into account obstacles such as walls and floors when modeling the relationship between the RSSI value and the distance to a reference point (RP). The model is based on the empirical equation of radio-frequency signal strength in indoor environments, and its uncertainty is captured by probabilistic characteristics. An optimization process based on a genetic algorithm is also applied to tune the system parameters to best fit the devices in use. Based on the probabilistic propagation model, the distance between a mobile user and the APs is calculated. In the fingerprinting database, a new radio map of distance features instead of RSSI values is defined in order to make the radio map more reliable and stable, with a lower cost for setting up and updating. Additionally, the KNN matching method is applied with an additional coefficient reflecting temporal changes of the fingerprinting data in the environment. The flowchart of the proposed WiFi-based person localization system is illustrated in Figure 5, with two main phases of training and testing.
Figure 5: Diagram of the proposed WiFi-based object localization system.
The first phase is processed off-line: radio maps are constructed to build the fingerprint database. Normally, a radio map contains RP coordinates and the corresponding RSSI values from the available APs. However, in our proposed system, RSSI values are replaced by distance values. A distance value is defined as the distance $d_i(L)$ from the ith RP to the Lth AP in range (see Figure 6), which is calculated from RSSI observations by using the PPM. In the testing phase, a mobile device continuously scans signals from nearby APs and sends the corresponding RSSI values to a server. These values are then transformed to distance values by the proposed probabilistic propagation model. Distance matching against the fingerprint database is then done by the KNN method to find the best candidates for the mobile user location.
Figure 6: An example of radio map with a set of $p_i$ RPs and the distance values $d_i(L)$ from each RP to L APs.
4.1 Probabilistic propagation model
The probabilistic propagation model is formed by a deterministic model (Eq. 6) and a probabilistic model.
$$P = P_0 - 10 n \log\left(\frac{r}{r_0}\right) - k_d \sum_{i=1}^{n_w} \frac{d_i}{\cos \beta_i} \tag{6}$$
where $n_w$ is the number of walls and floors between the AP and the receiver, $d_i$ is the thickness of the ith wall/floor, $\beta_i$ is the angle of arrival corresponding to the ith wall/floor, and $k_d$ is an attenuation factor per wall/floor thickness unit, as illustrated in Figure 7.
Figure 7: WiFi signal attenuation through walls/floors.
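To make Eq. 6 concrete, the sketch below evaluates the deterministic model and inverts it to recover a distance from an RSSI value. Several points are assumptions made only for this example: the logarithm is taken as base 10 (the usual convention for path-loss models in dB), P is read as the RSSI at distance r and $P_0$ as the RSSI at the reference distance $r_0$, the walls crossed by the signal are assumed to be known in advance, and all parameter values and function names are hypothetical.

```python
import math


def wall_loss(walls, k_d):
    """Attenuation term k_d * sum(d_i / cos(beta_i)).

    walls is a sequence of (thickness, angle_of_arrival_in_radians) pairs.
    """
    return k_d * sum(d / math.cos(beta) for d, beta in walls)


def rssi_from_distance(r, p0, r0, n, k_d, walls=()):
    """Forward model of Eq. 6 (log assumed to be base 10)."""
    return p0 - 10.0 * n * math.log10(r / r0) - wall_loss(walls, k_d)


def distance_from_rssi(p, p0, r0, n, k_d, walls=()):
    """Invert Eq. 6 for r, given the walls between AP and receiver."""
    exponent = (p0 - p - wall_loss(walls, k_d)) / (10.0 * n)
    return r0 * 10.0 ** exponent


# Hypothetical parameters: -40 dBm at 1 m, exponent 3, one 0.2 m wall hit head-on.
params = dict(p0=-40.0, r0=1.0, n=3.0, k_d=20.0, walls=[(0.2, 0.0)])
rssi = rssi_from_distance(10.0, **params)
recovered = distance_from_rssi(rssi, **params)
```

With these illustrative values the forward model gives about -74 dBm at 10 m, and the inverse returns the same 10 m, which is the nominal distance used by the probabilistic part of the model.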
The deterministic model in Eq. 6 does not consider the uncertainty of RSSI values at a given distance, so a probabilistic model (Eq. 7) is proposed. In reality, given an RSSI value P, the distance r might not be exactly the value calculated from Eq. 6, but lies within a range around this value, which is denoted by $\bar{r}$. More precisely, $\bar{r}$ is the nominal value of the distance r, i.e., the value with the highest probability. Given an RSSI value P, the distance is assumed to follow a normal (Gaussian) distribution centered at $\bar{r}$:
$$\rho(r, P) = \Pr(r \mid P) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(r - \bar{r})^2}{2\sigma^2}} \tag{7}$$
where $\sigma$ is the standard deviation, which is also a function of P. For simplicity, $\sigma$ is assumed to be linearly related to $\bar{r}$:
$$\sigma = k_\sigma \bar{r} \tag{8}$$
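Equations 7 and 8 together amount to a Gaussian density whose spread grows with the nominal distance. A minimal Python sketch, with a hypothetical function name and an illustrative value of $k_\sigma$:

```python
import math


def distance_density(r, r_bar, k_sigma):
    """Density of Eq. 7 with sigma = k_sigma * r_bar (Eq. 8)."""
    sigma = k_sigma * r_bar
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((r - r_bar) ** 2) / (2.0 * sigma ** 2))


# Illustrative values: nominal distance 10 m, k_sigma = 0.2, hence sigma = 2 m.
peak = distance_density(10.0, 10.0, 0.2)
left = distance_density(8.0, 10.0, 0.2)
right = distance_density(12.0, 10.0, 0.2)
```

The density peaks at the nominal distance and is symmetric around it, so distances one standard deviation closer or farther are equally likely under this model.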
In the proposed probabilistic propagation model, there are five parameters in total to be determined: $P_0$, $r_0$, $n$, $k_d$ and $k_\sigma$. Except for $k_0$, the other parameters can be estimated separately from individual measurements in a straightforward manner. However, the values of these parameters can be slightly affected by the assumptions taken in the RF (Radio Frequency) propagation model. For this reason, a genetic algorithm (GA) [15] is used to find the optimal parameter set jointly. Genetic algorithms are global search techniques modeled after the natural genetic mechanism to find approximate or exact solutions to optimization and search problems. In a GA, each parameter to be optimized is represented by a gene, and each individual is characterized by a chromosome, which is the above set of parameters awaiting optimization. To assess the quality of an individual, a fitness function (objective function, or cost function) must be defined. For the localization module, the fitness function Ψ is defined as the root mean square of the localization error.
$$\Psi = \left( \frac{1}{N} \sum_{i=1}^{N} \left[ (\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2 + (\hat{z}_i - z_i)^2 \right] \right)^{1/2} \tag{9}$$
where N is the number of measurements, (xi, yi, zi) and
(x̂i, ŷi, ẑi) are the real and the estimated positions, respec-
tively.
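The fitness function of Eq. 9 is straightforward to compute. The sketch below shows only this evaluation step; the genetic algorithm that would call it for each candidate parameter set is not reproduced, and the function name and sample positions are hypothetical.

```python
import math


def fitness(true_positions, estimated_positions):
    """Root mean square localization error (Eq. 9) over N 3D positions."""
    n = len(true_positions)
    total = 0.0
    for (x, y, z), (xh, yh, zh) in zip(true_positions, estimated_positions):
        total += (xh - x) ** 2 + (yh - y) ** 2 + (zh - z) ** 2
    return math.sqrt(total / n)


# Two measurements: one estimate is 5 units off, the other is exact.
psi = fitness([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
              [(3.0, 4.0, 0.0), (1.0, 1.0, 0.0)])
```

A lower value of the returned quantity corresponds to a better individual, so a GA would minimize it.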
4.2 Fingerprinting database and KNN
matching
Normally, a radio map in the fingerprinting method is defined as follows:
$$R \triangleq \{(p_i, F(p_i)) \mid i = 1, \ldots, N\} \tag{10}$$
where $p_i \triangleq [p_x\ p_y\ p_z]^T$ are the real-world coordinates of the ith RP and $F(p_i) \triangleq [r_i(1), \ldots, r_i(n)]$ is the fingerprinting matrix, with n being the number of training samples at each RP. The vector $r_i(t) \triangleq [r_i^1(t), \ldots, r_i^L(t)]^T$ contains the RSSI values scanned from L APs at time t and location $p_i$. By using the distance feature instead of RSSI, the radio map in Eq. 10 has a fingerprinting matrix $F(p_i) \triangleq [d_i(1), \ldots, d_i(n)]$, where the vector $d_i(t) \triangleq [d_i^1(t), \ldots, d_i^L(t)]$ contains the distance samples $d_i$ from the ith RP to L APs. This results in a reliable and stable radio map even when some APs are inactive at a certain point in time. Furthermore, the cost of setting up and updating the radio map is much lower than with RSSI. The radio map is only rebuilt when APs or RPs are added to or removed from the WiFi-based localization system.
In the testing phase, the RSSI values scanned from nearby APs by a mobile device are converted to the corresponding distance values by the PPM. They are then compared with the training data to find the best matches. The matching method used in this work is KNN. In KNN, the prediction for a new instance is based on its nearest neighbors in the training data. There are three main ingredients associated with this method: (1) the similarity measure (the distance measurement) between the query patterns and the training data; (2) the number of neighbors to be taken into account in the prediction; and (3) the weights of the neighbors. Euclidean and Manhattan distances are two common geometric measures, of which Euclidean is the most used in WiFi-based localization systems [16, 17]. In this work, the KNN method is evaluated with the Euclidean measure.
In the proposed radio map, each RP is represented by a vector $d_i(t) \triangleq [d_i^1(t), \ldots, d_i^L(t)]^T$ in L-dimensional space. In the learning phase, all these training data D are stored with their dependent variables. In this case, the dependent variables are the positions $p_i$ of the RPs in the environment. In prediction, for a new query pattern z and for each instance d in D, the similarity between d and z is computed by the Euclidean distance measure:
$$l(d, z) = \sqrt{\sum_{i=1}^{n} (d_i - z_i)^2} \tag{11}$$
A set NB(z) of the nearest neighbors of z with $|NB(z)| = k$ is also determined, and then the estimated location for z is calculated. To find an optimal k, we test on the empirical data with k in the range from 1 to 200 using the error function in Eq. 12 for each k.
$$E_k = \sqrt{\sum_{i=1}^{n} \left( \frac{\hat{y} - y}{y} \right)^2} \tag{12}$$
where $\hat{y}$ is the estimated position and y is the true position.
Finally, the predicted location of z is calculated by the weighted sum of the k neighbors (Eq. 13).
$$y_z = \frac{\sum_{d \in NB(z)} w(d, z) \times y_d}{\sum_{d \in NB(z)} w(d, z)} \tag{13}$$
where w denotes the weights, which are chosen by Eq. 14.
$$w(d, z) = e^{-\theta \times l(d, z)} \times e^{-\lambda \times |t_i - t_0|} \tag{14}$$
where θ and λ are constants that define the shape of the exponential functions; $t_0$ is the time at which the query instance is captured and $t_i$ is the time of WiFi signal scanning at the corresponding RP in the training phase; $l(d, z)$ is the dissimilarity between a query instance and its neighbor. In Eq. 14, besides the dissimilarity-based weight governed by θ, a new coefficient λ is proposed to reflect the chronological changes of the fingerprinting data in the environment. This means that fingerprinting data updated closer in time to the query instance receive a higher weight than older data.
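The matching procedure of Eqs. 11, 13 and 14 can be sketched as follows. The data layout (a list of (distance vector, RP position, scan time) tuples), the function names, and the values of k, θ and λ are assumptions made for this illustration only.

```python
import math


def euclidean(d, z):
    """Dissimilarity l(d, z) of Eq. 11 between two distance vectors."""
    return math.sqrt(sum((di - zi) ** 2 for di, zi in zip(d, z)))


def knn_locate(query, t_query, fingerprints, k, theta, lam):
    """Weighted KNN position estimate (Eqs. 13 and 14).

    fingerprints: list of (distance_vector, rp_position, scan_time).
    """
    neighbors = sorted(fingerprints, key=lambda f: euclidean(f[0], query))[:k]
    weighted = [0.0] * len(neighbors[0][1])
    total = 0.0
    for d_vec, position, t_i in neighbors:
        # Weight decays with signal-space dissimilarity and with data age.
        w = (math.exp(-theta * euclidean(d_vec, query))
             * math.exp(-lam * abs(t_i - t_query)))
        total += w
        for axis, coord in enumerate(position):
            weighted[axis] += w * coord
    return tuple(c / total for c in weighted)


# Hypothetical radio map with two APs and three RPs, all scanned at time 0.
radio_map = [
    ((5.0, 10.0), (0.0, 0.0), 0.0),
    ((10.0, 5.0), (10.0, 0.0), 0.0),
    ((20.0, 20.0), (50.0, 50.0), 0.0),
]
estimate = knn_locate((7.5, 7.5), 0.0, radio_map, k=2, theta=0.1, lam=0.01)

# Same signal-space distance, but the second fingerprint is much more recent.
aged_map = [
    ((5.0, 10.0), (0.0, 0.0), 0.0),
    ((10.0, 5.0), (10.0, 0.0), 100.0),
]
recent_biased = knn_locate((7.5, 7.5), 100.0, aged_map, k=2, theta=0.1, lam=0.1)
```

In the first query the two nearest RPs are equally dissimilar and equally old, so the estimate is their midpoint; in the second, the temporal coefficient pulls the estimate toward the recently scanned RP.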
5 Proposed fusion method
In order to improve the performance of person tracking in camera networks, for each camera FOV, the person locations determined by the WiFi system are optimally assigned to the positioning results from the camera system. This not only maintains the high accuracy of vision-based person localization, but also improves the performance of person tracking in camera networks by assigning the clearer ID of the WiFi adapter to each position determined by the camera system.
Algorithm 1 shows the combined method of WiFi and camera systems for people localization and identification. At time t, a set of position observations for multiple targets from the WiFi system ($z^w_{i,t}$) or the camera system ($z^c_{j,t}$) is shown on the 2D floor map. Index i designates one among N targets located by the WiFi system, and index j refers to one of M positions observed by the camera system. We recursively consider two consecutive observations of the localization results from any available sensors. Assume that at time t we have a set of location observations from the WiFi system for N targets, with $z^w_{i,t} = (X^w_{i,t}, Y^w_{i,t}, ID^w_{i,t})$, and that at the previous time step (t-1) we obtained the observations $z^c_{j,t-1} = (X^c_{j,t-1}, Y^c_{j,t-1})$ for M positions from the camera system. Without loss of generality, we can consider these observations as the state estimations at time t-1. The prediction step of the Kalman filter (KalmanPrediction) is applied to estimate the next state $x^c_{j,t}$ based on $z^c_{j,t-1}$. An assignment algorithm is then used to find the optimal matching between the estimated states $x^c_{j,t}$ from the camera system and the observations $z^w_{i,t}$ from the WiFi system. Taking the result $K_{i,t}$ of the assignment as the observation at the current time t, the predicted state $x_t$ is corrected by the KalmanCorrection step, by which the WiFi-based positions are augmented with the vision-based positions.
5.1 Kalman filter
In the proposed fusion algorithm, the state prediction step of the Kalman filter is used to estimate the process state at a certain time based on the position observation or measurement obtained at the previous time. The correction step of the Kalman filter is done after the optimal assignment between the estimated states and the observations at a certain time. In this case, the process state to be estimated at a certain time is defined as the position $p_t$ of a person in the real-world coordinate system of the 2D floor map. It is represented by a state vector $x_t$ of location coordinates $pX_t$ and $pY_t$ on the 2D floor map, together with their corresponding velocity values $vX_t$ and $vY_t$:
$$x_t = (pX_t, pY_t, vX_t, vY_t) \tag{15}$$
A position observation $z_t$ is then defined as follows:
$$z_t = (mX_t, mY_t) \tag{16}$$
Under the assumption that people move with constant velocity (and hence zero acceleration) and that the position is measured n times per second, the state equations are defined as follows:
$$pX_t = pX_{t-1} + vX_{t-1} \Delta T \tag{17}$$
$$pY_t = pY_{t-1} + vY_{t-1} \Delta T \tag{18}$$
$$vX_t = vX_{t-1} \tag{19}$$
$$vY_t = vY_{t-1} \tag{20}$$
where $\Delta T = \frac{1}{n}$. The state transition matrix A and the state-measurement matrix H are then defined as:
$$A = \begin{bmatrix} 1 & 0 & \Delta T & 0 \\ 0 & 1 & 0 & \Delta T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
Algorithm 1: Person tracking by fusion of position observations from WiFi and camera systems.
Input: position observations z from WiFi and camera localization systems
Output: position estimations x
Parameters initiation: A, H, $P_1$, Q, R;
for each set of position observations z do
    if $z_{i,t}$ is from WiFi localization system [$z^w_{i,t} = (X^w_{i,t}, Y^w_{i,t}, ID^w_{i,t})$] then
        if $z_{i,t-1}$ is from camera localization system [$z^c_{j,t-1} = (X^c_{j,t-1}, Y^c_{j,t-1})$] then
            [$x^c_{j,t}$, $P_t$] = KalmanPrediction(A, Q, $z^c_{j,t-1}$, $P_{t-1}$);
            $K_{i,t}$ = Assignment($x^c_{j,t}$, $z^w_{i,t}$);
            [$x^w_{i,t}$, $P_t$] = KalmanCorrection(H, R, $K_{i,t}$, $x_t$, $P_t$);
            Save $x^w_{i,t}$ as a state estimation at time t;
        end
    else
        [$z^c_{j,t} = (X^c_{j,t}, Y^c_{j,t})$]
        if $z_{i,t-1}$ is from WiFi localization system [$z^w_{i,t-1} = (X^w_{i,t-1}, Y^w_{i,t-1}, ID^w_{i,t-1})$] then
            [$x^w_{i,t}$, $P_t$] = KalmanPrediction(A, Q, $z^w_{i,t-1}$, $P_{t-1}$);
            $K_{i,t}$ = Assignment($x^w_{i,t}$, $z^c_{j,t}$);
            [$x^w_{i,t}$, $P_t$] = KalmanCorrection(H, R, $K_{i,t}$, $x_t$, $P_t$);
            Save $x^w_{i,t}$ as a state estimation at time t;
        end
    end
end
return $x^w_{i,t}$;
Kalman-based tracking is started after the first successfully calculated position from the WiFi or camera system, with the initial state vector $x_1$. The initial covariance matrix $P_1$ for the initial state is:
$$P_1 = \begin{bmatrix} \sigma^2_{x_1} & 0 & 0 & 0 \\ 0 & \sigma^2_{y_1} & 0 & 0 \\ 0 & 0 & \sigma^2_{vx_1} & 0 \\ 0 & 0 & 0 & \sigma^2_{vy_1} \end{bmatrix}$$
The state noise covariance matrix Q and the measurement noise covariance matrix R are defined as:
$$Q = \begin{bmatrix} \sigma^2_{pX} & 0 & 0 & 0 \\ 0 & \sigma^2_{pY} & 0 & 0 \\ 0 & 0 & \sigma^2_{vX} & 0 \\ 0 & 0 & 0 & \sigma^2_{vY} \end{bmatrix}, \quad R = \begin{bmatrix} \sigma^2_{mX} & 0 \\ 0 & \sigma^2_{mY} \end{bmatrix}$$
where each $\sigma^2$ is the variance of the corresponding quantity, with $\sigma$ being its deviation from the real value in centimeters. The measurement noise refers to the noise of the positions calculated by the WiFi or camera system, and the state noise is defined according to the motion of people. The initial covariance matrix $P_1$ for the initial state $x_1$ is set under the assumption that the calculated position deviates by ±5 cm from the real position in both the X and Y directions, and that the velocity deviates by ±3 cm. Similarly, the state noise covariance matrix Q is set with standard deviations of ±5 cm and ±3 cm for the determined position and its velocity, respectively. The measurement noise covariance matrix R is set with a standard deviation of 3 cm for the FootPoint measurement in the X and Y directions, and ΔT is set to 1, meaning that the position is measured every second.
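The prediction and correction steps with the matrices of this section can be sketched in plain Python as follows. The covariance values are the squares of the standard deviations quoted above (5 cm and 3 cm); the helper functions, the column-vector layout and the sample state and observation are assumptions for illustration and are not taken from the authors' implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diag(values):
    n = len(values)
    return [[float(values[i]) if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def inv2(s):
    """Inverse of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


DT = 1.0  # one position measurement per second
A = [[1.0, 0.0, DT, 0.0], [0.0, 1.0, 0.0, DT],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
H = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
P1 = diag([25.0, 25.0, 9.0, 9.0])  # (5 cm)^2 position, (3 cm)^2 velocity
Q = diag([25.0, 25.0, 9.0, 9.0])
R = diag([9.0, 9.0])               # (3 cm)^2 measurement noise


def kalman_prediction(a, q, x, p):
    """Propagate the state and covariance one step ahead."""
    return matmul(a, x), add(matmul(matmul(a, p), transpose(a)), q)


def kalman_correction(h, r, z, x_pred, p_pred):
    """Update the predicted state with the observation z."""
    ht = transpose(h)
    s = add(matmul(matmul(h, p_pred), ht), r)
    gain = matmul(matmul(p_pred, ht), inv2(s))
    x_new = add(x_pred, matmul(gain, sub(z, matmul(h, x_pred))))
    p_new = matmul(sub(diag([1.0] * len(p_pred)), matmul(gain, h)), p_pred)
    return x_new, p_new


# Hypothetical target at the origin moving 100 cm/s along X, observed at X = 110 cm.
x1 = [[0.0], [0.0], [100.0], [0.0]]
x_pred, p_pred = kalman_prediction(A, Q, x1, P1)
x_new, p_new = kalman_correction(H, R, [[110.0], [0.0]], x_pred, p_pred)
```

In this example the predicted X position is 100 cm, and the corrected estimate falls between the prediction and the 110 cm observation, with a reduced position variance.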
5.2 Optimal assignment
After the Kalman prediction step, we have a position estimation $x^c_{j,t}$ or $x^w_{i,t}$ for the camera or WiFi system, respectively. Consider the first case, in which the position estimation $x^c_{j,t}$ at time t for the camera system is predicted from the previous vision-based location observation $z_{j,t-1}$. The optimal assignment at time t between $x^c_{j,t}$ and $z^w_{i,t}$ is then applied. Assume that assigning an estimated position $x_j$ to an observation $z_i$ incurs a cost $d_{ij}$, which is the Euclidean distance between them. The matrix $D_{N \times M}$ of the costs (distances) between every $x \in M$ and $z \in N$ is then defined as:
$$D = \begin{bmatrix} d_{11} & d_{12} & \ldots & d_{1M} \\ d_{21} & d_{22} & \ldots & d_{2M} \\ \ldots & \ldots & \ldots & \ldots \\ d_{N1} & d_{N2} & \ldots & d_{NM} \end{bmatrix}$$
where $d_{ij} = \sqrt{(X^c_j - X^w_i)^2 + (Y^c_j - Y^w_i)^2}$. The assignment is now formulated as a linear assignment problem:
$$\min \sum_{i \in N} \sum_{j \in M} d_{ij} x_{ij} \tag{21}$$
subject to
$$\sum_{i \in N} x_{ij} = 1 \quad \forall j \in M$$
$$\sum_{j \in M} x_{ij} = 1 \quad \forall i \in N$$
$$x_{ij} \ge 0 \quad \forall i \in N, j \in M$$
Here $x_{ij}$ is the assignment variable for the pair (i, j), not a state vector.
This optimal assignment is done with the following constraints:
– If N = M, for each pair $(x^c_{j,t}, z^w_{i,t})$, we augment the position $x^c_{j,t}$ with the identity $ID^w_{i,t}$ from $z^w_{i,t}$;
– If N > M, all unassigned $z^w_{i,t}$ keep their original coordinates, which are computed by the WiFi-based localization system;
– If N < M, all unassigned $x^c_{j,t}$ are considered false positives and are discarded, because we assume in the surveillance system that all people entering the monitored areas hold WiFi-enabled devices and have checked in at the entrance.
The overall formula for these constraints is given as follows:
$$K_{i,t} = \begin{cases} (X^c_{j,t}, Y^c_{j,t}, ID^w_{i,t}) & \text{if } z^w_{i,t} \text{ is assigned;} \\ (X^w_{i,t}, Y^w_{i,t}, ID^w_{i,t}) & \text{otherwise,} \end{cases}$$
where $K_{i,t}$ denotes the association between position estimations $x^c_{j,t}$ and observations $z^w_{i,t}$. Each component $K_{i,t}$ is a random variable that takes its value among {0, .., N}.
Based on this association, the location information from
WiFi-based observations will be corrected according to the
positions given by the camera system, and the correspond-
ing ID from the WiFi system will be assigned. The cor-
rection step of the Kalman filter is applied to update the
predicted state by the current position observation Ki,t.
The same procedure is done for the case in which WiFi-
based location observations come before camera-based
ones, and we have optimal assignment of an estimated po-
sition xi from the WiFi system and an observation zj from
the camera system.
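The assignment rules of this section can be illustrated with a small Python sketch. For brevity it finds the minimum-cost matching by brute force over permutations, which is a stand-in for an efficient linear assignment solver and is only practical for a handful of targets. The function name, the tuple layouts and the device identifiers are hypothetical.

```python
import math
from itertools import permutations


def assign(cam_estimates, wifi_obs):
    """Build K_{i,t} for every WiFi target.

    cam_estimates: list of (x, y) predicted camera positions (M entries).
    wifi_obs: list of (x, y, device_id) WiFi observations (N entries).
    """
    m, n = len(cam_estimates), len(wifi_obs)

    def cost(i, j):
        # Euclidean distance d_ij between WiFi observation i and camera estimate j.
        return math.hypot(cam_estimates[j][0] - wifi_obs[i][0],
                          cam_estimates[j][1] - wifi_obs[i][1])

    best_cost, best_pairs = float("inf"), []
    if n <= m:
        # Every WiFi target gets a distinct camera estimate.
        for cams in permutations(range(m), n):
            c = sum(cost(i, j) for i, j in enumerate(cams))
            if c < best_cost:
                best_cost, best_pairs = c, list(enumerate(cams))
    else:
        # Every camera estimate gets a distinct WiFi target.
        for wifis in permutations(range(n), m):
            c = sum(cost(i, j) for j, i in enumerate(wifis))
            if c < best_cost:
                best_cost, best_pairs = c, [(i, j) for j, i in enumerate(wifis)]

    matched = dict(best_pairs)  # WiFi index -> camera index
    result = []
    for i, (xw, yw, device_id) in enumerate(wifi_obs):
        if i in matched:
            xc, yc = cam_estimates[matched[i]]
            result.append((xc, yc, device_id))   # camera position, WiFi ID
        else:
            result.append((xw, yw, device_id))   # keep WiFi coordinates
    return result  # unmatched camera estimates are dropped as false positives


# Hypothetical scene with M = 2 camera estimates and N = 3 WiFi targets (cm).
cams = [(100.0, 100.0), (300.0, 300.0)]
wifi = [(310.0, 290.0, "dev-1"), (90.0, 120.0, "dev-2"), (500.0, 500.0, "dev-3")]
fused = assign(cams, wifi)
```

Here the first two WiFi targets take over the nearby camera positions while keeping their WiFi identities, and the third, which has no camera counterpart, keeps its WiFi coordinates, matching the N > M rule above.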
6 Dataset and evaluation
6.1 Testing dataset
In order to evaluate the combined algorithm for person tracking using both WiFi and camera systems, a multi-modal dataset with two scripts is constructed in this work. Script 1 is set with simpler scenarios than Script 2. Two people are involved in Script 1, moving along random routes through two non-overlapping cameras. Some inter-person occlusions appear, but not as frequently as in Script 2. The visual data in Script 1 is used for person localization and Re-ID based on camera. Script 2 contains five scenarios that differ in the number of people taking part in each scenario: one person, two, three, and five moving people. The data in Script 2 is very challenging for both WiFi-based and vision-based systems. People move through four different cameras. Severe occlusions happen because all people are required to move in close proximity along a fixed route (see Figure 8). Moreover, the similar appearance of the people is a challenge for visual processing.
Figure 8: A 2D floor map of the testing environment in
Figure 9, with the routing path of moving people in testing
scenarios.
Figure 9: Testing environment.
The testing environment for building the dataset is shown in Figure 9, in
which 6 access points (APs) and 4 cameras are deployed. The APs are set to
the same SSID, which ensures continuous connectivity for mobile devices when
people move from the range of one AP to another. The WiFi range of each AP
is about 30–50 meters in radius, depending on the walls and obstacles in the
environment. The AP specifications are the MAC address and the AP position
in X, Y and Z. All APs used in the testing are Linksys E1200 devices. A
person holds a WiFi-enabled device and moves in the testing environment at a
normal velocity of 1–1.3 m/s.
Each scenario lasts from 3 to 5 minutes, during which about 400 RSSI values
are acquired from the 6 APs, with an average interval of 2 seconds between
two consecutive samples. The mobile devices and cameras are synchronized to
Internet time, which synchronizes the data captured by the camera and WiFi
systems. Based on this, we can compute the real-world position of a mobile
user on the 2D floor map at each time. The time stamp of each person
location calculated by the camera or WiFi system provides the basis for
multimodal object localization. The WiFi data is scanned by the mobile
devices and stored in XML files. These devices continuously capture the
signals from the available APs in the environment. Each scan is saved as a
record of scanning time, MAC address, AP name, and RSSI. The APs are
distinguished by their MAC addresses.
Figure 10: The visual examples in Script 2. The first row contains frames
for the scenario of one moving person. The scenarios for 2, 3 and 5 moving
people are shown in the second, third and fourth rows. [Panel labels by row:
Frames 491, 135, 1596, 1145; Frames 905, 431, 1969, 957; Frames 692, 242,
1541, 1328; Frames 784, 313, 2114, 810.]
For visual data, we manually assign FootPoint positions on the captured
frames with the corresponding time stamps and IDs. These positions are then
automatically transformed into 2D locations on the floor map by using camera
calibration and a homography matrix. The person ID assigned in the visual
data is made identical to the ID of the WiFi adapter by a predefined
convention. In short, for each scenario, the ground truth data is obtained
and saved as XML files which contain the following records:
– Frame number.
– Person ID.
– Coordinates of top left and bottom right positions of
the bounding box containing the person.
– The image coordinates of FootPoint position.
– The corresponding coordinates of FootPoint positions
on 2D floor map.
When no person is detected, all records except the frame number are set
to -1.
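The transformation of a FootPoint from image coordinates to the floor map can be sketched as follows. This is only an illustration of applying a planar homography: the matrix values used in any call are placeholders rather than the calibration of the testing cameras, and the function name `project_foot_point` is hypothetical.

```python
def project_foot_point(h, u, v):
    """Map an image point (u, v) to floor-map coordinates.

    h is a 3x3 homography given as a list of three rows. The point is
    lifted to homogeneous coordinates (u, v, 1), multiplied by h, and
    normalized by the third component.
    """
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w
```

A homography that only scales and shifts, such as [[0.01, 0, 1], [0, 0.02, 2], [0, 0, 1]], maps the pixel (100, 50) to the floor position (2, 3).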
Figure 10 illustrates examples in Script 2. The frames
in the first row show the scenario of one moving person,
while those in the second, third and fourth rows are frames
for the scenarios of two, three and five moving people.
For WiFi data acquired outside the camera FOVs, the ground truth of person
locations in these regions is calculated by a pedestrian foot counting
program. It takes input from the acceleration and direction sensors that are
available on smartphones or tablets [20]. Basically, the positions of the
mobile user in this region are computed from the route length that the user
covers between marking points or reference points. This distance is
calculated by the foot counter, taking into account the average step length
of each particular person. The foot counter gives a positioning result of
5 m with a deviation of 3 m for a route length of 120 m. In our test, the
route length outside the camera view is only about 10 m. In addition, the
bias of the foot counter accumulates over time, so over 10 m this deviation
will be 0.8 m (equivalent to 8% of the route length). This corresponds to a
deviation of 8 cm per meter between the positions labeled in the dataset and
the true positions.
After the synchronization of WiFi and visual data, an interpolation method
is applied to calculate the person positions that are outside the camera
fields of view.
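The interpolation method is not specified further here. As one possible reading, the sketch below uses linear interpolation between two time-stamped positions; both the choice of linear interpolation and the function name `interpolate_position` are assumptions made for illustration.

```python
def interpolate_position(t, t0, p0, t1, p1):
    """Linearly interpolate a 2D position at time t.

    (t0, p0) and (t1, p1) are the time-stamped positions that bracket t,
    with t0 < t1 and t0 <= t <= t1. Positions are (x, y) tuples.
    """
    if t0 >= t1 or not (t0 <= t <= t1):
        raise ValueError("t must lie within a non-empty interval [t0, t1]")
    a = (t - t0) / (t1 - t0)  # fraction of the interval elapsed at time t
    return (p0[0] + a * (p1[0] - p0[0]),
            p0[1] + a * (p1[1] - p0[1]))
```

For instance, halfway between a position (0, 0) at t = 0 s and (8, 4) at t = 4 s, the interpolated position is (4, 2).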
6.2 Evaluation metrics
To evaluate the performance of vision-based tracking, the metrics of Multi
Object Tracking Precision (MOTP) [18], Global Multiple Object Tracking
Accuracy (GMOTA) [19], and CMC (Cumulative Match Curve) are utilized.
Assume that at each time step t, a multi-person tracker outputs a set of
hypotheses {h_1, ..., h_m} for a set of visible people {u_1, ..., u_n}. MOTP
measures the positioning error for all matched pairs of person and tracker
hypothesis over all frames. This metric is defined by:
\[
\mathrm{MOTP} = \frac{\sum_{i,t} d_{i,t}}{\sum_{t} c_{t}} \tag{22}
\]
where d_{i,t} is the Euclidean distance between the ground truth and the
tracker hypothesis for the i-th person at time frame t. In this work, it is
the Euclidean distance between the ground truth and the tracker hypothesis
of FootPoint positions. The element c_t indicates the number of matched
pairs at time step t.
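Equation (22) can be computed as in the following minimal Python sketch. The input layout (one list of matched ground-truth and hypothesis pairs per frame) and the function name `motp` are assumptions chosen for illustration; matching itself is taken as already done.

```python
from math import hypot


def motp(matches_per_frame):
    """Multi Object Tracking Precision as in Eq. (22).

    matches_per_frame: one list per frame, each containing
    ((gx, gy), (hx, hy)) pairs of matched ground-truth and hypothesis
    FootPoint positions. Returns the mean distance over all matches.
    """
    total_distance = 0.0
    total_matches = 0
    for frame in matches_per_frame:
        for (gx, gy), (hx, hy) in frame:
            total_distance += hypot(gx - hx, gy - hy)  # d_{i,t}
        total_matches += len(frame)                    # c_t
    if total_matches == 0:
        raise ValueError("no matched pairs")
    return total_distance / total_matches
```

The result is expressed in the same unit as the input coordinates, so positions given in centimeters yield an MOTP in centimeters.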
GMOTA is an extension of MOTA (Multiple Object
Tracking Accuracy) [18]. MOTA measures the number of
errors the tracker made in terms of false negatives (missed
detections), false positives (wrong detections), mismatches
and failure to recover tracks. This score is computed as
follows:
\[
\mathrm{MOTA} = 1 - \frac{\sum_{t} (FN_{t} + FP_{t} + ID_{t})}{\sum_{t} g_{t}} \tag{23}
\]
where FN_t is the number of false negatives, FP_t is the number of false
positives, ID_t is the number of instantaneous identity switches, and g_t
denotes the number of ground truth detections at time frame t. In the GMOTA
score, ID_t is replaced by the global ID_t (gID_t). This means that gID_t
reflects the performance of the tracker in preserving person identity
assignments in a global manner, instead of the instantaneous identity
assignments of MOTA.
\[
\mathrm{GMOTA} = 1 - \frac{\sum_{t} (FN_{t} + FP_{t} + gID_{t})}{\sum_{t} g_{t}} \tag{24}
\]
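Since Eqs. (23) and (24) differ only in the identity-error term, both can be computed with the same routine, as in the sketch below. The per-frame counts in any call are illustrative inputs, and the function name `tracking_accuracy` is hypothetical.

```python
def tracking_accuracy(fn, fp, id_errors, gt):
    """Compute 1 - sum(FN + FP + ID) / sum(g) over all frames.

    fn, fp, id_errors, gt: per-frame lists of false negatives, false
    positives, identity errors and ground-truth detections.
    Passing instantaneous identity switches gives MOTA (Eq. 23);
    passing global identity errors gID gives GMOTA (Eq. 24).
    """
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("no ground-truth detections")
    return 1.0 - (sum(fn) + sum(fp) + sum(id_errors)) / total_gt
```

With 4 errors in total over 10 ground-truth detections, the score is 0.6. The score can become negative when the tracker makes more errors than there are ground-truth detections.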
The CMC is employed as the performance evaluation metric for vision-based
person Re-ID. The CMC curve presents the expectation of finding the correct
match within the top n matches.
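A CMC curve can be obtained from ranked candidate lists as in the following sketch. The input format (for each query, the gallery IDs sorted from best to worst match) and the function name `cmc_curve` are assumptions made for illustration.

```python
def cmc_curve(ranked_ids_per_query, true_ids, max_rank):
    """Cumulative Match Curve.

    ranked_ids_per_query[q]: gallery IDs for query q, best match first.
    true_ids[q]:             the correct ID for query q.
    Returns a list whose entry n-1 is the fraction of queries whose
    correct ID appears within the top n matches.
    """
    if not true_ids:
        raise ValueError("no queries")
    hits = [0] * max_rank
    for ranked, true_id in zip(ranked_ids_per_query, true_ids):
        top = ranked[:max_rank]
        if true_id in top:
            first = top.index(true_id)
            # A hit at rank r also counts for every larger rank.
            for r in range(first, max_rank):
                hits[r] += 1
    return [h / len(true_ids) for h in hits]
```

The first entry of the returned list is the Rank 1 recognition rate, and the curve is non-decreasing by construction.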
The accuracy of the WiFi-based localization system is evaluated by three
statistics: the maximal error, the average error, and the error at a
reliability of 90%. The maximal error is the maximum distance deviation, in
meters, between the positions determined by the system and the ground truth
positions. The average error refers to the average distance deviation, in
meters, between the positions determined by the system and the ground truth
positions. The error at a reliability of 90% is the distance deviation, in
meters, below which 90% of the test samples fall.
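The three statistics can be computed from a list of per-sample distance errors as sketched below. The nearest-rank rule used for the 90% value is one common convention and is an assumption here, as is the function name `localization_error_stats`.

```python
def localization_error_stats(errors, reliability_percent=90):
    """Maximal error, average error and error at a given reliability.

    errors: distance deviations (in meters) between estimated and
    ground-truth positions. The error at reliability p% is taken as the
    smallest observed error that is not exceeded by at least p% of the
    samples (nearest-rank percentile, an assumed convention).
    """
    if not errors:
        raise ValueError("no error samples")
    ordered = sorted(errors)
    n = len(ordered)
    # Integer ceiling of reliability_percent * n / 100.
    rank = max(1, -(-reliability_percent * n // 100))
    return {
        "max": ordered[-1],
        "mean": sum(ordered) / n,
        "at_reliability": ordered[rank - 1],
    }
```

For ten samples with errors of 1 m to 10 m, the maximal error is 10 m, the average error is 5.5 m, and the error at a reliability of 90% is 9 m.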
The performance of the fusion method is evaluated in this work by the GMOTA
metric.
6.3 Experimental results
In vision-based person localization, within each camera FOV, person
identification is done by a process called identification by tracking. This
means that a trajectory belonging to an individual in the current frame is
linked to the corresponding one in the previous frame based on an optimal
assignment of the Euclidean distances between them. However, this results in
ID switches when people cross each other's paths.
The method proposed in Section 3.2 for person Re-ID helps to solve not only
person identification in each camera FOV, but also person Re-ID among
multiple cameras, by using a robust appearance-based descriptor built on
each detected human ROI at each FootPoint position. This makes it possible
to perform tracking by identification. However, person identification and
Re-ID performance still need to be improved, especially in cases of
inter-person occlusion and of people with similar appearances.
The proposed fusion algorithm adds the clearer ID information of the WiFi
adapter for performing tracking by identification.
In the following sections, the testing results for WiFi-based localization,
vision-based localization and Re-ID, and fusion-based tracking are shown.
6.3.1 WiFi-based localization results
The system parameters of the WiFi-based localization model are calculated
first, and the positioning results are then produced based on them.
Firstly, the training process using a genetic algorithm (GA) is set up with
the configuration provided in Table 1. Using these data, the optimal
parameters are produced as in Table 2.
Parameter           | Value | Parameter           | Value
Population size     | 20    | Tolerance           | 10^-6
Elite count         | 5     | Selection           | Uniform
Crossover fraction  | 0.5   | Crossover           | Scattered
Time limit          | No    | Mutation            | Uniform
Maximal generations | No    | Creation population | Uniform
Table 1: Genetic algorithm configuration.
Parameter | Values for the first scenario | Values for the second scenario
P0        | -41 dBm                       | -36.1757 dBm
n         | 1.1                           | 2.2029
kσ        | 1.0035 m^-1                   | 5.3147 m^-1
r0        | 5 m                           | 2.5117 m
kd        | 49.23 dBm.m^-1                | 5.1311 dBm.m^-1
Table 2: Optimized system parameters for the first and the second scenarios
of testing environments.
Fingerprint feature | Maximal error (m) | Average error (m) | Error at reliability of 90% (m)
RSSI                | 6.3               | 1.86              | 2.99
Distance            | 6.27              | 1.89              | 2.98
Table 3: Evaluations for distance and RSSI features in case of using
coefficient λ.
Fingerprint feature | Maximal error (m) | Average error (m) | Error at reliability of 90% (m)
RSSI                | 6.06              | 1.76              | 3.55
Distance            | 6.5               | 1.59              | 2.9
Table 4: Localization results using different features of distance and RSSI,
without using coefficient λ.
Secondly, the weights for different values of θ as a function of
dissimilarity are computed (see Figure 11). The weights for different values
of λ are presented in Figure 12. With λ = 0.5 × 10^-6, the influence of a
fingerprint is reduced by a factor of about 3 when it was scanned one month
before the testing time (roughly 2.6 × 10^6 seconds). Similarly, when a
fingerprint was taken two months before the testing time, its influence is
only about 10% of that of a new fingerprint. In this work, we choose k = 9,
θ = 1.1 and λ = 2 × 10^-6.
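The effect of λ on the age of a fingerprint can be explored with the short sketch below. It assumes an exponential decay of the form exp(-λ·age), which is only roughly in line with the orders of magnitude quoted above; this functional form and the function name `age_weight` are assumptions for illustration, not the exact weighting defined for the localization model.

```python
from math import exp


def age_weight(age_seconds, lam):
    """Weight of a fingerprint as a function of its age.

    Assumption: exponential decay exp(-lam * age_seconds), so a new
    fingerprint (age 0) has weight 1 and older ones count less.
    """
    if age_seconds < 0:
        raise ValueError("age must be non-negative")
    return exp(-lam * age_seconds)


ONE_MONTH = 2.6e6  # seconds, the rough value used in the text

# Relative influence of one-month-old and two-month-old fingerprints
# for lam = 0.5e-6 under the assumed exponential form.
w_one_month = age_weight(ONE_MONTH, 0.5e-6)
w_two_months = age_weight(2 * ONE_MONTH, 0.5e-6)
```

Under this assumed form, a larger λ makes old fingerprints fade faster, while λ = 0 gives every fingerprint the same weight regardless of age.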
The radio maps and fingerprint locations in the testing
environment are shown in Figure 13a and Figure 13b. The
regions with deep pink color indicate that more APs are
available than the regions with light pink color.
The localization experiments are conducted by using the fingerprinting
method with distance features calculated by the proposed probabilistic
propagation model. Comparative results are also reported for the
fingerprinting method with RSSI features. Additionally, the stability and
reliability of the radio map with distance features are confirmed by the
evaluations with the coefficient λ.
Figures 14, 15 and 16 show the comparative results when the coefficient λ is
taken into account. These figures show, respectively, the localization
results, the distribution of the localization results relative to the real
locations, and the reliability of the localization result as a function of
the localization error. The details of these results are given in Table 3.
With λ, the positioning errors at a reliability of 90% are almost the same
for distance and RSSI features (2.98 m and 2.99 m, respectively). However,
without λ, the localization reliability for RSSI features decreases, whilst
it remains stable for distance features. These results are shown in Figures
17, 18 and 21 and in Table 4: the error at a reliability of 90% is 3.55 m
for RSSI features, but 2.9 m for distance features. The above experiments
show that using distance features for fingerprint data results in more
stable and reliable radio maps than using RSSI features. Moreover, this also
lowers the cost of updating fingerprint data, which is considered one of the
most challenging problems of the fingerprinting method in WiFi-based
localization.
Figure 11: Weights of different values of θ based on dissimilarity. [Plot:
weight (0 to 1) versus dissimilarity (m, 0 to 20); curves for θ = 0.5, 0.7,
0.9, 1.1 and 1.3.]
Figure 12: Weights of different values of λ based on dissimilarity. [Plot:
weight (0 to 1) versus time (s, 0 to 14 × 10^6); curves for λ = 1/1000000,
1/2000000, 1/4000000 and 1/6000000.]
Figure 13: (a) the radio map, with (b) 2000 fingerprint locations collected
in the testing environment.
           | Vision-based evaluations           | The proposed fusion algorithm
Metric     | Hallway (Cam 1) | Showroom (Cam 3) | Hallway (Cam 1) | Showroom (Cam 3)
MOTP (cm)  | 24.3            | 21.3             | 24.3            | 21.3
FN (%)     | 17.1            | 26.4             | 7.6             | 12.6
FP (%)     | 22.7            | 18.3             | 3.4             | 2.1
gID        | 28.3            | 11.6             | 4.9             | 2.3
GMOTA (%)  | 31.2            | 52.6             | 83.9            | 85.7
Table 5: The comparative results of the proposed fusion algorithm against
the vision-based evaluations on testing data of Script 1.
Figure 14: Localization results with distance and RSSI features when using
coefficient λ. [Plot: Y (m) versus X (m), both 0 to 60; ground truth path,
localization result with distance feature, localization result with RSSI
feature.]
6.3.2 Experimental results for vision and fusion-based tracking
The performance of vision-based person localization and Re-ID is evaluated
on the Script 1 and Script 2 data. In addition, the comparative results
obtained from the fusion system of camera and WiFi are reported on the same
data.
Firstly, vision-based person Re-ID evaluations are done
on Script 1 data. The human ROIs are manually extracted
from the frames captured by three non-overlapping cam-
eras: Cam 1 (hallway), Cam 2 (lobby) and Cam 3 (show-
room). The human ROIs from Cam 2 are used for training
phase (see Figure 19) and the human ROIs from Cam 1 and
Cam 3 for testing phase (see Figure 20).
We train the system with a total of 10 people, including the two testing
ones, using the images of human ROIs extracted from Cam 2. Figure 22 shows
the person recognition rates for this experiment, with a Rank 1 rate of
51.1%.
Table 5 shows the results for vision-based localization in the two scenarios
of Hallway (Cam 1) and Showroom (Cam 3). The MOTP of the vision-based
localization system is 24.3 cm and 21.3 cm for the Hallway and Showroom
scenarios, respectively. These values are retained for the fusion model of
camera and WiFi. The vision-based GMOTA is lower for Hallway than for
Showroom, at 31.2% compared to 52.6%. However, after integration with WiFi,
these values increase markedly to 83.9% for Hallway and 85.7% for Showroom.
This results from the sharp decreases in the rates of FN, FP and gID in both
scenarios. Additionally, even in the ideal case of manual human detection,
vision-based person tracking by identification does not perform as well as
the proposed fusion algorithm.
Figure 15: Distribution of localization error for distance and RSSI features
when using coefficient λ. [Plot: Y (m, -9 to 3) versus X (m, -6 to 6);
distribution error with distance feature, distribution error with RSSI
feature.]
Secondly, to further evaluate the proposed fusion algorithm, experiments are
conducted on the data of Script 2. This dataset is very challenging compared
to Script 1 because of severe occlusions and the similarity in human
appearance. Moreover, people moving together along the same route is also a
challenge for WiFi-based localization.
In the experiments with this data, we use the ground
truth data of FootPoint positions and the corresponding hu-
man ROIs for testing evaluations. The parameter gID in
GMOTA metric now indicates the performance of tracker
in maintaining the person ID when he/she moves from one
camera FOV to others or re-appears in one camera FOV.
Table 6 shows the GMOTA results obtained with the fusion algorithm alongside
the Rank 1 rates for person Re-ID. It should be noted that FN and FP are not
included in the evaluation of GMOTA because we use the ground truth data of
FootPoint positions and human ROIs. In this case, only gID is taken into
account. This means that the performance of maintaining person IDs in
tracking now depends only on the performance of WiFi-based person
localization. In comparison with the GMOTA values from Script 1, the GMOTA
figures from Script 2 are much lower: only 31.7% for the scenario of two
moving people, and 16.5% and 11.2% for the scenarios of three and five
moving people, respectively. This is because the data of Script 2 is much
more challenging than that of Script 1. People moving together in very close
proximity is a burden not only for vision-based person localization and
identification, but also for WiFi-based person localization, because the
WiFi data is noisy when people are close to each other.
Figure 16: Localization reliability for distance and RSSI features when
using coefficient λ. [Plot: reliability distribution (%, 0 to 100) versus
error (m, 0 to 8); localization reliability with distance feature,
localization reliability with RSSI feature.]
Figure 17: Localization results for distance and RSSI features, without
using coefficient λ. [Plot: Y (m) versus X (m), both 0 to 60; ground truth
path, RSSI, distance.]
However, in comparison with person Re-ID by kernel descriptor, these results
are much higher. In these experiments, in addition to the testing people, we
train the system with 20 other people at the check-in gate for person Re-ID.
The recognition rate at Rank 1 is only 12.6% for the scenario of two moving
people, which is 19.1 percentage points lower than that of the fusion-based
method. The Rank 1 rates for the scenarios of three and five moving people
are 8.9% and 5.6%. Clearly, the performance of person Re-ID based on the
kernel descriptor degrades when people have similar appearances.
Figure 18: Distribution of localization error for distance and RSSI
features, without using coefficient λ. [Plot: Y (m, -9 to 9) versus X (m,
-9 to 6); RSSI, distance.]
           | Two people | Three people | Five people
GMOTA (%)  | 31.7       | 16.5         | 11.2
Rank 1 (%) | 12.6       | 8.9          | 5.6
Table 6: The experimental results for person tracking and person Re-ID with
Script 2 dataset.
From the above comparative evaluations, we can see that the proposed fusion
algorithm significantly improves the performance of person tracking by
identification and person Re-ID. The highly accurate vision-based person
localization and the clear ID information from the WiFi-enabled device are
integrated into each detected FootPoint position. This makes it possible to
do tracking by identification in each camera FOV and, based on this, person
Re-ID in non-overlapping camera networks can be solved more effectively than
by applying only the vision-based method.
7 Conclusion
In this work, person localization and Re-ID in surveillance regions covered
by WiFi signals and cameras with disjoint FOVs are improved by a fusion
algorithm based on the Kalman filter and an optimal assignment technique.
This algorithm operates on the position observations on the 2D floor map
obtained from each single system, camera or WiFi.
Evaluation on the multimodal dataset shows improved results when the
proposed fusion algorithm is applied. The high positioning accuracy of the
vision-based system is maintained in the multimodal person localization
system. Additionally, the fusion algorithm allows tracking by identification
and, based on this, person Re-ID in non-overlapping cameras is done with
clear identity information taken from the WiFi-based system.
Figure 19: Training examples of manually-extracted human ROIs from Cam 2 for
person 1 (images on the left) and person 2 (images on the right).
Figure 20: Testing examples of manually-extracted human ROIs from Cam 1
(images on the left column) and Cam 3 (images on the right column) for (a)
person 1 and (b) person 2.
Figure 21: Localization reliability for distance and RSSI features, without
using coefficient λ. [Plot: reliability distribution (%, 0 to 100) versus
error (m, 0 to 6); RSSI, distance.]
Figure 22: Person Re-ID evaluations on testing data of two moving people.
[Plot: recognition rate (10 to 100) versus rank (1 to 10).]
In future work, other localization techniques, such as RFID or UWB, can be
integrated into the multimodal system in order to improve the positioning
accuracy and person Re-ID. The fusion algorithm for person localization and
Re-ID would be extended accordingly to accommodate these additions.
Acknowledgement
This research is funded by the Vietnam National Founda-
tion for Science and Technology Development (NAFOS-
TED) under grant number 102.04-2013.32.
References
[1] Van den Berghe, Sam and Weyn, Maarten and Spruyt,
Vincent and Ledda, Alessandro (2011) Combining
wireless and visual tracking for an indoor environ-
ment, International Conference on Indoor Position-
ing and Indoor Navigation (IPIN-2011).
[2] Miyaki, Takashi and Yamasaki, Toshihiko and Aizawa, Kiyoharu (2007)
Visual tracking of pedestrians jointly using wi-fi location system on
distributed camera network, 2007 IEEE International Conference on
Multimedia and Expo, IEEE, p. 1762–1765.
[3] Rekimoto, Jun and Shionozaki, Atsushi and Sueyoshi, Takahiko and Miyaki,
Takashi (2006) PlaceEngine: a WiFi location platform based on realworld
folksonomy, Internet conference, p. 95–104.
[4] Cheng, Yu-Chung and Chawathe, Yatin and LaMarca,
Anthony and Krumm, John (2005) Accuracy charac-
terization for metropolitan-scale Wi-Fi localization,
Proceedings of the 3rd international conference on
Mobile systems, applications, and services, ACM, p.
233–245.
[5] Alahi, Alexandre and Haque, Albert and Fei-Fei, Li
(2015) RGB-W: When Vision Meets Wireless, Pro-
ceedings of the IEEE International Conference on
Computer Vision, IEEE, p. 3289–3297.
[6] Pham, T. T. T., Le, T. L., Vu, H., and Dao, T.
K. (2017) Fully-automated person re-identification in
multi-camera surveillance system with a robust ker-
nel descriptor and effective shadow removal method,
Image and Vision Computing, Elsevier, p. 44-62.
[7] Kuhn, Harold W (1955) The Hungarian method for the assignment problem,
Naval Research Logistics Quarterly, Wiley Online Library, p. 83–97.
[8] Zhang, Zhengyou (2000) A flexible new technique for
camera calibration, Pattern Analysis and Machine In-
telligence, IEEE, p. 1330–1334.
[9] Thi Thanh Thuy Pham, Anh Tuan Pham, Hai Vu
(2015) A new technique for linking person trajecto-
ries in surveillance camera network, Conference on
Fundamental and Applied IT Research (FAIR), p. 8–
15.
[10] Bo, Liefeng and Ren, Xiaofeng and Fox, Dieter
(2010) Kernel descriptors for visual recognition, Ad-
vances in Neural Information Processing Systems
(NIPS), Vancouver, Canada, p. 244–252.
[11] Dao, Trung-Kien and Pham, Thanh-Thuy and
Castelli, Eric (2013) A robust WLAN positioning sys-
tem based on probabilistic propagation model, 9th In-
ternational Conference on Intelligent Environments
(IE), IEEE, p. 24–29.
[12] Goldsmith, A. (2005), Wireless communications,
Cambridge university press.
[13] Roberts B. and Pahlavan K. (2009) Site-specific rss
signature modeling for wifi localization, In Global
Telecommunications Conference, IEEE, p. 1–6.
[14] Munoz D., Lara F.B., Vargas C., and Enriquez-
Caldera R. (2009), Position location techniques and
applications, Academic Press.
[15] Haupt, Randy L and Haupt, Sue Ellen (2004) Practi-
cal genetic algorithms, John Wiley & Sons.
[16] Jungmin So, Joo-Yub Lee, Cheal-Hwan Yoon, Hyun-
jae Park (2013) An Improved Location Estimation
Method for Wifi Fingerprint-based Indoor Localiza-
tion, International Journal of Software Engineering
and Its Applications.
[17] Arsham Farshad, Jiwei Li, Mahesh K. Marina, Fran-
cisco J. Garcia (2013) A Microscopic Look at WiFi
Fingerprinting for Indoor Mobile Phone Localization
in Diverse Environments, International Conference
on Indoor Positioning and Indoor Navigation.
[18] Bernardin, Keni and Stiefelhagen, Rainer (2008)
Evaluating multiple object tracking performance: the
CLEAR MOT metrics, EURASIP Journal on Image
and Video Processing, Springer, p. 1–10.
[19] Ben Shitrit, Horesh and Berclaz, Jerome and Fleuret,
François and Fua, Pascal (2013) Tracklet-based
Multi-Commodity Network Flow for Tracking Mul-
tiple People, No. EPFL-PATENT-186751, WO.
[20] Kothari, Nisarg and Kannan, Balajee and Glasgow, Evan D and Dias, M
Bernardine (2012) Robust indoor localization on a commercial smart phone,
Procedia computer science, Elsevier, p. 1114–1120.