Informatica 41 (2017) 133–148

Improvement of Person Tracking Accuracy in Camera Network by Fusing WiFi and Visual Information

Thi Thanh Thuy Pham
Academy of People Security, Hanoi, Vietnam
E-mail: thanh-thuy.pham@mica.edu.vn

Thi-Lan Le and Trung-Kien Dao
MICA International Research Institute, Hanoi University of Science and Technology (HUST - CNRS/UMI-2954 - Grenoble INP), Hanoi, Vietnam
E-mail: {thi-lan.le, trung-kien.dao}@mica.edu.vn

Keywords: camera, WiFi, fusion method, person tracking by identification

Received: March 29, 2017

Person tracking in camera networks is still an open problem. The main challenge is how to link individual trajectories correctly when people move within one camera FOV (Field of View) or switch to another one. This amounts to solving the problem of person re-identification (Re-ID) during tracking. A popular method is to assign the current position to the previous one based on the minimum distance between them; this is called person identification by tracking. In this work, we take the tracking-by-identification approach, in which the trajectory assignment is made from the person identity (ID) determined at each video frame. To improve the accuracy of vision-based person tracking, we focus on enhancing person identification by adding the ID of the WiFi-enabled device held by each person. A fusion scheme of WiFi and visual signals is proposed for person tracking. An optimal assignment and a Kalman filter are used in this combination to match the position observations and predicted states from the camera and WiFi systems. The correction step of the Kalman filter is then applied for each tracker to produce state estimates of the locations. The fusion method allows tracking by identification across non-overlapping cameras, with clear identity information taken from the WiFi adapter.
The evaluation on a multi-modal dataset shows that the proposed fusion method outperforms the vision-only method in tracking.

Povzetek (summary): A method for tracking people across cameras by means of data fusion is described.

1 Introduction

There have been several attempts to combine camera and WiFi systems for indoor person tracking. A multi-modal system is reported in [1], using WiFi-based localization and tracking by stationary cameras. The combined system focuses on improving the positioning accuracy and confidence at room level. According to the authors' assessment, camera-based localization achieves higher positioning accuracy than the WiFi-based system. However, blind points, occlusions and person identification are much more challenging for camera systems. WiFi systems give clearer identity information because each mobile device has a unique MAC address, but the targets are required to hold mobile devices during tracking. In that work, RSSI and the fingerprinting method are used in the WiFi system to locate mobile targets. In the camera-based system, foreground segmentation is done by the GMM (Gaussian Mixture Model) method. The region that contains the person's feet is then extracted from the foreground and projected onto the floor plane. Gaussian kernels are used to model the foot region. Each single module is executed depending on the availability of the corresponding sensor information. When both are available, a combined Bayes model with the corresponding confidence weights is applied.

The authors in [2] reported another approach to object localization that fuses images and WiFi signals. The system can be deployed in both indoor and outdoor environments. The PlaceEngine algorithm [3] and a modified version of the Centroid algorithm [4] are used in that work for WiFi-based localization.
An observation mixture model based on the particle filter allows targets to be tracked continuously even when they are occluded by other objects or temporarily disappear while moving through blind areas between disjoint cameras.

In [5], the authors proposed combining RGB data with wireless signals emitted from a person's cell phone to locate and track individuals. The authors considered the unique MAC address of a mobile device as a reliable cue for the person's ID. Wireless data is embedded in the RGB data as a ring image, which captures the radius estimation, error bounds, and confidence level (noise detection) for each antenna. In order to improve the tracking algorithm, each MAC address is assigned to an observed tracklet, and a bipartite graph is proposed for the data association problem. The test results showed that the performance of person localization and tracking can be improved by fusing RGB and wireless data.

In this paper, we propose a fusion method of WiFi and camera for person localization and Re-ID in a camera network. It improves vision-based person tracking not only within one camera FOV, but also across different camera FOVs, by using the unique ID information from the WiFi hardware.

The rest of the paper is organized as follows. In Section 2, a framework for multi-modal person tracking by fusion of WiFi and camera is presented. Sections 3 and 4 describe the individual person localization systems based on visual and WiFi signals, respectively. The combined method of WiFi and camera is discussed in Section 5. The comparative evaluations are shown in Section 6. The conclusion and future directions are given in the last section.

2 Framework

Figure 1 shows the fusion framework for person localization and Re-ID in non-overlapping camera networks. The combined model is processed in the real scenario of a fully automated person surveillance system, which is reported in our previous work [6].
In this system, the camera FOVs are covered by the WiFi range. This means that WiFi signals are always available for person localization, whereas the disjoint camera coverage areas cause intermittent positioning for the vision-based system. In each camera FOV, person localization is done in three phases, i.e., human detection, tracking and localization, to output the ID of person $j$ from camera $C$ ($ID^C_j$) and the corresponding position ($P^C_j$). Because the WiFi range covers the camera FOVs, in each camera FOV the vision-based positioning result of person $j$ is combined with the WiFi-based localization result of person $i$, $(P^W_i, ID^W_i)$, by a fusion algorithm in order to make effective decisions about the position and identity of each person in the environment. When people switch from one camera FOV to another, they are re-identified to update the ID of each individual trajectory. The trajectories through the cameras are also linked to show the entire route in the environment. Additionally, in the fusion model, WiFi-based localization results are used to activate the cameras that lie in the positioning range returned by the WiFi-based system. The proposed mixture model makes it possible to continuously localize and identify a person moving in non-overlapping camera networks.

In the proposed system, the positioning processes of the two single models are executed independently. The locations calculated by the WiFi and camera models are shown in the uniform coordinate system of a 2D floor map. A fusion algorithm for person localization and Re-ID is proposed. It is based on a Kalman filter model, together with an optimal assignment of the estimated and observed locations from both models. The details of each single person localization system and of the proposed fusion algorithm are given in the next sections.
3 Vision-based person localization and Re-ID

Camera-based person localization and Re-ID is the process of finding the positions and the corresponding ID of a person when he/she moves within one camera FOV or switches from one camera FOV to another in a camera network. It involves linking person trajectories in the frame sequences captured from multiple cameras. These trajectories are then transformed to the real-world coordinate system by a process called 3D localization.

3.1 Person localization

A camera-based person localization system includes three main steps: human detection, tracking and 3D localization. For each camera FOV, human detection is executed at each frame to output the human ROI (Region of Interest), which is represented by a rectangular bounding box containing the person. The person position in the image is defined in this work as the middle point of the bottom edge of the rectangle, which is in contact with the floor plane (see Figure 2). It is called a FootPoint position. Human tracking in a frame sequence captured from a camera FOV is treated as FootPoint tracking. In the case of multi-person tracking, each detected FootPoint has to be assigned the corresponding ID. 3D person localization is done by transforming FootPoint positions to real-world locations in a predefined 2D coordinate system of the floor plane on which the person moves.

First, a combination of HOG-SVM and GMM background subtraction techniques [6] is applied for human detection. In order to improve the performance of human detection, the shadow removal method in [6] is used as a post-processing step.

Second, in each camera FOV, based on the detection results, FootPoint tracking is done by using a Kalman filter and the Hungarian data association algorithm [7] to improve the performance of track association.
For each camera, a grid of the floor plane on which people move in the camera FOV, called the detection grid (see Figure 3), is defined as a function $G(x, y)$:

\[ G(x, y) = \begin{cases} 1 & \text{if } (x, y) \in C_T; \\ 0 & \text{otherwise,} \end{cases} \]

where $C_T$ is a threshold region bounded by a contour line, which is the border of the camera FOV on the floor plane. As each detected person is represented by a FootPoint position, a FootPoint position can belong to one of the positions of the detection grid where $G(x, y) = 1$. Let $(px_t, py_t)$ denote the pixel coordinates of a FootPoint position at time $t$ in the grid, $(mx_t, my_t)$ the pixel coordinates of a measurement in the grid, so that $G(mx_t, my_t) = 1$, and $(vx_t, vy_t)$ the velocity values at time $t$ in the $x$ and $y$ directions. The state vector $x_t$ of a user at time frame $t$, characterized by the corresponding FootPoint location, and the measurement vector $z_t$ are defined as:

\[ x_t = (px_t, py_t, vx_t, vy_t) \tag{1} \]

\[ z_t = (mx_t, my_t) \tag{2} \]

Figure 1: Framework of person localization and Re-ID using the combined system of WiFi and camera.

Figure 2: Examples of tracking lines which are formed by linking trajectories of corresponding FootPoint positions.

Using the state and measurement update equations of the Kalman filter, in conjunction with the initial conditions, the state vector and its covariance matrix are estimated at each time frame. The 2D spatial coordinates of an estimated state $(\hat{p}x, \hat{p}y)$ (an estimated FootPoint position) give the position $p$ of the user $u$.

In multi-person tracking, a separate Kalman filter is initialized for each person and models that person's trajectory.
A set $U_t$ of individuals and a set $M_t$ of measurements at time frame $t$ are defined as:

\[ U_t = \{u_1, u_2, \ldots, u_N\} \tag{3} \]

\[ M_t = \{m_1, m_2, \ldots, m_L\} \tag{4} \]

where $N$ is the number of people to be tracked (the trackers), and $L$ is the number of available measurements at time $t$. In order to assign a person $i$ to a measurement $j$, the Hungarian method is used.

Figure 3: Example of a grid map and threshold region bounded by a contour line.

Third, in order to locate people in the real-world coordinate system, we define a 2D map of the floor plane on which people move. This map contains all considered camera FOVs on the floor plane. We then calculate the coordinates of each FootPoint position on the 2D map on the basis of camera calibration and a homography transform [8]. The trajectories of each person through the cameras are then linked by a method of wrapping multiple camera FOVs using a stereo calibration technique [9].

3.2 Person re-ID

In this paper, the person Re-ID problem is solved in the scenario of tracking by identification. This means that at each detected FootPoint position, we extract the human ROI, and a feature descriptor is built on this region. In this work, a robust KDES descriptor (Kernel Descriptor), which is proposed in our previous work [6], and an SVM classifier are used for person Re-ID in camera networks. The basic idea of the KDES descriptor is to compute an approximate explicit feature map for the kernel match function (see Figure 4). In other words, the kernel match functions are approximated by explicit feature maps. This enables efficient learning methods for linear kernels to be applied to non-linear kernels. Given a match kernel function $k(x, y)$, the feature map $\phi(\cdot)$ for the kernel $k(x, y)$ is a function mapping a vector $x$ into a feature space so that $k(x, y) = \phi(x)^\top \phi(y)$.
Given a set of basis vectors $B = \{\phi(v_i)\}_{i=1}^{D}$, the feature map $\phi(x)$ can be approximated by:

\[ \varphi(x) = G\,k_B(x) \tag{5} \]

where $G^\top G = K_{BB}^{-1}$, $K_{BB}$ is a $D \times D$ matrix with $\{K_{BB}\}_{ij} = k(v_i, v_j)$, and $k_B$ is a $D \times 1$ vector with $\{k_B\}_i = k(x, v_i)$.

Figure 4: The basic idea of representation based on kernel methods. The kernel trick uses $k(x, y)$ with the implicit map $x \to \phi(x)$; KDES uses the explicit approximation $\varphi(x) \approx \phi(x)$, so that $k(x, y) \approx \varphi(x)^\top \varphi(y)$.

Similar to [10], three match kernel functions for gradient, color and shape are built from the pixel attributes of gradient, color and local binary pattern (LBP), respectively. For each match kernel, feature extraction is done at three levels: pixel, patch and whole detected human region.

4 WiFi-based person localization

For WiFi, RSSI is the most popular attribute used in localization. However, the localization performance depends strongly on how well the relationship between RSSI and distance can be modeled. Two main approaches have been proposed to solve this: the path-loss/radio propagation model [12, 13] and the fingerprinting method [14]. The first one is still an open subject, because it is not easy to obtain an optimal model for the relationship between RSSI and distance. The second one is time- and labor-consuming, but it is effective for localization, especially when probabilistic methods are applied.

In this work, both the radio propagation model and the fingerprinting method are used for WiFi-based localization. A probabilistic propagation model (PPM) from [11], together with a newly defined radio map in the fingerprinting database, is used. The radio propagation model reflects the complex nature of indoor environments by taking into account obstacles, such as walls and floors, to model the relationship between the RSSI value and the distance to a reference point (RP). The model is based on the empirical equation of radio-frequency signal strength in indoor environments, and its uncertainty is captured by probabilistic characteristics.
An optimization process based on a genetic algorithm is also applied to tune the system parameters for the best fit with the devices in use. Based on the probabilistic propagation model, the distance between a mobile user and the APs is calculated. In the fingerprinting database, a new radio map of distance features instead of RSSI values is defined in order to make the radio map more reliable and stable, with a lower cost for setting up and updating. Additionally, the KNN matching method is applied with an additional coefficient reflecting temporal changes of the fingerprinting data in the environment. The flowchart of the proposed WiFi-based person localization system is illustrated in Figure 5, with two main phases: training and testing.

Figure 5: Diagram of the proposed WiFi-based object localization system. In the offline training phase, RP coordinates and RSSI values are converted by the PPM into distance values that form the radio map in the fingerprint database on the server. In the online testing phase, the RSSI values from the mobile user are converted by the PPM into distance values and matched by KNN to obtain the position.

The first phase is processed off-line: radio maps are constructed to build the fingerprint database. Normally, a radio map contains RP coordinates and the corresponding RSSI values from the available APs. However, in our proposed system, RSSI values are replaced by distance values. A distance value is defined as the distance $d_i(L)$ from the $i$th RP to the $L$th AP in range (see Figure 6), which is calculated from RSSI observations by using the PPM.

Figure 6: An example of radio map with a set of $p_i$ RPs and the distance values $d_i(L)$ from each RP to $L$ APs.

In the testing phase, a mobile device continuously scans signals from nearby APs and sends the corresponding RSSI values to a server. These values are then transformed to distance values by the proposed probabilistic propagation model. The distances are matched against the fingerprint database by the KNN method to find the best candidates for the mobile user location.
4.1 Probabilistic propagation model

The probabilistic propagation model is formed by a deterministic model (Eq. 6) and a probabilistic model:

\[ P = P_0 - 10\,n \log\!\left(\frac{r}{r_0}\right) - k_d \sum_{i=1}^{n_w} \frac{d_i}{\cos\beta_i} \tag{6} \]

where $n_w$ is the number of walls and floors between the AP and the receiver, $d_i$ is the thickness of the $i$th wall/floor, $\beta_i$ is the angle of arrival corresponding to the $i$th wall/floor, and $k_d$ is an attenuation factor per unit of wall/floor thickness, as illustrated in Figure 7.

Figure 7: WiFi signal attenuation through walls/floors.

The deterministic model in Eq. 6 does not consider the uncertainty of the RSSI values at a given distance, so a probabilistic model (Eq. 7) is proposed. In reality, given the RSSI $P$, the distance $r$ might not be exactly the value calculated from Equation 6, but it lies within a range around this value, which is denoted by $\bar{r}$. To be more precise, $\bar{r}$ is the nominal value of the distance $r$, the one with the highest probability. Given an RSSI $P$, the distance is assumed to follow a normal (Gaussian) distribution centered at $\bar{r}$:

\[ \rho(r, P) = \Pr(r \mid P) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(r-\bar{r})^2}{2\sigma^2}} \tag{7} \]

where $\sigma$ is the standard deviation, which is also a function of $P$. For simplicity, $\sigma$ is assumed to be related to $\bar{r}$ by a linear relation:

\[ \sigma = k_\sigma \bar{r} \tag{8} \]

In the proposed probabilistic propagation model, there are five parameters in total to be determined: $P_0$, $r_0$, $n$, $k_d$ and $k_\sigma$. Except for $k_0$, the parameters can be estimated separately from individual measurements in a straightforward manner. However, the values of these parameters can be slightly affected by the assumptions made in the RF (Radio Frequency) propagation model. For this reason, a genetic algorithm (GA) [15] is used to find the optimal parameter set as a whole. Genetic algorithms are global search techniques modeled after the natural genetic mechanism to find approximate or exact solutions to optimization and search problems.
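To make Eqs. 6 to 8 concrete, the following minimal sketch evaluates the deterministic model, inverts it to obtain the nominal distance $\bar{r}$ for a given RSSI, and evaluates the Gaussian density around it. It rests on several assumptions that are not stated in the text: the logarithm is taken as base 10, each wall contributes $d_i / \cos\beta_i$, the list of crossed walls is known in advance, and all function names and parameter values are hypothetical.

```python
import math

def wall_loss(kd, walls):
    """Attenuation term of Eq. 6; walls is a list of (thickness, beta_in_radians).
    Assumes each wall contributes thickness / cos(beta)."""
    return kd * sum(d / math.cos(beta) for d, beta in walls)

def rssi_deterministic(r, p0, r0, n, kd, walls):
    """Eq. 6: RSSI expected at distance r (base-10 logarithm assumed)."""
    return p0 - 10.0 * n * math.log10(r / r0) - wall_loss(kd, walls)

def nominal_distance(p, p0, r0, n, kd, walls):
    """Eq. 6 solved for r: the nominal distance r_bar for an observed RSSI p."""
    return r0 * 10.0 ** ((p0 - p - wall_loss(kd, walls)) / (10.0 * n))

def distance_density(r, r_bar, k_sigma):
    """Eq. 7 with sigma = k_sigma * r_bar (Eq. 8)."""
    sigma = k_sigma * r_bar
    return math.exp(-((r - r_bar) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameter values, chosen only for illustration.
params = dict(p0=-40.0, r0=1.0, n=3.0, kd=10.0,
              walls=[(0.2, 0.0), (0.1, math.pi / 3)])
p = rssi_deterministic(7.5, **params)
r_bar = nominal_distance(p, **params)
```

In this sketch the inversion recovers the distance that generated the RSSI, and the density is highest at `r_bar`, which matches the role of $\bar{r}$ as the most probable distance.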
In a GA, each parameter to be optimized is represented by a gene. Moreover, each individual is characterized by a chromosome, which is the above set of parameters awaiting optimization. To assess the quality of an individual, a fitness function (objective function, or cost function) must be defined. For the localization module, the fitness function $\Psi$ is defined as the root mean square of the localization error:

\[ \Psi = \left( \frac{1}{N} \sum_{i=1}^{N} \left[ (\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2 + (\hat{z}_i - z_i)^2 \right] \right)^{1/2} \tag{9} \]

where $N$ is the number of measurements, and $(x_i, y_i, z_i)$ and $(\hat{x}_i, \hat{y}_i, \hat{z}_i)$ are the real and the estimated positions, respectively.

4.2 Fingerprinting database and KNN matching

Normally, a radio map in the fingerprinting method is defined as follows:

\[ R \triangleq \{ (p_i, F(p_i)) \mid i = 1, \ldots, N \} \tag{10} \]

where $p_i \triangleq [p_x \; p_y \; p_z]^\top$ is the real-world coordinate vector of the $i$th RP and $F(p_i) \triangleq [r_i(1), \ldots, r_i(n)]$ is the fingerprinting matrix, with $n$ being the number of training samples at each RP. The vector $r_i(t) \triangleq [r_i^1(t), \ldots, r_i^L(t)]^\top$ contains the RSSI values that are scanned from $L$ APs at time $t$ and location $p_i$.

By using the distance feature instead of RSSI, the radio map in Equation 10 has a fingerprinting matrix $F(p_i) \triangleq [d_i(1), \ldots, d_i(n)]$, where the vector $d_i(t) \triangleq [d_i^1(t), \ldots, d_i^L(t)]^\top$ contains the distance samples from the $i$th RP to the $L$ APs. This results in a reliable and stable radio map even when some APs are inactive at a certain point in time. Furthermore, the cost of setting up and updating the radio map is much lower than when RSSI is used. It is only rebuilt when new APs and RPs are deployed or existing ones are removed from the WiFi-based localization system.

In the testing phase, the RSSI values scanned from nearby APs by a mobile device are converted to the corresponding distance values by the PPM. They are compared with the training data to find the best matches. The matching method used in this work is KNN. In KNN, the prediction for a new instance is based on its nearest neighbors in the training data.
There are three main ingredients associated with this method: (1) the similarity measure (the distance measurement) between the query patterns and the training data; (2) the number of neighbors to be taken into account in the prediction; (3) the weight of the neighbors. Euclidean and Manhattan distances are two common geometric measures, of which Euclidean is the most used in WiFi-based localization systems [16, 17]. In this work, the KNN method is evaluated with the Euclidean measure.

In the proposed radio map, each RP is represented by a vector $d_i(t) \triangleq [d_i^1(t), \ldots, d_i^L(t)]^\top$ in $L$-dimensional space. In the learning phase, all these training data $D$ are stored with their dependent variables. In this case, the dependent variables are the positions $p_i$ of the RPs in the environment. In prediction, for a new query pattern $z$ and for each instance $d$ in $D$, the similarity between $d$ and $z$ is computed by the Euclidean distance measure:

\[ l(d, z) = \sqrt{\sum_{i=1}^{n} (d_i - z_i)^2} \tag{11} \]

A set $NB(z)$ of the nearest neighbors of $z$ with $|NB(z)| = k$ is also determined, and then the estimated location for $z$ is calculated. To find an optimal $k$, we test on the empirical data with $k$ in the range from 1 to 200, using the error function (12) for each $k$:

\[ E_k = \sqrt{\sum_{i=1}^{n} \left( \frac{\hat{y} - y}{y} \right)^2} \tag{12} \]

where $\hat{y}$ is the estimated position and $y$ is the true position. Finally, the predicted location of $z$ is calculated as the weighted sum over the $k$ neighbors (13):

\[ y_z = \frac{\sum_{d \in NB(z)} w(d, z) \times y_d}{\sum_{d \in NB(z)} w(d, z)} \tag{13} \]

where the weights $w$ are chosen by (14):

\[ w(d, z) = e^{-\theta \times l(d, z)} \times e^{-\lambda \times |t_i - t_0|} \tag{14} \]

where $\theta$ and $\lambda$ are constants that define the curves of the exponential functions; $t_0$ is the time at which the query instance is captured and $t_i$ is the time of WiFi signal scanning at the corresponding RP in the training phase; $l(d, z)$ is the dissimilarity between the query instance and its neighbor.
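The weighted KNN estimate of Eqs. 11, 13 and 14 can be sketched as follows. This is only an illustration under stated assumptions: the radio map is represented as a list of hypothetical `(distance_vector, position, timestamp)` tuples, positions are 2D, and the values of `k`, `theta` and `lam` are arbitrary.

```python
import math

def dissimilarity(d, z):
    """Eq. 11: Euclidean distance between a stored distance vector d and a query z."""
    return math.sqrt(sum((di - zi) ** 2 for di, zi in zip(d, z)))

def knn_locate(z, t0, radio_map, k, theta, lam):
    """Eqs. 13-14: weighted mean of the positions of the k nearest RPs.
    radio_map entries are (distance_vector, (x, y), scan_time) tuples (assumed layout)."""
    neighbors = sorted(radio_map, key=lambda rp: dissimilarity(rp[0], z))[:k]
    weights = [math.exp(-theta * dissimilarity(d, z)) * math.exp(-lam * abs(ti - t0))
               for d, _, ti in neighbors]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos, _) in zip(weights, neighbors)) / total
    y = sum(w * pos[1] for w, (_, pos, _) in zip(weights, neighbors)) / total
    return x, y

# Hypothetical radio map: two RPs equally close to the query, one far away.
radio_map = [([1.0, 3.0], (0.0, 0.0), 0.0),
             ([3.0, 1.0], (2.0, 0.0), 0.0),
             ([10.0, 10.0], (50.0, 50.0), 0.0)]
estimate = knn_locate([2.0, 2.0], 0.0, radio_map, k=2, theta=0.5, lam=0.1)
```

With equal dissimilarities and equal scan times, the two nearest RPs receive equal weights and the estimate falls midway between them; making one RP's scan time older shifts the estimate toward the more recently scanned RP, which is the effect that the time-dependent factor in Eq. 14 is meant to produce.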
In Equation 14, besides the dissimilarity-based weight controlled by $\theta$, a new coefficient $\lambda$ is introduced to reflect the chronological changes of the fingerprinting data in the environment. This means that fingerprinting data updated closer in time to the query instance receive a higher weight than older data.

5 Proposed fusion method

In order to improve the performance of person tracking in camera networks, for each camera FOV, the person locations determined by the WiFi system are optimally assigned to the positioning results from the camera system. This not only maintains the high accuracy of vision-based person localization, but also improves the performance of person tracking in camera networks by assigning the clearer ID of the WiFi adapter to each position determined by the camera system.

Algorithm 1 shows the combined method of the WiFi and camera systems for people localization and identification. At time $t$, on the 2D floor map, a set of position observations from the WiFi system ($z^w_{i,t}$) or the camera system ($z^c_{j,t}$) for multiple targets is available. Index $i$ designates one of the $N$ targets located by the WiFi system, and index $j$ refers to one of the $M$ positions observed by the camera system. We recursively consider two consecutive observations of the localization results from any available sensors. At time $t$, assume that we have a set of location observations coming from the WiFi system for $N$ targets, with $z^w_{i,t} = (X^w_{i,t}, Y^w_{i,t}, ID^w_{i,t})$, and that at the previous time step $t-1$ we have the observations $z^c_{j,t-1} = (X^c_{j,t-1}, Y^c_{j,t-1})$ for $M$ positions from the camera system. Without loss of generality, we can consider these observations as the state estimations at time $t-1$. The prediction step of the Kalman filter (KalmanPrediction) is applied to estimate the next state $x^c_{j,t}$ based on $z^c_{j,t-1}$. An assignment algorithm is then used to find the optimal matching between the estimated states $x^c_{j,t}$ from the camera system and the observations $z^w_{i,t}$ from the WiFi system.
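The matching step described above can be illustrated with a small minimum-cost assignment between predicted camera positions and WiFi observations. The sketch below is not the solver used in this work: it enumerates all pairings by brute force, which returns the same optimum as a dedicated assignment algorithm but is only practical for a handful of targets. The function name and the sample coordinates are hypothetical.

```python
import itertools
import math

def assign_min_cost(wifi_xy, camera_xy):
    """Return (wifi_index, camera_index) pairs that minimise the total Euclidean cost.
    When the two sets differ in size, the extra items are left unassigned."""
    n, m = len(wifi_xy), len(camera_xy)
    cost = [[math.hypot(w[0] - c[0], w[1] - c[1]) for c in camera_xy] for w in wifi_xy]
    best_total, best_pairs = math.inf, []
    # Enumerate injective mappings from the smaller set into the larger one.
    for perm in itertools.permutations(range(max(n, m)), min(n, m)):
        if n <= m:
            pairs = [(i, perm[i]) for i in range(n)]   # WiFi i -> camera perm[i]
        else:
            pairs = [(perm[j], j) for j in range(m)]   # camera j -> WiFi perm[j]
        total = sum(cost[i][j] for i, j in pairs)
        if total < best_total:
            best_total, best_pairs = total, pairs
    return best_pairs

# Hypothetical positions on the 2D floor map (same units for both systems).
camera_pred = [(0.0, 0.0), (10.0, 0.0)]
wifi_obs = [(9.0, 1.0), (1.0, 1.0)]
pairs = assign_min_cost(wifi_obs, camera_pred)
```

Here WiFi observation 0 is paired with camera prediction 1 and vice versa, because that pairing has the smallest summed distance. Each resulting pair can then carry the WiFi ID over to the camera-based position.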
The result $K_{i,t}$ of the assignment is taken as the observation at the current time $t$, and the predicted state $x_t$ is then corrected by the KalmanCorrection step, in which the WiFi-based positions are augmented with the vision-based positions.

5.1 Kalman filter

In the proposed fusion algorithm, the state prediction step of the Kalman filter is used to estimate the process state at a certain time based on the position observation or measurement obtained at the previous time. The correction step of the Kalman filter is carried out after the optimal assignment between the estimated states and the observations at that time. In this case, the process state to be estimated at a certain time is defined as the position $p_t$ of a person in the real-world coordinate system of the 2D floor map. It is represented by a state vector $x_t$ of location coordinates $pX_t$ and $pY_t$ on the 2D floor map, together with the corresponding velocity values $vX_t$ and $vY_t$:

\[ x_t = (pX_t, pY_t, vX_t, vY_t) \tag{15} \]

A position observation $z_t$ is then defined as follows:

\[ z_t = (mX_t, mY_t) \tag{16} \]

Assuming that people move with constant velocity (zero acceleration) and that the position is measured $n$ times per second, the state equations are defined as follows:

\[ pX_t = pX_{t-1} + vX_{t-1}\,\Delta T \tag{17} \]

\[ pY_t = pY_{t-1} + vY_{t-1}\,\Delta T \tag{18} \]

\[ vX_t = vX_{t-1} \tag{19} \]

\[ vY_t = vY_{t-1} \tag{20} \]

where $\Delta T = \frac{1}{n}$. The state transition matrix $A$ and the state-measurement matrix $H$ are then defined as:

\[ A = \begin{bmatrix} 1 & 0 & \Delta T & 0 \\ 0 & 1 & 0 & \Delta T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \]

Kalman-based tracking is started after the first successfully calculated position from the WiFi or camera system,

Algorithm 1: Person tracking by fusion of position observations from WiFi and camera systems.
Input: position observations $z$ from the WiFi and camera localization systems
Output: position estimations $x$
Parameter initialization: $A$, $H$, $P_1$, $Q$, $R$;
for each set of position observations $z$ do
    if $z_{i,t}$ is from the WiFi localization system [$z^w_{i,t} = (X^w_{i,t}, Y^w_{i,t}, ID^w_{i,t})$] then
        if $z_{i,t-1}$ is from the camera localization system [$z^c_{j,t-1} = (X^c_{j,t-1}, Y^c_{j,t-1})$] then
            $[x^c_{j,t}, P_t]$ = KalmanPrediction($A$, $Q$, $z^c_{j,t-1}$, $P_{t-1}$);
            $K_{i,t}$ = Assignment($x^c_{j,t}$, $z^w_{i,t}$);
            $[x^w_{i,t}, P_t]$ = KalmanCorrection($H$, $R$, $K_{i,t}$, $x_t$, $P_t$);
            Save $x^w_{i,t}$ as a state estimation at time $t$;
        end
    else
        [$z^c_{j,t} = (X^c_{j,t}, Y^c_{j,t})$]
        if $z_{i,t-1}$ is from the WiFi localization system [$z^w_{i,t-1} = (X^w_{i,t-1}, Y^w_{i,t-1}, ID^w_{i,t-1})$] then
            $[x^w_{i,t}, P_t]$ = KalmanPrediction($A$, $Q$, $z^w_{i,t-1}$, $P_{t-1}$);
            $K_{i,t}$ = Assignment($x^w_{i,t}$, $z^c_{j,t}$);
            $[x^w_{i,t}, P_t]$ = KalmanCorrection($H$, $R$, $K_{i,t}$, $x_t$, $P_t$);
            Save $x^w_{i,t}$ as a state estimation at time $t$;
        end
    end
end
return $x^w_{i,t}$;

with the initial state vector $x_1$. The initial covariance matrix $P_1$ for the initial state is:

\[ P_1 = \begin{bmatrix} \sigma^2_{x_1} & 0 & 0 & 0 \\ 0 & \sigma^2_{y_1} & 0 & 0 \\ 0 & 0 & \sigma^2_{vx_1} & 0 \\ 0 & 0 & 0 & \sigma^2_{vy_1} \end{bmatrix} \]

The state noise covariance matrix $Q$ and the measurement noise covariance matrix $R$ are defined as:

\[ Q = \begin{bmatrix} \sigma^2_{pX} & 0 & 0 & 0 \\ 0 & \sigma^2_{pY} & 0 & 0 \\ 0 & 0 & \sigma^2_{vX} & 0 \\ 0 & 0 & 0 & \sigma^2_{vY} \end{bmatrix}, \qquad R = \begin{bmatrix} \sigma^2_{mX} & 0 \\ 0 & \sigma^2_{mY} \end{bmatrix} \]

where each $\sigma^2$ is the variance of the corresponding quantity, with deviations from the real values expressed in centimeters. The measurement noise refers to the noise of the positions calculated by the WiFi or camera system, and the state noise is defined according to the motion of people. The initial covariance matrix $P_1$ for the initial state $x_1$ is set under the assumption that the calculated position deviates by ±5cm from the real position in both the X and Y directions, and that the velocity deviates by ±3cm. Similarly, the state noise covariance matrix $Q$ is set with standard deviations of ±5cm and ±3cm for the determined position and its velocity, respectively.
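The prediction and correction steps named in Algorithm 1 follow the standard Kalman filter equations, which the sketch below spells out for the constant-velocity state of Eqs. 15 to 20 using plain Python lists. It is a minimal illustration under assumptions: the ±5cm and ±3cm deviations are interpreted as standard deviations and squared to fill the diagonal matrices, the measurement deviation is taken as 3cm, $\Delta T$ is set to 1, and the helper names and the sample state are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def diag(values):
    return [[v if i == j else 0.0 for j in range(len(values))]
            for i, v in enumerate(values)]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

DT = 1.0  # assumed: one position per second
A = [[1.0, 0.0, DT, 0.0], [0.0, 1.0, 0.0, DT],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
H = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
P1 = diag([25.0, 25.0, 9.0, 9.0])  # assumed variances: (5 cm)^2 and (3 cm)^2
Q = diag([25.0, 25.0, 9.0, 9.0])
R = diag([9.0, 9.0])               # assumed variance: (3 cm)^2

def kalman_prediction(x, P):
    """Predict the next state (4x1 column) and its covariance."""
    return matmul(A, x), add(matmul(matmul(A, P), transpose(A)), Q)

def kalman_correction(x_pred, P_pred, z):
    """Correct the predicted state with a 2x1 position observation z."""
    Ht = transpose(H)
    S = add(matmul(matmul(H, P_pred), Ht), R)          # innovation covariance
    K = matmul(matmul(P_pred, Ht), inv2(S))            # Kalman gain
    x = add(x_pred, matmul(K, sub(z, matmul(H, x_pred))))
    P = matmul(sub(diag([1.0] * 4), matmul(K, H)), P_pred)
    return x, P

# Hypothetical previous estimate (cm, cm per step) and a new assigned observation.
x_pred, P_pred = kalman_prediction([[100.0], [200.0], [10.0], [0.0]], P1)
x_new, P_new = kalman_correction(x_pred, P_pred, [[116.0], [200.0]])
```

In this example the prediction moves the X coordinate from 100 to 110, and the correction pulls it toward the observed value of 116 while reducing the position variance.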
The measurement noise covariance matrix $R$ is set with a standard deviation of 3cm for the FootPoint measurement in the X and Y directions, and $\Delta T$ is set to 1, meaning that the position is measured every second.

5.2 Optimal assignment

After the Kalman prediction step, we have a position estimation $x^c_{j,t}$ or $x^w_{i,t}$ for the camera or WiFi system, respectively. Consider the first case, in which the position estimation $x^c_{j,t}$ at time $t$ for the camera system is estimated from the previous vision-based location observation $z^c_{j,t-1}$. Then, the optimal assignment at time $t$ between $x^c_{j,t}$ and $z^w_{i,t}$ is applied. Assume that the assignment of an estimated position $x_j$ to an observation $z_i$ incurs a cost $d_{ij}$, which is the Euclidean distance between them. The matrix $D_{N \times M}$ of the costs (distances) between every observation $z_i$, $i \in N$, and every estimated position $x_j$, $j \in M$, is then defined as:

\[ D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1M} \\ d_{21} & d_{22} & \cdots & d_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ d_{N1} & d_{N2} & \cdots & d_{NM} \end{bmatrix} \]

where $d_{ij} = \sqrt{(X^c_j - X^w_i)^2 + (Y^c_j - Y^w_i)^2}$. The assignment is now formulated as a linear assignment problem:

\[ \min \sum_{i \in N} \sum_{j \in M} d_{ij}\, x_{ij} \tag{21} \]

subject to

\[ \sum_{i \in N} x_{ij} = 1 \quad \forall j \in M, \qquad \sum_{j \in M} x_{ij} = 1 \quad \forall i \in N, \qquad x_{ij} \geq 0 \quad \forall i \in N,\; j \in M \]

This optimal assignment is done with the following constraints:

– If $N = M$, for each pair $(x^c_{j,t}, z^w_{i,t})$, we augment the position $x^c_{j,t}$ with the identity $ID^w_{i,t}$ from $z^w_{i,t}$;

– If $N > M$, all unassigned $z^w_{i,t}$ are kept with their original coordinates, which are computed by the WiFi-based localization system;

– If $N < M$, all unassigned $x^c_{j,t}$ are considered false positives and are discarded, because we assume in the surveillance system that all people entering the monitored areas hold WiFi-enabled devices and have checked in at the entrance.

The overall formula for these constraints is given as follows:

\[ K_{i,t} = \begin{cases} (X^c_{j,t}, Y^c_{j,t}, ID^w_{i,t}) & \text{if } z^w_{i,t} \text{ is assigned;} \\ (X^w_{i,t}, Y^w_{i,t}, ID^w_{i,t}) & \text{otherwise,} \end{cases} \]
where $K_{i,t}$ denotes the association between the position estimations $x^c_{j,t}$ and the observations $z^w_{i,t}$. Each component $K_{i,t}$ is a random variable that takes its value among $\{0, \ldots, N\}$. Based on this association, the location information from the WiFi-based observations is corrected according to the positions given by the camera system, and the corresponding ID from the WiFi system is assigned. The correction step of the Kalman filter is applied to update the predicted state with the current position observation $K_{i,t}$.

The same procedure is followed in the case in which WiFi-based location observations come before camera-based ones, where we have the optimal assignment of an estimated position $x_i$ from the WiFi system to an observation $z_j$ from the camera system.

6 Dataset and evaluation

6.1 Testing dataset

In order to evaluate the combined algorithm for person tracking using both the WiFi and camera systems, a multi-modal dataset with two scripts is constructed in this work. Script 1 is set up with simpler scenarios than Script 2. Two people are involved in Script 1, moving along random routes through two non-overlapping cameras. Some inter-person occlusions appear, but not as frequently as in Script 2. The visual data in Script 1 is used for person localization and Re-ID based on camera. Script 2 contains five scenarios referring to different numbers of people taking part in each scenario: one person, two, three, and five moving people. The data in Script 2 is very challenging for both the WiFi-based and the vision-based systems. People move through four different cameras. Severe occlusions occur because all people are required to move in close proximity along a fixed route (see Figure 8). Moreover, the similar appearance of the people is a challenge for visual processing.

Figure 8: A 2D floor map of the testing environment in Figure 9, with the routing path of moving people in testing scenarios.

Figure 9: Testing environment.
The testing environment for building the dataset is shown in Figure 9, with 6 access points (APs) and 4 cameras deployed. The APs are set to the same SSID, which ensures continuous connectivity for mobile devices when people move from the range of one AP to another. The WiFi range of each AP is about 30–50 meters in radius, depending on walls and obstacles in the environment. The AP specifications are the MAC address and the AP position in X, Y and Z. All APs used in the testing are Linksys E1200 devices. A person holds a WiFi-enabled device and moves in the testing environment at a normal velocity of 1–1.3 m/s.

The duration of each scenario is from 3 to 5 minutes, during which about 400 RSSI values are acquired from the 6 APs, with an average interval of 2 seconds between two consecutive samples. The mobile devices and cameras are synchronized to Internet time, so the data captured from the camera and WiFi systems share a common time base. Based on this, we can compute the real-world position of a mobile user on the 2D floor map at each time. The time stamp of each person location calculated from the camera or WiFi system provides the basis for multimodal object localization.

Figure 10: The visual examples in Script 2. The first row contains frames for the scenario of one moving person. The scenarios for 2, 3 and 5 moving people are shown in the second, third and fourth rows.

The WiFi data is scanned from the mobile devices and stored in XML files. These devices continuously capture the signals from the available APs in the environment. The AP specifications are saved as a record of scanning time, MAC address, AP name, and RSSI. The APs are distinguished by their own MAC addresses.
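As an illustration of how such scan records and the shared time base might be used, the sketch below groups records into one RSSI vector per scan and looks up the WiFi scan closest in time to a camera frame. The XML schema is not given in this paper, so the sketch starts from already-parsed records; the class name, field names and functions are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRecord:
    """One stored WiFi record (field names are assumptions)."""
    scan_time: float  # seconds on the synchronized clock
    mac: str          # MAC address, identifies the AP
    ap_name: str
    rssi: int         # received signal strength in dBm


def rssi_vectors(records):
    """Group records by scan time: {scan_time: {mac: rssi}}, sorted by time."""
    scans = defaultdict(dict)
    for r in records:
        scans[r.scan_time][r.mac] = r.rssi
    return dict(sorted(scans.items()))


def nearest_scan_time(scan_times, frame_time):
    """Return the scan time closest to a camera frame time stamp.

    scan_times must be a non-empty list sorted in ascending order.
    """
    pos = bisect_left(scan_times, frame_time)
    candidates = scan_times[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda t: abs(t - frame_time))


if __name__ == "__main__":
    records = [
        ScanRecord(0.0, "aa", "AP1", -40),
        ScanRecord(0.0, "bb", "AP2", -60),
        ScanRecord(2.0, "aa", "AP1", -45),
    ]
    scans = rssi_vectors(records)
    print(nearest_scan_time(list(scans), 1.4))
```

With WiFi samples arriving roughly every 2 seconds, a nearest-time lookup of this kind is one simple way to pair a camera frame with a WiFi sample; interpolating between neighbouring samples would be an alternative.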
For visual data, we manually assign FootPoint positions on the captured frames with the corresponding time stamps and IDs. These positions are then automatically transformed into 2D locations on the floor map by using the camera calibration and the homography matrix. The person ID assigned in the visual data matches the ID of the WiFi adapter by a predefined convention. In short, for each scenario, the ground truth data is saved as XML files which contain the following records:

– Frame number.

– Person ID.

– Coordinates of the top-left and bottom-right corners of the bounding box containing the person.

– The image coordinates of the FootPoint position.

– The corresponding coordinates of the FootPoint position on the 2D floor map.

If no person is detected, all records except the frame number are set to -1.

Figure 10 illustrates examples in Script 2. The frames in the first row show the scenario of one moving person, while those in the second, third and fourth rows are frames for the scenarios of two, three and five moving people.

For WiFi data acquired outside the camera FOV, the ground truth of person locations in these regions is calculated by a pedestrian foot counting program. It takes input from the acceleration and direction sensors that are available on smartphones or tablets [20]. Basically, the positions of the mobile user in this region are computed from the route length that the user covers between marking points or reference points. This distance is calculated by the foot counter, taking into account the average step length of each particular person. The foot counter gives a positioning error of 5 m with a deviation of 3 m for a route length of 120 m. In our test, the route length outside the camera view is only about 10 m. In addition, the bias of the foot counter accumulates over time, so over 10 m this deviation is 0.8 m (equivalent to 8% of the route length).
This corresponds to a deviation of 8 cm per meter between the positions labeled in the dataset and the true positions. After the synchronization between WiFi and visual data, interpolation is applied to calculate the person positions that are outside the camera fields of view.

6.2 Evaluation metrics

In order to evaluate the performance of vision-based tracking, the metrics of Multiple Object Tracking Precision (MOTP) [18], Global Multiple Object Tracking Accuracy (GMOTA) [19], and CMC (Cumulative Match Curve) are utilized.

Assume that at each time step $t$, a multi-person tracker outputs a set of hypotheses $\{h_1, \ldots, h_m\}$ for a set of visible people $\{u_1, \ldots, u_n\}$. MOTP measures the positioning error over all matched pairs of person and tracker hypothesis on all frames. This metric is defined by:

$$MOTP = \frac{\sum_{i,t} d_{i,t}}{\sum_t c_t} \qquad (22)$$

where $d_{i,t}$ is the Euclidean distance between the ground truth and the tracker hypothesis for the $i$-th person at time frame $t$. In this work, it is the Euclidean distance between the ground truth and the tracker hypothesis of the FootPoint positions. The element $c_t$ is the number of matched pairs at time step $t$.

GMOTA is an extension of MOTA (Multiple Object Tracking Accuracy) [18]. MOTA measures the number of errors the tracker makes in terms of false negatives (missed detections), false positives (wrong detections), mismatches and failures to recover tracks. This score is computed as follows:

$$MOTA = 1 - \frac{\sum_t (FN_t + FP_t + ID_t)}{\sum_t g_t} \qquad (23)$$

where $FN_t$ is the number of false negatives, $FP_t$ is the number of false positives, $ID_t$ is the number of instantaneous identity switches, and $g_t$ denotes the number of ground truth detections at time frame $t$. In the GMOTA score, $ID_t$ is replaced by the global $ID_t$ ($gID_t$). This means that $gID_t$ reflects how well the tracker preserves person identity assignments in a global manner, instead of the instantaneous identity assignments of MOTA.
$$GMOTA = 1 - \frac{\sum_t (FN_t + FP_t + gID_t)}{\sum_t g_t} \qquad (24)$$

The CMC is employed as the performance evaluation metric for vision-based person Re-ID. The CMC curve presents the expectation of finding the correct match in the top $n$ matches.

The accuracy of the WiFi-based localization system is evaluated by three statistics: maximal error, average error, and error at a reliability of 90%. The maximal error is the maximum distance deviation in meters between the positions determined by the system and the ground truth positions. The average error is the average distance deviation in meters between the positions determined by the system and the ground truth positions. The error at a reliability of 90% is the distance deviation in meters below which 90% of the test samples fall.

The performance of the fusion method is evaluated in this work by the GMOTA metric.

6.3 Experimental results

In vision-based person localization, at each camera FOV, person identification is done by a so-called process of identification by tracking. This means that a trajectory which belongs to an individual in the current frame is linked to the corresponding one from the previous frame based on an optimal assignment of the Euclidean distances between them. However, this results in ID switches when people exchange positions with each other.

The method proposed in Section 3.2 for person Re-ID helps to solve not only person identification in each camera FOV, but also person Re-ID among multiple cameras, by using a robust appearance-based descriptor built on each detected human ROI at each FootPoint position. This makes tracking by identification possible. However, person identification and Re-ID performance still need to be improved, especially when inter-person occlusions occur or people have similar appearances. The proposed fusion algorithm adds the clearer ID information of the WiFi adapter for tracking by identification.
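The metrics of Section 6.2 can be sketched in a few lines. This is an illustrative sketch only: `motp` follows Eq. (22), `gmota` follows Eq. (24), and `localization_error_stats` computes the three WiFi error statistics. The function names and input layouts are hypothetical, and the nearest-rank rule used for the error at a given reliability is an assumption, since the exact percentile convention is not stated.

```python
import math


def motp(distances_per_frame):
    """Eq. (22): sum of matched-pair distances over the number of matches.

    distances_per_frame[t] lists the distances d_{i,t} of the pairs
    matched at frame t, so its length is c_t.
    """
    total = sum(sum(frame) for frame in distances_per_frame)
    matches = sum(len(frame) for frame in distances_per_frame)
    if matches == 0:
        raise ValueError("no matched pairs")
    return total / matches


def gmota(fn, fp, gid, gt):
    """Eq. (24): the arguments are per-frame counts FN_t, FP_t, gID_t, g_t."""
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("no ground truth detections")
    return 1.0 - (sum(fn) + sum(fp) + sum(gid)) / total_gt


def localization_error_stats(errors, reliability_percent=90):
    """Maximal error, average error and error at a given reliability.

    Uses the nearest-rank rule (an assumption): the smallest error value
    such that at least reliability_percent of the samples do not exceed it.
    """
    if not errors:
        raise ValueError("no error samples")
    ordered = sorted(errors)
    rank = math.ceil(reliability_percent * len(ordered) / 100)
    return {
        "max": ordered[-1],
        "mean": sum(ordered) / len(ordered),
        "at_reliability": ordered[max(rank, 1) - 1],
    }


if __name__ == "__main__":
    print(motp([[0.2, 0.4], [0.3]]))
    print(gmota(fn=[1, 0], fp=[0, 1], gid=[0, 0], gt=[5, 5]))
    print(localization_error_stats([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]))
```

Passing per-frame instantaneous identity switches instead of global ones to `gmota` would give the MOTA score of Eq. (23), since the two formulas differ only in that term.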
In the following sections, the testing results for WiFi-based localization, vision-based localization and Re-ID, and fusion-based tracking are shown.

6.3.1 WiFi-based localization results

The system parameters of the WiFi-based localization model are calculated first, and the positioning results are then obtained based on them. Firstly, the training process using the GA algorithm is set up with the configuration provided in Table 1. With this configuration, the optimal parameters are obtained as in Table 2.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Population size | 20 | Tolerance | $10^{-6}$ |
| Elite count | 5 | Selection | Uniform |
| Crossover fraction | 0.5 | Crossover | Scattered |
| Time limit | No | Mutation | Uniform |
| Maximal generations | No | Creation population | Uniform |

Table 1: Genetic algorithm configuration.

| Parameter | Values for the first scenario | Values for the second scenario |
|---|---|---|
| $P_0$ | -41 dBm | -36.1757 dBm |
| $n$ | 1.1 | 2.2029 |
| $k_\sigma$ | 1.0035 m$^{-1}$ | 5.3147 m$^{-1}$ |
| $r_0$ | 5 m | 2.5117 m |
| $k_d$ | 49.23 dBm.m$^{-1}$ | 5.1311 dBm.m$^{-1}$ |

Table 2: Optimized system parameters for the first and the second scenarios of testing environments.

| Fingerprint feature | Maximal error (m) | Average error (m) | Error at reliability of 90% (m) |
|---|---|---|---|
| RSSI | 6.3 | 1.86 | 2.99 |
| Distance | 6.27 | 1.89 | 2.98 |

Table 3: Evaluations for distance and RSSI features in case of using coefficient λ.

| Fingerprint feature | Maximal error (m) | Average error (m) | Error at reliability of 90% (m) |
|---|---|---|---|
| RSSI | 6.06 | 1.76 | 3.55 |
| Distance | 6.5 | 1.59 | 2.9 |

Table 4: Localization results using different features of distance and RSSI, without using coefficient λ.

Secondly, the weights for different values of θ as a function of dissimilarity are shown in Figure 11. The weights for different values of λ are presented in Figure 12. With $\lambda = 0.5 \times 10^{-6}$, the influence is reduced by a factor of 3 when the fingerprints were scanned 1 month before the testing time (roughly $2.6 \times 10^{6}$ seconds).
Similarly, when the fingerprints were taken 2 months before the testing time, their influence is only 10% of that of new fingerprints. In this work, we choose $k = 9$, $\theta = 1.1$ and $\lambda = 2 \times 10^{-6}$.

The radio maps and fingerprint locations in the testing environment are shown in Figure 13a and Figure 13b. Regions with a deep pink color have more APs available than regions with a light pink color.

The localization experiments are conducted by using the fingerprinting method with distance features calculated by the proposed probabilistic propagation model. Comparative results are also given for the fingerprinting method with RSSI features. Additionally, the stability and reliability of the radio map with distance features are confirmed by the evaluations with the coefficient λ.

Figures 14, 15 and 16 show the comparative results when the coefficient λ is taken into account. The localization results, the distribution of the localization results compared to the real locations, and the reliability of the localization result as a function of the localization error are shown in these figures, respectively. The details of these results are given in Table 3. When λ is used, the two features give nearly identical results: the error at a reliability of 90% is 2.98 m for distance features and 2.99 m for RSSI features, and the average error is 1.89 m versus 1.86 m. However, without λ, the localization reliability for RSSI features decreases, whilst it remains stable for distance features. These results are shown in Figures 17, 18 and 21, and in Table 4: the error at a reliability of 90% is 3.55 m for RSSI features, but 2.9 m for distance features. The above experiments show that using distance features for fingerprint data results in more stable and reliable radio maps than using RSSI features.
Moreover, using distance features also lowers the cost of updating fingerprint data, which is considered one of the most challenging problems of the fingerprinting method in WiFi-based localization.

Figure 11: Weights of different values of θ based on dissimilarity. (Plot of weight versus dissimilarity in meters, for θ = 0.5, 0.7, 0.9, 1.1 and 1.3.)

Figure 12: Weights of different values of λ based on dissimilarity. (Plot of weight versus time in seconds, for λ = 1/1000000, 1/2000000, 1/4000000 and 1/6000000.)

Figure 13: (a) the radio map, with (b) 2000 fingerprint locations collected in the testing environment.

| | Vision-based evaluations: Hallway (Cam 1) | Vision-based evaluations: Showroom (Cam 3) | Proposed fusion algorithm: Hallway (Cam 1) | Proposed fusion algorithm: Showroom (Cam 3) |
|---|---|---|---|---|
| MOTP (cm) | 24.3 | 21.3 | 24.3 | 21.3 |
| FN (%) | 17.1 | 26.4 | 7.6 | 12.6 |
| FP (%) | 22.7 | 18.3 | 3.4 | 2.1 |
| gID | 28.3 | 11.6 | 4.9 | 2.3 |
| GMOTA (%) | 31.2 | 52.6 | 83.9 | 85.7 |

Table 5: The comparative results of the proposed fusion algorithm against the vision-based evaluations on testing data of Script 1.

Figure 14: Localization results with distance and RSSI features when using coefficient λ. (Plot of Y (m) versus X (m) showing the ground truth path, the localization result with the distance feature, and the localization result with the RSSI feature.)

6.3.2 Experimental results for vision and fusion-based tracking

The performance of vision-based person localization and Re-ID is evaluated on the Script 1 and Script 2 datasets. In addition, the comparative results obtained from the fusion system of camera and WiFi are also reported.

Firstly, vision-based person Re-ID evaluations are done on Script 1 data. The human ROIs are manually extracted from the frames captured by three non-overlapping cameras: Cam 1 (hallway), Cam 2 (lobby) and Cam 3 (showroom).
The human ROIs from Cam 2 are used for the training phase (see Figure 19) and the human ROIs from Cam 1 and Cam 3 for the testing phase (see Figure 20). We train the system with 10 people in total, including the two tested ones, using the human ROI images extracted from Cam 2. Figure 22 shows the person recognition rates for this experiment, with a Rank 1 rate of 51.1%.

Table 5 shows the results for vision-based localization in the two scenarios of Hallway (Cam 1) and Showroom (Cam 3). The MOTP of the vision-based localization system is 24.3 cm and 21.3 cm for the Hallway and Showroom scenarios, respectively. These values are retained for the fusion model of camera and WiFi.

Figure 15: Distribution of localization error for distance and RSSI features when using coefficient λ. (Plot of Y (m) versus X (m) showing the error distribution with the distance feature and with the RSSI feature.)

The GMOTA for Hallway is lower than for Showroom (31.2% compared to 52.6%). However, when WiFi is integrated, these values increase markedly to 83.9% for Hallway and 85.7% for Showroom. This results from the sharp decreases in FN, FP and gID in both scenarios. Additionally, even in the ideal case of manual human detection, vision-based person tracking by identification does not perform as well as the proposed fusion algorithm.

Secondly, to further evaluate the proposed fusion algorithm, experiments are done on the data of Script 2. This dataset is very challenging compared to Script 1, because of severe occlusions and the similarity in human appearance. Moreover, people moving together along the same route is also a challenge for WiFi-based localization.

In the experiments with this data, we use the ground truth data of FootPoint positions and the corresponding human ROIs for testing evaluations.
The parameter gID in the GMOTA metric now indicates the performance of the tracker in maintaining the person ID when he/she moves from one camera FOV to another or re-appears in a camera FOV.

Table 6 shows the comparative results of GMOTA when applying the fusion algorithm and of Rank 1 for person Re-ID. It should be noted that FN and FP are not included in the testing evaluations of GMOTA because we use the ground truth data of FootPoint positions and human ROIs. In this case, only gID is taken into account. This means that the performance of maintaining the person ID in tracking now depends only on the performance of WiFi-based person localization.

Figure 16: Localization reliability for distance and RSSI features when using coefficient λ. (Plot of reliability distribution (%) versus error (m) for the distance feature and the RSSI feature.)

Figure 17: Localization results for distance and RSSI features, without using coefficient λ. (Plot of Y (m) versus X (m) showing the ground truth path and the RSSI and distance results.)

In comparison with the GMOTA values from Script 1, the GMOTA figures from Script 2 are much lower: only 31.7% for the scenario of two moving people, and 16.5% and 11.2% for the scenarios of three and five moving people, respectively. This can be explained by the fact that the data of Script 2 is much more challenging than that of Script 1. People moving together in very close proximity is not only a burden for vision-based person localization and identification, but also for WiFi-based person localization, because the WiFi data is noisy when people are close to each other.

However, in comparison with person Re-ID by kernel descriptor, these results are much higher. In these experiments, in addition to the tested people, we train the system with 20 other people at the check-in gate for person Re-ID.
The recognition rate at Rank 1 is only 12.6% for the scenario of two moving people, which is 19.1 percentage points lower than the fusion-based method. The Rank 1 figures for the scenarios of three and five moving people are 8.9% and 5.6%. Clearly, the performance of person Re-ID based on the kernel descriptor degrades when people have similar appearances.

Figure 18: Distribution of localization error for distance and RSSI features, without using coefficient λ. (Plot of Y (m) versus X (m) for the RSSI and distance features.)

| | Two people | Three people | Five people |
|---|---|---|---|
| GMOTA (%) | 31.7 | 16.5 | 11.2 |
| Rank 1 (%) | 12.6 | 8.9 | 5.6 |

Table 6: The experimental results for person tracking and person Re-ID with Script 2 dataset.

From the above comparative evaluations, we can see that the proposed fusion algorithm significantly improves the performance of person tracking by identification and person Re-ID. The high-accuracy vision-based person localization and the clear ID information from the WiFi-enabled device are integrated at each detected FootPoint position. This enables tracking by identification in each camera FOV and, based on this, person Re-ID in non-overlapping camera networks can be solved more effectively than with a vision-based method alone.

Figure 19: Training examples of manually-extracted human ROIs from Cam 2 for person 1 (images on the left) and person 2 (images on the right).

Figure 20: Testing examples of manually-extracted human ROIs from Cam 1 (images on the left column) and Cam 3 (images on the right column) for (a) person 1 and (b) person 2.

Figure 21: Localization reliability for distance and RSSI features, without using coefficient λ. (Plot of reliability distribution (%) versus error (m) for the RSSI and distance features.)

Figure 22: Person Re-ID evaluations on testing data of two moving people. (Plot of recognition rate versus rank.)

7 Conclusion

In this work, person localization and Re-ID in surveillance regions covered by WiFi signals and cameras with disjoint FOVs are improved by a fusion algorithm based on the Kalman filter and an optimal assignment technique. This algorithm operates on the position observations on the 2D floor map obtained from each single system, camera or WiFi. Evaluation on the multimodal dataset shows that the proposed fusion algorithm outperforms the vision-based method alone. The high positioning accuracy of the vision-based system is maintained in the multimodal person localization system. Additionally, the fusion algorithm enables tracking by identification and, based on this, person Re-ID in non-overlapping cameras is done with clear identity information taken from the WiFi-based system.

In future work, other localization techniques, such as RFID or UWB, can be integrated into a multimodal system in order to improve the positioning accuracy and person Re-ID. The fusion algorithm for person localization and Re-ID would be extended correspondingly to accommodate these additions.

Acknowledgement

This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.04-2013.32.

References

[1] Van den Berghe, Sam and Weyn, Maarten and Spruyt, Vincent and Ledda, Alessandro (2011) Combining wireless and visual tracking for an indoor environment, International Conference on Indoor Positioning and Indoor Navigation (IPIN-2011).

[2] Miyaki, Takashi and Yamasaki, Toshihiko and Aizawa, Kiyoharu (2007) Visual tracking of pedestrians jointly using wi-fi location system on distributed camera network, 2007 IEEE International Conference on Multimedia and Expo, IEEE, p. 1762–1765.
[3] Rekimoto, Jun and Shionozaki, Atsushi and Sueyoshi, Takahiko and Miyaki, Takashi (2006) PlaceEngine: a WiFi location platform based on realworld folksonomy, Internet Conference, p. 95–104.

[4] Cheng, Yu-Chung and Chawathe, Yatin and LaMarca, Anthony and Krumm, John (2005) Accuracy characterization for metropolitan-scale Wi-Fi localization, Proceedings of the 3rd international conference on Mobile systems, applications, and services, ACM, p. 233–245.

[5] Alahi, Alexandre and Haque, Albert and Fei-Fei, Li (2015) RGB-W: When Vision Meets Wireless, Proceedings of the IEEE International Conference on Computer Vision, IEEE, p. 3289–3297.

[6] Pham, T. T. T., Le, T. L., Vu, H., and Dao, T. K. (2017) Fully-automated person re-identification in multi-camera surveillance system with a robust kernel descriptor and effective shadow removal method, Image and Vision Computing, Elsevier, p. 44–62.

[7] Kuhn, Harold W (1955) The Hungarian method for the assignment problem, Naval Research Logistics Quarterly, Wiley Online Library, p. 83–97.

[8] Zhang, Zhengyou (2000) A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE, p. 1330–1334.

[9] Thi Thanh Thuy Pham, Anh Tuan Pham, Hai Vu (2015) A new technique for linking person trajectories in surveillance camera network, Conference on Fundamental and Applied IT Research (FAIR), p. 8–15.

[10] Bo, Liefeng and Ren, Xiaofeng and Fox, Dieter (2010) Kernel descriptors for visual recognition, Advances in Neural Information Processing Systems (NIPS), Vancouver, Canada, p. 244–252.

[11] Dao, Trung-Kien and Pham, Thanh-Thuy and Castelli, Eric (2013) A robust WLAN positioning system based on probabilistic propagation model, 9th International Conference on Intelligent Environments (IE), IEEE, p. 24–29.

[12] Goldsmith, A. (2005) Wireless communications, Cambridge University Press.

[13] Roberts B. and Pahlavan K. (2009) Site-specific RSS signature modeling for WiFi localization, Global Telecommunications Conference, IEEE, p. 1–6.

[14] Munoz D., Lara F.B., Vargas C., and Enriquez-Caldera R. (2009) Position location techniques and applications, Academic Press.

[15] Haupt, Randy L and Haupt, Sue Ellen (2004) Practical genetic algorithms, John Wiley & Sons.

[16] Jungmin So, Joo-Yub Lee, Cheal-Hwan Yoon, Hyunjae Park (2013) An Improved Location Estimation Method for Wifi Fingerprint-based Indoor Localization, International Journal of Software Engineering and Its Applications.

[17] Arsham Farshad, Jiwei Li, Mahesh K. Marina, Francisco J. Garcia (2013) A Microscopic Look at WiFi Fingerprinting for Indoor Mobile Phone Localization in Diverse Environments, International Conference on Indoor Positioning and Indoor Navigation.

[18] Bernardin, Keni and Stiefelhagen, Rainer (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics, EURASIP Journal on Image and Video Processing, Springer, p. 1–10.

[19] Ben Shitrit, Horesh and Berclaz, Jerome and Fleuret, François and Fua, Pascal (2013) Tracklet-based Multi-Commodity Network Flow for Tracking Multiple People, No. EPFL-PATENT-186751, WO.

[20] Kothari, Nisarg and Kannan, Balajee and Glasgow, Evan D and Dias, M Bernardine (2012) Robust indoor localization on a commercial smart phone, Procedia Computer Science, Elsevier, p. 1114–1120.