Informatica 38 (2014) 339-346 339 An Online Compression Algorithm for Positioning Data Acquisition Wei Pan, Chunlong Yao, Xu Li and Lan Shen School of Information Science and Engineering, Dalian Polytechnic University, Dalian 116034, China E-mail: yaocl@dlpu.edu.cn Keywords: location data acquisition systems, positioning data, trajectory compression, trajectory recovery Received: September 17, 2014 Positioning data are usually acquired periodically and uploaded to the server via wireless network in the location data acquisition systems. Huge communication overheads between the terminal and the server and heavy loads of storage space are needed when a large number of data points are uploaded. To this end, an online compression algorithm for positioning data acquisition is proposed, which compresses data by reducing the number of uploaded positioning points. Error threshold can be set according to users' needs. Feature points are extracted to upload real-timely by considering the changes of direction and speed. If necessary, an approximation trajectory can be obtained by using the proposed recovery algorithm based on the feature points on the server. Positioning data in three different travel modes, including walk, non-walk and mixed mode, are acquired to validate the efficiency of the algorithm. The experimental results show that the proposed algorithm can get appropriate compression rate in various road conditions and travel modes, and has better adaptability. Povzetek: Predstavljen je nov algoritem za zajemanje podatkov o realnem času, uporaben za sisteme za določanje položaja. 1 Introduction With advances in tracking technologies and the rapid improvement in location-based services (LBS), a larger amount of location data needs to be collected [1]. These positioning data are linked up in chronological order to form a trajectory which contains a sequence of positioning data with latitude, longitude and timestamp. Trajectory can be used for positioning, navigation, providing some location-based services and expressing user geospatial history behaviour. For location acquisition systems, a large number of positioning data not only generate lots of communication overheads but also bring heavy burden for the storage system [2]. A major reason for this problem is that the uploaded data points are far more than necessary points [3]. Therefore, trajectory compression and simplification have received increasing concern. Trajectory compression can be classified into two categories, offline compression and online compression. In the offline compression, all data points collected (usually on the server) are considered for compression regardless of the data transfer process. In the online compression, data points are processed before they are uploaded, and only feature points selected are transmitted to the server. For the offline compression, Zhang et al. [2] proposed a trajectory data compression algorithm based on the spatial-temporal characteristic. In order to extract accurately trajectory information, the algorithm calculates the distance standard to judge the feature point according to the three-dimensional spatial-temporal characteristic of GPS data points. Given a trajectory Traj and error threshold, the Doulas-Peucker Algorithm [4] constructs a new trajectory Traj' by adding points from Traj repeatedly until the maximum spatial error of Traj' becomes smaller than error threshold. The Douglas-Peucker algorithm has the limitation of ignoring temporal data. Top-Down Time Ratio (TD-TR) [5] overcomes this limitation by using Synchronized Euclidean Distance (SED) [6] instead of spatial error. The characteristics of the vehicle trajectory are described in the proposed algorithm [7], in which the advantages and disadvantages of judgment method with point by point and judgment method with multi-point joint are analysed. This algorithm puts forward the description method for trajectory based on turning point judgment method. Chen et al. [8] proposed a trajectory simplification algorithm (TS), which considered both the shape skeleton and the semantic meanings of a GPS trajectory. The heading change degree of a GPS point and the distance between this point and its adjacent neighbours in this algorithm are used to weight the importance of the point. By studying the model of temporal data in vehicle monitoring system, Wang et al. [9] proposed a modelling idea for establishing trajectory version. To reduce storage and improve temporal queries, spatial-temporal cube is segmented to form unit spatial-temporal cube by this modelling idea. With discussing some problems of trajectory data compression about road net, Guo [10] proposed a non-linear compression algorithm of moving objects trajectories based on road net. The above offline algorithms have played a positive role in reducing amount of positioning points. However, they are only applied to the case that the start and end of the trajectory are clear. That is to say, compression begins only after obtaining all of the points from the input trajectory. That is why communication overheads 340 Informatica 38 (2014) 339-346 W. Pan et al. between the terminal and the server are huge and burdens of storage space are heavy. In recent years, some online compression algorithms have been proposed. Similar to Douglas-Peucker, Opening Window algorithm [11] approximates each trajectory using an increasing number of points so that the resulting spatial error is smaller than the bound. This algorithm is a generalized online compression with a little delay, and it works without considering time data. Opening Window Time Ratio (OPW-TR) [5] is an extension to Opening Window which uses SED instead of spatial error. Compared to Opening Window algorithms, OPW-TR has the benefit of taking temporal data into account. However, just like Opening Window algorithm, OPW-TR also has a little delay. Both of them cannot real-timely determine whether current point needs to be compressed. An algorithm called SQUTSH (Spatial QUalIty Simplification Heuristic) [12] uses a priority queue where the priority of each point is defined as an estimate of the error that the removal of that point would introduce. SQUISH compresses each trajectory by removing points of the lowest priority from the priority queue until it achieves the target compression ratio. This algorithm is fast and tends to introduce small errors during compression. However, it cannot compress trajectories while ensuring that the resulting error is within a user-specified bound. It also exhibits relatively large errors when the compression ratio is high [13]. Yin et al. [14] proposed an algorithm and a restraining model for vehicle data stream transmission. It gives a formula for data recovery in the monitoring centre by discussing the determination condition of speed and direction change degree in the algorithm. However, this algorithm is suitable for the integrated system of monitoring and navigation, and must be matched to the map. Wang et al. [15] proposed an improved moving object model based on vehicle monitoring system of moving objects model. This model provides a dynamic time selection algorithm, which adjusts dynamically the time density according to object moving speed. Therefore, the number of communications is reduced effectively. However, this algorithm works regardless of data recovery and the issue that the moving direction affects the accuracy of the model. Han et al. [16] proposed an adaptive algorithm for vehicle GPS monitoring data. The speed function is expressed by using fitting polynomial with constraint based on endpoint speed and mileage. This algorithm considers the case that terminal with and without map, in which the deviation distance is used for denoting the deviation of speed direction. At the same time, data recovery algorithm, algorithm evaluation index and application mode of algorithm are proposed in this algorithm. However, it must be set speed threshold in advance according to the road. In practice, according to the different road, predetermined speed threshold is very difficult. In this paper, an online compression algorithm for positioning data acquisition systems is proposed, which compresses the data points before transmission from terminal to the server. In this algorithm, the feature points are extracted and uploaded by computing the change of speed and direction based on error threshold set according to the users' requirements, and non-feature points are discarded to reduce unnecessary data transmission and bandwidth occupation. The remainder of this paper is organized as follows. Section 2 describes our new online compression algorithm for positioning data acquisition in detail. An experimental evaluation of trajectory compression algorithm is provided in Section 3. The paper concludes with future work in Section 4. 2 The online compression algorithm In order to compress an original trajectory, the key of our algorithm is to pick out feature points .For each point collected in the terminal, the current point will be uploaded and preserved if it is a feature point, else it will be discarded as a non-feature point. If needed, the recovery trajectory that approximated to the original trajectory can be obtained by the recovery algorithm with the feature points on the server. Our algorithm involves feature points, non-feature points and lost points. A feature point is a positioning point that a large change in speed or direction; a non-feature point is a positioning point that is discarded with a small change in speed and direction; a lost point is a positioning point that is lost because the communication delay caused by the positioning signal blind area. Generally, original trajectory is a temporally ordered sequence of positioning points P = {p1, P2, ..., Pn}, for i = 1, 2, ..., n, pt = (lngi, lati, ti) where lngi, lati represent the longitude and latitude of the ith point and ti stands for the time stamp. For a point pi, pi.lng, pi.lat and pi.t denote longitude, latitude and time stamp of Pi respectively. For each positioning point pi in an original trajectory, recovery point p/of pi is a data point obtained by the recovery algorithm, which represents pi in the recovery trajectory. 2.1 Algorithms In the compression phase, error threshold and acquisition interval are set in advance according to accuracy requirements. The initial two collected points (these two points must be the feature points) are directly uploaded to the server without judgment. Starting from the third positioning point, corresponding recovery point is calculated for each point collected currently. The distance between the actual point pi and its corresponding recovery point pi' is compared with the error threshold. If larger than the error threshold, the current point is regarded as a feature point and uploaded. If not, the current point is abandoned as a non-feature point (see Fig. 1). Judgment is done for each point, and the last positioning point is uploaded until the end of the acquisition. An Online Compression Algorithm for. Informatica 38 (2014) 339-346 341 p3 d3 pk-1 dk-1 pk-1 pk H pi d pi Figure 1: A schematic diagram for the compression process. collected at an interval. This interval is consistent for counter and data acquisition. When the time reaches an interval, if the signal is acquired, the counter is "0". If the signal is not acquired, that is the signal loss, counter is added with "1" until the signal is acquired. Fig. 2 shows a schematic diagram for the lost point. pi p1 p2 p3 p5 p4 As shown in Fig. 1, pi, p2, p3, p4, ps, pk-i, pk and pi are the original positioning points, each positioning point has a corresponding recovery point. pi, p2, p5 and pk are feature points, their recovery points are themselves. p3, p4'and pk-i'are recovery points of p3, p4 and pk-i. pi is previous feature points of p3' and p1 is previous points of pi. p3'is calculated by pi and pi. The speed of pi can be concluded by distance and time d-value between pi and pi. Supposing pi and p3'have the same speed. The product of this speed and interval T is the distance of p3. Set p3'on a straight line that pi, pi are located at. According to the coordinate of pi and pi, the coordinate of p3'can be calculated. In the same way, pi is still previous feature points of p4' and ps', and they are also calculated by pi and pi. Differently, the distance of p4' is the product of speed of the pi and twice interval T, the distance of ps'is the product of speed of the pi and triple interval T. The distance between recovery point pi' and actual point pi is called di and di is compared with the error threshold D. If di > D, current point is uploaded as a feature point. If not, current point is discarded as nonfeature point. d3, d4 in Fig. i are smaller than D, so p3, p4 are discarded. ds is not less than D, so ps is uploaded as feature point. The recovery point pi' of current point pi is calculated though the previous feature points pk of pi ' and the previous point pk-i ' ( it can be a feature point or not) of pk. The speed of pk is calculated according to pk and pk-i'. The product of this speed and time d-value between pi and pk is the distance of pi' . Set pi' in the straight line that pk, pk-i ' are located at. According to the coordinate of pk, pk-i' and the distance of p/, the coordinate of pi' is calculated. In the recovery phase, pi, pi, ps and pk are recovered directly as feature points. p3, p4 and pk-i are non-feature points. They are recovered to p3, p4', pk-i' according to the above process. In the recovery phase, the first and second feature points received are recovered directly. Determine whether there are non-feature points between the current pi and previous feature point pk from the third feature point. If there are non-feature points, they are recovered in order by calculating corresponding recovery points according to pk. Then recover the current point pi after all non-feature points between pi and pk are recovered. The process cycles until the recovery of the last feature point. To deal with the problem of lost point, positioning point pi introduces lost point count (Count) property and it is given with pi = (lngi, lat,, t, ci). The design idea is to add a counter in the terminal. Positioning data are Figure i: A schematic diagram for the lost point. As shown in Fig. i, pi, pi and p4 are original positioning points. After compression, p3 is discarded as non-feature point and p3' is corresponding recovery point of p3. There are three intervals between p3 and p4, and it should have two positioning points between them. However, due to signal loss, these two points are not collected. They are called the lost points. When determining whether ps is a feature point or not, p3'and p4 are made use for calculation direct without considering the lost points. When recovery of p3, the lost points are needed to consider. There are four intervals between two feature points pi and p4. It should have three positioning points. Since C4 = i, that is to say, two positioning point are lost, so just to recover the only non- feature point p3. 2.2 Algorithm description The recovery point is used to determine whether the current point is a feature point in the compression phase. The recovery point is the non-feature point recovered in the recovery phase. Therefore, the calculation of recovery point is the key of the algorithm. Making use of the previous feature point pk of pi' and the previous point pk-i'of pk to calculate the recovery point p/of current point pi (if i = 3, then pk = pi, pk-i' = pi). 2.2.1 The compression algorithm The recovery point of current point is calculated from the third positioning point collected. The actual point has attributes of latitude and longitude. For ease of calculation, latitude and longitude as spherical coordinate can be transformed into rectangular coordinate by the algorithm [i7]. That is to say, pi = (longi, lat, t) as spherical coordinate is transformed into pi = (xi, y, ti) as rectangular coordinate. For any two points pi and p,, the distance of them is Lj= Dist (pi, pj) in the trajectory with n points. Accordingly, the distance between pk and pk-i'is Lk. pk and pk-i ' are the consecutive points and the time d-value of them is pk. t - pk - i'.t (in the case of signal not loss, the d-value is the interval T). The speed Vi is calculated by Formula (i) and interval T. The distance Si of pi' is calculated according to the time d-value of pi, pk and Formula (i). 342 Informatica 38 (2014) 339-346 W. Pan et al. Lk = Dist(pk_! , pk )=y[(pk -x - pk_! .x)2 + (pk .y - pk_x .yf , 2 < k < n -1 (O i . \2 , i _ _ ,.\2 L I ' 2 ' 2" T/ -L- V(pk-x_pk-1-x) +(pk-y_pk-1 -y) . , , , , , , . , F = ' = ---,-, 2 < k < n-1, 3 < i < n C2) pk * - pk-1 * pk * - pk-1 - S, = V, x (pt .t- pk .*) = •¡iPk- r r x - pk-1 ■x)1 +( pk .y - pk-1 yf pk * - pk-1. x (pi tt - pk .t), 2 < k < n-1, 3 < i < n (3) coordinates of pk and pk-i' and triangle similar properties After calculating the distance of pi, position [18], Formulas are as follows. Detailed compression coordinate of p,' is calculated according to rectangular process is shown in Algorithm 1. pi'.x = pk.x + si x (pk x - pk-1 x) L = Pk.x + p,'.y = pk.y + (pt * - pk .t) x (pk .x - pk-1 .x) I , pk .t - pk-1 . t x (pk.y - pk -1 .y) L 2 < k < n-1, 3 < i < n (4) = phy + (p < - pk D x (pk .y - pk-1 .y) , 2 < k< n-l, 3 < i < n pk 1- pk-1 . (5) Algorithm 1 for compression Input: pi , pk andpk-1', D and T. Output: A simplified trajectory Traj' with feature points. 1. Collect data points and open counter , Traj' = 0; 2. Upload pi and p2, // The first and second positioning point needn't decide and are uploaded directly 3. pk=p2 ,pk-i' = pi; //p2 is assigned to pk, pi is assigned to pk-i' 4. Foreach p, (i > 3); // Cyclic judge for p, 5.Transform p, = (lng,, lat,, t,) to p, = (x,, y,, t,); // Spherical coordinate is transformed into rectangular coordinate 6. Calculate S,; // Calculate the distance ofp{ 7.Calculate (x^y,); // Calculate the rectangular coordinate ofp{ 8.Transformpi = (x,, y,, t,) to p, = (Ing,, lat,, t,); // Rectangular coordinate is transformed into spherical coordinate 9. Dist(pi, pi) is di; // The distance of p, and pi is d, 10. Compare d, and D ; // Error threshold is D 11. If (d, > D) , upload p, ; // If di > D, upload p, as a feature point 12. pk=p, , pk-1' = p,-1'; // pi assigned to pk» p,-1' assigned to pk-1' 13. Else (di < D) , ignore p,; // If di < D, discardp,as a non-feature point 14. End foreach; // End of cycle 15. Return Traj' = pp2,... pk,- Pn }_ 2.2.2 The recovery algorithm Determine whether exist the non-feature points between the current point pt and the previous feature point pk from the third feature point received. Judgment is that getting the number Nt of intervals based on the time d- value of two points divided by the interval T. By the Formula (6) and the number C, of lost points p,, the number Mi of recovery point is calculated. Special attention is that the interval number of two adjacent original positioning points is "1", but there is no recovery point between them. Therefore, the number of An Online Compression Algorithm for. Informatica 38 (2014) 339-346 343 recovery points should extra minus "1". If Mi is not equal"0", it means that it exists recovery points. To recover the Mi recovery points in order. The speed of the recovery point is calculated by Formula (2), and the product of speed and r interval T is the distance Gr of the r-th recovery point. p .t — p, .t Ni = ^-^ , 3 < i < n, 2< k 3); // Cyclic judge for current feature point pi 5.Calculate Mi ; // Calculate the number of recovery points between current feature point and previous feature point 6. Foreach r (1< r