Informatica 39 (2015) 355-363 355 A Churn Resilience Technique on P2P Sensor Data Stream Delivery System Using Distributed Hashing Tomoya Kawakami Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan E-mail: kawakami@is.naist.jp Tomoya Kawakami, Yoshimasa Ishi, Tomoki Yoshihisa and Yuuichi Teranishi Cybermedia Center, Osaka University, Ibaraki, Osaka, Japan E-mail: ishi.yoshimasa@ais.cmc.osaka-u.ac.jp, yoshihisa@cmc.osaka-u.ac.jp Yuuichi Teranishi National Institute of Information and Communications Technology, Koganei, Tokyo, Japan E-mail: teranisi@cmc.osaka-u.ac.jp Keywords: sensor data, data stream, delivery cycle, distributed processing, churn resilience Received: July 16, 2015 Recently, sensor data stream delivery system that collects sensor data periodically and delivers successively has been attracting great attention. As for this sensor data stream delivery, receivers are possible to require the same sensor data stream with different delivery cycles. Our research team proposed methods to distribute communication loads by relay nodes in the case of delivering the sensor data streams that have different data delivery cycles. However, in the previous methods, since the specific node builds delivery paths and notifies related nodes, the assigned node is required to be updated when the related nodes churn. Therefore, in this paper, we propose a churn resilience technique that enhances the robustness of delivery system. We confirmed in simulations that the proposed technique improves the reliability of the delivery system. Povzetek: Razvita in stestiranaje nova metoda za asinhrono pošiljanje in zbiranje toka podatkov. 1 Introduction In recent years, various types of applications such as video delivery and environmental monitoring have been possible, and therefore, sensor data stream delivery where sensor data is periodically collected and delivered successively has been attracting great attention. As for this sensor data stream delivery, it is possible for the same sensor data stream to have different collection periods depending on the receivers. In the case where a live video of a solar eclipse taken from a camera is delivered, for example, the video is delivered at 30 fps to personal computers connected to the Internet through a wire and is delivered at 10 fps to mobile computers connected to the Internet through a 3G channel while moving. It is general in sensor data stream delivery that sensor data gained by one sensor is shared by a large number of users. Currently, various P2P-based techniques for dispersing the communication load of the deliverer (source) have been proposed in the data streaming [1-10]. In these researches, the same sensor data stream is delivered to a number of terminals (destinations), the communication load of the source is dispersed by sending the received data to other destinations. When the delivery cycle is different, the sensor data stream whose delivery cycle is a common divisor of required cycles can be delivered to all of the destinations if the delivery cycles are in a multiple relationship or can be approximated as having a multiple relationship. However, the destinations receive redundant data which are not included to the times of each required cycle. We have proposed techniques for sensor data stream delivery system in P2P model [11,12]. In our previous techniques, destinations having a long delivery cycle transmit the sensor data stream to other destinations so that the load of the source is dispersed. However, in the previous methods, since the specific node builds delivery paths and notifies related nodes, the assigned node is required to be updated when the related nodes churn. Therefore, in this paper, we propose a churn resilience technique that enhances the robustness of delivery system. The proposed technique uses a successor list used in Chord [13]. We confirmed in simulations that the proposed technique improves the reliability of the delivery system. 356 Informatica 39 (2015) 355-363 T. Kawakami et al. 2 Addressed problems 2.1 Assumptions We assume that computers (nodes) to relay sensor data streams constructs P2P overlay network in the sensor data stream delivery system. The sensor data stream delivery system distributes the delivery loads to the nodes and keeps high scalability in an environment where there are a huge number of sensor data streams and destinations. Sensor data streams are periodically sent from their sources through the Internet and delivered to destinations by the hops among nodes. Destinations request sensor data streams with those delivery cycles to a specific node also through the Internet. We assume that selectable delivery cycles for each sensor data stream are given. Nodes are able to send sensor data to other nodes anytime, and sensor data are distributed for each sensor data stream and time. The sensor data streams are denoted by Si (i = 1, • • • , l), destinations are denoted by Di (i = 1, • • • , m), and nodes are N (i = 1, • • • , n). Figure 1 shows a model of delivery system. In Figure 1, the number of sensor data streams is l = 2, the number of destinations is m = 4 and the number of nodes is n = 3. The 'a' represents the sensor data stream Si, and the 'b' represents the sensor data stream S2. The delivery cycles are shown near sources, nodes and destinations in Figure 1. The 's' represents the source of sensor data stream, and the numbers near destinations are requested delivery cycles from each destination. In the case where the delivery cycle is 0, it means that the destination does not request the sensor data stream. This corresponds to case where a live camera acquires an image once every second, and D1 does not view the image, D2 and D3 view the image once every second, and D4 views the image once every three seconds, for example. In this paper, we assume that selectable delivery cycles for each sensor data stream are given and are represented by C (i = 1, 2, • • •). The sensor data delivery system assigns delivery cycles or times to relay sensor data streams to nodes. The nodes send and receive various sensor data each other on specific times. 2.2 Objective function In sensor data stream delivery, it is important to avoid concentrating of processes and loads to a specific computer or network because the processing time affects and accumulates a delay of delivery. Therefore, in this paper, we aim to distribute communication loads to relay nodes on sensor data stream delivery system. The communication load of the node N is given as the total of the load due to the reception of the sensor data stream and the load due to the transmission. The communication load due to the reception is referred to as a reception load, and the reception load of N is I. The communication load due to the transmission is referred to as transmission load, and the transmission load of N is Oi. Figure 1: System model. In many cases, the reception load and the transmission load are proportional to the number of sensor data pieces per unit hour of the sensor data stream to be sent and received. The number of pieces of sensor data per unit hour of the sensor data stream that is to be received by N from Nj or Sk (i = j; i, j = 1, • • • , n; k = 1, • • • , l) is R(Nj, N) or R(Sk, Ni). In addition, the number of pieces of sensor data per unit hour of the sensor data stream that is to be sent by N to Nj or Dk (i = j; i, j = 1, • • • , n; k = 1, • • • , m) is R(Ni, Nj) or R(N, Dfc). The loads Li, I and Oi of Ni are given in the following equations: Li = Ii + Oi (1) Ii = a Y ,Ni ) + a Y R(Sfc ) (2) j=1 k=1 Oi = ß Y R(Ni,Nj)+ ß Y R(Ni,Dfc) (3) j=i fc=i where a and ft are loads with which one piece of sensor data is received and sent, respectively. The communication load SL of the entirety of the system is given by the following equation: SL = Y, Li (4) i=1 In addition, the following fairness index (FI) is often used as an index for load dispersion: FI = (ELi Li )2 n=i l2 (5) where 0 < FI < 1, and when FI =1, Lo = • • • = Ln. It is indicated that the closer FI is to 1, the more the load is dispersed. Another purpose of this study is to disperse the communication load to the destination nodes while suppressing the communication load of the entirety of the system. Therefore, the objective function is SL and 1 — FI, and the delivery path is determined to make these values minimum. A Churn Resilience Technique on. Informatica 39 (2015) 355-363 357 3 Robustness enhancement technique 3.1 Previous methods We have proposed techniques for a P2P model where such an environment that a number of destinations collect the sensor data stream during different cycle is assumed so that the load of the source is dispersed [11]. In addition, we have proposed a method which determines relay nodes based on distributed hashing and constructs delivery paths autonomously by each node [12]. The previous method using distributed hashing devides nodes into groups represented by the combination of sensor data stream and delivery cycle. In addition, the previous method assigns delivery times to nodes for each group of delivery cycle. The previous method distributes processes by assigning a node based on delivery time. In addition, the previous method avoids concentrating loads to a specific node and time by assigning a node for each group of delivery cycle. However, in the previous methods, since the specific node builds delivery paths and notifies to related nodes, the assigned node is required to be updated when the related nodes churn. 3.2 Redundancy of node assignment by successor list In the sensor data stream delivery, the number of data to send/receive varies among different delivery cycles. The shorter the delivery cycle is, the larger the number of data and the load are. Therefore, the previous method using distributed hashing first generates circular hash spaces for each sensor data stream and puts nodes on hash spaces based on the distributed hashing of the combination of sensor data stream and node ID. After that, the previous method divides each hash space into partial hash spaces as groups for each delivery cycle. By this process, partial hash spaces of the shorter cycle have the more nodes. The size of each partial hash space is determined based on its cycle. For example, in the case where the selectable delivery cycles are Ci = i (i = 1, 2,3), the ratio of the sizes of partial hash spaces is 1/Ci : I/C2 : I/C3 = 1/1 : 1/2 : 1/3 = 6:3:2. The previous method treats each partial hash space as circular and assigns related times for each cycle to nodes on its partial hash space. In the case where there are no nodes on the partial hash space, the previous method assigns the partial hash space to the nearest neighbor node on the next partial hash space. In addition, the previous method determines the root node on the partial hash space of the shortest cycle based on distributed hashing such as the least common multiple of cycles. The root node first receives data from the source of sensor data stream. In this paper, we propose a churn resilience technique that enhances the robustness of delivery system by a successor list used in Chord [13]. Figure 2 shows an example of the case where the number of nodes is n = 8, cycles are h(C3, 3K h(C3,0). h(C„ 5) hfC,. 1) htC,, 4) Figure 2: Assignment to a group of cycle. Require: cycles: Arrangement of delivery cycles of the nodes sorted in ascending order (cycle of source node is — 1 and at index 0) ownld: An identification of own (node) assignedCyclelndex: An index of own assigned delivery cycle in cycles succCount: The length of a successor list 1: 2: 3: 4 5: 6: 7: 8: 9: 10 11 12 13 14: 15 16: 17 18 19: 20: 21 22: 23 24: 25 cycleLcm — calculateLCM(cycles); if assignedCycleIndex = 0 or searchNode(0, cycleLcm, 0) = ownId then time — 0; while time < cycleLcm do assignedNode — searchNode(assignedCycleIndex, time, 0); if assignedNode = ownId then longCycleIndex — calculateLongestCycleIndex(cycles, time, 0); relayNode; if longCycleIndex = assignedCycleIndex then relayNode — searchNode(0, cycleLcm, 0); succList; succNode — ownId ; for i — 0 to succCount do succNode — successor(succNode); succList.add(succNode); end for else succNodeIndex — random(0, succCount + 1); relayNode — searchNode( longCycleIndex, time, succNodeIndex); end if requestToSend(relayNode, ownId, time); end if time — time + cycles[assignedCycleIndex ]; end while end if Figure 3: Pseudocode to construct delivery paths by nodes. Ci = i (i =1, 2,3), the size of a hash space is 2p, and the length of the successor list is 2. In Figure 2, the beginning values of each partial hash space are 2p x 0/11, 2p x 6/11, and 2p x 9/11. To construct delivery paths, nodes first calculate the least common multiple of selectable delivery cycles for each sensor data stream. After that, the nodes search the sender nodes for each related time that the same time is assigned to in the other cycle groups. The nodes determine the cycle groups to search for each time based on the approach such as the LCF method [11], and the node that belongs 356 Informatica 39 (2015) 355-363 T. Kawakami et al. Require: cycles: Arrangement of delivery cycles of the nodes sorted in ascending order (cycle of source node is — 1 and at index 0) ownId: An identification of own (destination) requestCycleIndex : An index of request delivery cycle in cycles succCount: The length of a successor list 1: cycleLcm ^ calculateLCM(cycles); 2: time ^ 0; 3: while time < cycleLcm do 4: targetCycleIndex ^ getRandomCycleIndex(cycles, time); 5: succNodeIndex ^ random(0, succCount + 1); 6: relayNode ^ searchNode( targetCycleIndex, time, succNodeIndex); 7: requestToSend(relayNode, ownId, time); 8: time ^ time + cycles[requestCycleIndex]; 9: end while Figure 4: Pseudocode to construct delivery paths by destinations. to the longest cycle on each time receives data from root node and sends to the nodes that belong to the other cycle groups. The node that does not belong to the longest cycle group on each time searches the node on the longest cycle group and requests to send data. On the other hand, the node that belongs to the longest cycle group on each time searches the root node of the sensor data stream and requests to send data. Figure 3 shows the pseudocode to construct delivery paths by a node. In the the pseudocode on the Figure 3, the least common multiple of the selectable delivery cycles is calculated in the line 1. The case shown in the line 2 is a case where this node does not belong to the shortest cycle group or is not the root node. The case shown in the line 6 is a case where time in the cycle group is assigned to this node, and the case shown in the line 9 is a case where the cycle group of this node is the longest cycle at time. In the line 10, the root node is searched as a sender to this node at time. Successor node is searched in the line 14, and the searched node is added to the successor list in the line 15. Successor node that receives data is selected at random in the line 18, and the node that belongs to the longest cycle group and succNodeIndex is searched as a sender to this node at time in the line 19. Finally, delivery path from the relay node at time is constructed in the line 21. Similarly destinations first calculate the least common multiple of selectable delivery cycles for each sensor data stream. After that, the destinations determine the cycle group for each time at random among the related cycles. The destinations search the senders in the determined cycle groups for each time and request to send data. Figure 4 shows the pseudocode to construct delivery paths by a destination. In the the pseudocode on the Figure 4, the least common multiple of the selectable delivery cycles is calculated in the line 1. The cycle group able to send at time and successor node that receives data are selected at random in the line 5. After that, the node that belongs to the selected cycle group and succNodeIndex is searched as a sender to this destination at time in the line 6. Finally, delivery path from o o _ o o - 0 500 1000 1500 2000 2500 Time Figure 5: The number of desitinations for the longest cycle group at each time. the relay node at time is constructed in the line 7. 4 Evaluation 4.1 Simulation environment In this section, we evaluate the proposed technique using distributed hashing in Section 3 by simulation. In the simulation environment, the number of nodes is n = 27 = 128, the number of sensor data streams is l = 27 = 128, and the number of destinations is m = 1000. The delivery cycles that destinations request are C = i (i = 1, • • • , 10) and determined at random from 1 to 10 for each sensor data stream. In this environment, the maximum of the least common multiple of delivery cycles is 2520, and then the timetable for delivery is from time 0 to time 2519. As an evaluated value, we calculated the load of each node, system total loads (SL), and fairness index (FI) among the time of the least common multiple of selectable delivery cycles. We executed this simulation 10 times for each method and environments described later. We calculated the average of the results. 4.2 Number of influenced destinations Figure 5 shows the number of destinations that cannot receive the data stream in the case where the assigned node of the longest cycle group on each time churns. In the proposed technique, the assigned node of the longest cycle group on each time relays to the assigned nodes of the other cycle groups on that time. Since a specific node of the longest cycle group is assigned each time, nodes of the other cycle groups and their destinations cannot receive the data stream in the case where the assigned node of the longest cycle group churns. In Figure 5, the assigned nodes of all cycle groups deliver to their destinations at time 0. The longest cycle group at time 0 is 10. Therefore, all of the one thousand destinations cannot receive the A Churn Resilience Technique on. Informatica 39 (2015) 355-363 357 o o <5 CO a <11 a t CO o o> oo c> iS « _ O) ° 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Churn Rate Figure 11: The average of instantaneous system reliability in the constant scenario. ■O .2 œ cc E a) m >» CO o a) c to 01 c O) 3 CM o o o ~i-1-1—^-1-1-1-1-1-1-r 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Churn Rate Figure 12: The average of instantaneous system reliability in the Gaussian scenario. a> EC E a) 0) CO to a) To I I None Figure 13: The average of instantaneous system reliability in the random scenario. large even if there are a few successors. Therefore, the successor list is effective to enhance the robustness and reliability of the delivery system. Figure 14 shows the maximum instantaneous load in an environment where the number of successors is 0, • • • , 4. The maximum instantaneous load is the maximum load for each node and time. The longitudinal axis is the maximum instantaneous load, and the lateral axis is the number of successors. In Figure 14, the difference of the maximum instantaneous load is small in this simulation environment. In the same environment, Figure 15 shows the maximum load of node, Figure 16 shows SL, and Figure 17 shows FI. Also the differences in those results are small in this simulation environment. Therefore, the maintenance cost of the successors is not influenced largely in the case where the number of destinations is especially higer than the number nodes. Figure 18 shows the rate of the number of hops to each node at time 0 in an environment where the number of successors is 0, • • • , 4. The lateral axis is the number of hops, and the longitudinal axis is the rate of the nodes that receive data under the number of hops. The nodes shown as "N/A" do not receive data at time 0 because other nodes in the same cycle group are assigned to time 0. In Figure 18, the nodes shown as "N/A" are reduced in the longer successor list because the nodes that receive data at time 0 are increased. Although the rate of the number of the higher hops are increased in the longer successor list, all of the nodes receive data under four hops. Figure 19 shows the rate of the number of hops to each destination at time 0 in the same environment. The longitudinal axis is the rate of the destinations that receive data under the number of hops. Although the rate of the number of the higher hops are increased in the longer successor list, all of the nodes receive data within five hops. 5 Related work Various techniques for dispersing the communication load in the delivery of streams have been proposed [14]. A P2P stream delivery technique using a P2P technology for sending and receiving data has been proposed in order to disperse the communication load among the terminals [1-5]. The P2P stream delivery technique is di- A Churn Resilience Technique on. Informatica 39 (2015) 355-363 357 œ "O o "a ca o _i to o 0} c cd c « « c Stream Cycle Time Cycle-Time r 4 Number of Successors Figure 14: Maximum instantaneous load by the number of successors. a> "a o « cm - o T3 CO o —I s -©- Stream -a- Cycle —1— Time -*- Cycle-Time Number of Successors Figure 15: Maximum load by the number of successors. vided into a pull type technique and a push type technique. In a pull type technique, such as PPLive1, DONet [1], and SopCast2, the reception terminal that receives data requests data from another terminal and acquires it. Although the reception terminal find terminals that have not yet been received the requested data, no redundant communications are carried out. In a push type technique, such as AnySee, data is sent from the transmission terminal that sends data to another terminal [2]. Although the transmission terminal find terminals that have not yet received the requested data, no such redundant communications are carried out. A technique combining a pull type and a push type, such as PRIME, has been proposed [3]. In P2P stream delivery techniques, a case where the same data stream is delivered to a number of terminals is assumed. In the delivery of the sensor data streams, however, there are cases where a data stream of the same sensor having different delivery cycles is delivered. In this case, 'http://www.pplive.com/ 2http://www.sopcast.com/ CD T3 O O 10 T3 CO O —I M ,o Stream Cycle Time Cycle-Time Number of Successors Figure 16: Total sytem loads by the number of successors. 00 o ë "o ç (0