Informatica 36 (2012) 213-220 213 Issues in Energy Optimization of Reinforcement Learning Based Routing Algorithm Applied to Ad-hoc Networks Shrirang Ambaji Kulkarni Department of Computer Science and Engineering Gogte Institute of Technology, Belgaum - 590006, Karnataka, India E-mail: sakulkarni@git.edu G. Raghavendra Rao Department of Computer Science and Engineering National Institute of Engineering, Mysore - 590008, Karnataka, India E-mail: grrao56@gmail.com Keywords: ad-hoc networks, routing protocols, reinforcement learning, energy optimization and scalability Received: January 5, 2012 Ad-hoc networks represent a class of networks which are highly unpredictable. The critical work of such networks is performed by the underlying routing protocols. Decision in such an unpredictable environment and with a greater degree of successes can be best modelled by a reinforcement learning algorithm. In this paper we consider SAMPLE, a collaborative reinforcement learning based routing algorithm, which performs competitively with other routing protocols of similar category. A major concern of SAMPLE is its energy consumption, as most of the wireless nodes are driven by finite battery power. Energy conservation has a direct bearing on the network survivability and also affects the underlying quality of services. Energy conservation is not just the problem of the network layer, but it must be considered at the data link layer. Thus we consider a cross-layer energy conservation algorithm SPAN which models a solution by aiding the routing protocol at the network layer with a backbone of stable energy nodes and conserves the energy of remaining nodes by extracting the best features of IEEE 802.11 power saving mode. Most of the network survivability issues should consider scalable scenarios and our work extends our applied energy optimization framework to scalable scenarios. A scalable scenario can be best visualized if we can gain insight into the performances of underlying mobility models. Thus we extend our work to the analysis of the underlying mobility model and its impact on energy conservation in the traffic-mobility dimensions. We also verify the simulation results statistically using hypothesis testing to prove the superiority of our energy conservation attempts for SAMPLE. Povzetek: Opisana je optimizacija potrošnje s spodbujevalnim učenjem v omrežjih. 1 Introduction Ad-hoc networks represent a special class of dynamic networks. An ad-hoc network is characterized by dynamic topology, congestion, energy and security constraints [1]. A basic requirement of an ad-hoc network is that nodes act as both hosts and routers. A major work of such networks which represent a complex optimization problem is performed by the underlying routing protocols. Reinforcement learning algorithms hold a lot of promise where solutions to problems in unpredictable scenarios need to be considered [2], [3]. Thus we consider SAMPLE [4] a collaborative reinforcement learning (CRL) routing algorithm which models the dynamics of ad-hoc networks as a distributed optimization problem and strives to solve it, in an optimal manner. Energy consumption modelling was not measured in SAMPLE and in [5] AODV performed better than SAMPLE by 34%. Thus we propose to modify SAMPLE and integrate it with a cross-layer energy extension SPAN [6]. SPAN is engaged in helping a routing algorithm by providing a backbone of energy stable nodes called as coordinators and at the same time enables the remaining nodes to exploit IEEE power saving mode (PSM) [7] of 802.11 at data link layer. Scalability of nodes has a profound effect on the performance of routing algorithms and many hidden issues may come to light. As such the present study extends the energy consumption analysis of SAMPLE under moderately scalable scenarios. To gain some insight into the mobility nodes and their performances the underlying Steady State Random Waypoint [8] for performance analysis is analysed. We also conduct t-Test to vary our assumption and prove the statistical significance of our results. Thus the goal of this paper is to optimize the energy conservation of SAMPLE by 214 Informatica 36 (2012) 213-220 S.A. Kulkami et al. integrating it with SPAN and comparing its performance with AODV, a benchmark routing protocol, in the traffic-mobility dimensions and under moderately scalable conditions. In Section 2 we discuss some of the related work. Section 3 proposes our optimization framework for energy conservation of SAMPLE routing algorithm. In Section 4 we discuss about the Steady State Random Waypoint Mobility Model with its performances under varying density of nodes. Section 5 brings out the simulation results of energy conservation of SPAN with SAMPLE in comparison with a benchmark ad-hoc routing protocol AODV under the traffic-mobility dimensions and scalable conditions along with statistical analysis of results. Section 6 concludes the paper. 2 Related Work Dowling et.al [4] introduced and evaluated collaborative reinforcement learning (CRL) as a self-organizing technique for mobile ad-hoc network routing protocol called SAMPLE. They considered the Random Waypoint Mobility Model and did not model nor optimize energy conservation as a part of their work. P Nurmi [14] applied reinforcement learning algorithms for ad-hoc network problem where routing decisions of nodes where modelled on selfishness and on the energy level of nodes. Chen and Chang [15] in their work on impact of mobility models on energy conservation showed significant energy conservation differences on various mobility models. They concluded that reactive protocols perform well when nodes move in groups, in terms of energy conservation. They did not evaluate the performance of mobility models in terms of their mobility metrics and did not correlate this to energy conservation. S.A. Kulkarni and G R Rao [25] in their work on energy efficiency of routing protocols observed that DSR was energy efficient as compared to AODV. They enhanced the energy efficiency of AODV by deploying SPAN, which acted as a middleware between the data link and the network layer. V Naumov and T Gross [16] analysed the scalability of routing methods for ad-hoc networks and investigated the performance of two routing protocols namely AODV and DSR. They considered only Random Waypoint Mobility Model. They did not take energy conservation of routing protocols in their study. Xu et.al. [17] proposed GAF an energy conservation scheme similar to SPAN. GAF had differences as compared to SPAN; firstly it required nodes to know their geographic positions and second was SPAN's superiority in terms of its integration with IEEE 802.11 PSM. In PAMAS the power-saving medium access protocol [18], [19], a node turns off its radio when it is overhearing a packet not addressed to it. This approach justified the idea when it is costly to process a received packet as compared to mere listening. 3 Energy Optimization Framework In order to optimize energy conservation in the complex dynamics of ad-hoc networks, and support the functioning of the collaborative reinforcement learning algorithm SAMPLE [4], the following architecture illustrated in Figure 1 is proposed. SAMPLE routing protocol works in an on-demand fashion based on collaborative reinforcement learning (CRL). Routing information is represented as a V-value and is floated in the network by attaching it to the data packets. Each agent maintains routing tables that stores the cost to the neighbours and their last advertised V-value. The path selection is based on the Boltzmann-action selection and optimal paths are selected. Now the Figure 1 : Energy Optimization Frame work for SAMPLE. ISSUES IN ENERGY OPTIMIZATION. Informatica 36 (2012) 213-220 215 active nodes performing critical routing actions are eligible to act as coordinators to form energy-stable backbone nodes as chosen by the SPAN algorithm. SPAN adaptively elects "coordinators" from the given active nodes in the ad-hoc network. The chief role of the coordinators is to stay awake and perform multi-hop routing functions, while other nodes are engaged in power saving mode. The task of the coordinator is rotated among all the nodes and SPAN strives to minimize the number of coordinators at any given point of time, one per group of nodes within a radio range. In order to prevent a situation where several nodes contest simultaneously to be elected as a coordinator, a node delays its candidature as specified in Equation 1. whereV ¿is the number of neighbours of node i; Ci is the number of additional pairs of nodes that would be connected if "i" becomes coordinator.,^ and fm are the amounts of remaining and maximum energies of the node.T is the round-trip delay of a packet and R is a uniform value from the interval (0, 1). Only if a node satisfies Equationl, does it announce itself as a coordinator by broadcasting a "Hello" packet. SPAN in order to save energy consumption, relies heavily on IEEE 802.11 power saving mode. The basic idea behind PSM is to power down or make the radio device sleep, when it has no data to transmit or receive. Energy conservation study [6] did not consider scalability issues. In our work we evaluate our optimization framework under scalable routing scenarios. One of the important scenarios to gain information about energy conservation under scalable operations is to gain insights in the underlying mobility model and explore its optimization features via its performance analysis metrics. This study substantiates the fact that network lifetime is defined by the time to a partition in the network [9]. 4 Steady State Random Waypoint Mobility Model The Random Waypoint Mobility despite being simple and easy to simulate is not very realistic. The model also suffers from non steady-state distribution at the start of a simulation and then ultimately converges to a steady-state distribution known in probability terms as "stationary distribution". The network performance may be affected between start-up time and after steady state has been reached as described in [26]. Thus we consider a Steady State Random Waypoint Mobility Model. Consider a mobile node [27] that lives in a connected set A. The trip end times are T0 = 0 < 7\ u 0.00018 £ (1) Ti 0.00016 C a 0.00014 a Q 0.00012 U 0.0001 4-1 a 0.00008 CO a> 0.00006 a (U 0.00004 0» < 0.00002 10 15 20 Max Speed Sec Figure 4: Average Spatial Dependency of Steady State Random Waypoint Mobility Model. Figure 4 shows that the average spatial dependency of RW-MD is high as compared to RW-LD and RW-MD outscore RW-LD by 96%. As mobile ad-hoc networks exhibit cohesive properties, this is directly supported by high spatial dependency. Also a high spatial dependency minimizes network partitions and routing overhead sometimes as nodes are nearby. This in turns has a positive effect on energy conservation, provided the link duration is also correspondingly high. 0 5 25 ISSUES IN ENERGY OPTIMIZATION. Informatica 36 (2012) 213-220 217 5 Simulation Environment and Experimental Results NS-2 simulator ver. 2.28 from [13] was used for the study of energy optimization of routing protocols like AODV and SAMPLE. The underlying MAC Protocol is defined by IEEE 802.11. Continuous bit rate (CBR) traffic sources along with TCP based traffic sources are used. The mobility model used was the Steady State Random Waypoint Mobility Model [20]. The field configurations are 1200 x 1200 m. The traffic generator script called cbrgen.tcl was used to generate CBR/TCP scenario of 8 sources at the rate of 4.0 kbps. The number of nodes in the simulation environment was 20 nodes for low density and 100 nodes for medium density. At least 5 scenarios files for Steady State Random Waypoint Model at different maximum speed of 5, 10, 15, 20, 25 sec were used for testing protocols like AODV and SAMPLE. 350 C 0 *j 300 Si 250 200 £ 0 0 > ra 5 150 £ LU (U 100 ra ra 5 50 □ SAMPLE-CBR-RW-LD □ AODV-CBR-RW-LD □ SAMPLE-SPAN-CBR-RW-LD 10 15 20 25 Max Speed Sec 5.1 Energy Parameter Configuration The energy model for energy consumption is based on the parameters given in Table 1. Figure 5: Average energy conservation of SAMPLE, AODV and SAMPLE-SPAN with CBR based traffic and on low-density of nodes. Tx Rx Idle Sleeping 1400mW 1000mW 830mW 130mW Table 1: Energy Consumption Parameters. The initial energy supplied to all wireless nodes were 1000 Joules. 5.2 Energy Conservation Metric Average Energy Conservation (AEC) - This metric specifies the average of residual energy of the nodes in an ad-hoc network at the finish of the simulation period. yw p. y l=lP l/N ( 7 ) wherep is the remaining energy and N is the number of nodes in the given ad-hoc network 5.3 Simulation Results The results of the energy optimization framework which consists of SAMPLE integrated with SPAN on 802.11 and compared with SAMPLE and AODV on 802.11 are illustrated in Figure 5, Figure 6, Figure 7 and Figure 8. In Figure 5 it is observed that the average energy conservation is the least for the adaptive reinforcement learning based routing algorithm SAMPLE. AODV with low density of nodes and on CBR based traffic outscores SAMPLE by 7% on an average. Thus to improve the energy conservation issues we apply an energy aware extension SPAN to aid modified SAMPLE. In Figure 5 it is clearly observed that SAMPLE-SPAN outperforms AODV and SAMPLE by 25 % and 30% respectively. The high link duration of RW-LD aids in link stability and possibly lesser routing overhead, which in turns aids energy conservation. 250 n E: m 0) 200 C O o > 150 P O W 100 o TO n m > < 50 □ SAMPLE-CBR-RW-MD □ AODV-CBR-RW-MD □ SAMPLE-SPAN-CBR-RW-MD 5 10 15 20 25 Max Speed Sec Figure 6: Average energy conservation of SAMPLE, AODV and SAMPLE-SPAN with CBR based traffic and on medium-density of nodes. In Figure 6 it is seen that SAMPLE once again performs least in terms of energy conservation and is inferior as compared to AODV by 3% on an average. AODV and SAMPLE performance in terms of energy conservation is more or less the same because of improved spatial dependency exhibited by RW-MD, which reduces network partitions in moderately scalable setting and this in turn complements energy conservation. SAMPLE-SPAN outperforms AODV and SAMPLE on an average by 15% and 13% respectively. 0 5 300 0 218 Informatica 36 (2012) 213-220 S.A. Kulkami et al. □ SAMPLE-TCP-RW-LD □ AODV-TCP-RW-LD □ SAMPLE-SPAN-TCP-RW-LD 10 15 20 25 Max Speed Sec 300 £ 0 ra 250 £ a) in c 200 O U > 150 < □ SAMPLE-TCP-RW-MD □ AODV-TCP-RW-MD □ SAMPLE-SPAN-TCP-RW-MD 5 10 15 20 25 Max Speed Sec Figure 8: Average energy conservation of SAMPLE, AODV and SAMPLE-SPAN with TCP based traffic and on medium-density of nodes. In Figure 8 under moderately scalable condition and with relatively higher degree of spatial dependency, it is observed that SAMPLE on TCP based traffic outperforms AODV by 6% and SAMPLE-SPAN outperforms AODV and SAMPLE by 22% and 17% respectively. To prove the statistical significance of our results we conducted Hypothesis Testing on the simulation results. The assumptions are as follows. The null hypothesis H0 states that the values for both the data set are similar i.e. H0 : = ^2 . The alternative hypothesis Ha states that the values for both the data set are not similar i.e. Ha : 4 ^2 , whereby we would like to prove that there is sufficient energy conservation achieved by SAMPLE with SPAN extensions. The Hypothesis t-Test [24] results using Analysis Toolpak[23] are illustrated in Table 2, Table 3, Table 4 and Table 5. Figure 7: Average energy conservation of SAMPLE, AODV and SAMPLE-SPAN with TCP based traffic and on low-density of nodes. In Figure 7 for TCP based traffic and low density of nodes, which have stable link properties, SAMPLE with its inherent capability to extract stable links and model them continuously outperforms AODV by 15%. SAMPE-SPAN further conserves energy and outperforms AODV and SAMPLE by 41% and 31% respectively Variable 1 Variable 2 Mean 233.0138 311.2544 Variance 19.00113 7.840572 Observations 5 5 Pooled Variance 13.42085 Hypothesized Mean Difference 0 Df 8 t Stat -33.7685 P(T<=t) one-tail 3.23E-10 t Critical one-tail 1.859548 P(T<=t) two-tail 6.46E-10 t Critical two-tail 2.306004 Table 2: Hypothesis t-Test: Two-Sample Assuming Equal Variances for comparison of average energy conservation of AODV and SAMPLE-SPAN for low density and CBR Traffic. Variable 1 Variable 2 Mean 211.1304 241.650 1 Variance 7.115177 50.2387 5 Observations 5 5 Pooled Variance 28.67696 Hypothesized Mean Difference 0 df 8 t Stat -9.01123 P(T<=t) one-tail 9.18E-06 t Critical one-tail 1.859548 P(T<=t) two-tail 1.84E-05 t Critical two-tail 2.306004 Table 3 Hypothesis t-Test: Two-Sample Assuming Equal Variances for comparison of average energy conservation of AODV and SAMPLE-SPAN for Medium density and CBR Traffic. Variable 1 Variable 2 Mean 196.1009 331.38 28 Variance 53.8941 33.708 04 Observations 5 5 Pooled Variance 43.80107 Hypothesized Mean Difference 0 df 8 t Stat -32.3197 400 350 300 250 200 150 100 50 0 5 0 ISSUES IN ENERGY OPTIMIZATION. Informatica 36 (2012) 213-220 219 P(T<=t) one-tail 4.58E-10 t Critical one-tail 1.859548 P(T<=t) two-tail 9.15E-10 t Critical two-tail 2.306004 Table 4: Hypothesis t-Test: Two-Sample Assuming Equal Variances for comparison of average energy conservation of AODV and SAMPLE-SPAN for Low density and TCP Traffic. Variable 1 Variable 2 Mean 199.9108 254.5423 Variance 17.32532 45.22516 Observations 5 5 Pooled Variance 31.27524 Hypothesized Mean Difference 0 df 8 t Stat -15.4459 P(T<=t) one-tail 1.53E-07 t Critical one-tail 1.859548 P(T<=t) two-tail 3.07E-07 t Critical two-tail 2.306004 Table 5 Hypothesis t-Test: Two-Sample Assuming Equal Variances for comparison of average energy conservation of AODV and SAMPLE-SPAN for Medium density and TCP Traffic. The test data was collected by varying mobility speed for 5, 10, 15, 20 and 25 m/sec. For each test the significance level was 0.05%. Our tests rejected the null hypothesis and this proved our simulation results with 95% confidence interval, for average energy conservation were significant for SAMPLE with SPAN extensions, as compared to average energy conservation for AODV routing protocol. 6 Conclusions In this paper our study augmented and modified a reinforcement learning routing algorithm SAMPLE'S energy conservation capabilities by integrating it with SPAN a cross layer energy saving extension in the proposed energy conservation optimization framework. We also analysed the experimental mobility model, the Steady State Random Waypoint Mobility Model for performance analysis under low and medium density of nodes to gain insight into energy conservation and scalability issues. From our experimental studies we observed that SAMPLE-SPAN outperformed AODV and SAMPLE on both low density and high density and for both CBR and TCP based traffic. Thus energy optimization should not be the property of only the network layer and it should be visualized at the data link layer also. We also verified our results statistically. In future we would like to apply the energy optimization frame work to an optimized mobility model to gain insights into superior mobility complementing energy conservation issues. References [1] S Corson and J Marker, Mobile Ad Hoc Networking (MANET): Routing Protocol Performance Issues and Evaluation Considerations, RFC 2501, pp. 3-9, January 1999 [2] AEthem, Introduction to Machine Learning, Prentice Hall of India, pp. 370-375, 2006. [3] S.A. Kulkarni, Dr. G.R Rao, "Modeling Reinforcement Learning Algorithms for Performance Analysis", International Conference on Advances in Computing, Communication and Control - ICAC'03, January 23-24th, 2009. [4] J Dowling, E Curran, R Cunningham, and V Cahill, Using Feedback in Collaborative Reinforcement Learning to Adaptively Optimize MANET Routing, IEEE Transactions on Systems Man and Cybernetics. Part A: Systems and Humans, Vol. 35, No.3, 2005 [5] S.A Kulkarni and Dr. G.R Rao, "Optimization of Reinforcement Learning Based Routing Algorithm Applied to Dynamic Network", In Proceedings of ICCVR-2010, August 19th to21st, 2010. [6] B Chen, K Jamieson, H Balakrishnan and R Morris, "Span: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks", ACM/IEEE International Conference on Mobile Computing and Networking (Mobi Comm. 2001), Rome, Italy, July 16-21, 2001. [7] Wireless LAN Medium Access Control and Physical Layer Specifications, IEEE 802.11 Standard (IEEE Computer Society LAN MAN standards Committee), August 1999. [8] S P Chaudri, J-Y L Boudec, M Vojnovic. Perfect Simulations for Random Trip Mobility Models. Proceedings of the 38th Annual Symposium on Simulations, pp. 72-79, 2005 [9] R Kravets, C Sengul, "Energy Conservation", In Ad-Hoc Networks Technologies and Protocols (Eds.) P Mohapatra and S Krishnamurthy, Springer, pp. 153-155, 2005. [10] J-Y L Boudec and M Vojnovic, "Perfect simulation and stationarity of a class of mobility models," in Proceedings of IEEE Infocom, 2005. [11] F Bai, N Sadagopan and A Helmy, "The IMPORTANT framework for analyzing the Impact of Mobility on Performance of RouTing protocols for AdhocNeTworks", Elsevier Ad Hoc Networks, 1, pp. 383-403, 2003. [12] S.A. Kulkarni, G R Rao, "Mobility and Traffic Model Analysis for Vehicular Ad-hoc Networks", In Advances in Vehicular Ad Hoc Networks: Developments and Challenges, Ed. Mohamed Watfa, IGI Global, pp. 214-232, 2010. [13] The Network Simulator - NS-2, http://www.isi.edu/nsam/ns/ [14] P Nurmi. Reinforcement Learning for Routing in Ad Hoc Networks. Proc. 5thIntl. Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt, Limassol, Cyprus, April 2007) IEEE Computer Society, 2007 220 Informatica 36 (2012) 213-220 S.A. Kulkami et al. [15] B Chen and C Hwa Chang, Mobility Impact on Energy Conservation of Ad Hoc Routing Protocols, SSGR 2003, Italy, July 28-August 2, 2003. [16] V Naumov, T Gross, "Scalability of routing methods in ad hoc networks", In Performance Evaluation Journal (Elsevier Science), Volume 62, pp. 193-207, 2005. [17] Y Xu, J Heidemann and D Estrin, "Geography-informed Energy Conservation for Ad Hoc Routing. In proceedings of the Seventh Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Rome, Italy, July 2001. [18] C Raghavendra and S Singh, "PAMAS: Power Aware Multi-Access Protocol with Signaling for Ad Hoc Networks", ACM Computer Communication Review, pp. 5- 26, July, 1998. [19] S Singh, M Woo and C S Raghavendra, "Power-Aware Routing in Mobile Ad Hoc Networks", In Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Dallas, 1998. [20] S P Chaudhuri, ns-2 Code for Random Trip Mobility Model, http://monarch. s.rice.edu/~santa /research/mobility/ [21] C Perkins, E Royer and S Das, "Ad Hoc on demand distance vector (AODV) routing", http://www.ietf.org/internet-drafts/draft-ietf-manet-aodv-03.txt [22] C Perkins and E Royer, "Ad hoc on-demand Distance Vector Routing", In Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications, pp. 90-100, Feb 1999 [23] Analysis Toolpak, http://office.microsoft.com /en-us/excel-help/about-statistical-analysis-tools-HP00 5203873.aspx [24] K N Krishnaswamy, A I Shivakumar and M Mathirajan "Management Research Methodology -Integration of Principles, Methods and Techniques", Pearson Education, pp. 352-355, 2006. [25] S A Kulkarni, G R Rao, "A Performance Analysis of Energy Efficient Routing in Mobile Ad Hoc Network", International Journal of Simulations -Systems, Science and Technology - IJSSST: Vol. 10, No.1-A. . pp. 1- 9, February - 2010. [26] J Yoon, M Liu and B Noble, "Random Waypoint Considered Harmful", in Proceedings IEEE INFOCOM 2003, San Francisco, C A, April 2003. [27] S P Chaudhri, J-Y L Boudec, M Vojnovic, "Perfect Simulations for Random Trip Mobility Models", in Proceedings of the 38th Annual Symposium on Simulation, 2005, pp. 72-79.