A More Precise Latency Bound of Deficit Round-Robin Scheduler Anton Kos, Sašo Tomažič University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia E-mail: anton.kos@fe.uni-lj.si Abstract. Delay is an important Quality of Service (QoS) parameter. A significant part of an end-to-end delay of a data flow is latency - the delay induced by scheduling algorithms in network devices. In this paper we present a latency analysis of a Deficit Round-Robin (DRR) scheduler. We correct inaccuracies and deficiencies of the DRR latency analysis detected in works of other authors. We derive a new and more precise latency bound of the DRR scheduler and show that, contrary to the latency bounds derived by other authors, our bound is mathematically correct. Finally we give a comparative analysis of the DRR latency bounds discussed in the paper. Key words: DRR, Deficit Round-Robin, Latency, Latency Bound, Packet Networks, Scheduling NatancnejSa meja zakasnitve odpravnika ž deficitarnim krožnim dodeljevanjem Povzetek - Zakasnitev je pomemben parameter kakovosti storitve. Pomemben del zakasnitev od-konca-do-konca za posamezen podatkovni pretok je tudi začetna zakasnitev alilatenca - zakasnitev algoritma odpravnika vomrečnih napravah. Vtem prispevku predstavljamo analizo latence odpravnika z deficitarnim kročnim dodeljevanjem (Deficit Round-Robin - DRR). Osvetlimo in odpravimo nepravilnosti in pomanjkljivosti analiz latence odpravnika DRR v delih drugih avtorjev. Izpeljemo novo in natančnejšo mejo zakasnitve odpravnika DRR in pokačemo, da je le-ta, v nasprotju z mejami zakasnitve drugih avtorjev, matematično pravilna. Za konec podamo primerjalno analizo mej zakasnitev odpravnikov DRR, obravnavanih v tem prispevku. Ključne besede: DRR, odpravnik z deficitarnim krožnim dodeljevanjem, latenca, meja zakasnitve, paketno omrežje, odpravnik 1 Introduction In present communication networks Quality of Service (QoS) is more and more important. QoS in communication networks is generally measured in terms of cell/packet loss, mean delay and delay jitter. Consequently the end-to-end delay is becoming a more important transmission parameter. It should be bounded, it should be as low as possible, and it should not have too much jitter. This is true especially for (interactive) realtime applications that do not tolerate high delays and jitter. In packet networks resources are statistically shared among different traffic sources that try to send their data to the desired destinations. Overload causes congestion that is solved either by delaying or by dropping excess packets. Many of the problems that we face in networks are related to the allocation of a limited amount of shared resources (buffer, memory, bandwidth, etc.) to competing data flows. Solutions that try to solve this problem can be grouped into two categories: end-system-based solutions and network-based solutions. As latency* is a network device property, we are here only interested in the network-based solutions. Many multimedia applications rely on the ability of the network to provide some sort of quality of service guarantees. QoS can generally be defined as a set of network mechanisms that satisfy the varied quality of service levels, while at the same time maximizing bandwidth utilization. Applications rely on traffic scheduling algorithms in network devices to guarantee performance bounds. Several measures are to be considered when choosing a scheduling algorithm. The most important are: latency, fairness and complexity. This paper focuses on latency. The paper is organised as follows. In Section 2 we briefly describe why latency is an important property of each scheduler. In Section 3 we present our latency bound. We continue with the explanation of the basics of Deficit * Latency is defined in the following sections. Round-Robin (DRR) scheduler in Section 4. Section 5 is the core of this paper where we give the definition of the latency and derive its upper bound for the Deficit Round-Robin scheduler. We conclude with a comparison of latency bounds of some well-known schedulers in section 6. 2 Importance of Delay and Latency The end-to-end delay is a very important QoS parameter. A number of factors contribute to the end-to-end delay: forwarding delay, queuing delay, propagation delay and serialization delay. When scheduling algorithms are discussed, it is only the queuing delay that is of our interest. It denotes the amount of time that a packet has to wait in a queue as the system performs statistical multiplexing and while other packets are serviced before it can be transmitted on the output port. The scheduling algorithm of a scheduler should provide end-to-end delay guarantees for individual flows* without severely under-utilizing network resources. While queuing delay can be viewed primarily as a delay parameter of a packet(s), latency is a delay parameter associated to data flows. The notion of latency, that is going to be used here, is based on the length of time it takes a new flow to begin receiving service at its reserved rate (for details see [6]). Therefore, latency directly affects the size of the playback buffers required in real-time applications. 3 Our Latency Bound Latency bounds of DRR have been given or derived in several papers that discuss Latency-Rate or Round-Robin like schedulers. Let us mention two of the more important papers written by Stiliadis and Varma [6] and Kan-here and Sethu [7]. Our derivation of the latency bound of the DRR has been inspired by both papers. We claim that our latency bound is more precise because some assumptions used to derive the latency bounds in [6] and [7] are incorrect. In [6] they do not take into account the fact, that the packet length is a discrete value. It is true that for large maximum packet lengths this does not yield a big difference in latency bound, but for small maximum packet lengths it does. A more critical effect on the result has the inaccurate assumption about the maximal possible value of the deficit counter. In [7] they remediate both inaccuracies from [6], but they use irregular factor replacements in one of the inequalities in their derivation process. A detailed * Traditionally, a flow is defined as a sequence of packets generated by the same source and headed toward the same destination via the same path in the network. It is assumed that packets belonging to different flows are queued separately while they await transmission. A scheduler dequeues packets from these queues and forwards them for transmission. explanation of the mentioned inaccurate assumptions is given in [1] and [2]. In section 5 we give a simplified derivation of our latency bound. An exhaustive derivation with proofs can again be found in [1] and [2]. Further comments about the differences in latency bounds are given within Sections 5 and 6. The most important result of our work presented in this paper is the latency bound that is more precise than the latency bounds given in [6] and [7]. 4 Deficit Round-Robin Scheduling algorithms can be broadly classified into two categories: sorted priority schedulers and frame-based schedulers. Sorted priority schedulers maintain a global variable called the virtual time or system potential function. The priority of each packet, called the time-stamp, is calculated based on this variable. The packets are then scheduled in an increasing order of their time-stamps. Examples of sorted priority schedulers are Weighted Fair Queuing (WFQ), Self-clocked Fair Queuing (SCFQ), Starttime Fair Queuing (SFQ) and Worst-case Fair Queuing (WF2Q). Generally, they give good fairness and low latency bound but they have great computational complexity. In frame-based schedulers, time is split into frames of fixed or variable length. Reservations are made in terms of the maximum amount of traffic the flow is allowed to transmit during a frame period. The service received by a flow in one round-robin opportunity is proportional to its fair share of the bandwidth. These schedulers do not have to perform sorting among packets and calculate global virtual time function, so they have lower computational complexity than the sorted priority schedulers do. Deficit Round-Robin (DRR), Surplus Deficit Round-Robin, Elastic Round-Robin, Nested Round-Robin are some of the frame based schedulers with complexity O( 1), but they have worse fairness and latency properties than the sorted priority schedulers. In 1996, Shreedhar and Varghese [5] proposed DRR, one of the most popular frame-based scheduling algorithms. The main characteristic of all DRR-like scheduling algorithms is their ability to provide guaranteed service rates for each flow (queue). DRR services flows in a strict round-robin order. It has complexity O(1) and it is easy to implement. Its latency is comparable to other frame-based schedulers. A detailed operation of the DRR algorithm can be found in [5]. Below is a list of variables used in our analysis: R transmission rate of an output link, N total number of active flows, ri reserved rate of flow i, Wi weight assigned to each flow i, Qi quantum assigned to flow i, DCi deficit counter of flow i, F frame size, Lmax maximum possible packet size. Because all flows share the same output link, a necessary constraint is that the sum of all reserved rates must be less or equal to the transmission rate of the output link: ][> < R (1) Let Tmin be the smallest of ri: Tmin = minyi ri. Each flow i is assigned a weight that is given by: (2) Note that Vi G 1, 2, • • • ,N holds wi > 1. Each flow i is assigned a quantum of Qi bits, that is a whole positive value, i.e. Qi G N. This quantum is actually the amount of service that the flow should receive during each round-robin service opportunity. Let us define with Qmin the minimum of all the quanta. Then the quantum for each flow i is expressed as: Qi = WiQm (3) 5 Latency Stiliadis and Varma in [6] defined a general class of schedulers, called Latency-Rate (LR) servers. The behaviour of an LR server is determined by two parameters - the latency and the allocated rate. Latency of an LR server is the worst-case delay seen by the first packet of a busy flow. That is the packet arriving when the flow's queue is empty. The latency of a particular scheduling policy may depend on its internal parameters: its transmission rate on outgoing link, the number of flows sharing the link and their allocated rates. In this definition of LR servers, there was no assumption made on whether the server is based on a fluid model or a packet-by-packet model. The only requirement is that a packet is not considered as departing the server until its last bit has departed. Therefore, packet departures must be considered as impulses. The DRR algorithm satisfies all of these assumptions. The authors also developed and defined the notion of latency of the scheduling algorithm and determined an upper bound on the latency for a number of schedulers that belong to a class of LR servers. This notion of latency is based on the length of time it takes a new flow to begin receiving service at its guaranteed rate. Using the general idea of Stiliadis and Varma in [6] we derive the upper latency bound for the DRR algorithm that is different from theirs. It is also different from the bounds derived in [7]. We show that our upper bound is mathematically correct contrary to the ones derived in [6] and [7]. A more detailed analysis is given in [1] and [2]. Let us first define active and busy periods of a flow. Definition 1 An active period of a flow is the maximal interval of time during which the flow has at least one packet awaiting service or in service. Definition 2 A busy period of a flow is the maximal time interval during which the flow would be active if served exactly at its reserved rate. An active period reflects the actual behaviour of the scheduler since the service offered to a flow varies depending on the number of active flows. A busy period is a mathematical construction that tells us how long a flow would be active if served at exactly its reserved rate. Let us define the following parameters: ai t > a. Ai (ai, t) Si(ai,t) start of a busy period for flow i time instant such that flow i is continuously busy during the time interval (ai,t) arrivals from flow i during the time interval (ai,t) SENTi(ai, t) amount of service received by flow i during time interval (ai, t) number of bits belonging to packets in flow i that arrived after time ai and are scheduled during the time interval (ai,t) starting time of the j-th busy period of flow time instant at which the last bit of traffic, arrived during the j-th busy period, leaves the server number of bits belonging to packets in flow i that arrived at the j-th busy period, i.e. in the time interval (Tj ,t) Sj (Tj ,t) Note that Si(ai,t) and SENTi(ai,t) are not the same. During the given time interval, the scheduler may still be servicing packets that arrived during a previous busy period, and thus Si(ai,t) and SENTi(ai,t) are not the same values. In fact the following must hold: Si(ai,t) < SENTi(ai,t). r WW = r For Stiliadis and Varma a server S belongs to a class of LR servers if and only if for all time instants t after ai: Sj(oü,t) > max{0, r(t - a - Qi)} (4) where 6i is the minimum non-negative number that satisfies the above inequality and represents an upper latency bound. The previous definition of latency is based on flow busy periods. In practice it is easier to analyze scheduling algorithms based on flow active periods. The following was proven in [1] and [2]: Lemma 1 Let (ai, t) indicate an interval of time in which flow i is continuously busy. If the service offered to the packets that arrived in the interval (ai, t) can be bounded at every instant s, ai < s < t as Si(ai; s) > max{0,ri(s - ai - ö-)} (5) then this server is a LR server with a latency less than or equal to 6i. The following definition gives us a proper tool for the analysis of the latency bound: Definition 3 The latency of flow i is the minimal nonnegative constant 6i that satisfies, for all busy periods of the flow, the following inequality: Si(ai,t) > max{0,ri(t - ai - 9i)} (6) Definition 4 d[ is the smallest non-negative constant such that the following expression holds: Proof 1 Since the latency of a LR server can be estimated based on its behaviour in the flow active period, see Corollary 1, we will prove the theorem by showing that, s «•< r- - r + + (L„ - !)( Q-N +i - 4 \Fvi (10) This part of the proof is the same as in [7] and will not be repeated here. So the reader should fill this gap with the work in [7] until the following expression: SENTi(ßi, ßk) > Qi R (ßk - ßi ) - F - Qi (N - l)(Lmoœ - 1) + R R + L-1 Lmax 1 R i - F. Qi (11) This statement is true but the results following it in [7] are not. Discussion is given in Comment 1. Now, since it is proven in [5], that ri < Qf R, the factor ^ R in inequality (11) can only be replaced by ri when the expression in square brackets is greater than or equal to zero. We do not know the exact value of that expression. To proceed correctly, we must dispart the expression (11). We get: 9 r SENTi(/3i,t) > max{0,ri(t - pi - )} (7) where piis an instant of time when flow i becomes active and t > pi is the time instant such that flow i is continuously active during the entire interval (pi, t). Corollary 1 Even though (pi,t) may not be a continuously busy period for flow i, the following holds: 0i < e> (8) i.e., latency 6i, as defined earlier, is bounded with di, for every flow i. Theorem 1 The DRR scheduler belongs to the class of the LR servers with an upper bound on latency 6i for flow i given with the following expression: 9i ±9i ± Qi (k- R)+ + (Lmax - k)(FTiN + ri - R) (9) SENTi (ßi,ßk) > + + QR(ßk - ßi) - Qi + QiR Qi ~R - QiN(Lmax - k) F Qi(Lmax k) + F (Lmax k) (12) We can now use the replacement only on the positive elements of the right side of the inequality. Doing so, we decrease the value of the right side of the inequality, and the inequality still holds. However, when we use the replacement on the negative elements, we increase the value of the right side of the inequality, and the inequality does not necessarily hold anymore. Applying the above methodology, we get the following inequality: > ri(ßk - ßi) - Qi + + Q ri QiN Qi R F - (Lmax + ri (L Tt \Lmax R 1) + + (Lmax - 1) ( R -1 For the validity of the proof in [7] it is necessary that the expression (17) is always positive for every combination of its parameters. Since this is not always satisfied, their result is incorrect. The counter example goes like this: for a chosen value of parameters R = 4000, N = 10, Qi = 200, Lmax = 1500, k = 3, where k represents the number of rounds, the expression (17) has a negative value of -3.023. For the calculation of time — pi we used the expression From the previous expression we get: , F (N - l)(Lmax - 1) kR + R SENTi(ßi,ßik ) > ri (ßk - ßi) 1 - R) - -(L-x - «( FT; N+r- - R (14) Let us now compare inequalities (7) and (14). Taking into consideration that 0' is defined as the smallest nonnegative constant, that satisfies (7), and that t in (7) is replaced by ßk we get the following inequality: * * Qi(r- - r)+ + (Lmax - ^Fr-N + r- - R (15) where d[ is the minimal non-negative constant that satisfies the inequality (7). Finally, from the inequality (8) we get our latency bound: 0i * Qi + (17) To prove their Theorem 1 in [7], from the expression (17) on, Kanhere and Sethu linearly used the inequality Vi < ^ji R. As already explained in the proof of our Theorem 1, this is not mathematically correct and cannot be done in this way. We can make the replacement only on the positive factors of the disparted version of the expression since this interval represents the maximal possible amount of time used for the k round-robin service rounds. It can therefore be observed that the latency bound in [7] will stand for the most of the values of k and Qi. However, it does not stand for every k and Qi, precisely it is not going to stand when the small number of rounds in round-robin servicing is observed or when Qi is a small value. Since their latency bound does not stand for all cases, it is not correct. If we look deeper into this difference among the discussed bounds, we can see that our bound gets relatively closer to the one derived in [7] as ri, and with that also Qi, get smaller. Clearly, this difference gets bigger as ri and Qi get bigger. 6 Comparison of Latency Bounds Table 1 shows latency bounds of various scheduling algorithms. Although it is not easy and straightforward to compare the latency bound of the DRR scheduler to the latency bounds of other listed schedulers, the table can give us a general idea of the latency bounds of the most popular schedulers. It is also not straightforward to compare the three different latency bounds for the DRR scheduler. While the bound derived in [6] has only three parameters (F, Qi and R), the bound derived in [7] introduces additional two parameters (N and Lmax), and our bound another one (ri). In general, the bound from [7] is tighter than the one from [6], and our bound is between the both. Some figures comparing the bounds are available in [2]. 7 Conclusion Stiliadis and Varma in [6] defined a general class of schedulers, called Latency-Rate (LR) servers. Using the general idea of Stiliadis and Varma [6] and of Kanhere and Sethu [7], we derived the upper latency bound for the DRR scheduling algorithm. We show that our bound is unique regardless of the approach used. Table 1. Comparison of latency bounds among different schedulers. Denotations used: transmission rate of an output link (R), total number of active flows (N), reserved rate of flow i (ri), weight assigned to each flow i (wi), quantum assigned to flow i (Qi), Deficit Counter of flow i (DCi), frame size (F), maximum packet length (Lmax). Scheduler Latency bound GPS Q FIFO prpc wpn to Virtual Clock Fair Queuing -LR + LVi- ■'-'max i -'-'max R + Vi SCFQ (N — 1)Lmax i Lmax -zE-- Weighted Round-Robin R + Vi R (F Qi + Lmax) Elastic Round-Robin R ((F Qi) + (Lmax - 1)(N - 1)) DRR - as in [6] 3F - 2Qi R DRR - as in [7] i ((F - Qi) + (Lmax - 1) (Q + N - 2)) DRR - our result QR1 - RR ) + (Lmax - 1) ( Vi -R + FV- N )) It should be noted that depite using the same ideas as Stil-iadis and Varma, and Kanhere and Sethu, we made some changes and corrections in the derivation leading to a new and mathematically correct latency bound. 8 References [1] Anton Kos: Zagotavljanje različnih stopenj kakovosti storitve v omrezčjih s paketnim prenosom podatkov, Doctoral Thesis, Faculty of Electrical Engineering, Universi-tiy of Ljubljana, Slovenia, February 2QQB [2] Anton Kos, Jelena Miletic: Sub-Critical Deficit Round-Robin, Technical Report, Faculty of Electrical Engeneering, University of Ljubljana, Slovenia, July 2QQ5 [B] Anton Kos, Jelena Miletic, Sašo Tomazic: On Latency of Deficit Round-Robin, Proceedings of ERK 2QQB, Por-toroz, Slovenia, September 2QQB [4] Anton Kos, Jelena Miletic, Sašo Tomazic: Improved Latency Bound of Deficit Round-Robin Scheduler, Proceedings of YU INFO 2QQ7, Kopaonik, Serbia, March 2QQ7 [5] M. Shreedhar, George Varghese: Efficient Fair Queuing Using Deficit Round-Robin, IEEE/ACM Transactions on Networking, Volume 4, Issue B, June 199B [B] D. Stiliadis, A.Varma: Latency-Rate Servers - A General Model for Analysis of Traffic Scheduling Algorithms, IEEE/ACM Transactions on Networking, Volume B, Issue 5, October 199B [7] Salil S. Kanhere, Harish Sethu: On the Latency Bound of Deficit Round-Robin, Proceedings of the International Conference on Computer Communications and Networks, Miami, Florida, USA, October 14-1B, 2QQ2 [B] Nihal Pekergin: Stochastic Bounds on Delays of Fair Queuing Algorithms, Proceedings of the Conference on Computer Communications (IEEE Infocom), New York, March 1999 [9] R. Guerin, V. Peris: Quality-of-Service in Packet Networks Basic Mechanisms and Directions, Computer Networks, B1(B):1B9-1B9, February 1999 Anton Kos je diplomiral leta 1994, magistriral leta 1998 in doktoriral leta 2006 na Fakulteti za elektrotehniko Univerze v Ljubljani s področja elektronike in telekomunikacij. Zaposlen je kot asistent v Laboratoriju za komunikacijske naprave na Fakulteti za elektrotehniko. Njegovo pedagoško, raziskovalno in razvojno delo je povezano predvsem s komunikacijskimi omrezji, še posebej z omrezji IP. Znotraj tega področja ga še posebej zanima zagotavljanje kakovosti storitev v sodobnih komunikacijskih omrezšjih. Saso Tomažic je diplomiral leta 1979, magistriral leta 1981 in doktoriral leta 1991 na Univerzi v Ljubljani, vse s podrocšja telekomunikacij. Zaposlen je na Fakulteti za elektrotehniko v Ljubljani kot profesor. Je predstojnik Laboratorija za komunikacijske naprave in predstojnik Katedre za telekomunikacije. Njegovo sedanje delo zajema raziskave na podrocšju obdelave signalov, varnosti v telekomunikacijah, elektronskega poslovanja in porazdeljenih podatkovnih sistemov. Predava predmete Osnove telekomunikacij, Digitalne komunikacije, Komunikacijska vezja, Gradniki TK sistemov, Digitalna obdelava signalov, Adaptivna obdelava signalov in Mobilne komunikacije.