A NEW APPROACH TO THE MODELING OF NETWORK TRAFFIC IN SIMULATIONS

Matjaž Fras, Jože Mohorko, Žarko Čučej
Univerza v Mariboru, Fakulteta za elektrotehniko, računalništvo in informatiko, Maribor, Slovenija

Keywords: self-similar, network traffic modeling, Pareto distribution, maximal transmission unit

Abstract: Simulations of telecommunications networks have become very important tools for their evaluation. Network traffic has a very important influence in simulations. This paper introduces new concepts for the modeling of measured network traffic in simulation tools. With these new concepts, we can improve descriptions of the random packet-size process, especially for maximal packets of network traffic, which have a very great impact on the bit or packet rates of network traffic. The suggested methods improve the packet content, especially of maximal packets, in modeled network traffic simulations, which leads to smaller differences in bit and packet rates between measured and modeled network traffic.

A new approach to the modeling of self-similar traffic in simulations

Keywords: self-similarity, network traffic modeling, Pareto distribution, maximum packet length

Summary: Simulations of telecommunications networks are becoming an important tool for their evaluation. Network traffic also has a very important and large influence in simulations. This article presents a new concept for modeling measured network traffic in simulation tools. With this new concept we can improve the description of the random packet-size process of network traffic, especially of maximal packets, which have a very large influence on the mean value of the total traffic in bits and packets per unit of time. The proposed methods improve the description of the packet content of network traffic, especially of maximal packets in the modeled traffic, which consequently leads to smaller differences in the mean value of bits and packets per unit of time between measured and modeled traffic.

1. Introduction

Statistical analyses of Ethernet networks show that, in many cases, network traffic can be described by self-similarity /1/. This model appeared about fifteen years ago as an alternative to the models then in use, such as Poisson and Markov /2/. It was also shown that heavy-tailed distributions, such as Pareto and Weibull, are more suitable for describing network processes, such as the packet-size and inter-arrival time processes /1, 3, 4, 5/.

One of the main goals of researchers was, and still is, the modeling of network traffic in simulations, such as OPNET /6, 7, 8/. In simulation we try to produce modeled traffic that is the best possible approximation of the measured traffic in the sense of bit or packet rates, bursts or variance. For evaluating discrepancies between measured and simulated network traffic, we chose different measures, such as bit or packet rates, the Hurst parameter, variance, and the discrepancy between histograms of the statistical network processes for packet size and inter-arrival time.

During measuring and modeling we saw that discrepancies between measured and modeled traffic derive from an inaccurate description of the packet-size process. We also saw that longer packets, and especially maximal-length packets (MTU, Maximal Transmission Unit), have a substantial influence on modeled network traffic. There was a large discrepancy between the captured histogram of the packet-size process and the chosen distribution, which is usually a consequence of maximal packets. Maximal packets are a consequence of data fragmentation in the TCP/IP stack. With classical modeling, where a captured histogram of the packet-size process is described by a single distribution, we usually do not arrive at a good enough description of the packet-size process of measured traffic, especially of the content of maximal packets.
This, consequently, leads to a large discrepancy between measured and modeled network traffic, especially in bit and packet rates, and also in traffic bursts. For this reason, we present three methods for describing the packet-size histogram of measured traffic, which achieve more accurate descriptions of network traffic in simulations.

1. The first method is based on using "mixed distributions" for describing random processes. A similar concept is used in the area of image processing /9/ and has already entered the area of traffic modeling /10, 11, 12/.
2. The second method is based on estimating the data files behind a measured traffic histogram by defragmentation in a communications network /3, 4, 5/.
3. The third method combines the first and second methods.

This paper is organized as follows. The next section describes statistical modeling of network traffic by distributions and the Hurst parameter, followed by a section on the packet-size process of network traffic. The suggested methods are then presented, followed by the simulation results. Finally, we finish this paper with the conclusion.

2 Statistical modeling of measured network traffic

Network traffic can be described as a combination of two random processes:

1. packet-size process $X(t)$
2. inter-arrival time process $Y(t)$

Let us describe network traffic $Z(t)$ as

$$Z(t) = \psi\bigl(X(t), Y(t)\bigr) \qquad (1)$$

where $\psi$ is a function of the packet-size process $X(t)$ and the inter-arrival time process $Y(t)$. Both processes are described by a probability density function (pdf). The choice of a suitable distribution for a traffic process depends on the properties of the measured network traffic. For network traffic with short-range dependence, light-tailed distributions, such as the exponential, are more suitable for describing the packet-size process. For network traffic with long-range dependence, heavy-tailed distributions, such as Pareto and Weibull, are more suitable.
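As a rough illustration of describing traffic Z(t) through a packet-size process X(t) and an inter-arrival time process Y(t), the following sketch draws both from chosen distributions and computes the resulting packet and bit rates. The function names, the choice of an exponential size distribution with Weibull inter-arrival times, and all parameter values are assumptions made for this example only.

```python
import random


def generate_traffic(n_packets, mean_size, shape, scale, seed=0):
    """Draw (arrival_time_s, size_bytes) pairs for one synthetic source.

    Packet sizes X are exponential with the given mean; inter-arrival
    times Y are Weibull with the given shape and scale. All values are
    illustrative assumptions, not parameters prescribed by the paper.
    """
    rng = random.Random(seed)
    now = 0.0
    packets = []
    for _ in range(n_packets):
        # random.weibullvariate takes (scale, shape), in that order.
        now += rng.weibullvariate(scale, shape)
        packets.append((now, rng.expovariate(1.0 / mean_size)))
    return packets


def mean_rates(packets):
    """Return (packets per second, kilobits per second) over the whole trace."""
    duration = packets[-1][0]
    total_bytes = sum(size for _, size in packets)
    return len(packets) / duration, 8.0 * total_bytes / duration / 1000.0


trace = generate_traffic(5000, mean_size=400.0, shape=0.6, scale=0.02)
pps, kbps = mean_rates(trace)
print(f"{pps:.1f} packets/s, {kbps:.1f} kb/s")
```

The sketch treats sizes and inter-arrival times as independent draws, which is a simplification; it does not reproduce long-range dependence.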
The probability density function (pdf) of the Pareto distribution is

$$p(x) = \frac{\alpha k^{\alpha}}{x^{\alpha+1}}, \quad x \geq k, \quad \alpha, k > 0 \qquad (2)$$

where $k$ is the location parameter and $\alpha$ is the shape parameter. The probability density function of the Weibull distribution is

$$p(x) = \frac{\alpha}{k}\left(\frac{x}{k}\right)^{\alpha-1} e^{-(x/k)^{\alpha}}, \quad x > 0, \quad \alpha, k > 0 \qquad (3)$$

where $k$ is the scale parameter and $\alpha$ is the shape parameter.

The definition of a self-similar random process /3, 13, 14, 15/ is based on the autocorrelation function $r(k)$, which behaves as

$$r(k) \sim k^{-\beta} L_1(k), \quad k \to \infty, \quad 0 < \beta < 1 \qquad (4)$$

where $L_1$ is a function that varies slowly at infinity, that is, $\lim_{t \to \infty} L_1(tx)/L_1(t) = 1$ for all $x > 0$ (e.g., $L_1(t) = \text{constant}$, $L_1(t) = \log(t)$). The Hurst parameter $H$ is used for describing the arrival process. It is defined by

$$H = 1 - \frac{\beta}{2}, \quad 0 < \beta < 1 \qquad (5)$$

and represents the measure of self-similarity. Besides $H$, describing the arrival process also requires parameters such as the average arrival rate, fractal onset time scale, source activity ratio, and peak-to-mean ratio.

3 Problem of the statistical packet-size process

From traffic measured by a sniffer /8/, we can obtain information about packet sizes, inter-arrival times, packet rate, and so on. Based on histograms, we can evaluate both random traffic processes $X(t)$ and $Y(t)$ and choose the distributions that best approximate the histograms. During our research, in which we estimated the parameters of traffic processes, we found that much larger discrepancies appear when estimating the parameters of the packet-size process than when estimating those of the inter-arrival time. The discrepancy between the histogram of measured traffic and the distribution that describes the process can be evaluated by goodness-of-fit tests, such as Kolmogorov-Smirnov or chi-square /16/. The greatest contribution to these discrepancies comes from MTU packets, as mentioned in the introduction. MTU packets cause a strong discontinuity in the histogram, and it is very difficult to describe such a histogram using the classical method. In our research, we paid attention to the statistical description of the packet-size process of network traffic.

Fig. 1: Histogram of the measured packet-size process and distribution parameter estimation with the classical method, using the EasyFit fitting tool. (Vertical axis: probability density; legend: histogram, Pareto.)

Figure 1 shows an example of a packet-size histogram of measured network traffic and the classical estimation of distribution parameters. From the captured histogram, we can see that minimal-length packets of around 54 B prevail. But there are also many packets of maximal length, which have a great influence on the bit rate of the entire network traffic. The classical parameter estimation method (Figure 1) does not describe the process very well, especially the maximal packets, which usually leads to large discrepancies between measured and modeled traffic in the sense of bit or packet rates. Such an estimation method also yields a very large difference in packet content between measured and simulated traffic. We cannot solve this problem by using other methods for estimating distribution parameters for the packet-size process of network traffic, such as the CCDF method /3/. The greatest discrepancies appear when describing network traffic with the long-range dependence (LRD) property, where a heavy-tailed distribution, such as Pareto, is used. Discrepancies also appear when describing network traffic with short-range dependence (SRD), where the exponential distribution was used, but they are smaller than in the LRD case.

3 Suggested methods for estimating distribution for packet-size process

All suggested methods are based on a transformation of the captured traffic. The first method uses mixed (multiple) distributions to describe the packet-size process, the second method is based on defragmentation of captured packets, and the third method is a combination of the first and second methods.
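To make the motivation for the first of these methods concrete, the sketch below builds a synthetic packet-size sample with a spike at the maximal size, fits one exponential distribution to the whole sample, and then fits one only to the packets below a threshold. The Kolmogorov-Smirnov distance is computed by hand. The sample generator, the 15 % share of maximal packets, the mean of 230 B and the rule that packets equal to the threshold go to the upper group are all assumptions for illustration; 1482 is the constant packet size listed for MTU packets in Table 1.

```python
import math
import random


def ks_distance_exponential(sample, mean):
    """Kolmogorov-Smirnov distance between a sample and an exponential cdf."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - math.exp(-x / mean)
        # Compare the model cdf with the empirical cdf just before and at x.
        worst = max(worst, abs(model - i / n), abs((i + 1) / n - model))
    return worst


def split_at_threshold(sizes, threshold):
    """Separate packet sizes into those below and those at or above threshold."""
    below = [s for s in sizes if s < threshold]
    at_or_above = [s for s in sizes if s >= threshold]
    return below, at_or_above


def synthetic_capture(n, mean_small, mtu, mtu_share, seed=0):
    """Hypothetical capture: exponential sizes plus a spike of MTU packets."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        if rng.random() < mtu_share:
            sizes.append(float(mtu))
        else:
            sizes.append(min(rng.expovariate(1.0 / mean_small), mtu - 1.0))
    return sizes


sizes = synthetic_capture(10000, mean_small=230.0, mtu=1482, mtu_share=0.15)
below, at_mtu = split_at_threshold(sizes, threshold=1482)

# The maximum-likelihood estimate of the exponential mean is the sample mean.
single_fit = ks_distance_exponential(sizes, sum(sizes) / len(sizes))
split_fit = ks_distance_exponential(below, sum(below) / len(below))
print(f"single exponential:          D = {single_fit:.3f}")
print(f"below-threshold exponential: D = {split_fit:.3f}")
```

On this synthetic sample the below-threshold fit has the smaller distance, because the spike no longer distorts the estimated mean. Real captures will behave differently in detail.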
3.1 Mixed distributions

Using this method, we describe network traffic by multiple distributions, which are implemented using multiple traffic generators in the same simulation workstation. By using mixed distributions for describing the stochastic process of network traffic, we achieve a smaller discrepancy between the measured histogram and the fitted distributions for the packet-size process (Figure 2). Network traffic $Z(t)$ defined in (1) can be described as the sum of $n$ data sources:

$$Z(t) = Z_1(t) + Z_2(t) + \dots + Z_n(t) = \psi_1\bigl(X_1(t), Y_1(t)\bigr) + \dots + \psi_n\bigl(X_n(t), Y_n(t)\bigr) = \sum_{i=1}^{n} \psi_i\bigl(X_i(t), Y_i(t)\bigr) \qquad (6)$$

where $Z_i(t)$ is the traffic of each traffic generator and $\psi_i$ is a function of two random processes $X_i(t)$ and $Y_i(t)$, where $X_i(t)$ represents the packet-size process and $Y_i(t)$ the inter-arrival time process. So, we can divide network traffic into separate segments modeled by different distributions. The points that separate the packet-size process into multiple parts, each described by an independent distribution, are threshold points. The simplest way to separate network traffic for a mixture distribution is to define a first traffic $Z_1(t)$, containing the packets that are longer than the threshold value, and a second traffic $Z_2(t)$, containing the packets that are shorter than the threshold. In many cases, the MTU size represents the threshold point.

Fig. 2: Example of using two distributions for describing the packet-size process of captured network traffic.

$$Z(t) = Z_1(t) + Z_2(t), \quad \begin{cases} Z_1(t) = \psi_1\bigl(X_1(t), Y_1(t)\bigr); & \text{packet\_size} > \text{threshold} \\ Z_2(t) = \psi_2\bigl(X_2(t), Y_2(t)\bigr); & \text{packet\_size} < \text{threshold} \end{cases} \qquad (7)$$

We must also estimate the corresponding distributions for both inter-arrival time processes $Y_1(t)$ and $Y_2(t)$ and for both packet-size processes $X_1(t)$ and $X_2(t)$.

3.2 Defragmentation method

While files are transmitted across a network, IP packets are fragmented because of MTU limitations. The fragmentation process is executed in the model of IP encapsulation in the TCP/IP stack.
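The effect of the MTU limitation on a single file can be sketched in a few lines. This is a simplified illustration that ignores per-packet header overhead; the function name is hypothetical, and 1482 is the constant packet size listed for MTU packets in Table 1.

```python
def fragment(file_size, mtu):
    """Split a file of file_size bytes into packets of at most mtu bytes."""
    if file_size <= 0 or mtu <= 0:
        raise ValueError("file_size and mtu must be positive")
    full_packets, remainder = divmod(file_size, mtu)
    packets = [mtu] * full_packets
    if remainder:
        # The last packet carries whatever does not fill a maximal packet.
        packets.append(remainder)
    return packets


print(fragment(4000, 1482))  # [1482, 1482, 1036]
```

Every file larger than the MTU therefore contributes one or more packets to the single maximal-size bin of the histogram, plus at most one shorter packet.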
From the captured traffic in Figure 1, we can see that MTU packets produce the discontinuity in the histogram, which hampers its description by a common distribution with the classical method. This new method is based on estimating the histogram of the transmitted data files before fragmentation /4/. To estimate the distribution of the packet-size process, we add up the maximal packets that were produced by the fragmentation process during transmission. So, we combine all packets from a sequence of MTU packets from the same source, including the first packet shorter than the maximal size, into a new, bigger packet. These newly derived values, together with the captured non-fragmented packets, are used to build the histogram of data, which is then described by a new distribution.

$$Z(t) = \psi\bigl(X(t), Y(t)\bigr) \rightarrow Z_T(t) = \psi_T\bigl(X_T(t), Y_T(t)\bigr), \quad Z(t) \approx Z_T(t) \qquad (8)$$

$Z_T(t)$ represents the transformed traffic, which is a function of the transformed processes for packet size $X_T$ and inter-arrival time $Y_T$. The transformed histogram represents the originally transmitted files $Z(t)$. Using the defragmentation method, we spread the maximal packets of the captured histogram over a new range, which represents the transmitted files. This method leads to more continuous histograms, such as in Figure 3, which can be described by a distribution using the classical method more precisely than the histogram in Figure 1. The estimated parameters of the file sizes are used in traffic generators during simulations. Because of the MTU limitation, which is defined in the model of a communication device, the files are fragmented into maximal packets during the simulation run. So the estimated traffic is a good approximation of the captured traffic.

3.3 Combination of distributions and defragmentation

The third method is based on a combination of the mixed-distribution and defragmentation methods.
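The defragmentation step that this combination reuses can be sketched as follows. The sketch merges each run of maximal packets from one source with the first shorter packet from that source, as the defragmentation method describes. The (source, size) tuple format and the function name are assumptions for illustration; a real capture would need to be reduced to this form first.

```python
def defragment(packets, mtu):
    """Estimate transmitted file sizes from captured (source, size) packets.

    Packets must be in capture order. A run of maximal packets from one
    source is merged with the first shorter packet from the same source.
    Packets that do not follow such a run are kept unchanged.
    """
    pending = {}  # source -> bytes accumulated in an open run of MTU packets
    files = []
    for source, size in packets:
        if size >= mtu:
            pending[source] = pending.get(source, 0) + size
        else:
            files.append(pending.pop(source, 0) + size)
    # Runs that were still open when the capture ended.
    files.extend(pending.values())
    return files


capture = [
    ("a", 1482), ("b", 60), ("a", 1482), ("a", 1036),
    ("b", 1482), ("a", 54), ("b", 200),
]
print(defragment(capture, mtu=1482))  # [60, 4000, 54, 1682]
```

One limitation of this heuristic is that a file whose size is an exact multiple of the MTU ends without a shorter packet, so it is merged with the next short packet from the same source.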
The basic idea is to describe captured traffic with two or more distributions, while also applying the defragmentation process to the captured traffic. For example, we can execute the defragmentation process on the captured traffic $Z(t)$ and then describe with one distribution $X_1(t)$ the traffic of packets $Z_1(t)$ that are shorter than the maximal packets. With the second distribution $X_2(t)$, we can describe the traffic of the fragmented packets $Z_2(t)$, which were equal to the maximal value before defragmentation.

Fig. 3: Transformed histogram of the captured histogram in Figure 1 with the chosen distribution. (Panel title: Transformed histogram of captured packets with defragmentation method.)

$$Z(t) = Z_1(t) + Z_2(t), \quad \begin{cases} Z_1(t) = \psi_1\bigl(X_1(t), Y_1(t)\bigr); & \text{packet} < \text{threshold} \\ Z_2(t) = \psi_2\bigl(X_2(t), Y_2(t)\bigr); & \text{defragmentation packets} \end{cases} \qquad (9)$$

For both processes $X_1(t)$ and $X_2(t)$ we also define and estimate the distributions and their parameters for the corresponding inter-arrival time processes $Y_1(t)$ and $Y_2(t)$, and also the Hurst parameter, which can also be used in the modeling of the arrival process.

4. Simulation results

We model the captured self-similar network traffic with short-range dependence, shown in Figure 4, by simulations with both the classical and the presented methods.

Fig. 4: Measured test network traffic captured by the Wireshark sniffer. (Horizontal axis: time (s).)

In the case of classical estimation, we chose the exponential distribution for describing the packet-size process, because the value of the Hurst parameter is near 0.5, which indicates short-range dependence. For the first and third methods we define a threshold equal to the MTU size, because the bin with MTU packets stands out from the neighboring bins in the packet-size histogram (Figure 1). In the first and third methods, this bin is described by a separate distribution. Table 1 shows the parameters of the measured network traffic and all estimated parameters for the presented methods, which were used in the OPNET simulation tool.

Table 1: Parameters of measured and simulated network traffic

| Traffic | Packets | Packet-size process | Inter-arrival time | p/s | kb/s | H | MSB |
| measured traffic | | x | x | 35.6 | 114.5 | 0.58 | x |
| classical method | | exponential, 1/λ = 416.5 | Weibull, α = 0.57326, β = 0.01895 | 33.4 | 113.4 | 0.53 | 0.024 |
| 1. method | packets < MTU | exponential, 1/λ = 230.41 | Weibull, α = 0.65792, β = 0.02587 | 34.1 | 124.9 | 0.54 | 0.016 |
| 1. method | packets = MTU | constant, 1482 | Rayleigh, σ = 0.17435 | | | | |
| 2. method | | exponential, 1/λ = 452.48 | Weibull, α = 0.6521, β = 0.0244 | 31.0 | 114.5 | 0.52 | 0.026 |
| 3. method | packets < MTU | exponential, 1/λ = 106.7 | Weibull, α = 0.677, β = 0.02932 | 37.2 | 120.0 | 0.57 | 0.003 |
| 3. method | defragmentation data | Rayleigh [remaining entries illegible] | [illegible] | | | | |

Fig. 5: Histograms of packet ... (Legend: histogram of measured traffic, 1 bin = 150 B; histogram of modeled traffic without method; histogram of modeled traffic with the third method.)

Fig. 6: Simulated network traffic in the OPNET simulation tool. The first graph presents simulated traffic modeled by the first method, the second graph presents simulated traffic modeled by the second method, and the third graph presents simulated traffic modeled by the third method. (Horizontal axis: time (s).)

5. Conclusion

The presented methods show very good results when modeling network traffic with short-range dependence. We achieved better packet content, sometimes even better bit or packet rates in the modeled traffic, and a more accurate description of the captured traffic than with the classical manner of modeling measured traffic.
For future research, we plan to model network traffic with long-range dependence using the proposed methods, because in these cases classical estimation (without any of the methods) failed completely and led to a large discrepancy between measured and modeled traffic in the sense of bit and packet rates, and also in burst intensities.

/1/ W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wilson, "On the self-similar nature of Ethernet traffic (extended version)", IEEE/ACM Transactions on Networking, Vol. 2, pp. 1-15, 1994.
/2/ V. Paxson and S. Floyd, "Wide area traffic: the failure of Poisson modeling", IEEE/ACM Transactions on Networl