Metodološki zvezki, Vol. 3, No. 1, 2006, 63-74 Improved Parameter Estimation in Rayleigh Model Smail Mahdi1 Abstract In this paper we describe and present results on the parameter point estimation for the scale and threshold parameters of the Rayleigh distribution. Five estimating methods have been investigated, namely, the maximum likelihood, the method of moment, the probability weighted moments method, the least square method and the least absolute deviation method. Modified maximum likelihood estimators for the parameters are also proposed. Simulation studies have shown that the modified likelihood estimator outperforms the estimators obtained with the other methods except in the case of very small samples. 1 Introduction The two-parameter Rayleigh distribution is a continuous probability distribution which usually arises when a two dimensional vector has its two orthogonal components normally and independently distributed. The Euclidean norm of the vector will then have a Rayleigh distribution. The distribution may also arise in the case of random complex numbers whose real and imaginary components are normally and independently distributed. The modulus of these numbers will then be Rayleigh distributed. The Rayleigh variable X with threshold parameter e and scale parameter d is characterized by the cumulative function (x-e)2 F(x) =1- e-2d2 for e£ x < ¥ and d> 0 . (1.1) This distribution plays an important in real life applications since it relates to a large number of distributions such as generalized extreme value, Weibull, chi-square and rice distributions. In this paper we investigate the estimation of the scale and threshold parameters using a modified maximum likelihood method (ML), the moment method (MM), the probability weighted moment method 1 Department of Computer Science, Mathematics and Physics, University of the West Indies, Cave Hill Campus, Barbados. 64 Smail Mahdi (PWM), the ordinary least square method (OLS) and the least absolute deviation (LAD) method. The PWM method is a relatively recent technique which is well used by hydrologists in frequency analysis. The method is strongly advocated in Hosking et al. (1985) and according to Davison and Smith (1990) it constitutes the most serious competitor to the ML method, especially, in the case of small samples. The performance of this technique has been recently investigated in Mahdi and Ashkar (2004) and Ashkar and Mahdi (2003) in Weibull and Log-logistic models, respectively. We organise this paper as follows. In Section 1, we have introduced the problem and in Section 2, we derive the parameter estimators using the considered methods and also give results on the asymptotic variances. Simulations results are discussed and illustrated in Section 3. 2 Estimation methods We derive and present below estimators for the parameters e and d by using the five considered methods. We start with the probability weighted moments method. 2.1 Probability weighted moments The probability weighted moment of order (i,j,k) is obtained from the inverse cumulative function x(F) = e + dJ-2ln(1-F) as Mij ,k =E (Xi [ F ( x )] [ 1-F ( x )] k) = 1 ( x ( F )) Fj ( -F ) k dF . (2.1) We use the usual orders i=1 and j=0 since this leads to a class of linear L-moments, see, Hosking (1986; 1990), with asymptotic normality. We denote ar the corresponding probability weighted momentM10 . After integration and simplification, we obtain a = e + dl p 2(r +1) /(r + 1) . (2.2) Substituting two distinct orders r and s into equation (2.2) gives the probability weighted moment estimate for d as Ù = (r + 1)a ˆr-(s + 1)a ˆs (23) I p I p "\|2(r + 1) -"\| 2(s + 1) Improved Parameter Estimation in Rayleigh Model 65 where aˆ C n-i x ri = n-1? Cr n-x1i for r = 0,1,K,n -1, with the convention Ckj = 0 if k< j , i=1 C is the unbiased estimators for ar , see Hosking (1989). Thus, the estimator for e is given by e = (r + 1)ar-d p . 2(r +1) (2.4) 2.1.1 Asymptotic variances of ˆ and d ˆ The asymptotic variances of e ˆand d ˆare approximated by using the asymptotic variances of the PWM estimates ar . We will use result (5.3), provided in Hosking (1986), stating that the vector whose rth component is jn(ar - ar), forr =0,1,...,m-1, has a Gaussian limiting distribution with mean vector 0 and covariance matrix A = ( Ars ) m-=10 where Ars =Irs+Isr and I ( ())r ( ()) ()( ()) rs = 1-F x 1-F y s F x 1-F x dxdy. ??xy) = 1-{(P(X1fy)´K´P(Xnfy)} = 1-[1-F(y)] n. Substituting now F(y) from equation (1.1), we get (y-e 2 G(y)=1-e 2(d / n ) Improved Parameter Estimation in Rayleigh Model 67 which has the same form as F(y). Therefore the Rayleigh variable e has the meane + —j=J;r2 . We propose then to use the modified maximum likelihood estimators ~ and ~ that are solutions of the systems of equations (2.12) = ? (xi -e)2 ~ i =1 (2.13) 2n and which are based on the unbiased estimator ~ . Squaring equation (2.13) and expressing e as function of e from equation (2.12) yield the second order equation of ~ , (2------) ~2 +2J— (e ˆ - x) ~ + 2xe - x 2 - e ˆ 2 = 0. (2.14) 2n v 2n The discriminant of the above equation is positive and is given by D = 2(e-x)2 + (x2 -x2)(2------)³0. (2.15) 2n Therefore, equation (2.14) has two distinct solutions. Furthermore this equation admits a unique positive solution since the roots product is (e-x)2 +(x2 -x2) ----------------------------- 0. Thus the modified estimators for d and e are p p 2-2n respectively given by (x -x(1))J— +J2(x(1) -x)2 + (x2 -x2)(2------) 2------ V 2n v 2n --------------------------------------------------f 0 (216) 2n and 68 Smail Mahdi x(1) - p n/2 (2.17) where x and x2 are the first and second sample moments, respectively and x(1) is the first order statistic based on the random sample x1,K, xn . 2.2.1 Variance of dÙ ande~. ~ The asymptotic variance of dis approximately given by Var(d ˆ) = ¶2ln(f) -1 = d2 n 4 + 3 e + p 2 d2 d (2.18) obtained from the sample Fisher information on d. On the other hand, we have that Var ~ e) = Var(e) = (2-^2 ) d n by using the distribution ofe ˆ. Thus, ~ is a consistent estimator ofe. 2.3 Method of moments The moment about the origin of order r is given by, (x-e) -(x-e)2 e d2 mr = E(X r ) = ¥? xr x -e e 2d2 dx . After integration we get v2 mr V2 (2d ) k G[(k + 1)/2] 2d k=0 This can be simplified as --------V Crk er k(l2d ) k G [( k + 1 ) /2 ] . mr =YCkr - 1 er+1-k(j2d ) k -1 G[(k+1)/2]. k=1 (2.19) (2.20) (2.21) Improved Parameter Estimation in Rayleigh Model 69 The first and second order non central moments can be evaluated from either equation (2.20) or (2.21). Using for instance equation (2.21), we get m 1=2 e+2 G(3/2)d -e = e+dp — and m2 =e2 + 2V2 G(3/2) ed + 2d2 = e2 + 42p ed+ 2d2. This yields the following moment method estimators for d ande, (2.22) (2.23) d = nYß- Žxi 2 n 1- p p and e=x-d— (2.24) (2.25) where s 22 = yx -x i s an estimator of the population standard deviation. 2.3.1 Asymptotic variances and covariance of ˆ and d ˆ The asymptotic variances and covariance of e ˆ and d ˆ are estimated from the variances and covariance of the sample general moments m ˆr and m ˆs, see for instance, Mahdi and Ashkar (2004), as follows: Var(e ˆ) M121 M2 12 2M11M12 -1 Var(d ˆ) = M 2 M2 22 2M 21 M 22 Cov(e ˆ,d ˆ) M 11 M 21 MM 12 22 M 11 M 22 +M 21 M 12_ where Var(m ˆr) = Var(m ˆr) Var(m ˆl ) Cov( m ˆ r , m ˆ l) m 2r-( mr)2 m 2 l-( ml)2 ;Cov(m ˆ r , m ˆ l)= m r + l-mrml (2.26) n Var(m ˆ l) = n n M= ¶ ;M2= ¶d ; M21 11 e 1 ¶e ¶m l andM ¶m l 22 d¶ In the considered case r = 1 and l = 2, we have 70 Smail Mahdi Var(m ˆ2) Var(m ˆ1)= (^rd , 2n 2(4 - p)d2e2 + 242p d3e + 4d4 n = , and finally: d 2 [(4 - p)e + d—] Cov( m ˆ1 ,m ˆ2) =-------------------------, n M11 = 1, V 2 M21=2e+d 2 p M22 = 4d + e2p . (2.27) (2.28) (2.29) (2.30) (2.31) (2.32) (2.33) 2.4 Regression methods The parameters dande can also be estimated through the linear regression technique from the relationx = e + d]-2ln(1-F(x). Ordinary least square estimates as well as least absolute deviation estimates for dande are obtained from the sample points {xi,yi} where xiis the ith sample value corresponding to the empirical quantileF ˆ(x(i)) and yi =^-2ln(1-F(xi) . Ordinary least square estimates for dande are obtained from the usual intercept and slope linear regression estimates, see, for instance Rice (1995). The least absolute deviation or median regression estimates of dande are obtained as solution to the n minimization problem: Min = |yi-a-bxi| with respect to a and b. The solution is obtained by applying the simplex method to the linear programming problem: - Min?(ri+ + ri- ) under the constraints yi - a -bxi - ri+ + ri- = 0 where ri+ and i=1 ri- are, respectively, the positive and negative residuals associated with the observation xi ,i = 1,...,n. Improved Parameter Estimation in Rayleigh Model 71 2.4.1 Variances of OLS estimators The computation of the variances of the least absolute deviation estimators is extremely tedious. However, we can find the variances of the ordinary least square estimators under the assumptions of the standard statistical models, see, for instance Rice (1995). Let us denote dˆOls and eˆOls the OLS estimators of dande, respectively. The variances of these estimators are, respectively, given by Var(d ˆ Ols) = - ns i=1 and 2nY,ln(1 - F(xi))+ //x/-2ln(1-F(xi))) i=1 (2.34) n 2s2 'S] ln(1 - F(xi)) Var( e ˆ Ols) =iii 2nS]ln(1-F(xi)) + y^-J-2ln(1-F ˆ(xi))) (2.35) where s 2 =Var(X) =dd 2 (236) 2 is obtained from equations (2.22) and (2.23). 3 Discussion We have assessed the performance of the considered estimation methods through simulation studies. Different values of the parameters have been considered as well as different sample sizes. Orders (1,2), (1,3) and (2,3) are used in the PWM method. The sample points were generated using the inverse cumulative function technique. The probability weighted moments are estimated with the plotting method outlined in Hosking (1986), that is, M1,u,v is estimated by M 1,u,v = n(-1) ? x(i) pi u (1- pi )v ˆ i where 2 2 72 Smail Mahdi pi = i+g n+d' for d'>g>-1 We used the values g=- 0.35 and d'=0 which are recommended in Hosking (1986) for the study of the generalized extreme value distribution since the Rayleigh is well related to it. Several values ford,e and n were considered, namely,d= 2,4, 6, 8, 10 ; n = 10,20,30,40,50,60,70,80,90,100, and e=1, 3, 5, 7, 9 and 11. Small sample sizes from 1 to 9 were also considered and obtained corresponding results are displayed in Table 3. The root mean square errors (RMSE) for the estimates were then computed and used as performance index. Note that expressions for the asymptotic variances are also obtained whenever it is possible. These variances may be used, for instance, to compute approximate confidence bounds for the underlying parameters. First we have found that the variation of the evalue does not affect the RMSE results. One can then set, without loss of generality,e=1. However, the root mean square errors obtained with all methods increase as the value d increases, as illustrated in Table 1 below. On the other hand, the study has shown that the PWM orders 1 and 2 provide better RMSE results. Table 1: RMSE of dande estimates obtained with the different methods combined by averaging the sample sizes n= 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 for various values of d. e=1 and PWM orders 1 and 2 are used. d 2 4 6 8 10 dˆPWM .2 3 .4 5 .68 .90 1.1 3 eˆPWM .2 5 .5 0 .75 .99 1.2 5 dˆML .2 0 .4 0 .61 .82 1.0 3 eˆML .2 3 .4 7 .71 .95 1.1 9 dˆmm .2 3 .4 8 .72 .95 1.2 0 eˆMM .4 4 .8 8 1.3 1 1.7 5 2.2 0 dˆOLS .2 3 .4 6 .69 .92 1.1 6 eˆOLS .2 8 .5 6 .84 1.1 2 1.4 1 dˆLAD .2 3 .4 6 .70 .93 1.1 7 eˆLAD .2 8 .5 6 .85 1.1 4 1.4 2 Our investigation has also shown that the method of moments performs poorly in comparison to the other methods. Table 2, displayed below, gives the root mean Improved Parameter Estimation in Rayleigh Model 73 square errors as functions of the sample size n. It shows that the root mean square values are monotonically decreasing with the sample size n. Overall, all methods have performed reasonably well except the method of moments. However, the modified maximum likelihood method provides better estimates ford, with any sample size, and for bothd andeparameters when the sample sizes are not small, say n=10, as illustrated in Tables 2 and 3. Note that in the case of small samples, the PWM method outperforms the maximum likelihood method for the estimation of e and performs almost as good as the maximum likelihood method for the estimation ofd, see, Table 3 results. Consequently, we recommend using the modified maximum likelihood method for the parameter estimation of the Rayleigh distribution in the case of non small samples. However, we notice that there is a gain in using the PWM method for estimating the Rayleigh threshold parameter when the sample size is small; this confirms Davison and Smith (1990) statement. Table 2: RMSE of dande estimates, obtained with the different methods, combined by averaging over the values d = 2, 4, 6, 8 and 10 for various sample sizes n. e=1 and PWM orders 1 and 2 are used. n 10 20 30 40 50 60 70 80 90 100 dˆPWM 1.34 .96 .78 .68 .61 .57 .52 .48 .4 6 .43 eˆPWM 1.45 1.05 .86 .75 .67 .61 .57 .53 .5 0 .46 dˆML 1.25 .87 .71 .61 .55 .49 .46 .43 .4 1 .39 eˆML 1.46 1.03 .82 .70 .62 .56 .52 .48 .4 5 .43 dˆmm 1.45 1.01 .82 .71 .64 .58 .54 .50 .4 7 .45 eˆMM 2.96 1.93 1.52 1.2 8 1.13 1.02 .93 .86 .8 1 .76 dˆOLS 1.37 .98 .80 .70 .63 .57 .53 .50 .4 7 .44 eˆOLS 1.70 1.19 .97 .84 .75 .69 .64 .60 .5 6 .53 dˆLAD 1.40 .99 .80 .70 .62 .57 .53 .49 .4 7 .44 eˆLAD 1.77 1.21 .98 .84 .75 .68 .63 .59 .5 6 .53 Acknowledgment The financial support of UWI is gratefully acknowledged. The author is grateful to the editors and referees for their valuable help and suggestions that benefited and improved this article. 74 Smail Mahdi Table 3: RMSE of dande estimates, obtained with the different methods, combined by averaging over the values d = 2, 4, 6, 8 and 10 for small values n=5, 6, 7, 8 and 9. e=1 and PWM orders 1 and 2 are used. 6 7 8 9 1.7 0 1.5 8 1.4 9 1.4 1 1.8 2 1.7 0 1.6 1 1.5 3 1.6 6 1.5 2 1.4 2 1.3 3 2.1 4 1.9 4 1.7 9 1.6 6 1.9 0 1.7 5 1.6 3 1.5 3 4.1 3 3.7 3 3.4 2 3.1 7 1.7 7 1.6 3 1.5 4 1.4 4 2.2 4 2.0 6 1.9 1 1.8 0 1.8 3 1.6 7 1.5 8 1.4 8 2.3 6 2.1 2 2.0 3 1.8 9 Note: The numerical studies have been carried out with Gauss and SPSS, Release 11. References [1] Abramowitz, M. and Stegun, I. (1970): Handbook of Mathematical Functions. New York: Dover Publications, Inc. [2] Ashkar, F. and Mahdi, S. (2003): Comparison of two fitting methods for the log-logistic distribution. Water Resources Research, 39, no. 8. [3] Davison, A.C. and Smith, R.L. (1990): Models for exceedances over high threshold. J. R. Stat. Soc. B, 52 (3), 393-442. [4] Hosking, J.R.M. (1986): The Theory of Probability Weighted Moments. New York. Research Report RC12210, IBM Thomas J. Watson Research Center. [5] Hosking, J.R.M. (1990): L-Moments: analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. B, 52(1), 105-124. [6] Mahdi, S. and Ashkar, F. (2004): Exploring Generalized Probability Weighted Moments, Generalized Moments and Maximum Likelihood Estimating Methods in Two-Parameter Weibull Model. Journ. of Hydrology, 285, 62-77. [7] Rice, J.A. (1995): Mathematical Statistics and Data Analysis, 2nd Ed. Duxbury Press. n 5 dˆPWM 1.8 5 eˆPWM 1.9 6 dˆML 1.8 4 eˆML 2.4 0 dˆmm 2.1 0 eˆMM 4.6 7 dˆOLS 1.9 4 eˆOLS 2.4 9 dˆLAD 2.0 3 eˆLAD 2.7 4