https://doi.org/10.31449/inf.v42i4.1862 Informatica 42 (2018) 545–553 545 The Heteroskedasticity Tests Implementation for Linear Regression Model Using MATLAB Lуudmyla Malyarets, Katerina Kovaleva, Irina Lebedeva, Ievgeniia Misiura and Oleksandr Dorokhov Simon Kuznets Kharkiv National University of Economics, Nauky Avenue, 9-A, Kharkiv, Ukraine, 61166 E-mail: aleks.dorokhov@meta.ua http://www.hneu.edu.ua/ Keywords: regression model, homoskedasticity, testing for heteroskedasticity, software environment MATLAB Received: September 23, 2017 The article discusses the problem of heteroskedasticity, which can arise in the process of calculating econometric models of large dimension and ways to overcome it. Heteroskedasticity distorts the value of the true standard deviation of the prediction errors. This can be accompanied by both an increase and a decrease in the confidence interval. We gave the principles of implementing the most common tests that are used to detect heteroskedasticity in constructing linear regression models, and compared their sensitivity. One of the achievements of this paper is that real empirical data are used to test for heteroskedasticity. The aim of the article is to propose a MATLAB implementation of many tests used for checking the heteroskedasticity in multifactor regression models. To this purpose we modified few open algorithms of the implementation of known tests on heteroskedasticity. Experimental studies for validation the proposed programs were carried out for various linear regression models. The models used for comparison are models of the Department of Higher Mathematics and Mathematical Methods in Economy of Simon Kuznets Kharkiv National University of Economics and econometric models which were published recently by leading journals. Pozvetek: Avrorji prispevka se ukvarjajo s problemi ekonometričnih modelov z veliko dimenzijami, kjer je izračun problematičen. Razvijejo metodo v MATLABu za multifaktorske regresijske modele. 1 Introduction In econometrics, a linear regression model is often used to describe different processes and phenomena. Using matrix notation, the linear model regression can be given as:  + = XB Y (1) where Y and  are 1  n matrices, X is ) 1 ( +  m n , and B is 1 ) 1 (  + m ; n is the number of measurements (sample size); m is the number of independent variables in the regression model. For the i th row of X (the i th observation) the linear regression model can be written as follows: i im m i i i x b x b x b b y  + + + + + = ... 2 2 1 1 0 (2) where i y are the values of the dependent variable, Y  i y ; i is the experiment identification number, n i , 1 = ; ij x are the values of the independent variable X  j х ( m j , 1 = ) in the i th experiment; 0 b is the constant term of the equation; j b are the regression coefficients, B  j b b , 0 ; i  are the residuals (model errors). An error term is introduced in a regression model because the model does not fully represent the actual relationship between the variables of the model. As a result of this incomplete relationship, there are differences between the observed responses (values of the variable being predicted) in the given dataset and those predicted by a linear function of a set of explanatory variables. The error term is the amount at which the equation may differ from measurements. In other words that is the ‘white noise’. As a rule, the building a linear regression model is done by the method of ordinary least squares (OLS). This method for estimating the unknown parameters is based on the minimization of the sum of the squares of the model errors. The estimators of model parameters determined by OLS are known as best linear unbiased estimators (BLUE). The variances of the model parameters are determined by: jj e jj n i i j b z z m n S  = − − =  = 2 1 2 2 1   (3) 546 Informatica 42 (2018) 545–553 L. Malyarets et al. where jj z is the diagonal element of matrix 1 ) ' ( − = X X Z which corresponds to the parameter j b ; e  is the standard error. The OLS application requires the realization of a number of conditions [1 – 3]. Only if these conditions are met, the estimates calculated by such a model will be unbiased, efficient and well-off. These conditions are formulated in the form of the Gauss ‒ Markov theorem. According to this theorem there are four principal assumptions which admit the using of linear regression models for research and prediction. One of them is the homoskedasticity (constant variance) of the errors in relation to any independent variable. Homoskedasticity makes the assumption that the errors have a constant variance: const = ) var(  and independent of causal variables: 0 ) , cov( =  j x for all j , m j , 1 = . The error  is a random variable distributed according to the normal law: ) , 0 ( ~ 2    Ν where the mathematical expectation of the error term is zero and the variance is constant. Failure to comply with this requirement leads to bias in the estimates obtained using such a regression model. Thus [4] indicate that estimation uncertainty may increase dramatically in the presence of conditional heteroskedasticity. The requirement of homoskedasticity also exists in the construction of the econometric model using the maximum likelihood method [5 – 7]. When the scatter of the errors is different, varying depending on the value of one or more of the independent variables, the error terms are heteroskedastic. Namely the distribution law of errors remains normal with a mathematical expectation equal to zero, but the errors of the model are a function of the values of the independent variables:  ~ )) ( , 0 ( X f Ν , where ) (X f is a function that describes the change in the variance of errors as a function of the values of the independent variables. A similar problem arises during the building of semiparametric [8 – 10] and nonparametric [11, 12] models. Heteroskedasticity makes difficult to gauge the true standard deviation of the forecast errors. The OLS estimates are no longer BLUE. Thus, if the variance of the errors is increasing over time, confidence intervals for out-of-sample predictions will tend to be unrealistically narrow. In particular, heteroscedasticity does not allow us to use equation 3 for the computation of j b S , since it assumes a uniform dispersion of the errors. Under heteroskedasticity, the sample variance of OLS estimator is 1 1 2 ) ' ( ' ) ' ( ) ˆ ( − −  = X X X X X X Var e j b  (4) where Ω is the covariance matrix, the elements of which are defined as the variance of the model parameters. Under homoskedasticity,_Ω= I. Equation 4 is correct if there is no autocorrelation. For these reasons, all the conclusions obtained on the basis of the corresponding − t statistics and − F statistics, as well as interval estimates, will be unreliable. A unified approach to the estimation of heteroscedasticity is lacking. To solve this problem, a large number of different tests and criteria have been developed: the Spearman rank correlation test, the Park test, the Glaser test, the Goldfeld ‒ Quandt test, the Breusch – Pagan test, the Leven's test, the White test, and so on. The application of all the above tests is very difficult for the so-called ‘manual’ account, and for a large set of initial data it is completely impossible. There are a lot of software with which you can identify heteroscedasticity. These are professional packages (SAS, BMDP), universal packages (STADIA, OLIMP, STATGRAPHICS, STATISTICA, SPSS) and specialized packages (DATASCOPE, BIOSTAT, MESOSAUR). When using economic data researcher can face two main problems. Firstly, all the listed software are quite expensive and price of the product may be an insurmountable barrier for the young researcher. Secondly, company-developer never provides the source code, considering that this is not necessary for an ordinary user. Therefore, we can not modify the built-in algorithms to detect and eliminate heterosquadity. Another drawback of the above program products is the outdated conceptual approaches to econometric methods, which are constantly being improved. For example, the program products SPP and MICROSTAT calculate the coefficient of multiple correlations as the square root of the coefficient of determination. STATGRAPHICS calculates it as the square root of the adjusted coefficient of determination [13]. While in theory the coefficients of multiple correlations is estimated using elements of the correlation matrix [2]. Another important aspect that should be taken into account is the existence of different algorithms to identify heteroskedasticity and the specific problem of division by zero [14]. Ideal option would be to create your own software product that would take into account the research tasks. However, to write such a program, the economist should be an expert in algorithmic programming. But this happens rarely. In this article, we carry out a comparative analysis of the tests most often used to detect heteroskedasticity [1, 2, 14] and give their source code. The use of program code allows you to modify the program in accordance with the objectives of the study. 2 Analysis of literary data and the formulation of the problem Before starting the construction of the regression model, it is necessary to verify whether the conditions of the Gauss-Markov theorem are fulfilled. The Heteroskedasticity Tests Implementation for Linear... Informatica 42 (2018) 545–553 547 One of the main methods of preliminary research on heteroskedasticity is a visual analysis of the graph of residues. On these graphs, the scattering of points can vary depending on the value of the independent variables [14, 15]. To estimate heteroskedasticity, are used such quantitative tests [15 ‒ 17] as the White test, the Goldfeld ‒ Quandt test, the Breusch ‒ Pagan test, the Park test, the Glazer test and also the Spearman test. Unlike other tests the Spearman rank correlation test is a nonparametric statistical test for the heteroskedasticity of random errors in the econometric model. The test algorithm can be studied in detail in [18, 19]. However, it is still not implemented in software products which are used to build multiple models [20 ‒ 27]. In this paper we examined the software packages most commonly used in economic activity, which contain tests for heteroskedasticity [15, 28]. Indeed, these software products do not contain the Spearman rank correlation test. The most widely used for evaluating heteroscedasticity is the Park test [20, 21]. However, the Park test contains the assumption that the change in the remnants of the model is described by a functional dependence of a certain type. It was noted in [24, 25] that this can lead to unreasonable conclusions. Therefore, the authors propose to consider the Park test together with other tests. The software implementation of the Park test for multiple models also does not exist [28]. As far as we know software implementation of the Park test for multifactor models also does not exist. Another test that the authors of the article implemented in the MATLAB environment is the Goldfeld ‒ Quandt test. This test to check for heteroskedasticity of random errors is used when there is reason to believe that the standard deviation of errors is proportional to some variable. The test statistics has a Fisher distribution [18, 27]. The Goldfeld ‒ Quandt test can also be used if there is an assumption of intergroup heteroskedasticity, when the variance of errors takes, for example, only two possible values. In this case, for the application of the test, there is a need for its software implementation, since applied commercial software has not taken this possibility into account [25, 28]. In scientific articles on for the problem of detecting heteroskedasticity, the Breusch ‒ Pagan test is often considered [10, 29]. We also carried out research this problem. But it oversteps this article. Analysis of literature sources shows that all tests of heteroskedasticity detection are difficult for ‘manual’ application and require the development of special software. In turn, the software of econometric research does not contain built-in functions for heteroskedasticity testing with open source code. That is way the authors of this article attempted to implement the above tests for heteroskedasticity in the construction of multifactor econometric models in the MATLAB software environment. It should be noted that MATLAB does not contain ready-made software implementation to verify compliance of homoskedasticity. We chose it as a programming environment. For this purpose, other programming environments can also be used, for example, a the free software environment R. The authors have chosen MATLAB by the following reasons. First, MATLAB is used as a high-level programming language for writing scripts (Spearman.m, Parks.m and Gold_Quan.m). Secondly, MATLAB includes built-in functions for constructing regression models (Econometric toolbox), which gave the authors relief from the duty of programming the standard functions of regression analysis. Thirdly, the authors worked with data structures based on matrices. 3 Aims and objectives of the study The purpose of the article is to present functions to check for heteroskedasticity in multifactor regression models. The implementation is made in MATLAB. To achieve this purpose, it is necessary to solve a number of problems. Namely: • writing the program code in the MATLAB programming environment; • planning and execution of computer calculations; • completion of programs; • analysis and interpretation of results; • comparison with the results of calculations using software products of leading companies. 4 Practical implementation of the criteria for the detection of heteroskedasticity in econometric models in the MATLAB 4.1 Spearman’s rank correlation test for multiple regression models The use of the Spearman’s test assumes that the variance of model errors will increase (or decrease) with increasing values of the independent variable. This means that the absolute values of errors i  ) , 1 ( n i = and the values ij x of the independent variable j x ) , 1 ( m j = will correlate with each other. To check whether heteroskedasticity is statistically significant the Spearman’s test provides for the following stages: 1) Estimation of the parameters of the econometric model by the OLS: im m i i i x b x b x b b y + + + + = ... ˆ 2 2 1 1 0 , (5) where i y ˆ is the predicted response in accordance with the model when the independent variables are ) ... ; ; ( 2 1 im i i x x x ; 548 Informatica 42 (2018) 545–553 L. Malyarets et al. 2) Calculate model errors as the difference between the empirical and the ratchet value of the dependent variable: i i i y y ˆ − =  (6) where i y is the value of the dependent variable in the i th experiment; 3) The pairs ) , ( i ij x  are ranked in order of increasing values of the independent variable j x ; 4) The coefficient of rank correlation between i  and ij x is calculated as ( ) 1 6 1 2 1 2 −   − =  = n n d r n i i x  , (7) where i d is the difference between the two ranking; 5) The significance of  x r is tested by using − t statistic: 2 1 2   x x r n r t − − = (8) 6) In accordance with the predetermined confidence probability р (where р − = 1  ) the tabulated value of ) 2 ( 5 . 0 . − = n t t cr  is found. Then the calculated value is compared with the critical one. If the t-statistic value is greater than the critical value, we must say that heteroscedasticity is statistically significant. Here  is the significance level which is chosen to test the null hypothesis: 0 =   x . In the opposite case, the null hypothesis is non-contradictory. As an example of the implementation of this test, we can suggest the following m-file named Spearman: ======================================== % initialization: X1 = load('data1.scv'); X2 = load('data2.scv'); X3 = load('data3.scv'); X4 = load('data4.scv'); X5 = load('data5.scv'); Y = load('data.scv'); % Formation of the source data array: X = [ones(n,1) X1' X2' X3' X4' X5']; % Construction of a linear multifactor % model by OLS - method: [b,bint,r,rint,stats] = regress(Y,X,0.05); y_p = b(1) + b(2).*X1 + b(3).*X2+ b(4).*X3+b(5).*X4+b(6).*X5 sprintf('Model:') fprintf('y_p = %f + %f *X1+%f *X2+%f *X3+%f *X4+%f *X5',b) % Calculation of model remains: e = Y - y_p'; % Preparing an array for further work: X = [X1' X2' X3' X4' X5']; [n,m] = size(X);% Determining the size of the source data %========================================== %% Spearman rank correlation test % Ranking of factors: [Xs I] = sort(X) Dx = zeros(n,m); for j = 1:m for i = 1:n Dx(i,j) = i; end end TMP = zeros(n,m); % Filling an array of factors with ranks % taking into account their sequence numbers: for j = 1:m for i = 1:n i1 = I(i,j); TMP(i1,j) = Dx(i,j); end end X = [X TMP]% Output array % Ranking of remains: [es I] = sort(e); es = [es ones(size(e),1)]; e = [e ones(size(Y),1)]; sprintf(' critical values t:','\n') t_r(:,j) = (r(:,j)*sqrt(n-1))/sqrt(1 - r(:,j)^2); end t_r % output array t-statistics by Spearman % Comparative analysis and conclusions: c = 0; for i = 1:size(e) es(i,2) = i; end % Filling an array of remains with ranks % taking into account their sequence numbers: for i=1:size(e) e(I,2) = es(:,2); end e% an array of remains which contains ranks r = zeros(1,m); d = zeros(n,m); % Calculating the difference of ranks for j = 1:m for i = 1:n d(i,j) = TMP(i,j) - e(i,2); end end d % difference in rank % The square of the difference of ranks: for j = 1:m d(:,j) = d(:,j).^2; end d Sd = zeros(1,m); % The sum of the difference of ranks squares % by the corresponding columns of ranks: for j = 1:m Sd(:,j) = sum(d(:,j)); end Sd % output array % Calculating Spearman's Statistics: for j = 1:m r(:,j) = 1 - (6*Sd(:,j))/(n*(n^2-1)); end r % Output array t_r = zeros(1,m); %% Testing of the significance of the Spearman coefficient: t_t = tinv(0.975,n-2)% tabulated value t for j = 1:m if abs(t_r(:,j)) < abs(t_t) sprintf(' Heteroskedasticity is absent ') else sprintf(' Heteroscedasticity is present ') c = c + 1; end end ======================================== The Heteroskedasticity Tests Implementation for Linear... Informatica 42 (2018) 545–553 549 4.2 Park's test for multiple regression models R. Park proposed a test to check for heteroskedasticity, which is based on some formal dependencies. Namely, it assumes that the heteroskedasticity may be proportional to some power of an independent variable j x in the multiple models. Since the variance of errors ( ) i i    2 2 = is a function of the − i th value ij x of the explanatory variable j x , and for its description Park proposed the this dependence: i v ij i x     2 2 = . After computing its logarithms, we obtain the following relation: i ij i v x + + = ln ln ln 2 2    . Since the variances 2 i  are usually unknown, they are replaced by their estimates 2 i  . The Park's test provides for such effectuation stages: 1) Estimation of the parameters of the econometric model by the OLS (Equation 5); 2) Calculation of the value 2 2 ) ˆ ln( ln i i i y y − =  for each observation; 3) Building the regression model: i ij i x     + + = ln ln 2 , (9) where 2 ln   = . For the case of multiple regressions, this dependence is constructed for each explanatory variable; 4) Verification of statistical significance of the coefficient  on the basis of − t statistics:    = t . (10) 5) In accordance with the predetermined confidence probability р (where р − = 1  ) the tabulated value of ) 1 ( . − − = m n t t cr  is found. Then the calculated value is compared with the critical one. If 1 ( − −  m n t t  , then at the level of significance  the coefficient  is statistically significant and there is a link between 2 ln i  and i x ln . It means that heteroskedasticity is present in statistical data. The M-file named Park's which is implementation of the Park test has the form: ======================================= % initialization: X1 = load('data1.scv'); X2 = load('data2.scv'); X3 = load('data3.scv'); X4 = load('data4.scv'); X5 = load('data5.scv'); Y = load('data.scv'); % Formation of the source data array: X = [ones(n,1) X1' X2' X3' X4' X5']; [n, m] = size(X); % ========== Park Test Algorithm ======= % 1 stage of the Park test % Construction of a linear multifactor % model by OLS - method: [b,bint,r,rint,stats] = regress(Y,X,0.05); y_p = b(1) + b(2).*X1 + b(3).*X2+ b(4).*X3+b(5).*X4+b(6).*X5 sprintf('Model:') fprintf('y_p = %f + %f *X1+%f *X2+%f *X3+%f *X4+%f *X5') % 2 stage of the Park test ln_eps = log((Y' - y_p).^2) % 3 stage of the Park test for j=1:m for i = 1:n X(i,j) = log(X(i,j)); end end % 4 stage of the Park test for i = 2:m [bet, dev,stat] = glmfit(X(:,i),ln_eps); t_t = tinv(0.95, n-2); t_r = stat.t(2); % Comparative analysis and conclusions: if abs(t_r) < abs(t_t) sprintf(' Heteroskedasticity of %i factor is absent \n',i-1) else sprintf(' Heteroskedasticity of %i factor is present\n',i-1) end end ======================================== The Park test’s weakness is that it assumes the heteroskedasticity has a particular functional form. 4.3 Goldfeld ‒ Quandt test for multiple regression models When using the Goldfeld-Quandt test for heteroscedasticity, it is assumed that model errors   depend on one of the external variables j x : 2 2 2 ij x i    = It is also assumed that errors i  are distributed according to the normal law, there is no autocorrelation. The Goldfeld-Quandt test provides for such effectuation stages: 1) Estimation of the parameters of the econometric model by the OLS (Equation 5); 2) Ranking of all n observations in magnitude of the independent variable j x ; 3) Segregation this ordered sample into three approximately equal parts k k n k , 2 , − , respectively; 4) For each part of the sample that has a volume k , its regression equation is constructed and the sums of the squares of the deviations determine:  = = k i i RSS 1 2 1  (11) and  + − = = n k n i i RSS 1 2 3  . (12) Than empirical meaning of the − F statistic is calculated: 550 Informatica 42 (2018) 545–553 L. Malyarets et al. ) 1 /( ) 1 /( 3 1 − − − − = m k RSS m k RSS F (13) 5) Evidence of heteroskedasticity is based on a comparison of the residual sum of squares (RSS) using the − F statistic. The calculated value is compared with the critical value ) 1 ; 1 ( . − − − − = m k m k F F cr  in accordance with the predetermined confidence probability р (where р − = 1  ). If ) 1 ; 1 ( − − − −  m k m k F F  , this means that at the level of significance  the hypothesis that there is no heteroskedasticity does not have grounds to reject. In the opposite case, the hypothesis of the absence of heteroskedasticity is rejected. For multiple regressions, we performed tests for all factors. The M-file named Gold_Quan which is the implementation of the Goldfeld ‒ Quandt test has the form: ======================================== % initialization: X1 = load('data1.scv'); X2 = load('data2.scv'); X3 = load('data3.scv'); X4 = load('data4.scv'); X5 = load('data5.scv'); Y = load('data.scv'); % Formation of the source data array: X = [ones(n,1) X1' X2' X3' X4' X5']; [n, m] = size(X); %========================================= %% Goldfeld ‒ Quandt test: [Xsort Is] = sort(X); for i=1:size(Y) Ysort(i,1) = Y(Is(i),1); end Dat = [Xsort Ysort]; c = fix(4*n/15); k = fix((n - c)/2); if floor(k) > 0.4 k = k+1; end k % Selective aggregate 1: Dat1 = Dat(1:k,:); [b1,dev1,stats1] = glmfit(Dat1(:,1),Dat1(:,2)); S1 = sum(stats1.resid.^2); % Selective aggregate 2: Dat2 = Dat(n-k+1:n,:); [b2,dev2,stats2] = glmfit(Dat2(:,1),Dat2(:,2)); S2 = sum(stats2.resid.^2); % Testing the hypothesis: if S1 > S2 Fp = S1/S2; else Fp = S2/S1; end Ft = finv(0.95,k-m-1,k-m-1); if Fp > Ft sprintf(Heteroscedasticity is present ') else sprintf(Heteroscedasticity is absent ') end ======================================== A weakness of the Goldfeld ‒ Quandt test is that the result is dependent on the criteria chosen for separating the sample measurements into their representative groups. 5 Results of numerical experiments The problem of detecting heteroskedasticity in various multifactor econometric models was considered. For carrying out numerical simulation experiments we used both the models of the Department of Higher Mathematics, Economic and Mathematical Methods of KhNEU [30 ‒ 33], and econometric models which were published recently by leading journals [34 ‒ 36]. To check for heteroscedasticity, we used real data. This is one of the advantages of this paper. However, it is possible to use the data obtained with the Monte Carlo simulation [6, 7, 37 ‒ 39]. Numerical experiments were performed on the configuration AMD Athlon 64 3200+1.5Gb Ram, graphic accelerator – Nvidia GeForce GTX 560 2Gb with using technology NVIDIA CUDA 4.2. Let's look at a concrete example of what happens to an eccentric model, if you do not take into account heteroskedasticity. As a model problem, the linear regression model was calculated for the cost of electronic textbooks produced by the Department Higher Mathematics and Mathematical Methods in Economy. The initial data and designations used in the process of correlation-regression analysis are shown in Figure 1, where Y is the resulting factor Y (cost of the electronic textbook). Figure 1: Initial data for model building Figure 1 shows the dependence of the cost of the electronic textbook (Y) on such external factors: ○ - X1 (average cost of developers' wages); + - X2 (publication volume); × - X3 (average CD recording price); The Heteroskedasticity Tests Implementation for Linear... Informatica 42 (2018) 545–553 551 * - X4 (storage and distribution costs); • - X5 (cost of the use of licensed software). The regression model was constructed using the built-in function Matlab-regress (y, X, alpha) with the code: ======================================== % The program for multiple regression model building, if heteroskedasticity is not taken into account : [b,bint,r,rint,stats] = regress(Y,X,0.05); y_p = b(1) + b(2).*X1 + b(3).*X2+ b(4).*X3+b(5).*X4+b(6).*X5; sprintf(' Heteroskedasticity is not taken into account:') fprintf('y_p = %f + %f *X1+%f *X2+%f *X3+%f *X4+%f *X5',b) ======================================== The program for constructing multiple regressions, if you do not take into account heteroskedasticity, gives such a result: . 87 . 0 33 . 3 90 . 70 61 . 10 33 . 0 06 . 1864 ˆ 5 4 3 2 1 x x x x x y  +  +  + +  +  + − = (14) The results of calculating the errors of the model represented by the Equation 10 are shown in Figure 2. Figure 2: Graphic illustration of the remnants of the model Analysis of the remnants of the model indicates that for this model the dispersion of remnants increases with an increasing of the value of external factors, that is, heteroskedasticity can not be ignored. Using the program procedures developed by the authors to identify heteroskedasticity, the following results were obtained: ======================================== ans = Heteroskedasticity 1 is absent ans = Heteroskedasticity 2 is absent ans = Heteroskedasticity 3 is absent ans = Heteroskedasticity 4 is absent ans = Heteroskedasticity 5 is present ======================================== The construction of the regression model, which takes into account the heteroskedasticity, was performed using the built-in function MATLAB: robustfit (X, y, wfun, tune,const). It should be emphasized that the presence or absence of heteroskedasticity in the initial data is determined automatically by using the check box. For this we used the code: ======================================== %c is a parameter that takes the value 0 or 1 %(where 0 - Heteroscedasticity is absent, 1 - % Heteroscedasticity is present), %c depends on the result of the scripts’ work if c > 0 X = [X1' X2' X3' X4' X5']; [b,stats3] = robustfit(X,Y,'fair',0.001,'on'); y_p = b(1) + b(2).*X1 + b(3).*X2+ b(4).*X3+b(5).*X4+b(6).*X5; sprintf('Heteroskedasticity is taken into account:') fprintf('y_p = %f + %f *X1+%f *X2+%f *X3+%f *X4+%f *X5',b) end ======================================== The program for multiple regression model building, if heteroskedasticity is taken into account yields this result: . 80 . 0 18 . 4 16 . 29 33 . 10 94 . 0 85 . 27 ˆ 5 4 3 2 1 x x x x x y  +  +  − −  +  + = (15) Thus, the above procedure allows eliminating heteroskedasticity. In this case, the resulting models will be able to adequately reflect the reality. Table 1 shows the results of numerical experiments on testing of programs which are presented in this article on various multifactor models. As can be seen from Table 1, software products developed by us using MATLAB can be proposed both for constructing multifactor econometric models, and for investigating the latter for the presence of heteroskedasticity. In doing so, we used new numerical algorithms, developed on the basis of well-known tests of heteroskedasticity detection. Open source code allows the researcher to use this software to solve their own problems. 0 100 200 300 400 500 600 700 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10 5 Xj e = Y - Yp Residuals plot X1 X2 X3 X4 X5 552 Informatica 42 (2018) 545–553 L. Malyarets et al. 6 Conclusion and future work The article examined one of the key problems of regression analysis, which consists in verifying the fulfillment of the requirement of homoskedasticity of the remainders of the model. To this end we used various statistic tests. Analysis of literature sources and our own studies confirm the complexity of using all existing tests for detecting heteroskedasticity in the ‘manual account’ mode. Therefore, we gave our own implementation in MATLAB for tests used for detecting heteroskedasticity. This problem was successfully solved, as shown results of numerical experiments which are presented in the article. We represent all software products we have created with open source code, which enables each researcher to customize the program to their problems. In conclusion, we want to note that the work presented in this article is an on going work having the final purpose to create a complete and effective software for detecting heteroskedasticity in regression models. Another further development consists in developing a complete econometric toolbox in MATLAB. 7 Reference [1] Dougherty C (2016). Elements of econometrics http://www.londoninternational.ac.uk/sites/default/f iles/programme_resources/lse/lse_pdf/subject_guid es/ec2020_ch1-4.pdf [2] Dougherty C (2016). Introduction to Econometrics (5th edition) University Press: Oxford [3] Hansen B (2018). Econometrics. University of Wisconsin http://www.ssc.wisc.edu/~bhansen/econometrics/Ec onometrics.pdf [4] Brüggemann R, Jentsch C and Trenkler C (2016). Inference in VARs with conditional heteroskedasticity of unknown form Journal of econometrics 191 pp. 69-85. http://dx.doi.org/10.1016/j.jeconom.2015.10.004 [5] Cordeiro G (2008). Corrected maximum likelihood estimators in linear heteroskedastic regression models Brazilian Review of Econometrics 28 pp. 11–16. [6] Hayakawa K and Pesaran H (2015). Robust standard errors in transformed likelihood estimation of dynamic panel data models with cross-sectional heteroskedasticity Journal of econometrics 188 pp. 111-134. http://dx.doi.org/10.1016/j.jeconom.2015.03.042 [7] Chen S, Khan S and Tang X (2016). Informational content of special regressors in heteroskedastic binary response models Journal of econometrics 193 pp. 162-182. http://dx.doi.org/10.1016/j.jeconom.2015.12.018 [8] Kai B, Li R and Zou H (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models The annals of statistics 39 pp. 305-332 [9] Pelenis J (2014) Bayesian regression with heteroscedastic error density and parametric mean function Journal of econometrics 178 pp. 624-638. http://dx.doi.org/10.1016/j.jeconom.2013.10.006 [10] Norets A (2015). Bayesian regression with nonparametric heteroskedasticity Journal of econometrics 185 pp. 409-419. http://dx.doi.org/10.1016/j.jeconom.2014.12.006 [11] Wei C and Wan L (2015). Efficient estimation in heteroscedastic varying coefficient models Econometrics 3 pp. 1-7. [12] Shen S, Cui J and Wang C (2014). Testing heteroscedasticity in nonparametric regression based on trend analysis Journal of Applied Mathematics 2014 pp. 1-5. http://dx.doi.org/10.1155/2014/435925 [13] Pen R (2015). Planirovaniye experimenta v Statgraphics Centurion. Mezdunarodnye zurnal eksperimentalnoy obrazovanija pp. 160–161 (in Russian) [14] Malyarets L (2014). Economico-mayematechni metody i modeli. KhNEU im. S Kuznetz: Kharkiv (in Ukrainian) Multiple Models Theoretical results The results of the work of the authors' programs Spirmen.m Park.m Goldfeld ‒ Quandt.m Model [28]: Linear approximation Power approximation Hyperbolic approximation Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is present Model [30] Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is present Model [31] Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is present Model [33] Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Model [34] Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Heteroskedasticity is absent Model [35] Heteroskedasticity is present Heteroskedasticity is present Heteroskedasticity is absent* Heteroskedasticity is present * The conclusion is not justified, since the test uses a monotonically increasing function Table 1: Results of testing programs on multiple models The Heteroskedasticity Tests Implementation for Linear... Informatica 42 (2018) 545–553 553 [15] Williams R (2015). Heteroskedasticity University of Notre Dame https://www3.nd.edu/~rwilliam/stats2/l25.pdf [16] Kolchinskaya E (2015). Vlijanie transportnoj infrastrucrury na promyshlennoe razvitie regionov Rossii. Aktualnye problemy ekonomiki 34 pp. 77-82 (in Russian) [17] Radkovskaya E (2015). Matematicheskie metody v sovremennyh ekonomicheskih issledovaniyah. Vestnik Yugorskogo gosudarstvennogo universiteta 37 pp. 37-40 (in Russian) [18] Gmurman V (2001). Teorija verojatnostej i matematicheskaya statistika (7th edition). Vyshaya shkola: Moscow (in Russian) [19] Heteroskedasticity http://gauss.stat.su.se/gu/e/slides/Lectures%208- 13/Heteroscadasticity.pdf [20] Redace R (2017). Use the Park test to check for heteroskedasticity http://www.dummies.com/education/economics/eco nometrics/use-the-park-test-to-check-for- heteroskedasticity/ [21] Mazorchuk M (2014). Osobennosti vybora metodov izmereniya nadezhnosti pedagogicheskih tekstov. Radioelectrohhi i komp’jutorni sistemy pp. 131-137 (in Russian) [22] Kim D, El-Tawil S and Naaman A (2007). Correlation between single fiber pullout and tensile response of FRC composites with high strength steel fibers. Fifth International RILEM Workshop on High Performance Fiber Reinforced Cement Composites (HPFRCC5). RILEM, ed H W Reinhardt and A E Naaman: Paris pp. 67-76. [23] Baranov N and Sorokin L (2015). Komp’yuternye prikladnye programmy v formatirovanii stilja myshleniya budushchego spetsialista. Mezhdunarodnyj nauchno-issledovatekskij zhurnal 42 pp. 60-62 (in Russian) [24] Kazanskaya A and Kompanietz V (2009) Opyt issledovanija metodov klasternogo analiza iz paketa Statistica 6. 0 na primere vuborki gorodov. Izvestiya YuFU, Tehnicheskie nayki pp. S103-110 (in Russian) [25] Kosonogov V (2014). The psychometric properties of the Russian version of the empathy Quotient Psychology in Russia pp. 196-104 [26] Yüce M (2017). An Asymptoic test for the detection of heteroskedasticity http://eidergisi.istanbul.edu.tr/sayi8/iueis8m2.pdf [27] Redace R (2017). Test for heteroskedasticity with the Goldfeld ‒ Quandt test http://www.dummies.com/education/economics/eco nometrics/test-for-heteroskedasticity-with-the- goldfeld-quandt-test/ [28] Krasilnikov D (2011). Programmnoe obespechenie ekonometricheskogo issledovanija Econometric Software. Vestnik Nizhegorodskogo universiteta im. N I Lobachevskogo pp. 231-238 (in Russian) [29] Halunga A, Orme C and Yamagata T (2017). A heteroskedasticity robust Breusch–Pagan test for contemporaneous correlation in dynamic panel data models Journal of econometrics 198 pp. 209-230. https://doi.org/10.1016/j.jeconom.2016.12.005 [30] Ponomarenko V, Malyarets L and Dorokhov A (2011). Obespechenie kontrolya logisticheskoyj deyatelnosti s minimizatsiey logisticheskih zatrat. Izvestija IGEA pp. 137-142 (in Russian) [31] Malyarets L (2011). Matematychni metody v suchasnyh tkonomichnih doslidzhennyah KhNEU im. S Kuznetz: Kharkiv (in Ukrainian) [32] Kovaleva E (2015). Regressionnaya model sebestoimosti elektronnyh multimedijnih izdaniy Vestnik NTU KhPI. Mehaniko-tehnologichni sistemy i kompleksy pp. 55-60 (in Russian) [33] Malyarets L (2016). Matematychni metody i modeli v upravlinni ekonomichnymy protsesamy KhNEU im. S Kuznetz: Kharkiv (in Ukrainian) [34] Degtyareva T, Buresh O and Chepasov V (2003). Statisticheskiy analiz transportnogo kompleksa regiona na osnove regressionnyh modelej. Voprosy statistiki pp. 65-67 (in Russian) [35] Jacob J and Lamari M (2012). Factors influencing research productivity in higher education: an empirical investigation Foresight 6 pp. 40-50 [36] Krasnobokaya I (2011). Analiz formirovaniya sebestoimosti produktsii proizvodstvennogo predprijatiya s ispolzovaniem mnogofaktornyh ekonometricheskih modeley Ekonomicheskiy analiz: teoriya i praktika pp. 38-47 (in Russian) [37] Chao J, Hausman J, Newey W, Swanson N and Woutersen T (2014). Testing overidentifying restrictions with many instruments and heteroskedasticity Journal of econometrics 178 pp. 15-21. https://doi.org/10.1016/j.jeconom.2013.08.003 [38] Bekker P and Crudu F (2015). Jackknife instrumental variable estimation with heteroskedasticity Journal of econometrics 185 pp. 332-342. https://doi.org/10.1016/j.jeconom.2014.08.012 [39] Cavalierea G, Nielsen M and Taylor A (2015). Bootstrap score tests for fractional integration in heteroskedastic ARFIMA models, with an application to price dynamics in commodity spot and futures markets Journal of econometrics 187 pp. 557-579. https://doi.org/10.1016/j.jeconom.2015.02.039 554 Informatica 42 (2018) 545–553 L. Malyarets et al.